Summary of TRM Error Detection Meeting

For L1 AOIT TRM's:

Basic idea: detect FIFO Empty, FIFO Full, Missing Subsys Gap, and
Unexpected Subsys Gap errors for all AOIT's which are both (a) used,
and (b) FIFOed. Interrupt TCC upon error, and also force all AOIT's to
a defined state (probably all low) using the Maginot Line during this
time. At initialize time, minimize the possible impact of unused
channels by disabling their error detection logic and forcing their
outputs to a known state.

Initialize Time
---------------

  - all FIFOed Terms
      - full error detect
      - disabled Interrupt Request (*)
      - FIFO Mode
      - set up in TRICS itself

  - all unplugged FIFOed Terms
      - no error detect
      - disabled Interrupt Request (*)
      - Test Data Register Mode
      - set up in TRICS Init Auxi (overwrites intrinsic TRICS settings
        described above)

  - all Bypassed Terms (i.e. 255:240)
      - no error detect
      - disabled Interrupt Request (*)
      - Bypass Mode
      - set up in TRICS itself

Run Time
--------

  - Any AOIT in Specific Trigger or Exposure Group gets its associated
    FPGA
      - no change to error detection or mode
      - Enabled to generate Interrupt Request (*)

  - All other AOIT gets its associated FPGA
      - no change to error detection or mode
      - Disabled to generate Interrupt Request (*)

(*) Lots of Interrupt control bits exist

  - in TRM:
      - each source of interrupt (FIFO Not Empty vs. FIFO Error) has an
        enable in CSR
      - overall chip interrupt enable also in CSR

  - in VME FPGA:
      - each chip has interrupt enable
      - overall card interrupt enable

Note that the 2 per-chip interrupt control bits (one in each TRM and
one in the VME FPGA) are essentially equivalent, but they both must be
enabled for interrupts to be allowed. At initialize time we could
choose to set the overall chip interrupt enable in the TRM but clear it
in the VME FPGA. Then at run time TCC would only need to touch the VME
FPGA. Or TCC could change them both at run time, or always leave them
set in the VME FPGA but change them in the TRM at Run time.
I don't really have a strong opinion on which makes sense. Does one
look better from the software POV?

Monitor Data
------------

  - every 5 s for each FPGA
      - read FIFO Status Register
      - clear any errors that are set (*)
      - read Output Term Scaler Registers

  - display for all FPGA's
      - FIFO Status Register
          - for "real" terms will be real status
          - for unplugged or Bypass will be always clear
      - Rates for each output term

(*) is this compatible with the interrupt service routine?

This is what we wrote on the board...

In the case of FIFO Error
-------------------------

  - System automatically flips L1 Maginot Line
      - only AOIT TRM's listen to Maginot Line
      - mask off in BSF to FE Bz, Disable TRM's

  - System automatically causes TCC interrupt request

  - L1 Accepts stopped until TCC clears errors

  - TCC runs its interrupt service routine:
      - disable receiving new interrupts from this source
      - Pause L1 FW
      - Read all FPGA FIFO Status
      - increment software error counters
      - For all FPGA's with errors
          - clear errors
          - read FIFO Address Pointer Register until FIFO has not been
            in reset for 100us
          - verify error registers remain cleared
      - wait for 1 second
      - Resume L1 FW
      - enable receiving new interrupts from this source

Hardware Wiring
---------------

  - M123 Upper Backplane VME IRQ4* must be tapped and both differential
    ECL and single-ended TTL copies must be made
      - differential ECL copy must be INVERTED and delivered to L1
        Helper FPGA MSA Input 32
      - TTL copy must be delivered to Ironics card bit xxxx (inverted?)


For other L1 TRM's (FE Bz, Individual Disable, Global Disable):

Basic idea: no errors detectable in these TRM's. Also these TRM's
should not force their outputs to a predefined state while Maginot Line
is high. At initialize time, do not attempt to distinguish between used
and unused channels in TRM setup. We have additional lines of defense
against bad data entering on nominally unused channels.

Initialize Time
---------------

  - no error detection
  - disable interrupt generation
  - all in Bypass mode
  - disable Maginot Line (HQ_TS(1)) in BSF, as described in BSF
    Initialization file

Run Time
--------

  - no changes made at run time

Monitor Data
------------

  - every 5 s for each FPGA
      - read Output Term Scaler Registers

  - display for all FPGA's (or all "used" FPGA's?)
      - Rates for each output term

Hardware Wiring
---------------

  - no special hardware wiring is required


For L2 TRM's:

Basic idea: detect FIFO Empty and FIFO Full Errors only. Do not
interrupt TCC or otherwise stop the flow of L2 decisions. Count on
Front Ends to notice that we have caused a problem and raise SCL
Initialize Request. As part of SCL Initialize Request response, TCC
will poll the appropriate FIFO Status Registers to see if the L2 FW is
reporting any problems.

Each of the L2 TRM's provides a FIFO Not Empty output via its P5 I/O
outputs. This is routed to the L2 Helper FPGA and is used to control
the L2 Helper's state engine.

Initialize Time
---------------

  - enable Full/Empty Error Detection
  - also enable FIFO Not Empty signal generation as described in the
    TRM Initialization file

Run Time
--------

  - no changes made at run time

Monitor Data
------------

  - every 5 s for each FPGA
      - read FIFO Status Register
      - clear any errors that are set (*)
      - read Output Term Scaler Registers

  - display for all FPGA's
      - FIFO Status Register
          - for "real" terms will be real status
          - for unplugged or Bypass will be always clear
      - Rates for each output term

(*) is this compatible with the interrupt service routine?

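
For illustration only, one monitor pass over the L2 TRM FPGA's (read
FIFO Status, clear any errors that are set, read the Output Term
Scalers) might look like the sketch below. Everything named here is an
assumption made up for the example: the FakeFPGA class, the error bit
values, and monitor_pass() do not correspond to any real TCC or TRICS
interface, and the real register layout is not given in these notes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical bit assignments; the real FIFO Status Register layout is
# not described in these notes.
FIFO_EMPTY_ERR = 0x1
FIFO_FULL_ERR = 0x2


@dataclass
class FakeFPGA:
    """In-memory stand-in for one TRM FPGA (not a real hardware interface)."""
    name: str
    fifo_status: int = 0
    scalers: List[int] = field(default_factory=list)

    def read_fifo_status(self) -> int:
        return self.fifo_status

    def clear_errors(self) -> None:
        self.fifo_status = 0

    def read_scalers(self) -> List[int]:
        return list(self.scalers)


def monitor_pass(fpgas: List[FakeFPGA]) -> Dict[str, dict]:
    """One monitor pass: read status, clear any set errors, read scalers."""
    report: Dict[str, dict] = {}
    for fpga in fpgas:
        status = fpga.read_fifo_status()
        if status != 0:
            fpga.clear_errors()
        # Keep the status as it was read, before clearing, so the display
        # (and any software counter) still sees the error.
        report[fpga.name] = {"status": status, "scalers": fpga.read_scalers()}
    return report


if __name__ == "__main__":
    fpgas = [
        FakeFPGA("real", FIFO_FULL_ERR, [10, 20]),
        FakeFPGA("bypass", 0, [0, 0]),
    ]
    for name, entry in monitor_pass(fpgas).items():
        print(name, hex(entry["status"]), entry["scalers"])
```

In a real system this pass would be repeated every 5 s. The sketch
returns the status as read before clearing; whether clearing here
interferes with any other reader of the same register is the open
question marked (*) and is not settled by this example.
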
In the case of FIFO Error
-------------------------

  - Only get notification via SCL Initialize Request from one or more
    Front Ends

  - No automatic stoppage of L2 Decisions

  - TCC runs its interrupt service routine (details may be incorrect,
    I have extrapolated from the L1 ISR):
      - disable receiving new interrupts from this source
      - Pause L1 FW
      - Read all FPGA FIFO Status
      - increment software error counters
      - For all FPGA's with errors
          - clear errors
          - read FIFO Address Pointer Register until FIFO has not been
            in reset for 100us
          - verify error registers remain cleared
      - wait for 1 second
      - Issue SCL Initialize
      - Wait for all Front-Ends to acknowledge
      - Drop SCL Initialize
      - Resume L1 FW
      - enable receiving new interrupts from this source
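
The ordering of the service routine above can be sketched in software
as follows. This is a minimal illustration under stated assumptions,
not the actual TCC code: FakeTRM, FakeSystem, fifo_in_reset() and
recover_from_fifo_error() are all hypothetical names, the system-level
actions are only logged rather than performed, and "has not been in
reset for 100us" is interpreted here as two consecutive out-of-reset
reads spaced 100 us apart. Which firmware gets paused is left generic,
since the list was extrapolated from the L1 ISR.

```python
import time
from typing import Callable, Dict, List

RESET_SETTLE_S = 100e-6   # 100 us, from the notes
POST_CLEAR_WAIT_S = 1.0   # "wait for 1 second"


class FakeTRM:
    """In-memory stand-in for one TRM FPGA; every method is hypothetical."""

    def __init__(self, name: str, status: int = 0, reset_polls: int = 0):
        self.name = name
        self.status = status            # nonzero: a FIFO error is latched
        self.reset_polls = reset_polls  # polls that still read "in reset"

    def read_fifo_status(self) -> int:
        return self.status

    def clear_errors(self) -> None:
        self.status = 0

    def fifo_in_reset(self) -> bool:
        # Stand-in for decoding the FIFO Address Pointer Register.
        if self.reset_polls > 0:
            self.reset_polls -= 1
            return True
        return False


class FakeSystem:
    """Records the order of system-level actions instead of doing them."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def do(self, action: str) -> None:
        self.log.append(action)


def wait_out_of_reset(trm: FakeTRM, sleep: Callable[[float], None],
                      max_polls: int = 1000) -> None:
    """Return once two reads spaced RESET_SETTLE_S apart both show no reset."""
    for _ in range(max_polls):
        first = trm.fifo_in_reset()
        sleep(RESET_SETTLE_S)
        if not first and not trm.fifo_in_reset():
            return
    raise TimeoutError(f"{trm.name}: FIFO never left reset")


def recover_from_fifo_error(trms: List[FakeTRM], system: FakeSystem,
                            counters: Dict[str, int],
                            sleep: Callable[[float], None] = time.sleep) -> dict:
    system.do("disable interrupts")
    system.do("pause FW")
    bad = [t for t in trms if t.read_fifo_status() != 0]
    for t in bad:
        counters[t.name] = counters.get(t.name, 0) + 1  # software error counters
    still_bad = []
    for t in bad:
        t.clear_errors()
        wait_out_of_reset(t, sleep)
        if t.read_fifo_status() != 0:   # verify errors remain cleared
            still_bad.append(t.name)
    sleep(POST_CLEAR_WAIT_S)
    system.do("issue SCL Initialize")
    system.do("wait for Front-End acknowledge")
    system.do("drop SCL Initialize")
    system.do("resume FW")
    system.do("enable interrupts")
    return {"had_errors": [t.name for t in bad], "still_bad": still_bad}


if __name__ == "__main__":
    trms = [FakeTRM("trm0", status=2, reset_polls=2), FakeTRM("trm1")]
    system, counters = FakeSystem(), {}
    print(recover_from_fifo_error(trms, system, counters, sleep=lambda s: None))
    print(system.log)
```

The sleep function is passed in so the sequence can be exercised
without real delays. A real routine would also need to decide what to
do when an error does not stay cleared; the sketch only reports it.
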