D-Zero Hall L1 Framework and L1 Calorimeter Trigger Logbook
---------------------------------------------------------------

Log book for 1993 is in D0_HALL_LOGBOOK.LBK_1993

Date:             At:             Topics:

..............................................................................

Date: 30-DEC-1994   At: MSU   Topics: TCC Problem

Jan (just back from vacation) called. They were trying a new COOR (which was unrelated to this problem, but caused a request for an initialize). After the initialize, TRGMON stayed stale, with no sptrg #31 and no lights on the L1FW. This is identical to 25-MAY-1994 and 12-JUL-1994, and probably 13-JUN-1994.

Trying to do a directory on TCC produces the error message:

   %DIRECT-E-OPENIN, error opening D0HTCC::DUA0:[TRIGGER]TRICS_*.LOG; as input
   -RMS-F-NET, network operation failed at remote node; DAP code = 01F77C54

At some point TRICS had a problem writing to its logfile and switched mode to fly with NO logfile. But TCC still tries to get its input files (init_auxi, reset, ...) from the disk.

There was no particular rush today, so Philippe tried to see if he could find a possible "emergency recovery" action (in case this happens in the middle of a run and we don't want to lose scaler information). Philippe changed the variable holding the location of the TRICS command files, then told TCC to initialize. It did well with the L1FW (lights flashed, and sptrg #31 appeared), but the initialization got into trouble when it reached the L1.5CT and the load from_local_disk command. The initialization seemed to hang, but was just slow, as it had to time out on each of the 12 EXE file OPENs.

Reboot TCC and everything looks normal. There was an access after that.

The disk loss probably occurred around 6 AM (last entry in MPOOL_SERVER.LOG). TCC didn't try to reach its disk (after giving up on writing logfiles), and there were no problems or symptoms for COOR as long as no request for an initialization, or for a begin/end run file, was sent.

For some of the earlier entries, we heard that there had been other network problems at DZero, and Jan had to reboot a bunch of L2 nodes around the same time. This time that did not appear to be the case; at least there were no entries in the logbooks...

..............................................................................

Date: 23-24-DEC-1994   At: DZero
Topics: Deliver the cost estimates and descriptions of the Run II equipment to Jim and have a meeting with him about all of this, run some Cal Trig Random Cell tests.

Deliver the Dec94 upgrade description and cost estimate to Jim Christenson and talk with him about it for 15 or 20 minutes. Mail all 5 files to Mike Tuts at Mike's request.

Between stores, run some tests of the L1 Cal Trigger.

Setting up with eta 1:20 and all options except checking the Framework terms, we twice start seeing errors after about 4k loops of CalTrig_Random. All of the errors are Px or Py sums off by 4096. Note that the system was not locked on LU page #4 as it was last week. If the same loop was asked for again, the same error always occurred again.

   S-HTT/PAR%rand% Loop 1000/900000, Error Count is 0
   S-HTT/PAR%rand% Loop 2000/900000, Error Count is 0
   S-HTT/PAR%rand% Loop 3000/900000, Error Count is 0
   S-HTT/PAR%rand% Loop 4000/900000, Error Count is 0
   %% time: 23-DEC-1994 14:47:49.10
   Global Py Momentum Sum is -490 instead of -4586, T1 trunc = 4096
   Pick was 218,HD,POS,E_20,P_5,LUP_8-2-7-7,EMET_REF,REF_0,244,CMP_3
   Loop 4329/900000, Error Count 1. Continue? Same Loop? ReSynch?
   Global Py Momentum Sum is -589 instead of -4685, T1 trunc = 4096
   Pick was 195,HD,POS,E_11,P_28,LUP_2-8-7-7,TOTET_REF,REF_3,128,CMP_3
   Loop 4356/900000, Error Count 2. Continue? Same Loop? ReSynch?
   Global Py Momentum Sum is -999 instead of 3097, T1 trunc = 0
   Pick was 86,HD,NEG,E_13,P_13,LUP_7-8-7-7,TOTET_REF,REF_1,78,CMP_2
   Loop 4427/900000, Error Count 3. Continue? Same Loop? ReSynch?

Start over and try again.
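The three errors in the first attempt above share one feature: the reported sum differs from the expected sum by exactly 4096 = 2^12. The short Python sketch below only redoes that subtraction on the values copied from the log; it is not TCC or TRICS code, and the helper name `offset` is made up for illustration.

```python
# Observed vs expected global momentum sums from the first CalTrig_Random
# attempt on 23-DEC-1994 (values copied from the TCC messages above).
first_attempt = [
    # (component, observed sum, expected sum, logged "T1 trunc")
    ("Py", -490, -4586, 4096),
    ("Py", -589, -4685, 4096),
    ("Py", -999, 3097, 0),
]


def offset(observed: int, expected: int) -> int:
    """Signed difference between the reported sum and the expected sum."""
    return observed - expected


for component, observed, expected, t1_trunc in first_attempt:
    d = offset(observed, expected)
    # 1 << 12 == 4096, the weight of bit 12 of the sum.
    print(f"{component}: {observed:+6d} vs {expected:+6d} -> offset {d:+d} "
          f"(|offset| == 2**12: {abs(d) == 1 << 12}, T1 trunc = {t1_trunc})")
```

In these three cases the +4096 offsets appear together with "T1 trunc = 4096" and the -4096 offset with "T1 trunc = 0"; the same pairing holds for the errors of the second attempt. The arithmetic alone does not say which part of the hardware is responsible.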

   S-HTT/PAR%rand% Loop 1000/900000, Error Count is 0
   S-HTT/PAR%rand% Loop 2000/900000, Error Count is 0
   S-HTT/PAR%rand% Loop 3000/900000, Error Count is 0
   S-HTT/PAR%rand% Loop 4000/900000, Error Count is 0
   %% time: 23-DEC-1994 14:51:59.89
   Global Px Momentum Sum is -353 instead of -4449, T1 trunc = 4096
   Pick was 240,EM,NEG,E_17,P_8,LUP_7-3-7-7,EMET_REF,REF_2,112,CMP_3
   Loop 4231/900000, Error Count 1. Continue? Same Loop? ReSynch?
   Global Py Momentum Sum is -243 instead of -4339, T1 trunc = 4096
   Pick was 67,HD,NEG,E_8,P_3,LUP_5-4-7-7,HDET_VETO,REF_0,240,CMP_1
   Loop 4444/900000, Error Count 2. Continue? Same Loop? ReSynch?
   Global Py Momentum Sum is -67 instead of 4029, T1 trunc = 0
   Pick was 169,EM,POS,E_1,P_22,LUP_8-5-7-7,HDET_VETO,REF_0,36,CMP_3
   Loop 4533/900000, Error Count 3. Continue? Same Loop? ReSynch?
   Global Px Momentum Sum is 544 instead of 4640, T1 trunc = 0
   Pick was 188,HD,POS,E_19,P_31,LUP_6-2-7-7,EMET_REF,REF_0,192,CMP_3
   Loop 4582/900000, Error Count 4. Continue? Same Loop? ReSynch?
   Global Px Momentum Sum is 2027 instead of -2069, T1 trunc = 4096
   Global Py Momentum Sum is 619 instead of -3477, T1 trunc = 4096
   Pick was 201,HD,NEG,E_15,P_12,LUP_3-7-7-7,TOTET_REF,REF_2,191,CMP_1
   Loop 4880/900000, Error Count 5. Continue? Same Loop? ReSynch?

Note that these errors are "Global Px Momentum Sum" and "Global Py Momentum Sum" errors. Last week we saw "Cell Px" errors. What goes on at high eta that is funny and can cause errors of 4096 when you allow the LU page to move around??

Give up on the above attack and try working in just eta 1:16. Setting up with eta 1:16 and all options except checking the Framework terms, we run 600k loops of CalTrig_Random with zero errors.

Now try locking on LU page #4. Setting up with eta 1:20 and all options except checking the Framework terms, and locked on LU page #4, we twice run 100k loops of CalTrig_Random with zero errors:

   S-HTT/PAR%rand% Loop 100000/100000, Error Count is 0
   S-HTT/PAR%rand% Loop 100000/100000, Error Count is 0

..............................................................................

Date: 15-16-DEC-1994   At: DZero
Topics: Fix CTFE ref supply at eta=-9:-12 phi=16, reset pedestals for this CTFE, run random test (errors in Px card), install a timing marker cable in M102, reboot TCC, ATLAS meetings, ECB meeting about central detectors, talk with Mike Matulik and Marvin Johnson.

Investigate the pedestal drift problem on the CTFE card at eta=-9:-12 phi=16. The problem is traced back to a ceramic bypass cap in the distribution network for the -1 V ref supply to the ADC for the channel at eta = -12. This card, SN#326, was fixed and put back into the system. The bypass cap that shorted was on U294. After returning to the old Init_DAC_Bytes.LSM file, the pedestals for this card are now around 10 ADC counts, so this bad capacitor must have been sick for a while. Run find_dac on this card to find the correct new values and update Init_DAC_Bytes.LSM. Copy the new Init_DAC_Bytes.LSM file to MSU. The "special" Init_DAC_Bytes.LSM file that had been used for one or two days to compensate for the CTFE "drift" was deleted.

Run 100,000 loops of the random test on eta 1:16 and all phis, all pages, all reference sets, including the large tile And-Or term test in the L1 FW.
--> No error detected.

Run loops of the random test on the full eta 1:20 coverage, limited to lookup page #4 (and without checking the And-Or terms). A series of errors appeared, all centered around the Tier #1 Px card in crate +17:20 ; 25:32.

   E-HRD/TST%rand% Cell Px =2620, CAT2 Thrsh CMP_0 =2620 but comp bit#5:3 =101
   %% time: 15-DEC-1994 19:24:38.94
   E-HRD/TST%rand% Pick was 209,EM,POS,E_18,P_31,LUP_4-4-4-4,TOTET_REF,REF_3,108,CMP_0
   %% time: 15-DEC-1994 19:24:39.05
   P-HTT/PAR%rand% Loop 17193/100000, Error Count 1. Continue? Same Loop? ReSynch?
   %% time: 15-DEC-1994 19:24:39.15
   (error not repeated by redoing same loop)

   E-HRD/TST%rand% Cell Px =2681, CAT2 Thrsh CMP_0 =2681 but comp bit#5:3 =101
   %% time: 15-DEC-1994 19:28:17.62
   E-HRD/TST%rand% Cell Px =2681, CAT2 Thrsh CMP_1 =2582 but comp bit#5:3 =101
   %% time: 15-DEC-1994 19:28:17.72
   E-HRD/TST%rand% Cell Px =2681, CAT2 Thrsh CMP_2 =2587 but comp bit#5:3 =101
   %% time: 15-DEC-1994 19:28:17.83
   E-HRD/TST%rand% Cell Px =2681, CAT2 Thrsh CMP_3 =2575 but comp bit#5:3 =101
   %% time: 15-DEC-1994 19:28:17.93
   E-HRD/TST%rand% Pick was 238,EM,POS,E_18,P_32,LUP_4-4-4-4,EMET_REF,REF_3,8,CMP_0
   %% time: 15-DEC-1994 19:28:18.03
   P-HTT/PAR%rand% Loop 18423/100000, Error Count 2. Continue? Same Loop? ReSynch?
   %% time: 15-DEC-1994 19:28:18.13
   (error not repeated by redoing same loop)

   E-HRD/TST%rand% Cell Px =2612, CAT2 Thrsh CMP_3 =2612 but comp bit#5:3 =101
   %% time: 15-DEC-1994 19:33:44.57
   E-HRD/TST%rand% Pick was 39,EM,POS,E_19,P_32,LUP_4-4-4-4,EMET_REF,REF_1,31,CMP_3
   %% time: 15-DEC-1994 19:33:44.68
   P-HTT/PAR%rand% Loop 19326/100000, Error Count 3. Continue? Same Loop? ReSynch?
   %% time: 15-DEC-1994 19:33:44.77
   (error repeated 4 times by redoing same loop, then not repeated after 6 more)

   E-HRD/TST%rand% Cell Px =2612, CAT2 Thrsh CMP_0 =2597 but comp bit#5:3 =101
   %% time: 15-DEC-1994 19:34:39.80
   E-HRD/TST%rand% Cell Px =2612, CAT2 Thrsh CMP_1 =2612 but comp bit#5:3 =101
   %% time: 15-DEC-1994 19:34:39.90
   E-HRD/TST%rand% Cell Px =2612, CAT2 Thrsh CMP_2 =2602 but comp bit#5:3 =101
   %% time: 15-DEC-1994 19:34:40.01
   E-HRD/TST%rand% Cell Px =2612, CAT2 Thrsh CMP_3 =2587 but comp bit#5:3 =101
   %% time: 15-DEC-1994 19:34:40.11
   E-HRD/TST%rand% Pick was 143,HD,POS,E_18,P_27,LUP_4-4-4-4,HDET_VETO,REF_3,182,CMP_1
   %% time: 15-DEC-1994 19:34:40.22
   P-HTT/PAR%rand% Loop 20777/100000, Error Count 4. Continue? Same Loop? ReSynch?
   %% time: 15-DEC-1994 19:34:40.32
   (we did not try to repeat this one)

In all these cases, the card claims that its sum is "smaller" than the comparator threshold, when it should say "equal" or "greater". The front LED matched what TCC read out (LEDs off). Dan shoved on the card (which "didn't move"). We then had time to run another 20,000 loops, and there were no more errors.

Dan installed a timing marker cable for K. Johns. This is a LEMO cable sticking out of the door of rack M102. It is plugged into a CTMBD for the Start Digitization DIGIMEM backplane of this rack, and shows the IML Latch Clock signal, which is MTG TSS #2 and CTMBD monitor signal #F. We were able to install this cable without turning the power off: we opened the front door of M102 only about 5", with Steve carefully watching the air flow sensor, and plugged the cable into the CTMBD. We checked this signal against the BC T0 timing reference from Carmen's Master Clock and all looked fine.

TCC was rebooted before it was returned to the shifters for the next quiet time.

ATLAS meeting at Argonne with a video link to UCI on the 15th. Meeting on the 16th with a failed video link to CERN.

Talk with Mike Matulik about SLIM and give him the first written information about SLIM. Talk with Marvin about Mike working on SLIM, about using SAR in our new L1 Run II equipment, about FPGA's, and about VHDL for the Run II equipment.

..............................................................................

Date: 14-DEC-1994   At: MSU   Topics: Edit the Init_DAC_Bytes.LSM file

Edit the Init_DAC_Bytes.LSM file to compensate for all channels on the CTFE card at eta -9:-12 phi 16 having pedestals that have drifted down from 8 to 4. The following changes were made:

   -9,16 EM    move DAC pedestal from 38 to 49.
   -9,16 HD    move DAC pedestal from 38 to 49.
   -10,16 EM   move DAC pedestal from 33 to 44.
   -10,16 HD   move DAC pedestal from 30 to 41.
   -11,16 EM   move DAC pedestal from 27 to 38.
   -11,16 HD   move DAC pedestal from 38 to 49.
   -12,16 EM   move DAC pedestal from 37 to 48.
   -12,16 HD   move DAC pedestal from 43 to 54 (can't work with 54, so 49).

In TrgCur at Fermi there are now two Init_DAC_Bytes.LSM files. Ver 3 is the one that was in use up until tonight. Ver 6 is the temporary one that we will use until this CTFE is fixed. Only the ver 6 file is in D0HTCC::DUA0:[Trigger]. This temporary ver 6 file was NOT copied to MSU.

For the -12,16 HD TT I wanted to use a DAC pedestal value of 54, but the TRICS READ_LOAD DAC Pedestal file function just said "Bad_Failure" when I had this big a value in the pedestal file. So I moved 54 to 49. Compare_DAC_Byte.Exe would also not run with the value 54 in the file.

The conversion slope of this CTFE looks OK, i.e. when I load 255 in the pedestal DAC the ADC reads something like 78 or 80. So for now I assume that it is just a "drift" in the offset reference supply and that the +/- 1 Volt ADC supplies are still operating OK.

..............................................................................

Date: 12-DEC-1994   At: MSU/
Topics: "Test-load" EM_Fraction L1.5 CT Fermi DSP code, calls from John Butler with questions about the End of Run files.

EM_Fraction (Tool #3) DSP code was test-loaded between stores #5270 and #5271. The configuration was:

   Term     EM Et       1x2 Isolation   EM_Fraction   Global       L1 Spec
   Number   Threshold   Threshold       Threshold     Cnt Thresh   Triggers
   ------   ---------   -------------   -----------   ----------   --------
     0      3.0 GeV         0.1             0.1           1         1 2
     1      3.0 GeV         0.2             0.1           1         3

i.e. a "goof-off" or testing configuration which has nothing to do with global running. The typical configuration file errors were found and fixed. Bill Cobau made the configuration files. Dan Owen collected ~200 "noise" events using this configuration (with Pass_one_of (100)). This code was removed before global running on store #5271 began.

Steve verified (via READLOG) that the global run for store #5271 used the correct (old) L1.5 Cal Trig code, default parameters, and COOR parameters, even though the Framework had been initialized while Steve was swapping files on D0HTCC to return to the old L1.5 CT operation. TRGMON also showed rates and programming consistent with the old operation.

END of RUN File Questions

In the evening, Captain J. Butler calls Philippe about problems with the end of run summary for run #86834 (N. Amos couldn't be reached, and we were the last ones to mess with the trigger). Philippe looks in TCC's logfile and sees nothing unusual. The sequence of COOR messages was:

   WRT_HOST BEG_RUN  LOGGER$BRD:TCC_BEGIN_0086834.INFO       %% time: 16:49:24.90
   WRT_HOST PAUS_RUN LOGGER$BRD:TCC_PAUSE_0086834_081.INFO   %% time: 17:30:03.46
   WRT_HOST PAUS_RUN LOGGER$BRD:TCC_PAUSE_0086834_082.INFO   %% time: 17:30:06.84
   WRT_HOST RESU_RUN LOGGER$BRD:TCC_RESUM_0086834_036.INFO   %% time: 17:32:26.59
   WRT_HOST END_RUN  LOGGER$BRD:TCC_END_0086834.INFO         %% time: 17:43:25.85

Note that the pause and resume sub-numbers seem to follow sequences that are not reset with the run number and are independent of each other; looking in LOGGER$BRD, this seems to be standard. There are also two PAUSEs for one RESUME (COOR only knows why!), but this shouldn't be a problem.

The run lasted just under 1 hour (54 min). Philippe found the Begin and End Run files in COPYCFG$ARCHIVE, and checked that the beam crossing numbers didn't roll over. Computing the number of beam crossings elapsed gives 911 E6, which roughly matches 286.3 * 60 * 54. Computing the livetime (enabled sptrg#30 / beam X) gives 91.3%, which is typical, unlike the reading of >98% that Butler had, which propagated to other numbers in the run summary.

Philippe told John B. that there was nothing really wrong and that they should indeed go after Norm to recover this good run. The following run gave a more normal run summary.
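The beam-crossing cross-check for run #86834 can be sketched in a few lines of Python. This assumes that "286.3" is the beam-crossing rate in kHz and that "54" is the BEG_RUN to END_RUN time in minutes; the constant and function names are hypothetical. Note that this wall time includes the roughly 2.4 minute pause between 17:30:03 and 17:32:26, and the log does not say whether the crossing count kept running during the pause.

```python
from datetime import datetime

# Assumption: "286.3" in the logbook check is the beam-crossing rate in kHz.
CROSSING_RATE_HZ = 286.3e3


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two COOR message times on the same day (HH:MM:SS.ss)."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# BEG_RUN and END_RUN times for run #86834, copied from the COOR messages above.
run_seconds = elapsed_seconds("16:49:24.90", "17:43:25.85")

expected_crossings = CROSSING_RATE_HZ * 60 * 54   # the logbook's 286.3 * 60 * 54
measured_crossings = 911e6                        # from the Begin/End Run files
rel_diff = abs(expected_crossings - measured_crossings) / measured_crossings

# Livetime = enabled sptrg#30 (live beam X) count / beam crossings, so the
# logged 91.3% implies roughly 0.913 * 911 E6 live crossings.
implied_live_crossings = 0.913 * measured_crossings

print(f"BEG_RUN to END_RUN: {run_seconds / 60:.1f} min")
print(f"expected {expected_crossings / 1e6:.1f} E6 crossings, "
      f"measured {measured_crossings / 1e6:.0f} E6 ({rel_diff:.1%} apart)")
print(f"implied live crossings: {implied_live_crossings / 1e6:.1f} E6")
```

With these assumptions the expected count is about 927.6 E6 against the measured 911 E6, so the two agree to within about 2%.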
..............................................................................

Date: 2-DEC-1994   At: MSU
Topics: MTG PROM files: verify current versions, delete old Timing_Specification_Files, need to investigate the problem with the FE_Busy_4A TSF file, copy some important information from the paper log book so that we have it at MSU or D0.

The following is a list of the MTG PROM file versions that should now be running in the system at D-Zero:

   MTG Ch's   L1 Cal Trig MTG   Direct-In-Test-Trig MTG   ERPB MTG
   --------   ---------------   -----------------------   --------
     1-8            1M                    1K                 1C
     9-16           2L                    1K                 2A
    17-24           3K                    1K                 2A
    25-32           4M                    1K                 2A

   MTG Ch's   FE-Busy MTG   L1 Framework MTG   Hold Transfer MTG
   --------   -----------   ----------------   -----------------
     1-8           -               1R                 1L
     9-16          -               2L                 1L
    17-24          -               3M                 1L
    25-32          4A              4L                 4L

   MTG Ch's   L15 FW Control MTG   L15 FW Receive MTG   L15 Veto Conf MTG
   --------   ------------------   ------------------   -----------------
     1-8             1A                   1B                   1B
     9-16            2A                   1B                   1B
    17-24            3A                   1B                   1B
    25-32            4A                   1B                   1B

   MTG Ch's   Start Digitize MTG
   --------   ------------------
     1-8             1L
     9-16            1L
    17-24            1L
    25-32            4M

Today the following old versions of Timing_Specification_Files were deleted both at MSU and Fermi:

   L1 Cal Trig files:   1L, 2K, 4L
   ERPB MTG files:      1A, 1B

Note that L1 Framework MTG file 3N is being kept in case we need to run the COMINT with only 6 clocks between beam crossings.

There appears to be a problem with the FE_Busy_MTG_PROM_4_SN_4A Timing_Spec file. At the very least the signal names in this TSF file are wrong, and it appears that the actual up-down tick times are also wrong. Figure out what is wrong, fix this file, and check it against the running part. See page #20 of the L1 & L15 Framework paper log book #2. I think that this signal should be up for two ticks starting with tick 108.

Information from L1 L15 Framework Log Book #2
---------------------------------------------

The following is the list of signals on the two M101 <--> M114 cables.

M101 <--> M114 Cable Number 1   (log book page #18)
--------------------------------------------------

   Signal Pair   Function
   -----------   ---------------------------------------------------
        1        Data Block Builder Busy   \
        2        68k Prepare Data           |
        3        68k Display State          |  signals to the scalers in M101
        4        Wait Slave Ready           |
        5        VBD Run DMA List           |
        6        Wait Find VBD Buffer      /
        7        L15 Stretch signal, L15 Control to the M114 MTG's
        8        L15 Stretch signal, L15 Control to the M114 MTG's
        9        L15 Potential   \
       10        L15 Skip         |  signals to the scalers in the bottom of M114
       11        L15 Cycle       /
       12        NC
       13        NC
       14        NC
       15        Special I/O MTG Ch #25 Output "L15 Operational" to And-Or Term #110.
       16        NC
       17        NC

M101 <--> M114 Cable Number 2   (log book page #17)
-----------------------------------------------------

   Signal Pair   Function
   -----------   ---------------------------------------------------
        1        Spec Trig #30 FSTD Output Live Beam X Clock to M114 Live Beam X Scalers.
        2        Spec Trig #30 FSTD Output Live Beam X Clock to M114 Live Beam X Scalers.
        3        NC
        4        NC
        5        NC
        6        NC
        7        NC
        8        AND-OR Input Term #71   L0_Slow_Inter      from L0 to the AND-OR Network
        9        AND-OR Input Term #72   L0_Slow_Z_Good     from L0 to the AND-OR Network
       10        AND-OR Input Term #73   L0_MI_Flag_0       from L0 to the AND-OR Network
       11        AND-OR Input Term #74   L0_MI_Flag_1       from L0 to the AND-OR Network
       12        AND-OR Input Term #75   L0_MI_Flag_2       from L0 to the AND-OR Network
       13        AND-OR Input Term #76   L0_MI_Flag_3       from L0 to the AND-OR Network
       14        AND-OR Input Term #77   L0_Slow_Z_Center   from L0 to the AND-OR Network
       15        NC
       16        NC
       17        L0 Direct-In-Test-Trigger

Front connections to the Framework Main Timing MTG   (log book page #21)
-----------------------------------------------------------------------

   Term = B2 = B7 x B8 x B9 x B13 x B14 = B15   <-- Level 15 Stretch
   Term = E14 = E15                             <-- Latched Global Specific Trigger Fired
   Term = B5                                    <-- COMINT Write A/B Control
   Clear Most Recent BAR <-- B4 x Term = E5     <-- Clear Most Recent
   Term = E4                                    <-- Front-End Busy BAR
   Term = B3                                    <-- COMINT Read A/B Control
   Latched Global Specific Trig Fired --> E17 = E18 = E19 = E20 = E21 = Term
   Level 15 Stretch --> B17 = B18 x B19 = B20 = B21 = Term = E24

   Input   Signal
   -----   -------------------------------------
   B3      COMINT Read A/B Control
   B4      Clear Most Recent BAR output
   E4      Front-End Busy BAR
   E5      Clear Most Recent input
   B5      COMINT Write A/B Control
   E15     Latched Global Specific Trigger Fired
   B15     Level 15 Stretch
   E17     Latched Global Specific Trigger Fired
   B17     Level 15 Stretch

Front connections to the L1 Calorimeter Trigger MTG   (log book page #21)
------------------------------------------------------------------------

   Input   Signal
   -----   -------------------------------------
   B4      Read A/B Control
   E5      Front-End Busy
   B6      Write A/B Control
   E29     Front-End Busy BAR
   B29     Clear Most Recent BAR

..............................................................................

Date: 29,30-NOV-1994, 1-DEC-1994   At: Fermi
Topics: Water Leak in M103-M104 radiator, look at TT +6,23 EM which has been excluded, DAC Pedestals for TT's +17,15 and +18,15, added more sensors to the RPSS, work on getting Pulser runs for L15CT_PROV.

Solder over the pin hole in the "U" tube in the bottom radiator between M103 and M104. The soldering went well because the radiator was blown out with building air and the area to be soldered was cleaned with both sand paper and a steel brush. This radiator appeared to have a dimple in it right where the pin hole was. This dimple is thought to have been caused during the brazing process by someone who cooked the thin Cu "U" tube too much.

Smoke from the soldering brought the VESDA up to a 4, i.e. just short of an alarm.

The Drip Detector Strip under this radiator had to be taken out and cleaned to get the corrosion off of it. The drip detector function of the RMI was re-enabled. This required turning both the Drip Detector "Sensor Input" and the Drip Detector "Local Alarm" switches back on.

In rack M110 I installed an Air Flow sensor and a 95 deg F Temperature sensor. These are connected to the appropriate inputs of the RPSS. I connected an "RPSS" cable and ran it over to the back of M101 for use with the PhotoHelic differential air pressure gauge. See the new file TrgHard:[RPSS]PhotoHelic_to_RPSS.txt for more details about this. I put the appropriate labels on the RPSS.

Look at TT +6,23 EM, which has been excluded since the Sept shutdown. +6,23 EM is in rack M105, CBus = 1, MBA = 170, CA = 44:45, the "2nd" EM channel on the CTFE card. Its Clock Control Register is FA = 81, and the bit of value 4 controls this EM channel. Well, it still looks noisy on the scope, and running a L1 Cal Trig with a 3 GeV EM Ref Set threshold everywhere you get about 0.6 Hz if +6,23 EM is excluded and about 5 to 6 Hz if +6,23 EM is not excluded. The decision is to leave it excluded. Should we cut the resistor?
We looked at the Examine from this run, and all the noise is coming from eta,phi,depth 11,45,2 in the Calorimeter.

For the last couple of weeks in the Physics run Examines, Trigger Towers Eta +17 and +18 Phi 15, both EM and HD, have looked a little hot or noisy. Looking at TrgMon ADC counts when there is no beam, this just looks like a CTFE pedestal "drift". I play human histogram and by hand make the following changes to Init_DAC_Bytes.LSM:

   +17,15 EM   move DAC pedestal from 34 to 32.
   +17,15 HD   move DAC pedestal from 31 to 29.
   +18,15 EM   move DAC pedestal from 34 to 33.

Copy the new Init_DAC_Bytes.LSM to TCC and have TRICS load it in. Also copy Init_DAC_Bytes.LSM to MSU. Note that this is the second time in the last couple of months that we have had a couple of channels on a single CTFE all "drift" at once. Jan used the compare on the new CALIB (which now looks at L1 Cal Trig data) to verify that she could see these channels move.

Work with Jan and company to get the Pulser runs for L15CT_PROV. Get the LOW amplitude run by going back to the single L1 trigger version of the CONFIG files and using TRICS to tell the L15 FW that the L1 trigger does not require L15CT confirmation. This is in DATA3:[CAL]CALOR_086328_01.X_ZRD01.

OK, finally get a L15CT pulser run at High amplitude. This is in the file DATA3:[CAL]CALOR_086419_01.X_ZRD01. This was also done by hand, using TRICS to turn off the L15 FW so that the Cal Pulser would increment. Still need to get the Config files working.

..............................................................................

Date: 25,26-NOV-1994   At: Fermi
Topics: Air Flow Sensor un-Tied Down, Water Leak in M103-M104 radiator, Joan has a couple of channels for us to look at, RPSS print set.

About 6 AM on the 25th the call finally comes that they have found the water leak. It is showing up in the "Dan Owen drip detector" behind M104. It appears to be controlled by the Mud Flap, so we decide to leave the system running. I start out for Fermi.

At Fermi I look at it while the system is running. All the drips from the end of the mud flap appear to be going onto the drip detector, then into the channel between the racks, and then down between and out of the racks. This is at the radiator between M103 and M104.

During the 4 AM shot setup on the 26th, pull off the shockless system G10 and the bottom mud flap between M103 and M104. The leak is in the very bottom turn-around "U" tube of the bottom radiator, on the M104 side of this radiator, on the bottom surface of the "U" tube. It is about 1/8" into the "U" tube from the butt weld, on the surface of the "U" tube that was stretched when the "U" tube was manufactured. One can see stress marks in this section of the "U" tube. I packed paper towels around this part of the radiator to try to make sure that the "spray" was converted into a "flow", and then reinstalled the mud flaps and the shockless system G10.

During the 10 hour shutdown on the 29th an attempt can be made to solder over this section of the "U" tube. It is not corroded too badly at this time, and the hole is far enough from the butt weld that the area can be cleaned. The above work took 2 hours (i.e. the full time of shot setup). Estimate 4 hours for the solder job. Need a hose splice and compressed air.

Remove the tie down from the air flow sensor at the input to M102.

Did not boot TCC and it came up just fine. Is Philippe's relatively new idea of restarting zeller at TRICS Init time curing the problem of TCC not talking to the CBus's after a long power off time?

For the last couple of weeks Joan says that Trigger Towers Eta +17 and +18 Phi 15, both EM and HD, have looked a little hot or noisy. I wanted to look at their pedestals during the shot setup that just passed but did not have time. We need to look at these two towers.

There does not appear to be an RPSS print set at D-Zero; need to bring one here.

..............................................................................

Date: 23-NOV-1994   At: MSU
Topics: Air Flow Sensor Tied Down, Water Drip Power Trip of L1

   -----> M101 - M102 Air Flow Sensor is still TIED DOWN RPSS Sensor <-----

16:53 CST
All L1 racks power down because the RMI has detected a water drip.

About 18:05 EST
Steve is called at MSU by Marcel and is told that L1 has powered down. He is told that RPSS has detected a water FLOW problem. Steve gives Marcel Dan's home telephone number (and also Steve's home telephone number). Steve tries to call Dan at home but receives no answer.

18:15 EST
Edmunds is called at home by Jan and told that the Detector Shifters and Joan are looking for the water leak. Edmunds suggests places to look for the water. Edmunds is told that he will be called as soon as the search for the leak is complete and that he will be kept informed. Edmunds and Jan discuss the possibility of turning off the drip detector part of the RMI and leaving the rest of RPSS running.

About 18:17 EST
Steve calls Joan in the control room. He is told by Joan that Jan is on the phone with Dan, that Dan has been informed of the current situation, and that Dan has suggested turning off the drip detector. Steve tells Joan that he will be watching at MSU for a while longer. At the conclusion of this short telephone conversation Steve is under the (mistaken) assumption that, since Jan and Dan have been in contact, Dan is "in the loop." Steve does NOT call Dan. Steve has no further contact with the D0 Control Room.

About 17:25 CST
Someone (not Edmunds) takes the decisions:
   There is no water leak.
   It is OK to turn off the RMI drip detector.
   It is OK to power up L1.
This decision was taken in the D0 Control Room with no telephone call to MSU.

17:30 CST
L1 is Initialized. It has a couple of problems.

In a mail message at 19:09 EST Steve reports on the problems:
   (1) after the 1st INITIAL, a CTFE comparator (Total Et Ref Set 0 at -18,21) read back 254 after being programmed to 255.
   (2) after the 1st download, none of the muon Specific Triggers had any And-Or rate.
   (3) after the 2nd download, first Geographic Section 0 was 100% Front-End Busy, then Geo Sect 0-13 (with the exception of 1 and 5) thrashed around a lot.
   I do not understand anything about problem 1. Problems 2 and 3 (with the exception of Geo Sect #0) are standard problems after the L1 FW has been powered off. It was not necessary to power cycle HTCC and its BA23.

In a mail message at 20:04 Jan reports that the RMI drip detector has been disabled, L1 has been powered up, and "We didn't have any problems".

At about 20:30 Edmunds, having never heard anything from anyone, comes to the Physics Dept so that he can use a tube and the phone and not miss any incoming calls. He learns that everything has been running since 15 minutes after the first and only call to him. It is not clear who took the decision that all was OK and power should be turned back on.

As of 21:15 EST there has still been only the original L1 Initialize at 18:30, so nothing has been done to investigate problem #1 reported above about the CTFE comparator (e.g. a second Initial was not done to see if the problem would repeat itself, e.g. was this where the water was spraying?).

Edmunds (having not been contacted except for the 18:15 call, and having stayed off the phone so as not to block any incoming calls) is not aware of any of the details of the water leak investigation or of the decision to turn back on (Lum was about 4.0).

Just two weeks ago we had a near miss with disaster caused by a water leak in a BLS rack. Only because the RMI for that rack repeatedly tripped it off did people finally believe that there might be a problem. Without our L1 RMI we are flying blind. To date we have had zero false alarms from the L1 RMI drip detector, so there is no reason not to take such alarms seriously. None of the people who are familiar with the past history of L1 water leaks were contacted before someone took the decision to turn L1 back on.

..............................................................................

Date: 21-NOV-1994   At: MSU
Topics: Look at L15CT statistics from three more nice stores.

After about 13 hours of continuous running starting from a Lum of about 9.5 E30:

   S-15C/HDL% 68k parked Status ok (Load_Code Interrupt) 20-NOV-1994 00:02:48.48
   S-15C/HDL% 68k never had to Un-Stick the DSPs
   %% time: 20-NOV-1994 00:02:48.55
   S-15C/HDL% 68k never saw any Byte Misalignment Problem in Object Lists
   S-15C/HDL% Reading 68k Run Counters...
   %% time: 20-NOV-1994 00:02:49.75
   S-15C/HDL%...Orbit Master Loops Count = -1197666861
   S-15C/HDL%..."That's Me" With Transfer Count = 1353435
   S-15C/HDL%..."That's Me" NO Transfer Count = 858472
   S-15C/HDL%..."Bystander" With Transfer Count = 3874102
   S-15C/HDL%..."Bystander" NO Transfer Count = 20499323
   S-15C/HDL%..."Mark&Pass" With Transfer Count = 22
   S-15C/HDL%..."Mark&Pass" NO Transfer Count = 0
   S-15C/HDL%..."Un-Stick" With Transfer Count = 0
   S-15C/HDL%..."Un-Stick" NO Transfer Count = 0
   S-15C/HDL% Put all DSPs in Reset, ready for code download
   %% time: 20-NOV-1994 00:02:50.54

Statistics:
   Number of MFP events: 22
   Number of "Un-Stick" GDSP from Step D3: 0
   Total Number of events processed by L15CT: 2,211,907
   % of events processed by L15CT and NOT Transferred up to L2: 38.8%

After about 14 hours of continuous running starting from a Lum of about 9.5 E30:

   S-15C/HDL% 68k parked Status ok (Load_Code Interrupt) 20-NOV-1994 17:03:04.70
   S-15C/HDL% 68k never had to Un-Stick the DSPs
   %% time: 20-NOV-1994 17:03:04.77
   S-15C/HDL% 68k never saw any Byte Misalignment Problem in Object Lists
   S-15C/HDL% Reading 68k Run Counters...
   %% time: 20-NOV-1994 17:03:05.97
   S-15C/HDL%...Orbit Master Loops Count = -976982064
   S-15C/HDL%..."That's Me" With Transfer Count = 1450241
   S-15C/HDL%..."That's Me" NO Transfer Count = 901330
   S-15C/HDL%..."Bystander" With Transfer Count = 4367174
   S-15C/HDL%..."Bystander" NO Transfer Count = 21556732
   S-15C/HDL%..."Mark&Pass" With Transfer Count = 23
   S-15C/HDL%..."Mark&Pass" NO Transfer Count = 0
   S-15C/HDL%..."Un-Stick" With Transfer Count = 0
   S-15C/HDL%..."Un-Stick" NO Transfer Count = 0
   S-15C/HDL% Put all DSPs in Reset, ready for code download
   %% time: 20-NOV-1994 17:03:06.76

Statistics:
   Number of MFP events: 23
   Number of "Un-Stick" GDSP from Step D3: 0
   Total Number of events processed by L15CT: 2,351,571
   % of events processed by L15CT and NOT Transferred up to L2: 38.3%

After about 15 hours of continuous running starting from a Lum of about 9.5 E30:

   S-15C/HDL% 68k parked Status ok (Load_Code Interrupt) 21-NOV-1994 11:07:51.09
   E-15C/HDL% 68k Last Un-Stick Action was for a problem at %X 000000D3...
   E-15C/HDL%...Local DSP A2=%XB3FF801F A3=%XB3FF801F A4=%XB3FF801F A1=%XB3FF801F
   E-15C/HDL%...Local DSP B3=%XB3FF801F B4=%XB3FF801F B1=%XB3FF801F
   E-15C/HDL%...Local DSP C2=%XB3FF801F C3=%XB3FF801F C4=%XB3FF801F C1=%XB3FF801F
   E-15C/HDL%...Global DSP B2=%XB327000F
   %% time: 21-NOV-1994 11:07:51.56
   E-15C/HDL%...a DSP not at D0 Un-Stick Count = 0
   E-15C/HDL%...Global not at D3 Un-Stick Count = 1
   E-15C/HDL%...a DSP not at D15 Un-Stick Count = 0
   S-15C/HDL% 68k never saw any Byte Misalignment Problem in Object Lists
   S-15C/HDL% Reading 68k Run Counters...
%% time: 21-NOV-1994 11:07:52.98
S-15C/HDL%...Orbit Master Loops Count = -675499484
S-15C/HDL%..."That's Me" With Transfer Count = 1446763
S-15C/HDL%..."That's Me" NO Transfer Count = 846762
S-15C/HDL%..."Bystander" With Transfer Count = 5039252
S-15C/HDL%..."Bystander" NO Transfer Count = 23282802
S-15C/HDL%..."Mark&Pass" With Transfer Count = 22
S-15C/HDL%..."Mark&Pass" NO Transfer Count = 0
S-15C/HDL%..."Un-Stick" With Transfer Count = 0
S-15C/HDL%..."Un-Stick" NO Transfer Count = 1
S-15C/HDL% Put all DSPs in Reset, ready for code download
%% time: 21-NOV-1994 11:07:53.77

Statistics:
   Number of MFP events:                                          22
   Number of "Un-Stick" GDSP from Step D3:                        1
   Total Number of events processed by L15CT:                     2,293,525
   % of events processed by L15CT and NOT Transferred up to L2:   36.9%

All 11 of the LDSP's show, in the "# of Obj Found" field of their Status Longwords, that they overflowed their Object Lists, i.e. found 9 or more objects. The Status Longword from the GDSP shows $27 in the "Terms Answers" byte. I understand the "7" part but not the "2" part.

..............................................................................
Date: 18,19-NOV-1994     At: D0 Hall
Topics: Air Flow Sensor Tied Down, Bring more de-H-ed MBD's to D0 Hall, Electronics Board meeting for Muon, Inventory of CTMBD's, Find out how PhotoHelic Limit Switches work, L15CT Pulser Run Config file, L15CT 68k_Ser Counters for a long run.

-----> M101 - M102 Air Flow Sensor is still TIED DOWN RPSS Sensor <-----

Bring more de-H-ed MBD's and CTMBD's to D0 Hall

Bring MBD's SN#7 and SN#17 back to D0 after de-H-ing them at MSU. Bring CTMBD SN#14 back to D0 Hall after de-H-ing it at MSU. CTMBD SN#14 had been in use in M109 Tier 2. When we started the Data Block Builder reading LTCC cards, CTMBD SN#14 had a problem with the data bit of value 4. The problem appeared only on the first read, i.e. the first read after this CTMBD recognized its MBA. CTMBD SN#14 has had its 10H101's removed from the bus driver section.
See the log book entries from 3 and 10 FEB-1994 for more details about this CTMBD.

There are now 4 CTMBD's and 2 MBD's in the spares cabinet at D0 Hall.

Other circuit board and 10H101 considerations:
How many of the CTMBD's currently in use in Tier 1, Tier 2, Tier 3, and other locations have not had their 10H101's pulled?
Are there any other cards in use that still have 10H101's, e.g. the TLM's in the top of M102?
Are there other cards still at NWA that we should recover to de-H them?

          Number of
   Rack   CTMBD's     Functions
   ----   ---------   ----------------------------------------------------------
   M101       1       L1 FW Timing Signals to And-Or Input Terms, bottom card
   M102       2       Spec Trig Fired - Start Digitize Backplane, FSTD Backplane
   M103       4       L1CT Final Readout, L15 Framework, two Tier 1's
   M104       2       two Tier 1's
   M105       3       two Tier 1's and one Tier 2
   M106       2       two Tier 1's
   M107       3       two Tier 1's and one Tier 3
   M108       2       two Tier 1's
   M109       3       two Tier 1's and one Tier 2
   M110       2       two Tier 1's
   M111       3       two Tier 1's and one Tier 2
   M112       2       two Tier 1's
   M114     + 1       lower M114 backplane DBSC's Foreign scalers
            -------
              30      number of CTMBD's in use at D0,
                      plus 4 spare at D0, plus 1 in MSU Test Rack.  ---> 35 total ??

The only non-H CTMBD's appear to be:
   - both cards in M102,
   - the Tier 2 card in M109,
   - the 4 spares at D0 Hall,
   - the card in the MSU Test Rack.
This leaves 27 CTMBD's that still have 10H101's. The CTMBD in the MSU Test Rack is a mess and can only be used there.

Is it OK to pull just the 10H101's from the bus driver section? Or does one also need to replace the parts in the LED and Lemo driver sections? Replacing the parts in the CTMBD's may make sense because in principle these cards will need to continue into Run II. Replacing the parts in the TLM's in the top of M102 may also make sense. A mistake in TAS number requires a Data Cable resync and wastes a lot of time. Perhaps we can do something during the February shutdown.
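The inventory arithmetic above can be rechecked with a few lines of Python. This is only a sketch. The rack counts and the list of cards already without 10H101's are copied from this entry, and the variable names are made up for illustration.

```python
# Recheck of the 18,19-NOV-1994 CTMBD inventory (counts copied from the table).
ctmbd_per_rack = {
    "M101": 1, "M102": 2, "M103": 4, "M104": 2, "M105": 3,
    "M106": 2, "M107": 3, "M108": 2, "M109": 3, "M110": 2,
    "M111": 3, "M112": 2, "M114": 1,
}

in_use = sum(ctmbd_per_rack.values())
spares_at_d0 = 4
msu_test_rack = 1
total = in_use + spares_at_d0 + msu_test_rack

# Cards believed to be already free of 10H101's: both cards in M102,
# the Tier 2 card in M109, the spares at D0 Hall, and the MSU Test Rack card.
already_de_h = 2 + 1 + spares_at_d0 + msu_test_rack
still_have_10h101 = total - already_de_h

print(f"in use: {in_use}, total: {total}, still with 10H101's: {still_have_10h101}")
```

This only confirms that the numbers in the entry add up (30 in use, 35 total, 27 still with 10H101's). It says nothing about whether the rack list itself is complete.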
PhotoHelic Differential Air Pressure Sensor

      B   A         Layout of pins from the rear view of the
    C   H   F       AMP Hex connector on the PhotoHelic gauge
      D   E

Connections:   Upper Limit Switch:   E,F
               Lower Limit Switch:   C,D
               Bulb:                 A,B
               no connection:        H

When cold the bulb has about 4.8 Ohms resistance. It may operate at very low power, i.e. mostly as an IR emitter, for long life. The detectors are photo resistors, part No. CL905L 421. Their resistances are:

   Pulled out of their narrow slot holders:  about 2k in room light, >100k in the dark.
   In their narrow slot holders:             about 120k in room light, 20k with a flashlight held at about 1 foot.

The Lower Limit detector is dark until the pressure is > the lower limit. The Upper Limit detector becomes dark when the pressure is > the upper limit.

L15CT 68k_Ser counters

Look at the L15CT 68k Counters after 16 hours of continuous running, starting from a luminosity of 9.0E30:

%% time: 19-NOV-1994 08:05:16.49
E-15C/HDL% 68k Last Un-Stick Action was for a problem at %X 000000D3...
E-15C/HDL%...Local DSP A2=%XDAFF801F A3=%XDAFF801F A4=%XDAFF801F A1=%XDAFF801F
E-15C/HDL%...Local DSP B3=%XDAFF801F B4=%XDAFF801F B1=%XDAFF801F
E-15C/HDL%...Local DSP C2=%XDAFF801F C3=%XDAFF801F C4=%XDAFF801F C1=%XDAFF801F
E-15C/HDL%...Global DSP B2=%XDA27000F
E-15C/HDL%...a DSP not at D0 Un-Stick Count = 0
E-15C/HDL%...Global not at D3 Un-Stick Count = 1
E-15C/HDL%...a DSP not at D15 Un-Stick Count = 0
S-15C/HDL% 68k never saw any Byte Misalignment Problem in Object Lists
S-15C/HDL% Reading 68k Run Counters...
%% time: 19-NOV-1994 08:05:18.28
S-15C/HDL%...Orbit Master Loops Count = -579710553
S-15C/HDL%..."That's Me" With Transfer Count = 1577255
S-15C/HDL%..."That's Me" NO Transfer Count = 949188
S-15C/HDL%..."Bystander" With Transfer Count = 4953894
S-15C/HDL%..."Bystander" NO Transfer Count = 23877085
S-15C/HDL%..."Mark&Pass" With Transfer Count = 25
S-15C/HDL%..."Mark&Pass" NO Transfer Count = 0
S-15C/HDL%..."Un-Stick" With Transfer Count = 0
S-15C/HDL%..."Un-Stick" NO Transfer Count = 1
S-15C/HDL% Put all DSPs in Reset, ready for code download 19-NOV 08:05:19.07

So in these 16 hours of continuous running we had 1 unstick of the GDSP for not reaching Step D3.

We had only 25 MFP events, so we can hardly use them for data transport error checking.

There would have been almost 1000 in-spill pulser events. None of these should overlap with a "physics" trigger that uses L15CT, so they should not have caused the GDSP Step D3 timeout. Once again all DSP status longwords actually look OK. So this event must have taken just slightly longer than the timeout period.

All 11 of the LDSP's show, in the "# of Obj Found" field of their Status Longwords, that they overflowed their Object Lists, i.e. found 9 or more objects. The Status Longword from the GDSP shows $27 in the "Terms Answers" byte. I understand the "7" part but not the "2" part.

   Number of events processed by L15CT (i.e. "N" + "n"):          2526443
   Fraction of the time when L15CT processed the event AND
   the event was NOT transferred up to L2:                        37.6%

L15CT Pulser Run

The problem with the L15CT pulser run having only one pattern has to do with the TAS protocol. The pulser needs the protocol to complete in one Beam Crossing cycle in order to think that all went OK, and thus that it should increment. Jan is setting up a "parallel" pure L1 trigger to make this happen.
It now looks like:

   CFG_CAL:CALOR_PLS_TRIG_LOW.CAL;1
   @CFG_CAL:Calor_pls_trig_low.trig
   @CFG_CAL:calor_pls_trig.lev1      <-- sys "With cal L15 requirements"
   @CFG_LV0:L2_pass_fail.filt
   @CFG_CAL:calelec_detector.req
   @CFG_LV0_CRATE:trig_level1.req
   @CFG_CAL_CRATE:cetec_random.req
   @CFG_CAL_CRATE:cal_inspill_reset.req
   @CFG_CAL_CRATE:cal_norm_ccn.req
   @CFG_CAL_CRATE:cal_norm_ecnw.req
   @CFG_CAL_CRATE:cal_norm_ecne.req
   @CFG_CAL_CRATE:cal_norm_ccs.req
   @CFG_CAL_CRATE:cal_norm_ecsw.req
   @CFG_CAL_CRATE:cal_norm_ecse.req
   @CFG_CAL_CRATE:trig_l15ct.req
   @CFG_CAL_CRATE:calor_mpls_low.req
   @CFG_CAL:Calor_pls_trig_l15.trig
   @CFG_CAL:calor_pls_trig_l15.lev1
   @cfg_cal:cal_pls_trig.l15         <---- file does not exist
   @CFG_CAL:calelec_detector.req
   @CFG_LV0_CRATE:trig_level1.req
   @CFG_CAL_CRATE:cetec_random.req
   @CFG_CAL_CRATE:cal_inspill_reset.req
   @CFG_CAL_CRATE:cal_norm_ccn.req
   @CFG_CAL_CRATE:cal_norm_ecnw.req
   @CFG_CAL_CRATE:cal_norm_ecne.req
   @CFG_CAL_CRATE:cal_norm_ccs.req
   @CFG_CAL_CRATE:cal_norm_ecsw.req
   @CFG_CAL_CRATE:cal_norm_ecse.req
   @CFG_CAL_CRATE:trig_l15ct.req

The much easier thing to do is to use just one Spec Trig and simply not tell the L15 Framework that this Spec Trig requires L1.5 confirmation.

..............................................................................
Date: 9,10,11-NOV-1994     At: D0 Hall
Topics: Air Flow Sensor Tied Down, Bring Spare VME Modules to D0 Hall, Bring more de-H-ed MBD's to D0 Hall, Try to Look at And-Or IMLRO T5 mismatch, L15CT_Pulser Runs for L15CT_PROV, Test having TCC read L15CT 68k_Ser scalers and DSP status while L15CT is processing events, Trouble starting data taking for the Friday morning store

-----> M101 - M102 Air Flow Sensor is still TIED DOWN RPSS Sensor <-----

Bring the following VME Modules to D0 Hall:

   Short 214        MSU SN#1
   FANCY 214        MSU SN#13
   MVME-135         MSU SN#2

These were added to the existing stock of spare modules already at DZero:

   "V" Type 214     MSU SN#5
   IRONIC I/O       MSU SN#7
   VMX DRIVER       MSU SN#4
   TERM SELECT P2   MSU SN#2

All 7 of these modules were moved to the bottom of the Spare Cards Storage Rack (where the spare Hydra-II is also stored). The TrgBook:VME_Inventory.LBK file was brought up to date.

Bring more de-H-ed MBD's to D0 Hall

Bring MBD's SN#5, SN#11, and SN#15 back to D0 after de-H-ing them at MSU.

   Location     MBA    Pull MBD    Install MBD
   ----------   ---    --------    -----------
   M101 FSTD    132    SN#17       SN#5
   M101 Busy    135    SN#7        SN#15
   M114 Upper   105    SN#4        SN#11

MBD SN#11, which was just installed in the M114 upper backplane (MBA=105), is wired as an AND-OR MBD. MBD SN#4, which was just pulled out of the M114 upper backplane, had no Timing Signal wire wrap wiring on it. Take MBD's SN#4, SN#7, and SN#17 to MSU for de-H-ing. MBD SN#4 is Rev A.

We have been making a mistake since day one with the way that we set up the MBD's and the CTMBD's. We have not been wiring anything to the 10H115 receivers for the timing signals that are not used on the Specific Backplane CBus. A 10H115 may service some used channels and some "nothing connected" channels. For such a chip, the input bias network is screwed up for all of its channels. This is probably not too big a problem with the CTMBD's, where the 10H115's are directly driven by 10H116's, but it is still wrong even there.
It is definitely a problem on the MBD's, where the 10H115's are trying to receive timing signals over long cables. On the MBD's we should start wiring the unused channels of the Specific Backplane CBus to channel 16 of the Timing Bus, i.e. the LED ON Timing Signal, which never moves. For the CTMBD's we could use one of the unused Cal Trig Timing Signals for this purpose (e.g. Cal Trig MTG Channels 23, 24, or 32).

The ECL data book says not to leave any of the inputs to a 10H115 disconnected. I know from experience that this really does cause problems. The ECL scope boxes now have bias resistors on their inputs for this reason, so that they work the same with just one channel in use as with all 4 channels in use. I edited [D0_Text.Timing_and_Control]MBD_and_CTMBD_Timing_Signal_Wiring.Txt to include a warning about connecting something to these unused Specific CBus Timing Channels.

Try to Look at And-Or IMLRO T5 mismatch

Make a version of VTC_Test called VTC_Test_2. When it finds a T5 error, it prints out first the 2 hex digits from the Spec Trig 0:15 IMLRO and then the 2 hex digits from the Spec Trig 16:31 IMLRO. This makes a total of 8 characters that get sent to the VTC terminal upon T5 errors. Will the L2 Sequencer wait this long? I have loaded VTC, so there are expected to be Pilot COMINT Timeouts.

L15CT Pulser Runs for L15CT_PROV

The config files for making the L15CT Pulser runs are set up as follows:

   CFG_Cal:Calor_PLS_Trig_Low.Cal
   CFG_Cal:Calor_PLS_Trig_Low.Trig
   CFG_Cal:Calor_PLS_Trig_Low.Lev1
   CFG_Cal:Calor_PLS_Trig_Low.L15
   CFG:Cal_PLS.RS
   CFG_Cal:CalElec_Detector.Req
   CFG_Cal_Crate:Trig_L15CT.Req
   CFG_Cal_Crate:Calor_MPls_Low.Req

The files for High and Low amplitude have the obvious differences in file name. The indentation here indicates who calls whom.

This whole setup is a little strange. For example, look at how the files are spread across directories. And how do you know whether it is safe to edit a subordinate file, given that some other master file may be calling it?

For now we are developing a setup that will use the same config file for the L15CT Pulser run as for the Dan Owen Pulser run. When this was first tried there were nothing but Token Loop Count Overflows from the L15CT crate. The L15CT Ref Set was at 2.0 GeV, so every TT in the world was a candidate for the object list. That produced this chain:
   1. There was a lot of LDSP processing.
   2. The GDSP was late reaching Step D3.
   3. 68k_Ser timed out the DSP processing and produced no L15CT Data Block.
   4. This caused the Token Loop Count Overflow on the L15CT Crate.
For now this has been "fixed" by setting the L15CT Ref Set to 1000 GeV for all Trigger Towers.

It is not clear if we should also move the 68k_Ser Timeout up a little. Steve finds that we have timed out 7 times during Global Physics running since 30-Oct-1994. This timeout had been kept short because at one time it was thought to be useful to try to "salvage" events that died during L15CT processing. For the past N months, whenever 68k_Ser times out the DSP processing of an event, the event is flushed when the L2 sequencer cleans up Data Cable 0.

Expected values of data from the Low and High amplitude L15CT pulser runs:

   Amplitude   EM |Eta|=1   EM |Eta|=20   HD |Eta|=1   HD |Eta|=20
   ---------   ----------   -----------   ----------   -----------
   Low             140         1 or 2          85         1 or 2
   High         Saturate         19         Saturate        13

These are, to order of magnitude, the MAXIMUM values that one should expect to see. They represent what happens when a Pulser Pattern, by chance, hits somewhere in all 4 Cal Towers that make up a Trigger Tower. You will also see values of approximately 3/4, 2/4, and 1/4 of what is shown above (i.e. when the pattern hits only 3, 2, or 1 of the 4 Cal Towers).
The point of all of this is that we will get pretty good bit coverage of the L15CT with the pulsers set up as they are for Dan Owen Pulser Runs (except at high eta).

Test TCC read 68k_Ser scalers and DSP status while L15CT is processing events

In the middle of a global physics run I started having TCC read L15CT information, either from 68k_Ser or from the DSP status words. From a .com file TCC was asked to loop through

   L15CTSYS DSP_STAT 68K_CTRL 68K_STAT 68K_ERR 68K_CNT 68K_FLAG

waiting 10 seconds between each step. This did not appear to cause L15CT any problems. L15CT was not bothered by having a usec here or there taken by TCC-initiated VME cycles. The VME mastership transfer is working OK even when L15CT is processing events.

Trouble starting data taking for the Friday morning store

Something must have been going on early, e.g. there were initializes at:

   Initialize Starts   Initialize Done
   -----------------   ---------------
        8:53:38            8:54:29
        8:55:30            8:56:21
        8:56:39            8:57:30
        8:59:17            9:00:08
        9:04:29            9:05:20
        9:12:00            9:12:51
        9:12:58            9:13:49
        9:16:20            9:17:11

Then at 9:17:53 COOR starts a full trigger download to TCC. At this time the log file indicates that TCC was seeing fresh data. As part of the download, COOR pauses L1FW at 9:23:05 so that it can load L15CT, which it starts to do.
Then we see:

C-RCV/CH1% 1:35 %000025FF L15CTSYS START CRATE(0)
%% time: 09:23:24.46
S-15C/HDL% Preparing Params for L1.5 CT Crate
%% time: 11-NOV-1994 09:23:24.53
S-15C/HDL% Copying Params to L1.5 CT Crate
%% time: 11-NOV-1994 09:23:24.60
S-EXC/MBX% Flush_to_File now Servicing Exception Mailbox
%% time: 09:23:31.11
X-DSP/EXC%2203468%PAS-F-FILALRACT, file already active
%% time: 09:23:30.85
X-DSP/EXC%Skipping
%% time: 11-NOV-1994 09:23:30.85
S-EXC/MBX% Exception Mailbox now empty
%% time: 11-NOV-1994 09:23:31.48
TRICS V6.3 CLOSED LOGFILE, DUA0:[TRIGGER]TRICS_30OCT94.LOG
%% ti 09:23:31.48
C-RCV/CH2% 1:26 %00000001 PHAT CLOSELOG
%% time: 11-NOV-1994 09:42:47.84

It should have completed the "Copying Params to L1.5 CT Crate" in about 6 or 7 seconds, i.e. at about 9:23:30. Is it OK that the log entries "PAS-F-FILALRACT, file already active" and "Skipping" are out of time order?

..............................................................................
Date: 2,3,4-NOV-1994     At: D0 Hall
Topics: Air Flow Sensor Tied Down, L15CT Trigger Tower data problem, Replace MBD's with non-10H MBD's, Collect L1 Trig overlap data, Run Find-DAC, Spares that I need to bring to D0 Hall, Monitor L15CT operation

-----> M101 - M102 Air Flow Sensor is still TIED DOWN RPSS Sensor <-----

L15CT Trigger Tower Data

Work on the L15CT Trigger Tower data problem that Dan Owen discovered last weekend. Local DSP A3 had its Rack #2 Tot Et data bits of value 2 and 4 stuck low. Consider a longword of Rack #2 Tot Et data from A3's Type #1 DeBug Section, taken with no energy in the calorimeter. It should read $1A1C1C1C, but it was reading $18181818.

The path for this data is:
   Lower CRC Ch#2 Copy #1 (i.e. rear row)
   -> Ten Port Paddle 4, connector side next to the top connector
   -> Comm Port #3 on DSP A3.
This same data also goes to Local DSP A4 Comm Port #2, where it was reading out OK. The problem was in the CRC to DSP Paddle Board cable.
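The stuck-bit diagnosis above follows from comparing the expected and observed longwords byte by byte. Below is a minimal Python sketch. It assumes, as the entry implies, that every byte of the longword travels over the same 8 data lines. Under that assumption a stuck line shows up at the same bit position in each byte. The function name is made up for illustration.

```python
def stuck_low_bits(expected: int, observed: int) -> int:
    """Return a mask of data-bit values that should read 1 but read back 0.

    Assumes each of the four bytes in the 32-bit longword arrived over the
    same 8 data lines, so the per-byte differences can be OR-ed together.
    """
    missing = expected & ~observed & 0xFFFFFFFF   # bits expected high, read low
    mask = 0
    for shift in (0, 8, 16, 24):
        mask |= (missing >> shift) & 0xFF
    return mask


expected = 0x1A1C1C1C   # Rack #2 Tot Et, no energy in the calorimeter
observed = 0x18181818   # what A3's Type #1 DeBug Section returned

mask = stuck_low_bits(expected, observed)
values = [1 << bit for bit in range(8) if mask & (1 << bit)]
print("stuck-low data bit values:", values)

# Forcing those lines low in every byte reproduces the bad reading exactly.
replicated = mask * 0x01010101
print((expected & ~replicated & 0xFFFFFFFF) == observed)
```

This check has two limits. It finds only stuck-low lines, not stuck-high ones. It also tests only the data lines that are high somewhere in the expected pattern. Here those are the lines of value 2, 4, 8, and 16; the lines of value 1, 32, 64, and 128 are never exercised by $1A1C1C1C.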
At the DSP end of the cable, the receptacles for data bits of value 2 and 4 have no spring force in their contacts. I expect that someone bent these contacts with a paper clip (attempting to test the cable), or else it was a defective connector to start with. I made another cable, tested it, and installed it. The labels from the old cable were put on the new cable.

The new cable is installed OK, but it is not threaded into the Panduit cable tray in a very fancy way. I was afraid to move too many cables too much because it would be easy to pull out a connector. I cut the ends off of the old cable for an autopsy. I did not un-thread the rest of the old cable from the Panduit cable tray.

After replacing this cable I ran the L15CT Test Trigger to get some events moving. I captured one event via ZBDump in the file VWork1:L15CT_ZBD_Dump_2Nov94.Txt. This file has both the L1 data and the L15CT data, including MFP data. I wrote a small table at the top of this event to help navigate through it. If there are any old junky ZBD files around we should get rid of them.

Collect L1 Trigger Overlap Data

Between about 21:16 and 23:20, collect L1 Trigger overlap data in the file VWork1:SpTrg_Fired_List_2215_2Nov94.Txt. The average Ch#13 D-Zero Luminosity was 7.2E30. The V10.0 08E30 prescale file was in use during this time. The "Edmunds overlap analysis" of this SpTrg_Fired_List file is in VWork1:SpTrg_Fired_Analysis_2215_2Nov94.Txt.

DeHed MBD's for L1 Framework

Bring MBD's SN#13 and SN#16, both non-10H101 MBD's, to Fermi. MBD SN#16 was removed from the lower M101 AND-OR last week and taken to MSU for de-H-ing. MBD SN#13 was pulled out of the MSU Test Rack and de-H-ed.

While checking MBD SN#13 at Fermi, I noticed something funny about the resistance of the Spec Front CBus Inverted MS Data Bit line: it is only a couple of hundred ohms to GND. Trace this to U20, the 10H188 driver for the front CBus. Pull this chip and install a socket. Now MBD SN#13 and SN#16 have resistances that look the same.
I also notice something about the Rev. B MBD's. The 10H188 drivers for the mid and high data bits of the Spec Front CBus (i.e. U15 and U20) pick up Gnd on pin #16 only via a trace over the top to pin #1. But it is easy to fold pin #16 down against the solder side Gnd plane to pick up Gnd. I made this "modification" to MBD's SN#13 and SN#16.

While studying the print set for the MBD's, I noticed that the 10H101's are in the CONTROL circuit. This circuit tells the MBD which way to drive data, and whether it should drive data at all or just hold the Spec Front CBus low. Thus oscillating 10H101's could really make global problems for MBD operation.

Prepare CTMBD SN#09 along with a CCCP card to install in the FSTD Cell backplane in rack M102. Eventually I plan to also do this in M101, but I want to try this one backplane at a time. Before committing to use 2 of them in FSTD backplanes, I also want to fix a broken CTMBD at MSU to bring here as a spare.

   M101 Upper And-Or backplane:  pull MBD SN#15 and install MBD SN#16.
   M102 Upper And-Or backplane:  pull MBD SN#11 and install MBD SN#13.
   M102 FSTD Cell backplane:     pull MBD SN#05 and install CTMBD SN#09 and a CCCP.

MBD's SN#5, SN#11, and SN#15 will return to MSU for de-H-ing. Remember this when replacing an MBD with a CTMBD and a CCCP: the CBus and Timing Bus cables from M114 plug into a different location on the backplane for the CTMBD than for the MBD.

Now all And-Or backplanes are running with de-H-ed MBD's. Try using the special version of VTC code that reports mismatches between the two halves of the And-Or Network. This shows very few errors, less than one per screen full of 1's and 0's. They are all "T5" mismatches. There are few enough of these errors that I leave this version of the code in overnight. In the morning, running with a lower prescale file, the "T5" mismatches are possibly slightly more frequent (perhaps almost one per screen full). I doubt that this is still an MBD problem.
It could be a bad IML, a bad IMLRO, or a bad And-Or Backplane. It is also very possible that the muon Level 1 signals (T5 is And-Or Terms 32:39) are still moving at the time of the rising edge of the IML Clock signal. My best estimate of what to do next is to make a special version of VTC that prints the value read from both IMLRO's. There may be fewer "T5" mismatch errors at high luminosity, because the activity in the muon L1 Trig is different then.

Find_DAC_Bytes

Made a full sweep of Find-DAC. There was only one TT that failed (which was an excluded TT). Find-DAC had not been run since 17-AUG-1994, although some hand patches had been made. Did a Read-Load of the new file so that with the next TRICS Init it will start using the new values. There was very little change:

      3 tower(s) incremented by -2
    148 tower(s) incremented by -1
   2276 tower(s) incremented by  0
    128 tower(s) incremented by  1
      5 tower(s) incremented by  2

Spares that I need to bring to D0 Hall

I need to bring:
   - more VME stuff to support L15CT, e.g. 214's, 135, ?
   - wire wrap wire (other than black)
   - small PROM labels

Monitor L15CT Operation

Steve collected the file VWork1:TrgMon_Dump.Txt_L15CT_V10_Initial_Run. It has lots of nice information about the initial view of L15CT as it is actually dumping events. Today I used long-integration TrgMon to collect more L15CT functioning data. With a luminosity of 5.2 E30 and running the V100-6E30 prescale file, one sees:

   Spec Trig   Trig Name    L15 Reject %   L15 Skip %
   ---------   ---------    ------------   ----------
       7       EM_1_High         62            23
       8       EM_2_Med           6            42

These Reject percentages do not seem to change very much with luminosity. The Skip percentages are harder to explain. Compared with the measured L1 Trigger Overlap data, it is not completely clear why they differ so much. Both triggers have a lot of overlap (about 34%) with Spec Trig #11. Spec Trig #11 is also processed by L15CT, but it requires zero objects and thus always passes.
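Here is a quick summary of the Find-DAC change histogram above, as a Python sketch with the counts copied from this entry. The total comes to 2560. That matches 1280 L1 Cal Trig Trigger Towers with an EM and an HD channel each, so the "tower(s)" in the histogram appear to be counting channels. The 1280-tower figure is an assumption brought in here; it is not stated in this entry.

```python
# Find-DAC change histogram from 2,3,4-NOV-1994: {DAC change: number of channels}
changes = {-2: 3, -1: 148, 0: 2276, 1: 128, 2: 5}

channels = sum(changes.values())
unchanged = changes[0] / channels
mean_change = sum(step * count for step, count in changes.items()) / channels
moved_by_two = changes[-2] + changes[2]

print(f"{channels} channels, {100 * unchanged:.1f}% unchanged, "
      f"mean change {mean_change:+.5f}, {moved_by_two} moved by 2")
```

The mean shift is essentially zero (-16 counts summed over all channels). About 89% of the channels did not move at all, which backs up "very little change".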
..............................................................................
Date: 1-NOV-1994     At: MSU
Topics: Start Trigger List V10.0 ---> Start L15 Cal Trigger Throwing Away Events.

As of the first run in the early AM on 1-NOV-1994, Trigger List V10.0 was in use. This Trigger List uses L15 Cal Trig to actually filter events.

..............................................................................
Date: 26,27,28-OCT-1994     At: Fermi
Topics: Air Flow Sensor Tied Down, Bring spare cards to Fermi, Work on the bad data signals from M101, Bring more IC's to Fermi for the Chip Kit, Meeting with Atlas people, DAQ Conference, Swap MBD's in the M101 lower And-Or crate, review T% error distributions

-----> M101 - M102 Air Flow Sensor is TIED DOWN RPSS Sensor <-----

Bring MBD SN#18 (non 10H101) and CTMBD SN#10 to D-Zero Hall as tested spare cards.

On the lower And-Or crate in M101, put the "T cable" onto the front CBus and take a look at all the signal lines. Everything looks just fine: Timing Signals, Card Address, Function Address, and Data lines. I can not see Direction and Strobe do anything, but that is OK.

With the power turned off, use the Fluke to look at the CBus cable between M114 and M101. Look from the M114 end, using the 34 pin connector that was put on last week. This only lets me look at the Data lines, Direction, Strobe, and the high order 7 Function Address lines. I can see back to the terminator in M101 and verify three things: there are no open lines, the terminator is OK, and there are no line to line shorts. Everything looked OK. I could not check the MBA lines, CA lines, or the low order FA line, because no connector was installed on these signals in M114 yet.

On the CBus cable between the M101 MBD's and the M114 BBB, crush on another 34 pin connector. This lets one see the Mother Board Address lines, the Card Address lines, and the Function Address line of value 1. These signals all look just fine on the scope.
This also allowed me to verify (because I can read the MBA) that it is the lower M101 And-Or Crate that has the bad looking data lines. So the cause of the bad looking data lines from that crate must be one of the MBD cards in M101. Either the lower And-Or MBD has a problem driving the lines, or one of the other 3 MBD's in M101 is leaking onto the bus back to M114 when it is not being addressed.

Bring more IC's to Fermi for the Chip Kit here. This is a bit funny, because it is "illegal" to repair and test a card in the running system at D-Zero Hall. The Chip Kit at Fermi now has some of each of the following:

   74 LS 00      10 101        Sockets Solder Tail: 16, 20, 24
   74 ALS 04     10 H 102      110 Ohm 8 Resistor DIP Packs
   74 HCT 08     10 103        CTFE PAL's: 1,2,3,4
   74 ALS 30     10 104        PAL's: 16V8 16RA8 16R8
   74 ALS 32     10 H 104      MTG PROM's 245A's
   74 ALS 74     10 109        COMINT PROM's 265's
   74 ALS 86     10 H 115
   74 LS 123     10 H 116
   74 ALS 138    10 H 124
   74 F 139      10 H 125
   74 ALS 240    10 H 131
   74 ALS 245    10 133
   74 ALS 520    10 H 162
   74 ALS 540    10 H 166
   74 ALS 541    10 H 173
   74 ALS 574    10 H 188
                 10 H 189

Swap MBD's in the M101 lower And-Or crate.

Pull MBD SN#16 and install MBD SN#18, a non 10H101 MBD. The data line of value 4 now looks OK for the data coming from this crate. Start the error checking version of VTC code running. During a Global Physics run there now appears to be about one mismatch in 2000 events (i.e. one per screen full on the VTC console). Compare this to last week, when almost every event had a mismatch in the And-Or Term 128:255 range. Looking at things now, the mismatch histogram is:

   CODE   Counts   And-Or Terms #'s
   ----   ------   ----------------
    T5      30          32:39
    TC       5          88:95
    TF       5         112:119
    TI       2         136:143

Philippe reloaded VTC to start running the And-Or Term readout match error checking version. He noticed that TCC then started complaining about Pilot COMINT timeouts. This fits with what we guessed last week, i.e. at least some of the Pilot Timeouts are caused by VTC reloads.
This was also seen when we returned to the "standard" VTC code. These were on 28-OCT-1994 at 7:30 AM and 12:15 PM respectively.

D0 was then running muon-only triggers at between 14 and 200 Hz. During this time Steve looked again at the distribution of T% errors on the VTC console. Now there were many more errors per VTC screen, typically between 2 and 7 errors per screen. The average number of errors per screen was of order 4. ALL of these errors were T5 errors.

At 200 Hz it is difficult to count errors on the screen accurately. Most readings were taken at low rate running, or while the data cable was re-synching (i.e. VTC console relatively stable). We never saw multiple T% errors on a single event. The time between T5 errors was approximately uniformly distributed, from 1 row (or a fraction of a row, i.e. 80 or fewer events) up to about 10 rows.

We looked at the distribution of MBD's in the Framework (and M114) racks. Here is what we saw:

            # of FW MBD's   # of FW MBD's
   Rack #   (de-H'ed)       (not de-H'ed)   # of CT MBD's
   ------   -------------   -------------   -------------
   M101           1               3               0
   M102           1               2               1
   M103*          0               0               2
   M114           0               1               1
                 ----            ----            ----
                  2               6               3

We need to start a project to de-H the 6 FW MBD's.

* M103 FW Expansion Backplane, consisting of L1.5 Framework and L1 Cal Trig Final Readout halves

..............................................................................
Date: 19,20,21-OCT-1994     At: Fermi
Topics: Work on the readout problem from M101 and M102 lower And-Or backplanes, Error in the BBB description, Edit the Set_IML_FF.DAT file, SBSC readout cycles, Connectors to monitor CBus data install in M114, Hand edit Init_DAC_BYTES.LSM, Notice Pause Timeouts of Pilot COMINT.

Turn off most of L1 Cal Trig. Leave only the first 4 racks of Tier 1 running, so that VTC does not find a pilot data error on each event. This means that all of the And-Or Terms 128:255 are undefined, but they typically drift high.
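The VTC error codes used in these entries (T5, T9, TC, TF, TI) appear to follow a simple pattern. Take the character after "T" as a base-36 digit n (1-9, then A, B, C, ...). The code then covers And-Or Input Terms 8*(n-1) through 8*(n-1)+7. This rule is inferred only from the pairs in the 26,27,28-OCT-1994 histogram and from the remark that T5 is And-Or Terms 32:39. It is not taken from the VTC code, and the function names below are hypothetical. Here is a Python sketch of that inferred mapping:

```python
import string

# Inferred mapping, not taken from the VTC source: "T" + base-36 digit n
# (n = 1..32, i.e. '1'..'9','A'..'W') covers And-Or Input Terms 8*(n-1)..8*(n-1)+7.
DIGITS = string.digits + string.ascii_uppercase   # "0123456789ABC...Z"
VALID = DIGITS[1:33]                               # '1'..'9', 'A'..'W': 32 groups of 8 terms


def t_code_terms(code: str) -> range:
    """And-Or Input Terms covered by a VTC mismatch code such as 'T5'."""
    if len(code) != 2 or code[0] != "T" or code[1] not in VALID:
        raise ValueError(f"not a VTC T-code: {code!r}")
    first = 8 * (DIGITS.index(code[1]) - 1)
    return range(first, first + 8)


def term_to_t_code(term: int) -> str:
    """Inverse: which T-code would report a mismatch on this And-Or term."""
    if not 0 <= term <= 255:
        raise ValueError("And-Or Input Terms run from 0 to 255")
    return "T" + DIGITS[term // 8 + 1]


for code in ("T5", "TC", "TF", "TI"):
    terms = t_code_terms(code)
    print(f"{code}: {terms.start}:{terms.stop - 1}")
```

Under this reading, T9 would be terms 64:71. The And-Or Terms 128:255 discussed in the 19,20,21-OCT-1994 entry would show up as codes TH through TW.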
I ran the VME code that compares all 255 And-Or terms, with the IML's set for real input data. One could see errors on almost every event. These included some "T9" errors not seen last week. All other errors were in And-Or Input Terms 128:255. After switching the IML's over to all $FF test data, the number of VTC-discovered errors went way down (perhaps one in 50 events).

Running Test Trigger at about 28 Hz, watch the Front-CBus signals on the lower And-Or backplane. Everything looks OK. All Data Lines are OK, and the CA and FA lines look fine. This is with the IML's set up for $FF test data.

Pull MBD SN#18 from the M102 lower And-Or backplane and install MBD SN#19, which is a non 10H101 MBD. Swap the IMLRO's in M102 between the lower and upper And-Or backplanes. IMLRO SN#10 started out in the lower and IMLRO SN#11 started out in the upper. With the IML's using real data we are back to seeing lots of errors (almost every event), still including some T9 errors.

Pull the BBB's in M114 for CBus 2 that service the L1 Cal Trig Tier 2 and Tier 3. This makes no difference. It does make a space where I can plug the ECL-to-scope box into the backplane BBB to COMINT bus. The data line of value 8 looks fine. The data line of value 4 has sections that look funny. In these funny sections the lows look OK, but the highs have hash and are not a full high. I did not carefully check the other data lines. What I'm looking at is the data block builder running at 10 to 20 Hz.

Now, which section of the COMINT Data Block Builder PROM's do these funny sections correspond to?
   - There are about 1618 reads in the CBus #2 COMINT PROM's.
   - These take about 950 usec, i.e. about 580 nsec per read.
   - An IMLRO has 16 registers, or about 9 usec to read.
So on the scope one has to navigate to a 1% slice of the readout. There are some other markers. For example, the reads for the Large Tile pattern and for the L15 Cal Trig final readout IMLRO's are all visible, because their BBB's are pulled out.
I navigated on the scope's horizontal scale and looked for landmarks. The areas that I think look funny on the data line of value 4 correspond to the two M101 And-Or IMLRO's. People were running, but on the next test I should execute Set_IML_FF.DAT to have fixed high data to look at.

So before putting things back together and turning L1 Cal Trig on, I swapped the BBB for M101: I pulled SN#14 and installed SN#9. BBB SN#9 is a card that has been used before; see the logbook notes from 22-DEC-1993 and 10-FEB-1994. I did not check the BBB to COMINT backplane signals after swapping the M101 BBB.

It was learned that the BBB card description does not match the schematic of the BBB. The BBB does pass MBA, CA, FA, DIR, STRB, and write data independent of whether or not it is being addressed. We should edit the BBB description.

It was relearned that TCC reads the SBSC's by hand during the Monitor Pool Refresh. To do this, it needs to do Write Cycles with the Write Strobe active. Are these broadcast only on CBus #2, or are they also on CBus #0, #1, #3 ??

Edit Set_IML_FF.DAT to fix the IML test data pattern for the And-Or terms that carry the Bunch Number. The pattern should now show a legal Bunch Number, i.e. only one bit out of the six is set. I set the Bunch Number And-Or bit that is read out on the data line of value 4. This change was needed so that VTC would not report a non-valid Bunch Number on every event. Those reports made it harder to see the And-Or IMLRO read problems.

In rack M114, install a 34 pin connector on the CBus cables from M101 and M102 (covering FA2:FA8, Strobe, Direction, D1:D8). This lets one monitor the CBus signals to/from these racks.
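Here is the scope-navigation estimate from the data-line-4 investigation in this entry, written out in Python. The inputs are the entry's own figures: 1618 reads in about 950 usec, and 16 registers per IMLRO. The variable names are only for illustration.

```python
# Scope-navigation arithmetic for the CBus #2 COMINT readout (19,20,21-OCT-1994).
reads_in_prom = 1618        # reads in the CBus #2 COMINT PROM's
readout_us = 950.0          # length of the whole readout on the scope, usec
imlro_registers = 16        # registers read from one IMLRO

ns_per_read = 1000.0 * readout_us / reads_in_prom
imlro_us = imlro_registers * ns_per_read / 1000.0
fraction = imlro_registers / reads_in_prom   # share of the sweep taken by one IMLRO

print(f"{ns_per_read:.0f} ns per read, {imlro_us:.1f} usec per IMLRO, "
      f"{100 * fraction:.1f}% of the readout")
```

One IMLRO comes out at about 9.4 usec, roughly 1% of the sweep. The two M101 And-Or IMLRO's together would therefore be about 2% of the trace, roughly 19 usec wide.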
Look at the data from the And-Or backplane IMLRO's:

                     Rack M101                        Rack M102
Data Line   -------------------------------   -----------------------
  Value       0:127             128:255          0:127       128:255
---------   ---------------   -------------   ----------   ----------
     1      OK                Bad              OK           OK
     2      OK                Not too Bad      OK           OK
     4      Bot OK, Top Bad   VERY Bad         OK           OK
     8      OK                Not too Bad      OK           OK
    16      OK                OK               OK           OK
    32      Top Fuzzy         Fuzzy Top&Bot    OK           OK
    64      OK                OK               OK           OK
   128      OK                OK               OK           OK

Order of problems: OK > Fuzzy > Not_too_Bad > Bad > Very_Bad

All data bits look OK for the readout of the FSTD Cell (FSTD cards and DBSC cards) for both M101 and M102. All data bits look OK for the readout of the M101 Front-End-Busy IMLRO, the (Str Dig + FEBz Disable + L2 Disable) IMLRO, and the top card file DBSC's. All data bits look OK for the readout of the M102 Spec Trig Fired IMLRO.

What can be wrong in M101?
  1. Bad front CBus cables.
  2. A bad And-Or card with a 10H188 data driver that stays turned on.
  3. Trouble with the And-Or card file MBD's (i.e. they cannot properly drive data back to the BBB, but they do properly clear off the bus when not addressed, so that the two top backplanes in M101 read out OK).
  4. Trouble with an MBD in the upper two backplanes of M101, e.g. one of these MBD's leaks current onto the bus back to the BBB all the time.

Review the notes from 21, 22, 23 JULY 1994, when the upper And-Or backplane in M101 was worked on for a similar or the same problem.

Last week Jan found a couple of TT's that had drifted by one count. She saw these using the newest features of Cal CALIB. Today hand edit Init_DAC_Bytes.LSM to bring these channels back to 8. Copy the new version of this file to D0HTCC:: and MSU::[Trg_Current.DZero]. The changes are:

   Channel     Was_Reading   Old_DAC   New_DAC
   ---------   -----------   -------   -------
   -12,16 EM        7           34        36
   -12,16 HD        7           40        42
   -14,7  EM        9           35        33
   -14,7  HD        9           44        42

We notice that there have been some Timeouts of Pilot COMINT. Some of these look very likely to be associated with known boots of the VTC 68k. Did DAQ EXP cause the other cases by randomly booting VTC???
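The hand DAC edits in the table earlier in this entry, together with the +17,15 edit (10-OCT) and the +20,23 EM edit and its hand scan 25-->9, 24-->8, 23-->8, 22-->7, 21-->7 (4-OCT), all fit a simple rule of thumb: about 2 DAC steps per count of zero-energy response, with a higher DAC giving a higher response. The sketch below encodes that rule; it is an empirical assumption inferred from these log entries, not a documented calibration, and the names are hypothetical.

```python
# Sketch of an empirical pedestal-DAC correction for Init_DAC_Bytes.LSM.
# Assumption (inferred from the hand edits in this logbook only): about
# 2 DAC steps move the zero-energy response by 1 count, higher DAC -> higher response.

TARGET = 8            # desired zero-energy response
DAC_PER_COUNT = 2     # assumed DAC steps per count of response


def corrected_dac(old_dac: int, reading: int,
                  target: int = TARGET, slope: int = DAC_PER_COUNT) -> int:
    """Move the DAC so the zero-energy response returns to `target`."""
    return old_dac + slope * (target - reading)


# (channel, reading, old DAC, new DAC) as entered by hand in this logbook
HAND_EDITS = [
    ("-12,16 EM", 7, 34, 36),
    ("-12,16 HD", 7, 40, 42),
    ("-14,7 EM", 9, 35, 33),
    ("-14,7 HD", 9, 44, 42),
    ("+17,15 EM", 7, 31, 33),
    ("+17,15 HD", 7, 28, 30),
    ("+20,23 EM", 9, 25, 23),
]

if __name__ == "__main__":
    for name, reading, old, new in HAND_EDITS:
        print(f"{name:10s} {old} -> {corrected_dac(old, reading)} (logged {new})")
```

The rule reproduces every logged value, but a channel whose response is far from 8 (or a dead BLS) would still need a hand scan like the one done for +20,23.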
These Pilot COMINT timeouts appear to have happened 14 times since 1-JULY-1994. Philippe has dug out the log files and made a summary in VWork1:Pause_Timout.Log
..............................................................................
Date: 13,14-OCT-1994   At: D0 Hall
Topics: Continue IMLRO readout testing

We continue testing IMLRO readout of ALL And-Or Input Terms. Philippe installed a new TCC system which allows "histogramming" of a single Function Address, without de-asserting MBA, CA, etc.

Using the IML Test-Data registers to force ALL And-Or Input Terms true, we saw "bit 4" readback errors at a low rate (1 error per 10^6 reads). Neither shutting off the MPOOL Server nor halting the Framework MTG affected the error rate. When the histogram sample is big enough (e.g. 10^6) one can get more than 1 error per histogram (3 per 10^6 was seen twice). This disproves the theory that only the first read (just after the address is selected) can fail.

Dan installed (temporarily) a version of the VTC code which compared ALL And-Or Input Terms between M101 and M102. Running at 1 Hz, we watched 300 events. 21 of these 300 events had errors (we did not check or display the "bit value" of the errors). ALL of these errors were in the 16 IMLRO registers corresponding to the upper And-Or Input Terms (i.e. CBUS = 2, MBA = 65/129, CA = 50). The distribution was:

   And-Or Input   Function    Number of
   Term Range     Address     Errors
   ------------   ---------   ---------
   128..135       128 ($80)       0
   136..143       129 ($81)       1
   144..151       130 ($82)       1
   152..159       131 ($83)       5
   160..167       132 ($84)       1
   168..175       133 ($85)       2
   176..183       134 ($86)       0
   184..191       135 ($87)       0
   192..199       136 ($88)       3
   200..207       137 ($89)       3
   208..215       138 ($8A)       3
   216..223       139 ($8B)       0
   224..231       140 ($8C)       0
   232..239       141 ($8D)       0
   240..247       142 ($8E)       1
   248..255       143 ($8F)       1

This distribution is approximately uniform (a uniform distribution would have 21/16 = 1.3 errors per Function Address). Function Address 131 had 5 errors.
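A rough way to judge "approximately uniform" and the "3 per 10^6 seen twice" observation is a Poisson tail probability. The sketch below assumes the errors are independent and Poisson-distributed at a constant rate per read, which the log does not establish; it only puts numbers on the counts in the table above.

```python
import math

# Errors per Function Address $80..$8F (FA 128..143), copied from the table above.
ERRORS = [0, 1, 1, 5, 1, 2, 0, 0, 3, 3, 3, 0, 0, 0, 1, 1]


def poisson_tail(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


def chance_any_bin_reaches(k: int, mean: float, n_bins: int) -> float:
    """Chance that at least one of n_bins independent Poisson bins has >= k counts."""
    return 1.0 - (1.0 - poisson_tail(k, mean)) ** n_bins


if __name__ == "__main__":
    mean = sum(ERRORS) / len(ERRORS)   # 21 errors spread over 16 addresses
    print(f"mean errors per FA           = {mean:.2f}")
    print(f"P(a given FA has >= 5)       = {poisson_tail(5, mean):.3f}")
    print(f"P(some FA of 16 has >= 5)    = {chance_any_bin_reaches(5, mean, len(ERRORS)):.2f}")
    # Read-histogram case: about 1 error per 10^6 reads, sample of 10^6 reads.
    print(f"P(>= 3 errors in 10^6 reads) = {poisson_tail(3, 1.0):.3f}")
```

Under these assumptions the chance that some one of the 16 addresses collects 5 or more errors comes out near 16%, and 3 or more errors in a single 10^6-read histogram near 8%, so neither count is by itself strong evidence against a uniform random error rate.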
Errors were seen in FA's with each of the bits of value 1, 2, 4, and 8 set in the low nibble.

It is significant that all errors were seen in the upper And-Or Input Terms. It would be easy to blame the IMLRO for either Specific Triggers 0..15 or 16..31, except for the fact that earlier testing (using the Cal Trig Random Test) showed that BOTH of these IMLRO's were generating "bit value 4" errors. It is hard to argue that the problem is with the BBB's, the M114 backplane, or COMINT, because these are common to the lower and upper And-Or Term IMLRO's. All 4 "And-Or Cells" use FW MTG's.

Are there two independent problems, one in each of the affected cells, which both happen to affect bit value 4? For example, 2 screwed-up (Electrocircuits) And-Or cards which are loading the CBUS?

We should run more tests:
 - make a new front-panel CBUS cable for one (or both) of the affected cells, which bypasses the And-Or cards (i.e. only touches the MBD, IML, and IMLRO).
 - look at DC voltages (and also resistances) on the front-panel CBUS cables.

Note: How to use the new "read-histogram" feature:
 - The message isn't yet part of $ TRICS_ACCESS; use instead
      $ @EENV:COMMANDS
      $ PHAT READHIST cbus mba ca fa sample size
 - You need to turn INFO messages ON to display the full histogram (on the remote console or in the logfile); use $ TRICS_ACCESS or
      $ PHAT TRC_INFO 1 1
 - The string returned in the reply message shows the peak of the histogram with its bin content. There is a bug in the code that limits the bin content to 3 characters (there is room for 4 characters; this will be fixed in the next system).
 - Turn INFO back OFF when you are done; use $ TRICS_ACCESS or
      $ PHAT TRC_INFO 0 0

There is also another new message for displaying a register's characteristics as stored in the database.
 - use $ PHAT SHOW_REG cbus mba ca fa
 - The string returned in the reply message shows what was last read by TRICS (e.g. the read-back after the last write).
 - Again, most of the data is displayed in INFO messages.
 - Example of usefulness: show what TRICS thinks the R/W masks are for an MTG PAL, and how it is programmed.
 - Pipeline registers are stored by their lowest fa.
..............................................................................
Date: 12-OCT-1994   At: D0 Hall
Topics: ECB meeting at NWA, IMLRO readout testing, look at CALIB EXAMINEs to see pedestal values

We continue testing IMLRO readout of the Large Tile And-Or Input Terms. We set all Large Tile And-Or Input Terms to true (by defining 0 MeV Large Tile Ref Sets) and captured Data Blocks via TRGMON (using the spy window in Hex mode, looking at items #395-398 and #426-429). We saw the standard "bit of value 4" problems at a low rate. Dan also installed (temporarily) a version of VTC which sensed this problem and displayed error messages. We saw T1, T2, T3, and T4 error messages (i.e. all possible error messages). Note that the And-Or Input Terms were set "DC high" in this test, which argues against timing problems with the IML inputs.

We also triggered on one of the "bit 4" And-Or Input Terms, to get an idea whether this problem is readout-only or also affects triggering. The And-Or rate appeared steady, which points to a readout-only problem. We re-installed the "normal" VTC code.

Joan showed us CALIB examine plots which showed a few Trigger Towers with funny pedestals:
   -12,16   both EM and HD had an average pedestal of 6.9
   -14,7    HD only had an average pedestal of 9.0
We looked at these with TRGMON with no beam in the machine and saw the same thing. We should re-evaluate Pedestals when we have an opportunity.
..............................................................................
Date: 11-OCT-1994   At: MSU

Check operation of +17,15 EM and HD. Dan Owen checks the Calorimeter Examine output looking for anything funny with +17,15 and finds nothing.
..............................................................................
Date: 10-OCT-1994   At: MSU
Topics: Pedestal DAC values for +17,15 EM and HD.

The zero energy response from TT's +17,15 EM and +17,15 HD had shifted down from 8 to 7. It is not clear why both halves of +17,15 shifted. Is the BLS disconnected or its big hybrid dead?? Anyway, to get the zero energy response back to 8, Init_DAC_Bytes.LSM was hand edited:
   The DAC pedestal value for +17,15 EM moved from 31 to 33.
   The DAC pedestal value for +17,15 HD moved from 28 to 30.
This new Init_DAC_Bytes.LSM file is in TrgCur: and D0HTCC:: at D0, and in MSU::[Trg_Current.DZero].
..............................................................................
Date: 6,7-OCT-1994   At: Fermi
Topics: Work with Philippe running Cal Trig Tests: -17,9 EM Ref Set #3 stuck-on problem and EM Ref Set #0 skipping +-2 problem, reading And-Or Terms from Large Tile Triggers, ECB meeting with Dave Buchholz

Work running Cal Trig tests. The problem with the comparator being stuck high for EM Ref Set #3 at Trigger Tower -17,9 ended up being that it was shorted to the Tot Et comparator for Tot Et Ref Set #0. Pins #53 and #85 were shorted together on the backplane by the metal clip in the 3M connector on the cable that leads to the ERPB. I pulled off the cable and cut the metal clip out of the connector. I did not try to clip off the backplane pins any more. When the TTL high and the TTL low fought, the CHTCR saw a TTL high. This makes sense because there are 100 ohm resistors between the opposite TTL outputs, and the CHTCR was effectively looking at the middle of these two 100 ohm resistors.

Load and run the newest version of the Cal Trig Test code that checks the large tiles (EWORK1:TRICS_V63.SYS_6OCT94;1). Today the +/- 2 global count problem associated with the count in EM Ref Set #0 appears at a high rate. We skip over all EM Ref Sets (test Tot Ref Sets only) so that we can push on the Large Tile tests. At first it shows about one readout error of Large Tile And-Or Terms in 30k loops.
Then, with the Monitor Pool refresh shut off, it runs 100k loops between errors (but Philippe forgot to read the And-Or cards to see if the symptoms were still the same; we only know that redoing the same loop didn't repeat the error). A new version of the code was made but not loaded (EWORK1:TRICS_V63.SYS_6OCT94;2) that reads the IMLRO card registers twice in a row, also reads the other IMLRO, and reports an error when it finds a discrepancy.

   111513: Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
   206896: Large Tile Count is 4 but REF_2 LT Andor Terms .GE. 1,2,3 are 0,1,1

The new version of the system (i.e. its test code) seemed to interfere with the framework (the "watch double buffer" process kept resynchronizing the pipes), and the older version of the system (ETRICS:TRICS_V62.SYS_7SEP94) was reloaded.

Now back to the EM Ref Set #0 skipping around by +-2 problem. Philippe and Steve track this down to eta +13:+16, phi 1:8. Pushing on the CHTCR and its cables does not help. Pushing on the proper EM Ref Set #0 Tier 2 CAT2 card did help. The operand #1 side of this card (not the CBus side) was pushed back into the card file. It is possible that I actually detected a small movement into the card file when this side of the card was pushed on. Now, is the CBus still properly connected? Check run (82 kloops) of the random test with no errors before giving the system back.

Evening, and more loops of the old TRICS running in eta 1:16. No errors:
   E-HTT/PAR%rand% Loop 675000/1000000, Error Count is 0

Later in the evening, more tests using the newest version of TRICS (EWORK1:TRICS_V63.SYS_6OCT94;2). At first we saw errors every 5 or 10k loops, then it ran for about 60k loops with no errors, then more errors again. This was a special version of the TRICS Cal Trig test that read the And-Or IMLRO card twice and also read the second (Spec Trig 16:31) And-Or IMLRO card two times. During this whole set of tests the Monitor Pool Refresh was turned off.
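The checks behind these error messages can be sketched as below: a Large Tile count of N implies the And-Or terms ".GE. 1,2,3" should read (N>=1, N>=2, N>=3), and a bad IMLRO read can be split into dropped and added bits. The message format is copied from the log; the function names, and treating the expected IMLRO word as known, are hypothetical illustrations, not the TRICS code itself. The 11-versus-15 values come from the "Previously 11 instead of 15" boot message quoted in this entry.

```python
import re

# Message format copied from the log; everything else is a hypothetical sketch.
LT_MSG = re.compile(
    r"Large Tile Count is (\d+) but REF_(\d+) "
    r"LT Andor Terms \.GE\. 1,2,3 are (\d),(\d),(\d)"
)


def expected_lt_terms(count: int) -> tuple:
    """And-Or terms 'count .GE. 1', '.GE. 2', '.GE. 3' implied by a Large Tile count."""
    return tuple(int(count >= k) for k in (1, 2, 3))


def diagnose_lt(line: str):
    """Return (ref_set, dropped, added) thresholds for one error line, or None."""
    m = LT_MSG.search(line)
    if m is None:
        return None
    count, ref = int(m.group(1)), int(m.group(2))
    observed = tuple(int(g) for g in m.group(3, 4, 5))
    expected = expected_lt_terms(count)
    dropped = [k for k, e, o in zip((1, 2, 3), expected, observed) if e and not o]
    added = [k for k, e, o in zip((1, 2, 3), expected, observed) if o and not e]
    return ref, dropped, added


def dropped_bits(read: int, expected: int) -> int:
    """Bits that should be 1 but were read as 0."""
    return expected & ~read


def added_bits(read: int, expected: int) -> int:
    """Bits that should be 0 but were read as 1."""
    return read & ~expected


def classify_reads(m101_first: int, m101_second: int, m102: int, expected: int) -> tuple:
    """Tag each of the three IMLRO reads 'ok' or 'bad' against the expected word."""
    return tuple("ok" if r == expected else "bad"
                 for r in (m101_first, m101_second, m102))


if __name__ == "__main__":
    print(diagnose_lt("111513: Large Tile Count is 2 but REF_7 "
                      "LT Andor Terms .GE. 1,2,3 are 1,0,0"))
    print(diagnose_lt("206896: Large Tile Count is 4 but REF_2 "
                      "LT Andor Terms .GE. 1,2,3 are 0,1,1"))
    # "Previously 11 instead of 15": only the bit of value 4 is lost.
    print(dropped_bits(11, 15), added_bits(11, 15))
    print(classify_reads(15, 11, 15, expected=15))
```

Tallying `classify_reads` results over many events gives the kind of 8-way table that follows, and `dropped_bits`/`added_bits` make the "always dropped, never added" pattern easy to check.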
All possible combinations of errors were seen, e.g. the 1st read could be OK and the 2nd read wrong, or the 1st read could be wrong and the 2nd read OK; both reads from the Spec Trig 0:15 IMLRO could be OK and one of the reads from the Spec Trig 16:31 IMLRO could be wrong.

   M101 1st read   M101 2nd read   M102 read     Events
   -------------   -------------   ---------   --------
       bad             bad            bad             0
       bad             bad            ok              0
       bad             ok             bad             0
       bad             ok             ok             22
       ok              bad            bad             3
       ok              bad            ok             29
       ok              ok             bad            17
       ok              ok             ok         225900

After a while, we noticed that all of the read errors (from either the Spec Trig 0:15 IMLRO or the Spec Trig 16:31 IMLRO) had the bit of value 4 wrong (always dropped, never added). The vast majority of the errors came from FA=13, although some of the other FA's were seen at a considerably lower rate (e.g. FA=11, 12). This maps onto the And-Or terms for LT_Ref_Set#7 >= 2 (more than 50 times), LT_Ref_Set#4 >= 3 (seen 5 times), and LT_Ref_Set#2 >= 1 (seen 4 times). Is this a data path problem in M114, e.g. epoxy???

We can also think about the sporadic background problem appearing during boot or initialize, of the style "Previously 11 instead of 15 @ cbus 2 mba 129 ca 9 fa 3" (as it appears in this logfile's boot sequence). Different ca and fa have been seen, and it was believed to be only a read-back problem. But this also points to the bit of value 4, in the same backplane.

Philippe tried locking the Read and Write A/B pipe control lines to the L1 FW both high and low. This did not appear to help or hurt. At the end of these tests we returned to the old version of TRICS.
..............................................................................
Date: 5-OCT-1994   At: Fermi
Topics: Note to Steve Pier, Visit ANL

Sent a note to Steve Pier at UCI asking how to speed up the LATCH and XMIT_TRIG operation of the ERPB - DC. T2 Supervisor meeting at ANL with L. Price, J. Dawson, R. Blair and T. Fuess of ANL, and A. Lankford, S. Pier and Birgit of UCI.
..............................................................................
Date: 4-OCT-1994   At: Fermi
Topics: Repair +16,9 HD, Work with Al Ito, work with Joey Thompson, Jan says there is a noisy Trig Tower, Change the Init_DAC_Bytes DAC value for +20,23 EM, Repair PDM-14.

The most recent Dan Owen Pulser run showed that +16,9 HD was at 400%. The problem was a cold solder joint in the Term-Attn. Replaced the Term-Attn.

Al Ito is worried about "Latch Efficiency". He is using And-Or Input Terms #84:#87, which are called SCT0:SCT3. I had him put the same signal into all of them and I "programmed" 4 Specific Triggers to look at them with just the And-Or rate scalers. All looked OK. Then he turned on his HV and things looked funny. He will look more at his stuff. He still has logic analyzers and junk stacked up in front of the L1 FW.

Joey Thompson wants a NIM input to the And-Or Network to test some scintillator that looks at the timing of A layer muons. There appear to be some open unused inputs in the NIM to ECL converter in the Bagby-Norm_Amos rack. He is going to use either the 3rd or 4th input from the top, which is Term #122 or #123. Note that these still have old veto-scheme type names in the Trig_Config.CTL file.

Jan Guida reports that Trigger Tower +6,23 EM has been Excluded (or else walked around with the Ref Sets, I'm not sure which). It is most likely noisy HV, but so far there has not been a quiet enough run to see it in the precision readout. I need to find out more about this. Joan is at COMO this week. Note that +6,23 EM has been excluded before (e.g.
around 1-APR-1994).

Trig Tower +20,23 EM (which does not exist) has been causing trouble because its zero energy response has been 9 instead of 8. This causes the Cal Examine EM Trig Tower plot to always have a spike at +20,23, and this compresses the vertical scale. By hand I edited Init_DAC_Bytes.LSM to move the DAC value for this channel from 25 to 23. The hand-done histogram (DAC value --> zero energy response) was 25-->9, 24-->8, 23-->8, 22-->7, 21-->7. Copied the new Init_DAC_Bytes.LSM to MSU::TrgCur.DZero.

Repaired PDM-14, which failed on 2-OCT-1994. Removed the failed -4.5V brick MSU SN#95 and replaced it with MSU SN#40. Tested all 4 supplies with the toaster. A good place to find 208V 3 phase is outside of the main machine shop on the high bay floor. This outlet should be far away from any power distribution panels that support the running experiment. PDM-14 is now stored at D-Zero Hall as a tested spare.
..............................................................................
Date: 2-OCT-1994   At: MSU-Fermi
Topics: M107 Upper Tier 1 Power Pan Fails

After about 7 hours of Tevatron beam (the first in over 1 month) the M107 Upper Tier 1 Power Pan -4.5 Volt brick fails. This is PDM-14, and the -4.5 Volt brick is SN#95, which has only been operating since June 1994. The output from this brick fell to only about -3 Volts. Power cycling restored the supply to normal operation for about 1/2 hour, then it died again. They paged Dan Owen and he came in on his way back from a Cub Scout camp out. Dan Owen at Fermi managed/did the replacement without the elevator. Things started up again OK once this supply was replaced.
..............................................................................
Date: 23-SEP-1994   from: MSU
Topics: Remote diagnostics of CalTrig CHTCR

 - Temporarily load TRICS v6.3 to test new additions made to the random test to check the large tiles up to the And-Or network.
 - This version also has additional "progress report" messages in the CHTCR and CTFE lookup PROM tests to show which PROM is being checked. The CTFE PROM test spends 56.15 s/page for 64 towers (but the remote console was on [at MSU!], and this may slow things down), which is about 20% slower per tower (it was 23.7 s/page for 32 towers). This test should be redone without the remote console to decide if this progress report is worth the slowdown.
 - The initial values in the random test are hard to understand (cf. below). This is after 0 loops on all [1..20] towers to initialize everything, and starting a test for eta [1..16]. The 0 loops on [1..20] initialized all towers with 0 counts of ADC and 0 counts of Threshold.

If one ignores the EM REF_3 entries, the rest is explainable. The Tot Et Reference Sets see the output of the EM+HD PROMs. On page 8, we see that the random test was expecting 2*16*32=1024 towers and finds 2*20*32=1280 towers, i.e. the expectation is missing 256 = 2*4 eta rings (times 32 phis). Note that the pattern is symmetric around the "center page" #4:

   Page      Found - Expected   Missing eta rings
   -------   ----------------   -----------------
   #1 & #7    704 -  544 = 160          5
   #2 & #6   1152 -  896 = 256          8
   #3 & #5   1120 -  864 = 256          8
   #4        1152 -  896 = 256          8

For pages 2 - 6, we are just missing the contribution of eta 17..20. For pages 1 & 7 it seems that "some but not all" of the missing PROMs have a response of 0 for an input of 0. This is to be expected, as it was the method used to minimize the zero energy response (no "negative saturation" for the steepest PROM page slope). Page 1 or page 7 has the greater PROM slope depending on the sign of eta, so we should expect half of these PROMs to have an output of 0 when the input is 0. The 5 eta rings, instead of 4, come from the EM eta 20 PROMs, which have a constant output of 8, so the EM+HD sum cannot be 0 there. So for page 1 or page 7, the towers at + or - 17..19 are supposed to be 0 for an ADC count of 0.
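The initial counts in the error log that follows can be reproduced from the zero-input PROM table given further down in this entry. The sketch below is an illustration under three assumptions inferred from that table's "Pos+Neg" rows, not from any documentation: Tot Et uses (EM+HD)/2 with truncation; negative eta on page p reads the PROM column of page 8-p, while page 8 maps to itself; and an EM tower passes a zero threshold only when EM > 0 and HD = 0 (the HD veto). Names are hypothetical.

```python
# Zero-input PROM responses transcribed from the table in this entry.
# Row i is eta = i+1 (EMP0101..EMP2001, HDP0101..HDP2001); columns are pages 1..8, hex.
EM_ROWS = [
    "02 01 00 00 00 01 02 00",  # eta 1
    "03 02 01 00 00 00 01 00",  # eta 2
    "04 03 02 01 00 00 00 00",  # eta 3
    "05 04 03 02 01 00 00 00",  # eta 4
    "06 05 04 03 02 01 00 00",  # eta 5
    "06 06 05 04 03 02 00 00",  # eta 6
    "06 05 05 04 03 02 00 00",  # eta 7
    "06 05 05 04 03 02 00 00",  # eta 8
    "07 07 06 05 04 03 00 00",  # eta 9
    "07 07 06 05 04 03 00 00",  # eta 10
    "07 07 06 05 04 02 00 00",  # eta 11
] + ["08 08 07 06 05 03 00 00"] * 8 + [  # eta 12..19
    "08 08 08 08 08 08 08 00",  # eta 20
]
HD_ROWS = [
    "02 01 00 00 00 00 01 08",  # eta 1
    "02 01 00 00 00 00 00 08",  # eta 2
    "03 02 02 01 00 00 00 09",  # eta 3
    "04 03 03 02 01 01 00 0A",  # eta 4
    "05 04 04 03 02 01 00 0B",  # eta 5
    "04 03 03 02 01 01 00 0A",  # eta 6
    "06 05 05 04 03 02 01 0C",  # eta 7
    "06 05 05 04 03 02 01 0C",  # eta 8
    "07 06 06 05 04 03 01 0D",  # eta 9
    "07 06 06 05 04 03 01 0D",  # eta 10
    "07 06 06 05 04 03 01 0D",  # eta 11
    "06 05 05 04 03 02 00 0C",  # eta 12
    "06 05 05 04 03 02 00 0C",  # eta 13
] + ["08 07 07 06 05 04 01 0E"] * 7  # eta 14..20


def parse(rows):
    """Hex strings -> list of 8 integer responses per eta."""
    return [[int(v, 16) for v in row.split()] for row in rows]


EM, HD = parse(EM_ROWS), parse(HD_ROWS)
N_PHI = 32


def column(page: int, negative: bool) -> int:
    """0-based PROM column for a page and sign of eta (assumed mirror p <-> 8-p)."""
    if negative and page != 8:
        page = 8 - page
    return page - 1


def tot_count(page: int, max_eta: int) -> int:
    """Towers with nonzero (EM+HD)//2 at zero input, both signs of eta, all phis."""
    total = 0
    for negative in (False, True):
        col = column(page, negative)
        total += sum(1 for eta in range(max_eta)
                     if (EM[eta][col] + HD[eta][col]) // 2 > 0)
    return N_PHI * total


def em_count(page: int, max_eta: int) -> int:
    """EM towers passing a zero threshold with a working HD veto (EM > 0, HD == 0)."""
    total = 0
    for negative in (False, True):
        col = column(page, negative)
        total += sum(1 for eta in range(max_eta)
                     if EM[eta][col] > 0 and HD[eta][col] == 0)
    return N_PHI * total


if __name__ == "__main__":
    for page in range(1, 9):
        found, expected = tot_count(page, 20), tot_count(page, 16)
        print(f"page {page}: Tot found {found}, expected {expected}, "
              f"missing rings {(found - expected) // N_PHI}, EM {em_count(page, 16)}")
```

Run over eta 1..20 versus eta 1..16, the sketch gives the "is X instead of Y" Tot Et pairs of the log and the 5/8 missing-ring pattern; the EM counts it gives (32 on pages 1, 2, 3, 5, 6, 7 and 0 on pages 4 and 8) are the values the log compares against, leaving the extra 1 count in EM REF_3 unexplained by the PROM table alone.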
E-HRD/TST%rand% Global TOT Twr Count REF_0 Page#1 is 704 instead of 544   %% time: 23-SEP-1994 12:59:10.21
E-HRD/TST%rand% Global TOT Twr Count REF_1 Page#1 is 704 instead of 544   %% time: 23-SEP-1994 13:00:27.87
E-HRD/TST%rand% Global TOT Twr Count REF_2 Page#1 is 704 instead of 544   %% time: 23-SEP-1994 13:00:27.97
E-HRD/TST%rand% Global TOT Twr Count REF_3 Page#1 is 704 instead of 544   %% time: 23-SEP-1994 13:00:28.07
E-HRD/TST%rand% Global TOT Twr Count REF_0 Page#2 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:28.95
E-HRD/TST%rand% Global TOT Twr Count REF_1 Page#2 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:29.04
E-HRD/TST%rand% Global TOT Twr Count REF_2 Page#2 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:29.14
E-HRD/TST%rand% Global EM Twr Count REF_3 Page#2 is 33 instead of 32   %% time: 23-SEP-1994 13:00:29.23
E-HRD/TST%rand% Global TOT Twr Count REF_3 Page#2 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:29.33
E-HRD/TST%rand% Global TOT Twr Count REF_0 Page#3 is 1120 instead of 864   %% time: 23-SEP-1994 13:00:30.17
E-HRD/TST%rand% Global TOT Twr Count REF_1 Page#3 is 1120 instead of 864   %% time: 23-SEP-1994 13:00:30.27
E-HRD/TST%rand% Global TOT Twr Count REF_2 Page#3 is 1120 instead of 864   %% time: 23-SEP-1994 13:00:30.37
E-HRD/TST%rand% Global EM Twr Count REF_3 Page#3 is 33 instead of 32   %% time: 23-SEP-1994 13:00:30.46
E-HRD/TST%rand% Global TOT Twr Count REF_3 Page#3 is 1120 instead of 864   %% time: 23-SEP-1994 13:00:30.56
E-HRD/TST%rand% Global TOT Twr Count REF_0 Page#4 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:31.40
E-HRD/TST%rand% Global TOT Twr Count REF_1 Page#4 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:31.49
E-HRD/TST%rand% Global TOT Twr Count REF_2 Page#4 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:31.59
E-HRD/TST%rand% Global EM Twr Count REF_3 Page#4 is 1 instead of 0   %% time: 23-SEP-1994 13:00:31.68
E-HRD/TST%rand% Global TOT Twr Count REF_3 Page#4 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:31.78
E-HRD/TST%rand% Global TOT Twr Count REF_0 Page#5 is 1120 instead of 864   %% time: 23-SEP-1994 13:00:32.62
E-HRD/TST%rand% Global TOT Twr Count REF_1 Page#5 is 1120 instead of 864   %% time: 23-SEP-1994 13:00:32.72
E-HRD/TST%rand% Global TOT Twr Count REF_2 Page#5 is 1120 instead of 864   %% time: 23-SEP-1994 13:00:32.81
E-HRD/TST%rand% Global EM Twr Count REF_3 Page#5 is 33 instead of 32   %% time: 23-SEP-1994 13:00:32.91
E-HRD/TST%rand% Global TOT Twr Count REF_3 Page#5 is 1120 instead of 864   %% time: 23-SEP-1994 13:00:33.01
E-HRD/TST%rand% Global TOT Twr Count REF_0 Page#6 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:33.88
E-HRD/TST%rand% Global TOT Twr Count REF_1 Page#6 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:33.98
E-HRD/TST%rand% Global TOT Twr Count REF_2 Page#6 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:34.08
E-HRD/TST%rand% Global EM Twr Count REF_3 Page#6 is 33 instead of 32   %% time: 23-SEP-1994 13:00:34.17
E-HRD/TST%rand% Global TOT Twr Count REF_3 Page#6 is 1152 instead of 896   %% time: 23-SEP-1994 13:00:34.27
E-HRD/TST%rand% Global TOT Twr Count REF_0 Page#7 is 704 instead of 544   %% time: 23-SEP-1994 13:00:35.03
E-HRD/TST%rand% Global TOT Twr Count REF_1 Page#7 is 704 instead of 544   %% time: 23-SEP-1994 13:00:35.13
E-HRD/TST%rand% Global TOT Twr Count REF_2 Page#7 is 704 instead of 544   %% time: 23-SEP-1994 13:00:35.22
E-HRD/TST%rand% Global EM Twr Count REF_3 Page#7 is 33 instead of 32   %% time: 23-SEP-1994 13:00:35.32
E-HRD/TST%rand% Global TOT Twr Count REF_3 Page#7 is 704 instead of 544   %% time: 23-SEP-1994 13:00:35.41
E-HRD/TST%rand% Global TOT Twr Count REF_0 Page#8 is 1280 instead of 1024   %% time: 23-SEP-1994 13:00:36.95
E-HRD/TST%rand% Global TOT Twr Count REF_1 Page#8 is 1280 instead of 1024   %% time: 23-SEP-1994 13:00:37.05
E-HRD/TST%rand% Global TOT Twr Count REF_2 Page#8 is 1280 instead of 1024   %% time: 23-SEP-1994 13:00:37.15
E-HRD/TST%rand% Global EM Twr Count REF_3 Page#8 is 1 instead of 0   %% time: 23-SEP-1994 13:00:37.27
E-HRD/TST%rand% Global TOT Twr Count REF_3 Page#8 is 1280 instead of 1024   %% time: 23-SEP-1994 13:00:37.36

To explain the initial values for the Tot Et Tower Counts, look at this summary of the zero input response of the PROMs:

   PROM / page#              1    2    3    4    5    6    7    8
   EMP0101                  02   01   00   00   00   01   02   00
   EMP0201                  03   02   01   00   00   00   01   00
   EMP0301                  04   03   02   01   00   00   00   00
   EMP0401                  05   04   03   02   01   00   00   00
   EMP0501                  06   05   04   03   02   01   00   00
   EMP0601                  06   06   05   04   03   02   00   00
   EMP0701                  06   05   05   04   03   02   00   00
   EMP0801                  06   05   05   04   03   02   00   00
   EMP0901                  07   07   06   05   04   03   00   00
   EMP1001                  07   07   06   05   04   03   00   00
   EMP1101                  07   07   06   05   04   02   00   00
   EMP1201                  08   08   07   06   05   03   00   00
   EMP1301                  08   08   07   06   05   03   00   00
   EMP1401                  08   08   07   06   05   03   00   00
   EMP1501                  08   08   07   06   05   03   00   00
   EMP1601                  08   08   07   06   05   03   00   00
   EMP1701                  08   08   07   06   05   03   00   00
   EMP1801                  08   08   07   06   05   03   00   00
   EMP1901                  08   08   07   06   05   03   00   00
   EMP2001                  08   08   08   08   08   08   08   00
   EM zeroes                 0    0    1    2    3    3   17   20
   Pos+Neg zeroes           17    3    4    4    4    3   17   40

   HDP0101                  02   01   00   00   00   00   01   08
   HDP0201                  02   01   00   00   00   00   00   08
   HDP0301                  03   02   02   01   00   00   00   09
   HDP0401                  04   03   03   02   01   01   00   0A
   HDP0501                  05   04   04   03   02   01   00   0B
   HDP0601                  04   03   03   02   01   01   00   0A
   HDP0701                  06   05   05   04   03   02   01   0C
   HDP0801                  06   05   05   04   03   02   01   0C
   HDP0901                  07   06   06   05   04   03   01   0D
   HDP1001                  07   06   06   05   04   03   01   0D
   HDP1101                  07   06   06   05   04   03   01   0D
   HDP1201                  06   05   05   04   03   02   00   0C
   HDP1301                  06   05   05   04   03   02   00   0C
   HDP1401                  08   07   07   06   05   04   01   0E
   HDP1501                  08   07   07   06   05   04   01   0E
   HDP1601                  08   07   07   06   05   04   01   0E
   HDP1701                  08   07   07   06   05   04   01   0E
   HDP1801                  08   07   07   06   05   04   01   0E
   HDP1901                  08   07   07   06   05   04   01   0E
   HDP2001                  08   07   07   06   05   04   01   0E

   (EM+HD)/2 zeroes          0    0    2    2    3    4   18    0
   Pos+Neg zeroes           18    4    5    4    5    4   18    0
   Pos+Neg non-zeroes       22   36   35   36   35   36   22   40
   non-zeroes *32 phis     704 1152 1120 1152 1120 1152  704 1280

How to explain the initial counts on the EM Tower counts: to get an EM tower above threshold, we must have EM > 0 and HD = 0. This never happens on page 4.
For page 7 this happens at eta = +2, page 6 at +1, page 3 at +2, and by extrapolation page 1 at -2, page 2 at -1, page 5 at -2. There is a systematic offset of 1 count in EM Ref Set #3 for all pages except page #1. One can assume that it comes from -17;9 as seen on 22-SEP. Page #1 is the only page where the EM zero input response is 0 (the HD response is always > 0). Possible explanation: this behaves as if the HD veto was not kicking in (on any page) and the tower passes its threshold as soon as EM > 0, with no regard to HD > 0.

 - First attempts at running the new test weren't very successful because of a bug that prevented reporting errors for the Large Tile And-Or terms. 144,000 loops were performed on eta 1..16 with no other error.
 - After fixing the code bug, ran 839902 loops: 27 Large Tile errors, no other error detected (log file is TRICS_23SEP94.LOG;3).

   28581/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 11,HD,POS,E_4,P_11,LUP_3-8-8-8,EMET_REF,REF_2,42,CMP_2
   53283/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 37,HD,NEG,E_9,P_12,LUP_3-4-6-8,EMET_REF,REF_3,143,CMP_1
   110355/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 189,HD,NEG,E_4,P_29,LUP_7-7-8-8,EMET_REF,REF_0,39,CMP_1
   133716/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 10,EM,NEG,E_1,P_21,LUP_5-1-8-8,TOTET_REF,REF_0,82,CMP_2
   166750/1000000, Large Tile Count is 3 but REF_2 LT Andor Terms .GE. 1,2,3 are 0,1,1
      Pick was 228,EM,POS,E_7,P_30,LUP_8-6-4-8,EMET_REF,REF_1,42,CMP_0
   187046/1000000, Large Tile Count is 5 but REF_4 LT Andor Terms .GE. 1,2,3 are 1,1,0
      Pick was 46,HD,NEG,E_13,P_31,LUP_4-2-6-8,EMET_REF,REF_0,6,CMP_2
   230247 Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 185,EM,NEG,E_2,P_6,LUP_5-3-4-8,TOTET_REF,REF_1,119,CMP_2
   248206 Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 241,EM,NEG,E_3,P_18,LUP_3-5-1-8,TOTET_REF,REF_1,244,CMP_1
   292822/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 100,EM,POS,E_6,P_18,LUP_6-8-3-8,EMET_REF,REF_2,39,CMP_0
   345197/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 49,HD,POS,E_3,P_12,LUP_8-8-3-8,EMET_REF,REF_3,176,CMP_2
   375297/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 122,HD,NEG,E_3,P_14,LUP_8-5-3-8,EMET_REF,REF_1,220,CMP_0
   375533/1000000, Large Tile Count is 3 but REF_2 LT Andor Terms .GE. 1,2,3 are 0,1,1
      Pick was 13,EM,NEG,E_16,P_20,LUP_7-6-8-8,HDET_VETO,REF_2,28,CMP_3
   422227/1000000, Large Tile Count is 3 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,1
      Pick was 19,HD,NEG,E_11,P_27,LUP_4-8-4-8,EMET_REF,REF_1,148,CMP_0
   519607/1000000, Large Tile Count is 4 but REF_4 LT Andor Terms .GE. 1,2,3 are 1,1,0
      Pick was 82,HD,NEG,E_14,P_26,LUP_8-2-1-8,EMET_REF,REF_0,168,CMP_1
   536559/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 57,HD,NEG,E_13,P_31,LUP_3-1-5-8,TOTET_REF,REF_2,125,CMP_2
   538971/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 120,HD,POS,E_14,P_5,LUP_1-1-3-8,HDET_VETO,REF_2,147,CMP_1
   584306/1000000, Large Tile Count is 4 but REF_4 LT Andor Terms .GE. 1,2,3 are 1,1,0
      Pick was 188,HD,POS,E_11,P_2,LUP_6-4-3-8,HDET_VETO,REF_3,186,CMP_2
   599448/1000000, Large Tile Count is 4 but REF_2 LT Andor Terms .GE. 1,2,3 are 0,1,1
      Pick was 106,EM,POS,E_12,P_27,LUP_1-8-2-8,TOTET_REF,REF_0,167,CMP_1
   644079/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 118,HD,NEG,E_6,P_25,LUP_1-8-7-8,EMET_REF,REF_3,235,CMP_2
   647084/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 245,HD,NEG,E_12,P_15,LUP_6-6-5-8,TOTET_REF,REF_0,65,CMP_0
   722884/1000000, Large Tile Count is 3 but REF_2 LT Andor Terms .GE. 1,2,3 are 0,1,1
      Pick was 143,EM,NEG,E_16,P_21,LUP_2-4-4-8,TOTET_REF,REF_3,38,CMP_3
   734397/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 132,EM,NEG,E_2,P_24,LUP_4-7-7-8,EMET_REF,REF_1,201,CMP_0
   742611/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 239,EM,NEG,E_12,P_10,LUP_3-5-3-8,TOTET_REF,REF_3,99,CMP_2
   757132/1000000, Large Tile Count is 3 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,1
      Pick was 61,HD,NEG,E_16,P_23,LUP_8-2-3-8,TOTET_REF,REF_1,218,CMP_0
   780123/1000000, Large Tile Count is 3 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,1
      Pick was 167,HD,POS,E_12,P_18,LUP_7-6-1-8,TOTET_REF,REF_2,165,CMP_0
   798460/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 236,HD,POS,E_12,P_30,LUP_8-6-6-8,TOTET_REF,REF_0,229,CMP_0
   839902/1000000, Large Tile Count is 2 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,0
      Pick was 210,HD,NEG,E_16,P_2,LUP_5-7-5-8,EMET_REF,REF_3,163,CMP_1

In all cases the error cannot be reproduced by repeating the same loop. We have an intermittent problem with the large tiles, or with the test.

Made the following TRICS command files (backed up on MSU::[TRG_CURRENT.DZERO] only, since these files are only of temporary use):
   READ_LARGE_TILE.TCC         read LgTile pattern on tier #2 LTCC cards
   READ_LARGE_TILE_ANDOR.TCC   read LgTile Andor terms on FW IMLRO
   CYCLE_TSS_LGT_T3.TCC        cycle the TSS to latch the Tier #2 LTCC card
   CYCLE_TSS_LGT_T2.TCC        cycle the TSS to latch the Tier #3 LTCC card
   CYCLE_LGT_BY_HAND.TCC       recursive call to previous files

The LgTile Pattern shows nothing peculiar and all reference sets agree. Reading the And-Or Terms did not show the problem, i.e. the And-Or terms were now fine, even without single cycling the Cal Trig again. At this time, the CT MTG was "parked" and the FW MTG was free running. Made another version of the system which makes the FW MTG also single cycle (log file is TRICS_23SEP94.LOG;5).
I only had time to catch 1 error before returning the system to Jan et al.

   39358/100000, Large Tile Count is 3 but REF_7 LT Andor Terms .GE. 1,2,3 are 1,0,1
      Pick was 164,EM,POS,E_7,P_10,LUP_8-7-7-8,HDET_VETO,REF_2,158,CMP_2

and the symptoms were identical:
   - no problem seen on the LgTile pattern
   - no problem seen on the And-Or cards
   - the problem did not repeat upon redoing the same loop
Was the code really single cycling the FW MTG? I think so, but I didn't have time to really prove it to myself. Is this a problem of flaky reading of the IMLRO?
 - At the end of all this, the older system V6.2 was reloaded, as the V6.3 diagnostics code is not yet stable/useful.
..............................................................................
Date: 21-22-SEP-1994   from: MSU
Topics: Remote diagnostics of CalTrig CHTCR: all CHTCRs check out, 2 * 100 k loops of random test ok.

 - Philippe runs the CHTCR test on full coverage eta +/- 1..20. No errors.
   Method: use INITIAL TRGTWR before testing each card:
      1) use initialize Trigger Tower to clean up the state left by the test.
      2) select the first CHTCR for PROM test, run the test.
      3) go back to (1) for the next CHTCR.
   Initializing the towers with 0 loops of random test (as described in the entry from 22-DEC-1993 in D0_HALL_LOGBOOK.LBK_1993) didn't work:
      - tried testing the CHTCR at either + or - 1..4;1..8
      - everything is fine through EM Ref Set # 0, 1, 2
      - reaching EM Ref Set #3, the initial global count is 1 instead of 0. Using Tree Browser, I found the culprit at EM tower -17;9.
   The 0 loops of random test left all thresholds (em, hd_veto and tot) programmed with zero, and all towers are in simu mode, with simu ADC=0. The answer on all EM Ref Sets should be 0 because for eta 1..2 the zero input response of the EM PROM is 0, and for eta 3..20 the zero input response of the HD PROM is > 0.
   I tried releasing the CT MTG, and re-aiming the read/write pipes, then changing the threshold to see if I could shut it off at all.
    I didn't succeed in seeing any change in this tower. But I couldn't change its neighbors either. I don't know what I was doing wrong.
    Correction: I didn't take into account the HD veto. This needs to be investigated.
    Note that TRGMON shows the EM#3 count at 0 after initialize.
  - All Tot Et Ref Sets initial counts were 1136 instead of 0.
    I believe 1136 can be derived from the 1152 initial count (cf entry for 23-sep) by removing the contribution of the 4*8 towers in the CHTCR cell being tested; but 16 of the 32 towers (eta = 1,2) in this range were already not contributing, so only 16 are removed (1152 - 16 = 1136).
    This is systematic across all 4 Tot Et ref sets, and is a new(?) effect since we changed the HD PROMs for the L1.5 CT EM lookup.
    This change made the CHTCR test use page #4 where it was page 8 before (since HD page 8 is now constant > 0).
- Run 2 x 100k loops of random test. No errors.
..............................................................................
Date: 16-SEPT-1994   At: Fermi
Topics: Deliver spare CRC, 68k_Services ABS files, Tests of L1 Cal Trig.

Deliver the 3rd L15CT CRC card to the D0 Hall to be kept here as a spare. It is CRC serial number SN#1.

The 68k_Services files available in TrgCur: are L15CT_68K_Services.Abs and L15CT_68K_Services.Abs_Panic. I left the "Panic" mode code loaded. These are the most recent versions, i.e. 4 Terms and not confused by $ff's from TCC.

Testing of the L1 Cal Trig
--------------------------
Started by running Caltrg_Random in the eta range 1:16. I tried to make some runs of 50k loops. In the beginning I had the "normal" problem of Global EM Tower Count Ref Set 0 dancing around +-2 of the proper value in the tower count range of 140 or 150. Then things settled down and it was running 50k loops OK. This is eta 1:16, all Pages, all Ref Sets.
I also tried some 50k loop runs with just Page 4.
Next I tried 1,000,000 loops, all Pages, all Ref Sets.
This made it to:
   E-HTT/PAR%rand% Loop 896000/1000000, Error Count is 0
at which point it was stuck in some funny repeating +-4 problem of:
   Global HD 1st Energy Sum is 66155 instead of 66159, T1 trunc =65536
   Global HD 1st Energy Sum is 66380 instead of 66376, T1 trunc =65536
   Global HD 1st Energy Sum is 62318 instead of 62314, T1 trunc =69632
   Global HD 2nd Energy Sum is 62436 instead of 62432, T1 trunc =69632
   Global HD 2nd Energy Sum is 66139 instead of 66135, T1 trunc =65536
After starting back up, everything looked fine and it made it through 31k loops of a 50k loop run with no errors, at which point the system was Initialized out from under it.

Next I started the memory test scan of eta 1:16, all Pages. Test type is Lookup, both signs of eta 1:16, all phi's, both channel types, all Pages, 1 loop, normal I/O. This started at 20:52 and ran until 22:43 with no errors shown.

Update 28-SEP-1994 : Philippe looks at log files
   Notice that the EM Ref Set 0 +/- 2 problem is all happening when the tower is picked in the +13..16;1..8 range, except for the first occurrence. This granularity maps onto a Tier #1 CHTCR or a Tier #2 CAT2 input.
      HD,NEG,E_11,P_9   HD,POS,E_16,P_7   HD,POS,E_14,P_6   HD,POS,E_14,P_6
      HD,POS,E_16,P_7   EM,POS,E_16,P_2   EM,POS,E_15,P_8   HD,POS,E_13,P_6
      HD,POS,E_16,P_3   HD,POS,E_16,P_6   HD,POS,E_16,P_3   EM,POS,E_14,P_8
      HD,POS,E_13,P_1   HD,POS,E_13,P_6   HD,POS,E_16,P_6   HD,POS,E_14,P_6
   The HD problem seems to be located on a CTFE card:
      HD Tier 1 CAT2 operand for NEG,E_1_4,P_1 Page#2 is 569 instead of 573
   The problem is either intermittent (some days yes, some days not) or needs a very unlikely combination of trigger tower energies (it took 900k loops).
   It is also flaky once the system gets into the screwed-up mode behind by 4 (it may give the correct answer, appearing as ahead by 4, but redoing the same loop goes back to bad, i.e. no error).
..............................................................................
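The HD energy-sum errors in the 16-SEPT-1994 entry above are all off by exactly 4, and the CAT2 operand was 569 instead of 573. An offset of exactly one power of two is what one would expect from a single dropped or stuck bit (here the bit of value 4), although the offset alone does not prove it. The helper below is a hypothetical Python sketch for classifying such errors from the logged numbers; it is not part of the test software.

```python
def power_of_two_offset(observed, expected):
    """Return k if |observed - expected| == 2**k, else None.

    An offset of exactly 2**k is consistent with (but does not prove) a single
    bad bit of value 2**k somewhere in the adder tree feeding the sum.
    """
    diff = abs(observed - expected)
    if diff == 0 or diff & (diff - 1):  # zero, or more than one bit set
        return None
    return diff.bit_length() - 1


if __name__ == "__main__":
    # (observed, expected) pairs copied from the log entry above
    errors = [(66155, 66159), (66380, 66376), (62318, 62314),
              (62436, 62432), (66139, 66135), (569, 573)]
    for observed, expected in errors:
        k = power_of_two_offset(observed, expected)
        print(observed, expected, "-> bit", k, "(value", 2 ** k, ")")
```

The same helper returns 12 (value 4096) for the Px/Py errors in the 23-24-DEC-1994 entry, e.g. -490 instead of -4586.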
Date: 7,8,9-SEPT-1994   At: Fermi
Topics: Group Meeting, Install the last 13 L15 FW Term Receiver PAL's, Swap the IML cards for L1 FW Input Terms 128:255, Install new version of TRICS, Install new version of L15CT 68k_Ser, Tests of L15CT, Recorded cosmic run with L15CT filtering 4 L1 SpTrgs.

Cooked and installed the last 13 L15 FW Term Receiver PAL's. I cooked two spare parts and put them in the programmed parts container. The UniSite model 48 at D-Zero is dead again. I let Sten Hansen know and used a cooker in the 9th floor WH.

While power was off I swapped the last 2 IML's for non "H" IML's. The first two were finally swapped a couple of months ago. This time I swapped the ones for L1 Terms 128:255.
   In M101 pull SN#11, install SN#13.
   In M102 pull SN#14, install SN#16.

Start running TRICS Version 6.2 7SEP94. Also start running the newest 68k_Ser code, which is protected from having its Software Flags overwritten when TRICS loads the parameter block into the 68k's "dual port" memory area.

L15CT Tests
-----------
Tested L15CT (using the simple CAL15CT trigger configuration file) on all 4 of its L15 FW Terms (i.e. #16:#19). Did this by editing the VWork1: temporary command files L15CT_ALL.COM and L15CT_CMD.COM and by moving the L15 FW Term with TRICS.
Verified that L15CT worked with parameter values of "0" and "2" assigned to the Global DSP.

Made a recorded test run of L15CT filtering on 4 L1 Spec Trig's. This is just a "cosmic + noise" run. It is Run Number 83580. The file is DATA3:[CAL]EMC_083580_01.X_ZRD01;1, 271038 blocks. This is about 1000 events.
I worked with Bill Cobau's MULTI_CAL15 trigger configuration to get something that would run without beam, i.e. just with noise and cosmics.
The following is the basic setup as used in this run:

L1 Spec Trig
Numb Name      L1 Conditions       L15CT Conditions
------------   -----------------   ----------------------------------------
 7 EM_1_Max    1 EM > 2.5 GeV      1 elec, 1x2 EM Sum > 5.5 GeV Iso > 0.80
 8 EM_2_Med    2 EM > 0.5 GeV      2 elec, 1x2 EM Sum > 1.0 GeV Iso > 0.10
10 EM_1_Miss   1 EM > 2.5 GeV      1 elec, 1x2 EM Sum > 5.5 GeV Iso > 0.10
               + MsPt > 7.5 GeV
12 EM_Jet      1 EM > 2.5 GeV      1 elec, 1x2 EM Sum > 5.5 GeV Iso > 0.10
               + 2 JT > 1.0 GeV

The following are the changes that I made on the fly to Bill's MULTI_CAL15 to get things running at a couple of Hz with reasonable L15CT rejection ratios.

 1. Execute the Force L0 command file.
 2. Change the following L1 Reference Sets:
      EM  Ref Set #0 was 12.0 GeV over eta 1:19, set to 2.5 GeV same eta
      EM  Ref Set #1 was  2.5 GeV over eta 1:19, List Builder, no change
      EM  Ref Set #2 was  7.0 GeV over eta 1:19, set to 0.5 GeV same eta
      EM  Ref Set #3 was 12.0 GeV over eta 1:13, set to 2.5 GeV same eta
      Tot Ref Set #0 was  5.0 GeV over eta 1:20, set to 1.0 GeV same eta
      Tot Ref Set #1 was  3.0 GeV over eta 1:20, List Builder, no change
 3. Change the Level 1 Missing Pt Threshold #0 from 15.0 GeV to 7.5 GeV.
 4. Change all L1 prescales to 1.
 5. Change the L15CT Term #0 EM Ref Set from 7.0 GeV to 1.0 GeV with the same eta 1:19 coverage, i.e. any EM Trig Tower over 1.0 GeV was a L15CT candidate.
 6. Change the following L15CT Term Parameters:

      Cobau's original Beam Running L15CT setup

      L15 Term   1x2 EM Et   EM vs. Tot     Count       Used on L1
      Number     Threshold   Ratio Thresh   Threshold   Spec Trig's
      --------   ---------   ------------   ---------   -----------
         0       15.0 GeV        0.80           1       7
         1       10.0 GeV        0.10           2       8
         2       15.0 GeV        0.10           1       10, 12

      This Run's L15CT Setup for Cosmic and Noise Running

      L15 Term   1x2 EM Et   EM vs. Tot     Count       Used on L1
      Number     Threshold   Ratio Thresh   Threshold   Spec Trig's
      --------   ---------   ------------   ---------   -----------
         0        5.5 GeV        0.80           1       7
         1        1.0 GeV        0.10           2       8
         2        5.5 GeV        0.10           1       10, 12

 7. The L15CT Mark and Force Pass Ratio was set to 25.
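Read together, the parameter tables above suggest how each L15CT Term decides: count the candidate EM objects whose 1x2 EM Et exceeds the Et threshold and whose EM vs. Tot ratio exceeds the ratio threshold, and answer true when that count reaches the Count Threshold. The Python sketch below encodes that reading with this run's cosmic and noise settings. It is an assumption about the logic, not the DSP code, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    em_1x2_et: float    # 1x2 EM Et sum in GeV
    em_over_tot: float  # EM vs. Tot ratio


@dataclass
class L15Term:
    et_threshold: float     # "1x2 EM Et Threshold" column, GeV
    ratio_threshold: float  # "EM vs. Tot Ratio Thresh" column
    count_threshold: int    # "Count Threshold" column

    def passes(self, candidates):
        """True when enough candidates pass both cuts (assumed strict '>' cuts)."""
        n = sum(1 for c in candidates
                if c.em_1x2_et > self.et_threshold
                and c.em_over_tot > self.ratio_threshold)
        return n >= self.count_threshold


# This run's cosmic and noise settings from the table above, keyed by L15 Term number.
COSMIC_TERMS = {
    0: L15Term(5.5, 0.80, 1),  # used on L1 Spec Trig 7
    1: L15Term(1.0, 0.10, 2),  # used on L1 Spec Trig 8
    2: L15Term(5.5, 0.10, 1),  # used on L1 Spec Trigs 10, 12
}

if __name__ == "__main__":
    event = [Candidate(6.0, 0.5), Candidate(0.8, 0.9)]
    for number, term in COSMIC_TERMS.items():
        print("L15 Term", number, "passes:", term.passes(event))
```

Under this reading, a Count Threshold of 0 makes a Term answer true for every event, which is consistent with the remark in the 19-AUG-1994 entry that a zero count threshold removes the need for a separate "PANIC" mode.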
Making these changes resulted in the following typical performance with just noise and cosmics in the calorimeters:

Global Monitoring of All Allocated Specific Triggers    9-SEP-94 12:42:06
Integr Time DBSC/SBSC: 59.9/59.9 s
Global Event Transfer Rate: 3.49 Hz
Level 1: Running Information:
Fresh Global Level 1 Trigger Rate: 7.51 Hz
Fast Level 0 Good: 0.00 Hz
Level 1.5 Input/Reject: 6.9 Hz/58.1 %
Dead Beam X During Level 1.5: 0.1 %
Time Since Last Initialize: 0 03:16:47
Events Transf Since: 8109

                                                       |Tot|Tot |Total|
 Sp.|Firing| Andor|Prscl|L 1.5|Events|Globl|F-End|Level|And|Strt|Watch|
 Trg|  Rate|  Rate|Ratio|Rejct|Transf|Expos| Busy|2 Dis|Trm|Dgtz|Busy |
 ---|----Hz|----Hz|-----|----%|------|----%|----%|----%|---|----|-----|-------
   0|  0.58|286275| 500k|    0|  6339|  0.0|  0.0|  0.2|  1|   1|    1|
   7|  6.92|  6.92|    1| 58.1|  4665| 99.8|  0.0|  0.2|  8|   9|    9|L1.5
   8|  0.65|  0.65|    1| 89.7|   468| 99.8|  0.0|  0.2|  7|   9|    9|L1.5
  10|  3.59|  3.59|    1| 40.0|  2272| 99.8|  0.0|  0.2|  9|   9|    9|L1.5
  12|  0.48|  0.48|    1| 51.7|   331| 99.8|  0.0|  0.2|  9|   9|    9|L1.5
  30|237962|238531|    1|    0|******| 99.8|  0.0|  0.2|  4|   9|    9|
  31|  0.00|286275|    1|    0|    67|    0|  0.0|  0.2|  0|   1|    1|

This clip from TrgMon was taken during this run. It shows the typical L15CT rejections. It is the average of 1 minute of running. The run actually lasted for about 5 minutes.
..............................................................................
Date: 24,25-AUG-1994   At: Fermi
Topics: ECB Meetings, New TCC code, New TRICS_Init_Auxi_L15CT.Dat, Start loading the DSP BLX files directly from TCC disk, Test the 4 Term L15CT

Moved to a new version of TRICS that accepts reentrant commands and that sends us a mail message if it has to say FAILURE BAD at Init time. If there are any problems, the backup path is to TRICS_V62.SYS_19AUG94;4

We also changed the TRICS_Init_Auxi_L15CT.Dat file so that it does not say anything about any L15CT parameters. TRICS_Init_Auxi_L15CT.Dat now depends on the L15CT_Default_Config.Dat file to execute correctly.
Also TRICS_Init_Auxi_L15CT.Dat was changed to load the DSP's from TCC's disk. The backup path for the TRICS_Init_Auxi_L15CT.Dat file is in [TrgCur.Obsolete]. All of this new stuff appears to be working fine.

Start loading the DSP BLX files directly from TCC disk
------------------------------------------------------
This change was made because we have seen times when the network link to the host was very slow or even timing out when TCC tried to get the BLX files from the host. So now it gets them from its own disk, from directory D0HTCC::DUA0:[L15CT$EXEC].
Where TCC will look for the DSP BLX files is controlled by what COOR puts in its LOADCODE command. What COOR puts in the LOADCODE command is controlled by what is in the file CTL:Trig_Config.Ctl. Near the end of this file, in the Cal_L15_crate 0 Code_directory statement, it used to say L15CT$EXEC: ; today I changed this to FROM_LOCAL_DISK.
Note that we have not erased the BLX files from the host, so we can go back to loading from there if necessary.

Test the 4 Term L15CT
---------------------
Between Stores I tested the various Terms of the 4 Term L15CT in the following way.
I used our standard Cal_Trig_L15 trigger configuration to set up the COOR and L2 parts and to load up and get L15CT running on its Term #0 aka L15 FW Term #16. All of this went as normal and ran OK.
Then I paused the run and used TRICS to move to L15 FW Term #17, and L15CT_Cmd.Com and L15CT_All.Com to move to L15CT Term #1. This also looks OK.
I paused the run again and used TRICS to move to L15 FW Term #18, and L15CT_Cmd.Com and L15CT_All.Com to move to L15CT Term #2. This looks OK.
I had trouble when I tried to move to L15 FW Term #19. TRICS has eyes !! It looked into rack M103 and saw no L15 FW Term Receiver PAL for terms #19 and above, and this caused it to answer BAD PARAM when I tried to set up on L15CT TERM #3.
Anyway L15CT Terms #0, #1, and #2 look OK and I need to cook more L15 FW Term Receiver PALs.
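The Term test above relies on a fixed correspondence: L15CT Term #n sits on L15 FW Term #(16 + n), and TRICS refused Term #3 because it found no L15 FW Term Receiver PAL for FW Term #19. A hypothetical Python sketch of that bookkeeping follows; the hardware-database lookup that TRICS does is modelled here, as an assumption, by a plain set of the FW term numbers that have a receiver PAL installed.

```python
class BadParam(Exception):
    """Stand-in for TRICS answering BAD PARAM (hypothetical, not TRICS code)."""


FIRST_L15_FW_TERM = 16  # L15CT Term #0 is L15 FW Term #16
N_L15CT_TERMS = 4       # the 4 Term L15CT: Terms #0..#3


def l15_fw_term(l15ct_term, fw_terms_with_receiver_pal):
    """Map an L15CT Term to its L15 FW Term, refusing if no receiver PAL is present."""
    if not 0 <= l15ct_term < N_L15CT_TERMS:
        raise BadParam(f"L15CT Term #{l15ct_term} does not exist")
    fw_term = FIRST_L15_FW_TERM + l15ct_term
    if fw_term not in fw_terms_with_receiver_pal:
        raise BadParam(f"no L15 FW Term Receiver PAL for Term #{fw_term}")
    return fw_term


if __name__ == "__main__":
    # Only the FW terms L15CT uses are listed; nothing installed for #19 on 24,25-AUG-1994.
    installed = {16, 17, 18}
    for term in range(N_L15CT_TERMS):
        try:
            print(f"L15CT Term #{term} -> L15 FW Term #{l15_fw_term(term, installed)}")
        except BadParam as err:
            print(f"L15CT Term #{term}: BAD PARAM ({err})")
```

Once the last receiver PALs were installed (7,8,9-SEPT-1994 entry), adding 19 to the set lets Term #3 through, consistent with the later test on FW Terms #16:#19.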
..............................................................................
Date: 23-AUG-1994   At: MSU
Topics: Remember where the Trigger programming can be found

The Postscript versions of Trigger Lists Versions 8.0 and above can be found in the directory: D0$CONFIGS$TRIGLIST
Also in that directory are a "Trigger History" file (in ASCII), a Glossary of Terms used in the Trigger Lists (also in ASCII), and "Strawman" versions of Trigger Lists 8.0 (in Postscript).
Additionally, the Trigger Lists which are actually used to define the running of D0 (i.e. the Trigger Lists read by TRIGPARSE) for Global Running in Run 1B are in the directory: D0$CONFIGS$RUN1B_GLB
The trigparse file and postscript file are in FNALD0::D0$L2BETA:[CONFIGS.RUN1B_GLB]
..............................................................................
Date: 22-AUG-1994   At: MSU
Topics: Collect more L1 Spec Trig's overlap data.

Collected Spec Trig's overlap data starting at 13:28 at a channel 13 luminosity of 12.3.
Finished collecting at 15:46 at a channel 13 luminosity of 10.4.
Collected 1680 events, of which 103 were Spec Trig #31 only. This is in the file: VWork1:SpTrig_Fired_List_1328_22AUG94.Txt
..............................................................................
Date: 19-AUG-1994   At: DZero
Topics: Install 4 Term L15CT, look at 3 more events that fail L2 XYZ with negative energy.

Move to Using 4 Term L15CT
--------------------------
This move involves changes to: TCC, DSP, 68k_Ser, TRICS_Init_Auxi_L15CT.Dat, and adding a new file called L15CT_Default_Config.Dat

The known remaining problems with the new 4 Term L15CT include: reentrant calls in the "configuration" files for TCC, which cause "failure bad" at Initialize time.

Because of delays (and sometimes failures) in getting DSP code over the network if the host system is busy or if the network is busy when TCC is trying to load up the DSP's, Edmunds remains uncontrollably cranked up to keep the BLX files on TCC disk.
New features of the 4 Term DSP code include: programmable count thresholds for the number of objects required before the Term answers "yes, true". The legal range of the count threshold includes zero, so there no longer needs to be a "PANIC" mode.

What to push next: more testing of the 4 Term L15CT, test down load, and then an in-beam test of 4 rationally defined physics terms.

Look at 3 more events that fail L2 XYZ with negative energy
-----------------------------------------------------------
From the evening of 18-AUG-1994 into 19-AUG-1994, for about 12 hours, 3 L2 nodes were set with break points to stop if negative energy in the Cal precision readout was found. Three events were collected this way.

   file SCRATCH:[LONG.LEVEL2]D0L241_NEG_EM.DMP   1360 blocks
   DATE and TIME= 19-AUG-1994 00:35:28.15   LOCAL RUN# 82955
   L1 Spec Trig 8 Fired
   The event looks OK to L1.
   The event looks OK to L2, there is EM3 energy.
   There are 3 EM hits:  -3,8    2.5 GeV
                         +2,13   9.0 GeV
                         +3,13   7.5 GeV

   file D0L242_NEG_EM.DMP;1   507 blocks
   DATE and TIME= 19-AUG-1994 05:14:11.19   LOCAL RUN# 82956
   L1 Spec Trig 19 Fired
   The event looks OK to L1. No EM3 energy in the precision Cal readout.
   There are 2 EM hits:  +3,24    2.75 GeV
                         +11,10  11.25 GeV

   file Scratch:[Long.Level2]D0L243_NEG_EM_E_BREAK.DMP   451 blocks
   DATE and TIME= 18-AUG-1994 21:15:23.36   LOCAL RUN# 82946
   L1 Spec Trig's 8, 16, 26 Fired
   wall of fire eta -7 -8
..............................................................................
Date: 18-AUG-1994   At: DZero
Topics: C80 information for Jay Wightman, Pull 22 CAT2 cards from M111 and M112, Modify the Trics_Boot file to compensate for the pulled out CAT2's, Start using a new Init_DAC_Bytes.LSM file, Try to understand more about the events where L2 sees negative energy.

Last week I promised some C80 stuff for Jay Wightman; I need to do it.

Pull the CAT2
-------------
Pull the CAT2 cards for Tier 1 EM Et and HD Et out of M111 and M112, and pull out the M111 Tier 2 EM, HD, Px and Py cards. 22 cards total.
The cards pulled out are:

  M111 Tier 1                          M112 Tier 1
  --------------------------           --------------------------
                 HD  SN# 231                          HD  SN# 73
  Upper Tier 1   EM  SN# 141           Upper Tier 1   EM  SN# 57 **
                 HD  SN# 280                          HD  SN# 253
                 EM  SN# 148                          EM  SN# 132
                 HD  SN# 266                          HD  SN# 233
  Lower Tier 1   EM  SN# 282           Lower Tier 1   EM  SN# 277
                 HD  SN# 246                          HD  SN# 281
                 EM  SN# 237                          EM  SN# 255

  M111 Tier 2
  --------------------------
                 +Py SN# 40
                 -Py SN# 106
                 +Px SN# 37
                 -Px SN# 102
                 HD  SN# 103
                 EM  SN# 133

  ** This card is bad. See the log book entry from yesterday. This card appears to be sicker than just not reading back. Its LEDs looked like the correction register had not been loaded correctly.

Modify the TRICS_Boot file
--------------------------
Modify the TRICS_Boot file so that all TRICS access to the above 22 CAT2 cards will be aimed at the CAT2 in M111 Tier 2 that is the EM Ref Set 3 counter tree card. Do this via MOD_HDB commands to TRICS. Boot TCC; this appears to be working OK.

New Init_DAC_Bytes.LSM
----------------------
Use yesterday's run of Find-DAC to make a new Init_DAC_Bytes.LSM file and load it into TCC. The previous run of Find-DAC was on 1-APRIL-1994 ! Not much had changed. Note that it is mostly HD towers in the central eta. I had previously noticed that -12,16 was low in the examine plots from global physics running.
   2560 towers have been examined
   DAC_BYTE low 23 for EM,POS,E_16,P_15 (was 23)
   DAC_BYTE increment -2 for EM,NEG,E_14,P_7   37->35
   DAC_BYTE increment -2 for HD,NEG,E_2,P_21   35->33
   DAC_BYTE increment -2 for HD,NEG,E_1,P_14   34->32
   DAC_BYTE increment -2 for HD,NEG,E_1,P_20   33->31
   DAC_BYTE increment -2 for HD,POS,E_1,P_16   43->41
   DAC_BYTE increment -2 for HD,POS,E_2,P_25   34->32
   DAC_BYTE increment -2 for HD,POS,E_4,P_18   31->29
   DAC_BYTE increment 2 for HD,NEG,E_2,P_4     37->39
   DAC_BYTE increment 2 for HD,NEG,E_2,P_7     30->32
   DAC_BYTE increment 2 for HD,POS,E_1,P_21    35->37
   DAC_BYTE increment 2 for HD,POS,E_2,P_10    34->36
   DAC_BYTE increment 2 for HD,POS,E_2,P_18    36->38
   DAC_BYTE increment 3 for EM,NEG,E_12,P_16   31->34
   DAC_BYTE increment 3 for HD,NEG,E_12,P_16   37->40
      7 tower(s) incremented by -2
    122 tower(s) incremented by -1
   2270 tower(s) incremented by 0
    154 tower(s) incremented by 1
      5 tower(s) incremented by 2
      2 tower(s) incremented by 3

Look for cause of the funny negative energy events in L2
--------------------------------------------------------
As the store was near its end, between 11:00 and 11:40 set up a special trigger to look for the funny negative energy events. Luminosity is varying between 0.5 and 2.0 E30 and the Control Room is doing a separator scan.
Make EM Ref Set 0 have a threshold of 6 GeV at eta -1 only.
Make EM Ref Set 1 have a threshold of 2 GeV at eta -14 only.
Require 3 hits in Ref Set 0 and 1 hit in Ref Set 1.
Main Ring is stacking. There are no other terms in this Special Spec Trig.
The rate is ZERO. But both of last week's funny events would have passed this trigger. Has something changed with the funny events?
Move the thresholds around to prove that things are working:

   EM Ref Set 1     EM Ref Set 0
   eta -14 only     eta -1 only      Rate Hz
   --------------   --------------   ---------
        0.25             1.0         50
        2.0              6.0         0
        2.0              1.0         1
        2.0              3.0         0
        0.25             3.0         1 event in 2 minutes
        0.25             2.0         0.5

Try this again at the beginning of the next store. Luminosity is 10. MR is off.
Watch for about 2 minutes at 2.0 GeV at -14 and 6.0 GeV at -1 and see no events.
Later during this store, look at the log of errors from L2. Most of them are from -5,26, which translates into -9 or -10,51 or 52. There are also a few from
   +2,12 --> 3,24
   -5,14 --> -9,28
   +2,1  --> 4,1
   -6,12 --> -11,23
Jim sets a break point and tries to capture an event.
..............................................................................
Date: 17-AUG-1994   At: MSU
Topics: 15 errors at INIT time, Make a Find-DAC run, Fix the problem in the file TRICS_Init_Auxi_L15CT.dat that caused two errors from the MTGBit8 PALs when the ERPB-MTG was loaded up and started, Remove the ERPB_MTG_Setup.dat file.

15 Errors at Initialize
-----------------------
These are coming from the EM CAT2 in M112 Upper Tier 1, i.e. eta -20:-17 phi 1:8. This card has not been in use since the eta coverage clip of the Global Tot Et and the Global Missing Et, so this does not cause a functional problem right now. But it does cause errors at INIT time and this must be fixed.

Find-DAC
--------
Made a run of Find-DAC early this morning. It is in the file DAC_17AUG94.Log
The run was successful in finding good pedestal values for all 2560 DAC's. I have not moved these new values to the Init_DAC_Bytes.LSM file yet. I checked it against the current running ped values and very little has changed.

ERPB-MTG errors
---------------
There had been two errors reported when the TRICS_Init_Auxi_L15CT.dat executed the start up of the ERPB-MTG. These two errors came from ERPB-MTG channels #7 and #8 when the ERPB-MTG was disabled in preparation for loading the LCA arrays. These two ERPB-MTG channels use type 8 MTG Bit PALs. The problem was that the value 9 was being loaded into them to set them DC low (as you would do for a normal MTGBit2 PAL). But MTGBit8 PALs have the strange clipped leads, so a value of 2 is better to load into them. A value of 2 forces their output low but does not disturb the other internal setup.
ERPB_MTG_Setup.dat file
------------------------
ERPB_MTG_Setup.dat is no longer needed, so it should be removed from all of the places where it lives. First it was copied to [TrgCur.Obsolete] and then it was removed from: D0HTCC::DUA0:[Trigger], TrgCur: (Fermi and MSU), TrgL15CT.Hardware_Software_Text (Fermi and MSU).
This leaves it in [TrgCur.Obsolete] at Fermi and in [TrgCur.DZero] at MSU.
..............................................................................
Date: 10,11,12-AUG-1994   At: Fermi
Topics: Replace a brick in the Power Pan for M108 upper Tier 1, Repair the problem that L15CT gets bad data from +2,26, Work on the L2 zero or not digitized energy problem, Clean up and THINGS TO DO.

Replace a brick in the Power Pan for M108 upper Tier 1
------------------------------------------------------
Since Monday afternoon there has been a problem with the -4.5 Volt supply for M108 upper Tier 1. Monday afternoon there were a couple of alarms from this supply. Dan Owen checked it and it showed some funny noise on the scope; the Fluke meter on AC reads some tens of mV and on DC it jumps around.
When I checked this supply this morning it was reading about 20 or 30 mV on the AC scale. All other supplies read 0.000 Volts on AC. When you first plug the Fluke into this supply it reads 1 or 2 mV on the AC scale, then it starts to ramp up, and within 10 seconds or so it reads tens of mV.
I pulled PDM-21 out from M108 upper Tier 1 service and replaced its -4.5 Volt brick. Pull out brick SN#47 and install brick SN#38. Then I reinstalled PDM-21 in M108 upper Tier 1.

Repair the problem that L15CT gets bad data from +2,26
------------------------------------------------------
L15CT sees the bit of value 2 as always high in the data from L1 Trigger Tower +2,26. We have known about this since 24-MAR-1994. Dan Owen was also able to find this problem in the data that they analyzed while working on the L15CT Simulator.
Joan Guida was also able to find this problem while working on the data from the Turn On Curve run.
Because the power was off in the L1 Cal Trig racks today, I finally fixed this problem in the CTFE for +2,26. This is CTFE SN# 90. Pin 13 on U3, the 10H124 driver for some Total Et bits, was not soldered. This pin was folded under the IC. From the print set it does not appear that this folded-under pin can touch any traces, so I just soldered it to its pad.

Work to understand why L2 "xyz" filter has started to see No data or Negative Energies at eta phi locations pointed to by L1 Jet List
--------------------------------------------------------------------
Used the L1 Trig D0User to look at events with file name *xyz* in the directory Scratch:[Long.Level2]. Two of these events had lots of negative energy and showed many EM towers over threshold in the negative eta. A summary follows:

   D0L225_L2EM_XYZ.DMP              D0L230_EM_XYZ_POS_81949.DMP
   1-AUG-1994 00:32:59.07           27-JUL-1994 05:04:23.31
   Spec Trig's Fired: 7, 12         Spec Trig's Fired: 5
   996 blks                         769 blks
   EM list = 2                      EM list = 0
   eta,phi  GeV                     eta,phi  GeV
   -------  -----                   -------  ----
   -3,22    12.75                   +7,5     3.0

   D0L238_EM_XYZ.DMP          D0L210_XYZ.DMP             D0L230_L2EM_XYZ.DMP
   1-AUG-1994 23:46:38.03     10-AUG-1994 22:07:23.00    1-AUG-1994 00:50:59.79
   Spec Trig's Fired:         Spec Trig's Fired:         Spec Trig's Fired:
   7                          8,16,20,21                 7,8,10,12,16,20,21,22,26
   990 blks                   568 blks                   573 blks
   EM list = 12               EM list = 15               EM list = 2
   eta,phi  GeV               eta,phi  GeV               eta,phi  GeV
   -------  ----              -------  ----              -------  ----
   -14,23   2.75              -14,23   4.75              +4,30    13.25
   -11,18   3.5               -11,23   4.0               +8,12    3.0
   -11,23   3.75              -10,18   5.5
   -10,18   6.75              -10,22   8.5
   -7,22    3.0               -7,22    2.75
   -1,9     3.0               -2,22    4.75
   -1,10    6.75              -2,23    3.5
   -1,11    4.5               -1,9     3.25
   -1,12    4.0               -1,10    7.25
   -1,20    3.75              -1,11    3.25
   -1,22    9.25              -1,12    4.25
   -1,23    6.25              -1,20    7.5
                              -1,21    4.75
                              -1,22    14.5
                              -1,23    12.0

Clean up and THINGS TO DO:
-------------------------
We used L15CT in a special EM Calibration run in its normal mode of throwing away events.
It was taking about 850 Hz in and passing about 200 Hz.

TRICS V6.2 is running and we have a TRICS_Init_Auxi_L15CT.dat that sets up the ERPB MTG and then starts L15CT running. Need to fix either the data value loaded into the ERPB-MTG Ch No 7,8 PALs (BitPAL8) or else change the mask in the Hardware Database.

We once saw D0HTCC time out trying to get a .BLX file for a DSP. We had one load failure of L15CT on the night of 11-AUG-1994. It was caused by a DECnet time out as TCC was trying to get a BLX file from the host. I expect that there was just a temporary network problem, or else all the L2 nodes were trying to start up at the same time. Just asking TCC to load L15CT again made all OK. We should just be aware that this can happen.

Need to give the most recent version of TrgMon to the general users via the TrgUser account.

Need to make a L1 Cal Trig Pedestal run.
..............................................................................
Date: 4,5,6-AUG-1994   At: Fermi
Topics: Test run of L15CT at the end of a global physics store and during the scraping of the next, L15CT ALWAYS needs to be able to READOUT, Data for thinking about overlap.

At the end of the store this morning and during the scraping of the store this afternoon we ran L15CT as part of the global physics run. It is connected to L1 Spec Trig #7, i.e. EM_1_Max.
We started this morning with L15CT actually throwing away events. Then for the last 5 minutes of this morning's store we switched to "PANIC" mode (nothing else changed; we just paused and reloaded 68k_Services).
During the scraping this afternoon we ran L15CT in "PANIC" mode in a global physics run. All was normal in this afternoon's run except that muon was not running, because its HV was off for the scraping.
The principal problem discovered was that about 75% of the time when Spec Trig #7 fires, some other Spec Trig has also fired.
During the store on the evening of 4-AUG-94, use TrgMon to study which Spec Trig's are likely to fire at the same time as Spec Trig #7.
Examine 32 events whose List_of_Spec_Trigs_Fired includes Spec Trig #7.
   Of these 32 events, in  8 of them, only Spec Trig #7 (EM_1_Max) fired.
   Of these 32 events, in  8 of them, Spec Trig  #8 (EM_2_Med) also fired.
   Of these 32 events, in  9 of them, Spec Trig #10 (EM_1_Miss) also fired.
   Of these 32 events, in 20 of them, Spec Trig #12 (EM_Jet) also fired.
   Of these 32 events, in  3 of them, Spec Trig #16 (Jet_Multi) also fired.
   Of these 32 events, in  5 of them, Spec Trig #20 (Missing_Et) also fired.
   Of these 32 events, in  2 of them, Spec Trig #21 (Jet_3_Miss) also fired.
   Of these 32 events, in  3 of them, Spec Trig #25 (1Jt_35) also fired.
   Of these 32 events, in  7 of them, Spec Trig #26 (1Jt_Max) also fired.
Thus, with these LOW STATISTICS, it appears that Spec Trig #12 has the most overlap with Spec Trig #7 (i.e. if Spec Trig #7 fires then there is a 62% chance that Spec Trig #12 has also fired).
It also appears that Spec Trig's #8, #10, #26 all have a considerable overlap with #7 (i.e. any of these three has about a 25% chance of firing whenever Spec Trig #7 fires).
In these 32 events, there are a total of 65 fired L1 Spec Trigs.
   ---> When Spec Trig #7 fires, on average there is one other Spec Trig firing.
During the period that I collected these 32 events the Luminosity was about 8.5 to 9 E30. During the time I was collecting, the L1 prescales were changed from the 11E30 list to the 8E30 list. Which other Spec Triggers are likely to fire when Spec Trig #7 fires appears to be strongly dependent on what prescale list is loaded into L1.

FOR L15CT TO BE USEFUL
The meaning of all of this is that, for L15CT to be useful, we need to filter not just one Spec Trig but a whole set of Spec Trig's. Muon people learned this 2 years ago.
We could have learned it from the simulator, or by testing in beam in global physics running early in this project.

Two Management Problems of L15CT
--------------------------------

ALWAYS READOUT
--------------
The way things are managed for the physics runs is that there is one location where the front-end crates that take part in physics runs are listed. What I mean here by physics run is: both the normal global physics run and any of the "special run" physics runs, and cosmic runs, and all crates test runs,....
We needed to have L15CT included in this list to be part of global physics runs, but all of this implies that the L15CT crate will be read out lots of times when we were not expecting to be read out, e.g. times when we expected L15CT to be parked and in a dead loop.
There are at least two major places where we need to change the way things are handled:

Right after TCC INIT Time
-------------------------
Right after TCC has been told to INIT, the L15CT crate needs to be able to read out something onto the data cable, i.e. this is before COOR has ever talked about L15CT. For example COOR could INIT TCC and then set up an all crates test trigger, and the L15CT crate needs to be able to read out.

When COOR is Finished Using L15CT for Physics Filtering, COOR Must Return L15CT to a Benign State.
--------------------------------------------------------
At a minimum COOR needs to clear the Spec-Trig vs L15CT-Terms memory when it is finished using L15CT. Right now when COOR is finished with L15CT (e.g. near the end of a store when switching to a special run) it does nothing to "de-program" L15CT. At a minimum we need to get the Term_Select P2 "de-programmed" so that L15CT can just read out IBS events (i.e. no "That's_Me" events).

Solutions
---------
The state of L15CT right after INIT time (i.e. before COOR has ever talked about L15CT) is clearly our responsibility.
We can cleanly take care of this by using the TRICS_Init_Auxi_L15CT.dat file to hold commands that will move L15CT to a running state with all Spec Trig's 0:15 set up to cause IBS cycles.
The way this divides up makes good sense: the "build it" part of TRICS brings L15CT to a default halted state, then TRICS_Init_Auxi_L15CT.dat brings L15CT to a running state. Using TRICS_Init_Auxi_L15CT.dat to come to the running state gives us easy control over exactly what running state L15CT comes to.
Cleaning up L15CT after COOR has used it for physics filtering is clearly the responsibility of COOR. COOR needs either to give TCC a new special message (e.g. Release_L15CT) or else to give the series of currently defined L15CT commands that effectively result in the release of L15CT.

Running in PANIC mode from the very start of Store 5071 evening of 5-AUG-94
---------------------------------------------------------------------------
Starting from the very beginning of Store 5071 we ran L15CT in PANIC mode on L1 Spec Trig #7. The following is TrgMon information from this and the next run. Each row is the average of 10 sweeps of TrgMon, i.e. a 50 second average.
Useful      Chan 13   PreScl   SpTrig   Global Event   Spec Trig   Spec Trig   L15CT
            Lumnsty   List     #7       Transfer to    #7 Firing   #7 L15FW    Filtering
Time        E30       In Use   PreScl   Level 2 Rate   Rate Hz     % Skip      Rate Hz
-----       -------   ------   ------   ------------   ---------   ---------   ----------
Store 5071  5AUG94
19:20       12.4      15E30    1        124 Hz         38.2 Hz     72.4 %      11.1 Hz
20:06       11.7      15E30    1        108            33.9        74.2         8.7
21:03       11.0      15E30    1         96.8          32.3        72.9         8.8
21:23       10.6      11E30    1        113.6          32.8        69.3         9.9
23:10        9.3      11E30    1         93.2          28.2        71.3         8.3
6AUG94
 7:40        6.14      6E30    1        112.5          17.0        69.2         5.2
10:45        5.4       6E30    1        100.2          16.0        71.0         4.6
Store 5073  7AUG94
00:15       13.0      15E30    1        125.2          39.8        72.4        11.1

SpTrg_Fired_List_1000_6AUG94.txt
   one hour of data centered around 10 AM
   Channel 13 Luminosity averaged 5.6 E30
   PreScale List 6E30

SpTrg_Fired_List_0030_7AUG94.txt
   1.2 hours of data centered around 00:30 AM
   Channel 13 Luminosity 13.2 at start of file
                         12.6 at 1/2 way point
                         12.0 at end of file
   PreScale List 15E30
..............................................................................
Date: 26,27,28-JUL-1994   At: Fermi
Topics: Re-Install the "A" Hydra-II, Look at the problem of why we can't Load the LCA's from the ERPB_MTG_Setup.dat file, replace Power Pan MM4 with MM3, Turn-On Curve run, new TRICS and 68k-Services code, TRICS_Init_Auxi_L15CT file, Boot instructions for L15CT.

Reinstalled the "A" Hydra-II. Installed MSU SN#1 as the running "A" Hydra-II. MSU SN#5 is stored here at D-Zero as a spare. It is in a white box in the bottom of the spare cards rack. It has all of its expander and paddle cards installed, and Steve checked it just before shipping from MSU to D-Zero.

Loading LCA's
I connected the Logic Analyzer to the ERPB MTG lines that cause Loading of the LCA's. Running the ERPB_MTG_Setup.dat file caused nothing to happen. I do not know how or why I saw the yellow lights flash a couple of weeks ago. Was it a different ERPB_MTG_Setup.dat file? Was it caused by the different (broken) ERPB MTG card?
Anyway, it takes some special running of the MTG to get it to single-cycle scan the PROM address range 1000 to 1600, i.e. mostly the upper bank. I updated the ERPB_MTG_Setup.dat file and distributed it to all four places. The logic analyzer now shows good signals. I have not had power off to the L1 racks, so I do not know if it actually loads the LCA's.

Made the Turn-On-Curve run for L15CT on 27-JUL-1994. After that we switched to the V6.1 version of TRICS and the latest 68k_Services. Together these give an online look into what L15CT is doing. These new versions of TRICS and 68k_Services have been left running.

28-JUL-1994
Called at 7 this morning because an L1 Trig Alarm came in. The alarm actually came in at about 3:20 but they let me sleep until close to shot setup time. The shot setup failed when they lost the stack at the start of shot setup. The Power Pan in M112 that services the M111 Tier 2 had its breaker trip. When I turned it back on all bricks came up OK except for the -4.5 brick. I pulled Power Pan MM4 and installed MM3. MM4 has a bad -4.5 brick. The bad brick is SN#30 7-FEB-1992. While the power was off I pulled the tie-down off of the M102 air flow sensor (see last week's log entry).

Load LCA's and ERPB_MTG_Setup.Dat
---------------------------------
When turning the racks back on after replacing the Power Pan I checked to see if we could load the LCA's. At power up 9 of the 10 racks had their yellow lights on; only M109 had its yellow lights off. After running ERPB_MTG_Setup.dat all 10 racks had their yellow lights off. Load LCA works. Recall that we are NOT currently using a TRICS_Init_Auxi_L15CT.dat file, so if power has been off then it IS still necessary to manually execute the ERPB_MTG_Setup.dat file.

L15CT Boot Instructions
-----------------------
I have written more "Boot Instructions" for the L15CT 68k and a description of the characters displayed on the 68k_Services terminal.
I have modified the boot instructions from what Steve got started a couple of weeks ago. They explain where the new VME Reset button is and emphasize, "Why do you think that you need to do this boot?". They also talk about turning power ON or OFF to the L15CT. I have put labels on more things in the MCH, e.g. rack M124, the terminal... I gave a copy of these instructions to Joan Guida. I have not made them public yet.
..............................................................................
Date: 25-JUL-1994  At: MSU
Topics: Work on broken Hydra-II and also prep and test a spare Hydra-II for FNAL.

Steve replaced the Global SRAM for DSP #3 on Hydra-II MSU S/N #4 (which had been DSP-A at FNAL until last week, see the 21..23-JUL-1994 entry). This appears to have fixed the "Sanity and Configuration Checker's" displayed error. But we never saw the "SCC never try to boot DSP #3" error which Dan saw last week. Steve looked for this error both before and after replacing the GSRAM. So for now, let's keep this card at MSU. We will use the recently-repaired Hydra-II MSU S/N #1 (with TCPE S/N #4, TPPB S/N #4, and SPPB S/N #2) as the DSP-A card at FNAL.

Steve removed the TCPE and TPPB (both S/N #3) from Hydra-II MSU S/N #4 and installed them on Hydra-II MSU S/N #5. A new SPPB (S/N #8) was built for Hydra-II #5. Recall that the TCPE and TPPB are supposed to be interchangeable, but the SPPB must always be "custom-built" for its associated Hydra.

The Hydra-II #5 is going to FNAL as the fully-assembled, long-term support spare. Hydra-II #4 is staying at MSU, with only its SPPB. This is the only Hydra which will remain at MSU.
..............................................................................
Date: 21,22,23-JUL-1994  At: Fermi
Topics: Clip the Global Missing Et and the EM Et and the HD Et at Eta 3.2, modify Trics_Init_Auxi as part of doing this, test run with both L15CT and L15 Muon running and rejecting events, work on the mismatch problem between L0 Bunch Number and L1 Bunch Number, boot Rev 6.1 TCC code along with the newest 68k_Services code, work with L15CT

-----> There is an Air Flow Sensor left tied down in the L1 Racks.   <-----
-----> Clean this up next trip or next time that power is off.       <-----

Bunch Number Mismatch
---------------------
In the L2 Filter Code the L1 Bunch Number is determined by reading the 6 Bunch_P_Gate And-Or Terms (DZero Note 967, Item Number 381). These come from the IMLRO Card in the upper And-Or cardfile in rack M101, i.e. And-Or Terms 0:127, Spec Trigs 0:15.

Since "forever" there has been a low level of mismatch between the L0 Bunch Number and the L1 Bunch Number, e.g. one or two errors per physics run. Early this week this climbed to 1 per 1000 events transferred to L2. Rich Partridge and Jeff Bantly showed by complicated tests that L0 was most likely OK. After they learned about the second copy of the And-Or Terms (i.e. for Spec Trig's 16:31), they were able to show that, for those events where there was an L0 to L1 mismatch, the Spec Trig's 0:15 readout of the And-Or Terms did not match the Spec Trig's 31:16 readout in the region of the Bunch_P_Gates, and that L0 appeared to match the Spec Trig's 31:16 version.

I added a check of the Bunch_P_Gate And-Or Terms to the VTC Code. This new part of the VTC Error_Checking routine verifies that:
  - the Spec Trig's 0:15 version of the Bunch_P_Gate Terms matches the Spec Trig's 31:16 version,
  - the Current Bunch_P_Gate equals the Previous Bunch_P_Gate + 1 mod 6, and
  - there is one and only one Bunch_P_Gate And-Or Term active.
This new error checking routine immediately found errors when we ran the normal test trigger. OK, we now have a tool to see the problem.
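A minimal Python sketch of the three checks that the new VTC Error_Checking routine performs on each event. The real VTC code is not reproduced in this log; the function names, the error strings, and the encoding (each Bunch_P_Gate readout as a 6-bit integer with bit i set for bunch i) are hypothetical.

```python
# Hypothetical sketch of the three Bunch_P_Gate checks added to VTC Error_Checking.
# Encoding assumption: each Bunch_P_Gate readout is a 6-bit integer, bit i = bunch i.
NUM_BUNCHES = 6


def one_hot_index(mask):
    """Return the bunch index if exactly one of the 6 gate bits is set, else None."""
    if mask <= 0 or mask >= (1 << NUM_BUNCHES) or mask & (mask - 1):
        return None
    return mask.bit_length() - 1


def check_bunch_gates(cur_0_15, cur_31_16, prev):
    """Return a list of error strings (empty if the event passes all checks).

    cur_0_15  -- Current Bunch_P_Gate terms from the Spec Trig 0:15 readout
    cur_31_16 -- Current Bunch_P_Gate terms from the Spec Trig 31:16 readout
    prev      -- Previous Bunch_P_Gate terms
    """
    errors = []
    # Check 1: the two copies of the And-Or Terms must agree.
    if cur_0_15 != cur_31_16:
        errors.append("0:15 and 31:16 copies differ")
    # Check 3: one and only one Bunch_P_Gate term active in each readout.
    for name, mask in (("current 0:15", cur_0_15),
                       ("current 31:16", cur_31_16),
                       ("previous", prev)):
        if one_hot_index(mask) is None:
            errors.append(f"{name}: not exactly one Bunch_P_Gate term active")
    # Check 2: Current = Previous + 1 mod 6 (only meaningful if both are legal).
    cur, prv = one_hot_index(cur_0_15), one_hot_index(prev)
    if cur is not None and prv is not None and cur != (prv + 1) % NUM_BUNCHES:
        errors.append("current Bunch_P_Gate != previous + 1 mod 6")
    return errors


if __name__ == "__main__":
    print(check_bunch_gates(0b000100, 0b000100, 0b000010))  # legal: bunch 1 -> 2
    print(check_bunch_gates(0b000100, 0b001000, 0b000010))  # copies disagree
```

The second call reports only the copy mismatch: each copy on its own is a legal single-bunch pattern, yet the two copies disagree. Keeping the three checks separate makes it easy to tell that kind of failure apart from random data, which would usually also break the "exactly one term" check.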
In M101 pulled IML SN#15 (was And-Or Terms 0:127 Spec Trig's 0:15) and replaced it with SN#17, which has had all of its 10H101's replaced with 10101's. In M102 pulled IML SN#10 (was And-Or Terms 0:127 Spec Trig's 31:16) and replaced it with SN#12, which has had all of its 10H101's replaced with 10101's. This did NOT help, but I left the new non-10H101 cards installed.

In M101 pulled IMLRO SN#17 and replaced it with SN#22 (And-Or Terms 0:127 Spec Trig's 15:0). This did NOT help, so the original IMLRO SN#17 was put back into M101.

Each time after working on this system I noticed that there were never any errors when the system was first turned on, was cold, and L1CT was mostly still off. It takes about 5 minutes of L1CT running to warm things up before we start to see the errors.

Pulled the M101 upper And-Or cardfile (And-Or Terms 127:0 Spec Trig's 15:0) MBD card (SN# MBD-019). Replaced all 10H101's with 10101's and replaced the 10H109 with a 10109. This did NOT fix the problem.

Perhaps it is a Front CBus problem on this And-Or Cell. Loaded things up to get a test trigger running, then replaced the Front CBus with a 3-connector cable to pick up just the MBD, IML, and IMLRO. This did NOT help or make things worse.

Perhaps it is a problem on the backplane in the range of And-Or Term #95 through #127. I put an extra terminator on the backplane to screw up the signals a little. This did NOT help or make things worse.

Perhaps it is something on the MBD besides the 10H101's. I put an extra terminator on the back CBus connector of the MBD. This did NOT help or make things worse.

Perhaps it is something on the MBD besides the 10H101's. I replaced the MBD SN# MBD-019 with the one spare MBD at Fermi, SN# ???. This appeared to help a little for a while and then things got bad again. Well, this looks like a flaky error, i.e. it is good when first turned on and then gets bad.

Well, maybe the timing is too tight in the CBus cycles of the Data Block Builder.
Currently we are running 7 CBus cycles per beam crossing. I made a new Framework main timing MTG PROM #3. This is called Revision N. It has 6 CBus data block builder cycles per beam crossing. With this installed there are no more errors. Or is it just that the old PROM is bad?

One other clue. The errors picked up by the error check in VTC never said that an illegal set of Bunch_P_Gate And-Or Terms was active (e.g. zero And-Or Terms or more than one And-Or Term).
---> The problem cannot be random data on the CBus data lines or a random address on the CBus address lines. The problem must be the address line that selects between Current and Previous.

This Rev. N PROM #3 for the Framework MTG keeps the positive-going edge of the COMINT Clock at tick 11 at the same place as Rev M, i.e. I think (and hope) that it is only this positive edge of the COMINT Clock that needs to be synchronous with other L1 FW activity. NEED TO VERIFY THIS.

The new TSF file is Framework_MTG_PROM_3_SN_3N.TSF. The old version was M.

Note: when working with a simple prescale-only L1 trigger, if you want to see all of the different bunches then the prescale must not be divisible by 6, and the prescale x 2, ... the prescale x 5 must all also not be divisible by 6; equivalently, the prescale must be odd and not a multiple of 3. 12113 and 24113 are good numbers.

Work with L15CT
---------------
Well, there were many power cycles of the L1 racks, so I got to play with running the ERPB_MTG_Setup.DAT file a bunch of times to see if it can load the LCA's.
---> It does NOT appear to work. <---
The only thing that works is pushing the button on the distributor caps.

It looks like Hydra A is dead. Sometimes it hangs saying it is waking up DSP #4, and sometimes it gets to DSP #3 and says that there is a memory error:
    DSP #3 at 0x c0002022 wrote 0x 0 read 0x 1000
This was all working Thursday night when we ran L15CT and L15 muon at the same time, with both rejecting events without any problem. I even tried power cycling the L15CT crate and this did not help. Pulled the "A" Hydra-II card. It is SN#7047 MSU #4.
When pulling out Hydra-II "A" I did it from the left, i.e. pulling out the IRONICs cards first. Before I pulled the twist and flat cables off of the front of the HYDRA I added more information to the labels, to help quickly get them back in the right spots, e.g.

                   A 1 L
                   | | |
    A, B, or C ----+ | +------- L, M, or R
              1:6 ---+

    A, B, or C   This is Hydra-II A, B, or C
    1:6          Connector 1:6 (1 is the top connector)
    L, M, or R   L Left, M Middle, R Right row of connectors when viewed from the front

On Thursday night (when L15CT was still working) we made runs with both L15 muon and L15CT running and rejecting events. This worked OK.

Clip Global Missing Et and Global Total Et to |ETA| <= 3.2
----------------------------------------------------------
To clip the Global Missing Et and Global Total Et coverage, I pulled out the M111 Tier 2 CAT2 output cables from the Px, Py, EM Et, and HD Et CAT2 cards. Trics_Init_Auxi.DAT was then changed to write the Correction Registers at Tier 3. Note that Trics_Init_Auxi already had commented-out code in it to do this. This commented-out code in Trics_Init_Auxi was exactly equal to code in the file TrgCur:Tree_Offset_eta_16.dat. But I think that both of them had wrong values for the HD 2nd lookup correction registers. I fixed the HD 2nd lookup correction register values in Trics_Init_Auxi.dat, but I did not change Tree_Offset_eta_16.dat. Perhaps these old HD 2nd lookup correction register values were proper for the Run 1A HD PROM's.

If we are going to run this way then I would like to remove the M111 Tier 2 Px, Py, HD Et, and EM Et cards and the M111 and M112 Tier 1 HD Et and EM Et cards, a total of 18 CAT2 cards. This is a significant amount of power.
---> TRICS would need to be changed so that it does not INIT these cards. <---

Tried TRICS V6.1 and latest 68k_Services
----------------------------------------
TRICS V6.1 appears to have two possible problems:
  Something is funny at Boot and INIT time that causes it sometimes (perhaps 30% of the time) to get stuck in a funny mode where all L1 Data Blocks appear to be overwritten, or the pipe control is wrong, or it does not resync the pipes, or something like that.
  It appears to have the wrong base address of the status blocks, or the wrong status block organization, when reading status from the DSP's.
..............................................................................
Date: 19-JUL-1994  At: MSU
Topics: Look at VSB Mastership negotiation

We wanted to understand whether the MVME135 CPU can acquire VSB mastership from the Hydra-II while the Hydra-II is performing DMA access of the VSB memory at full speed. In order to do this test at MSU, we set up the VSB bus masters in the "normal" way (i.e. the way they are at FNAL: 135 is Crate Controller and can request Mastership, Hydra-II can only request Mastership). We then set up a maximum-length DMA list in the Hydra-II, transferring from a fixed on-chip SRAM location to a fixed VSB address. We started the DMA list, and then used 135BUG to also access VSB memory (via the MD/MM commands). The Hydra-II DID transfer VSB mastership to the 135 in this situation. That is, the Hydra-II will not "hog" the VSB bus during DMA accesses, but will instead release the bus when requested.

The test was also performed using the C40 CPU to access the VSB memory. In this case, the RPTS instruction was used to perform zero-overhead looping on an STI instruction which pointed into VSB memory. Again, the results were the same.

Also note that the time required to perform each VSB access was the same for CPU-initiated cycles and DMA-initiated cycles. In each case, a complete access required 700 ns.
This was determined by running either the DMA engine or the RPTS instruction for a fixed amount of time and then determining how many transfers were performed by looking at either the Transfer Counter or the Repeat Counter.
..............................................................................
Date: 14-JUL-1994  At: MSU
Topics: Edit Trics_Boot_Auxi.dat and TrgMon_FS.RCP to implement the renamed scalers for Active MR Veto.

DZero is making permanent the changeover to Active MR Veto running. We needed to rename a number of Foreign Scalers in the Trics_Boot_Auxi.dat file to implement this. The old file was put in [TrgCur.Archive] at Fermi, and the new file is on TCC's disk, in TrgCur:, and copied to MSU::[TrgCur.DZero].

Renamed the same set of Foreign Scalers in the TrgMon_FS.RCP file. This new file is in HTrgMon: at Fermi, in the TrgUser account at Fermi, and copied to MSU::HTrgMon:

The following are the new names of these scalers in the Luminosity files:

NIM to ECL       Pair on
Module Lemo      the 17
Connector        Pair Cable   What signal is it.  Where does it go.
-------------    ----------   -------------------------------------------------
14th from top        4        This scaler was:  BX_Counts_of_MR_Veto_High
                              It becomes:       BX_Counts_of_MRBS_and_MicroBl
                              This is:          Foreign Scaler #31
                                                Gate A DBSC Ch #2 in slot 12 CA=35

15th from top        3        This scaler was:  BX_Cnts_of_MR_Veto_High_or_Low
                              It becomes:       BX_Cnts_MRBS_and_uB_or_MR_Low
                              This is:          Foreign Scaler #30
                                                Gate A DBSC Ch #3 in slot 12 CA=35

10th from top        8        This scaler was:  BX_Cnts_MR_Hi_or_uB_or_Mu_HV
                              It becomes:       BX_Cnts_of_MicroBlank_or_Mu_HV
                              This is:          Foreign Scaler #35
                                                Gate A DBSC Ch #2 in slot 11 CA=32

11th from top        7        This scaler was:  BX_Cnts_MR_Hi_or_Low_or_Mu_HV
                              It becomes:       BX_MRBS_and_uB_or_MR_Low_or_MuHV
                              This is:          Foreign Scaler #34
                                                Gate A DBSC Ch #3 in slot 11 CA=32
..............................................................................
Date: 12-JUL-1994  At: MSU
Topics: TCC Problem at Fermi

TCC "hung".
Tried to look at the directory on the TCC disk:

    $ dir d0htcc::dua0:[trigger]
    %DIRECT-E-OPENIN, error opening D0HTCC::DUA0:[TRIGGER]*.*;* as input
    -RMS-F-NET, network operation failed at remote node; DAP code = 01F77C54
    $ show time
      12-JUL-1994 13:40:59

I do not know if there were other problems (e.g. network or L2 nodes) at the same time.
..............................................................................
Date: 8-JUL-1994  At: Fermi
Topics: Take snapshots of L1.5 Cal Trig without beam

We set up the Logic Analyzer to monitor the L1.5 Cal Trig. We are taking "snapshots" of the "big 5" L1.5 Cal Trig operation modes. We want to really nail down the details of time usage in L1.5 Cal Trig. The setup of the Logic Analyzer that was used is stored on Logic Analyzer Disk #6 in the file SNAPSTUP.C15. The Logic Analyzer was set up as follows:

Pod #1: TTL
-----------
Signal on pod     Signal name              Signal source
-------------     -----------              -------------
 0: hld_tran      Hold Transfer            Path Select 4th from top
 1: some_hap      Something Happened       Path Select 5th from top
 2: thats_me      That's Me                Path Select top
 3: vme_a20       VME A20                  VME Bus #1 C18
 4: vme_a21       VME A21                  VME Bus #1 C17
 5: vme_a22       VME A22                  VME Bus #1 C16
 6: vme_a23       VME A23                  VME Bus #1 C15
 7: vme_/wrt      VME *WRITE               VME Bus #1 A14
 8: vme_/as       VME *AS                  VME Bus #1 A18
 9: po_ans_0      "Port" Ans for Term 0    Term Answer top
10: ff_don_0      "FF" Done for Term 0     Term Answer 5th from top
11: vsb_bfsl      VSB Buffer Select        tapped from VSB Buffer Sel.
12: not used
13: not used
14: not used
15: not used

Pod #2: ECL (variable threshold = -1.4V)
----------------------------------------
Signal on pod     Signal name              Signal source
-------------     -----------              -------------
 0: strt_dgt      Start Digitize           single-signal cable
 1: xmt_trig      Transmit Trigger         "Slave" MTG pins 21-22

We collected the following files:
  SNPBNSTB.C15, SNPBNST2.C15:  "N" with all signals
  SNPBNNOS.C15, SNPBNNO2.C15:  "N" with no VME /WRT, no VME /AS
  SNPLNSTB.C15, SNPLNST2.C15:  "n" with all signals
  SNPLNNOS.C15:                "n" with no VME /WRT, no VME /AS
  SNPBFSTB.C15, SNPBFST2.C15:  "F" with all signals
  SNPBFNOS.C15, SNPBFNO2.C15:  "F" with no VME A22, no VME /AS
  SNPBISTB.C15, SNPBIST2.C15:  "I" with all signals
  SNPLISTB.C15, SNPLIST2.C15:  "i" with all signals

These files will need to be carefully dissected at MSU. We performed simple sanity checks on these files at Fermilab. We see that the L1.5 Cal Trig requires 120-123 us between Start Digitize and returning DONEs to the M103 L1.5 Framework. Note that these snapshots were taken running at 0.57 Hz, so there is no overlap between triggers. The 68K Services CPU spends a lot of time waiting for Global to be at D3 in this configuration. With more overlap between triggers, the 68K will not be so quick to start looping on the "check Global for D3" loop.

We removed the Logic Analyzer and returned the L1.5 Cal Trig to a standard clean configuration. We ran at 570 Hz (with a Mark and Force Pass ratio of 1919) for 25 minutes with 0 errors. Note that, running at 570 Hz, every event following a Mark and Force Pass event is accepted via L1.5 Framework Timeout. This is the known behavior: the L1.5 Cal Trig is not able to service this event within the L1.5 Framework Timeout time frame, so the L1.5 Framework times this event out.

Note that, in order to make "i" events, we typically change the programming of the L1.5 Trigger Framework. The Specific Trigger that is being used to make "i" events must be an L1.5-Type Specific Trigger.
The programming of the L1.5 Trigger Framework Veto/Confirm MTG must be changed to force the Specific Trigger to be Vetoed. The Veto/Confirm MTG is CBUS=2, MBA=57, CA=19. The value "1" needs to be programmed into two Function Addresses in this card in order to force the Specific Trigger to be Vetoed. These Function Addresses are the number of the Specific Trigger to be Vetoed, and 16 + the number of the Specific Trigger to be Vetoed. E.g. to force Specific Trigger 2 to be Vetoed, put a "1" into both FA=2 and FA=18. To return the L1.5 Trigger Framework to its normal operation, program the value "254" into the same two Function Addresses.
..............................................................................
Date: 7-JUL-1994  At: Fermi
Topics: Take Dan Owen special run during beam

We took the "Dan Owen/Mike Tartaglia" special run during beam. Some TRGMON screen captures and the programming of the L1.5 Cal Trig are stored in the file VWORK1:TRGMON_DUMP.TXT_L15CT_BEAM_7JUL94. A summary of this run is included below:

  Approximate Luminosity = 2.3 E 30
  MFP Ratio = 5
  Actual rejection rate estimated at 86% by looking at the 68K_Services tube
  TRGMON-displayed rejection rate estimated at (4/5)*Actual_rate = 69%
  TRGMON-displayed rejection rate: 68%-71%
  Dead Beam Crossings due to L1.5: approximately 0.81% @ 76 Hz, 1.1% @ 100 Hz
  Geo Sect 5 Front End Busy: approximately 1.1% @ 78 Hz in / 24 Hz transfer

  TRGMON rates:
    Specific Trigger #1 ("That's Me") = 100 Hz
    Specific Trigger #2 ("IBS")       = 2 Hz

  Approximate rates from the 68K_Services screen:
    I =  3/80 * 102 Hz =  3.8 Hz
    F = 16/80 * 102 Hz = 20.4 Hz
    N =  6/80 * 102 Hz =  7.7 Hz
    n = 55/80 * 102 Hz = 70.1 Hz

  Rough deadtime calculation:
    @100 Hz, 1.1% L1.5 Dead X = 110 us/L1.5 Cycle
    4/5 of these cycles take the "normal" amount of time, about 130 us
    1/5 of these cycles take the "short" amount of time, about 30 us
    4/5 * 130 + 1/5 * 30 = 110 us/L1.5 Cycle average

The configuration for this special run was screwed up, however.
The Level 1 Reference Set was only defined out to Trigger Tower eta index +/-6. Therefore, all of the found objects were on Hydra-B. This run should be re-done. This can (and should) be done without manual Dan/Steve intervention.

Steve started a set of booting instructions for the L1.5 Cal Trig 68K Services computer. We really need to install a nice pushbutton on the VME RESET* line for the L1.5 Cal Trig VME Crate. This crate will need to be manually reset occasionally, and it is scary to imagine random people trying to press the MVME135 RESET button. Note that the Slave Vertical Interconnect's RESET pushbutton cannot easily be tied to the VME RESET* signal (i.e. there isn't a jumper on the Slave VI to do this).

The VBD began exhibiting its old "VBD reset required after VME RESET" feature. Recall that this feature results in the Sequencer console scrolling Token Loop Count errors for Crate 51. Did this problem ever really go away? Pushing VBD reset is included in the 68K Services booting instructions.
..............................................................................
Date: 6-JUL-1994  At: Fermi
Topics: Modify Path Select P2, install ERPBs/DC for Rack M112, change ERPB MTG setup, replace ERPB MTG PCB, new "Panic" mode idea, replaced the Tier 2 -Px CAT2 card in M105, problem at Init time after having power off, ran CalTrig_Random

We modified the Path Select P2 card at Fermilab to produce an ECL output of the "That's Me" signal. Also, "internal" modifications were made to the card: the "Hold Transfer" signal is now buffered to reduce the load on the inter-P2 bus, and the Hold Transfer delay is now fixed at 300 ns. We added some TTL test points to the card. Looking at the card from the back of M124, the test points are:

  (top)  That's Me                      X X  GROUND
         Delayed Hold Transfer          X X  GROUND
         >= 1 Term to be evaluated      X X  GROUND
         Hold Transfer                  X X  GROUND
         Something Happened             X X  GROUND
         (open)                         X X  GROUND

We also modified the "chain" of MTG Master Channels.
The current arrangement is:
  "That's Me" is EXTBIT for MTG Channel 8
  MTG Channel 8 BITOUT is EXTBIT for MTG Channel 3 (Store_Enable_Bar)
                   and inverted is EXTBIT for MTG Channel 4 (Latch_Enable_Bar)
                   and is EXTBIT for MTG Channel 7
  MTG Channel 7 BITOUT is EXTENB for MTG Channel 5 (Transmit_Trigger)
  (Channel 5 now has a BIT2 PAL rather than a BIT8)

We cleaned up this wiring somewhat on the patch panel, but we have not done a 100% final installation of this wiring, because this wiring will change when we start double-buffering the ERPBs. The final installation of this wiring should actually be done on the "jumper block" at the front of this MTG.

We installed the ERPBTG1C PROM in the ERPB MTG. This PROM changed the timing for Channel 5 to be a 1 us pulse, up at 45 and down at 72. This is appropriate for the BIT2 PAL which is now in Channel 5. The Transmit Trigger will now be a 1 us pulse rather than the previous 3.5 us pulse. This change was not strictly necessary (i.e. we didn't see any problems which we could trace to the old PAL), but BIT2 PALs are easier to think about. We edited the ERPB_MTG_SETUP.DAT file to account for the change to a BIT2 PAL in Channel 5.

Dan installed ERPBs and the DC in Rack M112. This completes the ERPB/DC installation. The Serial Numbers are (in descending order, starting with the DC at the top of the rack): DC-6; ERPB-88; ERPB-89; ERPB-90; ERPB-91; ERPB-92; ERPB-93; ERPB-95; ERPB-94. We looked at the data from these ERPBs to verify that the DSPs were seeing 128 bytes per Comm Port, and that no dangerous bits were stuck on.

It required about 4 hours with all power off to install these last ERPB's and route their cables in through the cable clamp in M111. When power was turned back on after 4 hours and we tried to initialize the system, there was an error:
  S-INI/ODB%COORini% Initializing all Specific Triggers
  E-HIO/HDB%COORini% Failure Writing 240 @ cbus 2 mba 57 ca 34 fa 27 read 248
  E-HST/ODB%COORini% Failure Programming Spec Trigger #11 Requiring Level 1.5 Term #12
  E-INI/ODB%COORini% Failure Initializing Spec Trig #11 Requiring of L1.5 Term #12
  . . . . . . . .
  E-HIO/HDB%COORini% Failure Writing 240 @ cbus 2 mba 57 ca 34 fa 27 read 248
  E-HST/ODB%COORini% Failure Programming Spec Trigger #11 Requiring Level 1.5 Term #15
  E-INI/ODB%COORini% Failure Initializing Spec Trig #11 Requiring of L1.5 Term #15
  E-INI/ODB%COORini% Failure Initializing Spec Trig #11
  E-INI/ODB%COORini% Spec Trig Initialization Failure Count Is 1

I believe that this is the "standard" DigiMem card error that we sometimes see on an Initialize after a long power down. This error lasted for about 4 initializes over a period of perhaps 10 minutes right after power up, and then the problem went away (as it has before).

All of "Level-1" (L1 FW, L1 CT, L1.5 FW, L1.5 CT and L1.5 CT MTG including the fan) was powered off for this installation. After powering back up, we noticed yellow ERPB LEDs turned on in Racks M107, M108, M112. We left the ERPBs in this condition for approximately 1 hour, and then turned on the ERPB MTG. When we looked at the ERPB LEDs, all of the yellow LEDs were turned off.

The next time we have the L1 Cal Trig turned off, we should do the following:
  1) check the yellow LEDs on the ERPBs...DO NOT push the pushbutton on the DCs
  2) if any are turned off, wait a few minutes and check again. We want to see if they "spontaneously" load themselves.
  3) if the ERPB MTG is turned off, look at the LEDs both before and after turning the ERPB MTG back on. We want to see if they load themselves based on random noise from the MTG as it is turned on.
  4) last of all, run the ERPB_MTG_SETUP.DAT file. Look at the LEDs both before and after running this .DAT file.
     We want to prove that this file actually loads the ERPBs if they haven't already been loaded (we believe that it can successfully re-load the ERPBs).

We worked on the 1-in-30000 hang error (i.e. the chronic error that we have seen where no ERPB data is transmitted to the DSPs). We started by loading the "hang on error" version of the 68K_Services code. We clipped the Logic Analyzer onto the TTL copies of the "Something Happened" and "That's Me" signals which were added to the Path Select P2, and used the differential-to-single-ended ECL converter box to monitor one of the "Transmit Trigger" slave copy signals. We set the Logic Analyzer to never trigger, and used a -1.4V threshold for the single-ended "Transmit Trigger."

After the first hang, our suspicion that a "That's Me" occurred without an associated "Transmit Trigger" (for the "hang" event) was confirmed. We then started looking at the "chain" (i.e. Channel 8 to Channel 7 to Channel 5) which is used to convert "That's Me" into "Transmit Trigger." The output of Channel 8 (which should be a single 3.5 us pulse starting about 400 ns after the rising edge of "That's Me") was only a 2.275 us pulse for a "hang" event, i.e. it dropped at about tick 65. This 2.275 us timing was repeatable (not random). Note that this signal is "picked up" by Channel 7 at about tick 75, i.e. it was completely missed by Channel 7 for the "hang" event.

After trying a few things (i.e. replacing the BIT8 PAL in Channel 8 with a new BIT8 PAL, verifying that the Accelerator Clock and Turn Marker going to this MTG were OK, and noting that we had already replaced the PROM for this bank), we decided that the problem was somewhere in the MTG PCB, likely in the clock generation section of the board. The soldering on this board did not look very good: we couldn't see an obvious joint to re-touch; instead we saw a lot of iffy joints.
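A rough timing sketch of why the short Channel 8 pulse produced a hang. The tick numbers 65 and 75 and the 2.275 us and 3.5 us pulse widths come from this entry. The rate of about 27 ticks per us is an assumption inferred from the ERPBTG1C PROM description earlier in this entry (a 1 us Channel 5 pulse up at 45 and down at 72), and treating both pulses as starting on the same tick is also an assumption; the names below are hypothetical.

```python
# Assumption: ERPB MTG tick rate inferred from the ERPBTG1C PROM entry,
# where a 1 us Channel 5 pulse runs from tick 45 to tick 72.
TICKS_PER_US = 72 - 45      # ~27 ticks per microsecond (inferred, not measured)
CH7_PICKUP_TICK = 75        # Channel 7 "picks up" Channel 8 at about this tick
HANG_DROP_TICK = 65         # observed drop of the short Channel 8 pulse
HANG_PULSE_US = 2.275       # observed width of the short pulse


def ch8_drop_tick(pulse_us):
    """Estimate the tick where a Channel 8 pulse of the given width drops,
    assuming it starts on the same tick as the observed 2.275 us 'hang' pulse."""
    return HANG_DROP_TICK + round((pulse_us - HANG_PULSE_US) * TICKS_PER_US)


def transmit_trigger_fires(drop_tick):
    """Channel 7 only sees Channel 8 if the pulse is still high at its pickup tick."""
    return drop_tick > CH7_PICKUP_TICK


if __name__ == "__main__":
    for label, width_us in (("normal", 3.5), ("hang", HANG_PULSE_US)):
        tick = ch8_drop_tick(width_us)
        print(label, tick, transmit_trigger_fires(tick))
```

Run as a script this prints `normal 98 True` and `hang 65 False`: under these assumptions a full 3.5 us pulse is still high well past tick 75, while the short pulse is gone about 10 ticks before Channel 7 looks, so no Transmit Trigger is ever generated.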
We moved all of the components from the "old" ERPB MTG (S/N 24) to the new ERPB MTG (S/N 21; this card was once the L1.5 Framework Control MTG, and it was removed during troubleshooting but is actually thought to be 100% OK). After re-installing, we ran at about 50 Hz (with an additional 170 Hz of L1 rejects and 30-40 Hz of L1 accepts caused by other people running) for about 30 minutes with no errors.

Another thing that we need to do with the Logic Analyzer is collect snapshots of the normal running of the L1.5 Cal Trig. We should capture the following signals:
  - Front-End Busy
  - Something Happened
  - That's Me
  - Transmit Trigger
  - Answers/Dones to L1.5 Framework (or at least a time marker)
  - Start Digitize and Hold Transfer
We should get a snapshot of each of the following situations:
  - "N", "n", "I", "i", "F"
This is probably most simply done with no beam. It might also be nice to capture a normal mix of events during actual running.

We also thought of two different software-only ways to do the "Panic" mode operation. We could either let the 68k_Services CPU immediately say "yes" to the L1.5 Trigger Framework (at a cost of about 20 us deadtime; this is what the 68K_Services code currently does for a Mark and Force Pass event), or we could let the 68K_Services CPU wait until the DSPs respond, and then say "yes" (which would give the correct deadtime, about 130 us from "That's Me" to clearing the Front-End Busy). The first mode has the advantage of reducing deadtime (but note that it will give an unrealistic view of the real deadtime), while the second mode has the advantage of showing the real deadtime (but also imposes this deadtime on the experiment). We want to avoid changing hardware to operate in "Panic" mode.

Tier 2 -Px CAT2 replacement.
Replaced the CAT2 in slot 24 of the Tier 2 in rack M105. This is the -Px adder. Pulled CAT2 SN#90 and replaced it with CAT2 SN#69. These are Tier 2 ECO'ed CAT2's.
This is to repair the problem of sometimes being off by 8 counts in the Px sum. Philippe traced this to Px from eta -5:-8 phi 17:24, which is operand #8 on this CAT2 card. See entries in this log from 30-JUN-94, 25-MAR-94, and 16-FEB-94 for more background on this problem.

Run CalTrig_Random.
After replacing this Tier 2 -Px CAT2 card we ran 325k loops of the CalTrig_Random test. This required only 45 minutes to run. We had only 1 error during this run (EM Tower Count Ref_0 is 273 instead of 275). There were no momentum sum errors.
..............................................................................
Date: 30-JUN-1994  At: Fermi
Topics: Investigate MPt discrepancy (Jill P)

Jill Perkins et al (Nikos, Andrezj Zieminsky) compare the Level 1 Trigger AndOr Terms to the simulator results. They have a number of discrepancies that need to be investigated in detail after the L1.5 CT is delivered. Some are on terms actually used, others are on unused comparators, etc. One worrisome error is on MPt comparators, at the 10% level. All 3 programmed thresholds show problems, with rates decreasing with increasing threshold values. This could be a failure of the Px/Py trees to come up with the correct sum, or of the FMLN to do the comparison or hold its memory content, or...

To investigate this problem, we ran CalTrig Random Tests. The symptoms all point to the old problem with Px at (-5:-8,17:24), cf. entries from 16-FEB and 25-MAR. But this time the problem does not go away after pushing on the Tier #2 +Px card. It doesn't go away after reseating the Tier #1 Px card and connectors either. The next step is to replace the Tier #2 card, and try to look at the backplane pins with a flashlight. This isn't done today because we are ready to leave, and the chances of much beam in the near future are low. Also the error is only 8 counts = 4 GeV of Px. After failing to solve the problem, we successfully ran 900,000 loops of full random tests on all etas and 1/2 the phis (1:16).
..............................................................................
30-JUN-1994

Make tests of the trigger rate vs. FEBz% and BX_Lost_to_L15% for the various types of L15CT cycles.

First look at IBS events where all are CONFIRMED by the L15 Framework ("I").

          FEBz%
  Hz      from Geographic Section 5
------    ------------------------------------------------
  57      0.1%
  71      0.1%
  95      0.1% 3/4 of the time, 0.2% 1/4 of the time
 114      0.2%
          (now turn off sending to the Host to go faster)
 286      0.4% 3/4 of the time, 0.5% 1/4 of the time
 500      0.9% 3/4 of the time, 1.0% 1/4 of the time

We also looked at L1's FEBz% during the above sweep:

          FEBz%
  Hz      from Geographic Section 1
------    ------------------------------------------------
 500      2% +- 1/2%
 518      55%

Now look at IBS events where all are REJECTED by the L15 Framework ("i").

          FEBz%
  Hz      from Geographic Section 5
------    ------------------------------------------------
  5.8     0.0%
 284      0.3%
 572      0.7%
 954      1.1%   (here there is 0.33% dead BX during L15 cycles)
 960      59%    (this is the limit of printing "i"s on the screen)

Now look at That's_Me events where all are REJECTED by the L15 Framework ("n"). There are zero MFP events at this time.

          FEBz%                        Dead BX
  Hz      from Geographic Section 5    During L15 Cycles
------    -------------------------    -----------------
  5.7     0.0%                          0.1%
  57      0.1%                          0.7%
 238      0.3%                          3.0%
 477      0.5%                          5.9%
 954      1.0%                         11.8%

During the above scan there were zero L15FW exits via Timeout. During the above scan we also learned (or re-learned) that the Front-End Busy scalers do not increment during an L15 FW Decision Cycle.

Now look at That's_Me events where all are CONFIRMED by the L15 Framework ("N"). Do this by setting a CTFE Pedestal DAC up high. We used 0,169,33,0 and moved it from 35 to 255. We used a Reference Set in the L15CT that is flat at 2.5 GeV over eta,phi. There are zero MFP events at this time.
          FEBz%                        Dead BX
  Hz      from Geographic Section 5    During L15 Cycles
------    -------------------------    -----------------
  5.7     0.0%                          0.07%
  57      0.1%                          0.71%
 238      0.2%                          2.94%
 475      0.5%                          5.84%
 550      0.8%                          6.9%    (L1 FEBz = 60%)

Note that with the "n"s we ran at high rate (238, 477, 954 Hz) for some time with no problems, i.e. no "e"s. With "N"s we ran at 238 Hz and after perhaps 2 minutes had one "e". Then, running at 477 Hz for a couple of minutes, we had another "e".

Now look at MFP events where all are CONFIRMED by the L15 Framework ("F"). We do this by keeping "N" running and setting the MFP ratio to 0.

          FEBz%                        Dead BX
  Hz      from Geographic Section 5    During L15 Cycles
------    -------------------------    -----------------
  5.7      0.4%                         0.01%
  57       3.9%                         0.14%
  95       6.5%                         0.23%
 149       9.7%                         0.35%
 285      19.5%                         0.7%

Now we begin work with the Error_Recovery routine. First we note that the previously valid Wake Up word and the previously valid Transfer to 214 word need to be cleared before sending the IIOF2 interrupts to the 12 DSPs.

We cause an "e" by telling the ERPB_MTG to ignore its input lines. This causes an "e" and all appears OK. The second time we tell the ERPB_MTG not to follow its inputs, we get another "e" and then the system hangs. The 68K is executing code in the Complete VBD routine and is looking at Readout Control P2 for the DONE line from the VBD. Abort brings the 68K out at $95C7E.

Return to the "old" (hang on error) 68K_Services code and try to diagnose one of the "high-rate N" hangs. First, try low-rate "N" to prove that everything is working. This doesn't work: we get an "e" after 2 events. Try again multiple times; we keep getting "e"s very quickly. Hitting the L1.5 Cal Trig with hardware resets and re-downloading the MTG (as well as the Cal Trig) does not help. The diagnosis is the same each time: LDSP A4 (but ONLY LDSP A4) did not get some of its ERPB data. This was determined by looking at the DSP_to_68K Status longwords in VME memory via BUGMON. No DSP debugger was involved.
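In the "n" and "N" scans above, the "Dead BX During L15 Cycles" column grows roughly in proportion to the trigger rate. Dividing each dead fraction by its rate gives a dead time per L1.5 cycle of roughly 123 to 126 us for both scans.

That division relies on an assumption the log does not state: dead fraction = rate x (dead time per cycle). In other words, cycles never overlap and the system is far from saturation. The lowest-rate rows are left out because their percentages are too coarsely rounded. The names in the sketch are hypothetical.

```python
# Sketch: per-cycle dead time implied by the 30-JUN-1994 "Dead BX" columns.
# Assumption (not in the log): dead fraction = rate * dead time per cycle,
# i.e. L1.5 cycles never overlap and the system is far from saturation.

# (rate in Hz, dead fraction) rows from the "n" and "N" scans,
# leaving out the 5.7 Hz rows, whose percentages are too coarsely rounded.
THATS_ME_REJECTED = [(57, 0.007), (238, 0.030), (477, 0.059), (954, 0.118)]
THATS_ME_CONFIRMED = [(57, 0.0071), (238, 0.0294), (475, 0.0584)]


def dead_time_us(rate_hz: float, dead_fraction: float) -> float:
    """Dead time per L1.5 cycle in microseconds."""
    return dead_fraction / rate_hz * 1e6


def mean_dead_time_us(rows) -> float:
    """Average per-cycle dead time over (rate, fraction) rows."""
    return sum(dead_time_us(r, f) for r, f in rows) / len(rows)


if __name__ == "__main__":
    for label, rows in (("n", THATS_ME_REJECTED), ("N", THATS_ME_CONFIRMED)):
        print(label, [round(dead_time_us(r, f)) for r, f in rows])
```

The "F" scan is left out on purpose. Its dead fraction per event is much smaller, and the log does not say why.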
Next, hook up the DSP debugger to study this error more carefully. It happens again very quickly, and we see that the Rack 2 Total Et data for LDSP A4 is incomplete. The DMA List for this channel was 1 transfer from the end. After another reload and restart, the problem appears to go away. But not for long...

Start high-rate "N" tests with transfer to the host shut off. After 2 minutes at 238 Hz, it sticks. Abort the 68K and examine the symptoms: exactly the same (A4 Rack 2 Total Et not complete). Run some more tests, including setting an L1 HD Trigger Tower to have some Et (using a pedestal DAC). The "big" tower walks around a lot in A4, but not in A1 (where it shows up in the Rack 1 Total Et). The problem is eventually traced to another bad CRC->DSP cable. Replace the cable and look for the big tower: it is stable in both A4 and A1. Run at 28 Hz (transfer to the host turned on) for 20 minutes with no errors.

Begin high-rate "n" tests. At 286 Hz into the L1.5 Cal Trig (0 Hz out), we hung. Look at the DSP_to_68K Status words: NO LDSPs have seen their ERPB data. This is consistent with the hangs that Dan saw last month. We were using the SKIP_TEN_BX And-Or Input Term when the hang occurred.

Try to do the Bill Cobau download. Eventually we have success, but this is still clearly a weak link in the system. Collect a few hundred events on disk for Dan Owen. The L1.5 Cal Trig was accepting some events based just on beam noise, and Dan wanted to examine this phenomenon.

Load the crude Error_Recovery version of 68K_Services. Upon detection of an error, this version does three things:
   - clears the Transfer and Wake Up Words,
   - hits the DSPs with the Error interrupt,
   - then jumps to Orbit Master (i.e. it does not try to transfer the event).

Doing Error Recovery this way causes the L2 Sequencer console to generate one Token Loop Count Overflow for Crate 51 (us). We leave the Error_Recovery version of the code installed and let it run for 30 minutes. It successfully recovers from about 10 errors.
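The crude Error_Recovery code described above follows a fixed order:
   1) clear the Wake Up and Transfer words,
   2) interrupt all 12 DSPs,
   3) go back to Orbit Master without transferring the event.

The sketch below records that ordering. The classes and function are hypothetical Python stand-ins for the 68K code. They do not model real addresses or the IIOF2 interrupt mechanism. The only point is that the clearing comes before any DSP is interrupted, as the 30-JUN entry requires.

```python
# Hypothetical sketch of the crude 68K_Services Error_Recovery ordering.
# Names are illustrative; the real 68K code and addresses are not modeled.
from dataclasses import dataclass, field
from typing import List

N_DSPS = 12  # the log refers to "the 12 DSPs"


@dataclass
class SharedWords:
    """Stand-in for the Wake Up and Transfer-to-214 words in shared memory."""
    wake_up: int = 0
    transfer_to_214: int = 0


@dataclass
class Recorder:
    """Records the order of actions so the sequence can be checked."""
    actions: List[str] = field(default_factory=list)


def error_recovery(words: SharedWords, log: Recorder) -> str:
    # 1. Clear the previously valid words first, so no DSP acts on stale ones.
    words.wake_up = 0
    words.transfer_to_214 = 0
    log.actions.append("clear_words")
    # 2. Only then hit every DSP with the error interrupt.
    for dsp in range(N_DSPS):
        log.actions.append(f"error_irq_{dsp}")
    # 3. Do not transfer the event; return straight to Orbit Master.
    log.actions.append("orbit_master")
    return "orbit_master"
```

Recording actions in a list makes the ordering requirement testable without any hardware.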
We actually were there to see one "e" appear on the screen. We leave the L1.5 Cal Trig powered up and the 68K loaded, so people can do download tests (or whatever) over the next few days while we are gone. We have removed the PC. We leave the L15CT VBD NOT bypassed.
..............................................................................
29-JUN-1994   At: Fermi
Topics: Install repaired CRC, install last PROM in ERPB MTG, clean up cabling and patch panels at the back of M124, simple data transfer tests on new CRC, VSB hang debugging.

Installed the repaired CRC card in the lower CRC slot in the CRC/MTG backplane in M124. Dan cleaned up the cabling and patch panels in the back of M124 during this installation. The repaired CRC was able to grab all of the tokens from the DSPs (after running Load_12_Parameters, all 12 DSP Status to TCC Longwords were $00000000). We also installed the last ERPB MTG Type 2A PROM (for the slave channels for |eta| 17..20) in the ERPB MTG at this time.

We then arranged for the L1.5 Cal Trig to "parasite" on a low-rate, high-prescale Specific Trigger. This was done by replacing the Start Digitize to Geo Sect 5 input to ERPB MTG Channel #6 with just the Specific Trigger Fired signal for the chosen Specific Trigger (#1). We then set the "FORCE >=1 Term" switch on the front of the VME crate.

We then wanted to prove that all DSP "ERPB Input" Comm Ports were receiving 128 bytes of data on each event. We loaded the normal DSP code into the DSPs using the JTAG port. We DID NOT use any eta-coverage-reducing TAKE files on the debugger. We ran (with beam) an old "Steve private" version of a baby DSP control program in the 68K. This program is:

   [TRG_C40.FIRST_RELEASE.SOURCE_68K]NO_CHECK_12_DSP_CONTROL.ABS

Before running this program, use 135BUG to write $EF (as a byte) to $FFFFF047. This enables MVME-214 #2 to be a VSB (Load) Buffer.
The L1.5 Cal Trig happily cycled away at 0.6 Hz (the rate of our selected Specific Trigger), without trying to touch the VBD or read out, etc. This demonstrated that all ERPB input channels were receiving enough data to generate "DMA Finished" interrupts.

We then started looking at the data more carefully, trying to perform rationality checks on the incoming data. During this operation, we saw another "hang". As far as we can tell without the Logic Analyzer, it was exactly like the hangs seen yesterday at MSU, i.e. the type of hang which occurs if the Hydra tries to access non-existent VSB memory. Here is how we got into the hang:

   - DSPs were cycling with no problem
   - stopped triggers flowing to the L1.5 Cal Trig (switched into "not me" mode with the front-panel switches)
   - ABORTed the 68K
   - looked at some valid VSB addresses in an already-existing debugger window on DSP B2 (no problem)
   - opened a NEW debugger memory window on B2, pointed at different (but still valid) VSB memory: HUNG HERE

Looking at the VSB bus (only with the FLUKE meter), the hang looked identical to the "MSU hang":

   - PAS*  low
   - ERR*  1.529 V (on the DC setting; this is how it looked on the FLUKE during the MSU hang)
   - BUSY* low
   - BREQ* high

Trying to do a VSB read via the 135 hung the 135 in the traditional fashion. Reading the Hydra's 2400 chip (0x3ffe8000) both reported the error and terminated the cycle. That is, the 135 successfully completed its read, and the Hydra "mem" windows updated correctly, showing that they were not pointed at invalid addresses. During the hang, the Ironics which enables the VSB buffers was read. It showed that Buffer #2 was enabled for VSB. This still looks flaky, and it would be nice to verify that the hang never occurs once the JTAG debugger is removed.

Continuing the data sanity checks, we found that the DSP C4 Rack 2 Total Et data had the MSBit stuck high for all etas/phis. The DSP C1 Rack 1 Total Et (i.e. the other copy of these etas) looked OK.
The problem was traced to a bad CRC->DSP cable: the MSBit appeared open. The cable was visually 100% OK, and twisting the end a little bit allowed the signal to propagate. This cable was removed and replaced with a new cable. This solved the "bit stuck high" problem.

Note that the Reference Set data in the parameter files on the PC is not exactly correct. These Reference Sets are not used when the L1.5 Cal Trig has been loaded by TCC.

We loaded the new TCC code and LOADed and STARTed the L1.5 Cal Trig purely from TCC. We proved that the 68K_Services code can detect a "Global Stuck waiting for D3" error by halting a Local DSP. Then we detached the PC from the L1.5 Cal Trig.

We ran a few "rate" tests:

        MFP     Fire       GS 5    Transfer    L1.5 Dead
Mode    Ratio   Rate       FEBz    Rate        Crossings
----    -----   -------    ----    --------    ---------
 n       10     57 Hz      0.4%    5.71 Hz
 n      100     57 Hz      0.1%    0.57 Hz       0.7%
 n      100     570 Hz     1.6%    5.57 Hz       7.1%
 I      100     0.57 Hz    0.0%    0.57 Hz
 I      100     5.7 Hz     0.0%    5.7 Hz
 I      100     22.9 Hz    0.0%    22.9 Hz

What we really need to do is to run each case independently and alone, and plot rate vs. FEBz and L1.5 Dead Crossings. This is simple for "non-transfer" triggers, but we can't transfer to the host at high rate.

Note that, when we were running "n"s at 570 Hz, every "F" was followed by an "N." This makes sense: the L1.5 Cal Trig could not respond to the trigger following the "F" (it was busy doing its 1.5 millisecond transfer to the 214), so the L1.5 Trigger Framework times these triggers out. We also got "e"s on the screen twice. These "e"s seemed to be related to changing the trigger configuration, but we need to really understand them.
..............................................................................
28-JUN-1994   At: Fermi
Topics: Install *BGIN and *BGOUT pullups on Hydra-II cards.

Installed the (backplane-mounted) VSB *BGIN and *BGOUT pullup resistors on all Hydras in Crate 0.
Each Hydra slot gets three parts:
   - a 2.7k Ohm resistor from BGIN* (A31) to Vcc (B32),
   - a 2.7k Ohm resistor from BGOUT* (C32) to Vcc (B32),
   - an LED with a 470 Ohm resistor from Vcc (B32) to GND (B31).

The pullup resistors on the BG lines are the same as those used on the MVME135-1 cards. There are no pullups on the Hydra-II cards. The MVSB2400 uses an open-collector output on the BGOUT signal.
..............................................................................
Date: 28-JUN-1994   At: MSU
Topics: Try to produce VSB hang conditions while watching VSB with the Logic Analyzer

In the test backplane at MSU, we tried to produce hangs with symptoms similar to the hangs seen at Fermi last week. We were able to produce a hang with similar symptoms by trying to access non-existent VSB memory with the Hydra. When looking at the backplane with the Logic Analyzer, we saw:

   - PAS*  low (active) (this contradicts what we THINK we saw at Fermi last week)
   - BUSY* low (active)
   - BREQ* high (inactive)
   - ERR*  went low 102 us after PAS* went low (i.e. the 135 timed out the transfer), then high 102 us after that, and continued oscillating in this fashion forever

This indicates that the Hydra did not correctly deal with the ERR* signal going low. ERR* low is supposed to indicate a VSB bus error, and the active master should then terminate the cycle.

Reading the 135's 2400 register, we saw that it did NOT record a Timeout or a Bus Error (why???). Nor did it generate a Bus Error message on the screen; the best guess is that it will not generate a Bus Error message unless it actually initiated the bus transaction. Reading the Hydra's 2400 register both returned the ERROR* bit cleared (indicating that a bus error had been detected at some time) and ALSO TERMINATED THE VSB CYCLE. Trying to read a valid VSB address from the Hydra did not terminate the VSB cycle. Trying to read a valid VSB address from the 135 hung the 135 in the same way we had previously seen (no BUGMON message, ABORT pushbutton ineffective).
This occurs because there is no timeout on a VSB Mastership transfer when using the MVSB2400 chip. Breaking the VSB cycle (by reading the Hydra's 2400 register) then allows mastership to transfer, and the 135 concludes its cycle normally.

This hang condition was produced in two ways:
   - by using the Hydra debugger to read the VSB locations, carefully reading only one VSB location (verified by watching the Logic Analyzer);
   - by using a baby DSP program (still running under debugger control) to read the (invalid) VSB location.

Note that, when the 135 tries to read an invalid VSB address from either user code or BUGMON, the VSB cycle times out at the correct time, is terminated by the 135, and a Bus Error message is printed on the 135 tube.

Note that the Crate Controller (in the 135's 2400) sets ERR* low in both the Hydra-initiated "invalid" read and the 135-initiated "invalid" read, i.e. it never asserts ACK* in response to a VSB bus timeout.

Note that when the 135 tries to read an invalid VSB address, ERR* is low for only about 175 ns. There is no way that the bus cycle is being terminated by BUGMON intervention. The cycle is not concluded via the mechanism we used to conclude the Hydra-initiated cycle, namely reading the MVSB2400 Status Register. The 135 can't be doing anything meaningful in 175 ns.

Sometime it would be good to put two 135s in a VSB backplane (or swap the 135 and the Hydra) and see what a 135 that is NOT the Crate Controller does in response to a Crate Controller bus timeout. Would it act the same as the Hydra, or is the Hydra wired up in some funny way? All of this processing should be internal to the MVSB2400 chip, so it is hard to see what Ariel could have done to screw this up.
..............................................................................
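The 28-JUN Fermi entry notes two facts: the MVSB2400 drives BGOUT with an open-collector output, and the Hydra-II cards have no pullups. The toy model below shows why that combination matters. It uses hypothetical names and idealized logic levels.

An open-collector driver can only pull a line low. With nothing pulling the line high, a released line floats at an undefined level. That is at least consistent with the 2-2.8 V *BGOUT readings in the 24-JUN-1994 entry. The model also computes the current a driver sinks through one 2.7k pullup, assuming a 5.0 V Vcc.

```python
# Toy model of an open-collector, active-low line such as *BGOUT.
# Illustrative only: levels are idealized and VCC is an assumed nominal value.
from typing import Iterable, Optional

VCC = 5.0  # volts, assumed nominal backplane supply


def line_state(drivers_pulling_low: Iterable[bool], has_pullup: bool) -> Optional[str]:
    """Resolve an open-collector line.

    Any driver pulling low wins (a wired-OR of the active-low signal).
    If nobody pulls low, only a pullup makes the line a valid HIGH;
    without one the line floats and its level is undefined (None).
    """
    if any(drivers_pulling_low):
        return "LOW"
    return "HIGH" if has_pullup else None


def pullup_current_ma(r_ohms: float, vcc: float = VCC, v_low: float = 0.0) -> float:
    """Current (mA) a driver must sink through the pullup when it pulls low."""
    return (vcc - v_low) / r_ohms * 1000.0


if __name__ == "__main__":
    print(line_state([False, False], has_pullup=False))  # None: floating
    print(line_state([False, False], has_pullup=True))   # HIGH
    print(line_state([True, False], has_pullup=True))    # LOW
    print(round(pullup_current_ma(2700), 2))             # 1.85 (mA at 5 V)
```

The same resistor value on the MVME135-1, per the log, suggests the 2.7k choice is a conventional one. The sink current figure ignores the driver's real low-level voltage.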
Date: 22,23,24-JUN-1994   At: Fermi

22-JUN-1994   At: Fermi
Topics: Install baby VME rack, 2-headed TCC, upgraded Term Select Card, first meeting of TCC vs 68K vs DSP

Replaced D0HTCC with the new 2-headed TCC. We currently have only one 2-headed TCC and need to convert the "old" TCC to 2-headed operation. The clamp on D0HTCC has been removed, so it is fast to remove and replace D0HTCC if necessary.

Installed the baby VME crate (with the pVBA and VI-monarch cards) in the BA-23 box, and connected the VI-monarch to the (newly-installed) VI-serf in the L1.5 CT VME crate. Also installed the upgraded Term Select P2 Paddleboard in the L1.5 CT VME crate.

Loaded the new (2-headed [v6.0??]) TRICS code in D0HTCC (and also verified that the 1-headed TRICS code [v5.3??] can run in this 2-headed box).

We then tried to LOADCODE into the L1.5 Cal Trig. This was the first meeting of TCC and 68K_Services, and a few problems were found. We made a special version of 68K_Services, which moved the 68K-to-TCC Status Block from its actual home to the location where TCC was expecting it. With that version we were able to successfully LOADCODE to the L1.5 Cal Trig (i.e. no errors were generated).

Weird behavior of the DSPs with the debugger attached:
   1) at one time, everything was fine
   2) then DSP A4 can load code, but does not hear the load param interrupt from TCC
   2b) the load param 68K program shows the same thing
   3) the sanity/config startup program on Hydra A hangs at DSP 4
   4) only after "parking" A4 at PC=%X00000040 can the sanity/config run through
   5) DSP A4 still fails to hear the load param interrupt
   6) exiting the A4 debugger session made no difference
   7) exiting all debugger sessions made no difference either
   8) shut down the PC and unplug the pod: A4 now hears the load param interrupt,
   8b) but A1 does not loadcode
   9) run the sanity/config and all Hydra A DSPs are 100% happy again.
   10) reconnect the PC: A1 does not loadcode
   11) see where A1 is: it has bad memory controller setup words
   12) fixing the memory controller setup words helped see code, but A1 still won't load code
   13) run the sanity/config and all Hydra A DSPs are 100% happy again.

After the LOADCODE success, we tried to START the L1.5 Cal Trig. An "EVE Block Copy" bug in TRICS prevented us from STARTing any Local DSPs other than A1, B1, or C1. We thought we could first execute LOAD_12_PARAMETERS to load the DSPs, and then START would produce no errors. That trick did get the DSPs to load their Parameters. But when the TCC downloads parameters to the Shared Dual Port Memories, it overwrites the DSP_to_TCC and DSP_to_68K Status Longwords, so START still produced errors. Note that this is actually the desired, designed-in behavior of START.

Note that, to run the L1.5 Cal Trig with reduced eta coverage, a TAKE file is required. That means that manual intervention via the PC-based debugger is required sometime after the LOADCODE occurs but before the first event flows through the L1.5 Cal Trig.

We saw some problems with access to the VSBs via the DSP Debugger. More on this later.

After the accelerator broke, we continued to check TCC vs. 68K vs. DSPs. We were able to verify two things:
   - the Pass_one_of_(N) counter was correctly read by the 68K (after a minor code change...);
   - the Frame Parameter and Tool Parameter sections of the Data Block were being correctly programmed by the 68K.

We then tried to actually take some data with the L1.5 Cal Trig, to prove that LOADing and STARTing via TCC were actually working. That failed. After LOADing and STARTing, the L1.5 Cal Trig would "transfer" 2 events and then hang with 100% Front-End Busy, and the 68K would not respond to its ABORT push-button.
We tried various combinations of TCC-based STARTing and of Load_12_Parameters, and also removed lines from 68K_Services. After that, we thought that the problem was related to the Load_Parameters interrupt processing in the 68K. We removed the VSB writes from Load_Parameters, and TCC-based STARTing appeared to work OK. We then restored the VSB writes, but left the VSB pointing at Buffer 2 (not Buffer 1), while keeping the Which_214_Is_Load_Buffer flag pointing at Buffer 1. Again, this worked OK. Why??!?

Some further notes on the above problem: it appears that the first VSB writes by the 68K AFTER the VSB writes in Load_Parameters were "hanging." This is funny for 2 reasons:
   (1) if something is being screwed up in VSB by the Load_Parameters interrupt, why should the VSB writes in Load_Parameters succeed?
   (2) the VSB bus isn't supposed to hang on a normal write cycle; instead, the MVSB2400 on the 135 is programmed to bus-error after some amount of time.

Is there something funny in mastership arbitration? This also seems unlikely, because the Global DSP should not have requested mastership at this time. This problem is probably related to Steve's earlier difficulties accessing VSB space via the DSP debugger.

Also, the new TCC with the new code went into KERNEL EDEBUG. See the TCC logbook.

23-JUN-1994   At: Fermi
Topics: More on VSB hang problem, install all remaining CRC->DSP cables, copy M111 data to M112 for a test

Even more notes on the VSB hang problem: today we were totally unable to re-create the VSB hang problem. We returned to the "old" 68K_Services and did things which absolutely hung the system last night (RESET, g 95000, LOADCODE, START, ABORT, take eta_1_8.tak, prun -r, g 95000). This worked fine. We were not able to try to transfer events, because the accelerator was running all day. We need to understand this problem: what is the real problem? We may only be looking at symptoms.
We met with Bill Cobau for a few minutes to make a special run request and to describe the test that we want to do. Bill claimed that he would make up some trigparse files. We asked him for the output from COORSIM for these files, as a way to verify that the COOR-to-TCC messages would be rational. He did not provide these. None of us want to waste beam time debugging COOR-to-TCC messages, so we did not actually request the special run.

The new TCC with the new code went into KERNEL EDEBUG again, in a similar (but not identical) way as yesterday. We put the old code in the new TCC as a way to see whether the problem is in the new code or the new box. See the TCC logbook.

We installed the remaining CRC->DSP cables. This installation is now permanent. It will be a major job to remove any Hydra-II cards now.

Dan made a "splitter" cable to feed the ERPB data from M111 to the CRC channels for both M111 and M112. This way, we will have data being fed to all DSPs, and we will not need to do anything "funny" during TCC-driven LOADs and STARTs.

Steve fired up the DSPs to see whether they could run with the "old" No_Check_12_DSP_Control program. The answer is yes, as long as one of the VSB buffers is enabled "by hand." In the process of making this test, we noticed that the DSPs connected to the new CRC were still making Token Warnings at Load_Parameters time. After poking around, we realized that the problem was that the new CRC is missing its 1 MHz crystal. We could not find a 1 MHz rock, so we installed a 2.5 MHz rock in the CRC instead. This did not work (see further entries for 24-JUN-1994).

24-JUN-1994   At: D-Zero
Topics: More attempted CRC repair, continue "VSB hang" diagnosis, try to fully download the L1.5 Cal Trig from a Cobau-generated file.

We continued the "VSB hang" diagnosis. We used the "canonical" 68K_Services code (not any of the funny "this.abs" code). Upon first starting the L1.5 Cal Trig from power-up, we did not experience the VSB hang.
We did experience the VSB hang once, after trying to start the L1.5 Cal Trig without first using the "take eta_1_8.tak" command. The hang occurred after we recovered from the normal "D3 hang" that is expected when the reduced eta coverage take file is not used.

We looked at some VSB signals both before and during the hang. The only thing that looked funny (both before and during the hang) was the *BGOUT from Hydra-B. It was at about 2-2.8 V, i.e. not really a good TTL level. This is also what the *BGOUT from Hydra-C and Hydra-A were doing. We removed the Bus Grant jumper between Hydra-B and Hydra-C.

We tried single-stepping through the 68K_Services code. We determined that the hang occurs immediately after the 68K tells the Global DSP to transfer data to the MVME214. Note that this is the first VSB write in "cyclic" processing (the 68K does VSB writes during Load_Parameters processing; these appear to work). Note also that the Global DSP is requesting VSB Mastership at this time. Steve looked at the DMA List and noticed that NO data had been successfully transferred to the MVME214 by the Global DSP.

We re-started the L1.5 Cal Trig, and stopped the 68K just before it tried to tell the Global DSP to transfer to the 214. Using the DSP debugger, we looked at VSB memory via the DSP (i.e. requested bus mastership again). We expected a hang, but it did not happen. We were able to transfer VSB mastership back and forth between the 68K and the DSP. Also, when we released the L1.5 Cal Trig to run at speed, there were no hangs. We have not seen a hang since. What is going on? Note two differences in our single-step test: the DSP got VSB mastership not in conjunction with a DMA list, and a READ (rather than a WRITE) followed the Mastership transfer.

We tried to use the Bill Cobau-provided configuration file to LOAD and START the L1.5 Cal Trig (as well as set up the rest of the data acquisition system). There were a few errors discovered during this test, but we were able to work around most of them.
This configuration file was more complex than we would have liked, though, because it had multiple Specific Triggers digitizing Geo Sect 5. This caused the L1.5 Cal Trig to hang, because the Path Select paddleboard was not being used.

We then looked again at the lower CRC, and noticed that another jumper wire (the VME_RESET-to-VCC wire) was missing. Steve installed this jumper, quickly examined the CRC for other problems, and then re-installed the CRC. It still did not grab tokens from the DSPs. We connected a "lower CRC" DSP to the upper CRC to verify that the problem was not in the DSP. We tried making a crude switch to replace the crystal. This did not work either.

Upon (another) examination of this CRC, we noticed that its Token Grabber pin #1-to-VCC jumper cosmetic traces have not been cut (as described in the CRC Description). This would absolutely cause the Token Grabber PALs to not grab Tokens. We probably wrecked the 2.5 MHz crystal, but what problems were caused by the switch, which tried to ground these pins? Steve checked the ECOs on this card vs. the ECOs in the CRC Description, and the un-cut traces were the only remaining problem (that Steve could find). We are returning this card to MSU for repair and Token Grabber PAL testing. This test is easy to do at MSU.

We then switched to using the Path Select Paddleboard. We were able, using 4 different TAKERs, to produce all normal paths (F,N,n,I,i) through the L1.5 Cal Trig. This was done by programming 4 Specific Triggers, each of which always produced the same result. The MFP ratio was set to 100. We started examining rates and dead times, but ran into problems during this process. The probable cause is that the Specific Triggers did not have "SKIP_n_BX" And-Or Input Terms defined, and their absence interferes with the Transmit-Trigger generation of the L1.5 Cal Trig.
Here are the rates we found:

  ST 0     ST 1     ST 2     ST 3
  Normal   Normal   IBS      IBS                  L1        L1
  Fail     Pass     Fail     Pass                 Trigger   TRANSFER
  (n)      (N)      (i)      (I)      FE Busy     Rate      Rate
  [Hz]     [Hz]     [Hz]     [Hz]     GS 5        [Hz]      [Hz]
--------------------------------------------------------------------
  6        5.6      5.2      5.7      0.1%        22.4      11.4
  61       6        54       5.5      0.3%        126       12

With some energy in the detector (but only protons), and using the Skip_10_Bx And-Or Input Term on each of the Specific Triggers, we took the following data:

                                                  L1        L1
                                                  Trigger   TRANSFER
  ST 0     ST 1     ST 2     ST 3     FE Busy     Rate      Rate
  [Hz]     [Hz]     [Hz]     [Hz]     GS 5        [Hz]      [Hz]
--------------------------------------------------------------------
  474      8.2      430      9        14.7%       916       153

Note that the STs did not map directly onto n, N, i, I: some of our desired n's became N's. Also note that we were not using the Term Select Paddleboard during this short run. Instead, we had thrown the >=1 Term Selected switch. The only characters on the 68K_Services screen were lower case "n" and lower case "f". This is consistent with throwing the >=1 Term switch. It is good to note that, using Skip_10_Bx, we were not hanging the L1.5 Cal Trig.
..............................................................................
Date: 17-JUN-1994   At: Fermi
Topics: Replace CAT2 at Tier 1 Py eta -13:-16 phi 17:24, Install/rework L15CT cables into M124, Power Supply voltages in L15CT, Install Bus Grant wire in upper L15CT crate, Install Panduit plastic cable tray on front of M124 C-channels, Verify D0HTCC - BA23 configuration.

During the day of 16-JUN-94, we had some mail messages from TRICS about problems initializing some L1 Cal Trig registers. Looking in the Trics_Log, one saw errors at:

   CBUS 1 MBA 207 CA 20 FA 16, 19, 22, 25, 28, 17, 20

e.g.
   WRITE 0 READ 63

and

   %% time: 17-JUN-1994 01:00:40.64
   E-HIO/HDB%COORini% Failure Writing 63 @ cbus 1 mba 207 ca 20 fa 23 read 0
   E-HIO/HDB% (cont) Data = Out 00111111 In 00000000 Mask= W 00111111 R 00111111
   E-HTT/ODB%COORini% Failure Programming Lrg Tile NEG,E_13_16,P_17_24,REF_1
   E-INI/ODB%COORini% Failure Initializing Large Tile NEG,E_13_16,P_17_24

These errors are from the CAT2 card that services eta -13:-16 phi 17:24 PY Tier 1. Pull CAT2 SN#254 from service at eta -13:-16 phi 17:24 PY Tier 1. Try using CAT2 SN#176. It makes lots of readback (and other?) errors. Pull it. Install CAT2 SN#178. SN#178 looks OK so far. Return CAT2s SN#254 and SN#176 to MSU for repair and testing.

While looking in the Trics_Log I also see from time to time:

   %% time: 17-JUN-1994 01:50:51.64
   S-INI/ODB%COORini% Initializing all Specific Triggers
   E-HIO/HDB%COORini% Previously 11 instead of 15 @ cbus 2 mba 129 ca 8 fa 14
   E-HIO/HDB% (cont) 00001011 i/of 00001111 Msk= W 11111111 R 11111111, Writing 240

Install the power cables for the ERPB-MTG and the 2nd CRC. Straighten up the cables coming into the top of M124 and their flow down the side.

Check the L15CT power supply voltages:
   On the CRC card:         +5.024 Volts and -4.506 Volts
   On the MTG card:         +4.968 Volts and -5.194 Volts
   Power Pan test points:   +5.196 Volts, -2.027 Volts, -4.520 Volts and -5.406 Volts

Install the Bus Grant Priority 3 jumper wire from Slot #9 (MVME135-1) pin P1-B11 (BG3Out) to Slot #1 (Vert Inter Slave from TCC) pin P1-B10 (BG3IN). Verify that the BG3 jumper is removed from Slot #8.

Install the plastic Panduit cable tray, mounted on an aluminum bar, on the front of the "C" channels on M124 to hold the rest of the CRC to DSP cables. I put on only one of the two cable tray sections because there is limited vertical space.

Verify the D0HTCC - BA23 physical stack configuration. The BA23 is on top. The 4000 box can be removed without taking the BA23 and its shelf off of the stack.
The clamp over the top of the 4000 box can be removed without taking anything else off.

Remember to purchase an "A", "B", "C", "D" switch for the 68K and the 3 Hydras.

Take the Term Select P2 card to MSU to have the 16 address readout wires added.

Take all remaining CRC to DSP cables to MSU to have their labels changed, to get ready to install them.
..............................................................................
Date: 8,9,10-JUNE-1994   At: D-Zero Hall
Topics: Test ERPB to DSP data transport in the eta ranges +9:+12, -9:-12, +13:+16. Edit L1.5 CT Driver Documents. Cable installation work. Move the CRC. Install ERPBs/DCs in Racks M110 and M111. Install the L15CT cables to M111 and M112 (ERPB-MTG and DC to CRC). Repair the CTFE that services -12,11 HD.

Dan put connectors on the M124 end of the DC --> CRC cables from M107, M108, M109, and M110. The DC->CRC cables from M109 and M110 are not yet threaded through the clamp in M124. Dan also installed the DC --> CRC and MTG --> DC cables for Racks M111 and M112. These cables are not yet routed through cable clamps at either end, but they do have connectors installed.

We removed the power cable for the ERPB MTG. This cable will be modified (made beefier) at MSU. This is being done to allow us to equalize the +5V between the CRC cards and the ERPB MTG. Currently the +5V on the MTG is about 300 mV lower than the +5V on the CRC. When we re-install this cable, we will also install the power cable for the "lower" Crate 0 CRC. We will also thread the L1 <--> L1.5 cables described above through the cable clamp in the top of M124.

We did not have the 2nd CRC card, so we did not plug the ERPBs into the "correct" DSPs. Instead, we used the following DSPs:

              /---> DSP A1  Rack #2
   M109  --<
              \---> DSP B3  Rack #1

              /---> DSP B3  Rack #2
   M108  --<
              \---> DSP B4  Rack #1

              /---> DSP B4  Rack #2
   M107  --<
              \---> DSP B1  Rack #1

We tested these 3 ERPB -> DC -> CRC -> DSP data transport paths at a low trigger rate (about 1 Hz).
We used the standard "pallet" files:

   L15CT_PALLET_Mxxx_*.DAT   where xxx = Rack Number and * is one of
      80      (EM=TOT=$80 everywhere)
      7F      (EM=TOT=$7F everywhere)
      55_AA   (EM=TOT=$55/$AA alternating)

Recall that these 3 files test each bit high, each bit low, neighbor bits shorted, and also maximum switching between consecutive eta/phi time slices during the data transport phase.

We found no shorted bit or switching problems in Racks M107, M108, M109 during this test. This test was done using the standard 68K_Services code in the MVME135, and the Trigger Tower data was examined by hand from the DSP DeBugger (by looking in the DeBug section of the Data Block). We have only looked at a few events this way. We did not use any of Steve's old "Learn and Check" 68K code, because that code only knows about the DSPs; it does not know about the rest of the hardware which is now installed in the L1.5 CT VME Crate.

We have not yet looked at the M106 ERPB data since Dan installed the special cable for the Trigger Towers at eta = -5, -6, phi = 13, 14, 16.

We still have a stuck bit in Rack M103 (see the electronic logbook dated 22-25 MAR 1994 for details of this stuck bit). This stuck bit is the only known problem with the L1 Cal Trig to L1.5 Cal Trig data path (for Racks M103 to M109).

We will lay out the CRC/MTG backplane in M124 as follows (note that this involves moving the currently-installed ("upper") CRC, which we did):

   Slot #1 (top)      unused
   Slot #2            CRC for Racks M104, M103, M106, M105, M108
   Slot #3            unused
   Slot #4            CRC for Racks M107, M110, M109, M112, M111
   Slot #5            unused
   Slot #6            unused
   Slot #7            ERPB MTG
   Slot #8 (bottom)   unused

This will probably change if/when L1.5 Cal Trig Crate #1 is built up.

We installed (9-JUN) ERPBs and DCs in Racks M110 and M111. The installation in M110 is 100% complete, but M111 requires a little bit of clean-up work. The DC-to-ERPB Parallel Timing Cable in M111 still needs a terminator, and some cables in M111 need a final taping into position.
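The claim that the three pallet files between them cover each bit high, each bit low, shorted neighbor bits, and maximum switching can be checked with a little bit arithmetic. A Python sketch, assuming an 8-bit EM or Tot Et word per Trigger Tower and that the 55_AA file alternates $55 and $AA between consecutive eta/phi time slices; the helper names are hypothetical:

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # 0xFF

# Data words used in each pallet file (L15CT_PALLET_Mxxx_*.DAT)
PALLETS = {"80": [0x80], "7F": [0x7F], "55_AA": [0x55, 0xAA]}


def every_bit_both_states(words: list[int]) -> bool:
    """True if, across all the words, every bit is seen both high and low."""
    seen_high = seen_low = 0
    for w in words:
        seen_high |= w
        seen_low |= ~w & MASK
    return seen_high == MASK and seen_low == MASK


def adjacent_bits_differ(word: int) -> bool:
    """True if every pair of neighboring bits has opposite values, so a short
    between any two neighbors would force them equal and show up as an error."""
    pairs = MASK >> 1  # one flag per neighbor pair: 7 pairs in an 8-bit word
    return ((word ^ (word >> 1)) & pairs) == pairs


def bits_toggled(prev: int, curr: int) -> int:
    """Number of bits that switch between two consecutive time slices."""
    return bin((prev ^ curr) & MASK).count("1")


if __name__ == "__main__":
    print(every_bit_both_states(PALLETS["80"] + PALLETS["7F"]))  # True
    print(every_bit_both_states(PALLETS["80"]))                  # False
    print([adjacent_bits_differ(w) for w in PALLETS["55_AA"]])   # [True, True]
    print(bits_toggled(0x55, 0xAA))                              # 8
```

Neither $80 nor $7F alone has all neighbors opposite, which is why the 55_AA file is needed for the shorted-neighbor and switching checks.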
We still need to install ERPBs and a DC in Rack M112. The ERPB and DC installation seemed to go a bit faster this time, about 2.5 to 3 hours per rack. We have done no data path testing of Racks M110 and M111.

When we applied power to the Cal Trig, we first turned on the upper backplane in a rack, and then turned on the lower backplane. In some racks, the DCs had successfully downloaded the ERPB LCAs (i.e. the yellow LED was not lit) without needing a push of the button, but in other racks they hadn't. In all racks, we still pushed the "download LCA" pushbutton on the DCs. In Rack M103, the yellow LEDs flashed OFF briefly when the button was pushed, but then came back on. This is the same problem we saw earlier on Racks M103, M104, and M105. The best guess is that there is a problem with the DC (because ALL yellow LEDs in M103 acted the same). These 3 DCs (M103, M104, M105) are all "prototype" DCs, which have PCBs identical to the "production" DCs, but are not assembled as nicely. We have 13 production DCs, so we could replace these cards the next time we have power off in the Cal Trig. Would that solve the problem???

Edit:
   L15CT_Shared_Dual_Port_Memory_Map.txt to indicate the new, larger 68k_Services to TCC Status Block.
   Wakeup_L15CT_Outline.txt to indicate that TCC does not write the Frame Parameter Section or the Tool Parameter Section of the L15CT Data Block into the MVME214 memory modules, but rather 68k_Services does this in response to a Load_Parameters interrupt.
   Status_68k_to_TCC.txt to standardize the status codes from 68k_Services to TCC and to add "un-stick DSP" information to this status block.
   Services_68K_Draft_1.txt to describe the "un-stick DSP" (which should really be called the "system-level automatic error recovery") steps. This includes adding information both in the Unstick_DSP description (to describe how to unstick) and in the house-keeping and linear code sections (to describe when to unstick).
   L15CT_Data_Block_Section_Layout.txt to indicate that the Frame Parameter and Tool Parameter Blocks are written by the 68K (not the TCC) when it receives the "Load Parameters" Interrupt from TCC, and also to provide a DSP vs. VME vs. VSB address map of the Data Block in the MVME214.
   Initialize_DSPs_Via_TCC.txt to clean up the details of TCC actions to the DSPs during "Wakeup" time. Note that this file does not describe the overall system flow (that is the job of the Wakeup_L15CT_Outline.txt file), but only talks about the relationship between TCC and the DSPs at Wakeup time.

Install the ERPB-MTG cable to racks M111, M112. This cable is 43 sections long. The extra length in this cable is randomly (but orderly) folded up in the tray above M112. Note the entry from 7-APRIL-1994 specifically about the differences in ERPB-MTG cable lengths (to compensate for the Cal Trig MTG cable length differences). Sometime this needs to be looked at again, because it looks like the skew in cable lengths between M103:M106 and M107:M110 may be backwards. This +-10 nsec effect does not make much difference for 3.5 usec running, but it may need to be looked at in the future for high luminosity running.

Install the DC to CRC cables to M111 and M112. The cable to M111 is 28 sections long and the cable to M112 is 31 sections long.

Repair the CTFE that services -12,11 HD. This is CTFE SN#246. About 2 or 3 weeks ago -12,11 HD drifted high by a couple of counts for a run or 2. Joan Guida watched it and it went away. It drifted up again in the run last night. Joan showed it to us and we could see it in TrgMon ADC Counts. It was showing 15 or 16 counts. This is very close to being in trouble. We pulled it and replaced the 0.1 ufd cap on Ch#4 HD. This fixed the problem. We also replaced the cap on Ch#2 EM because it looked a bit funny on the Fluke ohm meter.
..............................................................................
Date: 7-June-1994 At: MSU
Topics: Experiments with DSP DeBugger and TCC

Steve and Philippe tried loading and starting the DSPs from TCC both with and without the DSP DeBugger. We also tried various combinations of starting, stopping, and disconnecting the DeBugger to see what problems arise. Here is a summary:

- Loading and Starting code from TCC, with the DeBugger present the entire time, works IF the DeBugger has left all DSPs (that will be Loaded and Started) in the RUNNING state.

- The safest mechanism is to begin running the DeBugger, verify that all DSPs are in the RUNNING state (or put them in the RUNNING state), and then issue one or two (or more) VME SYSRESETs, watching both the Dual Port Access LEDs and the Serial Port of each DSP to verify that the Sanity and Configuration Checker executes successfully. Then verify that all DSPs are in the "do-nothing dead loop" of the Sanity and Configuration Checker (which is at about 2ff931h).

- If any DSP has been HALTED by the DeBugger, there is NOTHING TCC can do to make the DSP run again, i.e. the DeBugger HALT has priority over both the Boot Control Register RESET and the hardware VME SYSRESET.

- In this case, either the DSPs must be put into the RUNNING state via the DeBugger, or the DeBugger must be physically removed from the system (i.e. turn the PC off and unplug the DeBugger pod from the PC).

- The DeBugger cannot debug any DSP which has been placed in RESET by the TCC (via the Boot Control Register). The DSP must be unRESET, either via the Boot Control Register or through a VME SYSRESET.

- The DeBugger can be both exited and physically disconnected from the DSPs (i.e. PC turned off and DeBugger pod unplugged from the PC) while programs are running on the DSPs, or while the DSP is halted. In either case, the DSP is left in the RUNNING state (so, for example, the TCC can Load and Start the DSPs).

- The DeBugger can be physically re-connected and started while programs are running on the DSPs. The general context of the DSP is not lost when this is done, but the DSPs may move from the RUNNING state to the HALTED state. This is perhaps the known problem between PDM and the individual DeBuggers: PDM will think that the DSPs are RUNNING, while the individual DeBuggers will think that the DSPs are HALTED. The DSPs actually appear to be HALTED. I think that in order to move the DSPs to the RUNNING state, it is best to first single-step each DSP a few times. I have sometimes had problems telling the DSPs to RUN immediately after physically reconnecting the DeBugger.
..............................................................................
Date: 6-June-1994 At: DZero
Topics: Replace Power Pan in Upper Tier 1 and MSU in M107.

About 10:30 Dan Owen called from Fermi. The -4.5 V supply in the upper Tier 1 Power Pan in M107 had died. He and Mike Matulik will work to replace it. There were 2 or 3 hours of the store left to go and Muon wants to use it. Thus they will turn off all of L1 Cal Trig but keep the FWs running. This has the problem that the L1 VME Transfer Computer will find lots of Pilot COMINT errors (really first read vs last read errors in the L1 Cal Trig CTFE ADC data). The 4 usec required to put these on the screen will crash L2 (i.e. if it ever needs to resync a data cable it will never be able to, a new "feature" of L2 code). So in TrgCur: there is a new special version of RunMe68020.ABS called RunMe68020.abs_No_Cal_Trig_Error_Check. They load this version of VTC code to use while L1 Cal Trig is turned off. The plan is to maintain this version of VTC code to run when L1 Cal Trig is off.

Dan and Mike replaced the Power Pan within a couple of hours and TrgMon says that L1 Cal Trig is running OK again.
..............................................................................
Date: 1:3-June-1994 At: D0-Hall
Topics: Talks at the D0 Collaboration Meeting, Install Norm Amos scaler, Install ERPBs in M109, Fix ERPB cabling in M106, Files that can be deleted.

At the Collaboration Run Meeting Dan Owen gave an L15CT talk showing 1x2, 3x3 Electron results.

Presented stuff at the first Blazey UpGrade Trigger meeting.

Had the first meeting of the Electronics Board for Run II.

Files that can someday be deleted:
   In TrgL15CTHST:
      CALOR$1_078267_01.X_ZRD01;1                    4095/4095    9-MAY-94
   In VWork1:
      L15CT_THATS_ME_MFP_ZBD.TXT_29_APR_1994;1        405/405    29-APR-1994
      L15CT_THATS_ME_NORMAL_ZBD.TXT_29_APR_1994;1     320/321    29-APR-1994
      L15CT_ZBD_IBS_29APR94.TXT;2                      44/45     29-APR-1994
      L1_L15CT_PRISTINE_ZBD_RAW.TXT;2                 721/723    29-APR-1994
   This is about 5585 blocks.

Modify Scaler #1 on DBSC card SN# 06 so that it has a hardware reset input and so that its clock signal comes from an on-board 10 MHz crystal oscillator. The details about how to make this modification are given in the DBSC text file in TrgHard:[DBSC]. This is for Foreign Scaler #36, which Norm Amos will use to show the location of the Main Ring Beam with every event to the nearest 100 nsec tick. This scaler is called "time_from_micro_blank" and its hardware reset signal comes in via the 9th Lemo from the top on the NIM to ECL module in M122. Note that there is a problem about the increment of this scaler (from its 10 MHz oscillator) vs the Latch-Shift from the Framework Main Timing MTG.

Install ERPBs into rack M109. The DC is SN #3 and it still needs its DIP switches set. The top ERPB is SN# 64, then 63, 62, 61, 59, 58, 57, and the bottom ERPB is SN# 56. All the yellow lights went out when I pushed the LCA load button on the DC.

In rack M106 the -6,14 Trig Tower has munged pins on bits 6 and 7. See the log entry from 6-APRIL-1994 for details. Today I installed a special cable to try to get the Trig Tower connected up. Note this is why Trig Towers eta -5 and -6 Phi 13,14,16 have not been reading out (i.e. all these cables were left off to give access to -6,14).
..............................................................................
Date: 24-25-MAY-1994 At: Fermi
Topics: Beam run for L15CT 1k events, Ariel visit, TCC boot, Disconnecting the C40 Debugger, Term-Sel P2, Special ERPB Cable.

24-MAY-1994
-----------
Ran in beam of luminosity 2.8. See the TrgMon Dumps in VWork1:TrgMon_Dump.TXT_L15CT_BEAM_24MAY94. About 1000 events on disk for this run, 79226.

25-MAY-1994
-----------
Visit from Ariel sales person Dion Messer. Need to send her stuff about the multi JTAG and more written stuff, e.g. IEEE and TI papers.

TCC or network or COOR have some kind of a problem. COOR says timeout waiting for Acknowledge at the end of a run (79254). Now if you do a
   $ dirs D0HTCC::DUA0:[Trigger]
then after 30 seconds it says
   -RMS-F-Net, Network operation failed at remote node; DAP code = 01F77C54
At the same time as this started 12 L2 nodes crashed and had to be retriggered. Booting TCC (just an NCP Trigger boot, NOT a power cycle boot) fixes the problem.

More tests of disconnecting the debugger from the DSPs:
   Say "Quit" to PDM and all DSPs are still OK.
   Shut down the IBM/PC and all DSPs are still OK.
   Disconnected the cable from the PC to the Pod and all DSPs are still OK.
   Reconnected the cable from the PC to the Pod and all DSPs are still OK.

Pulled the Term Select P2 card to verify how it is wired. I put notes on this in the P2 note book. Reinstalled this P2 but left power off to the L15CT VME crate and to the ERPB-MTG, CRC.

Take PC back to MSU.

Recall that the ERPB servicing eta -5,-6 phi 13,14,16 needs a special cable made for it to fix the backplane problem.
..............................................................................
Date: 23-MAY-1994 At: MSU
Topics: Return Hydra-II to Ariel

Return Hydra-II serial number 7010 (MSU#1) to Ariel for repair of its NMI to DSP #2 problem.
..............................................................................
Date: 18,19,20-MAY-1994 At: FERMI
Topics: New DSP code 1x2,3x3, Repair a Trig Tower, Clip pins for more ERPBs, Replace Hydra-II "A", Test runs and beam run.

19-MAY-1994
-----------
Start running L15CT with the code that has the new ISR routines. So far it looks OK.

Repair Trig Tower +9,13 EM. It was reported at 200% in the pulser run. It had a Term-Attn with a cold solder joint.

Look at Trig Tower +18,4 EM. It was reported at 50% in a pulser run. I checked it with the test pulser, looked at its switching noise waveform compared to its neighbors, and checked it with the Fluke for shorts and opens. All looks OK.

Worked with Dan Owen clipping pins. M109 and M110 are now clipped. Strung in the ERPB MTG cable for M107:M110. Strung in the DC to CRC data cables for M109 and M110.

Pulled out DSP "A". Pulled Ariel SN 7010 (MSU SN#1) and installed Ariel SN 7047 (MSU SN#4). This fixed the "A2" not responding to interrupt problem.

Install DC cards in M107 and M108. At power up and DC button push all ERPBs appear to load OK, i.e. the yellow LEDs go out.

Recall that the ERPB at eta -5,-6 phi 13,14,16 has a problem and that I should have replaced it today while I had the chance. See dsfasd entry.

20-MAY-1994
-----------
Ran overnight and stuck after 120k events. 68k_Services was looking at DSP B and came out at $96030.

   B2  Check Reported Comm Ports
       R6=0
       32, 0, 0, 19, 64, 64   <-- values in 4th DMA Cntrl LW for Comm Ports 0:5

   A2  Wait for Previous DSP Data        C2  Wait_for_Previous_DSP_Data
       R6=0                                  R6=0
       32, 0, 0, 19, 20, 20                  32, 0, 0, 19, 20, 20

   A1  Check_Reported_Comm_Ports         C1  Wait for Sync
       R6 = 0
       0, 20, 0, 20, 20, 20,

Start again and Hang after about 5500 events at 14 Hz. It had just finished an "F" transfer. 68k-Services was looking at DSP B. It came out at $9603A.
   A1  Check_Reported_Comm_Ports         C1  Wait for Sync
       R6 = 0                                R6=0
       0, 20, 0, 20, 20, 20                  0, 20, 0, 20, 20, 20

   A2  Wait for Previous DSP Data        C2  Wait_for_Previous_DSP_Data
       R6=0                                  R6=0
       32, 0, 0, 19, 20, 20                  32, 0, 0, 19, 20, 20

   A3  Wait for Sync                     C3  Check_Reported_Comm_Ports
       R6=?                                  R6=0
       20, 20, 20, 20, 0, 19                 20, 20, 20, 20, 0, 19

   A4  Wait for Sync                     C4  Wait_for_Sync
       R6=0                                  R6=0
       20, 0, 20, 0, 20, 20                  20, 0, 20, 0, 20, 20

   B1  Check_Reported_Comm_Ports   R6=0   0, 20, 0, 20, 20, 20
   B2  Check Reported Comm Ports   R6=0   32, 0, 0, 19, 64, 64
   B3  Check Reported Comm Ports   R6=0   20, 20, 20, 20, 0, 19
   B4  Check Reported Comm Ports   R6=0   20, 0, 20, 0, 20, 20

Ran until event 26621, when I was asked to stop it for other tests.

Ran in beam (runs 79041 and 79044) and put 30 events on tape. L1 was set for 1 EM Trig Tower over a 2.5 GeV threshold in the eta range -6:+6. L15CT was set for 1x2 EM of 5 GeV and a ratio to 3x3 Tot Et of 0.8. Beam was a very low luminosity 6x1 store.
..............................................................................
Date: 11,12,13-MAY-94 At: Fermi
Topics: More no beam tests of L15CT, Investigate problem with MFP events, Books needed at Fermi

11-MAY-1994
-----------
Set up the L15CT as last week. Everything is the same except for the 68k_Services code. Running 50:50 pass/fail (TAS No driven) and MFP_Ratio of 3. At low rates (<15 Hz) all appears OK. At higher rates (28 Hz) things hang after about 30 seconds of running. At the point when things hang:
   It is an MFP event being processed.
   The previous event was either a normal event that passed and was transferred to L2, or a normal event that was rejected.
   68k_Services is waiting for GDSP to say that it is at step D3.
   GDSP is at 04A6044Dh (Check_Reported_Comm_Ports).

When running 50:50 pass/fail (TAS No driven) and MFP_Ratio of 3, I see the following:

   L1 Rate (Hz)   FEBz%   DeadBX%
   ------------   -----   -------
        5.7        0.1%    0.05%
       11.5        0.3%    0.10%
       20.5        0.5%    0.17%
       28          0.7%    0.24%

   Note: Only one out of every three events is actually rejected.

When running 50:50 pass/fail (TAS No driven) and MFP_Ratio very, very big, I see the following:

   L1 Rate (Hz)   FEBz%   DeadBX%
   ------------   -----   -------
       11.5        0.0%    0.13%
       28.6        0.0%    0.33%
       57.3        0.1%    0.66%
       95.4        0.1%    1.11%

   Note: One out of every two events is actually rejected.

Books that I cannot find at D-Zero Hall and that need to come here: TI C40 book, Ariel Hydra-II book.

12-MAY-1994
-----------
It appears that when we "hang" in an MFP event, it may be DSP C3 that is causing the problem. It appears that C3 may not be sending the MFP data to C2, and thus C2, having received only the Object Lists, "hangs" waiting for MFP data from C3. C3 may not be getting told that it is an MFP event, or it may be forgetting this.

To see where things are during a hang do the following:

Examine DSP B2 register R6. It has the form:

   Data from DSP -->                                   C2  B3  B1  A2
   Value if all data has been received from this DSP   42  42  42  42
   Value if still waiting for all data from this DSP   00  00  00  00

Examine DSP C2. Where is it in its program? Is it at Wait_for_Previous_DSP_Data?

Examine memory location LG_Xfr_from_Prev_Status_Loc. This location holds the status of the transfers to C2 from C1 and C3. This has the format:

   Data from DSP -->                                   --  --  C3  C1
   Value if all data has been received from this DSP   00  00  7F  7F
   Value if still waiting for all data from this DSP   00  00  00  00

Steve makes a new version of the L_Scan.A40 routine. This new version saves the most recent Wake_Up_Word and the next to most recent WUW in two memory locations. We need to learn what DSP C3 thought that it was told to do on the L15CT cycle that "hangs". Adding this explicit memory of the WUW is necessary because by the time that we can get a look at C3, it has already gotten to Step D15, and has erased the other direct indicators as to whether it received an MFP or a normal WUW.
These two memory locations are WUW_N_Minus_1_Loc and WUW_N_Minus_2_Loc.

In C:\C40CODE\WORK\LOCAL rename the old L_Scan.A40 to L_Scan.ALD.
Copy the new L_Scan.A40 from floppy to C:\C40CODE\WORK\LOCAL.
Then assemble and link by: LOCALASM all and then LOCALLNK all.

Reloaded the DSPs and verified that I could see WUW_N_Minus_1_Loc and WUW_N_Minus_2_Loc in both A3 and C3. They are currently at C000 10B6h and contain all zeros (have not run any cycles yet).

During the next running the idea is to check: Does the "hang" always involve DSP C3? And what does DSP C3 think that it was told to do?

To copy the new L_Scan.A40 from the VAX to a PC floppy: put the file in Scratch:[Long.PCCommon] on the online cluster with a filename that fits the PC 8.3 format. This will appear in drive H: on the PC. Then just copy from H: to B: on the PC.

13-MAY-1994
-----------
Look at more "hangs" on MFP events:

  "Hang"  Processor  Details
  ------  ---------  ------------------------------------------------------
  #1      B2         PC = Check_Reported_Comm_Ports
                     R6 = 42420042 --> DSP B1
          B1         PC = Wait_for_Sync
                     WUW_N_Minus_1_Loc = 110001ff, 000000ff
          C1         PC = Wait_for_Sync
                     WUW_N_Minus_1_Loc = 110001ff, 000000ff

  #2      B2         PC = Check_Reported_Comm_Ports
                     R6 = 42004242 --> DSP B3
          B3         PC = Wait_for_Sync
                     WUW_N_Minus_1_Loc = bb0001ff, aa0000ff
                     Comm 4 \/ 0094008b 002ffc97 00000001 00000000  1000e0
                            /\ 00100082 00000000 c000049c 00000000
                     Comm 5 \/ 0cd4004b 00100091 00000000 00000005  1000f0
                            /\ 002ffc92 00000001 c00004b0 00000000
          B4         PC = Wait_for_Sync
                     WUW_N_Minus_1_Loc = bb0001ff, aa0000ff
                     Comm 1 \/ 0084008b c00004d7 00000001 00000000  1000b0
                            /\ 00100052 00000000 c000049c 00000000

  #3      B2         PC = Check_Reported_Comm_Ports
                     R6 = 42420042 --> DSP B1
                     Comm 3 \/ 0cd4004b 00100071 00000000 0000010e  1000d0
                            /\ c0000b21 00000001 c0000474 00000000
          B1         PC = Wait_for_Sync
                     WUW_N_Minus_1_Loc = 990001ff, 880000ff
                     Comm 0 \/ 0084008b 002ffcb0 00000001 00000000  1000A0
                            /\ 00100042 00000000 c000049c 00000000
          A1         PC = Wait_for_Sync
                     WUW_N_Minus_1_Loc = 990001ff, 880000ff

  #4      B2         PC = Check_Reported_Comm_Ports
                     R6 = 42424200 --> DSP A2
                     Comm 4 \/ 0cc0004b 00100081 00000000 00000064  1000e0
                            /\ 002ffc01 00000001 c000048A 00000000
          A2         PC = Wait_for_Previous_DSP_Data
                     LG_Xfr_from_Prev_Status_Loc = 00007f00 --> A1
                     WUW_N_Minus_1_Loc = 000001ff, ff0000ff
                     Comm 3 \/ 0cd4004b 00100071 00000000 0000010e  1000d0
                            /\ c000082a 00000001 c00004b0 00000000
          A1         PC = Wait_for_Sync
                     WUW_N_Minus_1_Loc = 000001ff, ff0000ff
                     Comm 0 \/ 0084008b 002ffc65 00000001 00000000  1000A0
                            /\ 00100042 00000000 c000049c 00000000

  #5      B2         PC = Check_Reported_Comm_Ports
                     R6 = 00424242 --> DSP C2
                     Comm 5 \/ 0cc0004b 00100091 00000000 00000064  1000f0
                            /\ 002ffcb0 00000001 c00004a2 00000000
          C2         PC = Wait_for_Previous_DSP_Data
                     LG_Xfr_from_Prev_Status_Loc = 0000007f --> C3
                     WUW_N_Minus_1_Loc = 990001ff, 880001ff --> MFP_Ratio = 1
                     Comm 1 \/ 0080008b 002ffcb0 00000001 00000064  1000b0
                            /\ 00100052 00000000 c00004aa 00000000
                     Comm 0 \/ 0cc4004b 00100041 00000000 0000021c  1000a0
                            /\ c0000c63 00000001 c00004c0 00000000
          C3         PC = Wait_for_Sync
                     LG_Xfr_from_Prev_Status_Loc = 00007f7f
                     WUW_N_Minus_1_Loc = 990001ff, 880001ff --> MFP_Ratio = 1
                     Comm 4 \/ 0084008b 002ffce2 00000001 00000000  1000e0
                            /\ 00100082 00000000 c000049c 00000000
                     Comm 5 \/ 0cd4004b 00100091 00000000 00000005  1000f0
                            /\ 002ffcdd 00000001 c00004b0 00000000
          C4         PC = Wait_for_Sync
                     WUW_N_Minus_1_Loc = 990001ff, 880001ff --> MFP_Ratio = 1
                     Comm 1 \/ 0084008b c00004d7 00000001 00000000  1000b0
                            /\ 00100052 00000000 c000049c 00000000

Steve makes a new version of L_Scan.A40 and of L_ISR.A40. The new L_Scan has a new memory location called This_Event_Type_Loc, and it explicitly checks the new valid WUW Flags Byte against "0" and "1". The new L_ISR has routines that save the Processor Status and then restore it when returning. Get these new files from the VAX to the floppy using the PC and PathWorks.
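The R6 and LG_Xfr_from_Prev_Status_Loc layouts given on 12-MAY let the "--> DSP xx" annotations in the 13-MAY hang table above be worked out mechanically: each byte belongs to one sending DSP, and any byte not yet at its "all received" value ($42 for R6, $7F for LG_Xfr_from_Prev_Status_Loc) names a DSP that still owes data. A Python sketch under that reading; `pending_sources` is a hypothetical helper, and the A2 layout (A3, A1) is an assumption made by analogy with C3, C1 for C2:

```python
def pending_sources(status: int, sources: tuple[str, ...], done: int) -> list[str]:
    """List the DSPs whose status byte is not yet `done`.

    `sources` names the bytes from most- to least-significant; only the lowest
    len(sources) bytes of `status` are examined, any higher bytes are ignored.
    """
    n = len(sources)
    pending = []
    for i, dsp in enumerate(sources):
        byte = (status >> (8 * (n - 1 - i))) & 0xFF
        if byte != done:
            pending.append(dsp)
    return pending


B2_R6 = ("C2", "B3", "B1", "A2")  # DSP B2 register R6, all-received value $42
C2_PREV = ("C3", "C1")            # LG_Xfr_from_Prev_Status_Loc in C2, value $7F
A2_PREV = ("A3", "A1")            # assumed analogous layout in A2

if __name__ == "__main__":
    for r6 in (0x42420042, 0x42004242, 0x42424200, 0x00424242):
        print(f"R6 = {r6:08x} --> {pending_sources(r6, B2_R6, 0x42)}")
    print(pending_sources(0x0000007F, C2_PREV, 0x7F))  # hang #5, DSP C2
    print(pending_sources(0x00007F00, A2_PREV, 0x7F))  # hang #4, DSP A2
```

Run against the recorded values, this reproduces the hand annotations (B1, B3, A2, C2 from R6; C3 and A1 from LG_Xfr_from_Prev_Status_Loc), and C3's own 00007f7f decodes to nothing pending.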
In C:\C40CODE\WORK\LOCAL rename the old L_Scan.A40 to L_Scan.ALE. There are now both an L_Scan.ALD and an L_Scan.ALE.
Copy the new L_Scan.A40 from floppy to C:\C40CODE\WORK\LOCAL.
In C:\C40CODE\WORK\LOCAL rename the old L_ISR.A40 to L_ISR.ALD.
Copy the new L_ISR.A40 from floppy to C:\C40CODE\WORK\LOCAL.
Then assemble and link by: LOCALASM all and then LOCALLNK all.
..............................................................................
Date: 4,5,6 May 1994 At: Fermi
Topics: Installed two more DC cards and CRC to Hydra II cables, Installed L15CT Term Answer P2 to M103 L15 Framework Answer Done cables, Work with L15CT doing all pass, all reject, and full dance, write a tape with a couple dozen events from L15CT, Checked what PALs were installed in M103 L15 FW, pick up a 2nd Vertical Interconnect master and slave from Greg Cisco.

4 May 1994
----------
Installed DCs in Racks M105 and M106. We notice that the YELLOW LED on the ERPBs in M106 turns on when power is first applied to the DC, but then turns off when the "configure LCA" pushbutton on the DC is pressed. The yellow LEDs in M103, M104, and M105 only briefly flash off when the "configure LCA" pushbutton is pressed. Also, the LCAs in M106 are NOT configured until the pushbutton is pressed. We are not certain what happens in the other racks.

We put connectors on the DC->CRC cables for these DCs, and also made a short extender cable for the MTG->CRC cable for these DCs. The old cable did not have a connector for a terminator.

We installed CRC->DSP cables for these next 2 Racks. We now have 4 of the 5 channels on the first CRC occupied. Cable routing space is getting very tight; we are going to have to do something about it soon.

We also installed the Done/Answer cables to the L1.5 FW in M103, but did not plug them in. We don't have the Term Answer card finished yet (but it's close).
The Answer Done cable from M124 L15CT to M103 FW is made in the following way. The M103 L15 Framework has one ANSWs connector and one DONEs connector, each for Trm16:Trm31. Each is fed by a 34 conductor twist-flat cable that forks into two branches, one to the Upper L15CT Crate connector and one to the Lower L15CT Crate connector (the branches cross between the Upper and Lower Crates). Pins 33,34 have no connection on every connector.

   L15CT connector   Signals             M103 L15 Framework connector
   ---------------   -----------------   ----------------------------
   Upper Crate       ANSWs Trm16:Trm23   ANSWs (Trm16:Trm31)
   Upper Crate       DONEs Trm16:Trm23   DONEs (Trm16:Trm31)
   Lower Crate       ANSWs Trm24:Trm31   ANSWs (Trm16:Trm31)
   Lower Crate       DONEs Trm24:Trm31   DONEs (Trm16:Trm31)

Verified what PALs are now in the L15 Framework Term Receiver MTG. PALs are installed in the Receiver MTG for L15 Terms 0:18. All Veto Confirm PALs are installed for Spec Trigs 0:15. Recall that we need more BIT2PALs for the ERPB MTG.

We verified that Racks M105 and M106 were sending data to the DSPs. We did not do any super-serious data transfer checks like with M103 and M104. We will need to do these checks before the L1.5 Cal Trig is a usable product.

We still haven't fixed the munged backplane problem with the top backplane in Rack M106. We just read "0" for EM and Total Et from eta = -5, -6 at phi = 13, 14, and 16. Phi 15 reads correctly.

Steve made Pallet files for M105 and M106 but they haven't been tested.

5 May 1994
----------
We tried (but did not succeed) to do the full "dance" test during a beam study period.
We had some problems with both the DSP software (all related to Steve's addition of the Type 4 Entry in the DeBug section) and the 68K Services software (related to moving the "synch state" from D12 to D15). These problems took quite some time to find and fix.

We then discovered that we had the Specific Trigger Fired and its Strobe wired up to our patch panel upside-down. This caused ERPB data to roll into the DSP Comm Ports when it was not expected. It appears that this can confuse the "Sanity and Configuration Checker" which runs when the DSPs are reset.

After fixing that, we verified that the DSPs and the 68K Services could handshake correctly (i.e. the same test we did last week).

After proving that functionality, we attached the Term Answer paddleboard and hooked up the cables to the L1.5 Framework. We tried to accept every event (by forcing a large EM Et in a Trigger Tower outside of the current eta coverage). We discovered that we were exiting every L1.5 Decision Cycle via Timeout. This was because we were using the Specific Trigger Fired Strobe to initiate L1.5 Cal Trig processing. Of course this Strobe doesn't fire until AFTER the end of the L1.5 Decision Cycle, so the L1.5 Cal Trig didn't start running until the L1.5 FW timed out.

To fix this problem we decided to change the ERPB MTG setup somewhat. We now use the AND of Specific Trigger x (x = 0:15) and Start Digitize to feed MTG Channel 6. MTG Channel 6 feeds MTG Channel 8, which outputs a pulse one Bx long, starting at 82 of the Bx which made the Specific Trigger fire. MTG Channel 8 feeds the /STORE and /LATCH "master" MTG Channels, and also feeds Channel #7. Channel #7 makes another 1 Bx pulse, starting at 75 of the Bx following the Bx with the positive L1 decision. Channel #7 feeds the Transmit_Trigger "master" MTG Channel.
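The new ERPB MTG chain can be summarized as a small table of channel-to-channel feeds plus the two pulse timings. A Python sketch, assuming that the Bx "with the positive L1 decision" is the same Bx in which the Specific Trigger fired, and leaving the 82 and 75 start values in whatever units the MTG setup uses (they are not stated here); `Pulse`, `FEEDS`, and `erpb_mtg_outputs` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pulse:
    bx: int          # Bx relative to the Bx in which the Specific Trigger fired
    start: int       # start position within that Bx, in MTG setup units
    length_bx: int   # pulse length in Bx


# Which MTG channel drives which (5 May 1994 setup)
FEEDS = {
    "ch6": ["ch8"],
    "ch8": ["/STORE", "/LATCH", "ch7"],
    "ch7": ["Transmit_Trigger"],
}

# Pulses generated by the channels that drive the "master" channels
PULSES = {
    "ch8": Pulse(bx=0, start=82, length_bx=1),
    "ch7": Pulse(bx=1, start=75, length_bx=1),
}


def erpb_mtg_outputs(spec_trig_x: bool, start_digitize: bool) -> dict[str, Pulse]:
    """Pulse seen by each "master" MTG channel for one crossing."""
    if not (spec_trig_x and start_digitize):  # Channel 6 input is the AND
        return {}
    outputs = {}
    stack = ["ch6"]
    while stack:
        channel = stack.pop()
        for target in FEEDS.get(channel, []):
            if target in FEEDS:   # another MTG channel: follow the chain
                stack.append(target)
            else:                 # a "master" channel: record the driving pulse
                outputs[target] = PULSES[channel]
    return outputs
```

Keeping the feed table separate from the pulse timings means a change to the Channel 6 input does not touch the rest of the chain.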
This ERPB MTG scheme will work (after replacing the Specific Trigger input by the >=1 Term Required signal from the TSP2 Paddleboard, which will require actually programming the Term Select paddleboard) until double buffering is required.

We needed to make a new ERPB MTG PROM, ERPBTG1B.DAT. To get this file on a floppy disk, we used the FTP program on the Macintosh (in the Telnet folder, which is in the Communications folder). We then used Apple File Exchange on this Mac to write an MS-DOS compatible disk to take to the Data-I/O Unisite Model 48. Theoretically we could have used the PC (Dan claims to have done this) but the PC seemed to be munged.

When we pulled out the ERPB MTG we noticed that it has no PROM or PALs for Timing Signals 25:32 (i.e. high eta). Before we are able to use |eta| 17:20 we will need to install a PROM and appropriate PALs in this MTG.

The beam returned while we were still getting ready for another attempt at the "always pass" test. We left the power turned off in the CRC/MTG crate and in the L1.5 Cal Trig VME Backplane.

6-May-1994
----------
We powered up the L1.5 Cal Trig VME Crate again. Powering up this crate, and getting code correctly loaded into all 12 DSPs, still appears to be a delicate operation with some not-understood complications. Some problems that we see are:

Not all 12 DSPs successfully complete their "Sanity and Configuration Checker". This is apparent either by watching the Shared Dual Port Access LEDs (which you can see by reflection) and noticing that the correct "pattern" does not appear, or by looking at the program counter of each DSP. The Sanity and Configuration Checker should end at address 2ff931 or thereabouts. When the checker does not complete, the associated C40s do not always operate correctly (for example, they may have "junk" in some Comm Port Input or Output FIFOs), or may not correctly respond to interrupts.
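Checking the program counters of 12 DSPs by hand after every power up is tedious. A Python sketch of that check, assuming the PCs have already been read out (for example from the DeBugger) into a dictionary keyed by DSP name A1..C4; the 16-word tolerance is an arbitrary stand-in for "2ff931 or thereabouts", and `dsps_not_in_dead_loop` is a hypothetical name:

```python
DSP_NAMES = [f"{hydra}{n}" for hydra in "ABC" for n in range(1, 5)]  # A1..C4
DEAD_LOOP_PC = 0x2FF931  # approximate end of the Sanity and Configuration Checker


def dsps_not_in_dead_loop(pcs: dict[str, int], tolerance: int = 0x10) -> list[str]:
    """Return the DSPs whose PC is missing or not near the checker's dead loop."""
    bad = []
    for name in DSP_NAMES:
        pc = pcs.get(name)
        if pc is None or abs(pc - DEAD_LOOP_PC) > tolerance:
            bad.append(name)
    return bad


if __name__ == "__main__":
    pcs = {name: DEAD_LOOP_PC for name in DSP_NAMES}
    pcs["C3"] = 0x04A6044D  # example value: one DSP stuck somewhere else
    print(dsps_not_in_dead_loop(pcs))
```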
Does the VME RESET signal act differently from the RESET button on the DSPs, and do both act differently from resetting the DSPs via the Boot Control Registers (which TCC must do)? The "software" reset available via the debugger is absolutely NOT equivalent to a hardware reset (for example, it does not provoke the Sanity and Configuration Checker).

What can we do to improve the robustness of the start-up sequence? Also, what about robustness during "normal cycling"? In all of our tests, once we have gotten over the "hump" (which may take a few cycles of "normal cycling" followed by resets or new code loads) we then appear to be moving smoothly. Are we just on the ragged edge of something?

We did the "always pass" test with no difficulty. The "always fail" test worked after we removed a line which had been added to the 68K Services code for testing. We appeared to exit a small fraction of cycles via Time Out (more on this below).

We then tried the "dance" test, but letting the 68K generate Term Answers based on the TAS# rather than the DSPs' determination of Term Answers. We had a design problem in 68K Services which did not correctly handle the "fail" case (it set the "DSP Data Available" flag when it should not have done so). This design problem was fixed and the "dance" test then succeeded (again with a small fraction of "exit via Time Out").

We then tried to Mark and Force Pass one out of every 3 events, as well as "dancing." Under these conditions we saw 33.33% "Exit L1.5 by Time Out". The problem was (is) that 68K Services always waits for GDSP to arrive at D3 (and provide Term Answers) before sending the Term Answers to the L1.5 Framework. This has 2 problems:
   (a) the L1.5 Framework times out at 250 usec, while the GDSP takes more than 350 usec to arrive at state D3 during MFP Events, and
   (b) the 68K does not actually try to "force pass" the event, but rather just sends the GDSP's Term Answers to the L1.5 Framework.
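The 33.33% figure follows directly from problem (a). A Python sketch that counts timeouts over a run of cycles, assuming the Mark and Force Pass events are exactly every third event, that the GDSP reaches D3 within the 250 usec window on normal events (the 100 usec used below is a placeholder, not a measured value), and taking 350 usec as the MFP time even though the entry only says "more than 350":

```python
FW_TIMEOUT_US = 250.0   # L1.5 Framework decision timeout
GDSP_D3_MFP_US = 350.0  # GDSP time to reach D3 on an MFP event (lower bound)
MFP_EVERY = 3           # Mark and Force Pass one out of every 3 events


def timeout_fraction(n_events: int, gdsp_d3_normal_us: float) -> float:
    """Fraction of L1.5 cycles that exit by Time Out when 68K Services waits
    for the GDSP to reach D3 before sending Term Answers."""
    timeouts = 0
    for event in range(n_events):
        is_mfp = event % MFP_EVERY == 0
        t_d3 = GDSP_D3_MFP_US if is_mfp else gdsp_d3_normal_us
        if t_d3 > FW_TIMEOUT_US:
            timeouts += 1
    return timeouts / n_events


if __name__ == "__main__":
    print(f"{100 * timeout_fraction(3000, gdsp_d3_normal_us=100.0):.2f}%")  # 33.33%
```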
This may be the same problem that caused a small fraction of "exit via Time Out" when we tried to Mark and Force Pass a small fraction of the events. We DID NOT fix this problem in the 68K Services code. Note that a Time Out forces the event to be accepted, so we do actually pass all of the Mark and Force Pass events. We may try to return Term Answers and DONEs to the L1.5 Framework at random times. Under the special running conditions we used, however, the "reset DONEs" logic in the Term Answer card masked the DONEs.

Running under the above conditions, we put 59 events on disk (with help from the Guidas) in the file:

    DATA3:[CAL]CALOR$1_078267_01.X_ZRD01   (available only at Fermi, NOT at MSU)

We could probably put events on disk by ourselves under TAKER. However, they need to land in a directory which is accessible both to COOR and to whatever account is used to run TAKER. Use "Turn Recording On" and "Data Disposition" to send data to a disk.

Steve looked at these events a little bit under FZBASCII (do SETUP UTILS, choose "7" [SETUP_UTIL.COM], and then type FZBASCII). Some hints on using FZBASCII:

    FZopen   gets a file (for example the above file)
    NExt     skips to the next record in the file (beginning of data counts as a record)
    FInd     finds records which meet certain criteria
    DBank    is the moral equivalent of ZBD, but has an even less friendly user interface (!)

Once in DBank, there is no "help" facility. Some hints for DBank:

    ,        scroll down or up in the data block
             go to longword (in decimal)
    X        set output format to hex
             exit DBank, return to main menu

Note that any commands you type DO NOT appear on the DBank screen.

These events looked completely correct. There is a mix of MFP and normal events. The first 4 events are:

    Event #   Comment
    -------   -------------------
    1863      Mark and Force Pass
    1908      Mark and Force Pass
    1924      "normal"
    1940      "normal"

Steve sent mail to Dan Owen, Djoko, and Greg Snow asking them to look at this data and examine it carefully.
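The 33.33% "Exit L1.5 by Time Out" seen on 6-May in the dance plus Mark and Force Pass test follows from the timing given in that entry. Every MFP event needs more than the 250 usec Framework timeout, and no normal event does. So the Time Out fraction simply equals the MFP fraction. The Python sketch below reproduces that arithmetic. Its latency inputs are hypothetical stand-ins: 360 usec for MFP events (the log says only "more than 350 usec") and 100 usec for normal events (the log gives no normal-event figure).

```python
FRAMEWORK_TIMEOUT_US = 250.0  # L1.5 Framework timeout, from the log


def timeout_fraction(n_events: int, mfp_every: int,
                     normal_latency_us: float, mfp_latency_us: float,
                     timeout_us: float = FRAMEWORK_TIMEOUT_US) -> float:
    """Fraction of events that exit L1.5 by Time Out when 68K Services waits
    for the GDSP to reach state D3 before answering the Framework."""
    timeouts = 0
    for event in range(n_events):
        is_mfp = event % mfp_every == 0   # Mark and Force Pass 1 in every mfp_every
        latency = mfp_latency_us if is_mfp else normal_latency_us
        if latency > timeout_us:
            timeouts += 1
    return timeouts / n_events
```

With 1 event in 3 marked for Force Pass, `timeout_fraction(3000, 3, 100.0, 360.0)` gives one third. Because a Time Out forces acceptance, those events are still passed, which matches the log.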
I picked up from Greg Cisco a 2nd master VI and a second slave VI. Both of these came back to MSU for testing and use in the TCC to L15CT crate link. The 1st VI master and slave that I picked up from Greg a couple of weeks ago remain in use in the L15CT crate bus-to-bus link.
..............................................................................
Date: 27,28,29-APR-1994
At: Fermi
Topics: Test L1.5 Cal Trig with real 68K Services code, sending data to L2 via VBD.

27-APR
------
We brought the PC back to FNAL to continue testing of the L1.5 Cal Trig. We were able to debug all 12 DSPs in the JTAG scan path (which we had also previously been able to do at MSU). We loaded the correct code into all 11 Local DSPs and the Global DSP, and also made the correct A-to-B and C-to-B Local-to-Global connections. This had never been done at MSU.

Using Steve's "baby" DSP control program we were able to control all 12 DSPs and move data into an MVME214 module (without reading ERPB data). We also ran all 12 DSPs under the "baby" control program as fast as we could, without writing data to the 214's and without reading ERPB data. At top speed, each complete cycle took about 117 microseconds.

We have a problem with DSP A2. It does not respond to its NMI (which is used by TCC to tell the DSP to load its Parameters). It responds to neither the push-button NMI nor the "VME" NMI (recall that the NMI source is selected by the Hydra Interrupt Control Register). All other DSPs on Hydra-A respond to this interrupt correctly. Hydra-B and Hydra-C also have no problems with NMI processing. Either this DSP is broken (note that this Hydra is MSU S/N-1, which at one time correctly responded to NMI), or we are doing something wrong with this Hydra card. Note that Hydra-A is the Hydra which is NOT on VSB, and DSP #2 is the DSP with VSB access.

We then "meshed in" the real 68K Services program (written by Dan).
After fixing a few communication mismatch problems we were able to cycle the DSPs under control of the real 68K Services program. We were able to pass events up to Level 2 (not during global running!) but did not look at the contents of any events. We also noticed that both 214's were being accessed in a single event (i.e. both 214 LEDs lit up more or less at once). This indicates a problem.

Installed the Data Cable to connect the L15CT crate. The path is now the following:
  from the Sequencers to a Repeater in M114,
  from this Repeater's output to the L15CT crate in M124,
  from the L15CT crate to another Repeater in M114,
  from the output of this 2nd Repeater to the 3 VBD's in M114 and M115,
  then up to the third floor.
Note that all three cables in the Data Cable (64 TF, 26 TF and RG58U) follow this path. No new Data Cable problems appear to have been caused by this 60 foot addition to the Trgr Data Cable.

Note about the VBD in ByPass mode: even in ByPass mode the VBD is not completely safe. If the VBD is in ByPass mode and you tell it to do random things, it CAN put data onto the Data Cable. Only switch to ByPass mode (or back from ByPass mode) when data is NOT flowing. When in ByPass mode and you want to work in the VME crate (in a way that might talk to the VBD), UNPLUG the 64 TF and the 26 TF from the VBD. Do NOT unplug the RG58U token cable.

28-APR
------
Since we don't have a TCC hooked up to VME, we made a better version of the "generate NMIs to all 12 DSPs" program. This one tries to verify that all DSPs have correctly loaded their parameters. Not having the TCC around has the advantage of removing a complex element from the system. The PC can load the code and the 68K can provide the interrupts. Is there anything else we can do to make TCC-less life easy? There is a file in the [D0_Text.Level_15.CalTrig.Hardware_Software_Text] directory about how to start up the DSPs without the TCC.
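The improved "generate NMIs to all 12 DSPs" program is not reproduced in this log. The Python sketch below illustrates only its verification idea: interrupt every DSP, then poll each one a bounded number of times and report those that never confirm their parameters. `send_nmi` and `params_loaded` are hypothetical callables standing in for whatever register accesses the real 68K program performs. The DSP names and the polling limit are also assumptions.

```python
from typing import Callable, Iterable, List


def verify_parameter_load(
    dsps: Iterable[str],
    send_nmi: Callable[[str], None],
    params_loaded: Callable[[str], bool],
    max_polls: int = 1000,
) -> List[str]:
    """Send an NMI to every DSP, then poll each one until it reports that its
    parameters are loaded. Return the DSPs that never reported."""
    names = list(dsps)
    for name in names:
        send_nmi(name)
    missing = []
    for name in names:
        # Bounded polling, so one dead DSP (e.g. one ignoring NMI) cannot hang the check.
        if not any(params_loaded(name) for _ in range(max_polls)):
            missing.append(name)
    return missing
```

With a fake interface in which A2 ignores its NMI, the check reports `["A2"]` and does not hang.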
We also added some switches that let us either have the 68K look at the Path Select P2PB to choose "Innocent By Stander" vs. "That's Me" processing, or force one or the other.

We ran under the same conditions as yesterday (real 68K Services, no ERPB data, not Global running so we had our own Specific Trigger) to work on the "both 214's hit on one event" problem. We were able to get a ZBDUMP of some events. They looked about right (with a few small problems), but we realized that we have no way to demonstrate that the Data Block is all from the same event! We revived the old "pass the TAS Number from 68K to DSPs in the Wakeup Word" idea (it actually never left the DSPs...). For now we just stuck the Wakeup Word into the Global DSP Object List. That list is near the end of a "normal" Data Block, but not near the end of a Mark and Force Pass Data Block.

We should define a new entry type for the DeBug section which contains some synch information. This entry type should always appear in the DeBug section, and it should appear at the end of the block. What else should we stick into the Data Block? L1 has lots of Data Block consistency checks; why didn't we build any into L1.5 from the start?

After sticking the TAS Number into the Data Block it was clear that the Load Buffer was being changed while the Global DSP was still loading it. We found the problem in 68K Services and made a temporary fix. That solved the "both 214's hit" problem.

We played with the mix of Mark and Force Pass vs. "normal" events. We have some evidence that the VBD reads out the "full" DeBug section even on events which didn't receive Mark and Force Pass processing. We need to look at this more carefully.

We also see a VBD problem. Whenever we change 68K Services code, we pause the run, abort the 68K, reload the 68K, restart the Services code, and resume the run.
After resuming the run, we see lots of crate token errors on our data cable (which would seem to indicate that the VBD was not being read out), but we see events flowing through the VBD. Hitting "reset" on the VBD clears the problem. We should be very careful about playing with the L1.5 Cal Trig Service 68K when events are flowing through Level 1 (which is on the same data cable).

We then let the DSPs receive data from a subset of the ERPBs (|eta| 1..4). We set up the MTG to control the ERPBs exactly the same way as in the first Data Transfer test done a month ago. We were able to see the calorimeter noise in the Mark and Force Pass data for the DSPs which were hooked up to L1. We also proved that the DSPs needed to see ERPB data by shutting off the ERPB transfer and watching the system hang.

Finally, we did some speed tests of the whole system. Here are the results:

(1) Readout Required, That's Me processing, no ERPB data

    Trigger        Geo Sect
    L2 Rate (Hz)   5 FE Bz    Disbl   Prescale   Comment
    ------------   --------   -----   --------   -----------------------------
        5.7          0.1%               50000
       11.5          0.1%               25000
       22.9          0.3%               12500
       38.2          0.5%                7500
       72.0          0.9%                4000
       94.0          1.2%      ~40%      2000    rate limited by L2, but
       93.0          1.2%      ~60%      1500    rate vs. FE Busy is realistic
       94.0          1.2%      ~75%       900

(2) Readout Required, Innocent By Stander, no ERPB data

    Trigger        Geo Sect
    L2 Rate (Hz)   5 FE Bz    Disbl   Prescale   Comment
    ------------   --------   -----   --------   -----------------------------
       85            0.1%                2000

(3) Readout Required, That's Me, including ERPB data

    Trigger        Geo Sect
    L2 Rate (Hz)   5 FE Bz    Disbl   Prescale   Comment
    ------------   --------   -----   --------   -----------------------------
        5.7          0.1%               50000
       57.0          0.7%                5000
       94            1.1%                2500
       94            1.1%      ~45%      1750    rate L2 limited again, but
       89            1.1%     ~100%       900    rate vs. FE Busy realistic

What we have not done is look at any Readout Not Required rates. This will happen a lot in the real system, so we should look at these rates also.

29-APR
------
We swapped in the new CRC card.
When Steve turned the L1.5 Cal Trig Power Pan on with the CRC plugged in and fully hooked up to the DC's and the Hydra's, the AC fuses for both the -5.2V and -4.5V supplies blew. Steve replaced the fuses (the dead fuses were 1.5A, but all I could find for replacements were 2.25A). After fuse replacement things looked fine. The new CRC appears to work.

Dan made some "bug fix" changes in the 68K Services code. The current version has no known problems.

We captured some good ZBDumps of the TRGR bank (including both L1 and L1.5 Cal Trig). They are stored in the VWORK1 directory with names containing the string "29_APR". We got "Innocent By Stander," "That's Me Normal," and "That's Me Mark and Force Pass" events. We could compare the L1 data to the L1.5 data in the Mark and Force Pass events. We still have not tested any "Dump Event" processing.

Here are some reminders about running TAKER and ZBD:

    To set up TAKER:  D0SETUP TAKER
    To run TAKER:     TAKER/FU
    The L1.5 Cal Trig test trigger definition is under:  CAL ---> CAL_TRIG_L15
    To set up ZBD:    D0SETUP ZBDUMP   (not ZBD!)
    To run ZBD:       ZBD

The lengths of the crates in the TRGR bank are:

    Level 1:              2847 longwords
    Level 1.5 Cal Trig:    426 longwords (without Mark and Force Pass data)
                          3396 longwords (with Mark and Force Pass data)

The maximum length of the TRGR bank is less than 6300 longwords. If you select an ending address greater than the TRGR bank length, ZBD collects data to the end of the bank. When L1.5 Cal Trig is not doing Mark and Force Pass processing, it is first in the TRGR bank. Usually, when L1.5 Cal Trig does Mark and Force Pass processing, the Level 1 data is first in the TRGR bank.

We have still not solved the "VBD reset required" problem. It never happens if we just pause our run. It frequently happens if we pause the run, abort the 68K (for example to load a new version of 68K Services), and then start execution of 68K Services at the normal $95000 entry point. We also still have not solved the "A2 ignores NMI" problem.
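The crate lengths above fix where each crate lands in the TRGR bank. The Python sketch below lays the bank out under two assumptions not stated in the log: the bank holds only these two crates, and they are packed back to back using the "usual" ordering described above. Offsets are 1-based longwords. `trgr_layout` and `zbd_end_address` are illustrative names, not existing D0 software.

```python
from typing import List, Tuple

LEVEL1_LONGWORDS = 2847
L15CT_NORMAL_LONGWORDS = 426   # without Mark and Force Pass data
L15CT_MFP_LONGWORDS = 3396     # with Mark and Force Pass data
TRGR_MAX_LONGWORDS = 6300      # the log says "less than 6300"


def trgr_layout(mark_force_pass: bool) -> List[Tuple[str, int, int]]:
    """Return (crate, first longword, last longword) in bank order, 1-based.
    Assumes only these two crates, packed back to back, usual ordering."""
    if mark_force_pass:
        crates = [("Level 1", LEVEL1_LONGWORDS), ("L1.5 Cal Trig", L15CT_MFP_LONGWORDS)]
    else:
        crates = [("L1.5 Cal Trig", L15CT_NORMAL_LONGWORDS), ("Level 1", LEVEL1_LONGWORDS)]
    layout: List[Tuple[str, int, int]] = []
    first = 1
    for name, length in crates:
        layout.append((name, first, first + length - 1))
        first += length
    return layout


def zbd_end_address(requested_end: int, bank_length: int) -> int:
    """ZBD stops at the end of the bank if asked to read past it."""
    return min(requested_end, bank_length)
```

Under these assumptions a Mark and Force Pass event ends at longword 2847 + 3396 = 6243. That is consistent with the "less than 6300" limit. A normal event ends at longword 3273.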
..............................................................................
Date: 21,22,23-APR-1994
At: Fermi
Topics: Install JTAG and more L15CT modules and jumpers, first tests of 68k_Services, and known Vertical Interconnect problems.

Installed the JTAG pod and the JTAG wire-wrap board in the upper L15CT VME crate. These are installed on the piece of metal that covers the P3 backplane location.

Installed the "A" and "C" Hydra-II cards. The Hydra-II's now installed at Fermi are the following:

    "A" is Ariel SN# 7010, MSU SN# 1
    "B" is Ariel SN# 7052, MSU SN# 3
    "C" is Ariel SN# 7044, MSU SN# 2

Now all cards are installed except for the slave Vertical Interconnect from the TCC.

Installed the proper (I hope) backplane "grant" jumpers to cover locations where we do not have cards installed:

    VME BG3 jumpers cover slots:  8, 18, 19
    VBS BG jumper covers slot:    11

Worked on typing in the 68k_Services source program and making initial tests of it. So far there are three known problems/features:

1. Either this VBD is different from the L1 VBD, or else it appears different to the 68k because of the VI between them. Whatever the cause, the Base Addresses do not load into the VBD correctly at $B800 if they are loaded as longwords. Note that the L1 VTC code does load them as longwords!!

2. The VBD (or at least this VBD) does not appear to go ahead and dump data on the floor (i.e. onto a nonexistent data cable) if it is controlled via its hardware SRDY and DONE lines. The L1 VBD does dump data on the floor, but it is controlled via VBD registers.

3. Dean knows of two problems with VI's that occur when they are the Slot 1 Crate Controller AND they are passing VME bus mastership back and forth with a different module:
   1) With a 68k CPU module things can hang. This was noticed in the HV racks. The cause is a feature that was built into the VI for Goodwin front-end software compatibility. There are PAL's available to remove this feature.
   2) With VBD modules things can hang.
   This was noticed in CD (or else in the muon crates). It is not understood, but is thought only to happen when the bus arbitration overlaps and the VBD is master for a data transfer.
..............................................................................
Date: 20-APR-1994
At: MSU
Topics: M111 upper Tier 1 power pan breaks. Fermi Fire Tech people doing something.

Power Pan Replacement at Fermi
------------------------------
Replaced the Power Pan in upper M111 (i.e. eta +17:+20, phi 1:16). Removed Power Pan SN# PDM-14 and replaced it with PDM-22. It took right about 1 hour to replace the Power Pan. I found a good, strong cart of the proper height to hold the Power Pan as it was being installed. At the same time I removed the test data generator for the ERPB Distributor Cap card.

On TCC's disk I renamed the special Trics_Init_Auxi.dat to Trics_Init_Auxi.20APR94_Only_dat. I understand that this special file should be deleted, but I wanted to check on this before deleting it. Returned the 68k VTC to running its normal program but did not delete the special eta 16 program.

When the system was powered back up I had the normal ZRL-induced TCC problem. TCC was awake and taking commands, but I do not think that it was really doing very much with them. It would take about twice as long as normal to Initialize and then say BAD FAILURE. I NCP Trigger Booted it. This did not help. I then did a power-cycle boot (power cycling both TCC and its BA23 box). After the power cycle all was OK.

The alarm message says that the Power Pan -4.5 brick had failed at 11:12 AM this morning. They had run with beam from about 00:30 until 5:45 this morning, when a D0:: online disk locked up. They fixed that by 11:15 AM, i.e. 3 minutes after the L1 Power Pan failure. It took on the order of one hour to get the eta |1:16| workaround going. They ran that way until about 21:20 this evening. At that time I replaced the Power Pan while the muon people did some work. Things then started back up for the tail end of the store.
Note that we were tripped off yesterday (in the morning at about 11 AM, I think) by the fire tech people doing something. Today when I arrived, the first thing that I saw as I came in the front door was that our VESDA "thermometer" was up about 3 or 4 notches. The fire tech people were still around today. There were two of them in the 1st floor MCH when I arrived, two by the front door, and two in the kitchen. They say that right now the Trigger VESDA is jumpered out. As far as Maris and I know, no MSU people were ever contacted before this work started, nor were we ever told that the Trigger VESDA protection had been jumpered out.

The get-it-running workaround from MSU
--------------------------------------
The upper power pan in Rack M111 broke sometime between 5 AM and noon today. At about 12:15 Jan called from the control room to report that the M111 upper pan (pan #2) -4.5V supply was making -3V according to the low voltage monitoring system. Dean S confirmed this by measuring -3V at test points on the pan. He measured approximately 0V AC. All 4 LEDs were still on. The triggering symptom they saw was "infinite Missing Pt" (I assume Dean meant that the Missing Pt triggers fired a lot).

Steve and Philippe had Dean turn off the 4 Tier 1 pans in Racks M111 and M112, but leave the Tier 2 pan in M112 powered up. We first used TRICS to tell TCC that the Cal Trig eta coverage was limited to eta +/- 1..16, thinking that this was the fastest way to get back on the air. We reminded them that they would have to modify all 16 reference set files, but they did not. No reference set download worked, because TCC rejects any message with trigger tower boundaries outside the Cal Trig eta coverage defined in TCC. TCC does NOT try to program the parts of the Reference Set which are within the defined coverage. Instead, it returns a bad parameter acknowledgement message to COOR.
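TCC's all-or-nothing handling of a Reference Set explains why shrinking the coverage to 1..16 broke every download, while redefining it to 1..20 fixed them. The Python sketch below is a toy model of that behavior, not TCC code. It encodes each trigger tower boundary as a (low |eta|, high |eta|) pair, which is an assumption. `download_reference_set` is a hypothetical name.

```python
from typing import List, Tuple

EtaRange = Tuple[int, int]  # (low |eta|, high |eta|), an assumed encoding


def download_reference_set(boundaries: List[EtaRange],
                           coverage: EtaRange) -> Tuple[str, List[EtaRange]]:
    """Toy model of the TCC check: every trigger tower boundary named in the
    message must lie inside the defined Cal Trig eta coverage. Otherwise
    nothing at all is programmed and a bad-parameter ack goes back to COOR."""
    low, high = coverage
    if any(lo < low or hi > high for lo, hi in boundaries):
        return "BAD_PARAM", []          # reject the whole message
    return "GOOD", list(boundaries)     # every range would be programmed
```

A Reference Set naming |eta| 17..20 is rejected outright under a 1..16 coverage. Even its in-coverage 1..16 part is not programmed.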
To save the Control Room people from having to edit all of their Ref Set configuration files, we re-defined the Cal Trig eta coverage to +/- 1..20. We overwrote the Tree Offsets to be correct for eta range +/- 1..16 (including changing the Px/Py offset from 3 to 2).

There is a new file, D0HTCC::[TRIGGER]TREE_OFFSET_ETA_16.DAT. It overwrites the EM, HD and TOT 1st and 2nd Lookup tree corrections with the values for a coverage of eta = 1..16. The Px and Py corrections are also matched to having only two Tier #2 (that is, 2 counts). However, the HD 2nd lookup correction value predates the L15CT HD PROM change and is incorrect.

There is a temporary TRICS_INIT_AUXI file >>on TCC only<< (not backed up) that starts with:

    ***** Disclaimer *****
    This is a temporary version overwriting the tree offsets to eta =16
    to remedy to the loss of a power pan in M111
    This version only lived on TCC on 20-APR-1994
    This version has not and doesn't need to be archived.

This file needs to be deleted once the eta = 17..20 coverage is back online.

We re-initialized and the Control Room then re-downloaded Level 1. At that point things looked about right. There were 3200 errors when initializing the Cal Trig, but all seemed to come from the Cal Trig at eta 17..20.

We then made a temporary version of the VTC program. This temporary version forces the correct Zero Response (8) in the ADC counts for eta +/- 17..20 in the Data Block. The temporary version is called RUNME68020_ETA_16.ABS and is in the VWORK2: directory as well as in TRGCUR. We had someone from the Control Room load this temporary VTC program into the VTC.

All of this work took about 1 hour. During most of that time Level 1 was not able to take data (although the rest of the experiment was). Between 5 AM and noon there were cluster problems. The Control Room did not discover the broken power supply until after the cluster problems were fixed.
..............................................................................
Date: 6-APR-1994
At: Fermi
Topics: L15CT Installation Work through 8-APR-1994

6-APR-1994
----------
Worked on L15CT installation. Cut pins and shrouded all backplanes in 4 racks (M105:M108). This required from 22:09 to 9:07, i.e. it still takes about 3.7 hours to cut pins and shroud a rack.

One significant problem was discovered in the top Tier 1 backplane of rack M106:

    Trigger Tower -6,14, Tier 1 Backplane SN# 7
    TotEt #1 is OK.
    The first 5 bits of TotEt #2 are OK, bit 6 has a pin bent into bit 7, bits 8 and 9 are OK.
    TotEt #3 and #4 are OK.

We should check the inventory book about this backplane. The bent-over pins have teflon tubing over them.

7-APR-1994
----------
Cabled the ERPB MTG to the ERPB's in M107:M110. This cable goes first to M110, then to M109, then to M108, and finally to M107. There are three sections of cable between each rack. There are 31 sections between the M124 ERPB MTG and the connector in M107. Note that this is 4 sections shorter than the run to M103 (i.e. 10 nsec shorter, to match the Cal Trig Timing cables as it should).

Installed ERPBs, CTFE-to-ERPB cables, daisy-chain data cables, and parallel timing cables in Racks M107:M110. This work required from 21:56 to 4:55 with 2 people working. So it takes about 3.5 man-hours to stack ERPBs and cable a single rack. This is on top of the 3.7 man-hours required to cut pins. This is about as fast as it will ever go, I think. DCs are not yet installed in Racks M107:M110.

We turned the L1 and L1.5 FW (and M114) back on, but left the L1 Cal Trig turned off when we left (including T3).

8-APR-1994
----------
"Ohms-checked" all power pans in Racks M103:M107 and also the T3 power pan in Rack M108; everybody was fine. We turned on the Cal Trig and there were no power or smoke problems with either the old stuff or the ERPBs we installed. We performed an INITIALIZE/RESTORE and checked TRGMON afterwards.
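The 7-APR figures can be checked with a little clock arithmetic. The job runs past midnight, so an end time earlier than the start means the next day. The cable delay per section used below is inferred from the log's "4 sections ... i.e. 10 nsec" (2.5 nsec per section). It was not measured separately. The function names are illustrative.

```python
from datetime import datetime, timedelta


def elapsed_hours(start: str, end: str) -> float:
    """Hours between two HH:MM clock readings; an end time at or before the
    start time is taken to be on the next day (work ran past midnight)."""
    t0 = datetime.strptime(start, "%H:%M")
    t1 = datetime.strptime(end, "%H:%M")
    if t1 <= t0:
        t1 += timedelta(days=1)
    return (t1 - t0).total_seconds() / 3600.0


def man_hours_per_rack(start: str, end: str, people: int, racks: int) -> float:
    return elapsed_hours(start, end) * people / racks


# Inferred, not measured: "4 sections shorter ... i.e. 10 nsec shorter".
NS_PER_SECTION = 10.0 / 4


def cable_delay_ns(sections: int) -> float:
    return sections * NS_PER_SECTION
```

`man_hours_per_rack("21:56", "4:55", 2, 4)` comes out at about 3.49, matching the 3.5 man-hours in the log. Under the inferred rate, the 31-section run to M107 is about 77.5 nsec.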
TRGMON was geeked up. It showed big (but mostly or completely unchanging) global energy and momentum sums and lots of Large Tiles above Reference Set. We didn't look at ADC counts. All T1, T2, and T3 CAT LEDs showed lots of energy also, so this was not just a readout problem. TRICS did not report any errors with the INITIALIZE/RESTORE.

We tried to EXCLUDE all Trigger Towers. TRICS again reported no errors, but TRGMON was still showing the same big global energy sums, etc. We checked the CAT LEDs and they were still munged. We still did not think to look at ADC counts.

We tried a full INITIALIZE. Again there were no errors from TRICS, but no change in TRGMON or the CAT LEDs. We checked the suspect Tier 1 Timing Signals, but they all looked good. We then looked at ADC counts and saw random (but mostly large) values. We tried to EXCLUDE a single EM Trigger Tower. TRICS responded with BAD PARAM.

We then rebooted the TCC without giving Philippe a chance to look at anything. This was stupid, but rebooting the TCC did solve the immediate problem. This is now the second time that we have seen this "third" type of TCC problem. Both times have been associated with (long?) shutoffs of M114. It sounds suspiciously Zeller-related. The next time this happens we need to let Philippe take a look.

We re-routed the AC cabling for the fans, water valve, and Norm Amos crate in Rack M124.
..............................................................................
Date: 31-MAR-1994
At: Fermi
Topics: L15CT Installation Work through 2-APR-1994: Ironics, Vertical Interconnect,
        Possible L1 Problems Dean Sees,
        Made a run of Find_DAC, some Level 2 disable scalers look funny,
        +14,-14 Ref Set Test

2-APR-1994
----------
Made a run of Find_DAC, some Level 2 disable scalers look funny.

Yesterday they made another run for Kathy Streets for L1 calibration. The run only lasted about 10 minutes. I spent most of that time on the phone, talking to people about the -14 vs +14 "problem".
Thus I was only able to watch TrgMon for about the last minute of this run. I watched the Global CT display and not the scalers. About 1/2 of the events that I saw were from -14 and the rest from the other 7 eta rings.

Between stores I set up a +14-only Ref Set and a -14-only Ref Set and tied them to two Spec Trigs with a count threshold of 1. I lowered the Ref Set to 1 GeV to get some rate. Typically this was about 1 Hz, with possibly the -14 being higher. The accelerator then injected 6 small proton bunches and the +14 rate went up to typically about 10 Hz. The accelerator then did something that made the D-Zero loss monitors go way up. The +14 rate went into the hundreds and the -14 rate into the tens.

1-APR-1994
----------
I got a master and a slave Vertical Interconnect from Greg Cisco. The address switch on the Master was set to $20000000, and I moved things around until the master woke up at $10000000. I now think that the rocker closest to P1 is the A31 key and that the lower-order address lines work their way towards P2. But there is no documentation that explains this.

On the Slave Vertical Interconnect I added jumpers J10 and J15 to turn on all of its slot 1 controller functions. I believe that the DIP switch on the Slave Vertical Interconnect puts it at $4000 in Short I/O space, but once again this is not clear to me.

Tested the Vertical Interconnect, the VBD Buffer, the two 214's, the Ironics control of the 214's via the Readout P2, and the VSB Bus. This was done just using the 135Bug MM, MD, and memory test routines.
Control Cables to the MVME214
-----------------------------

    RC Ironics Port 4    Slot13   Slot14   Slot #13          Slot #14
                         Cab #1   Cab #2   MVME-214 SN#11    MVME-214 SN#12
    ------------------   ------   ------   ---------------   ---------------
    write $10 read $EF     0V      +5V     no VME, VSB ok    VME ok, no VSB
    write $20 read $DF    +5V       0V     VME ok, no VSB    no VME, VSB ok

31-MAR-1994
-----------
Level 1.5 CT Work
-----------------
Installed the -5.2 power wiring and the inter-P2 cabling in the upper L15CT crate.

Cut an opening in the rear lower air baffle to provide access to the VBD P3 connector for the SRDY and DONE signals. This VBD is now operating OK. Its control register had a bad value for the crate ID address, which it checks at wake-up time. I moved the MVME135 to the Bus #2 section of the crate to check out the VBD and the Short I/O memory.

The Short-214 that I'm using is the official spare from here. It was set up in some goofy way (i.e. decoded 8k at $8000), so it was NOT really a good spare card for here. It is now back to normal (i.e. 2k at $9000) and is working OK.

I brought three of the P2 Paddle Cards here and they are installed: Term Select, Readout Control, and Path Select. I had to cut the bottom rear panel to allow room for and access to the inter-P2 cables.

I brought three Ironics cards here: SN#3, SN#4, and SN#6. These will be used to service the above 3 P2 Paddle Cards. I also used a spare Ironics card that was already here at Fermi (SN# none). This card was the one that had operated at NWA. I have plugged the Ironics cards into the crate. I was concerned that lines could end up fighting, since no software yet writes "0's" into the bits that are inputs. For example, the external driver could be high while the Ironics bit output driver is low. The Ironics manual does not say that the VME crate reset signal does anything. I traced the SYSRESET signal on an Ironics card, and it does in fact clear all six of the National 8211 output driver chips.
So it is OK to plug the Ironics into their P2 cards even though there is no software yet. The Ironics were installed as follows:

    Crate Slot   Serial Number   Base Address   Function
    ----------   -------------   ------------   ---------------
        2        SN# none           $F010       M103 Comm
        3        SN#3               $F020       Term Select
        5        SN#4               $F040       Readout Control
        6        SN#6               $F080       Path Select

Possible L1 Errors
------------------
Dean reported to me two possible L1 errors:

Now that muon has gotten rid of most of their problems, Dean sees a crate $B "stale data" message about once every 15 to 20 minutes of normal global physics running. This means that when the token came around, the data in the VBD had an "older" 4 bits of sync information. My best idea so far is to wait until it does this and then run to the VTC console and look for an error. I could also change the VTC program to NOT write the 1's and 0's, so that error messages would stay visible.

Near the end of this afternoon's run Dean said that a couple of times Spec Trig #22 went to a very high rate. After perhaps 10 minutes it came back down to a normal rate. During the high-rate periods Trig Tower -9,22 showed up in the Total Et Jet List all of the time. The main readout does not show energy in this Trig Tower region during the periods of abnormally high rate.
..............................................................................
Date: 22-25 MAR 1994
At: Fermi
Topics: L1.5 CT installation and testing work
        Some random loops with errors

We installed some more of the L1.5 Cal Trig hardware at Fermilab. Some of this installation work is permanent, and some was undone and brought back to MSU when we left. The things that were installed were:

    Component                                          Still at FNAL?
    ---------                                          --------------
    DC's in Racks M103 and M104                        YES
      (including all necessary cabling)
      (DC S/N-P2 in M103, S/N-P3 in M104; these are
       DCs from the first Prototype manufacturing run)
    CRC S/N-1 (with only 2 Channels) in M124           YES
    ERPB MTG PROMs and Patch Panel                     YES
    Hydra Card "B-2" (including 8 cables to CRC)       YES
    MVME135 CPU                                        YES
    MVME214 Memory Card                                YES
    Debugger Pod for Hydra                             YES
    IBM-PC to operate debugger                         NO

We spent most of this week performing data transfer tests for the L1.5 Cal Trig. We wanted to demonstrate that we can reliably transfer data all the way into the DSPs from the 2 racks of CTFEs that are currently instrumented. In this we were successful. We have been able to perform over 1,000,000 readout cycles (corresponding to 1GB of data transferred from CTFEs to DSPs) with no errors that can be blamed on the data transfer mechanism.

We first used the counter/switch board to inject data into a single DC. We demonstrated that we could transfer data from this counter/switch board into both CRC Channels (and then into the DSPs). We ran the "real" L1.5 Cal Trig code on the DSPs and wrote Mark and Force Pass Data into the MVME-214 memory module. The Service 68K CPU provided the necessary EC/RC services, and also checked the data in the MVME214. The checking was done by allowing the 68K to "learn" the data pattern on the first transfer, and then requiring it to "check" the data pattern on every subsequent transfer. It examined the entire Mark and Force Pass subsection (for DSPs B1, B3, and B4) of the Data Block.

We then allowed the CTFEs to provide the data for the DSPs. This was done using "Pallet" .DAT files written by Philippe. These .DAT files paint some pattern in all of the CTFEs in a given rack. We again used the "learn and check" control software to verify the transfers. We also knew how the TT Data in the DSPs should look (because we knew what we put in the CTFEs), and we checked by hand that the data looked correct. We have several "Pallet" files.
Each attempts to find a different failure mode. Each "Pallet" file is named

    [TRIGGER]L15CT_PALLET_M10%_*.DAT

where the '%' is either '3' or '4' (indicating the Rack number), and the '*' describes the data which should be in the EM TT Data section in the Local DSPs.

The Tot Et tries to have the same pattern as EM, but this failed at |eta| = 2 and 3. Some of the values that need to be loaded in the HD channel there are close to 8 (but different) and fall within the low energy cutoff. Tot Et at |eta| = 2 is ahead of the desired value by one count, and at |eta| = 3 by two counts.

The existing "Pallet" files are:

    '*'     Description
    -----   -----------
    80      Each EM Longword in DSP Memory = $80808080          \ exercise ability to
    7F      Each EM Longword in DSP Memory = $7F7F7F7F          / drive all bits low/high
    55_AA   Each EM Longword in DSP Memory = $AA55AA55 (+Eta)   \  (max switching)
                                             $55AA55AA (-Eta)   |  alternating bits
    AA_55   Each EM Longword in DSP Memory = $55AA55AA (+Eta)   |  in words and
                                             $AA55AA55 (-Eta)   /  in byte stream
    COORD   Each EM Longword matches eta/phi coords in hex (check readout order)

These files can find:
  - bits stuck low or stuck high;
  - neighbor bits shorted together;
  - some classes of readout order problems;
  - some ERPB problems (internal Xilinx screw-ups).
They can also maximally torture the data transfer by requiring each data line to make a transition on every byte transfer. All of these files have been run and checked both by hand and by the "learn and check" program. These files can be cloned to test other racks by an EVE search/replace of the MBA.

We found no design problems with the data transfer mechanism, but we did find some problems with ERPB cards and CTFE cards. What we found was:

(1) The top ERPB in Rack M104 (S/N-18) appeared to repeat the TT Et in both Towers serviced by one Xilinx chip. We replaced this ERPB with ERPB S/N-19 and the problem went away.
(2) the CTFE servicing ETA = +1..+4, PHI = 26 appears to have the bit of value 2 stuck "on" in the 9-bit Total Et output driver for ETA = +2. This CTFE has NOT been repaired or even looked at. We will wait for the next convenient power-off opportunity.

Running some Random loops on 23-MAR detected a weird error after 10-40k loops. One, or a random subset, of the 4 Global EM Tower Counts is off by +1, -1, -2, or +2 counts, but this is never repeatable when redoing the same loop. Some instances show combinations of -1 and -2, or +1 and +2, among the 4 reference sets. The counts are always all ahead or all behind at once.

Philippe used the Tree Browser to try to locate the problem:
   - let it run until it fails;
   - write down all the tree outputs;
   - make it redo the same loop and chase down which output has moved;
   - start over one level lower in the tree.
He chased it down to SIGN_ETA(NEG) MAGN_ETA(13:16) PHI(1:8). The symptoms seemed strange, with more than one bit, or even one word, into the CHTCR being different. But the bits that are unstable seem to come from the PHI(3) card. I could witness only two instances of this before we had to relinquish the system. Also remember that we had just turned on additional clocking of the CTFEs right before running these tests; this might be what has pushed this CTFE over the edge.

Also, while investigating the above problem, Px started acting up in the same way as in the 16-FEB entry. This time Philippe tried to modify one thing at a time to locate the source of the problem. Pushing on the Px card in the crate at (eta,phi)=(-5:8,17:24) didn't help. Unplugging, inspecting, and replugging the cable from the Tier 1 CAT2 did not help either. But shoving the Tier #2 Px cards made the problem go away again, even though no movement of the card could be felt.
..............................................................................
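The "learn and check" verification and the "Pallet" patterns described in the entry above can be sketched together. This is a minimal Python illustration, not the actual 68K control software. The function and class names (`longword_bytes`, `every_line_toggles`, `neighbors_differ`, `LearnAndCheck`) are hypothetical. The most-significant-byte-first order on the data path is also an assumption; the alternating $AA55AA55/$55AA55AA patterns give the same result in either byte order. The sketch checks the two pattern properties the entry relies on: every data line switches on every byte, and neighboring bits always differ, so a short would show. It then learns a block on the first transfer and flags the longwords that differ on later transfers.

```python
from typing import List, Optional, Sequence


def longword_bytes(longword: int) -> List[int]:
    """Split a 32-bit longword into its 4 bytes, most significant byte first.

    The real byte order on the CTFE -> DC -> CRC -> DSP path is an assumption;
    the alternating $AA55AA55 / $55AA55AA patterns behave the same either way.
    """
    return [(longword >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def byte_stream(longwords: Sequence[int]) -> List[int]:
    """Flatten longwords into the byte sequence seen on the data lines."""
    return [b for lw in longwords for b in longword_bytes(lw)]


def every_line_toggles(stream: Sequence[int]) -> bool:
    """True if all 8 data lines change state on every byte transfer."""
    return all((a ^ b) == 0xFF for a, b in zip(stream, stream[1:]))


def neighbors_differ(byte: int) -> bool:
    """True if each bit differs from the bit next to it, so a short between
    neighboring lines would force a visible error."""
    return all(((byte >> i) & 1) != ((byte >> (i + 1)) & 1) for i in range(7))


class LearnAndCheck:
    """Learn the block on the first transfer, then compare each later transfer."""

    def __init__(self) -> None:
        self.reference: Optional[List[int]] = None
        self.transfers = 0
        self.bad_transfers = 0

    def process(self, block: Sequence[int]) -> List[int]:
        """Return the indices of longwords that differ from the learned block.

        A length mismatch flags every index past the shorter of the two blocks.
        """
        self.transfers += 1
        if self.reference is None:
            self.reference = list(block)  # first transfer: learn only
            return []
        ref = self.reference
        longest = max(len(block), len(ref))
        bad = [i for i in range(longest)
               if i >= len(block) or i >= len(ref) or block[i] != ref[i]]
        if bad:
            self.bad_transfers += 1
        return bad


if __name__ == "__main__":
    # $AA55AA55 makes every line switch on every byte; $80/$7F drive each line both ways.
    print(every_line_toggles(byte_stream([0xAA55AA55] * 4)))
    print(0x80 ^ 0x7F == 0xFF)
    checker = LearnAndCheck()
    checker.process([0xAA55AA55] * 4)
    # A bit of value 4 stuck low turns the final byte $55 into $51.
    print(checker.process([0xAA55AA55, 0xAA55AA51, 0xAA55AA55, 0xAA55AA55]))
```

In the usage lines, the first transfer is only learned. A bit of value 4 stuck low in one longword then shows up as index 1 on the second transfer. Because the reference is learned rather than computed, this style of check does not need to know which pattern was loaded. That is presumably why the same control software could serve both the counter/switch board runs and the CTFE "Pallet" runs.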
Date: 22,23-MAR-1994 At: Fermi Topics: CTMTG PROMs, DCs installed, Foreign Scalers/AND_OR Terms for Norm Amos

Installed DC's in M103 and M104. Used the ERPB Test Data generator on the DC in M104. Used the 68k program that checks the data in the MVME214 based on the low order byte of the first Tot Et longword.

Installed new PROM's in the Calorimeter Trigger MTG in PROM positions #1 and #2. PROM #1 moved from SN# 1L to 1M. PROM #2 moved from SN# 2K to 2L.

Checked the serial numbers of the Tier 1 Backplanes at the high eta racks. We need to check this again against the inventory log book for the backplanes, to see which ones have the new style short pins.

               M110     M111     M112
              ------   ------   ------
     Top      SN#17    SN#19    SN#22
     Bottom   SN#18    SN#20    SN#21

Renamed an AND-OR Input Term for Norm Amos. And-Or Input Term number 120 was changed from MR_CAL_LOW to CAL_RECOVERY.

Work on setting up new Foreign Scalers for Norm Amos. The following table shows the wiring between the NIM to ECL module in the Bagby M122 rack and the Foreign Scalers. The changes are indicated by a "*".

  NIM to ECL     Pair on the
  Module Lemo    17 Pair
  Connector      Cable          What signal is it.  Where does it go.
  -------------  -----------    -------------------------------------------------
   top            17            Reset BX Count into MR 29 cycle to Foreign #4.
   2nd from top   16            L0 Fast Z Good to our scalers.
   3rd from top   15            Not connected to any of our stuff. Mod Ch in use.
   4th from top   14            Not connected to any of our stuff. Mod Ch in use.
   5th from top   13            Qty #3 to the per Bunch Luminosity Scalers.
   6th from top   12            MRBS_Loss signal to Foreign Scaler #1 Gate A.
   7th from top   11            MicroBlank signal to Foreign Scaler #2 Gate A.
   8th from top   10            MRBS_Loss .or. uBlank to Foreign Scaler #29 Gate A.
 * 9th from top    9            This is now a free Foreign Scaler.
                                Foreign Scaler #36 Gate A
                                DBSC Ch #1 in slot 11 CA=32
                                this was MR_Veto_Cal_Low
 *10th from top    8            BX_Cnts_MR_Hi_or_uB_or_Mu_HV
                                BX_Counts_of_MR_Veto_High_or_Micro_Blank_or_Muon_HV_Recovery
                                Foreign Scaler #35 Gate A
                                DBSC Ch #2 in slot 11 CA=32
                                this was MR_Veto_Muon_Low
 *11th from top    7            BX_Cnts_MR_Hi_or_Low_or_Mu_HV
                                BX_Counts_of_MR_Veto_High_or_MR_Veto_Low_or_Muon_HV_Recovery
                                Foreign Scaler #34 Gate A
                                DBSC Ch #3 in slot 11 CA=32
                                this was MR_Veto_Cal_High
 *12th from top    6            BX_Cnts_of_MRBS_or_uB_or_Mu_HV
                                BX_Counts_of_MRBS_Loss_or_Micro_Blank_or_Muon_HV_Recovery
                                Foreign Scaler #33 Gate A
                                DBSC Ch #4 in slot 11 CA=32
                                this was MR_Veto_Muon_High
  13th from top    5            BX_Counts_of_MR_Veto_Low
                                Foreign Scaler #32 Gate A
                                DBSC Ch #1 in slot 12 CA=35
  14th from top    4            BX_Counts_of_MR_Veto_High
                                Foreign Scaler #31 Gate A
                                DBSC Ch #2 in slot 12 CA=35
 *15th from top    3            BX_Cnts_of_MR_Veto_High_or_Low
                                Foreign Scaler #30 Gate A
                                DBSC Ch #3 in slot 12 CA=35
  16th from top    2            NC

Recall how the Lemo on the module maps to the pair number on the 17 pair twist and flat: the top Lemo is pair #17, the bottom Lemo is pair #2, and pair #1 is not used.

The proper (I hope) edits have been made to TrgCur:Trics_Boot_Auxi.dat to get this new Foreign Scaler information put into the Begin Run, End Run, Pause Run, and Luminosity files. Put the old version of Trics_Boot_Auxi.dat in [TrgCur.Obsolete]. The new version of Trics_Boot_Auxi.dat is in D0:: TrgCur:, D0HTCC::[Trigger], and MSUHEP::TrgCur:[DZero].

The proper (I hope) edits have been made to HTrgMon:TrgMon_FS.RCP to include these new Foreign Scalers in the TrgMon Display. This file was then copied to MSUHEP::HTrgMon: and to D0::User1:[Trguser.TrgMon].

I edited the [D0_Text.Scalers]Scaler_Assignments.Txt to show the new use of these scalers.
..............................................................................
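The Lemo-to-pair numbering recalled at the end of the 22,23-MAR-1994 entry above runs in reverse: the top Lemo is pair #17, the bottom Lemo is pair #2, and pair #1 is unused. This is easy to get backwards when rewiring. The hypothetical Python helpers below encode the mapping. They are derived only from that wiring table, which shows 16 Lemo positions with pair = 18 - position; the function names are illustrative.

```python
def pair_for_lemo(position_from_top: int) -> int:
    """Return the 17 pair cable pair number for a NIM-to-ECL module Lemo.

    position_from_top runs from 1 (top Lemo) to 16 (bottom Lemo).
    Pair #1 of the cable is not used, so results run from 17 down to 2.
    """
    if not 1 <= position_from_top <= 16:
        raise ValueError("the module has 16 Lemo positions (1 = top)")
    return 18 - position_from_top


def lemo_for_pair(pair: int) -> int:
    """Inverse mapping: cable pair number (2..17) to Lemo position from the top."""
    if not 2 <= pair <= 17:
        raise ValueError("only pairs 2..17 are wired; pair #1 is not used")
    return 18 - pair


if __name__ == "__main__":
    # The Lemo feeding Foreign Scaler #32 is 13th from the top.
    print(pair_for_lemo(13))
```

For example, Foreign Scaler #32 is fed from the 13th Lemo from the top. That Lemo lands on pair 5, which matches the table.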
Date: 17-21-MAR-1994 At: MSU Topics: Counter Switchboard - DC - CRC - DSP Timing, DC setup

Standardized how to set up the DC's which will be used at FNAL. The switches should be set as follows:

H2:  This jumper block chooses the delay between the data transition and the rising edge of the Strobe to the CRC. Currently we would like to have 10 ns of set-up time between the data transition and the rising edge of this strobe. Set H2 as follows:

         (backplane side of DC)
              o   o
              o   o
              o===o   <- set the jumper at the 3rd position from the
              o   o      backplane end of H2
              o   o
              o   o
              o   o
              o   o
              o   o

H3 and H4:  These jumper blocks allow spare signals to be sent to or received from the CRC card. For now we are not using any spare signals to/from the CRC, so these jumper blocks should be left completely unused (no jumpers or wires installed).

SW1:  Mode/ID Switch
      Positions 1 through 4 of this switch set the ID of this DC. Position 1 is the MSB, Position 4 is the LSB. These switches indicate which Rack is being serviced by this DC:

         ID    Rack Number
         --    -----------
          0    M103
          1    M104
          .      .
         10    M112

      Positions 5 through 7 of this switch are not used and should be set to the DOWN position.
      Position 8 of this switch selects whether the MTG or the Serial Configuration PROM (SCP) is the source of the Xilinx configuration. It should be set to the DOWN position (selecting the SCP as the Xilinx configuration source).

SW2:  SW2 is used to select various "bells and whistles" of the DC. Positions 1 through 7 of this switch should all be OFF (down) for now. Position 8 of this switch should be DOWN for POSITIVE ETA, and UP for NEGATIVE ETA. This switch value is sent to the ERPBs, and they use it to select readout order.

Dan and Steve spent several hours working with the "Dan Counter" to DC to CRC to DSP data transfer. This is a summary of what we have learned:

(1) The following files are stored on the L1.5 Cal Trig Disk for the Logic Analyzer:

      C4PSS_1.M20    all 100 ns byte period, 10-ns delay (i.e. H2
      C4PSS_2.M20    set on 3rd position), 40-ns DC_STROBE (i.e.
      C4PSS_3C.M20   the 2nd version of the DIST4 GAL [dist4v01]).
                     The one labelled 3C shows every 4th /CRDY (corresponding to the 1st longword in a transfer following a longword transfer) "stretched". These were taken with only one DSP running all 4 Comm Ports.

      C8PSS_1C.M20   all 100 ns byte period, 10-ns delay (i.e. H2
      C8PSS_2.M20    set on 3rd position), 40-ns DC_STROBE (i.e.
      C8PSS_3C.M20   the 2nd version of the DIST4 GAL [dist4v01]).
      C8PSS_4W.M20   The ones labelled C show every 4th /CRDY (corresponding to the 1st longword in a transfer following a longword transfer) "stretched". The one labelled W shows every 4th /CRDY stretched longer than typically seen on the "C" captures. These were taken with 2 DSPs each running all 4 Comm Ports. All traces are from ONLY ONE DSP.

(2) There are 3 classes of timings that are seen. These classes are:
      (a) a "normal" byte transfer (i.e. not the 1st byte after a complete longword transfer).
      (b) a "C-capture" version of the 1st byte after a complete longword transfer.
      (c) a "W-capture" version of the 1st byte after a complete longword transfer.

(3) The critical timings we have seen are listed below. All events are actually recorded at the CRC end unless otherwise noted. All recordings at the CRC have been "de-skewed" with respect to each other. This was done by measuring at similar points on the CRC path, NOT by calculating skew and subtracting it from the measurements.
      (a) For a "normal" byte transfer:
..............................................................................
Date: 9,10-MAR-1994 At: D-Zero Topics: Install ERPB's in M103 and M104, Install cables to M124, Repair L1 Trigger Tower +5,22 EM, Power Cycle boot.

Installed ERPB's in the bottom half of M103 and in M104. It takes about 2 1/4 hours to clip and shroud a rack, and another 2 hours to stack and cable the ERPB's.
The ERPB's that are installed are serial numbers (top to bottom of rack):

    M103:   6,  5,  2,  1, 10,  9,  7,  8
    M104:  18, 17, 16, 15, 14, 13, 12, 11

Work on making cables to go between M124 and the L1 Cal Trig racks. Measure some lengths:

    From inside M124 to the M105 top entry clamp is 14 sections.
    From the M105 top entry clamp to the D.C. in M103 is 6 sections.
    From inside M124 to the D.C. in M112 is 27 sections min.

Made the M103 to M124 D.C. --> CRC Cable 20 sections long. Made the M104 to M124 D.C. --> CRC Cable 19 sections long. Made the M124 to M103:M106 MTG --> D.C. Cable as follows:
    - 35 sections to the first D.C. connector (which is in M103);
    - then three sections to the next D.C. connector;
    - then three more sections to the next D.C. connector;
    - and finally three more sections to the D.C. connector in M106 (for a total of 47 sections).
Note that the TSS Bus cables (and the CBus cables) have three sections between racks in the L1 Cal Trig.

An Owen Pulser Run showed that +5,22 EM was too big by a factor of 2. The problem was a cold solder joint on the ground leg of R5 in the Term-Attn.

I powered up the L1 system on Thursday morning, after working Wednesday night installing ERPB's, and the system would not initialize properly. TCC was running OK. TRICS software was OK, i.e. it would take my Initialize All command. It then came back with a Bad Failure after about one minute (i.e. about twice as long as normal). The lights on the BBB cards in M114 would flash during the initialize, so something was happening. The TRICS log file had many messages of the following form:

    Assistant CBus Not Immediately Released

I ended up panicking and power cycle booting the TCC, i.e. power cycling both the 4000 itself and its BA23 box. After doing this the L1 system initialized all OK. After talking with Philippe, we suspect that one of the DRV11J cards may have become confused (because L1 power was off) and needed to be reloaded.
If I had power cycled only the BA23 box, then the TRICS software running in the 4000 would have automatically reloaded the DRV11J's when power was restored to the BA23 box. There was no indication of any problem in the 4000 box or in the running TRICS software.
..............................................................................
Date: 3,4-MAR-1994 At: D-Zero Topics: Bring L15CT VME crates and CRC crate to D0 and install them, Bring MTG for L15CT to D0 and install it, Cables from M124 to M114, Install some ERPB's in the L1 racks.

Bring MTG card SN#24 to Fermi for use as the CRC_MTG. Bring MTG card SN#27 to Fermi as a spare. SN#27 has the ECO for the global external signal input.

Install the two L15CT crates in M124. L15CT crate SN#1 is on top and SN#2 is on the bottom. Also install the CRC Crate, the radiators, chassis supports, and front panels. To make room, the IBM token ring stuff in the back of the rack was moved up so that it is behind the Shea modules. To make room at the bottom back of the rack:
    - the two LAr monitor HV fanout modules were removed from their card file;
    - the card file was pulled out;
    - the modules were put on the floor of the rack behind Norm Amos's NIM bin.
Norm Amos still has his Active MR Veto NIM Bin in the bottom of M124, screwed in from the back.

I strung the CBus cable (Assistant COMINT CBus #3) from M114 to M124, along with a 34 conductor cable for control signals. These cables are laid up against the north vertical wall of the cable tray that runs along the back of the north aisle of racks. Over this I put another 64 twist and flat to act as a shield between our stuff and the rest of the stuff in this cable tray.

The L15CT Power Pan is on top of the air conditioner but not yet tied down. A tie down platform needs to be made for this and for the VT terminal that will be used for the L15CT 68k.

The CRC_MTG is powered up and running. It should be addressed as CBus #3, BBA 88, MBA 89, CA 35.
It receives its Clock and Once per Turn Marker from the monitor output on the L1 Framework Main Timing MTG. These signals are carried from M114 to M124 over the 34 conductor cable. See the file in TrgL15CT:[Hardware_Software_Text] for more details about CRC_MTG.

Signals carried on the 17 pair control cable between M114 and M124:

    Pair    Function
    ----    --------------------------------------------
      1     Not used, shield
      2     Once per Turn Marker to the CRC_MTG "MTG PROM Address Counter Reset"
      3     Not used, shield
      4     Clock to the CRC_MTG
      5     Not used, shield
    6:17    Not yet assigned

Friday afternoon the online cluster crashed, so I got a chance to install some ERPB cards. 4 ERPB's are installed in the range eta +1:+4 phi 1:16. They look fine. The daisy chain cables are installed, but there is no parallel timing cable yet. The ERPB's appear to support themselves OK, and the parallel timing cable could be used to give some support. Where does the DC plug in, i.e. to just 2 connectors or to all three? The yellow and red LED's are ON and the green LED is OFF. I estimate about 2 hours per rack to install the ERPB's, plus extra time for the DC and the cables back to rack M124.

I need to make a 1U patch panel for the L15CT. I think that it can replace the 1U air flow panel just below the CRC Crate. I also need to make a holder for the top of the 1st floor MCH air conditioner, to hold the L15CT Power Pan and the VT terminal for the L15CT. I think that the top of the air conditioner is 31" wide (north-south direction). With the Power Pan up there, there is 13" left for the VT Terminal, which should be just enough.
..............................................................................
Date: 16,17,18-FEB-1994 At: Fermi Topics: Check the M109 Tier 2 CTMBD, Replace two CTFE PROM's, Test with Cal Random program and overnight test runs, Cook two MVME-135 U86 PAL's, Finish the Two COMINT ECO to COMINT SN#08, Measure the current space for L15CT, Installed more Drip Detector strips and connected the RMI to our RPSS, G10 shroud for bottom M114 radiator, Description of the modification to a CHTCR to read out the Beam Crossing Number.

Check the missing Large Tile that was supposedly fixed with the CTMBD swap last week: >> The problem isn't fixed <<. Things are "less bad": only one tower and one refset are affected, and only about 10% of the time. Exclude all EM and HD towers, and set all large tile reference sets to 0 GeV. Then use TRGMON's "spy window" to display in Hex mode at item 4945. All large tiles should appear above threshold for all refsets and show as FF. But item #4947 still reads as FB (i.e. the bit of value 4 is missing) about 10% of the time. This is Large tile +9:12,17:24 for large tile refset #0.

Replace 2 PROMs (HD PROM at +17,12) and (Px PROM at +3,15), then run the lookup PROM test with success on just these PROMs. Philippe found problems with these two PROM's a couple of weeks ago when he was checking all PROM's. See 13-JAN-1994.

Run the Lookup test on all PROMs (the test now handles Tier #1 truncation and negative numbers) (17-FEB 22:23 ... 18-FEB 03:04). One error was detected: HD low by 4 counts at NEG,E_1,P_1, page #2, EM 255 & HD 165. Doing the same loop again repeated the error. Note that this is page #2 and that the HD count is off by 4 counts, like in the random test errors below. Ran the PROM test on this PROM only (just HD, then all 4 PROMs): no error. Redid the PROM test on the same page the next day: no errors.

A new problem was detected while trying to run the random test: Px low by 8 counts, even before the first loop is run. Using the Tree browser, we located the problem as coming from cell -5:8,17:24. The outputs of the CTFE all read OK, but the input to the Tier #2 is low by 8 counts.
The lights on the Tier #1 card were displaying the correct number. Philippe pushed on the front connector of the Tier #1 Px card, and pushed hard on the Px cards of Tier #2. Nothing seemed incorrectly seated, but the problem went away.

Started the Random test overnight. When they initialized the trigger in the morning, caltrig had done 3,382,163 loops. But, because of operator stupidity, the test ran on positive etas only.

The details of all random tests run:

      162,000   16-FEB   23:37 - 00:02   no error until operator stop
    3,382,163   17-FEB   00:05 - 08:42   no error until init (but pos_e only)
       87,955   17-FEB   11:28 - 11:42   HD 2nd lookup Sum is low by 4 counts,
                                         error not systematic re-doing loop,
                                         this was on lookup page #2.
    1,060,237   17-FEB   14:12 - 16:58   HD 1st lookup low by 4 counts,
                                         error systematic re-doing loop,
                                         this was on lookup page #2.
      125,000   18-FEB   10:31 - 10:11   no error until INITIALIZE
       74,000   18-FEB   10:46 - 10:58   no error until INITIALIZE
      529,000   18-FEB   10:59 - 12:23   no error until INITIALIZE
       98,000   18-FEB   12:57 - 13:13   no error until download
    ---------
    5,518,355   loops

Neither of the HD errors in the random test was properly traced. This was because of operator stupidity while doing two things at once, and a weakness in the tree browser that needs to be fixed.

Cooked two U86 PAL's for the MVME-135 CPU cards. These are original parts to replace a PAL that I damaged a couple of weeks ago.

Added the Two COMINT ECO to COMINT SN# 08. This is the card that had been running at D-Zero up until the time that we installed the Two COMINT setup. A lot of work was required to remove and rework other "white wires" that had been added in earlier ECO's, because of the way that these wires were routed and glued down.

Measured the current amount of space available for the L15 Cal Trig. It is about 53 3/4", which is about 30.7 U. It is integer U at the top and fractional U at the bottom, where it meets Norm Amos's Main Ring Veto stuff. Each of the two L15 Cal Trig VME crates is 9U (i.e. 15 3/4") plus a 1U (i.e.
1 3/4") fan tray. Thus both L15 Cal Trig VME crates plus their fan trays take up 20U total (i.e. 35"). This leaves about 18 3/4" (about 10.7U) for the following:
    - the card file for the CRC;
    - the CRC power supplies;
    - vertical air in and out of the top L15CT crate;
    - vertical air in and out of the bottom L15CT crate.

RMI Drip Detector ( 17-FEB-1994 23:08 - 18-FEB-1994 01:18 )

We installed Drip Detector strips for the 8 pack radiators in racks M100 and M113. There have been no false trips from the Drip Detector in the past week, so we went ahead and connected it so that it would trip off all the L1 power. To do this we installed the RMI to RPSS Box. This box is located in the bottom of rack M113, behind the RPSS, and is plugged into the M114 slot of the RPSS. It connects to the RMI in the top of M114 via a long green RG58 BNC cable. If the RMI detects a water leak, the RPSS will trip, showing "Air Flow" and "Water Pressure" faults in M114.

The RMI output is also connected to one of the spare rack voltage monitor channels, and Dan Owen has an alarm set on this channel. This channel is labeled LV1FW_M114_4.DRIP. We did a number of tests to prove two things: that the RMI Drip Detector would trip the RPSS, and that the "voltage" read out from the RMI output was close enough to fit within the 0.5 Volt tolerance that Dan Owen has set on the alarm for this channel.

I need to write a description of the RMI to RPSS connection and the operation of this setup. I will put this in the TrgHard:[RPSS] directory. I will also move all of the safety system related documents from my private directories to the TrgHard:[RPSS] directory.

Philippe installed a G10 shroud around the lower radiator in M114. It protects the backplanes, cards, and power supplies from a leak at the hose connection end of this radiator. There still is no shrouding around the upper two radiators in M114, and leaks from these two radiators would be very damaging.
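The rack-space figures in the L15 Cal Trig space measurement of this entry can be checked with exact fractions. This Python sketch assumes the standard rack unit of 1U = 1 3/4". That value agrees with the 9U = 15 3/4" crate height quoted in the entry. The function names and the default crate and fan-tray sizes are illustrative parameters taken from the entry, not from any existing tool.

```python
from fractions import Fraction

INCHES_PER_U = Fraction(7, 4)  # 1U = 1 3/4 in (standard rack unit, assumed)


def inches_to_u(inches: Fraction) -> Fraction:
    """Convert a height in inches to rack units."""
    return inches / INCHES_PER_U


def space_left(available_in: Fraction, n_crates: int = 2,
               crate_u: int = 9, fan_tray_u: int = 1):
    """Return (used U, used inches, leftover inches, leftover U)."""
    used_u = n_crates * (crate_u + fan_tray_u)
    used_in = used_u * INCHES_PER_U
    left_in = available_in - used_in
    return used_u, used_in, left_in, inches_to_u(left_in)


if __name__ == "__main__":
    available = Fraction(215, 4)  # 53 3/4 in, as measured
    print(round(float(inches_to_u(available)), 1))
    used_u, used_in, left_in, left_u = space_left(available)
    print(used_u, used_in, float(left_in), round(float(left_u), 1))
```

With the measured 53 3/4" it reproduces the numbers in the entry:
    - about 30.7U available in total;
    - 20U (35") taken by the two crates and their fan trays;
    - 18 3/4" (about 10.7U) left for the CRC card file, its power supplies, and the air paths.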
Started writing a file in TrgHard:[CHTCR] to describe the modifications to the CHTCR card that are required to read out the Beam Crossing Number.
..............................................................................
Date: 10,11,12-FEB-1994 At: D-Zero Topics: Bring spare cards to D-Zero Hall, Swap the CTMBD in M109 Tier 2, Setup two new Foreign Scalers for Norm Amos, Measure Muon to Us cables, Install Drip Strips and an RMI, Pull out old TCC, Cook CHTCR C2R1 PROM's

Bring 3 CTMBD cards to D0: CTMBD SN# 35 wired for Tier 1, CTMBD SN# 08 wired for Tier 2, and CTMBD SN# 17 wired for Tier 2.

Return the BBB SN# 09 to D0 Hall. A couple of weeks ago this card was pulled, but the problem ended up being no CBus or Time and Sync Bus terminators on the cables going to this card.

Swap the CTMBD in M109 Tier 2: pull CTMBD SN# 14 and install SN# 08. This card is being pulled because during data block building it does not read the first location (an LTCC card) correctly. See last week's log entry. After this swap I did not see any more "No Candidates In LT JL" messages. But in a couple of hours of global physics running there were about one hundred "LT JL Overflowed" messages. Many Control Room people do not understand the difference between these two messages. Rich Astur's call to the routine that makes the LT JT currently sets a limit of 20 entries. Remember to look at the sort alarms output to understand how many errors there are during a run.

Setup two new Foreign Scalers for Norm Amos. These will watch the new active Main Ring Veto setup.
    Foreign Scaler #32, the DBSC Ch #1 in slot 12 CA=35, is fed from the NIM-ECL Converter Ch #13 in the Bagby rack. This is called BX_Counts_of_MR_Veto_Low.
    Foreign Scaler #31, the DBSC Ch #2 in slot 12 CA=35, is fed from the NIM-ECL Converter Ch #14 in the Bagby rack. This is called BX_Counts_of_MR_Veto_High.
The proper (I hope) edits have been made to TrgCur:Trics_Boot_Auxi.dat to get this new Foreign Scaler information put into the Begin Run, End Run, Pause Run, and Luminosity files. Put the old version of Trics_Boot_Auxi.dat in [TrgCur.Obsolete]. The new version of Trics_Boot_Auxi.dat is in D0:: TrgCur:, D0HTCC::[Trigger], and MSUHEP::TrgCur:[DZero].

The proper (I hope) edits have been made to HTrgMon:TrgMon_FS.RCP to include these new Foreign Scalers in the TrgMon Display. This file was then copied to MSUHEP::HTrgMon: and to D0::User1:[Trguser.TrgMon].

I edited the [D0_Text.Scalers]Scaler_Assignments.Txt to show the new use of these scalers.

I checked with Norm, and he says that it is OK for me to disconnect the "ECL Box" setup that I made for him to bring some Muon L1 Terms to his stuff in the bottom of M124.

I unplugged cables going to the Muon system so that they could measure the electrical length of the cables. The lengths are:

    Muon L1 Trig's to the L1 Framework                    85 nsec
    Muon L1.5 Trigs (Answer and Done) to the L1.5 FW      92 nsec
    Trig-Acq-Sync Cables from L1 FW to Muon               80 nsec

Pulled the old TCC uVAX II out of the top of M114. Installed the RMI at the very top of M114. We need a front panel that is about 5 screws high to cover the rest of the space where the old TCC was. The output of this RMI is connected to Entry 03BE of the Shea ADC CH# 62 of Node 74D. This is the next channel after the -5.2 Volt monitor from M114. This RMI is not yet connected to our RPSS, so it can not trip our stuff yet.

Installed drip strips between all of our long string of racks (i.e. 11 strips). I need to install more strips at each end for the 8 pack radiators. Do we want drip strips in M114 ?? Verified that the connection from the RMI to the Voltage Monitoring system was reading out OK.

Cooked 16 more of the big PROM's (82HS321) for the CHTCR cards. Each card requires 8 of these parts, and there are 2 cards at MSU that need these PROM's before we can test them.
This leaves us no spare programmed C2R1 parts. We still have about 30 unprogrammed parts.
..............................................................................
Date: 1,2,3,4-FEB-1994 At: Fermi Topics: Save/archive single comint TCC files, Create dual comint version TCC files and DIRECT_TO_TCC files, List of COMINT cards now at Fermi, Made new COMINT to VMX Driver cables, Results of rate tests with Two_COMINT_Operation, Problem of bad Timing Signal TSS-L in the Tier 1's in M111 and M112 Phi 1:16, Work on the No Candidates in the Large Tile Jet List Problem.

archive TCC files
-----------------
Verify and copy the latest revision of all TCC files from TRGCUR: to [.OBSOLETE], and save all these files in [.OBSOLETE.ONE_COMINT]. [.obsolete] is now empty and ready for the dual comint files.

Upgrade all .DAT files for dual comint operation:
-------------------------------------------------
   TRICS_BOOT_AUXI.DAT          for address of end_run scaler readout,
                                get ready for "end of run" crash recovery file
   TRICS_INIT_AUXI.DAT          and add reminder of related .DAT and .MSG files
   TRICS_FORCE_BUF_UPDATE.DAT
   TRICS_L1_IGNORE_L15.DAT      and add header, and comments
   TRICS_L1_OBEY_L15.DAT        and add header, and comments
   IGNORE_L0_FAST_Z.DAT         and fix Momentum Lookup programming
   OBEY_L0_FAST_Z.DAT           and fix Momentum Lookup programming

DIRECT_TO_TCC files
-------------------
Delete USER1:[TRGUSER.DIRECT_TO_TCC]RESET_GEOSECT_IN_L15.EXE; it was obsolete.
Rename USER1:[TRGUSER.DIRECT_TO_TCC]TEMP_NO_L0MI_V80.MSG *.obsolete
Upgrade all .MSG files for dual comint operation. Files affected:
   ETA_32_TREE_CORRECTION.MSG
   FORCE_L0_FAST_Z.MSG
   RESET_GEOSECT_IN_L15.MSG

Switch to dual-comint IO
------------------------
The COMINT cards at Fermi are the following:
The COMINT that had been running for the past year or so is SN# 08 with CDBE SN# 06.
The COMINT that had been the FERMI spare for the past year or so is SN# 09 with CDBE SN# 05. Note that this card was not current on ECO's, i.e. it would not have worked.
It did not have the ECO to use the Card Address PROM bit to stop the Data Block Builder, so it sent out a Data Block Complete signal instead of a Data Block Builder Busy signal.
The COMINT that was brought from MSU to Fermi this week is SN# 06, and there is no SN# on its CDBE card. Steve has just checked the inventory of CDBE's, and we will call this CDBE SN# 08. The next time that this COMINT is pulled out we should write the CDBE SN on it.

Currently we have COMINT SN# 06 installed as the Pilot and COMINT SN# 09 installed as the Assistant. Before we started Two_COMINT_Operation we tested both COMINT SN#06 and SN#09 in Single_COMINT_Operation. They were both OK, but L1.5 triggering was not running during these tests (although double buffering was in frequent use).

Made new COMINT to VMX Driver cables. The twist and flat cables are 4 twist sections long. The single pair cables are about 4 to 6 inches longer than these new twist and flat cables. The old twist and flat cable that we removed was 9 twist sections long (i.e. about 15 feet long), and the single pair A13 cable was 10 feet long.

Results of rate tests with Two_COMINT_Operation:

Running with a prescale of 500 we see the system slowly oscillate between two conditions. The period of oscillation is perhaps in the 10 to 30 second range. The two limiting conditions are:

                  Waiting for a
          Hz       FE Busy %       Free VBD Buff %
         ----     -----------     -----------------
          420         20%                15%
          475         0.2%               0.5%

Running with a prescale of 600 the system is more stable, and it generally sits around:

                  Waiting for a
          Hz       FE Busy %       Free VBD Buff %
         ----     -----------     -----------------
          435         1.5%               1.5%

We pulled the control bus cable to the L1 VBD, so that it thinks that it always has the "Grant" to send data up to L2. In that case L1 builds and sends Data Blocks at a maximum rate of about 520 Hz. If we keep the VBD always granted and stop reading the Vertical Interconnects, then we see a maximum rate of about 602 Hz.
A timer counts how long VTC has to wait, after finishing reading the Vertical Interconnects, until the Slave Ready signal arrives. In all conditions except the last one this timer reads zero. That is, it took so long to finish reading the Verticals that the Data Block Builders were finished before the Vertical reads were finished. In the last measurement, where we dropped reading the Verticals, this counter said that there was a 0.38 msec/event wait between finishing the Vertical reads and receiving the Slave Ready.

Problem of bad Timing Signal TSS-L in the Tier 1's in M111 and M112 Phi 1:16.
TSS-L comes from Cal Trig MTG Ch# 13. The BBB card for the upper Tier 1's in M111 and M112 is in slot #17 of the upper backplane in M114. One backplane pass-through pin carries non-inverted MTG Ch# 13 out of the BBB in slot #17 and into the TSS cable, and it appears to be open. It does not appear to be epoxy on this pin. It looks rather as if the pin is broken or burned through in the middle, between the connector housing and the backplane PCB. We did not try to pull this pin out and replace it. Instead, we switched which BBB card is driving TSS signals to the M111:M112 Phi 1:16 Tier 1's, by swapping slot #17 with slot #14. The BBB in slot #14 drives the M111 Tier 2. Recall that all Cal Trig BBB's output the same set of timing signals. Ran a few thousand loops of the Random test to verify that this solved the problem encountered by Dan on 20,21-JAN-1994.

Work on the No Candidates in the Large Tile Jet List Problem.
The Large Tile at eta +9:+12 phi 17:24 sometimes does not register in the pattern of Large Tiles over threshold, although it does participate correctly in generating the L1 Trigger. This Large Tile is handled by the lower LTCC card in the M109 Tier 2 card file (MBA=209 CA=11). The fault is that data line 3 (i.e. bit value 4) sometimes reads out low when it should be high. This happens only with the fast Data Block Builder reads.
Programmed I/O reads are OK. In fact it happens only on the first read in this card file by the Data Block Builder; the subsequent reads are all OK. We checked the Terminators (backplane and cable bus), and they are all OK. We swapped the LTCC cards and there was no difference. We swapped CTMBD's and the problem went away.

There are two spare CTMBD's at Fermi. Both are wired as Tier 1 CTMBD's. CTMBD SN# 19 has a very bad solder job and is being returned to MSU for rework: there is an IC on this card that has 1/2 of its pins not soldered. The other CTMBD is SN# 09. It looks OK, and its tag says that it has passed 100k loops of some test.
..............................................................................
Date: 18,19-JAN-1994 and 20,21-JAN-1994 At: Fermi Topics: Distribute new "Start TCC" Instructions, Give the Detector Shifter Training talk about L1 Cal Trig, Problems with the Fancy-214's, Problems with Vertical Interconnect reads, Work on the eta +13:+16 Phi 1:8 CHTCR, Get M114 upper backplane ready for Two COMINT Operation.

Distribute the new (11-JAN-93) edition of the "Start TCC" Instructions (3 copies).
Give the L1 Cal Trig talk for the Detector Shift Training.
Give an L1 and L15CT talk to the Run Meeting.
Give a Framework and Cal Trig upgrade talk for the Trigger Upgrade Meeting.

We had the Fancy-214's running for about one day. At first it appeared to be all OK, but in reality there were perhaps 4 or more different problems seen:

1. Overwritten word counts in the Short I/O Block of one of the two Fancy-214's.
   With $FFFFF009 reading 11 there was, starting at $9100:

      0005 010a 0280 010a 0280 0041 910c 9140 00a0 0029 0014 0024 0006 00aa 0080 0000

   it should have been:

      0005 010a 0280 010a 0280 0041 0041 0140 00a0 0029 0014 0024 0006 00aa 0080 0000

   Note that the two overwritten locations have either their address written into them or else a combination of their address and their data.
   This problem was seen only with Fancy-214's in the L1 crate providing the Short I/O memory.

2. For a significant block of time the length of the L1 data block was about 6000 longwords instead of 2843 (or whatever the proper number is). This problem was seen with Fancy-214's in the L1 crate providing the Short I/O memory. Does it happen with our old, normal "C" type 214's?? This problem did not appear to stop normal Physics running. Is this related to the "spurts" of empty or overflowing Jet Lists??

3. With Fancy-214's operating and providing the Short I/O Blocks, there was a period during a Calib run when the Version Number in the header was often wrong. On these events the Controller words and the Revision word (also static from event to event), which sit on either side of the Version Number word, read out OK. The Version Number read D008000B instead of 00000008. After a while this problem just "fixed itself". Is this related to the "spurts" of Jet List overflows and empties??

4. Sometimes the pulser programming data has FFFF in the lower half of the longword that is built up from two word reads. This happens in perhaps 1 in 500 events, and it appears even with all OLD (i.e. "C" type) 214's in the L1 VME crate.

More tests with the Fancy-214's: I made a test version of the VTC code that reloads the Short I/O list of Word Counts right before the VBD reads them. Running with the Fancy-214's, this made things much worse; there were lots of errors (TF, CP, BX). Running this code with the old "C" type 214's is all OK.

To check the Vertical Interconnect reads of the Cal Pulsers, I made a test version of the VTC code that tests the 3rd word of each pulser to see if it is $FFFF. It does this by bringing the Vertical Interconnect data directly into a working register (i.e. the data goes from the cable into a register for testing, and not back out of the "V" type 214).
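Problem 4 and the VTC test described above both come down to spotting $FFFF in one 16-bit word of each pulser's programming data. The Python sketch below only illustrates that bookkeeping offline (for example on dumped events); it is not the 68k VTC code. The word order (high word read first), the per-pulser word-list layout, and all names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical offline check, not the VTC code. Assumes each pulser's data
# arrives as a list of 16-bit words and that longwords are built high word first.
SUSPECT_WORD = 0xFFFF


def build_longword(high_word: int, low_word: int) -> int:
    """Combine two 16-bit word reads into one 32-bit longword."""
    return ((high_word & 0xFFFF) << 16) | (low_word & 0xFFFF)


def low_half_is_ffff(longword: int) -> bool:
    """Problem 4 symptom: FFFF in the lower half of the assembled longword."""
    return (longword & 0xFFFF) == SUSPECT_WORD


def tally_third_word_errors(events, word_index=2):
    """Count, per pulser, the events whose third word reads $FFFF.

    `events` is an iterable of dicts mapping a pulser name (e.g. "P2") to the
    list of 16-bit words read for that pulser in one event.
    """
    errors = Counter()
    for event in events:
        for pulser, words in event.items():
            if len(words) > word_index and words[word_index] == SUSPECT_WORD:
                errors[pulser] += 1
    return errors
```

A per-pulser Counter gives the same kind of error list as the one recorded during the 70 Hz running, and pulsers with no errors simply read back a count of 0.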
Watching during 70 Hz running, I saw about one error every 15 seconds:
      P2   26 errors   CC Pulser
      P8   21 errors   CC Pulser
      P4    7 errors   ECS Pulser
      P6    5 errors   ECS Pulser
      P9    2 errors   CC Pulser
      P3    1 error    CC Pulser
      P7    1 error    ECS Pulser
      P5    1 error    ECS Pulser
Note that there are no ECN errors. From what I hear, it is a known problem that Vertical Interconnects sometimes do not read back correctly; Comm Taker uses 5 read backs.

M114 Backplane: Cut the J4 traces between slots 14 and 15 on the upper M114 backplane. Tested for shorts with an ohm meter and then put epoxy over the cut.

Work on the eta +13:+16 phi 1:8 CHTCR: Checking the EM cable that goes from this CHTCR to the Tier 2 CAT2 with an ohm meter, you could see something funny on the bit of value 2 for EM Ref Set 0. It is not an open or a short to an adjacent conductor, but the meter reads differently for this input than for any other input on this cable. Looking with the scope and the Diff ECL box nothing looked bad, but the DC levels of this input bit looked different. I was going to try swapping CAT2's, but I wanted to see this problem with Cal Trig Test before changing things. After waiting an hour to get a tube, I had trouble with Cal Trig Test. Running in just eta 1:16 it made a Px and/or Py error about once every 1000 loops. Running with full eta it made a Px and/or Py error anywhere from about once every loop to about once every 10 loops. Because I could not see the CHTCR problem, I did not swap CAT2's.
..............................................................................
Date: 13,14-JAN-1994   At: Fermi
Topics: First Test of the Fancy-214's in the Level 1 System.

The first test of the Fancy-214's in the Level 1 system did not work out at all. The block going up to L2 was all junk and its length was wrong. This was true of all events transferred up to L2. Went back to the old "C" type 214's. From the 133ABug, the data at $380000 looked like all junk. The data at $9010 was OK.
The data at $9100 was OK. The data at $305000 was OK. The data at $B000 was OK. The data at $B800 was OK. The length sent up to L2 was wrong ---> the VBD was picking up the wrong length counts from Short I/O Address Space.
..............................................................................
Date: 13-JAN-1994   Remotely from MSU
Topics: Eta +/- 1..20 Phi 6..32 CTFE PROMs tested, Random Test Runs, Another Power glitch, Eta +/- 17..20 CHTCR PROMs tested.

Eta +/- 1..20 Phi 6..32 CTFE PROMs tested
-----------------------------------------
Finished the systematic readout of all Lookup PROMs (started 7-jan). The test code still doesn't know how to deal with the Tier#1 to Tier#2 truncation of the MSB (test .le. 3 phis at a time), or with negative numbers read from Tier#3. The test was carried out on all etas for phis 6..8, 9..11, 12..14, 15..16, 17..19, 20..22, 23..27, 28..30, 31..32. See earlier entry for test method. I had to program an offset into the Px/Py Global sums to keep them positive: I loaded 1/2 of full scale into the correction register of the Tier#3 CAT3 cards with WRITEREG 0 153 39 49 16 and 0 153 37 49 16. This wasn't required for phi 1..8, which must have naturally kept a positive sum, but is now necessary for phi 9..16.
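Preloading half of full scale into the Tier#3 correction register works like an offset-binary trick: as long as the true Px or Py sum stays within plus or minus half of full scale, the offset sum never goes negative, and the signed value is recovered by subtracting the same offset. The Python sketch below models an unsigned, wrapping adder to show why; the 12-bit width, the wrap-around behaviour and the function names are hypothetical and are not taken from the CAT3 design.

```python
# Hypothetical model of an unsigned, wrapping global-sum adder.
# ADDER_BITS is an assumed width for illustration; it is not the CAT3 value.
ADDER_BITS = 12
FULL_SCALE = 1 << ADDER_BITS      # 4096 counts in this model
HALF_SCALE = FULL_SCALE // 2      # the "1/2 of full scale" preload


def adder_output(partial_sums, offset=HALF_SCALE):
    """Start the adder at `offset`, add the partial sums, wrap at full scale."""
    return (offset + sum(partial_sums)) % FULL_SCALE


def signed_sum(raw, offset=HALF_SCALE):
    """Remove the preloaded offset to recover the signed Px or Py sum."""
    return raw - offset


if __name__ == "__main__":
    parts = [-40, 25, -30]                      # true signed sum is -45
    print(adder_output(parts, offset=0))        # 4051: wrapped, looks like a large positive sum
    print(signed_sum(adder_output(parts)))      # -45: recovered correctly with the offset
```

In this model the offset only covers true sums from -2048 to +2047; a phi range whose sum happens to stay positive (as phi 1..8 apparently did) reads back correctly with or without the preload.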
One error detected for the HD PROM at +17,12 Page #3 (the error persists when redoing the same loop 3 times):
      HD PROM answer is 37 instead of 39 global is now 8963
      HD CAT inputs are 28, 28, 28, 58, 28, 28, 28, 28
      Error Detected at POS,E_17,P_12 page #3 EM 255 & HD 36
Later, test the HD PROM at +17,12 Page #3 by itself, and get the same error:
      HD PROM answer is 37 instead of 39 global is now 5214
      HD CAT inputs are 28, 28, 28, 58, 28, 28, 28, 28
      Error Detected at POS,E_17,P_12 page #3 EM 0 & HD 36

Two errors detected for the Px PROM at +3,15 Page #3 (the error persists when redoing the same loop 3 times; note that there is a bug in the display code, it should say +41 and not -41):
      Px PROM answer is -41 instead of 57 global is now 4193803
      PX CAT inputs are 0, 2, 4, 4, 5, 6, 505, 7
      Error Detected at POS,E_3,P_15 page #3 EM 128 & HD 0
      Px PROM answer is -41 instead of 57 global is now 4193803
      PX CAT inputs are 0, 2, 4, 4, 5, 6, 505, 7
      Error Detected at POS,E_3,P_15 page #3 EM 129 & HD 0
      PROM Failed Test at POS,E_3,P_15,EMETZ0 Page #3
Test the EM and HD channels at +3,15 alone on Page #3, and get the same error on the Px PROM. Then try different values of EM and HD by hand. After 0 loops of the random test, one needs to release the MTG clock with WRITEREG 0 105 53 33 100, and aim both read and write at pipe A with WRITEREG 0 105 53 3 9 and 0 105 53 5 9. Then use a combination of Tree browsing and WRITEREG to FA 81 and 82 to set different values and read the CTFE's 29525 and CTFE partial sums. The PROM is short by 16 counts (+41 instead of 57) whenever the input is 128/2 (with EM=128/HD=0, EM=0/HD=128, or EM=127/HD=2), while inputs of 127/2 and 130/2 are OK.

Another Power glitch
--------------------
Level 1 lost power, and TCC seemed stuck in "Initialize all Framework Registers". I looked in the logfile from when this happened. TCC was simply proceeding with its initialization, but every write was waiting for the CBUS and timing out (only after about 2 s apiece), and then reading back 0. These are the same symptoms as were noticed on 3-jan.
Again, it seems that the ZRL pQBA got screwed up, but not enough to raise its internal error flag so that TCC could know about it.

Eta +/- 17..20 CHTCR PROMs were tested.
--------------------------------------
The last of the CHTCR PROMs were successfully tested; see the previous entry for the test method.

Random Tests
------------
Tests now fail after some number of loops on the Global EM Tower count for Ref #0. The count becomes short by 2, then catches up. To make sure that the test catches the problem right away, limit it to all eta/phi, page 4 only, and EM Ref #0 only. The count always flips between Bad and Good when the test plays with a tower at eta +13..16, phi 1..8. Further diagnosis: let the Random test catch an error, then use Tree.Browser to locate the culprit. The input to Tier#2 from this CHTCR reads 17, while the readout of the CHTCR inputs shows that 19 bits are set (as binary counts, 19 and 17 differ only in the bit of value 2). The CHTCR PROM test now also produces lots of errors while testing Tier #1 PROM #1, #2, or #3, and the Tier #2 PROM. Note that this CHTCR passed the test on 21-DEC-1993. It is sometimes (I don't believe always) missing the second bit (i.e. short by 2 counts). It is not obviously correlated with the first or third bit. When it acts up, it is not an intermittent problem. One should suspect a problem with the second tier PROM on this channel (in its output circuitry), a pin/socket connection, a trace short, water damage, or the cable/connector at the front of this card, or perhaps instead a problem with the input of the following Tier#2 CAT2. One could use the random test to bring the card to a point of failure, unplug the front cable, and check the differential voltages for the second bit. I have the strong impression that this problem is now appearing more often (i.e. on more combinations of bits?) than it was on 20..21-DEC...??!!
..............................................................................
Date: 5,6,7-JAN-1994   At: Fermi
Topics: Install the last 6 of the Rev 1993 Term-Attn, Investigate the readout problem at +13,1, Repair problems that Dan Owen found in the pulser runs, Reworked the AC power for the new TCC and its BA23 box, New procedures for booting the L1 68k and for booting TCC, Look at Temperatures, What is connected to eta 20 EM.

Installed the last 6 of the new 1993 Term-Attn networks. They were: -7,1  -8,1  +12,1  +12,2  +12,3  +12,4

Investigated the problem with +13,1 EM and HD reading out funny values (typically 247) while participating OK in the generation of triggers. The problem was that there were NO terminators (either CBus or T&S Bus) on the cable from M114 to the phi 1:16 cardfiles in racks M107, M108, M109, M110. This string of racks is fed from the M110 end, so the CTMBD in M107 needed to have terminators. I plugged in the 110 ohm packs and the +13,1 readout now looks OK. What other troubles could this have caused? There was NO terminator on the Timing and Sync Bus!! This is very likely why we replaced this BBB a couple of weeks ago; that BBB (SN# 9) is very likely all OK.

Repaired some Trigger Tower problems:
      -5,28 EM   was reading 25%. It had a bad Term-Attn.
      -16,9 HD   was reading 60%. It had a bad Term-Attn.
      +15,12 EM  was reading 60%. CTFE SN# 277 had the wrong value at R107 (1k vs 3k).
      +20,12 HD  was reading 60%. CTFE SN# 357 had the wrong value at C303 (220-15pf).
      -17,18 EM  was reading 60%. CTFE SN# 207 had the wrong value at C295 (220-15pf).
      -20,22 HD  was reading 50%. Replaced a broken connector at the L1 end of the BLS cable.

AC power for the new TCC and its BA23 now comes from the strip line inside rack M114. This strip line is powered from the unswitched 115 V AC outlet on the Contactor Box for M114. Thus the new TCC has the same source of AC power and safety ground as the rest of the equipment in M114. This is how the old TCC got its AC power.

I finished writing new procedures for booting the L1 68k (VTC) and for booting the new TCC.
These were posted on the West end of M113 and in the DAQ expert notebooks. The new procedures are in the files TrgMisc:Start_68020.txt and TrgMisc:Start_TCC.txt. People should check and correct the Start_TCC file. I have put labels on the new TCC box (near its On-Off switch) and on TCC's BA23 box. The file [D0_Text.Software.Trics_Doc]Boot_Procedure.txt needs to be looked at and modified for the new TCC (e.g. pushing reset, typing B, where TCC is located).

                    5-JAN-1994             7-JAN-1994
                    Everything ON          Everything ON
                    -------------------    -------------------
      air flow      300 to 310 lfpm        300 lfpm
      water temp    54.7                   54.3
                    68.5   67.6  109.8     68.0   67.1  109.6
                    68.4  110.4  110.4     67.9  110.0  110.1

What is connected to eta 20 EM??? People are looking at BLS resistors, but it looks like the highest-eta HD elements (i.e. cal eta 4.4) are connected to our 20 EM. I had asked for this highest HD stuff to be connected to 20 HD. If it is connected to 20 EM, then what trouble can it cause?? What is in the 20 EM PROMs? Dan Owen let the trigger meisters know not to let the EM Ref Sets extend out to include eta 20.
..............................................................................
Date: 3-JAN-1994   At: MSU
Topics: Trouble at Fermi, ReBoot D0HTCC (reboot via power cycle)

   From: D0::TRGUSER  3-JAN-1994 10:33:42.40
   Subj: TRICS V5.2/02-JAN-1994/ Exit Refresh Monit Pool

   From: D0::TRGUSER  3-JAN-1994 13:11:59.28
   Subj: TRICS V5.2/03-JAN-1994/ Booting

At about 11:45 AM (Chicago time) the Control Room called claiming that the TCC could not talk to the Framework or the Cal Trig. According to the TRICS Log, the problem began at about 10:30 AM (Chicago time). Before 10:30 there were no errors; after 10:30 every attempt to communicate with the COMINT appeared to fail. The last mail message was an "Exit Refresh Monit Pool" at 10:33. Jan was in the Control Room, so they called us before trying anything drastic (i.e. they didn't reboot, turn any power off, or try EDEBUG). Sal (Fahey?)
was DAQ Expert, and he thought that the problem was correlated with several L0 high-voltage supplies tripping off. Recall that either the BA23 or the uVAX 4000 box gets its 110V from the same strip that services some of the L0 high-voltage supplies. Rack M114 had power (low-voltage monitoring indicated correct voltages, and the LEDs and orange AC indicator were on), and the uVAX 4000 and the BA23 box also had their AC indicator lights on. I could talk to the uVAX via TRICS, but all CBUS WRITEs failed (all FAs read back 0). I had Sal power-cycle the uVAX 4000 and the BA23 box. This appeared to solve the problem. The "Booting" mail message was sent at 13:11 (Chicago time).

I wish I had had Sal try to TRIGGER the node. I didn't try that, but I believe that the node would have TRIGGERed OK. I don't know whether TRIGGERing the node would have solved the problem, though. If this happens again, we should try TRIGGERing the node before power-cycling.

Sal claimed that there were instructions for booting the TCC which were incorrect (but he didn't say where they were, i.e. were they taped to the rack or were they in some log book?). He asked where the RESTART button on the uVAX 4000 was. I did not tell him, because the RESTART button is not clearly marked and it is mixed in with some DIP switches, etc. which should not be touched.

Added by Philippe 12-JAN-1994:
- Inspect the logfile TRICS_02JAN94.LOG;1, which includes the pQBA problem. No pQBA device error was logged at the initial problem time. But the end of the logfile (12:51) shows what was probably a manual power-cycling of the BA23. The logfile shows that the pQBA device was then woken up and that TRICS reset it. There is also a successful write afterwards, before TCC was rebooted. Whatever happened was significant enough to screw up the pQBA, but not enough to set its power-problem error flag, which is checked by the ZRL Interrupt Service Routine.
  The actual power cycling did, and TRICS had a chance to reinitialize the pQBA and the DRV11J(s).
..............................................................................
Date: 2-JAN-1994   At: MSU
Topics: Trouble at Fermi, Required a Power Cycle ReBoot of D0HTCC to fix it.

They called at about 4:30 AM. I ended up having them climb up the ladder and power-cycle boot TCC. TrgMon could not connect to TCC, and COOR could not connect to TCC. I'm 99% certain that EDEBug could connect to TCC, although I did not do this myself. Triggering TCC from NCP did nothing, i.e. NCP triggering did NOT cause TCC to boot. I only had them power cycle the 4k box; the BA23 was not power cycled. The last mail message from TCC was an Init at 4:09. The power-cycle boot was at about 5:10 Sunday morning, Jan 2nd.

Mary Ann Cummings was the DAQ expert on shift. I'm not sure what happened, but the story is something like this: the L2 Graphics program was running at the same time that she switched the L2 nodes over to collider mode. Booting the L2 nodes while this new L2 Graphics program is running is known to cause EtherNet problems. I expect we should talk to Jan to understand the details of this L2 Graphics vs. changeover problem. Perhaps, if there is time this week, we could crash the system this way on purpose so that Philippe can see what happens inside TCC.

As listed in the entry in this log book from Dec 20th, we need to make new written instructions about how to boot the TCC. Actually, I do not think that we have ever had written instructions for booting TCC. New instructions could include: NCP boot vs. power-cycle boot, and the fact that TCC boots from its disk even though one normally causes it to boot via a network command.
..............................................................................
Log book for 1993 is in D0_HALL_LOGBOOK.LBK_1993
..............................................................................