COOR-TCC Protocol for Run II
============================
Guidelines
==========
Updated: 7/14/97

Short history of COOR-TCC protocol
----------------------------------
Prototype version of COOR-TCC protocol created for 1987 test beam
    Syntax often awkward and inconsistent
    ASCII, but often un-parsable by a human
    No consistency of format (mixed binary, decimal, hex)
COOR-TCC protocol reworked before commissioning at D0 Hall
    Keep some features
        ELNCON, Message ID field, fixed keyword command field
    Change some features
        Add more messages
        More intuitive
        Unify syntax
    TCC was a uVAX II
        Syntax was kept simple to match its limited processing power
    Produce written specification/documentation
Several upgrades during Run I to match hardware upgrades
    Add L1 Cal. Trig.
    Add Begin/End Run scaler snapshots
    Add L1.5 Framework
    Add L1.5 Cal. Trig.

Perceived Advantages of Run I Protocol (things to preserve)
-----------------------------------------------------------
TCC sends one acknowledgement message per COOR command message
    TCC acknowledgement always instantaneous
        Execution takes much less time than transfer time (0.1 s)
        Except for Begin/End Run File (from 2 s to 2 min depending on
        the host file server)
            Solution: TCC launches a task and COOR synchronizes later
        Initialization: simply takes long (1 min in Run I with no errors)
        Very few L1CT messages took about 1 second (e.g. Missing Pt
        threshold)
All commands are human-readable ASCII
    Keywords for message type
    Some other keyword fields and values (in L1CT)
    With a common/uniform format
    Few simple syntax rules
    All numbers are decimal strings (with some keywords like POS_ETA)
Same command interface used on the L1 Simulator
Clean interface between COOR and trigger hardware
    Easy to compose/send test messages to TCC online or to the simulator
    Easy to check during commissioning
    Easy to check current/archived messages and understand the
    configuration

Perceived Disadvantages of Run I Protocol (things to improve)
-------------------------------------------------------------
Lots of individual messages
    Would scale up to 1000 messages in Run II for 128 SpTrg to download.
    Run I syntax would allow programming several SpTrg in one message,
    but only for one type of resource (e.g. program several prescale
    ratios).  This wasn't used except for SpTrg enable/disable messages,
    probably because it was orthogonal to the way COOR organized its
    actions (per SpTrg as opposed to per property).
Message roundtrip latency too slow to support lots of individual messages
    0.1-0.2 s per message + acknowledge
    Latency is in the network layers and serialization, NOT in processing
Some messages were somewhat useless or redundant
    Some functionality was never exercised (e.g. independently
    specifying the list of Geographic Sections to digitize and the list
    of Front-End Busys to listen to).
    Some programming never departed from the standard default
    configuration (e.g. SpTrg always told to obey L2 disable).
Action verbs limited to fixed location and fixed length
    This might have been appropriate for the limited power of the
    uVAX II, but is not necessary today
    Acknowledge messages were also fixed-field/fixed-length,
    and were content-poor in terms of diagnostic/debug information
Poor feedback from COOR back to the user or to DAQEXP
    Error status not displayed and/or not associated with a particular
    message (e.g. show the difference between BAD PARAM and BAD ERROR)
    Supplemental error information lost (e.g. Begin/End Run file errors)
    Some diagnostic information sent to COOR's log files, but not all
Sub-system separation not very apparent in messages
    L1 FW / L1.5 FW / L1 Cal / L1.5 Cal
Initialize message took long and COOR would not wait
    Especially because (I think) the TRIGGER INIT used a side door to
    COOR and didn't seem (I think) well synchronized with TAKER requests.
    We had users and novice DAQEXPs confused several times because of
    that.  This also occasionally caused a loss of synchronization where
    COOR would parse the acknowledgement for message n-1 instead of n.
    The point is that COOR must wait for completion of initialization.
    There is no way around this, and no point in COOR getting any
    further if TCC isn't done initializing.
COOR would always lose the first message after losing the connection to
TCC (e.g. after a TCC reboot)
    COOR needs to resend messages after re-connecting.
    This was addressed but never made to actually work.
Worth mentioning, but not a real problem, just an overall choice:
    Message information did not include
        specific trigger logical names
        local/global run numbers for SpTrg
        And-Or Term logical names
    This knowledge transfer was not needed for triggering or programming.
    Some people complained that TRGMON did not report that kind of info.
    Note that this information could have been obtained by TRGMON from
    COOR and/or COOR input/output files.  TRGMON was/is low-level
    hardware monitoring... but is also used by shifters and detector
    users.

Guidelines for Run II
---------------------
Keep simple ASCII format, improve keywords and formatting
Keep one acknowledge per message, or per message group
Improve error information in the acknowledgement
    And improve the path back to TAKER and/or DAQEXP, with a view of the
    offending message and its acknowledge status
Condense the number of messages
    Group the messages: e.g. one message per SpTrg
        Allow setting several/all properties of a SpTrg in one
        transaction
    Still need a separate message to enable/disable one or a set of SpTrg
    Still need a separate message to change prescale ratios between runs
Condense the length of messages by using ranges in the syntax
(e.g. a range of geographic sections 0:127)
    Carries the same information as an exhaustive list
    Actually emphasizes contiguous groups and highlights the holes
    More compact
    This was used with great success in Run I L1 CT Reference Sets.
Minimize the number of SpTrg features to set up, but without losing
functionality
    Still keep the widest functionality/flexibility accessible at COOR
    level
    Implement all message types necessary to access all bells and
    whistles, but suppress any messages that match standard/normal
    default settings.
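The range syntax discussed above can be illustrated with a small
sketch.  This is not the actual COOR-TCC grammar; the comma-separated
`lo:hi` form and the function names below are assumptions chosen only
to show how a range list carries the same information as an exhaustive
list while making contiguous groups and holes stand out.

```python
def expand_ranges(spec):
    """Expand a hypothetical range list like '0:3,9,12:14' into the
    explicit list of integers it denotes (ranges are inclusive)."""
    out = []
    for part in spec.split(","):
        if ":" in part:
            lo, hi = part.split(":")
            out.extend(range(int(lo), int(hi) + 1))
        else:
            out.append(int(part))
    return out

def compress_ranges(values):
    """Inverse operation: emit contiguous runs as 'lo:hi' so that holes
    in a sorted list stand out (e.g. [0,1,2,3,9] -> '0:3,9')."""
    runs = []
    lo = hi = values[0]
    for v in values[1:]:
        if v == hi + 1:
            hi = v                    # extend the current contiguous run
        else:
            runs.append((lo, hi))     # close the run; a "hole" follows
            lo = hi = v
    runs.append((lo, hi))
    return ",".join(f"{a}" if a == b else f"{a}:{b}" for a, b in runs)
```

For example, `expand_ranges("0:127")` yields all 128 geographic section
numbers, while `compress_ranges` applied to a list with one missing
section immediately exposes the hole.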
Split "INIT" in 2 pieces (because of the Run II use of FPGA technology)
    Download all FPGA configurations (only after power-up, slow = minutes)
    Reset to default programming (clean restart, faster = seconds)
Trigger Exposure Groups
    For luminosity accounting (avoiding scalers for 128 SpTrg * 160
    bunches)
    There will be up to 8 Trigger Exposure Groups (Run I had "sort of"
    1-3)
    Split off some And-Or Terms used as Beam Quality (e.g. SCINT_VETO)
    Limit the number of distinct Beam Quality And-Or Term groupings
    Limit the number of distinct subsets of geographic sections for
    readout
How do we handle the "heartbeat trigger"?
    The idea is to keep the whole D0DAQ cycling at ~0.2 Hz between runs.
    TCC needs a SpTrg reserved and programmed with the auto-disable
    feature.
    TCC watches for a 5 s timeout on normal event flow and forces an
    event.
    TCC must know which SpTrg it is so that it can hit the right one.
    Should it be 100% programmed by COOR or part of TCC initialization?
    Other parts of the DAQ system must also either be programmed to
    answer this heartbeat, or know what to do by default.
    How much enable/disable control is left to COOR and/or Level 3?

Concepts NEW for Run II
-----------------------
1) DZero-wide Heartbeat Trigger
    In Run I we had an internal heartbeat trigger, without stimulus to
    geographic sections and without readout to L3.
2) Trigger Exposure Groups
    In Run I we had, in essence, all Specific Triggers in one group
    (all SpTrg read out the same Geographic Sections).  There were some
    variants (e.g. the L0 single interaction flag) that were also
    monitored and could be counted as equivalent to a total of 1-3
    groups.
3) Level 2 Trigger Framework
    In Run I we had the L1.5 Trigger Framework, with the important
    distinction that only a subset of the events were sent to the L1.5
    Trigger, and only a subset of the Level 1 SpTrg could be enabled to
    send their events to the L1.5 Trigger.  In Run II *all* events will
    always go through the Level 2 System.
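The heartbeat logic described above (TCC watching for a 5 s gap in
normal event flow and forcing an event on the reserved SpTrg) can be
sketched as follows.  This is only an illustration of the timeout
behavior: the class name, the `fire` callback, and the use of a
pollable watchdog are assumptions, not the actual TCC design.

```python
import time

HEARTBEAT_TIMEOUT = 5.0   # seconds without normal event flow (from the text)

class HeartbeatWatchdog:
    """Illustrative sketch: if no normal event has flowed for 5 s,
    force one event on the reserved heartbeat SpTrg."""

    def __init__(self, reserved_sptrg, fire, clock=time.monotonic):
        self.reserved_sptrg = reserved_sptrg  # SpTrg TCC must "hit" (hypothetical)
        self.fire = fire                      # callback that forces one event
        self.clock = clock                    # injectable clock, for testing
        self.last_event = clock()

    def on_normal_event(self):
        # Any normal event resets the timeout window.
        self.last_event = self.clock()

    def poll(self):
        # Called periodically by TCC's main loop; forces a heartbeat
        # event when the timeout expires, then rearms the window.
        now = self.clock()
        if now - self.last_event >= HEARTBEAT_TIMEOUT:
            self.fire(self.reserved_sptrg)
            self.last_event = now
```

With nothing else running this forces roughly one event every 5 s,
i.e. the ~0.2 Hz cycling rate mentioned above.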
4) Level 2 Trigger PreProcessor (L2CTPP)
    In Run I we had the L1.5 CT with similar functionality.  All
    messages were actually parsed and processed by TCC.  A memory block
    with a binary data structure was prepared and shipped to the L1.5
    hardware.
5) Level 2 Trigger Global Processor
    In Run I the L1.5 CT partially filled this functionality.
The Run II method of programming the L2 CTPP and the L2 Global is still
being worked on, but it seems there will still be ASCII messages and a
third-party translator program (on the host or TCC) to fill a binary
data structure that can be read by the L2 CTPP or L2 Global processors.
These two systems (L2 CTPP and L2 Global) are not *directly* discussed
here, but we should at least *try* to make the L1, L2 and L3
programming interfaces *similar*, or compatible, if not truly
identical.
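The translator idea above (ASCII messages in, binary data structure
out) can be sketched minimally.  The field names, the "KEY value" line
format, and the packed record layout below are all invented for
illustration; the real L2 CTPP / L2 Global structures were still being
designed when this was written.

```python
import struct

# Hypothetical packed record: little-endian SpTrg id (u16),
# threshold (u16), prescale (u32).  Layout is illustrative only.
RECORD_FORMAT = "<HHI"

def translate(ascii_msg):
    """Turn a flat 'KEY value' ASCII message into one packed binary
    record, in the spirit of the translator program described above.
    Numbers are decimal strings, as in the Run I protocol."""
    fields = {}
    for line in ascii_msg.strip().splitlines():
        key, value = line.split()
        fields[key.upper()] = int(value)
    return struct.pack(RECORD_FORMAT,
                       fields["SPTRG"],
                       fields["THRESHOLD"],
                       fields["PRESCALE"])
```

The point of the sketch is the separation of concerns: the ASCII side
stays human-readable and checkable, while only the translator knows
the binary layout the processor hardware consumes.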