(Some of the) points addressed 25-JUN-1997
------------------------------------------

We should start using the term "forward assigning" instead of
"pre-assigning".

The "linear" version of the system ["linear" is our code name for the
first implementation that handles only one event at a time] will still
cycle through all Magic Bus Mapper Slots and all Main Memory Buffers.
It will use only 2-3 at a time (one buffer holding the current event,
one buffer pointed to for the next event, and one buffer being forward
cleared for the Main Memory to switch to).

The Magic Bus 8 bit DMA addresses are split into 4 bits for 16 events
and 4 bits for 16 sources, with one source reserved for Alpha
communication or other special cases. Reserving one slice of source ID
for Alpha communication does not interfere with having room to buffer
16 events at any time.

It is the Magic Bus Source Card that generates the 4 bit Magic Bus
event ID. This is done autonomously within the Source Card(s) without
intervention from the Administrator.

To solve the problem of driving 128(+1?) bits of electrical
information back to the L2FW, it was proposed at the last meeting to
add functionality to the Magic Bus Source Card and make it a target on
the Magic Bus. The MB Source Card would have an appropriate set of
registers that the Admin could write to, controlling a set of line
drivers that form a parallel line output (128 or 129 wide).

The Magic Bus Source Card is responsible for several (four or eight?)
input sources (L2 PPs or L1 CT racks). It will thus send, for each
event, data with (four) separate Magic Bus Source IDs. We know that,
in order to make the most efficient use of the PCI bus on the Alpha
nodes receiving the event pieces, it is preferable not to keep
switching Magic Bus Source ID (nor event ID, of course). The Magic Bus
Source Card should thus send all data from one sub-source in one
contiguous transfer and then switch to the next sub-source that it is
responsible for.
Now if we are going to use the Magic Bus to send information back to
the L2FW and to communicate between the Admin and Worker(s), we need
to verify that the Magic Bus is not "locked out" during event transfer
for long (10-50 us) periods of time. It thus seems necessary to
specify that each Magic Bus Source Card always re-arbitrates for
ownership of the Magic Bus between the successive sub-sources that it
is responsible for. This will bring the "locked-out" periods down to
the 1-5 us level.

We also need to ensure that the Alpha boards are given priority over
the MB Source Card during arbitration. It is currently believed that
this can be accomplished simply by controlling the order of the cards
on the Magic Bus, and/or by a custom modification of the backplane.

We had also day-dreamed that we could use the same parallel output
facility to make the Administrator generate electrical signals showing
which processing phase it is currently in. We could then clip on a
logic analyzer for instant capture of timing and sequencing
information during event processing, control activity, event readout,
etc.

We think we need (two?) Magic Bus Source Cards in the system to handle
2x4 independent input sources. We think we need one Magic Bus Source
Card to drive signals back to the L2FW (cf. above). This leaves the
parallel output of the second Magic Bus card unused, i.e. one set of
128 bits free to display this kind of state information. However, the
compromise of re-arbitration between sub-sources (cf. above) seems to
still leave us with a serious limitation at the 1-5 us level, while
we'd like our state/phase flags to have sub-microsecond accuracy.


What about multiple workers?
----------------------------
We have to remember the two separate cases: L2GLB and L2CTPP.

We could assign events to multiple workers in different ways:
 - workers working on "separate parts" of the same event
 - workers working on separate, successive events (odd/even)
 - workers working on separate events under management from the
   Administrator, on a first-done/first-(re)launched basis.

We are not intending to start off with multi-worker mode. But we want
to understand enough to make sure that our mono-worker mode is
compatible with such a future upgrade. This means both our mono-worker
"linear" mode and our mono-worker "fully buffered" mode.

Case of the same event split among multiple workers

That is how we would run the L2CTPP: one worker for electrons, one for
jets. For the linear or the full buffering scheme, this does not seem
to present buffer synchronization problems. The workers would
synchronize with the Administrator at the end of every event to report
the partial answer. The Administrator can control whether a Worker
that is done ahead of the other(s) may start on the next event. With
the Workers only working on buffers that have either already received
an event (as flagged by the ISR), or clearing buffers that by
definition cannot receive another event (forward assigning), we are
guaranteed there will be no clash.

Case of workers working on separate events

The Administrator can wait for answers from either worker, but the
Administrator must serialize these events and return the answers in
the order the events arrived. There doesn't seem to be a problem
either if we want to escalate and have the Administrator dynamically
allocate which event each node will work on next.

How do we transfer data from L2CTPP to L2GLB?

The L2CTPP Magic Bus backplane is more heavily used. That's about 3
kbyte/event of one crossing's worth of Trigger Tower information.
If the Trigger Tower pick-off resolution isn't sufficient to properly
separate successive 132 ns beam crossings, we might need to transfer
Trigger Tower information for more than one crossing and recover
single-crossing data by de-convoluting in L2CTPP. How many crossings
does L1CT possibly need to send to L2CTPP? 3, 4?


Notes from meeting of 9 July 1997 on L2 Monitoring
--------------------------------------------------

Slow monitoring
---------------

Physical path for slow monitoring info:
 - TCC has a Bit3 card connecting to a VME crate
 - TCC's VME crate has vertical interconnects to parts of the trigger
   framework
 - a Bit3 crate interconnect card for each Alpha crate
 - each Alpha crate has a Bit3 crate interconnect card with a
   mezzanine card of dual-port memory

When is slow monitoring information stored? The goal is monitoring
information refreshed every 5 sec or so, timestamped with the actual
time of acquisition. The options are:
 - TCC provokes somehow something in the trigger
 - setup provokes a special event
 - Global does it with a timer

Of these, the special event seemed most attractive.

For acquisition of L1 information, every 5 seconds the framework(s)
will be requested to grab scalers after the next L1-passed event. This
mechanism will be extended to set a trigger qualifier flag to be sent
to L2Global (and L2 preprocessors?). This results in a set of L2
scalers which are event-synchronized with the L1 scalers. This was
felt to be more attractive than trying to synchronize in absolute
time. The time delay might be of order 16 * 100 usec, i.e. of order 2
msec or less out of 5 seconds.

The Global worker and administrator are to grab monitoring information
after processing the event. Global will transfer the data to the
multiport memory when VME is free, another delay of order 1 msec. TCC
will come looking after perhaps 500 msec. This should not make the
event MFP--that would roughly double the L2 rate. We could choose to
send the last scalers with MFP events.

This mechanism is sufficient to capture scalers.
It could also capture, at a low rate, information on the number of
input and output buffers in use in the L2 processor. If, further, the
number of input buffers in use on the Magic Bus Source Card is
desired, this number would have to be available as a register readable
by the Alpha.

It does not provide information about time in processing state,
fractional utilization of busses, etc. Mean processing times and time
in state can be obtained by the fast monitoring described below.
However, if a distribution of processing times is required, a
technique such as keeping a buffer of clock times at the end of
processing of N events would be needed; the buffer could be copied and
emptied at the time of acquisition of the slow monitoring data.

Time in State and Busy Fractions
--------------------------------

This kind of information can be obtained in one of two ways:

- hardware: present a signal to be used as a scaler gate. The scalers
  can be read out with slow monitoring. Knowing the scaler clock rate,
  one can translate ratios of scalers to absolute times. This can
  apply to any state which lasts long compared to the scaler clock
  period, every 132 ns.

- software (especially for states): have a routine which records a
  timestamp upon entering a new state, and keeps an array of
  start/stop times for each state; the array can be shipped with slow
  monitoring for analysis into mean times and distributions of times.
  This can apply to any state which lasts long compared to the tick of
  the system clock. The Alpha system clock may, depending on the
  particular model, tick every 1 to 16 instructions. This would be in
  the range 2 to 32 ns for a 500 MHz Alpha, so it gives good
  resolution for timing of internal states, or those producing
  interrupts, at the cost of some overhead in instructions. The clock
  probably wraps at 32 bits, which is 8-128 seconds, depending on the
  tick multiplier.

The software mechanism is less appropriate for things such as Magic
Bus busy measurement.
Fast monitoring
---------------

Fast monitoring information is intended to give detailed or
time-resolved information such as present state, time in state, and
instantaneous number of buffers used.

This information could be presented in two forms:
 - on pins available for a logic analyzer (e.g. unused pins on the J2
   VME connector)
 - as lines travelling back to the L1 framework to be used as gates
   for scalers

The latter lines for an Alpha card might appear as:
 - Fred port lines
 - Parallel Printer Port lines plugged into the ISA bus

For state monitoring, it would make sense to have separate lines for
each Alpha node, since each has its own processing states.

How many such lines would be of interest?
-----------------------------------------

16 bits  Input buffer occupancy
         The L1 framework could also show this for the state of the
         Front End buffers; worth asking Marvin whether this can be
         made part of the VRB spec. The Magic Bus Source Card might
         also have such occupancy hardware lines [less important].

 8 bits  Output buffer occupancy

Occupancy means:
    bit 0    1 or more buffers occupied
    bit 1    2 or more buffers occupied
    . . .
    bit 15   16 buffers occupied

If bits are tight, it could make sense to decrease resolution, e.g.
    bit 0    2 or more
    bit 1    4 or more
    bit 2    14 or more
    bit 3    16 or more

This is probably more useful than the 4 undecoded counter bits, at
least for scaling.

Count of bits
-------------
[decoded strongly preferred]

    coded  decoded
      4     4-16    input buffer usage
      3     4-8     output buffer usage
      4     4-16    internal processing states
                    (if timing is internal, undecoded is ok)
      1     1       Magic Bus busy
    -----  -------
            9-41    for each of the alphas