(Some of the) points addressed 25-JUN-1997
------------------------------------------

We should start using the term "forward assigning" instead of
"pre-assigning".

The "linear" version of the system ["linear" is our code name for the
first implementation that handles only one event at a time] will still
cycle through all Magic Bus Mapper Slots and all Main Memory Buffers.
It will use only 2-3 at a time (one buffer holding the current event,
one buffer pointed to for the next event, and one buffer being forward
cleared for the Main Memory to switch to).

The Magic Bus 8 bit DMA addresses are split into 4 bits for 16 events
and 4 bits for 16 sources, with one source reserved for Alpha
communication or other special cases. Reserving one slice of source ID
for Alpha communication does not interfere with having room to buffer
16 events at any time.

It is the Magic Bus Source Card that generates the 4 bit Magic Bus
event ID. This is done autonomously within the Source Card(s) without
intervention from the Administrator.

To solve the problem of driving 128(+1?) bits of electrical
information back to the L2FW, it was proposed at the last meeting to
add functionality to the Magic Bus Source Card and make it a target on
the Magic Bus. The MB Source Card would have an appropriate set of
registers that the Admin could write to, controlling a set of line
drivers that form a parallel line output (128 or 129 wide).

The Magic Bus Source Card is responsible for several (four or eight?)
input sources (L2 PPs or L1 CT racks). It will thus send, for each
event, data with (four) separate Magic Bus Source IDs. We know that,
in order to make the most efficient use of the PCI bus on the Alpha
nodes receiving the event pieces, it is preferable not to keep
switching Magic Bus Source ID (nor event ID, of course). The Magic Bus
Source Card should thus send all data from one sub-source in one
contiguous transfer and then switch to the next sub-source that it is
responsible for.
Now if we are going to use the Magic Bus to send information back to
the L2FW and to communicate between the Admin and Worker(s), we need
to verify that the Magic Bus is not "locked out" during event transfer
for long (10-50 us) periods of time. It thus seems necessary to
specify that each Magic Bus Source Card always re-arbitrates for
ownership of the Magic Bus between the successive sub-sources that it
is responsible for. This will bring the "locked-out" periods down to
the 1-5 us level.

We also need to ensure that the Alpha boards are given priority over
the MB Source Card during arbitration. It is currently believed that
this can be accomplished simply by controlling the order of the cards
on the Magic Bus, and/or by a custom modification of the backplane.

We had also day-dreamed that we could use the same parallel output
facility to make the Administrator generate electrical signals showing
which processing phase it is currently in. We could then clip on a
logic analyzer for instant capture of timing and sequencing
information during event processing, control activity, event readout,
etc.

We think we need (two?) Magic Bus Source Cards in the system to handle
2x4 independent input sources. We think we need one Magic Bus Source
Card to drive signals back to the L2FW (cf. above). This leaves the
parallel output of the second Magic Bus card unused, i.e. one set of
128 bits free to display this kind of state information. However, the
compromise of re-arbitration between sub-sources (cf. above) seems to
still leave us with a serious limitation at the 1-5 us level, while
we'd like our state/phase flags to have sub-microsecond accuracy.


What about multiple workers?
----------------------------
We have to remember the two separate cases: L2GLB and L2CTPP.

We could assign events to multiple workers in different ways:
 - workers working on "separate parts" of the same event
 - workers working on separate, successive events (odd/even)
 - workers working on separate events under management from the
   Administrator, on a first-done/first-(re)launched basis.

We are not intending to start off with multi-worker mode. But we want
to understand enough to make sure that our mono-worker mode is
compatible with such a future upgrade. This means both our mono-worker
"linear" mode and our mono-worker "fully buffered" mode.

Case of the same event split among multiple workers

That is how we would run the L2CTPP: one worker for electrons, one for
jets. For the linear or the full buffering scheme, this does not seem
to present buffer synchronization problems. The workers would
synchronize with the Administrator at the end of every event to report
the partial answer. The Administrator can control whether a Worker
that is done ahead of the other(s) may start on the next event. With
the Workers only working on buffers that have either already received
an event (as flagged by the ISR), or clearing buffers that by
definition cannot receive another event (forward assigning), we are
guaranteed there will be no clash.

Case of workers working on separate events

The Administrator can wait for answers from either worker, but the
Administrator must serialize these events and return the answers in
the order the events arrived. There doesn't seem to be a problem
either if we want to escalate and have the Administrator dynamically
allocate which event each node will work on next.

How do we transfer data from L2CTPP to L2GLB?

The L2CTPP Magic Bus backplane is more heavily used. That's about 3
kbyte/event of one crossing's worth of Trigger Tower information.
If the Trigger Tower pick-off resolution isn't sufficient to properly
separate successive 132 ns beam crossings, we might need to transfer
Trigger Tower information for more than one crossing and recover
single-crossing data by de-convoluting in L2CTPP. How many crossings
does L1CT possibly need to send to L2CTPP? 3, 4?


Notes from meeting of 9 July 1997 on L2 Monitoring
--------------------------------------------------

Slow monitoring
---------------

Physical path for slow monitoring info:
 - TCC has a Bit3 card connecting to a VME crate
 - TCC's VME crate has vertical interconnects to parts of the trigger
   framework
 - a Bit3 crate interconnect card for each Alpha crate
 - each Alpha crate has a Bit3 crate interconnect card with a
   mezzanine card of dual-port memory

When is slow monitoring information stored? The goal is monitoring
information refreshed every 5 sec or so, timestamped with the actual
time of acquisition. The options are:
 - TCC provokes somehow something in the trigger
 - setup provokes a special event
 - Global does it with a timer

Of these, the special event seemed most attractive.

For acquisition of L1 information, every 5 seconds the framework(s)
will be requested to grab scalers after the next L1-passed event. This
mechanism will be extended to set a trigger qualifier flag to be sent
to L2Global (and L2 preprocessors?). This results in a set of L2
scalers which are event-synchronized with the L1 scalers. This was
felt to be more attractive than trying to synchronize in absolute
time. The time delay might be of order 16 * 100 usec, i.e. of order 2
msec or less out of 5 seconds.

The Global worker and administrator are to grab monitoring information
after processing the event. Global will transfer the data to the
multiport memory when VME is free, another delay of order 1 msec. TCC
will come looking after perhaps 500 msec. This should not make the
event MFP--that would roughly double the L2 rate. We could choose to
send the last scalers with MFP events.

This mechanism is sufficient to capture scalers.
It could also capture, at a low rate, information on the number of
input and output buffers in use in the L2 processor. If, further, the
number of input buffers in use on the Magic Bus Source Card is
desired, this number would have to be available as a register readable
by the Alpha.

It does not provide information about time in processing state,
fractional utilization of busses, etc. Mean processing times and time
in state can be obtained by the fast monitoring described below.
However, if a distribution of processing times is required, a
technique such as keeping a buffer of clock times at the end of
processing of N events would be needed; the buffer could be copied and
emptied at the time of acquisition of the slow monitoring data.

Time in State and Busy Fractions
--------------------------------

This kind of information can be obtained in one of two ways:

- hardware: present a signal to be used as a scaler gate. The scalers
  can be read out with slow monitoring. Knowing the scaler clock rate,
  one can translate ratios of scalers to absolute times. This can
  apply to any state which lasts long compared to the scaler clock
  period, every 132 ns.

- software (especially for states): have a routine which records a
  timestamp upon entering a new state, and keeps an array of
  start/stop times for each state; the array can be shipped with slow
  monitoring for analysis into mean times and distributions of times.
  This can apply to any state which lasts long compared to the tick of
  the system clock. The Alpha system clock may, depending on the
  particular model, tick every 1 to 16 instructions. This would be in
  the range 2 to 32 ns for a 500 MHz Alpha, so it gives good
  resolution for timing of internal states, or those producing
  interrupts, at the cost of some overhead in instructions. The clock
  probably wraps at 32 bits, which is 8-128 seconds, depending on the
  tick multiplier.

The software mechanism is less appropriate for things such as Magic
Bus busy measurement.
Fast monitoring
---------------

Fast monitoring information is intended to give detailed or
time-resolved information such as present state, time in state, and
instantaneous number of buffers used.

This information could be presented in two forms:
 - on pins available for a logic analyzer (e.g. unused pins on the J2
   VME connector)
 - as lines travelling back to the L1 framework to be used as gates
   for scalers

The latter lines for an Alpha card might appear as:
 - Fred port lines
 - Parallel Printer Port lines plugged into the ISA bus

For state monitoring, it would make sense to have separate lines for
each Alpha node, since each has its own processing states.

How many such lines would be of interest?
-----------------------------------------

16 bits  Input buffer occupancy
         The L1 framework could also show this for the state of the
         Front End buffers; worth asking Marvin whether this can be
         made part of the VRB spec. The Magic Bus Source Card might
         also have such occupancy hardware lines [less important].

 8 bits  Output buffer occupancy

Occupancy means:
    bit 0    1 or more buffers occupied
    bit 1    2 or more buffers occupied
    . . .
    bit 15   16 buffers occupied

If bits are tight, it could make sense to decrease resolution, e.g.
    bit 0    2 or more
    bit 1    4 or more
    bit 2    14 or more
    bit 3    16 or more

This is probably more useful than the 4 undecoded counter bits, at
least for scaling.

Count of bits
-------------
[decoded strongly preferred]

    coded  decoded
      4     4-16    input buffer usage
      3     4-8     output buffer usage
      4     4-16    internal processing states
                    (if timing is internal, undecoded is ok)
      1     1       Magic Bus busy
    -----  -------
            9-41    for each of the alphas