From:	JIML::LINNEMANN    "Jim Linnemann at MSU (517)355-3328"  8-AUG-1997 17:40:10.74
To:	EDMUNDS
CC:	
Subj:	revised version of requirements

Here are the pieces of functionality we would like from Magic Bus and/or Fred:

A. Event Input data:
    flows at 10 kHz
    can be substantial volume: 1-20 KB/event, giving 200 MB/s worst case
    buffering for 16 events on input
    up to 12-14 sources of event-type data
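
As a check on the worst-case figure above, here is a minimal sketch of the
arithmetic in Python.  The names are illustrative only, and 1 KB is taken as
1000 bytes (an assumption; the figure is approximate either way).

```python
# Worst-case input bandwidth from the figures above.
# Assumption: 1 KB = 1000 bytes, 1 MB = 1e6 bytes.
EVENT_RATE_HZ = 10_000      # 10 kHz event rate
EVENT_SIZE_KB = (1, 20)     # 1-20 KB per event

def bandwidth_mb_per_s(rate_hz, size_kb):
    """Return the data rate in MB/s for a given event rate and event size."""
    return rate_hz * size_kb * 1000 / 1e6

best = bandwidth_mb_per_s(EVENT_RATE_HZ, EVENT_SIZE_KB[0])
worst = bandwidth_mb_per_s(EVENT_RATE_HZ, EVENT_SIZE_KB[1])
print(best, worst)   # 10.0 200.0
```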

B. Output Event data:
    [only for calorimeter crate]
    flows at 10 kHz on same MBus heavily loaded for input
    data volume smaller: less than 1 KB/event typical, perhaps a factor of 5 less
    Comes split among 2-3 sources, each a separate alpha card
    [target is in another crate, so each source need not be event-synchronous
    with other source, nor strictly interleaved with reading of event, i.e.
    sequence can be IOIIO, not restricted to IOIOIO]

    [planning to write output data to L3 for Global and Cal preprocessor on VME]

C. Communication data:
    The alpha processors need to communicate [across MBus] several times per
    event.  Because of long VME latency with the D0 VBD readout controller, 
    this looks impractical to do in VME, at least in the Global crate.  The
    messages are 5-20 B long, and need latencies of order 1 usec.  

D. Fast Monitoring
    The alpha cards would like to be able to set some lines, and have them
    respond within not too many instruction cycles.  These lines would carry
    the state code for the internal state of the node and the number of
    buffers currently occupied on the card.  This is a minimum of 8 bits
    (4+4), which could be received,
    decoded, and scaled (at 132 ns) in the trigger framework.  For flexibility
    we suggest the ISA (Fred) port have 32 bits, with at least of these bits
    bidirectional.  These same lines would be very handy for Logic Analyzer
    debugging.
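
To illustrate the 4+4 bit minimum, here is a rough sketch in Python of packing
a state code and a buffer count into the low byte of a 32-bit port word.  The
bit positions (state in bits 7-4, buffer count in bits 3-0) and the function
names are assumptions for illustration, not a proposed pin assignment.

```python
# Hypothetical layout: bits 7-4 = state code, bits 3-0 = buffers occupied.
def pack_monitor(state_code, buffers_occupied):
    """Pack two 4-bit fields into the low byte of a 32-bit word."""
    if not 0 <= state_code <= 0xF:
        raise ValueError("state code must fit in 4 bits")
    if not 0 <= buffers_occupied <= 0xF:
        raise ValueError("buffer count must fit in 4 bits")
    return (state_code << 4) | buffers_occupied

def unpack_monitor(word):
    """Recover (state_code, buffers_occupied) from the low byte."""
    return (word >> 4) & 0xF, word & 0xF

word = pack_monitor(0x3, 0x9)
print(hex(word))             # 0x39
print(unpack_monitor(word))  # (3, 9)
```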

How we might implement these on (our understanding of) MBUS/Fred.

    It would have been nice if:

- MB Sources provided 24b of address and PCI/MB Bridge (Bridge) passed along
lower 16 bits

- Alpha cards could source and sink with 24b of address at no worse than half of
full MB speed

- provided alpha cards can write to each other and other MB devices at
  reasonable speed, having 12 b of address space as broadcast can probably
  substitute adequately for the full flexibility

Below are some more detailed comments on what we would like MBus to do, under
the assumption that the Bridge sees only 8 (or, better, 12) bits of address
when reading, but can source with a larger address range.

1. We have 16 sources and 16 addresses associated with these sources.  This can
   be encoded as 1 Byte of address interpreted as Higher and Lower half-bytes.
   The source is responsible for providing to MBus its [full 16-bit] address
   information, which is used in the DMA engine to point data towards particular
   memory blocks.

    The location within these blocks is determined strictly by sequential
    address of data within transfer, assigned by the DMA controller counter.
    The DMA controller switches blocks when it notes a change in 16-bit address.
  * [When it switches, does the location count revert to 0, or is this the job
    of the CPU when resetting for "next event"?  Are there separate counters for
    each of the 256 addresses?]
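
The addressing scheme in item 1 can be sketched as follows, in Python.  This
is only a toy model of our understanding, not a description of the actual DMA
engine: the per-address counters, the CPU-side reset, and all names are
assumptions, and the counter-reset question above remains open.

```python
class DmaModel:
    """Toy model: one memory block and one write counter per 8-bit address."""

    def __init__(self):
        self.blocks = {}      # address byte -> list of data words
        self.current = None   # address of the block now being filled
        self.switches = 0     # number of block switches seen

    @staticmethod
    def split(address):
        """Split an address byte into (high, low) half-bytes."""
        return (address >> 4) & 0xF, address & 0xF

    def write(self, address, word):
        """Store a word; its location is the sequential count within the block."""
        if address != self.current:      # change of address -> switch block
            self.current = address
            self.switches += 1
        block = self.blocks.setdefault(address, [])
        block.append(word)
        return len(block) - 1            # location assigned to this word

    def next_event(self):
        """CPU-side reset for the next event (assumed, see question above)."""
        self.blocks.clear()
        self.current = None

dma = DmaModel()
locs = [dma.write(0x21, w) for w in (10, 11)] + [dma.write(0x35, 12)]
locs.append(dma.write(0x21, 13))   # returning to 0x21 continues at location 2
print(locs, dma.switches)          # [0, 1, 0, 2] 3
```

In this model the location count does not revert to 0 on a block switch; the
other answer to the question would reset the counter inside write().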


2. Event Input data:
    flows at 10 kHz
    can be substantial volume: 1-20 KB/event, giving 200 MB/s worst case
    buffering for 16 events on input
    up to 12-14 sources of event-type data

    All event-type data is seen by MBUS as write cycles into Alpha boards from
    MBus Transceiver (MBT) cards.

    We will probably have 2 MBT cards of 8 sources each in the crate.
        [how do we daisy-chain these to produce "end of event"?]
        [whose responsibility is it to produce "end of event"?  MBus
        termination?  MBT card (daisy-chained across front panels?  on MB lines?)]
    * Assume that any MB source slot can assert any address, and that physical
    slots, not addresses, are used for arbitration
    * If so, does one MBT card have to have all its sources finished before
    relinquishing bus mastership?

3. Output Event data:
    [only for calorimeter crate]
    flows at 10 kHz on same MBus heavily loaded for input
    data volume smaller: less than 1 KB/event typical, perhaps a factor of 5 less
    Comes split among 2-3 sources, each a separate alpha card
    [target is in another crate, so each source need not be event-synchronous
    with other source, nor strictly interleaved with reading of event, i.e.
    sequence can be IOIIO, not restricted to IOIOIO]
    Send out on MBus to particular target addresses on MBT cards
        * assumes Alpha card bridge can WRITE to MBUS, not just read from it
        * speed need not be as fast as input, but no more than 10X slower?
        * need standard MBus arbitration logic on Alpha card
        * assume that this is a message to a particular location on MBus, so
        that MBT card can be set up to be the only listener interested in
        message to these addresses.  Assume that the listener on the MBT card
        can be set up as either a FIFO (constant address) or a memory (Alpha
        source changes MB address).  Which of these, if either, can the Alpha
        card support?

4. Communication data:
    The alpha processors need to communicate [across MBus] several times per
    event.  Because of long VME latency with the D0 VBD readout controller, 
    this looks impractical to do in VME, at least in the Global crate.  The
    messages are 5-20 B long, and need latencies of order 1 usec.  

    Sources 14-15 are alpha-alpha communication.  Any of the CPUs can describe
    itself as either source.  There are 16 messages supported on this channel; a
    message is defined by which CPUs [if any] have their DMA engines set up to
    transfer that source to a memory buffer; if none do, it is ignored.  If we do
    need one address [here effectively 6-bit: 14/15 + 4b for "event"] per
    message, and must keep them distinct, that might force us to want 12b
    addresses instead of 8b.

    Source 15 messages should be able to get bus mastership between sources of
    an event, as the communication is not synched to an input event, and we do
    not wish to accept the latency of waiting for the end of an input event.
    * Assumes that MBT cards allow arbitration between their sources, not only
    after the full half-event.  But if so, what is to prevent a downstream card
    from seizing mastership if an Alpha card is not ready to do so?

5.  Here are the presently-known messages which pass in a simple
    worker-administrator crate for a global processor:

    a. Worker sends to Admin the buffer number of an event, the 128 bit answer
    for a 16-bit event number, and if it passed, a list of start addresses and
    wordcounts for data to be read out.  It then stalls until the reply
    message b:

    b. Admin replies by giving (perhaps with event number and buffer number) the
    buffer number to use when an event comes into the mapper slot occupied by
    the present buffer number.

    c. Admin sends the 128 bit answer to HW Framework by sending on MBus; this
    message goes to an output module on a MagicBusTransceiver (MBT) card.
        [is an acknowledge message required?] [switch select on which card this
        is active]
    d. [Via MBUS???]
        an interrupt must be generated for EACH alpha card in the crate when
        a) Event Finished is raised by the [last] MBT card
        b) the DMA engine has finished transfer of this event into Main Memory
        [ is an acknowledge from Workers to Admin required?  If so, one per
        worker?]
        [ if multiple workers:
        - there seems no mechanism to send to a DIFFERENT address inside the
        predefined block of memory, so again may need to be distinct messages
        - if acks are needed from two workers, would they be
        separated in time until Admin has time to digest the message?
        - instead, need to go to different memory blocks!!?]
    e. [Via MBUS??? Fred??]
        send GO to MBT card[s] for next input event
    f. [Cal Preprocessor Crate] Jet worker writes Jets to L2Global by sending to
    MBT [switch-enabled].  [ack required?]
    g. [Cal Preprocessor Crate] EM worker writes EM to Global.
    h. [Cal Preprocessor Crate] MET worker?
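
As an illustration of message a in the list above, here is a rough sketch in
Python of one possible byte layout.  Only the 16-bit event number and the
128-bit answer widths come from the text; the 1-byte buffer number, 1-byte
pass flag, big-endian ordering, 32-bit start addresses and 16-bit wordcounts
are assumptions, as are all names.

```python
import struct

def pack_worker_result(buffer_no, event_no, answer, passed, readout=()):
    """Pack message a: buffer number, event number, 128-bit answer, readout list.

    readout is a sequence of (start_address, wordcount) pairs; it is only
    appended if the event passed.
    """
    if len(answer) != 16:
        raise ValueError("answer must be 128 bits (16 bytes)")
    # Assumed header: 1 B buffer number, 2 B event number, 16 B answer, 1 B flag
    msg = struct.pack(">BH16sB", buffer_no, event_no, answer, int(passed))
    if passed:
        for start, count in readout:
            msg += struct.pack(">IH", start, count)   # assumed 4 B + 2 B per entry
    return msg

msg = pack_worker_result(3, 0x1234, bytes(16), True, [(0x1000, 64)])
print(len(msg))   # 26
```

Under these assumed widths the fixed part alone comes to 20 bytes, the upper
end of the 5-20 B range quoted in item 4, and any readout list adds to that.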