FPGA Loading Straw Man
----------------------

Original: 11-APR-1995        Latest: 2-MAY-1995

Introduction
------------

The purpose of this document is to propose a method for loading the FPGAs
on "the" card for the Run II Frameworks.  The general idea is to implement
the simplest loading method which provides all of the required
functionality.  The following methods have been considered:

   1. Using one-time programmable FPGAs.

   2. Using reprogrammable FPGAs with a ROM in a ZIF socket on the card.

   3. Using reprogrammable FPGAs which are downloaded directly from TCC.

   4. Using reprogrammable FPGAs and having some non-volatile memory
      (e.g. flash memory) on the card.  The on-card non-volatile memory
      could be downloaded via TCC.

Each method considered
----------------------

1. Using one-time programmable FPGAs

With this method, the trigger logic is hardcoded in the FPGAs.  This is
very similar to the Run I Trigger Frameworks.  This method has some
possible good points.  Downloading is not required.  The final trigger
logic is not going to change much (we think), so why make it downloadable?
Also, there is no worry about random cosmic rays changing the trigger
logic (but SRAMs are very stable now, so this is not really a concern).

However, the disadvantages appear to outweigh the advantages.  Giving up
reprogrammability seems to be too high a price to pay.  Some of the
advantages of reprogrammability are:

   - we can solder the FPGAs to the card (we would not want to solder
     one-time programmable FPGAs to the card).  This will make routing
     easier (no sockets).

   - we can change the logic easily if need be.

   - stocking spares will be easier.

Thus, this method is ruled out.

2. Using reprogrammable FPGAs with ROMs in ZIF sockets

With this method, we get the advantages of reprogrammable FPGAs, but still
avoid having to perform a download via TCC.  This sounds pretty good at
first, but having to remove the cards to change the FPGA configuration is
again too big a price to pay.  We know that these cards will have lots of
cables attached to the front, which makes card removal difficult.  Also,
we would probably want some TCC monitoring/control of the download, so we
need some path between TCC and the FPGAs anyway.

Thus, this method is ruled out.

3. Using reprogrammable FPGAs which are downloaded directly from TCC

With this method, we can download the FPGAs from TCC.  The penalty for
changing the FPGA configuration is small (we don't need to remove a card).
The possible disadvantage is the time it would take to load the FPGAs.

Using the Xilinx XC4025 (which has 422 kbits = 53 kbytes of program data)
as an example, and scaling from the speed with which we currently transfer
data via the pVBA + Vertical Interconnect (5 us per I/O, or 8 us including
computations or data preparation -> 125-200 kbytes/second), it would take
about 250-400 msec to load one XC4025 8 bits at a time.  Thus it would
take about 4-6.5 seconds to load the 16 FPGAs on one card, and about
400-650 seconds, or 7-11 minutes, to load the estimated 100 cards in the
two Frameworks.  This is at the upper limit of (or beyond) what would be
acceptable.

Several options could speed this up.  Using 16-bit transfers would save a
factor of 2.  Another approach would be to load several FPGAs in parallel:
since most cards should only need 1-3 different FPGA configurations, it
would save time if we could program all the FPGAs of each kind at the same
time.
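To make the arithmetic behind these estimates explicit, here is a small
back-of-the-envelope sketch in C.  The inputs are simply the figures
quoted above (422 kbits per XC4025, 5-8 us per 8-bit transfer, 16 FPGAs
per card, an estimated 100 cards); nothing is measured, and the output
reproduces the quoted ranges to within rounding.

/* Back-of-the-envelope FPGA download-time estimate, using the figures
 * quoted in the text.  All inputs are estimates, not measurements.     */
#include <stdio.h>

int main(void)
{
    const double config_bytes   = 422e3 / 8.0;   /* XC4025: 422 kbits ~ 53 kbytes       */
    const double fpgas_per_card = 16.0;
    const double cards_total    = 100.0;         /* estimated total for both Frameworks */
    const double us_per_io[]    = { 5.0, 8.0 };  /* pVBA + Vertical Interconnect, per 8-bit I/O */

    for (int i = 0; i < 2; i++) {
        double t_fpga = config_bytes * us_per_io[i] * 1e-6;  /* seconds per FPGA      */
        double t_card = t_fpga * fpgas_per_card;             /* seconds per card      */
        double t_all  = t_card * cards_total;                /* seconds, whole system */
        printf("%3.0f us/IO: %.2f s per FPGA, %5.2f s per card, %6.0f s (%4.1f min) total\n",
               us_per_io[i], t_fpga, t_card, t_all, t_all / 60.0);
    }
    return 0;
}

Programming all FPGAs of one kind on a card simultaneously would cut the
per-card downloads from 16 to the number of distinct configurations (1-3
on most cards), which is essentially where the factor of 4-16 below comes
from.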
Saving a factor of 4-16 on enough cards, and bringing the FPGA logic
download time down into the 2-5 minute range, would be satisfactory.

As an aside, the fastest we could ever load the FPGAs is set by the VME
backplane speed.  The maximum throughput of our VME backplane will be
16 bits per 200 ns.  At this rate we could load an FPGA in about 5.3 msec,
load a whole card in 84 msec, and load the whole system in about 8.4
seconds.  Note that Xilinx FPGAs cannot directly accept 16-bit program
data.

If the trigger programming (e.g. And-Or Terms vs. Specific Triggers) were
not embedded in the FPGA configuration, then we would only need to
download the FPGAs after a power cycle (once every n months).  This is
similar to the L1 and L1.5 68K code.  If the trigger programming were
embedded in the FPGA configuration, then every initialize would take about
5 minutes.  Can we afford 5-minute initializes?  A Run I initialize takes
about 1 minute.  What about things like prescales, which are changed
during the run?  Not all of the FPGAs would need to be downloaded, only
those on the 4 Trigger Decision Modules, so changing prescales would cost
about 13 seconds (assuming all 64 of their FPGAs were reprogrammed).

4. Using reprogrammable FPGAs with on-card non-volatile storage

Finally, we could think about having the FPGA configuration stored in
non-volatile memory on "the" card.  This non-volatile memory could be
written by TCC.  This approach recognizes that the FPGA configuration will
be more or less static (assuming that the trigger programming is not part
of the FPGA configuration), but that we would like the option of changing
it without tearing the system apart.

But what is the real advantage of doing this?  It would only speed up
downloading the FPGA configuration; it gains nothing else.  So once every
n months the FPGAs could be loaded in a few seconds rather than a few
minutes.  This does not appear to be worth the extra complexity.

If the trigger programming were embedded in the FPGA configuration, this
method would look even less appealing.  Even if Trigger List Vx.y were
stored in the on-card non-volatile memory, we would still need to deal
with things like prescales and special runs.  We could have multiple
"pages" in the flash memory, but remember that we would want the ability
to store a different configuration in each of the 16 FPGAs, so we are
talking about 850 kbytes of memory per page.  That is going to take up
some space and also cost some money.

Thus, this method is ruled out.

A closer look at direct download via TCC
----------------------------------------

The method we are currently most interested in is method (3) above: using
reprogrammable FPGAs which are directly downloaded from TCC.  Let's look
more closely at this method and make a straw man proposal for how the
FPGAs would be downloaded.  Again, remember that the goal is to build the
simplest system which has all of the functionality that we would need.

Assume that "the" card contains 16 Xilinx XC4025 FPGAs.  One way to
arrange the FPGAs (from a programming standpoint) would be to have an
8-bit (maybe a 16-bit) on-card data bus which visits all 16 FPGAs.  This
bus would be used for (at least) two purposes: (1) FPGA configuration
download via VME, and (2) monitor data readback via VME.  A third use of
this bus would be to download the trigger programming via VME (if it is
not embedded in the FPGA configuration).
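Purely to make the straw man concrete, here is a hypothetical sketch of
how such a bus might appear in the card's VME address map.  The base
address, offsets, and macro names are invented for this illustration and
are not design decisions; TCC would reach these locations through the
usual pVBA + Vertical Interconnect path.

/* Hypothetical per-card VME address map for the 16-FPGA straw man card.
 * Every address, offset, and name below is an assumption made for this
 * sketch only.                                                          */
#define CARD_BASE            0x00400000UL   /* assumed VME base address of "the" card */
#define NUM_FPGAS            16

/* One configuration-write address per FPGA (8-bit writes).              */
#define FPGA_CONFIG_ADDR(n)  (CARD_BASE + 0x000UL + (unsigned long)(n))

/* One monitor-data read address per FPGA (same on-card bus and pins).   */
#define FPGA_MONITOR_ADDR(n) (CARD_BASE + 0x100UL + (unsigned long)(n))

/* Trigger-programming "registers", if not embedded in the configuration. */
#define TRIG_PROG_ADDR(r)    (CARD_BASE + 0x200UL + (unsigned long)(r))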
Each FPGA would be programmed in Asynchronous Peripheral Mode.  In this
mode, the FPGA wakes up at two addresses.  The first address is used for
downloading the FPGA configuration: repeated 8-bit writes to a single
address (Xilinx does not support 16-bit configuration download) are used
to download one FPGA's configuration.  This sounds like a good use of VME
block transfer mode, in conjunction with some DMA work, to reduce the
amount of programmed I/O required to configure the FPGAs.  Each FPGA would
have a different "configuration address."  Each FPGA produces a RDY/BUSY*
output which can be used to generate VME DTACK*.  The second address is
used to read back the state of the RDY/BUSY* line (not the DONE signal),
so it doesn't sound very useful if we use RDY/BUSY* to generate DTACK*.

It is possible to load all 16 FPGAs in any order, but have them remain
"dormant" until all FPGAs are loaded.  Then we can "start" all FPGAs
simultaneously (see p. 2-28 of the Xilinx book).  They could be "started"
either by writing to a VME register (one card at a time) or by using one
of the P1 parallel timing signals (one crate at a time).  A VME register
is probably the cleaner way.  Each FPGA produces a dedicated output (DONE)
which indicates that configuration is complete.  All 16 DONEs could be
read back in a single register, which we could read every 5 seconds with
all of the other monitoring data.  The configuration data can be serially
shifted out of an FPGA for verification, but I am not certain that we need
to support this.

Each FPGA would also have (at least) one address for VME reading of
"monitoring data."  Again, note that this VME readback would occur over
the same on-card bus (and FPGA pins) as FPGA programming.  We don't know
which FPGAs would really need to put monitoring data on the VME bus, so
ALL FPGAs should have this ability.

Two other ways to organize FPGA configuration would be to have one FPGA in
Asynchronous Peripheral Mode which then feeds the other 15 FPGAs in Serial
Slave Mode, or to have all 16 FPGAs in Serial Slave Mode and build our own
serializer.  Both of these methods avoid having the 8-bit on-card bus for
programming, but we probably want this bus for readback anyway, so these
layouts are not so interesting right now.

So the order of operations would be (a sketch of the sequence follows the
list):

   (1) download the FPGA configuration (one FPGA at a time)
       - done via VME
       - done only after power cycles, unless the trigger programming is
         embedded in the FPGA configuration

   (2) "start" the FPGAs (one card at a time)
       - done either via VME or a parallel timing signal
       - this is separate from starting the on-card MTGs or un-pausing
         the Frameworks
       - only done after downloading, not with every initialize

   (3) download the trigger programming (one "register" at a time)
       - only if the trigger programming is not embedded in the FPGA
         configuration
       - i.e. "initialize"

   (4) "start" the Frameworks
       - i.e. "un-pause"

   (5) periodically read monitoring data from "registers"
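To pull the sequence together, here is a hedged C sketch of how the
TCC-side code for steps (1) and (2) might look.  The vme_write8() and
vme_read16() primitives are placeholders for whatever pVBA + Vertical
Interconnect routines TCC actually uses, and every name, address, and
argument here is an assumption, not an interface.

/* Sketch of the TCC-side loading sequence, steps (1) and (2) above.
 * vme_write8()/vme_read16() stand in for the real pVBA + Vertical
 * Interconnect access routines; all names and addresses are assumed.   */
#include <stdint.h>
#include <stddef.h>

extern void     vme_write8(unsigned long addr, uint8_t data);
extern uint16_t vme_read16(unsigned long addr);

/* Step (1): download one FPGA's configuration -- repeated 8-bit writes
 * to its configuration address; RDY/BUSY* paces each cycle via DTACK*. */
void load_fpga_config(unsigned long config_addr,
                      const uint8_t *bitstream, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        vme_write8(config_addr, bitstream[i]);
}

/* Step (2): once all 16 FPGAs on a card are loaded, "start" them
 * together via the card's start register, then confirm the 16 DONE
 * bits.  In practice DONE would be picked up by the 5-second monitoring
 * scan rather than polled in a tight loop.                              */
void start_card(unsigned long start_reg, unsigned long done_reg)
{
    vme_write8(start_reg, 1);
    while (vme_read16(done_reg) != 0xFFFF)
        ;   /* wait for all 16 DONE bits */
}

/* Steps (3)-(5) -- downloading the trigger programming (if it is not
 * embedded), un-pausing the Frameworks, and the periodic monitoring
 * reads -- would follow the same register-write / register-read pattern. */

If block transfers and DMA turn out to be usable for configuration, they
would simply replace the byte-write loop in load_fpga_config() without
changing the overall sequence.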