FPGA Loading Straw Man
----------------------

Original: 11-APR-1995        Latest: 2-MAY-1995

Introduction
------------

The purpose of this document is to propose a method for loading the FPGAs
on "the" card for the Run II Frameworks.  The general idea is to implement
the simplest loading method which provides all of the required
functionality.  The following methods have been considered:

   1. Using one-time programmable FPGAs.

   2. Using reprogrammable FPGAs with a ROM in a ZIF socket on the card.

   3. Using reprogrammable FPGAs which are downloaded directly from TCC.

   4. Using reprogrammable FPGAs and having some non-volatile memory
      (e.g. flash memory) on the card.  The on-card non-volatile memory
      could be downloaded via TCC.

Each method considered
----------------------

1. Using one-time programmable FPGAs

With this method, the trigger logic is hardcoded in the FPGAs.  This is
very similar to the Run I Trigger Frameworks.  This method has some
possible good points.  Downloading is not required.  The final trigger
logic is not going to change much (we think), so why make it downloadable?
Also, there is no worry about random cosmic rays changing the trigger
logic (but SRAMs are very stable now, so this is not really a concern).

However, the disadvantages appear to outweigh the advantages.  Giving up
reprogrammability seems to be too high a price to pay.  Some of the
advantages of reprogrammability are:

   - we can solder the FPGAs to the card (we would not want to solder
     one-time programmable FPGAs to the card).  This will make routing
     easier (no sockets).

   - we can change the logic easily if need be.

   - stocking spares will be easier.

Thus, this method is ruled out.

2. Using reprogrammable FPGAs with ROMs in ZIF sockets

With this method, we get the advantages of reprogrammable FPGAs, but still
avoid having to perform a download via TCC.  This sounds pretty good at
first, but having to remove the cards to change the FPGA configuration is
again too big a price to pay.  We know that these cards will have lots of
cables attached to the front, which makes card removal difficult.  Also,
we would probably want some TCC monitoring/control of the download, so we
need some path between TCC and the FPGAs anyway.

Thus, this method is ruled out.

3. Using reprogrammable FPGAs which are downloaded directly from TCC

With this method, we can download the FPGAs from TCC.  The penalty for
changing the FPGA configuration is small (we don't need to remove a card).
The possible disadvantage is the time it would take to load the FPGAs.

Using the Xilinx XC4025 (which has 422 kbits = 53 kbytes of program data)
as an example, and scaling from the speed with which we currently transfer
data via the pVBA + Vertical Interconnect (5 us per I/O, or 8 us including
computations or data preparation -> 125-200 kbytes/second), it would take
about 250-400 msec to load one XC4025 8 bits at a time.  Thus it would
take about 4-6.5 seconds to load the 16 FPGAs on one card, and about
400-650 seconds, or 7-11 minutes, to load the estimated 100 cards in the
two Frameworks.  This is at the upper limit of (or beyond) what would be
acceptable.

Several options could speed this up.  Using 16-bit transfers would save a
factor of 2.  Another approach would be to load several FPGAs in parallel:
since most cards should only need 1-3 different FPGA configurations, it
would save time if we could program all the FPGAs of each kind at the same
time.
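To make the arithmetic behind these estimates explicit, here is a small
back-of-the-envelope sketch in C.  The inputs are simply the figures
quoted above (422 kbits per XC4025, 5-8 us per 8-bit transfer, 16 FPGAs
per card, an estimated 100 cards); nothing is measured, and the output
reproduces the quoted ranges to within rounding.

/* Back-of-the-envelope FPGA download-time estimate, using the figures
 * quoted in the text.  All inputs are estimates, not measurements.     */
#include <stdio.h>

int main(void)
{
    const double config_bytes   = 422e3 / 8.0;   /* XC4025: 422 kbits ~ 53 kbytes       */
    const double fpgas_per_card = 16.0;
    const double cards_total    = 100.0;         /* estimated total for both Frameworks */
    const double us_per_io[]    = { 5.0, 8.0 };  /* pVBA + Vertical Interconnect, per 8-bit I/O */

    for (int i = 0; i < 2; i++) {
        double t_fpga = config_bytes * us_per_io[i] * 1e-6;  /* seconds per FPGA      */
        double t_card = t_fpga * fpgas_per_card;             /* seconds per card      */
        double t_all  = t_card * cards_total;                /* seconds, whole system */
        printf("%3.0f us/IO: %.2f s per FPGA, %5.2f s per card, %6.0f s (%4.1f min) total\n",
               us_per_io[i], t_fpga, t_card, t_all, t_all / 60.0);
    }
    return 0;
}

Programming all FPGAs of one kind on a card simultaneously would cut the
per-card downloads from 16 to the number of distinct configurations (1-3
on most cards), which is essentially where the factor of 4-16 below comes
from.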
Saving a factor of 4-16 on enough cards, and bringing the FPGA logic
download time down into the 2-5 minute range, would be satisfactory.

As an aside, the fastest we could ever load the FPGAs is set by the VME
backplane speed.  The maximum throughput of our VME backplane will be
16 bits per 200 ns.  At this rate we could load an FPGA in about 5.3 msec,
load a whole card in 84 msec, and load the whole system in about 8.4
seconds.  Note that Xilinx FPGAs cannot directly accept 16-bit program
data.

If the trigger programming (e.g. And-Or Terms vs. Specific Triggers) were
not embedded in the FPGA configuration, then we would only need to
download the FPGAs after a power cycle (once every n months).  This is
similar to the L1 and L1.5 68K code.  If the trigger programming were
embedded in the FPGA configuration, then every initialize would take about
5 minutes.  Can we afford 5-minute initializes?  A Run I initialize takes
about 1 minute.  What about things like prescales, which are changed
during the run?  Not all of the FPGAs would need to be downloaded, only
those on the 4 Trigger Decision Modules, so changing prescales would cost
about 13 seconds (assuming all 64 of their FPGAs were reprogrammed).

4. Using reprogrammable FPGAs with on-card non-volatile storage

Finally, we could think about having the FPGA configuration stored in
non-volatile memory on "the" card.  This non-volatile memory could be
written by TCC.  This approach recognizes that the FPGA configuration will
be more or less static (assuming that the trigger programming is not part
of the FPGA configuration), but that we would like the option of changing
it without tearing the system apart.

But what is the real advantage of doing this?  It would only speed up
downloading the FPGA configuration; it gains nothing else.  So once every
n months the FPGAs could be loaded in a few seconds rather than a few
minutes.  This does not appear to be worth the extra complexity.

If the trigger programming were embedded in the FPGA configuration, this
method would look even less appealing.  Even if Trigger List Vx.y were
stored in the on-card non-volatile memory, we would still need to deal
with things like prescales and special runs.  We could have multiple
"pages" in the flash memory, but remember that we would want the ability
to store a different configuration in each of the 16 FPGAs, so we are
talking about 850 kbytes of memory per page.  That is going to take up
some space and also cost some money.

Thus, this method is ruled out.

A closer look at direct download via TCC
----------------------------------------

The method we are currently most interested in is method (3) above: using
reprogrammable FPGAs which are directly downloaded from TCC.  Let's look
more closely at this method and make a straw man proposal for how the
FPGAs would be downloaded.  Again, remember that the goal is to build the
simplest system which has all of the functionality that we would need.

Assume that "the" card contains 16 Xilinx XC4025 FPGAs.  One way to
arrange the FPGAs (from a programming standpoint) would be to have an
8-bit (maybe a 16-bit) on-card data bus which visits all 16 FPGAs.  This
bus would be used for (at least) two purposes: (1) FPGA configuration
download via VME, and (2) monitor data readback via VME.  A third use of
this bus would be to download the trigger programming via VME (if it is
not embedded in the FPGA configuration).
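Purely to make the straw man concrete, here is a hypothetical sketch of
how such a bus might appear in the card's VME address map.  The base
address, offsets, and macro names are invented for this illustration and
are not design decisions; TCC would reach these locations through the
usual pVBA + Vertical Interconnect path.

/* Hypothetical per-card VME address map for the 16-FPGA straw man card.
 * Every address, offset, and name below is an assumption made for this
 * sketch only.                                                          */
#define CARD_BASE            0x00400000UL   /* assumed VME base address of "the" card */
#define NUM_FPGAS            16

/* One configuration-write address per FPGA (8-bit writes).              */
#define FPGA_CONFIG_ADDR(n)  (CARD_BASE + 0x000UL + (unsigned long)(n))

/* One monitor-data read address per FPGA (same on-card bus and pins).   */
#define FPGA_MONITOR_ADDR(n) (CARD_BASE + 0x100UL + (unsigned long)(n))

/* Trigger-programming "registers", if not embedded in the configuration. */
#define TRIG_PROG_ADDR(r)    (CARD_BASE + 0x200UL + (unsigned long)(r))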
Each FPGA would be programmed in Asynchronous Peripheral Mode.  In this
mode, the FPGA wakes up at two addresses.  The first address is used for
downloading the FPGA configuration: repeated 8-bit writes to a single
address (Xilinx does not support 16-bit configuration download) are used
to download one FPGA's configuration.  This sounds like a good use of VME
block transfer mode, in conjunction with some DMA work, to reduce the
amount of programmed I/O required to configure the FPGAs.  Each FPGA would
have a different "configuration address."  Each FPGA produces a RDY/BUSY*
output which can be used to generate VME DTACK*.  The second address is
used to read back the state of the RDY/BUSY* line (not the DONE signal),
so it doesn't sound very useful if we use RDY/BUSY* to generate DTACK*.

It is possible to load all 16 FPGAs in any order, but have them remain
"dormant" until all FPGAs are loaded.  Then we can "start" all FPGAs
simultaneously (see p. 2-28 of the Xilinx book).  They could be "started"
either by writing to a VME register (one card at a time) or by using one
of the P1 parallel timing signals (one crate at a time).  A VME register
is probably the cleaner way.  Each FPGA produces a dedicated output (DONE)
which indicates that configuration is complete.  All 16 DONEs could be
read back in a single register, which we could read every 5 seconds with
all of the other monitoring data.  The configuration data can be serially
shifted out of an FPGA for verification, but I am not certain that we need
to support this.

Each FPGA would also have (at least) one address for VME reading of
"monitoring data."  Again, note that this VME readback would occur over
the same on-card bus (and FPGA pins) as FPGA programming.  We don't know
which FPGAs would really need to put monitoring data on the VME bus, so
ALL FPGAs should have this ability.

Two other ways to organize FPGA configuration would be to have one FPGA in
Asynchronous Peripheral Mode which then feeds the other 15 FPGAs in Serial
Slave Mode, or to have all 16 FPGAs in Serial Slave Mode and build our own
serializer.  Both of these methods avoid having the 8-bit on-card bus for
programming, but we probably want this bus for readback anyway, so these
layouts are not so interesting right now.

So the order of operations would be (a sketch of the sequence follows the
list):

   (1) download the FPGA configuration (one FPGA at a time)
       - done via VME
       - done only after power cycles, unless the trigger programming is
         embedded in the FPGA configuration

   (2) "start" the FPGAs (one card at a time)
       - done either via VME or a parallel timing signal
       - this is separate from starting the on-card MTGs or un-pausing
         the Frameworks
       - only done after downloading, not with every initialize

   (3) download the trigger programming (one "register" at a time)
       - only if the trigger programming is not embedded in the FPGA
         configuration
       - i.e. "initialize"

   (4) "start" the Frameworks
       - i.e. "un-pause"

   (5) periodically read monitoring data from "registers"
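To pull the sequence together, here is a hedged C sketch of how the
TCC-side code for steps (1) and (2) might look.  The vme_write8() and
vme_read16() primitives are placeholders for whatever pVBA + Vertical
Interconnect routines TCC actually uses, and every name, address, and
argument here is an assumption, not an interface.

/* Sketch of the TCC-side loading sequence, steps (1) and (2) above.
 * vme_write8()/vme_read16() stand in for the real pVBA + Vertical
 * Interconnect access routines; all names and addresses are assumed.   */
#include <stdint.h>
#include <stddef.h>

extern void     vme_write8(unsigned long addr, uint8_t data);
extern uint16_t vme_read16(unsigned long addr);

/* Step (1): download one FPGA's configuration -- repeated 8-bit writes
 * to its configuration address; RDY/BUSY* paces each cycle via DTACK*. */
void load_fpga_config(unsigned long config_addr,
                      const uint8_t *bitstream, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        vme_write8(config_addr, bitstream[i]);
}

/* Step (2): once all 16 FPGAs on a card are loaded, "start" them
 * together via the card's start register, then confirm the 16 DONE
 * bits.  In practice DONE would be picked up by the 5-second monitoring
 * scan rather than polled in a tight loop.                              */
void start_card(unsigned long start_reg, unsigned long done_reg)
{
    vme_write8(start_reg, 1);
    while (vme_read16(done_reg) != 0xFFFF)
        ;   /* wait for all 16 DONE bits */
}

/* Steps (3)-(5) -- downloading the trigger programming (if it is not
 * embedded), un-pausing the Frameworks, and the periodic monitoring
 * reads -- would follow the same register-write / register-read pattern. */

If block transfers and DMA turn out to be usable for configuration, they
would simply replace the byte-write loop in load_fpga_config() without
changing the overall sequence.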