DDR Memory for the FPGA Fabric
----------------------------------

Initial Rev. 17-Nov-2022
Current Rev.  7-Dec-2022

This file describes the DDR Memory that is connected to the FPGA Fabric on the Disco-Kraken circuit board.

The FPGA DDR memory must be able to sink 40 Gbits per second and be large enough to keep this up for about one second. This implies a total memory size of 5 GBytes, which for now I will either:

   round down to 4 GBytes, i.e. big enough to hold 0.8 seconds of data, or

   round up to 8 GBytes, i.e. big enough to hold 1.6 seconds of data.

The current guess is that we will use a 4 GByte memory to buffer 0.8 seconds of PMT ADC data. An important benefit of the smaller size is that it helps control how much power this memory system uses.

The initial design study will look at building this memory system from either 8 Gbit or 16 Gbit DDR4 memory chips, which are available in data widths of 4, 8, or 16 bits.

Needing to sink data at 40 Gbits per second with an overall data path that is 16 bits wide means that you need to do 2.5 G data transfers per second, and that requires a 1.25 GHz clock. The FPGA Fabric's DDR Controller can support a memory clock of 800 MHz absolute maximum.

Needing to sink data at 40 Gbits per second with an overall data path that is 32 bits wide means that you need to do 1.25 G data transfers per second, and that requires at least a 625 MHz clock. Running with an 800 MHz clock gives only 28% extra capacity. We know that one does not get 100% write access to the memory chips, e.g. there are refresh cycles and we need to do some read cycles.

A 64 bit wide data path would have more than 2x the required bandwidth (102.4 vs. 40 Gbits per second at an 800 MHz clock) but *may* be harder to route, will likely make more electrical noise, and will take more power.
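The sizing and clock arithmetic above can be checked with a short Python sketch. The constant and function names are invented for this illustration; the only inputs are the 40 Gbit/s sink rate and the 800 MHz controller limit from the text. It assumes DDR signalling (two transfers per memory clock) and decimal GBytes, and it does not model refresh or read cycles.

```python
# Sizing arithmetic for the FPGA fabric DDR buffer, a sketch using only the
# figures quoted in these notes. It does NOT model refresh, read cycles, or
# controller efficiency, so the headroom numbers are upper bounds.

SINK_RATE_BITS_PER_S = 40e9   # PMT ADC data rate that must be sunk
MAX_MEM_CLOCK_HZ = 800e6      # fabric DDR controller absolute maximum
BITS_PER_GBYTE = 8e9          # decimal GBytes, as the notes use; a real
                              # "4 GByte" DRAM array is 2**32 bytes (~0.86 s)


def buffer_seconds(size_gbytes: float) -> float:
    """Seconds of ADC data that a buffer of size_gbytes can hold."""
    return size_gbytes * BITS_PER_GBYTE / SINK_RATE_BITS_PER_S


def required_clock_hz(bus_width_bits: int) -> float:
    """Memory clock needed on a DDR bus (2 transfers per clock) to sink the data."""
    transfers_per_s = SINK_RATE_BITS_PER_S / bus_width_bits
    return transfers_per_s / 2


def headroom(bus_width_bits: int) -> float:
    """Peak bandwidth at MAX_MEM_CLOCK_HZ divided by the required bandwidth."""
    return MAX_MEM_CLOCK_HZ / required_clock_hz(bus_width_bits)


if __name__ == "__main__":
    print(f"1 s of data = {SINK_RATE_BITS_PER_S / BITS_PER_GBYTE:.1f} GBytes")
    for size in (4, 8):
        print(f"{size} GBytes holds {buffer_seconds(size):.1f} s")
    for width in (16, 32, 64):
        clk = required_clock_hz(width)
        verdict = "fits" if clk <= MAX_MEM_CLOCK_HZ else "exceeds 800 MHz"
        print(f"{width:2d} bit bus: needs {clk / 1e6:6.1f} MHz, "
              f"{headroom(width):.2f}x headroom ({verdict})")
```

With the 32 bit bus this reproduces the 625 MHz requirement and the 1.28x (28%) headroom quoted above; the 16 bit bus needs 1250 MHz, and the 64 bit bus comes out at 2.56x.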
Given that there are 6 memory chip types to select from:

   in 8 Gbit size:    2G4   1G8   512M16

   in 16 Gbit size:   4G4   2G8   1G16

If we want an overall width of 32 bits and an overall size of either 4 GBytes or 8 GBytes, then the following combinations fit to make a rank = 1 memory:

   Option   Arrangement
   ------   ----------------------------------------------------
     1.     8 chips of the 2G4 size gives 8 GBytes total
     2.     4 chips of the 1G8 size gives 4 GBytes total
     3.     4 chips of the 2G8 size gives 8 GBytes total
     4.     2 chips of the 1G16 size gives 4 GBytes total

Options 2, 3, and 4 look to be the most practical for a 32 bit data bus width.

If we want an overall width of 64 bits and an overall size of either 4 GBytes or 8 GBytes, then the following combinations fit to make a rank = 1 memory:

   Option   Arrangement
   ------   ---------------------------------------------
     1.     8 chips of the 1G8 size gives 8 GBytes
     2.     4 chips of the 512M16 size gives 4 GBytes
     3.     4 chips of the 1G16 size gives 8 GBytes

Options 2 and 3 look to be the most practical for a 64 bit data bus width.


FPGA Fabric DDR Memory Controller
---------------------------------

 - Most of the following is from the document "PolarFire FPGA and PolarFire SoC FPGA Memory Controller".

 - The DDR Memory Controller in the FPGA Fabric is a "soft controller", i.e. it is IP that is implemented in the Fabric itself.

 - The PHY part of the Fabric connection to DDR devices is through an HSIO Bank. I *think* that certain signals, e.g. the Address Bus or DQ lines, are locked to certain Lanes in the HSIO Bank. Their documentation says that within a byte you may move the DQ lines around to facilitate PCB routing.

 - The Fabric's DDR Controller only works with DDR3, DDR3L, LPDDR3, and DDR4 type memory chips or DIMMs.

 - The Fabric's DDR Controller supports "additive latency modes".

 - The Fabric's DDR Controller has a maximum data rate of 1600 Mbps (per data pin), and that requires an 800 MHz clock to the memory chips or DIMMs.
   My understanding is that only the -1 speed grade supports the 800 MHz memory clock; the standard speed grade works up to only a 660 MHz memory clock. Can we run the -1 speed grade FPGA on its standard 1.00 V VDD core supply voltage and get the 800 MHz I/O, or do we need to run the FPGA at its optional 1.05 V VDD supply to get the 800 MHz performance? Note that the ADC for the PMTs needs a 1.00 Volt supply, so will we need just a 1.00 V converter, or will we need both 1.00 V and 1.05 V converters on the DK board?

 - The Fabric's DDR Controller supports a maximum of 16 Banks.

 - The Fabric's DDR Controller supports 2T Timing, and 2T Timing is always enabled.

 - The Fabric Resources used by the DDR Memory Controller depend on the type of interface to the Controller (AXI4 or Native) and on the width of the Data Bus to the memory chips. Looking at the setups relevant to the DK board:

      Interface to   Memory Data   Number of   Number of   Number of
      Controller     Bus Width     LUT4        uSRAM       LSRAM
      ------------   -----------   ---------   ---------   ---------
      AXI4           32            18k         45          39
      AXI4           64            29k         69          72
      Native         32            13k         29           2
      Native         64            29k         62           2
      MPFS250T Provides            254k        2352        812

   So, to order of magnitude, the DDR Controller will use about 10% of the Fabric in the MPFS250T device. Given that you probably want the overall design to use no more than about 50% of the FPGA's resources, the DDR Controller will use about 20% of the resources actually available to you.

 - Required Width of the Data interface into the DDR Controller -

 - Section 3.4.1.4 says that Multi-Burst Capability is only available if you talk to the controller via its Native Interface. Isn't Multi-Burst Capability, i.e. stacking successive 8n Write Bursts one immediately after the one before it, what we need to achieve a 5 GByte per second write rate of PMT ADC data into the DDR memory? If that is correct, then we cannot use the AXI4 interface between the fabric and the DDR Controller instantiated in the fabric.
 - Section 3.9.1.1 recommends using DDR4-2400 memory devices with the 800 MHz memory clock from the fabric DDR Controller. So they explicitly recommend using DDR4-2400 with a memory clock that is only about 67% of the nominal 1200 MHz clock frequency for this speed grade of DDR4 chip. This is a strong hint that you can operate DDR4 chips at less than their rated clock frequency, i.e. that their DLL will lock that low, and that, if they are counting memory clock cycles to determine when to refresh, stretching out the real time is OK. This section (page 64) gives other timing recommendations for setting up the fabric's DDR controller.

 - Section 3.9.4, starting on page 82, describes setting up the DDR Controller to use DDR4 memory chips. It says that their DDR Controller supports DDR4-1600, 1866, 2133, 2400, 2666, 2933, and 3200 at an 800 MHz clock frequency. But do the faster memory chips actually work at the slower clock frequency? DDR4-2400 seems to be a popular speed grade that one can get. 3 or 4 years from now, what speed grades will be available?

   Note: it appears that they want a 200 MHz reference clock for the PLL in the FPGA that makes the 800 MHz memory clock. The PLL ratio may be required to be 4 in order to make quadrature signals for the internal operation of the DDR controller. This 200 MHz fabric clock looks appropriate for other functions in the fabric.

   Note: it appears that with a 32 bit DQ data bus to the memory chips you can use a 64, 128, or 256 bit wide AXI data bus to the DDR Controller.
   To keep up with the 1600 Meg Transfers per second operation of the 32 bit wide data bus to the memory chips (6.4 GBytes per second), these various AXI data bus widths would need to operate at the following clock rates (with one transfer per clock cycle):

      AXI Data     Required
      Bus Width    Clock Rate
      ---------    ----------
         64         800 MHz
        128         400 MHz
        256         200 MHz

   Note: it appears that with a 64 bit DQ data bus to the memory chips you can use only a 512 bit wide AXI data bus to the DDR Controller. To keep up with the 1600 Meg Transfers per second operation of the 64 bit wide data bus to the memory chips (12.8 GBytes per second), the 512 bit wide AXI data bus needs to operate at a clock rate of 200 MHz (with one transfer per clock cycle).

 - Section 3.9.6.2, starting on page 95, describes the Physical Constraints, i.e. which HSIO Bank pin is which DDR4 memory bus signal. It says to look at the "PolarFire SoC Package Pin Assignment Tables (PPAT)", but in looking at them I do not see the required pin assignment information.

 - Section 7, starting on page 140, describes the PCB design recommendations. See also 7.3 on page 145. Note the "shield" pin between bytes of DQ signals. This whole thing is repeated in section 8 (page 148) and section 8.3 (page 153).

 - Section 8.4.3, page 159, has some routing examples for (I believe) the DDR4 memory to the CPU section of the FPGA-CPU.

 - Section 11, page 164, gives their timing parameter abbreviations.

 - Section 12, page 165, gives (I believe) the required placement location of the DDR Controller in the fabric, i.e. NORTH_NE ANCHOR (HSIO).


Splash Kit Demo Board for the MPFS300TS-1FCG484I:
---------------------------------------------------

 - This is an older PolarFire demo board from about 2017 or 2018.

 - Sheets 3, 4, 5 of the schematics for this demo board show the DDR memory system to the Fabric of this FPGA.

 - This setup uses MT40A256M16GE-083E:B DDR4 memory chips, automotive temp range, 4 Gbit size parts.
 - Their setup uses the following bypass capacitors:

      1P2V_DDR4 has:       2x 10 uFd
                          20x 100 nFd
                          20x 10 nFd
                          the memory chip has 10x VDD pins and 10x VDDQ pins

      VDD25 has:           2x 100 nFd
                           2x 10 nFd
                          the memory chip has 2x VPP pins

      0P6V_REF_DDR4 has:   1x 100 nFd
                           1x 10 nFd
                          the memory chip has 1x VREFCA pin

   Note: 0P6V_REF_DDR4 is different from 0P6V_VTT_DDR4.

      0P6V_REF_DDR4 comes from dual HSIO pins on sheet 3 of the schematic, or is left floating with a DNL R1531.

      0P6V_VTT_DDR4 comes from a TPS51200 converter on sheet 26 of the print set.

      1P2V_DDR4 comes from a LX7165-01CSP converter on sheet 26 of the print set.

      VDD25 comes from a MIC69502WR converter on sheet 27 of the print set.

   0P6V_VTT_DDR4 is for the single-ended terminators, not for the memory chips themselves:

      there are 25x single-ended terminators, all of them 39 Ohm

      next to these terminators there are 2x 10 uFd and 13x 100 nFd bypass caps

      the single-ended terminators are on:

         ADDR0 through ADDR9
         ADDR10/
         ADDR11
         ADDR12/BC_n
         ADDR13
         ADDR14/WE_n
         ADDR15/CAS_n
         ADDR16/RAS_n
         BA0, BA1
         BG0
         PAR_IN
         ACT_N
         CKE0
         CS0_n
         ODT0

      CLK0_P and CLK0_N differential clocks each have a 39 Ohm terminator tied to a common 100 nFd capacitor, and the far side of this cap goes to 1P2V_DDR4

      ADDR17/ALERT_N has a 4.7 k Ohm pull up to 1P2V_DDR4


7-Dec-2022:
-----------

Example of what actually exists in 2022 (not to imply that you can get it) in the way of 16 Gbit parts by 16 data lines:

   MT40A1G16   this is 1 Gig addresses, each one 16 bits wide

   Speed   Data Rate   Target        tAA             tRCD            tRP
   Grade   (MT/s)      CL-nRCD-nRP   (ns)            (ns)            (ns)
   -----   ---------   -----------   -------------   -------------   -------------
   -068    2933        21-21-21      14.32 (13.75)   14.32 (13.75)   14.32 (13.75)

   Part number: MT40A1G16TB-068
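The 14.32 ns tAA in the table is just CL times the clock period at 2933 MT/s, and the same arithmetic shows what CL the 13.75 ns figure (in parentheses in the table) would need at the fabric controller's 1600 MT/s (800 MHz clock). This is a minimal sketch; the helper names (tck_ns, latency_ns, min_cycles) are made up for this illustration, it uses a plain ceiling rather than the JEDEC DDR4 rounding rules, and whether a given chip actually supports a 1600 MT/s operating point must be confirmed in its datasheet speed-bin table.

```python
import math


def tck_ns(data_rate_mtps: float) -> float:
    """Clock period in ns for a DDR data rate in MT/s (2 transfers per clock)."""
    return 2000.0 / data_rate_mtps


def latency_ns(cycles: int, data_rate_mtps: float) -> float:
    """Convert a latency in clock cycles (e.g. CL, nRCD, nRP) to nanoseconds."""
    return cycles * tck_ns(data_rate_mtps)


def min_cycles(t_ns: float, data_rate_mtps: float) -> int:
    """Smallest whole number of clocks covering t_ns.

    Plain ceiling with a tiny tolerance for float noise; the JEDEC DDR4
    spec has its own rounding rules, which this sketch does not reproduce.
    """
    return math.ceil(t_ns / tck_ns(data_rate_mtps) - 1e-9)


if __name__ == "__main__":
    # MT40A1G16 -068 row from the table: 21-21-21 at 2933 MT/s.
    print(f"tCK at 2933 MT/s = {tck_ns(2933):.4f} ns")
    print(f"CL=21 at 2933 MT/s -> tAA = {latency_ns(21, 2933):.2f} ns")
    # The 13.75 ns figure expressed at 2933 MT/s and at the fabric's 1600 MT/s.
    for rate in (2933, 1600):
        cl = min_cycles(13.75, rate)
        print(f"13.75 ns at {rate} MT/s needs CL >= {cl} "
              f"({latency_ns(cl, rate):.2f} ns)")
```

At 2933 MT/s the 13.75 ns figure rounds up to 21 clocks, consistent with the 21-21-21 target; at 1600 MT/s it works out to 11 clocks of 1.25 ns.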