Questions from Uli:   Rev. 10-May-2014
-------------------

Hi,

due to the L1Topo review on Wednesday I haven't yet found the time to go through the CMX specs in much detail. I tried to concentrate on issues related to L1Topo only, and unfortunately I might have found an issue there. I will continue to read through the documents provided over the weekend, but I thought I'd flag this one immediately.

It is about clocking the data in the 320 MHz domain. Let me first summarize what L1Topo was planning for, since not everyone might be aware of the details. I quote from a (somewhat too detailed) explanation I sent around internally.

"The plan was to use a crystal-based mgtrefclock on L1Topo for getting the MGT PLLs into lock. There we have a generous +/-200 ppm lock range on the bit clock frequencies, and no requirement on the relative phase. For the crystals we have so far got 4xLHC clock multiples (~160.32 MHz), though 8xLHC might actually be advantageous...

At the output of the MGT the data are presented as 16-bit wide, 320 MHz data that are phase aligned to the 8-fold LHC bunch clock (usrclk) from the TTCdec card upon link startup. Note that this scheme is not officially supported by Xilinx, but it is meant to work as long as exactly the same frequencies are used on the transmitting and receiving ends (this is indeed guaranteed by the TTC clock) *and* the combined effects of jitter and phase walk are well below the data eye width (nominally +/-1.56 ns @ 8xLHC).

Having looked into the phase issue recently, we concluded that it should be about ok; however, we were already starting to look into alternatives by that time. Just in case. Belt and braces.

Now I take from the CMX specs that the jitter cleaner device used adds another 5-15 ns of uncertainty (a bit ugly, but not relevant, since it is just a constant term) plus up to 1 ns of "Dynamic Phase Offset", dependent on voltage and temperature.
(reference: http://www.conwin.com/datasheets/sg/sg186.pdf )

That additional ns of uncertainty might eventually turn out to be lethal. We cannot easily check for that by doing measurements, since we would need to cover the full phase space of supply voltages and temperatures at various components. But it would be an absolute horror if after 2 years of running we found the system going out of sync occasionally.

I am wondering whether that should rather rule out our previous baseline scheme. Note that any other scheme will invariably generate higher latency due to the required double buffering! The (non-)options there are:

(- single-lane phase alignment at the recovered clock: impossible due to the large number of global clocks (80) required)
- multi-lane phase alignment (at the recovered clock of one of several lanes all coming from the same source module): cumbersome, major rework of deserializer firmware. Will require additional alignment to the global LHC clock (latency penalty, resync in user domain)
- MGT buffered mode (shallow buffer, not elastic): latency penalty within the MGT. Straightforward to implement once the latency control (buffer limits) is understood."

I would think the latency penalty is roughly the same in both cases; we are still below the latency limits according to Norman's spreadsheets. My personal preference would be the latter: if we require a buffer to resynchronize data, then use the one that is provided by Xilinx as a hard core rather than doing some horrible clock domain crossing stuff by hand.

I would conclude that there is rather no point in trying to modify the CMX or L1Topo hardware design at this stage; we should rather agree on a clocking scheme that is less demanding in terms of phase stability.

Happy to receive comments back over the weekend, let's settle this one quickly!!!

cheers
Uli

Answer:

The 6.4 Gbps communication from the CMX to L1Topo needs to be taken very seriously.
The Xilinx IBERT tests by themselves are not enough to prove that this communications path works in the way that L1Calo wants to use it.

From your note I believe your main point is that we will need very good phase stability between the GTX reference clock on the CMX card and the MGT reference clock in the L1Topo in order to operate the L1Topo MGT receivers in a mode that results in the minimum of buffering latency in the L1Topo.

I believe that your concern is the phase stability of the CMX GTX reference clock over temperature. In the specification for the SFX_524G VCXO PLL this is called Dynamic Phase Offset and is given as 1 nsec maximum.

In these VCXO PLL multiplier circuits the bulk of the uncertainty in matching the phase of the 320.64 MHz output to the 40.08 MHz reference input probably comes from the uncertainty in the propagation delay through the divide-by-8 counter in the feedback path from the VCXO output to the phase detector input. In the actual circuit used (Analog Devices ADF4111) this divide by 8 is all done in a prescaler that is guaranteed to operate up to 1.2 GHz and thus likely has a temperature drift in its propagation delay that is considerably less than 1 nsec. Note that if the main counter in the feedback path of the ADF4111 were used in the CMX application, then a 1 nsec uncertainty in the propagation delay would sound reasonable.

Note that the actual drift of the phase of the 320.64 MHz GTX reference clock on the CMX card relative to the reference that it receives at its clock input on backplane pins J8-C24 and J8-C25 will also include the following:

- U155, the buffer in the clock line from the backplane, an OnSemi NB6L611. This is a 4 GHz buffer with a typical 280 psec propagation delay (and thus less than 280 psec variation over temperature).
- Uncertainty in the input to output phase relationship in the TTCrx ASIC chip on the TTCDec mezzanine card.
  In the TTCrx this phase relationship is adjustable and thus probably includes some temperature drift. I do not know of a specification for its stability.
- Uncertainty in the input to output phase relationship in the CY2304 output buffer on the TTCDec mezzanine card. For this part I do not see a propagation delay vs temperature specification, but there are propagation delay uncertainties of a few 100 psec for: cycle to cycle, part to part, and vs load capacitance.

In addition to this we should consider the temperature drift of the input to output clock phase of the TCM card in slot 21 that receives the optical timing signal and distributes the backplane clocks. This too will contribute to drift in the CMX card's 320.64 MHz GTX reference clock relative to an absolute Atlas timing signal.

If the input to output phase stability of the SFX_524G is an important contribution to the overall phase stability of the 320.64 MHz GTX reference clock on the CMX card relative to the reference that it receives from the backplane (which I doubt), then I'm happy to look into finding a better part for this application.

Operating many cards across multiple racks and crates and holding a better than 1 nsec relative clock alignment over a long period of operation will be a challenge. If this is required for CMX L1Topo operation then, as you suggest, we need to start working on it soon.

Reply from Uli:   Rev. 12-May-2014
---------------

Hi Dan,

I had a quick look at your answer. Thanks. I would like to be a bit more precise here: I have not been talking about the MGTrefclock. Not at the L1Topo end anyway. It might be that at the transmitting end, with the chosen clocking scheme, it is the MGTrefclk phase that defines the phase of the data stream. Not so in the topo processor baseline clocking scheme. We are using the incoming TTC clock, multiply it up on an MMCM and use it as usrclk on the parallel end of the MGT.
That will definitely work only if, after successful phase alignment between transmitter and receiver, there is a very low phase error.

Anyway, we have now decided to go for a different, slightly higher latency clocking scheme. Even though initial tests with the described scheme had been successful, we are now worried that at P1 the situation might be worse than in our lab, and we do not want to take the risk. The latency numbers we presented at the L1Topo PRR suggest a bit of latency contingency, and we are willing to trade latency for signal integrity. Future generations might decide differently and find out whether it is worth trying to squeeze a bit...

Here are a few minor additional comments.

- fig. 8 suggests that 12 fibres per CMX will be running into an optional patch panel. However, it is actually 24 fibres per CMX that are routed through the optical patch panel.
- p. 68: I see that only one out of 3 quads is supplied with a TTC-based refclock. That requires internal routing of the refclock to the neighbouring quad. According to the data sheets this should be ok. However, in response to a L1Topo reviewer's comment (from MSU, I believe) we redesigned the L1Topo clocking scheme after the FDR so as to avoid internal refclock routing!

That's it for now...

Thanks for the well-written and comprehensive documents

cheers
Uli
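
Worked check of the eye-width figure quoted in the first message: the sketch below reproduces the nominal +/-1.56 ns data eye at 8xLHC from the 40.08 MHz reference and shows how much of it would remain if the 1 ns Dynamic Phase Offset were taken entirely as a one-sided worst-case shift. That one-sided treatment is an assumption made here for illustration (the thread does not say whether the 1 ns is peak or peak-to-peak), and the function names are hypothetical. The constant 5-15 ns offset is left out because, as noted in the thread, it is removed by the phase alignment at link startup.

```python
# Minimal sketch of the phase budget discussed in the thread.
# Assumption: drift terms are treated as one-sided worst-case shifts
# and simply added; a real budget would also include jitter.

LHC_BUNCH_CLOCK_MHZ = 40.08  # nominal reference frequency quoted in the thread


def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency given in MHz."""
    return 1e3 / freq_mhz


def eye_half_width_ns(multiple: int, base_mhz: float = LHC_BUNCH_CLOCK_MHZ) -> float:
    """Nominal +/- eye width: half a period of the multiplied clock."""
    return period_ns(multiple * base_mhz) / 2.0


def remaining_margin_ns(half_width_ns: float, drift_terms_ns: list[float]) -> float:
    """Eye half-width left over after subtracting worst-case drift terms."""
    return half_width_ns - sum(drift_terms_ns)


if __name__ == "__main__":
    half = eye_half_width_ns(8)                 # 8xLHC = 320.64 MHz
    margin = remaining_margin_ns(half, [1.0])   # 1 ns Dynamic Phase Offset
    print(f"eye half-width at 8xLHC: +/-{half:.2f} ns")
    print(f"margin left after 1 ns drift: {margin:.2f} ns")
```

Under these assumptions roughly 0.56 ns of the nominal eye is left for jitter and all other drift sources (TTCrx, CY2304, TCM, clock buffer), which is the reason the phase-aligned scheme was considered risky.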