Questions from Uli:                         Rev. 10-May-2014
-------------------

Hi,

Because of the L1Topo review on Wednesday I haven't yet found the time
to go through the CMX specs in much detail. I tried to concentrate on
issues related to L1Topo only, and unfortunately I might have found an
issue there. I will continue to read through the documents provided
over the weekend, but I thought I'd flag this one immediately. It is
about clocking the data in the 320 MHz domain.

Let me first summarize what L1Topo was planning. Not everyone might
be aware of the details. I quote from a (somewhat too detailed)
explanation I sent around internally.

"The plan was to use a crystal-based mgtrefclock on L1Topo for
getting the MGT PLLs into lock. This gives us a generous +/-200 ppm
lock range on the bit clock frequencies, and no requirement on the
relative phase. So far we have crystals at 4xLHC clock multiples
(~160.32 MHz), though 8xLHC might actually be advantageous...

At the output of the MGT the data are presented as 16-bit wide words
at 320 MHz that are phase-aligned, upon link startup, to the 8-fold
LHC bunch clock (usrclk) from the TTCdec card.

Note that this scheme is not officially supported by Xilinx, but it
should work as long as exactly the same frequencies are used on the
transmitting and receiving ends (this is indeed guaranteed by the TTC
clock) *and* the combined effects of jitter and phase walk are well
below the data eye width (nominally +/-1.56 ns @8xLHC).

Having recently looked into the phase issue, we concluded that it
should be about OK; however, we had already started to look into
alternatives by that time, just in case. Belt and braces.

Now I see from the CMX specs that the jitter cleaner device used adds
another 5-15 ns of uncertainty (a bit ugly, but not relevant, since it
is just a constant term) *plus* up to 1 ns of "Dynamic Phase Offset",
which depends on voltage and temperature (reference:

http://www.conwin.com/datasheets/sg/sg186.pdf )

That additional ns of uncertainty might eventually turn out to be
lethal. We cannot easily check for it by doing measurements, since we
would need to cover the full phase space of supply voltages and
temperatures at various components. But it would be an absolute horror
if after 2 years of running we found the system going out of sync
occasionally.

I am wondering whether that should rule out our previous baseline
scheme. Note that any other scheme will invariably generate higher
latency due to the required double buffering!

The (non-)options there are

(- single-lane phase alignment at the recovered clock: impossible due
to the large number of global clocks (80) required)

- multi-lane phase alignment (at the recovered clock of one of several
lanes all coming from the same source module): cumbersome, major
rework of the deserializer firmware. Will require additional alignment
to the global LHC clock (latency penalty, resync in user domain)

- MGT buffered mode (shallow buffer, not elastic): latency penalty
within the MGT. Straightforward to implement once the latency control
(buffer limits) is understood."

I would think the latency penalty is roughly the same in both cases,
and we are still below the latency limits according to Norman's
spreadsheets. My personal preference would be the latter: if we
require a buffer to resynchronize data, then use the one that Xilinx
provides as a hard core rather than doing some horrible clock domain
crossing stuff by hand.
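To put a number on the latency penalty of any resynchronization
buffer, a minimal sketch converting a buffer depth in 320.64 MHz
usrclk words into ns and LHC bunch crossings. The buffer depths in the
demo are hypothetical examples, not figures from the L1Topo firmware
or from Norman's spreadsheets.

```python
# Sketch: latency added by a resynchronization buffer, in ns and in
# LHC bunch crossings (BC). The depths in the demo are HYPOTHETICAL.

LHC_CLOCK_MHZ = 40.08             # LHC bunch clock
USRCLK_MHZ = 8 * LHC_CLOCK_MHZ    # 320.64 MHz parallel-side (usrclk) rate


def latency_ns(words: int, clock_mhz: float = USRCLK_MHZ) -> float:
    """Latency of a buffer that is `words` deep, clocked at `clock_mhz`."""
    return words * 1e3 / clock_mhz


def latency_bc(words: int, clock_mhz: float = USRCLK_MHZ) -> float:
    """The same latency expressed in bunch crossings (LHC clock periods)."""
    return latency_ns(words, clock_mhz) * LHC_CLOCK_MHZ / 1e3


if __name__ == "__main__":
    for depth in (2, 4, 8):  # hypothetical buffer depths in usrclk words
        print(f"{depth} words: {latency_ns(depth):.2f} ns = "
              f"{latency_bc(depth):.2f} BC")
```

At 8x the LHC clock every 8 words of buffer depth cost one full bunch
crossing, which is the scale against which the contingency has to be
judged.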

My conclusion is that there is little point in trying to modify the
CMX or L1Topo hardware design at this stage; we should rather agree on
a clocking scheme that is less demanding in terms of phase stability.
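How demanding the baseline scheme is can be seen from a simple phase
budget. The sketch below compares a sum of phase-walk contributions
with the nominal +/-1.56 ns eye at 8xLHC. Only the 1 ns Dynamic Phase
Offset is taken from the datasheet quoted above; the other entries are
hypothetical placeholders, and the ideal eye ignores setup/hold time,
so a real margin would be smaller.

```python
# Sketch: compare a phase-walk budget against the nominal +/- half-period
# data eye at 8x the LHC clock. Only the 1 ns SFX_524G "Dynamic Phase
# Offset" comes from the datasheet; the other entries are HYPOTHETICAL.
import math
from typing import Dict, Tuple

LHC_CLOCK_MHZ = 40.08
USRCLK_MHZ = 8 * LHC_CLOCK_MHZ  # 320.64 MHz


def eye_half_width_ns(freq_mhz: float) -> float:
    """Half a clock period in ns: the nominal +/- eye of an ideal link."""
    return 1e3 / freq_mhz / 2


def budget_ns(contributions: Dict[str, float]) -> Tuple[float, float]:
    """Return (linear worst-case sum, root-sum-square) of the contributions."""
    linear = sum(abs(v) for v in contributions.values())
    rss = math.sqrt(sum(v * v for v in contributions.values()))
    return linear, rss


CONTRIBUTIONS_NS = {
    "SFX_524G dynamic phase offset (datasheet max)": 1.0,
    "HYPOTHETICAL: TTC/backplane clock drift": 0.2,
    "HYPOTHETICAL: residual jitter": 0.1,
}

if __name__ == "__main__":
    half = eye_half_width_ns(USRCLK_MHZ)
    linear, rss = budget_ns(CONTRIBUTIONS_NS)
    print(f"nominal eye: +/-{half:.2f} ns")
    print(f"linear sum {linear:.2f} ns -> margin {half - linear:.2f} ns")
    print(f"RSS        {rss:.2f} ns -> margin {half - rss:.2f} ns")
```

A linear sum is the conservative choice when contributions such as
temperature drift are correlated; RSS is only justified for
independent random terms.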

Happy to receive comments over the weekend. Let's settle this one
quickly!!!

cheers
Uli


Answer:

The 6.4 Gbps link from the CMX to L1Topo needs
to be taken very seriously.  The Xilinx IBERT tests by
themselves are not enough to prove that this link
works in the way that L1Calo wants to use it.

From your note I believe your main point is that we will
need very good phase stability between the GTX reference
clock on the CMX card and the MGT reference clock on
the L1Topo in order to operate the L1Topo MGT receivers
in the mode that gives the minimum buffering latency
in the L1Topo.

I believe that your concern is the phase stability of
the CMX GTX reference clock over temperature.  In the
specification for the SFX_524G VCXO PLL this is called
Dynamic Phase Offset and is given as 1 nsec maximum.

In these VCXO PLL multiplier circuits the bulk of the
uncertainty in matching the phase of the 320.64 MHz output
to the 40.08 MHz reference input probably comes from the
uncertainty in the propagation delay through the
divide-by-8 counter in the feedback path from the VCXO
output to the phase detector input.  In the actual circuit
used (Analog Devices ADF4111) this divide by 8 is done
entirely in a prescaler that is guaranteed to operate up to
1.2 GHz and thus likely has a temperature drift in its
propagation delay that is considerably less than 1 nsec.
Note that if the main counter in the feedback path of the
ADF4111 were used in the CMX application then a 1 nsec
uncertainty in the propagation delay would sound reasonable.
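Why the feedback-path delay matters can be illustrated with a
simplified PLL phase model, which is an assumption here and not a
model of the ADF4111 itself: in lock the phase detector aligns the
reference edge with the divided feedback edge, so any delay in the
divider shifts the VCXO output by the same amount in the opposite
direction.  The 0.1 nsec prescaler figure used below is hypothetical.

```python
# Sketch of a SIMPLIFIED PLL phase model (an assumption): in lock the
# phase detector aligns the reference edge with the divided feedback
# edge. If the divider delays the VCXO edge by t_fb, the VCXO output must
# lead the reference by t_fb, so a drift of +d in t_fb moves the output
# phase by -d.
from typing import Tuple

LHC_CLOCK_MHZ = 40.08
VCXO_MHZ = 8 * LHC_CLOCK_MHZ  # 320.64 MHz GTX reference clock


def output_phase_shift(feedback_drift_ns: float,
                       out_mhz: float = VCXO_MHZ) -> Tuple[float, float]:
    """Return (shift in ns, shift in degrees of one output period)."""
    shift_ns = -feedback_drift_ns
    period_ns = 1e3 / out_mhz
    return shift_ns, 360.0 * shift_ns / period_ns


if __name__ == "__main__":
    # 1.0 ns is the datasheet Dynamic Phase Offset maximum; 0.1 ns is a
    # HYPOTHETICAL figure for a fast prescaler, not a measured value.
    for drift in (1.0, 0.1):
        ns, deg = output_phase_shift(drift)
        print(f"feedback drift {drift:.1f} ns -> output {ns:+.2f} ns "
              f"({deg:+.1f} deg at 320.64 MHz)")
```

Each nsec of feedback drift moves the output by about a third of a
320.64 MHz period, which is why a fast prescaler in the feedback path
is preferable to the slower main counter.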

Note that the actual drift of the phase of the 320.64 MHz
GTX reference clock on the CMX card relative to the
reference that it receives at its clock input on
backplane pins J8-C24 and J8-C25 will also include the
following:

 - U155, the buffer in the clock line from the backplane:
   an OnSemi NB6L611.  This is a 4 GHz buffer with a
   typical 280 psec propagation delay (and thus less
   than 280 psec variation over temperature).

 - Uncertainty in the input to output phase relationship
   in the TTCrx ASIC chip on the TTCDec mezzanine card.
   In the TTCrx this phase relationship is adjustable
   and thus probably includes some temperature drift.
   I do not know of a specification for its stability.

 - Uncertainty in the input to output phase relationship
   in the CY2304 output buffer on the TTCDec mezzanine card.
   For this part I do not see a propagation delay vs
   temperature specification, but there are propagation delay
   uncertainties of a few 100 psec for: cycle to cycle,
   part to part, and vs load capacitance.

In addition to this we should consider the temperature drift
of the input to output clock phase of the TCM card in slot 21
that receives the optical timing signal and distributes the
backplane clocks.  This too will contribute to drift in the
CMX card's 320.64 MHz GTX reference clock relative to an
absolute ATLAS timing signal.
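The contributions listed above can be tallied for a given temperature
swing.  A minimal sketch, assuming each stage's delay drifts linearly
with temperature; every temperature coefficient in it is a
hypothetical placeholder, since no stability specification is quoted
above for the TTCrx, the CY2304 or the TCM card.

```python
# Sketch: accumulate temperature-driven delay drift along the clock chain
# listed above. Every temperature coefficient below is a HYPOTHETICAL
# placeholder (ps per degC); replace with datasheet or measured values.
from typing import Dict, Tuple

TEMPCO_PS_PER_C = {
    "TCM card (slot 21) fan-out": 2.0,
    "U155 NB6L611 buffer": 0.3,
    "TTCrx on TTCDec": 3.0,
    "CY2304 on TTCDec": 1.0,
    "SFX_524G feedback prescaler": 0.5,
}


def chain_drift_ps(tempco: Dict[str, float],
                   delta_t_c: float) -> Tuple[Dict[str, float], float]:
    """Per-stage drift for a temperature change, plus the worst-case total
    (all stages assumed to drift in the same direction)."""
    per_stage = {name: k * delta_t_c for name, k in tempco.items()}
    return per_stage, sum(abs(v) for v in per_stage.values())


if __name__ == "__main__":
    stages, total = chain_drift_ps(TEMPCO_PS_PER_C, delta_t_c=10.0)
    for name, ps in stages.items():
        print(f"{name:32s} {ps:6.1f} ps")
    print(f"{'worst-case total':32s} {total:6.1f} ps")
```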

If the input to output phase stability of the SFX_524G
is an important contribution to the overall phase stability
of the 320.64 MHz GTX reference clock on the CMX card
relative to the reference that it receives from the
backplane (which I doubt), then I'm happy to look into
finding a better part for this application.

Operating many cards across multiple racks and crates
and holding a better than 1 nsec relative clock alignment
over a long period of operation will be a challenge.  If
this is required for CMX L1Topo operation then as you
suggest we need to start working on it soon.


Reply from Uli:                             Rev. 12-May-2014
---------------

Hi Dan,

I had a quick look at your answer. Thanks.

I would like to be a bit more precise here: I have not been talking
about the MGTrefclock, at least not at the L1Topo end. It might be
that at the transmitting end, with the chosen clocking scheme, it is
the MGTrefclk phase that defines the phase of the data stream.

That is not so in the topo processor baseline clocking scheme. We take
the incoming TTC clock, multiply it up in an MMCM and use it as usrclk
on the parallel side of the MGT. That will definitely only work if
there is a very low phase error after successful phase alignment
between transmitter and receiver.
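Multiplying the 40.08 MHz TTC clock up to the 320.64 MHz usrclk means
picking MMCM divider settings.  A minimal sketch, assuming a Xilinx
7-series MMCM where f_OUT = f_CLKIN * M / (D * O), with integer M only
(2 to 64), D from 1 to 106, O from 1 to 128 and a VCO range of 600 to
1200 MHz (a -1 speed grade is assumed; check the device data sheet).
Phase-frequency-detector limits and fractional M are not checked.

```python
# Sketch: enumerate integer 7-series MMCM settings (M, D, O) that give
# f_out = ratio * f_in with the VCO inside an ASSUMED allowed band.
# Ranges are assumptions; PFD limits and fractional M are not checked.
from typing import Iterator, Tuple


def mmcm_settings(f_in_mhz: float, ratio: int,
                  vco_min_mhz: float = 600.0, vco_max_mhz: float = 1200.0,
                  m_max: int = 64, d_max: int = 106, o_max: int = 128
                  ) -> Iterator[Tuple[int, int, int, float]]:
    """Yield (M, D, O, f_vco) with f_in * M / (D * O) == ratio * f_in."""
    for d in range(1, d_max + 1):
        for o in range(1, o_max + 1):
            m = ratio * d * o          # makes M / (D * O) == ratio exactly
            if not 2 <= m <= m_max:
                continue
            f_vco = f_in_mhz * m / d   # VCO must lie in the allowed band
            if vco_min_mhz <= f_vco <= vco_max_mhz:
                yield m, d, o, f_vco


if __name__ == "__main__":
    for m, d, o, vco in mmcm_settings(40.08, 8):
        print(f"M={m} D={d} O={o}: VCO {vco:.2f} MHz -> {vco / o:.2f} MHz")
```

With these assumed limits the search finds, for example, M=16, D=1,
O=2 (VCO 641.28 MHz) and M=24, D=1, O=3 (VCO 961.92 MHz).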

Anyway, we have now decided to go for a different, slightly higher
latency clocking scheme. Even though initial tests with the described
scheme were successful, we are now worried that at P1 the situation
might be worse than in our lab and we do not want to take the risk.
The latency numbers we presented at the L1Topo PRR suggest a bit of
latency contingency, and we are willing to trade latency for signal
integrity. Future generations might decide differently and find out
whether it is worth trying to squeeze a bit...

Here are a few minor additional comments.

- fig. 8 suggests that 12 fibres per CMX will be running into an
optional patch panel. However, it is actually 24 fibres per CMX that
are routed through the optical patch panel.

- p. 68: I see that only one out of 3 quads is supplied with a
TTC-based refclock. That requires internal routing of the refclock to
the neighbouring quad. According to the data sheets this should be
OK. However, in response to an L1Topo reviewer's comment (from MSU, I
believe) we redesigned the L1Topo clocking scheme after the FDR so as
to avoid internal refclock routing!

That's it for now...

Thanks for the well-written and comprehensive documents.

cheers
Uli