ALCT Radiation Calculation

Jay Hauser
11-August-2000
summary added 18-Sept-2000
some mistakes fixed 5-Oct-2000

Quick Summary:

3x the maximum estimated LHC neutron rate assumed.

* If triple voting works well, SEU damage is negligible and is tracked.

* If not, then self-testing with trigger patterns during the 88 us pixel detector
refresh cycle also results in negligible SEU damage.  This probably requires
about 2 man-months of engineering, much less than what is required
to switch from Altera to Xilinx (1 man-year for conversion, 0.5 year for
modification of design, e.g. Xilinx flat-pack arrays have fewer I/O pins).

* Allowing ALCT boards to initiate an auto-refresh sequence cuts down the
problem tremendously. This need not open up the possibility of boards
running amok: the auto-refresh can be initiated by the ALCT recognizing an
upset and requesting the CLCT/TMB board to perform the refresh cycle. That
board has access to the DAQ and Slow Control systems for error logging, as
well as radiation-hard clock and control signals from the CCB board.

Nota bene: If we did nothing other than periodic refreshes of the entire
system (which won't happen), there would be a 0.73% effect.
This comes in equal parts from boards with at least
one SEU-affected chip and from the time taken up during system resets.

Less Quick Summary:

Columns:
    Plan   == label for SEU mitigation plan
    Voting == 2/3 majority voting implemented, working well (traps SEUs)
    Test   == ALCTs self-test during pixel refresh cycle (88 us at 1 Hz)
    Auto   == ALCT boards can initiate auto-refresh sequence
    Deadtime == average % deadtime plus error time (i.e. post-SEU)

Plan  Voting  Test  Auto  Deadtime  Comments
====  ======  ====  ====  ========  ========
  A       no    no    no    0.73 %  Optimal 150ms refresh every 40 seconds
  B      yes    no    no    0.063%  Refresh @ 5 min (0.013% error time)
  C       no   yes    no    0.013%  Refresh @ 5 min, only error boards
  D      yes    no   yes    0.001%  Refresh on 2/3 vote, before trig affected
  E       no   yes   yes    0.006%  (0.005% error time, 0.001% refresh)
  F      yes   yes   yes    0.001%  Better than Plan D if voting does not
                                    catch all errors

Notes:

Another benefit of majority voting or self-tests: errors are internally
detected and the erroneous ALCT outputs are suppressed so there is no
external effect on trigger.  This benefit is not realized by simply
resetting periodically.
 

VITAL DETAILS:

Input data:

The doses come from Huhtinen's talk part 1 and the plan for using protons comes from part 2.

We take the maximum neutron rate, which occurs in ME1/1, and apply the safety factor of 3.
The total neutron fluence is then 2x10^12/cm^2 in 5x10^7 sec of LHC running (10 years) at a luminosity of 10^34/cm^2/sec, which corresponds to a rate of 4x10^4/sec/cm^2.

The Altera sensitivity is measured to be 2.3x10^-9 cm^2 ref.

Refresh time takes about 150 ms = 0.15 s.

Each chip in ME1/1 therefore has upsets at maximum luminosity at a rate of 4x10^4 /sec/cm^2 * 2.3x10^-9 cm^2 = 9.2x10^-5 /sec, or about one upset every 10800 sec (three hours). Each board in ME1/1 has an upset about every 2700 sec, since there are 4 Altera chips per board.
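
The rate arithmetic above can be checked with a short Python sketch. The input numbers are the ones quoted in this note; the variable names are illustrative only. With unrounded values the chip upset interval comes out slightly above the rounded 10800 sec.

```python
# Inputs quoted in this note for ME1/1 (safety factor of 3 already included).
FLUENCE_PER_CM2 = 2e12       # neutrons/cm^2 over 10 years of running
RUN_TIME_S = 5e7             # seconds of LHC running in 10 years
CROSS_SECTION_CM2 = 2.3e-9   # measured Altera SEU sensitivity per chip
CHIPS_PER_BOARD = 4          # Altera chips on one ALCT board

flux = FLUENCE_PER_CM2 / RUN_TIME_S        # neutrons/cm^2/sec
chip_rate = flux * CROSS_SECTION_CM2       # upsets/sec for one chip
board_rate = chip_rate * CHIPS_PER_BOARD   # upsets/sec for one board

print(f"neutron flux : {flux:.1e} /cm^2/sec")
print(f"chip rate    : {chip_rate:.1e} /sec "
      f"(one upset per {1.0 / chip_rate:.0f} sec)")
print(f"board rate   : {board_rate:.1e} /sec "
      f"(one upset per {1.0 / board_rate:.0f} sec)")
```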

Periodic refresh calculation:

There are 72 ALCT boards in ME1/1, and 4x72=288 Altera chips.  On the one hand, the percentage of SEU-affected boards increases linearly with time between refreshes. On the other hand, the deadtime due to the refresh cycle decreases linearly with time between refreshes.  So there is an optimum time between refreshes. This optimum interval is different for the different muon stations, since the SEU rate differs.

If we refresh every 40 sec, then the time-averaged fraction of ME1/1 boards that have been SEU-affected is 1/2 * (40s/2700s) = 0.74%. The factor of 1/2 arises because the affected fraction grows linearly from zero after each refresh. The deadtime due to refresh is 0.15s/40s = 0.37%. The total affected time is then about 1.1%.

What about other stations? The neutron rate is a factor of four lower in ME2/1, ME3/1, and ME4/1, and lower still in the outer stations. The SEU-affected time there will therefore be 0.18% or below, and adding the refresh time brings the total in these stations to 0.18%+0.37%=0.55% or below.
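
The trade-off between SEU-affected time and refresh deadtime can be explored with the following Python sketch. It assumes only the simple linear model described above; the function names are illustrative and not part of any ALCT software. Under this model the minimum of the total falls at sqrt(2 * upset interval * refresh time), which is somewhat below 40 sec for ME1/1, but the curve is shallow near the minimum, so the 40 sec choice costs little.

```python
import math

BOARD_UPSET_INTERVAL_S = 2700.0   # ME1/1: about one upset per board per 2700 sec
REFRESH_TIME_S = 0.15             # duration of one refresh


def seu_affected_fraction(interval_s, upset_interval_s):
    # The affected fraction grows linearly from 0 to interval/upset_interval
    # between refreshes, so its time average is half the end value.
    return 0.5 * interval_s / upset_interval_s


def refresh_deadtime(interval_s, refresh_s=REFRESH_TIME_S):
    # Fraction of time spent refreshing.
    return refresh_s / interval_s


def total_loss(interval_s, upset_interval_s, refresh_s=REFRESH_TIME_S):
    return (seu_affected_fraction(interval_s, upset_interval_s)
            + refresh_deadtime(interval_s, refresh_s))


def optimum_interval(upset_interval_s, refresh_s=REFRESH_TIME_S):
    # Setting d/dT [T/(2*tau) + r/T] = 0 gives T = sqrt(2*tau*r).
    return math.sqrt(2.0 * upset_interval_s * refresh_s)


# The inner stations are taken to have a neutron rate four times lower.
stations = [("ME1/1", BOARD_UPSET_INTERVAL_S),
            ("ME2/1-ME4/1", 4.0 * BOARD_UPSET_INTERVAL_S)]

for label, tau in stations:
    print(f"{label}: 40 sec refresh -> "
          f"SEU {100 * seu_affected_fraction(40.0, tau):.2f}%, "
          f"deadtime {100 * refresh_deadtime(40.0):.2f}%, "
          f"total {100 * total_loss(40.0, tau):.2f}%; "
          f"model optimum at {optimum_interval(tau):.0f} sec")
```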

Voting calculation:

If we use the NASA technique of triplicating circuitry and then choosing the two that agree for the output, then the board supplies wrong results only when two or more circuits have encountered an SEU. According to Poisson statistics, this happens with probability P(2)=0.5*P^2 to leading order, where P is the expected number of SEUs since the last refresh. Since P is generally a small number, P(2) can be made extremely small.
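
As an illustration of the voting idea only, here is a minimal Python sketch of a bitwise 2-of-3 majority voter acting on three redundant copies of an output word. The real logic would live in the FPGA firmware; the function names and the 8-bit test pattern are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    # Each output bit is set if at least two of the three copies have it set.
    return (a & b) | (a & c) | (b & c)


def copies_disagree(a: int, b: int, c: int) -> bool:
    # True if any copy differs, i.e. an upset has been detected.
    return not (a == b == c)


good = 0b10110010             # hypothetical correct output word
upset = good ^ 0b00001000     # same word with one bit flipped by an SEU

# One upset copy: the vote still returns the correct word and the
# disagreement flags the error.
print(majority_vote(good, upset, good) == good)
print(copies_disagree(good, upset, good))

# The same bit upset in two copies: the vote returns the wrong word.
print(majority_vote(upset, upset, good) == good)
```

The last case is the two-SEU failure mode whose probability is estimated in this section.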

For instance, if the ALCT is refreshed every 300 seconds (5 minutes), then in ME1/1 we have Pmax=3*300sec/10800 sec=0.083.  The factor of 3 is there because the chip now contains three times as much circuitry. Pmax is the expected number of SEUs on a chip just before the refresh. We must also multiply by the factor 6/9=0.67 for the likelihood that two SEUs are in different logic chains. So actually Pmax=0.056.

The probability grows quadratically after refresh from zero up to P(2)=0.5*Pmax^2=0.15%. Averaged over time, the probability is one third of this, or about 0.05%. For other chambers with lower neutron rate, the probability is very much smaller.

The deadtime due to refresh in this case is 0.15s/300s = 0.050%. The total deadtime plus SEU-disturbed time is the sum of these, i.e. about 0.10%. This should be acceptable.
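
The voting estimate can be reproduced with the Python sketch below. It follows the steps of this section as written, including applying the 6/9 factor to Pmax, and takes the time average of a quadratically growing quantity as one third of its end value. The names are illustrative. Carrying unrounded values through, the total comes out near 0.10%.

```python
CHIP_UPSET_INTERVAL_S = 10800.0   # ME1/1: about one upset per chip per 10800 sec
REFRESH_TIME_S = 0.15             # duration of one refresh
TRIPLICATION_FACTOR = 3           # three times as much circuitry per chip
DIFFERENT_CHAIN_FACTOR = 6 / 9    # chance that two SEUs hit different chains


def voting_loss(interval_s):
    # Expected SEUs per chip just before refresh, scaled as in the text.
    p_max = (TRIPLICATION_FACTOR * interval_s / CHIP_UPSET_INTERVAL_S
             * DIFFERENT_CHAIN_FACTOR)
    p2_end = 0.5 * p_max ** 2      # two-SEU probability just before refresh
    p2_avg = p2_end / 3.0          # average of a t^2 growth over the interval
    deadtime = REFRESH_TIME_S / interval_s
    return {"p_max": p_max, "p2_end": p2_end, "p2_avg": p2_avg,
            "deadtime": deadtime, "total": p2_avg + deadtime}


result = voting_loss(300.0)
print(f"Pmax          : {result['p_max']:.3f}")
for key in ("p2_end", "p2_avg", "deadtime", "total"):
    print(f"{key:<14}: {100 * result[key]:.3f}%")
```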

An additional benefit is that the errors are internally detected and the erroneous ALCT outputs are suppressed.