Quick Summary:
3x the maximum estimated LHC neutron rate assumed.
* If triple voting works well, SEU damage is negligible and is tracked.
* If not, then self-testing with trigger patterns during the 88 us pixel
  detector refresh cycle also results in negligible SEU damage. This probably
  requires 2 man-months of engineering, which is much less than what is
  required to switch from Altera to Xilinx (1 man-year for conversion, 0.5
  year for modification of the design, e.g. Xilinx flat-pack arrays have
  fewer I/O pins).
* Allowing ALCT boards to initiate an auto-refresh sequence cuts down the
  problem tremendously. This need not carry the risk of boards running amok:
  the auto-refresh can be initiated by the ALCT recognizing an upset and
  requesting the CLCT/TMB board to perform the refresh cycle. That board has
  access to the DAQ and Slow Control systems for error logging, as well as
  radiation-hard clock and control signals from the CCB board.
Nota bene: If we did nothing other than periodic refreshes of the entire
system (which won't happen), there would be a 0.73% effect. This is due to
equal effects of boards with at least one SEU-affected chip, plus the time
taken up during system resets.
Less Quick Summary:
Columns:
  Plan     == label for SEU mitigation plan
  Voting   == 2/3 majority voting implemented, working well (traps SEUs)
  Test     == ALCTs self-test during pixel refresh cycle (88 us at 1 Hz)
  Auto     == ALCT boards can initiate auto-refresh sequence
  Deadtime == average % deadtime plus error time (i.e. post-SEU)
Plan Voting Test Auto Deadtime Comments
==== ====== ==== ==== ======== ========
A    no     no   no   0.73 %   Optimal 150 ms refresh every 40 seconds
B    yes    no   no   0.063%   Refresh @ 5 min (0.013% error time)
C    no     yes  no   0.013%   Refresh @ 5 min, only error boards
D    yes    no   yes  0.001%   Refresh when 2/3, before affects trig
E    no     yes  yes  0.006%   (0.005% error time, 0.001% refresh)
F    yes    yes  yes  0.001%   Better than Plan D if voting does not
                               catch all errors
Notes:
Another benefit of majority voting or self-tests: errors are internally
detected and the erroneous ALCT outputs are suppressed, so there is no
external effect on the trigger. This benefit is not realized by simply
resetting periodically.
VITAL DETAILS:
Input data:
The doses come from Huhtinen's talk part 1 and the plan for using protons comes from part 2.
Let's take the maximum neutron rate, which is in ME1/1, and apply the
safety factor of 3.
The total number of neutrons is then 2x10^12/cm^2 in 5x10^7 sec of
LHC running (10 years) at a luminosity of 10^34. This is a rate of 4x10^4/sec/cm^2.
The Altera sensitivity is measured to be 2.3x10^-9 cm^2 ref.
Refresh time takes about 150 ms = 0.15 s.
Each chip in ME1/1 therefore has upsets at maximum luminosity at the rate 4x10^4 /sec/cm^2 * 2.3x10^-9 cm^2 = 9.2x10^-5 /sec, or one upset every 10800 sec (three hours). Each board in ME1/1 has an upset every 2700 sec, since there are 4 Altera chips per board.
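
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the constant and function names are illustrative, and the inputs are the figures quoted in this section (the fluence already includes the safety factor of 3).

```python
# Inputs quoted in the text (names are illustrative, not from any ALCT software).
TOTAL_NEUTRON_FLUENCE = 2e12   # neutrons/cm^2 over 10 years, safety factor 3 included
LHC_RUNNING_TIME = 5e7         # seconds of LHC running in 10 years
ALTERA_CROSS_SECTION = 2.3e-9  # measured SEU sensitivity, cm^2 per chip
CHIPS_PER_BOARD = 4            # Altera chips on one ALCT board


def upset_rates():
    """Return (neutron flux, upsets/sec per chip, upsets/sec per board)."""
    flux = TOTAL_NEUTRON_FLUENCE / LHC_RUNNING_TIME  # neutrons/sec/cm^2
    chip_rate = flux * ALTERA_CROSS_SECTION          # upsets/sec per chip
    board_rate = chip_rate * CHIPS_PER_BOARD         # upsets/sec per board
    return flux, chip_rate, board_rate


flux, chip_rate, board_rate = upset_rates()
print(f"flux             = {flux:.3g} /sec/cm^2")
print(f"chip upset rate  = {chip_rate:.3g} /sec, one every {1 / chip_rate:.0f} sec")
print(f"board upset rate = {board_rate:.3g} /sec, one every {1 / board_rate:.0f} sec")
```

The exact intervals come out slightly above the rounded 10800 sec and 2700 sec used in the rest of this note.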
Periodic refresh calculation:
There are 72 ALCT boards in ME1/1, and 4x72=288 Altera chips. On the one hand, the percentage of SEU-affected boards increases linearly with time between refreshes. On the other hand, the deadtime due to the refresh cycle falls in inverse proportion to the time between refreshes. So there is an optimum time between refreshes. This optimum interval is different for the different muon stations, since the SEU rate differs.
If we refresh every 40 sec, then the time-averaged percentage of ME1/1 boards that have been SEU-affected is 1/2 * (40s/2700s) = 0.7%. The deadtime due to refresh is 0.15s/40s=0.37%. The total affected time is then 1.07%.
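
The trade-off can be written as a small Python sketch. It assumes the simple model used here: SEU-affected fraction T/(2*2700 s) plus refresh deadtime 0.15 s/T for a refresh interval T. The function names are illustrative. Under this model the minimum lies where the two terms are equal, at T = sqrt(2 * 2700 s * 0.15 s), which is somewhat shorter than 40 sec; the curve is shallow near the minimum, so the total at 40 sec is only slightly higher.

```python
import math

BOARD_UPSET_INTERVAL = 2700.0  # sec between upsets on one ME1/1 board (from the text)
REFRESH_TIME = 0.15            # sec taken by one refresh (from the text)


def affected_fraction(interval):
    """Return (SEU-affected fraction, refresh deadtime fraction, total)."""
    # Upset probability grows linearly after a refresh, so the time average is half the peak.
    seu = 0.5 * interval / BOARD_UPSET_INTERVAL
    dead = REFRESH_TIME / interval
    return seu, dead, seu + dead


def optimum_interval():
    """Interval that minimises T/(2*tau) + r/T, found by setting the derivative to zero."""
    return math.sqrt(2.0 * BOARD_UPSET_INTERVAL * REFRESH_TIME)


seu_40, dead_40, total_40 = affected_fraction(40.0)
t_opt = optimum_interval()
seu_opt, dead_opt, total_opt = affected_fraction(t_opt)
print(f"40 sec refresh: SEU {seu_40:.2%}, deadtime {dead_40:.2%}, total {total_40:.2%}")
print(f"model optimum {t_opt:.1f} sec: total {total_opt:.2%}")
```

Passing a longer upset interval (for example 4 * 2700 s for the stations with a four times lower neutron rate) is a one-line change to BOARD_UPSET_INTERVAL.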
What about other stations? The neutron rate is a factor of four lower in ME2/1, ME3/1, and ME4/1, and lower still in the outer stations. The SEU-affected time will therefore be below 0.18%, while the sum with the refresh time brings the total in these stations to 0.18%+0.37%=0.55% or below.
Voting calculation:
If we use the NASA technique of triplicating circuitry and then choosing the two that agree for the output, then the board supplies wrong results only when two or more circuits have encountered an SEU. According to Poisson statistics, this only happens with probability P(2)=0.5*P^2, where P is the expected number of SEUs (for small P, the probability of a single SEU). Since P is generally a small number, P(2) can be made extremely small.
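
As an illustration of the voting idea only (this is not the ALCT firmware, and the function name is hypothetical), a bitwise 2-of-3 majority vote over three redundant copies of a word can be sketched in Python. It also flags any disagreement, which is what allows an upset to be detected and logged even while the voted output is still correct.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three redundant copies of the same word.

    Returns (voted_word, disagreement), where disagreement is True if any
    copy differs from the voted result in at least one bit.
    """
    voted = (a & b) | (a & c) | (b & c)
    disagreement = ((a ^ voted) | (b ^ voted) | (c ^ voted)) != 0
    return voted, disagreement


# One upset copy: the output is still correct and the error is flagged.
print(majority_vote(0b1010, 0b1010, 0b1110))
# Same bit upset in two copies: the voter is outvoted and the output is wrong.
print(majority_vote(0b0001, 0b0001, 0b0000))
```

The second call shows the failure case considered in this calculation: two SEUs in different logic chains defeat the vote.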
For instance, if the ALCT is refreshed every 300 seconds (5 minutes), then in ME1/1 we have Pmax=3*300sec/10800 sec=0.084. The factor of 3 at the front is there because the chip now contains three times as much circuitry. Pmax is the value of P reached just before the refresh, from which the likelihood of two SEUs on a chip follows. We must also multiply by the factor 6/9=0.67 for the likelihood that these two SEUs are in different logic chains. So actually Pmax=0.054.
The probability grows quadratically after refresh from zero up to P(2)=0.5*Pmax^2=0.146%. On time average, the probability is one third of this, or 0.049%. For other chambers with lower neutron rate, the probability is very much smaller.
The deadtime due to refresh in this case is 0.15s/300s = 0.050%. The total deadtime plus SEU-disturbed time is the sum of these, i.e. 0.099%. This should be acceptable.
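
The voting estimate can be reproduced with the sketch below, which follows the steps of this section as written: the factor 6/9 is applied to Pmax, P(2)=0.5*Pmax^2, and the time average of a quadratically growing probability is one third of its peak. Names are illustrative. Exact arithmetic gives values within a few percent of the rounded figures quoted in the text, not identical ones.

```python
CHIP_UPSET_INTERVAL = 10800.0     # sec between upsets on one ME1/1 chip (from the text)
REFRESH_TIME = 0.15               # sec taken by one refresh (from the text)
TRIPLICATION = 3                  # three times as much circuitry per chip
DIFFERENT_CHAIN_FACTOR = 6.0 / 9  # chance that two SEUs hit different logic chains


def voting_affected_fraction(interval):
    """Return (Pmax, peak P(2), time-averaged P(2), refresh deadtime, total)."""
    p_max = TRIPLICATION * interval / CHIP_UPSET_INTERVAL * DIFFERENT_CHAIN_FACTOR
    p2_peak = 0.5 * p_max ** 2  # small-P Poisson probability of two SEUs
    p2_avg = p2_peak / 3.0      # average of (t/T)^2 over one refresh interval
    dead = REFRESH_TIME / interval
    return p_max, p2_peak, p2_avg, dead, p2_avg + dead


p_max, p2_peak, p2_avg, dead, total = voting_affected_fraction(300.0)
print(f"Pmax = {p_max:.3f}")
print(f"peak P(2) = {p2_peak:.3%}, time average = {p2_avg:.3%}")
print(f"refresh deadtime = {dead:.3%}, total = {total:.3%}")
```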
An additional benefit is that the errors are internally detected and the erroneous ALCT outputs are suppressed.