Translation - Japanese: 非同期SRAMのソフトエラーを軽減させる様々な方法 - KBA90939 - Community Translated (JA)
Question: What are the different ways to mitigate soft errors in Asynchronous SRAMs?
The following methods are commonly used to mitigate soft errors:
Changes in process technology and cell layout
A high-energy particle striking an SRAM cell generates charge in the form of electron-hole pairs. The electric field in the depletion region causes this charge to be collected by the junction of the transistor, which disturbs the current in the affected MOS structure. The restoring transistor tries to balance this disturbance. However, the finite current drive and channel conductance of the restoring MOS induce a voltage disturbance at its drain that can result in an upset. QCRIT is defined as the minimum charge collected from a particle strike that can cause a soft error. A system with a high QCRIT is less vulnerable to soft errors.
Figure 1. Interaction of a High-Energy Particle on an SRAM Cell
A higher QCRIT can be achieved in one of two ways. You can increase the junction capacitance, which requires larger transistor geometries, or you can increase the saturation current (by lowering the PMOS VT), which in turn results in higher leakage. Process technology and cell layout mitigation techniques come at a cost and are not always feasible.
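The two levers above can be illustrated with a first-order estimate that is often quoted in the soft-error literature, QCRIT ≈ C_node × VDD + I_restore × t_flip. This model is not part of this KBA, and the function name and all device values below are hypothetical, chosen only to show the direction of each effect.

```python
def critical_charge(node_capacitance_f, supply_voltage_v,
                    restoring_current_a, flip_time_s):
    """First-order QCRIT estimate in coulombs (assumed model, not from the KBA).

    QCRIT ~ C_node * VDD + I_restore * t_flip
    """
    return (node_capacitance_f * supply_voltage_v
            + restoring_current_a * flip_time_s)


# Hypothetical baseline cell: 1 fF node, 1.0 V supply,
# 20 uA restoring current, 10 ps flip time.
baseline = critical_charge(1e-15, 1.0, 20e-6, 10e-12)

# Lever 1: double the junction (node) capacitance, i.e. larger geometries.
more_capacitance = critical_charge(2e-15, 1.0, 20e-6, 10e-12)

# Lever 2: double the restoring (saturation) current, i.e. lower PMOS VT.
more_current = critical_charge(1e-15, 1.0, 40e-6, 10e-12)

for label, charge in [("baseline", baseline),
                      ("more capacitance", more_capacitance),
                      ("more current", more_current)]:
    print(f"{label}: {charge * 1e15:.2f} fC")
```

Both changes raise the estimated QCRIT, but the model says nothing about the area or leakage cost, which is why these techniques are not always feasible.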
Changes in chip-design and architecture
Architectural enhancements, such as embedded error-correcting code (ECC) and bit interleaving, can be used to limit the effects of soft errors on memory devices.
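As a rough illustration of how single-error-correcting ECC works, the sketch below uses a textbook Hamming(7,4) code. This is an assumption made for brevity: the KBA does not state which code the embedded ECC uses, and real devices typically protect wider words. The function names are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (0..15) as a 7-bit codeword.

    The returned list holds bit positions 1..7 in order:
    p1 p2 d0 p4 d1 d2 d3
    """
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the LSB
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (nibble, syndrome).

    The syndrome is 0 for a clean word, otherwise the 1-based
    position of the bit that was corrected.
    """
    bits = list(codeword)
    syndrome = 0
    for position in range(1, 8):
        if bits[position - 1]:
            syndrome ^= position
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the faulty bit back
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


codeword = hamming74_encode(0b1011)
codeword[4] ^= 1  # simulate a soft error at position 5
recovered, syndrome = hamming74_decode(codeword)
print(recovered == 0b1011, syndrome)
```

A code like this corrects only one bit per word. Two flipped bits in the same word would be miscorrected, which is the motivation for combining ECC with bit interleaving.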
The typical bit-interleave distance depends on the process technology. Accelerated neutron testing, followed by a physical multi-bit upset (MBU) analysis, is performed to determine the safe interleaving distance for each process technology node.
Figure 2. Non-Interleaved Memory–Physical Multiple-Cell Upset Resulting in an MBU in a Single Word
Figure 3. Interleaved Memory Array–Avoid MBU by Spreading the Data Word
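The effect shown in Figures 2 and 3 can be sketched with a small model of one physical row. The word width, interleave distance, and column mapping below are hypothetical and chosen only for illustration; actual array layouts are device specific.

```python
from collections import Counter

WORD_BITS = 8      # hypothetical logical word width
WORDS_PER_ROW = 4  # hypothetical interleave distance (words sharing a row)


def word_of_column(column, interleaved):
    """Return the logical word that owns a physical column."""
    if interleaved:
        # Adjacent columns belong to different words.
        return column % WORDS_PER_ROW
    # Non-interleaved: each word occupies WORD_BITS adjacent columns.
    return column // WORD_BITS


def upsets_per_word(first_column, cluster_width, interleaved):
    """Count flipped bits per logical word for a cluster of adjacent cells."""
    columns = range(first_column, first_column + cluster_width)
    return Counter(word_of_column(c, interleaved) for c in columns)


# A particle strike upsets three physically adjacent cells.
plain = upsets_per_word(9, 3, interleaved=False)
spread = upsets_per_word(9, 3, interleaved=True)

# A cluster wider than the interleave distance defeats the scheme.
too_wide = upsets_per_word(9, 5, interleaved=True)

print("non-interleaved:", dict(plain))
print("interleaved:", dict(spread))
print("cluster wider than interleave distance:", dict(too_wide))
```

In the non-interleaved case all three upsets land in one word, which a single-error-correcting code cannot repair. With interleaving, each word sees at most one flipped bit, unless the upset cluster is wider than the interleave distance.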
At the system level, soft errors can be mitigated in the following ways:
While simple to implement, system-level mitigation using the above schemes requires a larger board area, raises cost, and incurs performance penalties (delay introduced by the processing overhead of software ECC or a triple modular redundancy scheme).
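A triple modular redundancy scheme can be sketched as a bitwise majority vote over three stored copies of each word. The function names and the in-memory list standing in for three memory devices are hypothetical; a real system would read from three separate SRAMs, which is where the board area and cost overhead comes from.

```python
def majority_vote(copy_a, copy_b, copy_c):
    """Bitwise majority of three integer words."""
    return (copy_a & copy_b) | (copy_a & copy_c) | (copy_b & copy_c)


def tmr_read(copies):
    """Vote across three copies and scrub any copy that disagrees.

    `copies` is a mutable list of three integers standing in for the
    same address read from three redundant memories.
    Returns (voted_value, indexes_of_repaired_copies).
    """
    voted = majority_vote(*copies)
    repaired = [i for i, value in enumerate(copies) if value != voted]
    for index in repaired:
        copies[index] = voted  # write the corrected value back
    return voted, repaired


stored = [0xA5, 0xA5, 0xA5]
stored[1] ^= 0x10  # simulate a soft error in the second copy
value, repaired = tmr_read(stored)
print(hex(value), repaired, [hex(v) for v in stored])
```

Every read now costs three memory accesses plus the vote, which is the kind of processing overhead behind the performance penalty described above.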
For more information, refer to the following KBAs: