Design of highly reliable memory subsystem

Although continuing cost and performance improvements in the new bipolar and MOS RAM devices provide strong incentives for their greatly expanded use in mainframe memory and other storage applications, these components have not yet reached the degree of reliability required for large memory systems. Memory error frequency will increase with the use of larger memory systems and higher-density RAMs, which are more susceptible to soft errors because of their smaller memory cell geometry. The AM2960 Error Detection and Correction (EDC) unit allows correction of any single-bit error and detection of all double-bit and some triple-bit errors. In this paper, the design of a highly reliable memory subsystem based on the AM2960 is presented. Since the AM2960 EDC unit is relied upon to improve the reliability of the overall system, diagnostic software is developed to check the operation of the EDC itself. Together with the AM2960, the AM2962 4-bit Error Correction Multiple Bus Buffer is used to provide the complete data path interface between the AM2960, the RAM memory and the system data bus. In all present EDC systems, the processor stops upon detection of a double-bit error. In this system, the design of Error Logging and Selective Memory Scrubbing to reduce the chance of double-bit errors is discussed. Additionally, to keep the system running upon detection of double-bit errors, techniques of memory spare block redundancy and memory block power switching are also presented, with minimal increase in cost and power.
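The single-error-correcting, double-error-detecting behaviour described above can be illustrated with a minimal sketch. It assumes a textbook extended Hamming (8,4) code on a 4-bit value, which is not the check-bit scheme or word width of the AM2960; the function names `encode` and `decode` are hypothetical and chosen only for this illustration.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1, 2 and 4 hold
    Hamming check bits; indices 3, 5, 6 and 7 hold the data bits.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    # The check bit at position p covers every position whose index has bit p set.
    for p in (1, 2, 4):
        for i in range(1, 8):
            if i & p and i != p:
                code[p] ^= code[i]
    # The overall parity bit makes the total number of 1s even.
    for i in range(1, 8):
        code[0] ^= code[i]
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'double'."""
    code = list(code)  # work on a copy so the caller's word is untouched
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of the indices of all set bits
    overall = 0
    for bit in code:
        overall ^= bit

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: treat as a single-bit error and flip it back.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity with a nonzero syndrome: two bits flipped.
        return None, "double"

    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


word = encode(0b1011)
word[5] ^= 1                      # inject a single-bit soft error
print(decode(word))               # (11, 'corrected')
word[2] ^= 1                      # a second error in the same word
print(decode(word))               # (None, 'double')
```

In this sketch a double-bit error is only reported, not repaired, which is the situation in which the processor would otherwise stop. Rewriting the corrected word to memory after a 'corrected' result is the basic idea behind scrubbing: it removes a single-bit soft error before a second one can accumulate in the same word.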