US7114116B2 - Accelerated Galois data integrity crosscheck system and method - Google Patents

Publication number
US7114116B2
Authority
US
United States
Prior art keywords
information
data
error correction
exclusive
controller
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime, expires
Application number
US10/244,121
Other versions
US20040054956A1 (en)
Inventor
James Byrd
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle America Inc
Original Assignee
Sun Microsystems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Microsystems Inc
Priority to US10/244,121
Assigned to SUN MICROSYSTEMS, INC.: assignment of assignors interest (see document for details). Assignors: BYRD, JAMES
Publication of US20040054956A1
Application granted
Publication of US7114116B2
Assigned to Oracle America, Inc.: merger and change of name (see document for details). Assignors: Oracle America, Inc., ORACLE USA, INC., SUN MICROSYSTEMS, INC.
Adjusted expiration
Expired - Lifetime (current legal status)

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00: Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03: Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05: Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/09: Error detection only, e.g. using cyclic redundancy check [CRC] codes or single parity bit
    • H03M13/091: Parallel or block-wise CRC computation
    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00: Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/65: Purpose and implementation aspects
    • H03M13/6508: Flexibility, adaptability, parametrability and configurability of the implementation
    • H03M13/6516: Support of multiple code parameters, e.g. generalized Reed-Solomon decoder for a variety of generator polynomials or Galois fields

Definitions

  • This invention relates to the field of information error correction. More particularly, embodiments of the present invention relate to systems and methods for data checking.
  • auxiliary physical storage systems typically cannot tolerate memory error faults and the memory error faults usually have significant detrimental impacts on functionality and utility of a system.
  • Traditional information errors often result in crashes or terminations that impede smooth operation continuity.
  • the approach to memory errors is to reboot the system. Rebooting from a system crash due to a memory error is very inconvenient, usually wastes significant resources, and significantly increases the probability of data corruption.
  • Traditional memory error failure modes often result in lost information, and delays incurred during the reboot process waste time.
  • the present invention is an error management system and method that facilitates convenient error detection with minimal resource consumption in a manner that permits smooth and continuous operations.
  • a computer memory error management system and method of the present invention corrects memory errors and handles memory controller buffer corruption concerns.
  • the present invention permits error detection to be performed on a word basis (e.g., 32 bits in parallel).
  • An exclusive OR and shift function is performed on each bit of data and cyclic redundancy code (CRC) bit in parallel. If a feedback value (e.g., a standard poly divider) is a logical one it is also included in the exclusive OR function.
  • each data word is effectively multiplied by alpha to the first power, where alpha is a root solution to the applicable polynomial utilized to calculate the CRC.
  • the instructions are in assembly language configured with a machine instruction shift through carry.
  • FIG. 1A is a flow chart of a memory error management method, one embodiment of the present invention.
  • FIG. 1B is a block diagram of an error detection system in accordance with one embodiment of the present invention.
  • FIG. 2 is a block diagram illustration of a memory error management computer system, one exemplary system for implementing methods of the present invention.
  • FIG. 3 is a block diagram of one embodiment of a present invention memory controller.
  • FIG. 4 is a flow chart of a cyclic redundancy code process in accordance with one embodiment of the present invention.
  • FIG. 1A is a flow chart of data error correction method 100 , one embodiment of the present invention.
  • Data error correction method 100 facilitates error correction of data.
  • data error correction method 100 utilizes a Galois Finite Field Equation with primitive irreducible polynomials over GF(2) with linearly independent roots and the reciprocal polynomial with linearly independent roots.
  • data error correction method 100 enables management of disk array controller cache memory errors.
  • step 110 data and error correction code information is received.
  • a data block with a cyclic redundancy code (CRC) value attached to the end is received.
  • an error correction code (ECC) capable memory receives the data and error correction code.
  • the information is a plurality of bits (e.g., electrical signals representing logical ones and zeroes) received by a memory controller buffer location (e.g., a cell of a memory controller cache of a disk array memory system).
  • the present invention is readily adaptable to a variety of implementations, including receiving the data and error correction code information in a communication system. It is also compatible with information in different configurations, for example a memory controller buffer location may include a word comprising four bytes (32 bits) of information, two bytes (16 bits), eight bytes (64 bits), etc.
  • an exclusive OR function is performed on the data in parallel with the error correction code information on a word basis.
  • each bit of the error correction code is shifted through a register as an exclusive OR function is performed with a data bit.
  • a CRC value is shifted into a register after exclusive ORing it with a single data bit of a 32 bit word.
  • a feedback value is included in the exclusive OR function whenever the most significant bit of the CRC value is a logical one value.
  • step 130 the results of the exclusive OR function on the data and error correction code information are analyzed to determine if an error exists. For example, if a CRC value calculated in step 120 does not equal zero then a determination is made that an error exists.
  • an algorithm compares the error to a table that maps correctable errors (e.g., single bit error and select multi-bit errors). If the error is a correctable error the process proceeds to step 140 .
  • An error correction process is engaged in step 140 .
  • the error correction process corrects the errors inline as the information is accessed (e.g., during the data's transfer return to a PCI bus).
  • an exclusive OR (XOR) array technique is utilized to correct an error.
  • FIG. 1B is a block diagram of error detection system 500 , one embodiment of the present invention.
  • Error detection system 500 is one exemplary hardware circuit implementation of the present invention.
  • Error detection system 500 provides relatively fast and convenient calculation of a CRC value.
  • Error detection system 500 comprises registers 700 through 731, exclusive OR gates 800 through 832, feedback loop 950 (e.g., 0x04C11DB7) and data inputs 900 through 931.
  • Registers 700 through 731 and exclusive OR gates 800 through 832 are alternately coupled so that the input and output of each exclusive OR gate 800 through 832 are coupled to registers 700 through 731 respectively, except that the output of exclusive OR gate 832 is coupled to the input of exclusive OR gate 800.
  • Each data input 900 through 931 is coupled to an input of exclusive OR gate 801 through 832 respectively, and feedback values of a logical one are fed into the corresponding exclusive OR gate. For example, if the least significant portion of the feedback is a hexadecimal 7 (0111 binary) then exclusive OR gates 800 through 802 have a feedback and exclusive OR gate 803 does not.
  • FIG. 2 is a block diagram illustration of memory error management computer system 200 , one exemplary system for implementing methods of the present invention.
  • Memory error management computer system 200 comprises a host 210 and a memory device 250 .
  • In one embodiment of the present invention, memory device 250 is a peripheral or an auxiliary memory device (e.g., a disk array storage system).
  • Memory device 250 includes physical memory medium 240 (e.g., a disk array subsystem), and memory controller 220.
  • Memory controller 220 includes memory controller buffer 221.
  • memory controller buffer 221 is a cache.
  • Host 210 is coupled to memory controller 220 via an interconnect 270.
  • memory controller 220 is coupled to physical memory medium 240 .
  • host 210 is a server computer system that provides access and services to other computers (not shown).
  • the components of memory error management computer system 200 cooperatively operate to store, communicate and process information while efficiently managing memory errors that occur during the communication of information.
  • Host 210 executes processing operations that provide and receive information to and from memory device 250 .
  • Memory storage device 250 stores information while providing error detection management.
  • Information communicated between host 210 and memory storage device 250 is routed via memory controller 220 .
  • memory controller 220 provides an information communication interface that handles memory error detection and memory error correction of correctable errors (e.g., single bit errors).
  • information communicated between physical memory medium 240 and host server computer system 210 is routed via controller buffer 221.
  • overall memory input/output (I/O) operations involve host 210 writing and reading information to and from the physical memory medium 240.
  • information is sent (e.g., written) to and received (e.g., read) from the controller buffer 221.
  • Memory error management is performed when communicating (e.g., reading) the information from the controller buffer 221 to either the host 210 or the physical memory medium 240.
  • FIG. 3 is a block diagram of memory controller 300, one embodiment of a present invention memory controller (e.g., memory controller 220).
  • Memory controller 300 comprises controller processing core 310 , controller buffer 320 , XOR array 330 , accumulator 340 , backend interface 355 , front end interface 357 and communication bus 350 .
  • Communication bus 350 is coupled to controller processing core 310 , backend interface 355 , front end interface 357 and XOR array 330 which is coupled to controller buffer 320 and accumulator 340 .
  • controller processing core 310 comprises central processing unit (CPU) 311 for processing memory controller instructions, a non-volatile memory 312 (e.g., a read only memory ROM) for storing static information for CPU 311, volatile memory 313 (e.g., a random access memory RAM) for storing information for CPU 311, input and output (I/O) communication interface 314 (e.g., a peripheral component interconnect (PCI) bridge) for communicating information, and address data bus 315 for communicating information between components within controller processing core 310.
  • Controller processing core 310 directs the operations of memory controller 300 .
  • controller processing core 310 is utilized as a platform to implement present invention memory error management processes (e.g., a memory cell error resolution process of step 150 ).
  • Buffer 320 stores information being communicated by memory controller 300 (e.g., between a host and physical memory medium).
  • Logical exclusive OR (XOR) array 330 provides correction of single bit errors. The correction is performed inline as the data is processed through direct memory access (DMA) to a front end or back end interface.
  • Accumulator 340 stores information associated with the logic and arithmetic operations of the XOR array.
  • Back end interface 355 provides a communications interface to back end devices (e.g., a memory storage medium).
  • Front end interface 357 provides a communications interface to front end devices (e.g., a host).
  • Communication bus 350 communicates information between the components of memory controller 300 .
  • FIG. 4 is a flow chart of cyclic redundancy code process 400 .
  • the instructions of cyclic redundancy code process 400 are stored on a computer readable medium that when loaded in a processor cause the processor to perform a cyclic redundancy code process.
  • memory cell error resolution process prevents corrupted information in a memory controller buffer from causing synchronization and incoherency problems between a host and a physical memory storage medium.
  • step 410 information is received with data and cyclic redundancy code (CRC) value.
  • the information includes a 32 bit word.
  • the CRC value is set equal to the exclusive OR of the CRC information and the data in step 420 .
  • setting the CRC value equal to the exclusive OR of the CRC information and the data is performed on a 32 bit word basis.
  • a pointer to the data is incremented.
  • a shift of the CRC value is performed. The shift may be performed in a variety of ways including multiplying the CRC value info by two or adding the CRC value to itself.
  • an exclusive OR combination of the shifted CRC information and a feedback value is performed.
  • the instructions for cyclic redundancy code process 400 can be encoded in a variety of software languages.
  • C programming language is utilized. For example:
  • α^1 is alpha to the power of 1 and alpha is defined as the solution to the finite field polynomial equal to zero.
  • the function D(x) represents the data equation with coefficients of the powers of α^n from the finite field polynomial.
  • the present invention facilitates correction and resolution of errors that have detrimental impacts on the operations and functionality of systems utilizing error checking and correction memories.
  • a present invention error detection system and method facilitates convenient and efficient error correction.
  • the present invention system and method is compatible with a CRC that ensures a high level of randomness in its cyclic characteristics.
  • the CRC can be calculated rapidly on a word basis with minimal consumption of clock cycles for each word.

Abstract

The present invention permits error detection to be performed on a word basis (e.g., 32 bits in parallel). An exclusive OR function is performed on each bit of data and cyclic redundancy code (CRC) bit in parallel. If a feedback value (e.g., a standard poly divider) is a logical one it is also included in the exclusive OR function. The present invention is readily adaptable for use with a variety of CRC polynomials (e.g., any Galois Finite Field Equation with primitive irreducible polynomials over GF(2) with linearly independent roots and the reciprocal polynomial with linearly independent roots). In one embodiment, each data word is effectively multiplied by alpha to the first power, where alpha is a root solution to the applicable polynomial utilized to calculate the CRC. In one exemplary implementation of the present invention, the instructions are in assembly language configured with a machine instruction shift through carry.

Description

FIELD OF THE INVENTION
This invention relates to the field of information error correction. More particularly, embodiments of the present invention relate to systems and methods for data checking.
BACKGROUND OF THE INVENTION
Electronic systems and circuits have made a significant contribution towards the advancement of modern society and are utilized in a number of applications to achieve advantageous results. Numerous electronic technologies such as digital computers, calculators, audio devices, video equipment, and telephone systems have facilitated increased productivity and reduced costs in analyzing and communicating data, ideas and trends in most areas of business, science, education and entertainment. Realizing these advantageous results frequently requires systems to manipulate very large amounts of information. The information is often communicated between devices and errors sometimes occur in the communication of information to and from the devices. These errors often have detrimental impacts on the operations and functionality of the computer system.
Errors in the communication of information often cause system crashes. For example, traditional auxiliary physical storage systems typically cannot tolerate memory error faults and the memory error faults usually have significant detrimental impacts on functionality and utility of a system. Traditional information errors often result in crashes or terminations that impede smooth operation continuity. For example, in traditional auxiliary physical storage systems the approach to memory errors is to reboot the system. Rebooting from a system crash due to a memory error is very inconvenient, usually wastes significant resources, and significantly increases the probability of data corruption. Traditional memory error failure modes often result in lost information, and delays incurred during the reboot process waste time.
Traditional error detection systems and methods are typically either very complex or have very limited memory error detection capabilities (such as parity checking). Attempts at providing more robust error detection typically involve very complicated algorithms and/or convoluted information manipulation to detect an error. The traditional error detection systems and methods also typically perform a byte-wise calculation of an error detection code, which consumes significant processing resources. In other traditional systems that attempt to provide greater granularity in the error detection, the information is broken out by software one bit at a time and each bit is individually strobed into a register. Some systems may attempt to grab 32 bits of information at the same time and multiply each byte by 32, which in the end consumes essentially the same resources as breaking the information up and strobing it in one bit at a time.
What is required is a system and method that provides a fine granularity to error detection utilizing minimal resources.
SUMMARY
The present invention is an error management system and method that facilitates convenient error detection with minimal resource consumption in a manner that permits smooth and continuous operations. A computer memory error management system and method of the present invention corrects memory errors and handles memory controller buffer corruption concerns. The present invention permits error detection to be performed on a word basis (e.g., 32 bits in parallel). An exclusive OR and shift function is performed on each bit of data and cyclic redundancy code (CRC) bit in parallel. If a feedback value (e.g., a standard poly divider) is a logical one it is also included in the exclusive OR function. The present invention is readily adaptable for use with a variety of CRC polynomials (e.g., any Galois Finite Field Equation with primitive irreducible polynomials over GF(2) with linearly independent roots and the reciprocal polynomial with linearly independent roots). In one embodiment, each data word is effectively multiplied by alpha to the first power, where alpha is a root solution to the applicable polynomial utilized to calculate the CRC. In one exemplary implementation of the present invention, the instructions are in assembly language configured with a machine instruction shift through carry.
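As a rough illustration of the "multiply by alpha" step and the shift-through-carry idea summarized above, the following C sketch doubles a 32-bit CRC register and folds a feedback value back in whenever a one is carried out of the top bit. The function name `gf_mul_alpha` is hypothetical, and the feedback constant 0x04C11DB7 is assumed here because the description cites it as an example; this is a sketch under those assumptions, not the patent's own listing.

```c
#include <stdint.h>

/* Assumed feedback value: the example constant cited in the description. */
#define CRC_FEEDBACK 0x04C11DB7u

/* Hypothetical helper: multiply a 32-bit field element by alpha (that is,
 * by x) modulo the CRC polynomial. Adding the value to itself performs the
 * left shift; the bit that falls off the top plays the role of the carry. */
uint32_t gf_mul_alpha(uint32_t v)
{
    uint32_t carry = v >> 31;   /* most significant bit before the shift */
    v = v + v;                  /* shift left by one, wrapping modulo 2^32 */
    if (carry)
        v ^= CRC_FEEDBACK;      /* reduce: x^32 is replaced by the feedback */
    return v;
}
```

On a processor with a shift-through-carry or add-with-carry instruction, the `carry` test would typically come for free from the status flag, which is presumably why the summary mentions an assembly implementation.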
DESCRIPTION OF THE DRAWINGS
FIG. 1A is a flow chart of a memory error management method, one embodiment of the present invention.
FIG. 1B is a block diagram of an error detection system in accordance with one embodiment of the present invention.
FIG. 2 is a block diagram illustration of a memory error management computer system, one exemplary system for implementing methods of the present invention.
FIG. 3 is a block diagram of one embodiment of a present invention memory controller.
FIG. 4 is a flow chart of a cyclic redundancy code process in accordance with one embodiment of the present invention.
DETAILED DESCRIPTION
Reference will now be made in detail to the preferred embodiments of the invention, a computer memory error management system and method, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be obvious to one ordinarily skilled in the art that the present invention may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail as not to unnecessarily obscure aspects of the current invention.
Some portions of the detailed descriptions which follow are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within an electronic system. These descriptions and representations are the means used by those skilled in the digital arts to most effectively convey the substance of their work to others skilled in the art. A procedure, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in the electronic system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise or as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing”, “computing”, “calculating”, “determining”, “displaying”, or the like, refer to the action and processes of an electronic device that manipulates and transforms data represented as physical (electronic) quantities within the system (e.g., within registers, logic gates, memories, etc.) into other data similarly represented as physical quantities within the system devices or other such information storage, transmission or display devices.
FIG. 1A is a flow chart of data error correction method 100, one embodiment of the present invention. Data error correction method 100 facilitates error correction of data. In one embodiment of the present invention, data error correction method 100 utilizes a Galois Finite Field Equation with primitive irreducible polynomials over GF(2) with linearly independent roots and the reciprocal polynomial with linearly independent roots. In one exemplary disk array memory system implementation of the present invention, data error correction method 100 enables management of disk array controller cache memory errors.
In step 110, data and error correction code information is received. For example, a data block with a cyclic redundancy code (CRC) value attached to the end is received. In one embodiment of the present invention, an error correction code (ECC) capable memory receives the data and error correction code. In one exemplary implementation of the present invention, the information is a plurality of bits (e.g., electrical signals representing logical ones and zeroes) received by a memory controller buffer location (e.g., a cell of a memory controller cache of a disk array memory system). The present invention is readily adaptable to a variety of implementations, including receiving the data and error correction code information in a communication system. It is also compatible with information in different configurations, for example a memory controller buffer location may include a word comprising four bytes (32 bits) of information, two bytes (16 bits), eight bytes (64 bits), etc.
At step 120, an exclusive OR function is performed on the data in parallel with the error correction code information on a word basis. In one embodiment, each bit of the error correction code is shifted through a register as an exclusive OR function is performed with a data bit. In one embodiment of the present invention, a CRC value is shifted into a register after exclusive ORing it with a single data bit of a 32 bit word. In one exemplary implementation, a feedback value is included in the exclusive OR function whenever the most significant bit of the CRC value is a logical one value.
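The word-basis update of step 120 can be sketched in C roughly as follows: the running CRC is exclusive ORed with a whole 32 bit data word, then shifted 32 times, with the feedback value folded in whenever the most significant bit is a one. The names `crc_update_word` and `crc_block`, the zero starting value, and the use of 0x04C11DB7 as the feedback value are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed feedback value: the example constant cited in the description. */
#define CRC_FEEDBACK 0x04C11DB7u

/* Fold one 32 bit data word into the running CRC (illustrative sketch). */
uint32_t crc_update_word(uint32_t crc, uint32_t word)
{
    crc ^= word;                          /* XOR data and CRC on a word basis */
    for (int i = 0; i < 32; i++) {
        uint32_t msb = crc & 0x80000000u; /* bit about to be shifted out */
        crc <<= 1;
        if (msb)
            crc ^= CRC_FEEDBACK;          /* include feedback when MSB was one */
    }
    return crc;
}

/* Run the update over a block of words, starting from zero (an assumption). */
uint32_t crc_block(const uint32_t *words, size_t n)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < n; i++)
        crc = crc_update_word(crc, words[i]);
    return crc;
}
```

With a zero starting value, appending the computed CRC as the final word makes the value computed over the whole block come out to zero, which is the condition a receiver can test for.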
In step 130, the results of the exclusive OR function on the data and error correction code information are analyzed to determine if an error exists. For example, if a CRC value calculated in step 120 does not equal zero then a determination is made that an error exists. In another embodiment of the present invention, an algorithm compares the error to a table that maps correctable errors (e.g., single bit error and select multi-bit errors). If the error is a correctable error the process proceeds to step 140.
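One way the zero check and the table of correctable errors in step 130 might look is sketched below. It assumes a small fixed block of three data words plus one CRC word, a zero starting CRC, and 0x04C11DB7 as the feedback value; the table here covers only single bit errors, and all names (`build_syndrome_table`, `classify_block`, `BLOCK_WORDS`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC_FEEDBACK 0x04C11DB7u   /* assumed feedback value */
#define BLOCK_WORDS  4u            /* hypothetical: 3 data words + 1 CRC word */
#define BLOCK_BITS   (BLOCK_WORDS * 32u)

/* Word-wise CRC over n words, starting from zero (illustrative). */
static uint32_t crc_block(const uint32_t *w, size_t n)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < n; i++) {
        crc ^= w[i];
        for (int b = 0; b < 32; b++)
            crc = (crc << 1) ^ ((crc & 0x80000000u) ? CRC_FEEDBACK : 0u);
    }
    return crc;
}

/* syndrome_table[i] holds the nonzero CRC left by an error in bit i alone,
 * where bit i means bit (i % 32) of word (i / 32). */
static uint32_t syndrome_table[BLOCK_BITS];

void build_syndrome_table(void)
{
    for (unsigned bit = 0; bit < BLOCK_BITS; bit++) {
        uint32_t err[BLOCK_WORDS] = {0};
        err[bit / 32u] = (uint32_t)1 << (bit % 32u);
        syndrome_table[bit] = crc_block(err, BLOCK_WORDS);
    }
}

/* Returns 0 if the block checks clean, 1 if a single bit error was located
 * (its index is stored in *bit_out), or -1 if the error is not in the table. */
int classify_block(const uint32_t *block, unsigned *bit_out)
{
    uint32_t residue = crc_block(block, BLOCK_WORDS);
    if (residue == 0)
        return 0;
    for (unsigned bit = 0; bit < BLOCK_BITS; bit++) {
        if (syndrome_table[bit] == residue) {
            *bit_out = bit;
            return 1;
        }
    }
    return -1;
}
```

The lookup works because, with a zero starting value, the computation is linear over GF(2): the value computed over a corrupted block equals the value computed over the error pattern alone. A table that also maps selected multi-bit errors, as the text mentions, would need additional entries not shown here.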
An error correction process is engaged in step 140. The error correction process corrects the errors inline as the information is accessed (e.g., during the data's transfer return to a PCI bus). In one embodiment of the present invention, an exclusive OR (XOR) array technique is utilized to correct an error.
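The text does not detail the XOR array technique, so the following C sketch only illustrates the general idea under an assumption: once an erroneous bit has been located, exclusive ORing the data with a mask that has a one in that position restores the original value. The helper names `xor_correct_bit` and `xor_apply_mask` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: flip one bit of a word buffer by XORing in a mask.
 * The bit index counts bit (bit % 32) of word (bit / 32). */
void xor_correct_bit(uint32_t *words, size_t nwords, size_t bit)
{
    assert(bit / 32u < nwords);               /* index must lie in the buffer */
    words[bit / 32u] ^= (uint32_t)1 << (bit % 32u);
}

/* Hypothetical helper: apply a whole correction mask, word by word, as data
 * might be corrected inline while it streams out of a buffer. */
void xor_apply_mask(uint32_t *words, const uint32_t *mask, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        words[i] ^= mask[i];
}
```

Because XOR is its own inverse, applying the same mask twice leaves the data unchanged, which makes the operation safe to express as a simple streaming pass.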
FIG. 1B is a block diagram of error detection system 500, one embodiment of the present invention. Error detection system 500 is one exemplary hardware circuit implementation of the present invention. Error detection system 500 provides relatively fast and convenient calculation of a CRC value. Error detection system 500 comprises registers 700 through 731, exclusive OR gates 800 through 832, feedback loop 950 (e.g., 0x04C11DB7) and data inputs 900 through 931. Registers 700 through 731 and exclusive OR gates 800 through 832 are alternately coupled so that the input and output of each exclusive OR gate 800 through 832 are coupled to registers 700 through 731 respectively, except that the output of exclusive OR gate 832 is coupled to the input of exclusive OR gate 800. Each data input 900 through 931 is coupled to an input of exclusive OR gate 801 through 832 respectively, and feedback values of a logical one are fed into the corresponding exclusive OR gate. For example, if the least significant portion of the feedback is a hexadecimal 7 (0111 binary) then exclusive OR gates 800 through 802 have a feedback and exclusive OR gate 803 does not.
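The register and gate chain of FIG. 1B can be modeled in software roughly as below. The sketch assumes that stage i of the model corresponds to bit i of the feedback value 0x04C11DB7 and leaves out the data inputs, so it shows only how the feedback taps are selected and how one clock moves values along the chain. The exact mapping of the numbered gates 800 through 832 onto these stages is not spelled out in the text and is an assumption, as are the function names.

```c
#include <stdint.h>

#define CRC_FEEDBACK 0x04C11DB7u   /* example feedback value from the text */
#define STAGES 32

/* Returns 1 if the XOR gate feeding the given stage receives the feedback. */
unsigned stage_has_feedback(unsigned stage)
{
    return (CRC_FEEDBACK >> stage) & 1u;
}

/* One clock with no data input: each stage takes the previous stage's
 * value, XORed with the bit leaving the last stage where a tap is present. */
void lfsr_clock(uint8_t reg[STAGES])
{
    unsigned fb = reg[STAGES - 1];            /* bit leaving the last stage */
    for (unsigned i = STAGES - 1; i > 0; i--)
        reg[i] = (uint8_t)(reg[i - 1] ^ (fb & stage_has_feedback(i)));
    reg[0] = (uint8_t)(fb & stage_has_feedback(0));
}

/* Pack the one-bit stages into a 32 bit value for inspection. */
uint32_t lfsr_pack(const uint8_t reg[STAGES])
{
    uint32_t v = 0;
    for (unsigned i = 0; i < STAGES; i++)
        v |= (uint32_t)(reg[i] & 1u) << i;
    return v;
}
```

Consistent with the hexadecimal 7 example in the text, `stage_has_feedback` reports a tap for stages 0 through 2 and none for stage 3.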
FIG. 2 is a block diagram illustration of memory error management computer system 200, one exemplary system for implementing methods of the present invention. Memory error management computer system 200 comprises a host 210 and a memory device 250. The one embodiment of the present invention memory device 250 is a peripheral or an auxiliary memory device (e.g., a disk array storage system). Memory device 250 includes physical memory medium 240 (e.g., a disk array subsystem), and memory controller 220. Memory controller 220 includes memory controller buffers 221. In one embodiment of the present invention, memory controller buffer 221 is a cache. Host 210 is coupled to memory controller 220 via an interconnect 270, and memory controller 220 is coupled to physical memory medium 240. In one exemplary implementation of the present invention, host 210 is a server computer system that provides access and services to other computers (not shown).
The components of memory error management computer system 200, cooperatively operate to store, communicate and process information while efficiently managing memory errors that occur during the communication of information. Host 210 executes processing operations that provide and receive information to and from memory device 250. Memory storage device 250 stores information while providing error detection management. Information communicated between host 210 and memory storage device 250 is routed via memory controller 220. For example, memory controller 220 provides an information communication interface that handles memory error detection and memory error correction of correctable errors (e.g., single bit errors).
Referring still to FIG. 2, information communicated between physical memory medium 240 and host server computer system 210 is routed via controller buffer 221. In one exemplary implementation, overall memory input/output (I/O) operations involve host 210 writing and reading information to and from the physical memory medium 240. In order to perform the overall input/output operations information is sent (e.g., written) to and received (e.g., read) from the control buffer 221. Memory error management is performed when communicating (e.g., reading) the information from the controller buffer 221 to either the host 210 or the physical memory medium 240.
FIG. 3 is a block diagram of memory controller 300, one embodiment of a present invention memory controller (e.g., memory controller 220). Memory controller 300 comprises controller processing core 310, controller buffer 320, XOR array 330, accumulator 340, backend interface 355, front end interface 357 and communication bus 350. Communication bus 350 is coupled to controller processing core 310, backend interface 355, front end interface 357 and XOR array 330, which is coupled to controller buffer 320 and accumulator 340. In one exemplary implementation of the present invention, controller processing core 310 comprises central processing unit (CPU) 311 for processing memory controller instructions, a non-volatile memory 312 (e.g., a read only memory ROM) for storing static information for CPU 311, volatile memory 313 (e.g., a random access memory RAM) for storing information for CPU 311, input and output (I/O) communication interface 314 (e.g., a peripheral component interconnect (PCI) bridge) for communicating information, and address data bus 315 for communicating information between components within controller processing core 310.
The components of memory controller 300 cooperatively operate to facilitate fault tolerant memory error management. Controller processing core 310 directs the operations of memory controller 300. In one embodiment of the present invention, controller processing core 310 is utilized as a platform to implement present invention memory error management processes (e.g., a memory cell error resolution process of step 150). Controller buffer 320 stores information being communicated by memory controller 300 (e.g., between a host and physical memory medium). Logical exclusive OR (XOR) array 330 provides correction of single bit errors. The correction is performed inline as the data is processed through direct memory access (DMA) to a front end or back end interface. Accumulator 340 stores information associated with the logic and arithmetic operations of the XOR array. Back end interface 355 provides a communications interface to back end devices (e.g., a memory storage medium). Front end interface 357 provides a communications interface to front end devices (e.g., a host). Communication bus 350 communicates information between the components of memory controller 300.
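The rewrite, recheck and fence-off behavior recited in claims 9 to 11 can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the controller's actual firmware: the `buffer_model` type, the `stuck_bits` fault simulation, and the function names are all hypothetical, and the recheck here is a plain read-back comparison standing in for whatever error check the controller applies. The value passed as `reread` represents information reread from the physical memory medium.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LOCATIONS 4

/* Toy model of a controller buffer (hypothetical). stuck_bits simulates a
 * faulty cell whose bits always read back as 1; it exists only so the
 * sketch can exercise the fence-off path. */
typedef struct {
    uint32_t cell[NUM_LOCATIONS];
    uint32_t stuck_bits[NUM_LOCATIONS];
    bool     in_use[NUM_LOCATIONS];
    bool     fenced[NUM_LOCATIONS];
} buffer_model;

/* Write a word, then report whether it reads back unchanged (the recheck). */
static bool write_and_recheck(buffer_model *b, size_t loc, uint32_t word)
{
    b->cell[loc] = word | b->stuck_bits[loc];
    return b->cell[loc] == word;
}

/* Rewrite the reread information to the failing location and recheck it.
 * If the location still fails, fence it off and rewrite the information to
 * a different free location. Returns the location that now holds the
 * information, or -1 if no usable location remains. */
int resolve_buffer_error(buffer_model *b, size_t bad_loc, uint32_t reread)
{
    if (write_and_recheck(b, bad_loc, reread))
        return (int)bad_loc;            /* transient error: rewrite cured it */

    b->fenced[bad_loc] = true;          /* persistent error: stop using it */
    b->in_use[bad_loc] = false;
    for (size_t loc = 0; loc < NUM_LOCATIONS; loc++) {
        if (b->fenced[loc] || b->in_use[loc])
            continue;
        if (write_and_recheck(b, loc, reread)) {
            b->in_use[loc] = true;
            return (int)loc;
        }
        b->fenced[loc] = true;
    }
    return -1;
}
```

A caller would invoke `resolve_buffer_error` only after an error has been detected at `bad_loc`; a transient fault is cured in place, while a location that fails the recheck is fenced and the information lands elsewhere.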
FIG. 4 is a flow chart of cyclic redundancy code process 400. In one embodiment the instructions of cyclic redundancy code process 400 are stored on a computer readable medium that when loaded in a processor cause the processor to perform a cyclic redundancy code process. In one embodiment of the present invention, memory cell error resolution process prevents corrupted information in a memory controller buffer from causing synchronization and incoherency problems between a host and a physical memory storage medium.
In step 410, information including data and a cyclic redundancy code (CRC) value is received. In one exemplary implementation, the information includes a 32 bit word. In step 420, the CRC value is set equal to the exclusive OR of the CRC value and the data. For example, setting the CRC value equal to the exclusive OR of the CRC value and the data is performed on a 32 bit word basis. At step 430 a pointer to the data is incremented. In step 430 a shift of the CRC value is performed. The shift may be performed in a variety of ways, including multiplying the CRC value by two or adding the CRC value to itself. At step 440 an exclusive OR combination of the shifted CRC value and a feedback value is performed.
The instructions for cyclic redundancy code process 400 can be encoded in a variety of software languages. In one exemplary implementation, the C programming language is utilized. For example:
crc = crc ^ *data++; // one clock cycle for 4 bytes
if (crc & 0x80000000) // two clock cycles for test
 crc = (crc + crc) ^ 0x04c11db7; // two clock cycles (1 bit shift, xor)
else
 crc = crc + crc;
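The fragment above shows a single update. A self-contained sketch of how that update might be applied across a buffer of 32 bit words, and how the result might be compared against a stored value to decide whether an error exists, is given below. The function names, the `seed` parameter, and the comparison helper are assumptions made for illustration and do not appear in the source. Note that this word-basis update performs one shift per word, so its output is not expected to match a conventional bit-serial CRC-32 over the same data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feedback value taken from the listing (the 32 bit divider 0x04C11DB7). */
#define CRC_FEEDBACK 0x04C11DB7u

/* One word-basis update: XOR the data word into the CRC value, then shift
 * left by one bit and apply the feedback value if the top bit was set. */
static uint32_t crc_word_step(uint32_t crc, uint32_t word)
{
    crc ^= word;
    if (crc & 0x80000000u)
        return (crc << 1) ^ CRC_FEEDBACK;
    return crc << 1;
}

/* Hypothetical helper: apply the update to nwords words, starting from seed. */
uint32_t crc_words(const uint32_t *data, size_t nwords, uint32_t seed)
{
    uint32_t crc = seed;
    while (nwords--)
        crc = crc_word_step(crc, *data++);
    return crc;
}

/* Hypothetical helper: true when the recomputed value matches the stored
 * value, i.e. no error is indicated. */
bool crc_words_ok(const uint32_t *data, size_t nwords,
                  uint32_t seed, uint32_t stored)
{
    return crc_words(data, nwords, seed) == stored;
}
```

A caller might compute `crc_words` when information is written to the controller buffer, keep the result alongside the data, and call `crc_words_ok` when the information is read back out.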

In one exemplary implementation the instructions are in assembly language configured with a machine instruction shift through carry. The present invention CRC calculation can be represented mathematically by the following formula:
$R_x = [\alpha^{1} \cdot D(x)] \bmod (X + \alpha^{1})$.
Where $\alpha^{1}$ is alpha to the power of 1, and alpha is defined as the solution to the finite field polynomial equal to zero. The function $D(x)$ represents the data equation with coefficients of the powers of $\alpha^{n}$ from the finite field polynomial.
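One way to read the word-basis update in the C listing is as a recurrence in which each step adds a data word and multiplies by alpha. The following is a sketch derived from the listing under assumptions the source does not state: the initial CRC value $R_0$ is taken as zero, $D_k$ denotes the k-th 32 bit data word viewed as a polynomial over GF(2), $P$ denotes the degree 32 polynomial whose low-order coefficients are 0x04C11DB7, addition is exclusive OR, and products are reduced using $P(\alpha) = 0$.

```latex
\begin{aligned}
R_{k+1} &= \alpha \cdot (R_k + D_k), \qquad R_0 = 0, \\
R_1 &= \alpha \cdot D_0, \\
R_2 &= \alpha \cdot (\alpha \cdot D_0 + D_1) = \alpha^{2} \cdot D_0 + \alpha \cdot D_1, \\
R_n &= \sum_{k=0}^{n-1} \alpha^{\,n-k} \cdot D_k .
\end{aligned}
```

Under this reading, each word costs one exclusive OR and one multiplication by alpha (a one bit shift with conditional feedback), which is consistent with the per-word clock cycle counts given in the listing's comments.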
Thus, the present invention facilitates correction and resolution of errors that have detrimental impacts on the operations and functionality of systems utilizing error checking and correction memories. A present invention error detection system and method facilitates convenient and efficient error correction. The present invention system and method is compatible with a CRC that ensures a high level of randomness in its cyclic characteristics. The CRC can be calculated rapidly on a word basis with minimal consumption of clock cycles for each word.
The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order best to explain the principles of the invention and its practical application, thereby to enable others skilled in the art best to utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.

Claims (20)

1. A data error correction method, comprising:
receiving data and error correction code information;
performing an exclusive OR function of data in parallel with the error correction code information on a word basis, wherein said exclusive OR function provides a result equivalent to multiplying each word of said data by a root solution of a Galois finite field polynomial equation;
analyzing the results of the exclusive OR function to determine if an error exists; and
engaging in an error correction process.
2. The data error correction method of claim 1 wherein said data and error correction code information is stored as a plurality of bits in a memory controller buffer cell.
3. The data error correction method of claim 1 wherein said data and error correction code information is stored in a memory controller buffer of a disk array memory system.
4. The data error correction method of claim 1 wherein said word basis is a 32 bit word basis.
5. The data error correction method of claim 1 wherein performing said exclusive OR function of data in parallel with the error correction code information includes a feedback that includes a standard 32 bit divider value.
6. The data error correction method of claim 5, wherein said standard 32-bit divider value represents the hexadecimal value 0x04C11DB7.
7. The data error correction method of claim 1 wherein said error correction process includes rewriting information to a memory control buffer location.
8. An information communication controller, comprising:
a communication bus for communicating information between components of said information communication controller;
a controller processing core for directing operations of said information communication controller and providing a platform to implement data error correction process in which a cyclic redundancy code is determined by generating solutions for a Galois Finite Field polynomial equation, said controller processing core coupled to said communication bus;
an XOR array for detecting and correcting bit errors on an information word basis by exclusive oring data in parallel with a shift register, said XOR array coupled to said communication bus;
a buffer for storing information being communicated by said information communication controller between a host and a physical memory medium, said controller buffer coupled to said XOR array; and
an interface port for conveying information to or from another device, said interface port coupled to said communication bus.
9. The information communication controller of claim 8 wherein an error is detected at a location within a memory controller buffer and said controller processing core directs rewriting information to said location and rechecking for an error in said location.
10. The information communication controller of claim 9 wherein said controller processing core directs a reread of information from a physical memory medium and the reread information is utilized in said rewrite to said location.
11. The information communication controller of claim 9 wherein said controller processing core fences off said location and rewrites said information to a different location within said memory controller buffer.
12. The information communication controller of claim 9 wherein exclusive oring is performed on a word basis.
13. The information communication controller of claim 9 further comprising an accumulator for storing information associated with the logic and arithmetic operations of said XOR array, said accumulator coupled to said XOR array.
14. The information communication controller of claim 9 wherein said controller processing core directs tracking of error information.
15. A computer readable medium storing instructions configured to cause a processor to perform a cyclic redundancy code process, said instructions configured to implement:
receiving information including a data value and a cyclic redundancy code (CRC) value;
setting the CRC value equal to the exclusive OR of the CRC value and the data value;
incrementing a pointer to the data value;
performing a shift of the CRC value subsequent to setting the CRC value equal to the exclusive OR of the CRC value and the data value; and
performing an exclusive OR combination of the shifted CRC value and a feedback value to produce an updated CRC value;
analyzing the updated CRC value to determine if an error exists; and
engaging in an error correction process if the error exists.
16. The computer readable medium of claim 15 wherein said shift is performed by multiplying said CRC value info by two.
17. The computer readable medium of claim 15 wherein said shift is performed by adding said CRC value to itself.
18. The computer readable medium of claim 15 wherein said data value includes a 32 bit word.
19. The computer readable medium of claim 15 wherein said setting the CRC value equal to the exclusive OR of the CRC information and the data is performed on a 32 bit word basis.
20. The computer readable medium of claim 15 wherein said instructions are in assembly language configured with a machine instruction shift through carry.
US10/244,121 2002-09-13 2002-09-13 Accelerated Galois data integrity crosscheck system and method Expired - Lifetime US7114116B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/244,121 US7114116B2 (en) 2002-09-13 2002-09-13 Accelerated Galois data integrity crosscheck system and method

Publications (2)

Publication Number Publication Date
US20040054956A1 US20040054956A1 (en) 2004-03-18
US7114116B2 US7114116B2 (en) 2006-09-26

Family

ID=31991827

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/244,121 Expired - Lifetime US7114116B2 (en) 2002-09-13 2002-09-13 Accelerated Galois data integrity crosscheck system and method

Country Status (1)

Country Link
US (1) US7114116B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7188296B1 (en) 2003-10-30 2007-03-06 Sun Microsystems, Inc. ECC for component failures using Galois fields
US7516246B2 (en) * 2005-10-27 2009-04-07 International Business Machines Corporation Communications channel method for verifying integrity of untrusted subsystem responses to a request
KR100991957B1 (en) * 2009-01-20 2010-11-04 주식회사 팬택 Apparatus and method for scrambling sequence generation in a broadband wireless communication system

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4433388A (en) 1980-10-06 1984-02-21 Ncr Corporation Longitudinal parity
US4949342A (en) * 1987-04-14 1990-08-14 Matsushita Electric Industrial Co., Ltd. Code error detecting method
US5020060A (en) 1987-06-30 1991-05-28 Matsushita Electric Industrial Co., Ltd. Error code correction device having a galois arithmetic unit
US5177747A (en) * 1989-10-16 1993-01-05 International Business Machines Corp. Personal computer memory bank parity error indicator
US5303302A (en) * 1992-06-18 1994-04-12 Digital Equipment Corporation Network packet receiver with buffer logic for reassembling interleaved data packets
US5537421A (en) * 1988-10-07 1996-07-16 Advanced Micro Devices, Inc. Single chip error processor
US5606662A (en) * 1995-03-24 1997-02-25 Advanced Micro Devices, Inc. Auto DRAM parity enable/disable mechanism
US6374383B1 (en) 1999-06-07 2002-04-16 Maxtor Corporation Determining error locations using error correction codes
US6704838B2 (en) 1997-10-08 2004-03-09 Seagate Technology Llc Hybrid data storage and reconstruction system and method for a data storage device
US6738947B1 (en) 1999-10-25 2004-05-18 Matsushita Electric Industrial Co., Ltd. Method and apparatus for error correction
US6766493B1 (en) 2000-12-21 2004-07-20 Cisco Technology, Inc. Method and apparatus for generating and checking cyclic redundancy code (CRC) values using a CRC generator and binary galois field multiplier
US6779150B1 (en) * 2000-12-21 2004-08-17 Emc Corporation CRC error detection system and method

Non-Patent Citations (17)

* Cited by examiner, † Cited by third party
Title
"A Painless Guide to CRC Error Detection Algorithms," Version 3.00, Sep. 24, 1996, 19 pages.
"A Short Spiel on 1-Bit Error Correction Codes," printed Aug. 11, 2001, 2 pages.
"Before you start-Chapter 6: Break and Verify," (C) Laurence Soucy 1997/2001, 1 page.
"Cyclic Redundancy Check," Aug. 2, 1997, 1 page.
"Memory Tutorial," 1998, 3 pages.
Anarchriz, "CRC and How to Reverse it: A CRC Tutorial & The c001 Way to Reverse CRC," Apr. 1999, 14 pages.
Anastase Nakassis, "Fletcher's Error Detection Algorithm: How to Implement it Efficiently and How to Avoid the Most Common Pitfalls," ACM SIGCOMM Computer Communication Review, vol. 18, Issue 5, Oct. 1998, pp. 63-88.
Anh Vu, "Error Detection and Error Correction," Oct. 27, 1997, 7 pages.
Craig Partridge, et al., "Performance of Checksums and CRCs over Real Data," 1995 ACM, pp. 68-76.
Eric Smith, "Memory Error Detection and Correction: An Introduction to Parity and ECC," Aug. 23, 1996, 2 pages.
J. Zweig, et al., "TCP Alternate Checksum Options," Mar. 1990, 5 pages.
Jonathan Stone, et al., "Performance of Checksums and CRC's over Real Data," 1998 IEEE/ACM Transactions on Networking, vol. 6, No. 5, pp. 529-543.
Rick Kelly, "Error Management Using DesignWare CRC and ECC," 2001 Synopsys, Inc., 5 pages.
Satran, Sheinwald Informational, "iSCSI CRC Considerations," May 7, 2001, 18 pages.
Timothy Hsu, "How to Select System Memory," Jan. 9, 1999, 13 pages.
Wicker, Stephen B. Error Control Systems for Digital Communication and Storage, Prentice Hall, pp. 100-123. *
ZDNet UK, "Parity RAM and ECC," PC Magazine Guide to Servers, 1999, 1 page.

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060282743A1 (en) * 2005-05-24 2006-12-14 Michael Kounavis Instructions for performing modulo-2 multiplication and bit reflection
US7590930B2 (en) * 2005-05-24 2009-09-15 Intel Corporation Instructions for performing modulo-2 multiplication and bit reflection
US9495165B2 (en) 2009-12-17 2016-11-15 Intel Corporation Method and apparatus for performing a shift and exclusive or operation in a single instruction
US9495166B2 (en) 2009-12-17 2016-11-15 Intel Corporation Method and apparatus for performing a shift and exclusive or operation in a single instruction
US9501281B2 (en) 2009-12-17 2016-11-22 Intel Corporation Method and apparatus for performing a shift and exclusive or operation in a single instruction
US9747105B2 (en) 2009-12-17 2017-08-29 Intel Corporation Method and apparatus for performing a shift and exclusive or operation in a single instruction
US10684855B2 (en) 2009-12-17 2020-06-16 Intel Corporation Method and apparatus for performing a shift and exclusive or operation in a single instruction
US10243583B2 (en) 2017-06-16 2019-03-26 Western Digital Technologies, Inc. CPU error remediation during erasure code encoding

Legal Events

Date Code Title Description
AS Assignment

Owner name: SUN MICROSYSTEMS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BYRD, JAMES;REEL/FRAME:013303/0592

Effective date: 20020911

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: ORACLE AMERICA, INC., CALIFORNIA

Free format text: MERGER AND CHANGE OF NAME;ASSIGNORS:ORACLE USA, INC.;SUN MICROSYSTEMS, INC.;ORACLE AMERICA, INC.;REEL/FRAME:037302/0616

Effective date: 20100212

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553)

Year of fee payment: 12