WO2009082598A1 - Determining a message residue - Google Patents

Determining a message residue

Info

Publication number
WO2009082598A1
Authority
WO
WIPO (PCT)
Prior art keywords
segments
message
segment
polynomial
significant
Prior art date
Application number
PCT/US2008/085284
Other languages
French (fr)
Inventor
Vinodh Gopal
Michael Kounavis
Gilbert Wolrich
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/959,142 external-priority patent/US7886214B2/en
Application filed by Intel Corporation filed Critical Intel Corporation
Publication of WO2009082598A1 publication Critical patent/WO2009082598A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/724 Finite field arithmetic
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/09 Error detection only, e.g. using cyclic redundancy check [CRC] codes or single parity bit
    • H03M13/091 Parallel or block-wise CRC computation

Definitions

  • a variety of computer applications operate on messages to create a message residue.
  • the residue can represent message contents much more compactly.
  • message residues are frequently used to determine whether data transmitted over network connections or retrieved from a storage device may have been corrupted. For instance, a noisy transmission line may change a "1" signal to a "0", or vice versa.
  • a message is often accompanied by its message residue.
  • a receiver of the data can then independently determine a residue for the message and compare the determined residue with the received residue.
  • a common message residue is known as a Cyclic Redundancy Check (CRC).
  • CRC Cyclic Redundancy Check
  • a CRC computation is based on interpreting a stream of message bits as coefficients of a polynomial. For example, a message of "1010" corresponds to a polynomial of $(1 \cdot x^3) + (0 \cdot x^2) + (1 \cdot x^1) + (0 \cdot x^0)$ or, more simply, $x^3 + x^1$.
  • the message polynomial is divided by another polynomial known as the modulus. For example, the other polynomial may be "11" or $x + 1$.
  • a CRC is the remainder of a division of the message by the polynomial.
  • CRC polynomial division is somewhat different than ordinary division in that it is computed over the finite field GF(2) (i.e., the set of integers modulo 2). More simply put: even number coefficients become zeroes and odd number coefficients become ones.
  • FIG. 1 is a diagram illustrating repeated reduction of data segments used to represent a message.
  • FIG. 2 is a diagram illustrating a reduction constant.
  • FIG. 3 is a flow chart illustrating determination of a message residue.
  • FIG. 4 is a diagram illustrating repeated reduction of message segments used to represent a message.
  • FIG. 5 is a flow-chart of a process to determine a message residue.
  • FIG. 6 is code illustrating reduction of message segments used to represent a message.
  • FIG. 7 is a sample execution schedule of instructions to reduce message segments used to represent a message.
  • FIG. 8 is a sample execution schedule of instructions to reduce message segments used to represent a message.
  • FIG. 9 is a diagram of a sample binary polynomial multiplication circuit.
  • FIG. 10 is a diagram illustrating a byte shuffle operation to re-order message bits in order of significance.
  • FIG. 11 is a diagram illustrating segment reduction of a message having bits arranged in little-endian format.
  • FIG. 12 is a diagram illustrating a relationship between multiplication of operands and multiplication of bit reflected operands.
  • FIG. 13 is code illustrating reduction of message segments having bits arranged in big-endian format.
  • FIG. 1 depicts a message S 100.
  • the bits of the message S can be handled as the coefficients of a polynomial S(x).
  • a 32-bit CRC of S can be defined as:
  • g is a 33-bit polynomial.
  • Different values of g have been defined for a variety of applications. For example, iSCSI (Internet Small Computer System Interface) defines a value of $\text{11EDC6F41}_{16}$ for g. Other applications have selected different polynomial sizes and values.
  • the resulting CRC value is stored with the message, S, in the empty 32-least significant bits created by the $2^{32}$ shift. Alternately, the value may be stored in a packet field or other location. Recomputing the CRC value and comparing with the stored value can indicate whether data was corrupted.
  • S can be represented as a series of n-bit segments 100a, 100b, 100c, 100d.
  • the following describes techniques that successively "fold" the most significant segment into remaining segments, repeatedly reducing the amount of data used to represent S by one segment.
  • implementations of the folding operation are comparatively inexpensive in terms of computation and die impact.
  • FIG. 1 depicts the first two 108a, 108b of a series of repeated folding operations that operate on a subset of segments.
  • each subset features the three most significant remaining segments representing S.
  • the initial folding operation 108a operates on segments 100a, 100b, 100c.
  • the operation 108a includes a polynomial multiplication 102a of the most significant segment 100a by a pre-computed constant k.
  • the result of this multiplication 102a is XOR-ed with the values of the least significant segments of the subset to yield segments 104a, 104b.
  • the values of these segments 104a, 104b preserve the contribution of the most significant segment 100a to the final determination of the residue for S.
  • segment 100a can be discarded or ignored for residue purposes.
  • the segments 104a, 104b output from the first folding operation 108a can be combined with the next segment of S 100d to form a new subset of segments.
  • the same folding operation 108b proceeds, folding the most significant segment 104a into segments 104b, 100d by a multiplication 102b of segment 104a by constant k and an XOR-ing of the result 102b with segments 104b, 100d to yield segments 106a, 106b.
  • This process of folding can repeat as desired to linearly reduce the data representing S by one segment for each folding operation.
  • the reduction may be repeated any number of times.
  • folding may continue until only two segments remain.
  • the residue of the final two segments e.g., remaining-segments mod g
  • the folding operation shown in FIG. 1 uses a pre-computed constant k to speed computations.
  • n is the number of bits in a segment.
  • A (e.g., $A \cdot x^{2n}$)
  • the technique is well suited to execution on processors that have a Galois-field (carry-less) multiplier, though such hardware is not an implementation requirement.
  • k can be pre-computed and stored prior to processing of the data values of S. Potentially, different values of k can be pre-computed and stored for different values of n and g. Such an implementation can quickly switch between polynomials by a lookup of k based on g and/or n.
  • FIG. 3 illustrates a sample process for computing a message residue.
  • the initial data segments 154 are reduced 156 to a set of fewer but equivalent data segments with respect to the residue. If additional segments remain 158, reducing 156 continues with the next segment 160. Otherwise, a residue value is determined for the final set of segments.
  • a variety of approaches can determine the final residue value such as a lookup table or a wide variety of algorithms to determine a modular remainder.
  • L is an operation returning the b-least-significant-bits of data;
  • g * is the b-least-significant bits of polynomial g,
  • M is an operation returning the b-most-significant-bits
  • q+ is the quotient of ($2^{2n} / g$).
  • this approach can use pre-computed values for g * and q+ to speed computation.
  • FIG. 4 depicts an alternate implementation.
  • S can again be represented as a series of n-bit segments, labeled S[ ],100a-100z.
  • the determination of the different values (e.g., 102, 104) to fold into the remaining segments (e.g., 100c, 100d) can be independently computed, potentially permitting parallel computation of these values.
  • folding multiple segments 100a, 100b at each iteration can greatly decrease computation time.
  • FIG. 4 depicts the first two 108a, 108b of a series of repeated folding operations that operate on a subset of segments.
  • each subset features the four most significant remaining segments representing S.
  • the initial folding operation 108a operates on segments 100a, 100b, 100c, 100d.
  • the operation 108a includes binary polynomial multiplication 102a of the most significant segment 100a by a first pre-computed constant K2 and a binary polynomial multiplication 104 of the next most significant segment 100b by a second pre-computed constant K1.
  • the results of these multiplications 102, 104 are folded into the remaining message segments by XORing the results with the values of the least significant segments of the subset 100c, 100d to yield segments 106a, 106b.
  • the values of these segments 106a, 106b preserve the contribution of the most significant segments 100a, 100b to the final determination of the residue for S. Thereafter, segments 100a, 100b can be discarded or ignored for residue purposes.
  • the segments 106a, 106b output from the first set of folding operations 108a can be combined with the next segments of S 100e, 100f to form a new subset of segments 106a, 106b, 100e, 100f.
  • a set of folding operations 108b proceeds, folding the most significant segments 106a, 106b into segments 100e, 100f by a multiplication of segment 106a by constant K2, a multiplication of segment 106b by a constant K1, and XOR-ing the results with segments 100e, 100f to yield segments 110 (labeled S").
  • This process of folding can repeat as desired to reduce the data representing S by two segments for each iteration.
  • the reduction may be repeated any number of times.
  • folding may continue until only two segments remain.
  • the residue of the final two segments e.g., remaining-segments mod g
  • the folding operations shown in FIG. 4 use constants K1 and K2 which can be pre-computed prior to accessing any segments of S. These constants may be determined as:
  • x and y demarcate the boundary of the most significant segments folded into the remaining segments.
  • y represents the ending bit position of segment 100b within the subset of segments 100a-100d
  • x represents the ending bit position of segment 100a within the subset 100a-100d.
  • the values of x and y may vary in different implementations and architectures. For instance, for an implementation that operates on subsets of 64-bit segments, y may have a value of 128 and x may have a value of 192. However, in another 64-bit segment implementation, y may have a value of 96 and x may have a value of 160. For an implementation using 32-bit segments, x may have a value of 64 and y may have a value of 96.
  • a set of K values (e.g., K1 , K2,...) can be pre-computed and stored prior to processing of the data values of S. Potentially, different sets of K can be pre-computed and stored for different values of g, x, and/or y. Such an implementation can quickly switch between polynomials by a lookup of a set of K values based on the particular computational setup.
  • folding operations 108a, 108b that operate on 4-segment subsets
  • other implementations may operate on a different number of segments. For example, instead of 4:2 segment reduction another implementation may feature a different segment reduction ratio. For instance, another implementation may fold more than two segments per iteration 108.
  • the techniques described above can be used in other operations that determine a message residue or perform a modular reduction of a polynomial.
  • FIG. 5 illustrates a sample process for computing a message residue.
  • the initial data segments 204 are reduced 206 to a set of fewer but equivalent data segments with respect to the residue by folding multiple segments into the remaining message at a time. If additional segments remain 208, reducing 206 continues with each reducing folding multiple segments into remaining ones. As shown, a final residue value 210 can be determined for the final set of remaining segments.
  • FIG. 6 illustrates a sample implementation 300 of the reduction techniques described above.
  • the inner loop of code 300 includes a sequence of 7 instructions.
  • the high order bits of R1 and R2 initially store the value of the first segment
  • the low order bits of R1 and R2 initially store the value of the second segment
  • a register, K stores both K2 and K1 in the respective high and low order bits of a register.
  • BMUL binary polynomial multiplication instruction
  • the instruction can operate on either the high order and/or low order half of the operands as indicated by the notation ".high” or ".low".
  • BMUL R1 , K
  • low multiplies the lower half of R1 by the lower half of the register K (e.g., K1 ).
  • each segment is 32-bits and each register (e.g., R0, R1, R2) is 64-bits.
  • K 32-bit segments and constant values
  • Other implementations can feature different sized segments and constants and may be tailored for other architectures.
  • FIG. 7 illustrates an instruction execution schedule 400 that operates on two different messages ("a", "b") in parallel.
  • the schedule 400 shown identifies the cycle/processor execution port of the instructions 300 of the sample implementation of FIG. 6.
  • inner-loop instructions 1 and 2 are executed for message "a" (labeled 1a, 2a)
  • instruction 3 is executed for message "a" (labeled 1a)
  • instructions 1 and 2 are executed for message "b" (labeled 1b, 2b).
  • the sample schedule 400 shown in FIG. 7 reflects a processor architecture featuring multiple ports of execution, though only a single port (port0) implements or is used to execute the binary polynomial multiplication instruction.
  • the schedule 400 also reflects an architecture where the binary polynomial multiplication instruction is a 1-cycle operation.
  • each iteration of the folding process consumes 7 cycles for the inner loop and uses 4 registers (R0, R1, R2, and a register to store K1 and K2).
  • As shown, six 8-byte blocks are processed after 17 cycles, averaging close to 3 cycles per 8-byte block. Continued operation approaches an average of 2.5 cycles per block.
  • FIG. 8 depicts another schedule 500 where the binary polynomial multiplication instruction is assumed to take 3 cycles instead of 1 cycle. While this may increase the total number of instruction cycles used to perform the reduction and may increase the number of registers used, increasing the number of messages simultaneously processed can mask the increase in instruction execution time and nevertheless attain similar performance to the 1-cycle binary polynomial multiplication instruction.
  • the schedule 500 of FIG. 8 processes 4 messages ("a", "b", "c", and "d") simultaneously and, again, achieves close to 2.5 cycles per 8-byte block.
  • Other schedules are possible for different architectures and different instructions used to implement the folding operations than those described above. For example, a processor implementing the binary polynomial multiplication instruction on multiple processor execution ports can initiate execution of instructions 3 and 4 in the same cycle.
  • FIG. 9 depicts a sample of circuitry that may be used to implement the binary polynomial multiplication instruction.
  • the instruction operates on two operands, src1 and src2.
  • the circuitry selects 604, 606 the specified High ("H") or Low ("L") order bits of the operands to supply to the multiplier 616.
  • the circuitry can also perform XOR-ing 616, 614 of the High and Low portions of either operand if imm2 is set.
  • the multiplier 616 stores output to the specified destination register 618.
  • FIG. 10 illustrates message bytes 700a ordered from low-order byte (byte-0) to high (byte-15) where the high order byte represents the next sequential byte of a message.
  • each byte within the message has a little-endian format 702a.
  • the sequence of bits within the message 700a are not arranged in a linearly ascending or descending order of significance.
  • This "saw-tooth" arrangement of bits where the byte arrangement is the opposite of the bit endian-ness within a byte, may make computations more difficult.
  • a byte shuffle operation may be used. For example, as shown in FIG. 10, shuffling bytes of message 700a yields message 700b. Since both the bytes of message 700b and the bits within the bytes of message 700b are arranged in the same sequence of significance (high to low), the bits of message 700a linearly descend from high to low significance.
  • pseudo-code 704 features shuffle byte (PSHUFB) instructions 706, 708 that reorder bytes of a message portion to create a linear descending arrangement of bits.
  • FIG. 11 depicts another scenario where the bytes 902 feature a big-endian format and the bytes are arranged from low to high significance.
  • FIG. 12 illustrates a technique that permits circuitry operating on the "reversed" message to proceed without time consuming bit or byte rearrangement.
  • the constants used can be computed to properly operate on the reversed message, obviating the need for bit-reflections or byte shuffles.
  • FIG. 12 shows two four-bit numbers a and b.
  • FIG. 12 also depicts polynomial multiplication of bit-reflections of a and b, labeled $a_r$ and $b_r$, to yield the bit-values of byte c′.
  • c′[6:0] = bitreflect(c[6:0]).
  • c′ << 1 = bitreflect(c).
  • Circuitry can be constructed based on this relationship to permit processing of the "reverse" bit-ordering shown in FIG. 11 without bit reflections or byte shufflings.
  • the procedure can operate on the bits, as is, by reflecting the constants, K, and shifting them to the left by one.
  • code 1000 performs a bit-reflection and shift of each K value to yield the rK values. This can be a one-time operation performed before processing of a message.
  • the procedure 1000 uses the rK values in a polynomial multiplication operation with message segments similar to that shown in FIG. 6, though with constant values that reflect the bit-ordering.
  • the ability to operate on a reflected message may be used in other modular reduction schemes than that described above.
  • a reflected and shifted constant may be used in a scheme that folds a single segment per iteration instead of multiple segments.
  • the more general technique of processing a reflected bit sequence with binary polynomial multiplication of a bitreflected/shifted value may be used in a wide variety of applications.
  • the techniques described above can be implemented in a wide variety of logic.
  • the techniques may be implemented as circuitry (e.g., a processor element (e.g., a CPU (Central Processing Unit) or processor core)) that executes program instructions disposed on a computer readable storage medium. While the logic was illustrated as programmatic instructions executed by processor circuitry, the techniques may be implemented in dedicated digital or analog circuitry (e.g., expressed in a hardware description language such as Verilog(tm)), firmware, and/or as an ASIC (Application Specific Integrated Circuit) or Programmable Gate Array (PGA).
  • a processor element e.g., a CPU (Central Processing Unit) or processor core
  • the logic was illustrated as programmatic instructions executed by processor circuitry
  • the techniques may be implemented in dedicated digital or analog circuitry (e.g., expressed in a hardware description language such as Verilog(tm)), firmware, and/or as an ASIC (Application Specific Integrated Circuit) or Programmable Gate Array (PGA).
  • ASIC Application Specific
  • the circuitry may feature a combination of dedicated hardware and programmable execution (e.g., a processor that implements a program that uses a Galois-field (carry-less) multiplier or multiplier configured to handle carry-less multiplication.
  • dedicated hardware e.g., a processor that implements a program that uses a Galois-field (carry-less) multiplier or multiplier configured to handle carry-less multiplication.
  • programmable execution e.g., a processor that implements a program that uses a Galois-field (carry-less) multiplier or multiplier configured to handle carry-less multiplication.
  • the logic may be integrated within a discrete component such as a NIC (Network Interface Controller), framer, offload engine, storage processor, and so forth.
  • a component may verify a CRC value included with the PDU. Alternately, for egress packets, the component may generate a CRC value for including in the PDU.
  • Such components may include a PHY coupled to a MAC (media access controller) (e.g., an Ethernet
  • MAC MAC
  • these techniques may be implemented in blade circuitry for insertion into a chassis backplane.

Abstract

A description of techniques of determining a modular remainder with respect to a polynomial of a message comprised of a series of segments. An implementation can include repeatedly accessing a strict subset of the segments and transforming the strict subset of segments into a smaller set of segments that are equivalent to the strict subset of the segments with respect to the modular remainder. The implementation can also include determining the modular remainder based on a set of segments output by the repeatedly accessing and transforming and storing the determined modular remainder.

Description

DETERMINING A MESSAGE RESIDUE
BACKGROUND
[0001] A variety of computer applications operate on messages to create a message residue. The residue can represent message contents much more compactly. Among other uses, message residues are frequently used to determine whether data transmitted over network connections or retrieved from a storage device may have been corrupted. For instance, a noisy transmission line may change a "1" signal to a "0", or vice versa. To detect corruption, a message is often accompanied by its message residue. A receiver of the data can then independently determine a residue for the message and compare the determined residue with the received residue.
[0002] A common message residue is known as a Cyclic Redundancy Check (CRC). A CRC computation is based on interpreting a stream of message bits as coefficients of a polynomial. For example, a message of "1010" corresponds to a polynomial of $(1 \cdot x^3) + (0 \cdot x^2) + (1 \cdot x^1) + (0 \cdot x^0)$ or, more simply, $x^3 + x^1$. The message polynomial is divided by another polynomial known as the modulus. For example, the other polynomial may be "11" or $x + 1$. A CRC is the remainder of a division of the message by the polynomial. CRC polynomial division, however, is somewhat different than ordinary division in that it is computed over the finite field GF(2) (i.e., the set of integers modulo 2). More simply put: even number coefficients become zeroes and odd number coefficients become ones.
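The division over GF(2) described in [0002] can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `gf2_mod` and the choice to hold bit strings in Python integers are assumptions made here, not part of the original text. Subtraction over GF(2) is XOR, so long division reduces to repeatedly XOR-ing a shifted copy of the modulus into the value.

```python
def gf2_mod(value: int, modulus: int) -> int:
    """Remainder of value(x) / modulus(x) with coefficients in GF(2).

    Bit i of each integer is the coefficient of x^i.
    """
    mod_bits = modulus.bit_length()
    while value.bit_length() >= mod_bits:
        # Align the modulus with the leading term and cancel it (XOR).
        shift = value.bit_length() - mod_bits
        value ^= modulus << shift
    return value


# The example from the text: message "1010" (x^3 + x) and modulus "11" (x + 1).
remainder = gf2_mod(0b1010, 0b11)  # 0, because x + 1 divides x^3 + x
```

Here the remainder is zero because x^3 + x has an even number of terms and is therefore divisible by x + 1 over GF(2).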
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] FIG. 1 is a diagram illustrating repeated reduction of data segments used to represent a message.
[0004] FIG. 2 is a diagram illustrating a reduction constant.
[0005] FIG. 3 is a flow chart illustrating determination of a message residue.
[0006] FIG. 4 is a diagram illustrating repeated reduction of message segments used to represent a message.
[0007] FIG. 5 is a flow-chart of a process to determine a message residue.
[0008] FIG. 6 is code illustrating reduction of message segments used to represent a message.
[0009] FIG. 7 is a sample execution schedule of instructions to reduce message segments used to represent a message.
[0010] FIG. 8 is a sample execution schedule of instructions to reduce message segments used to represent a message.
[0011] FIG. 9 is a diagram of a sample binary polynomial multiplication circuit.
[0012] FIG. 10 is a diagram illustrating a byte shuffle operation to re-order message bits in order of significance.
[0013] FIG. 11 is a diagram illustrating segment reduction of a message having bits arranged in little-endian format.
[0014] FIG. 12 is a diagram illustrating a relationship between multiplication of operands and multiplication of bit reflected operands.
[0015] FIG. 13 is code illustrating reduction of message segments having bits arranged in big-endian format.
DETAILED DESCRIPTION
[0016] Message residues are commonly used in many different protocols to protect data integrity. Computing these residues, however, imposes significant computational overhead. The following describes techniques that repeatedly whittle a message into a smaller set of equivalent data using fast, inexpensive operations. The resulting, smaller, set of data can then be processed, for example, in a conventional manner to arrive at a final message residue value. In other words, the more burdensome task of determining an exact value of a message residue is postponed until a message is reduced to a smaller size that retains the mathematical characteristics of the original message with respect to the residue.
[0017] In greater detail, FIG. 1 depicts a message S 100. To determine a message residue, the bits of the message S can be handled as the coefficients of a polynomial S(x). For example, a 32-bit CRC of S can be defined as:
$\mathrm{CRC} = S \cdot 2^{32} \bmod g$   [1]
where g is a 33-bit polynomial. Different values of g have been defined for a variety of applications. For example, iSCSI (Internet Small Computer System Interface) defines a value of $\text{11EDC6F41}_{16}$ for g. Other applications have selected different polynomial sizes and values. Typically, the resulting CRC value is stored with the message, S, in the empty 32-least significant bits created by the $2^{32}$ shift. Alternately, the value may be stored in a packet field or other location. Recomputing the CRC value and comparing with the stored value can indicate whether data was corrupted.
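As a rough illustration of equation [1], the sketch below computes S · 2^32 mod g with the iSCSI polynomial quoted above and then checks the store-and-recompute property. The names `G_ISCSI`, `gf2_mod`, and `raw_crc32` are hypothetical, and the sketch deliberately computes only the plain polynomial remainder. Deployed CRC variants usually add conventions such as bit reflection, an initial value, and a final XOR, none of which are modeled here.

```python
G_ISCSI = 0x11EDC6F41  # 33-bit generator polynomial quoted in the text


def gf2_mod(value: int, modulus: int) -> int:
    """Remainder of polynomial division over GF(2) (bit i = coefficient of x^i)."""
    mod_bits = modulus.bit_length()
    while value.bit_length() >= mod_bits:
        value ^= modulus << (value.bit_length() - mod_bits)
    return value


def raw_crc32(message: int, g: int = G_ISCSI) -> int:
    """Equation [1]: CRC = S * 2^32 mod g, with no reflection or final XOR."""
    return gf2_mod(message << 32, g)


# Hypothetical message, interpreted most significant byte first.
message = int.from_bytes(b"residue", "big")
crc = raw_crc32(message)

# Store the CRC in the 32 empty low-order bits created by the shift.
codeword = (message << 32) | crc

# A receiver recomputes the remainder: zero means no detected corruption.
intact = gf2_mod(codeword, G_ISCSI) == 0
corrupted = gf2_mod(codeword ^ (1 << 40), G_ISCSI) == 0  # one flipped bit
```

With the CRC stored in the low 32 bits, the whole codeword is divisible by g, so `intact` is true, while a single flipped bit leaves a non-zero remainder and `corrupted` is false.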
[0018] As shown, S can be represented as a series of n-bit segments 100a, 100b, 100c, 100d. The following describes techniques that successively "fold" the most significant segment into remaining segments, repeatedly reducing the amount of data used to represent S by one segment. As described below, implementations of the folding operation are comparatively inexpensive in terms of computation and die impact.
[0019] FIG. 1 depicts the first two 108a, 108b of a series of repeated folding operations that operate on a subset of segments. In the example shown, each subset features the three most significant remaining segments representing S. For example, the initial folding operation 108a operates on segments 100a, 100b, 100c. As shown, the operation 108a includes a polynomial multiplication 102a of the most significant segment 100a by a pre-computed constant k. The result of this multiplication 102a is XOR-ed with the values of the least significant segments of the subset to yield segments 104a, 104b. The values of these segments 104a, 104b preserve the contribution of the most significant segment 100a to the final determination of the residue for S. Thus, segment 100a can be discarded or ignored for residue purposes.
[0020] As shown, the segments 104a, 104b output from the first folding operation 108a can be combined with the next segment of S 100d to form a new subset of segments. Again, the same folding operation 108b proceeds, folding the most significant segment 104a into segments 104b, 100d by a multiplication 102b of segment 104a by constant k and an XOR-ing of the result 102b with segments 104b, 100d to yield segments 106a, 106b.
[0021] This process of folding can repeat as desired to linearly reduce the data representing S by one segment for each folding operation. The reduction may be repeated any number of times. For example, in the sample implementation shown, folding may continue until only two segments remain. The residue of the final two segments (e.g., remaining-segments mod g) can be determined in a variety of ways such as described below in conjunction with FIG. 3.
[0022] The folding operation shown in FIG. 1 uses a pre-computed constant k to speed computations. As shown in FIG. 2, the constant k may be determined as:
$k = 2^{2n} \bmod g$   [2]
where n is the number of bits in a segment. The contribution of A (e.g., $A \cdot x^{2n}$) to the message residue can, thus, be expressed as $A \cdot k$ (e.g., 102a, 102b in FIG. 1). The technique is well suited to execution on processors that have a Galois-field (carry-less) multiplier, though such hardware is not an implementation requirement.
[0023] Since the polynomial g and n, the number of bits in each segment, are constants, k can be pre-computed and stored prior to processing of the data values of S. Potentially, different values of k can be pre-computed and stored for different values of n and g. Such an implementation can quickly switch between polynomials by a lookup of k based on g and/or n.
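The single-segment folding of FIG. 1 together with the constant of equation [2] might be sketched as follows. This is a software model under stated assumptions, not the patented circuitry: `clmul` stands in for a carry-less multiplier, `fold_residue` and the sample segment values are hypothetical, and the segment width n is assumed to be at least the degree of g so that each product fits in two segments.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) multiplication of two integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(value: int, modulus: int) -> int:
    """Remainder of polynomial division over GF(2)."""
    mod_bits = modulus.bit_length()
    while value.bit_length() >= mod_bits:
        value ^= modulus << (value.bit_length() - mod_bits)
    return value


def fold_residue(segments, n, g):
    """Residue of a message given as n-bit segments, most significant first."""
    assert len(segments) >= 2 and n >= g.bit_length() - 1
    k = gf2_mod(1 << (2 * n), g)  # equation [2], computed once
    mask = (1 << n) - 1
    acc = (segments[0] << n) | segments[1]  # two most significant segments
    for seg in segments[2:]:
        hi, lo = acc >> n, acc & mask
        # Fold: replace hi * x^(2n) by hi * k, XOR into the two lower segments.
        acc = clmul(hi, k) ^ ((lo << n) | seg)
    return gf2_mod(acc, g)  # final reduction of the remaining two segments


G = 0x11EDC6F41
segs = [0xDEADBEEF, 0x01234567, 0x89ABCDEF, 0x0BADF00D, 0xCAFEBABE]
residue = fold_residue(segs, 32, G)
```

Each loop iteration shortens the working data by one segment while keeping it congruent to the original message modulo g, which is why only the final two segments need a full modular reduction. To obtain the CRC of equation [1] with this sketch, one option is to append an all-zero 32-bit segment, which models the multiplication by 2^32.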
[0024] The specific examples described above illustrated determination of a 32-bit CRC polynomial. However, the techniques described above work for arbitrary segment sizes. For example, n can be set as a number of bits equal to the (width of g) - 1. Additionally, while the above described folding operations 108a, 108b operate on 3-segment subsets, other implementations may operate on a different number of segments. For example, instead of a 3:2 segment reduction, another implementation may feature a 4:3 segment reduction, and so forth. Additionally, while the above describes a CRC message residue, the techniques described above can be used in other operations that determine a message residue.
[0025] FIG. 3 illustrates a sample process for computing a message residue. As shown, after computing k 152, the initial data segments 154 are reduced 156 to a set of fewer but equivalent data segments with respect to the residue. If additional segments remain 158, reducing 156 continues with the next segment 1600. Otherwise, a residue value is determined for the final set of segments.
[0026] A variety of approaches can determine the final residue value, such as a lookup table or a wide variety of algorithms to determine a modular remainder. For example, the final segments can be processed using an approach that implements the division process as multiplication. For instance, an n-bit remainder can be expressed as:
$r_n = L\left(g^{*} \cdot M\left(s \cdot q^{+}\right)\right)$ [3]
where
L is an operation returning the b-least-significant-bits of data;
g* is the b-least-significant bits of polynomial g;
M is an operation returning the b-most-significant-bits; and
q+ is the quotient of ($2^{2n} / g$).
Like the folding techniques described above, this approach can use pre-computed values for g* and q+ to speed computation.
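Expression [3] can be checked with a short Python sketch. It rests on assumptions not stated in the text: b is taken equal to n, g has degree n, s is an n-bit value, and the expression is read as yielding the residue of $s \cdot x^{n}$ modulo g. The function names are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial multiplication."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pdivmod(a: int, g: int):
    """Quotient and remainder of polynomial a divided by g over GF(2)."""
    q = 0
    glen = g.bit_length()
    while a.bit_length() >= glen:
        shift = a.bit_length() - glen
        q ^= 1 << shift
        a ^= g << shift
    return q, a


def residue_by_multiplication(s: int, g: int, n: int) -> int:
    """Evaluate L(g* . M(s . q+)) for an n-bit s and a degree-n polynomial g."""
    mask = (1 << n) - 1
    q_plus, _ = pdivmod(1 << (2 * n), g)  # q+ : quotient of 2^(2n) / g (pre-computable)
    g_star = g & mask                     # g* : n least significant bits of g
    m = clmul(s, q_plus) >> n             # M  : n most significant bits of the 2n-bit product
    return clmul(g_star, m) & mask        # L  : n least significant bits
```

Under these assumptions, `residue_by_multiplication(s, g, n)` agrees with the remainder obtained by long division of `s << n` by g, while using only two carry-less multiplications once q+ and g* are stored.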
[0027] FIG. 4 depicts an alternate implementation. As shown, S can again be represented as a series of n-bit segments, labeled S[ ], 100a-100z. In this implementation, the different values (e.g., 102, 104) to fold into the remaining segments (e.g., 100c, 100d) can be computed independently, potentially permitting parallel computation of these values. In comparison with an approach that successively folds a single segment 100a at a time, folding multiple segments 100a, 100b at each iteration can greatly decrease computation time.
[0028] In greater detail, FIG. 4 depicts the first two 108a, 108b of a series of repeated folding operations that operate on a subset of segments. In the example shown, each subset features the four most significant remaining segments representing S. For example, the initial folding operation 108a operates on segments 100a, 100b, 100c, 100d. As shown, the operation 108a includes a binary polynomial multiplication 102 of the most significant segment 100a by a first pre-computed constant K2 and a binary polynomial multiplication 104 of the next most significant segment 100b by a second pre-computed constant K1. The results of these multiplications 102, 104 are folded into the remaining message segments by XORing the results with the values of the least significant segments of the subset 100c, 100d to yield segments 106a, 106b. The values of these segments 106a, 106b preserve the contribution of the most significant segments 100a, 100b to the final determination of the residue for S. Thereafter, segments 100a, 100b can be discarded or ignored for residue purposes.
[0029] As shown, the segments 106a, 106b output from the first set of folding operations 108a can be combined with the next segments of S 100e, 100f to form a new subset of segments 106a, 106b, 100e, 100f. Again, a set of folding operations 108b proceeds, folding the most significant segments 106a, 106b into segments 100e, 100f by a multiplication of segment 106a by constant K2, a multiplication of segment 106b by constant K1, and XOR-ing the results with segments 100e, 100f to yield segments 110 (labeled S").
[0030] This process of folding can repeat as desired to reduce the data representing S by two segments for each iteration. The reduction may be repeated any number of times. For example, in the sample implementation shown, folding may continue until only two segments remain. The residue of the final two segments (e.g., remaining-segments mod g) can be determined in a variety of ways as described above. The folding operations shown in FIG. 4 use constants K1 and K2 which can be pre-computed prior to accessing any segments of S. These constants may be determined as:
$K_1 = 2^{y} \bmod g$
$K_2 = 2^{x} \bmod g$
where x and y demarcate the boundary of the most significant segments folded into the remaining segments. For example, as shown in FIG. 4, y represents the ending bit position of segment 100b within the subset of segments 100a-100d and x represents the ending bit position of segment 100a within the subset 100a-100d. The values of x and y may vary in different implementations and architectures. For instance, for an implementation that operates on subsets of 64-bit segments, y may have a value of 128 and x may have a value of 192. However, in another 64-bit segment implementation, y may have a value of 96 and x may have a value of 160. For an implementation using 32-bit segments, y may have a value of 64 and x may have a value of 96.
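A single 4:2 folding operation with these constants might look like the following Python sketch. It assumes 32-bit segments with segment 100b ending at bit 64 and segment 100a ending at bit 96 of the four-segment subset, and it assumes the CRC-32 generator polynomial; all function and variable names are hypothetical.

```python
G = 0x104C11DB7   # assumed generator polynomial (CRC-32), degree 32
SEG_BITS = 32     # assumed segment size
Y, X = 64, 96     # assumed ending bit positions of segments 100b and 100a


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial multiplication."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pmod(a: int, g: int) -> int:
    """Remainder of polynomial a divided by g over GF(2)."""
    glen = g.bit_length()
    while a.bit_length() >= glen:
        a ^= g << (a.bit_length() - glen)
    return a


K1 = pmod(1 << Y, G)  # constant applied to the next most significant segment
K2 = pmod(1 << X, G)  # constant applied to the most significant segment


def fold_4_to_2(s_a: int, s_b: int, s_c: int, s_d: int):
    """Fold segments s_a and s_b into s_c and s_d; the two products are independent."""
    p_hi = clmul(s_a, K2)
    p_lo = clmul(s_b, K1)
    folded = p_hi ^ p_lo ^ ((s_c << SEG_BITS) | s_d)
    return folded >> SEG_BITS, folded & ((1 << SEG_BITS) - 1)
```

Because the two multiplications do not depend on each other, they could be issued in parallel where hardware allows.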
[0031] The contribution of a given segment (e.g., 100a, 100b) to the message residue can, thus, be expressed as S[ ] • Kn. The technique described is well suited to execution on processors that have a Galois-field (carry-less) multiplier, though such hardware is not an implementation requirement.
[0032] Since g, x, and y are constants, a set of K values (e.g., K1, K2, ...) can be pre-computed and stored prior to processing of the data values of S. Potentially, different sets of K can be pre-computed and stored for different values of g, x, and/or y. Such an implementation can quickly switch between polynomials by a lookup of a set of K values based on the particular computational setup.
[0033] The specific examples described above illustrated determination of a 32-bit CRC polynomial; however, the techniques described above may be applied to different sized remainders and/or polynomials. The techniques described above may also work for arbitrary and/or non-uniform segment sizes. Additionally, while the above described folding operations 108a, 108b operate on 4-segment subsets, other implementations may operate on a different number of segments. For example, instead of a 4:2 segment reduction, another implementation may feature a different segment reduction ratio. For instance, another implementation may fold more than two segments per iteration 108. Further, while the above describes a CRC message residue, the techniques described above can be used in other operations that determine a message residue or perform a modular reduction of a polynomial.
[0034] FIG. 5 illustrates a sample process for computing a message residue. As shown, after computing K values 202, the initial data segments 204 are reduced 206 to a set of fewer but equivalent data segments with respect to the residue by folding multiple segments into the remaining message at a time. If additional segments remain 208, reducing 206 continues, with each reduction folding multiple segments into the remaining ones. As shown, a final residue value 210 can be determined for the final set of remaining segments.
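The overall process (pre-compute the constants, fold two segments per iteration, then reduce the final two segments) can be sketched in Python as follows. The sketch assumes 32-bit segments, a degree-32 generator polynomial, and a plain long-division step for the final residue; padding an odd-length message with a leading zero segment is a choice made here for simplicity, not something the text prescribes. All names are hypothetical.

```python
G = 0x104C11DB7  # assumed generator polynomial (CRC-32)
N = 32           # assumed segment size in bits


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial multiplication."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pmod(a: int, g: int) -> int:
    """Remainder of polynomial a divided by g over GF(2)."""
    glen = g.bit_length()
    while a.bit_length() >= glen:
        a ^= g << (a.bit_length() - glen)
    return a


def residue_by_folding(segments, g: int = G, n: int = N) -> int:
    """Residue of a message given as n-bit segments, most significant first."""
    k1 = pmod(1 << (2 * n), g)  # computed before any segment is accessed
    k2 = pmod(1 << (3 * n), g)
    mask = (1 << n) - 1
    segs = list(segments)
    # Leading zero segments do not change the message value.
    while len(segs) < 2 or len(segs) % 2:
        segs.insert(0, 0)
    hi, lo = segs[0], segs[1]
    for i in range(2, len(segs), 2):
        # Fold the two most significant segments into the next two.
        folded = clmul(hi, k2) ^ clmul(lo, k1) ^ ((segs[i] << n) | segs[i + 1])
        hi, lo = folded >> n, folded & mask
    # Final residue of the two remaining segments.
    return pmod((hi << n) | lo, g)
```

The final `pmod` call could be replaced by any other method of determining a modular remainder of the two remaining segments.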
[0035] FIG. 6 illustrates a sample implementation 300 of the reduction techniques described above. As shown, the inner loop of code 300 includes a sequence of seven instructions. In this example, the high order bits of R1 and R2 initially store the value of the first segment, the low order bits of R1 and R2 initially store the value of the second segment, and a register, K, stores both K2 and K1 in its respective high and low order bits.
[0036] Two of the operations (operations 3 and 4) use a binary polynomial multiplication instruction (labeled "BMUL") that performs a polynomial multiplication over GF(2) of specified portions of the operands. That is, the instruction can operate on either the high order and/or low order half of the operands as indicated by the notation ".high" or ".low". For example, BMUL(R1, K).low multiplies the lower half of R1 by the lower half of the register K (e.g., K1). A sample implementation of circuitry to implement this instruction is described below in conjunction with FIG. 9.
[0037] In this particular implementation, each segment is 32 bits and each register (e.g., R0, R1, R2) is 64 bits. For architectures using a 64-bit datapath and/or 64/128-bit register files, the use of 32-bit segments and constant values, K, can reduce the need to shift or mask registers to access segment and constant values during folding operations. Other implementations, however, can feature different sized segments and constants and may be tailored for other architectures.
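The ".high" and ".low" behavior of the multiplication instruction can be emulated in Python as a rough sketch. This does not reproduce the seven-instruction sequence of FIG. 6; it only models 64-bit registers holding two 32-bit halves, a K register packing K2 (high) and K1 (low), and one fold built from two half-register multiplications. The function names, the CRC-32 polynomial, and the constants $2^{96} \bmod g$ and $2^{64} \bmod g$ are assumptions.

```python
G = 0x104C11DB7       # assumed generator polynomial (CRC-32)
MASK32 = 0xFFFFFFFF


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial multiplication."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pmod(a: int, g: int) -> int:
    """Remainder of polynomial a divided by g over GF(2)."""
    glen = g.bit_length()
    while a.bit_length() >= glen:
        a ^= g << (a.bit_length() - glen)
    return a


def bmul(src1: int, src2: int, half: str) -> int:
    """Emulate BMUL(src1, src2).high / .low on 64-bit register values."""
    if half == "high":
        return clmul(src1 >> 32, src2 >> 32)
    return clmul(src1 & MASK32, src2 & MASK32)


# K register: K2 in the high half, K1 in the low half (assumed x = 96, y = 64).
K_REG = (pmod(1 << 96, G) << 32) | pmod(1 << 64, G)


def fold_step(r1: int, next_block: int, k_reg: int = K_REG) -> int:
    """Fold the two segments held in r1 into the next 64 bits of the message."""
    return bmul(r1, k_reg, "high") ^ bmul(r1, k_reg, "low") ^ next_block
```

Since segments and constants already sit in register halves, no shifting or masking of the message data is needed outside the emulated instruction itself.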
[0038] A given set of instructions using the techniques above can be scheduled in a variety of ways to increase overall throughput. For example, FIG. 7 illustrates an instruction execution schedule 400 that operates on two different messages ("a", "b") in parallel. The schedule 400 shown identifies the cycle/processor execution port of the instructions 300 of the sample implementation of FIG. 6. For example, in cycle 1, inner-loop instructions 1 and 2 are executed for message "a" (labeled 1a, 2a), while in cycle 2, instruction 3 is executed for message "a" (labeled 3a) and instructions 1 and 2 are executed for message "b" (labeled 1b, 2b).
[0039] The sample schedule 400 shown in FIG. 7 reflects a processor architecture featuring multiple ports of execution, though only a single port (port0) implements or is used to execute the binary polynomial multiplication instruction. The schedule 400 also reflects an architecture where the binary polynomial multiplication instruction is a 1-cycle operation. In this sample schedule 400, each iteration of the folding process consumes 7 cycles for the inner loop and uses 4 registers (R0, R1, R2, and a register to store K1 and K2). As shown, six 8-byte blocks are processed after 17 cycles, averaging close to 3 cycles per 8-byte block. Continued operation approaches an average of 2.5 cycles per block.
[0040] FIG. 8 depicts another schedule 500 where the binary polynomial multiplication instruction is assumed to take 3 cycles instead of 1 cycle. While this may increase the total number of instruction cycles used to perform the reduction and may increase the number of registers used, increasing the number of messages simultaneously processed can mask the increase in instruction execution time and nevertheless attain similar performance to the 1-cycle binary polynomial multiplication instruction. For example, the schedule 500 of FIG. 8 processes 4 messages ("a", "b", "c", and "d") simultaneously and, again, achieves close to 2.5 cycles per 8-byte block.
[0041] Other schedules are possible for different architectures and different instructions used to implement the folding operations than those described above. For example, a processor implementing the binary polynomial multiplication instruction on multiple processor execution ports can initiate execution of instructions 3 and 4 in the same cycle.
[0042] FIG. 9 depicts a sample of circuitry that may be used to implement the binary polynomial multiplication instruction. In particular, the instruction operates on two operands, src1 and src2. Based on the values imm1 and imm0, the circuitry selects 604, 606 the specified High ("H") or Low ("L") order bits of the operands to supply to the multiplier 616. The circuitry can also perform XOR-ing 616, 614 of the High and Low portions of either operand if imm2 is set. The multiplier 616 stores output to the specified destination register 618.
Expressed as pseudo-code, the circuitry implements logic for an instruction having a syntax of: dest = BMUL (source1, source2) imm8 where the imm8 modifier is composed of imm2, imm1, imm0 bits. The values of these bits instruct the logic to perform the following operations:
[Table (image imgf000010_0001) not reproduced: operations selected by the imm2, imm1, and imm0 bits.]
[0043] where, in a 64-bit register implementation, low corresponds to bits [31:0] and high corresponds to bits [63:32]. Circuitry like that shown can reduce the shifting and masking used to access segment/constant values. However, a wide variety of other implementations of the binary polynomial multiplication instruction are easily constructed. Additionally, other implementations may feature different circuit components, feature a different syntax, and perform different operations on operands.
[0044] Implementations of techniques described above may also vary based on the byte arrangement and endianness of message bytes. For example, FIG. 10 illustrates message bytes 700a ordered from low-order byte (byte-0) to high (byte-15) where the high order byte represents the next sequential byte of a message. However, as shown in the scenario, each byte within the message has a little-endian format 702a. Thus, the sequence of bits within the message 700a is not arranged in a linearly ascending or descending order of significance. This "saw-tooth" arrangement of bits, where the byte arrangement is the opposite of the bit endianness within a byte, may make computations more difficult.
[0045] To organize the bits into a linear sequence, a byte shuffle operation may be used. For example, as shown in FIG. 10, shuffling bytes of message 700a yields message 700b. Since both the bytes of message 700b and the bits within the bytes of message 700b are arranged in the same sequence of significance (high to low), the bits of message 700b linearly descend from high to low significance. As shown, pseudo-code 704 features shuffle byte (PSHUFB) instructions 706, 708 that reorder bytes of a message portion to create a linear descending arrangement of bits.
[0046] FIG. 11 depicts another scenario where the bytes 902 feature a big-endian format and the bytes are arranged from low to high significance. While the bits proceed in a linearly increasing order of significance, from the least significant bit to the most significant, the bit ordering is the exact opposite of the arrangement of message 700b in FIG. 10. Instead of rearranging the bits, however, FIG. 12 illustrates a technique that permits circuitry operating on the "reversed" message to proceed without time consuming bit or byte rearrangement. Briefly, due to the bit-agnostic property of polynomial multiplication (i.e., the columns are independent), the constants used can be computed to properly operate on the reversed message, obviating the need for bit-reflections or byte shuffles.
[0047] To illustrate, FIG. 12 shows two four-bit numbers a and b. As shown, polynomial multiplication of the bits of a and b yields the bit-values of byte c. FIG. 12 also depicts polynomial multiplication of bit-reflections of a and b, labeled ar and br, to yield the bit-values of byte d. As shown, the values of c and d, excluding the most-significant 0, mirror each other. That is, c0 = d6, c1 = d5, c2 = d4, and c3 = d3. In other words, d[6:0] = bitreflect(c[6:0]). Including the zero value of c7 and d7, d << 1 = bitreflect(c). While FIG. 12 features 4-bit numbers, this relationship equally applies to larger numbers.
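The mirror relationship between c and d can be verified exhaustively for 4-bit operands with a few lines of Python. The helper names bitreflect, clmul, and reflection_identity_holds are illustrative only.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial multiplication."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def bitreflect(value: int, width: int) -> int:
    """Reverse the order of the low `width` bits of value."""
    out = 0
    for i in range(width):
        if (value >> i) & 1:
            out |= 1 << (width - 1 - i)
    return out


def reflection_identity_holds() -> bool:
    """Check d << 1 == bitreflect(c) for every pair of 4-bit numbers a and b."""
    for a in range(16):
        for b in range(16):
            c = clmul(a, b)                                # product of a and b
            d = clmul(bitreflect(a, 4), bitreflect(b, 4))  # product of the reflections
            if (d << 1) != bitreflect(c, 8):
                return False
    return True
```

The check passes for all 256 operand pairs, which matches the column-independence argument: no carries propagate between bit positions, so reversing the inputs simply reverses the product.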
[0048] Circuitry can be constructed based on this relationship to permit processing of the "reverse" bit-ordering shown in FIG. 11 without bit reflections or byte shuffles. In other words, instead of rearranging the bit-sequence of the message, the procedure can operate on the bits, as is, by reflecting the constants, K, and shifting them to the left by one. Thus, as shown in FIG. 13, code 1000 performs a bit-reflection and shift of each K value to yield the rK values. This can be a one-time operation performed before processing of a message. As shown, the procedure 1000 uses the rK values in a polynomial multiplication operation with message segments similar to that shown in FIG. 6, though with constant values that reflect the bit-ordering.
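The one-time derivation of a reflected, shifted constant can be illustrated in Python. This is a sketch under assumptions, not the code 1000 of FIG. 13: segments and constants are taken as 32 bits wide, the polynomial is taken as CRC-32, K1 is taken as $2^{64} \bmod g$, and the names reflected_constant and RK1 are hypothetical.

```python
G = 0x104C11DB7  # assumed generator polynomial (CRC-32)
WIDTH = 32       # assumed segment/constant width in bits


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial multiplication."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pmod(a: int, g: int) -> int:
    """Remainder of polynomial a divided by g over GF(2)."""
    glen = g.bit_length()
    while a.bit_length() >= glen:
        a ^= g << (a.bit_length() - glen)
    return a


def bitreflect(value: int, width: int) -> int:
    """Reverse the order of the low `width` bits of value."""
    out = 0
    for i in range(width):
        if (value >> i) & 1:
            out |= 1 << (width - 1 - i)
    return out


def reflected_constant(k: int, width: int = WIDTH) -> int:
    """rK: bit-reflect K and shift left by one (done once, before the message)."""
    return bitreflect(k, width) << 1


K1 = pmod(1 << 64, G)
RK1 = reflected_constant(K1)


def multiply_reflected_segment(segment_reflected: int, rk: int = RK1) -> int:
    """Multiply a segment that is already in reflected bit order by rK."""
    return clmul(segment_reflected, rk)
```

The product of a reflected segment and rK equals the bit-reflection (over 64 bits) of the ordinary product of the segment and K, so the message bits never need to be rearranged.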
[0049] The ability to operate on a reflected message may be used in other modular reduction schemes than that described above. For example, a reflected and shifted constant may be used in a scheme that folds a single segment per iteration instead of multiple segments. Additionally, the more general technique of processing a reflected bit sequence with binary polynomial multiplication of a bitreflected/shifted value may be used in a wide variety of applications.
[0050] The techniques described above can be implemented in a wide variety of logic. For example, the techniques may be implemented as circuitry (e.g., a processor element such as a CPU (Central Processing Unit) or processor core) that executes program instructions disposed on a computer readable storage medium. While the logic was illustrated as programmatic instructions executed by processor circuitry, the techniques may be implemented in dedicated digital or analog circuitry (e.g., expressed in a hardware description language such as Verilog(tm)), firmware, and/or as an ASIC (Application Specific Integrated Circuit) or Programmable Gate Array (PGA). Alternately, the circuitry may feature a combination of dedicated hardware and programmable execution (e.g., a processor that implements a program that uses a Galois-field (carry-less) multiplier or a multiplier configured to handle carry-less multiplication).
[0051] The logic may be integrated within a discrete component such as a NIC (Network Interface Controller), framer, offload engine, storage processor, and so forth. For example, to process a received protocol data unit (PDU), a component may verify a CRC value included with the PDU. Alternately, for egress packets, the component may generate a CRC value for inclusion in the PDU. Such components may include a PHY coupled to a MAC (media access controller) (e.g., an Ethernet MAC). As another example, these techniques may be implemented in blade circuitry for insertion into a chassis backplane.
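The generate-on-egress and verify-on-ingress use of a CRC can be illustrated with a small Python sketch. It deliberately swaps in the standard library's `zlib.crc32` instead of the folding technique, and the PDU layout (payload followed by a 4-byte big-endian CRC) is an assumption made only for this example; real protocols define their own field placement and bit conventions.

```python
import zlib


def append_crc(payload: bytes) -> bytes:
    """Egress: compute a CRC-32 over the payload and append it (assumed layout)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def verify_crc(pdu: bytes) -> bool:
    """Ingress: recompute the CRC over the payload and compare with the stored value."""
    if len(pdu) < 4:
        return False
    payload, received = pdu[:-4], int.from_bytes(pdu[-4:], "big")
    return zlib.crc32(payload) == received
```

A component would typically drop or flag a PDU for which `verify_crc` returns False.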
[0052] Other embodiments are within the scope of the following claims.
[0053] What is claimed is:

Claims

CLAIMS:
1. A method of determining a modular remainder with respect to a polynomial of a message comprised of a series of segments, comprising: repeatedly: accessing a strict subset of the segments, the strict subset consisting of the most significant segments representing the message; transforming the strict subset of segments into a smaller set of segments that are equivalent to the strict subset of the segments with respect to the modular remainder; and determining the modular remainder based on a set of segments output by the repeatedly accessing and transforming; and storing the determined modular remainder.
2. The method of claim 1, wherein the smaller set of segments is more than one segment smaller than the strict subset of segments.
3. The method of claim 1, wherein the modular remainder comprises a CRC (Cyclic Redundancy Check).
4. The method of claim 1, further comprising performing at least one of the following:
(1) comparing the modular remainder to a previously computed modular remainder for the message; and
(2) storing the modular remainder in a packet.
5. The method of claim 1, wherein the transforming comprises determining based on a multiplication of the most significant active segment by a constant k derived by segment size and the polynomial.
6. The method of claim 5, wherein the transforming comprises a determination based on the least significant of the subset of segments and the multiplication of the most significant segment by a constant derived from the segment size and the polynomial.
7. The method of claim 6, further comprising: accessing a set of precomputed constants based on at least one of: (a) the polynomial and (b) a size of a segment.
8. The method of claim 1, wherein the message comprises a set of segments, S, of size n where n is an integer; wherein a constant k = $2^{2n}$ mod polynomial; and wherein the transforming comprises XOR-ing polynomial multiplication of the most significant remaining segment of S by k and XOR-ing with the next most significant segments of S.
9. A computer program, disposed on a storage medium, to determine a modular remainder with respect to a polynomial of a message comprised of a series of segments, comprising instructions to cause a processor element to: repeatedly: access a strict subset of the segments, the strict subset consisting of the most significant segments representing the message; transform the strict subset of segments into a smaller set of segments that are equivalent to the strict subset of the segments with respect to the modular remainder; and determine the modular remainder based on a set of segments output by the repeatedly access and transform; and store the determined modular remainder.
10. The computer program of claim 9, wherein the strict subset of segments consists of three segments and the smaller set of segments consists of two segments.
11. The computer program of claim 9, wherein the modular remainder comprises a CRC (Cyclic Redundancy Check).
12. The computer program of claim 9, further comprising instructions to cause the processor element to perform at least one of the following:
(1) compare the modular remainder to a previously computed modular remainder for the message; and
(2) store the modular remainder in a packet.
13. The computer program of claim 9, wherein the instructions to perform transform comprise instructions to determine based on a multiplication of the most significant active segment by a constant k derived by segment size and the polynomial.
14. The computer program of claim 13, wherein the instructions to transform comprise instructions to determine based on the least significant of the subset of segments and the multiplication of the most significant segment by a constant derived from the segment size and the polynomial.
15. The computer program of claim 14, further comprising instructions to: access a set of precomputed constants based on at least one of: (a) the polynomial and (b) a size of a segment.
16. The computer program of claim 9, wherein the message comprises a set of segments, S, of size n where n is an integer; wherein a constant k = $2^{2n}$ mod polynomial; and wherein the instructions to transform comprise instructions to XOR results of polynomial multiplication of the most significant remaining segment of S by k with the next most significant segments of S.
PCT/US2008/085284 2007-12-18 2008-12-02 Determining a message residue WO2009082598A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US11/959,142 2007-12-18
US11/959,142 US7886214B2 (en) 2007-12-18 2007-12-18 Determining a message residue
US12/291,621 2008-11-12
US12/291,621 US8042025B2 (en) 2007-12-18 2008-11-12 Determining a message residue

Publications (1)

Publication Number Publication Date
WO2009082598A1 true WO2009082598A1 (en) 2009-07-02

Family

ID=40754915

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/085284 WO2009082598A1 (en) 2007-12-18 2008-12-02 Determining a message residue

Country Status (2)

Country Link
US (1) US8042025B2 (en)
WO (1) WO2009082598A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7827471B2 (en) 2006-10-12 2010-11-02 Intel Corporation Determining message residue using a set of polynomials
US8042025B2 (en) 2007-12-18 2011-10-18 Intel Corporation Determining a message residue
US8229109B2 (en) 2006-06-27 2012-07-24 Intel Corporation Modular reduction using folding
US8689078B2 (en) 2007-07-13 2014-04-01 Intel Corporation Determining a message residue

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7886214B2 (en) * 2007-12-18 2011-02-08 Intel Corporation Determining a message residue
US8312363B2 (en) * 2008-12-16 2012-11-13 Intel Corporation Residue generation
US8607129B2 (en) 2011-07-01 2013-12-10 Intel Corporation Efficient and scalable cyclic redundancy check circuit using Galois-field arithmetic
US8683296B2 (en) 2011-12-30 2014-03-25 Streamscale, Inc. Accelerated erasure coding system and method
US8914706B2 (en) 2011-12-30 2014-12-16 Streamscale, Inc. Using parity data for concurrent data authentication, correction, compression, and encryption
US20160142073A1 (en) * 2013-06-20 2016-05-19 Telefonaktiebolaget L M Ericsson (Publ) Access Control in a Network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6223320B1 (en) * 1998-02-10 2001-04-24 International Business Machines Corporation Efficient CRC generation utilizing parallel table lookup operations
US20030167440A1 (en) * 2002-02-22 2003-09-04 Cavanna Vicente V. Methods for computing the CRC of a message from the incremental CRCs of composite sub-messages
US6732317B1 (en) * 2000-10-23 2004-05-04 Sun Microsystems, Inc. Apparatus and method for applying multiple CRC generators to CRC calculation
US20050149812A1 (en) * 2003-11-19 2005-07-07 Honeywell International Inc. Message error verification using checking with hidden data

Family Cites Families (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3980874A (en) * 1975-05-09 1976-09-14 Burroughs Corporation Binary to modulo M translation
JP2577914B2 (en) * 1987-06-11 1997-02-05 クラリオン株式会社 m-sequence code generator
FR2622713A1 (en) * 1987-10-30 1989-05-05 Thomson Csf CALCULATION CIRCUIT USING RESIDUAL ARITHMETICS
FR2658932A1 (en) * 1990-02-23 1991-08-30 Koninkl Philips Electronics Nv METHOD OF ENCODING THE RSA METHOD BY A MICROCONTROLLER AND DEVICE USING THE SAME
US5384786A (en) * 1991-04-02 1995-01-24 Cirrus Logic, Inc. Fast and efficient circuit for identifying errors introduced in Reed-Solomon codewords
US5274707A (en) * 1991-12-06 1993-12-28 Roger Schlafly Modular exponentiation and reduction device and method
US5363107A (en) * 1993-07-16 1994-11-08 Massachusetts Institute Of Technology Storage and transmission of compressed weather maps and the like
US5642367A (en) * 1994-02-07 1997-06-24 Mitsubishi Semiconductor America, Inc. Finite field polynomial processing module for error control coding
US5671377A (en) * 1994-07-19 1997-09-23 David Sarnoff Research Center, Inc. System for supplying streams of data to multiple users by distributing a data stream to multiple processors and enabling each user to manipulate supplied data stream
US7190681B1 (en) * 1996-07-10 2007-03-13 Wu William W Error coding in asynchronous transfer mode, internet and satellites
US6128766A (en) * 1996-11-12 2000-10-03 Pmc-Sierra Ltd. High speed cyclic redundancy check algorithm
US5942005A (en) * 1997-04-08 1999-08-24 International Business Machines Corporation Method and means for computationally efficient error and erasure correction in linear cyclic codes
JP2001527673A (en) * 1997-05-04 2001-12-25 フォートレス ユー アンド ティー リミティド Apparatus and method for modular multiplication and exponentiation based on Montgomery multiplication
US6484192B1 (en) * 1998-01-29 2002-11-19 Toyo Communication Equipment Co., Ltd. Root finding method and root finding circuit of quadratic polynomial over finite field
CA2267721C (en) * 1998-03-26 2002-07-30 Nippon Telegraph And Telephone Corporation Scheme for fast realization of encryption, decryption and authentication
US6530057B1 (en) * 1999-05-27 2003-03-04 3Com Corporation High speed generation and checking of cyclic redundancy check values
GB2360177B (en) * 2000-03-07 2003-08-06 3Com Corp Fast frame error checker for multiple byte digital data frames
JP3926532B2 (en) * 2000-03-16 2007-06-06 株式会社日立製作所 Information processing apparatus, information processing method, and card member
GB0013355D0 (en) * 2000-06-01 2000-07-26 Tao Group Ltd Parallel modulo arithmetic using bitwise logical operations
US6721771B1 (en) * 2000-08-28 2004-04-13 Sun Microsystems, Inc. Method for efficient modular polynomial division in finite fields f(2{circumflex over ( )}m)
US6609410B2 (en) * 2000-09-29 2003-08-26 Spalding Sports Worldwide, Inc. High strain rate tester for materials used in sports balls
JP2002118471A (en) * 2000-10-06 2002-04-19 Hitachi Ltd Recording and reproducing device, method for correcting and coding error, and method for recording information
JP3785044B2 (en) * 2001-01-22 2006-06-14 株式会社東芝 Power residue calculation device, power residue calculation method, and recording medium
US20020144208A1 (en) * 2001-03-30 2002-10-03 International Business Machines Corporation Systems and methods for enabling computation of CRC' s N-bit at a time
US7027597B1 (en) * 2001-09-18 2006-04-11 Cisco Technologies, Inc. Pre-computation and dual-pass modular arithmetic operation approach to implement encryption protocols efficiently in electronic integrated circuits
US7027598B1 (en) * 2001-09-19 2006-04-11 Cisco Technology, Inc. Residue number system based pre-computation and dual-pass arithmetic modular operation approach to implement encryption protocols efficiently in electronic integrated circuits
US7458006B2 (en) * 2002-02-22 2008-11-25 Avago Technologies General Ip (Singapore) Pte. Ltd. Methods for computing the CRC of a message from the incremental CRCs of composite sub-messages
EP2175559A1 (en) * 2002-04-22 2010-04-14 Fujitsu Limited Error-detecting decoder with re-calculation of a remainder upon partial re-transmission of a data string.
US7512230B2 (en) * 2002-04-30 2009-03-31 She Alfred C Method and apparatus of fast modular reduction
US7461115B2 (en) * 2002-05-01 2008-12-02 Sun Microsystems, Inc. Modular multiplier
US7187770B1 (en) * 2002-07-16 2007-03-06 Cisco Technology, Inc. Method and apparatus for accelerating preliminary operations for cryptographic processing
US7343541B2 (en) * 2003-01-14 2008-03-11 Broadcom Corporation Data integrity in protocol offloading
US7243289B1 (en) * 2003-01-25 2007-07-10 Novell, Inc. Method and system for efficiently computing cyclic redundancy checks
US7058787B2 (en) * 2003-05-05 2006-06-06 Stmicroelectronics S.R.L. Method and circuit for generating memory addresses for a memory buffer
US7373514B2 (en) * 2003-07-23 2008-05-13 Intel Corporation High-performance hashing system
US7543142B2 (en) * 2003-12-19 2009-06-02 Intel Corporation Method and apparatus for performing an authentication after cipher operation in a network processor
US20050149744A1 (en) * 2003-12-29 2005-07-07 Intel Corporation Network processor having cryptographic processing including an authentication buffer
US7171604B2 (en) * 2003-12-30 2007-01-30 Intel Corporation Method and apparatus for calculating cyclic redundancy check (CRC) on data using a programmable CRC engine
US7529924B2 (en) * 2003-12-30 2009-05-05 Intel Corporation Method and apparatus for aligning ciphered data
US7543214B2 (en) * 2004-02-13 2009-06-02 Marvell International Ltd. Method and system for performing CRC
US20060059219A1 (en) * 2004-09-16 2006-03-16 Koshy Kamal J Method and apparatus for performing modular exponentiations
US7590930B2 (en) * 2005-05-24 2009-09-15 Intel Corporation Instructions for performing modulo-2 multiplication and bit reflection
US7707483B2 (en) * 2005-05-25 2010-04-27 Intel Corporation Technique for performing cyclic redundancy code error detection
US20070083585A1 (en) * 2005-07-25 2007-04-12 Elliptic Semiconductor Inc. Karatsuba based multiplier and method
US7958436B2 (en) * 2005-12-23 2011-06-07 Intel Corporation Performing a cyclic redundancy checksum operation responsive to a user-level instruction
US20070157030A1 (en) * 2005-12-30 2007-07-05 Feghali Wajdi K Cryptographic system component
US8229109B2 (en) 2006-06-27 2012-07-24 Intel Corporation Modular reduction using folding
US7827471B2 (en) * 2006-10-12 2010-11-02 Intel Corporation Determining message residue using a set of polynomials
US7925011B2 (en) * 2006-12-14 2011-04-12 Intel Corporation Method for simultaneous modular exponentiations
US8689078B2 (en) * 2007-07-13 2014-04-01 Intel Corporation Determining a message residue
US7886214B2 (en) * 2007-12-18 2011-02-08 Intel Corporation Determining a message residue
US8042025B2 (en) 2007-12-18 2011-10-18 Intel Corporation Determining a message residue
US9052985B2 (en) 2007-12-21 2015-06-09 Intel Corporation Method and apparatus for efficient programmable cyclic redundancy check (CRC)
US8189792B2 (en) * 2007-12-28 2012-05-29 Intel Corporation Method and apparatus for performing cryptographic operations

Also Published As

Publication number Publication date
US20090158132A1 (en) 2009-06-18
US8042025B2 (en) 2011-10-18

Similar Documents

Publication Publication Date Title
US8042025B2 (en) Determining a message residue
US7458006B2 (en) Methods for computing the CRC of a message from the incremental CRCs of composite sub-messages
US7886214B2 (en) Determining a message residue
EP0767539B1 (en) Highly parallel cyclic redundancy code generator
JP5384492B2 (en) Determining message remainder
US6904558B2 (en) Methods for computing the CRC of a message from the incremental CRCs of composite sub-messages
JP5164277B2 (en) Message remainder determination using a set of polynomials.
US7171604B2 (en) Method and apparatus for calculating cyclic redundancy check (CRC) on data using a programmable CRC engine
US20050010630A1 (en) Method and apparatus for determining a remainder in a polynomial ring
JP2003529233A (en) Method and apparatus for encoding and decoding data
US7162679B2 (en) Methods and apparatus for coding and decoding data using Reed-Solomon codes
JP2002236448A (en) Hardware for modular multiplication using a plurality of almost entirely identical processor elements
JPH02148225A (en) Data processing method and apparatus for calculating multiplicative inverse element of finite field
US8539326B1 (en) Method and implementation of cyclic redundancy check for wide databus
Ji et al. Fast parallel CRC algorithm and implementation on a configurable processor
JP4566513B2 (en) Method and apparatus for generating pseudo-random sequences
GB2389678A (en) Finite field processor reconfigurable for varying sizes of field.
Drescher et al. VLSI architecture for non-sequential inversion over GF(2^m) using the Euclidean algorithm
Mohamed Asan Basiri et al. Efficient hardware-software codesigns of AES encryptor and RS-BCH encoder
KR100392370B1 (en) Apparatus for calculating inversion of multi level structure in the finite field
Yadav et al. Forward Error Correction for Gigabit Automotive Ethernet using RS (450, 406) Encoder
JPH06230991A (en) Method and apparatus for computation of inverse number of arbitrary element in finite field
Reddy An Optimization Technique for CRC Generation
Chang et al. Design and implementation of a reconfigurable architecture for (528, 518) Reed-Solomon codec IP
Kadatch et al. Everything we know about CRC but afraid to forget

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 08864951; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 08864951; Country of ref document: EP; Kind code of ref document: A1)