WO1998021656A1 - High performance data path with XOR on the fly

High performance data path with XOR on the fly

Publication number
WO1998021656A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
bus
disk
memory
xor
Prior art date
Application number
PCT/US1997/018523
Other languages
French (fr)
Inventor
Robert C. Solomon
Brian K. Bailey
Robert Yates
Peter Everdell
Elizabeth H. Reeves
Original Assignee
Data General Corporation
Priority date
Filing date
Publication date
Application filed by Data General Corporation
Priority to DE69733076T (patent DE69733076T2)
Priority to EP97910943A (patent EP0938704B1)
Priority to JP52256898A (patent JP3606881B2)
Priority to CA002268548A (patent CA2268548A1)
Priority to AU48201/97A (patent AU4820197A)
Publication of WO1998021656A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2211/00 Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
    • G06F2211/10 Indexing scheme relating to G06F11/10
    • G06F2211/1002 Indexing scheme relating to G06F11/1076
    • G06F2211/1054 Parity-fast hardware, i.e. dedicated fast hardware for RAID systems with parity

Definitions

  • This invention relates to high availability disk arrays for use in data processing systems and, more particularly, to a high performance data path for accommodating validity and error checking while maintaining an acceptable and reliable throughput during disk access operations.
  • VOD Video on Demand
  • Disk array systems have been developed to provide high availability and reliability.
  • systems known as Redundant Array of Inexpensive Disks (RAID) provide resilience to disk failure through the use of multiple disks and data distribution and correction techniques.
  • RAID Redundant Array of Inexpensive Disks
  • the present invention is directed to a high performance data path for use in particular in a RAID controller.
  • the high performance data path permits accelerated methods for reading and writing data to an array of disks implementing an XOR based parity system.
  • the high performance data path includes a first bus and a second bus that are switchably interconnected.
  • a first memory resides on the first bus and a second memory resides on the second bus.
  • An XOR engine is switchably connectable to the first and second buses.
  • the XOR engine is used to XOR data that is passed between the first and second memories across the first and second buses with data in a buffer and to place the result back into the buffer.
  • Of the two memories, one may be referred to as a host-side memory and the other a disk-side memory.
  • the two memories may also advantageously be used as caches.
  • the high performance data path is useful in performing both reads and writes of data to a redundant array of disks.
  • Groups of disks are interrelated in that a sector in one of the disks corresponds with a sector in each of the other disks in the group
  • the sector of one of the disks in the group holds parity comprised of the results of XORing corresponding words in the corresponding data sectors.
  • an XOR of corresponding words may be performed on the fly, i.e., as data is moved past the XOR engine. Taking advantage of this, it is possible to deliver the data to a host computer before all of the disks have responded to the read.
  • the missing data from the final disk is essentially reconstructed on the fly in the XOR engine
  • the write operation to the disks is also accelerated in that, as data is moved from the staging area in the host-side memory to the disk-side memory, which are separately provided in the high performance data path, they are simultaneously XORed in the XOR engine.
  • the XOR buffer contains the XORed parity for moving into the parity disk.
  • a DMA controls the XOR engine and also may provide additional functionality.
  • the DMA may be used to create a checksum for data being written and to check the checksum of data being read. By operating on the fly, the DMA and the XOR operation are advantageously completed without performance degradation.
  • FIG. 1 is a block diagram of a RAID controller including the high performance data path of the present invention.
  • FIG. 2 is a block diagram of the memory controller and memory used with each of the disk-side and host-side memories in the invention of FIG. 1.
  • FIG. 3 is a block diagram of a data path control engine for use in the high performance data path of the present invention.
  • FIG. 4 is a block diagram of the DMA logic for including in the data path control engine of FIG. 3.
  • the high performance data path of the invention shall now be described in the context of the RAID controller shown in FIG. 1.
  • the RAID controller supervises I/O operations between a disk array and the host computers. Data on the disks is arranged according to a RAID algorithm, such as RAID 3 or RAID 5, for example.
  • the RAID controller of the presently preferred embodiment is implemented for use with a fiber channel.
  • a fiber channel protocol has been standardized for system communication with disk arrays
  • a host-side fiber channel controller 12 is provided at the host end of the RAID controller for interfacing with host computers.
  • a hub 14 allows for more than one host to connect to the host-side fiber channel controller 12.
  • the host-side fiber channel controller in conjunction with
  • Control data consists of control structures, a small amount of FCP data (mode pages, etc.) and fiber channel data (login, etc.). This is all infrequently accessed data.
  • the bandwidth requirement of this data is relatively small and the processor requirement to read and write this data is relatively high.
  • the control data is all stored in a shared memory system which the job processor 24 also uses for its own operating software. All control data fetches to and from the host-side fiber channel controller 12 and a disk-side fiber channel controller 16 are done through an upper bus 20, which is a standard PCI bus. Disk data is the data which passes between the host and the disks. There is no need for the job processor 24 to read or write this data. There is a need for this data to be XORed and checksummed. Words of data are XORed with corresponding words in a parity sector to obtain updated parity.
  • a checksum is determined for a sector by XORing together all the disk data words in the sector.
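The sector checksum described above can be sketched in a few lines of Python. This is a minimal illustration, not the controller's hardware logic; the function name `sector_checksum` and the sample word values are assumptions made for the example.

```python
from functools import reduce
from operator import xor


def sector_checksum(words):
    """XOR together all data words of one sector (hypothetical helper)."""
    return reduce(xor, words, 0)


# Illustrative sector of three words; real sectors hold many more.
sector = [0x1111, 0x2222, 0x4444]
checksum = sector_checksum(sector)

# XORing the stored checksum back in yields zero for an intact sector.
intact = sector_checksum(sector + [checksum]) == 0
```

Because XOR is its own inverse, a check can be done by folding the stored checksum into the same running XOR and comparing the result with zero.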
  • Both the host-side fiber channel controller 12 and the disk-side fiber channel controller 16 have access to the PCI bus 20 and the high performance data path of the present invention
  • the host-side fiber channel controller interfaces with host computers while the disk-side fiber channel controller interfaces with the disk drives.
  • the host-side fiber channel controller and the disk-side channel controller may be conventional fiber controllers such as the Tachyon™ chip manufactured by Hewlett Packard.
  • the router 18 in the diagram represents the capability of the system to connect the disk-side fiber channel controller 16 to either of the two fiber channel loops that are conventionally present in a fiber channel system.
  • a presently preferred method for connecting to the disk arrays in a fiber channel system is shown in U.S. Patent Application Serial No. 08/749,311, filed November 14, 1996 and entitled "Fail-over Switching System", assigned to the same assignee as the present application, the disclosure of which is hereby incorporated by reference herein.
  • Bus gaskets 22 form an interface between the bus of each of the fiber channel controllers and the PCI bus 20 and the high performance data path of the invention.
  • the PCI bus is connected through a PCI bridge interface 26 with the job processor 24
  • the PCI bridge interface 26 is required to interface the 32-bit PCI bus with the 64-bit 60X bus used in the job processor 24.
  • the job processor 24 includes a microprocessor. In accordance with the presently preferred embodiment, a Motorola Power PC with 200 MHz capability is implemented. In order to complete the job processor 24.
  • the job processor includes a fixed amount of DRAM which in accordance with the presently preferred embodiment is 8 MB for use by the firmware, the disk cache tracking structures and the control structures for the fiber channel controllers.
  • a local resource unit 28 is connected to the PCI bus through a PCI-PCMCIA bridge 30.
  • the local resource unit 28 is an environmental monitor. It provides a collection place for control, status and shared resources used by the RAID controller. These resources include communication posts for external console, field replaceable unit status and standby power monitoring.
  • a RAID controller should have a staging memory.
  • a first host-side memory 32 and a second disk-side memory 34 are provided.
  • An advantageous feature of the present invention is that the host-side memory 32 and the disk-side memory 34 may double in function both as staging memories and caching memories.
  • the separate disk-side and host-side caches are able to be placed as close to their sources as possible with the current design. Disk caching algorithms and architectures are well known in the art.
  • the host-side memory and disk-side memory are ECC (Error Correcting Code) protected DRAM DIMMs in accordance with the presently
  • the memories are each controlled by a memory controller 36.
  • the high performance data path includes a first bus 38 for connection to the memory controller 36 operating the host-side memory 32.
  • the first and second buses are selectively connectable by switching devices 42.
  • the switching devices 42 are also connected to a data path control engine 44
  • the data path control engine is a combined Direct Memory Access unit (DMA), and an XOR engine.
  • the data path control engine may also include a checksumming engine.
  • the switching devices 42 may connect the first bus, the second bus and the data path control engine together all at once or they may be used to isolate either the host-side memory 32
  • the data path control engine 44 includes a DMA/XOR engine 46 and an XOR buffer 48.
  • the first data being passed between the host-side memory and the disk-side memory is copied into the XOR buffer 48.
  • data words are XORed by the DMA/XOR engine with the corresponding data words in
  • a sequencer 50 is used to control the operation of the DMA/XOR engine 46.
  • the sequencer is a microprocessor which performs a list of instructions. The instructions are provided by the job processor 24.
  • a PCI bridge interface 52 interfaces the 64-bit bus of the sequencer 50 with the 32-bit PCI bus.
  • the sequencer 50 works in conjunction with a flashROM 54 that contains the configuration and power up self-test code for the sequencer 50.
  • the sequencer 50 also instructs the DMA with regard to the state of switching devices 42.
  • the sequencer may be a Motorola Power PC that operates up to 100 MHz, in accordance with the presently preferred embodiment.
  • the presently preferred RAID controller is provided with a peer controller.
  • the two controllers are each connected to independent fiber channels under normal operation.
  • a direct SCSI connection between the controllers provides direct communications between the two controllers.
  • a peer bridge 60 facilitates this communication.
  • Memory inside the peer processor is used to mirror the write cache for each of the RAID controllers.
  • Two dedicated 64-bit data paths are provided between the two peer controllers for use in mirroring the caching data. All write data gets mirrored before writing to a disk.
  • mirroring can take place with the write bus 38 isolated from the read bus 40 so that the read bus can handle other operations during the mirroring process.
  • the mirror data in accordance with the presently preferred embodiment is pushed from one controller to the other. That is, one controller will place data in a mirror FIFO 58 which physically exists on the peer controller. The peer controller will remove the data from its mirror FIFO 58 and place it into its own memory.
  • This approach allows the receiving controller to perform certain validity checks on the address of the data and to construct fire walls between the two boards such that if a first controller were to attempt to write data to a location on the peer controller which it is not authorized to, the first controller will be caught and prevented from causing any damage to the receiving peer controller.
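The push model with receiver-side address checks can be illustrated with a small Python sketch. Everything here is hypothetical: the class name `PeerController`, the single contiguous permitted address window, and the dictionary standing in for memory are assumptions chosen for illustration, since the text does not specify how the validity checks are implemented.

```python
from collections import deque


class PeerController:
    """Toy model of the receiving side of the mirror path (assumed design)."""

    def __init__(self, mirror_base, mirror_size):
        self.mirror_fifo = deque()   # stands in for mirror FIFO 58
        # Assumption: one contiguous window the sender may write into.
        self.allowed = range(mirror_base, mirror_base + mirror_size)
        self.memory = {}             # address -> word
        self.rejected = []           # addresses that failed the check

    def drain(self):
        """Receiver pulls entries out of its FIFO and validates each address."""
        while self.mirror_fifo:
            addr, word = self.mirror_fifo.popleft()
            if addr in self.allowed:
                self.memory[addr] = word
            else:
                self.rejected.append(addr)


peer = PeerController(mirror_base=0x1000, mirror_size=0x100)
peer.mirror_fifo.extend([(0x1000, 7), (0x2000, 9)])  # sender pushes two writes
peer.drain()
```

The point of the design is that the sender never writes the peer's memory directly; the receiver decides what to accept, so an out-of-range write is dropped rather than applied.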
  • the mirrored cache provides protection in the event of a memory failure or processor failure. In the event of such a failure the host's data remains available in the mirrored cache.
  • the high performance data path of the invention is for use in conjunction with a disk array in which the data on a group of disks is interrelated.
  • the interrelation involves providing parity in one of the disks.
  • the parity data is an XOR of the data in each of the other disks.
  • the disks are typically striped so that sectors in each of the disks belong to a stripe. Sectors belonging to a stripe are all said herein to correspond to one another. All the data sectors in a stripe are XORed with one another to produce the parity sector for that stripe.
  • a RAID 3 read is performed by reading the sector from each disk in a stripe from an interrelated group of disks. Reading from the disks takes place asynchronously. During the reading of data from the disks into the disk-side memory, the switching devices 42 isolate the read bus 40 from the write bus 38 and the data path control engine 44. As the data arrives, it is written into the disk-side memory 34. Thus, the data is staged in the disk-side memory pending completion of the entire read operation. Typically, a RAID 3 read only reads the data sectors in a stripe.
  • a mode may be selected in which all of the sectors, including the parity sector, are read, but it is not necessary to wait for the last disk drive to provide its sector of data.
  • the data from the first N-1 disks (where N is the total number of disks in a group) in the interrelated group is successively moved from the disk-side memory to the host-side memory.
  • the switching devices 42 interconnect the disk-side memory, the host-side memory and the data path control engine.
  • a copy of the data for the first move is copied into or filled into the XOR buffer 48.
  • the DMA/XOR engine 46 performs an XOR on that data with the corresponding words of data that is in the XOR buffer 48.
  • the result of the XOR operation is replaced into the XOR buffer
  • the XORing is successively repeated until all except one of the corresponding sectors in the stripe have been moved.
  • the job processor 24 always knows what data and parity has been received from the disks based on which disk I/Os have completed. This is used in providing the sequencer 50 with its list of operations to perform. If the final missing sector is a data sector, the contents of the XOR buffer 48 is then moved into the host-side memory, as it should be equal to the data in the sector that has not yet arrived. If the final missing sector is the parity sector, the contents of the XOR buffer 48 is not needed and the normal RAID 3 moves of the data sectors have been sufficient.
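The read-side reconstruction can be sketched as follows. This is a simplified software model under stated assumptions: sectors are Python lists of integer words, the function name `read_with_reconstruction` is invented for the example, and only the XOR buffer behaviour is modelled (the concurrent move into host-side memory is omitted).

```python
def read_with_reconstruction(arrived, missing_is_data):
    """Model the XOR buffer while N-1 sectors of a stripe move past it.

    arrived: the N-1 sectors that have already been read, in arrival order.
    missing_is_data: True when the sector still outstanding is a data sector.
    Returns the reconstructed sector, or None when only parity is missing.
    """
    xor_buffer = list(arrived[0])          # first move fills the buffer
    for sector in arrived[1:]:             # later moves are XORed in
        xor_buffer = [a ^ b for a, b in zip(xor_buffer, sector)]
    return xor_buffer if missing_is_data else None


# Illustrative 3-data-disk stripe with two words per sector.
d0, d1, d2 = [1, 2], [4, 8], [16, 32]
parity = [a ^ b ^ c for a, b, c in zip(d0, d1, d2)]

# Disk holding d1 has not responded; the other sectors have arrived.
rebuilt = read_with_reconstruction([d0, d2, parity], missing_is_data=True)
```

Arrival order does not matter because XOR is commutative and associative, which is what lets the engine work on sectors in whatever order the disks deliver them.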
  • a write operation assisted by the high performance data path operates as follows
  • the data to be written is written into the host-side memory 32 while the write bus 38 is isolated from the read bus 40 and the data path control engine 44. Thus, the data is staged in the host-side memory pending completion of the write.
  • the host-side memory is switched into connection with the data path control engine and the disk-side memory. Data sectors are moved successively from the host-side memory into the disk-side memory.
  • the first data sector is copied and filled into the XOR buffer 48.
  • its data words are XORed in the DMA/XOR engine 46 with corresponding words in the XOR buffer 48 to produce a result that is placed back into the XOR buffer 48.
  • the contents of the XOR buffer 48 is moved to the disk-side memory and will be treated as the parity sector.
  • the data sectors and the parity sector of the stripe are then within the disk-side memory and available for writing into the respective disks under the control of the disk-side fiber channel controller 16.
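The write sequence above, in which parity accumulates while sectors move from host-side to disk-side memory, can be modelled in software roughly as below. The function name `stage_write` and the list-based memories are assumptions for illustration; in the hardware the copy and the XOR happen in the same bus transfer rather than as separate steps.

```python
def stage_write(host_sectors):
    """Move data sectors to a modelled disk-side memory, building parity as they pass.

    host_sectors: data sectors staged in host-side memory (lists of words).
    Returns the disk-side contents: the data sectors followed by the parity sector.
    """
    disk_side = []
    xor_buffer = None
    for sector in host_sectors:
        disk_side.append(list(sector))         # move into disk-side memory
        if xor_buffer is None:
            xor_buffer = list(sector)          # first sector fills the buffer
        else:                                  # later sectors are XORed in
            xor_buffer = [a ^ b for a, b in zip(xor_buffer, sector)]
    disk_side.append(xor_buffer)               # buffer contents become parity
    return disk_side


staged = stage_write([[1, 2], [4, 8], [16, 32]])
```

No separate pass over the data is needed to compute parity, which is the performance argument the description makes for XORing on the fly.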
  • the logic device may be a Complex Programmable Logic Device (CPLD) such as those made by Advanced Micro Devices.
  • the memory control logic device 70 is made up of 3 sections: an address controller, a data path controller and a DIMM controller.
  • the address controller latches addresses and commands off the respective bus. Parity checking is performed on the address.
  • the logic device 70 decodes the address, generates the row and column addresses, increments the CAS (column address strobe) address during burst operations and handles page crossings.
  • the data controller portion of the logic device 70 is responsible for all the data path control and the data valid handshake protocol between an initiator and a target.
  • the DIMM controller portion handles RAS (row address strobe), CAS, WE (write enable).
  • the DIMM controller contains RAS and CAS state machines.
  • the CAS machine is the only part of the controller that runs at 66 MHz.
  • the remainder of the controller operates at 33 MHz in the presently preferred embodiment.
  • the address controller of the logic device works in conjunction with an address register 72.
  • the address register consists of two bus transceivers with parity generators/checkers.
  • one of the inputs to the address controller is called N_X_FRAME.
  • the N_X_FRAME signal is asserted by the current bus master to indicate the beginning and duration of an access. It is deasserted at the beginning of the last data phase.
  • the word "beat" as used herein refers to a single bus cycle
  • the latch is closed the next cycle.
  • the latch is reopened when the operation is complete.
  • Odd byte parity is checked on each address, and is valid the cycle after N_X_FRAME is asserted.
  • a read/write bit, X_W_NR, is asserted with N_X_FRAME for one cycle during the address phase.
  • Another similar bit in the presently preferred embodiment is entitled EDAC_CMD.
  • the EDAC bit defines the operation as an EDAC mode register write for power up configuration or an EDAC diagnostic register read.
  • the Xbus address bus X_AD is time multiplexed with the least significant 32 bits of data. It is asserted with N_X_FRAME for one cycle.
  • the address parity pins X_PAR are odd byte parity. They are asserted with N_X_FRAME for one cycle. Bit 3:0 contains address parity for X_AD. Addresses are checked for odd byte parity when they are latched into the address register the cycle after N_X_FRAME is asserted.
  • the address register 72 sets the address parity error flag N_X_ADDR_PERR in the cycle after N_X_FRAME is asserted.
  • the bus address BA is accessed by the logic device 70 from the address register 72
  • the address register is given an address for beginning a memory access, and that address is incremented until the access is completed.
  • the column address has an incrementer.
  • N_RAS is the row address strobe for DIMMs 33. It is asserted two cycles after N_X_FRAME and deasserted the cycle after the CAS state machine goes idle. On a page crossing, it is deasserted for precharge while the new row address is being generated.
  • N_CAS is the column address strobe for the DIMMs 33.
  • the presently preferred embodiment is configured to support a page size independent of DRAM size.
  • the memory design transparently supports page crossings to allow synchronized data transfers between the disk-side and host-side memories. Thus, there is no need for additional wait states during page crossings.
  • the disk-side and host-side memories of the presently preferred invention are inserted into noninterleaved DIMM sockets.
  • the memory of a desired size can be configured
  • the DIMM socket is voltage keyed, so that only 3.3 volt DRAMS can be inserted into the socket.
  • the DIMMs 33 preferably use 74 LVT 16244's to buffer the address and all control signals except N_RAS. There are decoupling capacitors mounted beside each DRAM.
  • the DIMM controller of the logic device 70 includes a RAS and a CAS state machine.
  • the RAS state machine is the master state machine that handles both refresh and read/write accesses.
  • N_RAS is asserted three cycles after N_X_FRAME is asserted and remains asserted until the CAS state machine returns to idle. During a page crossing, the RAS state machine cycles back through idle in order to precharge N_RAS while the new row address is being generated.
  • the RAS machine kicks off the CAS state machine and then asserts N_RAS a cycle after N_CAS is asserted.
  • the CAS state machine runs off of a 66 MHz clock and is triggered by the RAS state machine.
  • N_CAS is asserted for 15 ns and then precharged for 15 ns. If wait states are asserted by the master, or a page crossing occurs, N_CAS remains in precharge.
  • when the CAS state machine is kicked off, it syncs up with a 33 MHz clock signal, so that N_CAS is always asserted on the positive edge of the 66 MHz clock signal which aligns with the positive edge of the 33 MHz clock signal. This way the machine only samples 33 MHz signals in the second half of the 30 ns cycle.
  • Each of the DIMMs has separate enable signals indicated by N_WE(3:0) and N_MOE(3:0).
  • N_MOE is asserted in the same cycle as N_RAS on reads and for as long as N_RAS is asserted unless there is a page crossing. During a page crossing, the output enable signals are switched while N_RAS is precharging.
  • the write enable signal N_WE is handled in the same way. It is asserted in the same cycle as N_RAS on writes, for as long as N_RAS is asserted unless there is a page crossing.
  • a timer in the local resource unit (LRU) 28 will assert N_REF_TIME every 15.4 µs.
  • the logic device 70 includes a refresh state machine that will assert N_X_REF_REQ in the following cycle to request a refresh cycle, regardless of the current state of the controller. It is up to an arbiter programmed into a CPLD to not grant the refresh until the Xbus is idle. Upon receiving N_X_REF_GNT, a CBR refresh cycle is kicked off by the RAS and the CAS state machines. The disk-side and host-side memories are refreshed simultaneously. N_X_REF_REQ remains asserted until the refresh is complete. This will let the arbiter know when the Xbus is free. The arbiter will not grant the bus to anyone else during refresh.
  • the data path controller of the logic device 70 controls a first data register 76, a second data register 78, an error detection and correction device 80 and a data transceiver 82 and the data valid handshake interface protocol to the Xbus.
  • On a read, data is clocked into the second data register 78 from the enabled DIMM.
  • the data is then clocked into the first data register 76 on the positive edge of the 33 MHz clock. There is a full cycle to get through the EDAC 80 and make setup into the data transceiver 82.
  • the EDAC 80 does ECC detection and correction on the fly and generates odd byte parity for the Xbus.
  • N_RD_XOE is the output enable of the data transceiver.
  • the EDAC checks the parity and generates ECC check bits.
  • the next cycle data is clocked into the second data register 78 and in the subsequent cycle, data is written into the appropriate DIMM.
  • a pre-target-ready signal is used to aid the DMA 66 in controlling the initiator ready signals between the disk-side and host-side memories during a page crossing.
  • the pre-target-ready signal suspends the data transfer while the memory prepares to cross a page boundary.
  • the pre-target-ready signal is asserted the cycle before data is valid on the Xbus. For moves between the write and read memories, this allows the DMA controller to assert an initiator ready signal to the target of the read the following cycle.
  • the next cycle a target ready signal is asserted until the cycle after N_X_FRAME is deasserted. (On single beat operations, it is just asserted for one cycle if the initiator ready signal is also asserted.)
  • the pre-target-ready signal is asserted the cycle before it is ready to accept data from the Xbus. For moves between write and read memories, this allows the DMA to assert the initiator ready signal to the source of the write in the following cycle. Assertion and deassertion of the target ready signal is the same as for reads.
  • the EDAC 80 of the presently preferred embodiment is an IDT49C466A. It is a 64-bit flow-through error detection and correction unit. It uses a modified Hamming code to correct all single bit hard and soft errors and detect all double bit and some multiple bit errors.
  • the flow-through architecture with separate system and memory buses, allows for a pipeline data path.
  • the DMA/XOR engine consists of two subunits. The first is a data engine and the second is an XOR engine.
  • the data engine is responsible for generating/checking the checksum and for moving data between the disk-side and host-side memories, from the host-side memory to the mirror FIFO 58, from the XOR buffer 48 to the disk-side and host-side memories, from the disk-side and host-side memories to the XOR buffer 48 and immediate data moves to and from either of the disk-side or host-side memories, the XOR buffer or the mirror FIFO 58.
  • Immediate data moves are precoded predetermined moves of a constant number value (no address needed to define the number).
  • the data engine also monitors the status of the second peer processor mirror FIFO 58. If there is data in the mirror FIFO 58, the DMA 66 will contain a status bit depicting it. Once the peer sequencer has moved its mirrored data into its host-side memory, it will signal to the DMA sequencer that it was finished and that no error has occurred during the transfer.
  • the second unit of the DMA/XOR engine is the XOR engine which generates RAID parity data, checks the RAID 3 parity data and checks for hardware bus (data and address path) parity errors from the disk-side and host-side memories.
  • the XOR engine is made up of programmable logic devices and the XOR buffer 48. The XOR engine will return RAID data status to the data engine and hardware parity errors to the local resource unit (LRU) 28.
  • the switching devices 42 are implemented as three separate switches. A first switch 62 may be used to isolate the write bus 38 from the read bus 40 and the data path control engine 44.
  • a second switch 64 may be used to isolate the read bus 40 from the write bus 38 and the data path control engine 44.
  • a third switch 68 is connected to receive the output of the XOR buffer 48.
  • the third switch 68 allows the results in the XOR buffer 48 to be loaded onto the read or write bus depending on the state of the other two switches.
  • the switches in a presently preferred embodiment are implemented by crossbar switches, in particular, CBT16212 switches.
  • the state of the switches is controlled by the DMA 66.
  • the status of the switches may be controlled to connect all of the read bus 40, write bus 38 and the data path control engine 44 together. Alternatively, the switches may be controlled to isolate one or more of these data paths from the others.
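One way to picture the switch states is as a function from the two isolating switches to the set of paths that are joined. The sketch below is a simplified model, not the circuit: it assumes each of switches 62 and 64 simply joins its bus to the node where the data path control engine sits, and the function and label names are hypothetical.

```python
def connected_paths(sw62_closed, sw64_closed):
    """Return the set of paths joined to the data path control engine.

    Assumed topology: switch 62 joins write bus 38 to the engine node,
    switch 64 joins read bus 40 to the engine node.
    """
    group = {"data_path_engine_44"}
    if sw62_closed:
        group.add("write_bus_38")
    if sw64_closed:
        group.add("read_bus_40")
    return group


# Staging a host write: write bus isolated, read bus free for other work.
during_host_staging = connected_paths(sw62_closed=False, sw64_closed=True)

# Moving sectors between memories with XOR on the fly: everything joined.
during_xor_move = connected_paths(sw62_closed=True, sw64_closed=True)
```

Under this model, isolating a bus lets its memory take traffic from its fiber channel controller while the other bus is busy with a different operation.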
  • the DMA 66 is programmed by the sequencer 50 and returns status to the sequencer when it has completed its task.
  • the sequencer 50 communicates with the flashROM 54 and DMA 66 through a transceiver 74.
  • the transceiver 74 provides a buffering path from the sequencer to registers in the DMA.
  • the sequencer reads and writes to the DMA registers.
  • the sequencer 50 uses the program in the flashROM 54 to boot up.
  • the DMA 66 communicates in the form of read/write operations to either the disk-side or host-side memories, the mirror FIFO or the XOR engine. For every sector move, the data engine will automatically generate or check the checksum of that data.
  • the checksum will also be generated any time the XOR buffer 48 is being unloaded.
  • a bit (“sector begin” bit) is set to indicate the beginning of the sector transfer. The setting of this bit will cause the checksum logic to XOR the contents of a checksum seed with the first piece of data. If this bit is not set, then the first piece of data is XORed with the contents of the checksum register.
  • the sector begin bit will allow for variability in the sector's size without modifying the checksum logic.
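The effect of the sector begin bit can be sketched in Python as below. The class name `ChecksumLogic`, its attributes, and the seed value are assumptions for illustration; the text specifies only that the seed is used when the bit is set and the running checksum register otherwise.

```python
class ChecksumLogic:
    """Toy model of checksum accumulation across one or more transfers."""

    def __init__(self, seed=0):
        self.seed = seed        # checksum seed (value is an assumption)
        self.register = 0       # running checksum register

    def transfer(self, words, sector_begin):
        """Fold one transfer into the checksum and return the running value."""
        if sector_begin:
            self.register = self.seed   # start a new sector from the seed
        for word in words:
            self.register ^= word       # otherwise continue from the register
        return self.register


# A sector sent as two transfers gives the same checksum as one transfer.
split = ChecksumLogic(seed=0xFF)
split.transfer([0x0F, 0xF0], sector_begin=True)
split_result = split.transfer([0x01], sector_begin=False)

whole_result = ChecksumLogic(seed=0xFF).transfer([0x0F, 0xF0, 0x01], sector_begin=True)
```

Because only the first transfer of a sector restarts from the seed, the same logic handles a sector of any length or any number of transfers.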
  • the calculated checksum will be returned to the sequencer
  • the propagation of data is suspended whenever a checksum is generated and merged onto the data bus. The cycle is used to turn around the data bus and to finish up the generation of the checksum.
  • the DMA includes a CMD/status register with five fields.
  • a first field tells the DMA whether the operation is a data movement that involves a disk-side or host-side memory or whether it is a read/write of immediate data, i.e., a single beat (8 bytes) of data.
  • the data source must be specified by the source field of the CMD/status register with the source address in the source register.
  • the source is the XOR buffer 48.
  • the XOR engine will unload the data and at the same time refill itself with the same data. This allows RAID 3 reconstructed data to stay in the XOR buffer 48 so it can be checked against the final piece of data.
  • the flashROM 54 is used to store boot codes.
  • the flashROM of the presently preferred embodiment is a 512KB x 8 and will reside on the most significant byte lane of the data bus.
  • the DMA engine always acts as an initiator and always is responsible for driving the initiator ready signal even when it is not sourcing the data.
  • the disk-side and host-side memory and the mirror FIFO will always be targets.
  • the write bus 38 and the read bus 40 will each have its own interface control signals even when the buses are joined together.
  • the DMA synchronizes data flow between the two memory systems. DMA will handshake with the memories through the initiator ready and the pre-target-ready signals. When a memory is ready to receive or send data, it will assert the pre-target-ready signal one cycle before the data is valid. In response, the DMA will assert the initiator ready signal.
  • when both memories are being used in the transfer, the DMA will not assert the initiator ready signal to the memories until it has received both pre-target-ready signals. Once it has received both of these signals, it will assert the initiator ready signal at the same time. At this time, the data is considered valid on the Xbus, i.e., the read and write busses.
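The two-memory handshake can be modelled cycle by cycle. The sketch below is an illustration under assumptions: signals are sampled once per bus cycle as booleans, both pre-target-ready signals must be seen asserted in the same cycle, and the initiator ready signal follows one cycle later. The function name is hypothetical.

```python
def first_initiator_ready_cycle(host_pre_ready, disk_pre_ready):
    """Return the cycle in which the DMA would first assert initiator ready.

    host_pre_ready, disk_pre_ready: per-cycle samples of each memory's
    pre-target-ready signal. Returns None if the two never coincide.
    """
    for cycle, (host, disk) in enumerate(zip(host_pre_ready, disk_pre_ready)):
        if host and disk:
            return cycle + 1    # asserted the cycle after both are seen
    return None


# Host-side memory is ready from cycle 1, disk-side memory only from cycle 2.
ready_cycle = first_initiator_ready_cycle(
    [False, True, True],
    [False, False, True],
)
```

Waiting for both signals is what keeps the two memories in step, so a page crossing in either one stalls the transfer for both.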
  • a time stamp is a unique number that is located in a validity field ot each data sector The unique number is inserted in each sector in a RAID group during a major st ⁇ pe update.
  • a major stripe update refers to w ⁇ ting data filling every sector in a RAID stripe It data is written into a sector thereafter, then the time stamp is invalidated.
  • a new time stamp is inserted at the next stripe update
  • a write stamp provides a bit tor each disk in a group When a w ⁇ te is made to a disk the value of the bit tor that disk is flipped, i.e..
  • the DMA will only suspend an operation at the end of the data transfer in order to append the checksum and the stamp information.
  • the initiator ready signal will be deasserted for one cycle and then asserted for one cycle when the 8 bytes including stamp information from the DMA and checksum information from the XOR engine are on the bus.
  • the one cycle delay is needed in order to complete the generation of the checksum.
  • the stamp information is stripped off and placed in stamp registers in the DMA.
  • the stamp information is not sent to the host-side memory.
  • the host-side memory stores 512 bytes for each sector and the disk-side memory stores 520 bytes per sector.
  • the preferred XOR buffer 48 is a FIFO.
  • the FIFO 48 holds one sector.
  • the FIFO 48 can be loaded from one of three paths.
  • the choice of paths is switched by operation of a multiplexor 84 which may be separated from the FIFO 48 by a buffer 86.
  • the first is with data directly from the data bus.
  • a bus interface 88 removes data from the bus for use in the XOR engine. Data is registered externally and parity checked before it is written to the FIFO.
  • a second path is a feedback path from the FIFO into itself. This is needed in order to maintain a copy in the FIFO while unloading the FIFO data onto the bus.
  • the final logical path into the FIFO is the XOR path.
  • the XOR logic 90 exclusive ORs the registered data from the bus with the contents of the FIFO. The result of the XOR is input to the FIFO 48 behind the old data as the old data is sequentially output to the exclusive OR logic. The existing data is thus replaced by the exclusive OR results.
  • the selection of the needed path is done by the sequencer by setting the appropriate bits in the CMD/status register. Whenever the XOR FIFO is being unloaded, the DMA logic can be programmed to append the generated checksum and the write and time stamp information found in the DMA register sets.
  • the XOR engine can advantageously be used to detect RAID 3 user data parity errors.
  • the XOR engine can be programmed through the DMA to reconstruct the final piece of data.
  • When the data has been regenerated, it can be unloaded from the FIFO buffer 48 to the host-side memory. If the new data is to be compared to the old data, then during the unload operation, the check parity data bit can be set. This allows the new data that is being read from the XOR FIFO to be written back into the FIFO buffer 48. The old data when it comes in can then be compared to the new data by setting this bit with an XOR FIFO operation.
  • FIG. 4 illustrates the DMA logic for performing the checksum on the fly.
  • a checksum buffer 92 holds the checksum data as it is being computed.
  • a multiplexor 94 feeds the checksum XOR logic 96 with either the data in the checksum buffer 92 or an initial seed data found in a DMA register. The other input to the XOR logic 96 is the data being checksummed.
  • the high performance data path of the present invention permits an "N-1 and go" mode for performing RAID read operations. If one of the drives is struggling or not operating, N-1 and go mode can maintain a fairly constant transfer rate to the host. While N-1 and go is quicker in situations where a drive is struggling or not operational, under normal conditions the mode is slower than normal operation because there is an added latency for handling the XOR operation and there is one extra request from the disk-side fiber loop. Instead of requesting data from the data sectors only, the parity sector is also requested. The advantage of this mode is that the throughput is more deterministic.
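The time stamp and write stamp bookkeeping listed above can be illustrated with a minimal Python sketch. The class name `SectorStamps`, the use of `None` to represent an invalidated time stamp, and the integer bit mask for the write stamp are assumptions made purely for illustration; the source does not say how the stamps are encoded.

```python
class SectorStamps:
    """Hypothetical model of the time stamps and write stamp of one stripe."""

    def __init__(self, num_disks):
        # One time stamp per sector; None models an invalidated stamp (assumption).
        self.time_stamps = [None] * num_disks
        # One write-stamp bit per disk in the group, kept here as an integer mask.
        self.write_stamp = 0

    def major_stripe_update(self, unique_number):
        # Every sector in the stripe is written, so each receives the new time stamp.
        self.time_stamps = [unique_number] * len(self.time_stamps)

    def write_sector(self, disk_index):
        if not 0 <= disk_index < len(self.time_stamps):
            raise IndexError("disk index out of range")
        # A write after the major stripe update invalidates that sector's time stamp
        self.time_stamps[disk_index] = None
        # and flips the write-stamp bit belonging to that disk.
        self.write_stamp ^= 1 << disk_index


stamps = SectorStamps(num_disks=5)
stamps.major_stripe_update(0x1234)
stamps.write_sector(2)
stamps.write_sector(2)  # second write flips the bit for disk 2 back
stamps.write_sector(4)
```

Flipping with XOR means each write toggles only the bit of the disk written, which matches the "one bit per disk" description.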

Abstract

A high performance data path for performing XOR on the fly. A first memory is connected to a first bus and a second memory is connected to a second bus selectively coupled to the first bus. Logic for performing an XOR can be switched into connection with the first and second bus for XORing data in a buffer with the data passed from one of the memories to the other memory. The result is replaced into the buffer to permit successive XORing. When reading from an interrelated group of disks such as a RAID 3 group, the data path permits an N-1 and go mode in which a read does not wait for data from the last disk to retrieve its data sector. If the last disk contains data (as opposed to parity) the data is obtained from the XORed data in the XOR buffer of the high performance data path. For writing data, the XOR on the fly generates the parity sector for writing at the completion of a write to an interrelated group of disks.

Description

High Performance Data Path with XOR on the Fly
Background of the Invention
This invention relates to high availability disk arrays for use in data processing systems and, more particularly, to a high performance data path for accommodating validity and error checking while maintaining an acceptable and reliable throughput during disk access operations.
The bottleneck of many systems and their applications is located at the I/O level. Applications are demanding more and more speed and bandwidth from their data storage products. In addition to speed, an application such as Video on Demand (VOD) needs its disk accesses to be timely both in terms of speed and interval. That is, the requirement for VOD is that of high bandwidth without interruption. Specifically, the speed required for VOD must be sustained without interruption, resulting in uninterrupted movie clips.
Disk array systems have been developed to provide high availability and reliability. In particular, systems known as Redundant Array of Inexpensive Disks (RAID) provide resilience to disk failure through the use of multiple disks and data distribution and correction techniques. Unfortunately, the techniques for increasing reliability often result in slowing down an I/O operation.
It is desirable to devise systems and techniques for maintaining the reliability of RAID systems and at the same time increasing their speed.
Summary of the Invention
The present invention is directed to a high performance data path for use in particular in a RAID controller. The high performance data path permits accelerated methods for reading and writing data to an array of disks implementing an XOR based parity system. The high performance data path includes a first bus and a second bus that are switchably interconnected. A first memory resides on the first bus and a second memory resides on the second bus. An XOR engine is switchably connectable to the first and second buses. The XOR engine is used to XOR data that is passed between the first and second memories across the first and second buses with data in a buffer and to place the result back into the buffer. Of the two memories, one may be referred to as a host-side memory and the other a disk-side memory. The host-side memory stages data for a write and the disk-side memory stages data received from the disks to complete a read. The two memories may also advantageously be used as caches.
The high performance data path is useful in performing both reads and writes of data to a redundant array of disks. Groups of disks are interrelated in that a sector in one of the disks corresponds with a sector in each of the other disks in the group. The sector of one of the disks in the group holds parity comprised of the results of XORing corresponding words in the corresponding data sectors. When a read from the disks is performed over the high performance data path, an XOR of corresponding words may be performed on the fly, i.e., as data is moved past the XOR engine. Taking advantage of this, it is possible to deliver the data to a host computer before all of the disks have responded to the read. The missing data from the final disk is essentially reconstructed on the fly in the XOR engine. The write operation to the disks is also accelerated in that as data is moved from the staging area in the host-side memory to the disk-side memory, which are separately provided in the high performance data path, they are simultaneously XORed in the XOR engine. Thus, upon completion of the movement of the data from host-side memory to the disk-side memory, the XOR buffer contains the XORed parity for moving into the parity disk.
A DMA controls the XOR engine and also may provide additional functionality. For example, the DMA may be used to create a checksum for data being written and to check the checksum of data being read. By operating on the fly, the DMA and the XOR operation are advantageously completed without performance degradation.
Other objects and advantages of the present invention will become apparent during the following description taken in conjunction with the drawings.
Brief Description of the Drawings
FIG. 1 is a block diagram of a RAID controller including the high performance data path of the present invention.
FIG. 2 is a block diagram of the memory controller and memory used with each of the disk-side and host-side memories in the invention of FIG. 1.
FIG. 3 is a block diagram of a data path control engine for use in the high performance data path of the present invention.
FIG. 4 is a block diagram of the DMA logic for including in the data path control engine of FIG. 3.
Detailed Description of the Presently Preferred Embodiment
The high performance data path of the invention shall now be described in the context of the RAID controller shown in FIG. 1. The RAID controller supervises I/O operations between a disk array and the host computers. Data on the disks is arranged according to a RAID algorithm, such as RAID 3 or RAID 5 for example. The RAID controller of the presently preferred embodiment is implemented for use with a fiber channel. A fiber channel protocol has been standardized for system communication with disk arrays. A host-side fiber channel controller 12 is provided at the host end of the RAID controller for interfacing with host computers. A hub 14 allows for more than one host to connect to the host-side fiber channel controller 12. The host-side fiber channel controller, in conjunction with driver software, implements the host side Fibre Channel Protocol. The host-side fiber channel controller moves two types of data between itself and an attached memory system: control data and user disk data. Control data has a relatively low bandwidth requirement and user disk data has a relatively high bandwidth requirement. The high performance data path of the present invention is designed to accelerate operations involving user disk data.
Control data consists of control structures, a small amount of FCP data (mode pages, etc.) and fiber channel data (login, etc.). This is all infrequently accessed data. The bandwidth requirement of this data is relatively small and the processor requirement to read and write this data is relatively high. As a result, the control data is all stored in a shared memory system which the job processor 24 also uses for its own operating software. All control data fetches to and from the host-side fiber channel controller 12 and a disk-side fiber channel controller 16 are done through an upper bus 20 which is a standard PCI bus.
Disk data is the data which passes between the host and the disks. There is no need for the job processor 24 to read or write this data. There is a need for this data to be XORed and checksummed. Words of data are XORed with corresponding words in a parity sector to obtain updated parity. A checksum is determined for a sector by XORing together all the disk data words in the sector. The bandwidth requirements of disk data are significant. As a result, the specialized high performance data path of the present invention is intended for use with all disk data movement.
The disk-side fiber channel controller 16, in conjunction with disk side driver software, implements the fiber channel arbitrated loop protocol among the disk arrays. Both the host-side fiber channel controller 12 and the disk-side fiber channel controller 16 have access to the PCI bus 20 and the high performance data path of the present invention. The host-side fiber channel controller interfaces with host computers while the disk-side fiber channel controller interfaces with the disk drives. The host-side fiber channel controller and the disk-side channel controller may be conventional fiber controllers such as the Tachyon™ chip manufactured by Hewlett Packard.
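The sector checksum described above, formed by XORing together all the disk data words in a sector, can be sketched in Python as follows. The function name `xor_checksum`, the 64-bit word size (chosen to match one 8-byte bus beat), the big-endian byte order and the `seed` parameter are assumptions for illustration only; the text does not fix the checksum word width.

```python
import struct

WORD_BYTES = 8  # assumption: one checksum word per 8-byte bus beat


def xor_checksum(sector, seed=0):
    """XOR together all 64-bit words of a sector, starting from an initial seed."""
    if len(sector) % WORD_BYTES:
        raise ValueError("sector length must be a multiple of the word size")
    checksum = seed
    # iter_unpack yields one-element tuples, one per big-endian 64-bit word.
    for (word,) in struct.iter_unpack(">Q", sector):
        checksum ^= word
    return checksum


sector = bytes(range(256)) * 2        # 512 bytes of sample user data
checksum = xor_checksum(sector)
# Re-checking with the stored checksum as the seed leaves zero when data is intact.
residue = xor_checksum(sector, seed=checksum)
```

Because XOR is its own inverse, checking a sector amounts to recomputing the checksum and confirming that it cancels against the stored value.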
The router 18 in the diagram represents the capability of the system to connect the disk-side fiber channel controller 16 to either of the two fiber channel loops that are conventionally present in a fiber channel system. A presently preferred method for connecting to the disk arrays in a fiber channel system is shown in U.S. Patent Application Serial No. 08/749,311, filed November 14, 1996 and entitled "Fail-over Switching System", assigned to the same assignee as the present application, the disclosure of which is hereby incorporated by reference herein.
Bus gaskets 22 form an interface between the bus of each of the fiber channel controllers and the PCI bus 20 and the high performance data path of the invention. The PCI bus is connected through a PCI bridge interface 26 with the job processor 24. The PCI bridge interface 26 is required to interface the 32-bit PCI bus with the 64-bit 60X bus used in the job processor 24. The job processor 24 includes a microprocessor. In accordance with the presently preferred embodiment, a Motorola Power PC with 200 MHz capability is implemented. In order to complete the job processor 24, there is also a level 2 processor cache, a control DRAM and a flashROM. RAID controller firmware relating to power up procedures is found in the flashROM. Firmware in the flashROM executes in the job processor 24. The job processor includes a fixed amount of DRAM, which in accordance with the presently preferred embodiment is 8 MB, for use by the firmware, the disk cache tracking structures and the control structures for the fiber channel controllers.
A local resource unit 28 is connected to the PCI bus through a PCI-PCMCIA bridge 30. The local resource unit 28 is an environmental monitor. It provides a collection place for control, status and shared resources used by the RAID controller. These resources include communication posts for external console, field replaceable unit status and standby power monitoring.
The high performance data path in the RAID controller shall now be described. Through normal operation of a RAID controller, staging memory is required to account for the asynchronous nature of disk I/O. Logically related but physically separate disk operations often need to be staged while awaiting completion of all disk operations. Thus, a RAID controller should have a staging memory. In accordance with the present invention, a first host-side memory 32 and a second disk-side memory 34 are provided. An advantageous feature of the present invention is that the host-side memory 32 and the disk-side memory 34 may double in function both as staging memories and caching memories. The separate disk-side and host-side caches are able to be placed as close to their sources as possible with the current design. Disk caching algorithms and architectures are well known in the art. The host-side memory and disk-side memory are ECC (Error Correcting Code) protected DRAM DIMMs in accordance with the presently preferred embodiment. The memories are each controlled by a memory controller 36. The high performance data path includes a first bus 38 for connection to the memory controller 36 operating the host-side memory 32. There is also a second bus 40 for connection with the memory controller operating the disk-side memory 34. The first and second buses are selectively connectable by switching devices 42.
The switching devices 42 are also connected to a data path control engine 44. The data path control engine is a combined Direct Memory Access unit (DMA) and an XOR engine. The data path control engine may also include a checksumming engine. The switching devices 42 may connect the first bus, the second bus and the data path control engine together all at once or they may be used to isolate either the host-side memory 32 or the disk-side memory 34 from the others. The data path control engine 44 includes a DMA/XOR engine 46 and an XOR buffer 48. In performing an XOR on data transfers, the first data being passed between the host-side memory and the disk-side memory is copied into the XOR buffer 48. In subsequent data transfers between the two memories, data words are XORed by the DMA/XOR engine with the corresponding data words in the XOR buffer 48. Each time, the result of the XORing is replaced into the XOR buffer 48. In performing a write operation to interrelated RAID disks, when all data in a corresponding group has been XORed, the parity data is found in the XOR buffer.
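The accumulate-in-place behavior of the XOR buffer described above can be modeled in a few lines of Python. This is only a software sketch of what the hardware does as data passes by; the class name `XorBuffer`, the method `pass_through` and the byte-wise (rather than word-wise) XOR are illustrative assumptions.

```python
class XorBuffer:
    """Hypothetical software model of the XOR buffer 48 fed by the DMA/XOR engine 46."""

    def __init__(self):
        self.contents = None  # empty until the first transfer passes by

    def pass_through(self, sector):
        if self.contents is None:
            # The first data passed between the memories is copied into the buffer.
            self.contents = bytes(sector)
        else:
            if len(sector) != len(self.contents):
                raise ValueError("sectors must have equal length")
            # Later transfers are XORed with the buffer and the result replaces it.
            self.contents = bytes(a ^ b for a, b in zip(self.contents, sector))
        # The data itself continues unchanged to the destination memory.
        return sector


data_sectors = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
xor_buffer = XorBuffer()
for data_sector in data_sectors:
    xor_buffer.pass_through(data_sector)
parity = xor_buffer.contents  # parity of the three sample sectors
```

After the last data sector of a group has passed, the buffer holds the parity without any separate pass over the data.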
A sequencer 50 is used to control the operation of the DMA/XOR engine 46. The sequencer is a microprocessor which performs a list of instructions. The instructions are provided by the job processor 24. A PCI bridge interface 52 interfaces the 64-bit bus of the sequencer 50 with the 32-bit PCI bus. The sequencer 50 works in conjunction with a flashROM 54 that contains the configuration and power up self-test code for the sequencer 50. The sequencer 50 also instructs the DMA with regard to the state of switching devices 42. The sequencer may be a Motorola Power PC that operates up to 100 MHz, in accordance with the presently preferred embodiment.
The presently preferred RAID controller is provided with a peer controller. The two controllers are each connected to independent fiber channels under normal operation. A direct SCSI connection between the controllers provides direct communications between the two controllers. A peer bridge 60 facilitates this communication. In order to maintain a highly available write cache in the host-side memory 32, it must be mirrored. Memory inside the peer processor is used to mirror the write cache for each of the RAID controllers. Two dedicated 64-bit data paths are provided between the two peer controllers for use in mirroring the caching data. All write data gets mirrored before writing to a disk. Advantageously, mirroring can take place with the write bus 38 isolated from the read bus 40 so that the read bus can handle other operations during the mirroring process. The mirror data in accordance with the presently preferred embodiment is pushed from one controller to the other. That is, one controller will place data in a mirror FIFO 58 which physically exists on the peer controller. The peer controller will remove the data from its mirror FIFO 58 and place it into its own memory.
This approach allows the receiving controller to perform certain validity checks on the address of the data and to construct fire walls between the two boards such that if a first controller were to attempt to write data to a location on the peer controller which it is not authorized to, the first controller will be caught and prevented from causing any damage to the receiving peer controller.
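The push model with receiver-side address checking can be sketched as below. Everything concrete here is an assumption for illustration: the class name `MirrorReceiver`, the single contiguous authorized address window, and the dictionary standing in for the peer's memory. The source only states that the receiver removes data from its mirror FIFO and can validate the address before storing it.

```python
from collections import deque


class MirrorReceiver:
    """Hypothetical model of a peer controller draining its mirror FIFO 58."""

    def __init__(self, window_start, window_size):
        self.fifo = deque()
        # Addresses the sending peer is authorized to write (assumed contiguous).
        self.window = range(window_start, window_start + window_size)
        self.memory = {}
        self.rejected = []

    def push(self, address, data):
        # The sending controller can only place entries into the FIFO.
        self.fifo.append((address, data))

    def drain(self):
        # The receiving controller moves entries into its own memory,
        # refusing any address outside the authorized window.
        while self.fifo:
            address, data = self.fifo.popleft()
            if address in self.window:
                self.memory[address] = data
            else:
                self.rejected.append(address)


receiver = MirrorReceiver(window_start=0x1000, window_size=0x100)
receiver.push(0x1008, b"cached write")
receiver.push(0x9000, b"stray write")
receiver.drain()
```

Because the sender never writes the peer's memory directly, a faulty sender can at worst fill the FIFO with entries that the receiver discards.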
The mirrored cache provides protection in the event of a memory failure or processor failure. In the event of such a failure the host's data remains available in the mirrored cache.
The high performance data path of the invention is for use in conjunction with a disk array in which the data on a group of disks is interrelated. The interrelation involves providing parity in one of the disks. The parity data is an XOR of the data in each of the other disks. The disks are typically striped so that sectors in each of the disks belong to a stripe. Sectors belonging to a stripe are all said herein to correspond to one another. All the data sectors in a stripe are XORed with one another to produce the parity sector for that stripe.
The advantages of the high performance data path of the invention can be better seen in the completion of RAID operations. For example, a RAID 3 read is performed by reading the sector from each disk in a stripe from an interrelated group of disks. Reading from the disks takes place asynchronously. During the reading of data from the disks into the disk-side memory, the switching devices 42 isolate the read bus 40 from the write bus 38 and the data path control engine 44. As the data arrives, it is written into the disk-side memory 34. Thus, the data is staged in the disk-side memory pending completion of the entire read operation. Typically, a RAID 3 read only reads the data sectors in a stripe. One would wait for all of the data sectors in the stripe to be read and written into memory. This may occasionally result in extended delays if one of the disks experiences an error or fails to respond in a timely manner. Also, transition to a degraded mode (in which one of the disk drives is missing) will result in a noticeably longer delay because then parity information and reconstruction will be required. In accordance with the present invention, a mode may be selected in which all of the sectors, including the parity sector, are read, but it is not necessary to wait for the last disk drive to provide its sector of data. Operating in this "N-1 and go" mode, the data from the first N-1 disks (where N is the total number of disks in a group) in the interrelated group is successively moved from the disk-side memory to the host-side memory. To do this, the switching devices 42 interconnect the disk-side memory, the host-side memory and the data path control engine. At the same time the data is being moved from the disk-side memory to the host-side memory, a copy of the data for the first move is copied into or filled into the XOR buffer 48.
For each subsequent data sector that is moved from the disk-side memory to the host-side memory, the DMA/XOR engine 46 performs an XOR on that data with the corresponding words of data that is in the XOR buffer 48. The result of the XOR operation is replaced into the XOR buffer. The XORing is successively repeated until all except one of the corresponding sectors in the stripe have been moved. The job processor 24 always knows what data and parity has been received from the disks based on which disk I/O's have completed. This is used in providing the sequencer 50 with its list of operations to perform. If the final missing sector is a data sector, the contents of the XOR buffer 48 is then moved into the host-side memory as it should be equal to the data in the sector that has not yet arrived. If the final missing sector is the parity sector, the contents of the XOR buffer 48 is not needed and the normal RAID 3 moves of the data sectors have been sufficient.
A write operation assisted by the high performance data path operates as follows. The data to be written is written into the host-side memory 32 while the write bus 38 is isolated from the read bus 40 and the data path control engine 44. Thus, the data is staged in the host-side memory pending completion of the write. The host-side memory is switched into connection with the data path control engine and the disk-side memory. Data sectors are moved successively from the host-side memory into the disk-side memory. The first data sector is copied and filled into the XOR buffer 48. For each successive data sector, its data words are XORed in the DMA/XOR engine 46 with corresponding words in the XOR buffer 48 to produce a result that is placed back into the XOR buffer 48. After the last of the data sectors has been XORed, the contents of the XOR buffer 48 is moved to the disk-side memory and will be treated as the parity sector. The data sectors and the parity sector of the stripe are then within the disk-side memory and available for writing into the respective disks under the control of the disk-side fiber channel controller 16. The write to disk may take place with the read bus 40 isolated by the switches 42 from the write bus 38 and the data path control engine 44.
The memory controllers 36 for the host-side memory and the disk-side memory may be configured in the same manner as shown in FIG. 2. Although the specific presently preferred implementation is described below, those of ordinary skill in the art may implement any number of memory controllers that accomplish the functions of the invention claimed herein. Conventional memory control techniques are available to be implemented to satisfy the memory control needs of the present invention. The present invention is thus in no way limited by the following discussion of the presently preferred embodiment of the memory controllers. The logic for operating the memory controller of the presently preferred embodiment is contained within a memory control logic device 70.
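Returning to the "N-1 and go" read described earlier, its outcome can be sketched in Python as follows. The function name `n_minus_1_and_go`, the dictionary of arrived sectors keyed by disk index and the byte-wise XOR are assumptions for illustration; in the hardware the XOR happens on the fly as sectors cross the bus rather than in a separate loop.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length sectors."""
    return bytes(x ^ y for x, y in zip(a, b))


def n_minus_1_and_go(arrived, parity_disk, num_disks):
    """Return the data sectors in disk order once N-1 of N sectors have arrived.

    `arrived` maps disk index -> sector bytes for the N-1 disks that responded.
    """
    if len(arrived) != num_disks - 1:
        raise ValueError("exactly N-1 sectors are expected")
    xor_buffer = None
    for sector in arrived.values():
        # First sector fills the buffer; each later one is XORed into it.
        xor_buffer = sector if xor_buffer is None else xor_bytes(xor_buffer, sector)
    missing = next(d for d in range(num_disks) if d not in arrived)
    data = {d: s for d, s in arrived.items() if d != parity_disk}
    if missing != parity_disk:
        # The buffer now equals the data sector that has not yet arrived.
        data[missing] = xor_buffer
    # If only the parity sector is missing, the buffer contents are not needed.
    return [data[d] for d in sorted(data)]


d0, d1, d2 = bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])
parity_sector = xor_bytes(xor_bytes(d0, d1), d2)
slow_data_disk = n_minus_1_and_go({0: d0, 2: d2, 3: parity_sector}, parity_disk=3, num_disks=4)
slow_parity_disk = n_minus_1_and_go({0: d0, 1: d1, 2: d2}, parity_disk=3, num_disks=4)
```

Both calls return the same three data sectors, which is why the host sees a steadier transfer rate regardless of which disk is slow.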
The logic device may be a Complex Programmable Logic Device (CPLD) such as those made by Advanced Micro Devices. The memory control logic device 70 is made up of 3 sections: an address controller, a data path controller and a DIMM controller. The address controller latches addresses and commands off the respective bus. Parity checking is performed on the address. The logic device 70 decodes the address, generates the row and column addresses, increments the CAS (column address strobe) address during burst operations and handles page crossings. The data controller portion of the logic device 70 is responsible for all the data path control and the data valid handshake protocol between an initiator and a target. The DIMM controller portion handles RAS (row address strobe), CAS, WE (write enable), OE (output enable), and refresh for the DRAMs. The DIMM controller contains RAS and CAS state machines. The CAS machine is the only part of the controller that runs at 66 MHz. The remainder of the controller operates at 33 MHz in the presently preferred embodiment. The address controller of the logic device works in conjunction with an address register 72. The address register consists of two bus transceivers with parity generators/checkers. In the presently preferred embodiment, one of the inputs to the address controller is called N_X_FRAME. The N_X_FRAME signal is asserted by the current bus master to indicate the beginning and duration of an access. It is deasserted at the beginning of the last data phase. On single beat (8 bytes) operations, it is asserted for one cycle. The word "beat" as used herein refers to a single bus cycle. When the N_X_FRAME signal is latched, the latch is closed the next cycle. The latch is reopened when the operation is complete. Odd byte parity is checked on each address, and is valid the cycle after N_X_FRAME is asserted. A read/write bit, X_W_NR, is asserted with N_X_FRAME for one cycle during the address phase.
Another similar bit in the presently preferred embodiment is entitled EDAC_CMD. The EDAC bit defines the operation as an EDAC mode register write for power up configuration or an EDAC diagnostic register read. Both of these signals are valid in the first cycle that the N_X_FRAME is asserted. The Xbus address bus X_AD is time multiplexed with the least significant 32 bits of data. It is asserted with N_X_FRAME for one cycle. The address parity pins X_PAR are odd byte parity. They are asserted with N_X_FRAME for one cycle. Bit 3:0 contains address parity for X_AD. Addresses are checked for odd byte parity when they are latched into the address register the cycle after N_X_FRAME is asserted. If there is an odd byte parity error, the address register 72 sets the address parity error flag N_X_ADDR_PERR in the cycle after N_X_FRAME is asserted. The bus address BA is accessed by the logic device 70 from the address register 72. The address register is given an address for beginning a memory access, and that address is incremented until the access is completed.
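The odd byte parity check on the 32-bit address can be illustrated with a short Python sketch. The function names and the mapping of parity bit i to byte lane i of X_AD are assumptions; the text only states that bits 3:0 of X_PAR carry odd byte parity for the address.

```python
def odd_parity_bit(byte):
    """Parity bit chosen so that the byte plus its parity bit hold an odd number of 1s."""
    ones = bin(byte & 0xFF).count("1")
    return 1 - (ones % 2)


def address_parity(x_ad):
    """Return the four odd-parity bits for a 32-bit address, one per byte lane."""
    parity = 0
    for lane in range(4):
        byte = (x_ad >> (8 * lane)) & 0xFF
        parity |= odd_parity_bit(byte) << lane  # assumed: bit i covers byte lane i
    return parity


def address_parity_error(x_ad, x_par):
    """True when the received parity bits do not match the latched address."""
    return address_parity(x_ad) != (x_par & 0xF)


good = address_parity_error(0x01000000, 0b0111)
bad = address_parity_error(0x01000000, 0b1111)
```

With odd parity, an all-zero byte carries a parity bit of 1, so a bus lane stuck at zero is detected as an error.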
Further, with regard to the address controller, in order to allow bursting up to 65 beats (520 bytes), the column address has an incrementer. When a DRAM page crossing (8 Kbyte boundary, when the row address must be incremented) is detected, an incremented row address is generated while N_RAS is being precharged. N_RAS is the row address strobe for DIMMs 33. It is asserted two cycles after N_X_FRAME and deasserted the cycle after the CAS state machine goes idle. On a page crossing, it is deasserted for precharge while the new row address is being generated. N_CAS is the column address strobe for the DIMMs 33. It is asserted for half a 33 MHz cycle and is driven by a 66 MHz clock. During wait states, it is held in precharge (deasserted). The presently preferred embodiment is configured to support a page size independent of DRAM size. In the preferred embodiment, the memory design transparently supports page crossings to allow synchronized data transfers between the disk-side and host-side memories. Thus, there is no need for additional wait states during page crossings.
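As a rough illustration of the page crossing condition described above, the following Python sketch reports at which beats of a burst the address enters a new 8 Kbyte page. The function name and the requirement that the start address be beat-aligned are assumptions made for the example.

```python
BEAT_BYTES = 8            # one beat moves 8 bytes
PAGE_BYTES = 8 * 1024     # DRAM page boundary at which the row address increments
MAX_BURST_BEATS = 65      # 520 bytes


def page_crossings(start_address, beats):
    """Return the beat indices at which a burst enters a new DRAM page."""
    if not 1 <= beats <= MAX_BURST_BEATS:
        raise ValueError("burst length must be between 1 and 65 beats")
    if start_address % BEAT_BYTES:
        raise ValueError("start address is assumed to be beat-aligned")
    crossings = []
    for beat in range(1, beats):
        address = start_address + beat * BEAT_BYTES
        if address % PAGE_BYTES == 0:
            # The row address must be incremented before this beat can proceed.
            crossings.append(beat)
    return crossings


no_crossing = page_crossings(0, 65)
one_crossing = page_crossings(PAGE_BYTES - 16, 65)
```

Since a 520-byte burst is far shorter than a page, a single sector transfer crosses at most one page boundary.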
The disk-side and host-side memories of the presently preferred invention are inserted into noninterleaved DIMM sockets. Using 16 and 64 Mbit DRAMs, the memory of a desired size can be configured. In the presently preferred embodiment, the DIMM socket is voltage keyed, so that only 3.3 volt DRAMs can be inserted into the socket. The DIMMs 33 preferably use 74LVT16244's to buffer the address and all control signals except N_RAS. There are decoupling capacitors mounted beside each DRAM.
The DIMM controller of the logic device 70 includes a RAS and a CAS state machine. The RAS state machine is the master state machine that handles both refresh and read/write accesses. On a read/write operation, N_RAS is asserted three cycles after N_X_FRAME is asserted and remains asserted until the CAS state machine returns to idle. During a page crossing, the RAS state machine cycles back through idle in order to precharge N_RAS while the new row address is being generated. For a CBR (CAS before RAS) refresh, the RAS machine kicks off the CAS state machine and then asserts N_RAS a cycle after N_CAS is asserted. The CAS state machine runs off of a 66 MHz clock and is triggered by the RAS state machine. During read/write accesses, N_CAS is asserted for 15 ns and then precharged for 15 ns. If wait states are asserted by the master, or a page crossing occurs, N_CAS remains in precharge. Once the CAS state machine is kicked off, it syncs up with a 33 MHz clock signal, so that N_CAS is always asserted on the positive edge of the 66 MHz clock signal which aligns with the positive edge of the 33 MHz clock signal. This way the machine only samples 33 MHz signals in the second half of the 30 ns cycle. Each of the DIMMs has separate enable signals indicated by N_WE(3:0) and N_MOE(3:0). The output enable, N_MOE, is asserted in the same cycle as N_RAS on reads and for as long as N_RAS is asserted unless there is a page crossing. During a page crossing, the output enable signals are switched while N_RAS is precharging. The write enable signal N_WE is handled in the same way. It is asserted in the same cycle as N_RAS on writes, for as long as N_RAS is asserted unless there is a page crossing. A timer in the local resource unit (LRU) 28 will assert N_REF_TIME every 15.4 μs. The logic device 70 includes a refresh state machine that will assert N_X_REF_REQ in the following cycle to request a refresh cycle, regardless of the current state of the controller.
It is up to an arbiter programmed into a CPLD to not grant the refresh until the Xbus is idle. Upon receiving N_X_REF_GNT, a CBR refresh cycle is kicked off by the RAS and the CAS state machines. The disk-side and host-side memories are refreshed simultaneously. N_X_REF_REQ remains asserted until the refresh is complete. This will let the arbiter know when the Xbus is free. The arbiter will not grant the bus to anyone else during refresh.
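The timing figures quoted above for the CAS state machine (15 ns asserted and 15 ns precharged within a 30 ns cycle, one 8-byte beat per cycle) allow a rough back-of-the-envelope calculation, sketched here in Python. The resulting numbers are idealized peaks derived only from those figures; they ignore wait states, page crossings, refresh and address phases, and are not performance claims from the source.

```python
BEAT_BYTES = 8             # one beat transfers 8 bytes
CYCLE_NS = 30              # one 33 MHz bus cycle, as quoted in the text
N_CAS_ASSERT_NS = 15       # N_CAS asserted for half the cycle
N_CAS_PRECHARGE_NS = 15    # N_CAS precharged for the other half
SECTOR_BYTES = 520         # disk-side sector size including checksum and stamps


def peak_bandwidth_mb_per_s():
    """Idealized peak rate: bytes per nanosecond scaled to 10**6 bytes per second."""
    return BEAT_BYTES / CYCLE_NS * 1000


def sector_burst_ns():
    """Idealized data-phase time for one 520-byte sector with no wait states."""
    beats = SECTOR_BYTES // BEAT_BYTES
    return beats * CYCLE_NS


beats_per_sector = SECTOR_BYTES // BEAT_BYTES
```

Under these assumptions one sector occupies 65 back-to-back beats, which is the maximum burst length the address controller supports.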
The data path controller of the logic device 70 controls a first data register 76, a second data register 78, an error detection and correction device 80 and a data transceiver 82 and the data valid handshake interface protocol to the Xbus. On a read, data is clocked into the second data register 78 from the enabled DIMM. In order to use the EDAC 80 in flow-through mode, the data is then clocked into the first data register 76 on the positive edge of the 33 MHz clock. There is a full cycle to get through the EDAC 80 and make setup into the data transceiver 82. The EDAC 80 does ECC detection and correction on the fly and generates odd byte parity for the Xbus. Data is driven onto the Xbus in the next cycle. N_RD_XOE is the output enable of the data transceiver. The EDAC checks the parity and generates ECC check bits. The next cycle, data is clocked into the second data register 78 and in the subsequent cycle, data is written into the appropriate DIMM.
A pre-target-ready signal is used to aid the DMA 66 in controlling the initiator ready signals between the disk-side and host-side memories during a page crossing. The pre-target-ready signal suspends the data transfer while the memory prepares to cross a page boundary. When a read operation is ongoing, the pre-target-ready signal is asserted the cycle before data is valid on the Xbus. For moves between the write and read memories, this allows the DMA controller to assert an initiator ready signal to the target of the read the following cycle. The next cycle a target ready signal is asserted until the cycle after N_X_FRAME is deasserted. (On single beat operations, it is just asserted for one cycle if the initiator ready signal is also asserted.)
For a write operation, the pre-target-ready signal is asserted the cycle before it is ready to accept data from the Xbus. For moves between write and read memories, this allows the DMA to assert the initiator ready signal to the source of the write in the following cycle. Assertion and deassertion of the target ready signal is the same as for reads.
The EDAC 80 of the presently preferred embodiment is an IDT49C466A. It is a 64-bit flow-through error detection and correction unit. It uses a modified Hamming code to correct all single bit hard and soft errors and detect all double bit and some multiple bit errors. The flow-through architecture, with separate system and memory buses, allows for a pipeline data path.
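The single-error-correct, double-error-detect behavior described above can be illustrated with a textbook extended Hamming code. The following is a minimal sketch only: the actual check matrix of the IDT49C466A is not given in the text, and the function names `encode` and `decode` and the bit layout (overall parity at index 0, check bits at power-of-two positions) are assumptions made for illustration.

```python
def encode(data_bits):
    """Encode a list of 0/1 data bits into an extended Hamming codeword.

    Index 0 holds an overall parity bit, power-of-two indices hold Hamming
    check bits, and all other indices hold data bits (hypothetical layout).
    """
    r = 0
    while (1 << r) < len(data_bits) + r + 1:   # number of check bits needed
        r += 1
    n = len(data_bits) + r
    word = [0] * (n + 1)
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                    # not a power of two: data position
            word[pos] = next(it)
    for i in range(r):
        p = 1 << i
        # Check bit p gives even parity over every position with bit p set.
        word[p] = sum(word[pos] for pos in range(1, n + 1) if pos & p) % 2
    word[0] = sum(word) % 2                    # overall parity over the codeword
    return word


def decode(word):
    """Return (status, data_bits); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    n = len(word) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if word[pos]:
            syndrome ^= pos                    # XOR of the positions holding a 1
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        word[syndrome] ^= 1                    # single error: syndrome is its position
        status = "corrected"
    else:
        status = "uncorrectable"               # e.g. two flipped bits
    data = [word[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data
```

With 64 data bits this particular construction produces a 72-bit codeword, that is, 8 check bits per 64-bit word.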
Referring now to FIG. 3, the operation of the DMA/XOR engine shall be described with more particularity. The DMA/XOR engine consists of two subunits. The first is a data engine and the second is an XOR engine. The data engine is responsible for generating/checking the checksum and for moving data between the disk-side and host-side memories, from the host-side memory to the mirror FIFO 58, from the XOR buffer 48 to the disk-side and host-side memories, from the disk-side and host-side memories to the XOR buffer 48 and immediate data moves to and from either of the disk-side or host-side memories, the XOR buffer or the mirror FIFO 58. Immediate data moves are precoded predetermined moves of a constant number value (no address needed to define the number). The data engine also monitors the status of the second peer processor mirror FIFO 58. If there is data in the mirror FIFO 58, the DMA 66 will contain a status bit depicting it. Once the peer sequencer has moved its mirrored data into its host-side memory, it will signal to the DMA sequencer that it was finished and that no error has occurred during the transfer.
The second unit of the DMA/XOR engine is the XOR engine which generates RAID parity data, checks the RAID 3 parity data and checks for hardware bus (data and address path) parity errors from the disk-side and host-side memories. The XOR engine is made up of programmable logic devices and the XOR buffer 48. The XOR engine will return RAID data status to the data engine and hardware parity errors to the local resource unit (LRU) 28.
The switching devices 42 are implemented as three separate switches. A first switch 62 may be used to isolate the write bus 38 from the read bus 40 and the data path control engine 44. A second switch 64 may be used to isolate the read bus 40 from the write bus 38 and the data path control engine 44. A third switch 68 is connected to receive the output of the XOR buffer 48. The third switch 68 allows the results in the XOR buffer 48 to be loaded onto the read or write bus depending on the state of the other two switches. The switches in a presently preferred embodiment are implemented by crossbar switches, in particular, CBT16212 switches. The state of the switches is controlled by the DMA 66. The status of the switches may be controlled to connect all of the read bus 40, write bus 38 and the data path control engine 44 together. Alternatively, the switches may be controlled to isolate one or more of these data paths from the others.
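One way to read the three-switch arrangement is as a star with the data path control engine at the hub. The sketch below models only that reading; the function names, the boolean "closed" convention and the path labels are hypothetical and are not taken from the text.

```python
def connected_paths(sw62_closed, sw64_closed, sw68_closed):
    """Return the set of paths currently joined to the data path control engine.

    Assumed model: switch 62 joins the write bus, switch 64 joins the read
    bus, and switch 68 joins the XOR buffer output.
    """
    joined = {"engine"}
    if sw62_closed:
        joined.add("write_bus")
    if sw64_closed:
        joined.add("read_bus")
    if sw68_closed:
        joined.add("xor_buffer_output")
    return joined


def xor_unload_targets(sw62_closed, sw64_closed, sw68_closed):
    """Return which buses would receive data unloaded from the XOR buffer."""
    joined = connected_paths(sw62_closed, sw64_closed, sw68_closed)
    if "xor_buffer_output" not in joined:
        return set()
    return joined & {"write_bus", "read_bus"}
```

Under this model, closing switches 62 and 64 joins both buses to the engine, while opening switch 62 leaves the XOR buffer able to unload onto the read bus only.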
The DMA 66 is programmed by the sequencer 50 and returns status to the sequencer when it has completed its task. The sequencer 50 communicates with the flashROM 54 and DMA 66 through a transceiver 74. The transceiver 74 provides a buffering path from the sequencer to registers in the DMA. The sequencer reads and writes to the DMA registers. The sequencer 50 uses the program in the flashROM 54 to boot up. The DMA 66 communicates in the form of read/write operations to either the disk-side or host-side memories, the mirror FIFO or the XOR engine.
For every sector move, the data engine will automatically generate or check the checksum of that data. The checksum will also be generated any time the XOR buffer 48 is being unloaded. To start the checksumming process, a bit ("sector begin" bit) is set to indicate the beginning of the sector transfer. The setting of this bit will cause the checksum logic to XOR the contents of a checksum seed with the first piece of data. If this bit is not set, then the first piece of data is XORed with the contents of the checksum register. The sector begin bit will allow for variability in the sector's size without modifying the checksum logic. Any time the register set of status bits is read, the calculated checksum will be returned to the sequencer. In connection with the generation of a checksum, the propagation of data is suspended whenever a checksum is generated and merged onto the data bus. The cycle is used to turn around the data bus and to finish up the generation of the checksum.
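The role of the sector begin bit can be sketched in software as follows. This is an illustrative model, not the hardware logic: the class and method names are hypothetical, the seed value is arbitrary, and the 64-bit checksum width is an assumption based on the 8-byte beat mentioned elsewhere in the description.

```python
class ChecksumLogic:
    """XOR checksum over data words, modelled on the description above."""

    MASK = (1 << 64) - 1                      # assumed 8-byte checksum width

    def __init__(self, seed=0):
        self.seed = seed & self.MASK          # checksum seed register
        self.checksum = 0                     # running checksum register

    def feed(self, word, sector_begin=False):
        # With "sector begin" set, the word is XORed with the seed;
        # otherwise it is XORed with the running checksum.
        base = self.seed if sector_begin else self.checksum
        self.checksum = (base ^ word) & self.MASK
        return self.checksum


def sector_checksum(logic, words):
    """Checksum one sector of any length; only the first word sets sector_begin."""
    for i, word in enumerate(words):
        logic.feed(word, sector_begin=(i == 0))
    return logic.checksum
```

Because the restart is driven by the bit rather than by a word counter, the same logic handles sectors of different sizes, which is the property the text attributes to the sector begin bit.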
The DMA includes a CMD/status register with five fields. A first field tells the DMA whether the operation is a data movement that involves a disk-side or host-side memory or whether it is a read/write of immediate data, i.e., a single beat (8 bytes) of data. For data moves, the data source must be specified by the source field of the CMD/status register with the source address in the source register. When the source is the XOR buffer 48, the XOR engine will unload the data and at the same time refill itself with the same data. This allows RAID 3 reconstructed data to stay in the XOR buffer 48 so it can be checked against the final piece of data. Once the last piece of the sector comes in, it can be checked by setting a check parity data bit of the CMD/status register. The destination must also be indicated by setting the destination field in the CMD/status register and its address in the destination register. The amount of data to be transferred is indicated by the word count field.
The flashROM 54 is used to store boot codes. The flashROM of the presently preferred embodiment is a 512KB x 8 and will reside on the most significant byte lane of the data bus.
The DMA engine always acts as an initiator and always is responsible for driving the initiator ready signal even when it is not sourcing the data. The disk-side and host-side memory and the mirror FIFO will always be targets. The write bus 38 and the read bus 40 will each have its own interface control signals even when the buses are joined together. At the beginning of moves between the disk-side and host-side memories, the DMA synchronizes data flow between the two memory systems. The DMA will handshake with the memories through the initiator ready and the pre-target-ready signals. When a memory is ready to receive or send data, it will assert the pre-target-ready signal one cycle before the data is valid. In response, the DMA will assert the initiator ready signal. When both memories are being used in the transfer, the DMA will not assert the initiator ready signal to the memories until it has received both pre-target-ready signals. Once it has received both of these signals, it will assert the initiator ready signal to both memories at the same time. At this time, the data is considered valid on the Xbus, i.e., the read and write buses.
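The rule that initiator ready waits for every participating memory can be sketched as a small cycle-by-cycle model. The function names and the trace format are hypothetical, and the sketch assumes that the DMA latches each pre-target-ready it has seen and responds in the following cycle, a detail the text does not spell out.

```python
def initiator_ready(pre_target_ready_seen, memories_in_transfer):
    """True once every memory taking part has signalled pre-target-ready."""
    return all(pre_target_ready_seen.get(m, False) for m in memories_in_transfer)


def first_initiator_ready_cycle(ptr_traces):
    """Return the first cycle in which initiator ready would be asserted.

    ptr_traces maps a memory name to a per-cycle list of booleans giving
    its pre-target-ready signal. Returns None if the handshake never completes.
    """
    seen = {name: False for name in ptr_traces}
    cycles = min(len(trace) for trace in ptr_traces.values())
    for cycle in range(cycles):
        for name, trace in ptr_traces.items():
            seen[name] = seen[name] or trace[cycle]   # assumed latching
        if initiator_ready(seen, ptr_traces.keys()):
            return cycle + 1                          # respond in the next cycle
    return None
```

For a move that involves only one memory, the same function reduces to asserting initiator ready one cycle after that memory's pre-target-ready.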
In accordance with the presently preferred embodiment, other validity checking mechanisms besides the checksum are also used. A time stamp is a unique number that is located in a validity field of each data sector. The unique number is inserted in each sector in a RAID group during a major stripe update. A major stripe update refers to writing data filling every sector in a RAID stripe. If data is written into a sector thereafter, then the time stamp is invalidated. A new time stamp is inserted at the next stripe update. A write stamp provides a bit for each disk in a group. When a write is made to a disk, the value of the bit for that disk is flipped, i.e., changed from a 1 to a 0 or from a 0 to a 1 depending on whichever is its current value. In order to accommodate the write and time stamp information of the presently preferred embodiment, the DMA will only suspend an operation at the end of the data transfer in order to append the checksum and the stamp information. At that time, the initiator ready signal will be deasserted for one cycle and then asserted for one cycle when the 8 bytes including stamp information from the DMA and checksum information from the XOR engine are on the bus. The one cycle delay is needed in order to complete the generation of the checksum. In the opposite direction as data is moved from the disk-side memory to the host-side memory, the stamp information is stripped off and placed in stamp registers in the DMA. The stamp information is not sent to the host-side memory. Thus, in accordance with the presently preferred embodiment, the host-side memory stores 512 bytes for each sector and the disk-side memory stores 520 bytes per sector.
The preferred XOR buffer 48 is a FIFO. The FIFO 48 holds one sector. The FIFO 48 can be loaded from one of three paths. The choice of paths is switched by operation of a multiplexor 84 which may be separated from the FIFO 48 by a buffer 86. The first is with data directly from the data bus.
A bus interface 88 removes data from the bus for use in the XOR engine. Data is registered externally and parity checked before it is written to the FIFO. A second path is a feedback path from the FIFO into itself. This is needed in order to maintain a copy in the FIFO while unloading the FIFO data onto the bus. The final logical path into the FIFO is the XOR path. The XOR logic 90 exclusive ORs the registered data from the bus with the contents of the FIFO. The result of the XOR is input to the FIFO 48 behind the old data as the old data is sequentially output to the exclusive OR logic. The existing data is thus replaced by the exclusive OR results. The selection of the needed path is done by the sequencer by setting the appropriate bits in the CMD/status register. Whenever the XOR FIFO is being unloaded, the DMA logic can be programmed to append the generated checksum and the write and time stamp information found in the DMA register sets.
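The three load paths into the FIFO (direct load, feedback and XOR) can be modelled in a few lines. This is a software sketch under stated assumptions rather than a description of the hardware: the class name `XorFifo` and its method names are hypothetical, words are plain integers, and `xor_in` assumes the bus supplies exactly as many words as the FIFO holds.

```python
from collections import deque


class XorFifo:
    """One-sector FIFO with the three load paths described above."""

    def __init__(self):
        self.fifo = deque()

    def load(self, bus_words):
        # Path 1: data taken directly from the data bus.
        self.fifo = deque(bus_words)

    def unload(self, keep_copy=False):
        # Path 2 (feedback): each word read out is written back in behind
        # the old data, so the FIFO still holds the sector afterwards.
        out = []
        for _ in range(len(self.fifo)):
            word = self.fifo.popleft()
            out.append(word)
            if keep_copy:
                self.fifo.append(word)
        return out

    def xor_in(self, bus_words):
        # Path 3 (XOR): old data leaves the head, is XORed with the bus
        # word, and the result enters at the tail, replacing the old data.
        for word in bus_words:
            self.fifo.append(self.fifo.popleft() ^ word)
```

Generating a parity sector with this model would amount to one `load` of the first data sector, one `xor_in` per remaining data sector, and a final `unload`.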
The XOR engine can advantageously be used to detect RAID 3 user data parity errors. During a RAID 3 reconstruct operation, the XOR engine can be programmed through the DMA to reconstruct the final piece of data. When the data has been regenerated, it can be unloaded from the FIFO buffer 48 to the host-side memory. If the new data is to be compared to the old data, then during the unload operation, the check parity data bit can be set. This allows the new data that is being read from the XOR FIFO to be written back into the FIFO buffer 48. The old data when it comes in can then be compared to the new data by setting this bit with an XOR FIFO operation. If there is a mismatch between the old and new data, the check sum error bit will be set in the CMD/status register. Note that during the compare, the XOR FIFO will not be reloaded.
FIG. 4 illustrates the DMA logic for performing the checksum on the fly. A check sum buffer 92 holds the checksum data as it is being computed. A multiplexor 94 feeds the checksum XOR logic 96 with either the data in the checksum buffer 92 or an initial seed data found in a DMA register. The other input to the XOR logic 96 is the data being checksummed.
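The RAID 3 reconstruct-and-compare sequence described above comes down to two XOR passes, which the following sketch illustrates. The function names and the list-of-integers sector representation are assumptions for illustration; in the hardware the comparison result is reported through the check sum error bit rather than a return value.

```python
from functools import reduce


def xor_sectors(sectors):
    """XOR equal-length sectors together word by word."""
    return [reduce(lambda a, b: a ^ b, words) for words in zip(*sectors)]


def reconstruct_and_check(surviving, parity, late_sector=None):
    """Rebuild the missing sector and optionally compare it with late data.

    Returns (rebuilt_sector, mismatch); mismatch is None when no late
    sector arrived, otherwise True or False.
    """
    rebuilt = xor_sectors(surviving + [parity])
    if late_sector is None:
        return rebuilt, None
    # XORing the late data with the rebuilt copy yields all zeros on a match.
    diff = xor_sectors([rebuilt, late_sector])
    return rebuilt, any(diff)
```

A mismatch result of `True` corresponds to the case in which the old and new data differ.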
The high performance data path of the present invention permits an "N-1 and go" mode for performing RAID read operations. If one of the drives is struggling or not operating, N-1 and go mode can maintain a fairly constant transfer rate to the host. While N-1 and go is quicker for situations where a drive is struggling or not operational, the mode is also slower than normal operation. Under normal conditions, the N-1 and go operation is slower because there is an added latency for handling the XOR operation and there is one extra request from the disk-side fiber loop. Instead of requesting data from the data sectors only, the parity sector is also requested. The advantage of this mode is that the throughput is more deterministic. For applications where this is of great importance, the reduction in speed for normal operation is an acceptable tradeoff.
Of course, it should be understood that various changes and modifications to the preferred embodiment described will be apparent to those skilled in the art. Such changes could be made without departing from the spirit and the scope of the invention and without diminishing its attendant advantages. It is therefore intended that such changes and modifications be covered by the following claims.

Claims

WE CLAIM
1. A high performance data path comprising: a first bus; a second bus selectively coupled to said first bus; a first memory coupled to said first bus; a second memory coupled to said second bus; and an XOR engine switchably connectable to said first and second buses to accomplish successive XORing of corresponding data passed between said first and second memories along said first and second buses so as to produce a result of said XORing.
2. The data path of claim 1 further comprising a buffer switchably connected to receive data passed between said first and second memories and connected to said XOR engine to replace data in said buffer with a result from said XOR engine of XORing data passed between said first and second memories with the data in said buffer.
3. The data path of claim 2 wherein said buffer comprises a FIFO.
4. The data path of claim 1 further comprising a host-bus interface coupled to said first bus.
5. The data path of claim 4 further comprising a disk-bus interface coupled to said second bus.
6. The data path of claim 5 wherein said first memory stages data for writes and functions as a cache.
7. The data path of claim 5 wherein said second memory stages data retrieved during a read and functions as a cache.
8. The data path of claim 1 further comprising a switch connected to said first bus that permits isolating said first bus from said second bus and said FIFO engine.
9. The data path of claim 8 further comprising a switch connected to said second bus that permits isolating said second bus from said first bus and said FIFO engine.
10. A high performance data path comprising: a first bus; a second bus selectively coupled to said first bus; a first memory connected to said first bus; a second memory connected to said second bus; a FIFO switchably connectable to said first bus and said second bus; and XOR logic coupled to said FIFO and switchably connectable to said first and second buses to permit XORing of data passed between said first and second memories along said first and second buses with data in said FIFO and placing a result of said XORing into said FIFO.
11. The data path of claim 10 further comprising a host-bus interface coupled to said first bus.
12. The data path of claim 11 further comprising a disk-bus interface coupled to said second bus.
13. The data path of claim 12 wherein said first memory stages data for writes and functions as a cache.
14. The data path of claim 12 wherein said second memory stages data retrieved during a read and functions as a cache.
15. The data path of claim 10 further comprising a switch connected to said first bus that permits isolating said first bus from said second bus and said XOR logic.
16. The data path of claim 15 further comprising a switch connected to said second bus that permits isolating said second bus from said first bus and said XOR logic.
17. A method for performing a read from a group of interrelated disks where a sector of data in one of the disks corresponds with a sector in each of the other disks in the group, the method comprising the steps of: reading corresponding sectors on at least all but one disk in the group of disks; writing data from the corresponding sectors in the at least all but one disks into a disk-side memory; successively moving the data of the corresponding sectors from the disk-side memory into a host-side memory; filling a copy of the data of a first of the corresponding sectors from the disk-side memory into a FIFO; successively XORing, in an XOR engine, data from sectors corresponding to the first of the corresponding sectors from the disk-side memory with the corresponding data in the FIFO and replacing the data in the FIFO with results from the XORing until all the corresponding sectors in the group except one has been XORed; and then moving the data in the FIFO into the host-side memory.
18. The method of claim 17 wherein the step of filling the copy of the data of the first of the corresponding sectors occurs synchronously with moving the data of the first of the corresponding sectors from the disk-side memory into the host-side memory.
19. A method for generating a parity sector comprising:
(a) moving a first data sector from a host-side memory into a FIFO and into a disk-side memory;
(b) successively moving data sectors corresponding to the first data sector from the host-side memory to the disk-side memory;
(c) performing an XOR in the XOR engine of data in the FIFO with corresponding data moved from the host-side memory and replacing the data in the FIFO with results from said XOR;
(d) successively performing step (c) until all the corresponding sectors have been XORed with data in the FIFO; and
(e) moving the data from the FIFO to the disk-side memory, said data constituting the parity sector.
20. The method of claim 19 further comprising writing the corresponding data sectors and parity sector from the disk-side memory onto a group of interrelated disks wherein each of said disks receives one of the sectors from among the corresponding data sectors and parity sector.
21. The method of claim 19 further comprising switching to isolate the disk-side memory from the host-side memory and the XOR engine before the step of writing.
PCT/US1997/018523 1996-11-14 1997-10-08 High performance data path with xor on the fly WO1998021656A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
DE69733076T DE69733076T2 (en) 1996-11-14 1997-10-08 HIGH-PERFORMANCE DATA PATH WITH IMMEDIATE XOR
EP97910943A EP0938704B1 (en) 1996-11-14 1997-10-08 High performance data path with xor on the fly
JP52256898A JP3606881B2 (en) 1996-11-14 1997-10-08 High-performance data path that performs Xor operations during operation
CA002268548A CA2268548A1 (en) 1996-11-14 1997-10-08 High performance data path with xor on the fly
AU48201/97A AU4820197A (en) 1996-11-14 1997-10-08 High performance data path with xor on the fly

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US74931296A 1996-11-14 1996-11-14
US08/749,312 1996-11-14
US08/815,193 1997-03-11
US08/815,193 US6161165A (en) 1996-11-14 1997-03-11 High performance data path with XOR on the fly

Publications (1)

Publication Number Publication Date
WO1998021656A1 true WO1998021656A1 (en) 1998-05-22

Family

ID=27115094

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/018523 WO1998021656A1 (en) 1996-11-14 1997-10-08 High performance data path with xor on the fly

Country Status (7)

Country Link
US (1) US6161165A (en)
EP (1) EP0938704B1 (en)
JP (2) JP3606881B2 (en)
AU (1) AU4820197A (en)
CA (1) CA2268548A1 (en)
DE (1) DE69733076T2 (en)
WO (1) WO1998021656A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998038576A1 (en) * 1997-02-28 1998-09-03 Network Appliance, Inc. Fly-by xor
US5948110A (en) * 1993-06-04 1999-09-07 Network Appliance, Inc. Method for providing parity in a raid sub-system using non-volatile memory
US5963962A (en) * 1995-05-31 1999-10-05 Network Appliance, Inc. Write anywhere file-system layout
US6119244A (en) * 1998-08-25 2000-09-12 Network Appliance, Inc. Coordinating persistent status information with multiple file servers
US6279011B1 (en) 1998-06-19 2001-08-21 Network Appliance, Inc. Backup and restore for heterogeneous file server environment
US6636879B1 (en) 2000-08-18 2003-10-21 Network Appliance, Inc. Space allocation in a write anywhere file system
US6728922B1 (en) 2000-08-18 2004-04-27 Network Appliance, Inc. Dynamic data space
US6751637B1 (en) 1995-05-31 2004-06-15 Network Appliance, Inc. Allocating files in a file system integrated with a raid disk sub-system
US7072916B1 (en) 2000-08-18 2006-07-04 Network Appliance, Inc. Instant snapshot
US7293097B2 (en) 1997-12-05 2007-11-06 Network Appliance, Inc. Enforcing uniform file-locking for diverse file-locking protocols

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6988176B2 (en) * 1997-09-12 2006-01-17 Hitachi, Ltd. Method and apparatus for data duplexing in storage unit system
US6665781B2 (en) * 2000-10-17 2003-12-16 Hitachi, Ltd. Method and apparatus for data duplexing in storage unit system
US6622224B1 (en) * 1997-12-29 2003-09-16 Micron Technology, Inc. Internal buffered bus for a drum
US6415355B1 (en) * 1998-05-11 2002-07-02 Kabushiki Kaisha Toshiba Combined disk array controller and cache control method for parity generation mode and data restoration mode
US7047357B1 (en) * 1998-10-01 2006-05-16 Intel Corporation Virtualized striping controller
US6460122B1 (en) * 1999-03-31 2002-10-01 International Business Machine Corporation System, apparatus and method for multi-level cache in a multi-processor/multi-controller environment
KR100287190B1 (en) * 1999-04-07 2001-04-16 윤종용 Memory module system connecting a selected memory module with data line &data input/output method for the same
JP2000305856A (en) * 1999-04-26 2000-11-02 Hitachi Ltd Disk subsystems and integration system for them
JP2001022529A (en) * 1999-06-30 2001-01-26 Internatl Business Mach Corp <Ibm> Disk drive device and its controlling method
US6542960B1 (en) * 1999-12-16 2003-04-01 Adaptec, Inc. System and method for parity caching based on stripe locking in raid data storage
JP4434407B2 (en) * 2000-01-28 2010-03-17 株式会社日立製作所 Subsystem and integrated system thereof
JP4044717B2 (en) 2000-03-31 2008-02-06 株式会社日立製作所 Data duplication method and data duplication system for storage subsystem
US6675253B1 (en) * 2000-04-04 2004-01-06 Hewlett-Packard Development Company, L.P. Dynamic routing of data across multiple data paths from a source controller to a destination controller
US6370616B1 (en) * 2000-04-04 2002-04-09 Compaq Computer Corporation Memory interface controller for datum raid operations with a datum multiplier
US7127668B2 (en) * 2000-06-15 2006-10-24 Datadirect Networks, Inc. Data management architecture
US7392291B2 (en) * 2000-08-11 2008-06-24 Applied Micro Circuits Corporation Architecture for providing block-level storage access over a computer network
US6665773B1 (en) * 2000-12-26 2003-12-16 Lsi Logic Corporation Simple and scalable RAID XOR assist logic with overlapped operations
US6799284B1 (en) * 2001-02-28 2004-09-28 Network Appliance, Inc. Reparity bitmap RAID failure recovery
US6513098B2 (en) 2001-05-25 2003-01-28 Adaptec, Inc. Method and apparatus for scalable error correction code generation performance
US7093158B2 (en) * 2002-03-11 2006-08-15 Hewlett-Packard Development Company, L.P. Data redundancy in a hot pluggable, large symmetric multi-processor system
US7111228B1 (en) * 2002-05-07 2006-09-19 Marvell International Ltd. System and method for performing parity checks in disk storage system
US6918007B2 (en) * 2002-09-09 2005-07-12 Hewlett-Packard Development Company, L.P. Memory controller interface with XOR operations on memory read to accelerate RAID operations
US7096407B2 (en) * 2003-02-18 2006-08-22 Hewlett-Packard Development Company, L.P. Technique for implementing chipkill in a memory system
US20040163027A1 (en) * 2003-02-18 2004-08-19 Maclaren John M. Technique for implementing chipkill in a memory system with X8 memory devices
KR20060025135A (en) * 2003-04-21 2006-03-20 네트셀 코포레이션 Disk array controller with reconfigurable date path
US7379974B2 (en) * 2003-07-14 2008-05-27 International Business Machines Corporation Multipath data retrieval from redundant array
US7281177B2 (en) * 2003-07-14 2007-10-09 International Business Machines Corporation Autonomic parity exchange
US7533325B2 (en) * 2003-07-14 2009-05-12 International Business Machines Corporation Anamorphic codes
US7254754B2 (en) * 2003-07-14 2007-08-07 International Business Machines Corporation Raid 3+3
US7428691B2 (en) * 2003-11-12 2008-09-23 Norman Ken Ouchi Data recovery from multiple failed data blocks and storage units
US7913148B2 (en) * 2004-03-12 2011-03-22 Nvidia Corporation Disk controller methods and apparatus with improved striping, redundancy operations and interfaces
TWI251745B (en) * 2004-07-27 2006-03-21 Via Tech Inc Apparatus and related method for calculating parity of redundant array of inexpensive disks
US20060123312A1 (en) * 2004-11-19 2006-06-08 International Business Machines Corporation Method and system for increasing parallelism of disk accesses when restoring data in a disk array system
TWI285313B (en) * 2005-06-22 2007-08-11 Accusys Inc XOR circuit, RAID device capable of recover a plurality of failures and method thereof
US7797467B2 (en) * 2005-11-01 2010-09-14 Lsi Corporation Systems for implementing SDRAM controllers, and buses adapted to include advanced high performance bus features
US20070233926A1 (en) * 2006-03-10 2007-10-04 Inventec Corporation Bus width automatic adjusting method and system
GB0622224D0 (en) * 2006-11-08 2006-12-20 Ibm Apparatus and method for disk read checking
US8458377B2 (en) * 2010-03-05 2013-06-04 Lsi Corporation DMA engine capable of concurrent data manipulation
JP5635621B2 (en) * 2010-09-10 2014-12-03 株式会社日立製作所 Storage system and storage system data transfer method
FR2991799B1 (en) * 2012-06-11 2015-05-29 St Microelectronics Rousset ADAPTING AN ANTENNA CIRCUIT FOR NEAR FIELD COMMUNICATION TERMINAL
US10236043B2 (en) 2016-06-06 2019-03-19 Altera Corporation Emulated multiport memory element circuitry with exclusive-OR based control circuitry
US10372531B2 (en) 2017-01-05 2019-08-06 Texas Instruments Incorporated Error-correcting code memory
US10365967B2 (en) 2017-08-23 2019-07-30 Toshiba Memory Corporation On the fly raid parity calculation
US11662955B2 (en) * 2021-09-27 2023-05-30 GRAID Technology Inc. Direct memory access data path for RAID storage
US11726715B2 (en) 2021-10-11 2023-08-15 Western Digital Technologies, Inc. Efficient data path in compare command execution

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0427119A2 (en) * 1989-11-03 1991-05-15 Compaq Computer Corporation Disk array controller with parity capabilities
US5146588A (en) * 1990-11-26 1992-09-08 Storage Technology Corporation Redundancy accumulator for disk drive array memory
US5163132A (en) * 1987-09-24 1992-11-10 Ncr Corporation Integrated controller using alternately filled and emptied buffers for controlling bi-directional data transfer between a processor and a data storage device
EP0529557A2 (en) * 1991-08-27 1993-03-03 Kabushiki Kaisha Toshiba Apparatus for preventing computer data destructively read out from storage unit
US5335235A (en) * 1992-07-07 1994-08-02 Digital Equipment Corporation FIFO based parity generator
US5396620A (en) * 1993-12-21 1995-03-07 Storage Technology Corporation Method for writing specific values last into data storage groups containing redundancy
EP0727750A2 (en) * 1995-02-17 1996-08-21 Kabushiki Kaisha Toshiba Continuous data server apparatus and data transfer scheme enabling multiple simultaneous data accesses
EP0740247A2 (en) * 1995-04-28 1996-10-30 Hewlett-Packard Company Data stream server system

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4761785B1 (en) * 1986-06-12 1996-03-12 Ibm Parity spreading to enhance storage access
US5191584A (en) * 1991-02-20 1993-03-02 Micropolis Corporation Mass storage array with efficient parity calculation
US5345565A (en) * 1991-03-13 1994-09-06 Ncr Corporation Multiple configuration data path architecture for a disk array controller
JPH07122864B2 (en) * 1991-07-22 1995-12-25 インターナショナル・ビジネス・マシーンズ・コーポレイション Data processing system, interface circuit used in data processing system, and communication method between data processors
US5257391A (en) * 1991-08-16 1993-10-26 Ncr Corporation Disk controller having host interface and bus switches for selecting buffer and drive busses respectively based on configuration control signals
US5708668A (en) * 1992-05-06 1998-01-13 International Business Machines Corporation Method and apparatus for operating an array of storage devices
JP3183719B2 (en) * 1992-08-26 2001-07-09 三菱電機株式会社 Array type recording device
US5537567A (en) * 1994-03-14 1996-07-16 International Business Machines Corporation Parity block configuration in an array of storage devices
JP3661205B2 (en) * 1994-09-09 2005-06-15 株式会社日立製作所 Disk array system and method for generating parity data of disk array system
US5737744A (en) * 1995-10-13 1998-04-07 Compaq Computer Corporation Disk array controller for performing exclusive or operations
US5721839A (en) * 1995-10-13 1998-02-24 Compaq Computer Corporation Apparatus and method for synchronously providing a fullness indication of a dual ported buffer situated between two asynchronous buses
US5809280A (en) * 1995-10-13 1998-09-15 Compaq Computer Corporation Adaptive ahead FIFO with LRU replacement
US5771359A (en) * 1995-10-13 1998-06-23 Compaq Computer Corporation Bridge having a data buffer for each bus master
US5903906A (en) * 1996-06-05 1999-05-11 Compaq Computer Corporation Receiving a write request that allows less than one cache line of data to be written and issuing a subsequent write request that requires at least one cache line of data to be written
US5937174A (en) * 1996-06-28 1999-08-10 Lsi Logic Corporation Scalable hierarchial memory structure for high data bandwidth raid applications
US5748911A (en) * 1996-07-19 1998-05-05 Compaq Computer Corporation Serial bus system for shadowing registers
US5950225A (en) * 1997-02-28 1999-09-07 Network Appliance, Inc. Fly-by XOR for generating parity for data gleaned from a bus

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5948110A (en) * 1993-06-04 1999-09-07 Network Appliance, Inc. Method for providing parity in a raid sub-system using non-volatile memory
US5963962A (en) * 1995-05-31 1999-10-05 Network Appliance, Inc. Write anywhere file-system layout
US6751637B1 (en) 1995-05-31 2004-06-15 Network Appliance, Inc. Allocating files in a file system integrated with a raid disk sub-system
WO1998038576A1 (en) * 1997-02-28 1998-09-03 Network Appliance, Inc. Fly-by xor
EP1310874A2 (en) * 1997-02-28 2003-05-14 Network Appliance, Inc. Fly-by XOR
US7293097B2 (en) 1997-12-05 2007-11-06 Network Appliance, Inc. Enforcing uniform file-locking for diverse file-locking protocols
US6279011B1 (en) 1998-06-19 2001-08-21 Network Appliance, Inc. Backup and restore for heterogeneous file server environment
US6119244A (en) * 1998-08-25 2000-09-12 Network Appliance, Inc. Coordinating persistent status information with multiple file servers
US6636879B1 (en) 2000-08-18 2003-10-21 Network Appliance, Inc. Space allocation in a write anywhere file system
US6728922B1 (en) 2000-08-18 2004-04-27 Network Appliance, Inc. Dynamic data space
US7072916B1 (en) 2000-08-18 2006-07-04 Network Appliance, Inc. Instant snapshot
US7930326B2 (en) 2000-08-18 2011-04-19 Network Appliance, Inc. Space allocation in a write anywhere file system

Also Published As

Publication number Publication date
EP0938704B1 (en) 2005-04-20
EP0938704A1 (en) 1999-09-01
DE69733076T2 (en) 2006-02-23
CA2268548A1 (en) 1998-05-22
DE69733076D1 (en) 2005-05-25
AU4820197A (en) 1998-06-03
US6161165A (en) 2000-12-12
JP2005032265A (en) 2005-02-03
JP3606881B2 (en) 2005-01-05
JP2001500654A (en) 2001-01-16

Similar Documents

Publication Publication Date Title
US6161165A (en) High performance data path with XOR on the fly
EP0768607B1 (en) Disk array controller for performing exclusive or operations
US5890207A (en) High performance integrated cached storage device
US5890219A (en) Redundant writing of data to cached storage system
US5884055A (en) Method and apparatus including a shared resource and multiple processors running a common control program accessing the shared resource
US5822584A (en) User selectable priority for disk array background operations
US6058489A (en) On-line disk array reconfiguration
US5689678A (en) Distributed storage array system having a plurality of modular control units
US5961652A (en) Read checking for drive rebuild
JP3151008B2 (en) Disk sector analysis method
US5101492A (en) Data redundancy and recovery protection
JP2981245B2 (en) Array type disk drive system and method
US5548711A (en) Method and apparatus for fault tolerant fast writes through buffer dumping
EP1019835A1 (en) Segmented dma with xor buffer for storage subsystems
JPH0683717A (en) Large fault-resistant nonvolatile plural port memories
EP0825534B1 (en) Method and apparatus for parity block generation
US5680538A (en) System and method for maintaining a minimum quality of service during read operations on disk arrays
US5787463A (en) Disk array system including a dual-ported staging memory and concurrent redundancy calculation capability
US6513098B2 (en) Method and apparatus for scalable error correction code generation performance
JPH03184109A (en) Target indication resetting for data processor
JP3256329B2 (en) Disk array device and control method therefor
JP3234211B2 (en) Disk array system
JP2002196893A (en) Disk array device and its controlling method

Legal Events

Date Code Title Description
AK Designated states Kind code of ref document: A1; Designated state(s): AU CA IL JP
AL Designated countries for regional patents Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
ENP Entry into the national phase Ref document number: 2268548; Country of ref document: CA; Ref country code: CA; Kind code of ref document: A; Format of ref document f/p: F
ENP Entry into the national phase Ref country code: JP; Ref document number: 1998 522568; Kind code of ref document: A; Format of ref document f/p: F
WWE WIPO information: entry into national phase Ref document number: 1997910943; Country of ref document: EP
WWP WIPO information: published in national office Ref document number: 1997910943; Country of ref document: EP
WWG WIPO information: grant in national office Ref document number: 1997910943; Country of ref document: EP