WO2014089311A3 - Raid surveyor - Google Patents

Raid surveyor

Info

Publication number
WO2014089311A3
WO2014089311A3 PCT/US2013/073347 US2013073347W WO2014089311A3 WO 2014089311 A3 WO2014089311 A3 WO 2014089311A3 US 2013073347 W US2013073347 W US 2013073347W WO 2014089311 A3 WO2014089311 A3 WO 2014089311A3
Authority
WO
WIPO (PCT)
Prior art keywords
disk drive
data storage
data
failing
storage subsystem
Application number
PCT/US2013/073347
Other languages
French (fr)
Other versions
WO2014089311A2 (en)
Inventor
Anthony J. FLOEDER
Derek J. ANDERSON
Original Assignee
Compellent Technologies
Priority date
2012-12-06
Filing date
2013-12-05
Publication date
2014-07-31
Application filed by Compellent Technologies
Priority to IN2706DEN2015 (IN2015DN02706A)
Priority to CN201380059018.2A (CN104813290B)
Priority to EP13860693.4A (EP2929435B1)
Publication of WO2014089311A2
Publication of WO2014089311A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F11/1088 Reconstruction on already foreseen single or plurality of spare disks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/004 Error avoidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/008 Reliability or availability analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/076 Error or fault detection not based on redundancy by exceeding limits by exceeding a count or rate limit, e.g. word- or bit count limit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F11/1092 Rebuilding, e.g. when physically replacing a failing disk
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2069 Management of state, configuration or failover
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2211/00 Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
    • G06F2211/10 Indexing scheme relating to G06F11/10
    • G06F2211/1002 Indexing scheme relating to G06F11/1076
    • G06F2211/1088 Scrubbing in RAID systems with parity

Abstract

A method for surveying a data storage subsystem for latent errors before a failing disk drive of the data storage subsystem fails and recovering unreadable data usable to reconstruct data of the failing disk drive. The method includes determining that a disk drive of a plurality of disk drives of the data storage subsystem meets a threshold for being identified as a failing disk drive, and prior to failure of the failing disk drive, surveying at least a portion of the data on the remaining plurality of disk drives to identify data storage areas with latent errors. The identified data storage areas may be reconstructed utilizing, at least in part, data stored on the failing disk drive.
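
The survey-then-repair flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes single-parity (RAID 5 style) XOR redundancy, models each drive as an in-memory list of byte blocks, and treats a per-drive error count reaching a fixed threshold as the failing-drive test. The names `xor_blocks`, `find_failing_drive`, `survey_and_repair`, and the `UNREADABLE` marker are hypothetical and do not come from the source.

```python
from functools import reduce

UNREADABLE = None  # hypothetical marker for a block with a latent error


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (single-parity reconstruction)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def find_failing_drive(error_counts, threshold):
    """Return the index of the first drive whose error count meets the
    threshold, or None if no drive qualifies (assumed failure criterion)."""
    for index, count in enumerate(error_counts):
        if count >= threshold:
            return index
    return None


def survey_and_repair(drives, failing):
    """Scan every stripe on the remaining drives and rebuild unreadable
    blocks while the failing drive can still be read.

    Returns the list of (drive, stripe) pairs that were repaired."""
    repaired = []
    for stripe in range(len(drives[0])):
        for d, drive in enumerate(drives):
            if d == failing or drive[stripe] is not UNREADABLE:
                continue
            # Peers are all other blocks in the stripe, including the
            # block that still lives on the failing drive.
            peers = [other[stripe] for i, other in enumerate(drives) if i != d]
            if any(block is UNREADABLE for block in peers):
                continue  # two losses in one stripe: single parity cannot recover
            drive[stripe] = xor_blocks(peers)
            repaired.append((d, stripe))
    return repaired


if __name__ == "__main__":
    # Two data drives plus one parity drive, two stripes each.
    drives = [
        [b"\x01\x02", b"\x10\x20"],   # drive 0 (will be flagged as failing)
        [b"\x04\x08", UNREADABLE],    # drive 1 has a latent error in stripe 1
        [b"\x05\x0a", b"\x50\xa0"],   # drive 2 holds XOR parity
    ]
    failing = find_failing_drive([7, 0, 1], threshold=5)
    print(failing, survey_and_repair(drives, failing), drives[1][1])
```

The point of the sketch is the ordering: the latent error on drive 1 is repaired while drive 0 is still readable. Had drive 0 failed first, stripe 1 would have had two missing blocks and single parity could not have rebuilt either.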

Priority Applications (3)

Application Number Publication Number Priority Date Filing Date Title
IN2706DEN2015 IN2015DN02706A (en) 2012-12-06 2013-12-05
CN201380059018.2A CN104813290B (en) 2012-12-06 2013-12-05 RAID surveyor
EP13860693.4A EP2929435B1 (en) 2012-12-06 2013-12-05 Raid surveyor

Applications Claiming Priority (2)

Application Number Publication Number Priority Date Filing Date Title
US13/706,553 US9135096B2 (en) 2012-12-06 2012-12-06 RAID surveyor
US13/706,553 2012-12-06

Publications (2)

Publication Number Publication Date
WO2014089311A2 WO2014089311A2 (en) 2014-06-12
WO2014089311A3 WO2014089311A3 (en) 2014-07-31

Family

ID=50882388

Family Applications (1)

Application Number Publication Number Priority Date Filing Date Title
PCT/US2013/073347 WO2014089311A2 (en) 2012-12-06 2013-12-05 Raid surveyor

Country Status (5)

Country Link
US (2) US9135096B2 (en)
EP (1) EP2929435B1 (en)
CN (1) CN104813290B (en)
IN (1) IN2015DN02706A (en)
WO (1) WO2014089311A2 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9104604B2 (en) * 2013-02-26 2015-08-11 International Business Machines Corporation Preventing unrecoverable errors during a disk regeneration in a disk array
CN106557389B (en) * 2015-09-29 2019-03-08 成都华为技术有限公司 Slow disk detection method and device
CN106250278A (en) * 2016-08-04 2016-12-21 深圳市泽云科技有限公司 One-click disk array data recovery method
US10678643B1 (en) * 2017-04-26 2020-06-09 EMC IP Holding Company LLC Splitting a group of physical data storage drives into partnership groups to limit the risk of data loss during drive rebuilds in a mapped RAID (redundant array of independent disks) data storage system
US10346247B1 (en) * 2017-04-27 2019-07-09 EMC IP Holding Company LLC Adjustable error sensitivity for taking disks offline in a mapped RAID storage array
US10210045B1 (en) * 2017-04-27 2019-02-19 EMC IP Holding Company LLC Reducing concurrency bottlenecks while rebuilding a failed drive in a data storage system
CN109725838B (en) * 2017-10-27 2022-02-25 伊姆西Ip控股有限责任公司 Method, apparatus and computer readable medium for managing a plurality of discs
US10691543B2 (en) 2017-11-14 2020-06-23 International Business Machines Corporation Machine learning to enhance redundant array of independent disks rebuilds
US10740181B2 (en) * 2018-03-06 2020-08-11 Western Digital Technologies, Inc. Failed storage device rebuild method
CN110413454B (en) * 2018-04-28 2022-04-05 华为技术有限公司 Data reconstruction method and device based on storage array and storage medium
CN111124264B (en) * 2018-10-31 2023-10-27 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for reconstructing data
CN109873985A (en) * 2019-03-01 2019-06-11 苏州星奥达科技有限公司 Intelligent backup and recovery method for a video platform cluster
US11182358B2 (en) 2019-07-18 2021-11-23 International Business Machines Corporation Performance enhanced data scrubbing
CN115206406A (en) 2021-04-12 2022-10-18 伊姆西Ip控股有限责任公司 Method and device for managing redundant array of independent disks

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080115017A1 (en) * 2006-10-31 2008-05-15 Jacobson Michael B Detection and correction of block-level data corruption in fault-tolerant data-storage systems
US7574623B1 (en) * 2005-04-29 2009-08-11 Network Appliance, Inc. Method and system for rapidly recovering data from a “sick” disk in a RAID disk group
US20120079189A1 (en) * 2010-09-28 2012-03-29 John Colgrove Intra-device data protection in a raid array
US20120084600A1 (en) * 2010-10-01 2012-04-05 Lsi Corporation Method and system for data reconstruction after drive failures

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7017107B2 (en) * 2001-04-30 2006-03-21 Sun Microsystems, Inc. Storage array employing scrubbing operations at the disk-controller level
US6934904B2 (en) * 2001-04-30 2005-08-23 Sun Microsystems, Inc. Data integrity error handling in a redundant storage array
US6871263B2 (en) * 2001-08-28 2005-03-22 Sedna Patent Services, Llc Method and apparatus for striping data onto a plurality of disk drives
US7363546B2 (en) * 2002-07-31 2008-04-22 Sun Microsystems, Inc. Latent fault detector
CN101566931B (en) 2003-08-14 2011-05-18 克姆佩棱特科技公司 Virtual disk drive system and method
US7590801B1 (en) * 2004-02-12 2009-09-15 Netapp, Inc. Identifying suspect disks
US7313721B2 (en) * 2004-06-21 2007-12-25 Dot Hill Systems Corporation Apparatus and method for performing a preemptive reconstruct of a fault-tolerant RAID array
US20090055682A1 (en) * 2007-07-18 2009-02-26 Panasas Inc. Data storage systems and methods having block group error correction for repairing unrecoverable read errors
US7971093B1 (en) * 2008-01-16 2011-06-28 Network Appliance, Inc. Apparatus and method to proactively address hard disk drive inefficiency and failure

Also Published As

Publication number Publication date
CN104813290B (en) 2018-09-21
EP2929435A4 (en) 2016-11-02
US9135096B2 (en) 2015-09-15
CN104813290A (en) 2015-07-29
WO2014089311A2 (en) 2014-06-12
IN2015DN02706A (en) 2015-09-04
US20150347232A1 (en) 2015-12-03
US10025666B2 (en) 2018-07-17
US20140164849A1 (en) 2014-06-12
EP2929435B1 (en) 2020-03-25
EP2929435A2 (en) 2015-10-14

Similar Documents

Publication Publication Date Title
WO2014089311A3 (en) Raid surveyor
WO2010120475A3 (en) Data recovery in a solid state storage system
WO2012125315A3 (en) Virtual disk storage techniques
WO2012051600A3 (en) File system-aware solid-state storage management system
GB2511681A (en) Use of virtual drive as hot spare for raid group
TWI563380B (en) Using reliability information from multiple storage units and a parity storage unit to recover data for a failed one of the storage units
WO2013012673A3 (en) Flash disk array and controller
WO2014039322A3 (en) Techniques for recovering a virtual machine
GB2485872B (en) Wear leveling of solid state disks based on usage information of data and parity received from a raid controller
EP2180407A3 (en) Fast data recovery from HDD failure
WO2014164134A3 (en) Detecting effect of corrupting event on preloaded data in non-volatile memory
WO2012100087A3 (en) Apparatus, system, and method for managing out-of-service conditions
GB2515709A (en) Systems and methods for preventing data loss
WO2011116071A3 (en) Mlc self-raid flash data protection scheme
WO2014039227A3 (en) Error detection and correction in a memory system
WO2013070366A3 (en) Statistical read comparison signal generation for memory systems
WO2012118756A3 (en) Compressed journaling in event tracking files for metadata recovery and replication
WO2012052800A8 (en) Two stage checksummed raid storage model
WO2014105447A3 (en) Backup user interface
GB201206443D0 (en) Backup and storage system
WO2013144720A3 (en) Improved performance for large versioned databases
WO2013057174A9 (en) Comparing positional data
WO2011123699A3 (en) Systems and methods for securing data in motion
WO2013006293A3 (en) Unaligned data coalescing
WO2011071818A3 (en) Extending ssd lifetime using hybrid storage

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13860693

Country of ref document: EP

Kind code of ref document: A2

WWE Wipo information: entry into national phase

Ref document number: 2013860693

Country of ref document: EP
