CN103761195A - Storage method utilizing distributed data encoding - Google Patents
- Publication number
- CN103761195A (application number CN201410009331.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- client
- coding
- server
- buffer memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a storage method utilizing distributed data encoding. In the method, after a client process receives data from a cache, it performs an encoding calculation to generate check blocks; after encoding is completed, the client sends the data pieces to all servers; each time a server returns a write success, the client updates a counter that it maintains; according to the system's configuration parameters, once the configured level at which the data can be recovered is reached, server nodes on which the write failed are not written again. Compared with the prior art, the method implements a disaster recovery scheme using matrix operations, tolerates failures of multiple arbitrary storage nodes or disks, and greatly lowers the CPU (central processing unit) usage of conventional erasure-code or matrix operations; it is highly practical and easy to popularize.
Description
Technical field
The present invention relates to the technical field of computer data storage, and specifically to a storage method utilizing distributed data encoding.
Background art
Modern conventional distributed file systems, whether commercial or open source, essentially all rely on multiple replicas to keep data safe. Commercial examples include IBM's GPFS, Blue Whale's BWFS, and Panasas's PanFS; open-source examples include Lustre, GlusterFS, HDFS, and Ceph. They generally keep multiple replicas across nodes to provide redundancy against node failures, and use RAID5 or RAID6 within a node to provide redundancy against disk failures. Replication is usually implemented by writing the same data several times, so write performance degrades linearly as the number of replicas grows. For RAID across disks, both initialization and rebuild are highly time-consuming, which is a recognized problem in the industry. These schemes protect data inefficiently, yet they only tolerate the failure of one or two nodes and one disk, which falls far short of the 99.9999% required in industry.
In today's big data era, data is an enterprise's most important asset, and data security is often a company's lifeline. With the arrival of the mobile Internet, the daily data growth of massive user bases is striking, and for a certain period all of that data must be stored with high security. The main factors affecting data storage are, first, security and, second, performance.
In the prior art, the replica approach used by conventional distributed file systems is prone to split-brain and inconsistency problems. Conventional disk-level RAID schemes such as RAID5 also greatly increase an enterprise's costs, and rebuild and recovery consume considerable power, wasting resources. On this basis, a novel, highly available, high-performance storage method is provided.
Summary of the invention
The technical task of the present invention is to remedy the deficiencies of the prior art by providing a storage method utilizing distributed data encoding.
The technical solution of the present invention is implemented as follows. The storage method utilizing distributed data encoding has the following storage process:
One, after the client process receives data from the cache, it performs an encoding calculation to generate check blocks. The check blocks ensure that the data can still be recovered when 3 nodes or disks fail. An N+M scheme is used, where N is the number of data blocks and M is the number of check blocks; N is a natural number greater than or equal to 1, and M is a natural number greater than or equal to 3;
Two, after encoding is completed, the client sends the data pieces to all servers. Each time a server returns a write success, the client updates a counter that it maintains. According to the system's configuration parameters, once the configured level at which the data can be recovered is reached, server nodes on which the write failed are not written again.
In step one, the client's real-time encoding calculation uses a 2+2 grouped parallel high-speed encoding scheme, as follows: every 3 data blocks form a group; each of the two groups is encoded separately to form its own check block, and at the same time all 6 data blocks are encoded together as a whole; this produces 3 check blocks, giving a redundancy of 3.
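The grouped encoding can be illustrated with a minimal sketch. The text does not specify the code itself, so the choices here are assumptions made only for illustration: each group's check block is a byte-wise XOR of its 3 data blocks, and the overall check block is a GF(2^8) weighted sum of all 6 blocks with hypothetical coefficients 1 to 6. The function names are likewise hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def encode(data_blocks):
    """Return (check_a, check_b, check_all) for exactly 6 data blocks."""
    if len(data_blocks) != 6:
        raise ValueError("this sketch expects two groups of 3 data blocks")
    group_a, group_b = data_blocks[:3], data_blocks[3:]
    check_a = xor_blocks(group_a)  # check block of the first group
    check_b = xor_blocks(group_b)  # check block of the second group
    # Assumed overall check block: block i is scaled by the coefficient i + 1.
    scaled = [
        bytes(gf_mul(i + 1, byte) for byte in block)
        for i, block in enumerate(data_blocks)
    ]
    check_all = xor_blocks(scaled)
    return check_a, check_b, check_all
```

The two group check blocks depend only on their own group, so they can be computed independently of each other, which is one plausible reading of "parallel" in the scheme's name.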
The detailed process of step two is: when the client receives a read request, it sends read instructions to all storage server nodes and waits for the data returned by each storage server. After the client has performed a consistency check on the returned data and updated the counter, it caches the check blocks that arrive first; the number cached is determined by configuration. If all normal data blocks subsequently arrive, the data blocks are returned to the calling user and the cached check blocks are discarded;
If, during a normal read, a data block cannot be read or is found to be damaged according to its flag bit, the cached check blocks are used in its place, a decoding calculation is performed according to the 2+2 grouping, and the recovered data is returned to the upper-layer caller.
In the normal read flow, read requests are sent preferentially to the nodes holding normal data blocks.
A repair process runs in the cluster. It checks all data continuously; when it finds data loss or a node failure, it reads the data again, decodes to recover the data, and writes the result back to the storage servers. The repair process keeps a detailed record of the repaired files and their layout, and when a new node is put back into the cluster, the repair process transfers the data to it.
Compared with the prior art, the present invention has the following beneficial effects:
The storage method utilizing distributed data encoding replaces the reliability guarantee of the conventional replica approach and replaces current disk-level RAID schemes. It solves the split-brain problem and low performance of replicas, as well as the time-consuming and costly rebuild and recovery of RAID5 or RAID6 schemes. It uses matrix operations to implement a disaster recovery scheme, tolerates failures of multiple arbitrary storage nodes or disks, and greatly reduces the CPU usage of conventional erasure-code or matrix operations. It is highly practical and easy to popularize.
Brief description of the drawings
Figure 1 is the file write flowchart of the present invention.
Figure 3 is the file read flowchart of the present invention.
Figure 4 is the file repair-read flowchart of the present invention.
Figure 5 is the file repair flowchart of the present invention.
Detailed description of the embodiments
The storage method utilizing distributed data encoding of the present invention is described in detail below with reference to the accompanying drawings.
As shown in Figures 1 to 5, the invention provides a storage method utilizing distributed data encoding, whose storage process is as follows:
As shown in Figure 1, after the client process receives data from the cache, it performs an encoding calculation to generate check blocks. Conventional methods use a Cauchy encoding matrix or a Vandermonde matrix and perform matrix operations, which often consume a great deal of CPU and do not improve performance. In practice, the overwhelming majority of failures involve one or two nodes, so a scheme that tolerates the failure of 3 nodes or disks meets real-world needs. This technical solution calls it the N+M scheme, where N is the number of data blocks and M is the number of check blocks; here M = 3. To achieve the same tolerance with replicas, 4 copies would have to exist at the same time, and replica performance would then drop sharply.
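The space cost of the two approaches can be compared with a small calculation. This is only an illustrative sketch: the function names are hypothetical, and N = 6 is assumed here because the grouped scheme described in this document works on 6 data blocks.

```python
def coded_overhead(n_data, m_check):
    """Raw blocks stored per block of user data under an N+M scheme."""
    if n_data < 1:
        raise ValueError("N must be at least 1")
    return (n_data + m_check) / n_data


def replica_overhead(copies):
    """Raw blocks stored per block of user data with full replicas."""
    return float(copies)


# Both configurations are meant to survive the loss of 3 nodes or disks.
print(coded_overhead(6, 3))   # 6 data blocks + 3 check blocks
print(replica_overhead(4))    # 4 full copies
```

Under these assumptions the coded layout stores 1.5 blocks per block of user data, while 4 replicas store 4.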
Open-source schemes such as HDFS and TFS also offer fault tolerance based on Cauchy or Vandermonde matrices, but they all perform the encoding on the storage nodes, so they cannot guarantee data safety in real time; carrying it out asynchronously gives poor reliability and low security. Moreover, most of these schemes still retain replicas, so write performance cannot be guaranteed at all. The present invention has the client perform the encoding calculation in real time and further optimizes performance, and it far surpasses distributed systems such as HDFS and TFS in both security and performance.
The invention also adopts a 2+2 grouped parallel high-speed encoding scheme. As shown in Figure 2, every 3 data blocks form a group; each of the two groups is encoded separately to form its own check block, and at the same time all 6 data blocks are encoded together as a whole. In total this produces 3 check blocks and a redundancy of 3, equivalent to a 4-replica scheme. The fastest way to produce a single check block is linear encoding, whose performance is far better than that of Vandermonde or Cauchy matrix encoding.
In practice, it is most often only one or two nodes that are damaged, and recovering the data often needs only one check block, which reduces the amount of check-block traffic and speeds up the decoding that recovers the data. When one node is damaged in the first group and 2 nodes are damaged in the second group, two check blocks of the third group and one check block of the second group are needed, but this situation is very unlikely.
After encoding is completed, the client sends the data pieces to all servers. Each time a server returns a write success, the client updates a counter that it maintains. According to the system's configuration parameters, once the configured level at which the data can be recovered is reached, server nodes on which the write failed are not written again. This also improves write performance; the general idea is to complete the task with as few write operations as possible.
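The counter-based write can be sketched as follows. The names `write_stripe`, `send`, and `min_success` are hypothetical: `send` stands in for whatever transport the client uses, and `min_success` stands in for the configured recoverability level, which the text leaves to the system's configuration parameters.

```python
def write_stripe(blocks, send, min_success):
    """Send every block once and count the acknowledgements.

    send(index, block) returns True when the server reports a successful
    write. The write is accepted once at least min_success blocks are
    stored; blocks whose write failed are reported but not resent.
    """
    success_count = 0
    failed = []
    for index, block in enumerate(blocks):
        if send(index, block):
            success_count += 1  # client-side counter
        else:
            failed.append(index)
    return success_count >= min_success, failed


# Example: 9 blocks (6 data + 3 check), the server for block 2 is down.
accepted, failed = write_stripe(
    [b"x"] * 9, lambda index, block: index != 2, min_success=6
)
print(accepted, failed)
```

In this sketch the failed block is left for a later repair pass instead of being retried on the write path, which matches the stated goal of keeping write operations to a minimum.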
Normal file read flow: as shown in Figure 3, when the client receives a read request, it sends read instructions to all storage server nodes and waits for the data returned by each storage server. After the client has performed a consistency check on the returned data and updated the counter, it caches the check blocks that arrive first; the number cached is determined by configuration. If all normal data blocks subsequently arrive, the data blocks are returned to the calling user and the cached check blocks are discarded.
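The read flow above can be sketched as a loop over server responses in arrival order. This is a simplified, single-threaded illustration under stated assumptions: the response tuple format, the `max_cached_checks` parameter, and the use of `None` to mark a block that failed the consistency check are all hypothetical.

```python
def read_stripe(responses, n_data, max_cached_checks):
    """Collect data blocks and cache early check blocks.

    responses yields (kind, index, payload) tuples in arrival order, where
    kind is "data" or "check" and payload is None for a block that could
    not be read or failed the consistency check.
    Returns (data_blocks_by_index, cached_check_blocks_by_index).
    """
    data = {}
    check_cache = {}
    for kind, index, payload in responses:
        if payload is None:
            continue  # unreadable or inconsistent block
        if kind == "data":
            data[index] = payload
        elif len(check_cache) < max_cached_checks:
            check_cache[index] = payload  # keep only the configured number
        if len(data) == n_data:
            check_cache.clear()  # all data blocks arrived; drop the cache
            break
    return data, check_cache
```

If the returned cache is non-empty, some data block is missing and the caller would fall through to the repair-read path.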
In the normal read flow, read requests are sent preferentially to the nodes holding normal data blocks, which avoids decoding calculations on the client.
File repair-read flow: as shown in Figure 4, if during a normal read a data block cannot be read or is found to be damaged according to its flag bit, the cached check blocks are used in its place, a decoding calculation is performed according to the 2+2 grouping method, and the recovered data is returned to the upper-layer caller.
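For the common case of a single lost block in one group, the repair read can be sketched as below. It assumes, purely for illustration, that a group's check block is the byte-wise XOR of its data blocks; the text does not state the actual code, and `repair_group` is a hypothetical name.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def repair_group(group, check_block):
    """Rebuild at most one missing block (marked None) in a group.

    Under the XOR assumption, the missing block equals the XOR of the
    surviving data blocks and the group's check block.
    """
    missing = [i for i, block in enumerate(group) if block is None]
    if not missing:
        return list(group)
    if len(missing) > 1:
        raise ValueError("one group check block can rebuild only one block")
    survivors = [block for block in group if block is not None]
    repaired = list(group)
    repaired[missing[0]] = xor_blocks(survivors + [check_block])
    return repaired
```

Losing more than one block in a group would need the overall check block and a full decode, which this sketch does not attempt.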
File repair flow: as shown in Figure 5, to keep data safe, a repair process runs in the cluster and checks all data continuously. When it finds data loss or a node failure, it reads the data again, decodes to recover the data, and writes the result back to the storage servers. The repair process keeps a detailed record of the repaired files and their layout, and when a new node is put back into the cluster, the repair process transfers the data to it.
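One pass of such a repair process might look like the following sketch. The callbacks `is_healthy`, `rebuild`, and `write_back`, and the layout format, are hypothetical placeholders for the cluster's real health check, decoder, and storage client.

```python
def repair_pass(layouts, is_healthy, rebuild, write_back):
    """Scan every stripe once and repair blocks that are lost or damaged.

    layouts maps a stripe id to the list of nodes holding its blocks.
    Returns a log of (stripe_id, block_index, node) entries, one per
    repaired block, as a record of what was repaired and where it lives.
    """
    repair_log = []
    for stripe_id, nodes in layouts.items():
        for block_index, node in enumerate(nodes):
            if is_healthy(node, stripe_id, block_index):
                continue
            block = rebuild(stripe_id, block_index)   # read survivors, decode
            write_back(stripe_id, block_index, block)  # store the result again
            repair_log.append((stripe_id, block_index, node))
    return repair_log
```

A long-running repair process would call this pass in a loop, and the log gives it the record it needs to move repaired data onto a node that later rejoins the cluster.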
The present invention greatly improves space utilization, alleviates the low performance caused by the replica approach, and reduces the cost of disk-level RAID.
The foregoing describes only embodiments of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
1. A storage method utilizing distributed data encoding, characterized in that its storage process is:
One, after the client process receives data from the cache, it performs an encoding calculation to generate check blocks. The check blocks ensure that the data can still be recovered when 3 nodes or disks fail. An N+M scheme is used, where N is the number of data blocks and M is the number of check blocks; N is a natural number greater than or equal to 1, and M is a natural number greater than or equal to 3;
Two, after encoding is completed, the client sends the data pieces to all servers. Each time a server returns a write success, the client updates a counter that it maintains. According to the system's configuration parameters, once the configured level at which the data can be recovered is reached, server nodes on which the write failed are not written again.
2. The storage method utilizing distributed data encoding according to claim 1, characterized in that: in step one, the client's real-time encoding calculation uses a 2+2 grouped parallel high-speed encoding scheme, as follows: every 3 data blocks form a group; each of the two groups is encoded separately to form its own check block, and at the same time all 6 data blocks are encoded together as a whole; this produces 3 check blocks, giving a redundancy of 3.
3. The storage method utilizing distributed data encoding according to claim 2, characterized in that the detailed process of step two is: when the client receives a read request, it sends read instructions to all storage server nodes and waits for the data returned by each storage server. After the client has performed a consistency check on the returned data and updated the counter, it caches the check blocks that arrive first; the number cached is determined by configuration. If all normal data blocks subsequently arrive, the data blocks are returned to the calling user and the cached check blocks are discarded;
If, during a normal read, a data block cannot be read or is found to be damaged according to its flag bit, the cached check blocks are used in its place, a decoding calculation is performed according to the 2+2 grouping, and the recovered data is returned to the upper-layer caller.
4. The storage method utilizing distributed data encoding according to claim 3, characterized in that: in the normal read flow, read requests are sent preferentially to the nodes holding normal data blocks.
5. The storage method utilizing distributed data encoding according to any one of claims 1 to 4, characterized in that: a repair process runs in the cluster and checks all data continuously; when it finds data loss or a node failure, it reads the data again, decodes to recover the data, and writes the result back to the storage servers; the repair process keeps a detailed record of the repaired files and their layout, and when a new node is put back into the cluster, the repair process transfers the data to it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410009331.1A CN103761195B (en) | 2014-01-09 | 2014-01-09 | Storage method utilizing distributed data encoding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410009331.1A CN103761195B (en) | 2014-01-09 | 2014-01-09 | Storage method utilizing distributed data encoding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103761195A true CN103761195A (en) | 2014-04-30 |
CN103761195B CN103761195B (en) | 2017-05-10 |
Family
ID=50528437
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410009331.1A Active CN103761195B (en) | 2014-01-09 | 2014-01-09 | Storage method utilizing distributed data encoding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103761195B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104731676A (en) * | 2015-03-24 | 2015-06-24 | 浪潮集团有限公司 | Method for accelerating data recovery of cluster system |
CN105791353A (en) * | 2014-12-23 | 2016-07-20 | 深圳市腾讯计算机系统有限公司 | Distributed data storage method and system based on erasure code |
CN105930103A (en) * | 2016-05-10 | 2016-09-07 | 南京大学 | Distributed storage CEPH based erasure correction code overwriting method |
WO2017041233A1 (en) * | 2015-09-08 | 2017-03-16 | 广东超算数据安全技术有限公司 | Encoding and storage node repairing method for functional-repair regenerating code |
CN107615248A (en) * | 2015-06-17 | 2018-01-19 | 华为技术有限公司 | Distributed data storage method, control device and system |
CN109976663A (en) * | 2017-12-27 | 2019-07-05 | 浙江宇视科技有限公司 | Distributed storage response method and system |
CN110059068A (en) * | 2019-04-11 | 2019-07-26 | 厦门网宿有限公司 | Data verification method and data verification system in a kind of distributed memory system |
CN111610938A (en) * | 2020-05-29 | 2020-09-01 | 宁波富万信息科技有限公司 | Distributed data code storage method, electronic device and computer readable storage medium |
CN112732164A (en) * | 2019-10-28 | 2021-04-30 | 北京白山耘科技有限公司 | Cross-node data group management method, device and medium |
WO2023184921A1 (en) * | 2022-03-30 | 2023-10-05 | 苏州浪潮智能科技有限公司 | Raid encoding and decoding method and apparatus, and device and non-volatile readable storage medium |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106227617A (en) * | 2016-07-15 | 2016-12-14 | 乐视控股(北京)有限公司 | Self-repair method and storage system based on erasure code algorithm |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1530836A (en) * | 2003-03-17 | 2004-09-22 | 株式会社瑞萨科技 | Nonvolatile memory device and data processing system |
US20090172464A1 (en) * | 2007-12-30 | 2009-07-02 | Agere Systems Inc. | Method and apparatus for repairing uncorrectable drive errors in an integrated network attached storage device |
CN101488104A (en) * | 2009-02-26 | 2009-07-22 | 北京世纪互联宽带数据中心有限公司 | System and method for implementing high-efficiency security memory |
US7681104B1 (en) * | 2004-08-09 | 2010-03-16 | Bakbone Software, Inc. | Method for erasure coding data across a plurality of data stores in a network |
US7823009B1 (en) * | 2001-02-16 | 2010-10-26 | Parallels Holdings, Ltd. | Fault tolerant distributed storage for cloud computing |
CN102272731A (en) * | 2008-11-10 | 2011-12-07 | 弗森-艾奥公司 | Apparatus, system, and method for predicting failures in solid-state storage |
US20120192037A1 (en) * | 2007-07-18 | 2012-07-26 | Panasas, Inc. | Data storage systems and methods having block group error correction for repairing unrecoverable read errors |
US20130173956A1 (en) * | 2011-12-30 | 2013-07-04 | Streamscale, Inc. | Using parity data for concurrent data authentication, correction, compression, and encryption |
-
2014
- 2014-01-09 CN CN201410009331.1A patent/CN103761195B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7823009B1 (en) * | 2001-02-16 | 2010-10-26 | Parallels Holdings, Ltd. | Fault tolerant distributed storage for cloud computing |
CN1530836A (en) * | 2003-03-17 | 2004-09-22 | 株式会社瑞萨科技 | Nonvolatile memory device and data processing system |
US7681104B1 (en) * | 2004-08-09 | 2010-03-16 | Bakbone Software, Inc. | Method for erasure coding data across a plurality of data stores in a network |
US20120192037A1 (en) * | 2007-07-18 | 2012-07-26 | Panasas, Inc. | Data storage systems and methods having block group error correction for repairing unrecoverable read errors |
US20090172464A1 (en) * | 2007-12-30 | 2009-07-02 | Agere Systems Inc. | Method and apparatus for repairing uncorrectable drive errors in an integrated network attached storage device |
CN102272731A (en) * | 2008-11-10 | 2011-12-07 | 弗森-艾奥公司 | Apparatus, system, and method for predicting failures in solid-state storage |
CN101488104A (en) * | 2009-02-26 | 2009-07-22 | 北京世纪互联宽带数据中心有限公司 | System and method for implementing high-efficiency security memory |
US20130173956A1 (en) * | 2011-12-30 | 2013-07-04 | Streamscale, Inc. | Using parity data for concurrent data authentication, correction, compression, and encryption |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105791353A (en) * | 2014-12-23 | 2016-07-20 | 深圳市腾讯计算机系统有限公司 | Distributed data storage method and system based on erasure code |
CN105791353B (en) * | 2014-12-23 | 2020-03-17 | 深圳市腾讯计算机系统有限公司 | Distributed data storage method and system based on erasure codes |
CN104731676A (en) * | 2015-03-24 | 2015-06-24 | 浪潮集团有限公司 | Method for accelerating data recovery of cluster system |
CN107615248B (en) * | 2015-06-17 | 2019-12-13 | 华为技术有限公司 | Distributed data storage method, control equipment and system |
CN107615248A (en) * | 2015-06-17 | 2018-01-19 | 华为技术有限公司 | Distributed data storage method, control device and system |
WO2017041233A1 (en) * | 2015-09-08 | 2017-03-16 | 广东超算数据安全技术有限公司 | Encoding and storage node repairing method for functional-repair regenerating code |
CN105930103B (en) * | 2016-05-10 | 2019-04-16 | 南京大学 | Erasure code overwriting method for distributed storage CEPH |
CN105930103A (en) * | 2016-05-10 | 2016-09-07 | 南京大学 | Distributed storage CEPH based erasure correction code overwriting method |
CN109976663A (en) * | 2017-12-27 | 2019-07-05 | 浙江宇视科技有限公司 | Distributed storage response method and system |
CN109976663B (en) * | 2017-12-27 | 2021-12-28 | 浙江宇视科技有限公司 | Distributed storage response method and system |
CN110059068A (en) * | 2019-04-11 | 2019-07-26 | 厦门网宿有限公司 | Data verification method and data verification system in a kind of distributed memory system |
CN112732164A (en) * | 2019-10-28 | 2021-04-30 | 北京白山耘科技有限公司 | Cross-node data group management method, device and medium |
CN111610938A (en) * | 2020-05-29 | 2020-09-01 | 宁波富万信息科技有限公司 | Distributed data code storage method, electronic device and computer readable storage medium |
WO2023184921A1 (en) * | 2022-03-30 | 2023-10-05 | 苏州浪潮智能科技有限公司 | Raid encoding and decoding method and apparatus, and device and non-volatile readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN103761195B (en) | 2017-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103761195B (en) | Storage method utilizing distributed data encoding | |
US9552258B2 (en) | Method and system for storing data in raid memory devices | |
CN109725822B (en) | Method, apparatus and computer program product for managing a storage system | |
US10162704B1 (en) | Grid encoded data storage systems for efficient data repair | |
US10089176B1 (en) | Incremental updates of grid encoded data storage systems | |
US9998539B1 (en) | Non-parity in grid encoded data storage systems | |
US9904589B1 (en) | Incremental media size extension for grid encoded data storage systems | |
US9063910B1 (en) | Data recovery after triple disk failure | |
US11513891B2 (en) | Systems and methods for parity-based failure protection for storage devices | |
CN109814807B (en) | Data storage method and device | |
CN110442535B (en) | Method and system for improving reliability of distributed solid-state disk key value cache system | |
CN105956128B (en) | A kind of adaptive coding storage fault-tolerance approach based on simple regeneration code | |
CN102110154B (en) | File redundancy storage method in cluster file system | |
CN102520890B (en) | RS (Reed-Solomon) - DRAID( D redundant array of independent disk) system based on GPUs (graphic processing units) and method for controlling data of memory devices | |
CN101984400B (en) | RAID control method, device and system | |
CN103729151A (en) | Failure data recovery method based on improved erasure codes | |
US20140164695A1 (en) | Method and system for storing and rebuilding data | |
CN110427156B (en) | Partition-based MBR (Membrane biological reactor) parallel reading method | |
CN108228382A (en) | A kind of data reconstruction method for EVENODD code single-deck failures | |
CN107153661A (en) | A kind of storage, read method and its device of the data based on HDFS systems | |
CN101901115B (en) | Method for constructing redundant array of inexpensive disks (RAID) 6 level | |
CN104866243A (en) | RAID-6 transverse and oblique check encoding and decoding method for optimizing input/output load | |
Iliadis | Reliability evaluation of erasure-coded storage systems with latent errors | |
CN116501553B (en) | Data recovery method, device, system, electronic equipment and storage medium | |
US9928141B1 (en) | Exploiting variable media size in grid encoded data storage systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||