CN103761195A

CN103761195A - Storage method utilizing distributed data encoding

Info

Publication number: CN103761195A
Application number: CN201410009331.1A
Authority: CN
Inventors: 王欢
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: Inspur Electronic Information Industry Co Ltd
Priority date: 2014-01-09
Filing date: 2014-01-09
Publication date: 2014-04-30
Anticipated expiration: 2034-01-09
Also published as: CN103761195B

Abstract

The invention provides a storage method utilizing distributed data encoding. In the method, after a client process receives data from a cache, it performs an encoding calculation to generate check data blocks; after encoding is completed, the client sends the data slices to all servers; each time a server returns a write success, a counter maintained on the client is updated; according to the configuration parameters of the system, once the configured level at which the data can be recovered is reached, server nodes whose writes failed are no longer rewritten. Compared with the prior art, the storage method realizes a disaster recovery scheme by means of matrix operations, tolerates the failure of any several storage nodes or disks, and greatly lowers the CPU (central processing unit) usage of conventional erasure codes or matrix operations; the storage method is highly practical and easy to popularize.

Description

A storage method utilizing distributed data encoding

Technical field

The present invention relates to the technical field of computer data storage, and specifically to a storage method utilizing distributed data encoding.

Background technology

Today's conventional distributed file systems, whether commercial or open source, almost all rely on multiple replicas to keep data safe. Commercial examples include IBM's GPFS, Blue Whale's BWFS, and Panasas's PanFS; open-source examples include Lustre, GlusterFS, HDFS, and Ceph. These systems generally use multiple replicas across nodes to provide node-level fault redundancy, and RAID5 or RAID6 within each node to provide disk-level fault redundancy. Replication is usually implemented by writing each piece of data several times, so write performance drops linearly as the number of replicas grows. For RAID operations across disks, the long time consumed by both initialization and rebuild or recovery is a recognized difficulty in the industry. Safety is thus guaranteed at low efficiency, and yet only the failure of one or two nodes and one disk can be tolerated, which falls far short of the 99.9999% requirement of industrial use.

In today's big data era, data is an enterprise's most important asset, and the security of that data is often a company's lifeblood. With the arrival of the mobile Internet, the data generated every day by massive numbers of users grows at a striking rate, and for a certain period all of that data needs highly secure storage. The main factors affecting data storage are, first, security and, second, performance.

In the prior art, the replica approach of traditional distributed file systems easily leads to split-brain and inconsistency problems. Traditional disk-level RAID5 or RAID6 schemes, meanwhile, greatly raise an enterprise's costs and consume considerable power during rebuild and recovery, which wastes resources. On this basis, a novel, highly available, high-performance storage method is now provided.

Summary of the invention

The technical task of the present invention is to remedy the deficiencies of the prior art by providing a storage method utilizing distributed data encoding.

The technical scheme of the present invention is realized as follows. The specific storage process of the storage method utilizing distributed data encoding is:

Step 1. After the client process receives data from the cache, it performs an encoding calculation to generate check data blocks. These check data blocks guarantee that the data remains recoverable when up to 3 nodes or disks fail. An N+M scheme is adopted, where N is the number of data blocks and M is the number of check blocks; N is a natural number greater than or equal to 1, and M is a natural number greater than or equal to 3.

Step 2. After encoding is completed, the client sends the data slices to all servers. Each time a server returns a write success, a counter maintained on the client is updated. According to the configuration parameters of the system, once the configured level at which the data can be recovered is reached, server nodes whose writes failed are no longer rewritten.

In step 1, the client's real-time encoding calculation uses a 2+2 grouped parallel high-speed encoding scheme, which works as follows: every 3 data blocks form a group; each of the two groups is encoded to form its own check block, and at the same time all 6 data blocks are encoded as a whole. This produces 3 check blocks, and the redundancy reaches 3.
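The grouped layout described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the encoding disclosed by the invention: the per-group check block is assumed to be a bytewise XOR, the overall check block is assumed to be a weighted sum over GF(2^8) with the polynomial 0x11D and coefficients 1 to 6, and the names `xor_blocks`, `gf_mul`, and `encode_grouped` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8); the polynomial 0x11D is an assumed choice."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def encode_grouped(data_blocks):
    """Return (group 1 check, group 2 check, overall check) for 6 data blocks."""
    if len(data_blocks) != 6 or len({len(b) for b in data_blocks}) != 1:
        raise ValueError("expected 6 equally sized data blocks")
    group1, group2 = data_blocks[:3], data_blocks[3:]
    # Assumption: each group check block is the XOR of that group's 3 blocks.
    check1 = xor_blocks(group1)
    check2 = xor_blocks(group2)
    # Assumption: the overall check block weights block k by coefficient k
    # (1..6) in GF(2^8), so it is not simply check1 XOR check2.
    weighted = [
        bytes(gf_mul(coef, byte) for byte in block)
        for coef, block in enumerate(data_blocks, start=1)
    ]
    overall = xor_blocks(weighted)
    return check1, check2, overall
```

For six equally sized data blocks, `encode_grouped` returns three check blocks of the same size: one per group of 3, plus one covering all 6 blocks.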

The detailed process of step 2 is as follows. When the client receives a read request, it sends read instructions to all storage server nodes and waits to receive the data returned by each storage server. After the client has performed a consistency check on the returned data and updated the counter, it caches the check blocks that arrive first; the number of check blocks cached is determined by configuration. If all the normal data blocks are subsequently returned, the data blocks are returned to the calling user and the cached check blocks are discarded.

If, during a normal read, a data block is found to be unreadable or is found from its flag bit to be damaged, the cached check blocks are used to fill the gap, the decoding calculation is performed according to the 2+2 grouping, and the recovered data is then returned to the upper-layer calling user.

In the normal read flow, read requests are sent preferentially to the nodes holding normal data blocks.

A repair process exists in the cluster. This process checks all data continuously; when it finds data loss or a node failure, it reads the data again, decodes it to recover the lost data, and writes the result back to a storage server. The repair process keeps a detailed record of the repaired files and their layout, and when a new node is put back into the cluster, the repair process transfers the data to it.

Compared with the prior art, the present invention has the following beneficial effects:

The storage method utilizing distributed data encoding of the present invention replaces the reliability guarantee of the traditional replica approach and also replaces disk-level RAID schemes. It solves the split-brain problem and low performance of replicas, as well as the long rebuild and recovery times and high cost of RAID5 or RAID6 schemes. It uses matrix operations to realize a disaster recovery scheme, tolerates the failure of any several storage nodes or disks, and greatly reduces the CPU usage of traditional erasure codes or matrix operations. It is highly practical and easy to promote.

Brief description of the drawings

Fig. 1 is a flowchart of writing a file in the present invention.

Fig. 2 is a schematic diagram of the encoding algorithm of the present invention.

Fig. 3 is a flowchart of reading a file in the present invention.

Fig. 4 is a flowchart of a repair read of a file in the present invention.

Fig. 5 is a flowchart of repairing a file in the present invention.

Embodiment

The storage method utilizing distributed data encoding of the present invention is described in detail below with reference to the accompanying drawings.

As shown in Figs. 1 to 5, the invention provides a storage method utilizing distributed data encoding, whose specific storage process is as follows:

As shown in Fig. 1, after the client process receives data from the cache, it performs an encoding calculation to generate check data blocks. Traditional calculation methods use a Cauchy encoding matrix or a Vandermonde matrix and carry out matrix operations, which often consume a great deal of CPU and do not improve performance. Considering that in practice the overwhelming majority of failures involve one node or two nodes, a scheme that tolerates the failure of 3 nodes or disks is sufficient to meet real-world needs. In this technical scheme this is called the N+M scheme, where N is the number of data blocks and M is the number of check blocks; here M = 3. A replica scheme of corresponding strength would have to keep 4 copies at the same time, in which case replica performance drops sharply.
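The space cost of the two approaches can be compared with a short calculation. The sketch below assumes N = 6 data blocks, the size used in the grouped example of this description, together with M = 3; the function names `storage_overhead` and `replica_overhead` are hypothetical.

```python
from fractions import Fraction


def storage_overhead(n_data, m_check):
    """Raw blocks stored per block of user data in an N+M layout."""
    if n_data < 1 or m_check < 0:
        raise ValueError("N must be >= 1 and M must be >= 0")
    return Fraction(n_data + m_check, n_data)


def replica_overhead(copies):
    """Raw blocks stored per block of user data when keeping full copies."""
    return Fraction(copies, 1)


# Assumed example sizes: 6 data blocks + 3 check blocks versus 4 replicas.
encoded = storage_overhead(6, 3)
replicated = replica_overhead(4)
print(f"N+M: {float(encoded)}x, replicas: {float(replicated)}x")
```

Under these assumptions the 6+3 layout stores 1.5 blocks per block of user data, against 4 blocks for four replicas.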

Open-source schemes such as HDFS and TFS likewise offer fault-tolerance modes based on Cauchy or Vandermonde matrices, but in all of them the encoding is performed on the storage nodes, so the safety of data cannot be guaranteed in real time; schemes that encode asynchronously have poor reliability and low security. Moreover, most of these schemes still retain replicas, so write performance cannot be guaranteed at all. The present invention has the client perform the encoding calculation in real time and further optimizes performance, so that in both security and performance it far exceeds the methods of distributed systems such as HDFS and TFS.

The invention also adopts a 2+2 grouped parallel high-speed encoding scheme. As shown in Fig. 2, every 3 data blocks form a group; each of the two groups is encoded to form its own check block, and at the same time all 6 data blocks are encoded as a whole. In total 3 check blocks are generated, so the redundancy also reaches 3, which is equivalent to a 4-replica scheme. The fastest way to produce a single check block is uniform encoding, whose performance is far better than encoding with a Vandermonde matrix or a Cauchy matrix.

In practical applications, the probability that only one or two nodes are damaged is high, and recovering the data then often needs only one check block, which reduces the volume of check blocks transferred and speeds up the decoding that recovers the data. When one node of the first group and 2 nodes of the second group are damaged, two check blocks of the third group and one check block of the second group are needed. The probability of this situation, however, is very low.

After encoding is completed, the client sends the data slices to all servers. Each time a server returns a write success, a counter maintained on the client is updated. According to the configuration parameters of the system, once the configured level at which the data can be recovered is reached, server nodes whose writes failed are no longer rewritten. This also improves write performance; the general idea is to keep the work of a write operation to a minimum.
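The counter-based write can be sketched as follows. This is a minimal illustration, assuming a hypothetical `send(server, payload)` callable that returns a truthy value on a successful write, and a hypothetical `required_acks` parameter standing in for the configured recovery level; the names are not taken from the invention.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def write_stripe(slices, send, required_acks):
    """Send every slice once and count the acknowledgements.

    slices maps a server name to the payload destined for it. Returns
    (ok, acked, failed); when ok is True the failed servers are left
    alone instead of being rewritten.
    """
    acked, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        futures = {
            pool.submit(send, server, payload): server
            for server, payload in slices.items()
        }
        for future in as_completed(futures):
            server = futures[future]
            try:
                ok = bool(future.result())
            except Exception:
                # A raised error is counted as a failed write.
                ok = False
            (acked if ok else failed).append(server)
    # The client-side counter is simply the number of acknowledgements.
    return len(acked) >= required_acks, sorted(acked), sorted(failed)
```

In this sketch the caller retries only when `ok` is false; otherwise the list of failed servers can be handed to a background repair step.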

Normal file read flow: as shown in Fig. 3, when the client receives a read request, it sends read instructions to all storage server nodes and waits to receive the data returned by each storage server. After the client has performed a consistency check on the returned data and updated the counter, it caches the check blocks that arrive first; the number of check blocks cached is determined by configuration. If all the normal data blocks are subsequently returned, the data blocks are returned to the calling user and the cached check blocks are discarded.
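The normal read flow can be sketched as follows, assuming the responses arrive as an ordered sequence of hypothetical `(kind, index, payload)` tuples, where `kind` is `"data"` or `"check"`. The consistency check is omitted, and the names `normal_read` and `cache_limit` are illustrative only.

```python
def normal_read(responses, n_data, cache_limit):
    """Consume server responses in arrival order.

    Returns (data, cached_checks). data is the joined data blocks when all
    n_data blocks arrived, otherwise None, in which case cached_checks
    holds the check blocks kept for a later repair read.
    """
    data = {}
    cached_checks = {}
    for kind, index, payload in responses:
        if kind == "data":
            data[index] = payload
        elif len(cached_checks) < cache_limit:
            # Cache only the check blocks that arrive first.
            cached_checks[index] = payload
        if len(data) == n_data:
            # Every data block is present: the cached check blocks are dropped.
            return b"".join(data[i] for i in range(n_data)), {}
    return None, cached_checks
```

When a data block never arrives, the function returns `None` together with the cached check blocks, which is the starting point of the repair read.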

In the normal read flow, read requests are sent preferentially to the nodes holding normal data blocks, which avoids a decoding calculation on the client.

File repair read flow: as shown in Fig. 4, if during a normal read a data block is found to be unreadable or is found from its flag bit to be damaged, the cached check blocks are used to fill the gap, the decoding calculation is performed according to the 2+2 grouping method, and the recovered data is then returned to the upper-layer calling user.
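The simplest case of a repair read, one missing data block in a group, can be sketched as follows. The sketch assumes that the group's check block is the bytewise XOR of the group's data blocks, which is an assumption made for illustration rather than the decoding disclosed here; `repair_read` is a hypothetical name, and a missing block is represented by `None`.

```python
def repair_read(group_blocks, group_check):
    """Fill at most one missing block (None) in a group from its check block."""
    missing = [i for i, block in enumerate(group_blocks) if block is None]
    if not missing:
        return list(group_blocks)
    if len(missing) > 1:
        raise ValueError("one XOR check block can rebuild only one block")
    # Assumption: check = XOR of all data blocks in the group, so the
    # missing block equals the check XORed with every surviving block.
    rebuilt = bytearray(group_check)
    for block in group_blocks:
        if block is not None:
            for i, byte in enumerate(block):
                rebuilt[i] ^= byte
    repaired = list(group_blocks)
    repaired[missing[0]] = bytes(rebuilt)
    return repaired
```

Cases with more than one missing block in a group would need the additional check blocks and a fuller decoding step, which this sketch does not cover.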

File repair flow: as shown in Fig. 5, to guarantee the security of data, a repair process exists in the cluster and checks all data continuously. When it finds data loss or a node failure, it reads the data again, decodes it to recover the lost data, and writes the result back to a storage server. The repair process keeps a detailed record of the repaired files and their layout, and when a new node is put back into the cluster, the repair process transfers the data to it.
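One pass of such a repair process can be sketched as follows. All four callables (`is_healthy`, `rebuild`, `write`) and the layout structure are hypothetical placeholders for cluster services that this description does not specify; the sketch shows only the scan, rebuild, rewrite, and record sequence.

```python
def repair_pass(layout, is_healthy, rebuild, write, log):
    """Scan every block once and repair those that are lost.

    layout maps a file name to a list of (block_id, node) placements.
    Each repair is appended to log so that the data can later be moved
    back when a replacement node joins the cluster.
    """
    for name, placements in layout.items():
        for block_id, node in placements:
            if is_healthy(node, name, block_id):
                continue
            # Read the surviving blocks and decode the lost one.
            payload = rebuild(name, block_id)
            # Write the recovered block to a storage server chosen by write().
            target = write(name, block_id, payload)
            log.append(
                {"file": name, "block": block_id, "from": node, "to": target}
            )
    return log
```

Running `repair_pass` repeatedly, for example from a scheduler, approximates the continuous checking described above.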

The present invention greatly improves space utilization, alleviates the low performance caused by the replica approach, and at the same time reduces the cost of disk-level RAID.

The foregoing describes only embodiments of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A storage method utilizing distributed data encoding, characterized in that its specific storage process is:

2. The storage method utilizing distributed data encoding according to claim 1, characterized in that: in step 1, the client's real-time encoding calculation uses a 2+2 grouped parallel high-speed encoding scheme, which works as follows: every 3 data blocks form a group; each of the two groups is encoded to form its own check block, and at the same time all 6 data blocks are encoded as a whole; this produces 3 check blocks, and the redundancy reaches 3.

3. The storage method utilizing distributed data encoding according to claim 2, characterized in that the detailed process of step 2 is: when the client receives a read request, it sends read instructions to all storage server nodes and waits to receive the data returned by each storage server; after the client has performed a consistency check on the returned data and updated the counter, it caches the check blocks that arrive first, the number of check blocks cached being determined by configuration; if all the normal data blocks are subsequently returned, the data blocks are returned to the calling user and the cached check blocks are discarded.

4. The storage method utilizing distributed data encoding according to claim 3, characterized in that: in the normal read flow, read requests are sent preferentially to the nodes holding normal data blocks.

5. The storage method utilizing distributed data encoding according to any one of claims 1 to 4, characterized in that: a repair process exists in the cluster; this process checks all data continuously; when it finds data loss or a node failure, it reads the data again, decodes it to recover the lost data, and writes the result back to a storage server; the repair process keeps a detailed record of the repaired files and their layout, and when a new node is put back into the cluster, the repair process transfers the data to it.