CN103530387A

CN103530387A - Improved method for handling small files in HDFS

Info

Publication number: CN103530387A
Application number: CN201310494888.4A
Authority: CN
Inventors: 孟祥飞; 邓鹏飞; 吴楠; 宗栋瑞; 邓强
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: Inspur Electronic Information Industry Co Ltd
Priority date: 2013-10-22
Filing date: 2013-10-22
Publication date: 2014-01-22

Abstract

The invention relates to the field of the Hadoop Distributed File System (HDFS) and discloses an improved method for handling small files in HDFS. Part of the Namenode's authority is delegated to the Datanodes: each Datanode caches the metadata of some small files and handles most small-file read and write requests, which reduces the load on the Namenode as far as possible. The method offers a new way to address the low efficiency of HDFS when processing small files. It effectively relieves the excessive load on the single Namenode by distributing the pressure of small files across the Datanodes, so that the big-data processing cluster as a whole handles both large and small files with good efficiency and performance.

Description

An improved method for handling small files in HDFS

Technical field

The present invention relates to the field of the HDFS distributed file system, and in particular to an improved method for handling small files in HDFS.

Technical background

The Hadoop Distributed File System, HDFS for short, is a distributed file system.

With the rapid development of the Internet, data volume is growing exponentially. To cope with this, very large server architectures have emerged in the form of data centers and cloud computing. For big-data processing, Google's GFS provides an effective way to handle large files, and HDFS, the file system of Hadoop, is an open-source implementation of GFS that realizes most of its functions. HDFS is likewise designed around large files and handles them very efficiently, but its efficiency with small files is low: storing small files requires repeatedly requesting storage addresses and allocating storage blocks, so a large number of small files overwhelms an HDFS cluster with a single Namenode (name node) and produces a large amount of metadata that occupies the Namenode's memory. In practice, however, small files are everywhere, from files generated daily by personal applications to the many small files generated by web applications. In recent years in particular, the rise of blogs, wikis and personal spaces has changed the way content is provided on the Internet. The arrival of Web 2.0, characterized by user-created content, shows that the web has become the largest information system, with data that is massive, diverse and constantly changing. Web 2.0 sites generate huge numbers of small files, such as user avatars, album thumbnails, log files and profiles. Processing these massive numbers of small files quickly poses a demanding challenge, and researchers in the industry have turned their attention to it.

Summary of the invention

The technical problem to be solved by the present invention is as follows: the invention provides an improved method by which HDFS handles small files, intended to effectively solve the low efficiency of HDFS when processing small files. Specifically, it mainly addresses the following problems:

1. metadata maintenance by a single Namenode when processing read and write requests;

2. the metadata structure;

3. the metadata caching policy;

4. the metadata update policy.

The technical solution adopted by the present invention is an improved method for handling small files in HDFS. It comprises a cluster containing one Namenode (name node) and a plurality of Datanodes (data nodes), which can be accessed by a plurality of clients, as shown in Fig. 1. Part of the Namenode's authority is delegated to the Datanodes, so that each Datanode caches the metadata of some small files and handles most small-file read and write requests.

In addition to the Namenode, which manages all metadata of the file system, each Datanode also holds part of the metadata, mainly that of small files; the metadata of large files remains on the Namenode. This metadata includes the namespace, access control information, the file-to-block mapping and the current location of each block. All metadata on a Datanode is kept in memory, whereas on the Namenode, because the volume of small-file metadata is huge, the metadata of most small files is kept on disk.

The Namenode also manages system-wide activities, such as block lease management, garbage collection of orphaned blocks, and the movement of blocks between data servers (for example replication and deletion). It communicates periodically with each Datanode through heartbeat messages, issuing instructions to them and collecting their status.

A client no longer sends every request to the Namenode to look up or allocate the corresponding metadata. Instead, part of the small-file metadata operations, such as small-file lookups, are carried out on the Datanodes; only if no result is found there does the client query the Namenode. When writing a file, the client uses its record of past reads and writes to ask a Datanode directly whether it has a block that is not yet full and is not currently being written by another client. If so, the data is written directly into that block and the corresponding metadata is updated. If not, the client sends a write request to the Namenode, which allocates a new block to complete the write; the client's lookup of the data block is completed on the local machine. When reading a file, the client queries the Datanode directly and searches the Namenode only if the lookup fails.
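The Datanode-first lookup order described above can be pictured with a minimal sketch. It is illustrative only: the function name `locate`, the use of plain dictionaries for the two metadata stores, and the `(block_id, offset, length)` tuple layout are assumptions made for this example and are not specified by the patent.

```python
def locate(filename, datanode_meta, namenode_meta):
    """Look up a small file, trying the Datanode's cached metadata first.

    Both stores are hypothetical dicts mapping a full path to a
    (block_id, offset, length) tuple. Returns (location, source).
    """
    location = datanode_meta.get(filename)
    if location is not None:
        return location, "datanode"      # served without touching the Namenode
    location = namenode_meta.get(filename)
    if location is not None:
        return location, "namenode"      # fallback path
    raise FileNotFoundError(filename)


# Hypothetical metadata: the Datanode caches only part of what the Namenode knows.
dn = {"/img/a.png": (7, 0, 2048)}
nn = {"/img/a.png": (7, 0, 2048), "/img/b.png": (9, 4096, 1024)}
```

In this sketch a request for `/img/a.png` is answered from the Datanode cache, while `/img/b.png` falls through to the Namenode.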

Files are divided into fixed-size blocks, still using the original HDFS default of 64 MB. When a block is created, the Namenode assigns it an immutable, globally unique 64-bit block handle to identify it, the first bit of which indicates whether it is a large-file or small-file block. A Datanode stores each block on its local disk as a Linux file and reads or writes block data according to the specified block handle and byte range. To ensure reliability, each block is replicated on several block servers; by default the system still keeps three replicas.
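One possible encoding of such a handle is sketched below. The patent states only that the handle has 64 bits and that its first bit distinguishes large-file from small-file blocks; treating the "first bit" as the most significant bit, using the remaining 63 bits as a sequence number, and all function names are assumptions of this sketch.

```python
SMALL_FILE_FLAG = 1 << 63   # assumption: "first bit" = most significant bit


def make_handle(sequence: int, small_file_block: bool) -> int:
    """Pack a 63-bit sequence number and the small/large flag into 64 bits."""
    if not 0 <= sequence < (1 << 63):
        raise ValueError("sequence must fit in 63 bits")
    return (SMALL_FILE_FLAG if small_file_block else 0) | sequence


def is_small_file_block(handle: int) -> bool:
    """True if the flag bit marks this block as holding small files."""
    return bool(handle & SMALL_FILE_FLAG)


def sequence_of(handle: int) -> int:
    """Recover the sequence number by masking off the flag bit."""
    return handle & (SMALL_FILE_FLAG - 1)
```

With this layout a Datanode could tell from the handle alone whether a block may hold several small files, without an extra metadata lookup.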

Block size is one of the key design parameters of HDFS. Because lazy space allocation avoids the waste caused by internal fragmentation, and in order to keep block division uniform for large and small files across HDFS, the default block size remains 64 MB and can be changed through the configuration file. Here, any file smaller than 1 MB is defined as a small file.

Using large blocks to process small files in the new HDFS of this patent has several major benefits. The first is a lighter load on the Namenode: the Namenode is still responsible for creating and dividing blocks, but once a block has been created, most subsequent storage of files into it is managed by the Datanode.

Consider an idealized simulation of how the system handles small-file storage requests. If every managed small file is 1 MB, a 64 MB block holds 64 files. Ideally, only one of those 64 writes is controlled by the Namenode, so the ratio of files controlled by the Datanode to files controlled by the Namenode is 63:1, which effectively reduces the load on the Namenode. Second, large blocks allow the cache to be used effectively when processing small files, reducing the number of disk accesses and improving storage speed. Finally, dividing storage into large blocks reduces the space taken by the small-file metadata that the Namenode and Datanodes manage.
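The 63:1 figure follows directly from the block and file sizes. The short calculation below reproduces it; the function name `write_split` is hypothetical, and the assumption that exactly one write per block (the one that creates it) involves the Namenode is the idealized case described in the text.

```python
MB = 1024 * 1024


def write_split(block_size: int, file_size: int):
    """Return (writes handled by the Datanode, writes handled by the Namenode)
    for one block filled with equally sized files, in the ideal case."""
    files_per_block = block_size // file_size
    namenode_writes = 1                  # only the write that creates the block
    return files_per_block - namenode_writes, namenode_writes


print(write_split(64 * MB, 1 * MB))      # (63, 1)
```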

In this design the Namenode is still responsible for creating all blocks, but when creating a block for small-file writes, the policy for choosing where to place it is more complex than before.

When creating a small-file block, the Namenode consults its list of the non-full blocks each Datanode has created, selects Datanodes for which that number is less than 3, and chooses the corresponding machines. The choice must also satisfy the two rules HDFS uses for block placement (disk utilization and topological distance to the client). The three replicas are distributed as in the original HDFS: two in the same rack and one in another rack.

When a client asks the Namenode to create a block but no Datanode has fewer than 3 non-full blocks, the Namenode directly selects a Datanode according to the network topology to handle the request. That Datanode queues the client's request and writes the data directly into one of its own non-full blocks. If client write requests remain after all blocks on the Datanode have been filled, the client at the head of the queue asks the Namenode to create a new block. Because the blocks have only just been filled, there can be no Datanode at that moment that fails the condition.
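The selection rule for small-file blocks can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the `DatanodeInfo` fields, the function name `choose_datanode`, and in particular the tie-breaking order (nearest first, then least-used disk) are choices made for the example; the patent names the two criteria but does not say how they are weighed.

```python
from dataclasses import dataclass

MAX_OPEN_BLOCKS = 3   # threshold taken from the text: fewer than 3 non-full blocks


@dataclass
class DatanodeInfo:
    name: str
    open_blocks: int    # non-full small-file blocks already created on this node
    disk_usage: float   # fraction of disk in use, 0.0 to 1.0
    distance: int       # topological distance to the requesting client


def choose_datanode(nodes):
    """Return (node, create_new_block) for a small-file write request."""
    if not nodes:
        raise ValueError("no Datanodes available")
    candidates = [n for n in nodes if n.open_blocks < MAX_OPEN_BLOCKS]
    if candidates:
        # Assumed ordering: prefer the closest node, then the emptiest disk.
        best = min(candidates, key=lambda n: (n.distance, n.disk_usage))
        return best, True
    # Every node already has 3 non-full blocks: route the request to the
    # nearest node, which queues it against one of its existing blocks.
    return min(nodes, key=lambda n: n.distance), False
```

The second return value distinguishes the two cases in the text: `True` when a new block is created, `False` when the request is queued on an existing non-full block.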

Metadata is the control information of the whole system and plays a critical role. To lighten the load on the Namenode and improve the read and write speed of HDFS, once the Namenode has created a block, control over operations on that block is handed to the Datanode, and multiple small files are stored in one block.

Given these requirements, the metadata structure the system uses for small files has to be modified. Because one block must hold multiple files, the mapping between small files and blocks has to change.

First, the file-to-block mapping extends the conventional mapping with the file's start offset and length within the block, as shown in Fig. 2; filename is the file name string including the full path. Second, the block-to-file mapping, previously a single mapping, becomes a mapping from a block to an array of files, each carrying a validity flag, as shown in Fig. 3: 1 marks the file as valid and 0 as invalid. Size is the total size of the files contained in the block. This lets a client insert as many files as possible each time it finds a block, reducing the number of repeated block lookups. When searching for a block, this patent treats a block as full once Size exceeds the configured block size minus 1 MB.
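The two mappings can be written down as simple record types. The sketch below is a reading of the description of Fig. 2 and Fig. 3, not a reproduction of the figures: the class names, the field names other than filename and Size, and the choice to place each new file at the current end of the block are assumptions.

```python
from dataclasses import dataclass, field

MB = 1024 * 1024
BLOCK_SIZE = 64 * MB
FULL_MARGIN = 1 * MB    # a block counts as full once Size > BLOCK_SIZE - 1 MB


@dataclass
class FileEntry:
    """File-to-block mapping, extended with offset and length."""
    filename: str       # full path
    block_id: int
    offset: int         # start of the file inside the block
    length: int


@dataclass
class BlockFileSlot:
    """One element of the block-to-files array."""
    filename: str
    valid: int = 1      # 1 = valid, 0 = invalid


@dataclass
class BlockEntry:
    """Block-to-files mapping with the running Size of the contained files."""
    block_id: int
    size: int = 0
    files: list = field(default_factory=list)

    def is_full(self) -> bool:
        return self.size > BLOCK_SIZE - FULL_MARGIN

    def append(self, filename: str, length: int) -> FileEntry:
        """Place a file at the current end of the block (assumed layout)."""
        if self.is_full() or self.size + length > BLOCK_SIZE:
            raise ValueError("block cannot accept this file")
        entry = FileEntry(filename, self.block_id, self.size, length)
        self.files.append(BlockFileSlot(filename))
        self.size += length
        return entry
```

Appending two files to an empty `BlockEntry` yields offsets 0 and the length of the first file, which is the information a reader later needs to extract one file from a shared block.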

Data structures such as the file and block namespace and the locations of the Datanodes holding each block are unchanged and continue to use the original structure.

The Namenode caches the namespace of all files and blocks, the file-to-block mapping, and the location of every block replica. For large files this metadata is kept in memory; for small files only the hottest portion, together with the metadata of non-full blocks, is kept in memory. The Namenode therefore also maintains a list of small files that determines which small files' metadata is cached in memory. In addition, the Namenode maintains the list mentioned above of the number of newly created non-full blocks on each Datanode, which is used when creating later blocks.

Next is the metadata cached by the Datanodes, the most important part of this design, which is what effectively lightens the load on the Namenode. The metadata cached by a Datanode concerns small files only; all metadata of large files is still cached only by the Namenode. Each Datanode caches metadata only for blocks that have two replicas in its own rack.

The client also needs to cache some file information. Since the client must first decide which Datanode to access, it caches Datanode information: each client maintains a fixed-size list of the Datanodes it has accessed and, each time, selects the most frequently accessed Datanode from the list. Entries in the list are replaced using the LRU algorithm.
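A client-side list with these two properties (choose the most frequently accessed entry, evict the least recently used one) might look like the sketch below. The class name `DatanodeCache`, the default capacity of 4 and the use of `collections.OrderedDict` are assumptions for illustration; the patent fixes neither the list size nor the data structure.

```python
from collections import OrderedDict


class DatanodeCache:
    """Fixed-size list of accessed Datanodes with LRU replacement."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self._entries = OrderedDict()   # name -> access count, least recent first

    def record_access(self, name: str) -> None:
        count = self._entries.pop(name, 0) + 1
        self._entries[name] = count             # re-insert at the most-recent end
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the least recently used

    def most_accessed(self):
        """Return the Datanode with the highest access count, or None if empty."""
        if not self._entries:
            return None
        return max(self._entries, key=self._entries.get)

    def names(self):
        return list(self._entries)
```

Note that the two policies are independent: the access count decides which Datanode to contact, while recency alone decides which entry is dropped when the list is full.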

The metadata (MetaData) update policy is as follows:

First, updating metadata on the Namenode. In the original HDFS, when writing data the client first asks the Namenode to allocate a block. After setting up the namespace of the file and block, the Namenode sends the block name and the locations of the corresponding Datanodes to the client. The client then connects directly to the Datanodes to write the file and, once the write is complete, notifies the Namenode to update.

In the present invention, this policy is not used to update metadata when the system writes small files. After each small-file write completes, the Namenode is not notified immediately. Instead, the Datanode notifies the Namenode to update the metadata for a block, including the namespace of newly created files and the files contained in the block, only when the block is full, when a read request arrives for the block while no other client is writing it, or when the block has received no read or write request for more than 1 minute. At the same time the corresponding information is added to the Datanode's list of non-full blocks. This policy significantly speeds up file writes across the whole system.
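The three trigger conditions for the deferred update can be expressed as a single predicate. The sketch below is illustrative; the function and parameter names are hypothetical, and only the conditions themselves (block full, read with no active writer, idle for more than one minute) come from the text.

```python
IDLE_LIMIT_SECONDS = 60.0   # "no read or write request for more than 1 minute"


def should_notify_namenode(block_full: bool,
                           read_requested: bool,
                           active_writers: int,
                           seconds_idle: float) -> bool:
    """Decide whether the Datanode should push this block's metadata now."""
    if block_full:
        return True
    if read_requested and active_writers == 0:
        return True
    return seconds_idle > IDLE_LIMIT_SECONDS
```

A read that arrives while another client is still writing does not trigger the update on its own, which matches the condition that no other client is writing the block at that time.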

Second, updating metadata on the Datanodes. When the Namenode itself starts, and when a Datanode joins the cluster, the Namenode queries the Datanode for the blocks it holds, identifies the small-file blocks among them, determines each rack that contains two replicas of such a block, and sends the block-to-file and file-to-block metadata, together with the Datanodes holding the block, to every Datanode in that rack. Once a Datanode is running normally, metadata is updated whenever a block is modified. Every modification of a block changes the block's identifier consistently, so when a client accesses a Datanode directly it determines whether the block is valid by comparing the identifiers of all replicas of that block. If an inconsistency appears, each Datanode uses its regular heartbeat messages with the Namenode to request the latest metadata for the blocks it caches. After a Datanode notifies the Namenode to update the information of a small-file block, and after the Namenode creates a new small-file block, a Datanode fails, or a block is copied, the Namenode also notifies the corresponding rack to update its metadata.

Small-file I/O: when a client needs to write a file, it first selects the most frequently accessed Datanode from its own cached Datanode list and accesses it directly. If there is none, it asks the Namenode to create a new block; if a block cannot be created at that point, the Namenode assigns it a Datanode according to the topological path. The client connects to the Datanode and sends a write request. The Datanode allocates a writable block for it and then notifies the client of the address of the primary block that holds the lease on that block. The client records each Datanode access in its cache.
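On the Datanode side, allocating a writable block amounts to finding a block that has room and is not being written by another client. The following sketch shows one way to do that; the `OpenBlock` record, the `writer` field standing in for the lease holder, and the function name are assumptions of this example rather than details given in the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenBlock:
    block_id: int
    used: int                       # bytes already written
    capacity: int                   # block size in bytes
    writer: Optional[str] = None    # client currently holding the lease, if any


def acquire_writable_block(blocks, client: str, length: int):
    """Return a block with room for `length` bytes that nobody else is writing,
    marking `client` as its writer, or None if no such block exists."""
    for block in blocks:
        if block.writer is None and block.used + length <= block.capacity:
            block.writer = client
            return block
    return None   # caller falls back to asking the Namenode for a new block
```

A `None` result corresponds to the fallback in the text, where the request goes to the Namenode to create a new block.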

When a client reads a file, it likewise first selects the Datanode with the most reads and writes from its own cached Datanode list and connects to it directly; if there is none, it searches on the Namenode. If there is one, the file is looked up in the metadata cached by that Datanode. If the corresponding block is found, the Datanode directly returns to the client the block holding the file, the file's offset and length within the block, and the Datanodes that store the block. If it is not found, the client queries the Namenode, which returns the Datanodes storing the file, the block ID, and the file's offset and length within the block. Once it has the block information, the client connects to the corresponding Datanode and reads from the block directly. When a client connects directly to a Datanode, finds the block and reads the file, it must verify that the data read is valid, using a two-step verification policy. First, it checks whether the block identifiers of all three replicas of the block are identical; if they are, it reads directly, and if they differ, it asks the Namenode to update the metadata and then looks up and reads again. Second, while reading the data, it uses a checksum to verify that the content read is valid.
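The two-step verification can be sketched as a single read routine. This is an illustration under assumptions: the patent does not name a checksum algorithm, so CRC-32 from Python's `zlib` module is used here as a stand-in, and the function names, exception types and in-memory `bytes` block are choices made for the example.

```python
import zlib


def replicas_agree(identifiers) -> bool:
    """Step 1: all replicas must report the same block identifier."""
    return len(set(identifiers)) == 1


def read_small_file(block_bytes: bytes, offset: int, length: int,
                    expected_crc: int, replica_ids) -> bytes:
    """Read one small file out of a shared block with both checks applied."""
    if not replicas_agree(replica_ids):
        # In the described scheme the client would now ask the Namenode
        # for fresh metadata and retry the lookup.
        raise LookupError("replica identifiers differ; metadata is stale")
    data = block_bytes[offset:offset + length]
    # Step 2: verify the content (CRC-32 is an assumed checksum choice).
    if zlib.crc32(data) != expected_crc:
        raise ValueError("checksum mismatch")
    return data


block = b"headerPAYLOADtrailer"
crc = zlib.crc32(b"PAYLOAD")
print(read_small_file(block, 6, 7, crc, [5, 5, 5]))   # b'PAYLOAD'
```

The offset and length arguments are the values returned by the metadata lookup, which is how a single file is extracted from a block that holds many.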

The beneficial effects of the present invention are as follows:

To address the low efficiency of HDFS in processing small files, the present invention proposes a new processing method. The method effectively solves the problem of excessive load on the single Namenode by distributing the pressure of small files onto the Datanodes, so that the big-data processing cluster as a whole achieves comparable, satisfactory efficiency and performance for large and small files.

Brief description of the drawings

Fig. 1 is an architecture diagram of the new HDFS;

Fig. 2 is a schematic diagram of the file-to-Block mapping;

Fig. 3 is a schematic diagram of the mapping from a Block to a group of small files;

Fig. 4 is a flowchart of writing a file;

Fig. 5 is a flowchart of reading a file.

Detailed description of the embodiments

The invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Embodiment 1:

As shown in Fig. 1, an improved method for handling small files in HDFS comprises a cluster containing one Namenode and a plurality of Datanodes, which can be accessed by a plurality of clients. Part of the Namenode's authority is delegated to the Datanodes, so that each Datanode caches the metadata of some small files and handles most small-file read and write requests.

Embodiment 2:

On the basis of embodiment 1, in this embodiment, in addition to the Namenode managing all metadata of the file system, each Datanode also holds part of the metadata, mainly that of small files, while the metadata of large files remains on the Namenode. The Namenode is responsible for system-wide activities and communicates periodically with each Datanode through heartbeat messages, issuing instructions to them and collecting their status.

Embodiment 3:

On the basis of embodiment 1, in this embodiment part of the small-file metadata operations requested by the client are carried out on the Datanodes; only if no result is found there does the client query the Namenode. When writing a file, the client uses its record of past reads and writes to ask a Datanode directly whether it has a block that is not yet full and is not currently being written by another client. If so, the data is written directly into that block and the corresponding metadata is updated. If not, the client sends a write request to the Namenode, which allocates a new block to complete the write; the client's lookup of the data block is completed on the local machine. When reading a file, the client queries the Datanode directly and searches the Namenode only if the lookup fails.

Embodiment 4:

On the basis of embodiment 1, files are divided into fixed-size blocks, using the original HDFS default of 64 MB. When a block is created, the Namenode assigns it an immutable, globally unique 64-bit block handle to identify it, the first bit of which indicates whether it is a large-file or small-file block. A Datanode stores each block on its local disk as a Linux file and reads or writes block data according to the specified block handle and byte range.

Embodiment 5:

On the basis of embodiment 4, in this embodiment the Namenode is responsible for creating and dividing blocks, and once a block has been created, most subsequent storage of files into it is managed by the Datanode. When creating a small-file block, the Namenode consults its list of the non-full blocks each Datanode has created, selects Datanodes for which that number is less than 3, and chooses the corresponding machines, while satisfying the two rules HDFS uses for block placement: disk utilization and topological distance to the client. When a client asks the Namenode to create a block but no Datanode has fewer than 3 non-full blocks, the Namenode directly selects a Datanode according to the network topology to handle the request, and that Datanode queues the client's request and writes the data directly into one of its own non-full blocks. If client write requests remain after all blocks on the Datanode have been filled, the client at the head of the queue asks the Namenode to create a new block.

Embodiment 6:

On the basis of embodiment 4 or 5, in this embodiment, once the Namenode has created a block, control over operations on that block is handed to the Datanode, and multiple small files are stored in one block.

Embodiment 7:

On the basis of embodiment 6, in this embodiment the file-to-block mapping extends the conventional mapping with the file's start offset and length within the block, where filename is the file name string including the full path. The block-to-file mapping becomes a mapping from a block to an array of files, each carrying a flag that indicates whether the file is valid.

Embodiment 8:

On the basis of embodiment 1, in this embodiment the Namenode caches the namespace of all files and blocks, the file-to-block mapping, and the location of every block replica. For large files this metadata is kept in memory; for small files only the hottest portion, together with the metadata of non-full blocks, is kept in memory. The Namenode also maintains a list of small files that determines which small files' metadata is cached in memory, and it maintains the list mentioned above of the number of newly created non-full blocks on each Datanode, which is used when creating later blocks.

The metadata cached by a Datanode concerns small files only; all metadata of large files is still cached only by the Namenode. Each Datanode caches metadata only for blocks that have two replicas in its own rack.

The client also needs to cache some file information, namely Datanode information: each client maintains a fixed-size list of the Datanodes it has accessed and, each time, selects the most frequently accessed Datanode from the list. Entries in the list are replaced using the LRU algorithm.

Embodiment 9:

On the basis of embodiment 1, the metadata update policy in this embodiment is as follows:


(1) Updating metadata on the Namenode. When the system writes small files, then after each small-file write completes, the Datanode notifies the Namenode to update the metadata for a block, including the namespace of newly created files and the files contained in the block, when the block is full, when a read request arrives for the block while no other client is writing it, or when the block has received no read or write request for more than 1 minute. At the same time the corresponding information is added to the Datanode's list of non-full blocks.

(2) Updating metadata on the Datanodes. When the Namenode itself starts, and when a Datanode joins the cluster, the Namenode queries the Datanode for the blocks it holds, identifies the small-file blocks among them, determines each rack that contains two replicas of such a block, and sends the block-to-file and file-to-block metadata, together with the Datanodes holding the block, to every Datanode in that rack. Once a Datanode is running normally, metadata is updated whenever a block is modified. Every modification of a block changes the block's identifier consistently, so when a client accesses a Datanode directly it determines whether the block is valid by comparing the identifiers of all replicas of that block. If an inconsistency appears, each Datanode uses its regular heartbeat messages with the Namenode to request the latest metadata for the blocks it caches. After a Datanode notifies the Namenode to update the information of a small-file block, and after the Namenode creates a new small-file block, a Datanode fails, or a block is copied, the Namenode also notifies the corresponding rack to update its metadata.

Embodiment 10:

On the basis of embodiment 1, in this embodiment, when a client needs to write a file, it first selects the most frequently accessed Datanode from its own cached Datanode list and accesses it directly. If there is none, it asks the Namenode to create a new block; if a block cannot be created at that point, the Namenode assigns it a Datanode according to the topological path. The client connects to the Datanode and sends a write request. The Datanode allocates a writable block for it and then notifies the client of the address of the primary block that holds the lease on that block. The client records each Datanode access in its cache.

Claims

1. An improved method for handling small files in HDFS, comprising a cluster containing one Namenode and a plurality of Datanodes, which can be accessed by a plurality of clients, characterized in that: part of the Namenode's authority is delegated to the Datanodes, so that each Datanode caches the metadata of some small files and handles most small-file read and write requests.

2. The improved method for handling small files in HDFS according to claim 1, characterized in that: in addition to the Namenode managing all metadata of the file system, each Datanode also holds part of the metadata, mainly that of small files, while the metadata of large files remains on the Namenode; the Namenode is responsible for system-wide activities and communicates periodically with each Datanode through heartbeat messages, issuing instructions to them and collecting their status.

3. The improved method for handling small files in HDFS according to claim 1, characterized in that: part of the small-file metadata operations requested by the client are carried out on the Datanodes, and only if no result is found there does the client query the Namenode; when writing a file, the client uses its record of past reads and writes to ask a Datanode directly whether it has a block that is not yet full and is not currently being written by another client; if so, the data is written directly into that block and the corresponding metadata is updated; if not, the client sends a write request to the Namenode, which allocates a new block to complete the write, and the client's lookup of the data block is completed on the local machine; when reading a file, the client queries the Datanode directly and searches the Namenode only if the lookup fails.

4. The improved method for handling small files in HDFS according to claim 1, characterized in that: files are divided into fixed-size blocks, using the original HDFS default of 64 MB; when a block is created, the Namenode assigns it an immutable, globally unique 64-bit block handle to identify it, the first bit of which indicates whether it is a large-file or small-file block; a Datanode stores each block on its local disk as a Linux file and reads or writes block data according to the specified block handle and byte range.

5. The improved method for handling small files in HDFS according to claim 4, characterized in that: the Namenode is responsible for creating and dividing blocks, and once a block has been created, most subsequent storage of files into it is managed by the Datanode; when creating a small-file block, the Namenode consults its list of the non-full blocks each Datanode has created, selects Datanodes for which that number is less than 3, and chooses the corresponding machines, while satisfying the two rules HDFS uses for block placement: disk utilization and topological distance to the client; when a client asks the Namenode to create a block but no Datanode has fewer than 3 non-full blocks, the Namenode directly selects a Datanode according to the network topology to handle the request, and that Datanode queues the client's request and writes the data directly into one of its own non-full blocks;

if client write requests remain after all blocks on the Datanode have been filled, the client at the head of the queue asks the Namenode to create a new block.

6. The improved method for handling small files in HDFS according to claim 4 or 5, characterized in that: once the Namenode has created a block, control over operations on that block is handed to the Datanode, and multiple small files are stored in one block.

7. The improved method for handling small files in HDFS according to claim 6, characterized in that: the file-to-block mapping extends the conventional mapping with the file's start offset and length within the block, where filename is the file name string including the full path; the block-to-file mapping becomes a mapping from a block to an array of files, each carrying a flag that indicates whether the file is valid.

8. The improved method for handling small files in HDFS according to claim 1, characterized in that: the Namenode caches the namespace of all files and blocks, the file-to-block mapping, and the location of every block replica; for large files this metadata is kept in memory, and for small files only the hottest portion, together with the metadata of non-full blocks, is kept in memory; the Namenode also maintains a list of small files that determines which small files' metadata is cached in memory, and it maintains the list mentioned above of the number of newly created non-full blocks on each Datanode, which is used when creating later blocks.

9. The improved method for handling small files in HDFS according to claim 1, characterized in that the metadata update policy is as follows:

[Figure 2013104948884100001DEST_PATH_IMAGE001: the text of this claim was published as an image and is not reproduced here.]

10. The improved method for handling small files in HDFS according to claim 1, characterized in that:

when a client needs to write a file, it first selects the most frequently accessed Datanode from its own cached Datanode list and accesses it directly; if there is none, it asks the Namenode to create a new block, and if a block cannot be created at that point, the Namenode assigns it a Datanode according to the topological path; the client connects to the Datanode and sends a write request; the Datanode allocates a writable block for it and then notifies the client of the address of the primary block that holds the lease on that block; and the client records each Datanode access in its cache.