CN103530387A - Improved method aimed at small files of HDFS - Google Patents


Info

Publication number: CN103530387A
Authority: CN (China)
Prior art keywords: datanode, piece, namenode, file, client
Legal status: Pending (as listed by Google Patents; an assumption, not a legal conclusion)
Application number: CN201310494888.4A
Other languages: Chinese (zh)
Inventors: 孟祥飞, 邓鹏飞, 吴楠, 宗栋瑞, 邓强
Current assignee: Inspur Electronic Information Industry Co Ltd
Original assignee: Inspur Electronic Information Industry Co Ltd
Application CN201310494888.4A filed by Inspur Electronic Information Industry Co Ltd, claiming priority to CN201310494888.4A; published as CN103530387A.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/1827 Management specifically adapted to NAS
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/503 Resource availability

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of HDFS and discloses an improved method for handling small files in HDFS. Part of the Namenode's authority is delegated to the Datanodes: each Datanode caches the metadata of some of the small files and handles most small-file read and write requests, which reduces the load on the Namenode as much as possible. This provides a new way to address the low efficiency of HDFS when processing small files. The method relieves the excessive load on the single Namenode by distributing the small-file workload across the Datanodes, so that the big-data processing cluster as a whole handles small files with efficiency and performance comparable to large files.

Description

An improved HDFS method for small files
Technical field
The present invention relates to the field of the HDFS distributed file system, and specifically to an improved HDFS method for small files.
Technical background
The Hadoop Distributed File System (HDFS) is a distributed file system.
With the rapid development of the Internet, data volumes are growing exponentially, and very large server architectures such as data centers and cloud computing have emerged to cope with this. For big-data processing, Google's GFS provides an effective way to handle large files. HDFS, the file system of Hadoop, is an open-source implementation of GFS that provides most of its functions. It is likewise designed around large files and handles them very efficiently, but it is very inefficient with small files: storing small files requires repeatedly requesting addresses and allocating blocks, so a large number of small files overwhelms an HDFS with a single Namenode and produces a large amount of metadata that occupies the Namenode's memory. In practice, however, small files are everywhere. Everyday applications produce many small files, especially web applications. In recent years the rise of blogs, wikis and personal spaces has changed how the Internet provides content; the arrival of Web 2.0, marked by user-generated content, shows that the Web has become the largest information system, with data that is massive, diverse and dynamically changing. Web 2.0 sites produce enormous numbers of small files, such as user avatars, album thumbnails, log files and profiles. Processing these massive numbers of small files quickly has become a demanding challenge, and researchers in the industry have increasingly turned their attention to it.
Summary of the invention
The technical problem solved by the present invention: it provides an improved method for HDFS small-file processing that effectively resolves the inefficiency of HDFS when handling small files. Specifically, it addresses the following problems:
1. Metadata maintenance by the single Namenode when handling read and write requests;
2. The metadata structure;
3. The metadata caching policy;
4. The metadata update policy.
The technical solution adopted by the present invention is an improved HDFS method for small files, comprising a cluster that contains one Namenode and multiple Datanodes and is accessible by multiple clients, as shown in Figure 1. Part of the Namenode's authority is transferred to the Datanodes, which cache the metadata of some small files and handle most small-file read and write requests.
In addition to the Namenode managing all of the file system's metadata, each Datanode also stores part of the metadata, mainly that of small files; large-file metadata remains on the Namenode only. This metadata includes the namespace, access control information, the file-to-block mapping, and the current locations of blocks. Datanodes keep all of their metadata in memory, whereas the Namenode, because the volume of small-file metadata is huge, keeps most small-file metadata on disk.
The Namenode also manages system-wide activities, such as block lease management, garbage collection of orphaned blocks, and moving blocks between data servers (for example, replication and deletion). It communicates periodically with each Datanode through heartbeat messages, issuing commands to them and collecting their status.
Client operations no longer go solely to the Namenode to look up or allocate metadata; instead, metadata operations for some small files are delegated to the Datanodes. For example, a small-file lookup is sent to a Datanode first, and the Namenode is queried only if no result is found. When writing a file, the client uses its past read/write records to ask a Datanode directly whether it has a block that is not yet full and is not currently being written by another client. If so, the data are written directly into that block and the corresponding metadata is updated; if not, the client sends a write request to the Namenode, which allocates a new data block for the write. The client performs the data-block lookup locally. When reading a file, the client queries the Datanode directly and falls back to the Namenode if the file is not found.
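The lookup order described above (Datanode-cached metadata first, Namenode as fallback) can be sketched as follows. This is a minimal illustration, assuming plain dictionaries stand in for each node's metadata; the names `Location`, `locate_small_file`, `datanode_caches` and `namenode_meta` are hypothetical and are not part of the patent or of the HDFS API.

```python
from typing import Dict, List, NamedTuple, Tuple


class Location(NamedTuple):
    """Where a small file lives: block, offset and length inside the block."""
    block_id: int
    offset: int
    length: int


def locate_small_file(
    path: str,
    datanode_caches: List[Tuple[str, Dict[str, Location]]],
    namenode_meta: Dict[str, Location],
) -> Tuple[Location, str]:
    """Return (location, source), trying Datanode caches before the Namenode.

    datanode_caches is ordered by the client's preference (hypothetical);
    each entry is (datanode_name, cached small-file metadata).
    """
    for dn_name, cache in datanode_caches:
        if path in cache:
            return cache[path], dn_name          # served without the Namenode
    if path in namenode_meta:
        return namenode_meta[path], "namenode"   # fallback lookup
    raise FileNotFoundError(path)
```

Only misses in the Datanode caches reach the Namenode, which is the load reduction the method relies on.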
Files are divided into fixed-size blocks, still using the original HDFS default of 64 MB. When each block is created, the Namenode assigns it a globally unique, immutable 64-bit block handle, in which the first bit identifies whether the block holds large-file or small-file data. Datanodes store blocks on local disks as Linux files and read or write block data according to the specified block handle and byte range. For reliability, each block is replicated on multiple block servers; by default the system still keeps three replicas.
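The patent says only that the first bit of the 64-bit handle distinguishes large-file from small-file blocks. The sketch below assumes that "first bit" means the most significant bit and that 1 marks a small-file block; the bit assignment and the function names are assumptions for illustration.

```python
# Hypothetical layout: most significant bit = small-file flag, low 63 bits = serial.
SMALL_FILE_FLAG = 1 << 63
SERIAL_MASK = SMALL_FILE_FLAG - 1


def make_block_handle(serial: int, small_file: bool) -> int:
    """Build a 64-bit block handle whose top bit marks small-file blocks."""
    if not 0 <= serial <= SERIAL_MASK:
        raise ValueError("serial must fit in 63 bits")
    return serial | SMALL_FILE_FLAG if small_file else serial


def is_small_file_block(handle: int) -> bool:
    """True if the handle's top bit is set."""
    return bool(handle & SMALL_FILE_FLAG)


def handle_serial(handle: int) -> int:
    """The unique serial part of the handle, without the flag bit."""
    return handle & SERIAL_MASK
```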
Block size is one of the key parameters of the whole HDFS design. Lazy space allocation avoids the space wasted by internal fragmentation, and to keep block partitioning uniform across HDFS, the default block size remains 64 MB, which can be changed in the configuration file. Here, any file smaller than 1 MB is defined as a small file.
Using large data blocks to handle small files in the new HDFS has several major benefits. The first is reducing the Namenode's burden: the creation and division of whole blocks is still the Namenode's responsibility, but once a new block has been created, most subsequent storage of files into that block is managed by the Datanode.
Consider an idealized scenario that models how the system handles small-file storage requests. If every managed small file is 1 MB, a 64 MB block can hold 64 files. Ideally, only one of the 64 writes is controlled by the Namenode, so the ratio of files controlled by the Datanode to those controlled by the Namenode is 63:1, which substantially relieves the Namenode. Second, large blocks let the cache be used effectively when processing small files, reducing the number of disk accesses and improving storage speed. Finally, dividing storage into large blocks reduces the space needed for the small-file metadata managed by the Namenode and the Datanodes.
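The 63:1 figure follows directly from the idealized assumptions (64 MB blocks, 1 MB files, one Namenode-controlled write per block). A small helper, hypothetical and for illustration only, makes the arithmetic explicit:

```python
def write_split(block_size_mb: int, file_size_mb: int) -> tuple:
    """Idealized split of writes for one block filled with equal-size small files.

    The first write needs the Namenode to create the block; every later write
    into the same block is handled by the Datanode.
    Returns (datanode_writes, namenode_writes).
    """
    files_per_block = block_size_mb // file_size_mb
    namenode_writes = 1
    return files_per_block - namenode_writes, namenode_writes


print(write_split(64, 1))  # (63, 1): the 63:1 ratio from the text
```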
In this design all blocks are still created by the Namenode, but the policy for choosing where to place a newly created small-file block is more complex than before:
When a small-file block is written, the Namenode selects machines from its list of Datanodes that hold created-but-not-full blocks, choosing those with fewer than three such blocks, and the choice must also satisfy the two HDFS block-placement rules (disk utilization and topological distance to the client). The three replicas are still distributed as in the original HDFS: two in the same rack and one in another rack.
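The selection rule above can be sketched as a filter followed by rack-aware placement. The patent names disk utilization and topology distance but not how they are combined, so this sketch assumes a hypothetical utilization cut-off and sorts by distance, then utilization; `DatanodeInfo`, its fields and `choose_replica_targets` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List

MAX_NON_FULL_BLOCKS = 3      # "fewer than 3" from the text
MAX_DISK_UTILIZATION = 0.9   # hypothetical cut-off, not from the patent


@dataclass
class DatanodeInfo:
    name: str
    rack: str
    non_full_blocks: int      # created-but-not-full small-file blocks
    disk_utilization: float   # fraction of disk in use, 0.0 to 1.0
    distance: int             # hypothetical topology distance to the client


def choose_replica_targets(candidates: List[DatanodeInfo]) -> List[DatanodeInfo]:
    """Pick 3 targets: two in one rack, one in another. Empty list = no fit."""
    eligible = [
        d for d in candidates
        if d.non_full_blocks < MAX_NON_FULL_BLOCKS
        and d.disk_utilization < MAX_DISK_UTILIZATION
    ]
    # Closest to the client first; break ties by lower disk utilization.
    eligible.sort(key=lambda d: (d.distance, d.disk_utilization))
    if not eligible:
        return []
    first = eligible[0]
    same_rack = [d for d in eligible[1:] if d.rack == first.rack]
    other_rack = [d for d in eligible[1:] if d.rack != first.rack]
    if not same_rack or not other_rack:
        return []
    return [first, same_rack[0], other_rack[0]]
```

An empty result corresponds to the case where no Datanode meets the condition, which the patent handles separately.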
When a client asks the Namenode to create a data block but no Datanode has fewer than three non-full blocks, the Namenode selects a Datanode according to the network topology between them to handle the client's request, and that Datanode queues the client requests and writes their data directly into its own not-yet-full blocks. If client write requests remain after all of that Datanode's blocks are full, the client at the head of the queue asks the Namenode to create a new block. This request can now be satisfied, because the Datanode whose blocks have just been filled necessarily meets the condition.
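The queueing behaviour described above can be sketched as a FIFO drained into the Datanode's non-full blocks. This is a simplified model, assuming block capacities in abstract units and treating "all blocks full" as "no non-full block has room for the next request"; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class QueueingDatanode:
    """Datanode that serves queued client writes from its own non-full blocks.

    free_space[i] is the remaining capacity (hypothetical units) of the
    i-th non-full block on this Datanode.
    """

    def __init__(self, free_space: List[int]):
        self.free_space = list(free_space)
        self.queue: Deque[Tuple[str, int]] = deque()

    def submit(self, client: str, size: int) -> None:
        self.queue.append((client, size))

    def drain(self) -> Tuple[List[Tuple[str, int]], Optional[str]]:
        """Write queued requests in FIFO order.

        Returns (placements, head_client): placements lists (client, block
        index); head_client is the client at the front of the queue that must
        now ask the Namenode for a new block, or None if the queue emptied.
        """
        placements = []
        while self.queue:
            client, size = self.queue[0]
            idx = next((i for i, free in enumerate(self.free_space) if free >= size), None)
            if idx is None:
                return placements, client
            self.free_space[idx] -= size
            placements.append((client, idx))
            self.queue.popleft()
        return placements, None
```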
Metadata is the control information of the whole system and plays a crucial role. To relieve the Namenode and speed up HDFS reads and writes, once the Namenode has created a block, control over operations on that block is handed to a Datanode, and multiple small files are stored in one data block.
The metadata structures used to store small files must therefore be modified accordingly: because one block now holds several files, the mapping between small files and data blocks must change.
First, the file-to-block mapping adds, on top of the conventional mapping, the file's starting offset and length within the data block, as shown in Figure 2; filename is the file name string including its full path. Second, the block-to-file mapping, which used to be one-to-one, becomes a mapping from a data block to an array of files, where each file still carries a validity flag (1 marks the file as valid, 0 as invalid), as shown in Figure 3. Size is the total size of the files contained in the block. This lets a client, once it has found a data block, insert as many files into it as possible, reducing repeated block lookups. In this patent, a block is considered full during lookup when its Size exceeds the block size minus 1 MB.
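The two mappings of Figures 2 and 3 can be sketched as plain data classes. This is one possible reading, assuming files are packed back to back so that a new file's offset equals the block's current Size; the class and field names are hypothetical, while the offset/length fields, validity flag and "Size > block size minus 1 MB" rule come from the text.

```python
from dataclasses import dataclass, field
from typing import List

MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # HDFS default used by the patent


@dataclass
class FileToBlock:
    """Figure 2: file -> block, extended with offset and length."""
    filename: str   # full path of the small file
    block_id: int
    offset: int     # start offset of the file inside the block
    length: int     # file length in bytes


@dataclass
class FileEntry:
    filename: str
    valid: int = 1  # 1 = valid, 0 = invalid


@dataclass
class BlockToFiles:
    """Figure 3: block -> array of files, plus the total size Size."""
    block_id: int
    files: List[FileEntry] = field(default_factory=list)
    size: int = 0   # total bytes of files stored in this block

    def is_full(self, block_size: int = BLOCK_SIZE) -> bool:
        # Treated as full once Size exceeds block size minus 1 MB.
        return self.size > block_size - MB

    def append(self, filename: str, length: int, block_size: int = BLOCK_SIZE) -> FileToBlock:
        # Assumption: files are packed back to back, so the new offset is the current Size.
        if self.size + length > block_size:
            raise ValueError("file does not fit in this block")
        mapping = FileToBlock(filename, self.block_id, self.size, length)
        self.files.append(FileEntry(filename))
        self.size += length
        return mapping

    def invalidate(self, filename: str) -> None:
        # Mark the file invalid; its bytes still count toward Size.
        for entry in self.files:
            if entry.filename == filename:
                entry.valid = 0
```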
Other data structures, such as the file and block namespaces and the locations of the Datanodes holding each block, are unchanged and keep their original form.
Metadata cached by the Namenode: it caches the namespaces of all files and blocks, the file-to-block mapping, and the location of every block replica. For large files this metadata is kept in memory, whereas for small files only the hottest portion and the metadata of non-full blocks are kept in memory, so the Namenode also maintains a list of small files that determines which small files' metadata is cached in memory. In addition, the Namenode maintains the extra list, mentioned above, of the number of newly created non-full blocks on each Datanode, which is used when creating data blocks later.
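The patent does not say how the "hottest" small files are identified. The sketch below assumes access frequency as the measure and a fixed in-memory capacity, with a dictionary standing in for the on-disk metadata; `HotSmallFileCache` and its methods are hypothetical.

```python
from collections import Counter
from typing import Dict


class HotSmallFileCache:
    """Keeps metadata of the most frequently accessed small files in memory.

    `disk` stands in for the Namenode's on-disk small-file metadata.
    """

    def __init__(self, capacity: int, disk: Dict[str, object]):
        self.capacity = capacity
        self.disk = disk
        self.memory: Dict[str, object] = {}
        self.hits: Counter = Counter()

    def get(self, filename: str) -> object:
        self.hits[filename] += 1
        if filename in self.memory:
            return self.memory[filename]
        meta = self.disk[filename]          # slow path: read from disk
        self._maybe_promote(filename, meta)
        return meta

    def _maybe_promote(self, filename: str, meta: object) -> None:
        if len(self.memory) < self.capacity:
            self.memory[filename] = meta
            return
        coldest = min(self.memory, key=lambda f: self.hits[f])
        if self.hits[filename] > self.hits[coldest]:
            del self.memory[coldest]        # evict the least-accessed entry
            self.memory[filename] = meta
```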
Next is the metadata cached by the Datanodes, the most important part of this design and the main means of relieving the Namenode. All metadata cached by Datanodes concerns small files; large-file metadata is still cached only by the Namenode. Each Datanode caches metadata only for blocks that have two replicas in its own rack.
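The caching scope can be computed from the replica placement. The sketch assumes, as the Datanode update procedure later in the text states, that a block's metadata is sent to every Datanode in a rack holding two of its replicas, including Datanodes in that rack that hold no replica themselves; `metadata_scope` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def metadata_scope(
    replicas: Dict[str, List[str]], rack_of: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Map each Datanode to the small-file blocks whose metadata it caches.

    A block's metadata goes to every Datanode in a rack that holds exactly
    two of the block's replicas (the two-in-one-rack half of HDFS placement).
    """
    nodes_in_rack: Dict[str, List[str]] = defaultdict(list)
    for dn, rack in rack_of.items():
        nodes_in_rack[rack].append(dn)

    scope: Dict[str, Set[str]] = defaultdict(set)
    for block, holders in replicas.items():
        per_rack = Counter(rack_of[dn] for dn in holders)
        for rack, count in per_rack.items():
            if count == 2:
                for dn in nodes_in_rack[rack]:
                    scope[dn].add(block)
    return dict(scope)
```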
In addition, the client also caches some information. The client first has to decide which Datanode to access, so it caches Datanode information: each client keeps a fixed-size list of the Datanodes it has accessed, selects the Datanode with the highest access count each time, and replaces list entries using the LRU algorithm.
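The client-side Datanode list combines two rules: pick the entry with the most accesses, and evict by LRU when the list is full. A minimal sketch using `collections.OrderedDict` to track recency; the class name and capacity parameter are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class DatanodeList:
    """Fixed-size client-side list of recently used Datanodes with access counts."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts: "OrderedDict[str, int]" = OrderedDict()  # oldest first

    def record_access(self, datanode: str) -> None:
        if datanode in self.counts:
            self.counts[datanode] += 1
            self.counts.move_to_end(datanode)        # now most recently used
        else:
            if len(self.counts) >= self.capacity:
                self.counts.popitem(last=False)      # evict least recently used
            self.counts[datanode] = 1

    def preferred(self) -> Optional[str]:
        """Datanode with the highest access count, or None if the list is empty."""
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)
```

Note the interaction between the two rules: LRU replacement can evict the most-accessed Datanode if it has not been used recently. The patent does not say how the two criteria should be reconciled.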
The metadata update policy is as follows:
First, updating metadata on the Namenode. In the original HDFS, when writing data the client first asks the Namenode to allocate a data block; after creating the namespace entries for the file and block, the Namenode sends the block name and the locations of the corresponding Datanodes to the client, which connects directly to the Datanodes to write the file and notifies the Namenode to update its metadata once the write completes.
In the present invention, small-file writes do not follow this update policy: the Namenode is not notified immediately after each small-file write. Instead, the Datanode notifies the Namenode to update the metadata for a block only when the block becomes full, when a read request arrives for the block while no other client is writing it, or when the block has received no read or write requests for more than one minute. The update includes the newly created file namespace entries and the files contained in the block. At the same time, the corresponding information is added to the list of Datanodes with non-full blocks. This policy can significantly speed up file writes across the whole system.
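The three triggers for a Datanode-to-Namenode metadata update can be written as a single predicate. The function and parameter names are hypothetical; the 60-second limit is the one-minute idle period from the text.

```python
IDLE_LIMIT_SECONDS = 60.0  # "no read/write requests for more than 1 minute"


def should_notify_namenode(
    block_full: bool,
    read_requested: bool,
    active_writers: int,
    last_access: float,
    now: float,
) -> bool:
    """Decide whether a Datanode should push a block's metadata to the Namenode."""
    if block_full:
        return True
    if read_requested and active_writers == 0:
        return True
    return now - last_access > IDLE_LIMIT_SECONDS
```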
Next, updating metadata on the Datanodes. When the Namenode starts up and whenever a Datanode joins the cluster, the Namenode queries the Datanode for the blocks it holds, identifies the small-file data blocks among them, determines each rack that contains two replicas of such a block, and sends the block's block-to-file and file-to-block metadata, together with the Datanodes holding the block, to every Datanode in that rack. During normal operation, metadata is updated whenever a data block is modified, and every modification updates the block's identifier uniformly across its replicas. A client accessing a Datanode directly can therefore decide whether a block is valid by comparing the identifiers of all replicas of that block. When they are inconsistent, each Datanode uses its regular heartbeat with the Namenode to request the latest metadata for that block. In addition, after a Datanode notifies the Namenode to update small-file block information, and after the Namenode creates a new small-file block or re-replicates blocks following a Datanode failure, the Namenode notifies the corresponding rack to update its metadata.
Small-file I/O: when a client needs to write a file, it first picks the most frequently accessed Datanode from its own cached Datanode list and accesses it directly. If there is none, it asks the Namenode to create a new data block; if a block cannot be created at that time, the Namenode assigns it a Datanode according to the topology path. The client connects to the Datanode and sends a write request; the Datanode allocates a writable data block for it and then sends the client the primary block address, including the lease on this block. The client records each Datanode it accesses in its cache.
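The client side of this write path can be sketched as follows, assuming (from the earlier description of client operations) that a usable block must have room and must not be held by another client, and that the client falls back to the Namenode when its most-accessed Datanode has no such block. `OpenBlock` and `pick_write_block` are hypothetical; a `None` result stands for "ask the Namenode".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class OpenBlock:
    block_id: int
    free_bytes: int
    writer: Optional[str] = None   # client currently holding the lease, if any


def pick_write_block(
    client: str,
    size: int,
    access_counts: Dict[str, int],
    open_blocks: Dict[str, List[OpenBlock]],
) -> Optional[Tuple[str, int]]:
    """Return (datanode, block_id) for a direct write, or None to ask the Namenode."""
    if not access_counts:
        return None                                   # no cached Datanode yet
    dn = max(access_counts, key=access_counts.get)    # most-accessed Datanode
    for blk in open_blocks.get(dn, []):
        if blk.free_bytes >= size and blk.writer is None:
            blk.writer = client                       # lease the block to this client
            blk.free_bytes -= size
            access_counts[dn] += 1                    # record the access
            return dn, blk.block_id
    return None
```

Lease release after the write completes is omitted from this sketch.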
When reading a file, the client likewise first picks the Datanode with the most reads and writes from its own cached Datanode list and connects to it directly; if there is none, it looks the file up on the Namenode. If there is one, the file is looked up in that Datanode's cached metadata. If the block is found, the Datanode returns to the client the data block holding the file, the file's offset and length within the block, and the Datanodes storing the block. If not, the client queries the Namenode, which returns the Datanodes storing the file, the block ID, and the file's offset and length within the block. After obtaining the block information, the client connects to the corresponding Datanode and reads directly from the block. When the client reads through a direct Datanode connection, it must verify that the data read are valid, using a two-step check: first, it compares the block identifiers of all three replicas; if they match it reads directly, and if not it asks the Namenode to update the metadata and then looks up and reads again. Second, while reading, it verifies the content with a checksum.
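The two-step validity check can be sketched as below. The patent does not name the checksum algorithm, so this sketch uses `zlib.crc32` over the file's bytes as a stand-in; `StaleMetadata` and `verified_read` are hypothetical names, and the caller is expected to refresh metadata from the Namenode and retry on `StaleMetadata`.

```python
import zlib
from typing import Sequence


class StaleMetadata(Exception):
    """Replica identifiers disagree; refresh metadata from the Namenode and retry."""


def verified_read(
    block: bytes,
    offset: int,
    length: int,
    replica_ids: Sequence[int],
    expected_crc: int,
) -> bytes:
    # Step 1: all replicas of the block must carry the same identifier.
    if len(set(replica_ids)) != 1:
        raise StaleMetadata("replica identifiers differ")
    data = block[offset:offset + length]
    # Step 2: check the content against its checksum.
    if zlib.crc32(data) != expected_crc:
        raise ValueError("checksum mismatch")
    return data
```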
The beneficial effects of the present invention are as follows:
Addressing the inefficiency of HDFS in processing small files, the present invention proposes a new processing method that effectively solves the problem of excessive load on the single Namenode by distributing the small-file workload across the Datanodes, so that the big-data processing cluster as a whole handles small files with efficiency and performance comparable to large files.
Description of the drawings
Fig. 1 is a diagram of the new HDFS architecture;
Fig. 2 is a schematic of the file-to-block mapping;
Fig. 3 is a schematic of the block-to-small-file-group mapping;
Fig. 4 is a flowchart of writing a file;
Fig. 5 is a flowchart of reading a file.
Detailed description of embodiments
The invention is described in detail below with reference to the accompanying drawings and embodiments.
Embodiment 1:
As shown in Figure 1, an improved HDFS method for small files comprises a cluster that contains one Namenode and multiple Datanodes and is accessible by multiple clients, wherein part of the Namenode's authority is transferred to the Datanodes, allowing the Datanodes to cache the metadata of some small files and handle most small-file read and write requests.
Embodiment 2:
On the basis of Embodiment 1, in this embodiment, in addition to the Namenode managing all of the file system's metadata, the Datanodes also store part of the metadata, mainly that of small files, while large-file metadata remains on the Namenode. The Namenode manages system-wide activities and communicates periodically with each Datanode through heartbeat messages, issuing commands to them and collecting their status.
Embodiment 3:
On the basis of Embodiment 1, in this embodiment the client's metadata operations for some small files are delegated to the Datanodes, and the Namenode is queried only if no result is found. When writing a file, the client uses its past read/write records to ask a Datanode directly whether it has a block that is not yet full and not currently being written by another client; if so, the data are written directly into that block and the corresponding metadata is updated; if not, the client sends a write request to the Namenode, which allocates a new data block for the write. The client performs the data-block lookup locally. When reading a file, the client queries the Datanode directly and falls back to the Namenode if the file is not found.
Embodiment 4:
On the basis of Embodiment 1, files are divided into fixed-size blocks using the original HDFS default of 64 MB. When each block is created, the Namenode assigns it a globally unique, immutable 64-bit block handle, in which the first bit identifies whether the block holds large-file or small-file data. Datanodes store blocks on local disks as Linux files and read or write block data according to the specified block handle and byte range.
Embodiment 5:
On the basis of Embodiment 4, in this embodiment the creation and division of whole blocks is the Namenode's responsibility, and once a new block has been created, most subsequent storage of files into it is managed by a Datanode. When a small-file block is written, the Namenode selects machines from its list of Datanodes holding created-but-not-full blocks, choosing those with fewer than three such blocks, subject to the two HDFS block-placement rules: disk utilization and topological distance to the client. When a client asks the Namenode to create a data block but no Datanode has fewer than three non-full blocks, the Namenode selects a Datanode according to the topology between them to handle the request, and that Datanode queues the client requests and writes their data directly into its own not-yet-full blocks. If client write requests remain after all of that Datanode's blocks are full, the client at the head of the queue asks the Namenode to create a new block.
Embodiment 6:
On the basis of Embodiment 4 or 5, in this embodiment, after the Namenode creates a block, control over operations on that block is handed to a Datanode, and multiple small files are stored in one data block.
Embodiment 7:
On the basis of Embodiment 6, in this embodiment the file-to-block mapping adds, on top of the conventional mapping, the file's starting offset and length within the data block; filename is the file name string including the full path. The block-to-file mapping becomes a mapping from a data block to an array of files, and each file still carries a validity flag.
Embodiment 8:
On the basis of Embodiment 1, in this embodiment the Namenode caches the namespaces of all files and blocks, the file-to-block mapping, and the location of every block replica. For large files this metadata is kept in memory, whereas for small files only the hottest portion and the metadata of non-full blocks are kept. The Namenode also maintains a list of small files that determines which small files' metadata is cached in memory, as well as the extra list, mentioned above, of the number of newly created non-full blocks on each Datanode, which is used when creating data blocks later;
The metadata cached by the Datanodes concerns only small files; large-file metadata is still cached only by the Namenode, and each Datanode caches metadata only for blocks that have two replicas in its own rack;
The client also caches some information, namely Datanode information: each client keeps a fixed-size list of the Datanodes it has accessed, selects the Datanode with the highest access count each time, and replaces list entries using the LRU algorithm.
Embodiment 9:
On the basis of Embodiment 1, the metadata update policy in this embodiment is as follows:
first, updating metadata on the Namenode: when the system writes small files, the Namenode is not notified after each small-file write. Instead, the Datanode notifies the Namenode to update the metadata for a block, including the newly created file namespace entries and the files contained in the block, when the block becomes full, when a read request arrives for the block while no other client is writing it, or when the block has received no read or write requests for more than one minute. At the same time, the corresponding information is added to the list of Datanodes with non-full blocks;
next, updating metadata on the Datanodes: when the Namenode starts up and whenever a Datanode joins the cluster, the Namenode queries the Datanode for the blocks it holds, identifies the small-file data blocks among them, determines each rack that contains two replicas of such a block, and sends the block's block-to-file and file-to-block metadata, together with the Datanodes holding the block, to every Datanode in that rack. During normal operation, metadata is updated whenever a data block is modified, and every modification updates the block's identifier uniformly across its replicas, so a client accessing a Datanode directly can decide whether a block is valid by comparing the identifiers of all replicas of that block. When they are inconsistent, each Datanode uses its regular heartbeat with the Namenode to request the latest metadata for that block. In addition, after a Datanode notifies the Namenode to update small-file block information, and after the Namenode creates a new small-file block or re-replicates blocks following a Datanode failure, the Namenode notifies the corresponding rack to update its metadata.
Embodiment 10:
On the basis of Embodiment 1, in this embodiment, when a client needs to write a file, it first picks the most frequently accessed Datanode from its own cached Datanode list and accesses it directly. If there is none, it asks the Namenode to create a new data block; if a block cannot be created at that time, the Namenode assigns it a Datanode according to the topology path. The client connects to the Datanode and sends a write request; the Datanode allocates a writable data block for it and then sends the client the primary block address, including the lease on this block. The client records each Datanode it accesses in its cache;
When reading a file, the client likewise first picks the Datanode with the most reads and writes from its own cached Datanode list and connects to it directly; if there is none, it looks the file up on the Namenode. If there is one, the file is looked up in that Datanode's cached metadata. If the block is found, the Datanode returns to the client the data block holding the file, the file's offset and length within the block, and the Datanodes storing the block. If not, the client queries the Namenode, which returns the Datanodes storing the file, the block ID, and the file's offset and length within the block. After obtaining the block information, the client connects to the corresponding Datanode and reads directly from the block. When the client reads through a direct Datanode connection, it must verify that the data read are valid, using a two-step check: first, it compares the block identifiers of all three replicas; if they match it reads directly, and if not it asks the Namenode to update the metadata and then looks up and reads again. Second, while reading, it verifies the content with a checksum.

Claims (10)

1. An improved HDFS method for small files, comprising a cluster that contains one Namenode and multiple Datanodes and is accessible by multiple clients, characterized in that part of the Namenode's authority is transferred to the Datanodes, allowing the Datanodes to cache the metadata of some small files and handle most small-file read and write requests.
2. The improved HDFS method for small files according to claim 1, characterized in that, in addition to the Namenode managing all of the file system's metadata, the Datanodes also store part of the metadata, mainly that of small files, while large-file metadata remains on the Namenode; the Namenode manages system-wide activities and communicates periodically with each Datanode through heartbeat messages, issuing commands to them and collecting their status.
3. The improved HDFS method for small files according to claim 1, characterized in that the client's metadata operations for some small files are delegated to the Datanodes, and the Namenode is queried only if no result is found; when writing a file, the client uses its past read/write records to ask a Datanode directly whether it has a block that is not yet full and not currently being written by another client; if so, the data are written directly into that block and the corresponding metadata is updated; if not, the client sends a write request to the Namenode, which allocates a new data block for the write; the client performs the data-block lookup locally; when reading a file, the client queries the Datanode directly and falls back to the Namenode if the file is not found.
4. The improved method for small files in HDFS according to claim 1, characterized in that: files are divided into fixed-size blocks, using the original HDFS default of 64 MB; when each block is created, the Namenode assigns it an immutable, globally unique 64-bit block handle to identify it, in which the first bit indicates whether the block holds large-file or small-file data; the Datanode stores blocks on its local disk as Linux files and reads and writes block data according to the specified block handle and byte range.
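Claim 4 says only that the first bit of the 64-bit handle distinguishes large-file from small-file blocks. The sketch below assumes "first bit" means the most significant bit and that a set bit marks a small-file block; both choices, and the single-process counter used as the ID generator, are assumptions.

```python
import itertools

SMALL_FILE_FLAG = 1 << 63        # assumed: most significant bit marks small-file blocks
ID_MASK = SMALL_FILE_FLAG - 1    # remaining 63 bits hold a unique sequence number

_next_id = itertools.count(1)    # hypothetical stand-in for the Namenode's ID generator


def new_block_handle(small_file: bool) -> int:
    """Allocate a unique 64-bit handle, tagging small-file blocks in the top bit."""
    seq = next(_next_id) & ID_MASK
    return seq | SMALL_FILE_FLAG if small_file else seq


def is_small_file_block(handle: int) -> bool:
    return bool(handle & SMALL_FILE_FLAG)


def sequence_of(handle: int) -> int:
    return handle & ID_MASK
```

Encoding the block type in the handle lets any node classify a block without a metadata lookup.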
5. The improved method for small files in HDFS according to claim 4, characterized in that: the creation of whole blocks is handled by the Namenode, while, once a new block has been created, the subsequent storing of files into that block is managed by the Datanode. When a small file is written, the Namenode consults the list it keeps of created-but-not-full blocks on each Datanode and selects machines whose number of such blocks is less than 3, while also satisfying the two HDFS block-placement rules: disk utilization and topological distance to the client. When a client requests the Namenode to create a data block but no Datanode has fewer than 3 such blocks, the Namenode directly selects a Datanode according to the topology between them to handle the client's request, and that Datanode queues the client's requests and writes their data directly into its not-yet-full data blocks;
If client write requests remain after all of a Datanode's data blocks are full, the client at the head of the queue requests the Namenode to create a new block.
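The placement rule of claim 5 can be sketched as a filter followed by a ranking. The claim names the criteria (fewer than three not-full blocks, disk utilization, topology distance to the client) but not how they are combined; the lexicographic ordering below, with distance first, is an assumption, as are the class and field names.

```python
from dataclasses import dataclass

MAX_OPEN_BLOCKS = 3  # claim 5: only Datanodes with fewer than 3 not-full blocks get a new one


@dataclass
class DatanodeState:
    name: str
    open_blocks: int         # created-but-not-full small-file blocks on this node
    disk_utilization: float  # fraction of disk in use, 0.0 to 1.0
    distance: int            # topology distance to the requesting client


def place_small_file_block(nodes: list[DatanodeState]) -> tuple[str, bool]:
    """Return (datanode, create_new_block).

    If some node has fewer than 3 open blocks, a new block is created there.
    Otherwise no block is created: the nearest node queues the request and
    writes into one of its existing not-full blocks.
    """
    candidates = [n for n in nodes if n.open_blocks < MAX_OPEN_BLOCKS]
    if candidates:
        best = min(candidates, key=lambda n: (n.distance, n.disk_utilization))
        return best.name, True
    nearest = min(nodes, key=lambda n: n.distance)
    return nearest.name, False
```

Capping open blocks per Datanode bounds how much partially filled space the cluster holds at any time.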
6. The improved method for small files in HDFS according to claim 4 or 5, characterized in that: after the Namenode creates a block, control over operations on that block is transferred to the Datanode, and multiple small files are stored in a single data block.
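Because claim 6 hands control of a block to the Datanode once it exists, the Datanode alone decides where each small file lands inside the block. A minimal in-memory sketch, assuming files are appended back to back and never straddle two blocks (the claim does not say what happens to a file that does not fit); the class name is hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size


class SmallFileBlock:
    """A block owned by one Datanode that stores many small files back to back."""

    def __init__(self, block_id: int, capacity: int = BLOCK_SIZE):
        self.block_id = block_id
        self.capacity = capacity
        self.buf = bytearray()

    def remaining(self) -> int:
        return self.capacity - len(self.buf)

    def append(self, data: bytes) -> tuple[int, int]:
        """Append one small file and return its (offset, length) within the block."""
        if len(data) > self.remaining():
            raise ValueError("file does not fit; caller must request a new block")
        offset = len(self.buf)
        self.buf.extend(data)
        return offset, len(data)

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.buf[offset:offset + length])
```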
7. The improved method for small files in HDFS according to claim 6, characterized in that: the file-to-block mapping extends the conventional mapping with the start offset and length of the file within the data block; the file name is the file-name string including its full path; the block-to-file mapping maps each data block to an array of files, and each file carries a flag indicating whether it is still valid.
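The mappings of claim 7 can be modeled with two dictionaries that share entry objects: file path to (block, offset, length), and block to an array of file entries carrying a validity flag. Class and method names are hypothetical, and treating deletion as clearing the flag rather than compacting the block is an assumption consistent with the flag's purpose.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    path: str       # file name string including its full path
    block_id: int
    offset: int     # start offset of the file inside the block
    length: int
    valid: bool = True


@dataclass
class SmallFileMetadata:
    file_to_block: dict[str, FileEntry] = field(default_factory=dict)
    block_to_files: dict[int, list[FileEntry]] = field(default_factory=dict)

    def add(self, path: str, block_id: int, offset: int, length: int) -> None:
        entry = FileEntry(path, block_id, offset, length)
        self.file_to_block[path] = entry
        self.block_to_files.setdefault(block_id, []).append(entry)  # same object in both maps

    def lookup(self, path: str):
        """Return (block_id, offset, length) for a valid file, else None."""
        entry = self.file_to_block.get(path)
        if entry is None or not entry.valid:
            return None
        return entry.block_id, entry.offset, entry.length

    def delete(self, path: str) -> None:
        # Mark invalid instead of rewriting the block; space can be reclaimed later.
        entry = self.file_to_block.get(path)
        if entry is not None:
            entry.valid = False

    def live_files(self, block_id: int) -> list[str]:
        return [e.path for e in self.block_to_files.get(block_id, []) if e.valid]
```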
8. The improved method for small files in HDFS according to claim 1, characterized in that: the metadata cached by the Namenode comprise the namespace of all files and blocks, the file-to-block mapping, and the location of each block replica; for large files all of this metadata is kept in memory, whereas for small files only the hottest part and the metadata of not-yet-full blocks are kept. The Namenode also maintains a list of small files that determines which small files' metadata is cached in memory, and additionally maintains the list, described above, of not-full blocks newly created on the Datanodes, which is used when creating subsequent data blocks;
The metadata cached by a Datanode concern only small files; all metadata of large files are still cached only by the Namenode. Each Datanode caches metadata only for blocks that have two replicas within its own rack;
The client also caches part of the file information, as well as information about Datanodes: each client maintains a fixed-size list of the Datanodes it has accessed, selects the most frequently accessed Datanode in that list for each access, and uses the LRU algorithm to replace entries in the list.
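The client's Datanode list in claim 8 combines two policies: selection by access count and replacement by LRU. A sketch using `collections.OrderedDict`; the class name and the default capacity of 8 are illustrative assumptions, since the claim gives no size.

```python
from collections import OrderedDict
from typing import Optional


class DatanodeCache:
    """Fixed-size list of Datanodes a client has used (hypothetical helper)."""

    def __init__(self, capacity: int = 8):  # capacity is an illustrative assumption
        self.capacity = capacity
        # Datanode name -> access count, kept in least- to most-recently-used order.
        self.counts: "OrderedDict[str, int]" = OrderedDict()

    def record_access(self, datanode: str) -> None:
        self.counts[datanode] = self.counts.get(datanode, 0) + 1
        self.counts.move_to_end(datanode)       # most recently used moves to the end
        if len(self.counts) > self.capacity:
            self.counts.popitem(last=False)     # evict the least recently used entry

    def preferred(self) -> Optional[str]:
        """Datanode with the highest access count, or None if the list is empty."""
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.__getitem__)
```

Under this combination a frequently used Datanode that has not been touched recently can still be evicted; the claim does not say how the two policies should interact in that case.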
9. The improved method for small files in HDFS according to claim 1, characterized in that the metadata update strategy is as follows:
(1) First, the metadata on the Namenode are updated. While the system is writing small files, once a small file has been completely written, the Datanode notifies the Namenode to update the metadata about the block if the block has become full, if a read request arrives for the block while no other client is writing it, or if the block has received no read or write request for more than 1 minute. The update covers the namespace of the newly created files, the files contained in the block, and so on; at the same time, the corresponding information is added to the list of not-full blocks on the Datanodes;
(2) Second, the metadata on the Datanodes are updated. When the Namenode starts up or a Datanode joins the cluster, the Namenode queries the Datanode for the blocks it holds, identifies the small-file data blocks among them, determines for each such block the rack that holds two of its replicas, and sends the block-to-file mapping, the file-to-block mapping, and the Datanodes holding the block to every Datanode in that rack. Once the Datanodes are running normally, their metadata are updated whenever a data block is modified, and every modification of a block updates the block's identifier uniformly across its replicas. A client accessing a Datanode directly can therefore determine whether a block is valid by comparing the identifiers of all replicas of that block. When they are inconsistent, each Datanode requests the latest metadata for that block from the Namenode through its regular heartbeat messages. Besides the Datanodes notifying the Namenode to update small-file block information, the Namenode also notifies the corresponding rack to update its metadata when it creates a new small-file block or re-replicates blocks after a Datanode failure.
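The two update directions of claim 9 can be sketched as follows. The trigger predicate reads a completed small-file write as a precondition and the other three conditions as alternatives; this reading of the claim's wording is an assumption. The rack selection assumes the usual HDFS layout in which one rack holds two of a block's three replicas. Function names, argument shapes, and the 60-second constant are hypothetical.

```python
from collections import Counter

IDLE_SECONDS = 60  # "no read/write request for more than 1 minute"


def should_notify_namenode(write_finished: bool, block_full: bool, read_requested: bool,
                           other_writer_active: bool, idle_seconds: float) -> bool:
    """Step (1): decide whether a Datanode pushes a block's metadata to the Namenode."""
    if not write_finished:
        return False
    return (block_full
            or (read_requested and not other_writer_active)
            or idle_seconds > IDLE_SECONDS)


def racks_to_update(block_replicas: dict[int, list[str]],
                    rack_of: dict[str, str]) -> dict[str, list[int]]:
    """Step (2): map each rack to the small-file blocks whose metadata it must cache.

    A block is assigned to the rack holding at least two of its replicas; the
    Namenode would then send that block's metadata to every Datanode in the rack.
    """
    plan: dict[str, list[int]] = {}
    for block_id, nodes in block_replicas.items():
        rack, count = Counter(rack_of[n] for n in nodes).most_common(1)[0]
        if count >= 2:
            plan.setdefault(rack, []).append(block_id)
    return plan
```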
10. The improved method for small files in HDFS according to claim 1, characterized in that:
When a client needs to write a file, it first selects the most frequently accessed Datanode from its cached Datanode list and accesses it directly; if the list is empty, it requests the Namenode to create a new data block, and if no block can be created at that moment, the Namenode assigns it a Datanode according to the topological path. The client connects to the Datanode and sends a write request; the Datanode allocates a writable data block for it and then notifies the client of the address of the primary replica, which holds the lease on that block; the client records each Datanode it accesses in its cache;
When a client reads a file, it likewise first selects the Datanode with the most read/write accesses from its cached Datanode list and connects to it directly. If the list is empty, the client looks the file up on the Namenode. Otherwise, the file is searched for in the metadata cached on that Datanode. If the corresponding block is found, the Datanode returns directly to the client the data block holding the file, the file's offset and length within the block, and the Datanodes that store the block. If it is not found, the client queries the Namenode, which returns the Datanodes storing the requested file, the block ID, and the file's offset and length within the block. Having obtained the block information, the client connects to the corresponding Datanode and reads directly from the block. Whenever the client reads a file by connecting directly to a Datanode, it must verify that the data read are valid, using a two-step verification strategy. First, it checks whether the block identifiers of all three replicas of the block are identical: if they are, it reads directly; if not, it asks the Namenode to update the metadata and then repeats the lookup and the read. Second, while reading the data, it uses a check code to verify that the content read is valid.
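The lookup order of claim 10 (the client's preferred Datanode first, the Namenode second) is sketched below. `datanode_lookup` and `namenode_lookup` are hypothetical callables standing in for RPCs the patent does not name; each returns a `Location` or `None`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Location:
    block_id: int
    offset: int                 # file's offset within the block
    length: int                 # file's length within the block
    datanodes: tuple[str, ...]  # Datanodes holding replicas of the block


def locate_file(path: str, preferred_datanode: Optional[str],
                datanode_lookup: Callable[[str, str], Optional[Location]],
                namenode_lookup: Callable[[str], Optional[Location]]) -> Location:
    """Resolve a small file: ask the cached Datanode first, fall back to the Namenode."""
    if preferred_datanode is not None:
        loc = datanode_lookup(preferred_datanode, path)
        if loc is not None:
            return loc  # hit in the Datanode's cached small-file metadata
    loc = namenode_lookup(path)
    if loc is None:
        raise FileNotFoundError(path)
    return loc
```

Only misses in the Datanode's cache reach the Namenode, which is how the method offloads most small-file metadata traffic.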
CN201310494888.4A 2013-10-22 2013-10-22 Improved method aimed at small files of HDFS Pending CN103530387A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310494888.4A CN103530387A (en) 2013-10-22 2013-10-22 Improved method aimed at small files of HDFS

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310494888.4A CN103530387A (en) 2013-10-22 2013-10-22 Improved method aimed at small files of HDFS

Publications (1)

Publication Number Publication Date
CN103530387A 2014-01-22

Family

ID=49932396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310494888.4A Pending CN103530387A (en) 2013-10-22 2013-10-22 Improved method aimed at small files of HDFS

Country Status (1)

Country Link
CN (1) CN103530387A (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970844A (en) * 2014-04-28 2014-08-06 北京创世漫道科技有限公司 Big data write-in method and device, big data read method and device and big data processing system
CN104270412A (en) * 2014-06-24 2015-01-07 南京邮电大学 Three-level caching method based on Hadoop distributed file system
CN104850548A (en) * 2014-02-13 2015-08-19 中国移动通信集团山西有限公司 Method and system used for implementing input/output process of big data platform
CN104978336A (en) * 2014-04-08 2015-10-14 云南电力试验研究院(集团)有限公司电力研究院 Unstructured data storage system based on Hadoop distributed computing platform
CN105279166A (en) * 2014-06-20 2016-01-27 中国电信股份有限公司 File management method and system
CN105404645A (en) * 2015-10-27 2016-03-16 北京乐动卓越科技有限公司 File management method in file server system and file server system
CN105930357A (en) * 2016-04-07 2016-09-07 深圳市慧动创想科技有限公司 Distributed file system, and data node data storage processing method and device
CN106156359A (en) * 2016-07-28 2016-11-23 四川新环佳科技发展有限公司 A kind of data synchronization updating method under cloud computing platform
CN107153662A (en) * 2016-03-04 2017-09-12 华为技术有限公司 A kind of data processing method and device
CN107220124A (en) * 2017-05-26 2017-09-29 郑州云海信息技术有限公司 A kind of routing resource and device
CN107223240A (en) * 2015-03-12 2017-09-29 英特尔公司 The computational methods associated with the context-aware management of cache memory and device
CN107229720A (en) * 2017-05-27 2017-10-03 郑州云海信息技术有限公司 A kind of method of Lustre file managements, apparatus and system
CN107295030A (en) * 2016-03-30 2017-10-24 阿里巴巴集团控股有限公司 A kind of method for writing data, device, data processing method, apparatus and system
CN107368608A (en) * 2017-08-07 2017-11-21 杭州电子科技大学 The HDFS small documents buffer memory management methods of algorithm are replaced based on ARC
CN107682399A (en) * 2017-08-29 2018-02-09 中国信息安全测评中心 A kind of file breaker point continuous transmission method based on big data
CN107995147A (en) * 2016-10-27 2018-05-04 中国电信股份有限公司 Metadata encryption and decryption method and system based on distributed file system
CN108268344A (en) * 2017-12-26 2018-07-10 华为技术有限公司 A kind of data processing method and device
CN108763397A (en) * 2018-05-22 2018-11-06 中国科学技术大学苏州研究院 A kind of data method for placing of the distributed file system of supporting depth study
CN108932287A (en) * 2018-05-22 2018-12-04 广东技术师范学院 A kind of mass small documents wiring method based on Hadoop
CN109299057A (en) * 2018-10-09 2019-02-01 北京快友世纪科技股份有限公司 Hadoop multi-pipe data handles analysis method
CN109947721A (en) * 2017-12-01 2019-06-28 北京安天网络安全技术有限公司 A kind of small documents treating method and apparatus
CN111581017A (en) * 2020-04-14 2020-08-25 上海爱数信息技术股份有限公司 Backup and recovery system and method for modern application
CN113127420A (en) * 2021-03-30 2021-07-16 山东英信计算机技术有限公司 Metadata request processing method, device, equipment and medium

Citations (12)

Publication number Priority date Publication date Assignee Title
US20030220923A1 (en) * 2002-05-23 2003-11-27 International Business Machines Corporation Mechanism for running parallel application programs on metadata controller nodes
US20030220974A1 (en) * 2002-05-23 2003-11-27 International Business Machines Corporation Parallel metadata service in storage area network environment
US20040133650A1 (en) * 2001-01-11 2004-07-08 Z-Force Communications, Inc. Transaction aggregation in a switched file system
US20090077097A1 (en) * 2007-04-16 2009-03-19 Attune Systems, Inc. File Aggregation in a Switched File System
CN101520805A (en) * 2009-03-25 2009-09-02 中兴通讯股份有限公司 Distributed file system and file processing method thereof
US20110153606A1 (en) * 2009-12-18 2011-06-23 Electronics And Telecommunications Research Institute Apparatus and method of managing metadata in asymmetric distributed file system
US20120078849A1 (en) * 2010-09-24 2012-03-29 Hitachi Data Systems Corporation System and method for enhancing availability of a distributed object storage system during a partial database outage
CN102420854A (en) * 2011-11-14 2012-04-18 西安电子科技大学 Distributed file system facing to cloud storage
US20120150930A1 (en) * 2010-12-10 2012-06-14 Electronics And Telecommunications Research Institute Cloud storage and method for managing the same
CN102821138A (en) * 2012-07-09 2012-12-12 广州鼎鼎信息科技有限公司 Metadata distributed storage method applicable to cloud storage system
CN102855284A (en) * 2012-08-03 2013-01-02 北京联创信安科技有限公司 Method and system for managing data of cluster storage system
CN103106286A (en) * 2013-03-04 2013-05-15 曙光信息产业(北京)有限公司 Method and device for managing metadata

Non-Patent Citations (1)

Title
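Fu Donghua (付东华): "Research and Optimization of a Massive Distributed File System Based on HDFS" (基于HDFS的海量分布式文件系统的研究与优化), China Master's Theses Full-text Database, Information Science and Technology series (《中国优秀硕士学位论文全文数据库 信息科技辑》) *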

Cited By (36)

Publication number Priority date Publication date Assignee Title
CN104850548A (en) * 2014-02-13 2015-08-19 中国移动通信集团山西有限公司 Method and system used for implementing input/output process of big data platform
CN104850548B (en) * 2014-02-13 2018-05-22 中国移动通信集团山西有限公司 A kind of method and system for realizing big data platform input/output processing
CN104978336A (en) * 2014-04-08 2015-10-14 云南电力试验研究院(集团)有限公司电力研究院 Unstructured data storage system based on Hadoop distributed computing platform
CN103970844A (en) * 2014-04-28 2014-08-06 北京创世漫道科技有限公司 Big data write-in method and device, big data read method and device and big data processing system
CN103970844B (en) * 2014-04-28 2017-11-21 北京创世漫道科技有限公司 The wiring method and device of big data, read method and device and processing system
CN105279166A (en) * 2014-06-20 2016-01-27 中国电信股份有限公司 File management method and system
CN104270412A (en) * 2014-06-24 2015-01-07 南京邮电大学 Three-level caching method based on Hadoop distributed file system
CN107223240A (en) * 2015-03-12 2017-09-29 英特尔公司 The computational methods associated with the context-aware management of cache memory and device
CN105404645A (en) * 2015-10-27 2016-03-16 北京乐动卓越科技有限公司 File management method in file server system and file server system
CN107153662A (en) * 2016-03-04 2017-09-12 华为技术有限公司 A kind of data processing method and device
CN107153662B (en) * 2016-03-04 2020-04-28 华为技术有限公司 Data processing method and device
CN107295030A (en) * 2016-03-30 2017-10-24 阿里巴巴集团控股有限公司 A kind of method for writing data, device, data processing method, apparatus and system
CN105930357A (en) * 2016-04-07 2016-09-07 深圳市慧动创想科技有限公司 Distributed file system, and data node data storage processing method and device
CN105930357B (en) * 2016-04-07 2019-12-27 深圳市慧动创想科技有限公司 Distributed file system and data node data storage processing method and device
CN106156359A (en) * 2016-07-28 2016-11-23 四川新环佳科技发展有限公司 A kind of data synchronization updating method under cloud computing platform
CN106156359B (en) * 2016-07-28 2019-05-21 广东奥飞数据科技股份有限公司 A kind of data synchronization updating method under cloud computing platform
CN107995147B (en) * 2016-10-27 2021-05-14 中国电信股份有限公司 Metadata encryption and decryption method and system based on distributed file system
CN107995147A (en) * 2016-10-27 2018-05-04 中国电信股份有限公司 Metadata encryption and decryption method and system based on distributed file system
CN107220124B (en) * 2017-05-26 2021-01-12 苏州浪潮智能科技有限公司 Path selection method and device
CN107220124A (en) * 2017-05-26 2017-09-29 郑州云海信息技术有限公司 A kind of routing resource and device
CN107229720A (en) * 2017-05-27 2017-10-03 郑州云海信息技术有限公司 A kind of method of Lustre file managements, apparatus and system
CN107368608A (en) * 2017-08-07 2017-11-21 杭州电子科技大学 The HDFS small documents buffer memory management methods of algorithm are replaced based on ARC
CN107682399B (en) * 2017-08-29 2020-07-14 中国信息安全测评中心 File folder breakpoint continuous transmission method based on big data
CN107682399A (en) * 2017-08-29 2018-02-09 中国信息安全测评中心 A kind of file breaker point continuous transmission method based on big data
CN109947721B (en) * 2017-12-01 2021-08-17 北京安天网络安全技术有限公司 Small file processing method and device
CN109947721A (en) * 2017-12-01 2019-06-28 北京安天网络安全技术有限公司 A kind of small documents treating method and apparatus
CN108268344A (en) * 2017-12-26 2018-07-10 华为技术有限公司 A kind of data processing method and device
CN108932287A (en) * 2018-05-22 2018-12-04 广东技术师范学院 A kind of mass small documents wiring method based on Hadoop
CN108763397A (en) * 2018-05-22 2018-11-06 中国科学技术大学苏州研究院 A kind of data method for placing of the distributed file system of supporting depth study
CN108932287B (en) * 2018-05-22 2019-11-29 广东技术师范大学 A kind of mass small documents wiring method based on Hadoop
CN108763397B (en) * 2018-05-22 2022-07-08 中国科学技术大学苏州研究院 Data placement method of distributed file system supporting deep learning
CN109299057A (en) * 2018-10-09 2019-02-01 北京快友世纪科技股份有限公司 Hadoop multi-pipe data handles analysis method
CN111581017A (en) * 2020-04-14 2020-08-25 上海爱数信息技术股份有限公司 Backup and recovery system and method for modern application
CN111581017B (en) * 2020-04-14 2021-07-13 上海爱数信息技术股份有限公司 Backup and recovery system and method for modern application
CN113127420A (en) * 2021-03-30 2021-07-16 山东英信计算机技术有限公司 Metadata request processing method, device, equipment and medium
CN113127420B (en) * 2021-03-30 2023-03-14 山东英信计算机技术有限公司 Metadata request processing method, device, equipment and medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140122