WO2015096649A1 - Data processing method and related device - Google Patents

Data processing method and related device

Info

Publication number
WO2015096649A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
buffer
gpu
block
splicer
Prior art date
Application number
PCT/CN2014/094071
Other languages
French (fr)
Chinese (zh)
Inventor
崔慧敏
谢睿
阮功
杨文森
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2015096649A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/54 — Interprogram communication
    • G06F 9/541 — Interprogram communication via adapters, e.g. between incompatible applications

Definitions

  • The present invention relates to the field of information processing technologies and, in particular, to a data processing method and related devices.
  • Cloud computing offers powerful, fast computation over big data, but transferring that data has become a major bottleneck.
  • MapReduce is a well-known cloud computing architecture, introduced by Google, for parallel computing over large-scale data sets (greater than 1 TB). Hadoop is a concrete implementation of the MapReduce architecture; a Hadoop cluster is divided into a master node device and slave node devices.
  • The master node device uses the Map function provided by MapReduce to divide the data set into M data fragments by size, and distributes the fragments to multiple slave node devices for parallel processing.
  • Each slave node device extracts the values of key-value pairs from its data fragment and stores them in a buffer allocated by the node's central processing unit (CPU). The buffered values are then parsed (for example, their data format is converted), and the parsed values are spliced through an application programming interface (API) into a buffer that the node's graphics processing unit (GPU) has allocated for storing data; the GPU then performs the computation.
  • The present inventors have found that, because the MapReduce architecture provides no parsing function, parsing the values of key-value pairs must rely on a program written by the programmer. In addition, the size of the CPU buffer that stores the values may be inconsistent with the size of the buffer allocated by the GPU to store data, and the MapReduce architecture provides no corresponding check; determining whether the CPU and GPU buffers match therefore also relies on a judgment function written by the programmer. Both factors reduce the execution efficiency of the slave node device.
  • Embodiments of the present invention provide a data processing method and related devices, applied to a Hadoop cluster under the MapReduce architecture, that can improve the working efficiency of the slave node devices in the Hadoop cluster, simplify the programmer's work, and benefit subsequent optimization of the MapReduce architecture.
  • In a first aspect, the present invention provides a data processing method applied to a Hadoop cluster under a MapReduce architecture. The Hadoop cluster includes a master node device and a slave node device; the slave node device includes a central processing unit (CPU) and a graphics processing unit (GPU) and obtains a data fragment from the master node device; and the CPU is provided with a data preprocessor and a data splicer. The method includes:
  • the data preprocessor reads metadata from a first buffer of the CPU, where, when the data set obtained from the data fragment is stored in the first buffer, the metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer;
  • the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; and
  • the data preprocessor converts the data of the data set into the data format indicated by a preset analytic function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • In one implementation, the metadata specifically includes an address index array whose data elements correspond one-to-one to the data of the data set, each data element indicating the storage address of one piece of data in the first buffer. Reading the data of the data set according to the storage addresses indicated by the metadata then includes: the data preprocessor reads from the storage address indicated by a data element of the address index array until the storage address indicated by the next data element, or the end of the first buffer, is reached.
  • Converting the data of the data set into the data format indicated by the preset analytic function includes: the data preprocessor converts the data of the data set, according to the preset analytic function, into a data format that supports the logical operations specified by the analytic function.
  • Generating the data block from the converted data set includes: the data preprocessor converts the data in the data block into the storage format used by the GPU.
  • The data set is specifically formed by splicing together the values of multiple key-value pairs in the data fragment.
  • The first buffer and the second buffer are automatically allocated and reclaimed by the CPU; the life cycle of the first buffer is the processing time of one data fragment, and the life cycle of the second buffer is the processing time of one data set.
  • In a second aspect, the present invention provides a data processing method applied to a Hadoop cluster under a MapReduce architecture. The Hadoop cluster includes a master node device and a slave node device; the slave node device includes a central processing unit (CPU) and a graphics processing unit (GPU) and obtains a data fragment from the master node device; and the CPU is provided with a data preprocessor and a data splicer. The method includes:
  • the data splicer reads, from a second buffer of the CPU, a data block generated by the data preprocessor; and
  • the data splicer splices the data block into a working buffer that the GPU has allocated for storing data blocks.
  • In one implementation, the data splicer splices the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address, within the GPU's working buffer allocated for storing data blocks, at which the data block is to be stored.
  • The method further includes: the data splicer notifies the GPU of the size of the data block, and the data splicer updates the cursor parameter.
  • In a third aspect, the present invention provides a data preprocessor, including:
  • a first reading unit, configured to read metadata from a first buffer of the CPU, where, when the data set obtained from the data fragment is stored in the first buffer, the metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer;
  • a second reading unit, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;
  • a converting unit, configured to convert the data of the data set into the data format indicated by a preset analytic function and to generate a data block from the converted data set; and
  • a storage unit, configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • In one implementation, the metadata specifically includes an address index array whose data elements correspond one-to-one to the data of the data set, each data element indicating the storage address of one piece of data in the first buffer, and the second reading unit includes: a data reading unit, configured to read from the storage address indicated by a data element of the address index array until the storage address indicated by the next data element, or the end of the first buffer, is reached.
  • The converting unit includes: a data format converting unit, configured to convert the data of the data set, according to the preset analytic function, into a data format that supports the logical operations specified by the analytic function; and a generating unit, configured to generate the data block from the converted data set.
  • The converting unit further includes: a format converting unit, configured to convert the data in the data block into the storage format used by the GPU when the storage format of the data of the data set in the first buffer is inconsistent with the GPU's storage format.
  • In a fourth aspect, the present invention provides a data splicer, including:
  • a third reading unit, configured to read, from a second buffer of the CPU, a data block generated by the data preprocessor; and
  • a splicing processing unit, configured to splice the data block into a working buffer that the GPU has allocated for storing data blocks.
  • In one implementation, the data splicer further includes: a trigger processing unit, configured to suspend splicing of the data block and trigger the GPU to process the data blocks already stored in the working buffer when splicing the data block into the GPU's allocated working buffer fails.
  • The splicing processing unit is specifically configured to splice the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address, within the GPU's working buffer allocated for storing data blocks, at which the data block is to be stored.
  • The data splicer further includes: a notification unit, configured to notify the GPU of the size of the data block; and an update unit, configured to update the cursor parameter.
  • In a fifth aspect, the present invention provides a processor, which may include the data preprocessor according to the third aspect and the data splicer according to the fourth aspect.
  • The first buffer and the second buffer are automatically allocated and reclaimed; the life cycle of the first buffer is the processing time of one data fragment, and the life cycle of the second buffer is the processing time of one data set.
  • In a sixth aspect, the present invention provides a slave node device, which may include the CPU described in the fifth aspect and a graphics processing unit (GPU). The data preprocessor in the CPU converts the data format of the data set obtained from the data fragment and generates a data block from the converted data set; the data splicer in the CPU splices the data block into the working buffer that the GPU has allocated for storing data blocks; and the GPU processes the data block to obtain a processing result and returns the result to the CPU.
  • In the embodiments of the present invention, a data preprocessor and a data splicer are provided in the slave node device. Because metadata is generated for a data set when the data set is stored in the first buffer, and the metadata records the storage addresses of the data of the data set in the first buffer, the data preprocessor can read the metadata from the first buffer of the CPU, read the data of the data set according to it, convert the data format according to the preset analytic function, generate a data block from the converted data set, and store the data block in the second buffer of the CPU, where the data splicer completes the splicing of the data block to the GPU. The data preprocessor thus reads the data of the data set from the first buffer automatically, without relying on a program written by the programmer; parsing the data according to the preset analytic function improves processing efficiency in the CPU and benefits subsequent optimization of the MapReduce architecture.
  • The data splicer reads the data block from the second buffer and splices it into the working buffer that the GPU has allocated for storing data blocks. If the splice fails, indicating that the remaining memory of that working buffer is insufficient to hold the data block, splicing is suspended and the GPU is triggered to process the data blocks already received; the pending data block remains in the second buffer and is spliced in the next round. Compared with the prior art, data splicing is completed automatically by the data splicer rather than by a program written by the programmer, which effectively prevents data block loss and improves splicing efficiency.
  • FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
  • FIG. 5-a is a schematic structural diagram of a data preprocessor according to an embodiment of the present invention.
  • FIG. 5-b is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention.
  • FIG. 5-c is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention.
  • FIG. 5-d is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention.
  • FIG. 6-a is a schematic structural diagram of a data splicer according to an embodiment of the present invention.
  • FIG. 6-b is a schematic structural diagram of a data splicer according to another embodiment of the present invention.
  • FIG. 6-c is a schematic structural diagram of a data splicer according to another embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a processor according to an embodiment of the present invention.
  • FIG. 8-a is a schematic structural diagram of a slave node device according to an embodiment of the present invention.
  • FIG. 8-b is a schematic diagram of the interaction between the CPU and the GPU in a slave node device according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a data processing device according to an embodiment of the present invention.
  • Embodiments of the present invention provide a data processing method and related devices, applied to a Hadoop cluster under the MapReduce architecture, that realize automatic data format conversion and automatic data splicing on Hadoop slave nodes, simplify the programmer's work, and benefit subsequent optimization of the MapReduce architecture.
  • An aspect of the present invention provides a data processing method, including:
  • The data preprocessor reads metadata from the first buffer of the CPU, where, when the data set obtained from the data fragment is stored in the first buffer, the metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer.
  • This embodiment of the present invention is applied to a Hadoop cluster under a MapReduce architecture. The Hadoop cluster includes a master node device and a slave node device; the slave node device includes a CPU and a GPU and obtains data fragments from the master node device; and a data preprocessor and a data splicer are provided in the CPU.
  • The metadata mainly records the storage addresses, in the first buffer, of the data in the data set.
  • The data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata.
  • The data preprocessor can read the data of the data set directly from the first buffer as indicated by the metadata, without relying on an additional program written by the programmer to read the data.
  • The data preprocessor converts the data of the data set into the data format indicated by a preset analytic function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • An analytic function is preconfigured in the MapReduce architecture. The data preprocessor parses the data of the data set in the first buffer according to this preset analytic function, converts it into the data format that the function indicates, and then generates a data block from the converted data set. A second buffer is allocated in the CPU for storing data blocks, and the data splicer then reads each data block from the second buffer and delivers it to the GPU.
  • Because the metadata is added for the data set when the data set is stored in the first buffer of the CPU, and includes the storage addresses of the data of the data set in the first buffer, the data preprocessor, after reading the metadata from the first buffer, reads the data of the data set according to those storage addresses, converts the data format using the preset analytic function, generates a data block from the converted data set, and stores it in the second buffer of the CPU. The operations of automatically reading and parsing the data of the first buffer are thus implemented by the data preprocessor, without relying on the programmer's code, which provides a more complete MapReduce architecture for the programmer and also benefits subsequent optimization of the architecture.
  • In the MapReduce architecture, a mapping (Map) function is specified to map input key-value pairs into new key-value pairs, and a concurrent reduction (Reduce) function ensures that all mapped key-value pairs sharing the same key are grouped together. After the Map function maps the input key-value pairs into new key-value pairs, the master node device in the Hadoop cluster divides all the new key-value pairs into data fragments by size and arranges for each slave node device to perform the corresponding processing on its fragment.
  • In the CPU of the slave node device, the RecordReader class is called to obtain the key-value pairs in the data fragment, and the values are extracted from the key-value pairs to form a data set.
  • The CPU allocates a DirectBuffer in its memory for the data set; the data set is stored in the DirectBuffer in the DirectBuffer's format, and metadata for the data set is added at the head of the DirectBuffer.
  • A preset analytic function for parsing the data of the data set is configured in advance; it specifically converts the data into a specified data format that supports logical operations.
  • In this embodiment, a data processing method may include:
  • The data preprocessor reads the metadata from the DirectBuffer, where the metadata specifically includes an address index array whose data elements correspond one-to-one to the data of the data set, each data element indicating the storage address of one piece of data of the data set in the DirectBuffer.
  • When the data set is stored into the DirectBuffer, metadata is added at the head of the DirectBuffer to indicate the storage addresses of the data of the data set within it. The metadata may include an address index array: as each piece of data is written at its position in the DirectBuffer, its storage address is appended to the array, so the array's data elements correspond one-to-one to the data of the data set and each element indicates where one piece of data is stored in the DirectBuffer.
  • The data stored in the DirectBuffer's data set shares a single data format, which may be a format that cannot be used directly in logical operations, such as text or raw binary.
  • The data preprocessor reads the data of the data set from the DirectBuffer according to the data elements of the address index array in the metadata.
  • Specifically, the data preprocessor starts reading from the storage address in the DirectBuffer indicated by a data element of the address index array and stops at the storage address indicated by the next data element, or at the end of the DirectBuffer; this yields one piece of data of the data set. It then reads the next piece, until all the data of the data set in the DirectBuffer has been read.
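The reading scheme above can be sketched in Java. This is a minimal illustration, not the patent's actual implementation: the class and method names are hypothetical, and a `java.nio` direct buffer stands in for the DirectBuffer. Each value's start offset is recorded in an address index array, and a value is recovered by reading from its offset up to the next offset (or the end of the written region for the last value).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: values are appended to a direct buffer and their start
// offsets recorded in an address index array; the reader recovers each value
// by reading from its offset up to the next offset (or the end of the data).
public class AddressIndexDemo {

    // Write the values into the buffer; return the start offset of each value.
    static int[] store(ByteBuffer buf, List<String> values) {
        int[] index = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            index[i] = buf.position();              // record the storage address
            buf.put(values.get(i).getBytes(StandardCharsets.UTF_8));
        }
        return index;
    }

    // Read value i: from its indexed address up to the next address, or up to
    // the end of the written region for the last value.
    static String read(ByteBuffer buf, int[] index, int i, int dataEnd) {
        int start = index[i];
        int end = (i + 1 < index.length) ? index[i + 1] : dataEnd;
        byte[] out = new byte[end - start];
        ByteBuffer dup = buf.duplicate();           // independent read position
        dup.position(start);
        dup.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(1024); // like a DirectBuffer
        List<String> values = List.of("42", "1729", "7");
        int[] index = store(direct, values);
        int dataEnd = direct.position();
        List<String> recovered = new ArrayList<>();
        for (int i = 0; i < index.length; i++) {
            recovered.add(read(direct, index, i, dataEnd));
        }
        System.out.println(recovered); // [42, 1729, 7]
    }
}
```

Because values are variable-length text, the index array is what makes each value recoverable without delimiters, which matches the role the patent assigns to the metadata.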
  • The data preprocessor converts the data of the data set, according to the preset analytic function, into the data format specified by that function, which supports logical operations.
  • The data stored in the DirectBuffer's data set is generally in a format that cannot be operated on logically and must be converted into an operable format before being transferred to the GPU. Therefore, the analytic function is preset in the MapReduce architecture, and the data preprocessor automatically converts the data into the logically operable format that the function specifies.
  • The data format specified by the preset analytic function may be the format required by the GPU's logical operations, for example integer, floating-point, or string data.
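A minimal sketch of such a conversion, under assumptions not stated in the patent: the "analytic function" is modeled as an ordinary Java function that parses decimal text into 32-bit integers, and the converted values are packed into a fixed-width buffer on which arithmetic becomes possible. The names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.function.Function;

// Hypothetical sketch of a preset "analytic function": raw values arrive as
// text (not directly usable in arithmetic) and the function converts them
// into a fixed-width binary format that supports logical operations.
public class AnalyticFunctionDemo {

    // Preset analytic function: parse decimal text into a 32-bit integer.
    static final Function<String, Integer> PARSE_INT = Integer::parseInt;

    // Convert a batch of text values into a packed block of ints.
    static ByteBuffer toDataBlock(String[] textValues, Function<String, Integer> fn) {
        ByteBuffer block = ByteBuffer.allocate(4 * textValues.length);
        for (String v : textValues) {
            block.putInt(fn.apply(v)); // text -> int: arithmetic is now possible
        }
        block.flip(); // make the block readable from the start
        return block;
    }

    public static void main(String[] args) {
        ByteBuffer block = toDataBlock(new String[] {"10", "-3", "256"}, PARSE_INT);
        System.out.println(block.getInt() + block.getInt() + block.getInt()); // 263
    }
}
```

The same shape works for floating-point or string targets by swapping the function and the element width, which is the flexibility the preset analytic function is meant to give the programmer.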
  • The data preprocessor then generates a data block from the format-converted data set. After each piece of data has been automatically converted into a logically operable format according to the preset analytic function, the converted data set is assembled into a data block to facilitate the subsequent splicing of data from the CPU to the GPU.
  • The data preprocessor stores the data block in a LaunchingBuffer, so that the data splicer reads the data block from the LaunchingBuffer and splices it to the GPU.
  • The CPU additionally allocates a LaunchingBuffer in memory to temporarily store format-converted data blocks; the data preprocessor stores each data block in the LaunchingBuffer, and the data splicer then completes reading the data block from the LaunchingBuffer and splicing it to the GPU.
  • The data stored in the CPU's DirectBuffer and the data to be processed by the GPU may use inconsistent storage formats, that is, they may handle endianness differently. In the little-endian storage format, the high-order bytes of a value are stored at the higher addresses and the low-order bytes at the lower addresses; in the big-endian storage format, the high-order bytes are stored at the lower addresses and the low-order bytes at the higher addresses. The data preprocessor therefore also needs to resolve the endianness of the data block.
  • The DirectBuffer allocated by the CPU has a member variable indicating whether data is stored in it in big-endian or little-endian format; this also indicates whether the storage format needs to be converted when the data is stored into the LaunchingBuffer, and gives the target of the conversion (big-endian to little-endian, or the reverse). For example, if the data of the data set is stored in the DirectBuffer in big-endian format while the GPU stores data in little-endian format, then when the data block is stored into the LaunchingBuffer its data is converted to, and saved in, little-endian format.
  • The data splicer can then read the data block directly from the LaunchingBuffer and splice it to the GPU. Keeping the storage formats of the CPU's LaunchingBuffer and the GPU consistent ensures that the GPU reads data blocks correctly for processing and avoids arithmetic errors caused by reading the high-order bytes of a value as the low-order bytes, or the reverse.
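The endianness fix-up described above can be sketched with `java.nio`, whose `ByteOrder` machinery handles exactly this case. The sketch is illustrative only (class and method names are hypothetical): each 32-bit value is read in the source buffer's byte order and rewritten in the target order while being copied.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: if the CPU-side buffer stores values big-endian but the
// GPU expects little-endian, the preprocessor rewrites each value in the GPU's
// byte order while copying it toward the launch buffer.
public class EndianDemo {

    static ByteBuffer convert(ByteBuffer src, ByteOrder targetOrder) {
        ByteBuffer dst = ByteBuffer.allocate(src.remaining()).order(targetOrder);
        ByteBuffer in = src.duplicate().order(src.order()); // keep source order
        while (in.remaining() >= 4) {
            dst.putInt(in.getInt()); // value re-encoded in the target byte order
        }
        dst.flip();
        return dst;
    }

    public static void main(String[] args) {
        ByteBuffer big = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN);
        big.putInt(0x01020304);
        big.flip();
        ByteBuffer little = convert(big, ByteOrder.LITTLE_ENDIAN);
        // Same logical value, opposite byte layout in memory.
        System.out.printf("%02x %02x%n", little.get(0), little.get(3)); // 04 01
    }
}
```

Note that `ByteBuffer.duplicate()` resets the byte order to big-endian, so the sketch restores the source order explicitly; missing that step is a classic source of exactly the high-byte/low-byte confusion the patent warns about.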
  • In summary, the data preprocessor first reads the address index array from the DirectBuffer, reads the corresponding data of the data set from the DirectBuffer according to the array's data elements, converts the data format according to the preset analytic function so that the converted data supports logical operations, generates a data block from the converted data set, and stores it in the LaunchingBuffer, from which the data splicer reads the data block and transfers it to the GPU.
  • In this embodiment of the invention, these steps are completed by the data preprocessor in the CPU, and the data is parsed automatically by the preset analytic function, which facilitates the GPU's operations on the data block; using the data preprocessor simplifies the programming work for the slave node device and benefits future optimization.
  • The CPU automatically allocates and reclaims the DirectBuffer and the LaunchingBuffer; the life cycle of the DirectBuffer is the processing time of one data fragment, and the life cycle of the LaunchingBuffer is the processing time of one data set.
  • A ResultBuffer is also allocated on the CPU to store the operation results returned by the GPU, which are then used as the input of the Reduce task in MapReduce.
  • Another aspect of the present invention provides a data processing method, including:
  • The data splicer reads, from the second buffer of the CPU, a data block generated by the data preprocessor.
  • This embodiment of the present invention is applied to a Hadoop cluster under a MapReduce architecture. The Hadoop cluster includes a master node device and a slave node device; the slave node device includes a CPU and a GPU and obtains data fragments from the master node device; and a data preprocessor and a data splicer are provided in the CPU.
  • The data preprocessor reads the data of the data set from the first buffer of the CPU, converts its data format, generates a data block from the converted data set, and stores the data block in the second buffer.
  • the data splicer mainly completes the splicing of data blocks from the CPU to the GPU.
  • The data splicer splices the data block into a working buffer that the GPU has allocated for storing data blocks.
  • The data splicer reads the data block from the second buffer of the CPU and splices it into the GPU's working buffer.
  • Data splicing is completed by the data splicer and no longer depends on code written by the programmer, which simplifies the programmer's work and facilitates subsequent optimization of the entire MapReduce architecture.
  • In this embodiment, a data processing method may include:
  • The data splicer reads a data block from the LaunchingBuffer.
  • The CPU also allocates a LaunchingBuffer in memory, which is mainly used to store the data blocks that need to be spliced to the GPU.
  • S420: The data splicer splices the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address, within the WorkingBuffer that the GPU has allocated for storing data blocks, at which the data block is to be stored.
  • A WorkingBuffer is allocated in the GPU's memory, mainly to store the data spliced from the CPU's LaunchingBuffer. The memory size of the WorkingBuffer is determined by the GPU itself, while the memory size of the DirectBuffer in the CPU is determined by the Java runtime environment; generally, the WorkingBuffer on the GPU is much larger than the DirectBuffer that Java supports in the CPU. The WorkingBuffer can therefore store at least one data block obtained from the DirectBuffer, but at some point its remaining memory may no longer be able to hold the next data block, a situation the data splicer handles correctly.
  • The data splicer manages a cursor parameter that indicates the start address at which the WorkingBuffer can store data. After each data block is spliced into the WorkingBuffer, the cursor parameter is updated accordingly, so that the next start address is always accurate; when a data block needs to be transferred to the WorkingBuffer, it is spliced in starting from the start address indicated by the cursor parameter.
  • the data in the data block read by the data splicer from the LaunchingBuffer can be directly logically operated, and meets the storage format requirement of the GPU for the data. Invoking an application programming interface (API) to stitch data in the data block to WorkingBuffer. If the remaining memory of the WorkingBuffer can be spliced out of the data block read from the CPU's LaunchingBuffer, the entire data block is spliced into the WorkingBuffer; if the remaining memory of the WorkingBuffer cannot be spliced, the data block read from the CPU's LaunchingBuffer is suspended. The data block is spliced, the data block is still stored in the LaunchingBuffer, and the GPU is triggered to start processing all the data blocks in the WorkingBuffer.
  • the data splicer in the CPU is used to handle the splicing of data blocks when the size of the DirectBuffer in the CPU and the remaining memory of the WorkingBuffer in the GPU are inconsistent.
  • the data splicer splices the data block from the LaunchingBuffer directly into the WorkingBuffer. If the remaining memory of the WorkingBuffer cannot hold the data block, the splicing operation is temporarily stopped; once the remaining memory of the WorkingBuffer can again hold it, the data block read into the LaunchingBuffer is spliced into the WorkingBuffer. Since the data blocks in the LaunchingBuffer already meet the GPU's requirements for data processing, the GPU can perform arithmetic processing directly after receiving a data block, which effectively improves the working efficiency of the GPU.
  • the data splicer notifies the GPU of the size of the data block.
  • the data splicer updates the cursor parameter.
  • after each data block is successfully spliced to the GPU, the data splicer notifies the GPU of the data block size, so that the GPU can use it directly, which reduces the GPU's workload.
  • the address index array is used to indicate the storage addresses at which the data is stored in the DirectBuffer.
  • a lookup index array may likewise be added to the header of a data block in the WorkingBuffer of the GPU; the lookup index array contains a data element corresponding to each datum of the data block, and each data element is used to indicate the storage address of its datum in the WorkingBuffer.
  • each time the data splicer splices a data block, the data element corresponding to each datum of the data block is added to the lookup index array, so that the GPU can quickly locate the data in the WorkingBuffer and read it for operation.
  • steps B1 and B2 are in no particular order and are not limited herein.
  • each data fragment received may eventually generate multiple data blocks.
  • the WorkingBuffer allocated in the GPU stores data in units of data blocks, and its lifetime is the processing time of one data fragment. After the data splicer successfully transmits an entire data fragment, it returns a transmission-success flag value to notify the master node device to allocate the next data fragment; if the data splicer fails to transmit the data fragment, it returns a transmission-failure flag value to notify the master node device to suspend allocation of the next data fragment.
  • the ResultBuffer is also allocated in the GPU memory and is used to save the operation result; the API is then called to return the operation result to the CPU, where it is stored in the ResultBuffer allocated by the CPU as an input of the Reduce task under MapReduce.
  • the DirectBuffer used to store the data set in the CPU, the LaunchingBuffer that stores data blocks after data format conversion, and the ResultBuffer used to store the results returned by the GPU are automatically allocated and reclaimed by the CPU; the life cycle of the DirectBuffer is the processing time of one data fragment, and the life cycle of the LaunchingBuffer is the processing time of one data block.
  • the WorkingBuffer used to store received data blocks in the GPU and the ResultBuffer storing the operation results are automatically allocated and reclaimed by the GPU; the life cycle of the WorkingBuffer is the processing time of one data fragment, and the ResultBuffer has the same life cycle as the WorkingBuffer.
  • the buffers in the CPU and the GPU are automatically synchronized; for example, the ResultBuffer in the CPU is synchronized with the WorkingBuffer and the ResultBuffer in the GPU.
  • an embodiment of the present invention further provides a data preprocessor 500, which may include:
  • a first reading unit 510, configured to read metadata from a first buffer of the CPU, where, when the data set acquired from the data fragment is stored into the first buffer, metadata is added for the data set at the header of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer;
  • a second reading unit 520, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;
  • a converting unit 530, configured to convert the data of the data set into the data format indicated by a preset analytic function according to the preset analytic function, and to generate a data block from the converted data set;
  • a storage unit 540, configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • the embodiment of the present invention is applied to a Hadoop cluster under the MapReduce architecture. The data preprocessor 500 is disposed in the CPU of a slave node device of the Hadoop cluster, and the CPU is further provided with a data splicer; each slave node device also has a GPU. The slave node device acquires data fragments from the master node device of the Hadoop cluster, then splices the values of the key-value pairs in a data fragment into a data set and stores it into the first buffer allocated in the CPU memory.
  • because the memory of the first buffer may not be able to store the values of all the key-value pairs in the data fragment at once, the values of the key-value pairs in the data fragment may be spliced into data sets multiple times.
  • when a data set is stored into the first buffer, metadata is added for the data set at the header of the first buffer, the metadata mainly including the storage addresses of the data of the data set in the first buffer.
  • the metadata is read from the first buffer by the first reading unit 510; the second reading unit 520 then reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; the converting unit 530 performs data format conversion on the data and generates a data block from the entire converted data set.
  • the storage unit 540 stores the data block in the second buffer of the CPU, which is allocated in memory by the CPU mainly to store data blocks, so that the data splicer can read the data block from the second buffer and transfer it to the working buffer of the GPU.
  • the data preprocessor thus automatically completes the reading of the data and the conversion of the data format; the programmer does not need to write a corresponding program, which reduces the programmer's programming work, facilitates subsequent optimization of the MapReduce architecture, and improves the working efficiency of the CPU.
  • the metadata specifically includes an address index array; the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element is used to indicate the storage address of its datum in the first buffer. As shown in FIG. 5-b, the second reading unit 520 may include:
  • a data reading unit 5210, configured to start reading at the storage address in the first buffer indicated by a data element of the address index array, and to stop reading at the storage address indicated by the next data element or at the end of the first buffer.
  • the data reading unit 5210 reads the corresponding data starting from the storage address in the first buffer indicated by a data element of the address index array, and stops when it reaches the storage address indicated by the next data element or the end of the first buffer, thereby reading one datum of the data set; it then continues with the next datum until all the data of the data set in the first buffer have been read.
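The reading rule just described — start at the address a data element points to, stop at the next element's address or at the end of the buffer — can be sketched as follows. A heap `ByteBuffer` stands in for the first buffer (DirectBuffer), and the names (`AddressIndexReader`, `readDatum`) are hypothetical, not from the patent.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of reading one datum from the first buffer using the
// address index array: datum i occupies [index[i], index[i+1]) in the buffer,
// and the last datum runs to the end of the stored data.
class AddressIndexReader {
    static byte[] readDatum(ByteBuffer firstBuffer, int[] addressIndex, int i) {
        int start = addressIndex[i];
        int end = (i + 1 < addressIndex.length) ? addressIndex[i + 1]
                                                : firstBuffer.limit();
        byte[] out = new byte[end - start];
        for (int k = 0; k < out.length; k++) {
            out[k] = firstBuffer.get(start + k);   // absolute get, no position change
        }
        return out;
    }
}
```

Iterating `i` over the whole address index array reads the data set datum by datum, exactly as unit 5210 does.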
  • the converting unit 530 includes:
  • a data format conversion unit 5310, configured to convert, by using the preset analytic function, the data of the data set into a data format that satisfies the logical operation specified by the analytic function;
  • a generating unit 5320, configured to generate the data block from the converted data set.
  • the data format specified by the preset analytic function may be the data format required by the logical operations of the GPU; the data format on which logical operations can be performed may be integer data, floating-point data, string data, or the like.
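As a concrete illustration of such a preset analytic function, the sketch below (hypothetical names, a minimal assumption-laden example rather than the patented code) parses string values taken from key-value pairs into integer data on which logical and arithmetic operations can be performed directly.

```java
// Hypothetical sketch of a preset analytic function: the raw values of the
// key-value pairs are strings, and the analytic function specifies that
// logical operations require integer data, so each datum is converted
// before the data block is generated.
class IntAnalyticFunction {
    static int[] toDataBlock(String[] dataSet) {
        int[] block = new int[dataSet.length];
        for (int i = 0; i < dataSet.length; i++) {
            block[i] = Integer.parseInt(dataSet[i].trim());  // format conversion
        }
        return block;
    }
}
```

A floating-point or string analytic function would follow the same shape, differing only in the target type of the conversion.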
  • the converting unit 530 may further include:
  • a format conversion unit 5330, configured to convert the data in the data block into the storage format of the GPU when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU.
  • the storage formats of the first buffer of the CPU and of the GPU may be inconsistent, that is, they may handle the big-endian/little-endian byte-order problem differently. In the little-endian storage format, the high-order bytes of a datum are stored at the higher addresses and the low-order bytes at the lower addresses; in the big-endian storage format, the high-order bytes of a datum are stored at the lower addresses and the low-order bytes at the higher addresses.
  • the first buffer allocated by the CPU has its own member variable indicating whether the data is stored in the first buffer in big-endian or little-endian format; it also indicates whether the storage format needs to be converted when the data is stored into the second buffer, and prompts whether the conversion should be to big-endian or to little-endian format. For example, if the data of the data set is stored in the first buffer in big-endian format while the GPU stores data in little-endian format, the format conversion unit 5330 converts the data block into little-endian format and stores it into the second buffer.
  • in this way the data splicer can directly read the data block from the second buffer and splice it to the GPU; ensuring that the second buffer of the CPU and the GPU store data in the same format ensures that the GPU can correctly read the data block for arithmetic processing, avoiding operation errors caused by reading the high-order bytes of a datum as its low-order bytes or vice versa.
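The endianness concern above maps directly onto Java's `ByteBuffer` byte-order API. The following minimal sketch (hypothetical names; it assumes the block is a sequence of 4-byte integers) shows how data written in big-endian format can be re-stored in little-endian format before being handed to a little-endian consumer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of the format conversion unit: data arrives in
// big-endian order and is rewritten into little-endian order.
// Assumes the input length is a multiple of 4 (a sequence of ints).
class EndianConverter {
    static byte[] toLittleEndian(byte[] bigEndianInts) {
        ByteBuffer src = ByteBuffer.wrap(bigEndianInts).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer dst = ByteBuffer.allocate(bigEndianInts.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        while (src.hasRemaining()) {
            dst.putInt(src.getInt());   // read big-endian, write little-endian
        }
        return dst.array();
    }
}
```

The big-endian bytes `00 00 00 01` (the int 1) come out as `01 00 00 00`, which is the byte order a little-endian reader expects.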
  • an embodiment of the present invention further provides a data splicer 600, which may include:
  • a third reading unit 610, configured to read, from a second buffer of the CPU, a data block generated by the data preprocessor;
  • a splicing processing unit 620, configured to splice the data block into the working buffer of the GPU allocated to store data blocks.
  • the embodiment of the present invention is applied to a Hadoop cluster under the MapReduce architecture. The data splicer 600 is disposed in the CPU of a slave node device of the Hadoop cluster, and the CPU is further provided with a data preprocessor 500 as shown in FIG. 5-a.
  • each slave node device further includes a GPU. The slave node device acquires data fragments from the master node device of the Hadoop cluster, then splices the values of the key-value pairs in a data fragment into a data set and stores it into the first buffer allocated in the CPU memory; because the memory of the first buffer may not be able to store the values of all the key-value pairs in the data fragment at once, the values of the key-value pairs in the data fragment may be spliced into data sets multiple times.
  • the data preprocessor 500 reads data from the first buffer according to the metadata, converts the data format, generates a data block from the entire converted data set, and stores the data block into the second buffer in the CPU. The third reading unit 610 of the data splicer then reads the data block from the second buffer of the CPU, and the splicing processing unit 620 splices the read data block into the working buffer of the GPU allocated to store data blocks.
  • because the data preprocessor 500 completes the data format conversion and the data splicer completes the data block splicing, there is no longer any reliance on the programmer to write a corresponding program, which simplifies the programmer's work; the automatic operation of the data preprocessor 500 and the data splicer 600 can improve the working efficiency of the CPU and is also beneficial to subsequent optimization of MapReduce.
  • the data splicer 600 manages a cursor parameter that indicates the starting address at which the working buffer of the GPU can store data. After each data block is spliced into the working buffer of the GPU, the cursor parameter is updated accordingly, so that the starting address at which the GPU's working buffer can store data is known exactly the next time; the splicing processing unit 620 splices the data block into the working buffer of the GPU starting from the starting address indicated by the cursor parameter.
  • the splicing processing unit 620 is specifically configured to splice the data block starting from the starting address indicated by the cursor parameter, where the cursor parameter is used to indicate the starting address, in the working buffer of the GPU allocated to store data blocks, that is available for storing a data block.
  • the data splicer further includes:
  • a trigger processing unit 630, configured to: when the data splicer fails to splice the data block into the working buffer of the GPU allocated to store data blocks, suspend splicing of the data block and trigger the GPU to process the data blocks stored in the working buffer.
  • the data in the data block read by the third reading unit 610 of the data splicer 600 from the second buffer can be logically operated on directly and meets the GPU's storage format requirements for the data.
  • the API is called to splice the data in the data block into the working buffer of the GPU. If the remaining memory of the GPU's working buffer can hold the data block read from the second buffer of the CPU, the entire data block is spliced into the working buffer of the GPU; if it cannot, that is, when splicing the data block fails, splicing of the data block is suspended, the data block remains stored in the second buffer, and the trigger processing unit 630 triggers the GPU to start processing all the data blocks in the working buffer.
  • the data splicer 600 may further include:
  • a notification unit 640, configured to notify the GPU of the size of the data block;
  • an updating unit 650, configured to update the cursor parameter.
  • after each data block is successfully spliced to the GPU, the notification unit 640 notifies the GPU of the data block size, so that the GPU can use it directly, which reduces the GPU's workload; the updating unit 650 then updates the cursor parameter.
  • the embodiment of the present invention provides a processor 700, which includes a data preprocessor 500 as shown in FIG. 5-a and a data splicer 600 as shown in FIG. 6-a; the data preprocessor 500 and the data splicer 600 have been introduced above and will not be described again here.
  • the first buffer and the second buffer are automatically allocated and reclaimed in the CPU.
  • the life cycle of the first buffer is the processing time of one data fragment
  • the life cycle of the second buffer is the processing time of one data block.
  • the working buffer is automatically allocated in the GPU, and the life cycle of the working buffer is the processing time of one data fragment.
  • an embodiment of the present invention further provides a slave node device, which may include:
  • processor CPU-700 as shown in FIG. 7 above, and a graphics processor GPU-800;
  • the CPU-700 is as described above and will not be described here.
  • the data preprocessor in the CPU-700 is configured to perform data format conversion on a data set obtained from a data fragment and to generate a data block from the converted data set; the data splicer in the CPU-700 splices the data block into the working buffer of the GPU-800 allocated to store data blocks;
  • the GPU-800 is configured to process the data block to obtain a processing result and then return the processing result to the CPU-700.
  • a ResultBuffer is automatically allocated and reclaimed in the CPU-700, and a ResultBuffer is automatically allocated and reclaimed in the GPU-800; the ResultBuffer in the CPU-700 has the same lifetime as the ResultBuffer in the GPU-800, and both are used to store the operation result.
  • the first buffer allocated by CPU-700 is DirectBuffer
  • the second buffer is LaunchingBuffer
  • the working buffer allocated by GPU-800 is WorkingBuffer
  • Figure 8-b is a schematic diagram of the interaction between the CPU-700 and the GPU-800 in the slave node device provided by the embodiment of the present invention. As shown in FIG. 8-b:
  • the data preprocessor 500 and the data splicer 600 are set in the CPU-700.
  • DirectBuffer, LaunchingBuffer and ResultBuffer are allocated in the CPU-700.
  • the DirectBuffer stores a data set that needs data format conversion; the data set includes data composed of the values of key-value pairs, and metadata is added in the DirectBuffer.
  • the metadata mainly includes the storage addresses of the data of the data set in the DirectBuffer; the data preprocessor 500 can read the data of the data set from the DirectBuffer according to the metadata, automatically perform data format conversion on the data through the specified preset analytic function, and generate a data block from the converted data set.
  • the data preprocessor 500 stores the data block into the LaunchingBuffer. If the storage format of the data in the data block needs to be converted when it is stored into the LaunchingBuffer, the storage format is converted to ensure that the data storage format in the LaunchingBuffer is the same as that of the WorkingBuffer in the GPU-800.
  • the data splicer 600 splices the data block from the LaunchingBuffer into the WorkingBuffer in the GPU-800. If the splicing fails, that is, the WorkingBuffer can no longer store the data block, the GPU is triggered to perform operation processing on the data blocks stored in the WorkingBuffer; the GPU stores the operation result in its ResultBuffer, and the API is called to transfer the operation result to the ResultBuffer in the CPU.
  • an embodiment of the present invention further provides a data processing device, which may include: a memory 910 and at least one processor 920 (taking one processor in FIG. 9 as an example).
  • the memory 910 and the processor 920 may be connected by a bus or other means, wherein FIG. 9 is exemplified by a bus connection.
  • the processor 920 may perform the following steps: the data preprocessor reads metadata from a first buffer of the CPU, where, when a data set obtained from a data fragment is stored into the first buffer, metadata is added for the data set at the header of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer; the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; the data preprocessor converts the data of the data set into the data format indicated by a preset analytic function according to the preset analytic function, generates a data block from the converted data set, and stores it in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • the data splicer reads, from the second buffer of the CPU, a data block generated by the data preprocessor; the data splicer splices the data block into the working buffer of the GPU allocated to store data blocks.
  • the processor 920 may further perform the following step: the data preprocessor starts reading at the storage address in the first buffer indicated by a data element of the address index array, and stops reading at the storage address indicated by the next data element or at the end of the first buffer.
  • the processor 920 may further perform the following step: the data preprocessor converts, according to the preset analytic function, the data of the data set into the data format specified by the analytic function that satisfies the logical operation.
  • the processor 920 may also perform the step of: the data preprocessor converting the data in the data block into the storage format of the GPU.
  • the processor 920 may further perform the following step: when the data splicer fails to splice the data block into the working buffer of the GPU allocated to store data blocks, splicing of the data block is suspended and the GPU is triggered to process the data blocks stored in the working buffer.
  • the processor 920 may further perform the step of: the data splicer splicing the data block starting from the starting address indicated by a cursor parameter, the cursor parameter being used to indicate the starting address, in the working buffer of the GPU allocated to store data blocks, that is available for storing a data block.
  • the processor 920 may further perform the following steps: the data splicer notifies the GPU of the size of the data block; the data splicer updates the cursor parameter.
  • the memory 910 can be used to store data sets, metadata, and data blocks;
  • the memory 910 can also be used to store an array of address indices.
  • the memory 910 can also be used to store cursor parameters.
  • the memory 910 can also be used to store the results of the operations.

Abstract

A data processing method and a related device implement automatic data format conversion and automatic data splicing in a slave node device of a Hadoop cluster. The method mainly comprises: a data preprocessor reads metadata from a first buffer of a CPU, reads the data of a data set from the first buffer on the basis of the storage addresses indicated by the metadata, converts the data of the data set, on the basis of a preset analytic function, into the data format indicated by the preset analytic function, and stores the data block generated from the converted data set in a second buffer of the CPU, thus allowing a data splicer to read the data block from the second buffer and splice it to a GPU.

Description

Data processing method and related equipment

Technical field

The present invention relates to the field of information processing technologies, and in particular to a data processing method and related equipment.
Background technique

Together with cloud computing, big data has brought a new revolution to information technology (IT). Cloud computing has powerful big-data computing capability and very fast computing speed, but the transmission of big data has become a major problem.

MapReduce (for which there is no unified Chinese translation in the field) is a well-known cloud computing architecture provided by the search engine company Google for parallel computation on large-scale data sets (larger than 1 TB), and Hadoop (for which there is likewise no unified Chinese translation) is a concrete implementation of the MapReduce architecture; a Hadoop cluster is divided into master node devices and slave node devices. The master node device uses the Map function provided by MapReduce to split a data set by size into M data fragments and distributes the data fragments to multiple slave node devices for parallel processing. Specifically, each slave node device obtains the values of key-value pairs from its data fragment and splices the values into a buffer allocated by the slave node device's central processing unit (CPU); afterwards, the values of the key-value pairs are read from the buffer and parsed, for example by converting their data format, and the parsed values are then spliced through an application programming interface (API) into a buffer allocated for storing data by the slave node device's graphics processing unit (GPU), where the GPU performs the computation.

When implementing the above solution, the inventors found that, because the MapReduce architecture provides no analytic function, parsing the values of the key-value pairs depends on a corresponding program written by the programmer; at the same time, because the size of the buffer in which the CPU stores the values of the key-value pairs may be inconsistent with the size of the buffer the GPU allocates for storing data, and the MapReduce architecture provides no corresponding judgment method, judging whether the CPU and GPU buffers are consistent likewise depends on a corresponding judgment function written by the programmer, which reduces the execution efficiency of the slave node device.
Summary of the invention

In view of the above drawbacks, the embodiments of the present invention provide a data processing method and related device, applied to a Hadoop cluster under the MapReduce architecture, which can improve the working efficiency of the slave node devices in the Hadoop cluster, simplify the programmer's programming work, and facilitate subsequent optimization of the MapReduce architecture.
In a first aspect, the present invention provides a data processing method applied to a Hadoop cluster under the MapReduce architecture, where the Hadoop cluster includes a master node device and a slave node device, the slave node device includes a processor (CPU) and a graphics processor (GPU), the slave node device obtains data fragments from the master node device, and the CPU is provided with a data preprocessor and a data splicer. The method includes:

the data preprocessor reads metadata from a first buffer of the CPU, where, when a data set obtained from a data fragment is stored into the first buffer, metadata is added for the data set at the header of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer;

the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata;

the data preprocessor converts the data of the data set into the data format indicated by a preset analytic function according to the preset analytic function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
With reference to the first aspect, in a first possible implementation, the metadata specifically includes an address index array; the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element is used to indicate the storage address of its datum in the first buffer. Accordingly, the data preprocessor reading the data of the data set from the first buffer according to the storage addresses indicated by the metadata includes: the data preprocessor starts reading at the storage address in the first buffer indicated by a data element of the address index array, and stops reading at the storage address indicated by the next data element or at the end of the first buffer.

With reference to the first aspect, in a second possible implementation, converting the data of the data set into the data format indicated by the preset analytic function includes: the data preprocessor converts, according to the preset analytic function, the data of the data set into the data format specified by the analytic function that satisfies the logical operation.

With reference to the second possible implementation of the first aspect, in a third possible implementation, when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU, generating the data block from the converted data set is followed by: the data preprocessor converts the data in the data block into the storage format of the GPU.

With reference to the first aspect, or the first, second, or third possible implementation of the first aspect, in a fourth possible implementation, the data set is specifically composed of the spliced values of a plurality of key-value pairs in the data fragment.

With reference to the first aspect, or the first, second, or third possible implementation of the first aspect, in a fifth possible implementation, the first buffer and the second buffer are automatically allocated and reclaimed by the CPU; the life cycle of the first buffer is the processing time of one data fragment, and the life cycle of the second buffer is the processing time of one data set.
In a second aspect, the present invention provides a data processing method applied to a Hadoop cluster under the MapReduce architecture, where the Hadoop cluster includes a master node device and a slave node device, the slave node device includes a processor (CPU) and a graphics processor (GPU), the slave node device obtains data fragments from the master node device, and the CPU is provided with a data preprocessor and a data splicer. The method includes:

the data splicer reads, from a second buffer of the CPU, a data block generated by the data preprocessor;

the data splicer splices the data block into a working buffer of the GPU allocated to store data blocks.

With reference to the second aspect, in a first possible implementation, when the data splicer fails to splice the data block into the working buffer of the GPU allocated to store data blocks, splicing of the data block is suspended and the GPU is triggered to process the data blocks stored in the working buffer.

With reference to the second aspect, or the first possible implementation of the second aspect, in a second possible implementation, the data splicer splices the data block starting from the starting address indicated by a cursor parameter, where the cursor parameter is used to indicate the starting address, in the working buffer of the GPU allocated to store data blocks, that is available for storing a data block.

With reference to the second possible implementation of the second aspect, in a third possible implementation, after the data block is successfully spliced, the method further includes: the data splicer notifies the GPU of the size of the data block; the data splicer updates the cursor parameter.
A third aspect of the present invention provides a data preprocessor, including:
a first reading unit, configured to read metadata from a first buffer of the CPU, where, when a data set obtained from a data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer;
a second reading unit, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;
a conversion unit, configured to convert the data of the data set into the data format indicated by a preset parsing function, and generate a data block from the converted data set; and
a storage unit, configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.
With reference to the third aspect, in a first possible implementation, the metadata specifically includes an address index array, the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element indicates the storage address of a datum of the data set in the first buffer; accordingly, the second reading unit includes: a data reading unit, configured to start reading at the storage address in the first buffer indicated by a data element of the address index array, and stop reading at the storage address indicated by the next data element or at the end of the first buffer.
With reference to the third aspect, or the first possible implementation of the third aspect, in a second possible implementation, the parsing unit includes: a data format conversion unit, configured to convert the data of the data set, by means of the preset parsing function, into a data format specified by the parsing function on which logical operations can be performed; and a generation unit, configured to generate a data block from the converted data set.
With reference to the third aspect, in a third possible implementation, the parsing unit further includes: a format conversion unit, configured to convert the data in the data block into the storage format used in the GPU when the storage format used by the first buffer for the data of the data set is inconsistent with the storage format used for data in the GPU.
A fourth aspect of the present invention provides a data splicer, including:
a third reading unit, configured to read, from a second buffer of the CPU, a data block generated by the data preprocessor; and
a splicing processing unit, configured to splice the data block into a working buffer in the GPU allocated for storing data blocks.
With reference to the fourth aspect, in a first possible implementation, the data splicer further includes: a trigger processing unit, configured to, when the data splicer fails to splice the data block into the working buffer in the GPU allocated for storing data blocks, suspend the splicing of the data block and trigger the GPU to process the data blocks stored in the working buffer.
With reference to the fourth aspect, or the first possible implementation of the fourth aspect, in a second possible implementation, the splicing processing unit is specifically configured to splice the data block starting from a start address indicated by a cursor parameter, where the cursor parameter indicates the start address available for storing a data block in the working buffer of the GPU allocated for storing data blocks.
With reference to the second possible implementation of the fourth aspect, in a third possible implementation, the data splicer further includes: a notification unit, configured to notify the GPU of the size of the data block; and an update unit, configured to update the cursor parameter.
A fifth aspect of the present invention provides a processor, which may include the data preprocessor according to the third aspect and the data splicer according to the fourth aspect.
With reference to the fifth aspect, in a first possible implementation, the processor automatically allocates and reclaims the first buffer and the second buffer, where the lifetime of the first buffer is the processing time of one data slice, and the lifetime of the second buffer is the processing time of one data set.
A sixth aspect of the present invention provides a slave node device, which may include the CPU according to the fifth aspect and a graphics processing unit (GPU), where the data preprocessor in the CPU is configured to convert the data format of a data set obtained from a data slice and generate a data block from the format-converted data set, and the data splicer in the CPU splices the data block into a working buffer in the GPU allocated for storing data blocks; and the GPU is configured to process the data block to obtain a processing result and return the processing result to the CPU.
It can be seen from the above technical solutions that the embodiments of the present invention have the following advantages:
On one hand, in the embodiments of the present invention, a data preprocessor and a data splicer are provided in the slave node device. The data preprocessor reads metadata from the first buffer of the CPU; because the metadata is generated for the data set when the data set is stored into the first buffer and indicates the storage addresses of the data of the data set in the first buffer, the data preprocessor can read the data of the data set from the first buffer according to the metadata, convert the format of the data according to a preset parsing function, generate a data block from the converted data set, and store the data block in the second buffer of the CPU, so that the data splicer completes splicing the data block into the GPU. Compared with the prior art, because metadata including the storage addresses is added for the data of the data set when the data set is stored into the first buffer, the data preprocessor can read the data of the data set from the first buffer automatically, without relying on the programmer to write a corresponding program. Furthermore, the data preprocessor can parse the data of the data set according to the preset parsing function, which improves processing efficiency in the CPU and also facilitates subsequent optimization of the MapReduce architecture.
On the other hand, the data splicer reads the data block from the second buffer and splices it into the working buffer in the GPU allocated for storing data blocks. If the splicing fails, indicating that the remaining memory of that working buffer is insufficient to complete the splicing of the data block, the splicing of this data block is temporarily stopped, and the GPU is instead triggered to perform operations on the data blocks already buffered; the data block remains temporarily stored in the second buffer and is spliced at the next attempt. Compared with the prior art, no program written by the programmer is needed: the data splicer completes the data block splicing automatically, which effectively prevents data block loss and improves splicing efficiency.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the embodiments are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a data processing method according to another embodiment of the present invention;
FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a data processing method according to another embodiment of the present invention;
FIG. 5-a is a schematic structural diagram of a data preprocessor according to an embodiment of the present invention;
FIG. 5-b is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention;
FIG. 5-c is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention;
FIG. 5-d is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention;
FIG. 6-a is a schematic structural diagram of a data splicer according to an embodiment of the present invention;
FIG. 6-b is a schematic structural diagram of a data splicer according to another embodiment of the present invention;
FIG. 6-c is a schematic structural diagram of a data splicer according to another embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a processor according to an embodiment of the present invention;
FIG. 8-a is a schematic structural diagram of a slave node device according to an embodiment of the present invention;
FIG. 8-b is a schematic diagram of the interaction between the CPU and the GPU in a slave node device according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a data processing device according to an embodiment of the present invention.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention provide a data processing method and related devices, applied to a Hadoop cluster under the MapReduce architecture, that implement automatic data format conversion and automatic data splicing on a Hadoop slave node device, which simplifies the programmer's work and facilitates subsequent optimization of the MapReduce architecture.
As shown in FIG. 1, an aspect of the present invention provides a data processing method, including:
S110: A data preprocessor reads metadata from a first buffer of the CPU, where, when a data set obtained from a data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer.
The embodiments of the present invention are applied to a Hadoop cluster under the MapReduce architecture. The Hadoop cluster includes a master node device and a slave node device; the slave node device includes a CPU and a GPU and obtains data slices from the master node device; and a data preprocessor and a data splicer are provided in the CPU.
A first buffer is allocated in the CPU to store the data set obtained from a data slice. When the data set is stored into the first buffer, metadata is added for the data set at the head of the first buffer; the metadata mainly includes the storage addresses, in the first buffer, of the data of the data set.
S120: The data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata.
Because the metadata includes the storage addresses of the data set in the first buffer, the data preprocessor can read the data of the data set directly from the first buffer as indicated by the metadata, without relying on the programmer to write an additional program to read the data.
S130: The data preprocessor converts the data of the data set into the data format indicated by a preset parsing function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.
In addition, a parsing function is preset in the MapReduce architecture. The data preprocessor can parse the data of the data set in the first buffer according to the preset parsing function, convert the data into the data format corresponding to the preset parsing function, and then generate a data block from the converted data set. Meanwhile, a second buffer is also allocated in the CPU to store data blocks, and the data splicer can read the data block from the second buffer and splice it into the GPU.
In the embodiments of the present invention, because metadata is added for the data set when the data set is stored into the first buffer of the CPU, and the metadata includes the storage addresses of the data of the data set in the first buffer, the data preprocessor first reads the metadata from the first buffer, reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata, converts the data format by means of the preset parsing function, generates a data block from the format-converted data set, and stores it in the second buffer of the CPU. Reading the data of the first buffer and parsing the data are thus completed automatically by the data preprocessor, with no additional reliance on the programmer's code, which provides programmers with a more complete MapReduce framework and also facilitates subsequent optimization of the MapReduce architecture.
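For illustration only, the flow of S110 to S130 may be sketched as the following minimal Python simulation. The buffer layout (a metadata header of offsets followed by the raw bytes) and all function names are assumptions made for the example, not the actual implementation; the `parse` argument plays the role of the preset parsing function of S130.

```python
# Minimal sketch of the S110-S130 preprocessor flow (illustrative only).
# The first buffer is modeled as a dict whose "metadata" head holds the
# start offsets of each datum and whose "data" body holds the raw bytes.

def write_first_buffer(values):
    """Store a data set and prepend metadata (the offset of each datum)."""
    payload = b"".join(values)
    offsets, pos = [], 0
    for v in values:
        offsets.append(pos)
        pos += len(v)
    return {"metadata": offsets, "data": payload}  # buffer head + body

def preprocess(buffer, parse):
    """S110: read metadata; S120: read data by address; S130: convert."""
    offsets = buffer["metadata"]                 # S110: read the metadata
    data = buffer["data"]
    bounds = offsets + [len(data)]               # last datum ends at buffer end
    raw = [data[bounds[i]:bounds[i + 1]] for i in range(len(offsets))]  # S120
    return [parse(r) for r in raw]               # S130: format conversion

first_buffer = write_first_buffer([b"17", b"42", b"8"])
second_buffer = preprocess(first_buffer, parse=lambda b: int(b))
assert second_buffer == [17, 42, 8]              # data block, now operable
```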
It can be understood that, in the MapReduce architecture, a map function is specified to map input key-value pairs into new key-value pairs, and a concurrent reduce function is specified to ensure that all mapped key-value pairs sharing the same key are grouped and processed together. After the map function maps the input key-value pairs into new key-value pairs, the master node device in the Hadoop cluster divides all the new key-value pairs into different data slices according to data size, and assigns the data slices to the slave node devices for corresponding processing.
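As a purely illustrative sketch (a word-count style example, not part of the claimed method), the map and reduce stages described above can be written as:

```python
# Illustrative map/reduce pair in the style described above (word count);
# the real functions run inside Hadoop, this merely shows the key-value flow.
from collections import defaultdict

def map_fn(record):
    # map an input record into new key-value pairs
    for word in record.split():
        yield (word, 1)

def reduce_fn(key, values):
    # all mapped pairs sharing the same key are reduced together
    return (key, sum(values))

groups = defaultdict(list)
for key, value in map_fn("a b a"):
    groups[key].append(value)            # shuffle: group by key
result = dict(reduce_fn(k, vs) for k, vs in groups.items())
assert result == {"a": 2, "b": 1}
```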
In the CPU of the slave node device, the RecordReader class is called to obtain the key-value pairs in the data slice, and the values in the key-value pairs are extracted and concatenated into a data set. The CPU allocates a DirectBuffer in its memory for the data set, and the data set is stored into the DirectBuffer in the format required by the DirectBuffer; when the data set is stored into the DirectBuffer, metadata is added for the data set at the head of the DirectBuffer. Meanwhile, in the MapReduce architecture, a preset parsing function for parsing the data of the data set is provided; the preset parsing function converts the data into a specified data format on which logical operations can be performed. A data preprocessor is provided in the CPU, and the data preprocessor reads the data from the DirectBuffer according to the metadata and automatically converts the data format by means of the preset parsing function. The embodiment provided in FIG. 1 is described in detail below. Referring to FIG. 2, a data processing method may include:
S210: The data preprocessor reads metadata from the DirectBuffer, where the metadata specifically includes an address index array, the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element indicates the storage address of a datum of the data set in the DirectBuffer.
Specifically, when the data set is stored into the DirectBuffer, metadata is added at the head of the DirectBuffer to indicate the storage addresses of the data of the data set in the DirectBuffer. It can be understood that the metadata may include an address index array: when the data set is stored into the DirectBuffer, the storage address of each datum is added to the address index array according to its position in the DirectBuffer. The address index array has data elements in one-to-one correspondence with the data of the data set, each indicating the storage address of a datum in the DirectBuffer. Generally, the data stored in the DirectBuffer share the same data format, which may be a text, binary, or other format on which logical operations cannot be performed.
S220: The data preprocessor reads the data of the data set from the DirectBuffer according to the data elements of the address index array in the metadata.
Specifically, the data preprocessor starts reading from the storage address in the DirectBuffer indicated by a data element of the address index array, and stops at the storage address indicated by the next data element or at the end of the DirectBuffer, thereby obtaining one datum of the data set; it then continues with the next datum until all the data of the data set in the DirectBuffer have been read.
S230: The data preprocessor converts the data of the data set, according to the preset parsing function, into the data format specified by the preset parsing function on which logical operations can be performed.
The data stored in the DirectBuffer are generally in a data format on which logical operations cannot be performed, and must be converted into a format that supports logical operations before being transferred to the GPU for computation. Therefore, a parsing function is preset in the MapReduce architecture, and the data preprocessor automatically performs the data format conversion according to the preset parsing function, converting the data into the format, specified by the parsing function, on which logical operations can be performed.
Optionally, the data format specified by the preset parsing function may be the data format required by the GPU for its logical operations. Specifically, the formats supporting logical operations specified by the preset parsing function may include integer data, floating-point data, string data, and the like.
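For example, hypothetical preset parsing functions might convert the textual bytes held in the DirectBuffer into operable integer or floating-point values (the function names are illustrative, not defined by the framework):

```python
# Hypothetical preset parsing functions: convert text-format data (on which
# no arithmetic can be done) into formats that support logical operations.
def parse_int(raw: bytes) -> int:
    return int(raw.decode("ascii"))

def parse_float(raw: bytes) -> float:
    return float(raw.decode("ascii"))

assert parse_int(b"42") + 1 == 43                   # integer data, now operable
assert abs(parse_float(b"3.5") * 2 - 7.0) < 1e-9    # floating-point data
```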
S240: The data preprocessor generates a data block from the format-converted data set.
After the data preprocessor automatically converts each datum into the format specified by the preset parsing function on which logical operations can be performed, the format-converted data set is assembled into a data block to facilitate the subsequent splicing of data between the CPU and the GPU.
S250: The data preprocessor stores the data block in the LaunchingBuffer, so that the data splicer reads the data block from the LaunchingBuffer and splices it into the GPU.
Specifically, the CPU also allocates a LaunchingBuffer in memory to temporarily store the format-converted data block: the data preprocessor stores the data block in the LaunchingBuffer, and the data splicer then reads the data block from the LaunchingBuffer and splices it into the GPU.
It can be understood that the data stored in the DirectBuffer of the CPU and the data to be processed by the GPU may be inconsistent in storage format, that is, they may handle endianness differently. In the little-endian storage format, the high-order bytes of a datum are stored at the high addresses and the low-order bytes at the low addresses; in the big-endian storage format, the high-order bytes are stored at the low addresses and the low-order bytes at the high addresses. Therefore, the data preprocessor also needs to resolve the endianness of the data block.
A DirectBuffer allocated by the CPU carries a member variable indicating whether the data are stored in the DirectBuffer in big-endian or little-endian format, as well as an indication of whether the storage format needs to be converted when storing into the LaunchingBuffer, together with a hint of whether to convert to big-endian or little-endian. For example, if the data of the data set are stored in the DirectBuffer in big-endian format while the GPU stores data in little-endian format, then, when the data block is stored into the LaunchingBuffer, its data are stored in the LaunchingBuffer in little-endian format. The data splicer can then read the data block directly from the LaunchingBuffer and splice it into the GPU, ensuring that the LaunchingBuffer of the CPU and the GPU use the same storage format, so that the GPU can read the data block correctly for computation, avoiding errors caused by reading high-order bytes as low-order bytes or low-order bytes as high-order bytes.
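The big-endian to little-endian rewrite described in this example can be illustrated with Python's struct module; the buffer roles here are simulated assumptions:

```python
import struct

def to_gpu_order(big_endian_bytes: bytes) -> bytes:
    """Re-store a 32-bit integer little-endian, as a little-endian GPU expects
    (simulates the DirectBuffer -> LaunchingBuffer format conversion)."""
    (value,) = struct.unpack(">i", big_endian_bytes)  # read big-endian
    return struct.pack("<i", value)                   # write little-endian

direct_buffer = struct.pack(">i", 258)         # big-endian: 00 00 01 02
launching_buffer = to_gpu_order(direct_buffer)
assert launching_buffer == b"\x02\x01\x00\x00"
assert struct.unpack("<i", launching_buffer)[0] == 258  # GPU reads it correctly
```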
In this embodiment of the present invention, the data preprocessor first reads the address index array from the DirectBuffer, reads the corresponding data of the data set from the DirectBuffer according to the data elements of the address index array, and then performs the data format conversion on the data of the data set according to the preset parsing function, so that the converted data can be used in logical operations. The data set is assembled into a data block and stored in the LaunchingBuffer, from which the data splicer reads the data block and transfers it to the GPU. These operations are completed by the data preprocessor in the CPU on its own, with the data parsed automatically by the preset parsing function, which facilitates the GPU's computation on the data block; using the data preprocessor simplifies the programming work on the slave node device and facilitates later optimization.
The CPU automatically allocates and reclaims the WorkingBuffer and the LaunchingBuffer, where the lifetime of a WorkingBuffer is the processing time of one data slice, and the lifetime of a LaunchingBuffer is the time taken to process one data set. In addition, a ResultBuffer is allocated on the CPU to store the computation results returned by the GPU; these results then serve as the input of the Reduce task in MapReduce.
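The buffer lifetimes described above may be sketched, under assumed names and a much-simplified control flow, as:

```python
# Simulated buffer lifetimes: a LaunchingBuffer lives for one data set,
# a WorkingBuffer for one data slice, and a ResultBuffer collects the
# GPU results that later feed the Reduce task. All names are illustrative.

def process_slice(data_sets, gpu_compute):
    working_buffer = []                     # allocated once per data slice
    result_buffer = []
    for data_set in data_sets:
        launching_buffer = list(data_set)   # allocated per data set ...
        working_buffer.extend(launching_buffer)
        del launching_buffer                # ... reclaimed after the data set
    result_buffer.append(gpu_compute(working_buffer))
    del working_buffer                      # reclaimed with the data slice
    return result_buffer                    # input of the Reduce task

assert process_slice([[1, 2], [3]], sum) == [6]
```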
如图3所示,本发明实施例另一方面提供一种数据处理方法,包括:As shown in FIG. 3, another aspect of the present invention provides a data processing method, including:
S310、数据拼接器从CPU的第二缓冲区读取数据预处理器生成的数据块;S310. The data splicer reads the data block generated by the data preprocessor from the second buffer of the CPU.
本发明实施例应用于MapReduce架构下的Hadoop集群,该Hadoop集群中包括主节点设备和从节点设备,从节点设备包括处理器CPU和图形处理器GPU,该从节点设备从主节点设备获取数据分片,而在CPU中设置有数据预处理器和数据拼接器。The embodiment of the present invention is applied to a Hadoop cluster in a MapReduce architecture, where the Hadoop cluster includes a master node device and a slave node device, and the slave node device includes a processor CPU and a graphics processor GPU, and the slave node device obtains data points from the master node device. Slice, and a data preprocessor and data splicer are provided in the CPU.
其中,数据预处理器用于完成从CPU第一缓冲区读取数据集合的数据,将数据转换数据格式后,将这个数据集合生成数据块存储到第二缓冲区。而数据拼接器主要完成将数据块从CPU拼接到GPU。The data preprocessor is configured to read data of the data set from the first buffer of the CPU, convert the data into a data format, and store the data set generated data block into the second buffer. The data splicer mainly completes the splicing of data blocks from the CPU to the GPU.
S320、所述数据拼接器将所述数据块拼接到GPU中被分配存储数据块的工作缓冲区。 S320. The data splicer splices the data block into a working buffer of a GPU that is allocated a storage data block.
In this embodiment, the data splicer reads the data block from the second buffer of the CPU and splices it into the working buffer of the GPU. Because the splicing is completed by the data splicer rather than by programmer-written code, the programmer's work is simplified, which also benefits subsequent optimization of the whole MapReduce architecture.
The embodiment provided in FIG. 3 is described in detail below. As shown in FIG. 4, a data processing method may include:
S410. The data splicer reads a data block from the LaunchingBuffer.
The CPU also allocates a LaunchingBuffer in memory, which is mainly used to store the data blocks that need to be spliced to the GPU.
S420. The data splicer splices the data block starting from a start address indicated by a cursor parameter, where the cursor parameter indicates the start address, in the WorkingBuffer allocated in the GPU for storing data blocks, that is available for storing a data block.
A WorkingBuffer is allocated in GPU memory, mainly to store the data spliced over from the CPU's LaunchingBuffer. The size of the WorkingBuffer is determined by the GPU itself, whereas the size of the DirectBuffer in the CPU is determined by the Java runtime environment. Generally, the WorkingBuffer on the GPU is far larger than the Java-backed DirectBuffer on the CPU; the WorkingBuffer may therefore hold at least one data block derived from the DirectBuffer, and when some data block is to be stored, the remaining memory of the WorkingBuffer may be unable to hold it, in which case the data splicer handles that data block correctly.
Specifically, the data splicer manages a cursor parameter that indicates the start address at which the WorkingBuffer can store data. Each time a data block is spliced into the WorkingBuffer, the cursor parameter is updated accordingly, so that the next available start address in the WorkingBuffer is always known exactly. When a data block needs to be transferred to the WorkingBuffer, it is spliced into the WorkingBuffer starting from the start address indicated by the cursor parameter.
S430. When the data splicer fails to splice the data block into the WorkingBuffer, it suspends splicing the data block and triggers the GPU to process the data blocks stored in the WorkingBuffer.
The data in the data block that the data splicer reads from the LaunchingBuffer can be used directly in logical operations and already meets the GPU's storage format requirements. An application programming interface (API) is called to splice the data of the data block into the WorkingBuffer. If the remaining memory of the WorkingBuffer can accommodate the data block read from the CPU's LaunchingBuffer, the entire data block is spliced into the WorkingBuffer; if it cannot, splicing of that data block is suspended, the data block remains in the LaunchingBuffer, and the GPU is triggered to start processing all the data blocks in the WorkingBuffer.
In this embodiment, the data splicer in the CPU solves the data block splicing problem that arises when the size of the DirectBuffer in the CPU and the remaining memory of the WorkingBuffer in the GPU do not match. The data splicer splices data blocks from the LaunchingBuffer directly into the WorkingBuffer; if the remaining memory of the WorkingBuffer cannot hold a data block, that splicing operation is temporarily stopped, and once the remaining memory of the WorkingBuffer can again accommodate it, the data block is read from the LaunchingBuffer and spliced into the WorkingBuffer. Because the data blocks in the LaunchingBuffer already meet the GPU's data processing requirements, the GPU can perform operations directly upon receiving them, effectively improving the GPU's working efficiency.
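Assuming a fixed-size WorkingBuffer, the splice-or-suspend behavior of steps S420 and S430 can be sketched as follows (a plain-Python simulation; the actual implementation splices through a GPU API, and the `drain` method merely stands in for triggering the GPU and reclaiming the buffer):

```python
class Splicer:
    def __init__(self, working_buffer_size):
        self.working = bytearray(working_buffer_size)  # GPU-side WorkingBuffer
        self.cursor = 0    # start address available for the next block (S420)

    def splice(self, block: bytes) -> bool:
        """Splice one block at the cursor; False means the remaining memory
        cannot hold the block, so splicing must be suspended (S430)."""
        if self.cursor + len(block) > len(self.working):
            return False                   # block stays in the LaunchingBuffer
        self.working[self.cursor:self.cursor + len(block)] = block
        self.cursor += len(block)          # update the cursor parameter
        return True

    def drain(self) -> bytes:
        """Stand-in for triggering the GPU: process all stored blocks, reset."""
        processed = bytes(self.working[:self.cursor])
        self.cursor = 0
        return processed

s = Splicer(working_buffer_size=6)
assert s.splice(b"abcd")        # fits; the cursor moves to 4
assert not s.splice(b"efgh")    # 4 + 4 > 6: suspend, trigger the GPU
assert s.drain() == b"abcd"     # GPU processes what was stored
assert s.splice(b"efgh")        # the suspended block is spliced afterwards
```

The failed block is never copied partially: it stays whole in the LaunchingBuffer until the WorkingBuffer has been drained.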
It can be understood that, after successfully transferring a data block, the data splicer further performs the following steps:
B1. The data splicer notifies the GPU of the size of the data block.
B2. The data splicer updates the cursor parameter.
Each time a data block is successfully spliced to the GPU, the data splicer notifies the GPU of the data block's size; the GPU can use this size directly without computing it, which reduces the GPU's workload.
In addition, just as the address index array in the CPU's DirectBuffer indicates the storage addresses of data in the DirectBuffer, the GPU may add a lookup index array for the data blocks at the head of the WorkingBuffer. The lookup index array contains data elements in one-to-one correspondence with the data of the data block, and each data element indicates the storage address of a piece of data in the WorkingBuffer. After the data splicer splices over a data block, a data element is added to the lookup index array for each piece of data in that block, so that the GPU can subsequently locate and read data from the WorkingBuffer quickly for its operations.
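A minimal sketch of such a lookup index array, with byte offsets standing in for storage addresses (the layout is illustrative, not the patent's actual encoding):

```python
def append_block(working, index_array, block_data):
    """Splice one block and record, per datum, its address in the buffer."""
    for datum in block_data:
        index_array.append(len(working))   # storage address of this datum
        working.extend(datum)

working_buffer = bytearray()
lookup_index = []
append_block(working_buffer, lookup_index, [b"foo", b"quux"])
append_block(working_buffer, lookup_index, [b"ba"])
assert lookup_index == [0, 3, 7]           # one data element per datum
# each datum runs from its own address to the next address (or buffer end)
bounds = lookup_index + [len(working_buffer)]
data = [bytes(working_buffer[bounds[i]:bounds[i + 1]])
        for i in range(len(lookup_index))]
assert data == [b"foo", b"quux", b"ba"]
```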
Steps B1 and B2 above may be performed in either order, which is not limited herein.
Because each data slice received by the CPU may ultimately generate multiple data blocks, the WorkingBuffer allocated in the GPU stores data in units of data blocks, and its lifetime is the time taken to process one data slice. After the data splicer has transferred an entire data slice successfully, it returns a flag value indicating success, so that the master node device is notified to allocate the next data slice; if the data splicer fails to transfer the data slice, it returns a flag value indicating failure, so that the master node device is notified to suspend allocating the next data slice.
In addition, a ResultBuffer is likewise allocated in GPU memory to save the operation results; an API is then called to return the results to the CPU, where they are stored in the ResultBuffer allocated by the CPU as the input of the Reduce task under MapReduce.
The DirectBuffer used in the CPU to store data sets, the LaunchingBuffer used to store data blocks after format conversion, and the ResultBuffer used to store the operation results returned by the GPU are all automatically allocated and reclaimed by the CPU, where the lifetime of the LaunchingBuffer is the processing time of one data block. In the GPU, the WorkingBuffer used to store received data blocks and the ResultBuffer used to store operation results are automatically allocated and reclaimed by the GPU, where the lifetime of the WorkingBuffer is the processing time of one data slice and the lifetime of the ResultBuffer is the same as that of the WorkingBuffer. The buffers in the CPU and the GPU are synchronized automatically; for example, the ResultBuffer in the CPU is allocated and reclaimed in synchronization with the WorkingBuffer and the ResultBuffer in the GPU.
As shown in FIG. 5-a, an embodiment of the present invention further provides a data preprocessor 500, which may include:
a first reading unit 510, configured to read metadata from the first buffer of the CPU, where, when a data set obtained from a data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer;
a second reading unit 520, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;
a parsing unit 530, configured to convert the data of the data set into the data format indicated by a preset parse function and assemble the converted data set into a data block; and
a storage unit 540, configured to store the data block in the second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
This embodiment is applied to a Hadoop cluster under the MapReduce architecture. The data preprocessor 500 is disposed in the CPU of a slave node device of the Hadoop cluster, where the CPU is further provided with a data splicer and each slave node device further includes a GPU. The slave node device obtains data slices from the master node device of the Hadoop cluster and then splices the values of the key-value pairs in a data slice into data sets that are stored into the first buffer allocated in CPU memory. Because the first buffer may be unable to store the values of all the key-value pairs of a data slice at once, the values of the key-value pairs in a data slice may be spliced into data sets over multiple passes.
When a data set is stored into the first buffer, metadata is added for the data set at the head of the first buffer; the metadata mainly includes the storage addresses of the data of the data set in the first buffer. The first reading unit 510 then reads the metadata from the first buffer, the second reading unit 520 reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata, the parsing unit 530 performs data format conversion on the data and assembles the converted data set into a data block, and the storage unit 540 stores the data block into the second buffer of the CPU, which the CPU allocates in memory mainly for storing data blocks, so that the data splicer can read data blocks from the second buffer and transfer them to the working buffer of the GPU. In this embodiment, data reading and data format conversion are completed automatically by the data preprocessor, without the programmer having to write corresponding code, which reduces the programmer's work, benefits subsequent optimization of the MapReduce architecture, and improves the CPU's working efficiency.
Further, the metadata specifically includes an address index array, where the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element indicates the storage address of a piece of data of the data set in the first buffer. Accordingly, as shown in FIG. 5-b, the second reading unit 520 may include:
a data reading unit 5210, configured to start reading at the storage address in the first buffer indicated by a data element of the address index array, and to stop reading at the storage address indicated by the next data element or at the end of the first buffer.
Specifically, according to the storage address indicated by a data element of the address index array, the data reading unit 5210 reads from that storage address in the first buffer until the storage address indicated by the next data element, or until the end of the first buffer, thereby obtaining one piece of data of the data set; it then continues with the next piece of data until all the data of the data set in the first buffer has been read.
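The reading rule above, one datum per data element, bounded by the next element's address or the buffer end, can be sketched as follows (names and layout are illustrative):

```python
def read_data_set(first_buffer: bytes, address_index: list) -> list:
    """Read each datum from its data element's address up to the next
    element's address, or to the buffer end for the last element."""
    data = []
    for i, start in enumerate(address_index):
        if i + 1 < len(address_index):
            end = address_index[i + 1]   # stop at the next element's address
        else:
            end = len(first_buffer)      # last datum: stop at the buffer end
        data.append(first_buffer[start:end])
    return data

assert read_data_set(b"hellobigworld", [0, 5, 8]) == [b"hello", b"big", b"world"]
```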
As shown in FIG. 5-c, the parsing unit 530 includes:
a data format conversion unit 5310, configured to convert, by using the preset parse function, the data of the data set into the data format that the parse function specifies as satisfying logical operations; and
a generating unit 5320, configured to assemble the converted data set into a data block.
In the MapReduce architecture, the data format specified by the preset parse function may be the data format required for the GPU's logical operations. Specifically, the data formats specified by the preset parse function as supporting logical operations may be integer data, floating-point data, string data, and the like.
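As an illustration only, a preset parse function could decode the raw bytes of a data set into integers, floats, or strings; the `struct`-based formats below are assumptions for the sketch, not the patent's actual encoding:

```python
import struct

def parse_int32(raw: bytes) -> list:
    """Decode little-endian 32-bit integers, ready for logical operations."""
    return list(struct.unpack("<%di" % (len(raw) // 4), raw))

def parse_float32(raw: bytes) -> list:
    """Decode little-endian 32-bit floats."""
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))

def parse_strings(raw: bytes) -> list:
    """Decode NUL-separated UTF-8 strings."""
    return raw.decode("utf-8").split("\x00")

assert parse_int32(struct.pack("<3i", 7, 8, 9)) == [7, 8, 9]
assert parse_float32(struct.pack("<2f", 0.5, 1.5)) == [0.5, 1.5]
assert parse_strings(b"map\x00reduce") == ["map", "reduce"]
```

Whichever concrete format is chosen, the point is the same: after parsing, the values can be operated on directly, without further interpretation on the GPU side.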
As shown in FIG. 5-d, the parsing unit 530 may further include:
a format conversion unit 5330, configured to, when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU, convert the data in the data block into the storage format of the GPU.
The first buffer of the CPU and the GPU may differ in their required data storage formats, that is, they may handle endianness differently. In the little-endian storage format, the high-order bytes of a datum are stored at high addresses and the low-order bytes at low addresses; in the big-endian storage format, the high-order bytes are stored at low addresses and the low-order bytes at high addresses.
The first buffer allocated by the CPU carries a member variable that indicates whether data is stored in the first buffer in big-endian or little-endian format, as well as an indication of whether the storage format needs to be converted when data is stored into the second buffer, together with a hint as to whether the conversion should be to big-endian or little-endian. For example, if the data of the data set is stored in the first buffer in big-endian format while the GPU stores data in little-endian format, the format conversion unit 5330 converts the data block into little-endian format and stores it in the second buffer. The data splicer can then read the data block directly from the second buffer and splice it to the GPU, ensuring that the second buffer of the CPU and the GPU store data in the same format, so that the GPU can read data blocks correctly for its operations, avoiding operation errors caused by reading high-order bytes as low-order bytes or low-order bytes as high-order bytes.
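A hedged sketch of such a storage-format conversion, assuming 32-bit data held big-endian in the first buffer while the GPU expects little-endian (the width and formats are illustrative):

```python
import struct

def to_gpu_byte_order(block: bytes, width: int = 4) -> bytes:
    """Reverse the bytes of each `width`-byte datum (big- to little-endian)."""
    assert len(block) % width == 0
    swapped = bytearray()
    for i in range(0, len(block), width):
        swapped.extend(block[i:i + width][::-1])
    return bytes(swapped)

big_endian = struct.pack(">2i", 1, 258)          # as stored in the first buffer
little_endian = to_gpu_byte_order(big_endian)    # as the GPU expects it
assert struct.unpack("<2i", little_endian) == (1, 258)  # same values recovered
```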
As shown in FIG. 6-a, an embodiment of the present invention further provides a data splicer 600, which may include:
a third reading unit 610, configured to read, from the second buffer of the CPU, a data block generated by the data preprocessor; and
a splicing processing unit 620, configured to splice the data block into the working buffer allocated in the GPU for storing data blocks.
This embodiment is applied to a Hadoop cluster under the MapReduce architecture. The data splicer 600 is disposed in the CPU of a slave node device of the Hadoop cluster, where the CPU is further provided with the data preprocessor 500 shown in FIG. 5-a and each slave node device further includes a GPU. The slave node device obtains data slices from the master node device of the Hadoop cluster and then splices the values of the key-value pairs in a data slice into data sets that are stored into the first buffer allocated in CPU memory. Because the first buffer may be unable to store the values of all the key-value pairs of a data slice at once, the values of the key-value pairs in a data slice may be spliced into data sets over multiple passes.
The data preprocessor 500 reads data from the first buffer according to the metadata, converts the data format, assembles the entire format-converted data set into a data block, and stores it in the second buffer of the CPU. The third reading unit 610 of the data splicer then reads the data block from the second buffer of the CPU, and the splicing processing unit 620 splices the read data block into the working buffer allocated in the GPU for storing data blocks.
In the CPU of the slave node device, data format conversion is completed by the data preprocessor 500 and data block splicing by the data splicer, so neither depends on the programmer writing corresponding code. This simplifies the programmer's work, and the automatic operation of the data preprocessor 500 and the data splicer improves the CPU's working efficiency and also benefits subsequent optimization of MapReduce.
The data splicer 600 manages a cursor parameter that indicates the start address at which the working buffer of the GPU can store data. Each time a data block is spliced into the working buffer of the GPU, the cursor parameter is updated accordingly, so that the next available start address in the working buffer is always known exactly. When a data block needs to be transferred to the working buffer of the GPU, the splicing processing unit 620 splices the data block into the working buffer starting from the start address indicated by the cursor parameter.
Therefore, the splicing processing unit 620 is specifically configured to splice the data block starting from the start address indicated by the cursor parameter, where the cursor parameter indicates the start address, in the working buffer allocated in the GPU for storing data blocks, that is available for storing a data block.
As shown in FIG. 6-b, the data splicer further includes:
a trigger processing unit 630, configured to, when the data splicer fails to splice the data block into the working buffer allocated in the GPU for storing data blocks, suspend splicing the data block and trigger the GPU to process the data blocks stored in the working buffer.
The data in the data block that the third reading unit 610 of the data splicer 600 reads from the second buffer can be used directly in logical operations and meets the GPU's storage format requirements. An API is called to splice the data of the data block into the working buffer of the GPU. If the remaining memory of the GPU's working buffer can accommodate the data block read from the second buffer of the CPU, the entire data block is spliced into the GPU's working buffer; if it cannot, that is, if splicing the data block fails, splicing of the data block is suspended and the data block remains in the second buffer, and the trigger processing unit 630 triggers the GPU to start processing all the data blocks in the working buffer.
Further, as shown in FIG. 6-c, the data splicer 600 may further include:
a notification unit 640, configured to notify the GPU of the size of the data block; and
an updating unit 650, configured to update the cursor parameter.
Each time a data block is successfully spliced to the GPU, the notification unit 640 notifies the GPU of the data block's size; the GPU can use this size directly without computing it, which reduces the GPU's workload. In addition, the updating unit 650 updates the cursor parameter.
As shown in FIG. 7, an embodiment of the present invention provides a processor 700, including the data preprocessor 500 shown in FIG. 5-a and the data splicer 600 shown in FIG. 6-a; for details, reference may be made to the descriptions of the data preprocessor 500 and the data splicer 600 above, which are not repeated here.
The first buffer and the second buffer are automatically allocated and reclaimed in the CPU, where the lifetime of the first buffer is the processing time of one data slice and the lifetime of the second buffer is the processing time of one data block; likewise, the working buffer is automatically allocated in the GPU, and its lifetime is the processing time of one data slice.
As shown in FIG. 8-a, an embodiment of the present invention further provides a slave node device, which may include:
the processor CPU-700 shown in FIG. 7 above, and a graphics processing unit GPU-800;
where the CPU-700 is as described above and is not described again here.
Specifically, the data preprocessor in the CPU-700 is configured to convert the data format of a data set obtained from a data slice and assemble the format-converted data set into a data block, and the data splicer in the CPU-700 splices the data block into the working buffer allocated in the GPU-800 for storing data blocks; and
the GPU-800 is configured to process the data block to obtain a processing result and then return the processing result to the CPU-700.
In practical applications, a ResultBuffer is also automatically allocated and reclaimed in the CPU-700, and likewise in the GPU-800; the ResultBuffer in the CPU-700 and the ResultBuffer in the GPU-800 have the same lifetime and are both used to store the results of operations. If, in a practical application, the first buffer allocated by the CPU-700 is a DirectBuffer, the second buffer is a LaunchingBuffer, and the working buffer allocated by the GPU-800 is a WorkingBuffer, then FIG. 8-b shows the interaction between the CPU-700 and the GPU-800 in the slave node device according to this embodiment. As shown in FIG. 8-b, the data preprocessor 500 and the data splicer 600 are disposed in the CPU-700. In addition, a DirectBuffer, a LaunchingBuffer, and a ResultBuffer are allocated in the CPU-700. The DirectBuffer stores the data sets whose data format needs to be converted; a data set consists of data spliced together from the values of key-value pairs, and metadata, mainly the storage addresses of the data of the data set in the DirectBuffer, is added in the DirectBuffer. According to the metadata, the data preprocessor 500 can read the data of the data set from the DirectBuffer, automatically convert the data format by means of the specified preset parse function, assemble the converted data set into a data block, and finally store the data block into the LaunchingBuffer. If the storage format of the data in the data block needs to be converted when it is stored into the LaunchingBuffer, the storage format conversion is performed, ensuring that the storage format of the data in the LaunchingBuffer is the same as that of the WorkingBuffer in the GPU-800. The data splicer 600 reads data blocks from the LaunchingBuffer and splices them into the WorkingBuffer in the GPU-800; if splicing fails, meaning the WorkingBuffer can store no more data blocks, the GPU is first triggered to perform operations on the data blocks stored in the WorkingBuffer, the GPU stores the operation results in its own ResultBuffer, and an API is called to transfer the operation results to the ResultBuffer in the CPU.
Referring to FIG. 9, an embodiment of the present invention further provides a data processing device, which may include a memory 910 and at least one processor 920 (one processor is taken as an example in FIG. 9). In some embodiments of the present invention, the memory 910 and the processor 920 may be connected by a bus or in another manner; FIG. 9 takes the bus connection as an example.
The processor 920 may perform the following steps: the data preprocessor reads metadata from the first buffer of the CPU, where, when a data set obtained from a data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer; the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; and the data preprocessor converts, according to a preset parse function, the data of the data set into the data format indicated by the preset parse function, assembles the converted data set into a data block, and stores the data block in the second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU;
or
the data splicer reads, from the second buffer of the CPU, a data block generated by the data preprocessor, and the data splicer splices the data block into the working buffer allocated in the GPU for storing data blocks.
In some embodiments of the present invention, the processor 920 may further perform the following step: the data preprocessor starts reading at the storage address in the first buffer indicated by a data element of the address index array, and stops reading at the storage address indicated by the next data element or at the end of the first buffer.
In some embodiments of the present invention, the processor 920 may further perform the following step: the data preprocessor converts the data of the data set, according to the preset parsing function, into a data format specified by the parsing function that is suitable for logical operations.
In some embodiments of the present invention, the processor 920 may further perform the following step: the data preprocessor converts the data in the data block into the storage format used in the GPU.
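As a hedged illustration of such a storage-format conversion: one common CPU-to-GPU rearrangement is turning an array of records into a struct-of-arrays layout, which GPUs can read with coalesced accesses. The patent does not name a concrete format, so `to_gpu_layout` and the record shape below are assumptions.

```python
# Hypothetical conversion of a data block into a GPU-friendly storage format.
# We assume the CPU-side block is a list of (key, value) records and the GPU
# prefers a struct-of-arrays layout: all keys contiguous, all values contiguous.
def to_gpu_layout(records):
    keys = [k for k, _ in records]
    values = [v for _, v in records]
    return {"keys": keys, "values": values}

block = [(1, 10.0), (2, 20.0), (3, 30.0)]
assert to_gpu_layout(block) == {"keys": [1, 2, 3], "values": [10.0, 20.0, 30.0]}
```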
In some embodiments of the present invention, the processor 920 may further perform the following steps: when the data splicer fails to splice the data block into the work buffer in the GPU allocated for storing data blocks, it suspends splicing the data block and triggers the GPU to process the data blocks stored in the work buffer.
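A minimal sketch of this overflow-and-drain behaviour, assuming a fixed-capacity work buffer and counting kernel launches in place of a real GPU trigger (all names and sizes are illustrative, not from the patent):

```python
# Hypothetical splicer: if a data block does not fit in the GPU work buffer,
# splicing pauses, the GPU is triggered to process what is buffered, the
# buffer is reclaimed, and the block is then spliced into the empty buffer.
class Splicer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cursor = 0      # next free offset in the GPU work buffer
        self.launches = 0    # stand-in for triggering the GPU kernel

    def splice(self, block_size):
        if self.cursor + block_size > self.capacity:
            # Splicing fails: pause, drain the buffer on the GPU, retry.
            self.launches += 1
            self.cursor = 0  # buffer reclaimed after GPU processing
        self.cursor += block_size

s = Splicer(capacity=100)
for size in [40, 40, 40]:    # the third block overflows, triggering the GPU
    s.splice(size)
assert s.launches == 1 and s.cursor == 40
```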
In some embodiments of the present invention, the processor 920 may further perform the following step: the data splicer splices the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address available for storing data blocks in the work buffer in the GPU allocated for storing data blocks.
In some embodiments of the present invention, the processor 920 may further perform the following steps: the data splicer notifies the GPU of the size of the data block, and the data splicer updates the cursor parameter.
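The splice-notify-advance cycle built around the cursor parameter can be sketched as follows. The `bytearray` stands in for GPU device memory and every name here is an assumption for illustration, not the patented implementation.

```python
# Hypothetical cursor-based splicing: each block is copied at the offset the
# cursor indicates, the GPU is told the block size, and the cursor is then
# advanced past the block so the next splice appends after it.
work_buffer = bytearray(16)  # stand-in for the GPU work buffer
cursor = 0
notified_sizes = []          # stand-in for size notifications sent to the GPU

def splice(block: bytes):
    global cursor
    work_buffer[cursor:cursor + len(block)] = block  # copy at cursor offset
    notified_sizes.append(len(block))                # notify GPU of the size
    cursor += len(block)                             # update cursor parameter

splice(b"abcd")
splice(b"efg")
assert bytes(work_buffer[:7]) == b"abcdefg"
assert notified_sizes == [4, 3] and cursor == 7
```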
In some embodiments of the present invention, the memory 910 may be used to store data sets, metadata, and data blocks.
In some embodiments of the present invention, the memory 910 may also be used to store the address index array.
In some embodiments of the present invention, the memory 910 may also be used to store the cursor parameter.
In some embodiments of the present invention, the memory 910 may also be used to store operation results.
Persons of ordinary skill in the art may understand that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The data processing method and related device provided by the present invention have been described in detail above. Persons of ordinary skill in the art may make changes to the specific implementations and the application scope according to the ideas of the embodiments of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (22)

  1. A data processing method, applied to a Hadoop cluster under a MapReduce architecture, wherein the Hadoop cluster comprises a master node device and a slave node device, the slave node device comprises a central processing unit (CPU) and a graphics processing unit (GPU), the slave node device obtains a data slice from the master node device, and a data preprocessor and a data splicer are provided in the CPU, the method comprising:
    reading, by the data preprocessor, metadata from a first buffer of the CPU, wherein, when a data set obtained from the data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata comprising storage addresses of the data of the data set in the first buffer;
    reading, by the data preprocessor, the data of the data set from the first buffer according to the storage addresses indicated by the metadata; and
    converting, by the data preprocessor according to a preset parsing function, the data of the data set into the data format indicated by the preset parsing function, generating a data block from the converted data set, and storing the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.
  2. The method according to claim 1, wherein the metadata comprises an address index array, the address index array contains data elements in one-to-one correspondence with the data of the data set, each data element indicating the storage address of a piece of data of the data set in the first buffer, and reading, by the data preprocessor, the data of the data set from the first buffer according to the storage addresses indicated by the metadata comprises:
    starting, by the data preprocessor, reading at the storage address in the first buffer indicated by a data element of the address index array, and stopping reading at the storage address indicated by the next data element or at the end of the first buffer.
  3. The method according to claim 1, wherein converting the data of the data set into the data format indicated by the preset parsing function comprises:
    converting, by the data preprocessor according to the preset parsing function, the data of the data set into a data format specified by the parsing function that is suitable for logical operations.
  4. The method according to claim 3, wherein, when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU, the method further comprises, after generating the data block from the converted data set:
    converting, by the data preprocessor, the data in the data block into the storage format used in the GPU.
  5. The method according to any one of claims 1 to 4, wherein the data set is composed of the spliced values of a plurality of key-value pairs in the data slice.
  6. The method according to any one of claims 1 to 4, wherein the first buffer and the second buffer are automatically allocated and reclaimed by the CPU, the lifetime of the first buffer is the processing time of one data slice, and the lifetime of the second buffer is the processing time of one data set.
  7. A data processing method, applied to a Hadoop cluster under a MapReduce architecture, wherein the Hadoop cluster comprises a master node device and a slave node device, the slave node device comprises a central processing unit (CPU) and a graphics processing unit (GPU), the slave node device obtains a data slice from the master node device, and a data preprocessor and a data splicer are provided in the CPU, the method comprising:
    reading, by the data splicer, a data block generated by the data preprocessor from a second buffer of the CPU; and
    splicing, by the data splicer, the data block into a work buffer in the GPU that is allocated for storing data blocks.
  8. The method according to claim 7, further comprising:
    when the data splicer fails to splice the data block into the work buffer in the GPU allocated for storing data blocks, suspending splicing the data block and triggering the GPU to process the data blocks stored in the work buffer.
  9. The method according to claim 7 or 8, wherein splicing, by the data splicer, the data block into the work buffer in the GPU allocated for storing data blocks comprises:
    splicing, by the data splicer, the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address available for storing data blocks in the work buffer in the GPU allocated for storing data blocks.
  10. The method according to claim 9, further comprising, after the data block is spliced successfully:
    notifying, by the data splicer, the GPU of the size of the data block; and
    updating, by the data splicer, the cursor parameter.
  11. A data preprocessor, comprising:
    a first reading unit, configured to read metadata from a first buffer of the CPU, wherein, when a data set obtained from a data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata comprising storage addresses of the data of the data set in the first buffer;
    a second reading unit, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;
    a conversion unit, configured to convert the data of the data set into the data format indicated by a preset parsing function, and generate a data block from the converted data set; and
    a storage unit, configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.
  12. The data preprocessor according to claim 11, wherein the metadata comprises an address index array, the address index array contains data elements in one-to-one correspondence with the data of the data set, each data element indicating the storage address of a piece of data of the data set in the first buffer, and the second reading unit comprises:
    a data reading unit, configured to start reading at the storage address in the first buffer indicated by a data element of the address index array, and stop reading at the storage address indicated by the next data element or at the end of the first buffer.
  13. The data preprocessor according to claim 11 or 12, wherein the parsing unit comprises:
    a data format conversion unit, configured to convert the data of the data set, by means of the preset parsing function, into a data format specified by the parsing function that is suitable for logical operations; and
    a generation unit, configured to generate a data block from the converted data set.
  14. The data preprocessor according to claim 11, wherein the parsing unit further comprises:
    a format conversion unit, configured to convert the data in the data block into the storage format used in the GPU when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU.
  15. A data splicer, comprising:
    a third reading unit, configured to read a data block generated by the data preprocessor from a second buffer of the CPU; and
    a splicing processing unit, configured to splice the data block into a work buffer in the GPU that is allocated for storing data blocks.
  16. The data splicer according to claim 15, further comprising:
    a trigger processing unit, configured to, when the data splicer fails to splice the data block into the work buffer in the GPU allocated for storing data blocks, suspend splicing the data block and trigger the GPU to process the data blocks stored in the work buffer.
  17. The data splicer according to claim 15 or 16, wherein the splicing processing unit is specifically configured to splice the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address available for storing data blocks in the work buffer in the GPU allocated for storing data blocks.
  18. The data splicer according to claim 17, further comprising:
    a notification unit, configured to notify the GPU of the size of the data block; and
    an update unit, configured to update the cursor parameter.
  19. A processor, comprising the data preprocessor according to claim 11 and the data splicer according to claim 15.
  20. The processor according to claim 19, wherein the processor is further configured to automatically allocate and reclaim the first buffer and the second buffer, the lifetime of the first buffer being the processing time of one data slice and the lifetime of the second buffer being the processing time of one data set.
  21. A slave node device, wherein the slave node device is a slave node device in a Hadoop cluster, the Hadoop cluster further comprises a master node device, the slave node device receives data slices from the Hadoop cluster, and the slave node device comprises a graphics processing unit (GPU) and the processor (CPU) according to claim 19;
    wherein the data preprocessor in the CPU is configured to convert the data format of a data set obtained from a data slice and generate a data block from the format-converted data set, and the data splicer in the CPU is configured to splice the data block into a work buffer in the GPU allocated for storing data blocks; and
    the GPU is configured to process the data block to obtain a processing result, and then return the processing result to the CPU.
  22. The slave node device according to claim 21, wherein the GPU is further configured to automatically allocate and reclaim the work buffer, the lifetime of the work buffer being the processing time of one data slice.
PCT/CN2014/094071 2013-12-23 2014-12-17 Data processing method and related device WO2015096649A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310719857.4 2013-12-23
CN201310719857.4A CN104731569B (en) 2013-12-23 2013-12-23 A kind of data processing method and relevant device

Publications (1)

Publication Number Publication Date
WO2015096649A1 true WO2015096649A1 (en) 2015-07-02

Family

ID=53455495

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/094071 WO2015096649A1 (en) 2013-12-23 2014-12-17 Data processing method and related device

Country Status (2)

Country Link
CN (1) CN104731569B (en)
WO (1) WO2015096649A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023061295A1 (en) * 2021-10-13 2023-04-20 杭州趣链科技有限公司 Data processing method and apparatus, and electronic device and storage medium

Families Citing this family (10)

Publication number Priority date Publication date Assignee Title
CN105159610B (en) * 2015-09-01 2018-03-09 浪潮(北京)电子信息产业有限公司 Large-scale data processing system and method
CN106326029A (en) * 2016-08-09 2017-01-11 浙江万胜智能科技股份有限公司 Data storage method for electric power meter
KR102482516B1 (en) * 2016-11-29 2022-12-29 에이알엠 리미티드 memory address conversion
CN109408450B (en) * 2018-09-27 2021-03-30 中兴飞流信息科技有限公司 Data processing method, system, co-processing device and main processing device
CN111143232B (en) * 2018-11-02 2023-08-18 伊姆西Ip控股有限责任公司 Method, apparatus and computer readable medium for storing metadata
CN109522133B (en) * 2018-11-28 2020-10-02 北京字节跳动网络技术有限公司 Data splicing method and device, electronic equipment and storage medium
EP3964949B1 (en) * 2019-05-27 2023-09-06 Huawei Technologies Co., Ltd. Graphics processing method and apparatus
CN110769064B (en) * 2019-10-29 2023-02-24 广州趣丸网络科技有限公司 System, method and equipment for offline message pushing
CN113535857A (en) * 2021-08-04 2021-10-22 阿波罗智联(北京)科技有限公司 Data synchronization method and device
CN115952561A (en) * 2023-03-14 2023-04-11 北京全路通信信号研究设计院集团有限公司 Data processing method, device, equipment and medium applied to rail transit system

Citations (3)

Publication number Priority date Publication date Assignee Title
US20050140682A1 (en) * 2003-12-05 2005-06-30 Siemens Medical Solutions Usa, Inc. Graphics processing unit for simulation or medical diagnostic imaging
CN102662639A (en) * 2012-04-10 2012-09-12 南京航空航天大学 Mapreduce-based multi-GPU (Graphic Processing Unit) cooperative computing method
CN102708088A (en) * 2012-05-08 2012-10-03 北京理工大学 CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation



Also Published As

Publication number Publication date
CN104731569B (en) 2018-04-10
CN104731569A (en) 2015-06-24

Similar Documents

Publication Publication Date Title
WO2015096649A1 (en) Data processing method and related device
US20200210092A1 (en) Infinite memory fabric streams and apis
US9734085B2 (en) DMA transmission method and system thereof
CN108647104B (en) Request processing method, server and computer readable storage medium
TW201731253A (en) Quantum key distribution method and device obtaining a key sequence matching the requested length in the sub-key pool allocated from the requesting party after receiving a quantum key obtaining request
EP2437167A1 (en) Method and system for virtual storage migration and virtual machine monitor
US11023430B2 (en) Sparse dictionary tree
US10956335B2 (en) Non-volatile cache access using RDMA
TWI773959B (en) Data processing system, method and computer program product for handling an input/output store instruction
WO2016200655A1 (en) Infinite memory fabric hardware implementation with memory
JP2022105146A (en) Acceleration system, acceleration method, and computer program
US20220114145A1 (en) Resource Lock Management Method And Apparatus
US20210209057A1 (en) File system quota versioning
KR20210092689A (en) Method and apparatus for traversing graph database
CN110119304B (en) Interrupt processing method and device and server
AU2015402888A1 (en) Computer device and method for reading/writing data by computer device
JP5124430B2 (en) Virtual machine migration method, server, and program
US10216664B2 (en) Remote resource access method and switching device
US20200125548A1 (en) Efficient write operations for database management systems
US10169272B2 (en) Data processing apparatus and method
CN112074822A (en) Data processing network with stream compression for streaming data transmission
EP4242862A2 (en) Rdma-enabled key-value store
KR20220058581A (en) Constructor-Consumer Active Direct Cache Passing
CN112486702A (en) Global message queue implementation method based on multi-core multi-processor parallel system
JP2012234564A (en) Method for migrating virtual machine, server, and program

Legal Events

Date Code Title Description

121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14873198; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 EP: PCT application non-entry in European phase (Ref document number: 14873198; Country of ref document: EP; Kind code of ref document: A1)