WO2015096649A1 - Data processing method and related device - Google Patents

Data processing method and related device

Info

Publication number
WO2015096649A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
buffer
gpu
block
splicer
Prior art date
Application number
PCT/CN2014/094071
Other languages
French (fr)
Chinese (zh)
Inventor
崔慧敏
谢睿
阮功
杨文森
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2015096649A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/54 — Interprogram communication
    • G06F 9/541 — Interprogram communication via adapters, e.g. between incompatible applications

Definitions

  • The present invention relates to the field of information processing technologies and, in particular, to a data processing method and related devices.
  • Cloud computing offers powerful, fast computation over big data, but transferring that data has become a major bottleneck.
  • MapReduce is a well-known cloud computing architecture, introduced by Google, for parallel computing over large-scale data sets (greater than 1 TB). Hadoop is a concrete implementation of the MapReduce architecture; a Hadoop cluster is divided into a master node device and slave node devices.
  • The master node device uses the Map function provided by MapReduce to divide the data set into M data fragments by size, and distributes the fragments to multiple slave node devices for parallel processing.
  • Each slave node device extracts the values of key-value pairs from its data fragment and stores them in a buffer allocated by the node's central processing unit (CPU). The buffered values are then parsed (for example, their data format is converted), and the parsed values are spliced through an application programming interface (API) into a buffer that the node's graphics processing unit (GPU) has allocated for storing data; the GPU then performs the computation.
  • The present inventors have found that, because the MapReduce architecture provides no parsing function, parsing the values of key-value pairs must rely on a program written by the programmer. In addition, the size of the CPU buffer that stores the values may be inconsistent with the size of the buffer allocated by the GPU to store data, and the MapReduce architecture provides no corresponding check; determining whether the CPU and GPU buffers match therefore also relies on a judgment function written by the programmer. Both factors reduce the execution efficiency of the slave node device.
  • Embodiments of the present invention provide a data processing method and related devices, applied to a Hadoop cluster under the MapReduce architecture, that can improve the working efficiency of the slave node devices in the Hadoop cluster, simplify the programmer's work, and benefit subsequent optimization of the MapReduce architecture.
  • In a first aspect, the present invention provides a data processing method applied to a Hadoop cluster under a MapReduce architecture. The Hadoop cluster includes a master node device and a slave node device; the slave node device includes a central processing unit (CPU) and a graphics processing unit (GPU) and obtains a data fragment from the master node device; and the CPU is provided with a data preprocessor and a data splicer. The method includes:
  • the data preprocessor reads metadata from a first buffer of the CPU, where, when the data set obtained from the data fragment is stored in the first buffer, the metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer;
  • the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; and
  • the data preprocessor converts the data of the data set into the data format indicated by a preset analytic function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • In one implementation, the metadata specifically includes an address index array whose data elements correspond one-to-one to the data of the data set, each data element indicating the storage address of one piece of data in the first buffer. Reading the data of the data set according to the storage addresses indicated by the metadata then includes: the data preprocessor reads from the storage address indicated by a data element of the address index array until the storage address indicated by the next data element, or the end of the first buffer, is reached.
  • Converting the data of the data set into the data format indicated by the preset analytic function includes: the data preprocessor converts the data of the data set, according to the preset analytic function, into a data format that supports the logical operations specified by the analytic function.
  • Generating the data block from the converted data set includes: the data preprocessor converts the data in the data block into the storage format used by the GPU.
  • The data set is specifically formed by splicing together the values of multiple key-value pairs in the data fragment.
  • The first buffer and the second buffer are automatically allocated and reclaimed by the CPU; the life cycle of the first buffer is the processing time of one data fragment, and the life cycle of the second buffer is the processing time of one data set.
  • In a second aspect, the present invention provides a data processing method applied to a Hadoop cluster under a MapReduce architecture. The Hadoop cluster includes a master node device and a slave node device; the slave node device includes a central processing unit (CPU) and a graphics processing unit (GPU) and obtains a data fragment from the master node device; and the CPU is provided with a data preprocessor and a data splicer. The method includes:
  • the data splicer reads, from a second buffer of the CPU, a data block generated by the data preprocessor; and
  • the data splicer splices the data block into a working buffer that the GPU has allocated for storing data blocks.
  • In one implementation, the data splicer splices the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address, within the GPU's working buffer allocated for storing data blocks, at which the data block is to be stored.
  • The method further includes: the data splicer notifies the GPU of the size of the data block, and the data splicer updates the cursor parameter.
  • In a third aspect, the present invention provides a data preprocessor, including:
  • a first reading unit, configured to read metadata from a first buffer of the CPU, where, when the data set obtained from the data fragment is stored in the first buffer, the metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer;
  • a second reading unit, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;
  • a converting unit, configured to convert the data of the data set into the data format indicated by a preset analytic function and to generate a data block from the converted data set; and
  • a storage unit, configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • In one implementation, the metadata specifically includes an address index array whose data elements correspond one-to-one to the data of the data set, each data element indicating the storage address of one piece of data in the first buffer, and the second reading unit includes: a data reading unit, configured to read from the storage address indicated by a data element of the address index array until the storage address indicated by the next data element, or the end of the first buffer, is reached.
  • The converting unit includes: a data format converting unit, configured to convert the data of the data set, according to the preset analytic function, into a data format that supports the logical operations specified by the analytic function; and a generating unit, configured to generate the data block from the converted data set.
  • The converting unit further includes: a format converting unit, configured to convert the data in the data block into the storage format used by the GPU when the storage format of the data of the data set in the first buffer is inconsistent with the GPU's storage format.
  • In a fourth aspect, the present invention provides a data splicer, including:
  • a third reading unit, configured to read, from a second buffer of the CPU, a data block generated by the data preprocessor; and
  • a splicing processing unit, configured to splice the data block into a working buffer that the GPU has allocated for storing data blocks.
  • In one implementation, the data splicer further includes: a trigger processing unit, configured to suspend splicing of the data block and trigger the GPU to process the data blocks already stored in the working buffer when splicing the data block into the GPU's allocated working buffer fails.
  • The splicing processing unit is specifically configured to splice the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address, within the GPU's working buffer allocated for storing data blocks, at which the data block is to be stored.
  • The data splicer further includes: a notification unit, configured to notify the GPU of the size of the data block; and an update unit, configured to update the cursor parameter.
  • In a fifth aspect, the present invention provides a processor, which may include the data preprocessor according to the third aspect and the data splicer according to the fourth aspect.
  • The first buffer and the second buffer are automatically allocated and reclaimed; the life cycle of the first buffer is the processing time of one data fragment, and the life cycle of the second buffer is the processing time of one data set.
  • In a sixth aspect, the present invention provides a slave node device, which may include the CPU described in the fifth aspect and a graphics processing unit (GPU). The data preprocessor in the CPU converts the data format of the data set obtained from the data fragment and generates a data block from the converted data set; the data splicer in the CPU splices the data block into the working buffer that the GPU has allocated for storing data blocks; and the GPU processes the data block to obtain a processing result and returns the result to the CPU.
  • In the embodiments of the present invention, a data preprocessor and a data splicer are provided in the slave node device. Because metadata is generated for a data set when the data set is stored in the first buffer, and the metadata records the storage addresses of the data of the data set in the first buffer, the data preprocessor can read the metadata from the first buffer of the CPU, read the data of the data set according to it, convert the data format according to the preset analytic function, generate a data block from the converted data set, and store the data block in the second buffer of the CPU, where the data splicer completes the splicing of the data block to the GPU. The data preprocessor thus reads the data of the data set from the first buffer automatically, without relying on a program written by the programmer; parsing the data according to the preset analytic function improves processing efficiency in the CPU and benefits subsequent optimization of the MapReduce architecture.
  • The data splicer reads the data block from the second buffer and splices it into the working buffer that the GPU has allocated for storing data blocks. If the splice fails, indicating that the remaining memory of that working buffer is insufficient to hold the data block, splicing is suspended and the GPU is triggered to process the data blocks already received; the pending data block remains in the second buffer and is spliced in the next round. Compared with the prior art, data splicing is completed automatically by the data splicer rather than by a program written by the programmer, which effectively prevents data block loss and improves splicing efficiency.
  • FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
  • FIG. 5-a is a schematic structural diagram of a data preprocessor according to an embodiment of the present invention.
  • FIG. 5-b is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention.
  • FIG. 5-c is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention.
  • FIG. 5-d is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention.
  • FIG. 6-a is a schematic structural diagram of a data splicer according to an embodiment of the present invention.
  • FIG. 6-b is a schematic structural diagram of a data splicer according to another embodiment of the present invention.
  • FIG. 6-c is a schematic structural diagram of a data splicer according to another embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a processor according to an embodiment of the present invention.
  • FIG. 8-a is a schematic structural diagram of a slave node device according to an embodiment of the present invention.
  • FIG. 8-b is a schematic diagram of the interaction between the CPU and the GPU in a slave node device according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a data processing device according to an embodiment of the present invention.
  • Embodiments of the present invention provide a data processing method and related devices, applied to a Hadoop cluster under the MapReduce architecture, that realize automatic data format conversion and automatic data splicing on Hadoop slave nodes, simplify the programmer's work, and benefit subsequent optimization of the MapReduce architecture.
  • An aspect of the present invention provides a data processing method, including:
  • The data preprocessor reads metadata from the first buffer of the CPU, where, when the data set obtained from the data fragment is stored in the first buffer, the metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer.
  • This embodiment of the present invention is applied to a Hadoop cluster under a MapReduce architecture. The Hadoop cluster includes a master node device and a slave node device; the slave node device includes a CPU and a GPU and obtains data fragments from the master node device; and a data preprocessor and a data splicer are provided in the CPU.
  • The metadata mainly records the storage addresses, in the first buffer, of the data in the data set.
  • The data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata.
  • The data preprocessor can read the data of the data set directly from the first buffer as indicated by the metadata, without relying on an additional program written by the programmer to read the data.
  • The data preprocessor converts the data of the data set into the data format indicated by a preset analytic function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • An analytic function is preconfigured in the MapReduce architecture. The data preprocessor parses the data of the data set in the first buffer according to this preset analytic function, converts it into the data format that the function indicates, and then generates a data block from the converted data set. A second buffer is allocated in the CPU for storing data blocks, and the data splicer then reads each data block from the second buffer and delivers it to the GPU.
  • Because the metadata is added for the data set when the data set is stored in the first buffer of the CPU, and includes the storage addresses of the data of the data set in the first buffer, the data preprocessor, after reading the metadata from the first buffer, reads the data of the data set according to those storage addresses, converts the data format using the preset analytic function, generates a data block from the converted data set, and stores it in the second buffer of the CPU. The operations of automatically reading and parsing the data of the first buffer are thus implemented by the data preprocessor, without relying on the programmer's code, which provides a more complete MapReduce architecture for the programmer and also benefits subsequent optimization of the architecture.
  • In the MapReduce architecture, a mapping (Map) function is specified to map input key-value pairs into new key-value pairs, and a concurrent reduction (Reduce) function ensures that all mapped key-value pairs sharing the same key are grouped together. After the Map function maps the input key-value pairs into new key-value pairs, the master node device in the Hadoop cluster divides all the new key-value pairs into data fragments by size and arranges for each slave node device to perform the corresponding processing on its fragment.
  • In the CPU of the slave node device, the RecordReader class is called to obtain the key-value pairs in the data fragment, and the values are extracted from the key-value pairs to form a data set.
  • The CPU allocates a DirectBuffer in its memory for the data set; the data set is stored in the DirectBuffer in the DirectBuffer's format, and metadata for the data set is added at the head of the DirectBuffer.
  • A preset analytic function for parsing the data of the data set is configured in advance; it specifically converts the data into a specified data format that supports logical operations.
  • In this embodiment, a data processing method may include:
  • The data preprocessor reads the metadata from the DirectBuffer, where the metadata specifically includes an address index array whose data elements correspond one-to-one to the data of the data set, each data element indicating the storage address of one piece of data of the data set in the DirectBuffer.
  • When the data set is stored into the DirectBuffer, metadata is added at the head of the DirectBuffer to indicate the storage addresses of the data of the data set within it. The metadata may include an address index array: as each piece of data is written at its position in the DirectBuffer, its storage address is appended to the array, so the array's data elements correspond one-to-one to the data of the data set and each element indicates where one piece of data is stored in the DirectBuffer.
  • The data stored in the DirectBuffer's data set shares a single data format, which may be a format that cannot be used directly in logical operations, such as text or raw binary.
  • The data preprocessor reads the data of the data set from the DirectBuffer according to the data elements of the address index array in the metadata.
  • Specifically, the data preprocessor starts reading from the storage address in the DirectBuffer indicated by a data element of the address index array and stops at the storage address indicated by the next data element, or at the end of the DirectBuffer; this yields one piece of data of the data set. It then reads the next piece, until all the data of the data set in the DirectBuffer has been read.
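The reading scheme above can be sketched in Java. This is a minimal illustration, not the patent's actual implementation: the class and method names are hypothetical, and a `java.nio` direct buffer stands in for the DirectBuffer. Each value's start offset is recorded in an address index array, and a value is recovered by reading from its offset up to the next offset (or the end of the written region for the last value).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: values are appended to a direct buffer and their start
// offsets recorded in an address index array; the reader recovers each value
// by reading from its offset up to the next offset (or the end of the data).
public class AddressIndexDemo {

    // Write the values into the buffer; return the start offset of each value.
    static int[] store(ByteBuffer buf, List<String> values) {
        int[] index = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            index[i] = buf.position();              // record the storage address
            buf.put(values.get(i).getBytes(StandardCharsets.UTF_8));
        }
        return index;
    }

    // Read value i: from its indexed address up to the next address, or up to
    // the end of the written region for the last value.
    static String read(ByteBuffer buf, int[] index, int i, int dataEnd) {
        int start = index[i];
        int end = (i + 1 < index.length) ? index[i + 1] : dataEnd;
        byte[] out = new byte[end - start];
        ByteBuffer dup = buf.duplicate();           // independent read position
        dup.position(start);
        dup.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(1024); // like a DirectBuffer
        List<String> values = List.of("42", "1729", "7");
        int[] index = store(direct, values);
        int dataEnd = direct.position();
        List<String> recovered = new ArrayList<>();
        for (int i = 0; i < index.length; i++) {
            recovered.add(read(direct, index, i, dataEnd));
        }
        System.out.println(recovered); // [42, 1729, 7]
    }
}
```

Because values are variable-length text, the index array is what makes each value recoverable without delimiters, which matches the role the patent assigns to the metadata.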
  • The data preprocessor converts the data of the data set, according to the preset analytic function, into the data format specified by that function, which supports logical operations.
  • The data stored in the DirectBuffer's data set is generally in a format that cannot be operated on logically and must be converted into an operable format before being transferred to the GPU. Therefore, the analytic function is preset in the MapReduce architecture, and the data preprocessor automatically converts the data into the logically operable format that the function specifies.
  • The data format specified by the preset analytic function may be the format required by the GPU's logical operations, for example integer, floating-point, or string data.
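A minimal sketch of such a conversion, under assumptions not stated in the patent: the "analytic function" is modeled as an ordinary Java function that parses decimal text into 32-bit integers, and the converted values are packed into a fixed-width buffer on which arithmetic becomes possible. The names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.function.Function;

// Hypothetical sketch of a preset "analytic function": raw values arrive as
// text (not directly usable in arithmetic) and the function converts them
// into a fixed-width binary format that supports logical operations.
public class AnalyticFunctionDemo {

    // Preset analytic function: parse decimal text into a 32-bit integer.
    static final Function<String, Integer> PARSE_INT = Integer::parseInt;

    // Convert a batch of text values into a packed block of ints.
    static ByteBuffer toDataBlock(String[] textValues, Function<String, Integer> fn) {
        ByteBuffer block = ByteBuffer.allocate(4 * textValues.length);
        for (String v : textValues) {
            block.putInt(fn.apply(v)); // text -> int: arithmetic is now possible
        }
        block.flip(); // make the block readable from the start
        return block;
    }

    public static void main(String[] args) {
        ByteBuffer block = toDataBlock(new String[] {"10", "-3", "256"}, PARSE_INT);
        System.out.println(block.getInt() + block.getInt() + block.getInt()); // 263
    }
}
```

The same shape works for floating-point or string targets by swapping the function and the element width, which is the flexibility the preset analytic function is meant to give the programmer.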
  • The data preprocessor then generates a data block from the format-converted data set. After each piece of data has been automatically converted into a logically operable format according to the preset analytic function, the converted data set is assembled into a data block to facilitate the subsequent splicing of data from the CPU to the GPU.
  • The data preprocessor stores the data block in a LaunchingBuffer, so that the data splicer reads the data block from the LaunchingBuffer and splices it to the GPU.
  • The CPU additionally allocates a LaunchingBuffer in memory to temporarily store format-converted data blocks; the data preprocessor stores each data block in the LaunchingBuffer, and the data splicer then completes reading the data block from the LaunchingBuffer and splicing it to the GPU.
  • The data stored in the CPU's DirectBuffer and the data to be processed by the GPU may use inconsistent storage formats, that is, they may handle endianness differently. In the little-endian storage format, the high-order bytes of a value are stored at the higher addresses and the low-order bytes at the lower addresses; in the big-endian storage format, the high-order bytes are stored at the lower addresses and the low-order bytes at the higher addresses. The data preprocessor therefore also needs to resolve the endianness of the data block.
  • The DirectBuffer allocated by the CPU has a member variable indicating whether data is stored in it in big-endian or little-endian format; this also indicates whether the storage format needs to be converted when the data is stored into the LaunchingBuffer, and gives the target of the conversion (big-endian to little-endian, or the reverse). For example, if the data of the data set is stored in the DirectBuffer in big-endian format while the GPU stores data in little-endian format, then when the data block is stored into the LaunchingBuffer its data is converted to, and saved in, little-endian format.
  • The data splicer can then read the data block directly from the LaunchingBuffer and splice it to the GPU. Keeping the storage formats of the CPU's LaunchingBuffer and the GPU consistent ensures that the GPU reads data blocks correctly for processing and avoids arithmetic errors caused by reading the high-order bytes of a value as the low-order bytes, or the reverse.
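The endianness fix-up described above can be sketched with `java.nio`, whose `ByteOrder` machinery handles exactly this case. The sketch is illustrative only (class and method names are hypothetical): each 32-bit value is read in the source buffer's byte order and rewritten in the target order while being copied.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: if the CPU-side buffer stores values big-endian but the
// GPU expects little-endian, the preprocessor rewrites each value in the GPU's
// byte order while copying it toward the launch buffer.
public class EndianDemo {

    static ByteBuffer convert(ByteBuffer src, ByteOrder targetOrder) {
        ByteBuffer dst = ByteBuffer.allocate(src.remaining()).order(targetOrder);
        ByteBuffer in = src.duplicate().order(src.order()); // keep source order
        while (in.remaining() >= 4) {
            dst.putInt(in.getInt()); // value re-encoded in the target byte order
        }
        dst.flip();
        return dst;
    }

    public static void main(String[] args) {
        ByteBuffer big = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN);
        big.putInt(0x01020304);
        big.flip();
        ByteBuffer little = convert(big, ByteOrder.LITTLE_ENDIAN);
        // Same logical value, opposite byte layout in memory.
        System.out.printf("%02x %02x%n", little.get(0), little.get(3)); // 04 01
    }
}
```

Note that `ByteBuffer.duplicate()` resets the byte order to big-endian, so the sketch restores the source order explicitly; missing that step is a classic source of exactly the high-byte/low-byte confusion the patent warns about.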
  • In summary, the data preprocessor first reads the address index array from the DirectBuffer, reads the corresponding data of the data set from the DirectBuffer according to the array's data elements, converts the data format according to the preset analytic function so that the converted data supports logical operations, generates a data block from the converted data set, and stores it in the LaunchingBuffer, from which the data splicer reads the data block and transfers it to the GPU.
  • In this embodiment of the invention, these steps are completed by the data preprocessor in the CPU, and the data is parsed automatically by the preset analytic function, which facilitates the GPU's operations on the data block; using the data preprocessor simplifies the programming work for the slave node device and benefits future optimization.
  • The CPU automatically allocates and reclaims the DirectBuffer and the LaunchingBuffer; the life cycle of the DirectBuffer is the processing time of one data fragment, and the life cycle of the LaunchingBuffer is the processing time of one data set.
  • A ResultBuffer is also allocated on the CPU to store the operation results returned by the GPU, which are then used as the input of the Reduce task in MapReduce.
  • Another aspect of the present invention provides a data processing method, including:
  • The data splicer reads, from the second buffer of the CPU, a data block generated by the data preprocessor.
  • This embodiment of the present invention is applied to a Hadoop cluster under a MapReduce architecture. The Hadoop cluster includes a master node device and a slave node device; the slave node device includes a CPU and a GPU and obtains data fragments from the master node device; and a data preprocessor and a data splicer are provided in the CPU.
  • The data preprocessor reads the data of the data set from the first buffer of the CPU, converts its data format, generates a data block from the converted data set, and stores the data block in the second buffer.
  • the data splicer mainly completes the splicing of data blocks from the CPU to the GPU.
  • The data splicer splices the data block into a working buffer that the GPU has allocated for storing data blocks.
  • The data splicer reads the data block from the second buffer of the CPU and splices it into the GPU's working buffer.
  • Data splicing is completed by the data splicer and no longer depends on code written by the programmer, which simplifies the programmer's work and facilitates subsequent optimization of the entire MapReduce architecture.
  • In this embodiment, a data processing method may include:
  • The data splicer reads a data block from the LaunchingBuffer.
  • The CPU also allocates a LaunchingBuffer in memory, which is mainly used to store the data blocks that need to be spliced to the GPU.
  • S420: The data splicer splices the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address, within the WorkingBuffer that the GPU has allocated for storing data blocks, at which the data block is to be stored.
  • A WorkingBuffer is allocated in the GPU's memory, mainly to store the data spliced from the CPU's LaunchingBuffer. The memory size of the WorkingBuffer is determined by the GPU itself, while the memory size of the DirectBuffer in the CPU is determined by the Java runtime environment; generally, the WorkingBuffer on the GPU is much larger than the DirectBuffer that Java supports in the CPU. The WorkingBuffer can therefore store at least one data block obtained from the DirectBuffer, but at some point its remaining memory may no longer be able to hold the next data block, a situation the data splicer handles correctly.
  • The data splicer manages a cursor parameter that indicates the start address at which the WorkingBuffer can store data. After each data block is spliced into the WorkingBuffer, the cursor parameter is updated accordingly, so that the next start address is always accurate; when a data block needs to be transferred to the WorkingBuffer, it is spliced in starting from the start address indicated by the cursor parameter.
  • the data in the data block read by the data splicer from the LaunchingBuffer can be directly logically operated, and meets the storage format requirement of the GPU for the data. Invoking an application programming interface (API) to stitch data in the data block to WorkingBuffer. If the remaining memory of the WorkingBuffer can be spliced out of the data block read from the CPU's LaunchingBuffer, the entire data block is spliced into the WorkingBuffer; if the remaining memory of the WorkingBuffer cannot be spliced, the data block read from the CPU's LaunchingBuffer is suspended. The data block is spliced, the data block is still stored in the LaunchingBuffer, and the GPU is triggered to start processing all the data blocks in the WorkingBuffer.
  • the data splicer in the CPU is used to handle the splicing of data blocks when the size of the DirectBuffer in the CPU and the remaining memory of the WorkingBuffer in the GPU are inconsistent.
  • the data splicer splices the data block from the LaunchingBuffer directly into the WorkingBuffer. If the remaining memory of the WorkingBuffer cannot hold the data block, the splicing operation is temporarily stopped; once the remaining memory of the WorkingBuffer can again hold it, the data block read into the LaunchingBuffer is spliced into the WorkingBuffer. Since the data blocks in the LaunchingBuffer already meet the GPU's requirements for data processing, the GPU can perform arithmetic processing directly after receiving a data block, which effectively improves the working efficiency of the GPU.
  • the data splicer notifies the GPU of the size of the data block.
  • the data splicer updates the cursor parameter.
  • after each data block is successfully spliced to the GPU, the data splicer notifies the GPU of the data block size, so that the GPU can use it directly, which reduces the GPU's workload.
  • the address index array is used to indicate the storage addresses at which the data is stored in the DirectBuffer.
  • a lookup index array may likewise be added to the header of a data block in the WorkingBuffer of the GPU; the lookup index array contains a data element corresponding to each datum of the data block, and each data element is used to indicate the storage address of its datum in the WorkingBuffer.
  • each time the data splicer splices a data block, the data element corresponding to each datum of the data block is added to the lookup index array, so that the GPU can quickly locate the data in the WorkingBuffer and read it for operation.
  • steps B1 and B2 are in no particular order and are not limited herein.
  • each data fragment received may eventually generate multiple data blocks.
  • the WorkingBuffer allocated in the GPU stores data in units of data blocks, and its lifetime is the processing time of one data fragment. After the data splicer successfully transmits an entire data fragment, it returns a transmission-success flag value to notify the master node device to allocate the next data fragment; if the data splicer fails to transmit the data fragment, it returns a transmission-failure flag value to notify the master node device to suspend allocation of the next data fragment.
  • the ResultBuffer is also allocated in the GPU memory and is used to save the operation result; the API is then called to return the operation result to the CPU, where it is stored in the ResultBuffer allocated by the CPU as an input of the Reduce task under MapReduce.
  • the DirectBuffer used to store the data set in the CPU, the LaunchingBuffer that stores data blocks after data format conversion, and the ResultBuffer used to store the results returned by the GPU are automatically allocated and reclaimed by the CPU; the life cycle of the DirectBuffer is the processing time of one data fragment, and the life cycle of the LaunchingBuffer is the processing time of one data block.
  • the WorkingBuffer used to store received data blocks in the GPU and the ResultBuffer storing the operation results are automatically allocated and reclaimed by the GPU; the life cycle of the WorkingBuffer is the processing time of one data fragment, and the ResultBuffer has the same life cycle as the WorkingBuffer.
  • the buffers in the CPU and the GPU are automatically synchronized; for example, the ResultBuffer in the CPU is synchronized with the WorkingBuffer and the ResultBuffer in the GPU.
  • an embodiment of the present invention further provides a data preprocessor 500, which may include:
  • a first reading unit 510, configured to read metadata from a first buffer of the CPU, where, when the data set acquired from the data fragment is stored into the first buffer, metadata is added for the data set at the header of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer;
  • a second reading unit 520, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;
  • a converting unit 530, configured to convert the data of the data set into the data format indicated by a preset analytic function according to the preset analytic function, and to generate a data block from the converted data set;
  • a storage unit 540, configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • the embodiment of the present invention is applied to a Hadoop cluster under the MapReduce architecture. The data preprocessor 500 is disposed in the CPU of a slave node device of the Hadoop cluster, and the CPU is further provided with a data splicer; each slave node device also has a GPU. The slave node device acquires data fragments from the master node device of the Hadoop cluster, then splices the values of the key-value pairs in a data fragment into a data set and stores it into the first buffer allocated in the CPU memory.
  • because the memory of the first buffer may not be able to store the values of all the key-value pairs in the data fragment at once, the values of the key-value pairs in the data fragment may be spliced into data sets multiple times.
  • when a data set is stored into the first buffer, metadata is added for the data set at the header of the first buffer, the metadata mainly including the storage addresses of the data of the data set in the first buffer.
  • the metadata is read from the first buffer by the first reading unit 510; the second reading unit 520 then reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; the converting unit 530 performs data format conversion on the data and generates a data block from the entire converted data set.
  • the storage unit 540 stores the data block in the second buffer of the CPU, which is allocated in memory by the CPU mainly to store data blocks, so that the data splicer can read the data block from the second buffer and transfer it to the working buffer of the GPU.
  • the data preprocessor thus automatically completes the reading of the data and the conversion of the data format; the programmer does not need to write a corresponding program, which reduces the programmer's programming work, facilitates subsequent optimization of the MapReduce architecture, and improves the working efficiency of the CPU.
  • the metadata specifically includes an address index array; the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element is used to indicate the storage address of its datum in the first buffer. As shown in FIG. 5-b, the second reading unit 520 may include:
  • a data reading unit 5210, configured to start reading at the storage address in the first buffer indicated by a data element of the address index array, and to stop reading at the storage address indicated by the next data element or at the end of the first buffer.
  • the data reading unit 5210 reads the corresponding data starting from the storage address in the first buffer indicated by a data element of the address index array, and stops when it reaches the storage address indicated by the next data element or the end of the first buffer, thereby reading one datum of the data set; it then continues with the next datum until all the data of the data set in the first buffer have been read.
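The reading rule just described — start at the address a data element points to, stop at the next element's address or at the end of the buffer — can be sketched as follows. A heap `ByteBuffer` stands in for the first buffer (DirectBuffer), and the names (`AddressIndexReader`, `readDatum`) are hypothetical, not from the patent.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of reading one datum from the first buffer using the
// address index array: datum i occupies [index[i], index[i+1]) in the buffer,
// and the last datum runs to the end of the stored data.
class AddressIndexReader {
    static byte[] readDatum(ByteBuffer firstBuffer, int[] addressIndex, int i) {
        int start = addressIndex[i];
        int end = (i + 1 < addressIndex.length) ? addressIndex[i + 1]
                                                : firstBuffer.limit();
        byte[] out = new byte[end - start];
        for (int k = 0; k < out.length; k++) {
            out[k] = firstBuffer.get(start + k);   // absolute get, no position change
        }
        return out;
    }
}
```

Iterating `i` over the whole address index array reads the data set datum by datum, exactly as unit 5210 does.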
  • the converting unit 530 includes:
  • a data format conversion unit 5310, configured to convert, by using the preset analytic function, the data of the data set into a data format that satisfies the logical operation specified by the analytic function;
  • a generating unit 5320, configured to generate the data block from the converted data set.
  • the data format specified by the preset analytic function may be the data format required by the logical operations of the GPU; the data format on which logical operations can be performed may be integer data, floating-point data, string data, or the like.
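As a concrete illustration of such a preset analytic function, the sketch below (hypothetical names, a minimal assumption-laden example rather than the patented code) parses string values taken from key-value pairs into integer data on which logical and arithmetic operations can be performed directly.

```java
// Hypothetical sketch of a preset analytic function: the raw values of the
// key-value pairs are strings, and the analytic function specifies that
// logical operations require integer data, so each datum is converted
// before the data block is generated.
class IntAnalyticFunction {
    static int[] toDataBlock(String[] dataSet) {
        int[] block = new int[dataSet.length];
        for (int i = 0; i < dataSet.length; i++) {
            block[i] = Integer.parseInt(dataSet[i].trim());  // format conversion
        }
        return block;
    }
}
```

A floating-point or string analytic function would follow the same shape, differing only in the target type of the conversion.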
  • the converting unit 530 may further include:
  • a format conversion unit 5330, configured to convert the data in the data block into the storage format of the GPU when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU.
  • the storage formats of the first buffer of the CPU and of the GPU may be inconsistent, that is, they may handle the big-endian/little-endian byte-order problem differently. In the little-endian storage format, the high-order bytes of a datum are stored at the higher addresses and the low-order bytes at the lower addresses; in the big-endian storage format, the high-order bytes of a datum are stored at the lower addresses and the low-order bytes at the higher addresses.
  • the first buffer allocated by the CPU has its own member variable indicating whether the data is stored in the first buffer in big-endian or little-endian format; it also indicates whether the storage format needs to be converted when the data is stored into the second buffer, and prompts whether the conversion should be to big-endian or to little-endian format. For example, if the data of the data set is stored in the first buffer in big-endian format while the GPU stores data in little-endian format, the format conversion unit 5330 converts the data block into little-endian format and stores it into the second buffer.
  • in this way the data splicer can directly read the data block from the second buffer and splice it to the GPU; ensuring that the second buffer of the CPU and the GPU store data in the same format ensures that the GPU can correctly read the data block for arithmetic processing, avoiding operation errors caused by reading the high-order bytes of a datum as its low-order bytes or vice versa.
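The endianness concern above maps directly onto Java's `ByteBuffer` byte-order API. The following minimal sketch (hypothetical names; it assumes the block is a sequence of 4-byte integers) shows how data written in big-endian format can be re-stored in little-endian format before being handed to a little-endian consumer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of the format conversion unit: data arrives in
// big-endian order and is rewritten into little-endian order.
// Assumes the input length is a multiple of 4 (a sequence of ints).
class EndianConverter {
    static byte[] toLittleEndian(byte[] bigEndianInts) {
        ByteBuffer src = ByteBuffer.wrap(bigEndianInts).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer dst = ByteBuffer.allocate(bigEndianInts.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        while (src.hasRemaining()) {
            dst.putInt(src.getInt());   // read big-endian, write little-endian
        }
        return dst.array();
    }
}
```

The big-endian bytes `00 00 00 01` (the int 1) come out as `01 00 00 00`, which is the byte order a little-endian reader expects.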
  • an embodiment of the present invention further provides a data splicer 600, which may include:
  • a third reading unit 610, configured to read, from a second buffer of the CPU, a data block generated by the data preprocessor;
  • a splicing processing unit 620, configured to splice the data block into the working buffer of the GPU allocated to store data blocks.
  • the embodiment of the present invention is applied to a Hadoop cluster under the MapReduce architecture. The data splicer 600 is disposed in the CPU of a slave node device of the Hadoop cluster, and the CPU is further provided with a data preprocessor 500 as shown in FIG. 5-a.
  • each slave node device further includes a GPU. The slave node device acquires data fragments from the master node device of the Hadoop cluster, then splices the values of the key-value pairs in a data fragment into a data set and stores it into the first buffer allocated in the CPU memory; because the memory of the first buffer may not be able to store the values of all the key-value pairs in the data fragment at once, the values of the key-value pairs in the data fragment may be spliced into data sets multiple times.
  • the data preprocessor 500 reads data from the first buffer according to the metadata, converts the data format, generates a data block from the entire converted data set, and stores the data block into the second buffer in the CPU. The third reading unit 610 of the data splicer then reads the data block from the second buffer of the CPU, and the splicing processing unit 620 splices the read data block into the working buffer of the GPU allocated to store data blocks.
  • because the data preprocessor 500 completes the data format conversion and the data splicer completes the data block splicing, there is no longer any reliance on the programmer to write a corresponding program, which simplifies the programmer's work; the automatic operation of the data preprocessor 500 and the data splicer 600 can improve the working efficiency of the CPU and is also beneficial to subsequent optimization of MapReduce.
  • the data splicer 600 manages a cursor parameter that indicates the starting address at which the working buffer of the GPU can store data. After each data block is spliced into the working buffer of the GPU, the cursor parameter is updated accordingly, so that the starting address at which the GPU's working buffer can store data is known exactly the next time; the splicing processing unit 620 splices the data block into the working buffer of the GPU starting from the starting address indicated by the cursor parameter.
  • the splicing processing unit 620 is specifically configured to splice the data block starting from the starting address indicated by the cursor parameter, where the cursor parameter is used to indicate the starting address, in the working buffer of the GPU allocated to store data blocks, that is available for storing a data block.
  • the data splicer further includes:
  • a trigger processing unit 630, configured to: when the data splicer fails to splice the data block into the working buffer of the GPU allocated to store data blocks, suspend splicing of the data block and trigger the GPU to process the data blocks stored in the working buffer.
  • the data in the data block read by the third reading unit 610 of the data splicer 600 from the second buffer can be logically operated on directly and meets the GPU's storage format requirements for the data.
  • the API is called to splice the data in the data block into the working buffer of the GPU. If the remaining memory of the GPU's working buffer can hold the data block read from the second buffer of the CPU, the entire data block is spliced into the working buffer of the GPU; if it cannot, that is, when splicing the data block fails, splicing of the data block is suspended, the data block remains stored in the second buffer, and the trigger processing unit 630 triggers the GPU to start processing all the data blocks in the working buffer.
  • the data splicer 600 may further include:
  • a notification unit 640, configured to notify the GPU of the size of the data block;
  • an updating unit 650, configured to update the cursor parameter.
  • after each data block is successfully spliced to the GPU, the notification unit 640 notifies the GPU of the data block size, so that the GPU can use it directly, which reduces the GPU's workload; the updating unit 650 then updates the cursor parameter.
  • the embodiment of the present invention provides a processor 700, which includes a data preprocessor 500 as shown in FIG. 5-a and a data splicer 600 as shown in FIG. 6-a; the data preprocessor 500 and the data splicer 600 have been introduced above and will not be described again here.
  • the first buffer and the second buffer are automatically allocated and reclaimed in the CPU.
  • the life cycle of the first buffer is the processing time of one data fragment
  • the life cycle of the second buffer is the processing time of one data block.
  • the working buffer is automatically allocated in the GPU, and the life cycle of the working buffer is the processing time of one data fragment.
  • an embodiment of the present invention further provides a slave node device, which may include:
  • processor CPU-700 as shown in FIG. 7 above, and a graphics processor GPU-800;
  • the CPU-700 is as described above and will not be described here.
  • the data preprocessor in the CPU-700 is configured to perform data format conversion on a data set obtained from a data fragment and to generate a data block from the converted data set; the data splicer in the CPU-700 splices the data block into the working buffer of the GPU-800 allocated to store data blocks;
  • the GPU-800 is configured to process the data block to obtain a processing result and then return the processing result to the CPU-700.
  • a ResultBuffer is automatically allocated and reclaimed in the CPU-700, and a ResultBuffer is automatically allocated and reclaimed in the GPU-800; the ResultBuffer in the CPU-700 has the same lifetime as the ResultBuffer in the GPU-800, and both are used to store the operation result.
  • the first buffer allocated by CPU-700 is DirectBuffer
  • the second buffer is LaunchingBuffer
  • the working buffer allocated by GPU-800 is WorkingBuffer
  • Figure 8-b is a schematic diagram of the interaction between the CPU-700 and the GPU-800 in the slave node device provided by the embodiment of the present invention. As shown in FIG. 8-b:
  • the data preprocessor 500 and the data splicer 600 are set in the CPU-700.
  • DirectBuffer, LaunchingBuffer and ResultBuffer are allocated in the CPU-700.
  • the DirectBuffer stores a data set that needs data format conversion; the data set includes data composed of the values of key-value pairs, and metadata is added in the DirectBuffer.
  • the metadata mainly includes the storage addresses of the data of the data set in the DirectBuffer; the data preprocessor 500 can read the data of the data set from the DirectBuffer according to the metadata, automatically perform data format conversion on the data through the specified preset analytic function, and generate a data block from the converted data set.
  • the data preprocessor 500 stores the data block into the LaunchingBuffer. If the storage format of the data in the data block needs to be converted when it is stored into the LaunchingBuffer, the storage format is converted to ensure that the data storage format in the LaunchingBuffer is the same as that of the WorkingBuffer in the GPU-800.
  • the data splicer 600 splices the data block from the LaunchingBuffer into the WorkingBuffer in the GPU-800. If the splicing fails, that is, the WorkingBuffer can no longer store the data block, the GPU is triggered to perform operation processing on the data blocks stored in the WorkingBuffer; the GPU stores the operation result in its ResultBuffer, and the API is called to transfer the operation result to the ResultBuffer in the CPU.
  • an embodiment of the present invention further provides a data processing device, which may include: a memory 910 and at least one processor 920 (taking one processor in FIG. 9 as an example).
  • the memory 910 and the processor 920 may be connected by a bus or other means, wherein FIG. 9 is exemplified by a bus connection.
  • the processor 920 may perform the following steps: the data preprocessor reads metadata from a first buffer of the CPU, where, when a data set obtained from a data fragment is stored into the first buffer, metadata is added for the data set at the header of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer; the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; the data preprocessor converts the data of the data set into the data format indicated by a preset analytic function according to the preset analytic function, generates a data block from the converted data set, and stores it in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • the data splicer reads, from the second buffer of the CPU, a data block generated by the data preprocessor; the data splicer splices the data block into the working buffer of the GPU allocated to store data blocks.
  • the processor 920 may further perform the following step: the data preprocessor starts reading at the storage address in the first buffer indicated by a data element of the address index array, and stops reading at the storage address indicated by the next data element or at the end of the first buffer.
  • the processor 920 may further perform the following step: the data preprocessor converts, according to the preset analytic function, the data of the data set into the data format specified by the analytic function that satisfies the logical operation.
  • the processor 920 may also perform the step of: the data preprocessor converting the data in the data block into the storage format of the GPU.
  • the processor 920 may further perform the following step: when the data splicer fails to splice the data block into the working buffer of the GPU allocated to store data blocks, splicing of the data block is suspended and the GPU is triggered to process the data blocks stored in the working buffer.
  • the processor 920 may further perform the step of: the data splicer splicing the data block starting from the starting address indicated by a cursor parameter, the cursor parameter being used to indicate the starting address, in the working buffer of the GPU allocated to store data blocks, that is available for storing a data block.
  • the processor 920 may further perform the following steps: the data splicer notifies the GPU of the size of the data block; the data splicer updates the cursor parameter.
  • the memory 910 can be used to store data sets, metadata, and data blocks;
  • the memory 910 can also be used to store an array of address indices.
  • the memory 910 can also be used to store cursor parameters.
  • the memory 910 can also be used to store the results of the operations.

Abstract

A data processing method and a related device implement automatic data format conversion and automatic data splicing in a slave node device of a Hadoop cluster. The method mainly comprises: a data preprocessor reads metadata from a first buffer of a CPU, reads the data of a data set from the first buffer on the basis of the storage addresses indicated by the metadata, converts the data of the data set, on the basis of a preset analytic function, into the data format indicated by the preset analytic function, and stores the data block generated from the converted data set in a second buffer of the CPU, thus allowing a data splicer to read the data block from the second buffer and splice it to a GPU.

Description

Data processing method and related equipment

Technical field

The present invention relates to the field of information processing technologies, and in particular to a data processing method and related equipment.
Background technique

Together with cloud computing, big data has brought a new revolution to information technology (IT). Cloud computing has powerful big-data computing capability and very fast computing speed, but the transmission of big data has become a major problem.

MapReduce (for which there is no unified Chinese translation in the field) is a well-known cloud computing architecture provided by the search engine company Google for parallel computation on large-scale data sets (larger than 1 TB), and Hadoop (for which there is likewise no unified Chinese translation) is a concrete implementation of the MapReduce architecture; a Hadoop cluster is divided into master node devices and slave node devices. The master node device uses the Map function provided by MapReduce to split a data set by size into M data fragments and distributes the data fragments to multiple slave node devices for parallel processing. Specifically, each slave node device obtains the values of key-value pairs from its data fragment and splices the values into a buffer allocated by the slave node device's central processing unit (CPU); afterwards, the values of the key-value pairs are read from the buffer and parsed, for example by converting their data format, and the parsed values are then spliced through an application programming interface (API) into a buffer allocated for storing data by the slave node device's graphics processing unit (GPU), where the GPU performs the computation.

When implementing the above solution, the inventors found that, because the MapReduce architecture provides no analytic function, parsing the values of the key-value pairs depends on a corresponding program written by the programmer; at the same time, because the size of the buffer in which the CPU stores the values of the key-value pairs may be inconsistent with the size of the buffer the GPU allocates for storing data, and the MapReduce architecture provides no corresponding judgment method, judging whether the CPU and GPU buffers are consistent likewise depends on a corresponding judgment function written by the programmer, which reduces the execution efficiency of the slave node device.
Summary of the invention

In view of the above drawbacks, the embodiments of the present invention provide a data processing method and related device, applied to a Hadoop cluster under the MapReduce architecture, which can improve the working efficiency of the slave node devices in the Hadoop cluster, simplify the programmer's programming work, and facilitate subsequent optimization of the MapReduce architecture.
In a first aspect, the present invention provides a data processing method applied to a Hadoop cluster under the MapReduce architecture, where the Hadoop cluster includes a master node device and a slave node device, the slave node device includes a processor (CPU) and a graphics processor (GPU), the slave node device obtains data fragments from the master node device, and the CPU is provided with a data preprocessor and a data splicer. The method includes:

the data preprocessor reads metadata from a first buffer of the CPU, where, when a data set obtained from a data fragment is stored into the first buffer, metadata is added for the data set at the header of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer;

the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata;

the data preprocessor converts the data of the data set into the data format indicated by a preset analytic function according to the preset analytic function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
With reference to the first aspect, in a first possible implementation, the metadata specifically includes an address index array; the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element is used to indicate the storage address of its datum in the first buffer. Accordingly, the data preprocessor reading the data of the data set from the first buffer according to the storage addresses indicated by the metadata includes: the data preprocessor starts reading at the storage address in the first buffer indicated by a data element of the address index array, and stops reading at the storage address indicated by the next data element or at the end of the first buffer.

With reference to the first aspect, in a second possible implementation, converting the data of the data set into the data format indicated by the preset analytic function includes: the data preprocessor converts, according to the preset analytic function, the data of the data set into the data format specified by the analytic function that satisfies the logical operation.

With reference to the second possible implementation of the first aspect, in a third possible implementation, when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU, generating the data block from the converted data set is followed by: the data preprocessor converts the data in the data block into the storage format of the GPU.

With reference to the first aspect, or the first, second, or third possible implementation of the first aspect, in a fourth possible implementation, the data set is specifically composed of the spliced values of a plurality of key-value pairs in the data fragment.

With reference to the first aspect, or the first, second, or third possible implementation of the first aspect, in a fifth possible implementation, the first buffer and the second buffer are automatically allocated and reclaimed by the CPU; the life cycle of the first buffer is the processing time of one data fragment, and the life cycle of the second buffer is the processing time of one data set.
In a second aspect, the present invention provides a data processing method applied to a Hadoop cluster under the MapReduce architecture, where the Hadoop cluster includes a master node device and a slave node device, the slave node device includes a processor (CPU) and a graphics processor (GPU), the slave node device obtains data fragments from the master node device, and the CPU is provided with a data preprocessor and a data splicer. The method includes:

the data splicer reads, from a second buffer of the CPU, a data block generated by the data preprocessor;

the data splicer splices the data block into a working buffer of the GPU allocated to store data blocks.

With reference to the second aspect, in a first possible implementation, when the data splicer fails to splice the data block into the working buffer of the GPU allocated to store data blocks, splicing of the data block is suspended and the GPU is triggered to process the data blocks stored in the working buffer.

With reference to the second aspect, or the first possible implementation of the second aspect, in a second possible implementation, the data splicer splices the data block starting from the starting address indicated by a cursor parameter, where the cursor parameter is used to indicate the starting address, in the working buffer of the GPU allocated to store data blocks, that is available for storing a data block.

With reference to the second possible implementation of the second aspect, in a third possible implementation, after the data block is successfully spliced, the method further includes: the data splicer notifies the GPU of the size of the data block; the data splicer updates the cursor parameter.
A third aspect of the present invention provides a data preprocessor, including:
a first reading unit, configured to read metadata from a first buffer of the CPU, where, when a data set obtained from a data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer;
a second reading unit, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;
a conversion unit, configured to convert the data of the data set into the data format indicated by a preset parsing function, and generate a data block from the converted data set; and
a storage unit, configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.
With reference to the third aspect, in a first possible implementation, the metadata specifically includes an address index array, the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element indicates the storage address of a datum of the data set in the first buffer; accordingly, the second reading unit includes: a data reading unit, configured to start reading at the storage address in the first buffer indicated by a data element of the address index array, and stop reading at the storage address indicated by the next data element or at the end of the first buffer.
With reference to the third aspect, or the first possible implementation of the third aspect, in a second possible implementation, the parsing unit includes: a data format conversion unit, configured to convert the data of the data set, by means of the preset parsing function, into a data format specified by the parsing function on which logical operations can be performed; and a generation unit, configured to generate a data block from the converted data set.
With reference to the third aspect, in a third possible implementation, the parsing unit further includes: a format conversion unit, configured to convert the data in the data block into the storage format used in the GPU when the storage format used by the first buffer for the data of the data set is inconsistent with the storage format used for data in the GPU.
A fourth aspect of the present invention provides a data splicer, including:
a third reading unit, configured to read, from a second buffer of the CPU, a data block generated by the data preprocessor; and
a splicing processing unit, configured to splice the data block into a working buffer in the GPU allocated for storing data blocks.
With reference to the fourth aspect, in a first possible implementation, the data splicer further includes: a trigger processing unit, configured to, when the data splicer fails to splice the data block into the working buffer in the GPU allocated for storing data blocks, suspend the splicing of the data block and trigger the GPU to process the data blocks stored in the working buffer.
With reference to the fourth aspect, or the first possible implementation of the fourth aspect, in a second possible implementation, the splicing processing unit is specifically configured to splice the data block starting from a start address indicated by a cursor parameter, where the cursor parameter indicates the start address available for storing a data block in the working buffer of the GPU allocated for storing data blocks.
With reference to the second possible implementation of the fourth aspect, in a third possible implementation, the data splicer further includes: a notification unit, configured to notify the GPU of the size of the data block; and an update unit, configured to update the cursor parameter.
A fifth aspect of the present invention provides a processor, which may include the data preprocessor according to the third aspect and the data splicer according to the fourth aspect.
With reference to the fifth aspect, in a first possible implementation, the processor automatically allocates and reclaims the first buffer and the second buffer, where the lifetime of the first buffer is the processing time of one data slice, and the lifetime of the second buffer is the processing time of one data set.
A sixth aspect of the present invention provides a slave node device, which may include the CPU according to the fifth aspect and a graphics processing unit (GPU), where the data preprocessor in the CPU is configured to convert the data format of a data set obtained from a data slice and generate a data block from the format-converted data set, and the data splicer in the CPU splices the data block into a working buffer in the GPU allocated for storing data blocks; and the GPU is configured to process the data block to obtain a processing result and return the processing result to the CPU.
It can be seen from the above technical solutions that the embodiments of the present invention have the following advantages:
On one hand, in the embodiments of the present invention, a data preprocessor and a data splicer are provided in the slave node device. The data preprocessor reads metadata from the first buffer of the CPU; because the metadata is generated for the data set when the data set is stored into the first buffer and indicates the storage addresses of the data of the data set in the first buffer, the data preprocessor can read the data of the data set from the first buffer according to the metadata, convert the format of the data according to a preset parsing function, generate a data block from the converted data set, and store the data block in the second buffer of the CPU, so that the data splicer completes splicing the data block into the GPU. Compared with the prior art, because metadata including the storage addresses is added for the data of the data set when the data set is stored into the first buffer, the data preprocessor can read the data of the data set from the first buffer automatically, without relying on the programmer to write a corresponding program. Furthermore, the data preprocessor can parse the data of the data set according to the preset parsing function, which improves processing efficiency in the CPU and also facilitates subsequent optimization of the MapReduce architecture.
On the other hand, the data splicer reads the data block from the second buffer and splices it into the working buffer in the GPU allocated for storing data blocks. If the splicing fails, indicating that the remaining memory of that working buffer is insufficient to complete the splicing of the data block, the splicing of this data block is temporarily stopped, and the GPU is instead triggered to perform operations on the data blocks already buffered; the data block remains temporarily stored in the second buffer and is spliced at the next attempt. Compared with the prior art, no program written by the programmer is needed: the data splicer completes the data block splicing automatically, which effectively prevents data block loss and improves splicing efficiency.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the embodiments are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a data processing method according to another embodiment of the present invention;
FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a data processing method according to another embodiment of the present invention;
FIG. 5-a is a schematic structural diagram of a data preprocessor according to an embodiment of the present invention;
FIG. 5-b is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention;
FIG. 5-c is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention;
FIG. 5-d is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention;
FIG. 6-a is a schematic structural diagram of a data splicer according to an embodiment of the present invention;
FIG. 6-b is a schematic structural diagram of a data splicer according to another embodiment of the present invention;
FIG. 6-c is a schematic structural diagram of a data splicer according to another embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a processor according to an embodiment of the present invention;
FIG. 8-a is a schematic structural diagram of a slave node device according to an embodiment of the present invention;
FIG. 8-b is a schematic diagram of the interaction between the CPU and the GPU in a slave node device according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a data processing device according to an embodiment of the present invention.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention provide a data processing method and related devices, applied to a Hadoop cluster under the MapReduce architecture, that implement automatic data format conversion and automatic data splicing on a Hadoop slave node device, which simplifies the programmer's work and facilitates subsequent optimization of the MapReduce architecture.
As shown in FIG. 1, an aspect of the present invention provides a data processing method, including:
S110: A data preprocessor reads metadata from a first buffer of the CPU, where, when a data set obtained from a data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer.
The embodiments of the present invention are applied to a Hadoop cluster under the MapReduce architecture. The Hadoop cluster includes a master node device and a slave node device; the slave node device includes a CPU and a GPU and obtains data slices from the master node device; and a data preprocessor and a data splicer are provided in the CPU.
A first buffer is allocated in the CPU to store the data set obtained from a data slice. When the data set is stored into the first buffer, metadata is added for the data set at the head of the first buffer; the metadata mainly includes the storage addresses, in the first buffer, of the data of the data set.
S120: The data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata.
Because the metadata includes the storage addresses of the data set in the first buffer, the data preprocessor can read the data of the data set directly from the first buffer as indicated by the metadata, without relying on the programmer to write an additional program to read the data.
S130: The data preprocessor converts the data of the data set into the data format indicated by a preset parsing function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.
In addition, a parsing function is preset in the MapReduce architecture. The data preprocessor can parse the data of the data set in the first buffer according to the preset parsing function, convert the data into the data format corresponding to the preset parsing function, and then generate a data block from the converted data set. Meanwhile, a second buffer is also allocated in the CPU to store data blocks, and the data splicer can read the data block from the second buffer and splice it into the GPU.
In the embodiments of the present invention, because metadata is added for the data set when the data set is stored into the first buffer of the CPU, and the metadata includes the storage addresses of the data of the data set in the first buffer, the data preprocessor first reads the metadata from the first buffer, reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata, converts the data format by means of the preset parsing function, generates a data block from the format-converted data set, and stores it in the second buffer of the CPU. Reading the data of the first buffer and parsing the data are thus completed automatically by the data preprocessor, with no additional reliance on the programmer's code, which provides programmers with a more complete MapReduce framework and also facilitates subsequent optimization of the MapReduce architecture.
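For illustration only, the flow of S110 to S130 may be sketched as the following minimal Python simulation. The buffer layout (a metadata header of offsets followed by the raw bytes) and all function names are assumptions made for the example, not the actual implementation; the `parse` argument plays the role of the preset parsing function of S130.

```python
# Minimal sketch of the S110-S130 preprocessor flow (illustrative only).
# The first buffer is modeled as a dict whose "metadata" head holds the
# start offsets of each datum and whose "data" body holds the raw bytes.

def write_first_buffer(values):
    """Store a data set and prepend metadata (the offset of each datum)."""
    payload = b"".join(values)
    offsets, pos = [], 0
    for v in values:
        offsets.append(pos)
        pos += len(v)
    return {"metadata": offsets, "data": payload}  # buffer head + body

def preprocess(buffer, parse):
    """S110: read metadata; S120: read data by address; S130: convert."""
    offsets = buffer["metadata"]                 # S110: read the metadata
    data = buffer["data"]
    bounds = offsets + [len(data)]               # last datum ends at buffer end
    raw = [data[bounds[i]:bounds[i + 1]] for i in range(len(offsets))]  # S120
    return [parse(r) for r in raw]               # S130: format conversion

first_buffer = write_first_buffer([b"17", b"42", b"8"])
second_buffer = preprocess(first_buffer, parse=lambda b: int(b))
assert second_buffer == [17, 42, 8]              # data block, now operable
```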
It can be understood that, in the MapReduce architecture, a map function is specified to map input key-value pairs into new key-value pairs, and a concurrent reduce function is specified to ensure that all mapped key-value pairs sharing the same key are grouped and processed together. After the map function maps the input key-value pairs into new key-value pairs, the master node device in the Hadoop cluster divides all the new key-value pairs into different data slices according to data size, and assigns the data slices to the slave node devices for corresponding processing.
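As a purely illustrative sketch (a word-count style example, not part of the claimed method), the map and reduce stages described above can be written as:

```python
# Illustrative map/reduce pair in the style described above (word count);
# the real functions run inside Hadoop, this merely shows the key-value flow.
from collections import defaultdict

def map_fn(record):
    # map an input record into new key-value pairs
    for word in record.split():
        yield (word, 1)

def reduce_fn(key, values):
    # all mapped pairs sharing the same key are reduced together
    return (key, sum(values))

groups = defaultdict(list)
for key, value in map_fn("a b a"):
    groups[key].append(value)            # shuffle: group by key
result = dict(reduce_fn(k, vs) for k, vs in groups.items())
assert result == {"a": 2, "b": 1}
```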
In the CPU of the slave node device, the RecordReader class is called to obtain the key-value pairs in the data slice, and the values in the key-value pairs are extracted and concatenated into a data set. The CPU allocates a DirectBuffer in its memory for the data set, and the data set is stored into the DirectBuffer in the format required by the DirectBuffer; when the data set is stored into the DirectBuffer, metadata is added for the data set at the head of the DirectBuffer. Meanwhile, in the MapReduce architecture, a preset parsing function for parsing the data of the data set is provided; the preset parsing function converts the data into a specified data format on which logical operations can be performed. A data preprocessor is provided in the CPU, and the data preprocessor reads the data from the DirectBuffer according to the metadata and automatically converts the data format by means of the preset parsing function. The embodiment provided in FIG. 1 is described in detail below. Referring to FIG. 2, a data processing method may include:
S210: The data preprocessor reads metadata from the DirectBuffer, where the metadata specifically includes an address index array, the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element indicates the storage address of a datum of the data set in the DirectBuffer.
Specifically, when the data set is stored into the DirectBuffer, metadata is added at the head of the DirectBuffer to indicate the storage addresses of the data of the data set in the DirectBuffer. It can be understood that the metadata may include an address index array: when the data set is stored into the DirectBuffer, the storage address of each datum is added to the address index array according to its position in the DirectBuffer. The address index array has data elements in one-to-one correspondence with the data of the data set, each indicating the storage address of a datum in the DirectBuffer. Generally, the data stored in the DirectBuffer share the same data format, which may be a text, binary, or other format on which logical operations cannot be performed.
S220: The data preprocessor reads the data of the data set from the DirectBuffer according to the data elements of the address index array in the metadata.
Specifically, the data preprocessor starts reading from the storage address in the DirectBuffer indicated by a data element of the address index array, and stops at the storage address indicated by the next data element or at the end of the DirectBuffer, thereby obtaining one datum of the data set; it then continues with the next datum until all the data of the data set in the DirectBuffer have been read.
S230: The data preprocessor converts the data of the data set, according to the preset parsing function, into the data format specified by the preset parsing function on which logical operations can be performed.
The data stored in the DirectBuffer are generally in a data format on which logical operations cannot be performed, and must be converted into a format that supports logical operations before being transferred to the GPU for computation. Therefore, a parsing function is preset in the MapReduce architecture, and the data preprocessor automatically performs the data format conversion according to the preset parsing function, converting the data into the format, specified by the parsing function, on which logical operations can be performed.
Optionally, the data format specified by the preset parsing function may be the data format required by the GPU for its logical operations. Specifically, the formats supporting logical operations specified by the preset parsing function may include integer data, floating-point data, string data, and the like.
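For example, hypothetical preset parsing functions might convert the textual bytes held in the DirectBuffer into operable integer or floating-point values (the function names are illustrative, not defined by the framework):

```python
# Hypothetical preset parsing functions: convert text-format data (on which
# no arithmetic can be done) into formats that support logical operations.
def parse_int(raw: bytes) -> int:
    return int(raw.decode("ascii"))

def parse_float(raw: bytes) -> float:
    return float(raw.decode("ascii"))

assert parse_int(b"42") + 1 == 43                   # integer data, now operable
assert abs(parse_float(b"3.5") * 2 - 7.0) < 1e-9    # floating-point data
```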
S240: The data preprocessor generates a data block from the format-converted data set.
After the data preprocessor automatically converts each datum into the format specified by the preset parsing function on which logical operations can be performed, the format-converted data set is assembled into a data block to facilitate the subsequent splicing of data between the CPU and the GPU.
S250: The data preprocessor stores the data block in the LaunchingBuffer, so that the data splicer reads the data block from the LaunchingBuffer and splices it into the GPU.
Specifically, the CPU also allocates a LaunchingBuffer in memory to temporarily store the format-converted data block: the data preprocessor stores the data block in the LaunchingBuffer, and the data splicer then reads the data block from the LaunchingBuffer and splices it into the GPU.
It can be understood that the data stored in the DirectBuffer of the CPU and the data to be processed by the GPU may be inconsistent in storage format, that is, they may handle endianness differently. In the little-endian storage format, the high-order bytes of a datum are stored at the high addresses and the low-order bytes at the low addresses; in the big-endian storage format, the high-order bytes are stored at the low addresses and the low-order bytes at the high addresses. Therefore, the data preprocessor also needs to resolve the endianness of the data block.
A DirectBuffer allocated by the CPU carries a member variable indicating whether the data are stored in the DirectBuffer in big-endian or little-endian format, as well as an indication of whether the storage format needs to be converted when storing into the LaunchingBuffer, together with a hint of whether to convert to big-endian or little-endian. For example, if the data of the data set are stored in the DirectBuffer in big-endian format while the GPU stores data in little-endian format, then, when the data block is stored into the LaunchingBuffer, its data are stored in the LaunchingBuffer in little-endian format. The data splicer can then read the data block directly from the LaunchingBuffer and splice it into the GPU, ensuring that the LaunchingBuffer of the CPU and the GPU use the same storage format, so that the GPU can read the data block correctly for computation, avoiding errors caused by reading high-order bytes as low-order bytes or low-order bytes as high-order bytes.
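The big-endian to little-endian rewrite described in this example can be illustrated with Python's struct module; the buffer roles here are simulated assumptions:

```python
import struct

def to_gpu_order(big_endian_bytes: bytes) -> bytes:
    """Re-store a 32-bit integer little-endian, as a little-endian GPU expects
    (simulates the DirectBuffer -> LaunchingBuffer format conversion)."""
    (value,) = struct.unpack(">i", big_endian_bytes)  # read big-endian
    return struct.pack("<i", value)                   # write little-endian

direct_buffer = struct.pack(">i", 258)         # big-endian: 00 00 01 02
launching_buffer = to_gpu_order(direct_buffer)
assert launching_buffer == b"\x02\x01\x00\x00"
assert struct.unpack("<i", launching_buffer)[0] == 258  # GPU reads it correctly
```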
In this embodiment of the present invention, the data preprocessor first reads the address index array from the DirectBuffer, reads the corresponding data of the data set from the DirectBuffer according to the data elements of the address index array, and then performs the data format conversion on the data of the data set according to the preset parsing function, so that the converted data can be used in logical operations. The data set is assembled into a data block and stored in the LaunchingBuffer, from which the data splicer reads the data block and transfers it to the GPU. These operations are completed by the data preprocessor in the CPU on its own, with the data parsed automatically by the preset parsing function, which facilitates the GPU's computation on the data block; using the data preprocessor simplifies the programming work on the slave node device and facilitates later optimization.
The CPU automatically allocates and reclaims the WorkingBuffer and the LaunchingBuffer, where the lifetime of a WorkingBuffer is the processing time of one data slice, and the lifetime of a LaunchingBuffer is the time taken to process one data set. In addition, a ResultBuffer is allocated on the CPU to store the computation results returned by the GPU; these results then serve as the input of the Reduce task in MapReduce.
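The buffer lifetimes described above may be sketched, under assumed names and a much-simplified control flow, as:

```python
# Simulated buffer lifetimes: a LaunchingBuffer lives for one data set,
# a WorkingBuffer for one data slice, and a ResultBuffer collects the
# GPU results that later feed the Reduce task. All names are illustrative.

def process_slice(data_sets, gpu_compute):
    working_buffer = []                     # allocated once per data slice
    result_buffer = []
    for data_set in data_sets:
        launching_buffer = list(data_set)   # allocated per data set ...
        working_buffer.extend(launching_buffer)
        del launching_buffer                # ... reclaimed after the data set
    result_buffer.append(gpu_compute(working_buffer))
    del working_buffer                      # reclaimed with the data slice
    return result_buffer                    # input of the Reduce task

assert process_slice([[1, 2], [3]], sum) == [6]
```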
如图3所示,本发明实施例另一方面提供一种数据处理方法,包括:As shown in FIG. 3, another aspect of the present invention provides a data processing method, including:
S310、数据拼接器从CPU的第二缓冲区读取数据预处理器生成的数据块;S310. The data splicer reads the data block generated by the data preprocessor from the second buffer of the CPU.
本发明实施例应用于MapReduce架构下的Hadoop集群,该Hadoop集群中包括主节点设备和从节点设备,从节点设备包括处理器CPU和图形处理器GPU,该从节点设备从主节点设备获取数据分片,而在CPU中设置有数据预处理器和数据拼接器。The embodiment of the present invention is applied to a Hadoop cluster in a MapReduce architecture, where the Hadoop cluster includes a master node device and a slave node device, and the slave node device includes a processor CPU and a graphics processor GPU, and the slave node device obtains data points from the master node device. Slice, and a data preprocessor and data splicer are provided in the CPU.
其中,数据预处理器用于完成从CPU第一缓冲区读取数据集合的数据,将数据转换数据格式后,将这个数据集合生成数据块存储到第二缓冲区。而数据拼接器主要完成将数据块从CPU拼接到GPU。The data preprocessor is configured to read data of the data set from the first buffer of the CPU, convert the data into a data format, and store the data set generated data block into the second buffer. The data splicer mainly completes the splicing of data blocks from the CPU to the GPU.
S320、所述数据拼接器将所述数据块拼接到GPU中被分配存储数据块的工作缓冲区。 S320. The data splicer splices the data block into a working buffer of a GPU that is allocated a storage data block.
In this embodiment, the data splicer reads the data block from the second buffer of the CPU and splices it into the working buffer of the GPU. Because the splicing is completed by the data splicer rather than by programmer-written code, the programmer's work is simplified, which also benefits subsequent optimization of the whole MapReduce architecture.
The embodiment provided in FIG. 3 is described in detail below. As shown in FIG. 4, a data processing method may include:
S410. The data splicer reads a data block from the LaunchingBuffer.
The CPU also allocates a LaunchingBuffer in memory, which is mainly used to store the data blocks that need to be spliced to the GPU.
S420. The data splicer splices the data block starting from a start address indicated by a cursor parameter, where the cursor parameter indicates the start address, in the WorkingBuffer allocated in the GPU for storing data blocks, that is available for storing a data block.
A WorkingBuffer is allocated in GPU memory, mainly to store the data spliced over from the CPU's LaunchingBuffer. The size of the WorkingBuffer is determined by the GPU itself, whereas the size of the DirectBuffer in the CPU is determined by the Java runtime environment. Generally, the WorkingBuffer on the GPU is far larger than the Java-backed DirectBuffer on the CPU; the WorkingBuffer may therefore hold at least one data block derived from the DirectBuffer, and when some data block is to be stored, the remaining memory of the WorkingBuffer may be unable to hold it, in which case the data splicer handles that data block correctly.
Specifically, the data splicer manages a cursor parameter that indicates the start address at which the WorkingBuffer can store data. Each time a data block is spliced into the WorkingBuffer, the cursor parameter is updated accordingly, so that the next available start address in the WorkingBuffer is always known exactly. When a data block needs to be transferred to the WorkingBuffer, it is spliced into the WorkingBuffer starting from the start address indicated by the cursor parameter.
S430. When the data splicer fails to splice the data block into the WorkingBuffer, it suspends splicing the data block and triggers the GPU to process the data blocks stored in the WorkingBuffer.
The data in the data block that the data splicer reads from the LaunchingBuffer can be used directly in logical operations and already meets the GPU's storage format requirements. An application programming interface (API) is called to splice the data of the data block into the WorkingBuffer. If the remaining memory of the WorkingBuffer can accommodate the data block read from the CPU's LaunchingBuffer, the entire data block is spliced into the WorkingBuffer; if it cannot, splicing of that data block is suspended, the data block remains in the LaunchingBuffer, and the GPU is triggered to start processing all the data blocks in the WorkingBuffer.
In this embodiment, the data splicer in the CPU solves the data block splicing problem that arises when the size of the DirectBuffer in the CPU and the remaining memory of the WorkingBuffer in the GPU do not match. The data splicer splices data blocks from the LaunchingBuffer directly into the WorkingBuffer; if the remaining memory of the WorkingBuffer cannot hold a data block, that splicing operation is temporarily stopped, and once the remaining memory of the WorkingBuffer can again accommodate it, the data block is read from the LaunchingBuffer and spliced into the WorkingBuffer. Because the data blocks in the LaunchingBuffer already meet the GPU's data processing requirements, the GPU can perform operations directly upon receiving them, effectively improving the GPU's working efficiency.
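Assuming a fixed-size WorkingBuffer, the splice-or-suspend behavior of steps S420 and S430 can be sketched as follows (a plain-Python simulation; the actual implementation splices through a GPU API, and the `drain` method merely stands in for triggering the GPU and reclaiming the buffer):

```python
class Splicer:
    def __init__(self, working_buffer_size):
        self.working = bytearray(working_buffer_size)  # GPU-side WorkingBuffer
        self.cursor = 0    # start address available for the next block (S420)

    def splice(self, block: bytes) -> bool:
        """Splice one block at the cursor; False means the remaining memory
        cannot hold the block, so splicing must be suspended (S430)."""
        if self.cursor + len(block) > len(self.working):
            return False                   # block stays in the LaunchingBuffer
        self.working[self.cursor:self.cursor + len(block)] = block
        self.cursor += len(block)          # update the cursor parameter
        return True

    def drain(self) -> bytes:
        """Stand-in for triggering the GPU: process all stored blocks, reset."""
        processed = bytes(self.working[:self.cursor])
        self.cursor = 0
        return processed

s = Splicer(working_buffer_size=6)
assert s.splice(b"abcd")        # fits; the cursor moves to 4
assert not s.splice(b"efgh")    # 4 + 4 > 6: suspend, trigger the GPU
assert s.drain() == b"abcd"     # GPU processes what was stored
assert s.splice(b"efgh")        # the suspended block is spliced afterwards
```

The failed block is never copied partially: it stays whole in the LaunchingBuffer until the WorkingBuffer has been drained.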
It can be understood that, after successfully transferring a data block, the data splicer further performs the following steps:
B1. The data splicer notifies the GPU of the size of the data block.
B2. The data splicer updates the cursor parameter.
Each time a data block is successfully spliced to the GPU, the data splicer notifies the GPU of the data block's size; the GPU can use this size directly without computing it, which reduces the GPU's workload.
In addition, just as the address index array in the CPU's DirectBuffer indicates the storage addresses of data in the DirectBuffer, the GPU may add a lookup index array for the data blocks at the head of the WorkingBuffer. The lookup index array contains data elements in one-to-one correspondence with the data of the data block, and each data element indicates the storage address of a piece of data in the WorkingBuffer. After the data splicer splices over a data block, a data element is added to the lookup index array for each piece of data in that block, so that the GPU can subsequently locate and read data from the WorkingBuffer quickly for its operations.
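A minimal sketch of such a lookup index array, with byte offsets standing in for storage addresses (the layout is illustrative, not the patent's actual encoding):

```python
def append_block(working, index_array, block_data):
    """Splice one block and record, per datum, its address in the buffer."""
    for datum in block_data:
        index_array.append(len(working))   # storage address of this datum
        working.extend(datum)

working_buffer = bytearray()
lookup_index = []
append_block(working_buffer, lookup_index, [b"foo", b"quux"])
append_block(working_buffer, lookup_index, [b"ba"])
assert lookup_index == [0, 3, 7]           # one data element per datum
# each datum runs from its own address to the next address (or buffer end)
bounds = lookup_index + [len(working_buffer)]
data = [bytes(working_buffer[bounds[i]:bounds[i + 1]])
        for i in range(len(lookup_index))]
assert data == [b"foo", b"quux", b"ba"]
```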
Steps B1 and B2 above may be performed in either order, which is not limited herein.
Because each data slice received by the CPU may ultimately generate multiple data blocks, the WorkingBuffer allocated in the GPU stores data in units of data blocks, and its lifetime is the time taken to process one data slice. After the data splicer has transferred an entire data slice successfully, it returns a flag value indicating success, so that the master node device is notified to allocate the next data slice; if the data splicer fails to transfer the data slice, it returns a flag value indicating failure, so that the master node device is notified to suspend allocating the next data slice.
In addition, a ResultBuffer is likewise allocated in GPU memory to save the operation results; an API is then called to return the results to the CPU, where they are stored in the ResultBuffer allocated by the CPU as the input of the Reduce task under MapReduce.
The DirectBuffer used in the CPU to store data sets, the LaunchingBuffer used to store data blocks after format conversion, and the ResultBuffer used to store the operation results returned by the GPU are all automatically allocated and reclaimed by the CPU, where the lifetime of the LaunchingBuffer is the processing time of one data block. In the GPU, the WorkingBuffer used to store received data blocks and the ResultBuffer used to store operation results are automatically allocated and reclaimed by the GPU, where the lifetime of the WorkingBuffer is the processing time of one data slice and the lifetime of the ResultBuffer is the same as that of the WorkingBuffer. The buffers in the CPU and the GPU are synchronized automatically; for example, the ResultBuffer in the CPU is allocated and reclaimed in synchronization with the WorkingBuffer and the ResultBuffer in the GPU.
As shown in FIG. 5-a, an embodiment of the present invention further provides a data preprocessor 500, which may include:
a first reading unit 510, configured to read metadata from the first buffer of the CPU, where, when a data set obtained from a data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer;
a second reading unit 520, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;
a parsing unit 530, configured to convert the data of the data set into the data format indicated by a preset parse function and assemble the converted data set into a data block; and
a storage unit 540, configured to store the data block in the second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
This embodiment is applied to a Hadoop cluster under the MapReduce architecture. The data preprocessor 500 is disposed in the CPU of a slave node device of the Hadoop cluster, where the CPU is further provided with a data splicer and each slave node device further includes a GPU. The slave node device obtains data slices from the master node device of the Hadoop cluster and then splices the values of the key-value pairs in a data slice into data sets that are stored into the first buffer allocated in CPU memory. Because the first buffer may be unable to store the values of all the key-value pairs of a data slice at once, the values of the key-value pairs in a data slice may be spliced into data sets over multiple passes.
When a data set is stored into the first buffer, metadata is added for the data set at the head of the first buffer; the metadata mainly includes the storage addresses of the data of the data set in the first buffer. The first reading unit 510 then reads the metadata from the first buffer, the second reading unit 520 reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata, the parsing unit 530 performs data format conversion on the data and assembles the converted data set into a data block, and the storage unit 540 stores the data block into the second buffer of the CPU, which the CPU allocates in memory mainly for storing data blocks, so that the data splicer can read data blocks from the second buffer and transfer them to the working buffer of the GPU. In this embodiment, data reading and data format conversion are completed automatically by the data preprocessor, without the programmer having to write corresponding code, which reduces the programmer's work, benefits subsequent optimization of the MapReduce architecture, and improves the CPU's working efficiency.
Further, the metadata specifically includes an address index array, where the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element indicates the storage address of a piece of data of the data set in the first buffer. Accordingly, as shown in FIG. 5-b, the second reading unit 520 may include:
a data reading unit 5210, configured to start reading at the storage address in the first buffer indicated by a data element of the address index array, and to stop reading at the storage address indicated by the next data element or at the end of the first buffer.
Specifically, according to the storage address indicated by a data element of the address index array, the data reading unit 5210 reads from that storage address in the first buffer until the storage address indicated by the next data element, or until the end of the first buffer, thereby obtaining one piece of data of the data set; it then continues with the next piece of data until all the data of the data set in the first buffer has been read.
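The reading rule above, one datum per data element, bounded by the next element's address or the buffer end, can be sketched as follows (names and layout are illustrative):

```python
def read_data_set(first_buffer: bytes, address_index: list) -> list:
    """Read each datum from its data element's address up to the next
    element's address, or to the buffer end for the last element."""
    data = []
    for i, start in enumerate(address_index):
        if i + 1 < len(address_index):
            end = address_index[i + 1]   # stop at the next element's address
        else:
            end = len(first_buffer)      # last datum: stop at the buffer end
        data.append(first_buffer[start:end])
    return data

assert read_data_set(b"hellobigworld", [0, 5, 8]) == [b"hello", b"big", b"world"]
```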
As shown in FIG. 5-c, the parsing unit 530 includes:
a data format conversion unit 5310, configured to convert, by using the preset parse function, the data of the data set into the data format that the parse function specifies as satisfying logical operations; and
a generating unit 5320, configured to assemble the converted data set into a data block.
In the MapReduce architecture, the data format specified by the preset parse function may be the data format required for the GPU's logical operations. Specifically, the data formats specified by the preset parse function as supporting logical operations may be integer data, floating-point data, string data, and the like.
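As an illustration only, a preset parse function could decode the raw bytes of a data set into integers, floats, or strings; the `struct`-based formats below are assumptions for the sketch, not the patent's actual encoding:

```python
import struct

def parse_int32(raw: bytes) -> list:
    """Decode little-endian 32-bit integers, ready for logical operations."""
    return list(struct.unpack("<%di" % (len(raw) // 4), raw))

def parse_float32(raw: bytes) -> list:
    """Decode little-endian 32-bit floats."""
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))

def parse_strings(raw: bytes) -> list:
    """Decode NUL-separated UTF-8 strings."""
    return raw.decode("utf-8").split("\x00")

assert parse_int32(struct.pack("<3i", 7, 8, 9)) == [7, 8, 9]
assert parse_float32(struct.pack("<2f", 0.5, 1.5)) == [0.5, 1.5]
assert parse_strings(b"map\x00reduce") == ["map", "reduce"]
```

Whichever concrete format is chosen, the point is the same: after parsing, the values can be operated on directly, without further interpretation on the GPU side.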
As shown in FIG. 5-d, the parsing unit 530 may further include:
a format conversion unit 5330, configured to, when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU, convert the data in the data block into the storage format of the GPU.
The first buffer of the CPU and the GPU may differ in their required data storage formats, that is, they may handle endianness differently. In the little-endian storage format, the high-order bytes of a datum are stored at high addresses and the low-order bytes at low addresses; in the big-endian storage format, the high-order bytes are stored at low addresses and the low-order bytes at high addresses.
The first buffer allocated by the CPU carries a member variable that indicates whether data is stored in the first buffer in big-endian or little-endian format, as well as an indication of whether the storage format needs to be converted when data is stored into the second buffer, together with a hint as to whether the conversion should be to big-endian or little-endian. For example, if the data of the data set is stored in the first buffer in big-endian format while the GPU stores data in little-endian format, the format conversion unit 5330 converts the data block into little-endian format and stores it in the second buffer. The data splicer can then read the data block directly from the second buffer and splice it to the GPU, ensuring that the second buffer of the CPU and the GPU store data in the same format, so that the GPU can read data blocks correctly for its operations, avoiding operation errors caused by reading high-order bytes as low-order bytes or low-order bytes as high-order bytes.
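A hedged sketch of such a storage-format conversion, assuming 32-bit data held big-endian in the first buffer while the GPU expects little-endian (the width and formats are illustrative):

```python
import struct

def to_gpu_byte_order(block: bytes, width: int = 4) -> bytes:
    """Reverse the bytes of each `width`-byte datum (big- to little-endian)."""
    assert len(block) % width == 0
    swapped = bytearray()
    for i in range(0, len(block), width):
        swapped.extend(block[i:i + width][::-1])
    return bytes(swapped)

big_endian = struct.pack(">2i", 1, 258)          # as stored in the first buffer
little_endian = to_gpu_byte_order(big_endian)    # as the GPU expects it
assert struct.unpack("<2i", little_endian) == (1, 258)  # same values recovered
```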
As shown in FIG. 6-a, an embodiment of the present invention further provides a data splicer 600, which may include:
a third reading unit 610, configured to read, from the second buffer of the CPU, a data block generated by the data preprocessor; and
a splicing processing unit 620, configured to splice the data block into the working buffer allocated in the GPU for storing data blocks.
This embodiment is applied to a Hadoop cluster under the MapReduce architecture. The data splicer 600 is disposed in the CPU of a slave node device of the Hadoop cluster, where the CPU is further provided with the data preprocessor 500 shown in FIG. 5-a and each slave node device further includes a GPU. The slave node device obtains data slices from the master node device of the Hadoop cluster and then splices the values of the key-value pairs in a data slice into data sets that are stored into the first buffer allocated in CPU memory. Because the first buffer may be unable to store the values of all the key-value pairs of a data slice at once, the values of the key-value pairs in a data slice may be spliced into data sets over multiple passes.
The data preprocessor 500 reads data from the first buffer according to the metadata, converts the data format, assembles the entire format-converted data set into a data block, and stores it in the second buffer of the CPU. The third reading unit 610 of the data splicer then reads the data block from the second buffer of the CPU, and the splicing processing unit 620 splices the read data block into the working buffer allocated in the GPU for storing data blocks.
In the CPU of the slave node device, data format conversion is completed by the data preprocessor 500 and data block splicing by the data splicer, so neither depends on the programmer writing corresponding code. This simplifies the programmer's work, and the automatic operation of the data preprocessor 500 and the data splicer improves the CPU's working efficiency and also benefits subsequent optimization of MapReduce.
The data splicer 600 manages a cursor parameter that indicates the start address at which the working buffer of the GPU can store data. Each time a data block is spliced into the working buffer of the GPU, the cursor parameter is updated accordingly, so that the next available start address in the working buffer is always known exactly. When a data block needs to be transferred to the working buffer of the GPU, the splicing processing unit 620 splices the data block into the working buffer starting from the start address indicated by the cursor parameter.
Therefore, the splicing processing unit 620 is specifically configured to splice the data block starting from the start address indicated by the cursor parameter, where the cursor parameter indicates the start address, in the working buffer allocated in the GPU for storing data blocks, that is available for storing a data block.
As shown in FIG. 6-b, the data splicer further includes:
a trigger processing unit 630, configured to, when the data splicer fails to splice the data block into the working buffer allocated in the GPU for storing data blocks, suspend splicing the data block and trigger the GPU to process the data blocks stored in the working buffer.
The data in the data block that the third reading unit 610 of the data splicer 600 reads from the second buffer can be used directly in logical operations and meets the GPU's storage format requirements. An API is called to splice the data of the data block into the working buffer of the GPU. If the remaining memory of the GPU's working buffer can accommodate the data block read from the second buffer of the CPU, the entire data block is spliced into the GPU's working buffer; if it cannot, that is, if splicing the data block fails, splicing of the data block is suspended and the data block remains in the second buffer, and the trigger processing unit 630 triggers the GPU to start processing all the data blocks in the working buffer.
Further, as shown in FIG. 6-c, the data splicer 600 may further include:
a notification unit 640, configured to notify the GPU of the size of the data block; and
an updating unit 650, configured to update the cursor parameter.
Each time a data block is successfully spliced to the GPU, the notification unit 640 notifies the GPU of the data block's size; the GPU can use this size directly without computing it, which reduces the GPU's workload. In addition, the updating unit 650 updates the cursor parameter.
As shown in FIG. 7, an embodiment of the present invention provides a processor 700, including the data preprocessor 500 shown in FIG. 5-a and the data splicer 600 shown in FIG. 6-a; for details, reference may be made to the descriptions of the data preprocessor 500 and the data splicer 600 above, which are not repeated here.
The first buffer and the second buffer are automatically allocated and reclaimed in the CPU, where the lifetime of the first buffer is the processing time of one data slice and the lifetime of the second buffer is the processing time of one data block; likewise, the working buffer is automatically allocated in the GPU, and its lifetime is the processing time of one data slice.
As shown in FIG. 8-a, an embodiment of the present invention further provides a slave node device, which may include:
the processor CPU-700 shown in FIG. 7 above, and a graphics processing unit GPU-800;
where the CPU-700 is as described above and is not described again here.
Specifically, the data preprocessor in the CPU-700 is configured to convert the data format of a data set obtained from a data slice and assemble the format-converted data set into a data block, and the data splicer in the CPU-700 splices the data block into the working buffer allocated in the GPU-800 for storing data blocks; and
the GPU-800 is configured to process the data block to obtain a processing result and then return the processing result to the CPU-700.
In practical applications, a ResultBuffer is also automatically allocated and reclaimed in the CPU-700, and likewise in the GPU-800; the ResultBuffer in the CPU-700 and the ResultBuffer in the GPU-800 have the same lifetime and are both used to store the results of operations. If, in a practical application, the first buffer allocated by the CPU-700 is a DirectBuffer, the second buffer is a LaunchingBuffer, and the working buffer allocated by the GPU-800 is a WorkingBuffer, then FIG. 8-b shows the interaction between the CPU-700 and the GPU-800 in the slave node device according to this embodiment. As shown in FIG. 8-b, the data preprocessor 500 and the data splicer 600 are disposed in the CPU-700. In addition, a DirectBuffer, a LaunchingBuffer, and a ResultBuffer are allocated in the CPU-700. The DirectBuffer stores the data sets whose data format needs to be converted; a data set consists of data spliced together from the values of key-value pairs, and metadata, mainly the storage addresses of the data of the data set in the DirectBuffer, is added in the DirectBuffer. According to the metadata, the data preprocessor 500 can read the data of the data set from the DirectBuffer, automatically convert the data format by means of the specified preset parse function, assemble the converted data set into a data block, and finally store the data block into the LaunchingBuffer. If the storage format of the data in the data block needs to be converted when it is stored into the LaunchingBuffer, the storage format conversion is performed, ensuring that the storage format of the data in the LaunchingBuffer is the same as that of the WorkingBuffer in the GPU-800. The data splicer 600 reads data blocks from the LaunchingBuffer and splices them into the WorkingBuffer in the GPU-800; if splicing fails, meaning the WorkingBuffer can store no more data blocks, the GPU is first triggered to perform operations on the data blocks stored in the WorkingBuffer, the GPU stores the operation results in its own ResultBuffer, and an API is called to transfer the operation results to the ResultBuffer in the CPU.
Referring to FIG. 9, an embodiment of the present invention further provides a data processing device, which may include a memory 910 and at least one processor 920 (one processor is taken as an example in FIG. 9). In some embodiments of the present invention, the memory 910 and the processor 920 may be connected by a bus or in another manner; FIG. 9 takes the bus connection as an example.
The processor 920 may perform the following steps: the data preprocessor reads metadata from the first buffer of the CPU, where, when a data set obtained from a data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer; the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; and the data preprocessor converts, according to a preset parse function, the data of the data set into the data format indicated by the preset parse function, assembles the converted data set into a data block, and stores the data block in the second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU;
or
the data splicer reads, from the second buffer of the CPU, a data block generated by the data preprocessor, and the data splicer splices the data block into the working buffer allocated in the GPU for storing data blocks.
In some embodiments of the present invention, the processor 920 may further perform the following step: the data preprocessor starts reading at the storage address in the first buffer indicated by a data element of the address index array, and stops reading at the storage address indicated by the next data element or at the end of the first buffer.
In some embodiments of the present invention, the processor 920 may further perform the following step: the data preprocessor converts the data of the data set, according to the preset parsing function, into a data format specified by the parsing function that is suitable for logical operations.
In some embodiments of the present invention, the processor 920 may further perform the following step: the data preprocessor converts the data in the data block into the storage format used in the GPU.
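As a hedged illustration of such a storage-format conversion: one common CPU-to-GPU rearrangement is turning an array of records into a struct-of-arrays layout, which GPUs can read with coalesced accesses. The patent does not name a concrete format, so `to_gpu_layout` and the record shape below are assumptions.

```python
# Hypothetical conversion of a data block into a GPU-friendly storage format.
# We assume the CPU-side block is a list of (key, value) records and the GPU
# prefers a struct-of-arrays layout: all keys contiguous, all values contiguous.
def to_gpu_layout(records):
    keys = [k for k, _ in records]
    values = [v for _, v in records]
    return {"keys": keys, "values": values}

block = [(1, 10.0), (2, 20.0), (3, 30.0)]
assert to_gpu_layout(block) == {"keys": [1, 2, 3], "values": [10.0, 20.0, 30.0]}
```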
In some embodiments of the present invention, the processor 920 may further perform the following steps: when the data splicer fails to splice the data block into the work buffer in the GPU allocated for storing data blocks, it suspends splicing the data block and triggers the GPU to process the data blocks stored in the work buffer.
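A minimal sketch of this overflow-and-drain behaviour, assuming a fixed-capacity work buffer and counting kernel launches in place of a real GPU trigger (all names and sizes are illustrative, not from the patent):

```python
# Hypothetical splicer: if a data block does not fit in the GPU work buffer,
# splicing pauses, the GPU is triggered to process what is buffered, the
# buffer is reclaimed, and the block is then spliced into the empty buffer.
class Splicer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cursor = 0      # next free offset in the GPU work buffer
        self.launches = 0    # stand-in for triggering the GPU kernel

    def splice(self, block_size):
        if self.cursor + block_size > self.capacity:
            # Splicing fails: pause, drain the buffer on the GPU, retry.
            self.launches += 1
            self.cursor = 0  # buffer reclaimed after GPU processing
        self.cursor += block_size

s = Splicer(capacity=100)
for size in [40, 40, 40]:    # the third block overflows, triggering the GPU
    s.splice(size)
assert s.launches == 1 and s.cursor == 40
```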
In some embodiments of the present invention, the processor 920 may further perform the following step: the data splicer splices the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address available for storing data blocks in the work buffer in the GPU allocated for storing data blocks.
In some embodiments of the present invention, the processor 920 may further perform the following steps: the data splicer notifies the GPU of the size of the data block, and the data splicer updates the cursor parameter.
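The splice-notify-advance cycle built around the cursor parameter can be sketched as follows. The `bytearray` stands in for GPU device memory and every name here is an assumption for illustration, not the patented implementation.

```python
# Hypothetical cursor-based splicing: each block is copied at the offset the
# cursor indicates, the GPU is told the block size, and the cursor is then
# advanced past the block so the next splice appends after it.
work_buffer = bytearray(16)  # stand-in for the GPU work buffer
cursor = 0
notified_sizes = []          # stand-in for size notifications sent to the GPU

def splice(block: bytes):
    global cursor
    work_buffer[cursor:cursor + len(block)] = block  # copy at cursor offset
    notified_sizes.append(len(block))                # notify GPU of the size
    cursor += len(block)                             # update cursor parameter

splice(b"abcd")
splice(b"efg")
assert bytes(work_buffer[:7]) == b"abcdefg"
assert notified_sizes == [4, 3] and cursor == 7
```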
In some embodiments of the present invention, the memory 910 may be used to store data sets, metadata, and data blocks.
In some embodiments of the present invention, the memory 910 may also be used to store the address index array.
In some embodiments of the present invention, the memory 910 may also be used to store the cursor parameter.
In some embodiments of the present invention, the memory 910 may also be used to store operation results.
Persons of ordinary skill in the art may understand that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The data processing method and related device provided by the present invention have been described in detail above. Persons of ordinary skill in the art may make changes to the specific implementations and the application scope according to the ideas of the embodiments of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (22)

  1. A data processing method, applied to a Hadoop cluster under a MapReduce architecture, wherein the Hadoop cluster comprises a master node device and a slave node device, the slave node device comprises a central processing unit (CPU) and a graphics processing unit (GPU), the slave node device obtains a data slice from the master node device, and a data preprocessor and a data splicer are provided in the CPU, the method comprising:
    reading, by the data preprocessor, metadata from a first buffer of the CPU, wherein, when a data set obtained from the data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata comprising storage addresses of the data of the data set in the first buffer;
    reading, by the data preprocessor, the data of the data set from the first buffer according to the storage addresses indicated by the metadata; and
    converting, by the data preprocessor according to a preset parsing function, the data of the data set into the data format indicated by the preset parsing function, generating a data block from the converted data set, and storing the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.
  2. The method according to claim 1, wherein the metadata comprises an address index array, the address index array contains data elements in one-to-one correspondence with the data of the data set, each data element indicating the storage address of a piece of data of the data set in the first buffer, and reading, by the data preprocessor, the data of the data set from the first buffer according to the storage addresses indicated by the metadata comprises:
    starting, by the data preprocessor, reading at the storage address in the first buffer indicated by a data element of the address index array, and stopping reading at the storage address indicated by the next data element or at the end of the first buffer.
  3. The method according to claim 1, wherein converting the data of the data set into the data format indicated by the preset parsing function comprises:
    converting, by the data preprocessor according to the preset parsing function, the data of the data set into a data format specified by the parsing function that is suitable for logical operations.
  4. The method according to claim 3, wherein, when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU, the method further comprises, after generating the data block from the converted data set:
    converting, by the data preprocessor, the data in the data block into the storage format used in the GPU.
  5. The method according to any one of claims 1 to 4, wherein the data set is composed of the spliced values of a plurality of key-value pairs in the data slice.
  6. The method according to any one of claims 1 to 4, wherein the first buffer and the second buffer are automatically allocated and reclaimed by the CPU, the lifetime of the first buffer is the processing time of one data slice, and the lifetime of the second buffer is the processing time of one data set.
  7. A data processing method, applied to a Hadoop cluster under a MapReduce architecture, wherein the Hadoop cluster comprises a master node device and a slave node device, the slave node device comprises a central processing unit (CPU) and a graphics processing unit (GPU), the slave node device obtains a data slice from the master node device, and a data preprocessor and a data splicer are provided in the CPU, the method comprising:
    reading, by the data splicer, a data block generated by the data preprocessor from a second buffer of the CPU; and
    splicing, by the data splicer, the data block into a work buffer in the GPU that is allocated for storing data blocks.
  8. The method according to claim 7, further comprising:
    when the data splicer fails to splice the data block into the work buffer in the GPU allocated for storing data blocks, suspending splicing the data block and triggering the GPU to process the data blocks stored in the work buffer.
  9. The method according to claim 7 or 8, wherein splicing, by the data splicer, the data block into the work buffer in the GPU allocated for storing data blocks comprises:
    splicing, by the data splicer, the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address available for storing data blocks in the work buffer in the GPU allocated for storing data blocks.
  10. The method according to claim 9, further comprising, after the data block is spliced successfully:
    notifying, by the data splicer, the GPU of the size of the data block; and
    updating, by the data splicer, the cursor parameter.
  11. A data preprocessor, comprising:
    a first reading unit, configured to read metadata from a first buffer of the CPU, wherein, when a data set obtained from a data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata comprising storage addresses of the data of the data set in the first buffer;
    a second reading unit, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;
    a conversion unit, configured to convert the data of the data set into the data format indicated by a preset parsing function, and generate a data block from the converted data set; and
    a storage unit, configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.
  12. The data preprocessor according to claim 11, wherein the metadata comprises an address index array, the address index array contains data elements in one-to-one correspondence with the data of the data set, each data element indicating the storage address of a piece of data of the data set in the first buffer, and the second reading unit comprises:
    a data reading unit, configured to start reading at the storage address in the first buffer indicated by a data element of the address index array, and stop reading at the storage address indicated by the next data element or at the end of the first buffer.
  13. The data preprocessor according to claim 11 or 12, wherein the parsing unit comprises:
    a data format conversion unit, configured to convert the data of the data set, by means of the preset parsing function, into a data format specified by the parsing function that is suitable for logical operations; and
    a generation unit, configured to generate a data block from the converted data set.
  14. The data preprocessor according to claim 11, wherein the parsing unit further comprises:
    a format conversion unit, configured to convert the data in the data block into the storage format used in the GPU when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU.
  15. A data splicer, comprising:
    a third reading unit, configured to read a data block generated by the data preprocessor from a second buffer of the CPU; and
    a splicing processing unit, configured to splice the data block into a work buffer in the GPU that is allocated for storing data blocks.
  16. The data splicer according to claim 15, further comprising:
    a trigger processing unit, configured to, when the data splicer fails to splice the data block into the work buffer in the GPU allocated for storing data blocks, suspend splicing the data block and trigger the GPU to process the data blocks stored in the work buffer.
  17. The data splicer according to claim 15 or 16, wherein the splicing processing unit is specifically configured to splice the data block starting from the start address indicated by a cursor parameter, where the cursor parameter indicates the start address available for storing data blocks in the work buffer in the GPU allocated for storing data blocks.
  18. The data splicer according to claim 17, further comprising:
    a notification unit, configured to notify the GPU of the size of the data block; and
    an update unit, configured to update the cursor parameter.
  19. A processor, comprising the data preprocessor according to claim 11 and the data splicer according to claim 15.
  20. The processor according to claim 19, wherein the processor is further configured to automatically allocate and reclaim the first buffer and the second buffer, the lifetime of the first buffer being the processing time of one data slice and the lifetime of the second buffer being the processing time of one data set.
  21. A slave node device, wherein the slave node device is a slave node device in a Hadoop cluster, the Hadoop cluster further comprises a master node device, the slave node device receives data slices from the Hadoop cluster, and the slave node device comprises a graphics processing unit (GPU) and the processor (CPU) according to claim 19;
    wherein the data preprocessor in the CPU is configured to convert the data format of a data set obtained from a data slice and generate a data block from the format-converted data set, and the data splicer in the CPU is configured to splice the data block into a work buffer in the GPU allocated for storing data blocks; and
    the GPU is configured to process the data block to obtain a processing result, and then return the processing result to the CPU.
  22. The slave node device according to claim 21, wherein the GPU is further configured to automatically allocate and reclaim the work buffer, the lifetime of the work buffer being the processing time of one data slice.
PCT/CN2014/094071 2013-12-23 2014-12-17 Data processing method and related device WO2015096649A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310719857.4 2013-12-23
CN201310719857.4A CN104731569B (en) 2013-12-23 2013-12-23 A kind of data processing method and relevant device

Publications (1)

Publication Number Publication Date
WO2015096649A1 true WO2015096649A1 (en) 2015-07-02

Family

ID=53455495

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/094071 WO2015096649A1 (en) 2013-12-23 2014-12-17 Data processing method and related device

Country Status (2)

Country Link
CN (1) CN104731569B (en)
WO (1) WO2015096649A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023061295A1 (en) * 2021-10-13 2023-04-20 杭州趣链科技有限公司 Data processing method and apparatus, and electronic device and storage medium

Families Citing this family (10)

Publication number Priority date Publication date Assignee Title
CN105159610B (en) * 2015-09-01 2018-03-09 浪潮(北京)电子信息产业有限公司 Large-scale data processing system and method
CN106326029A (en) * 2016-08-09 2017-01-11 浙江万胜智能科技股份有限公司 Data storage method for electric power meter
KR102482516B1 (en) * 2016-11-29 2022-12-29 에이알엠 리미티드 memory address conversion
CN109408450B (en) * 2018-09-27 2021-03-30 中兴飞流信息科技有限公司 Data processing method, system, co-processing device and main processing device
CN111143232B (en) * 2018-11-02 2023-08-18 伊姆西Ip控股有限责任公司 Method, apparatus and computer readable medium for storing metadata
CN109522133B (en) * 2018-11-28 2020-10-02 北京字节跳动网络技术有限公司 Data splicing method and device, electronic equipment and storage medium
EP3964949B1 (en) * 2019-05-27 2023-09-06 Huawei Technologies Co., Ltd. Graphics processing method and apparatus
CN110769064B (en) * 2019-10-29 2023-02-24 广州趣丸网络科技有限公司 System, method and equipment for offline message pushing
CN113535857A (en) * 2021-08-04 2021-10-22 阿波罗智联(北京)科技有限公司 Data synchronization method and device
CN115952561A (en) * 2023-03-14 2023-04-11 北京全路通信信号研究设计院集团有限公司 Data processing method, device, equipment and medium applied to rail transit system

Citations (3)

Publication number Priority date Publication date Assignee Title
US20050140682A1 (en) * 2003-12-05 2005-06-30 Siemens Medical Solutions Usa, Inc. Graphics processing unit for simulation or medical diagnostic imaging
CN102662639A (en) * 2012-04-10 2012-09-12 南京航空航天大学 Mapreduce-based multi-GPU (Graphic Processing Unit) cooperative computing method
CN102708088A (en) * 2012-05-08 2012-10-03 北京理工大学 CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation



Also Published As

Publication number Publication date
CN104731569B (en) 2018-04-10
CN104731569A (en) 2015-06-24

Similar Documents

Publication Publication Date Title
WO2015096649A1 (en) Data processing method and related device
US20200210092A1 (en) Infinite memory fabric streams and apis
US9734085B2 (en) DMA transmission method and system thereof
CN108647104B (en) Request processing method, server and computer readable storage medium
TW201731253A (en) Quantum key distribution method and device obtaining a key sequence matching the requested length in the sub-key pool allocated from the requesting party after receiving a quantum key obtaining request
EP2437167A1 (en) Method and system for virtual storage migration and virtual machine monitor
US11023430B2 (en) Sparse dictionary tree
US10956335B2 (en) Non-volatile cache access using RDMA
TWI773959B (en) Data processing system, method and computer program product for handling an input/output store instruction
WO2016200655A1 (en) Infinite memory fabric hardware implementation with memory
JP2022105146A (en) Acceleration system, acceleration method, and computer program
US20220114145A1 (en) Resource Lock Management Method And Apparatus
US20210209057A1 (en) File system quota versioning
KR20210092689A (en) Method and apparatus for traversing graph database
CN110119304B (en) Interrupt processing method and device and server
AU2015402888A1 (en) Computer device and method for reading/writing data by computer device
JP5124430B2 (en) Virtual machine migration method, server, and program
US10216664B2 (en) Remote resource access method and switching device
US20200125548A1 (en) Efficient write operations for database management systems
US10169272B2 (en) Data processing apparatus and method
CN112074822A (en) Data processing network with stream compression for streaming data transmission
EP4242862A2 (en) Rdma-enabled key-value store
KR20220058581A (en) Constructor-Consumer Active Direct Cache Passing
CN112486702A (en) Global message queue implementation method based on multi-core multi-processor parallel system
JP2012234564A (en) Method for migrating virtual machine, server, and program

Legal Events

Date Code Title Description

121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14873198; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 EP: PCT application non-entry in European phase (Ref document number: 14873198; Country of ref document: EP; Kind code of ref document: A1)