WO2015096649A1 - Data processing method and corresponding device - Google Patents

Data processing method and corresponding device

Info

Publication number
WO2015096649A1
WO2015096649A1 PCT/CN2014/094071
Authority
WO
WIPO (PCT)
Prior art keywords
data
buffer
gpu
block
splicer
Prior art date
Application number
PCT/CN2014/094071
Other languages
English (en)
Chinese (zh)
Inventor
崔慧敏
谢睿
阮功
杨文森
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2015096649A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/541Interprogram communication via adapters, e.g. between incompatible applications

Definitions

  • the present invention relates to the field of information processing technologies, and in particular, to a data processing method and related equipment.
  • Cloud computing offers powerful large-scale data processing at very high speed, but transferring such large volumes of data has become a major problem.
  • MapReduce is a well-known cloud computing architecture, provided by Google, for parallel computing on large-scale data sets (greater than 1 TB). Hadoop is a concrete implementation of the MapReduce architecture; a Hadoop cluster is divided into a master node device and slave node devices.
  • the master node device uses the Map function provided by MapReduce to divide the data set into M data fragments according to size, and distributes the data fragments to multiple slave node devices for parallel processing.
  • each slave node device obtains the values of the key-value pairs from the data fragment and stores them in a buffer allocated by the central processing unit (CPU) of the node device.
  • the values in the buffer are then parsed, for example by converting their data format, and the parsed values are spliced, through an application programming interface (API), to the graphics processing unit (GPU) of the slave node, which allocates buffers for storing the data and performs the computation.
  • the present inventors have found that, since no parsing function is provided in the MapReduce architecture, parsing the values of the key-value pairs must rely on a program written by the programmer; meanwhile, the size of the CPU buffer storing the values of the key-value pairs may be inconsistent with the size of the buffer allocated by the GPU to store the data, and since no corresponding checking method is provided in the MapReduce architecture, determining whether the CPU and GPU buffers are consistent also relies on a checking function written by the programmer, which reduces the execution efficiency of the slave node device.
  • the embodiments of the present invention provide a data processing method and related device, applied to a Hadoop cluster under the MapReduce architecture, which can improve the working efficiency of the slave node devices in the Hadoop cluster and simplify the programmer's work, facilitating subsequent optimization of the MapReduce architecture.
  • the present invention provides a data processing method, which is applied to a Hadoop cluster under a MapReduce architecture, where the Hadoop cluster includes a master node device and a slave node device, and the slave node device includes a processor CPU and a graphics processor GPU.
  • the slave node device obtains a data fragment from the master node device, where the CPU is provided with a data preprocessor and a data splicer, and the method includes:
  • the data preprocessor reads metadata from a first buffer of the CPU; wherein, when the data set obtained from the data fragment is stored in the first buffer, metadata is added for the data set at the head of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer;
  • the data preprocessor reads data of the data set from the first buffer according to a storage address indicated by the metadata
  • the data preprocessor converts the data of the data set into the data format indicated by a preset analytic function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • optionally, the metadata specifically includes an address index array, where the address index array contains data elements in one-to-one correspondence with the data of the data set, each data element indicating the storage address of one datum of the data set in the first buffer; and the data preprocessor reading the data of the data set from the first buffer according to the storage address indicated by the metadata includes: the data preprocessor reading data starting from the storage address of the first buffer indicated by a data element of the address index array until the storage address indicated by the next data element, or the end of the first buffer, is reached.
  • converting the data of the data set into the data format indicated by the preset analytic function comprises: the data preprocessor converting, according to the preset analytic function, the data of the data set into a data format that satisfies the logical operation specified by the analytic function.
  • generating the data block from the converted data set comprises: the data preprocessor converting the data in the data block into the storage format used in the GPU.
  • the data set is specifically composed of the spliced values of a plurality of key-value pairs in the data fragment.
  • the first buffer and the second buffer are automatically allocated and reclaimed by the CPU; the life cycle of the first buffer is the processing time of one data fragment, and the life cycle of the second buffer is the processing time of one data set.
  • a second aspect of the present invention provides a data processing method, which is applied to a Hadoop cluster under a MapReduce architecture, where the Hadoop cluster includes a master node device and a slave node device, and the slave node device includes a processor CPU and a graphics processor GPU.
  • the slave node device obtains a data fragment from the master node device, where the CPU is provided with a data preprocessor and a data splicer, and the method includes:
  • the data splicer reads a data block generated by the data preprocessor from a second buffer of the CPU;
  • the data splicer splices the data block into a working buffer of the GPU allocated for storing data blocks.
  • optionally, the data splicer splices the data block starting from a starting address indicated by a cursor parameter; the cursor parameter indicates the starting address, in the working buffer of the GPU allocated for storing data blocks, at which the data block can be stored.
  • the method further includes: the data splicer notifying the GPU of the size of the data block; and the data splicer updating the cursor parameter.
  • a third aspect of the present invention provides a data preprocessor, including:
  • a first reading unit configured to read metadata from a first buffer of the CPU; wherein, when the data set obtained from the data fragment is stored in the first buffer, metadata is added for the data set at the head of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer;
  • a second reading unit configured to read data of the data set from the first buffer according to a storage address indicated by the metadata
  • a converting unit configured to convert data of the data set into a data format indicated by the preset analytic function according to a preset analytic function, and generate a data block by using the converted data set;
  • a storage unit configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices to the GPU.
  • optionally, the metadata specifically includes an address index array, where the address index array contains data elements in one-to-one correspondence with the data of the data set, each data element indicating the storage address of one datum of the data set in the first buffer; and the second reading unit comprises: a data reading unit configured to read data starting from the storage address of the first buffer indicated by a data element of the address index array until the storage address indicated by the next data element, or the end of the first buffer, is reached.
  • optionally, the converting unit includes: a data format converting unit configured to convert, by using the preset analytic function, the data of the data set into a data format that satisfies the logical operation specified by the analytic function; and a generating unit configured to generate the data block from the converted data set.
  • optionally, the converting unit further includes: a format converting unit configured to convert the data in the data block into the storage format used in the GPU when the format in which the first buffer stores the data of the data set is inconsistent with the storage format of the data in the GPU.
  • a fourth aspect of the present invention provides a data splicer, including:
  • a third reading unit configured to read a data block generated by the data preprocessor from a second buffer of the CPU
  • a splicing processing unit configured to splice the data block into a working buffer of the GPU allocated for storing data blocks.
  • optionally, the data splicer further includes: a trigger processing unit configured to, when the data splicer fails to splice the data block into the working buffer of the GPU allocated for storing data blocks, suspend splicing the data block and trigger the GPU to process the data blocks stored in the working buffer.
  • optionally, the splicing processing unit is specifically configured to splice the data block starting from a starting address indicated by a cursor parameter; the cursor parameter indicates the starting address, in the working buffer of the GPU allocated for storing data blocks, at which the data block can be stored.
  • optionally, the data splicer further includes: a notification unit configured to notify the GPU of the size of the data block; and an update unit configured to update the cursor parameter.
  • a fifth aspect of the present invention provides a processor, which may include the data preprocessor according to the above third aspect and the data splicer according to the fourth aspect.
  • the first buffer and the second buffer are automatically allocated and reclaimed, and a life cycle of the first buffer is a processing time of a data fragment.
  • the life cycle of the second buffer is the processing time of one data set.
  • a sixth aspect of the present invention provides a slave node device, which may include the processor CPU described in the above fifth aspect and a graphics processor GPU; wherein the data preprocessor in the CPU is configured to convert the data format of the data set obtained from the data fragment and to generate a data block from the converted data set, and the data splicer in the CPU is configured to splice the data block into a working buffer of the GPU allocated for storing data blocks;
  • the GPU is configured to process the data block to obtain a processing result, and then return the processing result to the CPU.
  • in the embodiments of the present invention, a data preprocessor and a data splicer are set in the slave node device. The data preprocessor reads the metadata from the first buffer of the CPU; because the metadata is generated for the data set when the data set is stored in the first buffer and indicates the storage addresses of the data of the data set in the first buffer, the data preprocessor can read the data of the data set from the first buffer according to the metadata, convert the data according to the preset analytic function, generate a data block from the converted data set, and store the data block in a second buffer of the CPU, so that the splicing of the data block to the GPU can be completed by the data splicer.
  • because, when the data set is stored into the first buffer, metadata including the storage addresses is added for the data of the data set, the data preprocessor can automatically read the data of the data set from the first buffer without relying on a program written by the programmer. Furthermore, the data preprocessor can parse the data of the data set according to the preset analytic function, which improves the processing efficiency in the CPU and facilitates subsequent optimization of the MapReduce architecture;
  • the data block is read from the second buffer by the data splicer and spliced into the working buffer of the GPU allocated for storing data blocks. If the splicing fails, indicating that the remaining memory of that working buffer is insufficient to complete the splicing of the data block, the splicing is suspended and the GPU is triggered to operate on the data blocks already stored; the pending data block remains temporarily saved in the second buffer and is spliced next time. Compared with the prior art, no program written by the programmer is required: the data splicing is completed automatically by the data splicer, which effectively prevents data block loss and improves data block splicing efficiency.
  • FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
  • FIG. 5-a is a schematic structural diagram of a data preprocessor according to an embodiment of the present invention.
  • FIG. 5-b is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention.
  • FIG. 5-c is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention.
  • FIG. 5-d is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention.
  • FIG. 6-a is a schematic structural diagram of a data splicer according to an embodiment of the present invention.
  • FIG. 6-b is a schematic structural diagram of a data splicer according to another embodiment of the present invention.
  • FIG. 6-c is a schematic structural diagram of a data splicer according to another embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a processor according to an embodiment of the present invention.
  • FIG. 8-a is a schematic structural diagram of a slave node device according to an embodiment of the present invention.
  • FIG. 8-b is a schematic diagram of interaction between a CPU and a GPU in a slave node device according to an embodiment of the present invention
  • FIG. 9 is a schematic structural diagram of a data processing device according to an embodiment of the present invention.
  • the embodiments of the invention provide a data processing method and related device, applied to a Hadoop cluster under the MapReduce architecture, which realize automatic data format conversion and automatic data splicing in the Hadoop slave nodes, simplify the programmer's work, and facilitate subsequent optimization of the MapReduce architecture.
  • an aspect of the present invention provides a data processing method, including:
  • the data preprocessor reads the metadata from the first buffer of the CPU; when the data set obtained from the data fragment is stored in the first buffer, metadata is added for the data set at the head of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer;
  • the embodiment of the present invention is applied to a Hadoop cluster under the MapReduce architecture, where the Hadoop cluster includes a master node device and slave node devices, each slave node device includes a processor CPU and a graphics processor GPU, the slave node device obtains data fragments from the master node device, and a data preprocessor and a data splicer are provided in the CPU.
  • the metadata mainly includes the storage addresses, in the first buffer, of the data in the data set.
  • the data preprocessor reads data of the data set from the first buffer according to a storage address indicated by the metadata.
  • the data preprocessor can directly read the data of the data set from the first buffer according to the indication of the metadata, without relying on the programmer to write an additional program to read the data.
  • the data preprocessor converts the data of the data set into the data format indicated by a preset analytic function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • an analytic function is preset in the MapReduce architecture, and the data preprocessor can parse the data of the data set in the first buffer according to the preset analytic function, converting it into the data format indicated by that function; a data block is then generated from the converted data set.
  • a second buffer is allocated in the CPU for storing data blocks. The data splicer can then read the data block from the second buffer into the GPU.
  • since the metadata is added for the data set when the data set is stored in the first buffer of the CPU, and includes the storage addresses of the data of the data set in the first buffer, the data preprocessor, after reading the metadata from the first buffer, reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata, converts the data format of the data by using the preset analytic function, generates a data block from the converted data set, and stores it in the second buffer of the CPU. This implements automatic reading and parsing of the data in the first buffer by the data preprocessor, without relying on programs written by the programmer, provides a more complete MapReduce architecture for the programmer, and facilitates subsequent optimization of the MapReduce architecture.
  • in the MapReduce architecture, a mapping (Map) function is specified to map input key-value pairs into new key-value pairs, and a concurrent reduction (Reduce) function is used to ensure that all mapped key-value pairs sharing the same key are grouped together.
  • after the Map function maps the input key-value pairs into new key-value pairs, all the new key-value pairs are divided by the master node device in the Hadoop cluster into different data fragments according to data size, and the data fragments are assigned to the slave node devices for corresponding processing.
  • in the CPU of the slave node device, the RecordReader class is called to obtain the key-value pairs in the data fragment, and the values are extracted from the key-value pairs and spliced into a data set.
  • the CPU allocates a DirectBuffer in its memory for the data set.
  • the data set is stored in the DirectBuffer in the format of the DirectBuffer.
  • metadata is added to the data set at the head of the DirectBuffer.
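  • As a concrete illustration of the layout just described, below is a minimal Java sketch of packing a data set into a direct buffer whose head carries the address index metadata; the class name, the fixed-size header, and the capacities are assumptions made for this example, not details taken from the patent.

```java
import java.nio.ByteBuffer;

// Sketch of the DirectBuffer layout: a metadata header holding the entry
// count and an address index array, followed by the raw values.
public class DataSetBuffer {
    private static final int MAX_ENTRIES = 1024;                 // assumed limit
    private static final int HEADER_BYTES = 4 + 4 * MAX_ENTRIES; // count + offsets

    private final ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
    private int count = 0;

    public DataSetBuffer() {
        buf.position(HEADER_BYTES); // reserve room for the metadata at the head
    }

    /** Appends one value and records its start offset in the address index. */
    public boolean append(byte[] value) {
        if (count == MAX_ENTRIES || buf.remaining() < value.length) {
            return false; // buffer full: the caller starts a new data set
        }
        buf.putInt(4 + 4 * count, buf.position()); // address index element
        buf.put(value);                            // the datum itself
        buf.putInt(0, ++count);                    // keep the count current
        return true;
    }

    public ByteBuffer buffer() { return buf; }
    public int dataEnd()       { return buf.position(); }
}
```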
  • an analytic function for parsing the data of the data set is preset; the preset analytic function converts the data into a specified data format that satisfies the logical operation.
  • a data processing method may include:
  • the data preprocessor reads the metadata from the DirectBuffer, where the metadata specifically includes an address index array, the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element indicates the storage address of the corresponding datum in the DirectBuffer;
  • when the data set is stored into the DirectBuffer, metadata is added at the head of the DirectBuffer to indicate the storage addresses of the data of the data set in the DirectBuffer.
  • the metadata may include an address index array.
  • as each datum of the data set is stored into the DirectBuffer, its storage address is added to the address index array.
  • the address index array has data elements that correspond one-to-one with the data in the data set, and the data elements indicate the storage address of the data of the data set in the DirectBuffer.
  • the data of the data set stored in the DirectBuffer share the same data format, which may be a format, such as text or raw binary, that cannot be operated on logically.
  • the data preprocessor reads data in the data set from the DirectBuffer according to the data element of the address index array in the metadata.
  • the data preprocessor reads data starting from the storage address in the DirectBuffer indicated by a data element of the address index array, and stops at the storage address indicated by the next data element or at the end of the DirectBuffer; this yields one datum of the data set, and the next datum is then read in the same way until all the data of the data set in the DirectBuffer have been read, as sketched below.
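  • A minimal Java sketch of this read loop, reusing the header layout assumed in the previous example (datum i runs from offsets[i] to offsets[i+1], or to the end of the stored data):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch of the read loop: datum i starts at offsets[i] and ends at
// offsets[i + 1], or at the end of the stored data for the last datum.
public final class DataSetReader {
    public static List<byte[]> readAll(ByteBuffer buf, int dataEnd) {
        int count = buf.getInt(0); // entry count from the metadata header
        List<byte[]> data = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int start = buf.getInt(4 + 4 * i);
            int end = (i + 1 < count) ? buf.getInt(4 + 4 * (i + 1)) : dataEnd;
            byte[] datum = new byte[end - start];
            for (int j = 0; j < datum.length; j++) {
                datum[j] = buf.get(start + j); // absolute get, position untouched
            }
            data.add(datum);
        }
        return data;
    }
}
```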
  • the data preprocessor converts the data of the data set according to the preset analytic function into the data format specified by that function, which satisfies the logical operation;
  • the data stored in the data set of the DirectBuffer is generally in a data format that cannot be operated on logically, and needs to be converted into an operable format before being transferred to the GPU for logical operations. Therefore, the analytic function is preset in the MapReduce architecture, and the data preprocessor automatically converts the data according to it into the data format, specified by the analytic function, that satisfies the logical operation.
  • the data format specified by the preset analytic function may be a data format required by the GPU logic operation.
  • the data format that can be operated on logically, as specified by the preset analytic function, may be integer data, floating-point data, string data, or the like.
  • the data preprocessor generates a data block from the format-converted data set.
  • after the data preprocessor automatically converts each datum into a format that can be operated on logically according to the preset analytic function, it generates a data block from the converted data set, in order to facilitate the subsequent splicing of data between the CPU and the GPU, as in the sketch below.
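  • To make the conversion step concrete, here is a hedged Java sketch of a preset analytic function that turns textual values into a logically operable integer representation; the AnalyticFunction interface and the text-to-int choice are illustrative assumptions, not the patent's API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Assumed shape of a preset analytic (parsing) function. */
interface AnalyticFunction {
    /** Converts one raw datum into a format the GPU can operate on. */
    void convert(byte[] raw, ByteBuffer out);
}

// Example function: text such as "42" cannot be operated on logically,
// so it is parsed into a 4-byte integer for the GPU kernels.
public class TextToIntFunction implements AnalyticFunction {
    @Override
    public void convert(byte[] raw, ByteBuffer out) {
        out.putInt(Integer.parseInt(new String(raw, StandardCharsets.US_ASCII)));
    }

    /** Generates a data block from a converted data set. */
    public static ByteBuffer buildBlock(List<byte[]> dataSet, AnalyticFunction f) {
        ByteBuffer block = ByteBuffer.allocate(4 * dataSet.size());
        for (byte[] datum : dataSet) {
            f.convert(datum, block);
        }
        block.flip(); // ready for the splicer to read
        return block;
    }
}
```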
  • the preprocessor stores the data block in a LaunchingBuffer, so that the data splicer reads the data block from the LaunchingBuffer and splices it to the GPU.
  • the CPU further allocates a LaunchingBuffer in memory to temporarily store the format-converted data block; the data preprocessor stores the data block in the LaunchingBuffer, and the data splicer then completes the splicing of the data block from the LaunchingBuffer to the GPU.
  • the data stored in the DirectBuffer of the CPU and the data to be processed by the GPU may use inconsistent storage formats, that is, they may handle endianness differently: in the little-endian storage format, the high-order bits of a datum are stored at the higher address and the low-order bits at the lower address; in the big-endian storage format, the high-order bits are stored at the lower address and the low-order bits at the higher address. Therefore, the data preprocessor also needs to resolve the endianness of the data block.
  • the DirectBuffer allocated by the CPU has a member variable that indicates whether the data are stored in the DirectBuffer in big-endian or little-endian format, whether the storage format needs to be converted when the data are stored into the LaunchingBuffer, and whether the conversion target is big-endian or little-endian. For example, if the data in the data set are stored in the DirectBuffer in big-endian format while the GPU stores data in little-endian format, then, when the data block is stored into the LaunchingBuffer, the data in the data block are converted to little-endian format and saved in the LaunchingBuffer.
  • in this way, the data splicer can directly read the data block from the LaunchingBuffer and splice it to the GPU; keeping the storage format of the CPU's LaunchingBuffer consistent with that of the GPU ensures that the GPU can correctly read the data block for arithmetic processing, avoiding operation errors caused by reading high-order bits as low-order bits or vice versa.
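  • A minimal Java sketch of this endianness fix-up, assuming 4-byte integer data and a big-endian source (in the patent, the source order is taken from the DirectBuffer's member variable); java.nio performs the byte swap when the source and target orders differ.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class EndianConverter {
    /**
     * Copies 4-byte values into a LaunchingBuffer written in the byte order
     * the GPU expects. Reading each int in the source order and writing it
     * in the target order performs the swap; if the orders match, the data
     * are copied unchanged.
     */
    public static ByteBuffer toGpuOrder(ByteBuffer source, ByteOrder gpuOrder) {
        ByteBuffer launching =
                ByteBuffer.allocateDirect(source.remaining()).order(gpuOrder);
        ByteBuffer src = source.duplicate().order(ByteOrder.BIG_ENDIAN); // assumed source order
        while (src.remaining() >= 4) {
            launching.putInt(src.getInt());
        }
        launching.flip();
        return launching;
    }
}
```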
  • in summary, the data preprocessor first reads the address index array from the DirectBuffer and reads the data of the data set from the DirectBuffer according to the data elements of the array; it then performs data format conversion on the data according to the preset analytic function, so that the converted data satisfy the logical operation; finally, the converted data set is used to generate a data block, which is stored in the LaunchingBuffer, from which the data splicer reads the data block and transmits it to the GPU.
  • in this embodiment of the invention, the work is completed by the data preprocessor in the CPU: the data are parsed automatically by the preset analytic function, which facilitates the GPU's operation on the data block, simplifies the programming work for the slave node device, and is conducive to future optimization.
  • the CPU automatically allocates and reclaims the DirectBuffer and the LaunchingBuffer: the life cycle of the DirectBuffer is the processing time of one data fragment, and the life cycle of a LaunchingBuffer is the processing time of one data set.
  • the ResultBuffer is also allocated on the CPU to store the operation result returned by the GPU operation, and then the operation result is used as the input of the Reduce task in the MapReduce.
  • another aspect of the present invention provides a data processing method, including:
  • the data splicer reads the data block generated by the data preprocessor from the second buffer of the CPU.
  • the embodiment of the present invention is applied to a Hadoop cluster under the MapReduce architecture, where the Hadoop cluster includes a master node device and slave node devices, each slave node device includes a processor CPU and a graphics processor GPU, the slave node device obtains data fragments from the master node device, and a data preprocessor and a data splicer are provided in the CPU.
  • the data preprocessor is configured to read the data of the data set from the first buffer of the CPU, convert the data format, and store the data block generated from the converted data set into the second buffer.
  • the data splicer mainly completes the splicing of data blocks from the CPU to the GPU.
  • the data splicer splices the data block into a working buffer of the GPU allocated for storing data blocks.
  • the data splicer reads the data block from the second buffer of the CPU, and splices the data block from the second buffer of the CPU to the working buffer of the GPU.
  • the data splicing is completed by the data splicer, which is no longer dependent on the programming of the programmer, thereby simplifying the programmer's programming work and facilitating the subsequent optimization of the entire MapReduce architecture.
  • a data processing method may include:
  • the data splicer reads a data block from the LaunchingBuffer.
  • the CPU also allocates a LaunchingBuffer in the memory, which is mainly used to store data blocks that need to be spliced to the GPU.
  • S420: the data splicer splices the data block starting from the starting address indicated by a cursor parameter, where the cursor parameter indicates the starting address, in the WorkingBuffer allocated in the GPU for storing data blocks, at which the data block can be stored;
  • the WorkingBuffer is allocated in GPU memory and is mainly used to store the data spliced from the CPU's LaunchingBuffer. The memory size of the WorkingBuffer is determined by the GPU itself, while the memory size of the DirectBuffer in the CPU is determined by the Java runtime environment; in general, the WorkingBuffer on the GPU is much larger than the DirectBuffer supported by Java in the CPU. The WorkingBuffer can therefore store at least one data block obtained from the DirectBuffer, but at some point its remaining memory may no longer be able to store the next data block; this situation is handled correctly by the data splicer.
  • the data splicer manages a cursor parameter that indicates the starting address at which the WorkingBuffer can store data. After each splicing of a data block into the WorkingBuffer, the cursor parameter is updated accordingly, so that the starting address for the next store into the WorkingBuffer is known exactly. When a data block needs to be transferred to the WorkingBuffer, it is spliced into the WorkingBuffer starting from the starting address indicated by the cursor parameter.
  • the data in the data block read by the data splicer from the LaunchingBuffer can be operated on logically directly and meets the GPU's storage format requirements; an application programming interface (API) is invoked to splice the data of the data block into the WorkingBuffer. If the remaining memory of the WorkingBuffer can hold the data block read from the CPU's LaunchingBuffer, the entire data block is spliced into the WorkingBuffer; if it cannot, the splicing of that data block is suspended, the data block remains stored in the LaunchingBuffer, and the GPU is triggered to start processing all the data blocks in the WorkingBuffer.
  • the data splicer in the CPU thus resolves data block splicing when the size of the DirectBuffer in the CPU and the remaining memory of the WorkingBuffer in the GPU are inconsistent.
  • the data splicer splices the data block directly from the LaunchingBuffer to the WorkingBuffer. If the remaining memory of the WorkingBuffer cannot hold the data block, the splicing operation is suspended; once the WorkingBuffer can accept data again, the data block held in the LaunchingBuffer is spliced into the WorkingBuffer. Since the data blocks in the LaunchingBuffer already meet the GPU's requirements for data processing, the GPU can perform arithmetic processing directly upon receiving a data block, which effectively improves the GPU's working efficiency.
  • B1: the data splicer notifies the GPU of the size of the data block;
  • B2: the data splicer updates the cursor parameter.
  • after each successful splicing of a data block to the GPU, the data splicer notifies the GPU of the data block's size, so the GPU can use the size directly, which reduces its workload; the splicing logic of this subsection is sketched below.
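  • The following Java sketch pulls the splicing logic together: the cursor-driven copy, the failure path that triggers the GPU, and steps B1 and B2. The GpuApi interface is a hypothetical stand-in for whatever device API an implementation would call; the patent does not name one.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for the device-transfer API the splicer invokes. */
interface GpuApi {
    void copyToWorkingBuffer(long startAddress, ByteBuffer block); // host-to-device copy
    void notifyBlockSize(int size);  // step B1 above
    void launchProcessing();         // trigger the GPU over the stored blocks
}

public class DataSplicer {
    private final GpuApi gpu;
    private final long workingBufferCapacity;
    private long cursor = 0; // next free start address in the WorkingBuffer

    public DataSplicer(GpuApi gpu, long workingBufferCapacity) {
        this.gpu = gpu;
        this.workingBufferCapacity = workingBufferCapacity;
    }

    /** Returns true if the block was spliced; false if it must wait. */
    public boolean splice(ByteBuffer block) {
        int size = block.remaining();
        if (cursor + size > workingBufferCapacity) {
            // Splicing fails: the block stays in the LaunchingBuffer and the
            // GPU is triggered to process what the WorkingBuffer holds.
            gpu.launchProcessing();
            cursor = 0; // the WorkingBuffer is drained and reused
            return false;
        }
        gpu.copyToWorkingBuffer(cursor, block);
        gpu.notifyBlockSize(size); // B1: tell the GPU the block size
        cursor += size;            // B2: update the cursor parameter
        return true;
    }
}
```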
  • just as the address index array indicates where data are stored in the DirectBuffer, a lookup index array may be added at the head of the WorkingBuffer for the data blocks in it: the lookup index array contains data elements in one-to-one correspondence with the data of the data blocks, each data element indicating the storage address of one datum in the WorkingBuffer. Each time the data splicer splices a data block, the data elements corresponding to the data of that block are added to the lookup index array, so that the GPU can quickly locate and read the data from the WorkingBuffer for operation, as sketched below.
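  • Under the same assumptions as the splicer sketch above, maintaining the lookup index amounts to recording one offset per spliced datum; a host-side mirror of the index, as sketched here, is an implementation assumption (in the patent the index lives at the head of the WorkingBuffer itself).

```java
import java.util.ArrayList;
import java.util.List;

// Host-side mirror of the lookup index: one data element per datum,
// giving the datum's start address inside the GPU WorkingBuffer.
public class LookupIndex {
    private final List<Long> offsets = new ArrayList<>();

    /** Records the data elements for a block spliced at blockStart. */
    public void addBlock(long blockStart, int[] datumSizes) {
        long addr = blockStart;
        for (int size : datumSizes) {
            offsets.add(addr); // one element per datum in the block
            addr += size;
        }
    }

    /** Lets the consumer locate datum i without scanning the buffer. */
    public long addressOf(int datumIndex) {
        return offsets.get(datumIndex);
    }
}
```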
  • steps B1 and B2 are in no particular order and are not limited herein.
  • each data fragment received may eventually generate multiple data blocks, and the WorkingBuffer allocated in the GPU stores them in units of data blocks; its lifetime is the processing time of one data fragment. After the data splicer successfully transmits an entire data fragment, it returns a flag value indicating successful transmission to notify the master node device to allocate the next data fragment; if transmission of the data fragment fails, it returns a failure flag value to notify the master node device to suspend allocation of the next data fragment.
  • the ResultBuffer is also allocated in the GPU memory.
  • the ResultBuffer is used to save the operation result; the API is then called and the operation result is returned to the CPU and stored in the ResultBuffer allocated by the CPU, as an input of the Reduce task under MapReduce.
  • the DirectBuffer used to store the data set in the CPU, the LaunchingBuffer that stores the data block after the data format conversion, and the ResultBuffer used to store the result returned by the GPU are automatically allocated and reclaimed by the CPU.
  • the life cycle of the LaunchingBuffer is the processing time of one data block; the WorkingBuffer used in the GPU to store received data blocks and the ResultBuffer storing the operation result are automatically allocated and reclaimed by the GPU; the life cycle of the WorkingBuffer is the processing time of one data fragment, and the ResultBuffer has the same life cycle as the WorkingBuffer.
  • the buffers in the CPU and the GPU are automatically synchronized. For example, the ResultBuffer in the CPU is synchronized with the WorkingBuffer and ResultBuffer in the GPU.
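  • As a sketch of the tail of the pipeline, the result hand-off described above could look as follows; the copyResultToHost call is again a hypothetical stand-in for the device API, and the result is assumed to fill the buffer from offset 0.

```java
import java.nio.ByteBuffer;

public class ResultCollector {
    /** Hypothetical device call: copies the GPU ResultBuffer to host memory. */
    interface GpuResultApi {
        void copyResultToHost(ByteBuffer hostResultBuffer); // device-to-host copy
    }

    /**
     * Pulls the operation result from the GPU's ResultBuffer into the
     * CPU-side ResultBuffer, which then feeds the MapReduce Reduce task.
     */
    public static ByteBuffer collect(GpuResultApi api, int resultBytes) {
        ByteBuffer cpuResultBuffer = ByteBuffer.allocateDirect(resultBytes);
        api.copyResultToHost(cpuResultBuffer);
        return cpuResultBuffer;
    }
}
```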
  • an embodiment of the present invention further provides a data preprocessor 500, which may include:
  • a first reading unit 510 configured to read metadata from a first buffer of the CPU; wherein, when the data set acquired from the data fragment is stored in the first buffer, metadata is added for the data set at the head of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer;
  • the second reading unit 520 is configured to read the data of the data set from the first buffer according to the storage address indicated by the metadata;
  • the converting unit 530 is configured to convert the data of the data set into the data format indicated by a preset analytic function and to generate a data block from the converted data set;
  • the storage unit 540 is configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices to the GPU.
  • the embodiment of the present invention is applied to a Hadoop cluster under the MapReduce architecture, and the data preprocessor 500 is disposed in the CPU of a slave node device of the Hadoop cluster, where the CPU is further provided with a data splicer and each slave node device also has a GPU. The slave node device acquires data fragments from the master node device of the Hadoop cluster, splices the values of the key-value pairs in a data fragment into data sets, and stores them into the first buffer allocated in the CPU memory; because the memory of the first buffer may not be able to hold the values of all the key-value pairs in the data fragment at once, the values of the key-value pairs in the data fragment may be spliced into data sets multiple times.
  • the metadata is added to the data set at the head of the first buffer, the metadata mainly including the storage address of the data of the data set in the first buffer.
  • the metadata is read from the first buffer by the first reading unit 510; the second reading unit 520 then reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; the converting unit 530 performs data format conversion on the data and generates a data block from the entire converted data set; and the storage unit 540 stores the data block in the second buffer of the CPU, which the CPU allocates in memory mainly to store data blocks, so that the data splicer can read the data block from the second buffer and transfer it to the working buffer of the GPU.
  • the data preprocessor automatically completes the reading of the data and the conversion of the data format, so the programmer does not need to write a corresponding program; this reduces the programmer's work, facilitates subsequent optimization of the MapReduce architecture, and improves CPU efficiency.
  • optionally, the metadata specifically includes an address index array, where the address index array contains data elements in one-to-one correspondence with the data of the data set, each data element indicating the storage address of one datum of the data set in the first buffer; as shown in FIG. 5-b, the second reading unit 520 may include: a data reading unit 5210 configured to read data starting from the storage address of the first buffer indicated by a data element of the address index array until the storage address indicated by the next data element, or the end of the first buffer, is reached.
  • the data reading unit 5210 reads data starting from the storage address in the first buffer indicated by a data element of the address index array, and stops at the storage address indicated by the next data element or at the end of the first buffer; this yields one datum of the data set, and it then continues with the next datum until all the data of the data set in the first buffer have been read.
  • optionally, the converting unit 530 includes:
  • a data format conversion unit 5310 configured to convert, by using a preset analytic function, data of the data set into a data format that satisfies a logical operation specified by the analytic function;
  • the generating unit 5320 is configured to generate the data block from the converted data set.
  • the data format specified by the preset analytic function may be the data format required by the GPU logic operation.
  • the data format that can be operated on logically, as specified by the preset analytic function, may be integer data, floating-point data, string data, or the like.
  • optionally, the converting unit 530 may further include:
  • the format conversion unit 5330 is configured to convert the data in the data block into the storage format used in the GPU when the format in which the first buffer stores the data of the data set is inconsistent with the storage format of the data in the GPU.
  • the first buffer of the CPU and the GPU may use inconsistent storage formats, that is, they may handle endianness differently: in the little-endian storage format, the high-order bits of a datum are stored at the higher address and the low-order bits at the lower address; in the big-endian storage format, the high-order bits are stored at the lower address and the low-order bits at the higher address.
  • the first buffer allocated by the CPU has a member variable that indicates whether the data are stored in the first buffer in big-endian or little-endian format, whether the storage format needs to be converted when the data are stored into the second buffer, and whether the conversion target is big-endian or little-endian. For example, if the data of the data set are stored in the first buffer in big-endian format while the GPU stores data in little-endian format, the format conversion unit 5330 converts the data block into little-endian format before it is stored in the second buffer.
  • in this way, the data splicer can directly read the data block from the second buffer and splice it to the GPU; keeping the storage format of the CPU's second buffer consistent with that of the GPU ensures that the GPU can correctly read the data block for arithmetic processing, avoiding operation errors caused by reading high-order bits as low-order bits or vice versa.
  • an embodiment of the present invention further provides a data splicer 600, which may include:
  • a third reading unit 610 configured to read a data block generated by the data preprocessor from a second buffer of the CPU
  • the splicing processing unit 620 is configured to splice the data block into a working buffer of the GPU allocated for storing data blocks.
  • the embodiment of the present invention is applied to a Hadoop cluster under the MapReduce architecture.
  • the data splicer 600 is disposed in a CPU of a slave node device under a Hadoop cluster, wherein the CPU is further provided with a data preprocessor 500 as shown in FIG. 5-a.
  • each slave node device further includes a GPU; the slave node device acquires data fragments from the master node device of the Hadoop cluster, splices the values of the key-value pairs in a data fragment into data sets, and stores them in the first buffer in the CPU memory; because the memory of the first buffer may not be able to hold the values of all the key-value pairs in the data fragment at once, the values of the key-value pairs in the data fragment may be spliced into data sets multiple times.
  • the data preprocessor 500 reads the data from the first buffer according to the metadata, converts the data format, generates a data block from the entire converted data set, and stores it in the second buffer in the CPU; the third reading unit 610 of the data splicer then reads the data block from the second buffer of the CPU, and the splicing processing unit 620 splices the read data block into the working buffer of the GPU allocated for storing data blocks.
  • the data preprocessor 500 completes the data format conversion and the data splicer 600 completes the data block splicing, no longer relying on the programmer to write corresponding programs, which simplifies the programmer's work.
  • the automatic operation of the data preprocessor 500 and the data splicer can improve the working efficiency of the CPU, and is also beneficial to the subsequent optimization of MapReduce.
  • the data splicer 600 manages a cursor parameter that indicates the starting address at which the working buffer of the GPU can store data; after each data block is spliced into the working buffer of the GPU, the cursor parameter is updated accordingly, so that the starting address at which the GPU's working buffer can next store data is known exactly.
  • the splicing processing unit 620 splices the data block into the working buffer of the GPU according to the starting address indicated by the cursor parameter.
  • optionally, the splicing processing unit 620 is specifically configured to splice the data block starting from the starting address indicated by the cursor parameter, where the cursor parameter indicates the starting address, in the working buffer of the GPU allocated for storing data blocks, at which the data block can be stored.
  • the data splicer further includes:
  • the trigger processing unit 630 is configured to, when the data splicer fails to splice the data block into the working buffer of the GPU allocated for storing data blocks, suspend splicing the data block and trigger the GPU to process the data blocks stored in the working buffer.
  • the data read by the third reading unit 610 of the data splicer 600 from the data block in the second buffer can be operated on logically directly and meets the GPU's storage format requirements.
  • the API is called to splice the data in the data block into the working buffer of the GPU.
  • if the remaining memory of the GPU's working buffer can hold the data block read from the second buffer of the CPU, the entire data block is spliced into the working buffer; if it cannot, that is, when splicing the data block fails, splicing is suspended, the data block remains stored in the second buffer, and the trigger processing unit 630 triggers the GPU to start processing all the data blocks in the working buffer.
  • the data splicer 600 may further include:
  • a notification unit 640 configured to notify the GPU of a size of the data block
  • the updating unit 650 is configured to update the cursor parameter.
  • after each successful splicing of a data block to the GPU, the notification unit 640 notifies the GPU of the data block's size, so the GPU can use the size directly, which reduces the GPU's workload.
  • the update unit 650 then updates the cursor parameter.
  • the embodiment of the present invention provides a processor 700, which includes a data preprocessor 500 as shown in FIG. 5-a and a data splicer 600 as shown in FIG. 6-a. For the description of the data preprocessor 500 and the data splicer 600, see above; it is not repeated here.
  • the first buffer and the second buffer are automatically allocated and reclaimed in the CPU.
  • the life cycle of the first buffer is the processing time of one data fragment
  • the life cycle of the second buffer is the processing time of one data block.
  • the working buffer is automatically allocated in the GPU, and the life cycle of the working buffer is the processing time of one data fragment.
  • an embodiment of the present invention further provides a slave node device, which may include:
  • processor CPU-700 as shown in FIG. 7 above, and a graphics processor GPU-800;
  • the CPU-700 is as described above and will not be described here.
  • the data preprocessor in the CPU-700 is configured to convert the data format of the data set obtained from the data fragment and to generate a data block from the converted data set; the data splicer in the CPU-700 splices the data block into the working buffer of the GPU-800 allocated for storing data blocks;
  • the GPU-800 is configured to process the data block to obtain a processing result, and then return the processing result to the CPU-700.
  • a ResultBuffer is automatically allocated and reclaimed in the CPU-700, and a ResultBuffer is automatically allocated and reclaimed in the GPU-800; the ResultBuffer in the CPU-700 has the same life cycle as the ResultBuffer in the GPU-800, and both are used to store the operation result.
  • the first buffer allocated by CPU-700 is DirectBuffer
  • the second buffer is LaunchingBuffer
  • the working buffer allocated by GPU-800 is WorkingBuffer
  • FIG. 8-b is a schematic diagram of the interaction between the CPU-700 and the GPU-800 in the slave node device provided by the embodiment of the present invention. As shown in FIG. 8-b:
  • the data preprocessor 500 and the data splicer 600 are set in the CPU-700.
  • DirectBuffer, LaunchingBuffer and ResultBuffer are allocated in the CPU-700.
  • the DirectBuffer stores a data set that needs to be converted into a data format.
  • the data set includes data composed of values in a key-value pair, and metadata is added in the DirectBuffer.
  • the metadata mainly includes the storage addresses of the data of the data set in the DirectBuffer; the data preprocessor 500 can read the data of the data set from the DirectBuffer according to the metadata, perform automatic data format conversion on the data through the specified preset analytic function, and generate a data block from the converted data set.
  • the data preprocessor 500 stores the data block into the LaunchingBuffer. If the storage format of the data in the data block needs to be converted when stored in the LaunchingBuffer, the storage format is converted to ensure that the data storage format in the LaunchingBuffer is the same as that of the WorkingBuffer in the GPU-800.
  • the data splicer 600 splices the data block from the LaunchingBuffer to the WorkingBuffer in the GPU-800. If the splicing fails, meaning the WorkingBuffer can no longer store the data block, the GPU is triggered to operate on the data blocks stored in the WorkingBuffer; the GPU stores the operation result in its ResultBuffer, and the API is called to transfer the operation result to the ResultBuffer in the CPU. The whole CPU-side flow is sketched below.
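  • Putting the pieces together, here is a compact sketch of the CPU-side flow of FIG. 8-b, reusing the hypothetical classes from the earlier sketches (DataSetReader, TextToIntFunction, EndianConverter, DataSplicer); all names are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;

// End-to-end CPU-side flow: DirectBuffer -> LaunchingBuffer -> WorkingBuffer.
public class SlaveNodePipeline {
    private final DataSplicer splicer; // from the earlier splicer sketch
    private final AnalyticFunction parse = new TextToIntFunction();

    public SlaveNodePipeline(DataSplicer splicer) {
        this.splicer = splicer;
    }

    /** Processes one data set already packed into a DirectBuffer. */
    public void processDataSet(ByteBuffer directBuffer, int dataEnd) {
        // 1. Preprocessor: read via the address index and parse each datum.
        List<byte[]> dataSet = DataSetReader.readAll(directBuffer, dataEnd);
        ByteBuffer block = TextToIntFunction.buildBlock(dataSet, parse);

        // 2. Endianness fix-up into the LaunchingBuffer (GPU assumed little-endian).
        ByteBuffer launching = EndianConverter.toGpuOrder(block, ByteOrder.LITTLE_ENDIAN);

        // 3. Splicer: cursor-driven copy into the GPU WorkingBuffer. On failure
        //    the block stays in the LaunchingBuffer and is retried after the
        //    GPU has drained the WorkingBuffer.
        while (!splicer.splice(launching)) {
            // retry once the WorkingBuffer has been processed and reused
        }
    }
}
```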
  • an embodiment of the present invention further provides a data processing device, which may include: a memory 910 and at least one processor 920 (taking one processor in FIG. 9 as an example).
  • the memory 910 and the processor 920 may be connected by a bus or other means, wherein FIG. 9 is exemplified by a bus connection.
  • the processor 920 may perform the following steps: the data preprocessor reads metadata from a first buffer of the CPU, where, when the data set obtained from the data fragment is stored in the first buffer, metadata is added for the data set at the head of the first buffer, the metadata including the storage addresses of the data of the data set in the first buffer; the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; the data preprocessor converts the data of the data set into the data format indicated by a preset analytic function, generates a data block from the converted data set, and stores it in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.
  • the data splicer reads the data block generated by the data preprocessor from the second buffer of the CPU; the data splicer splices the data block into a working buffer of the GPU allocated for storing data blocks.
  • the processor 920 may further perform the following step: the data preprocessor reads data starting from the storage address of the first buffer indicated by a data element of the address index array until the storage address indicated by the next data element, or the end of the first buffer, is reached.
  • the processor 920 may further perform the following step: the data preprocessor converts, according to the preset analytic function, the data of the data set into a data format that satisfies the logical operation specified by the analytic function.
  • processor 920 may also perform the step of converting the data in the data block to a storage format in the GPU.
  • the processor 920 may further perform the following steps: when the data splicer fails to splice the data block into the working buffer of the GPU allocated for storing data blocks, splicing of the data block is suspended and the GPU is triggered to process the data blocks stored in the working buffer.
  • the processor 920 may further perform the following step: the data splicer splices the data block starting from the starting address indicated by a cursor parameter, where the cursor parameter indicates the starting address, in the working buffer of the GPU allocated for storing data blocks, at which the data block can be stored.
  • the processor 920 may further perform the following steps: the data splicer notifies the GPU of the size of the data block; the data splicer updates the cursor parameter.
  • the memory 910 can be used to store data sets, metadata, and data blocks;
  • the memory 910 can also be used to store an array of address indices.
  • the memory 910 can also be used to store cursor parameters.
  • the memory 910 can also be used to store the results of the operations.

Abstract

A data processing method and corresponding device are disclosed, implementing automatic data format conversion and automatic data splicing in a Hadoop slave node device. The method mainly comprises the following steps: a data preprocessor reads metadata from a first buffer of a central processing unit; reads the data of a data set from the first buffer according to a storage address indicated by the metadata; converts, according to a preset analytic function, the data of the data set into the data format indicated by the preset analytic function; and stores data blocks generated from the converted data set in a second buffer of the central processing unit, thereby enabling a data splicer to read the data blocks from the second buffer and splice them to a graphics processing unit.
PCT/CN2014/094071 2013-12-23 2014-12-17 Procédé de traitement de données et dispositif correspondant WO2015096649A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310719857.4A CN104731569B (zh) 2013-12-23 2013-12-23 一种数据处理方法及相关设备
CN201310719857.4 2013-12-23

Publications (1)

Publication Number Publication Date
WO2015096649A1 (fr) 2015-07-02

Family

ID=53455495

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/094071 WO2015096649A1 (fr) 2013-12-23 2014-12-17 Procédé de traitement de données et dispositif correspondant

Country Status (2)

Country Link
CN (1) CN104731569B (fr)
WO (1) WO2015096649A1 (fr)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159610B (zh) * 2015-09-01 2018-03-09 浪潮(北京)电子信息产业有限公司 大规模数据处理系统及方法
CN106326029A (zh) * 2016-08-09 2017-01-11 浙江万胜智能科技股份有限公司 一种用于电力仪表的数据存储方法
US10853262B2 (en) * 2016-11-29 2020-12-01 Arm Limited Memory address translation using stored key entries
CN109408450B (zh) * 2018-09-27 2021-03-30 中兴飞流信息科技有限公司 一种数据处理的方法、系统、协处理装置和主处理装置
CN111143232B (zh) * 2018-11-02 2023-08-18 伊姆西Ip控股有限责任公司 用于存储元数据的方法、设备和计算机可读介质
CN109522133B (zh) * 2018-11-28 2020-10-02 北京字节跳动网络技术有限公司 一种数据拼接方法、装置、电子设备及存储介质
EP3964949B1 (fr) * 2019-05-27 2023-09-06 Huawei Technologies Co., Ltd. Procédé et appareil de traitement graphique
CN110769064B (zh) * 2019-10-29 2023-02-24 广州趣丸网络科技有限公司 一种用于离线推送消息的系统、方法和设备
CN113535857A (zh) * 2021-08-04 2021-10-22 阿波罗智联(北京)科技有限公司 数据同步方法及装置
CN115952561A (zh) * 2023-03-14 2023-04-11 北京全路通信信号研究设计院集团有限公司 应用于轨道交通系统的数据处理方法、装置、设备及介质

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050140682A1 (en) * 2003-12-05 2005-06-30 Siemens Medical Solutions Usa, Inc. Graphics processing unit for simulation or medical diagnostic imaging
CN102662639A (zh) * 2012-04-10 2012-09-12 南京航空航天大学 一种基于Mapreduce的多GPU协同计算方法
CN102708088A (zh) * 2012-05-08 2012-10-03 北京理工大学 面向海量数据高性能计算的cpu/gpu协同处理方法

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023061295A1 (fr) * 2021-10-13 2023-04-20 杭州趣链科技有限公司 Procédé et appareil de traitement de données, et dispositif électronique et support de stockage

Also Published As

Publication number Publication date
CN104731569A (zh) 2015-06-24
CN104731569B (zh) 2018-04-10

Similar Documents

Publication Publication Date Title
WO2015096649A1 (fr) Procédé de traitement de données et dispositif correspondant
US20200210092A1 (en) Infinite memory fabric streams and apis
CN112422615B (zh) 一种通信的方法及装置
US9671970B2 (en) Sharing an accelerator context across multiple processes
US9734085B2 (en) DMA transmission method and system thereof
CN108647104B (zh) 请求处理方法、服务器及计算机可读存储介质
TW201731253A (zh) 量子金鑰分發方法及裝置
EP2437167A1 (fr) Procédé et système pour la migration de stockage virtuel et dispositif de surveillance de machine virtuelle
US10956335B2 (en) Non-volatile cache access using RDMA
TWI773959B (zh) 用於處理輸入輸出儲存指令之資料處理系統、方法及電腦程式產品
US20190155925A1 (en) Sparse dictionary tree
JP2022105146A (ja) アクセラレーションシステム、アクセラレーション方法、及びコンピュータプログラム
US20220114145A1 (en) Resource Lock Management Method And Apparatus
US20210209057A1 (en) File system quota versioning
KR20210092689A (ko) 그래프 데이터베이스의 순회 방법, 장치, 설비 및 저장매체
CN110119304B (zh) 一种中断处理方法、装置及服务器
AU2015402888A1 (en) Computer device and method for reading/writing data by computer device
JP5124430B2 (ja) 仮想マシンの移行方法、サーバ、及び、プログラム
US10216664B2 (en) Remote resource access method and switching device
US20200125548A1 (en) Efficient write operations for database management systems
US20160055107A1 (en) Data processing apparatus and method
EP4242862A2 (fr) Base de données clés/valeurs accédée par rdma
KR20220058581A (ko) 생성자-소비자 활성 직접 캐시 전달
CN112486702A (zh) 基于多核多处理器并行系统的全局消息队列实现方法
JP2012234564A (ja) 仮想マシンの移行方法、サーバ、及び、プログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14873198; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 14873198; Country of ref document: EP; Kind code of ref document: A1)