Publication number: WO2015096649 A1
Publication type: Application
Application number: PCT/CN2014/094071
Publication date: 2 Jul 2015
Filing date: 17 Dec 2014
Priority date: 23 Dec 2013
Also published as: CN104731569A
Inventors: 崔慧敏, 谢睿, 阮功, 杨文森
Applicant: 华为技术有限公司
External links: Patentscope, Espacenet
Data processing method and related device
WO 2015096649 A1
Abstract
A data processing method and a related device implement automatic data-format conversion and automatic splicing of data in a node device of a Hadoop cluster. The method mainly comprises: a data preprocessor reads metadata from a first buffer of the CPU, reads the data of a data set from the first buffer on the basis of the storage addresses indicated by the metadata, converts, according to a preset parse function, the data of the data set into the data format indicated by the preset parse function, and stores the data block generated from the converted data set in a second buffer of the CPU, so that a data splicer can read the data block from the second buffer and splice it into the GPU.
Images (15)
Claims (22)  (translated from Chinese)
  1. A data processing method, applied to a Hadoop cluster under the MapReduce architecture, the Hadoop cluster comprising a master node device and a slave node device, the slave node device comprising a central processing unit (CPU) and a graphics processing unit (GPU), the slave node device obtaining a data shard from the master node device, and the CPU being provided with a data preprocessor and a data splicer, the method comprising:
    the data preprocessor reads metadata from a first buffer of the CPU, wherein, when a data set obtained from the data shard is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata comprising the storage addresses, in the first buffer, of the data of the data set;
    the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; and
    the data preprocessor converts, according to a preset parse function, the data of the data set into the data format indicated by the preset parse function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.
  2. The method according to claim 1, wherein the metadata specifically comprises an address index array, the address index array containing data elements in one-to-one correspondence with the data of the data set, each data element indicating the storage address, in the first buffer, of a datum of the data set; accordingly, the data preprocessor reading the data of the data set from the first buffer according to the storage addresses indicated by the metadata comprises:
    the data preprocessor starts reading at the storage address in the first buffer indicated by a data element of the address index array, and stops reading at the storage address indicated by the next data element or at the end of the first buffer.
  3. The method according to claim 1, wherein converting the data of the data set into the data format indicated by the preset parse function comprises:
    the data preprocessor converts, according to the preset parse function, the data of the data set into a data format, specified by the parse function, that supports logical operations.
  4. The method according to claim 3, wherein the method further comprises:
    when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU, after generating the data block from the converted data set:
    the data preprocessor converts the data in the data block into the storage format of the GPU.
  5. The method according to any one of claims 1 to 4, wherein
    the data set specifically consists of the spliced values of a plurality of key-value pairs in the data shard.
  6. The method according to any one of claims 1 to 4, wherein
    the first buffer and the second buffer are automatically allocated and reclaimed by the CPU, the lifetime of the first buffer is the processing time of one data shard, and the lifetime of the second buffer is the processing time of one data set.
  7. A data processing method, applied to a Hadoop cluster under the MapReduce architecture, the Hadoop cluster comprising a master node device and a slave node device, the slave node device comprising a central processing unit (CPU) and a graphics processing unit (GPU), the slave node device obtaining a data shard from the master node device, and the CPU being provided with a data preprocessor and a data splicer, the method comprising:
    the data splicer reads, from a second buffer of the CPU, a data block generated by the data preprocessor; and
    the data splicer splices the data block into a work buffer allocated in the GPU for storing data blocks.
  8. The method according to claim 7, wherein the method further comprises:
    when the data splicer fails to splice the data block into the work buffer allocated in the GPU for storing data blocks, the splicing of the data block is suspended, and the GPU is triggered to process the data blocks stored in the work buffer.
  9. The method according to claim 7 or 8, wherein the data splicer splicing the data block into the work buffer allocated in the GPU for storing data blocks comprises:
    the data splicer starts splicing the data block at the start address indicated by a cursor parameter, the cursor parameter indicating the start address, in the work buffer allocated in the GPU for storing data blocks, that is available for storing data blocks.
  10. The method according to claim 9, wherein, after the data block is spliced successfully, the method further comprises:
    the data splicer notifies the GPU of the size of the data block; and
    the data splicer updates the cursor parameter.
  11. A data preprocessor, comprising:
    a first reading unit, configured to read metadata from a first buffer of the CPU, wherein, when a data set obtained from a data shard is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata comprising the storage addresses, in the first buffer, of the data of the data set;
    a second reading unit, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;
    a conversion unit, configured to convert, according to a preset parse function, the data of the data set into the data format indicated by the preset parse function, and generate a data block from the converted data set; and
    a storage unit, configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.
  12. The data preprocessor according to claim 11, wherein the metadata specifically comprises an address index array, the address index array containing data elements in one-to-one correspondence with the data of the data set, each data element indicating the storage address, in the first buffer, of a datum of the data set; accordingly, the second reading unit comprises:
    a data reading unit, configured to start reading at the storage address in the first buffer indicated by a data element of the address index array, and stop reading at the storage address indicated by the next data element or at the end of the first buffer.
  13. The data preprocessor according to claim 11 or 12, wherein the parsing unit comprises:
    a data format conversion unit, configured to convert, through the preset parse function, the data of the data set into a data format, specified by the parse function, that supports logical operations; and
    a generation unit, configured to generate a data block from the converted data set.
  14. The data preprocessor according to claim 11, wherein the parsing unit further comprises:
    a format conversion unit, configured to convert the data in the data block into the storage format of the GPU when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU.
  15. A data splicer, comprising:
    a third reading unit, configured to read, from the second buffer of the CPU, a data block generated by the data preprocessor; and
    a splicing processing unit, configured to splice the data block into a work buffer allocated in the GPU for storing data blocks.
  16. The data splicer according to claim 15, wherein the data splicer further comprises:
    a trigger processing unit, configured to suspend the splicing of the data block and trigger the GPU to process the data blocks stored in the work buffer when the data splicer fails to splice the data block into the work buffer allocated in the GPU for storing data blocks.
  17. The data splicer according to claim 15 or 16, wherein the splicing processing unit is specifically configured to start splicing the data block at the start address indicated by a cursor parameter, the cursor parameter indicating the start address, in the work buffer allocated in the GPU for storing data blocks, that is available for storing data blocks.
  18. The data splicer according to claim 17, wherein the data splicer further comprises:
    a notification unit, configured to notify the GPU of the size of the data block; and
    an updating unit, configured to update the cursor parameter.
  19. A processor, comprising the data preprocessor according to claim 11 and the data splicer according to claim 15.
  20. The processor according to claim 19, wherein the processor is further configured to:
    automatically allocate and reclaim the first buffer and the second buffer, wherein the lifetime of the first buffer is the processing time of one data shard and the lifetime of the second buffer is the processing time of one data set.
  21. A slave node device, wherein the slave node device is a slave node device in a Hadoop cluster, the Hadoop cluster further comprising a master node device, the slave node device receiving a data shard from the Hadoop cluster, and the slave node device comprising: a graphics processing unit (GPU) and the processor (CPU) according to claim 19;
    wherein the data preprocessor in the CPU is configured to convert the data format of a data set obtained from the data shard and generate a data block from the format-converted data set, and the data splicer in the CPU splices the data block into the work buffer allocated in the GPU for storing data blocks; and
    the GPU is configured to process the data block to obtain a processing result and then return the processing result to the CPU.
  22. The slave node device according to claim 21, wherein the GPU is further configured to:
    automatically allocate and reclaim the work buffer, wherein the lifetime of the work buffer is the processing time of one data shard.
Description  (translated from Chinese)
A data processing method and related device

Technical Field

The present invention relates to the field of information processing technologies, and in particular, to a data processing method and a related device.

Background Art

Together, big data and cloud computing have brought a new revolution to information technology (IT). Cloud computing offers powerful big-data computing capability and very high computing speed, but the transfer of big data to that computing capability has become one of its major challenges.

MapReduce is a well-known cloud computing framework provided by the Google search engine for parallel computation over large-scale data sets (larger than 1 TB), and Hadoop is a concrete implementation of the MapReduce architecture; a Hadoop cluster is divided into master node devices and slave node devices. The master node device uses the Map function provided by MapReduce to split a data set, by size, into M data shards, and distributes the data shards to multiple slave node devices for parallel processing. Specifically, each slave node device obtains the values of key-value pairs from its data shard and splices the values into a buffer allocated by the slave node device's central processing unit (CPU); it then reads the values of the key-value pairs from the buffer and parses them, for example by converting the data format of the values, and splices the parsed values, through an application programming interface (API), into a buffer allocated by the slave node device's graphics processing unit (GPU) for storing data, where they are processed by the GPU.
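Purely as an editorial illustration (not part of the patent disclosure), the prior-art slave-node flow described above can be sketched as follows; all names are hypothetical, and the parse step here simply converts text values to integers, standing in for the programmer-written parser.

```python
# Illustrative sketch of the prior-art slave-node flow: values of key-value
# pairs are spliced into a CPU-side buffer, parsed, and the parsed values
# are spliced into a GPU-side buffer for processing.

def process_shard(shard: list[tuple[str, str]]) -> list[int]:
    cpu_buffer = []                      # buffer allocated by the CPU
    for key, value in shard:             # splice each value into the buffer
        cpu_buffer.append(value)
    gpu_buffer = []                      # buffer allocated for the GPU
    for value in cpu_buffer:             # parse, e.g. convert the data format
        gpu_buffer.append(int(value))    # then splice into the GPU buffer
    return gpu_buffer
```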

When implementing the above scheme, the inventors found that, because the MapReduce framework provides no parse function, parsing the values of the key-value pairs has to rely on corresponding programs written by programmers. Moreover, because the size of the buffer the CPU allocates for storing the values of key-value pairs may be inconsistent with the size of the buffer the GPU allocates for storing data, and the MapReduce framework provides no corresponding checking method, judging whether the CPU and GPU buffers are consistent likewise relies on checking functions written by programmers, which reduces the execution efficiency of the slave node devices.

Summary of the Invention

In view of the above drawbacks, embodiments of the present invention provide a data processing method and related devices, applied to a Hadoop cluster under the MapReduce architecture, which can improve the working efficiency of the slave node devices in the Hadoop cluster, simplify programmers' work, and facilitate subsequent optimization of the MapReduce architecture.

According to a first aspect, the present invention provides a data processing method, applied to a Hadoop cluster under the MapReduce architecture, the Hadoop cluster comprising a master node device and a slave node device, the slave node device comprising a central processing unit (CPU) and a graphics processing unit (GPU), the slave node device obtaining a data shard from the master node device, and the CPU being provided with a data preprocessor and a data splicer, the method comprising:

The data preprocessor reads metadata from a first buffer of the CPU, wherein, when a data set obtained from the data shard is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata comprising the storage addresses, in the first buffer, of the data of the data set;

the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; and

the data preprocessor converts, according to a preset parse function, the data of the data set into the data format indicated by the preset parse function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.

With reference to the first aspect, in a first possible implementation, the metadata specifically comprises an address index array, the address index array containing data elements in one-to-one correspondence with the data of the data set, each data element indicating the storage address, in the first buffer, of a datum of the data set; accordingly, the data preprocessor reading the data of the data set from the first buffer according to the storage addresses indicated by the metadata comprises: the data preprocessor starts reading at the storage address in the first buffer indicated by a data element of the address index array, and stops reading at the storage address indicated by the next data element or at the end of the first buffer.
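As a purely illustrative aid (not part of the claimed disclosure), the address-index-array reading scheme just described can be sketched as follows; the function name and the use of byte offsets are editorial assumptions.

```python
# Minimal sketch of the metadata-driven read: the address index array holds
# one start offset per datum, and each datum extends from its offset up to
# the next element's offset, or to the end of the first buffer.

def read_data_set(buffer: bytes, index: list[int]) -> list[bytes]:
    data = []
    for i, start in enumerate(index):
        # The next element's offset (or the buffer end) bounds this datum.
        end = index[i + 1] if i + 1 < len(index) else len(buffer)
        data.append(buffer[start:end])
    return data
```

For example, a buffer `b"foobarba"` with index `[0, 3, 6]` yields the three data `b"foo"`, `b"bar"`, `b"ba"`, the last one terminated by the buffer end.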

With reference to the first aspect, in a second possible implementation, converting the data of the data set into the data format indicated by the preset parse function comprises: the data preprocessor converts, according to the preset parse function, the data of the data set into a data format, specified by the parse function, that supports logical operations.

With reference to the second possible implementation of the first aspect, in a third possible implementation, when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU, after generating the data block from the converted data set, the method comprises: the data preprocessor converts the data in the data block into the storage format of the GPU.
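The patent does not say what form the storage-format mismatch takes; as one hypothetical instance chosen for illustration only, the CPU-side block might hold little-endian 32-bit integers while the GPU side expects big-endian, in which case the preprocessor would rewrite each value before splicing:

```python
import struct

# Hypothetical instance of the storage-format conversion step: rewrite a
# block of little-endian int32 values into big-endian byte order.

def to_gpu_format(block: bytes) -> bytes:
    n = len(block) // 4
    values = struct.unpack("<%di" % n, block)   # read as little-endian int32
    return struct.pack(">%di" % n, *values)     # emit as big-endian int32
```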

With reference to the first aspect, or the first, second, or third possible implementation of the first aspect, in a fourth possible implementation, the data set specifically consists of the spliced values of a plurality of key-value pairs in the data shard.

With reference to the first aspect, or the first, second, or third possible implementation of the first aspect, in a fifth possible implementation, the first buffer and the second buffer are automatically allocated and reclaimed by the CPU, the lifetime of the first buffer is the processing time of one data shard, and the lifetime of the second buffer is the processing time of one data set.

According to a second aspect, the present invention provides a data processing method, applied to a Hadoop cluster under the MapReduce architecture, the Hadoop cluster comprising a master node device and a slave node device, the slave node device comprising a central processing unit (CPU) and a graphics processing unit (GPU), the slave node device obtaining a data shard from the master node device, and the CPU being provided with a data preprocessor and a data splicer, the method comprising:

The data splicer reads, from a second buffer of the CPU, a data block generated by the data preprocessor; and

the data splicer splices the data block into a work buffer allocated in the GPU for storing data blocks.

With reference to the second aspect, in a first possible implementation, when the data splicer fails to splice the data block into the work buffer allocated in the GPU for storing data blocks, the splicing of the data block is suspended, and the GPU is triggered to process the data blocks stored in the work buffer.

With reference to the second aspect, or the first possible implementation of the second aspect, in a second possible implementation, the data splicer starts splicing the data block at the start address indicated by a cursor parameter, the cursor parameter indicating the start address, in the work buffer allocated in the GPU for storing data blocks, that is available for storing data blocks.

With reference to the second possible implementation of the second aspect, in a third possible implementation, after the data block is spliced successfully, the method further comprises: the data splicer notifies the GPU of the size of the data block; and the data splicer updates the cursor parameter.
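As an editorial sketch only (the class and its fields are illustrative assumptions, not the patent's implementation), the cursor-based splicing behaviour described above, including the failure case that hands control to the GPU, can be modelled like this:

```python
# Sketch of cursor-based splicing into the GPU work buffer: a block is
# copied at the cursor offset; on success the block size is recorded (the
# "notify the GPU" step) and the cursor is advanced; on failure (not
# enough remaining room) the caller suspends splicing and triggers the
# GPU to process the blocks already in the buffer.

class WorkBuffer:
    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.cursor = 0          # next start address available for a block
        self.block_sizes = []    # block sizes reported to the GPU

    def splice(self, block: bytes) -> bool:
        if self.cursor + len(block) > len(self.buf):
            return False         # splice fails: caller triggers GPU processing
        self.buf[self.cursor:self.cursor + len(block)] = block
        self.block_sizes.append(len(block))   # notify the GPU of the size
        self.cursor += len(block)             # update the cursor parameter
        return True
```

Note that a failed splice leaves the cursor untouched, so the block can stay in the second buffer and be spliced again after the GPU drains the work buffer.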

According to a third aspect, the present invention provides a data preprocessor, comprising:

A first reading unit, configured to read metadata from a first buffer of the CPU, wherein, when a data set obtained from a data shard is stored into the first buffer, metadata is added for the data set at the head of the first buffer, the metadata comprising the storage addresses, in the first buffer, of the data of the data set;

a second reading unit, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;

a conversion unit, configured to convert, according to a preset parse function, the data of the data set into the data format indicated by the preset parse function, and generate a data block from the converted data set; and

a storage unit, configured to store the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it into the GPU.

With reference to the third aspect, in a first possible implementation, the metadata specifically comprises an address index array, the address index array containing data elements in one-to-one correspondence with the data of the data set, each data element indicating the storage address, in the first buffer, of a datum of the data set; accordingly, the second reading unit comprises: a data reading unit, configured to start reading at the storage address in the first buffer indicated by a data element of the address index array, and stop reading at the storage address indicated by the next data element or at the end of the first buffer.

With reference to the third aspect, or the first possible implementation of the third aspect, in a second possible implementation, the parsing unit comprises: a data format conversion unit, configured to convert, through the preset parse function, the data of the data set into a data format, specified by the parse function, that supports logical operations; and a generation unit, configured to generate a data block from the converted data set.

With reference to the third aspect, in a third possible implementation, the parsing unit further comprises: a format conversion unit, configured to convert the data in the data block into the storage format of the GPU when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU.

According to a fourth aspect, the present invention provides a data splicer, comprising:

A third reading unit, configured to read, from the second buffer of the CPU, a data block generated by the data preprocessor; and

a splicing processing unit, configured to splice the data block into a work buffer allocated in the GPU for storing data blocks.

With reference to the fourth aspect, in a first possible implementation, the data splicer further comprises: a trigger processing unit, configured to suspend the splicing of the data block and trigger the GPU to process the data blocks stored in the work buffer when the data splicer fails to splice the data block into the work buffer allocated in the GPU for storing data blocks.

With reference to the fourth aspect, or the first possible implementation of the fourth aspect, in a second possible implementation, the splicing processing unit is specifically configured to start splicing the data block at the start address indicated by a cursor parameter, the cursor parameter indicating the start address, in the work buffer allocated in the GPU for storing data blocks, that is available for storing data blocks.

With reference to the second possible implementation of the fourth aspect, in a third possible implementation, the data splicer further comprises: a notification unit, configured to notify the GPU of the size of the data block; and an updating unit, configured to update the cursor parameter.

According to a fifth aspect, the present invention provides a processor, which may comprise the data preprocessor according to the third aspect and the data splicer according to the fourth aspect.

With reference to the fifth aspect, in a first possible implementation, the processor automatically allocates and reclaims the first buffer and the second buffer, the lifetime of the first buffer being the processing time of one data shard, and the lifetime of the second buffer being the processing time of one data set.

According to a sixth aspect, the present invention provides a slave node device, which may comprise the processor (CPU) according to the fifth aspect and a graphics processing unit (GPU), wherein the data preprocessor in the CPU is configured to convert the data format of a data set obtained from a data shard and generate a data block from the format-converted data set, and the data splicer in the CPU splices the data block into the work buffer allocated in the GPU for storing data blocks; the GPU is configured to process the data block to obtain a processing result and then return the processing result to the CPU.

As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:

In one aspect, in the embodiments of the present invention, a data preprocessor and a data splicer are provided in the slave node device, and the data preprocessor reads metadata from the first buffer of the CPU. Because the metadata is generated for a data set when the data set is stored into the first buffer, and indicates the storage addresses of the data of the data set in the first buffer, the data preprocessor can read the data of the data set from the first buffer according to the metadata, convert the format of the data according to the preset parse function, generate a data block from the converted data set, and store the data block in the second buffer of the CPU, so that the data splicer can complete splicing the data block into the GPU. Compared with the prior art, because metadata including the storage addresses is added for the data of the data set when the data set is stored into the first buffer, the data preprocessor can read the data of the data set from the first buffer automatically, without depending on corresponding programs written by programmers. Furthermore, the data preprocessor can parse the data of the data set according to the preset parse function, which improves processing efficiency in the CPU and facilitates subsequent optimization of the MapReduce architecture.

In another aspect, the data splicer reads a data block from the second buffer and splices it into the working buffer allocated in the GPU for storing data blocks. If the splicing fails, meaning the remaining memory of the GPU working buffer is insufficient to complete splicing of the data block, splicing of that block is suspended for the time being, and the GPU is instead triggered to perform computation on the data blocks already stored. The unspliced data block remains temporarily in the second buffer and is spliced in a later round. Compared with the prior art, splicing of data blocks is completed automatically by the data splicer rather than by programmer-written code, which effectively prevents loss of data blocks and improves splicing efficiency.

BRIEF DESCRIPTION OF THE DRAWINGS

To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.

FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a data processing method according to another embodiment of the present invention;

FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of a data processing method according to another embodiment of the present invention;

FIG. 5-a is a schematic structural diagram of a data preprocessor according to an embodiment of the present invention;

FIG. 5-b is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention;

FIG. 5-c is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention;

FIG. 5-d is a schematic structural diagram of a data preprocessor according to another embodiment of the present invention;

FIG. 6-a is a schematic structural diagram of a data splicer according to an embodiment of the present invention;

FIG. 6-b is a schematic structural diagram of a data splicer according to another embodiment of the present invention;

FIG. 6-c is a schematic structural diagram of a data splicer according to another embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a processor according to an embodiment of the present invention;

FIG. 8-a is a schematic structural diagram of a slave node device according to an embodiment of the present invention;

FIG. 8-b is a schematic diagram of interaction between the CPU and the GPU in a slave node device according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a data processing device according to an embodiment of the present invention.

DETAILED DESCRIPTION

The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings of the embodiments. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

The embodiments of the present invention provide a data processing method and related devices, applied to a Hadoop cluster under the MapReduce framework, to implement automatic data-format conversion and automatic data splicing in a Hadoop slave node device, thereby simplifying the programmer's programming work and facilitating subsequent optimization of the MapReduce framework.

As shown in FIG. 1, an aspect of the present invention provides a data processing method, including:

S110. A data preprocessor reads metadata from a first buffer of a CPU, where, when a data set obtained from a data shard is stored into the first buffer, the metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer.

The embodiments of the present invention are applied to a Hadoop cluster under the MapReduce framework. The Hadoop cluster includes a master node device and slave node devices; a slave node device includes a CPU and a GPU, obtains data shards from the master node device, and is provided with a data preprocessor and a data splicer in the CPU.

A first buffer is allocated in the CPU to store the data set obtained from a data shard. When the data set is stored into the first buffer, metadata is added for the data set at the head of the first buffer; the metadata mainly includes the storage addresses of the data of the data set in the first buffer.

S120. The data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata.

Because the metadata includes the storage addresses of the data set in the first buffer, the data preprocessor can read the data of the data set directly from the first buffer as indicated by the metadata, without relying on additional programmer-written code to read the data.

S130. The data preprocessor converts, according to a preset parsing function, the data of the data set into the data format indicated by the preset parsing function, generates a data block from the converted data set, and stores the data block in a second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to a GPU.

In addition, a parsing function is preset in the MapReduce framework. The data preprocessor can parse the data of the data set in the first buffer according to the preset parsing function, convert it into the data format corresponding to that function, and then generate a data block from the converted data set. Meanwhile, the CPU also allocates a second buffer to store data blocks; the data splicer can then read data blocks from the second buffer and splice them to the GPU.

In the embodiments of the present invention, because metadata is added for the data set when the data set is stored into the first buffer of the CPU, and the metadata includes the storage addresses of the data of the data set in the first buffer, the data preprocessor first reads the metadata from the first buffer, then reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata, converts the data format of the data by using the preset parsing function, generates a data block from the fully converted data set, and stores it in the second buffer of the CPU. Reading the data from the first buffer and parsing the data are thus completed automatically by the data preprocessor, without additional reliance on the programmer's code, which provides programmers with a more complete MapReduce framework and also facilitates subsequent optimization of that framework.

It can be understood that, in the MapReduce framework, a Map function is specified to map input key-value pairs into new key-value pairs, and a concurrent Reduce function is specified to ensure that all mapped key-value pairs sharing the same key are grouped together. After the Map function maps the input key-value pairs into new key-value pairs, the master node device of the Hadoop cluster divides all the new key-value pairs into different data shards according to data size and assigns the shards to the slave node devices for corresponding processing.
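As a non-limiting sketch of this flow in Java (all class and method names below are illustrative, not part of the disclosed system; the "mapping" is an arbitrary word-to-length function chosen only for demonstration), a Map step and the master's size-based sharding could look like:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hadoop's actual API: a Map step turns input
// key-value pairs into new pairs, and the master groups the new pairs
// into fixed-size shards for the slave node devices.
public class ShardSketch {
    // Map step: for illustration, map each word to its length.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines)
            for (String word : line.split("\\s+"))
                out.add(new AbstractMap.SimpleEntry<>(word, word.length()));
        return out;
    }

    // Master side: split the mapped pairs into shards of at most shardSize pairs.
    static <K, V> List<List<Map.Entry<K, V>>> shard(List<Map.Entry<K, V>> pairs,
                                                    int shardSize) {
        List<List<Map.Entry<K, V>>> shards = new ArrayList<>();
        for (int i = 0; i < pairs.size(); i += shardSize)
            shards.add(pairs.subList(i, Math.min(i + shardSize, pairs.size())));
        return shards;
    }
}
```

Each shard produced this way would then be handed to one slave node device for processing.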

In the CPU of the slave node device, the RecordReader class is invoked to obtain the key-value pairs of a data shard, and the values of the key-value pairs are extracted and spliced into a data set. The CPU allocates a DirectBuffer in its memory for the data set, and the data set is stored into the DirectBuffer in the format required by the DirectBuffer; when the data set is stored into the DirectBuffer, metadata is added for the data set at the head of the DirectBuffer. Meanwhile, a preset parsing function for parsing the data of the data set is provided in the MapReduce framework; the preset parsing function converts the data into a specified data format on which logical operations can be performed. The data preprocessor provided in the CPU reads the data from the DirectBuffer according to the metadata and automatically converts the data format by using the preset parsing function. Specifically, the embodiment provided in FIG. 1 is described in detail below. Referring to FIG. 2, a data processing method may include:

S210. The data preprocessor reads metadata from the DirectBuffer, where the metadata specifically includes an address index array, the address index array contains data elements in one-to-one correspondence with the data of the data set, and a data element indicates the storage address of a datum of the data set in the DirectBuffer.

Specifically, when the data set is stored into the DirectBuffer, metadata is added at the head of the DirectBuffer to indicate the storage addresses of the data of the data set in the DirectBuffer. It can be understood that the metadata may include an address index array: when the data set is stored into the DirectBuffer, the storage address of each datum is added to the address index array according to the datum's position in the DirectBuffer. The address index array has data elements in one-to-one correspondence with the data of the data set, and each data element indicates the storage address of one datum in the DirectBuffer. Generally, the data of a data set stored into the DirectBuffer share the same data format, which may be a text format, a binary format, or another format on which logical operations cannot be performed directly.

S220. The data preprocessor reads the data of the data set from the DirectBuffer according to the data elements of the address index array in the metadata.

Specifically, the data preprocessor reads, starting from the storage address in the DirectBuffer indicated by a data element of the address index array, the corresponding data until the storage address indicated by the next data element, or the end of the DirectBuffer, is reached, thereby reading one datum of the data set; it then continues with the next datum until all the data of the data set in the DirectBuffer have been read.
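The head-metadata layout and the read-until-next-offset rule of S210/S220 can be sketched with `java.nio.ByteBuffer` (the concrete layout below, an int count followed by one int offset per datum, is an assumption for illustration; the patent does not fix an exact byte layout):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: the buffer head holds an address index array (one int offset per
// datum) and the data bytes follow.  A reader recovers each datum by reading
// from its offset up to the next offset, or to the end of the buffer.
public class IndexedBuffer {
    static ByteBuffer pack(String[] data) {
        int header = 4 + 4 * data.length;       // count + one offset per datum
        int total = header;
        for (String s : data) total += s.getBytes(StandardCharsets.UTF_8).length;
        ByteBuffer buf = ByteBuffer.allocateDirect(total);
        buf.putInt(data.length);
        int offset = header;
        for (String s : data) {                 // write the address index array
            buf.putInt(offset);
            offset += s.getBytes(StandardCharsets.UTF_8).length;
        }
        for (String s : data) buf.put(s.getBytes(StandardCharsets.UTF_8));
        buf.flip();
        return buf;
    }

    static String[] unpack(ByteBuffer buf) {
        int n = buf.getInt(0);
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            int start = buf.getInt(4 + 4 * i);
            // Read until the next data element's address, or the buffer end.
            int end = (i + 1 < n) ? buf.getInt(4 + 4 * (i + 1)) : buf.limit();
            byte[] bytes = new byte[end - start];
            for (int j = 0; j < bytes.length; j++) bytes[j] = buf.get(start + j);
            out[i] = new String(bytes, StandardCharsets.UTF_8);
        }
        return out;
    }
}
```

Because every datum's address is in the head, no delimiter scanning of the payload is needed, which is what lets the preprocessor read the set automatically.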

S230. The data preprocessor converts, according to the preset parsing function, the data of the data set into the data format that is specified by the preset parsing function and on which logical operations can be performed.

The data of the data set stored into the DirectBuffer is generally in a data format on which logical operations cannot be performed, and it needs to be converted into an operable format before being transferred to the GPU for logical operations. Therefore, a parsing function is preset in the MapReduce framework, and the data preprocessor automatically performs the data-format conversion according to the preset parsing function, converting the data into the operable data format specified by that function.

Optionally, the data format specified by the preset parsing function may be the data format required by the GPU for logical operations. Specifically, the operable data format specified by the preset parsing function may be integer data, floating-point data, string data, and so on.
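One minimal sketch of such a preset parsing function, assuming text-format input and an integer target format (class and method names are illustrative, not the disclosed API):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: parse text data, on which no arithmetic is possible, into 32-bit
// integers, and lay the converted data set out as one contiguous data block.
public class ParseSketch {
    // The "preset parsing function" for one datum: text bytes -> int.
    static int parse(byte[] textDatum) {
        return Integer.parseInt(new String(textDatum, StandardCharsets.UTF_8).trim());
    }

    // Convert a whole data set and generate one data block from it (S240).
    static ByteBuffer toDataBlock(byte[][] dataSet) {
        ByteBuffer block = ByteBuffer.allocate(4 * dataSet.length);
        for (byte[] d : dataSet) block.putInt(parse(d));
        block.flip();
        return block;
    }
}
```

A floating-point or string variant would differ only in the element conversion and the per-element size.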

S240. The data preprocessor generates a data block from the data set after the data-format conversion.

After the data preprocessor automatically converts each datum, according to the preset parsing function, into the operable data format specified by that function, a data block is generated from the converted data set in order to facilitate the subsequent splicing of data between the CPU and the GPU.

S250. The data preprocessor stores the data block into a LaunchingBuffer, so that the data splicer reads the data block from the LaunchingBuffer and splices it to the GPU.

Specifically, the CPU also allocates a LaunchingBuffer in memory to temporarily store the data blocks after the data-format conversion. The data preprocessor stores the data block into the LaunchingBuffer, and the data splicer then reads the data block from the LaunchingBuffer and splices it to the GPU.

It can be understood that the data stored in the DirectBuffer of the CPU and the data to be processed by the GPU may be inconsistent in storage format, i.e., inconsistent in endianness. In the little-endian storage format, the high-order bytes of a datum are stored at the high addresses and the low-order bytes at the low addresses; in the big-endian storage format, the high-order bytes are stored at the low addresses and the low-order bytes at the high addresses. Therefore, the data preprocessor also needs to resolve the endianness of the data block.

The DirectBuffer allocated by the CPU carries a member variable that indicates whether the data in the DirectBuffer is stored in big-endian or little-endian format; there is likewise an indication of whether the storage format needs to be converted when storing into the LaunchingBuffer, together with whether to convert to big-endian or little-endian. For example, if the data of the data set is stored in the DirectBuffer in big-endian format while the GPU stores data in little-endian format, then when the data block is stored into the LaunchingBuffer, the data of the data block is stored in the LaunchingBuffer in the little-endian format. Afterwards, the data splicer can read the data block directly from the LaunchingBuffer and splice it to the GPU, which ensures that the LaunchingBuffer of the CPU and the GPU agree on the storage format, so that the GPU can correctly read the data block for processing, avoiding operation errors caused by reading high-order bytes as low-order bytes or vice versa.
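The byte-order conversion described here maps directly onto `java.nio.ByteBuffer`'s `order(...)` facility (the helper below is a sketch; a real implementation would handle element types other than int as well):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: copy ints from a big-endian source buffer into a little-endian
// staging buffer, as described for the DirectBuffer -> LaunchingBuffer step
// when the GPU expects the opposite byte order.
public class EndianSwap {
    static ByteBuffer toLittleEndian(ByteBuffer bigEndianSrc) {
        ByteBuffer dst = ByteBuffer.allocate(bigEndianSrc.remaining())
                                   .order(ByteOrder.LITTLE_ENDIAN);
        // getInt reads with the source's (big-endian) order; putInt writes
        // with the destination's (little-endian) order, so bytes are swapped.
        while (bigEndianSrc.remaining() >= 4) dst.putInt(bigEndianSrc.getInt());
        dst.flip();
        return dst;
    }
}
```

Because the swap is done once when the block enters the staging buffer, the GPU side never needs to byte-swap.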

In this embodiment of the present invention, the data preprocessor first reads the address index array from the DirectBuffer, reads the corresponding data of the data set from the DirectBuffer according to the data elements of the address index array, and then performs data-format conversion on the data of the data set according to the preset parsing function, so that the converted data can be used in logical operations. The data set is generated into a data block and stored in the LaunchingBuffer, and the data splicer reads the data block from the LaunchingBuffer and transfers it to the GPU. These operations are completed by the data preprocessor in the CPU on its own: automatic parsing of the data by the preset parsing function facilitates the GPU's computation on the data blocks, and the data preprocessor simplifies the programming work of the slave node device, which is beneficial to later optimization.

The CPU automatically allocates and reclaims the WorkingBuffer and the LaunchingBuffer, where the lifetime of a WorkingBuffer is the processing time of one data shard and the lifetime of a LaunchingBuffer is the time for processing one data set. In addition, a ResultBuffer is also allocated on the CPU to store the operation results returned after GPU computation; the operation results then serve as the input of the Reduce task in MapReduce.

As shown in FIG. 3, another aspect of the embodiments of the present invention provides a data processing method, including:

S310. A data splicer reads, from a second buffer of a CPU, a data block generated by a data preprocessor.

The embodiments of the present invention are applied to a Hadoop cluster under the MapReduce framework. The Hadoop cluster includes a master node device and slave node devices; a slave node device includes a CPU and a GPU, obtains data shards from the master node device, and is provided with a data preprocessor and a data splicer in the CPU.

其中,数据预处理器用于完成从CPU第一缓冲区读取数据集合的数据,将数据转换数据格式后,将这个数据集合生成数据块存储到第二缓冲区。 Among them, the data read from the pre-processor for the CPU to complete the first set of data buffer data, the data after data format conversion, this data collection will generate the data block stored in the second buffer. 而数据拼接器主要完成将数据块从CPU拼接到GPU。 The data stitcher completed the data block from the main CPU spliced to GPU.

S320. The data splicer splices the data block into the working buffer allocated in the GPU for storing data blocks.

In the embodiments of the present invention, the data splicer reads the data block from the second buffer of the CPU and splices it from the second buffer into the working buffer of the GPU. Splicing of the data is completed by the data splicer rather than by programmer-written code, which simplifies the programmer's programming work and also facilitates subsequent optimization of the whole MapReduce framework.

The embodiment provided in FIG. 3 is described in detail below. As shown in FIG. 4, a data processing method may include:

S410. The data splicer reads a data block from the LaunchingBuffer.

The CPU also allocates a LaunchingBuffer in memory, mainly used to store the data blocks that need to be spliced to the GPU.

S420. The data splicer starts splicing the data block at the start address indicated by a cursor parameter, where the cursor parameter indicates the start address, in the WorkingBuffer allocated in the GPU for storing data blocks, that is available for storing a data block.

A WorkingBuffer is allocated in GPU memory, mainly used to store the data spliced over from the LaunchingBuffer of the CPU. The memory size of the WorkingBuffer is determined by the GPU itself, while the memory size of the DirectBuffer in the CPU is determined by the Java runtime environment. Generally, the WorkingBuffer on the GPU is much larger than the Java-backed DirectBuffer in the CPU, so the WorkingBuffer may store at least one data block derived from the DirectBuffer; however, when a certain data block is to be stored, the remaining memory of the WorkingBuffer may be unable to continue storing data blocks, and the data splicer must handle that data block correctly.

Specifically, the data splicer manages a cursor parameter that indicates the start address at which the WorkingBuffer can store data. Each time a data block is spliced into the WorkingBuffer, the cursor parameter is updated accordingly, so that the start address at which the WorkingBuffer can next store data is known exactly. When a data block needs to be transferred to the WorkingBuffer, it is spliced into the WorkingBuffer starting from the start address indicated by the cursor parameter.
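The cursor bookkeeping of S420/S430 can be sketched as follows (names are illustrative; a host-side byte array stands in for GPU memory, and `onGpuDone` stands in for the GPU finishing its computation and the buffer becoming reusable):

```java
import java.nio.ByteBuffer;

// Sketch of the splicer's cursor logic: the cursor is the next free offset
// in a fixed-size working buffer.  A block that does not fit is NOT copied;
// the caller is told to trigger GPU processing instead, and the block stays
// in the LaunchingBuffer for the next attempt.
public class SplicerSketch {
    final ByteBuffer workingBuffer;
    int cursor = 0;                          // next free offset in workingBuffer

    SplicerSketch(int workingBufferSize) {
        workingBuffer = ByteBuffer.allocate(workingBufferSize);
    }

    // Returns true if the block was spliced; false means "trigger the GPU".
    boolean splice(byte[] block) {
        if (workingBuffer.capacity() - cursor < block.length) return false;
        for (byte b : block) workingBuffer.put(cursor++, b);
        // A real splicer would also notify the GPU of block.length here
        // (step B1); the cursor advance above is step B2.
        return true;
    }

    void onGpuDone() { cursor = 0; }         // buffer reusable after processing
}
```

The key property is that a failed splice has no side effect on the working buffer, so the pending block can simply be retried later.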

S430. When the data splicer fails to splice the data block into the WorkingBuffer, splicing of the data block is suspended, and the GPU is triggered to process the data blocks stored in the WorkingBuffer.

The data of the data blocks that the data splicer reads from the LaunchingBuffer can be used directly in logical operations and satisfies the GPU's storage-format requirements. An application programming interface (API) is invoked to splice the data of the data block into the WorkingBuffer. If the remaining memory of the WorkingBuffer can hold the whole data block read from the LaunchingBuffer of the CPU, the entire data block is spliced into the WorkingBuffer; if the remaining memory of the WorkingBuffer cannot hold it, splicing of the data block is suspended, the data block remains stored in the LaunchingBuffer, and the GPU is triggered to start performing computation on all the data blocks in the WorkingBuffer.

In the embodiments of the present invention, the data splicer in the CPU resolves the data-block splicing problem that arises when the size of the DirectBuffer in the CPU and the remaining memory of the WorkingBuffer in the GPU are inconsistent. The data splicer splices data blocks from the LaunchingBuffer directly into the WorkingBuffer; if the remaining memory of the WorkingBuffer cannot hold a data block, that splicing operation is suspended for the time being, and when the remaining memory of the WorkingBuffer can next accommodate it, the data block is read from the LaunchingBuffer again and spliced into the WorkingBuffer. Because the data blocks in the LaunchingBuffer already meet the GPU's data-processing requirements, the GPU can perform computation directly upon receiving them, which effectively improves the GPU's working efficiency.

It can be understood that, after successfully transferring a data block, the data splicer further performs the following steps:

B1. The data splicer notifies the GPU of the size of the data block.

B2. The data splicer updates the cursor parameter.

Each time the data splicer successfully splices a data block to the GPU, it notifies the GPU of the data block's size; the GPU can use this size directly without recomputing it, which reduces the GPU's workload.

In addition, in the same way that the address index array indicates the storage addresses of data in the DirectBuffer of the CPU, the GPU may also add a lookup index array for the data blocks at the head of the WorkingBuffer. The lookup index array contains data elements in one-to-one correspondence with the data of the data blocks, and each data element indicates the storage address of a datum in the WorkingBuffer. After the data splicer splices over a data block, a data element corresponding to each datum of that data block is added to the lookup index array, so that the GPU can subsequently find and read the data quickly from the WorkingBuffer for computation.
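The incremental maintenance of such a lookup index can be sketched as follows (a simplification under stated assumptions: fixed 4-byte integer data, host-side structures standing in for GPU memory; the structure, not the element type, is the point):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: alongside the working buffer, keep a lookup index array whose
// elements record where each spliced datum starts, so the GPU-side reader
// can locate any datum without scanning the buffer.
public class LookupIndex {
    final List<Integer> index = new ArrayList<>(); // one element per datum
    int end = 0;                                   // current end of used region

    // Record the data elements of a newly spliced block of n 4-byte ints.
    void onBlockSpliced(int n) {
        for (int i = 0; i < n; i++) { index.add(end); end += 4; }
    }

    int addressOf(int datum) { return index.get(datum); }
}
```

Lookup is O(1) per datum, mirroring the role the address index array plays on the CPU side.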

Steps B1 and B2 may be performed in either order, which is not limited herein.

Because each data shard received in the CPU may eventually generate multiple data blocks, the WorkingBuffer allocated in the GPU stores data in units of data blocks, and its lifetime is the time taken to process one data shard. After the data splicer transfers the whole data shard successfully, it returns a flag value indicating successful transfer, so as to notify the master node device to assign the next data shard; after the data splicer fails to transfer the data shard, it returns a flag value indicating failed transfer, so as to notify the master node device to suspend assigning the next data shard.
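A minimal sketch of the per-shard flag protocol (flag values, names, and the fit test are illustrative assumptions; a real implementation would drive the splicer and GPU rather than simulate them):

```java
// Sketch: the splicer returns one flag per data shard so the master node
// device knows whether to assign the next shard or to pause.
public class ShardFlags {
    static final int TRANSFER_OK = 0;
    static final int TRANSFER_FAILED = -1;

    // Simulate transferring a shard's blocks into a working buffer of the
    // given size, with a "GPU run" (buffer reuse) whenever a block won't fit.
    static int transferShard(byte[][] blocks, int workingBufferSize) {
        int used = 0;
        for (byte[] block : blocks) {
            if (block.length > workingBufferSize) return TRANSFER_FAILED; // can never fit
            if (used + block.length > workingBufferSize) used = 0;        // trigger GPU, reuse
            used += block.length;
        }
        return TRANSFER_OK;
    }
}
```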

In addition, a ResultBuffer is likewise allocated in GPU memory to save the results after computation; an API is then invoked to return the operation results to the CPU, where they are stored in the ResultBuffer allocated by the CPU and serve as the input of the Reduce task under MapReduce.

The DirectBuffer in the CPU for storing data sets, the LaunchingBuffer for storing the format-converted data blocks, and the ResultBuffer for storing the operation results returned by the GPU are all automatically allocated and reclaimed by the CPU, where the lifetime of the LaunchingBuffer is the processing time of one data block. The WorkingBuffer in the GPU for storing received data blocks and the ResultBuffer for storing operation results are automatically allocated and reclaimed by the GPU, where the lifetime of the WorkingBuffer is the processing time of one data shard, and the lifetime of the GPU ResultBuffer is the same as that of the WorkingBuffer. The buffers in the CPU and the GPU are synchronized automatically; for example, the ResultBuffer in the CPU is allocated and reclaimed in synchronization with the WorkingBuffer and the ResultBuffer in the GPU.

As shown in FIG. 5-a, an embodiment of the present invention further provides a data preprocessor 500, which may include:

a first reading unit 510, configured to read metadata from the first buffer of the CPU, where, when a data set obtained from a data shard is stored into the first buffer, the metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer;

a second reading unit 520, configured to read the data of the data set from the first buffer according to the storage addresses indicated by the metadata;

a conversion unit 530, configured to parse the data according to the preset parsing function and generate a data block from the parsed data set; and

a storage unit 540, configured to store the data block in the second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.

The embodiments of the present invention are applied to a Hadoop cluster under the MapReduce framework. The data preprocessor 500 is provided in the CPU of a slave node device in the Hadoop cluster, where the CPU is also provided with a data splicer and each slave node device further includes a GPU. The slave node device obtains a data shard from the master node device of the Hadoop cluster, and the values of the key-value pairs in the data shard are then spliced into a data set and stored into the first buffer allocated in CPU memory. Because the memory of the first buffer may be unable to store the values of all the key-value pairs of the data shard at once, those values may be spliced into data sets in multiple batches.

When the data set is stored into the first buffer, metadata is added for the data set at the head of the first buffer; the metadata mainly includes the storage addresses of the data of the data set in the first buffer. Afterwards, the first reading unit 510 reads the metadata from the first buffer; the second reading unit 520 then reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; the parsing unit 530 performs data format conversion on the data and generates a data block from the entire format-converted data set; and the storage unit 540 stores the data block into the second buffer of the CPU. The second buffer is allocated by the CPU in memory mainly to store data blocks, so that the data splicer can read data blocks from the second buffer and transfer them to the working buffer of the GPU. In the embodiments of the present invention, data reading and data format conversion are completed automatically by the data preprocessor, so that programmers no longer need to write corresponding programs; this reduces programming effort, facilitates subsequent optimization of the MapReduce architecture, and improves the working efficiency of the CPU.

Further, the metadata specifically includes an address index array, where the address index array contains data elements in one-to-one correspondence with the data of the data set, and each data element indicates the storage address of a piece of data of the data set in the first buffer. Accordingly, as shown in FIG. 5-b, the above second reading unit 520 may include:

a data reading unit 5210, configured to start reading at the storage address in the first buffer indicated by a data element of the address index array, and to stop reading at the storage address indicated by the next data element or at the end of the first buffer.

Specifically, according to the storage address indicated by a data element of the address index array, the data reading unit 5210 reads the corresponding data in the first buffer starting from that storage address until the storage address indicated by the next data element, or until the end of the first buffer is reached, thereby obtaining one piece of data of the data set; it then continues with the next piece of data until all the data of the data set in the first buffer has been read.
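Purely as an illustrative sketch (the class and method names below are hypothetical and not part of the patent), the read rule of the data reading unit 5210 described above — read from one index entry's address up to the next entry's address or the buffer end — can be expressed in Java as:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of unit 5210's read rule: offsets[i] is the start address
// of data item i in the first buffer; item i ends at offsets[i + 1], or at the
// buffer end for the last item.
class IndexedBufferReader {
    static List<String> readAll(ByteBuffer firstBuffer, int[] offsets, int bufferEnd) {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < offsets.length; i++) {
            int start = offsets[i];
            int end = (i + 1 < offsets.length) ? offsets[i + 1] : bufferEnd;
            byte[] bytes = new byte[end - start];
            firstBuffer.position(start);
            firstBuffer.get(bytes);               // read [start, end) as one data item
            items.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return items;
    }
}
```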

As shown in FIG. 5-c, the above parsing unit 530 includes:

a data format conversion unit 5310, configured to convert, through the preset parse function, the data of the data set into a data format that is specified by the parse function and supports logical operations;

a generation unit 5320, configured to generate a data block from the converted data set.

In the MapReduce architecture, the data format specified by the preset parse function may be the data format required by the GPU for logical operations. Specifically, the formats specified by the preset parse function that support logical operations may be integer data, floating-point data, string data, and so on.
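A minimal sketch of such a parse function, under assumed names (the patent does not define a concrete signature): the raw values spliced from the key-value pairs are text, and the parse function turns them into integer data, one of the operable formats named above:

```java
// Hypothetical preset parse function: converts raw text values from the data set
// into integer data suitable for logical operations. The class and method names
// are illustrative only.
class ValueParser {
    static int[] parseToInts(String[] rawValues) {
        int[] converted = new int[rawValues.length];
        for (int i = 0; i < rawValues.length; i++) {
            converted[i] = Integer.parseInt(rawValues[i].trim());
        }
        return converted;   // the converted data set, ready to be packed into a data block
    }
}
```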

As shown in FIG. 5-d, the above parsing unit 530 may further include:

a format conversion unit 5330, configured to convert the data in the data block into the storage format of the GPU when the storage format of the data of the data set in the first buffer is inconsistent with the storage format of data in the GPU.

The first buffer of the CPU and the GPU may have inconsistent requirements on the storage format of data, that is, they may handle endianness differently. In the little-endian storage format, the high-order bytes of a value are stored at high addresses and the low-order bytes at low addresses; in the big-endian storage format, the high-order bytes are stored at low addresses and the low-order bytes at high addresses.

The first buffer allocated by the CPU carries a member variable indicating whether the data in the first buffer is stored in big-endian or little-endian format; there is likewise an indication of whether the storage format needs to be converted when the data is stored into the second buffer, together with a hint of whether it should be converted to big-endian or to little-endian format. For example, if the data of the data set is stored in the first buffer in big-endian format while the GPU stores data in little-endian format, the format conversion unit 5330 converts the data block into little-endian format and stores it in the second buffer. The data splicer can then read the data block directly from the second buffer and splice it to the GPU. This keeps the storage format of data in the second buffer of the CPU consistent with that of the GPU and ensures that the GPU can read the data block correctly for arithmetic processing, avoiding operation errors caused by reading high-order bytes as low-order bytes or vice versa.
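The big-endian to little-endian conversion in the example above can be sketched as follows (illustrative only; the patent does not specify an implementation), for a data block of 32-bit values:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch: rewrite a data block of 32-bit values from big-endian
// (the first buffer's format in the example above) to little-endian (the
// format required by the GPU in the example above).
class EndianConverter {
    static byte[] bigToLittle32(byte[] block) {
        ByteBuffer src = ByteBuffer.wrap(block).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer dst = ByteBuffer.allocate(block.length).order(ByteOrder.LITTLE_ENDIAN);
        while (src.remaining() >= 4) {
            dst.putInt(src.getInt());   // read big-endian, write little-endian
        }
        return dst.array();
    }
}
```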

如图6-a所示,本发明实施例还提供一种数据拼接器600,可包括: 6-a, the embodiment of the present invention further provides a FIG data splicer 600, may include:

第三读取单元610,用于从所述CPU的第二缓冲区读取所述数据预处理器生成的数据块; The third reading unit 610 for reading data pre-processor generates said data blocks from said second buffer of the CPU;

拼接处理单元620,用于将所述数据块拼接到所述GPU中被分配存储数据块的工作缓冲区。 Stitching processing unit 620, the data blocks for splicing is allocated to the GPU work memory block buffer.

The embodiments of the present invention are applied to a Hadoop cluster under the MapReduce architecture. The data splicer 600 is disposed in the CPU of a slave node device of the Hadoop cluster, where the CPU is further provided with the data preprocessor 500 shown in FIG. 5-a, and each slave node device further includes a GPU. The slave node device obtains a data slice from the master node device of the Hadoop cluster, and then splices the values of the key-value pairs in the data slice into a data set, which is stored into a first buffer allocated in the CPU memory. Because the memory of the first buffer may be unable to hold the values of all the key-value pairs in the data slice at one time, the values of the key-value pairs in the data slice may be spliced into data sets in multiple passes.

The data preprocessor 500 reads data from the first buffer according to the metadata, converts its data format, generates a data block from the entire format-converted data set, and stores the data block into the second buffer of the CPU. The third reading unit 610 of the data splicer then reads the data block from the second buffer of the CPU, and the splicing processing unit 620 splices the read data block into the working buffer allocated in the GPU for storing data blocks.

In the CPU of the slave node device, the data format conversion is completed by the data preprocessor 500 and the splicing of data blocks by the data splicer, without relying on programmers to write corresponding programs. This simplifies the programmers' work; moreover, the automatic operation of the data preprocessor 500 and the data splicer improves the working efficiency of the CPU and facilitates subsequent optimization of MapReduce.

The data splicer 600 maintains a cursor parameter, which indicates the start address at which the working buffer of the GPU can store data. After each data block is spliced into the working buffer of the GPU, the cursor parameter is updated accordingly, so that the start address at which the working buffer can next store data is known exactly. When a data block needs to be transferred to the working buffer of the GPU, the splicing processing unit 620 splices the data block into the working buffer starting from the start address indicated by the cursor parameter.

Therefore, the above splicing processing unit 620 is specifically configured to start splicing the data block from the start address indicated by the cursor parameter, where the cursor parameter indicates the start address available for storing data blocks in the working buffer allocated in the GPU for storing data blocks.

As shown in FIG. 6-b, the above data splicer further includes:

a trigger processing unit 630, configured to, when the data splicer fails to splice the data block into the working buffer allocated in the GPU for storing data blocks, suspend the splicing of the data block and trigger the GPU to process the data blocks stored in the working buffer.

The data in the data block that the third reading unit 610 of the data splicer 600 reads from the second buffer can be used directly for logical operations and satisfies the GPU's requirements on the storage format of data. An API is called to splice the data of the data block into the working buffer of the GPU. If the remaining memory of the working buffer of the GPU can accommodate the entire data block read from the second buffer of the CPU, the whole data block is spliced into the working buffer of the GPU. If the remaining memory of the working buffer of the GPU cannot accommodate the data block read from the second buffer of the CPU, that is, if splicing the data block fails, the splicing of the data block is suspended and the data block remains in the second buffer; in addition, the trigger processing unit 630 triggers the GPU to start arithmetic processing on all the data blocks in the working buffer.
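The cursor-based splicing and the suspend-and-trigger behavior described above can be sketched together as follows. This is a hypothetical model, not the patent's API: the working buffer and the GPU trigger are simulated in plain Java.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the splicing processing unit 620 and trigger processing
// unit 630: blocks are appended at the cursor; when a block does not fit,
// splicing is suspended and the (simulated) GPU is triggered to process the
// buffered data.
class BlockSplicer {
    private final ByteBuffer workingBuffer;   // stands in for the GPU working buffer
    private int cursor = 0;                   // start address available for the next block
    final List<Integer> processedSizes = new ArrayList<>();

    BlockSplicer(int capacity) { workingBuffer = ByteBuffer.allocate(capacity); }

    // Returns true on success; false means the splice failed, the buffered data
    // was handed to the GPU, and the caller should retry the same block later.
    boolean splice(byte[] block) {
        if (cursor + block.length > workingBuffer.capacity()) {
            triggerGpu();
            return false;                     // splicing suspended; block stays in the second buffer
        }
        workingBuffer.position(cursor);
        workingBuffer.put(block);
        cursor += block.length;               // update the cursor parameter
        return true;
    }

    void triggerGpu() {
        processedSizes.add(cursor);           // record how much data the "GPU" processed
        cursor = 0;                           // working buffer is free again
    }
}
```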

Further, as shown in FIG. 6-c, the above data splicer 600 may further include:

a notification unit 640, configured to notify the GPU of the size of the data block;

an updating unit 650, configured to update the cursor parameter.

Each time a data block is successfully spliced to the GPU, the notification unit 640 notifies the GPU of the size of the data block, so that the GPU can use it directly without computing the size itself, reducing the GPU's workload. In addition, the updating unit 650 updates the cursor parameter.

As shown in FIG. 7, an embodiment of the present invention provides a processor 700, which includes the data preprocessor 500 shown in FIG. 5-a and the data splicer 600 shown in FIG. 6-a. For details, reference may be made to the above descriptions of the data preprocessor 500 and the data splicer 600, which are not repeated here.

The first buffer and the second buffer are allocated and reclaimed automatically in the CPU; the lifetime of the first buffer is the processing time of one data slice, and the lifetime of the second buffer is the processing time of one data block. Similarly, the working buffer is also allocated automatically in the GPU, and its lifetime is the processing time of one data slice.

As shown in FIG. 8-a, an embodiment of the present invention further provides a slave node device, which may include:

the processor CPU-700 shown in FIG. 7 above, and a graphics processor GPU-800;

where the CPU-700 is as described above and is not repeated here.

Specifically, the data preprocessor in the CPU-700 is configured to convert the data format of the data set obtained from the data slice and to generate a data block from the format-converted data set, and the data splicer in the CPU-700 splices the data block into the working buffer allocated in the GPU-800 for storing data blocks;

the GPU-800 is configured to process the data block to obtain a processing result and then return the processing result to the CPU-700.

In practical applications, the CPU-700 also automatically allocates and reclaims a ResultBuffer; likewise, a ResultBuffer is automatically allocated and reclaimed in the GPU-800. The ResultBuffer in the CPU-700 and the ResultBuffer in the GPU-800 have the same lifetime, and both are used to store the results of operations. If, in a practical application, the first buffer allocated by the CPU-700 is a DirectBuffer, the second buffer is a LaunchingBuffer, and the working buffer allocated by the GPU-800 is a WorkingBuffer, then FIG. 8-b is a schematic diagram of the interaction between the CPU-700 and the GPU-800 in the slave node device provided by an embodiment of the present invention. As shown in FIG. 8-b, the data preprocessor 500 and the data splicer 600 are disposed in the CPU-700. In addition, a DirectBuffer, a LaunchingBuffer, and a ResultBuffer are allocated in the CPU-700. The DirectBuffer stores the data set whose data format needs to be converted; the data set includes data spliced together from the values of the key-value pairs, and metadata is added in the DirectBuffer, mainly including the storage addresses of the data of the data set in the DirectBuffer. According to the metadata, the preprocessor 500 can read the data of the data set from the DirectBuffer and then automatically perform data format conversion on the data through the specified preset parse function; a data block is generated from the converted data set, and finally the data preprocessor 500 stores the data block into the LaunchingBuffer.
If the storage format of the data in the data block needs to be converted when it is stored into the LaunchingBuffer, the storage format conversion is performed, ensuring that the storage format of the data in the LaunchingBuffer is the same as that of the WorkingBuffer in the GPU-800. The data splicer 600 reads the data block from the LaunchingBuffer and splices it into the WorkingBuffer in the GPU-800. If the splicing fails, the WorkingBuffer can store no more data blocks; the GPU is then first triggered to perform arithmetic processing on the data blocks stored in the WorkingBuffer, stores the operation results into its own ResultBuffer, and transfers the operation results to the ResultBuffer in the CPU through an API call.
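Under the simplifying assumption that the buffers behave like plain byte buffers (the real DirectBuffer/LaunchingBuffer/WorkingBuffer management is as described above), the end-to-end flow of FIG. 8-b can be condensed into a short sketch:

```java
import java.nio.ByteBuffer;

// Condensed, illustrative sketch of the flow in FIG. 8-b: data enters the
// DirectBuffer, is packed into a block in the LaunchingBuffer, spliced into
// the WorkingBuffer, and the "GPU" computes a result over the whole block.
class BufferPipeline {
    static long run(int[] values) {
        ByteBuffer direct = ByteBuffer.allocateDirect(values.length * 4);  // first buffer
        for (int v : values) direct.putInt(v);
        direct.flip();
        ByteBuffer launching = ByteBuffer.allocate(direct.remaining());    // second buffer
        launching.put(direct);
        launching.flip();
        ByteBuffer working = ByteBuffer.allocate(launching.remaining());   // GPU working buffer
        working.put(launching);
        working.flip();
        long result = 0;                       // the "GPU" sums the block's values
        while (working.remaining() >= 4) result += working.getInt();
        return result;                         // would be stored in the ResultBuffer
    }
}
```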

Referring to FIG. 9, an embodiment of the present invention further provides a data processing device, which may include a memory 910 and at least one processor 920 (one processor is taken as an example in FIG. 9). In some embodiments of the present invention, the memory 910 and the processor 920 may be connected by a bus or in other manners; FIG. 9 takes a bus connection as an example.

The processor 920 may perform the following steps: the data preprocessor reads metadata from the first buffer of the CPU, where, when a data set obtained from a data slice is stored into the first buffer, metadata is added for the data set at the head of the first buffer, and the metadata includes the storage addresses of the data of the data set in the first buffer; the data preprocessor reads the data of the data set from the first buffer according to the storage addresses indicated by the metadata; the data preprocessor converts, according to a preset parse function, the data of the data set into the data format indicated by the preset parse function, generates a data block from the converted data set, and stores it in the second buffer of the CPU, so that the data splicer reads the data block from the second buffer and splices it to the GPU.

Or

The data splicer reads, from the second buffer of the CPU, the data block generated by the data preprocessor; the data splicer splices the data block into the working buffer allocated in the GPU for storing data blocks.

In some embodiments of the present invention, the processor 920 may further perform the following step: the data preprocessor starts reading at the storage address in the first buffer indicated by a data element of the address index array, and stops reading at the storage address indicated by the next data element or at the end of the first buffer.

In some embodiments of the present invention, the processor 920 may further perform the following step: the data preprocessor converts, through the preset parse function, the data of the data set into the data format that is specified by the parse function and supports logical operations.

In some embodiments of the present invention, the processor 920 may further perform the following step: the data preprocessor converts the data in the data block into the storage format of the GPU.

In some embodiments of the present invention, the processor 920 may further perform the following step: when the data splicer fails to splice the data block into the working buffer allocated in the GPU for storing data blocks, the splicing of the data block is suspended, and the GPU is triggered to process the data blocks stored in the working buffer.

In some embodiments of the present invention, the processor 920 may further perform the following step: the data splicer starts splicing the data block from the start address indicated by the cursor parameter, where the cursor parameter indicates the start address available for storing data blocks in the working buffer allocated in the GPU for storing data blocks.

In some embodiments of the present invention, the processor 920 may further perform the following steps: the data splicer notifies the GPU of the size of the data block; the data splicer updates the cursor parameter.

In some embodiments of the present invention, the memory 910 may be configured to store the data set, the metadata, and the data block;

In some embodiments of the present invention, the memory 910 may further be configured to store the address index array.

In some embodiments of the present invention, the memory 910 may further be configured to store the cursor parameter.

In some embodiments of the present invention, the memory 910 may further be configured to store the operation results.

Persons of ordinary skill in the art may understand that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

The data processing method and related devices provided by the present invention have been described in detail above. Based on the ideas of the embodiments of the present invention, persons of ordinary skill in the art will find changes in the specific implementations and the application scope; in summary, the content of this specification should not be construed as limiting the present invention.

Patent Citations

Cited Patent | Filing date | Publication date | Applicant | Title
CN102662639A * | 10 Apr 2012 | 12 Sep 2012 | Nanjing University of Aeronautics and Astronautics | MapReduce-based multi-GPU (Graphic Processing Unit) cooperative computing method
CN102708088A * | 8 May 2012 | 3 Oct 2012 | Beijing Institute of Technology | CPU/GPU (Central Processing Unit/Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation
US20050140682 * | 16 Feb 2005 | 30 Jun 2005 | Siemens Medical Solutions USA, Inc. | Graphics processing unit for simulation or medical diagnostic imaging
Classifications

International Classification: G06F9/38
Cooperative Classification: G06F9/541
Legal Events

Date | Code | Description
19 Aug 2015 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 14873198; country of ref document: EP; kind code of ref document: A1)
23 Jun 2016 | NENP | Non-entry into the national phase (ref country code: DE)
18 Jan 2017 | 122 | EP: PCT application non-entry in the European phase (ref document number: 14873198; country of ref document: EP; kind code of ref document: A1)