WO2012149776A1 - Method and apparatus for storing data - Google Patents

Method and apparatus for storing data

Publication number
WO2012149776A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
hotspot
storage device
hotspot data
data record
Prior art date
Application number
PCT/CN2011/080284
Other languages
French (fr)
Chinese (zh)
Inventor
张振龙
巩玉旺
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN2011800020461A priority Critical patent/CN102388374A/en
Priority to PCT/CN2011/080284 priority patent/WO2012149776A1/en
Publication of WO2012149776A1 publication Critical patent/WO2012149776A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • Embodiments of the present invention relate to the field of computer technology and, more particularly, to a method and apparatus for storing data.
  • In the prior art, hotspot data refers to data that is frequently used while the database of a server is running. It generally means data blocks; that is, hotspot data is mainly stored in the form of data blocks.
  • The algorithms for identifying such hotspot data are relatively mature. For example, whether a data block is hotspot data can be determined by counting the number of hits on that block.
  • This storage method identifies and stores hotspot data at the data block level (that is, the lower layer of the database), and cannot implement an effective storage strategy at the data record level (the upper layer of the database).
  • Embodiments of the present invention provide a method and apparatus for storing data, which can implement an effective storage strategy at the data record level.
  • In one aspect, a method for storing data is provided, including: establishing a hotspot data model based on original data records; filtering hotspot data out of the original data records or new data records according to the hotspot data model; and storing the filtered hotspot data in a first storage device.
  • In another aspect, an apparatus for storing data is provided, including: an establishing module, configured to establish a hotspot data model based on original data records; a screening module, configured to filter hotspot data out of the original data records or new data records according to the hotspot data model; and a storage module, configured to store the filtered hotspot data in a first storage device.
  • In embodiments of the present invention, hotspot data is filtered according to the hotspot data model at the data record level, and the filtered hotspot data is stored in a specific storage device, thereby implementing an effective storage strategy at the data record level.
  • FIG. 1 is a schematic flow chart of a method of storing data according to an embodiment of the present invention.
  • FIG. 2 is a schematic flow chart of a method of storing data according to another embodiment of the present invention.
  • FIG. 3 is a schematic flow chart of a process of storing data in accordance with an embodiment of the present invention.
  • FIG. 4 is a structural schematic diagram of an apparatus for storing data in accordance with one embodiment of the present invention.
  • FIG. 5 is a structural schematic diagram of an apparatus for storing data according to another embodiment of the present invention.
  • In practical database applications, queries over large volumes of data are common. For example, queries account for more than 70% of database usage, searching a large amount of data on disk is costly, and the data that users actually need to query is usually only about 20% of the data in a data table.
  • The hotspot data identified by block-level hotspot identification and pre-identification techniques is usually stored in the cache in the form of blocks. As a result, there is no way to optimize for specific scenarios and applications.
  • Hotspot data related to data records can be identified from the creation time of the data record (for example, a data block created within a preset time period can be treated as hotspot data and cached), but determining hotspot data from creation time alone is not flexible enough, and the decision factor is too simple.
  • the hotspot data is stored or cached on the high-speed storage device at the data record level to improve the query efficiency of the database.
  • FIG. 1 is a schematic flow diagram of a method 100 of storing data in accordance with one embodiment of the present invention.
  • the method 100 of Figure 1 can be performed by a server.
  • Hotspot data in accordance with embodiments of the present invention refers to data records that are often used in a database.
  • a data record is a set of related information corresponding to a row of information in a data source, which can be a row in a data table, each row including n attributes (fields or data items).
  • the above original data record may be a data record stored on an original storage device (for example, a normal disk).
  • the hotspot data model may be a function model for identifying hotspot data.
  • The hotspot data model may be generated automatically by an artificial intelligence method (for example, a Bayesian classification algorithm), and the model may be updated as the actual application conditions change.
  • the hotspot data model is used to classify data records to separate data records into hotspot data and non-hotspot data.
  • According to an embodiment of the present invention, the original data records are input into the hotspot data model to determine whether each of them is hotspot data or non-hotspot data. Further, newly stored data records can also be input into the hotspot data model to determine whether the new data records are hotspot data.
  • the first storage device may be a high speed storage device or a storage device that acts as a cache or memory.
  • Hotspot data can be filtered according to the hotspot data model at the data record level, and the filtered hotspot data is stored in a specific storage device, thereby implementing an effective storage strategy at the data record level.
  • According to an embodiment of the present invention, identification and pre-identification of hotspot data are performed at the data record level, so that they are transparent to the application.
  • The method further includes: storing the data records in the original data records that are not hotspot data in a second storage device, where the storage rate of the first storage device is higher than the storage rate of the second storage device.
  • storing selected hotspot data in a storage device having a higher storage rate can significantly improve query efficiency.
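The split between the first and second storage devices described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two devices are modelled as in-memory dictionaries, and `place_records` and the `is_hot` callable (standing in for the hotspot data model) are hypothetical names that do not come from the patent.

```python
def place_records(records, is_hot):
    """Route each record to the fast or the slow store.

    records: iterable of dicts, each with a unique "id" key (assumed layout).
    is_hot:  callable standing in for the hotspot data model; it returns
             True when a record is classified as hotspot data.
    Returns (fast, slow), two dicts keyed by record id.
    """
    fast, slow = {}, {}
    for rec in records:
        target = fast if is_hot(rec) else slow
        target[rec["id"]] = rec
    return fast, slow


# Illustrative records and an illustrative rule; neither is taken from the patent.
records = [
    {"id": 1, "sex": "F", "age": 18},
    {"id": 2, "sex": "M", "age": 30},
]
fast, slow = place_records(records, lambda r: r["sex"] == "F" and r["age"] < 20)
```

Keeping every record on exactly one device means a later query has to consult both devices, which matches the query step described for FIG. 2.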
  • When the first storage device is a high-speed storage device, in 120, sample data records are extracted from the original data records; the number of hits of each sample data record is determined; and the sample data records are used as the data source for establishing the hotspot data model, which is established based on the number of hits.
  • A certain number of data records can be extracted at random from the original data records on the ordinary disk as samples, and the number of times these samples are hit within a preset time can be counted. The sample data records are then divided into hotspot data and non-hotspot data according to the number of hits. The classified hotspot data and non-hotspot data can then be analyzed by an artificial intelligence method to determine how the attribute values of a data record influence its classification, thereby obtaining the hotspot data model.
  • the method further includes: performing the process of establishing the hotspot data model in the case that the hotspot data model expires, and updating the hotspot data in the first storage device according to the re-established hotspot data model.
  • the expiration of the hotspot data model includes: the lifetime of the hotspot data model exceeds a preset time or the hit rate of the hotspot data in the first storage device is too low.
  • the data records in the database may change. Accordingly, the hotspot data model established based on the original data records will expire. In addition, the hit rate of hotspot data in the high-speed storage device may be too low. In this case, the sample needs to be re-extracted from the changed data record, and a new hotspot data model is established based on the extracted samples, so as to maintain an effective storage strategy and efficient query efficiency.
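The two expiry conditions above (lifetime exceeded, or hit rate too low) can be expressed as a small check. The sketch below is an assumption-laden illustration: the `ModelStats` fields, the function name, and both default thresholds are hypothetical, since the patent only says that the preset time and the acceptable hit rate depend on the application.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Bookkeeping for one hotspot data model (field names are illustrative)."""
    age_seconds: float   # time elapsed since the model was established
    lookups: int         # queries issued since the model was established
    fast_hits: int       # of those, queries answered from the fast device


def model_expired(stats, max_age_seconds=7 * 24 * 3600, min_hit_rate=0.5):
    """Return True when the hotspot data model should be re-established."""
    if stats.age_seconds > max_age_seconds:
        return True   # lifetime exceeds the preset time
    if stats.lookups == 0:
        return False  # no traffic yet, so the hit rate says nothing
    return stats.fast_hits / stats.lookups < min_hit_rate
```

When the check returns True, the sampling and training steps would be run again on the changed data records.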
  • FIG. 2 is a schematic flow diagram of a method 200 of storing data in accordance with one embodiment of the present invention.
  • the method 200 of Figure 2 can be performed by a server.
  • 210, 220, and 230 of Fig. 2 are similar to 110, 120, and 130 of Fig. 1, and will not be described again.
  • The server's query optimizer can generate and evaluate multiple execution plans, and finally select the lowest-cost one (for example, the fastest or the least resource-intensive) for the query. For example, during query optimization the query can be executed on both the high-speed storage device and the normal disk, and taking the union of the two result sets serves as the final execution plan.
  • An execution plan stored in the cache may be used directly as the final execution plan for the query.
  • Hotspot data can be filtered according to the hotspot data model at the data record level, and the filtered hotspot data is stored in a specific storage device, thereby implementing an effective storage strategy at the data record level.
  • The hotspot data is identified and pre-identified at the data record level, so that the identification and pre-identification are transparent to the application; caching the hotspot data in a specific storage device at the upper layer of the database also helps to achieve query optimization.
  • The method further includes: re-performing the process of establishing the hotspot data model if the hotspot data model expires or the hit rate of the hotspot data in the first storage device is too low, and updating the hotspot data in the first storage device according to the re-established hotspot data model.
  • FIG. 3 is a schematic flow chart of a process of storing data in accordance with an embodiment of the present invention.
  • the source data table (Table) in the database lists 9999999 data records, each data record containing four attributes (fields or data items): identification, name, gender, and age.
  • the above source data table can be stored in a normal disk.
  • the columns used to decide hotspot data may be different. For example, this example selects gender and age attributes (fields or data items) for judgment.
  • Configurable items can be provided so that, when creating a table, the user can specify at the database application level which columns are used to decide hotspot data.
  • Embodiments according to the present invention are not limited thereto; which columns are used to decide hotspot data can also be determined statistically.
  • Extract sample data records from the data records of a source data table on an original storage device (for example, a normal disk). For example, in the initial stage of establishing a hotspot data model, random sampling is performed on a large number of data records, and a part of the data in the source data table (for example, 20%) is extracted as sample data records.
  • the sample data can be retained in the original storage device and identified to be distinguished from other data, thereby being logically abstracted into a table (hereinafter referred to as a sample data table).
  • the sample data record in the high speed storage device can be updated every predetermined time (e.g., one day or one week).
  • Determine the number of hits for each sample data record. For example, a statistics column is added to the sample data table to count the number of times each data record is hit, as shown in Table 2.
  • Table 2 columns: ID | Name | Gender (Sex) | Age | Hits (CNT)
  • 9874574 | … | … | … | 330
  • Use the sample data records as the data source for establishing a hotspot data model, and establish the hotspot data model based on the number of hits. For example, after a preset time (which can be set according to the specific application, such as one day or one week), the sample data records are sorted by their recorded number of hits, and the data records in the top 30 percent by hits are designated as hotspot data. Embodiments according to the present invention are not limited thereto, and the percentage may be adjusted as needed.
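The ranking step described above can be sketched as follows. This is a minimal illustration, assuming each sample is a dictionary with a `cnt` key holding the hit counter from the statistics column; `label_samples` and `hot_percent` are hypothetical names introduced here.

```python
def label_samples(samples, hot_percent=30):
    """Label the top `hot_percent` percent of samples by hit count as hot.

    samples: list of dicts with a "cnt" key (number of hits in the preset time).
    Returns a list of (record, is_hot) pairs, ordered from most to least hit.
    """
    ranked = sorted(samples, key=lambda rec: rec["cnt"], reverse=True)
    # Integer arithmetic avoids floating-point rounding at the cut-off.
    n_hot = len(ranked) * hot_percent // 100
    return [(rec, rank < n_hot) for rank, rec in enumerate(ranked)]


# Ten illustrative samples whose hit counts are 1..10 (not data from the patent).
samples = [{"id": i, "cnt": i} for i in range(1, 11)]
labeled = label_samples(samples)
```

The resulting labels are the training targets for the intelligent analysis step; the 30 percent cut-off is adjustable, as the text notes.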
  • an artificial intelligence method for example, a Bayesian classification algorithm
  • a process of using the Bayesian classification algorithm for intelligent analysis is also called a learning process or a training process of hotspot data.
  • the specific intelligent analysis process will be described in detail later.
  • Store the filtered hotspot data in the high-speed storage device, and store the non-hotspot data in the original storage device.
  • The filtered hotspot data is stored in a high-speed storage device at the data record level, and the non-hotspot data is stored on a normal disk.
  • Table 3: data records whose gender is female and whose age is under 20 are identified as hotspot data and stored in the high-speed storage device.
  • Table 4: non-hotspot data is stored on a normal disk.
  • ID Name (Name) Gender (Sex) Age ( Age)
  • When a query request is received, optimize the query to generate a corresponding execution plan, and obtain data from the high-speed storage device and the original storage device according to the execution plan. For example, when a query request is received, the query can be optimized at the database query optimizer level to generate a corresponding execution plan (the simplest optimization here is to execute the query statement on the high-speed storage device and on the original storage device separately and take the union of the two result sets), and the corresponding data is then obtained from the high-speed storage device and the original storage device according to the execution plan.
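The simplest plan mentioned above, running the same query on both devices and merging the results, might look like the sketch below. It assumes both devices are modelled as dictionaries keyed by record id and that a record lives on exactly one device; `query_both` and `predicate` are hypothetical names, and a real query optimizer would of course work on SQL plans rather than Python callables.

```python
def query_both(predicate, fast, slow):
    """Run the same filter on both devices and return the union of the results.

    predicate: callable(record) -> bool, standing in for the query condition.
    fast, slow: dicts mapping record id -> record for the two devices.
    """
    result = {rid: rec for rid, rec in fast.items() if predicate(rec)}
    # A record is stored on exactly one device, so the update adds new ids only.
    result.update({rid: rec for rid, rec in slow.items() if predicate(rec)})
    return result


# Illustrative contents of the two devices (not data from the patent).
fast = {1: {"sex": "F", "age": 18}, 2: {"sex": "F", "age": 19}}
slow = {3: {"sex": "F", "age": 35}, 4: {"sex": "M", "age": 40}}
women = query_both(lambda r: r["sex"] == "F", fast, slow)
```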
  • If the hotspot data model expires, for example because the hotspot data in the high-speed storage device has a low hit rate or the lifetime of the hotspot data model exceeds a preset time, the process of establishing the hotspot data model is performed again, and the hotspot data model in the high-speed storage device is updated. For example, after a period of time the hotspot data model may change, so that the original hotspot data is no longer hot. In that case, the process of establishing the hotspot data model is re-executed, and the hotspot data is updated (or refreshed) according to the re-established hotspot data model.
  • Data in the high-speed storage device that matches the hotspot data model remains there, and the rest is moved to a normal disk; then data on the normal disk that matches the hotspot data model is moved to the high-speed storage device, and the rest remains on the normal disk.
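The refresh just described, demoting records that no longer match and promoting records that now match, can be sketched as follows. The dictionaries standing in for the two devices, the function name `refresh_tiers`, and the `is_hot` callable representing the re-established model are all assumptions made for illustration.

```python
def refresh_tiers(fast, slow, is_hot):
    """Re-partition records between the two devices under a new model.

    fast, slow: dicts mapping record id -> record; both are modified in place.
    is_hot:     callable standing in for the re-established hotspot data model.
    """
    # Decide all moves first so that no record is evaluated twice.
    demote = [rid for rid, rec in fast.items() if not is_hot(rec)]
    promote = [rid for rid, rec in slow.items() if is_hot(rec)]
    for rid in demote:
        slow[rid] = fast.pop(rid)
    for rid in promote:
        fast[rid] = slow.pop(rid)


# Illustrative state before the refresh (not data from the patent).
fast = {1: {"age": 18}, 2: {"age": 40}}
slow = {3: {"age": 15}, 4: {"age": 50}}
refresh_tiers(fast, slow, lambda rec: rec["age"] < 20)
```

Only records whose classification changed are moved, so a refresh after a small model change touches little data.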
  • the following uses the Naive Bayes classification method as an example to describe the establishment process of the hotspot data model.
  • For simplicity, the following procedure extracts only 10 samples and uses the gender and age attributes to decide hotspot data.
  • the first column and the second column are the gender and age attributes of the sample, respectively, and the third column indicates whether the corresponding data record is hotspot data for training (or learning) (hereinafter referred to as training hotspot data).
  • the threshold value 20 of the age attribute may be an average of the ages in the data table.
  • $h$ is the target value output by the naive Bayes classification method, that is, the class that maximizes the classification function; $h_j \in \{\text{Yes}, \text{No}\}$ is the target value of each training sample, and Sex and Age are the attribute values of each training sample.
  • The naive Bayesian classification formula of this example can be as follows:
  • $h = \arg\max_{h_j \in \{\text{Yes}, \text{No}\}} p(h_j)\, p(\text{Sex} \mid h_j)\, p(\text{Age} \mid h_j)$, where $h$ is the outcome with the maximum probability of a data record being hotspot data or non-hotspot data, and $h_j$ indicates whether each sample data record is hotspot data or non-hotspot data.
  • With this formula, it can be determined whether a given data record is hotspot data or non-hotspot data.
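The naive Bayes decision over the gender and age attributes can be sketched in Python as below. Several things here are assumptions rather than the patent's method: the ten training samples are invented for illustration (the patent's own sample table is not reproduced), age is reduced to two buckets around the threshold 20, and add-one smoothing is added so that an unseen attribute value does not force a probability of zero. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def age_bucket(age, threshold=20):
    """Reduce age to a two-valued attribute around the threshold."""
    return "<20" if age < threshold else ">=20"


def train(samples):
    """samples: list of (sex, age, is_hot). Returns the counts that form the model."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)  # (attribute, label) -> value counts
    for sex, age, is_hot in samples:
        class_counts[is_hot] += 1
        feature_counts[("sex", is_hot)][sex] += 1
        feature_counts[("age", is_hot)][age_bucket(age)] += 1
    return class_counts, feature_counts


def classify(model, sex, age):
    """Return the label h_j maximizing p(h_j) * p(Sex | h_j) * p(Age | h_j)."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n in class_counts.items():
        # Add-one smoothing; each attribute has two possible values.
        p_sex = (feature_counts[("sex", label)][sex] + 1) / (n + 2)
        p_age = (feature_counts[("age", label)][age_bucket(age)] + 1) / (n + 2)
        score = (n / total) * p_sex * p_age
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Ten illustrative training samples: (sex, age, is hotspot data).
training = [
    ("F", 18, True), ("F", 19, True), ("F", 17, True), ("F", 16, True),
    ("F", 25, False), ("F", 40, False), ("M", 18, False), ("M", 19, False),
    ("M", 22, False), ("M", 30, False),
]
model = train(training)
```

With these invented samples, `classify(model, "F", 18)` selects the hotspot class and `classify(model, "M", 30)` selects the non-hotspot class, which is in line with the female-under-20 example given for Table 3.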
  • FIG. 4 is a structural schematic diagram of an apparatus 400 for storing data in accordance with one embodiment of the present invention.
  • the apparatus of FIG. 4 may be a server, including: an establishing module 410, a screening module 420, and a storage module 430.
  • The establishing module 410 establishes a hotspot data model based on the original data records.
  • the screening module 420 filters the hotspot data from the original data record or the new data record according to the hotspot data model, or filters the hotspot data from the original data record and the new data record.
  • the storage module 430 stores the filtered hotspot data into the first storage device.
  • Hotspot data can be filtered according to the hotspot data model at the data record level, and the filtered hotspot data is stored in a specific storage device, thereby implementing an effective storage strategy at the data record level.
  • According to an embodiment of the present invention, identification and pre-identification of hotspot data are performed at the data record level, so that they are transparent to the application.
  • the storage module 430 stores the data record of the original data record that is not the hotspot data in the second storage device, where the storage rate of the first storage device is higher than the storage rate of the second storage device.
  • The establishing module 410 further re-performs the above process of establishing a hotspot data model if the hotspot data model expires or the hit rate of the hotspot data in the first storage device is too low, and updates the above hotspot data model according to the re-established hotspot data model.
  • When the first storage device is a high-speed storage device, the establishing module 410 extracts sample data records from the original data records, determines the number of hits of each sample data record, uses the sample data records as the data source for establishing the hotspot data model, and establishes the hotspot data model based on the number of hits.
  • FIG. 5 shows a structural schematic diagram of an apparatus 500 for storing data in accordance with another embodiment of the present invention.
  • the apparatus of FIG. 5 may be a server, including: an establishing module 510, a screening module 520, a storage module 530, an optimization module 540, and an acquisition module 550.
  • The establishing module 510, the screening module 520, and the storage module 530 of the apparatus 500 of FIG. 5 are similar to the establishing module 410, the screening module 420, and the storage module 430 of FIG. 4, and are not described again here.
  • the optimization module 540 when receiving the query request, optimizes the query to generate a corresponding execution plan.
  • the acquisition module 550 acquires data from the first storage device and the second storage device, respectively, according to the execution plan described above.
  • Hotspot data can be filtered according to the hotspot data model at the data record level, and the filtered hotspot data is stored in a specific storage device, thereby implementing an effective storage strategy at the data record level.
  • The hotspot data is identified and pre-identified at the data record level, so that the identification and pre-identification are transparent to the application; caching the hotspot data in a specific storage device at the upper layer of the database also helps to achieve query optimization.
  • The establishing module 410 further re-performs the above process of establishing a hotspot data model if the hotspot data model expires or the hit rate of the hotspot data in the first storage device is too low, and updates the hotspot data in the first storage device according to the re-established hotspot data model.
  • pre-identification of hotspot data is performed on the upper layer of the database, which is transparent to the application layer and reduces the complexity of application development.
  • According to an embodiment of the present invention, using the high-speed storage device as the storage device or cache device of the upper layer of the database facilitates the decisions of the query optimizer, and pre-identifying hotspot data for newly generated data records can improve query efficiency.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • The division into units is only a division by logical function.
  • In actual implementation there may be other ways of dividing; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical, mechanical or otherwise.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium.
  • The technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

An embodiment of the present invention provides a method and apparatus for storing data. The method comprises: establishing a hotspot data model based on an original data record; screening for hotspot data from the original data record and/or a new data record according to the hotspot data model; and storing the screened hotspot data into a first storage device. According to the embodiment of the present invention, hotspot data can be screened out at the data record level according to the hotspot data model, and the screened hotspot data is stored into a specific storage device, thus implementing an effective storage policy at the data record level.

Description

Method and apparatus for storing data
Technical Field
Embodiments of the present invention relate to the field of computer technology and, more particularly, to a method and apparatus for storing data.
Background
The emergence of new storage devices has changed the traditional storage architecture and prompted corresponding improvements to databases. For example, new high-speed storage devices such as SSD (Solid State Disk) and PCM (Phase Change Memory) read and write faster than ordinary disks and slower than memory, retain their data when power is lost, and are often used as a second-level cache for a database. How to identify the hotspot data (hot data) that needs to be cached, and how to organize data on the new high-speed storage devices, are important problems that must be solved to implement data storage or caching effectively.
At present, hotspot data identification and pre-identification techniques at the data block level are fairly mature. In the prior art, hotspot data refers to data that is frequently used while the database of a server is running. It generally means data blocks; that is, hotspot data is mainly stored in the form of data blocks. The algorithms for identifying such hotspot data are relatively mature; for example, whether a data block is hotspot data can be determined by counting the number of hits on that block. This storage method identifies and stores hotspot data at the data block level (that is, the lower layer of the database), and cannot implement an effective storage strategy at the data record level (the upper layer of the database).
Summary of the Invention
Embodiments of the present invention provide a method and apparatus for storing data, which can implement an effective storage strategy at the data record level.
In one aspect, a method for storing data is provided, including: establishing a hotspot data model based on original data records; filtering hotspot data out of the original data records or new data records according to the hotspot data model; and storing the filtered hotspot data in a first storage device.
In another aspect, an apparatus for storing data is provided, including: an establishing module, configured to establish a hotspot data model based on original data records; a screening module, configured to filter hotspot data out of the original data records or new data records according to the hotspot data model; and a storage module, configured to store the filtered hotspot data in a first storage device.
Embodiments of the present invention can filter out hotspot data according to the hotspot data model at the data record level and store the filtered hotspot data in a specific storage device, thereby implementing an effective storage strategy at the data record level.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for storing data according to an embodiment of the present invention.
FIG. 2 is a schematic flowchart of a method for storing data according to another embodiment of the present invention.
FIG. 3 is a schematic flowchart of a process of storing data according to an embodiment of the present invention.
FIG. 4 is a schematic structural diagram of an apparatus for storing data according to an embodiment of the present invention.
FIG. 5 is a schematic structural diagram of an apparatus for storing data according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be understood that the technical solutions of the present invention can be applied to various fields that use computers, for example telecommunications, e-commerce, and social platforms, and especially to applications involving large volumes of data.
In practical database applications, queries over large volumes of data are common. For example, queries account for more than 70% of database usage, searching a large amount of data on disk is costly, and the data that users actually need to query is usually only about 20% of the data in a data table.
At present, the hotspot data identified by block-level hotspot identification and pre-identification techniques is usually stored in the cache in the form of blocks, so there is no way to optimize for specific scenarios and applications. In addition, although hotspot data related to data records can be identified from the creation time of the data record (for example, a data block created within a preset time period can be treated as hotspot data and cached), determining hotspot data from creation time alone is not flexible enough, and the decision factor is too simple.
... hotspot data, and the hotspot data obtained through this pre-identification technique is stored or cached on a high-speed storage device at the data record level, which improves the query efficiency of the database.
FIG. 1 is a schematic flowchart of a method 100 for storing data according to an embodiment of the present invention. The method 100 of FIG. 1 may be performed by a server.
110. Establish a hotspot data model based on original data records.
Hotspot data according to the embodiments of the present invention refers to data records in a database that are frequently used. In a relational database, a data record is a set of related information corresponding to one row of information in a data source; it may be one row of a data table, and each row includes n attributes (fields or data items). The original data records may be data records stored on an original storage device (for example, an ordinary disk).
The hotspot data model according to the embodiments of the present invention may be a function model for identifying hotspot data. For example, the model may be generated automatically by an artificial intelligence method (for example, a Bayesian classification algorithm) and may be updated as actual application conditions change. The hotspot data model is used to classify data records into hotspot data and non-hotspot data.
120. According to the hotspot data model, filter out hotspot data from the original data records or from new data records, or from both the original data records and the new data records.
According to the embodiments of the present invention, the original data records are input into the hotspot data model to determine whether each of them is hotspot data or non-hotspot data. Further, newly stored data records may also be input into the hotspot data model to determine whether they are hotspot data.
130. Store the filtered hotspot data in a first storage device.
For example, the first storage device may be a high-speed storage device, or a storage device serving as a cache or memory.
The embodiments of the present invention can filter out hotspot data at the data record level according to the hotspot data model and store the filtered hotspot data in a specific storage device, thereby implementing an effective storage strategy at the data record level. In addition, because hotspot data is identified and pre-identified at the data record level, the identification and pre-identification are transparent to applications.
According to another embodiment of the present invention, the method further includes: storing the data records among the original data records that are not hotspot data in a second storage device, where the storage rate of the first storage device is higher than that of the second storage device.
For example, according to the embodiments of the present invention, storing the filtered hotspot data in a storage device with a higher storage rate (for example, a high-speed storage device, cache or memory) can significantly improve query efficiency.
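Steps 120 and 130, together with the second storage device, can be illustrated with a minimal Python sketch. It assumes that the hotspot data model is available as a callable returning True for hotspot data; the names `partition_records` and `toy_model`, the dictionary record layout and the stand-in rule are illustrative assumptions, not part of the described method.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


def partition_records(
    records: Iterable[Record],
    is_hotspot: Callable[[Record], bool],
) -> Tuple[List[Record], List[Record]]:
    """Split records into (first_device, second_device) using the model."""
    first_device: List[Record] = []   # faster store: hotspot data
    second_device: List[Record] = []  # slower store: all other records
    for record in records:
        if is_hotspot(record):
            first_device.append(record)
        else:
            second_device.append(record)
    return first_device, second_device


# Hypothetical stand-in for a trained hotspot data model (assumption).
def toy_model(record: Record) -> bool:
    return record["sex"] == "F" and record["age"] <= 20


records = [
    {"id": 1, "sex": "M", "age": 16},
    {"id": 2, "sex": "F", "age": 17},
    {"id": 5, "sex": "F", "age": 20},
]
hot, cold = partition_records(records, toy_model)
```

In a real database the two lists would correspond to writes to the first and second storage devices; the lists only make the routing decision visible.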
According to an embodiment of the present invention, the first storage device is a high-speed storage device, and in 110, sample data records are extracted from the original data records; the hit counts of the sample data records are determined; the sample data records are used as the data source for establishing the hotspot data model; and the hotspot data model is established based on the hit counts.
For example, to reduce the modeling overhead, a certain number of data records may be randomly extracted as samples from the original data records on the ordinary disk, the number of times these samples are hit within a preset time may be counted, and the sample data records may then be divided into hotspot data and non-hotspot data according to their hit counts. An artificial intelligence method can then be used to analyze the classified hotspot and non-hotspot data to determine how the attribute values of the data records affect the hotspot classification, thereby obtaining the hotspot data model.
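The sampling and hit-counting steps can be sketched as follows. This is only an illustration: the helper names `draw_sample` and `count_hits`, the fixed random seed, and the representation of accesses as a flat list of record IDs are all assumptions made for the example.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List


def draw_sample(record_ids: List[int], fraction: float, seed: int = 0) -> List[int]:
    """Randomly pick a fraction of the record IDs as the training sample."""
    rng = random.Random(seed)
    size = max(1, round(len(record_ids) * fraction))
    return rng.sample(record_ids, size)


def count_hits(sample_ids: Iterable[int], accessed_ids: Iterable[int]) -> Dict[int, int]:
    """Count how often each sampled record was hit in the observation window."""
    wanted = set(sample_ids)
    hits = Counter(i for i in accessed_ids if i in wanted)
    return {i: hits.get(i, 0) for i in wanted}


sample = draw_sample(list(range(1, 101)), fraction=0.2)
# Hypothetical access log; ID 9999 is not part of the sample and is ignored.
access_log = [sample[0], sample[0], sample[1], 9999]
hits = count_hits(sample, access_log)
```

Records that were never accessed keep a count of zero, so they can still be ranked against the rest of the sample.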
According to another embodiment of the present invention, the method further includes: when the hotspot data model has expired, repeating the process of establishing the hotspot data model, and updating the hotspot data in the first storage device according to the re-established hotspot data model.
According to an embodiment of the present invention, the hotspot data model has expired when its lifetime exceeds a preset time or when the hit rate of the hotspot data in the first storage device is too low.
For example, after the database has been running for some time, the data records in the database may change, and the hotspot data model established from the original data records then becomes outdated. The hit rate of the hotspot data in the high-speed storage device may also become too low. In either case, samples need to be extracted again from the changed data records and a new hotspot data model established from them, so as to maintain an effective storage strategy and high query efficiency.
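The two expiry conditions can be expressed as a small check. The sketch below is hedged: the `ModelStatus` fields, the treatment of an empty observation window as "not low", and the concrete thresholds used in the usage lines are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelStatus:
    built_at: float  # time at which the model was built, in seconds
    hits: int        # lookups served from the first storage device
    lookups: int     # all lookups in the observation window


def is_expired(status: ModelStatus, now: float,
               max_age_seconds: float, min_hit_rate: float) -> bool:
    """Return True if the model is too old or its hit rate is too low."""
    too_old = now - status.built_at > max_age_seconds
    # With no lookups yet there is no evidence of a low hit rate (assumption).
    hit_rate = status.hits / status.lookups if status.lookups else 1.0
    return too_old or hit_rate < min_hit_rate


DAY = 86400.0
healthy = ModelStatus(built_at=0.0, hits=80, lookups=100)
cold = ModelStatus(built_at=0.0, hits=30, lookups=100)
```

For instance, `is_expired(cold, now=3600.0, max_age_seconds=DAY, min_hit_rate=0.5)` reports expiry because only 30% of the lookups were served from the first storage device.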
FIG. 2 is a schematic flowchart of a method 200 for storing data according to an embodiment of the present invention. The method 200 of FIG. 2 may be performed by a server. Steps 210, 220 and 230 of FIG. 2 are similar to steps 110, 120 and 130 of FIG. 1 and are not described again in detail.
210. Establish a hotspot data model based on original data records.
220. According to the hotspot data model, filter out hotspot data from the original data records or from new data records, or from both the original data records and the new data records.
230. Store the filtered hotspot data in a first storage device, and store the data records among the original data records that are not hotspot data in a second storage device, where the storage rate of the first storage device is higher than that of the second storage device.
240. On receiving a query request, optimize the query to generate a corresponding execution plan.
Typically, after a query request is received, the query optimizer of the server can generate and evaluate multiple execution plans and finally select the one with the lowest cost (for example, the fastest one or the one using the fewest resources) for the query. For example, during query optimization, the query may be executed once on the high-speed storage device and once on the ordinary disk, and the plan that takes the union of the two result sets is used as the final execution plan. The embodiments of the present invention are not limited thereto; for example, on receiving a query request, an execution plan stored in the cache may be used directly as the final execution plan.
250. Acquire data from the first storage device and the second storage device respectively according to the execution plan.
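The simplest plan mentioned for step 240, running the query once per storage device and taking the union of the results, can be sketched as below. The function name `query_both`, the use of in-memory lists for the two devices and de-duplication by an `id` field are assumptions for illustration only.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def query_both(first_device: Iterable[Record],
               second_device: Iterable[Record],
               predicate: Callable[[Record], bool]) -> List[Record]:
    """Run the same predicate on both stores and merge the results."""
    results: Dict[object, Record] = {}
    for store in (first_device, second_device):
        for record in store:
            if predicate(record):
                # De-duplicate by ID in case a record exists in both stores.
                results.setdefault(record["id"], record)
    return list(results.values())


fast = [{"id": 2, "sex": "F", "age": 17}]
slow = [{"id": 1, "sex": "M", "age": 16}, {"id": 4, "sex": "M", "age": 19}]
rows = query_both(fast, slow, lambda r: r["age"] < 18)
```

Because the first storage device is scanned first, matching hotspot records appear at the front of the merged result.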
The embodiments of the present invention can filter out hotspot data at the data record level according to the hotspot data model and store the filtered hotspot data in a specific storage device, thereby implementing an effective storage strategy at the data record level. In addition, because hotspot data is identified and pre-identified at the data record level, the identification and pre-identification are transparent to applications, and caching the hotspot data in a specific storage device at the upper layer of the database helps to achieve query optimization.
According to another embodiment of the present invention, the method further includes: when the hotspot data model has expired or the hit rate of the hotspot data in the first storage device is too low, repeating the process of establishing the hotspot data model, and updating the hotspot data in the first storage device according to the re-established hotspot data model.
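One possible way to update the hotspot data after the model has been re-established is to re-split both storage devices with the new model. The sketch below assumes in-memory lists for the two devices and a callable model; `rebalance` and the age-based rule in the usage lines are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def rebalance(high_speed: List[Record], disk: List[Record],
              is_hotspot: Callable[[Record], bool]) -> Tuple[List[Record], List[Record]]:
    """Re-split both tiers with a rebuilt model; returns (high_speed, disk)."""
    keep = [r for r in high_speed if is_hotspot(r)]        # still hot: stays fast
    demote = [r for r in high_speed if not is_hotspot(r)]  # no longer hot
    promote = [r for r in disk if is_hotspot(r)]           # newly hot
    stay = [r for r in disk if not is_hotspot(r)]          # still cold
    return keep + promote, stay + demote


old_fast = [{"id": 1, "age": 30}, {"id": 2, "age": 15}]
old_disk = [{"id": 3, "age": 12}, {"id": 4, "age": 40}]
# Hypothetical rebuilt model: records with age below 20 are hotspot data.
new_fast, new_disk = rebalance(old_fast, old_disk, lambda r: r["age"] < 20)
```

Running such a pass in an off-peak period keeps the cost of moving records away from normal query traffic.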
The embodiments of the present invention are described in more detail below with reference to a specific example. FIG. 3 is a schematic flowchart of a process of storing data according to an embodiment of the present invention.
As shown in Table 1, the source data table (Table) in the database lists 9999999 data records, each containing four attributes (fields or data items): ID, name, sex and age. The source data table may be stored on an ordinary disk. In different applications or different data table structures, the columns used to decide hotspot data may differ; this example uses the sex and age attributes (fields or data items). For example, configurable items may be provided so that, when creating a table, the user can specify at the database application level which columns are used to decide hotspot data. The embodiments of the present invention are not limited thereto; the columns used to decide hotspot data may also be determined statistically.
Table 1 (image not reproduced): source data table with the columns ID, Name, Sex and Age.
310. Extract sample data records from the data records of the source data table on the original storage device (for example, an ordinary disk). For example, at the initial stage of establishing the hotspot data model, random sampling statistics are performed on a large number of data records, and part of the data in the source data table (for example, 20% of the data) is extracted as sample data records. The sample data may be kept on the original storage device and marked so that it can be distinguished from other data, which logically abstracts it into a table (hereinafter referred to as the sample data table). Optionally, the sample data records in the high-speed storage device may be updated at preset intervals (for example, every day or every week).
320. Determine the hit count of each sample data record. For example, a statistics column is added to the sample data table to count the number of times each data record is hit, as shown in Table 2.
Table 2
ID        Name    Sex     Age   Hits (CNT)
168       N168    Female  18    100
520       N520    Female  14    6
777       N777    Male    17    9
1234      N1234   Male    30    50
5202      N5202   Female  20    123
9999      N9999   Male    12    3
...       ...     ...     ...   ...
9874574   ...     ...     ...   ...
330. Use the sample data records as the data source for establishing the hotspot data model, and establish the hotspot data model based on the hit counts. For example, after a preset time (which can be set according to the specific application, for example one day or one week), the sample data records are sorted by hit count, and the sample data records ranked in the top 30% by hit count are designated as hotspot data. The embodiments of the present invention are not limited thereto, and this percentage may be adjusted as needed. For example, an artificial intelligence method (for example, a Bayesian classification algorithm) may be used for intelligent analysis; intelligent analysis with a Bayesian classification algorithm is also called the learning or training process of hotspot data. The specific intelligent analysis process is described in detail later.
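The labelling rule of step 330 can be sketched with the six rows that Table 2 lists explicitly. The function name `label_hotspots` and the rounding of the 30% cutoff to a whole number of records are assumptions; the source does not say how fractional cutoffs are handled.

```python
from typing import Dict, Set


def label_hotspots(hit_counts: Dict[int, int], top_fraction: float = 0.3) -> Set[int]:
    """Return the IDs of the records whose hit count ranks in the top fraction."""
    ranked = sorted(hit_counts, key=hit_counts.get, reverse=True)
    # Rounding the cutoff is an assumption made for this sketch.
    cutoff = max(1, round(len(ranked) * top_fraction))
    return set(ranked[:cutoff])


# Hit counts of the six rows listed in Table 2 (ID -> CNT).
table2_hits = {168: 100, 520: 6, 777: 9, 1234: 50, 5202: 123, 9999: 3}
hot_ids = label_hotspots(table2_hits)
```

With six records and a 30% cutoff, the two most frequently hit records (IDs 5202 and 168) are labelled as training hotspot data; the remaining four become the non-hotspot examples.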
340. Classify the data records in the source data table according to the hotspot data model to filter out hotspot data. For example, the data records in the source data table may be used as the input of the hotspot data model; after passing through the model, these data records are divided into hotspot data and non-hotspot data as the output of the model.
350. Store the filtered hotspot data in the high-speed storage device, and store the non-hotspot data on the original storage device. For example, at the data record level, the filtered hotspot data is stored in the high-speed storage device and the non-hotspot data is stored on the ordinary disk. As shown in Table 3, data records whose sex is female and whose age is under 20 are determined to be hotspot data and are stored in the high-speed storage device. As shown in Table 4, non-hotspot data is stored on the ordinary disk.
Table 3
ID        Name   Sex     Age
2         N2     Female  17
5         N5     Female  20
...       ...    ...     ...
9986454   ...    Female  21
Table 4
ID        Name   Sex     Age
1         N1     Male    16
3         N3     Male    18
4         N4     Male    19
6         N6     Male    20
...       ...    ...     ...
9999999   ...    Male    ...
360. Determine, according to the hotspot data model, whether a new data record is hotspot data or non-hotspot data. After the learning or training of hotspot data is completed, if a new data record (for example, the data records with IDs 10000000 and 10000001 in Table 5 and Table 6) needs to be stored, the hotspot data model can be used to determine whether it is hotspot data or non-hotspot data. If it is hotspot data, it is stored in the high-speed storage device; if it is non-hotspot data, it is stored on the disk. For example, the data record in Table 5 is stored on the ordinary disk, and the data record in Table 6 is stored in the high-speed storage device.
Table 5
ID         Name        Sex     Age
10000000   N10000000   Female  60
Table 6
ID         Name        Sex     Age
10000001   N10000001   Female  20
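Step 360 amounts to classifying each new data record before it is written. The following sketch illustrates this with the two records from Table 5 and Table 6. The `TieredStore` class is hypothetical, and the rule passed to it is only a stand-in chosen so that the outcome matches the two tables; a trained hotspot data model would take its place.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class TieredStore:
    """Toy two-tier store: a fast tier for hotspot data, a slow tier for the rest."""

    def __init__(self, is_hotspot: Callable[[Record], bool]) -> None:
        self.is_hotspot = is_hotspot
        self.high_speed: List[Record] = []
        self.disk: List[Record] = []

    def insert(self, record: Record) -> str:
        """Classify the record before writing it and return the chosen tier."""
        if self.is_hotspot(record):
            self.high_speed.append(record)
            return "high_speed"
        self.disk.append(record)
        return "disk"


# Hypothetical stand-in rule (assumption), not the model derived in the text.
store = TieredStore(lambda r: r["sex"] == "F" and r["age"] <= 20)
tier_a = store.insert({"id": 10000000, "name": "N10000000", "sex": "F", "age": 60})
tier_b = store.insert({"id": 10000001, "name": "N10000001", "sex": "F", "age": 20})
```

Because the decision is taken inside `insert`, the caller does not need to know which tier holds a record, which mirrors the transparency to applications described for the embodiments.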
370. On receiving a query request, optimize the query to generate a corresponding execution plan, and acquire data from the high-speed storage device and the original storage device respectively according to the execution plan. For example, when a query request is received, the query can be optimized at the database query optimizer level to generate a corresponding execution plan (the simplest optimization here is to execute the query statement once on the high-speed storage device and once on the original storage device and take the union of the result sets), and the corresponding data is then obtained from the high-speed storage device and the original storage device according to the execution plan.
380. When the hotspot data model has expired, for example when the hit rate of the hotspot data in the high-speed storage device is too low or the lifetime of the hotspot data model exceeds a preset time, repeat the process of establishing the hotspot data model and update the hotspot data model in the high-speed storage device. For example, after some time the hotspot data model may change, so that the original hotspot data is no longer hot. Within a preset time (for example, one day), during an off-peak period and based on hit statistics, for example when the hit rate of the hotspot data in the high-speed storage device is below 50%, the process of establishing the hotspot data model is repeated and the hotspot data is updated (or refreshed) according to the re-established model. For example, the data on the high-speed storage device that matches the hotspot data model is kept on the high-speed storage device and the rest is moved to the ordinary disk; then the data on the ordinary disk that matches the hotspot data model is moved to the high-speed storage device and the rest remains on the ordinary disk.
The establishment of the hotspot data model is described in detail below, using the naive Bayes classification method as an example. For ease of description, only 10 samples are extracted, and the sex and age attributes are selected as the attributes used to decide hotspot data. As shown in Table 7, the first and second columns are the sex and age attributes of the samples, and the third column indicates whether the corresponding data record is hotspot data used for training (or learning) (hereinafter referred to as training hotspot data). In addition, the threshold 20 of the age attribute may be the average age in the data table.
Table 7
Sex   Age (< 20?)   Hotspot data (H)
F     25            Yes
F     17            Yes
M     19            No
M     30            No
F     14            Yes
F     18            Yes
M     23            No
F     40            No
F     17            Yes
M     30            Yes
The naive Bayes classification formula is
$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j),$$
where $v_{NB}$ denotes the target value output by the naive Bayes classification method, that is, the value that maximizes the classification function; $v_j \in V = \{\text{Yes}, \text{No}\}$ is the target value of each training sample; and $a_i$ is the value of each attribute of the training sample data.
The naive Bayes classification formula of this example can be written as
$$h = \arg\max_{h_j \in \{\text{Yes}, \text{No}\}} P(h_j)\, P(\text{Sex} \mid h_j)\, P(\text{Age} \mid h_j),$$
where $h$ corresponds to the largest of the scores obtained for a data record being hotspot data or non-hotspot data, and $h_j$ indicates whether each sample data record is hotspot data or non-hotspot data. From Table 7, the parameters of the hotspot data model are: P(H = Yes) = 6/10 = 0.6, P(H = No) = 4/10 = 0.4, P(Sex = F | H = Yes) = 5/6, P(Sex = F | H = No) = 1/4, P(Sex = M | H = Yes) = 1/6, P(Sex = M | H = No) = 3/4, P(Age < 20 | H = Yes) = 4/6, P(Age < 20 | H = No) = 1/4, P(Age ≥ 20 | H = Yes) = 2/6, and P(Age ≥ 20 | H = No) = 3/4.
These parameters can be used to determine whether a given data record is hotspot data or non-hotspot data. For example, data record 1 has sex female and age 14. For the hypothesis that data record 1 is hotspot data, P(H = Yes) P(Sex = F | H = Yes) P(Age < 20 | H = Yes) = 0.6 × 5/6 × 4/6 = 0.3333; for the hypothesis that it is non-hotspot data, P(H = No) P(Sex = F | H = No) P(Age < 20 | H = No) = 0.4 × 1/4 × 1/4 = 0.025. The larger score is 0.3333 (H = Yes), so data record 1 is most likely hotspot data. As another example, data record 2 has sex male and age 16. For the hypothesis that data record 2 is hotspot data, P(H = Yes) P(Sex = M | H = Yes) P(Age < 20 | H = Yes) = 0.6 × 1/6 × 4/6 = 0.0667; for the hypothesis that it is non-hotspot data, P(H = No) P(Sex = M | H = No) P(Age < 20 | H = No) = 0.4 × 3/4 × 1/4 = 0.075. The larger score is 0.075 (H = No), so data record 2 is most likely non-hotspot data.
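The worked example above can be reproduced with a short Python sketch that estimates the parameters from the ten rows of Table 7 by counting and then scores data records 1 and 2. The function names (`train`, `score`, `classify`), the use of exact fractions and the absence of smoothing are choices made for this illustration rather than details given in the text.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List, Tuple

# Rows of Table 7: (sex, age, is_training_hotspot).
TABLE_7: List[Tuple[str, int, bool]] = [
    ("F", 25, True), ("F", 17, True), ("M", 19, False), ("M", 30, False),
    ("F", 14, True), ("F", 18, True), ("M", 23, False), ("F", 40, False),
    ("F", 17, True), ("M", 30, True),
]
AGE_THRESHOLD = 20  # threshold of the age attribute used in the text


def features(sex: str, age: int) -> Tuple[str, bool]:
    """Map a record to the two discrete attributes: sex and (age < 20)."""
    return sex, age < AGE_THRESHOLD


def train(rows):
    """Estimate P(h), P(Sex | h) and P(Age | h) by counting (no smoothing)."""
    class_counts = Counter(h for _, _, h in rows)
    sex_counts = Counter((features(s, a)[0], h) for s, a, h in rows)
    age_counts = Counter((features(s, a)[1], h) for s, a, h in rows)
    prior = {h: Fraction(n, len(rows)) for h, n in class_counts.items()}
    p_sex = {k: Fraction(n, class_counts[k[1]]) for k, n in sex_counts.items()}
    p_age = {k: Fraction(n, class_counts[k[1]]) for k, n in age_counts.items()}
    return prior, p_sex, p_age


def score(model, sex: str, age: int) -> Dict[bool, Fraction]:
    """Return P(h) * P(Sex | h) * P(Age | h) for h = True and h = False."""
    prior, p_sex, p_age = model
    s, young = features(sex, age)
    zero = Fraction(0)
    return {h: prior[h] * p_sex.get((s, h), zero) * p_age.get((young, h), zero)
            for h in prior}


def classify(model, sex: str, age: int) -> bool:
    """Pick the hypothesis with the larger score (True means hotspot data)."""
    scores = score(model, sex, age)
    return max(scores, key=scores.get)


model = train(TABLE_7)
record_1 = score(model, "F", 14)  # data record 1 in the text
record_2 = score(model, "M", 16)  # data record 2 in the text
```

Here `record_1` evaluates to 1/3 for hotspot and 1/40 for non-hotspot, and `record_2` to 1/15 and 3/40, matching the decimal values 0.3333, 0.025, 0.0667 and 0.075 given above. A practical implementation would probably add smoothing so that an attribute value unseen for one class does not force its score to zero.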
FIG. 4 is a schematic structural diagram of an apparatus 400 for storing data according to an embodiment of the present invention. The apparatus of FIG. 4 may be a server and includes an establishing module 410, a screening module 420 and a storage module 430.
The establishing module 410 establishes a hotspot data model based on original data records. The screening module 420 filters out hotspot data, according to the hotspot data model, from the original data records or from new data records, or from both the original data records and the new data records. The storage module 430 stores the filtered hotspot data in a first storage device.
The embodiments of the present invention can filter out hotspot data at the data record level according to the hotspot data model and store the filtered hotspot data in a specific storage device, thereby implementing an effective storage strategy at the data record level. In addition, because hotspot data is identified and pre-identified at the data record level, the identification and pre-identification are transparent to applications.
According to another embodiment of the present invention, the storage module 430 further stores the data records among the original data records that are not hotspot data in a second storage device, where the storage rate of the first storage device is higher than that of the second storage device.
According to another embodiment of the present invention, when the hotspot data model has expired or the hit rate of the hotspot data in the first storage device is too low, the establishing module 410 further repeats the process of establishing the hotspot data model and updates the hotspot data model according to the re-established hotspot data model.
According to an embodiment of the present invention, the first storage device is a high-speed storage device, and the establishing module 410 extracts sample data records from the original data records, determines the hit counts of the sample data records, uses the sample data records as the data source for establishing the hotspot data model, and establishes the hotspot data model based on the hit counts.
For the operations and functions of the units of the apparatus 400, reference may be made to 110, 120 and 130 of the method of FIG. 1; to avoid repetition, details are not described here again.
FIG. 5 is a schematic structural diagram of an apparatus 500 for storing data according to another embodiment of the present invention. The apparatus of FIG. 5 may be a server and includes an establishing module 510, a screening module 520, a storage module 530, an optimization module 540 and an acquisition module 550. The establishing module 510, the screening module 520 and the storage module 530 of the apparatus 500 of FIG. 5 are similar to the establishing module 410, the screening module 420 and the storage module 430 of FIG. 4 and are not described here again.
On receiving a query request, the optimization module 540 optimizes the query to generate a corresponding execution plan. The acquisition module 550 acquires data from the first storage device and the second storage device respectively according to the execution plan.
The embodiments of the present invention can filter out hotspot data at the data record level according to the hotspot data model and store the filtered hotspot data in a specific storage device, thereby implementing an effective storage strategy at the data record level. In addition, because hotspot data is identified and pre-identified at the data record level, the identification and pre-identification are transparent to applications, and caching the hotspot data in a specific storage device at the upper layer of the database helps to achieve query optimization.
According to another embodiment of the present invention, when the hotspot data model has expired or the hit rate of the hotspot data in the first storage device is too low, the establishing module 510 further repeats the process of establishing the hotspot data model and updates the hotspot data in the first storage device according to the re-established hotspot data model.
For the operations and functions of the units of the apparatus 500, reference may be made to 210, 220, 230 and 240 of the method of FIG. 2; to avoid repetition, details are not described here again.
根据本发明的实施例在数据库上层进行热点数据的预识别,对应用层透 明, 降低了应用程序开发的复杂度。 另外, 根据本发明的实施例利用高速存 储设备作为数据库上层的存储设备或緩存设备, 有利于查询优化器的决策, 而且根据本发明的实施例对新产生数据记录进行热点数据的预识别能够提 高查询效率。  According to an embodiment of the present invention, pre-identification of hotspot data is performed on the upper layer of the database, which is transparent to the application layer and reduces the complexity of application development. In addition, the use of the high speed storage device as the storage device or the cache device of the upper layer of the database according to the embodiment of the present invention facilitates the decision of the query optimizer, and the pre-identification of the hotspot data for the newly generated data record can be improved according to an embodiment of the present invention. Query efficiency.
本领域普通技术人员可以意识到, 结合本文中所公开的实施例描述的各 示例的单元及算法步骤, 能够以电子硬件、 或者计算机软件和电子硬件的结 合来实现。 这些功能究竟以硬件还是软件方式来执行, 取决于技术方案的特 定应用和设计约束条件。 专业技术人员可以对每个特定的应用来使用不同方 法来实现所描述的功能, 但是这种实现不应认为超出本发明的范围。 One of ordinary skill in the art will recognize that each of the embodiments described herein in connection with the embodiments disclosed herein The exemplary unit and algorithm steps can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, apparatus, and units described above can be found in the corresponding processes of the foregoing method embodiments, and are not described here again.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above is only a specific implementation of the present invention, but the protection scope of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for storing data, comprising:
establishing a hotspot data model based on existing data records;
filtering hotspot data out of the existing data records or new data records according to the hotspot data model; and
storing the filtered hotspot data in a first storage device.
2. The method according to claim 1, further comprising:
storing, in a second storage device, the data records among the existing data records that are not hotspot data, wherein the storage rate of the first storage device is higher than the storage rate of the second storage device.
3. The method according to claim 2, further comprising:
upon receiving a query request, optimizing the query to generate a corresponding execution plan; and
obtaining data from the first storage device and the second storage device respectively according to the execution plan.
4. The method according to any one of claims 1 to 3, further comprising:
when the hotspot data model has expired, repeating the process of establishing the hotspot data model, and updating the hotspot data in the first storage device according to the re-established hotspot data model.
5. The method according to any one of claims 1 to 4, wherein expiry of the hotspot data model comprises: the life cycle of the hotspot data model exceeding a preset time, or the hit rate of the hotspot data in the first storage device being too low.
6. The method according to any one of claims 1 to 5, wherein the first storage device is a high-speed storage device, and establishing the hotspot data model based on the existing data records comprises:
extracting sample data records from the existing data records;
determining the number of hits of the sample data records; and
using the sample data records as the data source for establishing the hotspot data model, and establishing the hotspot data model based on the number of hits.
7. An apparatus for storing data, comprising:
an establishing module, configured to establish a hotspot data model based on existing data records;
a screening module, configured to filter hotspot data out of the existing data records or new data records according to the hotspot data model; and
a storage module, configured to store the filtered hotspot data in a first storage device.
8. The apparatus according to claim 7, wherein the storage module further stores, in a second storage device, the data records among the existing data records that are not hotspot data, wherein the storage rate of the first storage device is higher than the storage rate of the second storage device.
9. The apparatus according to claim 8, further comprising:
an optimization module, configured to optimize a query upon receiving a query request, so as to generate a corresponding execution plan; and
an obtaining module, configured to obtain data from the first storage device and the second storage device respectively according to the execution plan.
10. The apparatus according to any one of claims 7 to 9, wherein, when the hotspot data model has expired, the establishing module further repeats the process of establishing the hotspot data model and updates the hotspot data in the first storage device according to the re-established hotspot data model.
11. The apparatus according to any one of claims 7 to 10, wherein expiry of the hotspot data model comprises: the life cycle of the hotspot data model exceeding a preset time, or the hit rate of the hotspot data in the first storage device being too low.
12. The apparatus according to any one of claims 7 to 11, wherein the first storage device is a high-speed storage device, and the establishing module extracts sample data records from the existing data records, determines the number of hits of the sample data records, uses the sample data records as the data source for establishing the hotspot data model, and establishes the hotspot data model based on the number of hits.
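The flow recited in the method claims (sample the existing records, determine their hit counts, build a model, then route records to a fast or slow device) might be sketched as below. This is a minimal illustration under stated assumptions, not the claimed implementation: the claims do not specify the form of the hotspot data model, so here it is assumed to be the set of values of one record attribute whose sampled records have a high average hit count. The names `build_hotspot_model`, `is_hot`, `TieredStore`, `attr`, and `min_avg_hits` are all hypothetical, and the query optimizer of claim 3 is not modeled; `get` simply checks the fast tier first.

```python
import random
from collections import defaultdict


def build_hotspot_model(records, hit_counts, attr, min_avg_hits, sample_size, seed=0):
    """Return the set of `attr` values whose sampled records are hit often."""
    # Extract sample data records from the existing records.
    sample = random.Random(seed).sample(records, min(sample_size, len(records)))
    # Accumulate the number of hits per attribute value: value -> [hits, count].
    totals = defaultdict(lambda: [0, 0])
    for rec in sample:
        entry = totals[rec[attr]]
        entry[0] += hit_counts.get(rec["id"], 0)
        entry[1] += 1
    # Assumed model form: attribute values with a high average hit count.
    return {value for value, (hits, n) in totals.items() if hits / n >= min_avg_hits}


def is_hot(model, record, attr):
    """Classify a record, including a new one that has no hits yet."""
    return record[attr] in model


class TieredStore:
    """Two dictionaries standing in for the first and second storage devices."""

    def __init__(self):
        self.fast = {}  # first (high-speed) storage device
        self.slow = {}  # second storage device

    def put(self, record, model, attr):
        # Hotspot records go to the fast tier, all others to the slow tier.
        tier = self.fast if is_hot(model, record, attr) else self.slow
        tier[record["id"]] = record

    def get(self, record_id):
        # Simplified lookup: fast tier first, then slow tier.
        if record_id in self.fast:
            return self.fast[record_id]
        return self.slow.get(record_id)
```

Because the assumed model classifies by attribute rather than by a record's own hit history, a newly generated record can be placed in the fast tier before it has been accessed, which is the pre-identification behavior the description refers to.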
PCT/CN2011/080284 2011-09-28 2011-09-28 Method and apparatus for storing data WO2012149776A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN2011800020461A CN102388374A (en) 2011-09-28 2011-09-28 Method and device for data storage
PCT/CN2011/080284 WO2012149776A1 (en) 2011-09-28 2011-09-28 Method and apparatus for storing data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/080284 WO2012149776A1 (en) 2011-09-28 2011-09-28 Method and apparatus for storing data

Publications (1)

Publication Number Publication Date
WO2012149776A1 true WO2012149776A1 (en) 2012-11-08

Family

ID=45826495

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/080284 WO2012149776A1 (en) 2011-09-28 2011-09-28 Method and apparatus for storing data

Country Status (2)

Country Link
CN (1) CN102388374A (en)
WO (1) WO2012149776A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077221A (en) * 2012-12-29 2013-05-01 深圳先进技术研究院 Automatic placement device and method for mass data
CN103077219A (en) * 2012-12-29 2013-05-01 深圳先进技术研究院 Method and device for automatically storing data
CN103049559A (en) * 2012-12-29 2013-04-17 深圳先进技术研究院 Automatic mass data placement method and device
CN103312776A (en) * 2013-05-08 2013-09-18 青岛海信传媒网络技术有限公司 Method and device for caching contents of videos by edge node server
CN104217004B (en) * 2014-09-15 2017-10-13 中国工商银行股份有限公司 The monitoring method and device of a kind of database focus of transaction system
CN106202092B (en) 2015-05-04 2020-03-06 阿里巴巴集团控股有限公司 Data processing method and system
CN105138541B (en) * 2015-07-08 2018-02-06 广州酷狗计算机科技有限公司 The method and apparatus of audio-frequency fingerprint matching inquiry
CN108664516A (en) * 2017-03-31 2018-10-16 华为技术有限公司 Enquiring and optimizing method and relevant apparatus
CN107463514B (en) * 2017-08-16 2021-06-29 郑州云海信息技术有限公司 Data storage method and device
CN107728952A (en) * 2017-10-31 2018-02-23 郑州云海信息技术有限公司 A kind of prediction type data migration method and system
CN110866063B (en) * 2018-08-27 2023-10-31 阿里云计算有限公司 Data tracking processing method and device
CN110908974A (en) * 2018-09-14 2020-03-24 阿里巴巴集团控股有限公司 Database management method, device, equipment and storage medium
CN109739913A (en) * 2018-12-24 2019-05-10 北京明朝万达科技股份有限公司 A kind of hot spot data method for caching and processing and equipment based on configurableization
CN109976905B (en) * 2019-03-01 2021-10-22 联想(北京)有限公司 Memory management method and device and electronic equipment
CN112685634A (en) * 2020-12-29 2021-04-20 平安普惠企业管理有限公司 Data query method and device, electronic equipment and storage medium
CN113064930B (en) * 2020-12-29 2023-04-28 中国移动通信集团贵州有限公司 Cold and hot data identification method and device of data warehouse and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080082729A1 (en) * 2006-10-02 2008-04-03 Samsung Electronics Co. Ltd. Device driver including a flash memory file system and method thereof and a flash memory device and method thereof
US20090100244A1 (en) * 2007-10-15 2009-04-16 Li-Pin Chang Adaptive hybrid density memory storage device and control method thereof
CN101483668A (en) * 2009-02-10 2009-07-15 成都市华为赛门铁克科技有限公司 Network storage and access method, device and system for hot spot data
CN101604226A (en) * 2009-07-14 2009-12-16 浪潮电子信息产业股份有限公司 A kind of method that makes up raising performance of storage system in dynamic buffering pond based on virtual RAID
CN101788995A (en) * 2009-12-31 2010-07-28 成都市华为赛门铁克科技有限公司 Hotspot data identification method and device
CN102129472A (en) * 2011-04-14 2011-07-20 上海红神信息技术有限公司 Construction method for high-efficiency hybrid storage structure of semantic-orient search engine

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556582A (en) * 2008-04-09 2009-10-14 上海复旦光华信息科技股份有限公司 System for analyzing and predicting netizen interest in forum
CN101477556B (en) * 2009-01-22 2010-09-15 苏州智讯科技有限公司 Method for discovering hot spot in internet mass information
CN101488150B (en) * 2009-03-04 2011-01-05 哈尔滨工程大学 Real-time multi-view network focus event analysis apparatus and analysis method
CN101887440B (en) * 2009-05-13 2012-05-30 财团法人资讯工业策进会 Hot spot analytic system and method
CN101763401B (en) * 2009-12-30 2012-05-30 暨南大学 Network public sentiment hotspot prediction and analysis method

Also Published As

Publication number Publication date
CN102388374A (en) 2012-03-21

Similar Documents

Publication Publication Date Title
WO2012149776A1 (en) Method and apparatus for storing data
US9189506B2 (en) Database index management
JP5577350B2 (en) Method and system for efficient data synchronization
CN104182898B (en) The method that banking system carries out amended record to the on-line transaction occurred during night mode
WO2016004813A1 (en) Data storage method, query method and device
CN104301360B (en) A kind of method of logdata record, log server and system
JP6553649B2 (en) Clustering storage method and apparatus
CN105373541B (en) The processing method and system of the data operation request of database
WO2017096892A1 (en) Index construction method, search method, and corresponding device, apparatus, and computer storage medium
CN110362632A (en) A kind of method of data synchronization, device, equipment and computer readable storage medium
JP2010532522A5 (en)
EP3084704A1 (en) Long string pattern matching of aggregated account data
CN106294772A (en) The buffer memory management method of distributed memory columnar database
WO2019153506A1 (en) Provident fund transferring method, computer readable storage medium, terminal device and apparatus
US20220130426A1 (en) Method and system for long term stitching of video data using a data processing unit
CN109815344A (en) Network model training system, method, apparatus and medium based on parameter sharing
US20220131862A1 (en) Method and system for performing an authentication and authorization operation on video data using a data processing unit
Mohd Khairudin et al. Effect of temporal relationships in associative rule mining for web log data
CN103377292B (en) Database result set caching method and device
JP2022137281A (en) Data query method, device, electronic device, storage medium, and program
CN103365987A (en) Clustered database system and data processing method based on shared-disk framework
CN106462591B (en) Partition filtering using intelligent indexing in memory
CN107004002A (en) According to the set of structural data generation unstructured searching inquiry
US20190340646A1 (en) Method and apparatus for content-virality amplification
US20180181616A1 (en) Meta-join and meta-group-by indexes for big data

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201180002046.1

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11864869

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11864869

Country of ref document: EP

Kind code of ref document: A1