CN103412917A - Extensible database system and management method for coordinated management of data in multi-type field - Google Patents


Info

Publication number
CN103412917A
Authority
CN
China
Prior art keywords
data
field
database
module
data object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310343157XA
Other languages
Chinese (zh)
Other versions
CN103412917B (en
Inventor
陈宁江
肖中正
董世龙
胡丹丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanning super cube science and Technology Co Ltd
Original Assignee
Guangxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University filed Critical Guangxi University
Priority to CN201310343157.XA priority Critical patent/CN103412917B/en
Publication of CN103412917A publication Critical patent/CN103412917A/en
Application granted granted Critical
Publication of CN103412917B publication Critical patent/CN103412917B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses an extensible database system and a management method for coordinated management of data in the multi-type field. The extensible database system for the coordinated management of the data in the multi-type field comprises a data resource ontology base module, a hierarchical type field database module, a network type field database module and a field data evolution module, wherein the data resource ontology base module, various type databases, the hierarchical type field database module and the network type field database module are formed into a database set together. According to the extensible database system for the coordinated management of the data in the multi-type field, a huge number of data storage bases which face business fields are established, an extensible data resource ontology base system is established on the above basis, hierarchical type field databases and network type field databases in fields of different types are rapidly expanded, and new data objects are extracted from non-structured original textual data to establish new field data.

Description

An extensible database system and management method for coordinated management of multi-type field data
Technical Field
The present invention relates to an extensible database system and management method for coordinated management of multi-type field data, and belongs to the fields of databases and artificial intelligence.
Background Art
A database is a warehouse that organizes, stores, and manages data according to a data structure, and serves as the general-purpose data processing system of an organization or an application. With the development of information technology and the market, data management is no longer just the storage and management of data; it has turned into whatever forms of data management users need. Databases come in many types, from the simplest tables that store various data to large database systems capable of mass data storage, and all are widely used in many fields. With the acceleration of informatization and the arrival of the "big data" era, enterprise data is increasingly massive, unstructured, and complex. The combination of two computer technologies, artificial intelligence and databases, has promoted the development of intelligent databases. A general application program encodes problem-solving knowledge implicitly in the program, whereas a system based on an intelligent database expresses the problem-solving elements of the application explicitly and forms them into a separate, relatively independent program entity.
As informatization accelerates, enterprises pay more and more attention to managing massive, complex data, but they often run into the following problems during resource management: massive enterprise data is difficult to store and manage; searching is slow and inefficient; version management of field data is chaotic; data security lacks guarantees; and the field databases cannot be shared and coordinated effectively. Managing massive, complex, unstructured data therefore requires an intelligent database that is extensible, can evolve, and supports coordinated management across multiple types of fields to store, process, and analyze the data.
Summary of the Invention
The technical problem solved by the present invention is the efficient storage organization, indexing, and querying of unstructured data with multiple complex relations; to this end, an extensible database system and management method for coordinated management of multi-type field data are provided.
The technical solution of the present invention is an extensible database system for coordinated management of multi-type field data, comprising a data resource ontology library module, a network-type field database module, a hierarchical field database module, and a field data evolution module, wherein:
The data resource ontology library module defines the top-level data resource model, implements the logical view design and node storage structure design of primitives, provides the basic supporting capability for data storage and querying, and establishes a database containing a large number of business data objects, relations, and concepts. It provides top-level data abstraction rules and data access rules for the network-type field database module and the hierarchical field database module;
The network-type field database module builds, on the basis of the data resource ontology library and according to the attributes, relation networks, and other special attributes of data objects, a database based on network-type data attributes. It implements the data structure design, storage design, and index design of network-type data objects, forms a relation network containing a large number of network-type data objects, and provides an external access interface to the network-type database. The network-type field data module inherits the data resource ontology library and instantiates it for network-type field data; it provides query interfaces based on network-type field data to users, other modules, and external systems;
The hierarchical field database module builds, according to the membership, adjacency, intersection, same-level, and other relations among hierarchical data objects, a database dedicated to expressing data objects and their hierarchical membership information, and provides an external access interface to that database. The hierarchical field database module is a further evolution of the network-type field data module: field data that has only a hierarchical structure is stored and organized as a tree to implement hierarchical semantics, and query interfaces based on hierarchical field data are provided to users, other modules, and external systems;
The field data evolution module tracks and controls the changes that field data in the data resource ontology library, the network-type field database, and the hierarchical field database undergo during use, and establishes a version history for data objects. It analyzes the raw data sets provided by users in combination with existing data, so that new field data is obtained, screened, and entered into the field databases. The field data evolution module provides record-based data version control for the above three libraries, automatically discovers new field data in the raw data entered by users, and uses the libraries' interfaces to carry out the corresponding evolution management.
The data resource ontology library module comprises a data persistence module, a bottom-layer dictionary building module, a relation definition module, a data index module, and an interface module;
The data persistence module defines interface-oriented implementation methods, so that the data persistence implementation can be configured flexibly according to different hardware environments, context environments, and other requirements. Based on object serialization, it defines the serialization and deserialization protocols of the objects related to field data; during data persistence, the binary stream obtained by serializing an object is output to a file, a database, or a network location according to the file organization protocol. When an object that is not in the object buffer pool needs to be loaded, the corresponding data stream is read according to the logical address sent by the upper-layer request, and the object is then reconstructed through its deserialization protocol. Data in a file is logically organized in blocks, and the blocks are managed with a heap structure. The data persistence module of the data resource ontology library is also the data persistence abstraction for the network-type field database and the hierarchical field database: the data storage functions of those two modules customize and extend this persistence module according to different persistence protocols, forming persistence libraries for specific data types;
The bottom-layer dictionary building module stores data objects without extended attributes or relations, and establishes the basic field data objects, the serialization protocol, the deserialization protocol, and the storage manager. The bottom-layer dictionary has a single data form and provides the data basis for defining network-type field data and hierarchical field data. The network-type field database module and the hierarchical field database module implement the serialization and deserialization interfaces defined here;
The relation definition module, on the basis of the bottom-layer dictionary, uses new files to establish synonym, antonym, and membership relations for the entries of the bottom-layer data object database. Its highly abstract, general relation definition, organization, storage, and management allow the network-type field database to be extended flexibly on this basis;
The data index module first defines a digest for each field data object and then, through a fast dual-coding algorithm, maps the field data digest to the logical storage information of the field data object, achieving fast retrieval and access control. The network-type field database module and the hierarchical field database module both contain an index part, in which each keyword is converted by dual coding into a pair of long integers;
The interface module is implemented based on the EJB 3.0 specification and published in the form of EJB interfaces and Web Service interfaces to provide cross-platform services. The network-type field database and the hierarchical field database implement customized interface publishing by inheriting the interface module of the data resource ontology library.
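The persistence path described for the data persistence module (serialize an object to a binary stream, store the stream in blocks, and rebuild the object from a logical address) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names BlockStore, save, and load, the block size, and the use of Python's pickle as the serialization protocol are all assumptions for illustration, and heap-based block management and the object buffer pool are not modeled.

```python
import pickle

BLOCK_SIZE = 64  # bytes per block; hypothetical value chosen for illustration


class BlockStore:
    """In-memory stand-in for a block-organized data file (hypothetical)."""

    def __init__(self):
        self.blocks = []  # each entry is one fixed-size block of bytes

    def save(self, obj):
        """Serialize obj and write the stream into consecutive blocks.

        Returns a logical address: (first block number, stream length).
        """
        stream = pickle.dumps(obj)
        first = len(self.blocks)
        for start in range(0, len(stream), BLOCK_SIZE):
            chunk = stream[start:start + BLOCK_SIZE]
            # Pad the last chunk so every block has the same size.
            self.blocks.append(chunk.ljust(BLOCK_SIZE, b"\x00"))
        return first, len(stream)

    def load(self, address):
        """Read the blocks named by a logical address and rebuild the object."""
        first, length = address
        count = -(-length // BLOCK_SIZE)  # ceiling division
        stream = b"".join(self.blocks[first:first + count])[:length]
        return pickle.loads(stream)
```

A caller would keep the address returned by save and pass it to load later; a file-backed variant would seek to first * BLOCK_SIZE instead of indexing a list.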
The network-type field database module is implemented as follows:
(1) in the storage design, on the basis of the storage management layer of the data resource ontology library, define the field data objects and persistence protocols related to network-type field data;
(2) on the basis of the defined network-type field data objects, establish the storage layer and define the basic structure and processing of the attribute part. The attribute part is divided into two parts: attributes that exist when the database is designed, called basic attributes, and user-defined attributes, called extended attributes;
(3) on the storage structure of network-type field data objects, establish the data index, implemented with a B-tree with a fast cache area and a Bloom filter. The B-tree is generated dynamically as network-type data objects are inserted, and its maximum number of levels is not limited. For network-type data objects with the same name, the attribute blocks are linked by pointers into an attribute block linked list, so that when a network-type data object is queried with its name as the keyword, the list of same-name network-type data objects is obtained quickly;
(4) after the data index is implemented, handle updates to data records by implementing checkpoints and log files, which ensures efficient access and high fault tolerance of the system.
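The index described in step (3) combines a Bloom filter with a name-keyed index in which same-name objects are chained together. The following Python sketch shows the idea under stated assumptions: a plain dictionary stands in for the B-tree, a Python list stands in for the attribute block linked list, and the class names, filter size, and hash construction are hypothetical rather than taken from the patent.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; size and hash count are illustrative choices."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class NameIndex:
    """Name index; a dict stands in for the B-tree (assumption)."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.by_name = {}  # name -> attribute blocks of same-name objects

    def insert(self, name, attribute_block):
        self.bloom.add(name)
        self.by_name.setdefault(name, []).append(attribute_block)

    def lookup(self, name):
        # The filter lets most lookups of unknown names skip the index.
        if not self.bloom.might_contain(name):
            return []
        return list(self.by_name.get(name, []))
```

The filter never produces false negatives, so skipping the index when might_contain returns False cannot lose a stored object.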
The hierarchical field database module is implemented as follows:
(1) in the storage design, on the basis of the storage management layer of the data resource ontology library, define the field data objects and persistence protocols related to the hierarchical field database;
(2) on the basis of the hierarchical field data objects and the persistence protocol extended for the hierarchical field data structure, carry out the storage design of the hierarchical structure;
(3) after the storage layer is completed, organize the relational structure among hierarchical data objects with a restricted binary tree. The keyword in the index file is the dual-coding number pair of a data object; because this number pair is unique, collisions need not be considered. In the attribute file, parent data objects and child data objects are also stored as number pairs. During retrieval, the number pair of the data object is computed and matched against the identical number pair in the index file to obtain the pointer to the corresponding attribute, and the attribute is read; if there are multiple child data objects, all subordinate data objects can be found by following each attribute's pointer to the next attribute;
(4) after the index is completed, build a simplified log file suited to the hierarchical structure on the basis of the checkpoint and log file functions of the network-type field database module.
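Step (3) can be pictured as a first-child / next-sibling layout keyed by number pairs: each attribute record points to its parent, its first child, and the next child of the same parent. The Python sketch below is only an illustration. The patent does not specify the dual-coding algorithm, so number_pair here uses two zlib checksums as a stand-in, which does not guarantee the uniqueness the patent assumes; HierarchyIndex and its field names are likewise hypothetical.

```python
import zlib


def number_pair(name):
    """Stand-in for the dual-coding number pair (assumption, not the
    patented algorithm): two independent checksums of the name."""
    data = name.encode("utf-8")
    return (zlib.crc32(data), zlib.adler32(data))


class HierarchyIndex:
    """Index file and attribute file merged into one dict for brevity."""

    def __init__(self):
        # number pair -> attribute record
        self.records = {}

    def add(self, name, parent=None):
        key = number_pair(name)
        record = {"name": name, "parent": None,
                  "first_child": None, "next_sibling": None}
        self.records[key] = record
        if parent is not None:
            parent_key = number_pair(parent)
            parent_record = self.records[parent_key]
            record["parent"] = parent_key
            # Push the new child onto the head of the parent's child chain.
            record["next_sibling"] = parent_record["first_child"]
            parent_record["first_child"] = key
        return key

    def children(self, name):
        """Follow the next-attribute pointers to list all subordinates."""
        key = self.records[number_pair(name)]["first_child"]
        result = []
        while key is not None:
            record = self.records[key]
            result.append(record["name"])
            key = record["next_sibling"]
        return result
```

Because every node needs only two pointers, a tree with any number of children per node is stored as a binary structure, which is one plausible reading of the "restricted binary tree" in the text.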
The field data evolution module is implemented as follows:
(1) first collect the user activity records of each database and monitor changes in the activity level of field data objects;
(2) analyze the collected activity-change data of data objects, and place field data objects whose activity is below the system threshold into a warning standby library;
(3) further analyze the user activity records, and for each field data object whose core attributes have changed, establish a version change record for that data object;
(4) the system also analyzes text data provided by users or obtained from the Internet and builds a huge data object analysis library. After new data is entered into the data object analysis library, reading of the version information of the associated data objects is triggered; then, by analyzing the relation between the data object and the versions of the associated data objects, the probability that the current data object is new field data is calculated, and the new field data is revised automatically or manually by the user and added to the corresponding field database.
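Steps (1) to (3) amount to counting accesses per data object, flagging objects below a threshold, and recording a new version whenever a core attribute changes. The Python sketch below illustrates this under assumptions: the function and class names, the form of the activity log (a list of accessed object names), and the rule for what counts as a core attribute are hypothetical, and the probability calculation of step (4) is not modeled.

```python
from collections import Counter


def low_activity_objects(activity_log, known_objects, threshold):
    """Return objects accessed fewer than `threshold` times, i.e. the
    candidates for the warning standby library."""
    counts = Counter(activity_log)
    return sorted(name for name in known_objects if counts[name] < threshold)


class VersionHistory:
    """Keeps one snapshot per change to a core attribute (hypothetical)."""

    def __init__(self, core_attributes):
        self.core = set(core_attributes)
        self.versions = {}  # object name -> list of attribute snapshots

    def record(self, name, attributes):
        """Store a new version if a core attribute changed.

        Returns True when a version was stored, False otherwise.
        """
        history = self.versions.setdefault(name, [])
        if history:
            last = history[-1]
            changed = any(last.get(k) != attributes.get(k) for k in self.core)
            if not changed:
                return False
        history.append(dict(attributes))
        return True
```

Changes to non-core attributes are deliberately ignored here, matching the text's restriction to objects "whose core attributes have changed".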
A database management method for extensible coordinated management of multi-type field data is implemented in the following steps:
(1) preprocess the text data file provided by the user, removing non-core field data including stop words, modal particles, and punctuation marks, to obtain preprocessed text data;
(2) input the preprocessed data output by step (1) into the LDA probability model and match it against the established data model to obtain the field-related data objects in it;
(3) build a suffix tree from the field-related data objects output by step (2) and merge it with the existing suffix tree; traverse the merged suffix tree step by step to obtain high-frequency character strings, and then initialize a field-related data object;
(4) input the field-related data object obtained in step (3) into the data resource ontology library for type and relation judgment and matching, obtaining the type of this field-related data object (hierarchical, network-type, or user-defined) and the other field data objects associated with it;
(5) input the field-related data object output by step (4) and its associated data into the field database of the corresponding type, establish a data change log record, and input the field-related data object into the dual-coding algorithm to obtain the corresponding index number pair;
(6) combine the number pair obtained in step (5) with the data object and its associated field data objects, and finally output a field data object that is field-related and has multiple relations and multiple attributes.
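Steps (1) and (3) can be illustrated with a short Python sketch that removes stop words and punctuation and then collects high-frequency strings. Two simplifications are assumed and should be read as such: the stop-word list is a hypothetical English example, and plain n-gram counting is swapped in for the suffix tree traversal described in the text. The LDA matching of step (2) is not modeled.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would load a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "is", "in"}


def preprocess(text):
    """Lower-case the text, drop punctuation, and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def frequent_phrases(tokens, max_len=2, min_count=2):
    """Count word n-grams up to max_len and keep the frequent ones.

    This replaces the suffix tree of the original method with simple
    n-gram counting; the result is the same kind of candidate list.
    """
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}
```

Each surviving phrase would then be a candidate field-related data object to pass on to the type and relation matching of step (4).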
Compared with the prior art, the present invention has the following advantages:
(1) the present invention is based on storage techniques using block storage, heap management, and multiple log groups, ensuring efficient and safe bottom-layer storage;
(2) based on an extensible design method, field databases for multiple sub-fields can be extended on top of the data resource ontology library through user-defined data relations;
(3) the present invention implements automatic detection, analysis, and extraction for large volumes of text data, and evolves new field data from the obtained data objects on the basis of the latest version of the data resource ontology library.
Brief Description of the Drawings
Fig. 1 is a block diagram of the composition of the system of the present invention;
Fig. 2 is a schematic diagram of the concept relations in the data resource ontology library module of the present invention;
Fig. 3 is a schematic diagram of the triple basic data model of the data resource ontology library module of the present invention;
Fig. 4 is a schematic diagram of the index of the data resource ontology library of the present invention;
Fig. 5 is a schematic diagram of the access process of the data resource ontology library module of the present invention;
Fig. 6 is a flow chart of defining a new field name in the data resource ontology library of the present invention;
Fig. 7 is a flow chart of inserting a new relation into the data resource ontology library of the present invention;
Fig. 8 is a schematic diagram of the process of retrieving a data object from the data resource ontology library of the present invention;
Fig. 9 is a schematic diagram of the basic structure and relations of the network-type database of the present invention;
Fig. 10 is a schematic diagram of the basic structure and relations of the hierarchical field database of the present invention;
Fig. 11 is a flow chart of querying the hierarchical field database of the present invention;
Fig. 12 is a flow chart of creating a new field data object in the hierarchical field database of the present invention;
Fig. 13 is a representation of the LDA model of the field data evolution module of the present invention;
Fig. 14 is a schematic diagram of the version control structure of field data evolution of the present invention.
Detailed Description of the Embodiments
Through efficient data storage organization methods, the present invention achieves effective storage and querying of data objects and relations, and can discover new data objects and field data in the mass data provided by users. It implements the data resource ontology library, the network-type field database, and the hierarchical field database, and allows field databases to be extended.
The present invention comprises a data resource ontology library module, a hierarchical field database module, a network-type field database module, and a field data evolution module. The data resource ontology library module, the hierarchical field database module, and the network-type field database module together form the database set, which supports information extraction and data object retrieval, establishes the various related data object libraries that the field database system requires, and provides the basic supporting capability for data storage and querying in coordinated management of multi-type field data. The field data evolution module tracks and controls changes in field data, implementing the evolution of the field databases themselves and the version management of field data.
1. Data resource ontology library module
Definition of attribute: the properties and relations of a thing are called the attributes of that thing. The peculiar attributes of a class of things are abstracted from concrete things; for example, human speech and thought are peculiar attributes. The occasional attributes of a class of things are attributes that some things in the class have but not all; for example, a person's skin color and nationality are occasional attributes.
Definition of concept: a concept is a form of thought that reflects the peculiar attributes (inherent or essential attributes) of a thing. Concepts are abstract and general. Concepts may be true or false; a true concept is one that correctly reflects the peculiar attributes of a thing. The intension of a concept is the peculiar attributes of the things that the concept reflects. The extension of a concept is the set of things with those peculiar attributes that the concept reflects. An "is-a" relation is generally used to express that one concept model is an extension of another concept model. A relation is an extension of a concept, and a role is also an extension of a concept. A concept must be clear, that is, it must be made clear in terms of both intension and extension. According to whether its extension is one thing or many, a concept can be classified as a singular concept or a general concept. The extension of a singular concept is a single unique thing, such as a specific time or a specific place, whereas the extension of a general concept can include many things, as with concepts such as "city" and "commodity": "city" includes many specific cities, and "commodity" is likewise the set of a large number of physical commodities. "is" and "is-a" are both singular concepts. Concepts can also be divided into collective and non-collective concepts: a collective concept reflects an aggregate, and a non-collective concept does not. Concepts can also be divided into positive and negative concepts: a positive concept reflects things that have a certain attribute, and a negative concept reflects things that lack a certain attribute. Concepts are further divided into relative and absolute concepts: a relative concept reflects things that have a certain relation, and an absolute concept reflects things that have a certain property.
There are four basic relations between concepts, as shown in Fig. 2:
(1) Identity relation: if all a are b and all b are a, then a and b have an identity relation. If two concepts have an identity relation, their extensions are the same.
(2) Membership relation: if all b are a but some a are not b, then a stands in a superior relation to b, and b stands in a subordinate relation to a.
(3) Cross relation: if some a are b, some a are not b, and some b are not a, then a and b have a cross relation.
(4) Disparate relation: if no a is b, then a and b have a disparate relation.
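If the extension of each concept is modeled as a set of things, the four relations above correspond directly to set equality, proper subset, overlap, and disjointness. The following Python sketch makes that mapping explicit; the function name concept_relation and the returned labels are hypothetical.

```python
def concept_relation(a, b):
    """Classify the relation between two concepts given their extensions.

    `a` and `b` are iterables of the things each concept covers.
    """
    a, b = set(a), set(b)
    if a == b:
        return "identity"            # all a are b and all b are a
    if b < a:
        return "a superior to b"     # all b are a, some a are not b
    if a < b:
        return "a subordinate to b"  # the membership relation reversed
    if a & b:
        return "cross"               # partial overlap in both directions
    return "disparate"               # no a is b
```

For example, with extensions {"Nanning", "Guilin"} for "city in Guangxi" and {"Nanning", "Guilin", "Beijing"} for "city", the first concept is subordinate to the second.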
The basic data model of the triple "Concept A-Relation-Concept B" is the foundation of the field database: it is both the basic logical structure and the basic physical storage structure (Fig. 3). A "concept" is the basic element of the field database; it expresses people's cognition of a thing and is the form of thought that reflects the peculiar attributes of the thing. A concept can be an object that really exists or a human imagining or design, and it can be an attribute or an activity. The logical meaning of the triple is that the "Relation" holds between "Concept A" and "Concept B"; within the triple, "Concept A" has the owner role and "Concept B" has the participant role. "Concept A" and "Concept B" can also own or participate in other relations. Relations thus build a very complex network structure over n concepts, and the complexity of this structure is determined by the user or the actual operating environment. By level of abstraction, a "concept" can be divided into "class" (Class) and "individual" (Individual). A "relation" establishes a connection between concepts and can be divided into "relations between classes", "relations between a class and an individual", and "relations between individuals". In the data model design, every "relation" is uniformly treated as a kind of "concept"; therefore everything in the field database is a "concept", and each "concept" has a unique identifier. All field data can be expressed in this concept-relation-concept form. If the data of a field is very complex, its representation is likely to be a very complex network structure, yet its threads remain clear. Much of the field data in the world is directly or indirectly connected, and with this representation of field data and relations, associated data is easy to find. Unlike a relational data structure, this concept-relation structure can hold almost any relation and data, and new relations and data can be modified and created at will, whereas in a relational database the relations between data are no longer changed once the database design is complete. In this respect, the concept-relation way of organizing data has a marked advantage for pattern search and data mining.
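The triple model can be sketched as a small in-memory store in which relations are themselves concepts and every concept receives a unique identifier. The Python code below is an illustration under assumptions: ConceptGraph, relate, and related are hypothetical names, and identifiers are simply consecutive integers.

```python
import itertools


class ConceptGraph:
    """Stores (owner, relation, participant) triples of concept ids."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.concepts = {}    # concept name -> unique id
        self.triples = set()  # (owner id, relation id, participant id)

    def concept(self, name):
        """Return the id of a concept, creating the concept if needed."""
        if name not in self.concepts:
            self.concepts[name] = next(self._ids)
        return self.concepts[name]

    def relate(self, owner, relation, participant):
        # The relation name is registered as a concept like any other.
        self.triples.add((self.concept(owner),
                          self.concept(relation),
                          self.concept(participant)))

    def related(self, owner, relation):
        """List the participants of `relation` owned by `owner`."""
        o = self.concepts.get(owner)
        r = self.concepts.get(relation)
        names = {cid: name for name, cid in self.concepts.items()}
        return sorted(names[p] for (a, rel, p) in self.triples
                      if a == o and rel == r)
```

Adding a new kind of relation requires no schema change here: calling relate with a new relation name is enough, which is the flexibility the paragraph contrasts with a fixed relational design.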
1.1 Description of the data resource ontology library structure
In the data resource ontology library, the ontology data stored in the library has clear concepts, concise wording, and a single meaning. One ontology entry expresses only one concept and consists mainly of nouns and noun phrases. Some data objects have specific relations, for example synonym, antonym, and membership relations. First, an underlying database is designed that contains data objects without extended attributes or relations; it serves as the basis for implementing the synonym, antonym, and membership relations. Another important part is the in-memory index, whose entries consist of an object hash value and a disk block address. The object hash value is a number; if two data objects collide, that is, their hash values are equal, they are stored in the same disk block, so that a block can hold multiple records.
Disk file: disk space of a specified size is divided into blocks. A block holds multiple records; if the records exceed the capacity of a block, a pointer links to the next block. A program maps the source data object file and writes it to a binary data file. The data file is organized in data blocks, and a data block stores data objects with the same hash value, that is, objects whose hashes collide. For each data object, the block stores multiple attribute fields. Other fields can be added if necessary, for example the address field of an object's synonym data objects in the synonym database. To look up a data object, the hash function computes its hash value, and the mapping then leads directly to the relative physical block number of the data object; the database manager and the object manager then read the data and unpack the block. The method provides a database fast cache area, which avoids loading large numbers of data objects into memory, saves memory space, and makes access efficient.
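The block layout described above (a hash value selects a block, colliding objects share the block, and a full block points to an overflow block) can be sketched in Python as follows. The class name HashedBlockFile, the bucket count, the block capacity, and the use of zlib.crc32 as the hash function are assumptions for illustration; the fast cache area is not modeled.

```python
import zlib

NUM_BUCKETS = 8     # number of home blocks; hypothetical value
BLOCK_CAPACITY = 2  # records per block; hypothetical value


class HashedBlockFile:
    """In-memory model of a block file addressed by hash value."""

    def __init__(self):
        # Each block holds records plus a pointer to an overflow block.
        self.blocks = [{"records": [], "next": None}
                       for _ in range(NUM_BUCKETS)]

    def _home_block(self, name):
        return zlib.crc32(name.encode("utf-8")) % NUM_BUCKETS

    def insert(self, name, fields):
        """Store a record in its home block or in the overflow chain."""
        number = self._home_block(name)
        while True:
            block = self.blocks[number]
            if len(block["records"]) < BLOCK_CAPACITY:
                block["records"].append((name, fields))
                return number
            if block["next"] is None:
                # The block is full: allocate and link an overflow block.
                self.blocks.append({"records": [], "next": None})
                block["next"] = len(self.blocks) - 1
            number = block["next"]

    def find(self, name):
        """Hash to the home block, then scan it and its overflow chain."""
        number = self._home_block(name)
        while number is not None:
            block = self.blocks[number]
            for stored, fields in block["records"]:
                if stored == name:
                    return fields
            number = block["next"]
        return None
```

A lookup touches only the home block and its overflow chain, which is why the text describes going "directly" from the hash value to the block number.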
Independent data objects: a data object that has no relation to other data objects. A query returns only the data object itself and shows no other relations. This part uses the underlying database directly, since the underlying database does not implement relations between data objects. The logical storage structure of field data comprises the following (see Fig. 4):
(1) Index: the value computed from the data object by the hash function, used to locate the address of the data object.
(2) Block address: the relative physical address of the block that stores the data object.
(3) Data object record: the data object itself.
Data objects with relations: the relations include synonym, antonym, and membership relations. They are implemented on top of the underlying database, using the data object addresses it provides.
Data object relations are implemented as follows. On the basis of the underlying database, an index table file and a record set file are established. In physical storage, each of the two files is divided into contiguous blocks of a specific size; here one block is referred to as a record, and each record has a unique number (the first record is numbered 0, the second is numbered 1, and so on). In the index table file, an index record contains a positive-association group number, an inverse-association group number, a subordinate data object group number, and the relative physical address of the superior data object (the group numbers all refer to record numbers in the record set file). In the record set file, each record has two markers that store the numbers of its preceding and following records, and it can hold a certain number of relative physical addresses of data objects; when the space of a record is used up, a following record can be allocated as needed. To add a positive-association (or inverse-association or subordinate-association) data object to a data object, the hash function is first applied to the data object to produce its unique relative physical address; an index record is then allocated for the data object in the index table file, and the index record number is recorded, according to the data object address, at a specific position in the block data management layer. Next, a record is allocated in the record set file; this record is dedicated to storing the relative physical addresses of the data object's positive-association (or inverse-association or subordinate-association) data objects, and the newly allocated record number is written into the data object's index record. The relative physical address of the associated data object is then added to the corresponding record in the record set file. When a subordinate-association data object is added, the relative physical address of its superior data object must also be set in the index record of the subordinate data object. After a positive association, inverse association, or subordinate association has been deleted, the corresponding record set file records and index records are checked; if the content of a record or of an index record is empty, its number is noted and its relation to the corresponding data object is removed. The freed disk space can then be reclaimed, which improves disk space utilization.
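The interplay of the index table file and the record set file can be sketched in Python with two in-memory structures: one index record per data object that holds the first record number of each association group, and fixed-capacity records chained by a following-record pointer. All names (RelationStore, add, targets), the record capacity, and the use of integers as relative physical addresses are assumptions; the preceding-record marker and the reclamation of empty records are omitted for brevity.

```python
RECORD_CAPACITY = 2  # addresses per record; hypothetical value


class RelationStore:
    """Models the index table file and the record set file in memory."""

    def __init__(self):
        # address -> index record (group numbers and superior address)
        self.index = {}
        # record set: each record holds addresses and a following record
        self.records = []

    def _index_record(self, address):
        return self.index.setdefault(address, {
            "positive": None, "inverse": None,
            "subordinate": None, "superior": None})

    def _allocate(self):
        self.records.append({"addresses": [], "next": None})
        return len(self.records) - 1

    def add(self, address, kind, target):
        """Add `target` to the `kind` association group of `address`."""
        entry = self._index_record(address)
        if entry[kind] is None:
            entry[kind] = self._allocate()
        number = entry[kind]
        while True:
            record = self.records[number]
            if len(record["addresses"]) < RECORD_CAPACITY:
                record["addresses"].append(target)
                break
            if record["next"] is None:
                # The record is full: allocate a following record.
                record["next"] = self._allocate()
            number = record["next"]
        if kind == "subordinate":
            # The subordinate object also records its superior's address.
            self._index_record(target)["superior"] = address

    def targets(self, address, kind):
        """Collect all addresses in one association group."""
        entry = self.index.get(address)
        number = entry[kind] if entry else None
        result = []
        while number is not None:
            record = self.records[number]
            result.extend(record["addresses"])
            number = record["next"]
        return result
```

Because a group grows by chaining records, a data object can accumulate any number of associations without the index record itself changing size.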
1.2 Defining the database
This module exists mainly to make the data object library easier to manage and extend. Only after a field name has been defined can a data object in the library add that field and its field value. If a field name is modified, every data object in the library that has the field is automatically updated to the new field name, while the corresponding field value stays unchanged. If a defined field name is deleted, every data object in the database that has the field automatically loses the field and its field value. Relation names behave in the same way: once a relation name has been defined, data objects in the database can establish that relation; if a relation name is modified, the relation between data objects that have it is automatically renamed; if a defined relation name is deleted, the relation is automatically removed between all data objects that have it. The default field names and relation names defined by the database cannot be modified or deleted; user-defined field names and relation names can be added, modified, and deleted by the user. Fig. 5 shows the abstract architecture and access process of the data resource ontology library. Taking the definition of a field name as an example, Fig. 6 shows the execution flow. The steps are as follows:
(1) The client interface service component sends a request to define a field name to the server's service component.
(2) The server calls the field database manager's method for defining a field name. The database manager calls the database access object to check whether the field name is already defined, and the database access object returns the check result. If the database has already defined the field name, integer 0 is returned to the server, the server returns operation result 0 to the client, and the process ends.
(3) The database manager sends the database access object a request to read the old checkpoint in the log (the end position of the valid section of the log).
(4) The database access object returns the old checkpoint, which is assigned to startPos. The database manager sends the abstract data access object a request to read the current valid backup version number of the database. The database access object returns the result, and the database manager assigns it to the variable version. The database registry is then copied; the copy is called the new registry.
(5) If no field name is recorded in the new registry, the field name is added directly to the new registry with code 1. Otherwise, the current maximum field name code in the new registry is read. If the new registry contains a spare field name code i smaller than the current maximum field name code, that code is assigned to the new field name; otherwise the current maximum field name code is incremented by 1 and assigned to the new field name. The current maximum field name code is then reset.
(6) The current maximum field name code in the registry is set to 1. A log record object is created. Each member variable of the log object is assigned, including the data it carries and the action to be performed, and the log record is then written to the log file.
(7) The new valid checkpoint in the log file (the new checkpoint for short) is returned and written to the log file header. The log record content newly written to the log file is then written to the data file (commit for short).
(8) The commit result is returned. If the commit fails or an error occurs, the database manager returns operation result -1 to the server and the process ends. If the commit succeeds, the size of the log file is checked, and the log file is reset if it exceeds a certain size. The database manager returns operation result 1 to the server, and the server returns operation result 1 to the client: the field name has been defined successfully.
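The code allocation in step (5) and the result codes 0, 1, and -1 can be sketched as follows. This is a simplified illustration, not the patented code: the `Registry` class, the list used as a log, and the `commit` callback are hypothetical, and checkpoints and backup versions are omitted.

```python
from typing import Callable, Dict, List, Tuple


class Registry:
    """Hypothetical registry mapping field names to integer codes."""

    def __init__(self) -> None:
        self.codes: Dict[str, int] = {}
        self.max_code = 0

    def copy(self) -> "Registry":
        new = Registry()
        new.codes = dict(self.codes)
        new.max_code = self.max_code
        return new

    def allocate(self, name: str) -> int:
        if not self.codes:
            code = 1                                  # empty registry: first code is 1
        else:
            used = set(self.codes.values())
            spare = [c for c in range(1, self.max_code) if c not in used]
            # Reuse a spare code below the maximum, otherwise take maximum + 1.
            code = spare[0] if spare else self.max_code + 1
        self.codes[name] = code
        self.max_code = max(self.codes.values())      # reset the current maximum
        return code


LogRecord = Tuple[str, str, int]


def define_field_name(
    registry: Registry,
    log: List[LogRecord],
    name: str,
    commit: Callable[[LogRecord], bool] = lambda record: True,
) -> Tuple[int, Registry]:
    """Return (result, registry): 0 = already defined, 1 = success, -1 = commit failed."""
    if name in registry.codes:
        return 0, registry
    new_registry = registry.copy()                    # work on a copy of the registry
    record = ("define_field", name, new_registry.allocate(name))
    log.append(record)                                # write the log record first
    if not commit(record):                            # then commit to the data file
        return -1, registry                           # failure keeps the old registry
    return 1, new_registry
```

Working on a copy of the registry means a failed commit leaves the original registry untouched, which mirrors the role of the new registry in step (4).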
1.3 Managing data object information
This module is mainly responsible for managing data object information and provides functions such as adding, deleting, and modifying it. When a data object is destroyed, all of its information (including the fields it has and its relations with other data objects) is removed completely from the database. There are three ways to add a data object to the database. The first is to add the data object directly, with any accompanying information. The second arises when a field is added for a data object: if the data object does not yet exist in the database, the database adds it automatically and then adds the new field information for it. The third arises when a relation is established between two data objects: if one or both of them do not exist in the database, the system first adds the missing data objects automatically and then establishes the relation. When a field is added to or modified for a data object, the field name must be one defined in the database; likewise, when a relation is established between two data objects, the relation name must be one defined in the database. This module can also set the frequency of a data object, and database files can be imported and exported. The following takes inserting a data object field as an example to outline the operation flow; Fig. 7 is the sequence diagram for inserting a data object field. The steps in the sequence diagram are as follows:
(1) The user service interface sends the server a request to add a field to keyword; the field value is content.
(2) The server calls the database manager's method for adding a field and its content to a data object.
(3) The database manager calls the dual encoder to compute the dual code of keyword.
(4) The dual encoder returns the dual-code object key of keyword (the index key for short). If the result is empty, go to step 5; otherwise go to step 7.
(5) Computing the dual code of keyword has failed; the database manager returns operation result -1 to the server.
(6) The server returns operation result -1 to the client, declaring that the requested operation failed, and goes to step 40.
(7) The database manager sends the abstract data access object a request for the code of the field name in the registry.
(8) The database access object returns the code of the field name. If the return value is not null (that is, the field name is defined), go to step 11.
(9) The database manager returns operation result 0 to the server, declaring that the database has not defined the field name fieldName, so the field and its content (field value) cannot be inserted for keyword.
(10) The server returns operation result 0 to the client, declaring that the database has not defined the field name fieldName and that the requested operation failed.
(11) The database manager sends the database access object a request for the index value of index key key in the index table.
(12) The database access object returns the index value value mapped by index key key. If value is empty, keyword does not exist in the database; go to step 13. Otherwise go to step 16.
(13) keyword is added to the database and the result (an integer) is returned. If adding fails, go to step 6; otherwise go to step 14.
(14) A request for the index value of index key key in the index table is sent to the database access object again.
(15) The database access object returns the index value value of index key key.
(16) The database manager sends the database access object a request to read the old checkpoint in the log (the end position of the valid section of the log).
(17) The database access object returns the old checkpoint, and the database manager assigns the result to the variable startPos.
(18) The database manager sends the abstract data access object a request to read the current valid backup version number of the database.
(19) The database access object returns the result, and the database manager assigns it to the variable version.
(20) The database registry is copied; the copy is called the new registry.
(21) The database manager sends the database access object a request to read the byte data of keyword.
(22) The database access object returns a carrier holding the byte data of keyword and its set of disk addresses (called the data car).
(23) The database manager sends the data processing factory a request to process the byte data of keyword into a visible information object.
(24) The data processing factory returns the keyword content carrier to the database manager, which inspects it. If the field name to be added and its field value already exist, go to step 25; otherwise go to step 27.
(25) The database manager returns operation result 2 to the server, meaning that the content to be added already exists.
(26) The server returns operation result 2 to the client, meaning that the content to be added already exists.
(27) fieldName and content are added to the keyword content carrier.
(28) The database manager sends the data processing factory a request to convert the data of keyword into byte data.
(29) The data processing factory returns the byte data of the keyword content to the database manager, which puts it into the data car.
(30) Based on the data in the data car and the information in the new registry, the addresses of the required disk blocks are reallocated and the new registry is updated.
(31) A new log record object is created, and the data car, the new registry, and the action to be performed are loaded into it.
(32) The database manager sends the database access object a request to write the new log record object to the log file.
(33) The database access object returns the new valid checkpoint of the log file (the new checkpoint for short) to the database manager.
(34) The database manager sends the database access object a request to write the new checkpoint to the log file header; the database access object fulfils the request.
(35) The database manager sends the database access object a request to write the log record content newly written to the log file into the data file (commit for short).
(36) The database access object returns the commit result. If the commit succeeds, go to step 38; otherwise go to step 37.
(37) If the commit fails or any of the steps above throws an exception, go to step 6.
(38) The database manager returns operation result 1 to the server.
(39) The server returns operation result 1 to the client: the field has been inserted successfully.
(40) End.
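The control flow of the forty steps reduces to four result codes. The sketch below shows only that decision logic, under simplifying assumptions: the class `FieldDatabase`, the `encode` stand-in for the dual encoder, and the in-memory dictionaries are hypothetical, and checkpoints, the data car, and disk block allocation are not modelled.

```python
from typing import Dict, List, Optional, Set, Tuple

FAILED, UNDEFINED_FIELD, OK, ALREADY_EXISTS = -1, 0, 1, 2


class FieldDatabase:
    def __init__(self, defined_fields: Set[str]) -> None:
        self.defined_fields = set(defined_fields)      # field names in the registry
        self.objects: Dict[str, Dict[str, str]] = {}   # index key -> fields
        self.log: List[Tuple[str, str, str, str]] = []

    def encode(self, keyword: str) -> Optional[str]:
        # Assumed stand-in for the dual encoder; None models an encoding failure.
        return keyword.strip() or None

    def insert_field(self, keyword: str, field_name: str, content: str) -> int:
        key = self.encode(keyword)
        if key is None:                                # steps 4-6: encoding failed
            return FAILED
        if field_name not in self.defined_fields:      # steps 7-10: undefined field name
            return UNDEFINED_FIELD
        fields = self.objects.setdefault(key, {})      # steps 12-15: add missing object
        if fields.get(field_name) == content:          # steps 24-26: already present
            return ALREADY_EXISTS
        self.log.append(("insert_field", key, field_name, content))  # steps 31-32
        fields[field_name] = content                   # step 35: commit
        return OK                                      # steps 38-39
```

Note that, as in steps 12 to 15, a data object that does not yet exist is added before the field content is checked.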
1.4 Retrieving data object information
The retrieval functions are as follows:
(1) Check data object existence: check whether a data object exists in the database;
(2) Retrieve the data packet of a data object: retrieve all visible information of the data object (fields, field values, relations, related objects) and wrap it in a data packet for transmission over the network or by other means;
(3) Retrieve a data object field value: retrieve the field value (field content) of a field of a data object;
(4) Retrieve related data objects: retrieve all related objects of a relation of a data object;
(5) Retrieve by field name: either by a single field or by a combination of two fields. Single-field retrieval returns the data packets of all data objects that have a given field; two-field combined retrieval returns the data packets of all data objects that have both of two given fields;
(6) Retrieve by relation name: retrieve the data packets of all data objects that have a given relation;
(7) Backward matching retrieval: retrieve the data packets of all data objects that begin with a given keyword;
(8) Fuzzy matching retrieval: retrieve the data packets of all data objects that contain a given keyword;
(9) Retrieve high-frequency words: retrieve a specified number of data objects or data object packets in descending order of frequency;
(10) Retrieve low-frequency words: retrieve all data objects whose frequency is below a given value;
(11) Retrieve frequency: retrieve the frequency (the number of times it has been retrieved) of a data object;
(12) Retrieve all field names defined in the database;
(13) Retrieve all relation names defined in the database.
Fig. 8 shows the sequence diagram for retrieving a data object packet. The steps are as follows:
(1) The client sends the server a request to retrieve the data packet of the data object keyword.
(2) The server calls the database searcher's method for retrieving a data object packet.
(3) The database manager calls the dual encoder to compute the dual code of keyword.
(4) The dual encoder returns the dual-code object key of keyword (the index key for short).
(5) The database searcher sends the database access object a request for the index value corresponding to key in the index table.
(6) The database access object returns to the database searcher the index value value mapped by key. If value is null, keyword does not exist in the database; go to step 7. Otherwise go to step 9.
(7) The database searcher returns the retrieval result null to the server.
(8) The server returns the retrieval result null to the client and goes to step 14.
(9) The frequency of keyword is incremented by 1; the database searcher then sends the database access object a request to update the frequency of keyword in the database index table, and the database access object fulfils the request.
(10) The database searcher sends the database access object a request to update the frequency of keyword in the data file, and the database access object fulfils the request.
(11) The database searcher calls its own method to retrieve the data packet of keyword, using the first address of keyword on disk.
(12) The database searcher returns the data packet of keyword to the server.
(13) The server returns the data packet of keyword to the client.
(14) End.
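The retrieval sequence, together with a few of the retrieval functions listed earlier in this section, can be sketched in memory as follows. The class and method names are hypothetical, and plain dictionaries stand in for the index table and data file.

```python
from typing import Dict, List, Optional


class DataObjectStore:
    def __init__(self) -> None:
        self.packets: Dict[str, Dict[str, str]] = {}   # keyword -> fields
        self.frequency: Dict[str, int] = {}            # keyword -> times retrieved

    def add(self, keyword: str, fields: Dict[str, str]) -> None:
        self.packets[keyword] = dict(fields)
        self.frequency.setdefault(keyword, 0)

    def retrieve_packet(self, keyword: str) -> Optional[dict]:
        if keyword not in self.packets:                # steps 6-8: not found, return null
            return None
        self.frequency[keyword] += 1                   # steps 9-10: update the frequency
        return {                                       # steps 11-13: build the packet
            "keyword": keyword,
            "fields": dict(self.packets[keyword]),
            "frequency": self.frequency[keyword],
        }

    def high_frequency(self, count: int) -> List[str]:
        # Descending frequency; ties are broken alphabetically for a stable order.
        ranked = sorted(self.frequency, key=lambda k: (-self.frequency[k], k))
        return ranked[:count]

    def prefix_match(self, prefix: str) -> List[str]:
        # Data objects that begin with the given keyword.
        return sorted(k for k in self.packets if k.startswith(prefix))

    def fuzzy_match(self, fragment: str) -> List[str]:
        # Data objects that contain the given keyword anywhere.
        return sorted(k for k in self.packets if fragment in k)
```

A linear scan is used for the matching functions only to keep the sketch short; the source does not state how these functions are implemented internally.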
2. Network-type field database module:
2.1 Basic principle
The storage of each network-type field dataset is divided into an index part and an attribute part: the index part is stored in the name.dct file and the attribute part in the attr.dct file. The name index of network-type data is an unrestricted B-tree that is generated dynamically as data objects are inserted.
Pointers are used in the index file: N pointers are defined, corresponding to the N commonly used Chinese characters in the GB2312-80 encoding. Each pointer points to the root of the B-tree that holds the names whose first character is that character. All names with the same first character are stored in the same B-tree.
During retrieval, the name can be used as the key to retrieve network-type data; the key is computed by a Hash function from the GB2312-80 code of the name. Retrieval first locates the B-tree and then finds the name with the B-tree search algorithm. Names in the index file and attributes in the attribute file correspond one to one: if the summary information of a data object is found in the index file, the attributes of that network-type data object are certain to exist in the attribute file. The summary information in the index file serves as the search key, and the position of the corresponding network-type data attributes in the attribute file is the search result. Once the summary information has been found in the index file, the attributes can be read directly from the attribute file through the attribute pointer stored with it. Operations on the attribute file are therefore very fast, and most of the time is spent searching the index file. The storage and lookup of summary information in the index file use a hash algorithm and a B-tree algorithm; the algorithm is based on disk retrieval, and addressing follows pointers directly, so its efficiency is high.
2.2 Index storage structure
In the index file, the storage and lookup of the summary information of network-type data use a hash algorithm and a B-tree algorithm. Here the first character of the summary information is called the "lead character" and the remainder is called the "suffix". First, a position table with N entries is created in the index file; each entry consists of a single character and its GB2312-80 value. The address of an entry in the position table is obtained simply by computing the key value of its character with the Hash function. The entry also stores a pointer to the root of a B-tree, which stores the suffixes of the summary information.
When the name of a network-type data object is stored, the B-tree corresponding to the lead character of the summary information is first located in the position table, and the suffix is then inserted into that B-tree. Lookup is similar to storage: the B-tree is located in the position table by the lead character of the summary information, and the suffix is then searched for in the B-tree.
Structure of the B-tree: the B-tree stores the suffixes of the summary information, with the suffix as the key. To reduce the number of disk reads, a B-tree of order n is used according to actual needs. Each node of the B-tree contains the following information:
$(n, C_0, A_1, K_1, C_1, A_2, K_2, C_2, \ldots, A_n, K_n, C_n, \mathrm{Father})$
where n is the number of keys in the node; $K_i$ ($i = 1, \ldots, n$) are the keys (suffixes of summary information), with $K_i < K_{i+1}$ ($i = 1, \ldots, n-1$); $C_i$ ($i = 0, \ldots, n$) are pointers to the root nodes of subtrees, where all keys in the subtree pointed to by $C_{i-1}$ are less than $K_i$ ($i = 1, \ldots, n$) and all keys in the subtree pointed to by $C_n$ are greater than $K_n$; $A_i$ ($i = 1, \ldots, n$) are pointers into the attribute file, each pointing to the position in the attribute file of the attributes of the summary information whose lead character is the character of the B-tree containing the node and whose suffix is $K_i$; Father is a pointer to the parent node.
To look up a data object, the entry address of the lead character of the given summary information in the position table is first computed with the Hash function; the entry is then read to find the root address of the B-tree for that character, and the suffix of the summary information is searched for in the B-tree. The time consumed by a lookup is the time of the Hash computation plus the time of the B-tree search, so the search algorithm is efficient. With memory mapping, the index file does not need to be read into memory as a whole; only the nodes that are needed are read in, which greatly reduces disk read time and improves memory utilization.
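The two-stage lookup (position table by lead character, then an ordered search over suffixes) can be sketched as below. Several things here are assumptions rather than details from the source: the slot formula derived from the two GB2312 bytes, the table size of 72 × 94 slots, and above all the sorted list searched with `bisect`, which only stands in for the on-disk B-tree.

```python
import bisect
from typing import List, Optional, Tuple

ZONES, CELLS = 72, 94   # GB2312 rows 16-87 hold the Chinese characters


def lead_slot(ch: str) -> int:
    """Map a GB2312 Chinese character to its slot in the position table."""
    raw = ch.encode("gb2312")
    if len(raw) != 2:
        raise ValueError("lead character must be a two-byte GB2312 character")
    slot = (raw[0] - 0xB0) * CELLS + (raw[1] - 0xA1)
    if not 0 <= slot < ZONES * CELLS:
        raise ValueError("character outside the Chinese character rows")
    return slot


class NameIndex:
    def __init__(self) -> None:
        # One entry per lead character: (sorted suffixes, attribute-file offsets).
        self.table: List[Optional[Tuple[List[str], List[int]]]] = [None] * (ZONES * CELLS)

    def insert(self, name: str, attr_offset: int) -> None:
        slot = lead_slot(name[0])
        if self.table[slot] is None:
            self.table[slot] = ([], [])
        suffixes, offsets = self.table[slot]
        suffix = name[1:]
        i = bisect.bisect_left(suffixes, suffix)
        if i < len(suffixes) and suffixes[i] == suffix:
            offsets[i] = attr_offset            # name already present: update pointer
        else:
            suffixes.insert(i, suffix)          # keep the suffixes ordered
            offsets.insert(i, attr_offset)

    def find(self, name: str) -> Optional[int]:
        entry = self.table[lead_slot(name[0])]
        if entry is None:
            return None
        suffixes, offsets = entry
        i = bisect.bisect_left(suffixes, name[1:])
        if i < len(suffixes) and suffixes[i] == name[1:]:
            return offsets[i]                   # position of the attributes
        return None
```

The returned offset plays the role of the pointer $A_i$: it tells the caller where in the attribute file to read, so no further searching is needed there.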
2.3 Attribute storage structure of network-type data
Apart from the name, all attributes of a network-type data object are stored in attribute files. Attributes fall into two parts: those that exist when the database is designed, called basic attributes, are stored in the basic attribute file; user-defined attributes, called extended attributes, are stored in the extended attribute file. Basic attributes are stored as attribute blocks: the attributes of one data object are stored in one attribute block, and the basic attribute blocks are placed in the basic attribute file in the order in which the data objects were inserted. Extended attributes are stored as a linked list. The storage structure of the attribute files is shown in Fig. 9. A basic attribute block consists of the basic attributes of the data object, pointer a, pointer b, and so on. Pointer a points to the attribute block of a data object with the same name, and pointer b points to the first address of the data object's extended attributes in the extended attribute file.
To look up the attributes of a network-type data object, the name of the data object is first searched for in the index file. When the name is found, the node that stores the name also holds the pointer to the position of the data object's attribute block in the attribute file. The basic attributes are read directly from that address in the attribute file, and the extended attributes are then read from the extended attribute file through the pointer in the attribute block. Retrieval in the attribute files is therefore very efficient.
3. Hierarchical field database module:
3.1 Basic principle
From the characteristics of the relations between hierarchical data, four kinds of relations can be identified: subordination, adjacency, intersection, and co-reference. Subordination is the main relation. A hierarchical data object may have a superior and may also have subordinates; here each hierarchical data object is defined to report directly to the level above it or to lead the level below it. This design takes into account that many hierarchical data objects have completely identical basic attributes.
Each hierarchical data object can be uniquely identified by using it as the database key. The key is a number pair computed by a Hash function based on dual coding and serves as the logical address of the data object. The key of each data object corresponds one to one to its storage address on disk, so looking up a data object only requires computing its key value with the Hash function, which yields its logical address; the data access task is then handed to the object manager and the log manager. This method avoids search matching: the time is spent mainly on computing the hash value, and the data blocks do not all have to be loaded into memory, since only the required data object is read in. The approach is therefore feasible in terms of both execution efficiency and memory utilization. Because addressing follows pointers directly, retrieval of data objects is also efficient.
3.2 Structure of the index file
The index file structure mainly contains the following fields:
Key is the key value computed by the Hash function; it is unique for each data object;
The Father/Son fields represent the subordination relation: Father holds the Key of the hierarchical data object one level up, and Son holds the Key of the hierarchical data object one level down;
The Neighbour field represents the adjacency relation: it holds the Key of a hierarchical data object that is adjacent in the relation structure, and it may hold more than one Key;
The Cross field represents the intersection relation: it holds the Key of a hierarchical data object that has an intersection relation in the relation structure, and it may likewise hold more than one Key;
The Co-ref field represents the co-reference relation: it holds the Key of a hierarchical data object that semantically refers to the same thing, and it may also hold more than one Key;
The 0/1 field indicates whether this hierarchical data object has a duplicate name, that is, the same name with an entirely different meaning. A value of 0 means there is no duplicate name; a value of 1 means there is one, and the field is followed by the Fathers field, which records all superior data objects that contain this data object;
The Fathers field records all superior data objects that contain this hierarchical data object when a duplicate name occurs, so it may also hold more than one Key. It is NULL if there is no duplicate name.
Each of the above fields occupies four bytes.
3.3 Structure of the data file
The data file stores the hierarchical data objects themselves; combined with the index file, it supports access operations on the data objects. Logically, the data file is a linear list of data objects; the entries of the list (data entries) are linked through pointers to the fields of the structure, as shown in Fig. 10. $W_i$ and $W_j$ are each the entry of a hierarchical data object. The pointers in the Father, Son, Neighbour, Cross, Co-ref, and Fathers fields point respectively to the superior data object, the subordinate data object, the adjacent data objects, the intersecting data objects, the co-referent data objects, and, when basic attributes are completely identical, all superior data objects of this data object. Two such data objects can be distinguished by their superior data objects.
To look up a data object, it is enough to compute its address with the Hash function and load it into memory; the organizational structure of this data object and its related data objects can then be built easily. No retrieval matching is required, so the retrieval time is spent mainly on the Hash computation; the algorithm is time-efficient, and the load factor of the Hash table is above 0.8.
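The index fields and the handling of duplicate names can be sketched as follows. This is an illustration under stated assumptions: `zlib.crc32` stands in for the dual-coding Hash function, `NO_KEY = 0` is a hypothetical marker for an empty field, multi-valued fields are kept as in-memory lists, and only the four single-valued fields are packed into four bytes each.

```python
import struct
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NO_KEY = 0   # hypothetical marker for "no object"; a real key of 0 is ignored here


def key_of(name: str) -> int:
    # Assumed stand-in for the Hash function based on dual coding.
    return zlib.crc32(name.encode("utf-8"))


@dataclass
class IndexEntry:
    key: int
    father: int = NO_KEY                                  # Key of the superior object
    son: int = NO_KEY                                     # Key of the subordinate object
    neighbour: List[int] = field(default_factory=list)    # adjacent objects
    cross: List[int] = field(default_factory=list)        # intersecting objects
    co_ref: List[int] = field(default_factory=list)       # co-referent objects
    duplicate: int = 0                                    # the 0/1 field
    fathers: List[int] = field(default_factory=list)      # used when duplicate == 1


class HierarchicalIndex:
    def __init__(self) -> None:
        self.entries: Dict[int, IndexEntry] = {}

    def add(self, name: str, father_name: Optional[str] = None) -> IndexEntry:
        k = key_of(name)
        fk = key_of(father_name) if father_name else NO_KEY
        entry = self.entries.get(k)
        if entry is None:
            entry = self.entries[k] = IndexEntry(key=k, father=fk)
        elif fk != entry.father and fk not in entry.fathers:
            # Same name under a different superior: record a duplicate name.
            if not entry.duplicate:
                entry.duplicate = 1
                entry.fathers = [entry.father]
            entry.fathers.append(fk)
        return entry

    def superiors(self, name: str) -> List[int]:
        entry = self.entries.get(key_of(name))
        if entry is None:
            return []
        return list(entry.fathers) if entry.duplicate else [entry.father]


def pack_fixed(entry: IndexEntry) -> bytes:
    """Pack Key, Father, Son and the 0/1 field as four 4-byte unsigned integers."""
    return struct.pack("<4I", entry.key, entry.father, entry.son, entry.duplicate)
```

A lookup computes `key_of(name)` and reads the entry directly; when the 0/1 field is set, the caller uses the list of superiors to tell the identically named objects apart.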
Fig. 11 and Fig. 12 show the related service algorithm flows of the hierarchical field database.
4. Field data evolution module:
4.1 Information extraction based on LDA
The first step of field data evolution is to analyze and process a large amount of valuable text information. The system uses text clustering based on the LDA (Latent Dirichlet Allocation) probabilistic generative model, which automatically groups similar texts in a text collection into classes and thereby helps to discover data of related fields. Texts are represented with the vector space model, and the text representation matrix usually has very high dimensionality, so that during clustering the similarity measure tends to lose its meaning because of the "curse of dimensionality". The LDA topic model has good text representation capability: it can mine the latent semantic information of texts, obtain the representation of documents in the topic space, and reduce the dimensionality of the document representation. Modeling the text in this way supports feature selection, topic classification, similarity judgment, and so on. The LDA model adopts the bag-of-words approach, which treats each text data resource as a word frequency vector and thus converts text information into numerical information that is easy to model.
The three-layer Bayesian model of LDA is shown in Fig. 13. $\Phi_k$ denotes the term probability distribution of topic k, and $\theta_m$ denotes the topic probability distribution of the m-th document; $\theta_m$ and $\Phi_k$ in turn serve as the parameters of the multinomial distributions used to generate topics and words respectively. K is the number of topics, M is the number of documents, $N_m$ is the length of the m-th document, and $\omega_{m,n}$ and $z_{m,n}$ denote the n-th word of the m-th document and its topic. $\alpha$ and $\beta$ are the parameters of the Dirichlet distributions; they are usually fixed and symmetric and are therefore expressed as scalars. $\Phi_k$ and $\theta_m$ both follow the Dirichlet distribution, whose distribution function is:
$$\mathrm{Dir}(\mu \mid \alpha) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}$$ (formula one)
where $0 \le \mu_k \le 1$, $\sum_{k=1}^{K} \mu_k = 1$, $\alpha_0 = \sum_{k=1}^{K} \alpha_k$, and $\Gamma$ is the gamma function. The generative process of LDA is as follows.
(a) for each topic $k \in [1, K]$, sample the term distribution $\Phi_k \sim \mathrm{Dir}(\beta)$;
(b) for the m-th document in the corpus, $m \in [1, M]$:
(c) sample the topic probability distribution $\theta_m \sim \mathrm{Dir}(\alpha)$;
(d) sample the document length $N_m \sim \mathrm{Poiss}(\xi)$;
(e) for the n-th word in document m, $n \in [1, N_m]$:
(f) select the latent topic $z_{m,n} \sim \mathrm{Mult}(\theta_m)$;
(g) generate the word $\omega_{m,n} \sim \mathrm{Mult}(\Phi_{z_{m,n}})$.
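The generative process (a) to (g) can be written out with the Python standard library. This is a sketch for illustration only: the function names and the seed parameter are hypothetical, the Dirichlet draw uses normalized gamma variates, and the Poisson draw uses Knuth's multiplication method.

```python
import math
import random
from typing import List, Tuple


def dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    # A Dirichlet sample is a vector of gamma draws normalized to sum to 1.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def poisson(lam: float, rng: random.Random) -> int:
    # Knuth's method: multiply uniforms until the product drops below exp(-lam).
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def generate_corpus(
    K: int, V: int, M: int, alpha: float, beta: float, xi: float, seed: int = 0
) -> Tuple[List[List[float]], List[List[float]], List[List[int]]]:
    rng = random.Random(seed)
    phi = [dirichlet([beta] * V, rng) for _ in range(K)]            # (a)
    thetas, docs = [], []
    for _ in range(M):                                              # (b)
        theta = dirichlet([alpha] * K, rng)                         # (c)
        length = poisson(xi, rng)                                   # (d)
        words = []
        for _ in range(length):                                     # (e)
            z = rng.choices(range(K), weights=theta)[0]             # (f)
            words.append(rng.choices(range(V), weights=phi[z])[0])  # (g)
        thetas.append(theta)
        docs.append(words)
    return phi, thetas, docs
```

Words are represented by integer term indices in `[0, V)`, which matches the bag-of-words view in which a document is reduced to term counts.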
For the parameter estimation of LDA, the conditional probability of the topic sequence given the word sequence is computed first:
$$p(z \mid w) = \frac{p(w, z)}{\sum_{z} p(w, z)}$$ (formula two)
The topic sequence is then sampled with Gibbs sampling; the sampling formula is:
$$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) \propto \frac{n_{k,\neg i}^{(t)} + \beta_t}{\left[\sum_{\upsilon=1}^{V} n_k^{(\upsilon)} + \beta_\upsilon\right] - 1} \cdot \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\left[\sum_{z=1}^{K} n_m^{(z)} + \alpha_z\right] - 1}$$ (formula three)
Once the topic label z of each word $\omega$ has been obtained, the final parameters are computed as follows:
$$\Phi_{k,t} = \frac{n_k^{(t)} + \beta_t}{\sum_{\upsilon=1}^{V} n_k^{(\upsilon)} + \beta_\upsilon}$$
$$\theta_{m,k} = \frac{n_m^{(k)} + \alpha_k}{\sum_{z=1}^{K} n_m^{(z)} + \alpha_z}$$ (formula four)
Given the trained model M and a new document $\tilde{\vec{\omega}}$, the latent topic of each word is sampled with the following formula:
$$p(\tilde{z}_i = k \mid \tilde{\omega}_i = t, \tilde{\vec{z}}_{\neg i}, \tilde{\vec{\omega}}_{\neg i}; M) = \frac{n_k^{(t)} + \tilde{n}_{k,\neg i}^{(t)} + \beta_t}{\sum_{\upsilon=1}^{V} n_k^{(\upsilon)} + \tilde{n}_{k,\neg i}^{(\upsilon)} + \beta_\upsilon} \cdot \frac{n_{\tilde{m},\neg i}^{(k)} + \alpha_k}{\left[\sum_{z=1}^{K} n_{\tilde{m}}^{(z)} + \alpha_z\right] - 1}$$ (formula five)
where $\tilde{\vec{z}}$ denotes the topic vector corresponding to the new document $\tilde{\vec{\omega}}$.
With the Gibbs sampling method above, the topic label of each word is obtained; formula six then gives the value of the document on each topic component, which yields the representation of the document in the topic space.
$$\theta_{\tilde{m},k} = \frac{n_{\tilde{m}}^{(k)} + \alpha_k}{\sum_{z=1}^{K} n_{\tilde{m}}^{(z)} + \alpha_z}$$ (formula six)
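A compact collapsed Gibbs sampler shows how the counts in the sampling formula are maintained during training. It is a sketch under simplifying assumptions, not the system's implementation: the priors are symmetric scalars, documents are lists of integer term indices, and the function name and parameters are hypothetical. Because the counts of the current word are removed before sampling, the "minus one" in the denominators of the sampling formula is already accounted for.

```python
import random
from typing import List, Tuple


def gibbs_lda(
    docs: List[List[int]], K: int, V: int, alpha: float, beta: float,
    iterations: int = 50, seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]], List[List[int]]]:
    rng = random.Random(seed)
    n_mk = [[0] * K for _ in docs]        # words of document m assigned to topic k
    n_kt = [[0] * V for _ in range(K)]    # occurrences of term t assigned to topic k
    n_k = [0] * K                         # total words assigned to topic k
    z: List[List[int]] = []
    for m, doc in enumerate(docs):        # random initial topic for every word
        labels = []
        for t in doc:
            k = rng.randrange(K)
            labels.append(k)
            n_mk[m][k] += 1
            n_kt[k][t] += 1
            n_k[k] += 1
        z.append(labels)
    for _ in range(iterations):
        for m, doc in enumerate(docs):
            for i, t in enumerate(doc):
                k = z[m][i]               # remove the current word from the counts
                n_mk[m][k] -= 1
                n_kt[k][t] -= 1
                n_k[k] -= 1
                # The document-side denominator is the same for every topic,
                # so it is dropped from the unnormalized weights.
                weights = [
                    (n_kt[j][t] + beta) / (n_k[j] + V * beta) * (n_mk[m][j] + alpha)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[m][i] = k               # record the new label and restore counts
                n_mk[m][k] += 1
                n_kt[k][t] += 1
                n_k[k] += 1
    phi = [[(n_kt[k][t] + beta) / (n_k[k] + V * beta) for t in range(V)]
           for k in range(K)]
    theta = [[(n_mk[m][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for m, doc in enumerate(docs)]
    return phi, theta, z
```

The two list comprehensions at the end compute the term distribution of each topic and the topic distribution of each document from the final counts, in the same form as the parameter formulas.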
After the above steps, clustering can be carried out. After LDA has been used to select a certain proportion of features, the K-means algorithm is chosen to cluster the texts. The text clustering flow is as follows:
(1) Preprocess the original text, including word segmentation and stop-word removal;
(2) Perform feature selection with the LDA model;
(3) For the selected features, compute the weight of each feature in each text. The weight W(d, w) of a feature in a text is computed as:
$$W(d, w) = \frac{\log(tf(d, w) + 1) \times \log\left((M + 1) / (df(w) + 0.5)\right)}{\sum_{w'} \log(tf(d, w') + 1) \times \log\left((M + 1) / (df(w') + 0.5)\right)}$$ (formula seven)
where M is the total number of texts, tf(d, w) is the number of times term w occurs in text d, and df(w) is the document frequency of term w. Once the representation of the texts has been obtained, a vector space model can be generated.
(4) Choose initial points at random and use the K-means algorithm to obtain the final clustering result. The K-means clustering algorithm needs to measure the distance between texts, which is computed with cosine similarity. For two texts d and d', the similarity is:
$$\mathrm{sim}(d, d') = \frac{\sum_{w \in d, d'} W(d, w) \times W(d', w)}{\lVert d \rVert \times \lVert d' \rVert}$$ (formula eight)
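The feature weight and the cosine similarity can be computed as in the following sketch. The function names are hypothetical, texts are assumed to be already segmented into lists of terms, natural logarithms are assumed, and the norms are taken as Euclidean norms of the weight vectors.

```python
import math
from collections import Counter
from typing import Dict, List


def feature_weights(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight W(d, w) of every term in every text, normalized within each text."""
    M = len(docs)
    df = Counter(w for doc in docs for w in set(doc))   # document frequency
    weighted = []
    for doc in docs:
        tf = Counter(doc)                               # term frequency in this text
        raw = {
            w: math.log(tf[w] + 1) * math.log((M + 1) / (df[w] + 0.5))
            for w in tf
        }
        total = sum(raw.values())
        weighted.append({w: v / total for w, v in raw.items()} if total else {})
    return weighted


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse weight vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())  # shared terms only
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Since df(w) never exceeds M, the second logarithm is always positive, so every term that occurs in a text receives a positive weight and the weights of a non-empty text sum to 1.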
4.2 Field database version control
The field database version control module applies the ideas and methods of version control to manage and control the evolution of the database. Each evolution state of the data can be regarded as a version, and the module provides version generation, version recovery, version deletion, and similar functions. Specifically, because of modifications and evolution, a field database keeps evolving over time, and this module records that series of changes. On the one hand, it records the evolution history of specific field data, so that the user can inspect it at any time and restore the field data to a past version. On the other hand, the user can mark the database in a given state as a version, so that the whole database can be restored to that version at some later time. When needed, the user can delete a non-key version. The structure is shown in Fig. 14.
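Version generation, recovery, and deletion can be illustrated with full snapshots. This is an assumption made for brevity: the source does not say whether versions are stored as snapshots or as differences, and the class and method names below are hypothetical.

```python
import copy
from typing import Any, Dict, List, Set


class VersionedFieldDatabase:
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}                  # current state of the field data
        self.versions: Dict[int, Dict[str, Any]] = {}   # version number -> snapshot
        self.key_versions: Set[int] = set()             # versions that must be kept
        self._next = 1

    def mark_version(self, key: bool = False) -> int:
        """Record the current state as a new version and return its number."""
        number = self._next
        self._next += 1
        self.versions[number] = copy.deepcopy(self.data)
        if key:
            self.key_versions.add(number)
        return number

    def restore(self, number: int) -> None:
        """Restore the database to a previously recorded version."""
        self.data = copy.deepcopy(self.versions[number])

    def delete_version(self, number: int) -> bool:
        """Delete a version; key versions and unknown versions are refused."""
        if number in self.key_versions or number not in self.versions:
            return False
        del self.versions[number]
        return True

    def history(self) -> List[int]:
        return sorted(self.versions)
```

Deep copies keep each version independent of later changes to the live data, at the cost of storing the whole state for every version.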
4.3 Discovery of new data objects
Discovering new data objects requires the user to supply a large amount of base text data. The system analyzes these data with the LDA-based analysis model described above and builds a large data object analysis library. When new data are input into the data object analysis library, this triggers a read of the version information of the associated field data. By analyzing the relation between the data object and the versions of the associated field data, the system calculates the likelihood that the current data object is new field data; the new field data are then revised, either automatically or manually by the user, and added to the field database. The core structure of the data object analysis library is a suffix tree. Another important part is the directory monitoring module, which lets the system detect the arrival of new data automatically and then carry out the evolution processing automatically. Its processing method is as follows:
(1) The system starts and reads the configuration file to obtain the data source directory for field data evolution.
(2) The directory monitor (AutoDectector) is started to watch the data source directory for state changes. When a new file is added, the directory monitor detects the change and checks the file format; if the file is a text file, PDF file, HTML file, or Word document, it is read for analysis.
(3) A different file analyzer is implemented for each input file type: TxtAnalyzer, PdfAnalyzer, HtmlAnalyzer, and WordAnalyzer. PdfAnalyzer and WordAnalyzer are implemented with the open-source tool Apache POI. The file analyzer outputs text, or a text stream when the data volume is very large.
(4) The resulting text or text stream is input into the analysis library, that is, inserted into the current, most recent suffix tree. A change to the suffix tree triggers a check of the affected entries: once the frequency of an entry reaches the threshold t, the field data related to that entry are queried from the data resource ontology library, and the query result determines whether new basic field data are built.
(5) After a file has been analyzed, it is renamed to a file ending in ".analyzed" to distinguish it from files that have not been analyzed. The metadata file in the data source directory is then checked; if the number or total size of analyzed files has reached the upper limit, some of the earliest analyzed files are deleted.
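The five steps above can be sketched as a single polling pass over the data source directory. This is a simplified illustration, not the patent's AutoDectector: the function names `read_text` and `scan_once` and their parameters are hypothetical, a `Counter` of term frequencies stands in for the suffix tree, only text and HTML files are handled (PDF and Word parsing need third-party libraries and are omitted), and the ontology library query is replaced by returning the terms that reached the threshold.

```python
import re
from collections import Counter
from pathlib import Path

SUPPORTED = {".txt", ".html"}  # assumption: PDF and Word analyzers omitted


def read_text(path):
    """Stand-in for TxtAnalyzer/HtmlAnalyzer: return the plain text of a file."""
    text = path.read_text(encoding="utf-8", errors="ignore")
    if path.suffix.lower() == ".html":
        text = re.sub(r"<[^>]+>", " ", text)  # crude tag stripping
    return text


def scan_once(source_dir, counts, threshold, max_analyzed):
    """Analyze new files once; return terms whose frequency reached threshold."""
    source_dir = Path(source_dir)
    candidates = set()
    for path in sorted(source_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED:
            continue  # unsupported formats and already renamed files are skipped
        for term in re.findall(r"\w+", read_text(path).lower()):
            counts[term] += 1  # the Counter stands in for the suffix tree
            if counts[term] >= threshold:
                candidates.add(term)
        # Mark the file as analyzed so the next pass ignores it.
        path.rename(path.with_name(path.name + ".analyzed"))
    # Enforce the upper limit; file modification time approximates
    # "earliest analyzed" in this sketch.
    analyzed = sorted(source_dir.glob("*.analyzed"),
                      key=lambda p: p.stat().st_mtime)
    for old in analyzed[:max(0, len(analyzed) - max_analyzed)]:
        old.unlink()
    return candidates
```

Calling `scan_once` repeatedly with the same `Counter` (for example from a timer loop) approximates the monitoring behaviour; an event-based file watcher would avoid polling.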
Content not described in detail in this specification belongs to the prior art known to those skilled in the art.
The above is only the preferred embodiment of the present invention. It should be pointed out that those of ordinary skill in the art can make improvements and modifications without departing from the principles of the invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. An extensible database system for coordinated management of multi-type field data, characterized by comprising: a data resource ontology library module, a network-type field database module, a hierarchical field database module, and a field data evolution module, wherein:
The data resource ontology library module defines the top-level data resource model, implements the logical view design and node storage structure design of primitives, provides the basic supporting capability for data storage and query, and builds a database containing a large number of business data objects, relations, and concepts; the data resource ontology library module provides top-level data abstraction rules and data access rules for the network-type field database module and the hierarchical field database module;
The network-type field database module, according to the attributes, relation network, and other specific properties of data objects, builds a database based on network-type data attributes on top of the data resource ontology library, implements the data structure design, storage design, and index design of network-type data objects, forms a relation network containing a large number of network-type data objects, and provides a network-type database access interface to the outside; the network-type field database module is the inheritance and concrete implementation of the data resource ontology library for data belonging to the network-type field; it provides a query interface based on network-type field data to users, other modules, and external systems;
The hierarchical field database module, according to the characteristics of relations between hierarchical data objects such as membership, adjacency, intersection, and same level, builds a database dedicated to representing data objects and their hierarchical membership information, and provides an access interface to the data objects and their hierarchical membership database to the outside; the hierarchical field database module is a further evolution of the network-type field database module: field data that have only a hierarchical structure are stored and organized in the form of a tree to realize hierarchical semantics, and a query interface based on hierarchical field data is provided to users, other modules, and external systems;
The field data evolution module tracks and controls the changes that field data in the data resource ontology library, the network-type field database, and the hierarchical field database undergo during use, builds the data version history, and analyzes the raw data sets provided by the user in combination with existing data, so that new field data are obtained by screening and input into the field databases; the field data evolution module provides the above three libraries with data version control based on these records, automatically discovers new field data from the raw data input by the user, and uses the interfaces of those libraries to carry out the corresponding evolution management.
2. The extensible database system for coordinated management of multi-type field data according to claim 1, characterized in that the data resource ontology library module comprises a data persistence module, an underlying dictionary building module, a relation definition module, a data index module, and an interface module;
The data persistence module defines an interface-oriented implementation, so that data persistence can be configured flexibly according to the hardware environment, context environment, and other requirements; based on object serialization, it defines the serialization and deserialization protocols of the objects related to field data, and during persistence the binary stream obtained by serializing an object is output to a file, database, or network location according to the file organization protocol; when an object that has not been loaded into the object buffer pool is needed, the corresponding data stream is read at the logical address supplied by the upper-layer request, and the object is reconstructed through its deserialization protocol; data are organized logically in the file as blocks, and the blocks are managed with a heap structure; the data persistence module of the data resource ontology library is also the data persistence abstraction for the network-type field database and the hierarchical field database, whose data storage functions customize and extend this persistence module according to different persistence protocols, forming persistence libraries for specific data types;
The underlying dictionary building module stores data objects without extended attributes and relations, and establishes the basic field data objects, the serialization protocol, the deserialization protocol, and the storage manager; the underlying dictionary has a single data form and provides the data basis for defining network-type field data and hierarchical field data; the network-type field database module and the hierarchical field database module implement the serialization and deserialization interfaces so defined;
The relation definition module, on the basis of the underlying dictionary, establishes synonym, antonym, and membership relations for the entries of the underlying dictionary in a new file; the highly abstract and general definition, organization, storage, and management of relations allow the network-type field database to be extended flexibly on this basis;
The data index module defines a digest for each field data object and, through a fast dual-encoding algorithm, maps the field data digest to the logical storage information of the field data object, for the purpose of fast retrieval and access control; the network-type field database module and the hierarchical field database module both contain an index part, in which each keyword is represented by a pair of long integers computed by dual encoding;
The interface module is implemented on the EJB 3.0 standard and published as EJB interfaces and Web Service interfaces to provide cross-platform service; the network-type field database and the hierarchical field database inherit the interface module of the data resource ontology library to implement customized interface publication.
3. The extensible database system for coordinated management of multi-type field data according to claim 1, characterized in that the network-type field database module is implemented as follows:
(1) for storage design, on the basis of the storage management layer of the data resource ontology library, define the field data objects and persistence protocol related to network-type field data;
(2) on the basis of the defined network-type field data objects, carry out the storage design, first defining the basic structure and processing of the attribute part; the attribute part is divided into two parts: attributes that exist at database design time, called basic attributes, and user-defined attributes, called extended attributes;
(3) build the data index on the storage structure of network-type field data objects, implemented with a B-tree with a fast cache and a Bloom filter; the B-tree is generated dynamically as network-type data objects are inserted, and its maximum number of levels is not limited; when network-type data objects have the same keyword, their attribute blocks are linked by pointers into an attribute block linked list, so that a keyword query quickly returns the list of network-type data objects sharing that keyword;
(4) after the data index is implemented, for updates to data records, use checkpoints and log files to record data updates at the finest granularity, safeguarding efficient access and high fault tolerance of the system.
4. The extensible database system for coordinated management of multi-type field data according to claim 1, characterized in that the hierarchical field database module is implemented as follows:
(1) for storage design, on the basis of the storage management layer of the data resource ontology library, define the field data objects and persistence protocol related to the hierarchical field database;
(2) on the basis of the hierarchical field data objects and the persistence protocol extended for the hierarchical field data structure, build the storage for the hierarchical structure;
(3) after the storage layer is built, organize the relational structure between hierarchical data objects with a restricted binary tree; the keyword in the index file consists of the unique dual-encoded number pair of the data object, so conflicts need not be considered; in the attribute file, the hierarchy data of parent and child data objects are also stored as such number pairs; during retrieval, the number pair of the data object is computed and matched against the identical number pair in the index file to obtain the pointer to the corresponding attribute, and the attribute is read; if there are multiple child data objects, all subordinate data objects can be found by following each attribute's pointer to the next attribute;
(4) after the index is completed, on the basis of the checkpoint and log file functions of the network-type field database module, build a simplified log file suited to the hierarchical structure.
5. The extensible database system for coordinated management of multi-type field data according to claim 1, characterized in that the field data evolution module is implemented as follows:
(1) first collect the user activity records of each database and monitor changes in the activity level of field data objects;
(2) analyze the collected activity change data of the data objects, and move field data objects whose activity is below the system threshold into the early-warning standby library;
(3) further analyze the user activity records, and for each field data object whose core attributes have changed, create a version change record for that data object;
(4) the system analyzes text data provided by the user or taken from the Internet and builds a large data analysis library; when new data are input into the data analysis library, this triggers a read of the version information of the associated data objects; then, by analyzing the relation between the data object and the versions of the associated data objects, the probability that the current data object is new field data is calculated, and the new field data are revised, either automatically or manually by the user, and added to the corresponding field database.
6. A database management method for extensible coordinated management of multi-type field data, characterized by the following steps:
(1) preprocess the text data provided by the user, removing non-core field data including stop words, modal particles, and punctuation marks, to obtain preprocessed text data;
(2) input the preprocessed data output in step (1) into the LDA (Latent Dirichlet Allocation) probability model and match them against the established data model to obtain the domain-specific data objects they contain;
(3) build a suffix tree (Suffix Tree) from the domain-specific data objects output in step (2) and merge it with the existing suffix tree; traverse the merged suffix tree step by step to obtain high-frequency strings, and then initialize a domain-specific data object;
(4) input the domain-specific data object obtained in step (3) into the data resource ontology library to determine and match its type and relations, obtaining the type of the domain-specific data object, namely hierarchical, network-type, or user-defined, and the other field data objects associated with it;
(5) input the domain-specific data object output in step (4) and its associated data into the field database of the corresponding type, create the data change log record, and input the domain-specific data object into the dual-encoding algorithm to obtain the corresponding index number pair;
(6) combine the number pair obtained in step (5) with the data object and the associated field data objects, and finally output a field data object that is domain-specific and has multiple relations and multiple attributes.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310343157.XA CN103412917B (en) 2013-08-08 2013-08-08 Extensible database system and management method for coordinated management of data in multi-type field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310343157.XA CN103412917B (en) 2013-08-08 2013-08-08 Extensible database system and management method for coordinated management of data in multi-type field

Publications (2)

Publication Number Publication Date
CN103412917A true CN103412917A (en) 2013-11-27
CN103412917B CN103412917B (en) 2016-08-10

Family

ID=49605929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310343157.XA Active CN103412917B (en) 2013-08-08 2013-08-08 Extensible database system and management method for coordinated management of data in multi-type field

Country Status (1)

Country Link
CN (1) CN103412917B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5724577A (en) * 1995-06-07 1998-03-03 Lockheed Martin Corporation Method for operating a computer which searches a relational database organizer using a hierarchical database outline
CN102110165A (en) * 2011-02-28 2011-06-29 深圳市五巨科技有限公司 Method and system for scheduling interior of browser of mobile terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
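Chen Ningjiang et al.: "Middleware semantic cache replacement strategy driven by user access characteristics", Journal of Guangxi University (Natural Science Edition) *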

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183735A (en) * 2014-06-18 2015-12-23 阿里巴巴集团控股有限公司 Data query method and query device
CN105354266A (en) * 2015-10-23 2016-02-24 北京航空航天大学 Rich graph model RichGraph based graph data management method
CN108431780B (en) * 2016-01-19 2022-03-04 微软技术许可有限责任公司 Versioned record management computing system, method and computer readable storage device
CN108431780A (en) * 2016-01-19 2018-08-21 微软技术许可有限责任公司 Use the versioned record management for restarting the epoch
CN106326457A (en) * 2016-08-29 2017-01-11 山大地纬软件股份有限公司 Construction method and system of human society person portfolio database on the basis of big data
CN106326457B (en) * 2016-08-29 2019-04-30 山大地纬软件股份有限公司 The construction method and system of people society personnel file pouch database based on big data
CN106569941A (en) * 2016-11-04 2017-04-19 金蝶软件(中国)有限公司 Data process recording method and apparatus
CN106569941B (en) * 2016-11-04 2019-01-01 金蝶软件(中国)有限公司 The method and apparatus for recording data course
CN106682173A (en) * 2016-12-28 2017-05-17 华南理工大学 Social security big data OLAP pre-processing method and on-line analysis and query method
CN106682173B (en) * 2016-12-28 2019-10-18 华南理工大学 A kind of social security big data OLAP preprocess method and on-line analysis querying method
CN107133283A (en) * 2017-04-17 2017-09-05 北京科技大学 A kind of Legal ontology knowledge base method for auto constructing
CN109254962A (en) * 2017-07-06 2019-01-22 中国移动通信集团浙江有限公司 A kind of optimiged index method and device based on T- tree
CN110019474A (en) * 2017-12-19 2019-07-16 北京金山云网络技术有限公司 Synonymous data automatic correlation method, device and electronic equipment in heterogeneous database
CN108182265B (en) * 2018-01-09 2021-06-29 清华大学 Multilayer iterative screening method and device for relational network
CN108182265A (en) * 2018-01-09 2018-06-19 清华大学 For the Multilevel Iteration screening technique and device of relational network
CN109446175A (en) * 2018-11-12 2019-03-08 郑州云海信息技术有限公司 A kind of method and apparatus for the log object constructing key operation
CN110569327A (en) * 2019-07-08 2019-12-13 电子科技大学 multi-keyword ciphertext retrieval method supporting dynamic updating
CN110851848A (en) * 2019-11-12 2020-02-28 广西师范大学 Privacy protection method for symmetric searchable encryption
CN110851848B (en) * 2019-11-12 2022-03-25 广西师范大学 Privacy protection method for symmetric searchable encryption
CN111192176A (en) * 2019-12-30 2020-05-22 华中师范大学 Online data acquisition method and device supporting education informatization assessment
CN111192176B (en) * 2019-12-30 2023-04-28 华中师范大学 Online data acquisition method and device supporting informatization assessment of education
CN111897824A (en) * 2020-03-25 2020-11-06 上海云励科技有限公司 Data operation method, device, equipment and storage medium
CN111767332A (en) * 2020-06-12 2020-10-13 上海森亿医疗科技有限公司 Data integration method, system and terminal for heterogeneous data sources
CN112597348A (en) * 2020-12-15 2021-04-02 电子科技大学中山学院 Method and device for optimizing big data storage
CN112990601A (en) * 2021-04-09 2021-06-18 重庆大学 Data mining-based worm gear machining precision self-healing model and method
CN112990601B (en) * 2021-04-09 2023-10-31 重庆大学 Worm wheel machining precision self-healing system and method based on data mining
KR102392880B1 (en) * 2021-09-06 2022-05-02 (주) 바우디움 Method for managing hierarchical documents and apparatus using the same
KR102472345B1 (en) * 2021-09-06 2022-11-30 (주) 바우디움 Method for managing hierarchical documents and apparatus using the same
CN115048344A (en) * 2022-08-16 2022-09-13 安格利(成都)仪器设备有限公司 Storage method for three-dimensional contour and image data of inner wall and outer wall of pipeline or container
CN115543960A (en) * 2022-09-16 2022-12-30 北京神舟航天软件技术股份有限公司 Dynamic modeling method and system for business object
CN115543960B (en) * 2022-09-16 2024-01-05 北京神舟航天软件技术股份有限公司 Dynamic modeling method and system for business object

Also Published As

Publication number Publication date
CN103412917B (en) 2016-08-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20191017

Address after: No. 1089, building a, No. 19, Guokai Avenue, Nanning City, 530000 Guangxi Zhuang Autonomous Region

Patentee after: Nanning super cube science and Technology Co Ltd

Address before: 530004 No. 100, University Road, the Guangxi Zhuang Autonomous Region, Nanning

Patentee before: Guangxi University