CN104317937A - Massive Chinese data query method and system based on oracle database - Google Patents

Info

Publication number
CN104317937A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410602655.6A
Other languages
Chinese (zh)
Inventor
姜连海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Si Tech Information Technology Co Ltd
Original Assignee
Beijing Si Tech Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-10-31
Filing date
2014-10-31
Publication date
2015-01-28
Application filed by Beijing Si Tech Information Technology Co Ltd
Priority to CN201410602655.6A
Publication of CN104317937A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis

Abstract

The invention relates to a massive Chinese data query method and system based on an Oracle database. The method includes the following steps: 1, a text index based on a Chinese field is created at the Oracle database layer; 2, two jobs are created, one for adding new data in the Oracle database to the text index, and the other for optimizing the text index of all data in the Oracle database; 3, the two jobs are executed at the Oracle database layer; 4, based on the created text index, SQL (Structured Query Language) statements are used to query the Chinese field in the Oracle database. The text index is built on the Chinese field, and the jobs then periodically index new data and optimize the existing index. The approach is therefore efficient, easy to maintain, and suitable for big data queries.

Description

A massive Chinese data query method and system based on an Oracle database
Technical field
The present invention relates to the interaction between browsers and databases for large-volume data queries in the IT industry, and in particular to a massive Chinese data query method and system based on an Oracle database.
Background
Many systems need fuzzy queries on Chinese text. In a system's address management, for example, the data volume may be on the order of tens of millions of address records, and these Chinese records are frequently retrieved by name. The traditional fuzzy query uses the form LIKE '%xxxx%', which performs a full table scan. This basically meets user needs when the data volume is small, but with millions or even tens of millions of records it becomes practically unusable because queries are extremely slow. The present invention therefore proposes a massive Chinese data query method and system suitable for large data queries.
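To make the contrast concrete, the two query styles can be sketched as follows. The table and column names (dbcustadm.danalogmsgadd, addr_msg) are the ones used in the embodiment of this description; the search term 'xxxx' is a placeholder, and the second query assumes that a CONTEXT text index already exists on addr_msg.

```sql
-- Traditional fuzzy query: the leading wildcard prevents the use of an
-- ordinary B-tree index, so Oracle typically scans the whole table.
SELECT *
  FROM dbcustadm.danalogmsgadd
 WHERE addr_msg LIKE '%xxxx%';

-- Text-index query: CONTAINS is evaluated against the Oracle Text index
-- (assumption: a CONTEXT index has already been created on addr_msg).
SELECT *
  FROM dbcustadm.danalogmsgadd
 WHERE CONTAINS(addr_msg, 'xxxx') > 0;
```

The two predicates are not strictly equivalent: LIKE matches any substring, while CONTAINS matches the tokens produced by the lexer, so result sets can differ slightly.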
Summary of the invention
The technical problem to be solved by the present invention is to provide a massive Chinese data query method and system based on an Oracle database, for querying Chinese data when the data volume is large.
The technical solution by which the present invention solves the above technical problem is as follows: a massive Chinese data query method based on an Oracle database, comprising:
Step 1, creating a text index based on Chinese fields at the Oracle database layer;
Step 2, creating two jobs, one job for adding newly added data in the Oracle database to the text index, and the other job for optimizing the text index of all data in the Oracle database;
Step 3, executing the two jobs at the Oracle database layer;
Step 4, based on the text index processed by the two jobs, using SQL statements to query the Chinese fields in the Oracle database.
On the basis of the above technical solution, the present invention may be further improved as follows.
Further, step 1 specifically comprises:
Step 11, deleting the original lexical analyzer at the Oracle database layer, and creating a new lexical analyzer for the Chinese fields;
Step 12, deleting the original non-index word table class at the Oracle database layer, and creating a new non-index word table class for the Chinese fields;
Step 13, creating a non-index word table based on the new non-index word table class, the non-index word table defining all words that are unsuitable for data queries;
Step 14, in the Oracle database, creating a text index on all Chinese fields, excluding the words defined in the non-index word table.
Further, the method also comprises verifying the created text index.
Further, verifying the created text index comprises: if a query using the contains operator in an SQL statement succeeds, the text index has been created successfully.
Further, in step 2, the PLSQL tool is used to create the jobs.
Further, step 2 also comprises: setting the execution interval of each job when it is created.
The technical solution of the present invention also includes a massive Chinese data query system based on an Oracle database, comprising:
an index creation module, for creating a text index based on Chinese fields at the Oracle database layer;
a job creation module, for creating two jobs, one job for adding newly added data in the Oracle database to the text index, and the other job for optimizing the text index of all data in the Oracle database;
a job execution module, for executing the two jobs at the Oracle database layer;
a query module, for using SQL statements to query the Chinese fields in the Oracle database based on the text index processed by the two jobs.
Further, the index creation module comprises:
a lexical analyzer processing module, for deleting the original lexical analyzer at the Oracle database layer and creating a new lexical analyzer for the Chinese fields;
a non-index word table class processing module, for deleting the original non-index word table class at the Oracle database layer and creating a new non-index word table class for the Chinese fields;
a non-index word table creation module, for creating a non-index word table based on the new non-index word table class, the non-index word table defining all words that are unsuitable for data queries;
a text index building module, for creating, in the Oracle database, a text index on all Chinese fields, excluding the words defined in the non-index word table.
Further, the system also comprises a verification module, for verifying the created text index.
Further, the non-index word table class processing module uses the PLSQL tool to create the jobs.
The beneficial effects of the present invention are as follows: the present invention builds a text index on the Chinese fields and then creates jobs that periodically index newly added data and optimize the index of existing data. This approach is efficient, easy to maintain, and suitable for large data queries. The technique alleviates the slow fuzzy queries found in existing systems and helps improve overall system performance.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the massive Chinese data query method based on an Oracle database according to the present invention;
Fig. 2 is a schematic structural diagram of the massive Chinese data query system based on an Oracle database according to the present invention.
Embodiment
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.
As shown in Fig. 1, this embodiment provides a massive Chinese data query method based on an Oracle database, comprising:
Step 1, creating a text index based on Chinese fields at the Oracle database layer;
Step 2, creating two jobs, one job for adding newly added data in the Oracle database to the text index, and the other job for optimizing the text index of all data in the Oracle database;
Step 3, executing the two jobs at the Oracle database layer;
Step 4, based on the text index processed by the two jobs, using SQL statements to query the Chinese fields in the Oracle database.
In this embodiment, step 1 specifically comprises:
Step 11, deleting the original lexical analyzer at the Oracle database layer, and creating a new lexical analyzer for the Chinese fields;
Step 12, deleting the original non-index word table class at the Oracle database layer, and creating a new non-index word table class for the Chinese fields;
Step 13, creating a non-index word table based on the new non-index word table class, the non-index word table defining all words that are unsuitable for data queries;
Step 14, in the Oracle database, creating a text index on all Chinese fields, excluding the words defined in the non-index word table.
In addition, the created text index needs to be verified. Specifically: if a query using the contains operator in an SQL statement succeeds, the text index has been created successfully.
This embodiment uses the PLSQL tool to create the jobs and sets the execution interval of each job. For example, the job that adds newly added data in the Oracle database to the text index can be set to run once every 15 minutes, and the job that optimizes the text index of all data in the Oracle database can be set to run once a day.
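The reason for the first job is that rows added after the index is built are not searchable through the text index until the index is synchronized. A minimal sketch of how the backlog and the two intervals could be inspected is shown below; it assumes the session is the index owner and uses the Oracle Text view CTX_USER_PENDING, which is not mentioned in this description.

```sql
-- Rows waiting to be added to the text index, per index
-- (assumption: run as the owner of the index).
SELECT pnd_index_name, COUNT(*) AS pending_rows
  FROM ctx_user_pending
 GROUP BY pnd_index_name;

-- Oracle date arithmetic is expressed in days, so the two intervals are:
SELECT SYSDATE + 1/24/4 AS next_sync,     -- 1/96 of a day = 15 minutes from now
       SYSDATE + 1      AS next_optimize  -- one day from now
  FROM dual;
```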
Correspondingly, this embodiment provides a massive Chinese data query system based on an Oracle database which, as shown in Fig. 2, comprises:
an index creation module, for creating a text index based on Chinese fields at the Oracle database layer;
a job creation module, for creating two jobs, one job for adding newly added data in the Oracle database to the text index, and the other job for optimizing the text index of all data in the Oracle database;
a job execution module, for executing the two jobs at the Oracle database layer;
a query module, for using SQL statements to query the Chinese fields in the Oracle database based on the text index processed by the two jobs.
In addition, the system comprises a verification module, for verifying the created text index.
In this embodiment, the index creation module comprises:
a lexical analyzer processing module, for deleting the original lexical analyzer at the Oracle database layer and creating a new lexical analyzer for the Chinese fields;
a non-index word table class processing module, for deleting the original non-index word table class at the Oracle database layer and creating a new non-index word table class for the Chinese fields;
a non-index word table creation module, for creating a non-index word table based on the new non-index word table class, the non-index word table defining all words that are unsuitable for data queries;
a text index building module, for creating, in the Oracle database, a text index on all Chinese fields, excluding the words defined in the non-index word table.
The specific process of performing a data query with the massive Chinese data query system and method of this embodiment is as follows:
1. Delete the original lexical analyzer. If an error is reported because no lexical analyzer exists, it can be ignored. The program is as follows:
BEGIN
ctx_ddl.drop_preference('CHINA_LEXER');
END;
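Because the drop fails when the preference does not exist, one way to "ignore" that error in a script is to catch it inside the block. The following is a sketch only; the catch-all handler is an assumption made for brevity, not part of the original listing.

```sql
BEGIN
  ctx_ddl.drop_preference('CHINA_LEXER');
EXCEPTION
  WHEN OTHERS THEN
    -- Assumption: a failure here means the preference did not exist yet,
    -- so it is safe to continue. Narrow this handler if that may not hold.
    NULL;
END;
/
```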
2. Create the lexical analyzer. The new lexical analyzer extracts the plain text from the sectioner and splits it into discrete tokens; in practice it parses and tokenizes the text, which is the precondition for later retrieval. The program is as follows.
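The listing for this step does not appear in the text. A minimal sketch of what creating the lexer preference could look like in Oracle Text is given here. The preference name CHINA_LEXER comes from steps 1 and 7; the lexer type CHINESE_VGRAM_LEXER is an assumption (CHINESE_LEXER is the other Oracle Text lexer type for Chinese), and the session user is assumed to already have EXECUTE on ctx_ddl.

```sql
BEGIN
  -- Assumption: CHINESE_VGRAM_LEXER is used; CHINESE_LEXER is the
  -- alternative Oracle Text lexer type for Chinese text.
  ctx_ddl.create_preference('CHINA_LEXER', 'CHINESE_VGRAM_LEXER');
END;
/
```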
3. Delete the original non-index word table class. If an error is reported because no non-index word table class exists, it can be ignored. The program is as follows:
BEGIN
ctx_ddl.drop_preference('CC_STOPLIST');
END;
4. Create the non-index word table class.
5. Create the non-index word table.
A non-index word table is also called a stop-word list. In an Oracle database, the full-text index lets users define stop words in order to filter out words that carry little information but occur very frequently.
For example, English words such as a, this, are, and the appear in almost every article, so indexing them is of little value. This embodiment lists the following stop words:
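The listings for steps 4 and 5 and the stop-word list itself are not present in the text. As a hedged sketch, steps 4 and 5 could be written with the Oracle Text stoplist procedures as shown below. The name CC_STOPLIST comes from steps 3 and 7; the three stop words are hypothetical examples and are not the list used in the embodiment.

```sql
BEGIN
  -- Step 4 (assumed form): create an empty stoplist of type BASIC_STOPLIST.
  ctx_ddl.create_stoplist('CC_STOPLIST', 'BASIC_STOPLIST');

  -- Step 5 (assumed form): add stop words to it. These words are
  -- hypothetical placeholders chosen only for illustration.
  ctx_ddl.add_stopword('CC_STOPLIST', '的');
  ctx_ddl.add_stopword('CC_STOPLIST', '了');
  ctx_ddl.add_stopword('CC_STOPLIST', '是');
END;
/
```

In Oracle Text, a stoplist created this way is normally removed with ctx_ddl.drop_stoplist, which may be relevant when rerunning the setup.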
6. Log in to the database as the CTXSYS/CTXSYS user
Grant the privilege to dbcustadm. If you are unsure whether the CTXSYS user exists, you can first simply try to log in; if the login fails, the CTXSYS system user problem needs to be resolved by SPD.
GRANT EXECUTE ON ctx_ddl TO dbcustadm;
7. Create the index. The program is as follows.
CREATE INDEX dbcustadm.addrmsgindex ON dbcustadm.danalogmsgadd(addr_msg) INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('MEMORY 50M LEXER dbcustadm.CHINA_LEXER STOPLIST dbcustadm.CC_STOPLIST');
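Once the CREATE INDEX statement returns, the state of the index can also be inspected through the Oracle Text dictionary views. This is a sketch under the assumption that it is run as the index owner (dbcustadm); the views CTX_USER_INDEXES and CTX_USER_INDEX_ERRORS are not mentioned in this description.

```sql
-- Status of the text index (assumption: run as the index owner).
SELECT idx_name, idx_table, idx_status
  FROM ctx_user_indexes
 WHERE idx_name = 'ADDRMSGINDEX';

-- Rows that could not be indexed, if any, most recent first.
SELECT err_index_name, err_timestamp, err_text
  FROM ctx_user_index_errors
 WHERE err_index_name = 'ADDRMSGINDEX'
 ORDER BY err_timestamp DESC;
```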
8. Verify whether the index was created successfully
If a query with contains works, the index was created successfully. For example:
SELECT * FROM danalogmsgadd WHERE contains(addr_msg, '31 Unit 1') > 0;
SELECT * FROM danalogmsgadd WHERE contains(addr_msg, '31 and Unit 1 and Room 1') > 0;
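The two verification queries differ in how the search terms are combined. A sketch of that difference, using placeholder terms 'aaa' and 'bbb' that are not taken from the embodiment:

```sql
-- Phrase query: the terms are expected to appear together, in this order.
SELECT COUNT(*)
  FROM dbcustadm.danalogmsgadd
 WHERE CONTAINS(addr_msg, 'aaa bbb') > 0;

-- AND query: both terms must appear in the field, in any position.
SELECT COUNT(*)
  FROM dbcustadm.danalogmsgadd
 WHERE CONTAINS(addr_msg, 'aaa AND bbb') > 0;
```

How a given Chinese string is split into terms depends on the lexer chosen in step 2, so counts for real address data should be checked against known rows.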
9. Create two jobs in the PLSQL tool. The contents of the jobs are as follows:
-- job1: every 15 minutes, add the newly added data in the Oracle database to the text index
what: ctx_ddl.sync_index('dbcustadm.addrmsgindex');
next date: sysdate
interval: SYSDATE+(1/24/4)
-- job2: optimize the index once a day
what: CTX_DDL.OPTIMIZE_INDEX('dbcustadm.addrmsgindex','full');
next date: sysdate
interval: SYSDATE+1
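The what, next date, and interval fields above correspond to the parameters of DBMS_JOB.SUBMIT. A scripted equivalent could look like the sketch below; it is an assumption that the jobs are submitted by a user holding EXECUTE on ctx_ddl (granted in step 6), and the variable name l_job is illustrative.

```sql
DECLARE
  l_job BINARY_INTEGER;  -- receives the job number assigned by Oracle
BEGIN
  -- job1: synchronize the text index every 15 minutes.
  DBMS_JOB.SUBMIT(
    job       => l_job,
    what      => 'ctx_ddl.sync_index(''dbcustadm.addrmsgindex'');',
    next_date => SYSDATE,
    interval  => 'SYSDATE + (1/24/4)');

  -- job2: run a full optimization of the text index once a day.
  DBMS_JOB.SUBMIT(
    job       => l_job,
    what      => 'ctx_ddl.optimize_index(''dbcustadm.addrmsgindex'', ''full'');',
    next_date => SYSDATE,
    interval  => 'SYSDATE + 1');

  COMMIT;  -- submitted jobs become visible to the job queue after commit
END;
/
```

Later Oracle releases provide DBMS_SCHEDULER as the successor to DBMS_JOB; the sketch keeps DBMS_JOB because the job fields in this step match its parameters.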
10. Check job execution:
select * from all_jobs;
The created jobs should be listed here, and they should also appear among the database resources;
Last_date: records the time the job last ran; job1 runs once every 15 minutes, and job2 runs once a day;
Total_time: records the cumulative run time of the job, which increases after each execution.
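For a narrower check than select *, the relevant columns can be selected explicitly. This sketch assumes the query is run by the user who owns the jobs and uses the USER_JOBS view.

```sql
SELECT job,         -- job number
       what,        -- PL/SQL executed by the job
       last_date,   -- when the job last ran successfully
       next_date,   -- when the job is next due
       interval,    -- expression used to compute next_date
       total_time,  -- cumulative seconds spent running the job
       failures,    -- consecutive failures since the last success
       broken       -- 'Y' if the job has been disabled
  FROM user_jobs
 ORDER BY job;
```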
In summary, this embodiment creates the required index at the database layer in the manner described in the technical solution, then maintains the index with the jobs, and finally the application uses contains(addr_msg, 'xxxx') > 0 in its fuzzy-query SQL statements.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A massive Chinese data query method based on an Oracle database, characterized by comprising:
Step 1, creating a text index based on Chinese fields at the Oracle database layer;
Step 2, creating two jobs, one job for adding newly added data in the Oracle database to the text index, and the other job for optimizing the text index of all data in the Oracle database;
Step 3, executing the two jobs at the Oracle database layer;
Step 4, based on the text index processed by the two jobs, using SQL statements to query the Chinese fields in the Oracle database.
2. The massive Chinese data query method according to claim 1, characterized in that step 1 specifically comprises:
Step 11, deleting the original lexical analyzer at the Oracle database layer, and creating a new lexical analyzer for the Chinese fields;
Step 12, deleting the original non-index word table class at the Oracle database layer, and creating a new non-index word table class for the Chinese fields;
Step 13, creating a non-index word table based on the new non-index word table class, the non-index word table defining all words that are unsuitable for data queries;
Step 14, in the Oracle database, creating a text index on all Chinese fields, excluding the words defined in the non-index word table.
3. magnanimity Chinese data querying method according to claim 1 and 2, is characterized in that, the text index also comprised creating verifies.
4. magnanimity Chinese data querying method according to claim 3, is characterized in that, the described text index to creating is carried out verification and comprised: if the contains order can successfully called in SQL statement is inquired about, then show that text index creates successfully.
5. magnanimity Chinese data querying method according to claim 1, is characterized in that, in described step 2, adopts PLSQL instrument to create job.
6. magnanimity Chinese data querying method according to claim 1, it is characterized in that, described step 2 also comprises: when creating Job, arranges the time interval that Job performs.
7. A massive Chinese data query system based on an Oracle database, characterized by comprising:
an index creation module, for creating a text index based on Chinese fields at the Oracle database layer;
a job creation module, for creating two jobs, one job for adding newly added data in the Oracle database to the text index, and the other job for optimizing the text index of all data in the Oracle database;
a job execution module, for executing the two jobs at the Oracle database layer;
a query module, for using SQL statements to query the Chinese fields in the Oracle database based on the text index processed by the two jobs.
8. The massive Chinese data query system according to claim 7, characterized in that the index creation module comprises:
a lexical analyzer processing module, for deleting the original lexical analyzer at the Oracle database layer and creating a new lexical analyzer for the Chinese fields;
a non-index word table class processing module, for deleting the original non-index word table class at the Oracle database layer and creating a new non-index word table class for the Chinese fields;
a non-index word table creation module, for creating a non-index word table based on the new non-index word table class, the non-index word table defining all words that are unsuitable for data queries;
a text index building module, for creating, in the Oracle database, a text index on all Chinese fields, excluding the words defined in the non-index word table.
9. The massive Chinese data query system according to claim 7 or 8, characterized by further comprising a verification module, for verifying the created text index.
10. The massive Chinese data query system according to claim 7, characterized in that the non-index word table class processing module uses the PLSQL tool to create the jobs.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410602655.6A CN104317937A (en) 2014-10-31 2014-10-31 Massive Chinese data query method and system based on oracle database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410602655.6A CN104317937A (en) 2014-10-31 2014-10-31 Massive Chinese data query method and system based on oracle database

Publications (1)

Publication Number Publication Date
CN104317937A true CN104317937A (en) 2015-01-28

Family

ID=52373169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410602655.6A Pending CN104317937A (en) 2014-10-31 2014-10-31 Massive Chinese data query method and system based on oracle database

Country Status (1)

Country Link
CN (1) CN104317937A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5303361A (en) * 1989-01-18 1994-04-12 Lotus Development Corporation Search and retrieval system
CN101154241A (en) * 2007-10-11 2008-04-02 北京金山软件有限公司 Data searching method and data searching system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
聂红梅等: "Oracle数据库中Clob大字段的查询优化技术研究", 《计算机技术与发展》 *

Similar Documents

Publication Publication Date Title
CN107402992B (en) Distributed NewSQL database system and full-text retrieval establishing method
EP3026579B1 (en) Forced ordering of a dictionary storing row identifier values
EP3026578B1 (en) N-bit compressed versioned column data array for in-memory columnar stores
US10255309B2 (en) Versioned insert only hash table for in-memory columnar stores
US9009182B2 (en) Distributed transaction management with tokens
WO2020233330A1 (en) Batch testing method, apparatus, and computer-readable storage medium
US9965504B2 (en) Transient and persistent representation of a unified table metadata graph
US20160147862A1 (en) Delegation of Database Post-Commit Processing
US20160147786A1 (en) Efficient Database Undo / Redo Logging
US9165049B2 (en) Translating business scenario definitions into corresponding database artifacts
CN103186639B (en) Data creation method and system
US10474697B2 (en) Updating a partitioning column
Zhou et al. A survey on the management of uncertain data
US9189515B1 (en) Data retrieval from heterogeneous storage systems
CN104572895A (en) MPP (Massively Parallel Processor) database and Hadoop cluster data intercommunication method, tool and realization method
CN105389344A (en) Self-service novelty retrieval method and system
TWI706260B (en) Index establishment method and device based on mobile terminal NoSQL database
CN102253975A (en) Automatic switching system and method for database
CN112214453B (en) Large-scale industrial data compression storage method, system and medium
CN103678634A (en) Method for improving data query speed in J-Hi open-source development platform
CN101639851B (en) Method for storing and querying data and devices thereof
CN105574027A (en) On-line transaction processing/on-line analytical processing (OLTP/OLAP) hybrid application based multi-dimensional performance data storage method, device and system
CN105005619A (en) Rapid retrieval method and system for mass website basic information
US20120284224A1 (en) Build of website knowledge tables
CN104317937A (en) Massive Chinese data query method and system based on oracle database

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150128
