CN100444591C - Method for acquiring front-page keyword and its application system - Google Patents


Info

Publication number
CN100444591C
Authority
CN
China
Prior art keywords
web page
webpage
keyword
root
page title
Prior art date
2006-08-18
Legal status
Active
Application number
CNB2006101124628A
Other languages
Chinese (zh)
Other versions
CN1909522A (en)
Inventor
田野
陈亮
李晶
Current Assignee
Beijing Kingsoft Office Software Inc
Original Assignee
Beijing Kingsoft Software Co Ltd
Priority date
2006-08-18
Filing date
2006-08-18
Publication date
2008-12-17
Application filed by Beijing Kingsoft Software Co Ltd
Priority to CNB2006101124628A
Publication of CN1909522A
Application granted
Publication of CN100444591C
Legal status: Active (current)
Anticipated expiration

Abstract

The invention relates to a method for obtaining the keywords of a web page, and to a system applying the method. The method comprises: performing word segmentation on the page title to obtain title roots (candidate terms taken from the title); counting the number of times each title root occurs in the page; and selecting at least one of the most frequently occurring title roots as the keyword(s) of the page. The method obtains page keywords quickly and accurately. It can also be used in a web crawling system: crawled pages are analyzed to obtain their keywords, and the pages and keywords are stored in a database so that more related pages can be offered to users.

Description

Method for obtaining web page keywords and application system thereof
Technical field
The present invention relates to the field of network technology, and in particular to a method for obtaining web page keywords and an application system thereof.
Background art
With the rapid development of networks, the network has become an important means by which people obtain information. To help users quickly pick out the pages they care about from a large number of pages, a web content provider needs to preprocess page content: it extracts the keywords of each page and stores them, together with the page content, in a database. When a user requests a page, the server first retrieves that page's keywords from the database, then searches the database for pages with the same keywords and offers them to the viewer.
At present, page keywords are obtained by manually reading the page content. This approach has several shortcomings: when the number of pages is large it requires a great deal of manual work, increases the workload, and is inefficient; the keywords found are not very accurate; and it is only suitable for websites whose content is published by the site provider and whose number of pages is limited, such as news websites. It is not suitable for websites where content is published by users, such as forums, or for websites with a very large number of pages.
Summary of the invention
The technical problem addressed by the present invention is to provide a method for obtaining web page keywords, and an application system thereof, that obtain page keywords quickly and accurately.
To solve the above technical problem, the invention adopts the following technical solutions.
A method for obtaining web page keywords comprises:
obtaining the page title and performing word segmentation on it to obtain title roots; searching the page for each title root and counting the number of times it occurs in the page; and selecting at least one of the title roots with the most occurrences in the page as the keyword(s) of the page.
In the above method, performing word segmentation on the page title to obtain the title roots specifically comprises:
traversing the characters of the title in reading order; at each step, first saving the current character as a title root, then successively appending the following characters (or character strings) to it and saving each result as a further title root.
The method further comprises setting a counter for each title root.
Searching the page for the title roots and counting the number of times each occurs in the page specifically comprises:
reading, in reading order, the effective text data from the page source file and traversing each character it contains; at each step, first taking the current character as a page content root and matching it against the title roots, incrementing the counter of the matching title root by 1 if the match succeeds; then successively appending the following characters (or character strings) to this content root, matching each result against the title roots, and incrementing the counter of the matching title root by 1 if the match succeeds.
The method further comprises saving the page keywords and the page in a web page database.
The method further comprises counting the number of times the page keywords occur across multiple pages and selecting at least one of the most frequent ones as the hottest keyword(s).
The method further comprises listing the page keywords and/or the hottest keywords on a web page and setting a link for each.
A system applying the method for obtaining web page keywords comprises a page keyword acquisition unit, a page saving unit, and a page search unit, wherein:
the page keyword acquisition unit is configured to obtain the page title, perform word segmentation on it to obtain title roots, search the page for the title roots and count the number of times each occurs, and select at least one of the most frequently occurring title roots as the keyword(s) of the page;
the page saving unit is configured to save the page content, the page address, and the keywords obtained by the page keyword acquisition unit;
the page search unit is configured to search the page saving unit to obtain pages that have the same keywords as the page currently being browsed.
The system further comprises a web crawling unit configured to obtain web pages;
the page keyword acquisition unit is then configured to perform word segmentation on the title of a page crawled by the web crawling unit, or of the page the user is currently browsing, to obtain title roots, and, according to the number of times each title root occurs in the page, to select at least one of the most frequently occurring title roots as the keyword(s) of the crawled page or the currently browsed page.
As can be seen from the above technical solutions, the present invention performs word segmentation on the page title to obtain title roots and, according to the number of times each title root occurs in the page, selects at least one of the most frequent title roots as the page's keywords, so page keywords can be obtained quickly. Furthermore, with the methods provided by the invention for obtaining title roots and counting their occurrences in the page, the keywords obtained are more accurate than those obtained manually. In addition, the method is applicable to all kinds of websites, such as portal sites with a very large number of pages or forums whose content is published by users. For pages obtained by web crawling, the keywords of the crawled pages can be obtained rapidly and the crawled pages and keywords stored in a database, so the web content provider can offer users more pages to query according to their needs.
Description of drawings
Fig. 1 illustrates a web page title;
Fig. 2 is a flowchart of the method for obtaining page keywords;
Fig. 3 is a flowchart of the method for counting the occurrences of title roots in a page;
Fig. 4 is a block diagram of a web crawling system applying the method for obtaining page keywords;
Fig. 5 is the workflow of the system shown in Fig. 4.
Embodiment
The core idea of the present invention is: perform word segmentation on the page title to obtain title roots, and, according to the number of times each title root occurs in the page, select at least one of the most frequent title roots as the page's keyword(s).
The page title is chosen for segmentation because a title is generally a summary of the page content and often contains the page's keywords.
Referring to Fig. 1, which illustrates a web page title: label 101 is the browser's title bar, label 102 is the content title, and label 103 is the page source code corresponding to the title. Every web page has a title attribute whose value is generally shown in the browser's title bar. In the page source, the value enclosed by the <title></title> tag pair is the page's title attribute value, and the web content provider can give a page a title by setting the value enclosed by <title></title>. Generally, the provider sets the page title to the content title, for example a news headline, an article title, or a forum post title.
The above is the core idea of the method and the rationale behind it. The method is described in detail below with reference to Fig. 2, which shows its implementation flow and comprises the following steps:
Step 201: obtain the page title. Development environments generally provide an interface function for reading a page's title attribute, and calling it returns the page title. For example, in the VC (Visual C++) development environment, the page title can be obtained with the following call:
HRESULT IHTMLDocument2::get_title(BSTR *p); where IHTMLDocument2 is the interface to the document of the current page, and the title string is returned through p;
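Outside the VC/MSHTML environment, step 201 amounts to reading the value enclosed by the <title></title> pair in the page source, as described for Fig. 1. The following is a minimal Python sketch using the standard html.parser module, assuming the page source is already available as a string; the names TitleExtractor and get_page_title are hypothetical and not part of the original method.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text enclosed by the first <title>...</title> pair."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so "<TITLE>" matches as well.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def get_page_title(page_source: str) -> str:
    """Hypothetical helper: return the page's title attribute value, or ""."""
    parser = TitleExtractor()
    parser.feed(page_source)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()
```

Only the first title element is used, matching the one-title-per-page assumption of Fig. 1; a page without a title yields an empty string, and such pages would need another source of keywords.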
Step 202: perform word segmentation on the page title, extract the title roots, and store them temporarily in a list;
Step 203: search the page for the title roots and count the number of times each occurs in the page;
Step 204: select at least one of the most frequently occurring title roots as the page's keyword(s).
This completes the acquisition of page keywords. In practical applications, the method further comprises:
saving the page keywords together with the page content in a web page database;
listing the page keywords on the page and setting a link for each keyword;
counting the number of times the page keywords occur across multiple pages, selecting at least one of the most frequent ones as the hottest keyword(s), listing them on the page, and setting links for them for viewers.
The invention provides two ways of obtaining title roots. Embodiment 1 uses title-root acquisition mode (1), which specifically comprises:
traversing the title characters in reading order; at each step, first saving the current character as a title root, then appending the next character and saving the result as another title root, and so on until the last character has been appended; all title roots are stored temporarily in the list.
For example, the title 中国政府推出知识产权新举措 ("The Chinese government launches new intellectual property measures") yields, starting from the first character 中, the roots 中, 中国, 中国政, 中国政府, 中国政府推, and so on. After 中 has been processed, traversal continues from 国, giving 国, 国政, 国政府, 国政府推, and so on, until the last character 措 has been traversed.
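Mode (1) can be sketched in a few lines of Python. This is a minimal illustration, assuming the title is a plain Unicode string; the function name title_roots is hypothetical, and dropping duplicate roots while keeping first-seen order is an added assumption, since the text only says the roots are stored in a temporary list.

```python
def title_roots(title: str) -> list[str]:
    """Mode (1): every substring of the title, grouped by starting character.

    For each start position i (reading order), save title[i], then keep
    appending the next character and save each longer string, up to the end
    of the title. Duplicates are dropped (assumption) so each root gets a
    single counter later on.
    """
    roots = []
    for i in range(len(title)):                 # current character S_i
        for j in range(i + 1, len(title) + 1):  # append one more character each time
            roots.append(title[i:j])
    return list(dict.fromkeys(roots))           # de-duplicate, keep order
```

A title of n characters yields up to n(n + 1)/2 roots (91 for the 13-character example title), which is why mode (2) is described as reducing the number of roots.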
Embodiment 2 uses title-root acquisition mode (2), which specifically comprises:
passing the page title to third-party word segmentation software, which segments it into title roots.
This mode effectively reduces the number of roots and improves search efficiency. For example, segmenting the title 中国政府推出知识产权新举措 with such software can yield roots such as 中国政府 ("Chinese government"), 知识产权 ("intellectual property"), and 举措 ("measures").
In an embodiment, a counter with an initial value of 0 can be set for each title root to record the number of times it occurs in the page.
In other embodiments, other counting methods may be used to record the number of occurrences of the title roots in the page; this does not affect the implementation of the invention.
An embodiment provides a method for searching the page for the title roots and counting the number of times each occurs. Referring to Fig. 3, it specifically comprises:
Step 301: read the effective text data from the page source file in reading order.
The page source file contains effective text data, tag data, and descriptive data, each marked by different tags. During reading, the invention uses regular expressions or other string-processing methods to remove the non-text content of the source file and obtain the effective text data.
Regular expressions are a string-processing technique well known to those skilled in the art.
Effective text data is the text displayed on the page, which may be Chinese or text in any other language. Tag data and descriptive data are the commands of the markup language used in the source file to display page content; taking HTML as an example, they include the paragraph command <p></p>, the image command <img>, the table command <table></table>, and the link command <a href="www.sina.com.cn">Sina</a>.
When text is displayed on a page, the page source need not wrap it in any markup; markup is used only when attributes or the position of the text need to be set. For example, <font size=10 color=red>hello</font> displays the text "hello" in red with the font size set to 10.
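Step 301 can be sketched with Python's re and html modules. This is a minimal illustration under stated assumptions rather than the method's actual rules: comments and <script>/<style> bodies are discarded, every remaining tag is replaced by a space, and HTML entities are decoded afterwards. The name effective_text is hypothetical, and regular expressions are not a full HTML parser, so malformed markup may be handled imperfectly.

```python
import html
import re

# Assumed cleanup rules: drop comments, <script>/<style> bodies, then every other tag.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
SCRIPT_STYLE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def effective_text(page_source: str) -> str:
    """Return the visible text of an HTML page as a single string."""
    text = COMMENT.sub(" ", page_source)
    text = SCRIPT_STYLE.sub(" ", text)
    text = TAG.sub(" ", text)       # each tag becomes a space, i.e. a root boundary
    text = html.unescape(text)      # decode entities such as &amp; after tags are gone
    return WHITESPACE.sub(" ", text).strip()
```

Replacing each tag with a space treats it as a word boundary, so a term split by inline markup (for example 中国<b>政府</b>) would not be matched across the tag; replacing tags with an empty string avoids that but can join text from adjacent paragraphs.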
Step 302: set up a comparison string variable $Str$.
Step 303: read a character $S_i$ from the effective text data and set $Str = S_i$ as the current page content root. Match $Str$ against each title root in the list; if a title root matches, increment its counter by 1, indicating that this title root has occurred once more in the page. After matching, append the next character $S_{i+1}$ in reading order, set $Str = S_i S_{i+1}$, and match it against each title root in the list again, incrementing the counter of the matching title root by 1 if the match succeeds. Continue in this way until the content root consists of 15 characters appended in reading order, i.e. $Str = S_i S_{i+1} S_{i+2} \cdots S_{i+j}$ with 15 characters from $S_i$ to $S_{i+j}$ (so $j = 14$); this step then ends and the flow proceeds to step 304.
Step 304: read the next effective text character $S_{i+1}$ and determine whether it is the last character of the effective text data; if not, repeat step 303 starting from this character; if so, proceed to step 305.
Step 305: match this last character against the title roots; if the match succeeds, increment the counter of the corresponding title root by 1. The whole flow then ends.
In step 303, the step can also end, and the flow proceed to step 304, when a non-text character such as a punctuation mark or a space is read.
In step 303, the page content root is extended to at most 15 characters because title roots are generally no longer than 15 characters.
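The counting flow of steps 302 to 305, including the 15-character limit and the early stop at punctuation or spaces, can be sketched as follows. This is a minimal Python illustration, assuming that a "text character" means a Unicode letter or digit (the text does not define it precisely); the names count_title_roots, is_text_char and MAX_CONTENT_ROOT_LEN are hypothetical. Looking each content root up in a dictionary is equivalent to matching it against every title root in the list, only faster.

```python
import unicodedata

MAX_CONTENT_ROOT_LEN = 15  # step 303: a content root grows to at most 15 characters


def is_text_char(ch: str) -> bool:
    """Assumption: letters and digits are text; punctuation, symbols and spaces are not."""
    return unicodedata.category(ch)[0] in ("L", "N")


def count_title_roots(text: str, title_roots: list[str]) -> dict[str, int]:
    """Steps 302-305: count how often each title root occurs in the page text."""
    counts = {root: 0 for root in title_roots}  # one counter per title root, initially 0
    for i in range(len(text)):                  # S_i, in reading order
        end = min(i + MAX_CONTENT_ROOT_LEN, len(text))
        for j in range(i, end):                 # Str = S_i ... S_j
            if not is_text_char(text[j]):
                break                           # punctuation or space ends this content root
            content_root = text[i:j + 1]
            if content_root in counts:
                counts[content_root] += 1
    return counts
```

Occurrences are counted with overlap: an occurrence of 中国政府 in the page also counts once for 中国 and once for 政府 if those are title roots, which follows directly from extending every content root one character at a time.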
A title root consisting of a single character is very unlikely to be a keyword, so single-character title roots can be disregarded when selecting keywords.
The number of keywords per page can be set according to the web content provider's needs.
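Step 204, together with the two remarks above, can be sketched as follows. This is a minimal Python illustration; the name select_keywords, the default of 3 keywords, the minimum root length of 2, and the tie-breaking rule (prefer the longer root when counts are equal) are all assumptions, not taken from the original text.

```python
def select_keywords(counts: dict[str, int], k: int = 3, min_len: int = 2) -> list[str]:
    """Step 204: pick up to k of the most frequent title roots as page keywords."""
    candidates = [
        (root, n) for root, n in counts.items()
        if n > 0 and len(root) >= min_len  # skip unseen and single-character roots
    ]
    # Highest count first; on a tie prefer the longer root, then lexical order.
    candidates.sort(key=lambda item: (-item[1], -len(item[0]), item[0]))
    return [root for root, _ in candidates[:k]]
```

Preferring the longer root on ties means 中国政府 wins over its own substring 中国 when both occur equally often, which suits the observation that titles tend to contain multi-character terms.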
The above describes the method provided by the invention. The method has several applications, which are introduced below.
(1) Application 1:
For websites whose content is published by users, such as forums, or websites with a very large number of pages, the method provided by the invention is used to obtain each page's keywords, which are saved together with the page content in a database. When a user requests a page, the server retrieves that page's keywords from the database and, according to the user's needs, searches the database for pages with the same keywords and offers them to the user.
Because forum posts are written by ordinary users, keywords obtained by manually reading page content could not be stored in the database in real time; and if a user edits a post so that its keywords change, manual processing could not update the stored keywords in time, so the pages found would not meet users' needs. The method provided by the invention avoids these problems.
(2) Application 2:
Application 2 further optimizes Application 1 for users' convenience. After obtaining the current page's keywords with the method provided by the invention, the web content provider not only saves each keyword together with the page content in the database but also lists the keywords on the page and sets a link for each. Each link points to the address(es) of one or more pages having that keyword, so users can follow the links for the keywords they care about.
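Listing keywords with links can be sketched as follows. This minimal Python illustration uses only the standard html and urllib.parse modules; the function name keyword_links and the link target pattern /keyword?q=<keyword> are hypothetical placeholders for whatever keyword listing page a site actually provides.

```python
from html import escape
from urllib.parse import quote


def keyword_links(keywords: list[str], base: str = "/keyword?q=") -> str:
    """Render each keyword as an HTML link.

    The target URL pattern (base + percent-encoded keyword) is a hypothetical
    example; a real site would point at its own keyword listing page.
    """
    links = []
    for kw in keywords:
        href = base + quote(kw)  # percent-encode the keyword as UTF-8
        links.append(f'<a href="{escape(href)}">{escape(kw)}</a>')
    return " ".join(links)
```

For example, keyword_links(["中国"]) produces a single anchor whose href ends in %E4%B8%AD%E5%9B%BD, the UTF-8 percent-encoding of 中国.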
(3) Application 3:
Providing the hottest recent keywords on the network: the method provided by the invention is used to obtain page keywords, the number of times each keyword occurs across multiple pages is counted, and the keyword with the most occurrences is the hottest keyword.
Here, the hottest keyword means a page keyword that has recently appeared frequently in multiple pages with the same or similar topics.
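A minimal Python sketch of Application 3, assuming the keywords of each page have already been obtained and that the caller decides which pages count as "recent"; the name hottest_keywords is hypothetical, and counting each keyword at most once per page is an assumption about what occurring "in multiple pages" means.

```python
from collections import Counter


def hottest_keywords(keywords_per_page: list[list[str]], k: int = 1) -> list[tuple[str, int]]:
    """Count in how many pages each keyword appears and return the top k."""
    tally = Counter()
    for keywords in keywords_per_page:
        # dict.fromkeys de-duplicates while keeping order, so a keyword listed
        # twice for the same page still counts only once for that page.
        tally.update(dict.fromkeys(keywords, 1))
    return tally.most_common(k)
```

Restricting the input to pages published within a chosen time window is what turns "most frequent" into "hottest recent"; that window is left to the caller here.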
(4) Application 4:
Because the invention provides a method for obtaining page keywords automatically, when related page information is provided, related pages from websites other than the provider's own can also be offered as needed: the related pages of other websites need only be crawled, and the method provided by the invention is used to obtain the keywords of the crawled pages, which are then saved in the web page database.
Web crawling is a technique for obtaining web pages and comprises the following steps:
1. Obtain the content of a page according to its address. Different programming languages provide different interface functions for obtaining page content; for example, in PHP a function such as GetContentString() can be used to obtain the content of the page at a specified URL (PHP's standard file_get_contents() function also accepts URLs).
2. After a page is obtained, analyze its content and extract the links it contains using regular expressions; call GetContentString() again to obtain the content of each linked page, and so on, so that pages can be obtained to as many levels as needed. The page contents and their corresponding addresses are then saved in the page saving unit.
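The two crawling steps can be sketched as a breadth-first crawl. This is a minimal Python illustration, not the PHP implementation the text refers to: the page fetcher is passed in as a function (in practice it might wrap urllib.request.urlopen), links are extracted with a simple regular expression that only recognizes quoted href values, and the names crawl and HREF_PATTERN are hypothetical.

```python
import re
from collections import deque
from typing import Callable
from urllib.parse import urljoin

# Quoted href values only; unquoted attributes such as href=www.sina.com.cn are not matched.
HREF_PATTERN = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def crawl(start_url: str, fetch: Callable[[str], str], max_depth: int = 1) -> dict[str, str]:
    """Breadth-first crawl: return {page address: page source} up to max_depth link levels."""
    pages: dict[str, str] = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in pages:
            continue  # already fetched via another link
        source = fetch(url)  # step 1: obtain the page content from its address
        pages[url] = source
        if depth >= max_depth:
            continue  # do not follow links beyond the requested level
        for href in HREF_PATTERN.findall(source):  # step 2: extract links with a regex
            link = urljoin(url, href)  # resolve relative links against the page address
            if link not in pages:
                queue.append((link, depth + 1))
    return pages
```

Injecting the fetcher keeps the sketch testable offline with a dictionary of fake pages; a production crawler would also need politeness delays, robots.txt handling, and error handling around each fetch.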
The number of pages obtained by web crawling is huge; obtaining their keywords manually would require a great deal of manual work and would be time-consuming and laborious.
Fig. 4 shows the structure of a web crawling system that applies the method provided by the invention. The system comprises:
a web crawling unit, configured to obtain web pages;
a page keyword acquisition unit, configured to obtain the page title, perform word segmentation on it to obtain title roots, search the page for the title roots and count the number of times each occurs, and select at least one of the most frequently occurring title roots as the keyword(s) of the page;
a page saving unit, configured to save the page content, the page address, and the keywords obtained by the page keyword acquisition unit;
a page search unit, configured to search the page saving unit to obtain pages that have the same keywords as the page currently being browsed.
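The page saving unit and page search unit can be sketched together as an inverted index from keywords to page addresses. This is a minimal in-memory Python illustration; the class PageStore and its methods are hypothetical, page content is omitted to keep it short, and a real system would keep this data in the web page database rather than in memory.

```python
from collections import defaultdict


class PageStore:
    """Hypothetical in-memory stand-in for the page saving unit plus page search unit."""

    def __init__(self) -> None:
        self.keywords_by_url: dict[str, list[str]] = {}
        # Inverted index: keyword -> addresses of the pages that have it.
        self.urls_by_keyword: dict[str, set[str]] = defaultdict(set)

    def save(self, url: str, keywords: list[str]) -> None:
        """Save (or re-save) a page's keywords, keeping the index consistent."""
        for old in self.keywords_by_url.get(url, []):
            self.urls_by_keyword[old].discard(url)  # drop stale entries after an edit
        self.keywords_by_url[url] = list(keywords)
        for kw in keywords:
            self.urls_by_keyword[kw].add(url)

    def related(self, url: str) -> list[str]:
        """Search unit: pages sharing at least one keyword with the given page."""
        found: set[str] = set()
        for kw in self.keywords_by_url.get(url, []):
            found |= self.urls_by_keyword[kw]
        found.discard(url)  # do not return the page currently being browsed
        return sorted(found)
```

Re-saving a page removes its old index entries first, which covers the forum case above where a user edits a post and its keywords change.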
Fig. 5 shows the workflow of the system shown in Fig. 4, comprising:
Step 501: the user requests a page from the website server;
Step 502: the page keyword acquisition unit analyzes the page and obtains at least one keyword for it;
Step 503: the web crawling unit crawls pages as needed and saves them in the database;
Step 504: the page keyword acquisition unit processes each page crawled in step 503, obtains its keywords, and saves each page together with its keywords in the page saving unit;
Step 505: according to the keywords obtained in step 502, the page search unit retrieves the pages with the same keywords from the page saving unit and offers them to the user.
Steps 503 and 504 can be performed in advance.
The method for obtaining web page keywords and the application system thereof provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the invention; the description of the embodiments is only intended to help understand the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, according to the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, this description should not be construed as limiting the invention.

Claims (8)

1. A method for obtaining web page keywords, characterized in that the method comprises:
obtaining the page title and performing word segmentation on it to obtain title roots;
searching the page for the title roots and counting the number of times each title root occurs in the page;
selecting at least one of the title roots with the most occurrences in the page as the keyword(s) of the page.
2. The method according to claim 1, characterized in that performing word segmentation on the page title to obtain the title roots specifically comprises:
traversing the characters of the title in reading order;
at each step, first saving the current character as a title root;
then successively appending characters or character strings to this title root and saving each result as a title root.
3. The method according to claim 1 or 2, characterized in that the method further comprises setting a counter for each title root;
and searching the page for the title roots and counting the number of times each occurs in the page specifically comprises:
reading, in reading order, the effective text data from the page source file;
traversing each character contained in the effective text data;
at each step, first taking the current character as a page content root and matching it against the title roots, and incrementing the counter of the corresponding title root by 1 if the match succeeds;
then successively appending characters or character strings to this content root, matching each result against the title roots, and incrementing the counter of the corresponding title root by 1 if the match succeeds.
4. The method according to claim 1, characterized in that the method further comprises saving the page keywords and the page in a web page database.
5. The method according to claim 1 or 4, characterized in that the method further comprises counting the number of times the page keywords occur across multiple pages and selecting at least one of the most frequent ones as the hottest keyword(s).
6. The method according to claim 5, characterized in that the method further comprises listing the page keywords and/or the hottest keywords on a web page and setting links for them.
7. A system applying the method for obtaining web page keywords, characterized in that the system comprises a page keyword acquisition unit, a page saving unit, and a page search unit;
the page keyword acquisition unit is configured to obtain the page title, perform word segmentation on it to obtain title roots, search the page for the title roots and count the number of times each occurs, and select at least one of the most frequently occurring title roots as the keyword(s) of the page;
the page saving unit is configured to save the page content, the page address, and the keywords obtained by the page keyword acquisition unit;
the page search unit is configured to search the page saving unit to obtain pages that have the same keywords as the page currently being browsed.
8. The system according to claim 7, characterized in that the system further comprises a web crawling unit configured to obtain web pages;
the page keyword acquisition unit is configured to perform word segmentation on the title of a page crawled by the web crawling unit, or of the page the user is currently browsing, to obtain title roots, and, according to the number of times each title root occurs in the page, to select at least one of the most frequently occurring title roots as the keyword(s) of the crawled page or the currently browsed page.
CNB2006101124628A 2006-08-18 2006-08-18 Method for acquiring front-page keyword and its application system Active CN100444591C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006101124628A CN100444591C (en) 2006-08-18 2006-08-18 Method for acquiring front-page keyword and its application system

Publications (2)

Publication Number Publication Date
CN1909522A CN1909522A (en) 2007-02-07
CN100444591C true CN100444591C (en) 2008-12-17

Family

ID=37700516

Country Status (1)

Country Link
CN (1) CN100444591C (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276361B (en) 2007-03-28 2010-09-15 阿里巴巴集团控股有限公司 Method and system for displaying related key words
CN101082936A (en) * 2007-06-29 2007-12-05 中兴通讯股份有限公司 Data enquiring system and method
CN101833556B (en) * 2009-03-12 2011-12-14 英业达股份有限公司 File content management system and method thereof
CN101957809A (en) * 2010-10-14 2011-01-26 传神联合(北京)信息技术有限公司 Anti-plagiarism method
CN103186618B (en) * 2011-12-30 2016-06-29 北京新媒传信科技有限公司 The acquisition methods of just data and device
CN104158698B (en) * 2014-08-06 2017-07-28 厦门天锐科技股份有限公司 A kind of web page browsing record statistical method and system
CN104333638A (en) * 2014-10-23 2015-02-04 张勇平 Contact webpage storage method of mobile terminal and corresponding mobile terminal
CN106708813A (en) * 2015-07-14 2017-05-24 阿里巴巴集团控股有限公司 Title processing method and equipment
CN106611009A (en) * 2015-10-26 2017-05-03 任子行网络技术股份有限公司 Method and device for auditing webpage keywords
CN105491136B (en) * 2015-12-11 2019-04-26 网易(杭州)网络有限公司 Message method and device
CN110516140A (en) * 2019-08-15 2019-11-29 北京泰迪熊移动科技有限公司 A kind of information processing method, equipment and computer storage medium
CN111191430B (en) * 2019-12-27 2023-02-14 中国平安财产保险股份有限公司 Automatic table building method and device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1125108A (en) * 1997-07-02 1999-01-29 Matsushita Electric Ind Co Ltd Automatic extraction device for relative keyword, document retrieving device and document retrieving system using these devices
JP2004013726A (en) * 2002-06-10 2004-01-15 Sumitomo Electric Ind Ltd Device for extracting keyword and device for retrieving information
CN1536483A (en) * 2003-04-04 2004-10-13 陈文中 Method for extracting and processing network information and its system
CN1614587A (en) * 2003-11-07 2005-05-11 杨立伟 Method for digesting Chinese document automatically
CN1682220A (en) * 2002-07-30 2005-10-12 索尼株式会社 Automatic keyword extraction device and method, recording medium, and program
WO2006047654A2 (en) * 2004-10-25 2006-05-04 Yuanhua Tang Full text query and search systems and methods of use
CN1809830A (en) * 2003-06-20 2006-07-26 新加坡科技研究局 Method and platform for term extraction from large collection of documents

Also Published As

Publication number Publication date
CN1909522A (en) 2007-02-07

Similar Documents

Publication Publication Date Title
CN100444591C (en) Method for acquiring front-page keyword and its application system
JP5721818B2 (en) Use of model information group in search
CA2610208C (en) Learning facts from semi-structured text
US7747657B2 (en) Mapping hierarchical data from a query result into a tabular format with jagged rows
CN106126648B (en) Distributed commodity information crawler method based on redo log
US20100169311A1 (en) Approaches for the unsupervised creation of structural templates for electronic documents
US20040221233A1 (en) Systems and methods for report design and generation
TWI695277B (en) Automatic website data collection method
KR101505858B1 (en) A templet-based online composing system for analyzing reports or views of big data by providing past templets of database tables and reference fields
CN101727486A (en) Web forum information extraction system
CN102360367A (en) XBRL (Extensible Business Reporting Language) data search method and search engine
CN108733813A (en) Information extraction method, system, and medium for BBS forum web page content
Ujwal et al. Classification-based adaptive web scraper
Abramowicz et al. Filtering the Web to feed data warehouses
CN108874870A (en) Data extraction method, device, and computer storage medium
Nadee et al. Towards data extraction of dynamic content from JavaScript Web applications
Yu et al. Web content information extraction based on DOM tree and statistical information
US8266140B2 (en) Tagging system using internet search engine
JP5439100B2 (en) Document analysis system
Jou Schema extraction for deep web query interfaces using heuristics rules
CN109948015B (en) Meta search list result extraction method and system
Meng et al. Data extraction from the web based on pre-defined schema
Souza et al. ARCTIC: metadata extraction from scientific papers in pdf using two-layer CRF
WO2010147453A1 (en) System and method for designing a gui for an application program
Lim et al. Generalized and lightweight algorithms for automated web forum content extraction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: BEIJING KINGSOFT OFFICE SOFTWARE CO., LTD.

Free format text: FORMER OWNER: BEIJING JINSHAN SOFTWARE CO., LTD.

Effective date: 20140312

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100083 HAIDIAN, BEIJING TO: 100085 HAIDIAN, BEIJING

TR01 Transfer of patent right

Effective date of registration: 20140312

Address after: Kingsoft No. 33 building, 100085 Beijing city Haidian District Xiaoying Road

Patentee after: Beijing Kingsoft WPS Office Co., Ltd.

Address before: 100083, Beijing, Haidian District No. 238 North Fourth Ring Road, No. 20, Bai Yan building

Patentee before: Beijing Jinshan Software Co., Ltd.

C56 Change in the name or address of the patentee
CP01 Change in the name or title of a patent holder

Address after: Kingsoft No. 33 building, 100085 Beijing city Haidian District Xiaoying Road

Patentee after: Beijing Kingsoft Office Software Co., Ltd.

Address before: Kingsoft No. 33 building, 100085 Beijing city Haidian District Xiaoying Road

Patentee before: Beijing Kingsoft WPS Office Co., Ltd.