US20010016809A1 - Method, apparatus, and computer program product for generating a summary of a document based on common expressions appearing in the document - Google Patents

Info

Publication number
US20010016809A1
Authority
US
United States
Prior art keywords
information
commonly
held
sentence
constituting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/061,096
Other versions
US6338034B2 (en)
Inventor
Kai Ishikawa
Akitoshi Okumura
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Assigned to NEC CORPORATION: assignment of assignors interest (see document for details). Assignors: ISHIKAWA, KAI; OKUMURA, AKITOSHI
Publication of US20010016809A1
Application granted
Publication of US6338034B2
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N1/00 Sampling; Preparing specimens for investigation
    • G01N1/02 Devices for withdrawing samples
    • G01N1/10 Devices for withdrawing samples in the liquid or fluent state
    • G01N1/20 Devices for withdrawing samples in the liquid or fluent state for flowing or falling materials

Definitions

  • the present invention is a computer program executing in a computer system which comprises a central processing unit, a main memory, a secondary memory, and a bus connecting the central processing unit, the main memory, and the secondary memory.
  • the computer program is stored in the secondary memory.
  • the computer program resides in the main memory during execution.
  • the computer program includes instructions which cause the central processing unit to perform the functions or methods explained above.

Abstract

A method of summarizing a document which comprises the steps of: extracting sentence-constituting-elements from the document; tabularizing the sentence-constituting-elements corresponding to categories and sentences in the document; extracting commonly-held-information which is common to the sentence-constituting-elements in the same category from the sentence-constituting-elements; looking up common expression information which is common to plural pieces of the commonly-held-information in a thesaurus in which the commonly-held-information and the common expression information are connected by a hierarchical tree; and composing a summary based on the commonly-held-information and the common expression information.

Description

    BACKGROUND OF THE INVENTION
  • [0001] 1. Field of the Invention
  • [0002] The present invention relates to a method, an apparatus, and a computer program product for making a summary of a document on the basis of commonly owned information and relation between ideas found in the sentences in the document.
  • [0003] 2. Description of the Prior Art
  • [0004] In a computer-aided automatic summarizing system, important sentences are selected from the sentences in a document according to an important-sentence selecting rule, and then, a summary is made from the important sentences according to a summarizing rule.
  • [0005] Examples of summarizing techniques using an important-sentence selecting rule are disclosed in Japanese Patent Laid-Open Publication Nos. 2-93866 and 2-112069. In these examples, keywords are selected from among the words in a document based on the frequency distribution of occurrence of the same word, a database of keywords, and the user's decision, and then important sentences containing the keywords are selected. Japanese Patent Laid-Open Publication No. 2-215049 discloses a technique to select important parts from a document by analyzing context vectors. Moreover, Japanese Patent Laid-Open Publication No. 3-191475 discloses a technique to select important sentences from a document by applying a rule reflecting the feature of each paragraph to sentences therein.
  • [0006] The technique of automatically making a summary of a document by gathering sentences containing keywords inevitably requires the use of a database of keywords in deciding whether or not each sentence contains keywords. The resultant summary is therefore likely to be affected by the contents of the database, and the contents of the summary may be biased or lack flexibility.
  • [0007] The technique of automatically making a summary of a document on the basis of the structure of the document, such as the context thereof or the feature of the paragraphs therein, is suitable for grasping the structure of a long document and catching the transition of its subject, but is not suitable for creating a compact summary of high accuracy.
  • [0008] The technique of making a summary from a collection of important sentences in a document requires an effective method for collecting information common to the important sentences and generating a compact expression reflecting the common information. However, such a method has not yet been proposed.
  • SUMMARY OF THE INVENTION
  • [0009] An object of the present invention is to provide a method, an apparatus, and a computer program product for generating from a document an accurate and compact summary on which the point of view of a user is reflected.
  • [0010] According to the present invention, there is provided a method of summarizing a document which comprises the steps of: extracting sentence-constituting-elements from the document; tabularizing the sentence-constituting-elements corresponding to categories and sentences in the document; extracting commonly-held-information which is common to the sentence-constituting-elements in the same category from the sentence-constituting-elements; looking up common expression information which is common to plural pieces of the commonly-held-information in a thesaurus in which the commonly-held-information and the common expression information are connected by a hierarchical tree; and composing a summary based on the commonly-held-information and the common expression information.
  • [0011] Sentence-constituting-elements which come under predetermined categories are extracted from a document. The predetermined categories include “When”, “Where”, “Who”, “What”, “Why”, “How” (5W1H) and “Done”. The extracted sentence-constituting-elements are categorized based on the categories. The categorized sentence-constituting-elements are referred to as categorized information. The categorized information which is subjected to generation of the framework of a sentence in a summary is referred to as commonly-held-information. The common expression information which is common to commonly-held-information is extracted by consulting a thesaurus. The categorized information which is subjected to generation of the qualifying part of a sentence in the summary is referred to as commonly-occurring-information. A sentence in the summary is generated from pieces of commonly-held-information, pieces of common expression information and, optionally, pieces of commonly-occurring-information.
  • [0012] These and other objects, features and advantages of the present invention will become more apparent in light of the following detailed description of the best mode embodiments thereof, as illustrated in the accompanying drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • [0013] FIG. 1 is a block diagram showing the basic structure of a summarizing apparatus according to a first embodiment of the present invention;
  • [0014] FIG. 2 shows one example of a document supplied to input processing sub-section 5 of sentence-constituting-elements database producing section 1 in FIG. 1;
  • [0015] FIG. 3 shows, in the form of a table, one example of sentence-constituting-elements stored in sentence-constituting-elements database 2 in FIG. 1;
  • [0016] FIG. 4 shows examples of the contents of thesaurus 9 contained in summary-constituting-elements generating section 3 in FIG. 1;
  • [0017] FIG. 5 shows, in the form of a table, one example of commonly-held-information and common expression information which is extracted in commonly-held-information extracting sub-section 8 in FIG. 1;
  • [0018] FIG. 6 is a block diagram showing the basic structure of a summarizing apparatus according to a second embodiment of the present invention;
  • [0019] FIG. 7 shows one example of a set consisting of common expression information, common expression information linked to commonly-occurring-information, and commonly-held-information which is generated by commonly-occurring-information linking sub-section 14 in FIG. 6;
  • [0020] FIG. 8 shows examples of the contents of thesaurus 9 according to a third embodiment; and
  • [0021] FIG. 9 shows one example of a document supplied to input processing sub-section 5 in a summarizing apparatus according to a fourth embodiment.
  • DESCRIPTION OF PREFERRED EMBODIMENTS
  • [0022] Referring to FIG. 1, a summarizing apparatus according to a first embodiment of the present invention comprises sentence-constituting-elements database producing section 1, which extracts sentence-constituting-elements corresponding to each of the categories from a document and outputs the sentence-constituting-elements accompanied by the corresponding categories; sentence-constituting-elements database 2, which stores the sentence-constituting-elements divided according to the categories; summary-constituting-elements generating section 3, which extracts pieces of commonly-held-information corresponding to the categories from the sentence-constituting-elements and extracts pieces of common expression information which are common to pieces of commonly-held-information; and summary completing section 4, which generates a summary based on pieces of commonly-held-information and pieces of common expression information. The categories include “When”, “Where”, “Who”, “What”, “Why”, “How” (5W1H) and “Done”.
  • [0023] The sentence-constituting-elements database producing section 1 comprises input processing sub-section 5, which executes preprocessing such as morphological analysis to generate preprocessed information; analytic dictionary 7, which stores data used for analyzing the preprocessed information; and 5W1H decomposing sub-section 6, which analyzes the preprocessed information sentence by sentence, consulting analytic dictionary 7, to generate sentence-constituting-elements corresponding to the categories.
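The decomposition step can be illustrated with a deliberately small sketch. The source relies on morphological analysis and analytic dictionary 7, neither of which is specified in detail; the regular expression and the `LEMMAS` table below are stand-ins assumed for illustration only, and they handle just the sentence shape used in the FIG. 2 example. The names `PATTERN`, `LEMMAS`, and `decompose` are hypothetical.

```python
import re

# Assumed stand-in for morphological analysis plus analytic dictionary 7:
# one pattern for sentences of the form
# "<who> established <what> in <where> in <when>."
PATTERN = re.compile(
    r"^(?P<who>.+?) (?P<done>established) (?P<what>.+?)"
    r" in (?P<where>.+?) in (?P<when>.+?)\.$"
)

# Hypothetical lemma table mapping the surface verb to its base form.
LEMMAS = {"established": "establish"}


def decompose(sentence):
    """Split one sentence into category -> element, or return None."""
    match = PATTERN.match(sentence)
    if match is None:
        return None
    elements = match.groupdict()
    elements["done"] = LEMMAS[elements["done"]]
    return elements


elements = decompose(
    "Company B established a software venture company in Hong Kong in April, '95."
)
print(elements["who"], "|", elements["where"], "|", elements["when"])
```

A real decomposer would need to cover "Why" and "How" and arbitrary sentence shapes; this sketch only shows the input and output of the step.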
  • [0024] Sentence-constituting-elements database 2 stores sentence-constituting-elements. Fields of sentence-constituting-elements database 2 are categories and each record of sentence-constituting-elements database 2 corresponds to a sentence in the original document.
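One way to picture this table is as a list of records keyed by category. The sketch below is an assumption about layout, not the contents of FIG. 3, which is not reproduced here: the field values are taken from the three example sentences quoted in the description, and the sentence numbers 7, 23 and 51 are the ones the description uses. `make_record`, `column`, and `database` are hypothetical names.

```python
CATEGORIES = ("when", "where", "who", "what", "why", "how", "done")


def make_record(sentence_id, **elements):
    """Build one record: one field per category, None where nothing was found."""
    unknown = set(elements) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    record = {"sentence": sentence_id}
    for category in CATEGORIES:
        record[category] = elements.get(category)
    return record


# Assumed contents, derived from the example sentences in the description.
database = [
    make_record(7, who="Company A", done="establish", what="a software company",
                where="Taiwan", when="September, '95"),
    make_record(23, who="Company B", done="establish", what="a software venture company",
                where="Hong Kong", when="April, '95"),
    make_record(51, who="Company C", done="establish", what="a color television plant",
                where="China", when="December, '95"),
]


def column(database, category):
    """Return every element recorded under one category, in record order."""
    return [record[category] for record in database]


print(column(database, "where"))
```

Reading the table by column is what the later extraction steps need, since commonality is judged among elements in the same category.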
  • [0025] Summary-constituting-elements generating section 3 comprises commonly-held-information extracting sub-section 8, which extracts and outputs commonly-held-information corresponding to each of the attributes from the sentence-constituting-elements, and thesaurus 9, in which hierarchical trees of common expression information and commonly-held-information are stored. In addition to extracting commonly-held-information, commonly-held-information extracting sub-section 8 finds common expression information by consulting thesaurus 9 and outputs the commonly-held-information and the common expression information to summary completing section 4.
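A hierarchical thesaurus lookup of this kind can be sketched as a lowest-common-ancestor search over a child-to-parent map. The tree below is an assumption: FIG. 4 is not reproduced here, so the entries are inferred from the common expressions named in the description, and `PARENT`, `ancestors`, and `common_expression` are hypothetical names.

```python
# Assumed child -> parent links, inferred from the expressions in the description.
PARENT = {
    "Taiwan": "Eastern Asia",
    "Hong Kong": "Eastern Asia",
    "China": "Eastern Asia",
    "Company A": "communication & computer companies",
    "Company B": "communication & computer companies",
    "Company C": "electronics companies",
    "communication & computer companies": "electronics companies",
}


def ancestors(term, parent=PARENT):
    """Return the chain from term up to its root, term included."""
    chain = [term]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def common_expression(terms, parent=PARENT):
    """Return the lowest node that is a proper ancestor of every term, or None."""
    if not terms:
        return None
    chains = [ancestors(term, parent) for term in terms]
    for candidate in chains[0][1:]:
        if all(candidate in chain[1:] for chain in chains[1:]):
            return candidate
    return None


print(common_expression(["Company A", "Company B"]))
print(common_expression(["Company A", "Company B", "Company C"]))
```

Walking upward from the first term and taking the first shared ancestor yields the most specific shared expression, which matches the description's use of a narrower expression for two companies and a broader one for three.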
  • [0026] Summary completing section 4 comprises sentence-composing dictionary 11 in which data used for composing a sentence are stored and sentence-composing sub-section 10 which composes a summary from the commonly-held-information and common expression information consulting sentence-composing dictionary 11.
  • [0027] Next, the operation of the summarizing apparatus according to the first embodiment will be explained by example.
  • [0028] The document as shown in FIG. 2 is supplied to input processing sub-section 5 of sentence-constituting-elements database producing section 1. The document includes three sentences, which are: “Company A established a software company in Taiwan in September, '95.”; “Company B established a software venture company in Hong Kong in April, '95.”; and “Company C established a color television plant in China in December, '95.” 5W1H decomposing sub-section 6 extracts sentence-constituting-elements for every sentence in the document and for all the categories by consulting analytic dictionary 7, and supplies the sentence-constituting-elements to sentence-constituting-elements database 2, in which the group of sentence-constituting-elements in one sentence is treated as one record and each member of the group is recorded in the field representing its category, as shown in FIG. 3. In summary-constituting-elements generating section 3, commonly-held-information extracting sub-section 8 extracts “'95” as commonly-held-information for category “when” from sentences 7, 23 and 51, and “establish” as commonly-held-information for category “done” from sentences 7, 23 and 51. In addition, consulting thesaurus 9 as shown in FIG. 4, commonly-held-information extracting sub-section 8 extracts “three electronics companies” as common expression information for category “who” from sentences 7, 23, and 51; “two communication & computer companies” as common expression information for category “who” from sentences 7 and 23; “Eastern Asia” for category “where” from sentences 7, 23, and 51; “software companies and a color television plant” for category “what” from sentences 7, 23, and 51; and “software companies” for category “what” from sentences 7 and 23. The extracted commonly-held-information and common expression information are tabulated in FIG. 5.
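The extraction of “'95” and “establish” above can be sketched as finding what every element in one category has in common. The description does not say how commonality is computed; the token intersection below is an assumed, simplified rule, and `tokens` and `commonly_held` are hypothetical names.

```python
def tokens(element):
    """Split an element on whitespace and drop commas attached to tokens."""
    return [token.strip(",") for token in element.split()]


def commonly_held(elements):
    """Return the tokens shared by every element, in first-element order."""
    if not elements:
        return None
    shared = set(tokens(elements[0]))
    for element in elements[1:]:
        shared &= set(tokens(element))
    kept = [token for token in tokens(elements[0]) if token in shared]
    return " ".join(kept) or None


when_column = ["September, '95", "April, '95", "December, '95"]
done_column = ["establish", "establish", "establish"]
where_column = ["Taiwan", "Hong Kong", "China"]

print(commonly_held(when_column))   # the shared year
print(commonly_held(done_column))   # the shared action
print(commonly_held(where_column))  # nothing shared literally
```

Categories such as “where” share no literal token, which is where the thesaurus lookup described in the same paragraph takes over.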
  • [0029] Summary completing section 4 generates a summary from the commonly-held-information and the common expression information as follows:
  • [0030] “Two communication & computer companies established software companies in Eastern Asia in '95.” and “Three electronics companies established software companies and a color television plant in Eastern Asia in '95.”
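The composition step can be sketched as filling a fixed sentence template from one frame of extracted values. The two small tables below stand in for sentence-composing dictionary 11, whose real contents are not given in the description; they, the template, and the name `compose` are assumptions made for illustration.

```python
# Assumed stand-ins for sentence-composing dictionary 11.
NUMBER_WORDS = {2: "Two", 3: "Three"}
PAST_TENSE = {"establish": "established"}


def compose(frame):
    """Build one summary sentence from a frame of category values."""
    who = f"{NUMBER_WORDS[frame['count']]} {frame['who']}"
    done = PAST_TENSE[frame["done"]]
    return f"{who} {done} {frame['what']} in {frame['where']} in {frame['when']}."


two_company_frame = {
    "count": 2,
    "who": "communication & computer companies",
    "done": "establish",
    "what": "software companies",
    "where": "Eastern Asia",
    "when": "'95",
}
print(compose(two_company_frame))
```

With the frame for all three records (count 3, “electronics companies”, “software companies and a color television plant”), the same template yields the second sentence quoted above.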
  • [0031] (EMBODIMENT 2)
  • [0032] FIG. 6 shows the basic structure of a summarizing apparatus according to a second embodiment of the present invention.
  • [0033] In this embodiment, in addition to the steps of the first embodiment, commonly-occurring-information which commonly occurs in plural records is extracted and linked to commonly-held-information or common expression information, so that the summary-constituting-elements can be extended.
  • [0034] As shown in FIG. 6, in addition to commonly-held-information extracting sub-section 8 and thesaurus 9, summary-constituting-elements generating section 3 comprises commonly-occurring-information extracting sub-section 12 which extracts commonly-occurring-information which commonly occurs in plural records from sentence-constituting-elements database 2, commonly-occurring-information database 13 which stores the commonly-occurring-information, and commonly-occurring-information linking sub-section 14 which links the commonly-occurring-information to commonly-held-information or common expression information.
  • [0035] Assuming that the document as shown in FIG. 2 is supplied to input processing sub-section 5 similarly to the first embodiment, sentence-constituting-elements database 2 stores the sentence-constituting-elements as shown in FIG. 3 and commonly-held-information extracting sub-section 8 extracts commonly-held-information and common expression information as shown in FIG. 5. The operation described so far is similar to that of the summarizing apparatus of the first embodiment.
  • [0036] Commonly-occurring-information extracting sub-section 12 extracts commonly-occurring-information which commonly occurs in plural records. Exact correspondence among records is not required; only partial correspondence is required. For example, sentence-constituting-elements common to several “who”s are extracted as commonly-occurring-information. In this case, commonly-occurring-information extracting sub-section 12 extracts “annual sales prospect” and “upwardly correct” as commonly-occurring-information which is common to Companies A and B. Subsequently, commonly-occurring-information extracting sub-section 12 stores the commonly-occurring-information in commonly-occurring-information database 13. Commonly-occurring-information linking sub-section 14 links the commonly-occurring-information stored in commonly-occurring-information database 13 to commonly-held-information or common expression information by consulting thesaurus 9. Commonly-occurring-information linking sub-section 14 consults thesaurus 9 because the “who”s for the same commonly-occurring-information may be grouped in thesaurus 9. As a result, commonly-occurring-information linking sub-section 14 outputs common expression information linked to commonly-occurring-information as shown in FIG. 7. In general, commonly-occurring-information linking sub-section 14 may also output commonly-held-information linked to commonly-occurring-information.
  • [0037] Summary completing section 4 generates a summary from commonly-held-information, common expression information, and common expression information linked to commonly-occurring-information. The summary in this example becomes “Two communication & computer companies which upwardly corrected annual sales prospect established software companies in Eastern Asia in '95.”
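The extraction and linking of commonly-occurring-information might look like the following sketch. It assumes that "partial correspondence" means sharing the same “done” and “what” elements across different “who”s, and that thesaurus 9 can be read as named groups of “who” values; neither assumption is confirmed by the description. The sentences behind “annual sales prospect” are not quoted in the source, so the records below, including the Company C placeholder, are hypothetical, as are the names `commonly_occurring` and `link`.

```python
from collections import defaultdict

# Hypothetical records: only the two shared elements are named in the source.
records = [
    {"who": "Company A", "what": "annual sales prospect", "done": "upwardly correct"},
    {"who": "Company B", "what": "annual sales prospect", "done": "upwardly correct"},
    {"who": "Company C", "what": "placeholder object", "done": "placeholder action"},
]

# Assumed grouping of "who" values, standing in for thesaurus 9.
GROUPS = {
    "communication & computer companies": {"Company A", "Company B"},
    "electronics companies": {"Company A", "Company B", "Company C"},
}


def commonly_occurring(records):
    """Map each (done, what) pair shared by several "who"s to those "who"s."""
    seen = defaultdict(set)
    for record in records:
        seen[(record["done"], record["what"])].add(record["who"])
    return {pair: whos for pair, whos in seen.items() if len(whos) > 1}


def link(occurring, groups=GROUPS):
    """Attach each shared pair to the group whose members match exactly."""
    linked = {}
    for pair, whos in occurring.items():
        for expression, members in groups.items():
            if members == whos:
                linked[expression] = pair
    return linked


print(link(commonly_occurring(records)))
```

The linked pair is what lets the summary sentence carry a qualifying clause such as "which upwardly corrected annual sales prospect" on the matching common expression.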
  • [0038] (EMBODIMENT 3)
  • [0039] A summarizing apparatus according to a third embodiment of the present invention comprises a plurality of thesauri associated with commonly-held-information extracting sub-section 8. These thesauri are provided for a variety of fields of knowledge so that various summaries can be generated.
  • [0040] Assuming that the document shown in FIG. 2 is supplied to input processing sub-section 5 as in the first and second embodiments, sentence-constituting-elements database 2 stores the sentence-constituting-elements shown in FIG. 3. In addition to the thesaurus covering the geographical field shown in FIG. 4, a thesaurus covering the global-financial field shown in FIG. 8 is provided. The latter thesaurus is used in this embodiment.
  • [0041] Commonly-held-information extracting sub-section 8 extracts “Asian NIES” as common expression information for category “where” from sentences 7, 23, and 51 by consulting the thesaurus shown in FIG. 8. The resultant summary becomes “Two communication & computer companies established software companies in Asian NIES in '95.”
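  • The effect of selecting a different thesaurus can be illustrated with a lowest-common-ancestor lookup over a hierarchical tree. This is a sketch under stated assumptions: the tree contents below are hypothetical stand-ins for FIG. 4 and FIG. 8 (which are not reproduced here), the place names are illustrative, and the names `ancestors`, `common_expression`, and `thesauri` are introduced only for this example.

```python
def ancestors(term, parent):
    """Return the path from `term` up to the root of the tree, `term` first."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def common_expression(terms, parent):
    """Return the lowest node covering every term, or None if there is none.

    `parent` maps each node of a thesaurus tree to its parent node.
    """
    if not terms:
        return None
    first, *rest = [ancestors(t, parent) for t in terms]
    for node in first:  # walk upward from the first term
        if all(node in path for path in rest):
            return node
    return None


# Hypothetical thesauri, one child -> parent map per field of knowledge.
thesauri = {
    "geographical": {
        "Korea": "Eastern Asia",
        "Taiwan": "Eastern Asia",
        "Hong Kong": "Eastern Asia",
        "Eastern Asia": "Asia",
    },
    "global-financial": {
        "Korea": "Asian NIES",
        "Taiwan": "Asian NIES",
        "Hong Kong": "Asian NIES",
    },
}

places = ["Korea", "Taiwan", "Hong Kong"]  # illustrative "where" elements
geo = common_expression(places, thesauri["geographical"])
fin = common_expression(places, thesauri["global-financial"])
```

  • With these assumed trees, the same “where” elements generalize to “Eastern Asia” under the geographical thesaurus and to “Asian NIES” under the global-financial one, which mirrors how the choice of thesaurus changes the resulting summary. The sketch assumes each thesaurus is a proper tree with no cycles.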
  • [0042] (EMBODIMENT 4)
  • [0043] In this embodiment, commonly-held-information extracting sub-section 8 extracts commonly-held-information according to a priority of categories set by a user, so that a summary reflecting the information of strongest interest to the user can be generated flexibly.
  • [0044] Assume that a document containing “Company A established a semiconductor plant in Korea in January, '93.”, “Company A established a software company in Taiwan in September, '95.”, and “Company B established a personal computer plant in Taiwan in June, '94.”, as shown in FIG. 9, is supplied to input processing sub-section 5.
  • [0045] When priority is given to “Who” and “What”, the resultant summary becomes “Company A established a semiconductor plant and a software company in Eastern Asia in '93 and '95, respectively.”
  • [0046] When priority is given to “Where” and “Done”, the resultant summary becomes “Two communication & computer companies established a personal computer plant and a software company in Taiwan in '94 and '95, respectively.”
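  • One way to picture the category priority of the fourth embodiment is as a record-selection step. The sketch below is an assumption about how such a step could work, not the method disclosed in the specification: it keeps the records that share the most frequent value in the first prioritized category having a repeated value. The function name `select_by_priority` and the record layout are hypothetical; the record values come from the three sentences of FIG. 9 quoted above.

```python
from collections import Counter

# The three sentences of the example, split into categories.
records = [
    {"who": "Company A", "done": "established", "what": "semiconductor plant",
     "where": "Korea", "when": "'93"},
    {"who": "Company A", "done": "established", "what": "software company",
     "where": "Taiwan", "when": "'95"},
    {"who": "Company B", "done": "established", "what": "personal computer plant",
     "where": "Taiwan", "when": "'94"},
]


def select_by_priority(records, priority):
    """Pick the records sharing a value in the first usable priority category.

    Returns (category, shared_value, selected_records). If no prioritized
    category has a value held by two or more records, every record is kept.
    """
    if not records:
        return None, None, []
    for category in priority:
        value, count = Counter(r[category] for r in records).most_common(1)[0]
        if count >= 2:
            selected = [r for r in records if r[category] == value]
            return category, value, selected
    return None, None, list(records)


who_cat, who_val, who_sel = select_by_priority(records, ["who", "what"])
where_cat, where_val, where_sel = select_by_priority(records, ["where", "done"])
```

  • Under this assumed rule, prioritizing “Who” keeps the two Company A sentences, and prioritizing “Where” keeps the two Taiwan sentences, which matches the records underlying the two summaries in the example. The remaining categories would then be generalized through the thesaurus, a step this sketch leaves out.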
  • [0047] Preferably, the present invention is a computer program executing in a computer system which comprises a central processing unit, a main memory, a secondary memory, and a bus connecting the central processing unit, the main memory, and the secondary memory. The computer program is stored in the secondary memory and resides in the main memory during execution. The computer program includes instructions which cause the central processing unit to perform the functions or methods explained above.
  • [0048] Although the present invention has been shown and explained with respect to the best mode embodiments thereof, it should be understood by those skilled in the art that the foregoing and various other changes, omissions, and additions in the form and detail thereof may be made therein without departing from the spirit and scope of the present invention.

Claims (15)

What is claimed is:
1. A method of summarizing a document which comprises the steps of:
extracting sentence-constituting-elements from the document;
tabularizing said sentence-constituting-elements corresponding to categories and sentences in said document;
extracting commonly-held-information which is common to said sentence-constituting-elements in the same category from said sentence-constituting-elements;
looking up common expression information which is common to plural pieces of said commonly-held-information in a thesaurus in which said commonly-held-information and said common expression information are connected by a hierarchical tree; and
composing a summary based on said commonly-held-information and said common expression information.
2. The method as set forth in claim 1, wherein said categories include “When”, “Where”, “Who”, “What”, “Why”, “How” and “Done”.
3. The method as set forth in claim 1, wherein said thesaurus is plural and one of said thesauri is selected corresponding to a user's operation.
4. The method as set forth in claim 1, which further comprises the steps of:
extracting commonly-occurring-information which is common to several sentences in said document in one or more categories; and
linking said commonly-occurring-information to said commonly-held-information or said common expression information.
5. The method as set forth in claim 1, wherein priority is given to a part of said categories corresponding to a user's operation.
6. An apparatus of summarizing a document which comprises:
means for extracting sentence-constituting-elements from the document;
means for tabularizing said sentence-constituting-elements corresponding to categories and sentences in said document;
means for extracting commonly-held-information which is common to said sentence-constituting-elements in the same category from said sentence-constituting-elements;
thesaurus in which said commonly-held-information and common expression information are connected by a hierarchical tree;
means for looking up said common expression information which is common to plural pieces of said commonly-held-information in said thesaurus; and
means for composing a summary based on said commonly-held-information and said common expression information.
7. The apparatus as set forth in claim 6, wherein said categories include “When”, “Where”, “Who”, “What”, “Why”, “How” and “Done”.
8. The apparatus as set forth in claim 6, wherein said thesaurus is plural and further comprises means for selecting one of said thesauri corresponding to a user's operation.
9. The apparatus as set forth in claim 6, which further comprises:
means for extracting commonly-occurring-information which is common to several sentences in said document in one or more categories; and
means for linking said commonly-occurring-information to said commonly-held-information or said common expression information.
10. The apparatus as set forth in claim 6, which further comprises means for giving priority to a part of said categories corresponding to a user's operation.
11. A computer program product comprising a computer useable medium having structured data and computer program logic stored therein,
said structured data comprising:
thesaurus in which commonly-held-information and common expression information are connected by a hierarchical tree; and
said computer program logic comprising:
means for extracting sentence-constituting-elements from the document;
means for tabularizing said sentence-constituting-elements corresponding to categories and sentences in said document;
means for extracting commonly-held-information which is common to said sentence-constituting-elements in the same category from said sentence-constituting-elements;
means for looking up said common expression information which is common to plural pieces of said commonly-held-information in said thesaurus; and
means for composing a summary based on said commonly-held-information and said common expression information.
12. The computer program product as set forth in claim 11, wherein said categories include “When”, “Where”, “Who”, “What”, “Why”, “How” and “Done”.
13. The computer program product as set forth in claim 11, wherein said thesaurus is plural and said computer program logic further comprises means for selecting one of said thesauri corresponding to a user's operation.
14. The computer program product as set forth in claim 11, wherein said computer program logic further comprises:
means for extracting commonly-occurring-information which is common to several sentences in said document in one or more categories; and
means for linking said commonly-occurring-information to said commonly-held-information or said common expression information.
15. The computer program product as set forth in claim 11, wherein said computer program logic further comprises means for giving priority to a part of said categories corresponding to a user's operation.
US09/061,096 1997-04-17 1998-04-16 Method, apparatus, and computer program product for generating a summary of a document based on common expressions appearing in the document Expired - Fee Related US6338034B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP9100432A JP3001047B2 (en) 1997-04-17 1997-04-17 Document summarization device
JP9-100432 1997-04-17
JP09-100432 1997-04-17

Publications (2)

Publication Number Publication Date
US20010016809A1 true US20010016809A1 (en) 2001-08-23
US6338034B2 US6338034B2 (en) 2002-01-08

Family

ID=14273798

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/061,096 Expired - Fee Related US6338034B2 (en) 1997-04-17 1998-04-16 Method, apparatus, and computer program product for generating a summary of a document based on common expressions appearing in the document

Country Status (2)

Country Link
US (1) US6338034B2 (en)
JP (1) JP3001047B2 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030126553A1 (en) * 2001-12-27 2003-07-03 Yoshinori Nagata Document information processing method, document information processing apparatus, communication system and memory product
US20050234893A1 (en) * 1999-04-27 2005-10-20 Surfnotes, Inc. Method and apparatus for improved information representation
US20080270119A1 (en) * 2007-04-30 2008-10-30 Microsoft Corporation Generating sentence variations for automatic summarization
US7475334B1 (en) * 2000-01-19 2009-01-06 Alcatel-Lucent Usa Inc. Method and system for abstracting electronic documents
US20090018819A1 (en) * 2007-07-11 2009-01-15 At&T Corp. Tracking changes in stratified data-streams
US20100057710A1 (en) * 2008-08-28 2010-03-04 Yahoo! Inc Generation of search result abstracts
US20130046754A1 (en) * 2001-03-21 2013-02-21 Eugene M. Lee Method and system to formulate intellectual property search and to organize results of intellectual property search
WO2014063354A1 (en) * 2012-10-26 2014-05-01 Hewlett-Packard Development Company, L.P. Method for summarizing document

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6789230B2 (en) * 1998-10-09 2004-09-07 Microsoft Corporation Creating a summary having sentences with the highest weight, and lowest length
JP2000348041A (en) 1999-06-03 2000-12-15 Nec Corp Document retrieval method, device therefor and mechanically readable recording medium
JP2001101207A (en) * 1999-09-30 2001-04-13 Oki Electric Ind Co Ltd Document summarizing device
GB2360609A (en) * 2000-03-22 2001-09-26 Ibm Dynamically generating expanded user messages in a computer system
US6763500B2 (en) 2000-12-01 2004-07-13 Microsoft Corporation Real-time-on-demand dynamic document generation
US6990634B2 (en) * 2001-04-27 2006-01-24 The United States Of America As Represented By The National Security Agency Method of summarizing text by sentence extraction
US7925616B2 (en) 2001-06-19 2011-04-12 Microstrategy, Incorporated Report system and method using context-sensitive prompt objects
US7861161B1 (en) 2001-06-19 2010-12-28 Microstrategy, Inc. Report system and method using prompt objects
US7356758B1 (en) 2001-06-19 2008-04-08 Microstrategy Incorporated System and method for run-time report resolution of reports that include prompt objects
US7302639B1 (en) * 2001-06-19 2007-11-27 Microstrategy, Inc. Report system and method using prompt in prompt objects
US7251781B2 (en) * 2001-07-31 2007-07-31 Invention Machine Corporation Computer based summarization of natural language documents
US8799776B2 (en) * 2001-07-31 2014-08-05 Invention Machine Corporation Semantic processor for recognition of whole-part relations in natural language documents
US9009590B2 (en) * 2001-07-31 2015-04-14 Invention Machines Corporation Semantic processor for recognition of cause-effect relations in natural language documents
JP2003248676A (en) * 2002-02-22 2003-09-05 Communication Research Laboratory Solution data compiling device and method, and automatic summarizing device and method
JP3624186B2 (en) * 2002-03-15 2005-03-02 Tdk株式会社 Control circuit for switching power supply device and switching power supply device using the same
JP2004030021A (en) * 2002-06-24 2004-01-29 Oki Electric Ind Co Ltd Document processor and processing method
KR100463655B1 (en) * 2002-11-15 2004-12-29 삼성전자주식회사 Text-to-speech conversion apparatus and method having function of offering additional information
US7376552B2 (en) * 2003-08-12 2008-05-20 Wall Street On Demand Text generator with an automated decision tree for creating text based on changing input data
US8868670B2 (en) * 2004-04-27 2014-10-21 Avaya Inc. Method and apparatus for summarizing one or more text messages using indicative summaries
JP2006023878A (en) * 2004-07-07 2006-01-26 Quin Land Co Ltd Data extraction system
KR100785927B1 (en) * 2006-06-02 2007-12-17 삼성전자주식회사 Method and apparatus for providing data summarization
US9031947B2 (en) * 2007-03-27 2015-05-12 Invention Machine Corporation System and method for model element identification
US7925496B1 (en) 2007-04-23 2011-04-12 The United States Of America As Represented By The Secretary Of The Navy Method for summarizing natural language text
KR100895535B1 (en) 2007-05-10 2009-04-30 (주) 지육공팔 컨설팅그룹 Data search device and method thereof
EP2406739A2 (en) * 2009-03-13 2012-01-18 Invention Machine Corporation System and method for knowledge research
KR20120009446A (en) * 2009-03-13 2012-01-31 인벤션 머신 코포레이션 System and method for automatic semantic labeling of natural language texts
JP5388038B2 (en) * 2009-12-28 2014-01-15 独立行政法人情報通信研究機構 Document summarization apparatus, document processing apparatus, and program
JP6338842B2 (en) * 2013-11-01 2018-06-06 キヤノンメディカルシステムズ株式会社 Medical processing apparatus and medical processing program
US10387538B2 (en) 2016-06-24 2019-08-20 International Business Machines Corporation System, method, and recording medium for dynamically changing search result delivery format
US10482180B2 (en) * 2017-11-17 2019-11-19 International Business Machines Corporation Generating ground truth for questions based on data found in structured resources
JP2021114184A (en) * 2020-01-20 2021-08-05 シャープ株式会社 Summary generation device, summary generation method and program

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4965763A (en) * 1987-03-03 1990-10-23 International Business Machines Corporation Computer method for automatic extraction of commonly specified information from business correspondence
JPH01290076A (en) 1988-05-18 1989-11-21 Canon Inc Natural language processor
JP2783558B2 (en) * 1988-09-30 1998-08-06 株式会社東芝 Summary generation method and summary generation device
JPH02112069A (en) 1988-10-21 1990-04-24 Hitachi Ltd Automatic summarizing system
JPH03191475A (en) 1989-12-20 1991-08-21 Nec Corp Document summarizing system
JPH0418673A (en) * 1990-05-11 1992-01-22 Hitachi Ltd Method and device for extracting text information
JPH0748217B2 (en) 1990-07-17 1995-05-24 工業技術院長 Document summarization device
JPH0743728B2 (en) 1990-08-02 1995-05-15 工業技術院長 Summary sentence generation method
EP0523269A1 (en) * 1991-07-18 1993-01-20 International Business Machines Corporation Computer system for data management
JPH05233729A (en) 1992-02-24 1993-09-10 Nippon Telegr & Teleph Corp <Ntt> Summary selection type information providing device
JP2944346B2 (en) 1993-01-20 1999-09-06 シャープ株式会社 Document summarization device
JPH06259423A (en) 1993-03-02 1994-09-16 N T T Data Tsushin Kk Summary automatically generating system
US5638543A (en) * 1993-06-03 1997-06-10 Xerox Corporation Method and apparatus for automatic document summarization
JP3300142B2 (en) 1993-12-17 2002-07-08 シャープ株式会社 Natural language processor
US5689716A (en) * 1995-04-14 1997-11-18 Xerox Corporation Automatic method of generating thematic summaries
US5778397A (en) * 1995-06-28 1998-07-07 Xerox Corporation Automatic method of generating feature probabilities for automatic extracting summarization
US5918240A (en) * 1995-06-28 1999-06-29 Xerox Corporation Automatic method of extracting summarization using feature probabilities
US5838323A (en) * 1995-09-29 1998-11-17 Apple Computer, Inc. Document summary computer system user interface
US5848191A (en) * 1995-12-14 1998-12-08 Xerox Corporation Automatic method of generating thematic summaries from a document image without performing character recognition
US5924108A (en) * 1996-03-29 1999-07-13 Microsoft Corporation Document summarizer for word processors
US5857184A (en) * 1996-05-03 1999-01-05 Walden Media, Inc. Language and method for creating, organizing, and retrieving data from a database
US5963965A (en) * 1997-02-18 1999-10-05 Semio Corporation Text processing and retrieval system and method

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050234893A1 (en) * 1999-04-27 2005-10-20 Surfnotes, Inc. Method and apparatus for improved information representation
US7882115B2 (en) * 1999-04-27 2011-02-01 Scott Hirsch Method and apparatus for improved information representation
US7475334B1 (en) * 2000-01-19 2009-01-06 Alcatel-Lucent Usa Inc. Method and system for abstracting electronic documents
US20090083621A1 (en) * 2000-01-19 2009-03-26 Kermani Bahram G Method and system for abstracting electronic documents
US20130046754A1 (en) * 2001-03-21 2013-02-21 Eugene M. Lee Method and system to formulate intellectual property search and to organize results of intellectual property search
US20030126553A1 (en) * 2001-12-27 2003-07-03 Yoshinori Nagata Document information processing method, document information processing apparatus, communication system and memory product
US20080270119A1 (en) * 2007-04-30 2008-10-30 Microsoft Corporation Generating sentence variations for automatic summarization
US20090018819A1 (en) * 2007-07-11 2009-01-15 At&T Corp. Tracking changes in stratified data-streams
US20100057710A1 (en) * 2008-08-28 2010-03-04 Yahoo! Inc Generation of search result abstracts
US8984398B2 (en) * 2008-08-28 2015-03-17 Yahoo! Inc. Generation of search result abstracts
WO2014063354A1 (en) * 2012-10-26 2014-05-01 Hewlett-Packard Development Company, L.P. Method for summarizing document
CN104871151A (en) * 2012-10-26 2015-08-26 惠普发展公司,有限责任合伙企业 Method for summarizing document
EP2912569A4 (en) * 2012-10-26 2016-06-15 Hewlett Packard Development Co Method for summarizing document
US9727556B2 (en) 2012-10-26 2017-08-08 Entit Software Llc Summarization of a document

Also Published As

Publication number Publication date
US6338034B2 (en) 2002-01-08
JPH10293762A (en) 1998-11-04
JP3001047B2 (en) 2000-01-17

Legal Events

Date Code Title Description
AS Assignment

Owner name: NEC CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ISHIKAWA, KAI;OKUMURA, AKITOSHI;REEL/FRAME:009105/0466

Effective date: 19980407

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20140108