US20070282809A1 - Method and apparatus for concept-based visual - Google Patents

Info

Publication number
US20070282809A1
Authority
US
United States
Prior art keywords
concept
returned
document
accordance
term
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US11/526,409
Other versions
US7809717B1 (en
Inventor
Orland Hoeber
Xue-Dong Yang
Yiyu Yao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Blackbird Tech LLC
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=38791565&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=US20070282809(A1) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by Individual
Assigned to UNIVERSITY OF REGINA: assignment of assignors interest (see document for details). Assignors: HOEBER, ORLAND; YANG, XUE-DONG; YAO, YIYU
Publication of US20070282809A1
Application granted
Publication of US7809717B1
Assigned to BLACKBIRD TECH LLC: assignment of assignors interest (see document for details). Assignors: HOEBER, ORLAND; YANG, XUE-DONG; YAO, YIYU
Assigned to YAO, YIYU; HOEBER, ORLAND; YANG, XUE-DONG: assignment of assignors interest (see document for details). Assignors: UNIVERSITY OF REGINA
Assigned to SECURITY FINANCE LLC: security interest (see document for details). Assignors: BLACKBIRD TECH LLC
Expired - Fee Related
Adjusted expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/338Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Definitions

  • the World Wide Web has given computer users on the Internet access to vast amounts of information in the form of billions of Web pages. Each of these pages can be accessed directly by a user typing the URL (uniform resource locator) of a web page into a web browser on the user's computer, but often a person is more likely to reach a website by finding it with a search engine.
  • a search engine allows a user to input a search query made up of words or terms that a user thinks will be used in the web pages containing the information he or she is looking for. The search engine will attempt to match web pages to the terms in the search query and will then return the located web pages to the user.
  • the search results generated from a user's query typically consist of a collection of document surrogates, each of which contains summary information, attributes, and other meta-data about the matched documents. These document surrogates are often presented in a simple list-based format, displaying the title of the document, a snippet containing.
  • a user can then select one of the returned entries to view the corresponding web page.
  • One solution that can be used to address these numerous search results is for the user to reformulate his or her search query to narrow the search, so that fewer documents are located matching the search query. However, in many cases there may be high quality relevant documents buried in the search results set that were missed because the user did not look at enough search result pages.
  • Another method that has also been used is to cluster the search results such that documents that are similar to one another are grouped together.
  • a user navigates the clusters in order to narrow down the search results and avoid clusters of irrelevant documents.
  • the user will select the relevant clusters and view lists of the returned documents in which a large portion are relevant to the requirements of the user.
  • One of the problems with these systems is determining what the clusters should be centered around and determining an adequate description of each cluster. If the description does not correctly describe the documents contained in the cluster, a user may either choose clusters that are not relevant or entirely miss clusters that may contain relevant documents.
  • a method for visually coding search results based on at least one concept.
  • the method comprises: using a search query containing a plurality of search terms to conduct a search of a plurality of computer readable documents and obtain the search results, the search results comprising at least one returned document; obtaining the at least one concept by matching the plurality of search terms to the at least one concept; evaluating the similarity between the at least one returned document and the at least one concept; and displaying the at least one returned document with an accordance indicator.
  • the accordance indicator indicates the similarity between the at least one returned document and the at least one concept.
  • a data processing system for visually coding search results based on at least one concept.
  • the data processing system comprises: at least one processor; a memory operatively coupled to the at least one processor; a display device operative to display data; and a program module stored in the memory and operative for providing instructions to the at least one processor.
  • the at least one processor is responsive to the instructions of the program module.
  • the program module is operative for: using a search query containing at least one search term to conduct a search of a plurality of computer readable documents and obtain search results comprising a plurality of returned documents; obtaining the at least one concept by matching the at least one search term to the at least one concept; evaluating the similarity between a returned document and the at least one concept; and displaying on the display device the returned document with an accordance indicator.
  • the accordance indicator indicates the similarity between the returned document and the at least one concept.
  • a method of visually sorting search results based on at least one concept comprises: using a search query containing a plurality of search terms to conduct a search of a plurality of computer readable documents and obtain the search results, the search results comprising a first returned document and a second returned document; obtaining the at least one concept by matching the plurality of search terms to the at least one concept; determining a first accordance value by evaluating the similarity between the first returned document and the at least one concept and a second accordance value by evaluating the similarity between the second returned document and the at least one concept; and displaying the first returned document and second returned document sorted in an order based on the first accordance value and the second accordance value.
  • a data processing system for visually sorting search results based on at least one concept.
  • the data processing system comprises: at least one processor; a memory operatively coupled to the at least one processor; a display device operative to display data; and a program module stored in the memory and operative for providing instructions to the at least one processor.
  • the at least one processor is responsive to the instructions of the program module.
  • the program module is operative for: using a search query containing a plurality of search terms to conduct a search of a plurality of computer readable documents and obtain the search results, the search results comprising a first returned document and a second returned document; obtaining the at least one concept by matching the plurality of search terms to the at least one concept; determining a first accordance value by evaluating the similarity between the first returned document and the at least one concept and a second accordance value by evaluating the similarity between the second returned document and the at least one concept; and displaying the first returned document and second returned document sorted in an order based on the first accordance value and the second accordance value.
  • a computer readable memory having recorded thereon statements and instructions for execution by a computer to carry out the above methods is provided.
  • FIG. 1 is a schematic illustration of a data processing system suitable for supporting the operation of methods in accordance with the present invention
  • FIG. 2A is a schematic illustration of a network configuration suitable for supporting the operation of methods in accordance with the present invention wherein the data processing system is connected over a network to a plurality of servers operating as a search engine;
  • FIG. 2B is a schematic illustration of a network configuration suitable for supporting the operation of methods in accordance with the present invention wherein the data processing system is configured as a server and a remote device is used to access the data processing system;
  • FIG. 3 is an architectural schematic of a data structure for a concept knowledge base in accordance with an aspect of the present invention
  • FIG. 4 illustrates a flowchart of a method of automatically creating a data structure containing a concept knowledge base in accordance with the present invention
  • FIG. 5 is a schematic diagram of an overview of a software system in accordance with an aspect of the present invention.
  • FIG. 6 illustrates a flowchart of a method for generating a set of concepts based on the search terms in the search query in accordance with an aspect of the present invention
  • FIG. 7 is a typical document surrogate data object, which is commonly provided as a returned document by a search engine as one of a set of search results;
  • FIG. 8 is a flowchart of a method of determining accordance values for each returned document in a set of search results
  • FIG. 9 is an embodiment of a graphical interface displaying a number of returned documents with accordance indicators
  • FIG. 10 is another embodiment of a graphical interface displaying a number of returned documents with accordance indicators and sorted by accordance values.
  • FIG. 11 is an embodiment of a graphical interface displaying a number of returned documents with a plurality of accordance indicators for each displayed returned document.
  • FIG. 1 illustrates a data processing system 1 suitable for supporting the operation of methods in accordance with the present invention.
  • the data processing system 1 could be a personal computer, server, mobile computing device, cell phone, etc.
  • the data processing system 1 typically comprises: at least one processing unit 3 ; a memory storage device 4 ; at least one input device 5 ; a display device 6 and a program module 8 .
  • the processing unit 3 can be any processor that is typically known in the art with the capacity to run the program and is operatively coupled to the memory storage device 4 through a system bus. In some circumstances the data processing system 1 may contain more than one processing unit 3 .
  • the memory storage device 4 is operative to store data and can be any storage device that is known in the art, such as a local hard-disk, etc. and can include local memory employed during actual execution of the program code, bulk storage, and cache memories for providing temporary storage. Additionally, the memory storage device 4 can be a database that is external to the data processing system 1 but operatively coupled to the data processing system 1 .
  • the input device 5 can be any device suitable for inputting data into the data processing system 1, such as a keyboard, mouse or data port (for example a network connection), and is operatively coupled to the processing unit 3, allowing the processing unit 3 to receive information from the input device 5.
  • the display device 6 is a CRT, LCD monitor, etc. operatively coupled to the data processing system 1 and operative to display information.
  • the display device 6 could be a stand-alone screen or if the data processing system 1 is a mobile device, the display device 6 could be integrated into a casing containing the processing unit 3 and the memory storage device 4 .
  • the program module 8 is stored in the memory storage device 4 and operative to provide instructions to processing unit 3 and the processing unit 3 is responsive to the instructions from the program module 8 .
  • FIG. 2A illustrates a network configuration wherein the data processing system 1 is connected over a network 55 to a plurality of servers 50 operating as a search engine.
  • FIG. 2B illustrates a network configuration wherein the data processing system 1 is configured as a server and a remote device 60 , such as another computer, a PDA, cell phone or other mobile device connected to the Internet, is used to access the data processing system 1 .
  • the data processing system 1 runs the majority of the software and methods, in accordance with the present invention, and accesses a plurality of servers 50 operating as a search engine to conduct a web search.
  • the remote device 60 does not need to have the capacity necessary to contain all the necessary data structures and run all the methods.
  • a computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the medium can be an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system (or apparatus or device) or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
  • Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD.
  • FIG. 3 illustrates an architectural schematic of a data structure for a concept knowledge base 10 , in accordance with an aspect of the present invention.
  • the data structure is stored on a memory and is accessible by an application program being executed by a data processing system, such as the data processing system 1 illustrated in FIG. 1 .
  • the data structure contains information that is accessible by the application program.
  • the concept knowledge base 10 contains information relating to a field of knowledge.
  • the concept knowledge base 10 could contain information related to the field of science.
  • the concept knowledge base 10 contains a number of concept data objects 12 , a number of term data objects 14 and a number of edge data objects 16 .
  • Each concept data object 12 contains a concept field 13 containing a concept that is related to a specific concept falling within the field of knowledge of the concept knowledge base 10 .
  • the concept field 13 typically contains a text string identifying the concept.
  • If the concept knowledge base 10 is for computer science, there may be a concept data object 12 with the concept field 13 containing the text string “computer graphics”, another concept data object 12 with the concept field 13 containing the text string “distributed computing”, another concept data object 12 with the concept field 13 containing the text string “artificial intelligence”, etc.
  • Each term data object 14 contains a term field 15 containing a text string.
  • the text string contains a word or phrase that describes a concept of one of the concept data objects 12 .
  • Each concept data object 12 is associated with one or more term data objects 14 and each term data object 14 is associated with one or more concept data objects 12.
  • the association of a concept data object 12 and a term data object 14 is defined by an edge data object 16 which contains a weight field 18 .
  • a term data object 14 that is associated with a concept data object 12 contains a term in the term field 15 that describes the concept contained in the concept field 13 of the concept data object 12 .
  • the relevancy of the term in the term field 15 of the term data object 14 to the concept in the concept field 13 of an associated concept data object 12 is represented by a weight in the weight field 18 of the edge data object 16 .
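  • As a minimal sketch of the data structure described above, the concept knowledge base 10 can be modeled in Python as two kinds of nodes joined by weighted edges. The class and method names (Concept, Term, KnowledgeBase, link, concepts_for, terms_for) and the example weights are illustrative assumptions, not part of the described system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """Concept data object 12; `name` plays the role of concept field 13."""
    name: str


@dataclass(frozen=True)
class Term:
    """Term data object 14; `text` plays the role of term field 15."""
    text: str


@dataclass
class KnowledgeBase:
    """Concept knowledge base 10; each dict entry stands for an edge data object 16."""
    # (term, concept) -> weight, i.e. the weight field 18 of the edge
    edges: dict = field(default_factory=dict)

    def link(self, term: Term, concept: Concept, weight: float) -> None:
        """Create or update the edge between a term and a concept."""
        self.edges[(term, concept)] = weight

    def concepts_for(self, term: Term) -> dict:
        """Concepts associated with a term, mapped to the edge weight."""
        return {c: w for (t, c), w in self.edges.items() if t == term}

    def terms_for(self, concept: Concept) -> dict:
        """Terms associated with a concept, mapped to the edge weight."""
        return {t: w for (t, c), w in self.edges.items() if c == concept}


# Hypothetical content for a computer science knowledge base.
kb = KnowledgeBase()
graphics = Concept("computer graphics")
ai = Concept("artificial intelligence")
kb.link(Term("rendering"), graphics, 0.8)
kb.link(Term("model"), graphics, 0.3)
kb.link(Term("model"), ai, 0.7)
```

  • Because a term can be linked to several concepts and a concept to several terms, the association is many-to-many; keying the edges by the (term, concept) pair keeps exactly one weight per association.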
  • FIG. 4 illustrates a flowchart of a method of automatically creating a data structure containing a concept knowledge base in accordance with the present invention.
  • Method 100 comprises the steps of: determining a concept 110; selecting a document describing the concept 120; determining terms in the document to be analyzed 130; determining the frequency of the selected terms 140; checking if there are any remaining documents describing a concept 150; calculating a preliminary weight 160; checking if there are any more concepts 170; and normalizing all of the weights 180.
  • the method takes a number of documents and/or descriptions in computer readable form that describe a number of different concepts in a knowledge area and uses the documents to automatically generate a data structure of a concept knowledge base 10 , as shown in FIG. 3 .
  • the method 100 begins with step 110 .
  • a concept falling within the concept knowledge base is determined and a concept data object is created with information identifying the concept contained in the concept field.
  • Each concept will be described by one or more documents or descriptions in computer readable format. Once a concept has been determined at step 110, one or more documents describing the concept are identified and at step 120 one of these documents is selected to be analyzed.
  • the method 100 determines the terms to be analyzed in the document. For each selected term, method 100 creates a term data object with the term field containing the term, if a term data object containing the term does not already exist. An edge data object indicating the association of the term data object and the concept data object is also created; after the method 100 is completed, it will contain a weight indicating the relation of the term data object to the associated concept data object containing the concept described by the document being analyzed.
  • the terms that are analyzed can include all of the words used in the document or only specific words in the document. For example, common words that are basically non-descriptive, such as “the”, “a”, “this”, etc. may be excluded from the terms that are selected for analysis at step 130.
  • the frequency of each of the selected terms in the selected document is determined.
  • the occurrence of each selected term in the document is determined.
  • the occurrence of a selected term t_j in the document being analyzed can easily be determined, via text matching, and is defined by the function:
  • the frequency of each of the terms appearing in the document is then averaged based on the number of occurrences of all of the terms in the document. For example, the averaging could be done using the following equation:
  • This equation simply divides the frequency or tally of a term being analyzed by the total number of terms being analyzed in document d_ik.
  • the eventual weight determined for each association between a term node and a concept node takes into account the number of occurrences of a term in the document and provides a potentially more relevant indicator of the relation of the term data object to the concept data object, because words or terms that appear often relative to the total number of terms will be given more weight.
  • This preliminary averaging is used to try to prevent a single large document describing a concept from providing term weights that overshadow the weights provided by a number of smaller documents.
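  • The per-document averaging described above can be sketched as follows. This is a minimal illustration assuming whitespace tokenization, lowercase matching and a small hypothetical stop-word list; the function name normalized_term_frequencies is not taken from the described method.

```python
from collections import Counter

# Hypothetical list of non-descriptive words excluded at step 130.
STOP_WORDS = {"the", "a", "this", "and"}


def normalized_term_frequencies(text, stop_words=STOP_WORDS):
    """Divide each selected term's tally by the total tally of selected terms."""
    terms = [w for w in text.lower().split() if w not in stop_words]
    counts = Counter(terms)            # step 140: occurrences of each term
    total = sum(counts.values())       # total number of terms analyzed
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


freqs = normalized_term_frequencies("The shader renders the scene and the shader")
# "shader" is 2 of the 4 analyzed terms; "renders" and "scene" are 1 of 4 each.
```

  • Because every document's frequencies sum to 1, a long document and a short document contribute on the same scale when they are later combined for a concept.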
  • At step 150 the method 100 checks to see if there are any more documents related to the concept that have not been analyzed. If there are more documents to be analyzed related to the concept, the method 100 returns to step 120, selects the next unanalyzed document and repeats steps 130, 140 and 150. As long as more documents related to the concept exist, step 150 causes the method 100 to continue until all of the documents have been analyzed. When there are no more documents related to the concept to be analyzed, the method 100 continues on to step 160.
  • the method 100 calculates a preliminary weight for each of the terms used in the documents related to a single concept. For each term an interim weight w_ij* is calculated taking into account the average term frequency of the documents related to the concept.
  • This calculation is used to prevent concepts with a large number of documents from producing term weights that overshadow term weights from concepts with fewer documents describing the concept.
  • the method 100 checks to see if there are any more concepts left to be evaluated. If there are concepts remaining that have not been analyzed, the method 100 returns to step 110 and the next concept is selected to be analyzed. The method 100 then repeats steps 120, 130, 140, 150 and 160, determining a preliminary weight for each of the terms appearing in the documents describing the selected concept. The method 100 continues to analyze each concept, repeating steps 110, 120, 130, 140, 150, 160 and 170 until all of the concepts have been analyzed, at which point the method 100 continues on to step 180.
  • the method 100 determines a normalized weight for each of the terms associated with the concepts.
  • the preliminary weight w_ij* previously determined for each association between a term t_j and a concept is divided by the sum of all of the preliminary weights determined for the term t_j across the r concepts to which it is connected. This equation is shown as follows:
  • the normalization of the weights is used to prevent common terms that are included in many of the documents for many concepts from having higher weight values than other less common terms. Such terms are often of little value in describing a concept, yet without this normalization step they would receive very high weights. With normalization, the weights of these common terms are significantly reduced.
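  • Steps 110 to 180 of method 100 can be sketched end to end as follows. This is an illustration under stated assumptions: the interim weight of step 160 is taken to be the arithmetic mean of the per-document frequencies over the documents of a concept, tokenization is by whitespace, and the names term_frequencies and build_weights and the sample corpus are hypothetical.

```python
from collections import Counter, defaultdict


def term_frequencies(text):
    """Per-document frequency of each term, divided by the document's term count."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}


def build_weights(corpus):
    """corpus: {concept: [document text, ...]} -> {(term, concept): weight}."""
    interim = {}
    for concept, docs in corpus.items():              # steps 110 and 170
        sums = defaultdict(float)
        for doc in docs:                              # steps 120 to 150
            for term, freq in term_frequencies(doc).items():
                sums[term] += freq
        for term, s in sums.items():                  # step 160 (assumed mean)
            interim[(term, concept)] = s / len(docs)
    totals = defaultdict(float)                       # step 180: per-term totals
    for (term, _concept), w in interim.items():
        totals[term] += w
    return {key: w / totals[key[0]] for key, w in interim.items()}


corpus = {
    "computer graphics": ["the pixel shader", "the pixel"],
    "artificial intelligence": ["the neural network"],
}
weights = build_weights(corpus)
```

  • In this example the common word "the" is shared by both concepts, so its weight is split between them (5/9 and 4/9), while "pixel", which occurs under only one concept, keeps a weight of 1.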
  • the stems or roots of the terms are used to construct the knowledge base, allowing terms to be matched based on their stems or roots rather than on exact text matches.
  • the method 100 will focus on only specific terms in a document that are highlighted in a particular way, e.g. in an abstract. Alternatively, there could be a list of terms that are not analyzed, such as common terms that are not descriptive of a concept; for example, terms such as “the”, “and”, etc. may be excluded from being selected.
  • FIG. 5 illustrates a software system in accordance with the present invention.
  • the software system 200 contains: a search query module 210 ; a concept generator module 220 ; a concept knowledge database 230 ; a search module 240 ; a search engine module 250 ; a concept accordance module 260 ; and a visualization interface module 270 .
  • a search query is input to the system 200 at the search query module 210 .
  • the search query contains one or more search terms and usually at least two or three search terms. From the search query module 210 this search query containing one or more search terms is passed to the concept generator module 220 , which uses the concept knowledge database 230 to select a set of concepts and generate a concept vector for each selected concept.
  • the search terms and the concept vectors are passed from the concept generator module 220 to the search module 240, where the search module 240 requests a search from the search engine module 250 using the search query and receives search results.
  • the results returned by the search engine module 250 are a list of returned documents, where each returned document is typically a document surrogate that describes the actual documents located by the search engine module 250 .
  • the search engine module 250 passes the returned documents to the concept accordance module 260, where a document vector is determined for each of the returned documents and an accordance value indicating the similarity of the returned document to each of the concepts in the concept set is formulated.
  • a fuzzy membership score is typically determined for the returned document relative to each of the concepts in the concept set and is used for the accordance value.
  • the search module 240 can wait until all the search results are returned from the search engine module 250 before passing the search results all together to the concept accordance module 260. However, in one embodiment, the search module 240 passes each returned document of the search results to the concept accordance module 260 as the search module 240 receives it, to decrease the time required by the system 200 to display the search results.
  • the concept set and the search results (comprising a plurality of returned documents), with each returned document having an accordance value indicating the similarity of the returned document to each of the concepts in the concept set, are passed to the visualization interface module 270, wherein the visualization interface module 270 displays the search results to a user.
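  • The flow from returned documents to accordance values can be illustrated with the sketch below. The text above states only that a fuzzy membership score is typically used; cosine similarity is swapped in here purely as a stand-in, and the function names document_vector and accordance, the whitespace tokenization and the sample data are all assumptions.

```python
import math
from collections import Counter


def document_vector(text, dimensions):
    """Count how often each concept-vector dimension (term) occurs in the surrogate text."""
    counts = Counter(text.lower().split())
    return {term: float(counts[term]) for term in dimensions}


def accordance(doc_vec, concept_vec):
    """Stand-in similarity: cosine of the angle between the two vectors."""
    dot = sum(doc_vec[t] * concept_vec[t] for t in concept_vec)
    norm = math.sqrt(sum(v * v for v in doc_vec.values())) * math.sqrt(
        sum(v * v for v in concept_vec.values())
    )
    return dot / norm if norm else 0.0


concept_vec = {"pixel": 0.4, "shader": 0.6}   # hypothetical concept vector
docs = ["Shader tutorial: writing a pixel shader", "Neural network basics"]

scored = [(d, accordance(document_vector(d, concept_vec), concept_vec)) for d in docs]
# Sort so that documents with the highest accordance value come first.
ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
```

  • Each document is scored independently of the others, which is what allows a returned document to be scored as soon as it arrives rather than after the whole result set has been received.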
  • the software system 200 can be implemented wholly on a data processing system, such as the data processing system 1 shown in FIG. 2A , with only the search engine module 250 resident on a server 50 connected to the data processing system 1 over the network 55 .
  • various components of the software system 200, such as the search query module 210, could be resident on a mobile device 60 operably connected to a data processing system 1 which contains other components of the software system 200 (such as the concept generator module 220, search module 240, etc.), as shown in FIG. 2B.
  • the software system 200 begins with an initial search query being input to the search query module 210 which passes the search query to the concept generator module 220 .
  • the concept generator module 220 accesses the concept knowledge database 230 and uses the information in the concept knowledge database 230 to generate a concept set and a respective set of concept vectors using the search query.
  • FIG. 6 illustrates a flowchart of a method 400 for generating a set of concepts based on the search terms in the search query.
  • Method 400 is implemented by the concept generator module 220 in FIG. 5, using the concept knowledge database 230.
  • the method 400 uses the concept knowledge database 230 to generate a list of concepts using relationships between the search terms and concepts.
  • Method 400 comprises the steps of: matching terms in the search query to term data objects in the concept knowledge base to obtain a first term set 410; obtaining a concept set of concept data objects associated with the first term set 420; obtaining a second term set of term data objects associated with the concept data objects in the concept set 430; and formulating a set of concept vectors 440.
  • the method 400 begins with step 410 and matching the terms in the search query to term data objects in the concept knowledge database 230 .
  • the concept knowledge database 230 is accessed and the terms making up the search query are matched with any term data objects that have a term in the term field matching the term in the search query.
  • a first term set containing these selected term data objects is obtained.
  • When step 410 is completed, all of the term data objects in the concept knowledge database 230 that have a term in the term field that corresponds to one of the terms in the search query have been identified and added to the first term set.
  • the first term set is used to obtain a concept set containing concept data objects from the concept knowledge database 230 associated with one or more term data objects in the first term set.
  • the term data objects making up the first term set are used to obtain a number of concept data objects from the concept knowledge database 230 .
  • Concept data objects associated with one or more term data objects in the first term set are selected to form the concept set.
  • Concept data objects that are not strongly associated with term data objects in the first term set can be excluded from the concept set using a first weight threshold and a term ratio threshold.
  • the first weight threshold is used to exclude concept data objects that are not strongly associated with one of the term data objects in the first term set by comparing the weight assigned to an association between a concept data object and a term data object and excluding the concept data object from the concept set if the weight determined for the association is less than the first weight threshold.
  • a term ratio threshold is used to further exclude concept data objects from the concept set.
  • If a concept data object is associated with one of the term data objects in the first term set with a weight greater than the first weight threshold, the concept data object is evaluated to determine the ratio of the term data objects in the first term set to which the concept data object is associated with a weight greater than the first weight threshold. If this ratio is less than the term ratio threshold, the concept data object is excluded from the concept set.
  • the first weight threshold, term ratio threshold and second weight threshold can be determined empirically. For example, some initial studies found that a first weight threshold of 0.05, a term ratio threshold of 0.26 and a second weight threshold of 0.10 provided satisfactory results.
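  • Steps 410 and 420, including the two thresholds, can be sketched as below. The edge dictionary layout, the function name select_concepts and the sample weights are assumptions made for illustration; the default thresholds are the values quoted above, and weights or ratios exactly equal to a threshold are kept.

```python
from collections import defaultdict


def select_concepts(query_terms, edges, weight_threshold=0.05, ratio_threshold=0.26):
    """edges: {(term, concept): weight}. Returns (first_term_set, concept_set)."""
    known_terms = {term for term, _concept in edges}
    # Step 410: query terms that match a term in the knowledge base.
    first_term_set = {t for t in query_terms if t in known_terms}
    if not first_term_set:
        return first_term_set, set()

    # Step 420: for each concept, collect the matched terms whose edge weight
    # meets the first weight threshold.
    strong_terms = defaultdict(set)
    for (term, concept), weight in edges.items():
        if term in first_term_set and weight >= weight_threshold:
            strong_terms[concept].add(term)

    # Keep a concept only if it is strongly tied to a large enough share
    # of the first term set.
    concept_set = {
        concept
        for concept, terms in strong_terms.items()
        if len(terms) / len(first_term_set) >= ratio_threshold
    }
    return first_term_set, concept_set


edges = {
    ("pixel", "computer graphics"): 0.4,
    ("shader", "computer graphics"): 0.6,
    ("mesh", "computer graphics"): 0.3,
    ("network", "artificial intelligence"): 0.5,
    ("pixel", "artificial intelligence"): 0.02,
}
first_terms, concepts = select_concepts(
    ["pixel", "shader", "mesh", "network", "cloud"], edges
)
```

  • Here "cloud" matches no term in the knowledge base, "computer graphics" is strongly tied to three of the four matched terms (0.75), and "artificial intelligence" to only one of four (0.25), which falls just under the 0.26 ratio threshold and is therefore excluded.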
  • a second term set is obtained for each of the concept data objects in the concept set.
  • Each of the concept data objects in the concept set is evaluated to determine the term data objects, in the concept knowledge database 230, associated with that concept data object.
  • Term data objects associated with the concept data objects selected for the concept set are added to the second term set.
  • a second weight threshold may be used to exclude term data objects from the second term set if they are associated with concept data objects in the concept set by a weight that is less than the second weight threshold.
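  • A minimal sketch of building the second term set for one concept, assuming the concept's term associations are available as a plain dictionary. The name `second_term_set` and the sample weights are hypothetical; the default of 0.10 is the second weight threshold reported from the initial studies.

```python
from typing import Dict, Set

SECOND_WEIGHT_THRESHOLD = 0.10  # value reported in the text


def second_term_set(
    concept_terms: Dict[str, float],
    threshold: float = SECOND_WEIGHT_THRESHOLD,
) -> Set[str]:
    """Terms associated with the concept by at least the threshold weight."""
    return {term for term, weight in concept_terms.items() if weight >= threshold}


# Hypothetical associations for a single concept.
automobile_terms = {"car": 0.4, "jaguar": 0.2, "engine": 0.15, "paint": 0.03}
```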
  • the concept set is used to formulate a set of concept vectors.
  • a concept vector is created using the first term set and second term set obtained for the concept data object at step 430 .
  • Each term data object in the first term set and second term set for a concept data object is defined as a dimension of the concept vector and the weight of the association between the term data object and the concept data object is used to set the magnitude of the concept vector in that assigned dimension.
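  • The concept vector construction can be sketched as a sparse mapping from term to weight. The function name `build_concept_vector` is hypothetical, and giving a magnitude of 0.0 to a query term that has no recorded association with the concept is an assumption the text does not spell out.

```python
from typing import Dict, Set


def build_concept_vector(
    concept_terms: Dict[str, float],
    first_term_set: Set[str],
    second_term_set: Set[str],
) -> Dict[str, float]:
    """One dimension per term; magnitude is the term-concept weight."""
    dimensions = first_term_set | second_term_set
    # ASSUMPTION: terms with no recorded association get magnitude 0.0.
    return {term: concept_terms.get(term, 0.0) for term in dimensions}


vector = build_concept_vector(
    {"car": 0.4, "jaguar": 0.2, "engine": 0.15},
    first_term_set={"jaguar", "speed"},
    second_term_set={"car", "engine", "jaguar"},
)
```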
  • the concept set and the respective concept vectors are passed to the search module 240 to request a search using the search query.
  • When the search module 240 receives the search query, the concept set and the set of concept vectors, the search module 240 requests the search engine module 250 to conduct a search using the search query.
  • the search module 240 is typically resident on the data processing system 1 and the search engine module 250 is typically a web search engine, such as the web search engine running on servers 50 in FIGS. 2A and 2B , with the search being conducted on a number of computer readable documents, such as searching for web pages on the World Wide Web.
  • the search engine module 250 could be used in any computerized document storage system capable of searching a large number of computer readable documents.
  • the search engine module 250 could return the results of the search as a list of the complete documents located in the search. However, because a search can locate a relatively large number of documents, and to save overhead on the data processing system, the search results are typically returned as a list of document surrogates, with one document surrogate returned for each document located in the search.
  • FIG. 7 illustrates a typical document surrogate data object 160 which is commonly provided as a returned document by a search engine as one of a set of search results.
  • search engines typically provide a set of document surrogates 160 in place of supplying the complete documents.
  • Document surrogates 160 are the primary data objects in the list-based representation used by search engines. Each document surrogate 160 provides information describing the corresponding complete document, which commonly consists of: a title 162; a URL 164; a summary 166; and any other assorted information.
  • the title 162 provides the title of the corresponding complete document described by the document surrogate 160;
  • the URL 164 provides the address of the complete document; and
  • the summary 166 contains a short description or snippet of the complete document and usually provides the query terms of the search query in context.
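  • As an illustration of the document surrogate data object, a minimal sketch is shown below. The class name, field names and sample values are hypothetical; only the three components (title 162, URL 164, summary 166) and the optional assorted information come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DocumentSurrogate:
    title: str    # corresponds to title 162
    url: str      # corresponds to URL 164
    summary: str  # corresponds to summary 166 (snippet with query terms)
    # Any other assorted information supplied by the search engine.
    extra: Dict[str, str] = field(default_factory=dict)


# Hypothetical example values.
example = DocumentSurrogate(
    title="Jaguar cars",
    url="http://example.com/jaguar",
    summary="The jaguar is a luxury car known for its speed.",
)
```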
  • search results obtained by the search module 240 are passed to the concept accordance module 260, where accordance values are formulated for each returned document, indicating the degree of similarity between the returned document and the concepts in the concept set.
  • FIG. 8 illustrates a method 500 of determining accordance values for each returned document of the search results.
  • Each returned document has an accordance value determined for each concept in the concept set and indicating the degree of similarity between the returned document and the concept.
  • the accordance value is typically based on a fuzzy membership score with each concept in the concept set treated as a centroid to determine the clustering of the returned documents around the concepts.
  • a returned document might have a higher fuzzy membership score in relation to one concept as compared to another concept, indicating that the first concept has a higher degree of similarity to the returned document than the second concept.
  • Method 500 comprises the steps of: selecting a first returned document 510 ; analyzing the returned document 520 ; generating a document vector 530 for the selected returned document; selecting a first concept 540 ; determining an accordance value for the selected returned document in relation to a selected concept 550 ; checking if more concepts remain in the concept set to be analyzed 560 in relation to the selected returned document and selecting the next concept 570 , if more concepts remain in the concept set; once the selected returned document has been analyzed in relation to all the concepts in the concept set, checking if more returned documents are present 580 and selecting the next returned document 590 , if there are more returned documents; and ending when all the returned documents have an accordance indicator determined for each of the concepts in the concept set.
  • the method 500 begins with the selection of a first returned document at step 510. As the search results are returned, the method 500 selects one of the returned documents from the search results.
  • the returned document is analyzed to determine a frequency of each unique word or term in the returned document. If the returned document is the entire document, the occurrence of each unique term in the entire document is determined. If the returned document is a document surrogate, the occurrence of each unique term can be determined based on the summary of the complete document and, optionally, the title.
  • roots of the words rather than the words themselves, such as using Porter's stemming algorithm, so words with various prefixes and suffixes are not counted as separate terms.
  • Each unique term in a returned document will be based on the root or stem of the term so that the frequencies determined are not reduced by the use of terms that use different suffixes, prefixes, etc.
  • Matching based on the stems or roots of the search terms can be more effective than exact word matching, since it takes into account different variations of the same root word.
  • the frequencies determined for the unique terms in the selected returned document are then used to generate a document vector at step 530 .
  • the document vector represents the frequency of unique terms in the returned document in a multidimensional vector form. Each unique term is used to define a dimension of the document vector and the magnitude of the document vector in that dimension is then set as the frequency of the term in the returned document.
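  • A minimal sketch of the document vector for a document surrogate follows. The function names are hypothetical, and `toy_stem` is a deliberately crude suffix stripper used only as a stand-in; it is not Porter's stemming algorithm, which a real implementation would substitute.

```python
import re
from collections import Counter


def toy_stem(word: str) -> str:
    """Crude stand-in for a stemmer (NOT Porter's algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def document_vector(title: str, summary: str, include_title: bool = True) -> Counter:
    """Map each stemmed term (a dimension) to its frequency (the magnitude)."""
    text = summary + (" " + title if include_title else "")
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(toy_stem(token) for token in tokens)


vec = document_vector("Cars", "car cars racing raced")
```

  • Here "car" and "cars" fall on one dimension and "racing" and "raced" on another, so the frequencies are not split across word variants.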
  • a first one of the concepts in the concept set is selected at step 540 and an accordance value indicating the degree of similarity between the returned document and the selected concept is determined at step 550 .
  • the accordance value is assigned by determining a fuzzy membership score of the returned document relative to the selected concept.
  • the fuzzy membership score, u_ij, of the returned document d_i can be determined with respect to a concept c_j by the following equation:
  • sim(d_i, c_j), or the similarity between a document vector and a concept vector, is given by the Euclidean distance metric:
  • x_i is the document vector and x_j is the concept vector.
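  • The sketch below shows one way these two quantities could be computed. The Euclidean distance over sparse vectors follows directly from the description. The membership formula, however, is an assumption: the equation itself does not appear in this text, so the standard fuzzy c-means membership form with a fuzzifier `m` is used as a stand-in, and it may differ from the equation in the original specification. All function and parameter names are hypothetical.

```python
import math
from typing import Dict

Vector = Dict[str, float]


def euclidean_distance(x_i: Vector, x_j: Vector) -> float:
    """Euclidean distance between sparse vectors (missing terms count as 0)."""
    dims = set(x_i) | set(x_j)
    return math.sqrt(sum((x_i.get(t, 0.0) - x_j.get(t, 0.0)) ** 2 for t in dims))


def fuzzy_memberships(doc_vec: Vector, concept_vecs: Dict[str, Vector], m: float = 2.0) -> Dict[str, float]:
    """ASSUMED form: standard fuzzy c-means membership, concepts as centroids."""
    dists = {c: euclidean_distance(doc_vec, v) for c, v in concept_vecs.items()}
    zero = [c for c, d in dists.items() if d == 0.0]
    if zero:
        # The document coincides with one or more centroids.
        return {c: (1.0 / len(zero) if c in zero else 0.0) for c in dists}
    exponent = 2.0 / (m - 1.0)
    return {
        c: 1.0 / sum((dists[c] / dists[k]) ** exponent for k in dists)
        for c in dists
    }


u = fuzzy_memberships({"car": 1.0}, {"a": {"car": 2.0}, "b": {"car": 4.0}})
```

  • Under this assumed form the memberships of a document across all concepts sum to one, and the closer concept ("a" in the example) receives the higher score.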
  • the method 500 then checks, at step 560 , to see if more concepts remain in the concept set and if more concepts remain, the next concept in the concept set is selected at step 570 and an accordance value consisting of a fuzzy membership score for the newly selected concept is determined at step 550 .
  • the method 500 will continue to repeat steps 550, 560 and 570 until all the concepts in the concept set have been selected and an accordance value has been determined for the returned document in relation to each of the concepts in the concept set.
  • At step 580, if more returned documents remain, the next returned document is selected at step 590 and steps 520, 530, 540, 550, 560, 570, 580 and 590 are repeated until all of the returned documents have been evaluated.
  • the method 500 determines a document vector for each of the returned documents and uses the document vector to determine a fuzzy membership score for the returned document and each concept in the concept set, which is then used to assign an accordance value indicating the similarity between the returned document and each of the concepts in the concept set.
  • each returned document in a set of search results is evaluated as the returned document is returned from the search engine module 250 to decrease the amount of time the system requires before displaying the search results to a user.
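  • The control flow of method 500, evaluated incrementally as results arrive, can be sketched with a generator. This is an illustration under stated assumptions: the names are hypothetical, and the pairwise `score` callback (here a simple dot product) is only a placeholder for the fuzzy membership computation, which would normally consider all concepts together.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Vector = Dict[str, float]


def accordance_stream(
    returned_docs: Iterable[Tuple[str, Vector]],
    concept_vecs: Dict[str, Vector],
    score: Callable[[Vector, Vector], float],
) -> Iterator[Tuple[str, Dict[str, float]]]:
    """Yield accordance values for each document as soon as it arrives."""
    for doc_id, doc_vec in returned_docs:            # steps 510, 580, 590
        values: Dict[str, float] = {}
        for concept, c_vec in concept_vecs.items():  # steps 540, 560, 570
            values[concept] = score(doc_vec, c_vec)  # step 550
        yield doc_id, values


def dot(a: Vector, b: Vector) -> float:
    """Placeholder similarity, not the measure used in the description."""
    return sum(a[t] * b.get(t, 0.0) for t in a)


docs = [("d1", {"car": 2.0}), ("d2", {"cat": 1.0})]
results = list(accordance_stream(docs, {"auto": {"car": 0.5}}, dot))
```

  • Because the generator yields one document at a time, a display layer could begin rendering before the full result list has been received.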
  • each of the returned documents will have an accordance value determined in relation to each of the concepts in the concept set.
  • the search results comprising the plurality of returned documents and the formulated accordance values are passed to the visualization interface module 270 .
  • Each returned document in the first portion 610 can be displayed with the summary of the returned document.
  • each returned document in the first portion 610 can display the summary of the returned document only when a user moves a cursor over the returned document in the first portion 610 .
  • a popup field can appear containing the summary of the returned document.
  • Each returned document in the first portion 610 is displayed with an accordance indicator 614 .
  • the accordance indicator 614 is based on the accordance values formulated for the returned documents and indicates the similarity between a returned document and one or more of the concepts in the concept set generated using the search terms in the search query.
  • the accordance values consisting of the fuzzy membership scores determined for each returned document in relation to the concepts in the concept set, using method 500 , illustrated in FIG. 8 , are used to formulate the accordance indicators 614 .
  • Each concept 685 in the set of concepts is displayed in a viewing pane 680 .
  • a user can select one or more of the concepts 685. If a user selects only a single concept 685, the accordance indicator 614 shows the accordance value of each returned document in relation to the selected concept. However, if the user selects more than one of the concepts 685 in the concept set, the accordance indicator 614 is the result of the addition of the accordance values of each returned document in relation to all of the selected concepts.
  • a color shade is assigned to each accordance indicator 614 based on the accordance values of the selected concepts for each returned document.
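  • The addition of accordance values across selected concepts can be sketched in a few lines. The function name and the sample values are hypothetical.

```python
from typing import Dict, Iterable


def combined_accordance(values: Dict[str, float], selected: Iterable[str]) -> float:
    """Sum a document's accordance values over the selected concepts."""
    return sum(values.get(concept, 0.0) for concept in selected)


# Hypothetical accordance values for one returned document.
doc_values = {"auto": 0.5, "animal": 0.25, "brand": 0.25}
```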
  • returned documents with higher accordance values for the selected concepts are assigned a color shade that is more intense or deeper.
  • an accordance indicator 614 indicating a relatively low accordance may have a very pale yellow color
  • an accordance indicator 614 indicating a higher accordance may be a color shade of a much darker red.
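  • One possible mapping from accordance value to color shade is a linear interpolation between a pale yellow and a dark red. The specific RGB endpoints, the normalization by the largest value on screen, and the function name are all assumptions made for this sketch; the description only requires that higher accordance yields a more intense or deeper shade.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def accordance_color(
    value: float,
    max_value: float,
    low: RGB = (255, 255, 204),  # assumed pale yellow
    high: RGB = (128, 0, 38),    # assumed dark red
) -> str:
    """Return a hex color; deeper shades indicate higher accordance."""
    if max_value <= 0:
        t = 0.0
    else:
        t = min(max(value / max_value, 0.0), 1.0)  # clamp to [0, 1]
    r, g, b = (round(lo + (hi - lo) * t) for lo, hi in zip(low, high))
    return f"#{r:02x}{g:02x}{b:02x}"
```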
  • Each of the returned documents in the second portion 630 is displayed in a format which provides a compressed or small view of the returned document.
  • Each returned document shown in the second portion 630 is displayed with an accordance indicator 614 , and, typically, a title representation 616 .
  • the title representation 616 represents the title of the returned document; however, the title representation 616 does not necessarily have to provide the title in a readable format.
  • Returned documents displayed in the second portion 630 may be displayed so small that a solid line is used to provide the title representation 616 and the title representation 616 merely indicates the approximate length of the title of the returned document in relation to the length of the titles of the other returned documents.
  • Each of the returned documents displayed in the second portion 630 corresponds to a returned document displayed in the first portion 610 , such that all of the returned documents in the first portion 610 are contained in the second portion 630 , with the returned documents in the first portion 610 occurring in the same order that they occur in the second portion 630 .
  • the second portion 630 represents a subset of the list of returned documents making up the search results from the search engine and the first portion 610 represents a subset of the returned documents shown in the second portion 630.
  • the first format allows a user to see a large number of the returned documents in a compressed view in the second portion 630 and then also see some of the returned documents shown in the second portion 630 in a larger, more detailed view in the first portion 610.
  • the accordance indicators 614 displayed with the returned documents in the first portion 610 indicate the same accordance values as the accordance indicators 614 associated with the same returned documents in the second portion 630.
  • the second portion 630 displays a much greater portion of the list of returned documents than the first portion 610 . In some cases, more than one hundred (100) returned documents may be displayed in the second portion 630 .
  • the first portion 610 displays a relatively smaller number of the returned documents because the returned documents displayed in the first portion 610 provide more detail and therefore must be shown in a large enough size that a user can read the titles 636 of the returned documents shown in the first portion 610.
  • the second portion 630 may display one hundred (100) returned documents in the first format, while the first portion 610 may display fewer than twenty-five (25) returned documents.
  • An indicator frame 650 is positioned over the returned documents in the second portion 630 that are also shown in the first portion 610 .
  • the indicator frame 650 indicates the returned documents shown in the second portion 630 that are also shown in the first portion 610 .
  • the second portion 630 is updated to indicate the same returned documents shown in the first portion 610 in the second portion 630 , by moving the indicator frame 650 along the second portion 630 .
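  • The relationship between the two portions can be sketched as a sliding window: the indicator frame marks a contiguous slice of the compressed list, and that slice is what the detailed view shows. The function name, the default frame size of 20, and the clamping behavior at the end of the list are assumptions of this sketch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def first_portion(second_portion: Sequence[T], frame_start: int, frame_size: int = 20) -> List[T]:
    """Return the documents under the indicator frame, in the same order."""
    last_start = max(len(second_portion) - frame_size, 0)
    start = max(0, min(frame_start, last_start))  # keep the frame on the list
    return list(second_portion[start:start + frame_size])


compressed_view = list(range(100))  # stand-ins for 100 returned documents
```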
  • a user can quickly look over the accordance indicators 614 for each returned document shown in the first portion 610 to determine which returned documents correlate to the desired concepts.
  • a user can quickly look over the accordance indicators 614 for each returned document in the second portion 630 and quickly determine which returned documents shown in the second portion 630 correlate to the desired concepts more closely than the other returned documents, without requiring the user to perform any in-depth analysis of each returned document.
  • By simply scanning over the accordance indicators 614 a user can quickly and easily visually locate the accordance indicators 614 that indicate a returned document that has a high correlation with a concept by the various shades of color shown in the accordance indicators 614 .
  • a user can also visually analyze the returned documents shown in the first portion 610, checking for returned documents that contain accordance indicators 614 indicating the similarity of a returned document to one or more of the concepts. Once a user identifies a returned document or a grouping of returned documents in the second portion 630 that the user wishes to examine in more detail, the user can then move the indicator frame 650 so that the selected returned documents or grouping of returned documents in the second portion 630 are displayed in the second format in the first portion 610. A user can then examine the titles 636 of the returned documents and click on a desired returned document title 636 to go to the document.
  • FIG. 10 illustrates a screen shot of a further embodiment of a graphical interface 700 that may be generated by the visual interface module 470 shown in FIG. 5 , for displaying the search results wherein the accordance values of the returned documents in relation to the selected concept or concepts in the concept set are used to sort the returned documents in the first portion 610 and second portion 630 .
  • the returned documents are sorted based on the accordance values for the returned documents in relation to a selected concept.
  • Returned documents that have a higher degree of similarity to a selected concept (shown with the accordance indicators 614 ) appear higher in the list than other returned documents that are less similar to the selected concept or concepts.
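  • Sorting by accordance can be sketched with a stable sort, so that documents with equal accordance keep the order in which the search engine returned them. The function name and data layout are hypothetical, and summing the values when several concepts are selected is an assumption carried over from how the single accordance indicator is formed.

```python
from typing import Dict, List, Sequence, Tuple

Doc = Tuple[str, Dict[str, float]]  # (title, accordance value per concept)


def sort_by_accordance(docs: Sequence[Doc], selected: Sequence[str]) -> List[Doc]:
    """Most similar documents first; ties keep the original ranking."""
    return sorted(
        docs,
        key=lambda d: sum(d[1].get(c, 0.0) for c in selected),
        reverse=True,
    )


ranked = sort_by_accordance(
    [("A", {"x": 0.2}), ("B", {"x": 0.7}), ("C", {"x": 0.2})],
    selected=["x"],
)
```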
  • FIG. 11 illustrates a screen shot of a further embodiment of a graphical interface 800 that may be generated by the visual interface module 470 shown in FIG. 5 , for displaying the search results.
  • Graphical interface 800 is similar to graphical interface 600 (illustrated in FIG. 9 ) in that a first portion 610 of a list of returned documents is displayed with each returned document displaying the title 636 and any other information. However, instead of displaying a single accordance indicator 614 with each returned document wherein when more than one concept from the concept set is selected the single accordance indicator 614 is made up of the sums of the accordance values of the selected concepts, graphical interface 800 provides a separate accordance indicator 814 for each selected concept.
  • a user can sort the list of the returned documents displayed in the graphical interface 800 based on one concept over the one or more others by selecting one of the concepts to sort the list by.
  • a user selects the concept the user desires the list to be sorted by.
  • the list of returned documents is then re-sorted to give precedence to the selected concept, and the first portion 610 of returned documents and the second portion 630 of the returned documents are updated to reflect the newly sorted list.
  • a user can also conduct a nested sort by selecting a second concept.
  • the list of returned documents is then re-sorted to place a primary weight on returned documents with high accordance values in relation to the first selected concept and then a secondary weight on returned documents with high accordance values in relation to the second selected concept.
  • the first portion 610 of returned documents and the second portion 630 of the returned documents are updated to reflect the newly sorted list.
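  • A nested sort of this kind can be sketched with a composite sort key. Interpreting "primary weight" and "secondary weight" as a lexicographic ordering (the second concept only breaks ties on the first) is an assumption of this sketch, and the names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Doc = Tuple[str, Dict[str, float]]  # (title, accordance value per concept)


def nested_sort(docs: Sequence[Doc], primary: str, secondary: str) -> List[Doc]:
    """Sort by the primary concept, then by the secondary concept on ties."""
    return sorted(
        docs,
        # Negated values give descending order while keeping the sort stable.
        key=lambda d: (-d[1].get(primary, 0.0), -d[1].get(secondary, 0.0)),
    )


nested = nested_sort(
    [
        ("A", {"p": 0.5, "s": 0.1}),
        ("B", {"p": 0.5, "s": 0.4}),
        ("C", {"p": 0.9, "s": 0.0}),
    ],
    primary="p",
    secondary="s",
)
```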

Abstract

A method and apparatus are provided for visually coding or sorting search results based on the similarity of the search results to one or more concepts. A search query containing search terms is used to conduct a web search and obtain search results comprising a number of document surrogates describing the located web pages. Concepts are obtained using the search terms and the similarities between the obtained concepts and the search results are evaluated. The search results are then displayed in a manner that indicates the relative similarity of search results to one or more of the determined concepts, such as by sorting the search results based on the level of similarity of the search results to one or more concepts or by providing an accordance indicator with each displayed search result, the accordance indicators indicating the similarity of the corresponding search result with one or more of the concepts.

Description

    RELATED APPLICATION
  • The present application claims the benefit of Canadian Patent Application Serial No. 2,549,536 filed Jun. 6, 2006, which is incorporated herein in its entirety by reference.
  • BACKGROUND
  • The World Wide Web has given computer users on the Internet access to vast amounts of information in the form of billions of Web pages. Each of these pages can be accessed directly by a user typing the URL (uniform resource locator) of a web page into a web browser on the user's computer, but often a person is more likely to access a website by finding it with the use of a search engine. A search engine allows a user to input a search query made up of words or terms that the user thinks will be used in the web pages containing the information he or she is looking for. The search engine will attempt to match web pages to the terms in the search query and will then return the located web pages to the user.
  • The search results generated from a user's query typically consist of a collection of document surrogates, each of which contains summary information, attributes, and other meta-data about the matched documents. These document surrogates are often presented in a simple list-based format, displaying the title of the document, a snippet containing the query terms in context, and the uniform resource locator (the URL). A user can then select one of the returned entries to view the corresponding web page.
  • With the continued growth of web pages available on the Internet making the task of search engines more and more difficult, web search engines have greatly increased the size of their indexes and made significant advances in the algorithms used to match a user's query to these indexes. However, while it is clear that significant effort has gone into creating web search engines that can index billions of documents and return the search results in a fraction of a second, this has created the problem of search queries returning numerous results.
  • While relevant documents might be present in the search results returned from a search engine, often the returned search results consist of tens or hundreds of individual documents making it hard for a user to determine which of the search results may or may not be relevant to the information the user is looking for.
  • While information retrieval techniques used by web search engines have improved substantially over the years, the search results are still typically represented in a simple list-based format. Although this list-based representation makes it easy to evaluate a single document, it does not support the users in the broader tasks of manipulating the search results, comparing documents, or finding a set of relevant documents. Even though this simple list-based representation provides the search results in a clear and effective manner for determining the relevance of individual document surrogates, it requires that each document surrogate be evaluated in turn, and to some degree, in the order provided. If hundreds of documents are returned it is inefficient if not completely impractical to have a user review hundreds of results to determine the most relevant document located in the search. Requiring users to evaluate each document surrogate individually, often with only ten documents per page, leads to a common user search trait of evaluating only a few pages of search results before either re-formulating their query or giving up.
  • One solution that can be used to address these numerous search results is for the user to reformulate his or her search query to narrow the search, with the result that fewer documents are located matching the search query. However, in many cases there may be high quality relevant documents buried in the search results set that were missed because the user did not look at enough search result pages.
  • Another method that has also been used is to cluster the search results such that documents that are similar to one another are grouped together. In such a system, a user navigates the clusters in order to narrow down the search results and avoid clusters of irrelevant documents. Ideally, the user will select the relevant clusters and view lists of the returned documents in which a large portion are relevant to the requirements of the user.
  • One of the problems with these systems is determining what the clusters should be centered around and determining an adequate description of each cluster. If the information does not correctly describe the documents contained in the cluster, a user may either choose clusters that are not relevant or entirely miss clusters that may contain relevant documents.
  • SUMMARY OF THE INVENTION
  • It is an object of the present invention to provide a system and method that overcomes problems in the prior art.
  • In a first aspect of the invention, a method is provided for visually coding search results based on at least one concept. The method comprises: using a search query containing a plurality of search terms to conduct a search of a plurality of computer readable documents and obtain the search results, the search results comprising at least one returned document; obtaining the at least one concept by matching the plurality of search terms to the at least one concept; evaluating the similarity between the at least one returned document and the at least one concept; and displaying the at least one returned document with an accordance indicator. The accordance indicator indicates the similarity between the at least one returned document and the at least one concept.
  • In a second aspect of the invention, a data processing system for visually coding search results based on at least one concept is provided. The data processing system comprises: at least one processor; a memory operatively coupled to the at least one processor; a display device operative to display data; and a program module stored in the memory and operative for providing instructions to the at least one processor. The at least one processor is responsive to the instructions of the program module. The program module is operative for: using a search query containing at least one search term to conduct a search of a plurality of computer readable documents and obtain search results comprising a plurality of returned documents; obtaining the at least one concept by matching the at least one search term to the at least one concept; evaluating the similarity between a returned document and the at least one concept; and displaying on the display device the returned document with an accordance indicator. The accordance indicator indicates the similarity between the returned document and the at least one concept.
  • In a third aspect of the invention, a method of visually sorting search results based on at least one concept is provided. The method comprises: using a search query containing a plurality of search terms to conduct a search of a plurality of computer readable documents and obtain the search results, the search results comprising a first returned document and a second returned document; obtaining the at least one concept by matching the plurality of search terms to the at least one concept; determining a first accordance value by evaluating the similarity between the first returned document and the at least one concept and a second accordance value by evaluating the similarity between the second returned document and the at least one concept; and displaying the first returned document and second returned document sorted in an order based on the first accordance value and the second accordance value.
  • In a fourth aspect of the invention, a data processing system for visually sorting search results based on at least one concept is provided. The data processing system comprises: at least one processor; a memory operatively coupled to the at least one processor; a display device operative to display data; and a program module stored in the memory and operative for providing instructions to the at least one processor. The at least one processor is responsive to the instructions of the program module. The program module is operative for: using a search query containing a plurality of search terms to conduct a search of a plurality of computer readable documents and obtain the search results, the search results comprising a first returned document and a second returned document; obtaining the at least one concept by matching the plurality of search terms to the at least one concept; determining a first accordance value by evaluating the similarity between the first returned document and the at least one concept and a second accordance value by evaluating the similarity between the second returned document and the at least one concept; and displaying the first returned document and second returned document sorted in an order based on the first accordance value and the second accordance value.
  • In a further aspect of the invention, a computer readable memory having recorded thereon statements and instructions for execution by a computer to carry out the above methods is provided.
  • DESCRIPTION OF THE DRAWINGS
  • While the invention is claimed in the concluding portions hereof, preferred embodiments are provided in the accompanying detailed description which may be best understood in conjunction with the accompanying diagrams where like parts in each of the several diagrams are labeled with like numbers, and where:
  • FIG. 1 is a schematic illustration of a data processing system suitable for supporting the operation of methods in accordance with the present invention;
  • FIG. 2A is a schematic illustration of a network configuration suitable for supporting the operation of methods in accordance with the present invention wherein the data processing system is connected over a network to a plurality of servers operating as a search engine;
  • FIG. 2B is a schematic illustration of a network configuration suitable for supporting the operation of methods in accordance with the present invention wherein the data processing system is configured as a server and a remote device is used to access the data processing system;
  • FIG. 3 is an architectural schematic of a data structure for a concept knowledge base in accordance with an aspect of the present invention;
  • FIG. 4 illustrates a flowchart of a method of automatically creating a data structure containing a concept knowledge base in accordance with the present invention;
  • FIG. 5 is a schematic diagram of an overview of a software system in accordance with an aspect of the present invention;
  • FIG. 6 illustrates a flowchart of a method for generating a set of concepts based on the search terms in the search query in accordance with an aspect of the present invention;
  • FIG. 7 is a typical document surrogate data object, which is commonly provided as a returned document by a search engine as one of a set of search results;
  • FIG. 8 is a flowchart of a method of determining accordance values for each returned document in a set of search results;
  • FIG. 9 is an embodiment of a graphical interface displaying a number of returned documents with accordance indicators;
  • FIG. 10 is another embodiment of a graphical interface displaying a number of returned documents with accordance indicators and sorted by accordance values; and
  • FIG. 11 is an embodiment of a graphical interface displaying a number of returned documents with a plurality of accordance indicators for each displayed returned document.
  • DETAILED DESCRIPTION OF THE ILLUSTRATED EMBODIMENTS
  • Data Processing System
  • FIG. 1 illustrates a data processing system 1 suitable for supporting the operation of methods in accordance with the present invention. The data processing system 1 could be a personal computer, server, mobile computing device, cell phone, etc. The data processing system 1 typically comprises: at least one processing unit 3; a memory storage device 4; at least one input device 5; a display device 6 and a program module 8.
  • The processing unit 3 can be any processor that is typically known in the art with the capacity to run the program and is operatively coupled to the memory storage device 4 through a system bus. In some circumstances the data processing system 1 may contain more than one processing unit 3.
  • The memory storage device 4 is operative to store data and can be any storage device that is known in the art, such as a local hard-disk, etc. and can include local memory employed during actual execution of the program code, bulk storage, and cache memories for providing temporary storage. Additionally, the memory storage device 4 can be a database that is external to the data processing system 1 but operatively coupled to the data processing system 1.
  • The input device 5 can be any device suitable for inputting data into the data processing system 1, such as a keyboard, mouse or data port (for example, a network connection). It is operatively coupled to the processing unit 3, allowing the processing unit 3 to receive information from the input device 5.
  • The display device 6 is a CRT, LCD monitor, etc. operatively coupled to the data processing system 1 and operative to display information. The display device 6 could be a stand-alone screen or if the data processing system 1 is a mobile device, the display device 6 could be integrated into a casing containing the processing unit 3 and the memory storage device 4.
  • The program module 8 is stored in the memory storage device 4 and operative to provide instructions to processing unit 3 and the processing unit 3 is responsive to the instructions from the program module 8.
  • Although other internal components of the data processing system 1 are not illustrated, it will be understood by those of ordinary skill in the art that only the components of the data processing system 1 necessary for an understanding of the present invention are illustrated and that many more components and interconnections between them are well known and can be used.
  • FIG. 2A illustrates a network configuration wherein the data processing system 1 is connected over a network 55 to a plurality of servers 50 operating as a search engine. FIG. 2B illustrates a network configuration wherein the data processing system 1 is configured as a server and a remote device 60, such as another computer, a PDA, cell phone or other mobile device connected to the Internet, is used to access the data processing system 1. The data processing system 1 runs the majority of the software and methods, in accordance with the present invention, and accesses a plurality of servers 50 operating as a search engine to conduct a web search. By having the data processing system 1 configured as a server, the remote device 60 does not need to have the capacity necessary to contain all the necessary data structures and run all the methods.
  • Furthermore, the invention can take the form of a computer readable medium having recorded thereon statements and instructions for execution by a data processing system 1. For the purposes of this description, a computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD.
  • Concept Knowledge Base
  • FIG. 3 illustrates an architectural schematic of a data structure for a concept knowledge base 10, in accordance with an aspect of the present invention. The data structure is stored on a memory and is accessible by an application program being executed by a data processing system, such as the data processing system 1 illustrated in FIG. 1. The data structure contains information that is accessible by the application program.
  • The concept knowledge base 10 contains information relating to a field of knowledge. For example, the concept knowledge base 10 could contain information related to the field of science. The concept knowledge base 10 contains a number of concept data objects 12, a number of term data objects 14 and a number of edge data objects 16.
  • Each concept data object 12 contains a concept field 13 containing a concept that is related to a specific concept falling within the field of knowledge of the concept knowledge base 10. The concept field 13 typically contains a text string identifying the concept. For example, if the concept knowledge base 10 is for computer science, there may be concept data objects 12 with the concept field 13 containing the text string of “computer graphics”, another concept data object 12 with the concept field 13 containing the text string of “distributed computing”, another concept data object 12 with the concept field 13 containing the text string “artificial intelligence”, etc.
  • Each term data object 14 contains a term field 15 containing a text string. The text string contains a word or phrase that describes a concept of one of the concept data objects 12.
  • Each concept data object 12 is associated with one or more term data objects 14 and each term data object 14 is associated with one or more concept data objects 12. The association of a concept data object 12 and a term data object 14 is defined by an edge data object 16 which contains a weight field 18. A term data object 14 that is associated with a concept data object 12 contains a term in the term field 15 that describes the concept contained in the concept field 13 of the concept data object 12. The relevancy of the term in the term field 15 of the term data object 14 to the concept in the concept field 13 of an associated concept data object 12 is represented by a weight in the weight field 18 of the edge data object 16.
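  • The many-to-many association between concept data objects, term data objects and weighted edge data objects can be pictured as a small weighted bipartite graph. The following is a minimal sketch only; the class name `ConceptKnowledgeBase`, its method names and the example weights are illustrative assumptions and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptKnowledgeBase:
    """Hypothetical container: concepts and terms joined by weighted edges."""
    concepts: set = field(default_factory=set)  # strings held in concept fields
    terms: set = field(default_factory=set)     # strings held in term fields
    # (concept, term) -> weight, playing the role of an edge's weight field
    edges: dict = field(default_factory=dict)

    def add_edge(self, concept: str, term: str, weight: float) -> None:
        """Associate a term with a concept, creating either one if needed."""
        self.concepts.add(concept)
        self.terms.add(term)
        self.edges[(concept, term)] = weight

    def terms_for(self, concept: str) -> dict:
        """All terms associated with a concept, with their weights."""
        return {t: w for (c, t), w in self.edges.items() if c == concept}

    def concepts_for(self, term: str) -> dict:
        """All concepts associated with a term, with their weights."""
        return {c: w for (c, t), w in self.edges.items() if t == term}


# Illustrative data only; the weights are made up for the example.
kb = ConceptKnowledgeBase()
kb.add_edge("computer graphics", "rendering", 0.8)
kb.add_edge("computer graphics", "model", 0.3)
kb.add_edge("artificial intelligence", "model", 0.7)
```

  • Here the term "model" is associated with two concepts, each association carrying its own weight, which mirrors the one-or-more relationship in both directions.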
  • While it is possible to manually construct the data structure containing the concept knowledge base 10, FIG. 4 illustrates a flowchart of a method of automatically creating a data structure containing a concept knowledge base in accordance with the present invention.
  • Method 100 comprises the steps of: determining a concept 110; selecting a document describing the concept 120; determining terms in the document to be analyzed 130; determining the frequency of the selected terms 140; checking if there are any remaining documents describing a concept 150; calculating a preliminary weight 160; checking if there are any more concepts 170; and normalizing all of the weights 180.
  • The method takes a number of documents and/or descriptions in computer readable form that describe a number of different concepts in a knowledge area and uses the documents to automatically generate a data structure of a concept knowledge base 10, as shown in FIG. 3.
  • The method 100 begins with step 110. A concept falling within the concept knowledge base is determined and a concept data object is created with information identifying the concept contained in the concept field.
  • Each concept will be described by one or more documents or descriptions in computer readable format. Once a concept has been determined at step 110, one or more documents describing the concept are identified and at step 120 one of these documents is selected to be analyzed.
  • At step 130, the method 100 determines the terms to be analyzed in the document. For each selected term, the method 100 creates a term data object with the term field containing the term, if a term data object containing the term does not already exist. An edge data object indicating the association of the term data object and the concept data object is also created. After the method 100 is completed, this edge data object will contain a weight indicating the relation of the term data object to the associated concept data object, which contains the concept described by the document being analyzed.
  • The terms that are analyzed can include all of the words used in the document or only specific words in the document. For example, common words that are basically non-descriptive, such as “the”, “a”, “this”, etc. may be excluded from the terms selected for analysis at step 130.
  • At step 140, the frequency of each of the selected terms in the selected document is determined. The number of occurrences of a selected term $t_j$ in the document $d_{ik}$ being analyzed can easily be determined via text matching and is defined by the function:
  • $$f(d_{ik}, t_j)$$
  • The frequency of each term appearing in the document is then averaged over the number of occurrences of all of the terms in the document. For example, the averaging could be done using the following equation:
  • $$f^*(d_{ik}, t_j) = \frac{f(d_{ik}, t_j)}{\sum_{l=1}^{m} f(d_{ik}, t_{l,ik})}$$
  • where $d_{ik}$ is the document being analyzed for the set of terms $t_{ik} = (t_{1,ik}, \ldots, t_{m,ik})$, with m being the number of terms in document $d_{ik}$. This equation simply divides the frequency or tally of a term being analyzed by the total tally of all terms being analyzed in document $d_{ik}$. By conducting this averaging, the eventual weight determined for each association between a term node and a concept node takes into account the number of occurrences of a term in the document and provides a potentially more relevant indicator of the relation between the term data object and the concept data object, because words or terms that appear often relative to the total number of terms will be given more weight. This preliminary averaging is used to try to prevent a single large document describing a concept from providing term weights that overshadow the weights provided by a number of smaller documents.
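  • As a rough illustration of steps 130 and 140, the sketch below counts the selected terms in one document and divides each count by the total count of all selected terms. The stop-word list, the whitespace tokenization and the function names are assumptions made for the example; the specification does not prescribe them.

```python
from collections import Counter

# Illustrative stop-word list; the specification only gives a few examples.
STOP_WORDS = {"the", "a", "this", "and", "of"}


def term_frequencies(text):
    """Count occurrences of each selected (non-stop-word) term in a document."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return Counter(words)


def normalized_frequencies(text):
    """Divide each term count by the total count of all selected terms."""
    counts = term_frequencies(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


doc = "the shader renders the scene and the shader runs"
freqs = normalized_frequencies(doc)
```

  • In this example five selected terms remain after the stop words are removed, and "shader" accounts for two of them, so its averaged frequency is 2/5.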
  • Next, at step 150, the method 100 checks to see if there are any more documents related to the concept that have not been analyzed. If there are more documents to be analyzed related to the concept, the method 100 returns to step 120, selects the next unanalyzed document and repeats steps 130, 140 and 150. As long as more documents related to the concept exist, step 150, causes the method 100 to analyze all of the documents. When there are no more documents related to the concept to be analyzed, the method 100 continues on to step 160.
  • At step 160 the method 100 calculates a preliminary weight for each of the terms used in the documents related to a single concept. For each term an interim weight wij* is calculated taking into account the average term frequency of the documents related to the concept.
  • $$w_{ij}^* = \frac{\sum_{k=1}^{n} f^*(d_{ik}, t_j)}{n}$$
  • where there are n documents, k = 1 . . . n.
  • This equation, in its entirety, is as follows:
  • $$w_{ij}^* = \frac{\sum_{k=1}^{n} \dfrac{f(d_{ik}, t_j)}{\sum_{l=1}^{m} f(d_{ik}, t_{l,ik})}}{n}$$
  • This calculation is used to prevent concepts with a large number of documents from producing term weights that overshadow term weights from concepts with fewer documents describing the concept.
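  • The preliminary weight calculation of step 160 can be sketched as an average, over the n documents describing one concept, of the averaged term frequencies. The function name `preliminary_weights` and the input layout (one dictionary of averaged frequencies per document) are assumptions for illustration; a term absent from a document is treated as contributing zero.

```python
def preliminary_weights(doc_freqs):
    """Average the per-document averaged term frequencies for one concept.

    doc_freqs: list of {term: averaged frequency} dicts, one per document.
    """
    n = len(doc_freqs)
    if n == 0:
        return {}
    totals = {}
    for freqs in doc_freqs:
        for term, value in freqs.items():
            totals[term] = totals.get(term, 0.0) + value
    # Dividing by n keeps concepts with many documents from dominating.
    return {term: total / n for term, total in totals.items()}


# Made-up averaged frequencies for two documents describing one concept.
docs_for_concept = [
    {"shader": 0.5, "scene": 0.5},
    {"shader": 0.25, "pixel": 0.75},
]
w_star = preliminary_weights(docs_for_concept)
```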
  • At step 170, the method 100 checks to see if there are any more concepts left to be evaluated. If there are concepts remaining that have not been analyzed, the method 100 returns to step 110 and the next concept is selected to be analyzed. The method 100 then repeats steps 120, 130, 140, 150 and 160, determining a preliminary weight for each of the terms appearing in the documents describing the selected concept. The method 100 continues to analyze each concept, repeating steps 110, 120, 130, 140, 150, 160 and 170 until all of the concepts have been analyzed, at which point the method 100 continues on to step 180.
  • At step 180, the method 100 determines a normalized weight for each of the terms associated with the concepts. The preliminary weight $w_{ij}^*$ previously determined for each association between a term $t_j$ and a concept is divided by the sum of all of the preliminary weights determined for the term $t_j$ across the r concepts to which it is connected. This equation is shown as follows:
  • $$w_{ij} = \frac{w_{ij}^*}{\sum_{k=1}^{r} w_{f(k)j}^*}$$
  • where the index f(k), k = 1 . . . r, ranges over the r concepts to which the term $t_j$ is connected in the concept knowledge base.
  • The normalization of the weights is used to prevent common terms that are included in many of the documents for many concepts from having higher weight values than other, less common terms. Without this normalization step, such common terms would have a very high weight, even though they are of little value in describing a concept. With this normalization step, the weights of these common terms are significantly reduced.
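  • A minimal sketch of the normalization of step 180 follows, assuming the preliminary weights are held in a nested dictionary keyed by concept and then by term (a layout chosen only for this example). Each preliminary weight is divided by the sum of the preliminary weights of the same term over all concepts to which that term is connected.

```python
def normalize_weights(prelim):
    """prelim: {concept: {term: preliminary weight}} -> normalized weights."""
    # Sum each term's preliminary weights over every concept it is connected to.
    term_totals = {}
    for weights in prelim.values():
        for term, w in weights.items():
            term_totals[term] = term_totals.get(term, 0.0) + w
    return {
        concept: {
            t: w / term_totals[t] for t, w in weights.items() if term_totals[t] > 0
        }
        for concept, weights in prelim.items()
    }


# Made-up preliminary weights; "system" is a common term shared by all concepts.
prelim = {
    "computer graphics": {"shader": 0.375, "system": 0.25},
    "distributed computing": {"cluster": 0.5, "system": 0.25},
    "artificial intelligence": {"neural": 0.25, "system": 0.5},
}
weights = normalize_weights(prelim)
```

  • In this made-up data the term "shader" belongs to a single concept and keeps a normalized weight of 1.0, while the shared term "system" has its weight split across the three concepts.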
  • Additionally, rather than using the terms exactly as they appear in the documents or descriptions, in a further aspect of the invention, the stems or roots of the terms are used to construct the knowledge base, allowing terms to be matched based on their stems or roots rather than on exact text matches.
  • Additionally, in some circumstances it may not be necessary to analyze every term in a document. In a further aspect, the method 100 will focus on only specific terms in a document that are highlighted in a particular way, e.g. in an abstract. Alternatively, there could be a list of terms that are not analyzed, such as common terms that are not descriptive of a concept; for example, terms such as “the”, “and”, etc. may be excluded from being selected.
  • At the conclusion of the method 100 a concept knowledge base as illustrated in FIG. 3 will have been automatically constructed by the method 100.
  • Overview of System
  • FIG. 5 illustrates a software system in accordance with the present invention. The software system 200 contains: a search query module 210; a concept generator module 220; a concept knowledge database 230; a search module 240; a search engine module 250; a concept accordance module 260; and a visualization interface module 270.
  • A search query is input to the system 200 at the search query module 210. The search query contains one or more search terms and usually at least two or three search terms. From the search query module 210 this search query containing one or more search terms is passed to the concept generator module 220, which uses the concept knowledge database 230 to select a set of concepts and generate a concept vector for each selected concept.
  • The search terms and the concept vectors are passed from the concept generator module 220 to the search module 240, where the search module 240 requests a search from the search engine module 250 using the search query and receives search results. Typically, the results returned by the search engine module 250 are a list of returned documents, where each returned document is typically a document surrogate that describes the actual document located by the search engine module 250.
  • As the search results are received from the search engine module 250, the search module 240 passes the returned documents to the concept accordance module 260, where a document vector is determined for each of the returned documents and an accordance value indicating the similarity of the returned document to each of the concepts in the concept set is formulated. A fuzzy membership score is typically determined for the returned document relative to each of the concepts in the concept set and is used for the accordance value.
  • The search module 240 can wait until all the search results are returned from the search engine module 250 before passing the search results all together to the concept accordance module 260. However, in one embodiment, the search module 240 passes each returned document of the search results to the concept accordance module 260 as the search module 240 receives the returned document, to decrease the time required by the system 200 to display the search results.
  • Finally, the concept set and the search results (comprising a plurality of returned documents), with each returned document having an accordance value indicating the similarity of the returned document to each of the concepts in the concept set, are passed to the visualization interface module 270, which displays the search results to a user.
  • The software system 200 can be implemented wholly on a data processing system, such as the data processing system 1 shown in FIG. 2A, with only the search engine module 250 resident on a server 50 connected to the data processing system 1 over the network 55. Alternatively, various components of the software system 200, such as the search query module 210, could be resident on a remote device 60 operably connected to a data processing system 1 which contains other components of the software system 200 (such as the concept generator module 220, the search module 240, etc.), as shown in FIG. 2B.
  • Obtaining the Concepts
  • The software system 200 begins with an initial search query being input to the search query module 210 which passes the search query to the concept generator module 220. The concept generator module 220 accesses the concept knowledge database 230 and uses the information in the concept knowledge database 230 to generate a concept set and a respective set of concept vectors using the search query.
  • FIG. 6 illustrates a flowchart of a method 400 for generating a set of concepts based on the search terms in the search query. Method 400 is implemented by the concept generator module 220 in FIG. 5, using the concept knowledge database 230. When a search query is passed to the concept generator module 220, the method 400 uses the concept knowledge database 230 to generate a list of concepts using relationships between the search terms and concepts.
  • Method 400 comprises the steps of: matching terms in the search query to term data objects in the concept knowledge base to obtain a first term set 410; obtaining a concept set of concept data objects associated with the first term set 420; obtaining a second term set of term data objects associated with the concept data objects in the concept set 430; and formulating a set of concept vectors 440.
  • The method 400 begins with step 410, matching the terms in the search query to term data objects in the concept knowledge database 230. The concept knowledge database 230 is accessed and the terms making up the search query are matched with any term data objects that have a term in the term field matching the term in the search query. A first term set containing these selected term data objects is obtained. After step 410 is completed, all of the term data objects in the concept knowledge database 230 that have a term in the term field that corresponds to one of the terms in the search query have been identified and added to the first term set.
  • At step 420, the first term set is used to obtain a concept set containing concept data objects from the concept knowledge database 230 associated with one or more term data objects in the first term set. The term data objects making up the first term set are used to obtain a number of concept data objects from the concept knowledge database 230. Concept data objects associated with one or more term data objects in the first term set are selected to form the concept set.
  • Concept data objects that are not strongly associated with term data objects in the first term set can be excluded from the concept set using a first weight threshold and a term ratio threshold. The first weight threshold is used to exclude concept data objects that are not strongly associated with one of the term data objects in the first term set by comparing the weight assigned to an association between a concept data object and a term data object and excluding the concept data object from the concept set if the weight determined for the association is less than the first weight threshold. By using this first weight threshold, the concept set is limited to only the more relevant concepts. Additionally, a term ratio threshold is used to further exclude concept data objects from the concept set. If a concept data object is associated with one of the term data objects in the first term set with a weight greater than the first weight threshold, the concept data object is evaluated to determine the ratio of all of the term data objects in the first term set to which the concept data object is associated with a weight greater than the first weight threshold. If this ratio is less than the term ratio threshold, the concept data object is excluded from the concept set.
  • Through experiments, the first weight threshold, term ratio threshold and second weight threshold can be determined. For example, some initial studies found that a first weight threshold of 0.05, a term ratio threshold of 0.26 and a second weight threshold of 0.10 provided satisfactory results.
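  • Steps 410 and 420, together with the first weight threshold and the term ratio threshold, can be sketched as follows. The function name `select_concepts`, the edge dictionary layout and the example weights are assumptions for illustration, and the comparisons are written so that a concept is excluded when a value is less than its threshold, as described above. The default thresholds are the example values of 0.05 and 0.26.

```python
def select_concepts(edges, query_terms, weight_threshold=0.05, ratio_threshold=0.26):
    """Return the concept set for a query.

    edges: {(concept, term): weight} taken from a concept knowledge base.
    """
    # Step 410: query terms that exist as terms in the knowledge base.
    first_terms = {t for (_, t) in edges if t in query_terms}
    if not first_terms:
        return set()
    # Step 420: count, per concept, the first-set terms it is strongly tied to.
    strong = {}
    for (concept, term), w in edges.items():
        if term in first_terms and w >= weight_threshold:
            strong[concept] = strong.get(concept, 0) + 1
    # Keep concepts tied to a large enough share of the first term set.
    return {c for c, n in strong.items() if n / len(first_terms) >= ratio_threshold}


# Made-up weights for illustration.
edges = {
    ("computer graphics", "rendering"): 0.6,
    ("computer graphics", "model"): 0.2,
    ("computer graphics", "texture"): 0.4,
    ("artificial intelligence", "model"): 0.5,
    ("artificial intelligence", "search"): 0.3,
    ("databases", "search"): 0.3,
    ("databases", "model"): 0.02,
}
query = {"rendering", "model", "texture", "search"}
concept_set = select_concepts(edges, query)
```

  • In this example "databases" is strongly associated with only one of the four matched terms, giving a ratio of 0.25, which falls below the term ratio threshold of 0.26, so it is left out of the concept set.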
  • At step 430, a second term set is obtained for each of the concept data objects in the concept set. Each of the concept data objects in the concept set is evaluated to determine the term data objects, in the concept knowledge database 230, associated with that concept data object. Term data objects associated with the concept data objects selected for the concept set are added to the second term set. A second weight threshold may be used to exclude term data objects from the second term set if they are associated with concept data objects in the concept set by a weight that is less than the second weight threshold.
  • At step 440, the concept set is used to formulate a set of concept vectors. For each concept represented by a concept data object in the concept set, a concept vector is created using the first term set and second term set obtained for the concept data object at step 430. Each term data object in the first term set and second term set for a concept data object is defined as a dimension of the concept vector and the weight of the association between the term data object and the concept data object is used to set the magnitude of the concept vector in that assigned dimension.
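  • The construction of a concept vector at steps 430 and 440 might look like the sketch below. The sparse dictionary representation, the function name `concept_vector` and the choice to give a magnitude of 0.0 to a first-set term that has no association with the concept are assumptions made for this example; the default second weight threshold is the example value of 0.10.

```python
def concept_vector(edges, concept, first_terms, second_weight_threshold=0.10):
    """Build a sparse concept vector as a {term: weight} dictionary.

    edges: {(concept, term): weight} taken from a concept knowledge base.
    """
    linked = {t: w for (c, t), w in edges.items() if c == concept}
    # Step 430: terms tied to the concept strongly enough form the second set.
    second_terms = {t for t, w in linked.items() if w >= second_weight_threshold}
    # Step 440: one dimension per term; magnitude is the association weight.
    dims = set(first_terms) | second_terms
    return {t: linked.get(t, 0.0) for t in sorted(dims)}


# Made-up weights for illustration.
edges = {
    ("computer graphics", "rendering"): 0.6,
    ("computer graphics", "texture"): 0.4,
    ("computer graphics", "gpu"): 0.05,
    ("artificial intelligence", "model"): 0.5,
}
vec = concept_vector(edges, "computer graphics", {"rendering", "model"})
```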
  • At this point, the method 400 ends and a set of concepts and their corresponding concept vectors C={c1, c2, . . . , cm} have been determined.
  • Referring again to FIG. 5, from the concept generator module 220, the concept set and the respective concept vectors are passed to the search module 240 to request a search using the search query.
  • Search Module
  • When the search module 240 receives the search query, the concept set and the set of concept vectors, the search module 240 requests the search engine module 250 to conduct a search using the search query. The search module 240 is typically resident on the data processing system 1 and the search engine module 250 is typically a web search engine, such as the web search engine running on servers 50 in FIGS. 2A and 2B, with the search being conducted on a number of computer readable documents, such as searching for web pages on the World Wide Web. However, the search engine module 250 could be used in any computerized document storage system capable of searching a large number of computer readable documents.
  • The search engine module 250 could return the results of the search in the form of a list of complete documents located in the search. However, due to the likelihood that a relatively large number of documents can be located with the search, and to save overhead on the data processing system, the search results are typically returned in the form of a list of document surrogates, with a document surrogate returned for each document located in the search.
  • FIG. 7 illustrates a typical document surrogate data object 160 which is commonly provided as a returned document by a search engine as one of a set of search results. Rather than returning each complete document that is located in a search, search engines typically provide a set of document surrogates 160 in place of the complete documents. Document surrogates 160 are the primary data objects in the list-based representation used by search engines. Each document surrogate 160 provides information describing the corresponding complete document, which commonly consists of: a title 162; a URL 164; a summary 166; and any other assorted information. The title 162 provides the title of the corresponding complete document described by the document surrogate 160, the URL 164 provides the address of the complete document and the summary 166 contains a short description or snippet of the complete document and usually provides the query terms of the search query in context.
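  • A document surrogate of this kind can be modelled as a small record with a title, a URL and a summary. The sketch below is illustrative only; the class name, the `text` helper and the example values are assumptions, and real search engines return additional fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentSurrogate:
    """Hypothetical record standing in for one returned document."""
    title: str
    url: str
    summary: str

    def text(self, include_title=True):
        """Text available for analysis: the summary, optionally with the title."""
        if include_title:
            return f"{self.title} {self.summary}"
        return self.summary


# Example values are invented for illustration.
surrogate = DocumentSurrogate(
    title="Ray tracing basics",
    url="https://example.com/ray-tracing",
    summary="An intro to ray tracing",
)
```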
  • Referring again to FIG. 5, the search results obtained by the search module 240 are passed to the concept accordance module 260, where accordance values for each returned document, indicating the degree of similarity between the returned document and the concepts in the concept set, are formulated.
  • Concept Accordance
  • FIG. 8 illustrates a method 500 of determining accordance values for each returned document of the search results. Each returned document has an accordance value determined for each concept in the concept set and indicating the degree of similarity between the returned document and the concept. The accordance value is typically based on a fuzzy membership score with each concept in the concept set treated as a centroid to determine the clustering of the returned documents around the concepts. A returned document might have a higher fuzzy membership score in relation to one concept as compared to another concept, indicating that the first concept has a higher degree of similarity to the returned document than the second concept.
  • Method 500 comprises the steps of: selecting a first returned document 510; analyzing the returned document 520; generating a document vector 530 for the selected returned document; selecting a first concept 540; determining an accordance value for the selected returned document in relation to a selected concept 550; checking if more concepts remain in the concept set to be analyzed 560 in relation to the selected returned document and selecting the next concept 570, if more concepts remain in the concept set; once the selected returned document has been analyzed in relation to all the concepts in the concept set, checking if more returned documents are present 580 and selecting the next returned document 590, if there are more returned documents; and ending when all the returned documents have an accordance indicator determined for each of the concepts in the concept set.
  • The method 500 begins with the selection of a first returned document at step 510. As the search results are returned, the method 500 selects one of the returned documents returned as the search results.
  • Next at step 520, the returned document is analyzed to determine a frequency of each unique word or term in the returned document. If the returned document is the entire document, the occurrence of each unique term in the entire document is determined. If the returned document is a document surrogate the occurrence of each unique term can be determined based on the summary of the complete document and optionally in the title.
  • In some cases it may be preferable to use the roots of the words rather than the words themselves, for example by applying Porter's stemming algorithm, so that words with various prefixes and suffixes are not counted as separate terms. Each unique term in a returned document will then be based on the root or stem of the term, so that the frequencies determined are not reduced by the use of terms that use different suffixes, prefixes, etc. Matching based on the stems or roots of the search terms can be more effective than exact word matching, since it takes into account different variations of the same root word.
  • The frequencies determined for the unique terms in the selected returned document are then used to generate a document vector at step 530. The document vector represents the frequency of unique terms in the returned document in a multidimensional vector form. Each unique term is used to define a dimension of the document vector and the magnitude of the document vector in that dimension is then set as the frequency of the term in the returned document.
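  • Steps 520 and 530 can be sketched for a document surrogate as a simple term count over the summary and, optionally, the title. The regular-expression tokenizer and the function name `document_vector` are assumptions for this example, and stemming is left out for brevity.

```python
import re
from collections import Counter


def document_vector(summary, title=""):
    """Return a sparse document vector: one dimension per unique term,
    with the term's frequency as the magnitude in that dimension."""
    tokens = re.findall(r"[a-z0-9]+", f"{title} {summary}".lower())
    return dict(Counter(tokens))


vec = document_vector(
    "Ray tracing renders scenes; ray casting is simpler.",
    title="Ray tracing",
)
```

  • Here the term "ray" occurs once in the title and twice in the summary, so the document vector has a magnitude of 3 in that dimension.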
  • A first one of the concepts in the concept set is selected at step 540 and an accordance value indicating the degree of similarity between the returned document and the selected concept is determined at step 550. The accordance value is assigned by determining a fuzzy membership score of the returned document relative to the selected concept. Given the concept set, the corresponding concept vector set, C={c1, c2, . . . , cm}, and the document vector $d_i$ determined for the returned document at step 530, the fuzzy membership score $u_{i,j}$ of the returned document with respect to a concept $c_j$ can be determined by the following equation:
  • $$u_{i,j} = \frac{1}{\sum_{k=1}^{m} \left( \dfrac{\mathrm{sim}(d_i, c_j)}{\mathrm{sim}(d_i, c_k)} \right)^{2}}$$
  • where $\mathrm{sim}(d_i, c_j)$, the similarity between a document vector and a concept vector, is given by the Euclidean distance metric:
  • $$\mathrm{sim}(x_i, x_j) = \left( \sum_{k=1}^{p} (x_{i,k} - x_{j,k})^{2} \right)^{1/2}$$
  • where $x_i$ is the document vector, $x_j$ is the concept vector and p is the number of dimensions.
  • Although a specific clustering algorithm is provided, a person skilled in the art will appreciate that other clustering algorithms could also be used.
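  • The fuzzy membership calculation of step 550 can be sketched as follows, with vectors held as sparse dictionaries and missing dimensions treated as zero. The function names and the handling of a document that coincides exactly with a concept vector (a zero distance, which the equation itself leaves undefined) are assumptions made for this example.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two sparse vectors given as dicts."""
    keys = set(x) | set(y)
    return math.sqrt(sum((x.get(k, 0.0) - y.get(k, 0.0)) ** 2 for k in keys))


def fuzzy_memberships(doc, concepts):
    """Return {concept name: membership score} for one document vector.

    concepts: {concept name: concept vector}, each treated as a centroid.
    """
    dist = {name: euclidean(doc, vec) for name, vec in concepts.items()}
    zero = [name for name, d in dist.items() if d == 0.0]
    if zero:
        # Assumed handling: share full membership among coinciding centroids.
        return {name: (1.0 / len(zero) if name in zero else 0.0) for name in dist}
    return {
        name: 1.0 / sum((d_j / d_k) ** 2 for d_k in dist.values())
        for name, d_j in dist.items()
    }


# Made-up vectors: the document is at distance 1 from A and 2 from B.
doc = {"ray": 1.0}
concepts = {"A": {"ray": 2.0}, "B": {"ray": 1.0, "agent": 2.0}}
scores = fuzzy_memberships(doc, concepts)
```

  • With distances of 1 and 2, the scores are 1/(1 + 1/4) = 0.8 for concept A and 1/(4 + 1) = 0.2 for concept B; the scores for one document sum to 1, and the nearer concept receives the higher score.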
  • The method 500 then checks, at step 560, to see if more concepts remain in the concept set. If more concepts remain, the next concept in the concept set is selected at step 570 and an accordance value consisting of a fuzzy membership score for the newly selected concept is determined at step 550. The method 500 will continue to repeat steps 550, 560 and 570 until all the concepts in the concept set have been selected and an accordance value has been determined for the returned document in relation to each of the concepts in the concept set.
  • At step 580, if more returned documents remain, the next returned document is selected at step 590 and steps 520, 530, 540, 550, 560, 570, 580 and 590 are repeated until all of the returned documents have been analyzed.
  • In this manner, the method 500 determines a document vector for each of the returned documents and uses the document vector to determine a fuzzy membership score for the returned document and each concept in the concept set, which is used then to assign an accordance value indicating the similarity between the returned document and each of the concepts in the concept set.
  • In one embodiment of the present invention, each returned document in a set of search results is evaluated as the returned document is returned from the search engine module 250 to decrease the amount of time the system requires before displaying the search results to a user.
  • At the end of the method 500 each of the returned documents will have an accordance value determined in relation to each of the concepts in the concept set.
  • From the concept accordance module 260, the search results comprising the plurality of returned documents and the formulated accordance values are passed to the visualization interface module 270.
  • Visualization Interface
  • The visualization interface module 270 displays the search results in a manner that allows a user to quickly determine the degree of similarity between one or more concepts in a concept set and a returned document. Additionally, the search results are displayed in such a way that a user can determine whether a first returned document is more or less similar to one or more selected concepts than a second returned document. Referring to FIG. 5, the search results in the form of a list of returned documents (with the accordance values determined for each returned document) and the concept set are passed, from the concept accordance module 260, to the visualization interface module 270 where the search results are displayed to a user.
  • FIG. 9 illustrates a screen shot of an embodiment of a graphical interface 600 that may be generated by the visualization interface module 270 shown in FIG. 5, for displaying the search results. A compressed level of detail provides the user with a broad overview of a large number of the search results and a more detailed level of view provides more detailed information about a smaller number of the search results.
  • A first portion 610 of the list of returned documents is shown in the graphical interface 600. Typically, each returned document shown in the first portion 610 comprises an accordance indicator 614 and a title 636. The title 636 provides readable text showing the title of the returned document and is typically a hyperlink to the actual located document, such as the webpage located in a web search or the document in an information retrieval system.
  • Each returned document in the first portion 610 can be displayed with the summary of the returned document. Alternatively, each returned document in the first portion 610 can display the summary of the returned document only when a user moves a cursor over the returned document in the first portion 610. When the user moves a cursor over the returned document in the first portion 610, a popup field (tool tip) can appear containing the summary of the returned document.
  • Each returned document in the first portion 610 is displayed with an accordance indicator 614. The accordance indicator 614 is based on the accordance values formulated for the returned documents and indicates the similarity between a returned document and one or more of the concepts in the concept set generated using the search terms in the search query.
  • The accordance values, consisting of the fuzzy membership scores determined for each returned document in relation to the concepts in the concept set using method 500, illustrated in FIG. 8, are used to formulate the accordance indicators 614. Each concept 685 in the set of concepts is displayed in a viewing pane 680. A user can select one or more of the concepts 685. If a user selects only a single concept 685, the accordance indicator 614 shows the accordance value of each returned document in relation to the selected concept. However, if the user selects more than one of the concepts 685 in the concept set, the accordance indicator 614 is the result of the addition of the accordance values of each returned document in relation to all of the selected concepts.
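  • The addition of accordance values over the selected concepts can be sketched as follows. The function name, the dictionary layout, and the sample concept names and scores are assumptions made for illustration only.

```python
from typing import Dict, Iterable


def combined_accordance(scores: Dict[str, float], selected: Iterable[str]) -> float:
    """Add up one returned document's accordance values over the selected concepts."""
    # A concept with no recorded score contributes nothing to the sum.
    return sum(scores.get(concept, 0.0) for concept in selected)


# Hypothetical accordance values for a single returned document.
doc_scores = {"vehicle": 0.5, "animal": 0.25, "brand": 0.0}

single = combined_accordance(doc_scores, ["vehicle"])
multi = combined_accordance(doc_scores, ["vehicle", "animal"])
```

  • With a single selected concept the result is simply that concept's accordance value; with several, the values are added, matching the behavior described for the accordance indicator 614.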
  • Typically, a color shade is assigned to each accordance indicator 614 based on the accordance values of the selected concepts for each returned document. Typically, returned documents with higher accordance values for the selected concepts (more similarity between a selected concept and a returned document) are assigned a color shade that is more intense or deeper. For example, an accordance indicator 614 indicating a relatively low accordance may have a very pale yellow color, while an accordance indicator 614 indicating a higher accordance may be a color shade of a much darker red.
  • A second portion 630 of the list of returned documents is shown in the graphical interface 600. The first portion 610 is a subset of the returned documents shown in the second portion 630. The second portion 630 displays each returned document in a much smaller format so that the second portion 630 displays more of the returned documents than the first portion 610; however, the first portion 610 provides more information regarding each displayed returned document than the second portion 630 provides.
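  • One way such a color shade could be derived is by linear interpolation between a pale yellow and a dark red. The sketch below is an assumption about how the mapping might be done: the specific RGB endpoints, the normalization by a maximum value, and the function name are illustrative choices not specified in the description.

```python
from typing import Tuple

RGB = Tuple[int, int, int]

PALE_YELLOW: RGB = (255, 255, 204)  # assumed shade for low accordance
DARK_RED: RGB = (128, 0, 38)        # assumed shade for high accordance


def accordance_to_rgb(value: float, max_value: float,
                      low: RGB = PALE_YELLOW, high: RGB = DARK_RED) -> RGB:
    """Map an accordance value onto a shade between the low and high colors."""
    if max_value <= 0:
        return low
    # Normalize to [0, 1] and clamp out-of-range values.
    t = min(max(value / max_value, 0.0), 1.0)
    r, g, b = (round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
    return (r, g, b)


lowest = accordance_to_rgb(0.0, 1.0)
highest = accordance_to_rgb(1.0, 1.0)
```

  • Because the result is a single color, it can be drawn at any size, which is what allows the indicator to remain legible in a compressed view.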
  • Each of the returned documents in the second portion 630 is displayed in a format which provides a compressed or small view of the returned document. Each returned document shown in the second portion 630 is displayed with an accordance indicator 614, and, typically, a title representation 616.
  • The title representation 616 represents the title of the returned document; however, the title representation 616 does not necessarily have to provide the title in a readable format. Returned documents displayed in the second portion 630 may be displayed so small that a solid line is used to provide the title representation 616 and the title representation 616 merely indicates the approximate length of the title of the returned document in relation to the length of the titles of the other returned documents.
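  • A solid-line title representation of this kind could be computed by scaling each title's length against the longest title in the list. This is a hypothetical sketch: the function name and the 60-pixel maximum width are assumptions, and the description does not specify how the line length is calculated.

```python
from typing import List


def title_bar_widths(titles: List[str], max_width_px: int = 60) -> List[int]:
    """Return a line width in pixels for each title, relative to the longest title."""
    longest = max((len(title) for title in titles), default=0)
    if longest == 0:
        return [0 for _ in titles]
    widths = []
    for title in titles:
        if not title:
            widths.append(0)
        else:
            # Non-empty titles always get at least one pixel.
            widths.append(max(1, round(len(title) / longest * max_width_px)))
    return widths


widths = title_bar_widths(["abcd", "ab", ""])
```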
  • By displaying a more detailed first portion 610 of the list of returned documents and a less detailed second portion 630 of the list of returned documents, where the second portion 630 contains many more returned documents than the first portion 610, the interface 600 provides two levels of detail to a user about returned documents provided in the list returned as the search results.
  • Each of the returned documents displayed in the first portion 610 corresponds to a returned document displayed in the second portion 630, such that all of the returned documents in the first portion 610 are contained in the second portion 630, with the returned documents in the first portion 610 occurring in the same order that they occur in the second portion 630. The second portion 630 represents a subset of the list of returned documents making up the search results from the search engine and the first portion 610 represents a subset of the returned documents shown in the second portion 630. The first format allows a user to see a large number of the returned documents in a compressed view in the second portion 630 and then also see some of the returned documents shown in the second portion 630 in a larger, more detailed view in the first portion 610.
  • The accordance indicators 614 displayed with the returned documents in the first portion 610 indicate the same accordance values as the accordance indicators 614 associated with the same returned documents in the second portion 630.
  • The second portion 630 displays a much greater portion of the list of returned documents than the first portion 610. In some cases, more than one hundred (100) returned documents may be displayed in the second portion 630. On the other hand, the first portion 610 displays a relatively smaller number of the returned documents because the first portion 610 provides more detail for each returned document, and therefore the returned documents must be shown in a large enough size that a user can read the titles 636 of the returned documents shown in the first portion 610. For example, while the second portion 630 may display one hundred (100) returned documents in the first format, the first portion 610 may display fewer than twenty-five (25) returned documents.
  • While each of the accordance indicators 614 could be an assigned number such as a percentage, the use of a color shade as the accordance indicators 614 (or as part of the accordance indicators 614) allows the information to be conveyed to the user even though the returned document in the second portion 630 may be displayed too small for a user to read text easily, or at all. Using a color shade for the accordance indicators 614, the accordance indicators 614 do not have to be very large in order to convey the necessary information to a user; they need only be large enough to convey a shade of color. While numbers, text or geometric shapes cannot be illustrated using a single pixel, a color shade can be. In some cases, the accordance indicator 614 may be made as small as a single pixel of a display screen.
  • An indicator frame 650 is positioned over the returned documents in the second portion 630 that are also shown in the first portion 610. The indicator frame 650 indicates the returned documents shown in the second portion 630 that are also shown in the first portion 610.
  • When a user makes a selection that changes the returned documents shown in the first portion 610, such as by using a scroll bar 660 to scroll to a new set of returned documents displayed in the first portion 610, the second portion 630 is updated to indicate the same returned documents shown in the first portion 610 in the second portion 630, by moving the indicator frame 650 along the second portion 630.
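  • The relationship between the two portions and the indicator frame can be sketched as a window over a single ordered list. The class below is a minimal illustration under stated assumptions: the class and method names are hypothetical, and the default sizes of 100 and 20 are taken loosely from the example figures in the description rather than from any specified implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TwoLevelView:
    results: List[str]        # full ordered list of returned documents
    overview_size: int = 100  # documents shown in the compressed second portion
    detail_size: int = 20     # documents shown in the detailed first portion
    offset: int = 0           # index in the overview where the detail view starts

    def overview(self) -> List[str]:
        """Documents shown in the compressed second portion."""
        return self.results[: self.overview_size]

    def scroll_to(self, offset: int) -> None:
        """Move the detail view, keeping it inside the overview."""
        max_offset = max(0, len(self.overview()) - self.detail_size)
        self.offset = max(0, min(offset, max_offset))

    def detail(self) -> List[str]:
        """Documents shown in the detailed first portion, in the same order."""
        return self.overview()[self.offset : self.offset + self.detail_size]

    def indicator_frame(self) -> Tuple[int, int]:
        """Half-open index range of the overview covered by the indicator frame."""
        end = min(self.offset + self.detail_size, len(self.overview()))
        return self.offset, end


view = TwoLevelView(results=[f"doc {i}" for i in range(150)])
view.scroll_to(95)  # clamped so the detail view stays within the overview
```

  • Keeping a single offset as the shared state means that scrolling the detailed view and moving the indicator frame are the same operation, so the two portions cannot fall out of step.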
  • In this manner, a user can quickly look over the accordance indicators 614 for each returned document shown in the first portion 610 to determine which returned documents correlate to the desired concepts. Additionally, using the second portion 630, a user can quickly look over the accordance indicators 614 for each returned document in the second portion 630 and quickly determine which returned documents shown in the second portion 630 correlate to the desired concepts more closely than the other returned documents, without requiring the user to perform any in-depth analysis of each returned document. By simply scanning over the accordance indicators 614, a user can quickly and easily visually locate the accordance indicators 614 that indicate a returned document that has a high correlation with a concept by the various shades of color shown in the accordance indicators 614.
  • A user can also visually analyze the returned documents shown in the first portion 610, checking for returned documents that contain accordance indicators 614 indicating the similarity of a returned document to one or more of the concepts. Once a user identifies a returned document or a grouping of returned documents in the first portion 610 that the user wishes to examine in more detail, the user can then move the indicator frame 650 so that the selected returned documents or grouping of returned documents in the second portion 630 are displayed in the second format in the first portion 610. A user can then examine the titles 636 of the represented documents and click on a desired returned document title 636 to go to the document.
  • FIG. 10 illustrates a screen shot of a further embodiment of a graphical interface 700 that may be generated by the visualization interface module 270 shown in FIG. 5, for displaying the search results wherein the accordance values of the returned documents in relation to the selected concept or concepts in the concept set are used to sort the returned documents in the first portion 610 and second portion 630. The returned documents are sorted based on the accordance values for the returned documents in relation to a selected concept. Returned documents that have a higher degree of similarity to a selected concept (shown with the accordance indicators 614) appear higher in the list than other returned documents that are less similar to the selected concept or concepts.
  • FIG. 11 illustrates a screen shot of a further embodiment of a graphical interface 800 that may be generated by the visualization interface module 270 shown in FIG. 5, for displaying the search results. Graphical interface 800 is similar to graphical interface 600 (illustrated in FIG. 9) in that a first portion 610 of a list of returned documents is displayed with each returned document displaying the title 636 and any other information. However, instead of displaying a single accordance indicator 614 with each returned document, wherein when more than one concept from the concept set is selected the single accordance indicator 614 is made up of the sum of the accordance values of the selected concepts, graphical interface 800 provides a separate accordance indicator 814 for each selected concept. Each accordance indicator 814 displays a single accordance value indicating the similarity between a returned document and a specific concept in the concept set. By simply viewing the accordance indicators 814, a user can see how similar each of the displayed returned documents is to a specific concept. In this manner a user can compare the similarity of a returned document to two or more of the concepts in a concept set.
  • A user can sort the list of the returned documents displayed in the graphical interface 800 based on one concept over the one or more others by selecting one of the concepts to sort the list by. In one embodiment, a user selects the concept the user desires the list to be sorted by. The list of returned documents is then re-sorted to give precedence to the selected concept, and the first portion 610 of returned documents and the second portion 630 of the returned documents are updated to reflect the newly sorted list.
  • A user can also conduct a nested sort by selecting a second concept. The list of returned documents is then re-sorted to place a primary weight on returned documents with high accordance values in relation to the first selected concept and then a secondary weight on returned documents with high accordance values in relation to the second selected concept. The first portion 610 of returned documents and the second portion 630 of the returned documents are updated to reflect the newly sorted list.
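  • The single-concept sort and the nested sort can both be sketched as a sort on a tuple key, one entry per selected concept in order of selection. This sketch assumes that the primary and secondary weights amount to a lexicographic ordering, which the description does not state explicitly; the function name, the dictionary layout, and the sample data are hypothetical.

```python
from typing import Dict, List


def sort_by_concepts(documents: List[Dict], sort_concepts: List[str]) -> List[Dict]:
    """Sort documents by accordance to each concept in turn, highest first."""
    def key(doc: Dict):
        # Negate so that higher accordance values sort earlier.
        return tuple(-doc["accordance"].get(concept, 0.0) for concept in sort_concepts)
    # sorted() is stable, so ties keep their original search-result order.
    return sorted(documents, key=key)


docs = [
    {"title": "A", "accordance": {"x": 0.2, "y": 0.9}},
    {"title": "B", "accordance": {"x": 0.8, "y": 0.1}},
    {"title": "C", "accordance": {"x": 0.8, "y": 0.6}},
]

by_first = [d["title"] for d in sort_by_concepts(docs, ["x"])]
nested = [d["title"] for d in sort_by_concepts(docs, ["x", "y"])]
```

  • In the sample data, documents B and C tie on the first concept, so the single-concept sort leaves them in their original order, while the nested sort uses the second concept to place C ahead of B.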
  • The foregoing is considered as illustrative only of the principles of the invention. Further, since numerous changes and modifications will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation shown and described, and accordingly, all such suitable changes or modifications in structure or operation which may be resorted to are intended to fall within the scope of the claimed invention.

Claims (36)

1. A method of visually coding search results based on at least one concept, the method comprising:
using a search query containing a plurality of search terms to conduct a search of a plurality of computer readable documents and obtain the search results, the search results comprising at least one returned document;
obtaining the at least one concept by matching the plurality of search terms to the at least one concept;
evaluating the similarity between the at least one returned document and the at least one concept; and
displaying the at least one returned document with an accordance indicator, the accordance indicator indicating the similarity between the at least one returned document and the at least one concept.
2. The method of claim 1 wherein the at least one concept is matched to the plurality of search terms by:
accessing a concept knowledge base data structure having a plurality of concept data objects and a plurality of term data objects, each term data object defining a term and associated with at least one of the concept data objects and each concept data object defining an object;
generating a first term set containing term data objects from the concept knowledge base data structure wherein each term data object in the first term set matches at least one of the plurality of search terms; and
generating a concept set containing concept data objects from the concept knowledge base data structure wherein each concept data object in the concept set is associated with one or more of the term data objects in the first term set.
3. The method of claim 2 wherein a concept vector is generated for each concept in the concept set and a document vector is generated for the at least one returned document and wherein the accordance indicator is based on a fuzzy membership score determined by comparing the concept vector to the document vector.
4. The method of claim 3 further comprising allowing a user to select more than one concept from the concept set and having the accordance indicator based on the combined fuzzy membership scores determined for the at least one returned document and the selected concepts.
5. The method of claim 3 further comprising displaying at least two accordance indicators with the at least one returned document, each of the at least two accordance indicators corresponding to one of the concepts in the concept set.
6. The method of claim 1 wherein the accordance indicators consist of a color shade.
7. The method of claim 6 wherein the returned documents of the search results are ordered in a list and wherein the at least one returned document is displayed in a first portion of the list of returned documents and simultaneously displayed in a second portion of the list of returned documents on a display screen, and wherein the number of returned documents displayed in the second portion is greater than the number of returned documents displayed in the first portion.
8. The method of claim 7 wherein the returned documents are ordered in the list based on the similarity between the at least one returned document and the at least one concept.
9. The method of claim 3 wherein the returned document is a document surrogate.
10. A data processing system for visually coding search results based on at least one concept, the data processing system comprising:
at least one processor;
a memory operatively coupled to the at least one processor;
a display device operative to display data; and
a program module stored in the memory and operative for providing instructions to the at least one processor, the at least one processor responsive to the instructions of the program module, the program module operative for:
using a search query containing at least one search term to conduct a search of a plurality of computer readable documents and obtain search results comprising a plurality of returned documents;
obtaining the at least one concept by matching the at least one search term to the at least one concept;
evaluating the similarity between a returned document and the at least one concept; and
displaying on the display device the returned document with an accordance indicator, the accordance indicator indicating the similarity between the returned document and the at least one concept.
11. The apparatus of claim 10 wherein the at least one concept is matched to the at least one search term by:
accessing a concept knowledge base data structure having a plurality of concept data objects and a plurality of term data objects, each term data object defining a term and associated with at least one of the concept data objects, each concept data object defining an object;
generating a first term set containing term data objects from the concept knowledge base data structure wherein each term data object in the first term set matches at least one of the plurality of search terms; and
generating a concept set containing concept data objects from the concept knowledge base data structure wherein each concept data object in the concept set is associated with one or more of the term data objects in the first term set.
12. The apparatus of claim 11 wherein a concept vector is generated for each concept in the concept set and a document vector is generated for the at least one returned document and wherein the accordance indicator is based on a fuzzy membership score determined by comparing the concept vector to the document vector.
13. The apparatus of claim 12 further comprising allowing a user to select more than one concept from the concept set and having the accordance indicator based on the combined fuzzy membership scores determined for the at least one returned document and the selected concepts.
14. The apparatus of claim 12 further comprising displaying at least two accordance indicators with the at least one returned document, each of the at least two accordance indicators corresponding to one of the concepts in the concept set.
15. The apparatus of claim 10 wherein the accordance indicators consist of a color shade.
16. The apparatus of claim 12 wherein the returned documents of the search results are ordered in a list and wherein the at least one returned document is displayed in a first portion of the list of returned documents and simultaneously displayed in a second portion of the list of returned documents on a display screen, and wherein the number of returned documents displayed in the second portion is greater than the number of returned documents displayed in the first portion.
17. The apparatus of claim 16 wherein the returned documents are ordered in the list based on the similarity between the at least one returned document and the at least one concept.
18. The apparatus of claim 12 wherein the returned document is a document surrogate.
19. A computer readable memory having recorded thereon statements and instructions for execution by a computer to carry out the method of claim 1.
20. A method of visually sorting search results based on at least one concept, the method comprising:
using a search query containing a plurality of search terms to conduct a search of a plurality of computer readable documents and obtain the search results, the search results comprising a first returned document and a second returned document;
obtaining the at least one concept by matching the plurality of search terms to the at least one concept;
determining a first accordance value by evaluating the similarity between the first returned document and the at least one concept and a second accordance value by evaluating the similarity between the second returned document and the at least one concept; and
displaying the first returned document and second returned document sorted in an order based on the first accordance value and the second accordance value.
21. The method of claim 20 wherein the at least one concept is matched to the plurality of search terms by:
accessing a concept knowledge base data structure having a plurality of concept data objects and a plurality of term data objects, each term data object defining a term and associated with at least one of the concept data objects and each concept data object defining an object;
generating a first term set containing term data objects from the concept knowledge base data structure wherein each term data object in the first term set matches at least one of the plurality of search terms; and
generating a concept set containing concept data objects from the concept knowledge base data structure wherein each concept data object in the concept set is associated with one or more of the term data objects in the first term set.
22. The method of claim 21 wherein a concept vector is generated for each concept in the concept set, a first document vector is generated for the first returned document, and a second document vector for the second returned document, and wherein the first accordance indicator is based on a fuzzy membership score determined by comparing the concept vector of the at least one concept to the first document vector and the second accordance indicator is based on a fuzzy membership score determined by comparing the concept vector of the at least one concept to the second document vector.
23. The method of claim 22 further comprising displaying the first returned document with a first accordance indicator based on the first accordance value and the second returned document with a second accordance indicator based on the second accordance value.
24. The method of claim 23 further comprising displaying at least two accordance indicators with the first returned document, each of the at least two accordance indicators corresponding to one of the concepts in the concept set.
25. The method of claim 24 wherein the accordance indicators consist of a color shade.
26. The method of claim 22 wherein the search results comprise a list of returned documents containing the first returned document and second returned document and wherein the first returned document and second returned document are displayed in a first portion of the list of returned documents and simultaneously displayed in a second portion of the list of returned documents, and wherein the number of returned documents displayed in the second portion is greater than the number of returned documents displayed in the first portion.
27. The method of claim 20 wherein the first returned document and second returned document are document surrogates.
28. A data processing system for visually sorting search results based on at least one concept, the data processing system comprising:
at least one processor;
a memory operatively coupled to the at least one processor;
a display device operative to display data; and
a program module stored in the memory and operative for providing instructions to the at least one processor, the at least one processor responsive to the instructions of the program module, the program module operative for:
using a search query containing a plurality of search terms to conduct a search of a plurality of computer readable documents and obtain the search results, the search results comprising a first returned document and a second returned document;
obtaining the at least one concept by matching the plurality of search terms to the at least one concept;
determining a first accordance value by evaluating the similarity between the first returned document and the at least one concept and a second accordance value by evaluating the similarity between the second returned document and the at least one concept; and
displaying the first returned document and second returned document sorted in an order based on the first accordance value and the second accordance value.
29. The apparatus of claim 28 wherein the at least one concept is matched to the plurality of search terms by:
accessing a concept knowledge base data structure having a plurality of concept data objects and a plurality of term data objects, each term data object defining a term and associated with at least one of the concept data objects and each concept data object defining an object;
generating a first term set containing term data objects from the concept knowledge base data structure wherein each term data object in the first term set matches at least one of the plurality of search terms; and
generating a concept set containing concept data objects from the concept knowledge base data structure wherein each concept data object in the concept set is associated with one or more of the term data objects in the first term set.
30. The apparatus of claim 29 wherein a concept vector is generated for each concept in the concept set, a first document vector is generated for the first returned document, and a second document vector for the second returned document, and wherein the first accordance indicator is based on a fuzzy membership score determined by comparing the concept vector of the at least one concept to the first document vector and the second accordance indicator is based on a fuzzy membership score determined by comparing the concept vector of the at least one concept to the second document vector.
31. The apparatus of claim 30 further comprising displaying the first returned document with a first accordance indicator based on the first accordance value and the second returned document with a second accordance indicator based on the second accordance value.
32. The method of claim 31 further comprising displaying at least two accordance indicators with the first returned document, each of the at least two accordance indicators corresponding to one of the concepts in the concept set.
33. The method of claim 31 wherein the first accordance indicator and second accordance indicator consist of a color shade.
34. The method of claim 30 wherein the search results comprise a list of returned documents containing the first returned document and second returned document and wherein the first returned document and second returned document are displayed in a first portion of the list of returned documents and simultaneously displayed in a second portion of the list of returned documents, and wherein the number of returned documents displayed in the second portion is greater than the number of returned documents displayed in the first portion.
35. The method of claim 28 wherein the first returned document and second returned document are document surrogates.
36. A computer readable memory having recorded thereon statements and instructions for execution by a computer to carry out the method of claim 20.
US11/526,409 2006-06-06 2006-09-25 Method and apparatus for concept-based visual presentation of search results Expired - Fee Related US7809717B1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CA2,549,536 2006-06-06
CA2549536A CA2549536C (en) 2006-06-06 2006-06-06 Method and apparatus for construction and use of concept knowledge base
CA2549536 2006-06-06

Publications (2)

Publication Number Publication Date
US20070282809A1 (en) 2007-12-06
US7809717B1 US7809717B1 (en) 2010-10-05

Family

ID=38791565

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/526,409 Expired - Fee Related US7809717B1 (en) 2006-06-06 2006-09-25 Method and apparatus for concept-based visual presentation of search results

Country Status (2)

Country Link
US (1) US7809717B1 (en)
CA (1) CA2549536C (en)

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080243828A1 (en) * 2007-03-29 2008-10-02 Reztlaff James R Search and Indexing on a User Device
US20080295039A1 (en) * 2007-05-21 2008-11-27 Laurent An Minh Nguyen Animations
US20090276420A1 (en) * 2008-05-04 2009-11-05 Gang Qiu Method and system for extending content
US20090326872A1 (en) * 2008-06-27 2009-12-31 Microsoft Corporation Analytical Map Models
US20100131254A1 (en) * 2008-11-26 2010-05-27 Microsoft Corporation Use of taxonomized analytics reference model
US20100131248A1 (en) * 2008-11-26 2010-05-27 Microsoft Corporation Reference model for data-driven analytics
US20100131255A1 (en) * 2008-11-26 2010-05-27 Microsoft Corporation Hybrid solver for data-driven analytics
US20100131546A1 (en) * 2008-11-26 2010-05-27 Microsoft Way Search and exploration using analytics reference model
US20100321407A1 (en) * 2009-06-19 2010-12-23 Microsoft Corporation Data-driven model implemented with spreadsheets
US7865817B2 (en) 2006-12-29 2011-01-04 Amazon Technologies, Inc. Invariant referencing in digital works
US20110040747A1 (en) * 2009-08-12 2011-02-17 Vladimir Brad Reference file for formatted views
US8117145B2 (en) 2008-06-27 2012-02-14 Microsoft Corporation Analytical model solver framework
US8131647B2 (en) 2005-01-19 2012-03-06 Amazon Technologies, Inc. Method and system for providing annotations of a digital work
WO2012094592A1 (en) * 2011-01-07 2012-07-12 Rengaswamy Mohan Concepts and link discovery system
CN102725758A (en) * 2010-02-05 2012-10-10 微软公司 Generating and presenting lateral concepts
US8314793B2 (en) 2008-12-24 2012-11-20 Microsoft Corporation Implied analytical reasoning and computation
US8352397B2 (en) 2009-09-10 2013-01-08 Microsoft Corporation Dependency graph in data-driven model
US8352449B1 (en) 2006-03-29 2013-01-08 Amazon Technologies, Inc. Reader device content indexing
US8378979B2 (en) 2009-01-27 2013-02-19 Amazon Technologies, Inc. Electronic device with haptic feedback
US8411085B2 (en) 2008-06-27 2013-04-02 Microsoft Corporation Constructing view compositions for domain-specific environments
US8417772B2 (en) 2007-02-12 2013-04-09 Amazon Technologies, Inc. Method and system for transferring content from the web to mobile devices
US8423889B1 (en) 2008-06-05 2013-04-16 Amazon Technologies, Inc. Device specific presentation control for electronic book reader devices
US8493406B2 (en) 2009-06-19 2013-07-23 Microsoft Corporation Creating new charts and data visualizations
US8531451B2 (en) 2009-06-19 2013-09-10 Microsoft Corporation Data-driven visualization transformation
US8571535B1 (en) 2007-02-12 2013-10-29 Amazon Technologies, Inc. Method and system for a hosted mobile management service architecture
US8620635B2 (en) 2008-06-27 2013-12-31 Microsoft Corporation Composition of analytics models
US8692826B2 (en) 2009-06-19 2014-04-08 Brian C. Beckman Solver-based visualization framework
US8725565B1 (en) 2006-09-29 2014-05-13 Amazon Technologies, Inc. Expedited acquisition of a digital item following a sample presentation of the item
US8788574B2 (en) 2009-06-19 2014-07-22 Microsoft Corporation Data-driven visualization of pseudo-infinite scenes
US8793575B1 (en) 2007-03-29 2014-07-29 Amazon Technologies, Inc. Progress indication for a digital work
US8832584B1 (en) 2009-03-31 2014-09-09 Amazon Technologies, Inc. Questions on highlighted passages
US8866818B2 (en) 2009-06-19 2014-10-21 Microsoft Corporation Composing shapes and data series in geometries
US20140379364A1 (en) * 2013-06-20 2014-12-25 Koninklijke Philips N.V. Intelligent computer-guided structured reporting for efficiency and clinical decision support
US9087032B1 (en) 2009-01-26 2015-07-21 Amazon Technologies, Inc. Aggregation of highlights
US9104750B1 (en) * 2012-05-22 2015-08-11 Google Inc. Using concepts as contexts for query term substitutions
US9158741B1 (en) 2011-10-28 2015-10-13 Amazon Technologies, Inc. Indicators for navigating digital works
US9245243B2 (en) 2009-04-14 2016-01-26 Ureveal, Inc. Concept-based analysis of structured and unstructured data using concept inheritance
US9275052B2 (en) 2005-01-19 2016-03-01 Amazon Technologies, Inc. Providing annotations of a digital work
US9330503B2 (en) 2009-06-19 2016-05-03 Microsoft Technology Licensing, Llc Presaging and surfacing interactivity within data visualizations
US9495322B1 (en) 2010-09-21 2016-11-15 Amazon Technologies, Inc. Cover display
US9564089B2 (en) 2009-09-28 2017-02-07 Amazon Technologies, Inc. Last screen rendering for electronic book reader
US20170060939A1 (en) * 2015-08-25 2017-03-02 Schlafender Hase GmbH Software & Communications Method for comparing text files with differently arranged text sections in documents
US9672533B1 (en) 2006-09-29 2017-06-06 Amazon Technologies, Inc. Acquisition of an item based on a catalog presentation of items
US20180196855A1 (en) * 2015-09-25 2018-07-12 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus and system for displaying search results, apparatus and computer storage medium
US10628504B2 (en) 2010-07-30 2020-04-21 Microsoft Technology Licensing, Llc System of providing suggestions based on accessible and contextual information

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080114750A1 (en) * 2006-11-14 2008-05-15 Microsoft Corporation Retrieval and ranking of items utilizing similarity
US8983989B2 (en) 2010-02-05 2015-03-17 Microsoft Technology Licensing, Llc Contextual queries
US8150859B2 (en) * 2010-02-05 2012-04-03 Microsoft Corporation Semantic table of contents for search results
US8260664B2 (en) * 2010-02-05 2012-09-04 Microsoft Corporation Semantic advertising selection from lateral concepts and topics
US8434001B2 (en) 2010-06-03 2013-04-30 Rhonda Enterprises, Llc Systems and methods for presenting a content summary of a media item to a user based on a position within the media item
US9326116B2 (en) 2010-08-24 2016-04-26 Rhonda Enterprises, Llc Systems and methods for suggesting a pause position within electronic text
US9069754B2 (en) 2010-09-29 2015-06-30 Rhonda Enterprises, Llc Method, system, and computer readable medium for detecting related subgroups of text in an electronic document
US8990246B2 (en) * 2011-12-15 2015-03-24 Yahoo! Inc. Understanding and addressing complex information needs
US9092740B2 (en) * 2012-11-08 2015-07-28 International Business Machines Corporation Concept noise reduction in deep question answering systems

Citations (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5634051A (en) * 1993-10-28 1997-05-27 Teltech Resource Network Corporation Information management system
US5768678A (en) * 1996-05-08 1998-06-16 Pyron Corporation Manganese sulfide composition and its method of production
US6038560A (en) * 1997-05-21 2000-03-14 Oracle Corporation Concept knowledge base search and retrieval system
US20020023077A1 (en) * 2000-06-09 2002-02-21 Nguyen Thanh Ngoc Method and apparatus for data collection and knowledge management
US20020042793A1 (en) * 2000-08-23 2002-04-11 Jun-Hyeog Choi Method of order-ranking document clusters using entropy data and bayesian self-organizing feature maps
US20020052894A1 (en) * 2000-08-18 2002-05-02 Francois Bourdoncle Searching tool and process for unified search using categories and keywords
US20020107840A1 (en) * 2000-09-12 2002-08-08 Rishe Naphtali David Database querying system and method
US6523026B1 (en) * 1999-02-08 2003-02-18 Huntsman International Llc Method for retrieving semantically distant analogies
US20030177111A1 (en) * 1999-11-16 2003-09-18 Searchcraft Corporation Method for searching from a plurality of data sources
US6675159B1 (en) * 2000-07-27 2004-01-06 Science Applic Int Corp Concept-based search and retrieval system
US6711585B1 (en) * 1999-06-15 2004-03-23 Kanisa Inc. System and method for implementing a knowledge management system
US20040122656A1 (en) * 2001-03-16 2004-06-24 Eli Abir Knowledge system method and appparatus
US20040220944A1 (en) * 2003-05-01 2004-11-04 Behrens Clifford A Information retrieval and text mining using distributed latent semantic indexing
US20040267774A1 (en) * 2003-06-30 2004-12-30 Ibm Corporation Multi-modal fusion in content-based retrieval
US6847966B1 (en) * 2002-04-24 2005-01-25 Engenium Corporation Method and system for optimally searching a document database using a representative semantic space
US20050165766A1 (en) * 2000-02-01 2005-07-28 Andrew Szabo Computer graphic display visualization system and method
US20060047663A1 (en) * 2004-09-02 2006-03-02 Rail Peter D System and method for guiding navigation through a hypertext system
US20060074859A1 (en) * 2003-05-28 2006-04-06 Bomi Patel-Framroze Of Row2 Technologies Inc. System, apparatus, and method for user tunable and selectable searching of a database using a weighted quantized feature vector
US20060167930A1 (en) * 2004-10-08 2006-07-27 George Witwer Self-organized concept search and data storage method
US20060184521A1 (en) * 1999-07-30 2006-08-17 Ponte Jay M Compressed document surrogates
US7113960B2 (en) * 2002-08-22 2006-09-26 International Business Machines Corporation Search on and search for functions in applications with varying data types
US20060259481A1 (en) * 2005-05-12 2006-11-16 Xerox Corporation Method of analyzing documents
US20060265388A1 (en) * 2005-05-20 2006-11-23 Woelfel Joseph K Information retrieval system and method for distinguishing misrecognized queries and unavailable documents
US7181438B1 (en) * 1999-07-21 2007-02-20 Alberti Anemometer, Llc Database access system
US20070073533A1 (en) * 2005-09-23 2007-03-29 Fuji Xerox Co., Ltd. Systems and methods for structural indexing of natural language text
US20070073748A1 (en) * 2005-09-27 2007-03-29 Barney Jonathan A Method and system for probabilistically quantifying and visualizing relevance between two or more citationally or contextually related data objects
US20070078848A1 (en) * 2005-10-04 2007-04-05 Microsoft Corporation Indexing and caching strategy for local queries
US7249117B2 (en) * 2002-05-22 2007-07-24 Estes Timothy W Knowledge discovery agent system and method
US20070203693A1 (en) * 2002-05-22 2007-08-30 Estes Timothy W Knowledge Discovery Agent System and Method
US20070214154A1 (en) * 2004-06-25 2007-09-13 Gery Ducatel Data Storage And Retrieval
US20070214131A1 (en) * 2006-03-13 2007-09-13 Microsoft Corporation Re-ranking search results based on query log
US20070244859A1 (en) * 2006-04-13 2007-10-18 American Chemical Society Method and system for displaying relationship between structured data and unstructured data
US7296021B2 (en) * 2004-05-21 2007-11-13 International Business Machines Corporation Method, system, and article to specify compound query, displaying visual indication includes a series of graphical bars specify weight relevance, ordered segments of unique colors where each segment length indicative of the extent of match of each object with one of search parameters
US20080140616A1 (en) * 2005-09-21 2008-06-12 Nicolas Encina Document processing

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5600831A (en) 1994-02-28 1997-02-04 Lucent Technologies Inc. Apparatus and methods for retrieving information by modifying query plan based on description of information sources
US6070176A (en) 1997-01-30 2000-05-30 Intel Corporation Method and apparatus for graphically representing portions of the world wide web
GB9811874D0 (en) 1998-06-02 1998-07-29 Univ Brunel Information management system
US6742003B2 (en) 2001-04-30 2004-05-25 Microsoft Corporation Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications
US6363377B1 (en) 1998-07-30 2002-03-26 Sarnoff Corporation Search data processor
US20020073079A1 (en) 2000-04-04 2002-06-13 Merijn Terheggen Method and apparatus for searching a database and providing relevance feedback
AUPQ673100A0 (en) 2000-04-06 2000-05-04 Limberis, Jim System and method for creating and searching web sites
US6895406B2 (en) 2000-08-25 2005-05-17 Seaseer R&D, Llc Dynamic personalization method of creating personalized user profiles for searching a database of information
AU2002317119A1 (en) 2001-07-06 2003-01-21 Angoss Software Corporation A method and system for the visual presentation of data mining models

Patent Citations (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5634051A (en) * 1993-10-28 1997-05-27 Teltech Resource Network Corporation Information management system
US5768678A (en) * 1996-05-08 1998-06-16 Pyron Corporation Manganese sulfide composition and its method of production
US6038560A (en) * 1997-05-21 2000-03-14 Oracle Corporation Concept knowledge base search and retrieval system
US6523026B1 (en) * 1999-02-08 2003-02-18 Huntsman International Llc Method for retrieving semantically distant analogies
US7401087B2 (en) * 1999-06-15 2008-07-15 Consona Crm, Inc. System and method for implementing a knowledge management system
US6711585B1 (en) * 1999-06-15 2004-03-23 Kanisa Inc. System and method for implementing a knowledge management system
US7181438B1 (en) * 1999-07-21 2007-02-20 Alberti Anemometer, Llc Database access system
US20060184521A1 (en) * 1999-07-30 2006-08-17 Ponte Jay M Compressed document surrogates
US20030177111A1 (en) * 1999-11-16 2003-09-18 Searchcraft Corporation Method for searching from a plurality of data sources
US20060288023A1 (en) * 2000-02-01 2006-12-21 Alberti Anemometer Llc Computer graphic display visualization system and method
US20050165766A1 (en) * 2000-02-01 2005-07-28 Andrew Szabo Computer graphic display visualization system and method
US20020023077A1 (en) * 2000-06-09 2002-02-21 Nguyen Thanh Ngoc Method and apparatus for data collection and knowledge management
US6721729B2 (en) * 2000-06-09 2004-04-13 Thanh Ngoc Nguyen Method and apparatus for electronic file search and collection
US6675159B1 (en) * 2000-07-27 2004-01-06 Science Applic Int Corp Concept-based search and retrieval system
US20020052894A1 (en) * 2000-08-18 2002-05-02 Francois Bourdoncle Searching tool and process for unified search using categories and keywords
US20020042793A1 (en) * 2000-08-23 2002-04-11 Jun-Hyeog Choi Method of order-ranking document clusters using entropy data and bayesian self-organizing feature maps
US20020107840A1 (en) * 2000-09-12 2002-08-08 Rishe Naphtali David Database querying system and method
US20040122656A1 (en) * 2001-03-16 2004-06-24 Eli Abir Knowledge system method and appparatus
US6847966B1 (en) * 2002-04-24 2005-01-25 Engenium Corporation Method and system for optimally searching a document database using a representative semantic space
US20070203693A1 (en) * 2002-05-22 2007-08-30 Estes Timothy W Knowledge Discovery Agent System and Method
US7249117B2 (en) * 2002-05-22 2007-07-24 Estes Timothy W Knowledge discovery agent system and method
US7113960B2 (en) * 2002-08-22 2006-09-26 International Business Machines Corporation Search on and search for functions in applications with varying data types
US20040220944A1 (en) * 2003-05-01 2004-11-04 Behrens Clifford A Information retrieval and text mining using distributed latent semantic indexing
US20060074859A1 (en) * 2003-05-28 2006-04-06 Bomi Patel-Framroze Of Row2 Technologies Inc. System, apparatus, and method for user tunable and selectable searching of a database using a weighted quantized feature vector
US20040267774A1 (en) * 2003-06-30 2004-12-30 Ibm Corporation Multi-modal fusion in content-based retrieval
US7296021B2 (en) * 2004-05-21 2007-11-13 International Business Machines Corporation Method, system, and article to specify compound query, displaying visual indication includes a series of graphical bars specify weight relevance, ordered segments of unique colors where each segment length indicative of the extent of match of each object with one of search parameters
US20070214154A1 (en) * 2004-06-25 2007-09-13 Gery Ducatel Data Storage And Retrieval
US20060047663A1 (en) * 2004-09-02 2006-03-02 Rail Peter D System and method for guiding navigation through a hypertext system
US20060167930A1 (en) * 2004-10-08 2006-07-27 George Witwer Self-organized concept search and data storage method
US20060259481A1 (en) * 2005-05-12 2006-11-16 Xerox Corporation Method of analyzing documents
US20060265388A1 (en) * 2005-05-20 2006-11-23 Woelfel Joseph K Information retrieval system and method for distinguishing misrecognized queries and unavailable documents
US20080140616A1 (en) * 2005-09-21 2008-06-12 Nicolas Encina Document processing
US20070073533A1 (en) * 2005-09-23 2007-03-29 Fuji Xerox Co., Ltd. Systems and methods for structural indexing of natural language text
US20070073748A1 (en) * 2005-09-27 2007-03-29 Barney Jonathan A Method and system for probabilistically quantifying and visualizing relevance between two or more citationally or contextually related data objects
US20070078848A1 (en) * 2005-10-04 2007-04-05 Microsoft Corporation Indexing and caching strategy for local queries
US20070214131A1 (en) * 2006-03-13 2007-09-13 Microsoft Corporation Re-ranking search results based on query log
US20070244859A1 (en) * 2006-04-13 2007-10-18 American Chemical Society Method and system for displaying relationship between structured data and unstructured data

Cited By (78)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8131647B2 (en) 2005-01-19 2012-03-06 Amazon Technologies, Inc. Method and system for providing annotations of a digital work
US9275052B2 (en) 2005-01-19 2016-03-01 Amazon Technologies, Inc. Providing annotations of a digital work
US10853560B2 (en) 2005-01-19 2020-12-01 Amazon Technologies, Inc. Providing annotations of a digital work
US8352449B1 (en) 2006-03-29 2013-01-08 Amazon Technologies, Inc. Reader device content indexing
US9292873B1 (en) 2006-09-29 2016-03-22 Amazon Technologies, Inc. Expedited acquisition of a digital item following a sample presentation of the item
US9672533B1 (en) 2006-09-29 2017-06-06 Amazon Technologies, Inc. Acquisition of an item based on a catalog presentation of items
US8725565B1 (en) 2006-09-29 2014-05-13 Amazon Technologies, Inc. Expedited acquisition of a digital item following a sample presentation of the item
US7865817B2 (en) 2006-12-29 2011-01-04 Amazon Technologies, Inc. Invariant referencing in digital works
US9116657B1 (en) 2006-12-29 2015-08-25 Amazon Technologies, Inc. Invariant referencing in digital works
US9313296B1 (en) 2007-02-12 2016-04-12 Amazon Technologies, Inc. Method and system for a hosted mobile management service architecture
US8571535B1 (en) 2007-02-12 2013-10-29 Amazon Technologies, Inc. Method and system for a hosted mobile management service architecture
US8417772B2 (en) 2007-02-12 2013-04-09 Amazon Technologies, Inc. Method and system for transferring content from the web to mobile devices
US9219797B2 (en) 2007-02-12 2015-12-22 Amazon Technologies, Inc. Method and system for a hosted mobile management service architecture
US8793575B1 (en) 2007-03-29 2014-07-29 Amazon Technologies, Inc. Progress indication for a digital work
US9665529B1 (en) 2007-03-29 2017-05-30 Amazon Technologies, Inc. Relative progress and event indicators
US8954444B1 (en) 2007-03-29 2015-02-10 Amazon Technologies, Inc. Search and indexing on a user device
US7716224B2 (en) 2007-03-29 2010-05-11 Amazon Technologies, Inc. Search and indexing on a user device
US20080243828A1 (en) * 2007-03-29 2008-10-02 Reztlaff James R Search and Indexing on a User Device
US7853900B2 (en) 2007-05-21 2010-12-14 Amazon Technologies, Inc. Animations
US9568984B1 (en) 2007-05-21 2017-02-14 Amazon Technologies, Inc. Administrative tasks in a media consumption system
US8990215B1 (en) 2007-05-21 2015-03-24 Amazon Technologies, Inc. Obtaining and verifying search indices
US8234282B2 (en) 2007-05-21 2012-07-31 Amazon Technologies, Inc. Managing status of search index generation
US8965807B1 (en) 2007-05-21 2015-02-24 Amazon Technologies, Inc. Selecting and providing items in a media consumption system
US7921309B1 (en) 2007-05-21 2011-04-05 Amazon Technologies Systems and methods for determining and managing the power remaining in a handheld electronic device
US8266173B1 (en) * 2007-05-21 2012-09-11 Amazon Technologies, Inc. Search results generation and sorting
US9479591B1 (en) 2007-05-21 2016-10-25 Amazon Technologies, Inc. Providing user-supplied items to a user device
US20080295039A1 (en) * 2007-05-21 2008-11-27 Laurent An Minh Nguyen Animations
US9178744B1 (en) 2007-05-21 2015-11-03 Amazon Technologies, Inc. Delivery of items for consumption by a user device
US8341513B1 (en) 2007-05-21 2012-12-25 Amazon.Com Inc. Incremental updates of items
US8341210B1 (en) 2007-05-21 2012-12-25 Amazon Technologies, Inc. Delivery of items for consumption by a user device
US8700005B1 (en) 2007-05-21 2014-04-15 Amazon Technologies, Inc. Notification of a user device to perform an action
US9888005B1 (en) 2007-05-21 2018-02-06 Amazon Technologies, Inc. Delivery of items for consumption by a user device
US8656040B1 (en) 2007-05-21 2014-02-18 Amazon Technologies, Inc. Providing user-supplied items to a user device
US20090276420A1 (en) * 2008-05-04 2009-11-05 Gang Qiu Method and system for extending content
US8296302B2 (en) * 2008-05-04 2012-10-23 Gang Qiu Method and system for extending content
US8423889B1 (en) 2008-06-05 2013-04-16 Amazon Technologies, Inc. Device specific presentation control for electronic book reader devices
US8620635B2 (en) 2008-06-27 2013-12-31 Microsoft Corporation Composition of analytics models
US8117145B2 (en) 2008-06-27 2012-02-14 Microsoft Corporation Analytical model solver framework
US20090326872A1 (en) * 2008-06-27 2009-12-31 Microsoft Corporation Analytical Map Models
US8411085B2 (en) 2008-06-27 2013-04-02 Microsoft Corporation Constructing view compositions for domain-specific environments
US8255192B2 (en) 2008-06-27 2012-08-28 Microsoft Corporation Analytical map models
US8103608B2 (en) 2008-11-26 2012-01-24 Microsoft Corporation Reference model for data-driven analytics
US20100131248A1 (en) * 2008-11-26 2010-05-27 Microsoft Corporation Reference model for data-driven analytics
US20100131255A1 (en) * 2008-11-26 2010-05-27 Microsoft Corporation Hybrid solver for data-driven analytics
US20100131546A1 (en) * 2008-11-26 2010-05-27 Microsoft Corporation Search and exploration using analytics reference model
US20100131254A1 (en) * 2008-11-26 2010-05-27 Microsoft Corporation Use of taxonomized analytics reference model
US8145615B2 (en) 2008-11-26 2012-03-27 Microsoft Corporation Search and exploration using analytics reference model
US8155931B2 (en) 2008-11-26 2012-04-10 Microsoft Corporation Use of taxonomized analytics reference model
US8190406B2 (en) 2008-11-26 2012-05-29 Microsoft Corporation Hybrid solver for data-driven analytics
US8314793B2 (en) 2008-12-24 2012-11-20 Microsoft Corporation Implied analytical reasoning and computation
US9087032B1 (en) 2009-01-26 2015-07-21 Amazon Technologies, Inc. Aggregation of highlights
US8378979B2 (en) 2009-01-27 2013-02-19 Amazon Technologies, Inc. Electronic device with haptic feedback
US8832584B1 (en) 2009-03-31 2014-09-09 Amazon Technologies, Inc. Questions on highlighted passages
US9245243B2 (en) 2009-04-14 2016-01-26 Ureveal, Inc. Concept-based analysis of structured and unstructured data using concept inheritance
US20100321407A1 (en) * 2009-06-19 2010-12-23 Microsoft Corporation Data-driven model implemented with spreadsheets
US9330503B2 (en) 2009-06-19 2016-05-03 Microsoft Technology Licensing, Llc Presaging and surfacing interactivity within data visualizations
US9342904B2 (en) 2009-06-19 2016-05-17 Microsoft Technology Licensing, Llc Composing shapes and data series in geometries
US8493406B2 (en) 2009-06-19 2013-07-23 Microsoft Corporation Creating new charts and data visualizations
US8866818B2 (en) 2009-06-19 2014-10-21 Microsoft Corporation Composing shapes and data series in geometries
US8692826B2 (en) 2009-06-19 2014-04-08 Brian C. Beckman Solver-based visualization framework
US8531451B2 (en) 2009-06-19 2013-09-10 Microsoft Corporation Data-driven visualization transformation
US8259134B2 (en) 2009-06-19 2012-09-04 Microsoft Corporation Data-driven model implemented with spreadsheets
US8788574B2 (en) 2009-06-19 2014-07-22 Microsoft Corporation Data-driven visualization of pseudo-infinite scenes
US8700646B2 (en) * 2009-08-12 2014-04-15 Apple Inc. Reference file for formatted views
US20110040747A1 (en) * 2009-08-12 2011-02-17 Vladimir Brad Reference file for formatted views
US8352397B2 (en) 2009-09-10 2013-01-08 Microsoft Corporation Dependency graph in data-driven model
US9564089B2 (en) 2009-09-28 2017-02-07 Amazon Technologies, Inc. Last screen rendering for electronic book reader
CN102725758A (en) * 2010-02-05 2012-10-10 微软公司 Generating and presenting lateral concepts
US10628504B2 (en) 2010-07-30 2020-04-21 Microsoft Technology Licensing, Llc System of providing suggestions based on accessible and contextual information
US9495322B1 (en) 2010-09-21 2016-11-15 Amazon Technologies, Inc. Cover display
WO2012094592A1 (en) * 2011-01-07 2012-07-12 Rengaswamy Mohan Concepts and link discovery system
US9158741B1 (en) 2011-10-28 2015-10-13 Amazon Technologies, Inc. Indicators for navigating digital works
US9104750B1 (en) * 2012-05-22 2015-08-11 Google Inc. Using concepts as contexts for query term substitutions
US20140379364A1 (en) * 2013-06-20 2014-12-25 Koninklijke Philips N.V. Intelligent computer-guided structured reporting for efficiency and clinical decision support
US20170060939A1 (en) * 2015-08-25 2017-03-02 Schlafender Hase GmbH Software & Communications Method for comparing text files with differently arranged text sections in documents
US10474672B2 (en) * 2015-08-25 2019-11-12 Schlafender Hase GmbH Software & Communications Method for comparing text files with differently arranged text sections in documents
US10949439B2 (en) * 2015-09-25 2021-03-16 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus and system for displaying search results, apparatus and computer storage medium
US20180196855A1 (en) * 2015-09-25 2018-07-12 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus and system for displaying search results, apparatus and computer storage medium

Also Published As

Publication number Publication date
CA2549536A1 (en) 2007-12-06
CA2549536C (en) 2012-12-04
US7809717B1 (en) 2010-10-05

Similar Documents

Publication Publication Date Title
US7809717B1 (en) Method and apparatus for concept-based visual presentation of search results
US7788261B2 (en) Interactive web information retrieval using graphical word indicators
US10650058B2 (en) Information retrieval systems with database-selection aids
US7752243B2 (en) Method and apparatus for construction and use of concept knowledge base
US7752557B2 (en) Method and apparatus of visual representations of search results
US10997678B2 (en) Systems and methods for image searching of patent-related documents
EP1435581B1 (en) Retrieval of structured documents
JP5391633B2 (en) Term recommendation to define the ontology space
JP4944406B2 (en) How to generate document descriptions based on phrases
JP4976666B2 (en) Phrase identification method in information retrieval system
JP4944405B2 (en) Phrase-based indexing method in information retrieval system
US10354308B2 (en) Distinguishing accessories from products for ranking search results
US20090119281A1 (en) Granular knowledge based search engine
US20070112867A1 (en) Methods and apparatus for rank-based response set clustering
JP2007527558A (en) Navigation by websites and other information sources
CN111506727B (en) Text content category acquisition method, apparatus, computer device and storage medium
US11232137B2 (en) Methods for evaluating term support in patent-related documents
CN112184021A (en) Answer quality evaluation method based on similar support set
CA2560159C (en) Method and apparatus for concept-based visual presentation of search results
Irshad et al. SwCS: Section-Wise Content Similarity Approach to Exploit Scientific Big Data.
Vinay et al. Evaluating relevance feedback algorithms for searching on small displays
Purnama et al. Search system for translation of Al-Qur'an verses in Indonesian using BM25 and semantic query expansion
CN109213830A (en) The document retrieval system of professional technical documentation
Ramya et al. Automatic extraction of facets for user queries [AEFUQ]
Sharma Hybrid Query Expansion assisted Adaptive Visual Interface for Exploratory Information Retrieval

Legal Events

Date Code Title Description
AS Assignment

Owner name: UNIVERSITY OF REGINA, CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HOEBER, ORLAND;YANG, XUE-DONG;YAO, YIYU;REEL/FRAME:019755/0150

Effective date: 20060918

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

REMI Maintenance fee reminder mailed

FPAY Fee payment

Year of fee payment: 4

SULP Surcharge for late payment

AS Assignment

Owner name: YANG, XUE-DONG, CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:UNIVERSITY OF REGINA;REEL/FRAME:034046/0358

Effective date: 20141015

Owner name: BLACKBIRD TECH LLC, MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HOEBER, ORLAND;YANG, XUE-DONG;YAO, YIYU;REEL/FRAME:034046/0340

Effective date: 20141024

Owner name: YAO, YIYU, CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:UNIVERSITY OF REGINA;REEL/FRAME:034046/0358

Effective date: 20141015

Owner name: HOEBER, ORLAND, CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:UNIVERSITY OF REGINA;REEL/FRAME:034046/0358

Effective date: 20141015

AS Assignment

Owner name: SECURITY FINANCE LLC, DELAWARE

Free format text: SECURITY INTEREST;ASSIGNOR:BLACKBIRD TECH LLC;REEL/FRAME:037004/0544

Effective date: 20151106

IPR AIA trial proceeding filed before the Patent Trial and Appeal Board: inter partes review

Free format text: TRIAL NO: IPR2017-00899

Opponent name: KCURA LLC

Effective date: 20170214

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.)

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20181005