WO2006073810A2 - Associating features with entities, such as categories or web page documents, and/or weighting such features - Google Patents

Associating features with entities, such as categories or web page documents, and/or weighting such features

Info

Publication number
WO2006073810A2
Authority
WO
WIPO (PCT)
Prior art keywords
document
computer
implemented method
information
features
Application number
PCT/US2005/046194
Other languages
French (fr)
Other versions
WO2006073810A3 (en)
Inventor
Ross Koningstein
Stephen Lawrence
Valentin Spitkovsky
Original Assignee
Google, Inc.
Application filed by Google, Inc.
Priority to EP05854841A (EP1839203A4)
Priority to AU2005323159A (AU2005323159B2)
Priority to CA2592741A (CA2592741C)
Publication of WO2006073810A2
Publication of WO2006073810A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G06Q30/0256 User search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0277 Online advertisement

Definitions

  • the present invention concerns advertising.
  • the present invention concerns improving targeted advertising.
  • Interactive advertising provides opportunities for advertisers to target their ads to a receptive audience. That is, targeted ads are more likely to be useful to end users since the ads may be relevant to a need inferred from some user activity (e.g., relevant to a user's search query to a search engine, relevant to content in a document requested by the user, etc.).
  • Query keyword relevant advertising has been used by search engines.
  • the AdWords advertising system by Google of Mountain View, CA is one example of query keyword relevant advertising.
  • content-relevant advertising systems, such as the AdSense advertising system by Google for example, have been used. For example, U.S. Patent Application Serial Numbers 10/314,427 (incorporated herein by reference and referred to as "the '427 application") titled "METHODS AND APPARATUS FOR SERVING RELEVANT ADVERTISEMENTS", filed on December 6, 2002 and listing Jeffrey A. Dean, Georges R. Harik and Paul Bucheit as inventors, and 10/375,900 (incorporated by reference and referred to as "the '900 application") titled "SERVING ADVERTISEMENTS BASED ON CONTENT," filed on February 26, 2003 and listing Darrell Anderson, Paul Bucheit, Alex Carobus, Marie Cui, Jeffrey A. Dean, Georges R. Harik, Deepak Jindal and Narayanan Shivakumar as inventors, describe methods and apparatus for serving ads relevant to the content of a document, such as a Web page for example.
  • relevance information about the document is needed.
  • Such relevance information may be determined from information intrinsic to the document, such as content extracted from the document. For example, concepts or topics may be determined using the content of the document.
  • the document may also be assigned to one or more clusters.
  • feature vectors may be used to represent the occurrence of words and/or phrases in the document.
  • Embodiments consistent with the present invention may be used to determine features that may be used to represent relevance information (e.g., properties, characteristics, etc.) of an entity, such as a document or category for example. Such features may be determined and associated with the entity by accepting an identifier that identifies the entity, obtaining search query information related to the entity using the entity identifier, determining features using the obtained query information, and associating the features determined with the entity. In at least some embodiments consistent with the present invention, such features may be determined for an entity using query information, and/or perhaps user action information. In at least some embodiments consistent with the present invention, in addition to, or instead of, query information, other serving parameter information may be used to determine and/or weight features.
  • weights of such features may be similarly determined.
  • the weights may be determined using scores.
  • the scores may be a function of one or more of (i) whether the document was selected, (ii) a user dwell time on a selected document, (iii) whether or not a conversion occurred with respect to the document, (iv) a frequency of queries including the feature, etc.
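  • As a rough illustration of how such a score might combine these signals, the following Python sketch sums a per-query score over the logged queries that include a feature. The `QueryEvent` fields, the numeric constants, and the additive form are all assumptions made for illustration; the description above does not prescribe a formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryEvent:
    """One logged query containing the feature (hypothetical record layout)."""
    selected: bool        # (i) whether the document was selected
    dwell_seconds: float  # (ii) user dwell time on the selected document
    converted: bool       # (iii) whether a conversion occurred


def event_score(event: QueryEvent) -> float:
    """Score one event; the constants are illustrative assumptions."""
    score = 0.25  # every query including the feature counts a little (iv)
    if event.selected:
        score += 1.0
        # Dwell time adds at most 1.0, reached at 60 seconds.
        score += min(event.dwell_seconds, 60.0) / 60.0
    if event.converted:
        score += 2.0
    return score


def feature_weight(events: List[QueryEvent]) -> float:
    """Sum event scores, so more frequent queries yield a larger weight."""
    return sum(event_score(e) for e in events)
```

  • Under these assumed constants, a feature seen in many queries that led to long visits or conversions ends up with a larger weight than one seen in queries whose results were ignored.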
  • the document is a Web page.
  • the features are n-grams.
  • the relevance information of the document may be used to target the serving of advertisements with the document.
  • the features of a category may be used to associate query terms and categories, and/or ads and categories.
  • a score (e.g., a weight) associated with the feature-to-entity association may be updated by (i) using the feature-to-entity association to generate one or more results for presentation to a user, (ii) tracking user behavior with respect to the results, and (iii) updating the score associated with the feature-to-entity association using the tracked user behavior.
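  • One possible way to realize such an update loop is sketched below in Python. The moving-average rule, the `rate` parameter, and the dictionary keyed by (feature, entity) pairs are assumptions chosen for brevity; the text above only requires that tracked user behavior feed back into the score.

```python
from typing import Dict, Tuple

Association = Tuple[str, str]  # (feature, entity identifier)


def update_score(current: float, selected: bool, rate: float = 0.1) -> float:
    """Move the score toward 1.0 on a selection and toward 0.0 otherwise."""
    observed = 1.0 if selected else 0.0
    return current + rate * (observed - current)


def record_feedback(scores: Dict[Association, float], feature: str,
                    entity: str, selected: bool, rate: float = 0.1) -> None:
    """Update the stored score for one feature-to-entity association in place."""
    key = (feature, entity)
    scores[key] = update_score(scores.get(key, 0.0), selected, rate)
```

  • A caller would invoke `record_feedback` once per presented result, passing whether the user selected it.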
  • Figure 1 is a block diagram illustrating an exemplary on-line advertising environment in which, or with which, the present invention may be used.
  • Figure 2 is a bubble diagram illustrating operations that may be performed, and information that may be generated, used, and/or stored, by a document feature generation and/or update system consistent with the present invention.
  • Figure 3 is a bubble chart illustrating operations that may be used with search operations to associate query terms and selections with documents in a manner consistent with the present invention.
  • Figure 4 is a bubble diagram illustrating operations that may be performed, and information that may be generated, used, and/or stored, by a document feature generation and/or update system consistent with the present invention.
  • Figure 5 is a flow diagram of an exemplary method that may be used to generate and/or update document feature information in a manner consistent with the present invention.
  • Figure 6 is a flow diagram of an exemplary method that may be used to generate and/or update document feature information in a manner consistent with the present invention.
  • Figure 7 is a block diagram of a machine that may perform one or more operations and store information used and/or generated in a manner consistent with the present invention.
  • Figure 8 is a diagram illustrating an example of how an exemplary embodiment consistent with the present invention can make associations between categories and query terms and/or ads.
  • the present invention may involve novel methods, apparatus, message formats, and/or data structures for associating one or more features with an entity, such as a Web page document, or category for example, and/or applying and/or adjusting a score or weight to at least one of such features.
  • the following description is presented to enable one skilled in the art to make and use the invention, and is provided in the context of particular applications and their requirements. Thus, the following description of embodiments consistent with the present invention provides illustration and description, but is not intended to be exhaustive or to limit the present invention to the precise form disclosed. Various modifications to the disclosed embodiments will be apparent to those skilled in the art, and the general principles set forth below may be applied to other embodiments and applications.
  • Online ads may have various intrinsic features. Such features may be specified by an application and/or an advertiser. These features are referred to as "ad features" below.
  • ad features may include a title line, ad text, and an embedded link.
  • ad features may include images, executable code, and an embedded link.
  • ad features may include one or more of the following: text, a link, an audio file, a video file, an image file, executable code, embedded information, etc.
  • Serving parameters may include, for example, one or more of the following: features of (including information on) a document on which, or with which, the ad was served, a search query or search results associated with the serving of the ad, a user characteristic (e.g., their geographic location, the language used by the user, the type of browser used, previous page views, previous behavior), a host or affiliate site (e.g., America Online, Google, Yahoo) that initiated the request, an absolute position of the ad on the page on which it was served, a position (spatial or temporal) of the ad relative to other ads served, an absolute size of the ad, a size of the ad relative to other ads, a color of the ad, a number of other ads served, types of other ads served, time of day served, time
  • Although serving parameters may be extrinsic to ad features, they may be associated with an ad as serving conditions or constraints. When used as serving conditions or constraints, such serving parameters are referred to simply as "serving constraints" (or "targeting criteria"). For example, in some systems, an advertiser may be able to target the serving of its ad by specifying that it is only to be served on weekdays, no lower than a certain position, only to users in a certain location, etc. As another example, in some systems, an advertiser may specify that its ad is to be served only if a page or search query includes certain keywords or phrases. As yet another example, in some systems, an advertiser may specify that its ad is to be served only if a document being served includes certain topics or concepts, or falls under a particular cluster or clusters, or some other classification or classifications.
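  • The serving-constraint examples above can be pictured as a simple eligibility check. The Python sketch below is illustrative only: the class names, the fields, and the rule that every required keyword must be present are assumptions, not details taken from the description.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Set


@dataclass
class ServingConstraints:
    """Hypothetical targeting criteria attached to one ad."""
    weekdays_only: bool = False
    lowest_position: Optional[int] = None  # 1 is the top slot; None means any
    locations: Set[str] = field(default_factory=set)  # empty means anywhere
    required_keywords: Set[str] = field(default_factory=set)


@dataclass
class ServingContext:
    """Hypothetical serving parameters for one ad opportunity."""
    when: datetime
    position: int
    user_location: str
    terms: Set[str]  # keywords from the page or search query


def is_eligible(c: ServingConstraints, ctx: ServingContext) -> bool:
    """Return True only if every stated constraint is satisfied."""
    if c.weekdays_only and ctx.when.weekday() >= 5:  # 5, 6 are Sat, Sun
        return False
    if c.lowest_position is not None and ctx.position > c.lowest_position:
        return False
    if c.locations and ctx.user_location not in c.locations:
        return False
    # Assumption: all required keywords must appear among the terms.
    return c.required_keywords <= ctx.terms
```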
  • Ad information may include any combination of ad features, ad serving constraints, information derivable from ad features or ad serving constraints (referred to as “ad derived information”), and/or information related to the ad (referred to as “ad related information”), as well as an extension of such information (e.g., information derived from ad related information).
  • the ratio of the number of selections (e.g., clickthroughs) of an ad to the number of impressions of the ad (i.e., the number of times an ad is rendered) is defined as the "selection rate" (or "clickthrough rate") of the ad.
  • a "conversion" is said to occur when a user consummates a transaction related to a previously served ad. What constitutes a conversion may vary from case to case and can be determined in a variety of ways. For example, it may be the case that a conversion occurs when a user clicks on an ad, is referred to the advertiser's Website, and consummates a purchase there before leaving that Website. Alternatively, a conversion may be defined as a user being shown an ad, and making a purchase on the advertiser's Website within a predetermined time (e.g., seven days).
  • a conversion may be defined by an advertiser to be any measurable/observable user action such as, for example, downloading a white paper, navigating to at least a given depth of a Website, viewing at least a certain number of Web pages, spending at least a predetermined amount of time on a Website or Web page, registering on a Website, etc.
  • If user actions don't indicate a consummated purchase, they may indicate a sales lead, although user actions constituting a conversion are not limited to this. Indeed, many other definitions of what constitutes a conversion are possible.
  • The ratio of the number of conversions to the number of impressions of the ad (i.e., the number of times an ad is rendered) is referred to as the "conversion rate." If a conversion is defined to be able to occur within a predetermined time since the serving of an ad, one possible definition of the conversion rate might only consider ads that have been served more than the predetermined time in the past.
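  • The selection rate and conversion rate definitions above reduce to simple ratios. The Python sketch below shows both, including the variant that only counts impressions served more than the predetermined time in the past. The function names and the (served_at, converted) record layout are hypothetical, and returning 0.0 when there are no impressions is an arbitrary choice.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def selection_rate(selections: int, impressions: int) -> float:
    """Selections (e.g., clickthroughs) divided by impressions."""
    return selections / impressions if impressions else 0.0


def conversion_rate(impressions: List[Tuple[datetime, bool]],
                    now: datetime, window: timedelta) -> float:
    """Conversions divided by impressions, ignoring recent impressions.

    Each impression is (served_at, converted). Impressions served within
    `window` of `now` are skipped because a conversion could still occur.
    """
    mature = [converted for served_at, converted in impressions
              if now - served_at > window]
    return sum(mature) / len(mature) if mature else 0.0
```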
  • a "document” is to be broadly interpreted to include any machine-readable and machine-storable work product.
  • a document may be a file, a combination of files, one or more files with embedded links to other files, etc.
  • the files may be of any type, such as text, audio, image, video, etc.
  • Parts of a document to be rendered to an end user can be thought of as "content" of the document.
  • a document may include "structured data” containing both content (words, pictures, etc.) and some indication of the meaning of that content (for example, e-mail fields and associated data, HTML tags and associated data, etc.)
  • Ad spots in the document may be defined by embedded information or instructions.
  • a common document is a Web page.
  • Web pages often include content and may include embedded information (such as meta information, hyperlinks, etc.) and/or embedded instructions (such as JavaScript, etc.).
  • a document has a unique, addressable, storage location and can therefore be uniquely identified by this addressable location.
  • a universal resource locator is a unique address used to access information on the Internet.
  • Document information may include any information included in the document, information derivable from information included in the document (referred to as "document derived information"), and/or information related to the document (referred to as "document related information"), as well as extensions of such information (e.g., information derived from related information).
  • An example of document derived information is a classification based on textual content of a document.
  • Examples of document related information include document information from other document(s) with links to the instant document, as well as document information from other document(s) to which the instant document links and document information from other document(s) related to the instant document.
  • Content from a document may be rendered on a "content rendering application or device".
  • Examples of content rendering applications or devices include an Internet browser (e.g., Explorer or Netscape), a media player (e.g., an MP3 player, a RealNetworks streaming audio file player, etc.), a viewer (e.g., an Adobe Acrobat PDF reader), etc.
  • a "content owner” is a person or entity that has some property right in the content of a document.
  • a content owner may be an author of the content.
  • a content owner may have rights to reproduce the content, rights to prepare derivative works of the content, rights to display or perform the content publicly, and/or other prescribed rights in the content.
  • Although a content server might be a content owner in the content of the documents it serves, this is not necessary.
  • User information may include user behavior information and/or user profile information.
  • E-mail information may include any information included in an e-mail (also referred to as "internal e-mail information”), information derivable from information included in the e-mail and/or information related to the e-mail, as well as extensions of such information (e.g., information derived from related information).
  • An example of information derived from e-mail information is information extracted or otherwise derived from search results returned in response to a search query composed of terms extracted from an e-mail subject line.
  • Examples of information related to e-mail information include e-mail information about one or more other e-mails sent by the same sender of a given e-mail, or user information about an e-mail recipient.
  • Information derived from or related to e-mail information may be referred to as "external e-mail information."
  • § 4.2 ENVIRONMENTS IN WHICH, OR WITH WHICH, THE PRESENT INVENTION MAY OPERATE
  • FIG. 1 illustrates an exemplary environment 100 in which, or with which, the present invention may be used.
  • a user device (also referred to as a "client" or "client device") 150 may include a browser facility (such as the Explorer browser from Microsoft, the Opera Web Browser from Opera Software of Norway, the Navigator browser from AOL/Time Warner, etc.), an e-mail facility (e.g., Outlook from Microsoft), or any other software application or hardware device used to render content.
  • a search engine 120 may permit user devices 150 to search collections of documents (e.g., Web pages).
  • a content server 130 may permit user devices 150 to access (e.g., for rendering) documents.
  • An e-mail server 140 may be used to provide e-mail functionality to user devices 150.
  • An ad server 110 may be used to serve ads to user devices 150. The ads may be served in association with search results provided by the search engine 120. Content-relevant ads may be served in association with content provided by the content server 130, and/or e-mail supported by the e-mail server 140 and/or user device 150 e-mail facilities.
  • the ad server 110 may be a content-relevant ad server, such as those described in the '427 and '900 applications introduced above.
  • ads may be targeted to documents served by content servers.
  • a content server 130 that receives requests for documents (e.g., articles, discussion threads, music, video, graphics, search results, Web page listings, etc.), and retrieves the requested document in response to, or otherwise services, the request may consume ads.
  • the content server 130 may submit a request for ads to the ad server 110.
  • a user device 150 may submit such a request.
  • a Web-based e-mail server 140 may submit such a request.
  • Such an ad request may include a number of ads desired.
  • the ad request may also include document request information.
  • This information may include the document itself (e.g., a Web page), a category or topic corresponding to the content of the document or the document request (e.g., arts, business, computers, arts-movies, arts-music, etc.), part or all of the document request, content age, content type (e.g., text, graphics, video, audio, mixed media, etc.), geolocation information, end user local time information, document information (such as document features for example), etc.
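  • For concreteness, the fields such an ad request may carry could be grouped into a record like the Python sketch below. The class and field names are hypothetical, and the actual message format is not specified in the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AdRequest:
    """Hypothetical container for the ad request information listed above."""
    num_ads: int                                  # number of ads desired
    document_url: Optional[str] = None            # the document, e.g., a Web page
    categories: List[str] = field(default_factory=list)  # e.g., "arts-movies"
    content_age_days: Optional[int] = None
    content_type: Optional[str] = None            # text, graphics, video, etc.
    geolocation: Optional[str] = None
    user_local_time: Optional[str] = None
    # Document information, such as features mapped to optional weights.
    document_features: Dict[str, float] = field(default_factory=dict)
```

  • Every field other than the number of ads is optional here, mirroring the "may include" wording of the description.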
  • the content server 130, Web-based e-mail server 140, and/or user device 150 may combine the requested document with one or more of the advertisements provided by the ad server 110. This combined information including the document content and advertisement(s) is then forwarded towards, and/or rendered on, the end user device 150 that requested the document, for presentation to the user. Alternatively, or in addition, the ad(s) may be combined with, or rendered with, the requested document in some other way (e.g., by the client device).
  • the content server 130 or Web-based e-mail server 140 may transmit information about the ads and how, when, and/or where the ads are to be rendered (e.g., position, clickthrough or not, impression time, impression date, size, conversion or not, etc.) back to the ad server 110.
  • the ad server 110 may store ad performance information.
  • a search engine 120 may receive queries for search results and may consume ads. In response, the search engine may retrieve relevant search results (e.g., from an index of Web pages).
  • An exemplary search engine is described in the article S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Search Engine," Seventh International World Wide Web Conference, Brisbane, Australia and in U.S. Patent No. 6,285,999 (both incorporated herein by reference).
  • Such search results may include, for example, lists of Web page titles, snippets of text extracted from those Web pages, and hypertext links to those Web pages, and may be grouped into a predetermined number of (e.g., ten) search results.
  • the search engine 120 may submit a request for ads to the ad server 110.
  • the request may include a number of ads desired. This number may depend on the search results, the amount of screen or page space occupied by the search results, the size and shape of the ads, etc. In one embodiment, the number of desired ads will be from one to ten, and preferably from three to five.
  • the request for ads may also include the query (as entered or parsed), information based on the query (such as end user local time information, geolocation information, whether the query came from an affiliate and an identifier of such an affiliate), and/or information associated with, or based on, the search results.
  • Such information may include, for example, identifiers related to the search results (e.g., document identifiers or "docIDs”), scores related to the search results (e.g., information retrieval ("IR") scores such as dot products of feature vectors corresponding to a query and a document, Page Rank scores, and/or combinations of IR scores and Page Rank scores), snippets of text extracted from identified documents (e.g., Web pages), full text of identified documents, topics of identified documents, feature vectors of identified documents, etc.
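  • The IR score mentioned above, a dot product of the feature vectors of a query and a document, can be sketched in Python with sparse vectors stored as dictionaries. The linear blend with a Page Rank score and its `alpha` parameter are assumptions; the text says only that the scores may be combined.

```python
from typing import Dict


def dot_product(query_vec: Dict[str, float], doc_vec: Dict[str, float]) -> float:
    """Dot product of two sparse feature vectors keyed by term."""
    return sum(weight * doc_vec.get(term, 0.0)
               for term, weight in query_vec.items())


def combined_score(ir_score: float, page_rank: float, alpha: float = 0.5) -> float:
    """Blend an IR score with a Page Rank score (assumed linear combination)."""
    return alpha * ir_score + (1.0 - alpha) * page_rank
```

  • Terms missing from the document vector contribute zero, so only terms shared by the query and the document affect the IR score.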
  • the search engine 120 may combine the search results with one or more of the advertisements provided by the ad server 110.
  • the ad(s) may be combined with, or rendered with, the requested document in some other way (e.g., by the client device).
  • This combined information including the search results and advertisement(s) is then forwarded towards the user that submitted the search, for presentation to the user.
  • the search results are maintained as distinct from the ads, so as not to confuse the user between paid advertisements and presumably neutral search results.
  • the search engine 120 may transmit information about the ad and when (e.g., end user local time), where (e.g., geolocation), and/or how the ad was to be rendered (e.g., position, click-through or not, impression time, impression date, size, conversion or not, etc.) back to the ad server 110. Alternatively, or in addition, such information may be provided back to the ad server 110 by some other means. Consistent with the present invention, the search engine 120 may also associate search query information (and/or other serving parameter information) with the documents associated with search results, documents associated with ads, and/or ads.
  • the search engine 120 may also associate the search query information with user actions (e.g., selections, dwell time, etc.) with respect to the documents linked from the search result pages, and/or user actions (e.g., selections, conversions, etc.) with respect to the ads rendered with the search results pages.
  • the Web-based e-mail server 140 may be thought of, generally, as a content server in which a document served is simply an e-mail. Further, e-mail applications (such as Microsoft Outlook for example) may be used to send and/or receive e-mail. Therefore, a Web-based e-mail server 140 or a client device 150 application may be thought of as an ad consumer. Thus, e-mails may be thought of as documents, and targeted ads may be served in association with such documents. For example, one or more ads may be served in, under, over, or otherwise in association with an e-mail.
  • the various servers may exchange information via one or more networks 160, such as the Internet for example.
  • the present invention permits features, such as keywords or topics, to be associated with entities, such as Web pages or categories.
  • entities or representatives of entities
  • Such associations may be used for a variety of reasons, such as, for example, targeting ads, suggesting targeting features for an advertisement for presentation to advertisers, automatically generating targeting criteria for an advertisement, etc.
  • features are associated with entities using search engine query logs, search engine referrals, and/or other user actions with respect to documents associated with a search results page.
  • Methods and apparatus consistent with the present invention can improve the effectiveness of marketing campaigns, and can reduce the amount of work (and cost) in running a campaign.
  • FIG. 2 is a bubble diagram illustrating operations 235 that may be performed, and information that may be generated, used, and/or stored, by a document feature generation and/or update system consistent with the present invention, as well as operations for generating information used by such operations 235.
  • operations 235 may accept a document identifier (such as, for example, a URL if the document is a Web page) 220, use the document identifier 220 to obtain query (and/or user action) information 210 associated with the document, and generate and/or update features (and perhaps weights) for the document 260 using the obtained query (and/or user action) information.
  • document query information lookup operations 230 may use the document identifier 220 to look up query (and/or user action) information 240 pertaining to the identified document 220 from stored information 210.
  • Document feature (vector) generation/update operations 250 may then use this query (and/or user action) information 240 to generate features (and perhaps weights) 260 associated with the identified document.
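  • A minimal Python sketch of the two operations just described (lookup 230, then feature generation 250) follows. It assumes the stored information 210 is a dictionary from document identifier to a list of query strings, that features are word n-grams, and that weights are relative frequencies; all three are illustrative choices rather than requirements of the description.

```python
from collections import Counter
from typing import Dict, List


def lookup_query_info(doc_id: str, stored: Dict[str, List[str]]) -> List[str]:
    """Return the queries associated with a document identifier."""
    return stored.get(doc_id, [])


def ngrams(text: str, n: int) -> List[str]:
    """Split text on whitespace and return its word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def generate_features(queries: List[str], max_n: int = 2) -> Dict[str, float]:
    """Count n-grams across queries and weight each by relative frequency."""
    counts: Counter = Counter()
    for query in queries:
        for n in range(1, max_n + 1):
            counts.update(ngrams(query, n))
    total = sum(counts.values())
    return {feature: count / total for feature, count in counts.items()}
```

  • Calling `generate_features(lookup_query_info(url, stored))` yields a weighted feature map for the document at `url`; an unknown identifier yields an empty map.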
  • The foregoing assumes that the document identifier to query (and/or user action) association information 210 is available. This information may have been generated by the operations illustrated above the dashed line 299.
  • query (and/or user action) logging operations 270 may be used to generate an aggregated log of query to document associations, and perhaps user action (including inaction) to document associations 280.
  • Index inverting operations 290 may be used to generate the document identifier to query (and/or user action) information associations 210 from the aggregated log of query to document associations, and perhaps user action (including inaction) to document associations 280.
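  • The index inverting step can be illustrated with a short Python sketch that regroups a query-ordered log by document identifier. The (query, document identifier, selected) tuple layout of the log is an assumption made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

LogEntry = Tuple[str, str, bool]  # (query, document identifier, selected)


def invert_log(log: Iterable[LogEntry]) -> Dict[str, List[Tuple[str, bool]]]:
    """Group log entries by document identifier.

    The result maps each document identifier to the (query, selected)
    pairs logged for it, preserving log order.
    """
    by_doc: Dict[str, List[Tuple[str, bool]]] = defaultdict(list)
    for query, doc_id, selected in log:
        by_doc[doc_id].append((query, selected))
    return dict(by_doc)
```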
  • Figure 3 is a bubble chart illustrating operations that may be used with search operations to associate query terms and selections with documents in a manner consistent with the present invention.
  • search operations 310 use term to document inverted index information 340 and perhaps search ranking information 350 to generate a search results document 330.
  • the document 330 may include one or more search results 360.
  • the document 330 may also include one or more ads 370.
  • the search results 360 and/or ads 370 may be selected as indicated by cursor click 380.
  • query (and/or user action) logging operations 270 may be used to log associations between query information 320 and document identifiers (such as URLs or ad identifiers for example) corresponding to the search results 360 and/or ads 370. These operations 270 may also be used to log associations between user actions (e.g., selections, conversions, dwell time, etc.) and document identifiers (such as URLs or ad identifiers for example) corresponding to the search results 360 and/or ads 370.
  • user actions e.g., selections, conversions, dwell time, etc.
  • document identifiers such as URLs or ad identifiers for example
  • FIG. 4 is a bubble diagram illustrating operations that may be performed, and information that may be generated and/or stored, by a document feature generation and/or update system consistent with the present invention.
  • Document feature generation/update operations 420 may use query (and perhaps user action) information to document associations 410 to generate or update features (and perhaps weights) associated with document identifiers 430.
  • indexing operations 440 may use this information 430 to generate an index of document identifiers to (weighted) features association information 450.
  • FIG. 5 is a flow diagram of an exemplary method 500 that may be used to generate and/or update document feature information in a manner consistent with the present invention.
  • a document identifier e.g., a URL of a Web page
  • query information and/or user action information
  • blocks 510 and 520 exemplify a method, consistent with the present invention, which may be used to perform the document query information lookup operations 230 of Figure 2.
  • blocks 530 and 540 exemplify a method, consistent with the present invention, which may be used to perform the document feature generation/update operations 250 of Figure 2.
  • FIG. 6 is a flow diagram of an exemplary method 600 that may be used to generate and/or update document feature information in a manner consistent with the present invention.
  • Query (and perhaps user action) information for a document is accepted.
  • Block 610 If any (weighted) feature information already exists for the document, it may be accepted.
  • Block 620 For example, the method 600 may be used to update already existing document (weighted) feature information. New (weighted) feature information is then determined for the document, or existing (weighted) feature information for the document is updated.
  • the determined and/or updated (weighted) features are then stored in association with the document (Block 640) before the method 600 is left (Node 650).
  • the features may be unigrams and n-grams
  • the document may be a Web page and the document identifier may be a URL of the Web page.
  • the features may be keywords, such as keywords used for targeting ads for example.
  • the features may be concepts, such as concepts used for targeting ads for example.
  • the features may have associated weights in which higher weights indicate features more closely associated with the Web page.
  • the Web page may have an associated weighted feature vector generated and/or updated by embodiments consistent with the present invention.
  • Methods consistent with the present invention may be performed for a number of Web pages.
  • the methods 500 and 600 may be performed for each URL u in a plurality of URLs.
  • a plurality of queries Q are retrieved from a plurality of logged queries that returned the URL in a list of search results.
  • the plurality of queries Q may be retrieved from a plurality of logged queries that returned the ad in a set of one or more ads rendered on the search results page.
  • Features from the queries may be used to populate (and/or update weights of) a feature vector associated with the URL.
  • only information from queries under which a URL selection occurred is used to populate (and/or update weights of) a feature vector associated with the URL.
  • information from all queries that returned the URL in a list of search results is used to populate (and/or update weights of) a feature vector associated with the URL, but a user action is used to weight the features. For example, information from a query that led to a selected URL may be weighted more than information from a query that led to a rendered URL that was not selected. Other user actions may also affect the feature weight. For example, the feature may be weighted more if a long dwell time occurred after selection than if a short dwell time occurred after selection. As another example, the feature may be weighted more if a conversion occurred after selection of a URL than if no conversion occurred after selection of a
  • Different embodiments may select different features associated with the appropriate queries. For example, one embodiment consistent with the present invention may use all exact queries as associated features. As another example, another embodiment consistent with the present invention may use all n-grams from length l_1 to length l_2 as associated features (optionally with "stop" words and/or non-content words such as "the" removed). In many cases, there will be a set of features that "best" specify a document. If the features are scored and weighted such that the sum of the weights equals 1.00, one embodiment consistent with the present invention would be to take the features with the best weights until the sum of their weights reaches some value (e.g., 0.80).
  • some value e.g. 0.80
  • features with weights less than a predetermined percent (e.g., 20%) of the weight of the best feature could be ignored.
  • Still other embodiments consistent with the present invention may use some combination of the foregoing concepts (e.g., filtering features using absolute and/or relative weight or score thresholds) to obtain the "best" set of features for a given document, or to filter out features without a strong affinity to the document.
  • the (e.g., weighted) features associated with a document may be used in a variety of ways.
  • the features may be used as document relevance information when determining a match (e.g., a similarity) to an ad in a content-relevant ad server such as the one described in the '900 patent application.
  • the features may be used to provide or suggest keywords (e.g., used for an ad where the ad is the document, or wherein a landing page of the ad is the document).
  • FIG. 7 is a high-level block diagram of a machine 700 that may perform one or more of the operations discussed above.
  • One or more such machines 700 may be used as a content-relevant ad server, a separate server, client devices, etc.
  • the machine 700 basically includes one or more processors 710, one or more input/output interface units 730, one or more storage devices 720, and one or more system buses and/or networks 740 for facilitating the communication of information among the coupled elements.
  • One or more input devices 732 and one or more output devices 734 may be coupled with the one or more input/output interfaces 730.
  • the one or more processors 710 may execute machine-executable instructions (e.g., C or C++ running on the Solaris operating system available from Sun Microsystems Inc. of
  • At least a portion of the machine executable instructions may be stored (temporarily or more permanently) on the one or more storage devices 720 and/or may be received from an external source via one or more input interface units 730.
  • the machine 700 may be one or more conventional personal computers.
  • the processing units 710 may be one or more microprocessors.
  • the bus 740 may include a system bus.
  • the storage devices 720 may include system memory, such as read only memory (ROM) and/or random access memory (RAM).
  • the storage devices 720 may also include a hard disk drive for reading from and writing to a hard disk, a magnetic disk drive for reading from or writing to a (e.g., removable) magnetic disk, and an optical disk drive for reading from or writing to a removable (magneto-) optical disk such as a compact disk or other (magneto-) optical media.
  • a user may enter commands and information into the personal computer through input devices 732, such as a keyboard and pointing device (e.g., a mouse) for example.
  • Other input devices such as a microphone, a joystick, a game pad, a satellite dish, a scanner, or the like, may also (or alternatively) be included.
  • These and other input devices are often connected to the processing unit(s) 710 through an appropriate interface 730 coupled to the system bus 740.
  • the output devices 734 may include a monitor or other type of display device, which may also be connected to the system bus 740 via an appropriate interface.
  • the personal computer may include other (peripheral) output devices (not shown), such as speakers and printers for example.
  • Each of the ad server 110, the search engine 120, the content server 130, the e-mail server 140, and the user device 150, etc., may be embodied by one or more such machines 700.
  • a feature-to-entity association is accepted or generated.
  • the association is used to generate (e.g., a document with) results.
  • keyword-to-category associations may be used to determine a Web page with selectable category listings in response to a query including the keyword.
  • category-to-ad associations may be used to determine a Web page including one or more ads when a category is selected (or if the Web page has content that pertains to the category).
  • User behavior with respect to the results e.g., selection or not, conversion or not, dwell time, etc.
  • the tracked user behavior may then be used to update (e.g., the weight of, generally referred to as the "score" of) the feature-to-entity association.
  • the keyword-to-first category association may be somewhat strengthened (e.g., due to the user selection), but not too much (e.g., due to the short dwell time and quick return), the keyword-to-second category association may be strengthened to a greater degree (e.g., due to the user selection and long dwell time), and the keyword-to-third category association may be weakened (e.g., due to the fact that the user did not select the third category link).
  • Each of the features may be given a score.
  • the score may be used to determine a weight to assign to the feature, and/or to filter features. For example, a feature with a higher score may receive a higher weight, while a feature with a lower score may receive a lower weight. Weight should be a monotonic function of score, but need not be linear.
  • the score may also be compared with a given (e.g., predetermined) threshold. If the score for the feature is below the threshold, the feature may be removed from association with the document, or it may be weighted to zero.
  • the threshold may be absolute, and/or relative. For example, an absolute threshold might filter out a feature if its score did not exceed a predetermined value, while a relative threshold might filter out a feature that was not one of the top twenty features for the document.
  • the score may be a function of one or more of (a) a frequency of the feature with respect to the document, (b) a user action with respect to the document, (c) feature scores of related or similar documents, (d) total frequency and inverse document frequency of the feature, (e) general performance (e.g., selection rate, conversion rate, etc. across all queries) of the document, etc. Examples of each of these factors are described below. Frequency
  • the feature score may be a function of the frequency of the feature (e.g., generated from query information). More frequent features may be given a higher score for example.
  • the feature score may also be a function of the frequency of selections (e.g., clickthroughs) and/or queries for that term.
  • the feature score may be a function of a user action with respect to the document. For example, if the user selected the document when it was rendered on a search results page to a query, features from the query would be scored higher than if the document were not selected. As another example, if the user completed a transaction at a document when it was rendered on a search results page to a query, features from the query would be scored higher than if no conversion took place on the document. Dwell time may also be considered.
  • features from the query would be scored higher than if the document were selected but the user only dwelled on the document for a short period of time. Indeed, a very short dwell time may be used to discount a score enhanced by the fact that a user selected the document.
  • Documents may be grouped with other documents in various ways. For example, for Web page documents, it may be desirable to combine the analysis for multiple URLs on a Website, for URLs within a directory, URLs on similar topics, linked documents, etc. As a more specific example, all URLs on a Website may be grouped together, and all queries (and user actions) that lead to the Website are used to find features for Web pages of the entire Website. Similar pages may be computed using, for example, TF-IDF.
  • f(S) is a function of the queries and user actions corresponding to URLs within set S.
  • f(S) may factor in the number of occurrences of term t, user selections, and dwell times on the URL or site that the user clicked through to.
  • Weights w_1 to w_4 allow the contribution of each set to vary.
  • Another improvement is possible by considering the probability of a user action (e.g., selection) for a URL for a term or query.
  • the expected user action e.g., selection
  • the actual user action e.g., selection
  • Features may be weighted according to their user action (e.g., selection) rate, with features that result in user action rates above the average (expected) rate being given higher weights, and features that result in user action rates below the average (expected) given lower weights.
  • the features and/or feature scores associated with a document may be tracked generally, over all users, or may be tracked per user group, or per individual user. That is, it may be desirable to segment the query and user action data for different types of users in order to create different sets of associated features that may subsequently be used with the different types of users. For example, information may be tracked and aggregated per user group (e.g., users within different demographics, users with similar interests, or individual users). For example, a separation by age groups may result in different features being the best associated features for a specific document. Similarly, if detailed information is available for the interests of a user, the associated features may be biased toward the interests of that user, for example by increasing the weight of features in the analysis above according to the weight of those features for the interests of the user.
  • the information associations 210 may be stored and/or accessed, depending on the particular embodiment used.
  • the information associations 210 may include one or more of (i) whether or not the document was selected, (ii) qualitative or quantitative dwell time information, (iii) query frequency, (iv) query parts, (v) document site information, (vi) document directory information, (vii) document group information, (viii) user information, etc.
  • Figure 8 is a diagram illustrating an example of how an exemplary embodiment consistent with the present invention can be used to associate features (such as terms, n-grams, etc.) with entities (such as categories).
  • a query processor 820 returns a document 830 in response to received query information 810.
  • the query information 810 may include search query terms.
  • the document may include one or more of (a) search results 832 including links to documents 840, (b) keyword targeted and/or category targeted ads 834 including links to ad landing pages 850, and (c) category links 836 to pages 860 including category targeted (which may also be keyword targeted) ads.
  • the document 830 may include other links to other types of information as well.
  • a corresponding document 840 is returned (e.g., loaded into a browser of an end user device).
  • a corresponding ad landing page 850 is returned.
  • a corresponding page including one or more category targeted ads 860 is returned.
  • One or more ads with links to ad landing pages may also be provided, for example, below associated category headings or links. If the end user selects one of the ads on document 860, a corresponding ad landing page 850 is returned.
  • a "filtered" version of the document 830 may be rendered.
  • search results 832, keyword and/or category targeted ads 834, and/or category links 836 may be filtered such that they pertain to the selected category.
  • embodiments consistent with the present invention may be used to associate query information 810 with the listed documents, and/or any selected document(s) 840. Such an association may reflect whether or not a document was selected.
  • embodiments consistent with the present invention may be used to associate query information 810 with listed ads, and/or any selected ad(s) 850. Such an association may reflect whether or not an ad was selected.
  • the present invention may be used to associate query information 810 with keywords and/or concepts used to target the serving of the ads 834. Such an association may reflect whether or not an ad was selected.
  • embodiments consistent with the present invention may be used to associate query information 810 with listed categories and/or any selected category(ies). Such an association may reflect whether or not a category was selected. Alternatively, or in addition, such an association may reflect whether or not a category targeted ad on page 860 was selected. Further, the present invention may be used to associate query information 810 with keywords and/or concepts used to target the serving of the ads on page 860.
  • An embodiment in which the document 830 includes category links 836 to a page 860 with one or more category targeted ads may be used, for example, to provide "Yellow Pages" style classification to ads, such as local ads for example.
  • an ad serving system includes the category "plumbers," and one or more advertisers associate their ad campaigns with this "Yellow Page" category.
  • category links 836 include a "Local Plumbers" category link.
  • This keyword to category association may have been derived from the fact that one or more advertisers associated both the keyword target "clogged drain" and the category "Plumbers" with their ads. Alternatively, or in addition, a category may be inferred from a collection of words (e.g., extracted from ad information).
  • "Local Plumbers" category link indicates a negative correlation between the query information "DIY clogged drain" and the "Local Plumbers" category, while selections (or long dwell times) of the "Local Plumbing Supplies" category indicate a correlation between the query information "DIY clogged drain" and the "Local Plumbing Supplies" category.
  • a page 860 with local plumber ads (which may also be targeted by keywords carried through from the terms of the search query 810) is provided. If the page 860 also includes ads having a strong association to a category (e.g., due to advertiser association), then a similar process, in which it is determined just how strong the association between the advertiser and the category is by observing action or inaction on that advertiser's link, may occur.
  • an ad-category association may be modified depending on a user action with respect to the ad when the category was used to target the serving of the ad(s) on the page 860 (and possibly modified by keywords carried through from the original query 810).
  • At least some embodiments consistent with the present invention may be used to recommend to advertisers that they associate their ad with such categories. For example, such an embodiment may recommend that an advertiser with an ad with the targeting keywords "clogged drain" and "emergency service" associate its ad with the category "Plumber". Alternatively, such an association may be generated automatically.
  • embodiments consistent with the present invention may be used to assign and/or weight features, such as n-grams, to entities, such as documents or concepts.
  • the assigned features may represent relevance of the document and may be used to target the serving of advertisements with the document.
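By way of illustration only, the feature filtering described in the list above (taking the best-weighted features until their summed weight reaches a value such as 0.80, or ignoring features whose weight is less than a predetermined percent, such as 20%, of the weight of the best feature) might be sketched as follows. The function names, the dictionary representation of a feature vector, and the default values are assumptions made for this sketch and do not appear in the specification.

```python
def normalize(scores):
    """Scale raw feature scores so that the resulting weights sum to 1.0."""
    total = sum(scores.values())
    if total <= 0:
        return {}
    return {feature: s / total for feature, s in scores.items()}


def top_by_cumulative_weight(weights, cutoff=0.80):
    """Keep the best-weighted features until their summed weight reaches cutoff."""
    kept, running = {}, 0.0
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for feature, w in ranked:
        if running >= cutoff:
            break
        kept[feature] = w
        running += w
    return kept


def drop_below_relative(weights, fraction=0.20):
    """Ignore features weighing less than `fraction` of the best feature's weight."""
    if not weights:
        return {}
    best = max(weights.values())
    return {f: w for f, w in weights.items() if w >= fraction * best}
```

The two filters are independent and could be applied separately or in combination, consistent with the statement above that some embodiments may combine absolute and relative thresholds.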

Abstract

Features that may be used to represent relevance information (e.g., properties, characteristics, etc.) of an entity, such as a document or concept for example, may be associated with the document by accepting an identifier that identifies a document, obtaining search query information (and/or other serving parameter information) related to the document using the document identifier, determining features using the obtained query information (and/or other serving parameter information), and associating the features determined with the document. Weights of such features may be similarly determined. The weights may be determined using scores. The scores may be a function of one or more of whether the document was selected, a user dwell time on a selected document, whether or not a conversion occurred with respect to the document, etc.

Description

ASSOCIATING FEATURES WITH ENTITIES, SUCH AS CATEGORIES OR WEB PAGE DOCUMENTS, AND/OR WEIGHTING SUCH FEATURES
§ 1. BACKGROUND OF THE INVENTION
§ 1.1 FIELD OF THE INVENTION
[0001] The present invention concerns advertising. In particular, the present invention concerns improving targeted advertising.
§ 1.2 BACKGROUND INFORMATION
[0002] Interactive advertising provides opportunities for advertisers to target their ads to a receptive audience. That is, targeted ads are more likely to be useful to end users since the ads may be relevant to a need inferred from some user activity (e.g., relevant to a user's search query to a search engine, relevant to content in a document requested by the user, etc.). Query keyword relevant advertising has been used by search engines. The AdWords advertising system by Google of Mountain View, CA is one example of query keyword relevant advertising. Similarly, content-relevant advertising systems, such as the AdSense advertising system by Google for example, have been used. For example, U.S. Patent Application Serial Numbers 10/314,427 (incorporated herein by reference and referred to as "the '427 application") titled "METHODS AND APPARATUS FOR SERVING RELEVANT ADVERTISEMENTS", filed on December 6, 2002 and listing Jeffrey A. Dean, Georges R. Harik and Paul Bucheit as inventors, and 10/375,900 (incorporated by reference and referred to as "the '900 application") titled "SERVING ADVERTISEMENTS BASED ON CONTENT," filed on February 26, 2003 and listing Darrell Anderson, Paul Bucheit, Alex Carobus, Claire Cui, Jeffrey A. Dean, Georges R. Harik, Deepak Jindal and Narayanan Shivakumar as inventors, describe methods and apparatus for serving ads relevant to the content of a document, such as a Web page for example.
[0003] When ads are to be served using some measure of their relevance to a document, relevance information about the document is needed. Such relevance information may be determined from information intrinsic to the document, such as content extracted from the document. For example, concepts or topics may be determined using the content of the document. The document may also be assigned to one or more clusters. (See, e.g., U.S. Provisional Application Serial No. 60/416,144 (incorporated herein by reference), titled "METHODS AND APPARATUS FOR PROBABILISTIC HIERARCHICAL INFERENTIAL LEARNER," filed on October 3, 2003.) In another example, feature vectors may be used to represent the occurrence of words and/or phrases in the document. Although such techniques for determining relevance information for documents have worked well, it is desirable to be able to provide additional relevance information, and/or to refine the relevance information to make it more useful.
[0004] Further, if ads are to be associated with categories (e.g., for targeting to document categories, for association with categorical listings, etc.) it would be useful to develop and/or test such associations. Similarly, if query terms are to be associated with categories (e.g., for generating a categorized result page in response to a search query), it would be useful to develop and/or test such associations.
[0005] In view of the foregoing, it would be useful to expand and/or refine document and/or category relevance information. More generally, it would be useful to associate features with entities, such as documents, categories, etc. It would also be useful to score (e.g., weight) such associations.
§ 2. SUMMARY OF THE INVENTION
[0006] Embodiments consistent with the present invention may be used to determine features that may be used to represent relevance information (e.g., properties, characteristics, etc.) of an entity, such as a document or category for example. Such features may be determined and associated with the entity by accepting an identifier that identifies the entity, obtaining search query information related to the entity using the entity identifier, determining features using the obtained query information, and associating the features determined with the entity. In at least some embodiments consistent with the present invention, such features may be determined for an entity using query information, and/or perhaps user action information. In at least some embodiments consistent with the present invention, in addition to, or instead of, query information, other serving parameter information may be used to determine and/or weight features.
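By way of illustration only, the acts summarized above (accepting an entity identifier, obtaining query information related to the entity, determining features, and associating them with the entity) might be sketched as follows. The in-memory log of (query, document identifier) pairs and the function names are hypothetical, and using each exact query as a feature is only one of the possible choices.

```python
from collections import Counter, defaultdict


def invert_log(query_log):
    """Turn (query, document_id) log records into document_id -> list of queries."""
    by_doc = defaultdict(list)
    for query, doc_id in query_log:
        by_doc[doc_id].append(query)
    return dict(by_doc)


def features_for(doc_id, by_doc):
    """Use each exact query associated with the document as a feature, with its count."""
    return Counter(by_doc.get(doc_id, []))


# Hypothetical log: each record pairs a query with a document it returned.
log = [
    ("clogged drain", "http://example.com/plumbing"),
    ("plumber", "http://example.com/plumbing"),
    ("clogged drain", "http://example.com/plumbing"),
    ("pizza", "http://example.com/food"),
]
by_doc = invert_log(log)
plumbing_features = features_for("http://example.com/plumbing", by_doc)
```

Here the counts stand in for unweighted feature frequencies; a fuller sketch would also carry user action information alongside each query.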
[0007] In at least some embodiments consistent with the present invention, weights of such features may be similarly determined. The weights may be determined using scores. In the context of document entities, the scores may be a function of one or more of (i) whether the document was selected, (ii) a user dwell time on a selected document, (iii) whether or not a conversion occurred with respect to the document, (iv) a frequency of queries including the feature, etc.
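By way of illustration only, a score that depends on factors (i) through (iv) above might be sketched as follows. The `Impression` record, the numeric increments, and the dwell-time thresholds are all assumptions chosen for this sketch; the specification does not prescribe particular values.

```python
from dataclasses import dataclass


@dataclass
class Impression:
    """One rendering of the document for a query containing the feature (hypothetical)."""
    selected: bool = False
    dwell_seconds: float = 0.0
    converted: bool = False


def impression_score(imp, short_dwell=5.0, long_dwell=60.0):
    """Score a single impression; all constants are illustrative assumptions."""
    score = 1.0  # the document was returned for a query including the feature
    if imp.selected:
        score += 2.0
        if imp.dwell_seconds >= long_dwell:
            score += 1.0   # long dwell time reinforces the selection
        elif imp.dwell_seconds < short_dwell:
            score -= 1.5   # very short dwell time discounts the selection
        if imp.converted:
            score += 3.0
    return score


def feature_score(impressions):
    """Summing over impressions makes more frequent features score higher."""
    return sum(impression_score(i) for i in impressions)
```

A weight could then be derived from the score by any monotonic function, for example by normalizing the scores of all features of a document so that they sum to one.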
[0008] In at least some embodiments consistent with the present invention, the document is a Web page. In at least some embodiments consistent with the present invention, the features are n-grams.
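By way of illustration only, n-gram features might be extracted from a query as sketched below. The whitespace tokenization, the default lengths, and the small stop-word set are assumptions made for this sketch.

```python
def ngrams(query, min_len=1, max_len=2,
           stop_words=frozenset({"the", "a", "an", "of"})):
    """Return all n-grams of length min_len..max_len from the query's content words."""
    tokens = [t for t in query.lower().split() if t not in stop_words]
    return [
        " ".join(tokens[i:i + n])
        for n in range(min_len, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]
```

Note that, in this sketch, stop words are removed before n-grams are formed, so words separated only by a stop word become adjacent; an implementation could equally form n-grams first and filter afterwards.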
[0009] In at least some embodiments consistent with the present invention, the relevance information of the document may be used to target the serving of advertisements with the document. In at least some other embodiments consistent with the present invention, the features of a category may be used to associate query terms and categories, and/or ads and categories.
[0010] In at least some embodiments consistent with the present invention, a score (e.g., a weight) associated with the feature-to-entity association may be updated by (i) using the feature-to-entity association to generate one or more results for presentation to a user, (ii) tracking user behavior with respect to the results, and (iii) updating the score associated with the feature-to-entity association using the tracked user behavior.
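By way of illustration only, updating the score of a feature-to-entity association from tracked user behavior might be sketched as follows. The additive adjustments, the clamping of scores to the range 0 to 1, and the dwell-time threshold are assumptions made for this sketch.

```python
def update_score(score, selected, dwell_seconds=0.0, long_dwell=60.0):
    """Adjust an association score in [0, 1] using one observed user behavior."""
    if not selected:
        delta = -0.05   # result was presented but not selected: weaken
    elif dwell_seconds >= long_dwell:
        delta = 0.10    # selected with a long dwell time: strengthen more
    else:
        delta = 0.02    # selected but with a quick return: strengthen slightly
    return min(1.0, max(0.0, score + delta))
```

For example, starting from a score of 0.5, a non-selection lowers the score, a selection followed by a quick return raises it slightly, and a selection followed by a long dwell time raises it further.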
§ 3. BRIEF DESCRIPTION OF THE DRAWINGS
[0011] Figure 1 is a block diagram illustrating an exemplary on-line advertising environment in which, or with which, the present invention may be used.
[0012] Figure 2 is a bubble diagram illustrating operations that may be performed, and information that may be generated, used, and/or stored, by a document feature generation and/or update system consistent with the present invention.
[0013] Figure 3 is a bubble chart illustrating operations that may be used with search operations to associate query terms and selections with documents in a manner consistent with the present invention.
[0014] Figure 4 is a bubble diagram illustrating operations that may be performed, and information that may be generated, used, and/or stored, by a document feature generation and/or update system consistent with the present invention.
[0015] Figure 5 is a flow diagram of an exemplary method that may be used to generate and/or update document feature information in a manner consistent with the present invention.
[0016] Figure 6 is a flow diagram of an exemplary method that may be used to generate and/or update document feature information in a manner consistent with the present invention.
[0017] Figure 7 is a block diagram of a machine that may perform one or more operations and store information used and/or generated in a manner consistent with the present invention.
[0018] Figure 8 is a diagram illustrating an example of how an exemplary embodiment consistent with the present invention can make associations between categories and query terms and/or ads.
§ 4. DETAILED DESCRIPTION
[0019] The present invention may involve novel methods, apparatus, message formats, and/or data structures for associating one or more features with an entity, such as a Web page document, or category for example, and/or applying and/or adjusting a score or weight to at least one of such features. The following description is presented to enable one skilled in the art to make and use the invention, and is provided in the context of particular applications and their requirements. Thus, the following description of embodiments consistent with the present invention provides illustration and description, but is not intended to be exhaustive or to limit the present invention to the precise form disclosed. Various modifications to the disclosed embodiments will be apparent to those skilled in the art, and the general principles set forth below may be applied to other embodiments and applications. For example, although a series of acts may be described with reference to a flow diagram, the order of acts may differ in other implementations when the performance of one act is not dependent on the completion of another act. Further, non-dependent acts may be performed in parallel. No element, act or instruction used in the description should be construed as critical or essential to the present invention unless explicitly described as such. Also, as used herein, the article "a" is intended to include one or more items. Where only one item is intended, the term "one" or similar language is used. Thus, the present invention is not intended to be limited to the embodiments shown and the inventors regard their invention as any patentable subject matter described.
[0020] In the following, definitions of terms that may be used in the specification are provided in § 4.1. Then, environments in which, or with which, the present invention may operate are described in § 4.2. Thereafter, exemplary embodiments consistent with the present invention are described in § 4.3. An example illustrating an operation in an exemplary embodiment consistent with the present invention is provided in § 4.4. Finally, some conclusions regarding the present invention are set forth in § 4.5.
§ 4.1. DEFINITIONS
[0021] Online ads may have various intrinsic features. Such features may be specified by an application and/or an advertiser. These features are referred to as "ad features" below. For example, in the case of a text ad, ad features may include a title line, ad text, and an embedded link. In the case of an image ad, ad features may include images, executable code, and an embedded link. Depending on the type of online ad, ad features may include one or more of the following: text, a link, an audio file, a video file, an image file, executable code, embedded information, etc.
[0022] When an online ad is served, one or more parameters may be used to describe how, when, and/or where the ad was served. These parameters are referred to as "serving parameters" below. Serving parameters may include, for example, one or more of the following: features of (including information on) a document on which, or with which, the ad was served, a search query or search results associated with the serving of the ad, a user characteristic (e.g., their geographic location, the language used by the user, the type of browser used, previous page views, previous behavior), a host or affiliate site (e.g., America Online, Google, Yahoo) that initiated the request, an absolute position of the ad on the page on which it was served, a position (spatial or temporal) of the ad relative to other ads served, an absolute size of the ad, a size of the ad relative to other ads, a color of the ad, a number of other ads served, types of other ads served, time of day served, time of week served, time of year served, etc. Naturally, there are other serving parameters that may be used in the context of the invention.
[0023] Although serving parameters may be extrinsic to ad features, they may be associated with an ad as serving conditions or constraints. When used as serving conditions or constraints, such serving parameters are referred to simply as "serving constraints" (or "targeting criteria"). For example, in some systems, an advertiser may be able to target the serving of its ad by specifying that it is only to be served on weekdays, no lower than a certain position, only to users in a certain location, etc. As another example, in some systems, an advertiser may specify that its ad is to be served only if a page or search query includes certain keywords or phrases. As yet another example, in some systems, an advertiser may specify that its ad is to be served only if a document being served includes certain topics or concepts, or falls under a particular cluster or clusters, or some other classification or classifications.
[0024] "Ad information" may include any combination of ad features, ad serving constraints, information derivable from ad features or ad serving constraints (referred to as "ad derived information"), and/or information related to the ad (referred to as "ad related information"), as well as an extension of such information (e.g., information derived from ad related information).
[0025] The ratio of the number of selections (e.g., clickthroughs) of an ad to the number of impressions of the ad (i.e., the number of times an ad is rendered) is defined as the "selection rate" (or "clickthrough rate") of the ad.
[0026] A "conversion" is said to occur when a user consummates a transaction related to a previously served ad. What constitutes a conversion may vary from case to case and can be determined in a variety of ways. For example, it may be the case that a conversion occurs when a user clicks on an ad, is referred to the advertiser's Website, and consummates a purchase there before leaving that Website. Alternatively, a conversion may be defined as a user being shown an ad, and making a purchase on the advertiser's Website within a predetermined time (e.g., seven days). In yet another alternative, a conversion may be defined by an advertiser to be any measurable/observable user action such as, for example, downloading a white paper, navigating to at least a given depth of a Website, viewing at least a certain number of Web pages, spending at least a predetermined amount of time on a Website or Web page, registering on a Website, etc. Often, if user actions don't indicate a consummated purchase, they may indicate a sales lead, although user actions constituting a conversion are not limited to this. Indeed, many other definitions of what constitutes a conversion are possible.
[0027] The ratio of the number of conversions to the number of impressions of the ad (i.e., the number of times an ad is rendered) is referred to as the "conversion rate." If a conversion is defined to be able to occur within a predetermined time since the serving of an ad, one possible definition of the conversion rate might only consider ads that have been served more than the predetermined time in the past.
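The two ratios defined above can be illustrated with a minimal sketch. The function names, the record layout (a list of `(served_at, converted)` pairs), and the seven-day window are assumptions made for illustration only; they are not taken from the specification.

```python
from datetime import datetime, timedelta


def selection_rate(selections: int, impressions: int) -> float:
    """Selections (e.g., clickthroughs) divided by impressions."""
    return selections / impressions if impressions else 0.0


def conversion_rate(impressions, now, window):
    """Conversions divided by impressions.

    `impressions` is a list of (served_at, converted) pairs (a hypothetical
    layout). Only impressions served more than `window` ago are counted, so
    that every counted impression has had the full window in which to convert.
    """
    mature = [converted for served_at, converted in impressions
              if now - served_at > window]
    return sum(mature) / len(mature) if mature else 0.0


now = datetime(2005, 1, 10)
window = timedelta(days=7)
log = [
    (datetime(2005, 1, 1), True),   # old enough, converted
    (datetime(2005, 1, 2), False),  # old enough, no conversion
    (datetime(2005, 1, 9), False),  # too recent, ignored
]
rate = conversion_rate(log, now, window)
```

Excluding recent impressions is one possible way to avoid understating the rate while conversions may still arrive.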
[0028] A "document" is to be broadly interpreted to include any machine-readable and machine-storable work product. A document may be a file, a combination of files, one or more files with embedded links to other files, etc. The files may be of any type, such as text, audio, image, video, etc. Parts of a document to be rendered to an end user can be thought of as "content" of the document. A document may include "structured data" containing both content (words, pictures, etc.) and some indication of the meaning of that content (for example, e-mail fields and associated data, HTML tags and associated data, etc.) Ad spots in the document may be defined by embedded information or instructions. In the context of the Internet, a common document is a Web page. Web pages often include content and may include embedded information (such as meta information, hyperlinks, etc.) and/or embedded instructions (such as JavaScript, etc.). In many cases, a document has a unique, addressable, storage location and can therefore be uniquely identified by this addressable location. A universal resource locator (URL) is a unique address used to access information on the Internet.
[0029] "Document information" may include any information included in the document, information derivable from information included in the document (referred to as "document derived information"), and/or information related to the document (referred to as "document related information"), as well as extensions of such information (e.g., information derived from related information). An example of document derived information is a classification based on textual content of a document. Examples of document related information include document information from other document(s) with links to the instant document, as well as document information from other document(s) to which the instant document links and document information from other document(s) related to the instant document.
[0030] Content from a document may be rendered on a "content rendering application or device". Examples of content rendering applications or devices include an Internet browser (e.g., Explorer or Netscape), a media player (e.g., an MP3 player, a Realnetworks streaming audio file player, etc.), a viewer (e.g., an Adobe Acrobat pdf reader), etc.
[0031] A "content owner" is a person or entity that has some property right in the content of a document. A content owner may be an author of the content. In addition, or alternatively, a content owner may have rights to reproduce the content, rights to prepare derivative works of the content, rights to display or perform the content publicly, and/or other proscribed rights in the content. Although a content server might be a content owner in the content of the documents it serves, this is not necessary.
[0032] "User information" may include user behavior information and/or user profile information.
[0033] "E-mail information" may include any information included in an e-mail (also referred to as "internal e-mail information"), information derivable from information included in the e-mail and/or information related to the e-mail, as well as extensions of such information (e.g., information derived from related information). An example of information derived from e-mail information is information extracted or otherwise derived from search results returned in response to a search query composed of terms extracted from an e-mail subject line. Examples of information related to e-mail information include e-mail information about one or more other e-mails sent by the same sender of a given e-mail, or user information about an e-mail recipient. Information derived from or related to e-mail information may be referred to as "external e-mail information."
§ 4.2 ENVIRONMENTS IN WHICH, OR WITH WHICH, THE PRESENT INVENTION MAY OPERATE
[0034] Figure 1 illustrates an exemplary environment 100 in which, or with which, the present invention may be used. A user device (also referred to as a "client" or "client device") 150 may include a browser facility (such as the Explorer browser from Microsoft, the Opera Web Browser from Opera Software of Norway, the Navigator browser from AOL/Time Warner, etc.), an e-mail facility (e.g., Outlook from Microsoft), or any other software application or hardware device used to render content. A search engine 120 may permit user devices 150 to search collections of documents (e.g., Web pages). A content server 130 may permit user devices 150 to access (e.g., for rendering) documents. An e-mail server (such as Hotmail from Microsoft Network, Yahoo Mail, GMail from Google, etc.) 140 may be used to provide e-mail functionality to user devices 150. An ad server 110 may be used to serve ads to user devices 150. The ads may be served in association with search results provided by the search engine 120. Content-relevant ads may be served in association with content provided by the content server 130, and/or e-mail supported by the e-mail server 140 and/or user device 150 e-mail facilities. Thus, the ad server 110 may be a content-relevant ad server, such as those described in the '427 and '900 applications introduced above.
[0035] As discussed in the '900 application (introduced above), ads may be targeted to documents served by content servers. Thus, a content server 130 that receives requests for documents (e.g., articles, discussion threads, music, video, graphics, search results, Web page listings, etc.), and retrieves the requested document in response to, or otherwise services, the request may consume ads. The content server 130 may submit a request for ads to the ad server 110. Alternatively, or in addition, a user device 150 may submit such a request. Alternatively, or in addition, a Web-based e-mail server 140 may submit such a request. Such an ad request may include a number of ads desired. The ad request may also include document request information. This information may include the document itself (e.g., a Web page), a category or topic corresponding to the content of the document or the document request (e.g., arts, business, computers, arts-movies, arts-music, etc.), part or all of the document request, content age, content type (e.g., text, graphics, video, audio, mixed media, etc.), geolocation information, end user local time information, document information (such as document features for example), etc.
[0036] The content server 130, Web-based e-mail server 140, and/or user device 150 may combine the requested document with one or more of the advertisements provided by the ad server 110. This combined information including the document content and advertisement(s) is then forwarded towards, and/or rendered on, the end user device 150 that requested the document, for presentation to the user. Alternatively, or in addition, the ad(s) may be combined with, or rendered with, the requested document in some other way (e.g., by the client device). Finally, the content server 130 or Web-based e-mail server 140 may transmit information about the ads and how, when, and/or where the ads are to be rendered (e.g., position, clickthrough or not, impression time, impression date, size, conversion or not, etc.) back to the ad server 110. Alternatively, or in addition, such information may be provided back to the ad server 110 by some other means. Consistent with the present invention, the ad server 110 may store ad performance information.
[0037] A search engine 120 may receive queries for search results and may consume ads. In response, the search engine may retrieve relevant search results (e.g., from an index of Web pages). An exemplary search engine is described in the article S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Search Engine," Seventh International World Wide Web Conference, Brisbane, Australia and in U.S. Patent No. 6,285,999 (both incorporated herein by reference). Such search results may include, for example, lists of Web page titles, snippets of text extracted from those Web pages, and hypertext links to those Web pages, and may be grouped into a predetermined number of (e.g., ten) search results.
[0038] The search engine 120 may submit a request for ads to the ad server 110. The request may include a number of ads desired. This number may depend on the search results, the amount of screen or page space occupied by the search results, the size and shape of the ads, etc. In one embodiment, the number of desired ads will be from one to ten, and preferably from three to five. The request for ads may also include the query (as entered or parsed), information based on the query (such as end user local time information, geolocation information, whether the query came from an affiliate and an identifier of such an affiliate), and/or information associated with, or based on, the search results. Such information may include, for example, identifiers related to the search results (e.g., document identifiers or "docIDs"), scores related to the search results (e.g., information retrieval ("IR") scores such as dot products of feature vectors corresponding to a query and a document, Page Rank scores, and/or combinations of IR scores and Page Rank scores), snippets of text extracted from identified documents (e.g., Web pages), full text of identified documents, topics of identified documents, feature vectors of identified documents, etc.
[0039] The search engine 120 may combine the search results with one or more of the advertisements provided by the ad server 110. Alternatively, or in addition, the ad(s) may be combined with, or rendered with, the requested document in some other way (e.g., by the client device). This combined information including the search results and advertisement(s) is then forwarded towards the user that submitted the search, for presentation to the user. Preferably, the search results are maintained as distinct from the ads, so as not to confuse the user between paid advertisements and presumably neutral search results.
[0040] Finally, the search engine 120 may transmit information about the ad and when (e.g., end user local time), where (e.g., geolocation), and/or how the ad was to be rendered (e.g., position, click-through or not, impression time, impression date, size, conversion or not, etc.) back to the ad server 110. Alternatively, or in addition, such information may be provided back to the ad server 110 by some other means. Consistent with the present invention, the search engine 120 may also associate search query information (and/or other serving parameter information) with the documents associated with search results, documents associated with ads, and/or ads. The search engine 120 may also associate the search query information with user actions (e.g., selections, dwell time, etc.) with respect to the documents linked from the search result pages, and/or user actions (e.g., selections, conversions, etc.) with respect to the ads rendered with the search results pages.
[0041] The Web-based e-mail server 140 may be thought of, generally, as a content server in which a document served is simply an e-mail. Further, e-mail applications (such as Microsoft Outlook for example) may be used to send and/or receive e-mail. Therefore, a Web-based e-mail server 140 or a client device 150 application may be thought of as an ad consumer. Thus, e-mails may be thought of as documents, and targeted ads may be served in association with such documents. For example, one or more ads may be served in, under, over, or otherwise in association with an e-mail.
[0042] The various servers may exchange information via one or more networks 160, such as the Internet for example.
§ 4.3 EXEMPLARY EMBODIMENTS
§ 4.3.1 OVERVIEW
[0043] The present invention permits features, such as keywords or topics, to be associated with entities, such as Web pages or categories. (Generally, entities (or representatives of entities) can be put on a result page, and can be acted on by users.) Such associations may be used for a variety of reasons, such as, for example, targeting ads, suggesting targeting features for an advertisement for presentation to advertisers, automatically generating targeting criteria for an advertisement, etc. In some embodiments consistent with the present invention, features are associated with entities using search engine query logs, search engine referrals, and/or other user actions with respect to documents associated with a search results page. Methods and apparatus consistent with the present invention can improve the effectiveness of marketing campaigns, and can reduce the amount of work (and cost) in running a campaign.
[0044] Figure 2 is a bubble diagram illustrating operations 235 that may be performed, and information that may be generated, used, and/or stored, by a document feature generation and/or update system consistent with the present invention, as well as operations for generating information used by such operations 235. As shown, operations 235 may accept a document identifier (such as, for example, a URL if the document is a Web page) 220, use the document identifier 220 to obtain query (and/or user action) information 210 associated with the document, and generate and/or update features (and perhaps weights) for the document 260 using the obtained query (and/or user action) information. More specifically, document query information lookup operations 230 may use the document identifier 220 to look up query (and/or user action) information 240 pertaining to the identified document 220 from stored information 210. Document feature (vector) generation/update operations 250 may then use this query (and/or user action) information 240 to generate features (and perhaps weights) 260 associated with the identified document.
[0045] In the foregoing example, it was assumed that the document identifier to query (and/or user action) information association information 210 was available. This information may have been generated by the operations illustrated above the dashed line 299. For example, query (and/or user action) logging operations 270 may be used to generate an aggregated log of query to document associations, and perhaps user action (including inaction) to document associations 280. Index inverting operations 290 may be used to generate the document identifier to query (and/or user action) information associations 210 from the aggregated log of query to document associations, and perhaps user action (including inaction) to document associations 280.
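The index inverting operations 290 described above can be sketched as follows. This is a minimal illustration, assuming that each logged record is a hypothetical `(query, doc_id, action)` tuple; the record layout and the name `invert_log` are not taken from the specification.

```python
from collections import defaultdict


def invert_log(log):
    """Turn a query-keyed log into document-keyed associations.

    `log` is an iterable of (query, doc_id, action) records, where `action`
    is a label such as "selected" or "not_selected" (assumed values).
    Returns a dict mapping each document identifier to the list of
    (query, action) pairs logged for it.
    """
    by_doc = defaultdict(list)
    for query, doc_id, action in log:
        by_doc[doc_id].append((query, action))
    return dict(by_doc)


log = [
    ("digital camera", "http://example.com/cameras", "selected"),
    ("digital camera", "http://example.com/phones", "not_selected"),
    ("camera bag", "http://example.com/cameras", "selected"),
]
associations = invert_log(log)
```

With the log inverted once, the lookup for a given document identifier becomes a single dictionary access instead of a scan of the whole log.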
[0046] Figure 3 is a bubble chart illustrating operations that may be used with search operations to associate query terms and selections with documents in a manner consistent with the present invention. In response to a search query 320, search operations 310 use term to document inverted index information 340 and perhaps search ranking information 350 to generate a search results document 330. The document 330 may include one or more search results 360. The document 330 may also include one or more ads 370. The search results 360 and/or ads 370 may be selected as indicated by cursor click 380. Referring back to Figure 2, query (and/or user action) logging operations 270 may be used to log associations between query information 320 and document identifiers (such as URLs or ad identifiers for example) corresponding to the search results 360 and/or ads 370. These operations 270 may also be used to log associations between user actions (e.g., selections, conversions, dwell time, etc.) and document identifiers (such as URLs or ad identifiers for example) corresponding to the search results 360 and/or ads 370.
[0047] Although performance is improved when an index is used, such an index is not required. For example, features (and perhaps weights) for a document may be derived directly from query (and perhaps user action) information associated with the document. Figure 4 is a bubble diagram illustrating operations that may be performed, and information that may be generated and/or stored, by a document feature generation and/or update system consistent with the present invention. Document feature generation/update operations 420 may use query (and perhaps user action) information to document associations 410 to generate or update features (and perhaps weights) associated with document identifiers 430. Although not necessary, indexing operations 440 may use this information 430 to generate an index of document identifiers to (weighted) features association information 450.
§ 4.3.2 EXEMPLARY METHODS
[0048] Figure 5 is a flow diagram of an exemplary method 500 that may be used to generate and/or update document feature information in a manner consistent with the present invention. A document identifier (e.g., a URL of a Web page) is accepted (Block 510) and query information (and/or user action information) associated with the identified document is obtained (Block 520). As indicated by bracket 230', blocks 510 and 520 exemplify a method, consistent with the present invention, which may be used to perform the document query information lookup operations 230 of Figure 2. Then, features and/or weights are generated using the obtained query information (and/or user action information) (Block 530), and the features, perhaps weighted features, are stored in association with the document (Block 540) before the method 500 is left (Node 550). As indicated by bracket 250', blocks 530 and 540 exemplify a method, consistent with the present invention, which may be used to perform the document feature generation/update operations 250 of Figure 2.
[0049] Figure 6 is a flow diagram of an exemplary method 600 that may be used to generate and/or update document feature information in a manner consistent with the present invention. Query (and perhaps user action) information for a document is accepted. (Block 610) If any (weighted) feature information already exists for the document, it may be accepted. (Block 620) For example, the method 600 may be used to update already existing document (weighted) feature information. New (weighted) feature information is then determined for the document, or existing (weighted) feature information for the document is updated. (Block 630) The determined and/or updated (weighted) features are then stored in association with the document (Block 640) before the method 600 is left (Node 650).
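The flow of method 600 can be sketched in a few lines. This is only an illustration under stated assumptions: the store is a plain dictionary, features are lowercase unigrams, and each occurrence adds 1.0 to a feature's weight. None of these choices is prescribed by the specification.

```python
def update_document_features(store, doc_id, queries):
    """Update (or create) weighted features for one document.

    `store` maps document identifiers to {feature: weight} dicts and
    `queries` is the accepted query information (Block 610).
    """
    # Block 620: accept any existing weighted feature information.
    features = dict(store.get(doc_id, {}))
    # Block 630: determine new features or update existing weights.
    for query in queries:
        for term in query.lower().split():
            features[term] = features.get(term, 0.0) + 1.0
    # Block 640: store the features in association with the document.
    store[doc_id] = features
    return features


store = {"http://example.com/cameras": {"camera": 2.0}}
updated = update_document_features(
    store, "http://example.com/cameras", ["Digital camera", "camera bag"]
)
```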
[0050] In one embodiment consistent with the present invention, the features may be unigrams and n-grams, the document may be a Web page and the document identifier may be a URL of the Web page. Alternatively, or in addition, the features may be keywords, such as keywords used for targeting ads for example. Alternatively, or in addition, the features may be concepts, such as concepts used for targeting ads for example. The features may have associated weights in which higher weights indicate features more closely associated with the Web page. Thus, the Web page may have an associated weighted feature vector generated and/or updated by embodiments consistent with the present invention.
[0051] Methods consistent with the present invention, such as the methods 500 and 600, may be performed for a number of Web pages. Thus, the methods 500 and 600 may be performed for each URL u in a plurality of URLs. In an exemplary embodiment, a plurality of queries Q are retrieved from a plurality of logged queries that returned the URL in a list of search results. (Note that if the document is an ad, or a Web page linked from an ad, the plurality of queries Q may be retrieved from a plurality of logged queries that returned the ad in a set of one or more ads rendered on the search results page.) Features from the queries may be used to populate (and/or update weights of) a feature vector associated with the URL. In one embodiment, only information from queries under which a URL selection occurred is used to populate (and/or update weights of) a feature vector associated with the URL. In yet another embodiment, information from all queries that returned the URL in a list of search results is used to populate (and/or update weights of) a feature vector associated with the URL, but a user action is used to weight the features. For example, information from a query that led to a selected URL may be weighted more than information from a query that led to a rendered URL that was not selected. Other user actions may also affect the feature weight. For example, the feature may be weighted more if a long dwell time occurred after selection than if a short dwell time occurred after selection. As another example, the feature may be weighted more if a conversion occurred after selection of a URL than if no conversion occurred after selection of a URL.
[0052] Different embodiments may select different features associated with the appropriate queries. For example, one embodiment consistent with the present invention may use all exact queries as associated features. As another example, another embodiment consistent with the present invention may use all n-grams from length l_1 to length l_2 as associated features (optionally with "stop" words and/or non-content words such as "the" removed). In many cases, there will be a set of features that "best" specify a document. If the features are scored and weighted such that the sum of the weights equals 1.00, one embodiment consistent with the present invention would be to take the features with the best weights until the sum of their weights reaches some value (e.g., 0.80). In an alternative embodiment consistent with the present invention, features with weights less than a predetermined percentage (e.g., 20%) of the weight of the best feature could be ignored. Still other embodiments consistent with the present invention may use some combination of the foregoing concepts (e.g., filtering features using absolute and/or relative weight or score thresholds) to obtain the "best" set of features for a given document, or to filter out features without a strong affinity to the document.
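The feature selection alternatives described above (n-grams within a length range, taking the best features until a cumulative weight is reached, and ignoring features far weaker than the best one) can be sketched as follows. The stop-word list, the function names, and the tie-breaking by feature name are illustrative assumptions, not part of the specification.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "for"}  # illustrative list only


def ngrams(query, l1, l2):
    """All n-grams of length l1..l2 from a query, stop words removed."""
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    return [" ".join(terms[i:i + n])
            for n in range(l1, l2 + 1)
            for i in range(len(terms) - n + 1)]


def normalized_weights(queries, l1=1, l2=2):
    """Count n-grams over all queries and scale weights to sum to 1."""
    counts = Counter(g for q in queries for g in ngrams(q, l1, l2))
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()} if total else {}


def best_features(weights, mass=0.80):
    """Take the heaviest features until their weights sum to `mass`."""
    chosen, running = [], 0.0
    for feature, w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        if running >= mass:
            break
        chosen.append(feature)
        running += w
    return chosen


def drop_weak(weights, fraction=0.20):
    """Ignore features weighing less than `fraction` of the best weight."""
    if not weights:
        return {}
    cutoff = fraction * max(weights.values())
    return {f: w for f, w in weights.items() if w >= cutoff}


weights = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
top = best_features(weights, mass=0.80)
```

The two filters can be applied separately or in sequence, which corresponds to the combination of absolute and relative thresholds mentioned above.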
[0053] The (e.g., weighted) features associated with a document may be used in a variety of ways. For example, the features may be used as document relevance information when determining a match (e.g., a similarity) to an ad in a content-relevant ad server such as the one described in the '900 patent application. As another example, the features may be used to provide or suggest keywords (e.g., used for an ad where the ad is the document, or wherein a landing page of the ad is the document).
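One way the weighted features might serve as document relevance information is as one side of a similarity computation against an ad's targeting features. The sketch below uses a dot product over sparse feature dictionaries; the choice of dot product, the ad identifiers, and the function names are assumptions for illustration and are not drawn from the '900 application.

```python
def match_score(doc_features, ad_features):
    """Dot product of two sparse {feature: weight} vectors."""
    return sum(w * ad_features.get(f, 0.0) for f, w in doc_features.items())


def rank_ads(doc_features, ads):
    """Order ad identifiers by descending match score (ties by identifier)."""
    return sorted(ads, key=lambda ad_id: (-match_score(doc_features, ads[ad_id]), ad_id))


doc = {"camera": 0.5, "digital": 0.25}
ads = {
    "ad1": {"camera": 1.0},
    "ad2": {"garden": 1.0},
    "ad3": {"camera": 1.0, "digital": 1.0},
}
ranking = rank_ads(doc, ads)
```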
§ 4.3.3 EXEMPLARY APPARATUS
[0054] Figure 7 is a high-level block diagram of a machine 700 that may perform one or more of the operations discussed above. One or more such machines 700 may be used as a content-relevant ad server, a separate server, client devices, etc. The machine 700 basically includes one or more processors 710, one or more input/output interface units 730, one or more storage devices 720, and one or more system buses and/or networks 740 for facilitating the communication of information among the coupled elements. One or more input devices 732 and one or more output devices 734 may be coupled with the one or more input/output interfaces 730.
[0055] The one or more processors 710 may execute machine-executable instructions (e.g., C or C++ running on the Solaris operating system available from Sun Microsystems Inc. of Palo Alto, California or the Linux operating system widely available from a number of vendors such as Red Hat, Inc. of Durham, North Carolina) to effect one or more aspects of the present invention. At least a portion of the machine-executable instructions may be stored (temporarily or more permanently) on the one or more storage devices 720 and/or may be received from an external source via one or more input interface units 730.
[0056] In one embodiment, the machine 700 may be one or more conventional personal computers. In this case, the processing units 710 may be one or more microprocessors. The bus 740 may include a system bus. The storage devices 720 may include system memory, such as read only memory (ROM) and/or random access memory (RAM). The storage devices 720 may also include a hard disk drive for reading from and writing to a hard disk, a magnetic disk drive for reading from or writing to a (e.g., removable) magnetic disk, and an optical disk drive for reading from or writing to a removable (magneto-) optical disk such as a compact disk or other (magneto-) optical media.
[0057] A user may enter commands and information into the personal computer through input devices 732, such as a keyboard and pointing device (e.g., a mouse) for example. Other input devices such as a microphone, a joystick, a game pad, a satellite dish, a scanner, or the like, may also (or alternatively) be included. These and other input devices are often connected to the processing unit(s) 710 through an appropriate interface 730 coupled to the system bus 740. The output devices 734 may include a monitor or other type of display device, which may also be connected to the system bus 740 via an appropriate interface. In addition to (or instead of) the monitor, the personal computer may include other (peripheral) output devices (not shown), such as speakers and printers for example.
[0058] Each of the ad server 110, the search engine 120, the content server 130, the e-mail server 140, and the user device 150, etc., may be embodied by one or more such machines 700.
§ 4.3.4 REFINEMENTS AND ALTERNATIVES
[0059] Although the method 600 of Figure 6 was described in the context of determining and/or updating (e.g., the weight of) unigram or n-gram to document associations, embodiments consistent with the present invention may be used to determine and/or update (the weight of) other feature-to-entity associations (e.g., keyword-to-category associations, category-to-ad associations, etc.). First, a feature-to-entity association is accepted or generated. Then, the association is used to generate (e.g., a document with) results. For example, keyword-to-category associations may be used to determine a Web page with selectable category listings in response to a query including the keyword. As another example, category-to-ad associations may be used to determine a Web page including one or more ads when a category is selected (or if the Web page has content that pertains to the category). User behavior with respect to the results (e.g., selection or not, conversion or not, dwell time, etc.) may be tracked. The tracked user behavior may then be used to update (e.g., the weight of, generally referred to as the "score" of) the feature-to-entity association.
[0060] Thus, suppose for example that three keyword-to-category associations were used to generate a Web page with three selectable category links. Suppose further that the user selected the first category link but quickly returned. Now suppose that the user selected the second category link and dwelled on the linked page. Finally, suppose that the user did not select the third category link. The keyword-to-first category association may be somewhat strengthened (e.g., due to the user selection), but not too much (e.g., due to the short dwell time and quick return), the keyword-to-second category association may be strengthened to a greater degree (e.g., due to the user selection and long dwell time), and the keyword-to-third category association may be weakened (e.g., due to the fact that the user did not select the third category link).
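The three-link scenario above can be expressed as a small update rule. The multipliers (0.9, 1.1, 1.5) and the 30-second boundary between a short and a long dwell are arbitrary values assumed for illustration; the specification does not give any particular numbers.

```python
def updated_score(score, selected, dwell_seconds=0.0, long_dwell=30.0):
    """Adjust a feature-to-entity association score from one observation.

    Not selected: weaken. Selected with a short dwell: strengthen slightly.
    Selected with a long dwell: strengthen more. All constants are
    hypothetical.
    """
    if not selected:
        return score * 0.9
    if dwell_seconds >= long_dwell:
        return score * 1.5
    return score * 1.1


first = updated_score(1.0, selected=True, dwell_seconds=3.0)    # quick return
second = updated_score(1.0, selected=True, dwell_seconds=120.0)  # long dwell
third = updated_score(1.0, selected=False)                       # not selected
```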
[0061] Refinements of, and alternatives to, the embodiments described above are possible. Each of the features may be given a score. The score may be used to determine a weight to assign to the feature, and/or to filter features. For example, a feature with a higher score may receive a higher weight, while a feature with a lower score may receive a lower weight. Weight should be a monotonic function of score, but need not be linear. The score may also be compared with a given (e.g., predetermined) threshold. If the score for the feature is below the threshold, the feature may be removed from association with the document, or it may be weighted to zero. The threshold may be absolute, and/or relative. For example, an absolute threshold might filter out a feature if its score did not exceed a predetermined value, while a relative threshold might filter out a feature that was not one of the top twenty features for the document.
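The absolute and relative thresholds, and a monotonic but non-linear score-to-weight mapping, might look like the sketch below. The logarithmic mapping and the parameter names `min_score` and `top_k` are assumptions chosen for illustration.

```python
import math


def filter_features(scores, min_score=None, top_k=None):
    """Apply an absolute threshold and/or a relative (top-k) threshold.

    A feature survives the absolute threshold only if its score exceeds
    `min_score`; the relative threshold keeps the `top_k` best survivors.
    """
    kept = {f: s for f, s in scores.items()
            if min_score is None or s > min_score}
    if top_k is not None:
        ranked = sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        kept = dict(ranked)
    return kept


def weight(score):
    """A monotonic, non-linear mapping from score to weight."""
    return math.log1p(max(score, 0.0))


scores = {"camera": 5.0, "digital": 2.0, "bag": 1.0}
kept = filter_features(scores, min_score=1.0, top_k=20)
```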
[0062] The score may be a function of one or more of (a) a frequency of the feature with respect to the document, (b) a user action with respect to the document, (c) feature scores of related or similar documents, (d) total frequency and inverse document frequency of the feature, (e) general performance (e.g., selection rate, conversion rate, etc. across all queries) of the document, etc. Examples of each of these factors are described below.
Frequency
[0063] The feature score may be a function of the frequency of the feature (e.g., generated from query information). More frequent features may be given a higher score for example. The feature score may also be a function of the frequency of selections (e.g., clickthroughs) and/or queries for that term.
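A frequency-based score of this kind might be sketched as below, assuming the features are word n-grams drawn from the queries that led to the document. The helper names, the limit of two words per n-gram, and the sample queries are assumptions for illustration only.

```python
from collections import Counter
from typing import Iterable, List


def ngrams(query: str, max_n: int = 2) -> List[str]:
    """Return all word n-grams of the query, for n = 1 .. max_n."""
    tokens = query.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def frequency_scores(queries: Iterable[str], max_n: int = 2) -> Counter:
    """Score each n-gram by how often it occurs across the given queries."""
    counts: Counter = Counter()
    for query in queries:
        counts.update(ngrams(query, max_n))
    return counts


# Hypothetical queries that each produced a clickthrough to the same document.
scores = frequency_scores(["clogged drain", "clogged drain repair", "drain snake"])
```

With these queries, "drain" occurs three times and "clogged drain" twice, so both would score above one-off features such as "drain snake".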
User Action
[0064] The feature score may be a function of a user action with respect to the document. For example, if the user selected the document when it was rendered on a search results page to a query, features from the query would be scored higher than if the document were not selected. As another example, if the user completed a transaction at a document when it was rendered on a search results page to a query, features from the query would be scored higher than if no conversion took place on the document. Dwell time may also be considered. For example, if the user selects and dwells on the document for a long period of time when it was rendered on a search results page to a query, features from the query would be scored higher than if the document were selected but the user only dwelled on the document for a short period of time. Indeed, a very short dwell time may be used to discount a score enhanced by the fact that a user selected the document.
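One possible way to turn these user actions into a per-impression credit is sketched below. The record type `Impression`, the credit values, and the dwell cutoffs of 5 and 60 seconds are all hypothetical; the description states only the ordering (selection above non-selection, conversion above non-conversion, long dwell above short dwell).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Impression:
    """One rendering of a document on a search results page (hypothetical record)."""
    query: str
    selected: bool = False
    converted: bool = False
    dwell_seconds: Optional[float] = None


def action_credit(imp: Impression, short_dwell: float = 5.0,
                  long_dwell: float = 60.0) -> float:
    """Credit given to the query's features for one impression."""
    if not imp.selected:
        return 0.0
    credit = 1.0                  # the document was selected
    if imp.converted:
        credit += 2.0             # a conversion counts for more than a bare selection
    if imp.dwell_seconds is not None:
        if imp.dwell_seconds < short_dwell:
            credit *= 0.25        # a very short dwell discounts the selection
        elif imp.dwell_seconds >= long_dwell:
            credit += 1.0         # a long dwell adds credit
    return credit
```

Summing such credits over all impressions for a query would give each of that query's features a score that reflects selections, conversions, and dwell time together.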
Feature Scores of Related or Similar Documents
[0065] Since there may be few queries and/or user actions (e.g., selections, conversions, etc.) for some documents, it may be desirable to group documents together and treat them collectively, applying features and weights or scores across more than one document of the group. Documents may be grouped with other documents in various ways. For example, for Web page documents, it may be desirable to combine the analysis for multiple URLs on a Website, for URLs within a directory, URLs on similar topics, linked documents, etc. As a more specific example, all URLs on a Website may be grouped together, and all queries (and user actions) that lead to the Website are used to find features for Web pages of the entire Website. Similar pages may be computed using, for example, TF-IDF.
[0066] Consider URL u, a set of other URLs within the same directory of the Website S_1, a set of all URLs on the same Website S_2, and a set of all URLs with similar content S_3. Consider n-gram features T within queries that resulted in a clickthrough event to the URL u. A score S_t can be assigned for each term t in T, for example, as follows:
S_t = w_1 * f(S_1) + w_2 * f(S_2) + w_3 * f(S_3) + w_4 * f(u)
where f(S) is a function of the queries and user actions corresponding to URLs within set S. For example, as above, f(S) may factor in the number of occurrences of term t, user selections, and dwell times on the URL or site that the user clicked through to. Weights w_1 to w_4 allow the contribution of each set to vary.
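The weighted sum above might be computed as in the following sketch. The form of f used here (the mean number of clicked queries containing the term, per URL in the set), the click-log layout, the weight values, and the example URLs are all assumptions; the description leaves f and the weights open.

```python
from typing import Dict, Iterable, List, Sequence

ClickLog = Dict[str, List[str]]  # URL -> queries that led to a clickthrough


def f(term: str, urls: Iterable[str], log: ClickLog) -> float:
    """Assumed f(S): mean count of clicked queries containing the term, per URL in S."""
    urls = list(urls)
    if not urls:
        return 0.0
    hits = sum(1 for u in urls for q in log.get(u, []) if term in q.lower())
    return hits / len(urls)


def term_score(term: str, u: str, s1: Sequence[str], s2: Sequence[str],
               s3: Sequence[str], log: ClickLog,
               weights: Sequence[float] = (0.2, 0.1, 0.1, 0.6)) -> float:
    """S_t as the weighted sum over directory, site, similar-content set, and u itself."""
    w1, w2, w3, w4 = weights
    return (w1 * f(term, s1, log) + w2 * f(term, s2, log)
            + w3 * f(term, s3, log) + w4 * f(term, [u], log))


# Hypothetical click log for three pages of one site.
log = {
    "site.example/a/x": ["clogged drain"],
    "site.example/a/y": ["clogged drain fix", "drain cleaner"],
    "site.example/b/z": [],
}
u = "site.example/a/x"
score = term_score(
    "clogged drain", u,
    s1=["site.example/a/y"],           # other URLs in the same directory
    s2=list(log),                      # all URLs on the same site
    s3=[],                             # no similar-content URLs known
    log=log,
)
```

Giving w_4 the largest value, as in this sketch, lets evidence for the URL itself dominate while sparse pages still borrow some signal from their directory and site.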
[0067] Another improvement is possible by considering the probability of a user action (e.g., selection) for a URL for a term or query. In this case, the expected user action (e.g., selection) can be compared based on the position of a URL in the result list, with the actual user action (e.g., selection). Features may be weighted according to their user action (e.g., selection) rate, with features that result in user action rates above the average (expected) rate being given higher weights, and features that result in user action rates below the average (expected) given lower weights.
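The comparison of actual against expected user action rates might be sketched as below. The per-position baseline rates and the function name are hypothetical; in practice the expected rate for each result position would be measured rather than fixed.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical baseline: average selection rate at each result-list position.
EXPECTED_RATE: Dict[int, float] = {1: 0.30, 2: 0.15, 3: 0.10}


def relative_selection_rate(events: Iterable[Tuple[int, bool]],
                            expected: Dict[int, float] = EXPECTED_RATE) -> float:
    """Ratio of actual to expected selections for one feature and one URL.

    events holds (position, selected) pairs. A result above 1.0 means the
    URL was selected more often than its positions alone would predict.
    """
    actual = 0
    expected_total = 0.0
    for position, selected in events:
        actual += int(selected)
        expected_total += expected.get(position, 0.0)
    return actual / expected_total if expected_total else 1.0


# Selected twice in four showings at position 3: well above the baseline.
above = relative_selection_rate([(3, True), (3, False), (3, True), (3, False)])
# Selected once in ten showings at position 1: well below the baseline.
below = relative_selection_rate([(1, True)] + [(1, False)] * 9)
```

A feature's weight could then be raised when this ratio exceeds 1.0 and lowered when it falls below 1.0.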
Levels of Tracking
[0068] The features and/or feature scores associated with a document may be tracked generally, over all users, or may be tracked per user group, or per individual user. That is, it may be desirable to segment the query and user action data for different types of users in order to create different sets of associated features that may subsequently be used with the different types of users. For example, information may be tracked and aggregated per user group (e.g., users within different demographics, users with similar interests, or individual users). For example, a separation by age groups may result in different features being the best associated features for a specific document. Similarly, if detailed information is available for the interests of a user, the associated features may be biased toward the interests of that user, for example by increasing the weight of features in the analysis above according to the weight of those features for the interests of the user.
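Tracking at several levels at once might be sketched as follows. The class `FeatureTracker`, the segment labels, and the sample data are assumptions for illustration; the description does not prescribe a storage layout.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


class FeatureTracker:
    """Accumulates feature scores per document, overall and per user segment."""

    def __init__(self) -> None:
        # (segment, document) -> feature -> accumulated score
        self._scores: DefaultDict[Tuple[str, str], Dict[str, float]] = defaultdict(
            lambda: defaultdict(float))

    def record(self, document: str, feature: str, credit: float,
               segments: Iterable[str] = ()) -> None:
        """Add credit under the catch-all segment and each listed segment."""
        for segment in ("all", *segments):
            self._scores[(segment, document)][feature] += credit

    def top_features(self, document: str, segment: str = "all",
                     k: int = 3) -> List[str]:
        """Return the k highest-scoring features for the document in a segment."""
        scores = self._scores.get((segment, document), {})
        return sorted(scores, key=scores.get, reverse=True)[:k]


tracker = FeatureTracker()
doc = "shop.example/phones"  # hypothetical document
tracker.record(doc, "ringtones", 3.0, segments=["age:18-24"])
tracker.record(doc, "large buttons", 2.0, segments=["age:65+"])
tracker.record(doc, "large buttons", 2.0, segments=["age:65+"])
```

In this example the best feature for the document differs by age group, which is the situation the paragraph above describes.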
Data Structures
[0069] Referring back to Figure 2, different information associations 210 may be stored and/or accessed, depending on the particular embodiment used. For example, the information associations 210 may include one or more of (i) whether or not the document was selected, (ii) qualitative or quantitative dwell time information, (iii) query frequency, (iv) query parts, (v) document site information, (vi) document directory information, (vii) document group information, (viii) user information, etc.
Features
[0070] Instead of, or in addition to, search query information corresponding to a document, other serving parameters, such as those listed in § 4.1. above for example, may be used to assign and/or weight features.
§ 4.4 OPERATIONAL EXAMPLE OF AN EXEMPLARY EMBODIMENT
[0071] Figure 8 is a diagram illustrating an example of how an exemplary embodiment consistent with the present invention can be used to associate features (such as terms, n-grams, etc.) with entities (such as categories). As shown, in this exemplary embodiment, a query processor 820 returns a document 830 in response to received query information 810. The query information 810 may include search query terms. The document may include one or more of (a) search results 832 including links to documents 840, (b) keyword targeted and/or category targeted ads 834 including links to ad landing pages 850, and (c) category links 836 to pages 860 including category targeted (which may also be keyword targeted) ads. The document 830 may include other links to other types of information as well.
[0072] Upon end user selection of one of the search result links 832, a corresponding document 840 is returned (e.g., loaded into a browser of an end user device). Upon end user selection of one of the ad links 834, a corresponding ad landing page 850 is returned. Finally, upon end user selection of one of the category links 836, a corresponding page including one or more category targeted ads 860 is returned. One or more ads with links to ad landing pages may also be provided, for example, below associated category headings or links. If the end user selects one of the ads on document 860, a corresponding ad landing page 850 is returned.
[0073] In at least some alternative embodiments consistent with the present invention, if an end user selects one of the category links 836, a "filtered" version of the document 830 may be rendered. In such a "filtered" version of the document 830, search results 832, keyword and/or category targeted ads 834, and/or category links 836 may be filtered such that they pertain to the selected category.
[0074] In the case where search results 832 are returned, embodiments consistent with the present invention may be used to associate query information 810 with the listed documents, and/or any selected document(s) 840. Such an association may reflect whether or not a document was selected.
[0075] In the case where keyword targeted and/or category targeted ads 834 are returned, embodiments consistent with the present invention may be used to associate query information 810 with listed ads, and/or any selected ad(s) 850. Such an association may reflect whether or not an ad was selected. Further, the present invention may be used to associate query information 810 with keywords and/or concepts used to target the serving of the ads 834. Such an association may reflect whether or not an ad was selected.
[0076] In the case where category links 836 are returned, embodiments consistent with the present invention may be used to associate query information 810 with listed categories and/or any selected category(ies). Such an association may reflect whether or not a category was selected. Alternatively, or in addition, such an association may reflect whether or not a category targeted ad on page 860 was selected. Further, the present invention may be used to associate query information 810 with keywords and/or concepts used to target the serving of the ads on page 860.
[0077] An embodiment in which the document 830 includes category links 836 to a page 860 with one or more category targeted ads may be used, for example, to provide "Yellow Pages" style classification to ads, such as local ads for example. As a more specific example, suppose that an ad serving system includes the category "plumbers," and one or more advertisers associate their ad campaigns with this "Yellow Page" category.
[0078] Suppose further that when an end user enters the query 810 "clogged drain," category links 836 include a "Local Plumbers" category link. (This keyword to category association may have been derived from the fact that one or more advertisers associated both the keyword target "clogged drain" and the category "Plumbers" with their ads. Alternatively, or in addition, a category may be inferred from a collection of words (e.g., extracted from ad information).)
[0079] If the end user then selects the "Local Plumbers" category link, they are provided with a page 860 containing one or more ads from local plumber advertisers. Embodiments consistent with the present invention may create an association, or reinforce an existing association, between the feature "clogged drain" and the entity "category=Plumbers."
[0080] Now suppose that an end user enters the query 810 "DIY clogged drain" and that a document 830 with category links 836 including the "Local Plumbers" category link is provided. However, suppose that the end user does not select the "Local Plumbers" category link because do-it-yourselfers won't usually hire a plumber. Suppose instead that the user selects a "Local Plumbing Supplies" category link 836. Lack of selections (or short dwell times) of the "Local Plumbers" category link indicates a negative correlation between the query information "DIY clogged drain" and the "Local Plumbers" category, while selections (or long dwell times) of the "Local Plumbing Supplies" category indicate a positive correlation between the query information "DIY clogged drain" and the "Local Plumbing Supplies" category.
[0081] In at least some embodiments consistent with the present invention, when the "Local Plumbers" category link 836 is selected, a page 860 with local plumber ads (which may also be targeted by keywords carried through from the terms of the search query 810) is provided. If the page 860 also includes ads having a strong association to a category (e.g., due to advertiser association), then a similar process, in which it is determined just how strong the association between the advertiser and the category is by observing action or inaction on that advertiser's link, may occur. That is, an ad-category association may be modified depending on a user action with respect to the ad when the category was used to target the serving of the ad(s) on the page 860 (and possibly modified by keywords carried through from the original query 810).
[0082] As such information is gathered and analyzed, a strong affinity between "clogged drain" and the "Yellow Pages" category "Plumbers" (as long as the term "diy" is not included) is learned.
[0083] The fact that some advertisers who indicate that they are "plumbers" (e.g., by associating their ad with the category plumbers) may have ads that aren't selected much (or dwelled on) may be learned. Using such information, an ad serving system may cease to provide such ads in a page 860 linked from the category link 836 "Local Plumbers". Alternatively, in an ad serving system in which ads are scored, the scores of such ads may be reduced.
[0084] Finally, for ads without an associated category (and even for ads with an associated category), if there is a strong association (e.g., correlation) between such ads and one or more categories, at least some embodiments consistent with the present invention may be used to recommend to advertisers that they associate their ad with such categories. For example, such an embodiment may recommend that an advertiser with an ad with the targeting keywords "clogged drain" and "emergency service" associate its ad with the category "Plumber". Alternatively, such an association may be generated automatically.
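A category recommendation of the kind described above might be sketched as follows. The function name, the summing of per-keyword association strengths, the 0.5 threshold, and the strength values are all assumptions for illustration; the description says only that a strong association may trigger a recommendation.

```python
from typing import Dict, List, Set, Tuple


def recommend_categories(ad_keywords: Set[str], ad_categories: Set[str],
                         strength: Dict[Tuple[str, str], float],
                         threshold: float = 0.5) -> List[str]:
    """Suggest categories strongly associated with an ad's targeting keywords.

    strength maps (keyword, category) to a learned association strength.
    Categories already associated with the ad are not suggested again.
    """
    totals: Dict[str, float] = {}
    for (keyword, category), value in strength.items():
        if keyword in ad_keywords and category not in ad_categories:
            totals[category] = totals.get(category, 0.0) + value
    strong = [c for c, total in totals.items() if total >= threshold]
    return sorted(strong, key=lambda c: totals[c], reverse=True)


# Hypothetical learned keyword-to-category strengths.
strength = {
    ("clogged drain", "Plumber"): 0.8,
    ("emergency service", "Plumber"): 0.3,
    ("emergency service", "Locksmith"): 0.3,
}
suggested = recommend_categories({"clogged drain", "emergency service"}, set(), strength)
```

Only "Plumber" clears the threshold here, matching the example in the paragraph above; the same result could instead be applied automatically rather than shown to the advertiser.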
§ 4.5 CONCLUSIONS
[0085] As can be appreciated by the foregoing, embodiments consistent with the present invention may be used to assign and/or weight features, such as n-grams, to entities, such as documents or concepts. The assigned features may represent relevance of the document and may be used to target the serving of advertisements with the document.

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented method comprising: a) obtaining serving information related to a document; b) determining features using the obtained serving information; and c) associating the features determined with the document.
2. The computer-implemented method of claim 1 further comprising: d) determining whether or not to serve an ad with the document using the features associated with the document.
3. The computer-implemented method of claim 1 wherein the serving information related to the document includes information from at least one past query that caused the rendering of information of the document on a search results list.
4. The computer-implemented method of claim 3 wherein the serving information related to the document includes whether or not the rendered information of the document was selected.
5. The computer-implemented method of claim 3 wherein the serving information related to the document includes a time that the user dwelled on the document after selecting the rendered information of the document.
6. The computer-implemented method of claim 3 wherein the document is a Web page.
7. The computer-implemented method of claim 6 wherein the serving information related to the document includes information from at least one past query that caused the rendering of information of the document on a search results list.
8. The computer-implemented method of claim 6 wherein the serving information related to the document includes whether or not the rendered information of the document was selected.
9. The computer-implemented method of claim 6 wherein the serving information related to the document includes a time that the user dwelled on the document after selecting the rendered information of the document.
10. The computer-implemented method of claim 1 further comprising: e) obtaining user action information related to the document using the document identifier; f) determining scores using the user action information; and g) assigning weights to the features using the scores determined.
11. The computer-implemented method of claim 10 wherein each of the weights is a monotonic function of an associated one of the scores.
12. The computer-implemented method of claim 11 wherein the user action is a dwell time after a selection, and wherein the score is higher for a longer dwell time than for a shorter dwell time.
13. The computer-implemented method of claim 11 wherein the user action is selection, and wherein the score is higher for a selection than for a non-selection.
14. The computer-implemented method of claim 11 wherein the user action is conversion, and wherein the score is higher for a conversion than for a non-conversion.
15. The computer-implemented method of claim 1 further comprising: e) determining scores using the serving information; and f) assigning weights to the features using the scores determined.
16. The computer-implemented method of claim 15 wherein the score for a feature is determined using a frequency of the feature in the serving information.
17. The computer-implemented method of claim 15 wherein the score for a feature is determined using an inverse frequency of the feature in serving information for a collection of documents.
18. The computer-implemented method of claim 1 further comprising: e) obtaining user action information related to the document using the document identifier; f) determining scores using both the serving information and the user action information; and g) assigning weights to the features using the scores determined.
19. The computer-implemented method of claim 18 wherein each of the weights is a monotonic function of an associated one of the scores.
20. The computer-implemented method of claim 1 further comprising: e) determining scores using at least one of (A) the serving information and (B) user action information related to the document; and f) filtering the features using the scores determined.
21. The computer-implemented method of claim 20 further comprising: g) assigning weights to the features using the scores determined.
22. The computer-implemented method of claim 1 wherein at least one of the features is an n-gram.
23. The computer-implemented method of claim 1 wherein at least one of the features is a keyword.
24. The computer-implemented method of claim 1 wherein at least one of the features is a concept.
25. The computer-implemented method of claim 1 wherein the serving information related to the document is obtained using an accepted document identifier.
26. The computer-implemented method of claim 25 wherein the document identifier is a universal resource locator.
27. A computer-implemented method comprising: a) accepting a feature-to-entity association; b) using the feature-to-entity association to generate one or more results for presentation to a user; c) tracking user behavior with respect to the results; and d) updating a score associated with the feature-to-entity association using the tracked user behavior.
28. The computer-implemented method of claim 27 wherein the feature-to-entity association is a keyword-to-category association.
29. The computer-implemented method of claim 28 wherein the one or more results generated include one or more category listings provided on a document.
30. The computer-implemented method of claim 27 wherein the feature-to-entity association is a category-to-ad association.
31. The computer-implemented method of claim 30 wherein the one or more results generated include one or more category targeted ads provided on a document.
32. The computer-implemented method of claim 27 wherein the user behavior includes whether or not a user selects a result.
33. The computer-implemented method of claim 27 wherein the user behavior includes whether or not a user converts on a result.
34. Apparatus comprising: a) means for obtaining serving information related to a document; b) means for determining features using the obtained serving information; and c) means for associating the features determined with the document.
35. Apparatus comprising: a) means for accepting a feature-to-entity association; b) means for using the feature-to-entity association to generate one or more results for presentation to a user; c) means for tracking user behavior with respect to the results; and d) means for updating a score associated with the feature-to-entity association using the tracked user behavior.
PCT/US2005/046194 2004-12-30 2005-12-21 Associating features with entities, such as categories or web page documents, and/or weighting such features WO2006073810A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP05854841A EP1839203A4 (en) 2004-12-30 2005-12-21 Associating features with entities, such as categories or web page documents, and/or weighting such features
AU2005323159A AU2005323159B2 (en) 2004-12-30 2005-12-21 Associating features with entities, such as categories or web page documents, and/or weighting such features
CA2592741A CA2592741C (en) 2004-12-30 2005-12-21 Associating features with entities, such as categories or web page documents, and/or weighting such features

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/026,497 US20060149710A1 (en) 2004-12-30 2004-12-30 Associating features with entities, such as categories of web page documents, and/or weighting such features
US11/026,497 2004-12-30

Publications (2)

Publication Number Publication Date
WO2006073810A2 true WO2006073810A2 (en) 2006-07-13
WO2006073810A3 WO2006073810A3 (en) 2007-02-22

Family

ID=36641892

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/046194 WO2006073810A2 (en) 2004-12-30 2005-12-21 Associating features with entities, such as categories or web page documents, and/or weighting such features

Country Status (5)

Country Link
US (2) US20060149710A1 (en)
EP (1) EP1839203A4 (en)
AU (1) AU2005323159B2 (en)
CA (1) CA2592741C (en)
WO (1) WO2006073810A2 (en)

US6003027A (en) * 1997-11-21 1999-12-14 International Business Machines Corporation System and method for determining confidence levels for the results of a categorization system
US6167382A (en) * 1998-06-01 2000-12-26 F.A.C. Services Group, L.P. Design and production of print advertising and commercial display materials over the Internet
US6256633B1 (en) * 1998-06-25 2001-07-03 U.S. Philips Corporation Context-based and user-profile driven information retrieval
US6078866A (en) * 1998-09-14 2000-06-20 Searchup, Inc. Internet site searching and listing service based on monetary ranking of site listings
IL126373A (en) * 1998-09-27 2003-06-24 Haim Zvi Melman Apparatus and method for search and retrieval of documents
US6654735B1 (en) * 1999-01-08 2003-11-25 International Business Machines Corporation Outbound information analysis for generating user interest profiles and improving user productivity
US6985882B1 (en) * 1999-02-05 2006-01-10 Directrep, Llc Method and system for selling and purchasing media advertising over a distributed communication network
US7047242B1 (en) * 1999-03-31 2006-05-16 Verizon Laboratories Inc. Weighted term ranking for on-line query tool
US6327590B1 (en) * 1999-05-05 2001-12-04 Xerox Corporation System and method for collaborative ranking of search results employing user and group profiles derived from document collection content analysis
US7110993B2 (en) 1999-05-28 2006-09-19 Overture Services, Inc. System and method for influencing a position on a search result list generated by a computer network search engine
US6269361B1 (en) * 1999-05-28 2001-07-31 Goto.Com System and method for influencing a position on a search result list generated by a computer network search engine
US6651057B1 (en) * 1999-09-03 2003-11-18 Bbnt Solutions Llc Method and apparatus for score normalization for information retrieval applications
US6839680B1 (en) * 1999-09-30 2005-01-04 Fujitsu Limited Internet profiling
US6489968B1 (en) * 1999-11-18 2002-12-03 Amazon.Com, Inc. System and method for exposing popular categories of browse tree
US7287214B1 (en) * 1999-12-10 2007-10-23 Books24X7.Com, Inc. System and method for providing a searchable library of electronic documents to a user
US6473751B1 (en) * 1999-12-10 2002-10-29 Koninklijke Philips Electronics N.V. Method and apparatus for defining search queries and user profiles and viewing search results
US6415368B1 (en) * 1999-12-22 2002-07-02 Xerox Corporation System and method for caching
US6401075B1 (en) * 2000-02-14 2002-06-04 Global Network, Inc. Methods of placing, purchasing and monitoring internet advertising
US6584471B1 (en) * 2000-02-14 2003-06-24 Leon Maclin System and method for the adaptive, hierarchical receipt, ranking, organization and display of information based upon democratic criteria and resultant dynamic profiling
US6311194B1 (en) * 2000-03-15 2001-10-30 Taalee, Inc. System and method for creating a semantic web and its applications in browsing, searching, profiling, personalization and advertising
US6484164B1 (en) * 2000-03-29 2002-11-19 Koninklijke Philips Electronics N.V. Data search user interface with ergonomic mechanism for user profile definition and manipulation
US6968332B1 (en) * 2000-05-25 2005-11-22 Microsoft Corporation Facility for highlighting documents accessed through search or browsing
JP3870666B2 (en) * 2000-06-02 2007-01-24 株式会社日立製作所 Document retrieval method and apparatus, and recording medium recording the processing program
US20030217052A1 (en) * 2000-08-24 2003-11-20 Celebros Ltd. Search engine method and apparatus
US7284008B2 (en) * 2000-08-30 2007-10-16 Kontera Technologies, Inc. Dynamic document context mark-up technique implemented over a computer network
JP3934325B2 (en) * 2000-10-31 2007-06-20 株式会社日立製作所 Document search method, document search apparatus, and storage medium for document search program
US6901398B1 (en) * 2001-02-12 2005-05-31 Microsoft Corporation System and method for constructing and personalizing a universal information classifier
US8001118B2 (en) * 2001-03-02 2011-08-16 Google Inc. Methods and apparatus for employing usage statistics in document retrieval
US20030018659A1 (en) * 2001-03-14 2003-01-23 Lingomotors, Inc. Category-based selections in an information access environment
US7058624B2 (en) * 2001-06-20 2006-06-06 Hewlett-Packard Development Company, L.P. System and method for optimizing search results
US6732090B2 (en) * 2001-08-13 2004-05-04 Xerox Corporation Meta-document management system with user definable personalities
US20030078928A1 (en) * 2001-10-23 2003-04-24 Dorosario Alden Network wide ad targeting
WO2003067497A1 (en) * 2002-02-04 2003-08-14 Cataphora, Inc A method and apparatus to visually present discussions for data mining purposes
US7716161B2 (en) * 2002-09-24 2010-05-11 Google, Inc. Methods and apparatus for serving relevant advertisements
US7136875B2 (en) * 2002-09-24 2006-11-14 Google, Inc. Serving advertisements based on content
US7346606B2 (en) * 2003-06-30 2008-03-18 Google, Inc. Rendering advertisements with documents having one or more topics using user topic interest
US20030216930A1 (en) * 2002-05-16 2003-11-20 Dunham Carl A. Cost-per-action search engine system, method and apparatus
US7231395B2 (en) 2002-05-24 2007-06-12 Overture Services, Inc. Method and apparatus for categorizing and presenting documents of a distributed database
US20040059712A1 (en) * 2002-09-24 2004-03-25 Dean Jeffrey A. Serving advertisements using information associated with e-mail
US8086559B2 (en) * 2002-09-24 2011-12-27 Google, Inc. Serving content-relevant advertisements with client-side device support
US20050091106A1 (en) * 2003-10-27 2005-04-28 Reller William M. Selecting ads for a web page based on keywords located on the web page
JP2004152041A (en) 2002-10-31 2004-05-27 Ricoh Co Ltd Program, recording medium and apparatus for extracting key phrase
US20050033771A1 (en) * 2003-04-30 2005-02-10 Schmitter Thomas A. Contextual advertising system
KR20040104060A (en) 2003-06-02 2004-12-10 송재현 Linking method of related site with keyword db mining of blog contents
US20050033657A1 (en) * 2003-07-25 2005-02-10 Keepmedia, Inc., A Delaware Corporation Personalized content management and presentation systems
US8775443B2 (en) * 2003-08-07 2014-07-08 Sap Ag Ranking of business objects for search engines
US7505964B2 (en) * 2003-09-12 2009-03-17 Google Inc. Methods and systems for improving a search ranking using related queries
US7606798B2 (en) * 2003-09-22 2009-10-20 Google Inc. Methods and systems for improving a search ranking using location awareness
US7346839B2 (en) * 2003-09-30 2008-03-18 Google Inc. Information retrieval based on historical data
US20050222989A1 (en) * 2003-09-30 2005-10-06 Taher Haveliwala Results based personalization of advertisements in a search engine
US20050076003A1 (en) * 2003-10-06 2005-04-07 Dubose Paul A. Method and apparatus for delivering personalized search results
US20050144158A1 (en) * 2003-11-18 2005-06-30 Capper Liesl J. Computer network search engine
US20050137939A1 (en) * 2003-12-19 2005-06-23 Palo Alto Research Center Incorporated Server-based keyword advertisement management
US20050149388A1 (en) * 2003-12-30 2005-07-07 Scholl Nathaniel B. Method and system for placing advertisements based on selection of links that are not prominently displayed
US7761447B2 (en) * 2004-04-08 2010-07-20 Microsoft Corporation Systems and methods that rank search results
US20060015401A1 (en) * 2004-07-15 2006-01-19 Chu Barry H Efficiently spaced and used advertising in network-served multimedia documents
US7496563B2 (en) * 2004-08-04 2009-02-24 International Business Machines Corporation Method for locating documents a user has previously accessed
US7634461B2 (en) * 2004-08-04 2009-12-15 International Business Machines Corporation System and method for enhancing keyword relevance by user's interest on the search result documents
US20060036966A1 (en) * 2004-08-10 2006-02-16 Slava Yevdayev Method and system for presenting links associated with a requested website
US20060036659A1 (en) * 2004-08-12 2006-02-16 Colin Capriati Method of retrieving information using combined context based searching and content merging
US8386453B2 (en) * 2004-09-30 2013-02-26 Google Inc. Providing search information relating to a document
US7546294B2 (en) * 2005-03-31 2009-06-09 Microsoft Corporation Automated relevance tuning
US8756228B2 (en) * 2006-02-16 2014-06-17 Moreover Acquisition Corporation Method and apparatus for creating contextualized feeds

Non-Patent Citations (1)

Title
See also references of EP1839203A4

Cited By (1)

Publication number Priority date Publication date Assignee Title
US9092504B2 (en) 2012-04-09 2015-07-28 Vivek Ventures, LLC Clustered information processing and searching with structured-unstructured database bridge

Also Published As

Publication number Publication date
EP1839203A4 (en) 2009-12-16
WO2006073810A3 (en) 2007-02-22
US20060149710A1 (en) 2006-07-06
CA2592741C (en) 2012-04-10
CA2592741A1 (en) 2006-07-13
US9852225B2 (en) 2017-12-26
US20150317679A1 (en) 2015-11-05
AU2005323159A1 (en) 2006-07-13
AU2005323159B2 (en) 2010-08-05
EP1839203A2 (en) 2007-10-03

Similar Documents

Publication Publication Date Title
US9852225B2 (en) Associating features with entities, such as categories of web page documents, and/or weighting such features
US11367112B2 (en) Identifying related information given content and/or presenting related information in association with content-related advertisements
AU2004260464B2 (en) Improving content-targeted advertising using collected user behavior data
US8135619B2 (en) Increasing a number of relevant advertisements using a relaxed match
US8090706B2 (en) Rendering advertisements with documents having one or more topics using user topic interest information
AU2004256801B2 (en) Serving advertisements using a search of advertiser web information
US7346615B2 (en) Using match confidence to adjust a performance threshold

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase (Ref document number: 2592741; Country of ref document: CA)
NENP Non-entry into the national phase (Ref country code: DE)
REEP Request for entry into the European phase (Ref document number: 2005854841; Country of ref document: EP)
WWE WIPO information: entry into national phase (Ref document number: 2005854841; Country of ref document: EP)
WWE WIPO information: entry into national phase (Ref document number: 1083/MUMNP/2007; Country of ref document: IN)
WWE WIPO information: entry into national phase (Ref document number: 2005323159; Country of ref document: AU)
ENP Entry into the national phase (Ref document number: 2005323159; Country of ref document: AU; Date of ref document: 20051221; Kind code of ref document: A)
WWP WIPO information: published in national office (Ref document number: 2005323159; Country of ref document: AU)
WWP WIPO information: published in national office (Ref document number: 2005854841; Country of ref document: EP)