US20100100564A1 - System and method for spam identification - Google Patents
- Publication number
- US20100100564A1 (Application No. US12/647,110)
- Authority
- US
- United States
- Prior art keywords
- spam
- search query
- individual result
- result
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/954—Navigation, e.g. using categorised browsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/904—Browsing; Visualisation therefor
Definitions
- Embodiments of the present invention relate to a system and method for spam identification. More particularly, embodiments of the invention relate to facilitating user feedback to improve spam identification.
- Through the Internet and other networks, users have gained access to large amounts of information distributed over a large number of computers.
- In order to access the vast amounts of information, users typically use a web browser to access a search engine.
- The search engine responds to an input user query by returning one or more sources of information available over the Internet or other network.
- The search engine typically performs two functions: (1) finding matching results and (2) scoring the matching results to determine a display order.
- Search engines typically order or rank the results based on the similarity between the terms found in the accessed information sources and the terms input by the user. Results that contain the same words in the same order as the user's query are given a high rank and are placed near the top of the list presented to the user.
- Scoring performed by different search engines takes into account various factors including whether a match was found in the title, the importance of the match, the importance of a phrase match, and other factors determined by the search engine. Parameters that work well for one kind of search may not work well for all searches and parameters that work for some users may not work well for others.
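The term-similarity ranking described above can be illustrated with a toy scorer. This is a minimal sketch, not any engine's actual formula: it assumes whitespace tokenization, scores a document by the fraction of query terms it contains, and adds a hypothetical `phrase_bonus` when the query words appear in the same order. The names `contains_phrase`, `score`, and `rank` are illustrative.

```python
def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` appears in `tokens` as a contiguous run (same words, same order)."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def score(query: str, document: str, phrase_bonus: float = 1.0) -> float:
    """Toy relevance: fraction of query terms present, plus a bonus for an exact phrase match."""
    q_terms = query.lower().split()
    d_terms = document.lower().split()
    if not q_terms:
        return 0.0
    present = set(d_terms)
    overlap = sum(1 for term in q_terms if term in present) / len(q_terms)
    bonus = phrase_bonus if contains_phrase(d_terms, q_terms) else 0.0
    return overlap + bonus


def rank(query: str, documents: dict[str, str]) -> list[str]:
    """Return document IDs ordered from highest to lowest score."""
    return sorted(documents, key=lambda doc_id: score(query, documents[doc_id]), reverse=True)


docs = {
    "a": "cheap hotel deals in paris",
    "b": "paris hotel guide",
    "c": "train schedules",
}
print(rank("cheap hotel", docs))  # ['a', 'b', 'c']
```

Document "a" matches both words in order, "b" matches only one word, and "c" matches none; a real engine would also weigh title matches and the other factors listed above.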
- Web site owners are constantly trying to manipulate search engines in order to artificially inflate their web site rankings for specific search terms.
- Highly monetizable terms such as “travel”, “hotel”, “Viagra”, “dvd”, etc., are spammed in order to drive traffic to the web site.
- The search engines may give these web sites a high ranking and never learn that the web sites are spam sites. This spamming technique can degrade the average user experience and distort the true value of a web site to the user.
- The user base of searchers will generally be the best source of information about whether results are spam results.
- However, requests to end users to provide more feedback data have been met with limited success.
- The limited success stems from the fact that providing feedback is often cumbersome and time consuming for users.
- Furthermore, pre-configured feedback formats are often inadequate.
- Embodiments of the present invention include a method for improving a user search experience by identifying any spam results in a result set produced in response to a query.
- The method may include receiving user feedback indicating whether a given result is spam and applying automated spam identification techniques to the given result.
- The method may additionally include merging data obtained from the user feedback and the automated spam identification techniques to obtain an indicator for the given result, the indicator providing a likelihood that the given result is spam.
- In an additional aspect, a method is provided for improving a user search experience by identifying any spam results in a result set produced in response to a query.
- The method may include providing a user interface spam feedback mechanism for allowing a user to indicate that a given result is spam, and using the received feedback to alter a future ranking of the given result.
- The method may additionally include determining whether user feedback is spam feedback.
- In yet an additional aspect, a system is provided for improving a user search experience by identifying spam results in a result set produced in response to a query.
- The system may include a user interface spam feedback mechanism for allowing a user to indicate that a given result is spam and an automated spam identification mechanism for implementing automated techniques on the given result to determine whether the given result is spam.
- The system may additionally include a merging component for merging the determinations of the user interface spam feedback mechanism and the automated spam identification mechanism to derive an indicator of the likelihood that a given result is spam.
- FIG. 1 is a block diagram illustrating an overview of a system in accordance with an embodiment of the invention.
- FIG. 2 is a block diagram illustrating a computerized environment in which embodiments of the invention may be implemented.
- FIG. 3 is a block diagram illustrating components of a spam analysis system in accordance with an embodiment of the invention.
- FIG. 4 is a block diagram illustrating an automated spam analysis module in accordance with an embodiment of the invention.
- FIG. 5 is a block diagram illustrating a user feedback analyzer in accordance with an embodiment of the invention.
- FIG. 6 is a flow chart illustrating a method for analyzing results in accordance with an embodiment of the invention.
- A system and method are provided for identifying results produced by a search engine as spam.
- The system and method utilize a combination of automated spam identification techniques and user feedback to identify results as spam and adjust result rankings accordingly.
- As illustrated in FIG. 1, a plurality of user computers 10 may be connected over a network 20 with a search system 200.
- The search system 200 may respond to a user query by searching a plurality of information sources 30.
- The search system 200 may also be connected with an advertising system 260 and a spam analysis system 300.
- The advertising system 260 may store information pertaining to advertiser bids on keywords and access stored advertisements.
- The spam analysis system 300 may utilize information from the advertising system 260 and the search system 200 to detect spam results.
- The search system 200 may include search and ranking components 210, a crawler 220, an index 230, user interaction components 240, and a cache 250.
- In operation, the crawler 220 may traverse the information sources 30 and store results indexed by keyword in the index 230.
- The cache 250 may be used to store frequently accessed results in order to facilitate efficient operation of the search system 200.
- The search and ranking components 210 may locate and rank results based on an input query.
- The user interaction components 240 may be provided to obtain user feedback pertaining to spam and deliver the feedback to the spam analysis system 300.
- The spam analysis system 300 may accumulate user feedback for subsequent use by the search system 200 to optimize future search results.
- The search engine 200 may include additional known components, omitted for simplicity.
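The crawler 220, index 230, and cache 250 described above can be pictured as a keyword-to-URL inverted index with a memoized lookup in front of it. The sketch below assumes pages have already been fetched into a dictionary of URL to text; `PAGES`, `build_index`, and `search` are hypothetical names, and Python's `functools.lru_cache` merely stands in for the cache 250.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical crawl output standing in for information sources 30 (URL -> page text).
PAGES = {
    "http://example.com/a": "cheap hotel deals",
    "http://example.com/b": "hotel reviews and travel tips",
    "http://example.com/c": "university admissions",
}


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Crawler-style pass: map each keyword to the set of URLs containing it (index 230)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return dict(index)


INDEX = build_index(PAGES)


@lru_cache(maxsize=1024)  # stands in for cache 250: repeated queries skip the index lookup
def search(query: str) -> tuple[str, ...]:
    """Return URLs containing every query term, sorted for deterministic output."""
    terms = query.lower().split()
    if not terms:
        return ()
    matches = set.intersection(*(INDEX.get(term, set()) for term in terms))
    return tuple(sorted(matches))


print(search("hotel"))  # ('http://example.com/a', 'http://example.com/b')
```

Returning a tuple keeps cached results immutable; the ranking step from the search and ranking components 210 would then order these matches.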
- Embodiments of the invention, through the user feedback components 240 and the spam analysis system 300, provide a friendly interface and enable highly actionable user feedback to be gathered on a large scale from willing users.
- In embodiments of the invention, the user feedback components 240 enable users to indicate which results are spam for their specific queries.
- Unfortunately, this technique also invites artificial inflation techniques.
- A spammer can use this mechanism to mark his or her own spam site as a good site and a competitor's site as a spam site.
- Accordingly, the spam analysis system 300 includes components for detecting false spam feedback.
- Embodiments of the invention implement a user interface (UI) mechanism, such as a toolbar button or other UI element on a search results page, to allow a user to send information back to the search system 200 identifying a particular result as spam for a particular query.
- The spam analysis system 300 aggregates the input data from all user spam feedback and merges it with the data coming from the automated spam analysis module 400. If both pieces of data agree that the result is spam, the spam analysis system 300 may ensure that the result is penalized in future rankings to prevent the artificial rank inflation of spam.
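As one way to picture the aggregation step, the sketch below folds individual spam reports into per-result totals that later analysis can use (vote count, distinct users, distinct queries). It assumes each report arrives as a hypothetical `(user_id, query, url)` tuple; `ResultFeedback` and `aggregate` are illustrative names, not part of the original disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ResultFeedback:
    """Aggregated spam reports for one result URL (hypothetical structure)."""
    votes: int = 0
    users: set[str] = field(default_factory=set)
    queries: set[str] = field(default_factory=set)


def aggregate(reports: list[tuple[str, str, str]]) -> dict[str, ResultFeedback]:
    """Fold (user_id, query, url) spam reports into per-URL totals."""
    totals = defaultdict(ResultFeedback)
    for user_id, query, url in reports:
        entry = totals[url]
        entry.votes += 1
        entry.users.add(user_id)
        entry.queries.add(query)
    return dict(totals)


reports = [
    ("u1", "hotel", "spam.example"),
    ("u2", "cheap hotel", "spam.example"),
    ("u1", "hotel", "spam.example"),
    ("u3", "dvd", "ok.example"),
]
agg = aggregate(reports)
print(agg["spam.example"].votes, len(agg["spam.example"].users))  # 3 2
```

Keeping distinct users and distinct queries alongside the raw count lets later checks tell broad agreement apart from one account voting repeatedly.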
- FIG. 2 illustrates an example of a suitable computing system environment 100 on which the system for spam identification may be implemented.
- The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.
- Program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
- Program modules may be located in both local and remote computer storage media, including memory storage devices.
- The exemplary system 100 for implementing the invention includes a general-purpose computing device in the form of a computer 110 including a processing unit 120, a system memory 130, and a system bus 121 that couples various system components, including the system memory, to the processing unit 120.
- Computer 110 typically includes a variety of computer readable media.
- Computer readable media may comprise computer storage media and communication media.
- The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132.
- A basic input/output system (BIOS) 133, containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131.
- RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120 .
- FIG. 2 illustrates operating system 134 , application programs 135 , other program modules 136 , and program data 137 .
- The computer 110 may also include other removable/nonremovable, volatile/nonvolatile computer storage media.
- FIG. 2 illustrates a hard disk drive 141 that reads from or writes to nonremovable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD ROM or other optical media.
- Other removable/nonremovable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like.
- The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140.
- The magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.
- The hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.
- A user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad.
- Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like.
- These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB).
- A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190.
- Computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 195.
- In the present invention, the computer 110 will operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180.
- The remote computer 180 may be a personal computer, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in FIG. 2.
- The logical connections depicted in FIG. 2 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks.
- When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170.
- When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet.
- The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160 or another appropriate mechanism.
- In a networked environment, program modules depicted relative to the computer 110 may be stored in the remote memory storage device.
- FIG. 2 illustrates remote application programs 185 as residing on memory device 181 . It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
- FIG. 1 illustrates a system for evaluating whether results produced by a search engine are spam results.
- The system and method utilize a combination of automated spam identification techniques and user feedback to identify results as spam and adjust result rankings accordingly.
- User computers 10 may be connected over the network 20 with the search system 200.
- The network 20 may be any of a number of different types of networks, such as the Internet.
- The search system 200 may also be connected with the advertising system 260 and the spam analysis system 300.
- The advertising system 260 may store information pertaining to advertiser bids on keywords and access advertisements.
- The information may include keyword monetization values that are based on advertiser bids.
- The spam analysis system 300 may utilize information from the advertising system 260 and the search system 200 to identify and appropriately address spam results.
- The user interaction components 240 of the search system 200 may receive user input regarding spam results and deliver this input to the spam analysis system 300.
- In embodiments of the invention, the user interaction components 240 are implemented in a UI mechanism such as a toolbar button or UI element on the search results page.
- Through this mechanism, a user can send information back to the search system 200 identifying a particular result as spam for a particular query.
- For example, each result on a search result page may include an adjacent “feedback” link.
- The user may click the “feedback” link next to any result.
- A form may then open that permits the user to mark the result as spam for the query.
- The toolbar may also be equipped with a button capable of marking any displayed result or the currently shown web page as spam.
- The user interaction components 240 are configured to provide a simple, non-intrusive technique for facilitating user feedback on spam. As further illustrated below, the technique also ensures that the system recognizes and appropriately reacts to feedback from spammers.
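The feedback link and toolbar button only need to transmit a small record identifying the query and the reported result. The payload below is a hypothetical format (the field names, the JSON encoding, and `SpamReport`, `encode`, and `decode` are all assumptions); it carries the source IP because the user feedback analyzer 500 later inspects where reports originate.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpamReport:
    """Hypothetical record sent when a user clicks a result's "feedback" link
    or the toolbar spam button."""
    query: str       # the query the result was shown for
    url: str         # the result (or currently shown page) being reported
    source_ip: str   # originating address, inspected later by the source analysis step
    is_spam: bool = True


def encode(report: SpamReport) -> str:
    """Serialize a report for delivery to the spam analysis system."""
    return json.dumps(asdict(report), sort_keys=True)


def decode(payload: str) -> SpamReport:
    """Rebuild a report on the receiving side."""
    return SpamReport(**json.loads(payload))


report = SpamReport("hotel", "http://spam.example", "203.0.113.5")
print(encode(report))
```

Tying each report to a specific query matters because the analysis described later distinguishes reports made under one query from reports spread across many queries.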
- FIG. 3 is a block diagram illustrating an embodiment of the spam analysis system 300 .
- The spam analysis system 300 may include a user feedback aggregator 310, a user feedback analyzer 500, an automated spam analysis module 400, a merging component 320, and an indexing mechanism 330.
- The spam analysis system 300 may obtain results 270 and user feedback 280 from the search system 200.
- The user feedback aggregator 310 aggregates feedback across multiple users and delivers it to the user feedback analyzer 500.
- The user feedback analyzer 500 includes algorithms for analyzing user feedback, as will be further explained below.
- Results produced by the search system 200 are delivered to the spam analysis system 300.
- The automated spam analysis module 400 analyzes the search results 270 for spam.
- The merging component 320 merges the determinations of the automated spam analysis module 400 and the user feedback analyzer 500 and generates an indicator of the likelihood that a given result is a spam result.
- The indexing mechanism 330 may index the result along with an indicator of whether the result is likely to be a spam result.
- The indexing mechanism 330 receives information from the merging component 320 that indicates whether the automated spam analysis module 400 and the user feedback analyzer 500 reached the same conclusion regarding whether a result is spam. If both sources agree, the search and ranking components 210 may penalize the result in future result rankings, based on the indexed spam indicator, so that its rank cannot be artificially inflated in the future.
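The agreement rule can be stated compactly: a result is penalized only when the automated analysis and the analyzed user feedback both conclude that it is spam. In this minimal sketch, `should_penalize` and `index_result` are hypothetical names, and `None` is assumed to mean that no user feedback exists for the result.

```python
from typing import Optional


def should_penalize(automated_is_spam: bool, feedback_is_spam: Optional[bool]) -> bool:
    """Penalize only when both sources conclude the result is spam.

    feedback_is_spam is None when no user feedback exists for the result.
    """
    return automated_is_spam and feedback_is_spam is True


def index_result(index: dict, url: str, automated_is_spam: bool,
                 feedback_is_spam: Optional[bool]) -> None:
    """Store the agreement-based spam flag alongside the result."""
    index[url] = {"spam_flag": should_penalize(automated_is_spam, feedback_is_spam)}


spam_index: dict = {}
index_result(spam_index, "http://spam.example", True, True)
index_result(spam_index, "http://rival.example", False, True)  # feedback alone is not enough
print(spam_index)
```

Requiring agreement is what blunts the attack described earlier: a spammer who reports a competitor cannot trigger a penalty unless the automated analysis independently reaches the same conclusion.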
- FIG. 4 is a block diagram illustrating an embodiment of the automated spam analysis module 400 .
- The automated spam analysis module 400 may include a characteristic analyzer 402, a query independent rank analysis mechanism 410, a monetization analysis mechanism 420, and a popularity analysis mechanism 430.
- The characteristic analyzer 402 may examine features of a result, such as how many advertisements are included on a website, whether keyword stuffing appears to occur within the referenced result, and whether the result appears to be a member of a group of results sharing the same IP address that tend to be spammer pages. Based on these characteristics, the characteristic analyzer 402 may determine whether a result is likely to be a spam result. This determination may be used in combination with other automated determinations.
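A characteristic analyzer of this kind might turn each observed feature into a yes/no signal. The sketch below is illustrative only: the inputs (an advertisement count, the page text, and a precomputed share of known spam pages on the same IP address) and every threshold (`max_ads`, `max_density`, `max_ip_ratio`) are assumptions, and keyword stuffing is approximated by how much of the page its single most frequent word occupies.

```python
from collections import Counter


def keyword_density(text: str) -> float:
    """Share of the page taken up by its single most frequent word."""
    words = text.lower().split()
    if not words:
        return 0.0
    return Counter(words).most_common(1)[0][1] / len(words)


def characteristic_score(ad_count: int, text: str, same_ip_spam_ratio: float,
                         max_ads: int = 10, max_density: float = 0.2,
                         max_ip_ratio: float = 0.5) -> float:
    """Fraction of hypothetical spam signals that fire for a page (0.0 to 1.0)."""
    signals = [
        ad_count > max_ads,                      # advertisement-heavy page
        keyword_density(text) > max_density,     # likely keyword stuffing
        same_ip_spam_ratio > max_ip_ratio,       # IP neighbours are mostly spam pages
    ]
    return sum(signals) / len(signals)


print(characteristic_score(15, "hotel hotel hotel cheap", 0.9))  # 1.0
```

Returning a fraction rather than a single verdict leaves room for combining this result with the other automated determinations, as the text suggests.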
- The query independent rank analysis mechanism 410 may consider the query-independent rank of each result, as determined by a known technique such as the number of links to the result.
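Counting inbound links is one simple query-independent signal of the kind mentioned above; link-analysis methods such as PageRank are more elaborate alternatives. This sketch assumes a hypothetical crawl output that maps each page to the pages it links to, and counts distinct linking pages while ignoring self-links. The function name `inbound_link_counts` is illustrative.

```python
from collections import defaultdict


def inbound_link_counts(links: dict[str, list[str]]) -> dict[str, int]:
    """Count distinct pages linking to each URL; self-links are ignored.

    links maps each source page to the pages it links to (hypothetical crawl output).
    Pages that nobody links to are simply absent from the result.
    """
    inbound = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            if target != source:
                inbound[target].add(source)
    return {url: len(sources) for url, sources in inbound.items()}


links = {"a": ["b", "c", "b"], "b": ["c"], "c": ["c"]}
print(inbound_link_counts(links))
```

Counting distinct sources (rather than raw links) keeps a single page from inflating a target by linking to it repeatedly.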
- The monetization analysis mechanism 420 may consider the monetization value of the query terms, based on monetization data from the advertising system 260, on clickthrough rates on sponsored sites for the input query, and on bid rates for the query terms leading to the result. For example, if a query is non-commercial, such as “Carnegie Mellon University”, the automated spam analysis module 400 might be less aggressive at finding spam. However, if a query is highly commercial, such as “hotel”, advertisers may be bidding highly to have their advertisements shown, and such highly monetizable terms are the ones spammers tend to target. Accordingly, the automated spam analysis module 400 may be more aggressive about filtering out spam.
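One way to make spam filtering more aggressive for commercial queries is to lower the spam-probability cutoff as the query's monetization value rises. The sketch below assumes monetization can be approximated by a normalized average bid multiplied by the sponsored clickthrough rate; that formula, the parameters `base`, `floor`, and `max_bid`, and the names `spam_threshold` and `is_flagged` are all hypothetical.

```python
def spam_threshold(avg_bid: float, sponsored_ctr: float,
                   base: float = 0.8, floor: float = 0.4,
                   max_bid: float = 5.0) -> float:
    """Lower the spam-probability cutoff for highly monetizable queries.

    Monetization is approximated (hypothetically) as the normalized average bid
    times the sponsored clickthrough rate, both clamped to [0, 1].
    """
    bid = min(max(avg_bid / max_bid, 0.0), 1.0)
    ctr = min(max(sponsored_ctr, 0.0), 1.0)
    monetization = bid * ctr
    return base - (base - floor) * monetization


def is_flagged(spam_probability: float, avg_bid: float, sponsored_ctr: float) -> bool:
    """Flag a result when its spam probability meets the query-specific cutoff."""
    return spam_probability >= spam_threshold(avg_bid, sponsored_ctr)


print(is_flagged(0.6, 0.0, 0.0))  # False: non-commercial query, cutoff stays at 0.8
print(is_flagged(0.6, 5.0, 1.0))  # True: highly commercial query, cutoff drops to 0.4
```

The same suspicion level therefore leads to filtering for a query like “hotel” but not for a non-commercial query, mirroring the example above.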
- The popularity analysis mechanism 430 determines the popularity of the results by examining traffic to the website referenced by each result.
- The popularity analysis mechanism 430 may operate through the toolbar by capturing the URLs each user visits. If data collected from multiple user toolbars indicates that many users visit a particular result, the automated spam analysis module 400 decreases the probability that the result is spam.
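The popularity adjustment can be sketched as counting distinct toolbar users who visited a URL and shrinking the spam probability as that count grows. The hyperbolic curve, the `half_point` parameter, the `(user_id, url)` visit records, and the names `distinct_visitors` and `adjust_for_popularity` are assumptions chosen for illustration.

```python
def distinct_visitors(visits: list[tuple[str, str]], url: str) -> int:
    """Count distinct toolbar users who visited `url`; visits are (user_id, url) pairs."""
    return len({user for user, visited in visits if visited == url})


def adjust_for_popularity(spam_probability: float, visitor_count: int,
                          half_point: int = 1000) -> float:
    """Shrink a spam probability as observed traffic grows.

    At `half_point` distinct visitors the probability is halved (an assumed curve).
    """
    if visitor_count < 0:
        raise ValueError("visitor_count must be non-negative")
    return spam_probability / (1 + visitor_count / half_point)


visits = [("u1", "x.example"), ("u2", "x.example"), ("u1", "x.example"), ("u3", "y.example")]
print(distinct_visitors(visits, "x.example"))  # 2
```

Counting distinct users rather than raw page views makes it harder for a single actor to manufacture apparent popularity.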
- FIG. 5 illustrates an embodiment of the user feedback analyzer 500 .
- The user feedback analyzer 500 may include a source analysis component 510, a unique user volume analyzer 520, and a multiple query volume analyzer 530.
- The source analysis component 510 may determine the originating IP address of the user feedback.
- The unique user volume analyzer 520 may mark feedback as spam feedback if an excessive amount of feedback originates from a single user. For example, the unique user volume analyzer 520 may determine whether all user feedback for a result is coming from one or very few IP addresses, as determined by the source analysis component 510. If so, the unique user volume analyzer 520 determines that this is likely a spammer trying to vote a result down as spam.
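The unique user volume check can be approximated by asking how concentrated the reports for a result are across source IP addresses. In this sketch, `is_vote_stuffing` and its thresholds (`top_k`, `max_share`, `min_reports`) are hypothetical; "one or very few IP addresses" is modeled as the `top_k` busiest addresses accounting for most of the reports.

```python
from collections import Counter


def is_vote_stuffing(source_ips: list[str], top_k: int = 2,
                     max_share: float = 0.8, min_reports: int = 5) -> bool:
    """Flag a result's spam reports as likely spam voting.

    source_ips holds the originating IP of each report for one result. If the
    `top_k` busiest addresses account for more than `max_share` of all reports,
    the feedback is treated as coming from one or very few sources.
    """
    if len(source_ips) < min_reports:
        return False  # too few reports to judge either way
    busiest = sum(count for _, count in Counter(source_ips).most_common(top_k))
    return busiest / len(source_ips) > max_share


stuffed = ["198.51.100.7"] * 9 + ["203.0.113.9"]
diverse = [f"192.0.2.{i}" for i in range(10)]
print(is_vote_stuffing(stuffed), is_vote_stuffing(diverse))  # True False
```

Reports flagged this way would be disregarded before the remaining feedback is merged with the automated analysis.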
- The multiple query volume analyzer 530 determines whether a result is being marked as spam across multiple queries. If a result is marked as spam across multiple queries, this is a higher-confidence indication that the result is spam and will not create a positive user experience regardless of the query. Accordingly, the user feedback analyzer 500 uses spam feedback volume across unique users and spam feedback volume across multiple queries to mark a result as spam. The system's ability to detect and disregard spam voting ensures that the data is accurate.
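The combined use of unique-user volume and multiple-query volume can be folded into a single confidence value that grows with both the number of distinct reporting users and the number of distinct queries under which the result was reported. The saturating exponential curves, the scales, the weighting, and the name `feedback_confidence` are illustrative assumptions, not values from the disclosure; the input is assumed to have already had spam votes removed.

```python
import math


def feedback_confidence(reports: list[tuple[str, str]],
                        user_scale: float = 10.0, query_scale: float = 3.0,
                        user_weight: float = 0.6) -> float:
    """Confidence (0 up to, but not including, 1) that feedback correctly marks a result as spam.

    reports: (user_id, query) pairs for one result, after spam votes are removed.
    Breadth across distinct users and across distinct queries both raise confidence.
    """
    users = {user for user, _ in reports}
    queries = {query for _, query in reports}
    user_term = 1 - math.exp(-len(users) / user_scale)
    query_term = 1 - math.exp(-len(queries) / query_scale)
    return user_weight * user_term + (1 - user_weight) * query_term


one_query = [(f"u{i}", "hotel") for i in range(10)]
many_queries = [(f"u{i}", f"q{i % 5}") for i in range(10)]
print(feedback_confidence(many_queries) > feedback_confidence(one_query))  # True
```

With the same ten reporters, spreading the reports over five queries yields a higher confidence than concentrating them on one, matching the reasoning in the paragraph above.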
- Combining the determinations of the automated spam analysis module 400 and the user feedback analyzer 500 yields a reliable indication of whether a result's ranking should be lowered because the result is likely to be spam.
- The determinations of the automated spam analysis module 400 and the user feedback analyzer 500 are merged by the merging component 320.
- The merging component 320 delivers its conclusion to the indexing mechanism 330.
- The data provided by the automated spam analysis module 400 can also be used independently of any user feedback to filter out spam results.
- The merging component 320 may implement a spam scale and provide a number to the indexing mechanism 330, which indexes the result along with that number so that the search and ranking components 210 of the search engine 200 can adjust the rank of the result accordingly when the result is produced in response to a user query.
- The number produced by the merging component 320 indicates the likelihood that a result is a spam result.
- The number derived and delivered by the merging component 320 may affect a future rank of a given result in all queries.
- FIG. 6 is a flow chart illustrating a method in accordance with an embodiment of the invention.
- The method begins in step 600, and the system provides results along with feedback options in step 610.
- The results are processed by the automated spam analysis module 400.
- The system receives and analyzes user feedback.
- The merging component 320 receives information from the automated spam analysis module 400 and the user feedback analyzer 500 and merges the user feedback analysis with the result analysis to produce a number or other indicator of the likelihood that the result is a spam result. If no user feedback is available, the merging component 320 determines the indicator based on the automated determination alone.
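A merging step with the fallback described above might look like the following. This is a sketch under stated assumptions: both sources are assumed to express their findings as probabilities, and the feedback estimate is blended in with a weight equal to its confidence; `merge` and that weighting scheme are hypothetical.

```python
from typing import Optional


def merge(automated_p: float, feedback_p: Optional[float],
          feedback_confidence: float = 0.0) -> float:
    """Blend automated and feedback spam probabilities into one indicator.

    With no feedback (feedback_p is None) the automated estimate is used as-is;
    otherwise the feedback estimate is weighted by its confidence, clamped to [0, 1].
    """
    if feedback_p is None:
        return automated_p
    weight = min(max(feedback_confidence, 0.0), 1.0)
    return (1 - weight) * automated_p + weight * feedback_p


print(merge(0.3, None))      # 0.3: no feedback, automated result only
print(merge(0.2, 1.0, 0.0))  # 0.2: feedback present but carries no confidence
```

A linear blend keeps the output between the two inputs, so the merged indicator remains a value between zero and one whenever both inputs are.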
- The indexing mechanism 330 indexes the result along with the number delivered by the merging component 320.
- The search and ranking components 210 of the search engine 200 may then adjust the rank of the result.
- The indexed number is a number between zero and one that indicates a spam probability and is used to determine a ranking penalty for the result. The method ends in step 660.
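Because the indexed number is a probability between zero and one, one simple ranking penalty is to scale each result's relevance by one minus that probability. The sketch assumes this multiplicative form and a hypothetical `strength` parameter; `penalized_score` and `rerank` are illustrative names. Since the spam index is keyed by URL alone, the penalty follows the result into every query.

```python
def penalized_score(relevance: float, spam_probability: float,
                    strength: float = 1.0) -> float:
    """Scale relevance down in proportion to the indexed spam probability.

    strength (assumed, between 0 and 1) controls how harsh the penalty is.
    """
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must lie in [0, 1]")
    return relevance * (1.0 - strength * spam_probability)


def rerank(results: list[tuple[str, float]], spam_index: dict[str, float]) -> list[str]:
    """Order (url, relevance) pairs by penalized score.

    URLs missing from spam_index are treated as having spam probability 0.
    """
    ordered = sorted(
        results,
        key=lambda item: penalized_score(item[1], spam_index.get(item[0], 0.0)),
        reverse=True,
    )
    return [url for url, _ in ordered]


results = [("spam.example", 0.9), ("good.example", 0.7)]
print(rerank(results, {"spam.example": 0.5}))  # ['good.example', 'spam.example']
```

Here the more "relevant" spam result (0.9) drops to 0.45 after the penalty and falls below the legitimate result.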
- Embodiments of the invention implement a UI mechanism such as a toolbar button or other UI element on the search results page to allow a user to send information back to the search system, identifying a particular result as spam for a particular query.
- This information is aggregated across all user spam feedback, and the aggregated data is merged with the data coming from automated spam identification techniques. If both pieces of data agree that the result is spam, the result will be penalized in future result rankings so that its rank cannot be artificially inflated in the future. Integrating user feedback data and automated spam techniques provides more reliable data for arriving at a spam determination for each result.
Abstract
A system and method are provided for improving a user search experience by identifying spam results in a result set produced in response to a query. The system may include a user interface spam feedback mechanism for allowing a user to indicate that a given result is spam. The system may additionally include an automated spam identification mechanism for implementing automated techniques on the given result to determine whether the given result is spam. The system may further include a merging component for merging the determinations of the user interface spam feedback mechanism and the automated spam identification mechanism for deriving an indicator of the likelihood that a given result is spam.
Description
- This application is a continuation of and claims benefit of priority to U.S. patent application Ser. No. 11/117,568, filed on Apr. 29, 2005, which application is herein incorporated by reference.
- None.
- Embodiments of the present invention relate to a system and method for spam identification. More particularly, embodiments of the invention relate to facilitating user feedback to improve spam identification.
- Through the Internet and other networks, users have gained access to large amounts of information distributed over a large number of computers. In order to access the vast amounts of information, users typically use a web browser to access a search engine. The search engine responds to an input user query by returning one or more sources of information available over the Internet or other network.
- The search engine typically performs two functions: (1) finding matching results and (2) scoring the matching results to determine a display order. Search engines typically order or rank the results based on the similarity between the terms found in the accessed information sources and the terms input by the user. Results that contain the same words in the same order as the user's query are given a high rank and are placed near the top of the list presented to the user.
- Scoring performed by different search engines takes into account various factors including whether a match was found in the title, the importance of the match, the importance of a phrase match, and other factors determined by the search engine. Parameters that work well for one kind of search may not work well for all searches and parameters that work for some users may not work well for others.
- Web site owners are constantly trying to manipulate search engines in order to artificially inflate their web site rankings for specific search terms. Highly monetizable terms such as “travel”, “hotel”, “Viagra”, “dvd”, etc., are spammed in order to drive traffic to the web site. The search engines may give these web sites a high ranking and never learn that the web sites are spam sites. This spamming technique can lead to an inferior user experience on average and distort the true value of a web site to the user.
- The user base of searchers will generally be the best source for information pertaining to whether results are spam results. However, requests to end users to provide more feedback data have been met with limited success. The limited success stems from the fact that providing feedback is often cumbersome and time consuming for users. Furthermore, pre-configured feedback formats are often inadequate.
- Additionally, in considering user feedback, a system must be able to identify feedback from spammers in order to prevent such feedback from artificially lowering rankings of competitors' websites.
- User satisfaction is a critical success factor for a search engine. Spam results significantly decrease the quality of the user experience. Accordingly, a solution is needed that facilitates identification and filtering of spam results.
- Embodiments of the present invention include a method for improving a user search experience by identifying any spam results in a result set produced in response to a query. The method may include receiving user feedback indicating whether a given result is spam and implementing automated spam identification techniques on the given result. The method may additionally include merging data obtained from the user feedback and the automated spam identification techniques to obtain an indicator for the given result, the indicator providing a likelihood that the given result is spam.
- In an additional aspect, a method is provided for improving a user search experience by identifying any spam results in a result set produced in response to a query. The method may include providing a user interface spam feedback mechanism for allowing a user to indicate that a given result is spam and implementing the received feedback to alter a future ranking of the given result based on the user feedback. The method may additionally include determining whether user feedback is spam feedback.
- In yet an additional aspect, a system is provided for improving a user search experience by identifying spam results in a result set produced in response to a query. The system may include a user interface spam feedback mechanism for allowing a user to indicate that a given result is spam and an automated spam identification mechanism for implementing automated techniques on the given result to determine whether the given result is spam. The system may additionally include a merging component for merging the determinations of the user interface spam feedback mechanism and the automated spam identification mechanism for deriving an indicator of the likelihood that a given result is spam.
- The present invention is described in detail below with reference to the attached drawings figures, wherein:
- FIG. 1 is a block diagram illustrating an overview of a system in accordance with an embodiment of the invention;
- FIG. 2 is a block diagram illustrating a computerized environment in which embodiments of the invention may be implemented;
- FIG. 3 is a block diagram illustrating components of a spam analysis system in accordance with an embodiment of the invention;
- FIG. 4 is a block diagram illustrating an automated spam analysis module in accordance with an embodiment of the invention;
- FIG. 5 is a block diagram illustrating a user feedback analyzer in accordance with an embodiment of the invention; and
- FIG. 6 is a flow chart illustrating a method for analyzing results in accordance with an embodiment of the invention.
- A system and method are provided for identifying results produced by a search engine as spam. The system and method utilize a combination of automated spam identification techniques and user feedback to identify results as spam and adjust result rankings accordingly. As illustrated in FIG. 1, a plurality of user computers 10 may be connected over a network 20 with a search system 200. The search system 200 may respond to a user query by searching a plurality of information sources 30. The search system 200 may also be connected with an advertising system 260 and a spam analysis system 300. The advertising system 260 may store information pertaining to advertiser bids on keywords and may access stored advertisements. The spam analysis system 300 may utilize information from the advertising system 260 and the search system 200 to detect spam results.
- The search system 200 may include search and ranking components 210, a crawler 220, an index 230, user interaction components 240, and a cache 250. In operation, the crawler 220 may traverse the information sources 30 and store results, indexed by keyword, in the index 230. The cache 250 may store frequently accessed results so that the search system 200 operates efficiently. The search and ranking components 210 may locate and rank results based on an input query.
- The user interaction components 240 may obtain user feedback pertaining to spam and deliver the feedback to the spam analysis system 300. The spam analysis system 300 may accumulate user feedback for subsequent use by the search system 200 in optimizing future search results.
- Although the aforementioned components are shown as integrated with the search system 200, one or more of them may exist as separate and discrete units or systems. The search system 200 may include additional known components, which are omitted for simplicity.
- As set forth above, optimizing search result ranking is challenging because results are difficult to evaluate accurately. Through the user interaction components 240 and the spam analysis system 300, embodiments of the invention provide a friendly interface. This interface allows highly actionable feedback to be gathered on a large scale from willing users. In embodiments of the invention, the user interaction components 240 enable users to indicate which results are spam for their specific queries. Unfortunately, this technique also invites artificial rank manipulation. A spammer can use the mechanism to vote for his or her own spam site as a good site and to mark a competitor's site as spam. Accordingly, the spam analysis system 300 includes components for detecting false spam feedback.
- Embodiments of the invention implement a user interface (UI) mechanism, such as a toolbar button or another UI element on a search results page. This mechanism allows a user to send information back to the search system 200 identifying a particular result as spam for a particular query. The spam analysis system 300 aggregates all user spam feedback and merges it with data coming from the automated spam analysis module 400. If both sources agree that the result is spam, the spam analysis system 300 may ensure that the result is penalized in future rankings. This prevents artificial rank inflation of spam.
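The aggregation and agreement rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation. The names `SpamReport`, `aggregate_reports`, and `should_penalize` are hypothetical. So is the `min_votes` threshold. Each (query, result) pair counts distinct users rather than raw clicks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpamReport:
    """One hypothetical 'this result is spam' submission."""
    query: str
    url: str
    user_id: str


def aggregate_reports(reports):
    """Count distinct users who flagged each (query, url) pair as spam."""
    voters = defaultdict(set)
    for report in reports:
        voters[(report.query, report.url)].add(report.user_id)
    return {pair: len(users) for pair, users in voters.items()}


def should_penalize(query, url, vote_counts, automated_flags, min_votes=3):
    """Penalize only when aggregated user feedback and the automated check agree.

    min_votes is an illustrative assumption, not a value from the patent.
    """
    users_say_spam = vote_counts.get((query, url), 0) >= min_votes
    automation_says_spam = automated_flags.get(url, False)
    return users_say_spam and automation_says_spam


reports = [SpamReport("hotel", "http://spam.example", f"user{i}") for i in range(3)]
reports.append(SpamReport("hotel", "http://spam.example", "user0"))  # repeat vote
counts = aggregate_reports(reports)
print(counts[("hotel", "http://spam.example")])  # 3
print(should_penalize("hotel", "http://spam.example", counts,
                      {"http://spam.example": True}))  # True
```

Requiring agreement means neither source alone can demote a result under this sketch. A burst of user votes without automated support is not enough. Likewise, an automated flag without corroborating votes is not enough.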
- FIG. 2 illustrates an example of a suitable computing system environment 100 on which the system for spam identification may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.
- The invention is described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
- With reference to FIG. 2, the exemplary system 100 for implementing the invention includes a general-purpose computing device in the form of a computer 110. The computer 110 includes a processing unit 120, a system memory 130, and a system bus 121 that couples various system components, including the system memory, to the processing unit 120.
- Computer 110 typically includes a variety of computer readable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS) is typically stored in ROM 131. The BIOS contains the basic routines that help to transfer information between elements within computer 110, such as during start-up. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, FIG. 2 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.
- The computer 110 may also include other removable/nonremovable, volatile/nonvolatile computer storage media. By way of example only, FIG. 2 illustrates:
  - a hard disk drive 141 that reads from or writes to nonremovable, nonvolatile magnetic media;
  - a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152; and
  - an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD ROM or other optical media.

  Other removable/nonremovable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140. The magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.
- The drives and their associated computer storage media discussed above and illustrated in FIG. 2 provide storage of computer readable instructions, data structures, program modules and other data for the computer 110. In FIG. 2, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.
- A user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus. They may instead be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196. These may be connected through an output peripheral interface 195.
- The computer 110 operates in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer and typically includes many or all of the elements described above relative to the computer 110. Only a memory storage device 181 has been illustrated in FIG. 2. The logical connections depicted in FIG. 2 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks.
- When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160 or another appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 2 illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary, and other means of establishing a communications link between the computers may be used.
- Although many other internal components of the computer 110 are not shown, those of ordinary skill in the art will appreciate that such components and their interconnection are well known. Accordingly, additional details concerning the internal construction of the computer 110 need not be disclosed in connection with the present invention.
- As set forth above, FIG. 1 illustrates a system for evaluating whether results produced by a search engine are spam results. The system and method utilize a combination of automated spam identification techniques and user feedback to identify results as spam and adjust result rankings accordingly. As illustrated in FIG. 1, user computers 10 may be connected over the network 20 with the search system 200. As described above with respect to FIG. 2, the network 20 may be any of a number of different types of networks, such as the Internet.
- The search system 200 may also be connected with the advertising system 260 and the spam analysis system 300. The advertising system 260 may store information pertaining to advertiser bids on keywords and may access stored advertisements. The information may include keyword monetization values that are based on advertiser bids. The spam analysis system 300 may utilize information from the advertising system 260 and the search system 200 to identify spam results and address them appropriately.
- The user interaction components 240 of the search system 200 may receive user input regarding spam results and deliver this input to the spam analysis system 300. In embodiments of the invention, the user interaction components 240 are implemented as a UI mechanism such as a toolbar button or another UI element on the search results page. A user can send information back to the search system 200 identifying a particular result as spam for a particular query.
- For example, each result on a search result page may include an adjacent “feedback” link. The user may click the “feedback” link next to any result. A form may then open that permits the user to mark the result as spam for the query. In a further embodiment, the toolbar may be equipped with a button capable of marking any displayed result, or the currently shown web page, as spam. The user interaction components 240 are configured to provide a simple, non-intrusive technique for facilitating user feedback on spam. As further illustrated below, the technique also ensures that the system recognizes and reacts appropriately to feedback from spammers.
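The per-result "feedback" form described above might hand a report to a back-end handler like the hypothetical one sketched below. Several details here are assumptions, not details from the patent: the function name, the record fields, the lower-casing of the query, and the check that the URL was actually displayed for the query. The source IP is kept because the source analysis component described later works from originating IP addresses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SpamFeedback:
    query: str
    result_url: str
    source_ip: str          # consumed later by source analysis
    submitted_at: datetime


def record_spam_feedback(query, result_url, source_ip, displayed_urls, store):
    """Hypothetical handler behind the per-result "feedback" form.

    Accepts a report only if the URL was actually shown for this query, then
    appends a normalized record to `store` (any list-like sink).
    """
    if result_url not in displayed_urls:
        raise ValueError(f"{result_url} was not displayed for {query!r}")
    feedback = SpamFeedback(
        query=query.strip().lower(),
        result_url=result_url,
        source_ip=source_ip,
        submitted_at=datetime.now(timezone.utc),
    )
    store.append(feedback)
    return feedback


store = []
fb = record_spam_feedback("  Hotel ", "http://spam.example", "203.0.113.7",
                          {"http://spam.example", "http://good.example"}, store)
print(fb.query, len(store))  # hotel 1
```

Rejecting reports about URLs that were never shown is one cheap way to keep the feedback tied to a real result page. It is not a substitute for the spam-feedback analysis the patent describes.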
- FIG. 3 is a block diagram illustrating an embodiment of the spam analysis system 300. The spam analysis system 300 may include a user feedback aggregator 310, a user feedback analyzer 500, an automated spam analysis module 400, a merging component 320, and an indexing mechanism 330. In operation, the spam analysis system 300 may obtain results 270 and user feedback 280 from the search system 200. The user feedback aggregator 310 aggregates feedback across multiple users and delivers it to the user feedback analyzer 500. The user feedback analyzer 500 includes algorithms for analyzing user feedback, as will be further explained below.
- In addition to the user feedback, results produced by the search system 200 are delivered to the spam analysis system 300. The automated spam analysis module 400 analyzes the search results 270 for spam. The merging component 320 merges the determinations of the automated spam analysis module 400 and the user feedback analyzer 500. It then generates an indicator of the likelihood that a given result is a spam result. Finally, the indexing mechanism 330 may index the result along with this indicator. The indexing mechanism 330 also receives information from the merging component 320 indicating whether the automated spam analysis module 400 and the user feedback analyzer 500 reached the same conclusion about whether the result is spam. If both sources agree, the search and ranking components 210 may use the indexed spam indicator to penalize the result in future result rankings. This prevents its rank from being artificially inflated.
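The merging and indexing steps can be sketched as below. The patent does not specify how the two determinations are combined. A weighted average is used here purely as an assumption. The names `merge_spam_signals`, `sources_agree`, and `index_result` are hypothetical, as are the `feedback_weight` and `threshold` values. When no feedback exists, the automated estimate is used on its own.

```python
from typing import Optional


def merge_spam_signals(automated_p: float, feedback_p: Optional[float],
                       feedback_weight: float = 0.5) -> float:
    """Blend two spam probabilities in [0, 1] into one indicator.

    With no user feedback, the automated estimate is used on its own.
    """
    if feedback_p is None:
        return automated_p
    return (1.0 - feedback_weight) * automated_p + feedback_weight * feedback_p


def sources_agree(automated_p: float, feedback_p: Optional[float],
                  threshold: float = 0.5) -> bool:
    """True only when both sources independently call the result spam."""
    return (feedback_p is not None
            and automated_p >= threshold
            and feedback_p >= threshold)


def index_result(spam_index: dict, url: str, automated_p: float,
                 feedback_p: Optional[float] = None):
    """Store (indicator, agreement flag) for a URL, as an indexing step might."""
    entry = (merge_spam_signals(automated_p, feedback_p),
             sources_agree(automated_p, feedback_p))
    spam_index[url] = entry
    return entry


index = {}
score, agreed = index_result(index, "http://spam.example", 0.8, 0.6)
print(round(score, 2), agreed)                       # 0.7 True
print(index_result(index, "http://new.example", 0.3))  # (0.3, False)
```

Storing the agreement flag next to the score lets the ranking stage apply the stronger penalty only when both sources concur, as the paragraph above describes.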
- FIG. 4 is a block diagram illustrating an embodiment of the automated spam analysis module 400. The automated spam analysis module 400 may include a characteristic analyzer 402, a query independent rank analysis mechanism 410, a monetization analysis mechanism 420, and a popularity analysis mechanism 430. The characteristic analyzer 402 may examine features of a result such as:
  - how many advertisements are included on a website;
  - whether keyword stuffing appears to occur within the referenced result; and
  - whether the result appears to belong to a group of results sharing the same IP address that tend to be spammer pages.

  Based on these characteristics, the characteristic analyzer 402 may determine whether a result is likely to be a spam result. This determination may be used in combination with other automated determinations.
- The query independent rank analysis mechanism 410 may consider the query independent rank of each result, as determined by a known technique such as the number of links to the result. The monetization analysis mechanism 420 may consider the monetization value of the query terms, based on monetization data from the advertising system 260. This data includes clickthrough rates on sponsored sites for the input query and bid rates for the query terms leading to the result. For example, if a query is non-commercial, such as “Carnegie Mellon University”, the automated spam analysis module 400 might be less aggressive at finding spam. However, if a query is highly commercial, such as “hotel”, advertisers may be bidding highly to have their advertisements shown, which also makes the query an attractive target for spammers. Accordingly, the automated spam analysis module 400 may be more aggressive about filtering out spam.
- The popularity analysis mechanism 430 determines the popularity of the results by examining traffic to the website referenced by each result. The popularity analysis mechanism 430 may operate through the toolbar by capturing the URLs each user visits. If data collected from multiple user toolbars indicates that many users visit a particular result, the automated spam analysis module 400 decreases the probability that the result is spam.
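The automated signals above can be combined roughly as follows. This is a sketch under stated assumptions. All thresholds, weights, and the names `ResultFeatures`, `monetization_value`, `characteristic_score`, and `spam_probability` are hypothetical. What it illustrates is the direction of each effect from the text:
- ad-heavy, keyword-stuffed, or weakly linked pages score higher;
- popular pages score lower; and
- non-commercial (low-monetization) queries are treated less aggressively.

```python
from dataclasses import dataclass


@dataclass
class ResultFeatures:
    ad_count: int            # advertisements on the referenced page
    keyword_density: float   # fraction of page words repeating query terms
    inbound_links: int       # stand-in for query-independent rank
    daily_visitors: int      # traffic observed through toolbars


def monetization_value(avg_bid_usd: float, sponsored_ctr: float,
                       max_bid_usd: float = 10.0) -> float:
    """Map bid rate and sponsored clickthrough rate to [0, 1] (illustrative)."""
    bid_part = min(max(avg_bid_usd, 0.0) / max_bid_usd, 1.0)
    ctr_part = min(max(sponsored_ctr, 0.0), 1.0)
    return 0.5 * bid_part + 0.5 * ctr_part


def characteristic_score(f: ResultFeatures) -> float:
    """Characteristic and query-independent rank evidence, capped at 1."""
    score = 0.0
    if f.ad_count > 10:
        score += 0.3
    if f.keyword_density > 0.2:    # possible keyword stuffing
        score += 0.4
    if f.inbound_links < 5:        # weakly linked page
        score += 0.2
    return min(score, 1.0)


def spam_probability(f: ResultFeatures, monetization: float) -> float:
    p = characteristic_score(f)
    if f.daily_visitors > 10_000:  # popular destinations are less likely spam
        p *= 0.5
    # Less aggressive on non-commercial queries: lower monetization, lower p.
    return p * (0.5 + 0.5 * monetization)


page = ResultFeatures(ad_count=15, keyword_density=0.3, inbound_links=2,
                      daily_visitors=100)
hotel = monetization_value(avg_bid_usd=8.0, sponsored_ctr=0.6)        # commercial
university = monetization_value(avg_bid_usd=0.0, sponsored_ctr=0.0)   # non-commercial
print(spam_probability(page, hotel) > spam_probability(page, university))  # True
```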
- FIG. 5 illustrates an embodiment of the user feedback analyzer 500. The user feedback analyzer 500 may include a source analysis component 510, a unique user volume analyzer 520, and a multiple query volume analyzer 530.
- The source analysis component 510 may determine the originating IP address of the user feedback. The unique user volume analyzer 520 may mark feedback as spam feedback if excessive feedback originates with a single user. For example, the unique user volume analyzer 520 may determine whether all user feedback for a result is coming from one or very few IP addresses, as determined by the source analysis component 510. If so, the unique user volume analyzer 520 determines that a spammer is likely trying to vote the result down.
- The multiple query volume analyzer 530 determines whether a result is being marked as spam across multiple queries. If so, there is higher confidence that the result is spam and will not create a positive user experience regardless of the query. Accordingly, the user feedback analyzer 500 uses two signals to mark a result as spam: spam feedback volume across unique users and spam feedback volume across multiple queries. Because the system can detect and disregard spam voting, the feedback data it retains is more accurate.
- Combining the determinations of the automated spam analysis module 400 and the user feedback analyzer 500 yields a reliable indication of whether a result's ranking should be lowered because the result is likely to be spam. As set forth above, these determinations are merged by the merging component 320, which delivers its conclusion to the indexing mechanism 330. When users provide no feedback, the data provided by the automated spam analysis module 400 can be used independently to filter out spam results. The merging component 320 may implement a spam scale and provide a number to the indexing mechanism 330. The indexing mechanism 330 indexes the result along with that number. The search and ranking components 210 of the search system 200 can then adjust the rank of the result accordingly when the result is produced in response to a user query. The number produced by the merging component 320 indicates the likelihood that a result is a spam result, and it may affect the future rank of the given result in all queries.
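The two checks performed by the user feedback analyzer can be sketched in a single pass over the reports. This is an assumption-laden illustration, not the patent's algorithm. The function name is hypothetical. The `(query, url, source_ip)` tuple format, the `min_distinct_ips` and `max_share_per_ip` thresholds, and the confidence formula are also hypothetical. Reports concentrated on one or a few IPs are discarded as likely vote brigading. For the remaining URLs, confidence grows with the number of distinct queries.

```python
from collections import Counter, defaultdict


def analyze_feedback(reports, min_distinct_ips=3, max_share_per_ip=0.5):
    """Score spam reports per URL after discarding suspected vote brigading.

    reports: iterable of (query, url, source_ip) tuples.
    Returns {url: confidence in [0, 1]} for URLs whose feedback looks genuine.
    """
    ips_per_url = defaultdict(Counter)
    queries_per_url = defaultdict(set)
    for query, url, ip in reports:
        ips_per_url[url][ip] += 1
        queries_per_url[url].add(query)

    confidence = {}
    for url, ip_counts in ips_per_url.items():
        total = sum(ip_counts.values())
        top_share = max(ip_counts.values()) / total
        # Unique user volume: reports from one or very few IPs look like a
        # spammer voting a result down, so they are ignored.
        if len(ip_counts) < min_distinct_ips or top_share > max_share_per_ip:
            continue
        # Multiple query volume: reports across more queries raise confidence.
        extra_queries = len(queries_per_url[url]) - 1
        confidence[url] = min(1.0, 0.5 + 0.1 * extra_queries)
    return confidence


genuine = [("hotel", "a.example", "198.51.100.1"),
           ("cheap hotel", "a.example", "198.51.100.2"),
           ("motel", "a.example", "198.51.100.3")]
brigade = [("hotel", "b.example", "203.0.113.9")] * 10
print(sorted(analyze_feedback(genuine + brigade)))  # ['a.example']
```

IP addresses are only a proxy for unique users, as in the source analysis described above. Shared or rotating addresses would need further handling in practice.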
- FIG. 6 is a flow chart illustrating a method in accordance with an embodiment of the invention. The method proceeds as follows:
  - The method begins in step 600.
  - In step 610, the system provides results along with feedback options.
  - In step 620, the automated spam analysis module 400 processes the results.
  - In step 630, the system receives and analyzes user feedback.
  - In step 640, the merging component 320 receives information from the automated spam analysis module 400 and the user feedback analyzer 500. It merges the user feedback analysis with the result analysis to produce a number or other indicator of the likelihood that the result is a spam result. If no user feedback is available, the merging component 320 determines the indicator from the automated determination alone.
  - In step 650, the indexing mechanism 330 indexes the result along with the number delivered by the merging component 320. Ultimately, the search and ranking components 210 of the search system 200 may adjust the rank of the result. In embodiments of the invention, the indexed number is between zero and one, indicates a spam probability, and is used to determine a ranking penalty for the result.
  - The method ends in step 660.
- Embodiments of the invention implement a UI mechanism, such as a toolbar button or other UI element on the search results page. This mechanism allows a user to send information back to the search system, identifying a particular result as spam for a particular query. On the back end, this information is aggregated across all user spam feedback and merged with the data coming from automated spam identification techniques. If both sources agree that the result is spam, the result is penalized in future result rankings so that its rank cannot be artificially inflated. Integrating user feedback data with automated spam techniques provides more reliable data for arriving at a spam determination for each result.
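The final ranking adjustment can be sketched as below, using an indexed spam probability between zero and one. The patent states only that this number determines a ranking penalty. The multiplicative discount, the `max_penalty` value, and the function name `rerank` are assumptions for illustration.

```python
def rerank(results, spam_index, max_penalty=0.8):
    """Order results by relevance discounted by indexed spam probability.

    results: list of (url, relevance) pairs from the ranking stage.
    spam_index: url -> spam probability in [0, 1]; missing URLs count as 0.
    max_penalty: fraction of relevance removed at probability 1 (illustrative).
    """
    def adjusted(item):
        url, relevance = item
        p = spam_index.get(url, 0.0)
        return relevance * (1.0 - max_penalty * p)

    return [url for url, _ in sorted(results, key=adjusted, reverse=True)]


results = [("spam.example", 0.9), ("good.example", 0.8)]
print(rerank(results, {}))                     # ['spam.example', 'good.example']
print(rerank(results, {"spam.example": 0.9}))  # ['good.example', 'spam.example']
```

Keeping `max_penalty` below 1 means even a high-probability result is demoted rather than removed outright. This is one possible design choice, and the patent leaves it open.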
- While particular embodiments of the invention have been illustrated and described in detail herein, it should be understood that various changes and modifications might be made to the invention without departing from the scope and intent of the invention. The embodiments described herein are intended in all respects to be illustrative rather than restrictive. Alternate embodiments will become apparent to those skilled in the art to which the present invention pertains without departing from its scope.
- From the foregoing it will be seen that this invention is one well adapted to attain all the ends and objects set forth above, together with other advantages, which are obvious and inherent to the system and method. It will be understood that certain features and sub-combinations are of utility and may be employed without reference to other features and sub-combinations. This is contemplated and within the scope of the appended claims.
Claims (20)
1. A method for improving a user search experience by identifying spam results in a result set produced in response to a query, the method comprising:
receiving, at a computing device, a search query;
determining a monetization value for the search query;
returning a plurality of results that are responsive to the search query, wherein the plurality includes an individual result that is responsive to the search query;
calculating a probability that the individual result is spam using one or more automated spam identification techniques, wherein at least one of the one or more automated spam identification techniques uses the monetization value of the query to calculate the probability, wherein the probability that the individual result is spam is lower when the monetization value of the search query is lower; and
displaying the individual result and other results from the plurality in an order that is based on the probability that the individual result is spam.
2. The method of claim 1, wherein the monetization value is calculated using information from an online advertising exchange.
3. The method of claim 1, wherein the monetization value is based on bid rates for one or more terms in the search query, wherein higher bid rates indicate a higher monetization value for the search query.
4. The method of claim 1, wherein the monetization value is based on clickthrough rates on sponsored sites for one or more terms in the search query, wherein higher clickthrough rates indicate a higher monetization value for the search query.
5. The method of claim 1, further comprising receiving user feedback identifying the individual result as spam, wherein the user feedback is received through an input displayed when the individual result is presented within a plurality of search results in response to one or more queries.
6. The method of claim 5, further comprising analyzing the user feedback for the individual result across multiple queries, wherein a confidence that the individual result is spam is increased when the user feedback is received in connection with presenting the individual result in response to the multiple queries.
7. The method of claim 6, further comprising merging data obtained from the user feedback and the one or more automated spam identification techniques to calculate the probability that the individual result is spam.
8. The method of claim 1, wherein the one or more automated spam identification techniques comprise a popularity analysis that determines a popularity of the individual result by examining traffic to a website referenced by the individual result, wherein a high popularity indicates a lower probability that the individual result is spam.
9. The method of claim 8, wherein the popularity analysis is based on traffic data obtained from search toolbars utilized by a plurality of users, wherein when the data indicates that many users visit the individual result, then the probability that the individual result is spam is decreased.
10. One or more computer-readable media including computer-executable instructions, that when executed by a computing device, performs a method for improving a user search experience by identifying spam results in a result set produced in response to a query, the method comprising:
receiving, at the computing device, a search query;
determining a monetization value for the search query;
returning a plurality of search results that are responsive to the search query, wherein the plurality includes an individual result that is responsive to the search query, and wherein the individual result is a website;
calculating a probability that the individual result is spam using one or more automated spam identification techniques, wherein at least one of the one or more automated spam identification techniques uses the monetization value of the query to calculate the probability, wherein the probability that the individual result is spam is lower when the monetization value of the search query is lower; and
displaying the individual result and other results from the plurality in an order that is based on the probability that the individual result is spam.
11. The media of claim 10, wherein the monetization value is calculated using information from an online advertising exchange.
12. The media of claim 10, wherein the monetization value is based on bid rates for one or more terms in the search query, wherein higher bid rates indicate a higher monetization value for the search query.
13. The media of claim 10, wherein the monetization value is based on clickthrough rates on sponsored sites for one or more terms in the search query, wherein higher clickthrough rates indicate a higher monetization value for the search query.
14. The media of claim 10, wherein the one or more automated spam identification techniques comprise a popularity analysis that determines a popularity of the individual result by examining traffic to the individual result, wherein a high popularity indicates a lower probability that the individual result is spam.
15. The media of claim 14, wherein the popularity analysis is based on traffic data obtained from search toolbars utilized by a plurality of users, wherein when the data indicates that many users visit the individual result, then the probability that the individual result is spam is decreased.
16. The media of claim 10, wherein the one or more automated spam identification techniques comprise analyzing how many advertisements are included on the individual result.
17. One or more computer-readable media including computer-executable instructions, that when executed by a computing device, performs a method for improving a user search experience by identifying spam results in a result set produced in response to a query, the method comprising:
receiving, at the computing device, a search query;
determining a monetization value for the search query;
returning a plurality of results that are responsive to the search query, wherein the plurality includes an individual result that is responsive to the search query;
ranking the individual result relative to other results in the plurality based on responsiveness to the search query;
calculating a probability that the individual result is spam using one or more automated spam identification techniques;
adjusting a rank of the individual result based on the probability and the monetization value of the search query, wherein less adjustment to the rank is made when the monetization value of the search query is low;
displaying the individual result and other results in an order that is based on the probability that the individual result is spam.
18. The media of claim 17, wherein the monetization value is calculated using information from an online advertising exchange.
19. The media of claim 17, wherein the monetization value is based on bid rates for one or more terms in the search query, wherein higher bid rates indicate a higher monetization value for the search query.
20. The media of claim 17, wherein the monetization value is based on clickthrough rates on sponsored sites for one or more terms in the search query, wherein higher clickthrough rates indicate a higher monetization value for the search query.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/647,110 US20100100564A1 (en) | 2005-04-29 | 2009-12-24 | System and method for spam identification |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/117,568 US7660792B2 (en) | 2005-04-29 | 2005-04-29 | System and method for spam identification |
US12/647,110 US20100100564A1 (en) | 2005-04-29 | 2009-12-24 | System and method for spam identification |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/117,568 Continuation US7660792B2 (en) | 2005-04-29 | 2005-04-29 | System and method for spam identification |
Publications (1)
Publication Number | Publication Date |
---|---|
US20100100564A1 true US20100100564A1 (en) | 2010-04-22 |
Family
ID=37235668
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/117,568 Expired - Fee Related US7660792B2 (en) | 2005-04-29 | 2005-04-29 | System and method for spam identification |
US12/647,110 Abandoned US20100100564A1 (en) | 2005-04-29 | 2009-12-24 | System and method for spam identification |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/117,568 Expired - Fee Related US7660792B2 (en) | 2005-04-29 | 2005-04-29 | System and method for spam identification |
Country Status (1)
Country | Link |
---|---|
US (2) | US7660792B2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8682990B2 (en) | 2011-10-03 | 2014-03-25 | Microsoft Corporation | Identifying first contact unsolicited communications |
WO2015167999A1 (en) * | 2014-04-28 | 2015-11-05 | Quixey, Inc. | Application spam detector |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7693830B2 (en) | 2005-08-10 | 2010-04-06 | Google Inc. | Programmable search engine |
US7743045B2 (en) | 2005-08-10 | 2010-06-22 | Google Inc. | Detecting spam related and biased contexts for programmable search engines |
US7716199B2 (en) | 2005-08-10 | 2010-05-11 | Google Inc. | Aggregating context data for programmable search engines |
US7769751B1 (en) * | 2006-01-17 | 2010-08-03 | Google Inc. | Method and apparatus for classifying documents based on user inputs |
US20080033797A1 (en) * | 2006-08-01 | 2008-02-07 | Microsoft Corporation | Search query monetization-based ranking and filtering |
US20080086555A1 (en) * | 2006-10-09 | 2008-04-10 | David Alexander Feinleib | System and Method for Search and Web Spam Filtering |
US7693833B2 (en) | 2007-02-01 | 2010-04-06 | John Nagle | System and method for improving integrity of internet search |
US7756987B2 (en) * | 2007-04-04 | 2010-07-13 | Microsoft Corporation | Cybersquatter patrol |
US7930303B2 (en) * | 2007-04-30 | 2011-04-19 | Microsoft Corporation | Calculating global importance of documents based on global hitting times |
US7873635B2 (en) * | 2007-05-31 | 2011-01-18 | Microsoft Corporation | Search ranger system and double-funnel model for search spam analyses and browser protection |
US8667117B2 (en) * | 2007-05-31 | 2014-03-04 | Microsoft Corporation | Search ranger system and double-funnel model for search spam analyses and browser protection |
US9430577B2 (en) * | 2007-05-31 | 2016-08-30 | Microsoft Technology Licensing, Llc | Search ranger system and double-funnel model for search spam analyses and browser protection |
US8219549B2 (en) * | 2008-02-06 | 2012-07-10 | Microsoft Corporation | Forum mining for suspicious link spam sites detection |
US8473838B2 (en) * | 2008-04-16 | 2013-06-25 | Google Inc. | Website advertising inventory |
US9003308B2 (en) * | 2008-04-16 | 2015-04-07 | Google Inc. | Interactive placement ordering |
US20090287655A1 (en) * | 2008-05-13 | 2009-11-19 | Bennett James D | Image search engine employing user suitability feedback |
US8316021B2 (en) * | 2010-06-30 | 2012-11-20 | Emergency 24, Inc. | Methods and systems for enhanced placement search engine based on user usage |
US8707441B1 (en) * | 2010-08-17 | 2014-04-22 | Symantec Corporation | Techniques for identifying optimized malicious search engine results |
US20130086635A1 (en) * | 2011-09-30 | 2013-04-04 | General Electric Company | System and method for communication in a network |
US8621623B1 (en) | 2012-07-06 | 2013-12-31 | Google Inc. | Method and system for identifying business records |
CN104077530A (en) | 2013-03-27 | 2014-10-01 | International Business Machines Corporation | Method and device used for evaluating safety of data access sentence |
CN103970832A (en) * | 2014-04-01 | 2014-08-06 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for recognizing spam |
CN110955778A (en) * | 2019-12-13 | 2020-04-03 | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences | Junk short message identification method and system based on differential privacy joint learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6078866A (en) * | 1998-09-14 | 2000-06-20 | Searchup, Inc. | Internet site searching and listing service based on monetary ranking of site listings |
US20050131758A1 (en) * | 2003-12-11 | 2005-06-16 | Desikan Pavan K. | Systems and methods detecting for providing advertisements in a communications network |
US20050144067A1 (en) * | 2003-12-19 | 2005-06-30 | Palo Alto Research Center Incorporated | Identifying and reporting unexpected behavior in targeted advertising environment |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020087526A1 (en) * | 2000-04-21 | 2002-07-04 | Rao Dileep R. | Information search and retrieval system |
US7231395B2 (en) * | 2002-05-24 | 2007-06-12 | Overture Services, Inc. | Method and apparatus for categorizing and presenting documents of a distributed database |
US7555485B2 (en) * | 2002-08-22 | 2009-06-30 | Yahoo! Inc. | System and method for conducting an auction-based ranking of search results on a computer network |
US7283997B1 (en) * | 2003-05-14 | 2007-10-16 | Apple Inc. | System and method for ranking the relevance of documents retrieved by a query |
US7346839B2 (en) * | 2003-09-30 | 2008-03-18 | Google Inc. | Information retrieval based on historical data |
US20050165745A1 (en) * | 2004-01-13 | 2005-07-28 | International Business Machines Corporation | Method and apparatus for collecting user feedback based on search queries |
US20060149606A1 (en) * | 2005-01-05 | 2006-07-06 | Stottler Henke Associates, Inc. | System and method for agent assisted information retrieval |
US7406466B2 (en) * | 2005-01-14 | 2008-07-29 | Yahoo! Inc. | Reputation based search |
US7657520B2 (en) * | 2005-03-03 | 2010-02-02 | Google, Inc. | Providing history and transaction volume information of a content source to users |
- 2005-04-29: US application US11/117,568, published as US7660792B2 (not active; Expired - Fee Related)
- 2009-12-24: US application US12/647,110, published as US20100100564A1 (not active; Abandoned)
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8682990B2 (en) | 2011-10-03 | 2014-03-25 | Microsoft Corporation | Identifying first contact unsolicited communications |
US9596201B2 (en) | 2011-10-03 | 2017-03-14 | Microsoft Technology Licensing, Llc | Identifying first contact unsolicited communications |
US10091150B2 (en) | 2011-10-03 | 2018-10-02 | Microsoft Technology Licensing, Llc | Identifying first contact unsolicited communications |
WO2015167999A1 (en) * | 2014-04-28 | 2015-11-05 | Quixey, Inc. | Application spam detector |
US9432395B2 (en) | 2014-04-28 | 2016-08-30 | Quixey, Inc. | Application spam detector |
US9794284B2 (en) | 2014-04-28 | 2017-10-17 | Quixey, Inc. | Application spam detector |
Also Published As
Publication number | Publication date |
---|---|
US20060248072A1 (en) | 2006-11-02 |
US7660792B2 (en) | 2010-02-09 |
Similar Documents
Publication | Title |
---|---|
US7660792B2 (en) | System and method for spam identification |
US11206311B2 (en) | Method and system for measuring user engagement using click/skip in content stream |
US11188544B1 (en) | Modifying search result ranking based on implicit user feedback |
TWI636416B (en) | Method and system for multi-phase ranking for content personalization |
US11461336B2 (en) | Selecting between global and location-specific search results |
JP5513624B2 (en) | Retrieving information based on general query attributes |
US7840538B2 (en) | Discovering query intent from search queries and concept networks |
US7783632B2 (en) | Using popularity data for ranking |
JP4950448B2 (en) | System and method for ranking search results based on detected user preferences |
US7849089B2 (en) | Method and system for adapting search results to personal information needs |
US10354308B2 (en) | Distinguishing accessories from products for ranking search results |
US20140280890A1 (en) | Method and system for measuring user engagement using scroll dwell time |
US20070073708A1 (en) | Generation of topical subjects from alert search terms |
US8713028B2 (en) | Related news articles |
US20080065633A1 (en) | Job Search Engine and Methods of Use |
US20080091650A1 (en) | Augmented Search With Error Detection and Replacement |
US20140280550A1 (en) | Method and system for measuring user engagement from stream depth |
US20130110829A1 (en) | Method and Apparatus of Ranking Search Results, and Search Method and Apparatus |
US20060248066A1 (en) | System and method for optimizing search results through equivalent results collapsing |
US20080147669A1 (en) | Detecting web spam from changes to links of web sites |
KR20090084853A (en) | Mechanism for automatic matching of host to guest content via categorization |
CN108345601A (en) | Search result ordering method and device |
JP2011154467A (en) | Retrieval result ranking method and system |
US9262526B2 (en) | System and method for compiling search results using information regarding length of time users spend interacting with individual search results |
CN113535790A (en) | Collaborative recommendation optimization method and device, electronic equipment and readable storage medium |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: MICROSOFT CORPORATION, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: BREWER, BRETT D.; WATSON, ERIC B.; REEL/FRAME: 023700/0685. Effective date: 2005-06-15 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE |
AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MICROSOFT CORPORATION; REEL/FRAME: 034766/0001. Effective date: 2014-10-14 |