US20140108429A1 - System and method for detecting personal experience event reports from user generated internet content - Google Patents

System and method for detecting personal experience event reports from user generated internet content

Info

Publication number
US20140108429A1
Authority
US
United States
Prior art keywords
terms
personal experience
post
posts
database
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/106,880
Inventor
Roee Robert Sa'adon
Tsvi Rabkin
Michael Palei
Idan Amit
Itzchak Lichtenfeld
Assaf Yardeni
Michael Milman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Treato Ltd
Original Assignee
Treato Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F17/30867

Definitions

  • the present invention relates to Internet search engines generally and to customized search engines for user generated experience reports in particular.
  • the Internet contains a plethora of reports that are at least somewhat related to consumer products and services.
  • the sources for these reports are varied. For example, manufacturer/providers may provide information as part of their marketing efforts. Their competitors may provide conflicting information to promote competing products and services.
  • Nominally disinterested parties provide independent reviews, although such reviews are often prejudiced by concerns not readily apparent to the reader.
  • Such products and services are also often mentioned “by the way” as background for other subjects, making it difficult to weed out “true” reports from a multitude of “hits” received when using conventional Internet search engines.
  • the Internet also contains “forum” sites where users can post opinions and discuss various issues of interest. Some of the user posts on such sites constitute “personal experience” reports wherein consumers discuss their actual personal experiences using products and services. A typical such personal experience would be something like: “I used product X and my digestion improved immediately.” In such manner, forum sites may provide valuable firsthand information from actual consumers of products and services.
  • a method implementable on a computing device for scoring segments of Internet posts may include defining a set of indicating factors, where each indicating factor is associated with a possible feature in the segments, and where possible features affect a likelihood that the Internet posts represent a user generated product personal experience event report associated with a pre-defined search subject.
  • the method may include weighting the indicating factors in accordance with the likelihood, where each of the indicating factors is at least one of a negative and a positive value.
  • a method implementable on a computing device for scoring segments of Internet posts may include listing factors detected in text segments of the Internet posts, defining weights to be associated with each of the factors, where each defined weight reflects a value for an associated factor as a predictor of the segments representing a user generated product personal experience event report.
  • the method may additionally include multiplying a ratio of the weighted factors, divided by an overall number of words in the segment.
  • FIG. 1 is a block diagram of a novel user-generated personal experience retrieval system 100 , designed and operative in accordance with a preferred embodiment of the present invention
  • FIG. 2 is a block diagram of the segment analyzer of the embodiment of FIG. 1 ;
  • FIG. 3 is a block diagram of a novel process to be performed by the system of FIG. 1 ;
  • FIG. 4 is an illustration of an exemplary Internet post to be analyzed and processed by the system of FIG. 1 ;
  • FIGS. 5-7B are illustrations of exemplary scoring tables to be used during the process of FIG. 3 ;
  • FIG. 8 is a schematic diagram of a novel forum website selection utility, constructed and operative in accordance with a preferred embodiment of the present invention.
  • FIG. 9 is a block diagram of a novel process to be performed by the system of FIG. 8 .
  • An Internet user generated personal experience event report may be a statement written by users on an Internet platform (such as a message board), referring to their own experience with regard to a specific product or service.
  • a specialized search process may be configured to identify such reports related to a specific field of products and/or services in order to filter out “false hits” and extraneous information that may typically be retrieved by a search engine.
  • System 100 may comprise post collector 50 in communication with forums 20 on Internet 10 .
  • System 100 may also comprise segment analyzer 200 , scoring engine 300 and user search interface 350 .
  • system 100 may be configured to identify user-generated personal experience event reports that may be related to pharmaceutical products.
  • a typical subject for which there may be demand for collating and analyzing user-generated personal experience event reports may be pharmaceuticals.
  • potential users of pharmaceuticals may understandably wish to study personal experience event reports prior to beginning a treatment.
  • system 100 and its methods of operation may therefore be described hereinbelow in the context of a pharmaceutical based configuration.
  • the present invention may be configured for any suitable subject for which personal experience event reports may be posted on the Internet, for example, automobiles, airline travel, banking services, food and beverages, etc.
  • Post collector 50 may periodically collect posts from a “collection list” of chat forums 20 on Internet 10 .
  • the collected posts may be forwarded to segment analyzer 200 to identify segments of forum posts that may be likely to contain personal experience event reports regarding the subject for which system 100 may be configured.
  • segment analyzer may identify post segments that may be likely to contain personal experience event reports regarding the use of pharmaceuticals.
  • These segments may be forwarded to scoring engine 300 which may “score” the segments in terms of their likely relevance as personal reports. Scored segments may then be stored in personal experience database 110 along with addressing information, such as a uniform resource locator (URL) for the original post. Users may then use user search interface 350 to search database 110 for user-generated personal experience event reports regarding the products/services for which system 100 may be configured. For example, a user may search for event reports relating to “Drug A” in order to find out if anyone that had personally used Drug A had reported regarding its success and/or any side effects suffered when using it. The output of such a search may consist of a list of chat posts, sorted according to the score assigned by scoring engine 300 . It will be appreciated that the present invention may include any suitable implementation for user search interface 350 , such as, for example, a browser based utility for inputting search parameters and displaying links to related user generated personal experience event reports.
  • the collection list used by post collector 50 may include chat forums 20 deemed to be relevant to the subject for which system 100 may be configured. For example, if system 100 is configured for personal reports on pharmaceutical products, the collection list may include a list of chat forums 20 on which it may be likely that users may post personal experience event reports relating to their use of pharmaceutical products. It will be appreciated that post collector 50 may be configured to include any suitable method such as known in the art for “scraping” forum posts from the collection list. It will similarly be appreciated that post collector 50 may be configured to perform such “scraping” on an incremental basis to avoid reprocessing older posts.
  • the present invention may also include a novel pre-collection process for compiling the collection list for system 100 .
  • the present invention may include any suitable method for compiling the collection list, including manual inspection.
  • Segment analyzer 200 may comprise post filtering module 210 , anchor detection module 220 , basic segmentation unit 230 , density calculator 240 and segment optimizer 250 . Segment analyzer 200 may also comprise filter database 215 , anchor database 225 and terms database 235 , each of which may be referenced by the other elements of segment analyzer 200 .
  • FIG. 3 illustrates a novel post segmentation process 260 that may be executed by segment analyzer 200 to derive optimally segmented user-generated personal experience event reports from the posts collected by post collector 50 .
  • Post filtering module 210 may receive (step 262 ) posts from post collector 50 .
  • Post filtering module 210 may filter (step 264 ) these posts according to terms found in filter database 215 .
  • Filter database 215 may store a list of categorized relevant terms which module 210 may search for in each post. Depending on the configuration of system 100 , at least one term from a combination of some of the categories must be found in a post for that post to pass through step 264 .
  • the categories may include, for example, product/service name, indication of personal reference, and indication of personal experience.
  • the product/service name category may consist of names of product/services regarding which a user of system 100 may wish to search for personal experience event reports. It will be appreciated that other configurations for system 100 are included in the present invention.
  • the terms in the product/services name category may include a list of automobile makes, manufacturers and nicknames, such as, for example: “Corvette”, “Chevrolet”, “Chevy”, and “Vette”.
  • the category for indications of personal reference may include terms such as “I”, “my”, “me”, “mine”, “myself”, etc. that may indicate that the post refers to an actual personal experience.
  • the category for personal experience may include terms such as, for example, “I used”, “I bought”, “I had”, etc. that may indicate that the poster had an actual personal experience; that the report was not based on hearsay or opinion.
  • a post may have to contain at least one term from each of these categories in order to pass through step 264 .
  • Drug name (i.e. product/service name), indication of personal reference, indication of personal drug experience, symptom, and indication of personal symptom experience
  • personal symptom experience may be precise medical terms, such as, for example, “headache”, or alternatively they may also include user descriptions such as “my head exploded”.
  • Personal symptom experience terms may be indicative of the poster having a personal cause/reason for using the indicated drug, for example: “I suffered from”, “I have experienced”.
  • post filtering module may be configured to require terms from only four categories, wherein a term from only one of the personal experience and personal symptom experience categories may be required.
  • similar categories may be used to configure system 100 for non-pharmaceutical products and/or services. For example, if system 100 is configured for automobile research, the symptom category may be replaced by a “preference category” including terms such as “family car”, “sports car”, “road handling” or “seven seats”. Similarly, the personal symptom experience category may be replaced by a personal preference category including terms such as “I need a bigger car”, “I wanted a sports car” or “I value engine performance”.
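The category filter of step 264 can be illustrated with a minimal sketch. It assumes that every configured category must contribute at least one term; the category names and term lists below are hypothetical stand-ins for the contents of filter database 215, which the text does not enumerate.

```python
import re

# Hypothetical contents for filter database 215 (illustration only).
FILTER_TERMS = {
    "product_name": ["drug a", "drug b"],
    "personal_reference": ["i", "my", "me", "mine", "myself"],
    "personal_experience": ["i used", "i bought", "i had", "i took"],
}


def contains_term(text, term):
    """Return True if `term` occurs in `text` as a whole word or phrase."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def passes_filter(post, categories=FILTER_TERMS):
    """Pass a post only if it has at least one term from every category."""
    return all(
        any(contains_term(post, term) for term in terms)
        for terms in categories.values()
    )
```

Under these assumptions, a post such as "I took Drug A and my headache was gone" passes, while "Drug A is on sale now" is dropped because it has no personal reference or personal experience term.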
  • Anchor detection module 220 may detect (step 266 ) segment anchors in posts that contain all of the required term categories. Module 220 may reference database 225 for lists of segment anchor terms to match to terms in the posts. Segment anchors may represent a pair of term categories that may together define the personal experience event reports of interest for system 100 .
  • the segment anchors may be the drug name and symptom categories.
  • the segment anchors may be the drug name and personal symptom experience categories.
  • segment anchors for a pharmaceutical configuration may be terms from the drug name and symptom categories.
  • Database 225 may be populated by a publicly available database of drugs and symptoms.
  • Basic segmentation unit 230 may then segment (step 268 ) the posts based on the anchors identified in step 266 to find the minimal text segments in the post that have at least one term from each of the categories required for the filter process in step 264 .
  • Unit 230 may first search for the required terms between the identified anchors and may then incrementally search before and after the anchors one word at a time until at least one of the terms from all of the relevant categories may be identified in order to define basic segments.
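A rough sketch of this basic segmentation step follows. It assumes single-word terms, lower-cased tokens, and that the window grows by one word on both sides per iteration; the text does not state the order in which the words before and after the anchors are examined, so that choice and all names here are illustrative.

```python
def categories_found(words, categories):
    """Return the names of the categories that have a term among `words`."""
    return {
        name
        for name, terms in categories.items()
        if any(word in terms for word in words)
    }


def basic_segment(words, anchor_a, anchor_b, categories):
    """Grow a window around two anchor positions until every category is hit.

    Returns (start, end) word indices, inclusive, or None if the whole post
    never covers all categories.
    """
    start, end = min(anchor_a, anchor_b), max(anchor_a, anchor_b)
    required = set(categories)
    while True:
        if categories_found(words[start:end + 1], categories) == required:
            return start, end
        if start == 0 and end == len(words) - 1:
            return None  # nothing left to add
        start = max(0, start - 1)
        end = min(len(words) - 1, end + 1)
```

For the word list "yesterday i took druga and my headache vanished quickly" with anchors at "druga" and "headache", the window between the anchors lacks a personal experience term, so it is widened once and then contains "took".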
  • Density calculator 240 may reference terms database 235 to calculate (step 270 ) the density of relevant terms in each basic segment.
  • the density may be defined as the ratio of the relevant terms each multiplied by an associated weight stored in database 235 , divided by the overall number of words in the basic segment.
  • each term in database 235 may have a different defined weight that may reflect its value as a predictor of the likelihood that the post being analyzed may represent a user generated personal experience event report. Accordingly, the calculated density score may provide a measure of the amount of relevant information contained in the specified segment. It will be appreciated that any suitable method may be used to assign the weights. As will be described hereinbelow, in accordance with a preferred embodiment of the present invention, linear regressions may be run on a training set of data to derive these weights.
  • terms database 235 may also store other categories of terms that may also be used to assess the likelihood of a segment containing a valid user-generated personal experience event report.
  • terms database 235 may also store terms relating to a “negative” category. Terms such as “heard of”, “likely”, “I've been told”, “did not” may typically impact negatively on the likelihood that a given report is a true personal experience, and may therefore be significant when assessing a given segment at the next step of the process.
  • other categories may be added as well.
  • each term in such a category may be weighted to reflect its value as a predictor of the likelihood that the post being analyzed may represent a user generated personal experience event report.
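The density measure described above (the weights of the relevant terms found, summed and divided by the number of words in the segment) can be sketched as follows. The weights are invented for illustration, with the "negative" category term given a negative weight; they are not values from terms database 235.

```python
# Hypothetical term weights standing in for terms database 235.
TERM_WEIGHTS = {
    "druga": 3.0,
    "took": 2.0,
    "headache": 1.5,
    "i": 0.5,
    "my": 0.5,
    "heard": -2.0,  # "negative" category term
}


def density(words, weights=TERM_WEIGHTS):
    """Sum of the weights of the terms present, divided by the word count."""
    if not words:
        return 0.0
    return sum(weights.get(word, 0.0) for word in words) / len(words)
```

With these weights, "i took druga" is denser than "i heard druga", because the negative term pulls the weighted sum down while the word count stays the same.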
  • Segment optimizer 250 may incrementally check each word before and after the segment to find (step 272 ) the next term from database 235 . Density calculator 240 may then recalculate (step 274 ) the density as in step 270 . If the result is that density has increased (step 276 ), segment optimizer may again find (step 272 ) the next term. Steps 272 and 274 may be repeated until the density ceases to increase (step 276 ) at which point the final, presumably optimized, segment may be output by segment analyzer 200 .
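Steps 272 to 276 amount to a greedy expansion loop, sketched below. The sketch assumes single-word terms and, at each iteration, tries the nearest term before the segment and the nearest term after it and keeps whichever gives the higher density; the text does not specify how the two directions are arbitrated, so this is one possible reading. The weights in the usage note are hypothetical.

```python
def density(words, weights):
    """Weighted term sum divided by the number of words."""
    if not words:
        return 0.0
    return sum(weights.get(word, 0.0) for word in words) / len(words)


def next_term_before(words, start, weights):
    """Index of the nearest known term left of `start`, or None."""
    for i in range(start - 1, -1, -1):
        if words[i] in weights:
            return i
    return None


def next_term_after(words, end, weights):
    """Index of the nearest known term right of `end`, or None."""
    for j in range(end + 1, len(words)):
        if words[j] in weights:
            return j
    return None


def optimize_segment(words, start, end, weights):
    """Expand (start, end) term by term while the density keeps increasing."""
    best = density(words[start:end + 1], weights)
    while True:
        candidates = []
        i = next_term_before(words, start, weights)
        if i is not None:
            candidates.append((density(words[i:end + 1], weights), i, end))
        j = next_term_after(words, end, weights)
        if j is not None:
            candidates.append((density(words[start:j + 1], weights), start, j))
        if not candidates:
            return start, end
        score, new_start, new_end = max(candidates)
        if score <= best:
            return start, end  # density has ceased to increase
        best, start, end = score, new_start, new_end
```

For example, starting from the segment "druga and my headache" inside "i took druga and my headache vanished but i heard rumors", adding "took" raises the density, after which neither adding the leading "i" nor reaching out to the later "i" helps, so the loop stops.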
  • FIG. 4 illustrates an exemplary post as analyzed by segment analyzer 200 .
  • Terms 282 and 284 may represent anchor terms, “symptom” and “drug name” respectively.
  • Term 281 may represent a personal experience term
  • terms 288 may represent personal reference terms
  • terms 289 may represent negative terms.
  • Segment analyzer may use density calculator 240 to compare the density of the two sets in order to define a basic segment 285 .
  • Segment analyzer 200 may use terms 282 A and 284 A to define basic segment 285 since they reflect a denser segment; they “enclose” personal experience term 281 , whereas terms 282 B and 284 B are much farther away from term 281 .
  • segment analyzer 200 may optimize basic segment 285 by expanding it to include additional terms and recalculating density (steps 272 and 274 ). Accordingly, an exemplary optimal segment 290 may be defined by expanding basic segment 285 to include terms 287 and 288 A as well. It will also be appreciated that the second and third sentences may contain several negative terms 289 , which may decrease the likelihood that an optimal segment may be found in those sentences.
  • FIG. 5 illustrates an exemplary factor weight table 305 , suitable for use with a pharmaceutical configuration of system 100 .
  • Scoring engine 300 may use such a table to “score” the optimized segments received from segment analyzer 200 in order to assess the likelihood that they may contain relevant user-generated personal experience event reports.
  • Each factor 310 may represent a possible situation that may occur in a segment, and may be weighted to reflect the effect of such a situation on the likelihood that a post may indeed be a relevant user-generated personal experience event report. It will be appreciated that any suitable method may be used to assign the weights. As will be described hereinbelow, in accordance with a preferred embodiment of the present invention, linear regressions may be run on a training set of data to derive these weights.
  • high concept density (i.e. high density as calculated by density calculator 240 ) may likely indicate that a post may indeed be a relevant user-generated personal experience event report.
  • the appearance of a second drug between the anchors may lessen this likelihood, and accordingly may be given a negative weight, for example: -5.
  • the proximity of terms may also reflect on the likelihood that a post may indeed be a relevant user-generated personal experience event report. For example, the farther apart a drug or experience and an associated side effect term may be mentioned in the segment, the less likely that they represent a “true” personal experience event report for that drug. Accordingly, proximity factors may be assigned negative weights.
  • the exemplary values in table 305 may be derived from statistical modeling of actual pharmaceutical related forum posts. However, the present invention may also include other feature-weight sets for both pharmaceutical and other configurations.
  • FIG. 6 illustrates table 305 (now labeled 305 ′) with exemplary values added based on an exemplary post segment.
  • scoring engine 300 may multiply each factor value per its associated weight, and then add the products for the final score. The score for these exemplary values would thus be computed as:
  • System 100 may be configured to store all posts with a score above a certain threshold in personal experience database 110 .
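The weighted-sum scoring and threshold check can be sketched as below. The factor names and all weights except the -5 for a second drug between the anchors are hypothetical; the actual values of table 305 are not reproduced in this text, so the numbers only illustrate the arithmetic.

```python
# Factor weights in the spirit of table 305. Only the -5.0 comes from the
# text; every other name and value is a hypothetical placeholder.
FACTOR_WEIGHTS = {
    "concept_density": 10.0,
    "second_drug_between_anchors": -5.0,
    "anchor_distance_words": -0.5,  # proximity factors carry negative weights
    "negating_term_present": -8.0,
}


def score_segment(factor_values, weights=FACTOR_WEIGHTS):
    """Multiply each factor value by its weight and add up the products."""
    return sum(weights[name] * value for name, value in factor_values.items())


def qualifies(factor_values, threshold=0.0, weights=FACTOR_WEIGHTS):
    """Keep a segment only if its score exceeds the configurable threshold."""
    return score_segment(factor_values, weights) > threshold
```

A segment with density 1.5, anchors three words apart, and no second drug or negating term would score 10 × 1.5 − 0.5 × 3 = 13.5 under these placeholder weights and would be stored for any threshold below that.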
  • FIGS. 7A and 7B show the scoring for two exemplary post segments referring to “Drug B”.
  • FIG. 7A shows a score of +14.83
  • FIG. 7B shows a score of -14.46.
  • the salient differences between the two examples may be that the example in FIG. 7A has an explicit “symptom experience” (i.e. “no sex drive”) and lacks a negating factor; whereas the example in FIG. 7B has a negating factor (“heard”) and lacks an explicit symptom experience (“can cause” which may indicate a lack of actual experience).
  • the post from FIG. 7A may be determined to qualify as a user generated personal experience event report, whereas, the post from FIG. 7B may not.
  • the threshold for qualification may be configurable.
  • a forum website selection utility may be used to identify appropriate websites for collection by post collector 50 , thus reducing the “universe” of websites for post collection to a manageable number of relevant websites with non-commercial/SPAM authentic user generated personal experience event reports.
  • FIG. 8 illustrates forum website selection utility 400 , constructed and operative in accordance with a preferred embodiment of the present invention.
  • Utility 400 may comprise pre-collection post collector 450 , pattern recognizer 430 , training set scoring engine 440 and candidate scoring engine 460 .
  • Utility 400 may communicate with Internet 10 via post collector 450 , which may be configured with functionality for collecting posts from Internet websites similar to that of post collector 50 .
  • pre-collection post collector 450 may collect Internet posts from training and candidate websites as part of a process to generate website collection list 465
  • post collector 50 may collect posts from the websites in collection list 465 .
  • Pre-collection post collector 450 may collect (step 510 ) posts from a training set of websites that may include “good” websites 405 which may be known to have user generated personal experience event reports.
  • the training set may also include “bad” websites 410 , which may be known to have content related to the search subject (i.e. pharmaceuticals, cars, etc depending on the configuration of system 100 ) which may not qualify as user generated personal experience event reports.
  • “Good” websites 405 may be defined by any suitable method.
  • a generic search engine may be used to locate websites according to relevant keywords, and at least a subset of the website's content may be manually examined to determine whether or not the website includes user generated personal experience event reports.
  • the posts collected by pre-collection post collector 450 may be filtered to contain only verified authentic user generated personal experience event reports.
  • the relevant keywords may be provided by an outside source such as known relevant terms database 425 .
  • database 425 may be a publicly available database of medical terms that may include comprehensive lists of drugs and known symptoms. Similar methods may also be used to define “bad” websites.
  • Pattern recognizer 430 may detect (step 520 ) recurring patterns in the training set posts. It will be appreciated that any known, suitable methods for pattern detection/recognition may be used in the context of step 520 . For example, such detection may include starting by searching for instances of terms from known relevant terms database 425 .
  • database 425 may contain examples of at least one (and preferably both) of the anchor categories for which system 100 may be configured.
  • database 425 may contain a list of drugs and known symptoms. It will be appreciated that database 425 may provide the basis for anchor database 225 .
  • Step 520 may also include detection of recurring terms that may not be found in database 425 .
  • indications of personal reference/experience terms such as those in filter database 215 may also be detected.
  • Exemplary such terms may include phrases such as: “I took” or “I felt better”.
  • filter database 215 may be at least in part populated based on some or all of the terms detected in step 520 .
  • some of the terms detected in step 520 may be “negative” in nature.
  • terms such as “buy”, “sale”, “selling” may indicate an attempt to sell or market a product and that the post may therefore not be an authentic user generated personal experience event report.
  • Such terms may typically be found in posts on bad websites 410 .
  • step 520 may include detection of larger expressions as well.
  • a “moving window” may be used to check for recurring combination expressions including one or more of the anchor terms from database 425 .
  • pattern recognizer 430 may initially detect anchors “Drug A” (drug name) and “headache” (symptom). By incrementally employing a moving window to detect combination expressions around these anchors, pattern recognizer 430 may also detect larger expressions such as personal experience term “I took” in juxtaposition to anchor term “Drug A”, and a variant on the initial symptom term, “headache was gone”.
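One way to picture the moving window is as an n-gram scan that keeps only word runs containing an anchor and counts how often each run recurs across posts. The sketch below assumes single-token anchors (so "Drug A" is written as "druga"), whitespace tokenization, and a window of two to three words; none of these details come from the text.

```python
from collections import Counter


def window_expressions(words, anchors, max_width=3):
    """Yield each run of 2..max_width consecutive words that has an anchor."""
    for width in range(2, max_width + 1):
        for start in range(len(words) - width + 1):
            window = words[start:start + width]
            if any(word in anchors for word in window):
                yield " ".join(window)


def recurring_expressions(posts, anchors, min_count=2, max_width=3):
    """Count anchor-bearing expressions and keep those seen min_count times."""
    counts = Counter()
    for post in posts:
        words = post.lower().split()
        counts.update(window_expressions(words, anchors, max_width))
    return {expr: n for expr, n in counts.items() if n >= min_count}
```

Given two posts that both contain "I took DrugA" and "headache was gone", the expressions "i took druga" and "headache was gone" surface as recurring, while one-off runs such as "druga and" are discarded.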
  • Pattern recognizer 430 may be configured to perform statistical analysis on the terms detected in step 520 to track their occurrences and determine their significance.
  • utility 400 may be configured to facilitate inspection of the results of step 520 by a user of system 100 , and to enable the user to adjust the input data as necessary to achieve a truer result. Accordingly, step 520 may be repeated as necessary.
  • the patterns detected by pattern recognizer 430 may be stored in detected patterns database 415 .
  • Training set scoring engine 440 may score (step 530 ) the terms in detected patterns database 415 to produce weighted indicators of the likelihood that a given website may or may not contain user generated personal experience event reports. Such scoring may employ any suitable method. For example, engine 440 may run a linear regression on the terms in detected patterns database 415 vis-à-vis the training set of posts from “good” and “bad” websites to determine the weight of each term as an indicator of likelihood that a given website is either “good” or “bad”.
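As a toy illustration of deriving term weights by linear regression, the sketch below fits one weight per term by least squares using plain batch gradient descent. The labels (+1 for posts from “good” websites, -1 for “bad”), the absence of an intercept, the learning rate, and the training rows are all assumptions made for the example; a real implementation would more likely rely on a statistics library.

```python
def fit_term_weights(rows, labels, terms, epochs=2000, lr=0.05):
    """Least-squares fit of one weight per term via batch gradient descent.

    rows   : list of dicts mapping term -> occurrence count in a post
    labels : +1 for a post from a "good" site, -1 for a "bad" site (assumed)
    """
    weights = {term: 0.0 for term in terms}
    n = len(rows)
    for _ in range(epochs):
        grads = {term: 0.0 for term in terms}
        for counts, label in zip(rows, labels):
            predicted = sum(weights[t] * counts.get(t, 0) for t in terms)
            error = predicted - label
            for t in terms:
                grads[t] += 2 * error * counts.get(t, 0) / n
        for t in terms:
            weights[t] -= lr * grads[t]
    return weights


# Hypothetical training set: term counts per post and site labels.
TERMS = ["i took", "felt better", "sale", "buy"]
ROWS = [
    {"i took": 1},
    {"felt better": 1},
    {"i took": 1, "felt better": 1},
    {"sale": 1},
    {"buy": 1},
    {"sale": 1, "buy": 1},
]
LABELS = [1, 1, 1, -1, -1, -1]
WEIGHTS = fit_term_weights(ROWS, LABELS, TERMS)
```

On this toy data the personal experience terms end up with positive weights and the commercial terms with negative ones, which is the kind of signed indicator the scoring step relies on.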
  • engine 440 may expand the scoring process to also include other indicators from ranking sources database 470 .
  • Database 470 may represent rankings from external sources such as, for example, Google page ranks and/or Alexa ratings.
  • Engine 440 may include the associated rankings for the page on which each post may be located as additional factors when running the linear regression on the terms in detected patterns database 415 .
  • engine 440 may expand the scoring process to also include additional factors that may be calculated or derived from the original posts.
  • additional factors may include, for example, the query rank of the original query that identified the post as a candidate and meta keywords of the page.
  • engine 440 may expand the scoring process to also include the number of images and/or links on the page. It will be appreciated that most user forums have relatively few images and links per page. Accordingly, a higher number of links or images per page may tend to indicate a “bad” website.
  • engine 440 may also expand the scoring process to also include statistical data from cumulative scoring.
  • Such factors may include, for example, the ratio of posts to the number of discussions (aka “threads”); or the overall ranking of a given anchor and/or term in “good” and “bad” websites.
  • the anchor term “Aspirin” may have an overall high ranking in “good” posts; statistically, personal experience event reports citing Aspirin may typically be genuine.
  • the anchor term “Viagra” may typically be indicative of SPAM or commercial posts.
  • utility 400 may be configured to facilitate inspection of the results of step 530 by a user of system 100 , and to enable the user to adjust the input data as necessary to achieve a truer result. Accordingly, step 530 may be repeated as necessary.
  • the patterns scored by engine 440 may be stored in weighted indicators database 435 . It will be appreciated that weighted indicators database 435 may therefore contain a superset (including calculated weights) of the terms in detected patterns database 415 and known relevant terms 425 . It will also be appreciated that database 435 may provide the basis for terms database 235 .
  • Pre-collection post collector 450 may collect (step 540 ) posts from candidate websites 420 on the Internet by formulating search queries based on positive term based indicators from weighted indicators database 435 .
  • Candidate scoring engine 460 may then score (step 550 ) each website 420 vis-à-vis all of the factors in weighted indicators database 435 to assess its likelihood to contain user generated personal experience event reports.
  • System 100 may be configured with a threshold weighted score to determine whether or not a given website 420 may be considered likely to contain user generated personal experience event reports.
  • Utility 400 may update (step 560 ) website collection list 465 to include websites 420 that exceed such a threshold. It will be appreciated that process 500 may be performed on a periodic basis to continually update list 465 . Accordingly, utility 400 may also record websites 420 with weighted scores below the threshold to avoid examining them again in the future.
  • website collection list 465 may be used by post collector 50 in the embodiment of FIG. 1 .
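Steps 550 and 560 can be sketched as a weighted score per candidate website followed by a threshold test, with below-threshold sites remembered so they are not examined again. The indicator names, weights, threshold, and site names below are hypothetical and stand in for weighted indicators database 435 and candidate websites 420.

```python
def score_website(indicator_values, indicator_weights):
    """Weighted sum of the indicator values observed for one website."""
    return sum(
        indicator_weights.get(name, 0.0) * value
        for name, value in indicator_values.items()
    )


def update_collection_list(candidates, indicator_weights, threshold,
                           collection, rejected):
    """Add sites scoring above `threshold` to `collection`; record the rest.

    Sites already present in either set are skipped, so a periodic run does
    not re-examine them.
    """
    for site, indicator_values in candidates.items():
        if site in collection or site in rejected:
            continue
        if score_website(indicator_values, indicator_weights) > threshold:
            collection.add(site)
        else:
            rejected.add(site)
    return collection, rejected
```

For instance, a forum whose posts are rich in "I took" style terms and poor in "sale" style terms would land in the collection list, while a storefront with the opposite profile would be recorded as rejected.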
  • Embodiments of the present invention may include apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the desired purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk, including floppy disks, optical disks, magnetic-optical disks, read-only memories (ROMs), compact disc read-only memories (CD-ROMs), random access memories (RAMs), electrically programmable read-only memories (EPROMs), electrically erasable and programmable read only memories (EEPROMs), magnetic or optical cards, Flash memory, or any other type of media suitable for storing electronic instructions and capable of being coupled to a computer system bus.

Abstract

A method implementable on a computing device for scoring segments of Internet posts is disclosed. The method includes defining a set of indicating factors where each indicating factor is associated with a possible feature in the segments, and where possible features affect a likelihood that the Internet posts represent a user generated product personal experience event report associated with a pre-defined search subject.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation application claiming benefit from U.S. patent application Ser. No. 13/253,090 filed Oct. 5, 2011 which is hereby incorporated in its entirety by reference.
  • FIELD OF THE INVENTION
  • The present invention relates to Internet search engines generally and to customized search engines for user generated experience reports in particular.
  • BACKGROUND OF THE INVENTION
  • The Internet contains a plethora of reports that are at least somewhat related to consumer products and services. The sources for these reports are varied. For example, manufacturer/providers may provide information as part of their marketing efforts. Their competitors may provide conflicting information to promote competing products and services. Nominally disinterested parties provide independent reviews, although such reviews are often prejudiced by concerns not readily apparent to the reader. Such products and services are also often mentioned “by the way” as background for other subjects, making it difficult to weed out “true” reports from a multitude of “hits” received when using conventional Internet search engines.
  • The Internet also contains “forum” sites where users can post opinions and discuss various issues of interest. Some of the user posts on such sites constitute “personal experience” reports wherein consumers discuss their actual personal experiences using products and services. A typical such personal experience would be something like: “I used product X and my digestion improved immediately.” In such manner, forum sites may provide valuable firsthand information from actual consumers of products and services.
  • Unfortunately, personal experience event reports are typically posted in free text with only nominal constraints on form or content, rendering them unstructured and difficult to identify by non-manual processes. It may therefore be difficult to identify and collate personal experience event reports using conventional Internet search engines, even when such search engines are configured to search forum sites.
  • SUMMARY OF THE PRESENT INVENTION
  • There is provided, in accordance with an embodiment of the present invention, a method implementable on a computing device for scoring segments of Internet posts. The method may include defining a set of indicating factors, where each indicating factor is associated with a possible feature in the segments, and where possible features affect a likelihood that the Internet posts represent a user generated product personal experience event report associated with a pre-defined search subject.
  • In accordance with an embodiment of the present invention, the method may include weighting the indicating factors in accordance with the likelihood, where each of the indicating factors is at least one of a negative and a positive value.
  • There is provided, in accordance with an embodiment of the present invention, a method implementable on a computing device for scoring segments of Internet posts. The method may include listing factors detected in text segments of the Internet posts, defining weights to be associated with each of the factors, where each defined weight reflects a value for an associated factor as a predictor of the segments representing a user generated product personal experience event report. The method may additionally include multiplying a ratio of the weighted factors, divided by an overall number of words in the segment.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:
  • FIG. 1 is a block diagram of a novel user-generated personal experience retrieval system 100, designed and operative in accordance with a preferred embodiment of the present invention;
  • FIG. 2 is a block diagram of the segment analyzer of the embodiment of FIG. 1;
  • FIG. 3 is a block diagram of a novel process to be performed by the system of FIG. 1;
  • FIG. 4 is an illustration of an exemplary Internet post to be analyzed and processed by the system of FIG. 1;
  • FIGS. 5-7B are illustrations of exemplary scoring tables to be used during the process of FIG. 3;
  • FIG. 8 is a schematic diagram of a novel forum website selection utility, constructed and operative in accordance with a preferred embodiment of the present invention; and
  • FIG. 9 is a block diagram of a novel process to be performed by the system of FIG. 8.
  • It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.
  • DETAILED DESCRIPTION OF THE PRESENT INVENTION
  • In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the present invention.
  • Applicants have realized that currently available Internet search engines are inefficient tools for searching Internet forums for user generated personal experience event reports that may be used to evaluate and compare products and services. An Internet user generated personal experience event report may be a statement written by users on an Internet platform (such as a message board), referring to their own experience with regard to a specific product or service. A specialized search process may be configured to identify such reports related to a specific field of products and/or services in order to filter out “false hits” and extraneous information that may typically be retrieved by a search engine.
  • Reference is now made to FIG. 1 which illustrates a novel user-generated personal experience retrieval system 100, designed and operative in accordance with a preferred embodiment of the present invention. System 100 may comprise post collector 50 in communication with forums 20 on Internet 10. System 100 may also comprise segment analyzer 200, scoring engine 300 and user search interface 350.
  • In accordance with a preferred embodiment of the present invention, system 100 may be configured to identify user-generated personal experience event reports that may be related to pharmaceutical products. It will be appreciated that a typical subject for which there may be demand for collating and analyzing user-generated personal experience event reports may be pharmaceuticals. For example, potential users of pharmaceuticals may understandably wish to study personal experience event reports prior to beginning a treatment. To illustrate such an embodiment, system 100 and its methods of operation may therefore be described hereinbelow in the context of a pharmaceutical based configuration. However, it will be appreciated that the present invention may be configured for any suitable subject for which personal experience event reports may be posted on the Internet, for example, automobiles, airline travel, banking services, food and beverages, etc.
  • Post collector 50 may periodically collect posts from a “collection list” of chat forums 20 on Internet 10. The collected posts may be forwarded to segment analyzer 200 to identify segments of forum posts that may be likely to contain personal experience event reports regarding the subject for which system 100 may be configured. For example, segment analyzer 200 may identify post segments that may be likely to contain personal experience event reports regarding the use of pharmaceuticals.
  • These segments may be forwarded to scoring engine 300 which may “score” the segments in terms of their likely relevance as personal reports. Scored segments may then be stored in personal experience database 110 along with addressing information, such as a uniform resource locator (URL) for the original post. Users may then use user search interface 350 to search database 110 for user-generated personal experience event reports regarding the products/services for which system 100 may be configured. For example, a user may search for event reports relating to “Drug A” in order to find out if anyone that had personally used Drug A had reported regarding its success and/or any side effects suffered when using it. The output of such a search may consist of a list of chat posts, sorted according to the score assigned by scoring engine 300. It will be appreciated that the present invention may include any suitable implementation for user search interface 350, such as, for example, a browser based utility for inputting search parameters and displaying links to related user generated personal experience event reports.
  • The collection list used by post collector 50 may include chat forums 20 deemed to be relevant to the subject for which system 100 may be configured. For example, if system 100 is configured for personal reports on pharmaceutical products, the collection list may include a list of chat forums 20 on which it may be likely that users may post personal experience event reports relating to their use of pharmaceutical products. It will be appreciated that post collector 50 may be configured to include any suitable method known in the art for “scraping” forum posts from the collection list. It will similarly be appreciated that post collector 50 may be configured to perform such “scraping” on an incremental basis to avoid reprocessing older posts.
  • As will be disclosed hereinbelow, the present invention may also include a novel pre-collection process for compiling the collection list for system 100. However, it will be appreciated that the present invention may include any suitable method for compiling the collection list, including manual inspection.
  • Reference is now made to FIG. 2 which illustrates segment analyzer 200 in greater detail. Segment analyzer 200 may comprise post filtering module 210, anchor detection module 220, basic segmentation unit 230, density calculator 240 and segment optimizer 250. Segment analyzer 200 may also comprise filter database 215, anchor database 225 and terms database 235, each of which may be referenced by the other elements of segment analyzer 200.
  • Reference is now also made to FIG. 3 which illustrates a novel post segmentation process 260 that may be executed by segment analyzer 200 to derive optimally segmented user-generated personal experience event reports from the posts collected by post collector 50.
  • Post filtering module 210 may receive (step 262) posts from post collector 50. Post filtering module 210 may filter (step 264) these posts according to terms found in filter database 215. Filter database 215 may store a list of categorized relevant terms which module 210 may search for in each post. Depending on the configuration of system 100, at least one term from a combination of some of the categories must be found in a post for that post to pass through step 264. The categories may include, for example, product/service name, indication of personal reference, and indication of personal experience. The product/service name category may consist of names of products/services regarding which a user of system 100 may wish to search for personal experience event reports. It will be appreciated that other configurations for system 100 are included in the present invention. For example, if system 100 is configured for automobile research, the terms in the product/service name category may include a list of automobile makes, manufacturers and nicknames, such as, for example: “Corvette”, “Chevrolet”, “Chevy”, and “Vette”. The category for indications of personal reference may include terms such as “I”, “my”, “me”, “mine”, “myself”, etc. that may indicate that the post refers to an actual personal experience. The category for personal experience may include terms such as, for example, “I used”, “I bought”, “I had”, etc. that may indicate that the poster had an actual personal experience, i.e., that the report was not based on hearsay or opinion. In accordance with a preferred embodiment of the present invention, a post may have to contain at least one term from each of these categories in order to pass through step 264.
  • It will be appreciated, however, that depending on the configuration of system 100 there may be other term categories in filter database 215. For example, if system 100 is configured for pharmaceuticals, the relevant terms may be divided into five categories: Drug name (i.e. product/service name), indication of personal reference, indication of personal drug experience, symptom, and personal symptom experience. Symptom terms may be precise medical terms, such as, for example, “headache”, or alternatively they may also include user descriptions such as “my head exploded”. Personal symptom experience terms may be indicative of the poster having a personal cause/reason for using the indicated drug, for example: “I suffered from”, “I have experienced”. In accordance with a preferred embodiment of the present invention, when system 100 may be configured for pharmaceuticals, terms from all five categories must be present in a post in order for it to pass through step 264. In accordance with an alternative preferred embodiment, post filtering module may be configured to require terms from only four categories, wherein a term from only one of the personal experience and personal symptom experience categories may be required. It will be appreciated that similar categories may be used to configure system 100 for non-pharmaceutical products and/or services. For example, if system 100 is configured for automobile research, the symptom category may be replaced by a “preference category” including terms such as “family car”, “sports car”, “road handling” or “seven seats”. Similarly, the personal symptom experience category may be replaced by a personal preference category including terms such as “I need a bigger car”, “I wanted a sports car” or “I value engine performance”.
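  • The category-based filtering of step 264 may be illustrated by the following minimal Python sketch. The term lists, the category keys and the function names `categories_found` and `passes_filter` are hypothetical and serve only as an illustration; in practice, filter database 215 would supply the terms. The `required` argument is likewise an assumption, included to show how a configuration requiring fewer categories might be expressed.

```python
import re

# Hypothetical sample of filter database 215 for a pharmaceutical configuration.
FILTER_TERMS = {
    "drug_name": ["drug a", "aspirin"],
    "personal_reference": ["i", "my", "me", "mine", "myself"],
    "personal_drug_experience": ["i used", "i took", "i had"],
    "symptom": ["headache", "my head exploded"],
    "personal_symptom_experience": ["i suffered from", "i have experienced"],
}


def categories_found(post, terms=FILTER_TERMS):
    """Return the set of categories having at least one term in the post."""
    text = post.lower()
    found = set()
    for category, phrases in terms.items():
        for phrase in phrases:
            # Whole-word match, so that "i" does not match inside "with".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                found.add(category)
                break
    return found


def passes_filter(post, required=None, terms=FILTER_TERMS):
    """True if every required category is represented in the post.

    By default all categories are required (the five-category embodiment).
    """
    required = set(terms) if required is None else set(required)
    return required <= categories_found(post, terms)


if __name__ == "__main__":
    print(passes_filter("I suffered from a headache, so I took Aspirin."))
    print(passes_filter("Aspirin can cause a headache, the leaflet says."))
```

  • A post mentioning only a drug and a symptom, without any personal reference or experience term, would not pass this sketch of the filter unless `required` is narrowed accordingly.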
  • Anchor detection module 220 may detect (step 266) segment anchors in posts that contain all of the required term categories. Module 220 may reference database 225 for lists of segment anchor terms to match to terms in the posts. Segment anchors may represent a pair of term categories that may together define the personal experience event reports of interest for system 100. For example, in a pharmaceutical configuration, the segment anchors may be the drug name and symptom categories. Alternatively, the segment anchors may be the drug name and personal symptom experience categories. In accordance with a preferred embodiment of the present invention, segment anchors for a pharmaceutical configuration may be terms from the drug name and symptom categories. Database 225 may be populated by a publicly available database of drugs and symptoms.
  • Basic segmentation unit 230 may then segment (step 268) the posts based on the anchors identified in step 266 to find the minimal text segments in the post that have at least one term from each of the categories required for the filter process in step 264. Unit 230 may first search for the required terms between the identified anchors and may then incrementally search before and after the anchors one word at a time until at least one of the terms from all of the relevant categories may be identified in order to define basic segments.
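  • The expansion performed in step 268 may be sketched as follows. This is a simplified illustration only: it assumes that the post has already been tokenized into words, that each relevant term is a single word whose category is recorded in a hypothetical `token_categories` mapping, and that the segment grows by one word on each side per iteration. The function name `basic_segment` is likewise hypothetical.

```python
def basic_segment(tokens, token_categories, anchor_span, required):
    """Grow the span between two anchors until all required categories appear.

    tokens           -- list of words in the post
    token_categories -- dict mapping a token index to its term category
    anchor_span      -- (lo, hi) inclusive indices of the two anchor terms
    required         -- set of categories that the segment must contain
    Returns (lo, hi), or None if the whole post lacks a required category.
    """
    lo, hi = anchor_span

    def covered(a, b):
        return {token_categories[i] for i in range(a, b + 1)
                if i in token_categories}

    # First check the words between the anchors, then widen one word
    # before and one word after the current span at a time.
    while not required <= covered(lo, hi):
        if lo == 0 and hi == len(tokens) - 1:
            return None  # nothing left to add
        if lo > 0:
            lo -= 1
        if hi < len(tokens) - 1:
            hi += 1
    return lo, hi


if __name__ == "__main__":
    words = "this morning I took DrugA and my headache was gone".split()
    cats = {2: "personal_reference", 3: "personal_experience",
            4: "drug_name", 6: "personal_reference", 7: "symptom"}
    needed = {"drug_name", "symptom", "personal_reference",
              "personal_experience"}
    span = basic_segment(words, cats, (4, 7), needed)
    print(span, words[span[0]:span[1] + 1])
```

  • Because the sketch widens both sides together, the span it returns may be slightly larger than a strictly minimal one; an implementation seeking the true minimum might instead try each side separately.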
  • Density calculator 240 may reference terms database 235 to calculate (step 270) the density of relevant terms in each basic segment. The density may be defined as the ratio of the relevant terms each multiplied by an associated weight stored in database 235, divided by the overall number of words in the basic segment. It will be appreciated that each term in database 235 may have a different defined weight that may reflect its value as a predictor of the likelihood that the post being analyzed may represent a user generated personal experience event report. Accordingly, the calculated density score may provide a measure of the amount of relevant information contained in the specified segment. It will be appreciated that any suitable method may be used to assign the weights. As will be described hereinbelow, in accordance with a preferred embodiment of the present invention, linear regressions may be run on a training set of data to derive these weights.
  • It will also be appreciated that some of the terms may have negative values. In addition to the terms in filter database 215, terms database 235 may also store other categories of terms that may also be used to assess the likelihood of a segment containing a valid user-generated personal experience event report. For example, terms database 235 may also store terms relating to a “negative” category. Terms such as “heard of”, “likely”, “I've been told”, “did not” may typically impact negatively on the likelihood that a given report is a true personal experience, and may therefore be significant when assessing a given segment at the next step of the process. Depending on the configuration of system 100, other categories may be added as well. For example, in an exemplary configuration for pharmaceuticals, there may be an “outcome” or “result” category that may include terms such as “got better”, “recovered” or “condition worsened”. As in the embodiments described hereinabove, each term in such a category may be weighted to reflect its value as a predictor of the likelihood that the post being analyzed may represent a user generated personal experience event report.
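  • The density calculation of step 270, including negatively weighted terms, may be illustrated by the following sketch. The weights shown are hypothetical placeholders for the contents of terms database 235, and the function name `density` is an assumption.

```python
import re

# Hypothetical sample of terms database 235: term -> weight.
# Negative weights model "negative" category terms such as "heard of".
TERM_WEIGHTS = {
    "i took": 2.0,
    "headache": 1.5,
    "drug a": 1.5,
    "heard of": -2.0,
}


def density(segment, weights=TERM_WEIGHTS):
    """Sum of the weights of the terms found, divided by the word count."""
    text = segment.lower()
    n_words = len(text.split())
    if n_words == 0:
        return 0.0
    total = 0.0
    for term, weight in weights.items():
        # Count whole-word occurrences of the term in the segment.
        hits = re.findall(r"\b" + re.escape(term) + r"\b", text)
        total += weight * len(hits)
    return total / n_words


if __name__ == "__main__":
    print(density("I took Drug A for my headache"))
    print(density("I heard of Drug A"))
```

  • Under these assumed weights, a segment containing a negative term such as “heard of” yields a lower, possibly negative, density than a segment of similar length containing only positive terms.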
  • Segment optimizer 250 may incrementally check each word before and after the segment to find (step 272) the next term from database 235. Density calculator 240 may then recalculate (step 274) the density as in step 270. If the result is that the density has increased (step 276), segment optimizer 250 may again find (step 272) the next term. Steps 272 and 274 may be repeated until the density ceases to increase (step 276), at which point the final, presumably optimized, segment may be output by segment analyzer 200.
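  • The loop of steps 272 through 276 may be sketched as a greedy expansion. The sketch below assumes single-word terms with hypothetical weights, and assumes that when terms lie at equal distance on both sides the earlier one is tried first; the source does not specify a tie-breaking rule. All function names are hypothetical.

```python
def token_density(tokens, lo, hi, weights):
    """Weighted-term density of tokens[lo..hi] (inclusive)."""
    words = tokens[lo:hi + 1]
    return sum(weights.get(w.lower(), 0.0) for w in words) / len(words)


def next_term_bounds(tokens, lo, hi, weights):
    """Bounds extended to the nearest weighted term outside [lo, hi]."""
    left, right = lo - 1, hi + 1
    while left >= 0 or right < len(tokens):
        if left >= 0 and tokens[left].lower() in weights:
            return left, hi
        if right < len(tokens) and tokens[right].lower() in weights:
            return lo, right
        left -= 1
        right += 1
    return None


def optimize_segment(tokens, lo, hi, weights):
    """Expand the segment term by term while the density keeps increasing."""
    best = token_density(tokens, lo, hi, weights)
    while True:
        candidate = next_term_bounds(tokens, lo, hi, weights)
        if candidate is None:
            return lo, hi, best
        d = token_density(tokens, candidate[0], candidate[1], weights)
        if d <= best:  # density ceased to increase (step 276)
            return lo, hi, best
        lo, hi = candidate
        best = d


if __name__ == "__main__":
    words = "yesterday I took aspirin and my headache vanished".split()
    w = {"i": 0.5, "took": 2.0, "aspirin": 1.5, "my": 1.0, "headache": 1.5}
    print(optimize_segment(words, 3, 6, w))
```

  • In this example the basic segment from “aspirin” to “headache” is extended to include “took”, which raises the density, but not “I”, whose assumed lower weight would reduce it.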
  • Reference is now made to FIG. 4 which illustrates an exemplary post as analyzed by segment analyzer 200. Terms 282 and 284 may represent anchor terms, “symptom” and “drug name” respectively. Term 281 may represent a personal experience term, terms 288 may represent personal reference terms, and terms 289 may represent negative terms. It will be appreciated that there may be two sets of anchor terms 282 and 284. Segment analyzer 200 may use density calculator 240 to compare the density of the two sets in order to define a basic segment 285. Segment analyzer 200 may use terms 282A and 284A to define basic segment 285 since they reflect a denser segment; they “enclose” personal experience term 281, whereas terms 282B and 284B are much farther away from term 281. As described hereinabove, segment analyzer 200 may optimize basic segment 285 by expanding it to include additional terms and recalculating density (steps 272 and 274). Accordingly, an exemplary optimal segment 290 may be defined by expanding basic segment 285 to include terms 287 and 288A as well. It will also be appreciated that the second and third sentences may contain several negative terms 289, which may decrease the likelihood that an optimal segment may be found in those sentences.
  • Reference is now made to FIG. 5 which illustrates an exemplary factor weight table 305, suitable for use with a pharmaceutical configuration of system 100. Scoring engine 300 may use such a table to “score” the optimized segments received from segment analyzer 200 in order to assess the likelihood that they may contain relevant user-generated personal experience event reports. Each factor 310 may represent a possible situation that may occur in a segment, and may be weighted to reflect the effect of such a situation on the likelihood that a post may indeed be a relevant user-generated personal experience event report. It will be appreciated that any suitable method may be used to assign the weights. As will be described hereinbelow, in accordance with a preferred embodiment of the present invention, linear regressions may be run on a training set of data to derive these weights.
  • For example, high concept density, i.e., high density as calculated by density calculator 240, may likely indicate that a post may indeed be a relevant user-generated personal experience event report. On the other hand, the appearance of a second drug between the anchors may lessen this likelihood, and accordingly may be given a negative weight, for example: −5. The proximity of terms may also reflect on the likelihood that a post may indeed be a relevant user-generated personal experience event report. For example, the farther apart a drug or experience and an associated side effect term may be mentioned in the segment, the less likely that they represent a “true” personal experience event report for that drug. Accordingly, proximity factors may be assigned negative weights. It will be appreciated that the exemplary values in table 305 may be derived from statistical modeling of actual pharmaceutical related forum posts. However, the present invention may also include other feature-weight sets for both pharmaceutical and other configurations.
  • FIG. 6, to which reference is now made, illustrates table 305 (now labeled 305′) with exemplary values added based on an exemplary post segment. In order to score the post, scoring engine 300 may multiply each factor value per its associated weight, and then add the products for the final score. The score for these exemplary values would thus be computed as:

  • Score=23*(−2)+1*(−3)+0*(−5)+0*(−5)+9*1+0.34*2+0*4+1*(−10)+1*10+0*(−10)=−39.28
  • A negative score may indicate that the likelihood of a relevant report may be low. System 100 may be configured to store all posts with a score above a certain threshold in personal experience database 110.
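  • The weighted-sum scoring and threshold test described above may be sketched as follows, using the factor values and weights exactly as printed in the expression above. The names `score` and `qualifies` and the default threshold of 0 are assumptions. It may be noted that, with the values as printed, the sum evaluates to approximately −39.32 rather than −39.28; the text alone does not indicate whether one of the printed factor values was rounded.

```python
# Factor values and weights, in the order given in the exemplary expression.
FACTOR_VALUES = [23, 1, 0, 0, 9, 0.34, 0, 1, 1, 0]
FACTOR_WEIGHTS = [-2, -3, -5, -5, 1, 2, 4, -10, 10, -10]


def score(values, weights):
    """Multiply each factor value by its weight and add the products."""
    if len(values) != len(weights):
        raise ValueError("each factor value needs exactly one weight")
    return sum(v * w for v, w in zip(values, weights))


def qualifies(values, weights, threshold=0.0):
    """True if the segment's score exceeds the (configurable) threshold."""
    return score(values, weights) > threshold


if __name__ == "__main__":
    print(round(score(FACTOR_VALUES, FACTOR_WEIGHTS), 2))
    print(qualifies(FACTOR_VALUES, FACTOR_WEIGHTS))
```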
  • FIGS. 7A and 7B, to which reference is now made, show the scoring for two exemplary post segments referring to “Drug B”. FIG. 7A shows a score of +14.83, whereas FIG. 7B shows a score of −14.46. The salient differences between the two examples may be that the example in FIG. 7A has an explicit “symptom experience” (i.e. “no sex drive”) and lacks a negating factor, whereas the example in FIG. 7B has a negating factor (“heard”) and lacks an explicit symptom experience (“can cause”, which may indicate a lack of actual experience). Accordingly, the post from FIG. 7A may be determined to qualify as a user generated personal experience event report, whereas the post from FIG. 7B may not. It will be appreciated that the threshold for qualification may be configurable.
  • It will be appreciated that it may not be possible to continuously perform comprehensive searches for user generated personal experience event reports from among all of the content available on the Internet. By necessity, the “collection list” referred to hereinabove may therefore represent only a small fraction of the websites on the Internet. In accordance with a preferred embodiment of the present invention, a forum website selection utility may be used to identify appropriate websites for collection by post collector 50, thus reducing the “universe” of websites for post collection to a manageable number of relevant websites with non-commercial/SPAM authentic user generated personal experience event reports. Reference is now made to FIG. 8 which illustrates forum website selection utility 400, constructed and operative in accordance with a preferred embodiment of the present invention.
  • Utility 400 may comprise pre-collection post collector 450, pattern recognizer 430, training set scoring engine 440 and candidate scoring engine 460. Utility 400 may communicate with Internet 10 via pre-collection post collector 450, which may be configured with functionality for collecting posts from Internet websites similar to that of post collector 50. As will be described hereinbelow, pre-collection post collector 450 may collect Internet posts from training and candidate websites as part of a process to generate website collection list 465, whereas post collector 50 may collect posts from the websites in collection list 465.
  • Reference is also made to FIG. 9 which illustrates a novel website selection process 500 to be performed by utility 400 in accordance with a preferred embodiment of the present invention. Pre-collection post collector 450 may collect (step 510) posts from a training set of websites that may include “good” websites 405 which may be known to have user generated personal experience event reports. In accordance with an alternative preferred embodiment of the present invention, the training set may also include “bad” websites 410, which may be known to have content related to the search subject (i.e. pharmaceuticals, cars, etc., depending on the configuration of system 100) which may not qualify as user generated personal experience event reports.
  • “Good” websites 405 may be defined by any suitable method. For example, a generic search engine may be used to locate websites according to relevant keywords, and at least a subset of the website's content may be manually examined to determine whether or not the website includes user generated personal experience event reports. In accordance with a preferred embodiment of the present invention, the posts collected by pre-collection post collector 450 may be filtered to contain only verified authentic user generated personal experience event reports. The relevant keywords may be provided by an outside source such as known relevant terms database 425. For example, if system 100 is configured for pharmaceuticals, database 425 may be a publicly available database of medical terms that may include comprehensive lists of drugs and known symptoms. Similar methods may also be used to define “bad” websites 410.
  • Pattern recognizer 430 may detect (step 520) recurring patterns in the training set posts. It will be appreciated that any known, suitable methods for pattern detection/recognition may be used in the context of step 520. For example, such detection may include starting by searching for instances of terms from known relevant terms database 425. In accordance with a preferred embodiment of the present invention, database 425 may contain examples of at least one (and preferably both) of the anchor categories for which system 100 may be configured. For example, database 425 may contain a list of drugs and known symptoms. It will be appreciated that database 425 may provide the basis for anchor database 225.
  • Step 520 may also include detection of recurring terms that may not be found in database 425. For example, indications of personal reference/experience terms such as those in filter database 215 may also be detected. Exemplary such terms may include phrases such as: “I took” or “I felt better”. In accordance with a preferred embodiment of the present invention, filter database 215 may be at least in part populated based on some or all of the terms detected in step 520.
  • It will be appreciated that some of the recurring terms detected in step 520 may be “negative” in nature. For example, terms such as “buy”, “sale”, “selling” may indicate an attempt to sell or market a product and that the post may therefore not be an authentic user generated personal experience event report. Such terms may typically be found in posts on bad websites 410.
  • It will be appreciated that step 520 may include detection of larger expressions as well. For example, a “moving window” may be used to check for recurring combination expressions including one or more of the anchor terms from database 425. For example, in the text: “this morning I took Drug A and less than an hour later my headache was gone,” pattern recognizer 430 may initially detect anchors “Drug A” (drug name) and “headache” (symptom). By incrementally employing a moving window to detect combination expressions around these anchors, pattern recognizer 430 may also detect larger expressions such as personal experience term “I took” in juxtaposition to anchor term “Drug A”, and a variant on the initial symptom term, “headache was gone”. Pattern recognizer 430 may be configured to perform statistical analysis on the terms detected in step 520 to track their occurrences and determine their significance.
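  • One possible reading of the “moving window” detection is sketched below: word windows of increasing width are slid across each training post, windows containing an anchor are retained, and the number of posts in which each expression occurs is counted. The single-word anchors, the window widths, the `min_count` cut-off and the function names are all assumptions made for illustration; the source does not specify them.

```python
import re
from collections import Counter


def window_expressions(tokens, anchors, max_width=3):
    """Yield word windows of width 2..max_width that contain an anchor."""
    for width in range(2, max_width + 1):
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            if any(tok in anchors for tok in window):
                yield " ".join(window)


def recurring_expressions(posts, anchors, min_count=2, max_width=3):
    """Map each anchor-bearing expression to the number of posts containing it."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z0-9']+", post.lower())
        # A set is used so that each post counts an expression only once.
        counts.update(set(window_expressions(tokens, anchors, max_width)))
    return {expr: n for expr, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    training_posts = [
        "I took aspirin and my headache was gone",
        "Yesterday I took aspirin for a headache",
    ]
    found = recurring_expressions(training_posts, {"aspirin", "headache"})
    for expr in sorted(found):
        print(expr, found[expr])
```

  • In this sketch, an expression such as “i took aspirin” recurs across both posts and is retained, whereas “headache was gone” appears in only one post and falls below the assumed cut-off.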
  • It will be appreciated that utility 400 may be configured to facilitate inspection of the results of step 520 by a user of system 100, and to enable the user to adjust the input data as necessary to achieve a truer result. Accordingly, step 520 may be repeated as necessary. The patterns detected by pattern recognizer 430 may be stored in detected patterns database 415.
  • Training set scoring engine 440 may score (step 530) the terms in detected patterns database 415 to produce weighted indicators of the likelihood that a given website may or may not contain user generated personal experience event reports. Such scoring may employ any suitable method. For example, engine 440 may run a linear regression on the terms in detected patterns database 415 vis-à-vis the training set of posts from “good” and “bad” websites to determine the weight of each term as an indicator of likelihood that a given website is either “good” or “bad”.
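  • The derivation of term weights by linear regression may be illustrated by the following sketch. The source states only that a linear regression may be run; the particular formulation shown here (binary term-presence features, labels of 1 for “good” and 0 for “bad” posts, and a least-squares fit by batch gradient descent) is an assumption, as are the function name and the sample data.

```python
def fit_term_weights(posts, labels, terms, epochs=2000, lr=0.05):
    """Least-squares fit of one weight per term plus an intercept.

    posts  -- training posts (strings)
    labels -- 1.0 for posts from "good" websites, 0.0 for "bad" ones
    terms  -- candidate terms, e.g. from detected patterns database 415
    """
    # Feature is 1.0 if the term occurs in the post, otherwise 0.0.
    X = [[1.0 if t in p.lower() else 0.0 for t in terms] for p in posts]
    w = [0.0] * len(terms)
    b = 0.0
    n = len(posts)
    for _ in range(epochs):
        grad_w = [0.0] * len(terms)
        grad_b = 0.0
        for x, y in zip(X, labels):
            err = b + sum(wi * xi for wi, xi in zip(w, x)) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        # Step against the mean gradient of the squared error.
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return dict(zip(terms, w)), b


if __name__ == "__main__":
    posts = [
        "I took Drug A and my headache was gone",
        "I took Drug A last week and felt better",
        "Buy Drug A now, big sale",
        "Drug A on sale, buy today",
    ]
    labels = [1.0, 1.0, 0.0, 0.0]
    weights, intercept = fit_term_weights(posts, labels, ["i took", "buy"])
    print(weights, intercept)
```

  • On this toy training set, the term “i took” receives a positive weight and “buy” a negative one, consistent with the role of such terms as indicators of “good” and “bad” websites respectively.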
  • In accordance with a preferred embodiment of the present invention, engine 440 may expand the scoring process to also include other indicators from ranking sources database 470. Database 470 may represent rankings from external sources such as, for example, Google page ranks and/or Alexa ratings. Engine 440 may include the associated rankings for the page on which each post may be located as additional factors when running the linear regression on the terms in detected patterns database 415.
  • In accordance with a preferred embodiment of the present invention, engine 440 may expand the scoring process to also include additional factors that may be calculated or derived from the original posts. Such additional factors may include, for example, the query rank of the original query that identified the post as a candidate and meta keywords of the page.
  • In accordance with a preferred embodiment of the present invention, engine 440 may expand the scoring process to also include the number of images and/or links on the page. It will be appreciated that most user forums have relatively few images and links per page. Accordingly, a higher number of links or images per page may tend to indicate a “bad” website.
  • In accordance with a preferred embodiment of the present invention, engine 440 may expand the scoring process to also include statistical data from cumulative scoring. Such factors may include, for example, the ratio of posts to the number of discussions (aka “threads”); or the overall ranking of a given anchor and/or term in “good” and “bad” websites. For example, the anchor term “Aspirin” may have an overall high ranking in “good” posts; statistically, personal experience event reports citing Aspirin may typically be genuine. However, the anchor term “Viagra” may typically be indicative of SPAM or commercial posts.
  • It will be appreciated that utility 400 may be configured to facilitate inspection of the results of step 530 by a user of system 100, and to enable the user to adjust the input data as necessary to achieve a truer result. Accordingly, step 530 may be repeated as necessary. The patterns scored by engine 440 may be stored in weighted indicators database 435. It will be appreciated that weighted indicators database 435 may therefore contain a superset (including calculated weights) of the terms in detected patterns database 415 and known relevant terms 425. It will also be appreciated that database 435 may provide the basis for terms database 235.
  • Pre-collection post collector 450 may collect (step 540) posts from candidate websites 420 on the Internet by formulating search queries based on positive term based indicators from weighted indicators database 435. Candidate scoring engine 460 may then score (step 550) each website 420 vis-à-vis all of the factors in weighted indicators database 435 to assess its likelihood to contain user generated personal experience event reports. System 100 may be configured with a threshold weighted score to determine whether or not a given website 420 may be considered likely to contain user generated personal experience event reports.
  • Utility 400 may update (step 560) website collection list 465 to include websites 420 that exceed such a threshold. It will be appreciated that process 500 may be performed on a periodic basis to continually update list 465. Accordingly, utility 400 may also record websites 420 with weighted scores below the threshold to avoid examining them again in the future.
  • It will be appreciated that website collection list 465 may be used by post collector 50 in the embodiment of FIG. 1.
  • Unless specifically stated otherwise, as apparent from the preceding discussions, it is appreciated that, throughout the specification, discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” or the like, refer to the action and/or processes of a computer, computing system, or similar electronic computing device that manipulates and/or transforms data represented as physical, such as electronic, quantities within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.
  • Embodiments of the present invention may include apparatus for performing the operations herein. This apparatus may be specially constructed for the desired purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk, including floppy disks, optical disks, magneto-optical disks, read-only memories (ROMs), compact disc read-only memories (CD-ROMs), random access memories (RAMs), electrically programmable read-only memories (EPROMs), electrically erasable and programmable read only memories (EEPROMs), magnetic or optical cards, Flash memory, or any other type of media suitable for storing electronic instructions and capable of being coupled to a computer system bus.
  • The processes and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the desired method. The desired structure for a variety of these systems will appear from the description below. In addition, embodiments of the present invention are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
  • While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.
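The candidate-website scoring of steps 540 to 560 described above can be illustrated with a minimal sketch. The specification does not give a concrete formula, so everything here is an assumption made for illustration: the names `CandidateSite`, `score_site` and `update_collection_list`, the example term weights, the per-post averaging, the media-count penalty, and the threshold value are all hypothetical and are not taken from the disclosed embodiment.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in for weighted indicators database 435:
# positive weights suggest genuine reports, negative weights suggest spam.
TERM_WEIGHTS: Dict[str, float] = {
    "aspirin": 2.0,
    "i took": 1.5,
    "viagra": -3.0,
    "buy now": -2.5,
}
MAX_MEDIA_PER_PAGE = 20.0  # assumed cutoff for "many" images/links per page
MEDIA_PENALTY = -1.0       # assumed penalty applied above that cutoff
THRESHOLD = 1.0            # assumed threshold weighted score


@dataclass
class CandidateSite:
    """A candidate website with posts sampled by the pre-collection step."""
    url: str
    post_texts: List[str]
    images_per_page: float
    links_per_page: float


def score_site(site: CandidateSite,
               term_weights: Dict[str, float] = TERM_WEIGHTS) -> float:
    """Average weighted term score per post, plus a page-level media penalty."""
    total = 0.0
    for text in site.post_texts:
        lowered = text.lower()
        for term, weight in term_weights.items():
            total += weight * lowered.count(term)
    average = total / len(site.post_texts) if site.post_texts else 0.0
    media = site.images_per_page + site.links_per_page
    penalty = MEDIA_PENALTY if media > MAX_MEDIA_PER_PAGE else 0.0
    return average + penalty


def update_collection_list(sites: Iterable[CandidateSite],
                           collection: Set[str],
                           rejected: Set[str],
                           threshold: float = THRESHOLD) -> None:
    """Add sites scoring above the threshold; remember the rest to skip later."""
    for site in sites:
        if site.url in collection or site.url in rejected:
            continue  # already examined in an earlier run
        if score_site(site) > threshold:
            collection.add(site.url)
        else:
            rejected.add(site.url)
```

Keeping a separate `rejected` set mirrors the note that sites scoring below the threshold may be recorded so that a periodic re-run of the process does not examine them again.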

Claims (3)

What is claimed is:
1. A method for scoring segments of Internet posts, implementable on a computing device, the method comprising:
defining a set of indicating factors, wherein each said indicating factor is associated with a possible feature in the segments, and
wherein said possible features affect a likelihood that the Internet posts represent a user generated product personal experience event report associated with a pre-defined search subject.
2. A method according to claim 1 comprising:
weighting said indicating factors in accordance with said likelihood, wherein each of said indicating factors is at least one of a negative and a positive value.
3. A method for scoring segments of Internet posts, implementable on a computing device, the method comprising:
listing factors detected in text segments of the Internet posts, defining weights to be associated with each of said factors, wherein each defined weight reflects a value for an associated factor as a predictor of the segments representing a user generated product personal experience event report; and
multiplying a ratio of said weighted factors, divided by an overall number of words in the segment.
US14/106,880 2010-10-06 2013-12-16 System and method for detecting personal experience event reports from user generated internet content Abandoned US20140108429A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/106,880 US20140108429A1 (en) 2010-10-06 2013-12-16 System and method for detecting personal experience event reports from user generated internet content

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US39021510P 2010-10-06 2010-10-06
US39022010P 2010-10-06 2010-10-06
US13/253,090 US8612455B2 (en) 2010-10-06 2011-10-05 System and method for detecting personal experience event reports from user generated internet content
US14/106,880 US20140108429A1 (en) 2010-10-06 2013-12-16 System and method for detecting personal experience event reports from user generated internet content

Related Parent Applications (1)

Application Number Relationship Publication Number Priority Date Filing Date Title
US13/253,090 Continuation US8612455B2 (en) 2010-10-06 2011-10-05 System and method for detecting personal experience event reports from user generated internet content

Publications (1)

Publication Number Publication Date
US20140108429A1 (en) 2014-04-17

Family

ID=45925937

Family Applications (4)

Application Number Status Publication Number Priority Date Filing Date Title
US13/253,090 Active 2031-12-30 US8612455B2 (en) 2010-10-06 2011-10-05 System and method for detecting personal experience event reports from user generated internet content
US14/106,881 Abandoned US20140108430A1 (en) 2010-10-06 2013-12-16 System and method for detecting personal experience event reports from user generated internet content
US14/106,878 Abandoned US20140108409A1 (en) 2010-10-06 2013-12-16 System and method for detecting personal experience event reports from user generated internet content
US14/106,880 Abandoned US20140108429A1 (en) 2010-10-06 2013-12-16 System and method for detecting personal experience event reports from user generated internet content

Country Status (1)

Country Link
US (4) US8612455B2 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014032002A1 (en) * 2012-08-23 2014-02-27 Ims Health Incorporated Detecting drug adverse effects in social media and mobile applications
US9076182B2 (en) 2013-03-11 2015-07-07 Yodlee, Inc. Automated financial data aggregation
US10037367B2 (en) * 2014-12-15 2018-07-31 Microsoft Technology Licensing, Llc Modeling actions, consequences and goal achievement from social media and other digital traces
US20180011977A1 (en) * 2015-03-13 2018-01-11 Ubic, Inc. Data analysis system, data analysis method, and data analysis program
US9971940B1 (en) * 2015-08-10 2018-05-15 Google Llc Automatic learning of a video matching system
US10771424B2 (en) * 2017-04-10 2020-09-08 Microsoft Technology Licensing, Llc Usability and resource efficiency using comment relevance
CN109254799B (en) * 2018-08-29 2023-03-10 新华三技术有限公司 Boot program starting method and device and communication equipment
US20220027827A1 (en) * 2020-07-24 2022-01-27 Content Square SAS Benchmarking of user experience quality

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6012053A (en) * 1997-06-23 2000-01-04 Lycos, Inc. Computer system with user-controlled relevance ranking of search results
US20080034058A1 (en) * 2006-08-01 2008-02-07 Marchex, Inc. Method and system for populating resources using web feeds
US8463790B1 (en) * 2010-03-23 2013-06-11 Firstrain, Inc. Event naming

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001059625A1 (en) * 2000-02-10 2001-08-16 Involve Technology, Llc System for creating and maintaining a database of information utilizing user opinions
US6946715B2 (en) * 2003-02-19 2005-09-20 Micron Technology, Inc. CMOS image sensor and method of fabrication
US8233714B2 (en) * 2006-08-01 2012-07-31 Abbyy Software Ltd. Method and system for creating flexible structure descriptions
US7676457B2 (en) * 2006-11-29 2010-03-09 Red Hat, Inc. Automatic index based query optimization
US8959433B2 (en) * 2007-08-19 2015-02-17 Multimodal Technologies, Llc Document editing using anchors
US9092789B2 (en) * 2008-04-03 2015-07-28 Infosys Limited Method and system for semantic analysis of unstructured data

Also Published As

Publication number Publication date
US20140108430A1 (en) 2014-04-17
US20120089616A1 (en) 2012-04-12
US20140108409A1 (en) 2014-04-17
US8612455B2 (en) 2013-12-17

Similar Documents

Publication Publication Date Title
US8612455B2 (en) System and method for detecting personal experience event reports from user generated internet content
US9443245B2 (en) Opinion search engine
EP2192500B1 (en) System and method for providing robust topic identification in social indexes
US8140512B2 (en) Consolidated information retrieval results
US9535911B2 (en) Processing a content item with regard to an event
US9117006B2 (en) Recommending keywords
US20160162582A1 (en) Method and system for conducting an opinion search engine and a display thereof
Haddow et al. Citation analysis and peer ranking of Australian social science journals
US20080077581A1 (en) System and method for providing medical disposition sensitive content
US20210157863A1 (en) Domain-specific negative media search techniques
US20100325105A1 (en) Generating ranked search results using linear and nonlinear ranking models
US20110106743A1 (en) Method and system to predict a data value
KR101100830B1 (en) Entity searching and opinion mining system of hybrid-based using internet and method thereof
US20130132401A1 (en) Related news articles
CN113724848A (en) Medical resource recommendation method, device, server and medium based on artificial intelligence
US8880390B2 (en) Linking newsworthy events to published content
KR102252188B1 (en) Product recommendation system and method reflecting user purchasing criterion
WO2020101477A1 (en) System and method for dynamic entity sentiment analysis
Boyer et al. How to sort trustworthy health online information? Improvements of the automated detection of HONcode criteria
Nan et al. DO ONLY REVIEW CHARACTERISTICS AFFECT CONSUMERS' ONLINE BEHAVIORS? A STUDY OF RELATIONSHIP BETWEEN REVIEWS.
CN111125561A (en) Network heat display method and device
CN102915358A (en) Method and device for realizing navigation website
Rathan et al. Every post matters: a survey on applications of sentiment analysis in social media
US20150193444A1 (en) System and method to determine social relevance of Internet content
CN102915357A (en) Method and device for realizing website navigation

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION