WO1998029818A1 - Method and apparatus for analyzing a user's keyboard input to determine or verify facts - Google Patents



Publication number
WO1998029818A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
attributes
guess
analyzed
computer implemented
Prior art date
Application number
PCT/US1997/021781
Other languages
English (en)
Inventor
John W. Richardson
Robert T. Adams
Vaughn S. Iverson
Original Assignee
Intel Corporation
Application filed by Intel Corporation
Priority to EP97951483A (EP1016000A4)
Priority to AU55114/98A (AU5511498A)
Publication of WO1998029818A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

Definitions

  • the invention relates generally to the field of online communications in multi-user environments. More specifically, the invention relates to identifying attributes of a user based upon information available through the online communication session.
  • Masquerading is one of the controversial aspects of online chat rooms. Due to anonymity and untraceability, masquerading has become rampant in online chat rooms and the like. For example, men present themselves as women and children pretend to be adults.
  • a method of providing real-time, augmented information to a chat room user about one or more other users in the chat room is disclosed.
  • Information is received that has been transmitted by a user.
  • the information is analyzed to determine one or more analyzed guess attributes of the user.
  • they are displayed to the local user.
  • the identity of a chat room user can be verified.
  • One or more text messages are received from a user.
  • analyzed guess attributes of the user and confidence levels associated with each of the analyzed guess attributes are determined.
  • the analyzed guess attributes are compared with a stored user profile that has been generated from one or more previous chat sessions involving the user to determine whether the user is the same person from a previous chat session.
  • the likelihood that a particular text was written by a given person can be determined.
  • a text is received in a computer readable form.
  • a profile of the purported author, in computer readable form, is also received.
  • analyzed guess attributes of the true author are determined by analyzing the text in question.
  • the profile of the purported author is compared to analyzed guess attributes of the true author. Upon completion of the comparison, the results are displayed.
  • Figure 1 is an example of a typical computer system upon which one embodiment of the present invention can be implemented.
  • Figure 2 is a high level data flow diagram illustrating the overall software architecture including the processes, data stores, and the flow of data among the processes according to one embodiment of the present invention.
  • Figure 3 is a data flow diagram illustrating the attribute analyzer of Figure 2 according to one embodiment of the present invention.
  • Figure 4 is a flow diagram illustrating a method of verifying claimed facts according to one embodiment of the present invention using the processes and data stores shown in Figure 2.
  • Figure 5 is a flow diagram illustrating a method of performing the metric processing of Figure 4 according to one embodiment of the present invention.
  • Figure 6 is a flow diagram illustrating a method of performing the background verification of Figure 4 according to one embodiment of the present invention.
  • Figure 7 is a flow diagram illustrating a method of performing the determination of analyzed guess attributes and confidence levels of Figure 4 according to one embodiment of the present invention.
  • Figure 8 is a flow diagram illustrating a method of determining whether a chat room user is the same person from a previous chat session according to one embodiment of the present invention.
  • Figure 9 is a flow diagram illustrating a method of determining whether a chat room user is the same person from a previous chat session according to another embodiment of the present invention.
  • Figure 10 is a flow diagram illustrating a method of determining the likelihood a given text was written by the purported author according to one embodiment of the present invention.
  • Figure 11 is an exemplary user interface for providing augmented information to a chat room user according to one embodiment of the present invention.
  • The terms "chat room" and "chat area" are used throughout this application to refer to any online environment that allows multi-user interaction. For example, Internet Relay Chat (IRC), multi-user dungeons, multi-user environment simulators (MU*s), habitats, GMUKS (graphical multi-user konversation), and even Internet newsgroups would fall within this definition of a chat room.
  • Computer system 100 represents a computer system upon which the preferred embodiment of the present invention can be implemented.
  • Computer system 100 comprises a bus or other communication means 101 for communicating information, and a processing means 102 coupled with bus 101 for processing information.
  • Computer system 100 further comprises a random access memory (RAM) or other dynamic storage device 104 (referred to as main memory), coupled to bus 101 for storing information and instructions to be executed by processor 102.
  • Main memory 104 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 102.
  • Computer system 100 also comprises a read only memory (ROM) and/or other static storage device 106 coupled to bus 101 for storing static information and instructions for processor 102.
  • Data storage device 107 is coupled to bus 101 for storing information and instructions.
  • a data storage device 107 such as a magnetic disk or optical disc and its corresponding drive can be coupled to computer system 100.
  • Computer system 100 can also be coupled via bus 101 to a display device 121, such as a cathode ray tube (CRT), for displaying information to a computer user.
  • An alphanumeric input device 122 is typically coupled to bus 101 for communicating information and command selections to processor 102.
  • Another type of user input device is cursor control 123, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 102 and for controlling cursor movement on display 121.
  • This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), which allows the device to specify positions in a plane.
  • other input devices such as a stylus or pen can be used to interact with the display.
  • a hard copy device 124, which may be used for printing instructions, data, or other information on a medium such as paper, film, or similar types of media, can be coupled to bus 101.
  • a communication device 125 may also be coupled to bus 101 for use in accessing other computer systems.
  • the communication device 125 may include any of a number of commercially available networking peripheral devices such as those used for coupling to an Ethernet, token ring, Internet, or wide area network.
  • an optical character recognition (OCR) device 126 may be coupled to bus 101 for scanning hard copy documents and converting them into computer readable form.
  • the present invention is related to the use of computer system 100 for determining characteristics of a person based upon a sample of text written by the person.
  • computer system 100 executes a program that analyzes the information sent from users in an online chat room and provides real-time, augmented information to a local user about one or more of the users.
  • Figure 2 is a high level data flow diagram illustrating the overall software architecture including the processes, data stores, and the flow of data among the processes according to one embodiment of the present invention.
  • a character stream is received by the system and sent to a display 220 and to a lexical analysis process 205.
  • the input bit stream is a stream of ASCII characters.
  • the lexical analysis process 205 converts the character stream into tokens. Both the tokens and the character stream are input into a metric processing block 210.
  • the lexical analysis process 205 would also include a graphic analysis process to analyze graphic information received such as avatars, icons, or other images associated with or transmitted by a user.
  • the metric processing block produces metrics with reference to a spelling dictionary 255 and a grammar checker 260.
  • the tokens are also supplied into a simple syntax analyzer 225.
  • the simple syntax analyzer 225 identifies claimed facts in the token stream and outputs the claimed facts to the display 220, a background verification process 230, and a comparator against claimed facts 240.
  • the simple syntax analyzer 225 reads a database containing keywords and phrases to facilitate the identification of claimed facts. For example, the user's name would generally follow the introductory phrases "My name is," "I am," or "I'm"; the number between the phrases "I am" and "years old" would be the user's claimed age; the user's address could be expected to follow the phrase "I live at"; etc.
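  • The keyword-and-phrase lookup described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the regular expressions, the `CLAIM_PATTERNS` table, and the `extract_claimed_facts` function are hypothetical names built only from the introductory phrases quoted in the text, and the name pattern assumes a capitalized single-word name.

```python
import re
from typing import Dict

# Hypothetical phrase database: one pattern per claimed attribute.
# Only the introductory phrases come from the text; the regexes are assumptions.
CLAIM_PATTERNS = {
    # Name follows "My name is", "I am", or "I'm" (assumed capitalized).
    "name": re.compile(r"\b(?i:my name is|i am|i'm)\s+([A-Z][a-z]+)"),
    # Age is the number between "I am" and "years old".
    "age": re.compile(r"\bi am\s+(\d{1,3})\s+years old\b", re.IGNORECASE),
    # Address follows "I live at" and runs to the end of the sentence.
    "address": re.compile(r"\bi live at\s+([^.!?]+)", re.IGNORECASE),
}


def extract_claimed_facts(message: str) -> Dict[str, str]:
    """Return the first claimed value found for each attribute."""
    facts = {}
    for attribute, pattern in CLAIM_PATTERNS.items():
        match = pattern.search(message)
        if match:
            facts[attribute] = match.group(1).strip()
    return facts


facts = extract_claimed_facts(
    "Hi, my name is Alice. I am 34 years old and I live at 12 Elm Street."
)
```

  • A phrase table like this is easy to extend, but it will miss claims phrased in other ways, which is presumably why the text also mentions parser-style alternatives.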
  • An attribute analyzer 215 analyzes the tokens and metrics to produce analyzed guess attributes and confidence levels.
  • the attribute analyzer 215 can also receive user feedback about the accuracy of its analyzed guess attributes to learn from its mistakes and context information to improve the analysis. The process of determining the analyzed guess attributes and confidence levels will be discussed further with reference to Figure 4.
  • the analyzed guess attributes and confidence levels produced by the attribute analyzer 215 and the claimed facts determined by the simple syntax analyzer 225 are input into the comparator against claimed facts 240.
  • the analyzed guess attributes are compared to the claimed facts and the results of the comparison are sent to display 220.
  • the analyzed guess attributes and confidence levels are also used by an expert system to compare against other people 250.
  • the expert system to compare against other people 250 determines if the analyzed guess attributes match a stored user profile corresponding to the user that produced the character stream.
  • a past user profile database 245 maintains user profiles on users encountered in previous online chat sessions. Each user profile in the past user profile database 245 includes typical conversational constructs employed by the user (e.g., conversational openings, topics of discussion, and conversational closings), and other distinguishing characteristics such as the grammar, spelling, and typing metrics discussed below. Additionally, claimed facts from previous chat sessions could be maintained in the user profile. Further, as discussed above, since graphic objects are more frequently being employed in online communication, graphics associated with a given user can also be stored in the past user profile database 245.
  • Given a set of claimed facts, the background verification process 230 produces verified facts.
  • the verified facts are compared against the claimed facts by the comparator against claimed facts 240 and the comparison results are displayed.
  • a sample of text written by a particular person, therefore, can be analyzed to determine attributes of the person.
  • women are generally more polite than men.
  • Politeness is defined by the concern for the feelings of others.
  • Women typically use more polite speech than do men, characterized by a high frequency of honorifics, and softening devices such as hedges and questions.
  • variables, rules, metrics, and/or confidence levels can be generated for determining where a person grew up.
  • regional phrases and colloquialisms can be stored in a database and compared to the words used by a particular online user.
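  • As a rough sketch of the regional-phrase comparison, the following counts matches against a small lookup table. The phrases, region labels, and the `guess_region` function are illustrative assumptions only; the source does not list the contents of its database or say how a confidence level would be derived, so the hit fraction used here is just one simple choice.

```python
from collections import Counter

# Hypothetical phrase-to-region table; entries are illustrative only.
REGIONAL_PHRASES = {
    "y'all": "southern US",
    "fixin' to": "southern US",
    "wicked good": "New England",
}


def guess_region(text):
    """Return (region, fraction_of_hits) or (None, 0.0) when nothing matches."""
    lowered = text.lower()
    hits = Counter()
    for phrase, region in REGIONAL_PHRASES.items():
        hits[region] += lowered.count(phrase)
    total = sum(hits.values())
    if total == 0:
        return None, 0.0
    region, count = hits.most_common(1)[0]
    # The fraction of matching phrases serves as a crude confidence level.
    return region, count / total


region, confidence = guess_region(
    "Y'all come back, we're fixin' to eat. That's wicked good."
)
```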
  • formulas can be used to predict a person's age and educational level. These formulas can be derived from known formulas that measure readability of a text. Several different formulas are used in software packages to analyze text passages and rate the readability (e.g., the Flesch Reading Ease, Gunning Fog Index, and the Flesch-Kincaid Grade Level).
  • the Flesch Reading Ease formula produces lower scores for writing that is difficult to read and higher scores for text that is easy to read.
  • the Flesch Reading Ease score is determined as follows: 206.835 - (1.015 × (average sentence length) + 0.846 × (number of syllables per 100 words)).
  • a text scoring 90 to 100 is very easy to read and rated at the 4th grade level.
  • a score between 60 and 70 is considered standard and the corresponding text would be readable by those having the reading skills of a 7th to 8th grader.
  • a text generating a score between 0 and 30 is considered very difficult to read and is recommended for college level readers.
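  • The Flesch Reading Ease calculation and the score bands quoted above can be sketched as below. The coefficient 0.846 per 100 words is the standard published value for this formula. The function names are hypothetical, and the counts of words, sentences, and syllables are taken as inputs because syllable counting itself requires a separate heuristic that the text does not specify.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease from raw counts (higher means easier to read)."""
    average_sentence_length = total_words / total_sentences
    syllables_per_100_words = 100.0 * total_syllables / total_words
    return 206.835 - (
        1.015 * average_sentence_length + 0.846 * syllables_per_100_words
    )


def describe_score(score):
    """Map a score to the bands named in the text; other ranges return None."""
    if 90 <= score <= 100:
        return "very easy (about 4th grade)"
    if 60 <= score <= 70:
        return "standard (7th to 8th grade)"
    if 0 <= score <= 30:
        return "very difficult (college level)"
    return None  # band not described in the text


# 100 words in 5 sentences with 150 syllables.
score = flesch_reading_ease(100, 5, 150)
```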
  • the Gunning Fog Index also gives an approximate grade level a reader must have completed to understand the document using the following formula: 0.4 × ((average number of words per sentence) + (percentage of words of 3 syllables or more)).
  • the last approach is the Flesch-Kincaid Grade Level.
  • the formula is: (0.39 × (average number of words per sentence) + 11.8 × (average number of syllables per word)) - 15.59.
  • a score between 6th and 10th grade is considered to be most effective for a general audience.
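  • The two grade-level formulas can be sketched in the same way. This uses the standard published form of the Gunning Fog Index, which multiplies by 0.4 and takes the percentage of words with three or more syllables. The function names are hypothetical, and raw counts are passed in rather than computed from text.

```python
def gunning_fog(total_words, total_sentences, complex_words):
    """Gunning Fog Index; complex_words counts words of 3 or more syllables."""
    average_sentence_length = total_words / total_sentences
    percent_complex = 100.0 * complex_words / total_words
    return 0.4 * (average_sentence_length + percent_complex)


def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    average_sentence_length = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * average_sentence_length + 11.8 * syllables_per_word - 15.59


# 100 words in 5 sentences, 10 complex words, 150 syllables.
fog = gunning_fog(100, 5, 10)
grade = flesch_kincaid_grade(100, 5, 150)
```

  • With these sample counts both measures land between roughly 10th and 12th grade, slightly above the 6th to 10th grade range the text calls most effective for a general audience.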
  • a correlation can be established between writers' educational levels and the grade level of the writing they produce. In one embodiment, this correlation can be used to determine guess educational levels.
  • This method of determining guess education levels can be further refined by adding a database of expected vocabulary for particular age groups and educational levels to the calculation. Once a database of expected vocabulary for particular age groups and educational levels is established, the system can maintain metrics regarding the vocabulary usage of users that are encountered in chat rooms. If a person claims they have graduated from college, but uses words primarily from vocabulary identified with a high school education, the system should accordingly indicate that it is unlikely that the person is college educated. Table 1 illustrates exemplary multisyllabic words that would likely be used by more educated people and corresponding words that would tend to indicate a lower educational level. These and other examples, readily determined with the use of a thesaurus, can be used as a factor in measuring educational level.
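  • A minimal sketch of the vocabulary check might look like the following. Table 1 is not reproduced in this text, so the word pairs below are illustrative stand-ins rather than the patent's entries; `vocabulary_score`, `claim_is_plausible`, and the 0.5 cutoff are likewise assumptions.

```python
# Hypothetical pairs: a multisyllabic word and a plainer counterpart.
ADVANCED_TO_SIMPLE = {
    "utilize": "use",
    "commence": "begin",
    "purchase": "buy",
    "sufficient": "enough",
}
SIMPLE_WORDS = set(ADVANCED_TO_SIMPLE.values())


def vocabulary_score(tokens):
    """Fraction of tracked words that come from the advanced column."""
    lowered = [token.lower() for token in tokens]
    advanced = sum(1 for word in lowered if word in ADVANCED_TO_SIMPLE)
    simple = sum(1 for word in lowered if word in SIMPLE_WORDS)
    tracked = advanced + simple
    return None if tracked == 0 else advanced / tracked


def claim_is_plausible(claims_college, tokens, min_score=0.5):
    """Flag a college claim as implausible when simple vocabulary dominates."""
    score = vocabulary_score(tokens)
    if score is None or not claims_college:
        return True  # nothing to contradict
    return score >= min_score


plain = "I use it and buy enough to begin".split()
formal = "We utilize and purchase".split()
```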
  • Figure 3 is a data flow diagram illustrating the attribute analyzer of Figure 2 according to one embodiment of the present invention.
  • the attribute analyzer 215 includes a neural network 315, a metric analyzer 310, a linguistics expert system 355, and a weighting process 330.
  • the speech patterns and variables discussed above can be incorporated into rules and metrics usable by the linguistics expert system 355 and the metric analyzer 310, respectively.
  • a neural network learns directly by interfacing with the domain, therefore, no rule-base needs to be generated.
  • a "neural network” is a system that is constructed to imitate the intelligent human biological processes of learning, self-modification, and learning by making inferences.
  • the neural network 315 analyzes tokens, and considers the context of the conversation, user feedback, and metrics to produce analyzed guess attributes and confidence levels. Based upon user feedback regarding the accuracy of the analyzed guess attributes for a given user, the neural network 315 can adjust the weights used for particular neurode synapses in the synapse weights data store 360. Given enough time and experience or training, the neural network 315 can learn everything about the problem domain (attribute determination based on samples of text). It may well be able to learn what is presently not known by any of the experts.
  • the metric analyzer 310 produces a set of analyzed guess attributes and confidence levels based upon the metrics supplied by the metric processing block 210 and a metric database 325.
  • the metric database 325 includes correlations between the metrics and attributes like those discussed above.
  • particular metrics may be better predictors for some attributes than other attributes and should be subdivided accordingly.
  • a metric that tracks references to a popular television series of a particular era (e.g., Gilligan's Island)
  • the attribute analyzer 215 also includes linguistics expert system 355.
  • an "expert system" is a program that manifests some combination of concepts, procedures, and techniques derived from recent artificial intelligence (AI) research. These problem-solving systems were initially called "expert systems" to suggest that they functioned as effectively as human experts at their highly specialized tasks. Expert systems have been employed in fields as varied as medical diagnosis (e.g., MYCIN was developed at Stanford University to consult with physicians to determine if a patient had a potentially fatal infection like bacteremia or meningitis) and computer configuration (e.g., XCON was developed to assist Digital Equipment Corporation with customer requests for custom built computers). A range of implementation options is available for development of an expert system, such as simply adding an interface and supplying rules to an existing expert system engine, using expert system development tools to build the system, or creating the system from scratch with a high level language.
  • the linguistics expert system 355 includes an inference engine 305 and a knowledge base 365.
  • the inference engine 305 contains the inference and control strategies. It also includes various knowledge acquisition, explanation, and user interface subsystems.
  • the knowledge base 365 consists of facts and heuristics useful for attribute determination.
  • the knowledge can be in the form of examples, facts, rules, or objects. In this example, because the information available for the determination of attributes is primarily in the form of rules, the linguistic expert system's knowledge representation preferably would be a structured rule-base.
  • the knowledge base 365 would be arranged into separate rule-bases (e.g., a rule-base for educational level predictors 350, a rule-base for gender predictors 335, a rule-base for age predictors 340, and a rule-base for regional predictors 345).
  • the knowledge base 365 can also be arranged into separate rule-bases based upon the context of the conversation.
  • the context can be manually input to the linguistics expert system 355 or the linguistics expert system 355 can be configured to automatically recognize the context based upon keywords that are associated with a given context. Subsequent linguistic analysis based on the context can then disregard certain inferences that might otherwise have been made without reference to the context. For example, without knowing that the conversational environment was a chemistry tutorial chat room, a high level of education might be attributed to an uneducated user due to the high number of references to chemistry and related terms.
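  • Automatic context recognition from keywords could be sketched as below. The keyword sets, the context labels, the `detect_context` function, and the two-hit minimum are all assumptions made for illustration; the text only says that keywords are associated with a given context.

```python
# Hypothetical keyword sets per conversational context.
CONTEXT_KEYWORDS = {
    "chemistry tutorial": {"molecule", "titration", "reagent", "stoichiometry"},
    "sports chat": {"inning", "touchdown", "playoffs", "goalie"},
}


def detect_context(tokens, min_hits=2):
    """Return the context whose keywords appear most often, or None."""
    words = {token.lower() for token in tokens}
    best_context, best_hits = None, 0
    for context, keywords in CONTEXT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_context, best_hits = context, hits
    # Require a minimum number of distinct keywords before committing.
    return best_context if best_hits >= min_hits else None


context = detect_context("add the reagent before the titration".split())
```

  • Once a context such as a chemistry tutorial is detected, rules that treat chemistry terms as evidence of education could be skipped for that session.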
  • the linguistics expert system 355 also includes logic to identify rules in the knowledge base 365 that require additional information before they can be evaluated. When these rules are identified, the linguistics expert system 355 provides guidance to the user in the form of suggested questions.
  • the linguistics expert system 355 can be implemented as a rule-based learning system that is trainable or a traditional expert system.
  • Traditional expert systems are dependent on experts to supply domain knowledge in the form of facts and relationships among the facts. The knowledge of these systems is bounded by what is currently known about the established domain. Therefore, when new expertise can no longer be found, the knowledge of this type of expert system stops growing.
  • a rule-based learning system can learn by interacting with the domain, and therefore is not dependent upon experts to supply additional domain knowledge.
  • a rule-based learning system can modify its knowledge base based upon interaction with the domain and user feedback.
  • the attribute analyzer 215 includes a rule-based learning system.
  • a further advantage of using a rule-based learning system is the ability to query the system about the particular rules used to arrive at its conclusions, whereas this information is not available from a neural network.
  • Figure 4 is a flow diagram illustrating a method of verifying claimed facts according to one embodiment of the present invention using the processes and data stores shown in Figure 2.
  • information is received from another chat room user.
  • it is assumed information is communicated among users in the form of text messages.
  • the characters of the text messages may be received in real-time as the other user is typing or the text messages may be received after the other user has completed and transmitted each individual message.
  • lexical analysis is performed on the received text messages to tokenize the character stream. That is, the characters are grouped into tokens (e.g., words, punctuation, white space, numbers, etc.).
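  • The tokenization step can be illustrated with a small regular-expression lexer. The token classes follow the examples in the text (words, punctuation, white space, numbers); the `TOKEN_RE` pattern and `tokenize` function are hypothetical, and the sketch assumes ASCII input.

```python
import re

# One named group per token class; alternatives are tried left to right.
TOKEN_RE = re.compile(
    r"(?P<number>\d+)"
    r"|(?P<word>[A-Za-z']+)"
    r"|(?P<space>\s+)"
    r"|(?P<punct>[^\w\s])"
)


def tokenize(characters):
    """Group a character stream into (kind, text) tokens.

    Characters matching none of the classes (e.g. underscores) are skipped.
    """
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(characters)]


tokens = tokenize("I am 34!")
```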
  • Metrics are also used to improve attribute determination. Clues about the other user's gender, age, educational level, and other characteristics can be found by studying the character stream and tokens. For example, a user that falsely claims to have an advanced degree in English might be betrayed by poor grammar and spelling.
  • one or more metrics are tracked based upon the text messages received from the other user in the metric processing step, 415.
  • a simple syntax analysis is performed.
  • the simple syntax analyzer scans the text messages for facts asserted by the other user.
  • this simple syntax analysis is performed using a database containing keywords and phrases.
  • the database could include the words and phrases that are likely to identify occurrences of claimed facts.
  • the syntax analysis could be performed without reference to a keyword database by using other syntax analysis techniques to determine claimed facts asserted by the user (e.g., parsing methods typically employed by compilers).
  • the claimed facts are displayed to the local user.
  • the claimed facts can be displayed in the chat window or a separate window on the display 121 so as not to interfere with the chat session.
  • the attribute analyzer 215 determines analyzed guess attributes and confidence levels for each of the analyzed guess attributes.
  • the analyzed guess attributes of the other user include: age, educational level, gender, and region. The inventors of the present invention envision many other attributes will be capable of determination as the field of sociolinguistics continues to mature and more information becomes available with respect to the relationship of speech patterns and other attributes.
  • Confidence levels can be provided for each of the analyzed guess attributes.
  • The confidence levels indicate the attribute analyzer's confidence in its guess based upon, for example, the sample size and the weight of the predictor.
  • this supplemental information about the other user is displayed to the local user.
  • the analyzed guess attributes can be displayed in a separate window, the chat window, or along side the corresponding claimed attributes to allow easy comparison by the local user.
  • a measure of hysteresis should be included in the attribute determination to ensure that a brief lapse into poor grammar, for example, does not immediately affect the guessed educational level.
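  • One simple way to add hysteresis is to change the displayed guess only after a different raw guess has persisted for several consecutive updates. The text does not say which mechanism is intended, so the `HysteresisGuess` class and its `required_repeats` parameter below are an assumed reading, not the patent's method.

```python
class HysteresisGuess:
    """Hold a displayed guess steady until a new value persists."""

    def __init__(self, initial, required_repeats=3):
        self.current = initial
        self.required_repeats = required_repeats
        self._candidate = None
        self._count = 0

    def update(self, raw_guess):
        """Feed one raw guess and return the (possibly unchanged) display value."""
        if raw_guess == self.current:
            # The lapse ended; forget any pending change.
            self._candidate, self._count = None, 0
            return self.current
        if raw_guess == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = raw_guess, 1
        if self._count >= self.required_repeats:
            self.current = raw_guess
            self._candidate, self._count = None, 0
        return self.current


guess = HysteresisGuess("college", required_repeats=3)
history = [
    guess.update(raw)
    for raw in ["high school", "college", "high school", "high school", "high school"]
]
```

  • In this run the single early lapse is ignored, and the displayed level changes only after three consecutive "high school" guesses.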
  • the analyzed guess attributes are compared to corresponding claimed attributes.
  • the results of the comparison are analyzed to determine if the analyzed guess attributes produced by the attribute analyzer 215 differ from the claimed attributes provided by the other user.
  • the system can include selectable thresholds for each of the analyzed guess attributes.
  • step 450 would be configured to perceive no difference between a given claimed fact and the corresponding analyzed guess attribute unless a user supplied threshold is exceeded. For example, the local user might not want to be alerted about a difference between the analyzed guess age and the claimed age unless the absolute value of the difference exceeded two years. Similarly, the local user might want to suppress alerts until the confidence level reached a sufficient level.
  • If the attributes differ, the local user is alerted at step 455; otherwise processing continues at step 405 so long as text messages are being received. Based upon the local user's preference, the local user can be alerted of the discrepancy between the analyzed guess attributes and the claimed attributes by an alarm including sound, color, or some other mechanism.
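  • The threshold behavior can be sketched for the age example as follows. The two-year threshold comes from the text; the `should_alert` function, the 0.0 to 1.0 confidence scale, and the default minimum confidence of 0.7 are assumptions.

```python
def should_alert(claimed_age, guessed_age, confidence,
                 age_threshold=2, min_confidence=0.7):
    """Alert only when the guess is trusted and the gap exceeds the threshold."""
    if confidence < min_confidence:
        return False  # suppress alerts until the analyzer is confident enough
    return abs(guessed_age - claimed_age) > age_threshold


within_threshold = should_alert(claimed_age=25, guessed_age=27, confidence=0.9)
large_gap = should_alert(claimed_age=25, guessed_age=31, confidence=0.9)
low_confidence = should_alert(claimed_age=25, guessed_age=40, confidence=0.3)
```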
  • Figure 5 is a flow diagram illustrating a method of performing the metric processing of Figure 4 according to one embodiment of the present invention.
  • the step of performing metric processing 415 includes measuring typing speed 505, grammar analysis 510, and spelling analysis 515.
  • If the character stream is received in real time, metrics can be measured and recorded regarding the other user's typing speed, typing rhythm, and other factors determined to be pertinent to user attribute analysis. Since different individuals have different typing patterns, metrics recorded with respect to a user's typing patterns are especially useful for verifying the identity of a particular user. For example, given general typing rhythm information such as inter-key and inter-word time differences, a touch typist would be distinguishable from a "hunt-and-peck" typist.
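  • Typing rhythm metrics of this kind could be computed from timestamped keystrokes roughly as below. The input format (a list of `(timestamp_seconds, character)` pairs), the `typing_metrics` function, and the choice to treat the gap following a space as the inter-word time are all assumptions for illustration.

```python
from statistics import mean


def typing_metrics(keystrokes):
    """Summarize rhythm from (timestamp_seconds, character) pairs in order."""
    inter_key, inter_word = [], []
    for (t0, c0), (t1, _c1) in zip(keystrokes, keystrokes[1:]):
        gap = t1 - t0
        if c0 == " ":
            inter_word.append(gap)  # pause between a space and the next word
        else:
            inter_key.append(gap)   # pause between keys inside a word
    duration = keystrokes[-1][0] - keystrokes[0][0] if len(keystrokes) > 1 else 0.0
    return {
        "mean_inter_key": mean(inter_key) if inter_key else None,
        "mean_inter_word": mean(inter_word) if inter_word else None,
        "chars_per_minute": (
            60.0 * len(keystrokes) / duration if duration > 0 else None
        ),
    }


sample = [(0.0, "h"), (0.5, "i"), (1.0, " "), (2.5, "y"), (3.0, "o")]
metrics = typing_metrics(sample)
```

  • A large gap between the inter-word and inter-key means, as in this sample, is the sort of signal that might separate a "hunt-and-peck" typist from a touch typist.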
  • grammar metrics can be tracked by performing grammar analysis on the text messages. For example, metrics can be recorded for grammatical and punctuation errors, weak phrasing, slang, ambiguity, cliches, long or incomplete sentences, redundant phrases, incorrect verbs, and other problems.
  • spelling metrics can be tracked by performing spelling analysis on the text messages. Many other variables can be tracked depending upon the complexity and accuracy goals for the system (e.g., word choice, breadth of vocabulary, length of sentences, references to events or popular icons of a particular era, music, complexity of sentence structure, etc.)
  • Figure 6 is a flow diagram illustrating a method of performing the background verification of Figure 4 according to one embodiment of the present invention.
  • the step of performing background verification 430 further includes steps 605, 610, 615 and 620 for each claimed fact that is identified by step 420.
  • At step 610, information regarding the claimed fact is retrieved from the appropriate online resource.
  • this verification processing is performed in the background to reduce the effect on the ongoing chat session.
  • At step 615, it is determined whether there is a discrepancy between the claimed fact and the verified fact. If a discrepancy exists, the local user is alerted at step 620. Again, user selectable thresholds can be employed to allow the system to be more flexible. Once the local user has been alerted of the discrepancy, the background verification processing is complete and processing continues with step 435.
  • If it is determined, at step 605, that the claimed fact is not one that can be verified with reference to a reliable online resource, then the verification processing for this claimed fact is complete. Processing will continue with step 435.
  • If, at step 615, it is determined that there is no discrepancy between the claimed fact and the verified data (e.g., the verified data is consistent with the claimed fact or falls within the user defined threshold), then the background verification processing is complete with respect to this claimed fact and processing continues with step 435.
  • Figure 7 is a flow diagram illustrating a method of performing the determination of analyzed guess attributes and confidence levels of Figure 4 according to one embodiment of the present invention.
  • linguistics analysis is performed by the linguistics expert system 355 to provide analyzed guess attributes and corresponding confidence levels.
  • metric analysis is performed on the metrics by the metric analyzer 310.
  • the metric analysis also produces a set of analyzed guess attributes and corresponding confidence levels.
  • the metric analysis simply involves looking up metrics in the metric database 325.
  • the neural network 315 analyzes the text messages received from the other user to arrive at analyzed guess attributes and confidence levels. Upon determining the analyzed guess attributes and corresponding confidence levels, the neural network analysis outputs analyzed guess attributes and confidence levels.
  • the analyzed guess attributes and confidence levels output from steps 705, 710, and 715 are evaluated and weighted at step 720. Initially, less weight will be given to the neural network 315 than the metric analyzer 310 and the linguistics expert system 355, but as it becomes trained, the neural network will be given more weight. For example, initially the linguistics expert system's analyzed guess attributes might be weighted 50%, the metric analyzer's analyzed guess attributes 30%, and the neural network's analyzed guess attributes 20%. However, as the neural network's confidence levels increase as a result of sufficient training, its outputs can be relied upon more heavily. It is appreciated that not all of the analysis steps (705, 710, and 715) are required for a functional method of determining analyzed guess attributes. Any one of steps 705, 710, or 715 alone could provide a reasonable attribute analysis.
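  • The weighting at step 720 might be sketched as a weighted vote. The 50/30/20 split is the example given in the text; multiplying each weight by the analyzer's confidence and picking the highest total is an assumed combination rule, and `WEIGHTS` and `combine_guesses` are hypothetical names.

```python
from collections import defaultdict

# Initial example weights from the text; the neural network's share would
# grow as it is trained.
WEIGHTS = {"expert_system": 0.5, "metric_analyzer": 0.3, "neural_network": 0.2}


def combine_guesses(guesses, weights=WEIGHTS):
    """guesses maps analyzer name to (value, confidence in 0..1)."""
    scores = defaultdict(float)
    for analyzer, (value, confidence) in guesses.items():
        scores[value] += weights[analyzer] * confidence
    best = max(scores, key=scores.get)
    return best, scores[best]


value, score = combine_guesses({
    "expert_system": ("female", 0.6),
    "metric_analyzer": ("male", 0.8),
    "neural_network": ("male", 0.9),
})
```

  • Here the two lower-weighted analyzers agree and together outvote the expert system (0.42 against 0.30), which shows how agreement can offset a smaller individual weight.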
  • Figure 8 is a flow diagram illustrating a method of determining whether a chat room user is the same person from a previous chat session according to one embodiment of the present invention.
  • Steps 405 through 440 are as described with respect to Figure 4.
  • The analyzed guess attributes and the claimed facts from the current chat session are compared against a stored user profile in the past user profiles 245.
  • The claimed facts in the current chat session are compared against the claimed facts from previous chat sessions, and the analyzed guess attributes are compared against claimed facts from the current and previous sessions.
  • Step 845 determines whether a discrepancy exists between the stored user profile and the analyzed guess attributes and claimed facts from the current session. If a discrepancy is found, the local user is alerted at step 850; otherwise, processing continues at step 405 so long as text messages are being received.
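  • The comparison at step 845 might be sketched as follows. This is an illustrative Python sketch under stated assumptions: the flat dictionary profile, the function name `find_discrepancies`, and the `min_confidence` cut-off for trusting an analyzed guess are hypothetical and do not appear in the description above.

```python
def find_discrepancies(stored_profile, claimed_facts, guesses, min_confidence=0.7):
    """Compare the current session against a stored user profile.

    stored_profile and claimed_facts map attribute -> value.
    guesses maps attribute -> (value, confidence).
    Returns a list of (attribute, source, stored_value, current_value) tuples.
    """
    discrepancies = []
    # Claimed facts are compared directly: the user either repeats the claim or does not.
    for attribute, value in claimed_facts.items():
        stored = stored_profile.get(attribute)
        if stored is not None and stored != value:
            discrepancies.append((attribute, "claimed", stored, value))
    # Analyzed guesses only count when the analyzer is reasonably confident (assumed cut-off).
    for attribute, (value, confidence) in guesses.items():
        stored = stored_profile.get(attribute)
        if stored is not None and stored != value and confidence >= min_confidence:
            discrepancies.append((attribute, "guessed", stored, value))
    return discrepancies
```

  • An empty result corresponds to the branch in which processing continues at step 405; a non-empty result corresponds to the alert at step 850.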
  • Figure 9 is a flow diagram illustrating a method of determining whether a chat room user is the same person from a previous chat session according to another embodiment of the present invention.
  • Research in discourse analysis has revealed predictable sequences and routines in human interaction. While speech may seem to be infinitely variable, it is not totally unpredictable. It has been recognized that a significant percentage of conversational language is highly routinized into prefabricated utterances. Some have concluded that an enormous amount of natural language is formulaic, automatic, and rehearsed.
  • A complete conversation can be said to have the following three components: an opening of the conversation; topic discussion; and a closing of the conversation.
  • People tend to use a predictable routine in opening a conversation.
  • The process of opening a conversation can typically be broken down further into the following elements: bid for attention; verbal salute; identification; personal inquiry; and small talk.
  • Topic discussion generally has a much less obvious structure than the opening and closing sections of a conversation, making the task of identification difficult with reference to the structure of past and present topic discussions alone.
  • A fair conclusion about what can be predicted about topic discussion is that it is often repetitive.
  • People tend to follow set procedures for closing a conversation.
  • Recognizing the predictability and repetitive nature of conversations allows a user to be identified by comparing a current conversational pattern to patterns maintained in a user profile.
  • The following method exploits the predictability and repetitive nature of conversations to determine whether a particular user is the same person from one or more previous chat sessions.
  • Steps 405 and 410 are as described with respect to Figure 4.
  • A simple syntax analysis is performed on the received text messages to identify conversational constructs (e.g., conversational opening elements, the topic of conversation, and conversational closing elements).
  • The identified conversational constructs in the current conversation are compared against a stored user profile.
  • The stored user profile has been derived from several previous encounters with the user to make the determination more reliable.
  • The comparison should also include a measure of hysteresis to guard against a mismatch due to one out-of-character remark.
  • A particular user might be identified with sufficient certainty without every aspect of the user's profile indicating a match. Further, a particular user might be the same person from a previous session even though a topic of conversation is unlikely for the person.
  • The context of the conversation is helpful in this regard. For example, the topic of conversation should be given less weight as a predictor when the users are engaged in a special interest chat room as opposed to a chat room of a more general nature.
  • At step 925, it is determined whether the differences between the conversational constructs identified for the current chat session and those in the stored user profile exceed a threshold so as to warrant an alert to the local user. If the differences are found to be substantial in step 925, the local user is alerted of the discrepancy at step 930. If the differences are not worthy of alerting the local user, then steps 405, 410, and 915 through 930 can be repeated as text messages continue to be received from other chat room users.
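  • The threshold test of step 925, together with the hysteresis and topic-weighting remarks above, might be sketched in Python as follows. Everything specific here is an assumption made for illustration: constructs are modeled as sets of strings, difference is measured with Jaccard distance, the topic weight of 0.25 for special interest rooms is arbitrary, and hysteresis is approximated by requiring several consecutive over-threshold comparisons before alerting.

```python
def jaccard_distance(a, b):
    """Return 0.0 for identical sets and 1.0 for disjoint ones."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def construct_difference(current, profile, special_interest_room=False):
    """Weighted difference between current and stored conversational constructs.

    current and profile map "opening", "topic", and "closing" to collections of strings.
    """
    # Topic is a weaker predictor in a special interest room, so it gets less weight there.
    weights = {
        "opening": 1.0,
        "topic": 0.25 if special_interest_room else 1.0,
        "closing": 1.0,
    }
    total = sum(weights.values())
    return sum(
        w * jaccard_distance(current.get(k, ()), profile.get(k, ()))
        for k, w in weights.items()
    ) / total


class HysteresisAlert:
    """Alert only after several consecutive comparisons exceed the threshold."""

    def __init__(self, threshold=0.6, required_consecutive=3):
        self.threshold = threshold
        self.required_consecutive = required_consecutive
        self.streak = 0

    def update(self, difference):
        # One out-of-character remark raises the streak but cannot trigger an alert alone.
        self.streak = self.streak + 1 if difference > self.threshold else 0
        return self.streak >= self.required_consecutive
```

  • Each pass through steps 915 to 925 would call `construct_difference` and feed the result to `HysteresisAlert.update`; a `True` return corresponds to alerting the local user at step 930.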
  • Figure 10 is a flow diagram illustrating a method of determining the likelihood a given text was written by the purported author according to one embodiment of the present invention.
  • This method could be used by a university, for example, to determine the likelihood that an essay was indeed written by the student that claimed to be the author.
  • A computer readable form of the text to be certified (e.g., an ASCII text file on diskette) is received by the system. If the text to be certified is in hard copy form, the text can be scanned into the system via the optical character recognition device 126 or it may be manually entered via the keyboard 122. In any event, an electronic copy is required before the analysis can be performed.
  • A profile of the purported author of the text to be certified is received by the system.
  • The profile needs to be in a computer readable form.
  • The profile may be generated by analyzing the writing of a person over a significant period of time and storing observed characteristics as described with respect to Figure 2. Another option would be to generate the profile based upon several known samples of the individual's writing. Alternatively, the profile can be manually keyed into the system and should include accurate information regarding the purported author's gender, age, educational level, and where he/she grew up.
  • Analyzed guess attributes of the true author and associated confidence levels are determined based upon the text supplied in step 1005. This determination is equivalent to the determination made in step 435.
  • At step 1025, the analyzed guess attributes of the true author are compared to the purported author's profile supplied in step 1010. The differences, if any, between the analyzed guess attributes and the purported author's attributes are detected at step 1030.
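  • Steps 1025 and 1030 can be illustrated with a short Python sketch. The description above does not say how the differences are turned into a likelihood, so the confidence-weighted agreement score below is purely an assumption, as are the function name `authorship_score` and the dictionary layout.

```python
def authorship_score(guesses, purported_profile):
    """Compare analyzed guess attributes with the purported author's profile.

    guesses maps attribute -> (value, confidence).
    purported_profile maps attribute -> value.
    Returns (score, differences). score is the confidence-weighted share of
    comparable attributes that agree, or None when nothing can be compared.
    """
    agree = 0.0
    total = 0.0
    differences = []
    for attribute, (value, confidence) in guesses.items():
        if attribute not in purported_profile:
            continue  # no basis for comparison, so skip rather than penalize
        total += confidence
        expected = purported_profile[attribute]
        if expected == value:
            agree += confidence
        else:
            differences.append((attribute, expected, value, confidence))
    score = agree / total if total else None
    return score, differences
```

  • In the university example, a low score together with the list of differences would flag an essay for closer review; the score alone should not be read as proof of authorship.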
  • Figure 11 is an exemplary user interface for providing augmented information to a chat room user according to one embodiment of the present invention.
  • The user interface includes means for communicating information to the local chat room user and means for receiving feedback from the local chat room user, such as graphical or text-based windows presented on display 220.
  • The user interface includes a chat window 1120, a fact window 1110, an alert window 1130, and a feedback window 1140.
  • The chat window 1120 records and displays the chat room conversation.
  • The fact window 1110 displays claimed facts, analyzed guesses, and confidence levels associated with the analyzed guesses for one or more users involved in the chat room conversation.
  • The simple syntax analyzer 225 and the attribute analyzer 215 update the fact window 1110.
  • When the simple syntax analyzer 225 discovers a recognized factual assertion, the claimed fact is displayed and recorded in the fact window 1110.
  • When the attribute analyzer 215 has enough information to produce analyzed guess attributes and confidence levels, this additional information is presented to the local chat room user by way of the fact window 1110.
  • The local chat room user can be immediately alerted of potential identity deception by way of an alert mechanism such as an audible tone or visual signal.
  • Text-based alerts are displayed in alert window 1130 when discrepancies are detected between the facts claimed by a particular chat room user and the analyzed guess attributes determined for that chat room user.
  • The feedback window 1140 allows the local chat room user to provide feedback to the attribute analyzer 215 regarding known attributes of the other chat room users.
  • The local chat room user's feedback is available to the learning systems of the attribute analyzer 215 so that they can make better educated guesses in the future.
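  • The flow from a recognized factual assertion to a text-based alert might be sketched in Python as follows. The regular expressions stand in for the simple syntax analyzer 225 and are deliberately crude; the patterns, the function names `extract_claims` and `build_alerts`, the age-range representation, and the `min_confidence` cut-off are all hypothetical and are not taken from the description above.

```python
import re

# Hypothetical patterns for two kinds of factual assertion.
AGE_CLAIM = re.compile(r"\bI(?: am|'m) (\d{1,3})(?: years old)?\b", re.IGNORECASE)
GENDER_CLAIM = re.compile(r"\bI(?: am|'m) an? (man|woman|boy|girl)\b", re.IGNORECASE)
GENDER_WORDS = {"man": "male", "boy": "male", "woman": "female", "girl": "female"}


def extract_claims(message):
    """Return the claimed facts found in one chat message."""
    claims = {}
    age = AGE_CLAIM.search(message)
    if age:
        claims["age"] = int(age.group(1))
    gender = GENDER_CLAIM.search(message)
    if gender:
        claims["gender"] = GENDER_WORDS[gender.group(1).lower()]
    return claims


def matches(claimed, guessed):
    """A guess is either an exact value or a (low, high) range such as an age bracket."""
    if isinstance(guessed, tuple):
        low, high = guessed
        return low <= claimed <= high
    return claimed == guessed


def build_alerts(user, claims, guesses, min_confidence=0.7):
    """Return alert strings for claims that conflict with confident analyzed guesses."""
    alerts = []
    for attribute, claimed in claims.items():
        if attribute not in guesses:
            continue
        guessed, confidence = guesses[attribute]
        if confidence >= min_confidence and not matches(claimed, guessed):
            alerts.append(
                f"{user}: claimed {attribute} {claimed!r} conflicts with "
                f"analyzed guess {guessed!r} (confidence {confidence:.0%})"
            )
    return alerts
```

  • In terms of Figure 11, the output of `extract_claims` would feed the fact window 1110 and the strings returned by `build_alerts` would be shown in the alert window 1130.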

Abstract

The invention relates to a method of providing a chat room user, in real time, with augmented information about one or more other users of the chat room. Information is received from another chat room user (405), and the information is then analyzed to determine one or more analyzed guess attributes of the chat room user (445). The analyzed guess attributes may include, for example, the age, gender, or educational level of the user. After the one or more analyzed guess attributes of the other chat room user have been determined, they are displayed to the local chat room user (455). According to other aspects of the invention, the identity of a chat room user may be verified, or the likelihood that a given text was written by a given person may be determined.
PCT/US1997/021781 1996-12-31 1997-12-01 Method and apparatus for analyzing online user typing to determine or verify facts WO1998029818A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP97951483A EP1016000A4 (fr) 1996-12-31 1997-12-01 Method and apparatus for analyzing online user typing to determine or verify facts
AU55114/98A AU5511498A (en) 1996-12-31 1997-12-01 Method and apparatus for analyzing online user typing to determine or verify facts

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US77580896A 1996-12-31 1996-12-31
US08/775,808 1996-12-31

Publications (1)

Publication Number Publication Date
WO1998029818A1 true WO1998029818A1 (fr) 1998-07-09

Family

ID=25105567

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/021781 WO1998029818A1 (fr) 1996-12-31 1997-12-01 Method and apparatus for analyzing online user typing to determine or verify facts

Country Status (3)

Country Link
EP (1) EP1016000A4 (fr)
AU (1) AU5511498A (fr)
WO (1) WO1998029818A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012004283A1 (fr) * 2010-07-06 2012-01-12 Telefonica, S.A. System for monitoring online interactions
US8918818B2 (en) 1998-08-26 2014-12-23 United Video Properties, Inc. Television chat system
US8998720B2 (en) 2010-03-31 2015-04-07 Rovi Technologies Corporation Media appliance
US9779084B2 (en) 2013-10-04 2017-10-03 Mattersight Corporation Online classroom analytics system and methods
US10346879B2 (en) * 2008-11-18 2019-07-09 Sizmek Technologies, Inc. Method and system for identifying web documents for advertisements

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4456973A (en) * 1982-04-30 1984-06-26 International Business Machines Corporation Automatic text grade level analyzer for a text processing system
US5557686A (en) * 1993-01-13 1996-09-17 University Of Alabama Method and apparatus for verification of a computer user's identification, based on keystroke characteristics
US5694163A (en) * 1995-09-28 1997-12-02 Intel Corporation Method and apparatus for viewing of on-line information service chat data incorporated in a broadcast television program
US5710884A (en) * 1995-03-29 1998-01-20 Intel Corporation System for automatically updating personal profile server with updates to additional user information gathered from monitoring user's electronic consuming habits generated on computer during use

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4773009A (en) * 1986-06-06 1988-09-20 Houghton Mifflin Company Method and apparatus for text analysis
US4930077A (en) * 1987-04-06 1990-05-29 Fan David P Information processing expert system for text analysis and predicting public opinion based information available to the public
JPH0281230A (ja) * 1988-09-19 1990-03-22 Hitachi Ltd 構文解析および言語処理システム
US5694595A (en) 1993-12-23 1997-12-02 International Business Machines, Corporation Remote user profile management administration in a computer network
US5717923A (en) 1994-11-03 1998-02-10 Intel Corporation Method and apparatus for dynamically customizing electronic information to individual end users
US5724521A (en) 1994-11-03 1998-03-03 Intel Corporation Method and apparatus for providing electronic advertisements to end users in a consumer best-fit pricing manner
US5758257A (en) 1994-11-29 1998-05-26 Herz; Frederick System and method for scheduling broadcast of and access to video programs and other data using customer profiles
US5748396A (en) 1995-11-13 1998-05-05 Seagate Technology, Inc. Arrangement and method for optimizing the recorded signal to noise ratio in contact recording systems
US5784563A (en) 1996-05-23 1998-07-21 Electronic Data Systems Corporation Method and system for automated reconfiguration of a client computer or user profile in a computer network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
COMMUNICATIONS OF THE ACM, May 1992, Vol. 35, No. 5, ALM et al., "Prediction and Conversational Momentum in an Augmentative Communication System", pages 46-57. *
See also references of EP1016000A4 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8918818B2 (en) 1998-08-26 2014-12-23 United Video Properties, Inc. Television chat system
US9521451B2 (en) 1998-08-26 2016-12-13 Rovi Guides, Inc. Television chat system
US10346879B2 (en) * 2008-11-18 2019-07-09 Sizmek Technologies, Inc. Method and system for identifying web documents for advertisements
US8998720B2 (en) 2010-03-31 2015-04-07 Rovi Technologies Corporation Media appliance
US10454862B2 (en) 2010-03-31 2019-10-22 Rovi Technologies Corporation Media appliance
WO2012004283A1 (fr) * 2010-07-06 2012-01-12 Telefonica, S.A. System for monitoring online interactions
US9779084B2 (en) 2013-10-04 2017-10-03 Mattersight Corporation Online classroom analytics system and methods
US10191901B2 (en) 2013-10-04 2019-01-29 Mattersight Corporation Enrollment pairing analytics system and methods

Also Published As

Publication number Publication date
AU5511498A (en) 1998-07-31
EP1016000A1 (fr) 2000-07-05
EP1016000A4 (fr) 2002-09-04

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GE GH HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT UA UG UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1997951483

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 1997951483

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1997951483

Country of ref document: EP