US20110103682A1 - Multi-modality classification for one-class classification in social networks - Google Patents
- Publication number: US20110103682A1 (application US 12/608,143)
- Authority: United States (US)
- Prior art keywords
- objects
- features
- actor
- social network
- actors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/52—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail for supporting social networking services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
Definitions
- the exemplary embodiment relates to object classification. It finds particular application in connection with multi-modality one-class classification of a large corpus of documents, based on extracted features, and in one particular case, where only a small corpus of labeled documents may be available.
- document classification plays an important role by preselecting what documents are to be reviewed by a person and in what order.
- Applications of document selection range from search engines to spam filtering.
- more specialized tasks can be approached with the same techniques, such as document review in large corporate litigation cases.
- Support Vector Machines as text classifiers are described, for example, in U.S. Pat. No. 7,386,527, entitled EFFECTIVE MULTI-CLASS SUPPORT VECTOR MACHINE CLASSIFICATION.
- a method of classification includes, for each of a plurality of modalities, extracting features from objects in a set of objects, the objects comprising electronic mail messages, and generating a representation of each object based on its extracted features, at least one of the plurality of modalities being a social-network modality in which social network features are extracted from a social network implicit in the electronic mail messages.
- the method further includes training a classifier system based on class labels of a subset of the set of objects and on the representations generated for each of the modalities. With the trained classifier system, labels are predicted for unlabeled objects in the set of objects. Any one or more of these steps may be implemented by a computer processor.
- a classification apparatus includes an input for receiving a set of objects, the objects comprising electronic mail messages, a subset of the objects having class labels.
- a first feature extractor extracts text-based features from objects in a set of objects.
- a second feature extractor extracts social network-based features from the objects in the set of objects.
- a classifier system executed by a computer processor, which predicts labels for unlabeled objects in the set of objects based on the extracted text-based and social network-based features.
- a classification method includes, for each of a plurality of modalities, extracting features from objects in a set of objects comprising electronic mail messages and generating a representation of each object based on its extracted features.
- a one-class classifier system is trained, based on class labels of a subset of the set of objects and on the representations generated for each of the modalities.
- the training includes, for each of the modalities and based on an initial set of objects positively labeled with respect to the class, generating an initial hypothesis which predicts negative labels for a subset of the unlabeled objects in the set and iteratively generating a new hypothesis in which a new boundary between representations of objects predicted as having negative labels and representations of objects predicted as having positive labels converges towards an original boundary between the representations of the initial positively labeled objects and the rest of the objects in the set.
- labels are predicted for unlabeled objects in the set of objects.
- FIG. 1 is a functional block diagram of an apparatus for training a multimodality classification system in accordance with one aspect of the exemplary embodiment
- FIG. 2 is a flow diagram illustrating a method for training and using a multimodality classification system in accordance with another aspect of the exemplary embodiment
- FIG. 3 illustrates a social network graph
- FIG. 4 is a performance curve illustrating convergence for a set of random data in two dimensions which does not have a large gap between positive and negative data points;
- FIG. 5 schematically illustrates mapping convergence in two dimensions with positive data as circles and unlabeled data as triangles;
- FIG. 6 is a plot of number of occurrences per word in the Enron corpus of e-mails.
- FIG. 7 is a plot of message size for the Enron corpus showing the number of messages for e-mail messages of a given word length
- FIG. 8 is a plot of the number of messages per sender for the Enron corpus
- FIG. 9 is a plot of the number of receivers per message for the Enron corpus.
- FIG. 10 is a plot of number of messages per week in the Enron corpus over a timeline of about 6 years with vertical lines indicating weeks for which a responsive e-mail was generated;
- FIG. 12 is a plot illustrating the effect of feature value type on performance of a text content classifier on the Enron corpus: (a) the bag-of-words feature set (bow), using all features, with tf-idf as the type of feature value; (b) a bag-of-clusters feature set (sem), using 6522 terms generated by semantic clustering, with tf-idf; and (c) the same bag-of-clusters feature set, with binary values (bin);
- FIG. 13 is a plot illustrating the effect of ⁇ on performance of a social networks content classifier on the Enron corpus
- FIG. 14 is a plot illustrating the performance of the “best” text-based classifier identified in the tests, using a bag-of-words feature set and tf-idf feature values, the convergence steps being shown by squares, starting at the top right of the graph;
- FIG. 16 is a plot illustrating the effect on performance of combining classifiers by linear combination on the Enron corpus.
- FIG. 17 is a plot illustrating the effect of combining classifiers by co-training with a Mapping Co-convergence framework on performance on the Enron corpus.
- aspects of the exemplary embodiment relate to a method and apparatus for improving classification performance in a one-class setting by combining classifiers of different modalities.
- multi-modality is used to refer to a combination of different levels of description to aid classification tasks.
- the method is particularly suited to classification of a large corpus of electronic mail messages, such as e-mails, text messages (SMS), and other electronic documents which include references to the sender and receiver.
- the problem of distinguishing responsive documents in a corpus of e-mails is used to demonstrate the method.
- the same principles are applicable to similar classification problems where collaborators working on a set of documents can instantiate social features (available as a result of collaborative edition, version control, and/or traceability in a document processing system).
- the method provides a way to turn the social network that is implicit in a large body of electronic communication documents into valuable features for classifying the exchanged documents.
- Working in a one-class setting, a semi-supervised approach based on the Mapping Convergence framework may be used.
- An alternative interpretation, that allows for broader applicability by dismissing the prerequisite that positive and negative items must be naturally separable, is disclosed.
- An extension to the one-class evaluation framework is proposed, which is found to be useful, even when very few positive training examples are available.
- the one-class setting is extended to a co-training principle that enables taking advantage of the availability of multiple redundant views of the data. This extension is evaluated on the Enron Corpus, for classifying responsiveness of documents.
- a way to turn the social network that is implicit in a large body of electronic communication into valuable features for classifying the exchanged documents is also disclosed. A combination of text-based features and features based on this second extra-textual modality has been shown to improve classification results.
- the multi-modality of e-mail is used for classification.
- E-mail not only includes text, but it also implicitly instantiates a social network of people communicating with each other.
- Document representations modeling these distinct levels are constructed, which are then combined for classification. This involves an integration of different aspects, such as the topic of an e-mail, the sender, and the receivers. Classifying with respect to responsiveness is used as an example (in this case, whether or not the message is of relevance to a particular litigation matter).
- An algorithm developed for use in the method is specifically aimed at classifying items based on a very small set of positive training examples and a large amount of unlabeled data.
- the method also finds application in situations where both positive and negative labeled samples are available.
- an apparatus for classification of electronic data objects is illustrated, in the form of a digital processing device, such as a computer 10 .
- the computer 10 includes a digital processor 12 , such as the computer's CPU, and associated memory, here illustrated as main memory 14 and data memory 16 .
- the digital processor 12 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like.
- the digital processor 12 in addition to controlling the operation of the computer 10 , executes instructions stored in memory 14 for performing the method outlined in FIG. 2 .
- the computer 10 may include one or more dedicated or general purpose computing devices, such as a server computer or a desktop or laptop computer with an associated display device and a user input device, such as a keyboard and/or cursor control device (not shown).
- the memories 14 , 16 may be separate or combined and may represent any type of computer readable medium such as random access memory (RAM), read only memory (ROM), magnetic disk or tape, optical disk, flash memory, or holographic memory. In one embodiment, the memory 14 , 16 comprises a combination of random access memory and read only memory.
- the term “software” as used herein is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software.
- the term “software” as used herein is intended to encompass such instructions stored in storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth.
- Such software may be organized in various ways, and may include software components organized as libraries, Internet-based programs stored on a remote server or so forth, source code, interpretive code, object code, directly executable code, and so forth. It is contemplated that the software may invoke system-level code or calls to other software residing on a server or other location to perform certain functions.
- the illustrated computer 10 includes an input interface 20 and an output interface 22 , which may be combined or separate.
- Interface 20 receives a dataset of electronic data objects 24 to be classified.
- a portion 26, typically only a small portion, of the objects in the dataset 24 has been labeled according to class.
- Interface 22 outputs predicted labels 28 for unlabeled data objects.
- Exemplary input and output interfaces include wired and wireless network interfaces, such as modems, or local interfaces, such as USB ports, disk drives, and the like.
- Components 12 , 14 , 16 , 20 , 22 of the computer are communicatively interconnected by a data/control bus 30 .
- the computer 10 is configured by suitable programming or hardwired firmware to embody a classifier training system 32 for training a classifier system 34 .
- the exemplary training includes assigning values to parameters of a one class classification algorithm for classifying unlabeled objects as responsive (in the class) or not.
- classifier training system 32 and classifier system 34 may be embodied in software instructions stored in memory 14 and executed by processor 12 .
- the classifier training system 32 operates on a dataset 24 comprising a suitable number of labeled training objects 26 .
- the labels represent a priori information about the classifications of the objects, such as manually applied class labels.
- the labels can, for example, be “+1” if the object is assigned to the class and “−1” otherwise.
- the labels can, for example, be values in the range [0,1] indicating likelihood of membership in the class.
- the classifier training system 32 may be embodied in software, hardware or a combination thereof.
- system 32 includes various software modules 40 , 42 , 44 , 46 , 48 , 50 , 52 , 54 executed by processor 12 , although it is to be appreciated that the modules may be combined and/or distributed over two or more computing devices.
- a data extraction component 40 extracts text content of the objects in the data set and in particular, the content of an electronic message body as well as the content of header fields.
- a lexicon 56 is generated by the extraction component from the extracted text content and may be stored in memory, such as main memory 14 or data memory 16 .
- a first features extractor 42 extracts text-based features from the extracted text content, in particular from the email body.
- a representation generator 44 generates a representation of each object 24 (e.g., a features vector) based on the extracted text-content features.
- a reference resolution component 46 resolves references in the objects to actors (people) in a social network of actors responsible for sending and receiving the electronic messages in the data set.
- a social network extraction component 48 generates a graph in which actors are represented by nodes and electronic mail traffic between actors is represented by edges, each edge labeled with a number of electronic mail messages in a given direction.
- a second features extractor 50 extracts features for each of the electronic mail messages 24 based on sets of features assigned to the sending and receiving actors of that message. The actors' features are based, at least in part, on the email traffic between actors in the network and are selected to reflect the actor's relative importance in the social network.
- the text-based representation and social network-based representation generated for each object 24 are input to a classifier learning component 54 which trains classifiers 58 , 60 of classification system 34 for the respective modalities.
- the classification system combines the output of the classifiers to identify a classification boundary, whereby objects predicted to be within the class can be labeled with the class label, e.g., “responsive,” and objects outside the class can optionally also be labeled accordingly (e.g., as non-responsive).
- Turning to FIG. 2, a computer-implemented method which may be performed with the apparatus of FIG. 1 is shown. The method, described in greater detail below, begins at S 100 .
- a dataset 24 of S objects is input and may be stored in computer memory 16 during processing.
- the dataset may be preprocessed to remove duplicate objects.
- representations of the objects are generated in n different modalities, based on features extracted from the input objects, where n≥2.
- two modalities are used: text-based and social network-based.
- the features may be based on textual content and the method may proceed as follows:
- textual content is extracted for each object 24 .
- the textual content may be extracted from the subject (title) field, the body of the e-mail (i.e., the message) and optionally also from any text attachments.
- a lexicon 56 is generated, based on the textual content of all the e-mails in the dataset.
- the lexicon 56 can be processed to reduce its dimensionality. For example, very frequently used words (such as “the” and “and”) and/or words below a threshold length can be excluded from the lexicon. Additionally, words can be grouped in clusters, based on semantic similarity, or by automatically applying co-occurrence rules to identify words used in similar contexts.
- a representation is generated for each object 24 , based on the text content.
- This can be in the form of a bag-of-words or bag-of-clusters representation (collectively referred to as bag-of-terms representations).
- text content is represented as an unordered collection of words/word clusters, disregarding grammar and even word order.
- the representation can thus be a histogram (which can be stored as an optionally normalized vector) in which, for each word (or, more generally, each term) in the lexicon, a value corresponding to the number of occurrences in the object is stored.
- the first modality (textual content) representation for each object can be stored in memory 16 .
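The bag-of-terms step above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name, the tiny example lexicon, and the choice of whitespace tokenization and L1 normalization are all assumptions made for the sketch.

```python
from collections import Counter

def bag_of_terms(text, lexicon, normalize=True):
    # Count tokens, then project the counts onto the fixed lexicon;
    # terms outside the lexicon are ignored (illustrative tokenization:
    # lower-casing and whitespace splitting).
    counts = Counter(text.lower().split())
    vec = [float(counts[term]) for term in lexicon]
    if normalize:
        # Optional normalization, so the histogram sums to 1.
        total = sum(vec)
        if total > 0:
            vec = [v / total for v in vec]
    return vec

lexicon = ["contract", "meeting", "energy"]
vec = bag_of_terms("Energy contract the energy meeting was moved", lexicon)
# "energy" occurs twice; "contract" and "meeting" once each
```

Each position of the resulting vector corresponds to one term of the (optionally filtered) lexicon, matching the histogram description above.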
- the generation of the second modality (social network) representations can proceed before, after, or contemporaneously with the generation of the first modality representations of the objects.
- the social network representations of the objects aim to capture the hidden social network of actors sending and receiving the objects (e-mails) 24 by graphing a social network in which nodes represent actors and links between actors represent the e-mail communications between the actors.
- reference information (information referring to actors) is extracted from the relevant fields of the e-mail, such as the “to”, “from”, “cc” and “bcc” fields.
- the signature within the e-mail body may also provide reference information concerning the sender.
- the reference information is resolved, to generate a set of actors. Since e-mail addresses are not uniform, the reference resolution step involves associating references to the same person to a common normalized form—a single actor.
- the set of actors generated at S 206 and the e-mail communications between them are graphed.
- An example social network graph is illustrated in FIG. 3 .
- the graph may be stored in memory as a data structure in any suitable form. Actors who send and/or receive fewer than a threshold number of e-mails can be eliminated from the network 70 .
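A minimal sketch of the graph construction and actor thresholding described above, assuming messages arrive as (sender, recipients) pairs after reference resolution; the data-structure choice (edge dictionary keyed by ordered actor pairs) and the activity-count definition are assumptions of this sketch.

```python
from collections import defaultdict

def build_social_graph(messages, min_messages=2):
    # Directed graph: edges[(a, b)] = number of e-mails a sent to b.
    # Actors involved in fewer than min_messages e-mails are dropped,
    # mirroring the thresholding described above.
    edges = defaultdict(int)
    activity = defaultdict(int)  # e-mails sent or received per actor
    for sender, recipients in messages:
        activity[sender] += 1
        for recipient in recipients:
            edges[(sender, recipient)] += 1
            activity[recipient] += 1
    actors = {a for a, n in activity.items() if n >= min_messages}
    edges = {e: n for e, n in edges.items()
             if e[0] in actors and e[1] in actors}
    return actors, edges

msgs = [("alice", ["bob", "carol"]), ("bob", ["alice"]), ("dave", ["alice"])]
actors, edges = build_social_graph(msgs, min_messages=2)
# carol and dave fall below the activity threshold and are removed
```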
- social network features are extracted from the graph and associated with the respective actors.
- the graph allows various features to be extracted which provide information about the actors, such as whether they belong to a cluster 76 of actors (each actor in a cluster has sent or received e-mails from every other member of the cluster), whether the actor is a hub 80 (sending and/or receiving e-mails from at least a threshold number of other actors), the number of e-mails sent by/to the actor, and the like. Twelve social network features are described below, by way of example. It is to be appreciated that the method is not limited to any particular set of social network features.
- At least two social network features are extracted, and in another embodiment, at least four or at least six different social network features are extracted for each actor.
- the extracted social network features can be normalized and/or weighted and combined to generate a features vector for the respective actor.
- social network features are extracted for the objects, based on the features extracted for the corresponding actors (senders and recipients). In this way, the social network features for the actors are propagated to the e-mails 24 between them.
- the result is a social network features representation (e.g., in the form of a vector) for each object.
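The propagation of actor features to messages can be sketched as below. The four per-actor features shown (messages sent, messages received, out-degree, in-degree) are a simplified stand-in for the twelve features the description mentions, and averaging the recipients' features is one assumed aggregation choice, not necessarily the patented one.

```python
def actor_features(actor, edges):
    # edges[(a, b)] = number of e-mails a sent to b.
    sent = sum(n for (a, b), n in edges.items() if a == actor)
    received = sum(n for (a, b), n in edges.items() if b == actor)
    out_deg = len({b for (a, b) in edges if a == actor})   # distinct contacts written to
    in_deg = len({a for (a, b) in edges if b == actor})    # distinct contacts heard from
    return [float(sent), float(received), float(out_deg), float(in_deg)]

def message_features(sender, recipients, edges):
    # Propagate actor features to the e-mail: sender features concatenated
    # with the element-wise average of the recipients' features.
    recip = [actor_features(r, edges) for r in recipients]
    avg = [sum(col) / len(recip) for col in zip(*recip)]
    return actor_features(sender, edges) + avg

edges = {("alice", "bob"): 3, ("bob", "alice"): 1, ("alice", "carol"): 2}
feats = message_features("alice", ["bob", "carol"], edges)
```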
- the labeled objects 26 in the data set are identified.
- only positive (relevant objects) may be available.
- a negative set of objects may be generated (S 216 ) by identifying unlabeled objects with feature vectors that are dissimilar from the set of feature vectors belonging to the labeled objects 26 . These can be the objects which are used to train the classifier system (S 218 ).
- the sets of objects to be used as positive and negative samples are expanded through an iterative process.
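The initial "strong negative" selection can be sketched as below. This shows only the initialization step: unlabeled vectors least similar to every positive example form the first negative set. The full Mapping Convergence procedure would then retrain a classifier on this set and iterate; the cosine-similarity criterion and the threshold value here are assumptions of the sketch.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def strong_negatives(positives, unlabeled, threshold=0.1):
    # Keep unlabeled vectors whose similarity to *every* positive
    # example is below the threshold: the initial negative hypothesis.
    negatives = []
    for vec in unlabeled:
        if max(cosine(vec, p) for p in positives) < threshold:
            negatives.append(vec)
    return negatives

pos = [[1.0, 0.0], [0.9, 0.1]]
unl = [[0.0, 1.0], [1.0, 0.05], [0.05, 1.0]]
neg = strong_negatives(pos, unl, threshold=0.3)
# the two vectors nearly orthogonal to the positives are selected
```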
- the two features vectors can be combined to generate a single D-dimensional vector for each object which is input to a single classifier.
- two classifiers 58 , 60 are trained, one using the first modality object representations and the other using the second modality object representations.
- the output of the two probabilistic classifiers 58 , 60 is combined. Two methods of combining the classifier outputs are proposed, which are referred to herein respectively as naïve combination and co-training combination.
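A naïve combination of the two per-object class probabilities can be sketched as a weighted linear blend; the weight parameter and the 0.5 decision cutoff here are illustrative assumptions, not values given by the description.

```python
def combine_scores(p_text, p_social, weight=0.5):
    # Linear blend of the two modality scores; `weight` is the
    # share given to the text-based classifier (assumed parameter).
    return weight * p_text + (1.0 - weight) * p_social

def predict(p_text, p_social, weight=0.5, cutoff=0.5):
    # +1 = in the class (e.g., responsive), -1 = outside the class.
    return 1 if combine_scores(p_text, p_social, weight) >= cutoff else -1
```

Co-training combination, by contrast, lets each modality's classifier label examples for the other during training rather than blending final scores.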
- the trained classifier system 34 is used to predict the labels 28 for unlabeled objects, based on their vector representations.
- the corresponding objects for the set labeled positive can then be subjected to a manual review.
- the method ends at S 222 .
- the number n of object modalities used to generate representations may be more than 2.
- Other representations of the documents 24 are also contemplated.
- the exemplary method may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, graphics processing unit (GPU), or PAL, or the like.
- any device capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in FIG. 2 , can be used to implement the method for training and/or using the trained classifier described herein.
- the method illustrated in FIG. 2 may be implemented in a computer program product or products that may be executed on a computer.
- the computer program product may be a tangible computer-readable recording medium on which a control program is recorded, such as a disk, hard drive, or the like configured for performing the method.
- Common forms of computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, or any other tangible medium from which a computer can read and use.
- the computer program product may be integral with the computer 10 , (for example, an internal hard drive or RAM), or may be separate (for example, an external hard drive operatively connected with the computer 10 ), or may be separate and accessed via a digital data network such as a local area network (LAN) or the Internet (for example, as a redundant array of inexpensive or independent disks (RAID) or other network server storage that is indirectly accessed by the computer 10 , via a digital network).
- the method may be implemented in a transmittable carrier wave in which the control program is embodied as a data signal using transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like.
- Identical messages may be identified based, for example, on the content of the subject (title) field and a digest of the body.
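Duplicate detection along these lines can be sketched with a digest of the subject field and body; the choice of SHA-256 and the exact normalization (stripping and lower-casing the subject) are assumptions of this sketch.

```python
import hashlib

def message_digest(subject, body):
    # Fingerprint a message by its subject field plus a digest of its
    # body; messages with equal fingerprints are treated as duplicates.
    h = hashlib.sha256()
    h.update(subject.strip().lower().encode("utf-8"))
    h.update(b"\x00")  # separator so subject/body boundaries are unambiguous
    h.update(body.strip().encode("utf-8"))
    return h.hexdigest()

def deduplicate(messages):
    seen, unique = set(), []
    for subject, body in messages:
        d = message_digest(subject, body)
        if d not in seen:
            seen.add(d)
            unique.append((subject, body))
    return unique
```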
- the exemplary objects 24 are e-mails in electronic form; accordingly, extraction of the text content of each of the fields is readily achieved using associated metadata.
- OCR processing may be applied to the scanned images.
- a lexicon 56 is derived from the total collection of messages.
- a word list and word frequency table are constructed from the bodies of the messages 24 .
- a number of criteria may be applied to filter some of the terms from the resulting lexicon. For example, strings of length smaller than a threshold, such as less than three characters, are excluded from the lexicon.
- a minimum threshold of occurrence may be established (e.g., a minimum of 4 occurrences in the entire corpus). This is premised on the expectation that the words that occur more frequently in documents are likely to carry more information. Words which do not meet the threshold are excluded from the lexicon.
- a Porter stemmer can be applied to reduce all words to a normalized form, and all words may be lower-cased.
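The lexicon filtering steps above (minimum string length, minimum corpus frequency, lower-casing) can be sketched as follows; stemming is omitted here and noted in a comment, and the function name and default thresholds are assumptions, though the defaults mirror the examples in the text.

```python
from collections import Counter

def build_lexicon(documents, min_length=3, min_count=4):
    # Lower-case tokens, drop strings shorter than min_length, then
    # drop terms occurring fewer than min_count times in the corpus.
    # (A stemmer, e.g. Porter, would be applied before counting.)
    counts = Counter()
    for doc in documents:
        counts.update(w.lower() for w in doc.split() if len(w) >= min_length)
    return sorted(w for w, n in counts.items() if n >= min_count)

docs = ["energy energy", "energy energy rare", "an an an an"]
lexicon = build_lexicon(docs)
# "an" fails the length filter, "rare" the frequency filter
```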
- Another way of reducing the dimensionality of the lexicon 56 , and the resulting vector representations, is by clustering features semantically.
- a soft clustering approach may be used, as described, for example, in Julien Ah-Pine and Nicolas Jacquet, Clique-based clustering for improving named entity recognition systems, in EACL, pages 51-59 (2009).
- Maximal cliques in a co-occurrence graph are clustered to obtain a score indicating the probability that a word belongs to each cluster. Words which meet a threshold probability are assigned to that cluster.
- Ambiguous words, which have a threshold probability of belonging to two or more clusters can be assigned to more than one cluster and thus can contribute to multiple features in the features vectors.
- Each feature represents a certain semantic field, but words grouped together in a cluster need not all have the same meaning.
- Each cluster has an identifier, which can be a numerical identifier of a word from the cluster.
- the result of the filtering is a lexicon 56 having a dimensionality corresponding to the number of words or clusters (which will both be referred to herein as “terms”) it contains.
- One level of representation particularly relevant to the distinction responsive/non-responsive is the textual content of the objects. For example, a bag-of-terms representation is generated for each e-mail from the contents of the e-mail's body and subject fields. This allows a vector to be constructed where each feature represents a term of the optionally filtered lexicon 56 and the feature values express a weight which may be based on frequency of occurrence in the object or other occurrence-based value.
- the document frequency df of a term w is the number of documents (objects) in which it occurs.
- the term frequency tf is the number of times the term occurs in a document d.
- the inverse document frequency idf of a term w is computed from S, the number of documents, as idf(w) = log(S/df(w)).
- the values in the textual content vector used to represent an e-mail can include any one or an optionally weighted combination of these parameters.
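Using the definitions of tf, df and S above, a bag-of-terms vector with the standard tf·idf weighting (one occurrence-based choice among those the text leaves open) can be computed as follows; the example documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(documents, lexicon):
    """Weight each lexicon term by tf * idf, where tf is the number of
    occurrences of the term in a document, df the number of documents
    containing it, and idf = log(S / df) with S the number of documents.
    Returns one sparse vector (term -> weight) per document."""
    S = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc) & lexicon)
    vectors = []
    for doc in documents:
        tf = Counter(w for w in doc if w in lexicon)
        vectors.append({w: tf[w] * math.log(S / df[w]) for w in tf})
    return vectors

docs = [["meeting", "agenda", "risk"],
        ["risk", "risk", "report"],
        ["weather", "report"]]
vecs = tfidf_vectors(docs, {"risk", "report", "agenda", "meeting", "weather"})
# "risk" occurs in 2 of 3 documents, so its idf is log(3/2)
```

Terms occurring in every document get weight log(1) = 0, reflecting that they carry little discriminative information.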
- classifier learning methods may be used for text as well as social network features, such as support vector machines (SVM), naïve Bayes, and neural networks.
- the feature set based on a bag-of-terms representation has some properties that make it particularly suited for SVM classification with a linear kernel (see, for example, Thorsten Joachims, A Statistical Learning Model of Text Classification for Support Vector Machines, in Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 128-136, ACM Press, 2001).
- the feature set is very high-dimensional, even if dimensionality is reduced by the techniques described above.
- the vectors are very sparse, because only a small number of the terms actually occur in each respective document.
- the terms that do occur in documents may have a considerable overlap, even if the documents belong to the same category.
- there is a lot of redundancy in the feature vector: a document typically has a large number of cues that signal a particular classification. Given the sparsity of the feature vectors, a linear kernel is appropriate for one-class based SVM algorithms, although other methods such as Gaussian kernels are also contemplated.
- Another representation of the object is developed by deriving an implicit social network based on the assumption that the e-mail communication between senders and recipients can implicitly provide information on the roles of senders and recipients of the message in the network of people exchanging information.
- the structure of a large corpus of e-mail, such as that produced in response to discovery requests or otherwise uncovered during the course of litigation, is clearly not homogeneous.
- When the e-mails are recovered, e.g., using forensic tools, from the computers of a group of key people in an organization which is a party to a civil lawsuit or criminal prosecution, they reflect the normal interactions between people in that organization, with some communications with people outside the organization.
- the lines of communication thus implicitly instantiate a network of actors.
- the exemplary method includes developing such a social network and assigning actors with features based on the communications with others, and propagating these features to the e-mails for which the actors are senders or recipients.
- the first step is to identify references to senders/recipients (S 202 ) and then resolve these references to generate a set of actors (S 204 ).
- References to people in the electronic mail messages may be extracted from the email header, often in predefined fields tagged with metadata.
- firstname, lastname and, when available, e-mail address are extracted from all the references. Then, references that occur in the headers of the same message are reassembled.
- the premise is that often both a name and an e-mail address occur in the header, and the knowledge that a person ‘Mark Jones’ has the e-mail address ‘mj@abccorp.com’ allows Mark Jones to be matched to that e-mail address with a certain degree of confidence.
- An “actor” is a collection of references that has been identified as pointing to the same person.
- the e-mail address is used as a primary cue. It can be assumed that if two references share an e-mail address, they likely refer to the same actor.
- a search is made for possible matches in the set of actors with the same last name, based on first name, prefix, middle name and/or nicknames (e.g., using a list of common English nicknames).
- If a group of similar references can refer to at most one actor in the network, it can be assumed that all as-yet unidentified references in the group refer to the same actor.
- all different formats of Richard Smith's name are resolved as referring to the same actor.
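The reference-resolution step (S 204) can be sketched as below. This is a simplified illustration assuming exact-name matching; the prefix, middle-name and nickname heuristics described above are omitted, and the data is hypothetical.

```python
def resolve_actors(references):
    """Group (name, email) references into actors.

    The e-mail address is the primary cue: references sharing an
    address are merged into one actor. References lacking an address
    are matched to an actor when the name maps unambiguously to a
    single address group; otherwise they become singleton actors.
    """
    actors = {}    # actor key -> set of references
    by_name = {}   # name -> email when unambiguous, else None
    pending = []
    for name, email in references:
        if email:
            actors.setdefault(email, set()).add((name, email))
            if name:
                ok = name not in by_name or by_name[name] == email
                by_name[name] = email if ok else None
        else:
            pending.append(name)
    for name in pending:
        email = by_name.get(name)
        if email:  # unambiguous match to an existing actor
            actors[email].add((name, None))
        else:      # unresolved reference: keep as its own actor
            actors.setdefault(name, set()).add((name, None))
    return actors

refs = [("Mark Jones", "mj@abccorp.com"),
        ("M. Jones", "mj@abccorp.com"),
        ("Mark Jones", None)]
actors = resolve_actors(refs)
# all three references resolve to the single actor keyed by the address
```

The result mirrors the text: an "actor" is the collection of references identified as pointing to the same person.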
- a social network graph is generated by identifying connections (e-mails in either direction) between actors.
- FIG. 3 shows a simplified exemplary social network graph 70 in which nodes 72 (here labeled with the letters of the alphabet) each represent an actor and edges 74 (shown as one directional arrows) each represent a sender to recipient connection, labeled with the number of e-mails sent in that direction.
- nodes 72 here labeled with the letters of the alphabet
- edges 74 shown as one directional arrows
- a threshold may be set on the number of e-mails on a connection for that connection to be retained in the social network graph 70 . For example a threshold of two e-mails (e.g., one in each direction or two in one direction) can be used to make sure that the majority of the traffic is taken into account while discarding any accidental links with no or little meaning.
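The graph-construction step with the traffic threshold can be sketched as follows; a connection is kept only if the total number of e-mails in both directions meets the threshold, as described above. The messages shown are hypothetical.

```python
from collections import Counter

def build_graph(messages, min_emails=2):
    """Build a directed social network graph from (sender, recipients)
    pairs. Edge weights count e-mails sent in that direction; a
    connection between two actors is retained only if the combined
    traffic in both directions meets min_emails, discarding
    accidental links with little or no meaning."""
    weights = Counter()
    for sender, recipients in messages:
        for r in recipients:
            weights[(sender, r)] += 1
    edges = {}
    for (a, b), w in weights.items():
        if w + weights.get((b, a), 0) >= min_emails:
            edges[(a, b)] = w
    return edges

msgs = [("A", ["B", "C"]), ("B", ["A"]), ("A", ["B"]), ("D", ["A"])]
graph = build_graph(msgs, min_emails=2)
# the A-B connection (3 e-mails total) is kept in both directions;
# A->C and D->A carry a single e-mail each and are discarded
```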
- a set of features is associated with each of the actors (nodes) 72 .
- a feature set is selected with the aim of representing the position of correspondents in the corporate network.
- Certain properties of nodes in a communication graph can serve very well for automatically detecting the social role of actors in the network 70 (see, for example, Ryan Rowe, German Creamer, Shlomo Hershkop, and Salvatore J. Stolfo, Automated social hierarchy detection through email network analysis, in WebKDD/SNA-KDD '07: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 109-117, New York, N.Y., USA (2007), published by ACM).
- a group of social network features is selected which aim to represent, singly or in combination, key properties of actors, such as whether the actor belongs to one or more intra-communicating cliques, such as clique 76 in FIG. 3 , whether the actor serves as a hub (see, for example, hub 80 , where actor P has several outgoing edges), an authority (receiving incoming emails from many nodes), or whether the actor is central, with many paths between other actors passing through the actor's node.
- the features can be based, directly or indirectly, on e-mail traffic/connections within the social network 70 .
- at least some or all of the social network features assigned to each of the actors may be selected from and/or based on the following twelve features:
- An Activity score: the number of e-mails sent by the actor. This feature represents the activity of the actor in the network.
- a Hub score based on the number of outgoing connection lines from the actor's node.
- An Authority score based on the number of incoming connection lines to the actor's node.
- Features 2 and 3 are representative of the authority that is assigned to a node by its peers.
- nodes with a high number of incoming edges from hubs are considered to be authorities.
- Nodes that are linked to a large number of authorities are considered to be hubs.
- a range of different centrality measures has been proposed to model the position of the node in the network. These are computed on an undirected, unweighted version of the communication graph.
- the length of a path is the number of edges between two nodes.
- the shortest path between a pair of nodes in the graph is the path with the fewest number of intervening nodes.
- a Matlab library for working with large graphs may be used for the computation (see David Gleich, MatlabBGL: a Matlab graph library, at www.stanford.edu/~dgleich/programs/matlab_bgl/ (2008)).
- the distance d_st from a node s to another node t (expressed as the number of edges connecting s and t) and the number σ_st of paths from s to t are computed.
- the number of paths from s to t via node v (the actor's node) is denoted σ_st(v)
- d_vs represents the minimum distance between the actor's node v and another node s
- V represents the set of all nodes
- n is the total number of nodes.
- the mean centrality of the actor is thus the total number of nodes divided by the sum of the distances between the actor's node v and each other node in the graph, i.e., it is inversely related to the average distance to any other node.
- the degree centrality is thus the number of edges incident to the actor's node, i.e., the number of actors it is directly connected to.
- σ_st(v) represents the number of paths from s to t which pass through v (under the constraint that v can be neither s nor t).
- max_t d(v,t) represents the maximum distance, in edges, between the actor's node and any other node.
- C_s(v) = Σ_{s≠v≠t} σ_st(v). This is simply the sum of the number of paths from s to t which pass through v, over all values of s and t (under the constraint that v can be neither s nor t), i.e., each such path comprises at least two edges, two of which are connected with v.
- deg(v) is the degree centrality of node v obtained from feature number 5 above.
- the clustering coefficient feature identifies how close v's neighbors are to being a clique. It is given by the number of links that actually exist between the nodes in v's neighborhood, divided by the maximum number of links that could exist between them.
- For example, in FIG. 3, node E has three neighbors, so at most 3 links could exist between them; the actual number is 1 (only F and J are linked). Dividing 1 by the maximum link number 3, the clustering coefficient of E is equal to 0.3333.
- cliques are identified. These are groups of nodes in which each of the nodes is connected by at least one edge to every other node in the clique, as illustrated by clique 76 in FIG. 3.
- the cliques in the social network graph can be identified automatically, e.g., with a Matlab implementation (see, for example, Coen Bron and Joep Kerbosch, Algorithm 457: finding all cliques of an undirected graph, in Communications of the ACM, 16(9):575-577 (1973)).
- the minimum size of a clique may be specified, such as at least 3 nodes. Additionally, where one clique is fully included in a larger clique, only the maximal sized clique is used.
- a maximal complete subgraph (clique) is a complete subgraph that is not contained in any other complete subgraph (clique).
- a raw clique score: for each clique of size α that the actor is in, a score of 2^(α−1) is computed. The scores of all the cliques that the actor is in are then summed.
- a weighted clique score: for each clique of size α, with β the sum of activities (e.g., from feature No. 1 above, or a sum of e-mails sent and received) of its members, the actor is assigned a weighted clique score β·2^(α−1). The scores of all the cliques the actor is in are then summed.
- Each of the feature scores can be scaled to a value in [0,1], where 1 indicates a higher importance.
- an actor may have a 12-value feature vector, such as [0.2, 0.1, 0.3, 0.2, 0.4, 0.1, 0.1, 0.5, 0.3, 0.2, 0.2, 0.1].
- Each node can also be assigned an overall social score which can be a linear combination (sum) of these features, where all features have equal weight, i.e., 2.7 in the above example. In other embodiments, the features may be assigned different weights.
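A few of the twelve per-actor features (activity, degree centrality, clustering coefficient), the [0,1] scaling, and the overall social score can be sketched as below. This is an illustrative subset only: hub/authority scores, the path-based centralities, and the clique scores described above are omitted, and the edge data is hypothetical.

```python
from itertools import combinations

def undirected(edges):
    """Undirected adjacency from directed weighted edges."""
    und = {}
    for (a, b) in edges:
        und.setdefault(a, set()).add(b)
        und.setdefault(b, set()).add(a)
    return und

def node_features(edges):
    """Per-actor features: activity (e-mails sent), degree centrality,
    and clustering coefficient, scaled to [0, 1]; the social score is
    their unweighted sum (a linear combination with equal weights)."""
    sent = {}
    for (a, b), w in edges.items():
        sent[a] = sent.get(a, 0) + w
    und = undirected(edges)
    feats = {}
    for v in und:
        nbrs = und[v]
        possible = len(nbrs) * (len(nbrs) - 1) / 2
        actual = sum(1 for x, y in combinations(nbrs, 2) if y in und[x])
        feats[v] = {"activity": sent.get(v, 0),
                    "degree": len(nbrs),
                    "clustering": actual / possible if possible else 0.0}
    # scale activity and degree to [0, 1] by the maximum over actors
    for key in ("activity", "degree"):
        m = max(f[key] for f in feats.values()) or 1
        for f in feats.values():
            f[key] /= m
    for f in feats.values():
        f["social_score"] = sum(f.values())
    return feats

# toy graph: E has neighbors F, J, K, and only F and J are linked
edges = {("E", "F"): 3, ("E", "J"): 1, ("E", "K"): 2, ("F", "J"): 1}
feats = node_features(edges)
# E's clustering coefficient is 1/3, matching the worked example above
```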
- step S 210 thus associates each of the nodes (actors) in the social network with a set of features.
- the next step (S 212 ) propagates the actors' features to the e-mails that have been sent and received by these actors.
- features from senders and recipients can be combined. For example, a set of 37 features is constructed to represent each e-mail.
- An e-mail is represented by three sets of 12 features (features 1-12 described above): the first set is the feature values of the sender node, the second set is the average of the feature values of all recipients of the e-mail, and the third is the feature values of the most prominent recipient.
- the most prominent recipient is the recipient of the e-mail having the highest social score (obviously, if there is only one recipient, the second and third feature sets have identical values).
- the last feature is the number of receivers of that particular email.
- this set of 37 features represents a quantification of the sender and recipient characteristics of each e-mail and provides valuable information in classifying the e-mail as responsive or not.
- different sets of features of the sender and recipient(s) may be used.
- the 37 feature values are used as the social network representation of the e-mail (S 112 ).
- the features may be assigned different weights in generating the social network representation.
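The propagation step (S 212) that builds the 37-value e-mail representation from the actor features can be sketched as follows; the actor feature values are hypothetical.

```python
def email_features(sender, recipients, actor_feats):
    """Build the 37-value social network representation of an e-mail:
    the sender's 12 features, the element-wise average of all
    recipients' features, the features of the most prominent
    recipient (the one with the highest social score), and finally
    the number of recipients.

    actor_feats maps actor -> (12-value feature list, social_score).
    """
    s_vec, _ = actor_feats[sender]
    r_vecs = [actor_feats[r][0] for r in recipients]
    avg = [sum(col) / len(r_vecs) for col in zip(*r_vecs)]
    prominent = max(recipients, key=lambda r: actor_feats[r][1])
    return s_vec + avg + actor_feats[prominent][0] + [len(recipients)]

actor_feats = {
    "A": ([0.2] * 12, 2.4),
    "B": ([0.4] * 12, 4.8),
    "C": ([0.1] * 12, 1.2),
}
vec = email_features("A", ["B", "C"], actor_feats)
# 12 sender values + 12 averaged recipient values + 12 values of the
# most prominent recipient (B) + the recipient count = 37 values
```

With a single recipient, the averaged set and the most-prominent set coincide, as noted in the text.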
- Support Vector Machines can be used for both the text-based and social network-based views of the objects. Traditionally, SVMs depend on multi-class training data. In the following, one method for generating a set of negative samples is described. It will be appreciated that if negative samples are available, this step is not necessary.
- An exemplary SVM algorithm (Algorithm 1) for generating positive and negative samples is shown below which is suited to separate training of classifiers for the first and second modalities.
- Another algorithm (Algorithm 2) is then discussed as a method of co-training the two classifiers.
- the SVM-based algorithms are capable of constructing hypotheses based on positive training examples only.
- the algorithms employ a semi-supervised framework, referred to as Mapping Convergence, that not only looks at the positive examples, but is also supported by the large amount of unlabeled data that is available.
- the first step is to obtain a small set of artificial negative examples from the set of unlabeled objects that have a high probability of being negative samples because their feature vectors (text and/or social network vectors) are highly dissimilar from the feature vectors of the positively labeled objects.
- the principles of mapping convergence are described in further detail in Hwanjo Yu, Single - class classification with Mapping Convergence, Machine Learning, 61(1-3):49-69, (2005).
- a process of convergence can then proceed towards an optimal hypothesis.
- Given a D-dimensional vector x, the classifier outputs a prediction ŷ of the form ŷ = a^T x + b, where:
- a is a D-dimensional vector of parameters
- b is an offset parameter
- T denotes the transposition operator
- An objective of one-class SVMs is to create a hyperplane in feature space that separates the projections of the data from the origin with a large margin.
- the data is in fact separable from the origin if there exists a normal vector w (perpendicular to the hyperplane) such that the kernel K(w, x_i) > 0, ∀i, where x_i is an object representation, i.e., a point in space (a D-dimensional vector).
- a Gaussian kernel may be more appropriate than a polynomial kernel.
- the Gaussian kernel is also known as the Radial Basis Function (RBF) kernel: K(x_i, x_j) = exp(−γ‖x_i − x_j‖²), where:
- x_i and x_j are data points (representations of two objects).
- the γ parameter controls the smoothness of the decision boundary. Where there is no sharp boundary between positives and negatives, a value of γ between about 0.1 and 1.0 may be selected.
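The Gaussian (RBF) kernel value for a pair of object representations can be computed directly; a small sketch:

```python
import math

def rbf_kernel(x_i, x_j, gamma=0.1):
    """Gaussian (RBF) kernel: K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).
    Larger gamma gives a less smooth (more flexible) decision boundary."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)

# identical points always have kernel value 1; the value decays with distance
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
k_far = rbf_kernel([1.0, 2.0], [4.0, 6.0], gamma=0.1)
```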
- a maximally separating hyperplane for a data set {(x_1, y_1), …, (x_l, y_l)} is parameterized by (w, 0) with a given margin, and the supporting hyperplane for {y_1 x_1, …, y_l x_l} is parameterized by (w, ρ).
- margin errors in the binary setting correspond to outliers in the one-class case.
- This first approximation of the negative distribution serves as input for a converging stage to move the boundary towards the positive training examples.
- An SVM is trained on the positive training set and the constructed negatives. The resulting hypothesis is used to classify the remaining unlabeled items. Any unlabeled items that are classified as negative are added to the negative set. The boundary which most closely fits around the remaining samples thus converges towards the boundary around the known positive samples.
- the converging stage is iterated until convergence is reached when no new negative items are discovered and the boundary divides the positive and negative data points.
- P represents the data set for the positively labeled objects.
- U represents the data set of unlabeled objects, which at the beginning accounts for the rest of the objects in the entire dataset S.
- N is an empty set.
- a one-class support vector machine classifier, denoted C 1, provides the first hypothesis. Thereafter, for subsequent iterations, an SVM classifier C 2 which uses positive and negative data points takes over.
- C 1 is trained on the set of positives P to identify a small set of the strongest negatives N̂_0 (e.g., less than 10% of U) from among the unlabeled dataset U. The rest of the unlabeled dataset, P̂_0, is considered positive.
- the second classifier is trained on the respective current P and N sets to produce a hypothesis in which the most negative data points in the remaining positive set are labeled negative.
- each new hypothesis h i+1 maximizes the margin between h i and b p (the boundary for the known positive samples).
- If the new boundary is not surrounded by any data, it retracts to the nearest point where the data resides.
- the iteration process may be stopped prior to its natural completion to avoid the case where the boundary returns to b p , which can happen if the gap between positive and negative data points is relatively small.
- a solution to the problem of over-iteration is finding the hypothesis that maximizes some performance measure. For example, to determine the optimal point to end the Mapping Convergence process, a graph may be plotted with the percentage of the entire data set returned on the horizontal axis, and on the vertical axis, the approximate percentage of the actual positives that is found within that space (which may be estimated by identifying the number of labeled e-mails in all or a sample of the objects returned). An iteration number is selected when the approximate percentage of the positives begins to drop off dramatically with each further iteration.
- This approach involves identifying the point in the curve that excludes most of the data as negative, while keeping a large part of the positive data to be classified correctly. This may be after about 1-20 iterations, e.g., up to 10. The number of iterations may be different for the two types of classifier.
- the linear classifier used for text representations can reach an optimal point more quickly (fewer iterations) than the Gaussian classifier used for social networks representations.
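The overall Mapping Convergence control flow (initial strong negatives from a one-class stage, then iterative expansion of the negative set until no new negatives are found) can be sketched as below. For self-containment, simple distance-to-centroid scores stand in for the one-class SVM C 1 and the two-class SVM C 2; the flow, not the classifier, is the point of the sketch, and the data is synthetic.

```python
def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mapping_convergence(P, U, frac=0.1, max_iter=10):
    """Sketch of the Mapping Convergence loop described above.

    Stage 1 (stand-in for C 1): label the fraction of unlabeled
    points farthest from the positives as strong negatives.
    Stage 2 (stand-in for C 2): iteratively relabel remaining
    "positives" as negative until no new negatives are discovered.
    """
    cP = centroid(P)
    ranked = sorted(U, key=lambda x: dist(x, cP), reverse=True)
    n_strong = max(1, int(frac * len(U)))
    N = ranked[:n_strong]          # strongest negatives
    P_hat = ranked[n_strong:]      # rest assumed positive for now
    for _ in range(max_iter):
        cN = centroid(N)
        new_neg = [x for x in P_hat if dist(x, cN) < dist(x, cP)]
        if not new_neg:            # convergence: no new negatives
            break
        N += new_neg
        P_hat = [x for x in P_hat if x not in new_neg]
    return P_hat, N

P = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)]
U = [(0.1, 0.1), (0.2, 0.2), (5.0, 5.0), (5.1, 4.9), (4.9, 5.1)]
pos, neg = mapping_convergence(P, U)
# the far cluster converges into the negative set; the points near
# the labeled positives remain in the positive set
```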
- the initial negative data set N ← ∅ and the initial step of generating a small set of negative samples can be omitted.
- MC starts out with a conservative hypothesis encompassing all the positive samples and most of the unlabeled data, and converges to a solution that takes into account the distribution of the unlabeled data.
- artificial negative items are created by labeling the ones most dissimilar to the positive training examples. This first approximation of the negative distribution serves as input for the converging stage to move the boundary towards the positive training examples.
- An SVM is trained on the positive training set and the constructed negatives. The resulting hypothesis is used to classify the remaining unlabeled items. Any unlabeled items that are classified as negative are added to the negative set.
- the converging stage is iterated until convergence, which is reached when no new negative items are discovered and the boundary comes to a halt. However, where there is no clear boundary between actual positives and negatives, over-convergence can result, with boundaries being placed between clusters in the unlabeled data.
- a performance curve generated for the random data is illustrated in FIG. 4 .
- FIG. 5 schematically illustrates this in two dimensions with a set of labeled positive data shown by circles and unlabeled data by triangles.
- a tight boundary 90 around the known positives indicates the OC-SVM.
- the iterations start with a large percentage of the dataset returned, as indicated by boundary 92 in FIG. 5 and the first square in FIG. 4 at the top right hand corner. Naturally, this large set contains most if not all of the true positives. It also contains a large percentage of what would be negatives, and is thus not very useful. As the iterations proceed, the number of data points returned as "positive" decreases, and some of the actual positives may be lost (outside boundary 94).
- the point on the curve that is closest to (0,100) may be considered to be optimal in terms of performance criteria, i.e., providing the best classifier with the parameters chosen (boundary 96). If the convergence is stopped on the fifth iteration, giving the classifier closest to the upper left corner of the plot, a fairly accurate description of the data may be obtained. If the iterations continue, over-fitting may occur, as illustrated by boundaries 98 and 100.
- the distance measure can be weighted to assign more importance to recall or precision.
- the Euclidean distance d to (0,100) on the performance graph is used to identify the closest point at which to stop convergence.
- the iteration number is selected to ensure that at least a threshold percentage, e.g., at least 90% or at least 95%, of the labeled positive data points are returned in the “positive” set.
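The stopping criterion described above (the iteration whose performance-curve point is closest to (0,100), optionally subject to a minimum recall of the labeled positives) can be sketched as follows; the curve values are illustrative.

```python
def best_iteration(curve, min_recall=None):
    """Pick the Mapping Convergence iteration whose point on the
    performance curve (percent of data returned, percent of labeled
    positives recovered) lies closest, in Euclidean distance, to the
    ideal corner (0, 100). Optionally require a minimum recall of the
    labeled positives."""
    candidates = [(i, (x, y)) for i, (x, y) in enumerate(curve)
                  if min_recall is None or y >= min_recall]
    return min(candidates,
               key=lambda p: (p[1][0] ** 2 + (100 - p[1][1]) ** 2) ** 0.5)[0]

# one (data-returned %, positives-recovered %) point per iteration
curve = [(100, 100), (40, 98), (17, 96), (9, 92), (5, 60)]
stop = best_iteration(curve)                 # iteration 3: point (9, 92)
safe = best_iteration(curve, min_recall=95)  # iteration 2: point (17, 96)
```

The distance could also be weighted to favor recall over precision, as the text notes.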
- a cross-validation step may be incorporated into the algorithm.
- Each step in Mapping Convergence process may be carried out with a split of the data into several folds, such as from four to ten folds (parts).
- Using a higher number of folds in the cross-validation reduces irregularities, but comes with a higher computational cost.
- a 5-fold split over the positive data may be used.
- a hypothesis is trained on 4 of the parts, and a prediction is made on the remaining fifth part and the entire unlabeled set. This results in exactly one prediction per item in the positive set, and after aggregating the five predictions for the items in the unlabeled set, a single prediction is generated there too. This allows a solid estimate of the performance of the hypothesis to be obtained.
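The fold mechanics just described (one prediction per positive item, aggregated predictions on the unlabeled set) can be sketched generically; the scoring function here is a hypothetical stand-in for training a hypothesis and predicting.

```python
def cross_validated_predictions(P, U, train_and_predict, folds=5):
    """Split the positive set P into folds; for each fold, train on
    the other folds and predict on the held-out part and on all of U.
    Produces exactly one prediction per positive item and, after
    averaging the per-fold predictions, one prediction per unlabeled
    item. train_and_predict(train, items) -> list of scores is an
    assumed interface."""
    pos_preds = [None] * len(P)
    u_sums = [0.0] * len(U)
    for k in range(folds):
        held = list(range(k, len(P), folds))
        train = [P[i] for i in range(len(P)) if i % folds != k]
        scores = train_and_predict(train, [P[i] for i in held] + U)
        for j, i in enumerate(held):
            pos_preds[i] = scores[j]
        for j, s in enumerate(scores[len(held):]):
            u_sums[j] += s
    return pos_preds, [s / folds for s in u_sums]

# toy scorer: similarity to the mean of the training values (hypothetical)
def scorer(train, items):
    m = sum(train) / len(train)
    return [1.0 / (1.0 + abs(x - m)) for x in items]

pos_preds, u_preds = cross_validated_predictions(
    [1.0, 1.1, 0.9, 1.2, 0.8], [1.0, 5.0], scorer)
```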
- Ensemble Learning. Combining different classifiers to improve overall performance is known as Ensemble Learning.
- Various ways are contemplated for incorporating the two modalities into a single overall classification. In a first, naive approach, the outputs of the two classifiers are combined. In a second approach, the MC algorithm is combined with co-training.
- one classifier is trained on the representations generated in the first modality and a second classifier is trained on the representations generated in the second modality.
- Both classifiers have classified all items in the test sets, but potentially have made errors. When one of the two has made an error, ideally, it can be corrected by the second. Since the classifiers each output a prediction, such as a number in (0,1) that represents the confidence, these predictions can be averaged over multiple classifiers. The classifier that is most certain will, in the case of an error, correct the other.
- Algorithm 1 can be used to separately train two classifiers and the optimal combination of iterations of each of the two classifiers selected to produce a combined classifier, which is generally better than either one of the two classifiers.
- Another embodiment of the MC algorithm allows different classifier outputs to be taken into account on each of the iterative steps.
- the different classifiers cooperate in a manner that resembles co-training (see, for example, Avrim Blum and Tom Mitchell, Combining labeled and unlabeled data with co - training , in Proc. of the Workshop on Computational Learning Theory , Morgan Kaufmann Publishers (1998)).
- the more confident classifier (the one assigning the higher probability to a data point of being positive or negative) is able to overrule the other.
- the predictions of the two classifiers are aggregated by an aggregating function. In one embodiment, a fixed percentage of the unlabeled objects is labeled as negative. It is not necessary to continue labeling all data until convergence, only part of the data is needed: the part that both classifiers agree on to be negative.
- P again represents the set of positively labeled objects.
- U represents the set of unlabeled objects, which at the beginning accounts for the rest of the objects in the dataset S.
- 8. pred_{i+1}^(k) ← predict with h_{i+1}^(k) on P̂_i^(k), ∀k ∈ [1,...,n]
- 9. N̂_{i+1}^(k) ← strong negatives (≈5%) in P̂_i^(k) by Agg(pred_{i+1}^(0), ..., pred_{i+1}^(n)); P̂_{i+1}^(k) ← remaining part of P̂_i^(k), ∀k ∈ [1,...,n]
- 10. i ← i+1
- 11. end while
- the interaction takes place only by means of the aggregation function that combines the predictions and thus creates a filter that can be used to select the items to label.
- the exemplary aggregation function simply sums the respective predictions of the two C 2 classifiers.
- other aggregation functions may be used which take into account the two predictions, such as a function in which one classifier's prediction is weighted more highly than the other.
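The aggregation step that sums the two modality predictions and labels a fixed percentage of the unlabeled items as negative can be sketched as below; the prediction values are hypothetical.

```python
def aggregate_negatives(preds_text, preds_soc, frac=0.05):
    """Aggregate two per-item modality predictions (each in (0, 1),
    higher meaning more confidently positive) by summing them, then
    select the fraction of items with the lowest combined score as
    strong negatives. The more confident classifier thus dominates
    when the two disagree on an item."""
    combined = [a + b for a, b in zip(preds_text, preds_soc)]
    n = max(1, int(frac * len(combined)))
    order = sorted(range(len(combined)), key=lambda i: combined[i])
    return order[:n]  # indices of the items to label negative

text = [0.9, 0.2, 0.8, 0.1, 0.6]
soc = [0.8, 0.9, 0.7, 0.2, 0.5]
negs = aggregate_negatives(text, soc, frac=0.2)
# item 3 has the lowest combined score (0.3) and is labeled negative;
# item 1, where the classifiers disagree, is rescued by the social score
```

A weighted sum would implement the variant in which one classifier's prediction is weighted more highly than the other's.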
- the Enron Corpus can be used to demonstrate improvements in classification performance in a one-class setting by combining classifiers of different modalities.
- the original data set consists of about 600,000 text files, ordered in 150 folders, each representing an e-mail account. In these folders any original structure the user has created has been preserved. Even though the files are raw text, each contains separately the body and several of the e-mail header fields. Some preprocessing has been performed on the headers (e.g., in some places e-mail addresses have been reconstructed). Attachments are not included.
- the Enron Corpus contains a large number of duplicate messages, ambiguous references to persons and other inconsistencies.
- the first step of preprocessing the 517,617 files in the database involves unifying identical messages based on title and a digest of the body, immediately reducing the number of messages by a little over 52% (248,155 messages remain).
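The deduplication step, unifying identical messages based on title and a digest of the body, can be sketched with a standard hash; the digest choice (SHA-1 here) is an assumption, and the messages are hypothetical.

```python
import hashlib

def deduplicate(messages):
    """Unify identical messages based on the subject and a digest of
    the body: a message whose (subject, body-digest) key has already
    been seen is dropped as a duplicate."""
    seen = set()
    unique = []
    for msg in messages:
        key = (msg["subject"],
               hashlib.sha1(msg["body"].encode("utf-8")).hexdigest())
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique

msgs = [{"subject": "Q3 report", "body": "See attached."},
        {"subject": "Q3 report", "body": "See attached."},
        {"subject": "Q3 report", "body": "Updated numbers."}]
unique = deduplicate(msgs)
# the exact duplicate is dropped; two distinct messages remain
```

Hashing the body rather than comparing full texts keeps the pass linear in corpus size, which matters at the half-million-message scale described.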
- a total of 167,274 different references can be grouped as 114,986 actors. This includes a large number of references that occur only once and a small number of more central actors that are referred to in many different ways.
- Plots characterizing the corpus are shown in FIGS. 6-10.
- FIG. 6 shows that, as predicted by Zipf's law, the frequency of words is inversely proportional to their rank based on frequency.
- FIG. 7 shows the distribution of message sizes. The main peak is around 11 words, with most mass for lengths between 10 and 300 words. It is evident that the average e-mail is a relatively short text, one more reason to try to use other properties in classification.
- FIG. 8 shows that even though there are some very active actors in the network, most actors send very few e-mails.
- the number of recipients per message shows a Zipf-like distribution: there are some e-mails with a very large number of recipients (up to 1000), but most communications are aimed at a small group of recipients.
- FIG. 10 shows the number of e-mails per week sent over the timeframe of interest in the Enron Corpus. Vertical lines indicate the weeks in which a message in the DOJ subset occurs. It can be seen that the timestamps of e-mails on the exhibit list are clustered around crucial dates in the litigation.
- a framework was developed in the Python language using the LIBSVM library (see Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for Support Vector Machines, 2001; software available at www.csie.ntu.edu.tw/~cjlin/libsvm).
- a linear kernel was used for the text-based feature sets and a Gaussian kernel was used with the social network-based sets.
- Text and social network feature representations were generated for each e-mail, as described above. Parameters were adjusted to try to get the optimal settings of parameters to obtain good classifiers for use in combining their predictions.
- FIG. 13 shows the effect of selecting different values of γ. Because of the different nature of the features used in the document representation based on the implicit social network, less tuning is needed. The feature values are fixed. No evaluation was made of the effect of reducing the number of features. For the Gaussian kernel, different values of the γ parameter, which controls the smoothness of the decision boundary, were evaluated. The optimal value of γ for the data set, from those tested, was found to be 0.1. Significantly larger values tend to lead to under-fitting: large steps in the convergence. Significantly smaller values also tend to lead to under-fitting: giving good performance until a certain point, with erratic behavior thereafter.
- the performance curve for the "best" classifier found for text representations is shown in FIG. 14. It can be seen that during the convergence, performance degrades slowly, with a drop at the end. The objective was to select a classifier just before the drop. Note also that the algorithm clearly beats OC-SVM. The algorithm takes a huge first step in the convergence, yielding a hypothesis that separates out 75.8% of the positives in 16.8% of the data.
- the performance curve for the “best” classifier found for social networks representations (“soc”) is shown in FIG. 15 .
- 71.5% of the positives are selected in 9.4% of the data.
- the hypotheses appearing on the curve are based on a small part of the data.
- the performance is excellent, providing above 90% recall of the positives while discarding over 90% of the data.
- FIG. 17 shows the results obtained by using co-training (MCC) with Algorithm 2.
- the curves can be compared by comparing their "best" classifiers, taking the Euclidean distance to (0,100), the perfect classifier, as the measure of comparison.
- TABLE 1 lists the “best” classifiers of the curves. We can see that the combination of social network-based and text-based feature sets does indeed yield very good results.
- cross validation appears to improve results with a corpus which does not contain labels for much of the data.
- Five-fold cross validation was used in the present example. The split is randomly made on every step in the convergence. Discrepancy between runs could be reduced by using a greater number of folds, e.g., 10-fold cross-validation or higher, although at a higher computational cost.
- the large initial corpus could be randomly subdivided into subsets and the method performed, as described above, for each of the subsets, using the same set of initial positives.
- the output of positives for each subset could then be combined to generate a set of objects for review by appropriate trained personnel.
- a classifier trained on one subset of unlabeled objects could be used to label the entire corpus of unlabeled objects.
Description
- The exemplary embodiment relates to object classification. It finds particular application in connection with multi-modality one-class classification of a large corpus of documents, based on extracted features, and in one particular case, where only a small corpus of labeled documents may be available.
- In a world where information becomes available in ever increasing quantities, document classification plays an important role by preselecting what documents are to be reviewed by a person and in what order. Applications of document selection range from search engines to spam filtering. However, more specialized tasks can be approached with the same techniques, such as document review in large corporate litigation cases.
- During the pre-trial discovery process, the parties are requested to produce relevant documents. In cases involving large corporations, document production involves reviewing and producing documents which are responsive to the discovery requests in the case. The number of documents under review may easily run into the millions.
- The review of documents by trained personnel is both time-consuming and costly. Additionally, human annotators are prone to errors, and inaccuracy and inconsistency between annotators can be a problem. It has been found that both speed and accuracy of reviewers can be improved dramatically by grouping and ordering documents.
- Systems have been developed to support human annotators by discovering structure in the corpus and presenting documents in a natural order. Usually the software that organizes the documents takes into account only the textual content of the documents.
- The following references, the disclosures of which are incorporated herein by reference in their entireties, are mentioned:
- U.S. Pub. No. 2008/0069456, published Jun. 4, 2009, entitled BAGS OF VISUAL CONTEXT-DEPENDENT WORDS FOR GENERIC VISUAL CATEGORIZATION, by Florent Perronnin, U.S. Pub. No. 2009/0144033, entitled OBJECT COMPARISON, RETRIEVAL, AND CATEGORIZATION METHODS AND APPARATUSES, by Yan Liu, et al., and U.S. Ser. No. 12/252,531, filed Oct. 16, 2008, entitled MODELING IMAGES AS MIXTURES OF IMAGE MODELS, by Florent Perronnin, et al. disclose systems and methods for categorizing images based on content.
- Support Vector Machines as text classifiers are described, for example, in U.S. Pat. No. 7,386,527, entitled EFFECTIVE MULTI-CLASS SUPPORT VECTOR MACHINE CLASSIFICATION.
- In accordance with one aspect of the exemplary embodiment, a method of classification includes, for each of a plurality of modalities, extracting features from objects in a set of objects, the objects comprising electronic mail messages, and generating a representation of each object based on its extracted features, at least one of the plurality of modalities being a social-network modality in which social network features are extracted from a social network implicit in the electronic mail messages. The method further includes training a classifier system based on class labels of a subset of the set of objects and on the representations generated for each of the modalities. With the trained classifier system, labels are predicted for unlabeled objects in the set of objects. Any one or more of these steps may be implemented by a computer processor.
- In accordance with another aspect of the exemplary embodiment, a classification apparatus includes an input for receiving a set of objects, the objects comprising electronic mail messages, a subset of the objects having class labels. A first feature extractor extracts text-based features from objects in a set of objects. A second feature extractor extracts social network-based features from the objects in the set of objects. A classifier system, executed by a computer processor, predicts labels for unlabeled objects in the set of objects based on the extracted text-based and social network-based features.
- In another aspect, a classification method includes, for each of a plurality of modalities, extracting features from objects in a set of objects comprising electronic mail messages and generating a representation of each object based on its extracted features. A one-class classifier system is trained, based on class labels of a subset of the set of objects and on the representations generated for each of the modalities. The training includes, for each of the modalities and based on an initial set of objects positively labeled with respect to the class, generating an initial hypothesis which predicts negative labels for a subset of the unlabeled objects in the set and iteratively generating a new hypothesis in which a new boundary between representations of objects predicted as having negative labels and representations of objects predicted as having positive labels converges towards an original boundary between the representations of the initial positively labeled objects and the rest of the objects in the set. With the trained classifier system, labels are predicted for unlabeled objects in the set of objects.
-
FIG. 1 is a functional block diagram of an apparatus for training a multimodality classification system in accordance with one aspect of the exemplary embodiment; -
FIG. 2 is a flow diagram illustrating a method for training and using a multimodality classification system in accordance with another aspect of the exemplary embodiment; -
FIG. 3 illustrates a social network graph; -
-
FIG. 5 schematically illustrates mapping convergence in two dimensions with positive data as circles and unlabeled data as triangles; -
FIG. 6 is a plot of number of occurrences per word in the Enron corpus of e-mails. -
FIG. 7 is a plot of message size for the Enron corpus showing the number of messages for e-mail messages of a given word length; -
FIG. 8 is a plot of the number of messages per sender for the Enron corpus; -
FIG. 9 is a plot of the number of receivers per message for the Enron corpus; -
FIG. 10 is a plot of number of messages per week in the Enron corpus over a timeline of about 6 years with vertical lines indicating weeks for which a responsive e-mail was generated; -
FIG. 11 is a plot illustrating the effect of the number of text features on the performance of a text content classifier on the Enron corpus: (a) all features, i.e., the bag of words (bow) in the lexicon after removing words of less than 3 characters and Porter stemming; (b) the 6522 terms produced by semantic clustering (sem); (c) 1000 terms selected as being the most important features; and (d) 500 terms selected as being the most important features; comparative results of a one class classifier without mapping convergence (OC-SVM) are also shown in this plot and in FIGS. 12 and 14-17; -
FIG. 12 is a plot illustrating the effect of feature value type on performance of a text content classifier on the Enron corpus: (a) the bag of words feature set (bow), using all features, with tf-idf as the type of feature value; (b) a bag of clusters feature set (sem), using 6522 terms generated by semantic clustering, with tf-idf; and (c) the same bag of clusters feature set, with binary values (bin); -
FIG. 13 is a plot illustrating the effect of γ on performance of a social networks content classifier on the Enron corpus; -
FIG. 14 is a plot illustrating the performance of the “best” text-based classifier identified in the tests, using a bag of words feature set and tf-idf feature values, the convergence steps being shown by squares, starting at the top right of the graph; -
FIG. 15 is a plot illustrating the performance of the “best” social network-based classifier identified in the tests, using γ=0.1, the convergence steps being shown by squares, starting at the top right of the graph; -
FIG. 16 is a plot illustrating the effect on performance of combining classifiers by linear combination on the Enron corpus; and -
FIG. 17 is a plot illustrating the effect of combining classifiers by co-training with a Mapping Co-convergence framework on performance on the Enron corpus. - Aspects of the exemplary embodiment relate to a method and apparatus for improving classification performance in a one-class setting by combining classifiers of different modalities. In the following, the term “multi-modality” is used to refer to a combination of different levels of description to aid classification tasks.
- The method is particularly suited to classification of a large corpus of electronic mail messages, such as e-mails, text messages (SMS), and other electronic documents which include references to the sender and receiver. By way of example, the problem of distinguishing responsive documents in a corpus of e-mails is used to demonstrate the method. However, the same principles are applicable to similar classification problems where collaborators working on a set of documents can instantiate social features (available as a result of collaborative edition, version control, and/or traceability in a document processing system).
- The method provides a way to turn the social network that is implicit in a large body of electronic communication documents into valuable features for classifying the exchanged documents. Working in a one-class setting, a semi-supervised approach, based on the Mapping Convergence framework, may be used. An alternative interpretation, which allows for broader applicability by dismissing the prerequisite that positive and negative items must be naturally separable, is disclosed. An extension to the one-class evaluation framework is proposed, which is found to be useful even when very few positive training examples are available. The one-class setting is extended to a co-training principle that enables taking advantage of the availability of multiple redundant views of the data. This extension is evaluated on the Enron Corpus, for classifying responsiveness of documents. A combination of text-based features and features based on this second, extra-textual modality has been shown to improve classification results.
- In the exemplary embodiment, the multi-modality of e-mail is used for classification. E-mail not only includes text, but it also implicitly instantiates a social network of people communicating with each other. Document representations modeling these distinct levels are constructed, which are then combined for classification. This involves an integration of different aspects, such as the topic of an e-mail, the sender, and the receivers. Classifying with respect to responsiveness is used as an example (in this case, whether or not the message is of relevance to a particular litigation matter).
- An algorithm developed for use in the method is specifically aimed at classifying items based on a very small set of positive training examples and a large amount of unlabeled data. However, the method also finds application in situations where both positive and negative labeled samples are available.
- With reference to
FIG. 1 , an apparatus for classification of electronic data objects, such as e-mails, is illustrated, in the form of a digital processing device, such as a computer 10. The computer 10 includes a digital processor 12, such as the computer's CPU, and associated memory, here illustrated as main memory 14 and data memory 16. The digital processor 12 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like. The digital processor 12, in addition to controlling the operation of the computer 10, executes instructions stored in memory 14 for performing the method outlined in FIG. 2 . - The
computer 10 may include one or more dedicated or general purpose computing devices, such as a server computer or a desktop or laptop computer with an associated display device and a user input device, such as a keyboard and/or cursor control device (not shown). - The
memories 14, 16 may represent any type of computer readable memory. - The term “software” as used herein is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in a storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth. Such software may be organized in various ways, and may include software components organized as libraries, Internet-based programs stored on a remote server or so forth, source code, interpretive code, object code, directly executable code, and so forth. It is contemplated that the software may invoke system-level code or calls to other software residing on a server or other location to perform certain functions.
- The illustrated
computer 10 includes an input interface 20 and an output interface 22, which may be combined or separate. Interface 20 receives a dataset of electronic data objects 24 to be classified. A portion 26, typically only a small portion, of the objects in the dataset 24 has been labeled according to class. Interface 22 outputs predicted labels 28 for unlabeled data objects. Exemplary input and output interfaces include wired and wireless network interfaces, such as modems, or local interfaces, such as USB ports, disk drives, and the like. Components of the computer may communicate via a data/control bus 30. - The
computer 10 is configured by suitable programming or hardwired firmware to embody a classifier training system 32 for training a classifier system 34. The exemplary training includes assigning values to parameters of a one class classification algorithm for classifying unlabeled objects as responsive (in the class) or not. In the exemplary embodiment, classifier training system 32 and classifier system 34 may be embodied in software instructions stored in memory 14 and executed by processor 12. - The
classifier training system 32 operates on a dataset 24 comprising a suitable number of labeled training objects 26. The labels represent a priori information about the classifications of the objects, such as manually applied class labels. For a hard binary classification, the labels can, for example, be “+1” if the object is assigned to the class and “−1” otherwise. For a soft binary classification, the labels can, for example, be values in the range [0,1] indicating likelihood of membership in the class. - The
classifier training system 32 may be embodied in software, hardware or a combination thereof. In the exemplary embodiment, system 32 includes various software modules executed by processor 12, although it is to be appreciated that the modules may be combined and/or distributed over two or more computing devices. A data extraction component 40 extracts text content of the objects in the data set and in particular, the content of an electronic message body as well as the content of header fields. A lexicon 56 is generated by the extraction component from the extracted text content and may be stored in memory, such as main memory 14 or data memory 16. - A
first features extractor 42 extracts text-based features from the extracted text content, in particular from the email body. A representation generator 44 generates a representation of each object 24 (e.g., a features vector) based on the extracted text-content features. A reference resolution component 46 resolves references in the objects to actors (people) in a social network of actors responsible for sending and receiving the electronic messages in the data set. A social network extraction component 48 generates a graph in which actors are represented by nodes and electronic mail traffic between actors is represented by edges, each edge labeled with a number of electronic mail messages in a given direction. A second features extractor 50 extracts features for each of the electronic mail messages 24 based on sets of features assigned to the sending and receiving actors of that message. The actors' features are based, at least in part, on the email traffic between actors in the network and are selected to reflect the actor's relative importance in the social network. - The text-based representation and social network-based representation generated for each
object 24 are input to a classifier learning component 54 which trains classifiers of the classification system 34 for the respective modalities. The classification system combines the output of the classifiers to identify a classification boundary, whereby objects predicted to be within the class can be labeled with the class label, e.g., “responsive,” and objects outside the class can optionally also be labeled accordingly (e.g., as non-responsive). - While the
exemplary system 32 has been described with reference to two modalities—text and social networks, it is to be appreciated that more than two modalities and/or different types of modality are also contemplated. - With reference to
FIG. 2 , a computer-implemented method which may be performed with the apparatus of FIG. 1 is shown. The method, which is described in greater detail below, begins at S100. - At S102, a
dataset 24 of S objects is input and may be stored in computer memory 16 during processing. The dataset may be preprocessed to remove duplicate objects.
- At S104, textual content is extracted for each
object 24. In the case of e-mails, the textual content may be extracted from the subject (title) field, the body of the e-mail (i.e., the message) and optionally also from any text attachments. - At S106, a
lexicon 56 is generated, based on the textual content of all the e-mails in the dataset. - At S108, the
lexicon 56 can be processed to reduce its dimensionality. For example, very frequently used words (such as “the” and “and”) and/or words below a threshold length can be excluded from the lexicon. Additionally, words can be grouped in clusters, based on semantic similarity, or by automatically applying co-occurrence rules to identify words used in similar contexts. - At S110, a representation is generated for each
object 24, based on the text content. This can be in the form of a bag-of-words or bag-of-clusters representation (collectively referred to as bag-of-terms representations). In this model, text content is represented as an unordered collection of words/word clusters, disregarding grammar and even word order. The representation can thus be a histogram (which can be stored as an optionally normalized vector) in which, for each word (or, more generally, each term) in the lexicon, a value corresponding to the number of occurrences in the object is stored. The first modality (textual content) representation for each object can be stored in memory 16. - The generation of the second modality (social network) representations can proceed before, after, or contemporaneously with the generation of the first modality representations of the objects. The social network representations of the objects aim to capture the hidden social network of actors sending and receiving the objects (e-mails) 24 by graphing a social network in which nodes represent actors and links between actors represent the e-mail communications between the actors.
- At S204, reference information (information referring to actors) is extracted from the relevant fields of the e-mail, such as the “to”, “from”, “cc” and “bcc” fields. The signature within the e-mail body may also provide reference information concerning the sender.
- At S206, the reference information is resolved, to generate a set of actors. Since e-mail addresses are not uniform, the reference resolution step involves associating references to the same person to a common normalized form—a single actor.
- At S208, the set of actors generated at S206 and the e-mail communications between them are graphed. An example social network graph is illustrated in
FIG. 3 . As will be appreciated, the graph may be stored in memory as a data structure in any suitable form. Actors who send and/or receive fewer than a threshold number of e-mails can be eliminated from the network 70. - At S210, for each
actor 72 in the network 70, social network features are extracted from the graph and associated with the respective actors. The graph allows various features to be extracted which provide information about the actors, such as whether they belong to a cluster 76 of actors (each actor in a cluster has sent or received e-mails from every other member of the cluster), whether the actor is a hub 80 (sending and/or receiving e-mails from at least a threshold number of other actors), the number of e-mails sent by/to the actor, and the like. Twelve social network features are described below, by way of example. It is to be appreciated that the method is not limited to any particular set of social network features. However, in one embodiment, at least two social network features are extracted, and in another embodiment, at least four or at least six different social network features are extracted for each actor. The extracted social network features can be normalized and/or weighted and combined to generate a features vector for the respective actor. - At S212, social network features are extracted for the objects, based on the features extracted for the corresponding actors (senders and recipients). In this way, the social network features for the actors are propagated to the
e-mails 24 between them. The result is a social network features representation (e.g., in the form of a vector) for each object. - At S214, the labeled objects 26 in the data set are identified. In the exemplary embodiment, only positive examples (relevant objects) may be available. In this case, a negative set of objects may be generated (S216) by identifying unlabeled objects with feature vectors that are dissimilar from the set of feature vectors belonging to the labeled objects 26. These can be the objects which are used to train the classifier system (S218). In the exemplary embodiment, the sets of objects to be used as positive and negative samples are expanded through an iterative process.
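The propagation of actor features to the e-mails exchanged between them (S212) can be illustrated as follows. Averaging the recipients' feature vectors before concatenating with the sender's vector is one plausible aggregation, chosen here for illustration; the disclosure does not fix the combination rule.

```python
def email_social_features(sender_feats, recipient_feats_list):
    """Propagate actor-level social network features to one e-mail.

    Actor features (e.g., degree, message counts) are assumed to be
    fixed-length numeric vectors. The e-mail representation is the
    sender's vector concatenated with the average recipient vector.
    """
    n = len(recipient_feats_list)
    avg_recipient = [sum(vals) / n for vals in zip(*recipient_feats_list)]
    # Concatenation (not element-wise addition) keeps the two roles distinct.
    return sender_feats + avg_recipient

feats = email_social_features([1.0, 0.5], [[0.2, 0.4], [0.6, 0.8]])
```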
- In one embodiment, the two features vectors (text and social network features vectors) can be combined to generate a single D-dimensional vector for each object which is input to a single classifier. In the exemplary embodiment, however, two
classifiers are trained, one for each modality. To predict a label 28 for an unlabeled object, the outputs of the two probabilistic classifiers are combined. - At S220, the trained
classifier system 34 is used to predict the labels 28 for unlabeled objects, based on their vector representations. The corresponding objects for the set labeled positive can then be subjected to a manual review. - The method ends at S222.
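One way to combine the probabilistic outputs of the two modality classifiers is the linear combination evaluated in FIG. 16. The sketch below assumes scores in [0,1]; the weight alpha and the decision threshold are illustrative defaults, not values taken from the disclosure.

```python
def combine_scores(p_text, p_social, alpha=0.5):
    """Linearly combine the text-based and social network-based
    classifier outputs. alpha weights the text modality; 0.5 is an
    assumed default.
    """
    return alpha * p_text + (1.0 - alpha) * p_social

def predict_label(p_text, p_social, alpha=0.5, threshold=0.5):
    # Label the object responsive when the combined score clears the threshold.
    combined = combine_scores(p_text, p_social, alpha)
    return "responsive" if combined >= threshold else "non-responsive"
```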
- As will be appreciated, the number n of object modalities used to generate representations may be more than 2. Other representations of the
documents 24 are also contemplated. - The exemplary method may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, Graphical card CPU (GPU), or PAL, or the like. In general, any device, capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in
FIG. 2 , can be used to implement the method for training and/or using the trained classifier described herein. - The method illustrated in
FIG. 2 may be implemented in a computer program product or products that may be executed on a computer. The computer program product may be a tangible computer-readable recording medium on which a control program is recorded, such as a disk, hard drive, or the like configured for performing the method. Common forms of computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, or any other tangible medium from which a computer can read and use. The computer program product may be integral with the computer 10 (for example, an internal hard drive or RAM), or may be separate (for example, an external hard drive operatively connected with the computer 10), or may be separate and accessed via a digital data network such as a local area network (LAN) or the Internet (for example, as a redundant array of inexpensive or independent disks (RAID) or other network server storage that is indirectly accessed by the computer 10, via a digital network). Alternatively, the method may be implemented in a transmittable carrier wave in which the control program is embodied as a data signal using transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like. - Various steps of the method will now be described in greater detail.
- Since both senders and one or more recipients of e-mails may store an e-mail message, duplicate messages may exist and may be pruned from the dataset. Identical messages may be identified based, for example, on the content of the subject (title) field and a digest of the body.
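The duplicate-pruning step can be sketched as follows. The key pairs the subject field with a digest of the body, as described above; SHA-256 is an assumed choice of digest function.

```python
import hashlib

def message_key(subject, body):
    """Key used to detect duplicate messages: the subject field plus a
    digest of the body text."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return (subject, digest)

def prune_duplicates(messages):
    """Keep the first copy of each (subject, body-digest) pair."""
    seen, unique = set(), []
    for msg in messages:
        key = message_key(msg["subject"], msg["body"])
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique
```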
- The
exemplary objects 24 are e-mails in electronic form; accordingly, extraction of the text content of each of the fields is readily achieved using associated metadata. For scanned documents, OCR processing may be applied to the scanned images. - To be able to construct bag-of-terms representations later, a
lexicon 56 is derived from the total collection of messages. A word list and word frequency table are constructed from the bodies of the messages 24. A number of criteria may be applied to filter some of the terms from the resulting lexicon. For example, strings of length smaller than a threshold, such as less than three characters, are excluded from the lexicon. A minimum threshold of occurrence may be established (e.g., a minimum of 4 occurrences in the entire corpus). This is premised on the expectation that the words that occur more frequently in documents are likely to carry more information. Words which do not meet the threshold are excluded from the lexicon. A Porter stemmer can be applied to reduce all words to a normalized form, and all words may be lower-cased. - Another way of reducing the dimensionality of the
lexicon 56, and the resulting vector representations, is by clustering features semantically. A soft clustering approach may be used, as described, for example, in Julien Ah-Pine and Guillaume Jacquet, Clique-based clustering for improving named entity recognition systems, in EACL, pages 51-59 (2009). Maximal cliques in a co-occurrence graph are clustered to obtain a score indicating the probability that a word belongs to each cluster. Words which meet a threshold probability are assigned to that cluster. Ambiguous words, which meet the threshold probability for two or more clusters, can be assigned to more than one cluster and thus can contribute to multiple features in the features vectors. The vectors produced by using features generated by a clustering technique are described herein as bag-of-clusters representations. Each feature represents a certain semantic field, but words grouped together in a cluster need not all have the same meaning. Each cluster has an identifier, which can be a numerical identifier of a word from the cluster. - Reducing the dimensionality can sometimes improve the performance of a one class classifier by reducing noise. However, it is to be appreciated that there is a chance that certain infrequent words may contain key information and thus recall may be reduced if the
lexicon 56 is reduced to too great an extent. In the exemplary embodiment, a large feature set was found to provide higher performance. - The result of the filtering is a
lexicon 56 having a dimensionality corresponding to the number of words or clusters (which will both be referred to herein as “terms”) it contains. - One level of representation particularly relevant to the distinction responsive/non-responsive is the textual content of the objects. For example, a bag-of-terms representation is generated for each e-mail from the contents of the e-mail's body and subject fields. This allows a vector to be constructed where each feature represents a term of the optionally filtered
lexicon 56 and the feature values express a weight which may be based on frequency of occurrence in the object or other occurrence-based value. - From the
lexicon 56, vectors are generated, with each feature representing one term. For the values, various parameters can be used, such as tf-idf frequency, and binary values. The document frequency df of a term w is the number of documents (objects) in which it occurs. The term frequency tf is the number of times the term occurs in a document d. S is the number of documents. The tf-idf value of a term w in a document d can then be computed as tf-idf(w, d) = tf(w, d)·log(S/df(w)).
- The values in the textual content vector used to represent an e-mail can include any one or an optionally weighted combination of these parameters.
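Putting the lexicon filtering and the tf-idf feature values together, a minimal sketch follows. The length and occurrence thresholds follow the text above; the logarithm base, the helper names, and the omission of stemming are assumptions of this illustration.

```python
import math
from collections import Counter

def build_lexicon(docs, min_len=3, min_count=4):
    """Filter the corpus word list: drop strings shorter than min_len
    and terms with fewer than min_count occurrences corpus-wide.
    Words are lower-cased; Porter stemming is omitted here.
    """
    counts = Counter(w.lower() for d in docs for w in d.split())
    return {w for w, c in counts.items() if len(w) >= min_len and c >= min_count}

def tfidf_vector(doc, docs, lexicon):
    """Sparse tf-idf vector per lexicon term: tf(w, d) * log(S / df(w))."""
    S = len(docs)
    tf = Counter(w.lower() for w in doc.split() if w.lower() in lexicon)
    vec = {}
    for w, f in tf.items():
        df = sum(1 for d in docs if w in (x.lower() for x in d.split()))
        vec[w] = f * math.log(S / df)
    return vec
```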
- While a variety of classifier learning methods may be used for text as well as social network features, such as support vector machines (SVM), naïve Bayes, and neural networks, the feature set based on a bag-of-terms representation has some properties that make it particularly suited for SVM classification with a linear kernel (see, for example, Thorsten Joachims, A Statistical Learning Model of Text Classification for Support Vector Machines, In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 128-136, ACM Press, 2001).
- First, the feature set is very high-dimensional, even if dimensionality is reduced by the techniques described above. Second, the vectors are very sparse, because only a small number of the terms actually occur in each respective document. Third, the terms that do occur in documents may have a considerable overlap, even if the documents belong to the same category. Fourth, there is a lot of redundancy in the feature vector: a document typically has a large number of cues that signal a particular classification. Given the sparsity of the feature vectors, with one-class based SVM algorithms, a linear kernel is appropriate, although other methods such as Gaussian kernels are also contemplated.
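The sparsity argument can be made concrete: when bag-of-terms vectors are stored as term-to-weight mappings, a linear kernel reduces to a dot product over the few terms two documents share. The sketch below shows only that kernel computation, not SVM training; the term weights are illustrative.

```python
def linear_kernel(u, v):
    """Dot product of two sparse term->weight vectors. Only terms the
    two documents share contribute, which is why a linear kernel is a
    natural fit for sparse, high-dimensional bag-of-terms vectors.
    """
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v[t] for t, w in u.items() if t in v)

k = linear_kernel({"merger": 2.0, "gas": 1.0}, {"gas": 3.0, "price": 1.0})
```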
- Another representation of the object is developed by deriving an implicit social network based on the assumption that the e-mail communication between senders and recipients can implicitly provide information on the roles of senders and recipients of the message in the network of people exchanging information.
- The structure of a large corpus of e-mail, such as that produced in response to discovery requests or otherwise uncovered during the course of litigation, clearly is not homogeneous. Where the e-mails are recovered, e.g., using forensic tools, from the computers of a group of key people in an organization which is a party in a civil lawsuit or criminal prosecution, they reflect the normal interactions between people in that organization, with some communications with people outside the organization. The lines of communication thus implicitly instantiate a network of actors. The exemplary method includes developing such a social network and assigning actors with features based on the communications with others, and propagating these features to the e-mails for which the actors are senders or recipients.
- The first step is to identify references to senders/recipients (S202) and then resolve these references to generate a set of actors (S204). References to people in the electronic mail messages may be extracted from the email header, often in predefined fields tagged with metadata.
- Ambiguities and inconsistencies among the references in the sender (“from”) and recipient (“to”, “cc”, and “bcc”) fields may exist. These references may be names, coming from the personal address book of the user (which is assumed not to be available). To be able to extract the implicit social network, these are matched to specific actors. There may be several references to the same actor. For example, ‘Richard Smith,’ ‘Rick Smith,’ ‘Richard B. Smith,’ ‘richard.smith@abccorp.com,’ ‘rsmith@ext.abccorp.com,’ ‘Smith-R,’ etc., likely all refer to the same person.
- Using regular expressions, firstname, lastname and, when available, e-mail address are extracted from all the references. Then, references that occur in the headers of the same message are reassembled. The premise is that often both a name and an e-mail address occur in the header, so knowing that a person ‘Mark Jones’ has the e-mail address ‘mj@abccorp.com’ allows the name to be matched to the address with a certain degree of confidence.
- Having recombined these different references, the next step is to relate them to the references in other messages. An “actor” is a collection of references that has been identified as pointing to the same person. The e-mail address is used as a primary cue. It can be assumed that if two references share an e-mail address, they likely refer to the same actor. Secondly, for each yet unidentified reference, a search is made for possible matches in the set of actors with the same last name, based on first name, prefix, middle name and/or nicknames (e.g., using a list of common English nicknames). Provided a group of similar references refer to at most one actor in the network, it can be assumed that all yet unidentified references refer to the same actor. As a result of these two steps, all different formats of Richard Smith's name are resolved as referring to the same actor.
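The two-step resolution above (e-mail address as primary cue, then name matching) can be sketched in Python. This is a simplified illustration under stated assumptions: the `parse_reference` regexes, the dictionary keyed on address, and the fallback on the bare name are hypothetical choices, and nickname matching (e.g., 'Rick' for 'Richard') is omitted.

```python
import re
from collections import defaultdict

def parse_reference(ref):
    """Split a raw header reference into (name, email); either may be absent."""
    m = re.search(r'[\w.+-]+@[\w.-]+', ref)
    email = m.group(0).lower() if m else None
    name = re.sub(r'[<>]|[\w.+-]+@[\w.-]+', '', ref).strip(' "\',') or None
    return name, email

def resolve_actors(references):
    """Group references into actors: a shared e-mail address is the primary
    cue; an unresolved bare name is matched to an actor whose name it equals."""
    actors = defaultdict(set)       # actor key -> set of raw references
    name_to_key = {}
    for ref in references:
        name, email = parse_reference(ref)
        if email:                   # primary cue: shared address = same actor
            actors[email].add(ref)
            if name:
                name_to_key[name.lower()] = email
        elif name and name.lower() in name_to_key:
            actors[name_to_key[name.lower()]].add(ref)
        elif name:                  # fall back to the bare name as the key
            actors[name.lower()].add(ref)
    return dict(actors)

refs = ['"Mark Jones" <mj@abccorp.com>', 'mj@abccorp.com', 'Mark Jones']
actors = resolve_actors(refs)
```

All three hypothetical references resolve to a single actor keyed on the address.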
- A social network graph is generated by identifying connections (e-mails in either direction) between actors.
FIG. 3 shows a simplified exemplary social network graph 70 in which nodes 72 (here labeled with the letters of the alphabet) each represent an actor and edges 74 (shown as one directional arrows) each represent a sender to recipient connection, labeled with the number of e-mails sent in that direction. As will be appreciated, a single e-mail can have many recipients, so can contribute to more than one connection. A threshold may be set on the number of e-mails on a connection for that connection to be retained in the social network graph 70. For example, a threshold of two e-mails (e.g., one in each direction or two in one direction) can be used to make sure that the majority of the traffic is taken into account while discarding any accidental links with no or little meaning.
- Due to the Zipf-like distribution of the connection strengths, this reduces the number of actors to take into consideration considerably, without losing much relevant information. Actors which are no longer connected to any other are removed. For example, actor Z 78, with only one e-mail, may be excluded from the social network 70. The largest connected subgraph (which may account for about 95% of the actors) of this correspondence graph is then taken as the social network.
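A minimal sketch of the graph construction described above, with the two-e-mail threshold and the restriction to the largest connected component. The `build_social_graph` function and its input format (sender, recipient-list pairs) are illustrative assumptions, not the patent's implementation.

```python
from collections import Counter, defaultdict

def build_social_graph(messages, threshold=2):
    """messages: iterable of (sender, recipient_list) pairs.
    Keep a connection only if it carries at least `threshold` e-mails
    (both directions summed), then keep the largest connected component."""
    traffic = Counter()
    for sender, recipients in messages:
        for r in recipients:            # one e-mail can feed many connections
            pair = frozenset((sender, r))
            if len(pair) == 2:          # ignore self-addressed mail
                traffic[pair] += 1
    adj = defaultdict(set)
    for pair, n in traffic.items():
        if n >= threshold:
            a, b = pair
            adj[a].add(b)
            adj[b].add(a)
    # keep only the largest connected component
    seen, best = set(), set()
    for start in list(adj):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return {v: adj[v] & best for v in best}

# Illustrative traffic: A-B carries two e-mails; A-C and Z-A only one each.
msgs = [('A', ['B']), ('B', ['A']), ('A', ['C']), ('Z', ['A'])]
graph = build_social_graph(msgs, threshold=2)
```

With the threshold of two, the accidental links to C and Z are discarded, matching the actor Z example above.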
- Associating Features with Actors in the Social Network (S208)
- Having established a social network 70, a set of features is associated with each of the actors (nodes) 72. A feature set is selected with the aim of representing the position of correspondents in the corporate network. Certain properties of nodes in a communication graph can serve very well for automatically detecting the social role of actors in the network 70 (see, for example, Ryan Rowe, German Creamer, Shlomo Hershkop, and Salvatore J. Stolfo, Automated social hierarchy detection through email network analysis, in WebKDD/SNA-KDD '07: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 109-117, New York, N.Y., USA (2007), published by ACM).
- A group of social network features is selected which aim to represent, singly or in combination, key properties of actors, such as whether the actor belongs to one or more intra-communicating cliques, such as clique 76 in FIG. 3, whether the actor serves as a hub (see, for example, hub 80, where actor P has several outgoing edges), an authority (receiving incoming emails from many nodes), or whether the actor is central, with many paths between other actors passing through the actor's node. The features can be based, directly or indirectly, on e-mail traffic/connections within the social network 70. In the exemplary embodiment, at least some or all of the social network features assigned to each of the actors may be selected from and/or based on the following twelve features:
- 1. An Activity score: the number of e-mails sent (by the actor). This feature represents the activity of the actor in the network.
- 2. A Hub score: based on the number of outgoing connection lines from the actor's node.
- 3. An Authority score: based on the number of incoming connection lines to the actor's node.
- Centrality Features
- A range of different centrality measures have been proposed to model the position of the node in the network. These depend on an undirected, unweighted version of the communication graph. The length of a path is the number of edges between two nodes. The shortest path between a pair of nodes in the graph is the path with the fewest number of intervening nodes. A Matlab library for working with large graphs may be used for the computation (see, David Gleich, MatlabBGL: a Matlab graph library, at www.stanford.edu/˜dgleich/programs/matlab_bgl/, (2008)). For the following five centrality features, the distance d_st from a node s to another node t (expressed as the number of edges connecting s and t) and the number σ_st of paths from s to t are computed. The number of paths from s to t via node v (the actor's node) is denoted by σ_st(v).
- 4. A Mean centrality:
- C_M(v) = n / Σ_{sεV} d_vs
- where d_vs represents the minimum distance between the actor's node v and another node s, V represents the set of all nodes, and n is the total number of nodes. The mean centrality of the actor is thus the total number of nodes divided by the sum of all distances between the actor's node v and each other node in the graph, i.e., it is related to the average distance to any other node.
- 5. A Degree of centrality: deg(v) = |{s : (v,s) ε E}|. The degree of centrality is thus the number of actors directly connected to the actor's node by an edge.
- 6. Neighborhood centrality:
- C_N(v) = Σ_{s≠v≠t} σ_st(v) / σ_st
- where σst(v) represents the number of paths from s to t which pass through v (under the constraint that v cannot be s or t).
- 7. Closeness centrality: C_C(v) = 1 / max_t d(v,t)
- where max_t d(v,t) represents the maximum distance, in edges, between the actor's node and all other nodes.
- 8. Stress centrality: C_S(v) = Σ_{s≠v≠t} σ_st(v). This is simply the sum, over all values of s and t (under the constraint that v can be neither s nor t), of the number of paths from s to t which pass through v, i.e., each such path comprises at least two edges, two of which are incident on v.
- The next feature characterizes the connectedness in the direct neighborhood of the node:
- 9. Clustering coefficient:
- C(v) = |(s,t)| / (deg(v)(deg(v)−1)/2)
- deg(v) is the degree centrality of node v obtained from
feature number 5 above. The clustering coefficient feature identifies how close v's neighbors are to being a clique. It is given by the number of links |(s,t)| between the neighbors divided by the number deg(v)(deg(v)−1)/2 of links that could possibly exist between them in an undirected graph.
- As an example, consider node E in
FIG. 3. Node E has only 3 neighbors (namely, H, F, and J), since these nodes are all no more than one edge away from node E. Therefore, for node E, deg(v)=3. The maximum possible number of links between these 3 neighbors is, by simple enumeration, deg(v)(deg(v)−1)/2=3*2/2=3. The actual number is |(s,t)|=1 (only F and J are linked). Dividing 1 by the maximum link number 3, the clustering coefficient of E is equal to 0.3333.
- For the following group of features, cliques are identified. These are groups of nodes in which each of the nodes is connected by at least one edge to every other node in the clique, as illustrated by clique 76 in FIG. 3. The cliques in the social network graph can be identified automatically, e.g., with a Matlab implementation (see, for example, Coen Bron and Joep Kerbosch, Algorithm 457: finding all cliques of an undirected graph, in Communications of the ACM, 16(9): 575-577 (1973)). The minimum size of a clique may be specified, such as at least 3 nodes. Additionally, where one clique is fully included in a larger clique, only the maximal sized clique is used. A maximal complete subgraph (clique) is a complete subgraph that is not contained in any other complete subgraph (clique).
- The following clique based features may be employed:
- 10. The number of cliques that an actor is in.
- 11. A raw clique score: for each clique the actor is in, a clique score is computed: a clique of size α actors is assigned a score of 2^(α−1). The scores of all the cliques that the actor is in are then summed.
- 12. A weighted clique score: for each clique of size α, with β the sum of activities (e.g., from feature No. 1 above or a sum of e-mails sent and received) of its members, the actor is assigned a weighted clique score β·2^(α−1). The scores of all the cliques the actor is in are then summed.
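The clustering coefficient computation (feature 9), worked through for node E above, can be checked with a short sketch; the adjacency-map format and the tiny graph are hypothetical stand-ins for FIG. 3's neighborhood of E.

```python
def clustering_coefficient(adj, v):
    """C(v) = links among v's neighbours / (deg(v) * (deg(v) - 1) / 2)."""
    nbrs = adj[v]
    deg = len(nbrs)
    if deg < 2:
        return 0.0
    # count each neighbour pair once (s < t) that is actually linked
    links = sum(1 for s in nbrs for t in nbrs if s < t and t in adj[s])
    return links / (deg * (deg - 1) / 2)

# Hypothetical neighbourhood of node E from FIG. 3: neighbours H, F, J,
# with only F and J linked to each other.
adj = {'E': {'H', 'F', 'J'}, 'H': {'E'}, 'F': {'E', 'J'}, 'J': {'E', 'F'}}
cc = clustering_coefficient(adj, 'E')
```

This reproduces the 1/3 (0.3333) value derived for node E in the worked example.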
- Each of the feature scores can be scaled to a value in [0,1], where 1 indicates a higher importance. Thus for example, an actor may have a 12-value feature vector, such as [0.2, 0.1, 0.3, 0.2, 0.4, 0.1, 0.1, 0.5, 0.3, 0.2, 0.2, 0.1]. Each node can also be assigned an overall social score which can be a linear combination (sum) of these features, where all features have equal weight, i.e., 2.7 in the above example. In other embodiments, the features may be assigned different weights.
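A hedged sketch of the scaling and scoring just described; min-max scaling across actors is one plausible way to map each feature to [0,1], and the function names are illustrative.

```python
def scale_features(raw_vectors):
    """Min-max scale each feature column to [0, 1] across all actors
    (1 then indicates the highest importance for that feature)."""
    cols = list(zip(*raw_vectors.values()))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return {actor: [(x - l) / (h - l) if h > l else 0.0
                    for x, l, h in zip(vec, lo, hi)]
            for actor, vec in raw_vectors.items()}

def social_score(feature_vector, weights=None):
    """Overall social score: linear combination of the (scaled) features,
    with equal weights unless other weights are supplied."""
    weights = weights or [1.0] * len(feature_vector)
    return sum(w * x for w, x in zip(weights, feature_vector))
```

With the example 12-value vector from the text, the equal-weight score is 2.7.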
- As will be appreciated, step S210 classifies the nodes (actors) in the social network with a set of features. The next step (S212) propagates the actors' features to the e-mails that have been sent and received by these actors. To translate the properties of actors to properties of e-mails, features from senders and recipients can be combined. For example, a set of 37 features is constructed to represent each e-mail. An e-mail is represented by three sets of 12 features (features 1-12 described above), the first set is the feature values of the sender node, the second set is the average of the feature values of all recipients of the e-mail, and the third is the feature values of the most prominent recipient. In this embodiment, the most prominent recipient is the recipient of the e-mail having the highest social score (obviously, if there is only one recipient, the second and third feature sets have identical values). The last feature is the number of receivers of that particular email.
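The propagation of actor features to an e-mail might be sketched as follows; for brevity the example uses 2-value actor vectors rather than the 12-value vectors of the exemplary embodiment (with 12-value vectors the result would be the 37-dimensional representation described above).

```python
def email_features(sender, recipients, actor_features, actor_score):
    """Concatenate: sender's features, the element-wise average over all
    recipients, the features of the most prominent recipient (highest
    social score), and the recipient count."""
    recs = [actor_features[r] for r in recipients]
    avg = [sum(col) / len(recs) for col in zip(*recs)]
    prominent = max(recipients, key=lambda r: actor_score[r])
    return (actor_features[sender] + avg
            + actor_features[prominent] + [float(len(recipients))])

# Hypothetical 2-value actor vectors for illustration.
feats = {'s': [1.0, 0.0], 'a': [0.0, 1.0], 'b': [2.0, 3.0]}
scores = {'a': 0.5, 'b': 2.0}
vec = email_features('s', ['a', 'b'], feats, scores)
```

Here recipient 'b' has the higher social score, so its features fill the "most prominent recipient" slot.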
- It has been found that this set of 37 features, based on the social network implicit in the corpus, represents a quantification of the sender and recipient characteristics of each e-mail and provides valuable information in classifying the e-mail as responsive or not. Obviously, different sets of features of the sender and recipient(s) may be used.
- In the exemplary embodiment, the 37 feature values are used as the social network representation of the e-mail (S112). In other embodiments, the features may be assigned different weights in generating the social network representation. There may also be fewer or more than 37 features for each email, such as at least eight features, e.g., including at least four sender-based features and at least 4 recipient-based features.
- In the exemplary embodiment, it is assumed that only positively labeled examples are available, with no negatively labeled samples. Support Vector Machines (SVMs) can be used for both the text-based and social network-based views of the objects. Traditionally, SVMs depend on multi-class training data. In the following, one method for generating a set of negative samples is described. It will be appreciated that if negative samples are available, this step is not necessary.
- It is also assumed that only a small set of positively labeled e-mails are available and that the ratio between positive and negative objects in the corpus is unknown, but assumed to be unbalanced where the positives are the minority class. The positive items are assumed to be drawn from a certain distribution, whereas the negatives are everything else.
- An exemplary SVM algorithm (Algorithm 1) for generating positive and negative samples is shown below, which is suited to separate training of classifiers for the first and second modalities. Another algorithm (Algorithm 2) is then discussed as a method of co-training the two classifiers. The SVM-based algorithms are capable of constructing hypotheses based on positive training examples only. The algorithms employ a semi-supervised framework, referred to as Mapping Convergence, which not only looks at the positive examples, but is also supported by the large amount of unlabeled data that is available. In this approach, the first step is to obtain a small set of artificial negative examples from the set of unlabeled objects that have a high probability of being negative samples because their feature vectors (text and/or social network vectors) are highly dissimilar from the feature vectors of the positively labeled objects. The principles of mapping convergence are described in further detail in Hwanjo Yu, Single-class classification with Mapping Convergence, Machine Learning, 61(1-3):49-69, (2005).
- A process of convergence can then proceed towards an optimal hypothesis.
- Various types of kernel may be employed. For the text content classifier, a linear kernel was found to be effective. In training a linear classifier, the goal is to learn a class predictor γ̂, given a D-dimensional vector x, of the form:
- γ̂(x) = a^T · x + b   (1)
- where a is a D-dimensional vector of parameters, b is an offset parameter, and T denotes the transposition operator.
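Equation (1) is a plain inner product plus offset; a minimal sketch:

```python
def linear_predict(a, b, x):
    """Evaluate the linear class predictor of Equation (1): a^T · x + b."""
    return sum(ai * xi for ai, xi in zip(a, x)) + b
```

For example, with a = (1, 2), b = 0.5 and x = (3, −1) the predictor evaluates to 1.5.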
- An objective of one-class SVMs is to create a hyperplane in feature space that separates the projections of the data from the origin with a large margin. The data is in fact separable from the origin if there exists a normal vector w (perpendicular to the hyperplane) such that the kernel K(w, x_i) > 0, ∀i, where x_i is an object representation, i.e., a point in space (a D-dimensional vector).
- For the social networks classifier, a Gaussian kernel may be more appropriate than a polynomial kernel. For the special case of a Gaussian (Radial Basis Function, or RBF) kernel, useful for the social networks representations, the following two properties hold:
- For K(x_i, x_j) = e^(−γ∥x_i−x_j∥²):
- K(x_i, x_j) > 0 ∀i,j   (2)
- K(x_i, x_i) = 1 ∀i   (3)
- where x_i and x_j are data points (representations of two objects).
- The γ parameter controls the smoothness of the decision boundary. Where there is no sharp boundary between positives and negatives, a value of γ between about 0.1 and 1.0 may be selected.
- This results in all mappings being in the positive orthant and on the unit sphere and being much tighter than for other kernels. The connection between a one-class Support Vector Machine and binary classification is fairly strong. Supposing a parameterization (w, ρ) for the supporting hyperplane of a data set {x_1, . . . , x_l}, where ρ/∥w∥ represents the orthogonal distance from the hyperplane to the space origin 0, then (w, 0) is the parameterization of the maximally separating hyperplane for the labeled data set:
- {(x_1, +1), . . . , (x_l, +1), (−x_1, −1), . . . , (−x_l, −1)}.
- Also, suppose that a maximally separating hyperplane is parameterized by (w, 0) for a data set {(x_1, y_1), . . . , (x_l, y_l)} and with a margin, then the supporting hyperplane for {y_1x_1, . . . , y_lx_l} is parameterized by (w, ρ). For the non-separable case, margin errors in the binary setting correspond to outliers in the one-class case.
- In the Mapping Convergence method, given a hypothesis h, items that are further from the decision boundary are classified with a higher probability. In other words: given a description of the volume where the positive training examples reside, items that are furthest away from it are taken to be most dissimilar to the positive items. Thus, artificial negative items are first created by labeling the ones most dissimilar to the positive training examples. At this initial stage h_0, all other samples lie in a large volume whose boundary encloses the positive examples.
- This first approximation of the negative distribution serves as input for a converging stage to move the boundary towards the positive training examples. An SVM is trained on the positive training set and the constructed negatives. The resulting hypothesis is used to classify the remaining unlabeled items. Any unlabeled items that are classified as negative are added to the negative set. The boundary which most closely fits around the remaining samples thus converges towards the boundary around the known positive samples. The converging stage is iterated until convergence is reached when no new negative items are discovered and the boundary divides the positive and negative data points.
- In the following algorithm, P represents the data set for the positively labeled objects. U represents the data set of unlabeled objects, which at the beginning accounts for the rest of the objects in the entire dataset S. Initially, the set of negatively labeled objects, N, is empty.
- At the beginning, a one-class support vector machine classifier (OC-SVM), C1, provides the first hypothesis. Thereafter, for subsequent iterations, an SVM classifier C2, which uses positive and negative data points, takes over. For the first hypothesis h_0, C1 is trained on the set of positives P to identify a small set of the strongest negatives N̂_0 (e.g., less than 10% of U) from among the unlabeled dataset U. The rest, P̂_0, of the unlabeled dataset is considered positive. Thereafter, while the set of negatives N̂_i at iteration i is not empty, the second classifier is trained on the respective current P and N sets to produce a hypothesis in which the most negative data points in the remaining positive set are labeled negative.
-
Algorithm 1: Mapping Convergence
Require: positive data set P; unlabeled data set U; negative data set N = Ø; OC-SVM: C1; SVM: C2
Ensure: boundary function h_i
1. h_0 ← train C1 on P
2. N̂_0 ← strong negatives (≦10%) from U by h_0; P̂_0 ← remaining part of U
3. i ← 0
4. while N̂_i ≠ Ø do
5. N ← N ∪ N̂_i
6. h_i+1 ← train C2 on P and N
7. N̂_i+1 ← negatives from P̂_i by h_i+1; P̂_i+1 ← positives from P̂_i by h_i+1
8. i ← i+1
9. end while
- Starting with an initial hypothesis which places the boundary between positive and negative data points at h_0, each new hypothesis h_i+1 maximizes the margin between h_i and b_p (the boundary for the known positive samples). At each new hypothesis, when the new boundary is not surrounded by any data, it retracts to the nearest point where the data resides.
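Algorithm 1 might be sketched with scikit-learn, using OneClassSVM as C1 and a linear SVC as C2. The synthetic clusters, the `frac0` parameter, and the iteration cap are illustrative assumptions, not the patent's settings.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM

def mapping_convergence(P, U, frac0=0.1, max_iter=10):
    """Sketch of Algorithm 1: an OC-SVM (C1) labels the strongest negatives
    in U, then an ordinary SVM (C2) iteratively converges the boundary."""
    h0 = OneClassSVM(kernel='rbf', gamma='scale').fit(P)
    scores = h0.decision_function(U)          # lower = farther from positives
    k = max(1, int(frac0 * len(U)))
    order = np.argsort(scores)
    N = U[order[:k]]                          # initial artificial negatives
    P_hat = U[order[k:]]                      # still assumed positive
    for _ in range(max_iter):                 # cap iterations (early stopping)
        if len(P_hat) == 0:
            break
        X = np.vstack([P, N])
        y = np.r_[np.ones(len(P)), -np.ones(len(N))]
        h = SVC(kernel='linear').fit(X, y)
        pred = h.predict(P_hat)
        new_neg = P_hat[pred == -1]
        if len(new_neg) == 0:                 # convergence: no new negatives
            break
        N = np.vstack([N, new_neg])
        P_hat = P_hat[pred == 1]
    return N, P_hat

# Illustrative data: positives cluster near (5,5); the unlabeled set mixes
# that cluster with a distant one near the origin.
P = np.array([[5.0, 5.0], [5.2, 5.0], [4.8, 5.0], [5.0, 5.2], [5.0, 4.8],
              [5.1, 5.1], [4.9, 4.9], [5.2, 4.8], [4.8, 5.2], [5.0, 5.1]])
U = np.array([[5.05, 5.0], [4.95, 5.05], [5.1, 4.9], [5.0, 4.95], [4.9, 5.1],
              [0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [0.1, -0.2], [0.0, 0.3]])
N_out, P_out = mapping_convergence(P, U, frac0=0.2)
```

On this toy data the strongest initial negatives come from the origin cluster, and every unlabeled point ends up in exactly one of the two returned sets.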
- The iteration process may be stopped prior to its natural completion to avoid the case where the boundary returns to b_p, which can happen if the gap between positive and negative data points is relatively small. A solution to the problem of over-iteration is finding the hypothesis that maximizes some performance measure. For example, to determine the optimal point to end the Mapping Convergence process, a graph may be plotted with the percentage of the entire data set returned on the horizontal axis, and on the vertical axis, the approximate percentage of the actual positives that is found within that space (which may be estimated by identifying the number of labeled e-mails in all or a sample of the objects returned). An iteration number is selected when the approximate percentage of the positives begins to drop off dramatically with each further iteration. This approach involves identifying the point in the curve that excludes most of the data as negative, while keeping a large part of the positive data classified correctly. This may be after about 1-20 iterations, e.g., up to 10. The number of iterations may be different for the two types of classifier. In particular, the linear classifier used for text representations can reach an optimal point more quickly (fewer iterations) than the Gaussian classifier used for social networks representations.
- As will be appreciated, if a small number of negative samples is available at the outset, the initial negative data set N≠Ø and the initial step of generating a small set of negative samples can be omitted.
- To demonstrate the dynamics of the convergence, random data in ℝ² that also does not offer a large gap between positive and negative data was artificially generated. MC starts out with a conservative hypothesis encompassing all the positive samples and most of the unlabeled data and converges to a solution taking into account the distribution of unlabeled data. In the first mapping stage, artificial negative items are created by labeling the ones most dissimilar to the positive training examples. This first approximation of the negative distribution serves as input for the converging stage to move the boundary towards the positive training examples. An SVM is trained on the positive training set and the constructed negatives. The resulting hypothesis is used to classify the remaining unlabeled items. Any unlabeled items that are classified as negative are added to the negative set. The converging stage is iterated until convergence, reached when no new negative items are discovered and the boundary comes to a halt. However, where there is no clear boundary between actual positives and negatives, over-convergence can result, with boundaries being placed between clusters in the unlabeled data. A performance curve generated for the random data is illustrated in FIG. 4.
-
FIG. 5 schematically illustrates this in two dimensions, with a set of labeled positive data shown by circles and unlabeled data by triangles. A tight boundary 90 around the known positives indicates the OC-SVM. The iterations start with a large percentage of the dataset returned, as indicated by boundary 92 in FIG. 5 and the first square in FIG. 4 at the top right hand corner. Naturally, this large set contains most if not all of the true positives. It also contains a large percentage of what would be negatives, and is thus not very useful. As the iterations proceed, the number of data points returned as "positive" decreases, and some of the actual positives may be lost (outside boundary 94). The point on the curve that is closest to (0,100) may be considered to be optimal in terms of performance criteria, i.e., providing the best classifier with the parameters chosen (boundary 96). If the convergence is stopped on the fifth iteration, giving the classifier closest to the upper left corner of the plot, a fairly accurate description of the data may be obtained. If the iterations continue, over-fitting may occur, as illustrated by the innermost boundaries. - The distance measure can be weighted to assign more importance to recall or precision. However, in one embodiment, the Euclidean distance d to (0,100) on the performance graph is used to identify the closest point at which to stop convergence. In another embodiment, the iteration number is selected to ensure that at least a threshold percentage, e.g., at least 90% or at least 95%, of the labeled positive data points are returned in the "positive" set.
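Choosing the stopping iteration as the point on the performance curve closest to (0,100) can be sketched as below; the example curve values are invented for illustration.

```python
import math

def best_iteration(curve):
    """curve: (pct of data returned, pct of positives retained) per iteration.
    Return the index of the point closest, in Euclidean distance, to the
    ideal corner (0, 100): little data returned, all positives kept."""
    dists = [math.hypot(pct_data, 100.0 - pct_pos) for pct_data, pct_pos in curve]
    return dists.index(min(dists))

# Invented example curve: convergence shrinks the returned set each iteration,
# and the retained positives eventually drop off sharply.
curve = [(90, 100), (60, 99), (30, 97), (12, 95), (8, 80), (5, 55)]
```

On this invented curve the fourth point (index 3) is closest to (0,100), so iteration 3 would be chosen.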
- There is also a trade-off in deciding how much data to label in the mapping stage. A bigger initial set of artificial negatives is more likely to cover the distribution of the negatives. However, putting too many data points in N̂_0 can result in poor performance because the effect will accumulate in convergence. In one embodiment, no more than about 5-10% of the data points are initially assigned to the negative set.
- To improve the retention of actual positives in the set of returned "positives", a cross-validation step may be incorporated into the algorithm. Each step in the Mapping Convergence process may be carried out with a split of the data into several folds, such as from four to ten folds (parts). Using a higher number of folds in the cross-validation reduces irregularities, but comes with a higher computational cost. For example, a 5-fold split over the positive data may be used. A hypothesis is trained on four of the parts, and a prediction is made on the remaining fifth part and the entire unlabeled set. This results in exactly one prediction per item in the positive set, and after aggregating the five predictions for the items in the unlabeled set, a single prediction is generated there too. This allows a solid estimate of the performance of the hypothesis to be obtained.
- Other methods of assessing the performance at an iteration include fitting a logistic function to the output of an SVM, which directly outputs the distance to the decision plane. This distance is a measure of the confidence of the prediction. After scaling to (0,1), the predictions of multiple classifiers can be aggregated, for example by taking averages.
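A sketch of the logistic scaling and averaging described above (the function names are hypothetical):

```python
import math

def to_confidence(margin):
    """Logistic squashing of a signed SVM margin into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))

def combine(confidences):
    """Aggregate multiple classifiers' scaled predictions by averaging."""
    return sum(confidences) / len(confidences)
```

A margin of zero maps to 0.5, negative margins below it and positive margins above it, so the average of two classifiers lets the more confident one dominate.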
- The discussion in the previous section focuses on obtaining a good performance from a single classifier, even when very little labeled data is available.
- Combining different classifiers to improve overall performance is known as Ensemble Learning. Various ways are contemplated for incorporating the two modalities into a single overall classification. In a first, naive approach, the outputs of the two classifiers are combined. In a second approach, the MC algorithm is combined with co-training.
- Naïve Combination
- For example, one classifier is trained on the representations generated in the first modality and a second classifier is trained on the representations generated in the second modality. Both classifiers have classified all items in the test sets, but potentially have made errors. When one of the two has made an error, ideally, it can be corrected by the second. Since the classifiers each output a prediction, such as a number in (0,1) that represents the confidence, these predictions can be averaged over multiple classifiers. The classifier that is most certain will, in the case of an error, correct the other. In this approach,
Algorithm 1 can be used to separately train two classifiers and the optimal combination of iterations of each of the two classifiers selected to produce a combined classifier, which is generally better than either one of the two classifiers. - Another embodiment of the MC algorithm (Algorithm 2) allows different classifier outputs to be taken into account on each of the iterative steps. In this embodiment, the different classifiers cooperate in a manner that resembles co-training (see, for example, Avrim Blum and Tom Mitchell, Combining labeled and unlabeled data with co-training, in Proc. of the Workshop on Computational Learning Theory, Morgan Kaufmann Publishers (1998)). At each step of the iteration, the more confident classifier (the one assigning the higher probability to a data point of being positive or negative) is able to overrule the other. The predictions of the two classifiers are aggregated by an aggregating function. In one embodiment, a fixed percentage of the unlabeled objects is labeled as negative. It is not necessary to continue labeling all data until convergence; only part of the data is needed: the part that both classifiers agree on to be negative.
- In the following algorithm, P again represents the set of positively labeled objects. U represents the set of unlabeled objects, which at the beginning accounts for the rest of the objects in the dataset S.
-
Algorithm 2: Mapping Co-Convergence
Require: n views of the positive data set P^(1), ..., P^(n); n views of the unlabeled data set U^(1), ..., U^(n); n views of the negative data set N^(1) = Ø, ..., N^(n) = Ø; OC-SVM: C1; SVM: C2; Aggregation function: Agg
Ensure: boundary functions h_i^(1), ..., h_i^(n)
1. h_0^(k) ← train C1 on P^(k)  ∀k ε [1,...,n]
2. pred_0^(k) ← predict with h_0^(k) on U^(k)  ∀k ε [1,...,n]
3. N̂_0^(k) ← strong negatives (≦10%) in U^(k) by Agg(pred_0^(1),...,pred_0^(n)); P̂_0^(k) ← remaining part of U^(k)  ∀k ε [1,...,n]
4. i ← 0
5. while N̂_i^(k) ≠ Ø ∀k ε [1,...,n] do
6. N^(k) ← N^(k) ∪ N̂_i^(k)  ∀k ε [1,...,n]
7. h_i+1^(k) ← train C2 on P^(k) and N^(k)  ∀k ε [1,...,n]
8. pred_i+1^(k) ← predict with h_i+1^(k) on P̂_i^(k)  ∀k ε [1,...,n]
9. N̂_i+1^(k) ← strong negatives (≦5%) in P̂_i^(k) by Agg(pred_i+1^(1),..., pred_i+1^(n)); P̂_i+1^(k) ← remaining part of P̂_i^(k)  ∀k ε [1,...,n]
10. i ← i+1
11. end while
- There are several differences with
Algorithm 1, discussed above, which may be noted. First,Algorithm 2 starts with more than one representation for the data objects, i.e., with n different views of them. In the exemplary embodiment, n=2, although it is to be appreciated that any number of modalities may be considered, such as 3, 4, 5, or more. In essence,Algorithm 2 is processing n different convergences simultaneously. - Second, the interaction takes place only by means of the aggregation function that combines the predictions and thus creates a filter that can be used to select the items to label. The exemplary aggregation function simply sums the respective predictions of the two C2 classifiers. However, other aggregation functions may be used which take into account the two predictions, such as a function in which one classifier's prediction is weighted more highly than the other.
- Third, in the convergence phase, not all items that are recognized as negative by one or both of the classifiers are added to the set of negatives N. Only the ones with the highest certainty (as determined by the aggregation) are labeled as negative, limited to a maximum of a certain percentage, here 5%, of the data set which, at the previous iteration, were classed as positive.
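The summing aggregation function and the capped selection of strong negatives might look like this; the `frac` cap and the tie-handling are illustrative assumptions.

```python
def aggregate_sum(prediction_lists):
    """Sum the n classifiers' per-item predictions (the exemplary
    aggregation); a lower aggregate means more confidently negative."""
    return [sum(preds) for preds in zip(*prediction_lists)]

def pick_strong_negatives(items, agg_scores, frac=0.05):
    """Label only the most certainly negative items, capped at `frac`
    of the set; everything else stays in the 'assumed positive' pool."""
    k = max(1, int(frac * len(items)))
    order = sorted(range(len(items)), key=lambda i: agg_scores[i])
    negatives = [items[i] for i in order[:k]]
    remaining = [items[i] for i in order[k:]]
    return negatives, remaining

# Two hypothetical classifiers agree that the second item is negative.
agg = aggregate_sum([[0.9, 0.1, 0.8], [0.8, 0.0, 0.9]])
neg, rest = pick_strong_negatives(['a', 'b', 'c'], agg, frac=0.34)
```

Only the item both classifiers score lowest is labeled negative; the rest remain unlabeled for the next iteration.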
- Without intending to limit the scope of the exemplary embodiment, the following example demonstrates the applicability of the method to a large dataset in which only a few positive examples are available.
- The Enron Corpus can be used to demonstrate improvements in classification performance in a one-class setting by combining classifiers of different modalities. Large bodies of documents, resembling the ones that are encountered when reviewing documents for corporate litigation, rarely appear in the public domain. Consequently, the Enron Corpus (EC) is, with respect to its sheer size and completeness, unique of its kind. Containing all e-mails sent and received by some 150 accounts of the top management of Enron and spanning a period of several years, the total number of messages is about 250,000. Almost no censorship has been applied to the contents, resulting in a vast variety of subjects ranging from business related topics to strictly personal messages.
- Several attempts have been made to manually annotate parts of the Enron Corpus (e.g., Bekkerman, et al., Automatic Categorization of E-mail in Folders: Benchmark Experiments on Enron and SRI Corpora, Technical Report IR-418, CIIR, 2004). All of these are relatively small (several thousands of messages) and typically annotated with subject, emotional tone, or other primary properties. Annotation with respect to responsiveness is generally not publicly available.
- For purposes of evaluation, the lists of government exhibits published on the website of the United States Department of Justice (DOJ) were used as an indicator of responsiveness. In particular, a set of 186 e-mails that has been used in Enron trials were assumed to be responsive, and were labeled as such, with the understanding that this likely represents only a small subset of responsive documents in the overall collection.
- The original data set consists of about 600,000 text files, ordered in 150 folders, each representing an e-mail account. In these folders any original structure the user has created has been preserved. Even though the files are raw text, each contains separately the body and several of the e-mail header fields. Some preprocessing has been performed on the headers (e.g., in some places e-mail addresses have been reconstructed). Attachments are not included.
- The Enron Corpus contains a large number of duplicate messages, ambiguous references to persons and other inconsistencies. The first step of preprocessing the 517,617 files in the database involves unifying identical messages based on title and a digest of the body, immediately reducing the number of messages by a little over 52% (248,155 messages remain).
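The unification step described above can be sketched as follows. This is a minimal illustration, not the patent's actual implementation: the patent specifies only that identical messages are unified based on title and a digest of the body, so the choice of SHA-1 as the digest and the exact key normalization are assumptions.

```python
import hashlib


def message_key(subject, body):
    """Build a deduplication key from the title and a digest of the body.

    The hash function (SHA-1) and the whitespace/case normalization are
    illustrative assumptions; the patent does not specify them.
    """
    digest = hashlib.sha1(body.strip().encode("utf-8")).hexdigest()
    return (subject.strip().lower(), digest)


def deduplicate(messages):
    """Keep one copy of each (title, body-digest) pair."""
    seen, unique = set(), []
    for subject, body in messages:
        key = message_key(subject, body)
        if key not in seen:
            seen.add(key)
            unique.append((subject, body))
    return unique


msgs = [
    ("RE: gas contracts", "Please review the attached."),
    ("RE: gas contracts", "Please review the attached."),  # exact duplicate
    ("RE: gas contracts", "Different body text."),
]
assert len(deduplicate(msgs)) == 2
```

Keying on a digest rather than the full body keeps the seen-set small, which matters when unifying several hundred thousand files.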
- In the reference resolution step, a total of 167,274 different references can be grouped as 114,986 actors. This includes a large number of references that occur only once and a small number of more central actors that are referred to in many different ways.
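Grouping many raw references into a smaller set of actors can be sketched by mapping each reference to a normalized canonical form. The normalization heuristics below (lowercasing, stripping the mail domain, reordering "Last, First") are illustrative assumptions, not the patent's actual resolution rules.

```python
from collections import defaultdict


def normalize(reference):
    """Map a raw reference to a canonical actor name.

    Heuristics shown here are illustrative: lowercase, strip quotes,
    drop the e-mail domain, and reorder "Last, First" to "first last".
    """
    ref = reference.strip().strip('"').lower()
    if "@" in ref:
        ref = ref.split("@", 1)[0]
    ref = ref.replace(".", " ")
    if "," in ref:
        last, first = [p.strip() for p in ref.split(",", 1)]
        ref = f"{first} {last}"
    return " ".join(ref.split())


def group_references(references):
    """Group raw references into actors by their normalized form."""
    actors = defaultdict(list)
    for r in references:
        actors[normalize(r)].append(r)
    return dict(actors)


refs = ["Jeff.Skilling@enron.com", "Skilling, Jeff", '"jeff skilling"']
groups = group_references(refs)
assert len(groups) == 1
assert len(groups["jeff skilling"]) == 3
```

A reference that occurs only once simply forms a singleton actor, matching the long tail described above.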
- By setting a low threshold on activity, this number can be reduced dramatically. Some ambiguity and inconsistency may remain. For example, a considerable number of messages are found to have more than one sender. These errors are generally rare enough to be treated as noise. Plots characterizing the corpus are shown in
FIGS. 6-10. In FIG. 6, as predicted by Zipf's law, the frequency of words is seen to be inversely proportional to their rank based on frequency. FIG. 7 shows the distribution of message sizes. The main peak is around 11 words, with most mass at lengths between 10 and 300 words. It is evident that the average e-mail is a relatively short text, which is one more reason to use other properties in classification. FIG. 8 shows that even though there are some very active actors in the network, most actors send very few e-mails. Similarly, as shown in FIG. 9, the number of recipients per message shows a Zipf-like distribution: there are some e-mails with a very large number of recipients (up to 1000), but most communications are aimed at a small group of recipients. FIG. 10 shows the number of e-mails sent per week over the timeframe covered by the Enron Corpus. Vertical lines indicate weeks in which a message from the DOJ subset appears. It can be seen that the timestamps of e-mails on the exhibit list cluster around crucial dates in the litigation. - In generating the lexicon, words of fewer than three letters were excluded, Porter stemming was applied, and all words were lowercased. The effect of reducing dimensionality by clustering was also evaluated. Clustering words into cliques, as described above, resulted in 6522 clusters. Other methods of reducing dimensionality were also evaluated, such as selecting the most important features (e.g., the top 1000 or top 500).
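The lexicon-generation steps above can be sketched as a small tokenization pipeline. The real pipeline used the Porter stemming algorithm (available, e.g., as `nltk.stem.PorterStemmer`); the trivial suffix rules below are only a stand-in so the sketch stays self-contained.

```python
import re


def tokenize(text, min_len=3):
    """Lowercase, split on non-letter characters, and drop tokens of
    fewer than three letters, as in the lexicon generation above."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if len(t) >= min_len]


def stem(token):
    """Stand-in for Porter stemming: only a couple of suffix rules are
    shown for illustration; the experiments used the full algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lexicon(texts):
    """Collect the set of stemmed tokens across a collection of texts."""
    return sorted({stem(t) for text in texts for t in tokenize(text)})


assert lexicon(["Trading desks traded gas", "a trading desk"]) == ["desk", "gas", "trad"]
```

Note how "trading" and "traded" collapse to the same stem, which is the dimensionality-reducing effect stemming is used for.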
- A framework was developed in the Python language using the LIBSVM library (see, Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for Support Vector Machines, 2001; software available at www.csie.ntu.edu.tw/˜cjlin/libsvm). A linear kernel was used for the text-based feature sets and a Gaussian kernel was used with the social network-based sets.
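The two kernel functions can be written directly; these follow the standard definitions used by LIBSVM (linear: the dot product; Gaussian/RBF: exp(−γ‖x−y‖²)). The dense-list representation is just for illustration; in practice the text vectors are sparse.

```python
import math


def linear_kernel(x, y):
    """Linear kernel (dot product), used for the text-based feature sets."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, gamma=0.1):
    """Gaussian (RBF) kernel, used for the social network-based sets.
    gamma controls the smoothness of the decision boundary; 0.1 is the
    value reported as optimal for this data set."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


assert linear_kernel([1.0, 2.0], [3.0, 0.5]) == 4.0
assert gaussian_kernel([1.0, 0.0], [1.0, 0.0]) == 1.0
assert 0.0 < gaussian_kernel([1.0, 0.0], [0.0, 1.0]) < 1.0
```

The Gaussian kernel returns 1 for identical vectors and decays toward 0 with distance, which is why γ (the decay rate) directly trades off boundary smoothness against fit.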
- Although Support Vector Machines are computationally not very expensive, for practical reasons, experiments were run on a subset of the corpus. In all experiments, the same set of 186 positive examples obtained from the DOJ exhibit list and 1,000 randomly selected unlabeled documents were used. For testing on the positives, 5-fold cross-validation was used. For testing on unlabeled items, 500 items were withheld. In this way, performance curves of the type shown in
FIG. 4 could be generated. - Text and social network feature representations were generated for each e-mail, as described above. Parameters were then tuned to obtain good individual classifiers whose predictions could be combined.
- Text Based Representations
- For the text-based representations, reducing the dimension of the feature space was found not to improve performance (
FIG. 11). Even the OC-SVM performs considerably worse with fewer features, although the convergence itself was not affected. Reducing the text features by semantic clustering also does not appear to provide an improvement for this data set. Overall, the best and most stable performance was obtained using all features with tf-idf feature values. Binary and tf-idf values gave similar results, with tf-idf being slightly better and more stable (FIG. 12). - Social Network-Based Representations
-
FIG. 13 shows the effect of selecting different values of γ. Because of the different nature of the features used in the document representation based on the implicit social network, less tuning is needed: the feature values are fixed, and the effect of reducing the number of features was not evaluated. For the Gaussian kernel, different values of the γ parameter, which controls the smoothness of the decision boundary, were evaluated. The optimal value of γ for the data set, among those tested, was found to be 0.1. Significantly larger values tend to lead to over-fitting, visible as large steps in the convergence. Significantly smaller values tend to lead to under-fitting: good performance until a certain point, with erratic behavior thereafter. - The performance curve for the “best” classifier found for text representations (bag of words, “bow”) is shown in
FIG. 14. It can be seen that during the convergence, performance degrades slowly, with a drop at the end. The objective was to select a classifier just before the drop. Note also that the algorithm clearly outperforms the OC-SVM. The algorithm takes a very large first step in the convergence, yielding a hypothesis that separates out 75.8% of the positives in 16.8% of the data. - The performance curve for the “best” classifier found for the social network representations (“soc”) is shown in
FIG. 15 . Optimally, 71.5% of the positives are selected in 9.4% of the data. - Combining the classifiers described above was then evaluated. It was found that in both combination methods (naïve and MCC), the best results are obtained using the bag-of-words representation with all features and tf-idf values combined with the social network feature set.
- In the first, naive way of combining the two classifiers, their predictions were aggregated and the performance measured. The average of the two predictions was used as the aggregation function. At each iteration, a limited number of the objects that the combined classifiers label as negative with the highest certainty are labeled as negative.
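The naive aggregation step above can be sketched as follows, assuming each classifier emits a signed decision value (negative meaning "predicted negative"); the variable names and the cutoff `k` are illustrative.

```python
def aggregate(pred_text, pred_soc):
    """Naive combination: average the two classifiers' decision values."""
    return [(a + b) / 2.0 for a, b in zip(pred_text, pred_soc)]


def most_certain_negatives(ids, pred_text, pred_soc, k):
    """Return the (at most) k objects whose combined prediction is most
    confidently negative; these are the ones labeled negative this
    iteration."""
    combined = aggregate(pred_text, pred_soc)
    ranked = sorted(zip(ids, combined), key=lambda pair: pair[1])
    return [i for i, score in ranked[:k] if score < 0]


ids = ["d1", "d2", "d3", "d4"]
text_scores = [-0.9, 0.4, -0.2, 0.8]   # hypothetical decision values
soc_scores = [-0.7, -0.6, -0.4, 0.9]
assert most_certain_negatives(ids, text_scores, soc_scores, k=2) == ["d1", "d3"]
```

Averaging means a document is labeled negative early only when both modalities lean negative, which is what moves the combined curve toward the top left of the performance graph.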
- In
FIG. 16 , combined classifiers were obtained by combining the classifiers obtained at the 12th and 13th steps on the social network curve and the 2nd and 3rd steps of the text curve. It can be seen that this results in a movement towards the top left of the graph, indicating a classification that takes less data while retrieving more of the positives. - In the second approach, the hypotheses appearing on the curve are based on a small part of the data. The performance is excellent, providing above 90% recall of the positives while discarding over 90% of the data.
-
FIG. 17 shows the results obtained by using co-training (MCC) with Algorithm 2. - As explained above, the curves can be compared by comparing their “best” classifiers, taking the Euclidean distance to (0, 100), the perfect classifier, as the measure of comparison. TABLE 1 lists the “best” classifiers of the curves. It can be seen that the combination of social network-based and text-based feature sets does indeed yield very good results.
-
TABLE 1. Comparison of the different classifiers

classifier | A: % of positives captured | B: % of data in which A is captured | distance from optimal (0, 100) on the graph
---|---|---|---
text (bow tf-idf) (FIG. 14) | 75.81 | 16.80 | 29.45
social network (FIG. 15) | 71.51 | 9.40 | 30.00
text + social (naïve) (FIG. 16) | 81.62 | 9.22 | 20.56
text + social (MCC) (FIG. 17) | 90.32 | 6.40 | 11.60

- From the results on the Enron corpus, it can be concluded that the multi-modality approach, using multiple levels of description, yields good results, better than those of a conventional SVM. Combining co-training with the Mapping Convergence framework provides a dramatic improvement in classification results, while the naïve approach provides a more modest improvement.
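The comparison measure can be made concrete. Taking each classifier's operating point as (x = % of the data selected, y = % of the positives captured), as in the prose above (e.g., 75.8% of the positives in 16.8% of the data for the text classifier), the Euclidean distance to the perfect classifier at (0, 100) reproduces the last column of TABLE 1:

```python
import math


def distance_to_optimal(pct_positives, pct_data):
    """Euclidean distance from the operating point
    (x = % of data selected, y = % of positives captured)
    to the perfect classifier at (0, 100)."""
    return math.sqrt(pct_data ** 2 + (100.0 - pct_positives) ** 2)


assert round(distance_to_optimal(75.81, 16.80), 2) == 29.45  # text (bow tf-idf)
assert round(distance_to_optimal(71.51, 9.40), 2) == 30.00   # social network
assert round(distance_to_optimal(81.62, 9.22), 2) == 20.56   # text + social (naive)
assert round(distance_to_optimal(90.32, 6.40), 2) == 11.60   # text + social (MCC)
```

A smaller distance means a classifier that captures more positives in less data, so the MCC combination (11.60) is the clear winner.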
- It appears that in the mapping phase of the framework, a conservative approach is less likely to result in compounding effects of an initial bad classification. It is, however, desirable to include enough data in the first approximation of the negative set to support convergence. In experiments on the Enron corpus, it appears that labeling 5-10% with a one-class classifier is usually enough to keep the convergence going and avoid undesirable compounding effects.
- Even though there are relatively few parameters to be selected, the algorithms are somewhat sensitive to their values. For any new dataset, experiments to identify suitable values, such as a suitable value of γ, as illustrated in
FIG. 13, may therefore be appropriate. - The use of cross-validation appears to improve results with a corpus in which much of the data is unlabeled. Five-fold cross-validation was used in the present example, with the split made randomly at every step in the convergence. Discrepancy between runs could be reduced by using a greater number of folds, e.g., 10-fold cross-validation or higher, although at a higher computational cost.
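The random k-fold split re-drawn at every step can be sketched as follows (the fold-assignment scheme via strided slicing is an illustrative choice; any balanced random partition would do):

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Randomly partition items into k roughly equal folds.

    Re-invoking this with a fresh seed at every convergence step gives
    the per-step random split described above.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# 186 positives split 5 ways: every item lands in exactly one fold,
# and fold sizes differ by at most one.
folds = k_fold_splits(range(186), k=5)
assert len(folds) == 5
assert sum(len(f) for f in folds) == 186
assert max(len(f) for f in folds) - min(len(f) for f in folds) <= 1
```

With 186 positives, 5-fold splitting holds out about 37 positives per fold, so per-run variance is substantial; more folds shrink the held-out set and the run-to-run discrepancy at extra cost.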
- Because only a very small initial positive set is available, randomly selecting only a subset of the entire data set as the unlabeled set may have advantages. For similar data sets, the large initial corpus could be randomly subdivided into subsets and the method performed, as described above, for each of the subsets, using the same set of initial positives. The outputs of positives for the subsets could then be combined to generate a set of objects for review by appropriately trained personnel. Alternatively, a classifier trained on one subset of unlabeled objects could be used to label the entire corpus of unlabeled objects.
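The subdivide-and-combine strategy above can be sketched as follows. The `classify` callable stands in for a classifier trained per subset with the same initial positives; treating it as a single function here is a simplifying assumption for illustration.

```python
import random


def partition(unlabeled_ids, subset_size, seed=0):
    """Randomly subdivide the unlabeled corpus into subsets of roughly
    subset_size each (a sketch of the strategy described above)."""
    ids = list(unlabeled_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i:i + subset_size] for i in range(0, len(ids), subset_size)]


def combined_positives(subsets, classify):
    """Union the predicted positives from each subset's run into one
    set of objects for review. `classify` stands in for the classifier
    trained on that subset with the same initial positives."""
    found = set()
    for subset in subsets:
        found.update(i for i in subset if classify(i))
    return found


subsets = partition(range(10), subset_size=4)
assert sum(len(s) for s in subsets) == 10

# hypothetical classifier: flags even ids as positive
assert combined_positives(subsets, lambda i: i % 2 == 0) == {0, 2, 4, 6, 8}
```

Because the subsets are disjoint, the union never double-counts a document, and each per-subset run stays small enough to train quickly.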
- It will be appreciated that various of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Also that various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.
Claims (21)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/608,143 US8386574B2 (en) | 2009-10-29 | 2009-10-29 | Multi-modality classification for one-class classification in social networks |
Publications (2)
Publication Number | Publication Date |
---|---|
US20110103682A1 true US20110103682A1 (en) | 2011-05-05 |
US8386574B2 US8386574B2 (en) | 2013-02-26 |
Family
ID=43925504
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/608,143 Active 2031-08-31 US8386574B2 (en) | 2009-10-29 | 2009-10-29 | Multi-modality classification for one-class classification in social networks |
Country Status (1)
Country | Link |
---|---|
US (1) | US8386574B2 (en) |
Cited By (242)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120158525A1 (en) * | 2010-12-20 | 2012-06-21 | Yahoo! Inc. | Automatic classification of display ads using ad images and landing pages |
US20120166942A1 (en) * | 2010-12-22 | 2012-06-28 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US20120166348A1 (en) * | 2010-12-26 | 2012-06-28 | International Business Machines Corporation | Statistical analysis of data records for automatic determination of activity of non-customers |
US20120284381A1 (en) * | 2011-05-03 | 2012-11-08 | Xerox Corporation | Systems, methods and devices for extracting and visualizing user-centric communities from emails |
US20130013706A1 (en) * | 2011-07-07 | 2013-01-10 | International Business Machines Corporation | Method for determining interpersonal relationship influence information using textual content from interpersonal interactions |
US8433762B1 (en) * | 2009-11-20 | 2013-04-30 | Facebook Inc. | Generation of nickname dictionary based on analysis of user communications |
US8620842B1 (en) | 2013-03-15 | 2013-12-31 | Gordon Villy Cormack | Systems and methods for classifying electronic information using advanced active learning techniques |
US8694980B2 (en) | 2012-06-26 | 2014-04-08 | International Business Machines Corporation | Efficient egonet computation in a weighted directed graph |
US8701032B1 (en) | 2012-10-16 | 2014-04-15 | Google Inc. | Incremental multi-word recognition |
US20140172754A1 (en) * | 2012-12-14 | 2014-06-19 | International Business Machines Corporation | Semi-supervised data integration model for named entity classification |
US8782549B2 (en) | 2012-10-05 | 2014-07-15 | Google Inc. | Incremental feature-based gesture-keyboard decoding |
US8819574B2 (en) * | 2012-10-22 | 2014-08-26 | Google Inc. | Space prediction for text input |
US8843845B2 (en) | 2012-10-16 | 2014-09-23 | Google Inc. | Multi-gesture text input prediction |
US8850350B2 (en) | 2012-10-16 | 2014-09-30 | Google Inc. | Partial gesture text entry |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US20150006635A1 (en) * | 2013-06-27 | 2015-01-01 | National Taiwan University | Global relationship model and a relationship search method for internet social networks |
US20150095316A1 (en) * | 2010-04-09 | 2015-04-02 | Microsoft Technology Licensing, Llc. | Web-Scale Entity Relationship Extraction |
US9021380B2 (en) | 2012-10-05 | 2015-04-28 | Google Inc. | Incremental multi-touch gesture recognition |
US9081500B2 (en) | 2013-05-03 | 2015-07-14 | Google Inc. | Alternative hypothesis error correction for gesture typing |
US20150234928A1 (en) * | 2009-09-29 | 2015-08-20 | At&T Intellectual Property I, Lp | Method and apparatus to identify outliers in social networks |
US9183193B2 (en) | 2013-02-12 | 2015-11-10 | Xerox Corporation | Bag-of-repeats representation of documents |
US20150379158A1 (en) * | 2014-06-27 | 2015-12-31 | Gabriel G. Infante-Lopez | Systems and methods for pattern matching and relationship discovery |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US9268749B2 (en) | 2013-10-07 | 2016-02-23 | Xerox Corporation | Incremental computation of repeats |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US20160224516A1 (en) * | 2015-01-30 | 2016-08-04 | Xerox Corporation | Method and system to attribute metadata to preexisting documents |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9454771B1 (en) * | 2011-03-31 | 2016-09-27 | Twitter, Inc. | Temporal features in a messaging platform |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9547439B2 (en) | 2013-04-22 | 2017-01-17 | Google Inc. | Dynamically-positioned character string suggestions for gesture typing |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US20170076178A1 (en) * | 2015-09-14 | 2017-03-16 | International Business Machines Corporation | System, method, and recording medium for efficient cohesive subgraph identification in entity collections for inlier and outlier detection |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US20170206453A1 (en) * | 2016-01-19 | 2017-07-20 | International Business Machines Corporation | System and method of inferring synonyms using ensemble learning techniques |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9760546B2 (en) | 2013-05-24 | 2017-09-12 | Xerox Corporation | Identifying repeat subsequences by left and right contexts |
CN107169049A (en) * | 2017-04-25 | 2017-09-15 | 腾讯科技(深圳)有限公司 | The label information generation method and device of application |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9830311B2 (en) | 2013-01-15 | 2017-11-28 | Google Llc | Touch keyboard using language and spatial models |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
CN107748901A (en) * | 2017-11-24 | 2018-03-02 | 东北大学 | The industrial process method for diagnosing faults returned based on similitude local spline |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10062039B1 (en) * | 2017-06-28 | 2018-08-28 | CS Disco, Inc. | Methods and apparatus for asynchronous and interactive machine learning using word embedding within text-based documents and multimodal documents |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10067965B2 (en) | 2016-09-26 | 2018-09-04 | Twiggle Ltd. | Hierarchic model and natural language analyzer |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US10078802B2 (en) * | 2013-01-09 | 2018-09-18 | Peking University Founder Group Co., Ltd. | Method and system of discovering and analyzing structures of user groups in microblog |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10108902B1 (en) * | 2017-09-18 | 2018-10-23 | CS Disco, Inc. | Methods and apparatus for asynchronous and interactive machine learning using attention selection techniques |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US10146855B2 (en) | 2011-03-31 | 2018-12-04 | Twitter, Inc. | Content resonance |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10204143B1 (en) | 2011-11-02 | 2019-02-12 | Dub Software Group, Inc. | System and method for automatic document management |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10229117B2 (en) | 2015-06-19 | 2019-03-12 | Gordon V. Cormack | Systems and methods for conducting a highly autonomous technology-assisted review classification |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10248667B1 (en) | 2013-03-15 | 2019-04-02 | Twitter, Inc. | Pre-filtering in a messaging platform |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10268766B2 (en) * | 2016-09-26 | 2019-04-23 | Twiggle Ltd. | Systems and methods for computation of a semantic representation |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10282462B2 (en) * | 2016-10-31 | 2019-05-07 | Walmart Apollo, Llc | Systems, method, and non-transitory computer-readable storage media for multi-modal product classification |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10311046B2 (en) | 2016-09-12 | 2019-06-04 | Conduent Business Services, Llc | System and method for pruning a set of symbol-based sequences by relaxing an independence assumption of the sequences |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US10332518B2 (en) | 2017-05-09 | 2019-06-25 | Apple Inc. | User interface for correcting recognition errors |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
WO2019165000A1 (en) * | 2018-02-20 | 2019-08-29 | Jackson James R | Systems and methods for generating a relationship among a plurality of data sets to generate a desired attribute value |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10403283B1 (en) | 2018-06-01 | 2019-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
CN110447039A (en) * | 2017-03-23 | 2019-11-12 | 北京嘀嘀无限科技发展有限公司 | The system and method for predicting object type |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10482118B2 (en) * | 2017-06-14 | 2019-11-19 | Sap Se | Document representation for machine-learning document classification |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
CN110532390A (en) * | 2019-08-26 | 2019-12-03 | 南京邮电大学 | A kind of news keyword extracting method based on NER and Complex Networks Feature |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
CN110781407A (en) * | 2019-10-21 | 2020-02-11 | 腾讯科技(深圳)有限公司 | User label generation method and device and computer readable storage medium |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10650408B1 (en) | 2013-03-15 | 2020-05-12 | Twitter, Inc. | Budget smoothing in a messaging platform |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
CN111274376A (en) * | 2020-01-21 | 2020-06-12 | 支付宝(杭州)信息技术有限公司 | Method and system for training label prediction model |
US10684703B2 (en) | 2018-06-01 | 2020-06-16 | Apple Inc. | Attention aware virtual assistant dismissal |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10699236B2 (en) | 2015-10-17 | 2020-06-30 | Tata Consultancy Services Limited | System for standardization of goal setting in performance appraisal process |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10748546B2 (en) | 2017-05-16 | 2020-08-18 | Apple Inc. | Digital assistant services based on device capabilities |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10755229B2 (en) | 2018-04-11 | 2020-08-25 | International Business Machines Corporation | Cognitive fashion-ability score driven fashion merchandising acquisition |
US10769677B1 (en) * | 2011-03-31 | 2020-09-08 | Twitter, Inc. | Temporal features in a messaging platform |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10789945B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Low-latency intelligent automated assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US10893099B2 (en) | 2012-02-13 | 2021-01-12 | SkyKick, Inc. | Migration project automation, e.g., automated selling, planning, migration and configuration of email systems |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10956928B2 (en) | 2018-05-17 | 2021-03-23 | International Business Machines Corporation | Cognitive fashion product advertisement system and method |
US10963744B2 (en) * | 2018-06-27 | 2021-03-30 | International Business Machines Corporation | Cognitive automated and interactive personalized fashion designing using cognitive fashion scores and cognitive analysis of fashion trends and data |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11010127B2 (en) | 2015-06-29 | 2021-05-18 | Apple Inc. | Virtual assistant for media playback |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US11036800B1 (en) * | 2016-04-29 | 2021-06-15 | Veritas Technologies Llc | Systems and methods for clustering data to improve data analytics |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US20210272013A1 (en) * | 2020-02-27 | 2021-09-02 | S&P Global | Concept modeling system |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US11205103B2 (en) | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US11231904B2 (en) | 2015-03-06 | 2022-01-25 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11270211B2 (en) * | 2018-02-05 | 2022-03-08 | Microsoft Technology Licensing, Llc | Interactive semantic data exploration for error discovery |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11314370B2 (en) | 2013-12-06 | 2022-04-26 | Apple Inc. | Method for extracting salient dialog usage from live data |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
CN114757271A (en) * | 2022-04-06 | 2022-07-15 | 扬州大学 | Social network node classification method and system based on multi-channel graph convolution network |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
WO2022177928A1 (en) * | 2021-02-16 | 2022-08-25 | Carnegie Mellon University | System and method for reducing false positives in object detection frameworks |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11495218B2 (en) | 2018-06-01 | 2022-11-08 | Apple Inc. | Virtual assistant operation in multi-device environments |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11538083B2 (en) | 2018-05-17 | 2022-12-27 | International Business Machines Corporation | Cognitive fashion product recommendation system, computer program product, and method |
CN115630160A (en) * | 2022-12-08 | 2023-01-20 | 四川大学 | Dispute focus clustering method and system based on semi-supervised co-occurrence graph model |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11895074B2 (en) * | 2022-05-31 | 2024-02-06 | Microsoft Technology Licensing, Llc | Systems and methods for determining scores for messages based on actions of message recipients and a network graph |
US11948347B2 (en) | 2020-04-10 | 2024-04-02 | Samsung Display Co., Ltd. | Fusion model training using distance metrics |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8514226B2 (en) * | 2008-09-30 | 2013-08-20 | Verizon Patent And Licensing Inc. | Methods and systems of graphically conveying a strength of communication between users |
US20120297038A1 (en) * | 2011-05-16 | 2012-11-22 | Microsoft Corporation | Recommendations for Social Network Based on Low-Rank Matrix Recovery |
US9367526B1 (en) * | 2011-07-26 | 2016-06-14 | Nuance Communications, Inc. | Word classing for language modeling |
US9165069B2 (en) * | 2013-03-04 | 2015-10-20 | Facebook, Inc. | Ranking videos for a user |
US9082047B2 (en) | 2013-08-20 | 2015-07-14 | Xerox Corporation | Learning beautiful and ugly visual attributes |
US10043112B2 (en) * | 2014-03-07 | 2018-08-07 | Qualcomm Incorporated | Photo management |
US9760619B1 (en) | 2014-04-29 | 2017-09-12 | Google Inc. | Generating weighted clustering coefficients for a social network graph |
US9875301B2 (en) | 2014-04-30 | 2018-01-23 | Microsoft Technology Licensing, Llc | Learning multimedia semantics from large-scale unstructured data |
US9785866B2 (en) | 2015-01-22 | 2017-10-10 | Microsoft Technology Licensing, Llc | Optimizing multi-class multimedia data classification using negative data |
US10013637B2 (en) | 2015-01-22 | 2018-07-03 | Microsoft Technology Licensing, Llc | Optimizing multi-class image classification using patch features |
US9942186B2 (en) | 2015-08-27 | 2018-04-10 | International Business Machines Corporation | Email chain navigation |
US10296846B2 (en) * | 2015-11-24 | 2019-05-21 | Xerox Corporation | Adapted domain specific class means classifier |
US10878336B2 (en) * | 2016-06-24 | 2020-12-29 | Intel Corporation | Technologies for detection of minority events |
US11182804B2 (en) | 2016-11-17 | 2021-11-23 | Adobe Inc. | Segment valuation in a digital medium environment |
CN107944489B (en) * | 2017-11-17 | 2018-10-16 | 清华大学 | Extensive combination chart feature learning method based on structure semantics fusion |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050076240A1 (en) * | 2003-04-02 | 2005-04-07 | Barry Appleman | Degrees of separation for handling communications |
US20050076241A1 (en) * | 2003-04-02 | 2005-04-07 | Barry Appelman | Degrees of separation for handling communications |
US6938024B1 (en) * | 2000-05-04 | 2005-08-30 | Microsoft Corporation | Transmitting information given constrained resources |
US20060190481A1 (en) * | 2003-01-24 | 2006-08-24 | Aol Llc | Classifier Tuning Based On Data Similarities |
US7167866B2 (en) * | 2004-01-23 | 2007-01-23 | Microsoft Corporation | Selective multi level expansion of data base via pivot point data |
US20080052398A1 (en) * | 2006-05-30 | 2008-02-28 | International Business Machines Corporation | Method, system and computer program for classifying email |
US20080069456A1 (en) * | 2006-09-19 | 2008-03-20 | Xerox Corporation | Bags of visual context-dependent words for generic visual categorization |
US20080086431A1 (en) * | 2006-09-15 | 2008-04-10 | Icebreaker, Inc. | Social interaction messaging and notification |
US7386527B2 (en) * | 2002-12-06 | 2008-06-10 | Kofax, Inc. | Effective multi-class support vector machine classification |
US20090086720A1 (en) * | 2007-09-28 | 2009-04-02 | Cisco Technology, Inc. | Identity association within a communication system |
US20090144033A1 (en) * | 2007-11-30 | 2009-06-04 | Xerox Corporation | Object comparison, retrieval, and categorization methods and apparatuses |
US20090319518A1 (en) * | 2007-01-10 | 2009-12-24 | Nick Koudas | Method and system for information discovery and text analysis |
US20100082751A1 (en) * | 2008-09-29 | 2010-04-01 | Microsoft Corporation | User perception of electronic messaging |
US7725475B1 (en) * | 2004-02-11 | 2010-05-25 | Aol Inc. | Simplifying lexicon creation in hybrid duplicate detection and inductive classifier systems |
US8195754B2 (en) * | 2009-02-13 | 2012-06-05 | Massachusetts Institute Of Technology | Unsolicited message communication characteristics |
- 2009-10-29: US application US12/608,143 filed; granted as US8386574B2 (status: Active)
Patent Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6938024B1 (en) * | 2000-05-04 | 2005-08-30 | Microsoft Corporation | Transmitting information given constrained resources |
US7386527B2 (en) * | 2002-12-06 | 2008-06-10 | Kofax, Inc. | Effective multi-class support vector machine classification |
US20060190481A1 (en) * | 2003-01-24 | 2006-08-24 | Aol Llc | Classifier Tuning Based On Data Similarities |
US7945674B2 (en) * | 2003-04-02 | 2011-05-17 | Aol Inc. | Degrees of separation for handling communications |
US20050076241A1 (en) * | 2003-04-02 | 2005-04-07 | Barry Appelman | Degrees of separation for handling communications |
US20050076240A1 (en) * | 2003-04-02 | 2005-04-07 | Barry Appleman | Degrees of separation for handling communications |
US8185638B2 (en) * | 2003-04-02 | 2012-05-22 | Aol Inc. | Degrees of separation for handling communications |
US20110196939A1 (en) * | 2003-04-02 | 2011-08-11 | Aol Inc. | Degrees of separation for handling communications |
US7949759B2 (en) * | 2003-04-02 | 2011-05-24 | AOL, Inc. | Degrees of separation for handling communications |
US7167866B2 (en) * | 2004-01-23 | 2007-01-23 | Microsoft Corporation | Selective multi level expansion of data base via pivot point data |
US7725475B1 (en) * | 2004-02-11 | 2010-05-25 | Aol Inc. | Simplifying lexicon creation in hybrid duplicate detection and inductive classifier systems |
US20080052398A1 (en) * | 2006-05-30 | 2008-02-28 | International Business Machines Corporation | Method, system and computer program for classifying email |
US20080086431A1 (en) * | 2006-09-15 | 2008-04-10 | Icebreaker, Inc. | Social interaction messaging and notification |
US20080069456A1 (en) * | 2006-09-19 | 2008-03-20 | Xerox Corporation | Bags of visual context-dependent words for generic visual categorization |
US20090319518A1 (en) * | 2007-01-10 | 2009-12-24 | Nick Koudas | Method and system for information discovery and text analysis |
US20090086720A1 (en) * | 2007-09-28 | 2009-04-02 | Cisco Technology, Inc. | Identity association within a communication system |
US20090144033A1 (en) * | 2007-11-30 | 2009-06-04 | Xerox Corporation | Object comparison, retrieval, and categorization methods and apparatuses |
US20100082751A1 (en) * | 2008-09-29 | 2010-04-01 | Microsoft Corporation | User perception of electronic messaging |
US8195754B2 (en) * | 2009-02-13 | 2012-06-05 | Massachusetts Institute Of Technology | Unsolicited message communication characteristics |
Cited By (409)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US11928604B2 (en) | 2005-09-08 | 2024-03-12 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US8930191B2 (en) | 2006-09-08 | 2015-01-06 | Apple Inc. | Paraphrasing of user requests and results by automated digital assistant |
US9117447B2 (en) | 2006-09-08 | 2015-08-25 | Apple Inc. | Using event alert text as input to an automated assistant |
US8942986B2 (en) | 2006-09-08 | 2015-01-27 | Apple Inc. | Determining user intent based on ontologies of domains |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11012942B2 (en) | 2007-04-03 | 2021-05-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US10475446B2 (en) | 2009-06-05 | 2019-11-12 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9665651B2 (en) | 2009-09-29 | 2017-05-30 | At&T Intellectual Property I, L.P. | Method and apparatus to identify outliers in social networks |
US9443024B2 (en) * | 2009-09-29 | 2016-09-13 | At&T Intellectual Property I, Lp | Method and apparatus to identify outliers in social networks |
US9965563B2 (en) | 2009-09-29 | 2018-05-08 | At&T Intellectual Property I, L.P. | Method and apparatus to identify outliers in social networks |
US20150234928A1 (en) * | 2009-09-29 | 2015-08-20 | At&T Intellectual Property I, Lp | Method and apparatus to identify outliers in social networks |
US8433762B1 (en) * | 2009-11-20 | 2013-04-30 | Facebook Inc. | Generation of nickname dictionary based on analysis of user communications |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10741185B2 (en) | 2010-01-18 | 2020-08-11 | Apple Inc. | Intelligent automated assistant |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US8903716B2 (en) | 2010-01-18 | 2014-12-02 | Apple Inc. | Personalized vocabulary for digital assistant |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US9548050B2 (en) | 2010-01-18 | 2017-01-17 | Apple Inc. | Intelligent automated assistant |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US10692504B2 (en) | 2010-02-25 | 2020-06-23 | Apple Inc. | User profiling for voice input processing |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US10049675B2 (en) | 2010-02-25 | 2018-08-14 | Apple Inc. | User profiling for voice input processing |
US9317569B2 (en) * | 2010-04-09 | 2016-04-19 | Microsoft Technology Licensing, Llc | Displaying search results with edges/entity relationships in regions/quadrants on a display device |
US20150095316A1 (en) * | 2010-04-09 | 2015-04-02 | Microsoft Technology Licensing, Llc. | Web-Scale Entity Relationship Extraction |
US8732014B2 (en) * | 2010-12-20 | 2014-05-20 | Yahoo! Inc. | Automatic classification of display ads using ad images and landing pages |
US20120158525A1 (en) * | 2010-12-20 | 2012-06-21 | Yahoo! Inc. | Automatic classification of display ads using ad images and landing pages |
US20120166942A1 (en) * | 2010-12-22 | 2012-06-28 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US10762293B2 (en) * | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US20120166348A1 (en) * | 2010-12-26 | 2012-06-28 | International Business Machines Corporation | Statistical analysis of data records for automatic determination of activity of non-customers |
US10102359B2 (en) | 2011-03-21 | 2018-10-16 | Apple Inc. | Device access using voice authentication |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10417405B2 (en) | 2011-03-21 | 2019-09-17 | Apple Inc. | Device access using voice authentication |
US10146855B2 (en) | 2011-03-31 | 2018-12-04 | Twitter, Inc. | Content resonance |
US9892431B1 (en) * | 2011-03-31 | 2018-02-13 | Twitter, Inc. | Temporal features in a messaging platform |
US10970312B2 (en) | 2011-03-31 | 2021-04-06 | Twitter, Inc. | Content resonance |
US9454771B1 (en) * | 2011-03-31 | 2016-09-27 | Twitter, Inc. | Temporal features in a messaging platform |
US10769677B1 (en) * | 2011-03-31 | 2020-09-08 | Twitter, Inc. | Temporal features in a messaging platform |
US8700756B2 (en) * | 2011-05-03 | 2014-04-15 | Xerox Corporation | Systems, methods and devices for extracting and visualizing user-centric communities from emails |
US20120284381A1 (en) * | 2011-05-03 | 2012-11-08 | Xerox Corporation | Systems, methods and devices for extracting and visualizing user-centric communities from emails |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US11350253B2 (en) | 2011-06-03 | 2022-05-31 | Apple Inc. | Active transport based notifications |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US20130013680A1 (en) * | 2011-07-07 | 2013-01-10 | International Business Machines Corporation | System and method for determining interpersonal relationship influence information using textual content from interpersonal interactions |
US20130013706A1 (en) * | 2011-07-07 | 2013-01-10 | International Business Machines Corporation | Method for determining interpersonal relationship influence information using textual content from interpersonal interactions |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10204143B1 (en) | 2011-11-02 | 2019-02-12 | Dub Software Group, Inc. | System and method for automatic document management |
US10965742B2 (en) * | 2012-02-13 | 2021-03-30 | SkyKick, Inc. | Migration project automation, e.g., automated selling, planning, migration and configuration of email systems |
US11265376B2 (en) | 2012-02-13 | 2022-03-01 | Skykick, Llc | Migration project automation, e.g., automated selling, planning, migration and configuration of email systems |
US10893099B2 (en) | 2012-02-13 | 2021-01-12 | SkyKick, Inc. | Migration project automation, e.g., automated selling, planning, migration and configuration of email systems |
US11069336B2 (en) | 2012-03-02 | 2021-07-20 | Apple Inc. | Systems and methods for name pronunciation |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US8694980B2 (en) | 2012-06-26 | 2014-04-08 | International Business Machines Corporation | Efficient egonet computation in a weighted directed graph |
US8694979B2 (en) | 2012-06-26 | 2014-04-08 | International Business Machines Corporation | Efficient egonet computation in a weighted directed graph |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US9552080B2 (en) | 2012-10-05 | 2017-01-24 | Google Inc. | Incremental feature-based gesture-keyboard decoding |
US8782549B2 (en) | 2012-10-05 | 2014-07-15 | Google Inc. | Incremental feature-based gesture-keyboard decoding |
US9021380B2 (en) | 2012-10-05 | 2015-04-28 | Google Inc. | Incremental multi-touch gesture recognition |
US9542385B2 (en) | 2012-10-16 | 2017-01-10 | Google Inc. | Incremental multi-word recognition |
US10140284B2 (en) | 2012-10-16 | 2018-11-27 | Google Llc | Partial gesture text entry |
US9710453B2 (en) | 2012-10-16 | 2017-07-18 | Google Inc. | Multi-gesture text input prediction |
US8701032B1 (en) | 2012-10-16 | 2014-04-15 | Google Inc. | Incremental multi-word recognition |
US9678943B2 (en) | 2012-10-16 | 2017-06-13 | Google Inc. | Partial gesture text entry |
US11379663B2 (en) | 2012-10-16 | 2022-07-05 | Google Llc | Multi-gesture text input prediction |
US10489508B2 (en) | 2012-10-16 | 2019-11-26 | Google Llc | Incremental multi-word recognition |
US10977440B2 (en) | 2012-10-16 | 2021-04-13 | Google Llc | Multi-gesture text input prediction |
US8843845B2 (en) | 2012-10-16 | 2014-09-23 | Google Inc. | Multi-gesture text input prediction |
US9798718B2 (en) | 2012-10-16 | 2017-10-24 | Google Inc. | Incremental multi-word recognition |
US8850350B2 (en) | 2012-10-16 | 2014-09-30 | Google Inc. | Partial gesture text entry |
US9134906B2 (en) | 2012-10-16 | 2015-09-15 | Google Inc. | Incremental multi-word recognition |
US10019435B2 (en) | 2012-10-22 | 2018-07-10 | Google Llc | Space prediction for text input |
US8819574B2 (en) * | 2012-10-22 | 2014-08-26 | Google Inc. | Space prediction for text input |
US20140172754A1 (en) * | 2012-12-14 | 2014-06-19 | International Business Machines Corporation | Semi-supervised data integration model for named entity classification |
US9292797B2 (en) * | 2012-12-14 | 2016-03-22 | International Business Machines Corporation | Semi-supervised data integration model for named entity classification |
US10078802B2 (en) * | 2013-01-09 | 2018-09-18 | Peking University Founder Group Co., Ltd. | Method and system of discovering and analyzing structures of user groups in microblog |
US11727212B2 (en) | 2013-01-15 | 2023-08-15 | Google Llc | Touch keyboard using a trained model |
US9830311B2 (en) | 2013-01-15 | 2017-11-28 | Google Llc | Touch keyboard using language and spatial models |
US10528663B2 (en) | 2013-01-15 | 2020-01-07 | Google Llc | Touch keyboard using language and spatial models |
US11334717B2 (en) | 2013-01-15 | 2022-05-17 | Google Llc | Touch keyboard using a trained model |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US10714117B2 (en) | 2013-02-07 | 2020-07-14 | Apple Inc. | Voice trigger for a digital assistant |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US9183193B2 (en) | 2013-02-12 | 2015-11-10 | Xerox Corporation | Bag-of-repeats representation of documents |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US8713023B1 (en) | 2013-03-15 | 2014-04-29 | Gordon Villy Cormack | Systems and methods for classifying electronic information using advanced active learning techniques |
US9678957B2 (en) | 2013-03-15 | 2017-06-13 | Gordon Villy Cormack | Systems and methods for classifying electronic information using advanced active learning techniques |
US11080340B2 (en) | 2013-03-15 | 2021-08-03 | Gordon Villy Cormack | Systems and methods for classifying electronic information using advanced active learning techniques |
US8620842B1 (en) | 2013-03-15 | 2013-12-31 | Gordon Villy Cormack | Systems and methods for classifying electronic information using advanced active learning techniques |
US10600080B1 (en) | 2013-03-15 | 2020-03-24 | Twitter, Inc. | Overspend control in a messaging platform |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US10650408B1 (en) | 2013-03-15 | 2020-05-12 | Twitter, Inc. | Budget smoothing in a messaging platform |
US10248667B1 (en) | 2013-03-15 | 2019-04-02 | Twitter, Inc. | Pre-filtering in a messaging platform |
US10769661B1 (en) | 2013-03-15 | 2020-09-08 | Twitter, Inc. | Real time messaging platform |
US10963922B1 (en) | 2013-03-15 | 2021-03-30 | Twitter, Inc. | Campaign goal setting in a messaging platform |
US11288702B1 (en) | 2013-03-15 | 2022-03-29 | Twitter, Inc. | Exploration in a real time messaging platform |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US10692114B1 (en) | 2013-03-15 | 2020-06-23 | Twitter, Inc. | Exploration in a real time messaging platform |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11409717B1 (en) | 2013-03-15 | 2022-08-09 | Twitter, Inc. | Overspend control in a messaging platform |
US8838606B1 (en) | 2013-03-15 | 2014-09-16 | Gordon Villy Cormack | Systems and methods for classifying electronic information using advanced active learning techniques |
US11157464B1 (en) | 2013-03-15 | 2021-10-26 | Twitter, Inc. | Pre-filtering of candidate messages for message streams in a messaging platform |
US11216841B1 (en) | 2013-03-15 | 2022-01-04 | Twitter, Inc. | Real time messaging platform |
US9122681B2 (en) | 2013-03-15 | 2015-09-01 | Gordon Villy Cormack | Systems and methods for classifying electronic information using advanced active learning techniques |
US9547439B2 (en) | 2013-04-22 | 2017-01-17 | Google Inc. | Dynamically-positioned character string suggestions for gesture typing |
US10241673B2 (en) | 2013-05-03 | 2019-03-26 | Google Llc | Alternative hypothesis error correction for gesture typing |
US9081500B2 (en) | 2013-05-03 | 2015-07-14 | Google Inc. | Alternative hypothesis error correction for gesture typing |
US9841895B2 (en) | 2013-05-03 | 2017-12-12 | Google Llc | Alternative hypothesis error correction for gesture typing |
US9760546B2 (en) | 2013-05-24 | 2017-09-12 | Xerox Corporation | Identifying repeat subsequences by left and right contexts |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10769385B2 (en) | 2013-06-09 | 2020-09-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11048473B2 (en) | 2013-06-09 | 2021-06-29 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US20150006635A1 (en) * | 2013-06-27 | 2015-01-01 | National Taiwan University | Global relationship model and a relationship search method for internet social networks |
US9477994B2 (en) * | 2013-06-27 | 2016-10-25 | National Taiwan University | Global relationship model and a relationship search method for internet social networks |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US9268749B2 (en) | 2013-10-07 | 2016-02-23 | Xerox Corporation | Incremental computation of repeats |
US11314370B2 (en) | 2013-12-06 | 2022-04-26 | Apple Inc. | Method for extracting salient dialog usage from live data |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US10878809B2 (en) | 2014-05-30 | 2020-12-29 | Apple Inc. | Multi-command single utterance input method |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US10714095B2 (en) | 2014-05-30 | 2020-07-14 | Apple Inc. | Intelligent assistant for home automation |
US10699717B2 (en) | 2014-05-30 | 2020-06-30 | Apple Inc. | Intelligent assistant for home automation |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US10417344B2 (en) | 2014-05-30 | 2019-09-17 | Apple Inc. | Exemplar-based natural language processing |
US10657966B2 (en) | 2014-05-30 | 2020-05-19 | Apple Inc. | Better resolution when referencing to concepts |
US20150379158A1 (en) * | 2014-06-27 | 2015-12-31 | Gabriel G. Infante-Lopez | Systems and methods for pattern matching and relationship discovery |
US10262077B2 (en) * | 2014-06-27 | 2019-04-16 | Intel Corporation | Systems and methods for pattern matching and relationship discovery |
US9668024B2 (en) | 2014-06-30 | 2017-05-30 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US10453443B2 (en) | 2014-09-30 | 2019-10-22 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US10438595B2 (en) | 2014-09-30 | 2019-10-08 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US10390213B2 (en) | 2014-09-30 | 2019-08-20 | Apple Inc. | Social reminders |
US11556230B2 (en) | 2014-12-02 | 2023-01-17 | Apple Inc. | Data detection |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US20160224516A1 (en) * | 2015-01-30 | 2016-08-04 | Xerox Corporation | Method and system to attribute metadata to preexisting documents |
US10325511B2 (en) * | 2015-01-30 | 2019-06-18 | Conduent Business Services, Llc | Method and system to attribute metadata to preexisting documents |
US11231904B2 (en) | 2015-03-06 | 2022-01-25 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US10930282B2 (en) | 2015-03-08 | 2021-02-23 | Apple Inc. | Competing devices responding to voice triggers |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10529332B2 (en) | 2015-03-08 | 2020-01-07 | Apple Inc. | Virtual assistant activation |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11127397B2 (en) | 2015-05-27 | 2021-09-21 | Apple Inc. | Device voice control |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10681212B2 (en) | 2015-06-05 | 2020-06-09 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10671675B2 (en) | 2015-06-19 | 2020-06-02 | Gordon V. Cormack | Systems and methods for a scalable continuous active learning approach to information classification |
US10229117B2 (en) | 2015-06-19 | 2019-03-12 | Gordon V. Cormack | Systems and methods for conducting a highly autonomous technology-assisted review classification |
US10242001B2 (en) | 2015-06-19 | 2019-03-26 | Gordon V. Cormack | Systems and methods for conducting and terminating a technology-assisted review |
US10353961B2 (en) | 2015-06-19 | 2019-07-16 | Gordon V. Cormack | Systems and methods for conducting and terminating a technology-assisted review |
US10445374B2 (en) | 2015-06-19 | 2019-10-15 | Gordon V. Cormack | Systems and methods for conducting and terminating a technology-assisted review |
US11010127B2 (en) | 2015-06-29 | 2021-05-18 | Apple Inc. | Virtual assistant for media playback |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US20170076178A1 (en) * | 2015-09-14 | 2017-03-16 | International Business Machines Corporation | System, method, and recording medium for efficient cohesive subgraph identification in entity collections for inlier and outlier detection |
US9852359B2 (en) * | 2015-09-14 | 2017-12-26 | International Business Machines Corporation | System, method, and recording medium for efficient cohesive subgraph identification in entity collections for inlier and outlier detection |
US20180060694A1 (en) * | 2015-09-14 | 2018-03-01 | International Business Machines Corporation | System, method, and recording medium for efficient cohesive subgraph identification in entity collections for inlier and outlier detection |
US10282636B2 (en) * | 2015-09-14 | 2019-05-07 | International Business Machines Corporation | System, method, and recording medium for efficient cohesive subgraph identification in entity collections for inlier and outlier detection |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10699236B2 (en) | 2015-10-17 | 2020-06-30 | Tata Consultancy Services Limited | System for standardization of goal setting in performance appraisal process |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US10354652B2 (en) | 2015-12-02 | 2019-07-16 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10942703B2 (en) | 2015-12-23 | 2021-03-09 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US20170206453A1 (en) * | 2016-01-19 | 2017-07-20 | International Business Machines Corporation | System and method of inferring synonyms using ensemble learning techniques |
US10832146B2 (en) * | 2016-01-19 | 2020-11-10 | International Business Machines Corporation | System and method of inferring synonyms using ensemble learning techniques |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US11036800B1 (en) * | 2016-04-29 | 2021-06-15 | Veritas Technologies Llc | Systems and methods for clustering data to improve data analytics |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple Inc. | Intelligent automated assistant for media exploration |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US10580409B2 (en) | 2016-06-11 | 2020-03-03 | Apple Inc. | Application integration with a digital assistant |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US10942702B2 (en) | 2016-06-11 | 2021-03-09 | Apple Inc. | Intelligent device arbitration and control |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10311046B2 (en) | 2016-09-12 | 2019-06-04 | Conduent Business Services, Llc | System and method for pruning a set of symbol-based sequences by relaxing an independence assumption of the sequences |
US10553215B2 (en) | 2016-09-23 | 2020-02-04 | Apple Inc. | Intelligent automated assistant |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10067965B2 (en) | 2016-09-26 | 2018-09-04 | Twiggle Ltd. | Hierarchic model and natural language analyzer |
US10268766B2 (en) * | 2016-09-26 | 2019-04-23 | Twiggle Ltd. | Systems and methods for computation of a semantic representation |
US10282462B2 (en) * | 2016-10-31 | 2019-05-07 | Walmart Apollo, Llc | Systems, method, and non-transitory computer-readable storage media for multi-modal product classification |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US11205103B2 (en) | 2016-12-09 | 2021-12-21 | The Research Foundation for The State University of New York | Semisupervised autoencoder for sentiment analysis |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US11656884B2 (en) | 2017-01-09 | 2023-05-23 | Apple Inc. | Application integration with a digital assistant |
CN110447039A (en) * | 2017-03-23 | 2019-11-12 | 北京嘀嘀无限科技发展有限公司 | System and method for predicting object type |
CN107169049A (en) * | 2017-04-25 | 2017-09-15 | 腾讯科技(深圳)有限公司 | Method and device for generating label information for an application |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10332518B2 (en) | 2017-05-09 | 2019-06-25 | Apple Inc. | User interface for correcting recognition errors |
US10741181B2 (en) | 2017-05-09 | 2020-08-11 | Apple Inc. | User interface for correcting recognition errors |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US10847142B2 (en) | 2017-05-11 | 2020-11-24 | Apple Inc. | Maintaining privacy of personal information |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10789945B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Low-latency intelligent automated assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10909171B2 (en) | 2017-05-16 | 2021-02-02 | Apple Inc. | Intelligent automated assistant for media exploration |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US10748546B2 (en) | 2017-05-16 | 2020-08-18 | Apple Inc. | Digital assistant services based on device capabilities |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10482118B2 (en) * | 2017-06-14 | 2019-11-19 | Sap Se | Document representation for machine-learning document classification |
US11270225B1 (en) * | 2017-06-28 | 2022-03-08 | CS Disco, Inc. | Methods and apparatus for asynchronous and interactive machine learning using word embedding within text-based documents and multimodal documents |
US10062039B1 (en) * | 2017-06-28 | 2018-08-28 | CS Disco, Inc. | Methods and apparatus for asynchronous and interactive machine learning using word embedding within text-based documents and multimodal documents |
US10108902B1 (en) * | 2017-09-18 | 2018-10-23 | CS Disco, Inc. | Methods and apparatus for asynchronous and interactive machine learning using attention selection techniques |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
CN107748901A (en) * | 2017-11-24 | 2018-03-02 | 东北大学 | Industrial process fault diagnosis method based on similarity local spline regression |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US20220156598A1 (en) * | 2018-02-05 | 2022-05-19 | Microsoft Technology Licensing, Llc | Interactive semantic data exploration for error discovery |
US11803763B2 (en) * | 2018-02-05 | 2023-10-31 | Microsoft Technology Licensing, Llc | Interactive semantic data exploration for error discovery |
US11270211B2 (en) * | 2018-02-05 | 2022-03-08 | Microsoft Technology Licensing, Llc | Interactive semantic data exploration for error discovery |
WO2019165000A1 (en) * | 2018-02-20 | 2019-08-29 | Jackson James R | Systems and methods for generating a relationship among a plurality of data sets to generate a desired attribute value |
US11341513B2 (en) | 2018-02-20 | 2022-05-24 | James R Jackson | Systems and methods for generating a relationship among a plurality of datasets to generate a desired attribute value |
US11900396B2 (en) | 2018-02-20 | 2024-02-13 | James R Jackson | Systems and methods for generating a relationship among a plurality of datasets to generate a desired attribute value |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US10755229B2 (en) | 2018-04-11 | 2020-08-25 | International Business Machines Corporation | Cognitive fashion-ability score driven fashion merchandising acquisition |
US10891585B2 (en) | 2018-04-11 | 2021-01-12 | International Business Machines Corporation | Cognitive fashion-ability score driven fashion merchandising acquisition |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11538083B2 (en) | 2018-05-17 | 2022-12-27 | International Business Machines Corporation | Cognitive fashion product recommendation system, computer program product, and method |
US10956928B2 (en) | 2018-05-17 | 2021-03-23 | International Business Machines Corporation | Cognitive fashion product advertisement system and method |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US10403283B1 (en) | 2018-06-01 | 2019-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10720160B2 (en) | 2018-06-01 | 2020-07-21 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11495218B2 (en) | 2018-06-01 | 2022-11-08 | Apple Inc. | Virtual assistant operation in multi-device environments |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10684703B2 (en) | 2018-06-01 | 2020-06-16 | Apple Inc. | Attention aware virtual assistant dismissal |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US10944859B2 (en) | 2018-06-03 | 2021-03-09 | Apple Inc. | Accelerated task performance |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US10504518B1 (en) | 2018-06-03 | 2019-12-10 | Apple Inc. | Accelerated task performance |
US10963744B2 (en) * | 2018-06-27 | 2021-03-30 | International Business Machines Corporation | Cognitive automated and interactive personalized fashion designing using cognitive fashion scores and cognitive analysis of fashion trends and data |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11360739B2 (en) | 2019-05-31 | 2022-06-14 | Apple Inc. | User activity shortcut suggestions |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
CN110532390A (en) * | 2019-08-26 | 2019-12-03 | 南京邮电大学 | A kind of news keyword extracting method based on NER and Complex Networks Feature |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
CN110781407A (en) * | 2019-10-21 | 2020-02-11 | 腾讯科技(深圳)有限公司 | User label generation method and device and computer readable storage medium |
CN111274376A (en) * | 2020-01-21 | 2020-06-12 | 支付宝(杭州)信息技术有限公司 | Method and system for training label prediction model |
US20210272013A1 (en) * | 2020-02-27 | 2021-09-02 | S&P Global | Concept modeling system |
US11948347B2 (en) | 2020-04-10 | 2024-04-02 | Samsung Display Co., Ltd. | Fusion model training using distance metrics |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
WO2022177928A1 (en) * | 2021-02-16 | 2022-08-25 | Carnegie Mellon University | System and method for reducing false positives in object detection frameworks |
CN114757271A (en) * | 2022-04-06 | 2022-07-15 | 扬州大学 | Social network node classification method and system based on a multi-channel graph convolutional network |
US11895074B2 (en) * | 2022-05-31 | 2024-02-06 | Microsoft Technology Licensing, Llc | Systems and methods for determining scores for messages based on actions of message recipients and a network graph |
CN115630160A (en) * | 2022-12-08 | 2023-01-20 | 四川大学 | Dispute focus clustering method and system based on semi-supervised co-occurrence graph model |
Also Published As
Publication number | Publication date |
---|---|
US8386574B2 (en) | 2013-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8386574B2 (en) | Multi-modality classification for one-class classification in social networks | |
Mujtaba et al. | Email classification research trends: review and open issues | |
Guzella et al. | A review of machine learning approaches to spam filtering | |
Aborisade et al. | Classification for authorship of tweets by comparing logistic regression and naive bayes classifiers | |
Sharmin et al. | Spam detection in social media employing machine learning tool for text mining | |
Jaspers et al. | Machine learning techniques for the automation of literature reviews and systematic reviews in EFSA | |
Almeida et al. | Facing the spammers: A very effective approach to avoid junk e-mails | |
Almeida et al. | Filtering spams using the minimum description length principle | |
Kaya et al. | A novel approach for spam email detection based on shifted binary patterns | |
Mocherla et al. | Evaluation of naive bayes and support vector machines for wikipedia | |
Almeida et al. | Compression‐based spam filter | |
Alkaht et al. | Filtering spam using several stages neural networks | |
Dai et al. | A deep forest method for classifying e-commerce products by using title information | |
Hovelynck et al. | Multi-modality in one-class classification | |
Nisar et al. | Voting-ensemble classification for email spam detection | |
Amayri et al. | Beyond hybrid generative discriminative learning: spherical data classification | |
You et al. | Web service-enabled spam filtering with naive Bayes classification | |
Bhardwaj | Sentiment Analysis and Text Classification for Social Media Contents Using Machine Learning Techniques | |
Kaur et al. | E-mail spam detection using refined mlp with feature selection | |
Trivedi et al. | A modified content-based evolutionary approach to identify unsolicited emails | |
Medlock | Investigating classification for natural language processing tasks | |
Santos et al. | Spam filtering through anomaly detection | |
Naravajhula et al. | Spam classification: genetically optimized passive-aggressive approach | |
Ali | Automatic complaint classification system using classifier ensembles | |
Al-Ghamdi et al. | Digital Forensics and Machine Learning to Fraudulent Email Prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: XEROX CORPORATION, CONNECTICUT Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHIDLOVSKII, BORIS;HOVELYNCK, MATTHIJS;REEL/FRAME:023441/0653 Effective date: 20091028 |
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
FPAY | Fee payment |
Year of fee payment: 4 |
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |
AS | Assignment |
Owner name: CITIBANK, N.A., AS AGENT, DELAWARE Free format text: SECURITY INTEREST;ASSIGNOR:XEROX CORPORATION;REEL/FRAME:062740/0214 Effective date: 20221107 |
AS | Assignment |
Owner name: XEROX CORPORATION, CONNECTICUT Free format text: RELEASE OF SECURITY INTEREST IN PATENTS AT R/F 062740/0214;ASSIGNOR:CITIBANK, N.A., AS AGENT;REEL/FRAME:063694/0122 Effective date: 20230517 |
AS | Assignment |
Owner name: CITIBANK, N.A., AS COLLATERAL AGENT, NEW YORK Free format text: SECURITY INTEREST;ASSIGNOR:XEROX CORPORATION;REEL/FRAME:064760/0389 Effective date: 20230621 |
AS | Assignment |
Owner name: JEFFERIES FINANCE LLC, AS COLLATERAL AGENT, NEW YORK Free format text: SECURITY INTEREST;ASSIGNOR:XEROX CORPORATION;REEL/FRAME:065628/0019 Effective date: 20231117 |
AS | Assignment |
Owner name: CITIBANK, N.A., AS COLLATERAL AGENT, NEW YORK Free format text: SECURITY INTEREST;ASSIGNOR:XEROX CORPORATION;REEL/FRAME:066741/0001 Effective date: 20240206 |