WO2003032107A2 - Method and system for monitoring e-mail - Google Patents

Method and system for monitoring e-mail

Publication number
WO2003032107A2
Authority
WO
WIPO (PCT)
Prior art keywords
mail
documents
document
confidential
vector
Prior art date
Application number
PCT/KR2002/001882
Other languages
French (fr)
Other versions
WO2003032107A3 (en)
Inventor
Bog-Ju Lee
Soon-Kyu Choi
Original Assignee
Ecabin Inc.
Priority date
Filing date
Publication date
Application filed by Ecabin Inc.
Priority to AU2002362631A
Publication of WO2003032107A2
Publication of WO2003032107A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/212 Monitoring or handling of messages using filtering or selective blocking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/109 Time management, e.g. calendars, reminders, meetings or time accounting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/214 Monitoring or handling of messages using selective forwarding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/06 Message adaptation to terminal or network requirements
    • H04L51/066 Format adaptation, e.g. format conversion or compression

Definitions

  • the present invention relates in general to a method and a system for monitoring e-mails, and more particularly, to an e-mail monitoring method and system which can efficiently monitor whether confidential documents of a group are sent out through e-mails by automatically learning the concept of confidential documents and general documents and by classifying an e-mail on the basis of the learning result.
  • E-mail over a network is used not only for sending messages but also for sending files. An e-mail reaches its recipient in little time, can be sent to many people at once, and can be stored as data. For these reasons, e-mail is widely used. If executives and/or employees of an enterprise send confidential documents by e-mail, intentionally or not, the enterprise runs the risk of leaking its secrets. Accordingly, the enterprise prepares a system for monitoring whether outgoing e-mails include any confidential information.
  • an object of the present invention is to provide a method and a system for monitoring emails, and more particularly, the e-mail monitoring method and system which can monitor efficiently if the confidential documents of a group are sent out through e-mails by learning the concept of confidential documents and general documents automatically and classifying e-mails on the basis of learning results.
  • an e-mail monitoring method for monitoring an e-mail sent out from predetermined group comprising the steps of classifying documents of the group into confidential documents or general documents as a level of security demands; converting the document into a form applicable to a Support Vector Machine (SVM) algorithm; calculating a Hyper-Plane classifying the documents into the confidential documents or the general documents and a Support Vector which is a vector of a nearest document to the Hyper-plane by learning the documents with the SVM algorithm; sniffing the e-mail sent from an inside of the group to an outside; converting the sniffed e-mail into the form applicable to the SVM algorithm; and applying the SVM algorithm to both the Support Vector calculated from a result by learning and the e-mail converted to a vector type and discriminating if the sniffed e-mail includes the confidential documents.
  • SVM Support Vector Machine
  • the step of converting the document into a form applicable to the SVM algorithm can comprise the steps of reading words included in the document and the e-mails; converting the read words into prescribed values; and indicating the document and the e-mail as a vector type with the words converted into the prescribed values.
  • the e-mail monitoring method further comprises the step of reporting an analyzed result after analyzing if the sniffed e-mail is the confidential document, so that the sent e-mail is monitored in real time.
  • a monitoring system for monitoring an e-mail sent out from a predetermined group comprising a document database for storing documents in the group which are classified into confidential documents or general documents according to a level of security demands; a sniffer for sniffing the e-mail which is being sent out from an inside of the group; an e-mail database for storing the sniffed e-mail; a vector generator for converting words included in the document database and the e-mail database into a form applicable to a Support Vector Machine (SVM) algorithm; a vector database for storing vectors converted by the vector generator; a learner for learning the document of the document database converted by the vector generator with the SVM algorithm; a learning result database for storing a Hyper-Plane and a Support Vector which are learning results of the learner; a discriminator for discriminating if the sniffed e-mail is the confidential document by applying the SVM algorithm to the support vector calculated from the learning result and the
  • FIG. 1 is a block diagram of an e-mail monitoring system according to the present invention
  • FIG. 2 is a detailed block diagram of the monitoring server in FIG. 1;
  • FIG. 3 is a flow chart which describes an e-mail monitoring method on the basis of the e-mail monitoring system
  • an e-mail monitoring system comprises an enterprise intranet 1 and a mail server 5 which is connected to each of the client terminals 3 in the enterprise intranet via an outside network.
  • the outside network includes not only the Internet but also other networks such as a LAN, a WAN, a PSTN (Public Switched Telephone Network), a PSDN (Public Switched Data Network), a cable network, and a wireless communications network.
  • the enterprise intranet 1 comprises an e-mail monitoring server 2 to monitor if e-mails sent by the client terminal 3 via the enterprise intranet 1 or other network include any confidential document.
  • the e-mail monitoring server 2 applies a Support Vector Machine (SVM) algorithm to learning process and discriminating process for classifying confidential documents.
  • SVM Support Vector Machine
  • V.Vapnik Support Vector Machines
  • a text categorization method with the SVM is described in abundant literature such as Thorsten Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features", LS-8 Report 23, Dortmund, 27 November 1997 (revised 19 April 1998); T. Joachims, "A Probabilistic Analysis of the Rocchio Algorithm with TF*IDF for Text Categorization", in International Conference on Machine Learning (ICML), 1997; G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, New York, 1983; J. Platt, "Fast Training of SVMs Using Sequential Minimal Optimization", to be published in Advances in Kernel Methods - Support Vector Machine Learning, B. Schölkopf, C. Burges and A. Smola, eds., MIT Press, Cambridge, Mass., 1998.
  • documents can be categorized into two types, for example as follows. First, words are read from the categorized documents and converted into prescribed values, and each document is represented as a vector of those values. Because each document contains many words, the coordinate system in which the document vectors are represented is a multidimensional space, and the more documents are learned, the higher its dimension becomes. When the documents are placed in this coordinate system according to their vector values, a Hyper-Plane that separates the documents into two categories is calculated, together with the Support Vectors, which are the vectors of the documents nearest to the Hyper-Plane. This series of processes is carried out by application software implementing the SVM algorithm.
  • the e-mail monitoring server 2 of the e-mail monitoring system comprises a document indexer 11 for registering the documents classified into general documents or confidential documents according to a level of security demands of the employees and the executives, a document database 13 for storing the documents classified by the document indexer 11, a sniffer 19 for sniffing the e-mails sent from each of the client terminals 3 in the enterprise to the mail server 5, an e-mail database 21 for storing the sniffed e-mails, a vector generator 23 for converting the words included in the e-mails or the documents into vector types, a vector database 25 for storing the documents or e-mails converted into vector types, a learner 15 for learning the documents converted into vector types by the vector generator 23, a learning result database 17 for storing the learning result of the learner 15, a discriminator 27 for discriminating if the sniffed
  • the document indexer 11 registers the documents classified into the general documents or the confidential documents to the document database 13.
  • the document indexer 11 is implemented as web-based software for registering documents. If documents are subdivided by division or by job characteristic when they are registered by the indexer 11, the accuracy of learning may increase.
  • in case the contents of the confidential documents vary because the organization is large, it is desirable to classify documents and register them for each division.
  • a scheme in which only confidential documents, and not general documents, are registered can be used.
  • all documents except those classified as confidential documents are treated as general documents.
  • the confidential documents of A are the documents registered in A as confidential documents, and the general documents of A could be the confidential documents of B and C.
  • the general documents of B could be the confidential documents of A and C.
  • each division can manage the document database 13 without registering the general documents separately.
  • the learner 15 learns the documents converted into vector types by the vector generator 23. That is, the learner 15 applies the SVM to the documents converted into vector types by the vector generator 23, calculates the Hyper-Plane and the Support Vector, and then stores them in the learning result database 17, wherein the Hyper-Plane classifies the vector type-converted documents into the confidential documents or the general documents and the Support Vector is the vector of the nearest document to the Hyper-Plane.
  • the learner 15 can be operated by administrators of the e-mail monitoring server 2 once more than a predetermined amount of documents has been collected. The learner can also be operated automatically at every predetermined period.
  • the sniffer 19 sniffs the e-mails sent out and stores the sniffed e-mails in the e-mail database 21.
  • the sniffer 19 uses a technology that monitors network communication packets in the network and reads only the packets corresponding to e-mails. It is most desirable that the sniffer 19 is devised to minimize alteration of the network architecture and the network load, according to the network architecture of the enterprise, by combining TCP-Based Sniffing in the form of a simple wiretap with ARP-Based Sniffing, where a sniffer assumes the role of a logical gateway.
  • the sniffer 19 can read all e-mails sent by protocols such as SMTP, POP3, and HTTP (also including web mail). Additionally, the sniffer 19 can read not only the body of an e-mail but also attached files.
  • the vector generator 23 reads words from the documents stored in the document database 13 and from the e-mails stored in the e-mail database 21. Further, it converts the read words into the prescribed values. Then the vector generator 23 converts the words converted into prescribed values into vector types applicable to the SVM algorithm.
  • the discriminator 27 discriminates if the sniffed e-mails are the confidential documents by applying the SVM algorithm to the support vector calculated from the learning result and the vector type-converted e-mails. It then stores the result thereof in the discrimination result database 29.
  • the discriminator 27 applies the learning model of each division to the sniffed e-mails and discriminates the e-mails as the confidential documents even if there is only one confidential document among them.
  • the controller 10 selectively reads, as needed and by type, the confidential documents and the general documents registered in the document database 13 through the indexer, converts the documents into a form applicable to the SVM algorithm of the learner 15, and provides the learner 15 with the converted documents. Thereby, the controller 10 has the learning result of the learner 15 stored in the learning result database 17 as a file.
  • the learning result is indicated as the Hyper-Plane classifying the vector type-converted documents into the confidential documents or the general documents and the Support Vectors which are the vectors of the nearest documents to the Hyper-Plane.
  • the controller 10 converts the e-mails which are sniffed by the sniffer 19 and stored in the e-mail database 21 into a form applicable to the SVM algorithm of the learner 15.
  • the controller 10 provides the converted form to the discriminator 27 and, at the same time, provides the Hyper-Plane and the Support Vector stored in the learning result database 17 to the discriminator 27, whereby the controller 10 makes the discriminator 27 analyze if the sniffed e-mails are classified into the confidential documents.
  • the controller 10 makes the report generator 31 notify a user of the analysis result, that is, whether the e-mails include the confidential documents, as discriminated by the discriminator 27 and stored in the discrimination result database 29, whereby the controller 10 can monitor if the sent e-mails include confidential documents.
  • the learner 15 calculates the Hyper-Plane and the Support Vector by learning the confidential documents and general documents with the SVM algorithm (S10), wherein the Hyper-Plane classifies the vector type-converted documents into the confidential documents or the general documents and the Support Vector is the vector of the nearest document to the Hyper-Plane. Then, the Hyper-Plane and the Support Vectors thereof are stored in the learning result database 17 (S20).
  • the e-mails sent to the outside of the enterprise are sniffed by the sniffer 19 and stored in the e-mail database 21 (S30).
  • the sniffed e-mails are converted into a form applicable to the SVM algorithm by the vector generator 23 (S40).
  • the discriminator 27 discriminates if the sniffed e-mails are confidential documents by applying the SVM algorithm to the support vector calculated from the learning result and the vector type-converted e-mails (S50). If the e-mails are discriminated as confidential documents according to the analysis result of the discriminator 27, the result values are stored in the discrimination result database 29 (S60); otherwise the result values discriminated as general documents are stored in the discrimination result database 29 (S70).
  • the controller 10 shows the result values as graphs of various kinds by operating the report generator 31 (S80).
  • the present invention provides the method and the system which learn the concept of confidential documents and general documents automatically with the SVM, sniff the sent e-mails and discriminate if the sniffed e-mails are confidential documents on the basis of learning results.
  • the present invention can provide the e-mail monitoring method and system which can monitor efficiently if the confidential documents of a group are sent out through e-mails by learning the concept of confidential documents and general documents automatically and then, classifying an e-mail on the basis of learning results.

Abstract

The e-mail monitoring method comprises the steps of classifying documents of the group into confidential documents or general documents; converting the documents into a form applicable to an SVM algorithm; calculating a Hyper-Plane and a Support Vector by learning the documents with the SVM algorithm; sniffing the e-mail sent out from the group; converting the sniffed e-mail into the form applicable to the SVM algorithm; and applying the SVM algorithm to both the Support Vector calculated as a result of learning and the e-mail converted to a vector type, thereby discriminating whether the sniffed e-mail includes the confidential documents. The present invention can thereby provide an e-mail monitoring method and system which can efficiently monitor whether the confidential documents of a group are sent out through e-mails by automatically learning the concept of confidential documents and general documents and then classifying an e-mail on the basis of the learning results.

Description

METHOD AND SYSTEM FOR MONITORING E-MAIL
FIELD OF THE INVENTION
The present invention relates in general to a method and a system for monitoring e-mails, and more particularly to an e-mail monitoring method and system that can efficiently monitor whether confidential documents of a group are sent out through e-mails, by automatically learning the concept of confidential documents and general documents and by classifying an e-mail on the basis of the learning result.
BACKGROUND ART
E-mail over a network is used not only for sending messages but also for sending files. An e-mail reaches its recipient in little time, can be sent to many people at once, and can be stored as data. For these reasons, e-mail is widely used. If executives or employees of an enterprise send confidential documents by e-mail, intentionally or not, the enterprise runs the risk of leaking its secrets. Accordingly, enterprises deploy systems for monitoring whether outgoing e-mails include confidential information. Under the conventional security systems, administrators had to read the words included in confidential documents, build a database of those words, and then determine whether an e-mail included confidential documents according to whether the sent e-mail included words stored in the database. E-mails found to include confidential documents were then classified separately and managed.
Under these conventional security systems, however, administrators must pick out the principal words from an enormous volume of enterprise documents entirely by hand. This is tedious and time-consuming, so management costs increase.
It is also difficult for administrators to decide which words to extract when they review confidential documents.
DISCLOSURE OF INVENTION
Accordingly, the present invention has been made keeping in mind the above-described shortcomings and users' needs, and an object of the present invention is to provide a method and a system for monitoring e-mails, and more particularly an e-mail monitoring method and system which can efficiently monitor whether the confidential documents of a group are sent out through e-mails by automatically learning the concept of confidential documents and general documents and classifying e-mails on the basis of the learning results.
This and other objects of the present invention may be accomplished by the provision of an e-mail monitoring method for monitoring an e-mail sent out from a predetermined group, comprising the steps of classifying documents of the group into confidential documents or general documents according to a level of security demands; converting the documents into a form applicable to a Support Vector Machine (SVM) algorithm; calculating, by learning the documents with the SVM algorithm, a Hyper-Plane classifying the documents into the confidential documents or the general documents and a Support Vector which is the vector of the nearest document to the Hyper-Plane; sniffing the e-mail sent from the inside of the group to the outside; converting the sniffed e-mail into the form applicable to the SVM algorithm; and applying the SVM algorithm to both the Support Vector calculated as a result of learning and the e-mail converted to a vector type, thereby discriminating whether the sniffed e-mail includes the confidential documents.
Herein, the step of converting the document into a form applicable to the SVM algorithm can comprise the steps of reading words included in the document and the e-mails; converting the read words into prescribed values; and indicating the document and the e-mail as a vector type with the words converted into the prescribed values.
Preferably, the e-mail monitoring method further comprises the step of reporting an analyzed result after analyzing whether the sniffed e-mail is the confidential document, so that the sent e-mail is monitored in real time.
Also, this and other objects of the present invention may be accomplished by the provision of a monitoring system for monitoring an e-mail sent out from a predetermined group, comprising a document database for storing documents in the group which are classified into confidential documents or general documents according to a level of security demands; a sniffer for sniffing the e-mail which is being sent out from the inside of the group; an e-mail database for storing the sniffed e-mail; a vector generator for converting words included in the document database and the e-mail database into a form applicable to a Support Vector Machine (SVM) algorithm; a vector database for storing vectors converted by the vector generator; a learner for learning the documents of the document database converted by the vector generator with the SVM algorithm; a learning result database for storing a Hyper-Plane and a Support Vector which are the learning results of the learner; a discriminator for discriminating whether the sniffed e-mail is the confidential document by applying the SVM algorithm to the support vector calculated from the learning result and the e-mail converted to a vector type; and a report generator notifying a user of a discriminated result analyzed by the discriminator.
BRIEF DESCRIPTION OF DRAWINGS
The present invention will be better understood and its various objects and advantages will be more fully appreciated from the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram of an e-mail monitoring system according to the present invention;
FIG. 2 is a detailed block diagram of the monitoring server in FIG. 1; and
FIG. 3 is a flow chart which describes an e-mail monitoring method on the basis of the e-mail monitoring system.
MODES FOR CARRYING OUT THE INVENTION
Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the accompanying drawings, in which the same components are denoted by the same reference numerals.
Referring to FIG. 1 and FIG. 2, an e-mail monitoring system comprises an enterprise intranet 1 and a mail server 5 which is connected to each of the client terminals 3 in the enterprise intranet via an outside network. The outside network includes not only the Internet but also other networks such as a LAN, a WAN, a PSTN (Public Switched Telephone Network), a PSDN (Public Switched Data Network), a cable network, and a wireless communications network.
The enterprise intranet 1 comprises an e-mail monitoring server 2 to monitor whether e-mails sent by the client terminal 3 via the enterprise intranet 1 or another network include any confidential document. The e-mail monitoring server 2 applies a Support Vector Machine (SVM) algorithm to the learning process and the discriminating process for classifying confidential documents. The SVM (Support Vector Machine) is a new learning method introduced by V. Vapnik. The SVM is well founded in terms of computational learning theory and very open to theoretical understanding and analysis.
A text categorization method with the SVM is described in abundant literature such as Thorsten Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features", LS-8 Report 23, Dortmund, 27 November 1997 (revised 19 April 1998); T. Joachims, "A Probabilistic Analysis of the Rocchio Algorithm with TF*IDF for Text Categorization", in International Conference on Machine Learning (ICML), 1997; G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, New York, 1983; J. Platt, "Fast Training of SVMs Using Sequential Minimal Optimization", to be published in Advances in Kernel Methods - Support Vector Machine Learning, B. Schölkopf, C. Burges and A. Smola, eds., MIT Press, Cambridge, Mass., 1998.
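For orientation, the linear SVM used in this kind of text categorization can be summarized in standard textbook notation. The symbols below (x_i for a document vector, y_i for its label, w and b for the Hyper-Plane parameters, alpha_i for the dual coefficients) are assumptions for illustration and do not appear in the patent text; the hard-margin form is shown for simplicity.

```latex
\[
\begin{aligned}
&\text{Training set: } \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}, \quad y_i \in \{+1 \text{ (confidential)},\, -1 \text{ (general)}\} \\
&\text{Hyper-Plane: } \mathbf{w} \cdot \mathbf{x} + b = 0 \\
&\text{Learning: } \min_{\mathbf{w},\, b} \ \tfrac{1}{2} \lVert \mathbf{w} \rVert^{2}
  \quad \text{subject to} \quad y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \ \text{ for all } i \\
&\text{Support Vectors: } \{\mathbf{x}_i : y_i (\mathbf{w} \cdot \mathbf{x}_i + b) = 1\} \\
&\text{Discrimination: } f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i \in SV} \alpha_i y_i\, (\mathbf{x}_i \cdot \mathbf{x}) + b \Big),
  \qquad \mathbf{w} = \sum_{i \in SV} \alpha_i y_i\, \mathbf{x}_i
\end{aligned}
\]
```

The last line shows why storing only the Support Vectors and the Hyper-Plane offset is enough to classify a new e-mail vector: every other training document has a zero coefficient and drops out of the sum.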
According to the text categorization method with the SVM algorithm, documents can be categorized into two types, for example as follows. First, words are read from the categorized documents and converted into prescribed values, and each document is represented as a vector of those values. Because each document contains many words, the coordinate system in which the document vectors are represented is a multidimensional space, and the more documents are learned, the higher its dimension becomes. When the documents are placed in this coordinate system according to their vector values, a Hyper-Plane that separates the documents into two categories is calculated, together with the Support Vectors, which are the vectors of the documents nearest to the Hyper-Plane. This series of processes is carried out by application software implementing the SVM algorithm. The utility of the SVM theory is confirmed by the empirical text categorization results reported in the literature cited above.
The e-mail monitoring server 2 of the e-mail monitoring system according to the present invention, as shown in FIG. 2, comprises a document indexer 11 for registering the documents classified into general documents or confidential documents according to a level of security demands of the employees and the executives, a document database 13 for storing the documents classified by the document indexer 11, a sniffer 19 for sniffing the e-mails sent from each of the client terminals 3 in the enterprise to the mail server 5, an e-mail database 21 for storing the sniffed e-mails, a vector generator 23 for converting the words included in the e-mails or the documents into vector types, a vector database 25 for storing the documents or e-mails converted into vector types, a learner 15 for learning the documents converted into vector types by the vector generator 23, a learning result database 17 for storing the learning result of the learner 15, a discriminator 27 for discriminating whether the sniffed e-mails are confidential documents by applying the SVM algorithm to the support vector calculated by learning and the vector type-converted e-mails, a discrimination result database 29 for storing the discrimination result, a report generator 31 for reporting the discrimination result of confidential documents, and a controller 10 for controlling all of the above-described devices.
The document indexer 11 registers the documents classified into the general documents or the confidential documents in the document database 13. The document indexer 11 is implemented as web-based software for registering documents. If documents are subdivided by division or by job characteristic when they are registered by the indexer 11, the accuracy of learning may increase.
In particular, when the contents of the confidential documents vary because the organization is large, it is desirable to classify documents and register them for each division. In this case, a scheme in which only confidential documents, and not general documents, are registered can be used. All documents other than those classified as confidential documents are then treated as general documents. For example, if specific divisions A, B and C each registered only their confidential documents, the confidential documents of A are the documents registered in A as confidential, and the general documents of A could be the confidential documents of B and C. In the same way, the general documents of B could be the confidential documents of A and C. In this way, each division can manage the document database 13 without registering the general documents separately.
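The per-division labeling described above can be illustrated with a short sketch. The function name `build_training_sets`, the label values +1/-1, and the sample document identifiers are hypothetical and chosen only for illustration; the patent does not specify a data layout.

```python
def build_training_sets(confidential_by_division):
    """For each division, label its own registered documents +1 (confidential)
    and every other division's registered documents -1 (general)."""
    training_sets = {}
    for division, own_docs in confidential_by_division.items():
        examples = [(doc, +1) for doc in own_docs]
        for other, other_docs in confidential_by_division.items():
            if other != division:
                examples.extend((doc, -1) for doc in other_docs)
        training_sets[division] = examples
    return training_sets


# Hypothetical registrations: each division registers only confidential documents.
registered = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1"]}
sets = build_training_sets(registered)
print(sets["A"])
```

With this layout no division has to register general documents explicitly, which matches the management benefit the passage describes.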
The learner 15 learns the documents converted into vector types by the vector generator 23. That is, the learner 15 applies the SVM to the documents converted into vector types by the vector generator 23, calculates the Hyper-Plane and the Support Vector, and then stores them in the learning result database 17, wherein the Hyper-Plane classifies the vector type-converted documents into the confidential documents or the general documents and the Support Vector is the vector of the nearest document to the Hyper-Plane. The learner 15 can be operated by administrators of the e-mail monitoring server 2 once more than a predetermined amount of documents has been collected. The learner can also be operated automatically at every predetermined period.
The sniffer 19 sniffs the e-mails sent out and stores the sniffed e-mails in the e-mail database 21. It is preferable that the sniffer 19 uses a technology that monitors network communication packets in the network and reads only the packets corresponding to e-mails. It is most desirable that the sniffer 19 is devised to minimize alteration of the network architecture and the network load, according to the network architecture of the enterprise, by combining TCP-Based Sniffing in the form of a simple wiretap with ARP-Based Sniffing, where a sniffer assumes the role of a logical gateway. The sniffer 19 can read all e-mails sent by protocols such as SMTP, POP3, and HTTP (also including web mail). Additionally, the sniffer 19 can read not only the body of an e-mail but also attached files.
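As a rough illustration of reading both the body and the attached files of a captured message, the sketch below parses an already reassembled SMTP message with Python's standard `email` package. Packet capture and TCP stream reassembly are not shown, the sample message and the helper name `extract_texts` are invented for the example, and only plain-text attachments are handled; binary formats would need their own text extraction.

```python
from email import message_from_string

# Hypothetical captured message with one body part and one text attachment.
RAW = """\
From: alice@example.com
To: bob@example.org
Subject: plan
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

See the attached plan.
--XYZ
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: attachment; filename="plan.txt"

Confidential merger schedule
--XYZ--
"""


def extract_texts(raw_message):
    """Return (body_text, {filename: attachment_text}) for a raw message."""
    msg = message_from_string(raw_message)
    body_parts, attachments = [], {}
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no text of their own
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        filename = part.get_filename()
        if filename:
            attachments[filename] = text
        elif part.get_content_type() == "text/plain":
            body_parts.append(text)
    return "\n".join(body_parts), attachments


body, attachments = extract_texts(RAW)
print(body.strip(), list(attachments))
```

Both the body text and each attachment text could then be passed to the vector generator as separate documents.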
The vector generator 23 reads words from the documents stored in the document database 13 and from the e-mails stored in the e-mail database 21. It then converts the read words into prescribed values, and converts those values into vector form applicable to the SVM algorithm.
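One possible reading of the vector generator is a bag-of-words conversion, sketched below. The specification does not say what the "prescribed values" are; mapping each word to an integer index and using word counts as vector components is an assumption made here, as are the function names `tokenize`, `build_vocabulary` and `to_vector`.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Read the words of a document, lower-cased, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents: List[str]) -> Dict[str, int]:
    """Assign each distinct word a prescribed value (here: an index)."""
    vocab: Dict[str, int] = {}
    for doc in documents:
        for word in tokenize(doc):
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_vector(text: str, vocab: Dict[str, int]) -> List[float]:
    """Represent a text as a vector of word counts over the vocabulary.

    Words that are not in the vocabulary are ignored.
    """
    counts = Counter(tokenize(text))
    vec = [0.0] * len(vocab)
    for word, n in counts.items():
        idx = vocab.get(word)
        if idx is not None:
            vec[idx] = float(n)
    return vec


# Hypothetical registered documents and one sniffed e-mail.
docs = ["Secret design of the new engine", "Lunch menu for Friday"]
vocab = build_vocabulary(docs)
email_vec = to_vector("The secret engine, the SECRET plan", vocab)
# email_vec == [2.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

The same vocabulary is used for registered documents and for sniffed e-mails, so that both end up in the same vector space before the SVM is applied.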
The discriminator 27 determines whether the sniffed e-mails are confidential documents by applying the SVM algorithm to the Support Vectors obtained from the learning result and to the vectorized e-mails, and stores the result in the discrimination result database 29. When each division registers its own confidential and general documents, there can be several learning models, each serving as a criterion for confidential documents. In this case, the discriminator 27 applies the learning model of each division to the sniffed e-mails and discriminates an e-mail as a confidential document if at least one of the models classifies it as confidential.
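The multi-division discrimination described above might look like the following sketch. It assumes a linear kernel and the usual SVM decision function expressed over support vectors, f(x) = sum_i c_i (s_i . x) + b; the class `DivisionModel`, its fields, and the hand-picked numbers are illustrative assumptions, not trained values or anything defined in the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DivisionModel:
    """Learning result of one division (assumed layout)."""
    support_vectors: List[List[float]]
    coefficients: List[float]  # one signed weight per support vector
    bias: float


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def decision_value(model: DivisionModel, x: Sequence[float]) -> float:
    """Linear-kernel SVM decision function over the support vectors."""
    total = model.bias
    for coef, sv in zip(model.coefficients, model.support_vectors):
        total += coef * dot(sv, x)
    return total


def discriminate(models: Dict[str, DivisionModel], x: Sequence[float]) -> List[str]:
    """Return the divisions whose model classifies x as confidential."""
    return [name for name, m in models.items() if decision_value(m, x) > 0.0]


def is_confidential(models: Dict[str, DivisionModel], x: Sequence[float]) -> bool:
    """An e-mail is confidential if at least one division flags it."""
    return bool(discriminate(models, x))


# Hand-made models for illustration only (not the output of real training).
models = {
    "A": DivisionModel([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0], 0.0),
    "B": DivisionModel([[0.0, 1.0], [1.0, 0.0]], [1.0, -1.0], 0.0),
}
flagged = discriminate(models, [3.0, 1.0])  # only division A flags this vector
```

Returning the list of flagging divisions, rather than a single boolean, also lets a report show which division's confidential material an e-mail resembles.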
Meanwhile, the controller 10 uses the indexer to selectively read, as needed and by type, the confidential documents and the general documents stored in the document database 13, converts the documents into a form applicable to the SVM algorithm of the learner 15, and provides the learner 15 with the converted documents. The controller 10 thereby causes the learning result of the learner 15 to be stored as a file in the learning result database 17. The learning result consists of the Hyper-Plane, which classifies the vectorized documents into confidential documents or general documents, and the Support Vectors, which are the vectors of the documents nearest to the Hyper-Plane. The controller 10 also converts the e-mails sniffed by the sniffer 19 and stored in the e-mail database 21 into a form applicable to the SVM algorithm of the learner 15. Thereafter, the controller 10 provides the converted e-mails to the discriminator 27 together with the Hyper-Plane and the Support Vectors stored in the learning result database 17, whereby the controller 10 causes the discriminator 27 to analyze whether the sniffed e-mails are classified as confidential documents.
The controller 10 causes the report generator 31 to notify a user of the analysis result, that is, whether the e-mails include confidential documents as discriminated by the discriminator 27 and stored in the discrimination result database 29, whereby the controller 10 can monitor whether outgoing e-mails include confidential documents.
The e-mail monitoring process performed by the e-mail monitoring system will be more fully appreciated from the following description and FIG. 3.
To begin with, the learner 15 calculates the Hyper-Plane and the Support Vectors by learning the confidential documents and general documents with the SVM algorithm (S10), wherein the Hyper-Plane classifies the vectorized documents into confidential documents or general documents and a Support Vector is the vector of a document nearest to the Hyper-Plane. Then, the Hyper-Plane and the Support Vectors are stored in the learning result database 17 (S20).
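Steps S10 and S20 can be illustrated with a small linear SVM trained by batch sub-gradient descent on the regularized hinge loss. The specification does not name a training procedure, so this particular optimizer, the hyperparameters `lam`, `lr` and `epochs`, the two-dimensional toy vectors, and the dictionary standing in for the learning result database are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(
    X: List[Vector], y: List[int],
    lam: float = 0.01, lr: float = 0.01, epochs: int = 2000,
) -> Tuple[Vector, float]:
    """Minimize lam/2*|w|^2 + mean(max(0, 1 - y*(w.x + b))) by sub-gradient descent.

    Returns the hyperplane as (w, b); labels are +1 (confidential) or -1 (general).
    """
    n, dim = len(X), len(X[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def nearest_per_class(X: List[Vector], y: List[int], w: Vector, b: float) -> Dict[int, Vector]:
    """For each class, pick the training vector nearest to the hyperplane."""
    norm = dot(w, w) ** 0.5
    best: Dict[int, float] = {}
    nearest: Dict[int, Vector] = {}
    for xi, yi in zip(X, y):
        dist = abs(dot(w, xi) + b) / norm
        if yi not in best or dist < best[yi]:
            best[yi] = dist
            nearest[yi] = xi
    return nearest


# Toy vectors: two confidential (+1) and two general (-1) documents.
X = [[2.0, 0.0], [4.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)                   # S10: learn the hyperplane
learning_result = {                             # S20: stand-in for database 17
    "hyperplane": (w, b),
    "support_vectors": nearest_per_class(X, y, w, b),
}
predictions = [1 if dot(w, xi) + b > 0 else -1 for xi in X]
```

On this toy data the nearest vectors per class are `[2.0, 0.0]` and `[0.0, 2.0]`, which matches the specification's description of a Support Vector as the vector of the document nearest to the Hyper-Plane. A production system would more likely rely on an established SVM library than on a hand-written optimizer.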
The e-mails sent outside the enterprise are sniffed by the sniffer 19 and stored in the e-mail database 21 (S30). The sniffed e-mails are converted into a form applicable to the SVM algorithm by the vector generator 23 (S40). The discriminator 27 determines whether the sniffed e-mails are confidential documents by applying the SVM algorithm to the Support Vectors obtained from the learning result and to the vectorized e-mails (S50). If the e-mails are discriminated as confidential documents according to the analysis result of the discriminator 27, the result values are stored in the discrimination result database 29 (S60); otherwise, the result values discriminated as general documents are stored in the discrimination result database 29 (S70). The controller 10 presents the result values in various kinds of graphs by operating the report generator 31 (S80).
Thus, the present invention provides the method and the system which learn the concept of confidential documents and general documents automatically with the SVM, sniff the sent e-mails and discriminate if the sniffed e-mails are confidential documents on the basis of learning results.
As stated above, the present invention can provide the e-mail monitoring method and system which can monitor efficiently if the confidential documents of a group are sent out through e-mails by learning the concept of confidential documents and general documents automatically and then, classifying an e-mail on the basis of learning results.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

Claims

WHAT IS CLAIMED IS:
1. An e-mail monitoring method for monitoring an e-mail sent out from predetermined group comprising the steps of:
classifying documents of the group into confidential documents or general documents as a level of security demands;
converting the document into a form applicable to a Support Vector Machine (SVM) algorithm;
calculating a Hyper-Plane classifying the documents into the confidential documents or the general documents and a Support Vector which is a vector of a nearest document to the Hyper-plane by learning the documents with the SVM algorithm;
sniffing the e-mail sent from an inside of the group to an outside;
converting the sniffed e-mail into the form applicable to the SVM algorithm; and
applying the SVM algorithm to both the Support Vector calculated from a result by learning and the e-mail converted to a vector type and discriminating if the sniffed e-mail includes the confidential documents.
2. The e-mail monitoring method according to claim 1, wherein the step of converting the document into a form applicable to the SVM algorithm comprising the steps of:
reading words included in the document and the e-mails;
converting the read words into prescribed values; and
indicating the document and the e-mail as a vector type with the words converted into the prescribed values.
3. The e-mail monitoring method according to claim 1 further comprising the step of reporting an analyzed result after analyzing if the sniffed e-mail is the confidential document.
4. A monitoring system for monitoring an e-mail sent out from predetermined group comprising:
a document database for storing documents in the group which are classified into confidential documents or general documents according to a level of security demands;
a sniffer for sniffing the e-mail which is being sent out from an inside of the group;
an e-mail database for storing the sniffed e-mail;
a vector generator for converting words included in the document database and the e-mail database into a form applicable to a Support Vector Machine (SVM) algorithm;
a vector database for storing vectors converted by the vector generator;
a learner for learning the document of the document database converted by the vector generator with the SVM algorithm;
a learning result database for storing a Hyper-Plane and a Support Vector which is learning results of the learner;
a discriminator for discriminating if the sniffed e-mail is the confidential document by applying the SVM algorithm to the support vector calculated from the learning result and the e-mail converted to a vector type; and
a discrimination result database for storing a discriminated result of the discriminator.
5. The e-mail monitoring system according to claim 4 further comprising a report generator notifying a user of a discriminated result analyzed by the discriminator if the e-mail includes the confidential document.
6. The e-mail monitoring method according to claim 1, wherein the step of classifying documents of the group into confidential documents or general documents comprising the steps of: registering the confidential document respectively for each of divisions; and registering the document classified into the confidential document in other division except a pertinent division as the general document.
7. The e-mail monitoring method according to claim 6, wherein the step of discriminating if the sniffed e-mail includes the confidential document, comprising the step of discriminating the sniffed e-mail as the confidential document in the case that the SVM algorithm is applied into the Support Vector of which the confidential document is registered for each of the divisions and the e-mail converted into the vector type and thereby, the sniffed e-mail is analyzed and on the basis of an analysis result, the sniffed e-mail is discriminated as the confidential document of at least one division among the divisions.
PCT/KR2002/001882 2001-10-12 2002-10-09 Method and system for monitoring e-mail WO2003032107A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002362631A AU2002362631A1 (en) 2001-10-12 2002-10-09 Method and system for monitoring e-mail

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR2001/63063 2001-10-12
KR10-2001-0063063A KR100483602B1 (en) 2001-10-12 2001-10-12 Method and system for monitoring e-mail

Publications (2)

Publication Number Publication Date
WO2003032107A2 true WO2003032107A2 (en) 2003-04-17
WO2003032107A3 WO2003032107A3 (en) 2003-12-18

Family

ID=19715077

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2002/001882 WO2003032107A2 (en) 2001-10-12 2002-10-09 Method and system for monitoring e-mail

Country Status (3)

Country Link
KR (1) KR100483602B1 (en)
AU (1) AU2002362631A1 (en)
WO (1) WO2003032107A2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1613020A2 (en) 2004-06-30 2006-01-04 Microsoft Corporation Method and system for detecting when an outgoing communication contains certain content
EP2101268A1 (en) * 2006-12-28 2009-09-16 Canon Kabushiki Kaisha Information processing device, information processing device control method, program, and recording medium
US11675926B2 (en) 2018-12-31 2023-06-13 Dathena Science Pte Ltd Systems and methods for subset selection and optimization for balanced sampled dataset generation

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102526530B1 (en) 2022-07-27 2023-04-27 주식회사 디프스팩 blocking method for Web mail and system thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000072257A2 (en) * 1999-05-25 2000-11-30 Barnhill Stephen D Enhancing knowledge discovery from multiple data sets using multiple support vector machines
US6161130A (en) * 1998-06-23 2000-12-12 Microsoft Corporation Technique which utilizes a probabilistic classifier to detect "junk" e-mail by automatically updating a training and re-training the classifier based on the updated training set
JP2000354115A (en) * 1999-06-14 2000-12-19 Matsushita Electric Ind Co Ltd Telephone set with electronic mail function
US6192360B1 (en) * 1998-06-23 2001-02-20 Microsoft Corporation Methods and apparatus for classifying text and for building a text classifier
US6266664B1 (en) * 1997-10-01 2001-07-24 Rulespace, Inc. Method for scanning, analyzing and rating digital information content

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1613020A2 (en) 2004-06-30 2006-01-04 Microsoft Corporation Method and system for detecting when an outgoing communication contains certain content
EP1613020A3 (en) * 2004-06-30 2012-03-07 Microsoft Corporation Method and system for detecting when an outgoing communication contains certain content
US8782805B2 (en) 2004-06-30 2014-07-15 Microsoft Corporation Method and system for detecting when an outgoing communication contains certain content
EP2101268A1 (en) * 2006-12-28 2009-09-16 Canon Kabushiki Kaisha Information processing device, information processing device control method, program, and recording medium
CN102176706A (en) * 2006-12-28 2011-09-07 佳能株式会社 Information processing device and information processing method
EP2101268A4 (en) * 2006-12-28 2013-01-02 Canon Kk Information processing device, information processing device control method, program, and recording medium
EP2544418A1 (en) * 2006-12-28 2013-01-09 Canon Kabushiki Kaisha Information processing apparatus, method of controlling information processing apparatus, program for control method, and recording medium for program
US9197447B2 (en) 2006-12-28 2015-11-24 Canon Kabushiki Kaisha Information processing apparatus, method of controlling information processing apparatus, program for control method, and recording medium for program
US11675926B2 (en) 2018-12-31 2023-06-13 Dathena Science Pte Ltd Systems and methods for subset selection and optimization for balanced sampled dataset generation

Also Published As

Publication number Publication date
KR20030030720A (en) 2003-04-18
KR100483602B1 (en) 2005-04-15
AU2002362631A1 (en) 2003-04-22
WO2003032107A3 (en) 2003-12-18

Similar Documents

Publication Publication Date Title
US20050204001A1 (en) Method and devices for prioritizing electronic messages
CN112995196B (en) Method and system for processing situation awareness information in network security level protection
US20050281276A1 (en) Data analysis and flow control system
US20080168453A1 (en) Work prioritization system and method
JP6207185B2 (en) Information analysis apparatus, information analysis method, information analysis system, and program
CN106407078A (en) An information interaction-based client performance monitoring device and method
CN110727643A (en) File classification management method and system based on machine learning
WO2023272850A1 (en) Decision tree-based product matching method, apparatus and device, and storage medium
CN112001443A (en) Network behavior data monitoring method and device, storage medium and electronic equipment
CN107465652B (en) Operation behavior detection method, server and system
WO2003032107A2 (en) Method and system for monitoring e-mail
CN102905236A (en) Method, device and system for monitoring spam short messages
CN109493251B (en) Electric power wireless public network monitoring system
CN116781347A (en) Industrial Internet of things intrusion detection method and device based on deep learning
CN112511360B (en) Multi-source service platform data security component monitoring method and system
CN114579961A (en) Sensitive data identification method based on multi-industry detection rules and related device
CN112054989B (en) Construction method of detection model and detection method of batch operation abnormity
CN114020585A (en) Service processing method, device and computer readable storage medium
Lu et al. A deep learning approach for m2m traffic classification using call detail records
Asaju et al. Short message service (SMS) spam detection and classification using Naïve Bayes
RU2775861C1 (en) Method and system for detection of abnormal user behavior
Ming et al. Spam filtering by stages
Frank et al. Applications of neural networks to telecommunications systems
CN108075932A (en) A kind of data monitoring method and device
CN107222546A (en) It polymerize the method and system of internet content

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KZ LK LR LS LT LU LV MA MD MG MK MW MX MZ NO NZ OM PH PL PT RO SD SE SG SI SK SL TJ TM TN TR TT UA UG US UZ VC VN YU ZA ZM

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: COMMUNICATION UNDER RULE 69 EPC (EPO FORM 1205A DATED 22.07.2004)

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP