WO2002019069A2 - Monitoring electronic mail message digests - Google Patents

Monitoring electronic mail message digests

Info

Publication number
WO2002019069A2
WO2002019069A2 PCT/GB2001/003852 GB0103852W WO0219069A2 WO 2002019069 A2 WO2002019069 A2 WO 2002019069A2 GB 0103852 W GB0103852 W GB 0103852W WO 0219069 A2 WO0219069 A2 WO 0219069A2
Authority
WO
WIPO (PCT)
Prior art keywords
message
electronic mail
numerical representation
mail messages
characteristic numerical
Prior art date
Application number
PCT/GB2001/003852
Other languages
French (fr)
Other versions
WO2002019069A3 (en
Inventor
Alyn Hockey
Original Assignee
Clearswift Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Clearswift Limited filed Critical Clearswift Limited
Priority to EP01960974A priority Critical patent/EP1368719A2/en
Priority to AU2001282359A priority patent/AU2001282359A1/en
Priority to US10/362,840 priority patent/US7801960B2/en
Publication of WO2002019069A2 publication Critical patent/WO2002019069A2/en
Publication of WO2002019069A3 publication Critical patent/WO2002019069A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/107Computer-aided management of electronic mailing [e-mailing]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21Monitoring or handling of messages
    • H04L51/212Monitoring or handling of messages using filtering or selective blocking
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99941Database schema or data structure
    • Y10S707/99948Application of database or data structure, e.g. distributed, multimedia, or image

Definitions

  • This invention relates to networked computer systems security in general and protection against Denial of Service (DoS) attacks, virus attacks and unsolicited commercial email in particular. More specifically, this invention concerns an apparatus and method for managing electronic mail message processing.
  • Such systems are advantageous in that they can exchange a wide variety of different items of information at a low cost with servers and networks on the Internet and other networks.
  • Email worms may comprise a file attachment to a mail message, or comprise a script or code embedded into the text of a mail message.
  • the worm may exploit scripting capabilities of internet mail client software to send malicious code to other users on a mailing list or newsgroup automatically, and thus appear as having originated from a legitimate sender.
  • an email worm may exploit a user's email address book to obtain targets to spread to. To the recipient, the worm would appear to have been sent by a familiar source, as their presence in the address book would imply that previous email dialogue had taken place. Therefore, a further email apparently from that source would not appear out of the ordinary.
  • Such emails may comprise a suitably benign message as further means of deception to the recipients.
  • Especially threatening are email worms which require no action on the part of the recipient (such as opening a mail message) to install and activate the malicious code.
  • some mail clients such as Outlook Express (TM) and Outlook (TM) support a "preview" pane which typically scans all unread messages in a user's electronic mailbox. Such a "preview" is sufficient to install and activate such a worm if present amongst the unread messages.
  • Other types of worms are known. For instance, a worm could spread over a local area network (LAN), wide area network (WAN) or peer-to-peer network directly by determining the network addresses of other computers on the network and sending copies of its code to these addresses. Such files would appear to originate from a legitimate machine.
  • IRC internet relay chat
  • a protocol allowing real time communications between Internet users and other features such as transfer of data and executable files over IRC channels.
  • Popular IRC client software supports scripting features. For example, an IRC script could send a message or data file automatically to specified users when they connect to an IRC channel.
  • Special scripting commands allow execution of DOS and Windows executable files, allowing infected scripts to propagate and transfer worm code to other machines.
  • a known approach for addressing the problem of "spam" is to use a mail filtering system implemented at an Internet Service Provider (ISP) server, an organisation's mail transfer agent (MTA) or a user terminal.
  • Such a filtering system sorts incoming email into categories, typically determined by the recipient.
  • WO 99/32985 discloses a system and method of filtering junk emails comprising a first list of unapproved email addresses and character strings which a user wishes not to receive and a second list of approved email addresses and character strings which the user wishes to receive.
  • the first list is periodically updated on the basis of further mail messages that the user rejects.
  • the filtering instructions must frequently be updated as new junk mail messages appear. Furthermore, it does not aid in tracking the originator of "spam" messages.
  • EP 0720333A discloses a technique for reducing the quantity of "spam" messages received by a user whereby a recipient specifier containing non-address information is added to an email message.
  • the mail filter for a given recipient has access to information about that recipient and uses that information together with the non-address information in the email message to determine whether the message should be provided to the given recipient. If the non-address information and the information about the recipient indicate that the given recipient should not receive the message, the filter does not provide it.
  • Such "non-address information" comprises two additional components, namely a recipient specifier and a referral list, which accompany the email message. These components must be adopted by all parties wishing to communicate with the specified user for the method to be effective. Furthermore, said information increases the size of the mail message and hence storage space and bandwidth requirements.
  • this method is of limited use in detecting email worms as email worms are characterised by their traffic behaviour rather than by any particular sender.
  • a system for controlling delivery of unsolicited electronic mail messages, whereby "spam" probe email addresses are created and planted at various sites on a communications network in order to ensure their inclusion on large-scale "spam" mailing lists.
  • Upon receipt of incoming email addressed to the spam probe addresses, the received email is analysed automatically to identify the source of the message, characteristic spam source data is extracted from the message, and an alert signal is generated containing the spam source data.
  • a filtering system implemented at the servers receives the alert signal and updates filtering data using the spam source data retrieved from the alert signal, and controls delivery of subsequently received email messages received from the identified spam source.
  • UCE unsolicited commercial email
  • each mail message may comprise:
  • (a) subject header information; (b) non-subject header information; and (c) a main body, comprising message content; the method comprising: receiving an electronic mail message; generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; storing said generated characteristic numerical representation in a memory; and comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
  • the step of generating comprises generating a characteristic numerical representation of at least a part of said message content and of said subject header information.
  • the step of generating a characteristic numerical representation of at least a part of said message content comprises generating a characteristic numerical representation of at least textual content in said message content.
  • the step of generating a characteristic numerical representation of at least a part of said message content comprises generating a characteristic numerical representation of at least attached files in said message content.
  • Generating characteristic numerical representations of attached files assists in monitoring of email worms which rely on file attachments in order to propagate.
  • the step of generating may produce a characteristic numerical representation of said file attachments only. This enables the speed of the search to be increased, for example, when scanning for virus worms.
  • a timestamp of said electronic mail message is stored.
  • Such a timestamp may be provided from a defined public time service, such as a public time server on the Internet.
  • Preferably, the internet protocol (IP) address of origin of said electronic mail message is further stored. The timestamp and the IP address assist in detecting the originator of said mail message.
  • the header information is stored.
  • said header information comprises message source and destination details. Comparison of these allows determination of whether the message is a simple resend, a bulk circulated email, or transmitted via multiple routes.
  • said generated characteristic numerical representation is a message digest, for example generated using a message digest 5 (MD5) algorithm, or a hash, for example generated using secure hash algorithm 1 (SHA-1).
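  • As a minimal sketch of the digest step, assuming Python's standard hashlib module: the function name characteristic_digest, the newline used to join subject and content, and the UTF-8 encoding of the subject are illustrative assumptions, not details given in the specification.

```python
import hashlib


def characteristic_digest(subject: str, content: bytes, algorithm: str = "md5") -> str:
    """Return a hex digest over the subject line and message content.

    Non-subject header information (sender, recipients, date) is
    deliberately left out, as the method describes.
    """
    h = hashlib.new(algorithm)          # "md5" or "sha1"
    h.update(subject.encode("utf-8"))   # assumption: subject encoded as UTF-8
    h.update(b"\n")                     # assumption: newline joins subject and body
    h.update(content)
    return h.hexdigest()
```

  • Two messages with the same subject and content then yield the same digest regardless of who sent them, which is the property the comparison step relies on.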
  • said method further comprises, if said characteristic numerical representation matches a characteristic numerical representation stored in memory, incrementing a count value associated with said characteristic numerical representation.
  • said method further comprises comparing said count value with a predetermined threshold.
  • said predetermined threshold is determined on the basis of said header information. This enables a different threshold to be set for characteristic numerical representations having the same sender and destination information, a same sender and a different destination, a different sender but a same destination, and different senders and destinations.
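  • The four sender/destination cases can be sketched as a small lookup, shown below. The names Relation, classify and threshold_for are hypothetical, the case-insensitive address comparison is an assumption, and the threshold numbers are placeholders only (the description gives 5, 100 and 40 as examples for some cases and no value for others).

```python
from enum import Enum


class Relation(Enum):
    SAME_SENDER_SAME_DEST = "same sender, same destination"
    SAME_SENDER_DIFF_DEST = "same sender, different destination"
    DIFF_SENDER_SAME_DEST = "different sender, same destination"
    DIFF_SENDER_DIFF_DEST = "different senders and destinations"


# Placeholder thresholds; in practice these would be administrator-configurable.
THRESHOLDS = {
    Relation.SAME_SENDER_SAME_DEST: 5,
    Relation.SAME_SENDER_DIFF_DEST: 100,
    Relation.DIFF_SENDER_SAME_DEST: 50,
    Relation.DIFF_SENDER_DIFF_DEST: 40,
}


def classify(stored_sender: str, stored_dest: str, sender: str, dest: str) -> Relation:
    """Compare the header details of a stored digest with those of a new message."""
    same_sender = stored_sender.lower() == sender.lower()
    same_dest = stored_dest.lower() == dest.lower()
    if same_sender and same_dest:
        return Relation.SAME_SENDER_SAME_DEST
    if same_sender:
        return Relation.SAME_SENDER_DIFF_DEST
    if same_dest:
        return Relation.DIFF_SENDER_SAME_DEST
    return Relation.DIFF_SENDER_DIFF_DEST


def threshold_for(stored_sender: str, stored_dest: str, sender: str, dest: str) -> int:
    """Pick the count threshold that applies to this sender/destination case."""
    return THRESHOLDS[classify(stored_sender, stored_dest, sender, dest)]
```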
  • an alert is preferably raised.
  • said alert is preferably determined on the basis of said header information.
  • said alert comprises flagging said message with a marker prefixed to the subject line of the message.
  • said marker may comprise the word "JUNK" or "SUSPICIOUS" and may further be user configurable.
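  • Flagging by subject prefix might look like the following sketch. The function name flag_subject and the square-bracket format around the marker are assumptions made for illustration; the specification only says that a marker is prefixed to the subject line.

```python
def flag_subject(subject: str, marker: str = "JUNK") -> str:
    """Prefix a configurable marker to the subject line, at most once."""
    prefix = f"[{marker}] "            # assumption: bracketed marker format
    if subject.startswith(prefix):
        return subject                 # already flagged, leave unchanged
    return prefix + subject
```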
  • said alert comprises delivering a fixed portion of said message only.
  • said alert comprises delivering an audit notification to the recipient. This would enable the recipient to confirm that the message is unsolicited whilst economising on bandwidth.
  • said alert comprises deleting the message.
  • said alert comprises deleting a file attachment from the message. This facility is useful in instances where the message has already been sent to a same destination a number of times, where a message is suspected of being an email worm, or in guarding a destination or mail server thereof against cracker attack.
  • said method further comprises, allowing said message to be delivered if said characteristic numerical representation matches an approved characteristic numerical representation stored in memory.
  • a "white list" of approved characteristic numerical representations may include those of routine test messages, or other routinely sent "bulk" messages.
  • said characteristic numerical representation is stored in cache memory.
  • the maximum number of characteristic numerical representations stored in cache memory is configurable and said cache memory may preferably be periodically copied to a storage means.
  • the use of cache memory to store recent characteristic numerical representations allows the database of characteristic numerical representations to be searched more quickly.
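  • A bounded cache of recent digests could be sketched as below. The class name DigestCache, the oldest-first eviction policy and the in-memory dict standing in for the "storage means" are all assumptions; the text only states that the maximum number of cached entries is configurable and that the cache may be copied to storage.

```python
from collections import OrderedDict


class DigestCache:
    """Keeps match counts for the most recently seen digests."""

    def __init__(self, max_entries: int = 1000):
        self.max_entries = max_entries
        self._counts = OrderedDict()   # digest -> count, oldest first
        self.spilled = {}              # stand-in for the slower storage means

    def record(self, digest: str) -> int:
        """Register one sighting of a digest and return its running count."""
        count = self._counts.pop(digest, None)
        if count is None:
            # Not in cache: fall back to the slower store, defaulting to zero.
            count = self.spilled.pop(digest, 0)
        count += 1
        self._counts[digest] = count   # re-inserted as most recent
        while len(self._counts) > self.max_entries:
            old_digest, old_count = self._counts.popitem(last=False)
            self.spilled[old_digest] = old_count
        return count
```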
  • non-subject header information (c) a main body, comprising message content; the software containing code for: receiving an electronic mail message, generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; storing said generated characteristic numerical representation in a memory; and comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
  • a software product for use in a client's mail user agent (UA) for monitoring electronic mail messages wherein each mail message may comprise:
  • a main body comprising message content; the software product containing code for: receiving an electronic mail message, generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; storing said generated characteristic numerical representation in a memory; and comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
  • each mail message may comprise: (a) subject header information
  • a main body comprising message content; the system comprising: means for receiving an electronic mail message, a processor for generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; a memory for storing said generated characteristic numerical representation; and means for comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
  • FIG. 1 is a block diagram of part of a computer network operating in accordance with the invention.
  • Figure 2 is a multi-part email message illustrating message body structure.
  • FIG. 3 is a flowchart illustrating operation of a software product in accordance with the invention.
  • FIG. 1 of the accompanying drawings illustrates functional blocks of a server 100, such as a simple mail transfer protocol (SMTP) server, operable in accordance with the present invention.
  • Server 100 comprises a central processing unit (CPU) 102 in communication with a memory 104.
  • the CPU 102 can store and retrieve data to and from a storage means 106, and outputs display information to a video display 108.
  • Server 100 may be connected to and communicate with a private network 110 such as a local area network (LAN).
  • server 100 may be able to send and receive files to and from a public network 116 such as the Internet, using an ISDN, serial, Ethernet or other connection, preferably via a firewall 112 and router 11.
  • Internet 116 comprises a vast number of computers and computer networks that are connected through communications links.
  • local area network 110 may itself be connected through a server to another network (not shown) such as the Internet.
  • Server 100 may further comprise input peripherals such as a terminal having a mouse and/or keyboard (not shown) and output peripherals such as a printer or sound generation hardware, as customary in the art.
  • Server 100 runs operating system and networking software which may be stored on disc or provided in read-only memory (ROM). Data may be transferred to server 100 via a removable storage means (not shown) or through either of networks 110, 112.
  • SMTP Simple Mail Transfer Protocol
  • MTS Message Transport System
  • Mail reader software also known as Mail User Agents
  • MUA User Agent
  • POP3 Post Office Protocol
  • IMAP Internet Message Access Protocol
  • messages may be sent, typically using the SMTP protocol.
  • email message 200 comprises header information 210, including subject header information 215, typically input by the sender of the message, and non-subject header information, including the identity of the sender and the intended recipient(s), and the date.
  • header information is followed by the body 230 of the message, conventionally separated from the header by a blank line 220.
  • header information 210 is not important in most cases, and many headers 214 are optional, but the mandatory headers 212 that must be present are Date:, From: and one of To:, Cc: or Bcc:.
  • the keyword may be in any mixture of capitals and small letters, so CC: is the same as Cc:.
  • the subject header 215, although optional, is generally used.
  • the main body 230 of the message may comprise plain American Standard Code for Information Interchange (ASCII) text, or may assume a multi-part format allowing textual and non-textual message bodies to be represented and exchanged without loss of information.
  • MIME Multipurpose Internet Mail Extensions
  • Such a multi-part format, defined by RFC2045 and RFC2046 (published on the Internet) also permits representation of body text in character sets other than ASCII through content encodings, as well as representation of non-textual material such as images and audio fragments.
  • the message body may be appended by an optional "signature", comprising a signature separator 300 followed by signature text 310.
  • signature separator 300 comprises two hyphens followed by a space but often may comprise a series of hyphens, asterisks or other characters.
  • signature text 310 typically includes a user's name, organisation, and contact details.
  • Multi-part headers 216 identify the message 200 as being in multi-part format, and are used by mail reader software to interpret the message 200 correctly.
  • the fields 216 comprise distinctive boundary separator text 218 which is used to separate different parts of the message body 230 pertaining, for example, to textual content, enriched textual content such as HyperText Markup Language (HTML), and attached files.
  • field 216 may read:
  • ABSCDEFG is the distinctive separator text 218.
  • the header may read
  • the different parts of the message body 230 are separated by boundary separators 240, 260 comprising two hyphens followed by the boundary separator text 218.
  • Although body 230 is shown as consisting of two parts 250 and 270, email message 200 may comprise more or fewer parts.
  • preamble text 232 positioned above the first boundary separator 240 will typically be ignored by a mail reader that understands multi-part format messages but will be displayed by a mail reader which does not support the multi-part format. That is to say, preamble text 232 is not considered part of the message content. Conventionally, such preamble text 232 may therefore be used to alert the recipient to the fact that the mail reader being used is not compliant with the format of the message, as shown.
  • First body part 250 encloses header fields 252 and a body 256 which is separated from header 252 by a blank line 254.
  • Header fields 252 conventionally define the type of data contained in body 256 and may specify any encoding necessary for the purposes of transfer of information contained in the body 256 through the mail system, so that the receiving program can reverse the process.
  • a file conveyed within a message part body 256 in this way is known as an "attachment" to the message.
  • header fields 252 may be empty, in which instance the body is conventionally interpreted as comprising US-ASCII text.
  • second body part 270 is analogous to body part 250, and the reference numerals indicate like parts. Further body parts may then follow as generally indicated by 280.
  • End of body separator text 290 comprises two hyphens followed by the distinctive boundary separator text 218, followed by a further two hyphens.
  • the software is loaded permanently on a mail server, and remains in a quiescent state until an email message arrives at the server, typically from another server via SMTP.
  • the software intercepts an incoming mail message and reads it into memory.
  • the software determines whether the message is in multi-part format, and, if so, whether there are any file attachments to the message. Typically, this is done by examining the header for the following fields as part of step 300
  • xxx may indicate a number of alternatives as customary in the art, such as "mixed", "alternative" and "parallel".
  • the body of the message may be scanned for the header field:
  • xxx will typically read "Text/plain", "Text/html" or similar, for a message without any non-textual file attachments. If file attachments are present, xxx may read "image/jpeg", "audio/basic" or a number of other possibilities corresponding to a variety of file formats as customary in the art. The header would typically read "Content-Type: multipart/mixed" accordingly.
  • a characteristic numerical representation is generated for the combined subject line 215 and message content, in step 302.
  • the message content is obtained by taking the body 256 of the first part (text) 250 and the body 276 of the second part (image) 270, that is to say, any multi-part boundary information and any header information of each part is not considered to be part of the message content.
  • the message content would be considered to be the entire body 230 of the message 200.
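  • The extraction of message content described above can be sketched with Python's standard email package, which already separates part bodies from boundaries, part headers and preamble text. The function name message_content and the simple concatenation of part bodies are assumptions made for illustration.

```python
import email


def message_content(raw: str) -> bytes:
    """Return the concatenated bodies of all leaf parts of a message.

    Boundary lines, per-part header fields and preamble text are not
    included. A message that is not multi-part yields its whole body.
    """
    msg = email.message_from_string(raw)
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue                   # container parts carry no content themselves
        # decode=True undoes any base64 / quoted-printable transfer encoding
        chunks.append(part.get_payload(decode=True) or b"")
    return b"".join(chunks)
```

  • The returned bytes can then be fed, together with the subject line, into whichever digest algorithm is in use.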
  • digest as known to one of skill of the art, is a message digest algorithm which takes a message of arbitrary length and produces a numerical representation comprising a number of bits sufficiently small to form a condensed digest of the original message and allow fast and straight-forward searching, but sufficiently large to be essentially unique.
  • Example algorithms include Message Digest 5 (MD5), developed by Rivest in 1991 and documented in Internet RFC1321 (published on the Internet), which takes a message of arbitrary length and produces a 128-bit message digest.
  • An alternative is the Secure Hash Algorithm 1 (SHA-1) developed by the National Institute of Standards and Technology (NIST), which produces a 160-bit message digest. In both these cases, the messages are padded so they are a multiple of 512 bits long and processed in blocks if necessary.
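  • The digest and block sizes quoted above can be checked against Python's standard hashlib module, as in this short sketch. The helper name sizes_in_bits is hypothetical.

```python
import hashlib


def sizes_in_bits(algorithm: str) -> tuple:
    """Return (digest size, internal block size) in bits for a hash algorithm."""
    h = hashlib.new(algorithm)
    return h.digest_size * 8, h.block_size * 8


for name in ("md5", "sha1"):
    digest_bits, block_bits = sizes_in_bits(name)
    print(f"{name}: {digest_bits}-bit digest, {block_bits}-bit blocks")
```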
  • a single digest has been generated for a multi-part message
  • variations such as generating a first digest for the subject only, a second digest for the message content or separate digests for each body part may be performed.
  • a user such as a System Administrator, may define which elements would be used in generating the digests.
  • Since a characteristic of virus worms is that they may cause the same message to be sent from different sources to different users in the network, it is preferable not to use the non-subject header information in generating the digest, if such a virus is to be detected.
  • the generated digest (or digests) is stored in a memory, together with the internet protocol (IP) address of where the message came from, sender information and destination details, and a timestamp to serve as a record of when the message was received. For instance, the time from a defined public time service, such as a public time server on the internet, may be used. Other information from the header may be stored such as the Message-ID field. Again, these options may be configured by a System Administrator.
  • the digest is compared with existing digests stored in memory in step 304 and any matches noted in step 310.
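  • One possible shape for the stored record and the comparison step is sketched below. The names DigestRecord and DigestStore, the field names, and the use of an in-memory dictionary keyed by digest are assumptions; the description lists what is stored but not how.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DigestRecord:
    digest: str            # characteristic numerical representation (hex)
    source_ip: str         # IP address the message came from
    sender: str
    recipients: tuple
    received_at: float     # timestamp, e.g. seconds since the epoch
    message_id: str = ""   # optional Message-ID header value


class DigestStore:
    """Stores records by digest and reports earlier sightings of the same digest."""

    def __init__(self):
        self._by_digest = defaultdict(list)

    def add_and_match(self, record: DigestRecord) -> list:
        """Store the record and return the previously stored records it matches."""
        matches = list(self._by_digest[record.digest])
        self._by_digest[record.digest].append(record)
        return matches
```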
  • The use of message digest algorithms is advantageous in that they greatly reduce the amount of data to be stored and hence enable fast searching, whilst remaining essentially unique and thus ensuring the probability of conflict with a different mail message is extremely low.
  • the digest should be compared with those of mail messages sent over approximately the previous seven days, as typical spam messages will recur within a short period.
  • these most recent digests in this case, those of mail messages sent over the last seven days, are stored in a cache memory 306 which could be flushed to disk periodically.
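  • A seven-day window over recent digests might be kept as in the sketch below. The class name RecentDigests and the decision to hand expired entries back to the caller (for example, to be written to disk) are assumptions; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque

SEVEN_DAYS = 7 * 24 * 3600  # window length in seconds


class RecentDigests:
    """Holds (timestamp, digest) pairs no older than a fixed window."""

    def __init__(self, window: float = SEVEN_DAYS):
        self.window = window
        self._entries = deque()        # oldest entry on the left

    def add(self, digest: str, now: float) -> None:
        self._entries.append((now, digest))

    def expire(self, now: float) -> list:
        """Remove and return entries older than the window."""
        expired = []
        while self._entries and now - self._entries[0][0] > self.window:
            expired.append(self._entries.popleft())
        return expired

    def count(self, digest: str, now: float) -> int:
        """Count sightings of a digest that fall inside the window."""
        self.expire(now)
        return sum(1 for _, d in self._entries if d == digest)
```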
  • step 390 determines whether there are further messages waiting to be processed by the mail server, and if so, the process will loop back to step 300 and read the next message into memory. If no more messages are waiting, the process returns to a quiescent state in which it may perform background tasks such as transferring older hashes out of cache memory and onto disk whilst waiting for further messages to arrive.
  • the digest is compared in step 320 against a "white list" of approved digests believed to be harmless.
  • These typically include test messages sent routinely, particularly by new users, to determine whether a mail system is functioning correctly. If the message is found to match such a digest, as indicated by 322, the message is allowed to proceed and the process will loop back to step 300 accordingly.
  • the sender/destination information (as indicated by the From:, To:, Cc: and Bcc: headers) and/or other header information stored with those hashes are compared.
  • a check is performed to see if the digests share a common source and destination, and if so, a count value for this instance is incremented in step 332.
  • One or more of a variety of actions may be performed, depending on the sender/recipient information and on how large a count value has accrued for the hash in question.
  • the most basic action possible is to allow the message to continue. This is appropriate if the tally lies below a small threshold value (e.g. 5), as the sender and recipient details are the same. Such an instance could correspond to a simple resend.
  • the message could be flagged as unsolicited, e.g. by displaying the message with a "Suspicious" or "Junk" prefix in the subject line on the video display 108, and transmitting only the header information, to save the recipient from downloading the message.
  • a second value e.g. 20
  • the thresholds and attributes used for distinguishing a flagged message may be user-configurable.
  • If this second threshold is exceeded, it is likely that the message could be a mailbomb or a form of cracker attack, such as a Denial of Service (DoS) attack.
  • the message could be deleted at the server.
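  • For the case where sender and recipient details are the same, the escalation from delivery to flagging to deletion can be sketched as follows. The function name and the string return values are hypothetical; the default thresholds of 5 and 20 are the example values given in the text, and treating a count equal to the first threshold as "flag" is an assumption.

```python
def action_for_repeated_message(count: int,
                                resend_threshold: int = 5,
                                attack_threshold: int = 20) -> str:
    """Choose an action for a message repeatedly sent from one sender to one recipient."""
    if count < resend_threshold:
        return "deliver"   # plausibly a simple resend
    if count <= attack_threshold:
        return "flag"      # mark the subject and deliver header information only
    return "delete"        # possible mailbomb or denial-of-service attempt
```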
  • the digests may share a common source but have different destination information as indicated by 330. Again, a counter is incremented in step 342.
  • Different thresholds and actions would correspondingly apply in this instance.
  • the lowest threshold in this case may be set at a considerably higher value, such as 100, as a user may legitimately send the same message to a variety of recipients, for example a business newsletter, party invitation or information concerning a change of address.
  • the message could be a spam message or an email worm, as indicated by 344.
  • a condensed form of the message could be safely delivered to the recipient (comprising the first and last 512 bytes of the message by way of example), together with an audit notification to inform the recipient that the message is suspected to be unsolicited.
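  • The condensed form can be sketched in a few lines. The function name condense is hypothetical, and returning short messages unchanged (rather than duplicating overlapping bytes) is an assumption.

```python
def condense(message: bytes, edge: int = 512) -> bytes:
    """Return the first and last `edge` bytes of a message.

    Messages no longer than 2 * edge bytes are returned unchanged so
    that no byte appears twice in the condensed form.
    """
    if len(message) <= 2 * edge:
        return message
    return message[:edge] + message[-edge:]
```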
  • This action could inhibit harmful behaviour of the worm whilst enabling the user to verify whether the message is unsolicited and to request the entire message, if necessary.
  • a message could share a common recipient as indicated by 350. Again, a counter for this instance is incremented in step 352. Such a situation could also correspond to either spam or worm activity.
  • a message may share a variety of senders and recipients.
  • the threshold for taking the action of deleting the message may be set very low (e.g. 40). Such a message is likely to be an email worm, especially when the subject line is found to match in most cases and the majority of timestamps are found to lie within a very short period, such as seven days or less.
  • a scoring system may be used to lower the threshold required if the file attachments, if any, are found to correspond, or the subject line is found to match.
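  • One way such a scoring system could lower the threshold is sketched below. The function name effective_threshold and the halving factor are purely illustrative assumptions; the description does not specify how the score is computed.

```python
def effective_threshold(base: int,
                        subject_matches: bool,
                        attachments_match: bool,
                        reduction: float = 0.5) -> int:
    """Lower the base threshold for each piece of corroborating evidence."""
    threshold = float(base)
    if subject_matches:
        threshold *= reduction     # assumption: each match halves the threshold
    if attachments_match:
        threshold *= reduction
    return max(1, int(threshold))  # never drop below one message
```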
  • Other options include changing the message attributes so that it may not be delivered or opened other than by a system administrator, and/or placing the file in a "quarantine zone": an area of filespace with restricted access for review by a system administrator.
  • quarantine zones are largely conventional in the art, e.g. used by junk and spam mail filtering programs to filter mail which is thought to be unsolicited.
  • these options and thresholds would remain configurable by a system administrator.
  • a further application of the present invention is in tracking the originator of a spam message or worm.
  • a system operator could perform a search of all messages that matched the "hash" value, to determine who sent the first message and where it originated. For instance, the internet protocol (IP) address stored with each message could be used to provide this information. Depending on the number of "fields" stored with the digest, more detailed information could be obtained. Appropriate action could then be taken to identify the perpetrator.
  • IP internet protocol
  • Such a method is more effective than simply scanning the header fields in a single spam message, as these can be substituted by fraudulent header fields, often by experienced perpetrators of spam messages.
  • a typical example is the substitution of the "received:" header.
  • although the software is primarily intended for use in a mail server of an Internet Service Provider (ISP), or in an organisation's mail transfer agent (MTA), the software could also be used at the client end, or mail user agent (MUA). Such software would function on an identical principle. However, it is preferable for the software to run on a mail server or MTA for the following reasons:
  • the software controls delivery of the messages, and therefore is able to maximise use of a client's bandwidth to transferring legitimate messages.
  • Yet a further extension of the above method is to use the software to maintain a store of digests for outgoing messages.
  • the software could alert a user if he or she is suspected of circulating spam messages, or file attachments which may include a virus worm.
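
By way of illustration only, the originator-tracking search described above (finding the first stored message whose digest matches) may be sketched as follows. The `DigestRecord` type, its field names and the `earliest_match` helper are assumptions introduced for this example; the specification does not prescribe a storage layout.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DigestRecord:
    """Hypothetical record stored alongside each digest."""
    digest: str
    sender: str
    origin_ip: str
    received_at: datetime


def earliest_match(records, digest):
    """Return the earliest stored record with the given digest, or None."""
    matches = [r for r in records if r.digest == digest]
    return min(matches, key=lambda r: r.received_at, default=None)


records = [
    DigestRecord("abc", "carol@example.org", "198.51.100.7", datetime(2001, 9, 3, 12, 0)),
    DigestRecord("abc", "mallory@example.net", "192.0.2.10", datetime(2001, 9, 3, 9, 30)),
    DigestRecord("def", "alice@example.com", "203.0.113.5", datetime(2001, 9, 2, 8, 0)),
]

first = earliest_match(records, "abc")
print(first.sender, first.origin_ip)
```

Because the search is keyed on the content digest rather than on header fields, a forged header in any single message does not hide the earlier records.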

Abstract

A method for monitoring electronic mail messages, each mail message comprising header information and a main body, particularly for protection against virus attacks and unsolicited commercial email (UCE). The method comprises generating a summary digest of only the subject line and the message content of the main body, wherein the message content may comprise textual content and/or attached files. The generated summary digest is stored in a memory, and compared with existing summary digests stored in memory. If the number of matches exceeds a threshold value, an alert signal is raised and appropriate action initiated. A timestamp may be stored with each summary digest, together with sender/recipient details and the internet protocol (IP) address of origin, to aid detection of the originator of the message.

Description

Monitoring electronic mail message digests
Technical Field of the Invention
This invention relates to networked computer systems security in general and protection against Denial of Service (DoS) attacks, virus attacks and unsolicited commercial email in particular. More specifically, this invention concerns an apparatus and method for managing electronic mail message processing.
Background to the Invention
Recent years have witnessed a proliferation in the use of the Internet. Many stand-alone computers and local area networks connect to the Internet for exchanging various items of information and/or communicating with other networks.
Such systems are advantageous in that they can exchange a wide variety of different items of information at a low cost with servers and networks on the Internet and other networks.
However, the inherent accessibility of the Internet increases the vulnerability of a system to threats such as cracker attacks, Denial of Service (DoS) attacks, viruses and unsolicited commercial email (UCE). Around 5-10 new viruses are discovered each day on the popular Windows-based operating systems. Especially insidious are those that propagate through the Internet, for example by using mail messages as a transport mechanism, known as email worms. The concern for advanced security solutions for networked systems in particular is therefore substantial. In contrast to traditional viruses which are designed to spread themselves on a single computer using the file handling capabilities of an operating system, a worm exploits a computer's networking capabilities as the transport mechanism to enable it to infect other machines. Email worms, for instance, may comprise a file attachment to a mail message, or comprise a script or code embedded into the text of a mail message. The worm may exploit scripting capabilities of internet mail client software to send malicious code to other users on a mailing list or newsgroup automatically, and thus appear as having originated from a legitimate sender.
In particular, an email worm may exploit a user's email address book to obtain targets to spread to. To the recipient, the worm would appear to have been sent by a familiar source, as their presence in the address book would imply that previous email dialogue had taken place. Therefore, a further email apparently from that source would not appear out of the ordinary. Such emails may comprise a suitably benign message as further means of deception to the recipients.
Especially threatening are email worms which require no action on the part of the recipient (such as opening a mail message) to install and activate the malicious code. For instance, some mail clients, such as Outlook Express (™) and Outlook (™) support a "preview" pane which typically scans all unread messages in a user's electronic mailbox. Such a "preview" is sufficient to install and activate such a worm if present amongst the unread messages.
Other types of worms are known. For instance, a worm could spread over a local area network (LAN), wide area network (WAN) or peer-to-peer network directly by determining the network addresses of other computers on the network and sending copies of its code to these addresses. Such files would appear to originate from a legitimate machine.
A further category of worm spreads using internet relay chat (IRC), a protocol allowing real time communications between Internet users and other features such as transfer of data and executable files over IRC channels. Popular IRC client software supports scripting features. For example, an IRC script could send a message or data file automatically to specified users when they connect to an IRC channel. Special scripting commands allow execution of DOS and Windows executable files allowing infected scripts to propagate and transfer worm code to other machines.
A growing concern amongst Internet users is that of unsolicited commercial email (UCE), commonly referred to as "spam". The ease and low cost of distributing email messages has made mass marketing via email an attractive advertising medium, particularly for bogus homeworking schemes, business and investment opportunities, lottery schemes, money-making clubs and chain letters. Bogus schemes are often characterised by exaggerated earnings claims, glowing testimonials, "no risk" guarantees and legal assurances. Such "spam" messages have led to added costs for both recipients and internet service providers (ISPs) in the form of additional bandwidth, disk space, server resources, and lost productivity. Furthermore, many users consider "spam" messages offensive and an invasion of privacy. At its current growth rate, the problem of "spam" may become unmanageable.
A known approach for addressing the problem of "spam" is to use a mail filtering system implemented at an Internet Service Provider (ISP) server, organisation's mail transfer agent (MTA) or user terminal. Such a filtering system sorts incoming email into categories, typically determined by the recipient. WO 99/32985 discloses a system and method of filtering junk emails comprising a first list of unapproved email addresses and character strings which a user wishes not to receive and a second list of approved email addresses and character strings which the user wishes to receive. The first list is periodically updated on the basis of further mail messages that the user rejects. However, to remain effective, the filtering instructions must frequently be updated as new junk mail messages appear. Furthermore, it does not aid in tracking the originator of "spam" messages.
EP 0720333A discloses a technique for reducing the quantity of "spam" messages received by a user whereby a recipient specifier containing non-address information is added to an email message. The mail filter for a given recipient has access to information about that recipient and uses that information together with the non-address information in the email message to determine whether the message should be provided to the given recipient. If the non-address information and the information about the recipient indicate that the given recipient should not receive the message, the filter does not provide it. Such "non-address information", however, comprises two additional components, namely a recipient specifier and a referral list, which accompany the email message. These components must be adopted by all parties wishing to communicate with the specified user for the method to be effective. Furthermore, said information increases the size of the mail message and hence storage space and bandwidth requirements. In addition, this method is of limited use in detecting email worms as email worms are characterised by their traffic behaviour rather than by any particular sender.
In the method of WO 99/33188, a system is provided for controlling delivery of unsolicited electronic mail messages, whereby "spam" probe email addresses are created and planted at various sites on a communications network in order to ensure their inclusion on large-scale "spam" mailing lists. Upon receipt of incoming email addressed to the spam probe addresses, the received email is analysed automatically to identify the source of the message, characteristic spam source data extracted from the message, and an alert signal generated containing the spam source data. A filtering system implemented at the servers receives the alert signal and updates filtering data using the spam source data retrieved from the alert signal, and controls delivery of subsequently received email messages received from the identified spam source.
One shortcoming of this method is that, whilst convenient for monitoring spam, it is of limited utility against a DoS attack, email worm or other traffic based mechanism, such as an incorrectly configured mail system that accidentally loops messages.
Accordingly, there is a need for a system that automatically and efficiently identifies unsolicited or threatening email messages and controls the delivery of these messages to users, for example by preventing delivery of these messages, or by identifying the messages as unsolicited by displaying the messages in a distinctive display mode.
Statement of the Invention
It is an object of the present invention to provide a system and a computer-implemented method for automatically and efficiently identifying recurrent mail messages. Such a method is particularly advantageous for users of electronic mail, for example, in filtering email worms and unsolicited commercial email (UCE) .
According to a first aspect of the present invention, there is provided a method for monitoring electronic mail messages, wherein each mail message may comprise:
(a) subject header information
(b) non-subject header information
(c) a main body, comprising message content;
the method comprising: receiving an electronic mail message, generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; storing said generated characteristic numerical representation in a memory; and comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
Such a method is advantageous as a large quantity of data may be rapidly and automatically searched for matches, whilst significantly reducing the amount of data to be stored. When implemented, for example, in a mail server or organisation's mail transfer agent (MTA), a considerable database can be assembled over a time period. This ensures that the searching is extensive and also enables monitoring of traffic activity. Preferably, the step of generating comprises generating a characteristic numerical representation of at least a part of said message content and of said subject header information.
Preferably, the step of generating a characteristic numerical representation of at least a part of said message content comprises generating a characteristic numerical representation of at least textual content in said message content.
Preferably, the step of generating a characteristic numerical representation of at least a part of said message content comprises generating a characteristic numerical representation of at least attached files in said message content. Generating characteristic numerical representations of attached files assists in monitoring of email worms which rely on file attachments in order to propagate. In an embodiment, the step of generating may produce a characteristic numerical representation of said file attachments only. This enables the speed of the search to be increased, for example, when scanning for virus worms.
Preferably, a timestamp of said electronic mail message is stored. Such a timestamp may be provided from a defined public time service, such as a public time server on the Internet. Preferably, the internet protocol (IP) address of origin of said electronic mail message is further stored. These assist in detecting the originator of said mail message.
Preferably, the header information is stored.
Further preferably, said header information comprises message source and destination details. Comparison of these allows determination of whether the message is a simple resend, a bulk circulated email, or transmitted via multiple routes.
Preferably, said generated characteristic numerical representation is a message digest, for example generated using a message digest 5 (MD5) algorithm or a hash, for example generated using secure hash algorithm 1 (SHA-1) . These algorithms produce a condensed, fixed size hash of 128 bits and 160 bits respectively, and are essentially unique, permitting a fast and straightforward comparison of the electronic mail messages to be made.
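As a minimal sketch of the digest sizes stated above, the following uses the MD5 and SHA-1 implementations in Python's standard `hashlib` module. The sample `content` bytes are an assumption made for the example.

```python
import hashlib

# Hypothetical subject line and body, concatenated for hashing.
content = b"Quarterly newsletter" + b"Dear customer, please find our latest news below."

md5_digest = hashlib.md5(content).digest()
sha1_digest = hashlib.sha1(content).digest()

print(len(md5_digest) * 8)   # 128
print(len(sha1_digest) * 8)  # 160

# Identical content always yields an identical digest, so two messages
# can be compared by comparing two short fixed-size values.
print(hashlib.md5(content).digest() == md5_digest)  # True
```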
Preferably, said method further comprises, if said characteristic numerical representation matches a characteristic numerical representation stored in memory, incrementing a count value associated with said characteristic numerical representation.
Further preferably, said method further comprises comparing said count value with a predetermined threshold. Still further preferably, said predetermined threshold is determined on the basis of said header information. This enables a different threshold to be set for characteristic numerical representations having the same sender and destination information, a same sender and a different destination, a different sender but a same destination, and different senders and destinations.
If said count value exceeds said predetermined threshold, an alert is preferably raised. Furthermore, said alert is preferably determined on the basis of said header information. Preferably, said alert comprises flagging said message with a marker prefixed to the subject line of the message. In an embodiment, said marker may comprise the word "JUNK", "SUSPICIOUS" and may further be user configurable.
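The selection of a threshold from the sender and destination relationship, followed by flagging of the subject line, might be sketched as below. The `Relation` enumeration, the `classify` and `flag_subject` helpers and all four threshold values are assumptions for illustration; in practice such values would be set by a system administrator.

```python
from enum import Enum


class Relation(Enum):
    SAME_SENDER_SAME_DEST = 1
    SAME_SENDER_DIFF_DEST = 2
    DIFF_SENDER_SAME_DEST = 3
    DIFF_SENDER_DIFF_DEST = 4


# Hypothetical threshold values chosen for illustration only.
THRESHOLDS = {
    Relation.SAME_SENDER_SAME_DEST: 5,
    Relation.SAME_SENDER_DIFF_DEST: 100,
    Relation.DIFF_SENDER_SAME_DEST: 20,
    Relation.DIFF_SENDER_DIFF_DEST: 40,
}


def classify(stored, incoming):
    """Compare two (sender, destination) pairs and name their relationship."""
    same_sender = stored[0] == incoming[0]
    same_dest = stored[1] == incoming[1]
    if same_sender and same_dest:
        return Relation.SAME_SENDER_SAME_DEST
    if same_sender:
        return Relation.SAME_SENDER_DIFF_DEST
    if same_dest:
        return Relation.DIFF_SENDER_SAME_DEST
    return Relation.DIFF_SENDER_DIFF_DEST


def flag_subject(subject, count, relation, marker="SUSPICIOUS"):
    """Prefix the marker to the subject when the count exceeds the threshold."""
    if count > THRESHOLDS[relation]:
        return f"{marker}: {subject}"
    return subject


relation = classify(("a@example.com", "b@example.com"), ("a@example.com", "c@example.com"))
print(relation.name)
print(flag_subject("Party invitation", 101, relation))
```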
Preferably, said alert comprises delivering a fixed portion of said message only. For example, the first 512 and last 512 bytes of a message could be delivered to the recipient. Preferably, said alert comprises delivering an audit notification to the recipient. This would enable the recipient to confirm that the message is unsolicited whilst economising on bandwidth.
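A condensed form of this kind could be produced as sketched below. The `condensed` function name is hypothetical, and returning a short message unchanged (rather than duplicating overlapping bytes) is an assumption not stated in the specification.

```python
def condensed(body: bytes, portion: int = 512) -> bytes:
    """Return the first and last `portion` bytes of a message body."""
    if len(body) <= 2 * portion:
        # Assumption: a message too short to truncate is passed through whole.
        return body
    return body[:portion] + body[-portion:]


long_body = bytes(range(256)) * 8   # 2048 bytes of sample data
print(len(condensed(long_body)))    # 1024
print(condensed(b"short message"))  # b'short message'
```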
Preferably, said alert comprises deleting the message. Preferably, said alert comprises deleting a file attachment from the message. This facility is useful in instances where the message has already been sent to a same destination a number of times, where a message is suspected of being an email worm, or in guarding a destination or mail server thereof against cracker attack.
In a preferred embodiment, said method further comprises allowing said message to be delivered if said characteristic numerical representation matches an approved characteristic numerical representation stored in memory. Such a "white list" of approved characteristic numerical representations may include those of routine test messages, or other routinely sent "bulk" messages.
Preferably, said characteristic numerical representation is stored in cache memory. Further preferably, the maximum number of characteristic numerical representations stored in cache memory is configurable and said cache memory may preferably be periodically copied to a storage means. The use of cache memory to store recent characteristic numerical representations allows the database of characteristic numerical representations to be searched more quickly.
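One possible arrangement of a size-limited cache is sketched below. The `DigestCache` class, its eviction policy (moving the least recently seen digest to storage once the configured maximum is exceeded) and the use of a plain dictionary to stand in for the storage means are all assumptions made for this example.

```python
from collections import OrderedDict


class DigestCache:
    """Hypothetical bounded cache mapping each digest to its match count."""

    def __init__(self, max_entries, store):
        self.max_entries = max_entries   # configurable maximum
        self.store = store               # dict standing in for a storage means
        self.entries = OrderedDict()

    def record(self, digest):
        """Note one occurrence of a digest and return its running count."""
        count = self.entries.pop(digest, 0) + 1
        self.entries[digest] = count     # re-insert as most recently seen
        while len(self.entries) > self.max_entries:
            old_digest, old_count = self.entries.popitem(last=False)
            self.store[old_digest] = old_count
        return count


store = {}
cache = DigestCache(max_entries=2, store=store)
for digest in ["a", "b", "a", "c"]:
    cache.record(digest)

print(list(cache.entries.items()))  # [('a', 2), ('c', 1)]
print(store)                        # {'b': 1}
```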
In accordance with a second aspect of the present invention, there is provided a software product for use in a server or organisation's mail transfer agent (MTA), for monitoring electronic mail messages, wherein each mail message may comprise:
(a) subject header information
(b) non-subject header information
(c) a main body, comprising message content;
the software containing code for: receiving an electronic mail message, generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; storing said generated characteristic numerical representation in a memory; and comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
In accordance with a third aspect of the present invention, there is provided a software product for use in a client's mail user agent (MUA) for monitoring electronic mail messages, wherein each mail message may comprise:
(a) subject header information
(b) non-subject header information
(c) a main body, comprising message content;
the software product containing code for: receiving an electronic mail message, generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; storing said generated characteristic numerical representation in a memory; and comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
In accordance with a fourth aspect of the present invention, there is provided a computer system for monitoring electronic mail messages, wherein each mail message may comprise:
(a) subject header information
(b) non-subject header information
(c) a main body, comprising message content;
the system comprising: means for receiving an electronic mail message, a processor for generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; a memory for storing said generated characteristic numerical representation; and means for comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
Brief description of the drawings
For a better understanding of the present invention, and to show more clearly how it may be carried into effect, reference will now be made, by way of example, to the accompanying drawings, in which:
Figure 1 is a block diagram of part of a computer network operating in accordance with the invention.
Figure 2 is a multi-part email message illustrating message body structure.
Figure 3 is a flowchart illustrating operation of a software product in accordance with the invention.
Detailed description of the preferred embodiments of the invention
Figure 1 of the accompanying drawings illustrates functional blocks of a server 100, such as a simple mail transfer protocol (SMTP) server, operable in accordance with the present invention. Server 100 comprises a central processing unit (CPU) 102 in communication with a memory 104. The CPU 102 can store and retrieve data to and from a storage means 106, and outputs display information to a video display 108.
Server 100 may be connected to and communicate with a private network 110 such as a local area network (LAN).
In addition, server 100 may be able to send and receive files to and from a public network 116 such as the Internet, using an ISDN, serial, Ethernet or other connection, preferably via a firewall 112 and router 11. Internet 116 comprises a vast number of computers and computer networks that are connected through communications links.
Alternatively, local area network 110 may itself be connected through a server to another network (not shown) such as the Internet.
Server 100 may further comprise input peripherals such as a terminal having a mouse and/or keyboard (not shown) and output peripherals such as a printer or sound generation hardware, as customary in the art. Server 100 runs operating system and networking software which may be stored on disc or provided in read-only memory (ROM). Data may be transferred to server 100 via a removable storage means (not shown) or through either of networks 110, 112.
One format of data that may be transferred between servers is electronic mail (email) messages, typically using the Simple Mail Transfer Protocol (SMTP). A server adapted for this purpose may also be known as a Message Transport System (MTS). A variety of mail server software implementing the SMTP and other protocols is available for popular hardware and operating systems.
Mail reader software, also known as Mail User Agents (MUA), allows end users, such as customers which connect to their ISP using a dial-up modem connection, to access and read their email, typically using the Post Office Protocol (POP3) or Internet Message Access Protocol (IMAP). In addition, messages may be sent, typically using the SMTP protocol.
The accepted standard format for messages carried by the Internet mail system is defined by Request for Comments (RFC) 822 (published on the Internet). As shown in Figure 2, email message 200 comprises header information 210, including subject header information 215, typically input by the sender of the message, and non-subject header information, including the identity of the sender and the intended recipient(s), and the date. The header information is followed by the body 230 of the message, conventionally separated from the header by a blank line 220.
The order of header information 210 is not important in most cases, and many headers 214 are optional, but the mandatory headers 212 that must be present are Date:, From: and one of To:, Cc: or Bcc:. The keyword may be in any mixture of capitals and small letters, so CC: is the same as Cc:. The subject header 215, although optional, is generally used.
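The case-insensitive treatment of header keywords can be observed with the parser in Python's standard `email` package, as in the sketch below. The sample message and the `has_mandatory_headers` helper are assumptions introduced for this example.

```python
from email import message_from_string

# Header keywords deliberately written in mixed case.
raw = (
    "DATE: Mon, 3 Sep 2001 10:00:00 +0000\n"
    "from: alice@example.com\n"
    "cC: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)
msg = message_from_string(raw)


def has_mandatory_headers(message):
    """Check for Date:, From: and at least one of To:, Cc: or Bcc:."""
    has_destination = any(message[name] is not None for name in ("To", "Cc", "Bcc"))
    return message["Date"] is not None and message["From"] is not None and has_destination


print(msg["Cc"])                   # bob@example.com
print(has_mandatory_headers(msg))  # True
```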
The main body 230 of the message may comprise plain American Standard Code for Information Interchange (ASCII) text, or may assume a multi-part format allowing textual and non-textual message bodies to be represented and exchanged without loss of information. For instance, the Multipurpose Internet Mail Extensions (MIME) standard was created to address limitations concerning the structure and content of an email. Such a multi-part format, defined by RFC2045 and RFC2046 (published on the Internet), also permits representation of body text in character sets other than ASCII through content encodings, as well as representation of non-textual material such as images and audio fragments.
Finally, the message body may be appended by an optional "signature", comprising a signature separator 300 followed by signature text 310. Conventionally, the signature separator 300 comprises two hyphens followed by a space but often may comprise a series of hyphens, asterisks or other characters. The signature text 310 typically includes a user's name, organisation, and contact details.
The structure of a multi-part message will now be described in detail. As Figure 2 shows, further fields 216 relating to multi-part information are added to the header 210. Multi-part headers 216 identify the message 200 as being in multi-part format, and are used by mail reader software to interpret the message 200 correctly. The fields 216 comprise distinctive boundary separator text 218 which is used to separate different parts of the message body 230 pertaining, for example, to textual content, enriched textual content such as HyperText Markup Language (HTML), and attached files. As an example, assuming a file attachment is present, field 216 may read:
Content-Type: multipart/mixed; boundary="ABCDEFG"
where "ABCDEFG" is the distinctive separator text 218.
Alternatively, the header may read
Content-Type: multipart/alternative; boundary="ABCDEFG"
the use of the word "alternative" indicating by convention that the same content is repeated in different parts of the message but enhanced in some of the repetitions (e.g. message text in both plain and HTML versions).
The different parts of the message body 230 are separated by boundary separators 240, 260 comprising two hyphens followed by the boundary separator text 218. Although body 230 is shown as consisting of two parts 250 and 270, email message 200 may comprise more or fewer parts.
Any preamble text 232 positioned above the first boundary separator 240 will typically be ignored by a mail reader that understands multi-part format messages but will be displayed by a mail reader which does not support the multi-part format. That is to say, preamble text 232 is not considered part of the message content. Conventionally, such preamble text 232 may therefore be used to alert the recipient to the fact that the mail reader being used is not compliant with the format of the message, as shown.
The structure of the body parts 250, 270 will now be described, again with reference to Figure 2. First body part 250 encloses header fields 252 and a body 256 which is separated from header 252 by a blank line 254. Header fields 252 conventionally define the type of data contained in body 256 and may specify any encoding necessary for the purposes of transfer of information contained in the body 256 through the mail system, so that the receiving program can reverse the process. A file conveyed within a message part body 256 in this way is known as an "attachment" to the message. Alternatively, header fields 252 may be empty, in which instance the body is conventionally interpreted as comprising US-ASCII text.
The structure of second body part 270, and any further body parts, is analogous to body part 250, and the reference numerals indicate like parts. Further body parts may then follow as generally indicated by 280.
To indicate the end of the multi-part body 230, the body 230 is terminated by an end of body separator text 290. End of body separator text 290 comprises two hyphens followed by the distinctive boundary separator text 218, followed by a further two hyphens.
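The boundary rules described above (two hyphens plus the separator text between parts, and a further two hyphens after the final separator) may be illustrated by the simplified splitter below. The `split_parts` function is hypothetical and ignores details a real MIME parser handles, such as trailing whitespace on delimiter lines and nested multi-part bodies.

```python
def split_parts(body, boundary):
    """Split a multi-part body into its parts, skipping any preamble text."""
    delimiter = "--" + boundary          # separates one part from the next
    terminator = delimiter + "--"        # marks the end of the multi-part body
    parts, current, in_part = [], [], False
    for line in body.splitlines():
        if line == terminator:
            if in_part:
                parts.append("\n".join(current))
            break
        if line == delimiter:
            if in_part:
                parts.append("\n".join(current))
            current, in_part = [], True
        elif in_part:
            current.append(line)
    return parts


body = "\n".join([
    "This is a multi-part message in MIME format.",   # preamble, ignored
    "--ABCDEFG",
    "Content-Type: text/plain",
    "",
    "Hello",
    "--ABCDEFG",
    "Content-Type: text/html",
    "",
    "<p>Hello</p>",
    "--ABCDEFG--",
])
parts = split_parts(body, "ABCDEFG")
print(len(parts))  # 2
```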
Reference will now be made to Figure 3, which describes the operation of an embodiment of the software in accordance with the invention. Preferably, the software is loaded permanently on a mail server, and remains in a quiescent state until an email message arrives at the server, typically from another server via SMTP. In step 300, the software intercepts an incoming mail message and reads it into memory. The software then determines whether the message is in multi-part format, and, if so, whether there are any file attachments to the message. Typically, this is done by examining the header for the following fields as part of step 300:
MIME-Version and Content-Type: multipart/xxx
where xxx may indicate a number of alternatives as customary in the art, such as "mixed", "alternative" and "parallel".
To determine whether the parts correspond to attached files, the body of the message may be scanned for the header field:
Content-Type: xxx
xxx will typically read "Text/plain", "Text/html" or similar, for a message without any non-textual file attachments. If file attachments are present, xxx may read "image/jpeg", "audio/basic" or a number of other possibilities corresponding to a variety of file formats as customary in the art. The header would typically read "Content-Type: multipart/mixed" accordingly.
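A minimal sketch of this check, assuming Python's standard `email` parser is used in place of a hand-written header scan, is given below. The sample message and the `non_text_parts` helper are hypothetical.

```python
from email import message_from_string

raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="ABCDEFG"\n'
    "Subject: Holiday photo\n"
    "\n"
    "--ABCDEFG\n"
    "Content-Type: text/plain\n"
    "\n"
    "See attached.\n"
    "--ABCDEFG\n"
    "Content-Type: image/jpeg\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "/9j/4AAQSkZJRg==\n"
    "--ABCDEFG--\n"
)
msg = message_from_string(raw)


def non_text_parts(message):
    """List the content types of all leaf parts that are not textual."""
    return [
        part.get_content_type()
        for part in message.walk()
        if not part.is_multipart() and part.get_content_maintype() != "text"
    ]


print(msg.is_multipart())   # True
print(non_text_parts(msg))  # ['image/jpeg']
```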
A characteristic numerical representation is generated for the combined subject line 215 and message content, in step 302. In the multi-part example given, the message content is obtained by taking the body 256 of the first part (text) 250 and the body 276 of the second part (image) 270, that is to say, any multi-part boundary information and any header information of each part is not considered to be part of the message content. Alternatively, if the message were not in the multi-part MIME format, the message content would be considered to be the entire body 230 of the message 200.
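The digest generation of step 302 might be sketched as follows, using SHA-1 from `hashlib` and the standard `email` parser. The `content_digest` and `make_message` names are assumptions for this example, as is the choice to hash each part body after reversing its transfer encoding; the specification does not say whether encoded or decoded bytes are used.

```python
import hashlib
from email import message_from_bytes


def content_digest(raw):
    """Hash the subject line and part bodies, ignoring all other headers."""
    msg = message_from_bytes(raw)
    h = hashlib.sha1()
    h.update(str(msg["Subject"] or "").encode("utf-8"))
    for part in msg.walk():
        if part.is_multipart():
            continue  # container only: boundaries carry no message content
        h.update(part.get_payload(decode=True) or b"")
    return h.hexdigest()


def make_message(sender, recipient, boundary):
    """Build a sample multi-part message (hypothetical test data)."""
    return (
        f"From: {sender}\r\nTo: {recipient}\r\nSubject: Hello\r\n"
        f'MIME-Version: 1.0\r\nContent-Type: multipart/mixed; boundary="{boundary}"\r\n'
        f"\r\n--{boundary}\r\nContent-Type: text/plain\r\n\r\nSame text\r\n"
        f"--{boundary}--\r\n"
    ).encode("ascii")


first = make_message("alice@example.com", "bob@example.com", "ABCDEFG")
second = make_message("carol@example.org", "dave@example.net", "ZYXWVUT")
print(content_digest(first) == content_digest(second))  # True
```

The two sample messages differ in sender, recipient and boundary text, yet produce the same digest because none of those elements is hashed.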
Such a characteristic numerical representation, or "hash" as known to one of skill in the art, is produced by a message digest algorithm, which takes a message of arbitrary length and produces a numerical representation comprising a number of bits sufficiently small to form a condensed digest of the original message and allow fast and straightforward searching, but sufficiently large to be essentially unique.
Example algorithms include Message Digest 5 (MD5), developed by Rivest in 1991 and documented in Internet RFC1321 (published on the Internet), which takes a message of arbitrary length and produces a 128-bit message digest. An alternative is the Secure Hash Algorithm 1 (SHA-1) developed by National Institute of Standards and Technology (NIST) which produces a 160-bit message digest. In both these cases, the messages are padded so they are a multiple of 512 bits long and processed in blocks if necessary.
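The padding rule can be made concrete with a short calculation. Both MD5 and SHA-1 append at least one padding byte followed by a 64-bit length field before rounding up to a whole number of 512-bit blocks; the `padded_length_bits` function name is an assumption for this sketch.

```python
def padded_length_bits(message_length_bytes):
    """Length in bits of a message after MD5 or SHA-1 padding."""
    # One mandatory padding byte (0x80) plus an 8-byte length field.
    minimum_bytes = message_length_bytes + 1 + 8
    blocks = -(-minimum_bytes // 64)   # ceiling division into 64-byte blocks
    return blocks * 512


print(padded_length_bits(0))    # 512
print(padded_length_bits(55))   # 512
print(padded_length_bits(56))   # 1024
```

A 56-byte message needs a second block because the padding byte and length field no longer fit in the first.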
Although in this example, a single digest has been generated for a multi-part message, in an alternative, variations such as generating a first digest for the subject only, a second digest for the message content or separate digests for each body part may be performed. A user, such as a System Administrator, may define which elements would be used in generating the digests.
However, since a characteristic of virus worms is that they may cause the same message to be sent from different sources to different users in the network, it is preferable not to use the non-subject header information in generating the digest, if such a virus is to be detected.
The generated digest (or digests) is stored in a memory, together with the internet protocol (IP) address of where the message came from, sender information and destination details, and a timestamp to serve as a record of when the message was received. For instance, the time from a defined public time service, such as a public time server on the internet, may be used. Other information from the header may be stored such as the Message-ID field. Again, these options may be configured by a System Administrator.
To determine whether the textual content of the message or the attached files match those from previous messages received by the mail server 100, the digest is compared with existing digests stored in memory in step 304 and any matches noted in step 310. To this end, the use of message digest algorithms is advantageous in that they greatly reduce the amount of data to be stored and hence enable fast searching, whilst remaining essentially unique and thus ensuring the probability of conflict with a different mail message is extremely low. These considerations are important given that a typical user may be sent between 20-40 emails on average over a single 24 hour period.
For the method to be most effective, ideally the digest should be compared with those of mail messages sent over approximately the previous seven days, as typical spam messages will recur within a short period. Taking an organisation with 8000 users as an example, in which approximately one quarter of the emails sent share common content and hence produce the same digest, the total minimum number of digests to be stored at any one time is therefore in the order of 7 days x 30 emails/day x 8000 users x 0.75 = 1,260,000. To accelerate the process, these most recent digests, in this case those of mail messages sent over the last seven days, are stored in a cache memory 306 which could be flushed to disk periodically.
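By way of illustration, the estimate can be reproduced in a few lines of Python. The variable names are hypothetical, and the storage figures are an extrapolation that counts only the raw digest bytes (16 for MD5, 20 for SHA-1), not the header information stored alongside each digest.

```python
days = 7
emails_per_user_per_day = 30   # midpoint of the 20 to 40 range
users = 8000
distinct_fraction = 0.75       # one quarter share common content

digests = int(days * emails_per_user_per_day * users * distinct_fraction)
print(digests)  # 1260000

# Raw digest storage only; stored header fields are not counted.
md5_bytes = digests * 16
sha1_bytes = digests * 20
print(md5_bytes, sha1_bytes)  # 20160000 25200000
```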
If the digest does not match an existing digest stored in cache memory 306, a check is performed in step 390 to determine whether there are further messages waiting to be processed by the mail server, and if so, the process will loop back to step 300 and read the next message into memory. If no more messages are waiting, the process returns to a quiescent state in which it may perform background tasks such as transferring older hashes out of cache memory and onto disk whilst waiting for further messages to arrive.
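By way of illustration, the cache lookup and the background transfer of older hashes might be sketched as follows. The class DigestCache and its method names are hypothetical, the seven-day retention follows the example in the description, and the entries removed from the cache are simply returned so that a caller could write them to disk.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)


class DigestCache:
    def __init__(self, retention: timedelta = RETENTION) -> None:
        self.retention = retention
        self._entries = {}  # digest -> list of receipt timestamps

    def check_and_store(self, digest: str, received_at: datetime) -> int:
        """Return the number of earlier matches inside the retention window."""
        cutoff = received_at - self.retention
        seen = self._entries.setdefault(digest, [])
        matches = sum(1 for t in seen if t >= cutoff)
        seen.append(received_at)
        return matches

    def evict_older_than(self, now: datetime) -> dict:
        """Remove entries outside the window and return them for archiving."""
        cutoff = now - self.retention
        evicted = {}
        for digest in list(self._entries):
            old = [t for t in self._entries[digest] if t < cutoff]
            keep = [t for t in self._entries[digest] if t >= cutoff]
            if old:
                evicted[digest] = old
            if keep:
                self._entries[digest] = keep
            else:
                del self._entries[digest]
        return evicted


cache = DigestCache()
t0 = datetime(2001, 8, 29, tzinfo=timezone.utc)
first = cache.check_and_store("abc", t0)                       # no match yet
second = cache.check_and_store("abc", t0 + timedelta(days=1))  # one match
```

A return value of zero corresponds to the no-match branch, in which the process moves on to the next waiting message.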
Alternatively, if the digest matches an existing digest stored in memory, the following operations are performed:
Initially, the digest is compared in step 320 against a "white list" of approved digests believed to be harmless. These typically include test messages sent routinely, particularly by new users, to determine whether a mail system is functioning correctly. If the message is found to match such a digest, as indicated by 322, the message is allowed to proceed and the process will loop back to step 300 accordingly.
Alternatively, if the digest matched is not an "approved" digest, the sender/destination information (as indicated by the From:, To:, Cc: and Bcc: headers) and/or other header information stored with those hashes are compared. Firstly, in step 330, a check is performed to see if the digests share a common source and destination, and if so, a count value for this instance is incremented in step 332. One or more of a variety of actions may be performed, depending on the sender/recipient information and on how large a count value has accrued for the hash in question.
The most basic action possible is to allow the message to continue. This is appropriate if the tally lies below a small threshold value (e.g. 5), as the sender and recipient details are the same. Such an instance could correspond to a simple resend.
However, if the threshold is exceeded, but still lies below a second value (e.g. 20), the message could be flagged as unsolicited, e.g. by displaying the message with a "Suspicious" or "Junk" prefix in the subject line on the video display 110, and transmitting only the header information, to save the recipient from downloading the message. Again, the thresholds and attributes used for distinguishing a flagged message may be user-configurable.
Finally, if this second threshold is exceeded, it is likely that the message could be a mailbomb or a form of cracker attack, such as a Denial of Service (DoS) attack. In this case, the message could be deleted at the server.
A second possibility is that the digests may share a common source but have different destination information as indicated by 330. Again, a counter is incremented in step 342. Different thresholds and actions would correspondingly apply in this instance. As a variety of recipients are involved, the lowest threshold in this case may be set at a considerably higher value, such as 100, as a user may legitimately send the same message to a variety of recipients, such as a business newsletter, party invitation or information concerning a change of address. However, it is more conventional for a user to do this by specifying multiple destinations in the header information rather than to send separate messages each containing the same information.
If this threshold is exceeded, the message could be a spam message or an email worm, as indicated by 344. As well as flagging a message as suspicious, a condensed form of the message could be safely delivered to the recipient (comprising the first and last 512 bytes of the message by way of example), together with an audit notification to inform the recipient that the message is suspected to be unsolicited. This action could inhibit harmful behaviour of the worm whilst enabling the user to verify whether the message is unsolicited and to request the entire message, if necessary.
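By way of illustration, the condensed form might be produced as follows. The function name condensed_form and the wording of the marker placed between the two fragments are assumptions made for this example; the 512-byte figure is the example given in the description.

```python
# Hypothetical marker showing where content has been withheld.
MARKER = b"\r\n[... content withheld ...]\r\n"


def condensed_form(body: bytes, edge: int = 512) -> bytes:
    """Keep only the first and last `edge` bytes of a message body."""
    if len(body) <= 2 * edge:
        return body  # already short enough, nothing to withhold
    return body[:edge] + MARKER + body[-edge:]


body = bytes(range(256)) * 8  # 2048 bytes of sample data
short = condensed_form(body)
print(len(body), len(short))
```

Any attachment carried in the middle of a long message is thereby cut out of what is delivered, while enough text remains for the recipient to judge whether to request the entire message.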
In yet a further possibility, a message could share a common recipient as indicated by 350. Again, a counter for this instance is incremented in step 352. Such a situation could also correspond to either spam or worm activity. In still yet a further alternative, a message may share a variety of senders and recipients. The threshold for taking the action of deleting the message may be set very low (e.g. 40). Such a message is likely to be an email worm, especially when the subject line is found to match in most cases and the majority of timestamps are found to lie within a very short period, such as seven days or less. A scoring system may be used to lower the threshold required if the file attachments, if any, are found to correspond, or the subject line is found to match.
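By way of illustration, the threshold logic for the cases in which example values are given above might be sketched as follows. The names Pattern, Action and decide are hypothetical, the numeric thresholds are the examples from the description (5, 20, 100 and 40), and a count equal to a threshold is treated here as having exceeded it, which is an assumption. The common-recipient case is omitted because no example thresholds are given for it.

```python
from enum import Enum


class Pattern(Enum):
    SAME_SOURCE_AND_DESTINATION = "same source and destination"
    SAME_SOURCE_ONLY = "same source, different destinations"
    VARIED = "variety of senders and recipients"


class Action(Enum):
    DELIVER = "deliver"
    FLAG = "flag as suspicious, send headers only"
    CONDENSE = "flag, deliver condensed form with audit notification"
    DELETE = "delete at server"


RESEND_THRESHOLD = 5      # below this, treat as a simple resend
MAILBOMB_THRESHOLD = 20   # at or above this, possible mailbomb or DoS
BULK_THRESHOLD = 100      # one sender, many recipients
WORM_THRESHOLD = 40       # many senders, many recipients


def decide(pattern: Pattern, count: int) -> Action:
    if pattern is Pattern.SAME_SOURCE_AND_DESTINATION:
        if count < RESEND_THRESHOLD:
            return Action.DELIVER
        if count < MAILBOMB_THRESHOLD:
            return Action.FLAG
        return Action.DELETE
    if pattern is Pattern.SAME_SOURCE_ONLY:
        return Action.CONDENSE if count >= BULK_THRESHOLD else Action.DELIVER
    if pattern is Pattern.VARIED:
        return Action.DELETE if count >= WORM_THRESHOLD else Action.DELIVER
    raise ValueError(f"unhandled pattern: {pattern}")


print(decide(Pattern.SAME_SOURCE_AND_DESTINATION, 10))
```

In practice the constants would be read from configuration, since the thresholds and actions are intended to remain adjustable by a system administrator.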
It will be understood that this aspect of the process is subject to variations as customary in the art. For example, other options include changing the message attributes so that it may not be delivered or opened other than by a system administrator, and/or placing the file in a "quarantine zone", an area of filespace with restricted access for review by a system administrator. Such quarantine zones are largely conventional in the art, e.g. used by junk and spam mail filtering programs to filter mail which is thought to be unsolicited. Typically, these options and thresholds would remain configurable by a system administrator.
A further application of the present invention is in tracking the originator of a spam message or worm. A system operator could perform a search of all messages that matched the "hash" value, to determine who sent the first message and where it originated. For instance, the internet protocol (IP) address stored with each message could be used to provide this information. Depending on the number of "fields" stored with the digest, more detailed information could be obtained. Appropriate action could then be taken to identify the perpetrator. Such a method is more effective than simply scanning the header fields in a single spam message, as these can be substituted by fraudulent header fields, often by experienced perpetrators of spam messages. A typical example is the substitution of the "Received:" header.
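By way of illustration, the search for the first message carrying a given digest might be sketched as follows. The Sighting record, the function first_sighting and the sample addresses are hypothetical; the earliest stored timestamp is taken as the first message.

```python
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Optional


class Sighting(NamedTuple):
    digest: str
    source_ip: str
    sender: str
    received_at: datetime


def first_sighting(records: Iterable[Sighting], digest: str) -> Optional[Sighting]:
    """Return the earliest stored record with the given digest, if any."""
    matches = [r for r in records if r.digest == digest]
    return min(matches, key=lambda r: r.received_at, default=None)


records = [
    Sighting("abc", "198.51.100.7", "x@example.org",
             datetime(2001, 8, 29, 9, 30, tzinfo=timezone.utc)),
    Sighting("abc", "192.0.2.10", "y@example.com",
             datetime(2001, 8, 29, 9, 5, tzinfo=timezone.utc)),
    Sighting("def", "203.0.113.4", "z@example.net",
             datetime(2001, 8, 28, 8, 0, tzinfo=timezone.utc)),
]

origin = first_sighting(records, "abc")
print(origin.source_ip, origin.sender)
```

Because the source address is recorded by the server at the time of receipt, it does not depend on header fields that a sender could have falsified.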
There is thus described a method, software product and a computer system which provide for detecting email worms and spam messages.
Although the software is primarily intended for use in a mail server of an Internet Service Provider (ISP), or in an organisation's mail transfer agent (MTA), the software could also be used at the client end, or mail user agent (MUA). Such software would function on an identical principle. However, it is preferable for the software to run on a mail server or MTA for the following reasons:
(1) the software controls delivery of the messages, and therefore is able to maximise use of a client's bandwidth to transferring legitimate messages.
(2) the database of previous digests to compare with is much larger than available on a client's machine, and so there is a relatively high chance of spotting a spam message or email worm.
The ability to run the software on an end user's machine is of use, however, where the user's ISP does not run software of this type, or the user has a need to download mail from several POP3 or IMAP accounts.
Yet a further extension of the above method is to use the software to maintain a store of digests for outgoing messages. The software could alert a user if he or she is suspected of circulating spam messages, or file attachments which may include a virus worm.
It is noted that the various options described above may be programmed or configured by a user and that the above detailed description of preferred embodiments of the invention is provided by way of example only. Other modifications which are obvious to a person skilled in the art may be made without departing from the true scope of the invention, as defined in the appended claims.

Claims

1. A method for monitoring electronic mail messages, wherein each mail message may comprise:
(a) subject header information
(b) non-subject header information
(c) a main body, comprising message content; the method comprising: receiving an electronic mail message, generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; storing said generated characteristic numerical representation in a memory; and comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
2. A method for monitoring electronic mail messages as claimed in claim 1, in which said step of generating comprises generating a characteristic numerical representation of at least a part of said message content and of said subject header information.
3. A method for monitoring electronic mail messages as claimed in claims 1 or 2, in which said step of generating a characteristic numerical representation of at least a part of said message content comprises generating a characteristic numerical representation of at least textual content in said message content.
4. A method for monitoring electronic mail messages as claimed in any preceding claim, in which said step of generating a characteristic numerical representation of at least a part of said message content comprises generating a characteristic numerical representation of at least attached files in said message content.
5. A method for monitoring electronic mail messages as claimed in any preceding claim, in which the step of storing further comprises storing a timestamp of said electronic mail message.
6. A method for monitoring electronic mail messages as claimed in any preceding claim, in which the step of storing further comprises storing the internet protocol (IP) address of origin of said electronic mail message.
7. A method for monitoring electronic mail messages as claimed in any preceding claim, in which said step of storing further comprises storing header information of said electronic mail message.
8. A method for monitoring electronic mail messages as claimed in claim 7, in which said step of storing further comprises storing message source and destination details.
9. A method for monitoring electronic mail messages as claimed in any preceding claim, in which the characteristic numerical representation is a message digest.
10. A method for monitoring electronic mail messages as claimed in claim 9, in which said message digest is generated using a message digest 5 (MD5) algorithm.
11. A method for monitoring electronic mail messages as claimed in claim 9, in which said message digest is generated using a secure hash algorithm 1 (SHA-1) algorithm.
12. A method for monitoring electronic mail messages as claimed in any preceding claim, further comprising, if said characteristic numerical representation matches a characteristic numerical representation stored in memory, the step of incrementing a count value associated with said characteristic numerical representation.
13. A method for monitoring electronic mail messages as claimed in claim 12, further comprising the step of comparing said count value with a predetermined threshold.
14. A method for monitoring electronic mail messages as claimed in claim 13, in which said predetermined threshold is determined on the basis of said header information.
15. A method for monitoring electronic mail messages as claimed in claims 13 or 14, further comprising, if said count value exceeds said predetermined threshold, raising an alert.
16. A method for monitoring electronic mail messages as claimed in claim 15, in which said alert is determined on the basis of said header information.
17. A method for monitoring electronic mail messages as claimed in claims 15 or 16, in which said alert comprises a marker prefixed to the subject line of the message.
18. A method for monitoring electronic mail messages as claimed in claims 15 to 17, in which said alert comprises delivering a fixed portion of said message.
19. A method for monitoring electronic mail messages as claimed in claims 15 to 18, in which said alert comprises delivering an audit notification to the recipient.
20. A method for monitoring electronic mail messages as claimed in claims 15 to 19, in which said alert comprises deleting said message.
21. A method for monitoring electronic mail messages as claimed in claims 15 to 19, in which said alert comprises deleting a file attachment from the message.
22. A method for monitoring electronic mail messages as claimed in any preceding claim, comprising allowing said message to be delivered if said characteristic numerical representation matches an approved characteristic numerical representation stored in memory.
23. A method for monitoring electronic mail messages as claimed in any preceding claim, in which said characteristic numerical representation is stored in cache memory.
24. A method for monitoring electronic mail messages as claimed in claim 23, in which said cache memory may be periodically copied to a storage means.
25. A software product for use in a server or organisation's mail transfer agent (MTA), for monitoring electronic mail messages, wherein each mail message may comprise:
(a) subject header information
(b) non-subject header information
(c) a main body, comprising message content; the software containing code for: receiving an electronic mail message, generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; storing said generated characteristic numerical representation in a memory; and comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
26. A software product for use in a client's mail user agent (UA) , for monitoring electronic mail messages, wherein each mail message may comprise:
(a) subject header information
(b) non-subject header information
(c) a main body, comprising message content; the software containing code for: receiving an electronic mail message, generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; storing said generated characteristic numerical representation in a memory; and comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
27. A software product as claimed in claims 25 or 26, in which said code for generating comprises code for generating a characteristic numerical representation of at least a part of said message content and of said subject header information.
28. A software product as claimed in any preceding claim, in which said code for generating a characteristic numerical representation of at least a part of said message content comprises code for generating a characteristic numerical representation of at least textual content in said message content.
29. A software product as claimed in any preceding claim, in which said code for generating a characteristic numerical representation of at least a part of said message content comprises code for generating a characteristic numerical representation of at least attached files in said message content.
30. A software product as claimed in any preceding claim, in which the code for storing further comprises code for storing a timestamp of said electronic mail message.
31. A software product as claimed in any preceding claim, in which the code for storing further comprises code for storing the internet protocol (IP) address of origin of said electronic mail message.
32. A software product as claimed in any preceding claim, in which said code for storing further comprises code for storing header information of said electronic mail message.
33. A software product as claimed in claim 32, in which said code for storing further comprises code for storing message source and destination details.
34. A software product as claimed in any preceding claim, in which the characteristic numerical representation is a message digest.
35. A software product as claimed in claim 34, in which said message digest is generated using a message digest 5 (MD5) algorithm.
36. A software product as claimed in claim 35, in which said message digest is generated using a secure hash algorithm 1 (SHA-1) algorithm.
37. A software product as claimed in any preceding claim, further comprising, if said characteristic numerical representation matches a characteristic numerical representation stored in memory, code for incrementing a count value associated with said characteristic numerical representation.
38. A software product as claimed in claim 37, further comprising code for comparing said count value with a predetermined threshold.
39. A software product as claimed in claim 37, in which said predetermined threshold is determined on the basis of said header information.
40. A software product as claimed in claims 38 or 39, further comprising, if said count value exceeds said predetermined threshold, code for raising an alert.
41. A software product as claimed in claim 40, in which said alert is determined on the basis of said header information.
42. A software product as claimed in claims 40 or 41, in which said alert comprises a marker prefixed to the subject line of the message.
43. A software product as claimed in claims 40 to 42, in which said alert comprises delivering a fixed portion of said message.
44. A software product as claimed in claims 40 to 43, in which said alert comprises delivering an audit notification to the recipient.
45. A software product as claimed in claims 40 to 44, in which said alert comprises deleting said message.
46. A software product as claimed in claims 40 to 44, in which said alert comprises deleting a file attachment from the message.
47. A software product as claimed in any preceding claim, comprising code for allowing said message to be delivered if said characteristic numerical representation matches an approved characteristic numerical representation stored in memory.
48. A software product as claimed in any preceding claim, in which said characteristic numerical representation is stored in cache memory.
49. A software product as claimed in claim 48, in which said cache memory may be periodically copied to a storage means.
50. A computer system for monitoring electronic mail messages, wherein each mail message may comprise:
(a) subject header information
(b) non-subject header information
(c) a main body, comprising message content; the system containing: means for receiving an electronic mail message, means for generating a characteristic numerical representation of at least a part of said message content, but not of said non-subject header information; a memory for storing said generated characteristic numerical representation; and means for comparing said characteristic numerical representation with each characteristic numerical representation stored in memory.
51. A computer system for monitoring electronic mail messages as claimed in claim 50, in which said means for generating comprises means for generating a characteristic numerical representation of at least a part of said message content and of said subject header information.
52. A computer system for monitoring electronic mail messages as claimed in claims 50 or 51, in which said means for generating a characteristic numerical representation of at least a part of said message content comprises means for generating a characteristic numerical representation of at least textual content in said message content.
53. A computer system for monitoring electronic mail messages as claimed in claims 51 or 52, in which said means for generating a characteristic numerical representation of at least a part of said message content comprises means for generating a characteristic numerical representation of at least attached files in said message content.
54. A computer system for monitoring electronic mail messages as claimed in any preceding claim, in which the means for storing further comprises means for storing a timestamp of said electronic mail message.
55. A computer system for monitoring electronic mail messages as claimed in any preceding claim, in which the means for storing further comprises means for storing the internet protocol (IP) address of origin of said electronic mail message.
56. A computer system for monitoring electronic mail messages as claimed in any preceding claim, in which said means for storing further comprises means for storing header information of said electronic mail message.
57. A computer system for monitoring electronic mail messages as claimed in claim 56, in which said means for storing further comprises means for storing message source and destination details.
58. A computer system for monitoring electronic mail messages as claimed in any preceding claim, in which the characteristic numerical representation is a message digest.
59. A computer system for monitoring electronic mail messages as claimed in claim 58, in which said message digest is generated using a message digest 5 (MD5) algorithm.
60. A computer system for monitoring electronic mail messages as claimed in claim 58, in which said message digest is generated using a secure hash algorithm 1 (SHA-1) algorithm.
61. A computer system for monitoring electronic mail messages as claimed in any preceding claim, further comprising, if said characteristic numerical representation matches a characteristic numerical representation stored in memory, means for incrementing a count value associated with said characteristic numerical representation.
62. A computer system for monitoring electronic mail messages as claimed in claim 61, further comprising means for comparing said count value with a predetermined threshold.
63. A computer system for monitoring electronic mail messages as claimed in claim 62, in which said predetermined threshold is determined on the basis of said header information.
64. A computer system for monitoring electronic mail messages as claimed in claims 62 or 63, further comprising, if said count value exceeds said predetermined threshold, means for raising an alert.
65. A computer system for monitoring electronic mail messages as claimed in claim 64, in which said alert is determined on the basis of said header information.
66. A computer system for monitoring electronic mail messages as claimed in claims 64 or 65, in which said alert comprises a marker prefixed to the subject line of the message.
67. A computer system for monitoring electronic mail messages as claimed in claims 64 to 66, in which said alert comprises means for delivering a fixed portion of said message.
68. A computer system for monitoring electronic mail messages as claimed in claims 64 to 67, in which said alert comprises delivering an audit notification to the recipient.
69. A computer system for monitoring electronic mail messages as claimed in claims 64 to 68, in which said alert comprises deleting said message.
70. A computer system for monitoring electronic mail messages as claimed in claims 64 to 68, in which said alert comprises deleting a file attachment from the message.
71. A computer system for monitoring electronic mail messages as claimed in any preceding claim, comprising means for allowing said message to be delivered if said characteristic numerical representation matches an approved characteristic numerical representation stored in memory.
72. A computer system for monitoring electronic mail messages as claimed in any preceding claim, in which said characteristic numerical representation is stored in cache memory.
73. A computer system for monitoring electronic mail messages as claimed in claim 72, in which said cache memory may be periodically copied to a storage means.
74. A method for monitoring electronic mail messages substantially as described herein with reference to Figure 3 of the accompanying drawings.
75. A software product for use in a server or organisation's mail transfer agent (MTA) for monitoring electronic mail messages substantially as described herein with reference to Figure 3 of the accompanying drawings.
76. A software product for use in mail user agent (UA) for monitoring electronic mail messages substantially as described herein with reference to Figure 3 of the accompanying drawings.
77. A computer system for monitoring electronic mail messages substantially as described herein with reference to Figure 3 of the accompanying drawings.
PCT/GB2001/003852 2000-08-31 2001-08-29 Monitoring electronic mail message digests WO2002019069A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP01960974A EP1368719A2 (en) 2000-08-31 2001-08-29 Monitoring electronic mail message digests
AU2001282359A AU2001282359A1 (en) 2000-08-31 2001-08-29 Monitoring electronic mail message digests
US10/362,840 US7801960B2 (en) 2000-08-31 2001-08-29 Monitoring electronic mail message digests

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0021444.5 2000-08-31
GB0021444A GB2366706B (en) 2000-08-31 2000-08-31 Monitoring electronic mail messages digests

Publications (2)

Publication Number Publication Date
WO2002019069A2 true WO2002019069A2 (en) 2002-03-07
WO2002019069A3 WO2002019069A3 (en) 2003-10-09

Family

ID=9898629

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2001/003852 WO2002019069A2 (en) 2000-08-31 2001-08-29 Monitoring electronic mail message digests

Country Status (5)

Country Link
US (1) US7801960B2 (en)
EP (1) EP1368719A2 (en)
AU (1) AU2001282359A1 (en)
GB (1) GB2366706B (en)
WO (1) WO2002019069A2 (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1280039A2 (en) * 2001-07-26 2003-01-29 Networks Associates Technology, Inc. Detecting e-mail propagated malware
WO2004098148A1 (en) * 2003-04-25 2004-11-11 Messagelabs Limited A method of, and system for detecting mass mailing computer viruses
WO2005081477A1 (en) * 2004-02-17 2005-09-01 Ironport Systems, Inc. Collecting, aggregating, and managing information relating to electronic messages
EP1644784A2 (en) * 2003-06-25 2006-04-12 Nokia Inc. Two-phase hash value matching technique in message protection systems
US7219131B2 (en) 2003-01-16 2007-05-15 Ironport Systems, Inc. Electronic message delivery using an alternate source approach
EP1422872B1 (en) * 2002-11-20 2007-05-30 Societé Française du Radiotéléphone Modular method and device for the tracing of a multimedia message through a telecommunications network
EP2169897A1 (en) * 2008-09-25 2010-03-31 Avira GmbH Computer-based method for the prioritization of potential malware sample messages
US7711779B2 (en) 2003-06-20 2010-05-04 Microsoft Corporation Prevention of outgoing spam
WO2010059735A3 (en) * 2008-11-18 2010-07-29 Qualcomm Incorporated Method and apparatus for delivering and receiving enhanced emergency broadcast alert messages
US7836133B2 (en) 2005-05-05 2010-11-16 Ironport Systems, Inc. Detecting unwanted electronic mail messages based on probabilistic analysis of referenced resources
US7917588B2 (en) 2004-05-29 2011-03-29 Ironport Systems, Inc. Managing delivery of electronic messages using bounce profiles
US7930353B2 (en) 2005-07-29 2011-04-19 Microsoft Corporation Trees of classifiers for detecting email spam
US8046832B2 (en) 2002-06-26 2011-10-25 Microsoft Corporation Spam detector with challenges
US8166310B2 (en) 2004-05-29 2012-04-24 Ironport Systems, Inc. Method and apparatus for providing temporary access to a network device
US8271596B1 (en) 2000-05-16 2012-09-18 Ziplink, Inc. Apparatus and methods for controlling the transmission of messages
WO2013112062A1 (en) * 2012-01-25 2013-08-01 Bitdefender Ipr Management Ltd Systems and methods for spam detection using character histograms
US8612560B2 (en) 2004-02-10 2013-12-17 Sonicwall, Inc. Message classification using domain name and IP address extraction
US9130778B2 (en) 2012-01-25 2015-09-08 Bitdefender IPR Management Ltd. Systems and methods for spam detection using frequency spectra of character strings
CN107707448A (en) * 2016-08-09 2018-02-16 迈买有限责任公司 User is allowed to change the electronic message delivery platform of message content and annex after transmission
EP3716177A1 (en) * 2019-03-25 2020-09-30 IPCO 2012 Limited A method, apparatus and computer program for verifying the integrity of electronic messages
WO2022250909A1 (en) * 2021-05-28 2022-12-01 Microsoft Technology Licensing, Llc A personalized communication text compression system

Families Citing this family (86)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7032023B1 (en) * 2000-05-16 2006-04-18 America Online, Inc. Throttling electronic communications from one or more senders
US7711790B1 (en) * 2000-08-24 2010-05-04 Foundry Networks, Inc. Securing an accessible computer system
US7725587B1 (en) * 2000-08-24 2010-05-25 Aol Llc Deep packet scan hacker identification
KR20040007435A (en) * 2001-02-12 2004-01-24 오티쥐 소프트웨어, 인코퍼레이션 System and method of indexing unique electronic mail messages and uses for the same
DE10115428A1 (en) * 2001-03-29 2002-10-17 Siemens Ag Procedure for detecting an unsolicited email
US7103599B2 (en) * 2001-05-15 2006-09-05 Verizon Laboratories Inc. Parsing of nested internet electronic mail documents
US6957259B1 (en) * 2001-06-25 2005-10-18 Bellsouth Intellectual Property Corporation System and method for regulating emails by maintaining, updating and comparing the profile information for the email source to the target email statistics
US7716287B2 (en) * 2004-03-05 2010-05-11 Aol Inc. Organizing entries in participant lists based on communications strengths
US20030120722A1 (en) * 2001-12-20 2003-06-26 Forkner Damien R. Persistent process software architecture
US7225343B1 (en) * 2002-01-25 2007-05-29 The Trustees Of Columbia University In The City Of New York System and methods for adaptive model generation for detecting intrusions in computer systems
US7069316B1 (en) * 2002-02-19 2006-06-27 Mcafee, Inc. Automated Internet Relay Chat malware monitoring and interception
AUPS193202A0 (en) * 2002-04-23 2002-05-30 Pickup, Robert Barkley Mr A method and system for authorising electronic mail
US7237008B1 (en) * 2002-05-10 2007-06-26 Mcafee, Inc. Detecting malware carried by an e-mail message
US20040054742A1 (en) * 2002-06-21 2004-03-18 Shimon Gruper Method and system for detecting malicious activity and virus outbreak in email
US20040111423A1 (en) * 2002-07-13 2004-06-10 John Irving Method and system for secure, community profile generation and access via a communication system
US20040122692A1 (en) * 2002-07-13 2004-06-24 John Irving Method and system for interactive, multi-user electronic data transmission in a multi-level monitored and filtered system
US20040103122A1 (en) * 2002-07-13 2004-05-27 John Irving Method and system for filtered web browsing in a multi-level monitored and filtered system
US8838622B2 (en) * 2002-07-13 2014-09-16 Cricket Media, Inc. Method and system for monitoring and filtering data transmission
US20040103118A1 (en) * 2002-07-13 2004-05-27 John Irving Method and system for multi-level monitoring and filtering of electronic transmissions
EP1567963B1 (en) * 2002-12-03 2008-07-30 Research In Motion Limited Method, system and computer software product for pre-selecting a folder for a message
US7533148B2 (en) 2003-01-09 2009-05-12 Microsoft Corporation Framework to enable integration of anti-spam technologies
US7676546B2 (en) * 2003-03-25 2010-03-09 Verisign, Inc. Control and management of electronic messaging
US20050144242A1 (en) * 2003-10-31 2005-06-30 Justin Marston Caching in an electronic messaging system
US7730137B1 (en) 2003-12-22 2010-06-01 Aol Inc. Restricting the volume of outbound electronic messages originated by a single entity
US7548956B1 (en) 2003-12-30 2009-06-16 Aol Llc Spam control based on sender account characteristics
JP4297345B2 (en) * 2004-01-14 2009-07-15 Kddi株式会社 Mass mail detection method and mail server
US20050188040A1 (en) * 2004-02-02 2005-08-25 Messagegate, Inc. Electronic message management system with entity risk classification
US8214438B2 (en) * 2004-03-01 2012-07-03 Microsoft Corporation (More) advanced spam detection features
EP1751937A1 (en) * 2004-05-12 2007-02-14 Bluespace Group Ltd Enforcing compliance policies in a messaging system
US20060031352A1 (en) * 2004-05-12 2006-02-09 Justin Marston Tamper-proof electronic messaging
US7870200B2 (en) * 2004-05-29 2011-01-11 Ironport Systems, Inc. Monitoring the flow of messages received at a server
US7624445B2 (en) * 2004-06-15 2009-11-24 International Business Machines Corporation System for dynamic network reconfiguration and quarantine in response to threat conditions
US7596603B2 (en) * 2004-06-30 2009-09-29 International Business Machines Corporation Automatic email consolidation for multiple participants
US8631077B2 (en) 2004-07-22 2014-01-14 International Business Machines Corporation Duplicate e-mail content detection and automatic doclink conversion
JP4947883B2 (en) * 2004-07-30 2012-06-06 キヤノン株式会社 COMMUNICATION DEVICE, CONTROL METHOD, AND PROGRAM
US20060041625A1 (en) 2004-08-19 2006-02-23 International Business Machines Corporation System and method for sectional e-mail transmission
FI123195B (en) * 2004-11-22 2012-12-14 Mavenir Systems Oy Processing of messages sent over telecommunications networks
WO2006065989A2 (en) * 2004-12-15 2006-06-22 Tested Technologies Corporation Method and system for detecting and stopping illegitimate communication attempts on the internet
US20060253572A1 (en) * 2005-04-13 2006-11-09 Osmani Gomez Method and system for management of an electronic mentoring program
US20060277259A1 (en) * 2005-06-07 2006-12-07 Microsoft Corporation Distributed sender reputations
GB2427048A (en) 2005-06-09 2006-12-13 Avecho Group Ltd Detection of unwanted code or data in electronic mail
US20070016951A1 (en) * 2005-07-13 2007-01-18 Piccard Paul L Systems and methods for identifying sources of malware
US7992205B2 (en) * 2005-08-12 2011-08-02 Cisco Technology, Inc. Method and system device for deterring spam over internet protocol telephony and spam instant messaging
US20070041372A1 (en) * 2005-08-12 2007-02-22 Rao Anup V Method and system for deterring SPam over Internet Protocol telephony and SPam Instant Messaging
US7975297B2 (en) 2005-08-16 2011-07-05 Microsoft Corporation Anti-phishing protection
US20070061402A1 (en) * 2005-09-15 2007-03-15 Microsoft Corporation Multipurpose internet mail extension (MIME) analysis
US7730141B2 (en) 2005-12-16 2010-06-01 Microsoft Corporation Graphical interface for defining mutually exclusive destinations
WO2007082308A2 (en) * 2006-01-13 2007-07-19 Bluespace Software Corp. Determining relevance of electronic content
US20070180031A1 (en) * 2006-01-30 2007-08-02 Microsoft Corporation Email Opt-out Enforcement
US7814116B2 (en) * 2006-03-16 2010-10-12 Hauser Eduardo A Method and system for creating customized news digests
US8201243B2 (en) * 2006-04-20 2012-06-12 Webroot Inc. Backwards researching activity indicative of pestware
US20070250818A1 (en) * 2006-04-20 2007-10-25 Boney Matthew L Backwards researching existing pestware
US8181244B2 (en) * 2006-04-20 2012-05-15 Webroot Inc. Backward researching time stamped events to find an origin of pestware
US20070294396A1 (en) * 2006-06-15 2007-12-20 Krzaczynski Eryk W Method and system for researching pestware spread through electronic messages
US7865555B2 (en) 2006-06-19 2011-01-04 Research In Motion Limited Apparatus, and associated method, for alerting user of communication device of entries on a mail message distribution list
US8028335B2 (en) 2006-06-19 2011-09-27 Microsoft Corporation Protected environments for protecting users against undesirable activities
US7734703B2 (en) * 2006-07-18 2010-06-08 Microsoft Corporation Real-time detection and prevention of bulk messages
US8190868B2 (en) 2006-08-07 2012-05-29 Webroot Inc. Malware management through kernel detection
CN101166159B (en) * 2006-10-18 2010-07-28 阿里巴巴集团控股有限公司 A method and system for identifying rubbish information
WO2008073655A2 (en) * 2006-11-08 2008-06-19 Epals, Inc. Dynamic characterization of nodes in a semantic network
US10636315B1 (en) 2006-11-08 2020-04-28 Cricket Media, Inc. Method and system for developing process, project or problem-based learning systems within a semantic collaborative social network
KR100859664B1 (en) * 2006-11-13 2008-09-23 삼성에스디에스 주식회사 Method for detecting a virus pattern of email
US9729513B2 (en) 2007-11-08 2017-08-08 Glasswall (Ip) Limited Using multiple layers of policy management to manage risk
US20100138754A1 (en) 2007-09-21 2010-06-03 Research In Motion Limited Message distribution warning indication
US8265665B2 (en) * 2007-09-21 2012-09-11 Research In Motion Limited Color differentiating a portion of a text message shown in a listing on a handheld communication device
US8103628B2 (en) * 2008-04-09 2012-01-24 Harmonic Inc. Directed placement of data in a redundant data storage system
EP2300926A4 (en) * 2008-05-08 2013-07-31 Epals Inc Object-based system and language for dynamic data or network interaction including learning management
US8826450B2 (en) * 2008-09-19 2014-09-02 Yahoo! Inc. Detecting bulk fraudulent registration of email accounts
US8996622B2 (en) * 2008-09-30 2015-03-31 Yahoo! Inc. Query log mining for detecting spam hosts
US20100161734A1 (en) * 2008-12-22 2010-06-24 Yahoo! Inc. Determining spam based on primary and secondary email addresses of a user
CN102439583A (en) * 2009-03-05 2012-05-02 e帕尔斯公司 System and method for managing and monitoring electronic communications
US11489857B2 (en) 2009-04-21 2022-11-01 Webroot Inc. System and method for developing a risk profile for an internet resource
US8959157B2 (en) * 2009-06-26 2015-02-17 Microsoft Corporation Real-time spam look-up system
WO2011046899A1 (en) 2009-10-13 2011-04-21 Epals, Inc. Dynamic collaboration in social networking environment
US9223859B2 (en) 2011-05-11 2015-12-29 Here Global B.V. Method and apparatus for summarizing communications
US8584211B1 (en) 2011-05-18 2013-11-12 Bluespace Software Corporation Server-based architecture for securely providing multi-domain applications
US9137193B2 (en) * 2012-12-06 2015-09-15 Linkedin Corporation Increasing the relevance of digest emails to group members
US20140176336A1 (en) * 2012-12-21 2014-06-26 eLuminon, LLC. System, method, and apparatus for remotely monitoring surge arrester conditions
US10033675B2 (en) * 2013-03-13 2018-07-24 International Business Machines Corporation Digest filtering system and method
KR20170117610A (en) * 2013-05-16 2017-10-23 콘비다 와이어리스, 엘엘씨 Semantic naming model
US10313286B2 (en) * 2013-07-11 2019-06-04 Blackberry Limited Qualified email headers
GB2518880A (en) 2013-10-04 2015-04-08 Glasswall Ip Ltd Anti-Malware mobile content data management apparatus and method
CN104135427A (en) * 2014-07-30 2014-11-05 武汉传神信息技术有限公司 Mail generating method and resolving method
US9330264B1 (en) 2014-11-26 2016-05-03 Glasswall (Ip) Limited Statistical analytic method for the determination of the risk posed by file based content
US20160205124A1 (en) * 2015-01-14 2016-07-14 Korea Internet & Security Agency System and method for detecting mobile cyber incident
US10664536B2 (en) 2015-12-18 2020-05-26 Microsoft Technology Licensing, Llc Consumption of user-filtered data on a client device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999033188A2 (en) * 1997-12-23 1999-07-01 Bright Light Technologies, Inc. Apparatus and method for controlling delivery of unsolicited electronic mail

Family Cites Families (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5619648A (en) 1994-11-30 1997-04-08 Lucent Technologies Inc. Message filtering techniques
US5812668A (en) * 1996-06-17 1998-09-22 Verifone, Inc. System, method and article of manufacture for verifying the operation of a remote transaction clearance system utilizing a multichannel, extensible, flexible architecture
US5889863A (en) * 1996-06-17 1999-03-30 Verifone, Inc. System, method and article of manufacture for remote virtual point of sale processing utilizing a multichannel, extensible, flexible architecture
US5850446A (en) * 1996-06-17 1998-12-15 Verifone, Inc. System, method and article of manufacture for virtual point of sale processing utilizing an extensible, flexible architecture
US6026379A (en) * 1996-06-17 2000-02-15 Verifone, Inc. System, method and article of manufacture for managing transactions in a high availability system
US6072870A (en) * 1996-06-17 2000-06-06 Verifone Inc. System, method and article of manufacture for a gateway payment architecture utilizing a multichannel, extensible, flexible architecture
US6324525B1 (en) * 1996-06-17 2001-11-27 Hewlett-Packard Company Settlement of aggregated electronic transactions over a network
GB2317793B (en) 1996-09-18 2001-03-28 Secure Computing Corp System and method of electronic mail filtering
US5931917A (en) * 1996-09-26 1999-08-03 Verifone, Inc. System, method and article of manufacture for a gateway system architecture with system administration information accessible from a browser
US6314190B1 (en) * 1997-06-06 2001-11-06 Networks Associates Technology, Inc. Cryptographic system with methods for user-controlled message recovery
US5978475A (en) * 1997-07-18 1999-11-02 Counterpane Internet Security, Inc. Event auditing system
AU8880198A (en) 1997-09-16 1999-04-05 British Telecommunications Public Limited Company Messaging system
WO1999032985A1 (en) 1997-12-22 1999-07-01 Accepted Marketing, Inc. E-mail filter and method thereof
US6161181A (en) * 1998-03-06 2000-12-12 Deloitte & Touche Usa Llp Secure electronic transactions using a trusted intermediary
US6145079A (en) * 1998-03-06 2000-11-07 Deloitte & Touche Usa Llp Secure electronic transactions using a trusted intermediary to perform electronic services
US6199052B1 (en) * 1998-03-06 2001-03-06 Deloitte & Touche Usa Llp Secure electronic transactions using a trusted intermediary with archive and verification request services
US6314421B1 (en) * 1998-05-12 2001-11-06 David M. Sharnoff Method and apparatus for indexing documents for message filtering
US6161130A (en) * 1998-06-23 2000-12-12 Microsoft Corporation Technique which utilizes a probabilistic classifier to detect "junk" e-mail by automatically updating a training and re-training the classifier based on the updated training set
US6226618B1 (en) * 1998-08-13 2001-05-01 International Business Machines Corporation Electronic content delivery system
US6959288B1 (en) * 1998-08-13 2005-10-25 International Business Machines Corporation Digital content preparation system
AU1122100A (en) * 1998-10-30 2000-05-22 Justsystem Pittsburgh Research Center, Inc. Method for content-based filtering of messages by analyzing term characteristics within a message
US6330590B1 (en) * 1999-01-05 2001-12-11 William D. Cotten Preventing delivery of unwanted bulk e-mail
GB2347053A (en) * 1999-02-17 2000-08-23 Argo Interactive Limited Proxy server filters unwanted email
JP3644580B2 (en) * 1999-03-19 2005-04-27 富士通株式会社 Display control method and apparatus
US6671805B1 (en) * 1999-06-17 2003-12-30 Ilumin Corporation System and method for document-driven processing of digitally-signed electronic documents
US6895507B1 (en) * 1999-07-02 2005-05-17 Time Certain, Llc Method and system for determining and maintaining trust in digital data files with certifiable time
US6898709B1 (en) * 1999-07-02 2005-05-24 Time Certain Llc Personal computer system and methods for proving dates in digital data files
US6356937B1 (en) * 1999-07-06 2002-03-12 David Montville Interoperable full-featured web-based and client-side e-mail system
US6754661B1 (en) * 1999-07-13 2004-06-22 Microsoft Corporation Hierarchical storage systems for holding evidentiary objects and methods of creating and operating upon hierarchical storage systems
US6557036B1 (en) * 1999-07-20 2003-04-29 Sun Microsystems, Inc. Methods and apparatus for site wide monitoring of electronic mail systems
US6321267B1 (en) * 1999-11-23 2001-11-20 Escom Corporation Method and apparatus for filtering junk email
US7249175B1 (en) * 1999-11-23 2007-07-24 Escom Corporation Method and system for blocking e-mail having a nonexistent sender address
US7213005B2 (en) * 1999-12-09 2007-05-01 International Business Machines Corporation Digital content distribution using web broadcasting services
US6460050B1 (en) 1999-12-22 2002-10-01 Mark Raymond Pace Distributed content identification system
US6816884B1 (en) * 2000-01-27 2004-11-09 Colin T. Summers System and method for creating conversationally-styled summaries from digesting email messages
US6965919B1 (en) * 2000-08-24 2005-11-15 Yahoo! Inc. Processing of unsolicited bulk electronic mail
US7149778B1 (en) * 2000-08-24 2006-12-12 Yahoo! Inc. Unsolicited electronic mail reduction
US7130885B2 (en) * 2000-09-05 2006-10-31 Zaplet, Inc. Methods and apparatus providing electronic messages that are linked and aggregated
BR0114602A (en) * 2000-10-13 2004-09-28 Eversystems Inc Secret Key Message Generation
US8539030B2 (en) * 2000-11-22 2013-09-17 Xerox Corporation System and method for managing digests comprising electronic messages
US7673342B2 (en) * 2001-07-26 2010-03-02 Mcafee, Inc. Detecting e-mail propagated malware
US20040092310A1 (en) * 2002-11-07 2004-05-13 Igt Identifying message senders

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PAUL RUBIN: "Re: spam (hash codes)" INTERNET ARTICLE, [Online] 20 February 1995 (1995-02-20), XP002229452 alt.current-events.net-abuse,news.admin.misc,news.admin.policy Retrieved from the Internet: <URL:http://groups.google.com/groups?selm=phrD4A6Ip.LFA%40netcom.com&output=gplain> [retrieved on 2003-01-30] *
RAGNAR LONN: "Re: automated spam detection" INTERNET ARTICLE, [Online] 16 February 1999 (1999-02-16), XP002229453 anti-spam-wg Retrieved from the Internet: <URL:http://www.ripe.net/ripe/mail-archives/anti-spam-wg/1999/msg00156.html> [retrieved on 2003-01-30] *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8271596B1 (en) 2000-05-16 2012-09-18 Ziplink, Inc. Apparatus and methods for controlling the transmission of messages
EP1280039A3 (en) * 2001-07-26 2005-08-17 Networks Associates Technology, Inc. Detecting e-mail propagated malware
EP1280039A2 (en) * 2001-07-26 2003-01-29 Networks Associates Technology, Inc. Detecting e-mail propagated malware
US8046832B2 (en) 2002-06-26 2011-10-25 Microsoft Corporation Spam detector with challenges
EP1422872B1 (en) * 2002-11-20 2007-05-30 Societé Française du Radiotéléphone Modular method and device for the tracing of a multimedia message through a telecommunications network
US7219131B2 (en) 2003-01-16 2007-05-15 Ironport Systems, Inc. Electronic message delivery using an alternate source approach
WO2004098148A1 (en) * 2003-04-25 2004-11-11 Messagelabs Limited A method of, and system for detecting mass mailing computer viruses
US7472284B2 (en) 2003-04-25 2008-12-30 Messagelabs Limited Method of, and system for detecting mass mailing viruses
US7711779B2 (en) 2003-06-20 2010-05-04 Microsoft Corporation Prevention of outgoing spam
EP1644784A4 (en) * 2003-06-25 2010-06-09 Nokia Inc Two-phase hash value matching technique in message protection systems
EP1644784A2 (en) * 2003-06-25 2006-04-12 Nokia Inc. Two-phase hash value matching technique in message protection systems
US8856239B1 (en) 2004-02-10 2014-10-07 Sonicwall, Inc. Message classification based on likelihood of spoofing
US9860167B2 (en) 2004-02-10 2018-01-02 Sonicwall Inc. Classifying a message based on likelihood of spoofing
US9100335B2 (en) 2004-02-10 2015-08-04 Dell Software Inc. Processing a message based on a boundary IP address and decay variable
US8612560B2 (en) 2004-02-10 2013-12-17 Sonicwall, Inc. Message classification using domain name and IP address extraction
WO2005081477A1 (en) * 2004-02-17 2005-09-01 Ironport Systems, Inc. Collecting, aggregating, and managing information relating to electronic messages
US7917588B2 (en) 2004-05-29 2011-03-29 Ironport Systems, Inc. Managing delivery of electronic messages using bounce profiles
US8166310B2 (en) 2004-05-29 2012-04-24 Ironport Systems, Inc. Method and apparatus for providing temporary access to a network device
US7836133B2 (en) 2005-05-05 2010-11-16 Ironport Systems, Inc. Detecting unwanted electronic mail messages based on probabilistic analysis of referenced resources
US7854007B2 (en) 2005-05-05 2010-12-14 Ironport Systems, Inc. Identifying threats in electronic messages
US7877493B2 (en) 2005-05-05 2011-01-25 Ironport Systems, Inc. Method of validating requests for sender reputation information
US7930353B2 (en) 2005-07-29 2011-04-19 Microsoft Corporation Trees of classifiers for detecting email spam
EP2169897A1 (en) * 2008-09-25 2010-03-31 Avira GmbH Computer-based method for the prioritization of potential malware sample messages
US8666358B2 (en) 2008-11-18 2014-03-04 Qualcomm Incorporated Method and apparatus for delivering and receiving enhanced emergency broadcast alert messages
WO2010059735A3 (en) * 2008-11-18 2010-07-29 Qualcomm Incorporated Method and apparatus for delivering and receiving enhanced emergency broadcast alert messages
US8954519B2 (en) 2012-01-25 2015-02-10 Bitdefender IPR Management Ltd. Systems and methods for spam detection using character histograms
WO2013112062A1 (en) * 2012-01-25 2013-08-01 Bitdefender Ipr Management Ltd Systems and methods for spam detection using character histograms
US9130778B2 (en) 2012-01-25 2015-09-08 Bitdefender IPR Management Ltd. Systems and methods for spam detection using frequency spectra of character strings
CN104067567B (en) * 2012-01-25 2017-08-25 比特梵德知识产权管理有限公司 System and method for carrying out spam detection using character histogram
US10212114B2 (en) 2012-01-25 2019-02-19 Bitdefender IPR Management Ltd. Systems and methods for spam detection using frequency spectra of character strings
CN107707448A (en) * 2016-08-09 2018-02-16 迈买有限责任公司 User is allowed to change the electronic message delivery platform of message content and annex after transmission
EP3716177A1 (en) * 2019-03-25 2020-09-30 IPCO 2012 Limited A method, apparatus and computer program for verifying the integrity of electronic messages
WO2020193016A1 (en) * 2019-03-25 2020-10-01 Ipco 2012 Limited A method, apparatus and computer program for verifying the integrity of electronic messages
US11405408B2 (en) 2019-03-25 2022-08-02 Ipco 2012 Limited Method, apparatus and computer program for verifying the integrity of electronic messages
WO2022250909A1 (en) * 2021-05-28 2022-12-01 Microsoft Technology Licensing, Llc A personalized communication text compression system

Also Published As

Publication number Publication date
EP1368719A2 (en) 2003-12-10
US20040064515A1 (en) 2004-04-01
GB2366706A (en) 2002-03-13
GB0021444D0 (en) 2000-10-18
US7801960B2 (en) 2010-09-21
GB2366706B (en) 2004-11-03
WO2002019069A3 (en) 2003-10-09
AU2001282359A1 (en) 2002-03-13

Similar Documents

Publication Publication Date Title
US7801960B2 (en) Monitoring electronic mail message digests
US20030220978A1 (en) System and method for message sender validation
US7571319B2 (en) Validating inbound messages
AU2002237408B2 (en) A method of, and system for, processing email in particular to detect unsolicited bulk email
US7249175B1 (en) Method and system for blocking e-mail having a nonexistent sender address
EP1299791B1 (en) Method of and system for processing email
US6321267B1 (en) Method and apparatus for filtering junk email
AU782333B2 (en) Electronic message filter having a whitelist database and a quarantining mechanism
US6453327B1 (en) Method and apparatus for identifying and discarding junk electronic mail
US20020147780A1 (en) Method and system for scanning electronic mail to detect and eliminate computer viruses using a group of email-scanning servers and a recipient's email gateway
AU2002237408A1 (en) A method of, and system for, processing email in particular to detect unsolicited bulk email
US20050198518A1 (en) Method for blocking Spam
US20060184635A1 (en) Electronic mail method using email tickler
AU2009299539B2 (en) Electronic communication control
WO2005001733A1 (en) E-mail managing system and method thereof
US7257773B1 (en) Method and system for identifying unsolicited mail utilizing checksums
US20080059588A1 (en) Method and System for Providing Notification of Nefarious Remote Control of a Data Processing System
US20050289239A1 (en) Method and an apparatus to classify electronic communication
Haskins ISPadmin

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ CZ DE DE DK DK DM DZ EC EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2001960974

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWE Wipo information: entry into national phase

Ref document number: 10362840

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 2001960974

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWR Wipo information: refused in national office

Ref document number: 2001960974

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2001960974

Country of ref document: EP