WO2006033936A2 - Self-tuning statistical method and system for blocking spam - Google Patents

Self-tuning statistical method and system for blocking spam

Publication number
WO2006033936A2
Authority
WO
WIPO (PCT)
Prior art keywords
email
senders
unsolicited bulk
addresses
spamtrap
Prior art date
Application number
PCT/US2005/032819
Other languages
French (fr)
Other versions
WO2006033936A3 (en
Inventor
Henri H. Van Riel
Original Assignee
Red Hat, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Red Hat, Inc.
Publication of WO2006033936A2
Publication of WO2006033936A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/212 Monitoring or handling of messages using filtering or selective blocking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/101 Access control lists [ACL]

Definitions

  • the present invention relates to electronic mail transmission and, more particularly, to the detection and blocking of unsolicited bulk electronic mail otherwise known as spam.
  • spam can waste valuable resources without proper authority. For example, spam can take up valuable storage space on an individual's (or company's) email account or mail server. This can often result in a storage quota being exceeded on the user's email account. When this happens, legitimate email can be prevented from reaching the user. For example, most internet service providers (ISP) allocate a certain amount of storage space for each user. Once that storage space is exceeded, email delivery is suspended. All email sent to the user during the suspension period is discarded, or otherwise not accepted by the ISP. Users must delete emails in their accounts in order to receive new emails.
  • ISP: internet service provider
  • Another problem with receiving spam is that a significant amount of time can be spent reviewing and deleting messages that have been received. Since a recipient may not know immediately whether a message is spam, the contents must be reviewed prior to making a decision to save or delete. Even if the message can be identified as spam from the header information, it must still be deleted by the user.
  • Certain types of spam can direct users to different websites where time is ultimately spent reviewing advertising content. For example, a spam email can advertise discounts or sales from a particular merchant. Users must subsequently follow a link to the merchant's website in order to obtain more information. Once at the merchant's website, users are overwhelmed with product information and incentives which lead to surfing the merchant's website. While this does not pose a problem to home users, an employer can lose employee productivity during this time period. Another objection to the receipt of spam is that the messages tend to contain objectionable and/or illegal content such as pornography, illegal activities, financial scams, etc.
  • There are various methods in place to intercept, or counter, spam being transmitted to individuals.
  • Email messages can also be blocked based on information contained in the message, such as the subject line.
  • spammers can easily change the source of spam to bypass the filters.
  • spammers continue to improve the content of their spam to make it difficult to filter. For example, some emails cannot be filtered without examining the content of the message, which may raise various legal and privacy issues. It is also difficult to determine exactly when a user (or system) is transmitting too much email to qualify such transmission as spam. Without such a determination, the spam transmitted cannot be filtered.
  • Some current methods of filtering spam involve manual selection of a threshold to identify a sender as a spammer. Such methods, however, are inefficient and difficult to implement due to the amount of email being transmitted on a daily basis. Furthermore, spammers can easily change the address being used to transmit spam. Consequently, there is a high cost associated with monitoring the level of email being transmitted by users in order to manually establish a threshold for classifying such messages as spam. Additionally, the time required to adjust the threshold may be too long, resulting in spammers changing their addresses prior to being blocked or immediately upon being blocked.
  • the present invention provides an ability to identify unsolicited bulk email wherein a variety of spamtrap addresses are created and dedicated to the receipt of unsolicited bulk email, and spammers are identified based, in part, on the amount of email received at the spamtrap addresses.
  • a system for detecting unsolicited bulk email.
  • the system comprises one or more spamtrap addresses, a list server, and a database.
  • the spamtrap addresses are created to receive unsolicited bulk email, or spam.
  • the list server receives email from a plurality of senders as well as queries regarding the senders.
  • the database is used for storing information corresponding to the amount of unsolicited bulk email received at the spamtrap addresses.
  • the database stores query data corresponding to the number of queries received regarding senders of email. Based on the information stored on the database, the list server makes a determination of which senders transmit a disproportionate amount of email and should be labeled as spammers. According to such a system, spammers may be identified in real time. The spam transmitted by these users could be effectively blocked, thereby saving valuable resources.
  • the information stored on the database can include at least an IP address associated with each of the senders and the number of unsolicited bulk emails sent by each sender to the spamtrap addresses.
  • the query data can correspond to the number of inquiries on whether a particular sender has previously been identified with transmission of unsolicited bulk email.
  • the query data can be representative, at least in part, of the total number of emails transmitted by the sender in question.
  • the list server can automatically set a threshold value for determining which senders transmit unsolicited bulk electronic mail. Accordingly, the list server can dynamically adjust the threshold value and identify the addresses of spammers without external intervention. The list server can also make as many adjustments as necessary to compensate for spammers who change their email addresses, or the addresses of their email servers.
  • the database can store timestamps for each unsolicited bulk email received at the spamtrap addresses and each query received. The list server can also apply different weight factors to the stored information and query data based on the timestamp. These weight factors can subsequently be applied when identifying spammers.
  • Various embodiments of the invention can also allow the list server to receive removal requests from senders who have been identified as spammers.
  • the list server can review the removal request to determine if the sender is transmitting spam or legitimate bulk email.
  • Figure 1 is a block diagram illustrating a system for detecting unsolicited bulk email in accordance with at least one embodiment of the present invention.
  • Figure 2 illustrates an exemplary database table and block list that can be used with one or more embodiments of the present invention.
  • Figure 3 is a flowchart illustrating steps performed to detect unsolicited bulk email in accordance with various embodiments of the present invention.
  • Figure 4 is a flowchart illustrating steps performed to detect unsolicited bulk email in accordance with at least one alternative embodiment of the present invention.
  • a spam detection system 100 is illustrated for detecting unsolicited bulk emails (spam) according to an exemplary embodiment of the present invention.
  • the spam detection system 100 of Figure 1 includes a list server 110, a database 114, a plurality of senders 116 of email, and a plurality of mail servers 118.
  • the list server 110 is used to receive and detect both regular email and unsolicited bulk email (spam).
  • the list server 110 includes a plurality of spamtrap addresses 112.
  • the spamtrap addresses 112 are not used by senders 116.
  • the spamtrap addresses 112 correspond to email addresses that are monitored only for the receipt of unsolicited bulk email (spam).
  • the list server 110 monitors the amount of spam received at the spamtrap addresses 112 in order to identify senders 116 that transmit spam.
  • the list server 110 can count only emails that are received at more than one spamtrap address 112. Accordingly, if a particular spam message is only received at one spamtrap address 112, it may be discarded for purposes of identifying a sender 116 as a spammer.
  • spam messages are assigned higher weight factors (as will be described in further detail below) if they are received at more than one spamtrap address 112.
  • Information collected by the list server 110 regarding spam received at the spamtrap addresses 112 can be stored in the database 114.
  • one or more embodiments of the present invention can allow the database 114 to be maintained either directly on the list server 110 or remotely from the list server 110.
  • the list server 110 is in communication with the database 114, at least in part, to enter data and submit queries.
  • Email transmitted by the senders 116 can also pass through one or more mail servers 118.
  • when a mail server 118 receives an email from a sender 116, a query is submitted to the database 114.
  • the mail server 118 would request information regarding the sender 116 of the email message.
  • the mail server 118 would be interested in determining if the sender 116 has been identified as a spammer (i.e., someone who sends unsolicited bulk emails) so that messages sent by that particular sender could be blocked.
  • the mail server 118 can identify a sender 116 as a spammer if the mail server 118 receives too many messages from the sender 116 within a predetermined time frame. Furthermore, the list server 110 can identify a sender 116 as a spammer if it receives too many queries regarding the sender from the mail servers 118. Senders 116 can also be identified as potential spammers if there is a sudden, large change in the amount of email they transmit. For example, consider a sender 116 who has been transmitting approximately 10 emails per day. Suddenly, the sender 116 begins transmitting approximately 10,000 emails per day.
  • in response to such a change, the list server 110 could decide to block emails from the sender 116. Emails can be blocked for a short period of time, a long period of time, or permanently.
  • the list server 110 stores information in the database 114 corresponding to the identification of senders 116 who have been identified as spammers. While the present embodiment of the invention identifies a list server 110 for creating/monitoring the spamtrap addresses 112 and interacting with the database 114, it should be noted that these actions/features can be implemented in the mail server 118 or various types of computer systems. Additionally, a single computer system can be configured to act as both a list server 110 and a mail server 118.
  • the list server 110 makes a determination as to which senders 116 should be identified as spammers or as senders of legitimate email.
  • solicited bulk email can also be identified as legitimate email. This information can be determined, in part, based on the number of spam emails received at the spamtrap addresses 112 and/or the number of queries submitted to the database 114 by the mail servers 118.
  • the list server 110 can identify potential spammers based on the relative proportion of spam received at the spamtrap addresses 112 and the number of legitimate emails sent to user addresses. Additionally, the list server 110 can keep track of the number of queries submitted to the database 114 by the mail servers 118.
  • the number of queries submitted can, according to certain embodiments of the invention, correspond to the total number of emails transmitted by the senders 116 (including spam) or the number of legitimate emails transmitted by the senders 116. Based on these values, the list server 110 is capable of determining the proportion of spam transmitted by any sender 116 relative to the proportion of legitimate emails.
  • as senders 116 continue transmitting email to the mail servers 118 and the list server 110, data continues to be collected.
  • the list server 110 automatically (and continuously) calculates various ratios related to the proportion of spam transmitted by various senders 116 relative to the number of legitimate emails transmitted.
  • a threshold value can be calculated, in part, to determine if a sender 116 should be identified as a spammer. If a particular sender 116E, for example, transmits an amount of spam to the spamtrap addresses 112 that exceeds the threshold value set by the list server 110 for all senders 116 of email, then that particular sender 116E would be identified as a spammer.
  • the list server 110 continually monitors email traffic and dynamically adjusts the threshold value in order to efficiently identify senders 116 that transmit disproportionate amounts of spam. Once a sender 116E has been identified as a spammer, they are placed on a block list 120, which contains information regarding all senders 116 that have been identified as spammers.
  • the block list 120 can be optionally transmitted to the mail servers 118, in part, to reduce the number of queries submitted to the database 114.
  • the mail servers 118 can exchange block lists 120 with each other in order to continually update the identification of spammers. This can be beneficial in circumstances where, for example, multiple list servers 110 are established with spamtrap addresses 112 to identify potential spammers. Additionally, this can also be beneficial in situations where the mail servers 118 independently use other methods of identifying spammers. Thus, a mail server 118 with a built-in method of identifying spammers would have two sources of information regarding senders 116 who are transmitting spam. Such mail servers 118 can further include algorithms and/or logic to determine which senders should be blocked based on the information contained in both lists.
  • one or more embodiments of the invention can provide list servers 110 that transmit information regarding the spamtrap addresses 112 directly to the mail servers 118.
  • the database table 122 contains entries (or rows) corresponding to the IP address of senders 116, the number of emails received at the spamtrap addresses 112, the number of database queries regarding the senders 116, and the ratio of spamtrap hits relative to database queries.
  • the database table 122 of figure 2 includes a column which identifies whether the sender 116 should be listed as a spammer or not.
  • Various threshold values can be calculated in order to determine the relative number of spamtrap hits that should be allowed before a sender 116 is identified as a spammer. For example, according to at least one embodiment of the present invention, the threshold can be calculated based on the sum of database queries relative to the sum of spamtrap hits received from all senders 116 and email servers 118.
  • the threshold value can be calculated based on the sum of the ratios of spamtrap hits relative to database queries.
  • the ratio would be 3,860 to 5. This can be reduced to a ratio of 772 database queries per spamtrap hit. Based on this threshold, senders X3 and X6 would be identified as potential spammers. Senders X1 and X5 would also be identified as potential spammers, while senders X2 and X4 would not.
  • Figure 2 also illustrates an exemplary block list 120 that can be transmitted to the mail servers 118.
  • the block list 120 can consist of, for example, the IP address of the sender and an indication of whether the sender has been identified as a spammer.
  • the block list 120 can include at least two different options. The first option would correspond to, for example, the situation where the threshold is set based on the sum of the database queries relative to the sum of spamtrap hits (i.e., threshold 1). The second option would be to transmit the block list 120 where the senders 116 are identified as spammers based on the second threshold value (i.e., threshold 2).
  • the list server 110 can transmit a block list 120 that contains both values to the mail servers 118.
  • various other statistical ratios can be calculated in order to set different threshold values and/or make various observations regarding which senders 116 could potentially be sending spam.
  • At least one further embodiment of the present invention combines the number of spamtrap hits and queries in various proportions to set different threshold values.
  • Multiple list servers 110 could also exchange information regarding senders 116 whose ratios are within a predetermined tolerance of the threshold value. If a sender 116 is identified as a spammer on multiple list servers 110, they can be identified as a spammer on a list server 110 where they have not met the threshold requirements, but have a ratio within the predetermined tolerance.
  • the list server 110 can maintain historical data on senders 116. If a sender's ratio approaches the threshold value, but does not exceed it, historical data regarding that sender 116 would be examined to determine if the sender 116 might be a spammer. The historical data could, under certain circumstances, identify a trend or pattern in the sender's email transmissions. For example, consider a sender 116 whose ratio falls within the predetermined tolerance. Review of the sender's historical data indicates that the sender has fluctuated past the threshold value on at least one occasion over the last two weeks. The list server 110 could block the sender 116 even though the ratio does not exceed the threshold value. Furthermore, the list server 110 could optionally decide to block the sender 116 until its ratio is well below the threshold value.
  • FIG. 3 is a flowchart illustrating steps performed in detecting unsolicited bulk email (spam) in accordance with at least one embodiment of the present invention.
  • the list server 110 creates one or more spamtrap addresses 112.
  • the spamtrap addresses 112 can be created by other entities or on separate mail servers 118 that exchange information, for example, with the list server 110. Additionally, the functions performed by the list server 110 can be implemented on other systems. The specific number of spamtrap addresses 112 created can depend on various factors including, for example, the amount of email traffic passing through the list server 110, the number of queries submitted to the database 114, the number of spam messages received at the spamtrap addresses 112, etc.
  • the list server 110 begins receiving emails from various senders 116.
  • information regarding unsolicited bulk email (spam) received at the spamtrap addresses is stored, for example, in the database 114.
  • This information can vary depending on the specific embodiment of the invention.
  • the information stored can include the IP address of the sender 116, the return address used by the sender 116, etc.
  • the list server 110 could store information specific to the number of spamtrap hits for each sender 116, and/or the number of hits to each individual spamtrap address 112 from each individual sender 116.
  • the list server 110 stores query data to the database 114, or other storage device.
  • the query data can correspond to the number of queries received by the database 114 regarding specific senders 116, the total number of queries received by the database regarding all senders 116, identification information such as an IP address for the sender 116, etc.
  • Both the information regarding the unsolicited bulk email (e.g., spam received at the spamtrap addresses) and the query data are stored in the database 114.
  • the database 114 can be maintained as a remotely located system or it can be maintained within the list server 110.
  • the list server 110 reviews the query data and the information regarding unsolicited bulk email in order to identify senders 116 that should be placed on the block lists 120.
  • the list server 110 may automatically determine criteria wherein the number of emails received at the spamtrap addresses 112 relative to the number of queries received regarding senders 116 raises suspicion that a particular sender should be identified as sending unsolicited bulk email.
  • a block list 120 containing information regarding senders 116 that have been identified as transmitting unsolicited bulk emails is transmitted to various entities such as, for example, the mail servers 118 or other spam blocking systems.
  • the function of a list server 110 and mail server 118 can be incorporated into a single computer system. In such situations, the computer system generates a block list 120 that identifies potential spammers, and blocks email transmissions from such senders 116.
  • FIG. 4 is a flowchart illustrating steps performed to detect senders 116 of unsolicited bulk email according to one or more embodiments of the present invention.
  • one or more spamtrap addresses 112 are created by the list server 110.
  • the spamtrap addresses 112 are designed only to receive unsolicited bulk email. There is generally no, or very little (e.g., less than 95%), legitimate email traffic flowing through these addresses. For example, bounced messages can sometimes reach a spamtrap address 112. A message having a mistyped destination address can sometimes reach a spamtrap address 112.
  • the list server 110 begins receiving emails from the senders 116.
  • the list server 110 stores information regarding unsolicited bulk email (spam) received at the spamtrap addresses 112.
  • this information can be stored on the database 114.
  • the information can include, for example, an IP address associated with the different senders 116, the number of unsolicited bulk emails transmitted by the different senders 116, return addresses of the different senders 116, etc.
  • the list server 110 stores the query data in the database 114.
  • the query data corresponds, in certain embodiments of the invention, to the total number of emails transmitted by the senders 116 (including spam), or the total number of legitimate emails transmitted by the senders 116.
  • the list server 110 optionally creates timestamps to associate with bulk emails received at the spamtrap addresses 112 and the query data.
  • the timestamp could include, for example, the date and time at which the email was received and/or transmitted.
  • the weight factors could correspond to the stored information and the query data.
  • the weight factors can serve various purposes. For example, according to at least one embodiment of the invention, stored information and query data related to older emails are given a lower weight factor than corresponding information for more recent emails. More particularly, emails that were transmitted, for example, more than two months prior to the current date can be counted with less weight than emails received within the past 24 hour period.
  • at step 422, it is determined if the timestamp has expired. This corresponds to the situation where information has been weighted to the point that it is too old to be realistically factored into the current processing. If the timestamp has expired, then the entry is discarded at step 424. Otherwise, if the timestamp has not expired, then control passes to step 426 where a threshold is automatically set. As previously discussed, the threshold can be set based on various factors relating to the statistics of emails and spam received by both the email servers 118 and the spamtrap addresses 112.
  • a block list 120 is constructed and transmitted to the mail servers 118.
  • the block list 120 can contain, for example, the IP addresses of senders 116 who have been identified as transmitting a disproportionate amount of unsolicited bulk email relative to legitimate email.
  • the block list 120 can also include query data that assists mail servers 118 with their own screening hardware/software to identify or verify the selection of a particular sender 116 as a spammer.
  • the different mail servers can exchange block lists 120 with each other.
  • the list server 110 determines if any removal requests have been received.
  • the removal requests correspond to requests from specific senders to be removed from the one or more block lists 120 because the requesters are transmitting legitimate emails.
  • the list server 110 can verify the information submitted by the sender 116 in order to ensure that the sender 116 is transmitting legitimate emails from a legitimate email address.
  • Such senders can include, for example, distribution lists that often transmit catalogs, circulars, coupons, etc. to members that subscribe to a particular service.
  • the sender 116, if verified, is placed on a safe list. Once placed on the safe list, the sender 116 could be precluded from being placed on a future block list 120. This can be achieved, for example, by the list server 110 cross-referencing the entries on the safe list with entries on a newly created block list 120, and removing entries from the block list 120 that correspond to entries on the safe list. At this point, control returns to step 410 where the list server 110 continues receiving email. According to one or more embodiments of the invention, senders 116 who cannot be verified by the list server 110 are only placed on the safe list for a temporary time period until they can be verified.
  • senders 116 could be placed on the safe list until the list server 110 identifies them as exceeding the threshold values based, in part, on emails transmitted to the spamtrap addresses 112.
  • the process illustrated in Fig. 4 can be continuous and dynamic in various embodiments of the present invention.
  • the list server 110 continually receives emails and continually performs the necessary steps to adjust the threshold value based on the current email traffic and identify spammers on an ongoing basis.
  • the list server 110 may perform these functions without any external (or user) intervention.
  • the spam detection system of the present invention may be implemented in a variety of forms, such as in software or firmware, running on a general purpose computer or a specialized computer system such as, for example, a server.
  • the software can be provided in any machine-readable medium, including magnetic or optical disk, or in memory. Furthermore, the present invention is utilizable in conjunction with a computer system that operates software which may require periodic updates.
  • the spam detection system can be implemented on various computer systems and/or servers using any operating system including Windows, MacOS, Unix, Linux, etc., and can be implemented using any email system.
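
As an illustration of the self-tuning threshold described in the list above, the following Python sketch computes a threshold as the sum of database queries over the sum of spamtrap hits, flags senders whose own queries-per-hit ratio falls below it, and skips senders on a safe list. It is a minimal sketch under stated assumptions, not the implementation described in the patent: the names SenderStats, age_weight, global_threshold and build_block_list, the linear age decay, the 60-day expiry, the direction of the comparison, and the per-sender counts are all hypothetical. Only the totals (3,860 queries, 5 spamtrap hits, threshold 772) are taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class SenderStats:
    ip: str               # sender IP address
    spamtrap_hits: float  # emails received at spamtrap addresses (may be age-weighted)
    queries: float        # mail-server queries about this sender (may be age-weighted)


def age_weight(age_days: float, expiry_days: float = 60.0) -> float:
    """Assumed linear decay: 1.0 for a fresh entry, 0.0 once the entry has expired."""
    if age_days >= expiry_days:
        return 0.0  # expired entries contribute nothing and can be discarded
    return 1.0 - age_days / expiry_days


def global_threshold(stats: Iterable[SenderStats]) -> Optional[float]:
    """Sum of queries divided by sum of spamtrap hits, over all senders."""
    stats = list(stats)
    total_hits = sum(s.spamtrap_hits for s in stats)
    total_queries = sum(s.queries for s in stats)
    if total_hits == 0:
        return None  # no spamtrap traffic yet, so no threshold can be derived
    return total_queries / total_hits


def build_block_list(stats: Iterable[SenderStats],
                     safe_list: Optional[Set[str]] = None) -> List[str]:
    """Return IPs whose queries-per-hit ratio is below the global threshold."""
    stats = list(stats)
    safe = safe_list or set()
    threshold = global_threshold(stats)
    if threshold is None:
        return []
    blocked = []
    for s in stats:
        if s.spamtrap_hits == 0 or s.ip in safe:
            continue  # never hit a spamtrap, or verified via a removal request
        # Assumption: fewer queries per spamtrap hit than average means
        # a disproportionate share of this sender's mail is spam.
        if s.queries / s.spamtrap_hits < threshold:
            blocked.append(s.ip)
    return blocked


if __name__ == "__main__":
    # Hypothetical per-sender counts chosen so the totals are 3,860 queries and 5 hits.
    example = [
        SenderStats("192.0.2.1", spamtrap_hits=0, queries=1200),
        SenderStats("192.0.2.2", spamtrap_hits=1, queries=2000),
        SenderStats("192.0.2.3", spamtrap_hits=3, queries=60),
        SenderStats("192.0.2.4", spamtrap_hits=1, queries=600),
    ]
    print(global_threshold(example))   # 772.0
    print(build_block_list(example))   # ['192.0.2.3', '192.0.2.4']
```

Because the threshold is recomputed from the current totals on every call, it moves with the traffic rather than being set by hand. Multiplying each stored hit or query by age_weight before summing would give older entries less influence, in the spirit of the weight factors described above.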

Abstract

A spam detection system (100) is illustrated for detecting unsolicited bulk emails (spam) according to an exemplary embodiment of the present invention. The spam detection system (100) includes a list server (110), a database (114), a plurality of senders (116) of email, and a plurality of mail servers (118). The list server (110) is used to receive and detect both regular email and unsolicited bulk email (spam). Accordingly, the list server (110) includes a plurality of spamtrap addresses (112). According to at least one embodiment of the present invention, the spamtrap addresses (112) correspond to email addresses that are monitored only for the receipt of unsolicited bulk email (spam). The list server (110) monitors the amount of spam received at the spamtrap addresses (112) in order to identify senders (116) that transmit spam. According to other embodiments of the invention, the list server (110) can count only emails that are received at more than one spamtrap address (112). Accordingly, if a particular spam message is only received at one spamtrap address (112), it may be discarded for purposes of identifying a sender (116) as a spammer. According to one or more further embodiments of the invention, spam messages are assigned higher weight factors (as will be described in further detail below) if they are received at more than one spamtrap address (112).
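
The two counting rules in the abstract (discarding a message seen at only one spamtrap address, or giving messages seen at several addresses a higher weight) can be sketched in Python as follows. This is an illustrative sketch, not the implementation described in the patent: the function name weighted_spamtrap_hits, the message_id key used to recognize the same message at different addresses, the weight values, and the example addresses are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One delivery record: (sender IP, message identifier, spamtrap address).
Delivery = Tuple[str, str, str]


def weighted_spamtrap_hits(deliveries: Iterable[Delivery],
                           single_weight: float = 0.0,
                           multi_weight: float = 1.0) -> Dict[str, float]:
    """Sum per-sender spamtrap hits, weighting each message by how widely it was seen.

    With the defaults, a message received at only one spamtrap address is
    discarded. Setting single_weight=1.0 and multi_weight=2.0 instead keeps
    every message but counts multi-address messages more heavily.
    """
    # Collect the set of distinct spamtrap addresses each message reached.
    traps_per_message = defaultdict(set)
    for sender_ip, message_id, trap in deliveries:
        traps_per_message[(sender_ip, message_id)].add(trap)

    hits: Dict[str, float] = defaultdict(float)
    for (sender_ip, _message_id), traps in traps_per_message.items():
        weight = multi_weight if len(traps) > 1 else single_weight
        if weight > 0:
            hits[sender_ip] += weight
    return dict(hits)


if __name__ == "__main__":
    sample = [
        ("192.0.2.3", "m1", "trap-a@example.com"),
        ("192.0.2.3", "m1", "trap-b@example.com"),
        ("192.0.2.9", "m2", "trap-a@example.com"),
    ]
    print(weighted_spamtrap_hits(sample))
    print(weighted_spamtrap_hits(sample, single_weight=1.0, multi_weight=2.0))
```

Requiring a second spamtrap address before counting a message is one way to keep a bounced or mistyped legitimate email from contributing to a sender's spam count.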

Description

SELF-TUNING STATISTICAL METHOD AND SYSTEM FOR BLOCKING SPAM
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to electronic mail transmission and, more particularly, to the detection and blocking of unsolicited bulk electronic mail otherwise known as spam.
Description of the Related Art
[0002] There has been an increasing growth of electronic traffic over the Internet in recent years. As the Internet continues to evolve and traffic continues to grow, various problems begin to surface. One of the most notable problems today is that of unsolicited bulk email, or spam. Spam is generally in the form of marketing information that is transmitted to a large number of users without solicitation. This type of email is often useless to the person receiving it. Nonetheless, a great deal of time is spent by the recipient to open the email and review/delete the message. While it is possible that a small number of recipients may have an actual interest in the email, the vast majority have no interest and tend to delete spam upon receipt. The term spammer is often used to identify a person (or organization) who transmits spam.
[0003] One of the primary objections to spam is that it can waste valuable resources without proper authority. For example, spam can take up valuable storage space on an individual's (or company's) email account or mail server. This can often result in a storage quota being exceeded on the user's email account. When this happens, legitimate email can be prevented from reaching the user. For example, most internet service providers (ISP) allocate a certain amount of storage space for each user. Once that storage space is exceeded, email delivery is suspended. All email sent to the user during the suspension period is discarded, or otherwise not accepted by the ISP. Users must delete emails in their accounts in order to receive new emails.
[0004] Another problem with receiving spam is that a significant amount of time can be spent reviewing and deleting messages that have been received. Since a recipient may not know immediately whether a message is spam, the contents must be reviewed prior to making a decision to save or delete. Even if the message can be identified as spam from the header information, it must still be deleted by the user.
[0005] Certain types of spam can direct users to different web sites where time is ultimately spent reviewing advertising content. For example, a spam email can advertise discounts or a sale from a particular merchant. Users must subsequently follow a link to the merchant's website in order to obtain more information. Once at the merchant's website, users are overwhelmed with product information and incentives which lead to surfing the merchant's website. While this does not pose a problem to home users, an employer can lose employee productivity during this time period. Another objection to the receipt of spam is that the messages tend to contain objectionable and/or illegal content such as pornography, illegal activities, financial scams, etc.
[0006] There are various methods in place to intercept, or counter, spam being transmitted to individuals. These methods attempt to filter, or block, email messages that are received from sources that have been identified as, and/or associated with, spammers. Email messages can also be blocked based on information contained in the message, such as the subject line. One problem associated with such techniques, however, is the fact that spammers can easily change the source of spam to bypass the filters. Furthermore, spammers continue to improve the content of their spam to make it difficult to filter. For example, some emails cannot be filtered without examining the content of the message, which may raise various legal and privacy issues. It is also difficult to determine exactly when a user (or system) is transmitting too much email to qualify such transmission as spam. Without such a determination, the spam transmitted cannot be filtered.
[0007] Some current methods of filtering spam involve manual selection of a threshold to identify a sender as a spammer. Such methods, however, are inefficient and difficult to implement due to the amount of email being transmitted on a daily basis. Furthermore, spammers can easily change the address being used to transmit spam. Consequently, there is a high cost associated with monitoring the level of email being transmitted by users in order to manually establish a threshold for classifying such messages as spam. Additionally, the time required to adjust the threshold may be too long, resulting in spammers changing their addresses prior to being blocked or immediately upon being blocked.
SUMMARY OF THE INVENTION
[0008] In accordance with one or more embodiments, the present invention provides an ability to identify unsolicited bulk email wherein a variety of spamtrap addresses are created and dedicated to the receipt of unsolicited bulk email, and spammers are identified based, in part, on the amount of email received at the spamtrap addresses.
[0009] In accordance with at least one embodiment of the invention, a system is provided for detecting unsolicited bulk email. The system comprises one or more spamtrap addresses, a list server, and a database. The spamtrap addresses are created to receive unsolicited bulk email, or spam. The list server receives email from a plurality of senders as well as queries regarding the senders. The database is used for storing information corresponding to the amount of unsolicited bulk email received at the spamtrap addresses. Additionally, the database stores query data corresponding to the number of queries received regarding senders of email. Based on the information stored on the database, the list server makes a determination of which senders transmit a disproportionate amount of email and should be labeled as spammers. According to such a system, spammers may be identified in real time. The spam transmitted by these users could be effectively blocked, thereby saving valuable resources.
[0010] According to one or more specific implementations of the invention, the information stored on the database can include at least an IP address associated with each of the senders and the number of unsolicited bulk emails sent by each sender to the spamtrap addresses. Furthermore, the query data can correspond to the number of inquiries on whether a particular sender has previously been identified with transmission of unsolicited bulk email. The query data can be representative, at least in part, of the total number of emails transmitted by the sender in question. The list server can automatically set a threshold value for determining which senders transmit unsolicited bulk electronic mail. Accordingly, the list server can dynamically adjust the threshold value and identify the addresses of spammers without external intervention. The list server can also make as many adjustments as necessary to compensate for spammers who change their email addresses, or the addresses of their email servers.
[0011] According to at least one aspect of the present invention, the database can store timestamps for each unsolicited bulk email received at the spamtrap addresses and each query received. The list server can also apply different weight factors to the stored information and query data based on the timestamp. These weight factors can subsequently be applied when identifying spammers. Various embodiments of the invention can also allow the list server to receive removal requests from senders who have been identified as spammers. The list server can review the removal request to determine if the sender is transmitting spam or legitimate bulk email.
[0012] It is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. Rather, the invention is capable of other embodiments and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting.
[0013] As such, those skilled in the art will appreciate that the conception, upon which this disclosure is based, may readily be utilized as a basis for the designing of other structures, methods and systems for carrying out the several purposes of the present invention. It is important, therefore, that the claims be regarded as including such equivalent constructions insofar as they do not depart from the spirit and scope of the present invention.
[0014] These, and various features of novelty which characterize the invention, are pointed out with particularity in the appended claims forming a part of this disclosure. For a better understanding of the invention, its operating advantages and the specific benefits attained by its uses, reference should be had to the accompanying drawings and embodiments of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] Figure 1 is a block diagram illustrating a system for detecting unsolicited bulk email in accordance with at least one embodiment of the present invention;
[0016] Figure 2 illustrates an exemplary database table and block list that can be used with one or more embodiments of the present invention;
[0017] Figure 3 is a flowchart illustrating steps performed to detect unsolicited bulk email in accordance with various embodiments of the present invention; and
[0018] Figure 4 is a flowchart illustrating steps performed to detect unsolicited bulk email in accordance with at least one alternative embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0019] Reference now will be made in detail to preferred embodiments of the invention. Such embodiments are provided by way of explanation of the invention, which is not intended to be limited thereto. In fact, those of ordinary skill in the art will appreciate, upon reading the present specification and viewing the present drawings, that various modifications and variations can be made.
[0020] For example, features illustrated or described as part of one embodiment can be used on other embodiments to yield a still further embodiment. Additionally, certain features may be interchanged with similar devices or features not mentioned yet which perform the same or similar functions. It is therefore intended that such modifications and variations are included within the totality of the present invention.
[0021] Turning to figure 1, a spam detection system 100 is illustrated for detecting unsolicited bulk emails (spam) according to an exemplary embodiment of the present invention. The spam detection system 100 of Figure 1 includes a list server 110, a database 114, a plurality of senders 116 of email, and a plurality of mail servers 118. The list server 110 is used to receive and detect both regular email and unsolicited bulk email (spam). Accordingly, the list server 110 includes a plurality of spamtrap addresses 112. The spamtrap addresses 112 are not used by senders 116. According to at least one embodiment of the present invention, the spamtrap addresses 112 correspond to email addresses that are monitored only for the receipt of unsolicited bulk email (spam). The list server 110 monitors the amount of spam received at the spamtrap addresses 112 in order to identify senders 116 that transmit spam. According to other embodiments of the invention, the list server 110 can count only emails that are received at more than one spamtrap address 112. Accordingly, if a particular spam message is only received at one spamtrap address 112, it may be discarded for purposes of identifying a sender 116 as a spammer. According to one or more further embodiments of the invention, spam messages are assigned higher weight factors (as will be described in further detail below) if they are received at more than one spamtrap address 112.
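The embodiment in which only messages received at more than one spamtrap address 112 are counted can be illustrated with a minimal sketch. The function name, the tuple layout, the `min_traps` parameter, and the example addresses are assumptions made for illustration and do not appear in the specification.

```python
from collections import defaultdict

def count_multi_trap_hits(deliveries, min_traps=2):
    """Count spamtrap hits per sender, keeping only messages that reached
    at least `min_traps` distinct spamtrap addresses.

    `deliveries` is an iterable of (sender_ip, message_id, trap_address).
    """
    traps_per_message = defaultdict(set)
    for sender_ip, message_id, trap_address in deliveries:
        traps_per_message[(sender_ip, message_id)].add(trap_address)

    hits = defaultdict(int)
    for (sender_ip, _message_id), traps in traps_per_message.items():
        # A message seen at a single spamtrap address is ignored.
        if len(traps) >= min_traps:
            hits[sender_ip] += len(traps)
    return dict(hits)

# Hypothetical deliveries: one message hits two traps, another hits one.
deliveries = [
    ("192.0.2.1", "msg-a", "trap1@example.org"),
    ("192.0.2.1", "msg-a", "trap2@example.org"),
    ("192.0.2.9", "msg-b", "trap1@example.org"),
]
result = count_multi_trap_hits(deliveries)
```

Here only the first sender accumulates hits, because its message arrived at two distinct spamtrap addresses.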
[0022] Information collected by the list server 110 regarding spam received at the spamtrap addresses 112 can be stored in the database 114. Optionally, one or more embodiments of the present invention can allow the database 114 to be maintained either directly on the list server 110 or remotely from the list server 110. In either event, the list server 110 is in communication with the database 114, at least in part, to enter data and submit queries. Email transmitted by the senders 116 can also pass through one or more mail servers 118. When a mail server 118 receives an email from a sender 116, a query is submitted to the database 114. In the query, the mail server 118 would request information regarding the sender 116 of the email message. For example, the mail server 118 would be interested in determining if the sender 116 has been identified as a spammer (i.e., someone who sends unsolicited bulk emails) so that messages sent by that particular sender could be blocked.
[0023] Optionally and additionally, the mail server 118 can identify a sender 116 as a spammer if the mail server 118 receives too many messages from the sender 116 within a predetermined time frame. Furthermore, the list server 110 can identify a sender 116 as a spammer if it receives too many queries regarding the sender from the mail servers 118. Senders 116 can also be identified as potential spammers if there is a sudden, large change in the amount of email they transmit. For example, consider a sender 116 who has been transmitting approximately 10 emails per day. Suddenly, the sender 116 begins transmitting approximately 10,000 emails per day. Assuming that some of the emails are received at the spamtrap addresses 112, the list server 110 could decide to block emails from the sender 116. Emails can be blocked for a short period of time, a long period of time, or permanently. The list server 110 stores information on the database 114 corresponding to the identification of senders 116 who have been identified as spammers. While the present embodiment of the invention identifies a list server 110 for creating/monitoring the spamtrap addresses 112 and interacting with the database 114, it should be noted that these actions/features can be implemented in the mail server 118 or various types of computer systems. Additionally, a single computer system can be configured to act as both a list server 110 and a mail server 118.
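The sudden-change example above (roughly 10 emails per day rising to roughly 10,000) can be sketched as a simple check. The function name and the `factor` parameter are hypothetical; the specification does not state how large a change must be, so the default of 100 is only an assumed value.

```python
def is_sudden_spike(baseline_per_day, current_per_day, spamtrap_hits, factor=100):
    """Flag a sender whose daily volume grows by `factor` or more while
    at least one of its messages reached a spamtrap address."""
    if spamtrap_hits <= 0:
        return False  # a volume jump alone is not treated as evidence here
    baseline = max(baseline_per_day, 1)  # guard against a zero baseline
    return current_per_day / baseline >= factor

# The example from the text: 10 per day rising to 10,000 per day.
spike = is_sudden_spike(10, 10_000, spamtrap_hits=3)
```

Requiring at least one spamtrap hit follows the text's condition that some of the emails are received at the spamtrap addresses 112.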
[0024] The list server 110 makes a determination as to which senders 116 should be identified as spammers or as senders of legitimate email. According to one or more embodiments of the present invention, solicited bulk email can also be identified as legitimate email. This information can be determined, in part, based on the number of spam email received at the spamtrap addresses 112 and/or the number of queries submitted to the database 114 by the mail servers 118. For example, at least one embodiment of the present invention allows the list server 110 to identify potential spammers based on the relative proportion of spam received at the spamtrap addresses 112 and the number of legitimate email sent to user addresses. Additionally, the list server 110 can keep track of the number of queries submitted to the database 114 by the mail servers 118. The number of queries submitted can, according to certain embodiments of the invention, correspond to the total number of emails transmitted by the senders 116 (including spam) or the number of legitimate emails transmitted by the senders 116. Based on these values, the list server 110 is capable of determining the proportion of spam transmitted by any sender 116 relative to the proportion of legitimate emails.
[0025] As senders 116 continue transmitting email to the mail servers 118 and the list server 110, data continues to be collected. The list server 110 automatically (and continuously) calculates various ratios related to the proportion of spam transmitted by various senders 116 relative to the number of legitimate emails transmitted. A threshold value can be calculated, in part, to determine if a sender 116 should be identified as a spammer. If a particular sender 116E, for example, transmits an amount of spam to the spamtrap addresses 112 that exceeds the threshold value set by the list server 110 for all senders 116 of email, then that particular sender 116E would be identified as a spammer. As can be appreciated, the list server 110 continually monitors email traffic and dynamically adjusts the threshold value in order to efficiently identify senders 116 that transmit disproportionate amounts of spam. Once a sender 116E has been identified as a spammer, they are placed on a block list 120, which contains information regarding all senders 116 that have been identified as spammers. The block list 120 can be optionally transmitted to the mail servers 118, in part, to reduce the number of queries submitted to the database 114.
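One way to picture the continuously adjusted threshold is a small in-memory counter that recomputes the threshold from current totals each time a block list is requested. This is a sketch under stated assumptions: the class and method names are hypothetical, the threshold is taken to be total database queries per spamtrap hit, and a sender is listed when its own queries-per-hit figure falls below that value.

```python
from collections import Counter

class ListServerCounters:
    """Tracks spamtrap hits and database queries per sender IP."""

    def __init__(self):
        self.trap_hits = Counter()
        self.queries = Counter()

    def record_spamtrap_hit(self, sender_ip):
        self.trap_hits[sender_ip] += 1

    def record_query(self, sender_ip):
        self.queries[sender_ip] += 1

    def threshold(self):
        """Queries per spamtrap hit over all senders, or None before any hit."""
        total_hits = sum(self.trap_hits.values())
        if total_hits == 0:
            return None
        return sum(self.queries.values()) / total_hits

    def block_list(self):
        """Senders whose queries-per-hit figure is below the current threshold."""
        limit = self.threshold()
        if limit is None:
            return []
        return sorted(
            ip for ip, hits in self.trap_hits.items()
            if hits > 0 and self.queries[ip] / hits < limit
        )

# Hypothetical traffic: one sender with many trap hits, one with few.
server = ListServerCounters()
for _ in range(5):
    server.record_spamtrap_hit("192.0.2.1")
for _ in range(10):
    server.record_query("192.0.2.1")
server.record_spamtrap_hit("192.0.2.2")
for _ in range(990):
    server.record_query("192.0.2.2")
blocked = server.block_list()
```

Because the threshold is derived from the running totals, it shifts as traffic changes, without any manually chosen constant.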
[0026] According to one or more embodiments of the present invention, the mail servers 118 can exchange block lists 120 with each other in order to continually update the identification of spammers. This can be beneficial in circumstances where, for example, multiple list servers 110 are established with spamtrap addresses 112 to identify potential spammers. Additionally, this can also be beneficial in situations where the mail servers 118 independently use other methods of identifying spammers. Thus, a mail server 118 with a built in method of identifying spammers would have two sources of information regarding senders 116 who are transmitting spam. Such mail servers 118 can further include algorithms and/or logic to determine which senders should be blocked based on the information contained in both lists. Additionally, one or more embodiments of the invention can provide list servers 110 that transmit information regarding the spamtrap addresses 112 directly to the mail servers 118.
[0027] Turning now to figure 2, an exemplary database table 122 and block list 120 are shown. The database table 122 contains entries (or rows) corresponding to the IP address of senders 116, the number of emails received at the spamtrap addresses 112, the number of database queries regarding the senders 116, and the ratio of spamtrap hits relative to database queries. Additionally, the database table 122 of figure 2 includes a column which identifies whether the sender 116 should be listed as a spammer or not. Various threshold values can be calculated in order to determine the relative number of spamtrap hits that should be allowed before a sender 116 is identified as a spammer. For example, according to at least one embodiment of the present invention, the threshold can be calculated based on the sum of database queries relative to the sum of spamtrap hits received from all senders 116 and email servers 118.
[0028] According to the example illustrated in figure 2, the sum of database queries regarding all senders equals 17,200, while the sum of spamtrap hits equals 41. This results in a ratio of approximately one spamtrap hit per 420 database queries. It can be seen that senders X1 and X5 transmit much more spam to the spamtrap addresses 112 relative to legitimate emails. Thus, they would be identified as spammers. Senders X2, X3, X4, and X6 transmit significantly more legitimate email relative to spam at the spamtrap addresses 112. Therefore, these senders would not be listed as spammers on the block list.
[0029] According to at least one alternative embodiment of the present invention, the threshold value can be calculated based on the sum of the ratios of spamtrap hits relative to database queries. Thus, based on the values stored in table 122, the ratio would be 3,860 to 5. This can be reduced to a ratio of 772 database queries per spamtrap hit. Based on this threshold, senders X3 and X6 would be identified as potential spammers. Senders X1 and X5 would also be identified as potential spammers, while senders X2 and X4 would not.
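The arithmetic behind the two threshold values quoted above can be checked directly. Only the totals 17,200, 41, 3,860, and 5 come from the text; the per-sender rows of table 122 are not reproduced in this section, so the sample sender below is hypothetical, as is the `is_listed` helper.

```python
total_queries = 17_200   # sum of database queries (from the text)
total_trap_hits = 41     # sum of spamtrap hits (from the text)
threshold_1 = total_queries / total_trap_hits    # about 419.5, quoted as 420

ratio_numerator, ratio_denominator = 3_860, 5    # "3,860 to 5" (from the text)
threshold_2 = ratio_numerator / ratio_denominator  # 772.0

def is_listed(queries, trap_hits, threshold):
    """True when a sender draws fewer database queries per spamtrap hit
    than the threshold, i.e. a larger share of its mail hits the traps."""
    if trap_hits == 0:
        return False
    return queries / trap_hits < threshold

# Hypothetical sender with 600 queries and 1 spamtrap hit.
listed_under_1 = is_listed(600, 1, threshold_1)
listed_under_2 = is_listed(600, 1, threshold_2)
```

A sender such as this one falls between the two values, which is consistent with the text's observation that the second threshold identifies more senders as potential spammers than the first.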
[0030] Figure 2 also illustrates an exemplary block list 120 that can be transmitted to the mail servers 118. The block list 120 can consist of, for example, the IP address of the sender and an indication of whether the sender has been identified as a spammer. Accordingly, the block list 120 can include at least two different options. The first option would correspond to, for example, the situation where the threshold is set based on the sum of the database queries relative to the sum of spamtrap hits (i.e., threshold 1). The second option would be to transmit the block list 120 where the senders 116 are identified as spammers based on the second threshold value (i.e., threshold 2). Alternatively, the list server 110 can transmit a block list 120 that contains both values to the mail servers 118. Furthermore, various other statistical ratios can be calculated in order to set different threshold values and/or make various observations regarding which senders 116 could potentially be sending spam.
[0031] At least one further embodiment of the present invention combines the number of spamtrap hits and queries in various proportions to set different threshold values. Multiple list servers 110 could also exchange information regarding senders 116 whose ratios are within a predetermined tolerance of the threshold value. If a sender 116 is identified as a spammer on multiple list servers 110, they can be identified as a spammer on a list server 110 where they have not met the threshold requirements, but have a ratio within the predetermined tolerance.
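The tolerance rule can be sketched as follows. The function and parameter names are hypothetical, the ratio is assumed to be expressed as database queries per spamtrap hit (so a lower value indicates more spam), and the tolerance is assumed to be a fraction of the threshold; the specification fixes none of these details.

```python
def listed_with_consensus(local_ratio, threshold, tolerance,
                          other_servers_listing, min_servers=2):
    """Decide whether to list a sender on this list server.

    A sender below the threshold is listed outright. A sender that is
    above the threshold but within `tolerance` of it is listed only if
    at least `min_servers` other list servers already list it.
    """
    if local_ratio < threshold:
        return True
    near_threshold = local_ratio < threshold * (1 + tolerance)
    return near_threshold and other_servers_listing >= min_servers

# Hypothetical sender slightly above a threshold of 420 queries per hit,
# already listed on two other list servers.
decision = listed_with_consensus(450, 420, 0.10, other_servers_listing=2)
```

The `min_servers` default of 2 is one reading of "multiple list servers"; a deployment could choose a stricter value.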
[0032] Additionally, the list server 110 can maintain historical data on senders 116. If a sender's ratio approaches the threshold value, but does not exceed it, historical data regarding that sender 116 would be examined to determine if the sender 116 might be a spammer. The historical data could, under certain circumstances, identify a trend or pattern in the sender email transmissions. For example, consider a sender 116 whose ratio falls within the predetermined tolerance. Review of the sender's historical data indicates that the sender has fluctuated past the threshold value on at least one occasion over the last two weeks. The list server 110 could block the sender 116 even though the ratio does not exceed the threshold value. Furthermore, the list server 110 could optionally decide to block the sender 116 until its ratio is well below the threshold value. [0033] Figure 3 is a flowchart illustrating steps performed in detecting unsolicited bulk email (spam) in accordance with at least one embodiment of the present invention. At step 300, the list server 110 creates one or more spamtrap addresses 112. Optionally, the spamtrap addresses 112 can be created by other entities or on separate mail servers 118 that exchange information, for example, with the list server 110. Additionally, the functions performed by the list server 110 can be implemented on other systems. The specific number of spamtrap addresses 112 created can depend on various factors including, for example, the amount of email traffic passing through the list server 110, the number of queries submitted to the database 114, the number of spam messages received at the spamtrap addresses 112, etc. At step 310, the list server 110 begins receiving emails from various senders 116. At step 320, information regarding unsolicited bulk email (spam) received at the spamtrap addresses is stored, for example, in the database 114. This information can vary depending on the specific embodiment of the invention. 
For example, the information stored can include the IP address of the sender 116, the return address used by the sender 116, etc. Additionally, the list server 110 could store information specific to the number of spamtrap hits for each sender 116, and/or the number of hits to each individual spamtrap address 112 from each individual sender 116.
[0034] At step 330, the list server 110 stores query data to the database 114, or other storage device. The query data can correspond to the number of queries received by the database 114 regarding specific senders 116, the total number of queries received by the database regarding all senders 116, identification information such as an IP address for the sender 116, etc. Both the information regarding the unsolicited bulk email (e.g., spam received at the spamtrap addresses) and the query data are stored in the database 114. As previously discussed, the database 114 can be maintained as a remotely located system or it can be maintained within the list server 110. At step 340, the list server 110 reviews the query data and the information regarding unsolicited bulk email in order to identify senders 116 that should be placed on the block lists 120. For example, the list server 110 may automatically determine criteria wherein the number of emails received at the spamtrap addresses 112 relative to the number of queries received regarding senders 116 raises suspicion that a particular sender should be identified as sending unsolicited bulk email. At step 350, a block list 120 containing information regarding senders 116 that have been identified as transmitting unsolicited bulk emails is transmitted to various entities such as, for example, the mail servers 118 or other spam blocking systems. As previously discussed, the function of a list server 110 and mail server 118 can be incorporated into a single computer system. In such situations, the computer system generates a block list 120 that identifies potential spammers, and blocks email transmissions from such senders 116.
[0035] Figure 4 is a flowchart illustrating steps performed to detect senders 116 of unsolicited bulk email according to one or more embodiments of the present invention. At step 400, one or more spamtrap addresses 112 are created by the list server 110.
As previously discussed, the spamtrap addresses 112 are designed only to receive unsolicited bulk email. There is generally no, or very little, (e.g., less than 95%) legitimate email traffic flowing through these addresses. For example, bounced messages can sometimes reach a spamtrap address 112. A message having a mistyped destination address can sometimes reach a spamtrap address 112. At step 410, the list server 110 begins receiving emails from the senders 116. At step 412, the list server 110 stores information regarding unsolicited bulk email (spam) received at the spamtrap addresses 112. As previously discussed, this information can be stored on the database 114. The information can include, for example, an IP address associated with the different senders 116, the number of unsolicited bulk emails transmitted by the different senders 116, return addresses of the different senders 116, etc. At step 414, the list server 110 stores the query data in the database 114. The query data corresponds, in certain embodiments of the invention, to the total number of emails transmitted by the senders 116 (including spam), or the total number of legitimate emails transmitted by the senders 116. [0036] At step 416, the list server 110 optionally creates timestamps to associate with bulk emails received at the spamtrap addresses 112 and the query data. The timestamp could include, for example, the date and time at which the email was received and/or transmitted. At step 418, it is determined if a timestamp has been created for the query data and the stored information. If a timestamp has been created, then different weight factors can optionally be applied to individual entries. The weight factors could correspond to the stored information and the query data. The weight factors can serve various purposes. 
For example, according to at least one embodiment of the invention, stored information and query data related to older emails are given a lower weight factor than corresponding information for more recent emails. More particularly, emails that were transmitted, for example, more than two months prior to the current date can be counted with less weight than emails received within the past 24 hour period. This provides flexibility under certain circumstances where new senders 116 must be monitored for transmitting spam. Furthermore, certain embodiments of the invention can cause the weight factor to totally override emails having certain timestamps.
[0037] At step 422, it is determined if the timestamp has expired. This corresponds to the situation where information has been weighted to the point that it is too old to be realistically factored into the current processing. If the timestamp has expired, then the entry is discarded at step 424. Otherwise, if the timestamp has not expired, then control passes to step 426 where a threshold is automatically set. As previously discussed, the threshold can be set based on various factors relating to the statistics of emails and spam received by both the email servers 118 and the spamtrap addresses 112. Additionally, various schemes can be employed to set the threshold value.
[0038] At step 428, senders 116 that are believed to be transmitting spam are identified. This corresponds to a situation, under certain embodiments of the invention, where the amount of spam relative to legitimate email exceeds a certain value. At step 430, a block list 120 is constructed and transmitted to the mail servers 118. The block list 120 can contain, for example, the IP addresses of senders 116 who have been identified as transmitting a disproportionate amount of unsolicited bulk email relative to legitimate email.
The block list 120 can also include query data that assists mail servers 118 with their own screening hardware/software to identify or verify the selection of a particular sender 116 as a spammer. At step 432, the different mail servers can exchange block lists 120 with each other.
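The timestamp weighting and expiry of steps 416 through 424 can be sketched as a weight function applied to each stored entry. The linear decay, the 24 hour full-weight window, and the 60 day expiry are assumptions chosen to match the examples in the text; the specification does not prescribe a particular weighting curve, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def weight_for(age, full_weight_window=timedelta(days=1),
               expiry=timedelta(days=60)):
    """Weight 1.0 inside the recent window, decaying linearly to 0.0 at expiry."""
    if age <= full_weight_window:
        return 1.0
    if age >= expiry:
        return 0.0  # expired entries contribute nothing and can be discarded
    span = (expiry - full_weight_window).total_seconds()
    return 1.0 - (age - full_weight_window).total_seconds() / span

def weighted_count(timestamps, now):
    """Sum of weights over the timestamps of spamtrap hits or queries."""
    return sum(weight_for(now - ts) for ts in timestamps)

# Hypothetical entries: one received an hour ago, one three months ago.
now = datetime(2005, 9, 1, 12, 0, 0)
entries = [now - timedelta(hours=1), now - timedelta(days=90)]
total = weighted_count(entries, now)
```

The same weighting would be applied to both spamtrap hits and query data before the ratios and the threshold are computed, so that recent behavior dominates the decision.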
[0039] At step 434, the list server 110 determines if any removal requests have been received. The removal requests correspond to requests from specific senders to be removed from the one or more block lists 120 because the requesters are transmitting legitimate emails. At step 436, the list server 110 can verify the information submitted by the sender 116 in order to ensure that the sender 116 is transmitting legitimate emails from a legitimate email address. Such senders can include, for example, distribution lists that often transmit catalogs, circulars, coupons, etc. to members that subscribe to a particular service.
[0040] At step 438, the sender 116, if verified, is placed on a safe list. Once placed on the safe list, the sender 116 could be precluded from being placed on a future block list 120. This can be achieved, for example, by the list server 110 cross referencing the entries on the safe list with entries on a newly created block list 120, and removing entries from the block list 120 that correspond to entries on the safe list. At this point, control returns to step 410 where the list server 110 continues receiving email. According to one or more embodiments of the invention, senders 116 who cannot be verified by the list server 110 are only placed on the safe list for a temporary time period until they can be verified. Optionally, such senders 116 could be placed on the safe list until the list server 110 identifies them as exceeding the threshold values based, in part, on emails transmitted to the spamtrap addresses 112. It should be further noted that the process illustrated in Fig. 4 can be continuous and dynamic in various embodiments of the present invention. According to such embodiments, the list server 110 continually receives emails and continually performs the necessary steps to adjust the threshold value based on the current email traffic and identify spammers on an ongoing basis. The list server 110 may perform these functions without any external (or user) intervention.
[0041] The spam detection system of the present invention may be implemented in a variety of forms, such as in software or firmware, running on a general purpose computer or a specialized computer system such as, for example, a server. The software can be provided in any machine-readable medium, including magnetic or optical disk, or in memory. Furthermore, the present invention is utilizable in conjunction with a computer system that operates software which may require periodic updates. The spam detection system can be implemented on various computer systems and/or servers using any operating system including Windows, MacOS, Unix, Linux, etc., and can be implemented using any email system.
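The safe list cross referencing described for step 438 can be sketched as a filter over a newly created block list. The data layout is an assumption: the safe list is modeled as a mapping from sender IP to an expiry time, with `None` standing for a verified sender and a datetime standing for a temporary, unverified entry. The function name and example addresses are hypothetical.

```python
from datetime import datetime

def apply_safe_list(block_list, safe_list, now):
    """Remove from `block_list` every sender with a current safe list entry.

    `safe_list` maps sender IP -> expiry datetime, or None for a verified
    sender whose entry does not expire.
    """
    def is_safe(ip):
        if ip not in safe_list:
            return False
        expiry = safe_list[ip]
        return expiry is None or now < expiry

    return [ip for ip in block_list if not is_safe(ip)]

# Hypothetical data: one verified sender, one whose temporary entry lapsed.
new_block_list = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
safe_list = {"192.0.2.1": None, "192.0.2.2": datetime(2005, 1, 1)}
filtered = apply_safe_list(new_block_list, safe_list, now=datetime(2005, 6, 1))
```

The verified sender is removed from the block list, while the sender whose temporary entry has lapsed remains on it.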
[0042] The many features and advantages of the invention are apparent from the detailed specification, and thus, the appended claims are intended to cover all such features and advantages which fall within the true spirit and scope of the invention. Further, since numerous modifications and variations will become readily apparent to those skilled in the art, the invention should not be limited to the exact construction and operation illustrated and described. Rather, all suitable modifications and equivalents may be considered as falling within the scope of the claimed invention.

Claims

What is claimed is:
1. A method of detecting unsolicited bulk email, comprising: creating one or more spamtrap addresses for receiving unsolicited bulk email; storing information corresponding to the amount of unsolicited bulk email received at the one or more spamtrap addresses; storing query data corresponding to a number of queries received regarding senders of email; and identifying senders who transmit a disproportionate amount of unsolicited bulk email, based on at least the stored information and the stored query data.
2. The method of claim 1, wherein storing information further includes: storing an IP address associated with each of the different senders; and storing the number of unsolicited bulk emails transmitted by each of the different senders to the one or more spamtrap addresses.
3. The method of claim 1, wherein the query data includes a number of inquiries on whether a selected sender has previously been associated with the transmission of unsolicited bulk email.
4. The method of claim 3, wherein the query data is representative of the total number of emails sent by the selected sender.
5. The method of claim 1, wherein identifying senders includes: automatically setting a threshold value; and identifying senders who transmit a disproportionate amount of unsolicited bulk email, based on at least the stored information, the stored query data, and the threshold value.
6. The method of claim 5, wherein automatically setting a threshold value includes evaluating data only for senders who have not been identified as sending a disproportionate amount of unsolicited bulk email.
7. The method of claim 1, wherein storing information includes storing a timestamp for each unsolicited bulk email received and for each query received; and applying different weight factors to the stored information and to the query data based on the timestamp.
8. The method of claim 7, wherein unsolicited bulk emails and received queries having recent timestamps are weighted higher than those having older timestamps.
9. The method of claim 7, wherein information regarding unsolicited bulk emails and received queries having timestamps older than a predetermined time period are discarded.
10. The method of claim 1, further comprising transmitting one or more block lists to at least one spam blocking system, wherein the one or more block lists contain information regarding senders who have been identified as transmitting a disproportionate amount of unsolicited bulk email.
11. The method of claim 10, wherein the at least one spam blocking systems exchange the one or more block lists with each other.
12. The method of claim 10, wherein the one or more block lists further contain IP addresses of senders who have been identified as transmitting a disproportionate amount of unsolicited bulk mail.
13. The method of claim 10, wherein the one or more block lists further contain query data.
14. The method of claim 1, further comprising: receiving a removal request from an email sender; and if the sender is determined to be a legitimate sender of email, placing the sender submitting the removal request on a safe list containing senders who transmit legitimate email.
15. The method of claim 14, wherein the sender is not determined to be a legitimate sender of email, and further comprising a step of placing the sender on the safe list for a predetermined length of time.
16. A system for detecting unsolicited bulk email comprising: one or more spamtrap addresses for receiving unsolicited bulk email; a list server receiving email from a plurality of senders, and further receiving queries regarding said senders; and a database for storing information corresponding to the amount of unsolicited bulk email received at said one or more spamtrap addresses, said database further storing query data corresponding to a number of queries received regarding senders of email; said list server identifying senders who transmit a disproportionate amount of unsolicited bulk email based on at least the information stored in said database.
17. The system of claim 16, wherein the information stored on said database comprises at least an IP address associated with each of said senders and the number of unsolicited bulk emails sent by each of said senders to said one or more spamtrap addresses.
18. The system of claim 16, wherein the query data includes a number of inquiries on whether a selected sender has previously been associated with the transmission of unsolicited bulk email.
19. The system of claim 18, wherein said query data is representative of the total number of emails sent by said selected sender.
20. The system of claim 16, wherein said list server automatically sets a threshold value, and said list server further identifies senders who transmit an amount of unsolicited bulk email based on at least said threshold value.
21. The system of claim 16, wherein: said database further stores a timestamp for each unsolicited bulk email received at said spamtrap addresses and each query received; and said list server applies different weight factors to said stored information and said received queries based on the timestamp when identifying senders who transmit a disproportionate amount of unsolicited bulk email.
22. The system of claim 21, wherein unsolicited bulk emails and received queries having recent timestamps are weighted higher than those having older timestamps.
23. The system of claim 21, wherein said list server discards unsolicited bulk emails and received queries having timestamps older than a predetermined time period.
24. The system of claim 16, further comprising at least one spam blocking system, and wherein said list server transmits one or more block lists to said at least one spam blocking system, each said one or more block lists containing information regarding senders who have been identified as transmitting a disproportionate amount of unsolicited bulk mail.
25. The system of claim 24, wherein said at least one spam blocking systems exchange said one or more block lists with each other.
26. The system of claim 24, wherein said one or more block lists contain IP addresses of senders who have been identified as transmitting a disproportionate amount of unsolicited bulk email.
27. The system of claim 24, wherein said one or more block lists contain query data.
28. The system of claim 16, further comprising one or more email servers, and wherein said one or more mail servers submit queries to said database to determine which senders transmit a disproportionate amount of unsolicited bulk email based on the information and query data stored in said database.
29. The system of claim 16, wherein: said list server evaluates removal requests from senders of email; and if a removal request is determined to be legitimate, said list server places the sender submitting the removal request on a safe list containing senders who transmit legitimate email.
30. The system of claim 29, wherein: a removal request is determined to be a legitimate sender of email; and said list server places the sender submitting the removal request on said safe list for a predetermined length of time.
31. The system of claim 16, wherein said database is maintained on said list server.
32. A computer program product, residing on a computer-readable medium, for use in controlling use of a computer program, said computer program product comprising instructions for causing a computer system to: create one or more spamtrap addresses for receiving unsolicited bulk email; store information corresponding to the amount of unsolicited bulk email received at the one or more spamtrap addresses; store query data corresponding to a number of queries received regarding senders of email; and identify senders who transmit a disproportionate amount of unsolicited bulk email, based on at least the stored information and the stored query data.
33. The computer program product of claim 32, further comprising instructions for causing said computer system to: store an IP address associated with each of the different senders; and store the number of unsolicited bulk emails transmitted by each of the different senders to the one or more spamtrap addresses.
34. The computer program product of claim 32, wherein the query data includes a number of inquiries on whether a selected sender has previously been associated with the transmission of unsolicited bulk email.
35. The computer program product of claim 34, wherein the query data is representative of the total number of emails sent by the selected sender.
36. The computer program product of claim 32, wherein identifying senders comprises instructions for causing said computer system to: automatically set a threshold value; and identify senders who transmit a disproportionate amount of unsolicited bulk email, based on at least the stored information, the stored query data, and the threshold value.
37. The computer program product of claim 36, wherein automatically setting a threshold value comprises instructions for causing said computer system to evaluate data only for senders who have not been identified as sending a disproportionate amount of unsolicited bulk email.
38. The computer program product of claim 32, wherein storing information comprises instructions for causing said computer system to: store a timestamp for each unsolicited bulk email received and for each query received; and apply different weight factors to the stored information and to the query data based on the timestamp.
39. The computer program product of claim 38, wherein unsolicited bulk emails and received queries having recent timestamps are weighted higher than those having older timestamps.
40. The computer program product of claim 38, wherein information regarding unsolicited bulk emails and received queries having timestamps older than a predetermined time period are discarded.
41. The computer program product of claim 32, further comprising instructions for causing said computer system to transmit one or more block lists to at least one spam blocking system, wherein the one or more block lists contain information regarding senders who have been identified as transmitting a disproportionate amount of unsolicited bulk email.
42. The computer program product of claim 41, wherein the at least one spam blocking systems exchange the one or more block lists with each other.
43. The computer program product of claim 41, wherein the one or more block lists further contain IP addresses of senders who have been identified as transmitting a disproportionate amount of unsolicited bulk mail.
44. The computer program product of claim 41, wherein the one or more block lists further contain query data.
45. The computer program product of claim 32, further comprising instructions for causing said computer system to: receive a removal request from an email sender; and if the sender is determined to be a legitimate sender of email, placing the sender submitting the removal request on a safe list containing senders who transmit legitimate email.
46. The computer program product of claim 45, wherein the sender is not determined to be a legitimate sender of email, and further comprising a step of placing the sender on the safe list for a predetermined length of time.
47. A system for detecting unsolicited bulk email comprising: one or more spamtrap addresses for receiving unsolicited bulk email; a list server receiving email from a plurality of senders, and further receiving queries regarding said senders; and a database for storing at least an IP address associated with each of said senders and the number of unsolicited bulk emails sent by each of said senders to said one or more spamtrap addresses, said database further storing query data corresponding to a number of queries received regarding whether a selected sender has previously been associated with the transmission of unsolicited bulk email; said list server identifying senders who transmit a disproportionate amount of unsolicited bulk email based at least on the information stored in said database.
48. A system for detecting unsolicited bulk email comprising: one or more spamtrap addresses for receiving unsolicited bulk email; a list server receiving email from a plurality of senders, and further receiving queries regarding said senders; and a database for storing information corresponding to the amount of unsolicited bulk email received at said one or more spamtrap addresses, said database further storing query data corresponding to a number of queries received regarding senders of email; said list server automatically setting a threshold value, and identifying senders who transmit a disproportionate amount of unsolicited bulk email based on at least said threshold value and the information stored in said database.
49. A system for detecting unsolicited bulk email comprising: one or more spamtrap addresses for receiving unsolicited bulk email; a list server receiving email from a plurality of senders, and further receiving queries regarding said senders; a database for storing information corresponding to the amount of unsolicited bulk email received at said one or more spamtrap addresses, said database further storing query data corresponding to a number of queries received regarding senders of email; and a timestamp for each unsolicited bulk email received at the spamtrap addresses and each query received; a plurality of weight factors corresponding to the relevance of a database entry based on said timestamp; said list server identifying senders who transmit a disproportionate amount of unsolicited bulk email based on at least the information stored in said database and said plurality of weight factors.
50. A method of detecting unsolicited bulk email, comprising: creating one or more spamtrap addresses for receiving unsolicited bulk email; storing an IP address associated with each of the different senders; storing the number of unsolicited bulk emails sent by each of the different senders to the one or more spamtrap addresses; storing query data corresponding to a number of queries received regarding senders of email; and identifying senders who transmit a disproportionate amount of unsolicited bulk email, based on at least the stored IP addresses, the stored number of unsolicited bulk emails, and the stored query data.
51. A method of detecting unsolicited bulk email, comprising: creating one or more spamtrap addresses for receiving unsolicited bulk email; storing information corresponding to the amount of unsolicited bulk email received at the one or more spamtrap addresses; storing query data corresponding to a number of queries received regarding senders of email; automatically setting a threshold value; and identifying senders who transmit a disproportionate amount of unsolicited bulk email, based on at least the stored information, the stored query data, and the threshold value.
52. A method of detecting unsolicited bulk email, comprising: creating one or more spamtrap addresses for receiving unsolicited bulk email; storing information corresponding to the amount of unsolicited bulk email received at the one or more spamtrap addresses; storing query data corresponding to a number of queries received regarding senders of email; creating a timestamp for each unsolicited bulk email received and for each query received; applying different weight factors to the stored information and to the query data based on the timestamp; identifying senders who transmit a disproportionate amount of unsolicited bulk email, based on at least the stored information, the weight factors, and the stored query data.
53. A method of detecting unsolicited bulk email, comprising: creating one or more spamtrap addresses for receiving unsolicited bulk email; storing information corresponding to the amount of unsolicited bulk email received at the one or more spamtrap addresses; storing query data corresponding to a number of queries received regarding senders of email; identifying senders who transmit a disproportionate amount of unsolicited bulk email, based on the stored information and the stored query data; receiving a removal request from an email sender; and if the sender is determined to be a legitimate sender of email, placing the sender submitting the removal request on a safe list containing senders who transmit legitimate email.
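
By way of illustration only, and without limiting the claims, the identification recited in claims 1 and 5 through 9 might be sketched as follows. The claims do not recite any formula, so every numeric choice here is an assumption: the exponential half-life weighting, the seven-day discard age, the denominator floor of 1.0, and a threshold set to a multiple of the median ratio among senders not currently listed. All function and parameter names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Set, Tuple

DAY = 86400.0  # seconds


def weighted_count(timestamps: List[float], now: float,
                   half_life: float = DAY, max_age: float = 7 * DAY) -> float:
    """Sum events so that recent ones weigh more (claim 8); events older
    than max_age are discarded (claim 9). Exponential decay is an assumption."""
    total = 0.0
    for ts in timestamps:
        age = now - ts
        if age < 0 or age > max_age:
            continue
        total += 0.5 ** (age / half_life)
    return total


def spam_ratio(hits: List[float], queries: List[float], now: float) -> float:
    """Weighted spamtrap hits relative to weighted queries about the sender.
    The query count stands in for the sender's total email volume (claim 4)."""
    # Floor the denominator at 1.0 (assumed) to avoid division by zero.
    return weighted_count(hits, now) / max(weighted_count(queries, now), 1.0)


def auto_threshold(ratios: Dict[str, float], listed: Set[str],
                   multiplier: float = 10.0, floor: float = 0.05) -> float:
    """Set the threshold from senders not already listed (claim 6)."""
    clean = [r for ip, r in ratios.items() if ip not in listed]
    if not clean:
        return float("inf")  # nothing to tune against: list no one new
    return max(multiplier * median(clean), floor)


def identify_senders(events: Dict[str, Tuple[List[float], List[float]]],
                     now: float, listed: Set[str]) -> Set[str]:
    """Return senders whose ratio exceeds the automatically set threshold."""
    ratios = {ip: spam_ratio(h, q, now) for ip, (h, q) in events.items()}
    threshold = auto_threshold(ratios, listed)
    return {ip for ip, r in ratios.items() if r > threshold}


if __name__ == "__main__":
    now = 1_000_000.0
    events = {
        "192.0.2.10": ([], [now] * 100),          # no spamtrap hits
        "192.0.2.11": ([now], [now] * 100),       # one stray hit
        "192.0.2.12": ([now] * 50, [now] * 10),   # mostly spamtrap traffic
    }
    print(identify_senders(events, now, listed=set()))
```

Because the threshold is recomputed from current traffic on each call, repeated invocation gives one possible form of the self-tuning behavior; a median is used in this sketch so that a few heavy offenders do not inflate the baseline.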
PCT/US2005/032819 2004-09-16 2005-09-16 Self-tuning statistical method and system for blocking spam WO2006033936A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/942,112 US8312085B2 (en) 2004-09-16 2004-09-16 Self-tuning statistical method and system for blocking spam
US10/942,112 2004-09-16

Publications (2)

Publication Number Publication Date
WO2006033936A2 (en) 2006-03-30
WO2006033936A3 (en) 2008-01-10

Family

ID=36090482

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/032819 WO2006033936A2 (en) 2004-09-16 2005-09-16 Self-tuning statistical method and system for blocking spam

Country Status (2)

Country Link
US (1) US8312085B2 (en)
WO (1) WO2006033936A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2452555A (en) * 2007-09-07 2009-03-11 Toshiba Res Europ Ltd Identification of insecure network nodes, such as spammers, using decoy addresses

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7594230B2 (en) 2001-06-11 2009-09-22 Microsoft Corporation Web server architecture
US7159025B2 (en) * 2002-03-22 2007-01-02 Microsoft Corporation System for selectively caching content data in a server based on gathered information and type of memory in the server
US8073910B2 (en) * 2005-03-03 2011-12-06 Iconix, Inc. User interface for email inbox to call attention differently to different classes of email
US20070162394A1 (en) 2004-02-12 2007-07-12 Iconix, Inc. Rapid identification of message authentication
US20070107053A1 (en) * 2004-05-02 2007-05-10 Markmonitor, Inc. Enhanced responses to online fraud
US8041769B2 (en) * 2004-05-02 2011-10-18 Markmonitor Inc. Generating phish messages
US20070299915A1 (en) * 2004-05-02 2007-12-27 Markmonitor, Inc. Customer-based detection of online fraud
US8769671B2 (en) * 2004-05-02 2014-07-01 Markmonitor Inc. Online fraud solution
US7992204B2 (en) * 2004-05-02 2011-08-02 Markmonitor, Inc. Enhanced responses to online fraud
US9203648B2 (en) 2004-05-02 2015-12-01 Thomson Reuters Global Resources Online fraud solution
US7457823B2 (en) * 2004-05-02 2008-11-25 Markmonitor Inc. Methods and systems for analyzing data related to possible online fraud
US7870608B2 (en) * 2004-05-02 2011-01-11 Markmonitor, Inc. Early detection and monitoring of online fraud
US7913302B2 (en) * 2004-05-02 2011-03-22 Markmonitor, Inc. Advanced responses to online fraud
US7418709B2 (en) * 2004-08-31 2008-08-26 Microsoft Corporation URL namespace to support multiple-protocol processing within worker processes
US20060101516A1 (en) * 2004-10-12 2006-05-11 Sushanthan Sudaharan Honeynet farms as an early warning system for production networks
US7711781B2 (en) * 2004-11-09 2010-05-04 International Business Machines Corporation Technique for detecting and blocking unwanted instant messages
US20060168017A1 (en) * 2004-11-30 2006-07-27 Microsoft Corporation Dynamic spam trap accounts
US7610344B2 (en) * 2004-12-13 2009-10-27 Microsoft Corporation Sender reputations for spam prevention
US20060179137A1 (en) * 2005-02-04 2006-08-10 Jennings Raymond B Iii Method and apparatus for reducing spam on a peer-to-peer network
US8312119B2 (en) * 2005-03-01 2012-11-13 Microsoft Corporation IP block activity feedback system
US20070005702A1 (en) * 2005-03-03 2007-01-04 Tokuda Lance A User interface for email inbox to call attention differently to different classes of email
JP4559295B2 (en) * 2005-05-17 2010-10-06 株式会社エヌ・ティ・ティ・ドコモ Data communication system and data communication method
CA2613083A1 (en) * 2005-07-01 2007-01-11 Markmonitor Inc. Enhanced fraud monitoring systems
JP4353933B2 (en) * 2005-10-11 2009-10-28 ソニー・エリクソン・モバイルコミュニケーションズ株式会社 Communication apparatus and computer program
US8601064B1 (en) * 2006-04-28 2013-12-03 Trend Micro Incorporated Techniques for defending an email system against malicious sources
US7603425B2 (en) * 2006-08-07 2009-10-13 Microsoft Corporation Email provider prevention/deterrence of unsolicited messages
US20080147669A1 (en) * 2006-12-14 2008-06-19 Microsoft Corporation Detecting web spam from changes to links of web sites
US9148432B2 (en) * 2010-10-12 2015-09-29 Microsoft Technology Licensing, Llc Range weighted internet protocol address blacklist
US8682990B2 (en) * 2011-10-03 2014-03-25 Microsoft Corporation Identifying first contact unsolicited communications
US8938511B2 (en) * 2012-06-12 2015-01-20 International Business Machines Corporation Method and apparatus for detecting unauthorized bulk forwarding of sensitive data over a network
US20140358939A1 (en) * 2013-05-31 2014-12-04 Emailvision Holdings Limited List hygiene tool
US10108918B2 (en) 2013-09-19 2018-10-23 Acxiom Corporation Method and system for inferring risk of data leakage from third-party tags

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030009698A1 (en) * 2001-05-30 2003-01-09 Cascadezone, Inc. Spam avenger
US6769016B2 (en) * 2001-07-26 2004-07-27 Networks Associates Technology, Inc. Intelligent SPAM detection system using an updateable neural analysis engine
US20040243847A1 (en) * 2003-03-03 2004-12-02 Way Gregory G. Method for rejecting SPAM email and for authenticating source addresses in email servers

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040177120A1 (en) * 2003-03-07 2004-09-09 Kirsch Steven T. Method for filtering e-mail messages
US7257564B2 (en) * 2003-10-03 2007-08-14 Tumbleweed Communications Corp. Dynamic message filtering
US7222158B2 (en) * 2003-12-31 2007-05-22 Aol Llc Third party provided transactional white-listing for filtering electronic communications
US7653695B2 (en) * 2004-02-17 2010-01-26 Ironport Systems, Inc. Collecting, aggregating, and managing information relating to electronic messages
US8918466B2 (en) * 2004-03-09 2014-12-23 Tonny Yu System for email processing and analysis
CN101288060B (en) * 2004-05-25 2012-11-07 波斯蒂尼公司 Electronic message source reputation information system
US7870200B2 (en) * 2004-05-29 2011-01-11 Ironport Systems, Inc. Monitoring the flow of messages received at a server

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2452555A (en) * 2007-09-07 2009-03-11 Toshiba Res Europ Ltd Identification of insecure network nodes, such as spammers, using decoy addresses
GB2452555B (en) * 2007-09-07 2012-05-02 Toshiba Res Europ Ltd Identification of insecure network nodes

Also Published As

Publication number Publication date
US8312085B2 (en) 2012-11-13
US20060075030A1 (en) 2006-04-06
WO2006033936A3 (en) 2008-01-10

Similar Documents

Publication Publication Date Title
US8312085B2 (en) Self-tuning statistical method and system for blocking spam
US9306890B2 (en) System and method for regulating electronic messages
US10699246B2 (en) Probability based whitelist
US7206814B2 (en) Method and system for categorizing and processing e-mails
US7849142B2 (en) Managing connections, messages, and directory harvest attacks at a server
US7366761B2 (en) Method for creating a whitelist for processing e-mails
US7472163B1 (en) Bulk message identification
US7653695B2 (en) Collecting, aggregating, and managing information relating to electronic messages
EP1476819B1 (en) E-mail management services
US8554847B2 (en) Anti-spam profile clustering based on user behavior
US20050080857A1 (en) Method and system for categorizing and processing e-mails
US20050091319A1 (en) Database for receiving, storing and compiling information about email messages
US20050091320A1 (en) Method and system for categorizing and processing e-mails
US20060010215A1 (en) Managing connections and messages at a server by associating different actions for both different senders and different recipients
US20050198159A1 (en) Method and system for categorizing and processing e-mails based upon information in the message header and SMTP session
US20120271890A1 (en) Systems And Methods For Classification Of Messaging Entities
US20100100957A1 (en) Method And Apparatus For Controlling Unsolicited Messages In A Messaging Network Using An Authoritative Domain Name Server
US20070180031A1 (en) Email Opt-out Enforcement
US7620691B1 (en) Filtering electronic messages while permitting delivery of solicited electronics messages
CN101247406A (en) Method for local information classification using global information and junk mail detection system
WO2004081734A2 (en) Method for filtering e-mail messages

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase