US20080162449A1 - Dynamic page similarity measurement - Google Patents

Dynamic page similarity measurement

Info

Publication number
US20080162449A1
US20080162449A1 (application US11/617,654)
Authority
US
United States
Prior art keywords
web page
component
similarity
components
given
Prior art date
Legal status
Abandoned
Application number
US11/617,654
Inventor
Chen Chao-Yu
Pu Peng-Shih
Tsai Yu-Fang
Current Assignee
Trend Micro Inc
Original Assignee
Trend Micro Inc
Priority date
Filing date
Publication date
Application filed by Trend Micro Inc filed Critical Trend Micro Inc
Priority to US11/617,654
Assigned to TREND MICRO INCORPORATED (assignors: CHEN Chao-Yu, PU Peng-Shih, TSAI Yu-Fang)
Publication of US20080162449A1
Priority to US16/548,269 (published as US11042630B2)
Status: Abandoned

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/51: Monitoring at application loading time, e.g. accepting, rejecting, starting or inhibiting executable software based on integrity or source reliability
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/23: Updating
    • G06F 16/235: Update request formulation
    • G06F 16/24: Querying
    • G06F 16/248: Presentation of query results
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/14: Detecting or protecting against malicious traffic
    • H04L 63/1441: Countermeasures against malicious traffic
    • H04L 63/1483: Countermeasures against service impersonation, e.g. phishing, pharming or web spoofing
    • G06F 2221/00: Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/21: Indexing scheme relating to G06F21/00 and subgroups
    • G06F 2221/2119: Authenticating web pages, e.g. with suspicious links

Definitions

  • Each likely target web page is associated with a set of scoring rules (which may comprise one or more scoring rules) for scoring features of that target web page if those same features are found on the suspect web page.
  • Each web page may be thought of as a combination of features. These features may include visible characteristics or attributes, such as the color, location, and size of its images or textual information.
  • These features may also include background characteristics or attributes that are not necessarily visible to a user. For example, some portion of many web pages may be formed using code that is largely invisible to the user but nevertheless contributes to the transmission, generation, and/or operation of the web page. Examples of these features include the URL strings specifying the destination for the user-input transaction information, HTML strings or other code to perform computations, etc.
  • For example, the login page of XYZ bank may be associated with a set of scoring rules that gives a high score to a nearly invisible security feature while giving a lower score to an obvious feature, such as a prominently displayed logo. This is because, for example, it may have been judged that a phisher would be less likely to duplicate a nearly invisible and easily overlooked feature than to copy a highly visible logo.
  • As another example, a set of scoring rules for the login page for XYZ bank may give a particular score to a particular field of content, including for example the domain/port/query/string of a URL and/or the HTML/text string of a URL.
  • Generally speaking, any feature may be associated with a score, if desired, and the particular score associated with a feature may vary and may even be arbitrary.
  • For example, the rule creator may decide that a particular misspelling is intentional, or that a particular background characteristic that can be easily overlooked is intentional, and the absence of that feature in a suspect web page may indicate that the suspect web page is not similar to the target web page at issue.
  • During similarity testing, the set of scoring rules associated with the login page for XYZ bank would be employed for scoring features found in the suspect web page. In this manner, if the suspect web page has a large number of features in common with the login page for XYZ bank and/or has in common certain high-scoring features, the suspect web page may earn a sufficiently high aggregate score to be deemed similar to the login page for XYZ bank.
  • The threshold for deciding whether an aggregate score earned by a suspect web page, when that suspect web page is compared against the login page for XYZ bank, indicates similarity may be implemented in the set of scoring rules for the login page of XYZ bank, for example. As with the determination of how many points a particular feature may be worth, the determination of the particular threshold value for deeming a suspect web page similar may be made empirically by a human or by automated software.
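The per-target scoring scheme described above can be sketched as follows. This is an illustrative sketch only: the feature names, point values, and threshold are invented for the example, and absent features here simply contribute nothing, a special case of the "second score" mentioned in the summary.

```python
# Each likely target page (here, a hypothetical XYZ bank login page)
# carries its own scoring rules: per-feature scores plus a threshold.
xyz_login_rules = {
    "feature_scores": {
        "hidden_security_comment": 40,  # nearly invisible feature: high score
        "xyz_logo_image": 5,            # easily copied visible logo: low score
        "login_form_field_names": 20,
    },
    "similarity_threshold": 50,
}

def composite_score(suspect_features, rules):
    """Sum the scores of the target's features that also appear in the suspect page."""
    return sum(score
               for feature, score in rules["feature_scores"].items()
               if feature in suspect_features)

def is_similar(suspect_features, rules):
    """Deem the suspect page similar when the aggregate score reaches the threshold."""
    return composite_score(suspect_features, rules) >= rules["similarity_threshold"]
```

Under these invented values, a suspect page sharing the hidden comment and the form field names would score 60 and be deemed similar, while one sharing only the logo would score 5 and would not.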
  • For each potential target web page (e.g., an Acme Store credit card entry page), a set of scoring rules is created, and that set of scoring rules is employed to generate a score for a suspect web page when that suspect web page is compared against the Acme Store credit card entry page.
  • Likewise, the similarity threshold value to determine whether a suspect web page is similar to the Acme Store credit card entry page is implemented by the set of scoring rules associated with the Acme Store credit card entry page.
  • In the same manner, when a suspect web page is compared against another potential target web page (e.g., an ABC Bank personal information authentication page), the set of scoring rules associated with that potential target web page is employed, and the similarity threshold value to determine whether the suspect web page is similar to the ABC Bank personal information authentication page is implemented by the set of scoring rules associated with that page.
  • The score associated with each feature and/or the similarity threshold in the set of scoring rules for a particular web page may be continually refined and updated each time a false positive or a false negative (i.e., an erroneous identification of similarity or of dissimilarity) occurs.
  • If a false positive occurs, the scoring rules may be revised and/or the similarity threshold in the set of scoring rules for that particular web page may be revised upward so that only suspect web pages that have a large number of features in common, or a sufficient number of high-scoring features in common, would be judged to be similar.
  • If a false negative occurs, the scoring rules may be revised and/or the similarity threshold in the set of scoring rules for that particular web page may be revised downward so that web pages that are truly similar may be judged to be similar by the set of scoring rules for that particular web page. Since the set of scoring rules is associated with the legitimate web page, continually improving the scoring rules results in increasingly accurate similarity identification as more suspect web pages are tested against the legitimate web page.
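The refinement loop above might be sketched as a simple threshold nudge; the step size, outcome labels, and field name are assumptions for illustration, not the patent's implementation.

```python
def refine_threshold(rules, outcome, step=5):
    """Adjust a target page's similarity threshold after a misidentification.

    A false positive (a dissimilar page judged similar) raises the bar;
    a false negative (a truly similar page missed) lowers it.
    """
    if outcome == "false_positive":
        rules["similarity_threshold"] += step
    elif outcome == "false_negative":
        rules["similarity_threshold"] -= step
    return rules["similarity_threshold"]

rules = {"similarity_threshold": 50}
after_fp = refine_threshold(rules, "false_positive")  # threshold rises to 55
after_fn = refine_threshold(rules, "false_negative")  # threshold falls back to 50
```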
  • If desired, fuzzy logic or artificial intelligence may be employed to render the comparison process more efficient and/or accurate.
  • For example, regular expressions for textual features may be employed in the evaluation of features and can achieve good accuracy.
  • As is known, a regular expression is a string that describes or matches a set of strings according to certain syntax rules. Regular expressions are known to those skilled in the art and will not be explained in detail herein. Using regular expressions in the creation of the set of scoring rules and in the scoring rules themselves increases the flexibility with which features in the suspect web pages may be identified and scored.
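For instance, a scoring rule might match a textual feature with a regular expression rather than a literal string, so that minor variations in the suspect page still match. The pattern and the domain below are invented for illustration.

```python
import re

# Match a form-action URL pointing at a hypothetical bank's login script,
# tolerating http vs. https and varying path segments.
login_action_rule = re.compile(
    r'action="https?://www\.xyzbank\.example/[^"]*login[^"]*"'
)

html = '<form action="https://www.xyzbank.example/us/cgi-bin/login.php">'
matched = login_action_rule.search(html) is not None  # True for this page
```

A literal-string rule would miss the same form served over plain http or from a different path; the regular expression matches all such variants with one rule.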
  • FIG. 2 shows, in accordance with an embodiment of the invention, the high level steps for preparing the set of likely target web pages for similarity comparison.
  • The set of likely target web pages is selected on the basis of website type and web page type.
  • With respect to website type, websites that are popular and/or provide money, goods, or services tend to be sites that are targets for phishers and may thus be chosen, in an embodiment.
  • With respect to web page type, web pages that request transaction information from users tend to be web pages that are targeted by phishers and may thus be chosen, in an embodiment.
  • If desired, both the website type filter and the web page type filter may be employed to select the set of likely target web pages.
  • Additionally or alternatively, a human operator may select and add web pages to the set of likely target web pages if it is believed that those web pages may be phishing targets.
  • Web pages may also be included based on other criteria designed to select web pages deemed likely to be susceptible to phishing attacks.
  • Next, each likely target web page in the set of likely target web pages is processed to generate a set of scoring rules for features in that web page.
  • As mentioned, a feature may represent any attribute or characteristic of a web page, whether or not visually perceptible to a human.
  • A human operator may manually designate the features worthy of scoring and the score associated with each of the web page features.
  • Alternatively or additionally, software may be employed to scan through a web page and/or the code implementing the web page and assign scores to some or all of the features found.
  • Thereafter, each web page and its set of scoring rules are stored ( 206 ) for subsequent use in similarity determination with a suspect web page.
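The preparation steps above might be sketched as follows. The filtering criteria, field names, and the fixed starting threshold are all assumptions made for the example; in practice the scoring function could be a human operator's judgment or automated software.

```python
def prepare_target_set(pages, score_features):
    """Select likely phishing targets by website type and web page type,
    generate a set of scoring rules for each, and store the result."""
    target_db = {}
    for page in pages:
        is_target_site = page["site_type"] in {"financial", "shopping"}
        if is_target_site and page["requests_transaction_info"]:
            target_db[page["url"]] = {
                "feature_scores": score_features(page),  # human- or software-assigned
                "similarity_threshold": 50,              # arbitrary starting value
            }
    return target_db

pages = [
    {"url": "https://bank.example/login", "site_type": "financial",
     "requests_transaction_info": True},
    {"url": "https://news.example/", "site_type": "news",
     "requests_transaction_info": False},  # static page: filtered out
]
target_db = prepare_target_set(pages, lambda page: {"login_form": 20})
```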
  • FIG. 3 shows, in accordance with an embodiment of the present invention, the steps for performing similarity analysis for a suspect web page.
  • First, the suspect web page is received.
  • Thereafter, the suspect web page is compared against each likely target web page in the set of likely target web pages.
  • If desired, web pages in the set of likely target web pages may optionally be re-ordered based on information gleaned from the suspect web page such that those likely target web pages that have a high probability of a similarity match are tested first. For example, if text or an image in the suspect web page suggests that the suspect web page is a login web page for a particular enterprise, likely target login web pages for that particular enterprise may be tested first.
  • During the comparison, the set of scoring rules for the likely target web page currently being tested is employed to score features found in the suspect web page. If the aggregate score exceeds (or equals, in an embodiment) a certain similarity threshold (as determined by step 306 ), that likely target web page is identified as the web page that is similar to the suspect web page ( 308 ). Thereafter, analysis may be performed on the suspect web page to determine whether the suspect web page indeed represents an attempt to perform a phishing attack on the identified similar target web page.
  • If an erroneous similarity determination occurred, changes may be made to the selection of features, the scoring of features, and/or the similarity threshold associated with the set of scoring rules for the target web page that was misidentified as being similar to the suspect web page. If all likely target web pages are exhausted and no similar web page is found, a report is then provided, noting that a similar web page was not found among the set of likely target web pages. In this case, the similarity testing may proceed against additional web pages that were not included in the set of likely target web pages, or the operator may be notified and the method of FIG. 3 may simply end after notification.
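A minimal sketch of this comparison loop follows; the rule-set shape, the URL-substring reordering hint, and all names are illustrative assumptions rather than the patent's implementation.

```python
def find_similar_target(suspect_features, target_db, hint=None):
    """Test the suspect page against each likely target, optionally trying
    targets whose URL contains a hint gleaned from the suspect page first.
    Returns the first target URL whose own threshold is met, else None."""
    ordered = sorted(target_db.items(),
                     key=lambda item: 0 if (hint and hint in item[0]) else 1)
    for url, rules in ordered:
        score = sum(points
                    for feature, points in rules["feature_scores"].items()
                    if feature in suspect_features)
        if score >= rules["similarity_threshold"]:
            return url  # identified as the similar (likely emulated) page
    return None  # report: no similar page among the likely targets

target_db = {
    "https://bank.example/login": {"feature_scores": {"hidden_mark": 60},
                                   "similarity_threshold": 50},
    "https://shop.example/pay": {"feature_scores": {"cart_form": 30},
                                 "similarity_threshold": 50},
}
match = find_similar_target({"hidden_mark"}, target_db, hint="bank")
```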
  • By focusing the comparison on a set of likely target web pages, embodiments of the invention are able to ascertain the identity of the target web page in a highly efficient manner.
  • By focusing on web pages likely to be phishing targets, the set of likely target web pages may be kept small. Since each likely target web page is associated with its own scoring rules, much flexibility is afforded to entities that own those likely target web pages in deciding whether the suspect web page is sufficiently similar. If an erroneous similarity determination is made, changes to the scoring rules and/or the similarity threshold may be made, enabling the similarity determination process to become more accurate over time.

Abstract

A method for determining which web page among multiple candidate web pages is similar to a given web page. For each candidate web page, a set of scoring rules is provided to score the components therein. When the given web page is compared against a candidate web page, each component that is found in both the given web page and the candidate web page under examination is given a score in accordance with the set of scoring rules that is specific to that web page under examination. A composite similarity score is computed for each comparison between the given web page and a candidate web page. If the composite similarity score exceeds a predefined threshold value for a comparison between the given web page and a candidate web page, that candidate web page is deemed the web page that is similar to the given web page.

Description

    BACKGROUND OF THE INVENTION
  • Phishing represents a fraudulent technique employed to obtain confidential transaction information (such as user name, password, financial information, credit card information, etc.) from computer users for misuse. In phishing, the phisher employs a phishing server to send an apparently official electronic communication (such as an official looking email) to the victim. For example, if a phisher wishes to obtain confidential information to access a victim's account at XYZ bank, the email would typically come from an XYZ bank email address and contain official-looking logos and language to deceive the victim into believing that the email is legitimate.
  • Further, the phisher's email typically includes language urging the victim to access the website of XYZ bank in order to verify some information or to confirm some transaction. The email also typically includes a link for use by the victim to supposedly access the website of XYZ bank. However, when the victim clicks on the link included in the email, the victim is taken instead to a sham website set up in advance by the phisher. The sham website, referred to herein as the phishing website, would then ask for confidential information from the victim. Since the victim had been told in advance that the purpose of clicking on the link is to verify some account information or to confirm some transaction, many victims unquestioningly enter the requested information. Once the confidential information is collected by the phisher, the phisher can subsequently employ the information to perpetrate fraud on the victim by stealing money from the victim's account, by purchasing goods using the account funds, etc.
  • FIG. 1 illustrates an example of a phishing attack. In FIG. 1, a phisher 102 (typically an email server that is under control of a human phisher) sends an official-looking email 104 designed to convince a recipient 108 that the email is sent by a legitimate business, such as by bank 106. The email may, for example, attempt to convince the recipient 108 to update his account by clicking on an attached link to access a web page. If the recipient 108 clicks on the link, the web page that opens would then request the user to enter the user's confidential information such as userid, password, account number, etc.
  • However, since the web page did not come from the legitimate business 106, the user's confidential information is sent (110) to a phishing website 112. Phishing website 112 then collects the user's confidential information to allow the phisher to perpetrate fraud on the user.
  • Because phishers actually divert the victim to a website other than the website of the legitimate business that the victim intended to visit, some knowledgeable users may be able to spot the difference in the website domain names and may become alert to the possibility that a phishing attack is being attempted. For example, if a victim is taken to a website whose domain name “http://218.246.224.203/icons/cgi-bin/xyzbank/login.php” appears in the browser's URL address bar, that victim may notice that the phisher's website URL address as shown on the browser's URL toolbar is different from the usual “http://www.xyzbank.com/us/cgi-bin/login.php” and may refuse to furnish the confidential information out of suspicion. However, it is known that many users are not sophisticated or are not always vigilant against phishing attempts. Accordingly, relying on users to stay on guard against phishing attempts has proven to be an inadequate response to the phishing problem.
  • SUMMARY OF INVENTION
  • The invention relates, in an embodiment, to a computer-implemented method for ascertaining which web page among a plurality of candidate web pages is similar to a given web page. The method includes extracting a set of web page components from the given web page. The method also includes comparing the given web page against each of the plurality of candidate web pages in turn. The comparing results in a composite similarity score for the set of web page components. The composite similarity score is computed from scores assigned to individual ones of the set of web page components in accordance with a set of scoring rules associated with the web page that is under examination for similarity, wherein a web page component of the set of web page components is associated with a first score if the web page component also exists in the web page that is under examination for similarity. The web page component of the set of web page components is associated with a second score, different from the first score, if the web page component does not exist in the web page that is under examination for similarity. If the composite similarity score exceeds a predefined threshold, the method also includes designating the given web page similar to the web page that is under examination for similarity.
  • These and other features of the present invention will be described in more detail below in the detailed description of the invention and in conjunction with the following figures.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
  • FIG. 1 illustrates an example of a phishing attack.
  • FIG. 2 shows, in accordance with an embodiment of the invention, the high level steps for preparing the set of likely target web pages for similarity comparison.
  • FIG. 3 shows, in accordance with an embodiment of the present invention, the steps for performing similarity analysis for a suspect web page.
  • DETAILED DESCRIPTION OF EMBODIMENTS
  • The present invention will now be described in detail with reference to a few embodiments thereof as illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process steps and/or structures have not been described in detail in order to not unnecessarily obscure the present invention.
  • Various embodiments are described herein below, including methods and techniques. It should be kept in mind that the invention might also cover articles of manufacture that includes a computer readable medium on which computer-readable instructions for carrying out embodiments of the inventive technique are stored. The computer readable medium may include, for example, semiconductor, magnetic, opto-magnetic, optical, or other forms of computer readable medium for storing computer readable code. Further, the invention may also cover apparatuses for practicing embodiments of the invention. Such apparatus may include circuits, dedicated and/or programmable, to carry out tasks pertaining to embodiments of the invention. Examples of such apparatus include a general-purpose computer and/or a dedicated computing device when appropriately programmed and may include a combination of a computer/computing device and dedicated/programmable circuits adapted for the various tasks pertaining to embodiments of the invention.
  • Since the purpose of a phishing web page is to divert the user input information to a website controlled by the phisher, this fact provides a possible approach to detect whether a particular web page is being used in attempting to commit phishing fraud. If the counterpart legitimate web page can be determined, it is possible then to determine whether the transaction information destination (i.e., the location that the respective web pages specify for user input data to be sent) would be the same for both the legitimate web page and for the suspect web page (e.g., one under investigation to ascertain whether that web page is attempting to commit a phishing fraud). If the transaction information destinations are different for the two web pages, that difference is an indication that a phishing fraud may be underway.
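This destination check might be sketched as follows. The regex-based form extraction, function names, and example URLs are assumptions made for illustration; a real implementation would likely use a proper HTML parser.

```python
import re
from urllib.parse import urlparse

def form_destinations(html):
    """Collect the hosts to which a page's forms submit user-input data."""
    actions = re.findall(r'<form[^>]+action="([^"]+)"', html)
    return {urlparse(action).hostname for action in actions}

def destinations_differ(legitimate_html, suspect_html):
    """Different transaction-information destinations for two otherwise
    similar pages are an indication that a phishing fraud may be underway."""
    return form_destinations(suspect_html) != form_destinations(legitimate_html)

legit = '<form action="https://www.xyzbank.example/us/cgi-bin/login.php">'
suspect = '<form action="http://203.0.113.7/icons/cgi-bin/xyzbank/login.php">'
phishing_suspected = destinations_differ(legit, suspect)  # True: hosts differ
```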
  • The aforementioned approach is operative, however, only if the identity of the counterpart legitimate web page can be ascertained from the suspect web page. Ascertaining whether a given web page is sufficiently similar to a suspect web page that the given web page is likely the counterpart legitimate web page the suspect web page is attempting to emulate is a subject of the present invention.
  • In accordance with embodiments of the present invention, there are provided methods and apparatus for dynamically ascertaining whether a given web page is sufficiently similar to a suspect web page such that the given web page is likely the counterpart legitimate web page that the suspect web page is attempting to emulate. Since there are potentially billions of web pages in existence today, it would be impractical to test a suspect web page against every web page in existence to determine whether they are similar. Even if there is sufficient computing power to do so, the amount of time required to make such a similarity determination would render the technique impractical in use.
  • The inventors herein realize, however, that given the scope of the phishing problem, the set of web pages to be tested for similarity against a suspect web page is substantially smaller and more manageable than the set of all available web pages. It is reasoned that the majority of phishing attempts will be focused on a few types of web pages, such as those that collect transaction information from the user. Accordingly, web pages that merely implement static presentations of data do not present the same degree of phishing risk as a web page that collects, for example, the user's login data, the user's financial data, or any of the user's personal, financial, and/or confidential data.
  • Furthermore, it is reasoned that the majority of phishing attempts would also be focused on certain known types of websites. For example, the large majority of phishing attempts will be motivated by financial fraud, and thus the target websites are likely to be found among financial institution sites (such as banks, on-line trading accounts, and online payment accounts), shopping sites (such as sites that allow the user to purchase goods and have the goods shipped to a particular address upon entering the user's financial and/or login data), and generally any website that provides goods and/or services upon the user's presentation of authenticating and/or financial/personal data.
  • Of these websites, it is reasoned that a large majority of phishing attempts will again be focused on those that are most popular, since the user whom the phisher is attempting to deceive would more likely have an account at a popular online store than at a relatively obscure one. By progressively narrowing down the set of possible target websites and web pages, the number of web pages to be tested for similarity against a suspect phishing web page can be kept manageably small for computational purposes. Even by focusing only on the top dozens or hundreds of target websites and web pages (which may be identified by performing a study of past phishing attempts, for example), it is possible to provide a heightened level of protection against phishing via the ability to identify the target web page in a large majority of cases, and to determine whether the transaction information destinations are the same.
  • The inventors herein also provide techniques to efficiently test a particular potential target web page for similarity with a suspect web page. In accordance with an embodiment of the invention, each likely target web page is associated with a set of scoring rules (which may comprise one or more scoring rules) for scoring features of that target web page if those same features are found on the suspect web page.
  • To elaborate, each web page may be thought of as a combination of features. These features may include visible characteristics or attributes, such as the color, location, and size of its images or textual information. These features may also include background characteristics or attributes that are not necessarily visible to a user. For example, some portion of many web pages may be formed using code that is largely invisible to the user but nevertheless contributes to the transmission, generation, and/or operation of the web page. Examples of these features include the URL strings specifying the destination for the user-input transaction information, HTML strings or other code that performs computations, etc.
  • Since the set of likely target web pages are limited in number given the scope of the phishing problem, it is possible to manually (i.e., performed by a human) or automatically (i.e., performed in an automated manner using software) generate rules for scoring features of a particular target web page.
  • For example, the login page of XYZ bank may be associated with a set of scoring rules that gives a high score for a nearly invisible security feature while giving a lower score for an obvious feature, such as a prominently displayed logo. This is because, for example, it may have been judged that a phisher would be less likely to duplicate a nearly invisible and easily overlooked feature than to copy a highly visible logo. As another example, such a set of scoring rules for the login page for XYZ bank may give a particular score for a particular field of content, including for example the domain/port/query/string of a URL and/or the HTML/text string of a URL.
  • Generally speaking, any feature may be associated with a score, if desired, and the particular score associated with a feature may vary and may even be arbitrary. For example, the rule creator may arbitrarily decide that a particular misspelling is intentional, or that a particular background characteristic that can be easily overlooked is intentional, and the absence of that feature in a suspect web page may indicate that the suspect web page is not similar to the target web page at issue.
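  For illustration only (the patent does not prescribe a data format; the feature names and point values below are hypothetical), such a set of scoring rules might be represented as a simple mapping from feature identifiers to point values, with a subtle, easily overlooked feature weighted above a prominent one:

```python
# Hypothetical scoring rules for the login page of "XYZ bank".
# Point values are arbitrary, as the text notes; here a nearly
# invisible feature outweighs the prominently displayed logo.
xyz_login_rules = {
    "features": {
        "hidden_security_comment": 50,  # nearly invisible, rarely copied by phishers
        "form_action_domain": 30,       # domain portion of the form's action URL
        "prominent_logo_url": 10,       # obvious, easily copied feature
    },
    "similarity_threshold": 60,         # threshold carried with the rules (see below)
}

# A suspect web page can then be represented by the set of
# target-page features it is found to exhibit.
suspect_features = {"hidden_security_comment", "prominent_logo_url"}
```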
  • Thus, when a suspect web page is compared against the login page for XYZ bank for the purpose of determining whether the suspect web page and the login page for XYZ bank are similar, the set of scoring rules associated with the login page for XYZ bank would be employed for scoring features found in the suspect web page. In this manner, if the suspect web page has a large number of features in common with the login page for XYZ bank and/or has in common certain high-scoring features, the suspect web page may earn a sufficiently high aggregate score to be deemed similar to the login page for XYZ bank.
  • The threshold for deciding whether an aggregate score earned by a suspect web page, when that suspect web page is compared against the login page for XYZ bank, indicates similarity may be implemented in the set of scoring rules for the login page of XYZ bank, for example. As with the determination of how many points a particular feature may be worth, the determination of the particular threshold value for deeming a suspect web page similar may be made empirically by a human or by automated software.
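  The aggregate-scoring step described above can be sketched as follows. This is a minimal illustration under assumptions: the rule format, feature names, and function names are not taken from the patent.

```python
def aggregate_score(rules, suspect_features):
    """Sum the scores of the target page's features that are also
    found in the suspect page, per the target's set of scoring rules."""
    return sum(points
               for feature, points in rules["features"].items()
               if feature in suspect_features)

def is_similar(rules, suspect_features):
    """Deem the suspect similar if its aggregate score meets or exceeds
    the threshold carried in the same set of scoring rules."""
    return aggregate_score(rules, suspect_features) >= rules["similarity_threshold"]

rules = {
    "features": {"hidden_security_comment": 50,
                 "form_action_domain": 30,
                 "prominent_logo_url": 10},
    "similarity_threshold": 60,
}

is_similar(rules, {"hidden_security_comment", "prominent_logo_url"})  # 50 + 10 >= 60: similar
is_similar(rules, {"prominent_logo_url"})                             # 10 < 60: not similar
```

  Because the threshold travels with each target page's own rules, different target pages can demand different amounts of evidence before a suspect page is deemed similar.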
  • The point is that each potential target web page (e.g., the Acme Store credit card entry page) is associated with a set of scoring rules for its features, and that set of scoring rules is employed to generate a score for a suspect web page when that suspect web page is compared against the Acme Store credit card entry page. Furthermore, the similarity threshold value used to determine whether a suspect web page is similar to the Acme Store credit card entry page is implemented by the set of scoring rules associated with the Acme Store credit card entry page.
  • When the suspect web page is compared against another potential target web page (e.g., ABC Bank personal information authentication page), the set of scoring rules associated with that potential target web page (e.g., ABC Bank personal information authentication page) would be employed instead to generate the similarity score. Further, the similarity threshold value to determine whether a suspect web page is similar to the ABC Bank personal information authentication page is implemented by the set of scoring rules associated with the ABC Bank personal information authentication page.
  • In this manner, it is possible for each web page or website owner to decide the importance placed on each individual feature of the web page for the purpose of deciding whether another web page is sufficiently similar. In an embodiment, the score associated with each feature and/or the similarity threshold in the set of scoring rules for a particular web page may be continually refined and updated each time a "false positive" or an erroneous identification of similarity or dissimilarity occurs. For example, if the similarity threshold is so low that suspect web pages are often misidentified as being similar to a particular web page, the scoring rules may be revised and/or the similarity threshold in the set of scoring rules for that particular web page may be revised upward so that only suspect web pages that have a large number of features in common, or a sufficient number of high-scoring features in common, would be judged to be similar.
  • As another example, if the similarity threshold is so high that no suspect web page is ever identified as being similar to a particular web page even though a suspect web page is the same as that particular web page (i.e., failing to identify that the two web pages are similar), the scoring rules may be revised and/or the similarity threshold in the set of scoring rules for that particular web page may be revised downward so that web pages that are truly similar may be judged to be similar by the set of scoring rules for that particular web page. Since the set of scoring rules is associated with the legitimate web page, the effect of continually improving the scoring rules results in increasingly accurate similarity identification as more suspect web pages are tested against the legitimate web page.
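  A crude sketch of this refinement loop follows. The patent leaves the adjustment policy to a human or to automated software; the fixed step size and the function name here are assumptions for illustration.

```python
def refine_threshold(rules, outcome):
    """Nudge a target page's similarity threshold on feedback.
    outcome: 'false_positive' (dissimilar page judged similar) raises it;
             'false_negative' (truly similar page missed) lowers it."""
    step = 5  # arbitrary adjustment granularity (an assumption)
    if outcome == "false_positive":
        rules["similarity_threshold"] += step
    elif outcome == "false_negative":
        rules["similarity_threshold"] = max(0, rules["similarity_threshold"] - step)
    return rules["similarity_threshold"]

rules = {"similarity_threshold": 60}
refine_threshold(rules, "false_positive")   # threshold raised to 65
refine_threshold(rules, "false_negative")   # threshold lowered back to 60
```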
  • In an embodiment, fuzzy logic or artificial intelligence may be employed to render the comparison process more efficient and/or accurate. In some embodiments, regular expressions for textual features may be employed in the evaluation of features and can achieve good accuracy. In the context of the present application, a regular expression refers to a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are known to those skilled in the art and will not be explained in detail herein. Using regular expressions in the creation of the set of scoring rules and in the scoring rules themselves increases the flexibility with which features in the suspect web pages may be identified and scored.
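  As an illustration of regular-expression-based feature matching (the domain, pattern, and score here are hypothetical, not from the patent), a single textual rule can accept several equivalent encodings of a target URL:

```python
import re

# Hypothetical rule: award 30 points if the page's form action URL points
# at xyzbank.example over HTTP or HTTPS, with or without "www.", any port.
FORM_ACTION_RULE = (re.compile(r"^https?://(www\.)?xyzbank\.example(:\d+)?/"), 30)

def score_text_feature(rule, text):
    """Return the rule's point value if the text matches its pattern, else 0."""
    pattern, points = rule
    return points if pattern.search(text) else 0

score_text_feature(FORM_ACTION_RULE, "https://www.xyzbank.example/login")  # 30
score_text_feature(FORM_ACTION_RULE, "https://attacker.test/collect")      # 0
```

  A single regular expression thus covers URL variations that an exact string comparison would treat as distinct features.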
  • The features and advantages of the invention may be better understood with reference to the figures and discussions that follow. FIG. 2 shows, in accordance with an embodiment of the invention, the high-level steps for preparing the set of likely target web pages for similarity comparison. In step 202, the set of likely target web pages is selected on the basis of website type and web page type. With respect to website type, websites that are popular and/or provide money, goods, or services tend to be sites that are targets for phishers and may thus be chosen, in an embodiment.
  • With respect to web page type, web pages that request transaction information from users (including, for example, login information, any confidential and/or financial transaction information, etc.) tend to be web pages that are targets of phishers and may thus be chosen, in an embodiment. In an embodiment, both the website type filter and the web page type filter may be employed to select the set of likely target web pages. Alternatively or additionally, a human operator may select and add web pages to the set of likely target web pages if it is believed that those web pages may be phishing targets. In these or other embodiments, web pages may also be included based on other criteria designed to select web pages deemed likely to be susceptible to phishing attacks.
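  The two filters of step 202 can be sketched as predicates over a catalog of candidate pages. All field names, site types, and page entries below are assumptions made for illustration:

```python
# Hypothetical website types judged attractive to phishers.
TARGET_SITE_TYPES = {"bank", "online_payment", "shopping"}

def is_likely_target(page):
    """Apply both filters of step 202: the website-type filter and the
    web-page-type filter (does the page collect user input?)."""
    return page["site_type"] in TARGET_SITE_TYPES and page["collects_input"]

catalog = [
    {"name": "XYZ bank login",    "site_type": "bank", "collects_input": True},
    {"name": "XYZ bank brochure", "site_type": "bank", "collects_input": False},
    {"name": "News front page",   "site_type": "news", "collects_input": False},
]

likely_targets = [p for p in catalog if is_likely_target(p)]
# Only the login page survives both filters; a human operator could
# still append further pages to likely_targets manually.
```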
  • In step 204, each likely target web page in the set of likely target web pages is processed to generate a set of scoring rules for features in that web page. As discussed, a feature may represent any attribute or characteristic of a web page, whether or not visually perceptible to a human. In an embodiment, a human operator may manually designate the features worthy of scoring and the score associated with each of the web page features. In another embodiment, software may be employed to scan through a web page and/or the code implementing the web page and assign scores to some or all of the features found.
  • After each web page in the set of likely target web pages is processed, each web page and its set of scoring rules are stored (206) for subsequent use in similarity determination with a suspect web page.
  • FIG. 3 shows, in accordance with an embodiment of the present invention, the steps for performing similarity analysis for a suspect web page. In step 302, the suspect web page is received. In step 304, the suspect web page is compared against each likely target web page in the set of likely target web pages. In an embodiment, web pages in the set of likely target web pages may optionally be re-ordered based on information gleaned from the suspect web page, such that those likely target web pages that have a high probability of a similarity match are tested first. For example, if text or an image in the suspect web page suggests that the suspect web page is a login web page for a particular enterprise, likely target login web pages for that particular enterprise may be tested first.
  • Generally speaking, the set of scoring rules for the likely target web page currently being tested is employed to score features found in the suspect web page. If the aggregate score exceeds (or equals, in an embodiment) a certain similarity threshold (as determined by step 306), that likely target web page is identified as the web page that is similar to the suspect web page (308). Thereafter, analysis may be performed on the suspect web page to determine whether the suspect web page indeed represents an attempt to perform a phishing attack on the identified similar target web page.
  • On the other hand, if the aggregate score is below (or equal to, in another embodiment) the similarity threshold, that likely target web page is not identified as the web page that is similar to the suspect web page (310). Thereafter, comparison of the suspect web page against the likely target web pages continues until similarity is found.
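  Steps 304 through 310 can be sketched as a loop that applies each target page's own scoring rules and stops at the first page whose threshold is met. This is a minimal sketch; the rule format and all names are assumptions, and a real implementation would extract features rather than receive them as a set:

```python
def find_similar_target(suspect_features, targets):
    """Compare the suspect page against each likely target in turn (step 304),
    using that target's own scoring rules and threshold (steps 306-310).
    Returns the first similar target's name, or None if all are exhausted."""
    for name, rules in targets:
        score = sum(points for feature, points in rules["features"].items()
                    if feature in suspect_features)
        if score >= rules["similarity_threshold"]:
            return name      # step 308: similar target web page identified
    return None              # no similar page among the likely targets

targets = [
    ("Acme Store credit card entry page",
     {"features": {"card_form_action": 40, "acme_logo": 10},
      "similarity_threshold": 45}),
    ("ABC Bank personal information authentication page",
     {"features": {"auth_form_action": 40, "abc_watermark": 20},
      "similarity_threshold": 50}),
]

find_similar_target({"auth_form_action", "abc_watermark"}, targets)
# matches the ABC Bank page: 40 + 20 meets that page's threshold of 50
```

  Note how each candidate is judged only by its own rules and its own threshold, which is what lets each page owner tune similarity independently.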
  • In an embodiment, if a subsequent analysis ascertains that the similarity determination result from the steps of FIG. 3 is erroneous, changes may be made to the selection of features, the scoring of features, and/or the similarity threshold associated with the set of scoring rules for the target web page that was misidentified as being similar to the suspect web page. If all likely target web pages are exhausted and no similar web pages are found, a report is then provided, noting that a similar web page is not found among the set of likely target web pages. In this case, the similarity testing may proceed against additional web pages that were not included in the set of likely target web pages, or the operator may be notified and the method of FIG. 3 may simply end after notification. In an embodiment, if more than one target web page is determined to be similar to the suspect web page, no result will be drawn for this suspect web page, and the scoring rules may be revised iteratively to avoid this case. This embodiment is intended to minimize "false positives," as in the case wherein multiple web pages are determined to be similar and the result is thus inconclusive.
  • As can be appreciated from the foregoing, embodiments of the invention are able to ascertain the identity of the target web page in a highly efficient manner. By filtering the available web pages based on likely website types and likely web page types and further in view of the phishing problem to be solved, the set of likely target web pages may be made smaller. Since each likely target web page is associated with its own scoring rules, much flexibility is afforded to entities who own those likely target web pages in deciding whether the suspect web page is sufficiently similar. If an erroneous similarity determination is made, changes to the scoring rules and/or the similarity threshold may be made, enabling the similarity determination process to become more accurate over time.
  • While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents, which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. Additionally, it is intended that the abstract section, having a limit to the number of words that can be provided, be furnished for convenience to the reader and not to be construed as limiting of the claims herein. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.

Claims (21)

1. A computer-implemented method for ascertaining which web page among a plurality of candidate web pages is similar to a given web page, comprising:
extracting a set of web page components from said given web page;
comparing said given web page against each of said plurality of candidate web pages in turn, said comparing resulting in a composite similarity score for said set of web page components, said composite similarity score being computed from scores assigned to individual ones of said set of web page components in accordance with a set of scoring rules associated with said web page that is under examination for similarity, wherein a web page component of said set of web page components is associated with a first score if said web page component also exists in said web page that is under examination for similarity, and said web page component of said set of web page components is associated with a second score different from said first score if said web page component does not exist in said web page that is under examination for similarity; and
if said composite similarity score exceeds a predefined threshold, designating said given web page similar to said web page that is under examination for similarity.
2. The method of claim 1 wherein said set of web page components includes at least a URL string.
3. The method of claim 1 wherein said set of web page components includes an image element.
4. The method of claim 1 wherein said web page component represents text.
5. The method of claim 4 wherein said web page component is tested for similarity using a regular expression.
6. The method of claim 1 wherein said web page component is visible.
7. The method of claim 1 wherein said web page component is invisible.
8. The method of claim 1 wherein said comparing is performed until a similar web page is found.
9. The method of claim 1 further comprising providing a warning indication if multiple web pages of said plurality of web pages are deemed similar to said given web page.
10. A computer-implemented method for designating a given web page similar or dissimilar with respect to a reference web page, comprising:
extracting a set of web page components from said given web page;
computing, using a set of scoring rules associated with said reference web page, a composite similarity score for said set of web page components, said composite similarity score being computed from scores assigned to individual ones of said set of web page components, wherein a web page component of said set of web page components is assigned a first score if said web page component also exists in said reference web page, and said web page component of said set of web page components is assigned a second score different from said first score if said web page component does not exist in said reference web page; and
if said composite similarity score exceeds a predefined threshold, designating said given web page similar to said reference web page.
11. The method of claim 10 wherein said set of web page components includes at least a URL string.
12. The method of claim 10 wherein said set of web page components includes an image element.
13. The method of claim 10 wherein said web page component represents text.
14. The method of claim 13 wherein said web page component is tested for similarity using a regular expression.
15. The method of claim 10 wherein said web page component is visible.
16. The method of claim 10 wherein said web page component is invisible.
17. An article of manufacture comprising a computer storage medium for storing thereon computer readable code for ascertaining which web page among a plurality of candidate web pages is similar to a given web page, comprising:
computer readable code for extracting a set of web page components from said given web page;
computer readable code for comparing said given web page against each of said plurality of candidate web pages in turn, said comparing resulting in a composite similarity score for said set of web page components, said composite similarity score being computed from scores assigned to individual ones of said set of web page components in accordance with a set of scoring rules associated with said web page that is under examination for similarity, wherein a web page component of said set of web page components is associated with a first score if said web page component also exists in said web page that is under examination for similarity, and said web page component of said set of web page components is associated with a second score different from said first score if said web page component does not exist in said web page that is under examination for similarity; and
computer readable code for designating, if said composite similarity score exceeds a predefined threshold, said given web page similar to said web page that is under examination for similarity.
18. The article of manufacture of claim 17 wherein said set of web page components includes at least a URL string.
19. The article of manufacture of claim 17 wherein said set of web page components includes an image element.
20. The article of manufacture of claim 17 wherein said web page component represents text.
21. The article of manufacture of claim 20 wherein said web page component is tested for similarity using a regular expression.
US11/617,654 2006-12-28 2006-12-28 Dynamic page similarity measurement Abandoned US20080162449A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US11/617,654 US20080162449A1 (en) 2006-12-28 2006-12-28 Dynamic page similarity measurement
US16/548,269 US11042630B2 (en) 2006-12-28 2019-08-22 Dynamic page similarity measurement


Publications (1)

Publication Number Publication Date
US20080162449A1 true US20080162449A1 (en) 2008-07-03

Family

ID=39585407



Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5477451A (en) * 1991-07-25 1995-12-19 International Business Machines Corp. Method and system for natural language translation
US5857212A (en) * 1995-07-06 1999-01-05 Sun Microsystems, Inc. System and method for horizontal alignment of tokens in a structural representation program editor
US6138129A (en) * 1997-12-16 2000-10-24 World One Telecom, Ltd. Method and apparatus for providing automated searching and linking of electronic documents
US6266664B1 (en) * 1997-10-01 2001-07-24 Rulespace, Inc. Method for scanning, analyzing and rating digital information content
US20030088643A1 (en) * 2001-06-04 2003-05-08 Shupps Eric A. Method and computer system for isolating and interrelating components of an application
US20040158799A1 (en) * 2003-02-07 2004-08-12 Breuel Thomas M. Information extraction from html documents by structural matching
US20040163043A1 (en) * 2003-02-10 2004-08-19 Kaidara S.A. System method and computer program product for obtaining structured data from text
US20050108630A1 (en) * 2003-11-19 2005-05-19 Wasson Mark D. Extraction of facts from text
US7051368B1 (en) * 1999-11-09 2006-05-23 Microsoft Corporation Methods and systems for screening input strings intended for use by web servers
US20080010683A1 (en) * 2006-07-10 2008-01-10 Baddour Victor L System and method for analyzing web content
US20080046738A1 (en) * 2006-08-04 2008-02-21 Yahoo! Inc. Anti-phishing agent
US20080115214A1 (en) * 2006-11-09 2008-05-15 Rowley Peter A Web page protection against phishing
US20080133540A1 (en) * 2006-12-01 2008-06-05 Websense, Inc. System and method of analyzing web addresses
US7457823B2 (en) * 2004-05-02 2008-11-25 Markmonitor Inc. Methods and systems for analyzing data related to possible online fraud

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004055632A2 (en) * 2002-12-13 2004-07-01 Wholesecurity, Inc. Method, system, and computer program product for security within a global computer network
US7913302B2 (en) * 2004-05-02 2011-03-22 Markmonitor, Inc. Advanced responses to online fraud
US7992204B2 (en) * 2004-05-02 2011-08-02 Markmonitor, Inc. Enhanced responses to online fraud
US9203648B2 (en) * 2004-05-02 2015-12-01 Thomson Reuters Global Resources Online fraud solution
US20070107053A1 (en) * 2004-05-02 2007-05-10 Markmonitor, Inc. Enhanced responses to online fraud
US8041769B2 (en) * 2004-05-02 2011-10-18 Markmonitor Inc. Generating phish messages
US20060080735A1 (en) * 2004-09-30 2006-04-13 Usa Revco, Llc Methods and systems for phishing detection and notification
US7630987B1 (en) * 2004-11-24 2009-12-08 Bank Of America Corporation System and method for detecting phishers by analyzing website referrals
US20060123478A1 (en) * 2004-12-02 2006-06-08 Microsoft Corporation Phishing detection, prevention, and notification
JP2006221242A (en) * 2005-02-08 2006-08-24 Fujitsu Ltd Authentication information fraud prevention system, program, and method
US20060253584A1 (en) * 2005-05-03 2006-11-09 Dixon Christopher J Reputation of an entity associated with a content item
US7590707B2 (en) * 2006-08-07 2009-09-15 Webroot Software, Inc. Method and system for identifying network addresses associated with suspect network destinations
US8578481B2 (en) * 2006-10-16 2013-11-05 Red Hat, Inc. Method and system for determining a probability of entry of a counterfeit domain in a browser
US20080163369A1 (en) * 2006-12-28 2008-07-03 Ming-Tai Allen Chang Dynamic phishing detection methods and apparatus
US20080162449A1 (en) * 2006-12-28 2008-07-03 Chen Chao-Yu Dynamic page similarity measurement
US9521161B2 (en) * 2007-01-16 2016-12-13 International Business Machines Corporation Method and apparatus for detecting computer fraud
US8205255B2 (en) * 2007-05-14 2012-06-19 Cisco Technology, Inc. Anti-content spoofing (ACS)
US9148445B2 (en) * 2008-05-07 2015-09-29 Cyveillance Inc. Method and system for misuse detection
US8850570B1 (en) * 2008-06-30 2014-09-30 Symantec Corporation Filter-based identification of malicious websites
US8448245B2 (en) * 2009-01-17 2013-05-21 Stopthehacker.com, Jaal LLC Automated identification of phishing, phony and malicious web sites
US8943588B1 (en) * 2012-09-20 2015-01-27 Amazon Technologies, Inc. Detecting unauthorized websites
US9621566B2 (en) * 2013-05-31 2017-04-11 Adi Labs Incorporated System and method for detecting phishing webpages

Cited By (88)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8161561B1 (en) * 2004-10-05 2012-04-17 Symantec Corporation Confidential data protection through usage scoping
US10007723B2 (en) 2005-12-23 2018-06-26 Digimarc Corporation Methods for identifying audio or video content
US10735381B2 (en) 2006-08-29 2020-08-04 Attributor Corporation Customized handling of copied content based on owner-specified similarity thresholds
US9842200B1 (en) 2006-08-29 2017-12-12 Attributor Corporation Content monitoring and host compliance evaluation
US9436810B2 (en) 2006-08-29 2016-09-06 Attributor Corporation Determination of copied content, including attribution
US10242415B2 (en) 2006-12-20 2019-03-26 Digimarc Corporation Method and system for determining content treatment
US20200042696A1 (en) * 2006-12-28 2020-02-06 Trend Micro Incorporated Dynamic page similarity measurement
US11042630B2 (en) * 2006-12-28 2021-06-22 Trend Micro Incorporated Dynamic page similarity measurement
US20080172741A1 (en) * 2007-01-16 2008-07-17 International Business Machines Corporation Method and Apparatus for Detecting Computer Fraud
US9521161B2 (en) * 2007-01-16 2016-12-13 International Business Machines Corporation Method and apparatus for detecting computer fraud
US8707459B2 (en) * 2007-01-19 2014-04-22 Digimarc Corporation Determination of originality of content
US20080178302A1 (en) * 2007-01-19 2008-07-24 Attributor Corporation Determination of originality of content
US20150073944A1 (en) * 2008-06-05 2015-03-12 Craze, Inc. Method and system for classification of venue by analyzing data from venue website
US8528079B2 (en) * 2008-08-12 2013-09-03 Yahoo! Inc. System and method for combating phishing
US20100043071A1 (en) * 2008-08-12 2010-02-18 Yahoo! Inc. System and method for combating phishing
US20100095375A1 (en) * 2008-10-14 2010-04-15 Balachander Krishnamurthy Method for locating fraudulent replicas of web sites
US8701185B2 (en) * 2008-10-14 2014-04-15 At&T Intellectual Property I, L.P. Method for locating fraudulent replicas of web sites
US10395018B2 (en) 2010-11-29 2019-08-27 Biocatch Ltd. System, method, and device of detecting identity of a user and authenticating a user
US11425563B2 (en) 2010-11-29 2022-08-23 Biocatch Ltd. Method, device, and system of differentiating between a cyber-attacker and a legitimate user
US20210329030A1 (en) * 2010-11-29 2021-10-21 Biocatch Ltd. Device, System, and Method of Detecting Vishing Attacks
US10949757B2 (en) 2010-11-29 2021-03-16 Biocatch Ltd. System, device, and method of detecting user identity based on motor-control loop model
US20160307201A1 (en) * 2010-11-29 2016-10-20 Biocatch Ltd. Contextual mapping of web-pages, and generation of fraud-relatedness score-values
US10949514B2 (en) 2010-11-29 2021-03-16 Biocatch Ltd. Device, system, and method of differentiating among users based on detection of hardware components
US10917431B2 (en) 2010-11-29 2021-02-09 Biocatch Ltd. System, method, and device of authenticating a user based on selfie image or selfie video
US10897482B2 (en) 2010-11-29 2021-01-19 Biocatch Ltd. Method, device, and system of back-coloring, forward-coloring, and fraud detection
US10834590B2 (en) 2010-11-29 2020-11-10 Biocatch Ltd. Method, device, and system of differentiating between a cyber-attacker and a legitimate user
US11210674B2 (en) 2010-11-29 2021-12-28 Biocatch Ltd. Method, device, and system of detecting mule accounts and accounts used for money laundering
US11223619B2 (en) 2010-11-29 2022-01-11 Biocatch Ltd. Device, system, and method of user authentication based on user-specific characteristics of task performance
US10776476B2 (en) 2010-11-29 2020-09-15 Biocatch Ltd. System, device, and method of visual login
US10032010B2 (en) 2010-11-29 2018-07-24 Biocatch Ltd. System, device, and method of visual login and stochastic cryptography
US10037421B2 (en) 2010-11-29 2018-07-31 Biocatch Ltd. Device, system, and method of three-dimensional spatial user authentication
US10049209B2 (en) 2010-11-29 2018-08-14 Biocatch Ltd. Device, method, and system of differentiating between virtual machine and non-virtualized device
US10055560B2 (en) 2010-11-29 2018-08-21 Biocatch Ltd. Device, method, and system of detecting multiple users accessing the same account
US10069852B2 (en) 2010-11-29 2018-09-04 Biocatch Ltd. Detection of computerized bots and automated cyber-attack modules
US10747305B2 (en) 2010-11-29 2020-08-18 Biocatch Ltd. Method, system, and device of authenticating identity of a user of an electronic device
US10083439B2 (en) 2010-11-29 2018-09-25 Biocatch Ltd. Device, system, and method of differentiating over multiple accounts between legitimate user and cyber-attacker
US11838118B2 (en) * 2010-11-29 2023-12-05 Biocatch Ltd. Device, system, and method of detecting vishing attacks
US10728761B2 (en) 2010-11-29 2020-07-28 Biocatch Ltd. Method, device, and system of detecting a lie of a user who inputs data
US11250435B2 (en) * 2010-11-29 2022-02-15 Biocatch Ltd. Contextual mapping of web-pages, and generation of fraud-relatedness score-values
US10164985B2 (en) 2010-11-29 2018-12-25 Biocatch Ltd. Device, system, and method of recovery and resetting of user authentication factor
US11580553B2 (en) 2010-11-29 2023-02-14 Biocatch Ltd. Method, device, and system of detecting mule accounts and accounts used for money laundering
US10621585B2 (en) * 2010-11-29 2020-04-14 Biocatch Ltd. Contextual mapping of web-pages, and generation of fraud-relatedness score-values
US10262324B2 (en) 2010-11-29 2019-04-16 Biocatch Ltd. System, device, and method of differentiating among users based on user-specific page navigation sequence
US10298614B2 (en) * 2010-11-29 2019-05-21 Biocatch Ltd. System, device, and method of generating and managing behavioral biometric cookies
US10586036B2 (en) 2010-11-29 2020-03-10 Biocatch Ltd. System, device, and method of recovery and resetting of user authentication factor
US11269977B2 (en) 2010-11-29 2022-03-08 Biocatch Ltd. System, apparatus, and method of collecting and processing data in electronic devices
US10404729B2 (en) 2010-11-29 2019-09-03 Biocatch Ltd. Device, method, and system of generating fraud-alerts for cyber-attacks
US10476873B2 (en) 2010-11-29 2019-11-12 Biocatch Ltd. Device, system, and method of password-less user authentication and password-less detection of user identity
US10474815B2 (en) 2010-11-29 2019-11-12 Biocatch Ltd. System, device, and method of detecting malicious automatic script and code injection
US11314849B2 (en) 2010-11-29 2022-04-26 Biocatch Ltd. Method, device, and system of detecting a lie of a user who inputs data
US11330012B2 (en) * 2010-11-29 2022-05-10 Biocatch Ltd. System, method, and device of authenticating a user based on selfie image or selfie video
US9065850B1 (en) * 2011-02-07 2015-06-23 Zscaler, Inc. Phishing detection systems and methods
US20180067833A1 (en) * 2011-05-16 2018-03-08 Intuit Inc. System and method for automated web site information retrieval scripting using untrained users
US9996441B2 (en) * 2011-05-16 2018-06-12 Intuit Inc. System and method for building a script for a web page using an existing script from a similar web page
US8561185B1 (en) * 2011-05-17 2013-10-15 Google Inc. Personally identifiable information detection
US9015802B1 (en) * 2011-05-17 2015-04-21 Google Inc. Personally identifiable information detection
US20120304291A1 (en) * 2011-05-26 2012-11-29 International Business Machines Corporation Rotation of web site content to prevent e-mail spam/phishing attacks
US9148444B2 (en) * 2011-05-26 2015-09-29 International Business Machines Corporation Rotation of web site content to prevent e-mail spam/phishing attacks
US20130204860A1 (en) * 2012-02-03 2013-08-08 TrueMaps LLC Apparatus and Method for Comparing and Statistically Extracting Commonalities and Differences Between Different Websites
WO2013134350A1 (en) * 2012-03-09 2013-09-12 Digilant, Inc. Look-alike website scoring
US8990933B1 (en) * 2012-07-24 2015-03-24 Intuit Inc. Securing networks against spear phishing attacks
US20140068016A1 (en) * 2012-08-28 2014-03-06 Greg Howett System and Method for Web Application Acceleration
US10127132B2 (en) 2013-06-14 2018-11-13 International Business Machines Corporation Optimizing automated interactions with web applications
US10108525B2 (en) 2013-06-14 2018-10-23 International Business Machines Corporation Optimizing automated interactions with web applications
US10929265B2 (en) 2013-06-14 2021-02-23 International Business Machines Corporation Optimizing automated interactions with web applications
US9680910B2 (en) * 2014-01-22 2017-06-13 International Business Machines Corporation Storing information to manipulate focus for a webpage
US20150205808A1 (en) * 2014-01-22 2015-07-23 International Business Machines Corporation Storing information to manipulate focus for a webpage
US10497006B2 (en) * 2014-10-08 2019-12-03 Facebook, Inc. Systems and methods for processing potentially misidentified illegitimate incidents
US9621570B2 (en) 2015-03-05 2017-04-11 AO Kaspersky Lab System and method for selectively evolving phishing detection rules
EP3065367A1 (en) * 2015-03-05 2016-09-07 AO Kaspersky Lab System and method for automated phishing detection rule evolution
US10719765B2 (en) 2015-06-25 2020-07-21 Biocatch Ltd. Conditional behavioral biometrics
US11238349B2 (en) 2015-06-25 2022-02-01 Biocatch Ltd. Conditional behavioural biometrics
US10069837B2 (en) 2015-07-09 2018-09-04 Biocatch Ltd. Detection of proxy server
US10834090B2 (en) 2015-07-09 2020-11-10 Biocatch Ltd. System, device, and method for detection of proxy server
US10523680B2 (en) * 2015-07-09 2019-12-31 Biocatch Ltd. System, device, and method for detecting a proxy server
US11323451B2 (en) 2015-07-09 2022-05-03 Biocatch Ltd. System, device, and method for detection of proxy server
US10097580B2 (en) 2016-04-12 2018-10-09 Microsoft Technology Licensing, Llc Using web search engines to correct domain names used for social engineering
EP3465455A4 (en) * 2016-05-23 2020-01-01 Greathorn, Inc. Computer-implemented methods and systems for identifying visually similar text character strings
US11055395B2 (en) 2016-07-08 2021-07-06 Biocatch Ltd. Step-up authentication
US10198122B2 (en) 2016-09-30 2019-02-05 Biocatch Ltd. System, device, and method of estimating force applied to a touch surface
US10579784B2 (en) 2016-11-02 2020-03-03 Biocatch Ltd. System, device, and method of secure utilization of fingerprints for user authentication
US10685355B2 (en) 2016-12-04 2020-06-16 Biocatch Ltd. Method, device, and system of detecting mule accounts and accounts used for money laundering
US10397262B2 (en) 2017-07-20 2019-08-27 Biocatch Ltd. Device, system, and method of detecting overlay malware
US10805346B2 (en) * 2017-10-01 2020-10-13 Fireeye, Inc. Phishing attack detection
US10970394B2 (en) 2017-11-21 2021-04-06 Biocatch Ltd. System, device, and method of detecting vishing attacks
WO2020086773A1 (en) 2018-10-23 2020-04-30 Cser Tamas Software test case maintenance
EP3871094A4 (en) * 2018-10-23 2022-07-27 Functionize, Inc. Software test case maintenance
US11606353B2 (en) 2021-07-22 2023-03-14 Biocatch Ltd. System, device, and method of generating and utilizing one-time passwords

Also Published As

Publication number Publication date
US20200042696A1 (en) 2020-02-06
US11042630B2 (en) 2021-06-22

Similar Documents

Publication Publication Date Title
US11042630B2 (en) Dynamic page similarity measurement
US10951636B2 (en) Dynamic phishing detection methods and apparatus
Tan et al. PhishWHO: Phishing webpage detection via identity keywords extraction and target domain name finder
JP6068506B2 (en) System and method for dynamic scoring of online fraud detection
US7802298B1 (en) Methods and apparatus for protecting computers against phishing attacks
CN104217160B (en) A kind of Chinese detection method for phishing site and system
US7451487B2 (en) Fraudulent message detection
US20090089859A1 (en) Method and apparatus for detecting phishing attempts solicited by electronic mail
US20140359760A1 (en) System and method for detecting phishing webpages
US20220030029A1 (en) Phishing Protection Methods and Systems
US10341382B2 (en) System and method for filtering electronic messages
Abbasi et al. A comparison of tools for detecting fake websites
Deshpande et al. Detection of phishing websites using Machine Learning
CN116366338B (en) Risk website identification method and device, computer equipment and storage medium
JP4781922B2 (en) Link information verification method, system, apparatus, and program
Razaque et al. Detection of phishing websites using machine learning
Deepa Phishing website detection using novel features and machine learning approach
Nivedha et al. Improving phishing URL detection using fuzzy association mining
WO2021050990A1 (en) Data analytics tool
KR20070067651A (en) Method on prevention of phishing through analysis of the internet site pattern
JP2007133488A (en) Information transmission source verification method and device
Sharathkumar et al. Phishing site detection using machine learning
Oko et al. DEVELOPMENT OF PHISHING SITE DETECTION PLUGIN TO SAFEGUARD ONLINE TRANSCATION SERVICES
US11323476B1 (en) Prevention of credential phishing based upon login behavior analysis
Nandhini et al. Phish Detect-Real Time Phish Detecting Browser Extension

Legal Events

Date Code Title Description
AS Assignment

Owner name: TREND MICRO INCORPORATED, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHAO-YU, CHEN;PENG-SHIH, PU;YU-FANG, TSAI;REEL/FRAME:019115/0008

Effective date: 20061228

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION