WO2000052598A1 - Apparatus and system for classifying and control access to information - Google Patents

Publication number
WO2000052598A1
Authority
WO
WIPO (PCT)
Prior art keywords
param
submodel
exp
product
porn
Prior art date
Application number
PCT/AU2000/000158
Other languages
French (fr)
Inventor
Alan Bradley Jones
David Ross Taylor
Original Assignee
Tel.Net Media Pty. Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tel.Net Media Pty. Ltd.
Priority to CA002363574A (patent CA2363574A1)
Priority to AU28959/00A (patent AU761017B2)
Publication of WO2000052598A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/102 Entity profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/101 Access control lists [ACL]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/30 Profiles
    • H04L67/306 User profiles

Definitions

  • A terminal may be permanently or transitionally defined in relation to a classification.
  • A transitionally defined terminal may move between classifications based on the fitness of the observed traffic to the models of the various classifications.
  • Figures 2 to 8 are tables of selected traffic data used for testing the profile of data during a network interaction with a content server to determine whether it contains media content of a pornographic nature. It is assumed that profiles for content servers contain a variable which is the average size of graphical images served.
  • A normal distribution or similar non-deterministic probability distribution is then used to test the hypothesis that the profile belongs to a population classified as pornographic.
  • The population of the classification may be defined by the population of N(a,b), where N is the image size and a and b are the mean and variance respectively, based on a normal distribution.
  • The average and standard deviation derived from the observed samples are tested against the true population using standard statistical hypothesis methods.
  • This approach may be broadened to encompass analysis of variance methods with multiple dependent variables, to model the characteristics of a site.
  • Traditional ANOVA or regressive techniques may be applied to model the media content.
  • A variety of traditional deterministic and non-deterministic models may be applied to test the hypothesis of profile classification. These may be changed or upgraded continually depending on the level of predictive power found.
  • The models used can include, but are not limited to, simple rules of thumb, deterministic and non-deterministic probability models, or arbitrary calculations. The choice of model is primarily dictated by the predictive power of that model against the population in question.
  • Figures 2 through 8 show examples of the basic data set that can be gathered by observing network traffic of a typical interaction between a client browser and a web server.
  • Figures 9 to 11 illustrate a simple classification model. This model looks at the size, content and relationships of objects being transmitted by a content server.
  • The outcome of this model is to determine whether the media being transmitted has pornographic content.
  • N1(a,b), where N1 is the image size, and a and b are the mean and variance respectively, based on a normal distribution.
  • N2(c,d), where N2 is the ratio of text to graphics, and c and d are the total size of the text and graphic objects respectively.
  • N3(e), where N3 is the count of word patterns matched from a list of pre-determined words, and e is the text of an object.
  • The result shows confidence at the 93% and 87% levels for sites 6 and 7 respectively that the sites belong to a population of pornographic sites.
  • The other samples give much lower confidence levels.
  • In model N2, shown in Figure 10, a simple rule is used to test whether the ratio is below a pre-determined threshold. The results show that sites 2, 4, 6 and 7 are within the threshold rating.
  • In model N3, shown in Figure 11, a simple rule is used to test whether the number of words matching a list of patterns exceeds a pre-determined threshold. The results show that sites 6 and 7 exceed the threshold.
  • A weighting formula is then applied to derive a final result as shown in Figure 12.
  • The apparatus 10 would predict that sites 6 and 7 are probably serving media with pornographic content, whereas sites 1 through 5 probably are not.
  • The attached appendix shows an example of the set of rules, constants and formulas which determine a confidence prediction based on logistic regression.
  • The rules are defined using "Submodel" and "Model" components to define individual data points and aggregated data points. These are then referred to in the "ProbabilityAnalyser" equations, which use standard predictive formulas.
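The three models and the weighting step described above can be sketched as follows. This is a minimal illustration in Python, not the formulas of the appendix: the function names, the use of a two-sided z-test p-value as the N1 "confidence", the thresholds and the weights are all assumptions chosen for the example.

```python
from math import sqrt
from statistics import NormalDist, mean


def n1_confidence(image_sizes, pop_mean, pop_var):
    """Model N1 (assumed form): two-sided p-value that the mean image size
    is consistent with a population distributed as N(pop_mean, pop_var)."""
    if not image_sizes:
        return 0.0
    z = (mean(image_sizes) - pop_mean) / sqrt(pop_var / len(image_sizes))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def n2_rule(text_bytes, graphic_bytes, max_ratio):
    """Model N2: True when the text-to-graphics ratio is below the threshold."""
    return graphic_bytes > 0 and text_bytes / graphic_bytes < max_ratio


def n3_rule(text, patterns, min_matches):
    """Model N3: True when the number of pattern matches exceeds the threshold."""
    lowered = text.lower()
    return sum(lowered.count(p.lower()) for p in patterns) > min_matches


def combined_confidence(n1, n2, n3, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of the three model outputs; the weights are hypothetical."""
    w1, w2, w3 = weights
    return w1 * n1 + w2 * float(n2) + w3 * float(n3)
```

A site would then be flagged when `combined_confidence` exceeds a chosen cut-off; the source does not state the weighting formula or cut-off actually used for Figure 12.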

Abstract

An apparatus (10) is provided for classifying information or content servers on a communications network including the Internet. The apparatus (10) comprises means for obtaining one or more transmission characteristics of information on a path of said communications network and analysing means for predicting a classification of said information based on said one or more transmission characteristics. Typically said one or more transmission characteristics include any one or more of network protocol, date and time stamps, size of transmission activities (text and image), content type of transmission activities, pattern seen within the content of the transmission and any other characteristic that can be employed for predicting classifications. The apparatus (10) can be adapted to classify user profiles in accordance with the predicted classification. A knowledge base of predetermined profiles can be included, and the analysing means is adapted to predict a classification based on a comparison between the profile of information to be classified and the predetermined profiles.

Description

APPARATUS AND SYSTEM FOR CLASSIFYING AND CONTROL ACCESS TO INFORMATION
TECHNICAL FIELD OF THE INVENTION
THIS INVENTION relates to an apparatus and system for classifying information on a communications network and in particular but not limited to an apparatus and system for classifying content servers and for selectively controlling access to classified content servers.
BACKGROUND OF THE INVENTION
The phenomenal growth of information technology has allowed many people to have access to diverse information on communications networks. The Internet in particular allows fetching of information from any cooperating computers or content servers located in different parts of the world by simply clicking references to the information. As the number of accessible computers or content servers and the amount of information over the communications network grow daily, it becomes increasingly difficult to classify them manually.
Known systems for controlling the types of information accessible on a network rely on comparing a requested destination with those on pre-determined Access Control Lists (ACL) or on word matching to determine whether to allow or deny access. This approach can be applied at the client node prior to requesting the information or on any suitably intelligent network device capable of intercepting the request or subsequent reply prior to it reaching the requester. For example, in the case of an Internet browser running on a PC or work station, a request is made for an Internet resource such as a web site. A software program for monitoring such requests on the PC can be configured to scan a pre-determined list of site addresses for a match. If a match is found, access to the site may be denied and a suitable message is then displayed informing the user that access is denied. Alternatively, the request may be allowed to proceed, but as data are received from the site they are scanned for a match with one or more sets of pre-determined words, word fragments or phrases. If a match is found the site is not displayed on the computer and a suitable message is shown instead. Typically, this type of control software is installed on a PC or work station which does not have particularly strict access privileges. The control software can be easily removed, disabled or otherwise circumvented, thereby defeating the control system.
A network device capable of intercepting the request or reply to a request, such as a proxy server, may perform similar actions using the same methods of web site matching. This is usually maintained by a network administrator with strict access rights. Also, a network requiring clients to connect through the network device in order to access the network can have its content control enforced. This allows content control of multiple clients from one central point.
While these known systems do provide some access control abilities, there are several disadvantages. A system based on word or phrase matching can only match text and it therefore would allow access to undesired information comprising graphic images. Also, a single word may match a broad range of sites with quite different classes of information. As an example, when the word "sex" is used to match pornographic sites the system would also block access to other sites providing non offensive information such as articles on biology.
A system based on an access control list of prohibited sites is much more selective. Access can only be denied when attempting to access the sites which are included in the lists. While a suitably large list could bar access to a great deal of undesirable information it is difficult to keep up to date due to the rapid increase in the number of new sites and removal of sites.
The above systems also do not lend themselves to adaptation to other network protocols and services such as interactive chat, streaming video, email or encrypted data streams. Extending to different languages also poses a problem for globalisation of these systems.
OBJECT OF THE INVENTION
An object of the present invention is to alleviate or to reduce to a certain degree one or more of the above disadvantages.
Another object of the present invention is to provide an apparatus/system for classifying user profiles.
SUMMARY OF THE INVENTION
In one aspect therefor the present invention resides in an apparatus for classifying information on a communications network. The apparatus comprises means for obtaining one or more transmission characteristics of information on a path of said communications network, and analysing means for predicting a classification of said information based on said one or more transmission characteristics.
In a second aspect therefor the present invention resides in an apparatus for classifying content servers which are accessible on a communications network. The apparatus comprises means for obtaining one or more transmission characteristics of information provided by any of said content servers on a path of said communications network, and analysing means for predicting a classification of said information based on said one or more transmission characteristics.
In a third aspect therefor the present invention resides in a computer program for classifying information which is accessible on a communications network. The program comprises means for obtaining one or more transmission characteristics of information on a path of said communications network, and analysing means for predicting a classification of said information based on said one or more transmission characteristics.
In a fourth aspect therefor the present invention resides in a computer program for classifying content servers which are accessible on a communications network. The program comprises means for obtaining one or more transmission characteristics of information provided by any of said content servers on a path of said communications network, and analysing means for predicting a classification of said information based on said one or more transmission characteristics.
In a fifth aspect therefor the present invention resides in an apparatus/computer program for classifying user profiles of users accessing information or content servers on a communications network. The apparatus/computer program comprises means for obtaining one or more transmission characteristics of information or information provided by any one of said content servers on a path of said communications network, analysing means for predicting a classification of said information or said one content server based on said one or more transmission characteristics, and means for classifying user profile in accordance with the predicted classification.
The above invention may also comprise means for storing said one or more transmission characteristics.
Typically said one or more transmission characteristics include any one or more of network protocol, date and time stamps, size of transmission activities (text and image), content type of transmission activities, pattern seen within the content of the transmission and any other characteristic that can be employed for predicting classifications.
In preference said one or more transmission characteristics are obtained from network packets or fragments thereof.
It is also preferred that the analysing means includes profiling means for providing profiles of interactions based on said one or more transmission characteristics. Typically said profiling means is arranged to process said one or more transmission characteristics for providing any one or more of frequency of interaction, duration of interaction, duration of absence of interaction, patterns of transmission, average number of http links within an object of related sites, average number of like sites visited within a time frame, and statistics from said other characteristics, for forming interaction profiles. The analysing means can then use the profiles for predicting classifications.
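As an illustration of how a profiling means might reduce captured transmission characteristics to an interaction profile, the following Python sketch computes a few of the statistics named above. The `Observation` record layout, the field names and the 30-second idle threshold are assumptions made for the example and do not come from the specification.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Observation:
    """One captured transfer (hypothetical record layout)."""
    timestamp: float    # seconds since the start of capture
    content_type: str   # e.g. "text/html" or "image/jpeg"
    size: int           # bytes transferred


def build_profile(observations, idle_gap=30.0):
    """Summarise observations into an interaction profile (a plain dict)."""
    obs = sorted(observations, key=lambda o: o.timestamp)
    if not obs:
        return {}
    # Gaps between consecutive transfers; long gaps count as absence of interaction.
    gaps = [b.timestamp - a.timestamp for a, b in zip(obs, obs[1:])]
    duration = obs[-1].timestamp - obs[0].timestamp
    image_sizes = [o.size for o in obs if o.content_type.startswith("image/")]
    text_bytes = sum(o.size for o in obs if o.content_type.startswith("text/"))
    return {
        "transfers": len(obs),
        "duration": duration,
        "frequency": len(obs) / duration if duration > 0 else 0.0,
        "idle_time": sum(g for g in gaps if g > idle_gap),
        "mean_image_size": mean(image_sizes) if image_sizes else 0.0,
        "text_bytes": text_bytes,
        "image_bytes": sum(image_sizes),
    }
```

The resulting dictionary is one possible shape for a profile; further statistics, such as the average number of links per object, could be added in the same way.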
The invention may have a knowledge base of predetermined profiles, and the analysing means is adapted to predict a classification based on a comparison between the profile of information to be classified and predetermined profiles. Advantageously the invention may have means for updating the knowledge base so that the classification prediction may be enhanced following classifications.
In order that the present invention can be more readily understood and be put into practical effect reference will now be made to the accompanying drawings which illustrate one preferred embodiment of the invention and wherein:
BRIEF DESCRIPTION OF THE DRAWING
Figure 1 is a schematic diagram of the apparatus according to the invention;
Figure 2 is a table of selected data of captured packets of a search engine using the apparatus shown in Figure 1;
Figure 3 is a partial table of selected data of captured packets of a news web site using the apparatus shown in Figure 1;
Figure 4 is a table of selected data of captured packets of an entertainment web site using the apparatus shown in Figure 1;
Figure 5 is a table of selected data of captured packets of the web site of an e-commerce merchant using the apparatus shown in Figure 1;
Figure 6 is a table of selected data of captured packets of the web site of another e-commerce merchant using the apparatus shown in Figure 1;
Figure 7 is a table of selected data of captured packets of a pornography web site using the apparatus shown in Figure 1;
Figure 8 is a table of selected data of captured packets of another pornography web site using the apparatus shown in Figure 1;
Figure 9 is a table of model N1 results using the apparatus shown in Figure 1;
Figure 10 is a table of model N2 results using the apparatus shown in Figure 1;
Figure 11 is a table of model N3 results using the apparatus shown in Figure 1; and
Figure 12 is a table of classification prediction confidence levels using the apparatus shown in Figure 1.
DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring initially to Figure 1 there is shown an apparatus 10 for classifying media or information flowing through a path of a communications network which in this case is the Internet.
As can be seen, network traffic passing through the apparatus 10 is captured and analysed for providing statistics relating to interactions between two or more terminals (not shown). The captured traffic is first checked against a list of predetermined classifications to determine if it is known or unknown. When the captured traffic is of an unknown classification, various models (to be described more fully below) are applied to the data set in the captured traffic in order to predict the content classification. The models use parameters derived from a knowledge base of previously classified data sets and fitness with these parameters to determine the classification of the content of the newly captured traffic. Thus, the web site sending the captured traffic is now classified and is added to the list of known classifications.
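The flow just described (look up the known list, otherwise apply the models and record the result) can be sketched in Python as below. The function names, the dictionary-based known list and knowledge base, and the toy nearest-mean predictor are assumptions for illustration only; the specification does not prescribe these structures.

```python
def classify_traffic(site, data_set, known, knowledge_base, predict):
    """Return the classification of `site`, predicting it only when unknown.

    known          : dict mapping site -> classification (the known list)
    knowledge_base : dict mapping classification -> list of stored data sets
    predict        : callable(data_set, knowledge_base) -> classification
    """
    if site in known:
        # Already classified: no modelling is needed.
        return known[site]
    label = predict(data_set, knowledge_base)
    known[site] = label
    # Keep the newly classified data set so later predictions can draw on it.
    knowledge_base.setdefault(label, []).append(data_set)
    return label


def nearest_mean_predict(data_set, knowledge_base):
    """Toy predictor: choose the class whose stored values have the closest mean."""
    value = sum(data_set) / len(data_set)

    def class_mean(samples):
        flat = [v for sample in samples for v in sample]
        return sum(flat) / len(flat)

    return min(knowledge_base,
               key=lambda c: abs(class_mean(knowledge_base[c]) - value))
```

Passing the predictor in as an argument keeps the lookup logic independent of whichever models are in use.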
It should be noted that the embodiment of the apparatus 10 as described herein is for an analysis of transmission traffic using the HTTP protocol. The apparatus 10 according to the present invention is not restricted to HTTP, and is easily adaptable to analyse data carried within any networks using any known protocol. Examples of the protocols include FTP, SMTP, NNTP, etc.
Following classification the captured data set is stored in the knowledge base. As the knowledge base expands, more data are used for the model parameters. This refines the apparatus and results in improved predictive performance.
The sites that are deemed to include undesirable information are added to Access Control Lists (ACLs). The ACLs are used to control the flow of content information between terminals. For example, undesired content information can be prevented from travelling further through the network by simply not forwarding it, or by replacing it, or by intercepting the request for such content information and modifying its destination.
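The three control options named above (not forwarding, replacing, or modifying the destination) could be modelled as in the following Python sketch. The `Action` names, the notice text, the redirect host and the returned tuple layout are hypothetical and chosen only to make the example concrete.

```python
from enum import Enum


class Action(Enum):
    DROP = "drop"          # do not forward the content
    REPLACE = "replace"    # forward a substitute message instead
    REDIRECT = "redirect"  # rewrite the destination of the request


NOTICE = "Access to this content is not permitted."  # hypothetical message
REDIRECT_HOST = "blocked.example"                     # hypothetical host


def enforce(host, body, acl, action=Action.DROP):
    """Decide what the network device does with content from `host`.

    Returns a (decision, host, body) tuple; `acl` is a set of listed hosts.
    """
    if host not in acl:
        return ("forward", host, body)
    if action is Action.DROP:
        return ("drop", host, None)
    if action is Action.REPLACE:
        return ("forward", host, NOTICE)
    # REDIRECT: the request is re-issued to a different destination.
    return ("refetch", REDIRECT_HOST, None)
```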
Classification of traffic from content servers are relatively static. On the other hand, user terminals that interact with these content servers are variable and their classifications are considered transient classifications.
Whereas classifications of content servers form a model of the style of content residing on the server, transient classifications form a model of the style of content being viewed by a user terminal, or content consumer. This in effect forms a behaviour profile of such a consumer. This profile can be used to tailor the content information to suit the consumer.
As mentioned earlier, the apparatus 10 captures a set of observed data relating to a network interaction event, and provides a set of results indicating the classification of a resource or personality residing at each network node involved in the interaction. This is accomplished by applying various statistical models to a profile, and testing this against results obtained from profiles of known classifications. In this example of the invention this process is represented by the following formulas:
x is an unknown profile to be classified;
Profiles p1, p2, p3 ... pn are of known classifications;
Models M1, M2, M3 ... Mn are available to operate on these profiles; and
C1, C2, C3 ... Cn are profile classifications.
The population of a profile of classification C1 may be defined by the population of M1(p). M1(x) may be tested against the true population using any of the standard statistical hypothesis methods. A pre-determined set of media terminals of a classification is modelled by various models M1, M2 ... Mn. Each model consists of an approach and a set of parameters (for example, linear regression with its gradient and point of interception), so that for a single classification M1(p1, p2 ... pn), M2(q1, q2 ... qn) ... Mn(r1, r2 ... rn) are used to model the population from the classification. The models may be based on mathematical structures or arbitrary rules.
The models are continually refined as more network traffic passes through the apparatus 10, thereby increasing the population space from which the classifications are computed.
A terminal may be permanently or transitionally defined in relation to a classification. A transitionally defined terminal may move between classifications based on the fitness of the observed traffic to the models of the various classifications.
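As a rough illustration of a transitionally defined terminal, the sketch below re-selects the best-fitting classification as observations accumulate. The classification labels, the per-classification parameters and the fitness measure (distance of the sample mean from the model mean, in standard deviations) are all assumptions chosen for the example; the specification leaves the fitness measure open.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical parameters: (mean, standard deviation) of one observed feature
# for each classification. Neither the labels nor the numbers come from the source.
MODELS: Dict[str, Tuple[float, float]] = {
    "news": (20.0, 5.0),
    "sport": (60.0, 10.0),
}


def fitness(observed: List[float], params: Tuple[float, float]) -> float:
    """Higher is better: negated distance of the sample mean, in standard deviations."""
    mu, sigma = params
    return -abs(mean(observed) - mu) / sigma


def current_classification(observed: List[float]) -> str:
    """Pick the classification whose model best fits the observed traffic."""
    return max(MODELS, key=lambda label: fitness(observed, MODELS[label]))


window = [22.0, 18.0, 25.0]
print(current_classification(window))   # classification from early traffic
window += [70.0, 65.0, 72.0, 68.0]
print(current_classification(window))   # classification after the traffic changes
```

A permanently defined terminal would simply skip the re-selection step and keep its assigned label.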
Figures 2 to 8 are tables of selected traffic data for testing the profile of data during a network interaction with a content server to determine if it contains media content of a pornographic nature. It is assumed that profiles for content servers contain a variable which is the average size of graphical images served.
A normal distribution or similar non-deterministic probability distribution is then used to test the hypothesis that the profile belongs to a population classified as pornographic. In this example, the population of the classification may be defined by the population of N(a,b), where N is the image size and a and b are the mean and variance respectively, based on a normal distribution. The average and standard deviation derived from the observed samples are tested against the true population using standard statistical hypothesis methods. In some cases this approach may be broadened to encompass analysis of variance methods with multiple dependent variables, to model the characteristics of a site. Traditional ANOVA or regression techniques may be applied to model the media content.
A variety of traditional deterministic and non-deterministic models may be applied to test the hypothesis of profile classification. These may be changed or upgraded continually depending on the level of predictive power found. The models used can include, but are not limited to, simple rules of thumb, deterministic and non-deterministic probability models, or arbitrary calculations. The choice of model is primarily dictated by the predictive power of that model against the population in question.
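One standard way to test observed image sizes against a population N(a,b) is a one-sample, two-sided z-test on the sample mean. The sketch below assumes that choice; the specification does not say which hypothesis test is used, and the function name and sample values are illustrative only.

```python
import math
from statistics import mean
from typing import Sequence


def z_test_p_value(sample: Sequence[float], pop_mean: float, pop_variance: float) -> float:
    """Two-sided p-value for the sample mean under a N(pop_mean, pop_variance) population."""
    n = len(sample)
    standard_error = math.sqrt(pop_variance / n)
    z = (mean(sample) - pop_mean) / standard_error
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Illustrative numbers only (image sizes in kilobytes); not taken from Figures 2 to 8.
population_mean, population_variance = 45.0, 100.0
close_sample = [44.0, 47.0, 43.0, 46.0]
far_sample = [5.0, 8.0, 6.0, 7.0]

print(z_test_p_value(close_sample, population_mean, population_variance))
print(z_test_p_value(far_sample, population_mean, population_variance))
```

A large p-value means the sample is consistent with the population; how such a value is mapped to the confidence levels reported for the figures is not stated in the source.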
Figures 2 through 8 show examples of a basic data set that can be gathered by observing network traffic of a typical interaction between a client browser and a web server. Figures 9 to 11 illustrate a simple classification model. This model looks at the size, content and relationships of objects being transmitted by a content server.
The outcome of this model is to determine if the media being transmitted has pornographic content.
Classification: pornographic
Standard Model:
N1(a,b)
Where N1 is the image size, a and b are the mean and variance respectively, based on a normal distribution.
N2(c,d)
Where N2 is the ratio of text to graphics, c and d are the total size of the text and graphic objects respectively.
N3(e)
Where N3 is the count of word patterns matched from a list of pre-determined words, and e is the text of an object.
Observed samples are given in the tables shown in Figures 2 to 8.
For model N1, shown in Figure 9, the normal distribution hypothesis test is applied to the observed samples to derive the results. The results show confidence at the 93% and 87% levels for sites 6 and 7 respectively that the sites belong to a population of pornographic sites. The other samples give much lower confidence levels.
For model N2, shown in Figure 10, a simple rule is used to test whether the ratio is below a pre-determined threshold. The results show that sites 2, 4, 6 and 7 are within the threshold rating.
For model N3, shown in Figure 11, a simple rule is used to test whether the number of words matching a list of patterns exceeds a pre-determined threshold. The results show that sites 6 and 7 exceed the threshold.
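The two threshold rules for N2 and N3 can be sketched as below. The threshold values, the placeholder word patterns and the function names are assumptions for illustration; the specification states only that pre-determined thresholds and a pre-determined word list are used.

```python
import re
from typing import Iterable

RATIO_THRESHOLD = 0.1   # hypothetical threshold for N2
COUNT_THRESHOLD = 3     # hypothetical threshold for N3


def text_to_graphics_ratio(text_bytes: int, graphic_bytes: int) -> float:
    """N2(c,d): total text size divided by total graphic size."""
    if graphic_bytes == 0:
        return float("inf")   # no graphics served: treat the ratio as unbounded
    return text_bytes / graphic_bytes


def pattern_match_count(text: str, patterns: Iterable[str]) -> int:
    """N3(e): number of matches of the listed word patterns in the text."""
    return sum(len(re.findall(p, text, flags=re.IGNORECASE)) for p in patterns)


def n2_within_threshold(text_bytes: int, graphic_bytes: int) -> bool:
    return text_to_graphics_ratio(text_bytes, graphic_bytes) < RATIO_THRESHOLD


def n3_exceeds_threshold(text: str, patterns: Iterable[str]) -> bool:
    return pattern_match_count(text, patterns) > COUNT_THRESHOLD


# Neutral placeholder patterns; a deployment would load its own word list.
patterns = [r"\bfoo\b", r"\bbar\w*"]
sample_text = "Foo bar barn foo baz"

print(n2_within_threshold(2_000, 400_000))
print(pattern_match_count(sample_text, patterns))
print(n3_exceeds_threshold(sample_text, patterns))
```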
A weighting formula is then applied to derive a final result as shown in Figure 12.
Therefore, using this example model, the apparatus 10 would predict that sites 6 and 7 are probably serving media with pornographic content, whereas sites 1 through 5 probably are not.
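The weighting formula of Figure 12 is not reproduced in the text, so the sketch below uses a simple weighted sum as a stand-in. The weights, the decision threshold and the second (low-confidence) input are assumptions made purely for illustration; only the 93% confidence figure and the N2/N3 outcomes for site 6 come from the description above.

```python
from typing import Tuple

# Hypothetical weights for (N1 confidence, N2 rule, N3 rule) and decision threshold.
WEIGHTS: Tuple[float, float, float] = (0.5, 0.2, 0.3)
DECISION_THRESHOLD = 0.5


def weighted_score(n1_confidence: float, n2_within: bool, n3_exceeds: bool,
                   weights: Tuple[float, float, float] = WEIGHTS) -> float:
    """Combine the three model outcomes into one score between 0 and 1."""
    w1, w2, w3 = weights
    return w1 * n1_confidence + w2 * float(n2_within) + w3 * float(n3_exceeds)


def probably_matches(score: float) -> bool:
    return score >= DECISION_THRESHOLD


# Site 6 as described: 93% N1 confidence, within the N2 threshold, exceeds the N3 threshold.
site_6 = weighted_score(0.93, True, True)
# A made-up low-confidence input that passes N2 only (not a site from the figures).
made_up = weighted_score(0.10, True, False)

print(round(site_6, 3), probably_matches(site_6))
print(round(made_up, 3), probably_matches(made_up))
```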
The attached appendix shows an example of the set of rules, constants and formulas which determine a confidence prediction based on logistic regression. The rules are defined using "Submodel" and "Model" components to define individual data points and aggregated data points. These are then referred to in the "ProbabilityAnalyser" equations, which use standard predictive formulas.
Whilst the above has been given by way of illustrative example of the present invention, many variations and modifications thereto will be apparent to those skilled in the art without departing from the broad ambit and scope of the invention as herein set forth.

Claims

1. An apparatus for classifying information on communications network, the apparatus comprises means for obtaining one or more transmission characteristics of information on a path of said communications network, and analysing means for predicting a classification of said information based on said one or more transmission characteristics.
2. An apparatus for classifying content servers which are accessible on a communications network, the apparatus comprises means for obtaining one or more transmission characteristics of information provided by any of said content servers on a path of said communications network, and analysing means for predicting a classification of said information based on said one or more transmission characteristics.
3. A computer program for classifying information which is accessible on a communications network, the program comprises means for obtaining one or more transmission characteristics of information on a path of said communications network, and analysing means for predicting a classification of said information based on said one or more transmission characteristics.
4. A computer program for classifying content servers which are accessible on a communications network, the apparatus comprises means for obtaining one or more transmission characteristics of information provided by any of said content servers on a path of said communications network, analysing means for predicting a classification of said information based on said one or more transmission characteristics.
5. An apparatus for classifying user profiles of users accessing information or content servers on a communications network, the apparatus comprises means for obtaining one or more transmission characteristics of information or information provided by any one of said content servers on a path of said communications network, analysing means for predicting a classification of said information or said one content server based on said one or more transmission characteristics, and means for classifying user profile in accordance with the predicted classification.
6. A computer program for classifying user profiles of users accessing information or content servers on a communications network, the program comprises means for obtaining one or more transmission characteristics of information or information provided by any one of said content servers on a path of said communications network, analysing means for predicting a classification of said information or said one content server based on said one or more transmission characteristics, and means for classifying user profile in accordance with the predicted classification.
7. The invention according to any one of claims 1 to 6 further comprising means for storing said one or more transmission characteristics.
8. The invention according to any one of claims 1 to 7 wherein said one or more transmission characteristics include any one or more of network protocol, date and time stamps, size of transmission activities (text and image), content type of transmission activities, pattern seen within the content of the transmission and any other characteristic that can be employed for predicting classifications.
9. The invention according to any one of claims 1 to 8 wherein said one or more transmission characteristics are obtained from network packets or fragments thereof.
10. The invention according to any one of claims 1 to 9 wherein the analysing means includes profiling means for providing profiles of interactions based on said one or more transmission characteristics.
11. The invention according to claim 10 said profiling means is arranged to process said one or more transmission characteristics for providing any one or more of frequency of interaction, duration of interaction, duration of absence of interaction, patterns of transmission, average number of http links within an object of related sites, average number of like sites visited within a time frame, and statistics from said other characteristics, for forming interaction profiles, and the analysing means is adapted to use the profiles for predicting classifications.
12. The invention according to any one of claims 1 to 11 further comprising a knowledge base of predetermined profiles, and the analysing means is adapted to predict a classification based on a comparison between the profile of information to be classified and predetermined profiles.
13. The invention according to claim 12 further comprising means for updating the knowledge base so that the classification prediction can be enhanced following classifications.
Appendix
#
#Body Text Word Ratio Models
SubModel Param AllWordCount WordList AllWords
SubModel Param AllWordCount Context BODY
#
#Body Text Unique Word Ratio Models
SubModel Param AllWordCountUnique WordList AllWords
SubModel Param AllWordCountUnique Context BODY
SubModel Param AllWordCountUnique Mode Distinct
#
#Meta Text Word Ratio Models
SubModel Param AllMetaWordCount WordList AllWords
SubModel Param AllMetaWordCount Context META
#
#Alternate Text Word Ratio Models
SubModel Param AllAlternateWordCount WordList AllWords
SubModel Param AllAlternateWordCount Context ALTERNATE
#
#Image models
SubModel Param LargeGIFPictureCount Dimension 201 x 201 - 999 x 999
SubModel Param LargeGIFPictureCount ImageType GIF
SubModel Param ThumbnailGIFPictureCount Dimension 51 x 51 - 200 x 200
SubModel Param ThumbnailGIFPictureCount ImageType GIF
SubModel Param IconGIFPictureCount Dimension 5 x 5 - 50 x 50
SubModel Param IconGIFPictureCount ImageType GIF
SubModel Param AllGIFPictureCount ImageType GIF
Model Exp LargeGIFPictureRatio RATIO(LargeGIFPictureCount, AllGIFPictureCount)
Model Exp ThumbnailGIFPictureRatio RATIO(ThumbnailGIFPictureCount, AllGIFPictureCount)
Model Exp IconGIFPictureRatio RATIO(IconGIFPictureCount, AllGIFPictureCount)
#
SubModel Param LargeJPEGPictureCount Dimension 201 x 201 - 999 x 999
SubModel Param LargeJPEGPictureCount ImageType JPEG
SubModel Param ThumbnailJPEGPictureCount Dimension 51 x 51 - 200 x 200
SubModel Param ThumbnailJPEGPictureCount ImageType JPEG
SubModel Param IconJPEGPictureCount Dimension 5 x 5 - 50 x 50
SubModel Param IconJPEGPictureCount ImageType JPEG
SubModel Param AllJPEGPictureCount ImageType JPEG
Model Exp LargeJPEGPictureRatio RATIO(LargeJPEGPictureCount, AllJPEGPictureCount)
Model Exp ThumbnailJPEGPictureRatio RATIO(ThumbnailJPEGPictureCount, AllJPEGPictureCount)
Model Exp IconJPEGPictureRatio RATIO(IconJPEGPictureCount, AllJPEGPictureCount)
#
SubModel Param LowDepthGIFPictureCount Depth 2 - 4
SubModel Param LowDepthGIFPictureCount ImageType GIF
SubModel Param MediumDepthGIFPictureCount Depth 5 - 6
SubModel Param MediumDepthGIFPictureCount ImageType GIF
SubModel Param HighDepthGIFPictureCount Depth 7 - 16
SubModel Param HighDepthGIFPictureCount ImageType GIF
Model Exp LowDepthGIFPictureRatio RATIO(LowDepthGIFPictureCount, AllGIFPictureCount)
Model Exp MediumDepthGIFPictureRatio RATIO(MediumDepthGIFPictureCount, AllGIFPictureCount)
Model Exp HighDepthGIFPictureRatio RATIO(HighDepthGIFPictureCount, AllGIFPictureCount)
#
SubModel Param LowDepthJPEGPictureCount Depth 2 - 7
SubModel Param LowDepthJPEGPictureCount ImageType JPEG
SubModel Param MediumDepthJPEGPictureCount Depth 8 - 15
SubModel Param MediumDepthJPEGPictureCount ImageType JPEG
SubModel Param HighDepthJPEGPictureCount Depth 16 - 36
SubModel Param HighDepthJPEGPictureCount ImageType JPEG
Model Exp LowDepthJPEGPictureRatio RATIO(LowDepthJPEGPictureCount, AllJPEGPictureCount)
Model Exp MediumDepthJPEGPictureRatio RATIO(MediumDepthJPEGPictureCount, AllJPEGPictureCount)
Model Exp HighDepthJPEGPictureRatio RATIO(HighDepthJPEGPictureCount, AllJPEGPictureCount)
#
#Links Out Models
SubModel Param AllLinkOutCount IncludeLocal FALSE
#
SubModel Param AVSLinkOutCount Classification ADULTVERIFICATION
SubModel Param AVSLinkOutCount IncludeLocal FALSE
Model Exp AVSLinkOutRatio RATIO(AVSLinkOutCount, AllLinkOutCount)
# begin porn.conf
#Body Text Word Count Models
SubModel Param PornExtraHardWordCount WordFile models/dictionary/porn/porn_words_extrahard.txt
SubModel Param PornHardWordCount WordFile models/dictionary/porn/porn_words_hard.txt
SubModel Param PornMediumWordCount WordFile models/dictionary/porn/porn_words_medium.txt
SubModel Param PornLiteWordCount WordFile models/dictionary/porn/porn_words_lite.txt
SubModel Param PornExtraLiteWordCount WordFile models/dictionary/porn/porn_words_extralite.txt
#
#Unique Body Text Word Count Models
SubModel Param PornExtraHardWordCountUnique WordFile models/dictionary/porn/porn_words_extrahard.txt
SubModel Param PornExtraHardWordCountUnique Mode Distinct
SubModel Param PornHardWordCountUnique WordFile models/dictionary/porn/porn_words_hard.txt
SubModel Param PornHardWordCountUnique Mode Distinct
SubModel Param PornMediumWordCountUnique WordFile models/dictionary/porn/porn_words_medium.txt
SubModel Param PornMediumWordCountUnique Mode Distinct
SubModel Param PornLiteWordCountUnique WordFile models/dictionary/porn/porn_words_lite.txt
SubModel Param PornLiteWordCountUnique Mode Distinct
SubModel Param PornExtraLiteWordCountUnique WordFile models/dictionary/porn/porn_words_extralite.txt
SubModel Param PornExtraLiteWordCountUnique Mode Distinct
#
#Body Text Word Ratio Models
Model Exp PornTextWordRatioExtraHard RATIO(PornExtraHardWordCount, AllWordCount)
Model Exp PornTextWordRatioHard RATIO(PornHardWordCount, AllWordCount)
Model Exp PornTextWordRatioMedium RATIO(PornMediumWordCount, AllWordCount)
Model Exp PornTextWordRatioLite RATIO(PornLiteWordCount, AllWordCount)
Model Exp PornTextWordRatioExtraLite RATIO(PornExtraLiteWordCount, AllWordCount)
#
#Body Text Unique Word Ratio Models
Model Exp PornTextWordRatioExtraHardUnique RATIO(PornExtraHardWordCountUnique, AllWordCountUnique)
Model Exp PornTextWordRatioHardUnique RATIO(PornHardWordCountUnique, AllWordCountUnique)
Model Exp PornTextWordRatioMediumUnique RATIO(PornMediumWordCountUnique, AllWordCountUnique)
Model Exp PornTextWordRatioLiteUnique RATIO(PornLiteWordCountUnique, AllWordCountUnique)
Model Exp PornTextWordRatioExtraLiteUnique RATIO(PornExtraLiteWordCountUnique, AllWordCountUnique)
#
#Domain Word Count Models
SubModel Param PornExtraHardDomainWordCount Context DOMAIN-NAME
SubModel Param PornExtraHardDomainWordCount WordFile models/dictionary/porn/porn_words_extrahard.txt
SubModel Param PornHardDomainWordCount Context DOMAIN-NAME
SubModel Param PornHardDomainWordCount WordFile models/dictionary/porn/porn_words_hard.txt
SubModel Param PornMediumDomainWordCount Context DOMAIN-NAME
SubModel Param PornMediumDomainWordCount WordFile models/dictionary/porn/porn_words_medium.txt
SubModel Param PornLiteDomainWordCount Context DOMAIN-NAME
SubModel Param PornLiteDomainWordCount WordFile models/dictionary/porn/porn_words_lite.txt
SubModel Param PornExtraLiteDomainWordCount Context DOMAIN-NAME
SubModel Param PornExtraLiteDomainWordCount WordFile models/dictionary/porn/porn_words_extralite.txt
#
#Meta Text Word Count Models
SubModel Param PornExtraHardMetaWordCount Context META
SubModel Param PornExtraHardMetaWordCount WordFile models/dictionary/porn/porn_words_extrahard.txt
SubModel Param PornHardMetaWordCount Context META
SubModel Param PornHardMetaWordCount WordFile models/dictionary/porn/porn_words_hard.txt
SubModel Param PornMediumMetaWordCount Context META
SubModel Param PornMediumMetaWordCount WordFile models/dictionary/porn/porn_words_medium.txt
SubModel Param PornLiteMetaWordCount Context META
SubModel Param PornLiteMetaWordCount WordFile models/dictionary/porn/porn_words_lite.txt
SubModel Param PornExtraLiteMetaWordCount Context META
SubModel Param PornExtraLiteMetaWordCount WordFile models/dictionary/porn/porn_words_extralite.txt
#
#Meta Text Word Ratio Models
Model Exp PornMetaWordRatioExtraHard RATIO(PornExtraHardMetaWordCount, AllMetaWordCount)
Model Exp PornMetaWordRatioHard RATIO(PornHardMetaWordCount, AllMetaWordCount)
Model Exp PornMetaWordRatioMedium RATIO(PornMediumMetaWordCount, AllMetaWordCount)
Model Exp PornMetaWordRatioLite RATIO(PornLiteMetaWordCount, AllMetaWordCount)
Model Exp PornMetaWordRatioExtraLite RATIO(PornExtraLiteMetaWordCount, AllMetaWordCount)
#
#Alternate Text Word Count Models
SubModel Param PornExtraHardAlternateWordCount Context ALTERNATE
SubModel Param PornExtraHardAlternateWordCount WordFile models/dictionary/porn/porn_words_extrahard.txt
SubModel Param PornHardAlternateWordCount Context ALTERNATE
SubModel Param PornHardAlternateWordCount WordFile models/dictionary/porn/porn_words_hard.txt
SubModel Param PornMediumAlternateWordCount Context ALTERNATE
SubModel Param PornMediumAlternateWordCount WordFile models/dictionary/porn/porn_words_medium.txt
SubModel Param PornLiteAlternateWordCount Context ALTERNATE
SubModel Param PornLiteAlternateWordCount WordFile models/dictionary/porn/porn_words_lite.txt
SubModel Param PornExtraLiteAlternateWordCount Context ALTERNATE
SubModel Param PornExtraLiteAlternateWordCount WordFile models/dictionary/porn/porn_words_extralite.txt
#
#Alternate Text Word Ratio Models
Model Exp PornAlternateWordRatioExtraHard RATIO(PornExtraHardAlternateWordCount, AllAlternateWordCount)
Model Exp PornAlternateWordRatioHard RATIO(PornHardAlternateWordCount, AllAlternateWordCount)
Model Exp PornAlternateWordRatioMedium RATIO(PornMediumAlternateWordCount, AllAlternateWordCount)
Model Exp PornAlternateWordRatioLite RATIO(PornLiteAlternateWordCount, AllAlternateWordCount)
Model Exp PornAlternateWordRatioExtraLite RATIO(PornExtraLiteAlternateWordCount, AllAlternateWordCount)
#
#Links Out Models
SubModel Param PornLinkOutCount Classification PORN
SubModel Param PornLinkOutCount IncludeLocal FALSE
Model Exp PornLinkOutRatio RATIO(PornLinkOutCount, AllLinkOutCount)
#
#Logistic Models
Model Exp PornLRConstant -3.9869
#
Model Exp PornLRCoefficientPornTextWordRatioExtraHard 39.7450
Model Exp PornLRCoefficientPornTextWordRatioHard 355.0550
Model Exp PornLRCoefficientPornTextWordRatioMedium -136.436
Model Exp PornLRCoefficientPornTextWordRatioLite -63.2565
Model Exp PornLRCoefficientPornTextWordRatioExtraLite 33.9054
#
Model Exp PornLRCoefficientPornTextWordRatioExtraHardUnique 111.4752
Model Exp PornLRCoefficientPornTextWordRatioHardUnique -72.7005
Model Exp PornLRCoefficientPornTextWordRatioMediumUnique 264.1902
Model Exp PornLRCoefficientPornTextWordRatioLiteUnique 125.0743
Model Exp PornLRCoefficientPornTextWordRatioExtraLiteUnique -16.6895
#
Model Exp PornLRCoefficientPornExtraHardDomainWordCount 0.2598
Model Exp PornLRCoefficientPornHardDomainWordCount 2.1344
Model Exp PornLRCoefficientPornMediumDomainWordCount 0
Model Exp PornLRCoefficientPornLiteDomainWordCount 0.0610
Model Exp PornLRCoefficientPornExtraLiteDomainWordCount 0
#
Model Exp PornLRCoefficientPornMetaWordRatioExtraHard 0
Model Exp PornLRCoefficientPornMetaWordRatioHard 0
Model Exp PornLRCoefficientPornMetaWordRatioMedium 0
Model Exp PornLRCoefficientPornMetaWordRatioLite 0
Model Exp PornLRCoefficientPornMetaWordRatioExtraLite 0
#
Model Exp PornLRCoefficientPornAlternateWordRatioExtraHard 16.1972
Model Exp PornLRCoefficientPornAlternateWordRatioHard 0
Model Exp PornLRCoefficientPornAlternateWordRatioMedium 26.4186
Model Exp PornLRCoefficientPornAlternateWordRatioLite 0
Model Exp PornLRCoefficientPornAlternateWordRatioExtraLite 14.1615
#
Model Exp PornLRCoefficientAllGIFPictureCount 0
Model Exp PornLRCoefficientLargeGIFPictureCount 0
Model Exp PornLRCoefficientIconGIFPictureCount 0
Model Exp PornLRCoefficientThumbnailGIFPictureCount 0
Model Exp PornLRCoefficientLargeGIFPictureRatio 0
Model Exp PornLRCoefficientIconGIFPictureRatio 0
Model Exp PornLRCoefficientThumbnailGIFPictureRatio 0
Model Exp PornLRCoefficientHighDepthGIFPictureCount 0
Model Exp PornLRCoefficientMediumDepthGIFPictureCount 0
Model Exp PornLRCoefficientLowDepthGIFPictureCount 0
Model Exp PornLRCoefficientHighDepthGIFPictureRatio 0
Model Exp PornLRCoefficientMediumDepthGIFPictureRatio 0
Model Exp PornLRCoefficientLowDepthGIFPictureRatio 0
Model Exp PornLRCoefficientAllJPEGPictureCount 0
Model Exp PornLRCoefficientLargeJPEGPictureCount 0
Model Exp PornLRCoefficientIconJPEGPictureCount 0
Model Exp PornLRCoefficientThumbnailJPEGPictureCount 0
Model Exp PornLRCoefficientLargeJPEGPictureRatio 0
Model Exp PornLRCoefficientIconJPEGPictureRatio 0
Model Exp PornLRCoefficientThumbnailJPEGPictureRatio 0
Model Exp PornLRCoefficientHighDepthJPEGPictureCount 0
Model Exp PornLRCoefficientMediumDepthJPEGPictureCount 0
Model Exp PornLRCoefficientLowDepthJPEGPictureCount 0
Model Exp PornLRCoefficientHighDepthJPEGPictureRatio 0
Model Exp PornLRCoefficientMediumDepthJPEGPictureRatio 0
Model Exp PornLRCoefficientLowDepthJPEGPictureRatio 0
#
Model Exp PornLRCoefficientPornLinkOutRatio 4.6958
Model Exp PornLRCoefficientAVSLinkOutCount 0.3327
Model Exp PornLRCoefficientAVSLinkOutRatio 3.6786
#
Model Exp PornLRLogOdds SUM(PornLRConstant, \
PRODUCT(PornLRCoefficientPornTextWordRatioExtraHard, PornTextWordRatioExtraHard), \
PRODUCT(PornLRCoefficientPornTextWordRatioHard, PornTextWordRatioHard), \
PRODUCT(PornLRCoefficientPornTextWordRatioMedium, PornTextWordRatioMedium), \
PRODUCT(PornLRCoefficientPornTextWordRatioLite, PornTextWordRatioLite), \
PRODUCT(PornLRCoefficientPornTextWordRatioExtraLite, PornTextWordRatioExtraLite), \
PRODUCT(PornLRCoefficientPornTextWordRatioExtraHardUnique, PornTextWordRatioExtraHardUnique), \
PRODUCT(PornLRCoefficientPornTextWordRatioHardUnique, PornTextWordRatioHardUnique), \
PRODUCT(PornLRCoefficientPornTextWordRatioMediumUnique, PornTextWordRatioMediumUnique), \
PRODUCT(PornLRCoefficientPornTextWordRatioLiteUnique, PornTextWordRatioLiteUnique), \
PRODUCT(PornLRCoefficientPornTextWordRatioExtraLiteUnique, PornTextWordRatioExtraLiteUnique), \
PRODUCT(PornLRCoefficientPornExtraHardDomainWordCount, PornExtraHardDomainWordCount), \
PRODUCT(PornLRCoefficientPornHardDomainWordCount, PornHardDomainWordCount), \
PRODUCT(PornLRCoefficientPornMediumDomainWordCount, PornMediumDomainWordCount), \
PRODUCT(PornLRCoefficientPornLiteDomainWordCount, PornLiteDomainWordCount), \
PRODUCT(PornLRCoefficientPornExtraLiteDomainWordCount, PornExtraLiteDomainWordCount), \
PRODUCT(PornLRCoefficientPornMetaWordRatioExtraHard, PornMetaWordRatioExtraHard), \
PRODUCT(PornLRCoefficientPornMetaWordRatioHard, PornMetaWordRatioHard), \
PRODUCT(PornLRCoefficientPornMetaWordRatioMedium, PornMetaWordRatioMedium), \
PRODUCT(PornLRCoefficientPornMetaWordRatioLite, PornMetaWordRatioLite), \
PRODUCT(PornLRCoefficientPornMetaWordRatioExtraLite, PornMetaWordRatioExtraLite), \
PRODUCT(PornLRCoefficientPornAlternateWordRatioExtraHard, PornAlternateWordRatioExtraHard), \
PRODUCT(PornLRCoefficientPornAlternateWordRatioHard, PornAlternateWordRatioHard), \
PRODUCT(PornLRCoefficientPornAlternateWordRatioMedium, PornAlternateWordRatioMedium), \
PRODUCT(PornLRCoefficientPornAlternateWordRatioLite, PornAlternateWordRatioLite), \
PRODUCT(PornLRCoefficientPornAlternateWordRatioExtraLite, PornAlternateWordRatioExtraLite), \
PRODUCT(PornLRCoefficientAllGIFPictureCount, AllGIFPictureCount), \
PRODUCT(PornLRCoefficientLargeGIFPictureCount, LargeGIFPictureCount), \
PRODUCT(PornLRCoefficientIconGIFPictureCount, IconGIFPictureCount), \
PRODUCT(PornLRCoefficientThumbnailGIFPictureCount, ThumbnailGIFPictureCount), \
PRODUCT(PornLRCoefficientLargeGIFPictureRatio, LargeGIFPictureRatio), \
PRODUCT(PornLRCoefficientIconGIFPictureRatio, IconGIFPictureRatio), \
PRODUCT(PornLRCoefficientThumbnailGIFPictureRatio, ThumbnailGIFPictureRatio), \
PRODUCT(PornLRCoefficientHighDepthGIFPictureCount, HighDepthGIFPictureCount), \
PRODUCT(PornLRCoefficientMediumDepthGIFPictureCount, MediumDepthGIFPictureCount), \
PRODUCT(PornLRCoefficientLowDepthGIFPictureCount, LowDepthGIFPictureCount), \
PRODUCT(PornLRCoefficientHighDepthGIFPictureRatio, HighDepthGIFPictureRatio), \
PRODUCT(PornLRCoefficientMediumDepthGIFPictureRatio, MediumDepthGIFPictureRatio), \
PRODUCT(PornLRCoefficientLowDepthGIFPictureRatio, LowDepthGIFPictureRatio), \
PRODUCT(PornLRCoefficientAllJPEGPictureCount, AllJPEGPictureCount), \
PRODUCT(PornLRCoefficientLargeJPEGPictureCount, LargeJPEGPictureCount), \
PRODUCT(PornLRCoefficientIconJPEGPictureCount, IconJPEGPictureCount), \
PRODUCT(PornLRCoefficientThumbnailJPEGPictureCount, ThumbnailJPEGPictureCount), \
PRODUCT(PornLRCoefficientLargeJPEGPictureRatio, LargeJPEGPictureRatio), \
PRODUCT(PornLRCoefficientIconJPEGPictureRatio, IconJPEGPictureRatio), \
PRODUCT(PornLRCoefficientThumbnailJPEGPictureRatio, ThumbnailJPEGPictureRatio), \
PRODUCT(PornLRCoefficientHighDepthJPEGPictureCount, HighDepthJPEGPictureCount), \
PRODUCT(PornLRCoefficientMediumDepthJPEGPictureCount, MediumDepthJPEGPictureCount), \
PRODUCT(PornLRCoefficientLowDepthJPEGPictureCount, LowDepthJPEGPictureCount), \
PRODUCT(PornLRCoefficientHighDepthJPEGPictureRatio, HighDepthJPEGPictureRatio), \
PRODUCT(PornLRCoefficientMediumDepthJPEGPictureRatio, MediumDepthJPEGPictureRatio), \
PRODUCT(PornLRCoefficientLowDepthJPEGPictureRatio, LowDepthJPEGPictureRatio), \
PRODUCT(PornLRCoefficientPornLinkOutRatio, PornLinkOutRatio), \
PRODUCT(PornLRCoefficientAVSLinkOutCount, AVSLinkOutCount), \
PRODUCT(PornLRCoefficientAVSLinkOutRatio, AVSLinkOutRatio))
#
#Probability Analysers
ProbabilityAnalyser Param PornAltMetaWordCountProbability Classification PORN
ProbabilityAnalyser Exp PornAltMetaWordCountProbability \
SUM(PornExtraHardMetaWordCount, PornHardMetaWordCount, \
PRODUCT(0.5,PornMediumMetaWordCount), \
PornExtraHardAlternateWordCount, PornHardAlternateWordCount, \
PRODUCT(0.5,PornMediumAlternateWordCount))
ProbabilityAnalyser Param PornMetaWordRatioProbability Classification PORN
ProbabilityAnalyser Exp PornMetaWordRatioProbability \
PRODUCT(100, SUM(PornMetaWordRatioExtraHard, \
PornMetaWordRatioHard, PornMetaWordRatioMedium))
ProbabilityAnalyser Param PornLRProbability Classification PORN
ProbabilityAnalyser Exp PornLRProbability PRODUCT(100, RATIO(1,SUM(1,EXP(MINUS(PornLRLogOdds)))))
#
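The PornLRLogOdds and PornLRProbability expressions in the appendix amount to a standard logistic regression: a constant plus a sum of coefficient-times-feature products, passed through 100 / (1 + exp(-log odds)). The Python sketch below evaluates that formula using the non-zero coefficients listed in the appendix (zero coefficients are omitted because they contribute nothing to the sum). The function name, the feature dictionary and the example feature values are assumptions for illustration, and several coefficient values were read from a damaged scan, so they should be checked against the original filing.

```python
import math
from typing import Dict

PORN_LR_CONSTANT = -3.9869

# Non-zero coefficients from the appendix, keyed by the feature they multiply.
COEFFICIENTS: Dict[str, float] = {
    "PornTextWordRatioExtraHard": 39.7450,
    "PornTextWordRatioHard": 355.0550,
    "PornTextWordRatioMedium": -136.436,
    "PornTextWordRatioLite": -63.2565,
    "PornTextWordRatioExtraLite": 33.9054,
    "PornTextWordRatioExtraHardUnique": 111.4752,
    "PornTextWordRatioHardUnique": -72.7005,
    "PornTextWordRatioMediumUnique": 264.1902,
    "PornTextWordRatioLiteUnique": 125.0743,
    "PornTextWordRatioExtraLiteUnique": -16.6895,
    "PornExtraHardDomainWordCount": 0.2598,
    "PornHardDomainWordCount": 2.1344,
    "PornLiteDomainWordCount": 0.0610,
    "PornAlternateWordRatioExtraHard": 16.1972,
    "PornAlternateWordRatioMedium": 26.4186,
    "PornAlternateWordRatioExtraLite": 14.1615,
    "PornLinkOutRatio": 4.6958,
    "AVSLinkOutCount": 0.3327,
    "AVSLinkOutRatio": 3.6786,
}


def lr_probability(features: Dict[str, float]) -> float:
    """Return the logistic-regression confidence as a percentage (0 to 100)."""
    log_odds = PORN_LR_CONSTANT + sum(
        coefficient * features.get(name, 0.0)   # missing features count as 0
        for name, coefficient in COEFFICIENTS.items()
    )
    return 100.0 / (1.0 + math.exp(-log_odds))


# Hypothetical feature values, not taken from the figures.
print(round(lr_probability({}), 2))
print(round(lr_probability({"PornTextWordRatioHard": 0.05, "PornLinkOutRatio": 0.4}), 2))
```

With every feature at zero the result depends only on the constant, which gives the baseline confidence for a page showing none of the modelled signals.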
PCT/AU2000/000158 1999-03-04 2000-03-06 Apparatus and system for classifying and control access to information WO2000052598A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CA002363574A CA2363574A1 (en) 1999-03-04 2000-03-06 Apparatus and system for classifying and control access to information
AU28959/00A AU761017B2 (en) 1999-03-04 2000-03-06 Apparatus and system for classifying and control access to information

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPP9048 1999-03-04
AUPP9048A AUPP904899A0 (en) 1999-03-04 1999-03-04 Apparatus and system for classifying and control access to information

Publications (1)

Publication Number Publication Date
WO2000052598A1 true WO2000052598A1 (en) 2000-09-08

Family

ID=3813246

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2000/000158 WO2000052598A1 (en) 1999-03-04 2000-03-06 Apparatus and system for classifying and control access to information

Country Status (4)

Country Link
AU (1) AUPP904899A0 (en)
CA (1) CA2363574A1 (en)
TW (1) TW462164B (en)
WO (1) WO2000052598A1 (en)

Also Published As

Publication number Publication date
AUPP904899A0 (en) 1999-03-25
TW462164B (en) 2001-11-01
CA2363574A1 (en) 2000-09-08

Similar Documents

Publication Publication Date Title
AU2008100859A4 (en) Method and apparatus for restricting access to network accessible digital information
Ring et al. Flow-based network traffic generation using generative adversarial networks
US7636777B1 (en) Restricting access to requested resources
US7594019B2 (en) System and method for adult approval URL pre-screening
JP5792198B2 (en) URL filtering based on user browsing history
JP4891299B2 (en) User authentication system and method using IP address
JP2003263529A (en) Offline behavior analysis for online personalisation of value added services
US7089246B1 (en) Overriding content ratings and restricting access to requested resources
US6523023B1 (en) Method system and computer program product for distributed internet information search and retrieval
Verma et al. Policy-based management of content distribution networks
CA2475323A1 (en) Url based filtering of electronic communications and web pages
KR20010097250A (en) Apparatus and method for intercept link of unwholesom site in internet
JP2019523584A (en) Network attack prevention system and method
US20040267929A1 (en) Method, system and computer program products for adaptive web-site access blocking
US20060036728A1 (en) Systems and methods for categorizing network traffic content
CA2538693A1 (en) Personalisation
CN107733867A (en) It is a kind of to find Botnet and the method and system of protection
Greenfield et al. Effectiveness of Internet filtering software products
Masoud et al. On tackling social engineering web phishing attacks utilizing software defined networks (SDN) approach
US7971054B1 (en) Method of and system for real-time form and content classification of data streams for filtering applications
EP4033717A1 (en) Distinguishing network connection requests
WO2000052598A1 (en) Apparatus and system for classifying and control access to information
AU761017B2 (en) Apparatus and system for classifying and control access to information
KR200216643Y1 (en) Apparatus for intercept link of unwholesom site in internet
Yang et al. Adaptive delivery of HTML contents

Legal Events

Date Code Title Description
AK Designated states; Kind code of ref document: A1; Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW
AL Designated countries for regional patents; Kind code of ref document: A1; Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG
121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
ENP Entry into the national phase; Ref document number: 2363574; Country of ref document: CA; Kind code of ref document: A
WWE WIPO information: entry into national phase; Ref document number: 28959/00; Country of ref document: AU
WWE WIPO information: entry into national phase; Ref document number: 09914733; Country of ref document: US
REG Reference to national code; Ref country code: DE; Ref legal event code: 8642
122 EP: PCT application non-entry in European phase
WWG WIPO information: grant in national office; Ref document number: 28959/00; Country of ref document: AU