WO2000052598A1 - Apparatus and system for classifying and control access to information - Google Patents
- Publication number
- WO2000052598A1 (PCT/AU2000/000158)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- param
- submodel
- exp
- product
- porn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
- H04L63/102—Entity profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
- H04L63/101—Access control lists [ACL]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/2866—Architectures; Arrangements
- H04L67/30—Profiles
- H04L67/306—User profiles
Definitions
- TECHNICAL FIELD OF THE INVENTION: The invention relates to an apparatus and system for classifying information on a communications network and in particular, but not limited to, an apparatus and system for classifying content servers and for selectively controlling access to classified content servers.
- The Internet in particular allows information to be fetched from any cooperating computers or content servers located in different parts of the world simply by clicking references to the information.
- As the number of accessible computers or content servers and the amount of information on the communications network grow daily, it becomes increasingly difficult to classify them manually.
- Known systems for controlling the types of information accessible on a network rely on comparing a requested destination with those on pre-determined Access Control Lists (ACLs), or on word matching, to determine whether to allow or deny access.
- ACL: Access Control List
- This approach can be applied at the client node before the information is requested, or on any suitably intelligent network device capable of intercepting the request or the subsequent reply before it reaches the requester.
- A software program that monitors such requests on a PC can be configured to scan a pre-determined list of site addresses for a match. If a match is found, access to the site may be denied and a suitable message displayed informing the user that access is denied.
- The request may instead be allowed to proceed, with the data received from the site scanned for a match against one or more sets of pre-determined words, word fragments or phrases. If a match is found, the site is not displayed on the computer and a suitable message is shown instead.
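The two known checks described above, address matching and word matching, can be sketched as follows. This is a minimal illustration only: the host list, the pattern list and the function names are hypothetical, and chaining the two checks in one `allow` function is an assumption made for brevity rather than something the text prescribes.

```python
from urllib.parse import urlsplit

# Hypothetical data: neither list comes from the patent.
BLOCKED_HOSTS = {"blocked.example"}
BLOCKED_PATTERNS = ("forbidden phrase", "badword")


def host_is_listed(url: str, blocked_hosts=BLOCKED_HOSTS) -> bool:
    """Return True when the requested host appears on the access control list."""
    host = urlsplit(url).hostname or ""
    return host.lower() in blocked_hosts


def body_matches(text: str, patterns=BLOCKED_PATTERNS) -> bool:
    """Return True when any listed word, fragment or phrase occurs in the text."""
    lowered = text.lower()
    return any(pattern in lowered for pattern in patterns)


def allow(url: str, body: str) -> bool:
    """Deny on an address match first, then on a content match (an assumed ordering)."""
    return not host_is_listed(url) and not body_matches(body)
```

Both checks depend entirely on the lists being current, which is the maintenance burden the following paragraphs describe.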
- This type of control software is installed on a PC or workstation that does not have particularly strict access privileges. The control software can be easily removed, disabled or otherwise circumvented, thereby defeating the control system.
- A network device capable of intercepting the request or the reply to a request may perform similar actions using the same methods of web site matching. Such a device is usually maintained by a network administrator with strict access rights. Also, a network that requires clients to connect through the network device in order to access the network can have its content control enforced. This allows content control of multiple clients from one central point.
- A system based on an access control list of prohibited sites is much more selective. Access can be denied only when an attempt is made to access a site included in the lists. While a suitably large list could bar access to a great deal of undesirable information, it is difficult to keep up to date because of the rapid increase in the number of new sites and the removal of sites.
- An object of the present invention is to alleviate or to reduce to a certain degree one or more of the above disadvantages.
- Another object of the present invention is to provide an apparatus/system for classifying user profiles.
- SUMMARY OF THE INVENTION: The present invention resides in an apparatus for classifying information on a communications network.
- The apparatus comprises means for obtaining one or more transmission characteristics of information on a path of said communications network, and analysing means for predicting a classification of said information based on said one or more transmission characteristics.
- The present invention resides in an apparatus for classifying content servers which are accessible on a communications network.
- The apparatus comprises means for obtaining one or more transmission characteristics of information provided by any of said content servers on a path of said communications network, and analysing means for predicting a classification of said information based on said one or more transmission characteristics.
- The present invention resides in a computer program for classifying information which is accessible on a communications network.
- The program comprises means for obtaining one or more transmission characteristics of information on a path of said communications network, and analysing means for predicting a classification of said information based on said one or more transmission characteristics.
- The present invention resides in a computer program for classifying content servers which are accessible on a communications network.
- The program comprises means for obtaining one or more transmission characteristics of information provided by any of said content servers on a path of said communications network, and analysing means for predicting a classification of said information based on said one or more transmission characteristics.
- The present invention resides in an apparatus/computer program for classifying user profiles of users accessing information or content servers on a communications network.
- The apparatus/computer program comprises means for obtaining one or more transmission characteristics of information, or of information provided by any one of said content servers, on a path of said communications network, analysing means for predicting a classification of said information or said one content server based on said one or more transmission characteristics, and means for classifying a user profile in accordance with the predicted classification.
- The above invention may also comprise means for storing said one or more transmission characteristics.
- Said one or more transmission characteristics include any one or more of network protocol, date and time stamps, size of transmission activities (text and image), content type of transmission activities, patterns seen within the content of the transmission, and any other characteristic that can be employed for predicting classifications.
- Said one or more transmission characteristics are obtained from network packets or fragments thereof.
- The analysing means includes profiling means for providing profiles of interactions based on said one or more transmission characteristics.
- The profiling means is arranged to process said one or more transmission characteristics to provide any one or more of frequency of interaction, duration of interaction, duration of absence of interaction, patterns of transmission, average number of http links within an object of related sites, average number of like sites visited within a time frame, and statistics from said other characteristics, for forming interaction profiles.
- The analysing means can then use the profiles for predicting classifications.
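As a rough illustration of how profiling means might reduce captured transmission characteristics to an interaction profile, the sketch below computes a few of the statistics listed above. The `Transmission` record, its field names and the chosen statistics are assumptions for illustration; the text does not specify a data layout.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Transmission:
    """One observed transfer; the field names are illustrative only."""
    timestamp: float      # seconds since an arbitrary start
    size: int             # bytes transferred
    content_type: str     # e.g. "text/html" or "image/jpeg"


def build_profile(records: list[Transmission]) -> dict[str, float]:
    """Summarise a non-empty list of transfers into an interaction profile."""
    times = sorted(r.timestamp for r in records)
    duration = times[-1] - times[0]
    gaps = [b - a for a, b in zip(times, times[1:])]
    images = [r.size for r in records if r.content_type.startswith("image/")]
    return {
        "count": len(records),
        "duration": duration,
        # Interactions per second; 0.0 when everything arrived at one instant.
        "frequency": len(records) / duration if duration > 0 else 0.0,
        # Longest pause between transfers, a proxy for absence of interaction.
        "longest_gap": max(gaps, default=0.0),
        "mean_image_size": mean(images) if images else 0.0,
    }
```

A profile of this kind is the input that the classification models operate on.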
- The invention may have a knowledge base of predetermined profiles, and the analysing means is adapted to predict a classification based on a comparison between the profile of the information to be classified and the predetermined profiles.
- The invention may have means for updating the knowledge base so that the classification prediction may be enhanced following classifications.
- Figure 1 is a schematic diagram of the apparatus according to the invention;
- Figure 2 is a table of selected data of captured packets of a search engine using the apparatus shown in Figure 1;
- Figure 3 is a partial table of selected data of captured packets of a news web site using the apparatus shown in Figure 1;
- Figure 4 is a table of selected data of captured packets of an entertainment web site using the apparatus shown in Figure 1;
- Figure 5 is a table of selected data of captured packets of the web site of an e-commerce merchant using the apparatus shown in Figure 1;
- Figure 6 is a table of selected data of captured packets of the web site of another e-commerce merchant using the apparatus shown in Figure 1;
- Figure 7 is a table of selected data of captured packets of a pornography web site using the apparatus shown in Figure 1;
- Figure 8 is a table of selected data of captured packets of another pornography web site using the apparatus shown in Figure 1;
- Figure 9 is a table of model N1 results using the apparatus shown in Figure 1;
- Figure 10 is a table of model N2 results using the apparatus shown in Figure 1;
- Figure 11 is a table of model N3 results using the apparatus shown in Figure 1;
- Figure 12 is a table of classification prediction confidence levels using the apparatus shown in Figure 1.
- Referring to FIG. 1, there is shown an apparatus 10 for classifying media or information flowing through a path of a communications network, which in this case is the Internet.
- Network traffic passing through the apparatus 10 is captured and analysed to provide statistics relating to interactions between two or more terminals (not shown).
- The captured traffic is first checked against a list of predetermined classifications to determine whether it is known or unknown.
- For unknown traffic, various models (described more fully below) are applied to the data set in the captured traffic in order to predict the content classification.
- The models use parameters derived from a knowledge base of previously classified data sets, and fitness with these parameters, to determine the classification of the content of the newly captured traffic.
- The web site sending the captured traffic is then classified and added to the list of known classifications.
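The sequence above (look up the known list, otherwise apply models, then record the result) can be sketched as below. The function and parameter names are hypothetical, and requiring every model of a classification to fit is a simplifying assumption; the text later describes a weighted combination instead.

```python
from typing import Callable, Optional

# A model maps a profile to True (the profile fits) or False.
Model = Callable[[dict], bool]


def classify_site(site: str, profile: dict, known: dict[str, str],
                  models: dict[str, list[Model]]) -> Optional[str]:
    """Return the stored classification, or predict one and remember it."""
    if site in known:                     # already on the list of known classifications
        return known[site]
    for label, tests in models.items():
        # Assumption: a classification applies only when all of its models fit.
        if all(test(profile) for test in tests):
            known[site] = label           # the site joins the known list
            return label
    return None                           # no model fitted; the site stays unknown
```

Because a newly classified site is written back to `known`, a later request for the same site is answered from the list without re-running the models.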
- The embodiment of the apparatus 10 described herein is for an analysis of transmission traffic using the HTTP protocol.
- The apparatus 10 according to the present invention is not restricted to HTTP, and is easily adaptable to analyse data carried within any network using any known protocol. Examples of such protocols include FTP, SMTP and NNTP.
- The captured data set is stored in the knowledge base. As the knowledge base expands, more data are used for the model parameters. This refines the apparatus and results in improved predictive performance.
- ACLs: Access control lists
- The sites that are deemed to include undesirable information are added to access control lists (ACLs).
- The ACLs are used to control the flow of content information between terminals. For example, undesired content information can be prevented from travelling further through the network by not forwarding it, by replacing it, or by intercepting the request for it and modifying the request's destination.
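The three enforcement options just listed (not forwarding, replacing, and redirecting the request) might be modelled as in the following sketch. The `Action` enum, the per-host ACL mapping and the function names are hypothetical and are not taken from the source.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    FORWARD = "forward"     # unlisted: pass traffic on unchanged
    DROP = "drop"           # do not forward the content
    REPLACE = "replace"     # deliver substitute content
    REDIRECT = "redirect"   # change the destination of the request


def outgoing_request(url: str, host: str, acl: dict, notice_url: str) -> str:
    """Return the URL actually requested; only REDIRECT rewrites it."""
    return notice_url if acl.get(host) is Action.REDIRECT else url


def delivered_body(body: str, host: str, acl: dict, placeholder: str) -> Optional[str]:
    """Return what reaches the requester: None when dropped, a placeholder when replaced."""
    action = acl.get(host, Action.FORWARD)
    if action is Action.DROP:
        return None
    if action is Action.REPLACE:
        return placeholder
    return body
```

Redirection acts on the request before it leaves, whereas dropping and replacing act on the reply, which is why the sketch uses two functions.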
- Classifications of traffic from content servers are relatively static.
- User terminals that interact with these content servers are variable, and their classifications are considered transient classifications.
- Classifications of content servers form a model of the style of content residing on the server.
- Transient classifications form a model of the style of content being viewed by a user terminal, or content consumer. This in effect forms a behaviour profile of such a consumer. This profile can be used to tailor the content information to suit the consumer.
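One simple way to picture a transient classification is as a summary of the classifications of the content servers a terminal has recently contacted. The sketch below is an assumption about how such a behaviour profile could be derived; the text does not give a formula, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def terminal_profile(visited_classes: list[str]) -> dict[str, float]:
    """Share of each content classification among the servers a terminal contacted."""
    counts = Counter(visited_classes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def transient_classification(visited_classes: list[str]) -> Optional[str]:
    """Most frequent classification seen so far, or None with no observations."""
    counts = Counter(visited_classes)
    return counts.most_common(1)[0][0] if counts else None
```

Since the result is recomputed from observed traffic, it changes as the terminal's viewing changes, unlike the relatively static classification of a content server.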
- The apparatus 10 captures a set of observed data relating to a network interaction event, and provides a set of results indicating the classification of a resource or personality residing at each network node involved in the interaction. This is accomplished by applying various statistical models to a profile, and testing the outcome against results obtained from profiles of known classifications. In this example of the invention the process is represented by the following formulas, where:
- x is an unknown profile to be classified;
- profiles p1, p2, p3 ... pn are of known classifications;
- models M1, M2, M3 ... Mn are available to operate on these profiles; and
- C1, C2, C3 ... Cn are profile classifications.
- The population of a profile of classification C1 may be defined by the population of M1(p).
- M1(x) may be tested against the true population using any of the standard statistical hypothesis methods.
- A pre-determined set of media terminals of a classification is modelled by various models M1, M2 ... Mn.
- Each model consists of an approach and a set of parameters (e.g. linear regression, with gradient and point of interception as parameters), so that for a single classification M1(p1, p2 ... pn), M2(q1, q2 ... qn) ... Mn(r1, r2 ... rn) are used to model the population from the classification.
- The models may be based on mathematical structures or on arbitrary rules.
- The models are continually refined as more network traffic passes through the apparatus 10, thereby increasing the population space from which the classifications are computed.
- A terminal may be permanently or transitionally defined in relation to a classification.
- A transitionally defined terminal may move between classifications based on the fitness of the observed traffic to the models of the various classifications.
- Figures 2 to 8 are tables of selected traffic data used to test the profile of data during a network interaction with a content server, to determine whether it contains media content of a pornographic nature. It is assumed that profiles for content servers contain a variable which is the average size of the graphical images served.
- A normal distribution or similar non-deterministic probability distribution is then used to test the hypothesis that the profile belongs to a population classified as pornographic.
- The population of the classification may be defined by the population of N(a,b), where N is the image size and a and b are the mean and variance respectively, based on a normal distribution.
- The average and standard deviation derived from the observed samples are tested against the true population using standard statistical hypothesis methods.
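One standard hypothesis method that fits this description is a z-test of the observed mean image size against a population N(a,b). The sketch below is an assumption about which test could be used, since the text does not name one, and it tests only the sample mean, not the sample standard deviation. The function name and arguments are hypothetical.

```python
import math
from statistics import mean


def z_test_p_value(sample: list[float], pop_mean: float, pop_variance: float) -> float:
    """Two-sided p-value for H0: the sample mean comes from N(pop_mean, pop_variance)."""
    n = len(sample)
    standard_error = math.sqrt(pop_variance / n)
    z = (mean(sample) - pop_mean) / standard_error
    # P(|Z| >= |z|) for a standard normal variable Z.
    return math.erfc(abs(z) / math.sqrt(2))
```

A small p-value means the observed image sizes are unlikely under the population parameters, so the hypothesis that the site belongs to that population would be rejected. How such a value maps onto the confidence levels reported in the figures is not stated in the text.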
- This approach may be broadened to encompass analysis of variance methods with multiple dependent variables, to model the characteristics of a site.
- Traditional ANOVA or regressive techniques may be applied to model the media content.
- A variety of traditional deterministic and non-deterministic models may be applied to test the hypothesis of profile classification. These may be changed or upgraded continually depending on the level of predictive power found.
- The models used can include, but are not limited to, simple rules of thumb, deterministic and non-deterministic probability models, or arbitrary calculations. The choice of model is primarily dictated by the predictive power of that model against the population in question.
- Figures 2 through 8 show examples of the basic data set that can be gathered by observing the network traffic of a typical interaction between a client browser and a web server.
- Figures 9 to 11 illustrate a simple classification model. This model looks at the size, content and relationships of the objects being transmitted by a content server.
- The purpose of this model is to determine whether the media being transmitted has pornographic content.
- N1(a,b), where N1 is the image size, and a and b are the mean and variance respectively, based on a normal distribution.
- N2(c,d), where N2 is the ratio of text to graphics, and c and d are the total size of the text and graphic objects respectively.
- N3(e), where N3 is the count of word patterns matched from a list of pre-determined words, and e is the text of an object.
- The result shows confidence at the 93% and 87% levels for sites 6 and 7 respectively that the sites belong to a population of pornographic sites.
- The other samples give much lower confidence levels.
- For model N2, shown in Figure 10, a simple rule is used to test whether the ratio is below a pre-determined threshold. The results show that sites 2, 4, 6 and 7 are within the threshold rating.
- For model N3, shown in Figure 11, a simple rule is used to test whether the number of words matching a list of patterns exceeds a pre-determined threshold. The results show that sites 6 and 7 exceed the threshold.
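The two threshold rules for N2 and N3 can be sketched as below. The function names, the case-insensitive substring matching and the threshold values used in any call are assumptions; the figures containing the actual thresholds are not reproduced in this text.

```python
def text_to_graphics_ratio(text_bytes: int, graphic_bytes: int) -> float:
    """N2-style value: total text size divided by total graphic size."""
    return text_bytes / graphic_bytes if graphic_bytes else float("inf")


def count_pattern_matches(text: str, patterns: list[str]) -> int:
    """N3-style value: number of occurrences of listed patterns in the text."""
    lowered = text.lower()
    return sum(lowered.count(p.lower()) for p in patterns)


def ratio_rule(text_bytes: int, graphic_bytes: int, threshold: float) -> bool:
    """True when the text-to-graphics ratio is below the threshold."""
    return text_to_graphics_ratio(text_bytes, graphic_bytes) < threshold


def pattern_rule(text: str, patterns: list[str], threshold: int) -> bool:
    """True when the number of matched patterns exceeds the threshold."""
    return count_pattern_matches(text, patterns) > threshold
```

Note that the ratio rule alone flags sites 2 and 4 as well as sites 6 and 7, which illustrates why a single rule is not used on its own.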
- A weighting formula is then applied to derive a final result, as shown in Figure 12.
- The apparatus 10 would predict that sites 6 and 7 are probably serving media with pornographic content, whereas sites 1 through 5 probably are not.
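The weighting formula itself is not given in this text, so the following is only a plausible sketch: a weighted average of per-model scores compared with a cut-off. The weights, the scores and the cut-off in the usage note are hypothetical values, not figures from the source.

```python
def weighted_confidence(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-model scores, each score in the range 0..1."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def predict(scores: dict[str, float], weights: dict[str, float], cutoff: float) -> bool:
    """True when the combined confidence reaches the cut-off."""
    return weighted_confidence(scores, weights) >= cutoff
```

For instance, with assumed scores `{"N1": 0.9, "N2": 1.0, "N3": 1.0}` and assumed weights `{"N1": 0.5, "N2": 0.25, "N3": 0.25}`, the combined confidence is 0.95, so a cut-off of 0.5 would yield a positive prediction.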
- The attached appendix shows an example of the set of rules, constants and formulas which determine a confidence prediction based on logistic regression.
- The rules are defined using "Submodel" and "Model" components to define individual data points and aggregated data points. These are then referred to in the "ProbabilityAnalyser" equations, which use standard predictive formulas.
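The appendix is not reproduced here, but the standard logistic regression prediction it refers to has a well-known form, sketched below. The feature names and coefficient values in any call are hypothetical, and the function does not reproduce the "Submodel", "Model" or "ProbabilityAnalyser" definitions of the appendix.

```python
import math


def logistic_confidence(features: dict[str, float], coefficients: dict[str, float],
                        intercept: float) -> float:
    """Return the logistic function of intercept + sum(coefficient * feature)."""
    linear = intercept + sum(coefficients[name] * value
                             for name, value in features.items())
    # Branch on the sign so that exp() never receives a large positive argument.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)
```

The output lies between 0 and 1 and can be read as a confidence that the profile belongs to the classification; the coefficients would be fitted from the knowledge base of previously classified data sets.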
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002363574A CA2363574A1 (en) | 1999-03-04 | 2000-03-06 | Apparatus and system for classifying and control access to information |
AU28959/00A AU761017B2 (en) | 1999-03-04 | 2000-03-06 | Apparatus and system for classifying and control access to information |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AUPP9048 | 1999-03-04 | ||
AUPP9048A AUPP904899A0 (en) | 1999-03-04 | 1999-03-04 | Apparatus and system for classifying and control access to information |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2000052598A1 (en) | 2000-09-08 |
Family
ID=3813246
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/AU2000/000158 WO2000052598A1 (en) | 1999-03-04 | 2000-03-06 | Apparatus and system for classifying and control access to information |
Country Status (4)
Country | Link |
---|---|
AU (1) | AUPP904899A0 (en) |
CA (1) | CA2363574A1 (en) |
TW (1) | TW462164B (en) |
WO (1) | WO2000052598A1 (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5678041A (en) * | 1995-06-06 | 1997-10-14 | At&T | System and method for restricting user access rights on the internet based on rating information stored in a relational database |
US5706507A (en) * | 1995-07-05 | 1998-01-06 | International Business Machines Corporation | System and method for controlling access to data located on a content server |
US5835722A (en) * | 1996-06-27 | 1998-11-10 | Logon Data Corporation | System to control content and prohibit certain interactive attempts by a person using a personal computer |
US5835905A (en) * | 1997-04-09 | 1998-11-10 | Xerox Corporation | System for predicting documents relevant to focus documents by spreading activation through network representations of a linked collection of documents |
US5867799A (en) * | 1996-04-04 | 1999-02-02 | Lang; Andrew K. | Information system and method for filtering a massive flow of information entities to meet user information classification needs |
-
1999
- 1999-03-04 AU AUPP9048A patent/AUPP904899A0/en not_active Abandoned
-
2000
- 2000-03-02 TW TW089103925A patent/TW462164B/en active
- 2000-03-06 CA CA002363574A patent/CA2363574A1/en not_active Abandoned
- 2000-03-06 WO PCT/AU2000/000158 patent/WO2000052598A1/en active IP Right Grant
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003044703A1 (en) * | 2001-11-22 | 2003-05-30 | Mobicus Oy | A system and a method for generating personalized messages |
EP1377024A2 (en) * | 2002-06-27 | 2004-01-02 | Fuji Photo Film Co., Ltd. | Image processing apparatus, image processing method, and computer readable medium storing program |
EP1377024A3 (en) * | 2002-06-27 | 2004-12-08 | Fuji Photo Film Co., Ltd. | Image processing apparatus, image processing method, and computer readable medium storing program |
US7474768B2 (en) | 2002-06-27 | 2009-01-06 | Fujifilm Corporation | Image processing apparatus, image processing method, and computer readable medium storing program |
Also Published As
Publication number | Publication date |
---|---|
AUPP904899A0 (en) | 1999-03-25 |
TW462164B (en) | 2001-11-01 |
CA2363574A1 (en) | 2000-09-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
ENP | Entry into the national phase | Ref document number: 2363574 Country of ref document: CA Kind code of ref document: A Ref document number: 2363574 |
WWE | Wipo information: entry into national phase | Ref document number: 28959/00 Country of ref document: AU |
WWE | Wipo information: entry into national phase | Ref document number: 09914733 Country of ref document: US |
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
122 | Ep: pct application non-entry in european phase | ||
WWG | Wipo information: grant in national office | Ref document number: 28959/00 Country of ref document: AU |