WO1999042935A1

WO1999042935A1 - System for searching and monitoring information on a computer network

Info

Publication number: WO1999042935A1
Application number: PCT/US1999/003838
Authority: WO
Inventors: Frank A. Cona, Jr.; Scott Plichta
Original assignee: Ip Warehouse, Inc.
Priority date: 1998-02-23
Filing date: 1999-02-23
Publication date: 1999-08-26

Abstract

The present invention is directed to a system for searching for a plurality of specified information contained in electronic files on a computer network (4), wherein characteristic information regarding each of said electronic files is stored in information sources (11) across the computer network (4). The invention may include a user interface for sending search parameters and retrieving the specified information and an information collector (10) comprising a plurality of collecting modules (13, 14, 15) for each information source (9), wherein each of the collecting modules (13, 14) is capable of collecting characteristic information regarding each of the electronic files which match the search parameters from each of the information sources (11). The invention may also have a storage medium (9) for storing the characteristic information for each of the electronic files matching the search parameters, and an information clipper in communication with the storage medium (9), wherein the information clipper comprises a plurality of clipping modules (15) for each of the specified information, wherein each of the clipping modules is capable of retrieving the characteristic information, accessing each of the electronic files matching the search parameters, retrieving each of the specified information therefrom, and storing the specified information in the storage medium (9).

Description

System for Searching and Monitoring Information on a Computer Network

Field of the Invention

The present invention relates to a system for searching and monitoring information on a computer network, and particularly to a method for searching and monitoring information on a decentralized, cross-platform computer network, and more particularly to searching and monitoring electronic documents and files located on the Internet.

Background of the Invention

The growing proliferation of businesses and other content providers on large corporate, government, or public computer networks, such as the Internet, has led to a tremendous increase in the amount of information present on these networks. Each day, more and more businesses and organizations are distributing advertising, product literature, press articles and products online.

This growth has created an important need for a workable system with which to search these networks for the presence of improper uses of proprietary information and products, and of valuable intellectual property, such as trademarks and copyrighted material, and for monitoring the use thereof. This problem is explained below in more detail in regard to the Internet.

The Internet is a vast "network of networks" connecting a large number of computer networks and sub-networks to each other through several regional backbone systems. The Internet is a "decentralized" network, which means that each computer on the network can communicate with each other computer on the network and can do so without communicating with a central computer. The Internet is a "packet-switched" network, which means that information is transmitted among each of the computers using "packets" of data which are routed from one system to the next. One portion of the Internet, the World Wide Web ("Web"), is growing at a rapid pace, as more and more businesses go online.

The Web is the most popular segment of the Internet today because it allows users to interact with each other and access content through the use of Web pages, containing a combination of text and multimedia components. The total number of Web documents and Web sites (locations on the Internet where businesses, organizations, or individuals store their Web documents for viewing over the Internet) is increasing exponentially.

Given this growth, the relative ease with which these documents can be created and made available, and the relative anonymity available to their authors, a growing portion of Web pages may contain false or misleading information, which can be a danger to the public and harmful to other businesses and organizations. These Web pages may also contain improper and/or unauthorized uses of valuable trademarks and copyrighted material and products, which could create confusion among the public as to the source of the content, and significant harm to the businesses or other organizations owning that intellectual property.

Moreover, the ability to monitor the activities of competitors online, or to uncover strategic information, can provide businesses with a decided advantage in the marketplace.

Another significant portion of the Internet is USENET, which is the equivalent of a public electronic bulletin board. USENET consists of a multitude of postings by users, which are posted into one or more "news groups," each of which is dedicated to a particular topic. Internet users post messages to one or more of these news groups and these messages can be read by other Internet users.

One other area of note on the Internet is Listservs, which are private versions of USENET. Internet users must subscribe to a particular Listserv, which is typically dedicated to a particular group or subject. Operation of the Listservs occurs in much the same manner as USENET. As with Web pages, USENET and Listserv postings may contain unauthorized uses of trademarks, brand names, or other valuable intellectual property.

Accordingly, a system is needed whereby owners of trademarks, software, and other intellectual property or proprietary information can efficiently and reliably search for and monitor the use of their property on a decentralized public network such as the Internet. A system is also needed whereby the search for this information, and any suspect content which is located, can be adequately documented to provide reliable, authenticated evidence capable of being used as proof of such unauthorized use in any subsequent legal proceedings.

Objects of the Invention

It is, therefore, an object of the present invention to provide a system for searching and monitoring information on a computer network, such as the Internet.

It is another object of the invention to detect unauthorized uses of intellectual property such as trademarks and brand names, as well as unauthorized distribution of proprietary information and products, such as trade secrets or computer software.

It is a further object of the invention to adequately document the occurrence and manner of these unauthorized uses to provide evidence of sufficient reliability to be used in any subsequent legal proceeding regarding these unauthorized uses. Further objects of the present invention will become apparent to those of ordinary skill in the art based on the disclosure of the invention herein and in the appended claims.

Brief Description of the Drawings

FIG. 1 is a schematic illustration of the user interface used in a preferred embodiment of the present invention operated over the Internet.

FIG. 2 is a schematic illustration of a preferred embodiment of the present invention.

FIGS. 3(a)-(b) are computer screen shots demonstrating an HTML search submission form in the manner of the present invention.

FIG. 4 is a schematic illustration of the information collector used in a preferred embodiment of the present invention.

FIG. 5 is a schematic illustration of the document retriever used in a preferred embodiment of the present invention.

FIG. 6 is a schematic illustration of the report generator used in a preferred embodiment of the present invention.

FIG. 7 is a flow chart illustrating operation of the information collector in a preferred embodiment of the present invention.

FIG. 8 is a flow chart illustrating operation of a document retrieval module in a preferred embodiment of the present invention.

FIGS. 9(a)-9(b) are computer screen shots demonstrating an HTML report generated in the manner of the present invention.

FIG. 10 is a flowchart illustrating operation of a second document retrieval module in a preferred embodiment of the present invention.

Summary of the Invention

The present invention is directed to a system for searching for a plurality of specified information contained in electronic files on a computer network, wherein characteristic information regarding each of said electronic files is stored in information sources across the computer network. The invention may include a user interface for sending search parameters and retrieving the specified information and an information collector comprising a plurality of collecting modules for each information source, wherein each of the collecting modules is capable of collecting characteristic information regarding each of the electronic files which match the search parameters from each of the information sources. The invention may also have a storage medium for storing the characteristic information for each of the electronic files matching the search parameters, and an information clipper in communication with the storage medium, wherein the information clipper comprises a plurality of clipping modules for each of the specified information, wherein each of the clipping modules is capable of retrieving the characteristic information, accessing each of the electronic files matching the search parameters, retrieving each of the specified information therefrom, and storing the specified information in the storage medium.

Description of a Preferred Embodiment

The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of preferred embodiments of the invention which, however, should not be taken to limit the invention to a specific embodiment but are for explanation and understanding only.

Fig. 1 is a schematic demonstrating the typical components used in a preferred embodiment of the user interface of the invention when used over the Internet. An electronic document, such as a Web page created using Hypertext Mark-Up Language (HTML), is loaded into Document Viewer 1. Document Viewer 1 may be any software application capable of viewing electronic documents and loading additional electronic documents from within the original document, such as through the use of a hypertext link (although not limited thereto).

For example, the Document Viewer could include a Web browser, such as Navigator from Netscape Communications or Microsoft's Internet Explorer, or could be a word processing application with hypertext capabilities, such as Corel's WordPerfect 8.0 or Microsoft Word '97. Electronic documents may be loaded automatically when Document Viewer 1 is first started, or may be opened into the viewer by the user from a file stored locally or at a remote URL. For example, the user may load the document by typing the document's URL into the Web browser's command line. This will be explained in more detail in regard to Fig. 2.

Document Viewer 1 may be accessed by the user through any of a number of computer systems, such as through the use of a terminal connected to a main-frame system, from a personal computer, or from a computer connected to a local computer network.

Document Viewer 1 is connected to the Internet along with other Document Viewers and computers, such as Personal Computer 2, through Local Access Provider 3. This connection is typically made through local telephone lines using an analog or ISDN modem, though it can be over a direct network connection, such as an Ethernet network.

Local Access Provider 3 maintains a computer network, which routes any requests from Document Viewer 1 to the appropriate location on the Internet. This is accomplished in a conventional manner, such as through the use of a modem pool connected to a local server and Internet gateway (not shown). Local Access Provider 3 connects Document Viewer 1 to Web Server 4 through any of a number of well-known connection schemes, such as through the use of leased lines.

Web Server 4 is typically a software application running on a remote computer which is capable of forwarding or processing requests from Document Viewer 1 which utilize the Hypertext Transfer Protocol (HTTP). For example, Web Server 4 may include any one of a number of well-known server applications, such as the NCSA Web server, the Apache Web server, etc. Web Server 4 passes a document request from Document Viewer 1 to Interface Server 6, which is the Web server at the location specified in the requested document's URL. The mechanism by which electronic documents are transmitted using the HTTP protocol is described in detail in co-pending U.S. Patent Application No. 09/027,766, incorporated by reference herein. Also present on the network are Document Server 5 and Index 7, the operation of which will be more fully described in regard to Fig. 2.

Interface Server 6 then returns the requested document, which allows the user to format a query request for submission to the system of the present invention. The document may contain input fields in which the user enters the parameters defining her search or query. The query information is returned to Interface Server 6 by simply "clicking" on the "Test Search" button at the bottom of the page.

Because a decentralized, public network such as the Internet requires the passage of information among a number of different Web servers before reaching Interface Server 6, it is important to safeguard against the interception of confidential or sensitive information transmitted between Document Viewer 1 and Interface Server 6. For example, a user who desires to submit a query, such as a trademark search, using the system of the present invention may have need to maintain the privacy of any and all information sent to and received from Interface Server 6.

The security of this information can be enhanced through the use of encryption, such as through the creation of a Secure Socket Layer (SSL) between Document Viewer 1 and Interface Server 6. The use and operation of SSL is well known and prevents third parties from deciphering any transmission sent between Document Viewer 1 and Interface Server 6. The most common implementations of SSL utilize the RSA asymmetric encryption algorithm as discussed in U.S. Patent No. 4,405,829, incorporated by reference herein, although not limited thereto.

Fig. 2 is a schematic illustrating the general aspects of a preferred embodiment of the invention. Interface Server 6 receives a query request from Document Viewer 1, as discussed above. This request is then passed in a conventional manner to Query Input 8, which stores the query to be used in searching and monitoring information on the network. In a preferred embodiment of the invention, Query Input 8 is a computer program. This program may be continuously running to process search requests in real time, or may be executed by Interface Server 6 at the time a search request is received.

This is typically accomplished by using a Common Gateway Interface (CGI) integrated into Interface Server 6, which allows it to communicate with external programs or devices such as Query Input 8. The use of CGI in connection with HTTP document servers (Web servers) is well known to those of ordinary skill and will not be further described herein. Query Input 8 stores the query information by parsing information received from Interface Server 6 in a conventional manner and storing this information in Information Storage 9. Information Storage 9 is preferably a relational database application that can be used to store individual units of information in related fields. This may be accomplished, for example, through the use of a scripting language, such as PERL, which has a variety of libraries for transferring information to and from relational databases through the use of SQL. A representative example of the information provided through Interface Server 6, parsed by Query Input 8, and stored in Information Storage 9 is shown in Table 1.

Table 1

Item | Example | Description
Matter Number | 1020 | Unique identifier associated with this particular query.
Referring Matter | 1019 | Identifier associated with previous query from which the present query originated.
Matter Type | 0 (IP DragNet Search), 1 (URL CheckLink) | Identifier defining the type of query.
Client | Jdoe | Identifier associating a particular user with this query.
Client Email | info@ipwarehouse.com | Client email address.
Client Reference | 2034-10 | Client identifier associated with this query (e.g. client matter no.).
Search Terms | IP Warehouse, IP DragNet, http://www.ipwarehouse.com | Words defining the query or parameters.

Information Storage 9 may also contain additional information related to the user, such as a mail address, telephone and facsimile numbers, and one or more email addresses. This information can be used to notify the user that a query has been completed, or to provide other information as described in more detail below.

Information Collector 10 periodically retrieves query information for each matter from Information Storage 9 and processes the query. The manner in which each matter query is processed depends upon the Matter Type, as described in more detail below. This typically involves the use of Document Retriever 11, which receives URL queries from Information Collector 10 and retrieves some or all of the contents of the electronic documents or files located at the URLs contained in the URL query.

Information Collector 10 may also create and submit a search query to Index 7 to collect URLs relevant to the matter query in order to generate a URL query for Document Retriever 11. For example, if the matter query is a trademark search for all uses on the World Wide Web of a particular mark, such as "IP DragNet," Information Collector 10 will use the information contained in the matter query to generate a search query to send to one or more search engines located on the Internet (represented by Index 7).

Each of these search engines contains an index of URLs from across various portions of the Internet, as well as an abstract or full text of the electronic document or file located at that URL. When Information Collector 10 submits the search query to each of these search engines, the search engines return a list of URLs relevant to that search. Information Collector 10 then compares each list which it receives to remove any redundant URLs, and to remove any URLs which are to be excluded based upon the search criteria contained in the matter query. This list of URLs is then stored in Information Storage 9.
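The comparison step just described can be sketched as follows. This is a minimal illustration in Python, not the program code of the preferred embodiment; the function name `merge_result_lists`, the `excluded_sites` parameter, and the example URLs are assumptions made for this sketch.

```python
from urllib.parse import urlsplit


def merge_result_lists(result_lists, excluded_sites=()):
    """Merge URL lists returned by several indexes.

    Redundant URLs are dropped, as are URLs whose host is on the
    exclusion list taken from the matter query (hypothetical field).
    """
    excluded = {site.lower() for site in excluded_sites}
    seen = set()
    merged = []
    for urls in result_lists:
        for url in urls:
            host = (urlsplit(url).hostname or "").lower()
            if host in excluded or url in seen:
                continue
            seen.add(url)
            merged.append(url)
    return merged


# Two hypothetical result lists with one overlap and one excluded site.
lists = [
    ["http://a.example/page1", "http://b.example/x"],
    ["http://b.example/x", "http://www.ipwarehouse.com/about"],
]
urls = merge_result_lists(lists, excluded_sites=["www.ipwarehouse.com"])
```

The order in which URLs were first returned is preserved, which keeps the stored list stable between runs.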

Information Collector 10 generates the search query by combining the information contained in the matter query with translation parameters for each search engine that are stored in Information Collector 10. The translation parameters allow the matter query to be understood and processed by each of the search engines, all of which use different variables for searching their indexes. For example, each search engine may receive data in the form of key=value pairs encoded into the URL sent to the search engine.

Information Collector 10 converts the matter query information into key=value pairs that can be understood by the search engine. This may be accomplished using any number of conventional means well known to those in the art, such as through program code which assigns the values contained in the matter query to the proper key in each search engine's key=value pairs.
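By way of illustration only, the assignment of matter query values to each search engine's keys might look like the following Python sketch. The engine names, base URLs, and key names in `TRANSLATION` are hypothetical and do not correspond to any actual search engine.

```python
from urllib.parse import urlencode

# Hypothetical translation parameters: for each engine, the base URL and
# the key that engine uses for each field of the matter query.
TRANSLATION = {
    "engine_a": {
        "base": "http://engine-a.example/search",
        "keys": {"terms": "q", "max_results": "n"},
    },
    "engine_b": {
        "base": "http://engine-b.example/find",
        "keys": {"terms": "query", "max_results": "count"},
    },
}


def build_search_url(engine, matter_query):
    """Encode the matter query as key=value pairs understood by one engine."""
    params = TRANSLATION[engine]
    pairs = {
        params["keys"][field]: value
        for field, value in matter_query.items()
        if field in params["keys"]
    }
    return params["base"] + "?" + urlencode(pairs)


matter_query = {"terms": "IP DragNet", "max_results": 50}
url_a = build_search_url("engine_a", matter_query)
url_b = build_search_url("engine_b", matter_query)
```

Keeping the translation parameters in a table separate from the code means a new engine can be supported by adding one entry.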

The URLs stored in each Index 7 are compiled and updated by the search engines by periodically retrieving electronic documents and files from each Document Server 5 and indexing the results. This is accomplished in a conventional manner, such as by using a Web Crawler, which is a program that follows hypertext links contained in Web page documents from document to document, and sends the information on each Web page back to the Index.

Once Document Retriever 11 has retrieved all of the documents or files from each of the URLs supplied by Information Collector 10, the relevant information for each matter query is extracted from the document and stored in Information Storage 9.

Thereafter, Report Generator 12 generates a report of the information uncovered during the query and transmits this information to the user through Interface Server 6. This report may be actively requested by the user, or may be sent automatically. Reports may be accessed over the Internet, preferably over an SSL connection, or may be received via email or facsimile. Alternatively, reports may be sent to a printer, where they are collected and manually sent to the user.

The operation of each of the components of the invention will now be described in more detail. In a preferred embodiment, Information Collector 10 comprises one or more Search Modules 13 and one or more Matter Modules 14 as illustrated in Fig. 4.

Each Matter Module 14 may consist of a segment of program code, which can be interchangeably added to or removed from Information Collector 10.

Each Matter Module 14 contains instructions for the operation of the Search Modules 13 and the processing of information returned from each Index 7. Each Search Module 13 contains program code providing instructions on the search parameters necessary for submitting a query to each Index 7 (e.g. search engine), such as assigning key=value pairs as discussed above.

This embodiment of the invention provides significant advantages in the development and use of the system. The Internet is growing and changing at a rapid pace. Technology that is used to locate and access information on the Internet is constantly changing and the amount of information to be found online is constantly increasing. The use of interchangeable program modules for different types of matters or services provides tremendous flexibility in how the invention may be used, while the underlying operation of the system remains unchanged. This allows new services to be added much more quickly than in a static system, in which the entire application would have to be modified and re-compiled to add any new services. Moreover, as new and different types of search indexes become available, they can be utilized by simply adding the appropriate Search Module 13 which is capable of interfacing each Matter Module 14 with the new index.
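One possible way to realize such interchangeable modules is a registry keyed by protocol, sketched below in Python. The registry `SEARCH_MODULES`, the decorator `search_module`, and the placeholder search functions are assumptions for illustration; they return made-up results rather than querying any real index.

```python
# Registry mapping a protocol name to the function that searches it.
SEARCH_MODULES = {}


def search_module(protocol):
    """Decorator that registers a search function under a protocol name."""
    def register(func):
        SEARCH_MODULES[protocol] = func
        return func
    return register


@search_module("http")
def search_web_indexes(terms):
    # Placeholder: a real module would query Web search engines.
    return [f"http://index.example/result-for-{terms}"]


@search_module("usenet")
def search_usenet_feeds(terms):
    # Placeholder: a real module would scan USENET feeds.
    return [f"news:example.group/{terms}"]


def run_matter(protocols, terms):
    """Run every registered module requested by the matter; skip unknown ones."""
    return {p: SEARCH_MODULES[p](terms) for p in protocols if p in SEARCH_MODULES}


results = run_matter(["http", "usenet", "gopher"], "dragnet")
```

Adding support for a new index then amounts to defining one more decorated function, with no change to `run_matter`.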

Several types of Search Modules 13 and Matter Modules 14 are shown in Fig. 4. This is an illustration of the use of the invention with current Internet transmission protocols. It is one of the important advantages of the invention that it can be easily adapted to operate with any new transmission protocol that may come into use in the future.

Information Collector 10 may contain Search Modules 13 for searching the various Internet search engines for the existence of Web pages relevant to the search (HTTP). It may also contain Search Modules 13 for searching through one or more USENET feeds and Listservs. Older documents stored in GOPHER sites may also be searched, as well as computer files contained in FTP archives. Finally, domain names may also be searched for any matching DNS registrations.

As shown in Fig. 5, Document Retriever 11 contains Clipping Modules 15, which typically correspond to the Search Module 13 protocols. Clipping Modules 15 contain program code for retrieving different types of electronic documents or files on the Internet and for extracting certain types of information from those files depending upon the type of matter for which the URL was retrieved.

For example, one Clipping Module 15 may retrieve a clip of text surrounding a usage of a trademark in a trademark search, highlighting that mark. These text clips can be used to provide a summary of how the trademark in question is being used in a particular document, without retaining the entire text of the document. This would allow the user to determine quickly if the document is worth reviewing or can be disregarded.
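A text clip of this kind could be produced along the following lines. The sketch is in Python and assumes a case-insensitive literal match, a fixed number of context characters on each side, and `>>`/`<<` as the highlight markers; none of these choices is specified in the description.

```python
import re


def text_clips(text, mark, context=40):
    """Return one clip per occurrence of `mark`, with the mark highlighted."""
    clips = []
    for match in re.finditer(re.escape(mark), text, flags=re.IGNORECASE):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        clip = (
            text[start:match.start()]
            + ">>" + match.group(0) + "<<"
            + text[match.end():end]
        )
        clips.append(clip.strip())
    return clips


page = "Our new tool is better than IP DragNet for finding marks online."
clips = text_clips(page, "ip dragnet", context=10)
```

Only the clip, not the whole document, needs to be retained, which matches the summary purpose described above.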

The program code necessary for generating such a text clip is well known to those of skill in the art.

As shown in Fig. 6, Report Generator 12 may contain one or more Reporting Modules 16 for reporting the information uncovered during the search to the user.

Several types of reporting may be employed, and more than one type of reporting may be utilized for each matter. Reports may be generated on-the-fly by Report Generator 12 when the user accesses the report through Interface Server 6. For example, the user may access a secure Web site and activate a hypertext link to Interface Server 6 from a document loaded into Document Viewer 1.

The URL request then transmitted to Interface Server 6 contains key=value pairs that identify to Interface Server 6 and Report Generator 12 which report has been requested. Report Generator 12 retrieves the associated information stored in Information Storage 9, formats the search report in an HTML document, and sends this document back to Interface Server 6 for transmission to Document Viewer 1 for viewing by the user.

If the report is to be sent to the user via email or facsimile, then Report Generator 12 creates an electronic document, such as an ASCII text file, containing the search results. The email Reporting Module 16 or facsimile Reporting Module 16 then processes this file and transmits the file to the user's email address or facsimile number. This can be accomplished by using conventional program code well known to those of skill in the art. For example, if the program modules are written using the PERL programming language, then various utilities, such as SendMail, can be used to transmit the report.
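As a rough illustration of the email path, the report document can be assembled as a plain-text message before being handed to a mail utility. The sketch below uses Python's standard `email` package in place of the PERL and SendMail combination mentioned above; the sender address, the field names `url` and `clip`, and the function name are assumptions, and the actual transmission step is left out.

```python
from email.message import EmailMessage


def build_report_email(matter_number, client_reference, recipient, results):
    """Build a plain-text report message; sending it is left to a mail utility."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["From"] = "reports@example.com"  # hypothetical sender address
    msg["Subject"] = (
        f"Search report for matter {matter_number} (ref. {client_reference})"
    )
    lines = [f"{item['url']}\n  {item['clip']}" for item in results]
    msg.set_content("\n\n".join(lines))
    return msg


msg = build_report_email(
    1020,
    "2034-10",
    "jdoe@example.com",
    [{"url": "http://a.example/page", "clip": "uses >>IP DragNet<< here"}],
)
```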

Various specific embodiments of the invention will now be described in regard to particular types of matters for which the system can be used. However, these disclosed embodiments are examples of a number of different types of matters for which the system of the invention can be used and are provided for purposes of describing operation of the invention, which is not limited thereto.

Example 1: Single Search or Monitoring Services

One of the Matter Modules 14 that may be included in Information Collector 10 is a single subject search, such as a search for the usage of a particular trademark or brand name. The user accesses Interface Server 6, which may be a secure Web site, and accesses a search request form over an SSL connection, which loads into Document Viewer 1.

The user completes this form, filling in the input field with the appropriate search information, including the client reference, the trademark to be searched (along with any variations thereof), any additional terms (such as goods or services) with which the mark is used, terms to be used to exclude irrelevant documents, particular Web sites to be excluded from the search, and possibly the manner in which the search is to be reported (such as an email address). The user then "clicks" on the button at the bottom of the form.

Document Viewer 1 transmits the form information to Interface Server 6. This is typically accomplished using a GET or POST transmission, which encodes the search information as key=value pairs transmitted to Interface Server 6. Interface Server 6 then activates Query Input 8, which parses the key=value pairs and stores the matter query information in Information Storage 9. Query Input 8 generates a unique matter number for this search so that it can be tracked and identified by the system. Query Input 8 also generates an HTML response, which it sends back to Interface Server 6 and Document Viewer 1 to inform the user that the search is being processed.
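The parsing and storage performed by Query Input 8 can be sketched as follows. The example uses Python with an in-memory SQLite database as a stand-in for the relational database; the table layout, column names, and form field names are assumptions loosely based on Table 1, and the automatically incremented key plays the role of the unique matter number.

```python
import sqlite3
from urllib.parse import parse_qs


def store_query(conn, encoded_form):
    """Parse URL-encoded key=value pairs and store them as a new matter."""
    fields = {key: values[0] for key, values in parse_qs(encoded_form).items()}
    cursor = conn.execute(
        "INSERT INTO matters (matter_type, client, client_reference, search_terms)"
        " VALUES (?, ?, ?, ?)",
        (
            int(fields.get("matter_type", 0)),
            fields["client"],
            fields.get("client_reference", ""),
            fields["search_terms"],
        ),
    )
    conn.commit()
    return cursor.lastrowid  # serves as the unique matter number


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matters ("
    " matter_number INTEGER PRIMARY KEY AUTOINCREMENT,"
    " matter_type INTEGER, client TEXT,"
    " client_reference TEXT, search_terms TEXT)"
)
matter_number = store_query(
    conn,
    "client=Jdoe&client_reference=2034-10"
    "&search_terms=IP+Warehouse%2C+IP+DragNet",
)
row = conn.execute(
    "SELECT client, search_terms FROM matters WHERE matter_number = ?",
    (matter_number,),
).fetchone()
```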

As shown in Fig. 7, the Single Search Matter Module 14 in Information Collector 10 periodically checks Information Storage 9 for the presence of new single search matters (17). Matter Module 14 then retrieves the matter query information for each single search and activates the appropriate Search Module 13 (18-19). Which Search Modules 13 are appropriate will depend on the type of search selected by the user when submitting the original search request. For example, a comprehensive search might activate a search of all HTTP Indexes (search engines), all USENET feeds, all Listservs to which the system subscribes, all FTP archives, all listed GOPHER sites, and all of the root Domain Name Servers.

Each Search Module 13 receives the matter query information from the Single Search Module 14 and searches the associated Index 7 for all occurrences of the trademark in question (20). The results received are filtered to remove redundancies and are stored in Information Storage 9 (21-23). This process is repeated for all new single search matters (24).

As shown in Fig. 8, Text Clipping Modules 15 in Document Retriever 11 access Information Storage 9 to retrieve the new search results placed there by Information Collector 10 (25-26). Clipping Modules 15 then prepare the appropriate TCP/IP request for the document associated with each URL (e.g. HTTP, FTP, Gopher, etc.) and request that document or file from the appropriate Document Server 5 (27-28). If Document Server 5 responds positively, Clipping Modules 15 retrieve the document or file and parse the document, generating a text clip showing usage of the trademark in question on that page (29-31). If Document Server 5 responds negatively, then this is also noted by Clipping Modules 15. The resulting information is then stored in Information Storage 9 and the process is repeated (32-34). The exact time at which the document download was attempted or the text clip created is also stored in Information Storage 9. Documenting the exact time that each URL is processed provides the significant benefit that the reliability of the information obtained is greatly increased, enhancing the possible admission of that information as evidence in any subsequent legal proceeding.
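The retrieval step, including the recording of the attempt time and of negative responses, might be sketched as below. This Python example is illustrative only: the record fields, the injectable `fetch` parameter, and the stand-in `fake_fetch` used for the demonstration are assumptions, and the default fetcher handles only URL schemes supported by `urllib`.

```python
from datetime import datetime, timezone
from urllib.request import urlopen


def fetch_with_urllib(url):
    """Default fetcher: download the document body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def process_url(url, fetch=fetch_with_urllib):
    """Attempt one download and record when it was attempted and how it ended."""
    attempted_at = datetime.now(timezone.utc).isoformat()
    try:
        body = fetch(url)
    except OSError as exc:  # urllib's URLError and HTTPError derive from OSError
        return {"url": url, "attempted_at": attempted_at,
                "ok": False, "error": str(exc)}
    return {"url": url, "attempted_at": attempted_at,
            "ok": True, "length": len(body), "body": body}


def fake_fetch(url):
    """Stand-in for a Document Server so the sketch runs without a network."""
    if "missing" in url:
        raise OSError("document not found")
    return "IP DragNet appears here"


good = process_url("http://docs.example/page", fetch=fake_fetch)
bad = process_url("http://docs.example/missing", fetch=fake_fetch)
```

Recording the timestamp before the request is made means a failed attempt is documented just as precisely as a successful one.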

When all of the URLs for a particular search (matter) have been reviewed, Clipping Modules 15 may generate a notification message to the user that the search is completed. This may be accomplished, for example, by the use of an email utility, such as Send Mail. Clipping Modules 15 generate a text message containing the Matter Number, the Client Reference, and relevant portions of the search criteria and send the message to the user.

The user then accesses Interface Server 6 via Document Viewer 1, activating a hypertext link associated with the Matter Number for that search. Interface Server 6 then activates HTTP Reporting Modules 16 in Report Generator 12, which retrieves the search results stored by Information Collector 10 and Document Retriever 11 for that search and generates an HTML document, which it returns to Document Viewer 1 through Interface Server 6. This may be accomplished in any number of conventional manners well known to those in the art, such as through the use of CGI, an executable program and an SQL interface to the database (Information Storage 9).

Alternatively, Clipping Modules 15 may activate Email Reporting Modules 16 in Report Generator 12, which retrieves the search results stored in Information Storage 9 and generates an email containing those results. This can, of course, be accomplished in the same manner as generating email messages previously described.

Also, Clipping Modules 15 may activate Facsimile Reporting Module 16 or Print

Reporting Module 16 in Report Generator 12. This may involve the use of a native application program interface (API) to operate a fax or print utility program integrated into the operating system of the computer on which Report Generator 12 resides. The uses of such APIs is well known to those of skill in the art.

The single search service described above may be modified to enable the same search criteria to be re-searched on a regular basis to effectively monitor usage of the trademark in question on the Internet. This is accomplished by selecting trademark monitoring in the search request form, which causes Query Input 8 to assign a different Matter Type to the search.

The Monitoring Matter Module 14 in Information Collector 10 retrieves all matter queries identified with this Matter Type and generates a search in the same manner as a single search, except that the URLs retrieved from Index 7 are now compared against previously stored URLs and only new URLs are added to Information Storage 9. Each new URL is flagged in the database as new, so that Clipping Modules 15 will know to generate a text clip in the manner of a single search.
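The comparison against previously stored URLs can be sketched as a simple set-membership check. The function name and the use of plain Python collections in place of the database are assumptions for illustration.

```python
def find_new_urls(stored_urls, retrieved_urls):
    """Return the retrieved URLs that are not already stored, in first-seen order.

    `stored_urls` stands in for the URLs already held for the matter;
    duplicates within `retrieved_urls` are reported only once.
    """
    known = set(stored_urls)
    new_urls = []
    for url in retrieved_urls:
        if url not in known:
            known.add(url)
            new_urls.append(url)  # caller would store these flagged as new
    return new_urls
```

Only the returned URLs would be added to storage and flagged as new, so that a text clip is generated for each.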

In regard to previously stored URLs, however, Clipping Modules 15 first check to see if the content of the document associated with that URL has changed. If the content has changed, then a new text clip is generated and stored; if not, then the lack of change is noted and a clip is not generated. Determining whether the content of a document has changed can be accomplished in a number of conventional manners, such as reviewing the CONTENT LENGTH value in the Document Header for the document, or by calculating the actual length of the document and comparing this to a value stored in the database.
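A minimal sketch of the length-based change check, assuming the document header is available as a plain dictionary keyed by `Content-Length` and the document body as bytes. The function name and the fallback order are choices made for this sketch.

```python
def content_changed(stored_length, headers, body=None):
    """Compare the current document length with the stored length.

    Uses the Content-Length header when it is present and numeric,
    otherwise the actual length of `body`. Names are hypothetical.
    """
    header_value = headers.get("Content-Length")
    if header_value is not None and header_value.strip().isdigit():
        current_length = int(header_value)
    elif body is not None:
        current_length = len(body)
    else:
        raise ValueError("no header length or body available to compare")
    return current_length != stored_length
```

A length comparison does not detect an edit that leaves the length unchanged, so this check is only as strong as the method described.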

After each monitoring search is completed, the user is notified in the same manner as with single searches.

Example 2: URL Verification Services

In addition to collecting and reporting on multiple URLs associated with a particular trademark, the system of the present invention can be used to monitor and document individual URLs. For example, suppose a search report reveals the existence of a competitor who is disparaging the trademark owned by the user. The user wants to document this improper use of her trademark, wants to be notified of any changes in the document associated with that URL, and wants to record those changes.

This may be accomplished as follows. The HTML report retrieved by the user may contain a submission form consisting of a set of check boxes with each URL and text clipping, which allows the user to select additional services for those individual URLs.

At the end of the report is a submission button. When the user "clicks" on this button, Document Viewer 1 sends the form information to Interface Server 6, which activates Query Input 8. This is illustrated in Figs. 9(a)-9(b).

Query Input 8 assigns a Matter Number to the new matter query, and records the Referring Matter from which it came. Query Input 8 also assigns the Matter Type for the new matter query. The operation of two of these URL verification Matter Types is illustrated in Fig. 10.

URL Monitoring Matter Module 14 in Information Collector 10 retrieves the matter information for each URL Monitoring Matter from Information Storage 9 (35-36). This information is passed to Clipping Modules 15 in Document Retriever 11, which retrieves the document associated with that URL from Document Server 5 (37-40). Since the document is already stored in the database, having originated from a single search or monitoring, Clipping Modules 15 compares the URL information retrieved from Information Storage 9 with the new document information (41-42). If the document has changed then Clipping Modules 15 checks to see if the user requested to be notified (44). If yes, then Clipping Modules 15 activate the appropriate Notification Module 16 in Report Generator 12 to notify the user of the change (45-46).

Alternatively, if the URL is also associated with Document Recordation Module 14 in Information Collector 10, then the entire document is downloaded and stored in a secure location. It will be appreciated that the URL monitoring service and the Document Recordation service can also be operated together, so that a user will be notified every time the document associated with the URL changes, and that change will be recorded by the system. Moreover, it is possible with the system of the present invention to submit individual URLs for monitoring and / or recordation directly through Interface Server 6, without a previous search or monitoring report containing that URL. This may be accomplished using an HTML form in a manner similar to that for requesting searches.

Although this invention has been described with reference to particular embodiments, it will be appreciated that many variations may be resorted to without departing from the spirit and scope of this invention. For example, the invention may be utilized on any computer network, such as corporate Intranets and other internetworked systems, and is not limited to the Internet or the World Wide Web. It will also be appreciated that many combinations of Web servers and CGI applications may be used for accessing the data source, and the data source may be other than a relational database, such as an ASCII text file, or some other type of binary file. Also, the services accomplished in the system of the present invention may be accomplished by interconnecting the various components of the invention in different configurations. This is, in fact, one of the significant benefits of the invention.

Claims

We Claim:

1. A system for searching for specified information contained in electronic files on a computer network, wherein characteristic information regarding each of said electronic files is stored in information sources across said computer network comprising:

(a) a user interface for sending search parameters and retrieving said specified information from said system;

(b) an information collector, wherein said information collector is capable of collecting characteristic information regarding each of said electronic files which match said search parameters from said information sources;

(c) a storage medium for storing said characteristic information for each of said electronic files matching said search parameters; and

(d) an information clipper in communication with said storage medium, wherein said information clipper is capable of retrieving said characteristic information, accessing said electronic files matching said search parameters, retrieving said specified information therefrom, and storing said specified information in said storage medium.

2. The system of Claim 1, wherein said computer network is the Internet.

3. The system of Claim 1, wherein said information sources comprise indexes of URLs.

4. The system of Claim 1, wherein said characteristic information is one or more selected from the group consisting of a document title, one or more document meta tags, a document URL, a document length, and a document type.

5. The system of Claim 1, wherein said user interface comprises a Web server, at least one HTML document, and at least one CGI application.

6. The system of Claim 1, wherein said storage medium comprises a relational database.

7. A system for searching for a plurality of specified information contained in electronic files on a computer network, wherein characteristic information regarding each of said electronic files is stored in information sources across said computer network comprising:

(a) a user interface for sending search parameters and retrieving said specified information from said system;

(b) an information collector, said information collector comprising a plurality of collecting modules for each information source, wherein each of said collecting modules is capable of collecting characteristic information regarding each of said electronic files which match said search parameters from each of said information sources;

(c) a storage medium for storing said characteristic information for each of said electronic files matching said search parameters; and

(d) an information clipper in communication with said storage medium, wherein said information clipper comprises a plurality of clipping modules for each of said specified information, wherein each of said clipping modules is capable of retrieving said characteristic information, accessing each of said electronic files matching said search parameters, retrieving each of said specified information therefrom, and storing said specified information in said storage medium.

8. The system of Claim 1, wherein said computer network is the Internet.

9. The system of Claim 7, wherein said user interface comprises an information reporter, said information reporter comprising a plurality of reporting modules for returning said plurality of specified information.

9. The system of Claim 7, wherein said information sources comprise indexes of document URLs.

10. The system of Claim 7, wherein said characteristic information is one or more type of information selected from the group consisting of a document title, one or more document meta tags, a document URL, a document length, and a document type.

11. The system of Claim 7, wherein said user interface comprises a Web server, at least one HTML document, and at least one CGI application.

12. The system of Claim 7, wherein said storage medium comprises a relational database.

13. The system of Claim 7, wherein said reporting modules comprise software applications capable of reporting said characteristic information in a format selected from one or more of the group consisting of HTML documents, Email documents, Facsimile documents, PDF documents, and ASCII documents.

14. A method for searching for a plurality of specified information contained in electronic files on a computer network, wherein characteristic information regarding each of said electronic files is stored in information sources across said computer network, said method comprising the steps of:

(a) entering search parameters;

(b) collecting characteristic information regarding each of said electronic files which match said search parameters from each of said information sources with an information collector comprising a plurality of collecting modules for each information source;

(c) storing said characteristic information for each of said electronic files matching said search parameters;

(d) retrieving said characteristic information, accessing each of said electronic files matching said search parameters, retrieving each of said specified information therefrom, and storing said specified information in said storage medium with an information clipper comprising a plurality of clipping modules for each of said specified information; and

(e) retrieving said specified information.