
US007483910B2

United States Patent

Beyer et al.

(10) Patent No.: US 7,483,910 B2
(45) Date of Patent: Jan. 27, 2009

AUTOMATED ACCESS TO WEB CONTENT BASED ON LOG ANALYSIS

Inventors: Kevin Scott Beyer, San Jose, CA (US); Jussi Petri Myllymaki, San Jose, CA (US)

Assignee: International Business Machines Corporation, Armonk, NY (US)

Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 510 days.

Appl. No.: 10/042,367

Filed: Jan. 11, 2002

Prior Publication Data

US 2003/0135487 A1 Jul. 17, 2003

Int. Cl.: G06F 17/00 (2006.01)

U.S. Cl.: 707/102; 707/100; 707/101; 707/3

Field of Classification Search: 707/3, 707/4, 5, 10, 100-102; 705/10
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

6,119,101 A *       9/2000   Peckover           705/10
6,363,377 B1 *      3/2002   Kravets et al.     707/4
6,438,539 B1 *      8/2002   Korolev et al.     707/3
6,516,312 B1 *      2/2003   Kraft et al.       707/3
6,631,369 B1 *     10/2003   Meyerzon et al.    707/4
6,665,658 B1 *     12/2003   DaCosta et al.     707/3
6,738,780 B2 *      5/2004   Lawrence et al.    707/101
6,785,671 B1 *      8/2004   Bailey et al.      707/3
7,120,629 B1 *     10/2006   Seibel et al.      707/5
7,120,692 B2 *     10/2006   Hesselink et al.   709/225
2001/0032205 A1 *  10/2001   Kubaitis           707/10
2002/0103823 A1 *   8/2002   Jackson et al.     707/501.1
2003/0088544 A1 *   5/2003   Kan et al.         707/3
2003/0115189 A1 *   6/2003   Srinvasa et al.    707/3

OTHER PUBLICATIONS

"Informia: a Mediator for Integrated Access to Heterogeneous Information Sources," by Barja et al., published by ACM, 1998.*

"Effective Web Data Extraction with Standard XML Technologies," by Myllymaki, published by ACM, May 2001.*

* cited by examiner

Primary Examiner: Sana Al-Hashemi

(74) Attorney, Agent, or Firm: IP Authority, LLC; Ramraj Soundararajan; Leonard Guzman

(57) ABSTRACT

The present invention provides Web crawlers capable of efficiently accessing Web content that is not accessible via static hyperlinks. Log files are maintained of the communications between a Web browser and a Web server that result from real user accesses to the content associated with dynamic hyperlinks. These log files represent past users' accesses to the content and are used to generate Web crawler accesses. This approach allows a crawler to accurately mimic real users, so that the crawler can automatically access all the content that real users would have access to.

10 Claims, 3 Drawing Sheets
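
The log-driven approach summarized in the abstract can be illustrated with a small Python sketch. It is not the patented implementation: the tab-separated log format (METHOD, URL, optional form body) and the names CrawlRequest, parse_log_line and requests_from_log are assumptions made for illustration only. The sketch turns logged user requests into a deduplicated list of requests a crawler could replay; the fetching itself is left out.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple
from urllib.parse import parse_qsl, urlsplit


class CrawlRequest(NamedTuple):
    """One request a crawler could replay (hypothetical structure)."""
    method: str
    url: str                              # URL with the query string removed
    params: Tuple[Tuple[str, str], ...]   # sorted (name, value) pairs


def parse_log_line(line: str) -> Optional[CrawlRequest]:
    """Parse one line of an assumed log format: METHOD<TAB>URL[<TAB>BODY]."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 2:
        return None  # not a request record in the assumed format
    method, url = parts[0].upper(), parts[1]
    body = parts[2] if len(parts) > 2 else ""
    split = urlsplit(url)
    # GET parameters come from the query string, POST parameters from the
    # form body; a query string on a POST URL is ignored in this sketch.
    raw = split.query if method == "GET" else body
    params = tuple(sorted(parse_qsl(raw, keep_blank_values=True)))
    base = f"{split.scheme}://{split.netloc}{split.path}"
    return CrawlRequest(method, base, params)


def requests_from_log(lines: Iterable[str]) -> List[CrawlRequest]:
    """Build a deduplicated, order-preserving list of replayable requests."""
    seen = set()
    out: List[CrawlRequest] = []
    for line in lines:
        req = parse_log_line(line)
        if req is None or req in seen:
            continue
        seen.add(req)
        out.append(req)
    return out


if __name__ == "__main__":
    sample_log = [
        "GET\thttp://example.com/search?q=tv&page=1",
        "GET\thttp://example.com/search?page=1&q=tv",   # same request, reordered
        "POST\thttp://example.com/form\tcity=Armonk&state=NY",
    ]
    for request in requests_from_log(sample_log):
        print(request)
```

Sorting the parameter pairs makes two logged requests that differ only in parameter order collapse into one crawler access, which is a design choice of this sketch rather than something stated in the patent text above.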
