WO2003054742A2 - System and method for web to mainframe synchronization

Info

Publication number: WO2003054742A2
Application number: PCT/IB2002/005578
Authority: WO
Inventors: Linda Clark; Chad Virnig; Sriram Balakrishnan
Original assignee: American Airlines
Priority date: 2001-12-21
Filing date: 2002-12-20
Publication date: 2003-07-03
Also published as: WO2003054742A3; AU2002360193A1

Abstract

A method and system for synchronizing data between the internet and other computer systems by converting HTML documents into formats suitable for use on mainframe computer systems, while preserving most of the look and feel of the original HTML document. Page titles, sections and their corresponding headers, bold fonts, bulleted lists, numbered lists, and tables are converted into plain text with an arrangement similar to that of the original elements.

Description

SYSTEM AND METHOD FOR WEB TO MAINFRAME SYNCHRONIZATION

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The invention relates generally to synchronizing data between the internet and other computer systems and, more particularly, to a system and method for converting HTML documents into formats suitable for use on mainframe computer systems.

2. Description of the Related Art

[0002] Reference information for airlines is typically maintained on mainframe computers in records that are accessible by flight reservation systems. This information may include, for example, aircraft seat map information, baggage restrictions for particular flights, flight booking and check-in policies, etc. The information is stored in file formats designed to be compatible with the reservation system software. This type of information may also be stored on personal computers (PCs) and other systems in text files and other file formats compatible with the various applications that need access to the data.

[0003] In recent years the internet has become a convenient means to make this same type of information available to airline employees, travel agents, and customers. The internet is also proving a convenient means for accepting input of new information from all of these people. This information is typically stored on internet servers in Hyper Text Markup Language (HTML) documents suitable for display by internet browser programs.

[0004] To ensure that information stored on the mainframe and PC computer systems and the internet is the same, some means is required to synchronize the data stored in the different computer systems. New data and updates to existing data could be manually entered into the different systems, but this method is time consuming, expensive, and prone to error. As the amount of data posted on the internet increases, the need to provide some automated means to synchronize the data stored in legacy mainframe and PC systems and the data stored on internet servers becomes more critical. Although the data stored on the internet can be sent electronically to the mainframe and PC systems for storage, the HTML format used for internet web pages is not compatible with many existing mainframe and PC applications, particularly airline reservation systems.

[0005] Thus, there is a need for a means to transform HTML documents into a format suitable for storage and use by mainframe and other computer systems.

BRIEF SUMMARY OF THE INVENTION

[0006] The present invention provides a system and method for applying transformations of a formatted web page to generate a file in a format suitable for storage and use by mainframe and other computer systems. Preferably, the system and method converts the HTML document into a plain text document while preserving most of the look and feel of the original HTML document. Page titles, sections and their corresponding headers, bold fonts, bulleted lists, numbered lists, and tables are preferably converted into plain text with a similar arrangement as the original elements.

[0007] The present invention provides in one aspect a method for converting an HTML document which includes HTML tags and one or more blocks of plain text into one or more text documents suitable for use with a destination system. The HTML document is divided into sections and XML tags are inserted to indicate breaks between the sections, one or more of the HTML tags in the source document are replaced with plain text providing a visual representation corresponding to formatting defined by the HTML tags, and the blocks of plain text of the source document are broken into smaller blocks of plain text if the blocks exceed a maximum size for use with the destination system.

[0008] The source document may be tidied before the dividing step by removing unwanted HTML tags and fixing syntax errors in HTML tags. HTML tags replaced with plain text may include replacing an image with plain text representing the image, replacing a link which specifies a destination with plain text indicating the destination, replacing an unnumbered list with a plain text list having a symbol beginning each item in the plain text list, replacing a numbered list with a plain text list having a list item number beginning each item in the plain text list, and replacing a table with a plain text representation of a table organized in rows and columns. Dividing the blocks of plain text may include dividing the blocks into lines where each line has less than a predetermined maximum number of characters per line.

[0009] Another aspect of the invention comprises a method for synchronizing information in a source record formatted for use on the internet with information in a destination record for use on a mainframe computer, where the source record includes HTML tags and one or more blocks of plain text. The source record is accessed and transformed by tidying the source record by removing unwanted HTML tags and fixing syntax errors in HTML tags, dividing the tidied source record into sections by detecting first types of the HTML tags which indicate breaks between sections of the source record and inserting XML tags to indicate section breaks, replacing second types of the HTML tags which indicate formatting of the content of the source record with plain text providing a visual representation of the formatting defined by the second types of HTML tags, and dividing the blocks of plain text of the tidied source record into smaller blocks of plain text if the blocks exceed a maximum size for use in the destination record. Any destination record having the same name as the converted source record is deleted, and the converted source record is transmitted to the mainframe computer.

[0010] Yet another aspect of the invention comprises a system for converting a source document comprising HTML tags and one or more blocks of plain text into one or more destination documents suitable for use with a destination system. The system includes a server for storing the source document, a destination computer for using information in a destination format, and a transformation application for converting the source document. The transformation application tidies the source document to generate an XML compliant HTML document, divides the tidied source document into sections and inserts XML tags to indicate breaks between the sections, replaces one or more of the HTML tags in the source document with plain text providing a visual representation corresponding to formatting defined by the HTML tags, and divides the blocks of plain text of the source document into smaller blocks of plain text if the blocks exceed a maximum size for use with the destination system.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The features and advantages of the invention will be appreciated upon reference to the following drawings, in which:

[0012] FIG. 1 is a block diagram of a system for implementing a synchronization process according to the present invention;

[0013] FIG. 2 is a flowchart showing an embodiment of the synchronization process according to the present invention;

[0014] FIG. 3 is a flow chart showing the Break into Sections step of the synchronization process of FIG. 2;

[0015] FIG. 4 is a flow chart showing the Reformat to Text step of the synchronization process of FIG. 2;

[0016] FIGS. 5A and 5B are flow charts showing the Process HTML Tags process called by the process of FIG. 4;

[0017] FIG. 6 is a flow chart showing the Handle Text process called by the process of FIG. 5;

[0018] FIG. 7 is a flow chart showing the Reset Table process called by the process of FIG. 5;

[0019] FIG. 8 is a flow chart showing the Reset Row process called by the process of FIG. 5;

[0020] FIG. 9 is a flow chart showing the Handle Header process called by the process of FIG. 5;

[0021] FIG. 10 is a flow chart showing the Handle Col process called by the process of FIG. 5;

[0022] FIG. 11 is a flow chart showing the Print Table process called by the process of FIG. 5;

[0023] FIG. 12 is a flow chart showing the Split into Multiple Pages step of the synchronization process of FIG. 2;

[0024] FIG. 13 is a flow chart showing the Create New STAR process called by the process of FIG. 12;

[0025] FIG. 14 is a flow chart showing the Send to Sabre step of the synchronization process of FIG. 2;

[0026] FIG. 15 is a flow chart showing the Create STAR process called by the process of FIG. 14;

[0027] FIG. 16 is a flow chart showing the Send Result process called by the process of FIG. 14;

[0028] FIG. 17 is a flow chart showing the Send Text process called by the process of FIGS. 14 and 16;

[0029] FIG. 18 shows a portion of an HTML document before conversion to mainframe readable format; and

[0030] FIG. 19 shows the converted version of the HTML document of FIG. 18.

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

[0031] One embodiment of a system and process for synchronizing mainframe and web based content is described below. The system applies transformations to a formatted web page to generate a mainframe readable text page while preserving most of the look and feel of the original web page. Page titles, sections and their corresponding headers, bold fonts, bulleted lists, numbered lists, and tables are converted into plain text representations which retain the basic arrangement of these elements in the original document.

[0032] The embodiment described below converts HTML documents into a format suitable for use by the SABRE® system, although the method could be modified to accommodate other destination systems with different file format requirements. The SABRE® system contains data records called STARs and FOCUS STARs. A STAR record contains all capitalized plain text with a maximum of 55 characters per line and a maximum of 200 lines per record. Abbreviations and cryptic terms are typically used to accommodate the space limitations. A FOCUS record is also all capitalized plain text and has the same character and line capacity as a STAR record. FOCUS and STAR records are typically used to document policies and procedure manuals.
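The record limits stated above can be expressed as a small check. The following is a minimal sketch in Python, not part of the described embodiment; the function name `check_star_record` and the wording of the messages are assumptions made for illustration, and only the 55 character, 200 line, and all-capitals limits come from the description.

```python
MAX_LINE_WIDTH = 55     # characters per line, as stated in the description
MAX_RECORD_LINES = 200  # lines per record, as stated in the description


def check_star_record(text):
    """Return a list of problems; an empty list means the text fits the limits.

    Hypothetical helper for illustration only.
    """
    problems = []
    lines = text.splitlines()
    if len(lines) > MAX_RECORD_LINES:
        problems.append(f"record has {len(lines)} lines (max {MAX_RECORD_LINES})")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_WIDTH:
            problems.append(
                f"line {number} has {len(line)} characters (max {MAX_LINE_WIDTH})"
            )
        if line != line.upper():
            problems.append(f"line {number} contains lowercase characters")
    return problems
```

A record that passes returns an empty list, so the check could be run on each converted page before it is transmitted.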

[0033] Referring initially to FIG. 1, a block diagram is shown of a system for implementing a synchronization process. A computer 102 and a mainframe computer 108 are shown connected to a network 104. Web server 106 is connected to the internet 110, and there is a connection between the network 104 and internet 110. The web server 106 stores HTML documents which may be accessed across the internet 110. The mainframe computer 108 stores information in mainframe-readable formats. In the embodiment discussed below, this computer 108 runs the SABRE® application, but the system is equally applicable to other types of computers (for example, personal computers) and other types of applications.

[0034] The computer 102 has access to the HTML documents stored on the web server 106 and to the mainframe computer 108. Computer 102 runs the synchronization process, selecting and downloading an HTML document from web server 106, converting the HTML document to a mainframe readable format, and transmitting the converted document to the mainframe computer 108.

[0035] Although a particular structure is shown in FIG. 1, any suitable network structure and other types of computers may be used to implement the system. For example, the computer 102, web server 106, and mainframe computer 108 all could be connected via the internet, via separate connections, or by a mixture of these means.

[0036] FIG. 2 shows an overview of a process for converting an HTML document (the source document) into data records suitable for display and use by the SABRE® system. The process begins at step 202 and at step 204 the HTML document to be converted undergoes a cleaning process to produce a valid XHTML document. This cleaning process may be achieved using a software application such as the open source JTidy application, available for download from the internet from w3c.org at https://sourceforge.net/projects/jtidy or http://lempinen.net/sami/jtidy/. This is a Java port of HTML Tidy, an HTML checker and pretty printer application. JTidy can be used to clean up malformed and faulty HTML tags by removing unwanted tags and fixing errors in the tags to comply with HTML syntax to produce a valid XML compliant HTML file (XHTML) that is suitable for parsing by an XML parser.

[0037] In step 206, the content of the HTML document is broken down into XML sections. These sections will comprise title, subtitle and body sections to preserve the overall layout of the HTML document in the converted data files. This process may be performed using a software application such as the open source XML Parser application xerces.jar (ver 1.3.1) from Apache to parse the HTML document and the XSL Transformation Engine xalan.jar (ver 2.0.9) from Apache to do transformations. These software applications are available from the Apache Software Foundation at www.apache.org. The details of the sectioning process are shown in FIG. 3.

[0038] In step 208, within each section of the document, the HTML tags are reformatted into text format. This reformatting process may be performed using the XSL Transformation Engine xalan.jar from Apache. The details of the reformatting process are shown in FIGS. 4 to 11.

[0039] In step 210, the text is broken into multiple pages, if required, to conform to the maximum page size for the SABRE® system. A table of contents may also be created during this step. The details of the page splitting process are shown in FIGS. 12 and 13. This step is designed to meet the requirements of the SABRE® system. It may be adapted to suit the requirements of different destination systems or applications, or could be omitted entirely if the destination system or application did not have a specific record size limit.
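The page splitting of step 210 can be illustrated in its simplest form as cutting a list of lines into pages of bounded length. The sketch below is an assumption-laden simplification in Python: the function name `split_into_pages` is hypothetical, the 200 line default is taken from the STAR record limit given earlier, and the table of contents and any section-aware break points are not modeled.

```python
def split_into_pages(lines, max_lines=200):
    """Cut a list of text lines into consecutive pages of at most max_lines.

    Hypothetical helper; breaks fall purely on line count, which is a
    simplification chosen for this sketch.
    """
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


pages = split_into_pages([f"LINE {n}" for n in range(450)])
print([len(page) for page in pages])  # [200, 200, 50]
```

A destination system without a record size limit would simply skip this step, as noted above.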

[0040] In step 212, the generated text is sent to SABRE® and a text version of each page is created as STAR records on the SABRE® system. The details of the sending process are shown in FIGS. 14 to 17. This step is described as it particularly relates to the SABRE® system, although the document conversion process could be readily applied to different types of destination systems and applications. Finally, the process completes at step 214.

[0041] FIG. 3 shows a flow chart of the process for breaking the source document into sections, as shown in step 206 of FIG. 2. Style sheets are preferably used to transform the HTML elements in the source document throughout the process, and a hybrid flowchart format is used in the drawings for documenting the flow of the process, although the transformation software does not perform the steps strictly according to the flow shown in the drawings.

[0042] When the transformation software encounters an HTML element in the source document, it creates a STAR Document element and multiple STAR elements in the converted document. These elements are created as XML tags in the document. The STAR elements which may be created during the page splitting process include a Title element containing title information for the STAR Document and one or more Name and Section elements. A Name element may contain additional header information for the document, and a Section element contains the content of the document. Each section has a section number and includes a Sub_title element containing title information for the section and a Body element containing the content for the section. The Body element may further comprise Body_info and Icon elements. These elements are created to divide the source document into sections corresponding to the way the content of the source document is divided for display by a web browser. These sections assist in preserving the look and feel of the source document once it has undergone the conversion process.

[0043] The transformation software parses the source document and processes the HTML elements (or tags) in the document to determine how the STAR Document should be broken into STAR elements representing the content of the original HTML document. The source HTML document usually includes header and/or body elements as child nodes of the HTML element, the head tag including title and header information and the body tag including the document information to display. The head and body elements will each contain other HTML tags which will be processed in turn by the transformation software.

[0044] Referring now to FIG. 3, when the transformation software encounters an <html> tag it generates a STAR Document element, as shown in steps 302 and 304. An <html> tag should be the first in the entire document and defines an HTML document. The <head> tag defines a header section that includes text that describes the HTML document, and may include a <title> tag and a <meta> tag. The title tag defines the title of an HTML document which may be displayed in a browser's title bar and bookmark lists, and the meta tag declares HTTP meta name/value pairs which may be used to extend the HTTP header information returned by the HTTP server. When the transformation software encounters a title tag, it creates a Title element, as shown in steps 310 and 312. When the transformation software encounters a meta tag, if the name attribute of the tag includes the word "STAR" then the transformation software creates a STAR Name element which includes the meta tag value attribute, as shown in steps 314, 316, and 318. Other types of header tags are ignored.

[0045] The <body> tag introduces the body of the HTML document and usually appears after the head section and occupies the remainder of the document. Processing of the child tags of the body tag is shown in steps 324 to 352.

[0046] The <p> tag defines a new paragraph, and the <font> tag defines text with a smaller or larger font than usual. The <h1> tag defines a level 1 heading, which is typically shown in a large bold font with several blank lines around it, and the <h2> tag defines a level 2 heading. When the transformation software encounters a level 1 heading tag, a Sub_title element is created in the current STAR element and the section number for the STAR element is incremented, as shown in steps 328 to 332. When the transformation software encounters a level 2 heading tag, a Body element is created in the current STAR element, as shown in steps 334 and 342.

[0047] The <ul> or unordered list tag introduces a bulleted list (i.e. an unnumbered list), and the <div> or division tag is used to divide a document up into different sections. When the transformation software encounters an unordered list tag or a division tag, a Body element is also created, as shown in steps 336 to 342.

[0048] The <img> or inline image tag indicates an image to be displayed. The source attribute indicates the URL for the image to be displayed. If the source attribute includes the name "note.gif" then the transformation software creates a further subsection within the Body element comprising Info and Icon elements, as shown in steps 344 and 346. For example, the Icon element may consist of the phrase "Please note:" and the Info element may consist of the text for the note. If the source attribute of the image tag does not include the name "note.gif" then the transformation software creates a Body_info subsection within the Body element, as shown in step 348.

[0049] The <pre> or preformatted text tag defines text to be shown in a fixed width font with specified line breaks and other white space. When the transformation software encounters a preformatted text tag, the preformatted text is added to the current Section element, as shown in steps 350 and 352.
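Part of the sectioning logic of FIG. 3 can be sketched with an event-driven parser. The embodiment performs this step with style sheets and the Apache XML tools; the Python sketch below substitutes the standard library `html.parser` and `xml.etree.ElementTree` modules purely for illustration. The class name `SectionBuilder`, the XML element spelling `STAR_Document`, the choice to open a new Section for each level 1 heading, and reading the meta tag's `content` attribute (falling back to `value`) are all assumptions; only the title, meta, and level 1 heading rules are modeled.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class SectionBuilder(HTMLParser):
    """Builds a small XML tree from title, meta, and h1 tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.root = None          # created when the <html> tag is seen
        self.section_number = 0   # incremented for each level 1 heading
        self._target = None       # element currently receiving text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.root = ET.Element("STAR_Document")
        elif self.root is None:
            return
        elif tag == "title":
            self._target = ET.SubElement(self.root, "Title")
        elif tag == "meta" and "STAR" in (attrs.get("name") or ""):
            name = ET.SubElement(self.root, "Name")
            name.text = attrs.get("content") or attrs.get("value") or ""
        elif tag == "h1":
            self.section_number += 1
            section = ET.SubElement(
                self.root, "Section", number=str(self.section_number)
            )
            self._target = ET.SubElement(section, "Sub_title")

    def handle_endtag(self, tag):
        if tag in ("title", "h1"):
            self._target = None

    def handle_data(self, data):
        if self._target is not None:
            self._target.text = (self._target.text or "") + data


def build_sections(html):
    """Parse an HTML string and return the root of the sectioned XML tree."""
    builder = SectionBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

Meta tags whose name does not include "STAR" are ignored, matching the rule that other header tags are skipped.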

[0050] FIG. 4 shows the process to create additional elements within the document and reformat the HTML tags into text in a format which mimics the presentation in the original HTML document. The process shown in FIG. 4 is specific in many respects to the SABRE® system, although the techniques used could readily be applied to other systems with different requirements. The process shown in FIG. 4 (shown as step 208 of FIG. 2) operates on the output of the process shown in FIG. 3. Initially, the transformation software detects the STAR Document element and creates a new STAR element as a parent element for the additional child elements that will be created, as shown in steps 402 and 404. The transformation software then continues to parse the document being converted to detect specific types of elements embedded in the HTML tags, which indicate specific treatment for various parts of the HTML document when converted for the SABRE® system. One method of embedding these elements into the source HTML document is to insert them as attributes of HTML tags. For example, attributes of the font tags in the original document may be used to indicate SABRE® specific elements such as the Always_Move_Lines, Optional_Move_Lines, Restricted_Lines, Internal_Content, and Overview elements. The transformation software examines the font tags within the source document to detect the presence of these attributes and create explicit XML elements (ALINES, OLINES, RLINES etc.) which will be used in later processes to indicate to the SABRE® system how to treat the text types associated with the elements.

[0051] When the transformation software encounters a Title element (created during the process shown in FIG. 3), an SLINES element is created, as shown in steps 406 and 408. The Count Number of Lines process is then called, as shown in step 410, to determine the number of lines of text to which the SLINES element applies. An SLINES element indicates that the following lines should be presented as subject lines used as a title in the converted document.

[0052] When the transformation software encounters an Always_Move_Lines element, an ALINES element is created, and the Count Number of Lines process is called to determine the number of lines of text to which the ALINES element applies, as shown in steps 412 to 416. The ALINES element is a child of a STAR element and indicates to other applications accessing the SABRE® application to extract the following lines of text for display in those other applications.

[0053] When the transformation software encounters an Optional_Move_Lines element, an OLINES element is created and the Count Number of Lines process is called to determine the number of lines of text to which the OLINES element applies, as shown in steps 418 to 422. The OLINES element is a child of a STAR element and indicates to other applications accessing the SABRE® application that the following lines of text may be extracted for display in those other applications.

[0054] When the transformation software encounters a Restricted_Lines element, a PLINES element and an RLINES element are created, as shown in steps 424 to 428. A PLINES element indicates priority lines which will be displayed near the top of the converted document and an RLINES element indicates to other applications accessing the SABRE® application that the following lines of text are restricted and may not be extracted. The Process HTML Tags process is then called, as shown in step 430. The Process HTML Tags process, which is shown in FIGS. 5A and 5B, processes the HTML tags following the Restricted_Lines element to determine how the text of the HTML document should be displayed in the converted document. The Count Number of Lines process is then called to determine the number of lines of text to which the RLINES element applies, as shown in step 432.

[0055] When the transformation software encounters an Overview element or an Internal_Content element, a PLINES element and an NLINES element are created, as shown in steps 434 to 438. A PLINES element indicates priority lines which will be displayed near the top of the converted document to provide an overview section with an index of subtitles relating to each section of the document. An NLINES element indicates that the following lines constitute the content of the converted document. The Process HTML Tags process and the Count Number of Lines process are both called to determine how the text of the HTML document should be displayed and to determine the number of lines of text to which the NLINES element applies, as shown in steps 440 and 442.

[0056] The resulting SLINES, ALINES, OLINES, RLINES, PLINES, and NLINES elements created during the process shown in FIG. 4 are used when the converted document is transmitted to the SABRE® system (as shown in FIG. 14) to indicate to the SABRE® software how to treat the various portions of the converted document.

[0057] FIGS. 5A and 5B show a flowchart for the Process HTML Tags process. The transformation software parses the HTML tags within the document being converted and processes each tag to determine the presentation of the content of the original HTML document in the converted document. Plain text is used to mimic the look and feel of the HTML elements in the source document. The HTML tags are deleted during the process and replaced in the converted document by the plain text versions. Other types of HTML tags which are not converted by the transformation software are simply deleted from the converted document.

[0058] The <img> or image tag displays an image referred to by a URL. When the transformation software encounters an image tag, text denoting the presence of an image in the original HTML document is added into the converted document. For example, the text "[AAWR IMAGE]" may be added, as shown in steps 504 and 506.

[0059] The <a> or anchor tag defines either an anchor or a hyperlink in a document. The anchor tag contains either a Name attribute or an HREF attribute, or both. When the transformation software encounters an anchor tag, if the tag defines an external hyperlink which links to another document, the transformation software adds the name of the destination of the link into the converted document, as shown in steps 508 to 512. If the anchor tag is an internal link (i.e. a hyperlink to a location within the HTML document), the transformation software gets the STAR Name element of the STAR Document corresponding to the destination of the link and adds text to the converted document indicating the STAR Name in order to specify where in the converted document the link points. For example, the text "N*<STAR Name>" may be inserted, as shown in step 514.

[0060] The <div> or division tag is used to divide a document up into different sections, and the <p> or paragraph tag defines a new paragraph. When the transformation software encounters either a division tag or a paragraph tag, the transformation software processes any HTML tags within the division or paragraph of the document and then performs the Handle Text process (shown in FIG. 6), as shown in steps 516 to 520.

[0061] The <b> or bold tag defines text to be shown in a bold typeface. Bold text is shown enclosed in asterisk symbols in the converted document. When the transformation software encounters a bold tag, an asterisk symbol is added to the converted document, the transformation software processes any HTML tags within the bold element, and another asterisk symbol is added to the converted document, as shown in steps 522 to 528.

[0062] The <font> tag defines text with a smaller or larger font than usual, and the <br> or break tag breaks the current line of text. When the transformation software encounters either a font tag or a break tag, the transformation software processes any HTML tags within the font or break element and a new line is added to the converted document, as shown in steps 530 to 534.
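The inline replacements described above (image marker, link destination, asterisks around bold text, and a new line for a break) can be sketched as follows. This is a minimal illustration in Python using the standard `html.parser` module rather than the XSL transformation of the embodiment. The class name `InlineToText`, the `star_names` lookup from anchor name to STAR Name, and the choice to emit the link destination before the link text are assumptions; the font tag and the Handle Text wrapping step are not modeled.

```python
from html.parser import HTMLParser


class InlineToText(HTMLParser):
    """Replaces a few inline HTML tags with plain text markers (illustrative only)."""

    def __init__(self, star_names=None):
        super().__init__()
        self.star_names = star_names or {}  # hypothetical anchor -> STAR Name map
        self.parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.parts.append("[AAWR IMAGE]")      # marker text from the description
        elif tag == "b":
            self.parts.append("*")                 # opening asterisk for bold text
        elif tag == "br":
            self.parts.append("\n")                # break tag becomes a new line
        elif tag == "a":
            href = attrs.get("href") or ""
            if href.startswith("#"):
                star = self.star_names.get(href[1:])
                if star:
                    self.parts.append(f"N*{star} ")  # internal link
            elif href:
                self.parts.append(f"{href} ")        # external link destination

    def handle_endtag(self, tag):
        if tag == "b":
            self.parts.append("*")                 # closing asterisk for bold text

    def handle_data(self, data):
        self.parts.append(data)


def convert_inline(html, star_names=None):
    """Return the plain text form of an HTML fragment."""
    parser = InlineToText(star_names)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(convert_inline("Carry <b>one</b> bag<br>only"))
```

Tags that the sketch does not recognize contribute no output of their own, which mirrors the rule that unconverted tags are simply deleted.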

[0063] The <li> or list item tag defines one entry in an ordered, unordered, menu, or directory list. Other tags may be embedded in a list item, such as the <ul> tag which introduces an unordered (bulleted) list and the <ol> tag which introduces an ordered (numbered) list. When the transformation software encounters an unordered list tag, the transformation software adds the text "(0)" to indicate a bullet point and then processes any HTML tags within the unordered list element, as shown in steps 536 to 542. When the transformation software encounters an ordered list tag, the transformation software adds a number indicated by the value of variable n to produce a numbered list element, as shown in steps 544 and 546. The transformation software then processes any HTML tags within the ordered list element and increments the variable n, as shown in steps 548 and 550.

[0064] A <table> tag consists of an optional caption and one or more <tr> or table row tags. A table row tag defines a row of cells in the table that are defined with <th> or table header tags and <td> or table data tags. The header tag defines a header cell and the table data tag defines a table cell. When the transformation software encounters a table tag it calls the Reset Table process to calculate the width of the table (shown in FIG. 7) and then processes all of the table row, table header, and table data tags within the table element, as shown in steps 552 to 556. The Print Table process (shown in FIG. 11) is then called to pad the contents of each cell in the table with blank spaces and blank lines to make each cell as wide as the widest cell in each column and as tall as the tallest cell in each row. If the table tag defined the table as having a border, the Print Table process is called with a border character "| " printed at the edges of the table, and if the table is defined without a border the Print Table process is called with blank spaces printed at the edges, as shown in steps 558 to 562.
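The padding performed by the Print Table process can be illustrated with a short Python sketch. It assumes the table has already been reduced to rows of cell strings, with "\n" separating lines inside a cell, and that at least one row is present. The function name `render_table`, the single space between columns, and the use of a bare "|" at the left and right edges are choices made for this sketch and are not stated in the description.

```python
def render_table(rows, border=True):
    """Pad every cell to its column width and row height, then join the rows.

    Hypothetical helper; `rows` is a non-empty list of lists of strings.
    """
    n_cols = max(len(row) for row in rows)
    # Split each cell into its lines, filling short rows with empty cells.
    cells = [
        [(row[c] if c < len(row) else "").split("\n") for c in range(n_cols)]
        for row in rows
    ]
    # Each column is as wide as the widest line of any cell in that column.
    widths = [
        max(len(line) for row in cells for line in row[c]) for c in range(n_cols)
    ]
    edge = "|" if border else " "
    out = []
    for row in cells:
        height = max(len(cell) for cell in row)  # tallest cell sets the row height
        for i in range(height):
            padded = [
                (cell[i] if i < len(cell) else "").ljust(widths[c])
                for c, cell in enumerate(row)
            ]
            out.append(edge + " ".join(padded) + edge)
    return "\n".join(out)


print(render_table([["CLASS", "BAGS"], ["FIRST", "3"], ["COACH", "2\nMAX"]]))
```

Because every output line has the same length, the table keeps its column alignment when shown in a fixed width display.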

[0065] When the transformation software encounters a table row tag, the Reset Row process is called to calculate the height of the row in the converted document (shown in FIG. 8) and then all of the table header and table data tags within the table row element are processed, as shown in steps 564 to 568. The Handle Col process is then called to convert the contents of each cell in the table into a box containing text and concatenate the boxes for each row (shown in FIG. 10), as shown in step 570.

[0066] When the transformation software encounters a table header tag or table data tag, the Handle Header process is called to determine the width of the widest cell within a column of the table in the converted document (shown in FIG. 9), as shown in steps 572 and 574. This will enable the column to be made wide enough to accommodate the widest cell in the column. Any HTML tags within the table header or table data elements are then processed as shown in step 576.

[0067] FIG. 6 shows a flowchart for the Handle Text process which is called by the Process HTML Tags process. Text within an HTML document is usually wrapped by the browser and the text does not include line breaks. To convert an HTML document to a format suitable for the SABRE® application or other non-HTML application which requires text to be broken into separate lines, each block of text within a division or paragraph element is broken into lines, as shown in steps 602 to 606. The SABRE® application allows a maximum of 55 characters per line. The process of breaking the text into lines preferably does not insert a line break in the middle of a word.
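The line breaking described for the Handle Text process can be sketched with Python's standard `textwrap` module. The function name `handle_text` is hypothetical, and collapsing runs of whitespace before wrapping is an assumption that mirrors how a browser treats paragraph text. With `break_long_words=False`, a single word longer than the limit is left intact on its own line rather than split, so such a line would exceed the limit.

```python
import textwrap


def handle_text(block, max_width=55):
    """Break a block of text into lines of at most max_width characters.

    Line breaks are placed only between words (illustrative sketch).
    """
    collapsed = " ".join(block.split())  # collapse whitespace like a browser
    return textwrap.wrap(
        collapsed,
        width=max_width,
        break_long_words=False,
        break_on_hyphens=False,
    )


text = ("Passengers may check two bags free of charge on "
        "international flights operated by the airline.")
for line in handle_text(text):
    print(line)
```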

[0068] FIG. 7 shows a flowchart for the Reset Table process which is called by the Process HTML Tags process to calculate the width of a table in the converted document. A TableInfo object is created to store table data, such as the number of rows and columns, the table width, the width of each column, and the data in each cell of the table, as shown in step 704. The width of a table in the HTML document may be defined in the table tag as a percentage of the page width. If the width is defined, the corresponding width of the table is calculated for the converted document as a function of the width of the converted page, as shown in steps 706 and 708. If the width is not defined, a default width of 100 percent is used, as shown in step 710. The calculated width is stored in the TableInfo object which is pushed onto the stack for storage, as shown in step 712.
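A minimal Python sketch of this width calculation follows. The field names of TableInfo, the function names, and the use of a 55-character page width are assumptions for illustration; the description specifies only that a percentage width is scaled to the converted page and that 100 percent is the default.

```python
from dataclasses import dataclass, field

PAGE_WIDTH = 55  # assumed width of the converted page, in characters


@dataclass
class TableInfo:
    # Hypothetical fields modelled on the data the description lists.
    width: int = 0
    column_widths: list = field(default_factory=list)
    rows: list = field(default_factory=list)


def table_width(width_attr, page_width=PAGE_WIDTH):
    """Scale an HTML width such as '50%' to characters on the converted page."""
    if width_attr and width_attr.strip().endswith("%"):
        percent = float(width_attr.strip().rstrip("%"))
    else:
        percent = 100.0  # default when no percentage width is defined
    return int(page_width * percent / 100)


def reset_table(width_attr, stack, page_width=PAGE_WIDTH):
    """Create a TableInfo, store the calculated width, and push it onto the stack."""
    info = TableInfo(width=table_width(width_attr, page_width))
    stack.append(info)
    return info
```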

[0069] FIG. 8 shows a flowchart for the Reset Row process which is called by the Process HTML Tags process to calculate the height of a table row in the converted document. The TableInfo object is popped from the top of the stack, as shown in step 804. The height of a row of a table in the HTML document may be defined in the table row tag. If the height is defined, the corresponding height of the row is calculated for the converted document, as shown in steps 806 and 808. If the height is not defined, a default height is used, as shown in step 810. The calculated height is stored in the TableInfo object which is pushed onto the stack for storage, as shown in step 812.

[0070] FIG. 9 shows a flowchart for the Handle Header process which is called by the Process HTML Tags process to determine the width of the widest cell within a column of a table in the converted document. The TableInfo object is popped from the stack, as shown in step 904. The width of each cell within a column of a table in the HTML document may be defined in a table header or table data tag. If the width is defined, the corresponding width of the column is calculated for the converted document, as shown in steps 906 and 908. If the calculated width is greater than the currently stored maximum column width, then the maximum column width is set to the newly calculated width, as shown in steps 910 and 912. As each cell within a column is processed, this algorithm will set the maximum column width to be equal to the width of the widest cell within the column. The maximum column width is stored in the TableInfo object which is pushed onto the stack for storage, as shown in step 914.
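The pop, update, and push pattern of the Handle Header process in paragraph [0070] might look like the following Python sketch. The dictionary layout and the function signature are assumptions; the cell width is taken as already calculated for the converted document.

```python
def handle_header(stack, column, cell_width):
    """Track the widest cell seen so far in each column.

    `stack` holds a hypothetical table-info dict with a 'column_widths' list.
    """
    info = stack.pop()                  # pop the table info from the stack
    widths = info["column_widths"]
    while len(widths) <= column:        # grow the list for a newly seen column
        widths.append(0)
    if cell_width > widths[column]:     # keep the running maximum
        widths[column] = cell_width
    stack.append(info)                  # push the updated info back
```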

[0071] FIG. 10 shows a flowchart for the Handle Col process which is called by the Process HTML Tags process to convert the contents of each cell in a table into a box containing text and concatenate the boxes for each row. The TableInfo object is popped from the top of the stack, as shown in step 1004. The contents of each table header and table data element (which may simply contain text or may contain HTML elements) are converted into a block of text in a box and the box is added to the boxes for each column in the row, as shown in steps 1006 and 1008. The data is stored in the TableInfo object which is pushed onto the stack for storage, as shown in step 1010.

[0072] FIG. 11 shows a flowchart for the Print Table process which is called by the Process HTML Tags process to pad the contents of each cell in the table with blank spaces and blank lines to make each cell as wide as the widest cell in each column and as tall as the tallest cell in each row. The TableInfo object is popped from the stack, as shown in step 1104. For each column in a given row, the maximum column width (the width of the widest cell in the column) is read from the TableInfo object and the data in each cell in the column is aligned in the cell by padding the cell with the appropriate number of blank spaces. This process is repeated for each column in the row, as shown in steps 1106 to 1114. The maximum row height (the height of the tallest cell in the row) is read from the TableInfo object and the data in each cell in the row is aligned in the cell by adding an appropriate number of blank lines to the cell. This process is repeated for each row in the table, as shown in steps 1116 to 1120. All of the cells in the TableInfo object are then printed to the converted document, as shown in step 1122.
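The padding performed by the Print Table process can be sketched in Python as follows. The data layout (each cell as a list of text lines), the single-space column separator, and the function name are assumptions for illustration; here the column widths are recomputed from the cell text rather than read from a stored object.

```python
def print_table(rows, border=True):
    """Pad cells to a common column width and row height.

    `rows` is a list of rows; each row is a list of cells; each cell is a
    list of text lines (a hypothetical stand-in for the boxes of text).
    """
    n_cols = max((len(row) for row in rows), default=0)
    col_widths = [0] * n_cols
    for row in rows:
        for c, cell in enumerate(row):
            for line in cell:
                col_widths[c] = max(col_widths[c], len(line))

    edge = "|" if border else " "   # border character or blank at the edges
    out = []
    for row in rows:
        height = max((len(cell) for cell in row), default=1)
        for i in range(height):
            parts = []
            for c in range(n_cols):
                cell = row[c] if c < len(row) else []
                text = cell[i] if i < len(cell) else ""   # blank line padding
                parts.append(text.ljust(col_widths[c]))   # blank space padding
            out.append(edge + " ".join(parts) + edge)
    return out
```

Every output line has the same length, so the columns line up when the text is displayed in a fixed-width font.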

[0073] FIG. 12 shows a flowchart for the page split step of the synchronization process of FIG. 2. This process operates on each STAR element created during the previous processes, and divides the content of each element among multiple elements to accommodate the page size limitations of the SABRE® application of 200 lines per page. Initially, a line count variable is reset and a STARS element is created, as shown in steps 1204 and 1206, as a dummy parent element for the multiple STAR elements created during the process of FIG. 12. The original HTML document being converted was divided into sections during previous processes, and the process of FIG. 12 ensures that the division into multiple pages in the SABRE® system does not result in a section being split between two pages. For each of these sections, the number of lines is counted and added to the line count, as shown in steps 1208 and 1210. If the resulting line count is greater than the maximum number of lines per page permitted by the SABRE® application, then the Create New STAR process is called, as shown in steps 1212 and 1214. The contents of the section are then copied into the existing STAR element if the line count did not exceed the SABRE® limit, or into the newly created STAR element if the limit was exceeded, as shown in step 1216. The process shown in steps 1208 to 1220 continues until all of the sections within each STAR element are processed.
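The page split logic can be illustrated with a short Python sketch. It is a simplification under stated assumptions: each page is a plain list standing in for a STAR element, the line count is restarted at the size of the section that opens a new page, and a first page is never left empty. None of these details is spelled out in the description.

```python
MAX_LINES = 200  # page size limit stated in the description


def split_pages(sections, max_lines=MAX_LINES):
    """Distribute sections over pages without splitting any section.

    `sections` is a list of sections, each a list of text lines.
    """
    pages = [[]]        # each inner list stands in for one STAR element
    line_count = 0
    for section in sections:
        line_count += len(section)
        if line_count > max_lines and pages[-1]:
            pages.append([])              # corresponds to Create New STAR
            line_count = len(section)     # assumed: count restarts for the new page
        pages[-1].append(section)         # copy the section into the current page
    return pages
```

A single section longer than the limit still ends up on one page in this sketch, because sections are never split.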

[0074] FIG. 13 shows a flowchart for the Create New STAR process. A new STAR element is created in step 1304. The software examines the number of line numbers in the converted document, and if the new STAR element is not the last STAR element in the converted document, then a NEXT STAR element is added to the STAR element with the variable nextSTAR indicating the line number of the next STAR element, as shown in steps 1306 and 1308. If the new STAR element is not the first STAR element in the converted document, then a PREV STAR element is added to the STAR element with the variable prevSTAR indicating the line number of the previous STAR element, as shown in steps 1310 and 1312. The process then ends and returns to the calling process, as shown in step 1314.

[0075] FIG. 14 shows a flowchart for the Send to SABRE® step of the synchronization process of FIG. 2. Each STARS element is parsed and when each child STAR element is encountered the Create STAR process is called to create a new STAR element in the SABRE® system, as shown in steps 1402 to 1406. Each of the child elements of each STAR element is then processed to transfer the content of each of the child elements to the SABRE® system with the appropriate parameters to enable the SABRE® application to display the content in the correct manner.
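The NEXT STAR and PREV STAR linking of paragraph [0074] can be sketched in Python. As a simplification, the sketch links a complete list of page identifiers in one pass, whereas the described process creates one STAR element per call; the dictionary representation and the function name are assumptions.

```python
def link_stars(identifiers):
    """Attach next/previous references to a sequence of STAR pages.

    `identifiers` is a hypothetical list of values locating each STAR element.
    """
    linked = []
    last = len(identifiers) - 1
    for i, ident in enumerate(identifiers):
        star = {"id": ident}
        if i < last:                       # not the last STAR: add NEXT STAR
            star["nextSTAR"] = identifiers[i + 1]
        if i > 0:                          # not the first STAR: add PREV STAR
            star["prevSTAR"] = identifiers[i - 1]
        linked.append(star)
    return linked
```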

[0076] When an SLINES element is encountered, the Send Result process is called, as shown in steps 1508 and 1510, with line type parameter equal to "S" indicating the content of the SLINES element should be treated as subject lines to be used as a title. When a PREV STAR element is encountered, the Send Text process is called, as shown in steps 1512 and 1514, with the text "Continued from " and the variable prevSTAR to indicate on the page displayed on the SABRE® system the identity of the previous page from which the current page continues. When a PLINES element is encountered, the Send Text process is called, as shown in steps 1516 and 1518, with line type parameter indexLines indicating the content of the PLINES element should be treated as index lines to be placed near the top of the page and used to create an index of the contents of the page displayed on the SABRE® system. When a NEXT STAR element is encountered, the Send Text process is called, as shown in steps 1520 and 1522, with the text "Continued in " and the variable nextSTAR to indicate on the page displayed on the SABRE® system the identity of the next page which is a continuation of the current page.

[0077] When an ALINES element is encountered, the Send Result process is called, as shown in steps 1524 and 1526, with line type parameter equal to "A" indicating the content of the ALINES element should be treated as "always move lines" which are to be extracted by other applications accessing the SABRE® system. When an OLINES element is encountered, the Send Result process is called, as shown in steps 1528 and 1530, with line type parameter equal to "O" indicating the content of the OLINES element should be treated as "optional move lines" which may optionally be extracted by other applications accessing the SABRE® system. When an RLINES element is encountered, the Send Result process is called, as shown in steps 1532 and 1534, with line type parameter equal to "R" indicating the content of the RLINES element should be treated as "restricted lines" which may not be extracted by other applications accessing the SABRE® system. When an NLINES element is encountered, the Send Result process is called, as shown in steps 1536 and 1538, with line type parameter equal to "N" indicating the content of the NLINES element should be treated as generic normal lines within the document.

[0078] FIG. 15 shows a flowchart for the Create STAR process. Initially a connection is established with the system running the SABRE® application and the transformation software signs on to the SABRE® application, as shown in steps 1502 and 1504. If a STAR record already exists in the SABRE® system with the same name as the STAR to be sent to the SABRE® system, then the existing STAR record is renamed and purged (deleted), as shown in steps 1508 and 1510. The new STAR record is then created in the SABRE® system as shown in step 1512. The process then ends and returns to the calling process, as shown in step 1514.

[0079] FIG. 16 shows a flowchart for the Send Result process. For each line of the converted document, the Send Text process is called with the line type specified, as shown in steps 1604 to 1608. The process then ends and returns to the calling process, as shown in step 1610.

[0080] FIG. 17 shows a flowchart for the Send Text process. Each time the process is called, the line type is sent to the SABRE® system and the line is inserted into the current STAR record, as shown in step 1704. If an error is detected, a SABRE® Exception is generated to announce the error, as shown in steps 1706 and 1708. The process then ends and returns to the calling process, as shown in step 1710.

[0081] FIGS. 18 and 19 show an example of the conversion of an HTML document into a document for display on a mainframe computer running the SABRE® application. FIG. 18 shows a portion of an HTML document with a title, subtitles, hyperlinks, a table, bold fonts, and a bulleted list. FIG. 19 shows the converted document. The page title is displayed as the SLINES element "OS UNITED STATES BAGGAGE INFORMATION." The PLINES element is displayed next beginning with the line "IP OVERVIEW" and including the indexLines parameter "N**006-011." The PLINES provide an index to each section of the page, with the indexLines parameter indicating the line number of the beginning of each section. The NLINES elements are then displayed beginning at line 6N and continuing for the rest of the page. The subtitles "OVERVIEW," "RELATED INFORMATION," and "EMBARGOES" are reproduced on lines 8N, 14N, and 21N, and the hyperlink "View domestic baggage acceptance/requirement information" in the HTML document has been converted to the reference on line 17N to ".N*DBAG".

[0082] The transformation software may be implemented as a JAVA application to generate a platform independent process, but other programming languages may also be used to implement the process.
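The element dispatch of paragraphs [0076] and [0077], together with the Send Result loop of paragraph [0079], can be summarized in a short sketch. Python is used here purely for illustration. The function names, the callback send_text standing in for the Send Text process, the tuple representation of child elements, and the line type used for the continuation lines are assumptions; the description does not state a line type for the "Continued from" and "Continued in" lines.

```python
# Line types for elements handled by Send Result, as given in the description.
LINE_TYPES = {"SLINES": "S", "ALINES": "A", "OLINES": "O",
              "RLINES": "R", "NLINES": "N"}

CONTINUATION_TYPE = "N"  # assumed; not specified in the description


def send_result(line_type, lines, send_text):
    """Send Result: call Send Text once per line with the given line type."""
    for line in lines:
        send_text(line_type, line)


def send_star(children, send_text, index_lines):
    """Dispatch each child element of one STAR element.

    `children` is a hypothetical list of (element name, content) pairs;
    `index_lines` is the indexLines parameter passed for PLINES content.
    """
    for name, content in children:
        if name in LINE_TYPES:
            send_result(LINE_TYPES[name], content, send_text)
        elif name == "PLINES":
            send_result(index_lines, content, send_text)
        elif name == "PREV STAR":
            send_text(CONTINUATION_TYPE, "Continued from " + content)
        elif name == "NEXT STAR":
            send_text(CONTINUATION_TYPE, "Continued in " + content)
```

In use, send_text would insert each line into the current STAR record; for experimentation it can simply append its arguments to a list.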

[0083] Thus, an embodiment of a system and process for converting HTML documents to a mainframe readable text format has been described. This transformation process eliminates the need for duplicate entry of data into multiple computer systems and may be used to efficiently synchronize data stored in HTML format to data stored in mainframe readable format.

[0084] The transformation process will typically produce better results if the source HTML documents adhere to a well-defined structure or template. For example, such a standard structure might include pages divided into sections with an index section appearing at the top of each page, minimal images, and a limited number of nested tables.

[0085] It should be noted that the embodiments described above are susceptible to various modifications and alternative forms. For example, the transformation process described could be used to convert any web based text content into mainframe readable format, such as weather data, news reports, stock market information, etc. Although the process has been described for conversion of HTML documents to a specific mainframe readable text format (i.e., the SABRE® STAR record format), the process could readily be modified using techniques well known to those of skill in the art to convert HTML documents into other formats, such as mainframe readable text files for applications other than SABRE® or text or other format files for use by applications running on a personal computer. The transformation process may also be modified to convert web pages in HTML format to pages that can be displayed on personal digital assistants (PDAs) like Palm Pilot etc., using the Wireless Markup Language format or other suitable data formats.

[0086] The transformation process as described uses XML tags to define various sections of the document being converted. However, any suitable alternative technique may be employed to define the document sections in a way that could indicate how the sections should be displayed in the converted document.

[0087] Many modifications in addition to those described above may be made to the structures and techniques described herein without departing from the spirit and scope of the invention. Accordingly, although specific embodiments have been described, these are examples only and are not limiting upon the scope of the invention.

Claims

1. A method for converting a source document comprising HTML tags and one or more blocks of plain text into one or more destination documents suitable for use with a destination system, comprising: dividing the source document into sections and inserting XML tags to indicate breaks between the sections; replacing one or more of the HTML tags in the source document with plain text providing a visual representation corresponding to formatting defined by the HTML tags; and breaking the blocks of plain text of the source document into smaller blocks of plain text if the blocks exceed a maximum size for use with the destination system.

2. The method of claim 1 further comprising tidying the source document before the dividing step by removing unwanted HTML tags and fixing syntax errors in HTML tags.

3. The method of claim 1 wherein dividing the source document into sections comprises creating a title element and one or more body elements, each body element having a subtitle element.

4. The method of claim 1 further comprising inserting XML tags into the destination document to indicate availability for access by a computer application to one or more sections of the destination document.

5. The method of claim 4 wherein one or more attributes of the HTML tags in the source document are used to indicate the availability for access by a computer application to one or more sections of the source document.

6. The method of claim 1 wherein the step of replacing HTML tags comprises: replacing an image with plain text representing the image; replacing a link which specifies a destination with plain text indicating the destination; replacing an unnumbered list with a plain text list having a symbol beginning each item in the plain text list; replacing a numbered list with a plain text list having a list item number beginning each item in the plain text list; and replacing a table with a plain text representation of a table organized in rows and columns.

7. The method of claim 4 wherein if a link specifies a destination which is within the source document, the plain text indicating the destination comprises a line number corresponding to the position of the destination within the one or more destination documents.

8. The method of claim 1 wherein dividing the blocks of plain text comprises dividing the blocks into lines, each line having less than a predetermined maximum number of characters per line.

9. The method of claim 1 wherein breaking the blocks of plain text of the source document into smaller blocks of plain text comprises providing an indication for each of the smaller blocks of a previous block or a next block.

10. A computer-readable record carrier on which is stored a computer program for implementing the method of claim 1.

11. A method for synchronizing information in a source record formatted for use on the internet with information in a destination record for use on a mainframe computer, the source record comprising HTML tags and one or more blocks of plain text, the method comprising: accessing the source record; transforming the source record by: tidying the source record by removing unwanted HTML tags and fixing syntax errors in HTML tags; dividing the tidied source record into sections by detecting first types of the HTML tags which indicate breaks between sections of the source record and inserting XML tags to indicate section breaks; replacing second types of the HTML tags which indicate formatting of the content of the source record with plain text providing a visual representation of the formatting defined by the second types of HTML tags; and dividing the blocks of plain text of the tidied source record into smaller blocks of plain text if the blocks exceed a maximum size for use in the destination record; deleting a destination record having a same name as the converted source record; and transmitting the converted source record to the mainframe computer.

12. The method of claim 11 wherein dividing the source record into sections comprises creating a title element and one or more body elements, each body element having a subtitle element.

13. The method of claim 11 further comprising inserting XML tags into the source record to indicate availability for access by a computer application to one or more sections of the destination document.

14. The method of claim 11 wherein the step of replacing second types of the HTML tags comprises: replacing an image with plain text representing the image; replacing a link which specifies a destination with plain text indicating the destination; replacing an unnumbered list with a plain text list having a symbol beginning each item in the plain text list; replacing a numbered list with a plain text list having a list item number beginning each item in the plain text list; and replacing a table with a plain text representation of a table organized in rows and columns.

15. A computer-readable record carrier on which is stored a computer program for implementing the method of claim 11.

16. A system for converting a source document comprising HTML tags and one or more blocks of plain text into one or more destination documents suitable for use with a destination system, comprising: a server for storing the source document; a destination computer for using information in a destination format; and a transformation application for converting the source document, the transformation application tidying the source document to generate an XML compliant HTML document, dividing the tidied source document into sections and inserting XML tags to indicate breaks between the sections, replacing one or more of the HTML tags in the source document with plain text providing a visual representation corresponding to formatting defined by the HTML tags, and dividing the blocks of plain text of the source document into smaller blocks of plain text if the blocks exceed a maximum size for use with the destination system.