US20080028375A1 - Validator-driven architecture of an xml parsing and validating solution - Google Patents

Validator-driven architecture of an xml parsing and validating solution

Info

Publication number
US20080028375A1
Authority
US
United States
Prior art keywords
xml
validation engine
parsing
parser
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/460,050
Inventor
Moshe E. Matsa
Eric Perkins
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US11/460,050
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignment of assignors interest (see document for details). Assignors: MATSA, MOSHE E.; PERKINS, ERIC
Publication of US20080028375A1
Priority to US12/130,285 (patent US8935605B2)
Legal status: Abandoned

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/226: Validation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

A method for parsing a document in an Extensible Markup Language (XML) format includes identifying data via the XML format, defining a tag set including a plurality of tags, defining a tokenizer that produces one token at a time, parsing the XML document via a parser, validating the XML document via a validation engine, the validation engine driving the tokenizer, the validating being an integral part of the parsing, and permitting the validation engine to be written in a recursive-descent code-driven manner.

Description

    TRADEMARKS
  • IBM® is a registered trademark of International Business Machines Corporation, Armonk, N.Y., U.S.A. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • This invention relates to XML parsers, and particularly to a method that treats validation engines as an integral part of parsing by allowing the validation engines to be written in a recursive-descent code-driven manner.
  • 2. Description of Background
  • XML (Extensible Markup Language) has begun to work its way into the business computing infrastructure and underlying protocols such as the Simple Object Access Protocol (SOAP) and Web services. In the performance-critical setting of business computing, however, the flexibility of XML becomes a liability due to the potentially significant performance penalty. XML processing is conceptually a multitiered task, an attribute it inherits from the multiple layers of specifications that govern its use including: XML, XML namespaces, XML Information Set (Infoset), and XML Schema. Traditional XML processor implementations reflect these specification layers directly. Bytes, read off the “wire” or from disk, are converted to some known form. Attribute values and end-of-line sequences are normalized. Namespace declarations and prefixes are resolved, and the tokens are then transformed into some representation of the document Infoset. The Infoset is optionally checked against an XML Schema grammar (XML schema, schema) for validity and rendered to the user through some interface, such as Simple API for XML (SAX) or Document Object Model (DOM) (API stands for application programming interface).
  • With the widespread adoption of SOAP and Web services, XML-based processing, and parsing of XML documents in particular, is becoming a performance-critical aspect of business computing. In such scenarios, performance is invariably constrained by XML parsing and validation, which conventionally have the tokenizer drive the validation engine. In fact, most parsers tokenize the entire XML document into a DOM or a SAX event stream and then run the validation engine over the stream of tokens or the DOM. Technologies that treat validation as an integral part of parsing have not reached their full potential. Regardless of which manner of pushing the tokens is used, none of the current technologies allows the validation engine to be written in a recursive-descent code-driven manner. As a result, the validation engine requires large tables, which increase the memory footprint and reduce processing efficiency. This also makes the validation code slower and obscures the control flow of the whole parsing and validation process.
  • Thus, existing technologies do not fully treat validation as an integral part of parsing. Therefore, it is desired to integrate validation and parsing, and to enable the validation engine to be written in a recursive-descent code-driven manner.
  • SUMMARY OF THE INVENTION
  • The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a method for parsing a document, the document being in an Extensible Markup Language (XML) format, the method comprising: identifying data via the XML format; defining a tag set including a plurality of tags; defining a tokenizer that produces one token at a time; parsing the XML document via a parser; validating the XML document via a validation engine, the validation engine driving the tokenizer, and the validating being an integral part of the parsing; and permitting the validation engine to be written in a recursive-descent code-driven manner.
  • The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a system for parsing a document, the document being in an Extensible Markup Language (XML) format, the system comprising: a network; and a host system in communication with the network, the host system including XML software to implement a method comprising: identifying data via the XML format; defining a tag set including a plurality of tags; defining a tokenizer that produces one token at a time; parsing the XML document via a parser; validating the XML document via a validation engine, the validation engine driving the tokenizer, and the validating being an integral part of the parsing; and permitting the validation engine to be written in a recursive-descent code-driven manner.
  • The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a computer program product for parsing a document, the document being in an Extensible Markup Language (XML) format, the computer program product comprising: a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising: identifying data via the XML format; defining a tag set including a plurality of tags; defining a tokenizer that produces one token at a time; parsing the XML document via a parser; validating the XML document via a validation engine, the validation engine driving the tokenizer, and the validating being an integral part of the parsing; and permitting the validation engine to be written in a recursive-descent code-driven manner.
  • Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and the drawings.
  • TECHNICAL EFFECTS
  • As a result of the summarized invention, technically we have achieved a solution that integrates validation and parsing, thus resulting in a faster and more efficient validating parser, without large tables, and with a clear control flow through the entire parsing and validating processes.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The subject matter, which is regarded as the invention, is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
  • FIG. 1 illustrates one example of a diagram showing a validating engine communicating with a parser in order to receive a start tag;
  • FIG. 2 illustrates one example of a diagram showing the parser communicating with the validating engine in order to send the start tag;
  • FIG. 3 illustrates one example of a diagram showing the validation code calling a function GetNextTag( );
  • FIG. 4 illustrates one example of a diagram showing the parser transferring control back to the validating engine;
  • FIG. 5 illustrates one example of a method for parsing and validating a document in an XML (Extensible Markup Language) format; and
  • FIG. 6 illustrates one example of a communication between a validating engine and a parsing engine.
  • DETAILED DESCRIPTION OF THE INVENTION
  • One aspect of the exemplary embodiments is a method for integrating validation and parsing processes. Another aspect of the exemplary embodiments is a method for allowing a validation engine to be written in a recursive-descent code-driven manner.
  • A recursive-descent parser is a top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) in which each procedure usually implements one of the production rules of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes. Code-driven refers to a design style that is common in some handcrafted programs. In a program-generation system, the need for understanding and change occurs at the specification level, not the program level, which allows greater flexibility in the design of generated programs. Three styles of generated programs are known. The OO (Object Oriented) approach favors highly structured OO techniques. The code-driven approach favors straightforward code with embedded data. The table-driven approach puts data in a separate data section that is used by the code section. A typical program generator will use some combination of these three techniques. In the exemplary embodiments, generated code is preferred, produced from the DTD or other grammar information for the XML dialect. The generated XML parser may be code-driven or table-driven. In most current solutions, most of the parser code is static and unchanging, and only the tables are generated from the DTD. In other words, these current solutions are table-driven because that has been the only viable approach. The exemplary embodiments of the present invention allow for a code-driven approach.
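  • The difference between the table-driven and code-driven styles described above can be illustrated with a minimal Python sketch. It assumes a hypothetical two-rule grammar in which element a contains b followed by d, and element b contains c; the names CONTENT_MODEL, children_valid_table_driven, and children_valid_code_driven are illustrative and do not come from the specification.

```python
# Table-driven style: the grammar lives in a separate data table that
# generic, unchanging code consults at run time.
CONTENT_MODEL = {
    "a": ["b", "d"],  # hypothetical rule: a contains b, then d
    "b": ["c"],       # hypothetical rule: b contains c
}


def children_valid_table_driven(element, children):
    """Look the element up in the table and compare the child sequence."""
    return CONTENT_MODEL.get(element) == list(children)


# Code-driven style: the same grammar is embedded directly in
# straightforward code, one branch per rule.
def children_valid_code_driven(element, children):
    children = list(children)
    if element == "a":
        return children == ["b", "d"]
    if element == "b":
        return children == ["c"]
    return False  # element not declared in this hypothetical grammar
```

Both functions accept the same inputs and agree on every result; they differ only in where the grammar knowledge is kept, which is the distinction the paragraph above draws.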
  • Once a class of XML documents is defined, there is a need for a method of navigating through the XML documents. XML cursors are a way to navigate through an XML instance document. Once a user loads an XML document, the user may create a cursor to represent a specific place in the XML document. Because a user may utilize a cursor with or without a schema corresponding to the XML document, cursors are an ideal way to handle XML documents without the schema. With the XML cursor, the user may utilize a token model to move through the XML document in small increments, or in a manner similar to using a DOM-based model.
  • In the exemplary embodiments of the present application, the validator-driven architecture has a validation engine drive the tokenizer and the tokenizer produces one token at a time, as needed by the validation engine. This enables the validation engine to be written in a recursive-descent code-driven manner. This results in a faster validating parser, without large tables, and with a clear control flow through the whole parsing and validation process. This makes the validation code easier to write, test, maintain, and extend, as well as making the code shorter and faster.
  • Below is one example of an algorithm containing the validation code written in a recursive-descent code-driven manner. At any given point in the parse, the parsing engine maintains a pointer into the XML buffer, as well as other state, as appropriate. The validating engine maintains control of the parse and engages the parsing engine whenever it requires the next piece of information from the XML instance document, for example by calling a function GetNextTag( ). Consider the following DTD fragment:
  • <!ELEMENT a (b,d)>
    <!ELEMENT b (c)>
  • In this case, the validation code could be written in a recursive-descent code-driven manner, as indicated by this pseudo-code:
  • validate-top-level-tag {
     tag = GetNextTag( );
     if (tag == "a")
      validate-a( );
     else if (tag == "b")
      validate-b( );
     else
      error("illegal top-level tag");
    }
    validate-a {
     if (GetNextTag( ) == "b")
      validate-b( );
     else
      error("a should start with a b");
     if (GetNextTag( ) == "d")
      validate-d( );
     else
      error("a should continue with a d");
     if (GetNextTag( ) == "/a")
      return;
     else
      error("a should end with a /a");
    }
    validate-b {
     if (GetNextTag( ) == "c")
      validate-c( );
     else
      error("b should start with a c");
     if (GetNextTag( ) == "/b")
      return;
     else
      error("b should end with a /b");
    }
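  • The pseudo-code above can be made concrete as a minimal, runnable Python sketch. Everything beyond the routine names is an assumption made for illustration: the classes ParsingEngine, Validator, and ValidationError, the helper expect, the regular-expression tokenizer (which ignores attributes, text content, comments, and self-closing tags), and the treatment of c and d as empty elements, since the DTD fragment does not declare them.

```python
import re


class ValidationError(Exception):
    """Raised when the document does not match the expected structure."""


class ParsingEngine:
    """Pull tokenizer: produces one tag name per call, only when asked."""

    # Assumption: tags are bare names such as <a> or </a>, without attributes.
    _TAG = re.compile(r"<\s*(/?[A-Za-z_][\w.-]*)\s*>")

    def __init__(self, buffer):
        self.buffer = buffer
        self.pos = 0  # pointer into the input buffer

    def get_next_tag(self):
        match = self._TAG.search(self.buffer, self.pos)
        if match is None:
            return None  # no more tags in the buffer
        self.pos = match.end()  # move the pointer past the tag just read
        return match.group(1)


class Validator:
    """Validation engine that drives the parsing engine."""

    def __init__(self, engine):
        self.engine = engine

    def expect(self, wanted, message):
        # Pull exactly one tag and fail immediately if it is not the one wanted.
        if self.engine.get_next_tag() != wanted:
            raise ValidationError(message)

    def validate_top_level_tag(self):
        tag = self.engine.get_next_tag()
        if tag == "a":
            self.validate_a()
        elif tag == "b":
            self.validate_b()
        else:
            raise ValidationError("illegal top-level tag")

    def validate_a(self):  # <!ELEMENT a (b,d)>
        self.expect("b", "a should start with a b")
        self.validate_b()
        self.expect("d", "a should continue with a d")
        self.validate_d()
        self.expect("/a", "a should end with a /a")

    def validate_b(self):  # <!ELEMENT b (c)>
        self.expect("c", "b should start with a c")
        self.validate_c()
        self.expect("/b", "b should end with a /b")

    def validate_c(self):  # assumed empty: not declared in the fragment
        self.expect("/c", "c should end with a /c")

    def validate_d(self):  # assumed empty: not declared in the fragment
        self.expect("/d", "d should end with a /d")


def validate(document):
    """Return True if the document is valid; raise ValidationError otherwise."""
    Validator(ParsingEngine(document)).validate_top_level_tag()
    return True
```

Under these assumptions, validate("<a><b><c></c></b><d></d></a>") returns True, while validate("<a><d></d></a>") raises ValidationError with the message "a should start with a b". Raising an exception stands in for the error( ) routine, which the pseudo-code leaves unspecified.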
  • FIGS. 1-4 illustrate one example of a process diagram showing a validating engine communicating with the parser in order to receive one or more start tags.
  • FIG. 1 illustrates a validating engine 10 communicating with a parsing engine 12. The parsing engine 12 receives one or more tags from an input buffer 14. In FIG. 1, the process starts in a routine at the top and the validating engine 10 requests a tag (i.e., <a> tag 3) from the parsing engine 12.
  • In FIG. 2, the parsing engine 12 has updated its state, including moving the pointer ahead, beyond the <a> tag 3, to the next spot 5 in an input buffer 14.
  • In FIG. 3, the validating engine 10 receives the <a> tag 3 and the validation code proceeds by calling a “validate-a” routine, whose first action is to call the function GetNextTag( ) again.
  • In FIG. 4, the parsing engine 12 decides to return the <b> tag 5 it received from the input buffer 14. Finally, the parsing engine 12 transfers control back to the validating engine 10, deciding that when asked it will continue the parse where its state indicates that it left off, namely at the next spot 7.
  • Processing continues in this manner until the validating engine 10 completes a path through the entire XML document. The validation code is a straightforward implementation of this particular DTD fragment, and validation code of the same form could therefore be written in a generic manner to process any DTD and validate an XML instance document against it.
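  • One way to picture the generic case is the Python sketch below, which is an illustration under stated assumptions rather than the method of the specification. It interprets a content model at run time instead of generating one routine per element, so it keeps the recursive-descent, validator-driven control flow but holds the grammar as data. The names tags, validate_element, and validate_document are hypothetical, content models are assumed to be plain sequences of child elements, and elements missing from the model are assumed to be empty.

```python
import re


def tags(document):
    """Lazily yield tag names one at a time (bare tags only, no attributes)."""
    for match in re.finditer(r"<\s*(/?[A-Za-z_][\w.-]*)\s*>", document):
        yield match.group(1)


def validate_element(name, model, next_tag):
    """Recursively validate one element whose start tag was already consumed."""
    for child in model.get(name, []):  # undeclared elements: assumed empty
        tag = next_tag()
        if tag != child:
            raise ValueError(f"{name}: expected <{child}>, got {tag!r}")
        validate_element(child, model, next_tag)
    tag = next_tag()
    if tag != "/" + name:
        raise ValueError(f"{name}: expected </{name}>, got {tag!r}")


def validate_document(document, model, root):
    """Validate a whole document; the validator pulls tags as it needs them."""
    stream = tags(document)

    def next_tag():
        return next(stream, None)  # None signals end of input

    tag = next_tag()
    if tag != root:
        raise ValueError(f"expected root <{root}>, got {tag!r}")
    validate_element(root, model, next_tag)
    if next_tag() is not None:
        raise ValueError("unexpected content after the root element")
    return True
```

For the DTD fragment discussed above, the model would be {"a": ["b", "d"], "b": ["c"]} with root "a". A code generator that emits routines such as validate-a and validate-b from the same model would be the code-driven counterpart of this interpreter.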
  • Referring to FIG. 5, a method for parsing a document in an XML format is shown. The parsing process commences at step 50 when a user commences a document parsing operation. At step 52 the data is identified to determine whether it is XML format data. At step 54 a tag set is defined that includes a plurality of tags. At step 56 a tokenizer that produces one token at a time is defined. At step 58 the XML document is validated via a validation engine, the validation engine driving the tokenizer, and the validating being an integral part of the parsing. At step 60 the parsing process terminates.
  • Referring to FIG. 6, a communication between a validating engine and a parsing engine is shown. At step 70, the method sets up the input buffer and passes control to the VE (validating engine). At step 72, the VE calls GetNextTag( ) on the PE (parsing engine). At step 74, the PE reads the <a> tag. At step 76, the PE updates its state including updating its pointer into the data buffer to after the <a> tag. At step 78, the PE passes the <a> tag back to the VE, ending the GetNextTag( ) call. At step 80, the VE internally calls validate-a( ). At step 82, the VE calls GetNextTag( ) on the PE. At step 84, the PE reads its current state and reads the <b> tag from the buffer. At step 86, the PE updates its state including updating its pointer into the data buffer to after the <b> tag. At step 88, the PE passes the <b> tag back to the VE, ending the GetNextTag( ) call.
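  • The exchange in steps 70 through 88 can be traced with a small Python sketch that focuses on the parsing engine's state. The class name PointerParsingEngine, the attribute pointer, and the sample buffer are assumptions for illustration; the sketch handles only bare tags and stands in for GetNextTag( ) with a method named get_next_tag.

```python
class PointerParsingEngine:
    """Parsing engine (PE) that keeps a pointer into the input buffer."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.pointer = 0  # step 70: buffer set up, nothing consumed yet

    def get_next_tag(self):
        # Resume scanning from wherever the previous call left off.
        start = self.buffer.find("<", self.pointer)
        if start == -1:
            return None  # no further tags
        end = self.buffer.find(">", start)
        if end == -1:
            raise ValueError("unterminated tag")
        self.pointer = end + 1  # update state: pointer now sits after the tag
        return self.buffer[start + 1:end]


pe = PointerParsingEngine("<a><b><c></c></b><d></d></a>")

first = pe.get_next_tag()   # steps 72 to 78: the VE asks, the PE returns "a"
after_first = pe.pointer    # pointer is just past <a>

second = pe.get_next_tag()  # steps 82 to 88: the VE asks again, the PE returns "b"
after_second = pe.pointer   # pointer is just past <b>

print(first, after_first, second, after_second)  # prints "a 3 b 6"
```

Because the pointer persists between calls, the validating engine never has to tell the parsing engine where to resume; each call simply continues from the recorded position.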
  • The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof.
  • As one example, one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately.
  • Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.
  • The flow diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
  • While the preferred embodiment of the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.

Claims (15)

1. A method for parsing a document, the document being in an Extensible Markup Language (XML) format, the method comprising:
identifying data via the XML format;
defining a tag set including a plurality of tags;
defining a tokenizer that produces one token at a time;
parsing the XML document via a parser;
validating the XML document via a validation engine, the validation engine driving the tokenizer, and the validating being an integral part of the parsing; and
permitting the validation engine to be written in a recursive-descent code-driven manner.
2. The method of claim 1, wherein the parser maintains one or more datatypes in a buffer.
3. The method of claim 1, wherein the validation engine maintains control of the parser.
4. The method of claim 1, wherein the validation engine activates the parser when the validation engine requires a next piece of information from the XML document.
5. The method of claim 4, wherein the next piece of information is retrieved via a function GetNextTag( ).
6. A system for parsing a document, the document being in an Extensible Markup Language (XML) format, the system comprising:
a network; and
a host system in communication with the network, the host system including XML software to implement a method comprising:
identifying data via the XML format;
defining a tag set including a plurality of tags;
defining a tokenizer that produces one token at a time;
parsing the XML document via a parser;
validating the XML document via a validation engine, the validation engine driving the tokenizer, and the validating being an integral part of the parsing; and
permitting the validation engine to be written in a recursive-descent code-driven manner.
7. The system of claim 6, wherein the parser maintains one or more datatypes in a buffer.
8. The system of claim 6, wherein the validation engine maintains control of the parser.
9. The system of claim 6, wherein the validation engine activates the parser when the validation engine requires a next piece of information from the XML document.
10. The system of claim 9, wherein the next piece of information is retrieved via a function GetNextTag( ).
11. A computer program product for parsing a document, the document being in an Extensible Markup Language (XML) format, the computer program product comprising:
a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising:
identifying data via the XML format;
defining a tag set including a plurality of tags;
defining a tokenizer that produces one token at a time;
parsing the XML document via a parser;
validating the XML document via a validation engine, the validation engine driving the tokenizer, and the validating being an integral part of the parsing; and
permitting the validation engine to be written in a recursive-descent code-driven manner.
12. The computer program of claim 11, wherein the parser maintains one or more datatypes in a buffer.
13. The computer program of claim 11, wherein the validation engine maintains control of the parser.
14. The computer program of claim 11, wherein the validation engine activates the parser when the validation engine requires a next piece of information from the XML document.
15. The computer program of claim 14, wherein the next piece of information is retrieved via a function GetNextTag( ).
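By way of a non-limiting illustration, the validator-driven arrangement recited in claims 1 and 3 to 5 might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation described in the application: the class names `Tokenizer` and `Validator`, the method `get_next_tag` (standing in for the claimed GetNextTag( ) function), and the toy schema (an `order` element containing one `id` followed by one or more `item` elements) are all hypothetical. The tokenizer handles only bare open tags, close tags, and character data; it is not a conforming XML parser. The point it shows is the direction of control: the validation engine is ordinary recursive-descent code, and the tokenizer does work only when the validator asks for the next tag.

```python
import re

# Hypothetical, deliberately tiny token grammar: <name>, </name>, or character data.
# Attributes, comments, entities, and namespaces are out of scope for this sketch.
_TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>|([^<]+)")


class Tokenizer:
    """Produces one token at a time, and only when the validator asks for it."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def get_next_tag(self):
        """Return ('open' | 'close', name), skipping character data; None at end of input."""
        while self._pos < len(self._text):
            m = _TOKEN.match(self._text, self._pos)
            if m is None:
                raise ValueError(f"malformed input at offset {self._pos}")
            self._pos = m.end()
            if m.group(2):  # matched a tag rather than character data
                return ("close" if m.group(1) else "open", m.group(2))
        return None


class Validator:
    """Recursive-descent validator for the assumed schema: order := id item+"""

    def __init__(self, tokenizer):
        self._tok = tokenizer
        self._peeked = None  # one-token lookahead buffer

    def _next(self):
        if self._peeked is not None:
            tok, self._peeked = self._peeked, None
            return tok
        return self._tok.get_next_tag()  # the validator drives the tokenizer

    def _peek(self):
        if self._peeked is None:
            self._peeked = self._tok.get_next_tag()
        return self._peeked

    def _expect(self, kind, name):
        tok = self._next()
        if tok != (kind, name):
            raise ValueError(f"expected {kind} tag {name!r}, got {tok!r}")

    def _leaf(self, name):
        # A leaf element: an open tag immediately followed by its close tag
        # (any character data in between is skipped by the tokenizer).
        self._expect("open", name)
        self._expect("close", name)

    def validate_order(self):
        self._expect("open", "order")
        self._leaf("id")
        self._leaf("item")  # at least one item is required
        while self._peek() == ("open", "item"):
            self._leaf("item")
        self._expect("close", "order")
        if self._next() is not None:
            raise ValueError("unexpected content after </order>")
        return True


def is_valid(text):
    """Convenience wrapper: True if text conforms to the assumed schema."""
    try:
        return Validator(Tokenizer(text)).validate_order()
    except ValueError:
        return False
```

In this sketch the schema is encoded directly in the control flow of `validate_order`, so validation happens as part of the single pass over the input rather than as a separate stage fed by parser callbacks. For example, `is_valid("<order><id>7</id><item>a</item></order>")` accepts the document, while a document that omits the `id` element is rejected at the first tag that violates the expected sequence.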
US11/460,050 2006-07-26 2006-07-26 Validator-driven architecture of an xml parsing and validating solution Abandoned US20080028375A1 (en)

Priority Applications (2)

Application Number Publication Number Priority Date Filing Date Title
US11/460,050 US20080028375A1 (en) 2006-07-26 2006-07-26 Validator-driven architecture of an xml parsing and validating solution
US12/130,285 US8935605B2 (en) 2006-07-26 2008-05-30 Validator-driven architecture of an XML parsing and validating solution

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
US11/460,050 US20080028375A1 (en) 2006-07-26 2006-07-26 Validator-driven architecture of an xml parsing and validating solution

Related Child Applications (1)

Application Number Relation Publication Number Priority Date Filing Date Title
US12/130,285 Continuation US8935605B2 (en) 2006-07-26 2008-05-30 Validator-driven architecture of an XML parsing and validating solution

Publications (1)

Publication Number Publication Date
US20080028375A1 (en) 2008-01-31

Family

ID=38987901

Family Applications (2)

Application Number Status Publication Number Priority Date Filing Date Title
US11/460,050 Abandoned US20080028375A1 (en) 2006-07-26 2006-07-26 Validator-driven architecture of an xml parsing and validating solution
US12/130,285 Active 2030-12-28 US8935605B2 (en) 2006-07-26 2008-05-30 Validator-driven architecture of an XML parsing and validating solution

Country Status (1)

Country Link
US (2) US20080028375A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9003380B2 (en) * 2010-01-12 2015-04-07 Qualcomm Incorporated Execution of dynamic languages via metadata extraction
US8677316B2 (en) * 2010-05-12 2014-03-18 Microsoft Corporation Enforcement of architectural design during software development
US11237802B1 (en) 2020-07-20 2022-02-01 Bank Of America Corporation Architecture diagram analysis tool for software development

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060212859A1 (en) * 2005-03-18 2006-09-21 Microsoft Corporation System and method for generating XML-based language parser and writer
US20070250766A1 (en) * 2006-04-19 2007-10-25 Vijay Medi Streaming validation of XML documents

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8918710B2 (en) * 2004-10-05 2014-12-23 Oracle International Corporation Reducing programming complexity in applications interfacing with parsers for data elements represented according to a markup language

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080092037A1 (en) * 2006-10-16 2008-04-17 Oracle International Corporation Validation of XML content in a streaming fashion
US20090150412A1 (en) * 2007-12-05 2009-06-11 Sam Idicula Efficient streaming evaluation of xpaths on binary-encoded xml schema-based documents
US9842090B2 (en) 2007-12-05 2017-12-12 Oracle International Corporation Efficient streaming evaluation of XPaths on binary-encoded XML schema-based documents
US8464231B2 (en) * 2008-02-15 2013-06-11 Canon Kabushiki Kaisha Method and apparatus for accessing a production forming a set of rules for constructing hierarchical data of a structured document
US20090210783A1 (en) * 2008-02-15 2009-08-20 Canon Kabushiki Kaisha Method and device for access to a production of a grammar for processing a document of hierarchical data
US8522136B1 (en) * 2008-03-31 2013-08-27 Sonoa Networks India (PVT) Ltd. Extensible markup language (XML) document validation
US20150161215A1 (en) * 2012-08-31 2015-06-11 Facebook, Inc. Api version testing based on query schema
US9400822B2 (en) * 2012-08-31 2016-07-26 Facebook, Inc. API version testing based on query schema
US9646028B2 (en) 2012-08-31 2017-05-09 Facebook, Inc. Graph query logic
US20170212914A1 (en) * 2012-08-31 2017-07-27 Facebook, Inc. Graph Query Logic
US10671661B2 (en) * 2012-08-31 2020-06-02 Facebook, Inc. Graph query logic
US11226815B2 (en) * 2020-04-07 2022-01-18 International Business Machines Corporation Using big code to construct code conditional truth tables
US11656869B2 (en) 2020-04-07 2023-05-23 International Business Machines Corporation Using big code to construct code conditional truth tables

Also Published As

Publication number Publication date
US20080229292A1 (en) 2008-09-18
US8935605B2 (en) 2015-01-13

Similar Documents

Publication Publication Date Title
US8935605B2 (en) Validator-driven architecture of an XML parsing and validating solution
CN108399256B (en) Heterogeneous database content synchronization method and device and middleware
US7992081B2 (en) Streaming validation of XML documents
US7890479B2 (en) Efficient XML schema validation of XML fragments using annotated automaton encoding
US20130086100A1 (en) Method and System Providing Document Semantic Validation and Reporting of Schema Violations
US7941417B2 (en) Processing structured electronic document streams using look-ahead automata
US7707491B2 (en) Optimizing differential XML processing by leveraging schema and statistics
US20120110437A1 (en) Style and layout caching of web content
US8201083B2 (en) Simple one-pass W3C XML schema simple type parsing, validation, and deserialization system
US9361398B1 (en) Maintaining a relational database and its schema in response to a stream of XML messages based on one or more arbitrary and evolving XML schemas
CA2803616C (en) Systems, methods and machine readable mediums to select a title for content production
US20050246159A1 (en) System and method for document and data validation
CN102053994B (en) Language parser and parsing method using same
KR20040002738A (en) System and method for supporting non-native xml in native xml of a word-processor document
CN108090069A (en) A kind of method and apparatus for showing web page resources in a browser
US8266188B2 (en) Method and system for extracting structural information from a data file
CN111522558B (en) Method, device, system and readable medium for dynamically configuring rules based on Java
US8397158B1 (en) System and method for partial parsing of XML documents and modification thereof
US20060168511A1 (en) Method of passing information from a preprocessor to a parser
US8959066B2 (en) Message validation in a service-oriented architecture
JP4909882B2 (en) Web document style changing system and method
US10719424B1 (en) Compositional string analysis
US9971849B2 (en) Method and system for retrieving legal data for user interface form generation by merging syntactic and semantic contraints
JP4234698B2 (en) Structured document processing system
US20080092037A1 (en) Validation of XML content in a streaming fashion

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MATSA, MOSHE E.;PERKINS, ERIC;REEL/FRAME:018005/0187

Effective date: 20060725

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION