VIRTUAL FIELDS
FIELD OF INVENTION
The invention is related to the field of representation and translation of electronic documents.
BACKGROUND OF THE INVENTION
A field in some document forms can have multiple meanings. In some concrete documents (that is, documents whose data convey information), a needed piece of information can reside in one of several possible locations, and the location of that piece of information depends on information in other locations in the concrete document. For example, in an EDI 850 (i.e. a purchase order), a "segment" (group) called "PO1" is shown in Table 1. PO1 has elements (i.e. fields) in it, which are called "PO101", "PO102", etc.
PO1
PO101 PO102
PO106 PO107 PO108 PO109
PO124 PO125
Table 1
PO1 contains a set of data elements, any of which could store a value with a particular "meaning", as shown in Table 2.
Qualifier Element Data Element
Table 2
A qualifier element can contain a qualifier code, which determines the meaning of its related data element. For example, if PO106 contains "UP", then PO107 holds a UPC code. If PO110 holds "VP", then PO111 holds a vendor part number. However, any of PO106, PO108, ... PO124 can hold any of the qualifier values.
Thus the data that "means" a particular thing can be in any of PO107, PO109, ..., PO125, depending on the values of PO106, PO108, ..., PO124. Data with a particular meaning can therefore reside in any of several locations. This complicates document translation.
Current methods of document translation require writing customized code to handle such cases. For example, suppose Vendor1 has a mapping tool that Customer1 uses. Customer1 is mapping an EDI 850 (see Table 1 above) to Document2. Customer1 writes code like
if (PO106 == "UP")
{ move PO107 to Document2.UPC_Code
} else if (PO108 == "UP")
{ move PO109 to Document2.UPC_Code
} else if (PO110 == "UP")
{ move PO111 to Document2.UPC_Code
} etc... else if (PO124 == "UP")
{ move PO125 to Document2.UPC_Code
}
Locations in a document can have more than one meaning. This means that conventional methods of mapping are hard to automate. Instead, the mappings must be manually done and require customized code, which does not allow reuse of mapping knowledge and rules.
Therefore, conventional methods have several disadvantages. Both mapping and the mapping rules are one-off. That is, each time a user wants to define how to perform a document translation, similar code must be written and tested. This increases the time needed to define how to translate from the source to the target document.
Furthermore, both the mapping and the mapping rules depend on user-written code. This makes it hard to automatically validate the integrity of the mapping. It also sets a minimum bar for the skill level of anyone trying to define a mapping, as they must then know all the document locations that might hold a particular meaning, and must be skilful enough to write the code to handle the case. This imposes a maintenance burden, as fixing a problem in a mapping requires altering code.
The mapping and the mapping rules are translation-language dependent. The code that must be written and tested depends on the underlying translation engine that will translate the documents. Thus, mapping rules will be translation-engine dependent, and a translation defined for one translation engine will likely need adjusting to make the mapping work on a different translation engine. Moving a transform from one translation engine to another is difficult.
The source and target mappings must be significantly different. The code for handling the case described above will differ whether the document is the source or the target document. If one has mapped from A to B, mapping from B to A requires major rework, as the code for the mapping would have to be rewritten using different logic.
Conventional mapping tools use superficial similarities in field names or document structure as the basis for automapping. They cannot automap to virtual structures, forcing users to write code.
SUMMARY OF THE INVENTION
A method is disclosed that includes identifying a meaning for data in a document, automatically locating the data having the meaning, assigning a virtual field to the meaning, and automatically mapping and transforming the document using the virtual field.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which:
Figure 1 is an example of an embodiment of a data structure for a document.
Figure 2 is another example of an embodiment of a data structure for a document.
Figure 3 is an example of data structure used to create a virtual field.
Figure 4 is an example of a network that translates a document using virtual fields.
Figure 5 is an example of a computer system that translates a document using virtual fields.
Figure 6 is an example of a translation system that uses virtual fields to translate documents.
Figure 7 is an embodiment of a method for automatically generating a transform using virtual fields.
DETAILED DESCRIPTION
Virtual fields can be used to automatically generate transforms. The virtual fields automatically locate data that has a specific meaning in concrete documents where the meaning can reside in any of a set of locations, as identified by data in other document locations, assign a name and a field to that meaning, and let mapping and mapping rules work with the new field.
Figure 1 depicts a data structure for part of a document. Group1 has fields under it. Fields Fieldx_q, Fieldy_q and Fieldz_q are qualifier fields that can hold values from a predefined set of qualifier values. Fields Fieldx_d1, Fieldy_d1 and Fieldz_d1 are related data fields that can hold data values that will be moved between the source and the target documents. Fields Fieldx_d2, Fieldy_d2 and Fieldz_d2 are another set of related data fields. Group1 can contain other fields too.
Enabling and Disabling Virtual Fields
Figure 2 is a data structure depicting Group1 as having two virtual fields that represent the possible meanings of the data fields in Figure 1. Virtual field UPC_Code represents whichever of the data fields Fieldx_d1, Fieldy_d1 and Fieldz_d1 has a related qualifier field holding the code "UPC". That is, qualifier and data fields come in pairs, such that if a qualifier field contains "UPC" then its related data field holds a UPC code value. The virtual fields are here depicted as having three qualifier-data field pairs. In other embodiments, a virtual field needs at least one qualifier-data pair, and can have more than three.
In this example, UPC_Code and Vendor_Code are "enabled" virtual fields, and Vendor_Subcode is a "disabled" virtual field. Enabled fields appear in the document under Group1. The document structure at any given time might or might not contain a particular virtual field under a particular group. When an event, such as a user operation in a GUI, requests that a virtual field be enabled, the virtual field is added to the document structure. When an event requests that a virtual field be disabled, the virtual field is removed from the document structure.
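The enable and disable behavior described above can be illustrated with a minimal sketch. The class name Group and its methods enable, disable and structure are hypothetical names chosen for illustration only; the specification does not prescribe any particular implementation.

```python
class Group:
    """Hypothetical model of a document group and its virtual fields."""

    def __init__(self, name, fields, available_virtual_fields):
        self.name = name
        self.fields = list(fields)                       # concrete fields
        self.available = list(available_virtual_fields)  # may be enabled
        self.enabled = []                                # currently in the structure

    def enable(self, virtual_field):
        # An enable request adds the virtual field to the document structure.
        if virtual_field not in self.available:
            raise KeyError(f"{virtual_field} is not defined for {self.name}")
        if virtual_field not in self.enabled:
            self.enabled.append(virtual_field)

    def disable(self, virtual_field):
        # A disable request removes the virtual field from the structure.
        if virtual_field in self.enabled:
            self.enabled.remove(virtual_field)

    def structure(self):
        # Concrete fields followed by the enabled virtual fields.
        return self.fields + self.enabled


group1 = Group(
    "Group1",
    ["Fieldx_q", "Fieldx_d1", "Fieldx_d2"],
    ["UPC_Code", "Vendor_Code", "Vendor_Subcode"],
)
group1.enable("UPC_Code")
group1.enable("Vendor_Code")
print(group1.structure())
```

In this sketch Vendor_Subcode remains available but is absent from the structure until an event requests that it be enabled.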
Source Data for Defining Virtual Fields
The descriptions for virtual fields are stored in an external data source. Figure 3 illustrates a data structure including the information needed to create the virtual fields of Figure 2. The information needed to generate a virtual field is:
• Name - (optional). If specified, append the Qualifier to the end of the Name, and use the result as the name of the virtual field. If Name is not specified, locate the name of the group that is the parent of the qualifier and data fields, append the qualifier, and use the result as the name of the virtual field.
• Group - (optional). If specified, only enable the virtual field under a group having this name. If not specified, enable the virtual field under any group that has the specified qualifier and data fields.
• Qualifier - (required). The qualifier code.
• Description - (optional). A description of the "meaning" of the virtual field.
• Fields - (required). The qualifier and data fields. Each data field needs its own qualifier field.
Every data field in a virtual field has the same syntactic and presentation characteristics. That is, for UPC_Code to be made available for use, Fieldx_d1, Fieldy_d1 and Fieldz_d1 must have the same minimum and maximum lengths and must store the same type of information (date, unsigned integer, etc.), if such information is available.
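One possible in-memory form of such a description is sketched below. The names VirtualFieldDef, field_name, applies_to and data_fields_consistent are hypothetical, as is the underscore used to join the name and the qualifier; the specification says only that the qualifier is appended. The sketch also assumes that a virtual field restricted to a named group still requires its qualifier and data fields to be present in that group.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VirtualFieldDef:
    qualifier: str                       # required: the qualifier code
    fields: Tuple[Tuple[str, str], ...]  # required: (qualifier field, data field) pairs
    name: Optional[str] = None           # optional
    group: Optional[str] = None          # optional
    description: Optional[str] = None    # optional

    def field_name(self, parent_group, separator="_"):
        # Use Name if given, otherwise the parent group's name, then
        # append the qualifier. The separator is an assumption.
        base = self.name if self.name is not None else parent_group
        return f"{base}{separator}{self.qualifier}"

    def applies_to(self, group_name, group_field_names):
        # If Group is given, the virtual field is limited to that group.
        if self.group is not None and self.group != group_name:
            return False
        present = set(group_field_names)
        return all(q in present and d in present for q, d in self.fields)


def data_fields_consistent(definition, field_meta):
    """Check that all data fields with known characteristics agree.

    field_meta maps a field name to a dict of characteristics; fields
    with no entry are skipped ("if such information is available").
    """
    known = [field_meta[d] for _, d in definition.fields if d in field_meta]
    return all(meta == known[0] for meta in known[1:])


upc = VirtualFieldDef(
    qualifier="UPC",
    fields=(("Fieldx_q", "Fieldx_d1"),
            ("Fieldy_q", "Fieldy_d1"),
            ("Fieldz_q", "Fieldz_d1")),
    description="UPC code",
)
meta = {
    "Fieldx_d1": {"type": "unsigned integer", "min_length": 12, "max_length": 12},
    "Fieldy_d1": {"type": "unsigned integer", "min_length": 12, "max_length": 12},
}
print(upc.field_name("Group1"))
print(data_fields_consistent(upc, meta))
```

Here Fieldz_d1 has no recorded characteristics, so it is not compared against the other two data fields.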
Using Virtual Fields in Source Documents to Automatically Generate a Transform
Users can apply mapping rules to meta-data to map from a virtual field in the source document to one or more corresponding fields in the target document. A virtual field in a source document can be treated like any other field. Any operation that might be applied to other fields, such as a move or any other mapping rule, also applies to virtual fields.
A transform is the code used by a translation engine to convert one concrete document into another. A transform is generated by applying mapping rules to meta-data of the source and target documents. After the mapping rules, and meta-data, including virtual fields, are defined, a transform can be automatically generated, which will perform the following processing on virtual fields defined for a concrete source document:
Find the first qualifier field in the virtual field that has a value equal to the virtual field's qualifier code. Then hand the data from the corresponding data field to whatever mapping rule has been specified.
If no qualifier field in the virtual field has a value that matches the specified qualifier value, then no data will be handed to the mapping rule. In this sense, virtual fields in a source document are conditional fields - they are valid if and only if a qualifier field holding the qualifier code is present.
For example, in Figure 2, if the user mapped UPC_Code in the source document, the transform will locate the right group, and then examine Fieldx_q, Fieldy_q and Fieldz_q, in that order, and locate the first that holds the value "UPC". It will then stop searching and continue with the mapping rules using the value in Fieldx_d1, Fieldy_d1 or Fieldz_d1, whichever corresponds to the qualifier field that contained "UPC".
Note that simple extensions to support default values, etc. are within the scope of the invention. For example, in Figure 3, a constant value could be added to the information, such that if none of the qualifier fields match the qualifier value, then the virtual field's value is equal to the default.
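The source-side processing described above can be sketched as follows. The function name read_virtual_field, the use of a dictionary for the group's data, and the use of None to signal that no data is handed to the mapping rule are all assumptions made for illustration.

```python
def read_virtual_field(group_data, qualifier, pairs, default=None):
    """Return the data for the first qualifier field holding `qualifier`.

    group_data maps field names to values; pairs is an ordered sequence
    of (qualifier field, data field) names. If no qualifier field
    matches, the optional default is returned instead.
    """
    for qualifier_field, data_field in pairs:
        if group_data.get(qualifier_field) == qualifier:
            # Stop at the first match and hand back the paired data.
            return group_data.get(data_field)
    return default


UPC_PAIRS = [("Fieldx_q", "Fieldx_d1"),
             ("Fieldy_q", "Fieldy_d1"),
             ("Fieldz_q", "Fieldz_d1")]

source_group = {
    "Fieldx_q": "VP", "Fieldx_d1": "A-17",
    "Fieldy_q": "UPC", "Fieldy_d1": "123456789",
}
print(read_virtual_field(source_group, "UPC", UPC_PAIRS))
```

With this data the search skips the Fieldx pair, whose qualifier is "VP", and stops at the Fieldy pair.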
Using Virtual Fields in Target Documents to Automatically Generate a Transform
Users can apply mapping rules to meta-data to map from one or more locations in the source document to a virtual field in the target document. A virtual field in a target document can be treated like any other field. Any operation that might be applied to other fields, such as a move or any other manipulation rule, also applies to virtual fields.
A transform is the code used by a translation engine to convert one concrete document into another. A transform is generated by applying mapping rules to the meta-data of the source and target documents. After the mapping rules and meta-data, including the virtual fields, are defined, a transform can be automatically generated, which will perform the following processing on virtual fields defined for a concrete target document:
Find the first qualifier and data field pair in the virtual field that have not had a value set, then put the qualifier code into the qualifier field, and put the data into the data field.
For example, in Figure 2, assume the user mapped to UPC_Code in the target document. If the source document specified 123456789 as the UPC code, the transform uses these steps to put the value into the target virtual field for the proper group:
• If Fieldx_q and Fieldx_d1 are both empty, then the transform code will put "UPC" into Fieldx_q and 123456789 into Fieldx_d1
• else, if Fieldy_q and Fieldy_d1 are both empty, then put "UPC" into Fieldy_q and 123456789 into Fieldy_d1
• else, if Fieldz_q and Fieldz_d1 are both empty, then put "UPC" into Fieldz_q and 123456789 into Fieldz_d1
Note that simple extensions to supply default values, etc. are also possible and are within the scope of the invention.
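The target-side steps above can be sketched as follows. The function name write_virtual_field and the dictionary representation are hypothetical. The specification does not say what happens when every pair is already occupied; returning False in that case is an assumption of this sketch.

```python
def _is_empty(value):
    # A field counts as empty if it is unset or holds an empty string.
    return value is None or value == ""


def write_virtual_field(group_data, qualifier, pairs, value):
    """Place `value` in the first qualifier-data pair that is still empty.

    Returns True if a pair was filled, False if every pair was already
    in use (an assumption; the specification leaves this case open).
    """
    for qualifier_field, data_field in pairs:
        if _is_empty(group_data.get(qualifier_field)) and \
                _is_empty(group_data.get(data_field)):
            group_data[qualifier_field] = qualifier
            group_data[data_field] = value
            return True
    return False


UPC_PAIRS = [("Fieldx_q", "Fieldx_d1"),
             ("Fieldy_q", "Fieldy_d1"),
             ("Fieldz_q", "Fieldz_d1")]

# The Fieldx pair is already taken by an earlier mapping.
target_group = {"Fieldx_q": "VP", "Fieldx_d1": "A-17"}
write_virtual_field(target_group, "UPC", UPC_PAIRS, "123456789")
print(target_group)
```

Because the Fieldx pair is occupied, the UPC value lands in the Fieldy pair, which is the next empty one in order.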
Hardware Overview
According to the present invention, a host computer system transmits and receives data over a computer network or standard telephone line. According to one embodiment, the steps of accessing, downloading, and manipulating the data, as well as other aspects of the present invention are implemented by a central processing unit (CPU) in the host computer executing sequences of instructions stored in a memory. The memory may be a random access memory (RAM), read-only memory (ROM), a persistent store, such as a mass storage device, or any combination of these devices. Execution of the sequences of instructions causes the CPU to perform steps according to the present invention.
The instructions may be loaded into the memory of the host computer from a storage device, or from one or more other computer systems over a network connection. For example, a server computer may transmit a sequence of instructions to the host computer in response to a message transmitted to the server over a network by the host. As the host receives the instructions over the network connection, it stores the instructions in memory. The host may store the instructions for later execution or execute the instructions as they arrive over the network connection. In some cases, the downloaded instructions may be directly supported by the CPU. In other cases, the instructions may not be directly executable by the CPU, and may instead be executed by an interpreter that interprets the instructions. In other embodiments, hardwired circuitry may be used in place of, or in combination with, software instructions to implement the present invention. Thus, the present invention is not limited to any specific combination of hardware circuitry and software, nor to any particular source for the instructions executed by the host computer.
Figure 4 illustrates a system 400 in which a host computer 402 is connected to a remote computer 404 through a network 410. The network interface between host computer 402 and remote computer 404 may also include one or more routers, such as routers 406 and 408, which serve to buffer and route the data transmitted between the host and remote computers. Network 410 may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof. The remote computer 404 may be a World-Wide Web (WWW) server that stores data in the form of 'web pages' and transmits these pages as Hypertext Markup Language (HTML) files over the Internet network 410 to host computer 402. To access these files, host computer 402 runs a 'web browser', which is simply an application program for accessing and providing links to web pages available on various Internet sites. Host computer 402 is also configured to communicate to telephone system 412 through a telephone interface, typically a modem.
Figure 5 is a block diagram of a representative networked computer, such as host computer 402 illustrated in Figure 4. The computer system 500 includes a processor 502 coupled through a bus 501 to a random access memory (RAM) 504, a read only memory (ROM) 506, and a mass storage device 507. Mass storage device 507 could be a disk or tape drive for storing data and instructions. A display device 520 for providing visual output is also coupled to processor 502 through bus 501. Keyboard 521 is coupled to bus 501 for communicating information and command selections to processor 502. Another type of user input device is cursor control unit 522, which may be a device such as a mouse or trackball, for communicating direction commands that control cursor movement on display 520. Also coupled to processor 502 through bus 501 is an audio output port 524 for connection to speakers that output audio signals produced by computer 500.
Further coupled to processor 502 through bus 501 is an input/output (I/O) interface 525, and a network interface device 523 for providing a physical and logical connection between computer system 500 and a network. Network interface device 523 is used by various communication applications running on computer 500 for communicating over a network medium and may represent devices such as an ethernet card, ISDN card, or similar devices.
Modem 526 interfaces computer system 500 to a telephone line and translates digital data produced by the computer into analog signals that can be transmitted over standard telephone lines, such as by telephone system 412 in Figure 4. In an embodiment of the present invention, modem 526 provides a hardwired interface to a telephone wall jack; however, modem 526 could also represent a wireless modem for communication over cellular telephone networks. It should be noted that the architecture of Figure 5 is provided only for purposes of illustration, and that a host computer used in conjunction with the present invention is not limited to the specific architecture shown.
Figure 6 shows an example of the groups and fields of two different documents, a source document format 610 and a target document format 620. In this embodiment, the document is a purchase order. However, the document may convey any information that one person or business wants to send to another person or business. The source group 615 includes the source fields of name, address, city, description, price, quantity, and total. The target group 625 includes the fields name, location, information, cost, number, and amount. Although the formats of the fields in the source and target groups are structurally different, they have similarities and common abstractions such as name, amount, and place to ship the goods. Thus, the names of the fields in groups 615 and 625 may be different, such as "price" and "cost," for example, but the data 617 and 627 contained in these fields is the same.
A virtual field that corresponds to a field in the source and target groups 615 and 625 can be used to capture these common abstractions using meta-data. For example, meta-data associated with the source document can be used by the mapping engine to define one or more virtual fields. The meta-data used to define the virtual fields can be obtained from a data structure such as the data structure of Figure 3. After the virtual fields are defined, the mapping engine can apply mapping rules to the meta-data associated with the source group, including the virtual fields, to automatically generate a transform. The transform is then provided to the translation engine, which uses the transform to convert the source document into the target document.
A mapping engine 650 creates a translation map, as shown in Figure 6. The translation map is used by a translation engine 630 to convert, or translate a message from a source format to a target format. The translation map is a metadata level description of the fields in the source document that will be used to populate a field in the target document.
Figure 7 shows an embodiment of a method for automatically generating a transform using virtual fields. One or more virtual fields for a first document are defined, step 710. The virtual fields are defined using meta-data contained in the data structure of Figure 3. One or more of these virtual fields are enabled, so that the enabled virtual fields appear in the first document, step 720. One or more of the virtual fields may be disabled, so that the disabled virtual fields do not appear in the first document, step 730. Mapping rules to map data from fields in the first document to fields in a second document are defined, step 740. Then, a transform to convert the first document into the second document is automatically generated by applying the mapping rules to the meta-data, including the enabled virtual fields, of the first and second documents.
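The flow of Figure 7 can be illustrated with a deliberately simplified sketch, in which the generated transform is a Python function rather than code for a particular translation engine. The name generate_transform, the flat dictionary documents, and the restriction to simple move rules from a source field to a target field are assumptions made for illustration; target-side virtual fields are omitted for brevity.

```python
def generate_transform(enabled_virtual_fields, mapping_rules):
    """Build a transform from enabled virtual fields and move rules.

    enabled_virtual_fields maps a virtual field name to a tuple of
    (qualifier code, ordered (qualifier field, data field) pairs).
    mapping_rules is a list of (source field, target field) moves.
    """
    def transform(source_doc):
        target_doc = {}
        for source_field, target_field in mapping_rules:
            if source_field in enabled_virtual_fields:
                # Virtual field: use the first pair whose qualifier matches.
                qualifier, pairs = enabled_virtual_fields[source_field]
                value = next(
                    (source_doc.get(d) for q, d in pairs
                     if source_doc.get(q) == qualifier),
                    None,
                )
            else:
                # Ordinary field: read it directly.
                value = source_doc.get(source_field)
            if value is not None:
                target_doc[target_field] = value
        return target_doc

    return transform


enabled = {
    "UPC_Code": ("UPC", [("Fieldx_q", "Fieldx_d1"),
                         ("Fieldy_q", "Fieldy_d1"),
                         ("Fieldz_q", "Fieldz_d1")]),
}
rules = [("name", "name"), ("UPC_Code", "UPC_Code")]
transform = generate_transform(enabled, rules)

source = {"name": "Acme", "Fieldy_q": "UPC", "Fieldy_d1": "123456789"}
print(transform(source))
```

The mapping rule names only UPC_Code; the search over the qualifier-data pairs is supplied by the generator rather than written by the user.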
Using virtual fields provides several advantages. First, virtual fields enable the automatic generation of transform code that maps between source and target documents. The automatically generated code enables virtual fields as needed - if it discovers that a virtual field that could potentially be enabled is specified by the code, it enables the virtual field.
Second, merely identifying the virtual field while mapping tends to be sufficient if a virtual field is involved. The user does not need to write code to locate the data of a source field that is part of a virtual field, or to put data into the correct location in the correct part of a target virtual field.
Third, if a user needs to put particular information into the first qualifier-data pair of a virtual field, the user merely needs to specify that the translation engine run the mapping to that virtual field before other mappings to a different virtual field that maps to the same set of qualifier-data pairs. Alternatively, the user can manually write the code to put the data into those target fields.
Fourth, the ability to write code is not compromised. This new technique can co-exist with the older way of doing things. Fifth, transform code can be successfully generated for various translation engines.
Sixth, mapping from document A to B is much closer to mapping from B to A than without this invention. Thus, the mapping from B to A has been made closer to the transposition of the mapping from A to B. Mapping one direction then provides most of the information needed to map the other direction. If users had to write code to map from A to B, such a transposition would be far more work. With this invention, transposing a mapping is far less work.
Seventh, mapping to or from a virtual field is translation-engine independent. The code appropriate for that translation engine is generated when writing out the transform in the way that translation engine requires.
Eighth, without needing to perform complicated analyses of user-written code, mappings to and from virtual fields can be validated, as most cases do not require the user to write code. Because fewer mappings require the user to write code, mapping difference checking is easier.
Ninth, a non-programmer can do most of the work of mapping. Tenth, maps are more translation-engine independent, as code to handle virtual fields in the mapping is automatically generated when the mapping is exported, rather than coded by the user before the mapping is exported. Eleventh, creating a map is faster, as automapping has a better hit rate.
Twelfth, maps have fewer bugs, as users don't need to write code. Thus, debugging a mapping is faster. Also, the maintenance burden is less, as users have less code in the specialized mapping languages of translation engines, and have less of their own code to maintain. Thirteenth, time to market is faster for users. Fourteenth, this invention works with virtual groups and automapping.
These and other embodiments of the present invention may be realized in accordance with these teachings and it should be evident that various modifications and changes may be made in these teachings without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense and the invention measured only in terms of the claims.