WO2001035282A2 - System and method for creating associations between digital data objects - Google Patents

System and method for creating associations between digital data objects

Info

Publication number
WO2001035282A2
WO2001035282A2 (PCT/US2000/041622)
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
user
objects
multimedia
associations
Prior art date
Application number
PCT/US2000/041622
Other languages
French (fr)
Other versions
WO2001035282A3 (en)
Inventor
David L. Zeltzer
Rita K. Addison
Original Assignee
Emerge Learning, Inc.
Priority date
Filing date
Publication date
Application filed by Emerge Learning, Inc. filed Critical Emerge Learning, Inc.
Priority to AU27477/01A priority Critical patent/AU2747701A/en
Publication of WO2001035282A2 publication Critical patent/WO2001035282A2/en
Publication of WO2001035282A3 publication Critical patent/WO2001035282A3/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Abstract

A method for the dynamic creation and presentation of associations between digital data objects is disclosed. The illustrated embodiment of the present invention provides a way of associating digital data objects that share similar features, such as digital multimedia objects. Computer vision and other technologies are used to detect features in digital data objects, and the present invention provides a mechanism to form associations among digital data objects that share features. Knowledge representation technology is utilized to allow the associations among digital data objects to be assigned an importance level weighted according to user preferences. The weighting of the association links allows coherent and meaningful presentations of related digital data objects to a user.

Description

SYSTEM AND METHOD FOR
CREATING ASSOCIATIONS BETWEEN
DIGITAL DATA OBJECTS
Field of the Invention
This invention relates generally to a system and method for organizing and interacting with digital data, and more particularly, to an automated, interactive system for creating and presenting associations between digital data objects in multimedia and information collections.
Background of the Invention
The rapid development of digital multimedia has resulted in a need for improved tools to manage digital data, including multimedia assets. ("Multimedia" includes color images and black and white still images, video, audio, animation, text, and 2- and 3-dimensional graphic objects.) Unfortunately, conventional methods of managing physical media are not readily transferable to digital media. Conventional methods of managing physical media, such as pictures, by storing them in a single location, such as a photo album or a desk drawer, make organizing, finding and presenting the physical media difficult. Physical media are difficult to index and search manually, and conventional methods for presenting physical media restrict users to static presentations that are very difficult to change. For example, pictures in a photo album are affixed to the album in a fixed order, and changing the presentation requires the presenter to physically transfer the pictures to another spot in the album.
Conventional tools for the management of digital multimedia are inadequate.
Current industry efforts have attempted to transform paper scrapbooks into "digital scrapbooks," but these digital scrapbooks still result in static, passive presentations using linear technologies that lack user-friendly interfaces. Changing the presentation of digital multimedia presentations still requires manual configuration of the digital scrapbook by the user. Conventional methods for managing digital multimedia are tedious and time-consuming to organize, and make it difficult to coherently and conveniently present large collections of digital multimedia. Some computer vision technologies allow multimedia archives to be searched by image content, including both still images and video streams, but these technologies only allow a user to access one item at a time from a multimedia archive, failing to provide associations between different groups of digital data.
Summary of the Invention
The illustrated embodiment of the present invention provides a way of associating groups of digital multimedia data that share similar features. Multimedia objects in a multimedia stream that are input into a computer system using the illustrated embodiment share a great many features, including simple temporal and geographical attributes (i.e., a sequence of images taken at one time and place), as well as "high-level" features such as people and objects. Computer vision and other technologies can be used to detect such features in multimedia objects, and the present invention provides the mechanism needed to form associations among the multimedia objects that share features. Knowledge representation technology is then utilized to allow linkages among multimedia objects. The links are assigned an importance level weighted according to domain knowledge about identified multimedia objects. The weighting of the links allows coherent and meaningful presentations of related multimedia objects to a user.
In one embodiment of the present invention, a method is practiced whereby digital multimedia data is input into a computer system and parsed into individual multimedia objects. The multimedia data is screened for recognizable features, and the features are stored as metadata attached to an agent created for each digital multimedia object. The features contained in the metadata attached to agents for different multimedia objects are compared for associations, and any associated multimedia object is noted. Knowledge representation technology is used in the association process to weight the linkages according to domain knowledge about different types of objects. An interactive component presents to the user, for verification, those decisions regarding features and linkages that are incomplete comparison matches. Digital multimedia objects are presented to a user by the system and contain linkages to all of the other associated objects recorded in the attached metadata.
In another embodiment of the present invention, a method is practiced whereby digital information data is input into a computer system and parsed into individual information objects. Examples of the input digital information include streams of genealogy research data, forensic investigation data, industrial site survey data, intelligence, surveillance and reconnaissance data, museum collections data, medical data, and the like. The information data is screened for recognizable features, and the features are stored as metadata attached to an agent created for each digital information object. The features contained in the metadata attached to agents for different information objects are compared for associations, and any associated information object is noted. Knowledge representation technology is used in the association process to weight the linkages according to domain knowledge about different types of objects. An interactive component presents to the user, for verification, decisions regarding features and linkages that are incomplete comparison matches. Digital information objects are presented to a user by the system and contain linkages to all of the other associated objects recorded in the attached metadata.
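In outline, the association step pairs any two objects whose metadata share a feature and weights the link by the importance of that feature. The following is a minimal sketch of that idea only; the class, function and feature names are assumptions made for illustration and are not taken from the patent.

```python
# Hypothetical sketch of the association step described above (not the patented implementation).
from dataclasses import dataclass, field

@dataclass
class MediaObject:
    name: str
    features: set                                # labels found by feature detection
    links: list = field(default_factory=list)   # (other object, shared feature, weight)

def associate(objects, importance):
    """Link every pair of objects that shares a feature, weighting each link
    by the domain importance of the shared feature."""
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            for feature in a.features & b.features:
                w = importance.get(feature, 0.0)
                a.links.append((b.name, feature, w))
                b.links.append((a.name, feature, w))

# Toy example: two photos that share the feature "birthday party".
photos = [MediaObject("img001", {"birthday party", "dog"}),
          MediaObject("img002", {"birthday party", "beach"})]
associate(photos, importance={"birthday party": 0.9, "dog": 0.4})
print(photos[0].links)   # [('img002', 'birthday party', 0.9)]
```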
Brief Description of the Drawings
Figure 1 is a block diagram of major modules employed in the embodiment of the invention;
Figure 2 is a block diagram of the EM-Agent and EMCON module of Figure 1 and interacting modules;
Figure 3 is a block diagram of the association engine module of Figure 2 and its related components;
Figure 4 is a flowchart of the association engine process; and
Figure 5 is a flowchart of the multimedia input process.
Detailed Description of the Invention
The illustrated embodiment of the present invention provides a method for creating associations between multimedia objects and presenting the associations to a user. The multimedia objects may be color images, black and white still images, video, audio, animation, text, and/or two- and three-dimensional graphic objects. Figure 1 depicts the major modules utilized in the illustrated embodiment of the present invention, including multimedia input 2. The multimedia input 2 may be entered into the storage medium of a computer which is using the illustrated embodiment in a number of different ways. For example, frame grabs may be used for digitizing analog video, and 2D scanners may be used for digitizing text and visual images. Similarly, 3D scanners may be used for digitizing 3D objects, and high-bandwidth cables that connect directly to digital cameras may be used to input the pictures taken by the digital cameras. The EM-Agents and EMCONS module 4 creates an agent with attached metadata for each multimedia object input into the computer system and stored in memory. The metadata attached to the agent for a multimedia object contains information about the content of the multimedia object. The display processors module 6 converts display specifications received from the EM-Agents and EMCONS module 4 into display instructions for the multimedia displays module 8. The multimedia displays module 8 may be any one of numerous types of well known display devices used to make digital multimedia presentations to a user 10, including CRT monitors, flat panel LCDs, or head mounted displays.
The user 10 interacts with the illustrated embodiment of the present invention through a human-machine interface module 12. The human-machine interface module 12 provides a mechanism for interacting with the computer system, such as a keyboard, joystick, mouse, voice command, and so forth. The human-machine interface module parses messages received from the user 10 into tokens. The dialog manager module 14 receives the parsed messages from the user 10 as tokens from the human-machine interface module 12. The dialog manager 14 also interacts with a number of other modules, including the blackboard module 16, the learning dialog module 18, the display manager module 22, the EM-Agents and EMCONS module 4, and the ontology module 20. The blackboard module 16 is primarily a storage area in which agent queries are stored in priority order to be periodically retrieved by the dialog manager 14 for presentation to the user 10. An agent query may be a question to the user regarding the identification of a feature contained in a multimedia object or a question regarding the importance of such a feature. The blackboard module 16 also maintains a listing of identified features contained within the multimedia object data, a pointer to each feature and to all of the agents that share this feature, as well as one or more tags which indicate information such as whether the metadata for this feature has changed or whether new associations for this feature have been found.
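One way to picture the blackboard module is as a priority-ordered store of agent queries plus a feature listing that points back to the agents sharing each feature and carries change tags. The sketch below is a hypothetical illustration of that structure; none of the class or field names come from the patent.

```python
import heapq

class Blackboard:
    """Illustrative blackboard: priority-ordered agent queries plus a
    listing of identified features and the agents that share them."""

    def __init__(self):
        self._queries = []      # heap of (priority, sequence number, question)
        self._seq = 0
        self.features = {}      # feature -> {"agents": set of agent ids, "tags": set of flags}

    def post_query(self, priority, question):
        # Lower numbers come out first; the sequence number keeps equal
        # priorities in insertion order.
        heapq.heappush(self._queries, (priority, self._seq, question))
        self._seq += 1

    def next_query(self):
        return heapq.heappop(self._queries)[2] if self._queries else None

    def list_feature(self, feature, agent_id):
        entry = self.features.setdefault(feature, {"agents": set(), "tags": set()})
        entry["agents"].add(agent_id)
        entry["tags"].add("new-association")   # flag the change for later review

bb = Blackboard()
bb.post_query(1, "Who is the person shown in img001?")
bb.post_query(5, "How important is the 'dog' feature to you?")
bb.list_feature("birthday party", agent_id="agent-17")
print(bb.next_query())   # the highest-priority question is asked first
```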
The ontology module 20 is a software module that stores information about the domain of discourse in a generalization hierarchy. The generalization hierarchy contains weighted preferences for different types of features contained in the multimedia objects being input into the computer system. For example, if the user 10 is interested in creating associations between features depicting types of transportation, but is primarily interested in airplanes, less interested in trains and not interested in automobiles, the weighted preferences assigned to those features will reflect the user's level of interest. The generalization hierarchy includes kind-of and part-of relationships between nodes in the hierarchy; for example, a chair is a kind-of furniture and a hair is a part-of a mammal. Such hierarchical information may include a dictionary of commonly encountered concepts such as kinship relationships, commonly observed holidays, common events such as birthday parties, common objects such as automobiles and household appliances, and so forth. The agents created by the EM-Agents and EMCONS module 4 consult the ontology module 20 to determine the relative location in the generalization hierarchy of an EMCON feature to be identified. Based on this information, an EM-Agent calculates a priority for an agent query posted on the blackboard module 16 for later presentment to the user 10. Features which appear important based on the weighted preferences contained in the ontology module 20 are given a higher priority in the agent query list posted on the blackboard 16, while features which are of limited interest are given a lower priority. Certain features will be noted as of no interest to the user, and the user will therefore not be questioned regarding those features.
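The generalization hierarchy can be imagined as a small graph of kind-of (and part-of) links with per-concept preference weights, from which the priority of an agent query is derived. The following sketch is a hypothetical illustration using the transportation example above; the patent does not prescribe this representation, and all names are assumptions.

```python
class Ontology:
    """Toy generalization hierarchy with kind-of links and per-concept
    preference weights (part-of links would be stored the same way)."""

    def __init__(self):
        self.kind_of = {}       # concept -> parent concept
        self.preference = {}    # concept -> weight in [0, 1]; 0 means no interest

    def weight(self, concept):
        # Fall back to the nearest ancestor if the concept itself has no weight.
        while concept is not None:
            if concept in self.preference:
                return self.preference[concept]
            concept = self.kind_of.get(concept)
        return 0.0

    def query_priority(self, concept):
        w = self.weight(concept)
        if w == 0.0:
            return None          # no interest: the user is never asked about it
        return 1.0 - w           # smaller number = asked about earlier

onto = Ontology()
onto.kind_of.update({"airplane": "transportation", "train": "transportation"})
onto.preference.update({"airplane": 0.75, "train": 0.25, "automobile": 0.0})
print(onto.query_priority("airplane"))    # 0.25 -> high-priority query
print(onto.query_priority("automobile"))  # None -> never queried
```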
The dialog manager 14 retrieves agent queries from the blackboard module 16 and presents them to the learning dialog module 18. The learning dialog module 18 formats the agent query into instructions for the display manager module 22 on how to present the agent queries to the user 10. The agent queries may be spoken, graphical, or text based, and they may specify that certain multimedia objects be presented to the user 10. Replies to agent queries from the user are forwarded as tokens from the human-machine interface module 12 to the dialog manager 14 and to the learning dialog module 18. The learning dialog module 18 formats the input information from the user 10 and sends the information to the EM-Agents and EMCONS module 4 as updates for the metadata attached to a particular agent.
Figure 2 depicts a block diagram of the submodules making up the EM-Agents and EMCONS module 4 and the other modules that interact with the EM-Agents and EMCONS module 4. Incoming multimedia input 2 is processed by a registry module 24, and the raw data from the input multimedia stream is stored as multimedia objects in the media objects module 26. Each such multimedia object is referred to as an EMCON. The media objects module 26 may be any one of a number of well known storage device types, such as a hard drive. The registry module 24 creates an agent for each multimedia object, each agent including metadata describing the contents of the multimedia object. The agent and associated metadata for a multimedia object are stored in the agent community module 30. Agents are programmed to seek out associations for features contained in their attached metadata and utilize the association engine module 28 to contact the other agents in the agent community 30 to identify associations. Any association found will be stored in the metadata attached to the agents for both multimedia objects, so that both agents will contain a link to the other agent's multimedia object. The located feature will also be listed on the blackboard 16.
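A registry that creates one agent per stored EMCON and records associations symmetrically in both agents' metadata might look like the following sketch. The names, the in-memory stores, and the stubbed-out media storage are assumptions for illustration, not details from the patent.

```python
import itertools

class Registry:
    """Illustrative registry: one EM-Agent record per stored EMCON, kept in a
    simple in-memory agent community, with symmetric association links."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.media_objects = {}     # EMCON id -> raw media data (stand-in store)
        self.agent_community = {}   # EMCON id -> {"features": set, "links": list}

    def register(self, raw_data, features):
        emcon_id = next(self._ids)
        self.media_objects[emcon_id] = raw_data
        self.agent_community[emcon_id] = {"features": set(features), "links": []}
        return emcon_id

    def associate(self, id_a, id_b, feature, weight):
        # The link is stored in BOTH agents' metadata, so each agent can
        # reach the other agent's multimedia object.
        self.agent_community[id_a]["links"].append((id_b, feature, weight))
        self.agent_community[id_b]["links"].append((id_a, feature, weight))

reg = Registry()
first = reg.register(b"...jpeg bytes...", {"face", "beach"})
second = reg.register(b"...jpeg bytes...", {"face", "kitchen"})
reg.associate(first, second, "face", weight=0.8)
print(reg.agent_community[first]["links"])   # [(2, 'face', 0.8)]
```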
The search for associations is an ongoing process that may occur automatically without a user's input. In one embodiment, the search for associations occurs as a background process. In another embodiment, the search for associations between EMCON features occurs during idle clock cycles. The search for associations may also take place in a distributed computer network on available hosts. Alternatively, the search for associations between EMCON features may occur as the result of a user input command.
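Running the search without user involvement can be arranged, for example, with a low-priority background thread that wakes periodically; this is only one possible arrangement and is not dictated by the patent. The search-pass function below is a stand-in.

```python
import threading
import time

def association_search_pass():
    # Stand-in for one sweep of feature comparison across the agent community.
    print("searching for new associations...")

def start_background_search(interval_seconds=5.0):
    """Run the association search repeatedly in a daemon thread so that it
    never blocks the interactive parts of the system."""
    def loop():
        while True:
            association_search_pass()
            time.sleep(interval_seconds)   # yield the CPU between passes
    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker

# start_background_search()   # associations then accumulate while the user works
```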
The association engine 28 and its interacting components are depicted in Figure 3. The association engine 28 passes an EMCON to the feature finder module 34 to be segmented into features, which are stored in the metadata associated with the agent assigned to the EMCON. After segmenting the EMCON into features, the feature finder module 34 passes the features to the training features module 35 to search for commonly occurring objects. Commonly occurring objects include objects such as people, faces, domestic and farm animals, appliances, equipment, vehicles and landscape. If a possible feature is found by the feature finder 34, the feature is tagged with a generic identity and a weight specifying the probability that the generic identity is correct. If no generic identifications can be found, the EM-Agent for the multimedia object prepares agent queries, prioritizes them and posts them on the blackboard 16. Once a generic EMCON feature has been found, the EM-Agent 32 tries to uniquely identify each such feature. The features are sent to the association engine 28 and then to the feature recognizer 36. The feature recognizer 36 returns a unique identity of the feature together with a weight specifying the probability that the unique feature identity is correct. The weight is compared to a predefined threshold parameter, and if the unique feature identity weight is above the threshold, the unique feature identities are formatted as metadata by the feature recognizer module 36 and returned to the EM-Agent 32 via the association engine 28. Uniquely recognized features are stored in the training features module 35. As the number of training features increases, the feature identification process becomes more efficient. The association engine 28 is also used by EM-Agents 32 to perform metadata searches 29 on other EM-Agents in the agent community 30 for the purpose of forming associations between recognized features in the metadata describing the content of different multimedia objects.
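The two-stage identification just described (a generic identity from the training features, then a unique identity from the feature recognizer, each with a probability weight checked against a threshold, with user queries prepared when recognition falls short) could be sketched as follows. All names, the threshold value, and the toy models are assumptions made for illustration.

```python
THRESHOLD = 0.7   # stand-in for the "predefined parameter" mentioned above

def identify(segment, generic_model, unique_model, metadata, queries):
    """Two-stage identification of one segmented feature; the models are
    callables returning (label, confidence weight)."""
    label, w = generic_model(segment)
    if label is None:
        queries.append((1, f"What is shown in segment {segment!r}?"))
        return
    if w >= THRESHOLD:
        metadata.setdefault("generic", []).append(label)

    name, w = unique_model(segment)
    if name is not None and w >= THRESHOLD:
        metadata.setdefault("unique", []).append(name)   # also kept as a training feature
    else:
        queries.append((2, f"Can you name this {label}?"))

# Toy models standing in for real computer-vision components.
generic = lambda seg: ("face", 0.9)
unique = lambda seg: ("Aunt Rita", 0.4)     # too uncertain: ask the user instead

metadata, queries = {}, []
identify("segment-3", generic, unique, metadata, queries)
print(metadata)   # {'generic': ['face']}
print(queries)    # [(2, 'Can you name this face?')]
```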
Figure 4 depicts a flow chart of the association engine process. An EM-Agent 32 will check to see if it has received notification of any new associations between features (step 38). If the EM-Agent 32 has received new associations, it will compute the weight to be given to those associations using matched feature priorities (step 40). If there are more associations (step 42), the EM-Agent 32 will continue to compute the weights using matched feature priorities. If there are no more associations, or if there were no associations when the EM-Agent 32 initially checked, the EM-Agent will select in priority order one of the features contained in its metadata and search for associations between the selected feature and the features contained in the metadata attached to other agents (step 44). The selected feature is sent to the association engine 28 (step 46). The association engine 28 checks to see if the feature is currently listed on the blackboard module 16 (step 48). If information about the requested feature is not currently listed on the blackboard, a query regarding the feature is placed in priority order on the blackboard 16 (step 50). If the requested feature is already listed on the blackboard 16, the information associated with the EM-Agent 32 is merged with the metadata from the other listed EM-Agents (step 52). A metadata search is then performed (step 29) and the search results checked for new associations (step 54). If an association is found, the association weight is computed (step 56). If all of the metadata has been searched (step 58), the blackboard feature listing is updated (step 60); otherwise another metadata search (step 29) on the remaining metadata is conducted. After updating the blackboard listing (step 60) upon a completed metadata search, the new association metadata is returned to the EM-Agent 32 (step 62).
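Read as code, the loop of Figure 4 amounts to: fold in newly reported associations, take the agent's features in priority order, consult the blackboard, merge metadata with the agents already listed, and record any matches symmetrically. The sketch below is a loose, hypothetical paraphrase of those steps, simplified to work over plain dictionaries; it is not a transcription of the patent's flowchart.

```python
def association_pass(agent, agents, blackboard, weight_of):
    """One illustrative pass for a single agent. `agents` maps an agent id to
    {"features": set, "links": list, "pending": list}; `blackboard` maps a
    feature to the set of agent ids that have listed it; `weight_of` supplies
    the domain-knowledge weight of a feature."""
    me = agents[agent]

    # Steps 38-42: fold in newly reported associations, weighted by feature priority.
    for other, feature in me["pending"]:
        me["links"].append((other, feature, weight_of(feature)))
    me["pending"].clear()

    # Steps 44-46: work through this agent's features in priority order.
    for feature in sorted(me["features"], key=weight_of, reverse=True):
        # Steps 48-50: a feature not yet on the blackboard is simply listed there.
        if feature not in blackboard:
            blackboard[feature] = {agent}
            continue
        # Steps 52-60: merge with the agents already listed and record any matches.
        for other in blackboard[feature] - {agent}:
            if feature in agents[other]["features"]:
                w = weight_of(feature)
                me["links"].append((other, feature, w))
                agents[other]["links"].append((agent, feature, w))
        blackboard[feature].add(agent)

agents = {
    "a1": {"features": {"face", "beach"}, "links": [], "pending": []},
    "a2": {"features": {"face"}, "links": [], "pending": []},
}
blackboard = {"face": {"a2"}}
association_pass("a1", agents, blackboard, weight_of=lambda f: 0.8)
print(agents["a1"]["links"])   # [('a2', 'face', 0.8)]
```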
A flowchart of the multimedia input process used by the illustrated embodiment of the present invention is depicted in Figure 5. A multimedia object from a multimedia input stream 2 is examined to see if it is a new object (step 64). If the object is not a new object, a message to that effect is displayed to the user (step 66). The multimedia input stream 2 is then examined to see if there are any additional objects (step 68), and if there are not, control is returned to the dialog manager 14 (step 70), which checks its own status (step 71). If the dialog manager 14 has no pending queries to retrieve from the blackboard 16, the dialog manager quits (step 72). If the multimedia object from the multimedia input stream 2 is a new object, the registry module 24 creates a new EM-Agent for the multimedia object (step 74), allocates memory and stores the EMCON object in the media objects module 26 (step 76), and registers the EM-Agent and EMCON with the registry 24 (step 78). If the EMCON is part of a hierarchy, such as a single frame grab out of a sequence of video, it is segmented into constituent EMCONs (step 80), and a new agent creation and registration process is carried out for each EMCON (step 82). Each individual EMCON is examined after registration for the presence of generic identifiable features (step 84). If no generic features are identified, an agent query is prepared (step 86). The agent query is forwarded to the dialog manager 14 (step 70) for posting on the blackboard 16 for later presentment to a user 10. If a generic feature is identified, the weight of the identification is computed (step 88), and the EM-Agent's metadata is updated if the weight of the identification exceeds a threshold parameter (step 90). After identification of a generic feature, the EMCON is searched for unique features (step 92). If unique features are not found, an agent query is prepared (step 86). If a unique feature is found, and the feature weight exceeds a threshold parameter (step 96), the feature is stored as a training feature (step 98) in the training features module 35, and the metadata for the EM-Agent is updated (step 100). Thereafter, control is returned to the dialog manager module 14 (step 70).
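The registration path of Figure 5 — skip objects already known, otherwise create and register an agent, split hierarchical objects into constituent EMCONs, and attempt generic then unique identification — could be sketched as follows. This is a simplified, hypothetical walk-through; the recognizer callables, threshold and data layout are assumptions, not the patented implementation.

```python
def ingest(stream, registry, identify_generic, identify_unique, threshold=0.7):
    """Illustrative walk through the input process of Figure 5."""
    queries = []
    for item in stream:
        if item["id"] in registry:                   # steps 64-66: not a new object
            print(f"{item['id']} is already registered")
            continue

        # Steps 74-82: create and register an agent for the object; a hierarchical
        # object (e.g. a video clip) is split into its constituent EMCONs.
        constituents = item.get("frames", [item])
        for emcon in constituents:
            agent = {"metadata": {}, "training": []}
            registry[emcon["id"]] = agent

            # Steps 84-90: look for a generic feature and keep it if confident enough.
            label, w = identify_generic(emcon)
            if label is None:
                queries.append(f"What does {emcon['id']} show?")          # step 86
                continue
            if w >= threshold:
                agent["metadata"]["generic"] = label

            # Steps 92-100: look for a unique feature; confident identifications
            # also become training features for later inputs.
            name, w = identify_unique(emcon)
            if name is None or w < threshold:
                queries.append(f"Who or what is the {label} in {emcon['id']}?")
            else:
                agent["metadata"]["unique"] = name
                agent["training"].append((name, emcon["id"]))
    return queries   # handed back to the dialog manager (step 70)

registry = {}
stream = [{"id": "img1"},
          {"id": "clip1", "frames": [{"id": "clip1/f0"}, {"id": "clip1/f1"}]}]
print(ingest(stream, registry,
             identify_generic=lambda e: ("face", 0.9),
             identify_unique=lambda e: (None, 0.0)))
```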
It will thus be seen that the invention attains the objectives stated in the previous description. Since certain changes may be made without departing from the scope of the present invention, it is intended that all matter contained in the above description or shown in the accompanying drawings be interpreted as illustrative and not in a literal sense. Practitioners of the art will realize that the sequence of steps depicted in the figures may be altered without departing from the scope of the present invention and that the illustrations contained herein are singular examples of a multitude of possible depictions of the present invention.

Claims

We claim:
1. In a computer network with a user interface, a method for associating multimedia content, said method comprising the steps of: providing an input stream of multimedia content; converting said stream of multimedia content into a plurality of objects; evaluating said multimedia content in said objects for recognizable features; discovering associations between different multimedia objects based on said recognizable features; and presenting associations between different multimedia objects to a user.
2. The method of claim 1 comprising the step of: creating an associated agent and collection of metadata for every multimedia object created from said multimedia content.
3. The method of claim 1 further comprising the steps of: attaching labels describing said recognized features in said multimedia content into the collection of metadata associated with the multimedia object containing the recognized features; and comparing labels located in said collection of metadata with labels located in different collections of metadata to discover associations between multimedia objects.
4. The method of claim 3 wherein said comparing labels located in different collections of metadata is performed as a background process.
5. The method of claim 3 wherein said comparing labels located in different collections of metadata is performed during idle clock cycles of a central processing unit.
6. The method of claim 3 wherein said comparing labels located in different collections of metadata is performed on available hosts of a distributed computer network.
7. The method of claim 3 wherein said comparing labels located in different collections of metadata is performed in response to a user query.
8. The method of claim 3 further comprising the step of: storing discovered associations among multimedia objects in said collection of metadata.
9. The method of claim 3 further comprising the step of: querying said user to verify said recognized features recorded in said collection of metadata.
10. The method of claim 8 comprising the step of: providing a priority level to a query directed to said user, said priority level controlling an order in which said query is presented to said user, said priority level determined according to the relative importance of said recognized feature, said relative importance determined by a pre-defined and user changeable parameter.
11. In a computer network with a user interface, a method for associating information content, said method comprising the steps of: providing an input stream of information content; converting said stream of information content into a plurality of objects; evaluating said information content in said objects for recognizable features; discovering associations between different information objects based on said recognizable features; and presenting associations between different information objects to a user.
12. The method of claim 11 further comprising the step of: creating an associated agent and collection of metadata for every information object created from said information content.
13. The method of claim 11 further comprising the steps of: attaching labels describing said recognized features in said information content into the collection of metadata associated with the information object containing the recognized features; and comparing labels located in said collection of metadata with labels located in different collections of metadata to discover associations between information objects.
14. The method of claim 13 wherein said comparing labels located in different collections of metadata is performed as a background process.
15. The method of claim 13 wherein said comparing labels located in different collections of metadata is performed during idle clock cycles of the central processing unit.
16. The method of claim 13 wherein said comparing labels located in different collections of metadata is performed on available hosts of a distributed computer network.
17. The method of claim 13 wherein said comparing labels located in different collections of metadata is performed in response to a user query.
18. The method of claim 13 comprising the step of: storing discovered associations among information objects in said collection of metadata.
19. The method of claim 13 comprising the step of: querying said user to verify said recognized features recorded in said collection of metadata.
20. The method of claim 19 comprising the step of: providing a priority level to a query directed to said user, said priority level controlling the order in which said query is presented to said user, said priority level determined according to the relative importance of said recognized feature, said relative importance determined by a pre-defined and user changeable parameter.
21. The method of claim 11 wherein said information content is genealogical data.
22. The method of claim 11 wherein said information content is forensic investigation data.
23. The method of claim 11 wherein said information content is industrial site survey data.
24. The method of claim 11 wherein said information content is museum collection data.
25. The method of claim 11 wherein said information content is intelligence, surveillance and reconnaissance data.
26. The method of claim 11 wherein said information content is medical data.
PCT/US2000/041622 1999-10-26 2000-10-26 System and method for creating associations between digital data objects WO2001035282A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU27477/01A AU2747701A (en) 1999-10-26 2000-10-26 System and method for creating associations between digital data objects

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16143199P 1999-10-26 1999-10-26
US60/161,431 1999-10-26

Publications (2)

Publication Number Publication Date
WO2001035282A2 true WO2001035282A2 (en) 2001-05-17
WO2001035282A3 WO2001035282A3 (en) 2002-11-07

Family

ID=22581146

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/041622 WO2001035282A2 (en) 1999-10-26 2000-10-26 System and method for creating associations between digital data objects

Country Status (2)

Country Link
AU (1) AU2747701A (en)
WO (1) WO2001035282A2 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5734893A (en) * 1995-09-28 1998-03-31 Ibm Corporation Progressive content-based retrieval of image and video with adaptive and iterative refinement
US5930783A (en) * 1997-02-21 1999-07-27 Nec Usa, Inc. Semantic and cognition based image retrieval

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AGOSTI M ET AL: "Design and implementation of a tool for the automatic construction of hypertexts for information retrieval" INFORMATION PROCESSING & MANAGEMENT, ELSEVIER, BARKING, GB, vol. 32, no. 4, 1 July 1996 (1996-07-01), pages 459-476, XP004007247 ISSN: 0306-4573 *

Also Published As

Publication number Publication date
AU2747701A (en) 2001-06-06
WO2001035282A3 (en) 2002-11-07

Similar Documents

Publication Publication Date Title
US10942982B2 (en) Employing organizational context within a collaborative tagging system
US11861150B2 (en) Methods and apparatus for managing and exchanging information using information objects
US9977827B2 (en) System and methods of automatic query generation
US8180767B2 (en) Inferred relationships from user tagged content
CA2677464C (en) Providing unique views of data based on changes or rules
US8583592B2 (en) System and methods of searching data sources
US9281963B2 (en) Method and system for email search
US8122023B2 (en) Data access using multilevel selectors and contextual assistance
US20020049689A1 (en) Systems and methods for visual optimal ordered knowledge learning structures
US20080189336A1 (en) Creating and managing digital media content using contacts and relational information
US20110307791A1 (en) Social Networking Application for Knowledge Sharing and Management
US20090119572A1 (en) Systems and methods for finding information resources
US20080244429A1 (en) System and method of presenting search results
US20080243786A1 (en) System and method of goal-oriented searching
US20080243787A1 (en) System and method of presenting search results
US20130091162A1 (en) Data Access Using Multilevel Selectors and Contextual Assistance
CN102597994A (en) User-defined profile tags, rules, and recommendations for portal
US20090049078A1 (en) Autofoldering process in content management
US9185147B1 (en) System and methods for remote collaborative intelligence analysis
US20030131317A1 (en) Method and system for organizing non-document specifications
KR101821832B1 (en) Information management
US8103660B2 (en) Computer method and system for contextual management and awareness of persistent queries and results
US9256672B2 (en) Relevance content searching for knowledge bases
WO2001035282A2 (en) System and method for creating associations between digital data objects
Zhu et al. New query refinement and semantics integrated image retrieval system with semiautomatic annotation scheme

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP