TARGET RECOGNITION METHOD

FIELD OF THE INVENTION

The present invention relates to the field of image recognition and methods for identifying objects.

BACKGROUND TO THE INVENTION
In the maritime surveillance industry, for example, visual recognition is the most common method of identifying ships. For the purposes of port safety and security, the entry and departure of ships to and from port is checked. Responsibility for such checks often falls to security services such as the military, police or customs. Traditionally, ships have been identified by sight. While an experienced operator may quickly recognise ships which are frequent visitors to the port, unfamiliar ships may require the operator to conduct manual searches of records to make the identification.
Systems for automatic identification of ships have been proposed, in which ships are fitted with Automatic Identification System (AIS) transmitters which transmit identification information. However, such a system is open to abuse: the AIS transmitter can be tampered with to transmit false identification information. As such, even where AIS identification is used, a visual cross-check still needs to be employed as verification. It is therefore desirable for automatic visual recognition to be conducted by computers.
However, the ability of computers to visually recognise objects is a relatively recent area of research that has had limited success. The main hurdles in current research include identifying real-world objects from a two-dimensional image of a three-dimensional world, image noise, and the effects of changing lighting. The complexity of the human visual system and its ability to recognise objects has proved difficult to replicate using instrumentation.
The traditional approach taken to computer visual recognition is to segment the captured image of the object (where possible), process the image to find features, and attempt to match the features with stored features of known objects. Feature finding involves complex methods that require significant processing overheads. These methods produce models or parameterised elements which are used to search, and attempt to derive feature measurements which are used to estimate a feature fit against stored feature measurements of known objects.
A 2004 US Navy research paper titled "Robust Recognition of Ship Types from an Infrared Silhouette" (Jorge Alves, Jessica Herman and Neil C. Rowe; US Naval Postgraduate School, Monterey) discloses a method which uses edge histograms and neural networks to identify ship types. The paper reports that its feature extraction methods required approximately 55 seconds per image. This represents a significant delay between when the image is taken and when the characteristics of the features are identified, a delay that can be critical in security applications where, for example, image recognition is being used to identify moving objects such as vehicles, aircraft or vessels.
Another approach, taken by an Asian defence research agency, is to correlate image frames against a 3D model. This approach relies on obtaining and maintaining 3D wire models of all objects in the domain. The set of 3D models is rotated to match the attitude and aspect of the image of the object. The image is then correlated against the set of three-dimensional models of objects and probabilities of matches are applied. This approach is not easily implemented: maintaining 3D models of all objects in the domain is unrealistic in practice, and the approach does not account for modifications of objects from the base model.

US Patent No. 6,597,800 discloses an "Automatic target recognition apparatus and process". That method concentrates on methods of extracting features, on segmentation methods, and on the organisation of the feature extraction apparatus. By contrast, the present invention describes a methodology of identification rather than apparatus for feature extraction and segmentation.

US Patent No. 4,845,610 discloses "Target recognition using string to string matching". This approach creates a one-dimensional description of the perceived target's boundary; effectively, it performs matching of the external outline shape.
US Patent No. 6,118,886 discloses an "Automatic target recognition apparatus and method". This approach details the processing involved in achieving the features needed for classification. The classification technique uses a fractal dimension value to measure target edge contortion to attempt classification against land mines.
US Patent Application No. 2003/0138146 discloses a method for determining the attributes of features within a captured image, for example, determining the readings of various instrument indicators from an image of a cockpit instrument panel. The method relies upon the use of a template to extract sub-images at known locations, i.e. each instrument, from the overall captured image to obtain the required feature, i.e. the instrument indicator. Processing of the extracted feature allows the instrument reading to be determined. Because it is a prerequisite of the method that the cockpit instrument panel is known and conforms to the template used, the method is not suitable for identification of an unknown object.
It is an object of the present invention to provide an alternative approach to target recognition.

SUMMARY OF THE INVENTION
In its broadest form, the invention provides a method of creating a multi-dimensional search space that can be used to categorise noisy data. The method first defines a description language for the features whose separation allows for the expected noise variance in real-life readings. This is done such that each feature type maps to one or more dimensions (arranged in an orthogonal basis). The set of features results in a vector representing the image or data set.

According to the invention there is provided a method of recognising the identity of a target object from a plurality of known objects, said method including the steps of: obtaining a digital visual image of said target object; applying a feature extraction method to said image to extract one or more visual features of said target object; for each extracted visual feature: establishing one or more values for one or more predetermined feature categorisation parameters for said extracted visual feature; and creating one or more target feature vectors based upon said established value(s); collating all created target feature vectors to form a target feature vector set;
comparing said target feature vector set with feature vector sets of known objects; and providing an indication of one or more known objects which have a feature vector set substantially matching said target feature vector set. Preferably, the establishing step further includes: establishing one or more values for one or more alternative categorisation parameters for the respective extracted visual feature; and creating one or more alternative target feature vectors based upon said further established value(s). In further preferred embodiments, all the created alternative target feature vectors are collated to form an alternative target feature vector set; whereby the method further includes: comparing said alternative target feature vector set with feature vector sets of known objects; and providing an indication of known object(s) which have a feature vector set substantially matching said alternative target feature vector set.
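The claimed steps can be sketched in outline as follows. This is a minimal illustration with hypothetical function and field names; the patent does not prescribe any particular implementation, and the matching score used here (overlap fraction) is an assumption.

```python
def categorise_feature(feature):
    # Map one extracted visual feature to a feature vector using
    # predetermined categorisation parameters (here: relative position
    # and a coarse shape class; hypothetical field names).
    return (feature["x_pct"], feature["y_pct"], feature["shape"])

def build_signature(features):
    # Collate one feature vector per extracted feature into a
    # target feature vector set.
    return {categorise_feature(f) for f in features}

def match_signatures(target_sig, known_objects):
    # Compare the target feature vector set with the feature vector
    # sets of known objects and rank by how substantially they match
    # (illustrative score: fraction of a known signature found in the
    # target signature).
    scores = {}
    for name, sig in known_objects.items():
        scores[name] = len(target_sig & sig) / max(len(sig), 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, a target whose only feature categorises to (10, 15, "S") would rank a known object carrying that same vector above one that does not.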
In preferred implementations, a target image is analysed by feature extraction methods to identify groups of pixels that are related in some way. This could be by region colour, edge detection or other means to determine the boundaries and extent of a feature.
Each feature is categorised using chosen categorisations. The result is a vector representing the feature. Using fuzzy logic, multiple vectors can be created for the same feature, representing different possibilities of categorising the feature under different categorisations. A search space is populated with the positions of all known object feature vectors. Each target image vector is then checked against the search space to determine which object vectors are within a defined distance (its nearest neighbours). The categorisation ensures that vectors within the defined distance categorise to the same vector point. The matching object vectors represent candidates for classifying the input image or data.
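The fuzzy-vector and nearest-neighbour ideas above might be sketched as follows. The tolerance, the category scheme, and the dictionary-based search space are illustrative assumptions, not the patent's prescribed structures.

```python
import itertools

def fuzzy_vectors(x_pct, y_pct, shape, tol=1):
    # Near a category boundary, emit alternative vectors for the same
    # feature so that a noisy reading still lands on a known point.
    xs = range(x_pct - tol, x_pct + tol + 1)
    ys = range(y_pct - tol, y_pct + tol + 1)
    return [(x, y, shape) for x, y in itertools.product(xs, ys)]

def neighbour_objects(search_space, vectors):
    # search_space maps a vector point to the set of object identities
    # whose signatures contain that point; collect every object found
    # at any candidate point (the "nearest neighbours").
    found = set()
    for v in vectors:
        found |= search_space.get(v, set())
    return found
```

A target feature measured as (11, 15, "S") would still retrieve an object stored at (10, 15, "S"), because the alternative vectors cover the neighbouring positions.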
The total feature fit is assessed for the target giving an overall probability of target identification.
Preferred implementations of the present invention provide the advantage of ease of calculation and the speed at which object recognition can be performed concurrently against extremely large data sets of known object feature vectors. When the only object signature features processed are those located within a defined proximity of the target signature features, the search time is logarithmic with respect to the number of signatures.

BRIEF DESCRIPTION OF THE DRAWINGS
A preferred embodiment of the present invention will now be described with reference to the accompanying drawings, in which:

Fig. 1 illustrates a captured image of a target object in the form of a ship;

Fig. 2 is an exploded view of a feature of the ship of Fig. 1, being the ship's funnel;
Fig. 3 illustrates an example of a search space for a feature vector.

DESCRIPTION OF PREFERRED EMBODIMENT

The present invention can be implemented on common computing platforms such as a PC, Apple, IBM, Sun, or HP utilising commonly known operating systems such as Windows, HPUX, Solaris, Linux, BSD Unix, or Mac OS, and in common programming languages such as Visual Basic, C++, C#, Pascal and Java. It will be appreciated, however, that the present invention is not specific to any particular hardware or software implementation, and is at a conceptual level above specifics of implementation. It is to be understood that various other embodiments and variations of the invention may be produced without departing from the spirit or scope of the invention. The following is provided to assist in understanding the practical implementation of one embodiment of the invention.
The following implementation utilises the following elements:
• A multi-dimensional search space. A multi-dimensional search space is defined by each of the category types selected. Each dimension of the multi-dimensional space should represent an orthogonal category type. The full set of categories to be used defines a vector for a feature. Each vector defines a feature in some way.
• A target signature, being a collection of features that identify an object. More than one signature may be applicable to the same object.
• A data set of object signatures. This is the total data set of known targets. Each target may have one or more signatures. Each signature has one or more feature vectors.
• A target signature result data set. The target signature result data set links the object identification, the signature identifier and the results of feature matching of the object signature features against the target image features.
• Feature Categorisation Definition. Feature measurements are first defined into categorisations. These categorisations could be relative position, colour, density, shape or some other type of value describing the object. The choice of the coarseness of the categorisations, and of the method determining which set of values equates to a categorisation, needs to be made dependent on the application domain. It is important to choose domain-specific categorisations that separate different objects rather than categorisations that apply to almost all objects. For example, a human face categorisation of "Has Eyebrows" (everyone has at least one) would not greatly differentiate between most humans, whereas "Eye Colour" would. Finally, the categorisations are ordered where possible to allow distance or fuzzy vectors to match.
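As a toy illustration of an ordered categorisation (the eye-colour example above), category values can be placed in an assumed ordering so that a distance between them is meaningful for fuzzy matching; unordered categories would only support exact equality.

```python
# Hypothetical ordering; the patent only requires that categorisations
# be ordered "where possible" so that distances can be compared.
EYE_COLOURS = ["brown", "hazel", "green", "blue", "grey"]

def category_distance(a, b, ordering=EYE_COLOURS):
    # Distance between two category values along the assumed ordering.
    return abs(ordering.index(a) - ordering.index(b))
```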
Using the structures above, the target signature result data set is initially cleared.
The search space is populated with the features contained in the data set of object signatures. This can be the full set or a selected group of signatures. Feature vectors from the selected object signatures identify their position in the search space and the identity of the object.

Images from cameras are typically arranged in a two-dimensional matrix of pixel colours. These colours could be represented as a single intensity value (e.g. 0-255) or a colour value (e.g. RGB). Each pixel represents a single colour at a location on the image. Using different techniques, a subset of the pixels representing the target can be selected from the full set of image pixels (segmentation). This is the target image.
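A minimal sketch of the segmentation step, assuming a simple intensity threshold on a greyscale matrix; the patent leaves the actual segmentation technique open, and real implementations would use more robust methods.

```python
def segment(image, threshold=128):
    # image: 2-D matrix (list of rows) of intensity values 0-255.
    # Return the set of pixel coordinates considered part of the
    # target (naive threshold segmentation for illustration only).
    return {(r, c)
            for r, row in enumerate(image)
            for c, v in enumerate(row)
            if v >= threshold}
```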
Known feature extraction methods are used to identify the areas in the target image that are to be considered features. An example of a suitable feature extraction method is disclosed in US Patent No. 6,597,800, the disclosure of which is incorporated herein by way of reference. These features represent areas or points of interest. A set of pixels identified by these methods represents the feature.
The identified features in a target image are categorised by the predefined categorisation methods. This provides a set of feature vectors. This categorisation is performed in exactly the same manner as for the known object features. Each feature identifies a unique point in the vector space.
Each object feature vector's proximity to the target feature vector's position is assessed and a total measure of fit established. This measure is then used to update the target signature result data set. If the target signature is not in the target signature result data set, its details are added from the data set of object signatures. The distance from a measured point to these vectors provides a goodness of fit for the classification. Targets in the target signature result data set with the best measure of fit can then be considered as candidates for identification.
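One way the proximity assessment and measure of fit could look in practice is sketched below. The Euclidean metric, the ordinal treatment of the shape axis, and the matching radius are all illustrative assumptions; the patent leaves the distance and fit measures open.

```python
import math

def feature_distance(a, b):
    # Distance between two (x%, y%, shape) vectors in the search space;
    # the shape axis is treated as ordinal for illustration.
    shapes = {"tall": 0, "squarish": 1, "wide": 2}
    return math.sqrt((a[0] - b[0]) ** 2
                     + (a[1] - b[1]) ** 2
                     + (shapes[a[2]] - shapes[b[2]]) ** 2)

def measure_of_fit(target_vectors, object_vectors, radius=3.0):
    # Fraction of target feature vectors with an object feature vector
    # within the defined proximity.
    hits = sum(
        1 for t in target_vectors
        if any(feature_distance(t, o) <= radius for o in object_vectors))
    return hits / max(len(target_vectors), 1)
```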
Continued assessment, by repeating this method with multiple images, can be performed to improve candidate identification, reducing the effect of noise until enough confidence is achieved to declare identification of a target. Categorisation can be narrowed or widened to account for noise variance in the image sources. To illustrate, two successive images of a scene will typically not contain exactly the same pixel colours for every pixel in the image. Lighting changes, cloud movement, etc. cause changes in the colours. This can result in the existence, size and shape of features changing from frame to frame. An extremely fine categorisation would be a measurement (i.e. this object occupies 23 x 2 pixels), whereas a more generalised categorisation may say that the object is 'tall' and 'thin'. With the generalised categorisation it does not matter that, between image samples, the feature comes to occupy 24 x 3 pixels: it will still receive the same categorisation of 'tall' and 'thin'. Alternative strategies employing sub-pixel comparison can address this issue, but with a significant increase in processing overhead.
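The coarse size categorisation in this illustration can be sketched as follows; the exact thresholds are assumptions, chosen only so that pixel-level noise between frames does not change the category.

```python
def size_category(width_px, height_px):
    # Coarse categorisation of a feature's extent: a feature measured
    # as 23 pixels high by 2 wide in one frame and 24 by 3 in the next
    # maps to the same ('tall', 'thin') categories, absorbing noise.
    height_cat = "tall" if height_px > 2 * width_px else "short"
    width_cat = "thin" if width_px < height_px / 2 else "broad"
    return (height_cat, width_cat)
```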
Categorisation further helps reduce the size of signatures by eliminating common elements between objects. In some domains, nearly all of the objects being assessed may have the same categorisation for certain features. All ships have a large area on the bottom called a hull, and therefore categorising the hull as a feature does not assist the search process (i.e. since all ships have hulls, all objects in the data set would have a hull feature). The vector space allows multiple points to be identified as belonging to the same object. This allows variations to be captured as multiple vector points, assisting in the determination of fit during the matching phase.
Depending on the selection of categorisations, the method can provide significant separation between total feature vectors, allowing significant differentiation between objects.
Judicious selection of the categories used within the signatures can make them resilient to variations in object size and image aspect, as well as to scene lighting and sensor sensitivity.
The resultant target signatures can be used to train the object signature data sets.
As an illustrative example of the method being used in practice, consider the need to identify a ship at sea. Figure 1 is a segmented black and white image of a ship 10 at sea. This is a target image in the context of the method. The target image is scanned and the sub features of the target image are determined. For example, the rectangle 12 located on the ship's funnel 14 may be selected as a feature as shown in Figure 2.
If the categorisation methods chosen include the relative position of the feature as a percentage of the vessel's length and height, then this feature is located 10% from the left and 15% from the top. The categorisation of shape as tall, squarish or wide would categorise this feature as squarish. This results in a feature vector of [10, 15, S], where the elements represent 10% from the left, 15% from the top, and "S" indicating squarish. The three elements of the vector are the three dimensions in the search space, as shown in Figure 3.
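The construction of the example [10,15,S] vector from raw pixel measurements might be sketched as follows; the rounding and the shape-ratio thresholds are illustrative assumptions.

```python
def feature_vector(feat_x, feat_y, feat_w, feat_h, vessel_w, vessel_h):
    # Relative position as a percentage of vessel length and height,
    # plus a coarse shape class: T (tall), W (wide), or S (squarish).
    x_pct = round(100 * feat_x / vessel_w)
    y_pct = round(100 * feat_y / vessel_h)
    ratio = feat_h / max(feat_w, 1)
    shape = "T" if ratio > 2 else ("W" if ratio < 0.5 else "S")
    return (x_pct, y_pct, shape)
```

A squarish feature at 10% of the vessel length and 15% of its height yields the vector (10, 15, "S") of the example.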
The target feature vector identifies a position in the vector space. A search of the vector space near that location may find a set of features from different ships. For every ship whose feature set the target feature vector matches, the probability of a unique identification is increased. This process continues until all features in the target image have been assessed.
The set of ships that have been selected is then checked for goodness of fit. This can be achieved by comparing the number of features expected on the vessel and the number of features in the target image against the number that matched. This provides an overall measure of the probability that each candidate ship matches the image.
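One plausible way to combine these counts into a single score is a Dice-style overlap between expected features and image features; the patent does not fix the formula, so this is an assumption for illustration.

```python
def overall_fit(n_expected, n_in_image, n_matched):
    # Penalise both expected object features missing from the image and
    # image features that found no match (Dice-style overlap score).
    return 2 * n_matched / max(n_expected + n_in_image, 1)
```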
A sequence of ship images can be assessed using this method to build confidence in the ship's class and/or the individual ship name.
While the present invention has been described with reference to a specific embodiment, it will be appreciated that various modifications and changes could be made without departing from the scope of the invention.