WO2007048844A1

WO2007048844A1 - Method for processing a representative source image of at least one object, processing device, corresponding distance map and a computer software product

Info

Publication number: WO2007048844A1
Application number: PCT/EP2006/067876
Authority: WO
Inventors: Sylvain Le Gallou; Christophe Garcia; Gaspard Breton; Renaud Seguier
Original assignee: France Telecom
Priority date: 2005-10-28
Filing date: 2006-10-27
Publication date: 2007-05-03

Abstract

The invention relates to a method for processing a representative source image (53) of at least one object consisting in building (61) a distance map by associating the smallest distance from the pixel to the outlines of said at least one object to each pixel of the source image (53) according to the outline image obtained at a preliminary stage of searching for at least one outline of said at least one object, in superseding said source image by the distance map at the input of a processing chain, and in processing (63) the distance map (62) in said processing chain.

Description

A method of processing a source image representative of at least one object, processing device, distance map and corresponding computer program product.

1. Field of the invention

The field of the invention is that of the processing of images and image sequences, such as video sequences.

More specifically, the invention relates to the processing of images to improve their quality, especially when an image to be processed does not have uniform illumination. The invention finds applications particularly, but not exclusively, in the field of video coding for video telephony, the animation of synthetic faces, visual speech recognition, the analysis of expressions and emotions, and face tracking and recognition.

Thus, the invention applies for example to the recognition of deformable objects such as faces, requiring a fine analysis of gestures and expressions of faces, in an image or a sequence of images.

2. Prior Art

To date, several image processing techniques are known. These techniques are for example applied to the recognition of objects in an image or a sequence of images. For example, in the context of an object recognition application, T.F. Cootes, G.J. Edwards, and C.J. Taylor presented in the document "Active Appearance Models" (1998, Vol.4, p.484-498, 1998) a technique based on Active Appearance Models, hereafter called AAM, which can realistically synthesize the shape and texture of visual objects for reconstruction.

According to this technique, an appearance model, composed of a shape (a set of interrelated points of interest) and a texture (the grayscale components within the shape), is generated from a learning base composed of images of objects similar to the object to be recognized (possibly extracted from learning video sequences). To do this, the points of interest of the objects are annotated, and the modes of joint variation between the positions of the points and the content of the image are learned automatically by a principal component analysis, hereafter denoted PCA. These models make it possible in particular to identify points of interest in face images automatically and robustly.

An active appearance model can thus be used to generate a set of plausible representations, in terms of shape and texture, of the learned objects. This appearance model is used in particular to search for visual objects in images by jointly using the shape and texture information, through an optimization process on the model parameters, in order to best fit the model to the image area containing the object.

Unfortunately, there is no known technique to improve the quality of treatment when an image does not have a uniform illumination.

Thus, a major disadvantage of the prior art technique based on active appearance models is that it does not take lighting conditions into account when generating an active appearance model or when searching for an object in an image.

In other words, this prior art technique does not take into account non-homogeneous illumination in the images, that is to say, variations of the illumination conditions within the same image, or variations in the illumination conditions between the image to be processed and a base model, which may in particular result from over-lighting, under-lighting, or side lighting.

Therefore, appearance models constructed under normal lighting conditions do not allow a reliable search for shape and texture under different illuminations. It is therefore difficult to recognize an object reliably in an image, especially when the image contains shadows.

Thus, the major disadvantage of deformable appearance model methods lies in their low robustness to variable illumination conditions, the statistical models created by PCA being linear and therefore not robust to overall image variations, and particularly to lighting variations.

For the sake of simplification and illustration, the disadvantages of the prior art have been presented here in relation to the particular application to the recognition of objects in an image. It is clear however that this discussion can be transposed to other applications requiring image processing to improve the quality when the image does not have a uniform illumination.

3. Objectives of the invention

The invention particularly aims to overcome these disadvantages of the prior art.

More specifically, an object of the invention is to provide an image processing technique for improving the quality of processing when said image does not have a uniform illumination.

In other words, the object of the invention is to provide such a treatment technique that is robust to variations in lighting conditions.

Another object of the invention is to provide such a technique having improved performance over prior art techniques for recognizing objects in an image, in which the illumination conditions can vary greatly.

Yet another object of the invention is to propose such a technique that is compatible with existing techniques based on deformable models or artificial neural networks. The invention also aims to provide such a technique that is simple and inexpensive to implement.

4. Presentation of the invention

These objectives, as well as others that will appear later, are achieved using a method of processing a source image representative of at least one object. According to the invention, such a method comprises:
- a step of constructing a distance map associating with each pixel of the source image the smallest distance from the pixel to one of the contours of the object(s), from an outline image obtained during a preliminary step of searching for at least one contour of the object(s);
- a step of replacing said source image by said distance map at the input of a processing chain;
- a step of processing the distance map in said processing chain.

Thus, the invention is based on a completely new and inventive approach to the processing of a source image, based on the construction of a distance map associated with the source image, and on the processing not of the textured source image directly, but of the distance map associated with the source image. In other words, the distance map replaces the source image during the processing step. This substitution makes it possible in particular to improve the quality of the processing, in particular when the source image does not have homogeneous illumination.

The processing step may in particular be implemented by a conventional processing chain. For example, a distance map thus constructed can be used directly at the input of a processing chain corresponding to an artificial neural network, or at the input of a processing chain implementing active models of appearance.

Thus, the invention makes it possible to take into account the distance relationships between the different contours composing an object, instead of directly taking into account the value of the colors or gray levels of the source image, as proposed by the prior techniques. The images to be processed according to the invention are thus more robust in the face of illumination.

In particular, the step of constructing a distance map can implement a calculation of a Euclidean distance between at least one pixel of the source image and each of the pixels of the contour image, and an assignment of the smallest calculated distance to said pixel, delivering a distance image.

The step of constructing a distance map can also implement a normalization of the image of distances, and a reversal of the normalized image, delivering the distance map.

Thus, the darker the pixels of the distance map, the farther they are from the contours, and the lighter the pixels, the closer they are to a contour.

In particular, the processing step may implement recognition of the object in the source image. The invention thus makes it possible to recognize more reliably than the techniques of the prior art an object in an image, whatever the illumination conditions.

According to a particular embodiment of the invention, the object is a deformable object. In particular, deformable object is understood to mean an object whose densest contours are the most informative: for example a face, a tire, etc.

According to this embodiment, the step of processing the distance map can include sub-stages of: matching of a deformable model, representative of the object, to the distance map; recognition of the object in the source image, by adjustment of the deformable model.

In particular, the deformable model can be generated by implementing the following steps:
- constructing a learning base comprising at least two base images representative of an object similar to said deformable object;
- associating with each of the base images a distance map obtained by associating with each pixel of the base image the smallest distance from the pixel to one of the outlines of the similar object in the base image;
- generating the deformable model from the distance maps.

The invention also relates to a device for processing a source image representative of at least one object.

According to the invention, such a device comprises: means for constructing a distance map associating with each pixel of the source image the smallest distance from the pixel to one of the contours of said at least one object, from an outline image previously derived from means for searching at least one contour of said at least one object, means for replacing the source image by the distance map at the input of a processing chain; means for processing the distance map in the processing chain.

Such a device can in particular implement the treatment method as described above.

In particular, as indicated above, the construction means and the processing means can be implemented in two distinct entities, a first delivering the distance maps associated with the images input to said first entity, and a second comprising a conventional processing chain. For example, a distance map thus constructed can be used directly at the input of a processing chain corresponding to an artificial neural network, or at the input of a processing chain implementing active appearance models.

The invention also relates to a distance map associated with a source image representative of at least one object. According to the invention, such a distance map associates with each pixel of the source image the smallest distance from the pixel to one of the contours of said at least one object, from an outline image obtained during a preliminary step of searching for at least one contour of said at least one object.

Finally, the invention relates to a computer program product downloadable from a communication network and/or stored on a computer-readable medium and/or executable by a microprocessor, comprising program code instructions for the implementation of the processing method as described above.

5. List of figures

Other characteristics and advantages of the invention will emerge more clearly on reading the following description of a particular embodiment, given as a simple illustrative and nonlimiting example, and the appended drawings, among which:
- FIGS. 1A and 1B illustrate the construction of a distance map from a source image according to the invention;
- FIGS. 2A to 2D show an example of application of the invention to the recognition of a face in a source image, from a distance map as presented in relation with FIG. 1B;
- FIGS. 3A and 3B illustrate the evolution of the texture and the evolution of the shape during the adjustment of a model on a face;
- FIG. 4 illustrates the performance of the invention;
- FIG. 5 presents a simplified diagram of the structure of the processing device according to the invention;
- FIG. 6A illustrates the general principle of the processing method according to the invention, and FIG. 6B illustrates its application to the recognition of objects in an image.

6. Description of an embodiment of the invention

6.1 General principle

The general principle of the invention is based on the implementation of a pre-processing applied to an image, in particular to improve the quality of the image processing when the image does not have uniform illumination. In other words, the invention provides a pre-processing that is robust to lighting variations.

This pre-processing is based in particular on the calculation of a distance map from a source image and on the substitution of the calculated distance map for the source image, so that processing is carried out directly on the distance map.

In other words, and as illustrated in relation to FIG. 6A, the invention proposes a method of processing a source image 53 representative of at least one object, comprising:
- a step 61 of constructing a distance map, associating with each pixel of the source image the smallest distance from the pixel to one of the contours of the object or objects of the image, delivering a distance map 62;
- a step of replacing the source image by the distance map at the input of a processing chain;
- a step 63 of processing the distance map 62 in the processing chain, delivering a processed image 54.

The invention thus proposes to replace the texture images, composed of pixels, with distance maps, at the input of a processing chain implementing, for example, Active Appearance Models or artificial neural networks. The distance maps correspond more precisely to images containing the distance information between the different contours of the objects (for example the eyes, the nose and the mouth for a face) found in the source images (texture images). According to the invention, therefore, the processing is not carried out directly on the texture of the image, taking into account the value of the colors or gray levels of the source image, but on the distance map. For this reason, the distance maps make it possible to overcome, at least in part, the effects of light variations which make the use of Active Appearance Models unstable, for example, when the illumination is arbitrary. The invention particularly relates to the use of this pre-processing for applications involving the recognition of a deformable object in an image.

6.2 Creating a distance map

In the following, the steps implemented for the construction of a distance map from a source image, called the original texture image, are described with reference to FIGS. 1A and 1B.

To do this, we consider a grayscale source image, illustrating for example a face.

In particular, if the source image is in color, the color image is first converted into a grayscale image by calculating the luminance component L of the HSL ("Hue, Saturation and Luminance") color space.

To this end, it is recalled that a color image can be considered as the combination of three grayscale images, each represented by a matrix containing the values of the pixels: a first image R corresponding to the red level, a second image G corresponding to the green level, and a third image B corresponding to the blue level of the color image. Thus, the transformation of a color image into a grayscale image implements the relationship $L = 0.2989 \cdot R + 0.5870 \cdot G + 0.1140 \cdot B$, with L the new grayscale (luminance) image.
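As an illustration, the luminance relationship above can be sketched in a few lines of Python. This is only a minimal sketch: the function name `to_grayscale` and the representation of the image as rows of (R, G, B) tuples are assumptions made for the example, not part of the described method.

```python
def to_grayscale(rgb_image):
    """Convert an image given as rows of (R, G, B) tuples into rows of
    luminance values, using L = 0.2989*R + 0.5870*G + 0.1140*B."""
    return [
        [0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row]
        for row in rgb_image
    ]


# Hypothetical 2x2 color image: white, black, pure red, pure blue.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
gray = to_grayscale(image)
```

Because the three coefficients sum to 0.9999, a pure white pixel maps to a value just below 255.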

A. Histogram equalization

The first step of constructing a distance map associated with the source image is based on a known technique of histogram equalization of the source image.

The principle of the histogram equalization of an image is recalled below. An image of size (M, N) is conventionally defined by a matrix of size (M, N) containing the gray levels of the pixels of the image, namely integer values ranging from 0 for black to 255 for white.

It is thus possible to construct a histogram representing the distribution of the intensities of the pixels of an image, that is to say the number of pixels for each luminous intensity. More precisely, a statistical graph is conventionally constructed with the gray level (for example, going from black to white) on the abscissa, and the number of pixels on the ordinate. Thus, the histogram of an image in 256 gray levels (from 0 to 255) is represented by a graph having 256 values on the abscissa, and the number of pixels per gray-level value in the image on the ordinate. In order to improve the contrast of the image, a histogram equalization is applied, that is, a transformation of the histogram of the image that yields a lighter, darker, or more normalized image. This equalization can notably be made adaptive by performing it block by block. It is recalled that when an image is divided into blocks, only parts of its matrix are considered at a time.

Such an adaptive histogram equalization technique is especially proposed by K. Zuiderveld in his document "Contrast Limited Adaptive Histogram Equalization" (Graphics Gems IV, p.474-485, 1994).

For example, according to a particular embodiment of the invention, the source image is divided into 64 blocks by dividing the horizontal and vertical axes into 8, and the histogram shape is modified to follow a Rayleigh distribution with distribution parameter α equal to 2, to obtain an equalized image. It is recalled that the probability density function of the Rayleigh distribution (of random variable r) is $P(r) = \frac{r}{\alpha^2} \, e^{-r^2 / (2\alpha^2)}$.
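To make the principle of histogram equalization concrete, here is a minimal sketch of a plain global equalization in Python. It is deliberately simpler than the embodiment described above: it does not split the image into blocks and it targets a uniform histogram rather than a Rayleigh-shaped one. The function name `equalize_histogram` and the list-of-rows image representation are assumptions of the example.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization of a grayscale image given as rows
    of integers in the range [0, levels - 1]."""
    # Histogram: number of pixels for each gray level.
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1

    # Cumulative distribution of the gray levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)

    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]

    # Remap each gray level so that the cumulative distribution becomes
    # approximately linear over the full range.
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in image]


# Hypothetical low-contrast 2x2 image stretched to the full 0..255 range.
equalized = equalize_histogram([[50, 100], [150, 200]])
```

An adaptive variant would apply the same kind of remapping separately inside each block, which is what the block-based scheme cited above refines.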

B. Smoothing

The second step of constructing a distance map is based on the smoothing of the equalized image obtained.

The use of a low-pass filter makes it possible in particular to attenuate the noise of the image, and thus to smooth the image. More precisely, the application of a filter to an area of the image implements a convolution product which, for each pixel of the area to which it is applied, modifies the pixel's value as a function of the values of the neighboring pixels, weighted by coefficients.

Classically, a filter is represented by a matrix whose center corresponds to the pixel concerned. The values of the matrix coefficients define the properties of the filter: high-pass, low-pass, band-pass, directional, etc.

For example, according to this particular embodiment of the invention, the equalized image is smoothed by means of an averaging filter F:

More specifically, the filtering is implemented by positioning the center of the filter successively at each pixel of the image and by averaging the neighboring pixels weighted by the coefficients of the filter.
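The filtering procedure just described can be sketched as follows. Since the coefficients of the filter F are not reproduced in this text, the sketch assumes a uniform 3x3 window (all coefficients equal) and clips the window at the image borders; both choices, as well as the name `mean_filter`, are assumptions of the example.

```python
def mean_filter(image, size=3):
    """Smooth a grayscale image (rows of numbers) with a uniform
    size x size averaging window centered on each pixel."""
    h, w = len(image), len(image[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the neighbors inside the window, clipped at borders.
            vals = [image[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


# Hypothetical image with a single bright pixel in the center.
smoothed = mean_filter([[0, 0, 0],
                        [0, 9, 0],
                        [0, 0, 0]])
```

The isolated bright pixel is spread over its neighborhood, which is the noise-attenuating effect expected from a low-pass filter.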

In particular, it may be noted that the first two steps of histogram equalization and smoothing are optional, but make it possible to improve the quality of the treatment.

C. Extraction of contours

The third step of constructing a distance map is based on the application, to the smoothed image, of an edge extractor, making it possible to obtain a contour image of the source image.

It is recalled that the extraction of contours is a phase of detection of the edges of objects in the image. These edges are characterized by discontinuities of gray levels on either side of the contours. These discontinuities can in particular be detected by calculating the gradient of the image. In addition, filters having particular coefficients make it possible to obtain a good estimate of the directional derivatives of the image with respect to an axis. Then, by keeping only the pixels having a gradient greater than a predetermined threshold (thresholding operation), only the strongest discontinuities are preserved, which correspond to the contours of the most relevant objects.

During the contour extraction operation, for example, the value 0 is assigned to the pixels having a value greater than the predetermined threshold, and the value 1 to the pixels having a value lower than said threshold. A binary image of contours is thus created, making it possible to visualize the contours in black on a white background image.

In particular, in order neither to reduce the number of contours by imposing a threshold that is too high, nor to recover a large number of non-significant contours by imposing a threshold that is too low, a threshold is determined, according to the invention, block by block inside the image. In other words, the smoothed image is divided into at least two blocks of pixels, each block is assigned a threshold determined according to the content of the block, and, for each block, the values of the pixels of the block are compared with the threshold of that block, keeping only the pixels having a gradient greater than this threshold. Indeed, with a single threshold for the whole image, low-contrast areas are not taken into account, since the gradient calculated there is too low, even though important contours may be present. Choosing a block-by-block threshold therefore makes it possible to adapt to the different contrasts of the image.

For example, returning to the example described above, the contour extraction phase is performed by means of a Sobel filter on the 64 blocks of the image, according to this particular embodiment of the invention.

Indeed, the directional gradients can be approximated by means of Sobel filters in the horizontal and vertical directions.

By applying these two filters to the smoothed image, two gradient images $I_x$ and $I_y$ are obtained. Computing a magnitude then makes it possible to work with a single gradient image, which can be thresholded block by block in order to recover the contours of the objects in the image.

The magnitude image B is defined as the sum of the matrices $I_x$ and $I_y$ after their values have been squared, that is, $B = I_x^2 + I_y^2$ element by element. In particular, the effective value of the noise is estimated by the square root of the mean of B. For example, the threshold is defined as 75% of this value. Finally, as indicated above, the value 0 is assigned to the pixels of the magnitude image B having a value greater than the threshold thus defined, corresponding to a black contour, and the value 1 to the pixels having a value lower than this threshold. A binary contour image is thus obtained, as illustrated in FIG.
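The contour extraction described above can be sketched as follows. This is a hedged illustration, not the patented implementation: the kernels are the standard 3x3 Sobel kernels, border pixels are replicated, and the names `filter3x3` and `contour_image` as well as the `blocks` and `ratio` parameters are assumptions of the example. The threshold follows the text literally: 75% of the square root of the mean of B inside each block, compared with the values of B.

```python
import math

# Standard 3x3 Sobel kernels (horizontal and vertical derivatives).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(image, kernel):
    """Apply a 3x3 kernel at every pixel, replicating border pixels."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc
    return out


def contour_image(image, blocks=8, ratio=0.75):
    """Binary contour image: 0 on contours (black), 1 elsewhere (white).
    The image is cut into blocks x blocks tiles, each with its own threshold."""
    h, w = len(image), len(image[0])
    ix = filter3x3(image, SOBEL_X)
    iy = filter3x3(image, SOBEL_Y)
    # Magnitude image B = Ix^2 + Iy^2, element by element.
    b = [[ix[y][x] ** 2 + iy[y][x] ** 2 for x in range(w)] for y in range(h)]

    out = [[1] * w for _ in range(h)]
    ys = [h * k // blocks for k in range(blocks + 1)]
    xs = [w * k // blocks for k in range(blocks + 1)]
    for by in range(blocks):
        for bx in range(blocks):
            cells = [(y, x)
                     for y in range(ys[by], ys[by + 1])
                     for x in range(xs[bx], xs[bx + 1])]
            if not cells:
                continue  # tile is empty when the image is very small
            mean_b = sum(b[y][x] for y, x in cells) / len(cells)
            threshold = ratio * math.sqrt(mean_b)
            for y, x in cells:
                if b[y][x] > threshold:
                    out[y][x] = 0
    return out


# Hypothetical 4x6 image with a vertical step edge between columns 2 and 3.
step = [[0, 0, 0, 100, 100, 100] for _ in range(4)]
edges = contour_image(step, blocks=1)
```

With `blocks=8` on a full-size image, this corresponds to the 64-block division used in the example embodiment.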

D. Distance Map

The fourth step is based on the construction of the distance map, from the binary contour image, associating with each pixel of the contour image the distance to its nearest contour in the contour image.

As indicated above, the invention is based on the construction of a distance map, such that the distance map replaces the source image during image processing, so as to improve the quality of the processing when the source image does not have uniform illumination.

According to the invention, a distance map is constructed from a binary contour image, in which the pixels representative of the contours bear, for example, a value equal to 0 (black), and the pixels representative of the background of the image have a value equal to 1 (white).

According to a particular embodiment of the invention, the coordinates of the pixels representative of the contours are stored in a table of contours. Let $(x_{table}, y_{table})$ denote the coordinates of the pixels in the table.

Then, for each pixel $(x_{pixel}, y_{pixel})$ of the contour image, the Euclidean distance $D_e$ between this pixel and each pixel $(x_{table}, y_{table})$ of the contour table is calculated:

$D_e = \sqrt{(x_{pixel} - x_{table})^2 + (y_{pixel} - y_{table})^2}$

Finally, each pixel $(x_{pixel}, y_{pixel})$ of the contour image is assigned the smallest Euclidean distance $D_e$ obtained, thereby delivering a distance image $I_D$.

According to this particular embodiment of the invention, this distance image $I_D$ is normalized between 0 and 255 so that it can be displayed.

To do this, for example, each pixel of the distance image $I_D$ is modified according to the following relation, to define a normalized image $I_N$:

$p_N = 255 \cdot \frac{p - p_{min}}{p_{max} - p_{min}}$

where:
- $p_N$ denotes the new value of the pixel in the normalized image $I_N$;
- $p$ denotes the value of the pixel in the distance image $I_D$;
- $p_{min}$ and $p_{max}$ respectively denote the minimum value and the maximum value reached by the pixels of the distance image $I_D$.

At this point, the darker a pixel (value close to 0), the closer it is to a contour, and the lighter a pixel (value close to 255), the farther it is from a contour. Finally, to improve visual comfort, the normalized image $I_N$ can be inverted to generate the distance map:

Distance map = 255 - $I_N$.

This last operation makes it possible in particular to invert the scale of the gray levels, which gives a better visual fluidity: the dark pixels become clear and the light pixels become dark.

In the distance map thus created, as illustrated in relation to FIG. 1B for example, it can be seen that the more a pixel is dark, the farther away it is from the outlines, and the brighter a pixel is, the closer it is to an outline.
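The whole of step D (contour table, smallest Euclidean distance, normalization to the range 0 to 255, and inversion) can be sketched as follows. This is a brute-force illustration under stated assumptions: the function name `distance_map` is hypothetical, the input follows the convention given above (0 for contour pixels, 1 for background), and a constant distance image is mapped to 255 everywhere to avoid a division by zero.

```python
import math


def distance_map(contours):
    """Build the inverted, normalized distance map of a binary contour
    image (rows of 0/1 values, 0 = contour pixel)."""
    h, w = len(contours), len(contours[0])

    # Table of contours: coordinates of every contour pixel.
    table = [(x, y) for y in range(h) for x in range(w) if contours[y][x] == 0]
    if not table:
        raise ValueError("the contour image contains no contour pixel")

    # Distance image: smallest Euclidean distance to any contour pixel.
    dist = [[min(math.hypot(x - xt, y - yt) for xt, yt in table)
             for x in range(w)]
            for y in range(h)]

    # Normalization between 0 and 255.
    p_min = min(min(row) for row in dist)
    p_max = max(max(row) for row in dist)
    span = (p_max - p_min) or 1.0  # guard against a constant distance image
    normalized = [[255 * (p - p_min) / span for p in row] for row in dist]

    # Inversion: light pixels are close to a contour, dark pixels far away.
    return [[255 - p for p in row] for row in normalized]


# Hypothetical one-row contour image with a single contour pixel on the left.
dm = distance_map([[0, 1, 1, 1, 1]])
```

The brute-force search costs one distance evaluation per (pixel, contour pixel) pair; dedicated Euclidean distance transform algorithms compute the same image much faster on large inputs.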

The invention thus makes it possible to take into account the distance relationships between the different contours composing an object.

Thus, according to the invention, the value of the colors or gray levels of the source image is not taken into account directly when processing this image. The invention thus makes it possible to overcome, at least in part, the effects of light variations, by delivering an image that is robust to illumination.

6.3 Applications of distance maps to the recognition of objects in an image

As indicated above and as illustrated in connection with FIG. 6B, the invention particularly relates to the use of distance maps for applications in the recognition of at least one object in an image.

Thus, in the context of an application to face recognition, for example, the processing step 63 implements recognition of a face in the source image 53.

More specifically, and as indicated above, a distance map 62 associated with the source image 53 is constructed. On the other hand, in a step 64, a learning base is constructed from a set of representative facial images 55 associated with one or more persons. This learning base is notably composed of distance maps associated with each of the images of said set, and makes it possible to construct a deformable model 65.

During the processing step 63, the deformable model 65 is matched to the distance map 62 associated with the source image 53, and a face is then recognized in the source image 53 by adjustment of the deformable model 65.

The processing step 63 finally delivers a processed image 54, in which the shape and texture of the face present in the source image 53 are found.

An object may in particular be represented by a point of interest. It is thus recalled that a distance map can be substituted for a source image during image processing (the construction of a distance map corresponding to a pre-processing that is robust to illumination), and thus be used at the input of a processing chain, for example one implemented by active appearance models or artificial neural networks, in place of the original texture images, also called source images.

It is recalled in particular that, according to the invention, the information contained in these distance maps is fundamental since it concerns the distances between the different contours of the objects found in the source images, thus making it possible to eliminate, or at least reduce, the effects related to light variations that make the use of active appearance models unstable.

The use of distance maps in the Active Appearance Model (AAM) method for face recognition is described in detail below. It should be noted that this technique is classically broken down into three main stages:
- a learning phase, which makes it possible to create a model as well as a parameter enabling it to be deformed;
- a phase of creation of experiment matrices, which, through a certain number of experiments, gives a relation between the modification of the appearance parameter of the model and the adjustment of the model on images;
- a segmentation phase, which makes it possible to adjust the model on new images.

A. Learning phase

Remember that the learning phase creates a deformable model both in shape and texture.

Classically, we build a learning base, composed for example of faces, from a set of images, from which we extract different shapes and textures of faces. According to this example, the learning base obtained is therefore composed of a set of faces characterized by their shape and their texture.

According to the invention, the learning base is constructed not from faces directly extracted from each of the images of a set of images, but from distance maps associated with each of the images of said set.

Thus, in a first step, the images of said set are transformed into a distance map, as previously described.

Then, using the annotations of the eyebrows, eyes, nose, mouth and face contours that define the shapes, the textures contained in the shapes are retrieved. These textures correspond in particular to the pixels of the distance maps and their gray levels.

The creation of the deformable model is then implemented according to a conventional technique.

First, the shapes (faces) are aligned by means of a Procrustes transformation, which brings all the shapes of the learning base into the same orientation and the same size ratio as the mean shape $x_{mean}$, corresponding to the average of all the shapes of the learning base. A principal component analysis (PCA) is then used to create a statistical shape model:

$x_i = x_{mean} + \varphi_x \cdot b_x$

with $x_i$ the synthesized shape; $x_{mean}$ the mean shape; $\varphi_x$ the matrix of the principal eigenvectors of the PCA; and $b_x$ the vector controlling the synthesized shape.

Then, all the textures of the learning base are warped so as to be applied to the mean shape $x_{mean}$. A PCA is then applied to this set of textures aligned on the mean shape, in order to create a statistical texture model:

$g_i = g_{mean} + \varphi_g \cdot b_g$

with $g_i$ the synthesized texture; $g_{mean}$ the mean texture; $\varphi_g$ the matrix of the principal eigenvectors of the PCA; and $b_g$ the vector controlling the synthesized texture.
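Both statistical models have the same linear form: a mean vector plus a combination of principal eigenvectors weighted by a control vector. A minimal sketch of this synthesis step is given below; the function name `synthesize`, the row-major layout of the eigenvector matrix (one row per coordinate, one column per mode) and the numerical values are assumptions of the example, and the PCA that would produce the eigenvectors is not shown.

```python
def synthesize(mean, eigvecs, b):
    """Return mean + eigvecs . b, where eigvecs has one row per coordinate
    of the shape (or texture) vector and one column per retained mode."""
    return [m + sum(row[k] * b[k] for k in range(len(b)))
            for m, row in zip(mean, eigvecs)]


# Hypothetical model: 2 points (x1, y1, x2, y2) and 2 modes of variation.
mean_shape = [1.0, 2.0, 3.0, 4.0]
phi = [[1.0, 0.0],
       [0.0, 1.0],
       [1.0, 0.0],
       [0.0, 1.0]]

shape = synthesize(mean_shape, phi, [0.5, -1.0])
```

Setting the control vector to zero returns the mean itself, and varying one component moves the synthesized shape or texture along the corresponding mode of variation.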

After weighting the vector $b_x$ by a weight $W_x$, determined from $\lambda_x$ and $\lambda_p$, the eigenvalues of the shape and texture PCAs, so as to make it of the same order of magnitude as the vector $b_g$, these vectors are concatenated.

A new PCA is then performed on the concatenation of $W_x \cdot b_x$ and $b_g$, delivering a statistical appearance model:

$b = \begin{bmatrix} W_x \cdot b_x \\ b_g \end{bmatrix} = \varphi \cdot c$

with $\varphi$ the matrix of the principal eigenvectors of the PCA; and c the vector jointly controlling the shape and texture of the model, called the appearance parameter. Indeed, c controls the parameters $b_x$ and $b_g$, which themselves control the shape and texture respectively.

Moreover, in order to reconstruct the desired shapes and textures, it is preferable to introduce a pose parameter t. This parameter controls the position, the scale and the orientation of the reconstructed object: $t = [S_x \; S_y \; T_x \; T_y]$ with $S_x = s \cos(\theta) - 1$ and $S_y = s \sin(\theta)$, where $s$ denotes the scale factor (homothety ratio) and $\theta$ the angle of rotation, and $T_x$ and $T_y$ the translations in x and y.
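As an illustration of the pose parameter, the sketch below applies t to a set of 2-D points. It assumes the usual AAM convention $S_x = s\cos(\theta) - 1$ and $S_y = s\sin(\theta)$, so that $t = 0$ is the identity transform; the function names are introduced only for this example.

```python
import numpy as np

def pose_from_similarity(s, theta, tx, ty):
    """Encode scale s, rotation theta and translation as t = [Sx, Sy, Tx, Ty]."""
    return np.array([s * np.cos(theta) - 1.0, s * np.sin(theta), tx, ty])

def apply_pose(points, t):
    """Apply the pose t to an (n_points, 2) array of (x, y) coordinates."""
    sx, sy, tx, ty = t
    # Scaled rotation matrix rebuilt from Sx and Sy.
    a = np.array([[1.0 + sx, -sy],
                  [sy, 1.0 + sx]])
    return points @ a.T + np.array([tx, ty])
```

With this encoding, starting the search from t = 0 (as in the algorithm of the segmentation phase) means starting from the untransformed model.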

The parameters c and t thus make it possible to represent the objects learned in the learning base. In particular, these parameters can be automatically adjusted to recognize a new unknown object in a new source image.

B. Creation phase of the experiment matrices

It is recalled that the phase of creation of so-called "experimental" matrices gives a relation between the position and the shape of the deformable model and the way it should evolve, in order to better adjust the deformable model to the shape sought in the image.

Indeed, thanks to the previous steps, each image of the learning base contains an object synthesized by a certain value of the appearance parameter c.

Let $c_0$ be the value of the appearance parameter c in image i of the learning base. By modifying the parameter $c_0$ by a difference $\delta c$, and the pose parameter t (position, scale and rotation) by a difference $\delta t$, a new shape $x_m$ and a new texture $g_m$ are synthesized. Consider now the texture $g_t$ of the original image i lying inside the shape $x_m$. The pixel difference $\delta g = g_t - g_m$, together with a multivariate linear regression over a certain number of experiments (modifications of the images of the learning base by $\delta c$ and $\delta t$), gives a relation between $\delta c$ and $\delta g$, and between $\delta t$ and $\delta g$: $\delta c = R_c \, \delta g$ and $\delta t = R_t \, \delta g$, with $R_c$ and $R_t$ the experimental matrices.

C. Segmentation phase
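The regression of phase B, which yields the experimental matrices used in the segmentation phase, can be sketched as an ordinary least-squares fit. This is only an illustration: the description does not specify the solver, and the function name and array layout are assumptions.

```python
import numpy as np

def learn_experiment_matrix(delta_params, delta_g):
    """Fit R such that delta_params[i] is approximated by R @ delta_g[i].

    delta_params: (n_experiments, n_params) known perturbations (delta c or delta t).
    delta_g:      (n_experiments, n_pixels) measured texture differences.
    """
    # lstsq solves delta_g @ X = delta_params in the least-squares sense.
    x, *_ = np.linalg.lstsq(delta_g, delta_params, rcond=None)
    return x.T                                   # (n_params, n_pixels)
```

The same function would be called once with the $\delta c$ perturbations to obtain $R_c$ and once with the $\delta t$ perturbations to obtain $R_t$.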

It is recalled that the segmentation phase allows, in turn, the implementation of the search for a particular texture and shape (whose modes of variation were learned in the first phase of learning) in new images.

It is recalled in particular that the modification of the parameters c and t makes it possible to adjust the deformable model on the object to be found in a new image.

The algorithm for searching for an object in a new image, denoted source image, can for example take the following form:

1 - Generate $g_m$ and $x$ from the parameters c and t, initially equal to 0;

2 - Calculate $g_t$, corresponding to the texture of the source image lying inside the shape $x$;

3 - Evaluate $\delta g_0 = g_t - g_m$ and the error $E_0 = |\delta g_0|$;

4 - Predict $\delta t_0 = R_t \, \delta g_0$ and $\delta c_0 = R_c \, \delta g_0$;

5 - Find the first attenuation coefficient k such that $E_j < E_0$, with $k \in \{1.5, 0.5, 0.25, 0.125, 0.0625\}$ and $E_j = |g_{tj} - g_{mj}|$, where $g_{mj}$ is the texture created by $c_j = c - k \, \delta c_0$ and $g_{tj}$ is the texture of the image under $x_j$ (shape created by $c_j$ and modified by $\delta t_0$);

6 - As long as the error $E_j$ is not stable, start again at step 1 with $c = c_j$ and $t = \delta t_0$.
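The search steps above can be sketched as the loop below. It is a simplified illustration under stated assumptions: `synthesize` and `sample_texture` are hypothetical callables standing in for the model synthesis and for the sampling of the source image under a shape, the error is taken as the Euclidean norm of the texture difference, and the pose is updated as `t - k * dt0`, mirroring the update of c (the description does not spell out the pose update in this form).

```python
import numpy as np

ATTENUATIONS = (1.5, 0.5, 0.25, 0.125, 0.0625)

def aam_search(synthesize, sample_texture, r_c, r_t, max_iter=50, tol=1e-8):
    """Iteratively adjust c and t; returns (c, t, final_error).

    synthesize(c, t) -> (g_m, x): model texture and shape (hypothetical).
    sample_texture(x) -> g_t: image texture under shape x (hypothetical).
    """
    c = np.zeros(r_c.shape[0])
    t = np.zeros(r_t.shape[0])
    g_m, x = synthesize(c, t)                             # step 1
    error = np.linalg.norm(sample_texture(x) - g_m)       # steps 2-3
    for _ in range(max_iter):
        dg0 = sample_texture(x) - g_m
        dc0, dt0 = r_c @ dg0, r_t @ dg0                   # step 4
        for k in ATTENUATIONS:                            # step 5
            c_j, t_j = c - k * dc0, t - k * dt0           # pose update: assumption
            g_mj, x_j = synthesize(c_j, t_j)
            e_j = np.linalg.norm(sample_texture(x_j) - g_mj)
            if e_j < error:
                break
        else:
            return c, t, error        # no coefficient reduces the error
        gain = error - e_j
        c, t, g_m, x, error = c_j, t_j, g_mj, x_j, e_j    # step 6
        if gain < tol:                # error considered stable
            break
    return c, t, error
```

According to the invention, `sample_texture` would read the distance map associated with the source image rather than the source image itself; the loop is otherwise unchanged.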

Once the algorithm has converged, a representation of $g_t$ through the model is synthesized in $g_m$. Indeed, the appearance parameter c, which characterizes the model, makes it possible at the end of the algorithm to obtain a representation of the shape $x$ and the texture $g_m$ of the object present in the image. The AAM method can therefore be useful both in synthesis and in analysis.

An example of the application of this technique to the recognition of a face is now presented with reference to FIGS. 2A to 2D.

According to the invention, and as illustrated in FIG. 2A, a distance map is created for a source image containing a new face to be recognized.
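The construction of such a distance map can be sketched as follows, using the Euclidean distance, normalization and inversion steps recited in claims 2 and 3. This is a brute-force illustration: it assumes the contour image is available as a boolean mask, and the normalization to the range [0, 1] is an assumption of this example.

```python
import numpy as np

def distance_map(contour_mask):
    """Inverted, normalized distance of every pixel to the nearest contour pixel.

    contour_mask: 2-D boolean array, True on contour pixels.
    Returns values in [0, 1], equal to 1 on the contours.
    """
    h, w = contour_mask.shape
    contour = np.argwhere(contour_mask)                   # (n_contour, 2)
    if contour.size == 0:
        raise ValueError("contour image contains no contour pixel")
    rows, cols = np.mgrid[0:h, 0:w]
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)
    # Distance from each pixel to each contour pixel, then keep the smallest.
    diff = pixels[:, None, :] - contour[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1).reshape(h, w)
    peak = dist.max()
    normalized = dist / peak if peak > 0 else dist        # assumed range [0, 1]
    return 1.0 - normalized                               # inversion
```

The memory cost grows with the number of pixels times the number of contour pixels, so for full-size images a dedicated routine such as SciPy's `scipy.ndimage.distance_transform_edt`, applied to the negated mask, would likely be preferable.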

The deformable model, as defined above, is then matched to the distance map associated with the source image, as illustrated in FIG. 2B, using the appearance parameter c and the pose parameter t. A shape $x$ and a texture $g_m$ are thus generated from the parameters c and t (initially equal to 0), by adjusting the deformable model to the object being recognized.

Once the shape $x$ has been determined, that is, once the object has been recognized as illustrated in FIG. 2C, it remains to determine the texture $g_t$ lying inside the shape $x$ in the source image, as shown in FIG. 2D.

Classically, these different steps are reiterated in order to better adjust the deformable model on the object to be recognized.

FIGS. 3A and 3B illustrate this adjustment on two examples of face search, one showing only the evolution of the texture (FIG. 3A) and the other showing only the evolution of the shape (FIG. 3B). These figures show in particular that, after several iterations of the preceding steps, the model is better adjusted to the shape and/or the texture.

Finally, the AAM algorithm itself remains unchanged whether an object is recognized in an image according to a conventional technique or according to the invention. In both situations, only the images taken into account during the creation of the learning base and during the segmentation phase differ. The invention thus proposes to substitute a distance map for a "classic" image, so as to improve the image processing, especially when the image does not have uniform illumination.

6.4 Performance

We now present, in connection with Figure 4, the performance of the invention.

More precisely, FIG. 4 shows the error curve obtained by implementing the image processing method according to the invention applied to face recognition in an image (curve 41) and the error curve obtained by implementing a conventional method of face recognition in an image (curve 42). To compare these performances, a database from the Robotics Institute of Carnegie Mellon University (CMU) was used: the "PIE" database, presented by T. Sim, S. Baker, and M. Bsat in "The CMU Pose, Illumination, and Expression (PIE) Database" (Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002). This base contains 68 faces taken from different angles and under 21 different illuminations for each angle of view. To produce these error curves, 8 faces were selected among the 68 of the PIE base. Of these 8 faces, 4 are used during the AAM learning phase, and 4 during the segmentation phase. In other words, 4 faces are "learned" to form the learning base, and 4 other faces are "not learned" and considered as objects to be recognized.

In particular, for each of the 4 faces used during the learning phase, a particular illumination corresponding to a front illumination is considered. The other 20 illuminations of these faces are used during the segmentation phase.

Thus, in FIG. 4, the average error is plotted on the ordinate as a function of the illumination number on the abscissa.

Specifically, illuminations 2 to 8 correspond to lighting of varying strength and height on the left side of the face. Illuminations 16 to 22 correspond to lighting of varying strength and height on the right side of the face. Finally, illuminations 9 to 15 correspond to frontal lighting of the face.

The errors are expressed per point, as a fraction of the distance between the eyes; that is, an error of 1 corresponds to an error at each point of the model equal to the distance between the two eyes. Each point of the curves of FIG. 4 corresponds to the average of the errors made by the model during the face search in the 8 images of different faces under the same given illumination.
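This error measure can be illustrated by the short sketch below. It assumes that the per-point errors are Euclidean distances averaged over the points of the model, and that reference positions for the model points and for the two eyes are available; the function name and arguments are hypothetical.

```python
import numpy as np

def normalized_point_error(found, truth, left_eye, right_eye):
    """Mean point-to-point error divided by the inter-eye distance.

    found, truth: (n_points, 2) arrays of model points and reference points.
    left_eye, right_eye: (2,) reference eye positions.
    """
    eye_dist = np.linalg.norm(np.asarray(right_eye, float) - np.asarray(left_eye, float))
    per_point = np.linalg.norm(np.asarray(found, float) - np.asarray(truth, float), axis=1)
    return per_point.mean() / eye_dist
```

Averaging this value over the 8 test images of a given illumination would give one point of a curve such as those of FIG. 4.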

It is first noted that the error curve 41 is below the error curve 42. It is thus noted that the invention (curve 41), based on the use of distance maps, makes it possible to find the characteristic features of the faces. It can be noted that in this example, only 4 images of correctly lit faces were learned by the model.

On the other hand, the classical method (curve 42) makes errors in this search for characteristic features as soon as the illumination is no longer uniform (left-hand illumination: illuminations 2 to 8, and right-hand illumination: illuminations 16 to 22).

It is also noted that the technique according to the invention is independent of the direction of illumination (side lighting). Indeed, curve 41 remains stable and varies only slightly across the various illuminations, while curve 42 rises significantly for illuminations 2 to 8 and 16 to 22, that is to say when the lighting comes from the side.

6.5 Processing device

Finally, the structure of the processing device according to the invention is presented with reference to the accompanying figure. Such a device comprises a memory M 51, and a processing unit 50 equipped with a processor μP, driven by a computer program Pg 52. The processing unit 50 receives as input a source image 53 representative of at least one object. The processor μP then builds, according to the instructions of the program Pg 52, a distance map associating with each pixel of the source image 53 the smallest distance from the pixel to one of the contours of the object or objects, and carries out a processing of the distance map.

The processing unit 50 thus outputs a processed image 54.

In the context of an application to face recognition, for example, the processing unit 50 may also receive as input a set of representative facial images 55 associated with one or more persons.

The processor μP then builds, according to the instructions of the program Pg 52, a distance map associated with each of the images of said set, to generate a learning base, and builds a deformable model from the learning base.

During the processing of the source image 53, the μP processor matches the model to the distance map associated with the source image 53 and adjusts this model.

Finally, the processing unit 50 outputs a processed image 54, in which the shape and texture of the face present in the source image 53 are found.

Claims

1. Method for processing a source image (53) representative of at least one object, characterized in that it comprises: a step of constructing (61) a distance map associating with each pixel of said source image (53) the smallest distance from said pixel to one of the contours of said at least one object, from a contour image obtained during a preliminary step of searching for at least one contour of said at least one object; a step of replacing said source image by said distance map at the input of a processing chain; a processing step (63) of said distance map (62) in said processing chain.

2. Process for processing a source image according to claim 1, characterized in that said step of constructing (61) a distance map implements a calculation of a Euclidean distance between at least one pixel of said source image (53) and each of the pixels of said contour image, and an assignment of the smallest calculated distance to said pixel, delivering an image of distances.

3. Method for processing a source image according to claim 2, characterized in that said step of constructing (61) a distance map also implements a normalization of said image of distances, and an inversion of said normalized image, delivering said distance map (62).

4. Process for processing a source image according to any one of claims 1 to 3, characterized in that said processing step (63) implements a recognition of said object in said source image (53).

5. Process for processing a source image according to any one of claims 1 to 4, characterized in that said object is a deformable object.

6. Process for processing a source image according to claim 5, characterized in that said step of processing (63) said distance map comprises substeps of: - matching a deformable model, representative of said object, to said distance map; recognizing said object in said source image, by adjustment of said deformable model.

7. Process for processing a source image according to claim 6, characterized in that said deformable model is generated by implementation of the following steps: construction of a learning base comprising at least two basic images representative of an object similar to said deformable object; associating with each of said base images a distance map obtained by associating with each pixel of said base image the smallest distance from said pixel to one of the contours of said similar object in said base image; generating said deformable model from said distance maps.

8. Method for processing a source image according to any one of claims 1 to 7, characterized in that said processing chain implements an element belonging to the group comprising: an active appearance model; an artificial neural network.

9. Process for processing a source image according to any one of claims 1 to 8, characterized in that said object is a face.

10. Device for processing a source image representative of at least one object, characterized in that it comprises: means for constructing a distance map associating with each pixel of said source image the smallest distance from said pixel to one of the contours of said at least one object, from a contour image previously derived from means for searching for at least one contour of said at least one object; means for replacing said source image with said distance map at the input of a processing chain; means for processing said distance map in said processing chain.

11. Distance map associated with a source image representative of at least one object, characterized in that said distance map associates with each pixel of said source image the smallest distance from said pixel to one of the contours of said at least one object, from a contour image obtained during a preliminary step of searching for at least one contour of said at least one object.

12. Computer program product downloadable from a communication network and/or stored on a computer-readable medium and/or executable by a microprocessor, characterized in that it comprises program code instructions for the implementation of the method according to at least one of claims 1 to 9.