WO2006111914A1 - Detecting a moving object - Google Patents

Detecting a moving object

Info

Publication number
WO2006111914A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion vector
video images
group
basis
pixels
Prior art date
Application number
PCT/IB2006/051174
Other languages
French (fr)
Inventor
Ralph Braspenning
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Publication of WO2006111914A1 publication Critical patent/WO2006111914A1/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/14Picture signal circuitry for video frequency region
    • H04N5/144Movement detection
    • H04N5/145Movement estimation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods

Definitions

  • the invention relates to a detection unit for detecting a moving object in a sequence of video images.
  • the invention further relates to an image processing apparatus comprising: receiving means for receiving a signal corresponding to a sequence of video images; and a detection unit for detecting a moving object in the sequence of video images.
  • the invention further relates to a method of detecting a moving object in a sequence of video images.
  • the invention further relates to a computer program product to be loaded by a computer arrangement, comprising instructions to detect a moving object in a sequence of video images, the computer arrangement comprising processing means and a memory.
  • the detection unit comprises: a motion estimation unit for computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; a filter for computing a combined motion vector field by filtering the number of motion vector fields; a motion compensation unit for computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; a subtraction unit for computing a difference image on basis of a second one of the video images and the motion compensated image; and a selection unit for selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.
  • the detection unit is arranged to estimate the background motion which is caused by the movement of the camera relative to the scene.
  • the background motion is estimated by combining a number of motion vector fields.
  • An estimation, i.e. a motion compensated image, is computed for a particular time instance on basis of the background motion.
  • the actually captured image corresponding to that particular time instance, i.e. the second one of the video images and the motion compensated image are subtracted from each other to compute a difference image. It is assumed that a number of the pixels of the difference image having extreme values, which may be either relatively high or relatively low pixel values, correspond to the moving object to be detected.
  • the areas in the difference image having extreme values correspond to relatively large deviations between the estimation and the actually captured representation of the scene, comprising the moving object, for the particular time instance. It is assumed that the estimation is a relatively good representation for the background but does not represent the moving object well, since the moving object is typically much smaller than the background and hence is represented by much fewer pixels.
  • Pixel values may correspond to luminance and/or color.
  • Alternatively, the pixel values may correspond to another physical quantity, e.g. temperature or reflectivity of radiation with a predetermined wavelength.
  • It should be noted that a number of the components, i.e. the motion estimation unit, the filter, the motion compensation unit, the subtraction unit and the selection unit, might be implemented by a single processor. Besides that, a number of the processing steps which are performed by these components may be partly performed in parallel for the video images. Consequently, the different images, e.g. the video images, the motion compensated image, and the difference image, may have mutually different dimensions. E.g. the motion compensated image may correspond to only a block of pixels of one of the video images.
  • the filter is arranged to compute the combined motion vector field by averaging respective motion vectors of the number of motion vector fields.
  • By averaging respective motion vectors is meant that the corresponding motion vectors, i.e. those belonging to respective groups/blocks of pixels of consecutive pairs of video images with mutually equal coordinates, are combined.
  • the averaging is based on separately averaging the x-components and the y-components of the respective motion vectors.
  • the different motion vector fields may have mutually equal contributions to the combined motion vector field, but preferably the filter is arranged to compute the combined motion vector field by weighted averaging respective motion vectors of the number of motion vector fields.
  • the weighting coefficients are related to the distance in time between the particular time instance and the time instance to which the corresponding motion vector field belongs.
  • the filter is arranged to compute the combined motion vector field by recursively filtering respective motion vectors of the number of motion vector fields.
  • the filter is arranged to compute the combined motion vector field by order statistical filtering respective motion vectors of the number of motion vector fields.
  • order statistical filtering corresponds to median filtering. The advantage of order statistical filtering is that outliers are removed.
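The plain and weighted averaging variants described above can be sketched as follows. This is an illustrative NumPy sketch with hypothetical function and parameter names, not the implementation of the patent; it only shows separate per-component combination of corresponding motion vectors.

```python
import numpy as np

def combine_fields_average(fields, weights=None):
    """Combine motion vector fields by (weighted) per-component averaging.

    fields: list of arrays of shape (H, W, 2) holding (x, y) motion vectors
            for blocks with mutually equal coordinates.
    weights: optional per-field coefficients, e.g. larger for fields closer
             in time to the particular time instance of interest.
    """
    stack = np.stack(fields, axis=0)        # shape (N, H, W, 2)
    if weights is None:
        return stack.mean(axis=0)           # mutually equal contributions
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                         # normalize the coefficients
    return np.tensordot(w, stack, axes=1)   # weighted average per component
```

Because the x- and y-components sit in the last axis, averaging over the field axis combines them separately, as the text prefers.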
  • the selection unit is arranged to select the group of mutually related pixels, wherein the pixels of the group are mutually connected. Connected means that each of the pixels of the group has at least one direct neighbor which is located adjacent to that pixel. The at least one direct neighbor may be located at one of the borders or at one of the corners of the particular pixel.
  • the group of pixels may have any shape, even irregular.
  • the selection unit is arranged to select the group of mutually related pixels on basis of the spatial distribution of the pixels of the group.
  • the spatial distribution preferably corresponds to the shape of the object to be detected.
  • the shape of the object can be reasonably well characterized by the aspect ratio of the group of pixels, i.e. the ratio between the height and width of the object. So preferably, the selection unit is arranged to select the group of mutually related pixels on basis of a ratio between a first number of pixels of the group being horizontally disposed and a second number of pixels of the group being vertically disposed.
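The aspect-ratio criterion can be sketched as follows. The target ratios and tolerance below are illustrative assumptions, not values from the patent, and the function name is hypothetical.

```python
def matches_aspect_ratio(group, target_ratio, tolerance=0.5):
    """Decide whether a group of pixel coordinates has roughly the expected
    height/width ratio of the object to be detected.

    group: iterable of (row, col) pixel coordinates (any shape, even irregular).
    target_ratio: expected height / width, e.g. ~3.0 for a standing pedestrian
                  or ~1.0 for a ball (assumed, illustrative values).
    """
    rows = [r for r, _ in group]
    cols = [c for _, c in group]
    height = max(rows) - min(rows) + 1   # vertical extent in pixels
    width = max(cols) - min(cols) + 1    # horizontal extent in pixels
    ratio = height / width
    return abs(ratio - target_ratio) <= tolerance * target_ratio
```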
  • the selection unit is arranged to select the group of mutually related pixels on basis of values of pixels of the second one of the video images corresponding to the respective pixels of the group.
  • Besides the pixels of the difference image, it is advantageous to apply pixels of at least one of the original non-subtracted video images. For instance, if the detection unit is configured to find pedestrians in infra red images, the pixels of at least one of the video images corresponding to relatively high temperatures are preferably applied for the detection of the moving object.
  • If the detection unit is configured to find a ball in a football match, the known color of the ball and optionally the color of the grass are used for the detection of the ball.
  • An embodiment of the detection unit according to the invention further comprises a matching unit for matching the group of mutually related pixels with a further group of mutually related pixels being determined on basis of a third one of the video images, which succeeds the second one of the video images.
  • this embodiment of the detection unit according to the invention is arranged to track the moving object.
  • the detection unit is arranged to correlate different groups of pixels in subsequent difference images and hence in subsequent video images.
  • the detection unit of the image processing apparatus comprises: a motion estimation unit for computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; a filter for computing a combined motion vector field by filtering the number of motion vector fields; a motion compensation unit for computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; a subtraction unit for computing a difference image on basis of a second one of the video images and the motion compensated image; and a selection unit for selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.
  • the method comprises: computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; computing a combined motion vector field by filtering the number of motion vector fields; computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; computing a difference image on basis of a second one of the video images and the motion compensated image; and selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.
  • the computer program product, after being loaded, provides said processing means with the capability to carry out: computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; computing a combined motion vector field by filtering the number of motion vector fields; computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; computing a difference image on basis of a second one of the video images and the motion compensated image; and selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.
  • Modifications of the detection unit and variations thereof may correspond to modifications and variations thereof of the image processing apparatus, the method and the computer program product, being described.
  • Fig. 1 schematically shows a number of images and motion vector fields and their mutual relations
  • Fig. 2 schematically shows a number of motion vector fields and a combined motion vector field
  • Fig. 3 schematically shows a detection unit according to the invention
  • Fig. 4 schematically shows a detection unit according to the invention comprising a matching unit
  • Fig. 5 schematically shows an image processing apparatus according to the invention
  • Fig. 6 schematically shows a video image on which a graphics overlay is drawn which forms a bounding box of a detected moving object.
  • Same reference numerals and signs are used to denote similar parts throughout the figures.
  • Fig. 1 schematically shows a number of images and motion vector fields and their mutual relations. The time axis is depicted with the horizontal arrow.
  • Fig. 1 shows: a number of consecutive video images V1, V2, ..., V6; a number of consecutive motion vector fields MV1, MV2, ..., MV5 which are computed on basis of the consecutive video images V1, V2, ..., V6; a number of consecutive combined motion vector fields CMV1, CMV2, ..., CMV5 which are computed on basis of the consecutive motion vector fields MV1, MV2, ..., MV5; a number of consecutive motion compensated images E2, E3, ..., E6 which are computed on basis of a number of video images V1, V2, ..., V6 and corresponding combined motion vector fields CMV1, CMV2, ..., CMV5; and a number of difference images D2, D3, ..., D6 which are computed on basis of a number of motion compensated images E2, E3, ..., E6 and corresponding video images V2, V3, ..., V6.
  • the video images V1, V2, ..., V6 are images which are captured by a video camera which is located in or moving within a scene. Moving may correspond to translation or rotation. Because of the movement the consecutive video images represent different views on the scene. That means that it looks like the scene moves relative to the camera. Typically there are objects in the scene which move independently of the camera.
  • the detection unit 300 is arranged to detect the representations, i.e. groups of pixels in the video images which correspond to these independently moving objects.
  • the camera is an infrared camera which is attached to a vehicle like a car which moves on the road.
  • the camera is arranged to capture visible light and is configured to pan in order to follow the ball in a football match.
  • the motion vector fields MV1,MV2,....,MV5 are two-dimensional matrices each comprising a number of motion vectors. Each motion vector represents the amount of two-dimensional shift to be applied to a first pixel or optionally group/block of pixels, of a first one of the video images, to find a corresponding second pixel (or optionally group/block of pixels) of a second one of the video images.
  • a motion vector field may be computed on basis of two consecutive video images.
  • In Fig. 1 it is schematically depicted that motion vector field MV2 is primarily based on video image V2 and video image V3, which are in the time domain located before and after that motion vector field MV2. However, more than two video images may be applied to compute a motion vector field. Besides that, alternative arrangements in the time domain are possible.
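The role of a single motion vector can be illustrated with a brute-force block matcher. Note that the patent relies on a true-motion estimator rather than the exhaustive search sketched here; all names below are hypothetical, and the sum-of-absolute-differences criterion is just one common matching measure.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences: a common block-matching criterion."""
    return np.abs(block_a.astype(int) - block_b.astype(int)).sum()

def best_vector(img_a, img_b, top, left, size, search=4):
    """Find the 2-D shift (dy, dx) that moves the block at (top, left) of
    img_a onto the best-matching block of img_b (exhaustive search sketch)."""
    ref = img_a[top:top + size, left:left + size]
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > img_b.shape[0] or x + size > img_b.shape[1]:
                continue                      # candidate block outside image
            cost = sad(ref, img_b[y:y + size, x:x + size])
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```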
  • the combined motion vector fields CMV1,CMV2,....,CMV5 are two-dimensional matrices each comprising a number of combined motion vectors. In connection with Fig. 2 it is explained how the combined motion vectors may be computed. In Fig. 1 it is schematically depicted that a particular combined motion vector field CMV2 is based on three consecutive motion vector fields MV1,MV2,MV3.
  • the motion compensated images E2,E3,...,E6 are computed on basis of interpolation or extrapolation of pixel values of the video images V1,V2,...,V6. As an example it is depicted that a particular motion compensated image E3 is computed on basis of a particular video image V2 and corresponding combined motion vector field CMV2. It will be clear that alternatives are possible.
  • the difference images D2,D3,...,D6 are computed on basis of subtraction of the video images V1,V2,...,V6 from the respective motion compensated images E2,E3,...,E6.
  • Alternatively, the difference images D2,D3,...,D6 are computed on basis of subtraction of the motion compensated images E2,E3,...,E6 from the respective video images V1,V2,...,V6.
  • a further alternative comprises the computation of the absolute difference between corresponding pixels of the respective images.
  • a particular difference image D3 is computed by subtraction of the particular motion compensated image E3 from the corresponding video image V3.
  • the particular motion compensated image E3 corresponds to the estimated representation of the scene, in particular the background, for the moment in time in which video image V3 was captured.
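The subtraction variants above, including the absolute-difference alternative, can be sketched as follows. The function name and the signed/absolute switch are illustrative assumptions.

```python
import numpy as np

def difference_image(video_image, motion_compensated, absolute=True):
    """Subtract the motion compensated (background) estimate from the actually
    captured image; large values mark deviations such as a moving object."""
    a = video_image.astype(int)
    b = motion_compensated.astype(int)
    return np.abs(a - b) if absolute else a - b
```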
  • Fig. 2 schematically shows a number of motion vector fields MV1, MV2 and a combined motion vector field CMV1.
  • the first one of the motion vector fields MV1 is a two-dimensional matrix comprising a number of motion vectors a1, a2, ..., a9. Each of the motion vectors a1, a2, ..., a9 corresponds to a respective pixel (or group of pixels) of one of the video images.
  • the second one of the motion vector fields MV2 is a two-dimensional matrix comprising a number of motion vectors b1, b2, ..., b9. Each of the motion vectors b1, b2, ..., b9 corresponds to a respective pixel (or group of pixels) of another one of the video images.
  • the respective pixels of the two motion vector fields have mutually equal coordinates. For instance, the first pixel for which the first motion vector, which is indicated with reference sign a1, is computed corresponds to the pixel for which the second motion vector, which is indicated with reference sign b1, is computed. Also the pixel for which the combined motion vector, which is indicated with reference sign c1, is computed corresponds to the first pixel.
  • the motion vectors of the combined motion vector field CMVi are computed by means of a weighted average, e.g. as specified in Equation 2.
  • the motion vectors of the combined motion vector field CMVi are computed recursively, by making use of a previously computed motion vector, e.g. as specified in Equation 3.
  • CMV(x, y, t) = α · CMV(x, y, t − 1) + (1 − α) · MV(x, y, t)    (3)
  • spatial filtering is applied to compute the motion vectors of the combined motion vector field CMV1, e.g. as specified in Equation 4.
  • with N the number of motion vectors used to compute a motion vector CMV(x, y, t) of the combined motion vector field.
  • Order statistical filtering is applied to compute the motion vectors of the combined motion vector field CMVi, e.g. as specified in Equation 5.
  • CMV(x, y, t) = median(MV(x, y, t − 1), MV(x, y, t), MV(x, y, t + 1))    (5)
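Equations 3 and 5 can be sketched in code. This is a minimal NumPy illustration; the recursion coefficient value is an assumption, and the per-component median is one straightforward reading of the order statistical filter.

```python
import numpy as np

def recursive_combine(prev_cmv, mv, alpha=0.9):
    """Equation 3: CMV(t) = alpha * CMV(t-1) + (1 - alpha) * MV(t).
    alpha is an assumed design choice, trading smoothness against latency."""
    return alpha * prev_cmv + (1.0 - alpha) * mv

def median_combine(mv_prev, mv_cur, mv_next):
    """Equation 5: per-component temporal median over three motion vector
    fields, which removes outliers such as vectors of small moving objects."""
    return np.median(np.stack([mv_prev, mv_cur, mv_next]), axis=0)
```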
  • Fig. 3 schematically shows a detection unit 300 according to the invention.
  • the detection unit 300 comprises: a motion estimation unit 302 for computing a number of motion vector fields MVi on basis of a number of consecutive video images Vi of the sequence which are provided via the input connector 312; a filter 304 for computing a combined motion vector field CMVi by filtering the number of motion vector fields MVi.
  • the filter 304 is a recursive filter, meaning that the output of the filter 304 is provided as input to the filter 304 by means of the internal connection indicated with reference number 316; a motion compensation unit 306 for computing a motion compensated image Ei on basis of a first one of the video images and the combined motion vector field CMVi.
  • the motion compensation unit 306 is provided with video images by means of the internal connection indicated with reference number 318 and is connected with the output of the filter 304.
  • the motion compensation unit 306 is arranged to compute the motion compensated image by means of interpolation of pixel values from one or more of the video images Vi; a subtraction unit 308 for computing a difference image Di on basis of a second one of the video images and the motion compensated image Ei.
  • the subtraction unit 308 is connected to the input connector 312 by means of the internal connection which is indicated with reference number 320.
  • the subtraction unit 308 comprises a (non-depicted) memory device for temporary storage of one or more video images.
  • the memory device is external to the subtraction unit 308 and even external to the detection unit 300.
  • the memory device is shared with other units of the detection unit 300; and a selection unit 310 for selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.
  • the group of selected pixels is provided at the output connector 314 of the detection unit 300.
  • the motion estimation unit 302, the filter 304, the motion compensation unit 306, the subtraction unit 308 and the selection unit 310 may be implemented using one processor. Normally, these functions are performed under control of a software program product. During execution, normally the software program product is loaded into a memory, like a RAM, and executed from there. The program may be loaded from a background memory, like a ROM, hard disk, or magnetic and/or optical storage, or may be loaded via a network like the Internet. Optionally an application specific integrated circuit provides the disclosed functionality.
  • the motion estimation unit 302 is preferably as disclosed in the article "True-
  • the working of the detection unit 300 according to the invention is as follows. First an estimate of the motion of the background is made. Assuming that the objects with independent motion, i.e. those to be detected, are small compared to the dimensions of the video images, an estimate of the motion of the background is made by combining a number of motion vector fields. The combining corresponds to low-pass filtering. The filtering is such that the global motion prevails and the optionally detected motion of small objects is removed.
  • the motion estimation unit 302 is configured to assign motion vectors to pixels corresponding to relatively small independently moving objects, which are substantially equal to motion vectors which do not correspond to the relatively small independently moving objects but correspond to the background.
  • the configuration may be such that motion vectors are assigned to relatively large blocks of pixels, or that the parameters which influence the convergence of the motion estimation, e.g. the number of different candidate motion vectors in the set of candidate motion vectors to be evaluated, are such that deviations are hardly detected.
  • a motion compensated image Ei is computed on basis of the combined motion vector field CMVi.
  • Ei = f(CMVi, Vi−1)    (6)
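Equation 6 leaves the function f unspecified. As one possible reading, a nearest-pixel warp of the previous video image over the combined motion vector field could look like the sketch below; this is an assumption, not the patent's interpolation scheme, and out-of-frame fetches simply fall back to the source pixel.

```python
import numpy as np

def motion_compensate(prev_image, cmv):
    """Sketch of Equation 6: build Ei by fetching, for every output pixel,
    the pixel of the previous video image that the combined (background)
    motion vector says has moved there.  Nearest-pixel, no interpolation."""
    h, w = prev_image.shape
    out = np.empty_like(prev_image)
    for y in range(h):
        for x in range(w):
            dx, dy = int(round(cmv[y, x, 0])), int(round(cmv[y, x, 1]))
            sy, sx = y - dy, x - dx          # where the content came from
            if 0 <= sy < h and 0 <= sx < w:
                out[y, x] = prev_image[sy, sx]
            else:
                out[y, x] = prev_image[y, x]  # fallback at the border
    return out
```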
  • the pixels of the difference image Di are spatially integrated, e.g. by summing the pixel values within blocks of 4×4 pixels.
  • a decision is made if a group of pixels or a block of pixels corresponds to an independently moving object or not.
  • the binary detection mask is filtered with a morphological filter to fill small holes between detected pixels or detected blocks.
  • connected clusters of detected pixels which are larger than a minimum size are identified as (parts of) independently moving objects.
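The decision steps above (spatial integration, thresholding, and keeping connected clusters above a minimum size) can be sketched as follows. Block size, threshold and minimum cluster size are assumed values, 4-connectivity is assumed, and the morphological hole-filling step is omitted for brevity.

```python
import numpy as np

def detect_objects(diff, block=4, threshold=20, min_size=2):
    """Integrate |difference| values over blocks, threshold into a binary
    detection mask, and keep connected clusters of detected blocks that are
    larger than a minimum size (flood fill, 4-connectivity)."""
    h, w = diff.shape[0] // block, diff.shape[1] // block
    sums = diff[:h * block, :w * block].reshape(h, block, w, block).sum(axis=(1, 3))
    mask = sums > threshold                       # binary detection mask
    clusters, seen = [], set()
    for start in zip(*np.nonzero(mask)):          # flood fill each cluster
        if start in seen:
            continue
        stack, cluster = [start], []
        while stack:
            y, x = stack.pop()
            if (y, x) in seen or not (0 <= y < h and 0 <= x < w) or not mask[y, x]:
                continue
            seen.add((y, x))
            cluster.append((y, x))
            stack += [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
        if len(cluster) >= min_size:              # minimum-size criterion
            clusters.append(cluster)
    return clusters
```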
  • Fig. 4 schematically shows a detection unit 400 according to the invention comprising a matching unit 402.
  • the embodiment of the detection unit 400 as schematically shown in Fig. 4 is essentially equal to the embodiment of the detection unit 300 as disclosed in connection with Fig. 3.
  • a difference is the matching unit 402 for matching the group of mutually related pixels of a first one of the difference images with a further group of mutually related pixels for a subsequent one of the difference images.
  • the detection unit 400 is arranged to detect respective groups of mutually connected pixels having values which are relatively high or low, even when in a particular difference image which is currently in consideration no such group of mutually connected pixels is directly detected in the first instance. Such detection may be unsuccessful because e.g. none of the pixels in the particular difference image exceeds a predetermined threshold.
  • a further search for a group of mutually connected pixels is performed in the particular difference image with modified selection criteria.
  • the further search for a group of mutually connected pixels is restricted to a more limited search area, which is located in the neighborhood where the group of mutually connected pixels is expected on basis of the previous actual detections.
  • the matching unit 402 is provided with motion vectors in order to establish the local neighborhood for the search.
  • the motion vectors may be provided by the motion estimation unit 302 as depicted.
  • the motion vectors are provided by the filter 304.
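The restricted, motion-predicted search neighborhood described above can be sketched as follows; the margin is an assumed search radius, and the function name is hypothetical.

```python
def predicted_search_area(prev_center, motion_vector, margin=8):
    """When no group exceeds the normal threshold in the current difference
    image, restrict a further search to the neighborhood where the object is
    expected, predicted from the previous detection plus a motion vector.

    prev_center: (row, col) of the previously detected group.
    motion_vector: (d_row, d_col) provided by the motion estimation unit
                   or the filter.
    Returns (row_min, row_max, col_min, col_max) of the search window."""
    y, x = prev_center
    dy, dx = motion_vector
    cy, cx = y + dy, x + dx                 # predicted new center
    return (cy - margin, cy + margin, cx - margin, cx + margin)
```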
  • the detection unit 300 and 400 according to the invention may be configured for several types of applications, i.e. various domains. An embodiment of the detection unit according to the invention may be applied to detect a ball in a video sequence representing a football match.
  • an image processing apparatus 500 comprising an embodiment of the detection unit according to the invention being configured for that task is disclosed.
  • In Fig. 6 an example image of another application domain is shown: infra red images. It will be clear that embodiments of the detection unit may be applied in alternative application domains.
  • Fig. 5 schematically shows an embodiment of the image processing apparatus 500 according to the invention, comprising: Receiving means 502 for receiving a signal representing video images.
  • the signal may be a broadcast signal received via an antenna or cable but may also be a signal from a storage device like a VCR (Video Cassette Recorder) or Digital Versatile Disk (DVD).
  • the signal is provided at the input connector 510;
  • An image processing unit 504 for calculating a sequence of overview images on basis of the succession of video images.
  • a part of the processing is the detection and tracking of a ball in the video images in order to create the overview images, which comprise a visualization of the trajectory, which the ball made during a certain period of time.
  • For the detection of the ball, the detection unit 300, 400 according to the invention, as described in connection with Fig. 3 or Fig. 4, respectively, is used.
  • the detection unit 300 or 400 is integrated in the image processing unit 504.
  • the output of the detection unit 300 or 400 is provided to control the image processing unit 504;
  • A display device 506 for displaying the output images of the image processing unit 504; this display device 506 is optional.
  • the image processing apparatus 500 might e.g. be a TV.
  • the image processing apparatus 500 does not comprise the optional display device 506 but provides the output images to an apparatus that does comprise a display device 506.
  • the image processing apparatus 500 might be e.g. a set top box, a satellite-tuner, a VCR player or a DVD player. But it might also be a system being applied by a film-studio or broadcaster.
  • Fig. 6 schematically shows a video image 600 on which an overlay 602 is drawn which forms a bounding box of a detected moving object 604.
  • the video image represents an infra red image which is acquired by a camera which was attached to a driving car.
  • the moving object is a pedestrian.

Abstract

A detection unit (300) for detecting a moving object in a sequence of video images is disclosed. The detection unit (300) comprises: a motion estimation unit (302) for computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; a filter (304) for computing a combined motion vector field by filtering the number of motion vector fields; a motion compensation (306) unit for computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; a subtraction unit (308) for computing a difference image on basis of a second one of the video images and the motion compensated image; and a selection unit (310) for selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.

Description

Detecting a moving object
The invention relates to a detection unit for detecting a moving object in a sequence of video images.
The invention further relates to an image processing apparatus comprising: receiving means for receiving a signal corresponding to a sequence of video images; and a detection unit for detecting a moving object in the sequence of video images.
The invention further relates to a method of detecting a moving object in a sequence of video images.
The invention further relates to a computer program product to be loaded by a computer arrangement, comprising instructions to detect a moving object in a sequence of video images, the computer arrangement comprising processing means and a memory.
Almost 70% of accidents involving pedestrians occur at night. One of the main reasons for this is the decrease in visibility at night. Furthermore, some people are also hampered by decreased visual acuity at night due to specific visual problems. To this effect, automotive companies are interested in providing drivers with on-board systems for automated obstacle/object detection or computer-enhanced imagery of the road at night. Due to the insufficient or absent illumination of the road at night, infrared (IR) cameras that capture the temperature of objects replace vision-based systems. The main issue in night vision systems is the robust detection of objects which move on the road, so as to alarm the driver or to enhance the view. A particular problem in the detection of objects is that typically the camera, which acquires the images, also moves relative to the road and of course relative to the rest of the scene in which the road is located.
It is an object of the invention to provide a detection unit of the kind described in the opening paragraph, which is relatively robust.
This object of the invention is achieved in that the detection unit comprises: a motion estimation unit for computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; a filter for computing a combined motion vector field by filtering the number of motion vector fields; a motion compensation unit for computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; a subtraction unit for computing a difference image on basis of a second one of the video images and the motion compensated image; and a selection unit for selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.
To find the moving object, i.e. the object which moves relative to the scene in which the object is present, the detection unit according to the invention is arranged to estimate the background motion which is caused by the movement of the camera relative to the scene. The background motion is estimated by combining a number of motion vector fields. An estimation, i.e. a motion compensated image, is computed for a particular time instance on basis of the background motion. The actually captured image corresponding to that particular time instance, i.e. the second one of the video images, and the motion compensated image are subtracted from each other to compute a difference image. It is assumed that a number of the pixels of the difference image having extreme values, which may be either relatively high or relatively low pixel values, correspond to the moving object to be detected. In other words, the areas in the difference image having extreme values correspond to relatively large deviations between the estimation and the actually captured representation of the scene, comprising the moving object, for the particular time instance. It is assumed that the estimation is a relatively good representation for the background but does not represent the moving object well, since the moving object is typically much smaller than the background and hence is represented by much fewer pixels.
Pixel values may correspond to luminance and/or color. Alternatively, the pixel values may correspond to another physical quantity, e.g. temperature or reflectivity of radiation with a predetermined wavelength.
It should be noted that a number of the components, i.e. the motion estimation unit, the filter, the motion compensation unit, the subtraction unit and the selection unit, might be implemented by a single processor. Besides that, a number of the processing steps which are performed by these components may be partly performed in parallel for the video images. Consequently, the different images, e.g. the video images, the motion compensated image, and the difference image, may have mutually different dimensions. E.g. the motion compensated image may correspond to only a block of pixels of one of the video images.
In an embodiment of the detection unit according to the invention, the filter is arranged to compute the combined motion vector field by averaging respective motion vectors of the number of motion vector fields. Averaging respective motion vectors means that the corresponding motion vectors, i.e. belonging to respective groups/blocks of pixels of consecutive pairs of video images with mutually equal coordinates, are combined. Preferably, the averaging is based on separately averaging the x-components and the y-components of the respective motion vectors.
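The component-wise averaging described above can be sketched as follows; the field layout (a grid of (mx, my) tuples) and all names are illustrative assumptions, not taken from the patent text.

```python
# Sketch of combining motion vector fields by per-component averaging.
# A field is modeled as a 2-D grid whose cells are (mx, my) tuples;
# this layout is an assumption for illustration.

def average_fields(fields):
    """Average corresponding motion vectors of several fields,
    x-components and y-components separately."""
    n = len(fields)
    rows, cols = len(fields[0]), len(fields[0][0])
    combined = []
    for r in range(rows):
        row = []
        for c in range(cols):
            mx = sum(f[r][c][0] for f in fields) / n
            my = sum(f[r][c][1] for f in fields) / n
            row.append((mx, my))
        combined.append(row)
    return combined

# Three 1x2 fields: the background moves ~2 px right; the second cell of
# the middle field carries a spurious vector from a small moving object.
f1 = [[(2.0, 0.0), (2.0, 0.0)]]
f2 = [[(2.0, 0.0), (8.0, 0.0)]]
f3 = [[(2.0, 0.0), (2.0, 0.0)]]
cmv = average_fields([f1, f2, f3])
```

The outlier is diluted toward the background motion (the second cell averages to (4.0, 0.0)), which is the low-pass effect the combining step relies on.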
The different motion vector fields may have mutually equal contributions to the combined motion vector field, but preferably the filter is arranged to compute the combined motion vector field by weighted averaging of respective motion vectors of the number of motion vector fields. Preferably the weighting coefficients are related to the distance in time between the particular time instance and the time instance to which the corresponding motion vector field belongs.
In an embodiment of the detection unit according to the invention, the filter is arranged to compute the combined motion vector field by recursively filtering respective motion vectors of the number of motion vector fields. The advantage of a recursive filter is its relatively easy implementation. The storage requirement is limited, because the combined output of previous processing steps is applied to the current processing step instead of storing multiple video images.
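A minimal sketch of such a recursive filtering step, applied per vector component in the manner of Equation 3; the function name and the value of alpha are illustrative assumptions.

```python
def recursive_update(prev_cmv, mv, alpha=0.8):
    """One recursive filtering step per motion vector:
    CMV(t) = alpha * CMV(t-1) + (1 - alpha) * MV(t), component-wise.
    Only the previous combined vector is stored, not the older fields."""
    return tuple(alpha * p + (1 - alpha) * m for p, m in zip(prev_cmv, mv))

cmv = (0.0, 0.0)
for mv in [(2.0, 0.0), (2.0, 0.0), (10.0, 0.0), (2.0, 0.0)]:
    cmv = recursive_update(cmv, mv)
# The transient (10.0, 0.0) vector only pulls the estimate up briefly.
```

Because each step folds the new field into the running estimate, a single stored vector field suffices regardless of how many fields contribute.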
In an embodiment of the detection unit according to the invention, the filter is arranged to compute the combined motion vector field by order statistical filtering respective motion vectors of the number of motion vector fields. Preferably the order statistical filtering corresponds to median filtering. The advantage of order statistical filtering is that outliers are removed.
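The median variant can be sketched component-wise over the corresponding vectors of three fields; treating the vector median as two independent scalar medians is a simplification assumed here for illustration.

```python
def median_vector(vectors):
    """Component-wise median of an odd number of (mx, my) vectors."""
    xs = sorted(v[0] for v in vectors)
    ys = sorted(v[1] for v in vectors)
    mid = len(vectors) // 2
    return (xs[mid], ys[mid])

# The outlier (9.0, -1.0) is removed entirely instead of being diluted,
# which is the advantage of order statistical filtering noted above.
mv = median_vector([(2.0, 0.0), (9.0, -1.0), (2.0, 0.0)])
```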
In an embodiment of the detection unit according to the invention, the selection unit is arranged to select the group of mutually related pixels, wherein the pixels of the group are mutually connected. Connected means that each of the pixels of the group has at least one direct neighbor which is located adjacent to that pixel. The at least one direct neighbor may be located at one of the borders or at one of the corners of the particular pixel. The group of pixels may have any shape, even irregular. But typically, the selection unit is arranged to select the group of mutually related pixels on basis of the spatial distribution of the pixels of the group. The spatial distribution preferably corresponds to the shape of the object to be detected. The shape of the object can be characterized reasonably well by the aspect ratio of the group of pixels, i.e. the ratio between the height and width of the object. So preferably, the selection unit is arranged to select the group of mutually related pixels on basis of a ratio between a first number of pixels of the group being horizontally disposed and a second number of pixels of the group being vertically disposed.
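The aspect-ratio criterion can be sketched as below; the group representation (a set of (x, y) coordinates) and the ratio bounds are illustrative assumptions, chosen for a tall, narrow object such as a pedestrian.

```python
def plausible_shape(group, min_ratio=1.5, max_ratio=4.0):
    """Accept a connected pixel group whose height/width ratio falls in a
    band matching the expected shape of the object to be detected."""
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    return min_ratio <= height / width <= max_ratio

tall = {(x, y) for x in range(2) for y in range(6)}   # 2 wide, 6 tall
flat = {(x, 0) for x in range(6)}                     # 6 wide, 1 tall
```

Here the tall group passes (ratio 3.0) while the flat one is rejected (ratio about 0.17).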
In an embodiment of the detection unit according to the invention, the selection unit is arranged to select the group of mutually related pixels on basis of values of pixels of the second one of the video images corresponding to the respective pixels of the group. Besides using the pixels of the difference image it is advantageous to apply pixels of at least one of the original non-subtracted video images. For instance if the detection unit is configured to find pedestrians in infra red images, the pixels of at least one of the video images corresponding to relatively high temperatures are preferably applied for the detection of the moving object. Alternatively, if the detection unit is configured to find a ball in a football match the known color of the ball and optionally the color of the grass are used for the detection of the ball.
An embodiment of the detection unit according to the invention further comprises a matching unit for matching the group of mutually related pixels with a further group of mutually related pixels being determined on basis of a third one of the video images, which succeeds the second one of the video images. In other words, this embodiment of the detection unit according to the invention is arranged to track the moving object. The detection unit is arranged to correlate different groups of pixels in subsequent difference images and hence in subsequent video images.
It is a further object of the invention to provide an image processing apparatus of the kind described in the opening paragraph which is relatively robust.
This object of the invention is achieved in that the detection unit of the image processing apparatus comprises: a motion estimation unit for computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; a filter for computing a combined motion vector field by filtering the number of motion vector fields; a motion compensation unit for computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; a subtraction unit for computing a difference image on basis of a second one of the video images and the motion compensated image; and a selection unit for selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.
It is a further object of the invention to provide a method of the kind described in the opening paragraph which is relatively robust.
This object of the invention is achieved in that the method comprises: computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; computing a combined motion vector field by filtering the number of motion vector fields; computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; computing a difference image on basis of a second one of the video images and the motion compensated image; and selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.
It is a further object of the invention to provide a computer program product of the kind described in the opening paragraph which is relatively robust.
This object of the invention is achieved in that the computer program product, after being loaded, provides said processing means with the capability to carry out: computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; computing a combined motion vector field by filtering the number of motion vector fields; computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; computing a difference image on basis of a second one of the video images and the motion compensated image; and selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object. Modifications of the detection unit and variations thereof may correspond to modifications and variations of the image processing apparatus, the method and the computer program product being described.
These and other aspects of the detection unit, of the image processing apparatus, of the method and of the computer program product, according to the invention will become apparent from and will be elucidated with respect to the implementations and embodiments described hereinafter and with reference to the accompanying drawings, wherein:
Fig. 1 schematically shows a number of images and motion vector fields and their mutual relations;
Fig. 2 schematically shows a number of motion vector fields and a combined motion vector field; Fig. 3 schematically shows a detection unit according to the invention;
Fig. 4 schematically shows a detection unit according to the invention comprising a matching unit;
Fig. 5 schematically shows an image processing apparatus according to the invention; and Fig. 6 schematically shows a video image on which a graphics overlay is drawn which forms a bounding box of a detected moving object. Same reference numerals and signs are used to denote similar parts throughout the figures.
Fig. 1 schematically shows a number of images and motion vector fields and their mutual relations. The time axis is depicted with the horizontal arrow. Fig. 1 shows: a number of consecutive video images V1,V2,...,V6; a number of consecutive motion vector fields MV1,MV2,...,MV5 which are computed on basis of the consecutive video images V1,V2,...,V6; a number of consecutive combined motion vector fields CMV1,CMV2,...,CMV5 which are computed on basis of the consecutive motion vector fields MV1,MV2,...,MV5; a number of consecutive motion compensated images E2,E3,...,E6 which are computed on basis of a number of video images V1,V2,...,V6 and corresponding combined motion vector fields CMV1,CMV2,...,CMV5; and a number of difference images D2,D3,...,D6 which are computed on basis of a number of motion compensated images E2,E3,...,E6 and corresponding video images V1,V2,...,V6.
The video images V1,V2,...,V6 are images which are captured by a video camera which is located in or moving within a scene. Moving may correspond to translation or rotation. Because of the movement the consecutive video images represent different views on the scene. That means that it looks like the scene moves relative to the camera. Typically there are objects in the scene which move independently of the camera. The detection unit 300 according to the invention is arranged to detect the representations, i.e. groups of pixels in the video images, which correspond to these independently moving objects. For instance, the camera is an infrared camera which is attached to a vehicle like a car which moves on the road. Alternatively, the camera is arranged to capture visible light and is configured to pan in order to follow the ball in a football match.
The motion vector fields MV1,MV2,....,MV5 are two-dimensional matrices each comprising a number of motion vectors. Each motion vector represents the amount of two-dimensional shift to be applied to a first pixel or optionally group/block of pixels, of a first one of the video images, to find a corresponding second pixel (or optionally group/block of pixels) of a second one of the video images. A motion vector field may be computed on basis of two consecutive video images. In Fig. 1 it is schematically depicted that motion vector field MV2 is primarily based on video image V2 and video image V3 which are in the time domain located before and after that motion vector field MV2. However, more than two video images may be applied to compute a motion vector field. Besides that alternative arrangements in the time domain are possible.
The combined motion vector fields CMV1,CMV2,....,CMV5 are two- dimensional matrices each comprising a number of combined motion vectors. In connection with Fig. 2 it is explained how the combined motion vectors may be computed. In Fig. 1 is schematically depicted that a particular combined motion vector field CMV2 is based on three consecutive motion vector fields MV1,MV2,MV3.
The motion compensated images E2,E3,...,E6 are computed on basis of interpolation or extrapolation of pixel values of the video images V1,V2,...,V6. As an example it is depicted that a particular motion compensated image E3 is computed on basis of a particular video image V2 and corresponding combined motion vector field CMV2. It will be clear that alternatives are possible.
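A crude sketch of building such a motion compensated image: each output pixel fetches the pixel that the combined motion vector points back to, using nearest-neighbor rounding instead of the interpolation mentioned above. The grid layout and (mx, my) vector tuples are illustrative assumptions.

```python
def motion_compensate(prev_image, cmv):
    """Predict the current image from the previous one:
    E(x, y) = V(x - mx, y - my), with (mx, my) the combined motion
    vector at (x, y); out-of-frame references yield 0."""
    h, w = len(prev_image), len(prev_image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mx, my = cmv[y][x]
            sx, sy = int(round(x - mx)), int(round(y - my))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = prev_image[sy][sx]
    return out

prev = [[1, 2, 3],
        [4, 5, 6]]
shift_right = [[(1, 0)] * 3 for _ in range(2)]   # background moves 1 px right
e = motion_compensate(prev, shift_right)
```

With a uniform (1, 0) field, every row of the prediction is the previous row shifted right, with a zero filled in at the uncovered left edge.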
The difference images D2,D3,...,D6 are computed on basis of subtraction of the video images V1,V2,...,V6 from the respective motion compensated images E2,E3,...,E6. Alternatively, the difference images D2,D3,...,D6 are computed on basis of subtraction of the motion compensated images E2,E3,...,E6 from the respective video images V1,V2,...,V6. A further alternative comprises the computation of the absolute difference between corresponding pixels of the respective images. As an example, it is depicted in Fig. 1 that a particular difference image D3 is computed by subtraction of the particular motion compensated image E3 from the corresponding video image V3. In this context, with corresponding is meant that the particular motion compensated image E3 corresponds to the estimated representation of the scene, in particular the background, for the moment in time at which video image V3 was captured.
Fig. 2 schematically shows a number of motion vector fields MV1, MV2 and a combined motion vector field CMV1. The first one of the motion vector fields MV1 is a two-dimensional matrix comprising a number of motion vectors a1,a2,...,a9. Each of the motion vectors a1,a2,...,a9 corresponds to a respective pixel (or group of pixels) of one of the video images. The second one of the motion vector fields MV2 is a two-dimensional matrix comprising a number of motion vectors b1,b2,...,b9. Each of the motion vectors b1,b2,...,b9 corresponds to a respective pixel (or group of pixels) of another one of the video images. The respective pixels of the two motion vector fields have mutually equal coordinates. For instance, the first pixel for which the first motion vector, which is indicated with reference sign a1, is computed corresponds to the pixel for which the second motion vector, which is indicated with reference sign b1, is computed. Also the pixel for which the combined motion vector, which is indicated with reference sign c1, is computed corresponds to the first pixel.
There are several approaches for computing the motion vectors of a combined motion vector field CMVi. For instance by computing the average of a number of motion vectors from different motion vector fields, e.g. as specified in Equation 1.
CMV(x, y, t) = (MV(x, y, t − 1) + MV(x, y, t) + MV(x, y, t + 1)) / 3    (1)

with CMV(x, y, t) being the combined motion vector of a pixel with coordinates (x, y) for moment in time t and MV(x, y, t) being the motion vector of a pixel with coordinates (x, y) for moment in time t. An average based on more than three motion vectors is also possible.
Alternatively, the motion vectors of the combined motion vector field CMVi are computed by means of a weighted average, e.g. as specified in Equation 2.
CMV(x, y, t) = α·MV(x, y, t − 1) + β·MV(x, y, t) + α·MV(x, y, t + 1)    (2)

with α and β being constants.
Alternatively, the motion vectors of the combined motion vector field CMVi are computed recursively, by making use of a previously computed motion vector, e.g. as specified in Equation 3.

CMV(x, y, t) = α·CMV(x, y, t − 1) + (1 − α)·MV(x, y, t)    (3)
Alternatively, spatial filtering is applied to compute the motion vectors of the combined motion vector field CMV1, e.g. as specified in Equation 4.

CMV(x, y, t) = (1/N) Σ MV(x', y', t)    (4)

with the sum taken over a spatial neighborhood of (x, y) and N being equal to the number of motion vectors used to compute a motion vector CMV(x, y, t) of the combined motion vector field.
Alternatively, order statistical filtering is applied to compute the motion vectors of the combined motion vector field CMVi, e.g. as specified in Equation 5.
CMV(x, y, t) = median(MV(x, y, t − 1), MV(x, y, t), MV(x, y, t + 1))    (5)
It will be clear that combinations of the different types of filtering as mentioned above, i.e. weighted average, recursive, order statistical and spatial are possible.
Fig. 3 schematically shows a detection unit 300 according to the invention. The detection unit 300 comprises: a motion estimation unit 302 for computing a number of motion vector fields MVi on basis of a number of consecutive video images Vi of the sequence, which are provided via the input connector 312; a filter 304 for computing a combined motion vector field CMVi by filtering the number of motion vector fields MVi. Optionally the filter 304 is a recursive filter, meaning that the output of the filter 304 is provided as input to the filter 304 by means of the internal connection indicated with reference number 316; a motion compensation unit 306 for computing a motion compensated image Ei on basis of a first one of the video images Vi and the combined motion vector field CMVi. The motion compensation unit 306 is provided with video images by means of the internal connection indicated with reference number 318 and is connected with the output of the filter 304. The motion compensation unit 306 is arranged to compute the motion compensated image by means of interpolation of pixel values from one or more of the video images Vi; a subtraction unit 308 for computing a difference image Di on basis of a second one of the video images and the motion compensated image Ei. The subtraction unit 308 is connected to the input connector 312 by means of the internal connection which is indicated with reference number 320. The subtraction unit 308 comprises a (non-depicted) memory device for temporary storage of one or more video images. Alternatively, the memory device is external to the subtraction unit 308 and even external to the detection unit 300. Alternatively, the memory device is shared with other units of the detection unit 300; and a selection unit 310 for selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object. The group of selected pixels is provided at the output connector 314 of the detection unit 300.
The motion estimation unit 302, the filter 304, the motion compensation unit 306, the subtraction unit 308 and the selection unit 310 may be implemented using one processor. Normally, these functions are performed under control of a software program product. During execution, normally the software program product is loaded into a memory, like a RAM, and executed from there. The program may be loaded from a background memory, like a ROM, hard disk, or magnetic and/or optical storage, or may be loaded via a network like the Internet. Optionally an application specific integrated circuit provides the disclosed functionality. The motion estimation unit 302 is preferably implemented as disclosed in the article "True-Motion Estimation with 3-D Recursive Search Block Matching" by G. de Haan et al. in IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no. 5, October 1993, pages 368-379.
The working of the detection unit 300 according to the invention is as follows. First an estimate of the motion of the background is made. Assuming that the objects with independent motion, i.e. those to be detected, are small compared to the dimensions of the video images, an estimate of the motion of the background is made by combining a number of motion vector fields. The combining corresponds to low-pass filtering. The filtering is such that the global motion prevails and the optionally detected motion of small objects is removed.
By appropriate configuration of the motion estimation unit 302, movements of relatively small objects are not even detected by the motion estimation unit 302. That means that the motion estimation unit 302 is configured to assign motion vectors to pixels corresponding to relatively small independently moving objects which are substantially equal to motion vectors which do not correspond to the relatively small independently moving objects but correspond to the background. The configuration may be such that motion vectors are assigned to relatively large blocks of pixels, or such that the parameters which influence the convergence of the motion estimation, e.g. the number of different candidate motion vectors in the set of candidate motion vectors to be evaluated, are such that deviations are hardly detected.
Then a motion compensated image Ei is computed on basis of the combined motion vector field CMVi:

Ei = f(CMVi, Vi−1)    (6)
On basis of the motion compensated image a difference image Di is computed
Di = Ei - Vi (7)
The difference image Di is further analyzed to determine whether there is a group of mutually connected pixels having extreme values. It will be clear that the operations as specified in Equation 6 and 7 can be combined in a single operation. That means that for the different pixels the following equation holds:
D(x, y, t) = |V(x, y, t) − V(x − CMVx(x, y, t), y − CMVy(x, y, t), t − 1)|    (8)

wherein CMVx(x, y, t) is the x-component of the combined motion vector of the pixel with coordinates (x, y) for moment in time t and CMVy(x, y, t) is the y-component of the combined motion vector of the pixel with coordinates (x, y) for moment in time t.
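The per-pixel difference defined above can be sketched directly, combining compensation and subtraction in one pass as the text notes; the data layout and names are illustrative assumptions.

```python
def difference_image(curr, prev, cmv):
    """D(x, y) = |V(x, y, t) - V(x - CMVx, y - CMVy, t - 1)|,
    with nearest-neighbor rounding of the vector components and 0 for
    out-of-frame references."""
    h, w = len(curr), len(curr[0])
    d = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mx, my = cmv[y][x]
            sx, sy = int(round(x - mx)), int(round(y - my))
            if 0 <= sx < w and 0 <= sy < h:
                d[y][x] = abs(curr[y][x] - prev[sy][sx])
    return d

prev = [[1, 2, 3]]
curr = [[5, 1, 7]]                 # background shifted right; last pixel changed
cmv = [[(1, 0), (1, 0), (1, 0)]]
d = difference_image(curr, prev, cmv)
```

Only the pixel that deviates from the compensated background keeps a large value; the first pixel, whose reference falls outside the frame, is set to 0.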
For robust detection the pixels of the difference image Di are spatially integrated, e.g. by summing the pixel values within blocks of 4*4 pixels. By thresholding the difference image, or preferably the integrated difference image, a decision is made whether a group of pixels or a block of pixels corresponds to an independently moving object or not. Preferably the binary detection mask is filtered with a morphological filter to fill small holes between detected pixels or detected blocks. Typically, connected clusters of detected pixels which are larger than a minimum size are identified as (parts of) independently moving objects.
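The spatial integration and thresholding step can be sketched as follows, with 2x2 blocks instead of the 4x4 blocks mentioned above to keep the example small; the threshold value is an illustrative assumption.

```python
def detect_blocks(diff, block=2, threshold=6):
    """Sum difference values inside block x block tiles and threshold the
    sums, yielding a binary detection mask per block."""
    h, w = len(diff), len(diff[0])
    mask = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            s = sum(diff[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w)))
            row.append(1 if s >= threshold else 0)
        mask.append(row)
    return mask

diff = [[0, 0, 4, 5],
        [0, 1, 3, 2]]
mask = detect_blocks(diff)   # left block sums to 1, right block to 14
```

Integrating over blocks before thresholding makes the decision robust against isolated noisy pixels, since only coherent clusters of differences accumulate above the threshold.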
Fig. 4 schematically shows a detection unit 400 according to the invention comprising a matching unit 402. The embodiment of the detection unit 400 as schematically shown in Fig. 4 is essentially equal to the embodiment of the detection unit 300 as disclosed in connection with Fig. 3. A difference is the matching unit 402 for matching the group of mutually related pixels of a first one of the difference images with a further group of mutually related pixels for a subsequent one of the difference images. By matching groups, temporal stability of detection is achieved, meaning that the object, or optionally multiple objects, are tracked. To elucidate on that, the following example is given.
It may be that in a number of subsequent difference images the detection unit 400 is arranged to detect respective groups of mutually connected pixels having values which are relatively high or low, while in a particular difference image which is currently in consideration no such group of mutually connected pixels is directly detected in first instance. Such detection may be unsuccessful because e.g. none of the pixels in the particular difference image exceeds a predetermined threshold. However, on basis of one or more actual detections in previous difference images a further search for a group of mutually connected pixels is performed in the particular difference image with modified selection criteria. Typically, the further search for a group of mutually connected pixels is restricted to a more limited search area, which is located in the neighborhood where the group of mutually connected pixels is expected on basis of the previous actual detections. Preferably the matching unit 402 is provided with motion vectors in order to establish the local neighborhood for the search. The motion vectors may be provided by the motion estimation unit 302 as depicted. Alternatively, the motion vectors are provided by the filter 304. The detection units 300 and 400 according to the invention may be configured for several types of applications, i.e. various domains. An embodiment of the detection unit according to the invention may be applied to detect a ball in a video sequence representing a football match. In connection with Fig. 5 an image processing apparatus 500 comprising an embodiment of the detection unit according to the invention being configured for that task is disclosed. In Fig. 6 an example image of another application domain is shown: infrared images. It will be clear that embodiments of the detection unit may be applied in alternative application domains.
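The restricted search described above can be sketched as a simple prediction of where the group is expected next; all names and the search radius are illustrative assumptions, not interfaces from the patent.

```python
def predict_search_area(prev_center, mv, radius=8):
    """Shift the last detected center by the local motion vector and
    return the bounding box (x0, y0, x1, y1) in which the further search
    with modified selection criteria is performed."""
    cx = prev_center[0] + mv[0]
    cy = prev_center[1] + mv[1]
    return (cx - radius, cy - radius, cx + radius, cy + radius)

# Object last seen at (100, 50), moving 3 px right and 2 px up.
area = predict_search_area((100, 50), (3, -2), radius=8)
```

Limiting the search to this box both reduces work and allows a lower detection threshold inside it without raising false alarms elsewhere.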
Fig. 5 schematically shows an embodiment of the image processing apparatus 500 according to the invention, comprising: receiving means 502 for receiving a signal representing video images. The signal may be a broadcast signal received via an antenna or cable but may also be a signal from a storage device like a VCR (Video Cassette Recorder) or Digital Versatile Disk (DVD). The signal is provided at the input connector 510; an image processing unit 504 for calculating a sequence of overview images on basis of the succession of video images. A part of the processing is the detection and tracking of a ball in the video images in order to create the overview images, which comprise a visualization of the trajectory which the ball made during a certain period of time. For the detection of the ball the detection unit 300 or 400 according to the invention, as described in connection with Fig. 3 or Fig. 4, respectively, is used. Preferably, the detection unit 300 or 400 is integrated in the image processing unit 504. Alternatively, the output of the detection unit 300 or 400 is provided to control the image processing unit 504; and
a display device 506 for displaying the output images of the image processing unit 504. This display device 506 is optional. The image processing apparatus 500 might e.g. be a TV. Alternatively the image processing apparatus 500 does not comprise the optional display device 506 but provides the output images to an apparatus that does comprise a display device 506. Then the image processing apparatus 500 might be e.g. a set top box, a satellite-tuner, a VCR player or a DVD player. But it might also be a system being applied by a film-studio or broadcaster. Fig. 6 schematically shows a video image 600 on which an overlay 602 is drawn which forms a bounding box of a detected moving object 604. The video image represents an infrared image which was acquired by a camera attached to a driving car. The moving object is a pedestrian.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word 'comprising' does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In the unit claims enumerating several units, several of these units can be embodied by one and the same item of hardware or software. The usage of the words first, second and third, etcetera does not indicate any ordering. These words are to be interpreted as names.

CLAIMS:
1. A detection unit (300) for detecting a moving object in a sequence of video images, the detection unit comprising: a motion estimation unit (302) for computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; a filter (304) for computing a combined motion vector field by filtering the number of motion vector fields; a motion compensation unit (306) for computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; a subtraction unit (308) for computing a difference image on basis of a second one of the video images and the motion compensated image; and a selection unit (310) for selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.
2. A detection unit as claimed in claim 1, wherein the filter is arranged to compute the combined motion vector field by averaging respective motion vectors of the number of motion vector fields.
3. A detection unit as claimed in claim 2, wherein the filter is arranged to compute the combined motion vector field by weighted averaging respective motion vectors of the number of motion vector fields.
4. A detection unit as claimed in claim 1, wherein the filter is arranged to compute the combined motion vector field by order statistical filtering respective motion vectors of the number of motion vector fields.
5. A detection unit as claimed in any of the claims 1-4, wherein the filter is arranged to compute the combined motion vector field by recursively filtering respective motion vectors of the number of motion vector fields.
6. A detection unit as claimed in any of the claims 1-5, wherein the selection unit is arranged to select the group of mutually related pixels, wherein the pixels of the group are mutually connected.
7. A detection unit as claimed in any of the claims 1-6, wherein the selection unit is arranged to select the group of mutually related pixels on basis of the spatial distribution of the pixels of the group.
8. A detection unit as claimed in any of the claims 1-7, wherein the selection unit is arranged to select the group of mutually related pixels on basis of a ratio between a first number of pixels of the group being horizontally disposed and a second number of pixels of the group being vertically disposed.
9. A detection unit as claimed in any of the claims 1-8, wherein the selection unit is arranged to select the group of mutually related pixels on basis of values of pixels of the second one of the video images corresponding to the respective pixels of the group.
10. A detection unit as claimed in any of the claims 1-9, further comprising a matching unit for matching the group of mutually related pixels with a further group of mutually related pixels being determined on basis of a third one of the video images, which succeeds the second one of the video images.
11. An image processing apparatus (400) comprising: receiving means (402) for receiving a signal corresponding to a sequence of video images; and a detection unit for detecting a moving object in the sequence of video images, as claimed in any of the claims 1-10.
12. A method of detecting a moving object in a sequence of video images, the method comprising: computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; computing a combined motion vector field by filtering the number of motion vector fields; computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; computing a difference image on basis of a second one of the video images and the motion compensated image; and selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.
13. A computer program product to be loaded by a computer arrangement, comprising instructions to detect a moving object in a sequence of video images, the computer arrangement comprising processing means and a memory, the computer program product, after being loaded, providing said processing means with the capability to carry out: computing a number of motion vector fields on basis of a number of consecutive video images of the sequence; - computing a combined motion vector field by filtering the number of motion vector fields; computing a motion compensated image on basis of a first one of the video images and the combined motion vector field; computing a difference image on basis of a second one of the video images and the motion compensated image; and selecting a group of mutually related pixels in the difference image having extreme values, the group of mutually related pixels representing the moving object.
PCT/IB2006/051174 2005-04-21 2006-04-14 Detecting a moving object WO2006111914A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP05103231 2005-04-21
EP05103231.6 2005-04-21

Publications (1)

Publication Number Publication Date
WO2006111914A1 true WO2006111914A1 (en) 2006-10-26

Family

ID=36778087

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/051174 WO2006111914A1 (en) 2005-04-21 2006-04-14 Detecting a moving object

Country Status (1)

Country Link
WO (1) WO2006111914A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5706054A (en) * 1995-12-01 1998-01-06 Intel Corporation Method and apparatus for adjusting video data to limit the effects of automatic focusing control on motion estimation video coders
US20020060737A1 (en) * 2000-11-22 2002-05-23 Chun-Hsing Hsieh Method of detecting motion for digital camera
US20020159637A1 (en) * 2001-03-16 2002-10-31 Tomio Echigo Content generation, extraction and distribution of image region segments from video images

Similar Documents

Publication Publication Date Title
Mittal et al. Scene modeling for wide area surveillance and image synthesis
US20090167866A1 (en) Methods and systems for image processing in a multiview video system
US8369609B2 (en) Reduced-complexity disparity map estimation
Yamashita et al. Removal of adherent waterdrops from images acquired with stereo camera
Okumura et al. Real-time feature-based video mosaicing at 500 fps
Kumar et al. Queue based fast background modelling and fast hysteresis thresholding for better foreground segmentation
Zhiwei et al. New method of background update for video-based vehicle detection
Tanaka et al. Removal of adherent waterdrops from images acquired with a stereo camera system
Yamashita et al. Removal of adherent noises from image sequences by spatio-temporal image processing
US20080198237A1 (en) System and method for adaptive pixel segmentation from image sequences
Yamashita et al. Noises removal from image sequences acquired with moving camera by estimating camera motion from spatio-temporal information
CN100588253C (en) Motion vector field refinement method for tracking small fast moving objects
US9875549B2 (en) Change detection in video data
Sincan et al. Moving object detection by a mounted moving camera
Li et al. Fast visual tracking using motion saliency in video
WO2006111914A1 (en) Detecting a moving object
Muñoz-Salinas et al. Particle filtering with multiple and heterogeneous cameras
Lin et al. In-image rain wiper elimination for vision-based Advanced Driver Assistance Systems
JP4291251B2 (en) Object detection method, object detection apparatus, and object detection program
Wu et al. Segmenting moving objects from a freely moving camera with an effective segmentation cue
Zhu et al. Surf points based moving target detection and long-term tracking in aerial videos
Myers et al. A robust method for tracking scene text in video imagery
Varadarajan et al. Background subtraction using spatio-temporal continuities
WO2007072370A2 (en) Method and apparatus for estimating object speed
Briassouli et al. Fusion of frequency and spatial domain information for motion analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase (ref document number: 2006727942; country of ref document: EP)
NENP Non-entry into the national phase (ref country code: DE)
NENP Non-entry into the national phase (ref country code: RU)
WWW Wipo information: withdrawn in national office (country of ref document: RU)
WWW Wipo information: withdrawn in national office (ref document number: 2006727942; country of ref document: EP)
122 Ep: pct application non-entry in european phase (ref document number: 06727942; country of ref document: EP; kind code of ref document: A1)