WO2013135964A1 - A method, an apparatus and a computer program for estimating a size of an object in an image


Publication number: WO2013135964A1
Authority: WO (WIPO/PCT)
Application number: PCT/FI2013/050280
Prior art keywords: size, image plane, image, images, basis
Other languages: French (fr)
Inventor: Markus KUUSISTO
Original assignee: Mirasys Oy
Application filed by Mirasys Oy
Publication of WO2013135964A1 (en)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30232 Surveillance
    • G06T 2207/30241 Trajectory

Description

  • the invention relates to image processing.
  • the invention relates to a method, an apparatus and a computer program for estimating a size of an object as depicted in an image of a sequence of images.
  • Image analysis and processing techniques involved in analysis of images to identify an object and track the movement thereof based on images of a sequence of images are typically computationally demanding.
  • simultaneous identification and tracking of multiple objects where the objects in an image may be occasionally fully or partially overlapping is a challenging task.
  • One particularly challenging aspect in identification and tracking of an object that may fully or partially overlap with another object in an image is estimation of the true position and size of such an object.
  • the existing techniques do not provide good performance at a reasonable computational complexity for analysis and/or estimation of overlapping objects in an image in this regard.
  • An example of such processing of one or more images is determination of an estimated position and/or size of an object as depicted in an image overlapping with another object depicted in the image.
  • a method for estimating a size of an object in an image plane in an image of a sequence of images comprising determining a first position on basis of information indicating a position of the object in the image plane, using a predetermined mapping function configured to determine a size of a reference object in the image plane on basis of a position of the object in the image plane to determine a reference size in the first position, and estimating the size of the object in the first position on basis of the reference size in the first position and a scaling function.
  • an apparatus for estimating a size of an object in an image plane in an image of a sequence of images comprising an image analyzer configured to determine a first position on basis of information indicating a position of the object in the image plane, and an object estimator configured to use a predetermined mapping function configured to determine a size of a reference object in the image plane on basis of a position of the object in the image plane to determine a reference size in the first position, and to estimate the size of the object in the first position on basis of the reference size in the first position and a scaling function.
  • a computer program for estimating a size of an object in an image plane in an image of a sequence of images comprising one or more sequences of one or more instructions which, when executed by one or more processors, cause an apparatus to at least perform a method in accordance with the first aspect of the invention.
  • the computer program may be embodied on a volatile or a non-volatile computer-readable record medium, for example as a computer program product comprising at least one computer readable non-transitory medium having program code stored thereon, the program code, which when executed by an apparatus, causes the apparatus at least to perform the operations described hereinbefore for the computer program in accordance with the third aspect of the invention.
  • Figure 1a illustrates a coordinate system used to describe an image plane.
  • Figure 1b illustrates a coordinate system used to describe a real world.
  • Figure 2 illustrates a principle of the concept of estimating a size of an object in an image based on its distance from the bottom of the image.
  • Figure 3 schematically illustrates an apparatus in accordance with an embodiment of the invention.
  • Figure 4 illustrates the principle of linear fitting for determination of a mapping function.
  • Figure 5 provides a flowchart illustrating a method in accordance with an embodiment of the invention.
  • Figure 1a illustrates a coordinate system used to describe an image plane 100 and an image 101 in the image plane 100 in this document.
  • the image plane 100 can be considered to comprise a number of pixels, positions of which are determined by coordinates along a u axis and a v axis, and where the origin of the coordinate system determined by the u and v axes is at the center of the image 101 on the image plane 100.
  • the origin and even the directions of the axes could naturally be selected differently; many conventional image processing applications place the origin in the top left corner and make the magnitude of the v coordinate increase downwards.
  • a position along the u axis may be referred to as a horizontal position and a position along the v axis is referred to as a vertical position.
  • Terms left and right may be used to refer to a position in the direction of the u axis, and terms up and down may be used to refer to a position in the direction of the v axis.
  • an extent of an object in the direction of the u axis is referred to as a width of the object and an extent of the object along the direction of the v axis is referred to as a height of the object.
  • Figure 1b illustrates a coordinate system 110 used to describe a real world, a projection of which is mapped as an image on the image plane upon capture of an image.
  • a position in the real world may be expressed by coordinates in the x, y and z axes, as illustrated in Figure 1 b.
  • a coordinate in direction of x, y and/or z axes may be expressed as a distance from the origin, for example in meters.
  • the x and z axes can be considered to represent a plane that approximates the ground level. While this may not be an exactly accurate representation of the ground, which locally may comprise hills and slopes and which in the larger scale is actually a geoid, it provides sufficient modeling accuracy. Consequently, the y axis can be considered as the height from the ground level - or from the plane approximating the ground level.
  • Figure 1c schematically illustrates a relationship between the real world coordinate system 110 and the image plane 100.
  • Figure 1c shows the x, y and z axes of the real world coordinate system 110 such that the x axis is perpendicular to the figure.
  • the illustration of the image plane 100 in Figure 1c explicitly indicates the direction of the v axis, whereas the u axis is assumed to be perpendicular to the figure.
  • the parameter y_c indicates the height of the focal point of an imaging device from the ground level represented by the x and z axes,
  • f denotes a focal length of the imaging device along an imaginary line perpendicular to the image plane 100, and
  • θ denotes the angle between the imaginary line perpendicular to the image plane 100 and the horizon plane 121, i.e. the tilt angle of the imaging device.
  • An image, such as the image 101, may be part of a sequence of images.
  • a sequence of images is considered as a time-ordered set of images, where each image of a sequence of images has its predetermined temporal location within the sequence with known temporal distance to the immediately preceding and following images of the sequence.
  • a sequence of images may originate from an imaging device such as a (digital or analog) still camera, from a (digital or analog) video camera, from a device equipped with a camera or a video camera module etc., configured to capture and provide a number of images at a predetermined rate, i.e. at predetermined time intervals.
  • a sequence of images may comprise still images and/or frames of a video sequence.
  • the images preferably provide a fixed field of view to the environment of the imaging device(s) employed to capture the images.
  • the images of a sequence of images originate from an imaging device that has a fixed position throughout the capture of the images of the sequence of images, thereby providing a fixed or essentially fixed field of view throughout the sequence of images. Consequently, any fixed element or object in the field of view of the imaging device remains at the same position in each image of the sequence of images.
  • objects that are moving in the field of view may be present in only some of the images and may have a varying position in these images.
  • an imaging device may be arranged to overlook a parking lot, where the parking area, driveways to and from the parking area and the surroundings thereof within the field of view of the imaging device are part of the fixed portion of the images of the sequence of images, whereas a changing portion of the images of the sequence of images comprises e.g. people and cars moving within, to and from the parking area.
  • an imaging device may be arranged to overlook a portion of an interior of a building, such as a shop or a store.
  • the fixed portion of the images may comprise shelves and other structures arranged in the store and the items arranged thereon, whereas the changing portion of the images may comprise e.g. the customers moving in the store within the field of view of the imaging device.
  • an imaging device employed to capture the images is preferably positioned in such a way that the camera horizon is in parallel with the horizon plane, consequently resulting in a horizon level in the image plane that is an imaginary line in parallel with the u axis.
  • the horizon level within the image plane may be considered as an imaginary horizontal line at a certain distance from the u axis - or from an edge of the image, the certain distance being dependent on the vertical orientation of the imaging device.
  • the horizon level may be considered as an imaginary line in the image plane that is in parallel with the u axis but which is outside the image.
  • preprocessing of images of the captured sequence of images may be applied in order to modify the image data to compensate for the said angle to provide a sequence of images where the horizon can be represented as an imaginary line that is in parallel with the u axis of the image plane.
  • an object moving in the field of view may be detected by observing any changes between (consecutive) images of the sequence.
  • an object in an image, i.e. a set of pixel positions in an image
  • An object in an image may be determined by indicating its position in the image plane together with its shape and/or size in the image plane, all of which may be expressed using the u and v coordinates of the image plane.
  • a data record comprising information on the object may be created.
  • the information may comprise for example the current and previous positions of the object, the current and/or previous shape(s) of the object, the current and previous size(s) of the object, an identifier of the object and/or any further suitable data that can be used to characterize the object.
  • a dedicated data record may be created and/or updated for each of the objects.
  • An object moving within the field of view of the imaging device is typically depicted in two or more images of the sequence of images.
  • An object detected in an image can be identified as the same object already detected in a previous image of the sequence by comparing the characteristics - e.g. with respect to the shape of the object - of the object detected in the image to characteristics of an object detected in a previous image (e.g. as stored in a corresponding data record).
  • the information on the position(s) of the object in a number of images may be stored in the data record comprising information on the object in order to enable subsequent analysis and determination of a movement pattern of the object. Due to movement, the positions of two objects that were detected or identified as separate objects in a previous image of the sequence of images may overlap, fully or in part, in an image of the sequence. Consequently, such two objects may merge into a combined object for one or more images of the sequence, while they may again separate as individually identifiable first and second objects in a subsequent image of the sequence.
  • an object initially identified as a single individual object, e.g. at or near a border of an image of the sequence, may in a subsequent image separate into two individual objects spawned from the initial single object.
  • Information indicating merging of two objects into a combined object and/or separation of a (combined) object into two separate objects may be kept in the data record comprising information on the object in order to facilitate analysis of the evolution of the object(s) within the sequence of images.
  • a position of an object whose shape or approximation thereof is known may be determined or expressed for example as the position(s) of one or more predetermined parts of the object in the image plane.
  • An example of such a predetermined part is a pixel position indicating a geographic center point of the object, thereby - conceptually - indicating a center of mass of the object (with the assumption that each pixel position of the set of pixel positions representing the object represents an equal 'mass').
  • the geographic center point of an object in an image may be determined for example as the average of the coordinates of the pixel positions representing the object in the image.
  • Another example for using predetermined part(s) of an object to indicate a position of the object in an image involves determining at least one of a lower boundary and an upper boundary together with at least one of a left boundary and a right boundary of an imaginary rectangle enclosing the pixel positions representing the object by touching the lowermost, the uppermost, the leftmost and the rightmost pixel positions representing the object in the image plane.
  • a rectangle may be referred to as a bounding box.
  • the lower and upper boundaries may be expressed as a v coordinate, i.e. as a position in the v axis
  • the left and right boundaries may be expressed as a u coordinate, i.e. a position in the u axis.
  • the position of an object may be expressed for example by a coordinate of the u axis indicating the left boundary of a bounding box enclosing the object and by a coordinate of the v axis indicating the lower boundary of the bounding box.
  • This is equivalent to expressing the coordinates of the pixel position indicating the lower left corner of the (rectangular) bounding box.
  • the bounding box does not need to have an exactly rectangular shape; it is possible to use e.g. a bounding circle just large enough to enclose all pixels of the object, or a bounding oval with its u and v dimensions selected to match those of the object.
  • a rectangular bounding box is the most common and most easily handled in processing.
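As an illustration of the bounding box described above, the following Python sketch derives the boundaries from a set of pixel positions; the function name bounding_box and the representation of the object as (u, v) coordinate pairs are illustrative assumptions, not taken from the patent:

```python
def bounding_box(pixels):
    """Axis-aligned bounding box of an object given as an iterable of
    (u, v) pixel positions: returns the left, right, lower and upper
    boundaries, i.e. the extreme u and v coordinates touched by the object."""
    us = [u for u, v in pixels]
    vs = [v for u, v in pixels]
    return min(us), max(us), min(vs), max(vs)

# Position expressed as the lower left corner of the bounding box,
# size as its width and height (in pixel positions):
left, right, lower, upper = bounding_box({(3, 5), (4, 5), (4, 9), (6, 7)})
position = (left, lower)                     # (3, 5)
width, height = right - left, upper - lower  # (3, 4)
```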
  • a size of an object in an image may be expressed for example by its dimension(s) along the axis or axes of the image plane.
  • a size of an object in an image may be expressed as its extent in the direction of the v axis, i.e. as the height of the object in the image.
  • a size of an object in an image may be expressed as its extent in the direction of the u axis, i.e. as the width of the object in the image.
  • a height and/or a width may be expressed for example as a number of pixel positions corresponding to the height/width in the image plane. Such information may be derived for example with the aid of a bounding box, as described hereinbefore.
  • a further alternative for expressing the size of the object is to indicate either the height or the width of an object, e.g. as a height or width of a bounding box enclosing the object, together with an aspect ratio determining the relationship between the height and width of the object.
  • the data record comprising information on an object may be employed to keep track of the current (or most recent) size of the object and possibly also of the size of the object in a number of previous images.
  • a shape of an object can be expressed for example by a set of pixel positions or as a two-dimensional 'bitmap' indicating the pixel positions forming the object. Such information may be stored in a data record comprising information on the object.
  • the information regarding the shape of the object may include the current or most recent observed shape of the object and/or the shape of the object in a number of preceding images of the sequence.
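The data record discussed above might be sketched as follows; the class and field names are hypothetical, chosen only to mirror the kinds of information the text enumerates (positions, sizes, shape, identifier, merge/separation history):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

@dataclass
class ObjectRecord:
    """Hypothetical data record for one tracked object."""
    object_id: int
    # (u, v) position per image, e.g. the lower left corner of the bounding box
    positions: List[Tuple[int, int]] = field(default_factory=list)
    # (width, height) in pixel positions, per image
    sizes: List[Tuple[int, int]] = field(default_factory=list)
    # most recent shape as a set of (u, v) pixel positions (a 2D 'bitmap')
    shape: Optional[Set[Tuple[int, int]]] = None
    # identifiers of objects this object has merged with or separated from
    merge_history: List[int] = field(default_factory=list)
```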
  • Figure 2 schematically illustrates two images 201, 203 of a sequence of images, the images schematically illustrating a reference object in the real world moving along a plane that is essentially horizontal, for example the plane determined by the x and z axes of the real world coordinate system 110 described hereinbefore. Note that only changing portions of the images are illustrated in the images 201, 203, thereby omitting any possible fixed portion (or background objects) of the images for clarity of illustration.
  • the image 201 illustrates the real-world object as an object 205 having a height h_v1 and a width w_v1 with its lower edge situated at position v_b1 on the v axis of the image plane.
  • the image 203 illustrates the real-world object as an object 205' having a height h_v2 and a width w_v2 with its lower edge situated at position v_b2 on the v axis of the image plane.
  • a level representing the horizon 207 is assumed to be a line that is parallel to the u axis - and also parallel to the lower and upper edges of the images 201 and 203.
  • the real-world object in image 201 is closer to the imaging device than in the image 203, and hence the object is depicted in the image 201 as larger than in the image 203.
  • both the height h_v1 of the object 205 in the image 201 is larger than the height h_v2 of the object 205' in the image 203, and the width w_v1 of the object 205 in the image 201 is larger than the width w_v2 of the object 205' in the image 203.
  • the object 205 in the image 201 is closer to the bottom of the image than the object 205' in the image 203.
  • a real-world object closer to the imaging device appears closer to the bottom of the image than the same real-world object - or another real-world object of identical or essentially identical size - situated further away from the imaging device.
  • a real-world object closer to the imaging device appears larger in an image than the same real-world object - or another real-world object of identical or essentially identical size - situated further away from the imaging device.
  • a point either actual or conceptual, where the size of a real-world object as depicted in the image plane would appear zero or essentially zero, represents a point in the image plane - e.g. a level of an imaginary line parallel to the u axis of the image plane, i.e. a v coordinate of the image plane - representing a horizon in the image plane.
  • a real-world object exhibiting movement towards or away from the imaging device - i.e. towards or away from the horizon - is typically depicted as an object of different size and different distance from the bottom of an image in two images of a sequence of images captured using an imaging device arranged to capture a sequence of images with a fixed field of view. Consequently, it is possible to determine and/or use a mapping function configured to determine a height of an object in the image plane on basis of a vertical position of the object in the image plane. The determined position of the object in two or more previous images may be used to predict the position of the object in a subsequent image. In the following a straightforward example on the principle of predicting the position of an object is provided.
  • the position of an object in image n may be expressed as a pair of u and v coordinates (u_n, v_n), and the position of the object in image n+1 may be expressed as (u_{n+1}, v_{n+1}). A simple prediction assumes the displacement between consecutive images stays constant, i.e. (u_{n+1}, v_{n+1}) = (u_n + (u_n - u_{n-1}), v_n + (v_n - v_{n-1})).
  • the exemplifying prediction described hereinbefore jointly predicts the change of position in the directions of the u and v axes.
  • a prediction as described hereinbefore may be applied to a number of pixels representing an object in an image - for example to all pixels representing an object or to a subset thereof - or the prediction may be applied to a single pixel chosen to represent the object (as described hereinbefore).
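A minimal sketch of the prediction principle just described, under the stated assumption that the change of position is predicted jointly along the u and v axes (a constant-displacement model; the function name is illustrative):

```python
def predict_position(pos_prev, pos_curr):
    """Predict the position of an object in image n+1 from its positions in
    images n-1 and n, assuming the displacement between consecutive images
    stays constant; u and v are predicted jointly."""
    (u_prev, v_prev), (u_curr, v_curr) = pos_prev, pos_curr
    return (u_curr + (u_curr - u_prev), v_curr + (v_curr - v_prev))

# Example: an object moved from (10, 40) to (14, 36); predicted next position:
print(predict_position((10, 40), (14, 36)))  # (18, 32)
```

As noted above, the same update can be applied to every pixel representing the object, to a subset of those pixels, or to a single pixel chosen to represent the object.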
  • Figure 3 schematically illustrates an apparatus 300 for estimating a size of an object in an image plane in an image of a sequence of images.
  • the apparatus 300 comprises an image analysis unit 301 and an object estimation unit 303.
  • the image analysis unit 301 may also be referred to as an image analyzer or as an object analyzer and the object estimation unit 303 may be also referred to as an object estimator or an image estimator.
  • the image analysis unit 301 is operatively coupled to the object estimation unit 303.
  • the apparatus 300 may comprise further components, such as a processor, a memory, a user interface, a communication interface, etc.
  • the apparatus 300 may receive input from one or more external processing units and/or apparatuses and the apparatus 300 may provide output to one or more external processing units and/or apparatuses.
  • the apparatus 300 may be configured to estimate and/or express a size of an object, as described hereinbefore.
  • the apparatus 300 may be configured to estimate and/or express a size of an object as a height of the object in the image plane.
  • the apparatus 300 may be, alternatively or additionally, configured to estimate and/or express the size of an object as a width of the object in the image plane and/or as an extent in a direction different from the u and/or v axes of the image plane.
  • the apparatus 300 may be configured to estimate and/or express a size of an object as an extent of the object in a first direction in the image plane, e.g. in the direction of the v axis of the image plane.
  • the image analysis unit 301 may be configured to obtain information indicating a position of an object in an image plane in an image of the sequence of images.
  • the information indicating the position of the object may be an observed position, obtained for example via analysis of the image data, or the information indicating the position of the object may be an estimated position, obtained for example on basis of prediction, as described hereinbefore.
  • the term current image is used to refer to an image of a sequence of images in which a size of an object of interest is to be estimated.
  • the term current position is used to refer to the position indicative of the position of the object of interest in the image plane in the current image.
  • the image analysis unit 301 may be configured to obtain information indicating a position of an object in the image plane for example by performing an analysis of image data of a number of images of the sequence of images in order to identify an object of predetermined characteristics and its position in the image plane.
  • Image analysis techniques for detecting and identifying an object of predetermined characteristics in an image known in the art may be used for this purpose.
  • the output of such analysis may comprise indication of pixel position(s) in the image plane indicating a position of the object in the image plane.
  • the image analysis unit 301 may be configured to receive information indicating a position of an object in the image plane by receiving an indication of a pixel position or pixel positions of the image plane indicating a position of the object in the image plane. Such information may be received, for example, from another processing unit of the apparatus 300 or from a processing unit outside the apparatus 300, such processing unit being configured to apply image analysis in order to determine a presence of an object of predetermined characteristics and a position thereof in the image plane. As another alternative, the information indicating a position of an object in the image plane may be received, for example, based on input from a user.
  • the user may indicate an object of interest in an image via a suitable user interface (such as display & pointing device, a touchscreen, etc.), for example by indicating one or more of a lower, upper, left and right boundaries of the object in the image plane.
  • the user may be involved in initial detection of an object, whereas the image analysis unit 301 may be configured to track the object indicated by the user in the subsequent (and/or preceding) images of the sequence of images.
  • the apparatus 300, and the image analysis unit 301 and/or the object estimation unit 303 in particular, may be configured to estimate, express and/or determine information indicating a position of an object in the image plane as described hereinbefore.
  • the apparatus 300 may be configured to estimate, express and/or determine information indicating a position of an object in the image plane by information that may comprise, for example, a position indicating a lower boundary of the object in the image plane or a position indicating an upper boundary of the object in the image plane. Additionally or alternatively, the information indicating a position of an object may comprise for example a position indicating a left boundary of the object or a position indicating a right boundary of the object, as described hereinbefore.
  • the information indicating a position of an object in the image plane may indicate a position of any predetermined or otherwise determinable part of an object in the image plane, which, together with information regarding the shape of the object may be employed to determine or estimate a position of the object in the image plane.
  • the object estimation unit 303 is configured to use a predetermined mapping function to determine a reference size in the current position, wherein the predetermined mapping function is configured to determine a reference size of an object in the image plane on basis of a position of the object in the image plane.
  • the predetermined mapping function may be configured to determine the reference size as a size of a reference object in a given position in the image plane.
  • the reference object may or may not have a known real- world size.
  • the determined reference size in the current position may be employed to estimate the size of an object of interest in the current position of the image plane by making use of a scaling function, as described in detail hereinafter.
  • a predetermined mapping function may be configured to base the determination of the size of an object in the image plane on a distance of the position of the object from a predetermined reference level in the image plane.
  • the predetermined reference level is preferably a predetermined position in the direction of the v axis of the image plane, which hence can be considered as an imaginary line that is in parallel to the u axis of the image plane.
  • the position of the predetermined reference level may be expressed for example as a v coordinate of the image plane or as a distance, as a number of pixel positions, from the bottom of the image and/or from the top of the image.
  • the position of the predetermined reference level may be expressed indirectly, for example by information that enables determination of the position in the image plane representing the predetermined reference level.
  • the distance between an object in the image plane and the predetermined reference level may be expressed as a number of pixel positions between a pixel position indicating a position of an object in the image plane and a position indicating the predetermined reference level, for example as a difference between a v coordinate of the pixel position indicating the position of the object in the image plane and the v coordinate indicating the position of the predetermined reference level in the image plane.
  • the predetermined reference level may represent a horizon level or another suitable reference level in the image plane for images of the sequence of images.
  • the object estimation unit 303 may be configured to obtain a predetermined mapping function or to obtain information enabling access to a predetermined mapping function.
  • the predetermined mapping function may be determined on basis of a number of images of the sequence of images to which the current image belongs, or the predetermined mapping function may be determined on basis of another sequence of images exhibiting similar or essentially similar field of view to the environment of the imaging device(s) as the sequence to which the current image belongs.
  • the apparatus 300 may be further configured to determine a mapping function for determining a reference size for an object in the image plane on basis of a position of the object in the image plane, e.g. for determining a size of a reference object in a given position in the image plane.
  • the apparatus 300 may be configured to determine such mapping function on basis of a number of images of the sequence of images to which the current image belongs. Said number of images may comprise images preceding the current im- age in the sequence and/or images following the current image in the sequence, possibly together with the current image.
  • A detailed description of determination of a mapping function on basis of observed positions and sizes of two or more objects in the image plane in one or more images of a sequence of images, wherein the sizes of the two or more objects correspond to real-world objects having similar or essentially similar sizes, is provided hereinafter.
  • a predetermined mapping function may be based on a linear function, for example of the form h_v = a · v_b + b, (7) where h_v may represent a height of the reference object at distance v_b in the direction of the v axis from the origin of the image plane, and a and b are mapping parameters to be determined.
  • it is possible to directly use the equation (7) with the determined values of the parameters a and b to determine or estimate a size of the reference object at a given position in the direction of the v axis.
  • alternatively, it may be more convenient to configure the mapping function to determine a size of a reference object in the image plane on basis of a parameter or parameters derivable based on the function or functions of the form of the equation (7).
  • an example of such parameter is a position of a reference level in the image plane, and hence the mapping function may be configured to determine a size of a reference object in the image plane on basis of a distance of the position of an object of interest from the reference level in the image plane.
  • An example of a suitable reference level is an estimated or known horizon level in the image plane.
  • an equation of the form of the equation (7) may be used to determine a position in the direction of the v axis of the image plane, i.e. a v coordinate of the image plane, representing a horizon as a predetermined reference level: the height given by the equation (7) reaches zero at v_h = -b/a.
  • the v axis coordinate of the image plane representing the horizon level may be derived separately on basis of a number of functions of the form of the equation (7), hence resulting in a number of estimated positions representing the horizon in the image plane that may be combined into a refined single estimate for example by computing the arithmetic mean of the estimated positions or by using other suitable approach for combining a number of estimated values into a refined single estimate.
  • a parabolic function or a 'skewed' parabolic function may be used as the mapping function.
  • for a detailed description of employing a parabolic mapping function and using the horizon level as the reference level, see equations (15) to (21) and the associated description hereinafter.
  • an approach making use of a parameter indicative of the slope of a function of the form of the equation (7), together with a reference size at a predetermined point in the direction of the v axis of the image plane, is described in the following.
  • the parameter may be denoted as h_ref in order to emphasize its role as the second reference size.
  • a mapping function based on a linear model may be configured to determine a reference size of an object in the image plane on basis of a position of the object in the image plane by making use of the second reference size h_ref at the position v_ref and a known or estimated position of the horizon level in the image plane v_avg. Since by definition one may assume that the size of any object at the horizon is zero, and hence also the size of a reference object at the horizon level of the image plane is zero, assuming that v_c denotes the current position of the object of interest in the image plane, one may estimate the size of a reference object in the image plane at v_c as h_v(v_c) = h_ref · (v_avg - v_c) / (v_avg - v_ref). (9)
  • the equation (9) may be used to directly determine the size of the reference object in the image plane in the current position of the object of interest.
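Equation (9) amounts to a linear interpolation between zero size at the horizon level v_avg and the second reference size h_ref at v_ref; a sketch, with parameter names mirroring the text:

```python
def reference_size(v_c, h_ref, v_ref, v_avg):
    """Reference size at position v_c under the linear model of equation (9):
    zero at the horizon level v_avg, h_ref at the reference position v_ref."""
    return h_ref * (v_avg - v_c) / (v_avg - v_ref)

# Illustrative numbers: horizon at v = 120, reference height of 80 pixel
# positions at v = -120 (the bottom of the image), object currently at v = 0:
print(reference_size(0.0, h_ref=80.0, v_ref=-120.0, v_avg=120.0))  # 40.0
```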
  • the object estimation unit 303 is configured to estimate the size of the object of interest in the current position on basis of a reference size in the current position and a scaling function.
  • a scaling function may be configured to determine a size of an object of interest, for example, solely on basis of a reference size in the current position or on basis of a reference size in the current position and the current position of the object of interest.
  • the scaling function is predetermined in that it is, preferably, determined on basis of observed or otherwise known size(s) of the object of interest in consideration of its/their relationship with the size of a reference object.
  • the observed/known size(s) of the object of interest may be based on observation(s) of the real world size(s) of the object of interest or on observation(s) of the size(s) of the object of interest as depicted in the image plane in one or more images of the sequence other than the current image. These one or more images may comprise images preceding the current image in the sequence and/or images following the current image in the sequence, possibly together with the current image.
  • the scaling function may be determined, or updated, at least in part on basis of an object of similar or essentially similar size as the object of interest in any image of the sequence of images, including the current image.
  • a scaling function may be configured to determine or estimate a size of an object in a given position of the image plane by multiplying a reference size at the given position by a scaling factor that is indicative of the ratio between the size of the object and the size of a reference object.
  • the ratio between the sizes of the object and the reference object, and hence the scaling factor may be determined as a ratio between the known real-world sizes of the object and the reference object.
  • the ratio between the sizes of the object and the reference object, and hence the scaling factor, may be determined as a ratio between an observed size of the object in a second position and the size of the reference object in the corresponding position of the image plane, i.e. the reference size in the second position.
  • the second position may be a position in the image plane in another image of the sequence, as described hereinbefore, whereas the reference size in the second position can be obtained by the predetermined mapping function on basis of the second position.
  • the second position may be the same position of the image plane as the current position, or the current and second positions may be different positions of the image plane.
  • the observations of an object where the object is not depicted in the image in full are excluded from the determination of the scaling factor.
  • the ratio between the sizes of the object and the reference object may be determined as an average of two or more ratios between an observed size of the object and the respective size of the reference object in two or more positions in the image plane.
  • the averaging of ratios may involve for example determining the ratios between an observed size of the object in the image plane and the size of the reference object in respective position for a predetermined number of positions in the image plane and computing an average of the determined ratios.
  • the average employed therein may be for example arithmetic mean or a weighted average.
  • a weighted average may be determined for example by multiplying each determined ratio by a weight having a value that is increasing with increasing observed size of the object in the image plane used to determine the respective ratio, thereby giving more emphasis on the observations of the object that are close(r) to the imaging device.
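A sketch of deriving the scaling factor as such a weighted average, with the weight of each ratio equal to the observed size so that observations closer to the imaging device dominate (all names are illustrative; reference_size_at stands for the predetermined mapping function):

```python
def scaling_factor(observations, reference_size_at):
    """Weighted average of the ratios between observed sizes of the object
    and the reference size at the same positions; the weight equals the
    observed size, emphasizing observations close(r) to the imaging device."""
    num = den = 0.0
    for v, h_obs in observations:      # (position, observed height) pairs
        ratio = h_obs / reference_size_at(v)
        num += h_obs * ratio
        den += h_obs
    return num / den

def estimate_size(v_c, reference_size_at, factor):
    """Scaling function: estimated size = scaling factor * reference size."""
    return factor * reference_size_at(v_c)
```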
  • the scaling function may be determined on basis of pre-analysis of image data of a number of images of a sequence of images such that, in consideration of the current image, images preceding the current image in the sequence, the current image and/or images following the current image in the sequence may be considered.
  • the process of determining the scaling function may involve obtaining an observed size and position of the object in the image plane in a number of past and/or future images of the sequence of images together with the respective sizes of the reference object and determining the scaling function on basis of said number of past and/or future images before applying the scaling function to any of the images of the sequence.
  • the scaling function may be determined and continuously updated on image-by-image basis such that only images of the sequence up to (and including) the current image may be considered in determination of the scaling function.
  • the process of determining the scaling function may involve obtaining an observed size and position of the object in the image plane in a number of past images of the sequence of images together with the respective sizes of the reference object and determining the scaling function in said number of images that precede the current image in the sequence.
  • the process of updating the scaling function may comprise obtaining an observed size and position of an object of interest in the current image and the size of the reference object in the respective position (by using the predetermined mapping function) and using the observed size together with the respective size of the reference object to update the scaling function. While this approach constantly updates the scaling function and hence may be considered to improve the accuracy thereof, the initial estimates and/or a first few updated estimates of the scaling function may be inaccurate. On the other hand, a benefit of such an approach is that on-line processing of the sequence of images in order to estimate a size of an object of interest in an image of the sequence is possible.
  • the determination and/or updating of a scaling function may be limited to consider observations in images of the sequence in which the object of interest is depicted in full, thereby omitting the observations where the object of interest is only partially depicted in the image, for example due to part of the object falling outside of the image or due to the view to the object of interest being partially or fully obstructed by another object or element depicted in the image.
  • the object estimation unit 303 may be configured to apply or use a joint function to perform the operations or functions of a mapping function and a scaling function described hereinbefore.
  • the object estimation unit 303 may be configured to use a predetermined joint function configured to estimate a size of the object of interest in the image plane on basis of the position of the object of interest in the image plane.
  • a joint function may be referred to as a mapping function, thereby corresponding to an arrangement comprising a mapping function configured to estimate a size of the reference object in the image plane on basis of the position of the object of interest in the image plane to determine a reference size, and to estimate the size of the object of interest in its current position by scaling the reference size by a scaling factor having a fixed value, which may be for example 1.
  • the scaling functionality may be conveniently incorporated as part of the mapping function, thereby leaving the role of the scaling function to multiply the reference size by a scaling factor having value 1.
  • a predetermined mapping function is configured to determine a size of the reference object in the image plane on basis of a second position associated with the object, the second position being different from the first position.
  • the predetermined mapping function may be configured to determine a size of a reference object on basis of a position of the lower boundary of an object, whereas the position of the object of interest is expressed by indicating the position of the upper boundary of the object - e.g. due to a view to the lower boundary of the object of interest in the image plane being obstructed by another object overlapping with the object of interest.
  • the estimated size of the object of interest may be used for example as a size representative of the object of interest in the image plane in further processing of data related to the object especially in case where the object of interest is not depicted in full in the current image e.g. due to part of the object of interest falling outside of the image or due to the view to the object of interest being partially or fully obstructed by another object or element depicted in the image.
  • An example of such further processing is estimation of one or more boundaries of an object of interest in the current image when the object is not fully depicted in the current image to facilitate tracking of the object and/or prediction of the position of the object in a subsequent image of the sequence.
  • the estimated size of the object of interest in the image plane - together with a respective observed size of the object of interest in the image plane and a respective reference size that may be obtained by applying the predetermined mapping function - may be used to update the scaling function.
  • An example of such use is to use a difference between the estimated size and the observed size as a measure that may indicate a need to update the scaling function.
  • a difference between the estimated size and the observed size exceeding a predetermined threshold either in a single image or in a number of (consecutive) images of the sequence, may be used as a trigger to update the scaling function.
  • the scaling function may be updated or even re-determined on basis of the observed size of the object and the respective reference size.
  • using the estimated size of the object of interest in an image to update the scaling function may be limited to take place only on basis of images in which the object of interest is depicted in full.
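The threshold-triggered update just described might look as follows; the threshold value and all names are assumptions for illustration:

```python
def maybe_update_scaling(factor, v_c, h_obs, h_est, reference_size_at,
                         fully_visible, threshold=2.0):
    """Re-derive the scaling factor from the observed size and the reference
    size in the same position when the estimated and observed sizes differ by
    more than a threshold, considering only images in which the object of
    interest is depicted in full."""
    if fully_visible and abs(h_est - h_obs) > threshold:
        factor = h_obs / reference_size_at(v_c)
    return factor
```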
  • the object estimation unit 303, or the apparatus 300 in general, may be further configured to output the estimated size of the object of interest in the current image.
  • the object estimation unit 303 may be configured to provide the estimated size to another processing unit within or outside the apparatus 300, for example, to facilitate further analysis and/or processing of images of the sequence of images, to facilitate tracking of an object in images of the sequence of images, etc.
  • the object estimation unit 303 may be configured to store information indicating the estimated size of the object of interest in a memory at the apparatus 300 or at another apparatus.
  • the information indicating the estimated size of the object of interest may be stored, for example, as part of the data record comprising information on the object of interest.
  • the apparatus 300 may further comprise a mapping function determination unit 305 operatively coupled to the image analysis unit 301 and/or to the object estimation unit 303.
  • the mapping function determination unit 305 may be also referred to as a mapping function unit, a mapping function determiner, etc.
  • the mapping function determination unit 305 may be configured, in order to enable determination of a mapping function, to obtain information indicating positions and sizes of two or more objects in an image plane in one or more images of the sequence of images, wherein said sizes of two or more objects in the image plane correspond to a real-world object having a first size.
  • Said two or more objects may depict a single real-world object of the first size or two or more real-world objects of similar or essentially similar size, said two or more real-world objects hence having a size matching or essentially matching the first size.
  • said two or more objects may comprise real-world objects of different size, for example an object or objects having a first size and an object or objects having a second size, where the size of the second object as depicted in the image plane is scaled with a suitable scaling factor such that the scaled size corresponds to the first size.
  • the mapping function determination unit 305 may be configured to consider information indicating positions and sizes of two or more objects in an image plane in one or more images of the sequence of images wherein said two or more objects are depicted in full, i.e. without being partially outside the image and without being partially or fully overlapping with, e.g. obstructed by, another object or element depicted in the image.
  • At least two position-size pairs are needed. Having more than two observed position-size pairs improves the accuracy of mapping, thereby improving the reliability of the estimate of a position in the image plane representing the reference level. Typically, the higher the number of observed position-size pairs, the better the reliability of mapping.
  • the observations may originate from a single real-world object depicted in the image plane in two or more images of the sequence, or the observations may originate from two or more real-world objects of the same, similar or essentially similar size depicted in the image plane in one or more images of the sequence.
  • the observation may originate from two or more real-world objects of different size, e.g. a first size and a second size, depicted in the image plane in one or more images of the sequence, wherein the sizes of the objects in image plane depicting the real-world object having the second size are scaled by a scaling factor indicative of the ratio between the first and second sizes.
  • the set of images of the sequence of images applied in determination of a mapping between a position of an object in the image plane and a size of the object as depicted in the image plane on basis of observed positions and sizes in the image plane may comprise a predetermined number of observations, or at least a predetermined number of observations, in order to ensure a sufficiently reliable estimate.
  • This set of images may, consequently, comprise a subset of images of the sequence in which a real-world object of given size is depicted or all images of the sequence in which the real-world object of given size is depicted.
  • the mapping function determination unit 305 may be configured to obtain information indicating positions and sizes of two or more objects in the image plane depicting the same real-world object in two or more images of the sequence of images.
  • the two or more images depict a single real-world object moving within the field of view represented by the images of the sequence and, consequently, depict the real-world object in at least two different positions in the image plane.
  • the mapping function determination unit 305 may be configured to obtain information indicating positions and sizes of two or more objects in the image plane depicting two or more real-world objects of essentially identical size in one or more images of the sequence of images.
  • the one or more images depict two or more real-world objects of essentially identical size within the field of view represented by the images of the sequence and, consequently, depict a real-world object of essentially identical size in at least two different positions in the image plane.
  • Information indicating or identifying the two or more objects in the image plane to depict two or more real-world objects of essentially identical size may be obtained for example as input from a user via a suitable user interface, e.g. by the user indicating the two or more objects in the image plane that are considered to represent real-world objects of essentially identical size.
  • Information indicating or identifying the two or more objects in the image plane to depict two or more real-world objects of essentially identical size may also be obtained by analysis of image data of an image indicating two objects at a similar distance from a reference level in the image plane exhibiting essentially similar size.
  • since the reference level is assumed to be a level that is in parallel to the u axis of the image plane, there is no need to have an indication of the position of the reference level; it is sufficient to identify two objects of essentially identical size in the image plane at essentially the same position in the direction of the v axis of the image plane.
  • the mapping function determination unit 305 may be configured to obtain information indicating positions and sizes of two or more objects in the image plane depicting two or more real-world objects having different sizes.
  • the two or more objects may comprise a first object having a first size in the real-world and a second object having a second size in the real-world, wherein the information indicating size of the second object in the image plane is scaled, e.g. multiplied, by a scaling factor indicative of the ratio between the first size and the second size.
  • the scaling converts the size of the second object as observed in the image plane in such a way that it corresponds to a size the first object would have in the current position of the second object, hence enabling determination of the mapping between a position of an object in the image plane and a size of the object as depicted in the image plane on basis of observed positions and sizes of real-world objects of different size.
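The scaling described above reduces to multiplying the observed image-plane size by the ratio of the real-world sizes; a sketch with illustrative names:

```python
def normalize_to_first_size(h_obs_second, first_size, second_size):
    """Scale the observed image-plane size of an object of real-world size
    second_size so that it corresponds to the size an object of real-world
    size first_size would have in the same position."""
    return h_obs_second * (first_size / second_size)

# Example: a 90-pixel observation of a 2.0 m object corresponds to an
# 81-pixel observation of a 1.8 m reference object in the same position.
print(normalize_to_first_size(90.0, first_size=1.8, second_size=2.0))  # 81.0
```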
  • essentially similar size and essentially identical size refer to two - or several - real-world objects having sizes that differ by a few percent at most. While the actual tolerance to deviation in size of the two - or several - real-world objects considered to represent an identical size depends on the distance of the real-world object from the focal point of the imaging device, a difference in size of up to 5 percent does not typically unduly degrade the accuracy of the mapping between a position in the image plane and a size of an object as depicted in the image plane. In general, for real-world objects further away from the focal point of the imaging device a larger difference in size may be tolerated without unduly affecting the accuracy of the mapping.
  • Similar considerations also apply to a size of a single real-world object that may exhibit subtle changes in size as depicted in the image plane even when the real-world object does not move in relation to the imaging device.
  • An example of such real-world object is a person moving or standing within the field of view of the imaging device, where the subtle changes in size as depicted in the image plane may occur e.g. due to change in posture, change in orientation with respect to the image plane, etc.
  • the mapping function determination unit 305 may be configured to obtain information indicating a position of an object in the image plane and/or the size of the object for example by performing an analysis of image data of a number of images of the sequence of images in order to identify an object of predetermined characteristics, its position in the image plane and its size in the image plane.
  • Image analysis techniques for detecting and identifying an object of predetermined characteristics in an image known in the art may be used for this purpose.
  • the output of such analysis may comprise indication of pixel positions in the image plane indicating a position of the object in the image plane and/or indication of the size of the object.
  • the mapping function determination unit 305 may be configured to receive information indicating a position of an object in the image plane and/or the size of the object by receiving an indication of a pixel position or pixel positions of the image plane indicating a position of the object in the image plane and/or an indication of the size of the object.
  • Such information may be received, for example, from another processing unit of the apparatus 300 or from a processing unit outside the apparatus 300, such processing unit configured to apply image analysis in order to determine a presence of an object of predetermined characteristics and a position and size thereof in the image plane.
  • the information indicating a position and a size of an object in the image plane may be received, for example, based on input from a user.
  • the user may indicate an object of interest in an image via a suitable user interface (such as a display & pointing device, a touchscreen, etc.), for example by indicating the lower and upper boundaries of the object in the image plane and/or the left and right boundaries of the object in the image plane.
  • the user may be involved in initial detection of an object, whereas the mapping function determination unit 305 may be configured to track the object indicated by the user in the subsequent (and/or preceding) images of the sequence of images.
  • the information indicating a position of an object in the image plane may comprise, for example, a position indicating a lower boundary of the object in the image plane and/or a position indicating an upper boundary of the object in the image plane. Additionally or alternatively, the information indicating a position of an object may comprise for example a position indicating a left boundary of the object and/or a position indicating a right boundary of the object, as described hereinbefore.
  • the information indicating a size of an object in the image plane may comprise, for example, a height of the object in the image plane and/or a width of the object in the image plane, as described hereinbefore. The height and/or the width in the image plane may be expressed e.g. as a number of pixel positions.
  • the mapping function determination unit 305 is configured to determine a mapping between a position of an object in the image plane and a size of the object in the image plane on basis of said positions and sizes of the objects in the image plane in said one or more images.
  • the mapping function determination unit 305 may be configured to determine such mapping for a real-world object of the first size, depicted in the image plane as an object of different size in one or more images of a sequence of images, where the size of the object in the image plane varies in dependence of a distance of the real-world object from the focal point of the imaging device used to capture the sequence of images.
  • the mapping may be determined as a function taking a position of the object in the image plane as an input argument and providing a corresponding size in the image plane as an output argument.
  • a respective inverse function may also be determined, hence taking a size of an object in the image plane as an input argument and providing a corresponding position of the object in the image plane as an output argument.
  • the position(s) and size(s) may be expressed as described hereinbefore.
  • the mapping function determination unit 305 may be configured to determine the mapping between a position of an object in the image plane and a size of the object in the image plane as a linear function. Any suitable linear model may be employed. As an example, the mapping function determination unit 305 may be configured to apply a function of the form indicated by the equation (7) hereinbefore.
  • the parameter h_v may represent a size of the object as a height of the object in the image plane
  • the parameter v_b may represent a position of the object as a position of a lower boundary of the object in the image plane
  • a and b represent mapping parameters to be determined.
  • the mapping may be determined for example using a least squares fit to an equation system comprising a number of equations of the form indicated by the equation (7), each equation of the system representing a pair of an observed position of a lower boundary of an object v_bi and a corresponding observed height of the object h_vi in the image plane. Consequently, the fitting involves determining the parameters a and b such that the overall error in the equation system (14) is minimized (by using methods known in the art).
  • Figure 4 illustrates the principle of linear fitting according to the equations (7) and (14) by an example.
  • the black dots represent observed pairs of a position of a lower boundary of an object in the image plane and the respective height of the object in the image plane, in a coordinate system where the position of the object is indicated on the v axis and the size of the object is indicated on the h axis, which may also be referred to as a 'size axis'. Note that the observed positions and sizes are explicitly indicated only for some of the observed pairs for clarity of illustration.
  • v_h indicates an estimate of a position (along the v axis of the image plane) representing a level where the height of the object is zero
  • h_ref indicates an estimated height of the object at the bottom of the image.
  • the exemplifying mapping function illustrated by the equations (7) and (14) may be modified to employ a parameter different from the observed height to indicate a size of the object in the image plane and/or a parameter different from the observed position of a lower boundary of the object to indicate a position of the object in the image plane.
  • the exemplifying process of determining the mapping function may be modified by replacing the height of the object h_v in equations (7) and (14) by a width of the object (w_v, w_vi) to represent a size of the object in the image plane and/or the position of the lower boundary of the object v_b in equations (7) and (14) by the position of an upper boundary of the object (v_t, v_ti) to represent a position of the object in the image plane.
  • the mapping may be determined by using a parabolic function or a 'skewed' parabolic function, i.e. a second order function. Consequently, the mapping between a position of an object in the image plane and a size of the object in the image plane is determined using a parabolic fit.
  • for a parabolic fit, one may consider determination of the mapping on basis of observed positions of the lower and upper boundaries of an object in the image plane, v_b and v_t, respectively. These positions may be expressed as second-order functions of the height of the object in the image plane.
  • the equations (17) and (18) essentially provide a mapping between a position of an object in the image plane, expressed by the position of the lower boundary of the object v_b and the position of the upper boundary of the object v_t, and a size of the object, expressed as 'skewed' parabolic curves.
  • the mapping may be determined for example using a least squares fit to an equation system comprising equations of the form indicated by the equations (19) and/or (20), each equation of the system representing a pair of an observed position of a lower or upper boundary of an object, v_bi or v_ti, respectively, and a corresponding observed height of the object h_vi in the image plane. Consequently, the fitting involves determining the parameters A, B and C together with D and/or E such that the overall error in the equation system is minimized (by using methods known in the art).
  • the reference level of interest is a horizon level in the image plane where the height of an object, representing the reference size, can be assumed to be zero.
  • the projected object height h_v in the image plane in the equations (19) and/or (20) is set to zero; once the mapping parameters A, B and C together with D and/or E have been estimated, the horizon level may be determined as the corresponding root of the resulting second-order equation.
  • while a parabolic or a 'skewed' parabolic function may be considered to provide a model that is theoretically more accurate than a linear one, it is more sensitive to errors in observed positions and sizes in the image plane and also requires slightly more complex computation. Hence, a parabolic or a 'skewed' parabolic function may not be applicable to all scenarios. Furthermore, a mapping function of any order and any kind may be employed without departing from the scope of the present invention.
  • the mapping function determination unit 305 may be further configured to use the mapping to determine an estimate of a position representing a reference level in the image plane in said sequence of images as a position in the image plane where a size of an object maps to a predetermined reference size.
  • the apparatus 300 may further comprise a reference level determination unit 307 configured to use the mapping function to determine an estimate of a position representing a reference level in the image plane.
  • Such predetermined reference size is preferably zero, or a size that maps to zero in consideration of the available image resolution. Consequently, the reference level represents a horizon in the image plane.
  • An estimate of a position or level representing a horizon in the image plane may be useful for example in determination of parameters associated with the imaging device employed to capture the sequence of images and its position and/or orientation with respect to the real world.
  • an estimate of a position or level representing a horizon in the image plane may be useful for image analysis, in particular in analysis of objects, their positions and changes thereof in images of the sequence of images.
  • the reference level determined on basis of a position in the image plane where a size of an object maps to a predetermined reference size may not represent a 'real' horizon in the image plane but rather a virtual horizon with respect to the non-horizontal plane in the field of view of the imaging device.
  • a non-zero reference size may be used to determine a reference level different from the horizon level.
  • an estimate of a position representing a reference level in the image plane in images of the sequence of images may be expressed, i.e. determined, as a distance from a predetermined reference point in the direction of the v axis of the image plane.
  • the reference level may be expressed as a distance in number of pixel positions from the origin of the image plane, thereby directly indicating the v axis coordinate v_h of the image plane estimating the position of the reference level, as illustrated by an example in Figure 4.
  • an estimate of a position representing a reference level in the image plane may be expressed as an angle corresponding to a slope of the mapping function (or the inverse mapping function) together with a second reference size.
  • a slope of the mapping function may be determined on basis of the parameter a.
  • the corresponding angle φ, which is the angle between the v axis of the example of Figure 4 and the fitted line 402 representing the mapping function, may be determined as φ = arctan(-1/a).
  • the angle φ may be used together with a second reference size h_ref, which may be for example the (estimated) height of the object at a predetermined position of the image, for example at the origin of the image plane or at the bottom of the image, or the height of the object at any other suitable position in the image plane, to indicate an estimate of a position representing the reference level in the image plane.
  • the (estimated) height of the object at the origin of the image plane can be rather conveniently obtained by setting the position of the object in the image plane v_b in equation (7) to zero, resulting in the second reference height h_ref = b.
  • alternatively, the angle between the fitted line 402 and the h axis, i.e. the 'size axis', may be employed in a corresponding manner.
  • in case a size parameter different from the (observed) height of a depicted object in the image plane and/or a position parameter different from the (observed) position of a lower boundary of the depicted object in the image plane are employed to determine the mapping, similar considerations with respect to expressing, or determining, the estimate of a position representing a reference level apply.
  • Determination of an estimate of a position representing a reference level in the image plane in images of the sequence of images described hereinbefore may be applied to determine a single estimate of the reference level position in the image plane. Consequently, the mapping function determination unit 305, or the reference level determination unit 307, may be configured to determine a final, or refined, estimate of a position representing a reference level in the image plane on basis of a single estimate of a position representing a reference level.
  • the accuracy of the refined estimate may be improved by determining a number of (initial) estimates of a position representing the reference level in the image plane and by deriving the refined estimate on basis thereof.
  • the mapping function determination unit 305, or the reference level determination unit 307, may be configured to determine the refined estimate of a position representing the reference level on basis of one or more (initial) estimates of a position representing the reference level.
  • the size of the real-world object, referred to hereinbefore also as the first size, used as a basis for determination of the individual (initial) estimates need not be the same for each estimate; rather, a refined estimate of a position representing the reference level in the image plane may be determined on basis of a number of (initial) estimates that are based on real-world objects of different size.
  • the mapping function determination unit 305 may be configured to determine the refined estimate as an average of two or more estimates of a position representing the reference level in the image plane, for example as an average of two or more estimates of a v axis coordinate v_h in the image plane estimating the position of the reference level or as an average of two or more estimates of the angle φ indicating the position of the reference level together with the second reference size h_ref.
  • the average may be an arithmetic mean or, alternatively, a weighted average may be employed.
  • the weighting may involve making use of the fitting error that may be derived as part of a least squares fit applied to a group of equations according to the equations (7) and (14), for example such that a given (initial) estimate of a position representing the reference level is multiplied by a weight that has a value increasing with decreasing value of the respective fitting error, as illustrated by the sketch following this list.
  • the operations, procedures and/or functions assigned to the image analysis unit 301 and the object estimation unit 303, as well as the operations, procedures and/or functions assigned to the mapping function determination unit 305 and the reference level determination unit 307 possibly comprised in the apparatus 300, described hereinbefore, may be divided between the units of the apparatus 300 in a different manner.
  • the apparatus 300 may comprise further units that may be configured to perform some of the operations, procedures and/or functions assigned to the above-mentioned processing units.
  • the operations, procedures and/or functions assigned to the image analysis unit 301 and the object estimation unit 303, as well as the operations, procedures and/or functions assigned to the mapping function determination unit 305 and the reference level determination unit 307 possibly comprised in the apparatus 300, may be assigned to a single processing unit within the apparatus 300 instead.
  • the apparatus 300 may comprise means for determining a first position on basis of information indicating a position of the object in the image plane, means for using a predetermined mapping function configured to determine a size of a reference object in the image plane on basis of a position of the object in the image plane to determine a reference size in the first position, and means for estimating the size of the object in the first position on basis of the reference size in the first position and a scaling function.
  • a method 500 in accordance with an embodiment of the invention is illustrated in Figure 5.
  • the method 500 may be arranged to estimate a size of an object in an image plane in an image of a sequence of images.
  • the method 500 comprises determining a first position on basis of information indicating a position of the object in the image plane, as indicated in step 510.
  • the method 500 further comprises using a predetermined mapping function configured to determine a size of a reference object in the image plane on basis of a position of the object in the image plane to determine a reference size in the first position, as indicated in step 520, and estimating the size of the object in the first position on basis of the reference size in the first position and a scaling function, as indicated in step 530.
  • the apparatus 300 may be implemented as hardware alone, for example as an electric circuit, as a programmable or non-programmable processor, as a microcontroller, etc.
  • the apparatus 300 may have certain aspects implemented as software alone or can be implemented as a combination of hardware and software.
  • the apparatus 300 may be implemented using instructions that enable hardware functionality, for example, by using executable computer program instructions in a general-purpose or special-purpose processor that may be stored on a computer readable storage medium to be executed by such a processor.
  • the apparatus 300 may further comprise a memory as the computer readable storage medium the processor is configured to read from and write to.
  • the memory may store a computer program comprising computer-executable instructions that control the operation of the apparatus 300 when loaded into the processor.
  • the processor is able to load and execute the computer program by reading the computer-executable instructions from memory
  • the processor may comprise one or more processors or processing units and the memory may comprise one or more memories or memory units. Consequently, the computer program may comprise one or more sequences of one or more instructions that, when executed by the one or more processors, cause the apparatus to perform steps implementing the procedures and/or functions described in context of the apparatus 300.
  • Reference to a processor or a processing unit should not be understood to encompass only programmable processors, but also dedicated circuits such as field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), signal processors, etc.
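As a sketch of the weighted refinement referred to in the list above, the following illustrative Python/NumPy function combines initial reference-level estimates using inverse-error weights. The specific weighting formula is an assumption: the description only requires weights that increase with decreasing fitting error, and all names and values below are illustrative.

    import numpy as np

    def refine_reference_level(estimates, fit_errors):
        # Combine initial estimates of the reference level position v_h
        # into a refined estimate; weights grow as the least squares
        # fitting error of the corresponding mapping function shrinks.
        estimates = np.asarray(estimates, dtype=float)
        errors = np.asarray(fit_errors, dtype=float)
        weights = 1.0 / (errors + 1e-9)   # assumed inverse-error weighting
        return float(np.sum(weights * estimates) / np.sum(weights))

    # Example: three initial v-coordinate estimates of the horizon level
    v_h = refine_reference_level([212.0, 205.5, 209.0], [1.2, 0.4, 0.8])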

Abstract

The present invention provides an arrangement, e.g. a method, an apparatus and a computer program, for estimating a size of an object in an image plane in an image of a sequence of images. The arrangement comprises determining a first position on basis of information indicating a position of the object in the image plane, using a predetermined mapping function configured to determine a size of a reference object in the image plane on basis of a position of the object in the image plane to determine a reference size in the first position, and estimating the size of the object in the first position on basis of the reference size in the first position and a scaling function.

Description

A method, an apparatus and a computer program for estimating a size of an object in an image
FIELD OF THE INVENTION
The invention relates to image processing. In particular, the invention relates to a method, an apparatus and a computer program for estimating a size of an object as depicted in an image of a sequence of images.
BACKGROUND OF THE INVENTION
Image analysis and processing techniques involved in analysis of images to identify an object and track the movement thereof based on images of a sequence of images are typically computationally demanding. Moreover, simultaneous identification and tracking of multiple objects where the objects in an image may be occasionally fully or partially overlapping is a challenging task. One particularly challenging aspect in identification and tracking of an object that may fully or partially overlap with another object in an image is estimation of the true position and size of such an object. The existing techniques do not provide good performance at a reasonable computational complexity for analysis and/or estimation of overlapping objects in an image in this regard.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a method, an apparatus and a computer program that facilitates computationally efficient but yet accurate estimation of a size of an object in an image of a sequence of images to facilitate further processing of one or more images of the sequence of images, particularly in a scenario where a part of the object is not visible in the image. An example of such processing of one or more images is determination of an estimated position and/or size of an object as depicted in an image overlapping with another object depicted in the image.
The objects of the invention are reached by a method, an apparatus and a computer program as defined by the respective independent claims. According to a first aspect of the invention, a method for estimating a size of an object in an image plane in an image of a sequence of images is provided, the method comprising determining a first position on basis of information indicating a position of the object in the image plane, using a predetermined mapping function configured to determine a size of a reference object in the image plane on basis of a position of the object in the image plane to determine a reference size in the first position, and estimating the size of the object in the first position on basis of the reference size in the first position and a scaling function.
According to a second aspect of the invention, an apparatus for estimating a size of an object in an image plane in an image of a sequence of images is provided, the apparatus comprising an image analyzer configured to determine a first position on basis of information indicating a position of the object in the image plane, and an object estimator configured to use a predetermined mapping function configured to determine a size of a reference object in the image plane on basis of a position of the object in the image plane to determine a reference size in the first position, and to estimate the size of the object in the first position on basis of the reference size in the first position and a scaling function.
According to a third aspect of the invention, a computer program for estimating a size of an object in an image plane in an image of a sequence of images is provided, the computer program comprising one or more sequences of one or more instructions which, when executed by one or more processors, cause an apparatus to at least perform a method in accordance with the first aspect of the invention.
The computer program may be embodied on a volatile or a non-volatile computer-readable record medium, for example as a computer program product comprising at least one computer readable non-transitory medium having program code stored thereon, the program code, which when executed by an apparatus, causes the apparatus at least to perform the operations described hereinbefore for the computer program in accordance with the third aspect of the invention.
The exemplifying embodiments of the invention presented in this patent application are not to be interpreted to pose limitations to the applicability of the appended claims. The verb "to comprise" and its derivatives are used in this patent application as an open limitation that does not exclude the existence of also unrecited features. The features described hereinafter are mutually freely combinable unless explicitly stated otherwise.
The novel features which are considered as characteristic of the invention are set forth in particular in the appended claims. The invention itself, however, both as to its construction and its method of operation, together with additional objects and advantages thereof, will be best understood from the following detailed description of specific embodiments when read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1a illustrates a coordinate system used to describe an image plane.
Figure 1b illustrates a coordinate system used to describe a real world.
Figure 2 illustrates a principle of the concept of estimating a size of an object in an image based on its distance from the bottom of the image.
Figure 3 schematically illustrates an apparatus in accordance with an embodiment of the invention.
Figure 4 illustrates the principle of linear fitting for determination of a mapping function.
Figure 5 provides a flowchart illustrating a method in accordance with an embodiment of the invention.
DETAILED DESCRIPTION
Figure 1a illustrates a coordinate system used to describe an image plane 100 and an image 101 in the image plane 100 in this document. The image plane 100 can be considered to comprise a number of pixels, positions of which are determined by coordinates along a u axis and a v axis, and where the origin of the coordinate system determined by the u and v axes is at the center of the image 101 on the image plane 100. The origin and even the directions of the axes could naturally be selected differently; many conventional image processing applications place the origin in the top left corner and make the magnitude of the v coordinate increase downwards. For brevity and clarity of description, without losing generality, in the following a position along the u axis may be referred to as a horizontal position and a position along the v axis is referred to as a vertical position. Terms left and right may be used to refer to a position in the direction of the u axis, and terms up and down may be used to refer to a position in the direction of the v axis. Moreover, an extent of an object in the direction of the u axis is referred to as a width of the object and an extent of the object along the direction of the v axis is referred to as a height of the object.
Figure 1b illustrates a coordinate system 110 used to describe the real world, a projection of which is mapped as an image on the image plane upon capture of an image. A position in the real world may be expressed by coordinates on the x, y and z axes, as illustrated in Figure 1b. A coordinate in the direction of the x, y and/or z axes may be expressed as a distance from the origin, for example in meters. The x and z axes can be considered to represent a plane that approximates the ground level. While this may not be an exactly accurate representation of the ground, which locally may comprise hills and slopes and which in the larger scale is actually a geoid, it provides sufficient modeling accuracy. Consequently, the y axis can be considered as the height from the ground level - or from the plane approximating the ground level.
Figure 1c schematically illustrates a relationship between the real world coordinate system 110 and the image plane 100. Figure 1c shows the x, y and z axes of the real world coordinate system 110 such that the x axis is perpendicular to the figure. The illustration of the image plane 100 in Figure 1c explicitly indicates the direction of the v axis, whereas the u axis is assumed to be perpendicular to the figure. The parameter y_c indicates the height of the focal point of an imaging device from the ground level represented by the x and z axes, f denotes a focal length of the imaging device along an imaginary line perpendicular to the image plane 100, and θ_x denotes the angle between the imaginary line perpendicular to the image plane 100 and the horizon plane 121, i.e. the tilt angle of the imaging device. In the following, the relationship between a point in the real world coordinate system 110 and the corresponding point in the image plane 100 is described.
Since the coordinate system 110 employed hereinbefore to model the real world is a left-handed coordinate system, a minus sign is added in front of the v coordinate representing the corresponding point - or pixel position - of the image plane 100. Let's make the following definitions:

$$K = \begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad (1) \qquad R = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{pmatrix}, \quad (2)$$

$$U = \begin{pmatrix} uw \\ -vw \\ w \end{pmatrix}, \quad (3) \qquad X = \begin{pmatrix} x \\ y_c - y \\ z \end{pmatrix}, \quad (4)$$

where K is a projection matrix of the imaging device, R is a rotation matrix, U denotes the projection of the real-world point in the image plane in homogeneous coordinates, and X denotes the real-world coordinates of a point to be projected. Note that the imaging device projects a point in the real world coordinate system into the two-dimensional image plane. The vector U introduces an additional dimension w as the third dimension of the projected point. The actual projected point may be recovered by dividing the components of the vector U by w.

Consequently, the projection of a point (x, y, z) in the real world coordinate system 110 on the image plane 100 is

$$U = W \cdot KRX, \quad (5)$$

where W is an arbitrary scale factor of the homogeneous representation, and hence the position in the image plane may be expressed as

$$\begin{pmatrix} uw \\ -vw \\ w \end{pmatrix} = \begin{pmatrix} fWx \\ W\big(f(y_c - y)\cos\theta_x - fz\sin\theta_x\big) \\ W\big(z\cos\theta_x + (y_c - y)\sin\theta_x\big) \end{pmatrix}. \quad (6)$$
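For illustration, the projection of equations (1) to (6) may be sketched in Python/NumPy as follows; the numeric values in the usage example are illustrative only, and W = 1 is chosen without loss of generality since the homogeneous scale cancels in the final division by w:

    import numpy as np

    def project_point(x, y, z, f, y_c, theta_x):
        # Project a real-world point (x, y, z) to image plane coordinates
        # (u, v) via U = W*K*R*X, per equations (1)-(6), with W = 1.
        K = np.array([[f, 0.0, 0.0],
                      [0.0, f, 0.0],
                      [0.0, 0.0, 1.0]])
        R = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(theta_x), -np.sin(theta_x)],
                      [0.0, np.sin(theta_x), np.cos(theta_x)]])
        X = np.array([x, y_c - y, z])
        uw, minus_vw, w = K @ R @ X
        return uw / w, -minus_vw / w   # second component carries -v*w

    # Illustrative values: camera 4 m above ground, tilt 0.2 rad, f = 800
    u, v = project_point(x=1.0, y=1.8, z=10.0, f=800.0, y_c=4.0, theta_x=0.2)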
An image, such as the image 101, may be part of a sequence of images. A sequence of images is considered as a time-ordered set of images, where each image of a sequence of images has its predetermined temporal location within the sequence with known temporal distance to the immediately preceding and following images of the sequence. A sequence of images may originate from an imaging device such as a (digital or analog) still camera, from a (digital or analog) video camera, from a device equipped with a camera or a video camera module etc., configured to capture and provide a number of images at a predetermined rate, i.e. at predetermined time intervals. Hence, a sequence of images may comprise still images and/or frames of a video sequence.
For the purposes of efficient analysis and prediction of movement within images of a sequence of images, the images preferably provide a fixed field of view to the environment of the imaging device(s) employed to capture the images. Preferably, the images of a sequence of images originate from an imaging device that has a fixed position throughout the capture of the images of the sequence of images, thereby providing a fixed or essentially fixed field of view throughout the sequence of images. Consequently, any fixed element or object in the field of view of the imaging device remains at the same position in each image of the sequence of images. On the other hand, objects that are moving in the field of view may be present in only some of the images and may have a varying position in these images.
However, it is also possible to generate a sequence of images representing a fixed field of view on basis of an original sequence of images captured by an imaging device that is not completely fixed but whose movement with respect to its position or orientation is known, thereby providing a field of view that may vary from one image to another. Assuming that the orientation and position of the imaging device for each of the images of the original sequence of images is known, it is possible to apply pre-processing to modify images of the original sequence of images in order to create a series of images having a fixed field of view.
As an example, an imaging device may be arranged to overlook a parking lot, where the parking area, driveways to and from the parking area and the surroundings thereof within the field of view of the imaging device are part of the fixed portion of the images of the sequence of images, whereas a changing portion of the images of the sequence of images comprises e.g. people and cars moving within, to and from the parking area. As another example, an imaging device may be arranged to overlook a portion of an interior of a building, such as a shop or a store. In this other example the fixed portion of the images may comprise shelves and other structures arranged in the store and the items arranged thereon, whereas the changing portion of the images may comprise e.g. the customers moving in the store within the field of view of the imaging device.
For the purposes of efficient analysis and prediction of movement within images of a sequence of images, an imaging device employed to capture the images is preferably positioned in such a way that the camera horizon is in parallel with the plane horizon, consequently resulting in the horizon level in the image plane being an imaginary line that is in parallel with the u axis. Hence, the horizon level within the image plane may be considered as an imaginary horizontal line at a certain distance from the u axis - or from an edge of the image, the certain distance being dependent on the vertical orientation of the imaging device. In case the vertical orientation of the imaging device does not enable representing the horizon level within the captured image, the horizon level may be considered as an imaginary line in the image plane that is in parallel with the u axis but which is outside the image.
In case an imaging device is positioned such that there is a non-zero angle of known value between the camera horizon and the plane horizon, pre-processing of images of the captured sequence of images may be applied in order to modify the image data to compensate for the said angle, to provide a sequence of images where the horizon can be represented as an imaginary line that is in parallel with the u axis of the image plane.
With the assumption that the images of a sequence of images represent a fixed field of view, an object moving in the field of view may be detected by observing any changes between (consecutive) images of the sequence. As an example, an object in an image, i.e. a set of pixel positions in an image, may be identified by comparing the image to a reference image comprising only the fixed portion of the field of view of the images of a sequence of images and identifying the set of pixels that is not present in the reference image. An object in an image may be determined by indicating its position in the image plane together with its shape and/or size in the image plane, all of which may be expressed using the u and v coordinates of the image plane.
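For illustration, such change detection against a reference image may be sketched as follows; the grayscale input format and the threshold value are assumptions of the sketch, not requirements of the described arrangement:

    import numpy as np

    def detect_changed_pixels(image, reference, threshold=25):
        # Pixels whose intensity differs from the fixed reference scene
        # by more than `threshold` are taken to belong to a moving object.
        # Both inputs are assumed to be 2-D grayscale arrays (e.g. uint8).
        diff = np.abs(image.astype(np.int16) - reference.astype(np.int16))
        v_idx, u_idx = np.nonzero(diff > threshold)   # rows ~ v, columns ~ u
        return u_idx, v_idx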
Once an object is detected in an image of a sequence of images, a data record comprising information on the object may be created. The information may comprise for example the current and previous positions of the object, the current and/or previous shape(s) of the object, the current and previous size(s) of the object, an identifier of the object and/or any further suitable data that can be used to characterize the object.
In case multiple objects are detected in an image, a dedicated data record may be created and/or updated for each of the objects.
An object moving within the field of view of the imaging device is typically depicted in two or more images of the sequence of images. An object detected in an image can be identified as the same object already detected in a previous image of the sequence by comparing the characteristics - e.g. with respect to the shape of the object - of the object detected in the image to characteristics of an object detected in a previous image (e.g. as stored in a corresponding data record).
Hence, it is possible to track the movement of the object by determining its position in a number of images of the sequence of images and by characterizing the movement on basis of the change in its position over a number of images. In this regard, the information on the position(s) of the object in a number of images may be stored in the data record comprising information on the object in order to enable subsequent analysis and determination of a movement pattern of the object. Due to the movement, the positions of two objects detected or identified in a previous image of the sequence of images as separate objects may overlap, fully or in part, in an image of the sequence. Consequently, such two objects may merge into a combined object for one or more images of the sequence, while they may again separate as individually identifiable first and second objects in a subsequent image of the sequence. In a similar manner, an object initially identified as a single individual object, e.g. at or near a border of an image of the sequence, may in a subsequent image separate into two individual objects spawned from the initial single object. Information indicating merging of two objects into a combined object and/or separation of a (combined) object into two separate objects may be kept in the data record comprising information on the object in order to facilitate analysis of the evolution of the object(s) within the sequence of images.
While it would be possible to separately determine a position of each pixel in an image representing a given object, in case of an object whose shape or approximation thereof is known, e.g. based on a data record comprising information on the object, it is sufficient to determine the position of the group of pixels representing the object in an image as a single position in the image plane. Such determination of position is applicable, in particular, to objects having a fixed shape or having a shape that only slowly evolves in the image plane, resulting in only a small change in shape of the object from one image to another.
A position of an object whose shape or approximation thereof is known may be determined or expressed for example as the position(s) of one or more predetermined parts of the object in the image plane. An example of such a predetermined part is a pixel position indicating a geographic center point of the object, thereby - conceptually - indicating a center of mass of the object (with the assumption that each pixel position of the set of pixel positions representing the object represents an equal 'mass'). The geographic center point of an object in an image may be determined for example as the average of the coordinates of the pixel positions representing the object in the image.
Another example for using predetermined part(s) of an object to indicate a position of the object in an image involves determining at least one of a lower boundary and an upper boundary together with at least one of a left boundary and a right boundary of an imaginary rectangle enclosing the pixel positions representing the object by touching the lowermost, the uppermost, the leftmost and the rightmost pixel positions representing the object in the image plane. Such a rectangle may be referred to as a bounding box. The lower and upper boundaries may be expressed as a v coordinate, i.e. as a position in the v axis, whereas the left and right boundaries may be expressed as a u coordinate, i.e. a position in the u axis.
Consequently, the position of an object may be expressed for example by a coordinate of the u axis indicating the left boundary of a bounding box enclosing the object and by a coordinate of the v axis indicating the lower boundary of the bounding box. This is equivalent to expressing the coordinates of the pixel position indicating the lower left corner of the (rectangular) bounding box. In principle the bounding box does not need to have an exactly rectangular shape; it is possible to use e.g. a bounding circle just large enough to enclose all pixels of the object, or a bounding oval with its u and v dimensions selected to match those of the object. However, a rectangular bounding box is the most common and most easily handled in processing.
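The boundaries of such a bounding box may be computed directly from the pixel positions representing the object; an illustrative sketch follows, in which the lower/upper naming assumes v coordinates increasing upwards as in Figure 1a (a top-left origin convention would swap them):

    import numpy as np

    def bounding_box(u_idx, v_idx):
        # Boundaries of the imaginary rectangle touching the leftmost,
        # rightmost, lowermost and uppermost pixels of the object.
        return {"left": int(np.min(u_idx)), "right": int(np.max(u_idx)),
                "lower": int(np.min(v_idx)), "upper": int(np.max(v_idx))}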
A size of an object in an image, i.e. in the image plane, may be expressed for example by its dimension(s) along the axis or axes of the image plane. Thus, a size of an object in an image may be expressed as its extent in the direction of the v axis, i.e. as the height of the object in the image. Alternatively or additionally, a size of an object in an image may be expressed as its extent in the direction of the u axis, i.e. as the width of the object in the image. A height and/or a width may be expressed for example as a number of pixel positions corresponding to the height/width in the image plane. Such information may be derived for example with the aid of a bounding box, as described hereinbefore. A further alternative for expressing the size of the object is to indicate either the height or the width of an object, e.g. as a height or width of a bounding box enclosing the object, together with an aspect ratio determining the relationship between the height and width of the object. Since the size of an object as represented in an image may vary over time, the data record comprising information on an object may be employed to keep track of the current (or most recent) size of the object and possibly also of the size of the object in a number of previous images. A shape of an object can be expressed for example by a set of pixel positions or as a two-dimensional 'bitmap' indicating the pixel positions forming the object. Such information may be stored in a data record comprising information on the object. The information regarding the shape of the object may include the current or most recent observed shape of the object and/or the shape of the object in a number of preceding images of the sequence.
Figure 2 schematically illustrates two images 201, 203 of a sequence of images, the images schematically illustrating a reference object in the real world moving along a plane that is essentially horizontal, for example the plane determined by the x and z axes of the real world coordinate system 110 described hereinbefore. Note that only the changing portions of the images are illustrated in the images 201, 203, thereby omitting any possible fixed portion (or background objects) of the images for clarity of illustration.
The image 201 illustrates the real-world object as an object 205 having a height h_v1 and a width w_v1 with its lower edge situated at position v_b1 on the v axis of the image plane. The image 203 illustrates the real-world object as an object 205' having a height h_v2 and a width w_v2 with its lower edge situated at position v_b2 on the v axis of the image plane. Moreover, a level representing the horizon 207 is assumed to be a line that is parallel to the u axis - and also parallel to the lower and upper edges of the images 201 and 203.
The real-world object in the image 201 is closer to the imaging device than in the image 203, and hence the object is depicted in the image 201 as larger than in the image 203. In particular, the height h_v1 of the object 205 in the image 201 is larger than the height h_v2 of the object 205', and the width w_v1 of the object 205 in the image 201 is larger than the width w_v2 of the object 205' in the image 203. Moreover, since the real-world object depicted in the images 201 and 203 was moving along an essentially horizontal plane, due to the object 205 in the image 201 being closer to the imaging device than the corresponding object 205' in the image 203, the object 205 in the image 201 is closer to the bottom of the image than the object 205' in the image 203.
This can be generalized into a rule that a real-world object closer to the imaging device appears closer to the bottom of the image than the same real-world object - or another real-world object of identical or essentially identical size - situated further away from the imaging device. In a similar manner, a real- world object closer to the imaging device appears larger in an image than the same real-world object - or another real-world object of identical or essentially identical size - situated further away from the imaging device. Moreover, a point, either actual or conceptual, where the size of a real-world object as depicted in the image plane would appear zero or essentially zero, represents a point in the image plane - e.g. a level of an imaginary line parallel to the u axis of the image plane, i.e. a v coordinate of the image plane - representing a horizon in the image plane.
Therefore, a real-world object exhibiting movement towards or away from the imaging device - i.e. towards or away from the horizon - is typically depicted as an object of different size and different distance from the bottom of an image in two images of a sequence of images captured using an imaging device arranged to capture a sequence of images with a fixed field of view. Consequently, it is possible to determine and/or use a mapping function configured to determine a height of an object in the image plane on basis of a vertical position of the object in the image plane. The determined position of the object in two or more previous images may be used to predict the position of the object in a subsequent image. In the following a straightforward example on the principle of predicting the position of an object is provided. The position of an object in image n may be expressed as a pair of u and v coordinates (u_n, v_n), and the position of the object in image n+1 may be expressed as (u_{n+1}, v_{n+1}). Hence, the change in position between the images n and n+1 can be expressed as (u_d, v_d) = (u_{n+1} - u_n, v_{n+1} - v_n), thereby indicating the motion of the object in the image plane between two consecutive images of the sequence of images. Consequently, the position of the object in image n+2 may be predicted based on the position of the object in image n+1 and the above-mentioned change in position as (u'_{n+2}, v'_{n+2}) = (u_{n+1} + u_d, v_{n+1} + v_d). Since the change in position is not typically fully constant and hence the prediction may not be fully accurate, the actual position of the object in image n+2, expressed as (u_{n+2}, v_{n+2}), may be different from the predicted one, resulting in a prediction error of (u_e2, v_e2) = (u_{n+2} - u'_{n+2}, v_{n+2} - v'_{n+2}). Moreover, the change in position of the object between two consecutive images may be updated into (u_d, v_d) = (u_{n+2} - u_{n+1}, v_{n+2} - v_{n+1}) to enable prediction based on the most recently observed change of position between two consecutive images.
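For illustration, the constant-change prediction just described may be written as the following sketch; the positions in the usage example are illustrative values:

    def predict_next_position(p_prev, p_curr):
        # Predict the position in image n+2 from the positions in images
        # n and n+1, using the most recent change (u_d, v_d).
        u_d = p_curr[0] - p_prev[0]
        v_d = p_curr[1] - p_prev[1]
        return (p_curr[0] + u_d, p_curr[1] + v_d)

    # Positions of an object in images n and n+1
    predicted = predict_next_position((120, 80), (124, 78))   # -> (128, 76)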
The exemplifying prediction described hereinbefore jointly predicts the change of position in the directions of the u and v axes. Alternatively, it is possible to predict the change of position in the direction of the u axis and the change of position in the direction of the v axis separately from each other, thereby allowing a straightforward use of different prediction schemes for the predictions in the directions of the two axes. A prediction as described hereinbefore may be applied to a number of pixels representing an object in an image - for example to all pixels representing an object or to a subset thereof - or the prediction may be applied to a single pixel chosen to represent the object (as described hereinbefore).
While the discussion in the foregoing describes the concept of prediction by using a straightforward example, it is readily apparent that more elaborate prediction schemes can be determined and applied without departing from the scope of the embodiments of the invention.
Figure 3 schematically illustrates an apparatus 300 for estimating a size of an object in an image plane in an image of a sequence of images. The apparatus 300 comprises an image analysis unit 301 and an object estimation unit 303. The image analysis unit 301 may also be referred to as an image analyzer or as an object analyzer and the object estimation unit 303 may be also referred to as an object estimator or an image estimator. The image analysis unit 301 is operatively coupled to the object estimation unit 303. The apparatus 300 may comprise further components, such as a processor, a memory, a user interface, a communication interface, etc. In particular, the apparatus 300 may receive input from one or more external processing units and/or apparatuses and the apparatus 300 may provide output to one or more external processing units and/or apparatuses.
The apparatus 300, and the object estimation unit 303 in particular, may be configured to estimate and/or express a size of an object, as described hereinbefore. As an example, the apparatus 300 may be configured to estimate and/or express a size of an object as a height of the object in the image plane. As another example, the apparatus 300 may be, alternatively or additionally, configured to estimate and/or express the size of an object as a width of the object in the image plane and/or as an extent in a direction different from the u and/or v axes of the image plane. As a further example, the apparatus 300 may be configured to estimate and/or express a size of an object as an extent of the object in a first direction in the image plane, e.g. in the direction of the v axis, together with an aspect ratio determining the relationship between the extent of the object in the first direction and an extent of the object in a second direction in the image plane, the second direction being perpendicular to the first direction. Similar considerations apply also to a reference size, which is a concept that will be described in more detail hereinafter.
The image analysis unit 301 may be configured to obtain information indicating a position of an object in an image plane in an image of the sequence of images. The information indicating the position of the object may be an observed position, obtained for example via analysis of the image data, or the information indicating the position of the object may be an estimated position, obtained for example on basis of prediction, as described hereinbefore.
In the following, for clarity and brevity of description, the term current image is used to refer to an image of a sequence of images in which a size of an object of interest is to be estimated. Moreover, the term current position is used to refer to the position indicative of the position of the object of interest in the image plane in the current image.
The image analysis unit 301 may be configured to obtain information indicating a position of an object in the image plane for example by performing an analysis of image data of a number of images of the sequence of images in order to identify an object of predetermined characteristics and its position in the image plane. Image analysis techniques for detecting and identifying an object of predetermined characteristics in an image known in the art may be used for this purpose. The output of such analysis may comprise indication of pixel position(s) in the image plane indicating a position of the object in the image plane.
Alternatively, the image analysis unit 301 may be configured to receive information indicating a position of an object in the image plane by receiving an indication of a pixel position or pixel positions of the image plane indicating a position of the object in the image plane. Such information may be received, for example, from another processing unit of the apparatus 300 or from a processing unit outside the apparatus 300, such processing unit being configured to apply image analysis in order to determine a presence of an object of predetermined characteristics and a position thereof in the image plane. As another alternative, the information indicating a position of an object in the image plane may be received, for example, based on input from a user. The user may indicate an object of interest in an image via a suitable user interface (such as a display & pointing device, a touchscreen, etc.), for example by indicating one or more of the lower, upper, left and right boundaries of the object in the image plane. As a particular further example, the user may be involved in initial detection of an object, whereas the image analysis unit 301 may be configured to track the object indicated by the user in the subsequent (and/or preceding) images of the sequence of images.
The apparatus 300, and the image analysis unit 301 and/or the object estimation unit 303 in particular, may be configured to estimate, express and/or determine information indicating a position of an object in the image plane as described hereinbefore.
As an example, the apparatus 300 may be configured to estimate, express and/or determine information indicating a position of an object in the image plane by information that may comprise, for example, a position indicating a lower boundary of the object in the image plane or a position indicating an upper boundary of the object in the image plane. Additionally or alternatively, the information indicating a position of an object may comprise for example a position indicating a left boundary of the object or a position indicating a right boundary of the object, as described hereinbefore.
As a further example, the information indicating a position of an object in the image plane may indicate a position of any predetermined or otherwise determinable part of an object in the image plane, which, together with information regarding the shape of the object may be employed to determine or estimate a position of the object in the image plane.
The object estimation unit 303 is configured to use a predetermined mapping function to determine a reference size in the current position, wherein the predetermined mapping function is configured to determine a reference size of an object in the image plane on basis of a position of the object in the image plane. In particular, the predetermined mapping function may be configured to determine the reference size as a size of a reference object in a given position in the image plane. The reference object may or may not have a known real-world size. The determined reference size in the current position may be employed to estimate the size of an object of interest in the current position of the image plane by making use of a scaling function, as described in detail hereinafter.
A predetermined mapping function may be configured to base the determination of the size of an object in the image plane on a distance of the position of the object from a predetermined reference level in the image plane. The predetermined reference level is preferably a predetermined position in the direction of the v axis of the image plane, which hence can be considered as an imaginary line that is in parallel to the u axis of the image plane. Thus, the position of the predetermined reference level may be expressed for example as a v coordinate of the image plane or as a distance, as a number of pixel positions, from the bottom of the image and/or from the top of the image. Alternatively, or additionally, the position of the predetermined reference level may be expressed indirectly, for example by information that enables determination of the position in the image plane representing the predetermined reference level. The distance between an object in the image plane and the predetermined reference level may be expressed as a number of pixel positions between a pixel position indicating a position of an object in the image plane and a position indicating the predetermined reference level, for example as a difference between a v coordinate of the pixel position indicating the position of the object in the image plane and the v coordinate indicating the position of the predetermined reference level in the image plane. The predetermined reference level may represent a horizon level or another suitable reference level in the image plane for images of the sequence of images.

The object estimation unit 303 may be configured to obtain a predetermined mapping function or to obtain information enabling access to a predetermined mapping function. The predetermined mapping function may be determined on basis of a number of images of the sequence of images to which the current image belongs, or the predetermined mapping function may be determined on basis of another sequence of images exhibiting a similar or essentially similar field of view to the environment of the imaging device(s) as the sequence to which the current image belongs.
The apparatus 300 may be further configured to determine a mapping function for determining a reference size for an object in the image plane on basis of a position of the object in the image plane, e.g. for determining a size of a reference object in a given position in the image plane. In particular, the apparatus 300 may be configured to determine such mapping function on basis of a number of images of the sequence of images to which the current image belongs. Said number of images may comprise images preceding the current image in the sequence and/or images following the current image in the sequence, possibly together with the current image.
A detailed description of determination of a mapping function on basis of observed positions and sizes of two or more objects in the image plane in one or more images of a sequence of images, wherein the sizes of the two or more objects correspond to real-world objects having similar or essentially similar sizes is provided hereinafter.
A predetermined mapping function may be based on a linear function. Such a mapping function may be of the form

h_v = a · v_b + b, (7)

where h_v represents a size of a reference object in the image plane, v_b represents a position of the object in the image plane in the direction of the v axis, and a and b represent mapping parameters, which may be determined based on observed data. In particular, h_v may represent a height of the reference object at distance v_b in the direction of the v axis from the origin of the image plane.
Consequently, it is possible to directly use the equation (7) with the determined values of the parameters a and b to determine or estimate a size of the reference object at a given position in the direction of the v axis. In a practical implementation, however, it may be beneficial to determine a number of estimates of a function according to the equation (7), i.e. a number of pairs of the parameters a and b, for improved reliability and/or accuracy, where some of the estimates are possibly derived on basis of reference objects of different (real-world) size. Consequently, straightforward application of a number of functions according to the equation (7) would result in a number of reference sizes, possibly corresponding to a number of reference objects of different (real-world) sizes - and hence different relationships with a size of the object of interest in the current image.
Therefore, it may be more convenient to configure the mapping function to determine a size of a reference object in the image plane on basis of a parameter or parameters derivable based on the function or functions of the form of the equation (7). In particular, it may be beneficial to configure the mapping function to determine a size of a reference object in the image plane on basis of a parameter or parameters derivable from a number of functions of the form of the equation (7), which parameters, originating from separate functions of the form of the equation (7), may be combined into a single combined parameter using a straightforward mathematical operation, such as averaging.
An example of such parameter is a position of a reference level in the image plane, and hence the mapping function may be configured to determine a size of a reference object in the image plane on basis of a distance of the position of an object of interest from the reference level in the image plane. An example of a suitable reference level is an estimated or known horizon level in the image plane. However, while using the horizon level in the image plane as the reference level may provide some advantages, as discussed hereinafter, any other reference level may be used as well. In particular, an equation of the form of the equation (7) may be used to determine a position in the direction of the v axis of the image plane, i.e. a v coordinate of the image plane, representing a horizon as a predetermined reference level. As described hereinbefore, the size of an object at the horizon may be assumed to become zero, and hence a size of the object at the horizon level may be determined as h_v = a * v_b + b = 0, implying that the horizon level would be at v_h = -b/a. The v axis coordinate of the image plane representing the horizon level may be derived separately on basis of a number of functions of the form of the equation (7), hence resulting in a number of estimated positions representing the horizon in the image plane that may be combined into a refined single estimate, for example by computing the arithmetic mean of the estimated positions or by using another suitable approach for combining a number of estimated values into a refined single estimate.
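As a minimal illustration of this step, the following sketch (in Python, with hypothetical parameter values not taken from the patent text) derives a horizon-level estimate v_h = -b/a from each linear fit of the form of the equation (7) and combines several such estimates by arithmetic mean:

```python
# Illustrative sketch: horizon-level estimation from one or more linear
# fits h_v = a*v_b + b, as in equation (7). The (a, b) pairs below are
# hypothetical and would in practice come from earlier least-squares fits.

def horizon_from_fit(a, b):
    # The reference size shrinks to zero at the horizon: a*v_h + b = 0.
    return -b / a

# Several fits, possibly derived from reference objects of different
# real-world size, each yield an independent horizon estimate.
fits = [(-0.25, 120.0), (-0.22, 110.0), (-0.27, 126.0)]
estimates = [horizon_from_fit(a, b) for a, b in fits]
v_h = sum(estimates) / len(estimates)  # arithmetic mean of the estimates
print(v_h)
```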
Instead of using a linear mapping function of the form indicated by the equation (7), a parabolic function or a 'skewed' parabolic function, i.e. a second order function, may be used as the mapping function. As an example of using a parabolic mapping function and using the horizon level as the reference level, see equations (15) to (21) and the associated description hereinafter.
In case of a linear mapping function, another example of suitable parameters that represent the reference level in the image plane, that are derivable on basis of a number of separate functions of the form of the equation (7), and that may be combined into a single combined parameter using a straightforward mathematical operation, is a parameter indicative of the slope of such a function together with a reference size at a predetermined point in the direction of the v axis of the image plane. An example assuming the u axis of the image plane, i.e. the position where the v coordinate of the image plane is zero, as said predetermined point is described in the following.
Assuming we have initial estimates of the parameters a and b, denoted as a_0 and b_0, one may determine a reference size for the mapping function, referred to in the following also as a second reference size, at the level of the u axis of the image plane, i.e. at v_b = 0, as h_ref = b_0. The parameter may be denoted as h_ref in order to emphasize its role as the second reference size. Together with the above-discussed fact that the corresponding reference level, defined as a position of the horizon level, is determined as v_h = -b/a, implying that v_h,0 = -b_0/a_0, one may further define an angle φ_0 = arctan(-1/a_0). Hence, the pair of parameters h_ref and φ_0 may be used to characterize the initial position of the horizon level in the image plane based on the initial estimate, which may also be computed as v_h,0 = h_ref * tan φ_0.
Further assuming another estimate of the parameters a and b, denoted as a_1 and b_1, one may come up with another pair of parameters v_h,1 = -b_1/a_1 and φ_1 = arctan(-1/a_1) characterizing the position of the horizon level in the image plane based on the other estimate. In order to combine the other estimate with the initial estimate, one may use the other estimate of the position of the horizon level v_h,1 to find the adjusted angle as φ'_1 = arctan(v_h,1 / h_ref). The adjusted angle compensates for the difference between the sizes of the real-world objects - and hence the corresponding sizes thereof as depicted on the image plane - such that a combined angle φ_avg = (φ_0 + φ'_1) / 2 may be computed as the arithmetic mean, enabling computation of the combined estimate of the horizon level as v_avg = h_ref * tan φ_avg. Any possible further estimates of the parameters a and b may be incorporated into the combined estimate in a similar manner as the other estimate discussed in the foregoing.
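The following sketch illustrates the angle-based combination described above; the fit parameters a_0, b_0, a_1 and b_1 are hypothetical:

```python
import math

# Sketch of combining two horizon estimates via adjusted angles. Each
# linear fit h_v = a*v_b + b yields a horizon estimate v_h = -b/a and an
# angle phi = arctan(-1/a); values below are hypothetical.

a0, b0 = -0.25, 120.0          # initial fit
a1, b1 = -0.50, 230.0          # another fit, e.g. a larger reference object

h_ref = b0                     # second reference size at v_b = 0
phi0 = math.atan(-1.0 / a0)    # angle of the initial fit
v_h1 = -b1 / a1                # horizon estimate of the other fit

# Adjust the second angle to the common reference size h_ref; this
# compensates for the different real-world object size behind the fit.
phi1_adj = math.atan(v_h1 / h_ref)

phi_avg = (phi0 + phi1_adj) / 2.0      # combined angle (arithmetic mean)
v_avg = h_ref * math.tan(phi_avg)      # combined horizon estimate
print(v_avg)
```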
As a further example, a mapping function based on a linear model, such as a function of the form indicated by the equation (7), may be configured to determine a reference size of an object in the image plane on basis of a position of the object in the image plane by making use of the second reference size h_ref at the position v_ref and a known or estimated position of the horizon level in the image plane v_avg. Since by definition one may assume that the size of any object at the horizon is zero, and hence also the size of a reference object at the horizon level of the image plane is zero, assuming that v_c denotes the current position of the object of interest in the image plane, one may estimate the size of a reference object in the image plane at v_c as

h_c = h_ref * (v_avg - v_c) / (v_avg - v_ref)    (8)
In case the second reference size h_ref is estimated at the origin of the image plane, i.e. v_ref = 0, the mapping function may be simplified into

h_c = f(v_c) = h_ref * (v_avg - v_c) / v_avg = h_ref * (1 - v_c / v_avg)    (9)
In particular, if the position v_c indicates the position of a lower boundary of the object of interest, the equation (9) may be used to directly determine the size of the reference object in the image plane in the current position of the object of interest.
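A minimal sketch of applying the equation (9), assuming hypothetical values for the second reference size h_ref and the horizon level v_avg:

```python
# Sketch of equation (9): the reference size at the current position v_c,
# given a horizon estimate v_avg and the second reference size h_ref at
# the origin (v_ref = 0). All values are hypothetical.

def reference_size(v_c, h_ref, v_avg):
    # Size decays linearly from h_ref at v = 0 to zero at the horizon v_avg.
    return h_ref * (1.0 - v_c / v_avg)

print(reference_size(v_c=240.0, h_ref=120.0, v_avg=480.0))  # -> 60.0
```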
The object estimation unit 303 is configured to estimate the size of the object of interest in the current position on basis of a reference size in the current position and a scaling function. A scaling function may be configured to determine a size of an object of interest, for example, solely on basis of a reference size in the current position or on basis of a reference size in the current position and the current position of the object of interest.
The scaling function is predetermined in that it is, preferably, determined on basis of observed or otherwise known size(s) of the object of interest in consideration of its/their relationship with the size of a reference object. The observed/known size(s) of the object of interest may be based on observation(s) of the real world size(s) of the object of interest or on observation(s) of the size(s) of the object of interest as depicted in the image plane in one or more images of the sequence other than the current image. These one or more images may comprise images preceding the current image in the sequence and/or images following the current image in the sequence, possibly together with the current image.
Alternatively or additionally, the scaling function may be determined, or updated, at least in part on basis of an object of similar or essentially similar size as the object of interest in any image of the sequence of images, including the current image.
A scaling function may be configured to determine or estimate a size of an object in a given position of the image plane by multiplying a reference size at the given position by a scaling factor that is indicative of the ratio between the size of the object and the size of a reference object.
As an example, the ratio between the sizes of the object and the reference object, and hence the scaling factor, may be determined as a ratio between the known real-world sizes of the object and the reference object. As another example, the ratio between the sizes of the object and the reference object, and hence the scaling factor, may be determined as a ratio between an observed size of the object in a second position and the size of the reference object in the corresponding position of the image plane, i.e. the reference size in the second position. The second position may be a position in the image plane in another image of the sequence, as described hereinbefore, whereas the reference size in the second position can be obtained by the predetermined mapping function on basis of the second position. The second position may be the same position of the image plane as the current position, or the current and second positions may be different positions of the image plane. Preferably, the observations of an object where the object is not depicted in the image in full are excluded from the determination of the scaling factor.
As a further example, the ratio between the sizes of the object and the reference object, and hence the scaling factor, may be determined as an average of two or more ratios between an observed size of the object and the respective size of the reference object in two or more positions in the image plane. The averaging of ratios may involve for example determining the ratios between an observed size of the object in the image plane and the size of the reference object in the respective position for a predetermined number of positions in the image plane and computing an average of the determined ratios. The average employed therein may be for example an arithmetic mean or a weighted average. A weighted average may be determined for example by multiplying each determined ratio by a weight having a value that is increasing with increasing observed size of the object in the image plane used to determine the respective ratio, thereby giving more emphasis on the observations of the object that are close(r) to the imaging device.
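The following sketch illustrates such a weighted average, assuming the observed size itself is used as the weight; all observation values are hypothetical:

```python
# Sketch of the weighted-average scaling factor. Each entry pairs an
# observed object size with the reference size at the same position;
# weights grow with the observed size, favouring observations closer to
# the camera. All numbers are hypothetical.

observations = [(90.0, 60.0), (45.0, 30.0), (18.0, 15.0)]  # (observed, reference)

ratios = [obs / ref for obs, ref in observations]
weights = [obs for obs, _ in observations]  # weight increases with observed size

scaling_factor = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
print(scaling_factor)  # close to 1.5, pulled slightly by the small observation
```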
The scaling function may be determined on basis of pre-analysis of image data of a number of images of a sequence of images such that, in consideration of the current image, images preceding the current image in the sequence, the current image and/or images following the current image in the sequence may be considered. In other words, the process of determining the scaling function may involve obtaining an observed size and position of the object in the image plane in a number of past and/or future images of the sequence of images together with the respective sizes of the reference object and determining the scaling function on basis of said number of past and/or future images before applying the scaling function to any of the images of the sequence. While, in consideration of the current image, this approach may require access also to images following the current image in the sequence of images and hence essentially requires post-processing of the sequence of images, a benefit of such an approach is that all information regarding observed size(s) of a given object may be made use of in determination of the scaling function.
Alternatively, the scaling function may be determined and continuously updated on image-by-image basis such that only images of the sequence up to (and including) the current image may be considered in determination of the scaling function. In other words, the process of determining the scaling function may involve obtaining an observed size and position of the object in the image plane in a number of past images of the sequence of images together with the respective sizes of the reference object and determining the scaling function in said number of images that precede the current image in the sequence. In particular, as an example, assuming that one has a current, or initial, estimate of the scaling function determined on basis of images up to the image immediately preceding the current image, the process of updating the scaling function may comprise obtaining an observed size and position of an object of interest in the current image and the size of the reference object in the respective position (by using the predetermined mapping function) and using the observed size together with the respective size of the reference object to update the scaling function. While this approach constantly updates the scaling function and hence may be considered to improve the accuracy thereof, the initial estimates and/or a first few updated estimates of the scaling function may be inaccurate. On the other hand, a benefit of such an approach is that on-line processing of the sequence of images in order to estimate a size of an object of interest in an image of the sequence is possible.
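A sketch of the image-by-image update described above, assuming a running weighted mean over (observed size, reference size) pairs; the class and its values are hypothetical:

```python
# Sketch of on-line, image-by-image updating of the scaling factor. A
# running weighted mean is updated whenever the object of interest is
# observed in full; early estimates may be noisy, as noted above.

class ScalingFactorEstimator:
    def __init__(self):
        self.weight_sum = 0.0
        self.weighted_ratio_sum = 0.0

    def update(self, observed_size, reference_size):
        # Weight each observation by the observed size in the image plane.
        ratio = observed_size / reference_size
        self.weight_sum += observed_size
        self.weighted_ratio_sum += observed_size * ratio

    def value(self, default=1.0):
        if self.weight_sum == 0.0:
            return default  # no observations yet
        return self.weighted_ratio_sum / self.weight_sum

est = ScalingFactorEstimator()
for observed, reference in [(90.0, 60.0), (45.0, 30.0)]:
    est.update(observed, reference)
print(est.value())  # -> 1.5
```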
The determination and/or updating of a scaling function may be limited to consider observations in images of the sequence in which the object of interest is depicted in full, thereby omitting the observations where the object of interest is only partially depicted in the image, for example due to part of the object falling outside of the image or due to the view to the object of interest being partially or fully obstructed by another object or element depicted in the image.

Instead of using a dedicated mapping function and a dedicated scaling function, the object estimation unit 303 may be configured to apply or use a joint function to perform the operations or functions of a mapping function and a scaling function described hereinbefore. In particular, the object estimation unit 303 may be configured to use a predetermined joint function configured to estimate a size of the object of interest in the image plane on basis of the position of the object of interest in the image plane. For simplicity, such a joint function may be referred to as a mapping function, thereby corresponding to an arrangement comprising a mapping function configured to estimate a size of the reference object in the image plane on basis of the position of the object of interest in the image plane to determine a reference size, and to estimate the size of the object of interest in its current position by scaling the reference size by a scaling factor having a fixed value, which may be for example 1.
As an example of such a joint function, in case of a linear mapping function of the format indicated by the equation (7) the scaling functionality may be conveniently incorporated as part of the mapping function, thereby leaving the role of the scaling function to multiply the reference size by a scaling factor having value 1. In this regard, for example the equation (9) may be re-written as a joint function

h_c = f(v_c) = r * h_ref * (v_avg - v_c) / v_avg    (10)

where the parameter r contributes the scaling functionality by directly scaling the second reference size h_ref of the mapping function according to the equation (9). Consequently, the scaling functionality may be applied either before application of the mapping function or after application of the mapping function.
As another example making use of a joint function providing both the mapping and scaling functionalities, one may consider a scenario where the position of the object of interest is observed on basis of a first position associated with the object, whereas a predetermined mapping function is configured to determine a size of the reference object in the image plane on basis of a second position associated with the object, the second position being different from the first position. As an example, the predetermined mapping function may be configured to determine a size of a reference object on basis of a position of the lower boundary of an object, whereas the position of the object of interest is expressed by indicating the position of the upper boundary of the object - e.g. due to a view to the lower boundary of the object of interest in the image plane being obstructed by another object overlapping with the object of interest. In such a case, on basis of the equation (10), one may define a position of an upper boundary of an object of interest v_t as

v_t = h_c + v_c = r * h_ref * (v_avg - v_c) / v_avg + v_c    (11)

where v_c indicates the position of the lower boundary of the object of interest. By solving the equation (11) with respect to v_c, one obtains

v_c = v_avg * (v_t - r * h_ref) / (v_avg - r * h_ref)    (12)

and by substituting v_c of the equation (10) with the equation (12) the height of the object of interest may be estimated on basis of the position of the upper boundary of the object of interest v_t as

h_c = r * h_ref * (v_avg - v_t) / (v_avg - r * h_ref)    (13)
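The following sketch applies the equation (13) and cross-checks it against the forward model of the equations (10) and (11); all parameter values are hypothetical:

```python
# Sketch of equation (13): estimating the object height from the position
# of its upper boundary v_t when the lower boundary is occluded. r is the
# scaling factor, h_ref the second reference size at the origin, v_avg the
# horizon level; all values are hypothetical.

def height_from_top(v_t, r, h_ref, v_avg):
    return r * h_ref * (v_avg - v_t) / (v_avg - r * h_ref)

# Cross-check against the forward model of equations (10) and (11):
r, h_ref, v_avg = 1.5, 120.0, 480.0
v_c = 200.0
h_c = r * h_ref * (v_avg - v_c) / v_avg        # equation (10)
v_t = h_c + v_c                                # equation (11)
print(height_from_top(v_t, r, h_ref, v_avg))   # recovers h_c = 105.0
```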
The estimated size of the object of interest may be used for example as a size representative of the object of interest in the image plane in further processing of data related to the object especially in case where the object of interest is not depicted in full in the current image e.g. due to part of the object of interest falling outside of the image or due to the view to the object of interest being partially or fully obstructed by another object or element depicted in the image. An example of such further processing is estimation of one or more boundaries of an object of interest in the current image when the object is not fully depicted in the current image to facilitate tracking of the object and/or prediction of the position of the object in a subsequent image of the sequence.
As another example, the estimated size of the object of interest in the image plane - together with a respective observed size of the object of interest in the image plane and a respective reference size that may be obtained by applying the predetermined mapping function - may be used to update the scaling function. An example of such use is to use a difference between the estimated size and the observed size as a measure that may indicate a need to update the scaling function. As an example, a difference between the estimated size and the observed size exceeding a predetermined threshold, either in a single image or in a number of (consecutive) images of the sequence, may be used as a trigger to update the scaling function. Consequently, in response to said difference exceeding a predetermined threshold the scaling function may be updated or even re-determined on basis of the observed size of the object and the respective reference size. As already referred to in the foregoing, using the estimated size of the object of interest in an image to update the scaling function may be limited to take place only on basis of images in which the object of interest is depicted in full.

The object estimation unit 303, or the apparatus 300 in general, may be further configured to output the estimated size of the object of interest in the current image. The object estimation unit 303 may be configured to provide the estimated size to another processing unit within or outside the apparatus 300, for example, to facilitate further analysis and/or processing of images of the sequence of images, to facilitate tracking of an object in images of the sequence of images, etc. Alternatively or additionally, the object estimation unit 303 may be configured to store information indicating the estimated size of the object of interest in a memory at the apparatus 300 or at another apparatus. The information indicating the estimated size of the object of interest may be stored, for example, as part of the data record comprising information on the object of interest.
In the following, a detailed description on exemplifying determination of a mapping function is provided. In this regard, the apparatus 300 may further comprise a mapping function determination unit 305 operatively coupled to the image analysis unit 301 and/or to the object estimation unit 303. The mapping function determination unit 305 may be also referred to as a mapping function unit, a mapping function determiner, etc.
The mapping function determination unit 305 may be configured, in order to enable determination of a mapping function, to obtain information indicating positions and sizes of two or more objects in an image plane in one or more images of the sequence of images, wherein said sizes of two or more objects in the image plane correspond to a real-world object having a first size. Said two or more objects may depict a single real-world object of the first size or two or more real-world objects of similar or essentially similar size, said two or more real-world objects hence having a size matching or essentially matching the first size. Furthermore, said two or more objects may comprise real-world objects of different size, for example an object or objects having a first size and an object or objects having a second size, where the size of the second object as depicted in the image plane is scaled with a suitable scaling factor such that the scaled size corresponds to the first size. In particular, for the purpose of correct determination of the mapping function, the mapping function determination unit 305 may be configured to consider information indicating positions and sizes of two or more objects in an image plane in one or more images of the sequence of images wherein said two or more objects are depicted in full, i.e. without being partially outside the image and without being partially or fully overlapping with, e.g. obstructed by, another object or element depicted in the image.
In order to enable determination of a mapping between a position of an object in the image plane and a size of the object as depicted in the image plane on basis of observed positions and sizes in the image plane, at least two position - size pairs are needed. Having more than two observed position - size pairs improves the accuracy of mapping, thereby improving the reliability of the estimate of a position in the image plane representing the reference level. Typically, the higher the number of observed position - size pairs, the better the reliability of mapping. The observations may originate from a single real-world object depicted in the image plane in two or more images of the sequence, or the observations may originate from two or more real-world objects of the same, similar or essentially similar size depicted in the image plane in one or more images of the sequence. Moreover, the observations may originate from two or more real-world objects of different size, e.g. a first size and a second size, depicted in the image plane in one or more images of the sequence, wherein the sizes of the objects in the image plane depicting the real-world object having the second size are scaled by a scaling factor indicative of the ratio between the first and second sizes. The set of images of the sequence of images applied in determination of a mapping between a position of an object in the image plane and a size of the object as depicted in the image plane on basis of observed positions and sizes in the image plane may comprise a predetermined number of observations or at least a predetermined number of observations in order to ensure a reliable enough estimate. This set of images may, consequently, comprise a subset of images of the sequence in which a real-world object of given size is depicted or all images of the sequence in which the real-world object of given size is depicted.
The mapping function determination unit 305 may be configured to obtain information indicating positions and sizes of two or more objects in the image plane depicting the same real-world object in two or more images of the sequence of images. In other words, the two or more images depict a single real-world object moving within the field of view represented by the images of the sequence and, consequently, depict the real-world object in at least two different positions in the image plane.
The mapping function determination unit 305 may be configured to obtain information indicating positions and sizes of two or more objects in the image plane depicting two or more real-world objects of essentially identical size in one or more images of the sequence of images. In other words, the one or more images depict two or more real-world objects of essentially identical size within the field of view represented by the images of the sequence and, consequently, depict a real-world object of essentially identical size in at least two different positions in the image plane.
Information indicating or identifying the two or more objects in the image plane as depicting two or more real-world objects of essentially identical size may be obtained for example as input from a user via a suitable user interface, e.g. by the user indicating the two or more objects in the image plane that are considered to represent real-world objects of essentially identical size. As another example, information indicating or identifying the two or more objects in the image plane as depicting two or more real-world objects of essentially identical size may be obtained by analysis of image data of an image indicating two objects at a similar distance from a reference level in the image plane exhibiting essentially similar size. In case the reference level is assumed to be a level that is in parallel to the u axis of the image plane, there is no need to have an indication of the position of the reference level but it is sufficient to identify two objects of essentially identical size in the image plane at essentially the same position in the direction of the v axis of the image plane.
The mapping function determination unit 305 may be configured to obtain information indicating positions and sizes of two or more objects in the image plane depicting two or more real-world objects having different sizes. In particular, the two or more objects may comprise a first object having a first size in the real-world and a second object having a second size in the real-world, wherein the information indicating size of the second object in the image plane is scaled, e.g. multiplied, by a scaling factor indicative of the ratio between the first size and the second size. The scaling converts the size of the second object as observed in the image plane in such a way that it corresponds to a size the first object would have in the current position of the second object, hence enabling determination of the mapping between a position of an object in the image plane and a size of the object as depicted in the image plane on basis of observed positions and sizes of real-world objects of different size.
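The following sketch illustrates such pooling of observations from two real-world object sizes before the fit; the real-world sizes and (position, size) pairs are hypothetical:

```python
# Sketch of pooling observations from two real-world object sizes: sizes
# observed for the second object are rescaled so that they correspond to
# the first object's size. All values are hypothetical.

first_size, second_size = 1.8, 1.2          # real-world sizes (e.g. metres)
scale = first_size / second_size            # ratio between the sizes

obs_first = [(40.0, 110.0), (200.0, 70.0)]   # (v_b, h_v) pairs, first object
obs_second = [(120.0, 60.0), (310.0, 28.0)]  # (v_b, h_v) pairs, second object

pooled = obs_first + [(v, h * scale) for v, h in obs_second]
print(pooled)  # all pairs now refer to an object of the first size
```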
The terms essentially similar size and essentially identical size as used herein refer to two - or several - real-world objects having sizes that differ by a few percent at most. While the actual tolerance to deviation in size of the two - or several - real-world objects considered to represent an identical size depends on the distance of the real-world object from the focal point of the imaging device, a difference in size of up to 5 percent does not typically unduly degrade the accuracy of the mapping between a position in the image plane and a size of an object as depicted in the image plane. In general, for real-world objects further away from the focal point of the imaging device a larger difference in size may be tolerated without unduly affecting the accuracy of the mapping.
Similar considerations also apply to a size of a single real-world object that may exhibit subtle changes in size as depicted in the image plane even when the real-world object does not move in relation to the imaging device. An example of such real-world object is a person moving or standing within the field of view of the imaging device, where the subtle changes in size as depicted in the image plane may occur e.g. due to change in posture, change in orientation with respect to the image plane, etc.
The mapping function determination unit 305 may be configured to obtain information indicating a position of an object in the image plane and/or the size of the object for example by performing an analysis of image data of a number of images of the sequence of images in order to identify an object of predetermined characteristics, its position in the image plane and its size in the image plane. Image analysis techniques for detecting and identifying an object of predetermined characteristics in an image known in the art may be used for this purpose. The output of such analysis may comprise indication of pixel positions in the image plane indicating a position of the object in the image plane and/or indication of the size of the object.
Alternatively, the mapping function determination unit 305 may be configured to receive information indicating a position of an object in the image plane and/or the size of the object by receiving an indication of a pixel position or pixel positions of the image plane indicating a position of the object in the image plane and/or an indication of the size of the object. Such information may be received, for example, from another processing unit of the apparatus 300 or from a processing unit outside the apparatus 300, such processing unit configured to apply image analysis in order to determine a presence of an object of predetermined characteristics and a position and size thereof in the image plane. As another alternative, the information indicating a position and a size of an object in the image plane may be received, for example, based on input from a user. The user may indicate an object of interest in an image via a suitable user interface (such as display & pointing device, a touchscreen, etc.), for example by indicating the lower and upper boundaries of the object in the image plane and/or the left and right boundaries of the object in the image plane. As a particular further example, the user may be involved in initial detection of an object, whereas the mapping function determination unit 305 may be configured to track the object indicated by the user in the subsequent (and/or preceding) images of the sequence of images.
The information indicating a position of an object in the image plane may comprise, for example, a position indicating a lower boundary of the object in the image plane and/or a position indicating an upper boundary of the object in the image plane. Additionally or alternatively, the information indicating a position of an object may comprise for example a position indicating a left boundary of the object and/or a position indicating a right boundary of the object, as described hereinbefore. The information indicating a size of an object in the image plane may comprise, for example, a height of the object in the image plane and/or a width of the object in the image, as described hereinbefore. The height and/or the width in the image plane may be expressed e.g. as number of pixel positions.
The mapping function determination unit 305 is configured to determine a mapping between a position of an object in the image plane and a size of the object in the image plane on basis of said positions and sizes of the objects in the image plane in said one or more images. In particular, the mapping function determination unit 305 may be configured to determine such mapping for a real-world object of the first size, depicted in the image plane as an object of different size in one or more images of a sequence of images, where the size of the object in the image plane varies in dependence of a distance of the real-world object from the focal point of the imaging device used to capture the sequence of images.
The mapping may be determined as a function taking a position of the object in the image plane as an input argument and providing a corresponding size in the image plane as an output argument. A respective inverse function may also be determined, hence taking a size of an object in the image plane as an input argument and providing a corresponding position of the object in the image plane as an output argument. The position(s) and size(s) may be expressed as described hereinbefore.

The mapping function determination unit 305 may be configured to determine the mapping between a position of an object in the image plane and a size of the object in the image plane as a linear function. Any suitable linear model may be employed. As an example, the mapping function determination unit 305 may be configured to apply a function of the form indicated by the equation (7) hereinbefore.
As an exemplifying application of the function of the form indicated by the equation (7), the parameter h_v may represent a size of the object as a height of an object in the image plane, the parameter v_b may represent a position of the object as a position of a lower boundary of the object in the image plane, while a and b represent mapping parameters to be determined. The mapping may be determined for example using a least squares fit to an equation system comprising a number of equations of the form indicated by the equation (7), each equation of the system representing a pair of an observed position of a lower boundary of an object v_bi and a corresponding observed height of the object h_vi in the image plane. Consequently, the fitting involves determining the parameters a and b such that the overall error in the equation system (14) is minimized (by using methods known in the art):
h_v1 = a * v_b1 + b
h_v2 = a * v_b2 + b
...
h_vN = a * v_bN + b    (14)
Figure 4 illustrates the principle of linear fitting according to the equations (7) and (14) by an example. The black dots represent observed pairs of position of a lower boundary of an object in the image plane and the respective height of the object in the image plane, in a coordinate system where the position of an object in the image plane is indicated along the v axis and the size of the object is indicated along the h axis, which may also be referred to as a 'size axis'. Note that the observed positions and sizes are explicitly indicated only for some of the observed pairs for clarity of illustration. The line 402 crossing the v axis at v_h = -b/a and crossing the h axis at the point h_ref = b illustrates the fitted line providing a minimum error in consideration of the observed position - height pairs. In particular, v_h indicates an estimate of a position (along the v axis of the image plane) representing a level where the height of the object is zero, whereas h_ref indicates an estimated height of the object at the bottom of the image.
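The following sketch fits the parameters a and b of the equation system (14) in the least-squares sense using NumPy; the observed position - height pairs are hypothetical:

```python
import numpy as np

# Sketch of the least-squares fit of equation system (14): observed pairs
# of lower-boundary position v_bi and height h_vi, fitted to the linear
# model h_v = a*v_b + b. The observations below are hypothetical.

v_b = np.array([40.0, 120.0, 200.0, 310.0])    # observed lower boundaries
h_v = np.array([110.0, 90.0, 70.0, 42.0])      # observed heights (pixels)

A = np.column_stack([v_b, np.ones_like(v_b)])  # design matrix for a, b
(a, b), *_ = np.linalg.lstsq(A, h_v, rcond=None)

v_h = -b / a   # estimated horizon level, where the height maps to zero
h_ref = b      # estimated height at the origin of the image plane, v_b = 0
print(a, b, v_h, h_ref)
```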
The exemplifying mapping function illustrated by the equations (7) and (14) may be modified to employ a parameter different from the observed height to indicate a size of the object in the image plane and/or a parameter different from the observed position of a lower boundary of the object to indicate a position of the object in the image plane. As an example of such a modification, the exemplifying process of determining the mapping function may be modified by replacing the height of the object h_v in equations (7) and (14) by a width of the object (w_v, w_vi) to represent a size of the object in the image plane and/or the position of the lower boundary of the object v_b in equations (7) and (14) by the position of an upper boundary of the object (v_t, v_ti) to represent a position of the object in the image plane.
Instead of applying a linear function, the mapping may be determined by using a parabolic function or a 'skewed' parabolic function, i.e. a second order function. Consequently, the mapping between a position of an object in the image plane and a size of the object in the image plane is determined using a parabolic fit. As an example of a parabolic fit, one may consider determination of the mapping on basis of observed positions of the lower and upper boundaries of an object in the image plane, v_b and v_t, respectively. These positions may be expressed as

v_b = (f*s*z - c*f*y_c) / (c*z + s*y_c)    (15)

and

v_t = (c*f*y - c*f*y_c + f*s*z) / (c*z - s*y + s*y_c)    (16)

where s = sin θ_x and c = cos θ_x. The equations (15) and (16) enable solving the projected object height h_v in the image plane (in pixels), as indicated by equations (17) and (18), respectively.

h_v = (-c^2*f*v_b*y + c*f^2*s*y - c*s*v_b^2*y + f*s^2*v_b*y) / (c^2*f*y_c + c*s*v_b*y - f*s^2*y + f*s^2*y_c)    (17)

h_v = y*(c^2*f*v_t - c*f^2*s + c*s*v_t^2 - f*s^2*v_t) / (c^2*f*y - c^2*f*y_c + c*s*v_t*y - f*s^2*y_c)    (18)
The equations (17) and (18) essentially provide a mapping between a position of an object in the image plane, expressed by the position of the lower boundary of the object v_b or the position of the upper boundary of the object v_t, and a size of the object, in the form of 'skewed' parabolic curves.
The equations (17) and (18) may be written as

h_v = (A*v_b^2 + B*v_b + C) / (A*v_b + D)    (19)

h_v = (A*v_t^2 + B*v_t + C) / (-A*v_t + E)    (20)
Thus, the mapping may be determined for example using a least squares fit to an equation system comprising equations of the form indicated by the equations (19) and/or (20), each equation of the system representing a pair of an observed position of a lower or upper boundary of an object v_bi or v_ti, respectively, and a corresponding observed height of the object h_vi in the image plane. Consequently, the fitting involves determining the parameters A, B and C together with D and/or E such that the overall error in the equation system is minimized (by using methods known in the art).
Consequently, consider the case where the reference level of interest is a horizon level in the image plane, where the height of an object, representing the reference size, can be assumed to be zero. Hence, if the projected object height h_v in the image plane in the equations (19) and/or (20) is set to zero, once the mapping parameters A, B and C together with D and/or E have been estimated, the horizon level may be determined as

v_h = (-B ± sqrt(B^2 - 4*A*C)) / (2*A)    (21)
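Assuming fitted numerator coefficients A, B and C, the horizon candidates are the roots of the quadratic, as in the following sketch with hypothetical coefficients:

```python
import math

# Sketch of equation (21): with a parabolic model, the horizon is where
# the fitted size expression goes to zero, i.e. a root of
# A*v^2 + B*v + C = 0. The coefficients below are hypothetical; the root
# lying inside or nearest the image area would be chosen in practice.

A, B, C = -0.001, 0.23, 6.0
disc = B * B - 4.0 * A * C
roots = [(-B + math.sqrt(disc)) / (2.0 * A),
         (-B - math.sqrt(disc)) / (2.0 * A)]
print(roots)
```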
While a parabolic or a 'skewed' parabolic function may be considered to provide a model that is theoretically more accurate than a linear one, it is more sensitive to errors in observed positions and sizes in the image plane and also requires slightly more complex computation. Hence, a parabolic or a 'skewed' parabolic function may not be applicable to all scenarios. Furthermore, a mapping function of any order and any kind may be employed without departing from the scope of the present invention.
The mapping function determination unit 305 may be further configured to use the mapping to determine an estimate of a position representing a reference level in the image plane in said sequence of images as a position in the image plane where a size of an object maps to a predetermined reference size. Alternatively, the apparatus 300 may further comprise a reference level determination unit 307 configured to use the mapping function to determine an estimate of a position representing a reference level in the image plane. Such predetermined reference size is preferably zero, or a size that maps to zero in consideration of the available image resolution. Consequently, the reference level represents a horizon in the image plane. An estimate of a position or level representing a horizon in the image plane may be useful for example in determination of parameters associated with the imaging device employed to capture the sequence of images and its position and/or orientation with respect to the real-world. As another example, an estimate of a position or level representing a horizon in the image plane may be useful for image analysis, in particular in analysis of objects, their positions and changes thereof in images of the sequence of images.
In a scenario where the plane in the field of view of the imaging device is not a horizontal one but an upward or a downward slope of essentially constant ascent/descent with respect to the image plane, the reference level determined on basis of a position in the image plane where a size of an object maps to a predetermined reference size may not represent a 'real' horizon in the image plane but rather a virtual horizon with respect to the non-horizontal plane in the field of view of the imaging device. Alternatively, instead of using a zero-size to determine an estimate of a position of a horizon in the image plane, a non-zero reference size may be used to determine a reference level different from the horizon level.
As an example, an estimate of a position representing a reference level in the image plane in images of the sequence of images may be expressed, i.e. determined, as a distance from a predetermined reference point in the direction of the v axis of the image plane. As an example, the reference level may be expressed as a distance in number of pixel positions from the origin of the image plane, thereby directly indicating the v axis coordinate vh of the image plane estimating the position of the reference level, as illustrated by an example in Figure 4.
As another example, an estimate of a position representing a reference level in the image plane may be expressed as an angle corresponding to a slope of the mapping function (or the inverse mapping function) together with a second reference size. For example in case of a linear mapping on basis of a function according to the equation (7), a slope of the mapping function may be determined on basis of the parameter a. The corresponding angle φ, which is the angle between the v axis of the example of Figure 4 and the fitted line 402 representing the mapping function, may be determined as
φ = arctan(-1/a)    (22)
The angle φ may be used together with a second reference size h_ref, which may be for example the (estimated) height of the object at a predetermined position of the image, for example at the origin of the image plane or at the bottom of the image, or the height of the object at any other suitable position in the image plane, to indicate an estimate of a position representing the reference level in the image plane. In case of a linear mapping on basis of a function according to the equation (7), the (estimated) height of the object at the origin of the image plane can be rather conveniently obtained by setting the position of the object in the image plane v_b in the equation (7) to zero, resulting in the second reference height h_ref = b as the second reference size. Consequently, in case the reference level of interest is a horizon level in the image plane, an estimate of a position representing the horizon v_h may be computed as

v_h = h_ref * tan φ.    (23)

In the foregoing, the angle between the fitted line 402 and the h axis (i.e. the 'size axis') was used as a parameter descriptive of the estimate of a position representing the reference level. As another example, one may use the angle between the fitted line 402 and the v axis, i.e. β = arctan(h_ref / v_h) = arctan(-a), together with h_ref as the second reference size. Consequently, in case the reference level of interest is a horizon level in the image plane, this may be estimated by v_h = h_ref / tan β.
In case a size parameter different from the (observed) height of a depicted object in the image plane and/or a position parameter different from the (observed) position of a lower boundary of the depicted object in the image plane are employed to determine the mapping, similar considerations with respect to expressing, or determining, the estimate of a position representing a reference level apply.
Determination of an estimate of a position representing a reference level in the image plane in images of the sequence of images described hereinbefore may be applied to determine a single estimate of the reference level position in the image plane. Consequently, the mapping function unit 305, or the reference level determination unit 307, may be configured to determine a final, or refined, estimate of a position representing a reference level in the image plane on basis of a single estimate of a position representing a reference level.
While the refined estimate of a position representing the reference level may be reliably determined based on a single estimate, the accuracy of the refined estimate may be improved by determining a number of (initial) estimates of a position representing the reference level in the image plane and by deriving the refined estimate therefrom. Hence, the mapping function unit 305, or the reference level determination unit 307, may be configured to determine the refined estimate of a position representing the reference level on basis of one or more (initial) estimates of a position representing the reference level. The size of the real-world object, referred to hereinbefore also as the first size, used as a basis for determination of the number of (initial) estimates need not be the same, but a refined estimate of a position representing the reference level in the image plane may be determined on basis of a number of (initial) estimates that are based on real-world objects of different size. As an example, the mapping function determination unit 305 may be configured to determine the refined estimate as an average of two or more estimates of a position representing the reference level in the image plane, for example as an average of two or more estimates of a v axis coordinate v_h in the image plane estimating the position of the reference level or as an average of two or more estimates of the angle φ indicating the position of the reference level together with the second reference size h_ref.
While averaging of a number of v axis coordinates v_h indicating an estimated position of the reference level in the image plane may be realized in a straightforward manner, some further processing is required for averaging of a number of angles φ indicating the position of the reference level together with the second reference size h_ref. In particular, assuming there is an initial estimate of the angle φ and the second reference size h_ref, one may determine another mapping according to the equations (7) and (14) to find other estimates of the parameters a and b, denoted as a_1 and b_1, based on which one may determine another pair of parameters v_h,1 = -b_1/a_1 and φ_1 = arctan(-1/a_1) estimating the position of the reference level in the image plane. In order to combine the other estimate with the initial estimate, one may use the other estimate of the position of the reference level v_h,1, which in this example is the estimated horizon level in the image plane, to find the adjusted angle as φ'_1 = arctan(v_h,1 / h_ref). The adjusted angle compensates for the difference between the sizes of the real-world objects - and hence the corresponding sizes thereof as depicted on the image plane - such that a combined angle φ_avg = (φ_0 + φ'_1) / 2 may be computed as the arithmetic mean, enabling computation of the combined estimate of the horizon level as v_avg = h_ref * tan φ_avg. Any possible further estimates of the parameters a and b may be incorporated into the combined estimate in a similar manner as the other estimate discussed in the foregoing.
The average may be an arithmetic mean or, alternatively, a weighted average may be employed. The weighting may involve making use of the fitting error that may be derived as part of a least squares fit applied to a group of equations according to the equations (7) and (14), for example such that a given (initial) estimate of a position representing the reference level is multiplied by a weight that has a value increasing with decreasing value of the respective fitting error.

The operations, procedures and/or functions assigned to the image analysis unit 301 and the object estimation unit 303, as well as the operations, procedures and/or functions assigned to the mapping function determination unit 305 and the reference level determination unit 307 possibly comprised in the apparatus 300, described hereinbefore, may be divided between the units of the apparatus 300 in a different manner. Moreover, the apparatus 300 may comprise further units that may be configured to perform some of the operations, procedures and/or functions assigned to the above-mentioned processing units.
On the other hand, the operations, procedures and/or functions assigned to the image analysis unit 301 and the object estimation unit 303, as well as the operations, procedures and/or functions assigned to the mapping function determination unit 305 and the reference level determination unit 307 possibly comprised in the apparatus 300, may be assigned to a single processing unit within the apparatus 300 instead.

In particular, in accordance with an embodiment of the invention, the apparatus 300 may comprise means for determining a first position on basis of information indicating a position of the object in the image plane, means for using a predetermined mapping function configured to determine a size of a reference object in the image plane on basis of a position of the object in the image plane to determine a reference size in the first position, and means for estimating the size of the object in the first position on basis of the reference size in the first position and a scaling function.
The operations, procedures and/or functions described hereinbefore in context of the apparatus 300 may also be expressed as steps of a method implementing the corresponding operation, procedure and/or function.
As an example, a method 500 in accordance with an embodiment of the invention is illustrated in Figure 5. The method 500 may be arranged to estimate a size of an object in an image plane in an image of a sequence of images. The method 500 comprises determining a first position on basis of information indicating a position of the object in the image plane, as indicated in step 510. The method 500 further comprises using a predetermined mapping function configured to determine a size of a reference object in the image plane on basis of a position of the object in the image plane to determine a reference size in the first position, as indicated in step 520, and estimating the size of the object in the first position on basis of the reference size in the first position and a scaling function, as indicated in step 530.
The apparatus 300 may be implemented as hardware alone, for example as an electric circuit, as a programmable or non-programmable processor, as a microcontroller, etc. The apparatus 300 may have certain aspects implemented as software alone or can be implemented as a combination of hardware and software.
The apparatus 300 may be implemented using instructions that enable hardware functionality, for example, by using executable computer program instructions in a general-purpose or special-purpose processor that may be stored on a computer readable storage medium to be executed by such a processor. The apparatus 300 may further comprise a memory as the computer readable storage medium the processor is configured to read from and write to. The memory may store a computer program comprising computer-executable instructions that control the operation of the apparatus 300 when loaded into the processor. The processor is able to load and execute the computer program by reading the computer-executable instructions from the memory.
While the processor and the memory are hereinbefore referred to as single components, the processor may comprise one or more processors or processing units and the memory may comprise one or more memories or memory units. Consequently, the computer program comprises one or more sequences of one or more instructions that, when executed by the one or more processors, cause an apparatus to perform steps implementing the procedures and/or functions described in context of the apparatus 300. Reference to a processor or a processing unit should not be understood to encompass only programmable processors, but also dedicated circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processors, etc.

Features described in the preceding description may be used in combinations other than the combinations explicitly described. Although functions have been described with reference to certain features, those functions may be performable by other features whether described or not. Although features have been described with reference to certain embodiments, those features may also be present in other embodiments whether described or not.

Claims

1. A method for estimating a size of an object in an image plane in an image of a sequence of images, the method comprising determining a first position on basis of information indicating a position of the object in the image plane, using a predetermined mapping function configured to determine a size of a reference object in the image plane on basis of a position of an object in the image plane to determine a reference size in the first position, and estimating the size of the object in the first position on basis of the reference size in the first position and a scaling function.
2. A method according to claim 1, wherein a position of an object in the image plane is determined by a position of a lower boundary of an object or by a position of an upper boundary of an object.
3. A method according to claim 1 or 2, wherein a size of an object in the image plane is determined by a height of an object or by a width of an object.
4. A method according to any of claims 1 to 3, wherein said scaling function comprises determining the size of the object by multiplying the reference size in the first position by a scaling factor indicative of the ratio between the size of the object and the size of the reference object.
5. A method according to claim 4, wherein the scaling factor is determined as a ratio between an observed size of the object in a second position in the image plane and the size of the reference object in respective position in the image plane.
6. A method according to claim 4, wherein the scaling factor is determined as an average of ratios between an observed size of the object and the respective size of the reference object in two or more positions in the image plane.
7. A method according to any of claims 1 to 6, further comprising obtaining an observed size of the object in the image plane in the first position, and updating the scaling function on basis of the observed size of the object in the image plane and the reference size in the first position.
8. A method according to any of claims 1 to 7, wherein the predetermined mapping function is configured to determine a size of the reference object in the image plane on basis of a distance of the position of the object from a reference level in the image plane and a second reference size indicating a size of the reference object in a second reference position.
9. A method according to claim 8, wherein the reference level is an estimate of a horizon level in the image plane.
10. A method according to claim 8 or 9, wherein the second reference position is the origin of the image plane.
11. A method according to claim 9 or 10, wherein the predetermined mapping function is configured to determine a size of a reference object h_c in the image plane by using the formula

h_c = h_ref * (v_avg - v_c) / (v_avg - v_ref)

where h_ref represents the second reference size, v_c represents the position of a lower boundary of the object, v_avg represents the position of the horizon level, and v_ref represents the second reference position.
12. A method according to any of claims 1 to 11, wherein the predetermined mapping function is based on a linear function of the form h_v = a * v_b + b, where h_v represents a size of an object in the image plane, v_b represents a position of the object in the image plane, and a and b represent mapping parameters.
13. A method according to claim 12, wherein the estimate of a position representing the reference level is determined as an angle corresponding to the slope determined by the parameter a and the second reference size.
14. A computer program for estimating a size of an object in an image plane in an image of a sequence of images, the computer program comprising one or more sequences of one or more instructions which, when executed by one or more processors, cause an apparatus to at least perform the method of any of claims 1 to 13.
15. An apparatus for estimating a size of an object in an image plane in an image of a sequence of images, the apparatus comprising an image analyzer configured to determine a first position on basis of information indicating a position of the object in the image plane, and an object estimator configured to use a predetermined mapping function configured to determine a size of a reference object in the image plane on basis of a position of an object in the image plane to determine a reference size in the first position, and estimate the size of the object in the first position on basis of the reference size in the first position and a scaling function.
16. An apparatus according to claim 15, wherein a position of an object in the image plane is determined by a position of a lower boundary of an object or by a position of an upper boundary of an object.
17. An apparatus according to claim 15 or 16, wherein a size of an object in the image plane is determined by a height of an object or by a width of an object.
18. An apparatus according to any of claims 15 to 17, wherein said scaling function comprises determining the size of the object by multiplying the reference size in the first position by a scaling factor indicative of the ratio between the size of the object and the size of the reference object.
19. An apparatus according to claim 18, wherein the scaling factor is determined as a ratio between an observed size of the object in a second position in the image plane and the size of the reference object in respective position in the image plane.
20. An apparatus according to claim 18, wherein the scaling factor is determined as an average of ratios between an observed size of the object and the respective size of the reference object in two or more positions in the image plane.
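Claims 18 to 20 mirror the scaling behaviour on the apparatus side. Below is a short sketch of the two determinations of the scaling factor (single-position ratio in claim 19, average of ratios in claim 20) together with the multiplication of claim 18; function names are illustrative.

```python
# Sketches of the scaling-factor determinations of claims 19 and 20 and the
# size estimate of claim 18. Names are illustrative, not from the patent.
def scaling_factor_single(observed_size, reference_size):
    # Claim 19: ratio between the observed size in a second position and
    # the size of the reference object in the respective position.
    return observed_size / reference_size

def scaling_factor_average(observed_sizes, reference_sizes):
    # Claim 20: average of the ratios over two or more positions.
    ratios = [obs / ref for obs, ref in zip(observed_sizes, reference_sizes)]
    return sum(ratios) / len(ratios)

def estimate_size(reference_size, scaling_factor):
    # Claim 18: estimated size = reference size in the first position
    # multiplied by the scaling factor.
    return scaling_factor * reference_size
```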
21. An apparatus according to any of claims 15 to 20, wherein the image analyzer is further configured to obtain an observed size of the object in the image plane in the first position, and wherein the object estimator is further configured to update the scaling function on basis of the observed size of the object in the image plane and the reference size in the first position.
22. An apparatus according to any of claims 15 to 21, wherein the predetermined mapping function is configured to determine a size of the reference object in the image plane on basis of a distance of the position of the object from a reference level in the image plane and a second reference size indicating a size of the reference object in a second reference position.
23. An apparatus according to claim 22, wherein the reference level is an estimate of a horizon level in the image plane.
24. An apparatus according to claim 22 or 23, wherein the second reference position is the origin of the image plane.
25. An apparatus according to claim 23 or 24, wherein the predetermined mapping function is configured to determine a size of a reference object h_c in the image plane by using the formula

    h_c = h_ref * (v_avg - v_c) / (v_avg - v_ref)

where h_ref represents the second reference size, v_c represents the position of a lower boundary of the object, v_avg represents the position of the horizon level, and v_ref represents the second reference position.
26. An apparatus according to any of claims 15 to 25, wherein the predetermined mapping function is based on a linear function of the form h_v = a * v_b + b, where h_v represents a size of an object in the image plane, v_b represents a position of the object in the image plane, and a and b represent mapping parameters.
27. An apparatus according to claim 26, wherein the estimate of a position representing the reference level is determined as an angle corresponding to the slope determined by the parameter a and the second reference size.
PCT/FI2013/050280 2012-03-14 2013-03-13 A method, an apparatus and a computer program for estimating a size of an object in an image WO2013135964A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20125275 2012-03-14
FI20125275A FI20125275L (en) 2012-03-14 2012-03-14 Method, device and computer program for estimating the height of an object in an image

Publications (1)

Publication Number Publication Date
WO2013135964A1 (en) 2013-09-19

Family

ID=48326327

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2013/050280 WO2013135964A1 (en) 2012-03-14 2013-03-13 A method, an apparatus and a computer program for estimating a size of an object in an image

Country Status (2)

Country Link
FI (1) FI20125275L (en)
WO (1) WO2013135964A1 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050146605A1 (en) * 2000-10-24 2005-07-07 Lipton Alan J. Video surveillance system employing video primitives
US20070024704A1 (en) * 2005-07-26 2007-02-01 Activeye, Inc. Size calibration and mapping in overhead camera view

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RAMA CHELLAPPA ET AL: "Recent advances in age and height estimation from still images and video", AUTOMATIC FACE&GESTURE RECOGNITION AND WORKSHOPS (FG 2011), 2011 IEEE INTERNATIONAL CONFERENCE ON, IEEE, 21 March 2011 (2011-03-21), pages 91 - 96, XP031869370, ISBN: 978-1-4244-9140-7, DOI: 10.1109/FG.2011.5771367 *
SANG-WOOK PARK ET AL: "Real-Time Estimation of Trajectories and Heights of Pedestrians", INFORMATION SCIENCE AND APPLICATIONS (ICISA), 2011 INTERNATIONAL CONFERENCE ON, IEEE, 26 April 2011 (2011-04-26), pages 1 - 8, XP031947682, ISBN: 978-1-4244-9222-0, DOI: 10.1109/ICISA.2011.5772407 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2588441A (en) * 2019-10-24 2021-04-28 Sony Interactive Entertainment Inc Method and system for estimating the geometry of a scene
US11250618B2 (en) 2019-10-24 2022-02-15 Sony Interactive Entertainment Inc. Method and system for estimating the geometry of a scene
GB2588441B (en) * 2019-10-24 2022-10-05 Sony Interactive Entertainment Inc Method and system for estimating the geometry of a scene

Also Published As

Publication number Publication date
FI20125275L (en) 2013-09-15

Similar Documents

Publication Publication Date Title
CN110335316B (en) Depth information-based pose determination method, device, medium and electronic equipment
Menze et al. Object scene flow
US9483703B2 (en) Online coupled camera pose estimation and dense reconstruction from video
US10636168B2 (en) Image processing apparatus, method, and program
US20210232845A1 (en) Information processing apparatus, information processing method, and storage medium
CN109584302B (en) Camera pose optimization method, camera pose optimization device, electronic equipment and computer readable medium
CN111024040A (en) Distance estimation method and apparatus
US20140253679A1 (en) Depth measurement quality enhancement
CN110807350A (en) System and method for visual SLAM for scan matching
CN109961506A (en) A kind of fusion improves the local scene three-dimensional reconstruction method of Census figure
Serafin et al. Using extended measurements and scene merging for efficient and robust point cloud registration
Li et al. Automatic registration of panoramic image sequence and mobile laser scanning data using semantic features
JP7272024B2 (en) Object tracking device, monitoring system and object tracking method
EP2738517A1 (en) System and methods for feature selection and matching
US20170206430A1 (en) Method and system for object detection
US11748998B1 (en) Three-dimensional object estimation using two-dimensional annotations
CN113822996B (en) Pose estimation method and device for robot, electronic device and storage medium
St-Charles et al. Online multimodal video registration based on shape matching
Shi et al. Fusion of a panoramic camera and 2D laser scanner data for constrained bundle adjustment in GPS-denied environments
Müller et al. Robust image registration for fusion
Hoa et al. Efficient determination of disparity map from stereo images with modified sum of absolute differences (SAD) algorithm
Zhang et al. Monocular vision simultaneous localization and mapping using SURF
WO2013135964A1 (en) A method, an apparatus and a computer program for estimating a size of an object in an image
Christiansen et al. Monocular vehicle distance sensor using HOG and Kalman tracking
Li-Chee-Ming et al. Augmenting visp’s 3d model-based tracker with rgb-d slam for 3d pose estimation in indoor environments

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13721361

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13721361

Country of ref document: EP

Kind code of ref document: A1