WO2008107713A1 - Controlled high resolution sub-image capture with time domain multiplexed high speed full field of view reference video stream for image based biometric applications

Info

Publication number
WO2008107713A1
WO2008107713A1 PCT/GB2008/050144 GB2008050144W WO2008107713A1 WO 2008107713 A1 WO2008107713 A1 WO 2008107713A1 GB 2008050144 W GB2008050144 W GB 2008050144W WO 2008107713 A1 WO2008107713 A1 WO 2008107713A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
capture device
image capture
images
interest
Prior art date
Application number
PCT/GB2008/050144
Other languages
French (fr)
Inventor
Thomas Heseltine
Justen Hyde
Original Assignee
Aurora Computer Services Limited
Priority date
Filing date
Publication date
Application filed by Aurora Computer Services Limited
Publication of WO2008107713A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/19 Sensors therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/66 Remote control of cameras or camera parts, e.g. by remote control devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/667 Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources

Definitions

  • the present invention relates to apparatus and methods for capturing high definition, high speed images from object specific targets in a range of biometric applications such as iris recognition and particularly, but not exclusively, for the capture of facial images for facial recognition systems.
  • a method of image processing for object recognition applications comprising the steps of: using an image capture device to acquire a first, relatively low resolution image of a relatively wide field of view containing a plurality of objects; applying a pattern recognition algorithm to the first image so as to identify objects of interest; selecting at least one object of interest from the first image and determining its coordinates within the image; and using the determined coordinates to control either the image capture device or an additional image capture device so as to obtain a sequence of second, relatively high resolution images of a relatively narrow field of view containing the at least one object of interest.
  • The first image is advantageously time multiplexed with the sequence of second images.
  • a single image capture device may be used to acquire both the first and the second images.
  • a first image capture device acquires the first image
  • a second image capture device acquires the second images.
  • the image capture device may be controlled so as to perform digital zoom and optionally digital pan and/or tilt in order to select the relatively narrow field of view.
  • the image capture device may be controlled so as to perform optical zoom and optionally electromechanical pan and/or tilt in order to select the relatively narrow field of view.
  • only the second images are used for object recognition processing.
  • apparatus for image processing for object recognition applications comprising: an image capture device adapted to acquire a first, relatively low resolution image of a relatively wide field of view containing a plurality of objects; means for applying a pattern recognition algorithm to the first image so as to identify objects of interest; means for selecting at least one object of interest from the first image and determining its coordinates within the image; and means for using the determined coordinates to control either the image capture device or an additional image capture device so as to obtain a sequence of second, relatively high resolution images of a relatively narrow field of view containing the at least one object of interest.
  • the apparatus may comprise means for time multiplexing the first image with the sequence of second images.
  • the apparatus may comprise a single image capture device for acquiring both the first and the second images, or may comprise a first image capture device for acquiring the first image, and a second image capture device for acquiring the second images.
  • the image capture device may be controlled so as to perform digital zoom and optionally digital pan and/or tilt in order to select the relatively narrow field of view, or may be controlled so as to perform optical zoom and optionally electromechanical pan and/or tilt in order to select the relatively narrow field of view.
  • the present application describes a technique that acquires good quality images of the facial features of targets in both ideal and non-ideal locations where the target may be stationary or moving and fast image capture is required. By adapting both the imaged area and image resolution the method transmits the minimal data required for good facial recognition.
  • Embodiments of the invention also provide the means to acquire good quality facial images in areas where significant numbers of people are present.
  • Embodiments of the invention remove the requirement for moving parts, filtering, encoding or other image processing at the image source.
  • embodiments of the invention provide excellent or optimal image quality of single or multiple facial areas within a viewing region whilst maintaining a high frame rate. It must also be stated that any image based biometric, regardless of the specifics of the biometric and the light spectrum used, places a similar requirement upon the image capture device.
  • Embodiments of the present invention use dynamic object specific digital pan and zoom, where a down sampled, low resolution, typically VGA high-speed representation of the full high-resolution image is time multiplexed with high resolution sub regions containing facial detail.
  • the data reduction allows multiple sub regions to be transmitted within single frame periods without significant reduction in frame rates.
  • Such images may be derived from 1.3 to 8Mpixel sensors and may rise as camera technology improves.
  • the VGA resolution image is first transmitted to the host facial recognition computer, where facial areas are located by head and/or eye location algorithms running on the host.
  • the coordinates of the target specific area are passed back to the camera.
  • the camera operates digital pan and zoom to send a high-resolution subset of the full image area but transmits only the data containing the areas of the identified targets.
  • Several high-speed transmissions of these areas can be made to optimise the facial position for the image recognition software.
  • This approach provides target area location and high-resolution facial images at frame rates significantly higher than 25fps. This has the potential for capturing and processing crowded areas within the image area at any one time; with suitable lenses this may be achieved at a distance of tens of metres from the camera.
  • CCTV closed circuit TV
  • the pixel resolution of these cameras is often below the normal maximums of 752(H) x 582(V) for PAL and 768(H) x 494(V) for NTSC.
  • mechanical control of camera tilt, pan and zoom can be employed to optimise the image size but the response time of these mechanical systems is often too slow to optimise each image capture [Alessio Del Bue, Dorin Comaniciu, Visvanathan Ramesh, Carlo Regazzoni, "Smart Cameras With Real-Time Video Object Generation". IEEE International Conference Image Processing, (ICIP'02), Rochester, NY, Vol 3, 429 - 432, 2002].
  • the image of the facial area is a small subset of the total available image area allowing target acquisition over a larger area. In this situation the active area is large but the detail is insufficient for good facial recognition.
  • CMOS or CCD imaging technology that currently produces images with resolutions up to 8 megapixels (8Mp) and will continue to rise as imaging technology develops.
  • a typical 5Mp device at present provides a resolution of 2592(H) x 1944(V).
  • the 8Mp versions reduce this further to only 1.1 mm. This is a significant improvement but requires a large increase in data transmission between the camera and the processing computer.
  • the CMOS device produces a 10bit digital value for each pixel, unlike the PAL system, which transmits an analogue representation of the image.
  • standard transmission methods such as USB1 and full speed USB2 only 12Mb/s speeds are available; this determines a frame transfer speed of 6.6 seconds.
  • Coding systems may be employed within the imaging device, for example JPEG or MPEG. This will increase the image transfer rate but requires additional hardware within the camera to provide the coding functions and requires decoding at the host computer. Compression techniques also reduce image quality if high levels of compression are employed; this limits the level of compression available and hence reduces the available frame rate.
  • Embodiments of the present invention use multimode transmission with target specific area transmission to achieve high definition, high-speed transmission of facial images.
  • the implementation requires minimal computation within the video data stream, requiring only the formatting of data to conform to the chosen output format.
  • the embedded controller operating outside the main video data stream is subsequently able to operate at lower speeds and implements the main control functionality.
  • High-speed head and eye location algorithms have been available for some time and are used in the early stage of facial recognition to locate and isolate the head from the rest of the image. The resulting image is then processed for recognition.
  • Embodiments of the present invention use head location algorithms running on the host face recognition-processing computer to locate the position of a face on a low-resolution (typically 640 x 480 pixels) image.
  • This image is derived from a high resolution CMOS or CCD imaging sensor, where the high-resolution image has been down sampled from for example 5Mp to 300Kp.
  • the down sampled image is transmitted via a high-speed digital link to the processing host. If high-speed USB2 is employed, frame rates of around 100fps can be transmitted at VGA resolution; the actual frame rate is determined by the speed of the image capture device and the number of sub frames transmitted per full frame.
  • On receipt of the low-resolution image, the head is located and the coordinates of the head location are transmitted back to the imaging sensor.
  • the coordinates either define the location of a default window size, at which point the entire default window area is transmitted, or alternatively two coordinates are used, determined by the face location algorithm, that define two diagonally opposed corners. In the case of the opposed corners, a variable window size results where the transmitted image area contains minimal data not relating to the facial region of interest. Image transmission rates in this system are therefore increased due to the high level of redundant data removal.
  • the high-resolution subset of the image area is transmitted back to the host processor, allowing full facial recognition to be applied to the high-resolution image area of the facial region of interest. Once the facial area is located the coordinates of the facial area may be recalculated from the high-resolution current image. The updated coordinates may then be retransmitted to the imaging system, allowing tracking of the subject within the total available image area.
  • multiple images can be transmitted within each frame period. This allows time division multiplexing to intersperse full area images with high-resolution facial images within a single frame period.
  • the following example is based on a 640 x 480 pixel reference image with three facial windows. Each facial window is determined from the previous reference image. This allows the system to track and identify the subject in real time with high frame rates.
  • Full reference image 640x480x16bit 4.9Mbit
  • These high frame rate high-resolution images allow multiple images of a subject to be acquired while in motion, providing the opportunity to capture multiple facial images with the optimal facial orientation.
  • the digital interface may be replaced or supplemented by PAL or NTSC analogue transmission.
  • An auxiliary digital communications line is employed to dynamically pass the digital pan, zoom and other commands to the camera system. If the auxiliary communications is in the form of RS485 or RS422, long cable runs may be employed allowing remote mounting of camera systems. However, frame rates in this configuration reduce due to the limited transmission bandwidth.
  • FIGURE 1 shows an image of an eye region of a subject taken with a low resolution image capture device
  • FIGURE 2 shows an image of an eye region of a subject taken with a high resolution image capture device
  • FIGURE 3 shows an image from a high resolution image capture device viewing an area where faces may appear
  • FIGURE 4 shows a facial region selected from the image of Figure 3;
  • FIGURE 5 shows an implementation of an embodiment of the invention
  • FIGURE 6 shows an implementation of another embodiment of the invention.
  • FIGURE 7 shows an implementation of a further embodiment of the invention.
  • Figure 1 shows an image of the eye region of a subject taken with a CCD camera with a resolution of 752 x 582 pixels.
  • the camera has been rotated by 90 degrees to improve vertical resolution and resulting image rotated back through 90 degrees.
  • the subject's facial area occupies approximately 25% of the full image area.
  • Figure 2 shows an image of the same eye region taken with a 3.1 Mp camera with the same total image area. Only the 640 x 480 pixels in the facial area were transmitted, thus reducing the data transmission demand significantly compared to transmission of the full 3.1 Mp image. Dynamic object specific systems only transmit the required data; therefore resolution and picture quality are significantly enhanced for a given data transmission rate, which results in much higher accuracy within the facial recognition system.
  • Figure 3 shows the image from a 3.1 Mp camera viewing an area where faces may appear.
  • the object specific head and eye location identifies the area of the image that holds a valid facial region and passes the coordinates to the camera.
  • the camera selects this region 1 and transmits the high definition version of this image area, this being shown in Figure 4.
  • This process requires the transmission of a 640 pixel x 480 pixel reference image and 224 x 241 pixel facial image.
  • the resulting data is 5.8Mbit at 16bit colour depth.
  • FIG. 5 shows an implementation of the invention such that the high-resolution image is converted to a digital bitmap.
  • a camera system 2 comprises a high resolution image sensor 3 linked to an analog-to-digital converter 4, which in turn is linked to a digital communications interface (DCI) 5.
  • the camera system 2 components are controlled by a control microprocessor 6.
  • Image data is sent to a host computer 7, which includes a DCI 8, a face recognition system 9 and a system 10 for locating head and eye regions in the image.
  • the system 10 is configured to transmit head and eye coordinates back to the control microprocessor 6 of the camera system 2.
  • Sub sampling and windowing is performed by the image sensor 3 under the control of the control microprocessor 6, which in turn derives the windowing control data from the remote host computer 7.
  • Image data is transmitted to the host 7 from the camera 2 via a suitable interface such as USB2 or FireWire.
  • the digital communications may incorporate suitable encoding such as JPEG or MPEG.
  • JPEG (Joint Photographic Experts Group) or MPEG
  • the reference full frame is transmitted first.
  • the host 7 determines the window coordinates, and a sequence of sub frames requested by the host 7. Each sub frame is derived from a new image captured by the sensing device 3 and is therefore not at exactly the same time frame as the reference image.
  • Figure 6 shows an implementation such that the operation is as in Figure 5 with the addition of a full frame memory 11.
  • the full high-resolution image is stored in the memory 11.
  • Down sampling is then performed by reading subsets of the image data stored in the memory 11.
  • Windowing is also performed by reading subsets of stored data.
  • This implementation generates a reference image and captures subset images at the same instant in time.
  • Figure 7 shows an implementation such that the image is transferred to the host 7 via analog video encoders 12, 13 as standard analogue transmission media such as PAL or NTSC (rather than digitally).
  • sub frame transmission speeds remain the same as full frame transmission speeds and must be interposed with the full frame reference image.
  • Image transfer rates in this embodiment are lower than in the digital transmission implementations.
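As a rough check on the example figures quoted in the list above (a 640 x 480 reference image of 4.9Mbit at 16bit, and 5.8Mbit for that reference image plus a 224 x 241 facial window), the payload of one time multiplexed cycle can be tallied as follows. This is only an illustrative sketch: the function names are hypothetical and are not taken from the application, and no transport or protocol overhead is modelled.

```python
from typing import Iterable, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def image_bits(width: int, height: int, bits_per_pixel: int = 16) -> int:
    """Uncompressed size of one image in bits."""
    return width * height * bits_per_pixel


def multiplexed_cycle_bits(reference: Size,
                           windows: Iterable[Size],
                           bits_per_pixel: int = 16) -> int:
    """Bits needed for one reference image plus its facial sub-windows."""
    total = image_bits(reference[0], reference[1], bits_per_pixel)
    for width, height in windows:
        total += image_bits(width, height, bits_per_pixel)
    return total


# Reference image alone, then reference image plus one facial window.
reference_only = multiplexed_cycle_bits((640, 480), [])
with_one_face = multiplexed_cycle_bits((640, 480), [(224, 241)])

print(f"reference only : {reference_only / 1e6:.1f} Mbit")
print(f"with one window: {with_one_face / 1e6:.1f} Mbit")
```

Further windows can be added to the list to see how the cycle payload grows with the number of faces tracked.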

Abstract

There is disclosed a method and apparatus for image processing for object recognition applications. An image capture device is used to acquire a first, relatively low resolution image of a relatively wide field of view containing a plurality of objects. A pattern recognition algorithm is then applied to the first image so as to identify objects of interest, and at least one object of interest is selected from the first image and its coordinates within the image are determined. The determined coordinates are then used to control either the image capture device or an additional image capture device so as to obtain a sequence of second, relatively high resolution images of a relatively narrow field of view containing the at least one object of interest.

Description

CONTROLLED HIGH RESOLUTION SUB-IMAGE CAPTURE WITH TIME DOMAIN
MULTIPLEXED HIGH SPEED FULL FIELD OF VIEW REFERENCE VIDEO STREAM
FOR IMAGE BASED BIOMETRIC APPLICATIONS
The present invention relates to apparatus and methods for capturing high definition, high speed images from object specific targets in a range of biometric applications such as iris recognition and particularly, but not exclusively, for the capture of facial images for facial recognition systems.
BACKGROUND
The performance of software based facial recognition systems is determined by many factors. Enhancements in this technology now mean that many facial parameters are taken into consideration by the software. These include facial geometry, specific identifying marks such as moles, skin colouration and more specifically its variation across the facial area. However, one common factor is the requirement for high-resolution, good quality electronic images to enable the software to analyse these features. This is a relatively simple task to achieve if the subject poses at a specific distance from a high definition camera under controlled lighting conditions. In such a defined posed situation, all of the key elements for good image capture can be controlled. Additionally, the image transfer time from the image capture device to the computer performing the facial recognition task is not critical. In uncontrolled environments the target may be moving and not looking directly at the camera.
BRIEF SUMMARY OF THE DISCLOSURE
According to a first aspect of the present invention, there is provided a method of image processing for object recognition applications, the method comprising the steps of: using an image capture device to acquire a first, relatively low resolution image of a relatively wide field of view containing a plurality of objects; applying a pattern recognition algorithm to the first image so as to identify objects of interest; selecting at least one object of interest from the first image and determining its coordinates within the image; and using the determined coordinates to control either the image capture device or an additional image capture device so as to obtain a sequence of second, relatively high resolution images of a relatively narrow field of view containing the at least one object of interest.
The first image is advantageously time multiplexed with the sequence of second images.
A single image capture device may be used to acquire both the first and the second images. Alternatively, a first image capture device acquires the first image, and a second image capture device acquires the second images.
The image capture device may be controlled so as to perform digital zoom and optionally digital pan and/or tilt in order to select the relatively narrow field of view.
Alternatively, the image capture device may be controlled so as to perform optical zoom and optionally electromechanical pan and/or tilt in order to select the relatively narrow field of view.
Preferably, only the second images are used for object recognition processing.
According to a second aspect of the present invention, there is provided apparatus for image processing for object recognition applications, the apparatus comprising: an image capture device adapted to acquire a first, relatively low resolution image of a relatively wide field of view containing a plurality of objects; means for applying a pattern recognition algorithm to the first image so as to identify objects of interest; means for selecting at least one object of interest from the first image and determining its coordinates within the image; and means for using the determined coordinates to control either the image capture device or an additional image capture device so as to obtain a sequence of second, relatively high resolution images of a relatively narrow field of view containing the at least one object of interest.
The apparatus may comprise means for time multiplexing the first image with the sequence of second images.
The apparatus may comprise a single image capture device for acquiring both the first and the second images, or may comprise a first image capture device for acquiring the first image, and a second image capture device for acquiring the second images.
The image capture device may be controlled so as to perform digital zoom and optionally digital pan and/or tilt in order to select the relatively narrow field of view, or may be controlled so as to perform optical zoom and optionally electromechanical pan and/or tilt in order to select the relatively narrow field of view.
The present application describes a technique that acquires good quality images of the facial features of targets in both ideal and non-ideal locations where the target may be stationary or moving and fast image capture is required. By adapting both the imaged area and image resolution the method transmits the minimal data required for good facial recognition. Embodiments of the invention also provide the means to acquire good quality facial images in areas where significant numbers of people are present. Embodiments of the invention remove the requirement for moving parts, filtering, encoding or other image processing at the image source. As a result, embodiments of the invention provide excellent or optimal image quality of single or multiple facial areas within a viewing region whilst maintaining a high frame rate. It must also be stated that any image based biometric, regardless of the specifics of the biometric and the light spectrum used, places a similar requirement upon the image capture device.
It is true to say that in image based biometric recognition in general, high resolution, noise free images of the area used for analysis are highly beneficial, and that multiple instances of these biometric regions may be present in any image captured using the full viewing cone of the capture device. Iris recognition is an example where the eyes form the object specific target within the full facial image. Analysis is generally constrained by time in any practical application; hence high throughput of relevant image data to a biometric recognition engine is of great importance. For reasons of clarity and brevity, we will continue to address the issue of face recognition, though it must be borne in mind that any image-based biometric could be substituted for the face.
In areas with significant target numbers, the opportunity to locate a large number of faces arises provided each face occupies a small area of the available image. However, this prevents image recognition due to poor image detail. Within densely populated places, a high-resolution camera image could be captured and processed with high-resolution facial detail. Transmission of such images is slow, and a key element of good facial recognition in such dynamic situations is the use of a sequence of closely time-related images. Face recognition systems make a comparison between a previously taken reference image and current images. Because of the comparison process, different pose positions present problems to the recognition system. To minimise or reduce this problem in dynamic situations, a sequence of images taken in quick succession allows the recognition system to select the best pose within the sequence. It is therefore important to maintain high frame rates during image capture.
Embodiments of the present invention use dynamic object specific digital pan and zoom, where a down sampled, low resolution, typically VGA high-speed representation of the full high-resolution image is time multiplexed with high resolution sub regions containing facial detail. The data reduction allows multiple sub regions to be transmitted within single frame periods without significant reduction in frame rates. Such images may be derived from 1.3 to 8Mpixel sensors and may rise as camera technology improves.
The VGA resolution image is first transmitted to the host facial recognition computer, where facial areas are located by head and/or eye location algorithms running on the host. The coordinates of the target specific area are passed back to the camera. The camera operates digital pan and zoom to send a high-resolution subset of the full image area but transmits only the data containing the areas of the identified targets. Several high-speed transmissions of these areas can be made to optimise the facial position for the image recognition software. This approach provides target area location and high-resolution facial images at frame rates significantly higher than 25fps. This has the potential for capturing and processing crowded areas within the image area at any one time; with suitable lenses this may be achieved at a distance of tens of metres from the camera.
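The exchange described above (down sampled reference image to the host, window coordinates back to the camera, high-resolution windows to the host) can be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation of the application: images are plain nested lists, down sampling is simple decimation, windows are given as two diagonally opposed corners, and `locate_faces` is a hypothetical stand-in for the head and/or eye location algorithm running on the host. For simplicity the windows are cut from the same full frame as the reference image.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]
# (x0, y0, x1, y1): two diagonally opposed corners, lower-right exclusive.
Window = Tuple[int, int, int, int]


def downsample(full: Image, step: int) -> Image:
    """Keep every `step`-th pixel in both directions (simple decimation)."""
    return [row[::step] for row in full[::step]]


def to_sensor_coords(window: Window, step: int) -> Window:
    """Scale a window found on the reference image up to sensor pixels."""
    x0, y0, x1, y1 = window
    return (x0 * step, y0 * step, x1 * step, y1 * step)


def crop(full: Image, window: Window) -> Image:
    """Digital pan and zoom: return only the pixels inside the window."""
    x0, y0, x1, y1 = window
    return [row[x0:x1] for row in full[y0:y1]]


def capture_cycle(full: Image, step: int,
                  locate_faces: Callable[[Image], List[Window]]
                  ) -> Tuple[Image, List[Image]]:
    """One cycle: reference image first, then one sub-image per face."""
    reference = downsample(full, step)       # sent to the host
    windows = locate_faces(reference)        # computed on the host
    sub_images = [crop(full, to_sensor_coords(w, step)) for w in windows]
    return reference, sub_images


# Toy 8 x 8 "sensor" whose pixel value encodes its position (row * 8 + col).
sensor = [[r * 8 + c for c in range(8)] for r in range(8)]
reference, faces = capture_cycle(sensor, 2, lambda ref: [(1, 1, 3, 3)])
```

Only the reference image and the cropped windows would cross the link in this scheme; the remainder of the full-resolution frame is never transmitted.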
Many existing facial recognition systems employ standard closed circuit TV (CCTV) cameras based on CCD technology to acquire the image for recognition. In security applications the pixel resolution of these cameras is often below the normal maximums of 752(H) x 582(V) for PAL and 768(H) x 494(V) for NTSC. In non-posed applications, mechanical control of camera tilt, pan and zoom can be employed to optimise the image size but the response time of these mechanical systems is often too slow to optimise each image capture [Alessio Del Bue, Dorin Comaniciu, Visvanathan Ramesh, Carlo Regazzoni, "Smart Cameras With Real-Time Video Object Generation". IEEE International Conference Image Processing, (ICIP'02), Rochester, NY, Vol 3, 429 - 432, 2002].
Alternatively, a compromise may be made where the image of the facial area is a small subset of the total available image area allowing target acquisition over a larger area. In this situation the active area is large but the detail is insufficient for good facial recognition. In PAL applications where some level of posing is possible, the facial image size may be around 50% of the total available image (376 x 291 pixels) and in non-posed applications, facial image sizes of 10% of the total available image (75 x 58 pixels) may be the only ones available. With a resolution of only 58 pixels in the vertical and a head size of 280mm, the geometry of the face can only be measured to a resolution of 280mm/58 = 4.8mm. This results in an uncertainty on the size of features such as eye sockets of 4.8mm x 2 = 9.6mm. This poor geometric resolution limits the accuracy of the geometric component of the facial recognition system. If skin defects such as moles are considered these are generally smaller than normal features and the ability to identify these marks is significantly compromised.
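The geometric resolution arithmetic above can be reproduced with a short sketch. The function name is hypothetical; following the figures in the text, the facial fraction is applied to the linear image dimension (10% of 582 rows gives 58 rows) and the head height is taken as 280mm.

```python
def geometric_resolution_mm(sensor_rows: int,
                            face_fraction: float,
                            head_height_mm: float = 280.0) -> float:
    """Millimetres of head height represented by one pixel row."""
    face_rows = int(sensor_rows * face_fraction)  # whole rows on the face
    return head_height_mm / face_rows


# PAL sensor with 582 rows, face spanning 10% of the image height.
pal_resolution = geometric_resolution_mm(582, 0.10)

# Uncertainty on a feature bounded by two edges, as in the text
# (twice the resolution rounded to one decimal place).
pal_feature_uncertainty = 2 * round(pal_resolution, 1)

print(f"PAL, 10% face: {pal_resolution:.1f} mm per pixel row")
print(f"feature size uncertainty: {pal_feature_uncertainty:.1f} mm")
```

Calling the same function with a larger row count shows how a higher resolution sensor tightens the measurement under otherwise identical conditions.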
An alternative is to use high resolution CMOS or CCD imaging technology that currently produces images with resolutions up to 8 megapixels (8Mp) and will continue to rise as imaging technology develops. A typical 5Mp device at present provides a resolution of 2592(H) x 1944(V). A like-for-like comparison with the aforementioned PAL system gives a 194 pixel resolution in the vertical for an image area of 10% of the total. This results in a geometric resolution of 280mm/194 = 1.4mm under the same conditions. The 8Mp versions reduce this further to only 1.1 mm. This is a significant improvement but requires a large increase in data transmission between the camera and the processing computer.
In the PAL system described, the total number of pixels is 752 x 582 = 438Kp, whilst the CMOS system has a total of 2592 x 1944 = 5Mp, which is an 11-fold increase in the data transmission requirement. In addition, the CMOS device produces a 10 bit digital value for each pixel, unlike the PAL system, which transmits an analogue representation of the image. As transmission standards such as USB are 8 bit or 16 bit orientated, 5Mp x 16 bit = 80 megabit (80Mb) is required to transfer a full frame with 10 bit colour depth. With standard transmission methods such as USB1 and full speed USB2, only 12Mb/s speeds are available; this results in a frame transfer time of around 6.7 seconds. In many applications this is much too slow. Other transmission formats, such as high speed USB2, transmit data at 480Mb/s but this only achieves a frame rate of 6fps. This is significantly slower than the 25fps associated with PAL systems. This problem becomes even more difficult if an 8Mp or higher resolution imaging device is employed.
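The bandwidth argument above can be checked numerically. The sketch below is illustrative only: it uses the nominal 12Mb/s and 480Mb/s link rates quoted in the text and ignores protocol overhead, which would lower the achievable rates further. The helper name `frame_bits` is hypothetical.

```python
def frame_bits(width, height, bits_per_pixel=16):
    """Bits needed for one uncompressed frame, with 10 bit pixels padded to 16."""
    return width * height * bits_per_pixel


PAL_PIXELS = 752 * 582        # about 438Kp
CMOS_PIXELS = 2592 * 1944     # about 5Mp
pixel_ratio = CMOS_PIXELS / PAL_PIXELS

full_frame = frame_bits(2592, 1944)       # about 80Mbit
full_speed_seconds = full_frame / 12e6    # full speed USB2, nominal rate
high_speed_fps = 480e6 / full_frame       # high speed USB2, nominal rate

print(f"pixel ratio: {pixel_ratio:.1f}")
print(f"transfer time at 12Mb/s: {full_speed_seconds:.1f} s")
print(f"frame rate at 480Mb/s: {high_speed_fps:.1f} fps")
```

Using exact pixel counts rather than the rounded 5Mp figure gives values slightly different from the rounded ones in the text, but the conclusion is the same: roughly an 11-fold increase in pixels, several seconds per frame at 12Mb/s, and about 6fps at 480Mb/s.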
Coding systems may be employed within the imaging device, for example JPEG or MPEG. This will increase the image transfer rate but requires additional hardware within the camera to provide the coding functions and requires decoding at the host computer. Compression techniques also reduce image quality if high levels of compression are employed; this limits the level of compression available and hence reduces the available frame rate.
Data reduction techniques described in US 6,829,391, 'Adaptive resolution system and method for providing efficient low bit rate transmission of image data for distributed applications', apply complex filtering to the data, whereupon data in the target area has high resolution whilst data in the surrounding area has a low level of detail. This is based on the log polar mapping technique [Dorin Comaniciu, Fabio Berton, Visvanathan Ramesh, "Adaptive Resolution System for Distributed Surveillance"; Real Time Imaging, Vol 8, No 5, 427-437, 2002 and also Dorin Comaniciu, Visvanathan Ramesh, "Robust Detection and Tracking of Human Faces with an Active Camera"; IEEE Int Workshop on Visual Surveillance, Dublin, Ireland, 11-18, 2000] in US 5,103,306, 'Digital image compression employing a resolution gradient', a technique highly suited to images containing a single face where a single region of interest exists. However, it becomes much more complex when multiple faces, requiring multiple overlaid foveation patterns, exist within a single image. This increase in complexity significantly increases the computational load and raises the required bandwidth of the transmission if downgraded image quality is to be avoided. The alternative is to restrict the method to single target foveation, selecting only single facial images at any time regardless of the actual number of interest regions present in the image.
Embodiments of the present invention use multimode transmission with target specific area transmission to achieve high definition, high-speed transmission of facial images. The implementation requires minimal computation within the video data stream, requiring only the formatting of data to conform to the chosen output format. The embedded controller operating outside the main video data stream is subsequently able to operate at lower speeds and implements the main control functionality. High-speed head and eye location algorithms have been available for some time and are used in the early stage of facial recognition to locate and isolate the head from the rest of the image. The resulting image is then processed for recognition. Embodiments of the present invention use head location algorithms running on the host face recognition-processing computer to locate the position of a face on a low-resolution (typically 640 x 480 pixels) image. This image is derived from a high resolution CMOS or CCD imaging sensor, where the high-resolution image has been down sampled from, for example, 5Mp to 300Kp. The down sampled image is transmitted via a high-speed digital link to the processing host. If high-speed USB2 is employed, frame rates of around 100fps can be transmitted at VGA resolution; the actual frame rate is determined by the speed of the image capture device and the number of sub frames transmitted per full frame. On receipt of the low-resolution image, the head is located and the coordinates of the head location are transmitted back to the imaging sensor. The coordinates either define the location of a default window size, at which point the entire default window area is transmitted, or alternatively two coordinates are used, determined by the face location algorithm, that define two diagonally opposed corners.
In the case of the opposed corners, a variable window size results where the transmitted image area contains minimal data not relating to the facial region of interest. Image transmission rates in this system are therefore increased due to the high level of redundant data removal. The high-resolution subset of the image area is transmitted back to the host processor, allowing full facial recognition to be applied to the high-resolution image area of the facial region of interest. Once the facial area is located the coordinates of the facial area may be recalculated from the high-resolution current image. The updated coordinates may then be retransmitted to the imaging system, allowing tracking of the subject within the total available image area.
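The two ways of specifying a window described above (a default-sized window at a given location, or two diagonally opposed corners) can be sketched as a coordinate mapping from the low-resolution reference image to the full-resolution sensor. This is an illustration under stated assumptions, not the disclosed implementation: the 2592 x 1944 sensor size, the interpretation of the default window location as its centre, and all names (`Window`, `to_sensor`, `window_from_corners`, `window_from_default`) are hypothetical.

```python
from dataclasses import dataclass

REF_SIZE = (640, 480)        # low-resolution reference image (from the text)
SENSOR_SIZE = (2592, 1944)   # assumed 5Mp sensor


@dataclass(frozen=True)
class Window:
    left: int
    top: int
    right: int
    bottom: int


def to_sensor(point, ref_size=REF_SIZE, sensor_size=SENSOR_SIZE):
    """Scale a reference-image point to sensor pixels using integer arithmetic."""
    x, y = point
    return (x * sensor_size[0] // ref_size[0],
            y * sensor_size[1] // ref_size[1])


def window_from_corners(corner_a, corner_b,
                        ref_size=REF_SIZE, sensor_size=SENSOR_SIZE):
    """Variable-size window defined by two diagonally opposed corners."""
    ax, ay = to_sensor(corner_a, ref_size, sensor_size)
    bx, by = to_sensor(corner_b, ref_size, sensor_size)
    return Window(min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


def window_from_default(centre, default_size=(640, 480),
                        ref_size=REF_SIZE, sensor_size=SENSOR_SIZE):
    """Fixed-size window centred on a point (assumed), kept inside the sensor."""
    cx, cy = to_sensor(centre, ref_size, sensor_size)
    width, height = default_size
    left = min(max(cx - width // 2, 0), sensor_size[0] - width)
    top = min(max(cy - height // 2, 0), sensor_size[1] - height)
    return Window(left, top, left + width, top + height)


print(window_from_corners((100, 50), (200, 170)))
print(window_from_default((320, 240)))
```

The corner form sends only the pixels bounding the detected face, whereas the default form trades some redundant data for a simpler single-coordinate command.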
Due to the high data rate transmission available with transmission media such as USB2 and the high level of data reduction within embodiments of this invention, multiple images can be transmitted within each frame period. This allows time division multiplexing to intersperse full area images with high-resolution facial images within a single frame period. The following example is based on a 640 x 480 pixel reference image with three facial windows. Each facial window is determined from the previous reference image. This allows the system to track and identify the subject in real time with high frame rates.
Full reference image: 640 x 480 x 16bit = 4.9Mbit
Facial window 1: 300 x 200 x 16bit = 1Mbit
Facial window 2: 600 x 400 x 16bit = 3.8Mbit
Facial window 3: 700 x 450 x 16bit = 5Mbit
Total transmission required = 14.7Mbit
Data rate available on USB2: 480Mbit/s
Frame rate = 480/14.7 = 32 frames/sec
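The multiplexing budget above can be recomputed for any mix of reference image and facial windows. The following is a minimal sketch, assuming the nominal 480Mbit/s USB2 rate with no protocol overhead and 16 bits per pixel; the variable and function names are hypothetical.

```python
LINK_BITS_PER_S = 480e6   # nominal high speed USB2 rate used in the text
BITS_PER_PIXEL = 16

reference = (640, 480)
facial_windows = [(300, 200), (600, 400), (700, 450)]


def image_bits(size):
    """Uncompressed size in bits of one (width, height) image."""
    width, height = size
    return width * height * BITS_PER_PIXEL


# One multiplexed frame period carries the reference plus every facial window.
total_bits = image_bits(reference) + sum(image_bits(w) for w in facial_windows)
frame_rate = LINK_BITS_PER_S / total_bits

print(f"total per frame period: {total_bits / 1e6:.2f} Mbit")
print(f"achievable frame rate: {int(frame_rate)} frames/sec")
```

Changing the window list shows how quickly the frame rate falls as more or larger facial windows are multiplexed with the reference image.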
These high frame rate high-resolution images allow multiple images of a subject to be acquired while in motion, providing the opportunity to capture multiple facial images with the optimal facial orientation.
In applications where the distance between the camera and the host processor is too large for low cost digital transmission to be implemented, the digital interface may be replaced or supplemented by PAL or NTSC analogue transmission. An auxiliary digital communications line is employed to dynamically pass the digital pan, zoom and other commands to the camera system. If the auxiliary communications is in the form of RS485 or RS422, long cable runs may be employed allowing remote mounting of camera systems. However, frame rates in this configuration reduce due to the limited transmission bandwidth.
Throughout the description and claims of this specification, the words "comprise" and "contain" and variations of the words, for example "comprising" and "comprises", mean "including but not limited to", and are not intended to (and do not) exclude other moieties, additives, components, integers or steps.
Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
Features, integers, characteristics, compounds, chemical moieties or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of the present invention and to show how it may be carried into effect, reference shall now be made by way of example to the accompanying drawings, in which:
FIGURE 1 shows an image of an eye region of a subject taken with a low resolution image capture device;
FIGURE 2 shows an image of an eye region of a subject taken with a high resolution image capture device;
FIGURE 3 shows an image from a high resolution image capture device viewing an area where faces may appear;
FIGURE 4 shows a facial region selected from the image of Figure 3;
FIGURE 5 shows an implementation of an embodiment of the invention;
FIGURE 6 shows an implementation of another embodiment of the invention; and
FIGURE 7 shows an implementation of a further embodiment of the invention.
DETAILED DESCRIPTION
Figure 1 shows an image of the eye region of a subject taken with a CCD camera with a resolution of 752 x 582 pixels. The camera has been rotated by 90 degrees to improve vertical resolution and the resulting image rotated back through 90 degrees. The subject's facial area occupies approximately 25% of the full image area.
Figure 2 shows an image of the same eye region taken with a 3.1Mp camera with the same total image area. Only the 640 x 480 pixels in the facial area were transmitted, thus reducing the data transmission demand significantly compared to transmission of the full 3.1Mp image. Dynamic object specific systems only transmit the required data; therefore resolution and picture quality are significantly enhanced for a given data transmission rate, which results in much higher accuracy within the facial recognition system.
Figure 3 shows the image from a 3.1 Mp camera viewing an area where faces may appear. The object specific head and eye location identifies the area of the image that holds a valid facial region and passes the coordinates to the camera. The camera then selects this region 1 and transmits the high definition version of this image area, this being shown in Figure 4.
This process requires the transmission of a 640 pixel x 480 pixel reference image and 224 x 241 pixel facial image. The resulting data is 5.8Mbit at 16bit colour depth. With 480Mbit/second on high speed USB2 the transmission time is around 12ms.
Figure 5 shows an implementation of the invention such that the high-resolution image is converted to a digital bitmap. A camera system 2 comprises a high resolution image sensor 3 linked to an analog-to-digital converter 4, which in turn is linked to a digital communications interface (DCI) 5. The camera system 2 components are controlled by a control microprocessor 6. Image data is sent to a host computer 7, which includes a DCI 8, a face recognition system 9 and a system 10 for locating head and eye regions in the image. The system 10 is configured to transmit head and eye coordinates back to the control microprocessor 6 of the camera system 2.
Sub sampling and windowing are performed by the image sensor 3 under the control of the control microprocessor 6, which in turn derives the windowing control data from the remote host computer 7. Image data is transmitted to the host 7 from the camera 2 via a suitable interface such as USB2 or FireWire. The digital communications may incorporate suitable encoding such as JPEG or MPEG. In this implementation the reference full frame is transmitted first. The host 7 determines the window coordinates, and a sequence of sub frames is then requested by the host 7. Each sub frame is derived from a new image captured by the sensing device 3 and is therefore not at exactly the same time frame as the reference image.
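The request sequence in this implementation (reference frame first, then one sub frame per window, each from a fresh capture) can be sketched as a single acquisition cycle. This is a toy model only: `FakeCamera`, `acquire_cycle` and `fixed_locator` are hypothetical names, the window tuples are arbitrary example values, and no real camera interface is implied.

```python
class FakeCamera:
    """Hypothetical stand-in for camera system 2; it only counts sensor captures."""

    def __init__(self):
        self.captures = 0

    def reference_frame(self):
        self.captures += 1
        return {"kind": "reference", "capture": self.captures}

    def sub_frame(self, window):
        # Every sub frame comes from a new capture, as in the Figure 5 arrangement.
        self.captures += 1
        return {"kind": "sub", "window": window, "capture": self.captures}


def acquire_cycle(camera, locate_faces):
    """Send the reference frame first, then request one sub frame per located face."""
    reference = camera.reference_frame()
    windows = locate_faces(reference)
    sub_frames = [camera.sub_frame(window) for window in windows]
    return reference, sub_frames


def fixed_locator(reference):
    """Placeholder for the host's head and eye location stage."""
    return [(405, 202, 810, 688), (1000, 300, 1300, 700)]


reference, subs = acquire_cycle(FakeCamera(), fixed_locator)
print(reference["capture"], [s["capture"] for s in subs])
```

The differing capture indices make the timing point explicit: the sub frames lag the reference image by at least one sensor capture each.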
Figure 6 shows an implementation such that the operation is as in Figure 5 with the addition of a full frame memory 11. In this implementation the full high-resolution image is stored in the memory 11. Down sampling is then performed by reading subsets of the image data stored in the memory 11. Windowing is also performed by reading subsets of stored data. This implementation generates a reference image and captures subset images at the same instant in time.
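Reading subsets of a stored frame to obtain both the reference image and the facial windows can be illustrated with plain list slicing. The sketch below assumes simple decimation (keeping every n-th pixel) for down sampling, which the description does not specify; the tiny 16 x 12 frame and the function names are hypothetical and chosen only to keep the example readable.

```python
def make_frame(width, height):
    """Build a synthetic stored frame of 10 bit values (row-major list of rows)."""
    return [[(y * width + x) & 0x3FF for x in range(width)]
            for y in range(height)]


def downsample(frame, step):
    """Reference image: read every step-th pixel of every step-th row."""
    return [row[::step] for row in frame[::step]]


def read_window(frame, left, top, right, bottom):
    """Facial window: read a full-resolution rectangular subset."""
    return [row[left:right] for row in frame[top:bottom]]


stored = make_frame(16, 12)            # stands in for the full frame memory
reference = downsample(stored, 4)      # low-resolution full field of view
face = read_window(stored, 2, 1, 6, 4) # high-resolution sub-image

print(len(reference), len(reference[0]))
print(len(face), len(face[0]), face[0][0])
```

Because both reads come from the same stored frame, the reference image and the sub-image correspond to the same instant, which is the distinguishing property of this implementation.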
Figure 7 shows an implementation such that the image is transferred to the host 7 via analog video encoders 12, 13 as standard analogue transmission media such as PAL or NTSC (rather than digitally). In this implementation, sub frame transmission speeds remain the same as full frame transmission speeds and must be interposed with the full frame reference image. Image transfer rates in this embodiment are lower than in the digital transmission implementations.

CLAIMS:
1. A method of image processing for object recognition applications, the method comprising the steps of: using an image capture device to acquire a first, relatively low resolution image of a relatively wide field of view containing a plurality of objects; applying a pattern recognition algorithm to the first image so as to identify objects of interest; selecting at least one object of interest from the first image and determining its coordinates within the image; and using the determined coordinates to control either the image capture device or an additional image capture device so as to obtain a sequence of second, relatively high resolution images of a relatively narrow field of view containing the at least one object of interest.
2. A method according to claim 1, wherein the first image is time multiplexed with the sequence of second images.
3. A method according to any preceding claim, wherein a single image capture device is used to acquire both the first and the second images.
4. A method according to claim 1 or 2, wherein a first image capture device acquires the first image, and a second image capture device acquires the second images.
5. A method according to any preceding claim, wherein the image capture device is controlled so as to perform digital zoom and optionally digital pan and/or tilt in order to select the relatively narrow field of view.
6. A method according to any one of claims 1 to 4, wherein the image capture device is controlled so as to perform optical zoom and optionally electromechanical pan and/or tilt in order to select the relatively narrow field of view.
7. A method according to any preceding claim, wherein only the second images are used for object recognition processing.
8. Apparatus for image processing for object recognition applications, the apparatus comprising: an image capture device adapted to acquire a first, relatively low resolution image of a relatively wide field of view containing a plurality of objects; means for applying a pattern recognition algorithm to the first image so as to identify objects of interest; means for selecting at least one object of interest from the first image and determining its coordinates within the image; and means for using the determined coordinates to control either the image capture device or an additional image capture device so as to obtain a sequence of second, relatively high resolution images of a relatively narrow field of view containing the at least one object of interest.
9. Apparatus as claimed in claim 8, comprising means for time multiplexing the first image with the sequence of second images.
10. Apparatus as claimed in claim 8 or 9, comprising a single image capture device for acquiring both the first and the second images.
11. Apparatus as claimed in claim 8 or 9, comprising a first image capture device for acquiring the first image, and a second image capture device for acquiring the second images.
12. Apparatus as claimed in any one of claims 8 to 11, wherein the image capture device is controlled so as to perform digital zoom and optionally digital pan and/or tilt in order to select the relatively narrow field of view.
13. Apparatus as claimed in any one of claims 8 to 11, wherein the image capture device is controlled so as to perform optical zoom and optionally electromechanical pan and/or tilt in order to select the relatively narrow field of view.
14. A method of image processing for object recognition applications substantially as hereinbefore described with reference to or as shown in the accompanying drawings.
15. Apparatus for image processing for object recognition applications substantially as hereinbefore described with reference to or as shown in the accompanying drawings.
PCT/GB2008/050144 2007-03-07 2008-03-03 Controlled high resolution sub-image capture with time domain multiplexed high speed full field of view reference video stream for image based biometric applications WO2008107713A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0704325.0A GB2447246B (en) 2007-03-07 2007-03-07 Controlled high resolution sub-image capture with time domain multiplexed high speed full field of view reference video stream for image biometric application
GB0704325.0 2007-03-07

Publications (1)

Publication Number Publication Date
WO2008107713A1 (en) 2008-09-12

Family

ID=37966027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2008/050144 WO2008107713A1 (en) 2007-03-07 2008-03-03 Controlled high resolution sub-image capture with time domain multiplexed high speed full field of view reference video stream for image based biometric applications

Country Status (2)

Country Link
GB (1) GB2447246B (en)
WO (1) WO2008107713A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102469372A (en) * 2010-11-11 2012-05-23 Lg电子株式会社 Multimedia device, multiple image sensors having different types and method for controlling the same
EP2587407A1 (en) * 2011-10-27 2013-05-01 Samsung Electronics Co., Ltd Vision recognition apparatus and method
CN104869306A (en) * 2014-02-21 2015-08-26 托比技术股份公司 Apparatus and method for robust eye/gaze tracking
US9639741B2 (en) 2014-04-14 2017-05-02 International Business Machines Corporation Facial recognition with biometric pre-filters
US9667872B2 (en) * 2012-12-05 2017-05-30 Hewlett-Packard Development Company, L.P. Camera to capture multiple images at multiple focus positions
CN107292228A (en) * 2017-05-05 2017-10-24 珠海数字动力科技股份有限公司 A kind of method for accelerating face recognition search speed
US9886630B2 (en) 2014-02-21 2018-02-06 Tobii Ab Apparatus and method for robust eye/gaze tracking
US10572008B2 (en) 2014-02-21 2020-02-25 Tobii Ab Apparatus and method for robust eye/gaze tracking

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN202018662U (en) * 2007-05-24 2011-10-26 德萨拉技术爱尔兰有限公司 Image acquisition and processing equipment
TW201121314A (en) * 2009-12-01 2011-06-16 Htc Corp Object image cropping method, object image cropping system and digital image device
EP2453386B1 (en) * 2010-11-11 2019-03-06 LG Electronics Inc. Multimedia device, multiple image sensors having different types and method for controlling the same
US9473702B2 (en) * 2011-12-23 2016-10-18 Nokia Technologies Oy Controlling image capture and/or controlling image processing
US10841458B2 (en) 2018-03-02 2020-11-17 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and storage medium
US11106912B1 (en) 2019-08-05 2021-08-31 Genetec Inc. Method and system for video content analysis
CN114641806A (en) * 2020-10-13 2022-06-17 谷歌有限责任公司 Distributed sensor data processing using multiple classifiers on multiple devices

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003034361A1 (en) * 2001-10-17 2003-04-24 Biodentity Systems Corporation Face imaging system for recordal and automated identity confirmation
US6714665B1 (en) * 1994-09-02 2004-03-30 Sarnoff Corporation Fully automated iris recognition system utilizing wide and narrow fields of view

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6215519B1 (en) * 1998-03-04 2001-04-10 The Trustees Of Columbia University In The City Of New York Combined wide angle and narrow angle imaging system and method for surveillance and monitoring
US7940299B2 (en) * 2001-08-09 2011-05-10 Technest Holdings, Inc. Method and apparatus for an omni-directional video surveillance system
JP2005173787A (en) * 2003-12-09 2005-06-30 Fujitsu Ltd Image processor detecting/recognizing moving body
WO2006040687A2 (en) * 2004-07-19 2006-04-20 Grandeye, Ltd. Automatically expanding the zoom capability of a wide-angle video camera
WO2007014216A2 (en) * 2005-07-22 2007-02-01 Cernium Corporation Directed attention digital video recordation

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102469372A (en) * 2010-11-11 2012-05-23 Lg电子株式会社 Multimedia device, multiple image sensors having different types and method for controlling the same
US10095941B2 (en) 2011-10-27 2018-10-09 Samsung Electronics Co., Ltd Vision recognition apparatus and method
EP2587407A1 (en) * 2011-10-27 2013-05-01 Samsung Electronics Co., Ltd Vision recognition apparatus and method
US9667872B2 (en) * 2012-12-05 2017-05-30 Hewlett-Packard Development Company, L.P. Camera to capture multiple images at multiple focus positions
CN104869306A (en) * 2014-02-21 2015-08-26 托比技术股份公司 Apparatus and method for robust eye/gaze tracking
US10572008B2 (en) 2014-02-21 2020-02-25 Tobii Ab Apparatus and method for robust eye/gaze tracking
US9646207B2 (en) 2014-02-21 2017-05-09 Tobii Ab Apparatus and method for robust eye/gaze tracking
CN104869306B (en) * 2014-02-21 2019-12-17 托比公司 Robust eye/gaze tracking apparatus, method and medium
US9886630B2 (en) 2014-02-21 2018-02-06 Tobii Ab Apparatus and method for robust eye/gaze tracking
US10282608B2 (en) 2014-02-21 2019-05-07 Tobii Ab Apparatus and method for robust eye/gaze tracking
US9665766B2 (en) 2014-04-14 2017-05-30 International Business Machines Corporation Facial recognition with biometric pre-filters
US10074008B2 (en) 2014-04-14 2018-09-11 International Business Machines Corporation Facial recognition with biometric pre-filters
US9639741B2 (en) 2014-04-14 2017-05-02 International Business Machines Corporation Facial recognition with biometric pre-filters
CN107292228A (en) * 2017-05-05 2017-10-24 珠海数字动力科技股份有限公司 A kind of method for accelerating face recognition search speed

Also Published As

Publication number Publication date
GB0704325D0 (en) 2007-04-11
GB2447246A (en) 2008-09-10
GB2447246B (en) 2012-04-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08709664

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08709664

Country of ref document: EP

Kind code of ref document: A1