US20120140982A1 - Image search apparatus and image search method - Google Patents
- Publication number
- US20120140982A1 (application US 13/232,245)
- Authority
- US
- United States
- Prior art keywords
- image
- event
- detection module
- module
- face
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/28—Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/772—Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/70—Multimodal biometrics, e.g. combining information from different biometric modalities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/178—Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition
- Embodiments described herein relate generally to an image search apparatus and an image search method.
- Technology has been developed for searching for a desired image among monitor images obtained by a plurality of cameras installed at a plurality of locations. Such technology searches for a desired image from among images directly input from cameras or images accumulated in a recording apparatus.
- a face image including a specified feature can be searched for from a database by specifying a feature of a face of a human figure to search for, as a search condition.
- a high-speed search is achieved by performing a search by using a name, a member ID, or registration year/month/date, in addition to a face image.
- recognition dictionaries are narrowed by using attribute information in text form (height, weight, gender, age, etc.) other than main biometric information such as a face.
- the present invention hence provides an image search apparatus and an image search method capable of more efficiently performing an image search.
- FIG. 1 is an exemplary diagram for explaining an image search apparatus according to an embodiment.
- FIG. 2 is an exemplary diagram for explaining the image search apparatus according to the embodiment.
- FIG. 3 is an exemplary diagram for explaining the image search apparatus according to the embodiment.
- FIG. 4 is an exemplary diagram for explaining the image search apparatus according to the embodiment.
- FIG. 5 is an exemplary table for explaining the image search apparatus according to the embodiment.
- FIG. 6 is an exemplary graph for explaining the image search apparatus according to the embodiment.
- FIG. 7 is an exemplary diagram for explaining an image search apparatus according to another embodiment.
- FIG. 8 is an exemplary diagram for explaining the image search apparatus according to the other embodiment.
- FIG. 9 is an exemplary diagram for explaining the image search apparatus according to the other embodiment.
- FIG. 10 is an exemplary diagram for explaining the image search apparatus according to the other embodiment.
- FIG. 11 is an exemplary diagram for explaining the image search apparatus according to the other embodiment.
- an image search apparatus comprises: an image input module which is input with an image; an event detection module which detects events from the image input by the image input module and determines levels depending on types of the detected events; an event controlling module which retains the events detected by the event detection module, for each of the levels; and an output module which outputs the events retained by the event controlling module, for each of the levels.
- FIG. 1 is an exemplary diagram for explaining an image search apparatus 100 according to one embodiment.
- the image search apparatus 100 comprises an image input module 110 , an event detection module 120 , a search-feature-information controlling module 130 , an event controlling module 140 , and an output module 150 .
- the image search apparatus 100 may comprise an operation module which receives an operational input from users.
- the image search apparatus 100 extracts scenes which image a specific human figure from input images (image sequence or photographs) such as monitor images.
- the image search apparatus 100 extracts events depending on reliability degrees indicating how reliably a human figure is imaged. In this manner, the image search apparatus 100 assigns levels to scenes including the extracted events, respectively for the reliability degrees. By controlling a list of the extracted events linked with images, the image search apparatus 100 can easily output scenes in which a desired human figure exists.
- the image search apparatus 100 can search for the same human figure as imaged in a face photo currently in hand.
- the image search apparatus 100 can also search for relevant images when an accident or crime happens. Further, the image search apparatus 100 can search for relevant scenes or events among images from an installed security camera.
- the image input module 110 is an input means to which images are input from a camera or a storage which stores images.
- the event detection module 120 detects events such as a moving region, a personal region, a face region, personal attribute information, or personal identification information.
- the event detection module 120 sequentially obtains information (frame information) indicating positions of frames including the detected events in a video image.
- a search-feature-information controlling module 130 stores personal information and information used for attribute determination.
- An event controlling module 140 links input images, detected events, and frame information to one another.
- the output module 150 outputs a result controlled by the event controlling module 140 .
- the image input module 110 inputs a face image of a target human figure to image.
- the image input module 110 comprises, for example, an industrial television (ITV) camera.
- the ITV camera digitizes optical information received through a lens, by an A/D converter, and outputs the information as image data. In this manner, the image input module 110 can output image data to the event detection module 120 .
- the image input module 110 may alternatively be configured to comprise a recording apparatus such as a digital video recorder (DVR), which records images, or an input terminal which is input with images recorded on a recording medium.
- the image input module 110 may have any configuration insofar as the configuration can obtain digitized image data.
- a search target needs only to be, finally, digital image data including a face image.
- An image file imaged by a digital still camera may be loaded through a medium, or even a digital image scanned from a paper medium or a photograph is available.
- a scene of searching a large amount of stored still images for a corresponding image is cited as an application example.
- the event detection module 120 detects an event to be detected, based on an image supplied from the image input module 110 or based on a plurality of such images.
- the event detection module 120 also detects an index indicating a frame (e.g., a frame number) in which an event has been detected. For example, when images to be input are a plurality of still images, the event detection module 120 may detect file names of the still images as frame information.
- the event detection module 120 detects, as events, a scene where a region which moves with a predetermined size or more exists, a scene where a human figure exists, a scene where a face of a human figure is detected, a scene where a face of a human figure is detected and a person corresponding to a specific attribute exists, and a scene where a face of a human figure is detected and a specific person exists.
- events which are detected by the event detection module 120 are not limited to those described above.
- the event detection module 120 may be configured to detect an event in any way insofar as the event indicates that a human figure exists.
- the event detection module 120 detects a scene which may image a human figure, as an event.
- the event detection module 120 adds levels respectively to scenes in order from a scene from which the greatest amount of information relevant to a human figure can be obtained.
- the event detection module 120 assigns “level 1” as the lowest level to each scene where a region which moves over a predetermined size or more exists.
- the event detection module 120 assigns “level 2” to each scene where a human figure exists.
- the event detection module 120 assigns “level 3” to each scene where a human figure's face is detected.
- the event detection module 120 assigns “level 4” to each scene where a human figure's face is detected and a human figure corresponding to a specific attribute exists.
- the event detection module 120 assigns “level 5” as the highest level to each scene where a human figure's face is detected and a specific person exists.
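- The five-level scheme above can be sketched as a simple lookup. The event-type identifiers below are illustrative only; the embodiment does not fix particular names.

```python
# Hypothetical mapping from detected event types to the levels described
# in the text (level 1 is the lowest, level 5 the highest).
EVENT_LEVELS = {
    "moving_region": 1,      # a region moving with a predetermined size or more
    "person_region": 2,      # a human figure exists
    "face_detected": 3,      # a face of a human figure is detected
    "attribute_match": 4,    # a face is detected and a specific attribute matches
    "person_identified": 5,  # a face is detected and a specific person exists
}

def assign_level(event_type: str) -> int:
    """Return the level for a detected event type (0 if unknown)."""
    return EVENT_LEVELS.get(event_type, 0)
```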
- the event detection module 120 detects a region which moves over a predetermined size or more, in a method described below.
- the event detection module 120 detects a scene where a region which moves over a predetermined size or more exists, based on a method disclosed in Japanese Patent No. P3486229, P3490196, or P3567114.
- the event detection module 120 stores, for preliminary study, a distribution of luminance in a background image, and compares an image supplied from the image input module 110 with the prestored luminance distribution. As a result of comparison, the event detection module 120 determines that an “object not forming part of a background exists” in any region of the image which does not match with the luminance distribution.
- general versatility can be improved by employing a method capable of correctly detecting an “object not forming part of a background” even from an image including a background where a periodical change appears like trembling of leaves.
- the event detection module 120 groups the pixels expressed by “1” into connected sets by means of labeling, and calculates a size of a moving region based on a size of a circumscribed rectangle for each set of pixels, or based on the number of moving pixels included in each set. If the calculated size is larger than a preset reference size, the event detection module 120 determines “changed” and extracts the image.
- the event detection module 120 can determine whether pixel values have changed merely because the sun has gone behind a cloud and it has suddenly become dark, because a nearby illumination has turned on, or for some other incidental reason. Therefore, the event detection module 120 can correctly extract a scene where a moving object such as a human figure exists.
- the event detection module 120 can also correctly extract a scene where a moving object such as a human figure exists, by setting an upper limit to a size to be determined as a moving region. For example, the event detection module 120 can more accurately extract a scene where a human figure exists, by setting thresholds for upper and lower limits to an assumed size of a distribution of a human being.
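- The moving-region detection described above (background comparison, labeling of changed pixels, and upper/lower size limits) can be sketched as follows. The thresholds and the simple flood-fill labeling are illustrative assumptions, not the embodiment's actual implementation.

```python
import numpy as np

def moving_regions(frame, background, diff_thresh=30, min_size=20, max_size=500):
    """Detect moving regions by comparing a frame with a stored background.

    Pixels whose luminance differs from the background by more than
    diff_thresh are marked "1"; connected sets of marked pixels are then
    labeled, and only sets whose pixel count lies between the lower and
    upper size limits (the assumed size of a human figure) are reported.
    """
    moving = np.abs(frame.astype(int) - background.astype(int)) > diff_thresh
    h, w = moving.shape
    seen = np.zeros_like(moving, dtype=bool)
    regions = []
    for y in range(h):
        for x in range(w):
            if moving[y, x] and not seen[y, x]:
                # flood-fill one connected set of "1" pixels (4-connectivity)
                stack, pixels = [(y, x)], []
                seen[y, x] = True
                while stack:
                    cy, cx = stack.pop()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and moving[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                # keep only regions within the assumed size limits
                if min_size <= len(pixels) <= max_size:
                    regions.append(pixels)
    return regions
```

Setting both a lower and an upper limit discards single-pixel noise as well as scene-wide illumination changes that mark almost every pixel.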
- the event detection module 120 can detect a scene where a human figure exists, based on a method described below.
- the event detection module 120 can detect a scene where a human figure exists by using technology of detecting a region of the whole of a human figure.
- the technology of detecting a region of the whole of a human figure is described in, for example, Document 1 (Watanabe et al., “Co-occurrence Histograms of Oriented Gradients for Pedestrian Detection”, In Proceedings of the 3rd Pacific-Rim Symposium on Image and Video Technology (PSIVT2009), pp. 37-47).
- the event detection module 120 obtains how a distribution of luminance gradient information appears when a human figure exists, by using co-occurrence at a plurality of local regions. If a human figure exists, an upper half region of the human figure can be calculated as rectangle information.
- the event detection module 120 detects a frame thereof as an event. According to this method, the event detection module 120 can detect a scene where a human figure exists even when a face of the human figure is not imaged in the image or if resolution is insufficient to recognize a face.
- the event detection module 120 detects a scene where a face of a human figure is detected.
- the event detection module 120 calculates a correlation value while moving a prepared template within an input image.
- the event detection module 120 specifies, as a face region, a region where a highest correlation value is calculated. In this manner, the event detection module 120 can detect a scene where a face of a human figure is imaged.
- the event detection module 120 may be configured to detect a face region by using an eigen space method or a subspace method.
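- The template-based face detection above can be sketched as a normalized cross-correlation search. This single-scale sketch is an assumption; a practical detector would also scan over scales, or use the eigen space or subspace methods mentioned above.

```python
import numpy as np

def best_face_region(image, template):
    """Slide a prepared face template over the image and return the
    position (top-left corner) with the highest normalized correlation,
    specifying that position as the face region."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum()) or 1.0
    best_pos, best_corr = None, -1.0
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = image[y:y+th, x:x+tw].astype(float)
            p = patch - patch.mean()
            p_norm = np.sqrt((p * p).sum()) or 1.0
            corr = (p * t).sum() / (p_norm * t_norm)
            if corr > best_corr:
                best_corr, best_pos = corr, (y, x)
    return best_pos, best_corr
```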
- the event detection module 120 detects a position of a facial portion such as an eye or a nose from an image of a detected face region.
- the event detection module 120 can detect facial portions according to a method described in, for example, Document 2 (Kazuhiro Fukui and Osamu Yamaguchi, “Facial Feature Point Extraction Method Based on Combination of Shape Extraction and Pattern Matching”, Transactions of the Institute of Electronics, Information and Communication Engineers (D), vol. J80-D-II, No. 8, pp. 2170-2177 (1997)).
- When the event detection module 120 detects one face region (facial feature) from one image, it obtains a correlation value with respect to a template for the whole image, and outputs a position and a size which maximize the correlation value. When a plurality of facial features are obtained from one image, the event detection module 120 obtains local maximum values of the correlation value for the whole image, and narrows candidate positions of faces in consideration of overlapping within one image. Further, the event detection module 120 can finally simultaneously detect a plurality of facial features in consideration of relationships (chronological transitions) with past images which have been sequentially input.
- the event detection module 120 may be configured to prestore facial patterns of human figures wearing a mask, sunglasses, or a headgear as templates, so that a face region can be detected even if a human figure wears a mask, sunglasses, or a headgear.
- If the event detection module 120 cannot detect all of the facial feature points, it performs processing based on evaluation values for part of the facial feature points. Specifically, if an evaluation value for part of the facial feature points is not smaller than a preset reference value, the event detection module 120 can estimate the remaining feature points from those already detected, by using a two-dimensional or three-dimensional facial model.
- the event detection module 120 can detect a position of a whole face and can estimate a facial feature point from the position of the whole face, by preliminarily studying a pattern of a whole face.
- the event detection module 120 may give an instruction about which face to set as a search target, by a search condition setting means or an output means. Further, the event detection module 120 may be configured to automatically select and output search targets in an order of indices indicating face likelihood obtained through the processing described above.
- the event detection module 120 calculates probabilities, based on statistical information indicating which of sequential frames a human figure who normally walks moves to, and selects a combination which maximizes the probability.
- the event detection module 120 can thereby associate the combination with an event to issue. In this manner, the event detection module 120 can recognize, as one event, a scene where an identical human figure is imaged throughout a plurality of frames.
- the event detection module 120 associates personal regions or face regions with one another between frames by using, for example, an optical flow. Accordingly, the event detection module 120 can recognize, as one event, a scene where an identical human figure is imaged throughout a plurality of frames.
- the event detection module 120 can select a “best shot” from a plurality of frames (a group of associated images). The best shot is most suitable for visually checking a human figure.
- the event detection module 120 selects, as the best shot, a frame having the highest value which takes at least one or more indices into consideration, from among a frame which includes the largest face region, a frame in which a face of a human being is directed in a direction closest to the front direction, a frame which has the greatest contrast of an image in a face region, and a frame which has the greatest similarity to a pattern indicating face likelihood.
- the event detection module 120 may be configured to select, as the best shot, an easy-to-see image for human eyes or an image suitable for a recognition processing.
- a selection criterion for selecting such a best shot may be freely set based on user's discretion.
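- A best-shot selection along these lines can be sketched as a weighted score over the indices named above. The weights and field names are hypothetical; as noted, the criterion may be set freely at the user's discretion.

```python
def best_shot(frames, weights=(0.4, 0.3, 0.2, 0.1)):
    """Pick the "best shot" from a group of associated frames.

    Each frame is scored on the indices described in the text: face-region
    size, how closely the face is directed to the front, contrast of the
    face region, and similarity to a pattern indicating face likelihood.
    The weights are illustrative defaults.
    """
    def score(f):
        return (weights[0] * f["face_size"]
                + weights[1] * f["frontality"]
                + weights[2] * f["contrast"]
                + weights[3] * f["face_likeness"])
    return max(frames, key=score)
```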
- the event detection module 120 detects a scene where a human figure corresponding to a specific attribute exists, based on a method described below.
- the event detection module 120 calculates feature information for specifying attribute information of a human figure by using information of a face region detected by the processing described above.
- Attribute information in the present embodiment has been described as including five types: age, gender, glasses type, mask type, and headgear type.
- the event detection module 120 may be configured to use other attribute information.
- the event detection module 120 may be configured to use, as attribute information, a race, wearing glasses or not (information of 1 or 0), wearing a mask or not (information of 1 or 0), wearing a headgear or not (information of 1 or 0), a facial accessory (piercing, earring, etc.), clothing, a facial expression, an obesity index, a wealth index, etc.
- the event detection module 120 can use any feature as an attribute by studying a pattern in advance for each attribute by using an attribute determination method described later.
- the event detection module 120 extracts a facial feature from an image in a face region. For example, the event detection module 120 can calculate the facial feature by using the subspace method.
- the event detection module 120 may be configured to calculate a facial feature by using a calculation method depending on attribute information to be compared with.
- the event detection module 120 can more accurately determine an attribute by applying an adequate pre-processing for each of age and gender.
- the event detection module 120 can determine an attribute (age decade) of a human figure with high accuracy, by synthesizing a line-segment emphasis filter which emphasizes wrinkles, on an image of a face region.
- the event detection module 120 synthesizes a filter which emphasizes a frequency component to emphasize a portion specific to a gender (such as a beard), on an image of a face region, or synthesizes a filter which emphasizes skeletal information, on an image of a face region. In this manner, the event detection module 120 can more accurately determine an attribute (gender) of a person.
- the event detection module 120 specifies a position of an eye, an outer canthus, or an inner canthus from a facial portion obtained by a face detection processing. Therefore, the event detection module 120 can obtain feature information concerning glasses by cutting out an image around two eyes and by treating the cut image as a calculation target for a subspace.
- the event detection module 120 specifies, for example, positions of a mouth and a nose from positional information of facial portions, which is obtained by the face detection processing. Therefore, the event detection module 120 can obtain feature information concerning a mask, by cutting out an image around the specified positions of the mouth and nose and by treating the cut image as a calculation target for a subspace.
- the event detection module 120 specifies positions of eyes and eyebrows from positional information of facial portions obtained by the face detection processing. Therefore, the event detection module 120 can specify an upper end of a skin region of a face. Further, the event detection module 120 can obtain feature information concerning a headgear, by cutting out an image of a top region of the specified face and by treating the cut image as a calculation target for a subspace.
- the event detection module 120 can extract feature information by specifying glasses, a mask, and a hat from a position of a face. Specifically, the event detection module 120 can extract feature information from any attribute insofar as the attribute exists at a position which is estimable from a position of a face.
- the event detection module 120 may be configured to extract feature information by using such a method.
- the event detection module 120 extracts facial skin information directly as feature information. Therefore, different feature information is extracted individually for attributes such as glasses, a mask, and sunglasses. Specifically, the event detection module 120 need not necessarily extract feature information by explicitly classifying attributes such as glasses, a mask, and sunglasses.
- the event detection module 120 may be configured to separately extract feature information indicating that nothing is worn if a human figure wears neither glasses, a mask, nor a hat.
- After calculating the feature information for determining an attribute, the event detection module 120 further compares the feature information with attribute information stored by the search-feature-information controlling module 130 described later. The event detection module 120 thereby determines attributes such as a gender, an age decade, glasses, a mask, and a hat for a human figure of an input face image.
- the event detection module 120 sets, as an attribute to be used for detecting an event, at least one of an age, a gender, wearing glasses or not, a glasses type, wearing a mask or not, a mask type, wearing a headgear or not, a headgear type, a beard, a mole, a wrinkle, an injury, a hair color, a clothing color, a clothing shape, a headgear, an ornament, an accessory near a face, a facial expression, a wealth degree, and a race.
- the attribute determination module 122 outputs the determined attribute to the event detection module 120 .
- the event detection module 120 comprises an extraction module 121 and an attribute determination module 122 .
- the extraction module 121 extracts feature information for a predetermined region in a registered image (input image), as described above. For example, when face region information indicating a face region and an input image are input, the extraction module 121 then calculates feature information for the region indicated by the face region information in the input image.
- the attribute determination module 122 determines an attribute of a human figure in the input image, based on feature information extracted by the extraction module 121 and attribute information prestored in the search-feature-information controlling module 130 .
- the attribute determination module 122 determines an attribute of the human figure in the input image, by calculating a similarity between feature information extracted by the extraction module 121 and attribute information prestored in the search-feature-information controlling module 130 .
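- The similarity-based attribute determination can be sketched as below, with cosine similarity standing in for the subspace similarity; the class names and dictionary vectors are illustrative assumptions.

```python
import numpy as np

def determine_attribute(feature, class_dictionaries):
    """Determine an attribute by comparing the feature information
    extracted from the input image with the attribute information
    (one reference vector per class) retained in advance.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = {name: cosine(feature, ref) for name, ref in class_dictionaries.items()}
    # output the attribute whose dictionary yields the greatest similarity
    return max(sims, key=sims.get), sims
```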
- the attribute determination module 122 comprises, for example, a gender determination module 123 and an age-decade determination module 124 .
- the attribute determination module 122 may further comprise a determination module for determining a further attribute.
- the attribute determination module 122 may comprise a determination module which determines an attribute such as glasses, a mask, or a headgear.
- the search-feature-information controlling module 130 preliminarily retains male attribute information and female attribute information.
- the gender determination module 123 calculates similarities, based on the male attribute information and female attribute information retained by the search-feature-information controlling module 130 , and the feature information extracted by the extraction module 121 .
- the gender determination module 123 outputs attribute information for which a greater similarity has been calculated, as a result of an attribute determination for an input image.
- the gender determination module 123 uses a feature amount by retaining an occurrence frequency of a local gradient feature of a face as statistical information. Specifically, the gender determination module 123 determines two classes such as maleness and femaleness, by selecting a gradient feature for which maleness or femaleness can be most identified from the statistical information, and by calculating a discriminator which identifies the feature through studies.
- the search-feature-information controlling module 130 preliminarily retains dictionaries of average facial features (attribute information) for the respective classes (age decades in this case).
- the age-decade determination module 124 calculates a similarity between attribute information for each age decade, which is retained in the search-feature-information controlling module 130 , and feature information extracted by the extraction module 121 .
- the age-decade determination module 124 determines an age decade of a human figure in an input image, based on the attribute information used for calculating the highest similarity.
- the search-feature-information controlling module 130 preliminarily retains a face image for each of the ages which are to be identified. For example, to determine age-decade groups for ages from 10 to 60, the search-feature-information controlling module 130 also preliminarily retains face images for ages smaller than 10 and not smaller than 60. In this case, as the number of face images retained by the search-feature-information controlling module 130 increases, age decades can be determined more accurately. Further, the search-feature-information controlling module 130 can widen the determinable ages by preliminarily retaining face images for wider age decades.
- the search-feature-information controlling module 130 prepares a discriminator for determining “whether an age decade is greater or smaller than a reference age”.
- the search-feature-information controlling module 130 can make the event detection module 120 perform a two-class determination by using linear discriminant analysis.
- the event detection module 120 and search-feature-information controlling module 130 may be configured to employ a method such as a support vector machine.
- the support vector machine will be hereinafter referred to as an SVM.
- With an SVM, a boundary condition for discriminating two classes can be set, and whether a distance from the boundary is within a set distance or not can be calculated. Therefore, the event detection module 120 and search-feature-information controlling module 130 can discriminate face images which belong to ages greater than a reference age N from face images which belong to ages smaller than the reference age N.
- the search-feature-information controlling module 130 preliminarily retains a group of images for determining whether 30 is exceeded or not.
- the search-feature-information controlling module 130 is input with images including images for the age 30 or higher, as images for a positive class of “30 or higher”.
- the search-feature-information controlling module 130 is also input with images for a negative class of “smaller than 30”.
- the search-feature-information controlling module 130 performs SVM studies based on the input images.
- the search-feature-information controlling module 130 creates dictionaries, with reference ages shifted from 10 to 60.
- the search-feature-information controlling module 130 creates dictionaries for age decade determination of “10 or greater”, “smaller than 10”, “20 or greater”, “smaller than 20”, . . . , and “60 or greater”, “smaller than 60”.
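- The per-reference-age dictionaries can be sketched as one two-class discriminator per reference age N (“N or greater” vs. “smaller than N”). A nearest-class-mean rule stands in here for the SVM studies described above, and the sample format is an assumption.

```python
import numpy as np

def train_age_dictionaries(samples, reference_ages=range(10, 70, 10)):
    """Build one two-class "dictionary" per reference age from
    (feature_vector, age) training pairs. Each dictionary keeps the mean
    feature of the positive ("N or greater") and negative ("smaller than
    N") classes; a real implementation would train an SVM instead.
    """
    dictionaries = {}
    for n in reference_ages:
        pos = np.array([f for f, age in samples if age >= n])
        neg = np.array([f for f, age in samples if age < n])
        dictionaries[n] = (pos.mean(axis=0), neg.mean(axis=0))
    return dictionaries

def svm_like_output(feature, dictionary):
    """Signed index: positive when the feature is closer to the
    "reference age or greater" class, negative otherwise."""
    pos_mean, neg_mean = dictionary
    return float(np.linalg.norm(feature - neg_mean) - np.linalg.norm(feature - pos_mean))
```

Note that the training samples must include ages below 10 and not below 60, consistent with the retention of face images for those ages described above; otherwise one class would be empty at the extreme reference ages.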
- the age-decade determination module 124 determines an age decade for a human figure in an input image, based on a plurality of dictionaries for age decade determination which are stored by the search-feature-information controlling module 130 , and based on the input image.
- the search-feature-information controlling module 130 classifies images for age-decade determination, which have been prepared by shifting the reference ages from 10 to 60, into two classes relative to each reference age. In this manner, the search-feature-information controlling module 130 can prepare an SVM study machine in accordance with the number of reference ages. In the present embodiment, the search-feature-information controlling module 130 prepares six study machines for ages from 10 to 60.
- the search-feature-information controlling module 130 “returns an index of a plus value when an age greater than the reference age is input”, by studying the class of “age X or greater” as a “positive” class. An index indicating whether an age decade is greater or smaller than the reference age can be obtained by performing this determination processing while shifting the reference ages from 10 to 60. Among the indices thus output, the index which is closest to zero corresponds to the age to be output.
- FIG. 4 shows a method for estimating an age.
- The age-decade determination module 124 in the event detection module 120 calculates the SVM output value for each reference age. Further, the age-decade determination module 124 plots the output values, with the vertical axis representing output values and the horizontal axis representing reference ages. Based on the plot, the age-decade determination module 124 can specify the age of a human figure in an input image.
- the age-decade determination module 124 selects a plot whose output value is closest to zero.
- the reference age 30 results in the output value closest to zero.
- the age-decade determination module 124 outputs “thirties” as an attribute of a human figure in an input image.
- the age-decade determination module 124 can stably determine an age decade by calculating an average change relative to adjacent reference ages.
- the age-decade determination module 124 may be configured to calculate an approximation function based on a plurality of adjacent plots, and to specify, as the estimated age, the value on the horizontal axis at which the output value of the calculated approximation function is 0.
- the age-decade determination module 124 specifies an intersection point by calculating a linear approximation function, based on plots, and can specify an age of approximately 33 from the specified intersection point.
- the age-decade determination module 124 may be configured to calculate an approximation function based on all plots, in place of a subset (e.g., plots covering three adjacent reference ages). In this case, an approximation function with smaller approximation error can be calculated.
- the age-decade determination module 124 may be configured to determine a class by a value obtained from a predetermined transform function.
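The zero-crossing estimation described above (which yields approximately 33 in the example) can be sketched as a linear interpolation between the two adjacent plots whose output values change sign. `estimate_age_from_outputs` is a hypothetical helper name, not from the patent:

```python
def estimate_age_from_outputs(reference_ages, outputs):
    """Estimate an age from per-reference-age SVM output values by finding
    where the linear approximation between adjacent plots crosses zero."""
    pairs = list(zip(reference_ages, outputs))
    for (r0, y0), (r1, y1) in zip(pairs, pairs[1:]):
        if y0 >= 0 >= y1:                 # sign change: zero crossing here
            return r0 + (r1 - r0) * y0 / (y0 - y1)
    # no crossing: the age lies outside the reference range
    return reference_ages[0] if outputs[0] < 0 else reference_ages[-1]
```

For example, output values (2.3, 1.4, 0.3, -0.7, -1.6, -2.4) at reference ages 10 to 60 cross zero between 30 and 40, giving an estimate of 33.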
- the event detection module 120 detects a scene where a specific person exists, based on a method described below. First, the event detection module 120 calculates feature information for specifying attribute information of a human figure by using information of a face region detected by the processing described above.
- the search-feature-information controlling module 130 comprises a dictionary for specifying a person. This dictionary comprises feature information calculated from a face image of the person to be specified.
- the event detection module 120 cuts a face region into a constant size and a constant shape, based on detected positions of parts of a face, and uses grayscale information thereof as a feature amount.
- the event detection module 120 uses the grayscale values of an m×n pixel region directly as feature information, treating the m×n-dimensional information as a feature vector.
- the event detection module 120 performs processing employing the subspace method, based on feature information extracted from an input image and feature information of a person retained by the search-feature-information controlling module 130. Specifically, the event detection module 120 calculates a similarity between feature vectors by normalizing each vector to length 1 and calculating their inner product, according to the simple similarity method.
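The simple similarity method just described reduces to a cosine similarity between flattened grayscale patches. A minimal sketch (function name is illustrative, not from the patent):

```python
import numpy as np

def simple_similarity(u, v):
    """Simple similarity method: normalize both feature vectors to unit
    length and take their inner product (the cosine of their angle)."""
    u = np.asarray(u, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    return float(u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Identical patches score 1.0; orthogonal feature vectors score 0.0.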
- the event detection module 120 may apply a method of using a model to create images in which the direction or condition of the face is intentionally varied, from the face image information of a single image. According to the processing described above, the event detection module 120 can obtain a feature of a face from an image.
- the event detection module 120 can recognize a human figure at higher accuracy, based on an image sequence including a plurality of images obtained chronologically sequentially from one identical human figure.
- the event detection module 120 may be configured to employ the mutual subspace method described in Document 3 (Kazuhiro Fukui, Osamu Yamaguchi, and Kenichi Maeda: “Face Recognition System using Temporal Image Sequence”, IEICE technical report PRMU, vol. 97, no. 113, pp. 17-24 (1997)).
- the event detection module 120 cuts out an image of m×n pixels from an image sequence, as in the feature extraction processing described above, obtains a correlation matrix based on the cut data, and obtains orthonormal vectors by KL expansion. In this way, the event detection module 120 can calculate a subspace indicating a facial feature obtained from the sequential images.
- a correlation matrix (or covariance matrix) of the feature vectors is calculated, and orthonormal vectors (eigenvectors) are calculated by KL expansion thereof. Accordingly, a subspace is calculated.
- the subspace is expressed by selecting the k eigenvectors having the largest eigenvalues, in descending order of eigenvalue, and using the set of those eigenvectors.
- This information is a subspace indicating a facial feature of a human figure who is currently a recognition target.
- Feature information such as a subspace which is output in a method as described above is taken as feature information of a person for a face detected from an input image.
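The subspace construction above (flatten each frame, build a correlation matrix, keep the top-k eigenvectors) can be sketched as follows; `facial_subspace` is a hypothetical helper name and the sketch assumes unit-normalized feature vectors:

```python
import numpy as np

def facial_subspace(frames, k):
    """Build the subspace for an image sequence: flatten each m x n frame
    into a feature vector, form the correlation matrix, and keep the k
    eigenvectors with the largest eigenvalues (KL expansion)."""
    X = np.stack([np.asarray(f, dtype=float).ravel() for f in frames])
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-length features
    C = X.T @ X / len(X)                            # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)            # ascending eigenvalues
    return eigvecs[:, ::-1][:, :k]                  # top-k, as column vectors
```

The returned matrix has one orthonormal basis vector per column; it represents the facial feature of the human figure currently under recognition.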
- the event detection module 120 performs processing of calculating similarities to the facial feature information of the plurality of faces preliminarily registered in the search-feature-information controlling module 130, and of returning results in descending order of similarity.
- as an index indicating similarity, the similarity between subspaces managed as facial feature information is used.
- a calculation method thereof may be a subspace method, a multiple similarity method, or any other method.
- both of recognition data prestored in registration information and input data are expressed as subspaces calculated from a plurality of images, and an “angle” between two subspaces is defined as a similarity.
- the subspace calculated from the input data is referred to as the input subspace.
- the event detection module 120 obtains a subspace similarity (0.0 to 1.0) between the two subspaces expressed by the eigenvectors φin and φd. The event detection module 120 uses this similarity as the similarity for recognizing a person.
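The "angle" between two subspaces can be computed from the singular values of the product of their basis matrices. This sketch assumes the common mutual-subspace-method convention that the similarity is the squared cosine of the smallest canonical angle; the function name is illustrative:

```python
import numpy as np

def subspace_similarity(U1, U2):
    """Similarity of two subspaces (columns = orthonormal basis vectors):
    the squared cosine of the smallest canonical angle between them, i.e.
    the largest singular value of U1^T U2, squared.  Lies in [0.0, 1.0]."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float(s[0] ** 2)
```

Identical subspaces give 1.0; mutually orthogonal subspaces give 0.0, matching the 0.0-to-1.0 range described above.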
- the event detection module 120 may be configured to identify a person by projecting a plurality of face images, which are known to belong to one identical human figure, together to a subspace. In this case, accuracy of personal identification can be improved.
- the search-feature-information controlling module 130 retains a variety of information used in a processing for detecting various events by the event detection module 120 . As described above, the search-feature-information controlling module 130 retains information required for determining persons, and attributes of human figures.
- the search-feature-information controlling module 130 retains, for example, facial feature information for each of the persons, and feature information (attribute information) for each of the attributes. Further, the search-feature-information controlling module 130 can retain attribute information associated with each identical human figure.
- the search-feature-information controlling module 130 retains, as facial feature information and attribute information, a variety of feature information calculated in the same method as the event detection module 120 .
- the search-feature-information controlling module 130 retains m ⁇ n feature vectors, a subspace, or a correlation matrix immediately before KL expansion is performed.
- the configuration may be arranged so as to detect human figures from photographs or image sequences input to the image search apparatus 100 , calculate feature information based on images of detected human figures, and store the calculated feature information into the search-feature-information controlling module 130 .
- the search-feature-information controlling module 130 stores the feature information, facial images, identification IDs, and names in association with one another, wherein the names are input through an unillustrated operation input module.
- the search-feature-information controlling module 130 may be configured to store different additional information or attribute information associated with feature information, based on preset text information.
- the event controlling module 140 retains information concerning an event detected by the event detection module 120 .
- the event controlling module 140 stores input image information either directly as input or after down-conversion. If image information is input from an apparatus such as a DVR, the event controlling module 140 stores link information to the corresponding image. In this manner, the event controlling module 140 can easily find the corresponding scene when playback of an arbitrary scene is instructed. Accordingly, the image search apparatus 100 can play back the instructed scene.
- FIG. 5 is a table showing an example of information stored by the event controlling module 140.
- the event controlling module 140 retains types of events (equivalent to levels described above) detected by the event detection module 120 , information (coordinate information) indicating coordinates at which detected objects are imaged, attribute information, identification information for identifying persons, and frame information indicating frames in images, with the types and foregoing information associated with one another.
- the event controlling module 140 controls, as a group, a plurality of frames throughout which one identical human figure is sequentially imaged. In this case, the event controlling module 140 selects and retains a best shot image as a representative image. For example, when a face region has been detected, the event controlling module 140 retains a face image from which the face region can be known, as a best shot.
- the event controlling module 140 retains an image of a personal region as a best shot.
- the event controlling module 140 selects, as a best shot, an image in which a personal region is imaged to be largest or an image in which a human figure is determined to face in a direction closest to the front direction due to bilateral symmetry.
- the event controlling module 140 selects, as a best shot, an image in which a moving amount is the greatest or an image which shows a move but looks stable since a moving amount thereof is small.
- the event controlling module 140 classifies events detected by the event detection module 120 into levels depending on “human likelihood”. Specifically, the event controlling module 140 assigns “level 1” as the lowest level to a scene where a region which moves over a predetermined size or more exists. The event controlling module 140 assigns “level 2” to a scene where a human figure exists. The event controlling module 140 assigns “level 3” to a scene where a face of a human figure is detected. The event controlling module 140 assigns “level 4” to a scene where a face of a human figure is detected and a person corresponding to a specific attribute exists. Further, the event controlling module 140 assigns “level 5” as the highest level to a scene where a face of a human figure is detected and a specific person exists.
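The five-level classification by "human likelihood" can be sketched as a simple priority cascade. This is an illustrative reading of the rules above; the function name and the level-0 convention for "nothing detected" are assumptions:

```python
def event_level(moving_region, person, face, attribute_match, identity_match):
    """Classify a detected scene into levels 1-5 by 'human likelihood',
    from a moving region (level 1) up to a specific person (level 5)."""
    if face and identity_match:
        return 5          # face detected and a specific person exists
    if face and attribute_match:
        return 4          # face detected and a matching attribute exists
    if face:
        return 3          # a face of a human figure is detected
    if person:
        return 2          # a human figure exists
    if moving_region:
        return 1          # a sufficiently large moving region exists
    return 0              # assumed convention: no event detected
```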
- FIG. 6 is a diagram showing an example of a screen displayed by the image search apparatus 100.
- the output module 150 outputs an output screen 151 as shown in FIG. 6 , based on information stored by the event controlling module 140 .
- the output screen 151 output from the output module 150 comprises an image switch button 11 , a detection setting button 12 , a playback screen 13 , control buttons 14 , a time bar 15 , event marks 16 , and an event-display setting button 17 .
- the image switch button 11 is used to switch the image that is the processing target. This embodiment will now be described with reference to an example of reading an image file.
- the image switch button 11 shows a file name of a read image file.
- an image to be processed by the present apparatus may be directly input from a camera or may be a list of still images in a folder.
- the detection setting button 12 is used to make settings for detection from a target image. For example, to perform the level 5 (personal identification), the detection setting button 12 is operated. In this case, the detection setting button 12 shows a list of persons as search targets. The displayed list of persons may be configured to allow persons to be deleted or edited, or to allow a new search target to be added.
- the playback screen 13 is a screen which plays an image as a target.
- a playback processing for an image is controlled by the control buttons 14 .
- the control buttons 14 comprise “skip to previous event”, “reverse high-speed play”, “reverse play”, “frame-by-frame reverse”, “pause”, “frame-by-frame advance”, “play”, “high-speed play”, and “skip to next event”, in this order from the left side in FIG. 6 .
- a further button for another function may be added, or unnecessary buttons may be deleted from the control buttons 14.
- the time bar 15 indicates a playback position relative to a whole image length.
- the time bar 15 comprises a slider which indicates a current playback position. When the slider is operated, the image search apparatus 100 performs a processing to change the playback position.
- the event marks 16 mark the positions of detected events. Positions of the event marks 16 correspond to playback positions on the time bar 15. When “skip to previous event” or “skip to next event” of the control buttons 14 is operated, the image search apparatus 100 skips to the position of the event existing before or after the slider of the time bar 15.
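The skip behavior just described amounts to finding the nearest event mark before or after the slider position. A minimal sketch (function name and sorted-positions assumption are illustrative):

```python
import bisect

def skip_to_event(event_positions, current, forward=True):
    """Jump from the current playback position to the nearest event mark
    before or after it on the time bar (positions sorted ascending).
    Stays put if no event exists in the requested direction."""
    if forward:
        i = bisect.bisect_right(event_positions, current)
        return event_positions[i] if i < len(event_positions) else current
    i = bisect.bisect_left(event_positions, current)
    return event_positions[i - 1] if i > 0 else current
```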
- the event-display setting button 17 comprises check boxes shown for levels 1 to 5. Events corresponding to checked levels are marked as the event marks 16. Specifically, the user can hide unneeded events by operating the event-display setting button 17.
- the output module 150 comprises buttons 18 and 19 , thumbnails 20 to 23 , and a save button 24 .
- the thumbnails 20 to 23 form a displayed list of events.
- the thumbnails 20 to 23 respectively show best shot images for events, frame information (frame numbers), event levels, and additional information concerning the events.
- the image search apparatus 100 may be configured to show images of detected regions as the thumbnails 20 to 23 if a personal region or a face region is detected for each event.
- the thumbnails 20 to 23 show events close to corresponding positions on the slider of the time bar 15 .
- the image search apparatus 100 switches one of the thumbnails 20 to 23 to another. For example, when the button 18 is operated, the image search apparatus 100 then displays a thumbnail concerning an event existing before a currently displayed event.
- the image search apparatus 100 displays a thumbnail concerning an event existing after a currently displayed event.
- a thumbnail corresponding to an event being played on the playback screen 13 is displayed, bordered as shown in FIG. 6 .
- the image search apparatus 100 skips to a playback position of a selected event and displays a corresponding image on the playback screen 13 .
- the save button 24 is used to store an image or an image sequence of an event.
- the image search apparatus 100 can then store, into an unillustrated storage module, an image of an event corresponding to a selected one of the displayed thumbnails 20 to 23 .
- the image to be saved may be selected from a “face region”, “upper half body region”, “whole body region”, “whole moving region”, or “whole image”, in accordance with an operation input.
- the image search apparatus 100 may be configured to output a frame number, file name, and text file.
- the image search apparatus 100 outputs, as a file name for the text file, a file name having a different extension from that of an image file. Further, the image search apparatus 100 may output all relevant information in text form.
- When an event is an image sequence of the level 1, the image search apparatus 100 outputs, as an image sequence file, images for the duration throughout which a movement continues sequentially. When an event is an image sequence of the level 2, the image search apparatus 100 outputs, as an image sequence file, images corresponding to the range throughout which one identical human figure can be associated across a plurality of frames.
- the image search apparatus 100 can store the file which is thus output, as an evidence image or video which can be visually checked. Further, the image search apparatus 100 can output the file to a system which performs comparison with preregistered human figures.
- the image search apparatus 100 is input with a monitor camera image or a recorded image, and extracts scenes where human figures are imaged, with the scenes associated with an image sequence.
- the image search apparatus 100 assigns levels to extracted events, depending on reliability degrees indicating how reliably the human figures exist. Further, the image search apparatus 100 controls a list of extracted events, linked with images. In this manner, the image search apparatus 100 can output scenes where a human figure desired by the user is imaged.
- the image search apparatus 100 allows the user to easily see images of detected human figures by outputting events of the level 5 first and events of the level 4 second. Further, the image search apparatus 100 lets the user check events throughout an entire image without fail, by displaying the events while switching the levels in order from 3 to 1.
- FIG. 7 is a diagram showing the configuration of an image search apparatus 100 according to the second embodiment.
- the image search apparatus 100 comprises an image input module 110 , an event detection module 120 , a search-feature-information controlling module 130 , an event controlling module 140 , an output module 150 , and a time estimation module 160 .
- the time estimation module 160 estimates a time point of an input image.
- the time estimation module 160 estimates a time point when the input image was imaged.
- the time estimation module 160 assigns information (time point information) indicating the estimated time point to the image input to the image input module 110 , and outputs the information to the event detection module 120 .
- time information indicating an imaging time point of an image is input, according to the present embodiment.
- the image input module 110 and the time estimation module 160 can associate frames of the image and time points with each other, based on time stamps and a frame rate of the file.
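The association of frames with time points from a file's time stamp and frame rate can be sketched as follows; `frame_time` is a hypothetical helper name:

```python
from datetime import datetime, timedelta

def frame_time(start_time, frame_index, fps):
    """Map a frame index to a time point, given the file's time stamp for
    the first frame and a constant frame rate."""
    return start_time + timedelta(seconds=frame_index / fps)
```

For example, at 30 frames per second, frame 300 of a file stamped 12:00:00 corresponds to 12:00:10.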
- time point information is often graphically embedded in an image. Therefore, the time estimation module 160 can generate time information by recognizing numerical figures expressing time points, which are embedded in the image.
- the time estimation module 160 can also obtain a current time point by using time point information obtained from a real time clock which is directly input from a camera.
- a meta file including information indicating time is added to an image file.
- a method is available for providing information indicating the relationship of respective frames with time points, in the form of an external meta file such as a caption information file. In this case, the time estimation module 160 can obtain time information by reading the external meta file.
- the image search apparatus 100 prepares, as face images for search, face images which have been respectively preliminarily given imaging time points and ages, or face images for which imaging time points are known and ages are estimated by using the face images.
- the time estimation module 160 estimates an imaging time point, based on a method of using EXIF information added to a face image or a time stamp of a file. Alternatively, the time estimation module 160 may be configured to use, as an imaging time point, time information input by an unillustrated operation input.
- the image search apparatus 100 calculates similarities between all face images detected from an input image and personal facial feature information for search, which is prestored in the search-feature-information controlling module 130 .
- the image search apparatus 100 performs the processing from an arbitrary position of an image, and estimates an age for the face image for which a predetermined similarity is calculated first. Further, the image search apparatus 100 backwardly calculates the imaging time point of the input image, based on an average value or a mode value of the differences between age estimation results for the face images for search and age estimation results for the face images for which the predetermined similarity has been calculated.
- FIG. 8 shows an example of the time estimation processing.
- ages are preliminarily estimated for the face images for search which are stored in the search-feature-information controlling module 130 .
- a human figure of a face image for search is estimated to be 35 years old.
- the image search apparatus 100 searches an input image for the same human figure as that of the face image for search by using facial features.
- a method for searching the same human figure is the same as described in the first embodiment.
- the image search apparatus 100 calculates similarities between all face images detected from an image and a face image for search.
- the image search apparatus 100 assigns a mark “◯” to each face image for which the calculated similarity is a preset predetermined value or greater, and assigns a mark “×” to each face image for which the calculated similarity is smaller than the predetermined value.
- the image search apparatus 100 estimates an age for each of these face images by using the same method as described in the first embodiment. Further, the image search apparatus 100 calculates an average value of the calculated ages, and estimates time point information indicating an imaging time point of an input image, based on a difference between the average value and an age estimated from the face image for search. In this method, the image search apparatus 100 has been described to have a configuration of using an average value of calculated ages. However, the image search apparatus 100 may be configured to use an intermediate value, a mode value, or any other value.
- the calculated ages are 40, 45, and 44. Therefore, the average value thereof is 43. An age difference of 8 years exists relative to the face image for search.
- the image search apparatus 100 determines that the input image was imaged between the year 2000 when the face image for search had been imaged and the year 2008 which is eight years after 2000.
- the image search apparatus 100 specifies the imaging time point of the input image to be Aug. 23, 2008, including year/month/date, though depending on accuracy of age estimation. Specifically, the image search apparatus 100 can estimate imaging date/time in units of days.
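The backward calculation in the worked example (search-image age 35 taken in 2000; matched ages 40, 45, 44 averaging 43; difference 8 years; estimated year 2008) can be sketched as follows. `estimate_imaging_year` is a hypothetical helper name, and the sketch uses the average-value variant; the median or mode variants mentioned above would swap out one line:

```python
def estimate_imaging_year(search_age, search_year, matched_ages):
    """Estimate the year an input image was taken: average the ages
    estimated for the matching faces, take the difference from the age in
    the face image for search, and add it to that image's imaging year."""
    average_age = sum(matched_ages) / len(matched_ages)
    return search_year + round(average_age - search_age)
```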
- the image search apparatus 100 may be configured to estimate an age, for example, based on a face image detected first, as shown in FIG. 9 , and to estimate an imaging time point, based on the estimated age and the age of an image for search. According to this method, the image search apparatus 100 can estimate an imaging time point faster.
- the event detection module 120 performs the same processing as the first embodiment. However, in the present embodiment, an imaging time point is added to an image.
- the event detection module 120 may be configured to associate not only frame information but also an imaging time point with each event detected.
- the event detection module 120 may be configured to narrow estimated ages by using a difference between an imaging time point of a face image for search and an imaging time point of an input image, when the event detection module 120 performs a processing of the level 5, i.e., when a scene where a specific person is imaged is detected from an input image.
- the event detection module 120 estimates the age, at the time when the input image was imaged, of the human figure to search for, based on the difference between the imaging time point of the face image for search and the imaging time point of the input image. Further, the event detection module 120 estimates ages respectively for the human figures in a plurality of events in which human figures detected from the input image are imaged. The event detection module 120 detects events in which a human figure is imaged whose estimated age is close to the estimated age of the person in the face image for search at the time the input image was imaged.
- the event detection module 120 sets, as the target for detecting an event, the age at the time when the input image of the human figure in the face image for search was imaged, ±α. In this manner, the image search apparatus 100 can detect events more steadily, without fail.
- the value of ⁇ may be arbitrarily set based on a user's operation input or may be preset as a reference value.
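The ±α narrowing above reduces to filtering candidate events by an age window before running the full level-5 person search. A minimal sketch; the function name and the event-dictionary shape (an `"estimated_age"` key) are assumptions:

```python
def events_in_age_window(events, target_age, alpha):
    """Keep only events whose estimated age lies within target_age +/- alpha,
    the narrowing window applied before the level-5 person search."""
    return [e for e in events if abs(e["estimated_age"] - target_age) <= alpha]
```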
- the image search apparatus 100 estimates a time point when an input image was imaged, in the processing of the level 5 for detecting a person from an input image. Further, the image search apparatus 100 estimates the age, at the time point when the input image was imaged, of the human figure to search for.
- the image search apparatus 100 detects a plurality of scenes in which human figures are imaged, and estimates ages of the human figures who are imaged in the scenes.
- the image search apparatus 100 can detect scenes where a human figure is imaged who is estimated to have an age close to the age of the human figure to search for. As a result, the image search apparatus 100 can detect, at a higher speed, scenes where a specific human figure is imaged.
- the search-feature-information controlling module 130 further retains time point information indicating a time point when a face image was imaged and information indicating an age at the time point of having imaged the face image, together with feature information extracted from the face image of each human figure. Ages may be either estimated from images or input by the user.
- FIG. 11 is a diagram showing an example of a screen displayed by the image search apparatus 100.
- the output module 150 outputs an output screen 151 which comprises time point information 25 indicating a time point of an image in addition to the same content as displayed in the first embodiment. Time point information of the image is thus displayed together. Further, the output screen 151 may be configured to display an age which is estimated based on an image displayed on a playback screen 13 . In this manner, the user can recognize an estimated age of a human figure displayed on the playback screen 13 .
- Functions described in the above embodiment may be constituted not only with use of hardware but also with use of software, for example, by making a computer read a program which describes the functions.
- the functions each may be constituted by appropriately selecting either software or hardware.
Abstract
According to one embodiment, an image search apparatus includes: an image input module to which an image is input, an event detection module which detects events from the input image input by the image input module and determines levels depending on types of the detected events, an event controlling module which retains the events detected by the event detection module for each of the levels, and an output module which outputs the events retained by the event controlling module for each of the levels.
Description
- This application is based upon and claims the benefit of priority from prior Japanese Patent Application No. 2010-271508, filed Dec. 6, 2010, the entire contents of which are incorporated herein by reference.
- Embodiments described herein relate generally to an image search apparatus and an image search method.
- Developments are made in technology for searching for a desired image from monitor images obtained by a plurality of cameras installed at a plurality of locations. Such technology is to search for a desired image from among images directly input from cameras or images accumulated in a recording apparatus.
- For example, there is technology of detecting an image which images some change or images a human figure. An observer specifies a desired image by monitoring detected images. However, if a large number of images imaging changes or human figures are detected, a visual check of the detected images requires much labor.
- For an easy visual check of images, there is technology of searching for a similar image by specifying attribute information for a face image. For example, a face image including a specified feature can be searched for from a database by specifying a feature of the face of the human figure to search for as a search condition.
- Further, there is technology of narrowing face images by using attributes (in text form) preliminarily appended to a database. For example, a high-speed search is achieved by performing a search by using a name, a member ID, or registration year/month/date, in addition to a face image. Further, recognition dictionaries are narrowed by using attribute information (height, weight, gender, age, etc.) other than main biometric information such as a face.
- However, when an image which matches attribute information is searched for, there is a problem that accuracy degrades, since the time points of imaging are considered on neither the dictionary side nor the input side.
- When narrowing is performed by using age information in text form, the narrowing cannot be achieved unless attribute information (in text form) is preliminarily attached to search targets.
- The present invention hence provides an image search apparatus and an image search method capable of more efficiently performing an image search.
- FIG. 1 is an exemplary diagram for explaining an image search apparatus according to an embodiment;
- FIG. 2 is an exemplary diagram for explaining the image search apparatus according to the embodiment;
- FIG. 3 is an exemplary diagram for explaining the image search apparatus according to the embodiment;
- FIG. 4 is an exemplary diagram for explaining the image search apparatus according to the embodiment;
- FIG. 5 is an exemplary table for explaining the image search apparatus according to the embodiment;
- FIG. 6 is an exemplary graph for explaining the image search apparatus according to the embodiment;
- FIG. 7 is an exemplary diagram for explaining an image search apparatus according to another embodiment;
- FIG. 8 is an exemplary diagram for explaining the image search apparatus according to the other embodiment;
- FIG. 9 is an exemplary diagram for explaining the image search apparatus according to the other embodiment;
- FIG. 10 is an exemplary diagram for explaining the image search apparatus according to the other embodiment; and
- FIG. 11 is an exemplary diagram for explaining the image search apparatus according to the other embodiment.
- In general, according to one embodiment, an image search apparatus comprises: an image input module to which an image is input, an event detection module which detects events from the input image input by the image input module and determines levels depending on types of the detected events, an event controlling module which retains the events detected by the event detection module for each of the levels, and an output module which outputs the events retained by the event controlling module for each of the levels.
- Hereinafter, an image search apparatus and an image search method according to one embodiment will be specifically described.
-
FIG. 1 is an exemplary diagram for explaining an image search apparatus 100 according to the embodiment. - As shown in
FIG. 1 , the image search apparatus 100 comprises an image input module 110, an event detection module 120, a search-feature-information controlling module 130, an event controlling module 140, and an output module 150. The image search apparatus 100 may comprise an operation module which receives operational inputs from users. - The
image search apparatus 100 extracts scenes in which a specific human figure is imaged from input images (an image sequence or photographs) such as monitor images. The image search apparatus 100 extracts events depending on reliability degrees indicating how reliably a human figure is imaged. In this manner, the image search apparatus 100 assigns a level to each scene including an extracted event, according to its reliability degree. By controlling a list of the extracted events linked with images, the image search apparatus 100 can easily output scenes in which a desired human figure exists. - In this manner, the
image search apparatus 100 can search for the same human figure as imaged in a face photo currently in hand. The image search apparatus 100 can also search for relevant images when an accident or crime happens. Further, the image search apparatus 100 can search for relevant scenes or events among images from an installed security camera. - The
image input module 110 is an input means to which images are input from a camera or a storage which stores images. - The
event detection module 120 detects events such as a moving region, a personal region, a face region, personal attribute information, or personal identification information. The event detection module 120 sequentially obtains information (frame information) indicating the positions of frames including the detected events in a video image. - A search-feature-information controlling
module 130 stores personal information and information used for attribute determination. - An
event controlling module 140 links input images, detected events, and frame information to one another. The output module 150 outputs a result controlled by the event controlling module 140. - Modules of the
image search apparatus 100 will now be described in order below. - The
image input module 110 inputs a face image of a human figure to be imaged as a search target. The image input module 110 comprises, for example, an industrial television (ITV) camera. The ITV camera digitizes optical information received through a lens by an A/D converter, and outputs the information as image data. In this manner, the image input module 110 can output image data to the event detection module 120. - The
image input module 110 may alternatively be configured to comprise a recording apparatus such as a digital video recorder (DVR), which records images, or an input terminal to which images recorded on a recording medium are input. Specifically, the image input module 110 may have any configuration insofar as it can obtain digitized image data. - Finally, a search target needs only to be digital image data including a face image. An image file captured by a digital still camera may be loaded through a medium, and even a digital image scanned from a paper medium or a photograph can be used. In this case, searching a large amount of stored still images for a corresponding image is cited as an application example.
- The
event detection module 120 detects, from an image supplied from the image input module 110 or from a plurality of such images, an event to be detected. The event detection module 120 also detects an index indicating the frame (e.g., a frame number) in which an event has been detected. For example, when the images to be input are a plurality of still images, the event detection module 120 may detect the file names of the still images as frame information. - The
event detection module 120 detects, as events, a scene where a region which moves with a predetermined size or more exists, a scene where a human figure exists, a scene where a face of a human figure is detected, a scene where a face of a human figure is detected and a person corresponding to a specific attribute exists, and a scene where a face of a human figure is detected and a specific person exists. However, the events which are detected by the event detection module 120 are not limited to those described above. The event detection module 120 may be configured to detect an event in any way insofar as the event indicates that a human figure exists. - The
event detection module 120 detects a scene which may image a human figure, as an event. The event detection module 120 adds levels to scenes in order from the scene from which the greatest amount of information relevant to a human figure can be obtained. - Specifically, the
event detection module 120 assigns "level 1" as the lowest level to each scene where a region which moves over a predetermined size or more exists. The event detection module 120 assigns "level 2" to each scene where a human figure exists. The event detection module 120 assigns "level 3" to each scene where a human figure's face is detected. The event detection module 120 assigns "level 4" to each scene where a human figure's face is detected and a human figure corresponding to a specific attribute exists. Further, the event detection module 120 assigns "level 5" as the highest level to each scene where a human figure's face is detected and a specific person exists. - The
event detection module 120 detects a region which moves over a predetermined size or more by a method described below. The event detection module 120 detects a scene where a region which moves over a predetermined size or more exists, based on a method disclosed in Japanese Patent No. P3486229, P3490196, or P3567114. - Specifically, the
event detection module 120 stores, through preliminary study, a distribution of luminance in a background image, and compares an image supplied from the image input module 110 with the prestored luminance distribution. As a result of the comparison, the event detection module 120 determines that an "object not forming part of the background" exists in any region of the image which does not match the luminance distribution. - In the present embodiment, general versatility can be improved by employing a method capable of correctly detecting an "object not forming part of the background" even from an image including a background where a periodical change appears, like trembling leaves.
- The
event detection module 120 extracts pixels where a change in luminance of a predetermined magnitude or greater occurred in the detected moving region, and transforms the pixels into a binary image expressed by "change=1" and "no change=0". The event detection module 120 groups the pixels expressed by "1" into connected sets by means of labeling, and calculates the size of a moving region, based on the size of the circumscribed rectangle of each set of pixels, or based on the number of moving pixels included in each set. If the calculated size is larger than a preset reference size, the event detection module 120 determines "changed" and extracts the image. - If the moving region is extremely large, the
event detection module 120 determines that the pixel values have changed for an incidental reason, for example because the sun has gone behind a cloud and it has suddenly become dark, or because a nearby illumination has turned on. By excluding such cases, the event detection module 120 can correctly extract a scene where a moving object such as a human figure exists. - The
event detection module 120 can also correctly extract a scene where a moving object such as a human figure exists by setting an upper limit to the size to be determined as a moving region. For example, the event detection module 120 can more accurately extract a scene where a human figure exists by setting upper- and lower-limit thresholds corresponding to the assumed size range of a human being. - The
event detection module 120 can detect a scene where a human figure exists, based on a method described below. For example, the event detection module 120 can detect a scene where a human figure exists by using a technology for detecting the region of a whole human figure. Such technology is described, for example, in Document 1 (Watanabe et al., "Co-occurrence Histograms of Oriented Gradients for Pedestrian Detection", In Proceedings of the 3rd Pacific-Rim Symposium on Image and Video Technology (PSIVT2009), pp. 37-47.) - In this case, the
event detection module 120 obtains how a distribution of luminance gradient information appears when a human figure exists, by using co-occurrence at a plurality of local regions. If a human figure exists, the upper-half region of the human figure can be calculated as rectangle information. - If a human figure exists in an input image, the
event detection module 120 detects the frame thereof as an event. According to this method, the event detection module 120 can detect a scene where a human figure exists even when the face of the human figure is not imaged in the image or the resolution is insufficient to recognize a face. - Based on a method described below, the
event detection module 120 detects a scene where a face of a human figure is detected. The event detection module 120 calculates a correlation value while moving a prepared template within an input image. The event detection module 120 specifies, as a face region, the region where the highest correlation value is calculated. In this manner, the event detection module 120 can detect a scene where a face of a human figure is imaged. - Alternatively, the
event detection module 120 may be configured to detect a face region by using an eigenspace method or a subspace method. The event detection module 120 detects the position of a facial portion such as an eye or a nose from an image of a detected face region. The event detection module 120 can detect facial portions according to a method described in, for example, Document 2 (Kazuhiro Fukui and Osamu Yamaguchi, "Facial Feature Point Extraction Method Based on Combination of Shape Extraction and Pattern Matching", Transactions of the Institute of Electronics, Information and Communication Engineers (D), vol. J80-D-II, No. 8, pp. 2170-2177 (1997)) - When the
event detection module 120 detects one face region (facial feature) from one image, the event detection module 120 obtains a correlation value with respect to a template for the whole image, and outputs the position and size which maximize the correlation value. When a plurality of facial features are obtained from one image, the event detection module 120 obtains local maximum values of the correlation value for the whole image, and narrows the candidate positions of a face in consideration of overlapping within one image. Further, the event detection module 120 can finally detect a plurality of facial features simultaneously, in consideration of relationships (chronological transitions) with past images which have been sequentially input. - Alternatively, the
event detection module 120 may be configured to prestore, as templates, facial patterns of human figures wearing a mask, sunglasses, or a headgear, so that a face region can be detected even if a human figure wears a mask, sunglasses, or a headgear. - If the
event detection module 120 cannot detect all facial feature points when detecting facial feature points, the event detection module 120 performs processing based on evaluation values for a subset of the facial feature points. Specifically, if the evaluation value for a subset of facial feature points is not smaller than a preset reference value, the event detection module 120 can estimate the remaining feature points from the feature points which have been detected, by using a two-dimensional or three-dimensional facial model. - Even when no feature point can be detected at all, the
event detection module 120 can detect the position of the whole face and estimate facial feature points from that position, by preliminarily studying patterns of whole faces. - If a plurality of faces exist in an image, the
event detection module 120 may give an instruction about which face to set as a search target, by a search condition setting means or an output means. Further, theevent detection module 120 may be configured to automatically select and output search targets in an order of indices indicating face likelihood obtained through the processing described above. - If one identical human figure is imaged throughout sequential frames, it is more adequate to treat the frames as “one event which images one identical human figure” than to control the frames as respectively different events, in many cases.
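The correlation-based face-region search described earlier (moving a prepared template within the input image and taking the position with the highest correlation value) could be sketched, for example, with a normalized correlation score. The function below is an illustrative reconstruction under that reading, not the patented implementation, and the tie-breaking and speed of a real detector are ignored.

```python
import numpy as np

def best_match(image, template):
    """Slide the template over the image and return the top-left position
    with the highest normalized correlation value.

    A minimal sketch of correlation-based face-region search; real systems
    would add multi-scale search and local-maximum handling.
    """
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()          # zero-mean template
    tn = np.linalg.norm(t) + 1e-9
    best, best_pos = -2.0, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            win = image[y:y + th, x:x + tw]
            w = win - win.mean()            # zero-mean window
            score = float((w * t).sum()) / ((np.linalg.norm(w) + 1e-9) * tn)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best
```

For instance, planting a template patch at a known position in an otherwise empty image should return that position with a correlation value near 1.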
- Hence, the
event detection module 120 calculates probabilities based on statistical information indicating to which position in sequential frames a normally walking human figure moves, and selects the combination which maximizes the probability. The event detection module 120 can thereby associate the combination with the event to be issued. In this manner, the event detection module 120 can recognize, as one event, a scene where an identical human figure is imaged throughout a plurality of frames. - When a frame rate is high, the
event detection module 120 associates personal regions or face regions with one another between frames by using, for example, an optical flow. Accordingly, the event detection module 120 can recognize, as one event, a scene where an identical human figure is imaged throughout a plurality of frames. - Further, the
event detection module 120 can select a "best shot" from a plurality of frames (a group of associated images). The best shot is the frame most suitable for visually checking a human figure. - Among frames included in a detected event, the
event detection module 120 selects, as the best shot, the frame having the highest value in consideration of at least one of the following indices: the frame which includes the largest face region, the frame in which the face of the human being is directed closest to the front direction, the frame which has the greatest contrast in the image of the face region, and the frame which has the greatest similarity to a pattern indicating face likelihood. - Alternatively, the
event detection module 120 may be configured to select, as the best shot, an image easy for human eyes to see, or an image suitable for a recognition processing. The selection criterion for such a best shot may be freely set at the user's discretion. - The
event detection module 120 detects a scene where a human figure corresponding to a specific attribute exists, based on a method described below. The event detection module 120 calculates feature information for specifying attribute information of a human figure by using the information of a face region detected by the processing described above. - In the present embodiment, attribute information has been described as including the five types of age, sex, glasses type, mask type, and headgear type. However, the
event detection module 120 may be configured to use other attribute information. For example, the event detection module 120 may be configured to use, as attribute information, a race, wearing glasses or not (information of 1 or 0), wearing a mask or not (information of 1 or 0), wearing a headgear or not (information of 1 or 0), a facial accessory (pierce, earring, etc.), clothing, a facial expression, an obesity index, a wealth index, etc. The event detection module 120 can use any feature as an attribute by studying a pattern in advance for each attribute by using an attribute determination method described later. - The
event detection module 120 extracts a facial feature from an image in a face region. For example, the event detection module 120 can calculate the facial feature by using the subspace method. - When an attribute of a human figure is determined by comparing a facial feature with attribute information, the calculation method for calculating a facial feature may differ for each attribute. Hence, the
event detection module 120 may be configured to calculate a facial feature by using a calculation method depending on the attribute information to be compared with. - For example, when comparison is performed with attribute information such as an age or a gender, the
event detection module 120 can more accurately determine an attribute by applying an adequate pre-processing for each of the age and gender. - Usually, a face has more wrinkles as the age of the human figure increases. Therefore, the
event detection module 120 can determine an attribute (age decade) of a human figure with high accuracy by applying a line-segment emphasis filter, which emphasizes wrinkles, to an image of a face region. - The
event detection module 120 applies, to an image of a face region, a filter which emphasizes a frequency component so as to emphasize a portion specific to a gender (such as a beard), or a filter which emphasizes skeletal information. In this manner, the event detection module 120 can more accurately determine the attribute (gender) of a person. - Further, the
event detection module 120 specifies the position of an eye, an outer canthus, or an inner canthus from the facial portions obtained by the face detection processing. Therefore, the event detection module 120 can obtain feature information concerning glasses by cutting out an image around the two eyes and treating the cut image as a calculation target for a subspace. - The
event detection module 120 specifies, for example, the positions of the mouth and nose from the positional information of facial portions obtained by the face detection processing. Therefore, the event detection module 120 can obtain feature information concerning a mask by cutting out an image around the specified positions of the mouth and nose and treating the cut image as a calculation target for a subspace. - The
event detection module 120 specifies the positions of the eyes and eyebrows from the positional information of facial portions obtained by the face detection processing. Therefore, the event detection module 120 can specify the upper end of the skin region of a face. Further, the event detection module 120 can obtain feature information concerning a headgear by cutting out an image of the top region of the specified face and treating the cut image as a calculation target for a subspace. - As described above, the
event detection module 120 can extract feature information by specifying glasses, a mask, and a hat from the position of a face. Specifically, the event detection module 120 can extract feature information for any attribute insofar as the attribute exists at a position which is estimable from the position of a face. - An algorithm which directly detects an object which a human figure puts on has generally been put into practical use. The
event detection module 120 may be configured to extract feature information by using such a method. - Unless a human figure wears glasses, a mask, or a headgear, the
event detection module 120 extracts facial skin information directly as feature information. Therefore, different feature information is extracted individually for each of attributes such as glasses, a mask, and sunglasses. Specifically, the event detection module 120 need not necessarily extract feature information by particularly classifying attributes such as glasses, a mask, and sunglasses. - The
event detection module 120 may be configured to separately extract feature information indicating that nothing is put on, if a human figure wears neither glasses, a mask, nor a hat. - After calculating the feature information for determining an attribute, the
event detection module 120 further compares the feature information with the attribute information stored by the search-feature-information controlling module 130 described later. The event detection module 120 thereby determines attributes such as gender, age decade, glasses, a mask, and a hat for the human figure of an input face image. The event detection module 120 sets, as an attribute to be used for detecting an event, at least one of an age, a gender, wearing glasses or not, a glasses type, wearing a mask or not, a mask type, wearing a headgear or not, a headgear type, a beard, a mole, a wrinkle, an injury, a hair color, a clothing color, a clothing shape, a headgear, an ornament, an accessory near a face, a facial expression, a wealth degree, and a race. - The
event detection module 120 outputs the determined attribute to the event controlling module 140. Specifically, as shown in FIG. 2 , the event detection module 120 comprises an extraction module 121 and an attribute determination module 122. The extraction module 121 extracts feature information for a predetermined region in a registered image (input image), as described above. For example, when face region information indicating a face region and an input image are input, the extraction module 121 then calculates feature information for the region indicated by the face region information in the input image. - The
attribute determination module 122 determines the attribute of a human figure in the input image, based on the feature information extracted by the extraction module 121 and the attribute information prestored in the search-feature-information controlling module 130. The attribute determination module 122 determines the attribute of the human figure in the input image by calculating a similarity between the feature information extracted by the extraction module 121 and the attribute information prestored in the search-feature-information controlling module 130. - The
attribute determination module 122 comprises, for example, a gender determination module 123 and an age-decade determination module 124. The attribute determination module 122 may further comprise a determination module for determining a further attribute. For example, the attribute determination module 122 may comprise a determination module which determines an attribute such as glasses, a mask, or a headgear. - For example, the search-feature-
information controlling module 130 preliminarily retains male attribute information and female attribute information. The gender determination module 123 calculates similarities based on the male attribute information and female attribute information retained by the search-feature-information controlling module 130 and the feature information extracted by the extraction module 121. The gender determination module 123 outputs the attribute information for which the greater similarity has been calculated, as the result of the attribute determination for an input image. - For example, as described in Jpn. Pat. Appln. KOKAI Publication No. 2010-044439, the
gender determination module 123 uses a feature amount obtained by retaining the occurrence frequency of local gradient features of a face as statistical information. Specifically, the gender determination module 123 discriminates between the two classes of maleness and femaleness by selecting, from the statistical information, the gradient features by which maleness or femaleness can be best identified, and by calculating, through studies, a discriminator which identifies those features. - If there are attributes of three classes or more in place of two classes, as in age estimation, the search-feature-
information controlling module 130 preliminarily retains dictionaries of average facial features (attribute information) for the respective classes (age decades in this case). The age-decade determination module 124 calculates a similarity between the attribute information for each age decade, which is retained in the search-feature-information controlling module 130, and the feature information extracted by the extraction module 121. The age-decade determination module 124 determines the age decade of a human figure in an input image, based on the attribute information used for calculating the highest similarity. - A technique for estimating an age decade with much higher accuracy is the method described below, which uses two-class discriminators as described above.
- At first, in order to estimate ages, the search-feature-
information controlling module 130 preliminarily retains face images for each of the ages to be identified. For example, to determine an age-decade group covering ages from 10 to 60, the search-feature-information controlling module 130 preliminarily retains face images for ages smaller than 10 and not smaller than 60 as well. In this case, as the number of face images retained by the search-feature-information controlling module 130 increases, age decades can be determined more accurately. Further, the search-feature-information controlling module 130 can widen the determinable ages by preliminarily retaining face images for wider age decades. - Next, the search-feature-
information controlling module 130 prepares a discriminator for determining "whether an age is greater or smaller than a reference age". The search-feature-information controlling module 130 can make the event detection module 120 perform a two-class determination by using linear discriminant analysis. - The
event detection module 120 and the search-feature-information controlling module 130 may be configured to employ a method such as a support vector machine. The support vector machine will be hereinafter referred to as an SVM. According to the SVM, a boundary condition for discriminating two classes can be set, and whether a sample lies within a set distance from the boundary or not can be calculated. Therefore, the event detection module 120 and the search-feature-information controlling module 130 can discriminate face images which belong to ages greater than a reference age N from face images which belong to ages smaller than the reference age N. - For example, where the reference age is 30, the search-feature-
information controlling module 130 preliminarily retains a group of images for determining whether the age 30 is exceeded or not. For example, the search-feature-information controlling module 130 is input with images including images for the age 30 or higher, as images for the positive class of "30 or higher". The search-feature-information controlling module 130 is also input with images for the negative class of "smaller than 30". The search-feature-information controlling module 130 performs SVM studies based on the input images. - By the method described above, the search-feature-
information controlling module 130 creates dictionaries with the reference ages shifted from 10 to 60. In this manner, for example, as shown in FIG. 3 , the search-feature-information controlling module 130 creates dictionaries for age-decade determination of "10 or greater", "smaller than 10", "20 or greater", "smaller than 20", . . . , and "60 or greater", "smaller than 60". The age-decade determination module 124 determines the age decade of a human figure in an input image, based on the plurality of dictionaries for age-decade determination stored by the search-feature-information controlling module 130 and on the input image. - The search-feature-
information controlling module 130 classifies the images for age-decade determination, which have been prepared with the reference ages shifted from 10 to 60, into two classes relative to each reference age. In this manner, the search-feature-information controlling module 130 can prepare an SVM study machine for each of the reference ages. In the present embodiment, the search-feature-information controlling module 130 prepares six study machines for the ages from 10 to 60. - The search-feature-
information controlling module 130 "returns an index of a plus value when an age greater than the reference age is input" by studying the class of "age X or greater" as the "positive" class. An index indicating whether an age is greater or lower than the reference age can be obtained by performing this determination processing while shifting the reference ages from 10 to 60. Among the indices thus output, the index closest to zero corresponds most closely to the age to be output. -
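The reference-age sweep described above can be illustrated as follows: the SVM output for each reference age is examined, and where the outputs change sign between two adjacent reference ages, a linear approximation (as also described in this embodiment) gives the zero crossing as the estimated age; otherwise the reference age whose output is closest to zero is returned. The function name and the sample output values below are invented for illustration only.

```python
def estimate_age(ref_ages, svm_outputs):
    """Estimate an age from per-reference-age SVM outputs.

    A "plus" output means the input face looks older than that reference
    age, so outputs decrease as the reference age grows past the true age.
    """
    pts = list(zip(ref_ages, svm_outputs))
    for (a0, y0), (a1, y1) in zip(pts, pts[1:]):
        # Sign change between adjacent reference ages: interpolate linearly
        # to the point where the output would be exactly zero.
        if y0 >= 0 > y1 or y0 <= 0 < y1:
            return a0 + (a1 - a0) * (0 - y0) / (y1 - y0)
    # No sign change: fall back to the reference age whose output is
    # closest to zero, as described above.
    return min(pts, key=lambda p: abs(p[1]))[0]
```

With invented outputs [2.1, 1.4, 0.3, -0.7, -1.5, -2.2] for reference ages 10 through 60, the sign change lies between 30 and 40 and the interpolated estimate is 33, matching the intersection-point idea of the embodiment.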
FIG. 4 shows a method for estimating an age. The age-decade determination module 124 in the event detection module 120 calculates an output value of the SVM for each reference age. Further, the age-decade determination module 124 plots the output values, with the vertical axis representing output values and the horizontal axis representing reference ages. Based on the plot, the age-decade determination module 124 can specify the age of a human figure in an input image. - For example, the age-
decade determination module 124 selects the plot whose output value is closest to zero. In the example shown in FIG. 4 , the reference age 30 results in the output value closest to zero. In this case, the age-decade determination module 124 outputs "thirties" as the attribute of the human figure in the input image. When the plot fluctuates up and down unstably, the age-decade determination module 124 can determine an age decade stably by calculating an average change relative to adjacent reference ages. - For example, the age-
decade determination module 124 may be configured to calculate an approximation function based on a plurality of plots adjacent to one another, and to specify, as the estimated age, the value on the horizontal axis at which the output of the calculated approximation function is 0. In the example shown in FIG. 4 , the age-decade determination module 124 specifies an intersection point by calculating a linear approximation function based on the plots, and can specify an age of approximately 33 from the specified intersection point. - Further, the age-
decade determination module 124 may be configured to calculate an approximation function based on all plots in place of a subset (e.g., plots covering three adjacent reference ages). In this case, an approximation function with fewer approximation errors can be calculated. - Alternatively, the age-
decade determination module 124 may be configured to determine a class by a value obtained from a predetermined transform function. - Further, the
event detection module 120 detects a scene where a specific person exists, based on a method described below. At first, the event detection module 120 calculates feature information for specifying attribute information of a human figure by using the information of a face region detected by the processing described above. In this case, the search-feature-information controlling module 130 comprises a dictionary for specifying a person. This dictionary comprises feature information calculated from a face image of the person to be specified. - The
event detection module 120 cuts a face region into a constant size and shape based on the detected positions of the parts of a face, and uses grayscale information thereof as a feature amount. Here, the event detection module 120 uses the grayscale values of a region of m×n pixels directly as feature information, treating the m×n-dimensional information as a feature vector. - The
event detection module 120 performs processing by employing the subspace method, based on the feature information extracted from an input image and the feature information of a person retained by the search-feature-information controlling module 130. Specifically, the event detection module 120 calculates a similarity between feature vectors by normalizing each vector to a length of 1 and calculating the inner product, according to a simple similarity method. - Alternatively, the
event detection module 120 may apply, to the face image information of one image, a method of creating images in which the direction or condition of the face is intentionally varied by using a model. According to the processing described above, the event detection module 120 can obtain the feature of a face from an image. - The
event detection module 120 can recognize a human figure with higher accuracy based on an image sequence including a plurality of images obtained chronologically and sequentially from one identical human figure. For example, the event detection module 120 may be configured to employ a mutual subspace method described in Document 3 (Kazuhiro Fukui, Osamu Yamaguchi, and Kenichi Maeda: "Face Recognition System using Temporal Image Sequence", IEICE technical report PRMU, vol. 97, No. 113, pp. 17-24 (1997)) - In this case, the
event detection module 120 cuts out an image of m×n pixels from the image sequence, as in the feature extraction processing described above, obtains a correlation matrix based on the cut data, and obtains orthonormal vectors by KL expansion. In this way, the event detection module 120 can calculate a subspace indicating the facial feature obtained from the sequential images. - According to the calculation method for a subspace, a correlation matrix (or covariance matrix) of the feature vectors is calculated, and orthonormal vectors (eigenvectors) are calculated by K-L expansion thereof; a subspace is thereby obtained. The subspace is expressed by selecting the k eigenvectors with the greatest eigenvalues, in descending order of eigenvalue, and using the set of these eigenvectors. In the present embodiment, a correlation matrix Cd is obtained from the feature vectors and diagonalized as Cd = Φd Λd Φd^T to obtain the matrix Φd of eigenvectors. This information is the subspace indicating the facial feature of the human figure who is currently the recognition target.
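The KL expansion described above can be sketched with NumPy. This is a minimal illustration rather than the patented implementation; the subspace dimensionality k and the function name are assumptions.

```python
import numpy as np

def face_subspace(feature_vectors, k=5):
    """Compute a k-dimensional subspace from feature vectors (one m*n
    grayscale vector per frame), as the orthonormal eigenvectors of the
    correlation matrix Cd with the k largest eigenvalues."""
    X = np.asarray(feature_vectors, dtype=np.float64)  # shape (frames, dim)
    Cd = X.T @ X / len(X)                              # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(Cd)              # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]              # indices of the k largest
    return eigvecs[:, order]                           # (dim, k), orthonormal columns
```

The returned matrix corresponds to Φd in the decomposition Cd = Φd Λd Φd^T, truncated to its k leading eigenvectors.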
- Feature information such as a subspace output by the method described above is taken as the feature information of the person, for each face detected from the input image. The
event detection module 120 performs a processing of calculating similarities to the facial feature information in the search-feature-information controlling module 130, in which a plurality of faces are preliminarily registered, and of returning results in descending order of similarity. - At this time, as results of the search processing, the identification IDs of the human figures managed by the search-feature-information controlling module 130 and indices indicating the similarities obtained as calculation results are returned in descending order of similarity. In addition to these results, the information managed for each person by the search-feature-information controlling module 130 may be returned together. However, since association through the identification IDs is available, such additional information need not be used in the search processing. - As the index indicating a similarity, a similarity between the subspaces managed as facial feature information is used. The calculation method thereof may be the subspace method, the multiple similarity method, or any other method. In these methods, both the recognition data prestored in the registration information and the input data are expressed as subspaces calculated from a plurality of images, and an "angle" between the two subspaces is defined as the similarity.
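The "angle between subspaces" similarity can be sketched as follows, assuming orthonormal basis matrices Φd and Φin as columns. The canonical cosines are the singular values of Φd^T Φin; taking the squared largest one as the similarity is an illustrative convention, not necessarily the patent's exact formula.

```python
import numpy as np

def mutual_subspace_similarity(Phi_d, Phi_in):
    """Similarity in [0.0, 1.0] between two subspaces given by orthonormal
    column bases: the squared cosine of the smallest canonical angle."""
    s = np.linalg.svd(Phi_d.T @ Phi_in, compute_uv=False)
    return float(s[0] ** 2)
```

Identical subspaces give 1.0 and orthogonal subspaces give 0.0, matching the 0.0 to 1.0 range of the subspace similarity used for recognition.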
- Here, the subspace calculated from the input data is referred to as the input subspace. The event detection module 120 likewise obtains a correlation matrix Cin for the input data sequence and diagonalizes it as Cin = Φin Λin Φin^T, thereby obtaining the eigenvectors Φin. The event detection module 120 obtains a subspace similarity (0.0 to 1.0) between the subspaces expressed by the two sets of eigenvectors Φin and Φd. The event detection module 120 uses this similarity as the similarity for recognizing a person. - The
event detection module 120 may be configured to identify a person by projecting a plurality of face images known to belong to one identical human figure together onto a subspace. In this case, the accuracy of personal identification can be improved. - The search-feature-
information controlling module 130 retains a variety of information used in the processing by the event detection module 120 for detecting various events. As described above, the search-feature-information controlling module 130 retains the information required for determining persons and attributes of human figures. - The search-feature-
information controlling module 130 retains, for example, facial feature information for each person, and feature information (attribute information) for each attribute. Further, the search-feature-information controlling module 130 can retain attribute information associated with each identical human figure. - The search-feature-
information controlling module 130 retains, as facial feature information and attribute information, a variety of feature information calculated by the same methods as the event detection module 120. For example, the search-feature-information controlling module 130 retains m×n feature vectors, a subspace, or the correlation matrix immediately before KL expansion is performed. - Feature information for specifying persons cannot be prepared in advance in many cases. Therefore, the configuration may be arranged so as to detect human figures from photographs or image sequences input to the
image search apparatus 100, calculate feature information based on the images of the detected human figures, and store the calculated feature information into the search-feature-information controlling module 130. In this case, the search-feature-information controlling module 130 stores the feature information in association with facial images, identification IDs, and names, wherein the names are input through an unillustrated operation input module. - The search-feature-
information controlling module 130 may be configured to store different additional information or attribute information associated with feature information, based on preset text information. - The
event controlling module 140 retains information concerning the events detected by the event detection module 120. For example, the event controlling module 140 stores the input image information directly as input, or down-converted. If the image information is input from an apparatus such as a DVR, the event controlling module 140 stores link information to the corresponding image. In this manner, the event controlling module 140 can easily locate a scene when playback of an arbitrary scene is instructed. Accordingly, the image search apparatus 100 can play back the corresponding image. -
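The per-event information managed this way (and listed in FIG. 5) could be modeled as a small record. The field names and types here are illustrative assumptions, not the patent's actual data layout.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class EventRecord:
    """One detected event as managed by the event controlling module."""
    level: int                              # 1 (moving region) .. 5 (specific person)
    frame: int                              # frame number within the input image
    coordinates: Tuple[int, int, int, int]  # x, y, width, height of the detection
    attribute: Optional[str] = None         # attribute label, if any
    person_id: Optional[str] = None         # identification ID for level-5 events
    link: Optional[str] = None              # link to the source image (e.g. on a DVR)
```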
FIG. 5 is a table for explaining an example of the information stored by the event controlling module 140. - As shown in
FIG. 5, the event controlling module 140 retains the types of events (equivalent to the levels described above) detected by the event detection module 120, information (coordinate information) indicating the coordinates at which the detected objects are imaged, attribute information, identification information for identifying persons, and frame information indicating the frames in the images, with the types and the foregoing information associated with one another. - The
event controlling module 140 manages, as a group, a plurality of frames throughout which one identical human figure is sequentially imaged. In this case, the event controlling module 140 selects and retains a best shot image as a representative image. For example, when a face region has been detected, the event controlling module 140 retains a face image in which the face region is clearly visible, as the best shot. - Alternatively, when a personal region has been detected, the
event controlling module 140 retains an image of the personal region as the best shot. In this case, the event controlling module 140 selects, as the best shot, an image in which the personal region is imaged largest, or an image in which the human figure is determined, from its bilateral symmetry, to face a direction closest to the front. - When a moving region has been detected, for example, the
event controlling module 140 selects, as the best shot, an image in which the moving amount is greatest, or an image which shows motion but looks stable because its moving amount is small. - As has been described above, the
event controlling module 140 classifies the events detected by the event detection module 120 into levels depending on "human likelihood". Specifically, the event controlling module 140 assigns "level 1", the lowest level, to a scene where a region moving over a predetermined size exists; "level 2" to a scene where a human figure exists; "level 3" to a scene where the face of a human figure is detected; "level 4" to a scene where the face of a human figure is detected and a person corresponding to a specific attribute exists; and "level 5", the highest level, to a scene where the face of a human figure is detected and a specific person exists. - The closer the level is to 1, the fewer the failures in detecting a "scene where a human figure exists"; however, over-sensitive detections occur more often, and the accuracy of narrowing down to a specific person decreases. The closer the level is to 5, the more narrowly the output events are focused on a specific person; on the other hand, failures in detection increase.
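The five-level classification above can be summarized as a small decision function; this is a sketch under the assumption that each detection result is available as a boolean flag.

```python
def classify_event_level(motion, person, face, attribute_match, person_match):
    """Map detection results for a scene to the 'human likelihood' levels:
    5 = specific person, 4 = specific attribute, 3 = face detected,
    2 = human figure, 1 = moving region only, 0 = no event."""
    if face and person_match:
        return 5
    if face and attribute_match:
        return 4
    if face:
        return 3
    if person:
        return 2
    if motion:
        return 1
    return 0
```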
-
FIG. 6 is a diagram for explaining an example of a screen displayed by the image search apparatus 100. - The
output module 150 outputs an output screen 151 as shown in FIG. 6, based on the information stored by the event controlling module 140. - The
output screen 151 output from the output module 150 comprises an image switch button 11, a detection setting button 12, a playback screen 13, control buttons 14, a time bar 15, event marks 16, and an event-display setting button 17. - The
image switch button 11 is used to switch the image as a processing target. This embodiment will now be described with reference to an example of reading an image file. In this case, the image switch button 11 shows the file name of the read image file. As described above, an image to be processed by the present apparatus may be directly input from a camera, or may be a list of still images in a folder. - The
detection setting button 12 is used to make settings for detection from a target image. For example, to perform the level 5 (personal identification), the detection setting button 12 is operated. In this case, the detection setting button 12 shows a list of persons as search targets. The displayed list of persons may be configured to allow the persons to be deleted or edited, or to allow a new search target to be added. - The
playback screen 13 is a screen which plays the target image. The playback processing for an image is controlled by the control buttons 14. For example, the control buttons 14 comprise "skip to previous event", "reverse high-speed play", "reverse play", "frame-by-frame reverse", "pause", "frame-by-frame advance", "play", "high-speed play", and "skip to next event", in this order from the left side in FIG. 6. A button for another function may be added, or unneeded buttons may be deleted from the control buttons 14. - The
time bar 15 indicates the playback position relative to the whole image length. The time bar 15 comprises a slider which indicates the current playback position. When the slider is operated, the image search apparatus 100 performs a processing to change the playback position. - The event marks 16 mark the positions of detected events. Positions of the event marks 16 correspond to playback positions on the
time bar 15. When the “skip to previous event” or “skip to next event” of thecontrol buttons 14 is operated, theimage search apparatus 100 skips to a position of an event existing before or after the slider of thetime bar 15. - The event-
display setting button 17 comprises check boxes for levels 1 to 5. Events corresponding to the checked levels are marked as the event marks 16. Specifically, the user can hide unneeded events by operating the event-display setting button 17. - Further, the
output module 150 comprises buttons 18 and 19, thumbnails 20 to 23, and a save button 24. - The
thumbnails 20 to 23 form a displayed list of events. The thumbnails 20 to 23 respectively show best shot images for events, frame information (frame numbers), event levels, and additional information concerning the events. The image search apparatus 100 may be configured to show images of the detected regions as the thumbnails 20 to 23 if a personal region or a face region is detected for each event. The thumbnails 20 to 23 show events close to the corresponding position of the slider on the time bar 15. - When the
button 18 or 19 is operated, the image search apparatus 100 switches among the thumbnails 20 to 23. For example, when the button 18 is operated, the image search apparatus 100 displays the thumbnail of the event existing before the currently displayed event. - Alternatively, when the
button 19 is operated, the image search apparatus 100 displays the thumbnail of the event existing after the currently displayed event. The thumbnail corresponding to the event being played on the playback screen 13 is displayed with a border, as shown in FIG. 6. - When any of the displayed
thumbnails 20 to 23 is selected by a double click, the image search apparatus 100 skips to the playback position of the selected event and displays the corresponding image on the playback screen 13. - The
save button 24 is used to store an image or an image sequence of an event. When the save button 24 is selected, the image search apparatus 100 can store, into an unillustrated storage module, an image of the event corresponding to the selected one of the displayed thumbnails 20 to 23. - If the
image search apparatus 100 saves an event as an image, the image to save may be selected from a "face region", "upper half body region", "whole body region", "whole moving region", and "whole image" in accordance with an operation input. In this case, the image search apparatus 100 may be configured to output the frame number and file name to a text file. The image search apparatus 100 uses, as the file name for the text file, the image file's name with a different extension. Further, the image search apparatus 100 may output all relevant information in text form. - When an event is an image sequence of the
level 1, the image search apparatus 100 outputs, as an image sequence file, the images for the duration throughout which motion continues. When an event is an image sequence of the level 2, the image search apparatus 100 outputs, as an image sequence file, the images corresponding to the range throughout which one identical human figure can be associated across a plurality of frames. - The
image search apparatus 100 can store the file thus output as an evidence image or video which can be visually checked. Further, the image search apparatus 100 can output the file to a system which performs comparison with preregistered human figures. - As described above, the
image search apparatus 100 is input with a monitor camera image or a recorded image, and extracts scenes where human figures are imaged, with the scenes associated with the image sequence. In this case, the image search apparatus 100 assigns levels to the extracted events, depending on reliability degrees indicating how reliably the human figures exist. Further, the image search apparatus 100 manages a list of the extracted events, linked with the images. In this manner, the image search apparatus 100 can output the scenes where a human figure desired by the user is imaged. - For example, the
image search apparatus 100 allows the user to easily see images of detected human figures by outputting first the events of the level 5 and then the events of the level 4. Further, the image search apparatus 100 lets the user see events throughout the entire image without omissions, by displaying the events while switching the levels in order from 3 down to 1. - Hereinafter, the second embodiment will be described. Features of the configuration which are common to the first embodiment will be denoted by common reference symbols, and detailed descriptions thereof will be omitted.
-
FIG. 7 is a diagram for explaining the configuration of an image search apparatus 100 according to the second embodiment. The image search apparatus 100 comprises an image input module 110, an event detection module 120, a search-feature-information controlling module 130, an event controlling module 140, an output module 150, and a time estimation module 160. - The
time estimation module 160 estimates the time point when the input image was imaged. The time estimation module 160 assigns information (time point information) indicating the estimated time point to the image input to the image input module 110, and outputs the information to the event detection module 120. - Although the
image input module 110 has substantially the same configuration as in the first embodiment, time information indicating the imaging time point of an image is also input in the present embodiment. For example, when the image is a file, the image input module 110 and the time estimation module 160 can associate the frames of the image with time points, based on the time stamps and frame rate of the file. - In digital video recorders (DVRs) for monitor cameras, time point information is often graphically embedded in the image. Therefore, the
time estimation module 160 can generate time information by recognizing the numerical figures expressing time points which are embedded in the image. - The
time estimation module 160 can also obtain the current time point by using time point information obtained from a real-time clock, for an image directly input from a camera. - In some cases, a meta file including information indicating time is added to an image file. A method is also available for providing the relationship between the respective frames and time points in the form of an external meta file, such as a caption information file, separate from the image itself. In these cases, the time estimation module 160 can obtain time information by reading the external meta file. - If the time information of an image is not supplied together with the image, the
image search apparatus 100 prepares, as face images for search, face images to which imaging time points and ages have been preliminarily given, or face images whose imaging time points are known and whose ages are estimated from the face images themselves. - The
time estimation module 160 estimates an imaging time point based on, for example, EXIF information added to a face image or the time stamp of a file. Alternatively, the time estimation module 160 may be configured to use, as the imaging time point, time information input through an unillustrated operation input. - The
image search apparatus 100 calculates similarities between all face images detected from an input image and the personal facial feature information for search, which is prestored in the search-feature-information controlling module 130. The image search apparatus 100 performs the processing from an arbitrary position of the image, and estimates an age for the face image for which a predetermined similarity is first calculated. Further, the image search apparatus 100 backward-calculates the imaging time point of the input image, based on an average value or mode value of the differences between the age estimation results for the face images for search and the age estimation results for the face images for which the predetermined similarity has been calculated. -
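The backward calculation described here can be sketched as follows; using the average of the estimated ages is the default per the description, and the function name is an assumption.

```python
from statistics import mean

def estimate_capture_year(search_year, search_age, matched_ages):
    """Backward-calculate the year the input image was captured from the
    ages estimated for faces that matched the face image for search."""
    return search_year + round(mean(matched_ages) - search_age)
```

With the figures used in the FIG. 8 example (search image from 2000, searched person estimated at 35, matched ages 40, 45, and 44), this yields 2008.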
FIG. 8 shows an example of the time estimation processing. As shown in FIG. 8, ages are preliminarily estimated for the face images for search stored in the search-feature-information controlling module 130. In the example shown in FIG. 8, the human figure of the face image for search is estimated to be 35 years old. In this state, the image search apparatus 100 searches the input image for the same human figure as in the face image for search, by using facial features. The method for searching for the same human figure is the same as described in the first embodiment. - The
image search apparatus 100 calculates similarities between all face images detected from the image and the face image for search. The image search apparatus 100 assigns "∘" to each face image whose similarity is calculated to be a preset predetermined value or greater, and "x" to each face image whose similarity is calculated to be smaller than the predetermined value. - Based on the face images whose similarity is "∘", the
image search apparatus 100 estimates an age for each of these face images by the same method as described in the first embodiment. Further, the image search apparatus 100 calculates the average value of the calculated ages, and estimates time point information indicating the imaging time point of the input image, based on the difference between the average value and the age estimated from the face image for search. The image search apparatus 100 has been described here as using the average value of the calculated ages; however, it may be configured to use an intermediate value, a mode value, or any other value. - According to the example shown in
FIG. 8, the calculated ages are 40, 45, and 44; therefore, their average value is 43, an age difference of 8 years from the face image for search. - Specifically, the
image search apparatus 100 determines that the input image was imaged between the year 2000, when the face image for search was imaged, and the year 2008, which is eight years later. - If the input image is determined to have been imaged eight years later, for example, the
image search apparatus 100 specifies the imaging time point of the input image to be Aug. 23, 2008, including year/month/date, though this depends on the accuracy of the age estimation. Specifically, the image search apparatus 100 can estimate the imaging date/time in units of days. - Further, the
image search apparatus 100 may be configured to estimate an age based on, for example, the face image detected first, as shown in FIG. 9, and to estimate the imaging time point based on the estimated age and the age of the image for search. According to this method, the image search apparatus 100 can estimate the imaging time point faster. - The
event detection module 120 performs the same processing as in the first embodiment. In the present embodiment, however, an imaging time point is added to the image, and the event detection module 120 may be configured to associate not only frame information but also the imaging time point with each detected event. - Further, the
event detection module 120 may be configured to narrow down the estimated ages by using the difference between the imaging time point of the face image for search and the imaging time point of the input image, when it performs the processing of the level 5, i.e., when a scene where a specific person is imaged is detected from the input image. - In this case, as shown in
FIG. 10, the event detection module 120 estimates the age of the human figure to search for at the time when the input image was imaged, based on the difference between the imaging time point of the face image for search and the imaging time point of the input image. Further, the event detection module 120 estimates ages for the human figures in a plurality of events in which human figures detected from the input image appear. The event detection module 120 then detects the events in which a human figure close to the estimated age of the person in the face image for search is imaged. - In the example shown in
FIG. 10, the face image for search was imaged in the year 2000, and the human figure in it is estimated to be 35 years old. Further, the input image is known to have been imaged in the year 2010. In this case, the event detection module 120 estimates the age of the human figure in the face image for search at the time point of the input image as 35 + (2010 − 2000) = 45. The event detection module 120 detects the events in which a human figure determined to be close to the estimated age of 45 is imaged. - For example, the
event detection module 120 sets, as the target for detecting an event, the range of ±α around the age of the human figure in the face image for search at the time the input image was imaged. In this manner, the image search apparatus 100 can detect events more steadily, without omissions. The value of α may be set arbitrarily based on a user's operation input, or may be preset as a reference value. - As described above, the
image search apparatus 100 according to the present embodiment estimates the time point when an input image was imaged, in the processing of the level 5 for detecting a person from the input image. Further, the image search apparatus 100 estimates the age of the human figure to search for at the time point when the input image was imaged. The image search apparatus 100 detects a plurality of scenes in which human figures are imaged, and estimates the ages of the human figures imaged in those scenes. The image search apparatus 100 can then detect the scenes where a human figure estimated to have an age close to that of the human figure to search for is imaged. As a result, the image search apparatus 100 can detect, at a higher speed, the scenes where a specific human figure is imaged. - In the present embodiment, the search-feature-
information controlling module 130 further retains time point information indicating the time point when a face image was imaged and information indicating the age at that time point, together with the feature information extracted from the face image of each human figure. The ages may be either estimated from the images or input by the user. -
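The level-5 narrowing described above (keeping only events whose estimated age is close to the age the searched person would have at the input image's time point) can be sketched as follows; the window alpha, the event dict layout, and the key name are assumptions.

```python
def narrow_by_age(events, search_year, search_age, input_year, alpha=5):
    """Keep only events whose estimated age falls within +/- alpha years
    of the searched person's expected age at the input image's time point."""
    expected_age = search_age + (input_year - search_year)  # e.g. 35 + (2010 - 2000) = 45
    return [e for e in events if abs(e["estimated_age"] - expected_age) <= alpha]
```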
FIG. 11 is a diagram for explaining an example of a screen displayed by the image search apparatus 100. - The
output module 150 outputs an output screen 151 which comprises time point information 25 indicating the time point of the image, in addition to the same content as displayed in the first embodiment. The time point information of the image is thus displayed together. Further, the output screen 151 may be configured to display an age estimated based on the image displayed on the playback screen 13. In this manner, the user can recognize the estimated age of a human figure displayed on the playback screen 13. - Functions described in the above embodiment may be constituted not only by hardware but also by software, for example, by making a computer read a program which describes the functions. Alternatively, each of the functions may be constituted by appropriately selecting either software or hardware.
- While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Claims (12)
1. An image search apparatus comprising:
an image input module which is input with an image;
an event detection module which detects events from the input image input by the image input module, and determines levels, depending on types of the detected events;
an event controlling module which retains the events detected by the event detection module, for each of the levels; and
an output module which outputs the events retained by the event controlling module, for each of the levels.
2. The image search apparatus of claim 1 , wherein the event detection module detects at least one of scenes, as an event, and determines a level for each of the at least one of scenes detected as an event, the scenes being a scene where a moving region exists, a scene where a personal region exists, a scene where a human figure corresponding to a preset attribute exists, and a scene where a preset person exists.
3. The image search apparatus of claim 2 , wherein the event detection module sets, as an attribute, at least one of a personal age, a gender, wearing glasses or not, a glasses type, wearing a mask or not, a mask type, wearing a headgear or not, a headgear type, a beard, a mole, a wrinkle, an injury, a hair style, a hair color, a wear color, a wear shape, a headgear, an ornament, an accessory near a face, a face look, a wealth degree, and a race.
4. The image search apparatus of claim 2 , wherein the event detection module detects a plurality of sequential frames as an event when the event detection module detects an event from the sequential frames.
5. The image search apparatus of claim 4 , wherein the event detection module selects, as a best shot, at least one of a frame in which a largest face region exists, a frame in which a human face faces in a direction closest to a front direction, and a frame in which an image of a face region has greatest contrast, among frames included in the detected event.
6. The image search apparatus of claim 2 , wherein the event detection module adds, to an event, frame information indicating a position of a frame from which an event is detected, in the input image.
7. The image search apparatus of claim 6 , wherein the output module displays a playback screen which displays the input image, and an event mark indicating a position of an event in the input image, which is retained by the event controlling module, and wherein, if the event mark is selected, the output module plays the input image from a frame indicated by the frame information added to the event corresponding to the selected event mark.
8. The image search apparatus of claim 2 , wherein the output module saves, as an image or an image sequence, at least one of a face region, an upper-half body region, a whole body region, a whole moving region, and a whole region, concerning an event retained by the event controlling module.
9. The image search apparatus of claim 2 , wherein
the event detection module performs
estimating a time point when the input image was imaged,
estimating a first estimated age of a human figure in a face image for search at an imaging time point of the input image, based on a time point when the face image for search to detect a person was imaged, an age of the human figure in the face image for search at the time point when the face image for search was imaged, and the imaging time point of the input image,
estimating a second estimated age of a human figure imaged in the input image, and
detecting, as an event, a scene where the human figure for which the second estimated age has been estimated is imaged, the second estimated age having a difference not greater than a preset predetermined value from the first estimated age.
10. The image search apparatus of claim 9 , wherein the event detection module estimates a time point when the input image was imaged, based on time point information embedded as an image in the input image.
11. The image search apparatus of claim 9 , wherein
the event detection module estimates a third estimated age of at least one human figure for which a similarity to the face image for search is not smaller than a preset predetermined value, among human figures imaged in the input image, and
the event detection module estimates a time point when the input image was imaged, based on a time point when the face image for search was imaged, an age of the human figure in the face image for search at the time point when the face image for search was imaged, and the third estimated age.
12. An image search method, comprising:
detecting events from an input image, and determining levels depending on types of the detected events;
retaining the detected events for each of the levels; and
outputting the retained events for each of the levels.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010271508A JP5649425B2 (en) | 2010-12-06 | 2010-12-06 | Video search device |
JP2010-271508 | 2010-12-06 |
Publications (1)
Publication Number | Publication Date |
---|---|
US20120140982A1 true US20120140982A1 (en) | 2012-06-07 |
Family
ID=46162272
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/232,245 Abandoned US20120140982A1 (en) | 2010-12-06 | 2011-09-14 | Image search apparatus and image search method |
Country Status (4)
Country | Link |
---|---|
US (1) | US20120140982A1 (en) |
JP (1) | JP5649425B2 (en) |
KR (1) | KR20120062609A (en) |
MX (1) | MX2011012725A (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3549176B2 (en) * | 1997-07-28 | 2004-08-04 | 株式会社東芝 | Liquid crystal display device and method for manufacturing color filter substrate |
JP6039942B2 (en) * | 2012-07-09 | 2016-12-07 | キヤノン株式会社 | Information processing apparatus, control method thereof, and program |
JP2014134898A (en) * | 2013-01-08 | 2014-07-24 | Canon Inc | Image search apparatus |
JP5852171B2 (en) * | 2014-05-09 | 2016-02-03 | 株式会社Jストリーム | Content additional information provision system |
JP6214762B2 (en) * | 2014-05-22 | 2017-10-18 | 株式会社日立国際電気 | Image search system, search screen display method |
KR101713197B1 (en) | 2015-04-01 | 2017-03-09 | 주식회사 씨케이앤비 | Server computing device and system for searching image based contents cognition using the same |
KR101645517B1 (en) | 2015-04-01 | 2016-08-05 | 주식회사 씨케이앤비 | Apparatus and method for extracting keypoint and image matching system for analyzing distribution state of contents using the same |
DE102015207415A1 (en) * | 2015-04-23 | 2016-10-27 | Adidas Ag | Method and apparatus for associating images in a video of a person's activity with an event |
PL3131064T3 (en) * | 2015-08-13 | 2018-03-30 | Nokia Technologies Oy | Searching image content |
JP6483576B2 (en) * | 2015-09-01 | 2019-03-13 | 東芝情報システム株式会社 | Event judgment device and quantity prediction system |
KR102489557B1 (en) * | 2016-05-11 | 2023-01-17 | 한화테크윈 주식회사 | Image processing apparatus and controlling method thereof |
JP6738213B2 (en) * | 2016-06-14 | 2020-08-12 | グローリー株式会社 | Information processing apparatus and information processing method |
JP2018037029A (en) * | 2016-09-02 | 2018-03-08 | 株式会社C.U.I | Web site search display system, web site search display method, terminal, server device and program |
US11042753B2 (en) * | 2016-09-08 | 2021-06-22 | Goh Soo Siah | Video ingestion framework for visual search platform |
JP7120590B2 (en) * | 2017-02-27 | 2022-08-17 | 日本電気株式会社 | Information processing device, information processing method, and program |
JP7098752B2 (en) * | 2018-05-07 | 2022-07-11 | アップル インコーポレイテッド | User interface for viewing live video feeds and recorded videos |
US10904029B2 (en) | 2019-05-31 | 2021-01-26 | Apple Inc. | User interfaces for managing controllable external devices |
US11363071B2 (en) | 2019-05-31 | 2022-06-14 | Apple Inc. | User interfaces for managing a local network |
EP4068791A4 (en) * | 2019-11-26 | 2023-11-01 | Hanwha Vision Co., Ltd. | Event-oriented multi-channel image backup device and method therefor, and network surveillance camera system comprising same |
KR102554705B1 (en) * | 2020-04-01 | 2023-07-13 | 한국전자통신연구원 | Method for generating metadata basaed on scene representation using vector and apparatus using the same |
US11513667B2 (en) | 2020-05-11 | 2022-11-29 | Apple Inc. | User interface for audio message |
US11657614B2 (en) | 2020-06-03 | 2023-05-23 | Apple Inc. | Camera and visitor user interfaces |
US11589010B2 (en) | 2020-06-03 | 2023-02-21 | Apple Inc. | Camera and visitor user interfaces |
EP4189682A1 (en) | 2020-09-05 | 2023-06-07 | Apple Inc. | User interfaces for managing audio for media items |
JP7279241B1 (en) | 2022-08-03 | 2023-05-22 | セーフィー株式会社 | system and program |
JP7302088B1 (en) | 2022-12-28 | 2023-07-03 | セーフィー株式会社 | system and program |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6064303A (en) * | 1997-11-25 | 2000-05-16 | Micron Electronics, Inc. | Personal computer-based home security system |
US20020191952A1 (en) * | 2001-04-09 | 2002-12-19 | Monitoring Technology Corporation | Data recording and playback system and method |
US20040125877A1 (en) * | 2000-07-17 | 2004-07-01 | Shin-Fu Chang | Method and system for indexing and content-based adaptive streaming of digital video content |
US20040207730A1 (en) * | 2003-01-08 | 2004-10-21 | Toshie Imai | Image processing of image data |
US20050151671A1 (en) * | 2001-04-04 | 2005-07-14 | Bortolotto Persio W. | System and a method for event detection and storage |
US6940545B1 (en) * | 2000-02-28 | 2005-09-06 | Eastman Kodak Company | Face detecting camera and method |
US20060159370A1 (en) * | 2004-12-10 | 2006-07-20 | Matsushita Electric Industrial Co., Ltd. | Video retrieval system and video retrieval method |
US20060170787A1 (en) * | 2005-02-02 | 2006-08-03 | Mteye Security Ltd. | Device, system, and method of rapid image acquisition |
US20070294716A1 (en) * | 2006-06-15 | 2007-12-20 | Samsung Electronics Co., Ltd. | Method, medium, and apparatus detecting real time event in sports video |
US20080159708A1 (en) * | 2006-12-27 | 2008-07-03 | Kabushiki Kaisha Toshiba | Video Contents Display Apparatus, Video Contents Display Method, and Program Therefor |
US20080166045A1 (en) * | 2005-03-17 | 2008-07-10 | Li-Qun Xu | Method of Tracking Objects in a Video Sequence |
US20080222671A1 (en) * | 2007-03-08 | 2008-09-11 | Lee Hans C | Method and system for rating media and events in media based on physiological data |
US20090154806A1 (en) * | 2007-12-17 | 2009-06-18 | Jane Wen Chang | Temporal segment based extraction and robust matching of video fingerprints |
US20090297032A1 (en) * | 2008-06-02 | 2009-12-03 | Eastman Kodak Company | Semantic event detection for digital content records |
US20120148094A1 (en) * | 2010-12-09 | 2012-06-14 | Chung-Hsien Huang | Image based detecting system and method for traffic parameters and computer program product thereof |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001251607A (en) * | 2000-03-06 | 2001-09-14 | Matsushita Electric Ind Co Ltd | Image monitor system and image monitor method |
JP4569190B2 (en) * | 2004-06-24 | 2010-10-27 | オムロン株式会社 | Suspicious person countermeasure system and suspicious person detection device |
JP4622702B2 (en) * | 2005-05-27 | 2011-02-02 | 株式会社日立製作所 | Video surveillance device |
JP2008154228A (en) * | 2006-11-24 | 2008-07-03 | Victor Co Of Japan Ltd | Monitoring video recording controller |
JP4636190B2 (en) * | 2009-03-13 | 2011-02-23 | オムロン株式会社 | Face collation device, electronic device, face collation device control method, and face collation device control program |
- 2010
  - 2010-12-06 JP JP2010271508A patent/JP5649425B2/en active Active
- 2011
  - 2011-09-09 KR KR1020110092064A patent/KR20120062609A/en active Search and Examination
  - 2011-09-14 US US13/232,245 patent/US20120140982A1/en not_active Abandoned
  - 2011-11-29 MX MX2011012725A patent/MX2011012725A/en active IP Right Grant
Non-Patent Citations (3)
Title |
---|
Gavrila, D.M., et al., "Vision-Based Pedestrian Detection: The Protector System", IEEE Intelligent Vehicles Symposium, 2004, pp. 1-6. * |
Medioni, G., et al., "Event Detection and Analysis from Video Streams", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 8, Aug. 2001, pp. 873-889. * |
Yanagiuchi et al., English Translation of JP2007-310646, "Search Information Management Device, Search Information Management Program and Search Information Management Method", 29 November 2007, pp. 1-20. * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11722738B2 (en) | 2012-07-31 | 2023-08-08 | Google Llc | Methods, systems, and media for causing an alert to be presented |
US11356736B2 (en) | 2012-07-31 | 2022-06-07 | Google Llc | Methods, systems, and media for causing an alert to be presented |
WO2014021956A1 (en) * | 2012-07-31 | 2014-02-06 | Google Inc. | Customized video |
US11012751B2 (en) | 2012-07-31 | 2021-05-18 | Google Llc | Methods, systems, and media for causing an alert to be presented |
US10469788B2 (en) | 2012-07-31 | 2019-11-05 | Google Llc | Methods, systems, and media for causing an alert to be presented |
US9826188B2 (en) | 2012-07-31 | 2017-11-21 | Google Inc. | Methods, systems, and media for causing an alert to be presented |
US20140149865A1 (en) * | 2012-11-26 | 2014-05-29 | Sony Corporation | Information processing apparatus and method, and program |
CN110277159A (en) * | 2013-01-11 | 2019-09-24 | 卓尔医学产品公司 | The system and defibrillator of medical events are checked for code |
EP2787463A1 (en) * | 2013-04-01 | 2014-10-08 | Samsung Electronics Co., Ltd | Display apparatus for performing user certification and method thereof |
US9323982B2 (en) | 2013-04-01 | 2016-04-26 | Samsung Electronics Co., Ltd. | Display apparatus for performing user certification and method thereof |
US9418650B2 (en) * | 2013-09-25 | 2016-08-16 | Verizon Patent And Licensing Inc. | Training speech recognition using captions |
US20150088508A1 (en) * | 2013-09-25 | 2015-03-26 | Verizon Patent And Licensing Inc. | Training speech recognition using captions |
US10037467B2 (en) * | 2013-09-26 | 2018-07-31 | Nec Corporation | Information processing system |
US20160239712A1 (en) * | 2013-09-26 | 2016-08-18 | Nec Corporation | Information processing system |
US9740941B2 (en) * | 2014-10-27 | 2017-08-22 | Hanwha Techwin Co., Ltd. | Apparatus and method for visualizing loitering objects |
US20160117827A1 (en) * | 2014-10-27 | 2016-04-28 | Hanwha Techwin Co.,Ltd. | Apparatus and method for visualizing loitering objects |
US10748026B2 (en) * | 2015-10-09 | 2020-08-18 | Ihi Corporation | Line segment detection method |
US10354123B2 (en) * | 2016-06-27 | 2019-07-16 | Innovative Technology Limited | System and method for determining the age of an individual |
US11449544B2 (en) * | 2016-11-23 | 2022-09-20 | Hanwha Techwin Co., Ltd. | Video search device, data storage method and data storage device |
WO2018097389A1 (en) * | 2016-11-23 | 2018-05-31 | 한화테크윈 주식회사 | Image searching device, data storing method, and data storing device |
US11151360B2 (en) * | 2017-11-28 | 2021-10-19 | Tencent Technology (Shenzhen) Company Ltd | Facial attribute recognition method, electronic device, and storage medium |
US10747989B2 (en) | 2018-08-21 | 2020-08-18 | Software Ag | Systems and/or methods for accelerating facial feature vector matching with supervised machine learning |
CN111695419A (en) * | 2020-04-30 | 2020-09-22 | 华为技术有限公司 | Image data processing method and related device |
CN113627221A (en) * | 2020-05-09 | 2021-11-09 | 阿里巴巴集团控股有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
JP2012123460A (en) | 2012-06-28 |
KR20120062609A (en) | 2012-06-14 |
JP5649425B2 (en) | 2015-01-07 |
MX2011012725A (en) | 2012-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20120140982A1 (en) | Image search apparatus and image search method | |
KR102560308B1 (en) | System and method for exterior search | |
KR101490016B1 (en) | Person image processing apparatus and person image processing method | |
US8861801B2 (en) | Facial image search system and facial image search method | |
JP5444137B2 (en) | Face image search device and face image search method | |
TWI742300B (en) | Method and system for interfacing with a user to facilitate an image search for a person-of-interest | |
US9171012B2 (en) | Facial image search system and facial image search method | |
US9626551B2 (en) | Collation apparatus and method for the same, and image searching apparatus and method for the same | |
KR100996066B1 (en) | Face-image registration device, face-image registration method, face-image registration program, and recording medium | |
US8379931B2 (en) | Image processing apparatus for retrieving object from moving image and method thereof | |
US20060271525A1 (en) | Person searching device, person searching method and access control system | |
US10037467B2 (en) | Information processing system | |
US10303927B2 (en) | People search system and people search method | |
JP2005210573A (en) | Video image display system | |
JP6529314B2 (en) | IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND PROGRAM | |
WO2019083509A1 (en) | Person segmentations for background replacements | |
JP5787686B2 (en) | Face recognition device and face recognition method | |
JP2014016968A (en) | Person retrieval device and data collection device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: KABUSHIKI KAISHA TOSHIBA, JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SUKEGAWA, HIROSHI;YAMAGUCHI, OSAMU;REEL/FRAME:027091/0526 Effective date: 20110907 |
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |