US20130129190A1 - Model-Based Stereo Matching - Google Patents

Model-Based Stereo Matching

Info

Publication number
US20130129190A1
US20130129190A1 (application US 12/952,431)
Authority
US
United States
Prior art keywords
model
stereo
fused
input
confidence measure
Prior art date
Legal status
Granted
Application number
US12/952,431
Other versions
US8447098B1
Inventor
Scott D. Cohen
Qingxiong Yang
Current Assignee
Adobe Inc
Original Assignee
Adobe Systems Inc
Priority date
Filing date
Publication date
Application filed by Adobe Systems Inc filed Critical Adobe Systems Inc
Priority to US12/952,431
Assigned to ADOBE SYSTEMS INCORPORATED reassignment ADOBE SYSTEMS INCORPORATED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: COHEN, SCOTT D., YANG, Qing-xiong
Application granted granted Critical
Publication of US8447098B1
Publication of US20130129190A1
Assigned to ADOBE INC. reassignment ADOBE INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: ADOBE SYSTEMS INCORPORATED
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074Stereoscopic image analysis
    • H04N2013/0081Depth or disparity estimation from stereoscopic image signals

Definitions

  • This disclosure relates generally to image processing, and more specifically, stereo image processing.
  • FIG. 1 illustrates an example of a result of a conventional stereo matching technique, as applied to a human face, and indicates problem areas caused by occlusions, lack of texture, and specular highlights.
  • Various embodiments of model-based stereo matching are described. Reliable correspondences will be the basis of many stereo image processing tool features, such as a paint brush that simultaneously paints or applies some local effect to the corresponding areas of a stereo pair, and automatic view morphing. Embodiments may implement a model-based stereo matching technique that may be used to obtain a high quality depth map and/or other output for an object, such as a human face, from an input pair of stereo images.
  • Some embodiments may employ a three-dimensional (3D) face model method that may regularize and address the problems encountered in conventional stereo matching techniques.
  • One integrated modeling method is described that combines the coarse shape of a subject's face, obtained by stereo matching, with details from a 3D face model, which may be of a different person, to create a smooth, high quality depth map that captures the characteristics of the subject's face.
  • a semi-automated process may be used to align the facial features of the subject and the 3D model.
  • a fusion technique may be employed that utilizes a stereo matching confidence measure to assist in intelligently combining the ordinary stereo results and the roughly aligned 3D model.
  • a shape-from-shading method may be employed with a simple Lambertian model to refine the normals implied by the fusion output depth map and to bring out very fine facial details such as wrinkles and creases that may not be possible to capture with conventional stereo matching.
  • the quality of the normal maps may allow them to be used to re-light a subject's face from different light positions.
  • inputs to the framework may include a stereo image pair of a person's face and a pre-established face model, for example obtained from a 3D laser scanner, which is of a different subject than the subject in the stereo image pair.
  • a library of models or model database that includes a plurality of models may be provided as inputs and used in the framework instead of a single model.
  • Embodiments may apply stereo vision to the input stereo image pair to obtain a rough 3D face model, which may be limited in accuracy, and then use it to guide the registration and alignment of the laser-scanned face model.
  • Embodiments may employ a method that combines the rough 3D face model with the laser-scanned face model to produce a fused model that approximates both, such that the details from the laser-scanned face model can be transferred to the model obtained from stereo vision.
  • the formulation used by embodiments may be linear and can be solved efficiently, for example using a conjugate gradient method.
  • the method can also naturally integrate the confidence of the result obtained from stereo vision.
  • At least some embodiments may employ loopy belief propagation in a confidence estimation technique.
  • At least some embodiments may employ a method for estimating the surface normal and light direction.
  • the fused model may be refined using shading information from the stereo image pair.
  • FIG. 1 illustrates an example of a result of a conventional stereo matching technique, as applied to a human face, and indicates problem areas caused by occlusions, lack of texture, and specular highlights.
  • FIG. 2 illustrates an example of a stereo pair of images (a left and right image) captured using a stereo camera.
  • FIG. 3 illustrates an example laser-scanned 3D model of a human face.
  • FIG. 4 illustrates an example 3D model database.
  • FIG. 5 is a high-level block diagram that shows example inputs to the model-based stereo matching method, and an example depth map output, according to at least some embodiments.
  • FIG. 6 illustrates an example module that may implement an integrated modeling method, according to some embodiments.
  • FIG. 7 is a block diagram illustrating the operation of a model-based stereo matching module.
  • FIG. 8 illustrates iteratively performing sensor fusion and light direction and surface normal estimation to provide integrated estimation of depth, normal, light direction, and albedo, according to some embodiments.
  • FIG. 9 is a flowchart of an integrated modeling method, according to at least some embodiments.
  • FIG. 10 illustrates an example computer system that may be used in embodiments.
  • FIG. 11 illustrates modeling results for an example face, according to some embodiments.
  • such quantities may take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared or otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to such signals as bits, data, values, elements, symbols, characters, terms, numbers, numerals or the like. It should be understood, however, that all of these or similar terms are to be associated with appropriate physical quantities and are merely convenient labels. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining” or the like refer to actions or processes of a specific apparatus, such as a special purpose computer or a similar special purpose electronic computing device.
  • a special purpose computer or a similar special purpose electronic computing device is capable of manipulating or transforming signals, typically represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the special purpose computer or similar special purpose electronic computing device.
  • Embodiments may implement a model-based stereo matching technique that may be used to obtain a high quality depth map and other outputs for a human face, or for other types of objects, from an input stereo pair of images.
  • An integrated modeling method is described that combines the coarse shape of a subject's face obtained by stereo matching with the details from a 3D face model (of a different person) to create a smooth, high quality depth map that captures the characteristics of the subject's face.
  • the stereo pair of images may be captured using a stereo camera that may, in some embodiments, collectively serve as one input to the disclosed stereo matching process.
  • an n-way stereo that takes in n images could be provided as an input to the disclosed stereo matching process.
  • the input images may be lit from any direction, including from the camera direction. This may allow a flash to be used in capturing the images.
  • FIG. 3 shows an example laser-scanned 3D model of a human face that may, in some embodiments, serve as one input to the disclosed stereo matching process.
  • a library of models or a model database that includes a plurality of models may be used instead of a single 3D model.
  • FIG. 4 shows an example of such a model database.
  • the input 3D model may be a non-laser-scanned model.
  • the output of the disclosed process may be fed back and used as the input model in one iterative embodiment.
  • FIG. 5 is a high-level block diagram that shows example inputs, in the form of a pair of stereo images and a laser-scanned 3D model, to the model-based stereo matching method, and an example depth map output, according to at least some embodiments.
  • a semi-automated process may be used to align the facial features of the subject and the 3D model.
  • the alignment process may be fully automated.
  • a fusion algorithm may then employ a stereo matching confidence measure to assist in intelligently combining the ordinary stereo results with the roughly-aligned 3D model.
  • a shape-from-shading technique may be employed with a simple Lambertian model to refine the normals implied by the fusion output depth map and to bring out very fine facial details such as wrinkles and creases that were not possible to capture with conventional stereo matching. The quality of the normal maps may enable them to re-light a subject's face from different light positions.
  • Embodiments of an integrated modeling method may be implemented in a model-based stereo matching module implemented by program instructions stored in a computer-readable storage medium and executable by one or more processors (e.g., one or more CPUs and/or GPUs).
  • the model-based stereo matching module may implement an interactive modeling method in which at least a portion of the modeling process may be guided by user input, for example, to guide a model registration process.
  • Embodiments of the model-based stereo matching module may, for example, be implemented as a stand-alone application, as a module of an application, as a plug-in for applications including image processing applications, and/or as a library function or functions that may be called by other applications such as image processing applications.
  • Embodiments of the model-based stereo matching module may be implemented in any image processing application.
  • An example model-based stereo matching module that may implement the integrated modeling method, as described herein, is illustrated in FIGS. 6 and 7 .
  • An example system in which a model-based stereo matching module may be implemented is illustrated in FIG. 10 .
  • FIG. 6 illustrates an example module that may implement embodiments of the integrated modeling method(s), as described herein.
  • Model-based stereo matching module 100 may, for example, implement a model from stereo vision method as submodule 120 , a semi-automatic model registration method as submodule 130 , a sensor fusion method as submodule 140 , and a light direction and surface normal estimation method as submodule 150 .
  • Module 100 may receive, as input 110 , a laser-scanned 3D model (or, alternatively, a model database) and a pair of images captured by a stereo camera.
  • the input model may be a non-laser-scanned 3D model.
  • the output of module 100 may be fed back as the input model to module 100 in one iterative embodiment.
  • Module 100 may perform the integrated modeling method, for example as described below in relation to FIGS. 7 and 9 . Some embodiments may iteratively perform sensor fusion 140 and light direction and surface normal estimation 150 , as shown in FIG. 8 , to provide integrated estimation of depth, surface normal, light direction, and albedo.
  • Module 100 may receive user input 104 .
  • a user may specify points as user input 104 for use in the registration/alignment process, described below, by submodule 130 .
  • module 100 may provide a user interface 102 via which a user may interact with the module 100 , for example, via user input 104 to specify points for registration, or to perform other interactive tasks.
  • Output 170 may include, but is not limited to, a depth map, surface albedo, and a surface normal map. Output 170 may, for example, be stored to a storage medium 180 , such as system memory, a disk drive, DVD, CD, etc. Output 170 may also be passed to one or more other modules 190 for further processing.
  • FIG. 7 is a block diagram illustrating the operation of a model-based stereo matching module 100 that implements an integrated modeling method according to at least some embodiments.
  • the integrated modeling method may include several components that may be implemented in the model-based stereo matching module 100 as submodules:
  • a model from stereo vision method implemented as submodule 120 ; a semi-automatic model registration method implemented as submodule 130 ; a sensor fusion method implemented as submodule 140 ; and a light direction and surface normal estimation method implemented as submodule 150 .
  • each of these components may be implemented as separate modules implemented by program instructions stored in a computer-readable storage medium and executable by one or more processors (e.g., one or more CPUs and/or GPUs), as shown in FIG. 10 .
  • the separate modules may be provided as modules of an application, as plug-ins for modules or applications including image processing modules or applications, and/or as library functions that may be called by other modules or applications such as image processing modules or applications.
  • inputs 110 to model-based stereo matching module 100 may include a laser-scanned 3D model (M L ) (see, for example, FIG. 3 ) and a stereo image pair (I L and I R ) (see, for example, FIG. 2 ).
  • the stereo image pair may be the resulting images from a stereo camera snapshot.
  • an n-way stereo that takes in n images could be provided to input 110 .
  • a model database may replace the single laser-scanned 3D model as an input. See FIG. 4 for an example model database.
  • the input model is a non-laser-scanned model.
  • the output of model-based stereo matching module 100 may be a final face model including, but not limited to, a depth map (D F ), normal map (N) and surface albedo (A). See FIG. 5 for an example output depth map.
  • a stereo pair (a left and right image, designated I L and I R , respectively) may be provided to or obtained by submodule 120 .
  • Submodule 120 may perform stereo matching to generate its outputs, which may include an estimated stereo depth map (D S ), confidence map (C S ) and a 3D stereo model (M S ), which may be established from the estimated stereo depth map.
  • submodule 120 may utilize a loopy belief propagation (BP) based binocular stereo matching method.
  • the method may be used for face reconstruction, i.e., to generate M S and other outputs.
  • a global optimization method, rather than local optimization, may be employed. Global optimization may be more robust on low-textured surfaces such as faces.
  • an efficient BP algorithm, such as a constant space belief propagation (CSBP) algorithm, may be implemented to compute a disparity map. Use of a CSBP algorithm may result in speed and memory cost improvements.
  • A disparity, as used herein, means how many pixels apart the matching pixels in the two stereo images are calculated to be. For example, if a pixel at coordinates (3, 11) in stereo image I L is calculated to correspond to pixel (7, 11) in stereo image I R , the disparity will be 4. Other methods or technologies to compute a disparity map may also be used.
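  • To make the disparity definition concrete, the following is a minimal winner-take-all block matcher. It is an illustrative stand-in for the global CSBP optimization described above, not the patent's algorithm; the SAD cost, the window size and search range, the sign convention, and the assumption of rectified grayscale inputs are all choices made for this sketch.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sad_disparity(left, right, max_disp=64, radius=3):
    """Winner-take-all SAD block matching over rectified grayscale images.
    For each pixel (x, y) in the left image, the chosen disparity d means the
    matching pixel in the right image is assumed to lie at (x - d, y);
    depending on camera order the offset may instead be +d."""
    h, w = left.shape
    best_cost = np.full((h, w), np.inf)
    disparity = np.zeros((h, w), dtype=np.int32)
    for d in range(max_disp):
        # Shift the right image d pixels to the right so that column x of the
        # left image lines up with column x - d of the right image.
        shifted = np.empty_like(right)
        shifted[:, d:] = right[:, :w - d]
        if d:
            shifted[:, :d] = right[:, :1]          # replicate border column
        # Aggregate absolute differences over a (2*radius+1)^2 window.
        cost = uniform_filter(np.abs(left - shifted), size=2 * radius + 1)
        better = cost < best_cost
        best_cost[better] = cost[better]
        disparity[better] = d
    return disparity
```

  • A global method such as CSBP replaces the per-pixel winner-take-all step above with message passing over the whole image, which is what makes it more robust on low-texture regions such as skin.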
  • a stereo confidence measure may be computed in submodule 120 .
  • for each pixel in one stereo image, a BP technique used for stereo matching may find the corresponding pixel in the other stereo image by searching along the same row of pixels (the same height).
  • a fast-converging BP algorithm may be used. The algorithm may begin with each pixel in one image matching its colors to pixels in the other image to guess what its disparity may be. The algorithm may further integrate each pixel's calculation of its own disparity along with what neighboring pixels calculate (believe) as well. Further, the algorithm may be iterative such that, at each iteration, each pixel updates its individual belief and neighboring pixels update and propagate their beliefs.
  • the algorithm may identify each pixel as converging or not converging to a disparity. By detecting non-converged pixels and updating the messages of those pixels, the algorithm may decrease the running time in situations with a large number of iterations. After several iterations, the number of non-converged statuses (call it T) may be accumulated for each pixel. Pixels with a greater number of non-converged statuses exhibit lower confidence in the calculated disparity, while pixels with a lesser number are more confident about the calculated disparity. For each pixel, the accumulated value T thus yields a number describing a confidence measure.
  • C S includes a value for each pixel from 0 to 1, with 0 representing less confidence and 1 representing more confidence.
  • the stereo confidence, C S may be used in the fusion process described below or in any other algorithm or process that may benefit from knowing the confidence of stereo matching. Other processes that use stereo matching may benefit from the confidence measures.
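  • A small sketch of how per-pixel counts of non-converged iterations could be mapped to the [0, 1] confidence values described above; the linear mapping and the num_checks normalizer are assumptions, since the exact mapping is not spelled out here.

```python
import numpy as np

def confidence_from_nonconvergence(T, num_checks):
    """Map per-pixel counts T of non-converged BP iterations to a stereo
    confidence C_S in [0, 1]: pixels that failed to converge more often get
    lower confidence, pixels that always converged get confidence 1."""
    T = np.asarray(T, dtype=np.float64)
    return 1.0 - np.clip(T / float(num_checks), 0.0, 1.0)
```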
  • M S may need to be aligned with the laser-scanned model.
  • Submodule 130 may register the stereo model M S generated by submodule 120 with the input laser-scanned model M L .
  • a user may be able to provide an input 132 to submodule 130 .
  • submodule 130 may be fully automated, and not allow any user input.
  • M L may include some predefined points. The predefined points may be predefined by a user or automatically predefined by an algorithm.
  • a user may be able to select one or more points on M S which correspond to the predefined points of the laser-scanned model M L .
  • For example, as illustrated in FIG. 7 , a user may select four points (crosses on the bottom image of user input 132 ) of M S that correspond to four predefined points (white circles on the top image of user input 132 ) of M L .
  • the four correspondences may then be used to compute a coarse transformation between M L and M S , shown in global registration 134 .
  • the transform in global registration 134 may include a constant scalar, a rotation matrix and a translation vector.
  • the transform may be computed using a method that performs a least-squares estimation of transformation parameters between two point patterns.
  • the resulting coarse transformation may then be iteratively revised, which may, in some embodiments, utilize all points in the models and not just the predefined and selected points.
  • the revising is performed using an algorithm such as iterative closest point (ICP).
  • the coarse estimation of the transform may be used as an initial estimation in the ICP technique, which may revise the transformation (rotation and translation) and minimize the distance between the two models.
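  • The closed-form least-squares estimate of scale, rotation, and translation from the selected correspondences can be sketched as follows (an Umeyama-style estimator consistent with the description above; the N x 3 point arrays and the function name are assumptions).

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares estimate of scale s, rotation R, and translation t such
    that dst ~= s * R @ src + t, from corresponding 3D points (N x 3 arrays),
    e.g. the points selected on the stereo model and the predefined points on
    the laser-scanned model."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)                 # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:     # avoid reflections
        D[2, 2] = -1.0
    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_d - s * (R @ mu_s)
    return s, R, t
```

  • The returned (s, R, t) can then serve as the initial estimate that the ICP refinement revises.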
  • local manual adjustment 136 may also be used to improve the registration accuracy. Small variances around some features, for example around the mouth area, may be hard to capture in the stereo model. Thus, it may be difficult to register such a feature on the stereo model correctly with the laser-scanned model. As shown in FIG. 7 , in the dotted box of the rightmost image of local adjustment 136 , the mouth region of the registered model before local adjustment may not be well aligned.
  • the contour of the feature (e.g., mouth) on the laser-scanned model and several key points (p L ) on the contour may be manually selected in advance. For each input stereo model, the contour of the feature (e.g., mouth) on the stereo model may be selected by several line segments.
  • the contour does not need to be very precise as the transform around the feature may be very smooth.
  • the correspondences of the key points on the stereo model may also need to be identified; let them be designated as p S .
  • the motion vectors of the key points may then be computed as the difference p L − p S , and the motion vectors of the other points on the contour of the feature may then be interpolated from the motion vectors of these key points.
  • a Poisson interpolation technique may be used to estimate the motion vectors for every point inside the feature area with the boundary conditions that the estimated motion vectors will be the same as those on the bounding box and the contour of the feature.
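  • A minimal sketch of the interpolation step: unknown motion vectors inside the feature region are filled by Jacobi iterations of Laplace's equation, while pixels on the contour, at the key points, and on the bounding box keep their fixed values. The dense-grid representation, the mask convention, and the iteration count are assumptions.

```python
import numpy as np

def interpolate_motion(motion, known_mask, iters=2000):
    """Harmonic (Poisson-style) interpolation of motion vectors.

    motion:     H x W x 2 array; pixels marked in known_mask already hold
                their motion vectors, all other entries may be zero.
    known_mask: H x W boolean array of pixels whose motion is fixed
                (feature contour, key points, bounding-box border).
    Unknown pixels iteratively take the average of their four neighbours,
    converging to the solution of Laplace's equation with those boundary
    conditions. Assumes the bounding-box border is in known_mask, so the
    wrap-around of np.roll has no effect inside the region."""
    m = motion.astype(np.float64).copy()
    unknown = ~known_mask
    for _ in range(iters):
        avg = 0.25 * (np.roll(m, 1, axis=0) + np.roll(m, -1, axis=0) +
                      np.roll(m, 1, axis=1) + np.roll(m, -1, axis=1))
        m[unknown] = avg[unknown]
    return m
```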
  • Submodule 130 may output the registered laser-scanned model and a corresponding depth map computed from this model, referred to as D L .
  • Submodule 140 may fuse the stereo model depth map D S with the registered, aligned laser-scanned model depth map D L and generate a new model that approximates both input models.
  • the new fused model may include the shape of the stereo model and the smoothness and detail of the aligned/registered model.
  • the upper left image of submodule 140 shows the fused depth map D F , which may be smoother than the depth map from stereo vision D S and more detailed than the aligned model depth map D L .
  • Many differences exist between the details of the fused model and D L . For instance, the eyes of D L are lower than the eyes of the fused model, and the curvature of the region between the chin and the mouth of M L is larger than the fused model. More views of the screenshots of the fused model are presented on the bottom row of images of submodule 140 .
  • Stereo confidence C S may also be used in the fusion step to intelligently combine D S and D L .
  • One objective of the sensor fusion method of submodule 140 may be to transfer the details (high-frequency component) from D L to D S , while keeping the large-scale variations (low-frequency component) of D S .
  • depth function D F may conform to the estimates for the gradient computed from D L and the depth obtained from D S at each point. To accomplish this, in at least some embodiments, a depth function may minimize the sum of two error terms: the gradient error E G and the depth error E D .
  • the gradient error may be defined as the sum of squared distances between the partial derivatives of the optimized depth value D F and the corresponding partial derivatives obtained from D L .
  • the depth error E D may be defined as the sum of squared distances between the optimized depth value D F and that from stereo vision D S (both error terms are written out after the variable definitions below).
  • D i F is the depth value of the ith optimized point
  • D i L and D i S are the depth values of the ith point obtained from the laser scanner and stereo vision, respectively.
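  • Written out with these definitions, the two error terms take the following form (a reconstruction consistent with the description above; the patent's numbered equations are not reproduced verbatim):

```latex
E_G = \sum_i \left[ \left( \partial_x D_i^F - \partial_x D_i^L \right)^2
                  + \left( \partial_y D_i^F - \partial_y D_i^L \right)^2 \right],
\qquad
E_D = \sum_i \left( D_i^F - D_i^S \right)^2
```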
  • a fused depth map D F may then be computed by minimizing the confidence-weighted sum of these two error terms.
  • C S ∈ [0, 1] may control how much influence the depth error has in the optimization.
  • if C S is 0, the method considers the result obtained from the laser-scanned 3D input exclusively, except in boundary conditions.
  • if C S is 1, the method returns the depth values from stereo matching exclusively.
  • for intermediate values, the method performs a weighted combination of the two inputs.
  • C S may be higher in high texture areas, such as eyebrows while C S may be lower in occluded areas, in areas with oblique angles, and in low-texture areas.
  • Each point/pixel may generate three equations. These equations may include one for the depth error and one for the gradient error in each of the x and y directions. Before squaring, the equations for the error terms are linear in the depth values being solved for. Therefore, the entire minimization can be formulated as a large over-constrained linear system A x = b to be solved, for example, by a least squares technique.
  • in this system, U is an identity matrix (the coefficient block of the depth-error equations).
  • multiplying the derivative block of A by the vector of unknown depths yields the gradient of D F in the x direction (and, analogously, the y direction).
  • A^T b = (C^S)^2 D^S + (1 − C^S)^2 ∇^T∇ D^L (10)
  • equation (8) can be solved using a conjugate gradient method.
  • while the matrix A T A may be large, it is also very sparse: the number of non-zero entries is linear in the number of pixels because there are at most five non-zero entries per row (one coefficient for the depth of the reference pixel and the others for its neighbors used to form the second partial derivatives).
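  • A compact sketch of the fusion solve described above: the confidence-weighted depth and gradient equations are stacked into a sparse over-constrained system and the normal equations are solved with a conjugate gradient solver. The forward-difference stencil, border handling, and solver settings are assumptions rather than the patent's exact formulation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def fuse_depth(D_S, D_L, C_S):
    """Fuse the stereo depth map D_S with the aligned model depth map D_L.
    Per pixel, a depth equation weighted by C_S pulls the result toward D_S,
    and x/y gradient equations weighted by (1 - C_S) pull the gradients of
    the result toward those of D_L; solve A^T A x = A^T b with CG."""
    h, w = D_S.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)

    def diff_op(axis):
        """Sparse forward-difference operator along the given axis
        (rows for pixels on the last row/column are left as zeros)."""
        if axis == 0:                      # d/dy
            src, dst = idx[:-1, :], idx[1:, :]
        else:                              # d/dx
            src, dst = idx[:, :-1], idx[:, 1:]
        rows = np.concatenate([src.ravel(), src.ravel()])
        cols = np.concatenate([src.ravel(), dst.ravel()])
        vals = np.concatenate([-np.ones(src.size), np.ones(src.size)])
        return sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

    Gx, Gy = diff_op(1), diff_op(0)
    U = sp.identity(n, format="csr")
    Wd = sp.diags(C_S.ravel())             # weights of the depth equations
    Wg = sp.diags(1.0 - C_S.ravel())       # weights of the gradient equations

    # Stack the three weighted equation blocks: depth, x-gradient, y-gradient.
    A = sp.vstack([Wd @ U, Wg @ Gx, Wg @ Gy]).tocsr()
    b = np.concatenate([Wd @ D_S.ravel(),
                        Wg @ (Gx @ D_L.ravel()),
                        Wg @ (Gy @ D_L.ravel())])

    x, _ = cg(A.T @ A, A.T @ b, x0=D_S.ravel(), maxiter=500)
    return x.reshape(h, w)
```

  • Consistent with the description above, pixels where C S is 1 reproduce the stereo depths, while pixels where C S is 0 follow the gradients of the aligned model.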
  • the fused depth map D F may then be provided to submodule 150 for surface normal and light direction estimation.
  • submodule 150 may first roughly estimate the normals and robustly compute the light direction, and then refine the normals using the light direction to bring out details of the object.
  • Normals may be visualized as vectors [x, y, z] with x mapped to red, y to green, and z to blue.
  • a body part pointing straight back at the camera, such as the chin, may therefore appear blue.
  • the underside of the nose points down along the y axis and thus may appear green.
  • submodule 150 assumes that the albedo of the skin is a constant, and detects skin pixels based on surface chromaticities.
  • a coarse normal map N(D F ) may be computed from the fused depth map D F .
  • such a normal map may include various incorrect details of the laser-scanned 3D model.
  • at least some embodiments may smooth the fused depth map D F , and then create a corresponding normal map N F from the smoothed depth map.
  • the light direction and skin albedo may then be estimated using the intensity values of the detected skin pixels, and the corresponding normal vectors may be obtained from N F .
  • the estimated light direction, skin albedo and intensity values are used to refine the coarse normal estimate N F to obtain a refined normal map N.
  • the light direction, normal map N, and the color values of the input image may then be used to compute the albedo at each pixel location, and can be used for scene re-lighting.
  • a re-lighting example is shown in the rightmost image of submodule 150 in FIG. 7 .
  • the method may first locate all the skin pixels based on surface chromaticities, and then compute a coarse normal map N F from the input depth map. Assuming that the albedo is constant over all skin pixels, the method may then compute the light direction L using the coarse normal map N F and the intensity of the skin pixels, for example using a simple Lambertian model. The coarse normal and the image intensity at each pixel location may then be used together with the estimated light direction to solve for the final normal at the current pixel location using the same Lambertian model.
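  • The Lambertian estimation and refinement steps can be sketched as follows: a single light direction and a constant skin albedo are fit by least squares from the coarse normals and intensities at skin pixels, each normal is then adjusted so that its Lambertian shading matches the observed intensity, and the recovered per-pixel albedo supports re-lighting. The grayscale-intensity input, the clamping constants, the function names, and the specific per-pixel update (rotating the coarse normal within the plane it spans with the light direction) are assumptions of this sketch.

```python
import numpy as np

def estimate_light_and_albedo(intensity, normals, skin_mask):
    """Fit the Lambertian model I = albedo * (N . L) at skin pixels.
    Solving I = N . v in the least-squares sense gives v = albedo * L,
    so the norm of v is the constant skin albedo and its direction is L."""
    N = normals[skin_mask].reshape(-1, 3)
    I = intensity[skin_mask].ravel()
    v, *_ = np.linalg.lstsq(N, I, rcond=None)
    albedo = np.linalg.norm(v)
    L = v / max(albedo, 1e-8)
    return L, albedo

def refine_normals(intensity, coarse_normals, L, albedo):
    """For each pixel, pick the unit normal closest to the coarse normal whose
    shading albedo * (N . L) reproduces the observed intensity: rotate the
    coarse normal within the plane spanned by it and L until N . L matches."""
    h, w, _ = coarse_normals.shape
    N = coarse_normals.reshape(-1, 3)
    c = np.clip(intensity.ravel() / max(albedo, 1e-8), 0.0, 1.0)  # target N.L
    perp = N - (N @ L)[:, None] * L            # component of N orthogonal to L
    perp /= np.maximum(np.linalg.norm(perp, axis=1, keepdims=True), 1e-8)
    refined = c[:, None] * L + np.sqrt(np.maximum(1.0 - c ** 2, 0.0))[:, None] * perp
    return refined.reshape(h, w, 3)

def relight(intensity, normals, L_old, L_new):
    """Recover per-pixel albedo under the estimated light and re-shade the
    face under a new light direction (the re-lighting result above)."""
    albedo_map = intensity / np.maximum(normals @ L_old, 1e-3)
    return albedo_map * np.maximum(normals @ L_new, 0.0)
```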
  • the input depth map D F may be refined using the shading information of the stereo image.
  • the refined depth map may be more consistent with the other outputs that have been computed, e.g., the normals.
  • One algorithm to refine a depth map is detailed below. Another algorithm is provided in the provisional application to which this application claims priority.
  • let the refined depth function be Z and the intrinsic matrix of the stereo camera be K .
  • let Z 0 be the depth at pixel location [x, y], Z x the depth at [x+1, y], and Z y the depth at [x, y+1].
  • the normal at [x, y] may then be computed from these neighboring depths (equations (13)-(16)).
  • Equation (23) may then be simplified and solved for the refined depth Z.
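  • The normal-from-depth construction referenced above (equations (13)-(16)) can be sketched as back-projecting a pixel and its right and bottom neighbors through K and crossing the two tangent vectors; the sign convention, border handling, and function name are assumptions, and equation (23) itself is not reconstructed here.

```python
import numpy as np

def normal_from_depth(Z, K):
    """Unit surface normal at each pixel of depth map Z (H x W), given the
    3x3 intrinsic matrix K: back-project each pixel and its [x+1, y] and
    [x, y+1] neighbours to 3D, then take the cross product of the tangents."""
    h, w = Z.shape
    Kinv = np.linalg.inv(K)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    P = (pix @ Kinv.T) * Z[..., None]          # 3D point per pixel

    tx = np.zeros_like(P)
    ty = np.zeros_like(P)
    tx[:, :-1] = P[:, 1:] - P[:, :-1]          # tangent toward [x+1, y]
    ty[:-1, :] = P[1:, :] - P[:-1, :]          # tangent toward [x, y+1]

    # Depending on the coordinate convention, the sign may need flipping so
    # that the normals face the camera.
    n = np.cross(tx, ty)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, 1e-8)
```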
  • some embodiments may iteratively perform sensor fusion 140 and light direction and surface normal estimation 150 to provide integrated estimation of depth, surface normal, light direction, and albedo.
  • the outputs from light direction and surface normal estimation 150 may be fed back to sensor fusion 140 to iteratively improve overall results.
  • outputs from light direction and surface normal estimation 150 may be fed back to another component of model-based stereo matching module 100 .
  • the output model may replace the input laser-scanned model or be added to the library of models.
  • the depth map may be improved by using the normals from submodule 150 as an additional input to the fusion module 140 and by modifying the basic fusion algorithm to include the additional input.
  • the basic fusion algorithm is given by three equations per pixel: the confidence-weighted depth equation and the two confidence-weighted gradient equations described above.
  • the second and third equations could be replaced with a term that encourages the normals implied by the fused result to agree with the input normals.
  • the normals implied by the fused depth map, N(D F ) may be specified with equations (13)-(16) above (with D F instead of Z).
  • the fusion algorithm may then include an additional normal-agreement term, referred to below as (*).
  • N is the normal map fed back from light direction and surface normal estimation 150 to sensor fusion 140 , and the equation is computed at each pixel (x, y). The term (*) could replace the second and third equations that use the laser-scanned model, or be added to the algorithm.
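  • One plausible concrete form of the term (*), written under the simplifying assumption that the normal of a depth surface is proportional to (−∂D/∂x, −∂D/∂y, 1); the patent's actual term, which ties into equations (13)-(16) through the intrinsic matrix K, may differ:

```latex
(1 - C^S)\left(\frac{\partial D^F}{\partial x} + \frac{N_x}{N_z}\right) = 0,
\qquad
(1 - C^S)\left(\frac{\partial D^F}{\partial y} + \frac{N_y}{N_z}\right) = 0
\tag{$*$}
```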
  • FIG. 9 is a flowchart of an integrated modeling method, according to at least some embodiments.
  • a plurality of stereo images of an object (e.g., a human face) may be received.
  • at least one three-dimensional input model of the same type of object may be received.
  • a single, laser-scanned model may be obtained.
  • a model database including a plurality of models may be obtained.
  • the input 3D model may be the output of a previous iteration of the integrated modeling method.
  • the input 3D model is a non-laser-scanned model.
  • the object may be any type of object including, but not limited to, human faces, animals, plants, or landscapes.
  • a three-dimensional model of the object may be generated from the plurality of stereo images of the object.
  • generating a three-dimensional model of the object may be performed by applying belief propagation (BP) based binocular stereo matching technology.
  • generating a three-dimensional model of the object may include applying constant space belief propagation (CSBP) technology to compute a disparity map.
  • generating a 3D model of the object may include computing a stereo confidence C S and/or a stereo depth map D S .
  • the stereo model M S may be aligned, or registered, with the at least one input model M L resulting in an aligned model.
  • texture data of the input model may not be used in the alignment process.
  • Aligning the two models may include receiving a user input, such as selecting points on M S that correspond to predetermined points on M L .
  • a coarse transformation, or global registration, may be computed based on the correspondences.
  • Global registration may also include iteratively revising the transformation. In one embodiment, the iterative revision may be performed using an iterative closest point algorithm. The results of global registration may be locally adjusted to refine the output aligned/registered model.
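  • A minimal point-to-point ICP refinement loop consistent with this description; it reuses the similarity_transform sketch shown earlier and a k-d tree for nearest-neighbor matching. The iteration count and the use of all source points in every iteration are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_refine(src, dst, s, R, t, iters=20):
    """Refine an initial similarity transform (s, R, t) that aligns source
    points src (N x 3) to target points dst (M x 3). Each iteration matches
    every transformed source point to its nearest target point and
    re-estimates the transform from those matches."""
    tree = cKDTree(dst)
    for _ in range(iters):
        moved = s * (src @ R.T) + t
        _, nn = tree.query(moved)              # nearest target for each point
        s, R, t = similarity_transform(src, dst[nn])
    return s, R, t
```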
  • a fused model may be generated by combining the depth map of the object D S with the aligned-model depth map D L .
  • the fused model may approximate both input models including the shape of the stereo model and the detail and smoothness of the aligned model.
  • the fusion process may compute a fused depth map that may minimize the sum of a gradient error and a depth error, as discussed above.
  • the stereo confidence C S may be used to intelligently combine D S and D L .
  • C S may be a value from 0 to 1, inclusive, for each pixel. If the confidence of a pixel in the stereo model is 0, then the corresponding pixel in the fused model may be generated entirely from the corresponding pixel in the aligned model. If the confidence of a pixel in the stereo model is 1, then the corresponding pixel in the fused model may be generated entirely from the stereo model.
  • a surface normal map and a light direction may be estimated from the fused model.
  • a rough surface normal may be estimated followed by computing the light direction.
  • the normal may be refined using the computed light direction, which may result in bringing out details of the object.
  • a skin albedo may also be calculated.
  • the surface normal map may be refined according to the light direction, albedo, and intensity values to generate a refined surface normal map.
  • some or all of elements 200 - 208 may be iteratively performed.
  • One embodiment is illustrated with the feedback line from step 208 to the input of step 206 .
  • the generated surface normal map and estimated light direction and albedo may be fed back to the fusion step to iteratively improve results of the fused depth map D F .
  • elements 200 - 208 may be performed using only one input 3D model. In other embodiments, elements 200 - 208 may be performed using more than one input 3D model.
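  • The overall flow of elements 200 - 208 can be summarized with a short orchestration sketch that chains the illustrative helpers from the earlier snippets. All parameter names, the disparity-to-depth conversion, and the fixed number of outer iterations are assumptions, and the registered model's depth map is assumed to be supplied already resampled onto the stereo grid.

```python
import numpy as np

def model_based_stereo_matching(I_L, I_R, K, baseline, D_L, src_pts, dst_pts,
                                skin_mask, nonconv_T, num_checks,
                                outer_iters=2):
    """End-to-end sketch of the flow in FIG. 9, using the helper functions
    sketched earlier (sad_disparity, confidence_from_nonconvergence,
    similarity_transform, icp_refine, fuse_depth, normal_from_depth,
    estimate_light_and_albedo, refine_normals). I_L and I_R are rectified
    grayscale images; D_L is the aligned model depth map on the same grid."""
    # 200/202: stereo matching -> disparity, stereo depth D_S, confidence C_S.
    disp = sad_disparity(I_L, I_R).astype(np.float64)
    D_S = K[0, 0] * baseline / np.maximum(disp, 1.0)   # simple pinhole depth
    C_S = confidence_from_nonconvergence(nonconv_T, num_checks)

    # 204: register the input model (coarse similarity transform, then ICP).
    # Rendering the registered model into the depth map D_L is omitted here.
    s, R, t = similarity_transform(src_pts, dst_pts)
    s, R, t = icp_refine(src_pts, dst_pts, s, R, t)

    # 206/208: alternate sensor fusion and normal/light estimation.
    D_F = fuse_depth(D_S, D_L, C_S)
    for _ in range(outer_iters):
        N = normal_from_depth(D_F, K)
        L, albedo = estimate_light_and_albedo(I_L, N, skin_mask)
        N = refine_normals(I_L, N, L, albedo)
        D_F = fuse_depth(D_S, D_L, C_S)   # the normals N could be fed back here
    return D_F, N, L, albedo
```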
  • Some embodiments may provide interactive tools for editing disparity maps given stereo pairs.
  • user interface elements may be provided that allow a user to pick a model from a set of models displayed on the user interface and, for example, drop the model on an object in one of the views for disparity refinement.
  • a user interface may provide one or more user interface elements or tools (e.g., brushes) via which the user may adjust previously computed disparity maps.
  • Embodiments of a model-based stereo matching module and/or of the various submodules as described herein may be executed on one or more computer systems, which may interact with various other devices.
  • One such computer system is illustrated by FIG. 10 .
  • computer system 1000 may be any of various types of devices, including, but not limited to, a personal computer system, desktop computer, laptop, notebook, or netbook computer, mainframe computer system, handheld computer, workstation, network computer, a camera, a set top box, a mobile device, a consumer device, video game console, handheld video game device, application server, storage device, a peripheral device such as a switch, modem, router, or in general any type of computing or electronic device.
  • computer system 1000 includes one or more processors 1010 coupled to a system memory 1020 via an input/output (I/O) interface 1030 .
  • Computer system 1000 further includes a network interface 1040 coupled to I/O interface 1030 , and one or more input/output devices 1050 , such as cursor control device 1060 , keyboard 1070 , and display(s) 1080 .
  • embodiments may be implemented using a single instance of computer system 1000 , while in other embodiments multiple such systems, or multiple nodes making up computer system 1000 , may be configured to host different portions or instances of embodiments.
  • some elements may be implemented via one or more nodes of computer system 1000 that are distinct from those nodes implementing other elements.
  • computer system 1000 may be a uniprocessor system including one processor 1010 , or a multiprocessor system including several processors 1010 (e.g., two, four, eight, or another suitable number).
  • processors 1010 may be any suitable processor capable of executing instructions.
  • processors 1010 may be general-purpose or embedded processors implementing any of a variety of instruction set architectures (ISAs), such as the x86, PowerPC, SPARC, or MIPS ISAs, or any other suitable ISA.
  • each of processors 1010 may commonly, but not necessarily, implement the same ISA.
  • At least one processor 1010 may be a graphics processing unit.
  • a graphics processing unit or GPU may be considered a dedicated graphics-rendering device for a personal computer, workstation, game console or other computing or electronic device.
  • Modern GPUs may be very efficient at manipulating and displaying computer graphics, and their highly parallel structure may make them more effective than typical CPUs for a range of complex graphical algorithms.
  • a graphics processor may implement a number of graphics primitive operations in a way that makes executing them much faster than drawing directly to the screen with a host central processing unit (CPU).
  • the image processing methods disclosed herein may, at least in part, be implemented by program instructions configured for execution on one of, or parallel execution on two or more of, such GPUs.
  • the GPU(s) may implement one or more application programmer interfaces (APIs) that permit programmers to invoke the functionality of the GPU(s). Suitable GPUs may be commercially available from vendors such as NVIDIA Corporation, ATI Technologies (AMD), and others.
  • System memory 1020 may be configured to store program instructions and/or data accessible by processor 1010 .
  • system memory 1020 may be implemented using any suitable memory technology, such as static random access memory (SRAM), synchronous dynamic RAM (SDRAM), nonvolatile/Flash-type memory, or any other type of memory.
  • program instructions and data implementing desired functions, such as those described above for embodiments of a model-based stereo matching module and/or of the various submodules as described herein are shown stored within system memory 1020 as program instructions 1025 and data storage 1035 , respectively.
  • program instructions and/or data may be received, sent or stored upon different types of computer-accessible media or on similar media separate from system memory 1020 or computer system 1000 .
  • a computer-accessible medium may include storage media or memory media such as magnetic or optical media, e.g., disk or CD/DVD-ROM coupled to computer system 1000 via I/O interface 1030 .
  • Program instructions and data stored via a computer-accessible medium may be transmitted by transmission media or signals such as electrical, electromagnetic, or digital signals, which may be conveyed via a communication medium such as a network and/or a wireless link, such as may be implemented via network interface 1040 .
  • I/O interface 1030 may be configured to coordinate I/O traffic between processor 1010 , system memory 1020 , and any peripheral devices in the device, including network interface 1040 or other peripheral interfaces, such as input/output devices 1050 .
  • I/O interface 1030 may perform any necessary protocol, timing or other data transformations to convert data signals from one component (e.g., system memory 1020 ) into a format suitable for use by another component (e.g., processor 1010 ).
  • I/O interface 1030 may include support for devices attached through various types of peripheral buses, such as a variant of the Peripheral Component Interconnect (PCI) bus standard or the Universal Serial Bus (USB) standard, for example.
  • I/O interface 1030 may be split into two or more separate components, such as a north bridge and a south bridge, for example.
  • some or all of the functionality of I/O interface 1030 such as an interface to system memory 1020 , may be incorporated directly into processor 1010 .
  • Network interface 1040 may be configured to allow data to be exchanged between computer system 1000 and other devices attached to a network, such as other computer systems, or between nodes of computer system 1000 .
  • network interface 1040 may support communication via wired or wireless general data networks, such as any suitable type of Ethernet network, for example; via telecommunications/telephony networks such as analog voice networks or digital fiber communications networks; via storage area networks such as Fibre Channel SANs, or via any other suitable type of network and/or protocol.
  • Input/output devices 1050 may, in some embodiments, include one or more display terminals, keyboards, keypads, touchpads, scanning devices, voice or optical recognition devices, or any other devices suitable for entering or retrieving data by one or more computer system 1000 .
  • Multiple input/output devices 1050 may be present in computer system 1000 or may be distributed on various nodes of computer system 1000 .
  • similar input/output devices may be separate from computer system 1000 and may interact with one or more nodes of computer system 1000 through a wired or wireless connection, such as over network interface 1040 .
  • memory 1020 may include program instructions 1025 , configured to implement embodiments of a model-based stereo matching module and/or of the various submodules as described herein, and data storage 1035 , comprising various data accessible by program instructions 1025 .
  • program instructions 1025 may include software elements of embodiments of a model-based stereo matching module and/or of the various submodules as illustrated in the provided Figures and as described herein.
  • Data storage 1035 may include data that may be used in embodiments. In other embodiments, other or different software elements and data may be included.
  • computer system 1000 is merely illustrative and is not intended to limit the scope of a model-based stereo matching module and/or of the various submodules as described herein.
  • the computer system and devices may include any combination of hardware or software that can perform the indicated functions, including a computer, personal computer system, desktop computer, laptop, notebook, or netbook computer, mainframe computer system, handheld computer, workstation, network computer, a camera, a set top box, a mobile device, network device, internet appliance, PDA, wireless phones, pagers, a consumer device, video game console, handheld video game device, application server, storage device, a peripheral device such as a switch, modem, router, or in general any type of computing or electronic device.
  • Computer system 1000 may also be connected to other devices that are not illustrated, or instead may operate as a stand-alone system.
  • the functionality provided by the illustrated components may in some embodiments be combined in fewer components or distributed in additional components.
  • the functionality of some of the illustrated components may not be provided and/or other additional functionality may be available.
  • instructions stored on a computer-accessible medium separate from computer system 1000 may be transmitted to computer system 1000 via transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link.
  • Various embodiments may further include receiving, sending or storing instructions and/or data implemented in accordance with the foregoing description upon a computer-accessible medium. Accordingly, the present disclosure may be practiced with other computer system configurations.
  • FIG. 11 illustrates modeling results for an example face, according to some embodiments.
  • FIG. 11 ( a ) and FIG. 11 ( b ) are the input stereo images.
  • FIG. 11 ( c ) is the close-up of the face in FIG. 11 ( a ).
  • FIG. 11 ( d ) and FIG. 11 ( e ) are the confidence map and depth map computed from stereo matching, respectively.
  • FIG. 11 ( f ) is the registered laser-scanned model and 11 ( g ) is the fused model.
  • FIG. 11 ( h )-( j ) are the screenshots of the stereo model, laser-scanned model and fused model, respectively.
  • FIG. 11 ( k ) is the estimated surface normal map
  • FIG. 11 ( l ) is the re-lighted result of FIG. 11 ( c ) using the estimated normal map in FIG. 11 ( k ).
  • FIG. 11 illustrates modeling results of a person whose face is quite different from the laser-scanned model used, as can be seen from the stereo model in FIG. 11 ( h ) and registered laser-scanned model in FIG. 11 ( i ).
  • the fused model is presented in FIG. 11 ( j ).
  • the incorrect mouth and chin are corrected in FIG. 11 ( j ).
  • FIG. 11 ( k ) is the estimated surface normal, which is then used for scene relighting as shown in FIG. 11 ( l ).
  • a computer-accessible medium may include storage media or memory media such as magnetic or optical media, e.g., disk or DVD/CD-ROM, volatile or non-volatile media such as RAM (e.g. SDRAM, DDR, RDRAM, SRAM, etc.), ROM, etc., as well as transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as network and/or a wireless link.

Abstract

Model-based stereo matching from a stereo pair of images of a given object, such as a human face, may result in a high quality depth map. Integrated modeling may combine coarse stereo matching of an object with details from a known 3D model of a different object to create a smooth, high quality depth map that captures the characteristics of the object. A semi-automated process may align the features of the object and the 3D model. A fusion technique may employ a stereo matching confidence measure to assist in combining the stereo results and the roughly aligned 3D model. A normal map and a light direction may be computed. In one embodiment, the normal values and light direction may be used to iteratively perform the fusion technique. A shape-from-shading technique may be employed to refine the normals implied by the fusion output depth map and to bring out fine details. The normals may be used to re-light the object from different light positions.

Description

    PRIORITY INFORMATION
  • This application claims benefit of priority of U.S. Provisional Application Ser. No. 61/375,536 entitled “Methods and Apparatus for Model-Based Stereo Matching” filed Aug. 20, 2010, the content of which is incorporated by reference herein in its entirety.
  • BACKGROUND
  • 1. Technical Field
  • This disclosure relates generally to image processing, and more specifically, stereo image processing.
  • 2. Description of the Related Art
  • Conventional stereo matching techniques are unreliable in many cases due to occlusions (where a point may be visible in one stereo image but not the other), lack of texture (constant color, not much detail), and specular highlights (a highlighted portion that may move around in different camera views). All of these difficulties exist when applying stereo matching techniques to human faces, with lack of texture being a particular problem. The difficulties apply to other types of objects as well. FIG. 1 illustrates an example of a result of a conventional stereo matching technique, as applied to a human face, and indicates problem areas caused by occlusions, lack of texture, and specular highlights.
  • While commercial stereo cameras are emerging, many if not most image processing applications do not provide tools to process stereo images, or, if they do, the tools have limitations.
  • SUMMARY
  • Various embodiments of model-based stereo matching are described. Reliable correspondences will be the basis of many stereo image processing tool features, such as a paint brush that simultaneously paints or applies some local effect to the corresponding areas of a stereo pair, and automatic view morphing. Embodiments may implement a model-based stereo matching technique that may be used to obtain a high quality depth map and/or other output for an object, such as a human face, from an input pair of stereo images.
  • Some embodiments may employ a three-dimensional (3D) face model method that may regularize and address the problems encountered in conventional stereo matching techniques. One integrated modeling method is described that combines the coarse shape of a subject's face, obtained by stereo matching, with details from a 3D face model, which may be of a different person, to create a smooth, high quality depth map that captures the characteristics of the subject's face. In one embodiment, a semi-automated process may be used to align the facial features of the subject and the 3D model. A fusion technique may be employed that utilizes a stereo matching confidence measure to assist in intelligently combining the ordinary stereo results and the roughly aligned 3D model. A shape-from-shading method may be employed with a simple Lambertian model to refine the normals implied by the fusion output depth map and to bring out very fine facial details such as wrinkles and creases that may not be possible to capture with conventional stereo matching. The quality of the normal maps may allow them to be used to re-light a subject's face from different light positions.
  • In some embodiments, inputs to the framework may include a stereo image pair of a person's face and a pre-established face model, for example obtained from a 3D laser scanner, which is of a different subject than the subject in the stereo image pair. In some embodiments, a library of models or model database that includes a plurality of models may be provided as inputs and used in the framework instead of a single model. Embodiments may apply stereo vision to the input stereo image pair to obtain a rough 3D face model, which may be limited in accuracy, and then use it to guide the registration and alignment of the laser-scanned face model.
  • Embodiments may employ a method that combines the rough 3D face model with the laser-scanned face model to produce a fused model that approximates both, such that the details from the laser-scanned face model can be transferred to the model obtained from stereo vision. The formulation used by embodiments may be linear and can be solved efficiently, for example using a conjugate gradient method. The method can also naturally integrate the confidence of the result obtained from stereo vision. At least some embodiments may employ loopy belief propagation in a confidence estimation technique. At least some embodiments may employ a method for estimating the surface normal and light direction. In some embodiments, the fused model may be refined using shading information from the stereo image pair.
  • While some embodiments are directed toward modeling human faces, it is noted that embodiments of the disclosed modeling techniques can be employed or adapted to model other types of objects.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates an example of a result of a conventional stereo matching technique, as applied to a human face, and indicates problem areas caused by occlusions, lack of texture, and specular highlights.
  • FIG. 2 illustrates an example of a stereo pair of images (a left and right image) captured using a stereo camera.
  • FIG. 3 illustrates an example laser-scanned 3D model of a human face.
  • FIG. 4 illustrates an example 3D model database.
  • FIG. 5 is a high-level block diagram that shows example inputs to the model-based stereo matching method, and an example depth map output, according to at least some embodiments.
  • FIG. 6 illustrates an example module that may implement an integrated modeling method, according to some embodiments.
  • FIG. 7 is a block diagram illustrating the operation of a model-based stereo matching module.
  • FIG. 8 illustrates iteratively performing sensor fusion and light direction and surface normal estimation to provide integrated estimation of depth, normal, light direction, and albedo, according to some embodiments.
  • FIG. 9 is a flowchart of an integrated modeling method, according to at least some embodiments.
  • FIG. 10 illustrates an example computer system that may be used in embodiments.
  • FIG. 11 illustrates modeling results for an example face, according to some embodiments.
  • While the disclosure is described by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the disclosure is not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description are not intended to limit the disclosure to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present disclosure. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean including, but not limited to.
  • DETAILED DESCRIPTION OF EMBODIMENTS
  • In the following detailed description, numerous specific details are set forth to provide a thorough understanding of claimed subject matter. However, it will be understood by those skilled in the art that claimed subject matter may be practiced without these specific details. In other instances, methods, apparatuses or systems that would be known by one of ordinary skill have not been described in detail so as not to obscure claimed subject matter.
  • Some portions of the detailed description which follow are presented in terms of algorithms or symbolic representations of operations on binary digital signals stored within a memory of a specific apparatus or special purpose computing device or platform. In the context of this particular specification, the term specific apparatus or the like includes a general purpose computer once it is programmed to perform particular functions pursuant to instructions from program software. Algorithmic descriptions or symbolic representations are examples of techniques used by those of ordinary skill in the signal processing or related arts to convey the substance of their work to others skilled in the art. An algorithm is here, and is generally, considered to be a self-consistent sequence of operations or similar signal processing leading to a desired result. In this context, operations or processing involve physical manipulation of physical quantities. Typically, although not necessarily, such quantities may take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared or otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to such signals as bits, data, values, elements, symbols, characters, terms, numbers, numerals or the like. It should be understood, however, that all of these or similar terms are to be associated with appropriate physical quantities and are merely convenient labels. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining” or the like refer to actions or processes of a specific apparatus, such as a special purpose computer or a similar special purpose electronic computing device. In the context of this specification, therefore, a special purpose computer or a similar special purpose electronic computing device is capable of manipulating or transforming signals, typically represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the special purpose computer or similar special purpose electronic computing device.
  • Various embodiments of methods and apparatus for model-based stereo matching are described. Embodiments may implement a model-based stereo matching technique that may be used to obtain a high quality depth map and other outputs for a human face, or for other types of objects, from an input stereo pair of images. An integrated modeling method is described that combines the coarse shape of a subject's face obtained by stereo matching with the details from a 3D face model (of a different person) to create a smooth, high quality depth map that captures the characteristics of the subject's face.
  • Turning now to FIG. 2, an example stereo pair of input images (a left and right image) is shown. The stereo pair of images may be captured using a stereo camera and may, in some embodiments, collectively serve as one input to the disclosed stereo matching process. In other embodiments, the n images from an n-way stereo rig could be provided as input to the disclosed stereo matching process. The input images may be lit from any direction, including from the camera direction. This may allow a flash to be used in capturing the images.
  • FIG. 3 shows an example laser-scanned 3D model of a human face that may, in some embodiments, serve as one input to the disclosed stereo matching process. In some embodiments, a library of models or a model database that includes a plurality of models may be used instead of a single 3D model. FIG. 4 shows an example of such a model database. In one embodiment, the input 3D model may be a non-laser-scanned model. For example, the output of the disclosed process may be fed back and used as the input model in one iterative embodiment.
  • FIG. 5 is a high-level block diagram that shows example inputs, in the form of a pair of stereo images and a laser-scanned 3D model, to the model-based stereo matching method, and an example depth map output, according to at least some embodiments.
  • In at least some embodiments of the model-based stereo matching method, a semi-automated process may be used to align the facial features of the subject and the 3D model. In some embodiments, the alignment process may be fully automated. A fusion algorithm may then employ a stereo matching confidence measure to assist in intelligently combining the ordinary stereo results with the roughly-aligned 3D model. Finally, a shape-from-shading technique may be employed with a simple Lambertian model to refine the normals implied by the fusion output depth map and to bring out very fine facial details, such as wrinkles and creases, that conventional stereo matching cannot capture. The quality of the normal maps may enable re-lighting of a subject's face from different light positions.
  • Embodiments of an integrated modeling method, as described herein, may be implemented in a model-based stereo matching module implemented by program instructions stored in a computer-readable storage medium and executable by one or more processors (e.g., one or more CPUs and/or GPUs). In at least some embodiments, the model-based stereo matching module may implement an interactive modeling method in which at least a portion of the modeling process may be guided by user input, for example, to guide a model registration process. Embodiments of the model-based stereo matching module may, for example, be implemented as a stand-alone application, as a module of an application, as a plug-in for applications including image processing applications, and/or as a library function or functions that may be called by other applications such as image processing applications. Embodiments of the model-based stereo matching module may be implemented in any image processing application. An example model-based stereo matching module that may implement the integrated modeling method, as described herein, is illustrated in FIGS. 6 and 7. An example system in which a model-based stereo matching module may be implemented is illustrated in FIG. 10.
  • FIG. 6 illustrates an example module that may implement embodiments of the integrated modeling method(s), as described herein. Model-based stereo matching module 100 may, for example, implement a model from stereo vision method as submodule 120, a semi-automatic model registration method as submodule 130, a sensor fusion method as submodule 140, and a light direction and surface normal estimation method as submodule 150.
  • Module 100 may receive, as input 110, a laser-scanned 3D model (or, alternatively, a model database) and a pair of images captured by a stereo camera. In one embodiment, the input model may be a non-laser-scanned 3D model. For example, the output of module 100 may be fed back as the input model to module 100 in one iterative embodiment. Module 100 may perform the integrated modeling method, for example as described below in relation to FIGS. 7 and 9. Some embodiments may iteratively perform sensor fusion 140 and light direction and surface normal estimation 150, as shown in FIG. 8, to provide integrated estimation of depth, surface normal, light direction, and albedo. Module 100 may receive user input 104. In one embodiment, a user may specify points as user input 104 for use in the registration/alignment process, described below, by submodule 130. In some embodiments, module 100 may provide a user interface 102 via which a user may interact with the module 100, for example, via user input 104 to specify points for registration, or to perform other interactive tasks. Output 170 may include, but is not limited to, a depth map, surface albedo, and a surface normal map. Output 170 may, for example, be stored to a storage medium 180, such as system memory, a disk drive, DVD, CD, etc. Output 170 may also be passed to one or more other modules 190 for further processing.
  • FIG. 7 is a block diagram illustrating the operation of a model-based stereo matching module 100 that implements an integrated modeling method according to at least some embodiments. The integrated modeling method may include several components that may be implemented in the model-based stereo matching module 100 as submodules:
  • a model from stereo vision method implemented as submodule 120;
  • a semi-automatic model registration method implemented as submodule 130;
  • a sensor fusion method implemented as submodule 140; and
  • a light direction and surface normal estimation method that computes normal and light direction from depth and shading, implemented as submodule 150.
  • In some embodiments, each of these components may be implemented as separate modules implemented by program instructions stored in a computer-readable storage medium and executable by one or more processors (e.g., one or more CPUs and/or GPUs), as shown in FIG. 10. The separate modules may be provided as modules of an application, as plug-ins for modules or applications including image processing modules or applications, and/or as library functions that may be called by other modules or applications such as image processing modules or applications.
  • Referring again to FIG. 7, inputs 110 to model-based stereo matching module 100 may include a laser-scanned 3D model (ML) (see, for example, FIG. 3) and a stereo image pair (IL and IR) (see, for example, FIG. 2). In one embodiment, the stereo image pair may be the resulting images from a stereo camera snapshot. In other embodiments, the n images from an n-way stereo rig could be provided to input 110. In some embodiments, a model database may replace the single laser-scanned 3D model as an input. See FIG. 4 for an example model database. In some embodiments, the input model is a non-laser-scanned model. The output of model-based stereo matching module 100 may be a final face model including, but not limited to, a depth map (DF), normal map (N) and surface albedo (A). See FIG. 5 for an example output depth map.
  • Referring again to FIG. 7, a stereo pair (a left and right image, designated IL and IR, respectively) may be provided to or obtained by submodule 120. Submodule 120 may perform stereo matching to generate its outputs, which may include an estimated stereo depth map (DS), confidence map (CS) and a 3D stereo model (MS), which may be established from the estimated stereo depth map.
  • In at least some embodiments, submodule 120 may utilize a loopy belief propagation (BP) based binocular stereo matching method. In one embodiment, the method may be used for face reconstruction, i.e., to generate MS and other outputs. In at least some embodiments, a global optimization method, rather than local optimization, may be employed; global optimization may be more robust on low-textured surfaces such as faces. In at least some embodiments, an efficient BP algorithm, such as a constant space belief propagation (CSBP) algorithm, may be implemented to compute a disparity map. Use of a CSBP algorithm may improve speed and reduce memory cost. A disparity, as used herein, is the number of pixels by which matching pixels in the two stereo images are calculated to be offset. For example, if a pixel at coordinates (3, 11) in stereo image IL is calculated to correspond to pixel (7, 11) in stereo image IR, the disparity is 4. Other methods or technologies to compute a disparity map may also be used.
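  • For illustration only, and not part of the disclosed CSBP-based embodiment, the following minimal sketch shows the disparity concept with a naive SSD block matcher that searches along the same row, using the convention of the example above (the matching right-image pixel lies at a larger x). The function and parameter names are illustrative assumptions.

```python
import numpy as np

def disparity_ssd(left, right, max_disp=32, half_win=3):
    """Naive SSD block matcher: for each pixel of the left image, search
    along the same row of the right image and keep the horizontal shift
    (disparity) with the smallest sum of squared differences.
    Assumes grayscale images as 2D arrays."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half_win, h - half_win):
        for x in range(half_win, w - half_win):
            patch_l = left[y - half_win:y + half_win + 1,
                           x - half_win:x + half_win + 1].astype(np.float64)
            best_cost, best_d = np.inf, 0
            for d in range(0, min(max_disp, w - 1 - half_win - x) + 1):
                patch_r = right[y - half_win:y + half_win + 1,
                                x + d - half_win:x + d + half_win + 1]
                cost = np.sum((patch_l - patch_r) ** 2)
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```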
  • In at least some embodiments of the integrated modeling method, a stereo confidence measure may be computed in submodule 120. A BP technique used for stereo matching typically finds, for each pixel, the corresponding pixel in the other stereo image by searching along the same row of pixels (i.e., at the same height). In one embodiment, a fast-converging BP algorithm may be used. The algorithm may begin with each pixel in one image matching its colors against pixels in the other image to form an initial guess of its disparity. The algorithm may further integrate each pixel's estimate of its own disparity with the estimates (beliefs) of its neighboring pixels. The algorithm may be iterative such that, at each iteration, each pixel updates its individual belief and neighboring pixels update and propagate their beliefs; the phrases "propagating a belief" and "updating messages" are used interchangeably herein. At each iteration, the algorithm may identify each pixel as converging or not converging to a disparity. By detecting non-converged pixels and updating the messages of those pixels, the algorithm may decrease the running time in situations requiring a large number of iterations. Over the iterations, the number of non-converged statuses (call it T) may be accumulated for each pixel. Pixels with a greater number of non-converged statuses have lower confidence in the calculated disparity, while pixels with a lesser number have higher confidence. For each pixel, the accumulated T value yields a confidence measure value, and collectively these values make up the stereo confidence CS, which includes a value for each pixel from 0 to 1, with 0 representing less confidence and 1 representing more confidence. The stereo confidence CS may be used in the fusion process described below, or in any other algorithm or process that may benefit from knowing the confidence of the stereo matching.
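  • A minimal sketch of how the accumulated non-convergence count T might be mapped to a per-pixel confidence in [0, 1]; the linear mapping below is an assumption for illustration, since the embodiment does not prescribe a specific formula, and the names are hypothetical.

```python
import numpy as np

def stereo_confidence(nonconverged_counts, num_iterations):
    """Map the per-pixel count T of non-converged iterations to a confidence
    value in [0, 1]: pixels that repeatedly failed to converge get values
    near 0, pixels that converged quickly get values near 1."""
    T = np.asarray(nonconverged_counts, dtype=np.float64)
    return np.clip(1.0 - T / float(num_iterations), 0.0, 1.0)
```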
  • In one embodiment, MS may need to be aligned with the laser-scanned model. Submodule 130 may register the stereo model MS generated by submodule 120 with the input laser-scanned model ML. In some embodiments, a user may be able to provide an input 132 to submodule 130. In other embodiments, submodule 130 may be fully automated and not allow any user input. ML may include some predefined points. The predefined points may be predefined by a user or automatically predefined by an algorithm. In some embodiments, a user may be able to select one or more points on MS which correspond to the predefined points of the laser-scanned model ML. For example, as illustrated in FIG. 7, a user may select four points (crosses on the bottom image of user input 132) of MS that correspond to four predefined points (white circles on the top image of user input 132) of ML. The four correspondences may then be used to compute a coarse transformation between ML and MS, shown in global registration 134.
  • The transform in global registration 134 may include a constant scalar, a rotation matrix and a translation vector. In some embodiments, the transform may be computed using a method that performs a least-squares estimation of transformation parameters between two point patterns. The resulting coarse transformation may then be iteratively revised, which may, in some embodiments, utilize all points in the models and not just the predefined and selected points. In one embodiment, the revising is performed using an algorithm such as iterative closest point (ICP). The coarse estimate of the transform may be used as an initial estimate in the ICP technique, which may revise the transformation (rotation and translation) to minimize the distance between the two models.
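  • A minimal sketch of a least-squares estimation of transformation parameters between two point patterns (an Umeyama-style closed form for scale, rotation and translation), which could seed the ICP refinement described above. This is an illustrative implementation under stated assumptions, not necessarily the method used in the embodiment; the four user-selected correspondences would be passed as src (stereo model points) and dst (laser-scanned model points), or vice versa.

```python
import numpy as np

def similarity_transform(src, dst):
    """Closed-form least-squares estimate of scale s, rotation R and
    translation t such that s * R @ src[i] + t ~= dst[i] for corresponding
    3D points. src, dst: (N, 3) arrays of matched points."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)          # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                        # avoid a reflection
    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_d - s * (R @ mu_s)
    return s, R, t
```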
  • In at least some embodiments, local manual adjustment 136 may also be used to improve the registration accuracy. Small variances around some features, for example around the mouth area, may be hard to capture in the stereo model. Thus, it may be difficult to register such a feature on the stereo model correctly with the laser-scanned model. As shown in FIG. 7, in the dashed box of the rightmost image of local adjustment 136, the mouth region of the registered model before local adjustment may not be well aligned. To locally adjust, the contour of the feature (e.g., mouth) on the laser-scanned model and several key points (pL) on the contour may be manually selected in advance. For each input stereo model, the contour of the feature (e.g., mouth) on the stereo model may be approximated by several line segments. The contour does not need to be very precise, as the transform around the feature may be very smooth. The corresponding key points on the stereo model may also need to be identified; let them be designated (pS). The motion vectors of the key points may then be computed as the differences pL−pS, and the motion vectors of the other points on the contour of the feature may then be interpolated from the motion vectors of these key points. In at least some embodiments, the motion vectors of points on a bounding box of the feature may be set to zero, and a Poisson interpolation technique may then be used to estimate the motion vectors for every point inside the feature area, with the boundary condition that the estimated motion vectors equal those on the bounding box and the contour of the feature. While the mouth is used as an example feature, this local adjustment method may be applied to other regions or features; an example of local adjustment that improves the registered model is shown in the dashed box of the rightmost image of local adjustment 136. Submodule 130 may output the registered laser-scanned model and a corresponding depth map computed from this model, referred to as DL.
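  • A minimal sketch of membrane (zero-Laplacian Poisson) interpolation of the motion vectors by simple Jacobi sweeps, assuming the bounding-box and contour pixels have already been assigned their fixed motion vectors; the names and the iteration count are illustrative, and a direct sparse solve could be used instead.

```python
import numpy as np

def interpolate_motion_field(fixed_mask, fixed_values, num_iters=2000):
    """Interpolate a 2D motion-vector field: pixels where fixed_mask is True
    keep their prescribed vectors (zero on the bounding box, key-point
    interpolated values on the feature contour); every other pixel is
    repeatedly replaced by the average of its four neighbors.
    Assumes the fixed bounding box lies strictly inside the array, so the
    wrap-around of np.roll at the image border does not affect the region.
    fixed_values: (H, W, 2) float array; fixed_mask: (H, W) bool array."""
    field = np.where(fixed_mask[..., None], fixed_values, 0.0).astype(np.float64)
    for _ in range(num_iters):
        avg = 0.25 * (np.roll(field, 1, axis=0) + np.roll(field, -1, axis=0) +
                      np.roll(field, 1, axis=1) + np.roll(field, -1, axis=1))
        field = np.where(fixed_mask[..., None], field, avg)
    return field
```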
  • Submodule 140 may fuse the stereo model depth map DS with the registered, aligned laser-scanned model depth map DL and generate a new model that approximates both input models. The new fused model may include the shape of the stereo model and the smoothness and detail of the aligned/registered model. As illustrated in FIG. 7, the upper left image of submodule 140 shows the fused depth map DF, which may be smoother than the depth map from stereo vision DS and more detailed than the aligned model depth map DL. Many differences exist between the details of the fused model and DL. For instance, the eyes of DL are lower than the eyes of the fused model, and the curvature of the region between the chin and the mouth of ML is larger than in the fused model. More views (screenshots) of the fused model are presented in the bottom row of images of submodule 140. Stereo confidence CS may also be used in the fusion step to intelligently combine DS and DL.
  • One objective of the sensor fusion method of submodule 140 may be to transfer the details (high-frequency component) from DL to DS, while keeping the large-scale variations (low-frequency component) of DS. In one embodiment, the fused depth function DF may conform to the gradient estimates computed from DL and the depth values obtained from DS at each point. To accomplish this, in at least some embodiments, the depth function may be chosen to minimize the sum of two error terms: the gradient error EG and the depth error ED.
  • The gradient error may be defined as the sum of squared distances between the partial derivatives of the optimized depth value DF and the depth values obtained from DL:
  • $$E_G(D^F) = \sum_i \left( \frac{\partial D_i^F}{\partial x} - \frac{\partial D_i^L}{\partial x} \right)^2 + \left( \frac{\partial D_i^F}{\partial y} - \frac{\partial D_i^L}{\partial y} \right)^2. \qquad (1)$$
  • The depth error ED may be defined as the sum of squared distances between the optimized depth value DF and that from stereo vision DS:
  • $$E_D(D^F) = \sum_i \left( D_i^F - D_i^S \right)^2, \qquad (2)$$
  • where $D_i^F$ is the depth value of the ith optimized point, and $D_i^L$ and $D_i^S$ are the depth values of the ith point obtained from the laser scanner and stereo vision, respectively.
  • A depth map DF may then be given by solving
  • $$\arg\min_{D^F} \; \lambda E_D(D^F) + E_G(D^F), \qquad (3)$$
  • where λ is a constant scalar parameter (e.g., λ = 0.03; other values may also be used). When the confidence measure CS of the employed stereo matching method is available, the depth map DF may be computed as follows:
  • $$\arg\min_{D^F} \; C^S \left( 2\lambda E_D(D^F) \right) + (1 - C^S)\, E_G(D^F). \qquad (4)$$
  • CS ∈ [0, 1] may control how much influence the depth error has in the optimization. Where the stereo confidence CS is 0, the method considers exclusively the result obtained from the laser-scanned 3D input, subject to boundary conditions. Where CS is 1, the method returns exclusively the depth values from stereo matching. For intermediate values, the method performs a weighted combination of the two inputs. CS may be higher in high-texture areas, such as eyebrows, and lower in occluded areas, in areas viewed at oblique angles, and in low-texture areas.
  • Each point/pixel may generate three equations. These equations may include one for the depth error and one for the gradient error in each of the x and y directions. Before squaring, the equations for the error terms are linear in the depth values being solved for. Therefore, the entire minimization can be formulated as a large over-constrained linear system to be solved, for example, by a least squares technique:
  • $$\begin{bmatrix} C^S (2\lambda)\, U \\[2pt] (1 - C^S)\,\dfrac{\partial}{\partial x} \\[2pt] (1 - C^S)\,\dfrac{\partial}{\partial y} \end{bmatrix} \big[ D^F \big] = \begin{bmatrix} C^S (2\lambda)\, D^S \\[2pt] (1 - C^S)\,\dfrac{\partial D^L}{\partial x} \\[2pt] (1 - C^S)\,\dfrac{\partial D^L}{\partial y} \end{bmatrix}. \qquad (5)$$
  • Here, U is an identity matrix and $\frac{\partial}{\partial x}$ represents a matrix that, when multiplied by the unknown vector $D^F$, produces a vector with one row per point; the result of the matrix multiplication is the gradient of $D^F$ in the x direction. $\frac{\partial}{\partial y}$ represents the same operation, but in the y direction, and $\left[\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right]$ is the gradient operator.
  • Let
$$A = \begin{bmatrix} C^S (2\lambda)\, U \\[2pt] (1 - C^S)\,\dfrac{\partial}{\partial x} \\[2pt] (1 - C^S)\,\dfrac{\partial}{\partial y} \end{bmatrix} \qquad (6)$$
and let
$$b = \begin{bmatrix} C^S (2\lambda)\, D^S \\[2pt] (1 - C^S)\,\dfrac{\partial D^L}{\partial x} \\[2pt] (1 - C^S)\,\dfrac{\partial D^L}{\partial y} \end{bmatrix}. \qquad (7)$$
  • Multiplying by $A^T$ on both sides of equation (5), the following may be obtained:
$$\big[ A^T A \big] \big[ D^F \big] = A^T b, \qquad (8)$$
where
$$A^T A = (C^S)^2 (2\lambda)^2\, U + (1 - C^S)^2\, \Delta, \qquad (9)$$
$$A^T b = (C^S)^2 (2\lambda)^2\, D^S + (1 - C^S)^2\, \Delta D^L, \qquad (10)$$
and
$$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$$
is the Laplacian operator.
  • In some embodiments, equation (8) can be solved using a conjugate gradient method. Although matrix $A^T A$ may be large, it may also be very sparse; the number of non-zero entries may be linear in the number of pixels because there may be at most five non-zero entries per row (one coefficient for the depth of the reference pixel and the others for its neighbors used to form the second partial derivatives).
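  • A minimal sketch of the fusion solve, assuming the depth maps DS and DL and the confidence CS are supplied as 2D float arrays. It builds the stacked system of equation (5) with sparse forward-difference operators and solves it in the least-squares sense with LSQR instead of forming the normal equations (8) and running conjugate gradient as described above; both approaches minimize the same objective. The function name and parameter defaults are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def fuse_depths(D_S, D_L, C_S, lam=0.03):
    """Fuse a stereo depth map D_S with an aligned model depth map D_L,
    weighting the depth equations by C_S*(2*lam) and the gradient equations
    by (1 - C_S), per the stacked system of equation (5)."""
    h, w = D_S.shape

    # Sparse forward-difference operators d/dx and d/dy on the row-major
    # flattened image; the last row of each 1D operator is zeroed so no
    # constraint wraps around the image border.
    ex = sp.diags([np.r_[-np.ones(w - 1), 0.0], np.ones(w - 1)], [0, 1], shape=(w, w))
    ey = sp.diags([np.r_[-np.ones(h - 1), 0.0], np.ones(h - 1)], [0, 1], shape=(h, h))
    Dx = sp.kron(sp.identity(h), ex, format="csr")
    Dy = sp.kron(ey, sp.identity(w), format="csr")

    c = C_S.ravel()
    Wd = sp.diags(c * 2.0 * lam)   # row weights on the depth equations
    Wg = sp.diags(1.0 - c)         # row weights on the gradient equations

    A = sp.vstack([Wd, Wg @ Dx, Wg @ Dy], format="csr")
    b = np.concatenate([c * 2.0 * lam * D_S.ravel(),
                        (1.0 - c) * (Dx @ D_L.ravel()),
                        (1.0 - c) * (Dy @ D_L.ravel())])
    return lsqr(A, b)[0].reshape(h, w)
```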
  • In some embodiments, the fused depth map DF may then be provided to submodule 150 for surface normal and light direction estimation. Generally, submodule 150 may roughly estimate the normals, robustly compute the light direction, and then refine the normals using the light direction to bring out details of the object. Normals may be visualized as vectors [x, y, z] with x mapped to red, y to green, and z to blue. For example, a body part pointing straight back at the camera, such as a chin, may appear blue; the underside of the nose points down along the y axis and thus may appear green.
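  • A minimal sketch of the color mapping just described, assuming unit-length normals with components in [−1, 1]; the exact scaling is not specified by the embodiment, and the function name is illustrative.

```python
import numpy as np

def normals_to_rgb(normals):
    """Map unit normals [x, y, z] from [-1, 1] into [0, 1] so that x drives
    the red channel, y green, and z blue; a surface facing straight back at
    the camera (normal near [0, 0, 1]) therefore appears bluish."""
    return np.clip(0.5 * (np.asarray(normals, dtype=np.float64) + 1.0), 0.0, 1.0)
```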
  • In one embodiment, submodule 150 assumes that the albedo of the skin is a constant, and detects skin pixels based on surface chromaticities. A coarse normal map N(DF) may be computed from the fused depth map DF. However, as shown in FIG. 7, such a normal map may include various incorrect details of the laser-scanned 3D model. As a result, at least some embodiments may smooth the fused depth map DF, and then create a corresponding normal map NF from the smoothed depth map. The light direction and skin albedo may then be estimated using the intensity values of the detected skin pixels, and the corresponding normal vectors may be obtained from NF. Finally, the estimated light direction, skin albedo and intensity values are used to refine the coarse normal estimate NF to obtain a refined normal map N. The light direction, normal map N, and the color values of the input image may then be used to compute the albedo at each pixel location, and can be used for scene re-lighting. A re-lighting example is shown in the rightmost image of submodule 150 in FIG. 7.
  • A more detailed example algorithm for estimating the surface normal and light direction is summarized below in algorithm (1). The method may first locate all the skin pixels based on surface chromaticities, and then compute a coarse normal map NF from the input depth map. Assuming that the albedo is constant over all skin pixels, the method may then compute the light direction L using the coarse normal map NF and the intensity of the skin pixels, for example using a simple Lambertian model. The coarse normal and the image intensity at each pixel location may then be used together with the estimated light direction to solve for the final normal at the current pixel location using the same Lambertian model.
  • Algorithm 1
      • 1: Compute the chromaticity of the reference color image at each pixel and find the median chromaticity.
      • 2: Keep only half of the image pixels as skin pixels based on the similarity of their chromaticity and the median chromaticity.
      • 3: Smooth the depth map DF obtained from sensor fusion to remove the incorrect details. Let the smoothed depth map be designated as DS F.
      • 4: Compute the coarse normal map NF from DS F.
      • 5: Under the assumption of constant skin albedo, a simple Lambertian model, and a directional light source, compute the light direction $\hat{L} = \vec{L} / \lVert \vec{L} \rVert$ by solving an overconstrained linear system as follows:
  • $$\big[ (N_i^F)^T \big]\,\big[ \vec{L}\, \big] = \big[ I_i \big]. \qquad (11)$$
      • $N_i^F$ and $I_i$ are the normal and intensity at the ith skin pixel.
      • 6: Compute the final normal map N at each pixel by solving the following linear system using the estimated light direction:
  • $$\begin{bmatrix} \vec{L}^{\,T} \\ U \end{bmatrix} \big[ N_i \big] = \begin{bmatrix} I_i \\ N_i^F \end{bmatrix}. \qquad (12)$$
      • U is a 3×3 identity matrix, and $N_i^F$ and $I_i$ are the normal and intensity at the ith pixel.
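  • A minimal sketch of steps 5 and 6 of Algorithm 1 as ordinary least-squares problems, assuming the coarse normals, intensities, and skin mask are supplied as flat arrays; the names are illustrative, the per-pixel loop is left unvectorized for clarity, and whether the albedo-scaled light vector or its unit direction enters equation (12) is an interpretation.

```python
import numpy as np

def estimate_light_and_normals(N_F, I, skin_mask):
    """Step 5: solve equation (11), N_F[i]^T . l = I[i], over skin pixels for
    the (albedo-scaled) light vector l. Step 6: at every pixel, solve the
    small stacked system of equation (12) so the refined normal both explains
    the observed intensity and stays close to the coarse normal.
    N_F: (num_pixels, 3), I: (num_pixels,), skin_mask: (num_pixels,) bool."""
    # Step 5: albedo-scaled light vector from skin pixels only.
    l, *_ = np.linalg.lstsq(N_F[skin_mask], I[skin_mask], rcond=None)
    light_dir = l / np.linalg.norm(l)

    # Step 6: per-pixel refined normal from the 4x3 system [l^T; U] n = [I; N_F].
    A = np.vstack([l[None, :], np.eye(3)])
    N = np.empty_like(N_F, dtype=np.float64)
    for i in range(N_F.shape[0]):
        b = np.r_[I[i], N_F[i]]
        n_i, *_ = np.linalg.lstsq(A, b, rcond=None)
        N[i] = n_i / max(np.linalg.norm(n_i), 1e-12)
    return light_dir, N
```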
  • In at least some embodiments, the input depth map DF may be refined using the shading information of the stereo image. The refined depth map may be more consistent with the other outputs that have been computed, e.g., the normals. One algorithm to refine a depth map is detailed below. Another algorithm is provided in the provisional application to which this application claims priority.
  • Let the refined depth function be Z and the intrinsic matrix of the stereo camera be K, let Z0 be the depth at pixel location [x, y], let Zx be the depth at [x+1, y], let Zy be the depth at [x, y+1], and let α=x+y+1. The normal at [x, y] will be:
  • $$\vec{n} = \frac{(K^{-1}\,\vec{dx}) \times (K^{-1}\,\vec{dy})}{\lVert (K^{-1}\,\vec{dx}) \times (K^{-1}\,\vec{dy}) \rVert} = \det(K^{-1})\, K^{T}\, \frac{\vec{dx} \times \vec{dy}}{\lVert \vec{dx} \times \vec{dy} \rVert}, \qquad (13)$$
  • where
$$\vec{dx} = Z_0 \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} - Z_x \begin{bmatrix} x+1 \\ y \\ 1 \end{bmatrix}, \qquad (14)$$
$$\vec{dy} = Z_0 \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} - Z_y \begin{bmatrix} x \\ y+1 \\ 1 \end{bmatrix}, \qquad (15)$$
$$\vec{dx} \times \vec{dy} = \begin{bmatrix} Z_y (Z_0 - Z_x) \\ Z_x (Z_0 - Z_y) \\ \alpha Z_x Z_y - (x Z_y + y Z_x) Z_0 \end{bmatrix}. \qquad (16)$$
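  • A minimal sketch of equations (13)–(16): per-pixel normals computed from a depth map Z and an intrinsic matrix K using the closed-form cross product of equation (16). The function name is illustrative and depths at the image border use wrapped neighbors for simplicity.

```python
import numpy as np

def normals_from_depth(Z, K):
    """Per-pixel surface normals from a depth map via equations (13)-(16):
    n is parallel to det(K^-1) K^T (dx x dy), then normalized to unit length.
    Z: (H, W) depth map, K: 3x3 camera intrinsic matrix."""
    H, W = Z.shape
    xs, ys = np.meshgrid(np.arange(W, dtype=np.float64),
                         np.arange(H, dtype=np.float64))
    Z0 = Z
    Zx = np.roll(Z, -1, axis=1)   # depth at [x+1, y]
    Zy = np.roll(Z, -1, axis=0)   # depth at [x, y+1]
    alpha = xs + ys + 1.0
    # Closed-form cross product dx x dy of equation (16).
    cross = np.stack([Zy * (Z0 - Zx),
                      Zx * (Z0 - Zy),
                      alpha * Zx * Zy - (xs * Zy + ys * Zx) * Z0], axis=-1)
    n = np.linalg.det(np.linalg.inv(K)) * (cross @ K)   # row-vector form of K^T v
    return n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-12)
```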
  • Using the estimated light direction and the skin albedo, the following is obtained:

  • $$f(Z_0, Z_x, Z_y) = \vec{L}^{\,T} \cdot \vec{n} - I_{x,y} = 0. \qquad (17)$$

  • Let
$$H = \det{}^2(K^{-1})\,\big(K^{-T}\,\vec{L}\,\vec{L}^{\,T}\,K^{T}\big)$$
and
$$E = \det{}^2(K^{-1})\,\big(K^{-T}\,K^{T}\big);$$
  • then H and E are both constant 3×3 matrices. Let
$$F = H - I_{x,y}^2\, E$$
and
$$G = \big(\vec{dx} \times \vec{dy}\big)\big(\vec{dx} \times \vec{dy}\big)^{T}.$$
  • Substituting equation (13) into equation (17), the following is obtained:
$$f(Z_0, Z_x, Z_y) = F : G = 0, \qquad (18)$$
  • where the symbol “:” represents the Frobenius inner product operation.
  • Newton's iteration method may then be used to solve equation (18):
  • $$f(Z_0^{t+1}, Z_x^{t+1}, Z_y^{t+1}) = f(Z_0^t, Z_x^t, Z_y^t) + \frac{\partial f(Z_0^t, Z_x^t, Z_y^t)}{\partial Z_0}\,(Z_0^{t+1} - Z_0^t) + \frac{\partial f(Z_0^t, Z_x^t, Z_y^t)}{\partial Z_x}\,(Z_x^{t+1} - Z_x^t) + \frac{\partial f(Z_0^t, Z_x^t, Z_y^t)}{\partial Z_y}\,(Z_y^{t+1} - Z_y^t) = 0. \qquad (19\text{–}23)$$
  • At each iteration, a linear system is solved:
  • [ 0 , , df 0 , df x , 0 , , df y , 0 , ] [ Z 0 t + 1 ] = ( 24 ) df 0 Z 0 t + df x Z x t + df y Z y t - ( 25 ) f ( Z 0 t , Z x t , Z y t ) , ( 26 )
  • where:
  • $$df_0 = \frac{\partial f(Z_0^t, Z_x^t, Z_y^t)}{\partial Z_0}, \qquad (27)$$
  • which can be computed from equation (18).
  • Let
$$J = [J_0, J_x, J_y] = \begin{bmatrix} Z_y & -Z_y & Z_0 - Z_x \\ Z_x & Z_0 - Z_y & -Z_x \\ -(x Z_y + y Z_x) & \alpha Z_y - y Z_0 & \alpha Z_x - x Z_0 \end{bmatrix} \qquad (28\text{–}29)$$
  • be the Jacobian matrix of the vector $\vec{dx} \times \vec{dy}$ with respect to $[Z_0, Z_x, Z_y]$. Then:
  • $$df_0 = F : \left( \big[J_0,\ \vec{dx}\times\vec{dy}\,\big] \begin{bmatrix} (\vec{dx}\times\vec{dy})^{T} \\ J_0^{T} \end{bmatrix} \right), \qquad (30)$$
$$df_x = F : \left( \big[J_x,\ \vec{dx}\times\vec{dy}\,\big] \begin{bmatrix} (\vec{dx}\times\vec{dy})^{T} \\ J_x^{T} \end{bmatrix} \right), \qquad (31)$$
$$df_y = F : \left( \big[J_y,\ \vec{dx}\times\vec{dy}\,\big] \begin{bmatrix} (\vec{dx}\times\vec{dy})^{T} \\ J_y^{T} \end{bmatrix} \right). \qquad (32)$$
  • The definition of $\vec{dx} \times \vec{dy}$ is provided in equation (16).
  • Instead of solving the large linear system in equation (26), which may be slow, a more efficient solution that may be used in some embodiments may be obtained by setting $Z_x^{t+1} = Z_x^t$ and $Z_y^{t+1} = Z_y^t$ in equation (23). Equation (23) may then be simplified as
$$Z_0^{t+1} = Z_0^t - \frac{f(Z_0^t, Z_x^t, Z_y^t)}{\dfrac{\partial f(Z_0^t, Z_x^t, Z_y^t)}{\partial Z_0}} = Z_0^t - \frac{f(Z_0^t, Z_x^t, Z_y^t)}{df_0}. \qquad (33)$$
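  • A minimal sketch of one sweep of the simplified update of equation (33); f_fn and df0_fn are placeholder callables standing in for element-wise evaluations of equations (18) and (30) and are not implemented here, and the function and parameter names are illustrative.

```python
import numpy as np

def newton_depth_sweep(Z, f_fn, df0_fn, eps=1e-8):
    """One sweep of the simplified per-pixel Newton update of equation (33):
    Z0 <- Z0 - f(Z0, Zx, Zy) / df0, with the neighbor depths Zx (at [x+1, y])
    and Zy (at [x, y+1]) held fixed at their current values."""
    Z0 = Z
    Zx = np.roll(Z, -1, axis=1)    # depth of the neighbor at [x+1, y]
    Zy = np.roll(Z, -1, axis=0)    # depth of the neighbor at [x, y+1]
    f = f_fn(Z0, Zx, Zy)
    df0 = df0_fn(Z0, Zx, Zy)
    df0 = np.where(np.abs(df0) < eps, eps, df0)   # guard against division by zero
    return Z - f / df0
```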
  • Turning now to FIG. 8, some embodiments may iteratively perform sensor fusion 140 and light direction and surface normal estimation 150 to provide integrated estimation of depth, surface normal, light direction, and albedo. In these embodiments, as shown in FIG. 8, the outputs from light direction and surface normal estimation 150 may be fed back to sensor fusion 140 to iteratively improve overall results. In other embodiments, outputs from light direction and surface normal estimation 150 may be fed back to another component of model-based stereo matching module 100. For example, the output model may replace the input laser-scanned model or be added to the library of models.
  • In one embodiment, the depth map may be improved by using the normals from submodule 150 as an additional input to the fusion module 140 and by modifying the basic fusion algorithm to include the additional input. The basic fusion algorithm is given by the following 3 equations:
  • $$\big[ D^F \big] = D^S, \qquad \left[ \frac{\partial}{\partial x} D^F \right] = \frac{\partial D^L}{\partial x}, \qquad \left[ \frac{\partial}{\partial y} D^F \right] = \frac{\partial D^L}{\partial y}.$$
  • In one embodiment, the second and third equations could be replaced with a term that encourages the normals implied by the fused result to agree with the input normals. The normals implied by the fused depth map, N(DF), may be specified with equations (13)-(16) above (with DF instead of Z). The fusion algorithm may then include:

  • $$(*)\qquad N(D^F) = N,$$
  • where N is the input normal map (the output of light direction and surface normal estimation 150) and the equation is computed at each pixel (x, y). (*) could replace the second and third equations using the laser-scanned model or be added to the algorithm.
  • Integrated Modeling Method Flowchart
  • FIG. 9 is a flowchart of an integrated modeling method, according to at least some embodiments. As indicated at 200, a plurality of stereo images of an object (e.g., a human face) and at least one three-dimensional input model of the same type of object may be received. In some embodiments, a single, laser-scanned model may be obtained. In other embodiments, a model database including a plurality of models may be obtained. In some embodiments, the input 3D model may be the output of a previous iteration of the integrated modeling method. In some embodiments, the input 3D model is a non-laser-scanned model. The object may be any type of object including, but not limited to, human faces, animals, plants, or landscapes.
  • As indicated at 202, a three-dimensional model of the object may be generated from the plurality of stereo images of the object. In some embodiments, generating a three-dimensional model of the object may be performed by applying belief propagation (BP) based binocular stereo matching technology. In some embodiments, generating a three-dimensional model of the object may include applying constant space belief propagation (CSBP) technology to compute a disparity map. Further, in some embodiments, generating a 3D model of the object may include computing a stereo confidence CS and/or a stereo depth map DS.
  • As indicated at 204, the stereo model MS may be aligned, or registered, with the at least one input model ML, resulting in an aligned model. In one embodiment, texture data of the input model may not be used in the alignment process. Aligning the two models may include receiving a user input, such as selecting points on MS that correspond to predetermined points on ML. In one embodiment, a coarse transformation, or global registration, may be computed based on the correspondences. Global registration may also include iteratively revising the transformation. In one embodiment, the iterative revision may be performed using an iterative closest point (ICP) algorithm. The results of global registration may be locally adjusted to refine the output aligned/registered model.
  • As indicated at 206, a fused model may be generated by combining the depth map of the object DS with the aligned-model depth map DL. The fused model may approximate both input models including the shape of the stereo model and the detail and smoothness of the aligned model. In at least some embodiments, the fusion process may compute a fused depth map that may minimize the sum of a gradient error and a depth error, as discussed above. The stereo confidence CS may be used to intelligently combine DS and DL. In one embodiment, CS may be a value from 0 to 1, inclusive, for each pixel. If the confidence of a pixel in the stereo model is 0, then the corresponding pixel in the fused model may be generated entirely from the corresponding pixel in the aligned model. If the confidence of a pixel in the stereo model is 1, then the corresponding pixel in the fused model may be generated entirely from the stereo model.
  • As indicated at 208, a surface normal map and a light direction may be estimated from the fused model. In one embodiment, a rough surface normal may be estimated followed by computing the light direction. Next, the normal may be refined using the computed light direction, which may result in bringing out details of the object. In one embodiment, a skin albedo may also be calculated. In some embodiments, shown in FIG. 9 with the feedback line to the input to step 208, the surface normal map may be refined according to the light direction, albedo, and intensity values to generate a refined surface normal map.
  • In some embodiments, some of or all of elements 200-208 may be iteratively performed. One embodiment is illustrated with the feedback line from step 208 to the input of step 206. In that scenario, the generated surface normal map and estimated light direction and albedo may be fed back to the fusion step to iteratively improve results of the fused depth map DF.
  • In one embodiment, elements 200-208 may be performed using only one input 3D model. In other embodiments, elements 200-208 may be performed using more than one input 3D model.
  • While embodiments are generally illustrated and described as being applied for modeling human faces, at least some embodiments of the integrated modeling method may be applied to other objects or models, such as airplanes, people (full bodies), buildings or other structures, automobiles or other vehicles, etc.
  • Some embodiments may provide interactive tools for editing disparity maps given stereo pairs. In some embodiments, user interface elements may be provided that allow a user to pick a model from a set of models displayed on the user interface and, for example, drop the model on an object in one of the views for disparity refinement. In some embodiments, for objects in a scene that are unavailable as models, a user interface may provide one or more user interface elements or tools (e.g., brushes) via which the user may adjust previously computed disparity maps.
  • Example System
  • Embodiments of a model-based stereo matching module and/or of the various submodules as described herein may be executed on one or more computer systems, which may interact with various other devices. One such computer system is illustrated by FIG. 10. In different embodiments, computer system 1000 may be any of various types of devices, including, but not limited to, a personal computer system, desktop computer, laptop, notebook, or netbook computer, mainframe computer system, handheld computer, workstation, network computer, a camera, a set top box, a mobile device, a consumer device, video game console, handheld video game device, application server, storage device, a peripheral device such as a switch, modem, router, or in general any type of computing or electronic device.
  • In the illustrated embodiment, computer system 1000 includes one or more processors 1010 coupled to a system memory 1020 via an input/output (I/O) interface 1030. Computer system 1000 further includes a network interface 1040 coupled to I/O interface 1030, and one or more input/output devices 1050, such as cursor control device 1060, keyboard 1070, and display(s) 1080. In some embodiments, it is contemplated that embodiments may be implemented using a single instance of computer system 1000, while in other embodiments multiple such systems, or multiple nodes making up computer system 1000, may be configured to host different portions or instances of embodiments. For example, in one embodiment some elements may be implemented via one or more nodes of computer system 1000 that are distinct from those nodes implementing other elements.
  • In various embodiments, computer system 1000 may be a uniprocessor system including one processor 1010, or a multiprocessor system including several processors 1010 (e.g., two, four, eight, or another suitable number). Processors 1010 may be any suitable processor capable of executing instructions. For example, in various embodiments, processors 1010 may be general-purpose or embedded processors implementing any of a variety of instruction set architectures (ISAs), such as the x86, PowerPC, SPARC, or MIPS ISAs, or any other suitable ISA. In multiprocessor systems, each of processors 1010 may commonly, but not necessarily, implement the same ISA.
  • In some embodiments, at least one processor 1010 may be a graphics processing unit. A graphics processing unit or GPU may be considered a dedicated graphics-rendering device for a personal computer, workstation, game console or other computing or electronic device. Modern GPUs may be very efficient at manipulating and displaying computer graphics, and their highly parallel structure may make them more effective than typical CPUs for a range of complex graphical algorithms. For example, a graphics processor may implement a number of graphics primitive operations in a way that makes executing them much faster than drawing directly to the screen with a host central processing unit (CPU). In various embodiments, the image processing methods disclosed herein may, at least in part, be implemented by program instructions configured for execution on one of, or parallel execution on two or more of, such GPUs. The GPU(s) may implement one or more application programmer interfaces (APIs) that permit programmers to invoke the functionality of the GPU(s). Suitable GPUs may be commercially available from vendors such as NVIDIA Corporation, ATI Technologies (AMD), and others.
  • System memory 1020 may be configured to store program instructions and/or data accessible by processor 1010. In various embodiments, system memory 1020 may be implemented using any suitable memory technology, such as static random access memory (SRAM), synchronous dynamic RAM (SDRAM), nonvolatile/Flash-type memory, or any other type of memory. In the illustrated embodiment, program instructions and data implementing desired functions, such as those described above for embodiments of a model-based stereo matching module and/or of the various submodules as described herein are shown stored within system memory 1020 as program instructions 1025 and data storage 1035, respectively. In other embodiments, program instructions and/or data may be received, sent or stored upon different types of computer-accessible media or on similar media separate from system memory 1020 or computer system 1000. Generally speaking, a computer-accessible medium may include storage media or memory media such as magnetic or optical media, e.g., disk or CD/DVD-ROM coupled to computer system 1000 via I/O interface 1030. Program instructions and data stored via a computer-accessible medium may be transmitted by transmission media or signals such as electrical, electromagnetic, or digital signals, which may be conveyed via a communication medium such as a network and/or a wireless link, such as may be implemented via network interface 1040.
  • In one embodiment, I/O interface 1030 may be configured to coordinate I/O traffic between processor 1010, system memory 1020, and any peripheral devices in the device, including network interface 1040 or other peripheral interfaces, such as input/output devices 1050. In some embodiments, I/O interface 1030 may perform any necessary protocol, timing or other data transformations to convert data signals from one component (e.g., system memory 1020) into a format suitable for use by another component (e.g., processor 1010). In some embodiments, I/O interface 1030 may include support for devices attached through various types of peripheral buses, such as a variant of the Peripheral Component Interconnect (PCI) bus standard or the Universal Serial Bus (USB) standard, for example. In some embodiments, the function of I/O interface 1030 may be split into two or more separate components, such as a north bridge and a south bridge, for example. In addition, in some embodiments some or all of the functionality of I/O interface 1030, such as an interface to system memory 1020, may be incorporated directly into processor 1010.
  • Network interface 1040 may be configured to allow data to be exchanged between computer system 1000 and other devices attached to a network, such as other computer systems, or between nodes of computer system 1000. In various embodiments, network interface 1040 may support communication via wired or wireless general data networks, such as any suitable type of Ethernet network, for example; via telecommunications/telephony networks such as analog voice networks or digital fiber communications networks; via storage area networks such as Fibre Channel SANs, or via any other suitable type of network and/or protocol.
  • Input/output devices 1050 may, in some embodiments, include one or more display terminals, keyboards, keypads, touchpads, scanning devices, voice or optical recognition devices, or any other devices suitable for entering or retrieving data by one or more computer system 1000. Multiple input/output devices 1050 may be present in computer system 1000 or may be distributed on various nodes of computer system 1000. In some embodiments, similar input/output devices may be separate from computer system 1000 and may interact with one or more nodes of computer system 1000 through a wired or wireless connection, such as over network interface 1040.
  • As shown in FIG. 10, memory 1020 may include program instructions 1025, configured to implement embodiments of a model-based stereo matching module and/or of the various submodules as described herein, and data storage 1035, comprising various data accessible by program instructions 1025. In one embodiment, program instructions 1025 may include software elements of embodiments of a model-based stereo matching module and/or of the various submodules as illustrated in the provided Figures and as described herein. Data storage 1035 may include data that may be used in embodiments. In other embodiments, other or different software elements and data may be included.
  • Those skilled in the art will appreciate that computer system 1000 is merely illustrative and is not intended to limit the scope of a model-based stereo matching module and/or of the various submodules as described herein. In particular, the computer system and devices may include any combination of hardware or software that can perform the indicated functions, including a computer, personal computer system, desktop computer, laptop, notebook, or netbook computer, mainframe computer system, handheld computer, workstation, network computer, a camera, a set top box, a mobile device, network device, internet appliance, PDA, wireless phones, pagers, a consumer device, video game console, handheld video game device, application server, storage device, a peripheral device such as a switch, modem, router, or in general any type of computing or electronic device. Computer system 1000 may also be connected to other devices that are not illustrated, or instead may operate as a stand-alone system. In addition, the functionality provided by the illustrated components may in some embodiments be combined in fewer components or distributed in additional components. Similarly, in some embodiments, the functionality of some of the illustrated components may not be provided and/or other additional functionality may be available.
  • Those skilled in the art will also appreciate that, while various items are illustrated as being stored in memory or on storage while being used, these items or portions of them may be transferred between memory and other storage devices for purposes of memory management and data integrity. Alternatively, in other embodiments some or all of the software components may execute in memory on another device and communicate with the illustrated computer system via inter-computer communication. Some or all of the system components or data structures may also be stored (e.g., as instructions or structured data) on a computer-accessible medium or a portable article to be read by an appropriate drive, various examples of which are described above. In some embodiments, instructions stored on a computer-accessible medium separate from computer system 1000 may be transmitted to computer system 1000 via transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link. Various embodiments may further include receiving, sending or storing instructions and/or data implemented in accordance with the foregoing description upon a computer-accessible medium. Accordingly, the present disclosure may be practiced with other computer system configurations.
  • Example Results
  • FIG. 11 illustrates modeling results for an example face, according to some embodiments. FIG. 11 (a) and FIG. 11 (b) are the input stereo images. FIG. 11 (c) is the close-up of the face in FIG. 11 (a). FIG. 11 (d) and FIG. 11 (e) are the confidence map and depth map computed from stereo matching, respectively. FIG. 11 (f) is the registered laser-scanned model and 11 (g) is the fused model. FIG. 11 (h)-(j) are the screenshots of the stereo model, laser-scanned model and fused model, respectively. FIG. 11 (k) is the estimated surface normal map, and FIG. 11 (l) is the re-lighted result of FIG. 11 (c) using the estimated normal map in FIG. 11 (k).
  • FIG. 11 illustrates modeling results of a person whose face is quite different from the laser-scanned model used, as can be seen from the stereo model in FIG. 11 (h) and registered laser-scanned model in FIG. 11 (i). The fused model is presented in FIG. 11 (j). The incorrect mouth and chin are corrected in FIG. 11 (j). FIG. 11 (k) is the estimated surface normal, which is then used for scene relighting as shown in FIG. 11 (l).
  • CONCLUSION
  • Various embodiments may further include receiving, sending or storing instructions and/or data implemented in accordance with the foregoing description upon a computer-accessible medium. Generally speaking, a computer-accessible medium may include storage media or memory media such as magnetic or optical media, e.g., disk or DVD/CD-ROM, volatile or non-volatile media such as RAM (e.g. SDRAM, DDR, RDRAM, SRAM, etc.), ROM, etc., as well as transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as network and/or a wireless link.
  • The various methods as illustrated in the Figures and described herein represent example embodiments of methods. The methods may be implemented in software, hardware, or a combination thereof. The order of the methods may be changed, and various elements may be added, reordered, combined, omitted, modified, etc.
  • Various modifications and changes may be made as would be obvious to a person skilled in the art having the benefit of this disclosure. It is intended that the disclosure embrace all such modifications and changes and, accordingly, the above description to be regarded in an illustrative rather than a restrictive sense.

Claims (31)

1. A method, comprising:
performing, by one or more computers:
receiving a plurality of stereo images of an object of a type and at least one three-dimensional input model of the same type of object;
generating a three-dimensional stereo model of the object from the plurality of stereo images;
computing a confidence measure for the stereo model;
aligning the stereo model with the at least one input model resulting in an aligned model; and
generating a fused model, wherein said generating a fused model comprises combining the stereo model with the aligned model, wherein said combining includes weighting the stereo model and the aligned model based, at least in part, on the confidence measure.
2. The method of claim 1, wherein said computing the confidence measure includes computing a respective confidence measure value for each pixel of the stereo model, wherein said combining includes weighting each pixel of the stereo model and each pixel of the aligned model based, at least in part, on the respective confidence measure values.
3. The method of claim 1, wherein said combining comprises minimizing a sum of a gradient error and a depth error, wherein the gradient error is computed by matching a plurality of gradients of the aligned model with a plurality of gradients of the fused model and the depth error is computed by matching a plurality of depths resulting from the stereo model generation with a plurality of depths resulting from the fused model generation, and wherein the depth error is weighted by the confidence measure.
4. The method of claim 1, wherein the computing the confidence measure includes using an iterative algorithm.
5. The method of claim 4, wherein the computing the confidence measure includes detecting a convergence status of each of a plurality of pixels of the stereo model and accumulating the convergence statuses of the pixels over a plurality of iterations of the algorithm.
6. The method of claim 1, wherein the aligning the stereo model with the at least one input model comprises:
receiving a plurality of inputs to the stereo model corresponding to a plurality of selected points in the at least one input model; and
computing a transformation between the stereo model and the at least one input model, based on the corresponding inputs.
7. The method of claim 6, wherein the aligning the stereo model with the at least one input model further comprises revising the transformation iteratively to minimize the difference between the stereo model and the at least one input model and locally adjusting an area of the transformation.
8. The method of claim 1, further comprising computing a surface normal based on the fused model.
9. The method of claim 8, wherein the computing the surface normal comprises:
generating a rough normal map from the fused model; and
for each pixel of the fused model:
computing an intensity of the pixel;
estimating a light direction based on the rough normal map and the intensity; and
refining a final normal using the estimated light direction.
10. The method of claim 8, further comprising iteratively performing:
the generating the fused model by providing the surface normal as an input to the generating resulting in an iterative fused model; and
the computing the surface normal based on the iterative fused model.
11. The method of claim 8, further comprising refining the fused model based on a shading information of the stereo images and a light direction of the fused model.
12. A system, comprising:
at least one processor; and
a memory comprising program instructions, wherein the program instructions are executable by the at least one processor to:
receive a stereo pair of images of an object of a type and at least one three-dimensional input model of the same type of object;
generate a three-dimensional stereo model of the object from the stereo pair of images of the object;
compute a confidence measure for the stereo model;
align the stereo model with the at least one input model resulting in an aligned model; and
generate a fused model, wherein to generate the fused model comprises combining the stereo model with the aligned model, wherein said combining includes weighting the stereo model and the aligned model based, at least in part, on the confidence measure.
13. The system of claim 12, wherein, to compute the confidence measure, the program instructions are executable by the at least one processor to compute a respective confidence measure value for each pixel of the stereo model, wherein said combining includes weighting each pixel of the stereo model and each pixel of the aligned model based, at least in part, on the respective confidence measure values.
14. The system of claim 12, wherein, to generate the fused model, the program instructions are executable by the at least one processor to:
minimize a sum of a gradient error and a depth error, wherein the gradient error is computed by matching a plurality of gradients of the aligned model with a plurality of gradients of the fused model and the depth error is computed by matching a plurality of depths from the stereo model generation with a plurality of depths from the fused model generation.
15. The system of claim 12, wherein, to determine the confidence measure, the program instructions are executable by the at least one processor to apply an algorithm iteratively.
16. The system of claim 15, wherein, to determine the confidence measure, the program instructions are executable by the at least one processor to:
detect a convergence status of each of a plurality of pixels of the stereo model; and
accumulate the convergence statuses of the pixels over a plurality of iterations of the algorithm.
17. The system of claim 12, wherein, to align the stereo model with the at least one input model, the program instructions are executable by the at least one processor to:
receive a plurality of inputs to the stereo model corresponding to a plurality of selected points in the at least one input model;
compute a transformation between the stereo model and the at least one input model, based on the corresponding inputs; and
revise the transformation iteratively to minimize the difference between the stereo model and the at least one input model.
18. The system of claim 12, further comprising wherein the program instructions are executable by the at least one processor to compute a surface normal based on the fused model.
19. The system of claim 18, wherein, to compute the surface normal, the program instructions are executable by the at least one processor to:
generate a rough normal map from the fused model; and
for each pixel of the fused model:
compute an intensity of the pixel;
estimate a light direction based on the rough normal map and the intensity; and
refine a final normal using the estimated light direction.
20. The system of claim 18, further comprising wherein the program instructions are executable by the at least one processor to iteratively:
generate the fused model by looping the surface normal back as an input to the generation; and
compute the surface normal based on the iterative fused model.
21. The system of claim 18, further comprising wherein the program instructions are executable by the at least one processor to refine the fused model based on a shading information of the stereo images and a light direction of the fused model.
22. A non-transitory computer-readable storage medium storing program instructions, wherein the program instructions are computer-executable to implement:
receiving a plurality of stereo images of an object of a type and at least one three-dimensional input model of the same type of object;
generating a three-dimensional stereo model of the object from the plurality of stereo images;
computing a confidence measure for the stereo model;
aligning the stereo model with the at least one input model resulting in an aligned model; and
generating a fused model, wherein said generating a fused model comprises combining the stereo model with the aligned model, wherein said combining includes weighting the stereo model and the aligned model based, at least in part, on the confidence measure.
23. The computer-readable storage medium of claim 22, wherein said computing the confidence measure includes computing a respective confidence measure value for each pixel of the stereo model, wherein said combining includes weighting each pixel of the stereo model and each pixel of the aligned model based, at least in part, on the respective confidence measure values.
24. The computer-readable storage medium of claim 22, wherein said combining comprises minimizing a sum of a gradient error and a depth error, wherein the gradient error is computed by matching a plurality of gradients of the aligned model with a plurality of gradients of the fused model and the depth error is computed by matching a plurality of depths from the stereo model generation with a plurality of depths from the fused model generation, and wherein the depth error is weighted by the confidence measure.
25. The computer-readable storage medium of claim 22, wherein the computing the confidence measure includes using an iterative algorithm.
26. The computer-readable storage medium of claim 25, wherein the computing the confidence measure includes detecting a convergence status of each of a plurality of pixels of the stereo model and accumulating the convergence statuses of the pixels over a plurality of iterations of the algorithm.
27. The computer-readable storage medium of claim 22, wherein the aligning the stereo model with the at least one input model comprises:
receiving a plurality of inputs to the stereo model corresponding to a plurality of selected points in the at least one input model;
computing a transformation between the stereo model and the at least one input model, based on the corresponding inputs; and
revising the transformation iteratively to minimize the difference between the stereo model and the at least one input model.
28. The computer-readable storage medium of claim 22, further comprising wherein the program instructions are computer-executable to implement computing a surface normal based on the fused model.
29. The computer-readable storage medium of claim 28, wherein, to compute the surface normal, the program instructions are computer-executable to implement:
generating a rough normal map from the fused model; and
for each pixel of the fused model:
computing an intensity of the pixel;
estimating a light direction based on the rough normal map and the intensity; and
refining a final normal using the estimated light direction.
30. The computer-readable storage medium of claim 28, wherein the program instructions are further computer-executable to iteratively implement:
generating the fused model by looping the surface normal back as an input to the generation; and
computing the surface normal based on the iterative fused model.
31. The computer-readable storage medium of claim 28, wherein the program instructions are further computer-executable to implement refining the fused model based on shading information of the stereo images and a light direction of the fused model.
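The confidence-weighted fusion recited in claim 24 can be viewed as minimizing a single energy over the fused depth map. One plausible formulation is sketched below; the quadratic penalties and the balance weight λ are assumptions, since the claim requires only that a confidence-weighted depth error and a gradient error be summed and minimized:

    E(D_f) \;=\; \sum_{p} c(p)\,\bigl(D_f(p) - D_s(p)\bigr)^{2} \;+\; \lambda \sum_{p} \bigl\lVert \nabla D_f(p) - \nabla D_a(p) \bigr\rVert^{2}

where D_s(p) is the depth of pixel p in the stereo model, D_a(p) the depth in the aligned input model, D_f(p) the fused depth being solved for, and c(p) the per-pixel confidence measure of claim 23.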
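Claims 25 and 26 tie the confidence measure to how each pixel of an iterative stereo algorithm converges. The Python sketch below illustrates that idea, assuming the matcher exposes one disparity map per iteration; the function name, the change tolerance, and the normalization to [0, 1] are illustrative choices rather than the patented method.

    import numpy as np

    def convergence_confidence(disparities, tol=0.5):
        """Per-pixel confidence from the convergence of an iterative stereo
        matcher. `disparities` is a list of H x W disparity maps, one per
        iteration; the result is an H x W confidence map in [0, 1]."""
        maps = [np.asarray(d, dtype=np.float64) for d in disparities]
        converged = np.zeros_like(maps[0])
        for prev, curr in zip(maps[:-1], maps[1:]):
            # Count a pixel as converged at this iteration if its disparity
            # changed by less than `tol` since the previous iteration.
            converged += np.abs(curr - prev) < tol
        return converged / max(len(maps) - 1, 1)

Pixels whose disparity keeps changing across iterations accumulate few "converged" votes and therefore receive low confidence, so the fusion of claim 22 can lean more heavily on the aligned input model in those regions.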
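Claim 27 seeds the alignment with user-selected point correspondences and then refines the transformation iteratively. A minimal sketch of that flow is a least-squares rigid fit (Kabsch/Procrustes) to the picked pairs, followed by an ICP-style loop that re-pairs points and re-fits; the rigid-only motion model, the brute-force nearest-neighbour search, and the fixed iteration count are simplifying assumptions.

    import numpy as np

    def rigid_fit(src, dst):
        """Least-squares rotation R and translation t with dst ~ R @ src + t
        (Kabsch). src and dst are N x 3 arrays of corresponding points."""
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        H = (src - src_c).T @ (dst - dst_c)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:      # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        return R, dst_c - R @ src_c

    def align_stereo_to_input(stereo_pts, input_pts, picked_stereo, picked_input,
                              n_iters=20):
        """Initialise from the selected correspondences, then iteratively
        re-match each stereo point to its nearest input-model point and re-fit."""
        R, t = rigid_fit(picked_stereo, picked_input)
        for _ in range(n_iters):
            moved = stereo_pts @ R.T + t
            d2 = ((moved[:, None, :] - input_pts[None, :, :]) ** 2).sum(axis=-1)
            R, t = rigid_fit(stereo_pts, input_pts[d2.argmin(axis=1)])
        return R, t

In practice a k-d tree and an outlier-rejection step would replace the brute-force matching, but the overall structure, correspondence-based initialisation followed by iterative refinement, mirrors the claim.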
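Claim 29 derives a rough normal map from the fused model, estimates a light direction from those normals and the pixel intensities, and then refines the normals. The sketch below covers the first two steps under a Lambertian shading assumption (intensity ~ n·l plus an ambient term); the final per-pixel normal refinement, which would adjust each normal so the fitted lighting reproduces the observed intensity, is not spelled out in the claim and is omitted here. All function names are illustrative.

    import numpy as np

    def rough_normals_from_depth(depth):
        """Rough per-pixel surface normals from the fused depth map via
        finite differences."""
        dzdy, dzdx = np.gradient(depth)                   # rows, then columns
        n = np.dstack((-dzdx, -dzdy, np.ones_like(depth)))
        return n / np.linalg.norm(n, axis=2, keepdims=True)

    def estimate_light(normals, intensity):
        """Least-squares fit of a directional light plus an ambient term,
        assuming Lambertian shading: I ~ n.l + a."""
        N = normals.reshape(-1, 3)
        A = np.hstack((N, np.ones((N.shape[0], 1))))      # columns nx, ny, nz, 1
        x, *_ = np.linalg.lstsq(A, intensity.reshape(-1), rcond=None)
        light, ambient = x[:3], x[3]
        return light / np.linalg.norm(light), ambient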
US12/952,431 2010-08-20 2010-11-23 Model-based stereo matching Active 2031-07-23 US8447098B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/952,431 US8447098B1 (en) 2010-08-20 2010-11-23 Model-based stereo matching

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US37553610P 2010-08-20 2010-08-20
US12/952,431 US8447098B1 (en) 2010-08-20 2010-11-23 Model-based stereo matching

Publications (2)

Publication Number Publication Date
US8447098B1 (en) 2013-05-21
US20130129190A1 (en) 2013-05-23

Family

ID=48365370

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/952,431 Active 2031-07-23 US8447098B1 (en) 2010-08-20 2010-11-23 Model-based stereo matching

Country Status (1)

Country Link
US (1) US8447098B1 (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2158573A1 (en) * 2007-06-20 2010-03-03 Thomson Licensing System and method for stereo matching of images
TW201308253A (en) * 2011-08-04 2013-02-16 Univ Nat Taiwan Locomotion analysis method and locomotion analysis apparatus applying the same method
US9201580B2 (en) 2012-11-13 2015-12-01 Adobe Systems Incorporated Sound alignment user interface
US9355649B2 (en) 2012-11-13 2016-05-31 Adobe Systems Incorporated Sound alignment using timing information
US10249321B2 (en) 2012-11-20 2019-04-02 Adobe Inc. Sound rate modification
US9589326B2 (en) * 2012-11-29 2017-03-07 Korea Institute Of Science And Technology Depth image processing apparatus and method based on camera pose conversion
US9451304B2 (en) 2012-11-29 2016-09-20 Adobe Systems Incorporated Sound feature priority alignment
US10455219B2 (en) 2012-11-30 2019-10-22 Adobe Inc. Stereo correspondence and depth sensors
US9208547B2 (en) 2012-12-19 2015-12-08 Adobe Systems Incorporated Stereo correspondence smoothness tool
US10249052B2 (en) * 2012-12-19 2019-04-02 Adobe Systems Incorporated Stereo correspondence model fitting
US9214026B2 (en) 2012-12-20 2015-12-15 Adobe Systems Incorporated Belief propagation and affinity measures
CA2902430C (en) 2013-03-15 2020-09-01 Uber Technologies, Inc. Methods, systems, and apparatus for multi-sensory stereo vision for robotics
US9224060B1 (en) * 2013-09-17 2015-12-29 Amazon Technologies, Inc. Object tracking using depth information
US9507995B2 (en) * 2014-08-29 2016-11-29 X Development Llc Combination of stereo and structured-light processing
US10404969B2 (en) 2015-01-20 2019-09-03 Qualcomm Incorporated Method and apparatus for multiple technology depth map acquisition and fusion
JP6545997B2 (en) * 2015-04-24 2019-07-17 日立オートモティブシステムズ株式会社 Image processing device
US9922452B2 (en) * 2015-09-17 2018-03-20 Samsung Electronics Co., Ltd. Apparatus and method for adjusting brightness of image
CN106887018B (en) * 2015-12-15 2021-01-05 株式会社理光 Stereo matching method, controller and system
CN105701787B (en) * 2016-01-15 2019-04-12 四川大学 Depth map fusion method based on confidence level
US10656624B2 (en) 2016-01-29 2020-05-19 Hewlett-Packard Development Company, L.P. Identify a model that matches a 3D object
US10077007B2 (en) * 2016-03-14 2018-09-18 Uber Technologies, Inc. Sidepod stereo camera system for an autonomous vehicle
US20170359561A1 (en) * 2016-06-08 2017-12-14 Uber Technologies, Inc. Disparity mapping for an autonomous vehicle
US10967862B2 (en) 2017-11-07 2021-04-06 Uatc, Llc Road anomaly detection for autonomous vehicle
CN108537187A (en) * 2017-12-04 2018-09-14 深圳奥比中光科技有限公司 Task executing method, terminal device and computer readable storage medium
CN110349196A (en) * 2018-04-03 2019-10-18 联发科技股份有限公司 The method and apparatus of depth integration
US11100704B2 (en) * 2018-12-14 2021-08-24 Hover Inc. Generating and validating a virtual 3D representation of a real-world structure
CN109919993B (en) * 2019-03-12 2023-11-07 腾讯科技(深圳)有限公司 Parallax map acquisition method, device and equipment and control system
US11854279B2 (en) * 2020-05-25 2023-12-26 Subaru Corporation Vehicle exterior environment recognition apparatus

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7616198B2 (en) 1998-02-20 2009-11-10 Mental Images Gmbh System and computer-implemented method for modeling the three-dimensional shape of an object by shading of a two-dimensional image of the object
EP1412917B1 (en) 2000-03-08 2008-04-30 Cyberextruder.com, Inc. Apparatus and method for generating a three-dimensional representation from a two-dimensional image
KR100682889B1 (en) 2003-08-29 2007-02-15 삼성전자주식회사 Method and Apparatus for image-based photorealistic 3D face modeling
US7415152B2 (en) 2005-04-29 2008-08-19 Microsoft Corporation Method and system for constructing a 3D representation of a face from a 2D representation
US7756325B2 (en) 2005-06-20 2010-07-13 University Of Basel Estimating 3D shape and texture of a 3D object based on a 2D image of the 3D object
US7856125B2 (en) 2006-01-31 2010-12-21 University Of Southern California 3D face reconstruction from 2D images
US20090135177A1 (en) 2007-11-20 2009-05-28 Big Stage Entertainment, Inc. Systems and methods for voice personalization of video content
KR100914846B1 (en) 2007-12-15 2009-09-02 한국전자통신연구원 Method and system for texturing of 3d model in 2d environment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6757086B1 (en) * 1999-11-12 2004-06-29 Sony Corporation Hologram forming apparatus and method, and hologram
US8094928B2 (en) * 2005-11-14 2012-01-10 Microsoft Corporation Stereo video for gaming
US20110025827A1 (en) * 2009-07-30 2011-02-03 Primesense Ltd. Depth Mapping Based on Pattern Matching and Stereoscopic Information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Encisco, E., "Synthesis of 3D Faces," 1999, Integrated Media Systems Center, University of Southern California. *
Klaudiny, Martin, "High-Detail 3D Capture of Facial Performance," May 2010, University of Surrey, Guildford GU2 7XH, UK. *
Sun, Jian, "Stereo Matching Using Belief Propagation," July 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25. *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9066075B2 (en) * 2009-02-13 2015-06-23 Thomson Licensing Depth map coding to reduce rendered distortion
US20110292043A1 (en) * 2009-02-13 2011-12-01 Thomson Licensing Depth Map Coding to Reduce Rendered Distortion
US11470303B1 (en) 2010-06-24 2022-10-11 Steven M. Hoffberg Two dimensional to three dimensional moving image converter
US20180276805A1 (en) * 2011-11-11 2018-09-27 Edge 3 Technologies, Inc. Method and Apparatus for Enhancing Stereo Vision
US10037602B2 (en) * 2011-11-11 2018-07-31 Edge 3 Technologies, Inc. Method and apparatus for enhancing stereo vision
US8718387B1 (en) * 2011-11-11 2014-05-06 Edge 3 Technologies, Inc. Method and apparatus for enhanced stereo vision
US10825159B2 (en) * 2011-11-11 2020-11-03 Edge 3 Technologies, Inc. Method and apparatus for enhancing stereo vision
US20160239979A1 (en) * 2011-11-11 2016-08-18 Edge 3 Technologies, Inc. Method and Apparatus for Enhancing Stereo Vision
US9554120B2 (en) * 2012-06-29 2017-01-24 Samsung Electronics Co., Ltd. Apparatus and method for generating depth image using transition of light source
US20140002609A1 (en) * 2012-06-29 2014-01-02 Samsung Electronics Co., Ltd. Apparatus and method for generating depth image using transition of light source
US20140368506A1 (en) * 2013-06-12 2014-12-18 Brigham Young University Depth-aware stereo image editing method apparatus and computer-readable medium
US10109076B2 (en) * 2013-06-12 2018-10-23 Brigham Young University Depth-aware stereo image editing method apparatus and computer-readable medium
US20150062302A1 (en) * 2013-09-03 2015-03-05 Kabushiki Kaisha Toshiba Measurement device, measurement method, and computer program product
EP3061071A4 (en) * 2013-10-21 2017-06-21 Nokia Technologies OY Method, apparatus and computer program product for modifying illumination in an image
US10015464B2 (en) 2013-10-21 2018-07-03 Nokia Technologies Oy Method, apparatus and computer program product for modifying illumination in an image
CN105794196A (en) * 2013-10-21 2016-07-20 诺基亚技术有限公司 Method, apparatus and computer program product for modifying illumination in an image
DE102015223003A1 (en) * 2015-11-20 2017-05-24 Bitmanagement Software GmbH Device and method for superimposing at least a part of an object with a virtual surface
US10127635B2 (en) * 2016-05-30 2018-11-13 Novatek Microelectronics Corp. Method and device for image noise estimation and image capture apparatus
US20170345131A1 (en) * 2016-05-30 2017-11-30 Novatek Microelectronics Corp. Method and device for image noise estimation and image capture apparatus
US10462445B2 (en) 2016-07-19 2019-10-29 Fotonation Limited Systems and methods for estimating and refining depth maps
US10839535B2 (en) 2016-07-19 2020-11-17 Fotonation Limited Systems and methods for providing depth map information
US10643375B2 (en) 2018-02-26 2020-05-05 Qualcomm Incorporated Dynamic lighting for objects in images
CN111742347A (en) * 2018-02-26 2020-10-02 高通股份有限公司 Dynamic illumination for objects in an image
US10818081B2 (en) 2018-02-26 2020-10-27 Qualcomm Incorporated Dynamic lighting for objects in images
WO2019164631A1 (en) * 2018-02-26 2019-08-29 Qualcomm Incorporated Dynamic lighting for objects in images
US11210803B2 (en) * 2018-03-14 2021-12-28 Dalian University Of Technology Method for 3D scene dense reconstruction based on monocular visual slam
WO2020027584A1 (en) * 2018-08-01 2020-02-06 Samsung Electronics Co., Ltd. Method and an apparatus for performing object illumination manipulation on an image
US11238302B2 (en) 2018-08-01 2022-02-01 Samsung Electronics Co., Ltd. Method and an apparatus for performing object illumination manipulation on an image

Also Published As

Publication number Publication date
US8447098B1 (en) 2013-05-21

Similar Documents

Publication Publication Date Title
US8447098B1 (en) Model-based stereo matching
US10803546B2 (en) Systems and methods for unsupervised learning of geometry from images using depth-normal consistency
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
US11514593B2 (en) Method and device for image processing
US10217234B2 (en) Modeling method and apparatus using three-dimensional (3D) point cloud
US10553026B2 (en) Dense visual SLAM with probabilistic surfel map
US9830701B2 (en) Static object reconstruction method and system
US8610712B2 (en) Object selection in stereo image pairs
JP7403528B2 (en) Method and system for reconstructing color and depth information of a scene
EP2874118B1 (en) Computing camera parameters
Tanskanen et al. Semi-direct EKF-based monocular visual-inertial odometry
US9098930B2 (en) Stereo-aware image editing
US20180189956A1 (en) Producing a segmented image using markov random field optimization
WO2019011958A1 (en) System and method for pose-invariant face alignment
CN101866497A (en) Binocular stereo vision based intelligent three-dimensional human face rebuilding method and system
CN104661010A (en) Method for building a three-dimensional model and apparatus thereof
US9437034B1 (en) Multiview texturing for three-dimensional models
US9147279B1 (en) Systems and methods for merging textures
EP3665651B1 (en) Hierarchical disparity hypothesis generation with slanted support windows
EP3343507B1 (en) Producing a segmented image of a scene
CN105590327A (en) Motion estimation method and apparatus
Roxas et al. Variational fisheye stereo
Concha et al. An evaluation of robust cost functions for RGB direct mapping
Delaunoy et al. Towards full 3D Helmholtz stereovision algorithms
EP3343504A1 (en) Producing a segmented image using markov random field optimization

Legal Events

Date Code Title Description
AS Assignment

Owner name: ADOBE SYSTEMS INCORPORATED, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:COHEN, SCOTT D.;YANG, QING-XIONG;REEL/FRAME:025400/0153

Effective date: 20101119

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: ADOBE INC., CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:ADOBE SYSTEMS INCORPORATED;REEL/FRAME:048867/0882

Effective date: 20181008

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8