WO2001026050A2 - Improved image segmentation processing by user-guided image processing techniques - Google Patents

Improved image segmentation processing by user-guided image processing techniques

Info

Publication number
WO2001026050A2
WO2001026050A2 PCT/US2000/027347
Authority
WO
WIPO (PCT)
Prior art keywords
image
region
interest
outline
algorithm
Prior art date
Application number
PCT/US2000/027347
Other languages
French (fr)
Other versions
WO2001026050A3 (en)
Inventor
William Dolson
Stanley J. Chayka
Original Assignee
A.F.A. Products Group, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by A.F.A. Products Group, Inc. filed Critical A.F.A. Products Group, Inc.
Priority to AU78539/00A priority Critical patent/AU7853900A/en
Publication of WO2001026050A2 publication Critical patent/WO2001026050A2/en
Publication of WO2001026050A3 publication Critical patent/WO2001026050A3/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Definitions

  • This invention relates to systems and methods for processing image signals. More particularly, the present invention pertains to improved systems and methods for segmenting images to generate key signals and for other purposes, and further is concerned with manipulation of key signals.
  • In one known application, image segmentation is employed to select a particular object in an image for color correction separate from the balance of the image.
  • In one such image segmentation technique, the human operator attempts to fit a rectangular or circular window to an object to be color-corrected. Since most objects to be selected are neither circular nor rectangular, the fitting of the window to the object is usually inexact to a considerable extent. Even if the edges of the window are blurred or softened, the resulting color correction applied to the area of the window often produces unsatisfactory results.
  • In another known technique, referred to as "rotoscoping," the operator draws the boundary of a window under high magnification at a pixel-by-pixel level to outline the boundary of an object to be selected for color correction.
  • The rotoscope technique can result in windows that are very precisely matched to the object's outline, thereby producing high quality results.
  • However, this technique is very time-consuming and labor-intensive, and therefore costly.
  • an object of the invention is to satisfy the above needs and to provide a system and method for segmenting images with increased accuracy, efficiency, speed and convenience.
  • a further object is to efficiently apply an image segmentation algorithm to a dynamic sequence of images with limited operator guidance.
  • Another object of the invention is to provide an apparatus and method which quickly and accurately generate a key signal to isolate a desired object for color correction or other image processing.
  • An additional object of the invention is to provide an improved user interface for generating matte and key signals.
  • the invention satisfies the needs identified above and meets the foregoing objects by providing a method in which flexible tools for guidance by a human operator are combined with sophisticated machine analysis techniques to produce better and more accurate object selection windows than have heretofore been practical.
  • a first image of a sequence of images is displayed on a display device, and a region of interest is designated by the operator.
  • An image segmentation algorithm is applied to the first image to generate an outline in the region of interest, the image segmentation algorithm being constrained to operate only within the region of interest.
  • Another algorithm provides an indication of the motion of an object corresponding to the outline between the first image and a second image of the sequence of images and the region of interest is repositioned on the basis of the indicated motion of the object.
  • the image segmentation algorithm is then applied to the second image to generate a second outline in the repositioned region of interest.
  • a method of segmenting an image plane on the basis of features of an image displayed in the image plane includes the following steps: displaying the image on a display device, using a drawing device to superimpose a free-hand drawing figure on the image displayed on the display device (the free-hand drawing figure defining a band-shaped region of interest in the image plane formed as the locus of a circle moved in an arbitrary manner), applying an image analysis algorithm to the displayed image (the image analysis algorithm being constrained to operate only within the region of interest defined by the free-hand drawing figure and the algorithm operating without reference to any portion of the image outside of the region of interest), and segmenting the image plane on the basis of a result provided by application of the image analysis algorithm.
  • a process for extracting features from an image includes applying an edge detector algorithm to pixel information arrayed in a region of interest in an image plane.
  • the edge detector algorithm generates edge information from the pixel information.
  • the process further includes the step of applying a bias function to the edge information to emphasize components of the edge information at the central portions of the region of interest, thereby producing biased edge information.
  • an edge-modulated softness function is provided with respect to a key signal.
  • a key boundary is generated by means of an edge detection algorithm, and the algorithm generates for each pixel on the key boundary edge-degree data which indicates a degree of definiteness of an edge at the respective pixel.
  • a softness function is adjusted along the key boundary in dependence on the edge-degree data. The degree of softness is increased at points on the key boundary where a less definite edge was found.
  • a softness function is adjusted on the basis of an operator input signal to provide a "clean up" function.
  • a key boundary is generated, a first region bordered by the key boundary is designated to be an inside region and a second region bordered by the key boundary is designated to be an outside region.
  • a softness function is applied to the key boundary to generate a gradient in a key signal between the inside region and the outside region.
  • the softness function is adjusted so that the slope of the gradient is increased on a side adjacent to the outside region without changing the slope of the gradient on a side adjacent the inside region.
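The asymmetric softness adjustment described above can be pictured as a piecewise ramp over a signed distance to the key boundary. The following is a minimal sketch, not from the patent text: the function name, the signed-distance representation, and the linear ramp shape are all illustrative assumptions.

```python
import numpy as np

def soft_key(d, inner_width, outer_width):
    """Key value from signed distance d to the key boundary
    (d > 0 toward the inside region, d < 0 toward the outside).
    The ramp passes through 0.5 at the boundary itself; each half of
    the gradient has its own width, so reducing outer_width steepens
    the slope on the outside side only, leaving the inside slope
    unchanged (the "clean up" adjustment). Widths must be positive."""
    key = np.empty_like(d, dtype=float)
    inside = d >= 0
    key[inside] = 0.5 + 0.5 * np.clip(d[inside] / inner_width, 0.0, 1.0)
    key[~inside] = 0.5 - 0.5 * np.clip(-d[~inside] / outer_width, 0.0, 1.0)
    return key
```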
  • the features of the invention allow for highly efficient image segmentation, in which a desired object in a dynamic stream of images may be selected for subsequent processing with great accuracy.
  • the results obtained rival those which could be achieved in the prior art only by use of a rotoscope, but without the tedious and extremely time consuming high-magnification work required by the rotoscope.
  • the present invention represents an orders-of-magnitude improvement in speed and convenience.
  • the techniques of the present invention may advantageously be embodied in an external matte/key generator to be provided as a peripheral device for a color correction apparatus.
  • the techniques of the present invention are also applicable to many other functions, such as image compositing, editing of still images generally, desk top publishing applications, video and motion picture production, colorizing of black and white films, and 3-D graphics displays.
  • FIG. 1 is a block diagram of an image processing system in which the present invention is applied.
  • Fig. 2 is a block diagram of personal computer hardware which may constitute a portion of an image segmentation component shown in Fig. 1.
  • FIGs. 3 and 4 together schematically illustrate image segmentation and key signal manipulation processes carried on in accordance with the invention.
  • FIGs. 5A and 5B pictorially illustrate key signal manipulation processes carried out in accordance with the invention.
  • Fig. 6 is a screen display which shows an image to be processed for image segmentation as well as certain control options made available to a human operator.
  • Fig. 7 is a screen display similar to Fig. 6, but also showing a partial drawing figure superimposed on the image to select a portion of the image for color correction.
  • Fig. 8 is another screen display, showing the complete drawing figure which selects an image portion for color correction.
  • Fig. 9 is another screen display, showing a region of interest designated by the human operator for image segmentation purposes, as well as associated inside and outside regions and an extended region of interest.
  • Fig. 10 is a pictorial illustration of certain calculations included in processes shown in Fig. 3.
  • Fig. 11 is another screen display, illustrating the locus of a center of the region of interest.
  • Fig. 12 is another screen display, showing a mapping of edge detection information generated by reference to luminance image information.
  • Fig. 13 is a screen display similar to Fig. 12 but showing a mapping of edge detection information based on color image information.
  • Fig. 14 is still another similar screen display, showing a combination of the luminance and color edge detection maps.
  • Fig. 15 is a pictorial illustration of a step included in the processes illustrated in Fig. 3.
  • Fig. 16 is a screen display similar to Figs. 12 - 14, and illustrating a result of applying a biasing function to the combined edge information map shown in Fig. 14.
  • Fig. 17 is another screen display, illustrating edge gradient data calculated from the biased edge data illustrated in Fig. 16.
  • Fig. 18 is a screen display illustrating the effect of applying a diffusion function to the edge gradient data illustrated in Fig. 17.
  • Fig. 19 is still another screen display, showing the image that was processed, together with an outline which is the outcome of the image segmentation process of the present invention.
  • Fig. 20 is a screen display which shows a key mask produced from the image segmentation process.
  • Fig. 21 is another screen display, illustrating an outline adjustment mode provided in accordance with the invention.
  • Fig. 22 is a screen display which is similar to Fig. 20, but showing a key mask to which a softness function has been applied.
  • Fig. 23 is another screen display, showing how the key mask of Fig. 22 selects a portion of the image.
  • Fig. 1 shows an image processing system 100 in which the present invention is employed.
  • the image processing system 100 includes an image information source device 102 which provides information representative of images to be processed.
  • the image processing system 100 is employed for color correction and includes a color-correction device 104 which receives image information representing images to be color-corrected from the source device 102.
  • the image processing system 100 also includes an image segmentation device 106 which receives image information from the source device 102.
  • the image segmentation device 106 processes the image information to generate key signals which are output from the image segmentation device 106 to the color correction device 104.
  • the color correction device 104 uses the key signals generated by the image segmentation device 106 to control color correction processes in the color correction device 104.
  • the source device 102 may be any conventional memory or mass storage device used to store digital image information.
  • the source device 102 may also, or alternatively, include conventional film and television record and playback devices including telecine transfer systems and film projectors, and video tape and disc players and recorders. If such devices are employed, there preferably is a mechanism for synchronizing the segmentation device 106 and the source 102 so that the key signal output from segmentation device 106 is provided to the color corrector 104 synchronously with the corresponding image from source 102.
  • a digital camera or a transmission facility may be substituted for the image information source device 102 as the source of the image information fed to the color correction device 104 and to the image segmentation device 106.
  • the color correction device 104 may also be a conventional item, and preferably is either one of the ColorVision Stealth and ColorVision Copernicus color correctors, which are available from the assignee of the present application.

Hardware Aspects of Image Segmentation Device
  • the image segmentation device 106 is preferably implemented with standard PC hardware programmed with software provided in accordance with the invention. It is also preferred that the image segmentation device 106 have enhanced digital image storage capabilities by incorporating an integrated digital disk recorder board such as the ClipStation PRO, which is commercially available from DVS GmbH, Hannover, Germany.
  • Fig. 2 provides a simplified overview of the hardware which makes up the image segmentation device 106.
  • the hardware components of the image segmentation device 106 include a microprocessor 110 which controls the over-all operation of the image segmentation device 106 and also carries out image segmentation processes in accordance with the invention.
  • memory 112 which is a RAM for storing a program to control the microprocessor 110 and also functions as a working memory, and mass storage 116 in which image information to be processed by the image segmentation device 106 may be stored.
  • program information may also be stored in the storage device 116.
  • the mass storage 116 may correspond to the above-referenced integrated digital disk recorder board. Or, the mass storage 116 may be a combination of the recorder board and a standard hard disk, or simply a standard hard disk alone.
  • the image segmentation device 106 receives image information to be processed from the information storage device 102 and transmits key signal information to the color correction device 104.
  • the user interface for the image segmentation device 106 includes a display device 120 driven by the microprocessor 110 and a drawing device 122 connected to the microprocessor.
  • the drawing device 122 is the Intuos II stylus and tablet/mouse peripheral which is commercially available from Wacom Technology Corporation, Vancouver, Washington.
  • the image segmentation device 106 may also include other input/output components (not shown in the drawing) which are standard in personal computers, such as a keyboard, speakers, etc.
  • the drawing device 122 may be constituted by only one of a stylus/tablet or mouse, and/or by a trackball, light pen or touch screen.

Indicating Region of Interest in Image
  • Assume that the image segmentation device 106 has received image information from the information storage device 102.
  • the received image information may represent a single image to be segmented or may represent plural images, including images making up a dynamic sequence of images (e.g. a video clip).
  • In the case of a video clip, either complete frames or individual fields may be processed, depending on the origin of the video clip.
  • the discussion to follow will be concerned with segmentation of images in a video clip, but many of the segmentation techniques to be described are also applicable to still images. It will further be assumed that a first image in the video clip has been selected for processing by the human operator.
  • an image 208 which is to be segmented for color correction is displayed in an image window 210 in a graphical user interface screen 212.
  • the human operator is permitted to input signals by means of the drawing device 122 to implement a software drawing tool by which the operator roughly indicates a desired segmentation of the image 208.
  • the task to be performed is color correction of the flesh tones of the model 214 who is seen in image 208.
  • the width of the drawing figure to be drawn by the drawing tool can be adjusted by means of slide bar 215 and the currently selected width is indicated at 217.
  • Fig. 7 is a screen display similar to Fig. 6 but showing a region 216 which is generated by the image segmentation device 106 by operation of the software drawing tool in response to signals input by the human operator via the drawing device 122.
  • the region 216 is a partial rough outline of portions of the image which correspond to the model's skin.
  • the region 216 is in the form of an extended band.
  • the region 216 is defined as the locus of a circle moved in an arbitrary manner as indicated by the drawing device 122.
  • the region 216 also corresponds to the portion of the image plane between a pair of substantially parallel free-hand drawing lines 218 and 220 which are simultaneously generated on the screen as the human operator draws using the drawing device 122.
  • the lines 218 and 220 are "substantially parallel" in the sense that the distance across region 216 in the direction normal to lines 218, 220 is substantially constant along the length of region 216. This distance across region 216 is equal to the width of the drawing tool as selected by means of slide bar 215 and indicated by feature 217. As is common with free-hand software drawing tools, the lines 218 and 220 may be either curved or straight, and in general may be used to define an arbitrary, irregularly shaped region.
  • conventional drawing software packages include: (a) shape tools by which predetermined geometric shapes such as rectangles or other polygons, circles and ovals are created, positioned and stretched or shrunk or otherwise manipulated, (b) "connect-the-dots" tools by which straight line segments are generated between control points established by the user, and (c) free-hand tools in which a line is generated on the screen without any restriction as to shape and governed solely by the locus through which the mouse or other drawing instrument is moved, akin to doodling with a pencil on a piece of paper.
  • the software drawing tool which generates region 216 is of the latter type, having a user- adjustable width, which is also a conventional feature.
  • the region 216 may be drawn with a single continuous stroke, this is not required.
  • the region 216 may also be indicated with multiple disconnected strokes, may have an irregular border, may be defined by repeated short motions or sketching by the drawing device, may be filled by additional strokes along the outside and/or the inside, and may have multiple branches and regions. There is no restriction on the manner or order in which the region is drawn and there is no restriction on the shape of the region. However, for best results the region should be shaped and positioned so that the desired object boundary is approximately at the center of the region.
  • the region 216 may be indicated on the screen display by changing the luminance level and/or a color tint in the region relative to the balance of the image. Features of the underlying image remain visible in the region 216 (see, e.g., the model's right hand at 222) and thus are not occluded by the region 216.
  • Fig. 8 is another screen display similar to Figs. 6 and 7, but showing the region 216 after it has been completed so as to surround the entire portion of the image which corresponds to the model's skin.
  • Each of the lines 218 and 220 (which together define the region 216) forms a respective closed freehand figure, with the figure defined by line 220 being entirely contained within the figure defined by line 218. It will be understood that the region 216 itself constitutes a free-hand drawing figure.
  • the background 224 of the image, corresponding to the area outside of the region 216 is at a reduced luminance relative to the region 216, and the inside 226 of the area selected for color correction is at an increased luminance level relative to the region 216.
  • the designation of the "inside" region may be made automatically by the system 100 (e.g. by selecting the smaller of two regions partitioned in the image plane by the region 216) or may be designated by the operator. A designation made by the system may be over-ridden by the operator.
  • By drawing the region 216 in the image plane, the human operator indicates to the image segmentation device 106 a specific, limited portion of the image in which the image segmentation device is to perform image segmentation processing to find boundaries of an object to be color corrected.
  • the region drawn by the operator may be referred to as a region of interest (ROI), and appears in Fig. 9 as a shaded freehand drawing figure 216, corresponding to the "highlighted" region 216 of Fig. 8.
  • An inside region 232 indicated in white in Fig. 9, is bordered by the region of interest 216 and represents a portion of the image which is entirely inside the object selected by the operator.
  • a background or outside region 234 is also bordered by the region of interest 216 and is indicated in dark tones in Fig. 9.
  • the operator may again have the option to modify the region of interest 216 by using the drawing device 122.
  • a single object was selected for color correction by means of a single closed drawing figure which defines a single closed region of interest.
  • a preferred embodiment of the invention provides many other options to the human operator in terms of selecting objects and drawing regions of interest.
  • the region of interest need not be a closed drawing figure, but rather can be terminated at one or more sides of the image plane.
  • the drawing figure to define the region of interest need not simply be drawn with one continuous stroke of the drawing tool.
  • the region of interest may be expanded by drawing additional strokes with the drawing tool to shade or fill in the region of interest either to the inside or outside or both.
  • the operator may increase the width of the region of interest at particular portions of the ROI which had previously been designated and displayed.
  • an object which surrounds an area which is not to be considered the object may be defined by means of two unconnected regions of interest.
  • For example, in the case of a doughnut-shaped object, one region of interest would be drawn to indicate the outer perimeter of the doughnut, and a second region of interest drawn to indicate the inner perimeter of the doughnut.
  • the image segmentation device performs two image segmentation processes, one constrained to the first region of interest and the second constrained to the second region of interest.
  • a preferred embodiment of the invention also allows the operator to select several objects in the image for color correction simultaneously, using respective regions of interest to select each of the objects. For example, in the image shown in Fig. 6, the operator could draw a respective region of interest around each one of several of the flowers shown in the image, and the image segmentation device would then find the boundaries of each of the flowers to generate a key map made up of several disjoint parts. If more than one object is selected in an image, different luminance levels or color tints may be displayed in the respective regions corresponding to the selected objects, to indicate that different post-processes, such as different color correction processes, are to be applied to the various objects.
  • the region of interest can also be processed in a "skeleton" mode (accessible by the control 328 shown in Fig. 6).
  • the image segmentation device automatically analyzes drawing figures generated in this mode to derive a "skeleton" of the drawing figure in accordance with known image analysis techniques.
  • "Skeleton" is a term of art that is well understood in the context of image analysis processing.
  • the resulting skeleton is then automatically designated to be an inside region.
  • This mode is particularly useful when it is desired to select for color correction thin linear objects such as plant stems or birds' legs. The selection can simply be done by drawing a linear stroke of the drawing tool, in the skeleton mode, along the length of the object to be selected. If such a linear stroke is drawn so as to be attached to an existing region of interest with an inside designated region surrounded by the region of interest, the interior region designated as the skeleton will be connected to the designated inside region.
  • Another mode of operating the drawing tool, which may be referred to as a strand mode, is somewhat similar to the skeleton mode, but does not require analysis of a drawing figure to find the skeleton thereof.
  • figures drawn using the strand tool automatically include a third line which appears on the screen parallel to and halfway between the two lines which are effectively defined by the left and right sides of the band drawn by the drawing tool.
  • This inner line is automatically designated to be an inside region relative to the region of interest defined by the locus of the figure drawn by the drawing tool.
  • this tool simultaneously draws three free-hand lines in parallel to each other with equal spacing between the first and second line and between the second and third line.
  • the region of interest is defined between the first and second line and between the second and third line with the second line itself being a narrow inside region.
  • the result would be three closed line figures with the second contained inside the first and the third contained inside the second.
  • a first region of interest defined between the first and second line figures would be subjected to an image segmentation operation, as would a second region of interest defined between the second and third line figures.
  • The region of interest may also be designated with a drawing tool of the type, well known from computer drawing software packages, in which straight lines (of adjustable width) are drawn sequentially between control points selected by the operator.
  • the present invention also contemplates a further alternative drawing tool to be used to designate a region of interest instead of or in addition to the tools described hereinabove.
  • the operator is permitted to draw a single fine line completely inside or completely outside the object of interest.
  • the line may be automatically closed in accordance with conventional techniques if the operator so selects.
  • An operator- actuatable control then causes the width of the line to be increased toward the inside or outside of the object, as the case may be, until the widened line covers the object boundary.
  • the widening of the line is continued by the operator until the line is more or less evenly divided by the object boundary.
  • the widened line now may be taken to be a region of interest in which image segmentation may be performed.
  • a drawing tool having some width can be used to adjust the region of interest by adding to it or erasing parts of it. It is also contemplated to apply the width-increasing step described above to a line drawn with a tool having some substantial width, i.e. to a band-shaped region as previously described.
  • Another drawing tool (indicated at 330 in Fig. 6), referred to as the "mesh" tool, is available to the operator to allow him or her to designate sections of the region of interest for carrying out a supplemental detail finding algorithm in areas where the desired object's boundary is highly complex.
  • the designation of such a supplemental region of interest is indicated by block 411 in Fig. 3.
  • the supplemental detail ROI sections generated at block 411 are displayed with a mesh pattern or other distinctive marking to distinguish them from the main ROI section 216 seen in Fig. 8. (No such supplemental ROI section is shown in the drawings.)
  • the supplemental ROI may be attached as an appendage to the main ROI, so that the supplemental ROI has its position changed as the main ROI has its position changed in accordance with practices to be described below.
  • the supplemental ROI may be appended so as to fall along the center of the main ROI, or may have offset data associated with the supplemental ROI so that the supplemental ROI is appended inside or outside the rough boundary indicated by the main ROI. It may also be desired to draw the entire ROI with only the mesh tool, in which case the main ROI and supplemental ROI are the same.
  • an extended region of interest (EROI) 236 extends both inwardly and outwardly from the ROI 216.
  • Key signal processing (to be described below) and motion measurement may be constrained to occur only within the EROI 236.
  • the image segmentation device proceeds to perform further processing to segment the image according to the guidance provided by the operator. This processing is represented by blocks 412 and 413 in Fig. 3.
  • the image segmentation device extracts luminance and color component information for the portion of the image corresponding to ROI 216.
  • the process then may continue according to any one of a large number of feature extraction techniques.
  • simple (Sobel) edge detection processing is used to generate an "external force field" which drives a “snake” (active contour) to find the object boundary.
  • the segmentation device calculates, for each pixel in the ROI, a metric which will be referred to as the "ROI distance" (DROI).
  • Fig. 10 schematically illustrates how this data is calculated.
  • a portion of an ROI 216 is shown, including a pixel 242 for which a DROI value is to be calculated.
  • the calculation is based on the distance DI between the pixel 242 and a pixel 244 which is the closest pixel in the inside region 232 to pixel 242.
  • the image segmentation device also defines a distance DO between pixel 242 and a pixel 246 which is the closest pixel in the outside region 234 to the pixel 242.
  • the ROI distance DROI is then calculated as DO / (DI + DO).
  • On the basis of the ROI distance data calculated for the pixels of the region of interest 216, the image segmentation device goes on to calculate further data. As indicated by block 242, the image segmentation device defines a center locus of the region of interest as formed by pixels which have an ROI distance substantially equal to 0.50. This center locus for the ROI is indicated as the outer perimeter of a shaded area 216' in Fig. 11. If Fig. 11 is compared with Fig. 9, it will be observed that the shaded region in Fig. 11 has been reduced in width by one-half relative to the shaded area in Fig. 9.
  • the ROI distance data is also used to calculate a metric called "ROI width" for each point in the ROI. This is done by adding, for each point at the ROI center, the respective DI (distance to nearest inside pixel) plus DO (distance to nearest outside pixel). Both of these measures were previously referred to in connection with Fig. 10. Furthermore, a "relief map" is generated for the region of interest.
  • the “relief map” is a vector field which is the derivative or slope in the X and Y (horizontal and vertical) directions of the ROI distance which was calculated for all of the pixels in the region of interest.
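As a concrete illustration of these quantities, the ROI distance, ROI width, and relief map can all be derived from two distance transforms. This is a sketch under stated assumptions: boolean masks for the inside, outside, and ROI regions, Euclidean distance transforms standing in for whatever distance measure is actually used, and illustrative function and variable names.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def roi_maps(inside_mask, outside_mask, roi_mask):
    """ROI distance, ROI width, and relief map from boolean region masks."""
    # DI: distance from each pixel to the nearest inside-region pixel;
    # DO: distance to the nearest outside-region pixel. The transform
    # measures distance to the nearest zero, hence the inversions.
    d_i = distance_transform_edt(~inside_mask)
    d_o = distance_transform_edt(~outside_mask)
    # ROI distance DROI = DO / (DI + DO): 0 at the outside border,
    # 1 at the inside border, 0.5 along the ROI center locus.
    d_roi = np.where(roi_mask, d_o / (d_i + d_o + 1e-9), 0.0)
    # ROI width at a point is DI + DO.
    roi_width = np.where(roi_mask, d_i + d_o, 0.0)
    # Relief map: the slope of the ROI distance field in X and Y.
    relief_y, relief_x = np.gradient(d_roi)
    return d_roi, roi_width, relief_x, relief_y
```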
  • the ROI center information and ROI width data are used to generate another kind of mapping data for pixels in the region of interest.
  • This data is referred to as "edge axis" mapping data and is generated as follows. For each point at the ROI center, an average normal direction to the ROI center boundary is determined. This direction is then compared to the following four default directions: "north-south", "east-west", "northeast-southwest", and "northwest-southeast". The one of these four default directions which is closest to the determined normal direction at the ROI center point being considered is selected, and then the selected default direction is assigned to all pixels surrounding the center point and within a distance from the center point equal to one-half the ROI width at the center point.
  • this default direction data is not assigned to points outside of the region of interest. Also, since this process applies to each point on the ROI center, and because the center bends along its length, more than one of the default directions may be assigned to at least some of the pixels in the ROI. The assigned default direction data are used, as will be seen, as an input to directional edge detection processes which will now be discussed. According to alternative embodiments, the normal direction at each point of the ROI center may be quantized to more or fewer than four values, or the raw normal direction itself may be stored as an input to a directional edge detection process.
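A quantization step of the kind described might look like the following sketch, which snaps a normal direction to the nearest of the four named default directions; representing directions as angles modulo 180 degrees is an assumption for illustration.

```python
# The four default directions named in the text, as axis angles in
# degrees (an axis at angle a is equivalent to the axis at a + 180).
DEFAULT_AXES = {"east-west": 0.0, "northeast-southwest": 45.0,
                "north-south": 90.0, "northwest-southeast": 135.0}

def nearest_default_axis(normal_angle_deg):
    """Snap a normal direction at an ROI-center point to the closest
    of the four default axes, comparing angles modulo 180 degrees."""
    a = normal_angle_deg % 180.0
    def axis_dist(axis_angle):
        d = abs(a - axis_angle) % 180.0
        return min(d, 180.0 - d)
    return min(DEFAULT_AXES, key=lambda name: axis_dist(DEFAULT_AXES[name]))
```

For example, `nearest_default_axis(100.0)` returns `"north-south"`, the axis nearest a 100-degree normal.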
  • feature extraction is implemented as a conventional edge detection technique, such as the well-known Sobel edge detector, but modified for detection in a desired direction.
  • Prior to application of edge detection or other feature detection processing, low pass filtering may be applied to the image data in one or both of the main ROI and the supplemental detail ROI.
  • the edge detector operates as a convolution in luminance space on each pixel and its eight nearest neighbors with a separate convolution kernel for each direction.
  • the luminance edge detection of block 256 is applied at each pixel location in the ROI in the one or more directions that were assigned to the pixel by the edge axis map generated at block 254, and the resulting edge data for each of the indicated directions is averaged.
  • the luminance edge detector is not operated outside of the region of interest.
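A directional 3x3 edge detector of this kind can be sketched with the standard Sobel kernel and its 45-degree rotations, one per default axis. The particular kernels, and the convention that each axis names the direction of the luminance gradient being sought, are assumptions rather than details taken from the patent text.

```python
import numpy as np
from scipy.ndimage import convolve

# One 3x3 kernel per default axis; each responds to a luminance
# gradient along the named axis (standard Sobel kernel and rotations).
KERNELS = {
    "east-west":           np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]),
    "north-south":         np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]]),
    "northeast-southwest": np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]]),
    "northwest-southeast": np.array([[2, 1, 0], [1, 0, -1], [0, -1, -2]]),
}

def directional_edges(luma, axis_name, roi_mask):
    """Directional luminance edge strength, evaluated only in the ROI."""
    response = np.abs(convolve(luma.astype(float), KERNELS[axis_name]))
    return np.where(roi_mask, response, 0.0)
```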
  • a similar directional edge detection process is carried out with respect to color information for the pixels in the ROI.
  • the edge axis mapping information is used to indicate the direction in which edges are to be sought at each pixel.
  • This color edge detection process is applied to the color component information extracted at block 412 in Fig. 3.
  • the color edge detection processing is not applied to the color information on a component-by-component basis. Rather, edge detection is performed on the basis of distances between pixels in a multi-axial color space. That is, the edge detection algorithm operates by calculating Euclidean distances among one or more pairs of nearest neighbors of the pixel in question, as measured in multi-axis color space.
  • the color space in which the distances are calculated is defined by R-Y and B-Y axes in a preferred embodiment of the invention. However, it is contemplated to use other sets of axes, such as hue and saturation, and to use color spaces having more than two dimensions.
  • the present inventors have found that applying color edge detection to Euclidean color space distances provides much more effective object boundary detection than the single component-based edge detection processing applied to color information according to the prior art.
  • the color edge detector may be a variant of a conventional edge detection process such as the Sobel detector, and is constrained to operate only within the region of interest. After the luminance and color edge detection processes are complete, it is preferred that the resulting data of each one be normalized to a range of 0-1.0, and then each is raised to a power such that the mean of each lies at the same value. These normalization and mean-matching steps have been found to provide optimum performance of subsequent operations.
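One way to realize the color-space edge measure and the normalization and mean-matching steps is sketched below. The choice of opposite horizontal and vertical neighbor pairs, and the closed-form power used to move the mean (an approximation rather than an exact solve), are illustrative assumptions.

```python
import numpy as np

def color_edge(ry, by, roi_mask):
    """Edge strength from Euclidean distances in (R-Y, B-Y) color space:
    for each pixel, the color-space distance between opposite nearest
    neighbours straddling it (horizontal and vertical pairs here)."""
    c = np.stack([ry, by], axis=-1).astype(float)
    dx = np.linalg.norm(c[:, 2:] - c[:, :-2], axis=-1)  # left/right pair
    dy = np.linalg.norm(c[2:, :] - c[:-2, :], axis=-1)  # up/down pair
    edge = np.zeros(c.shape[:2])
    edge[:, 1:-1] = dx
    edge[1:-1, :] = np.maximum(edge[1:-1, :], dy)
    return np.where(roi_mask, edge, 0.0)

def normalize_and_match(edge_map, target_mean=0.5):
    """Normalize to 0..1, then raise to a power chosen so the mean of
    the nonzero data lands (approximately) at target_mean, so the luma
    and color maps can be combined on equal terms."""
    e = edge_map / max(float(edge_map.max()), 1e-9)
    m = float(np.clip(e[e > 0].mean(), 1e-6, 0.99))
    gamma = np.log(target_mean) / np.log(m)
    return e ** gamma
```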
  • Fig. 12 illustrates edge detection data generated by the directional luminance edge detector. It will be observed from Fig. 12 that the luminance edge detector found rather definite edges at areas indicated at 260, 262, 264, and 266 in Fig. 12. At other parts of the region of interest, such as those indicated at 268 and 270, the detector found rather weak or virtually nonexistent edges.
  • Fig. 13 shows the edge detection information generated on the basis of the color image information. Strong edges are seen in Fig. 13 at 272 and 274 in Fig. 13, whereas no strong edges were found in areas 276, 278.
  • the luminance-based and color-based edge detection information are then combined to produce combined edge information.
  • the two edge detection maps may be combined additively, but preferably for each pixel the maximum of the luma and color edge detection data is taken to provide the combined edge data. Also, either one of the luma edge detector and the color edge detector may be disabled, as indicated at control portion 282 of Fig. 6. It is contemplated to combine additional feature maps generated by algorithms which extract features based on direct measurement and/or derivatives of one or multi-dimensional image parameter spaces, resulting in a combined feature map.
  • the combined edge map is illustrated in Fig. 14. It will be observed from Fig. 14 that a rather strong edge has been found virtually all along the region of interest.
  • a weighting or biasing function is applied to the edge detection data across the transverse dimension of the region of interest so as to emphasize components of the edge detection data that are located toward the center of the region of interest.
  • a Gaussian or similar weighting function (such as a sinusoidal peak function) is applied across the region of interest, as schematically illustrated in Fig. 15.
  • the weighting function is applied in a manner which accommodates arbitrary shapes of the ROI.
  • the weighting function is defined over the range 0 to 1, inclusive. It will be recalled that an ROI distance metric has been defined for each point in the ROI, having values in that range which indicate the distance of the respective point from the inside and outside regions.
  • the weighting function would be defined to have the value 0 for the 0 and 1 values of the ROI distance metric and a value of 1 for the 0.5 value of the ROI distance metric, with a suitable tapering for values of the ROI distance metric between 0 and 0.5 and between 0.5 and 1.
  • Such a weighting function can easily be implemented as a lookup table. Since the ROI distance function is completely defined over any ROI of arbitrary topology, the weighting function can be defined for an ROI of any shape. As a result, the present invention can allow the user to designate an ROI having any arbitrary shape.
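Such a lookup table might be built as below, using the sinusoidal peak option mentioned earlier: weight 0 at ROI distance 0 and 1, weight 1 at 0.5, with a smooth taper between. Table size and names are illustrative.

```python
import numpy as np

# Sinusoidal peak weighting over the ROI distance d in [0, 1]:
# w(0) = w(1) = 0 and w(0.5) = 1, tapering smoothly in between.
LUT_SIZE = 256
BIAS_LUT = np.sin(np.pi * np.linspace(0.0, 1.0, LUT_SIZE))

def bias_edges(edge_map, d_roi):
    """Emphasize edge components near the ROI center by weighting the
    combined edge map with the value looked up from the ROI distance."""
    idx = np.clip((d_roi * (LUT_SIZE - 1)).astype(int), 0, LUT_SIZE - 1)
    return edge_map * BIAS_LUT[idx]
```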
  • By virtue of the bias function, edges or other features detected near the center of the region of interest are favored, while those near the edge of the ROI are suppressed.
  • This bias function effectively sharpens the guidance provided by the operator's input to guide the machine's image segmentation process toward the center of the region of interest. This reflects an assumption that the operator will attempt to more or less evenly bracket the desired object boundary with the inner and outer perimeters of the region of interest.
  • The biased edge information is illustrated in Fig. 16. If Fig. 16 is compared with Fig. 14, it will be observed that the edge components toward the inside or outside perimeters of the region of interest have generally been reduced (de-emphasized). Conversely, the components of the edge information at a central portion of the ROI are emphasized by the bias function.
  • Although the present inventors have found that the results of the edge detection processes are enhanced by using directional edge detectors based on the edge axis map, adequate results may also be obtained by using non-directional edge detectors, in which case the edge axis mapping process may be dropped. It is also contemplated to use component-based color edge detection instead of the color-space-based edge detection which was referred to above.
Boundary Finding

  • The next step in the segmentation process is to locate the position of a boundary by using the extracted and processed feature map.
  • snakes or active contours are used within a gradient vector flow (GVF) field to find the position of the object boundary.
  • the biased edge detection data is processed to generate a gradient vector flow field, which is generally of the type discussed in the following papers: C. Xu and J.L. Prince, "Gradient Vector Flow: A New External Force for Snakes," Proc. IEEE Conf. on Comp. Vis. Patt. Recog. (CVPR), Los Alamitos: Comp. Soc. Press, pp. 66-71, June 1997; and C. Xu and J.L. Prince, "Snakes, Shapes, and Gradient Vector Flow," IEEE Transactions on Image Processing, 7(3), pp. 359-369, March 1998.
  • the purpose of the gradient vector flow field is to provide an external force field for a snake or active contour segmentation process.
  • the biased edge data is then used to calculate an edge gradient vector field, which may be a Laplacian or derivative of the biased edge information.
  • the resulting edge gradient field is shown in Fig. 17.
  • This edge gradient data is then normalized and diffused throughout the region of interest to generate a data field referred to as the "edge relief map" or GVF field.
  • the resulting map is illustrated in Fig. 18.
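The GVF computation follows Xu and Prince's published formulation, which a minimal sketch can capture: the gradient of the biased edge map is diffused iteratively while being pinned near strong edges. The regularization weight mu, the iteration count, and the assumption that the edge map is normalized to 0..1 are illustrative choices, not values from the patent.

```python
import numpy as np
from scipy.ndimage import laplace

def gvf(edge_map, mu=0.2, iters=200):
    """Gradient vector flow (after Xu & Prince): diffuse the edge
    gradient into flat regions while keeping it anchored where the
    edge gradient magnitude is large. Assumes edge_map scaled to 0..1."""
    fy, fx = np.gradient(edge_map.astype(float))
    mag2 = fx**2 + fy**2
    u, v = fx.copy(), fy.copy()
    for _ in range(iters):
        # Explicit diffusion step plus a data term pulling (u, v)
        # back toward the raw gradient where edges are strong.
        u += mu * laplace(u) - mag2 * (u - fx)
        v += mu * laplace(v) - mag2 * (v - fy)
    return u, v
```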
  • the resulting GVF field is used in connection with another known image analysis technique referred to as an "active contour model" or "snake".
  • the articles referred to above contain descriptions of image analysis using snakes, and so does an article entitled "Active Contour Models (Snakes)", which has been published online at www.cogs.susx.ac.uk/users/davidy/teachvision/vision7.html.
  • a snake is a model that may be generated in a two dimensional image plane and includes control points along the length of the model which are deemed connected by virtual springs.
  • the springs may reflect various models, but in a preferred embodiment of the present invention are modeled in accordance with Hooke's Law and have a rather low spring force with no resistance to bending.
  • the balance of the process for segmenting a single image entails generating a snake and allowing it to be driven to the desired object boundary through interaction of the gradient vector flow field and the snake's own internal energy characteristics.
  • snake models operate to minimize the combined energy of the system in which they operate.
  • An initial position for the snake is set as the center of the region of interest, as described above in connection with Fig. 11.
  • the snake is then iteratively repositioned under the influence of the edge relief map, until a convergence test is satisfied. Each time the snake is repositioned, it is reparameterized so that the spacing between the control points is substantially equalized. Also, the snake is not permitted to depart from the region of interest. It should be noted that it is not a common practice to constrain snakes within a user-defined region of interest.
  • the convergence test calls for comparing the movement of the snake with a threshold to determine whether the snake has moved significantly in the latest repositioning. If the amount of movement is not significant, convergence is deemed to have occurred.
  • the snake movement is measured by determining the amount that each control point has moved in the direction normal to the snake at the control point. The amount of movement in the normal direction is then averaged over the control points and the resulting average is compared with the threshold. Once convergence is found, the image is segmented at the final position of the snake and a key signal is produced.
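The convergence test described in this passage reduces to a few lines. The sketch below assumes a closed snake stored as an N x 2 array of (x, y) control points, with normals estimated from neighboring control points; names are illustrative.

```python
import numpy as np

def snake_converged(points, prev_points, threshold):
    """Average each control point's displacement along the local
    normal and compare the mean with a threshold."""
    # Tangent at each control point from its two neighbours (closed snake).
    tang = np.roll(points, -1, axis=0) - np.roll(points, 1, axis=0)
    tang /= np.linalg.norm(tang, axis=1, keepdims=True) + 1e-9
    # Rotate tangents 90 degrees to get unit normals.
    normals = np.stack([-tang[:, 1], tang[:, 0]], axis=1)
    movement = np.abs(np.sum((points - prev_points) * normals, axis=1))
    return movement.mean() < threshold
```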
  • the locus of the final snake position defines a segmentation map for the image plane, the segmentation map being an output of the image segmentation process.
  • the segmentation map corresponds to a machine-generated outline of the object selected by the operator.
  • a rather bright outline 298 is indicative of the final snake position, and hence the locus of the segmentation map. It will be observed that the outline 298 quite accurately indicates the boundary between the skin area selected by the human operator and the balance of the image.
  • the segmentation map is also illustrated in Fig. 20 in the form of a "hard" key mask.
  • the process of machine image analysis, from the time the operator completes drawing the ROI 216 until the outline 298 is drawn by the computer, requires only a few seconds or less.
  • the process of drawing the ROI itself also need only take a few seconds.
  • The invention contemplates using boundary locating techniques other than snakes (active contour models).
  • high level constraints are represented by block 414 in Fig. 3. If a snake is employed to find the boundary, certain high level constraints, such as continuity or closure, smoothness, resistance to bending and elasticity, may be inherent in the processing of block 413. But if other boundary finding techniques, such as Canny edge detection, are employed at block 413, then high level constraints such as those enumerated above may be applied at block 414.
  • shape memory may be applied at parts of the nominal boundary where the edge detection information is weak (exhibits low confidence). Assuming that the image segmented in block 413 is not the first in a scene, the shape of the outline in the low confidence region is clipped from the corresponding portion of the boundary outline in an immediately preceding image, and the clipped outline segment is spliced into the region of low confidence. If the image is the first in a scene, the low confidence portion of the outline may be replaced by the corresponding segment of the initial snake position along the ROI center.
  • each control point on the snake carries or drags with it certain parametric data as the snake is iteratively repositioned in block 413.
  • This parametric data is stored in a shadow data structure, and may include edge strength data (relevant to the "shape memory” feature, and also relevant to the "edge modulated softness” feature described below), ROI width data (relevant to ROI relocation, to be discussed below), and prior positions of the control points (relevant to the above-mentioned convergence test).
  • the reparameterization of the snake during block 413 may result in control points being added or dropped. If a control point is dropped, its corresponding shadow data in the shadow data structure is also dropped. If a control point is added, the corresponding shadow data for the new control point is augmented by interpolation or the like from the shadow data for neighboring control points.
  • a variation of shape memory referred to as "shape history", may also be employed at block 414. If the shape history constraint is applied, more than one prior image is considered in forming a splice for a region of low edge confidence.
  • the outline data from the prior images may be accumulated by adding the coordinates for each successive outline to a prior average.
  • the scaling factors applied to the most recent outline data and to the running average may be varied to provide a variable degree of persistence to the outlines of the prior images.
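The running average with variable persistence amounts to an exponential moving average over outline coordinates. A minimal sketch, assuming outlines resampled to a common number of control points and an illustrative persistence parameter:

```python
def update_shape_history(avg_outline, new_outline, persistence=0.8):
    """Accumulate outline coordinates (numpy arrays) into a running
    average; a larger persistence makes the outlines of prior images
    fade more slowly."""
    return persistence * avg_outline + (1.0 - persistence) * new_outline
```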
  • the threshold for the confidence measure may be subject to hysteresis. That is, the threshold may be set higher at portions of the outline for which low confidence was found in a prior image. Also, an average confidence measure may be computed for the outline as a whole, and a graphical display element such as a bar graph may be displayed based on the average confidence measure to provide an indication of the success of the segmentation process.
  • the average confidence measure can be thought of as a figure of merit for the segmentation process. This figure of merit may be particularly useful in the object tracking process to be described below.
  • Another high level constraint that may be applied at block 414 is shape reluctance. This constraint may be described as a resistance to changing shape, even in areas of high edge confidence. This constraint may be applied either as a simple gate, specifying a maximum permitted deviation in shape, or as a continuous function, whereby indicated changes in shape are scaled nonlinearly.
  • Another high level constraint that could be applied at block 414 would be to require the operator to manually adjust the outline at regions which exhibit low edge confidence. Operator adjustment of the outline will be discussed further below.
  • Still another high level constraint that could be applied at block 414 would be temporal smoothing of the outline over multiple images. This technique would eliminate or minimize contributions from noise or other perturbations (such as edges which impinge on the object of interest from the background) which momentarily disrupt the outline shape. Temporal smoothing could be accomplished by imposing a running average on outline coordinates and/or limiting the rate of derivatives of outline coordinates.
  • the region of interest is repositioned in the image plane taking the segmentation map locus (final snake position or outline) as the center of the repositioned region of interest.
  • This step is represented by block 415 in Fig. 3.
  • the recentering of the region of interest may be performed adaptively along the segmentation map, in the sense that for a given point along the segmentation map the region of interest is not repositioned unless the segmentation reflects a high degree of confidence that the object boundary was properly found.
  • the region of interest may only be re-centered for points of the segmentation map at which a rather definite edge was found.
  • the shape of the region of interest remains unchanged.
  • the process of re-centering the region of interest may use the ROI width metric to set the borders of the region of interest relative to the image segmentation map.
  • the width metric employed at any given point on the image segmentation map is the one originally assigned to the corresponding point on the snake, i.e., the ROI width at that point when the snake was at its initial position along the center of the ROI as drawn by the operator.
  • the re-centering may alternatively use one of a variety of other techniques including morphing and patch displacement interpolation.
  • the ROI products (the ROI distance, width and relief data described above) may be recomputed to improve the modulation of the final key shape.
  • the extended region of interest (EROI) 236, referred to in connection with Fig. 11, is repositioned along with the ROI.
  • Blocks 416 and 417 in Fig. 3 represent processes in which the operator is able to provide additional input to the boundary finding procedure, beyond the initial indications of the ROI provided by the operator at blocks 410 and 411.
  • Block 416 represents an operator adjustment to one or both of the ROI's designated at steps 410 and 411.
  • the operator adjustment of the ROI(s) may be performed iteratively after block 414 and/or block 412.
  • the operator may use the same drawing tools referred to in connection with blocks 410 and 411 to add to the previously designated ROI(s).
  • An erase function is also provided whereby the drawing tool, when applied to the region of interest, causes the region of interest to be erased. If the erase tool enters the region of interest from the adjoining region designated as the outside, the erased part of the region of interest is joined to the outside region. Conversely, if the erase tool enters the region of interest from the adjoining region designated to be the inside, the erased portion of the region of interest is added to the inside region.
  • Block 417 represents an option provided to the human operator to permit adjustment of the segmentation map (outline).
  • the operator may select an "adjust outline" option.
  • the operator is provided with a software drawing tool having quite a narrow width.
  • the operator can use the drawing device 122 (Fig. 2) to erase and redraw portions of the outline 298 to make corrections in the segmentation map generated by the machine image analysis process.
  • The process of Fig. 3 next turns to segmenting a second image in the sequence of images. To do this, the image segmentation device requires an indication of the motion of the object of interest or, more precisely, of the boundary of the object.
  • actual motion from the first image to the second image to be segmented may be measured (e.g., via optical flow techniques). If no motion is detected, the outline and ROI from the previous image may be used without change. Alternatively, motion of the object boundary may be estimated (e.g., via extrapolation techniques).
  • the locus of the boundary outline must have been established for at least two images prior to the image to be segmented; accordingly, the operator may be required to manually indicate the ROI (block 410) for two or more images at the beginning of each scene if motion projection (extrapolation) is employed.
  • Motion extrapolation techniques involve developing a measure of gross geometric parameters of the outline such as the centroid, size, differential scale factors, and rotational orientation. These geometric parameters are then used to predict the future position of an object based on prior positions, using a model of the object's motion.
  • Velocity and acceleration in x and y directions can be tracked by examining the movement of the centroid and the same factors in the z direction can be tracked on the basis of change in size.
  • Angular velocity and acceleration about the major, minor and z axis may also be measured by examining changes in differential scale and rotational orientation. All of these motions may be extrapolated based on constant velocity, acceleration, or rate of change of acceleration ("jerk").
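As a small worked example of this kind of projection, the sketch below predicts the next centroid position under a constant-velocity model from two prior outlines (which is why at least two manually seeded images are needed) and treats change in size as the z-direction cue. The names and the bare-bones model are assumptions; the text also contemplates acceleration and jerk terms.

```python
import numpy as np

def predict_centroid(c_prev2, c_prev1):
    """Constant-velocity prediction of the outline centroid from its
    two prior positions (arrays of [x, y])."""
    velocity = c_prev1 - c_prev2
    return c_prev1 + velocity

def predict_size(s_prev2, s_prev1):
    """Extrapolate outline size; the ratio of successive sizes stands
    in for motion toward or away from the camera (the z direction)."""
    return s_prev1 * (s_prev1 / max(s_prev2, 1e-9))
```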
  • Motion measurement may be performed in accordance with any one of a number of techniques.
  • the optical flow technique of Horn and Schunck is used. This technique is described in: Horn, B.K.P., and Schunck, B.G., "Determining Optical Flow", Artificial Intelligence, 17, pp. 185-204 (1981).
  • A supplemental technique known as a pyramid may also be employed. The pyramid technique uses multi-scale and multi-resolution flow measurements: filtered and subsampled (reduced) versions of the images are processed to extract high velocity components, which are then passed down for refinement to higher resolution versions of the image.
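A bare-bones Horn-Schunck iteration, with a note on the pyramid refinement, is sketched below. The smoothness weight alpha, the iteration count, and the simple forward-difference temporal derivative are illustrative choices; a production implementation would follow the cited paper more closely.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def horn_schunck(im1, im2, alpha=1.0, iters=100):
    """Minimal Horn-Schunck optical flow between two grayscale frames.
    For the pyramid variant, run this on filtered and subsampled copies
    of the frames first, then pass the coarse flow down as the starting
    estimate at the next higher resolution."""
    fy, fx = np.gradient(im1.astype(float))
    ft = im2.astype(float) - im1.astype(float)
    u = np.zeros_like(fx)
    v = np.zeros_like(fx)
    for _ in range(iters):
        # Neighbourhood averages of the current flow estimate.
        u_avg, v_avg = uniform_filter(u), uniform_filter(v)
        common = (fx * u_avg + fy * v_avg + ft) / (alpha**2 + fx**2 + fy**2)
        u = u_avg - fx * common
        v = v_avg - fy * common
    return u, v
```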
  • The image information in the extended region of interest (EROI) is normalized before applying optical flow detection processing.
  • motion measurement or estimation is constrained to be carried out within a portion of the image plane which corresponds to the ROI and EROI, as taken together and after repositioning in accordance with block 415.
  • the EROI is also used as an area in which key signal modulation (e.g., a "softness" function) may occur.
  • the outline obtained in the first image at block 414 is reshaped and relocated to reflect the motion of the object between the first and second images. Then the ROI(s) are repositioned so as to be centered on the repositioned outline (block 418).
  • the image segmentation device now has an indication of where to find the object boundary in the second image, and may now proceed with the processes of blocks 412-415 with respect to the second image, so that the second image can be segmented very accurately without operator input. After segmentation of the second image is complete, steps 419 and 418 may be carried out to prepare for segmentation of a third image.
  • the loop of steps 412, 413, 414, 415, 419, 418 may be carried out sequentially and automatically to segment a large number of images which make up a video clip.
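The automatic loop can be summarized in the sketch below. Every helper named in the ops mapping is a hypothetical placeholder for the corresponding block of Fig. 3, not an actual API of the disclosed device.

```python
def segment_clip(frames, roi, outline, ops):
    # ops maps the Fig. 3 processing blocks to callables; the key names
    # below are hypothetical labels for the processes described in the text.
    keys = []
    for prev, curr in zip(frames, frames[1:]):
        motion = ops["estimate_motion"](prev, curr, roi)   # block 419
        outline = ops["relocate_outline"](outline, motion)
        roi = ops["recenter_roi"](roi, outline)            # block 418
        features = ops["detect_edges"](curr, roi)          # blocks 412-413
        outline = ops["fit_boundary"](features, roi)       # block 414
        keys.append(ops["make_key"](outline))              # block 415
    return keys
```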
  • the outlines used for object tracking in accordance with steps 418 and 419 may differ in some respects from the outlines used to generate key signals for individual images. For example, the latter outlines may reflect operator adjustments that are not applied to the outlines used for tracking.
  • the image segmentation device stores, for each image in a video clip, the ROI's, the outlines, the unmodified key signals, and the key modification settings (and also color correction data, if the image segmentation device itself is arranged to perform color correction) as the image segmentation process proceeds through the clip, either with or without intervention by the operator.
  • the operator may review the sequences of data that were generated and stored for the clip. The operator can select outlines, ROI's, key modification settings and so forth for adjustment. The adjustments may be applied to an individual image or to a selected number of images subsequent to the image for which the adjustment is made.
  • the data referred to in this paragraph may be archived to provide a complete record of the image processing applied to the video clip.
  • Fig. 4 illustrates other processes carried out by the image segmentation device, primarily in regard to providing output signals based on the final boundary outline(s) produced at block 414.
  • One user-controlled option, represented by block 420 in Fig. 4 and actuated through a slide bar 306 shown in Fig. 19, allows the operator to increase or decrease the size of the outline (Fig. 20). That is, the operator may use the slide bar 306 to increase or decrease the amount of space (indicated as white in Fig. 20) which is inside the outline.
  • the direction in which the key mask is adjusted in response to the operator's input is determined, at each point along the outline, on the basis of the ROI relief map which was referred to above (and was calculated, e.g., as a derivative in the x and y directions of the ROI distance metric). In this way, even the arbitrarily shaped and highly irregular key masks produced by the techniques of the present invention can be appropriately resized without significant distortion.
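A minimal sketch of such a resize, assuming the key mask is a boolean array: moving every boundary point along the relief-map (distance-gradient) direction by a fixed amount is equivalent to thresholding the signed distance to the mask boundary, which a Euclidean distance transform supplies for arbitrary shapes. Names are illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def resize_key_mask(mask, delta):
    # Grow (delta > 0) or shrink (delta < 0) an arbitrarily shaped binary
    # key mask by |delta| pixels along the local boundary normal.
    dist_out = distance_transform_edt(~mask)  # distance to mask, outside
    dist_in = distance_transform_edt(mask)    # distance to background, inside
    signed = dist_in - dist_out               # > 0 inside, < 0 outside
    return signed > -delta
```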
  • the operator will decide whether to enlarge or reduce the size of the outline, or to leave it unchanged, based on factors such as relative brightness or darkness of the object of interest relative to the background, color contrast between the object and the background, the nature (e.g., definiteness) of the boundary of the object, the type of post processing for which segmentation has been performed, and so forth.
  • an "area-fill" operation is performed for the area inside the (optionally re-sized) outline to generate a key signal to select the desired object, and to de-select the balance of the image (background).
  • the operator is provided with a number of options for modifying the resulting key signal.
  • the operator can use the softness slide bar 310 (Fig. 19) to adjust softness at the edge of the key mask, in accordance with conventional practices.
  • a curve 312, having a conventional "S" shape, is indicative of a rather wide transition region, corresponding to a rather large degree of softness.
  • an "S" curve 314 is produced, defining a narrower transition region in which a somewhat steeper slope is present.
  • Fig. 22 is indicative of a key map to which a degree of softness has been applied, and may be compared with the hard key map of Fig. 20.
  • Fig. 23 illustrates how the softened key map of Fig. 22 selects desired portions of the processed image.
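One plausible realization of the softness control, assuming the key is derived from a boolean mask: a smoothstep of the signed boundary distance produces the "S"-shaped transition, with the slide-bar setting mapped to the transition width. The function name and the smoothstep choice are assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def soften_key(mask, width):
    # Replace the hard 0/1 key edge with an "S"-shaped transition of the
    # given total width (in pixels), centered on the mask boundary.
    signed = distance_transform_edt(mask) - distance_transform_edt(~mask)
    t = np.clip(signed / width + 0.5, 0.0, 1.0)  # 0..1 across the transition
    return t * t * (3.0 - 2.0 * t)               # smoothstep "S" curve
```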
  • a "clean up” option is also provided to the operator, which is accessible via slide bar 318 (Fig. 19).
  • the clean-up function allows the operator to increase or reduce the degree of softness toward the outside of the transition region without affecting the softness profile toward the inside of the transition region.
  • the curve 312 again illustrates a conventional softness profile, whereas, at 320, the softness profile is "hardened” (i.e. given a steeper slope), in response to the operator's invocation of the clean up function, on the side toward an outside (background) region relative to the keyed portion of the image.
  • the softness profile remains unchanged on the side of the profile which is towards the inside (keyed) region of the image.
  • the slope of the gradient of the softness function is increased, by invocation of the cleanup function, toward the side adjacent to the outside region but not toward the side adjacent to the inside region.
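The clean-up behavior can be sketched as follows, assuming a softened key with values in [0, 1] and the boundary at 0.5: only the outside half of the transition is rescaled to a steeper ramp, while the inside half is left untouched. The parameterization is illustrative.

```python
import numpy as np

def clean_up(key, amount):
    # amount in [0, 1): 0 leaves the profile unchanged; larger values
    # steepen the gradient on the outside (key < 0.5) side only.
    k = key.copy()
    outside = key < 0.5
    k[outside] = np.clip((key[outside] - 0.5 * amount) / (1.0 - amount),
                         0.0, 0.5)
    return k
```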
  • Still another option provided to the user is accessible via the slide bar 324 shown in Fig. 19 and may be referred to as an "edge-modulated softness" function.
  • the image segmentation device causes the degree of softness to be varied along the perimeter of the key map on the basis of edge information which had previously been generated from the image information representing the displayed image.
  • the adaptation or modulation of the degree of softness along the key map perimeter may be based on the edge detection data which is produced with respect to luminance or color data, or based on both.
  • the degree of softness provided at any particular point on the key map perimeter depends both on the pertinent edge detection data for that point and the setting of the slide bar, with the slide bar controlling the overall extent to which the softness is adjusted based on the edge detection data.
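A sketch of how the slide-bar setting and the per-point edge confidence might jointly set the local transition width, under the assumption (consistent with the description above) that softness is increased where the edge is less definite; the formula is illustrative.

```python
import numpy as np

def edge_modulated_width(base_width, edge_confidence, strength):
    # edge_confidence near 1 = definite edge -> keep a narrow transition;
    # near 0 = indefinite edge -> widen the softness. strength (0..1, from
    # the slide bar) scales the overall extent of the modulation.
    edge_confidence = np.clip(edge_confidence, 0.0, 1.0)
    return base_width * (1.0 + strength * (1.0 - edge_confidence))
```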
  • the resulting key signal is output from the image segmentation device 106 to the color corrector 104 (Fig. 1).
  • the key signal is then used in the color corrector 104 as a window to select a portion of the corresponding image in which a color correction process is carried out. Because the key signal has been accurately matched to the shape of the desired object, by a unique interplay of human and machine intelligence, a highly satisfactory color correction process can be achieved.
  • the key signal produced by the techniques of the present invention is also suitable for use in image compositing operations. One potential use of the key signal would be application to the original image to produce an image of the desired object alone. Conventional spill suppression techniques may then be applied to the isolated image.
  • Both the keyed image and the combined key signal could be provided to a compositing system, either directly or after storage in the image segmentation device 106 or in the image source device 102.
Supplemental Detail Finding
  • As noted above, the supplemental ROI is drawn at portions of the object of interest where the boundary is highly complex, such as the fur on an animal or the outer extent of branches and leaves on a tree.
  • the generation of the key signal for images of this type may proceed by color or luminance keying techniques or by detecting spatial frequency characteristics within the supplemental ROI.
  • the image segmentation device parses the color information for pixels that are "inside" the object (based on the inside region designated with respect to the primary ROI of block 410), and those which are outside, sorting pixels in the supplemental ROI into (a) pixels whose colors are found only inside, (b) pixels whose colors are found only outside, and (c) pixels whose colors are found on both sides or on neither side.
  • pixels falling within category (a) are considered to be part of the object; pixels falling within category (b) are excluded from the object; and those in category (c) are assigned to the object or not, depending on a variety of criteria, such as whether they are closest to category (a) or category (b) pixels.
  • Differentiation between inside and outside content in the supplemental ROI may be based on a variable-sized local area adjacent to each pixel instead of the entire supplemental ROI. This local area is evaluated for each pixel or group of pixels and may be defined by a radial distance (circle) or a shape or shapes drawn by the operator. A similar keying process may be carried out based on luminance level distribution.
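A sketch of the color-based differentiation, assuming pixel colors are given as rows of an (N, 3) array and that color samples have been gathered from the designated inside and outside regions; the tolerance test standing in for the actual category criteria is an assumption.

```python
import numpy as np

def classify_supplemental_pixels(pixels, inside_colors, outside_colors,
                                 tol=10.0):
    # Categories: (a) matches inside samples only, (b) matches outside
    # only, (c) matches both or neither -> decided by the nearer sample.
    def min_dist(p, samples):
        return np.min(np.linalg.norm(samples - p, axis=1))

    keyed = np.zeros(len(pixels), dtype=bool)
    for i, p in enumerate(pixels):
        di = min_dist(p, inside_colors)
        do = min_dist(p, outside_colors)
        if di < tol and do >= tol:        # category (a): object
            keyed[i] = True
        elif do < tol and di >= tol:      # category (b): background
            keyed[i] = False
        else:                             # category (c): ambiguous
            keyed[i] = di <= do
    return keyed
```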
  • Another supplemental boundary-finding technique that may be applied in conjunction with the supplemental ROI entails using a bandpass filter to detect higher frequencies, where the object has a more complex texture than the background.
  • the detail key resulting from block 423 may be resized at block 424 by logically ANDing it with a resized version of the outline key. In this way the operator can control the extent to which supplemental detail is found over the primary ROI.
  • the complex boundary key can be modified by use of some of the functions referred to at block 422, including the "clean-up" function and the conventional softness function.
  • Block 426 indicates that the detail key may be combined with the outline key. This may be done in a variety of ways, including selecting the maxima of the respective keys, or running each through a respective variable gain stage to obtain a weighted sum of the two keys. It is also contemplated to apply a variable weighting between the outline key and the detail key at particular portions of the image based on characteristics of the image and the ROI. It may be desirable to increase the weighting in favor of the detail key in areas where the confidence of the edge detection is low, indicating a poorly defined edge. It may also be desirable to increase the weighting in favor of the detail key where the ROI is relatively wide, since this too may indicate a poorly defined boundary for the object. The resulting combined key can be used for the same purposes as the outline key, including color correction and image compositing.
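Both combination rules can be sketched in a few lines; the weighting argument may be a scalar or a per-pixel array, which corresponds to the variable weighting by image region described above. Names are illustrative.

```python
import numpy as np

def combine_keys(outline_key, detail_key, w=None):
    # w=None: take the per-pixel maxima of the two keys.
    # Otherwise w (0..1, scalar or per-pixel array) weights the detail key
    # against the outline key, e.g. larger w where edge confidence is low.
    if w is None:
        return np.maximum(outline_key, detail_key)
    return np.clip((1.0 - w) * outline_key + w * detail_key, 0.0, 1.0)
```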
  • The balance of the processes indicated in Fig. 4 is concerned with converting the outline established at block 414 (Fig. 3) to other data forms which are useful in applications other than color correction.
  • a known technique is employed to form an approximation of the outline using Bezier curve spline segments.
  • the splines created at block 427 may be automatically removed in regions where the edge data exhibited low confidence. In those regions, the operator would manually insert replacement splines to more accurately follow the object boundary.
  • temporal smoothing may be applied to the splines, and the operator is permitted to manually adjust the splines.
  • the temporal smoothing step includes techniques such as temporal averaging or derivative rate limiting to ensure that the splines move smoothly while tracking the object of interest in a dynamic image stream.
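A sketch of one such smoothing step, combining exponential averaging with a per-image rate limit on control-point motion; the parameter values and names are assumptions.

```python
import numpy as np

def smooth_control_points(prev_pts, new_pts, alpha=0.5, max_step=4.0):
    # Exponentially average control points toward their newly fitted
    # positions, limiting each point's motion to max_step pixels per image
    # (a simple derivative rate limit).
    prev_pts = np.asarray(prev_pts, dtype=float)
    step = alpha * (np.asarray(new_pts, dtype=float) - prev_pts)
    norm = np.linalg.norm(step, axis=-1, keepdims=True)
    scale = np.where(norm > max_step, max_step / np.maximum(norm, 1e-9), 1.0)
    return prev_pts + step * scale
```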
  • the operator can use well known techniques employed in computer animation and compositing such as moving spline control points and manipulating control point "handles" to modify parameters of spline segments.
  • the resulting spline data may be employed for applications such as compositing and computer animation.
  • Block 431 relates to outputting useful geometric data which describe the object of interest on the basis of the outline drawn at block 414.
  • the geometric parameters used at block 419 for motion estimation are of interest if the system is to be used to synchronize external motion control equipment (not shown) with live or recorded images or to provide motion tracking information for creating animated or live action material which is to be matched with preexisting material.
  • the geometric parameter data describing the position and orientation of the boundary outlines can be conveyed by standard interfacing techniques to the external equipment. Also, control points can be specified and tracked within or along the boundary outline for use in compositing applications.
  • a detected feature map (edges in a preferred embodiment) in a region of interest is derived from image parameter data (luminance and chrominance in a preferred embodiment), and features that are not in a central portion of the region of interest are suppressed by a bias function.
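A sketch of one possible bias function, assuming the ROI distance metric referred to elsewhere in this description is available as a per-pixel array: features are weighted by how close that metric is to the 0.5 center locus, falling to zero at the ROI borders. The exact bias profile of the preferred embodiment is not specified; the triangular profile here is an assumption.

```python
import numpy as np

def center_bias(d_roi):
    # Weight detected features by proximity to the ROI center locus
    # (d_roi == 0.5), falling linearly to zero at the ROI borders where
    # d_roi reaches 0 or 1. Other profiles (e.g. Gaussian) would also work.
    return 1.0 - np.abs(2.0 * d_roi - 1.0)

# biased_edges = edge_map * center_bias(d_roi)  # suppress off-center features
```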
  • In addition to using a bias function to suppress features that are unlikely to be of interest, it is contemplated to use a priori knowledge from a prior image or images (or based on operator input) to suppress features in an image in a video clip based on characteristics of adjoining pixels.
  • Such a process may be implemented as follows.
  • First, adjacent image parameter data on the object side of the boundary is computed for each point along the boundary.
  • This computation develops a "signature" of the adjacent image parameter data for the object of interest, and may be based on luminance data (e.g., average luminance over some distance toward the interior of the object), chrominance data, texture/spatial frequency content or some combination of these characteristics.
  • the resulting signature data can be expected, in many cases, to differ from the corresponding characteristics of areas immediately outside of the object.
  • other signatures are possible, including, for example, mean and variance of luminance and chrominance values within a neighborhood of the vector normal to the boundary outline. The contribution of each pixel within the neighborhood could be weighted inversely to the pixel's distance from the boundary outline.
  • Signatures could also be generated for the pixel data outside of the boundary, or for both the inside and the outside.
  • the process for segmenting a video clip calls for relocating the boundary outline for the prior image based on a measure of inter-image motion, and the region of interest is then re-centered on the new outline position. Then a detected feature map is computed for the current image in the re-centered region of interest.
  • the signature data described in this section may be employed to discriminate among the detected features, suppressing those which do not have a similar signature.
  • the signature information and normal vector directions must be diffused throughout the region of interest to permit evaluation of all detected features. This is accomplished by creating a "signature map" which consists of data arrays for the region of interest, one for each signature parameter and one to encode the normal vector direction. Points in these arrays which correspond to the boundary outline location are initialized from the corresponding signature information and vector direction of the pixels at the boundary outline.
  • the signature map is then developed by a morphological dilation operation. Each new pixel resulting from the dilation inherits signature and vector direction information from the neighboring pixel or pixels of the previous iteration of the signature map. If there is only one neighboring pixel, then the same data values are carried over to the new pixel; if there is more than one neighboring pixel, the data of the neighboring pixels may be averaged to generate the data for the new pixel.
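A sketch of the signature-map construction for a single scalar signature parameter, assuming boolean masks for the ROI and the boundary outline. Instead of literal iterative dilation, the nearest-outline-pixel assignment below produces the same propagation effect, except that data from equidistant neighbours is not averaged.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def build_signature_map(roi_mask, outline_mask, signature_values):
    # signature_values holds valid data at outline pixels. Every ROI pixel
    # inherits the signature of its nearest outline pixel, which is what
    # repeated dilation from the outline converges toward.
    _, idx = distance_transform_edt(~outline_mask, return_indices=True)
    sig_map = signature_values[idx[0], idx[1]]
    return np.where(roi_mask, sig_map, 0.0)
```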
  • the detected features may be evaluated in terms of signature data and those lacking appropriate signatures may be suppressed.
  • for each detected feature, a signature calculation is performed, using the vector direction at the corresponding position in the signature map.
  • the resulting signature data is compared to the signature data at the corresponding position in the signature map, and the feature is suppressed in inverse proportion to the degree of matching between the calculated signature data for the point and the corresponding signature map data. This would tend to suppress features found outside of the object.
  • the signature may also be calculated in the direction opposite to the normal (i.e. toward the "outside"), and the feature suppressed in proportion to the degree of matching with the signature map data. This would have the effect of suppressing features which have the same signature on both sides, which are likely to be inside the object of interest.
  • the signature map may be based on a temporal average of prior images instead of just one prior image.
  • the operator may also be permitted to selectively enable or disable feature discrimination based on signatures for objects which have changing interior parameters such as varying illumination levels or changing colors. For such objects it may be preferable to discriminate features based only on the background signature.
  • a rather large set of control options 332 is provided, permitting the operator to select among many screen displays of either intermediate data generated by the image segmentation device or final outputs such as the key signal which results from the image segmentation process.
  • In addition to the novel, operator-guided automatic image segmentation techniques disclosed herein, it is also contemplated to incorporate in the image segmentation device 106 conventional features which allow the operator to perform segmentation by drawing and animating simple geometric shapes. Segmentation by these known techniques may be adequate when the object of interest itself has a rather simple shape.
  • the shapes that may be selected by the operator may include ellipses and quadrilaterals and may be subjected to known geometric transform control functions such as size, position, rotation, aspect and trapezoid.
  • the resulting key signals could also be subjected to the types of key modification processes referred to above.
  • the animation of the selected (and possibly transformed) shapes can be carried out with conventional key frame and path techniques, which need not be described further. It is also contemplated to extend these conventional practices by attaching one or more simple geometric shapes to the outline generated for tracking purposes according to the inventive procedure described above. This permits a geometric shape to be located relative to a tracked object boundary or portion thereof.
  • the invention has been described primarily in the context of a color correction system of the type used for film-to-tape or tape-to-tape post-production work. Specifically the invention has been described as a peripheral device to be connected to a color corrector to provide key signals to the color corrector. Nevertheless, many other applications of the teachings of the present invention are contemplated.
  • the software processes described herein could be advantageously applied to and embodied in many other kinds of image manipulation equipment including video special effects devices and devices used for colorizing black-and-white motion pictures. It is also contemplated to include software embodying the present invention in commercially distributed software packages used for desktop publishing or for manipulating clip art images.
  • the image segmentation capabilities of the invention are further applicable to image compositing operations and manipulation of still images generally (both color and black and white), including pre-press image processing.
  • Another potential application of the present image segmentation techniques is in computer-aided-design and computer-aided-manufacturing software packages.
  • the invention may also be used to perform image segmentation as an input to 3-D simulation processes.
  • Software which embodies the present invention may be stored in various types of digital memories such as RAM, hard disks, CD-ROMs and DVDs.
  • Previous discussion of Fig. 1 indicated that the key signals produced by the process of Figs. 4 and 5 are outputted from image segmentation device 106 to color corrector 104. However, a number of variations and alternatives are also contemplated. For example, the functions of the image segmentation and color correction blocks 106, 104 may be integrated in a single device. Moreover, the key signals, other data produced from the processes described herein, and/or processed images may be stored in the mass storage 116 (Fig. 2) of the image segmentation device 106 and/or in a storage device which serves as the image source 102 (Fig. 1).
  • an external keyer may be connected after the color corrector to receive key signals from the image segmentation device, combining uncorrected image data from the image source and corrected image data from the color corrector on the basis of the key signal.
  • the image segmentation device 106 may itself be arranged to perform keying operations on the images from the image source 102 and image signals from the color corrector, based on key signals which the segmentation device itself generates. Also, as has been stated above, the segmentation device 106 may itself perform both the keying and color correction operations.


Abstract

Machine analysis of an image to segment (106) the image for subsequent processing is guided by input (102) from a human operator. A convenient drawing tool allows the user to select an object in an image. Machine image segmentation processing is constrained to a region of interest indicated by the operator using the drawing tool. Through the combination of operator input and machine analysis, including edge detection, the object's boundaries are detected accurately. A resulting key signal may be manipulated by the operator in a number of respects and outputted for use in subsequent processing of the image, including color correction (104). Automatic segmentation is also applied to each image in a video clip based on a region of interest indicated by the operator on a first image of the clip. The region of interest is repositioned from image to image on the basis of detected motion of the object indicated by the region of interest.

Description

IMPROVED IMAGE SEGMENTATION PROCESSING BY USER-GUIDED IMAGE PROCESSING TECHNIQUES
BACKGROUND OF THE INVENTION
This invention relates to systems and methods for processing image signals. More particularly, the present invention pertains to improved systems and methods for segmenting images to generate key signals and for other purposes, and further is concerned with manipulation of key signals.
It is frequently desirable to segment an image plane so that only a portion of an image displayed in the image plane is selected for processing. For example, image segmentation is employed to select a particular object in an image for color correction separate from the balance of the image. According to one conventional image segmentation technique, the human operator attempts to fit a rectangular or circular window to an object to be color-corrected. Since most objects to be selected are neither circular nor rectangular, the fitting of the window to the object is usually inexact to a considerable extent. Even if the edges of the window are blurred or softened, the resulting color correction applied to the area of the window often produces unsatisfactory results.
In another known technique, referred to as "rotoscoping," the operator draws the boundary of a window under high magnification at a pixel-by-pixel level to outline the boundary of an object to be selected for color-correction. The rotoscope technique can result in windows that are very precisely matched to the object's outline, thereby producing high quality results. However, this technique is very time-consuming and labor-intensive, and therefore costly.
Other image segmentation techniques rely on color keying. According to one technique, the human operator draws a free-hand closed line-figure that entirely surrounds an object to be selected, and then draws a second free-hand closed line-figure that is entirely within the object. The computer then examines the colors of the pixels between the two line-figures to determine whether the pixels match those of the interior of the object or those of the background. This technique is unlikely to produce satisfactory results unless the object to be selected is of a contrasting color relative to the background. Various other proposals have also been made for semi-automatic image segmentation processing in which a human operator provides some guidance to an object boundary finding algorithm to be carried out by a computer processor. Examples of these proposals are disclosed in the following U.S. patents:
No. 5,247,583, issued to Kato et al. and entitled, "Image Segmentation Method and Apparatus Therefor;"
No. 5,617,487, issued to Yoneyama et al. and entitled, "Image Cutout Apparatus;"
No. 5,181,261, issued to Nagao and entitled, "An Image Processing Apparatus For Detecting the Boundary of an Object Displayed in Digital Image;"
No. 5,887,082, issued to Mitsunaga et al. and entitled, "Image Detecting Apparatus."
However, to the best of applicants' knowledge none of these prior proposals have been embodied in a commercially available segmentation apparatus that can reliably identify object boundaries in a wide range of circumstances, and even when the object of interest shares color characteristics with the background of the image. It appears that prior proposals have failed to find an optimal combination of sophisticated image processing techniques and flexible options for the operator to guide the image processing techniques. Moreover, it is believed that the prior art has to date focused on still image segmentation, and has failed to consider how human-guided computerized image segmentation can be applied to dynamic sequences of images.
It would be desirable to provide an image segmentation technique in which an object to be processed, whether in a still image or a dynamic sequence of images, can be accurately and reliably identified and its boundaries outlined, without requiring laborious detailed input from a human operator.
OBJECTS OF THE INVENTION
Accordingly, an object of the invention is to satisfy the above needs and to provide a system and method for segmenting images with increased accuracy, efficiency, speed and convenience.
A further object is to efficiently apply an image segmentation algorithm to a dynamic sequence of images with limited operator guidance. Another object of the invention is to provide an apparatus and method which quickly and accurately generate a key signal to isolate a desired object for color correction or other image processing.
An additional object of the invention is to provide an improved user interface for generating matte and key signals.
Further objects of the invention are concerned with providing improved techniques for manipulating and adjusting key signals.
SUMMARY OF THE INVENTION
The invention satisfies the needs identified above and meets the foregoing objects by providing a method in which flexible tools for guidance by a human operator are combined with sophisticated machine analysis techniques to produce better and more accurate object selection windows than have heretofore been practical.
In a method provided in accordance with a first aspect of the invention, a first image of a sequence of images is displayed on a display device, and a region of interest is designated by the operator. An image segmentation algorithm is applied to the first image to generate an outline in the region of interest, the image segmentation algorithm being constrained to operate only within the region of interest. Another algorithm provides an indication of the motion of an object corresponding to the outline between the first image and a second image of the sequence of images and the region of interest is repositioned on the basis of the indicated motion of the object. The image segmentation algorithm is then applied to the second image to generate a second outline in the repositioned region of interest.
According to another aspect of the invention, a method of segmenting an image plane on the basis of features of an image displayed in the image plane includes the following steps: displaying the image on a display device, using a drawing device to superimpose a free-hand drawing figure on the image displayed on the display device (the free-hand drawing figure defining a band-shaped region of interest in the image plane formed as the locus of a circle moved in an arbitrary manner), applying an image analysis algorithm to the displayed image (the image analysis algorithm being constrained to operate only within the region of interest defined by the free-hand drawing figure and the algorithm operating without reference to any portion of the image outside of the region of interest), and segmenting the image plane on the basis of a result provided by application of the image analysis algorithm.
According to yet another aspect of the invention, a process for extracting features from an image includes applying an edge detector algorithm to pixel information arrayed in a region of interest in an image plane. The edge detector algorithm generates edge information from the pixel information. The process further includes the step of applying a bias function to the edge information to emphasize components of the edge information at the central portions of the region of interest, thereby producing biased edge information. According to a further aspect of the invention, an edge-modulated softness function is provided with respect to a key signal. In accordance with this aspect of the invention, a key boundary is generated by means of an edge detection algorithm, and the algorithm generates for each pixel on the key boundary edge-degree data which indicates a degree of definiteness of an edge at the respective pixel. A softness function is adjusted along the key boundary in dependence on the edge-degree data. The degree of softness is increased at points on the key boundary where a less definite edge was found.
According to still another aspect of the invention, a softness function is adjusted on the basis of an operator input signal to provide a "clean up" function. In implementing the clean up function, a key boundary is generated, a first region bordered by the key boundary is designated to be an inside region and a second region bordered by the key boundary is designated to be an outside region. A softness function is applied to the key boundary to generate a gradient in a key signal between the inside region and the outside region. In response to a control signal input by a human operator, the softness function is adjusted so that the slope of the gradient is increased on a side adjacent to the outside region without changing the slope of the gradient on a side adjacent the inside region.
The features of the invention allow for highly efficient image segmentation, in which a desired object in a dynamic stream of images may be selected for subsequent processing with great accuracy. The results obtained rival those which could be achieved in the prior art only by use of a rotoscope, but without the tedious and extremely time-consuming high-magnification work required by the rotoscope. As compared to the rotoscope, the present invention represents an orders-of-magnitude improvement in speed and convenience.
Other significant features of the invention include a user interface that is highly intuitive, and easy to learn and to use. In addition, unique key-signal manipulation tools are provided which further enhance the utility of the invention.
The techniques of the present invention may advantageously be embodied in an external matte/key generator to be provided as a peripheral device for a color correction apparatus. The techniques of the present invention are also applicable to many other functions, such as image compositing, editing of still images generally, desk top publishing applications, video and motion picture production, colorizing of black and white films, and 3-D graphics displays.
It is also contemplated to include at least some of the capabilities of the present invention in image processing software of the types distributed to consumers and professional artists for operation on standard personal computers.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other objects, features, and advantages of the present invention will become apparent upon consideration of the following detailed description of illustrative embodiments thereof, especially when taken in conjunction with the accompanying drawings, wherein:
Fig. 1 is a block diagram of an image processing system in which the present invention is applied.
Fig. 2 is a block diagram of personal computer hardware which may constitute a portion of an image segmentation component shown in Fig. 1.
Figs. 3 and 4 together schematically illustrate image segmentation and key signal manipulation processes carried on in accordance with the invention.
Figs. 5A and 5B pictorially illustrate key signal manipulation processes carried out in accordance with the invention.
Fig. 6 is a screen display which shows an image to be processed for image segmentation as well as certain control options made available to a human operator.
Fig. 7 is a screen display similar to Fig. 6, but also showing a partial drawing figure superimposed on the image to select a portion of the image for color correction.
Fig. 8 is another screen display, showing the complete drawing figure which selects an image portion for color correction.
Fig. 9 is another screen display, showing a region of interest designated by the human operator for image segmentation purposes, as well as associated inside and outside regions and an extended region of interest.
Fig. 10 is a pictorial illustration of certain calculations included in processes shown in Fig. 3.
Fig. 11 is another screen display, illustrating the locus of a center of the region of interest.
Fig. 12 is another screen display, showing a mapping of edge detection information generated by reference to luminance image information.
Fig. 13 is a screen display similar to Fig. 12 but showing a mapping of edge detection information based on color image information.
Fig. 14 is still another similar screen display, showing a combination of the luminance and color edge detection maps.
Fig. 15 is a pictorial illustration of a step included in the processes illustrated in Fig. 3.
Fig. 16 is a screen display similar to Figs. 12-14, and illustrating a result of applying a biasing function to the combined edge information map shown in Fig. 14.
Fig. 17 is another screen display, illustrating edge gradient data calculated from the biased edge data illustrated in Fig. 16.
Fig. 18 is a screen display illustrating the effect of applying a diffusion function to the edge gradient data illustrated in Fig. 17.
Fig. 19 is still another screen display, showing the image that was processed, together with an outline which is the outcome of the image segmentation process of the present invention.
Fig. 20 is a screen display which shows a key mask produced from the image segmentation process.
Fig. 21 is another screen display, illustrating an outline adjustment mode provided in accordance with the invention.
Fig. 22 is a screen display which is similar to Fig. 20, but showing a key mask to which a softness function has been applied.
Fig. 23 is another screen display, showing how the key mask of Fig. 22 selects a portion of the image.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
System Overview
Fig. 1 shows an image processing system 100 in which the present invention is employed. The image processing system 100 includes an image information source device 102 which provides information representative of images to be processed. In the particular embodiment shown in Fig. 1, the image processing system 100 is employed for color correction and includes a color-correction device 104 which receives image information representing images to be color-corrected from the source device 102.
The image processing system 100 also includes an image segmentation device 106 which receives image information from the source device 102. The image segmentation device 106 processes the image information to generate key signals which are output from the image segmentation device 106 to the color correction device 104. The color correction device 104 uses the key signals generated by the image segmentation device 106 to control color correction processes in the color correction device 104.
The source device 102 may be any conventional memory or mass storage device used to store digital image information. The source device 102 may also, or alternatively, include conventional film and television record and playback devices including telecine transfer systems and film projectors, and video tape and disc players and recorders. If such devices are employed, there preferably is a mechanism for synchronizing the segmentation device 106 and the source 102 so that the key signal output from segmentation device 106 is provided to the color corrector 104 synchronously with the corresponding image from source 102. Alternatively, a digital camera or a transmission facility may be substituted for the image information source device 102 as the source of the image information fed to the color correction device 104 and to the image segmentation device 106. The color correction device 104 may also be a conventional item, and preferably is either one of the ColorVision Stealth and ColorVision Copernicus color correctors, which are available from the assignee of the present application.
Hardware Aspects of Image Segmentation Device
The image segmentation device 106 is preferably implemented with standard PC hardware programmed with software provided in accordance with the invention. It is also preferred that the image segmentation device 106 have enhanced digital image storage capabilities by incorporating an integrated digital disk recorder board such as the ClipStation PRO, which is commercially available from DVS GmbH, Hannover, Germany.
Fig. 2 provides a simplified overview of the hardware which makes up the image segmentation device 106. The hardware components of the image segmentation device 106 include a microprocessor 110 which controls the over-all operation of the image segmentation device 106 and also carries out image segmentation processes in accordance with the invention. Connected to the microprocessor 110 are memory 112, which is a RAM for storing a program to control the microprocessor 110 and also functions as a working memory, and mass storage 116 in which image information to be processed by the image segmentation device 106 may be stored. Of course, program information may also be stored in the storage device 116. The mass storage 116 may correspond to the above-referenced integrated digital disk recorder board. Or, the mass storage 116 may be a combination of the recorder board and a standard hard disk, or simply a standard hard disk alone. Also connected to the microprocessor 110 is a data communication interface 118 through which the image segmentation device 106 receives image information to be processed from the information storage device 102 and transmits key signal information to the color correction device 104.
The user interface for the image segmentation device 106 includes a display device 120 driven by the microprocessor 110 and a drawing device 122 connected to the microprocessor. In a preferred embodiment of the invention, the drawing device 122 is the Intuos II stylus and tablet/mouse peripheral which is commercially available from Wacom Technology Corporation, Vancouver, Washington.
The image segmentation device 106 may also include other input/output components (not shown in the drawing) which are standard in personal computers, such as a keyboard, speakers, etc. The drawing device 122 may be constituted by only one of a stylus/tablet or mouse, and/or by a trackball, light pen or touch screen.
Indicating Region of Interest in Image
Processes carried out by the image segmentation device 106 in accordance with the invention will now be described with reference to Figs. 3 and 4. For the present discussion it will be assumed that image segmentation device 106 has received image information from the information storage device 102. The received image information may represent a single image to be segmented or may represent plural images, including images making up a dynamic sequence of images (e.g. a video clip). In the case of processing a video clip, either complete frames or individual fields may be processed depending on the origin of the video clip. The discussion to follow will be concerned with segmentation of images in a video clip, but many of the segmentation techniques to be described are also applicable to still images. It will further be assumed that a first image in the video clip has been selected for processing by the human operator. Accordingly, as shown in Fig. 6, an image 208 which is to be segmented for color correction is displayed in an image window 210 in a graphical user interface screen 212. At this point, and in accordance with block 410 in Fig. 3, the human operator is permitted to input signals by means of the drawing device 122 to implement a software drawing tool by which the operator roughly indicates a desired segmentation of the image 208. For purposes of illustration, it will be assumed that the task to be performed is color correction of the flesh tones of the model 214 who is seen in image 208. The width of the drawing figure to be drawn by the drawing tool can be adjusted by means of slide bar 215 and the currently selected width is indicated at 217.
Fig. 7 is a screen display similar to Fig. 6 but showing a region 216 which is generated by the image segmentation device 106 by operation of the software drawing tool in response to signals input by the human operator via the drawing device 122. It will be seen that the region 216 is a partial rough outline of portions of the image which correspond to the model's skin. The region 216 is in the form of an extended band. The region 216 is defined as the locus of a circle moved in an arbitrary manner as indicated by the drawing device 122. The region 216 also corresponds to the portion of the image plane between a pair of substantially parallel free-hand drawing lines 218 and 220 which are simultaneously generated on the screen as the human operator draws using the drawing device 122. The lines 218 and 220 are "substantially parallel" in the sense that the distance across region 216 in the direction normal to lines 218, 220 is substantially constant along the length of region 216. This distance across region 216 is equal to the width of the drawing tool as selected by means of slide bar 215 and indicated by feature 217. As is common with free-hand software drawing tools, the lines 218 and 220 may be either curved or straight, and in general may be used to define an arbitrary, irregularly shaped region. (As will be appreciated by those who are skilled in the art, conventional drawing software packages include: (a) shape tools by which predetermined geometric shapes such as rectangles or other polygons, circles and ovals are created, positioned and stretched or shrunk or otherwise manipulated, (b) "connect-the-dots" tools by which straight line segments are generated between control points established by the user, and (c) free-hand tools in which a line is generated on the screen without any restriction as to shape and governed solely by the locus through which the mouse or other drawing instrument is moved, akin to doodling with a pencil on a piece of paper. The software drawing tool which generates region 216 is of the latter type, having a user-adjustable width, which is also a conventional feature.)
Although the region 216 may be drawn with a single continuous stroke, this is not required. The region 216 may also be indicated with multiple disconnected strokes, may have an irregular border, may be defined by repeated short motions or sketching by the drawing device, may be filled by additional strokes along the outside and/or the inside, and may have multiple branches and regions. There is no restriction on the manner or order in which the region is drawn and there is no restriction on the shape of the region. However, for best results the region should be shaped and positioned so that the desired object boundary is approximately at the center of the region. The region 216 may be indicated on the screen display by changing the luminance level and/or a color tint in the region relative to the balance of the image. Features of the underlying image remain visible in the region 216 (see, e.g., the model's right hand at 222) and thus are not occluded by the region 216.
Fig. 8 is another screen display similar to Figs. 6 and 7, but showing the region 216 after it has been completed so as to surround the entire portion of the image which corresponds to the model's skin. Each of the lines 218 and 220 (which together define the region 216) forms a respective closed freehand figure, with the figure defined by line 220 being entirely contained within the figure defined by line 218. It will be understood that the region 216 itself constitutes a free-hand drawing figure. In the display shown in Fig. 8, the background 224 of the image, corresponding to the area outside of the region 216, is at a reduced luminance relative to the region 216, and the inside 226 of the area selected for color correction is at an increased luminance level relative to the region 216. This is indicative of the operator having designated area 216 to be an "inside" region and area 224 to be an "outside" region relative to the highlighted region 216. The designation of the "inside" region may be made automatically by the system 100 (e.g. by selecting the smaller of two regions partitioned in the image plane by the region 216) or may be designated by the operator. A designation made by the system may be over-ridden by the operator.
By drawing the region 216 in the image plane, the human operator indicates to the image segmentation device 106 a specific, limited portion of the region in which the image segmentation device is to perform image segmentation processing to find boundaries of an object to be color corrected. The region drawn by the operator may be referred to as a region of interest (ROI), and appears in Fig. 9 as a shaded freehand drawing figure 216, corresponding to the "highlighted" region 216 of Fig. 8. An inside region 232, indicated in white in Fig. 9, is bordered by the region of interest 216 and represents a portion of the image which is entirely inside the object selected by the operator. A background or outside region 234 is also bordered by the region of interest 216 and is indicated in dark tones in Fig. 9. In connection with the display of Fig. 9, the operator may again have the option to modify the region of interest 216 by using the drawing device 122.
In the example which has been illustrated hereinabove, a single object was selected for color correction by means of a single closed drawing figure which defines a single closed region of interest. However, a preferred embodiment of the invention provides many other options to the human operator in terms of selecting objects and drawing regions of interest. For example, the region of interest need not be a closed drawing figure, but rather can be terminated at one or more sides of the image plane. Also, the drawing figure to define the region of interest need not simply be drawn with one continuous stroke of the drawing tool. The region of interest may be expanded by drawing additional strokes with the drawing tool to shade or fill in the region of interest either to the inside or outside or both. Thus the operator may increase the width of the region of interest at particular portions of the ROI which had previously been designated and displayed.
There also need not be a one-to-one relationship between regions of interest and objects selected for color correction. Thus, an object which surrounds an area which is not to be considered the object (e.g., a doughnut seen in plan view), may be defined by means of two unconnected regions of interest. In the example just given, one region of interest would be drawn to indicate the outer perimeter of the doughnut, and a second region of interest drawn to indicate the inner perimeter of the doughnut. To define the key boundary in this case, the image segmentation device performs two image segmentation processes, one constrained to the first region of interest and the second constrained to the second region of interest.
A preferred embodiment of the invention also allows the operator to select several objects in the image for color correction simultaneously, using respective regions of interest to select each of the objects. For example, in the image shown in Fig. 6, the operator could draw a respective region of interest around each one of several of the flowers shown in the image, and the image segmentation device would then find the boundaries of each of the flowers to generate a key map made up of several disjoint parts. If more than one object is selected in an image, different luminance levels or color tints may be displayed in the respective regions corresponding to the selected objects, to indicate that different post-processes, such as different color correction processes, are to be applied to the various objects.
The region of interest can also be processed in a "skeleton" mode (accessible by the control 328 shown in Fig. 6). When the ROI is processed in the skeleton mode, the image segmentation device automatically analyzes drawing figures generated in this mode to derive a "skeleton" of the drawing figure in accordance with known image analysis techniques. ("Skeleton" is a term of art that is well understood in the context of image analysis processing.) The resulting skeleton is then automatically designated to be an inside region. This mode is particularly useful when it is desired to select for color correction thin linear objects such as plant stems or birds' legs. The selection can simply be done by drawing a linear stroke of the drawing tool, in the skeleton mode, along the length of the object to be selected. If such a linear stroke is drawn so as to be attached to an existing region of interest with an inside designated region surrounded by the region of interest, the interior region designated as the skeleton will be connected to the designated inside region.
Another mode of operating the drawing tool, which may be referred to as a strand mode, is somewhat similar to the skeleton mode, but does not require analysis of a drawing figure to find the skeleton thereof. Instead, figures drawn using the strand tool automatically include a third line which appears on the screen parallel to and halfway between the two lines which are effectively defined by the left and right sides of the band drawn by the drawing tool. This inner line is automatically designated to be an inside region relative to the region of interest defined by the locus of the figure drawn by the drawing tool. Thus this tool simultaneously draws three free-hand lines in parallel to each other with equal spacing between the first and second line and between the second and third line. The region of interest is defined between the first and second line and between the second and third line with the second line itself being a narrow inside region. If such a tool is employed to draw a closed figure, the result would be three closed line figures with the second contained inside the first and the third contained inside the second. A first region of interest defined between the first and second line figures would be subjected to an image segmentation operation, as would a second region of interest defined between the second and third line figures.
In addition to or instead of the free-hand drawing tools provided in the above-described embodiments of the invention, it is also contemplated to provide a drawing tool of the type, well known from computer drawing software packages, in which straight lines (of adjustable width) are drawn sequentially between control points selected by the operator. Such drawing tools are sometimes referred to as "connect the dots" tools. The present invention also contemplates a further alternative drawing tool to be used to designate a region of interest instead of or in addition to the tools described hereinabove. In accordance with this aspect of the invention the operator is permitted to draw a single fine line completely inside or completely outside the object of interest. The line may be automatically closed in accordance with conventional techniques if the operator so selects. An operator-actuatable control then causes the width of the line to be increased toward the inside or outside of the object, as the case may be, until the widened line covers the object boundary. Preferably the widening of the line is continued by the operator until the line is more or less evenly divided by the object boundary. The widened line now may be taken to be a region of interest in which image segmentation may be performed. A drawing tool having some width can be used to adjust the region of interest by adding to it or erasing parts of it. It is also contemplated to employ the step of increasing the width of the line drawn by the operator to a line drawn with a tool having some substantial width, i.e. to a band-shaped region as previously described.
Another drawing tool (indicated at 330 in Fig. 6), referred to as the "mesh" tool, is available to the operator to allow him or her to designate sections of the region of interest for carrying out a supplemental detail finding algorithm in areas where the desired object's boundary is highly complex. The designation of such a supplemental region of interest is indicated by block 411 in Fig. 3. Preferably the supplemental detail ROI sections generated at block 411 are displayed with a mesh pattern or other distinctive marking to distinguish them from the main ROI section 216 as seen in Fig. 8. (No such supplemental ROI section is shown in the drawings.) In the particular image shown in Fig. 6, it might be desirable to invoke complex boundary finding where the model's hair partially hides her forehead, as seen at 331 in Fig. 6. For purposes of segmenting images in a video clip, the supplemental ROI may be attached as an appendage to the main ROI, so that the supplemental ROI has its position changed as the main ROI has its position changed in accordance with practices to be described below. The supplemental ROI may be appended so as to fall along the center of the main ROI, or may have offset data associated with the supplemental ROI so that the supplemental ROI is appended inside or outside the rough boundary indicated by the main ROI. It may also be desired to draw the entire ROI with only the mesh tool, in which case the main ROI and supplemental ROI are the same.
Referring again to Fig. 9, an extended region of interest (EROI) 236 extends both inwardly and outwardly from the ROI 216. No image segmentation process is carried out within the EROI, but key signal processing (to be described below) may occur within the EROI 236. Moreover, motion measurement may be constrained to occur only within EROI 236. In effect, by drawing a region of interest, the operator has given to the image segmentation device a general indication of where in the image plane to find the boundary of an object selected by the operator. Based on the locus of the ROI, the image segmentation device proceeds to perform further processing to segment the image according to the guidance provided by the operator. This processing is represented by blocks 412 and 413 in Fig. 3. At block 412 the image segmentation device extracts luminance and color component information for the portion of the image corresponding to ROI 216. The process then may continue according to any one of a large number of feature extraction techniques. In a preferred embodiment of the invention simple (Sobel) edge detection processing is used to generate an "external force field" which drives a "snake" (active contour) to find the object boundary.
First, the segmentation device calculates, for each pixel in the ROI, a metric which will be referred to as the "ROI distance" (D_ROI). Fig. 10 schematically illustrates how this data is calculated. In Fig. 10, a portion of an ROI 216 is shown, including a pixel 242 for which a D_ROI is to be calculated. The calculation is based on the distance D_I between the pixel 242 and a pixel 244 which is the closest pixel in the inside region 232 to pixel 242. The image segmentation device also defines a distance D_O between pixel 242 and a pixel 246 which is the closest pixel in the outside region 234 to the pixel 242. The ROI distance D_ROI is then calculated as D_O/(D_I + D_O).
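By way of illustration only, the following Python sketch computes the ROI distance, together with the ROI width and relief map described in the next paragraph, from boolean masks of the inside region, outside region, and ROI band. The function name, the use of Euclidean distance transforms, and the epsilon guard are illustrative assumptions rather than the prescribed implementation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def roi_metrics(inside, outside, roi):
    """Sketch of the per-pixel ROI metrics.

    inside, outside, roi: boolean masks of the designated inside
    region, outside region, and band-shaped region of interest.
    """
    # distance_transform_edt gives each nonzero pixel its distance to
    # the nearest zero pixel, so invert each mask to measure distance
    # to the nearest pixel of that region.
    d_i = distance_transform_edt(~inside)   # D_I: to nearest inside pixel
    d_o = distance_transform_edt(~outside)  # D_O: to nearest outside pixel
    # ROI distance: ~1 beside the inside border, ~0 beside the outside
    # border, ~0.5 along the ROI center locus.
    d_roi = np.where(roi, d_o / (d_i + d_o + 1e-12), 0.0)
    # ROI width at a point is simply D_I + D_O.
    width = np.where(roi, d_i + d_o, 0.0)
    # Relief map: slope of the ROI distance in the x and y directions.
    gy, gx = np.gradient(d_roi)
    return d_roi, width, (gx, gy)
```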
On the basis of the ROI distance data calculated for the pixels of the region of interest 216, the image segmentation device goes on to calculate further data. As indicated by block 242, the image segmentation device defines a center locus of the region of interest as formed by pixels which have an ROI distance substantially equal to 0.50. This center locus for the ROI is indicated as the outer perimeter of a shaded area 216' in Fig. 11. If Fig. 11 is compared with Fig. 9, it will be observed that the shaded region in Fig. 11 has been reduced in width by one-half relative to the shaded area in Fig. 9. The ROI distance data is also used to calculate a metric called "ROI width" for each point in the ROI. This is done by adding, for each point at the ROI center, the respective D_I (distance to the nearest inside pixel) plus D_O (distance to the nearest outside pixel). Both of these measures were previously referred to in connection with Fig. 10. Furthermore a "relief map" is generated for the region of interest. The "relief map" is a vector field which is the derivative or slope in the X and Y (horizontal and vertical) directions of the ROI distance which was calculated for all of the pixels in the region of interest.
The ROI center information and ROI width data are used to generate another kind of mapping data for pixels in the region of interest. This data is referred to as "edge axis" mapping data and is generated as follows. For each point at the ROI center, an average normal direction to the ROI center boundary is determined. This direction is then compared to the following four default directions: "north-south", "east-west", "northeast-southwest", and "northwest-southeast". The one of these four default directions which is closest to the determined normal direction at the ROI center point being considered is selected, and then the selected default direction is assigned to all pixels surrounding the center point and within a distance from the center point equal to one-half the ROI width at the center point. Of course, this default direction data is not assigned to points outside of the region of interest. Also, since this process applies to each point on the ROI center, and because the center bends along its length, more than one of the default directions may be assigned to at least some of the pixels in the ROI. The assigned default direction data are used, as will be seen, as an input to directional edge detection processes which will now be discussed. According to alternative embodiments, the normal direction at each point of the ROI center may be quantized to more or fewer than four values, or the raw normal direction itself may be stored as an input to a directional edge detection process.
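A minimal sketch of the quantization step follows, assuming the average normal is available as a unit vector; the four axis labels and the modulo-180° comparison come from the text, while the names and representation are illustrative.

```python
import numpy as np

# The four default directions named in the text, as angles of the
# normal direction modulo 180 degrees.
DEFAULT_AXES = {0.0: "east-west", 45.0: "northeast-southwest",
                90.0: "north-south", 135.0: "northwest-southeast"}

def quantize_normal(nx, ny):
    """Snap the average normal (nx, ny) at an ROI-center point to the
    closest of the four default axes."""
    angle = np.degrees(np.arctan2(ny, nx)) % 180.0
    # angular distance on a circle of axes with period 180 degrees
    best = min(DEFAULT_AXES,
               key=lambda a: min(abs(angle - a), 180.0 - abs(angle - a)))
    return DEFAULT_AXES[best]
```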
Feature Extraction
As noted before, in a preferred embodiment of the invention, feature extraction is implemented as a conventional edge detection technique, such as the well-known Sobel edge detector, but modified for detection in a desired direction. Prior to application of edge detection or other feature detection processing, low pass filtering may be applied to the image data in one or both of the main ROI and the supplemental detail ROI. In a preferred embodiment, the edge detector operates as a convolution in luminance space on each pixel and its eight nearest neighbors with a separate convolution kernel for each direction. The luminance edge detection of block 256 is applied at each pixel location in the ROI in the one or more directions that were assigned to the pixel by the edge axis map generated at block 254, and the resulting edge data for each of the indicated directions is averaged. The luminance edge detector is not operated outside of the region of interest.
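The patent does not reproduce its convolution kernels, so the sketch below substitutes the standard Sobel pair and its 45° rotations, keyed by the axis along which each kernel differentiates; that pairing of kernels with axis names is an assumption, while the averaging over multiply-assigned directions follows the text.

```python
import numpy as np
from scipy.ndimage import convolve

# One 3x3 kernel per default axis; each differentiates luminance along
# the named axis (rows run north to south, columns west to east).
KERNELS = {
    "north-south":         np.array([[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]]),
    "east-west":           np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]),
    "northeast-southwest": np.array([[0., 1., 2.], [-1., 0., 1.], [-2., -1., 0.]]),
    "northwest-southeast": np.array([[2., 1., 0.], [1., 0., -1.], [0., -1., -2.]]),
}

def directional_luma_edges(luma, axis_masks):
    """axis_masks maps each axis name to a boolean mask of ROI pixels
    assigned that axis by the edge axis map; responses of multiply
    assigned pixels are averaged, and non-ROI pixels stay zero."""
    luma = luma.astype(float)
    total = np.zeros_like(luma)
    count = np.zeros_like(luma)
    for name, kernel in KERNELS.items():
        response = np.abs(convolve(luma, kernel))
        total += np.where(axis_masks[name], response, 0.0)
        count += axis_masks[name].astype(float)
    return np.where(count > 0, total / np.maximum(count, 1.0), 0.0)
```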
A similar directional edge detection process is carried out with respect to color information for the pixels in the ROI. Again, the edge axis mapping information is used to indicate the direction in which edges are to be sought at each pixel. This color edge detection process is applied to the color component information extracted at block 412 in Fig. 3. However, contrary to conventional practices, the color edge detection processing is not applied to the color information on a component-by-component basis. Rather, edge detection is performed on the basis of distances between pixels in a multi-axial color space. That is, the edge detection algorithm operates by calculating Euclidean distances among one or more pairs of nearest neighbors of the pixel in question, as measured in multi-axis color space. This is different from using simple subtractive distances in a single color component axis, as has been prescribed by the prior art. The color space in which the distances are calculated is defined by R-Y and B-Y axes in a preferred embodiment of the invention. However, it is contemplated to use other sets of axes, such as hue and saturation, and to use color spaces having more than two dimensions. The present inventors have found that applying color edge detection to Euclidean color space distances provides much more effective object boundary detection than the single component-based edge detection processing applied to color information according to the prior art.
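A sketch of the color-space distance measurement follows, assuming the R-Y and B-Y components are supplied as separate arrays. Measuring between the opposing horizontal and vertical neighbor pairs, and combining the two by maximum, are illustrative choices; the directional selection via the edge axis map is omitted for brevity.

```python
import numpy as np

def color_space_edges(ry, by):
    """Edge strength from Euclidean distances in (R-Y, B-Y) space,
    rather than per-component subtractive differences."""
    # distance in color space between each pixel's left and right neighbors ...
    d_h = np.hypot(ry[:, 2:] - ry[:, :-2], by[:, 2:] - by[:, :-2])
    # ... and between its upper and lower neighbors
    d_v = np.hypot(ry[2:, :] - ry[:-2, :], by[2:, :] - by[:-2, :])
    edge = np.zeros_like(ry, dtype=float)
    edge[:, 1:-1] = d_h
    edge[1:-1, :] = np.maximum(edge[1:-1, :], d_v)
    return edge
```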
Like the luminance edge detector, the color edge detector may be a variant of a conventional edge detection process such as the Sobel detector, and is constrained to operate only within the region of interest. After the luminance and color edge detection processes are complete, it is preferred that the resulting data of each one be normalized to a range of 0-1.0, and then each is raised to a power such that the mean of each lies at the same value. These normalization and mean-matching steps have been found to provide optimum performance of subsequent operations. Fig. 12 illustrates edge detection data generated by the directional luminance edge detector. It will be observed from Fig. 12 that the luminance edge detector found rather definite edges at areas indicated at 260, 262, 264, and 266 in Fig. 12. At other parts of the region of interest, such as those indicated at 268 and 270, the detector found rather weak or virtually nonexistent edges.
Fig. 13 shows the edge detection information generated on the basis of the color image information. Strong edges are seen at 272 and 274 in Fig. 13, whereas no strong edges were found in areas 276 and 278.
The luminance-based and color-based edge detection information are then combined to produce combined edge information. The two edge detection maps may be combined additively, but preferably for each pixel the maximum of the luma and color edge detection data is taken to provide the combined edge data. Also, either one of the luma edge detector and the color edge detector may be disabled, as indicated at control portion 282 of Fig. 6. It is contemplated to combine additional feature maps generated by algorithms which extract features based on direct measurement and/or derivatives of one- or multi-dimensional image parameter spaces, resulting in a combined feature map.
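The normalization and mean-matching steps described two paragraphs above, followed by the maximum combination, might look as follows. The closed-form exponent, which maps each map's mean value onto a common target rather than solving mean(x**p) = target exactly, is an assumed simplification.

```python
import numpy as np

def combine_edge_maps(luma_edges, color_edges, target_mean=0.5):
    """Normalize each edge map to 0..1, raise each to a power that
    brings its mean to a common value, then take the per-pixel
    maximum (the preferred combination; addition is an alternative)."""
    def prep(x):
        x = (x - x.min()) / (x.max() - x.min() + 1e-12)
        p = np.log(target_mean) / np.log(x.mean() + 1e-12)
        return x ** p
    return np.maximum(prep(luma_edges), prep(color_edges))
```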
The combined edge map is illustrated in Fig. 14. It will be observed from Fig. 14 that a rather strong edge has been found virtually all along the region of interest. A weighting or biasing function is applied to the edge detection data across the transverse dimension of the region of interest so as to emphasize components of the edge detection data that are located toward the center of the region of interest. For that purpose a Gaussian or similar weighting function (such as a sinusoidal peak function) is applied across the region of interest, as schematically illustrated in Fig. 15.
The weighting function is applied in a manner which accommodates arbitrary shapes of the ROI. The weighting function is defined over the range 0 to 1, inclusive. It will be recalled that an ROI distance metric has been defined for each point in the ROI, having values in that range which indicate the distance of the respective point from the inside and outside regions. The weighting function would be defined to have the value 0 for the 0 and 1 values of the ROI distance metric and a value of 1 for the 0.5 value of the ROI distance metric, with a suitable tapering for values of the ROI distance metric between 0 and 0.5 and between 0.5 and 1. Such a weighting function can easily be implemented as a lookup table. Since the ROI distance function is completely defined over any ROI of arbitrary topology, the weighting function can be defined for an ROI of any shape. As a result, the present invention can allow the user to designate an ROI having any arbitrary shape.
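A raised-cosine peak is one weighting function with the stated properties (0 at ROI distance values of 0 and 1, peaking at 1 for the 0.5 value); it is sketched below under that assumption. As the text notes, the same function could equally be realized as a lookup table indexed by the ROI distance.

```python
import numpy as np

def center_biased_edges(edge_map, d_roi):
    """Emphasize edge data near the ROI center: the weight peaks at 1
    where D_ROI == 0.5 and tapers to 0 at D_ROI == 0 and D_ROI == 1."""
    weight = 0.5 * (1.0 - np.cos(2.0 * np.pi * d_roi))
    return edge_map * weight
```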
As a result of the bias function, edges or other features detected near the center of the region of interest are favored, while those near the edge of the ROI are suppressed. This bias function effectively sharpens the guidance provided by the operator's input to guide the machine's image segmentation process toward the center of the region of interest. This reflects an assumption that the operator will attempt to more or less evenly bracket the desired object boundary with the inner and outer perimeters of the region of interest.
The biased edge information is illustrated in Fig. 16. If Fig. 16 is compared with Fig. 14, it will be observed that the edge components toward the inside or outside perimeters of the region of interest have generally been reduced (de-emphasized). Conversely, the components of the edge information at a central portion of the ROI are emphasized by the bias function.
Although the present inventors have found that the results of the edge detection processes are enhanced by using directional edge detectors based on the edge axis map, it has been found that adequate results may also be obtained by using non-directional edge detectors, in which case the edge axis mapping process may be dropped. It is also contemplated to use component-based color edge detection instead of the color-space-based edge detection which was referred to above.
Boundary Finding
The next step in the segmentation process is to locate the position of a boundary by using the extracted and processed feature map. In a preferred embodiment, snakes or active contours are used within a gradient vector flow (GVF) field to find the position of the object boundary. The biased edge detection data is processed to generate a gradient vector flow field, which is generally of the type discussed in the following papers: C. Xu and J.L. Prince, "Gradient Vector Flow: A New External Force for Snakes," Proc. IEEE Conf. on Comp. Vis. Patt. Recog. (CVPR), Los Alamitos: Comp. Soc. Press, pp. 66-71, June 1997;
C. Xu and J.L. Prince, "Snakes, Shapes, and Gradient Vector Flow," IEEE Transactions on Image Processing, pp. 359-369, March 1998.
Some of this material is also available in an article published online entitled "Gradient Vector Flow", by Chenyang Xu and Jerry L. Prince, found at iacl.ece.jhu.edu/project/gvf.
The purpose of the gradient vector flow field is to provide an external force field for a snake or active contour segmentation process.
The biased edge data is then used to calculate an edge gradient vector field, which may be a Laplacian or derivative of the biased edge information. The resulting edge gradient field is shown in Fig. 17. This edge gradient data is then normalized and diffused throughout the region of interest to generate a data field referred to as the "edge relief map" or GVF field. The resulting map is illustrated in Fig. 18.
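A compact rendering of the GVF computation from the cited Xu and Prince papers is sketched below: the edge gradient is diffused so that ROI pixels far from an edge still feel a force toward it. The step size, regularization weight, and iteration count are assumed values.

```python
import numpy as np
from scipy.ndimage import laplace

def gradient_vector_flow(edge_map, mu=0.2, iters=200, dt=0.5):
    """Iteratively solve the GVF equations of Xu & Prince: smooth the
    field where the edge gradient is weak, pin it where it is strong."""
    fy, fx = np.gradient(edge_map)
    mag2 = fx ** 2 + fy ** 2
    u, v = fx.copy(), fy.copy()
    for _ in range(iters):
        u += dt * (mu * laplace(u) - (u - fx) * mag2)
        v += dt * (mu * laplace(v) - (v - fy) * mag2)
    return u, v
```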
The resulting GVF field is used in connection with another known image analysis technique referred to as an "active contour model" or "snake". The articles referred to above contain descriptions of image analysis using snakes, and so does an article entitled "Active Contour Models (Snakes)", which has been published online at www.cogs.susx.ac.uk/users/davidy/teachvision/vision7.html. A snake is a model that may be generated in a two-dimensional image plane and includes control points along the length of the model which are deemed connected by virtual springs. The springs may reflect various models, but in a preferred embodiment of the present invention are modeled in accordance with Hooke's Law and have a rather low spring force with no resistance to bending.
Essentially, the balance of the process for segmenting a single image entails generating a snake and allowing it to be driven to the desired object boundary through interaction of the gradient vector flow field and the snake's own internal energy characteristics. As is known to those who are skilled in the art, snake models operate to minimize the combined energy of the system in which they operate. An initial position for the snake is set as the center of the region of interest, as described above in connection with Fig. 11. The snake is then iteratively repositioned under the influence of the edge relief map, until a convergence test is satisfied. Each time the snake is repositioned, it is reparameterized so that the spacing between the control points is substantially equalized. Also, the snake is not permitted to depart from the region of interest. It should be noted that it is not a common practice to constrain snakes within a user-defined region of interest.
The convergence test calls for comparing the movement of the snake with a threshold to determine whether the snake has moved significantly in the latest repositioning. If the amount of movement is not significant, convergence is deemed to have occurred. The snake movement is measured by determining the amount that each control point has moved in the direction normal to the snake at the control point. The amount of movement in the normal direction is then averaged over the control points and the resulting average is compared with the threshold. Once convergence is found, the image is segmented at the final position of the snake and a key signal is produced. (The term "key signal" should be understood to include signals used for various types of image segmentation activities, including keying operations, mattes, windows, rotoscopes and "cutouts".) The locus of the final snake position defines a segmentation map for the image plane, the segmentation map being an output of the image segmentation process. The segmentation map corresponds to a machine-generated outline of the object selected by the operator.
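The snake iteration and convergence test might be sketched as follows. The spring constant, step size, tolerance, and nearest-pixel field sampling are illustrative assumptions, and the reparameterization and ROI clamping described above are omitted for brevity.

```python
import numpy as np

def evolve_snake(points, u, v, alpha=0.05, step=1.0, tol=0.05, max_iters=400):
    """points: (N, 2) array of (x, y) control points of a closed snake.
    u, v: GVF external force components sampled on the image grid."""
    for _ in range(max_iters):
        # external force: sample the GVF field at each control point
        ix = np.clip(points[:, 0].astype(int), 0, u.shape[1] - 1)
        iy = np.clip(points[:, 1].astype(int), 0, u.shape[0] - 1)
        external = np.stack([u[iy, ix], v[iy, ix]], axis=1)
        # internal force: weak Hooke's-law springs toward both neighbors
        spring = (np.roll(points, 1, axis=0) + np.roll(points, -1, axis=0)
                  - 2.0 * points)
        new_points = points + step * external + alpha * spring
        # convergence: average displacement along the local normal
        tangent = np.roll(new_points, -1, axis=0) - np.roll(new_points, 1, axis=0)
        normal = np.stack([-tangent[:, 1], tangent[:, 0]], axis=1)
        normal /= np.linalg.norm(normal, axis=1, keepdims=True) + 1e-12
        movement = np.abs(np.sum((new_points - points) * normal, axis=1)).mean()
        points = new_points  # reparameterization and ROI clamping omitted
        if movement < tol:
            break
    return points
```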
In Fig. 19 a rather bright outline 298 is indicative of the final snake position, and hence the locus of the segmentation map. It will be observed that the outline 298 quite accurately indicates the boundary between the skin area selected by the human operator and the balance of the image. The segmentation map is also illustrated in Fig. 20 in the form of a "hard" key mask.
If reasonably high-end PC hardware is used, the process of machine image analysis, from the time the operator indicates drawing of the ROI 216 is complete until the outline 298 is drawn by the computer, requires only a few seconds or less. The process of drawing the ROI itself also need only take a few seconds. In the embodiments described herein, snakes (active contour models) have been used to find object boundaries in the image. However, the invention contemplates using boundary locating techniques other than snakes.
Moreover, assuming snakes or similar techniques are employed, there are many possible variations in the manner of calculating an external force field for the snake.
Although the snake-based segmentation process described above often produces a very accurate result in terms of finding the desired object boundary, it is also contemplated (although not essential) to apply "high level" constraints to the segmentation process to further improve the accuracy of the process. The application of high level constraints is represented by block 414 in Fig. 3. If a snake is employed to find the boundary, certain high level constraints, such as continuity or closure, smoothness, resistance to bending and elasticity, may be inherent in the processing of block 413. But if other boundary finding techniques, such as Canny edge detection, are employed at block 413, then high level constraints such as those enumerated above may be applied at block 414.
Another high level constraint that may be employed at block 414, even when a snake is employed at block 413, is referred to as "shape memory". In essence, shape memory may be applied at parts of the nominal boundary where the edge detection information is weak (exhibits low confidence). Assuming that the image segmented in block 413 is not the first in a scene, the shape of the outline in the low confidence region is clipped from the corresponding portion of the boundary outline in an immediately preceding image, and the clipped outline segment is spliced into the region of low confidence. If the image is the first in a scene, the low confidence portion of the outline may be replaced by the corresponding segment of the initial snake position along the ROI center. Since it is possible that the outline has moved and/or changed shape from image to image, the clipped segment must be transformed to match the new outline. This is done by measuring the transformation at the regions of good confidence adjoining the splice and interpolating over the length of the splice. To make this possible, each control point on the snake carries or drags with it certain parametric data as the snake is iteratively repositioned in block 413. This parametric data is stored in a shadow data structure, and may include edge strength data (relevant to the "shape memory" feature, and also relevant to the "edge modulated softness" feature described below), ROI width data (relevant to ROI relocation, to be discussed below), and prior positions of the control points (relevant to the above-mentioned convergence test). As is known, the reparameterization of the snake during block 413 may result in control points being added or dropped. If a control point is dropped, its corresponding shadow data in the shadow data structure is also dropped. If a control point is added, the corresponding shadow data for the new control point is augmented by interpolation or the like from the shadow data for neighboring control points. A variation of shape memory, referred to as "shape history", may also be employed at block 414. If the shape history constraint is applied, more than one prior image is considered in forming a splice for a region of low edge confidence. The outline data from the prior images may be accumulated by adding the coordinates for each successive outline to a prior average. The scaling factors applied to the most recent outline data and to the running average may be varied to provide a variable degree of persistence to the outlines of the prior images.
The threshold for the confidence measure, to be used to determine whether shape memory or shape history is applied, may be subject to hysteresis. That is, the threshold may be set higher at portions of the outline for which low confidence was found in a prior image. Also, an average confidence measure may be computed for the outline as a whole, and a graphical display element such as a bar graph may be displayed based on the average confidence measure to provide an indication of the success of the segmentation process. The average confidence measure can be thought of as a figure of merit for the segmentation process. This figure of merit may be particularly useful in the object tracking process to be described below. When the confidence measure declines significantly from one image to the next, it may be assumed that there has been a change of scene, or loss of tracking of the desired object for some other reason. In these cases, the operator may be prompted to designate a new ROI. Another high level constraint that may be applied at block 414 is shape reluctance. This constraint may be described as a resistance to changing shape, even in areas of high edge confidence. This constraint may be applied either as a simple gate, specifying a maximum permitted deviation in shape, or as a continuous function, whereby indicated changes in shape are scaled nonlinearly.
Another high level constraint that could be applied at block 414 would be to require the operator to manually adjust the outline at regions which exhibit low edge confidence. Operator adjustment of the outline will be discussed further below.
Still another high level constraint that could be applied at block 414 would be temporal smoothing of the outline over multiple images. This technique would eliminate or minimize contributions from noise or other perturbations (such as edges which impinge on the object of interest from the background) which momentarily disrupt the outline shape. Temporal smoothing could be accomplished by imposing a running average on outline coordinates and/or limiting the rate of derivatives of outline coordinates.
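The running-average accumulation used for shape history, and equally for temporal smoothing of outline coordinates, reduces to a one-line exponential average; the persistence parameter below stands in for the variable scaling factors mentioned above, with an assumed default.

```python
import numpy as np

def accumulate_outline(history, outline, persistence=0.7):
    """Exponentially weighted running average of outline coordinates;
    larger persistence makes prior images' outlines linger longer."""
    if history is None:
        return outline.copy()
    return persistence * history + (1.0 - persistence) * outline
```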
Once the segmentation of the first image is complete, the region of interest is repositioned in the image plane taking the segmentation map locus (final snake position or outline) as the center of the repositioned region of interest. (This step is represented by block 415 in Fig. 3.) Preferably the recentering of the region of interest may be performed adaptively along the segmentation map, in the sense that for a given point along the segmentation map the region of interest is not repositioned unless the segmentation reflects a high degree of confidence that the object boundary was properly found. In other words, the region of interest may only be re-centered for points of the segmentation map at which a rather definite edge was found. At other portions of the segmentation map, the shape of the region of interest remains unchanged. The process of re-centering the region of interest may use the ROI width metric to set the borders of the region of interest relative to the image segmentation map. The width metric which is employed at any given point on the image segmentation map is the width metric which was originally assigned to the corresponding point on the snake obtained from the ROI width when the snake was at its initial position at the center of the ROI as drawn by the operator. The re-centering may alternatively use one of a variety of other techniques including morphing and patch displacement interpolation.
Once the ROI has been repositioned to be centered on the final outline position, the ROI products (distance and relief map) may be recomputed to improve the modulation of the final key shape. The extended ROI (EROI) 236, referred to in connection with Fig. 11, is repositioned along with the ROI.
Blocks 416 and 417 in Fig. 3 represent processes in which the operator is able to provide additional input to the boundary finding procedure, beyond the initial indications of the ROI provided by the operator at blocks 410 and 411.
Block 416 represents an operator adjustment to one or both of the ROI's designated at steps 410 and 411. The operator adjustment of the ROI(s) may be performed iteratively after block 414 and/or block 412. The operator may use the same drawing tools referred to in connection with blocks 410 and 411 to add to the previously designated ROI(s).
In addition, it is preferred to also provide an erase function whereby the drawing tool, when applied to the region of interest, causes the region of interest to be erased. If the erase tool enters the region of interest from the adjoining region designated as the outside, the erased part of the region of interest is joined to the outside region. Conversely, if the erase tool enters the region of interest from the adjoining region designated to be the inside, the erased portion of the region of interest is added to the inside region.
Block 417 represents an option provided to the human operator to permit adjustment of the segmentation map (outline). As indicated at 300 in Fig. 21, the operator may select an "adjust outline" option. Upon selecting this option, the operator is provided with a software drawing tool having quite a narrow width. By means of this drawing tool, the operator can use the drawing device 122 (Fig. 2) to erase and redraw portions of the outline 298 to make corrections in the segmentation map generated by the machine image analysis process.
Boundary Finding in Dynamic Images
The process of Fig. 3 next turns to segmenting a second image in the sequence of images. For that purpose, and as indicated at block 419, there must be provided an indication of the motion of the object of interest (or, more precisely, of the boundary of the object). To provide the motion indication, actual motion from the first image to the second image to be segmented may be measured (e.g., via optical flow techniques). If no motion is detected, the outline and ROI from the previous image may be used without change. Alternatively, motion of the object boundary may be estimated (e.g., via extrapolation techniques). If motion is to be estimated, the locus of the boundary outline must have been established for at least two images prior to the image to be segmented; accordingly, the operator may be required to manually indicate the ROI (block 410) for two or more images at the beginning of each scene if motion projection (extrapolation) is employed.
Motion extrapolation techniques involve developing a measure of gross geometric parameters of the outline such as the centroid, size, differential scale factors, and rotational orientation. These geometric parameters are then used to predict the future position of an object based on prior positions using a model of Newtonian mechanics. Velocity and acceleration in the x and y directions can be tracked by examining the movement of the centroid, and the same factors in the z direction can be tracked on the basis of change in size. Angular velocity and acceleration about the major, minor and z axes may also be measured by examining changes in differential scale and rotational orientation. All of these motions may be extrapolated based on constant velocity, acceleration, or rate of change of acceleration ("jerk").
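For a single geometric parameter (a centroid coordinate, size, or rotation angle), constant-acceleration extrapolation from the last three samples reduces to fitting a quadratic, as sketched below; the three-sample window is an assumed minimal case.

```python
def extrapolate(samples):
    """Predict the next value of a tracked geometric parameter from
    its last three samples, assuming constant acceleration (this is
    equivalent to quadratic extrapolation: p3 = 3*p2 - 3*p1 + p0)."""
    p0, p1, p2 = samples[-3], samples[-2], samples[-1]  # oldest..newest
    velocity = p2 - p1
    acceleration = (p2 - p1) - (p1 - p0)
    return p2 + velocity + acceleration
```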
Motion measurement may be performed in accordance with any one of a number of techniques. In a preferred embodiment the optical flow technique of Horn and Schunck is used. This technique is described in: Horn, B.K.P., and Schunck, B.G., "Determining Optical Flow", Artificial Intelligence, 17, pp. 185-204 (1981).
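The Horn and Schunck scheme alternates between local flow estimates and neighborhood averaging; the sketch below uses simplified central-difference derivatives and an assumed smoothness weight, without the pyramid refinement discussed next.

```python
import numpy as np
from scipy.ndimage import convolve

NEIGHBOR_AVG = np.array([[1., 2., 1.], [2., 0., 2.], [1., 2., 1.]]) / 12.0

def horn_schunck(im1, im2, alpha=1.0, iters=100):
    """Estimate per-pixel flow (u, v) between two normalized frames;
    alpha weights the smoothness term against the data term."""
    im1, im2 = im1.astype(float), im2.astype(float)
    ix = convolve(im1, np.array([[-0.5, 0.0, 0.5]]))
    iy = convolve(im1, np.array([[-0.5], [0.0], [0.5]]))
    it = im2 - im1
    u = np.zeros_like(im1)
    v = np.zeros_like(im1)
    for _ in range(iters):
        # classic Jacobi iteration on the Horn-Schunck equations
        u_avg = convolve(u, NEIGHBOR_AVG)
        v_avg = convolve(v, NEIGHBOR_AVG)
        common = (ix * u_avg + iy * v_avg + it) / (alpha ** 2 + ix ** 2 + iy ** 2)
        u = u_avg - ix * common
        v = v_avg - iy * common
    return u, v
```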
As reported in the literature, many optical flow techniques are unable to measure high velocities, unless a supplemental technique, known as a pyramid, is employed. The latter technique uses multi-scale and multi-resolution flow measurements. Filtered and subsampled (reduced) versions of the images are processed to extract high velocity components, which are then passed down for refinement to higher resolution versions of the image.
In a preferred embodiment of the invention, conventional optical flow techniques have been modified by applying the same to both luminance and color information in an extended region of interest (EROI). In this embodiment, the image information in the EROI is normalized before applying optical flow detection processing. According to a preferred embodiment of the invention, motion measurement or estimation is constrained to be carried out within a portion of the image plane which corresponds to the ROI and EROI, as taken together and after repositioning in accordance with block 415. It was noted that the EROI is also used as an area in which key signal modulation (e.g., a "softness" function) may occur. However, it is also contemplated to provide separate extended ROI's, respectively, for key signal modulation and motion measurement/estimation.
On the basis of the motion indication obtained at block 419, the outline obtained in the first image at block 414 is reshaped and relocated to reflect the motion of the object between the first and second images. Then the ROI(s) are repositioned so as to be centered on the repositioned outline (block 418). The image segmentation device now has an indication of where to find the object boundary in the second image, and may now proceed with the processes of blocks 412-415 with respect to the second image, so that the second image can be segmented very accurately without operator input. After segmentation of the second image is complete, steps 419 and 418 may be carried out to prepare for segmentation of a third image. Indeed, the loop of steps 412, 413, 414, 415, 419, 418 may be carried out sequentially and automatically to segment a large number of images which make up a video clip. It should be understood that the outlines used for object tracking in accordance with steps 418 and 419 may differ in some respects from the outlines used to generate key signals for individual images. For example, the latter outlines may reflect operator adjustments that are not applied to the outlines used for tracking.
It is contemplated to apply the above-described process of finding a boundary in a sequence of dynamic images to a recorded video clip or to a live sequence of video images, or to the output of a telecine or other source of video images. It is also contemplated to perform the process in real time, or slower than real time, or faster than real time.
According to a preferred embodiment of the invention, the image segmentation device stores the ROI's, the outlines, unmodified key signals for each image, key modification settings (and also color correction data, if the image segmentation device itself is arranged to perform color correction) for each image in a video clip as the image segmentation process proceeds through the clip, either with or without intervention by the operator. After the processing of the clip is complete, the operator may review the sequences of data that were generated and stored for the clip. The operator can select outlines, ROI's, key modification settings and so forth for adjustment. The adjustments may be applied to an individual image or to a selected number of images subsequent to the image for which the adjustment is made. Depending on what adjustments are made, it may only be necessary to perform a subset of the image segmentation and object tracking processes described above. The data referred to in this paragraph, whether prior to or after adjustment, may be archived to provide a complete record of the image processing applied to the video clip.
Output Signal Options
Reference is now made to Fig. 4, which illustrates other processes carried out by the image segmentation device, primarily in regard to providing output signals based on the final boundary outline(s) produced at block 414. One user-controlled option, represented by block 420 in Fig. 4 and actuated through a slide bar 306 shown in Fig. 19, allows the operator to increase or decrease the size of the outline (Fig. 20). That is, the operator may use the slide bar 306 to increase or decrease the amount of space (indicated as white in Fig. 20) which is inside the outline. The direction in which the key mask is adjusted in response to the operator's input is determined, at each point along the outline, on the basis of the ROI relief map which was referred to above (and was calculated, e.g., as a derivative in the x and y directions of the ROI distance metric). In this way, even the arbitrarily shaped and highly irregular key masks produced by the techniques of the present invention can be appropriately resized without significant distortion. In general, the operator will decide whether to enlarge or reduce the size of the outline, or to leave it unchanged, based on factors such as relative brightness or darkness of the object of interest relative to the background, color contrast between the object and the background, the nature (e.g., definiteness) of the boundary of the object, the type of post processing for which segmentation has been performed, and so forth. At block 421 an "area-fill" operation is performed for the area inside the (optionally re-sized) outline to generate a key signal to select the desired object, and to de-select the balance of the image (background). At this point, as represented by block 422, the operator is provided with a number of options for modifying the resulting key signal.
According to one option, the operator can use the softness slide bar 310 (Fig. 19) to adjust softness at the edge of the key mask, in accordance with conventional practices. As is known to those of ordinary skill in the art, when using the softness adjustment option, the operator varies the width of the transition region between a region of full keying (key = 1) and a non-keyed region (key = 0). That is, the key signal takes on values between zero and one in the transition region and hence exhibits a gradient. In Fig. 5A, a curve 312, having a conventional "S" shape, is indicative of a rather wide transition region, corresponding to a rather large degree of softness. When the softness is reduced, an "S" curve 314 is produced, defining a narrower transition region in which a somewhat steeper slope is present.
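One way to realize the "S"-shaped transition is a smoothstep ramp over a signed distance from the key boundary, as sketched below; the smoothstep polynomial is an assumed stand-in for whatever profile a given implementation uses.

```python
import numpy as np

def soft_key(signed_dist, width):
    """Key value across the transition region: signed_dist is distance
    from the key boundary (positive toward the keyed interior), and
    width is the softness setting; smaller width gives a steeper 'S'."""
    t = np.clip(signed_dist / width + 0.5, 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)  # 0 outside, 1 fully keyed
```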
Fig. 22 is indicative of a key map to which a degree of softness has been applied, and may be compared with the hard key map of Fig. 20. Fig. 23 illustrates how the softened key map of Fig. 22 selects desired portions of the processed image.
A "clean up" option is also provided to the operator, which is accessible via slide bar 318 (Fig. 19).
The clean-up function allows the operator to increase or reduce the degree of softness toward the outside of the transition region without affecting the softness profile toward the inside of the transition region. As illustrated in Fig. 5B, the curve 312 again illustrates a conventional softness profile, whereas, at 320, the softness profile is "hardened" (i.e. given a steeper slope), in response to the operator's invocation of the clean up function, on the side toward an outside (background) region relative to the keyed portion of the image. As will be observed from Fig. 5B, the softness profile remains unchanged on the side of the profile which is towards the inside (keyed) region of the image. Thus, the slope of the gradient of the softness function is increased, by invocation of the cleanup function, toward the side adjacent to the outside region but not toward the side adjacent to the inside region.
Still another option provided to the user is accessible via the slide bar 324 shown in Fig. 19 and may be referred to as an "edge-modulated softness" function. When the operator actuates this function, the image segmentation device causes the degree of softness to be varied along the perimeter of the key map on the basis of edge information which had previously been generated from the image information representing the displayed image. Thus, at points on the perimeter of the key map where a more definite edge was detected, the degree of softness is made lower, while at points on the key map where less edge definiteness was detected, the degree of softness is made greater. The adaptation or modulation of the degree of softness along the key map perimeter may be based on the edge detection data which is produced with respect to luminance or color data, or based on both. When this edge-modulated softness feature is invoked, the degree of softness provided at any particular point on the key map perimeter depends both on the pertinent edge detection data for that point and the setting of the slide bar, with the slide bar controlling the overall extent to which the softness is adjusted based on the edge detection data.
Once the operator has completed all adjustments to the key signal which he or she considers desirable, the resulting key signal is output from the image segmentation device 106 to the color corrector 104 (Fig. 1). The key signal is then used in the color corrector 104 as a window to select a portion of the corresponding image in which a color correction process is carried out. Because the key signal has been accurately matched to the shape of the desired object, by a unique interplay of human and machine intelligence, a highly satisfactory color correction process can be achieved. The key signal produced by the techniques of the present invention is also suitable for use in image compositing operations. One potential use of the key signal would be application to the original image to produce an image of the desired object alone. Conventional spill suppression techniques may then be applied to the isolated image. Such techniques are well known in the industry and serve to eliminate background color contamination from the edges of an isolated object. Both the keyed image and the combined key signal could be provided to a compositing system, either directly or after storage in the image segmentation device 106 or in the image source device 102.
Supplemental Detail Finding
There will now be discussed supplemental detail finding operations carried out on the basis of the supplemental ROI designated at block 411 of Fig. 3. It will be recalled that the supplemental ROI was drawn at portions of the object of interest where the boundary was highly complex, say the fur on an animal, or the outer extent of branches and leaves on a tree. The generation of the key signal for images of this type may proceed by color or luminance keying techniques or by detecting spatial frequency characteristics within the supplemental ROI. According to a preferred color keying technique, the image segmentation device parses the color information for pixels that are "inside" the object (based on the inside region designated with respect to the primary ROI of block 410), and those which are outside. Then, for each pixel in the supplemental ROI, it is determined whether its color is one that is (a) only found among "inside" pixels; (b) only found among "outside" (background) pixels; or (c) found both inside and outside the object. Pixels falling within category (a) are considered to be part of the object; pixels falling within category (b) are excluded from the object; and those in category (c) are assigned to the object or not, depending on a variety of criteria, such as whether they are closest to category (a) or category (b) pixels. Differentiation between inside and outside content in the supplemental ROI may be based on a variable-sized local area adjacent to each pixel instead of the entire supplemental ROI. This local area is evaluated for each pixel or group of pixels and may be defined by a radial distance (circle) or a shape or shapes drawn by the operator. A similar keying process may be carried out based on luminance level distribution.
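A sketch of the three-way color classification follows, assuming colors are coarsely quantized into bins before the inside/outside membership test; the bin size and the neutral 0.5 placeholder for category (c) pixels are illustrative assumptions.

```python
import numpy as np

def classify_supplemental_pixels(roi_colors, inside_colors, outside_colors):
    """Category (a): color seen only inside -> object (key 1).
    Category (b): color seen only outside -> background (key 0).
    Category (c): seen on both sides -> deferred to other criteria
    (0.5 here as a placeholder)."""
    def bin_of(c):
        return tuple(int(x) // 16 for x in c)  # coarse color bins
    inside_bins = {bin_of(c) for c in inside_colors}
    outside_bins = {bin_of(c) for c in outside_colors}
    key = []
    for c in roi_colors:
        b = bin_of(c)
        if b in inside_bins and b not in outside_bins:
            key.append(1.0)
        elif b in outside_bins and b not in inside_bins:
            key.append(0.0)
        else:
            key.append(0.5)
    return np.array(key)
```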
Another supplemental boundary-finding technique that may be applied in conjunction with the supplemental ROI entails using a bandpass filter to detect higher frequencies, where the object has a more complex texture than the background. The detail key resulting from block 423 may be resized at block 424 by logically ANDing it with a resized version of the outline key. In this way the operator can control the extent to which supplemental detail is found over the primary ROI. At block 425 the complex boundary key can be modified by use of some of the functions referred to at block 422, including the "clean-up" function and the conventional softness function.
Block 426 indicates that the detail key may be combined with the outline key. This may be done in a variety of ways, including selecting the maxima of the respective keys, or running each through a respective variable gain stage to obtain a weighted sum of the two keys. It is also contemplated to apply a variable weighting between the outline key and the detail key at particular portions of the image based on characteristics of the image and the ROI. It may be desirable to increase the weighting in favor of the detail key in areas where the confidence of the edge detection is low, indicating a poorly defined edge. It may also be desirable to increase the weighting in favor of the detail key where the ROI is relatively wide, since this too may indicate a poorly defined boundary for the object. The resulting combined key can be used for the same purposes as the outline key, including color correction and image compositing.
The balance of the processes indicated in Fig. 4 is concerned with converting the outline established at block 414 (Fig. 3) to other data forms which are useful in applications other than color correction. At block 427 a known technique is employed to form an approximation of the outline using Bezier curve spline segments. Optionally, as indicated at block 428, the splines created at block 427 may be automatically removed in regions where the edge data exhibited low confidence. In those regions, the operator would manually insert replacement splines to more accurately follow the object boundary. At further optional steps 429 and 430, respectively, temporal smoothing may be applied to the splines, and the operator is permitted to manually adjust the splines. The temporal smoothing step includes techniques such as temporal averaging or derivative rate limiting to ensure that the splines move smoothly while tracking the object of interest in a dynamic image stream. In block 430, the operator can use well known techniques employed in computer animation and compositing such as moving spline control points and manipulating control point "handles" to modify parameters of spline segments.
The resulting spline data, as optionally modified at blocks 428-430, may be employed for applications such as compositing and computer animation.
Block 431 relates to outputting useful geometric data which describe the object of interest on the basis of the outline drawn at block 414. The geometric parameters used at block 419 for motion estimation are of interest if the system is to be used to synchronize external motion control equipment (not shown) with live or recorded images or to provide motion tracking information for creating animated or live action material which is to be matched with preexisting material. The geometric parameter data describing the position and orientation of the boundary outlines can be conveyed by standard interfacing techniques to the external equipment. Also, control points can be specified and tracked within or along the boundary outline for use in compositing applications.
Feature Discrimination Based on Adjacent Image Parameter Data Signatures
In the segmentation method described herein, a detected feature map (edges in a preferred embodiment) in a region of interest is derived from image parameter data (luminance and chrominance in a preferred embodiment), and features that are not in a central portion of the region of interest are suppressed by a bias function. In addition to or instead of using a bias function to suppress features that are unlikely to be of interest, it is contemplated to use a priori knowledge from a prior image or images (or based on operator input) to suppress features in an image in a video clip based on characteristics of adjoining pixels. Such a process may be implemented as follows.
Once the object boundary outline has been located and adjusted in a first image, adjacent image parameter data on the object side of the boundary is computed for each point along the boundary. This computation develops a "signature" of the adjacent image parameter data for the object of interest, and may be based on luminance data (e.g., average luminance over some distance toward the interior of the object), chrominance data, texture/spatial frequency content or some combination of these characteristics. The resulting signature data can be expected, in many cases, to differ from the corresponding characteristics of areas immediately outside of the object. Instead of the simple average luminance signature referred to above, other signatures are possible, including, for example, mean and variance of luminance and chrominance values within a neighborhood of the vector normal to the boundary outline. The contribution of each pixel within the neighborhood could be weighted inversely to the pixel's distance from the outline.
Signatures could also be generated for the pixel data outside of the boundary, or, respectively, both for inside and outside.
It will be recalled that the process for segmenting a video clip, as described above, calls for relocating the boundary outline for the prior image based on a measure of inter-image motion, and the region of interest is then re-centered on the new outline position. Then a detected feature map is computed for the current image in the re-centered region of interest. The signature data described in this section may be employed to discriminate among the detected features, suppressing those which do not have a similar signature.
Since the detected features may be distributed throughout the region of interest and the signature information is only defined at the location of the boundary outline, the signature information and normal vector directions must be diffused throughout the region of interest to permit evaluation of all detected features. This is accomplished by creating a "signature map" which consists of data arrays for the region of interest, one for each signature parameter and one to encode the normal vector direction. Points in these arrays which correspond to the boundary outline location are initialized from the corresponding signature information and vector direction of the pixels at the boundary outline. The signature map is then developed by a morphological dilation operation. Each new pixel resulting from the dilation inherits signature and vector direction information from the neighboring pixel or pixels of the previous iteration of the signature map. If there is only one neighboring pixel, then the same data values are carried over to the new pixel; if there is more than one neighboring pixel, the data of the neighboring pixels may be averaged to generate the data for the new pixel.
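The dilation-based construction of the signature map might proceed as below, one ring of pixels per iteration. Averaging all defined neighbors is a simplification; a real implementation would average the encoded vector directions more carefully, and would run one such pass per signature parameter.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def diffuse_signature(values, defined):
    """Spread signature data outward from the boundary outline.

    values: array seeded with signature data at outline pixels;
    defined: boolean mask of pixels whose signature is already set.
    Each pass defines the next ring of pixels from the average of
    their already-defined neighbors."""
    values, defined = values.astype(float).copy(), defined.copy()
    while not defined.all():
        ring = binary_dilation(defined) & ~defined
        if not ring.any():
            break  # nothing left reachable (e.g., outside the ROI)
        for y, x in zip(*np.nonzero(ring)):
            nb_vals = values[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
            nb_def = defined[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
            values[y, x] = nb_vals[nb_def].mean()
        defined |= ring
    return values
```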
Once the signature map has been constructed, the detected features may be evaluated in terms of signature data and those lacking appropriate signatures may be suppressed. For each point in the ROI (or only those exhibiting at least a certain level of the detected feature (e.g., edge)), a signature calculation is performed, using the vector direction at the corresponding position in the signature map. The resulting signature data is compared to the signature data at the corresponding position in the signature map, and the feature is suppressed in inverse proportion to the degree of matching between the calculated signature data for the point and the corresponding signature map data. This would tend to suppress features found outside of the object. As a possible additional feature, the signature may also be calculated in the direction opposite to the normal (i.e. toward the "outside"), and the feature suppressed in proportion to the degree of matching with the signature map data. This would have the effect of suppressing features which have the same signature on both sides, which are likely to be inside the object of interest.
The signature map may be based on a temporal average of prior images instead of just one prior image. The operator may also be permitted to selectively enable or disable feature discrimination based on signatures for objects which have changing interior parameters such as varying illumination levels or changing colors. For such objects it may be preferable to discriminate features based only on the background signature.
Other Embodiments
Referring to Fig. 11, it will be noted that, at the right side of the screen display, a rather large set of control options 332 is provided, permitting the operator to select among many screen displays of either intermediate data generated by the image segmentation device or final outputs such as the key signal which results from the image segmentation process. However, in a preferred commercial embodiment of the invention, it is contemplated to reduce the complexity of the user interface and to improve operability by limiting the screen display selection options to "Outline Key" (corresponding to Fig. 22), "Keyed Image" (Fig. 23), and "Outlined Image" (Fig. 19).
In addition to the novel, operator-guided automatic image segmentation techniques disclosed herein, it is also contemplated to incorporate in the image segmentation device 106 conventional features which allow the operator to perform segmentation by drawing and animating simple geometric shapes. Segmentation by these known techniques may be adequate when the object of interest itself has a rather simple shape. The shapes that may be selected by the operator may include ellipses and quadrilaterals and may be subjected to known geometric transform control functions such as size, position, rotation, aspect and trapezoid.
The resulting key signals could also be subjected to the types of key modification processes referred to above.
The animation of the selected (and possibly transformed) shapes can be carried out with conventional key frame and path techniques, which need not be described further. It is also contemplated to extend these conventional practices by attaching one or more simple geometric shapes to the outline generated for tracking purposes according to the inventive procedure described above. This permits a geometric shape to be located relative to a tracked object boundary or portion thereof.
The invention has been described primarily in the context of a color correction system of the type used for film-to-tape or tape-to-tape post-production work. Specifically the invention has been described as a peripheral device to be connected to a color corrector to provide key signals to the color corrector. Nevertheless, many other applications of the teachings of the present invention are contemplated. The software processes described herein could be advantageously applied to and embodied in many other kinds of image manipulation equipment including video special effects devices and devices used for colorizing black-and-white motion pictures. It is also contemplated to include software embodying the present invention in commercially distributed software packages used for desktop publishing or for manipulating clip art images. The image segmentation capabilities of the invention are further applicable to image compositing operations and manipulation of still images generally (both color and black and white), including pre-press image processing. Another potential application of the present image segmentation techniques is in computer-aided-design and computer-aided-manufacturing software packages. The invention may also be used to perform image segmentation as an input to 3-D simulation processes. Software which embodies the present invention may be stored in various types of digital memories such as RAM, hard disks, CD-ROM's and DVD's.
Previous discussion of Fig. 1 indicated that the key signals produced by the process of Figs. 4 and 5 are outputted from image segmentation device 106 to color corrector 104. However, a number of variations and alternatives are also contemplated. For example, the functions of the image segmentation and color correction blocks 106, 104 may be integrated in a single device. Moreover, the key signals, other data produced from the processes described herein, and/or processed images may be stored in the mass storage 116 (Fig. 2) of the image segmentation device 106 and/or in a storage device which serves as the image source 102 (Fig. 1). Further, if the color corrector does not have a capability for receiving an external key input, an external keyer may be connected after the color corrector to receive key signals from the image segmentation device, combining uncorrected image data from the image source and corrected image data from the color corrector on the basis of the key signal. In addition, the image segmentation device 106 may itself be arranged to perform keying operations on the images from the image source 102 and image signals from the color corrector, based on key signals which the segmentation device itself generates. Also, as has been stated above, the segmentation device 106 may itself perform both the keying and color correction operations.
Although particular illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, the present invention is not limited to these particular embodiments. Various changes and modifications may be made thereto by those skilled in the art without departing from the spirit or scope of the invention, which is defined by the appended claims.

Claims

What is claimed is:
1. A method of dynamically segmenting an image plane on the basis of features of a dynamic sequence of images displayed in the image plane, the method comprising the steps of: displaying on a display device a first image of the sequence of images; designating a region of interest in the image plane; applying an image segmentation algorithm to said first image to generate a first outline in said region of interest, the image segmentation algorithm being constrained to operate only within said region of interest; applying an algorithm to provide an indication of motion, between said first image and a second image of said sequence of images, of an object corresponding to said first outline; repositioning said region of interest on the basis of the indicated motion of said object; and applying said image segmentation algorithm to said second image to generate a second outline in said repositioned region of interest.
2. A method according to claim 1, wherein said indication of motion is a motion estimate projected from images generated prior in time to said second image.
3. A method according to claim 1, wherein said indication of motion is based on a measurement of motion between said first and second images.
4. A method according to claim 3, wherein said measurement of motion is based on a measurement of optical flow.
5. A method according to claim 1, wherein said image segmentation algorithm includes an edge detection algorithm.
6. A method according to claim 5, wherein said edge detection algorithm uses a snake driven by a gradient vector flow (GVF) field.
7. A method according to claim 6, wherein said GVF field is generated by normalizing and then diffusing edge information generated by a Sobel edge detector.
8. A method according to claim 1, further comprising the steps of: prior to the first applying step recited in claim 1, designating a first region bordered by said region of interest as an inside region and designating a second region bordered by said region of interest as an outside region; for each pixel in said outline generated in said first applying step recited in claim 1, calculating an ROI width metric as the sum of (a) a distance from the respective pixel to a nearest pixel in the inside region and (b) a distance from the respective pixel to a nearest pixel in the outside region; and prior to said repositioning step, recentering said region of interest relative to said outline by using width metrics calculated for the pixels of the outline.
9. A method of segmenting an image plane on the basis of features of an image displayed in the image plane, the method comprising the steps of: displaying the image on a display device; using a drawing device to superimpose a free-hand drawing figure on the image displayed on the display device, said free-hand drawing figure defining a band-shaped region of interest in the image plane formed as the locus of a circle moved in an arbitrary manner; applying an image analysis algorithm to the displayed image, said image analysis algorithm being constrained to operate only within said region of interest defined by said free-hand drawing figure, said image analysis algorithm operating without reference to any portion of said image outside of said region of interest; and segmenting the image plane on the basis of a result provided by application of said image analysis algorithm.
10. A method according to claim 9, wherein said image analysis algorithm includes an edge detection algorithm which produces edge information and further comprising the step of de-emphasizing components of the edge information that are not located at a central portion of said region of interest.
11. A method according to claim 9, wherein said image analysis algorithm includes: applying edge detection processing to luminance information in said region of interest to produce luma edge information; applying edge detection processing to color information in said region of interest to produce color edge information; and combining the luma edge information and the color edge information to produce combined edge information.
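One plausible reading of claim 11 in code: Sobel edges on the luminance plane, per-component Sobel edges on the color planes combined by maximum response, then a weighted blend. The blend weights and the max-combination are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def combined_edges(luma, chroma, w_luma=0.6, w_color=0.4):
    """Sketch of claim 11: separate edge detection on luminance and color
    information, then combination into a single edge map. chroma is an
    (H, W, C) array of color components."""
    def sobel_mag(plane):
        gx = ndimage.sobel(plane.astype(float), axis=1)
        gy = ndimage.sobel(plane.astype(float), axis=0)
        return np.hypot(gx, gy)

    luma_edges = sobel_mag(luma)
    # Combine per-component color edges by taking the strongest response.
    color_edges = np.max([sobel_mag(chroma[..., i])
                          for i in range(chroma.shape[-1])], axis=0)
    return w_luma * luma_edges + w_color * color_edges
```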
12. A method according to claim 9, wherein said segmenting step comprises: computing an external force field within said region of interest; initializing a snake within said region of interest; iteratively repositioning the snake within said region of interest on the basis of the external force field until a convergence test is satisfied; and segmenting the image plane on the basis of a repositioned snake which satisfies the convergence test.
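A minimal sketch of the segmentation loop of claim 12: the control points of an initialized snake are moved along a sampled external force field until an average-displacement test is met. Internal smoothing forces are omitted for brevity, and a simple displacement-norm test stands in for the convergence test (claim 26 recites a normal-projected variant).

```python
import numpy as np

def relax_snake(snake, force_u, force_v, step=1.0, tol=0.1, max_iters=200):
    """Sketch of claim 12: iteratively reposition an (N, 2) array of (x, y)
    control points along an external force field until convergence."""
    for _ in range(max_iters):
        x = snake[:, 0].astype(int).clip(0, force_u.shape[1] - 1)
        y = snake[:, 1].astype(int).clip(0, force_u.shape[0] - 1)
        # Sample the external force at each control point and step along it.
        move = step * np.stack([force_u[y, x], force_v[y, x]], axis=1)
        snake = snake + move
        # Simple convergence test: mean displacement below tolerance.
        if np.mean(np.linalg.norm(move, axis=1)) < tol:
            break
    return snake
```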
13. A method according to claim 9, further comprising the step of designating a region bordered by said drawing figure as an inside region.
14. A method according to claim 13, further comprising the step of using the drawing device to erase a portion of said drawing figure, an area corresponding to said erased portion being added to the inside region.
15. A method according to claim 9, further comprising the step, performed after said drawing figure has been superimposed on said displayed image, of using the drawing device to increase the width of said region of interest at a selected portion of said region of interest.
16. A method according to claim 9, wherein said drawing figure is indicated on the display device by altering a luminance level in the locus of the drawing figure.
17. A method according to claim 9, wherein said drawing figure is indicated on the display device by altering a magnitude of at least one color component in the locus of the drawing figure.
18. A method according to claim 9, further comprising the steps of: designating a first region bordered by said drawing figure as an inside region and designating a second region bordered by said drawing figure as an outside region; and calculating a distance metric for each pixel in the region of interest, the distance metric for the respective pixel corresponding to the ratio of (a) a distance between the respective pixel and a nearest pixel in the inside region, relative to (b) a distance between the respective pixel and a nearest pixel in the outside region.
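Claim 18's distance metric again reduces to two distance transforms. A minimal sketch:

```python
import numpy as np
from scipy import ndimage

def distance_ratio_metric(inside_mask, outside_mask):
    """Sketch of claim 18: per-pixel ratio of distance to the nearest
    inside pixel over distance to the nearest outside pixel. Values below
    1 lie nearer the inside border of the band, above 1 nearer the
    outside border."""
    d_in = ndimage.distance_transform_edt(~inside_mask)
    d_out = ndimage.distance_transform_edt(~outside_mask)
    return d_in / (d_out + 1e-12)
```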
19. A method according to claim 18, further comprising the steps of: generating a key map on the basis of a result provided by application of said image analysis algorithm; generating a vector field in said region of interest on the basis of a slope function derived from the distance metrics for the pixels of said region of interest; receiving a command to resize said key map; and resizing said key map on the basis of said vector field.
20. A method of applying an edge detection algorithm to color information corresponding to pixels arrayed in an image plane, the method comprising the step of: calculating, for each of said pixels, a distance value in color space between a pair of neighboring pixels, said color space being defined by plural color axes.
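A minimal sketch of claim 20, assuming RGB as the plural-axis color space and taking, for each pixel, Euclidean color-space distances to its right and lower neighbors; the accumulation into a single edge map is an illustrative choice.

```python
import numpy as np

def color_distance_edges(image):
    """Sketch of claim 20: per-pixel Euclidean distance in color space
    between neighboring pixels. image is an (H, W, C) array."""
    img = image.astype(float)
    dx = np.linalg.norm(img[:, 1:] - img[:, :-1], axis=-1)  # horizontal
    dy = np.linalg.norm(img[1:, :] - img[:-1, :], axis=-1)  # vertical
    edges = np.zeros(img.shape[:2])
    edges[:, :-1] += dx
    edges[:-1, :] += dy
    return edges
```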
21. A method of providing a softness function with respect to a key signal, the method comprising the steps of: generating a key boundary by means of an edge detection algorithm, said algorithm generating, for each pixel on said key boundary, edge-degree data indicative of a degree of definiteness of an edge at the respective pixel; and adjusting a softness function along said key boundary in dependence on said edge-degree data.
22. A method according to claim 21, wherein said softness function is adjusted such that the softness function varies inversely with said edge-degree data.
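Claims 21-22 can be read as scaling a softness width inversely with edge definiteness, so that crisp edges receive a hard key transition and indistinct edges a soft one. A minimal sketch; `k` and `min_width` are illustrative tuning assumptions:

```python
import numpy as np

def softness_width(edge_degree, k=4.0, min_width=0.5):
    """Sketch of claims 21-22: per-pixel softness width along the key
    boundary, varying inversely with the edge-degree data."""
    return np.maximum(min_width, k / (1.0 + edge_degree))
```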
23. A method of adjusting a softness function with respect to a key signal, the method comprising the steps of: generating a key boundary; designating a first region bordered by the key boundary as an inside region and designating a second region bordered by the key boundary as an outside region; applying a softness function to the key boundary to generate a gradient in a key signal between said inside region and said outside region; and adjusting said softness function in response to a control signal input by a human operator so that a slope of said gradient is increased on a side adjacent said outside region without changing a slope of said gradient on a side adjacent said inside region.
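One way to realize claim 23 is to build the key gradient from a signed distance to the key boundary (negative inside, positive outside) with independent inner and outer ramp widths; narrowing only the outer width, e.g. under operator control, steepens the outside slope while the inside half is untouched. A sketch under those assumptions:

```python
import numpy as np

def asymmetric_key_ramp(signed_dist, inside_width=3.0, outside_width=3.0):
    """Sketch of claim 23: key gradient across a boundary from a signed
    distance, pinned to 0.5 at the boundary. Shrinking outside_width
    steepens only the outside half of the gradient."""
    d = np.asarray(signed_dist, dtype=float)
    key = np.where(d < 0,
                   0.5 * (1.0 - d / inside_width),   # inside half
                   0.5 * (1.0 - d / outside_width))  # outside half
    return np.clip(key, 0.0, 1.0)
```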
24. A method of generating an edge information field, the method comprising the steps of: applying an edge detector algorithm to pixel information arrayed in a region of interest in an image plane, said edge detector algorithm generating edge information from said pixel information; and applying a bias function to said edge information to emphasize components of said edge information at a central portion of said region of interest, thereby producing biased edge information.
25. A method according to claim 24, further comprising the step of computing an edge gradient vector field from the biased edge information.
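A joint sketch of claims 24-25: edge information is weighted toward the center of the band-shaped region of interest, where the distances to the inside and outside regions balance, and a gradient field is then computed from the biased edges. The particular bias function and its `sharpness` parameter are assumptions.

```python
import numpy as np
from scipy import ndimage

def bias_edges_to_band_center(edge_map, inside_mask, outside_mask,
                              sharpness=2.0):
    """Sketch of claims 24-25: emphasize edge information at the central
    portion of the ROI, then derive an edge gradient vector field."""
    d_in = ndimage.distance_transform_edt(~inside_mask)
    d_out = ndimage.distance_transform_edt(~outside_mask)
    # 1.0 at the band center (d_in == d_out), falling toward either border.
    balance = np.abs(d_in - d_out) / (d_in + d_out + 1e-12)
    bias = (1.0 - balance) ** sharpness
    biased = edge_map * bias
    gy, gx = np.gradient(biased)  # claim 25: gradient field from biased edges
    return biased, gx, gy
```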
26. A method of performing an edge detection algorithm which employs a snake driven by force field data, the method comprising the steps of: iteratively repositioning the snake in response to the force field data; and after each repositioning of the snake, applying a convergence test which includes: determining for each control point in the snake an amount by which said control point was moved in the respective repositioning in a direction normal to the length of the snake at said control point; averaging the determined amounts; and comparing the averaged amounts to a threshold.
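A direct transcription of claim 26's convergence test: each control point's displacement is projected onto the local snake normal, the magnitudes are averaged, and the mean is compared to a threshold. The snake is assumed closed.

```python
import numpy as np

def snake_converged(old_snake, new_snake, threshold=0.1):
    """Sketch of claim 26: average normal-projected control-point movement
    compared against a threshold. Snakes are (N, 2) arrays of (x, y)."""
    # Local tangent via central differences along the closed snake.
    tangent = np.roll(old_snake, -1, axis=0) - np.roll(old_snake, 1, axis=0)
    tangent /= np.linalg.norm(tangent, axis=1, keepdims=True) + 1e-12
    normal = np.stack([-tangent[:, 1], tangent[:, 0]], axis=1)
    # Component of each point's movement normal to the snake, averaged.
    normal_move = np.abs(np.sum((new_snake - old_snake) * normal, axis=1))
    return float(np.mean(normal_move)) < threshold
```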
27. An image signal processing apparatus, comprising: a memory for storing image information which represents a dynamic sequence of images; a display device for displaying in an image plane at least a first image of the dynamic sequence of images; a processor connected to the memory; and a drawing device connected to the processor; wherein the processor is programmed to: cause the display device to display said first image of the dynamic sequence of images; receive signals generated by the drawing device; on the basis of the received signals, superimpose a drawing figure on the first image displayed by the display device, said drawing figure defining a region of interest in the image plane; apply an image segmentation algorithm to said first image to generate a first outline in said region of interest, the image segmentation algorithm being constrained to operate only within said region of interest; apply an algorithm to provide an indication of motion, between said first image and a second image of said sequence of images, of an object corresponding to said first outline; reposition the region of interest on the basis of the indicated motion of the object; and apply the image segmentation algorithm to the second image to generate a second outline in the repositioned region of interest.
28. An apparatus according to claim 27, wherein said drawing figure defining said region of interest is a free-hand drawing figure which defines a band-shaped region formed as the locus of a circle moved in an arbitrary manner.
29. An apparatus according to claim 27, wherein said image segmentation algorithm includes an edge detection algorithm.
30. An apparatus according to claim 27, wherein said drawing device includes at least one of a mouse and a tablet/stylus arrangement.
31. An image signal processing apparatus, comprising: a memory for storing image information which represents an image; a display device for displaying in an image plane the image represented by the image information; a processor connected to the memory; and a drawing device connected to the processor; wherein the processor is programmed to: cause the display device to display the image represented by the image information; receive signals generated by the drawing device; on the basis of the received signals, render a free-hand drawing figure on the display device, said drawing figure defining a band-shaped region of interest formed as the locus of a circle moved in an arbitrary manner; apply an image analysis algorithm to the image information, said image analysis algorithm being constrained to operate only within the region of interest defined by the free-hand drawing figure, said image analysis algorithm operating without reference to any portion of said image outside said region of interest; and segment the image plane on the basis of a result of said application of said image analysis algorithm.
32. An apparatus according to claim 31, wherein said processor segments the image plane by: computing a gradient vector flow field from edge information within said region of interest; initializing a snake within said region of interest; iteratively repositioning the snake within said region of interest until a convergence test is satisfied; and segmenting the image plane on the basis of a repositioned snake which satisfies the convergence test.
33. An apparatus according to claim 31, further comprising: a storage device connected to the memory for providing image information to be stored in the memory; and a color correction device connected to the processor; wherein the processor segments the image plane to generate a key signal and transmits the key signal to the color correction device.
34. An apparatus according to claim 31, wherein said processor is programmed to permit a user of the apparatus to designate, by means of said drawing device, a first region bordered by said region of interest as an inside region and a second region bordered by said region of interest as an outside region.
35. An apparatus according to claim 31, wherein the image analysis algorithm includes an edge detection function.
36. A digital memory which stores a program for instructing a processor to segment an image plane on the basis of features of a dynamic sequence of images displayed in the image plane, the program including instructions for: causing a first image of the sequence of images to be displayed on a display device; designating a region of interest in the image plane; applying an image segmentation algorithm to the displayed image, to generate a first outline in the region of interest, the image segmentation algorithm being constrained to operate only within said region of interest; applying an algorithm to provide an indication of motion, between said first image and a second image of said sequence of images, of an object corresponding to said first outline; repositioning said region of interest on the basis of the indicated motion of said object; and applying said image segmentation algorithm to said second image to generate a second outline in said repositioned region of interest.
37. A digital memory which stores a program for instructing a processor to segment an image plane on the basis of features of an image displayed in the image plane, the program including instructions for: causing the image to be displayed on a display device; receiving signals from a drawing device and, on the basis of the received signals, superimposing a free-hand drawing figure on the image displayed on the display device, said free-hand drawing figure defining a band-shaped region of interest formed as the locus of a circle moved in an arbitrary manner; applying an image analysis algorithm to the displayed image, said image analysis algorithm being constrained to operate only within said region of interest defined by said free-hand drawing figure, said image analysis algorithm operating without reference to any portion of said image outside of said region of interest; and segmenting the image plane on the basis of a result provided by application of said image analysis algorithm.
38. A method of dynamically segmenting an image plane on the basis of features of a dynamic sequence of images displayed in the image plane, the method comprising the steps of: applying an image segmentation algorithm to a first image of the sequence of images to generate an outline in said image plane; detecting at least one characteristic of said first image in at least one area adjacent to said outline to generate signature map data; applying an algorithm to provide an indication of motion, between said first image and a second image of said sequence of images, of an object corresponding to said outline; repositioning said outline in said image plane on the basis of the indicated motion of said object; detecting features of said second image in a region of interest defined around said repositioned outline; detecting said at least one characteristic of said second image, in at least one area adjacent to said detected features, to generate signature data for said second image; comparing said signature data for said second image with said signature map data; and selectively suppressing components of said detected features on the basis of a result of said comparing step.
39. A method according to claim 38, wherein said step of detecting features of said second image includes detecting edges in said second image.
40. A method according to claim 38, wherein said signature map data includes direction data indicative of a direction normal to said outline generated by said image segmentation algorithm; and said direction data is used in said step of detecting said at least one characteristic of said second image.
PCT/US2000/027347 1999-10-04 2000-10-04 Improved image segmentation processing by user-guided image processing techniques WO2001026050A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU78539/00A AU7853900A (en) 1999-10-04 2000-10-04 Improved image segmentation processing by user-guided image processing techniques

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US41129699A 1999-10-04 1999-10-04
US09/411,296 1999-10-04

Publications (2)

Publication Number Publication Date
WO2001026050A2 true WO2001026050A2 (en) 2001-04-12
WO2001026050A3 WO2001026050A3 (en) 2001-10-25

Family

ID=23628369

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/027347 WO2001026050A2 (en) 1999-10-04 2000-10-04 Improved image segmentation processing by user-guided image processing techniques

Country Status (2)

Country Link
AU (1) AU7853900A (en)
WO (1) WO2001026050A2 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5940538A (en) * 1995-08-04 1999-08-17 Spiegel; Ehud Apparatus and methods for object border tracking
US6097853A (en) * 1996-09-11 2000-08-01 Da Vinci Systems, Inc. User definable windows for selecting image processing regions

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1231565A1 (en) * 2001-02-09 2002-08-14 GRETAG IMAGING Trading AG Image colour correction based on image pattern recognition, the image pattern including a reference colour
US10512522B2 (en) 2002-08-19 2019-12-24 Medtronic Navigation, Inc. Method and apparatus for virtual endoscopy
EP1913893A3 (en) * 2002-08-19 2008-07-23 Surgical Navigation Technologies, Inc. Apparatus for virtual endoscopy
US11426254B2 (en) 2002-08-19 2022-08-30 Medtronic Navigation, Inc. Method and apparatus for virtual endoscopy
EP1445731A3 (en) * 2003-02-05 2006-08-23 Konica Minolta Holdings, Inc. Image processing method, image processing apparatus and image processing program
FR2851677A1 (en) * 2003-02-20 2004-08-27 France Telecom Image sources object selecting method for implementing interactive service, involves constructing projection cards and arborescence of transformation wavelet coefficients of each detailed image, and selecting point of interest
EP1527729A1 (en) 2003-10-30 2005-05-04 Olympus Corporation Image processing apparatus
EP1527729B1 (en) * 2003-10-30 2009-03-04 Olympus Corporation Image processing apparatus
CN1697489B (en) * 2004-05-15 2010-07-28 三星数码影像株式会社 Method of controlling digital image processing apparatus and digital image processing apparatus using the method
EP1626371A1 (en) * 2004-08-09 2006-02-15 Microsoft Corporation Border matting by dynamic programming
US7430339B2 (en) 2004-08-09 2008-09-30 Microsoft Corporation Border matting by dynamic programming
EP1871094A4 (en) * 2005-04-12 2010-01-06 Olympus Corp Image processor, imaging apparatus, and image processing program
EP1871094A1 (en) * 2005-04-12 2007-12-26 Olympus Corporation Image processor, imaging apparatus, and image processing program
US8280171B2 (en) 2008-05-28 2012-10-02 Apple Inc. Tools for selecting a section of interest within an image
US8548251B2 (en) 2008-05-28 2013-10-01 Apple Inc. Defining a border for an image
US8571326B2 (en) 2008-05-28 2013-10-29 Apple Inc. Defining a border for an image
US8331685B2 (en) 2008-05-28 2012-12-11 Apple Inc. Defining a border for an image
US8452105B2 (en) 2008-05-28 2013-05-28 Apple Inc. Selecting a section of interest within an image
US8885977B2 (en) 2009-04-30 2014-11-11 Apple Inc. Automatically extending a boundary for an image to fully divide the image
US8619093B2 (en) 2010-07-20 2013-12-31 Apple Inc. Keying an image
US8675009B2 (en) 2010-07-20 2014-03-18 Apple Inc. Keying an image in three dimensions
US8743139B2 (en) 2010-07-20 2014-06-03 Apple Inc. Automatically keying an image
US8582834B2 (en) 2010-08-30 2013-11-12 Apple Inc. Multi-image face-based image processing
CN102567952A (en) * 2010-12-16 2012-07-11 阿里巴巴集团控股有限公司 Image segmentation method and system
US8594426B2 (en) 2011-02-04 2013-11-26 Apple Inc. Color matching using color segmentation
US8842911B2 (en) 2011-02-04 2014-09-23 Apple Inc. Luma-based color matching
US8611655B2 (en) 2011-02-04 2013-12-17 Apple Inc. Hue-based color matching
US9374504B2 (en) 2011-02-04 2016-06-21 Apple Inc. Luma-based color matching
US8760464B2 (en) 2011-02-16 2014-06-24 Apple Inc. Shape masks
US8823726B2 (en) 2011-02-16 2014-09-02 Apple Inc. Color balance
US8854370B2 (en) 2011-02-16 2014-10-07 Apple Inc. Color waveform
US8891864B2 (en) 2011-02-16 2014-11-18 Apple Inc. User-aided image segmentation
US20140078152A1 (en) * 2012-09-14 2014-03-20 Cyberlink Corp. System and Method for Selecting an Object Boundary in an Image
US8761519B2 (en) * 2012-09-14 2014-06-24 Cyberlink Corp. System and method for selecting an object boundary in an image
FR3034225A1 (en) * 2015-03-27 2016-09-30 Invidam INTERACTIVE EXTRACTION PROCESS FOR PROCESSING VIDEOS ON PORTABLE ELECTRONIC APPARATUS
US9928592B2 (en) 2016-03-14 2018-03-27 Sensors Unlimited, Inc. Image-based signal detection for object metrology
US10007971B2 (en) 2016-03-14 2018-06-26 Sensors Unlimited, Inc. Systems and methods for user machine interaction for image-based metrology
CN106603838A (en) * 2016-12-06 2017-04-26 深圳市金立通信设备有限公司 Image processing method and terminal
CN111881905A (en) * 2019-05-02 2020-11-03 纬创资通股份有限公司 Method for adjusting region of interest and computing device thereof
CN111881905B (en) * 2019-05-02 2024-02-09 纬创资通股份有限公司 Method for adjusting region of interest and computing device thereof
EP3832596A1 (en) * 2019-12-06 2021-06-09 Microsoft Technology Licensing, LLC 3d image segmentation
WO2021113150A1 (en) * 2019-12-06 2021-06-10 Microsoft Technology Licensing, Llc Refinement of image segmentation
WO2021113149A1 (en) * 2019-12-06 2021-06-10 Microsoft Technology Licensing, Llc 3d image segmentation
EP3832597A1 (en) * 2019-12-06 2021-06-09 Microsoft Technology Licensing, LLC Refinement of image segmentation
CN112542230A (en) * 2020-12-23 2021-03-23 南开大学 Multi-mode interactive segmentation method and system for medical image
CN114693707A (en) * 2020-12-31 2022-07-01 北京小米移动软件有限公司 Object contour template obtaining method, device, equipment and storage medium
CN114693707B (en) * 2020-12-31 2023-09-26 北京小米移动软件有限公司 Object contour template acquisition method, device, equipment and storage medium
US20230071291A1 (en) * 2021-09-09 2023-03-09 Robert Bosch Gmbh System and method for a precise semantic segmentation

Also Published As

Publication number Publication date
WO2001026050A3 (en) 2001-10-25
AU7853900A (en) 2001-05-10

Similar Documents

Publication Publication Date Title
WO2001026050A2 (en) Improved image segmentation processing by user-guided image processing techniques
US7082211B2 (en) Method and system for enhancing portrait images
US6278460B1 (en) Creating a three-dimensional model from two-dimensional images
US7602949B2 (en) Method and system for enhancing portrait images that are processed in a batch mode
US7889913B2 (en) Automatic compositing of 3D objects in a still frame or series of frames
US5742294A (en) Method and apparatus for synthesizing images
US5247583A (en) Image segmentation method and apparatus therefor
US20060244757A1 (en) Methods and systems for image modification
EP0990223B1 (en) Method and apparatus for changing a color of an image
CN103443826B (en) mesh animation
MXPA03010039A (en) Image sequence enhancement system and method.
WO2007074844A1 (en) Detecting method and detecting system for positions of face parts
US11676252B2 (en) Image processing for reducing artifacts caused by removal of scene elements from images
CA2285227A1 (en) Computer system process and user interface for providing intelligent scissors for image composition
WO2007145654A1 (en) Automatic compositing of 3d objects in a still frame or series of frames and detection and manipulation of shadows in an image or series of images
KR101028699B1 (en) Apparatus and method for painterly rendering
Wong Artistic rendering of portrait photographs
AU6530396A (en) Apparatus and method for object tracking
JPH11175765A (en) Method and device for generating three-dimensional model and storage medium
KR102606373B1 (en) Method and apparatus for adjusting facial landmarks detected in images
JP2003178311A (en) Real time facial expression tracking device
Obaid Moment Based Painterly Rendering Using Connected Color Components
Bailey Techniques for Non-photorealistic Shading Using Real Paint
Bailey A Thesis on Techniques for Non-photorealistic Shading Using Real Paint

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP