US20020181786A1 - Intelligent systems and methods for processing image data based upon anticipated regions of visual interest - Google Patents

Intelligent systems and methods for processing image data based upon anticipated regions of visual interest Download PDF

Info

Publication number
US20020181786A1
US20020181786A1
Authority
US
United States
Prior art keywords
image
interest
images
regions
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/145,611
Inventor
Lawrence Stark
Claudio Privitera
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to US10/145,611 priority Critical patent/US20020181786A1/en
Publication of US20020181786A1 publication Critical patent/US20020181786A1/en
Abandoned legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • H04N19/14Coding unit complexity, e.g. amount of activity or edge presence estimation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/162User input
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20048Transform domain processing
    • G06T2207/20052Discrete cosine transform [DCT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20048Transform domain processing
    • G06T2207/20064Wavelet transform [DWT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20164Salient point detection; Corner detection

Definitions

  • the present invention is directed to systems and methods for image processing that utilize a cognitive model stored in memory to identify regions within an image that correlate with previously determined regions of visual interest for a given type of image or type of image data being processed.
  • systems and methods in accordance with the present invention may select algorithms for processing collections of images by comparing algorithmic region of interest (aROI) data to stored human visual region of interest (hROI) data to select an optimal algorithm or group of algorithms to be used in transforming data comprising the collection or collections of images.
  • the selected algorithms may then be used, for example, in data compression, image enhancement or database query functions.
  • the present invention is directed to systems and methods that utilize conventional image processing algorithms in combination with innovative clustering, sequencing, comparing and parsing techniques to predict loci of human fixations within an image or within collections of images for the purposes of, for example, data compression, image enhancement and image database query functions.
  • empirical analysis reveals that systems and methods in accordance with the present invention enable a prediction of human fixation loci that is comparable in measure to the ability of one human to predict the loci of eye movements of other persons viewing an image.
  • systems and methods in accordance with the present invention may detect regions of visual interest (ROIs) within an image based upon stored characteristic data representative of human visual perception.
  • algorithmic regions of interest (aROIs) having a high, or relatively high, correlation with human visual regions of interest (hROIs) may be developed for an image or collection of images, and thereafter an image or collection of images may be saved within a system using selected portions of the original picture (i.e., aROIs) as identification data. Then, the selected portions of the picture (i.e., saved aROI data) may be used in performing a query search.
  • the query search may proceed, for example, by comparing saved aROIs in a database with ROIs specified by the system operator. Processing image data in this fashion should provide for substantial reductions in image processing time. Further, it will be appreciated that, through the use of processing algorithms and methodologies in accordance with the present invention, it is possible to take into consideration more complex features of an image, not just indications of color, shape and the like.
  • a system for compressing and processing collections of images in accordance with one form of the present invention may comprise, for example, means for transforming image data representative of a particular image, collection of images or type of image into a domain of “visual relevance”, for example, using a database of image processing transformation functions; means for obtaining a set of algorithmic regions of interest (aROIs) from a transformed image, for example, by thresholding; means for clustering local maxima from the transformed image into a second set of only a few, very relevant algorithmic regions of interest (aROIs), such that the most relevant algorithmic regions of interest (aROIs) are properly distributed over the image; means for comparing the identified algorithmic regions of interest (aROIs) with predetermined human visual regions of interest (hROIs) to select an optimal image processing transformation function; and means for using the selected optimal image processing transformation function to compress the remainder of images within a collection or collections of images.
  • a system in accordance with the present invention may comprise means for using the algorithmic regions of interest (aROI) from
  • systems and methods in accordance with the present invention can be utilized to process very large collections of data including, for example, large collections of pictures, scenes and works of art. It also will be appreciated that systems and methods in accordance with the present invention may be utilized, for example, to compress, search and/or enhance images ranging from natural and constructed landscapes and “cityscapes”, to groups of persons and animals and objects, and to single portraits and still lifes.
  • FIG. 1 is an illustration of an image of the Mona Lisa and a transformation of that image.
  • FIG. 2 shows how a thresholding algorithm may be applied to the transformed image of FIG. 1 to produce image local maxima data.
  • FIG. 3 shows how an original image may be transformed, how local maxima may be identified within the transformed image, how the local maxima may be clustered and, finally, how any resulting clusters may be quantified based upon local maxima data.
  • FIG. 4 shows how a transformed image may be processed to obtain local maxima, how the local maxima may be iteratively clustered, and how regions of interest may be identified based upon quantified cluster data.
  • FIG. 5 shows how an image may be transformed and represented by a 3-dimensional pixel intensity diagram, and how quantified cluster data may be represented within the 3-dimensional pixel intensity diagram.
  • FIG. 6 shows how an image may be transformed and local maxima obtained without the utilization of a clustering algorithm.
  • FIG. 7 comprises a representation of mathematical image processing transformation functions, grouped by category, that may be used in image processing systems.
  • FIGS. 8a and 8b illustrate how human eye movement may be utilized to identify human visual regions of interest (hROI) within an image.
  • FIG. 9 provides a second illustration of how human eye movement may be utilized to identify human visual regions of interest (hROI) within an image.
  • FIG. 10 shows how algorithmic regions of interest (aROIs) may be compared to human visual regions of interest (hROIs) to obtain a quantitative measurement between the various regions of interest.
  • FIG. 11 comprises a table showing a correlation between regions of interest identified by various transformation algorithms and regions of interest obtained through monitoring the eye movement of various human subjects.
  • FIGS. 12a-12d show how anticipated human visual regions of interest (hROIs) may be used within image compression techniques.
  • Referring to FIG. 1, those skilled in the art will appreciate that, using conventional transformation functions, it is possible to convert an original image 10 into a transformed image 12. Similarly, it is possible to apply thresholding criteria to the transformed image 12 to obtain a mapping 14 comprising a plurality of maxima loci within the transformed image. This is shown, for example, in FIG. 2.
  • the information content of a generic picture can be identified by different image parameters, which in turn can be extracted by relevant image processing algorithms.
  • applying algorithms to a picture means to map that image into different domains, where for each domain a specific set of parameters is extracted. After the image has been processed, only the loci of the local maxima from each domain are retained; these maxima are then clustered in order to yield a limited number of ROIs.
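The transform-then-cluster pipeline just described can be sketched as follows. This is a minimal illustration rather than the patent's prescribed implementation: a local-entropy transform (one of the exemplary algorithm families discussed in this document) stands in for the domain mapping, and a top-n selection stands in for the thresholding step.

```python
import numpy as np

def local_entropy_transform(img, win=5):
    """Map a gray-level image into an 'entropy' domain: each pixel
    receives the Shannon entropy of the gray levels in its window."""
    h, w = img.shape
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + win, j:j + win]
            counts = np.bincount(patch.ravel(), minlength=256)
            p = counts[counts > 0] / patch.size
            out[i, j] = -np.sum(p * np.log2(p))
    return out

def top_local_maxima(field, n=100):
    """Retain only the loci of the n strongest responses (a stand-in
    for explicit thresholding of the transformed image)."""
    flat = np.argsort(field, axis=None)[::-1][:n]
    ys, xs = np.unravel_index(flat, field.shape)
    return list(zip(ys.tolist(), xs.tolist()))
```

The retained loci would then be handed to a clustering stage to yield the limited number of ROIs described above.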
  • Exemplary algorithms that have been studied include:
  • the second factor represents a simplified notion of symmetry: ⁇ 1 and ⁇ 2 correspond to the angles of the gray level intensity gradient of the two pixels (i 1 ,j 1 ) and (i 2 ,j 2 ). The factor achieves the maximum value when the gradients of the two points are oriented in the same direction.
  • the Gaussian represents a distance-weight function which introduces localization into the symmetry evaluation.
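For illustration, the pairwise symmetry contribution described above might be computed as follows. This is an interpretive sketch rather than the patent's exact formula: only the stated behavior is reproduced, namely an orientation factor that peaks when the two gradient angles agree and a Gaussian distance weight that localizes the evaluation.

```python
import numpy as np

def symmetry_contribution(p1, g1, p2, g2, sigma=3.0):
    """Symmetry score contributed by a pixel pair (illustrative).
    p1, p2: (row, col) pixel locations.
    g1, g2: (gy, gx) gray-level intensity gradients at those pixels.
    sigma:  width of the Gaussian distance weight (assumed value)."""
    (y1, x1), (y2, x2) = p1, p2
    theta1 = np.arctan2(g1[0], g1[1])
    theta2 = np.arctan2(g2[0], g2[1])
    # Orientation factor: 1 when the gradients point the same way, 0 when opposed.
    orientation = (1.0 + np.cos(theta1 - theta2)) / 2.0
    # Gaussian distance weight introduces localization.
    d = np.hypot(y1 - y2, x1 - x2)
    weight = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    return weight * orientation
```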
  • 3-W, the discrete wavelet transform, is based on a pyramidal algorithm which splits the image spectrum into four spatial frequency bands containing horizontal lows/vertical lows (ll), horizontal lows/vertical highs (lh), horizontal highs/vertical lows (hl) and horizontal highs/vertical highs (hh).
  • the procedure is applied repeatedly to each resulting low-frequency band, yielding a multiresolution decomposition into octave bands.
  • the process of the image wavelet decomposition is achieved using a pair of conjugate quadrature filters (CQFs), which act as a smoothing filter (i.e., a moving average) and a detailing filter, respectively.
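A single pyramid level of this decomposition can be illustrated with the Haar filter pair, used here as a simple stand-in for the conjugate quadrature filters; recursing on the ll band would produce the multiresolution octave bands described above.

```python
import numpy as np

def haar_split(img):
    """One pyramid level of a 2-D wavelet decomposition using the Haar
    pair: a smoothing (moving-average) filter and a detailing
    (difference) filter, applied along rows and then columns."""
    img = img.astype(float)
    # Smoothing / detailing along rows (the horizontal direction).
    lo_r = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi_r = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # Then along columns, yielding the four spatial-frequency bands.
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0  # horizontal lows / vertical lows
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0  # horizontal lows / vertical highs
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0  # horizontal highs / vertical lows
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0  # horizontal highs / vertical highs
    return ll, lh, hl, hh
```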
  • m(x,y) is the average orientation vector evaluated within a neighborhood of 7×7 pixels.
  • the first factor of the equation achieves high values for large differences in orientation between the center pixel and the surroundings.
  • the second factor acts as a low-pass filter for the orientation feature.
  • Michelson contrast is most useful in identifying high-contrast elements, generally considered to be an important choice feature for human vision.
  • Michelson contrast is calculated as C = (Imax - Imin)/(Imax + Imin), where Imax and Imin are the maximum and minimum luminance (gray-level) values within the region under evaluation.
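In code, the per-block calculation is a direct transcription of the standard Michelson formula:

```python
import numpy as np

def michelson_contrast(block):
    """Michelson contrast C = (I_max - I_min) / (I_max + I_min) of a
    pixel block; returns 0 for a uniformly black block to avoid a
    zero denominator."""
    block = block.astype(float)
    i_max, i_min = block.max(), block.min()
    if i_max + i_min == 0:
        return 0.0
    return (i_max - i_min) / (i_max + i_min)
```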
  • 9-H, the discrete cosine transform (DCT), is used in several coding standards as, for example, in the JPEG-DCT compression algorithm.
  • the image is first subdivided into square blocks (e.g., 8×8); each block is then transformed into a new set of coefficients using the DCT; finally, only the high-frequency coefficients (the ones that the JPEG algorithm instead discards) are retained to quantify the corresponding block.
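That block-quantification step might be sketched as follows. The cutoff separating the "high-frequency" coefficients (`keep_from`) is an assumed parameter, not a value taken from the patent.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)
    return m

def block_hf_energy(block, keep_from=4):
    """Quantify a square block by the energy of its high-frequency DCT
    coefficients (those a JPEG quantizer would be quickest to discard).
    keep_from is an assumed diagonal cutoff marking where 'high
    frequency' begins."""
    n = block.shape[0]
    d = dct_matrix(n)
    coeffs = d @ block.astype(float) @ d.T  # 2-D DCT of the block
    mask = np.add.outer(np.arange(n), np.arange(n)) >= keep_from
    return float(np.sum(coeffs[mask] ** 2))
```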
  • Referring to FIG. 3, it is possible, if desired, to convert an original image 10 into a transformed image 12, to process the transformed image 12 to obtain a mapping 16 of maxima loci that are grouped by cluster and, finally, to quantify the value of the local maxima within the clusters to obtain quantified data indicative of algorithmic regions of interest (aROIs) 18a-g within the original image 10.
  • FIG. 4 shows how a transformed image 20 may be processed to obtain a mapping 22 of local maxima, and how clustering algorithms may be applied in an iterative fashion to the mapping 22 of local maxima to develop, for example, iterative clusters 24-28 of local maxima and to identify a plurality of algorithmic regions of interest (aROIs) 30a-g within an original image 32.
  • a clustering procedure in accordance with the present invention may proceed as follows.
  • the initial set of local maxima is clustered by connecting local maxima under a gradually increasing acceptance radius for their joining.
  • approximately 100 initial local maxima may be reduced, for example, to nine regions or clusters by setting a termination decision to end the clustering process at the prescribed number of domains.
  • the clustered domains can be assigned values depending upon the value of the highest local maximum incorporated into that domain or, alternatively, based upon the number of local maxima included within a cluster.
  • Those skilled in the art will appreciate that other criteria may also be utilized.
  • each image processing algorithm contributes the intensity of its selected parameter, which is used to find local maxima and to value the resulting clustered ROI domains.
  • the clustering algorithm may comprise an eccentricity weighting algorithm, where lower local maxima that are eccentrically located can be selected to form a domain.
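The radius-growing clustering procedure described above might be sketched as follows. The step size of the acceptance radius and the joining of points by transitive closeness are illustrative choices; the eccentricity-weighting variant is not reproduced here.

```python
import numpy as np

def cluster_by_growing_radius(points, n_domains=9, step=2.0):
    """Cluster local-maxima loci by gradually increasing an acceptance
    radius: points within the radius of one another are joined
    (transitively, via union-find), and the process terminates once
    the cluster count falls to the prescribed number of domains."""
    pts = np.asarray(points, dtype=float)
    radius = step
    while True:
        parent = list(range(len(pts)))

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]  # path compression
                a = parent[a]
            return a

        # Join every pair of maxima lying within the current radius.
        for a in range(len(pts)):
            for b in range(a + 1, len(pts)):
                if np.hypot(*(pts[a] - pts[b])) <= radius:
                    parent[find(a)] = find(b)
        labels = [find(a) for a in range(len(pts))]
        if len(set(labels)) <= n_domains:
            return labels
        radius += step  # not enough merging yet; grow the radius
```

Each resulting cluster could then be valued by its highest member maximum or by its member count, as described above.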
  • As shown in FIG. 5, a transformation function may be applied to an original image 34 to convert the original image into a transformed image 36 and, thereafter, a 3-dimensional pixel intensity diagram 38 may be developed from the transformed image 36.
  • the 3-dimensional pixel intensity diagram 38 provides location data along the x and y axes of the graph and pixel intensity values along the z axis of the graph.
  • the height of the 3-dimensional pixel intensity diagram 38 at a particular x,y pixel location may represent the pixel intensity or local maxima value at that location.
  • FIG. 5 also provides an illustration of a plurality of final cluster locations 40 a - g defined within the 3-dimensional pixel intensity diagram 38 .
  • Turning to FIG. 6, that figure shows how a limited number of local maxima 42a-g may be identified within a transformed image 44 and may be mapped onto an original image 46.
  • Referring to FIG. 7, those skilled in the art will appreciate that numerous image processing transformation functions may be used in accordance with the present invention to identify algorithmic regions of interest (aROIs) within an image.
  • Several such algorithms may comprise a database represented by FIG. 7, and in the example provided an entropy algorithm is used. It will be appreciated that an entropy transformation algorithm was used to process the image 10 provided in FIG. 1.
  • Referring to FIGS. 8a and 8b, those figures illustrate how, by mapping human eye movements, human visual regions of interest (hROIs) 50a-g may be identified within an image. More specifically, FIG. 8a shows how human fixation loci 52 may be developed as a person observes an image. It will be noted that the human fixation loci 52 illustrated in FIG. 8a are developed by monitoring the amount of time that the human eye focuses on particular loci within the image. Turning now to FIG. 8b, it will be seen that, by tracking human eye movements to identify the human fixation loci 52, and by applying fixation identification procedures to those loci 52, it is possible to identify human visual regions of interest (hROIs) within the image 10.
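A fixation identification procedure of the kind referenced above can be illustrated with a simple dispersion-threshold scheme. The patent does not mandate this particular scheme, and the dispersion and duration thresholds used here are assumed values.

```python
import numpy as np

def fixations_from_gaze(samples, max_dispersion=30.0, min_samples=5):
    """Parse raw gaze samples ((x, y) per time step) into fixation loci:
    a run of samples whose total spread stays below max_dispersion and
    that lasts at least min_samples becomes one fixation, located at
    the centroid of the run."""
    samples = np.asarray(samples, dtype=float)
    fixations = []
    start = 0
    for i in range(1, len(samples) + 1):
        window = samples[start:i]
        disp = (window[:, 0].max() - window[:, 0].min()) + \
               (window[:, 1].max() - window[:, 1].min())
        if disp > max_dispersion:
            run = samples[start:i - 1]       # exclude the offending sample
            if len(run) >= min_samples:
                fixations.append(tuple(run.mean(axis=0)))
            start = i - 1                    # begin a new candidate run
    run = samples[start:]                    # flush the final run
    if len(run) >= min_samples:
        fixations.append(tuple(run.mean(axis=0)))
    return fixations
```

The resulting fixation loci would then be clustered into hROIs in the same manner as the algorithmic maxima.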
  • FIG. 9 shows how raw data, on the left, indicative of human eye movement may be parsed to identify human regions of visual interest (hROIs), on the right, within an image 54 .
  • the present invention provides for a correlation of algorithmic regions of interest (aROIs) and human visual regions of interest (hROIs), such that transformation algorithms to be applied to particular types of images or collections of images may be selected based upon a predetermined correlation between aROIs and hROIs for a particular type of image or collection of images to be processed.
  • the internal cognitive model data (i.e., aROI and hROI correlation data) may then be used to select appropriate image processing transformation functions for utilization in processing image data such that any algorithmically determined regions of interest (aROIs) may have a high, or relatively high, likelihood of corresponding to a set of human visual regions of interest (hROIs) within the images or collection(s) of images being processed.
  • a table showing a correlation between algorithmic regions of interest (aROIs) identified by four exemplary transformation functions and human visual regions of interest (hROIs) developed through monitoring the eye movements of four human subjects is provided in FIG. 11.
  • Correlations of the type described above may be established as follows. ROI loci selected by different image processing algorithms and those defined by human eye movement fixations are first compared. Further, any comparison of aROIs to hROIs preferably proceeds by obtaining two sets of ROIs, one aROI and one hROI, and clustering the two sets of ROIs using a distance measure derived from a k-means pre-evaluation. This evaluation preferably determines regions defining coincidence and non-coincidence based upon distances between the respective loci of the two sets of ROIs. The final selection of joined ROIs then enables the calculation of a similarity metric, S_p, to determine how close the two sets of ROIs are.
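The comparison might be sketched as follows. Since the patent does not give a closed form for S_p, the coincidence fraction below, with a fixed distance threshold standing in for the k-means pre-evaluation, is an illustrative assumption.

```python
import numpy as np

def similarity_sp(arois, hrois, coincidence_radius=40.0):
    """Illustrative similarity metric between two ROI sets: pair each
    aROI with its nearest hROI, call the pair 'coincident' when the
    two loci fall within coincidence_radius (an assumed threshold),
    and report the fraction of joined ROIs."""
    a = np.asarray(arois, dtype=float)
    h = np.asarray(hrois, dtype=float)
    # Pairwise Euclidean distances between the two ROI sets.
    dists = np.sqrt(((a[:, None, :] - h[None, :, :]) ** 2).sum(axis=2))
    coincident = int((dists.min(axis=1) <= coincidence_radius).sum())
    return 2.0 * coincident / (len(a) + len(h))
```

Identical ROI sets score 1.0; fully disjoint sets score 0.0, which matches the intended use of an index of similarity between aROIs and hROIs.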
  • an index of similarity may be utilized to describe how closely two sets of ROIs resemble one another.
  • Referring to FIGS. 12a-12d, it can be seen how algorithmic regions of interest (aROIs) having a high correlation with anticipated human visual regions of interest (hROIs) may be used to enhance the performance of various data compression techniques.
  • It will be appreciated that an original image 60 is provided in FIG. 12a, and that algorithmic regions of interest (aROIs) 62a-g having a relatively high correlation with predetermined human visual regions of interest (hROIs) for the original image 60 are shown in FIG. 12b.
  • It also can be seen that, when the selected algorithmic regions of interest (aROIs) 62a-g are incorporated into a compressed image 64 such as that shown in FIG. 12d, substantial improvements in compressed image quality may be achieved over conventional compressed images, such as the compressed image 66 shown in FIG. 12c.
  • a region-of-visual-interest image processing (ROVIIP) system in accordance with a presently preferred form of the present invention preferably performs six basic functions or processes. These include image transformation and thresholding, transformed image clustering, human visual region of interest (hROI) identification and/or storage, similarity index generation, optimal transformation algorithm selection, and optimal transformation algorithm utilization.
  • Step 1 Image Transformation and Thresholding
  • the first step performed by a region-of-visual-interest image processing (ROVIIP) system generally requires the transformation of a sample image from a collection of images to be processed. Transformations of the sample image(s) are performed using a plurality of image transformation functions stored within a database. These transformations yield a respective plurality of transformed images, and a thresholding function is preferably applied to the transformed images to identify respective sets of local maxima within the transformed images. Preferably, sets of approximately 100 local maxima are identified for each transformed image.
  • Step 2 Transformed Image Clustering
  • clustering algorithms preferably are applied iteratively to the respective sets of local maxima to identify respective smaller sets of relevant loci.
  • the smaller sets of relevant loci preferably number 10 or less and are referred to herein as algorithmic regions of interest (aROIs).
  • Step 3 Human Visual Region of Interest Identification and/or Storage
  • the eye movements of several subjects, when presented with the above-referenced sample image, may be observed, monitored and quantified to develop a set of hROIs for the type of image or collection(s) of images to be processed.
  • Step 4 Similarity Index Generation
  • a similarity index between the two types of ROI data may be developed and utilized to provide a correlation between the sets of aROIs and hROIs developed for the sample image.
  • Step 5 Optimal Transformation Algorithm Selection
  • the selected optimal image transformation function may be referred to, for example, as A*, and may correspond to the image transformation function that yields aROIs for the sample image(s) that show the greatest similarity to the predetermined hROIs.
  • Step 6 Optimal Transformation Algorithm Utilization
  • Once an optimal image transformation function A* has been selected, that image transformation function may be utilized to process the remainder of the images within the collection or collections of images, thus ensuring that the overall image processing function proceeds in an intelligent manner.
  • the manner is deemed to be “intelligent” because the optimal image processing algorithm, A*, has been selected to have a high, or relatively high, correlation with human image processing and yet can process large collections of image data autonomously.
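Steps 1 through 6 can be tied together in a brief sketch. Everything here is illustrative: the candidate "algorithms" are toy callables, the thresholding and clustering stages are collapsed into a top-n selection, and the comparison with stored hROIs uses a simple coincidence fraction with an assumed distance threshold.

```python
import numpy as np

def roi_loci(field, n=9):
    """Reduce a transformed image to the loci of its n strongest
    responses (standing in for thresholding plus clustering)."""
    idx = np.argsort(field, axis=None)[::-1][:n]
    ys, xs = np.unravel_index(idx, field.shape)
    return np.stack([ys, xs], axis=1).astype(float)

def similarity(arois, hrois, radius=0.5):
    """Fraction of joined ROIs, with a small assumed coincidence radius."""
    d = np.sqrt(((arois[:, None] - hrois[None, :]) ** 2).sum(-1))
    hits = int((d.min(axis=1) <= radius).sum())
    return 2.0 * hits / (len(arois) + len(hrois))

def select_a_star(algorithms, sample_image, hrois):
    """Steps 1-5 in miniature: transform the sample image with each
    candidate algorithm, derive aROIs, compare them with the stored
    hROIs, and return the best-matching algorithm A* (which Step 6
    would then apply to the remainder of the collection)."""
    scored = {name: similarity(roi_loci(fn(sample_image)), hrois)
              for name, fn in algorithms.items()}
    return max(scored, key=scored.get), scored
```

With a sample image whose bright spots coincide with the stored hROIs, an identity transform scores 1.0 and is selected as A*, while an inverted transform scores 0.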

Abstract

Systems and methods for performing intelligent image processing. Image processing systems and methods in accordance with the present invention may select algorithms for processing collections of images by comparing algorithmic region of interest (aROI) data to stored human visual region of interest (hROI) data to select from a database of available transformation algorithms an optimal algorithm or group of algorithms to be used in transforming data comprising the collection or collections of images. The selected algorithm(s) may then be used, for example, in data compression, image enhancement or database query functions.

Description

    BACKGROUND
  • 1. Field of the Invention [0001]
  • The present invention relates generally to image processing systems and, more particularly, to systems and methods for processing image data based upon predetermined regions of human visual interest. [0002]
  • 2. Background of the Invention [0003]
  • The Scanpath Theory of human vision, proposed by Noton and Stark in 1971, suggests that a top-down, internal cognitive model of what a person sees when actively looking at an image guides active eye movements of the person and controls and/or influences the person's perception of the image being viewed. Stated somewhat differently, Noton and Stark suggest that eye movements utilized in visually examining an image are generated based at least in part upon an internal cognitive model that has been developed by a person through experience. The term “top down processing” as used herein denotes image processing that proceeds with some assumed knowledge regarding the type of image being viewed or image data being analyzed. Thus, the Scanpath Theory posits that when a person views an image, the eye movements of the person will follow a pattern that is premised upon knowledge of the type of image that is being viewed and/or similar types of images. [0004]
  • The Scanpath Theory recognizes that active eye movements comprise an essential part of visual perception, because these eye movements carry the fovea, a region of high visual acuity in the retina, into each part of an image to be processed. Thus, the Scanpath Theory posits that an internal cognitive model drives human eye movements in a repetitive, sequential set of saccades and fixations (“glances”) over specific regions-of-interest (“ROIs”) in a scene, with the subconscious aim of confirming the top-down, internal cognitive model—the “Mind's Eye”, so to speak. [0005]
  • Experimental investigation of the Scanpath Theory has involved presenting a complex visual stimulus (such as a scenic photograph) to a human subject and recording the eye movements made by the subject while looking at the presented image. Thus, computer-controlled experiments present an image and carefully measure the subject's eye movements using video cameras. Eye movement recordings are then represented as sequences of alternating glances (saccades and fixations), where the duration of each glance generally lasts about 300 milliseconds. Every glance the subject makes while looking at the image enables the high resolution fovea of the retina to abstract information from the image during the fixation period, identifying a fixation point on the image as a visual region-of-interest, or ROI. This is shown, for example, in FIGS. 8a, 8b and 9. [0006]
  • Diametrically opposed to the Scanpath Theory, current methods for computerized image processing are usually intended to detect and localize specific features in a digital image in a “bottom-up” fashion, analyzing, for example, spatial frequency, texture conformation, or other informative values of loci of the visual stimulus. The term “bottom up processing” is used herein to denote processing methods that assume no knowledge of an image being viewed or image data being processed. Prior art methods that have been proposed in the literature can be classified into three principal approaches: [0007]
  • 1. Structural Methods are based on an assumption that images have detectable and recognizable primitives, which are distributed according to some placement rules—examples of prior art methods that use such an approach are matched filters. [0008]
  • 2. Statistical Methods are based on statistical characteristics of the texture of the picture—examples of prior art methods that use a statistical approach are Co-Occurrence Matrices and Entropy Functions. [0009]
  • 3. Modeling Methods hypothesize underlying processes for generating local regions of visual interest—examples of prior art that use a modeling approach are Fractal Descriptors. [0010]
  • U.S. Pat. No. 5,535,013, entitled “Image Data Compression and Expansion Apparatus, and Image Area Discrimination Processing Apparatus Therefor,” teaches a method of image data compression in which an image is first divided into square pixel blocks and then encoded using an orthogonal transform. This is a statistical method. The encoding process is based upon a discrete cosine transform, and is thus a JPEG algorithm. Using the coefficients of the discrete cosine transform, the method taught by U.S. Pat. No. 5,535,013 discriminates blocks containing text from blocks containing general, non-text dot images. Then, a selective quantization method is used to identify different quantization coefficients for text blocks and non-text blocks. [0011]
  • Other bottom-up methods of image processing suggest that characterization and decomposition of an image can be based upon primitives such as color, texture, or shape. Such methods can be more powerful than the text/non-text discrimination method of U.S. Pat. No. 5,535,013, but still cannot overcome the important limitation that for a general, complex image, regions of interest are difficult to specify by a single parameter such as color or shape. This is shown, for example, in U.S. Pat. No. 5,579,471, entitled “Image Query System and Method.”[0012]
  • In view of the foregoing, it is submitted that those skilled in the art would find to be quite useful a method and apparatus for image processing which takes into account the underlying nature of human vision and perception, so as to selectively decompose an image into its most meaningful regions of visual interest, thereby providing a means for improving image compression, image query techniques and visual image enhancement systems. [0013]
  • SUMMARY OF THE INVENTION
  • In one particularly innovative aspect, the present invention is directed to systems and methods for image processing that utilize a cognitive model stored in memory to identify regions within an image that correlate with previously determined regions of visual interest for a given type of image or type of image data being processed. [0014]
  • In another innovative aspect, systems and methods in accordance with the present invention may select algorithms for processing collections of images by comparing algorithmic region of interest (aROI) data to stored human visual region of interest (hROI) data to select an optimal algorithm or group of algorithms to be used in transforming data comprising the collection or collections of images. The selected algorithms may then be used, for example, in data compression, image enhancement or database query functions. [0015]
  • In still another innovative aspect, the present invention is directed to systems and methods that utilize conventional image processing algorithms in combination with innovative clustering, sequencing, comparing and parsing techniques to predict loci of human fixations within an image or within collections of images for the purposes of, for example, data compression, image enhancement and image database query functions. Indeed, empirical analysis reveals that systems and methods in accordance with the present invention enable a prediction of human fixation loci that is comparable in measure to the ability of one human to predict the loci of eye movements of other persons viewing an image. [0016]
  • In still another innovative aspect, systems and methods in accordance with the present invention may detect regions of visual interest (ROIs) within an image based upon stored characteristic data representative of human visual perception. For example, using the method(s) of the present invention, algorithmic regions of interest (aROIs) having a high, or relatively high, correlation with human regions of visual interest (hROIs) may be developed for an image or collection of images, and thereafter an image or collection of images may be saved within a system using selected portions of the original picture (i.e., aROIs) as identification data. Then, the selected portions of the picture (i.e., saved aROI data) may be used in performing a query search. The query search may proceed, for example, by comparing saved aROIs in a database with ROIs specified by the system operator. Processing image data in this fashion should provide for substantial reductions in image processing time. Further, it will be appreciated that, through the use of processing algorithms and methodologies in accordance with the present invention, it is possible to take into consideration more complex features of an image, not just indications of color, shape and the like. [0017]
  • A system for compressing and processing collections of images in accordance with one form of the present invention may comprise, for example, means for transforming image data representative of a particular image, collection of images or type of image into a domain of “visual relevance”, for example, using a database of image processing transformation functions; means for obtaining a set of algorithmic regions of interest (aROIs) from a transformed image, for example, by thresholding; means for clustering local maxima from the transformed image into a second set of only a few, very relevant algorithmic regions of interest (aROIs), such that the most relevant algorithmic regions of interest (aROIs) are properly distributed over the image; means for comparing the identified algorithmic regions of interest (aROIs) with predetermined human visual regions of interest (hROIs) to select an optimal image processing transformation function; and means for using the selected optimal image processing transformation function to compress the remainder of images within a collection or collections of images. In addition, a system in accordance with the present invention may comprise means for using the algorithmic regions of interest (aROIs) to implement image query functions and/or means for using the algorithmic region of interest (aROI) data to implement various visual image enhancement techniques. [0018]
  • It will be appreciated that systems and methods in accordance with the present invention can be utilized to process very large collections of data including, for example, large collections of pictures, scenes and works of art. It also will be appreciated that systems and methods in accordance with the present invention may be utilized, for example, to compress, search and/or enhance images ranging from natural and constructed landscapes and “cityscapes”, to groups of persons and animals and objects, and to single portraits and still lifes. [0019]
  • Accordingly, it is an object of the present invention to provide improved systems and methods for use in the field of image processing. [0020]
  • It is also an object of the present invention to provide systems and methods that utilize top-down image processing techniques to improve image processing functions and efficiency. [0021]
  • Other objects and features of the present invention will become apparent from consideration of the following description taken in conjunction with the accompanying drawings.[0022]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is an illustration of an image of the Mona Lisa and a transformation of that image. [0023]
  • FIG. 2 shows how a thresholding algorithm may be applied to the transformed image of FIG. 1 to produce image local maxima data. [0024]
  • FIG. 3 shows how an original image may be transformed, how local maxima may be identified within the transformed image, how the local maxima may be clustered and, finally, how any resulting clusters may be quantified based upon local maxima data. [0025]
  • FIG. 4 shows how a transformed image may be processed to obtain local maxima, how the local maxima may be iteratively clustered, and how regions of interest may be identified based upon quantified cluster data. [0026]
  • FIG. 5 shows how an image may be transformed and represented by a 3-dimensional pixel intensity diagram, and how quantified cluster data may be represented within the 3-dimensional pixel intensity diagram. [0027]
  • FIG. 6 shows how an image may be transformed and local maxima obtained without the utilization of a clustering algorithm. [0028]
  • FIG. 7 comprises a representation of mathematical image processing transformation functions, grouped by category, that may be used in image processing systems. [0029]
  • FIGS. 8a and 8b illustrate how human eye movement may be utilized to identify human visual regions of interest (hROI) within an image. [0030]
  • FIG. 9 provides a second illustration of how human eye movement may be utilized to identify human visual regions of interest (hROI) within an image. [0031]
  • FIG. 10 shows how algorithmic regions of interest (aROIs) may be compared to human visual regions of interest (hROIs) to obtain a quantitative measurement between the various regions of interest. [0032]
  • FIG. 11 comprises a table showing a correlation between regions of interest identified by various transformation algorithms and regions of interest obtained through monitoring the eye movement of various human subjects. [0033]
  • FIGS. 12a-d show how anticipated human visual regions of interest (hROIs) may be used within image compression techniques. [0034]
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
  • Turning now to the drawings, before addressing the various image processing techniques that are utilized in accordance with the present invention, it will be noted that the methods described and claimed herein may be implemented or, stated differently, executed on any of a number of computer systems. For example, the image processing protocols described herein may be implemented within applications designed to run on personal computers (PCs), Unix workstations, dedicated hardware or, indeed, within virtually any other environment. Thus, the specific hardware used to implement the systems and methods described below will not be described in detail herein. Rather, it will be understood that the systems and methods may be implemented using virtually any computing system and that a typical PC including a 200 MHz or better Pentium® processor manufactured by the Intel Corporation and related components would be exemplary. [0035]
  • Turning to FIG. 1, those skilled in the art will appreciate that, using conventional transformation functions, it is possible to convert an original image 10 to a transformed image 12. Similarly, it is possible to apply thresholding criteria to the transformed image 12 to obtain a mapping 14 comprising a plurality of maxima loci within the transformed image. This is shown, for example, in FIG. 2. [0036]
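The transform-and-threshold operation just described can be sketched as follows. This is a minimal NumPy illustration, not the patent's actual implementation; the retention criterion (`keep_fraction`) is a hypothetical placeholder for the thresholding criteria applied to the transformed image.

```python
import numpy as np

def local_maxima(transformed, keep_fraction=0.1):
    """Return (row, col) loci that are local maxima of the transformed
    image and exceed a global threshold, as in the mapping of FIG. 2."""
    t = transformed
    # pad with -inf so border pixels can be compared against a full 3x3 ring
    p = np.pad(t, 1, mode="constant", constant_values=-np.inf)
    neighborhoods = [p[dr:dr + t.shape[0], dc:dc + t.shape[1]]
                     for dr in range(3) for dc in range(3)
                     if not (dr == 1 and dc == 1)]
    is_max = np.all([t >= n for n in neighborhoods], axis=0)
    # hypothetical criterion: keep maxima in the top fraction of the value range
    threshold = t.max() - keep_fraction * (t.max() - t.min())
    rows, cols = np.nonzero(is_max & (t > threshold))
    return list(zip(rows.tolist(), cols.tolist()))
```

Applied to each transformed image, a routine of this kind yields the set of roughly 100 maxima loci that the later clustering step reduces to a handful of aROIs.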
  • The information content of a generic picture can be identified by different image parameters which, in turn, can be identified by relevant image processing algorithms. In this sense, applying algorithms to a picture means mapping that image into different domains, where for each domain a specific set of parameters is extracted. After the image has been processed, only the loci of the local maxima from each domain are retained; these maxima are then clustered in order to yield a limited number of ROIs. Exemplary algorithms that have been studied include: [0037]
  • 1-X, an x-like mask, positive along the two diagonals and negative elsewhere, is convolved with the image. We have also used convolutions with different high-curvature masks, for example the “<”-like mask, whose definition is intuitive; these were rotationally invariant. [0038]
  • 2-S, symmetry, a structural approach, appears to be a very prominent spatial relation. For each pixel x, y of the image, we define a local symmetry magnitude S(x,y) as follows: [0039]
    S(x,y) = Σ_((i1,j1),(i2,j2)) ∈ Γ(x,y) s((i1,j1),(i2,j2))
  • where Γ(x,y) is the neighborhood of radius r of the point x,y defined along the horizontal and vertical axes (Γ(x,y) = (x−r,y), . . . , (x+r,y), (x,y−r), . . . , (x,y+r)) and s((i1,j1),(i2,j2)) is defined by the following equation: [0040]
    s((i1,j1),(i2,j2)) = Gσ(d((i1,j1),(i2,j2))) |cos(θ1−θ2)|
  • The first factor Gσ is a Gaussian of fixed variance, σ = 3 pixels, and d(·) represents the distance function. The second factor represents a simplified notion of symmetry: θ1 and θ2 correspond to the angles of the gray-level intensity gradients of the two pixels (i1,j1) and (i2,j2). The factor achieves its maximum value when the gradients of the two points are oriented in the same direction. The Gaussian acts as a distance weighting function which introduces localization in the symmetry evaluation. [0041]
  • 3-W, discrete wavelet transform, is based on a pyramidal algorithm which splits the image spectrum into four spatial frequency bands containing horizontal lows/vertical lows (ll), horizontal lows/vertical highs (lh), horizontal highs/vertical lows (hl) and horizontal highs/vertical highs (hh). The procedure is repeatedly applied to each resulting low frequency band, yielding a multiresolution decomposition into octave bands. The image wavelet decomposition is achieved using a pair of conjugate quadrature filters (CQFs), which act as a smoothing filter (i.e., a moving average) and a detailing filter, respectively. We have used different orders from the Daubechies family basis to define the CQF filters. For each resolution l, only the wavelet coefficients of the highs/highs hh_l matrix were retained and finally relocated into a final matrix HH (with the same dimension as the original image) by the following combination: [0042]
    HH = Σ_(l=1..n) ζ_l(hh_l)
  • where n is the maximum depth of the pyramidal algorithm (n = 3 in our case) and where ζ(·) is a matrix operation which returns a copy of the input matrix hh_l by alternately inserting rows and columns of zeros. [0043]
  • 4-F, a center-surround on/off quasi-receptive field mask, positive in the center and negative in the periphery, is convolved with the image. [0044]
  • 5-O, difference in the gray-level orientation, a statistical-type kernel, is a feature analyzed in early visual cortices. The center-surround difference is determined by first convolving the image with four Gabor masks of angles 0°, 45°, 90° and 135°, respectively. For each pixel x, y, the scalar results of the four convolutions are then associated with four unit vectors corresponding to the four different orientations. The orientation vector o⃗(x,y) is represented by the vectorial sum of these four weighted unit vectors. We define the center-surround difference transform as follows: [0045]
    O(x,y) = (1 − o⃗(x,y) · m⃗(x,y)) |o⃗(x,y)| |m⃗(x,y)|
  • where m⃗(x,y) is the average orientation vector evaluated within a neighborhood of 7×7 pixels. The first factor of the equation achieves high values for big differences in orientation between the center pixel and its surroundings. The second factor acts as a low-pass filter for the orientation feature. [0046]
  • 6-E, concentration of edges per unit area, is determined by detecting edges in an image using the Canny operator [2] and then aggregating the detected edges with a Gaussian of σ = 3 pixels. [0047]
  • 7-N, entropy, is calculated as [0048]
    N = −Σ_(i=0..255) p_i log p_i
  • where p_i is the probability of the gray level i within the 7×7 surrounding of the center pixel. [0049]
  • 8-C, Michelson contrast, is most useful in identifying high contrast elements, generally considered to be an important choice feature for human vision. Michelson contrast is calculated as |(Lm − LM)/(Lm + LM)|, where Lm is the mean luminance within a 7×7 surrounding of the center pixel and LM is the overall mean luminance of the image. [0050]
  • 9-H, discrete cosine transform, DCT, is used in several coding standards as, for example, in the JPEG-DCT compression algorithm. The image is first subdivided into square blocks (i.e., 8×8); each block is then transformed into a new set of coefficients using the DCT; finally, only the high frequency coefficients (the ones that are instead discarded in the JPEG algorithm) are retained to quantify the corresponding block. [0051]
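Two of the simpler transformations above, entropy (7-N) and Michelson contrast (8-C), can be sketched per-pixel in plain Python. The helper names and the border handling below are illustrative choices, not taken from the patent:

```python
import math

def window(img, x, y, radius=3):
    """Gray levels in the (2r+1)x(2r+1) surround of pixel (x, y),
    clipped at the image border; img is a list of rows."""
    return [img[i][j]
            for i in range(max(0, x - radius), min(len(img), x + radius + 1))
            for j in range(max(0, y - radius), min(len(img[0]), y + radius + 1))]

def entropy(img, x, y):
    """7-N: -sum p_i log p_i over the gray-level histogram of the surround."""
    w = window(img, x, y)
    probs = [w.count(g) / len(w) for g in set(w)]
    return -sum(p * math.log2(p) for p in probs)

def michelson(img, x, y):
    """8-C: |(Lm - LM) / (Lm + LM)|, local mean vs. overall mean luminance."""
    w = window(img, x, y)
    lm = sum(w) / len(w)
    LM = sum(sum(row) for row in img) / (len(img) * len(img[0]))
    return abs((lm - LM) / (lm + LM)) if lm + LM else 0.0
```

Evaluated at every pixel, either function maps the image into the corresponding feature domain, whose local maxima are then retained and clustered as described above.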
  • Turning now to FIG. 3, if desired, it is possible to convert an original image 10 to a transformed image 12, to process the transformed image 12 to obtain a mapping 16 of maxima loci that are grouped by cluster and, finally, to quantify the value of the local maxima within the clusters to obtain quantified data indicative of algorithmic regions of interest (aROIs) 18a-g within the original image 10. [0052]
  • FIG. 4 shows how a transformed image 20 may be processed to obtain a mapping 22 of local maxima, and how clustering algorithms may be applied in an iterative fashion to the mapping 22 of local maxima to develop, for example, iterative clusters 24-28 of local maxima and to identify a plurality of algorithmic regions of interest (aROIs) 30a-g within an original image 32. [0053]
  • A clustering procedure in accordance with the present invention may proceed as follows. The initial set of local maxima is clustered by connecting local maxima while gradually increasing an acceptance radius for their joining. In a preferred embodiment, approximately 100 initial local maxima may be reduced, for example, to nine regions or clusters by setting a termination decision to end the clustering process at the prescribed number of domains. Then, the clustered domains can be assigned values depending upon the value of the highest local maximum incorporated into each domain or, alternatively, based upon the number of local maxima included within a cluster. Those skilled in the art will appreciate that other criteria may also be utilized. [0054]
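A minimal sketch of such a radius-growing clustering rule follows, assuming Euclidean distance between maxima loci and single-linkage joining (details the text leaves open):

```python
import math

def cluster_maxima(points, target=9, step=1.0):
    """Merge local-maxima loci by gradually increasing the acceptance
    radius until at most `target` clusters remain, mirroring the
    termination decision described above."""
    clusters = [[p] for p in points]
    radius = step
    while len(clusters) > target:
        merged = []
        for c in clusters:
            for m in merged:
                # join when any pair across the two clusters is within radius
                if any(math.dist(p, q) <= radius for p in c for q in m):
                    m.extend(c)
                    break
            else:
                merged.append(c)
        clusters = merged
        radius += step
    return clusters
```

Each returned cluster can then be scored by its highest member maximum or by its member count, per the valuation criteria mentioned in the text.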
  • It will also be appreciated that each image processing algorithm contributes to the intensity of its selected parameter to find local maxima and values of resulting clustered ROI domains. Moreover, the clustering algorithm may comprise an eccentricity weighting algorithm, where lower local maxima that are eccentrically located can be selected to form a domain. [0055]
  • Turning to FIG. 5, those skilled in the art will appreciate how a transformation function may be applied to an original image 34 to convert the original image to a transformed image 36 and how, thereafter, a 3-dimensional pixel intensity diagram 38 may be developed from the transformed image 36. The 3-dimensional pixel intensity diagram 38 provides location data along the x and y axes of the graph and pixel intensity values along the z axis of the graph. Thus, the height of the 3-dimensional pixel intensity diagram 38 at a particular x,y pixel location may represent the pixel intensity or local maxima value at that location. FIG. 5 also provides an illustration of a plurality of final cluster locations 40a-g defined within the 3-dimensional pixel intensity diagram 38. [0056]
  • Turning now to FIG. 6, that figure shows how a limited number of local maxima 42a-g may be identified within a transformed image 44 and may be mapped onto an original image 46. It will be appreciated, in view of FIG. 6, that when iterative clustering protocols are not applied to transformed image data, less optimally distributed algorithmic regions of interest (aROIs) 42a-g are identified. Stated somewhat differently, where iterative clustering techniques are not applied to transformed image data, less relevant algorithmic regions of interest (aROIs) 42a-g are identified. [0057]
  • Turning now to FIG. 7, those skilled in the art will appreciate that numerous image processing transformation functions may be used in accordance with the present invention to identify algorithmic regions of interest (aROIs) within an image. Several such algorithms may comprise a database represented by FIG. 7, and in the example provided an entropy algorithm is used. It will be appreciated that an entropy transformation algorithm was used to process the image 10 provided in FIG. 1. [0058]
  • Now, turning to FIGS. 8a and 8b, those figures illustrate how, by mapping human eye movements, human visual regions of interest (hROIs) 50a-g may be identified within an image. More specifically, FIG. 8a shows how human fixation loci 52 may be developed as a person observes an image. It will be noted that the human fixation loci 52 illustrated in FIG. 8a are developed by monitoring the amount of time that the human eye focuses on particular loci within the image. Turning now to FIG. 8b, it will be seen that, by tracking human eye movements to identify the human fixation loci 52, and by applying fixation identification procedures to those loci 52, it is possible to identify human visual regions of interest (hROIs) within the image 10. [0059]
  • FIG. 9 shows how raw data, on the left, indicative of human eye movement may be parsed to identify human regions of visual interest (hROIs), on the right, within an image 54. [0060]
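One common way to parse raw gaze samples into fixation loci is a dispersion-threshold rule, sketched below. This is an illustrative stand-in, not necessarily the parsing procedure used to produce FIG. 9, and the threshold values are hypothetical:

```python
def detect_fixations(samples, dispersion=15.0, min_samples=5):
    """Group raw gaze samples (x, y) into fixation centroids: a run of
    samples counts as one fixation while its spatial spread stays under
    the dispersion threshold and it lasts long enough."""
    fixations = []
    window = []
    for pt in samples:
        window.append(pt)
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > dispersion:
            # spread exceeded: close out the previous run as a fixation
            if len(window) - 1 >= min_samples:
                prev = window[:-1]
                fixations.append((sum(p[0] for p in prev) / len(prev),
                                  sum(p[1] for p in prev) / len(prev)))
            window = [pt]
    if len(window) >= min_samples:
        fixations.append((sum(p[0] for p in window) / len(window),
                          sum(p[1] for p in window) / len(window)))
    return fixations
```

The resulting centroids play the role of the hROI loci against which aROIs are later compared.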
  • Turning now to FIG. 10, in one particularly innovative aspect, the present invention provides for a correlation of algorithmic regions of interest (aROIs) and human visual regions of interest (hROIs), such that transformation algorithms to be applied to particular types of images or collections of images may be selected based upon a predetermined correlation between aROIs and hROIs for a particular type of image or collection of images to be processed. Thus, in accordance with one form of the present invention it is possible to store data reflecting an internal cognitive model, or correspondence between aROIs and hROIs, for particular types of images within an image processing system. The internal cognitive model data (i.e., aROI and hROI correlation data) may then be used to select appropriate image processing transformation functions for utilization in processing image data such that any algorithmically determined regions of interest (aROIs) may have a high, or relatively high, likelihood of corresponding to a set of human visual regions of interest (hROIs) within the images or collection(s) of images being processed. [0061]
  • A table showing a correlation between algorithmic regions of interest (aROIs) identified by four exemplary transformation functions and human visual regions of interest (hROIs) developed through monitoring the eye movements of four human subjects is provided in FIG. 11. [0062]
  • Correlations of the type described above may be established as follows. ROI loci selected by different image processing algorithms and those defined by human eye movement fixations are first compared. Further, any comparison of aROIs to hROIs preferably proceeds by obtaining two sets of ROIs, one aROI and one hROI, and clustering the two sets of ROIs using a distance measure derived from a k-means pre-evaluation. This evaluation preferably determines regions defining coincidence and non-coincidence based upon distances between the respective loci of the two sets of ROIs. The final selection of joined-ROIs then enables the calculation of a similarity metric, Sp, to determine how close the two sets of ROIs are. [0063]
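The exact Sp metric is defined in the incorporated Privitera and Stark memorandum; the sketch below is a simplified stand-in that only captures the idea of counting coincident ROI pairs within a distance tolerance:

```python
import math

def similarity(arois, hrois, coincidence_radius=20.0):
    """Simplified stand-in for the Sp index: greedily pair each
    algorithmic ROI with the nearest unused human ROI and report the
    fraction of pairs closer than a coincidence radius (in pixels)."""
    unused = list(hrois)
    coincident = 0
    for a in arois:
        if not unused:
            break
        nearest = min(unused, key=lambda h: math.dist(a, h))
        if math.dist(a, nearest) <= coincidence_radius:
            coincident += 1
            unused.remove(nearest)
    return coincident / max(len(arois), len(hrois))
```

A score of 1.0 means every aROI coincides with a distinct hROI; 0.0 means none do, giving a single number with which candidate transformation functions can be ranked.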
  • Thus, in a preferred form, an index of similarity may be utilized to describe how closely two sets of ROIs resemble one another. For additional discussion of methods of correlating ROIs, reference is made to Privitera and Stark, “Algorithms for Defining Visual Region-of-Interest: Comparison with Eye Fixations,” Memorandum No. UCB/ERL M97/72, Electronics Research Laboratory, College of Engineering, University of California, Berkeley, which is hereby incorporated by reference. [0064]
  • Turning now to FIGS. 12a-d, it can be seen how algorithmic regions of interest (aROIs) having a high correlation with anticipated human visual regions of interest (hROIs) may be used to enhance the performance of various data compression techniques. For example, it will be appreciated that an original image 60 is provided in FIG. 12a, and that algorithmic regions of interest (aROIs) 62a-g having a relatively high correlation with predetermined human visual regions of interest (hROIs) for the original image 60 are shown in FIG. 12b. It also can be seen that, when the selected algorithmic regions of interest (aROIs) 62a-g are incorporated into a compressed image 64 such as that shown in FIG. 12d, substantial improvements in compressed image quality may be achieved over conventional compressed images, such as the compressed image 66 shown in FIG. 12c. [0065]
  • A region-of-visual-interest image processing (ROVIIP) system in accordance with a presently preferred form of the present invention preferably performs six basic functions or processes. These include image transformation and thresholding, transformed image clustering, human visual region of interest (hROI) identification and/or storage, similarity index generation, optimal transformation algorithm selection, and optimal transformation algorithm utilization. [0066]
  • Step 1: Image Transformation and Thresholding [0067]
  • As explained above, the first step performed by a region-of-visual-interest image processing (ROVIIP) system generally requires the transformation of a sample image from a collection of images to be processed. Transformations of the sample image(s) are performed using a plurality of image transformation functions stored within a database. These transformations yield a respective plurality of transformed images, and a thresholding function is preferably applied to the transformed images to identify respective sets of local maxima within the transformed images. Preferably, sets of approximately 100 local maxima are identified for each transformed image. [0068]
  • Step 2: Transformed Image Clustering [0069]
  • Following the basic transformation and thresholding step, clustering algorithms preferably are applied iteratively to the respective sets of local maxima to identify respective smaller sets of relevant loci. The smaller sets of relevant loci preferably number 10 or fewer and are referred to herein as algorithmic regions of interest (aROIs). [0070]
  • Step 3: Human Visual Region of Interest Identification and/or Storage [0071]
  • Preferably, human visual regions of interest (hROIs) are predetermined for the type of image or collection(s) of images to be processed. In the event that hROIs are not predetermined for the type of image or collection(s) of images to be processed, the eye movements of several subjects, when presented with the above-referenced sample image, may be observed, monitored and quantified to develop a set of hROIs for the type of image or collection(s) of images to be processed. [0072]
  • Step 4: Similarity Index Generation [0073]
  • Once the sets of aROIs and hROIs have been developed for the sample image, or a set of sample images, a similarity index between the two types of ROI data may be developed and utilized to provide a correlation between the sets of aROIs and hROIs developed for the sample image. [0074]
  • Step 5: Optimal Transformation Algorithm Selection [0075]
  • Using the similarity index or correlation data, it is possible to select an optimal image transformation function, or optimal group of functions, from the database of available image transformation functions. The selected optimal image transformation function may be referred to, for example, as A*, and may correspond to the image transformation function that yields aROIs for the sample image(s) that show the greatest similarity to the predetermined hROIs. [0076]
  • Step 6: Optimal Transformation Algorithm Utilization [0077]
  • Once an optimal image transformation function, A*, has been selected, that image transformation function may be utilized to process the remainder of images within the collection or collections of images, thus ensuring that the overall image processing function proceeds in an intelligent manner. The manner is deemed to be “intelligent” because the optimal image processing algorithm, A*, has been selected to have a high, or relatively high, correlation with human image processing and yet can process large collections of image data autonomously. [0078]
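The six-step ROVIIP process can be tied together schematically as follows; every function name here is a hypothetical placeholder for a component described in the preceding sections, not the patent's implementation:

```python
def roviip_process(transform_db, collection, hrois,
                   compute_arois, similarity, apply_transform):
    """Schematic ROVIIP pipeline: Steps 1-4 score each candidate
    transformation on a sample image against stored hROIs, Step 5
    selects A*, and Step 6 applies A* to the rest of the collection."""
    sample, remainder = collection[0], collection[1:]
    # Steps 1-4: aROIs for each candidate, scored against the hROIs
    scores = {name: similarity(compute_arois(fn, sample), hrois)
              for name, fn in transform_db.items()}
    # Step 5: A* is the best-correlated transformation function
    a_star = max(scores, key=scores.get)
    # Step 6: process the remaining images autonomously with A*
    processed = [apply_transform(transform_db[a_star], image)
                 for image in remainder]
    return a_star, processed
```

The same skeleton serves compression, query, or enhancement; only the `apply_transform` stage changes.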
  • While the invention is susceptible to various modifications and alternative forms, specific examples thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the invention is not to be limited to the particular forms or methods disclosed, but to the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the appended claims. [0079]

Claims (10)

What is claimed is:
1. A method for identifying algorithmic regions of interest within an image, said method comprising the steps of:
applying an image transformation function to a set of data representative of said image to thereby obtain a data set representative of a transformed image,
applying a thresholding function to said data set representative of said transformed image to thereby obtain data representative of a plurality of local maxima within said image, and
iteratively applying a clustering algorithm to said data representative of said local maxima to identify a plurality of algorithmic regions of interest within said image.
2. A method for processing image data, said method comprising the steps of:
establishing for at least one image type a correlation between a plurality of algorithmic regions of interest developed by a respective plurality of transformation algorithms and at least one set of human visual regions of interest;
selecting a transformation algorithm for processing data representative of an image based upon an image type and said correlation; and
transforming said image data from a first domain to a second domain using said selected transformation algorithm.
3. A method for selecting image transformation functions for image processing applications, said method comprising the steps of:
storing within a memory at least one data set descriptive of human visual regions of interest for at least one type of image;
applying a plurality of image transformation functions to an image corresponding to said at least one type of image to derive a plurality of respective data sets comprising local maxima;
applying clustering functions to said plurality of respective data sets comprising said local maxima to derive a plurality of respective data sets comprising algorithmic regions of interest within said image; and
comparing said respective data sets comprising said algorithmic regions of interest to said at least one data set descriptive of said human visual regions of interest to select a transformation function for processing additional images corresponding to said at least one type of image.
4. A system for processing image data comprising:
an image processing engine comprising a central processing unit, memory and an image processing program stored within said memory, said image processing program including,
a database of image transformation functions,
code for applying said image transformation functions stored within said database to a sample image from a collection of images to develop a plurality of transformed images corresponding to said sample image;
code for applying a thresholding algorithm to said respective transformed images to identify respective sets of local maxima within said transformed images;
code for applying a clustering algorithm to said respective sets of local maxima to identify sets of algorithmic regions of interest within said transformed images;
code for comparing said sets of algorithmic regions of interest to predetermined human regions of visual interest for said sample image to select from said database of image transformation functions a preferred image transformation function; and
code for applying said selected preferred image transformation function to a remainder of images within said collection of images when said collection of images is to be processed.
5. The system of claim 4, wherein said image processing engine is used to perform a function selected from the group of image compression, image query, and image enhancement.
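The end-to-end flow of the engine recited in claims 4 and 5 — try every stored transformation on a sample image, derive its algorithmic ROIs, pick the transformation whose ROIs best match the predetermined human ROIs, then apply only that one to the rest of the collection — can be wired up as below. The callables `find_rois` (threshold + cluster) and `compare` (ROI similarity) are passed in, since the claims leave their internals open.

```python
def process_collection(images, transforms, hrois, find_rois, compare):
    """Illustrative wiring of the claimed engine:
    1. run every candidate transform on the first (sample) image,
    2. derive algorithmic ROIs from each transformed sample,
    3. select the transform whose ROIs best match the human ROIs,
    4. apply only the selected transform to the remaining images."""
    sample, *rest = images
    scores = {name: compare(find_rois(fn(sample)), hrois)
              for name, fn in transforms.items()}
    best = max(scores, key=scores.get)
    return best, [transforms[best](img) for img in rest]
```

Running the selection once on a sample, rather than on every image, is what lets the system amortize the cost of the comparison over the whole collection — the point of the "remainder of images" limitation.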
6. A system for processing image data comprising:
an image processing engine comprising a central processing unit, memory and an image processing program stored within said memory, said image processing program including,
a database of image transformation functions,
code for applying said image transformation functions stored within said database to a sample image from a collection of images to develop a plurality of transformed images corresponding to said sample image;
code for applying a thresholding algorithm to said respective transformed images to identify respective sets of local maxima within said transformed images;
code for applying a clustering algorithm to said respective sets of local maxima to identify sets of algorithmic regions of interest within said transformed images;
code for comparing said sets of algorithmic regions of interest to predetermined human regions of visual interest for images of a similar type to said sample to identify from said database of image transformation functions a preferred image transformation function; and
code for applying said preferred image transformation function to a remainder of images within said collection of images when said collection of images is to be processed.
7. The system of claim 6, wherein said image processing engine is used to perform a function selected from the group of image compression, image query, and image enhancement.
8. A system for processing image data comprising:
an image processing engine comprising a central processing unit, memory and an image processing program stored within said memory, said image processing program including,
a database of image transformation functions,
code for comparing stored sets of algorithmic regions of interest for a particular type of image collection to be processed to at least one stored set of human regions of visual interest for said particular type of image collection to be processed to identify from said database of image transformation functions a preferred image transformation function; and
code for applying said preferred image transformation function to images within said collection of images when said collection of images is to be processed.
9. The system of claim 8, wherein said image processing engine is used to perform a function selected from the group of image compression, image query, and image enhancement.
10. A system for processing image data comprising:
an image processing engine comprising a central processing unit, memory and an image processing program stored within said memory, said image processing program including,
a database of image transformation functions,
code for selecting an image transformation function to be used in performing a predetermined image processing task based upon a correlation between stored algorithmic region of interest data and human visual region of interest data for said task; and
code for applying said selected image transformation function to images within a collection of images that are to be processed in accordance with said task.
US10/145,611 1998-06-08 2002-05-13 Intelligent systems and methods for processing image data based upon anticipated regions of visual interest Abandoned US20020181786A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/145,611 US20020181786A1 (en) 1998-06-08 2002-05-13 Intelligent systems and methods for processing image data based upon anticipated regions of visual interest

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/093,743 US6389169B1 (en) 1998-06-08 1998-06-08 Intelligent systems and methods for processing image data based upon anticipated regions of visual interest
US10/145,611 US20020181786A1 (en) 1998-06-08 2002-05-13 Intelligent systems and methods for processing image data based upon anticipated regions of visual interest

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/093,743 Continuation US6389169B1 (en) 1998-06-08 1998-06-08 Intelligent systems and methods for processing image data based upon anticipated regions of visual interest

Publications (1)

Publication Number Publication Date
US20020181786A1 true US20020181786A1 (en) 2002-12-05

Family

ID=22240457

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/093,743 Expired - Fee Related US6389169B1 (en) 1998-06-08 1998-06-08 Intelligent systems and methods for processing image data based upon anticipated regions of visual interest
US10/145,611 Abandoned US20020181786A1 (en) 1998-06-08 2002-05-13 Intelligent systems and methods for processing image data based upon anticipated regions of visual interest

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/093,743 Expired - Fee Related US6389169B1 (en) 1998-06-08 1998-06-08 Intelligent systems and methods for processing image data based upon anticipated regions of visual interest

Country Status (1)

Country Link
US (2) US6389169B1 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060181678A1 (en) * 1999-04-23 2006-08-17 Neuroptics, Inc. A California Corporation Pupilometer with pupil irregularity detection, pupil tracking, and pupil response detection capability, glaucoma screening capability, intracranial pressure detection capability, and ocular aberration measurement capability
US20060204128A1 (en) * 2005-03-07 2006-09-14 Silverstein D A System and method for correcting image vignetting
US20060210151A1 (en) * 2005-03-16 2006-09-21 Fabio Riccardi Interface method and system for finding image intensities
US20070092148A1 (en) * 2005-10-20 2007-04-26 Ban Oliver K Method and apparatus for digital image redundancy removal by selective quantization
US20070211962A1 (en) * 2006-03-10 2007-09-13 Samsung Electronics Co., Ltd. Apparatus and method for selectively outputting image frames
US20080104415A1 (en) * 2004-12-06 2008-05-01 Daphna Palti-Wasserman Multivariate Dynamic Biometrics System
US20080310755A1 (en) * 2007-06-14 2008-12-18 Microsoft Corporation Capturing long-range correlations in patch models
WO2009058915A1 (en) * 2007-10-29 2009-05-07 The Trustees Of The University Of Pennsylvania Computer assisted diagnosis (cad) of cancer using multi-functional, multi-modal in-vivo magnetic resonance spectroscopy (mrs) and imaging (mri)
US20100169024A1 (en) * 2007-10-29 2010-07-01 The Trustees Of The University Of Pennsylvania Defining quantitative signatures for different gleason grades of prostate cancer using magnetic resonance spectroscopy
US20100289818A1 (en) * 2009-05-12 2010-11-18 Canon Kabushiki Kaisha Image layout device, image layout method, and storage medium
US20110228224A1 (en) * 2008-11-28 2011-09-22 Kamran Siminou Methods, systems, and devices for monitoring anisocoria and asymmetry of pupillary reaction to stimulus
US20120267345A1 (en) * 2011-04-20 2012-10-25 Rolls-Royce Plc Method of manufacturing a component
US8911085B2 (en) 2007-09-14 2014-12-16 Neuroptics, Inc. Pupilary screening system and method
US8965104B1 (en) * 2012-02-10 2015-02-24 Google Inc. Machine vision calibration with cloud computing systems
US9190017B2 (en) 2013-01-02 2015-11-17 International Business Machines Corporation Proportional pointer transition between multiple display devices
US9336302B1 (en) 2012-07-20 2016-05-10 Zuci Realty Llc Insight and algorithmic clustering for automated synthesis
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis

Families Citing this family (36)

Publication number Priority date Publication date Assignee Title
US6735341B1 (en) * 1998-06-18 2004-05-11 Minolta Co., Ltd. Image processing device and method and recording medium for recording image processing program for same
US20020176619A1 (en) * 1998-06-29 2002-11-28 Love Patrick B. Systems and methods for analyzing two-dimensional images
US7006685B2 (en) * 1998-06-29 2006-02-28 Lumeniq, Inc. Method for conducting analysis of two-dimensional images
US7068829B1 (en) 1999-06-22 2006-06-27 Lumeniq, Inc. Method and apparatus for imaging samples
US6751363B1 (en) * 1999-08-10 2004-06-15 Lucent Technologies Inc. Methods of imaging based on wavelet retrieval of scenes
US6748097B1 (en) * 2000-06-23 2004-06-08 Eastman Kodak Company Method for varying the number, size, and magnification of photographic prints based on image emphasis and appeal
JP4193342B2 (en) * 2000-08-11 2008-12-10 コニカミノルタホールディングス株式会社 3D data generator
US6754384B1 (en) * 2000-08-30 2004-06-22 Eastman Kodak Company Method for processing an extended color gamut digital image using an image information parameter
CA2323883C (en) * 2000-10-19 2016-02-16 Patrick Ryan Morin Method and device for classifying internet objects and objects stored on computer-readable media
US7027655B2 (en) * 2001-03-29 2006-04-11 Electronics For Imaging, Inc. Digital image compression with spatially varying quality levels determined by identifying areas of interest
US6690828B2 (en) * 2001-04-09 2004-02-10 Gary Elliott Meyers Method for representing and comparing digital images
TW569159B (en) * 2001-11-30 2004-01-01 Inst Information Industry Video wavelet transform processing method
US8090730B2 (en) * 2001-12-04 2012-01-03 University Of Southern California Methods for fast progressive evaluation of polynomial range-sum queries on real-time datacubes
RU2220514C2 (en) * 2002-01-25 2003-12-27 Андрейко Александр Иванович Method for interactive television using central vision properties of eyes of individual users or groups thereof that protects information against unauthorized access, distribution, and use
US7010169B2 (en) * 2002-04-15 2006-03-07 Sbc Technology Resources, Inc. Multi-point predictive foveation for bandwidth reduction of moving images
US7499594B2 (en) * 2002-04-15 2009-03-03 At&T Intellectual Property 1, L.P. Multi-resolution predictive foveation for bandwidth reduction of moving images
US7050630B2 (en) * 2002-05-29 2006-05-23 Hewlett-Packard Development Company, L.P. System and method of locating a non-textual region of an electronic document or image that matches a user-defined description of the region
US20040109608A1 (en) * 2002-07-12 2004-06-10 Love Patrick B. Systems and methods for analyzing two-dimensional images
US8595242B2 (en) * 2003-06-13 2013-11-26 Ricoh Company, Ltd. Method for parsing an information string to extract requested information related to a device coupled to a network in a multi-protocol remote monitoring system
JP4279083B2 (en) * 2003-08-18 2009-06-17 富士フイルム株式会社 Image processing method and apparatus, and image processing program
US7116806B2 (en) * 2003-10-23 2006-10-03 Lumeniq, Inc. Systems and methods relating to AFIS recognition, extraction, and 3-D analysis strategies
JP2006295582A (en) * 2005-04-12 2006-10-26 Olympus Corp Image processor, imaging apparatus, and image processing program
US7602157B2 (en) * 2005-12-28 2009-10-13 Flyback Energy, Inc. Supply architecture for inductive loads
FR2897183A1 (en) * 2006-02-03 2007-08-10 Thomson Licensing Sas METHOD FOR VERIFYING THE SAVING AREAS OF A MULTIMEDIA DOCUMENT, METHOD FOR CREATING AN ADVERTISING DOCUMENT, AND COMPUTER PROGRAM PRODUCT
US7873220B2 (en) * 2007-01-03 2011-01-18 Collins Dennis G Algorithm to measure symmetry and positional entropy of a data set
JP5539879B2 (en) * 2007-09-18 2014-07-02 フライバック エネルギー,インク. Current waveform structure that generates AC power with low harmonic distortion from a local energy source
WO2011019625A1 (en) * 2009-08-10 2011-02-17 Telcordia Technologies, Inc. System and method for multi-resolution information filtering
US8462392B2 (en) * 2009-08-13 2013-06-11 Telcordia Technologies, Inc. System and method for multi-resolution information filtering
WO2011082188A1 (en) * 2009-12-28 2011-07-07 Flyback Energy Inc. External field interaction motor
CA2785715A1 (en) * 2009-12-28 2011-07-28 Paul M. Babcock Controllable universal supply with reactive power management
US8542875B2 (en) 2010-09-17 2013-09-24 Honeywell International Inc. Image processing based on visual attention and reduced search based generated regions of interest
RU2549584C2 (en) * 2010-12-09 2015-04-27 Нокиа Корпорейшн Limited context-based identification of key frame of video sequence
CN102436576B (en) * 2011-10-21 2013-11-06 洪涛 Multi-scale self-adaptive high-efficiency target image identification method based on multi-level structure
EP2654015A1 (en) * 2012-04-21 2013-10-23 General Electric Company Method, system and computer readable medium for processing a medical video image
US9478004B2 (en) 2013-04-11 2016-10-25 John Balestrieri Method and system for analog/digital image simplification and stylization
US10979721B2 (en) 2016-11-17 2021-04-13 Dolby Laboratories Licensing Corporation Predicting and verifying regions of interest selections

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US5133020A (en) * 1989-07-21 1992-07-21 Arch Development Corporation Automated method and system for the detection and classification of abnormal lesions and parenchymal distortions in digital medical images
US6304675B1 (en) * 1993-12-28 2001-10-16 Sandia Corporation Visual cluster analysis and pattern recognition methods
US5790690A (en) * 1995-04-25 1998-08-04 Arch Development Corporation Computer-aided method for automated image feature analysis and diagnosis of medical images
JPH09261640A (en) * 1996-03-22 1997-10-03 Oki Electric Ind Co Ltd Image coder
US5987094A (en) * 1996-10-30 1999-11-16 University Of South Florida Computer-assisted method and apparatus for the detection of lung nodules
US5999639A (en) * 1997-09-04 1999-12-07 Qualia Computing, Inc. Method and system for automated detection of clustered microcalcifications from digital mammograms
US6075878A (en) * 1997-11-28 2000-06-13 Arch Development Corporation Method for determining an optimally weighted wavelet transform based on supervised training for detection of microcalcifications in digital mammograms

Cited By (34)

Publication number Priority date Publication date Assignee Title
US8235526B2 (en) 1999-04-23 2012-08-07 Neuroptics, Inc. Pupilometer with pupil irregularity detection, pupil tracking, and pupil response detection capability, glaucoma screening capability, intracranial pressure detection capability, and ocular aberration measurement capability
US20060181678A1 (en) * 1999-04-23 2006-08-17 Neuroptics, Inc. A California Corporation Pupilometer with pupil irregularity detection, pupil tracking, and pupil response detection capability, glaucoma screening capability, intracranial pressure detection capability, and ocular aberration measurement capability
US20100195049A1 (en) * 1999-04-23 2010-08-05 Neuroptics, Inc. Pupilometer with pupil irregularity detection, pupil tracking, and pupil response detection capability, glaucoma screening capability, intracranial pressure detection capability, and ocular aberration measurement capability
US7147327B2 (en) 1999-04-23 2006-12-12 Neuroptics, Inc. Pupilometer with pupil irregularity detection, pupil tracking, and pupil response detection capability, glaucoma screening capability, intracranial pressure detection capability, and ocular aberration measurement capability
US7670002B2 (en) 1999-04-23 2010-03-02 Neuroptics, Inc. Pupilometer with pupil irregularity detection, pupil tracking, and pupil response detection capability, glaucoma screening capability, intracranial pressure detection capability, and ocular aberration measurement capability
US20150294149A1 (en) * 2004-12-06 2015-10-15 Id-U Biometrics Ltd. Multivariate Dynamic Biometrics System
US20080104415A1 (en) * 2004-12-06 2008-05-01 Daphna Palti-Wasserman Multivariate Dynamic Biometrics System
US20060204128A1 (en) * 2005-03-07 2006-09-14 Silverstein D A System and method for correcting image vignetting
US7634152B2 (en) * 2005-03-07 2009-12-15 Hewlett-Packard Development Company, L.P. System and method for correcting image vignetting
US7421144B2 (en) * 2005-03-16 2008-09-02 Fabio Riccardi Interface method and system for finding image intensities
US20060210151A1 (en) * 2005-03-16 2006-09-21 Fabio Riccardi Interface method and system for finding image intensities
US20070092148A1 (en) * 2005-10-20 2007-04-26 Ban Oliver K Method and apparatus for digital image redundancy removal by selective quantization
US20070211962A1 (en) * 2006-03-10 2007-09-13 Samsung Electronics Co., Ltd. Apparatus and method for selectively outputting image frames
US8023769B2 (en) * 2006-03-10 2011-09-20 Samsung Electronics Co., Ltd. Apparatus and method for selectively outputting image frames
US7978906B2 (en) * 2007-06-14 2011-07-12 Microsoft Corporation Capturing long-range correlations in patch models
US20080310755A1 (en) * 2007-06-14 2008-12-18 Microsoft Corporation Capturing long-range correlations in patch models
US8911085B2 (en) 2007-09-14 2014-12-16 Neuroptics, Inc. Pupilary screening system and method
US20100329529A1 (en) * 2007-10-29 2010-12-30 The Trustees Of The University Of Pennsylvania Computer assisted diagnosis (cad) of cancer using multi-functional, multi-modal in-vivo magnetic resonance spectroscopy (mrs) and imaging (mri)
US20100169024A1 (en) * 2007-10-29 2010-07-01 The Trustees Of The University Of Pennsylvania Defining quantitative signatures for different gleason grades of prostate cancer using magnetic resonance spectroscopy
WO2009058915A1 (en) * 2007-10-29 2009-05-07 The Trustees Of The University Of Pennsylvania Computer assisted diagnosis (cad) of cancer using multi-functional, multi-modal in-vivo magnetic resonance spectroscopy (mrs) and imaging (mri)
US8295575B2 (en) 2007-10-29 2012-10-23 The Trustees Of The University Of Pennsylvania Computer assisted diagnosis (CAD) of cancer using multi-functional, multi-modal in-vivo magnetic resonance spectroscopy (MRS) and imaging (MRI)
US20110228224A1 (en) * 2008-11-28 2011-09-22 Kamran Siminou Methods, systems, and devices for monitoring anisocoria and asymmetry of pupillary reaction to stimulus
US8534840B2 (en) 2008-11-28 2013-09-17 Neuroptics, Inc. Methods, systems, and devices for monitoring anisocoria and asymmetry of pupillary reaction to stimulus
US8665294B2 (en) * 2009-05-12 2014-03-04 Canon Kabushiki Kaisha Image layout device, image layout method, and storage medium
US20100289818A1 (en) * 2009-05-12 2010-11-18 Canon Kabushiki Kaisha Image layout device, image layout method, and storage medium
US20120267345A1 (en) * 2011-04-20 2012-10-25 Rolls-Royce Plc Method of manufacturing a component
US8965104B1 (en) * 2012-02-10 2015-02-24 Google Inc. Machine vision calibration with cloud computing systems
US9336302B1 (en) 2012-07-20 2016-05-10 Zuci Realty Llc Insight and algorithmic clustering for automated synthesis
US9607023B1 (en) 2012-07-20 2017-03-28 Ool Llc Insight and algorithmic clustering for automated synthesis
US10318503B1 (en) 2012-07-20 2019-06-11 Ool Llc Insight and algorithmic clustering for automated synthesis
US11216428B1 (en) 2012-07-20 2022-01-04 Ool Llc Insight and algorithmic clustering for automated synthesis
US9190017B2 (en) 2013-01-02 2015-11-17 International Business Machines Corporation Proportional pointer transition between multiple display devices
US9514707B2 (en) 2013-01-02 2016-12-06 International Business Machines Corporation Proportional pointer transition between multiple display devices
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis

Also Published As

Publication number Publication date
US6389169B1 (en) 2002-05-14

Similar Documents

Publication Publication Date Title
US6389169B1 (en) Intelligent systems and methods for processing image data based upon anticipated regions of visual interest
Privitera et al. Algorithms for defining visual regions-of-interest: Comparison with eye fixations
Jian et al. Visual-patch-attention-aware saliency detection
Krisshna et al. Face recognition using transform domain feature extraction and PSO-based feature selection
Kruizinga et al. Nonlinear operator for oriented texture
Chaki et al. Texture feature extraction techniques for image recognition
Laws Textured image segmentation
US6463163B1 (en) System and method for face detection using candidate image region selection
US20040161134A1 (en) Method for extracting face position, program for causing computer to execute the method for extracting face position and apparatus for extracting face position
JPH08339445A (en) Method and apparatus for detection, recognition and coding of complicated object using stochastic intrinsic space analysis
Leeds et al. Comparing visual representations across human fMRI and computational vision
EP1964028A1 (en) Method for automatic detection and classification of objects and patterns in low resolution environments
Liu et al. Pre-attention and spatial dependency driven no-reference image quality assessment
Bruce Features that draw visual attention: an information theoretic perspective
Venkatachalam et al. An efficient Gabor Walsh-Hadamard transform based approach for retrieving brain tumor images from MRI
Kara et al. Using wavelets for texture classification
Colores-Vargas et al. Video images fusion to improve iris recognition accuracy in unconstrained environments
Zujovic Perceptual texture similarity metrics
Mendi Image quality assessment metrics combining structural similarity and image fidelity with visual attention
CN114820603A (en) Intelligent health management method based on AI tongue diagnosis image processing and related device
Jai-Andaloussi et al. Content Based Medical Image Retrieval based on BEMD: optimization of a similarity metric
Walshe et al. Detection of occluding targets in natural backgrounds
Jian et al. Towards reliable object representation via sparse directional patches and spatial center cues
Seetharaman et al. Statistical framework for content-based medical image retrieval based on wavelet orthogonal polynomial model with multiresolution structure
EP3776475B1 (en) Methods of generating an encoded representation of an image and systems of operating thereof

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION