US20020159750A1 - Method for segmenting and indexing TV programs using multi-media cues - Google Patents

Method for segmenting and indexing TV programs using multi-media cues Download PDF

Info

Publication number
US20020159750A1
US20020159750A1 US09/843,499 US84349901A US2002159750A1 US 20020159750 A1 US20020159750 A1 US 20020159750A1 US 84349901 A US84349901 A US 84349901A US 2002159750 A1 US2002159750 A1 US 2002159750A1
Authority
US
United States
Prior art keywords
segments
program
sub
video
genre
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US09/843,499
Inventor
Radu Jasinschi
Jennifer Louie
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Priority to US09/843,499 priority Critical patent/US20020159750A1/en
Assigned to KONINKLIJKE PHILIPS ELECTRONIS N.V. reassignment KONINKLIJKE PHILIPS ELECTRONIS N.V. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LOUIE, JENNIFER, JASINSCHI, RADU S.
Priority to KR1020027017707A priority patent/KR100899296B1/en
Priority to CNB028013948A priority patent/CN1284103C/en
Priority to EP02722619A priority patent/EP1393207A2/en
Priority to PCT/IB2002/001420 priority patent/WO2002089007A2/en
Priority to JP2002586236A priority patent/JP4332700B2/en
Publication of US20020159750A1 publication Critical patent/US20020159750A1/en
Abandoned legal-status Critical Current

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • H04N5/91Television signal processing therefor
    • H04N5/92Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F16/785Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using colour or luminescence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/811Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data the classifiers operating on different input data, e.g. multi-modal recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content

Definitions

  • the present invention generally relates to video data services and devices, and more particularly to a method and device for segmenting and indexing TV programs using multi-media cues.
  • On the market today, there are a number of video data services and devices.
  • An example of one is the TIVO box.
  • This device is a personal digital video recorder capable of continuously recording satellite, cable or broadcast TV.
  • the TIVO box also includes an electronic program guide (EPG) that enables a user to select a particular program or category of program to be recorded.
  • Genre describes TV programs by categories such as business, documentary, drama, health, news, sports and talk.
  • An example of genre classification is found in the Tribune Media Services EPG.
  • Fields 173 to 178, designated “tf_genre_desc”, are reserved for textual description of TV program genre. Therefore, using these fields, a user can program a TIVO-type box to record programs of a particular genre.
  • EPG-based descriptions may not always be desirable.
  • EPG data may not always be available or always accurate.
  • the genre classification in current EPGs is for a whole program.
  • it is possible that the genre classification in a single program may change from segment to segment. Therefore, it would be desirable to generate genre classifications directly from the program, independent of the EPG data.
  • the present invention is directed to a method of selecting dominant multi-media cues from a number of video segments.
  • the method includes a multi-media information probability being calculated for each frame of the video segments.
  • Each of the video segments is divided into sub-segments.
  • a probability distribution of multi-media information is also calculated for each of the sub-segments using the multi-media information for each frame.
  • the probability distribution for each sub-segment is combined to form a combined probability distribution. Further, the multi-media information having the highest combined probability in the combined probability distribution is selected as the dominant multi-media cues.
  • the present invention is also directed to a method of segmenting and indexing video.
  • the method includes program segments that are selected from the video.
  • the program segments are divided into program sub-segments.
  • Genre-based indexing is performed on the program sub-segments using multi-media cues characteristic of a given genre of program. Further, object-based indexing is also performed on the program sub-segments.
  • the present invention is also directed to a method of storing video.
  • the method includes the video being pre-processed. Also, program segments are selected from the video. The program segments are divided into program sub-segments. Genre-based indexing is performed on the program sub-segments using multi-media cues characteristic of a given genre of program. Further, object-based indexing is also performed on the program sub-segments.
  • the present invention is also directed to a device for storing video.
  • the device includes a pre-processor for pre-processing the video.
  • a segmenting and indexing unit is included for selecting program segments from the video, dividing the program segments into program sub-segments and performing genre-based indexing on the program sub-segments using multi-media cues characteristic of a given genre of program to produce indexed program sub-segments.
  • a storage device is also included for storing the indexed program sub-segments. Further, the segmenting and indexing unit also performs object-based indexing on the program sub-segments.
  • FIG. 1 is a flow chart showing one example of a method for determining the multi-media cues according to the present invention
  • FIG. 2 is a table showing one example of probabilities for mid-level audio information
  • FIG. 3 is a table showing one example of a system of votes and thresholds according to the present invention.
  • FIG. 4 is a bar graph showing a probability distribution calculated using the system of FIG. 3;
  • FIG. 5 is a flow chart showing one example of a method for segmenting and indexing TV programs according to the present invention
  • FIG. 6 is a bar graph illustrating another example of multi-media cues according to the present invention.
  • FIG. 7 is a block diagram showing one example of a video recording device according to the present invention.
  • Multi-media information is divided into three domains: (i) audio, (ii) visual, and (iii) textual. The information within each domain is divided into different levels of granularity: low, mid, and high-level.
  • low-level audio information is described by signal processing parameters such as average signal energy, cepstral coefficients, and pitch.
  • An example of low-level visual information is pixel- or frame-based visual attributes, such as color, motion, shape, and texture, that are represented at each pixel.
  • for closed captioning (CC), low-level information is given by ASCII characters, such as letters or words.
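As an illustration of the low-level audio parameters mentioned above, the short-time average signal energy can be computed per frame of samples. This is a minimal sketch; the frame size, sampling rate, and function name are illustrative assumptions, not taken from the patent.

```python
import math

def average_energy(samples, frame_size=256):
    """Mean squared amplitude per frame of audio samples -- a
    simplified sketch of one low-level audio parameter."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    return [sum(s * s for s in frame) / frame_size for frame in frames]

# A sine of amplitude A has mean squared amplitude A^2 / 2, so a
# 0.5-amplitude tone yields per-frame energies of about 0.125.
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / 8000) for n in range(2048)]
energies = average_energy(tone)
```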
  • mid-level audio information is usually made up of the silence, noise, speech, music, speech plus noise, speech plus speech, and speech plus music categories.
  • for mid-level visual information, key-frames (defined as the first frame of a new video shot, i.e., a sequence of video frames with a similar intensity profile), color, and visual text (text superimposed on video images) are used.
  • mid-level CC information includes a set of keywords (words representative of textual information) and categories such as weather, international, crime, sports, movies, fashion, tech stocks, music, automobiles, war, economy, energy, disasters, art and politics.
  • to represent the mid-level information of the three multimedia domains, probabilities are used. These probabilities are real numbers between zero and one, which determine how representative each category is, for each domain, within a given video segment. For example, numbers close to one indicate that a given category is highly probable to be part of a video sequence, while numbers close to zero indicate that the corresponding category is less likely to occur in a video sequence. It should be noted that the present invention is not restricted to the particular choices of mid-level information described above.
  • different types of TV programs exhibit distinct multi-media characteristics, or cues. For example, there is usually a higher percentage of key-frames per unit time in commercials than in program segments. Further, there is also usually a higher amount of speech in talk shows.
  • these multi-media cues are used to segment and index TV programs, as described below in conjunction with FIG. 2.
  • these multi-media cues are used to generate genre classification information for TV program sub-segments.
  • current personal video recorders such as the TIVO box only include genre classification for a whole program as brief descriptive textual information in the EPG.
  • the multi-media cues are also used to separate program segments from commercial segments.
  • the multi-media cues are first determined.
  • One example of a method for determining the multi-media cues according to the present invention is shown in FIG. 1.
  • discrete video segments for each program are processed in steps 2-10.
  • in steps 12-13, a number of programs are processed in order to determine the multi-media cues for a particular genre.
  • the video segments may originate from cable, satellite or broadcast TV programming. Since these types of programming all include both program segments and commercial segments, it is further presumed that a video segment may be either a program segment or a commercial segment.
  • in step 2, the multi-media information probability for each frame of the video is calculated. This includes calculating the probability of occurrence of multi-media information, such as audio, video and transcript, in each frame of video. In order to perform step 2, different techniques are utilized depending on the category of multimedia information.
  • for visual text, the probability is calculated by the sequential use of edge detection, thresholding, region merging, and character shape extraction.
  • only the presence or absence of text characters per frame is considered. Therefore, the probability is equal to one when text characters are present and zero when they are absent.
  • for faces, the probability is calculated by detecting, with a given probability, the joint occurrence of face skin-tone color and an oval face shape.
  • the CC categories include weather, international, crime, sports, movies, fashion, tech stocks, music, automobiles, war, economy, energy, stocks, violence, financial, national, biotech, disasters, art, and politics.
  • Each category is associated with a set of “master” keywords. There is some overlap among these sets of keywords.
  • keywords, such as words that repeat, are determined and matched against the 20 lists of “master” keywords. If there is a match, a vote is given to that keyword. This is repeated for all keywords in the paragraph. In the end, the votes are divided by the total number of occurrences of the keyword within each paragraph. The result is the CC category probability.
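The keyword-voting scheme described in this passage can be sketched as follows. The master keyword lists, category names, and the normalization by paragraph length are illustrative assumptions; the patent's exact 20 lists and its normalization details are not reproduced here.

```python
from collections import Counter

# Hypothetical "master" keyword lists for a few CC categories; the
# patent describes 20 such lists, one per category.
MASTER_KEYWORDS = {
    "weather": {"rain", "storm", "forecast", "temperature"},
    "sports": {"game", "score", "team", "season"},
    "economy": {"market", "stocks", "inflation", "growth"},
}

def cc_category_probabilities(paragraph):
    """Vote each keyword occurrence against the master lists and
    normalize by paragraph length (a simplifying assumption)."""
    words = paragraph.lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    probs = {}
    for category, keywords in MASTER_KEYWORDS.items():
        votes = sum(counts[w] for w in keywords)
        probs[category] = votes / total if total else 0.0
    return probs
```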
  • in step 2, it is preferred that probabilities for each of the (mid-level) categories of the multi-media information within each domain are calculated, which is done for each frame of the video sequence.
  • An example of such probabilities in the audio domain is shown in FIG. 2, which includes the seven audio categories as defined above. The first two columns of FIG. 2 correspond to the start and end frames of the video, while the last seven columns include the corresponding probabilities, one for each mid-level category.
  • multi-media cues are initially selected that are characteristic of a given TV program type.
  • this selection is based on common knowledge.
  • in MTV programs, in the majority of cases, there will be a lot of music.
  • common knowledge therefore says that audio cues should be used, focusing in particular on the “music” and possibly the “speech plus music” categories. Thus, common knowledge is the corpus of TV production cues and elements that are common (as verified by field tests) in TV programs.
  • the video segments are divided into sub-segments.
  • Step 6 may be performed in a number of different ways, including dividing the video segments into arbitrary equal sub-segments or using a pre-computed tessellation.
  • the video segments may also be divided using closed caption information, if included in the transcript information of the video segments.
  • closed caption information includes, in addition to the ASCII characters representing letters of an alphabet, special characters such as the double arrows that indicate a change in subject or speaker. Since a change in speaker or subject could indicate a significant change in the video content, it may be desirable to divide the video segments in such a way as to respect speaker change information. Therefore, in step 6, it may be preferable to divide the video segments at the occurrence of such characters.
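Dividing a transcript at the double-arrow speaker-change markers, as suggested above, might look like the following sketch. It assumes the caption text is available as a plain string; real CC streams carry these markers in-band rather than as plain text.

```python
import re

def split_at_speaker_changes(transcript):
    """Split a closed-caption transcript into sub-segments at the
    double-arrow markers that signal a new speaker or subject."""
    parts = re.split(r">>+", transcript)
    return [p.strip() for p in parts if p.strip()]
```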
  • in step 8, a probability distribution is calculated for the multi-media information included in each of the sub-segments, using the probabilities calculated in step 2. This is necessary since the probabilities are calculated per frame, and TV program video contains many frames, typically about 30 per second. Thus, by determining probability distributions per sub-segment, an appreciable compactness is obtained.
  • the probability distribution is obtained by first comparing each probability with a (pre-determined) threshold for each category of multimedia information. In order to allow the maximum number of frames to pass through, a low threshold, such as 0.1, is preferred. If a probability is larger than its corresponding threshold, then a one (1) is associated with that category; otherwise, a zero (0) is assigned.
  • in step 10, the probability distributions calculated for each sub-segment in step 8 are combined to provide a single probability distribution for all of the video segments in a particular program.
  • step 10 may be performed by either forming an average or a weighted average of the probability distributions of each of the sub-segments.
  • preferably, however, a system of votes and thresholds is used.
  • An example of such a system is shown in FIG. 3, where the number of votes in the first three columns correspond to the thresholds in the last three columns.
  • in FIG. 3, it is assumed that, out of the seven audio categories, three (3) are dominant. This presumption is based on the multi-media cues initially selected in step 4 of FIG. 1.
  • the probabilities for each sub-segment of the target video, and for each of the seven audio categories, are transformed to percentages, where 100% corresponds to a probability of 1.0. First, it is determined in what range the sub-segment probability P falls (see, for example, FIG. 3).
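In the spirit of the votes-and-thresholds system of FIG. 3, a sub-segment probability can be mapped to a number of votes according to the range it falls in. The ranges and vote counts below are illustrative assumptions; the actual values in FIG. 3 are not reproduced here.

```python
# Hypothetical vote/threshold table: a sub-segment probability
# falling in a higher range earns more votes for its category.
VOTE_TABLE = [
    (0.75, 3),  # P >= 0.75 -> 3 votes
    (0.50, 2),  # P >= 0.50 -> 2 votes
    (0.25, 1),  # P >= 0.25 -> 1 vote
]

def votes_for(probability):
    """Return the number of votes for a sub-segment probability,
    using the first threshold range it falls into."""
    for threshold, votes in VOTE_TABLE:
        if probability >= threshold:
            return votes
    return 0
```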
  • the method may loop back to step 2 in order to begin processing the video segments of another program. If only one program is being processed, then the method will just advance to step 13 . However, it is preferred that a number of programs should be processed for a given genre of programs or commercials. If there are no more programs to be processed, the method will proceed to step 12 .
  • in step 12, the probability distributions from a number of programs of the same genre are combined. This provides a probability distribution for all of the programs of the same genre.
  • An example of such a probability distribution is shown in FIG. 4.
  • step 12 may be performed by either calculating an average or a weighted average of the probability distributions for all of the programs of the same genre. Also, if the probability distributions being combined in step 12 were calculated using a system of votes and thresholds, then step 12 may also be performed by simply summing the votes of the same category for all of the programs of the same genre.
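When the per-program distributions were built with votes, combining programs of the same genre reduces to summing votes per category, which can be sketched as follows (the category names in the example are illustrative):

```python
from collections import Counter

def combine_genre_votes(per_program_votes):
    """Sum, per category, the votes collected from every program of
    the same genre, yielding one distribution for the whole genre."""
    total = Counter()
    for votes in per_program_votes:
        total.update(votes)  # Counter.update adds counts per key
    return dict(total)
```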
  • the multi-media cues having the highest probabilities are selected in step 13.
  • a probability is associated with each category and for each multimedia cue.
  • categories having a higher probability will be selected as the dominant multi-media cues.
  • a single category with the absolute largest probability value is not selected. Instead, a set of categories having the jointly highest probabilities is selected. For example, in FIG. 4, the speech and speech plus music (SpMu) categories have the highest probabilities for a TV news program and thus would be selected as the dominant multimedia cues in step 13.
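The cue-selection method of FIG. 1 — per-frame probabilities, thresholding into per-sub-segment distributions, averaging across sub-segments, and selecting the jointly most probable categories — can be sketched as follows. The category names and the choice of two dominant cues are assumptions for illustration.

```python
from statistics import mean

# The seven mid-level audio categories listed in the text.
CATEGORIES = ["silence", "noise", "speech", "music",
              "speech+noise", "speech+speech", "speech+music"]

def subsegment_distribution(frame_probs, threshold=0.1):
    """Binarize each frame's category probabilities against a low
    threshold (step 8) and average the resulting ones and zeros."""
    return {cat: mean(1 if frame[cat] > threshold else 0
                      for frame in frame_probs)
            for cat in CATEGORIES}

def dominant_cues(subsegment_dists, top_k=2):
    """Average the sub-segment distributions (step 10) and select the
    jointly most probable categories as dominant cues (step 13)."""
    combined = {cat: mean(d[cat] for d in subsegment_dists)
                for cat in CATEGORIES}
    return sorted(combined, key=combined.get, reverse=True)[:top_k]
```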
  • one example of a method for segmenting and indexing TV programs according to the present invention is shown in FIG. 5.
  • the first box represents the video in 14 that will be segmented and indexed according to the present invention.
  • the video in 14 may represent cable, satellite or broadcast TV programming that includes a number of discrete program segments. Further, as in most TV programming, there are commercial segments in between the program segments.
  • in step 16, the program segments are selected from the video in 14 in order to separate the program segments 18 from the commercial segments.
  • the program segments are selected in step 16 using multi-media cues characteristic of a given type of video segment.
  • multi-media cues are selected that are capable of identifying a commercial in a video stream.
  • An example of one is shown in FIG. 6.
  • the percentage of key-frames is much higher for commercials than for programs.
  • the key-frame rate would therefore be a good example of a multi-media cue to be utilized in step 16.
  • these multi-media cues are compared to segments of the video in 14 .
  • the segments that do not fit the pattern of the multi-media cues are selected as the program segments 18. This is done by comparing the test video program/commercial segments' probabilities for each multimedia category with the probabilities obtained above in the method of FIG. 1.
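One way to realize the comparison in step 16 is to measure how far a segment's category probabilities lie from the cue profile learned for commercials; segments that do not fit the pattern are kept as program segments. The distance measure, tolerance, and cue name below are assumptions, not taken from the patent.

```python
def is_program_segment(segment_probs, commercial_cues, tolerance=0.2):
    """Mean absolute difference between a segment's category
    probabilities and the commercial cue profile; a segment far from
    the commercial pattern is treated as a program segment."""
    distance = sum(abs(segment_probs.get(cat, 0.0) - p)
                   for cat, p in commercial_cues.items()) / len(commercial_cues)
    return distance > tolerance
```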
  • in step 20, the program segments are divided into sub-segments 22.
  • This division may be done by dividing the program segments into arbitrary equal sub-segments or by using a pre-computed tessellation.
  • closed caption information includes characters (double arrows) to indicate a change in subject or speaker. Since a change in speaker or subject could indicate a significant change in the video, this is a desirable place to divide the program segments 18. Therefore, in step 20, it may be preferable to divide the program segments at the occurrence of such a character.
  • after step 20, indexing is performed on the program sub-segments 22 in steps 24 and 26, as shown.
  • genre-based indexing is performed on each of the program sub-segments 22 .
  • genre describes TV programs by categories such as business, documentary, drama, health, news, sports and talk.
  • genre-based information is inserted in each of the sub-segments 22. This genre-based information could be in the form of a tag that corresponds to the genre classification of each of the sub-segments 22.
  • the genre-based indexing 24 will be performed using the multi-media cues generated by the method described in FIG. 1. As previously described, these multi-media cues are characteristic of a given genre of program. Thus, in step 24, multi-media cues that are characteristic of a particular genre of program are compared to each of the sub-segments 22. Where there is a match between one of the multi-media cues and a sub-segment, a tag indicating the genre is inserted.
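The matching in step 24 can be sketched as checking a sub-segment's probabilities against each genre's dominant cues and returning a tag on a match. The cue lists and the match threshold are illustrative assumptions.

```python
def genre_tag(subsegment_probs, genre_cues, match_threshold=0.5):
    """Return the first genre whose dominant cues all score above the
    threshold in this sub-segment, or None if nothing matches."""
    for genre, cues in genre_cues.items():
        if all(subsegment_probs.get(cue, 0.0) >= match_threshold
               for cue in cues):
            return genre
    return None
```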
  • in step 26, object-based indexing is performed on the program sub-segments 22.
  • information identifying each of the objects included in a sub-segment is inserted.
  • This object-based information could be in the form of a tag that corresponds to each of the objects.
  • an object may be background, foreground, people, cars, audio, faces, music clips, etc.
  • in step 28, the sub-segments, after being indexed in steps 24 and 26, are combined to produce segmented and indexed program segments 30.
  • the genre-based information or tags and the object-based information or tags from corresponding sub-segments are compared. Where there is a match between the two, the genre-based and object-based information is combined into the same sub-segment.
  • each of the segmented and indexed program segments 30 includes tags indicating both the genre and the object information.
  • the segmented and indexed program segments 30 produced by the method of FIG. 5 may be utilized in a personal recording device.
  • an example of such a video recording device is shown in FIG. 7.
  • the video recording device includes a video pre-processor 32 that receives the Video In.
  • the pre-processor 32 performs pre-processing on the Video In such as de-multiplexing or decoding, if necessary.
  • a segmenting and indexing unit 34 is coupled to the output of the video preprocessor 32 .
  • the segmenting and indexing unit 34 receives the pre-processed Video In and segments and indexes it according to the method of FIG. 5.
  • the method of FIG. 5 divides the Video In into program sub-segments and then performs genre-based and object-based indexing on each of the sub-segments to produce the segmented and indexed program segments.
  • a storage unit 36 is coupled to the output of the segmenting and indexing unit 34 .
  • the storage unit 36 is utilized to store the Video In after being segmented and indexed.
  • the storage unit 36 may be embodied by either a magnetic or an optical storage device.
  • a user interface 38 is also included.
  • the user interface 38 is utilized to access the storage unit 36 .
  • a user may utilize the genre-based and object-based information inserted into the segmented and indexed program segments, as previously described. This would enable a user, via the user input 40, to retrieve a whole program, a program segment or a program sub-segment based on either a particular genre or object.
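The retrieval enabled by the user interface 38 can be sketched as a simple filter over the tagged sub-segments. The record layout (a dict with "genre" and "objects" fields) is an assumption for illustration.

```python
def retrieve(indexed_subsegments, genre=None, obj=None):
    """Return the sub-segments whose tags match the requested genre
    and/or object -- the kind of query the user interface supports."""
    hits = []
    for seg in indexed_subsegments:
        if genre is not None and seg.get("genre") != genre:
            continue
        if obj is not None and obj not in seg.get("objects", []):
            continue
        hits.append(seg)
    return hits
```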

Abstract

The present invention is directed to a method of segmenting and indexing video using multi-media cues characteristic of a given genre of program. According to the present invention, these multi-media cues are selected by calculating a multi-media information probability for each frame of the video segments. Each of the video segments is divided into sub-segments. A probability distribution of multi-media information is also calculated for each of the sub-segments using the multi-media information for each frame. The probability distributions for the sub-segments are combined to form a combined probability distribution. Further, the multi-media information having the highest combined probability in the combined probability distribution is selected as the dominant multi-media cues.

Description

    BACKGROUND OF THE INVENTION
  • The present invention generally relates to video data services and devices, and more particularly to a method and device for segmenting and indexing TV programs using multi-media cues. [0001]
  • On the market today, there are a number of video data services and devices. An example of one is the TIVO box. This device is a personal digital video recorder capable of continuously recording satellite, cable or broadcast TV. The TIVO box also includes an electronic program guide (EPG) that enables a user to select a particular program or category of program to be recorded. [0002]
  • One way TV programs are classified is according to genre. Genre describes TV programs by categories such as business, documentary, drama, health, news, sports and talk. An example of genre classification is found in the Tribune Media Services EPG. In this particular EPG, Fields 173 to 178, designated “tf_genre_desc”, are reserved for textual description of TV program genre. Therefore, using these fields, a user can program a TIVO-type box to record programs of a particular genre. [0003]
  • However, the use of EPG-based descriptions may not always be desirable. First of all, EPG data may not always be available or accurate. Further, the genre classification in current EPGs is for a whole program. However, it is possible that the genre classification in a single program may change from segment to segment. Therefore, it would be desirable to generate genre classifications directly from the program, independent of the EPG data. [0004]
  • SUMMARY OF THE INVENTION
  • The present invention is directed to a method of selecting dominant multi-media cues from a number of video segments. The method includes a multi-media information probability being calculated for each frame of the video segments. Each of the video segments is divided into sub-segments. A probability distribution of multi-media information is also calculated for each of the sub-segments using the multi-media information for each frame. The probability distribution for each sub-segment is combined to form a combined probability distribution. Further, the multi-media information having the highest combined probability in the combined probability distribution is selected as the dominant multi-media cues. [0005]
  • The present invention is also directed to a method of segmenting and indexing video. The method includes program segments that are selected from the video. The program segments are divided into program sub-segments. Genre-based indexing is performed on the program sub-segments using multi-media cues characteristic of a given genre of program. Further, object-based indexing is also performed on the program sub-segments. [0006]
  • The present invention is also directed to a method of storing video. The method includes the video being pre-processed. Also, program segments are selected from the video. The program segments are divided into program sub-segments. Genre-based indexing is performed on the program sub-segments using multi-media cues characteristic of a given genre of program. Further, object-based indexing is also performed on the program sub-segments. [0007]
  • The present invention is also directed to a device for storing video. The device includes a pre-processor for pre-processing the video. A segmenting and indexing unit is included for selecting program segments from the video, dividing the program segments into program sub-segments and performing genre-based indexing on the program sub-segments using multi-media cues characteristic of a given genre of program to produce indexed program sub-segments. A storage device is also included for storing the indexed program sub-segments. Further, the segmenting and indexing unit also performs object-based indexing on the program sub-segments.[0008]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Referring now to the drawings, wherein like reference numbers represent corresponding parts throughout: [0009]
  • FIG. 1 is a flow chart showing one example of a method for determining the multi-media cues according to the present invention; [0010]
  • FIG. 2 is a table showing one example of probabilities for mid-level audio information; [0011]
  • FIG. 3 is a table showing one example of a system of votes and thresholds according to the present invention; [0012]
  • FIG. 4 is a bar graph showing a probability distribution calculated using the system of FIG. 3; [0013]
  • FIG. 5 is a flow chart showing one example of a method for segmenting and indexing TV programs according to the present invention; [0014]
  • FIG. 6 is a bar graph illustrating another example of multi-media cues according to the present invention; [0015]
  • FIG. 7 is a block diagram showing one example of a video recording device according to the present invention.[0016]
  • DETAILED DESCRIPTION
  • Multi-media information is divided into three domains: (i) audio, (ii) visual, and (iii) textual. The information within each domain is divided into different levels of granularity: low, mid, and high-level. For example, low-level audio information is described by signal processing parameters, such as average signal energy, cepstral coefficients, and pitch. Low-level visual information is pixel or frame-based, including visual attributes, such as color, motion, shape, and texture, that are represented at each pixel. For closed captioning (CC), low-level information is given by ASCII characters, such as letters or words. [0017]
  • According to the present invention, it is preferable to use mid-level multi-media information. Mid-level audio information is typically made up of the categories silence, noise, speech, music, speech plus noise, speech plus speech, and speech plus music. Mid-level visual information uses key-frames (defined as the first frame of a new video shot, i.e., a sequence of video frames with a similar intensity profile), color, and visual text (text superimposed on video images). Mid-level CC information consists of a set of keywords (words representative of the textual information) and categories such as weather, international, crime, sports, movies, fashion, tech stocks, music, automobiles, war, economy, energy, disasters, art and politics. [0018]
  • Probabilities are used as the mid-level information of the three multi-media domains. These probabilities are real numbers between zero and one that determine how representative each category is, for each domain, within a given video segment. For example, a number close to one indicates that a given category is highly probable to be part of a video sequence, while a number close to zero indicates that the corresponding category is less likely to occur in the video sequence. It should be noted that the present invention is not restricted to the particular choices of mid-level information described above. [0019]
  • According to the present invention, it has been found that for a particular type of program, there are dominant multi-media characteristics or cues. For example, there is usually a higher percentage of key-frames per unit time in commercials than in program segments. Further, there is also usually a higher amount of speech in talk shows. Thus, according to the present invention, these multi-media cues are used to segment and index TV programs, as described below in conjunction with FIG. 5. In particular, these multi-media cues are used to generate genre classification information for TV program sub-segments. In contrast, current personal video recorders such as the TIVO box only include genre classification for a whole program as brief descriptive textual information in the EPG. Further, according to the present invention, the multi-media cues are also used to separate program segments from commercial segments. [0020]
  • Before being used, the multi-media cues are first determined. One example of a method for determining the multi-media cues according to the present invention is shown in FIG. 1. In the method of FIG. 1, discrete video segments for each program are processed in steps 2-10. Further, in steps 12-13, a number of programs are processed in order to determine the multi-media cues for a particular genre. For the purpose of this discussion, it is presumed that the video segments may originate from cable, satellite or broadcast TV programming. Since these types of programming all include both program segments and commercial segments, it is further presumed that a video segment may be either a program segment or a commercial segment. [0021]
  • In step 2, a multi-media information probability is calculated for each frame of the video. This includes calculating the probability of occurrence of multi-media information, such as audio, video and transcript information, in each frame of video. In order to perform step 2, different techniques are utilized depending on the category of multi-media information. [0022]
  • In the visual domain, such as for key-frames, macroblock-level information from the DC component of the DCT coefficients is utilized to determine frame differences. The probability of a key-frame occurrence is a normalized number, between zero and one, of a given DC component difference being larger than an (experimentally determined) threshold. Given two consecutive frames, the DC components are extracted and their difference is computed. This difference is compared to the threshold, and a maximum value for the DC difference is also computed. The range between zero (where the DC difference equals the threshold) and this maximum value is used to generate the probability, which is equal to (DC_difference - threshold)/max_DC_difference. [0023]
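As a rough sketch of the normalization just described (the function name, the clipping to [0, 1], and the zero value below the threshold are illustrative assumptions, not from the patent):

```python
def keyframe_probability(dc_difference, threshold, max_dc_difference):
    """Normalized key-frame probability from a DC-coefficient frame
    difference. Returns 0.0 when the difference does not exceed the
    (experimentally chosen) threshold, otherwise
    (difference - threshold) / max_difference, clipped to [0, 1]."""
    if dc_difference <= threshold:
        return 0.0
    p = (dc_difference - threshold) / max_dc_difference
    return min(p, 1.0)
```

For example, with a threshold of 10 and a maximum DC difference of 100, a difference of 30 yields a probability of 0.2.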
  • For video text, the probability is calculated by the sequential use of edge detection, thresholding, region merging, and character shape extraction. In the current implementation, only the presence or absence of text characters per frame is considered. Therefore, the probability is equal to one when text characters are present and zero when they are absent. Further, for faces, the probability is calculated by detecting a face with a given probability that depends jointly on face skin-tone color and an oval face shape. [0024]
  • In the audio domain, for each twenty-two (22) ms temporal window (a "segment"), a classification is made among the silence, noise, speech, music, speech plus noise, speech plus speech, and speech plus music categories. This is a "winner take all" decision in which only one category wins. This is repeated for one hundred (100) such consecutive segments, that is, about two (2) seconds of audio. Then, a count (or vote) of the number of segments with a given category classification is made, which is then divided by one hundred (100). This gives the probability of each category for each two (2) second interval. [0025]
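The winner-take-all voting over one hundred 22 ms windows can be sketched as follows (the function and category names are illustrative assumptions):

```python
AUDIO_CATEGORIES = ("silence", "noise", "speech", "music",
                    "speech+noise", "speech+speech", "speech+music")

def audio_category_probabilities(labels):
    """Turn winner-take-all classifications of consecutive 22 ms
    windows (about 2 s of audio for 100 windows) into per-category
    probabilities: votes for a category / number of windows."""
    n = len(labels)
    return {c: labels.count(c) / n for c in AUDIO_CATEGORIES}
```

For a two-second interval in which 70 windows were classified as speech and 30 as music, the speech probability is 0.7 and the music probability 0.3.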
  • In the transcript domain, there are twenty (20) closed captioning (CC) categories: weather, international, crime, sports, movies, fashion, tech stocks, music, automobiles, war, economy, energy, stocks, violence, financial, national, biotech, disasters, art, and politics. Each category is associated with a set of "master" keywords, and there is some overlap among these sets. For each CC paragraph (the text between ">>" symbols), keywords are determined, such as words that repeat, and matched against the 20 lists of master keywords. If there is a match, a vote is given to that keyword. This is repeated for all keywords in the paragraph. In the end, the votes are divided by the total number of keyword occurrences within the paragraph, which yields the CC category probability. [0026]
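A minimal sketch of this closed-caption voting, assuming the master keyword lists are given as one set per category (the names and data structures are illustrative assumptions):

```python
def cc_category_probabilities(paragraph_words, master_keywords):
    """For one CC paragraph (text between '>>' markers), count matches
    between the paragraph's keywords and each category's 'master'
    keyword set, normalized by the total number of keyword hits.
    master_keywords: {category: set of keywords}."""
    votes = {c: 0 for c in master_keywords}
    total = 0
    for word in paragraph_words:
        for cat, kws in master_keywords.items():
            if word in kws:
                votes[cat] += 1
                total += 1
    if total == 0:
        return {c: 0.0 for c in master_keywords}
    return {c: v / total for c, v in votes.items()}
```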
  • For step 2, it is preferred that probabilities for each of the (mid-level) categories of the multi-media information within each domain are calculated for each frame of the video sequence. An example of such probabilities in the audio domain is shown in FIG. 2, which includes the seven audio categories defined above. The first two columns of FIG. 2 correspond to the start and end frames of the video, while the last seven columns include the corresponding probabilities, one for each mid-level category. [0027]
  • Referring back to FIG. 1, in step 4, multi-media cues that are characteristic of a given TV program type are initially selected. At this time, the selection is based on common knowledge. For example, it is commonly known that TV commercials have, in general, a high cut rate (i.e., a large number of shots or average key-frames per unit time); the visual key-frame rate information is then used. In another example, MTV programs, in the majority of cases, contain a lot of music. Thus, common knowledge says that audio cues should be used, with a particular focus on the "music" and (perhaps) the "speech plus music" categories. Common knowledge is therefore the corpus of TV production cues and elements that are common (as verified by field tests) in TV programs. [0028]
  • In step 6, the video segments are divided into sub-segments. Step 6 may be performed in a number of different ways, including dividing the video segments into arbitrary equal sub-segments or using a pre-computed tessellation. Further, the video segments may also be divided using closed caption information, if included in the transcript information of the video segments. As is commonly known, closed caption information includes, in addition to the ASCII characters representing letters of an alphabet, characters such as the double arrows (">>") that indicate a change in subject or speaker. Since a change in speaker or subject may indicate a significant change in the video content, it may be desirable to divide the video segments in such a way as to respect speaker change information. Therefore, in step 6, it may be preferable to divide the video segments at the occurrence of such characters. [0029]
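A sketch of dividing a segment at ">>" speaker-change markers, assuming frames and marker positions are given as indices (the names are illustrative; the patent equally allows equal splits or a pre-computed tessellation):

```python
def split_at_speaker_changes(frames, cc_markers):
    """Divide a video segment into sub-segments, starting a new
    sub-segment at each frame where a closed-caption '>>'
    speaker/subject-change marker occurs."""
    subsegments, current = [], []
    for f in frames:
        if f in cc_markers and current:
            subsegments.append(current)
            current = []
        current.append(f)
    if current:
        subsegments.append(current)
    return subsegments
```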
  • In step 8, a probability distribution is calculated for the multi-media information included in each of the sub-segments using the probabilities calculated in step 2. This is necessary since the probabilities calculated in step 2 are per frame, and there are many frames in the video of a TV program, typically about 30 frames per second. Thus, by determining probability distributions per sub-segment, an appreciable compactness is obtained. In step 8, the probability distribution is obtained by first comparing each probability with a (pre-determined) threshold for each category of multi-media information. In order to allow the maximum number of frames to pass through, a low threshold, such as 0.1, is preferred. If a probability is larger than its corresponding threshold, then a one (1) is assigned to that category; otherwise, a zero (0) is assigned. After assigning the 0s and 1s to each category, these values are summed and divided by the total number of frames in the video sub-segment. This results in a number representing how often a given category is present, conditioned on the set of thresholds. [0030]
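The binarize-and-average computation of step 8 can be sketched as follows (the per-frame dictionary representation and function name are illustrative; the default 0.1 threshold follows the text):

```python
def subsegment_distribution(frame_probs, threshold=0.1):
    """Per-sub-segment probability distribution: each frame's category
    probability is binarized against a low threshold (0.1 lets most
    frames through), then the 1s are summed and divided by the number
    of frames. frame_probs: list of {category: probability} dicts."""
    n = len(frame_probs)
    categories = frame_probs[0].keys()
    return {c: sum(1 for f in frame_probs if f[c] > threshold) / n
            for c in categories}
```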
  • In step 10, the probability distributions calculated for each sub-segment in step 8 are combined to provide a single probability distribution for all of the video segments in a particular program. According to the present invention, step 10 may be performed by forming either an average or a weighted average of the probability distributions of each of the sub-segments. [0031]
  • In order to calculate a weighted average for step 10, it is preferable that a system of votes and thresholds be used. An example of such a system is shown in FIG. 3, where the number of votes in the first three columns corresponds to the thresholds in the last three columns. For example, in FIG. 3 it is assumed that, out of the seven audio categories, three (3) are dominant. This presumption is based on the multi-media cues initially selected in step 4 of FIG. 1. The probabilities for each sub-segment of the target video, and for each of the seven audio categories, are expressed as numbers from zero to one, where 100% corresponds to a probability of 1.0. First, it is determined in what range the sub-segment probability P falls. For example, in FIG. 3, four ranges are included for a given probability P; in line 1 these are: (i) 0≦P<0.3, (ii) 0.3≦P<0.5, (iii) 0.5≦P<0.8, and (iv) 0.8≦P≦1.0. The three thresholds determine the range bounds. Second, a vote is assigned depending on the range within which P falls. This process is repeated for all fifteen possible combinations shown in FIG. 3, and at the end a total number of votes per sub-segment is obtained. This process is common to any multi-media category. Finally, all the sub-segments of a given program segment (or commercial), and all program segments, are processed to provide a probability distribution for the whole program. [0032]
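A minimal sketch of mapping a sub-segment probability to a vote via three thresholds and four ranges (the particular vote values and function name are illustrative assumptions; FIG. 3 gives the actual vote/threshold tables):

```python
def assign_votes(p, thresholds, votes):
    """Map a sub-segment probability p to a vote count by the range
    it falls in. thresholds = (t1, t2, t3) define four ranges
    [0, t1), [t1, t2), [t2, t3), [t3, 1]; votes holds one vote value
    per range, lowest range first."""
    for t, v in zip(thresholds, votes):
        if p < t:
            return v
    return votes[-1]
```

With the line-1 ranges from the text, `assign_votes(0.6, (0.3, 0.5, 0.8), (0, 1, 2, 3))` falls in the third range and receives 2 votes.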
  • Referring back to FIG. 1, after performing step 10, the method may loop back to step 2 in order to begin processing the video segments of another program. If only one program is being processed, the method simply advances to step 13. However, it is preferred that a number of programs be processed for a given genre of programs or commercials. If there are no more programs to be processed, the method proceeds to step 12. [0033]
  • In step 12, the probability distributions from a number of programs of the same genre are combined. This provides a probability distribution for all of the programs of the same genre. An example of such a probability distribution is shown in FIG. 4. According to the present invention, step 12 may be performed by calculating either an average or a weighted average of the probability distributions for all of the programs of the same genre. Also, if the probability distributions being combined in step 12 were calculated using a system of votes and thresholds, then step 12 may also be performed by simply summing the votes of the same category for all of the programs of the same genre. [0034]
  • After performing step 12, the multi-media cues having the highest probabilities are selected in step 13. In the probability distributions calculated in step 12, a probability is associated with each category and each multi-media cue. Thus, in step 13, the categories having the highest probabilities are selected as the dominant multi-media cues. However, a single category with the absolute largest probability value is not selected; instead, a set of categories having the jointly highest probabilities is selected. For example, in FIG. 4, the speech and speech plus music (SpMu) categories have the highest probabilities for the TV NEWS program and thus would be selected as the dominant multi-media cues in step 13. [0035]
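A sketch of selecting the jointly highest categories rather than a single argmax; the `margin` parameter is an illustrative assumption, since the patent does not specify how the joint-highest set is delimited:

```python
def dominant_cues(distribution, margin=0.1):
    """Select the set of categories whose probability lies within
    `margin` of the maximum, rather than only the single category
    with the absolute largest probability."""
    top = max(distribution.values())
    return sorted(c for c, p in distribution.items() if p >= top - margin)
```

For a TV-news-like distribution where speech and speech plus music both score near the top, both are returned as dominant cues.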
  • One example of a method for segmenting and indexing TV programs according to the present invention is shown in FIG. 5. As can be seen, the first box represents the video in 14 that will be segmented and indexed according to the present invention. For the purpose of this discussion, the video in 14 may represent cable, satellite or broadcast TV programming that includes a number of discrete program segments. Further, as in most TV programming, there are commercial segments in between the program segments. [0036]
  • In step 16, the program segments are selected from the video in 14 in order to separate the program segments 18 from the commercial segments. A number of methods for selecting the program segments in step 16 are known. However, according to the present invention, it is preferred that the program segments be selected in step 16 using multi-media cues characteristic of a given type of video segment. [0037]
  • As previously described, multi-media cues are selected that are capable of identifying a commercial in a video stream. An example of one such cue is shown in FIG. 6. As can be seen, the percentage of key-frames is much higher for commercials than for programs. Thus, the key-frame rate would be a good example of a multi-media cue to be utilized in step 16. In step 16, these multi-media cues are compared to segments of the video in 14. The segments that do not fit the pattern of the multi-media cues are selected as the program segments 18. This is done by comparing the probabilities of each multi-media category for the test video program/commercial segments with the probabilities obtained above by the method of FIG. 1. [0038]
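A sketch of the comparison in step 16, using a simple per-category L1 distance between a test segment's probabilities and the reference commercial-cue distribution; the distance measure, tolerance, and names are illustrative assumptions, as the patent does not specify the comparison rule:

```python
def is_program_segment(segment_probs, commercial_cues, tolerance=0.2):
    """Compare a test segment's category probabilities against the
    reference commercial-cue distribution (obtained by the method of
    FIG. 1). A segment that does not fit the commercial pattern,
    i.e. whose mean absolute deviation exceeds the tolerance, is
    taken to be a program segment."""
    distance = sum(abs(segment_probs.get(c, 0.0) - p)
                   for c, p in commercial_cues.items()) / len(commercial_cues)
    return distance > tolerance
```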
  • In step 20, the program segments are divided into sub-segments 22. This division may be done by dividing the program segments into arbitrary equal sub-segments or by using a pre-computed tessellation. However, it may be preferable to divide the program segments in step 20 according to closed caption information included in the video segments. As previously described, closed caption information includes characters (double arrows) that indicate a change in subject or speaker. Since a change in speaker or subject may indicate a significant change in the video, this is a desirable place to divide the program segments 18. Therefore, in step 20, it may be preferable to divide the program segments at the occurrence of such a character. [0039]
  • After performing step 20, indexing is performed on the program sub-segments 22 in steps 24 and 26, as shown. In step 24, genre-based indexing is performed on each of the program sub-segments 22. As previously described, genre describes TV programs by categories such as business, documentary, drama, health, news, sports and talk. Thus, in step 24, genre-based information is inserted into each of the sub-segments 22. This genre-based information may take the form of a tag that corresponds to the genre classification of each of the sub-segments 22. [0040]
  • According to the present invention, the genre-based indexing 24 is performed using the multi-media cues generated by the method described in FIG. 1. As previously described, these multi-media cues are characteristic of a given genre of program. Thus, in step 24, multi-media cues that are characteristic of a particular genre of program are compared to each of the sub-segments 22. Where there is a match between one of the multi-media cues and a sub-segment, a tag indicating the genre is inserted. [0041]
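A sketch of the genre tagging in step 24, assuming each genre's dominant cues (from the method of FIG. 1) are given as lists of categories; the matching rule and threshold are illustrative assumptions:

```python
def tag_genre(subsegment_probs, genre_cues, threshold=0.5):
    """Genre-based indexing sketch: compare a sub-segment's category
    probabilities against each genre's dominant cues and return the
    first genre whose cue categories are all sufficiently present;
    None when no genre matches."""
    for genre, cues in genre_cues.items():
        if all(subsegment_probs.get(c, 0.0) >= threshold for c in cues):
            return genre
    return None
```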
  • In step 26, object-based indexing is performed on the program sub-segments 22. Thus, in step 26, information identifying each of the objects included in a sub-segment is inserted. This object-based information may take the form of a tag that corresponds to each of the objects. For the purpose of this discussion, an object may be background, foreground, people, cars, audio, faces, music clips, etc. A number of methods for performing object-based indexing are known. Examples of such methods are described in U.S. Pat. No. 5,969,755, entitled "Motion Based Event Detection System and Method", to Courtney; U.S. Pat. No. 5,606,655, entitled "Method For Representing Contents Of A Single Video Shot Using Frames", to Arman et al.; U.S. Pat. No. 6,185,363, entitled "Visual Indexing System", to Dimitrova et al.; and U.S. Pat. No. 6,182,069, entitled "Video Query System and Method", to Niblack et al., all of which are hereby incorporated by reference. [0042]
  • In step 28, the sub-segments, after being indexed in steps 24 and 26, are combined to produce segmented and indexed program segments 30. In performing step 28, the genre-based information or tags and the object-based information or tags from corresponding sub-segments are compared. Where there is a match between the two, the genre-based and object-based information is combined into the same sub-segment. As a result of step 28, each of the segmented and indexed program segments 30 includes tags indicating both the genre and the object information. [0043]
  • According to the present invention, the segmented and indexed program segments 30 produced by the method of FIG. 5 may be utilized in a personal recording device. An example of such a video recording device is shown in FIG. 7. As can be seen, the video recording device includes a video pre-processor 32 that receives the Video In. During operation, the pre-processor 32 performs pre-processing on the Video In, such as de-multiplexing or decoding, if necessary. [0044]
  • A segmenting and indexing unit 34 is coupled to the output of the video pre-processor 32. The segmenting and indexing unit 34 receives the pre-processed Video In and segments and indexes the video according to the method of FIG. 5. As previously described, the method of FIG. 5 divides the Video In into program sub-segments and then performs genre-based indexing and object-based indexing on each of the sub-segments to produce the segmented and indexed program segments. [0045]
  • A storage unit 36 is coupled to the output of the segmenting and indexing unit 34. The storage unit 36 is utilized to store the Video In after it has been segmented and indexed. The storage unit 36 may be embodied by either a magnetic or an optical storage device. As can be further seen, a user interface 38 is also included. The user interface 38 is utilized to access the storage unit 36. According to the present invention, a user may utilize the genre-based and object-based information inserted into the segmented and indexed program segments, as previously described. This enables a user, via the user input 40, to retrieve a whole program, a program segment or a program sub-segment based on either a particular genre or a particular object. [0046]
  • The foregoing description of the present invention has been presented for the purposes of illustration and description. It is not intended to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teachings. Therefore, it is intended that the scope of the invention should not be limited by the detailed description. [0047]

Claims (17)

What is claimed is:
1. A method for selecting dominant multi-media cues from a number of video segments, comprising the steps of:
calculating a multi-media information probability for each frame of the video segments;
dividing each of the video segments into sub-segments;
calculating a probability distribution of multi-media information for each of the sub-segments using the multi-media information for each frame;
combining the probability distribution for each sub-segment to form a combined probability distribution; and
selecting the multi-media information having the highest combined probability in the combined probability distribution as the dominant multi-media cues.
2. The method of claim 1, wherein the video segments are selected from a group consisting of commercial segments and program segments.
3. The method of claim 1, wherein the dividing of the video segments into sub-segments is performed using closed caption information included in the video segments.
4. The method of claim 1, wherein the combining of the probability distribution for each sub-segment is performed by an operation selected from the group consisting of an average and a weighted average.
5. The method of claim 1, wherein the combined probability distribution is formed from probability distributions of sub-segments of multiple programs.
6. The method of claim 1, which further includes initially selecting multi-media cues characteristic of a given TV program type or commercial.
7. A method of segmenting and indexing video, comprising the steps of:
selecting program segments from the video;
dividing the program segments into program sub-segments; and
performing genre-based indexing on the program sub-segments using multi-media cues characteristic of a given genre of program.
8. The method of claim 7, wherein the selecting program segments is performed using multi-media cues characteristic of a given type of video segment.
9. The method of claim 7, wherein the dividing the program segments into program sub-segments is performed according to closed caption information included in the program segments.
10. The method of claim 7, wherein the genre-based indexing includes:
comparing the multi-media cues characteristic of a given genre of program to each of the program sub-segments; and
inserting a tag into one of the program sub-segments if there is a match between one of the multi-media cues and that sub-segment.
11. The method of claim 7, which further includes performing object-based indexing on the program sub-segments.
12. A method of storing video, comprising the steps of:
pre-processing the video;
selecting program segments from the video;
dividing the program segments into program sub-segments;
performing genre-based indexing on the program sub-segments using multi-media cues characteristic of a given genre of program to produce indexed program sub-segments; and
storing the indexed program sub-segments.
13. The method of claim 12, wherein the genre-based indexing includes:
comparing the multi-media cues characteristic of a given genre of program to each of the program sub-segments; and
inserting a tag into one of the program sub-segments if there is a match between one of the multi-media cues and that sub-segment.
14. The method of claim 12, which further includes performing object-based indexing on the program sub-segments.
15. A device for storing video, comprising:
a pre-processor for pre-processing the video;
a segmenting and indexing unit for selecting program segments from the video,
dividing the program segments into program sub-segments and performing genre-based indexing on the program sub-segments using multi-media cues characteristic of a given genre of program to produce indexed program sub-segments; and
a storage device for storing the indexed program sub-segments.
16. The device of claim 15, wherein the genre-based indexing includes:
comparing the multi-media cues characteristic of a given genre of program to each of the program sub-segments; and
inserting a tag into one of the program sub-segments if there is a match between one of the multi-media cues and that sub-segment.
17. The device of claim 15, wherein the segmenting and indexing unit further performs object-based indexing on the program sub-segments.
US09/843,499 2001-04-26 2001-04-26 Method for segmenting and indexing TV programs using multi-media cues Abandoned US20020159750A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US09/843,499 US20020159750A1 (en) 2001-04-26 2001-04-26 Method for segmenting and indexing TV programs using multi-media cues
KR1020027017707A KR100899296B1 (en) 2001-04-26 2002-04-22 A computer implemented method an apparatus of selecting dominant multi-media cues
CNB028013948A CN1284103C (en) 2001-04-26 2002-04-22 A method for segmenting and indexing TV programs using multi-media cues
EP02722619A EP1393207A2 (en) 2001-04-26 2002-04-22 A method for segmenting and indexing tv programs using multi-media cues
PCT/IB2002/001420 WO2002089007A2 (en) 2001-04-26 2002-04-22 A method for segmenting and indexing tv programs using multi-media cues
JP2002586236A JP4332700B2 (en) 2001-04-26 2002-04-22 Method and apparatus for segmenting and indexing television programs using multimedia cues


Publications (1)

Publication Number Publication Date
US20020159750A1 true US20020159750A1 (en) 2002-10-31

Family

ID=25290181

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/843,499 Abandoned US20020159750A1 (en) 2001-04-26 2001-04-26 Method for segmenting and indexing TV programs using multi-media cues

Country Status (6)

Country Link
US (1) US20020159750A1 (en)
EP (1) EP1393207A2 (en)
JP (1) JP4332700B2 (en)
KR (1) KR100899296B1 (en)
CN (1) CN1284103C (en)
WO (1) WO2002089007A2 (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040003042A1 (en) * 2001-06-28 2004-01-01 Horvitz Eric J. Methods and architecture for cross-device activity monitoring, reasoning, and visualization for providing status and forecasts of a users' presence and availability
US20040074832A1 (en) * 2001-02-27 2004-04-22 Peder Holmbom Apparatus and a method for the disinfection of water for water consumption units designed for health or dental care purposes
EP1463258A1 (en) * 2003-03-28 2004-09-29 Mobile Integrated Solutions Limited A system and method for transferring data over a wireless communications network
US20040249776A1 (en) * 2001-06-28 2004-12-09 Microsoft Corporation Composable presence and availability services
US20050021485A1 (en) * 2001-06-28 2005-01-27 Microsoft Corporation Continuous time bayesian network models for predicting users' presence, activities, and component usage
US20050183127A1 (en) * 1999-10-08 2005-08-18 Vulcan Patents, Llc System and method for the broadcast dissemination of time-ordered data with minimal commencement delays
US20080036914A1 (en) * 2006-06-28 2008-02-14 Russ Samuel H Stretch and zoom bar for displaying information
US20080115045A1 (en) * 2006-11-10 2008-05-15 Sony Computer Entertainment Inc. Hybrid media distribution with enhanced security
US20080115229A1 (en) * 2006-11-10 2008-05-15 Sony Computer Entertainment Inc. Providing content using hybrid media distribution scheme with enhanced security
US20100071005A1 (en) * 2008-09-18 2010-03-18 Yoshiaki Kusunoki Program recommendation apparatus
US7849475B2 (en) 1995-03-07 2010-12-07 Interval Licensing Llc System and method for selective recording of information
CN102123303A (en) * 2011-03-25 2011-07-13 天脉聚源(北京)传媒科技有限公司 Audio/video file playing method and system as well as transmission control device
US20110208722A1 (en) * 2010-02-23 2011-08-25 Nokia Corporation Method and apparatus for segmenting and summarizing media content
US8176515B2 (en) 1996-12-05 2012-05-08 Interval Licensing Llc Browser for use in navigating a body of information, with particular application to browsing information represented by audiovisual data
CN102611915A (en) * 2012-03-15 2012-07-25 华为技术有限公司 Video startup method, device and system
US8238722B2 (en) 1996-12-05 2012-08-07 Interval Licensing Llc Variable rate video playback with synchronized audio
US20130067333A1 (en) * 2008-10-03 2013-03-14 Finitiv Corporation System and method for indexing and annotation of video content
US8429244B2 (en) 2000-01-28 2013-04-23 Interval Licensing Llc Alerting users to items of current interest
US20150256843A1 (en) * 2014-03-07 2015-09-10 Steven Roskowski Adaptive Security Camera Image Compression Apparatus and Method of Operation
US20160172000A1 (en) * 2013-07-24 2016-06-16 Prompt, Inc. An apparatus of providing a user interface for playing and editing moving pictures and the method thereof
US11023733B2 (en) * 2017-07-10 2021-06-01 Flickstree Productions Pvt Ltd System and method for analyzing a video file in a shortened time frame

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8504918B2 (en) * 2010-02-16 2013-08-06 Nbcuniversal Media, Llc Identification of video segments

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5103431A (en) * 1990-12-31 1992-04-07 Gte Government Systems Corporation Apparatus for detecting sonar signals embedded in noise
US5103341A (en) * 1989-05-16 1992-04-07 Carl-Zeiss-Stiftung UV-capable dry lens for microscopes
US5343251A (en) * 1993-05-13 1994-08-30 Pareto Partners, Inc. Method and apparatus for classifying patterns of television programs and commercials based on discerning of broadcast audio and video signals
US5794788A (en) * 1993-04-30 1998-08-18 Massen; Robert Method and device for sorting materials
US6147940A (en) * 1995-07-26 2000-11-14 Sony Corporation Compact disc changer utilizing disc database
US6580679B1 (en) * 1998-04-10 2003-06-17 Sony Corporation Recording medium, reproduction method/apparatus with multiple table of contents
US6751354B2 (en) * 1999-03-11 2004-06-15 Fuji Xerox Co., Ltd Methods and apparatuses for video segmentation, classification, and retrieval using image class statistical models
US6928233B1 (en) * 1999-01-29 2005-08-09 Sony Corporation Signal processing method and video signal processor for detecting and analyzing a pattern reflecting the semantics of the content of a signal

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7849475B2 (en) 1995-03-07 2010-12-07 Interval Licensing Llc System and method for selective recording of information
US8584158B2 (en) 1995-03-07 2013-11-12 Interval Licensing Llc System and method for selective recording of information
US8238722B2 (en) 1996-12-05 2012-08-07 Interval Licensing Llc Variable rate video playback with synchronized audio
US8176515B2 (en) 1996-12-05 2012-05-08 Interval Licensing Llc Browser for use in navigating a body of information, with particular application to browsing information represented by audiovisual data
US8726331B2 (en) 1999-10-08 2014-05-13 Interval Licensing Llc System and method for the broadcast dissemination of time-ordered data
US20050183127A1 (en) * 1999-10-08 2005-08-18 Vulcan Patents, Llc System and method for the broadcast dissemination of time-ordered data with minimal commencement delays
US8341688B2 (en) 1999-10-08 2012-12-25 Interval Licensing Llc System and method for the broadcast dissemination of time-ordered data
US8046818B2 (en) 1999-10-08 2011-10-25 Interval Licensing Llc System and method for the broadcast dissemination of time-ordered data
US9317560B2 (en) 2000-01-28 2016-04-19 Interval Licensing Llc Alerting users to items of current interest
US8429244B2 (en) 2000-01-28 2013-04-23 Interval Licensing Llc Alerting users to items of current interest
US20040074832A1 (en) * 2001-02-27 2004-04-22 Peder Holmbom Apparatus and a method for the disinfection of water for water consumption units designed for health or dental care purposes
US7739210B2 (en) 2001-06-28 2010-06-15 Microsoft Corporation Methods and architecture for cross-device activity monitoring, reasoning, and visualization for providing status and forecasts of a users' presence and availability
US7233933B2 (en) 2001-06-28 2007-06-19 Microsoft Corporation Methods and architecture for cross-device activity monitoring, reasoning, and visualization for providing status and forecasts of a users' presence and availability
US20050021485A1 (en) * 2001-06-28 2005-01-27 Microsoft Corporation Continuous time bayesian network models for predicting users' presence, activities, and component usage
US20040003042A1 (en) * 2001-06-28 2004-01-01 Horvitz Eric J. Methods and architecture for cross-device activity monitoring, reasoning, and visualization for providing status and forecasts of a users' presence and availability
US7493369B2 (en) * 2001-06-28 2009-02-17 Microsoft Corporation Composable presence and availability services
US20040249776A1 (en) * 2001-06-28 2004-12-09 Microsoft Corporation Composable presence and availability services
US7689521B2 (en) 2001-06-28 2010-03-30 Microsoft Corporation Continuous time bayesian network models for predicting users' presence, activities, and component usage
EP1463258A1 (en) * 2003-03-28 2004-09-29 Mobile Integrated Solutions Limited A system and method for transferring data over a wireless communications network
US8364015B2 (en) * 2006-06-28 2013-01-29 Russ Samuel H Stretch and zoom bar for displaying information
US20080036914A1 (en) * 2006-06-28 2008-02-14 Russ Samuel H Stretch and zoom bar for displaying information
US20080115045A1 (en) * 2006-11-10 2008-05-15 Sony Computer Entertainment Inc. Hybrid media distribution with enhanced security
US8739304B2 (en) * 2006-11-10 2014-05-27 Sony Computer Entertainment Inc. Providing content using hybrid media distribution scheme with enhanced security
US20080115229A1 (en) * 2006-11-10 2008-05-15 Sony Computer Entertainment Inc. Providing content using hybrid media distribution scheme with enhanced security
US8752199B2 (en) 2006-11-10 2014-06-10 Sony Computer Entertainment Inc. Hybrid media distribution with enhanced security
US20100071005A1 (en) * 2008-09-18 2010-03-18 Yoshiaki Kusunoki Program recommendation apparatus
US8798170B2 (en) * 2008-09-18 2014-08-05 Mitsubishi Electric Corporation Program recommendation apparatus
US9407942B2 (en) * 2008-10-03 2016-08-02 Finitiv Corporation System and method for indexing and annotation of video content
US20130067333A1 (en) * 2008-10-03 2013-03-14 Finitiv Corporation System and method for indexing and annotation of video content
US20110208722A1 (en) * 2010-02-23 2011-08-25 Nokia Corporation Method and apparatus for segmenting and summarizing media content
US8489600B2 (en) 2010-02-23 2013-07-16 Nokia Corporation Method and apparatus for segmenting and summarizing media content
CN102123303A (en) * 2011-03-25 2011-07-13 天脉聚源(北京)传媒科技有限公司 Audio/video file playing method and system as well as transmission control device
CN102611915A (en) * 2012-03-15 2012-07-25 华为技术有限公司 Video startup method, device and system
US20160172000A1 (en) * 2013-07-24 2016-06-16 Prompt, Inc. An apparatus of providing a user interface for playing and editing moving pictures and the method thereof
US9875771B2 (en) * 2013-07-24 2018-01-23 Prompt, Inc. Apparatus of providing a user interface for playing and editing moving pictures and the method thereof
US20150256843A1 (en) * 2014-03-07 2015-09-10 Steven Roskowski Adaptive Security Camera Image Compression Apparatus and Method of Operation
US9648355B2 (en) * 2014-03-07 2017-05-09 Eagle Eye Networks, Inc. Adaptive security camera image compression apparatus and method of operation
US11023733B2 (en) * 2017-07-10 2021-06-01 Flickstree Productions Pvt Ltd System and method for analyzing a video file in a shortened time frame

Also Published As

Publication number Publication date
JP2004520756A (en) 2004-07-08
CN1582440A (en) 2005-02-16
JP4332700B2 (en) 2009-09-16
WO2002089007A2 (en) 2002-11-07
KR100899296B1 (en) 2009-05-27
WO2002089007A3 (en) 2003-11-27
CN1284103C (en) 2006-11-08
KR20030097631A (en) 2003-12-31
EP1393207A2 (en) 2004-03-03

Similar Documents

Publication Publication Date Title
US20020159750A1 (en) Method for segmenting and indexing TV programs using multi-media cues
US7336890B2 (en) Automatic detection and segmentation of music videos in an audio/video stream
US7143353B2 (en) Streaming video bookmarks
Huang et al. Automated generation of news content hierarchy by integrating audio, video, and text information
Alatan et al. Multi-modal dialog scene detection using hidden Markov models for content-based multimedia indexing
EP1138151B1 (en) Automatic signature-based spotting, learning and extracting of commercials and other video content
US8949878B2 (en) System for parental control in video programs based on multimedia content information
US6771885B1 (en) Methods and apparatus for recording programs prior to or beyond a preset recording time period
Hanjalic Content-based analysis of digital video
US7170566B2 (en) Family histogram based techniques for detection of commercials and other video content
US6990496B1 (en) System and method for automated classification of text by time slicing
Li et al. Video content analysis using multimodal information: For movie content extraction, indexing and representation
US7349477B2 (en) Audio-assisted video segmentation and summarization
Jasinschi et al. Automatic TV program genre classification based on audio patterns
Tjondronegoro et al. The power of play-break for automatic detection and browsing of self-consumable sport video highlights
JPWO2008143345A1 (en) Content division position determination device, content viewing control device, and program
Jasinschi et al. Video scouting: An architecture and system for the integration of multimedia information in personal TV applications
Sugano et al. Shot genre classification using compressed audio-visual features
Brezeale Learning video preferences using visual features and closed captions
Dimitrova et al. Personalizing video recorders using multimedia processing and integration
Bagga et al. Multi-source combined-media video tracking for summarization
El-Khoury et al. Unsupervised TV program boundaries detection based on audiovisual features
Kyperountas et al. Scene change detection using audiovisual clues
Waseemullah et al. Unsupervised Ads Detection in TV Transmissions
Khan et al. Unsupervised Ads Detection in TV Transmissions

Legal Events

Date Code Title Description
AS Assignment

Owner name: KONINKLIJKE PHILIPS ELECTRONIS N.V., NETHERLANDS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JASINSCHI, RADU S.;LOUIE, JENNIFER;REEL/FRAME:011772/0505;SIGNING DATES FROM 20010405 TO 20010421

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION