CN102622353B - Fixed audio retrieval method

Info

Publication number: CN102622353B
Application number: CN 201110028979
Authority: CN
Inventors: 刘赵杰
Original assignee: TVMining Beijing Media Technology Co Ltd
Current assignee: TVMining Beijing Media Technology Co Ltd
Priority date: 2011-01-27
Filing date: 2011-01-27
Publication date: 2013-10-16
Anticipated expiration: 2031-01-27
Also published as: CN102622353A

Abstract

The invention discloses a fixed audio retrieval method. When the audio data retrieval database is built, features of the audio data are first extracted per detection segment to create an index table, and a secondary index is then created for the audio segments with relatively high information quantity among the fingerprint segments of the audio data. In the retrieval stage, the target audio data to be retrieved is first segmented according to its category; a fast query on the audio data section with relatively high information quantity yields possible candidate positions; and a fine query with the target audio data is then performed near the candidate positions. By indexing the audio database with a high-quality index and querying hierarchically in a coarse-to-fine manner, the technical solution of the invention can markedly reduce computational complexity and improve query efficiency.

Description

A fixed audio retrieval method

Technical field

The present invention relates to the field of multimedia technology, and in particular to a fixed audio retrieval method.

Background

With the development of the information age, the volume of multimedia documents has grown to a massive scale. When people browse and try to understand this content, audio, as a significant part of multimedia data, provides important cues for human perception. To obtain content of interest from such data, information extraction and retrieval are needed, and fixed audio retrieval is one practical technique for this. Fixed audio retrieval means detecting and locating, in the audio to be searched, the audio fragments that come from the same source as a given query audio; it is one of the basic problems in multimedia retrieval. Fixed audio detection draws on several technologies, including pattern recognition, audio signal processing, and speech processing. The technique has broad application prospects: it can be used to retrieve and locate programs, music, and advertisements, for copyright protection, for evaluating the compression quality of audio, and for decoding and monitoring audio signals in certain military applications. As the technology matures and computer hardware improves, it can be expected that this technique will soon enter everyday life and change the way people study, work, and entertain themselves, producing substantial economic and social benefits.

In the field of audio retrieval, systems based on audio fingerprints are commonly used. Such a system uses signal processing to convert the audio signal in each fixed time interval into an audio fingerprint of a fixed byte size, thereby turning the audio data into audio fingerprint data. The system then builds an index table over all the audio fingerprint data, which makes fast retrieval of the audio data possible.

When the amount of audio data is small, an audio fingerprint retrieval system can load all fingerprint data into memory, index it, and then retrieve quickly. In practice, however, the amount of audio data is very large and keeps growing. When a fixed audio retrieval system has many query templates, or when the query templates are long, computational complexity becomes very high and efficiency drops sharply, and the effect is more pronounced with a massive query database. Existing fixed audio retrieval databases are built without considering the characteristics of the data, so the database itself becomes very large. They also ignore the characteristics of the retrieval target, so retrieval takes a long time when the target is long.

Summary of the invention

The object of the invention is to provide a fixed audio retrieval method that greatly reduces computational complexity and improves the efficiency of audio data queries.

To achieve this object, the invention adopts the following technical solution:

A fixed audio retrieval method comprises the following steps:

A. Segment the audio data at silent segments to form non-silent audio data detection segments;

B. Perform harmonic detection on the audio data detection segments and classify them, forming an audio data fingerprint segment category index;

C. Divide each audio data detection segment into fixed-length audio data fingerprint segments, and label and classify the audio data fingerprint segments according to their information quantity, forming an audio data fingerprint segment index;

D. Extract audio data fingerprint features from each audio data fingerprint segment and build an audio data fingerprint index;

E. Segment the audio data to be retrieved at silent segments to form non-silent audio data detection segments to be retrieved, and from these select the longest one that is no shorter than a given duration as the query audio data detection segment;

F. Perform harmonic detection on the query audio data detection segment to determine its category, and use the audio data fingerprint segment category index to find the audio data detection segments corresponding to the query audio data detection segment;

G. Divide the query audio data detection segment into fixed-length query audio data fingerprint segments, evaluate the information quantity of each one, and select the longest continuous run of query audio data fingerprint segments whose information quantity exceeds a preset threshold as the query audio data section;

H. Within the corresponding audio data detection segments, use the audio data fingerprint segment index to obtain candidate positions of the query audio data section;

I. Using the audio data fingerprint index, match the query audio data section against the candidate positions in the corresponding audio data detection segments to obtain the audio retrieval result.

In step B, audio data detection segments that contain harmonic structure are classified as speech segments or music segments, and those that do not contain harmonic structure are classified as noise segments or invalid segments.

In step F, a query audio data detection segment that contains harmonic structure is classified as a speech segment or music segment, and one that does not contain harmonic structure is classified as a noise segment or invalid segment.

In step A, the ratio of the energy of the current segment of the audio data to the total energy is used to determine whether the segment is a silent segment or an effective sound segment.

In step E, the ratio of the energy of the current segment of the audio data to be retrieved to the total energy is used to determine whether the segment is a silent segment or an effective sound segment.

With the technical solution of the invention, the audio database is indexed with a high-quality index and queries use a hierarchical, coarse-to-fine approach, which greatly reduces computational complexity and improves query efficiency.

Description of drawings

Fig. 1 is a flowchart of fixed audio retrieval in a specific embodiment of the invention.

Embodiment

The technical solution of the invention is further described below through a specific embodiment, with reference to the accompanying drawing.

The main idea of the technical solution is based on audio data fingerprint retrieval. The audio data is first preprocessed and classified by detection segment, for example into music, speech, silence, and other sounds. Each audio data detection segment is then divided into fixed-duration segments, which are given a simple classification by information quantity. When the audio retrieval database is built, an index table is first created from the features of the audio data extracted per detection segment, and a secondary index is then created for the audio segments with higher information quantity among the audio data fingerprint segments. In the retrieval stage, the target audio data to be retrieved is first segmented according to its category; a fast query on the audio data section with higher information quantity yields possible candidate positions, and a fine query with the target audio data is then performed near those candidate positions.

Fig. 1 is a flowchart of fixed audio retrieval in a specific embodiment of the invention. As shown in Fig. 1, the retrieval flow comprises the following steps:

The first stage is building the audio database, that is, converting a very large audio library into a multi-index audio fingerprint database.

Step 101: Use the ratio of the energy of the current segment of the audio data to the total energy to determine whether the segment is a silent segment or an effective sound segment, then segment the audio data at the silent segments to form non-silent audio data detection segments.
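The energy-ratio test in step 101 can be illustrated with a minimal sketch. The patent does not give a frame length or a threshold, so `frame_len`, `ratio_threshold`, and the function name `split_by_silence` are assumptions chosen only for illustration.

```python
def split_by_silence(samples, frame_len=4, ratio_threshold=0.02):
    """Return (start, end) sample ranges of non-silent detection segments.

    A frame counts as silent when its energy divided by the total energy
    of the signal is below ratio_threshold (an assumed value).
    """
    total_energy = sum(s * s for s in samples)
    if total_energy == 0:
        return []  # the whole signal is silent

    segments = []
    seg_start = None
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        ratio = sum(s * s for s in frame) / total_energy
        if ratio >= ratio_threshold:
            # Effective sound: open a detection segment if none is open.
            if seg_start is None:
                seg_start = start
        elif seg_start is not None:
            # Silent frame: close the current detection segment.
            segments.append((seg_start, start))
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, len(samples)))
    return segments


# Two bursts of sound separated by silence.
signal = [0.0] * 8 + [1.0] * 8 + [0.0] * 4 + [0.5] * 4
print(split_by_silence(signal))
```

A real system would probably work on much longer frames and might smooth the decision over several frames; the sketch keeps only the ratio test described in the step.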

Step 102: Perform harmonic detection on the audio data detection segments and classify them, forming an audio data fingerprint segment category index. Audio data detection segments that contain harmonic structure are classified as speech segments or music segments, and those that do not contain harmonic structure are classified as noise segments or invalid segments.
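The patent does not say how harmonic structure is detected. One common stand-in is the peak of the normalized autocorrelation, which is high for periodic signals and low for noise. The sketch below uses that technique as an assumption; the lag range, the 0.5 threshold, and the function names are hypothetical.

```python
import math
import random


def harmonicity(frame, min_lag=20, max_lag=200):
    """Peak of the normalized autocorrelation over an assumed lag range."""
    energy = sum(x * x for x in frame)
    if energy == 0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        acf = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, acf / energy)
    return best


def classify_segment(frame, threshold=0.5):
    """Label a detection segment by whether it shows harmonic structure."""
    if harmonicity(frame) >= threshold:
        return "speech_or_music"
    return "noise_or_invalid"


# Demo inputs: a pure tone with a 50-sample period and seeded Gaussian noise.
tone = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
print(classify_segment(tone), classify_segment(noise))
```

The resulting label can serve as the key of the category index, so that a query classified as speech or music is only compared against database segments with the same label.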

Step 103: Divide each audio data detection segment into fixed-length audio data fingerprint segments, and label and classify the audio data fingerprint segments according to their information quantity, forming an audio data fingerprint segment index. That is, the information quantity of each fixed-length audio data fingerprint segment is evaluated in turn, and the segments with higher information quantity are marked.
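The patent does not define how information quantity is measured. As a hedged illustration, the sketch below uses the Shannon entropy of a quantized amplitude histogram as a proxy; that choice, the number of quantization levels, the threshold, and the function names are all assumptions.

```python
import math
from collections import Counter


def information_quantity(segment, levels=16):
    """Shannon entropy in bits of the amplitude histogram.

    Samples are assumed to lie in [-1, 1]; this measure is a stand-in,
    not the measure used by the patent.
    """
    bins = Counter(min(levels - 1, int((s + 1.0) / 2.0 * levels)) for s in segment)
    n = len(segment)
    return -sum(c / n * math.log2(c / n) for c in bins.values())


def label_fingerprint_segments(detection_segment, seg_len, threshold):
    """Cut into fixed-length fingerprint segments and flag high-information ones."""
    labels = []
    for start in range(0, len(detection_segment) - seg_len + 1, seg_len):
        seg = detection_segment[start:start + seg_len]
        labels.append((start, information_quantity(seg) > threshold))
    return labels


flat = [0.0] * 8
varied = [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75]
print(label_fingerprint_segments(flat + varied, seg_len=8, threshold=1.0))
```

The flagged start positions are what a secondary index over the high-information segments would store.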

Step 104: Extract audio data fingerprint features from each audio data fingerprint segment and build the audio data fingerprint index.
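The fingerprint features themselves are not specified in the patent. The sketch below therefore uses a toy fingerprint (one bit per pair of adjacent sub-blocks, set when the energy rises) purely to show how an index table from fingerprint value to position could be built; `fingerprint`, `build_fingerprint_index`, and the `bands` parameter are hypothetical.

```python
from collections import defaultdict


def fingerprint(segment, bands=8):
    """Toy fingerprint: bit i is 1 when sub-block i+1 has more energy than sub-block i."""
    block = len(segment) // (bands + 1)
    energies = [
        sum(x * x for x in segment[i * block:(i + 1) * block])
        for i in range(bands + 1)
    ]
    bits = 0
    for i in range(bands):
        bits = (bits << 1) | (1 if energies[i + 1] > energies[i] else 0)
    return bits


def build_fingerprint_index(fingerprint_segments):
    """Map each fingerprint value to the positions of the segments that produce it."""
    index = defaultdict(list)
    for pos, seg in enumerate(fingerprint_segments):
        index[fingerprint(seg)].append(pos)
    return index


rising = [1, 2, 3, 4, 5, 6, 7, 8, 9]
falling = list(reversed(rising))
index = build_fingerprint_index([rising, falling, rising])
print(dict(index))
```

A lookup by fingerprint value then returns every database position that shares the value, which is what makes retrieval fast compared with a linear scan.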

The second stage is the audio retrieval process, in which the input audio data to be retrieved is matched against the audio database to obtain the audio data the user needs.

Step 105: Use the ratio of the energy of the current segment of the audio data to be retrieved to the total energy to determine whether the segment is a silent segment or an effective sound segment, then segment the audio data to be retrieved at the silent segments to form non-silent audio data detection segments to be retrieved. From these, select the longest one that is no shorter than a given duration as the query audio data detection segment.
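The selection rule at the end of step 105 can be sketched as follows, assuming the non-silent segments are already available as `(start, end)` sample ranges. The minimum length `min_len` and the function name are assumptions; the patent only says "a given duration".

```python
def choose_query_segment(segments, min_len):
    """Pick the longest (start, end) segment whose length is at least min_len.

    Returns None when no segment is long enough.
    """
    eligible = [s for s in segments if s[1] - s[0] >= min_len]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s[1] - s[0])


print(choose_query_segment([(0, 10), (20, 50), (60, 65)], min_len=8))
```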

Step 106: Perform harmonic detection on the query audio data detection segment to determine its category. A query audio data detection segment that contains harmonic structure is classified as a speech segment or music segment, and one that does not contain harmonic structure is classified as a noise segment or invalid segment. The audio data fingerprint segment category index is then used to find the audio data detection segments corresponding to the query audio data detection segment.

Step 107: Divide the query audio data detection segment into fixed-length query audio data fingerprint segments, evaluate the information quantity of each one, and select the longest continuous run of query audio data fingerprint segments whose information quantity exceeds a preset threshold as the query audio data section.
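Selecting the longest continuous run above the threshold is a simple scan. The sketch below assumes the per-segment information quantities have already been computed into a list; the function name and the sample values are hypothetical.

```python
def longest_informative_run(info_values, threshold):
    """Return (start, end), end exclusive, of the longest run of values above threshold.

    Returns (0, 0) when no value exceeds the threshold.
    """
    best = (0, 0)
    run_start = None
    # A sentinel below any threshold closes a run that reaches the end.
    for i, value in enumerate(list(info_values) + [float("-inf")]):
        if value > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best


scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.9, 0.95, 0.1]
print(longest_informative_run(scores, threshold=0.5))
```

The returned index range identifies which query fingerprint segments form the query audio data section used for the fast lookup.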

Step 108: Within the corresponding audio data detection segments, use the audio data fingerprint segment index to obtain candidate positions of the query audio data section. A relatively loose threshold is generally used here so that the true result is included among the candidates as far as possible.

Step 109: Using the audio data fingerprint index, match the query audio data section against the candidate positions in the corresponding audio data detection segments to obtain the audio retrieval result.
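Steps 108 and 109 together form a coarse-to-fine search. The sketch below is one possible reading, under several assumptions: fingerprints are small integers compared by Hamming distance, the secondary index is a list of database positions flagged as high-information, and the fine stage checks only the exact alignment implied by each candidate rather than a neighborhood around it. All names and thresholds are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def mean_distance(xs, ys):
    return sum(hamming(a, b) for a, b in zip(xs, ys)) / len(ys)


def coarse_candidates(db_fps, informative_positions, section, loose_max_bits):
    """Coarse stage: try the query section at flagged positions with a loose threshold."""
    candidates = []
    for pos in informative_positions:
        window = db_fps[pos:pos + len(section)]
        if len(window) == len(section) and mean_distance(window, section) <= loose_max_bits:
            candidates.append(pos)
    return candidates


def fine_match(db_fps, query_fps, section_offset, candidates, strict_max_bits):
    """Fine stage: verify the whole query at each candidate with a strict threshold."""
    hits = []
    for pos in candidates:
        start = pos - section_offset  # where the full query would begin
        if start < 0 or start + len(query_fps) > len(db_fps):
            continue
        dist = mean_distance(db_fps[start:start + len(query_fps)], query_fps)
        if dist <= strict_max_bits:
            hits.append((start, dist))
    return sorted(hits, key=lambda h: h[1])


db_fps = [0x00, 0x0F, 0xF0, 0xAA, 0x55, 0xFF, 0x0F, 0xF0, 0x00]
query_fps = [0xF0, 0xAA, 0x55, 0xFF]   # occurs in db_fps starting at index 2
section = query_fps[1:3]               # high-information part of the query
cands = coarse_candidates(db_fps, [1, 3, 6], section, loose_max_bits=4)
print(cands, fine_match(db_fps, query_fps, 1, cands, strict_max_bits=1))
```

The loose threshold keeps false negatives rare at the cost of extra candidates, and the strict full-length comparison then removes the false positives, so the expensive comparison runs only at a few positions.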

The above is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of the claims.

Claims

1. A fixed audio retrieval method, characterized by comprising the following steps:

C. Convert and divide each audio data detection segment into fixed-length audio data fingerprint segments, and label and classify the audio data fingerprint segments according to their information quantity, forming an audio data fingerprint segment index;

2. The fixed audio retrieval method according to claim 1, characterized in that, in step B, audio data detection segments that contain harmonic structure are classified as speech segments or music segments, and audio data detection segments that do not contain harmonic structure are classified as noise segments or invalid segments.

3. The fixed audio retrieval method according to claim 1, characterized in that, in step F, a query audio data detection segment that contains harmonic structure is classified as a speech segment or music segment, and a query audio data detection segment that does not contain harmonic structure is classified as a noise segment or invalid segment.

4. The fixed audio retrieval method according to claim 1, characterized in that, in step A, the ratio of the energy of the current segment of the audio data to the total energy is used to determine whether the segment is a silent segment or an effective sound segment, and, where the segment is a silent segment, the audio data is segmented at the silent segment to form non-silent audio data detection segments.

5. The fixed audio retrieval method according to claim 1, characterized in that, in step E, the ratio of the energy of the current segment of the audio data to be retrieved to the total energy is used to determine whether the segment is a silent segment or an effective sound segment, and, where the segment is a silent segment, the audio data to be retrieved is segmented at the silent segment to form non-silent audio data detection segments to be retrieved, from which the longest one that is no shorter than a given duration is selected as the query audio data detection segment.