EP1437711A1 - Method and apparatus for generating a function to extract a global characteristic value of a signal contents - Google Patents


Info

Publication number
EP1437711A1
Authority
EP
European Patent Office
Prior art keywords
function
functions
compound
elementary
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP20020293122
Other languages
German (de)
French (fr)
Inventor
François Pachet
Aymeric Zils
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony France SA
Original Assignee
Sony France SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony France SA
Priority to EP20020293122 (EP1437711A1)
Priority to EP03290635A (EP1431956A1)
Priority to DE20321797U (DE20321797U1)
Priority to US10/738,928 (US7624012B2)
Publication of EP1437711A1
Withdrawn (current legal status)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments

Definitions

  • the invention relates to the field of signal processing, and more particularly to a technique for automatically deriving high-level information expressed by an electronic input signal by analysing the signal's low-level characteristics.
  • high-level refers to the global characteristics of the signal content
  • low-level refers to the fine grain structure of the signal itself, typically at the level of its temporal or spatial modulation.
  • for a signal conveying a music title, examples of its high-level expression would be an indication of whether the title is a sung or an instrumental piece of music, its musical genre, musical complexity, overall timbre, tempo or rhythm structure, etc.
  • the low-level characteristics would be the signal's time-dependent parameters such as amplitude, pitch, etc. analysed over successive short sampling periods.
  • the signals in question can thus be in the form of digital data accessed from a memory or inputted as a digital stream, or they can be in analogue form.
  • descriptor: in such audio applications, the high-level information is normally known by the term "descriptor". Generally, a descriptor expresses a quality, or dimension, of the content represented by the signal, which is meaningful to a human or to a machine processing high-level information. Depending on what they express, descriptors take values which can be of different types:
  • EMD: Electronic Music Distribution
  • EMD systems use either manually entered descriptors (e.g. using software systems developed commercially by the companies “Moodlogic” and “AllMusicGuide”) or …
  • the descriptors are then used for accessing music browsers, using a search by similarity, or a search by example, or any other known database searching technique.
  • Some descriptors, such as the musical genre, are influenced by cultural references and therefore require criteria to be entered by a specific population sample.
  • the invention provides for an automated tool which takes as input a test database containing a set of reference signals (for instance audio files readable by a music player), at least one arbitrary descriptor that can potentially be correlated with the signals, a grounded truth value of that descriptor for each of the database signals, and a set of elementary signal processing functions.
  • the tool selects functions from that set to construct one or more compound functions, and automatically applies them to the signals of the database.
  • new compound functions are created and tried until an arbitrary end condition is reached.
  • the present invention relates to a method of generating a general extraction function which can operate on an input signal to extract therefrom a predetermined global characteristic value expressing a feature of the information conveyed by that signal, characterised in that it comprises the steps of:
  • the invention provides for many advantageous optional embodiments, which are outlined below.
  • the compound functions are preferably generated in successive populations, wherein each new population of functions takes as a basis earlier population functions which produce a relatively high correlation.
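The generate, evaluate and regenerate cycle over successive populations can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the patented implementation: signals are plain lists of floats, the elementary functions (`abs`, `square`, `deriv`, `max`, `mean`, `var`) are hypothetical, a compound function is a chain of transforms followed by one reducer, fitness is the absolute Pearson correlation with the grounded truth values, and each new population is bred from the better half of the previous one by mutation only (crossover is omitted for brevity).

```python
import math
import random
import statistics

# Hypothetical elementary functions: signal -> signal transforms ...
TRANSFORMS = {
    "abs": lambda s: [abs(x) for x in s],
    "square": lambda s: [x * x for x in s],
    "deriv": lambda s: [b - a for a, b in zip(s, s[1:])] or [0.0],
}
# ... and signal -> number reducers.
REDUCERS = {"max": max, "mean": statistics.fmean, "var": statistics.pvariance}


def random_compound(rng, max_len=3):
    """Random compound function: 0..max_len transforms, then one reducer."""
    chain = [rng.choice(list(TRANSFORMS)) for _ in range(rng.randint(0, max_len))]
    return tuple(chain) + (rng.choice(list(REDUCERS)),)


def run(cf, signal):
    """Apply the transforms in order, then the final reducer."""
    for name in cf[:-1]:
        signal = TRANSFORMS[name](signal)
    return REDUCERS[cf[-1]](signal)


def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def fitness(cf, signals, truths):
    """Absolute correlation between the function's outputs and the grounded truth."""
    return abs(pearson([run(cf, s) for s in signals], truths))


def mutate(rng, cf):
    """Toy mutation: redraw either one transform or the reducer."""
    chain, reducer = list(cf[:-1]), cf[-1]
    if chain and rng.random() < 0.5:
        chain[rng.randrange(len(chain))] = rng.choice(list(TRANSFORMS))
    else:
        reducer = rng.choice(list(REDUCERS))
    return tuple(chain) + (reducer,)


def evolve(signals, truths, pop_size=20, generations=15, target=0.99, seed=0):
    rng = random.Random(seed)
    population = [random_compound(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda cf: fitness(cf, signals, truths), reverse=True)
        if fitness(ranked[0], signals, truths) >= target:  # arbitrary end condition
            break
        parents = ranked[: pop_size // 2]  # keep the best-correlated half
        children = [mutate(rng, rng.choice(parents)) for _ in range(pop_size - len(parents))]
        population = parents + children
    return ranked[0]


# Synthetic stand-in data: the "descriptor" here is the square of the signal amplitude.
AMPLITUDES = (0.5, 1.0, 2.0, 3.0, 4.0)
signals = [[a * math.sin(k) for k in range(50)] for a in AMPLITUDES]
truths = [a * a for a in AMPLITUDES]
best = evolve(signals, truths)
```

Keeping the parents unchanged in the next population means the best fitness found so far can never decrease from one generation to the next.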
  • the method can be performed by the steps of:
  • the compound functions are preferably produced by random choices guided by rules and/or heuristics.
  • the rules and/or heuristics can comprise at least one rule which forbids, from a random draw for selecting an elementary function to be associated with a part of a compound function under construction, an elementary function that would be formally inappropriate for that part.
  • the rules and/or heuristics can comprise at least one heuristic which favours, in a random draw for selecting an elementary function to be associated with a part of a compound function under construction, an elementary function which is considered to produce potentially useful technical effects in association with that part, and/or which discourages from the random draw an elementary function considered to produce technical effects of little or no use in association with that part.
  • the rules and/or heuristics can comprise at least one heuristic which ensures that a compound function comprises only elementary functions that each produce a meaningful technical effect in their context.
  • the rules and/or heuristics can comprise at least one heuristic which takes into account at least one overall characteristic of the reference signals.
  • a new population of functions is produced using genetic programming techniques.
  • the genetic programming techniques comprise at least one of the following:
  • a crossover operation and/or a mutation operation can be guided by at least one heuristic cited above.
  • the means that handle the elementary functions as symbolic objects preferably manage the functions in accordance with a tree structure comprising nodes and connecting branches, in which each node corresponds to a symbolic representation of a constituent unit function, the tree having a topography in accordance with the structure of the function.
  • the method further comprises a step of submitting a compound function to at least one rewriting rule executed by processing means, to ensure that said compound function is cast in its most rational form or in its most efficient form in terms of execution.
  • the method uses a caching technique for evaluating a function, in which results of previously calculated parts of functions are stored in correspondence with those parts, and a function currently under calculation is initially analysed to determine whether at least a part of said function can be replaced by a corresponding stored result, said part being replaced by its corresponding result if such is the case.
  • the method can then comprise the steps of checking the usefulness of results stored according to a determined criterion, and of erasing those found not to be useful, the criterion for keeping a result Ri being a function which takes into account: i) the calculation time to produce Ri, ii) the frequency of use of Ri and, optionally, iii) the size (in bytes) of Ri.
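The caching scheme and its clean-up criterion can be sketched as follows. This is an illustration built on assumptions: results are stored under a hypothetical symbolic key (here the sub-function's name plus a signal identifier), `sys.getsizeof` stands in for the size in bytes, and the usefulness score calc_time × (1 + uses) / size is an invented example of a function of the three quantities named above; the source does not give the formula.

```python
import sys
import time
from dataclasses import dataclass


@dataclass
class CachedResult:
    result: object
    calc_time: float  # i) seconds spent computing the result
    uses: int = 0     # ii) number of times the stored result was reused
    size: int = 1     # iii) approximate size in bytes


class SubFunctionCache:
    """Stores results of already-computed parts of compound functions."""

    def __init__(self):
        self.entries = {}

    def evaluate(self, key, compute):
        """Return the stored result for `key`, or compute it and store it."""
        if key in self.entries:
            entry = self.entries[key]
            entry.uses += 1
            return entry.result
        start = time.perf_counter()
        result = compute()
        elapsed = time.perf_counter() - start
        self.entries[key] = CachedResult(result, elapsed, size=max(sys.getsizeof(result), 1))
        return result

    @staticmethod
    def usefulness(entry):
        # Assumed criterion: slow-to-compute, often reused, small results score highest.
        return entry.calc_time * (1 + entry.uses) / entry.size

    def purge(self, threshold):
        """Erase every stored result whose usefulness is below `threshold`."""
        for key in [k for k, e in self.entries.items() if self.usefulness(e) < threshold]:
            del self.entries[key]


# Usage: the second request for the hypothetical part ("FFT", "S1") is served from the cache.
cache = SubFunctionCache()
calls = []


def slow_fft():
    calls.append(1)
    return [0.0] * 1024


first = cache.evaluate(("FFT", "S1"), slow_fft)
second = cache.evaluate(("FFT", "S1"), slow_fft)
```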
  • the elementary functions can comprise signal processing operators and mathematical operators.
  • the method can further comprise a step of validating a general function against at least one reference signal which has a known value for the global characteristic and which was not among the reference signals used to generate the function.
  • the signal can express an audio content, and the global characteristic can be a descriptor of the audio content.
  • the audio content can be in the form of an audio file, the signal being the signal data of the file.
  • the invention relates to a method of extracting a global characteristic value expressing a feature of the information conveyed by a signal, characterised in that it comprises calculating for that signal the value of a general function produced specifically by the method according to the first aspect for that global characteristic.
  • the invention relates to an apparatus for generating a general function which can operate on an input signal to extract therefrom a predetermined global characteristic value expressing a feature of the information conveyed by that signal, characterised in that it comprises:
  • the invention relates to an apparatus according to the third aspect configured to execute any one of the optional aspects of the method set out above, it being understood that the features defined in the context of the method can be implemented mutatis mutandis to the apparatus.
  • the invention relates to the use of the apparatus according to the third aspect as a fully autonomous automatic descriptor extraction function generating system.
  • the invention relates to the use of the apparatus according to the third aspect as a descriptor extraction means.
  • the invention relates to the use of the apparatus according to the third aspect as an authoring tool for producing descriptor extraction functions.
  • the invention relates to the use of the apparatus according to the third aspect as an evaluation tool for externally produced descriptor extraction functions.
  • the invention relates to a general function in a form exploitable by an electronic machine, produced specifically by the apparatus according to the third aspect.
  • the invention relates to a software product containing executable code which, when loaded in a data processing apparatus, enables the latter to perform the method according to the first aspect.
  • the above iterative search procedure through successive populations is implemented by what is known as genetic programming.
  • the functions (which typically take the form of executable code) are tried, and the results serve to automatically create new populations of functions in accordance with genetic programming techniques: the best-fitting functions are taken in a manner somewhat analogous to selection, and the selected functions are submitted to operations corresponding, e.g., to the crossover and mutation phenomena occurring in biological processes at chromosome level.
  • the remarkable aspect here resides in applying a genetic programming technique to functions which take raw electronic signals as their arguments.
  • the proposed invention allows arbitrary descriptors to be extracted from music signals. More precisely, the embodiment does not extract a particular descriptor; rather, given a set of music titles containing both examples (and possibly counter-examples) for a given descriptor, it automatically builds a function that extracts an optimum value of that descriptor from audio signals.
  • the same system can be used to produce a function associated with an arbitrary descriptor, such as one listed in the earlier part of the introduction, which can then be exploited as a general function for that descriptor, in the sense that it can subsequently be made to operate on any music file to extract the value of the descriptor for that file (assuming its signals are compatible).
  • Each extractor can be seen here as a function that takes as argument a given music signal (typically 3 minutes of audio), and outputs a value. This value can be of various types: a float (for the tempo), a vector (for the timbre), a symbol (for instrumental versus song discrimination), etc.
  • the main task of extractor design is to find the right composition of basic, low-level signal processing functions to yield a value that is as correlated as possible to the values obtained by psycho-acoustic tests.
  • the preferred embodiment contains a representation of human expertise in signal processing: it tries different combinations of signal processing functions, evaluates them, and compares them against human perceptive values. Using an algorithm based on genetic programming, different signal processing functions are tried concurrently and modified to find a satisfactory extractor function.
  • the system is one step higher: its primary function is not to produce a descriptor for a signal, but rather a function which itself will produce the descriptor, when applied on other music file signals e.g. taken from a database of signals.
  • Figure 1 depicts a system 2 in accordance with the invention, indicating the raw data on which it operates (user data input) and the output it produces from that data (user data output).
  • the example is based on a music data application, in which the system 2 generates as its user data output an executable function 4, referred to as a descriptor extractor function (DE function).
  • DE function: descriptor extractor function
  • This function is then packaged in a data carrier 5 in a form suitable to be exploited for extracting a given descriptor from an arbitrary audio file 6.
  • the latter is typically formatted according to a recognised standard such as CD audio, MP3, MPEG7, WAV, etc., exploitable by a music player, and contains a musical piece with which a descriptor value Dx is to be associated.
  • the DE function 4 operates on the raw data signal Sx of the audio file 6, i.e. it takes the latter as its argument or operand and returns the descriptor value DVex for that file.
  • the signal Sx is assumed to be compatible with the DE function 4 as regards data format.
  • the descriptor value is typically a number, a Boolean, or a statement, and generally belongs to the class of real objects R^n.
  • the above data carrier 5 typically comprises a software package which can contain other DE functions, e.g. for extracting other descriptor values, and possibly auxiliary software code, e.g. for management and user assistance.
  • the data carrier 5 can be a physical entity, such as a CD ROM, or it can be in immaterial form, e.g. as downloadable software accessible from the Internet.
  • the system 2 generates the DE function 4 on the basis of both the user data input and internally programmed parameters, functions and algorithms, as shall be detailed later.
  • the user data input serves inter alia to feed an internal learning database and constitutes the raw learning material from which to model the DE function.
  • This material includes a set of m audio files A1 to Am and, for each one Ai (1 ≤ i ≤ m), a given value Dgti of a specific descriptor De for the audio item Ti it contains.
  • the audio files Ai are formatted as for file 6 above, and thus each produce a respective signal Si when accessed to reproduce the audio item Ti.
  • the respective descriptor values Dgt1-Dgtm associated with the audio files are established by a human judge, or a panel of human judges. For instance, if the descriptor De in question is the "global energy" of the music title, the judge or panel awards each respective title Ti a number within a range from a minimum (the level of a lullaby, for instance) to a maximum, which constitutes the title's descriptor value Dgti. These values Dgti are referred to as "grounded truth" descriptor values.
  • Figure 2 shows the general architecture of the system 2.
  • the system is preferably implemented using the hardware of a standard personal computer PC.
  • the different types of data used are divided into respective databases 10-18 under the general control of a data management unit 20, which further manages the overall data flow of the system 2.
  • the databases comprise:
  • the signal processing and overall management of the system are carried out by a main processor unit 22 which runs programs contained in a main program memory 24.
  • a user interface 26 associated to a monitor 28, keyboard 30 and mouse 31 allows the user input and output data of figure 1, as well as the internal programming data, to be entered and extracted.
  • Figure 3 illustrates the principle of an elementary function EF as exploited by the system 2.
  • the elementary function comprises executable code and one or a set of parameter(s) which it can receive as input Pin, and which define the elementary function's boundary conditions.
  • An elementary function acts on an operand, or argument 32 (which can be signal data or the output of a preceding elementary function), and generates an output that is the result of the code executed on the operand data.
  • An elementary function EF is catalogued in the system inter alia by the type of operand, designated Toper, on which it can operate, and by the type of output, designated Tout, that it delivers. Types Toper and Tout can be the same or different for a given elementary function.
  • Typical types include: signal, numerical (single number, float, range), vector, or matrix.
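Cataloguing elementary functions by operand and output type makes a simple formal check possible whenever functions are chained. The sketch below is illustrative only: the type names, the assumption that a sampled signal may be consumed wherever a set of numbers is expected, and the three sample functions (modelled on DERIV, MAX and SQUARE from Table I) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed compatibility relation (Tout of upstream, Toper of downstream).
COMPATIBLE = {("signal", "set"), ("signal", "signal"), ("set", "set"), ("number", "number")}


@dataclass
class ElementaryFunction:
    name: str
    toper: str                                   # Toper: type of operand accepted
    tout: str                                    # Tout: type of output delivered
    code: Callable
    params: dict = field(default_factory=dict)   # Pin


def can_feed(upstream, downstream):
    """Formal rule: upstream's Tout must be acceptable as downstream's Toper."""
    return (upstream.tout, downstream.toper) in COMPATIBLE


def compose(*efs):
    """Chain elementary functions; the first one listed operates on the signal first."""
    for a, b in zip(efs, efs[1:]):
        if not can_feed(a, b):
            raise TypeError(f"{a.name} outputs {a.tout} but {b.name} expects {b.toper}")

    def run(x):
        for ef in efs:
            x = ef.code(x, **ef.params)
        return x

    return run


DERIV = ElementaryFunction("DERIV", "signal", "signal", lambda s: [b - a for a, b in zip(s, s[1:])])
MAX = ElementaryFunction("MAX", "set", "number", max)
SQUARE = ElementaryFunction("SQUARE", "number", "number", lambda x: x * x)

# Usage: SQUARE(MAX(DERIV(S))) on a short signal.
value = compose(DERIV, MAX, SQUARE)([0, 1, 3, 6])
```

Rejecting an ill-typed chain at construction time, rather than when it is executed on a signal, keeps malformed candidates from ever reaching the evaluation stage.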
  • the system handles elementary functions EF (which can be likened to modules) either as symbolic objects or as executable operators, depending on the nature of the processing required in the course of elaborating a compound function CF.
  • Figure 4 illustrates an example of an elementary function in the form of a low pass filter (LPF) operator.
  • LPF: low-pass filter
  • its executable code comprises a digital LPF algorithm, and its input parameters Pin are the cut-off frequency F and, optionally, the attenuation rate (dB/octave).
  • Figure 5 illustrates another example of an elementary function, this time in the form of a short-time fast Fourier transform (short-time FFT) operator.
  • the executable code comprises a short time FFT algorithm, and its input parameters Pin are the sampling window and summation limits.
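The two operators of figures 4 and 5 can be approximated in plain Python. These are illustrative stand-ins, not the algorithms used in the system: the low-pass filter is assumed to be a cascade of first-order (RC-style) sections, each rolling off at about 6 dB/octave, so the `order` parameter plays the role of the optional attenuation rate; the short-time FFT uses a Hann window, a hop size in place of the summation limits, and a naive DFT so that the block needs no third-party libraries.

```python
import cmath
import math


def low_pass(signal, cutoff_hz, sample_rate, order=1):
    """Cascade of `order` first-order low-pass sections (about 6 dB/octave each)."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out = list(signal)
    for _ in range(order):
        state, filtered = 0.0, []
        for x in out:
            state += alpha * (x - state)  # exponential smoothing = discrete RC filter
            filtered.append(state)
        out = filtered
    return out


def short_time_fft(signal, window, hop):
    """Magnitude spectra (bins 0..window/2) of Hann-windowed frames taken every `hop` samples."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window) for n in range(window)]
    spectra = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = [signal[start + n] * hann[n] for n in range(window)]
        spectra.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window) for n in range(window)))
            for k in range(window // 2 + 1)
        ])
    return spectra


# Usage: a 64-sample window on a tone with exactly 8 cycles per window.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spectra = short_time_fft(tone, window=64, hop=32)
```

In practice a library FFT would replace the naive DFT; it is used here only to keep the sketch self-contained.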
  • an elementary function also constitutes an argument, or operand, for its left-hand neighbour (i.e. the succeeding function), to which it is joined by the "*" operator where applicable.
  • an output of an elementary function can include parameter input data for its neighbouring function.
  • This is illustrated in figure 6 by the output of function EFb, which produces inter alia a signal conveying a parameter Pin for its downstream function EFc, for instance the value of a high-pass cut-off frequency if the latter is a high-pass filter function.
  • a compound function CF can contain an arbitrary number of elementary functions related by different arithmetical operators (+, -, * or /).
  • Elementary functions connected together by a multiplicative or divisional operator form a term; several terms can be linked by the additive operators + and - as required when constructing a compound function CF.
  • the compound function construction program 25 is based on genetic programming techniques following an artificial intelligence (AI) approach. Accordingly, the elementary functions EF are also handled as symbols, whereby they are treated as first-class objects in their symbolic representation.
  • AI: artificial intelligence
  • the system 2 is capable of handling the elementary functions both as objects, when executing the compound function (CF) construction program 25, and as executable operators, notably for evaluating and testing the compound functions, when executing the function execution program 27.
  • these two programs 25 and 27 use languages adapted respectively to handling objects and to carrying out numerical calculations, an example of the latter being the "Matlab" language.
  • Table I gives a non-exhaustive example of elementary functions stored in the elementary function library 12, together with their operand type Toper, output type Tout and parameters.
  • Table I: sample list of elementary functions used by the system 2.

      I.1 Mathematical functions

      Function name | Operation            | Param Pin | Toper       | Tout
      --------------+----------------------+-----------+-------------+-------
      DERIV         | Time derivative      | -         | Signal      | Signal
      MAX           | Max value of set     | -         | set of No.s | No.
      MIN           | Min value of set     | -         | set of No.s | No.
      SQUARE        | Raise to power 2     | -         | No.         | No.
      LOG           | Logarithm            | -         | No.         | No.
      MEAN          | Average value of set | -         | set of No.s | No.
      VAR           | Variance of set      | -         | set of No.s | No.s
      ABS(V)        | Absolute value       | …         | …           | …
  • when the system handles the elementary functions as symbols, as in the above construction phase, it uses a tree structure.
  • a compound function CF is symbolised in terms of nodes, where each node corresponds to one elementary function EF, and in which branches connect the nodes according to the arithmetic operators +, -, * and / used.
  • the three terms are developed along three respective branches Br1-Br3.
  • the three branches join at the "+" function, which is the common link to CF.
  • the order of appearance of the elementary functions is followed along successive nodes, the first elementary function (i.e. the first to operate on the signal) being nearest the free end of its branch.
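A compound function in this tree form can be stored and evaluated recursively. In the sketch below (an assumed data structure, not the system's own), each node carries an elementary function name, an arithmetic operator joining any number of branches, or the leaf `S` standing for the input signal. The function nearest the free end of a branch is applied first, and `to_expr` produces the symbolic form used when functions are handled as objects. The elementary functions are a small hypothetical subset.

```python
import functools
import operator
import statistics
from dataclasses import dataclass, field

# Hypothetical elementary functions, of the kind listed in Table I.
ELEMENTARY = {
    "ABS": lambda s: [abs(x) for x in s],
    "MAX": max,
    "MEAN": statistics.fmean,
}
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


@dataclass
class Node:
    label: str                                    # elementary function, operator, or "S"
    children: list = field(default_factory=list)  # operands (branches)
    params: dict = field(default_factory=dict)    # Pin of an elementary function


def evaluate(node, signal):
    """Executable view: compute the node's value on `signal`, leaves first."""
    if node.label == "S":
        return signal
    args = [evaluate(child, signal) for child in node.children]
    if node.label in OPERATORS:
        return functools.reduce(OPERATORS[node.label], args)  # n-ary join of branches
    return ELEMENTARY[node.label](*args, **node.params)


def to_expr(node):
    """Symbolic view: the compound function as a string."""
    if node.label == "S":
        return "S"
    parts = [to_expr(child) for child in node.children]
    if node.label in OPERATORS:
        return "(" + f" {node.label} ".join(parts) + ")"
    parts += [f"{k}={v}" for k, v in node.params.items()]
    return f"{node.label}({', '.join(parts)})"


# MAX(ABS(S)) + MEAN(S): two branches joined at a "+" node.
cf = Node("+", [Node("MAX", [Node("ABS", [Node("S")])]), Node("MEAN", [Node("S")])])
```

Holding one structure that can be both printed and executed reflects the dual handling of elementary functions as symbolic objects and as executable operators.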
  • the CF construction program 25 begins by selecting and aggregating elementary functions in a random fashion.
  • knowledge-based heuristics generally operate by associating to each elementary function EF a weighting coefficient affecting its random draw probability. These coefficients are attributed dynamically according to immediate context. The heuristics can in this way rule out some combinations of elementary functions through a zero weighting coefficient, at one extreme, and force combinations by imposing an absolute maximum value coefficient at the other extreme. A set of intermediate weighting coefficient values is provided to allow the random process to determine the construction of compound functions, albeit with constraints. These heuristics are generally derived from experience in using the system and the user's formal or intuitive knowledge. They thus allow the user to inject his or her know-how into the system and afford a degree of personalisation. They can also be generated by the system itself on an automated basis, using algorithms that detect similarities between compound functions that have been recognised as successful.
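The weighting mechanism can be sketched as a guided random draw. Here each heuristic is modelled (an assumption) as a function returning a coefficient for a candidate elementary function in the current context: 0 rules the candidate out, values above 1 favour it, and an infinite coefficient forces it. The three heuristics, and the use of the previously drawn function as the "context", are hypothetical examples.

```python
import math
import random


def draw_next(candidates, context, heuristics, rng):
    """Guided random draw of the next elementary function (None if none is admissible)."""
    weights = []
    for cand in candidates:
        coeffs = [h(cand, context) for h in heuristics]
        # A zero coefficient always rules the candidate out; otherwise coefficients multiply.
        weights.append(0.0 if 0 in coeffs else math.prod(coeffs))
    forced = [c for c, w in zip(candidates, weights) if w == math.inf]
    if forced:  # an absolute maximum coefficient imposes the choice
        return rng.choice(forced)
    if not any(weights):
        return None
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical heuristics; `context` is the previously drawn elementary function.
def no_double_fft(cand, context):
    return 0.0 if cand == "FFT" and context == "FFT" else 1.0


def favour_log_after_fft(cand, context):
    return 3.0 if cand == "LOG" and context == "FFT" else 1.0


def force_mean_after_var(cand, context):
    return math.inf if cand == "MEAN" and context == "VAR" else 1.0


HEURISTICS = [no_double_fft, favour_log_after_fft, force_mean_after_var]
POOL = ["FFT", "LOG", "MEAN", "VAR"]
next_ef = draw_next(POOL, "FFT", HEURISTICS, random.Random(1))
```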
  • Rewriting involves recasting compound functions from their initial form into a mathematically equivalent form that allows them to be executed more efficiently. It is governed by a set of deterministic rewriting rules of varying levels of complexity, held in machine-readable form and executed by the main processor 22 on each function CFi of the population.
  • Simple rewriting rules eliminate self-cancelling terms in a compound function. For instance, if the compound function considered contains the terms HPF(S, Fa) + FFT(S) - FFT(S), the rewriting rules tidy up the expression and reduce it to HPF(S, Fa).
  • Another category of rewriting rules eliminates elementary functions that are redundant given their environment, i.e. which produce no technical effect. For instance, if an expression contains a bandpass filtering function with a passband between frequencies Fb and Fc, the rules would eliminate any subsequent function in that term which filters out frequencies outside that passband, since those frequencies are no longer present.
  • the implementation of the rewriting rules uses the tree structure of the compound function under consideration. Each node, or section of the tree, is scanned against the set of rewriting rules. Whenever a rewriting rule is applicable to a node or a succession of nodes in the part of the tree being analysed, the node or succession of nodes in question is rewritten according to that rule and replaced by a new node or tree section corresponding to the rewritten (and hence simplified) form of the compound function.
  • the tree scanning is repeated cyclically until a complete scan brings no change.
  • the rewriting rules are chosen so as not to produce a change that itself leads to another change, and conversely, ad infinitum.
  • for example, the system would not simultaneously contain a rule to rewrite A+B as B+A and another rule to rewrite B+A as A+B (in fact, this would be the same rule, infinitely applicable to the result of its own production, and therefore yielding an endless loop).
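The scan-until-stable procedure can be sketched on compound functions written as nested tuples such as `("HPF", "S", "Fa")`. Both rules are illustrative assumptions: one removes the self-cancelling pattern (X + Y) - Y, the other drops an outer low-pass filter whose cut-off (assumed numeric) is at or above that of an inner one, treating the filters as ideal. Because each rule strictly shrinks the expression, the loop is guaranteed to terminate, which is the property the A+B / B+A example is about.

```python
def cancel_add_sub(e):
    """(X + Y) - Y  ->  X"""
    if e[0] == "-" and isinstance(e[1], tuple) and e[1][0] == "+" and e[1][2] == e[2]:
        return e[1][1]
    return None


def nested_low_pass(e):
    """LPF(LPF(x, f1), f2) with f2 >= f1  ->  LPF(x, f1), assuming ideal filters."""
    if e[0] == "LPF" and isinstance(e[1], tuple) and e[1][0] == "LPF" and e[2] >= e[1][2]:
        return e[1]
    return None


RULES = [cancel_add_sub, nested_low_pass]


def rewrite_pass(expr):
    """One bottom-up scan: rewrite the children first, then try each rule on this node."""
    if not isinstance(expr, tuple):
        return expr
    expr = (expr[0],) + tuple(rewrite_pass(arg) for arg in expr[1:])
    for rule in RULES:
        new = rule(expr)
        if new is not None:
            return new
    return expr


def simplify(expr):
    """Repeat complete scans until one brings no change."""
    while True:
        new = rewrite_pass(expr)
        if new == expr:
            return expr
        expr = new


# HPF(S, Fa) + FFT(S) - FFT(S)
reduced = simplify(("-", ("+", ("HPF", "S", "Fa"), ("FFT", "S")), ("FFT", "S")))
```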
  • the signal Sj in question corresponds to a digitised form of an amplitude (signal level) evolving in time t, the time frame of t typically being on the order of 200 seconds in the case of a music title.
  • the n×m output values are mapped into a matrix MAT(P), which is stored in a working memory of the main processor 22. These values are accessed at a subsequent stage to evaluate the overall fit of each of the n compound functions CF1-CFn with the descriptor De for which the grounded truths Dgt1-Dgtm were produced. This evaluation is carried out by standard statistical analysis techniques.
  • each of the n×m output values of the matrix MAT(P) is compared with its corresponding grounded truth descriptor value. Specifically, the set of n×m values Dij is analysed against the grounded truth descriptor values Dgt1-Dgtm for the descriptor De ascribed to the respective music titles T1-Tm.
  • the analysis here involves comparing the value Dij with the value Dgtj for the corresponding audio file. This comparison is performed for each of the m audio files, yielding m comparison values. These comparison values are submitted to statistical analysis to obtain a global fit (or fitness) value FIT(CFi) with respect to the descriptor De for that function CFi.
  • the global fitness value FIT(CFi) expresses objectively how well, overall, the descriptor values generated by the function CFi match (or correlate with) the corresponding grounded truth descriptors Dgt1-Dgtm.
  • the global fit in question is evaluated in the form appropriate for the descriptor, for instance numerical closeness for a numerical descriptor, Boolean correspondence for a Boolean descriptor, etc.
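The evaluation stage can be sketched as follows: the n×m matrix is built by running every compound function CFi on every reference signal Sj, and each row is then compared with the grounded truth values. Pearson correlation for numerical descriptors and agreement rate for Boolean ones are assumed choices of "standard statistical analysis"; the source does not name the statistics actually used.

```python
import math
import statistics


def build_matrix(functions, signals):
    """MAT(P): row i holds the outputs D_ij of compound function CF_i on signal S_j."""
    return [[cf(s) for s in signals] for cf in functions]


def fit_numeric(outputs, truths):
    """Numerical descriptor: Pearson correlation with the grounded truth values."""
    mo, mt = statistics.fmean(outputs), statistics.fmean(truths)
    cov = sum((o - mo) * (t - mt) for o, t in zip(outputs, truths))
    var_o = sum((o - mo) ** 2 for o in outputs)
    var_t = sum((t - mt) ** 2 for t in truths)
    return 0.0 if var_o == 0 or var_t == 0 else cov / math.sqrt(var_o * var_t)


def fit_boolean(outputs, truths):
    """Boolean descriptor: fraction of titles on which the function agrees with the judges."""
    return sum(o == t for o, t in zip(outputs, truths)) / len(truths)


def population_fitness(matrix, truths, fit):
    """FIT(CF_i) for every row of MAT(P)."""
    return [fit(row, truths) for row in matrix]


# Usage with two trivial "compound functions" and three toy signals.
signals = [[1, 2, 3], [2, 4, 6], [0, 1, 9]]
truths = [3, 6, 9]
mat = build_matrix([max, min], signals)
fits = population_fitness(mat, truths, fit_numeric)
```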
  • the r compound functions CF(1)1 to CF(1)r of the new population P1 are then processed in their symbolic object form according to the above-described tree structure.
  • the aim here is to generate from that population P1 a next generation population P2 of compound functions.
  • the system 2 achieves this by using genetic programming techniques. These techniques model aspects of the biological reproduction processes naturally occurring at chromosome level, such as crossover and mutation.
  • the analogue of a chromosome is an elementary function EF in its symbolic representation.
  • Genetic programming is in itself well documented, but has hitherto been confined to fields remote from electronic signal processing. Remarkably, it can be implemented to great advantage in the present field by virtue of the present approach, in which the compound functions in question, whose primary purpose is to operate on an electronic signal, are conveniently made exploitable as symbolic objects at critical phases of their elaboration process.
  • This object form, which advantageously uses the above-described tree structure, thereby becomes amenable to genetic programming using standard knowledge of applied genetic programming. Accordingly, detailed aspects involving normal knowledge of genetic programming languages and practice, accessible to a person skilled in the art of genetic programming, are not detailed in the present description for reasons of conciseness.
  • the concept of genetic programming applied to the present signal processing functions CF is illustrated in connection with two interesting aspects: crossover and mutation. Each is implemented with adapted, specific rules and heuristics stored in the heuristics database 14 and the rules database 15.
  • the rules and heuristics applied in the context of genetic programming are the formal and boundary condition rules and the knowledge-based heuristics outlined above, adapted to circumstances. Overall, the rules and heuristics applied ensure that the compound functions resulting from genetic programming operations are formally acceptable, have the potential to improve (in terms of fitness) on the functions from which they are generated, and remain within the system's operating limits.
  • crossover involves taking two compound functions, say CF(1)p and CF(1)q (from population P1), and creating from them a new function CF(1)pq which contains a mix of functions CF(1)p and CF(1)q, in a manner analogous to two chromosomes combining to form a new chromosome.
  • An example of a new function CF(1)pq produced by crossover of functions CF(1)p and CF(1)q is illustrated by figure 9 using the tree representation.
  • the elementary functions are designated in their abbreviated form: ep1-ep10 for compound function CF(1)p and eq1 to eq10 for compound function CF(1)q.
  • Crossover is carried out by a crossover generator module 33 forming part of the compound function construction program 25 stored in memory 24.
  • the module 33 receives the two functions CF(1)p and CF(1)q as input and analyses their tree structure using a set of stored crossover rules and heuristics. The analysis seeks to determine, for each function, a suitable break point along a branch. The break point divides the tree in question into a portion that is to be rejected and a portion that is to be retained. In the example, it can be seen that for compound function CF(1)p, the part of the tree structure comprising elementary functions ep7 to ep10 is retained, and the part on the other side of the break point comprising elementary functions ep1 to ep6 is rejected.
  • More complex crossover operations can involve extracting at least one section of a tree (not necessarily an end section) and inserting it within another tree by producing one or several break points in the latter depending on where it is to be accommodated.
  • break points are determined in a guided (or constrained) random draw, in which the guidance is provided by a set of crossover rules and heuristics.
  • a first such rule is of the formal type, and requires that two nodes susceptible of being joined together must be formally compatible from the point of view of types, as described above in the context of formal rules.
  • candidate break points for the random draw are considered in mutually indexed pairs, each member of the pair being associated to a respective tree.
  • the corresponding nodes to be joined are identified in terms of which ones correspond respectively to the operand and the operating function among the pair. Only those pairs of break points satisfying the formal requirements are accepted as candidates.
  • in the example of figure 9, the rules in question ensure that, although the crossover results from a random draw, the operand type Toper(ep7) of elementary function ep7 is the same as the output type Tout(eq6) of elementary function eq6.
  • Another rule is of the boundary condition type and requires that the break point should preferably lie in the central portion of the tree, e.g. by using weighted random draws, so that crossover-generated compound functions remain statistically similar in size over repeated generations.
  • knowledge-based heuristics are tested on crossover-generated compound functions.
  • the operators in the new compound function are tested one by one starting from the break point.
  • the knowledge-based heuristics provide a probability for each new operator, according to which the compound function is accepted or rejected at each step.
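A type-respecting subtree crossover with centrally weighted break points might be sketched as below. Trees are nested tuples `(name, *children)` and the type table is hypothetical. Grafting a subtree of the second parent whose output type equals that of the replaced subtree is one simple way (assumed here) to enforce the formal rule that the receiving node's operand type matches the output type of the node joined to it. The heuristic acceptance test applied after crossover is not modelled.

```python
import random

# Hypothetical type table: name -> (operand type Toper, output type Tout).
TYPES = {
    "S": (None, "signal"),          # leaf standing for the raw input signal
    "LPF": ("signal", "signal"),
    "HPF": ("signal", "signal"),
    "DERIV": ("signal", "signal"),
    "FFT": ("signal", "spectrum"),
    "MAX": ("signal", "number"),
    "PEAK": ("spectrum", "number"),
    "+": ("number", "number"),
}


def output_type(tree):
    return TYPES[tree[0]][1]


def well_typed(tree):
    """Formal check: every child's Tout equals its parent's Toper."""
    toper = TYPES[tree[0]][0]
    return all(output_type(c) == toper and well_typed(c) for c in tree[1:])


def positions(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    for i, child in enumerate(tree[1:], start=1):
        yield from positions(child, path + (i,))


def replace(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(p, q, rng):
    """Graft a subtree of q into p at a randomly drawn, type-compatible break point."""
    p_points = list(positions(p))[1:]  # never replace p's root: part of p is retained
    if not p_points:
        return p
    pairs = [(path, graft) for path, old in p_points for _, graft in positions(q)
             if output_type(old) == output_type(graft)]  # formal rule
    if not pairs:
        return p
    depth = max(len(path) for path, _ in p_points)
    # Boundary-condition heuristic: favour break points near the middle of the tree.
    weights = [1.0 / (1.0 + abs(len(path) - depth / 2)) for path, _ in pairs]
    path, graft = rng.choices(pairs, weights=weights, k=1)[0]
    return replace(p, path, graft)


P = ("+", ("MAX", ("LPF", ("S",))), ("PEAK", ("FFT", ("S",))))
Q = ("MAX", ("DERIV", ("HPF", ("S",))))
child = crossover(P, Q, random.Random(0))
```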
  • Mutation involves taking one compound function CF(1)s and forming a variant thereof CF'(1)s.
  • the variant can be produced by modifying one or a number of the parameters of CF(1)s, and/or by modifying the function's structure, e.g. by adding, removing or changing one or several of its elementary functions, or by any other modification.
  • An example of a new compound function CF'(1)s produced by mutation of a function CF(1)s is illustrated in figure 10.
  • the initial compound function CF(1)s has a tree structure formed of elementary functions es1 to es7 as shown.
  • This function is input to a mutation generator module 34 forming part of compound function construction program 25.
  • the mutation generator module 34 applies one or several mutations to that function on a guided (or constrained) random basis.
  • in the example, the output mutated function CF'(1)s differs from the input function CF(1)s: i) at the level of elementary function es6, a low-pass filter operator whose parameter P'(es6) now specifies a cut-off frequency of 450 Hz instead of the 600 Hz of its original form P(es6), and ii) at the level of elementary function es1, which has simply been deleted.
  • the mutation process is governed by mutation rules and heuristics, which include formal rules that likewise ensure that any changed function remains formally correct, and boundary condition rules which govern the nature and number of mutations allowed, etc.
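A guided mutation of the kind shown in figure 10 (a parameter change and the deletion of one elementary function) might be sketched as below. The tree representation, the `cutoff_hz` parameter, the 50/50 choice between the two mutation kinds and the +/- 30% spread are hypothetical choices made for illustration. The formal rule is modelled by only allowing deletion of unary nodes whose operand and output types coincide, so the parent still receives a correctly typed value.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical node: op name, output type, operand type, fixed
    # parameters (e.g. a cut-off frequency in Hz) and children.
    op: str
    out_type: str
    in_type: Optional[str] = None
    params: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def nodes(node):
    yield node
    for c in node.children:
        yield from nodes(c)


def copy_tree(node):
    return Node(node.op, node.out_type, node.in_type, dict(node.params),
                [copy_tree(c) for c in node.children])


def mutate(tree, rng, spread=0.3):
    """Return a copy of `tree` with one guided random mutation applied."""
    t = copy_tree(tree)
    # Formal rule: a unary node may only be spliced out if its operand type
    # equals its output type.
    deletable = [(p, i) for p in nodes(t) for i, c in enumerate(p.children)
                 if len(c.children) == 1 and c.in_type == c.out_type]
    tunable = [n for n in nodes(t) if n.params]
    if tunable and (not deletable or rng.random() < 0.5):
        node = rng.choice(tunable)
        key = rng.choice(sorted(node.params))
        node.params[key] *= 1 + rng.uniform(-spread, spread)  # parameter change
    elif deletable:
        parent, i = rng.choice(deletable)
        parent.children[i] = parent.children[i].children[0]  # deletion
    return t


tree = Node("MEAN", "float", "signal", {}, [
    Node("LOWPASS", "signal", "signal", {"cutoff_hz": 600.0}, [
        Node("ABS", "signal", "signal", {}, [Node("S", "signal")])])])
mutant = mutate(tree, random.Random(1))
```

The original tree is left untouched because the mutation operates on a copy, which matches the idea of forming a variant CF'(1)s alongside CF(1)s.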
  • the system can implement other genetic programming operations. For instance, it can perform cloning, which involves taking one compound function CF(1)t and forming a variant thereof CF'(1)t.
  • the variant has exactly the same functional structure as the original function CF(1)t; only the values of the fixed parameters are modified. For instance, if the original compound function contains a low-pass filter with a fixed cut-off frequency of 500 Hz, a clone could be the same compound function with a different cut-off frequency of, say, 400 Hz.
  • a cloning parameter can control the extent of the variations of the values (for example +/- 10%). Note that cloning is simply a special, restricted case of mutation in the sense described above.
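Cloning, as a restricted mutation, only rescales fixed parameters and never touches the structure. A minimal sketch, assuming a hypothetical `(op, params, children)` tuple representation and the +/- 10% cloning parameter mentioned above:

```python
import random


def clone(tree, rng, extent=0.10):
    """Keep the structure of `tree`; scale every fixed numeric parameter by a
    random factor in [1 - extent, 1 + extent]. A tree is a hypothetical
    (op, params, children) tuple."""
    op, params, children = tree
    new_params = {k: v * rng.uniform(1 - extent, 1 + extent) for k, v in params.items()}
    return (op, new_params, [clone(c, rng, extent) for c in children])


def shape(tree):
    """Structure only: operator names and parameter keys, no values."""
    op, params, children = tree
    return (op, sorted(params), [shape(c) for c in children])


original = ("MEAN", {}, [("LOWPASS", {"cutoff_hz": 500.0}, [("S", {}, [])])])
variant = clone(original, random.Random(42))
```

Comparing `shape(variant)` with `shape(original)` is a quick way to check that a clone kept exactly the same functional structure.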
  • the genetic programming procedure comprising the above crossover and mutation operations (and possibly other operations) is applied to the population P1 of functions over a given period or number of cycles.
  • when the procedure terminates for that population, the result is a new population P2 of compound functions which are the genetic descendants of those of population P1.
  • the number of compound functions CF(2) forming the population P2 is made the same as for population P, so as to allow a selection of the r best-fitting functions of that population to produce its own succeeding population of functions P3.
  • the creation of a new population typically calls for a repetition of the random creation procedure (described above for randomly creating the initial population P) to top up the population, given that crossover operations tend to reduce the population (if C ⁇ CO).
  • the new population P2 is then treated in the same manner as the initial population P, starting with a phase in which rewriting rules are applied (the rules and heuristics listed above have already been applied, explicitly or implicitly, to population P2 in the course of the genetic programming (crossover and mutation) operations).
  • each compound function CF(2) of the new population is evaluated against the grounded truth descriptor values Dgt1 to Dgtm for the descriptor De.
  • the procedure here is just as for obtaining population P1, and the algorithm described above applies mutatis mutandis by replacing P with P1, and P1 with P2.
  • the above procedure is carried out iteratively over a given number of cycles, each cycle producing a new population Pu from the previous population Pu-1 by genetic programming and a selection of the best compound functions for the population Pu.
  • a heuristic can be represented as a function which has for argument (operand):
  • the heuristic function produces from the above argument a result in the form of a value in a specified range, e.g. from 0 to 10, which expresses the appropriateness or interest of constructing a function in which the potential term is branched (according to the tree representation) to the current term, e.g. as its argument.
  • the weighting coefficients are interpreted as follows:
    • 0: potential term forbidden from the random draw,
    • 1: of very little interest,
    • ...
    • 5: of medium interest,
    • 9: extremely interesting,
    • 10: potential term imposed (i.e. must be selected).
  • a heuristic shall determine the appropriateness of creating the branching where the "S" of the current term becomes "FFT.DERIV.FFT.S".
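The 0 to 10 scale above can drive a weighted random draw as sketched here. Treating scores 1 to 9 directly as sampling weights, and the `toy_heuristic` rules themselves, are assumptions for illustration only; the sketch just honours the two fixed ends of the scale (0 forbids a term, 10 imposes it) and builds the dotted term notation used in the example.

```python
import random


def draw_term(candidates, heuristic, current, rng):
    """Choose the term to branch onto `current` from 0-10 heuristic scores:
    0 forbids a candidate, 10 imposes it, and 1-9 are used (by assumption)
    directly as sampling weights."""
    scores = {c: heuristic(current, c) for c in candidates}
    imposed = [c for c in candidates if scores[c] >= 10]
    if imposed:
        return imposed[0]
    allowed = [c for c in candidates if scores[c] > 0]
    if not allowed:
        return None
    return rng.choices(allowed, weights=[scores[c] for c in allowed])[0]


def toy_heuristic(current, potential):
    """Hypothetical heuristic; `current` is a dotted term such as "FFT.S"."""
    if potential == "FFT" and current.startswith("FFT."):
        return 0  # forbid an FFT applied directly to an FFT result
    if potential == "DERIV":
        return 9  # assumed to be extremely interesting here
    return 5      # medium interest by default


term = "FFT.S"
nxt = draw_term(["FFT", "DERIV", "ABS"], toy_heuristic, term, random.Random(0))
new_term = f"{nxt}.{term}"
```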
  • a further class of heuristics takes into account the global nature of the signals in the learning database 10. This is expressed by a quantity referred to as the "global reference indicator".
  • This global reference indicator can also be, for instance, a set of descriptors drawn from that reference database.
  • the iterative loops used by the system 2 involve a considerable amount of processing, especially for the steps of extracting a value Dij of a compound function CFi for a signal data Sj.
  • the system advantageously uses the prior results cache 16 as a source of precalculated results, which avoids repeating calculations that have already been performed.
  • the corresponding caching technique involves analysing a compound function under execution in terms of its tree structure, and thus involves both the symbolic, object representation of the function and its exploitation as an operator.
  • Figure 11 is an example illustrating how the caching technique is implemented.
  • the main processor 22 is required to calculate the value of a branch Brq belonging to another function CFv(S).
  • the cache 24 is thus enriched with new results every time a new function or term is encountered and calculated.
  • the caching technique becomes increasingly useful as the cache contents grow, and contributes significantly to the execution speed of the system 2.
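The caching technique can be illustrated with a memoised tree evaluation in which each sub-term is keyed by its symbolic form and by the signal it was computed on. This is a sketch under assumptions: the `PriorResultsCache` class, the nested-tuple term format and the three toy operators are hypothetical. The second call reuses the cached `DERIV.S` result, so only `MEAN` is recomputed.

```python
import time

# Hypothetical elementary functions operating on lists of floats.
OPS = {
    "ABS": lambda xs: [abs(x) for x in xs],
    "DERIV": lambda xs: [b - a for a, b in zip(xs, xs[1:])],
    "MEAN": lambda xs: sum(xs) / len(xs),
}


class PriorResultsCache:
    """Maps (symbolic sub-term, signal id) -> [value, calc_seconds, hits]."""

    def __init__(self):
        self.entries = {}

    def lookup(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return False, None
        entry[2] += 1  # count reuse, for a later usefulness criterion
        return True, entry[0]

    def store(self, key, value, calc_seconds):
        self.entries[key] = [value, calc_seconds, 0]


def evaluate(term, signal_id, signal, cache, log):
    """Evaluate a term such as ("MEAN", ("ABS", "S")), reusing any cached
    sub-term result; `log` records each operator actually computed."""
    if term == "S":
        return signal
    key = (repr(term), signal_id)  # symbolic view of the sub-term
    hit, value = cache.lookup(key)
    if hit:
        return value
    op, arg = term
    start = time.perf_counter()
    value = OPS[op](evaluate(arg, signal_id, signal, cache, log))  # operator view
    log.append(op)
    cache.store(key, value, time.perf_counter() - start)
    return value


cache, log = PriorResultsCache(), []
signal = [0.0, 1.0, -1.0, 2.0]
first = evaluate(("MEAN", ("ABS", ("DERIV", "S"))), "T1", signal, cache, log)
second = evaluate(("MEAN", ("DERIV", "S")), "T1", signal, cache, log)
```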
  • the number of entries in the prior results cache 24 can become too large for an efficient use of allowable memory space and search.
  • a monitoring algorithm can therefore be used which regularly checks the usefulness of each result stored in the cache 24 according to a determined criterion and deletes those found not to be useful.
  • the criterion for keeping a result Ri in the cache 24 is a function which takes into account: i) the calculation time to produce Ri, ii) the frequency of use of Ri, and iii) the size (in bytes) of Ri. The last condition can be disregarded if available memory space is not an issue, or if it is managed separately by the computer.
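The usefulness criterion combining calculation time, frequency of use and size can be sketched as a score plus a pruning pass. The text does not say how the three factors are combined, so the product `calc_seconds * hits`, optionally divided by `size_bytes`, is an assumed formula; the `Entry` fields, the string keys and the threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    calc_seconds: float  # i) time taken to compute the cached result
    hits: int            # ii) how often the result has been reused
    size_bytes: int      # iii) memory footprint of the result


def usefulness(e, use_size=True):
    """Assumed scoring rule: computation time saved by reuse, optionally per
    byte (use_size=False when memory is not an issue)."""
    saved = e.calc_seconds * e.hits
    return saved / e.size_bytes if use_size else saved


def prune(cache, threshold, use_size=True):
    """Delete entries scoring below threshold; return the removed keys."""
    doomed = [k for k, e in cache.items() if usefulness(e, use_size) < threshold]
    for k in doomed:
        del cache[k]
    return doomed


cache = {
    "FFT.S@T1": Entry(calc_seconds=2.0, hits=10, size_bytes=1000),
    "ABS.S@T1": Entry(calc_seconds=0.01, hits=1, size_bytes=1000),
    "MEAN.FFT.S@T1": Entry(calc_seconds=2.1, hits=0, size_bytes=8),
}
removed = prune(cache, threshold=1e-3)
```

With this assumed rule, a freshly stored result that has never been reused scores zero and is pruned immediately; a practical policy would probably give new entries a grace period.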
  • after a given number of cycles or a given execution time, according to a chosen criterion, the system 2 produces as its user data output a descriptor extraction (DE) function 4 (cf. figure 1).
  • the latter is the member of the latest generation population Pf of compound functions CF(f) that has been found to have the best fit for the descriptor De.
  • the user output can produce more than one member of that population, for instance the b best fit functions CF(f), where b is an arbitrary integer, or those compound functions that exhibit a fit better than a given threshold.
  • the criterion for ending the loop back to creating a new population of functions is arbitrary, an ending criterion being for example one or a combination of: i) execution time, ii) quality of results in terms of the functions' fitness, iii) number of generations of functions (loops executed), etc.
  • before a compound function is finally output as a DE function for future exploitation, it is validated against signals of other music titles taken from the validation database 18.
  • these signals are not used to influence the construction of the DE functions 4; they serve as a neutral reference against which to check their effectiveness.
  • the checking procedure involves determining the degree of fit between on the one hand a descriptor value obtained by making a DE function operate on a signal Sv of the validation database and on the other the grounded truth descriptor value associated to the music title of that signal Sv.
  • An overall correlation or validation value is generated by statistical analysis over a given number of entries of the validation database 18. If the validation value is above an acceptable threshold, the DE function 4 is validated and thus considered to be exploitable. In the opposite case, the DE function is rejected and another DE function is considered.
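The validation step might look like the sketch below, which uses the Pearson correlation coefficient as the statistical measure of fit and an acceptance threshold of 0.7. Both choices, and the `mean_abs` example DE function, are assumptions; the text only requires an overall correlation or validation value compared against a threshold.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def validate(de_function, validation_set, threshold=0.7):
    """validation_set holds (signal, grounded_truth) pairs never used during
    learning; the DE function is accepted if the correlation reaches threshold."""
    predicted = [de_function(s) for s, _ in validation_set]
    truth = [g for _, g in validation_set]
    score = pearson(predicted, truth)
    return score >= threshold, score


def mean_abs(signal):
    # Hypothetical DE function: mean absolute amplitude as a crude energy proxy.
    return sum(abs(x) for x in signal) / len(signal)


data = [([0.1, -0.1], 1.0), ([0.5, -0.4], 5.0), ([0.9, -1.0], 9.5)]
accepted, score = validate(mean_abs, data)
```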
  • Figure 12 is a flowchart illustrating some steps performed by the system 2 of figure 2 in the course of producing a descriptor extraction function DE 4, these being:
  • Heuristics and/or rules can be entered, edited, modified through the user interface unit 26 e.g. by manual input (keyboard) or by download, thereby making the system fully adaptive and configurable.
  • the system generates several hundred compound functions over a twelve-hour period.
  • the learning database preferably comprises at least several hundred titles, and preferably several thousand.
  • the handling of such large databases is simplified by the use of the above caching technique and heuristics.
  • Parallel processing, where the same function is calculated on several titles simultaneously using respective processors over a network, can also be envisaged.
  • the size of the compound functions is typically of the order of ten elementary functions.
  • the system is remarkable in that it does not need to be informed of the descriptor De for which it must find a suitable DE function. In other words, all that is necessary is to provide examples of just the descriptor values Dgti associated with music titles Ti and their signal data Si. This makes the system 2 completely open as regards descriptors, and amenable to generating suitable DE functions for different descriptors without requiring any initial formal training or programming specific to a given descriptor.
  • the system is connected to a network, such as Internet or a LAN, in order to facilitate the acquisition of music titles through a download centre 36.
  • the networking also makes it possible to share and exchange elementary functions, compound functions, heuristics, rules and DE functions found to be interesting, as well as results data for the prior results cache 24, allowing parallel processing, etc. In this way, an interactive community of searchers can be fostered, allowing a rapid spread of new developments.
  • the heuristics and/or rules can be entered / edited / parameterised through the user interface unit 26; they can also be generated / adapted internally by the system, e.g. by processing techniques based on analysing compound functions that produce the best fits and determining common features thereof expressible as rules and/or heuristics.
  • Figure 12 is an example of different DE functions and their fitness, produced automatically by the system for evaluating the presence of voice in a music title.
  • Figure 13 is an example of different compositions of DE functions in terms of elementary functions, and their fitness produced automatically by the system to evaluate the global energy of music titles.
  • the method and data implemented by the system can be presented as executable code forming a software product stored on a computer-readable recording medium, e.g. a CD-ROM or downloadable from a source, the code executing all or part of operations presented.
  • the remarkable aspects of the present automated system 2 can be appreciated by considering how the task would have to be approached manually.
  • the starting point is the raw data signals as seen by the specialist in signal processing.
  • the latter tries out various processing functions according to an empirical methodology, in the expectation that some rule will emerge for correlating complex signal characteristics with that descriptor.
  • the approach is extremely heuristic in nature. It is also largely based on trial and error.
  • the programmed system 2 is able to generate an exploitable DE function 4 from scratch using just the user data input indicated with reference to figure 1.
  • the DE function typically takes on the form of executable code or instructions comprehensible to a human or machine.
  • the contents of the DE function thereby allow processing on the audio data signal of any given music title to extract its descriptor De, the latter being referenced to the function.
  • the process of extracting in this way the descriptor De of a music title can be performed by an apparatus which is separate from the system.
  • the apparatus in question takes for input the DE function (or set of DE functions) produced by the system 2 and audio files containing signals for which a descriptor has to be generated.
  • the output is then the descriptor value Dx of the descriptor De for the or each corresponding music title Tx.
  • the DE function (or set of DE functions) produced by the system 2 is in this case considered as a product in its own right for distribution either through a network, or through a recordable medium (CD, memory card, etc.) in which it is stored.
  • the system 2 already includes all the hardware and software necessary to constitute an automated descriptor generating apparatus as defined in the preceding section.
  • the DE functions shown as user data output of figure 1 are fed back to the system (or kept within system and stored).
  • the system can be switched to the descriptor extraction mode in which audio signal data corresponding to a music file Tx to be analysed is supplied as an input and the corresponding music descriptor value of Tx for the descriptor De is provided as the output.
  • the system is implemented more as an authoring tool.
  • the system allows the outputted DE functions to be modified by external intervention, generally by a human operator.
  • the rationale here is that in some cases, while the functions produced automatically may not be strictly optimal, they are nevertheless highly interesting as a starting basis for optimisation, or "tweaking".
  • the advantage in this case is that the human specialist has at his disposal a descriptor extraction function which, firstly, has already proven effective compared with a large number of other possible functions, indicating that it possesses a sound structure, and, secondly, has proven amenable to fast and consistent execution.
  • in this case too, the DE function output by the system 2 can be modified either at the level of a basic elementary function taken as a symbolic object, e.g. by substitution, removal, or addition, or at the level of the internal parameterisation of a basic elementary function, e.g. by changing the cut-off frequency value of the low-pass filtering elementary function.
  • the aspect of the system 2 that analyses and evaluates compound functions can be put at the disposal of external sources of candidate DE functions, so as to help designers evaluate their own descriptor extraction functions.
  • the evaluation can be used to provide an objective assessment of the "fitness" FIT of such a candidate function with respect to the learning database 10 or validation database 18.
  • the function calculation potential of the system 2 can be put at the disposal of outside users.
  • the latter can then input a given complex signal processing function (not necessarily in the context of descriptor extraction) and receive a calculated value as an output.

Abstract

The method involves generating compound functions composed of elementary functions, by using programmed units. Compound functions are operated on a reference signal with units that process elementary functions as executable operators. A correlation between compound function values and a global characteristic value of a reference signal is determined to select an extraction function (4) with high correlation. The programmed units handle the elementary functions both as symbolic objects and as executable operators. Independent claims are also included for the following: (a) an apparatus for generating a general function that can operate on an input signal (Sx) to extract a preset global characteristic value (DVex) expressing a feature of the information (De) conveyed by that signal; (b) a software product containing executable code which, when loaded in a data processing apparatus, enables the latter to perform the method of generating a general extraction function.

Description

  • The invention relates to the field of signal processing, and more particularly to a technique for deriving automatically high level information expressed by an electronic input signal by analysing the signal's low-level characteristics. In this context, the term high-level refers to the global characteristics of the signal content, while the term low-level refers to the fine grain structure of the signal itself, typically at the level of its temporal or spatial modulation.
  • For instance, in the case of audio signals corresponding to a given music title, such as contained in an audio file readable by a music player, examples of its high-level expression would be an indication of whether the title pertains to a sung or instrumental piece of music, the musical genre, musical complexity, overall timbre, tempo, or the rhythm structure, etc., while the low-level characteristics would be the signal's time-dependent parameters such as amplitude, pitch, etc. analysed over successive short sampling periods. The signals in question can thus be in the form of digital data accessed from a memory or inputted as a digital stream, or they can be in analogue form.
  • In such audio applications, the high-level information is normally known by the term "descriptor". Generally, a descriptor expresses a quality, or dimension, of the content represented by the signal, and which is meaningful to a human or to a machine for processing high-level information. Depending on what they express, descriptors attribute a value which can be of different types:
    • a Boolean, e.g. true/false to indicate whether or not a music title is sung,
    • a number to express information quantitatively against a reference scale, e.g. 7.3 against a scale of 1 to 10 for a global music energy descriptor,
    • an indication of a selection from a list of labels, e.g. "military music" to indicate a musical genre from a preset list.
  • In the field of music, descriptors are of interest notably in the expanding field of music access systems and Electronic Music Distribution (EMD). To facilitate user access to large music databases, descriptors of music titles are needed. EMD belongs to the more general concept of music information retrieval (MIR), which is the technique of intelligently searching and accessing musical information in large music databases.
  • Traditionally, EMD systems use manually entered descriptors (e.g. using software systems developed commercially by the companies "Moodlogic" and "AllMusicGuide"). The descriptors are then used for accessing music browsers, using a search by similarity, a search by example, or any other known database searching technique.
  • A key issue in automatically extracting descriptors from audio signals is that it is very difficult to map signal properties onto perceptual categories. In the prior art, attempts have been made to extract specific descriptors from a sound signal, as documented notably in:
    • Scheirer, Eric D., "Tempo and Beat Analysis of Acoustic Musical Signals", J. Acoust. Soc. Am. (JASA) 103:1 (Jan 1998), pp. 588-601, for tempo,
    • Aucouturier, Jean-Julien, Pachet, Francois, "Music Similarity Measures: What's the Use?", Proceedings of the 3rd International Symposium on Music Information Retrieval (ISMIR02), Paris, France, October 2002, for timbre,
    • Pachet, F., Delerue, O., Gouyon, F., "Extracting Rhythm from Audio Signals", SONY Research Forum, Tokyo, December 2000, for rhythm, and
    • Berenzweig, A. L., Ellis, D. P. W., "Locating Singing Voice Segments within Music Signals", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA01), Mohonk, NY, October 2001.
  • There are however many other dimensions, i.e. descriptors, of music that can be extracted from the signal. For instance:
    • danceability
    • music for children
    • military music
    • music for slow
    • global energy
    • sung versus instrumental
    • original versus remix
    • acoustic versus electr(on)ic
    • live versus studio
    • musical complexity
    • musical density
    • etc.
  • While such descriptors are readily discernible by a human listener, the technical problem of producing them electronically from raw music data signals is reputed to be particularly difficult. For instance, there is no immediately apparent low-level characteristic of a raw music signal from which it is possible to identify whether it pertains to a sung piece or to an instrumental. This is particularly true when the sung voice is mixed with music. Even the global energy descriptor has no straightforward link with the energy level of the raw signal.
  • Some descriptors, such as the musical genre, are influenced by cultural references and therefore require criteria to be entered from a specific population sample.
  • In view of the foregoing, the invention provides for an automated tool which takes for input a test database containing a set of reference signals, for instance audio files readable by a music player, at least one arbitrary descriptor that can be potentially correlated to the signals, a grounded truth value of that descriptor for each of the database signals and a set of elementary signal processing functions. The tool then selects functions of that set to construct one or more compound functions, and automatically applies them to the signals of the database. Depending on the correlations between the values returned by the functions and the grounded truths, new compound functions are created and tried, until an arbitrary end condition is reached.
  • More particularly, according to a first aspect, the present invention relates to a method of generating a general extraction function which can operate on an input signal to extract therefrom a predetermined global characteristic value expressing a feature of the information conveyed by that signal,
       characterised in that it comprises the steps of:
    • generating automatically compound functions, each compound function being composed of at least one of a set of elementary functions, by using means that handle the elementary functions as symbolic objects,
    • operating said compound functions on at least one reference signal having a pre-attributed global characteristic value and serving for evaluation, by using means that process the elementary functions as executable operators,
    • determining the correlation between the values extracted by those compound functions as a result of operating on the reference signal and the pre-attributed global characteristic value of the reference signal, and
    • selecting the general extraction function among those compound functions for which the correlation is relatively high.
  • The invention provides for many advantageous optional embodiments, which are outlined below.
  • The compound functions are preferably generated in successive populations,
       wherein each new population of functions takes as a basis earlier population functions which produce a relatively high correlation.
  • The method can be performed by the steps of:
  • a) preparing at least one reference signal for which the predetermined global characteristic value is pre-attributed,
  • b) preparing a population of compound functions each composed of at least one elementary function,
  • c) modifying compound functions of the current population using the means that handle their elementary functions as symbolic objects,
  • d) operating the compound functions of the population on at least one reference signal using the means that exploit the elementary functions as executable operators, to obtain a calculated value for each compound function of the population in respect of the reference signal,
  • e) for at least some compound functions of the population, determining the degree of matching between its calculated value and the pre-attributed value for the signal from which that value has been calculated,
  • f) selecting compound functions of the population producing the best matches to form a new population of functions,
  • g) if an ending criterion is not satisfied, returning to step c), where the new population becomes the current population,
  • h) if an ending criterion is satisfied, outputting at least one compound function of the current new population as a general function.
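Steps a) to h) can be sketched as a small generational loop. Everything concrete here is hypothetical: compound functions are simplified to chains of four toy elementary functions followed by a mean, the matching measure is a negated mean absolute error rather than a correlation, step c) uses only a single-operator mutation, and the ending criterion is a fixed number of generations. Keeping the r best functions unchanged in each new population ensures that the best fitness never decreases from one generation to the next.

```python
import random

# Hypothetical elementary functions on lists of floats.
ELEMENTARY = {
    "ABS": lambda xs: [abs(x) for x in xs],
    "SQUARE": lambda xs: [x * x for x in xs],
    "DERIV": lambda xs: [b - a for a, b in zip(xs, xs[1:])] or [0.0],
    "NEG": lambda xs: [-x for x in xs],
}


def run(cf, signal):
    """Step d): apply a chain such as ["SQUARE", "DERIV"] right to left
    (like SQUARE.DERIV.S), then reduce with a mean."""
    for name in reversed(cf):
        signal = ELEMENTARY[name](signal)
    return sum(signal) / len(signal)


def fitness(cf, refs):
    """Step e): negated mean absolute error (an assumed matching measure)."""
    err = sum(abs(run(cf, s) - v) for s, v in refs) / len(refs)
    return -err if err == err else float("-inf")  # NaN from overflow ranks last


def random_cf(rng, max_len=3):
    return [rng.choice(list(ELEMENTARY)) for _ in range(rng.randint(1, max_len))]


def modify(cf, rng):
    """Step c): minimal symbolic modification (replace, insert or delete one op)."""
    cf = list(cf)
    i = rng.randrange(len(cf))
    moves = ["replace", "insert", "delete"] if len(cf) > 1 else ["replace", "insert"]
    move = rng.choice(moves)
    if move == "replace":
        cf[i] = rng.choice(list(ELEMENTARY))
    elif move == "insert":
        cf.insert(i, rng.choice(list(ELEMENTARY)))
    else:
        del cf[i]
    return cf


def search(refs, pop_size=20, r=5, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_cf(rng) for _ in range(pop_size)]             # step b)
    history = []
    for _ in range(generations):                                       # loop of step g)
        ranked = sorted(population, key=lambda cf: fitness(cf, refs), reverse=True)
        best = ranked[:r]                                              # step f)
        history.append(fitness(best[0], refs))
        population = best + [modify(rng.choice(best), rng) for _ in range(pop_size - r)]
    return max(population, key=lambda cf: fitness(cf, refs)), history  # step h)


# Step a): reference signals with a pre-attributed value (here, mean energy).
REFS = [([1.0, -2.0, 3.0], 14 / 3), ([0.5, 0.5, -0.5], 0.25), ([2.0, 0.0, -2.0], 8 / 3)]
best, history = search(REFS)
```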
  • The compound functions are preferably produced by random choices guided by rules and/or heuristics.
  • The rules and/or heuristics can comprise at least one rule which forbids, from a random draw for selecting an elementary function to be associated with a part of a compound function under construction, an elementary function that would be formally inappropriate for that part.
  • The rules and/or heuristics can comprise at least one heuristic which favours, in a random draw for selecting an elementary function to be associated with a part of a compound function under construction, an elementary function which is considered to produce potentially useful technical effects in association with that part, and/or which discourages from the random draw an elementary function considered to produce technical effects of little or no use in association with that part.
  • The rules and/or heuristics can comprise at least one heuristic which ensures that a compound function comprises only elementary functions that each produce a meaningful technical effect in their context.
  • The rules and/or heuristics can comprise at least one heuristic which takes into account at least one overall characteristic of the reference signals.
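A random draw restricted by a formal rule and by a "meaningful effect" heuristic could be sketched as follows. The type signatures and the heuristic (ABS is pointless on a result that is already non-negative) are illustrative assumptions, not rules taken from the system.

```python
import random

# Hypothetical signatures: name -> (operand type, output type).
SIGNATURES = {
    "FFT": ("signal", "spectrum"),
    "LOWPASS": ("signal", "signal"),
    "ABS": ("signal", "signal"),
    "SQUARE": ("signal", "signal"),
    "MEAN": ("signal", "float"),
    "PEAK_FREQ": ("spectrum", "float"),
}


def formally_allowed(name, operand_type):
    """Formal rule: the function must accept the operand's type."""
    return SIGNATURES[name][0] == operand_type


def meaningful(name, operand_term):
    """Assumed 'meaningful effect' heuristic: ABS adds nothing on top of
    ABS or SQUARE, whose outputs are already non-negative."""
    head = operand_term.split(".", 1)[0]
    return not (name == "ABS" and head in ("ABS", "SQUARE"))


def draw(operand_term, operand_type, rng):
    """Random draw restricted to formally correct, meaningful candidates."""
    pool = [n for n in SIGNATURES
            if formally_allowed(n, operand_type) and meaningful(n, operand_term)]
    return rng.choice(pool) if pool else None
```

For example, `draw("FFT.S", "spectrum", rng)` can only return `PEAK_FREQ`, because no other hypothetical function accepts a spectrum.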
  • Advantageously, a new population of functions is produced using genetic programming techniques.
  • The genetic programming techniques comprise at least one of the following:
    • crossover,
    • mutation,
    • cloning.
  • A crossover operation and/or a mutation operation can be guided by at least one heuristic cited above.
  • The means that handle the elementary functions as symbolic objects preferably manage the functions in accordance with a tree structure comprising nodes and connecting branches, in which each node corresponds to a symbolic representation of a constituent unit function, the tree having a topography in accordance with the structure of the function.
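The dual handling of elementary functions, as symbolic objects and as executable operators, can be sketched with one tree class that can both render itself symbolically and execute itself on a signal. The operator table and the rendering convention (dotted for unary chains, as in "FFT.DERIV.FFT.S", parenthesised otherwise) are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical elementary functions, usable as executable operators.
OPS = {
    "ABS": lambda xs: [abs(x) for x in xs],
    "DERIV": lambda xs: [b - a for a, b in zip(xs, xs[1:])],
    "ADD": lambda xs, ys: [x + y for x, y in zip(xs, ys)],
    "MEAN": lambda xs: sum(xs) / len(xs),
}


@dataclass
class Node:
    op: str                                       # symbol of the elementary function
    children: list = field(default_factory=list)  # operand sub-trees (branches)

    def symbol(self):
        """Symbolic view: render the tree, dotted for unary chains."""
        if not self.children:
            return self.op
        if len(self.children) == 1:
            return f"{self.op}.{self.children[0].symbol()}"
        return f"{self.op}({', '.join(c.symbol() for c in self.children)})"

    def evaluate(self, signal):
        """Operator view: execute the tree on a signal bound to the leaf "S"."""
        if self.op == "S":
            return signal
        return OPS[self.op](*(c.evaluate(signal) for c in self.children))


tree = Node("MEAN", [Node("ABS", [Node("DERIV", [Node("S")])])])
```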
  • Advantageously, the method further comprises a step of submitting a compound function to at least one rewriting rule executed by processing means to ensure that said compound function is cast in its most rational form or most efficient form in respect of execution efficiency.
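Rewriting rules of this kind can be sketched as a bottom-up simplifier over terms. The four identities used here (for instance NEG(NEG(x)) = x) are hypothetical examples, chosen because they hold exactly for real-valued signals, so a rewritten function always returns the same value as the original while executing fewer operators.

```python
def rewrite(term):
    """Return an equivalent, simpler term. A term is "S" (the raw signal) or
    a pair (op, operand); operands are simplified first (bottom-up)."""
    if term == "S":
        return term
    op, arg = term
    arg = rewrite(arg)
    inner = arg[0] if isinstance(arg, tuple) else None
    if op == "ABS" and inner in ("ABS", "SQUARE"):
        return arg                          # ABS(ABS(x)) = ABS(x), ABS(SQUARE(x)) = SQUARE(x)
    if op == "ABS" and inner == "NEG":
        return rewrite(("ABS", arg[1]))     # ABS(NEG(x)) = ABS(x)
    if op == "SQUARE" and inner in ("NEG", "ABS"):
        return rewrite(("SQUARE", arg[1]))  # SQUARE(NEG(x)) = SQUARE(ABS(x)) = SQUARE(x)
    if op == "NEG" and inner == "NEG":
        return arg[1]                       # NEG(NEG(x)) = x
    return (op, arg)
```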
  • Preferably the method uses a caching technique for evaluating a function, in which results of previously calculated parts of functions are stored in correspondence with those parts, and a function currently under calculation is initially analysed to determine whether at least a part of said function can be replaced by a corresponding stored result, said part being replaced by its corresponding result if such is the case.
  • The method can then comprise the steps of checking the usefulness of results stored according to a determined criterion, and of erasing those found not to be useful, the criterion for keeping a result Ri being a function which takes into account: i) the calculation time to produce Ri, ii) the frequency of use of Ri and, optionally, iii) the size (in bytes) of Ri.
  • The elementary functions can comprise signal processing operators and mathematical operators.
  • The method can further comprise a step of validating a general function against at least one reference signal having a known value for the general characteristic, and which was not used to serve as the reference.
  • The signal can express an audio content, and the global characteristic can be a descriptor of the audio content.
  • The audio content can be in the form of an audio file, the signal being the signal data of the file.
  • Examples of descriptors for which the invention can be used are:
    • a global energy indication,
    • a sung or instrumental audio content,
    • an evaluation of the danceability,
    • an acoustic or electric sounding audio content,
    • presence or absence of a solo instrument, e.g. guitar or saxophone solo.
  • According to a second aspect, the invention relates to a method of extracting a global characteristic value expressing a feature of the information conveyed by a signal, characterised in that it comprises calculating for that signal the value of a general function produced specifically by the method according to the first aspect for that global characteristic.
  • According to a third aspect, the invention relates to an apparatus for generating a general function which can operate on an input signal to extract therefrom a predetermined global characteristic value expressing a feature of the information conveyed by that signal,
       characterised in that it comprises:
    • means for generating automatically compound functions, each compound function being composed of at least one of a set of elementary functions, the means handling the elementary functions as symbolic objects,
    • means for operating the compound functions on at least one reference signal having a pre-attributed global characteristic value serving for evaluation, the means processing the elementary functions as executable operators,
    • means for determining the correlation between the values extracted by those compound functions as a result of operating on the reference signal and the pre-attributed global characteristic value of the reference signal, and
    • means for selecting the general extraction function among those compound functions for which the correlation is relatively high.
  • According to a third aspect, the invention relates to an apparatus according to the third aspect configured to execute any one of the optional aspects of the method set out above, it being understood that the features defined in the context of the method can be implemented mutatis mutandis to the apparatus.
  • According to a fourth aspect, the invention relates to the use of the apparatus according to the third aspect as a fully autonomous automatic descriptor extraction function generating system.
  • According to a fifth aspect, the invention relates to the use of the apparatus according to the third aspect as a descriptor extraction means.
  • According to a sixth aspect, the invention relates to the use of the apparatus according to the third aspect as an authoring tool for producing descriptor extraction functions.
  • According to a seventh aspect, the invention relates to the use of the apparatus according to the third aspect as an evaluation tool for externally produced descriptor extraction functions.
  • According to an eighth aspect, the invention relates to a general function in a form exploitable by an electronic machine, produced specifically by the apparatus according to the third aspect.
  • According to a ninth aspect, the invention relates to a software product containing executable code which, when loaded in a data processing apparatus, enables the latter to perform the method according to the first aspect.
  • In the preferred embodiment, the above iterative search procedure through successive populations is implemented by what is known as genetic programming. The functions, which typically take the form of executable code, are tried, and the results serve to create new populations of functions automatically in accordance with genetic programming techniques: the best-fitting functions are retained, in a manner somewhat analogous to natural selection, and the selected functions are submitted to operations corresponding, e.g., to the crossover and mutation phenomena that occur in biological processes at chromosome level. The notable aspect here is that a genetic programming technique is applied to functions which take raw electronic signals as their argument.
  • When applied to the field of music files, the proposed invention makes it possible to extract arbitrary descriptors from music signals. More precisely, the embodiment does not extract a particular descriptor; rather, given a set of music titles containing examples (and possibly counter-examples) for a given descriptor, it automatically builds a function that extracts an optimum value of that descriptor from audio signals. The same system can be used to produce a function associated with an arbitrary descriptor, such as one of those listed in the earlier part of the introduction. That function can then be exploited as a general function for the associated descriptor, in the sense that it can subsequently be made to operate on any music file to extract the value of the descriptor for that file (assuming the file's signals are compatible).
  • The design of the system is based on extensive experiments in the field of audio/music descriptor extraction. During these experiments the applicant observed that a deep knowledge of signal processing was required to design accurate and robust signal processing extractors. Each extractor can be seen here as a function that takes a given music signal (typically 3 minutes of audio) as its argument and outputs a value. This value can be of various types: a float (for the tempo), a vector (for the timbre), a symbol (for instrumental versus song discrimination), etc.
  • The main task of extractor design is to find the right composition of basic, low-level signal processing functions to yield a value that is as correlated as possible to the values obtained by psycho-acoustic tests.
  • The preferred embodiment contains a representation of human expertise in signal processing: it tries different combinations of signal processing functions, evaluates them, and compares them against human perceptive values. Using an algorithm based on genetic programming, different signal processing functions are tried concurrently and modified until a satisfactory extractor function is found.
  • Compared with existing approaches to music descriptor extraction, the system operates one level higher: its primary function is not to produce a descriptor for a signal, but rather a function which will itself produce the descriptor when applied to other music file signals, e.g. taken from a database of signals.
  • The invention and its advantages shall become more apparent from reading the following description of the preferred embodiments, given purely as nonlimiting examples, with reference to the appended drawings in which:
    • figure 1 is a diagram showing the basic user input and output of a programmed system for automatically generating descriptor extraction functions in accordance with the invention;
    • figure 2 is a simplified block diagram showing the main functional units of the system shown in figure 1;
    • figure 3 is a symbolic illustration showing the formal compatibility requirements for two grouped elementary functions forming part of a compound function produced by the system of figure 2;
    • figure 4 is a symbolic illustration of an elementary function for performing a low-pass filtering operation on a signal;
    • figure 5 is a symbolic illustration of an elementary function for performing a short-time fast Fourier transform operation on a signal;
    • figure 6 is a symbolic illustration of a grouping of elementary functions forming a term in a compound function;
    • figure 7 is a diagram showing an example of a tree structure symbolic representation of a compound function;
    • figure 8 is a diagram showing a matrix of values calculated on a set of reference signals for a population of compound functions, and how those values are used to determine the fit of those functions with respect to a descriptor associated with the music contents of those signals;
    • figure 9 is a diagram showing, through a tree structure representation, how parts of two compound functions are combined to form a new compound function using a crossover operation according to a genetic programming technique;
    • figure 10 is a diagram showing, through a tree structure representation, how a compound function is mutated into a new compound function using a mutation operation according to a genetic programming technique;
    • figure 11 is a diagram showing, through a tree structure representation, how a caching technique is implemented to acquire results data for a prior-results data cache and to substitute a part of a function under calculation with a previously calculated result;
    • figure 12 is a flow chart showing the general steps performed by the system of figure 2 for producing a descriptor extraction function;
    • figure 13 is an example of different functions and their fitness produced automatically by the system of figure 2 for evaluating the presence of voice in music titles; and
    • figure 14 is an example of different compositions of descriptor extraction functions in terms of elementary functions, and their fitness produced automatically by the system to evaluate the global energy of music titles.
  • Figure 1 depicts a system 2 in accordance with the invention and indicates the raw data on which it operates (user data input) and the output (user data output) it produces from that data. The example is based on a music data application, in which the system 2 generates as its user data output an executable function 4, referred to as a descriptor extractor function (DE function). This function is then packaged in a data carrier 5 in a form suitable for extracting a given descriptor from an arbitrary audio file 6. The latter is typically formatted according to a recognised standard such as CD audio, MP3, MPEG7, WAV, etc., exploitable by a music player, and contains a musical piece with which a descriptor value Dx is to be associated. The DE function 4 operates on the raw data signal Sx of the audio file 6, i.e. it takes the latter as its argument or operand, and returns the descriptor value DVex for that file. Naturally, the signal Sx is assumed to be compatible with the DE function 4 as regards data format. As mentioned in the introductory portion, the descriptor value is typically a number, a Boolean, or a statement, and generally belongs to the class of real objects R^n.
  • The above data carrier 5 typically comprises a software package which can contain other DE functions, e.g. for extracting other descriptor values, and possibly auxiliary software code, e.g. for management and user assistance. The data carrier 5 can be a physical entity, such as a CD-ROM, or it can be in immaterial form, e.g. as downloadable software accessible from the Internet.
  • The system 2 generates the DE function 4 on the basis of both the user data input and internally programmed parameters, functions and algorithms, as shall be detailed later.
  • The user data input serves inter alia to feed an internal learning database and constitutes the raw learning material from which to model the DE function. This material includes a set of m audio files A1 to Am and, for each one Ai (1 ≤ i ≤ m), a given value Dgti of a specific descriptor De for the audio item Ti it contains. The audio files Ai are formatted as for file 6 above, and thus each produce a respective signal Si when accessed to reproduce the audio item Ti.
  • The respective descriptor values Dgt1-Dgtm associated with the audio files are established by a human judge, or a panel of human judges. For instance, if the descriptor De in question is the "global energy" of the music title, the judge or panel awards each respective title Ti a number within a range from a minimum (the level of a lullaby, for instance) to a maximum, and this number constitutes the title's descriptor value Dgti. These values Dgti are referred to as "grounded truth" descriptor values.
  • Figure 2 shows the general architecture of the system 2. The system is preferably implemented using the hardware of a standard personal computer PC. For ease of understanding, the different types of data used are divided into respective databases 10-18 under the general control of a data management unit 20, which further manages the overall data flow of the system 2. The databases comprise:
    • a learning database 10, which stores the signal data S1-Sm of the reference audio files A1-Am in association with their corresponding grounded truth descriptor values Dgt1-Dgtm, supplied as the user data input (cf. figure 1);
    • a library 12 of elementary functions EF1, EF2, EF3, ..., which serve as the basic building blocks from which compound functions CF are created on a guided (or constrained) random basis. A selected compound function, or possibly a selected group of compound functions, shall become an outputted DE function 4;
    • a heuristics database 14, which contains different types of guiding or constraining rules that come into play in conjunction with random selection events, notably at different stages in the elaboration of compound functions, as shall be explained in more detail below;
    • a formal rules and rewriting rule database 15, which contains a set of deterministic rules for recasting automatically-generated compound functions into their formally correct and most rational form;
    • a prior results cache 16, which stores results of previously calculated parts of compound functions in view of obviating the need to recalculate them when subsequently encountered; and
    • a validation database 18, which contains the same type of data as the learning database 10, but for other music titles. The audio data contained in that database are not used as reference for elaborating the compound functions, and thus constitute a neutral source for ultimately testing the validity of a candidate DE function 4 selected among the compound functions.
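The role of the prior results cache 16 can be pictured with a minimal Python sketch. It assumes, for illustration only, that a compound function is held as a nested tuple such as ("MAX", ("DERIV", "S")) and that cached values are keyed by the sub-expression together with an index identifying the reference signal; the class name PriorResultsCache and the toy elementary functions are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple, Union

Expr = Union[str, Tuple[str, Any]]  # "S" (the signal) or (function name, operand expression)


class PriorResultsCache:
    """Hypothetical prior-results cache: stores the value of each evaluated
    sub-expression per reference signal, so shared parts are computed once."""

    def __init__(self, library: Dict[str, Callable[[Any], Any]]):
        self.library = library                      # elementary functions by name
        self.store: Dict[Tuple[Expr, int], Any] = {}
        self.hits = 0                               # number of reused results

    def evaluate(self, expr: Expr, signal_id: int, signal: Any) -> Any:
        if expr == "S":
            return signal
        key = (expr, signal_id)
        if key in self.store:                       # previously calculated part
            self.hits += 1
            return self.store[key]
        name, operand = expr
        value = self.library[name](self.evaluate(operand, signal_id, signal))
        self.store[key] = value
        return value


library = {
    "DERIV": lambda s: [b - a for a, b in zip(s, s[1:])],  # first difference
    "MAX": max,
    "MIN": min,
}
cache = PriorResultsCache(library)
signal = [0, 1, 3, 6]
print(cache.evaluate(("MAX", ("DERIV", "S")), 0, signal))  # 3
print(cache.evaluate(("MIN", ("DERIV", "S")), 0, signal))  # 1, reusing DERIV(S)
print(cache.hits)                                          # 1
```

Keying on a signal index rather than on the signal data itself keeps lookups cheap; this is a choice made for the sketch, not something the description specifies.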
  • The signal processing and overall management of the system are carried out by a main processor unit 22 which runs programs contained in a main program memory 24. A user interface 26 associated to a monitor 28, keyboard 30 and mouse 31 allows the user input and output data of figure 1, as well as the internal programming data, to be entered and extracted.
  • Figure 3 illustrates the principle of an elementary function EF as exploited by the system 2. Being effectively an operator, the elementary function comprises executable code and one or more parameters, which it can receive as input Pin and which define the elementary function's boundary conditions. An elementary function acts on an operand, or argument 32 (which can be signal data or the output of a preceding elementary function), and generates an output that is the result of executing the code on the operand data. An elementary function EF is catalogued in the system inter alia by the type of operand, designated Toper, on which it can operate and by the type of output, designated Tout, that it delivers. Types Toper and Tout can be the same or mutually different for a given elementary function. Typical types include: signal, numerical (single number, float, range), vector, or matrix. As explained further below, the system 2 treats elementary functions EF, which can be likened to modules, either as symbolic objects or as executable operators, depending on the nature of the processing required in the course of elaborating a compound function CF.
  • Figure 4 illustrates an example of an elementary function in the form of a low-pass filter (LPF) operator. As such, its executable code comprises a digital LPF algorithm and its input parameters Pin are the cut-off frequency F and, optionally, the attenuation rate (dB/octave). The operand and output types are respectively Toper=Signal and Tout=Signal.
  • Figure 5 illustrates another example of an elementary function, this time in the form of a short-time fast Fourier transform (short-time FFT) operator. The executable code comprises a short-time FFT algorithm, and its input parameters Pin are the sampling window and summation limits. The operand and output types are respectively Toper=Signal and Tout=Matrix.
  • Figure 6 illustrates the principle of a string of elementary functions, the example concerning three elementary functions EFa, EFb and EFc forming a term TCF of a compound function that operates on the signal data S of an audio file, the term being TCF=EFc.EFb.EFa*S. Note that in such a string, an elementary function also constitutes an argument, or operand, for its left-hand neighbour (i.e. the succeeding function), to which it is joined by the "*" operator where appropriate. Also, the output of an elementary function can include parameter input data for its neighbouring function. This is illustrated in figure 6 by the output of function EFb, which produces inter alia a signal conveying a parameter Pin for its downstream function EFc, for instance the value of a high-pass cut-off frequency if the latter is a high-pass filter function.
  • A compound function CF can contain an arbitrary number of elementary functions related by different arithmetical operators (+, -, * or ÷). Elementary functions connected together by a multiplicative or divisional operator form a term; several terms can be linked by the additive operators + and -, as the case arises, when constructing a compound function CF.
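The typing discipline of figures 3 to 6 can be illustrated with a minimal Python sketch. The class ElementaryFunction, the helper compose_term, the toy implementations of DERIV and MAX and the stub FFT are hypothetical, as is the assumption that a Signal output may be consumed where Table I expects a set of numbers (which is what lets MAX.DERIV, used in figure 7, type-check).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Type tags mirroring the Toper/Tout categories of Table I.
SIGNAL, NUMBER, NUMBERS = "Signal", "No.", "set of No.s"


def compatible(tout: str, toper: str) -> bool:
    # Assumption: a sampled signal may also be consumed as a "set of numbers",
    # so that MAX.DERIV (as in figure 7) passes the type check.
    return tout == toper or (tout == SIGNAL and toper == NUMBERS)


@dataclass
class ElementaryFunction:
    name: str
    toper: str                       # operand type Toper
    tout: str                        # output type Tout
    code: Callable[..., Any]         # executable code
    params: Dict[str, Any] = field(default_factory=dict)  # parameters Pin

    def __call__(self, operand: Any) -> Any:
        return self.code(operand, **self.params)


def compose_term(functions: List[ElementaryFunction]) -> Callable[[Any], Any]:
    """Build the term EFc.EFb.EFa*S from [EFc, EFb, EFa]; the right-most
    function operates on the signal first."""
    if functions[-1].toper != SIGNAL:
        raise TypeError(f"{functions[-1].name} must take a Signal operand")
    for left, right in zip(functions, functions[1:]):
        if not compatible(right.tout, left.toper):
            raise TypeError(f"{left.name} cannot operate on the output of {right.name}")

    def term(signal: Any) -> Any:
        value = signal
        for ef in reversed(functions):   # apply right to left
            value = ef(value)
        return value

    return term


# Toy elementary functions (hypothetical implementations).
DERIV = ElementaryFunction("DERIV", SIGNAL, SIGNAL,
                           lambda s: [b - a for a, b in zip(s, s[1:])])
MAX = ElementaryFunction("MAX", NUMBERS, NUMBER, max)
FFT = ElementaryFunction("FFT", SIGNAL, SIGNAL, lambda s: s)  # stub: identity, not a real FFT

print(compose_term([MAX, DERIV])([0, 1, 3, 6]))  # 3
try:
    compose_term([FFT, MAX, DERIV])
except TypeError as err:
    print(err)  # FFT cannot operate on the output of MAX
```

Rejecting FFT.MAX.DERIV when the term is assembled, rather than when it fails during execution, mirrors the role the formal rules play during compound function generation.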
  • Among the programs stored in the main program memory 24 are:
    • a compound function construction program 25, which has the role of generating compound functions by assembling together a number of elementary functions EF. The latter are typically signal or data processing functions that can each be considered as a single unit operator or module that produces a determined technical effect on the signal data Si of an audio file or on the output of another elementary function, and
    • a function execution program 27, which is composed of the compound functions themselves, these being exploited no longer as symbolic objects, but as executable algorithmic entities for producing technically meaningful operations on signal data S.
  • These two programs 25 and 27 are under the overall control of a master program 29 which manages the overall system 2.
  • The compound function construction program 25 is based on genetic programming techniques following an artificial intelligence (AI) approach. Accordingly, the elementary functions EF are also handled as symbols, whereby they are treated as first-class objects in their symbolic representation.
  • Thus, the system 2 is capable of handling the elementary functions both as objects, when executing the compound function (CF) construction program 25, and as executable operators, notably for evaluating and testing the compound functions, when executing the function execution program 27. To this end, these two programs 25 and 27 use languages adapted respectively to handling objects and to carrying out numerical calculations, an example of the latter being the "Matlab" language.
  • Table I gives a non-exhaustive example of elementary functions stored in the elementary function library 12, together with their operand type Toper, output type Tout and parameters.
    Table I: sample list of elementary functions used by the system 2.
    I.1 Mathematical functions
    Function name   Operation              Param Pin         Toper           Tout
    DERIV           Time derivative        -                 Signal          Signal
    MAX             Max value of set       -                 set of No.s     No.
    MIN             Min value of set       -                 set of No.s     No.
    SQUARE          Raise to power 2       -                 No.             No.
    LOG             Logarithm              -                 No.             No.
    MEAN            Average value of set   -                 set of No.s     No.
    VAR             Variance of set        -                 set of No.s     No.
    ABS(V)          Absolute value |V|     -                 signed V        unsigned V
    SUM             Summation of terms     -                 set of No.s     No.
    SQRT            Square root            -                 No.             No.
    POWER           Raise to power i       Integer i         No.             No.
    I.2 Signal processing functions
    Function name   Operation              Param Pin         Toper           Tout
    ENV             Envelope of signal     -                 Signal          Signal
    FFT             Fast Fourier transf.   limits            Signal          Signal
    stFFT           Short-time FFT         limits/time       Signal          Matrix/Vector
    AUTOCOR         Autocorrelation        -                 Signal          Vector
    COR             Correlation            -                 Signal/Signal   Vector
    LPF             Low-pass filter        Fcutoff/atten.    Signal          Signal
    HPF             High-pass filter       Fcutoff/atten.    Signal          Signal
    BPF             Bandpass filter        Flow/Fhigh/atten. Signal          Signal
    FLAT            Flatness               -                 Signal          No.
    E               Energy                 -                 Signal          No.
    PITCH           Pitch                  -                 Signal          No.
    I.3 Combining and connecting functions
    Function name   Operation              Param Pin
    COMPOSITION     o                      -
    LOOP            Repeat until           No. iterations
    (               Bracket                -
    COMBINATION *   Multiply               -
    COMBINATION ÷   Divide                 -
    COMBINATION +   Add                    -
    COMBINATION -   Subtract               -
  • The last four combination operators are simply arithmetic operators which join successive functions, but are treated as functions too.
  • Advantageously, when the system handles the elementary functions as symbols, as in the above construction phase, it uses a tree structure.
  • According to the tree structure, a compound function CF is symbolised in terms of nodes, where each node corresponds to one elementary function EF, and in which branches connect the nodes according to the arithmetic operators +, -, *, ÷ used.
  • As an example, figure 7 illustrates the tree structure for the compound function CF = MAX.DERIV.FFT.FFT.LPF(B1)(S) + ABS.PITCH.LPF(B2)(S) + PITCH.HPF(VARIANCE(S))(S). The three terms are developed along three respective branches Br1-Br3. The three branches join at the "+" function, which is the common link to CF. The order of appearance of the elementary functions is followed along successive nodes, the first elementary function (i.e. the first to operate on the signal) being nearest the free end of its branch.
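A minimal Python sketch of the tree structure of figure 7, assuming a Node class whose single child is the operand and whose param field holds either a symbolic parameter (B1, B2) or a parameter sub-tree such as VARIANCE(S); the class and helper names are hypothetical. The size() count is the kind of quantity a boundary condition rule on function length could weigh.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of the symbolic tree: an elementary function, an operator or the signal S."""
    name: str
    children: List["Node"] = field(default_factory=list)
    param: object = None  # a parameter such as "B1", or a sub-tree such as VARIANCE(S)


def branch(*names: str, param: object = None) -> Node:
    """Build one branch from names written left to right, e.g. branch("PITCH", "HPF").
    The right-most name is placed next to the leaf S at the free end of the branch
    and receives `param`."""
    node = Node("S")
    for i, name in enumerate(reversed(names)):
        node = Node(name, [node], param if i == 0 else None)
    return node


def to_text(node: Node) -> str:
    """Render the tree back into nested functional notation."""
    if node.name == "+":
        return " + ".join(to_text(child) for child in node.children)
    if not node.children:
        return node.name
    p = node.param
    ptxt = "" if p is None else "(" + (to_text(p) if isinstance(p, Node) else str(p)) + ")"
    return f"{node.name}{ptxt}({to_text(node.children[0])})"


def size(node: Node) -> int:
    """Number of nodes along the branches (parameter sub-trees not counted)."""
    return 1 + sum(size(child) for child in node.children)


# Figure 7: three branches Br1-Br3 joined at a common "+" node.
cf = Node("+", [
    branch("MAX", "DERIV", "FFT", "FFT", "LPF", param="B1"),
    branch("ABS", "PITCH", "LPF", param="B2"),
    branch("PITCH", "HPF", param=Node("VARIANCE", [Node("S")])),
])
print(to_text(cf))
print(size(cf))  # 14
```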
  • The CF construction program 25 initially begins by selecting and aggregating elementary functions at random.
  • Elementary rules and heuristics intervene in this random process to govern the appropriateness of combinations of elementary functions, notably as regards the incorporation of a potential elementary function in the context of any elementary function already present in the term under construction.
  • Firstly, rules govern the function generation process according to a number of different considerations, among which are:
  • i) Formal rules. These rule out the combination EFb.EFa of two elementary functions if their types are not compatible. In other words, if the output type Tout(a) of EFa is not the same as the operand type Toper(b) of EFb, and elementary function EFa has already been selected, then elementary function EFb is attributed a zero weighting coefficient in the random draw that selects the elementary function for which EFa is to be the operand. For example, the formal rule weighting scheme forbids meaningless operator combinations such as FFT.MAX.DERIV, in which FFT would have to operate on the single number delivered by MAX, etc. The formal rules also ensure that the rightmost function of a term in the compound function has a signal operand type (Toper=Signal), given that it will necessarily operate on the signal Si from an audio file.
  • ii) Boundary condition rules. These rules impose constraints on the compound functions or their populations with regard to the system parameters, such as: a length constraint on the compound functions, implemented by weighting the number of elementary functions used so as to favour a prescribed median value; the number of branch points (cf. the tree structure); the number of compound functions produced to form a first population P; etc.
  • Secondly, knowledge-based heuristics generally operate by associating to each elementary function EF a weighting coefficient affecting its random draw probability. These coefficients are attributed dynamically according to immediate context. The heuristics can in this way rule out some combinations of elementary functions through a zero weighting coefficient, at one extreme, and force combinations by imposing an absolute maximum value coefficient at the other extreme. A set of intermediate weighting coefficient values is provided to allow the random process to determine the construction of compound functions, albeit with constraints. These heuristics are generally derived from experience in using the system and the user's formal or intuitive knowledge. They thus allow the user to inject his or her know-how into the system and afford a degree of personalisation. They can also be generated by the system itself on an automated basis, using algorithms that detect similarities between compound functions that have been recognised as successful.
  • By using the range of attributable weighting coefficients in implementing these heuristics, the system user can use them:
  • i) as a positive influence, i.e. to encourage the presence or combinations of elementary functions that are of interest. For example, the system uses a knowledge-based heuristic to favour the presence of two successive FFTs on a signal S, i.e. FFT.FFT(S), this having been found to be conducive to interesting results;
  • ii) as a negative influence, i.e. to discourage elementary function combinations that are considered ineffective or technically inappropriate. For instance, it has been found that three successive FFTs on a signal S, i.e. FFT.FFT.FFT(S), do not usually produce interesting results. The corresponding heuristic used by the system will thus give a low weighting coefficient to an FFT elementary function in the draw for the elementary function that is to operate on the existing combination FFT.FFT.
  • Before the newly-formed compound functions are processed, they are advantageously submitted to rewriting by application of rewriting rules stored in database 15. Rewriting involves recasting compound functions from their initial form to a mathematically equivalent form that allows them to be executed more efficiently. It is governed by a set of deterministic rewriting rules of varying levels of complexity which are executed on each function CFi of the population by the main processor 22, those rules being in machine-readable form.
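The combined effect of the formal rules and of the knowledge-based heuristics on the random draw can be sketched as a weighted choice. The TYPES table, the specific weights (0.05 to discourage a third FFT, 3.0 to favour a second one) and the signal-as-set-of-numbers compatibility are hypothetical; only the zero/low/high weighting scheme follows the description.

```python
import random
from typing import List, Optional

# Hypothetical (Toper, Tout) entries, loosely following Table I.
TYPES = {
    "FFT": ("Signal", "Signal"),
    "LPF": ("Signal", "Signal"),
    "DERIV": ("Signal", "Signal"),
    "MEAN": ("set of No.s", "No."),
    "SQRT": ("No.", "No."),
    "LOG": ("No.", "No."),
}


def fits(tout: str, toper: str) -> bool:
    # Assumption: a sampled signal may also be consumed as a set of numbers.
    return tout == toper or (tout == "Signal" and toper == "set of No.s")


def weight(candidate: str, term: List[str]) -> float:
    """Draw weight for `candidate` as the next function to operate on `term`.
    `term` is written left to right, so term[0] is the most recently added function."""
    if not term:  # formal rule: the first function must operate on the signal
        return 1.0 if TYPES[candidate][0] == "Signal" else 0.0
    if not fits(TYPES[term[0]][1], TYPES[candidate][0]):
        return 0.0                              # formal rule: incompatible types
    if candidate == "FFT" and term[:2] == ["FFT", "FFT"]:
        return 0.05                             # heuristic: discourage FFT.FFT.FFT(S)
    if candidate == "FFT" and term[:1] == ["FFT"]:
        return 3.0                              # heuristic: favour FFT.FFT(S)
    return 1.0                                  # neutral weight


def draw_next(term: List[str], rng: random.Random) -> Optional[str]:
    names = list(TYPES)
    weights = [weight(name, term) for name in names]
    if sum(weights) == 0:
        return None                             # no formally acceptable function
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)
term: List[str] = []
for _ in range(4):
    nxt = draw_next(term, rng)
    if nxt is None:
        break
    term.insert(0, nxt)                         # the new function operates on the term
print(".".join(term) + "(S)")
```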
  • Simple rewriting rules eliminate self-cancelling terms in a compound function. For instance, if the compound function considered contains the terms HPF(S, Fa) + FFT(S) - FFT(S), the rewriting rules shall tidy up the expression and reduce it to HPF(S, Fa).
  • Another category of rewriting rules eliminates elementary functions that are redundant given their environment, i.e. which do not produce a technical effect. For instance, if an expression contains a bandpass filtering function with a passband between frequencies Fb and Fc, then the rules would eliminate any subsequent function in that term that filters out frequencies outside that passband, since those frequencies are no longer present.
  • Other rewriting rules conduct simplifications of a more advanced type. For instance, they will replace systematically the expression E(FFT(S)) by the equivalent, but more easily calculable, expression E(S).
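The E(FFT(S)) to E(S) simplification rests on Parseval's theorem. With an orthonormal (1/sqrt(N)-scaled) transform the two energies are equal; with the common unscaled DFT they differ by the constant factor N, the signal length. The sketch below checks both cases numerically with a naive DFT; taking E to be the sum of squared magnitudes is an assumption.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], orthonormal: bool = True) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform; scaled by 1/sqrt(N) when orthonormal."""
    n = len(x)
    scale = 1 / math.sqrt(n) if orthonormal else 1.0
    return [scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def energy(x: Sequence[complex]) -> float:
    """Assumed definition of E: the sum of squared magnitudes."""
    return sum(abs(v) ** 2 for v in x)


s = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75]
print(energy(s), energy(dft(s)))                      # equal up to rounding error
print(energy(dft(s, orthonormal=False)) / energy(s))  # N = 6, up to rounding error
```

A constant factor of this kind does not change how well a function correlates with the grounded truth values across titles of equal length, which is one reason the cheaper form can be substituted.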
  • The implementation of the rewriting rules uses the tree structure of the compound function under consideration. Each node, or section of the tree, is scanned against the set of rewriting rules. Whenever a rewriting rule is applicable to a node or a succession of nodes of the part of the tree being analysed, the node or succession of nodes in question is rewritten according to that rule and replaced by a new tree section or node that corresponds to the thus rewritten ― and hence simplified ― form of the compound function.
  • Each time the tree is modified in this way, it is scanned again, as its new form can create new opportunities for applying rewriting rules that were not evident in the previous form of the tree. Accordingly, the tree scanning is repeated cyclically until a complete scan brings no change.
  • To ensure that there is no risk of falling into an infinite loop, the rewriting rules never produce a change that leads to another change reversing it, and so on ad infinitum. For instance, the system would not simultaneously contain a rule to rewrite A+B as B+A and another rule to rewrite B+A as A+B (in fact, this would amount to a single rule that is applicable indefinitely to the result of its own application, and would therefore yield an unending loop).
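The scan-until-stable procedure and two of the example rules (cancelling the FFT(S) terms in HPF(S, Fa) + FFT(S) - FFT(S), and replacing E(FFT(S)) by E(S)) can be sketched over nested Python tuples. The tuple encoding, the rule functions and the max_scans safety limit are assumptions of the sketch; the cancellation rule here only matches the shape (A + B) - B.

```python
from typing import Callable, List, Optional, Tuple, Union

Expr = Union[str, Tuple]  # a leaf such as "S" or "Fa", or (operator, *operands)


def cancel_add_sub(e: Expr) -> Optional[Expr]:
    """(A + B) - B  ->  A : removes a self-cancelling term."""
    if (isinstance(e, tuple) and e[0] == "-" and isinstance(e[1], tuple)
            and e[1][0] == "+" and e[1][2] == e[2]):
        return e[1][1]
    return None


def energy_of_fft(e: Expr) -> Optional[Expr]:
    """E(FFT(X))  ->  E(X) : cheaper to calculate."""
    if isinstance(e, tuple) and e[0] == "E" and isinstance(e[1], tuple) and e[1][0] == "FFT":
        return ("E", e[1][1])
    return None


RULES: List[Callable[[Expr], Optional[Expr]]] = [cancel_add_sub, energy_of_fft]


def rewrite_once(e: Expr) -> Expr:
    """One scan: rewrite the operands first, then try each rule on this node."""
    if isinstance(e, tuple):
        e = (e[0],) + tuple(rewrite_once(a) for a in e[1:])
    for rule in RULES:
        new = rule(e)
        if new is not None:
            return new
    return e


def rewrite(e: Expr, max_scans: int = 100) -> Expr:
    """Repeat complete scans until one of them changes nothing."""
    for _ in range(max_scans):
        new = rewrite_once(e)
        if new == e:
            return e
        e = new
    raise RuntimeError("rewriting did not reach a fixed point")


expr = ("-", ("+", ("HPF", "S", "Fa"), ("FFT", "S")), ("FFT", "S"))
print(rewrite(expr))                            # ('HPF', 'S', 'Fa')
print(rewrite(("E", ("FFT", ("FFT", "S")))))    # ('E', 'S'), after two changing scans
```

The second example shows why a scan must be repeated: the first pass only turns E(FFT(FFT(S))) into E(FFT(S)), which exposes a new opportunity for the same rule.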
  • Once the population P of compound functions has been formed in accordance with the above heuristics and rules, the compound functions cease to be considered as symbolic objects and are treated instead by the function execution program 27 according to their specified functional definitions.
  • Specifically, a compound function CFi (1≤ i ≤ n) is treated by the system 2 as a calculation routine using "Matlab" language and made to operate on the music file data signals Sj (1≤j≤m) stored in the learning database 10 to produce an output value Dij=CFi*(Sj). The signal Sj in question corresponds to a digitised form of an amplitude (signal level) evolving in time t, the time frame of t typically being on the order of 200 seconds in the case of a music title.
  • Each of the n compound functions CF1-CFn operates in this way on each of the m titles stored in the learning database 10, thereby producing a total of n.m output values Dij (for i=1 to n and j=1 to m) according to a matrix for the population P. This combination of calculation events is illustrated symbolically in figure 8.
  • As shown in figure 8, the n.m output values are mapped in matrix MAT(P) which is stored in a working memory of the main processor 22. These values are accessed at a subsequent stage of evaluating the overall fit of each of the n compound functions CF1-CFn with the descriptor De for which the grounded truths Dgt1-Dgtm were produced. This evaluation is carried out by standard statistical analysis techniques. In the illustrated example, each of the m.n output values of the matrix MAT(P) is compared with its corresponding grounded truth descriptor value Dgtj. Specifically, the set of m.n values Dij is analysed against the grounded truth descriptor values Dgt1-Dgtm for the descriptor De ascribed to the respective music titles T1-Tm.
  • For a given compound function CFi, the analysis here involves comparing the value Dij with the Dgtj value for the corresponding audio file. This comparison is performed for each of the audio files, so yielding m comparison values. These comparison values are submitted to statistical analysis to obtain a global fit, or fitness, value FIT(afi) with respect to the descriptor De for that function CFi. The global fitness value FIT(afi) expresses objectively how well, overall, the descriptor values generated by the function CFi match, or correlate with, the corresponding grounded truth descriptors Dgt1-Dgtm.
  • The global fit in question is evaluated in the form appropriate for the descriptor, for instance numerical closeness for a numerical descriptor, Boolean correspondence for a Boolean descriptor, etc.
  • The above comparisons and statistical analysis are conducted for each of the n compound functions CF1-CFn, and the respective fitness values FIT(af1)-FIT(afn) are stored.
  • Then a new population P1 of r compound functions is produced by taking for its members those of the n compound functions CF1-CFn which yield the r best overall fit values (r<n).
  • The basic comparisons and analysis involved in the above procedure are set out in the algorithm below:
  • For CF1: compare D11 with Dgt1; D12 with Dgt2; D13 with Dgt3; ...; D1m with Dgtm => STATISTICAL ANALYSIS => fit of CF1 with respect to descriptor De = FITaf1(De);
  • For CF2: compare D21 with Dgt1; D22 with Dgt2; D23 with Dgt3; ...; D2m with Dgtm => STATISTICAL ANALYSIS => fit of CF2 with respect to descriptor De = FITaf2(De);
  • For CF3: compare D31 with Dgt1; D32 with Dgt2; D33 with Dgt3; ...; D3m with Dgtm => STATISTICAL ANALYSIS => fit of CF3 with respect to descriptor De = FITaf3(De);
    ....
  • For CFn: compare Dn1 with Dgt1; Dn2 with Dgt2; Dn3 with Dgt3; ...; Dnm with Dgtm => STATISTICAL ANALYSIS => fit of CFn with respect to descriptor De = FITafn(De).
       → New population P1 = set of r compound functions CF yielding the r best fits FITaf(De).
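As one possible reading of the "statistical analysis" step, the sketch below builds MAT(P), takes the Pearson correlation coefficient between each row and the grounded truth values as FIT(afi), and keeps the r best functions. The description does not fix the statistic, and the function names, toy signals and ratings are invented for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 when either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def next_population(cfs: List[Callable], signals: List, dgt: List[float], r: int
                    ) -> Tuple[List[List[float]], List[float], List[Callable]]:
    """Build MAT(P), score each CF_i against Dgt_1..Dgt_m and keep the r best."""
    mat = [[cf(s) for s in signals] for cf in cfs]   # row i: D_i1 .. D_im
    fits = [pearson(row, dgt) for row in mat]        # FIT(af_i)
    best = sorted(range(len(cfs)), key=lambda i: fits[i], reverse=True)[:r]
    return mat, fits, [cfs[i] for i in best]


# Toy data: four "signals" and made-up grounded truth ratings (illustration only).
signals = [[0.1, -0.2, 0.1], [0.5, -0.4, 0.6], [0.9, -1.0, 0.8], [0.3, 0.3, -0.2]]
dgt = [1.0, 5.0, 9.0, 3.0]


def peak(s):
    return max(abs(v) for v in s)


def mean_abs(s):
    return sum(abs(v) for v in s) / len(s)


def last(s):
    return s[-1]


mat, fits, p1 = next_population([peak, mean_abs, last], signals, dgt, r=2)
print([f.__name__ for f in p1])  # ['mean_abs', 'peak']
```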
  • The r compound functions CF(1)1 to CF(1)r of the new population P1 are then processed in their symbolic object form according to the above-described tree structure. The aim here is to generate from population P1 a next-generation population P2 of compound functions. Advantageously, the system 2 achieves this by using genetic programming techniques. These techniques model aspects of the biological reproduction processes naturally occurring at chromosome level, such as crossover and mutation. In this case, the analogue of a chromosome is an elementary function EF in its symbolic representation.
  • Genetic programming is in itself well documented, but has hitherto been confined to fields remote from electronic signal processing. Remarkably, it can be implemented to great advantage in the present field by virtue of the present approach, in which the compound functions in question, whose primary purpose is to operate on an electronic signal, are conveniently made exploitable as symbolic objects at critical phases of their elaboration process. This "object" form, which advantageously uses the above-described tree structure, thereby becomes amenable to genetic programming using standard knowledge of applied genetic programming. Accordingly, detailed aspects involving normal knowledge of genetic programming languages and practice, accessible to a person skilled in the art of genetic programming, shall not be detailed in the present description for reasons of conciseness.
  • The concept of genetic programming applied to the present signal processing functions CF is illustrated in connection with two interesting aspects: crossover and mutation. Each is implemented with adapted and specific rules and heuristics stored in the heuristics database 14 and the rules database 15. Among the rules and heuristics applied in the context of genetic programming are the formal and boundary condition rules and the knowledge-based heuristics outlined above, adapted to circumstances. Overall, the rules and heuristics applied ensure that the compound functions resulting from genetic programming operations are formally acceptable, have the potential to exhibit an improvement (in terms of fitness) compared with the functions from which they are generated, and remain within the system's operating limits.
  • Crossover. Simply stated, crossover involves taking two compound functions, say CF(1)p and CF(1)q (from population P1), and creating from them a new function CF(1)pq which contains a mix of functions CF(1)p and CF(1)q, in a manner analogous to two chromosomes combining to form a new chromosome.
  • An example of a new function CF(1)pq produced by crossover of functions CF(1)p and CF(1)q is illustrated by figure 9 using the tree representation. In this representation, the elementary functions are designated in their abbreviated form: ep1-ep10 for compound function CF(1)p and eq1 to eq10 for compound function CF(1)q.
  • Crossover is carried out by a crossover generator module 33 forming part of the compound function construction program 25 stored in memory 24. The module 33 receives the two functions CF(1)p and CF(1)q as input and analyses their tree structure using a set of stored crossover rules and heuristics. The analysis seeks to determine, for each function, a suitable break point along a branch. The break point divides the tree in question into a portion that is to be rejected and a portion that is to be retained. In the example, it can be seen that for compound function CF(1)p, the part of the tree structure comprising elementary functions ep7 to ep10 is retained, and the part on the other side of the break point comprising elementary functions ep1 to ep6 is rejected. Similarly for compound function CF(1)q, the part of the tree structure comprising elementary functions eq1 to eq6 is retained, and the part on the other side of the break point comprising elementary functions eq7 to eq10 is rejected. The two retained portions of the respective trees are joined together at their respective break points. This is carried out by attaching with a straight branch the nodes of the respective retained parts lying adjacent the break points. Thus, in the illustrated example, node eq6 is attached by a branch to node ep7. The resultant crossover tree corresponding to compound function CF(1)pq is then composed of elementary functions eq1-eq6, ep7-ep10.
  • More complex crossover operations can involve extracting at least one section of a tree (not necessarily an end section) and inserting it within another tree by producing one or several break points in the latter depending on where it is to be accommodated.
  • The break points are determined in a guided ― or constrained ― random draw, in which the guidance is provided by a set of crossover rules and heuristics.
  • A first such rule is of the formal type, and requires that two nodes liable to be joined together be formally compatible in terms of types, as described above in the context of formal rules. To this end, candidate break points for the random draw are considered in mutually indexed pairs, each member of the pair being associated with a respective tree. For each pair, the nodes to be joined are identified according to which one acts as the operand and which one as the operating function. Only those pairs of break points satisfying the formal requirements are accepted as candidates.
  • Thus, in the illustrated example, the rules in question ensure that, even though the crossover results from a random draw, the operand type Toper(ep7) of elementary function ep7 is the same as the output type Tout(eq6) of elementary function eq6.
  • Another rule is of the boundary condition type and requires that the break point should preferably lie in the central portion of the tree, e.g. by using weighted random draws, so that crossover-generated compound functions remain statistically similar in size over repeated generations.
  • Finally, knowledge-based heuristics are tested on crossover-generated compound functions. The operators in the new compound function are tested one by one, starting from the break point. For each new operator, the knowledge-based heuristics provide a probability on the basis of which the compound function is accepted or rejected at that step.
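As an illustration of the crossover rules above, the following Python sketch treats a compound function as a simple chain of elementary functions in application order, which simplifies the tree representation of figure 9. The `EF` tuple, the type names and the centre-weighting formula are assumptions for this example; only the type-compatibility condition Toper(ep7) = Tout(eq6) and the preference for central break points come from the text. The knowledge-based acceptance test of the preceding paragraph is not modelled.

```python
import random
from typing import List, NamedTuple, Optional


class EF(NamedTuple):
    """Hypothetical elementary function carrying its operand and output types."""
    name: str
    t_oper: str  # type of the operand it accepts
    t_out: str   # type of the value it produces


Chain = List[EF]  # elementary functions in application order (first applied first)


def crossover(q: Chain, p: Chain, rng: random.Random) -> Optional[Chain]:
    """Join a retained head of q (q[:i]) to a retained tail of p (p[j:]).

    Formal rule: a pair of break points (i, j) is a candidate only if
    Tout(q[i-1]) == Toper(p[j]).
    Boundary-condition rule: the draw is weighted toward break points near
    the middle of each chain, so offspring sizes stay statistically stable.
    """
    candidates, weights = [], []
    for i in range(1, len(q)):       # both q[:i] and q[i:] are non-empty
        for j in range(1, len(p)):   # both p[:j] and p[j:] are non-empty
            if q[i - 1].t_out == p[j].t_oper:
                candidates.append((i, j))
                # Larger weight when each break point is far from both chain ends.
                weights.append(min(i, len(q) - i) * min(j, len(p) - j))
    if not candidates:
        return None  # no formally compatible pair of break points
    i, j = rng.choices(candidates, weights=weights, k=1)[0]
    return q[:i] + p[j:]
```

With a chain of ten elementary functions, the weight for a break point peaks at position five, which mirrors the boundary-condition rule; a result of `None` signals that no formally correct offspring exists for that pair.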
  • Mutation. Mutation involves taking one compound function CF(1)s and forming a variant thereof CF'(1)s. The variant can be produced by modifying one or a number of the parameters of CF(1)s, and/or by modifying the function's structure, e.g. by adding, removing or changing one or several of its elementary functions, or by any other modification.
  • An example of a new compound function CF'(1)s produced by mutation of a function CF(1)s is illustrated by figure 10. In this representation, the initial compound function CF(1)s has a tree structure formed of elementary functions es1 to es7 as shown.
  • This function is input to a mutation generator module 34 forming part of compound function construction program 25. The mutation generator module 34 applies one or several mutations to that function on a guided (or constrained) random basis.
  • In the illustrated example, the output mutated function CF'(1)s happens to differ from the input function CF(1)s: i) at the level of elementary function es6, which is a low-pass filter operator whose parameter P'(es6) now specifies a cut-off frequency of 450 Hz instead of 600 Hz in its original form P(es6), and ii) at the level of elementary function es1, which has simply been deleted.
  • The mutation process is governed by mutation rules and heuristics, which include formal rules that likewise ensure that any changed function remains formally correct, and boundary condition rules which govern the nature and number of mutations allowed, etc.
  • The system can implement other genetic programming operations. For instance, it can produce a cloning, which involves taking one compound function CF(1)t and forming a variant thereof CF'(1)t. The variant has exactly the same functional structure as the original function CF(1)t; only the values of the fixed parameters are modified. For instance, if the original compound function contains a low-pass filter with a fixed cut-off frequency of 500 Hz, a clone would be the same compound function with a different cut-off frequency of, say, 400 Hz. A cloning parameter can control the extent of the variations of the values (for example +/- 10%). Note that cloning is simply a special, restricted case of mutation in the sense described above.
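The mutation and cloning operations can be sketched in Python as follows, assuming a hypothetical chain representation in which each elementary function carries a dictionary of fixed numeric parameters (such as a cut-off frequency). `clone` keeps the structure and varies only parameter values within the +/- 10% cloning range mentioned above; `mutate` either deletes one elementary function (as with es1) or rescales one function's parameters (as with es6). The 50/50 choice and the 25% mutation spread are arbitrary illustrative values, and the formal type checks that govern mutation are omitted.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: a compound function as a chain of
# (elementary function name, fixed numeric parameters), in application order.
Node = Tuple[str, Dict[str, float]]


def clone(cf: List[Node], rng: random.Random, spread: float = 0.10) -> List[Node]:
    """Cloning: keep the structure, scale every fixed parameter by a random
    factor in [1 - spread, 1 + spread] (spread=0.10 gives +/- 10%)."""
    return [(name, {k: v * rng.uniform(1 - spread, 1 + spread) for k, v in params.items()})
            for name, params in cf]


def mutate(cf: List[Node], rng: random.Random, spread: float = 0.25) -> List[Node]:
    """Mutation: either delete one elementary function (structural change) or
    rescale the parameters of one elementary function. No type check is done here."""
    out = [(name, dict(params)) for name, params in cf]  # copy; the input is left intact
    if len(out) > 1 and rng.random() < 0.5:
        del out[rng.randrange(len(out))]
    else:
        _, params = out[rng.randrange(len(out))]
        for k in params:
            params[k] *= rng.uniform(1 - spread, 1 + spread)
    return out
```

Because cloning never touches the structure, a clone always remains formally correct, whereas a structural mutation would normally be re-checked against the formal rules.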
  • The genetic programming procedure, comprising the above crossover and mutation operations (and possibly other operations), is applied to the population P1 of functions over a given period or number of cycles. When the procedure is terminated for the population, there results a new population P2 of compound functions which are the genetic descendants of those from population P1.
  • The number of compound functions CF(2) forming the population P2 is made the same as for population P, so as to allow a selection of the r best-fitness functions of that population to produce its own succeeding population of functions P3. In order to keep the population size constant, the cumulated proportions of compound functions generated randomly (R%), by mutation (M%), by crossover (CO%) and by cloning (C%) must satisfy R + M + CO + C = 100%. This consideration applies to all succeeding generations, so that their populations do not dwindle in the course of eliminating the lowest-fitness functions. Thus, the creation of a new population typically calls for a repetition of the random creation procedure (described above for randomly creating the initial population P) to top up the population, given that crossover operations tend to reduce the population (if C < CO).
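A minimal sketch of how the constraint R + M + CO + C = 100% might be met when sizing a new generation: the counts for mutation, crossover and cloning are taken from their percentages (rounded down), and random creation tops up the remainder so that the population size never dwindles. The function name and the rounding policy are assumptions for illustration.

```python
from typing import Dict


def generation_plan(pop_size: int, m_pct: float, co_pct: float, c_pct: float) -> Dict[str, int]:
    """Split a new generation of `pop_size` functions by creation method.

    Mutation, crossover and cloning counts are rounded down; random creation
    (R%) receives the remainder, so R + M + CO + C always covers the whole population.
    """
    if m_pct + co_pct + c_pct > 100:
        raise ValueError("M + CO + C cannot exceed 100%")
    counts = {
        "mutation": int(pop_size * m_pct / 100),
        "crossover": int(pop_size * co_pct / 100),
        "cloning": int(pop_size * c_pct / 100),
    }
    counts["random"] = pop_size - sum(counts.values())  # top-up by random creation
    return counts
```

For example, a population of 100 with M = 30%, CO = 40% and C = 10% yields 20 randomly created functions, i.e. R = 20%.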
  • The new population P2 is then treated in the same manner as the initial population P, starting with a phase in which the rewriting rules are applied (the rules and heuristics listed above have already been applied, explicitly or implicitly, to the population P2 in the course of the genetic programming (crossover and mutation) operations).
  • Accordingly, the correlation, or fitness of each compound function CF(2) of the new population is determined against the grounded truth descriptor values Dgt1 to Dgtm for the descriptor De. The procedure here is just as for obtaining population P1, and the algorithm described above applies mutatis mutandis by replacing P with P1, and P1 with P2.
  • The result gives a new set of the r best compound functions CF(2)1 to CF(2)r for the descriptor De, forming the new population P2.
  • The above procedure is carried out iteratively over a given number of cycles, each cycle producing a new population Pu from the previous population Pu-1 by genetic programming and a selection of the best compound functions for the population Pu.
  • Implementation of heuristics.
  • Further aspects of the heuristics used by the system are outlined below, notably for function generation (producing the population P) and genetic programming.
  • A heuristic can be represented as a function which has for argument (operand):
  • i) a current term: one or more functions or a tree section, corresponding to the existing environment in terms of the composition of elementary functions EF - for instance the elementary function combinations that have already been produced during an ongoing function construction process;
  • ii) a potential term: likewise one or more functions or a tree section, for which the possibility of incorporation into the current term is to be considered by the heuristic.
  • The heuristic function produces from the above argument a result in the form of a value in a specified range, e.g. from 0 to 10, which expresses the appropriateness or interest of constructing a function in which the potential term is branched (according to the tree representation) to the current term, e.g. as its argument.
  • The range of weighting coefficients (which are here expressed to one decimal) expresses quantitatively the following:
  • Weighting coefficient and its meaning:
    • 0: potential term forbidden from the random draw;
    • 1: of very little interest;
    • ...
    • 5: of medium interest;
    • 9: extremely interesting;
    • 10: potential term imposed (i.e. must be selected).
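One plausible way to turn these coefficients into a guided random draw is sketched below in Python: a term with coefficient 0 is excluded, a term with coefficient 10 is selected outright, and the remaining terms are drawn with a probability proportional to their coefficients. Proportional sampling and the function name are assumptions; the text only fixes the meaning of the coefficient scale.

```python
import random
from typing import Dict


def draw_potential_term(weights: Dict[str, float], rng: random.Random) -> str:
    """Pick a potential term using heuristic weighting coefficients in [0, 10].

    0 forbids a term from the draw; 10 imposes it; any value in between is
    used as a relative probability of being drawn.
    """
    imposed = [t for t, w in weights.items() if w >= 10]
    if imposed:
        return rng.choice(imposed)  # an imposed term must be selected
    allowed = [t for t, w in weights.items() if w > 0]
    if not allowed:
        raise ValueError("every potential term is forbidden")
    return rng.choices(allowed, weights=[weights[t] for t in allowed], k=1)[0]
```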
  • The heuristic function(s) can come into play in the following example:
    • current term = LPF(500Hz).FFT.S
    • potential term (to become the argument, or operand, of the current term) = FFT.DERIV.FFT.S
  • A heuristic shall determine the appropriateness of creating the branching where the "S" of the current term becomes "FFT.DERIV.FFT.S".
  • In the above case, one example of an applicable heuristic function is the one designated here "HEURISTIC245", which on the one hand favours the presence of two FFTs (FFT.FFT.(...)) and on the other hand discourages the presence of three FFTs (FFT.FFT.FFT.(...)). It is catalogued in the heuristics database 14 as:
  • HEURISTIC245:
    • statement of purpose: "interesting to have FFT of FFT, but not FFT of FFT of FFT";
    • form: HEURISTIC245(current term, potential term);
    • potential term weighting coefficient attribution procedure:
    • if type of current term is FFT,
    • AND if current term does not contain other FFT type terms,
    • AND if type of potential term is FFT,
    • AND if potential term contains an FFT,
    • THEN : potential term's weighting coefficient = 0.1 {indeed, the complete function would then have three FFTs, and a low weighting coefficient is therefore attributed}
    • ELSE: potential term's weighting coefficient = 8.0.
    • Procedures and statements of which the above is an example can be adapted to all other heuristics of the database 14.
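HEURISTIC245 can be transcribed as a small Python function. The term representation (a list of operator names, outermost first, with the signal S omitted) and the reading of "type of a term" are assumptions, chosen so that the rule reproduces the worked example: LPF(500Hz).FFT.S receiving FFT.DERIV.FFT.S as its argument would contain three FFTs and therefore gets 0.1.

```python
from typing import List

# Hypothetical term representation: operator names, outermost first, with the
# signal "S" omitted. LPF(500Hz).FFT.S -> ["LPF", "FFT"]; FFT.DERIV.FFT.S -> ["FFT", "DERIV", "FFT"].
Term = List[str]


def heuristic245(current: Term, potential: Term) -> float:
    """HEURISTIC245: "interesting to have FFT of FFT, but not FFT of FFT of FFT".

    Assumptions: the type of the current term is its innermost operator (the one
    that receives the potential term as its argument); the type of the potential
    term is its outermost operator; "contains an FFT" means a further FFT beyond that head.
    """
    current_type = current[-1] if current else None
    potential_type = potential[0] if potential else None
    if (current_type == "FFT"
            and current.count("FFT") == 1     # no other FFT in the current term
            and potential_type == "FFT"
            and "FFT" in potential[1:]):      # a further FFT inside the potential term
        return 0.1  # the complete function would contain three FFTs
    return 8.0
```

A potential term such as FFT.DERIV.S, which would give an FFT of an FFT, falls through to the ELSE branch and keeps the favourable coefficient 8.0.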
    Another heuristic function designated HEURISTIC250 is as follows:
  • HEURISTIC250:
    • statement of purpose: "give preference to a filtering on raw signals".
    • potential term applicable: Filter class {LPF, HPF, BPF..}
    • form HEURISTIC250(current term, filter class)
    • potential term weighting coefficient attribution procedure:
    • if current term contains FFT, THEN: potential term's weighting coefficient = 0 {filtering is meaningless if an FFT is carried out beforehand},
    • if current term contains CORRELATION, THEN: potential term's weighting coefficient = 3 {if a correlation is carried out beforehand, filtering is of doubtful use, but could nevertheless return an interesting value},
    • ELSE: potential term's weighting coefficient =7 {if the current term does not contain signal modification operations such as FFT, CORRELATION, it is generally useful to filter the signal to retain just some of its spectral components}.
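HEURISTIC250 transcribes directly, since its conditions are tested in a fixed order (an FFT check first, then a CORRELATION check). In this sketch the current term is again a hypothetical list of operator names, and the filter class is limited to the three filters listed.

```python
from typing import List

FILTER_CLASS = {"LPF", "HPF", "BPF"}  # filter class named in HEURISTIC250


def heuristic250(current: List[str], filter_op: str) -> float:
    """Weighting coefficient for branching a filter to the current term."""
    if filter_op not in FILTER_CLASS:
        raise ValueError("HEURISTIC250 only applies to the filter class")
    if "FFT" in current:
        return 0.0  # filtering is meaningless if an FFT is carried out beforehand
    if "CORRELATION" in current:
        return 3.0  # of doubtful use, but could still return an interesting value
    return 7.0      # no signal modification operation yet: filtering is generally useful
```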
    • Other heuristics can be implemented to take into account a given context, or an indication of the descriptor De for which the compound function is constructed. These are referred to as "context sensitive heuristics". An example of a context sensitive heuristic is as follows:
      • Context sensitive heuristic CSHEURISTIC280
        • statement of purpose: "to treat problems pertaining to a sung voice (presence, extraction, ....), whereby it is useful to use frequencies of the human voice e.g. from 200 Hz to 1500 Hz";
        • context = analysis of voice
        • potential term to which it is applicable: Filter(lowF, highF)
        • current term to which it is applicable: any.
        • potential term's weighting coefficient attribution procedure:
          • if lowF (of signal) is close to 200 Hz, potential term's weighting coefficient is correspondingly high (e.g. 9 for 200 Hz, 8 for 300 Hz, etc.);
          • if highF (of signal) is close to 1500 Hz, potential term's weighting coefficient is correspondingly high (e.g. 9 for 1500 Hz, 8 for 1400 Hz, etc.).
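The scoring examples of CSHEURISTIC280 (9 at 200 Hz and 8 at 300 Hz for lowF; 9 at 1500 Hz and 8 at 1400 Hz for highF) suggest a decrease of one point per 100 Hz. The sketch below assumes that linear rule and floors it at 0; as a further assumption, it averages the scores of the two cut-off frequencies into a single coefficient, whereas the heuristic as described scores each edge separately. The context string is likewise hypothetical.

```python
from typing import Optional


def edge_score(freq_hz: float, target_hz: float) -> float:
    """9 at the target, one point less per 100 Hz away (e.g. 8 at +/-100 Hz), floored at 0."""
    return max(0.0, 9.0 - abs(freq_hz - target_hz) / 100.0)


def cs_heuristic280(context: str, low_f: float, high_f: float) -> Optional[float]:
    """Weighting coefficient for a Filter(lowF, highF) potential term.

    Returns None outside the voice-analysis context (heuristic not applicable).
    Averaging the two edge scores is an assumption made for this sketch.
    """
    if context != "analysis of voice":
        return None
    return (edge_score(low_f, 200.0) + edge_score(high_f, 1500.0)) / 2
```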
  • A further class of heuristics, known as "reference base sensitive heuristics", takes into account the global nature of the signals in the learning database 10. This nature is expressed by a quantity referred to as the "global reference indicator".
  • These heuristics therefore additionally have this global reference indicator as their parameter. The indicator can also be, for instance, a set of descriptors taken from that reference database.
  • They make it possible to select functions depending on the nature of the reference signals.
  • An example of a reference base sensitive heuristic is as follows:
  • HEURISTIC465:
    • form HEURISTIC465(current term, potential term, global reference indicator);
    • statement of purpose: "indicate that it is particularly useful to use FFTs when the reference database signals overall have a complex spectrum".
    • potential term's weighting coefficient attribution procedure:
      • if current term does not contain other FFT type terms,
      • AND if potential term is an FFT,
      • AND if the reference database signals have (for the most part) a complex spectrum, with spectral characteristics SC1, SC2, ..
      • THEN: potential term's weighting coefficient = 9.
  • Caching technique.
  • The iterative loops used by the system 2 involve a considerable amount of processing, especially for the steps of extracting a value Dij of a compound function CFi for a signal data Sj. In order to maximise the efficiency of that task, the system advantageously uses the prior results cache 16 as a source of precalculated results that save having to repeat calculations that have previously been performed.
  • The corresponding caching technique involves analysing a compound function under execution in terms of its tree structure, and thus involves both the symbolic, object representation of the function and its exploitation as an operator.
  • Figure 11 is an example illustrating how the caching technique is implemented. At a time t1, the system 2 is required to calculate the expression MAX*FFT*LPFILTER(F=600Hz)*(Si) (F=cut-off frequency) that appears at a branch Brp of a given compound function CFu(Si).
  • Assuming that the prior results cache 24 is initially empty at that stage, the main processor 22 proceeds stepwise through the successive elementary functions. Thus, it calculates LPF(Si), F=600Hz, at a first step i) and stores the result as R1, then calculates FFT*R1 at a second step ii) and stores the result as R2, and finally calculates MAX*R2 at a third step iii), which yields the value R3 for the term of branch Br1.
  • The above intermediate and final values R1, R2 and R3 are sent to prior results cache 24 together with an indication of the parts of branch Br1 that generated them. Thus, the cache records that LPF(Si), F=600Hz = R1, FFT*LPFILTER(F=600Hz)*(Si) = R2, and MAX*FFT*LPFILTER(F=600Hz)*(Si) = R3 in a two-way correspondence table. Note that results are stored in the cache 24 for an operation on a specific set of data contained in the signal data Si. The set in question can correspond to a predetermined time sequence of the associated audio file, for instance corresponding to one sampling event.
  • At a later time t2, the main processor 22 is required to calculate the value of a branch Brq belonging to another function CFv(S). In the example, the branch Brq corresponds to the term AVE* FFT*LPFILTER(F=600Hz)*(Si).
  • The cache 24 now no longer being empty, the main processor 22 first determines whether at least one function of that branch has already been calculated and stored in the cache 24. To this end, it performs a scan routine on branch Brq by determining whether the first function to be calculated, i.e. LPFILTER(F=600Hz)*(Si), is indexed in the cache 24. The answer being yes, it determines whether the first and second functions together, i.e. FFT*LPFILTER(F=600Hz)*(Si), are indexed in the cache. The answer again being yes, it determines whether the first, second and third functions together, i.e. AVE*FFT*LPFILTER(F=600Hz)*(Si), are indexed in the cache. The answer this time being no, it is thereby informed that the most useful result in the cache is R2 = FFT*LPFILTER(F=600Hz)*(Si). Accordingly, the main processor 22 rewrites the contents of branch Brq as AVE(R2) and calculates that value. The result of that calculation, R4, indexed to the function AVE(R2), or equivalently to the term AVE*FFT*LPFILTER(F=600Hz)*(Si), is sent to the cache 24 so that it need not be recalculated in full at a later stage.
  • The cache 24 is thus enriched with new results every time a new function or term is encountered and calculated. The caching technique becomes increasingly useful as the cache contents grow, and contributes significantly to the execution speed of the system 2.
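The scan routine can be sketched as a dictionary keyed by a signal identifier and a prefix of the operator chain, the chain being listed in application order (LPFILTER, then FFT, then MAX). The class name and this representation are hypothetical; the logic follows the text: extend the prefix while it is found in the cache, reuse the longest cached result, then compute and store every new intermediate result.

```python
from typing import Any, Callable, Dict, Hashable, Sequence, Tuple


class PrefixCache:
    """Prior results cache keyed by (signal id, prefix of the operator chain).

    A chain lists elementary functions in application order, e.g.
    ("LPF600", "FFT", "MAX") for MAX*FFT*LPFILTER(F=600Hz)*(Si).
    """

    def __init__(self, ops: Dict[str, Callable[[Any], Any]]):
        self.ops = ops
        self.table: Dict[Tuple[Hashable, Tuple[str, ...]], Any] = {}
        self.executed = 0  # number of elementary operations actually computed

    def evaluate(self, signal_id: Hashable, signal: Any, chain: Sequence[str]) -> Any:
        chain = tuple(chain)
        # Scan: extend the prefix one function at a time while it is cached.
        k, value = 0, signal
        while k < len(chain) and (signal_id, chain[:k + 1]) in self.table:
            k += 1
            value = self.table[(signal_id, chain[:k])]
        # Compute only the remaining functions, caching every intermediate result.
        for i in range(k, len(chain)):
            value = self.ops[chain[i]](value)
            self.executed += 1
            self.table[(signal_id, chain[:i + 1])] = value
        return value
```

With toy stand-in operators, evaluating LPF600, FFT, MAX and then LPF600, FFT, AVE on the same signal executes four elementary operations instead of six, because the second call reuses the cached R2.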
  • In practice, the number of entries in the prior results cache 24 can become too large for efficient use of the available memory space and for efficient searching. A monitoring algorithm is therefore provided which regularly checks the usefulness of each result stored in the cache 24 according to a determined criterion and deletes those found not to be useful. In the example, the criterion for keeping a result Ri in the cache 24 is a function which takes into account: i) the calculation time needed to produce Ri, ii) the frequency of use of Ri, and iii) the size (in bytes) of Ri. The last condition can be disregarded if available memory space is not an issue, or if it is managed separately by the computer.
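The text names the three inputs of the retention criterion but not how they are combined. The sketch below assumes one simple combination, the computation time saved over all uses divided by the memory occupied, and drops the size term when memory is not an issue; both the formula and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CacheEntry:
    calc_time_s: float  # i) time needed to produce the result
    uses: int           # ii) how often the result has been reused
    size_bytes: int     # iii) memory footprint of the result


def usefulness(e: CacheEntry, count_size: bool = True) -> float:
    """Hypothetical score: computation time saved over all uses, optionally per byte."""
    saved = e.calc_time_s * (1 + e.uses)
    return saved / max(e.size_bytes, 1) if count_size else saved


def prune(entries: Dict[str, CacheEntry], keep: int, count_size: bool = True) -> Dict[str, CacheEntry]:
    """Keep only the `keep` most useful entries, deleting the rest."""
    ranked = sorted(entries, key=lambda k: usefulness(entries[k], count_size), reverse=True)
    return {k: entries[k] for k in ranked[:keep]}
```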
  • After a given number of cycles or a given execution time, according to a chosen criterion, the system 2 produces as its user data output a descriptor extraction (DE) function 4 (cf. figure 1). The latter is the member of the latest generation population Pf of compound functions CF(f) that has been found to have the best fit for the descriptor De. The user output can produce more than one member of that population, for instance the b best fit functions CF(f), where b is an arbitrary integer, or those compound functions that exhibit a fit better than a given threshold.
  • The criterion for ending the loop back to creating a new population of functions is arbitrary, an ending criterion being for example one or a combination of: i) execution time, ii) quality of results in terms of the functions' fitness, iii) number of generations of functions (loops executed), etc.
  • Preferably, before a compound function is finally output as a DE function for future exploitation, it is validated against signals of other music titles taken from the validation database 18. As these signals are not used to influence the construction of the DE functions 4, they serve as a neutral reference on which to check their effectiveness. The checking procedure involves determining the degree of fit between, on the one hand, a descriptor value obtained by making a DE function operate on a signal Sv of the validation database and, on the other, the grounded truth descriptor value associated with the music title of that signal Sv. An overall correlation or validation value is generated by statistical analysis over a given number of entries of the validation database 18. If the validation value is above an acceptable threshold, the DE function 4 is validated and thus considered to be exploitable. Otherwise, the DE function is rejected and another DE function is considered.
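The validation step can be sketched as follows, assuming the statistical analysis is a Pearson correlation between the values produced by the DE function and the grounded truth values over the validation entries. The correlation measure and the 0.8 threshold are illustrative assumptions, not values taken from the text.

```python
import math
from typing import Callable, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def validate(de: Callable[[Sequence[float]], float],
             validation_set: Sequence[Tuple[Sequence[float], float]],
             threshold: float = 0.8) -> Tuple[bool, float]:
    """Accept the DE function if its outputs correlate well enough with the grounded truth."""
    predicted = [de(signal) for signal, _ in validation_set]
    truth = [dgt for _, dgt in validation_set]
    r = pearson(predicted, truth)
    return r >= threshold, r
```

A DE function that returns the same value for every title yields a correlation of 0 here and is therefore rejected.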
  • Figure 12 is a flowchart illustrating some steps performed by the system 2 of figure 2 in the course of producing a descriptor extraction function DE 4, these being:
    • inputting user input data to constitute the learning database 10 (step S2), whereby the database comprises the set of reference signals S1-Sm in association with their global characteristic values Dgt1-Dgtm pre-attributed,
    • preparing a population P1 of functions CF1-CFr each composed of at least one elementary function (EF) (step S4),
    • modifying functions of the current population using programmed means 22, 25 that handle their elementary functions as symbolic objects (step S6),
    • operating each function of the population on at least one reference signal using means 22, 27 that exploit the elementary functions as executable operators, to obtain a calculated value for each compound function of the population in respect of the reference signal (step S8),
    • for each function of the population, determining the degree of matching between its calculated value and the pre-attributed value Dgti for the signal from which that value has been calculated (step S10),
    • selecting functions of the population producing the best matches to form a new population P2 of functions (step S12),
    • if an ending criterion is not satisfied, returning to step S6, where the new population becomes the current population (step S14), and
    • if an ending criterion is satisfied, outputting at least one function of the current new population as a general function (4) of the user output (step S16).
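The steps of figure 12 can be illustrated end to end with a deliberately tiny Python sketch. Everything concrete here is hypothetical: four toy list-to-list elementary functions stand in for the signal processing library, the mean of the output stands in for the final reduction to a scalar, fitness is the Pearson correlation with the pre-attributed values, modification is limited to single-function substitution, and the ending criterion is a fixed number of generations. Rules, heuristics, rewriting and caching are left out.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Signal = List[float]
Compound = Tuple[str, ...]  # elementary-function names, applied left to right

# Hypothetical toy elementary functions (list -> list); not the patent's library.
ELEMENTARY: Dict[str, Callable[[Signal], Signal]] = {
    "ABS": lambda s: [abs(x) for x in s],
    "SQUARE": lambda s: [x * x for x in s],
    "DIFF": lambda s: [b - a for a, b in zip(s, s[1:])],
    "NEG": lambda s: [-x for x in s],
}


def evaluate(cf: Compound, signal: Signal) -> float:
    """Operate cf on a signal; assumed final step: mean of the resulting values."""
    out = signal
    for name in cf:
        out = ELEMENTARY[name](out)
    return sum(out) / len(out) if out else 0.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def random_cf(rng: random.Random, max_len: int = 3) -> Compound:
    return tuple(rng.choice(sorted(ELEMENTARY)) for _ in range(rng.randint(1, max_len)))


def mutate(cf: Compound, rng: random.Random) -> Compound:
    """Substitute one elementary function at a random position."""
    i = rng.randrange(len(cf))
    return cf[:i] + (rng.choice(sorted(ELEMENTARY)),) + cf[i + 1:]


def search(signals: Sequence[Signal], truths: Sequence[float], pop_size: int = 20,
           r: int = 5, generations: int = 10, b: int = 1, seed: int = 0) -> List[Compound]:
    """signals/truths play the role of the learning database (step S2)."""
    rng = random.Random(seed)

    def fitness(cf: Compound) -> float:  # steps S8 and S10
        return pearson([evaluate(cf, s) for s in signals], truths)

    population = [random_cf(rng) for _ in range(pop_size)]                  # step S4
    for _ in range(generations):                                             # ending criterion (S14)
        best = sorted(population, key=fitness, reverse=True)[:r]            # step S12
        children = [mutate(rng.choice(best), rng) for _ in range(pop_size // 2)]  # step S6
        fresh = [random_cf(rng) for _ in range(pop_size - r - len(children))]     # random top-up
        population = best + children + fresh
    return sorted(population, key=fitness, reverse=True)[:b]                # step S16
```

With reference signals whose slope is proportional to their pre-attributed value, chains such as ("DIFF",) or ("ABS",) produce values exactly proportional to Dgt and therefore reach a correlation of 1, which is the kind of function the loop retains.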
  • Heuristics and/or rules can be entered, edited or modified through the user interface unit 26, e.g. by manual input (keyboard) or by download, thereby making the system fully adaptive and configurable.
  • Typically, the system generates several hundred compound functions over a twelve-hour period. The learning database preferably comprises at least several hundred titles, and preferably several thousand. The handling of such large databases is simplified by the use of the above caching technique and heuristics. Parallel processing, where the same function is calculated on several titles simultaneously using respective processors over a network, can also be envisaged.
  • The size of the compound functions is typically of the order of ten elementary functions.
  • The system is remarkable in that it does not need to be informed of the descriptor De for which it must find a suitable DE function. In other words, all that is necessary is to provide examples of the descriptor values Dgti associated with music titles Ti and their signal data Si. This makes the system 2 completely open as regards descriptors, and amenable to generating suitable DE functions for different descriptors without requiring any initial formal training or programming specific to a given descriptor.
  • In the embodiment, the system is connected to a network, such as the Internet or a LAN, in order to facilitate the acquisition of music titles through a download centre 36. The networking also makes it possible to share and exchange elementary functions, compound functions, heuristics, rules and DE functions found to be interesting, as well as results data for the prior results cache 24, to allow parallel processing, etc. In this way, an interactive community of researchers can be fostered, allowing the rapid spread of new developments.
  • The heuristics and/or rules can be entered / edited / parameterised through the user interface unit 26; they can also be generated / adapted internally by the system, e.g. by processing techniques based on analysing compound functions that produce the best fits and determining common features thereof expressible as rules and/or heuristics.
  • Figure 12 is an example of different DE functions and their fitness, produced automatically by the system for evaluating the presence of voice in music titles.
  • Figure 13 is an example of different compositions of DE functions in terms of elementary functions, and their fitness produced automatically by the system to evaluate the global energy of music titles.
  • The method and data implemented by the system can be presented as executable code forming a software product stored on a computer-readable recording medium, e.g. a CD-ROM, or downloadable from a source, the code executing all or part of the operations presented.
  • From the foregoing, it will be appreciated that the above-described system is remarkable by virtue of many characteristics, inter alia :
    • its genericity: the system is independent of a given descriptor, and is able to infer an extractor (DE function) for arbitrary problems;
    • its heuristics: the system contains many built-in heuristics that guide the search, and reduce the search space. The originality here is that the system encodes heuristics specific to signal processing, and provides a way to evaluate the fitness of a given function by testing it against a real database of music titles;
    • caching, which greatly reduces the workload on the main processor 22 and accelerates calculation considerably;
    • rewriting, which provides the groundwork for ensuring that functions shall be calculated in their most rational form;
    • implementation: the aim is to calculate functions on an automated basis rather than manually. In this respect, the embodiment can be likened to an expert system in artificial intelligence, in that it takes over the role of the human specialist in signal processing. Extracting descriptors automatically from the digital representation of an acoustic signal in accordance with the invention makes it possible to scale up descriptor acquisition, and also ensures that the descriptors obtained are objective.
  • The remarkable aspects of the present automated system 2 can be appreciated by considering how the task would have to be approached manually. The starting point is the raw data signals as seen by the signal processing specialist. The specialist tries out various processing functions according to an empirical methodology, in the expectation that some rule will emerge for correlating complex signal characteristics with the descriptor. In other words, the approach is extremely heuristic in nature, and largely based on trial and error.
  • This task of manually finding a combination of signal processing functions by signal processing experts is time-consuming and subject to many subjective biases, errors, etc. In most cases it would be too impractical to be considered in a real-life application.
  • System applications.
  • 1. Fully autonomous automatic descriptor extraction function generating system.
  • In the embodiment described above, the programmed system 2 is able to generate an exploitable DE function 4 from scratch using just the user data input indicated with reference to figure 1.
  • The DE function typically takes the form of executable code or instructions comprehensible to a human or machine. Its contents thereby allow the audio data signal of any given music title to be processed so as to extract its descriptor De, the latter being referenced to the function.
  • The process of extracting in this way the descriptor De of a music title can be performed by an apparatus which is separate from the system. The apparatus in question takes for input the DE function (or set of DE functions) produced by the system 2 and audio files containing signals for which a descriptor has to be generated. The output is then the descriptor value Dx of the descriptor De for the or each corresponding music title Tx. The DE function (or set of DE functions) produced by the system 2 is in this case considered as a product in its own right for distribution either through a network, or through a recordable medium (CD, memory card, etc.) in which it is stored.
  • 2. Descriptor extraction
  • It will be noted that the system 2 already includes all the hardware and software necessary to constitute an automated descriptor generating apparatus as defined in the preceding section. In this case, the DE functions shown as user data output of figure 1 are fed back to the system (or kept and stored within the system). The system can be switched to the descriptor extraction mode, in which audio signal data corresponding to a music file Tx to be analysed is supplied as an input and the corresponding music descriptor value of Tx for the descriptor De is provided as the output.
  • 3. Authoring tool for producing descriptor extraction functions.
  • In a variant, the system is implemented more as an authoring tool. In this implementation, the system allows the output DE functions to be modified by external intervention, generally by a human operator. The rationale here is that, in some cases, while the functions produced automatically may not be strictly optimal, they are nevertheless highly interesting as a starting basis for optimisation, or "tweaking". The advantage in this case is that the human specialist has at his disposal a descriptor extraction function which, firstly, has already proven effective compared with a large number of other possible functions, indicating that it possesses a sound structure, and, secondly, has proven amenable to fast and consistent execution. Note that in this case too, the DE function output by the system 2 can be modified either at the level of a basic elementary function taken as a symbolic object, e.g. by substitution, removal or addition, or at the level of the internal parameterisation of a basic elementary function, e.g. by changing a cut-off frequency value in the case of the low-pass filtering elementary function.
  • 4. Evaluation tool for externally produced descriptor extraction functions.
  • The aspect of the system 2 that analyses and evaluates compound functions can be put at the disposal of external sources of candidate DE functions, so as to help designers evaluate their own descriptor extraction functions. The evaluation can be used to provide an objective assessment of the "fitness" FIT of such a candidate function with respect to the learning database 10 or validation database 18.
  • 5. Function calculation tool for externally produced DE functions.
  • Similarly, the function calculation potential of the system 2, enhanced notably by the above-described rewriting rules and the caching technique, can be put at the disposal of outside users. The latter can then input a given complex signal processing function (not necessarily in the context of descriptor extraction) and receive a calculated value as an output.
  • Scope
  • While the invention has been described in the context of a system adapted to process audio file signal data to produce descriptor extraction functions DE, it will be apparent that the teachings of the invention are applicable to many other applications where it is required to analyse low level characteristics of an electronic data signal (digital or analogue) in view of extracting higher-level information relating to its contents. For instance, the invention can be implemented for obtaining descriptor extraction functions operative on video or image signal data, the descriptors in this case being applicable to visual contents, such as indicating whether a scene is set at night or daytime, the amount of action, etc. Other applications are in the fields of automatic cataloguing of sound, scenes, objects, animals, plants, etc. through high level descriptors.

Claims (29)

  1. Method of generating a general extraction function (4) which can operate on an input signal (Sx) to extract therefrom a predetermined global characteristic value (DVex) expressing a feature of the information (De) conveyed by that signal,
       characterised in that it comprises the steps of:
    generating automatically compound functions (CF1- CFn), each compound function being composed of at least one of a set of elementary functions (EF1, EF2, ..), by using means (22, 25) that handle said elementary functions as symbolic objects,
    operating said compound functions on at least one reference signal (S1-Sm) having a pre-attributed global characteristic value (Dgt1-Dgtm) serving for evaluation, by using means (22, 27) that process said elementary functions as executable operators,
    determining the correlation between the values (Dij) extracted by those compound functions as a result of operating on said reference signal and the pre-attributed global characteristic value (Dgt1-Dgtm) of said reference signal, and
    selecting said general extraction function (4) among those compound functions for which said correlation is relatively high.
  2. Method according to claim 1, wherein said compound functions are generated in successive populations (P, P1, P2),
       wherein each new population of functions takes as a basis earlier population functions which produce a relatively high correlation.
  3. Method according to claim 1 or 2, wherein it is performed by the steps of:
    a) preparing at least one reference signal (S1-Sm) for which said predetermined global characteristic value (Dgt1-Dgtm) is pre-attributed,
    b) preparing a population (P1) of compound functions (CF1-CFr) each composed of at least one elementary function (EF),
    c) modifying compound functions of the current population using said means (22, 25) that handle their elementary functions as symbolic objects,
    d) operating said compound functions of said population on at least one said reference signal using said means (22, 27) that exploit said elementary functions as executable operators, to obtain a calculated value for each compound function of the population in respect of said reference signal,
    e) for at least some compound functions of the population, determining the degree of matching between its calculated value and the pre-attributed value (Dgti) for the signal from which that value has been calculated,
    f) selecting compound functions of said population producing the best matches to form a new population (P2) of functions,
    g) if an ending criterion is not satisfied, returning to step c), where said new population becomes the current population,
    h) if an ending criterion is satisfied, outputting at least one compound function of the current new population as a said general function (4).
  4. Method according to any one of claims 1 to 3, wherein said compound functions are produced by random choices guided by rules and/or heuristics.
  5. Method according to claim 4, wherein said rules and/or heuristics comprise at least one rule which forbids, from a random draw for selecting an elementary function to be associated with a part of a compound function under construction, an elementary function that would be formally inappropriate for that part.
  6. Method according to claim 4 or 5, wherein said rules and/or heuristics comprise at least one heuristic which favours, in a random draw for selecting an elementary function to be associated with a part of a compound function under construction, an elementary function which is considered to produce potentially useful technical effects in association with that part, and/or which discourages from said random draw an elementary function considered to produce technical effects of little or no use in association with that part.
  7. Method according to any one of claims 4 to 6, wherein said rules and/or heuristics comprise at least one heuristic which ensures that a said compound function (CF) comprises only elementary functions (EF) that each produce a meaningful technical effect in their context.
  8. Method according to any one of claims 4 to 7, wherein said rules and/or heuristics comprise at least one heuristic which takes into account at least one overall characteristic of said reference signals.
  9. Method according to any one of claims 2 to 8, wherein a new population (P1, P2, ...) of functions is produced using genetic programming techniques.
  10. Method according to claim 9, wherein said genetic programming techniques comprise at least one of the following:
    crossover,
    mutation,
    cloning.
  11. Method according to claim 10, wherein a crossover operation and/or a mutation operation is guided by at least one heuristic of any one of claims 4 to 8.
  12. Method according to any one of claims 1 to 10, wherein said means (22, 25) that handle said elementary functions as symbolic objects manage said functions (CF) in accordance with a tree structure comprising nodes and connecting branches, in which each node corresponds to a symbolic representation of a constituent unit function (EF), said tree having a topography in accordance with the structure of said function.
  13. Method according to any one of claims 1 to 12, further comprising a step of submitting a compound function (CF) to at least one rewriting rule executed by processing means (15, 22) to ensure that said compound function is cast in its most rational form or most efficient form in respect of execution efficiency.
  14. Method according to any one of claims 1 to 13, wherein it uses a caching technique for evaluating a function, in which results (R1, R2, ...) of previously calculated parts of functions are stored (24) in correspondence with those parts, and a function currently under calculation is initially analysed to determine whether at least a part of said function can be replaced by a corresponding stored result, said part being replaced by its corresponding result if such is the case.
  15. Method according to claim 14, comprising the steps of checking the usefulness of results stored (24) according to a determined criterion, and of erasing those found not to be useful, said criterion for keeping a result Ri being a function which takes into account: i) the calculation time to produce Ri, ii) the frequency of use of Ri and, optionally, iii) the size (in bytes) of Ri.
  16. Method according to any one of claims 1 to 15, wherein said elementary functions (EF) comprise signal processing operators and mathematical operators.
  17. Method according to any one of claims 1 to 16, further comprising a step of validating a general function (CF) against at least one reference signal which has a known value for said global characteristic and which was not used as a said reference signal.
  18. Method according to any one of claims 1 to 17, wherein said signal (S) expresses an audio content, and said global characteristic is a descriptor (De) of the audio content.
  19. Method according to claim 18, wherein said audio content is in the form of an audio file, said signal (S) being the signal data of said file.
  20. Method according to claim 18 or 19, wherein said descriptor comprises at least one among:
    a global energy indication,
    a sung or instrumental audio content,
    an evaluation of the danceability,
    an acoustic or electric sounding audio content,
    presence or absence of a solo instrument, e.g. guitar or saxophone solo.
  21. Method of extracting a global characteristic value (DVex) expressing a feature of the information (De) conveyed by a signal (Sx), characterised in that it comprises calculating for said signal (Sx) the value of a general function (4) produced specifically by the method of any one of claims 1 to 20 for that global characteristic.
  22. Apparatus (2) for generating a general function (4) which can operate on an input signal (Sx) to extract therefrom a predetermined global characteristic value (DVex) expressing a feature of the information (De) conveyed by that signal,
       characterised in that it comprises:
    means (22, 25) for generating automatically compound functions (CF1-CFn), each compound function being composed of at least one of a set of elementary functions (EF1, EF2, ...), said means (22, 25) handling said elementary functions as symbolic objects,
    means (22, 27) for operating said compound functions on at least one reference signal (S1-Sm) having a pre-attributed global characteristic value (Dgt1-Dgtm) serving for evaluation, said means (22, 27) processing said elementary functions as executable operators,
    means (22) for determining the correlation between the values (Dij) extracted by those compound functions as a result of operating on said reference signal and the pre-attributed global characteristic value (Dgt1-Dgtm) of said reference signal, and
    means (22) for selecting said general extraction function (4) among those compound functions for which said correlation is relatively high.
  23. Apparatus according to claim 22, configured to execute the method according to any one of claims 1 to 21.
  24. Use of the apparatus according to claim 22 or 23 as a fully autonomous automatic descriptor extraction function generating system.
  25. Use of the apparatus according to claim 22 or 23 as a descriptor extraction means.
  26. Use of the apparatus according to claim 22 or 23 as an authoring tool for producing descriptor extraction functions (4).
  27. Use of the apparatus according to claim 22 or 23 as an evaluation tool for externally produced descriptor extraction functions.
  28. A general function (4) in a form exploitable by an electronic machine, produced specifically by the apparatus according to claim 22 or 23.
  29. A software product containing executable code which, when loaded in a data processing apparatus, enables the latter to perform the method of any one of claims 1 to 21.
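Claims 1 to 3, together with claims 5, 9 and 10, describe an evolutionary loop. The loop generates a population of well-formed compound functions, runs them on reference signals with pre-attributed values, and scores each one by how well its outputs match those values. It keeps the best functions and builds the next population by cloning, mutation and crossover until an ending criterion is met. The Python sketch below is one minimal way such a loop could be written. Its specific choices are assumptions rather than details taken from the claims: compound functions are restricted to chains of hypothetical elementary functions, the match is measured as the absolute Pearson correlation, selection keeps the top third of the population, and the ending criterion is a fitness threshold or a fixed number of generations.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Signal = List[float]

# Hypothetical elementary functions, split by output type so that the random
# draw never proposes a formally inappropriate EF for a position (cf. claim 5).
SIGNAL_OPS: Dict[str, Callable[[Signal], Signal]] = {
    "Abs": lambda s: [abs(v) for v in s],
    "Square": lambda s: [v * v for v in s],
    "Diff": lambda s: [b - a for a, b in zip(s, s[1:])],
}
SCALAR_OPS: Dict[str, Callable[[Signal], float]] = {
    "Mean": lambda s: sum(s) / len(s),
    "Max": max,
}
MAX_INNER = 3  # assumed limit on chained signal -> signal operators


def random_function(rng: random.Random) -> List[str]:
    """A compound function as EF names, outermost first; only the root yields a scalar."""
    inner = [rng.choice(list(SIGNAL_OPS)) for _ in range(rng.randint(0, MAX_INNER))]
    return [rng.choice(list(SCALAR_OPS))] + inner


def execute(func: Sequence[str], signal: Signal) -> float:
    s = list(signal)
    for name in reversed(func[1:]):  # innermost operator first
        s = SIGNAL_OPS[name](s)
    return SCALAR_OPS[func[0]](s)


def fitness(func: Sequence[str], signals: Sequence[Signal], targets: Sequence[float]) -> float:
    """|Pearson correlation| between extracted values and pre-attributed values."""
    xs = [execute(func, s) for s in signals]
    mx, my = sum(xs) / len(xs), sum(targets) / len(targets)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, targets))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in targets)
    if sxx == 0 or syy == 0:  # a constant output carries no information
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def mutate(func: List[str], rng: random.Random) -> List[str]:
    child = list(func)
    i = rng.randrange(len(child))
    pool = SCALAR_OPS if i == 0 else SIGNAL_OPS  # keep each position well-typed
    child[i] = rng.choice(list(pool))
    return child


def crossover(a: List[str], b: List[str], rng: random.Random) -> List[str]:
    # Root and leading inner EFs of `a`, followed by trailing inner EFs of `b`.
    i, j = rng.randint(1, len(a)), rng.randint(1, len(b))
    return (a[:i] + b[j:])[: 1 + MAX_INNER]


def evolve(signals, targets, pop_size=30, generations=20, target_fit=0.99, seed=0):
    rng = random.Random(seed)
    population = [random_function(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda f: fitness(f, signals, targets), reverse=True)
        if fitness(ranked[0], signals, targets) >= target_fit:  # ending criterion
            break
        parents = ranked[: max(2, pop_size // 3)]  # best matches are kept
        population = [list(f) for f in parents]  # cloning
        while len(population) < pop_size:
            if rng.random() < 0.5:
                population.append(mutate(rng.choice(parents), rng))
            else:
                population.append(crossover(rng.choice(parents), rng.choice(parents), rng))
    return max(population, key=lambda f: fitness(f, signals, targets))
```

Consider reference signals built as scaled copies of one base pattern, with their mean energy used as the pre-attributed value. In that case a function such as Mean(Square(x)) reaches a correlation of (numerically) 1, and this is the kind of function the loop is meant to surface.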
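Claims 14 and 15 describe two steps. First, the results of previously calculated parts of functions are stored. Second, stored results judged not useful are erased, based on calculation time, frequency of use and, optionally, size. The Python sketch below assumes that functions are chains of hypothetical elementary functions listed outermost first, so that each inner part is a suffix of the chain. The utility score calc_time * (1 + uses) / size is an illustrative assumption: the claims only state which quantities the criterion takes into account, not how they are combined.

```python
import sys
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical elementary functions; a function is a list of EF names,
# outermost first, so every inner part of it is a suffix of that list.
OPS: Dict[str, Callable] = {
    "Abs": lambda s: [abs(v) for v in s],
    "Square": lambda s: [v * v for v in s],
    "Diff": lambda s: [b - a for a, b in zip(s, s[1:])],
    "Mean": lambda s: sum(s) / len(s),
}


@dataclass
class Entry:
    result: object
    calc_time: float  # cumulative seconds needed to produce this result
    uses: int = 0     # how many times the stored result was reused


class PartCache:
    def __init__(self) -> None:
        self.store: Dict[Tuple[str, Tuple[str, ...]], Entry] = {}
        self.hits = 0

    def evaluate(self, signal_id: str, signal: List[float], func: Sequence[str]):
        func = tuple(func)
        value, k, spent = signal, len(func), 0.0
        # Replace the largest already-calculated inner part by its stored result.
        for j in range(len(func)):
            entry = self.store.get((signal_id, func[j:]))
            if entry is not None:
                entry.uses += 1
                self.hits += 1
                value, k, spent = entry.result, j, entry.calc_time
                break
        # Calculate the remaining outer EFs, storing each new part as we go.
        for j in range(k - 1, -1, -1):
            t0 = time.perf_counter()
            value = OPS[func[j]](value)
            spent += time.perf_counter() - t0
            self.store[(signal_id, func[j:])] = Entry(value, spent)
        return value

    def prune(self, keep: int) -> None:
        """Keep only the `keep` results with the highest (hypothetical) utility."""
        def utility(e: Entry) -> float:
            size = sys.getsizeof(e.result)  # rough, shallow size in bytes
            return e.calc_time * (1 + e.uses) / size
        ranked = sorted(self.store.items(), key=lambda kv: utility(kv[1]), reverse=True)
        self.store = dict(ranked[:keep])
```

Suppose evaluate is called for Mean(Abs(x)) after Mean(Square(Abs(x))) on the same signal. The second call reuses the stored Abs(x) part instead of recomputing it.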

Priority Applications (4)

Application Number | Priority Date | Filing Date | Title
EP20020293122 (EP1437711A1) | 2002-12-17 | 2002-12-17 | Method and apparatus for generating a function to extract a global characteristic value of a signal contents
EP03290635A (EP1431956A1) | 2002-12-17 | 2003-03-13 | Method and apparatus for generating a function to extract a global characteristic value of a signal contents
DE20321797U (DE20321797U1) | 2002-12-17 | 2003-03-13 | Apparatus for automatically generating a general extraction function that is calculable from an input signal, e.g. an audio signal to produce therefrom a predetermined global characteristic value of its content, e.g. a descriptor
US10/738,928 (US7624012B2) | 2002-12-17 | 2003-12-16 | Method and apparatus for automatically generating a general extraction function calculable on an input signal, e.g. an audio signal to extract therefrom a predetermined global characteristic value of its contents, e.g. a descriptor

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
EP20020293122 (EP1437711A1) | 2002-12-17 | 2002-12-17 | Method and apparatus for generating a function to extract a global characteristic value of a signal contents

Publications (1)

Publication Number | Publication Date
EP1437711A1 | 2004-07-14

Family

ID=32479823

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
EP20020293122 (EP1437711A1) | Withdrawn | 2002-12-17 | 2002-12-17 | Method and apparatus for generating a function to extract a global characteristic value of a signal contents

Country Status (1)

Country | Link
EP (1) | EP1437711A1

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5210820A * | 1990-05-02 | 1993-05-11 | Broadcast Data Systems Limited Partnership | Signal recognition system and method
US6028262A * | 1998-02-10 | 2000-02-22 | Casio Computer Co., Ltd. | Evolution-based music composer

Non-Patent Citations (2)

Title
LAMBROU T ET AL: "CLASSIFICATION OF AUDIO SIGNALS USING STATISTICAL FEATURES ON TIME AND WAVELET TRANSFORM DOMAINS", PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING. ICASSP '98. SEATTLE, WA, MAY 12 - 15, 1998, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, NEW YORK, NY: IEEE, US, vol. 6 CONF. 23, 12 May 1998 (1998-05-12), pages 3621 - 3624, XP000951242, ISBN: 0-7803-4429-4 *
TOKUMARU M ET AL: "MEMBERSHIP FUNCTIONS IN AUTOMATIC HARMONIZATION SYSTEM", PROCEEDINGS OF THE 1998 28TH IEEE INTERNATIONAL SYMPOSIUM ON MULTIPLE-VALUED LOGIC. ISMVL '98. FUKUOKA, MAY 27 - 29, 1998, THE INTERNATIONAL SYMPOSIUM ON MULTIPLE-VALUED LOGIC, LOS ALAMITOS, CA: IEEE COMPUTER SOC, US, 27 May 1998 (1998-05-27), pages 350 - 355, XP000793476, ISBN: 0-8186-8372-4 *

Legal Events

Code | Title | Description
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Original code: 0009012
AK | Designated contracting states | Kind code of ref document: A1; designated states: AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LI LU MC NL PT SE SI SK TR
AX | Request for extension of the European patent | Extension states: AL LT LV MK RO
STAA | Information on the status of an EP patent application or granted EP patent | Status: the application has been withdrawn
18W | Application withdrawn | Effective date: 2004-07-21