User:Safwan mustafa/sandbox

Audio Oracle Algorithm
Audio Oracle is a new method for indexing of audio data in terms of repeating sub-clips of variable length that we call audio factors. The new structure allows fast retrieval and recombination of sub-clips in a manner that assures continuity between splice points,This method is motivated by a similar technique used for fast indexing and search in text, called Factor Oracle.
The resulting structure accomplishes effectively a new method for texture synthesis, where the amount of innovation is controlled by one of the synthesis parameters.

Contents 1.Introduction 2.Description 3.Pseudocode 4.Example 5.Complexity 6.Applications 7.References

Introduction

The need to find sub-clips in a recording is important for a variety of applications. In case of audio retrieval, question of partial similarity can be used to ask questions whether certain sound clip occurs in a recording, and if it does,where? It can be used for efficient editing, such as jumping to similar sounding places in a recording, selection of audio sub-clips from databases of longer clips, or concatenative synthesis. Reshuffling of sub-clips can be used to create new variants of existing recordings, such as in the case of texture synthesis and audio mosaicing.
Multiple methods used for the same purpose of indexing audio data such as Discrete wavelet transform,Time frequency analysis, audio oracle is a new method for indexing of audio data in terms of detection of approximately repeating sub-clips of variable length that we call audio factors.

Description

As we mention above Audio Oracle is motivated by similar technique used for fast indexing of symbolic data such as text called Factor Oracle, Audio Oracle accepts a continuous (audio) stream as input, transforms it into a sequence of feature vectors and submits these vectors to AO analysis. AO outputs an automaton that contains pointers to different locations in the audio data that satisfy certain similarity criteria, as found by the algorithm. The resulting automaton is passed next to the audio generation module, that will outputs a new audio stream that is created by concatenation of the audio frames corresponding to AO states that were traversed during the generative process.

To assure modularity, the nature of the data presented to the algorithm is independent of the AO itself and can be any vector per audio frame containing audio features that describe the audio segments. Therefore, symbols in Factor Oracle corresponding to each state i of the algorithm are vectors representing a user-defined description of the audio stream. In order to construct the oracle structure, a distance function is required that measures the similarity between these “symbols” in the oracle. Defining the distance function is done by the user in a modular fashion as well.

Pseudocode

Audio oracle construction is done by the following two algorithms, During the online construction,the algorithm accepts audio frame descriptions as vectors σ_i for each time-frame i and updates audio oracle in an incremental manner.

`Algorithm 1 On-line construction of Audio Oracle Require: Audio stream as S=σ₁σ₂...σ_n Create an oracle P with one single state 0 S_p(0)← -1 for i=0 to N do Oracle(P=P₁....P_i)←Add-Frame (Oracle(P=p₁....p_i-1),σ_i) end for return Oracle (P=p₁....p_n)`

Algorithm 2 Add-Frame function: Incremental update of Audio Oracle Require: Oracle P = p₁.....p_m and Audio Frame description vector σ Create a new state m + 1 Create a new transition from m to m + 1,∂(m,σ) =m + 1 k ← S_p(m) while k > -1 do Calculate distances between σ and S Find indexes of frames in S whose distances from σ are less than θ if There are indexes found then Create a transition from state k to m + 1 ∂(k,σ) = m + 1 k ← S_p(k) end if end while if k = -1 (no suffix exists) then s ← 0 else s ← where leads the best transition (min. distance) from k end if S_p ← s return Oracle P = p₁.....p_mσ

Algorithm 1 calls the function Add-Frame described in algorithm 2 which updates the audio oracle structure using the latest received frame descriptions. This function works very similar to the description of Factor Oracle except that:-

it accepts continuous data flow rather than symbolic data.
does not assign symbols to transitions and instead each state has a one-toone correspondence with frames in audio buffer.
it uses a distance function (described earlier) along with a threshold θ to assign the degree of similarity between frame descriptions.

The set of links in Audio Oracle are forward arrows ∂(i,σ) and suffix links S_p(k).
Similar to the Factor Oracle algorithm, forward transitions or factor links correspond to states that can produce similar patterns by continuing forward and suffix or backward links correspond to states that share the largest similar sub-clip in audio buffer.

Example

Figure on the right shows a sample run of Audio Oracle algorithm and the state spaces obtained out of the audio using Cepstral Coefficients as features. Part (c) shows the (natural recording) waveform of audio clip used in the experiment which corresponds to a Blue Jay bird sound. As seen in the waveform, the bird sound can be heard twice. For demonstration purposes, Part (a) and Part(b) show the obtained state space out of Audio Oracle algorithm and for two different threshold θ values on the distance function. Naturally, a lower threshold means that similarity decisions are much tighter and thus, less forward and backward links would be produced. The numbering on each node correspond to a specific time-frame analysis over the original audio (total of 20 frames of size 93ms with 46ms overlap). The second occurrence of the bird sound appears around frame number 12. Clearly, we would expect suffix links after this state to refer at some point to the earlier occurrence since both are somewhat similar. Forward transitions also connect states that share a similar leading frame using the given distance function and threshold.

Complexity

Audio Oracle’s complexity is both linear in time and space (independent of the complexity of audio feature calculations). This computational complexity allows the algorithm to work in interactive and real-time environments.

Applications

AO can be used as a generator (high level instrument) in a new composition that is designed off-line. For on-line, real-time interaction approach, two main systems have been built using audio oracle:-

1 -Omax by Assayag, Bloch and Chemillier, an OpenMusic/Max application for co-improvisation with human performers using the Midi, audio and video media.
2 -MIMI by A. Franois and E. Chew, a multimodal interface for coimprovisation with the computer that gives a visual feedback to the performer, helping him to anticipate the computers reactions.

References

RCAM REAL TIME MUSICAL INTERACTIONS
A NEW ALGORITHM FOR FAST LEARNING OF AUDIO STRUCTURES Shlomo Dubnov, G´erard Assayag, Arshia Cont
MUSIC DESIGN WITH AUDIO ORACLE USING INFORMATION RATE. Shlomo Dubnov, G´erard Assayag
AUDIO ORACLE ANALYSIS OF MUSICAL INFORMATION RATE.Shlomo Dubnov, G´erard Assayag, Arshia Cont