Audio search engine

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 168.65.231.235 (talk) at 19:32, 15 September 2015 (Deep audio search: Added two new companies. This article still appears to be a bit stale and could use more work.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

An audio search engine is a web-based search engine which crawls the web for audio content. The information can consist of web pages, images, audio files, or other types of documents. Such search engines are most prevalent on mobile phones, which allow users to obtain information about unknown objects at any time and place. Various techniques exist for searching on these engines.

Audio search from text

A search term is entered (in text format) in a search bar; after analyzing the database, the engine shows the results that match the term. These results are accompanied by a brief description of the audio file and its characteristics, such as sample frequency, bit rate, file type, length, duration, or coding type. Finally, the user has the option to download the file that best suits their preferences.
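The text-based flow described above can be illustrated with a minimal sketch in Python. The `AudioRecord` fields, the `search` function and the sample catalogue are all hypothetical names chosen for illustration; a real engine would query a crawled index rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class AudioRecord:
    # Hypothetical metadata fields, mirroring the characteristics listed above.
    title: str
    description: str
    file_type: str
    sample_rate_hz: int
    bit_rate_kbps: int
    duration_s: float


def search(term, records, file_type=None):
    """Return records whose title or description contains the term.

    An optional file_type narrows the results to the user's preferred format.
    """
    needle = term.lower()
    return [
        r for r in records
        if (needle in r.title.lower() or needle in r.description.lower())
        and (file_type is None or r.file_type == file_type)
    ]


# Made-up catalogue entries, used only to exercise the function.
catalog = [
    AudioRecord("Rain on a window", "Steady rainfall recorded indoors", "wav", 44100, 1411, 62.0),
    AudioRecord("Forest ambience", "Birdsong with light rain", "mp3", 44100, 192, 180.5),
    AudioRecord("Violin scale", "Solo violin, C major", "aiff", 48000, 1536, 12.3),
]

for record in search("rain", catalog):
    print(record.title, record.file_type, record.bit_rate_kbps, record.duration_s)
```

Plain substring matching is used here only to keep the sketch short; it stands in for whatever text-matching method a given engine actually applies.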

Audio search from image

Query by Example (QBE) is a search technique in which a content-based image retrieval (CBIR) system is provided with an example image on which to base the search. Once the image has been analyzed, the system extracts information from it, for example keywords related to the content of the image. These words are used to search for audio in the sound database. As in the previous case, depending on the application, the search results are displayed according to the user's preferences regarding file type (wav, mp3, aiff…) or other characteristics.

Above: the waveform of a sound A
Below: the spectrogram of a sound A

Audio search from audio

This technique is commonly used in the field of music. The user plays a song through a music player, or sings or hums, so that the microphone of the computer or mobile phone picks up the sound and records it. The recorded sound then corresponds to a pattern A, defined by its waveform and by its frequency representation obtained from its Fourier transform. This pattern is matched against a pattern B corresponding to each sound file in the database, whose waveform and transform are known. All audio files in the database whose patterns are similar to the search pattern are displayed as search results.
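The matching of a pattern A against stored patterns B can be sketched as follows. This is a simplified illustration, not the method of any particular product: it assumes that similarity is measured as the cosine similarity between magnitude spectra, uses a naive discrete Fourier transform, and uses synthetic sine tones in place of recorded audio. The function names and the 0.9 threshold are illustrative assumptions.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of the first N/2 DFT bins (naive O(N^2) transform, demo only)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tone(freq, n=64, rate=64):
    """Synthetic sine tone standing in for a recorded sound."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


def search(query, database, threshold=0.9):
    """Return (name, score) pairs whose spectrum resembles the query, best first."""
    pattern_a = magnitude_spectrum(query)
    scored = [
        (name, cosine_similarity(pattern_a, magnitude_spectrum(samples)))
        for name, samples in database.items()
    ]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])


database = {"tone_5": tone(5), "tone_12": tone(12)}
results = search(tone(5), database)
print(results)
```

Comparing whole-signal spectra discards timing information, so a sketch like this cannot tell apart two recordings that contain the same notes in a different order; practical systems compare time-frequency representations such as spectrograms.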

  • AudioSear.ch is a company which develops technology for indexing and retrieving transcribed text from audio recordings. Audio content is indexed and searchable.
  • VoiceBase is a company which develops technology for converting audio content to time-stamped text with a server-side speech-recognition engine. After the speech recognition process, the audio content is indexed and searchable.
  • Everyzing (formerly Podzinger until May 2007) is a company which develops technology for delivering video content. Everyzing has developed two products which are licensed primarily to large media companies. ezSEO is a white-label, hosted search engine optimization solution for making audio and video discoverable with major search engines such as Google and Yahoo. ezSEARCH is a universal site search product which combines text, images, audio, and video. Everyzing claims to have spent millions of dollars building speech-to-text audio search. Everyzing takes the user within the actual content by using speech recognition. This enables online video consumers to jump directly to the point in the video for which they are searching.
  • Picsearch Audio Search has been licensed to search portals since 2006. Picsearch is a search technology provider that powers image, video and audio search for over 100 major search engines around the world.

For mobile phone

  • SoundHound (previously known as Midomi) is software that lets users find results for a song from audio that they have sung or hummed. Its purpose is to create a more comprehensive searchable music database. Users can contribute to expanding the database by performing songs, in any language or genre, in the online recording studio that Midomi offers, so that their performance can appear as a search result. It also includes a service for legally purchasing digital music, and users may post their original songs.
  • Munax released the first version of its all-content search engine in 2005 and powers both nationwide and worldwide search engines with audio search. Munax launched the PlayAudioVideo multimedia search engine in July 2007, providing web search for three multimedia types. For mobile phones it provides not only traditional text search (metasearch engines) but also functionality that lets visitors pre-listen to audio and preview videos, making it easier for visitors to decide which song or video they are looking for before playing it or visiting the site hosting it.
  • Shazam is an app for smartphones and Mac best known for its music identification capabilities. It uses a built-in microphone to gather a brief sample of the audio being played. It creates an acoustic fingerprint based on the sample and compares it against a central database for a match. If it finds a match, it sends information such as the artist, song title, and album back to the user.

Search results may be modified, or be suspect, because large hosted video is given preferential treatment in them.

Design and algorithms

A spectrogram of the sound of a violin.
The target zone of a song scanned by Shazam.

Audio search has evolved slowly through several basic search formats that exist today, all of which use keywords. The keywords for each search can be found in the title of the media, in any text attached to the media, and in the content of linked web pages; they can also be defined by the authors and users of hosted video resources.

Some search engines can search recorded speech such as podcasts, though this can be difficult if there is background noise. A language typically has on the order of 40 phonemes, with about 400 across all spoken languages. Rather than applying a text search algorithm after speech-to-text processing is completed, some engines use a phonetic search algorithm to find results within the spoken word. Others work by listening to the entire podcast and creating a text transcription.
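The difference between phonetic search and transcript search can be illustrated with a short sketch. It assumes that a recording has already been converted to a sequence of phoneme symbols; the tiny `LEXICON` with ARPAbet-style symbols and the function names are hypothetical and stand in for the output of a real speech recognizer.

```python
# Hypothetical pronunciation lexicon; "knight" and "night" sound identical.
LEXICON = {
    "knight": ["N", "AY", "T"],
    "night": ["N", "AY", "T"],
    "rider": ["R", "AY", "D", "ER"],
    "good": ["G", "UH", "D"],
}


def to_phonemes(text):
    """Convert a text phrase to a flat phoneme sequence using the lexicon."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def find_phonetic(query, stream):
    """Return every start offset at which the query's phonemes occur in the stream."""
    q = to_phonemes(query)
    return [
        i for i in range(len(stream) - len(q) + 1)
        if stream[i:i + len(q)] == q
    ]


# Stand-in for the phoneme stream a recognizer would produce from the audio.
recording = to_phonemes("good night rider")
hits = find_phonetic("knight rider", recording)
print(hits)
```

In this example a plain text search for "knight rider" in a transcript reading "good night rider" would find nothing, while the phoneme-level search matches because both spellings share the same sounds.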

Applications such as Munax use several independent ranking algorithm processes that combine the inverted index with hundreds of search parameters to produce the final ranking for each document. Others, such as Shazam, work by analyzing the captured sound and seeking a match based on an acoustic fingerprint in a database of more than 11 million songs. Shazam identifies songs using an audio fingerprint derived from a time-frequency graph called a spectrogram, and stores a catalogue of audio fingerprints in a database. The user tags a song for 10 seconds and the application creates an audio fingerprint. Once the fingerprint has been created, Shazam searches the database for matches. If there is a match, it returns the information to the user; otherwise it returns a "song not known" dialogue. Shazam can identify prerecorded music being broadcast from any source, such as a radio, television, cinema or music in a club, provided that the background noise level is not high enough to prevent an acoustic fingerprint being taken, and that the song is present in the software's database.
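The general idea of fingerprint matching can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not Shazam's actual algorithm: it keeps only the strongest frequency bin of each fixed-length frame, hashes pairs of consecutive peaks, and votes on the time offset between clip and song. The frame size, the function names and the synthetic "melodies" are all made up for the example.

```python
import cmath
import math
from collections import Counter, defaultdict

FRAME = 32  # samples per analysis frame (assumed value)


def peak_bins(samples):
    """Strongest frequency bin in each non-overlapping frame (crude spectrogram peaks)."""
    peaks = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        mags = [
            abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / FRAME) for t in range(FRAME)))
            for k in range(1, FRAME // 2)
        ]
        peaks.append(1 + mags.index(max(mags)))  # +1 because bin 0 is skipped
    return peaks


def fingerprint(samples):
    """Pair each peak with the next one; yield (hash, frame index) tuples."""
    peaks = peak_bins(samples)
    return [((peaks[i], peaks[i + 1]), i) for i in range(len(peaks) - 1)]


def build_index(songs):
    """Map each hash to the songs and frame positions where it occurs."""
    index = defaultdict(list)
    for name, samples in songs.items():
        for h, t in fingerprint(samples):
            index[h].append((name, t))
    return index


def identify(clip, index):
    """Return the song with the most hashes agreeing on one time offset, or None."""
    votes = Counter()
    for h, t_clip in fingerprint(clip):
        for name, t_song in index.get(h, []):
            votes[(name, t_song - t_clip)] += 1
    if not votes:
        return None  # corresponds to the "song not known" case
    (name, _offset), _count = votes.most_common(1)[0]
    return name


def melody(bins):
    """Synthetic audio: one pure tone per frame, at the given frequency bins."""
    return [math.sin(2 * math.pi * b * t / FRAME) for b in bins for t in range(FRAME)]


songs = {"song_a": melody([3, 7, 5, 9, 4, 8]), "song_b": melody([6, 2, 10, 3, 11, 5])}
index = build_index(songs)
clip = melody([5, 9, 4, 8])  # excerpt from the middle of song_a
print(identify(clip, index))
```

Voting on a consistent time offset is what lets a short excerpt from the middle of a recording be matched; a clip whose hashes do not appear in the index yields no votes and is reported as unknown.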

See also