Speech recognition software for Linux

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by KolbertBot (talk | contribs) at 04:33, 3 December 2017 (Bot: HTTP→HTTPS (v477)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

There are currently several speech recognition software packages for Linux. Some of them are free and open source software while others are proprietary. Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language. Voice control may refer to software used for sending operational commands to a computer.

Native Linux speech recognition

History

In the late 1990s, a Linux version of ViaVoice (created by IBM) was made available to users for no charge. However, the free SDK was removed by the developer in 2002.

Current development status

Recently, there has been a push to get a high-quality native Linux speech recognition engine developed. As a result, numerous projects dedicated to creating Linux speech recognition solutions were established.

Crowdsourcing of speech samples

To compile a speech corpus and enable the production of acoustic models for speech recognition, VoxForge was set up with the aim of collecting transcribed speech for use with speech recognition projects. It is licensed under the GPL. VoxForge accepts crowdsourced speech samples and corrections of recognized speech sequences.

Speech recognition concept

The first step is to record an audio stream on the Linux machine. The user then has two options for processing it:

  • (Local) run the speech recognition on the local machine, or
  • (Remote) submit the audio file to a remote server, which converts it into a text string.

The second option is used mainly on smartphones, which often lack the processing power and storage needed to run speech recognition on the device itself.
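The local/remote choice described above can be sketched in Python. This is only an illustration using the standard library: the function names (make_silent_wav, recognize_local, recognize_remote, transcribe) are hypothetical, the "recording" is generated silence rather than microphone input, and both recognizers are placeholders that return a dummy string instead of calling a real engine or server.

```python
import io
import wave

def make_silent_wav(seconds: float = 0.5, rate: int = 16000) -> bytes:
    """Stand-in for a microphone capture: 16-bit mono PCM silence as WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()

def recognize_local(wav_bytes: bytes) -> str:
    """Placeholder for an on-device engine (assumption: no real decoding here)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        frames = w.getnframes()
    return f"<local transcript of {frames} frames>"

def recognize_remote(wav_bytes: bytes) -> str:
    """Placeholder for uploading the audio to a server (no network call is made)."""
    return f"<remote transcript of {len(wav_bytes)} bytes>"

def transcribe(wav_bytes: bytes, mode: str = "local") -> str:
    """Dispatch the recorded audio to the local or the remote backend."""
    backends = {"local": recognize_local, "remote": recognize_remote}
    if mode not in backends:
        raise ValueError(f"unknown mode: {mode!r}")
    return backends[mode](wav_bytes)

if __name__ == "__main__":
    audio = make_silent_wav()
    print(transcribe(audio, "local"))
    print(transcribe(audio, "remote"))
```

In a real application the two placeholder functions would wrap an actual engine and an actual network request; the dispatch step would stay the same.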

Speech Recognition in Browser

Speech recognition can also be performed in a web browser, which requires no software installation on the desktop computer or mobile device. The same local/remote distinction applies in the browser:

  • (Remote): https://dictation.io (use Chromium/Chrome). The dictation service records the user's audio through the web browser and, in turn, uses the Google API for speech recognition.
  • (Local): There are solutions that work on the client only, without sending data to servers, e.g. pocketsphinx.js.

Free speech recognition engines

The following is a list of current projects dedicated to implementing speech recognition in Linux, as well as major native solutions. These are not end-user applications. These are programming libraries that a programmer may use to develop an end-user application.

  • CMU Sphinx is a general term to describe a group of speech recognition systems developed at Carnegie Mellon University.
  • Julius is a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder software for speech-related researchers and developers.
  • Kaldi is a toolkit for speech recognition provided under the Apache License.

Possibly active projects:

  • Lera (Large Vocabulary Speech Recognition) based on Simon and CMU Sphinx for KDE[1].
  • Speechpad.pw[2] uses Google's speech recognition engine and Chrome native messaging API to provide direct speech input in Linux.
  • Speech[3] uses Google's speech recognition engine to support dictation in many different languages.
  • Speech Control is a Qt-based application that uses CMU Sphinx tools such as SphinxTrain and PocketSphinx to provide speech recognition utilities such as desktop control, dictation and transcription on the Linux desktop.
  • Platypus[4] is an open source shim that will allow the proprietary Dragon NaturallySpeaking running under Wine to work with any Linux X11 application.
  • FreeSpeech,[5] from the developer of Platypus, is a free and open source cross-platform desktop application for GTK that uses CMU Sphinx's tools to provide voice dictation, language learning, and editing in the style of Dragon NaturallySpeaking.
  • Vedics[6] (Voice Enabled Desktop Interaction and Control System) is a speech assistant for the GNOME environment.
  • GnomeVoiceControl[7] is a dialogue system to control the GNOME Desktop that was developed in the Google Summer of Code in 2007.
  • NatI[8] is a multi-language voice control system written in Python.
  • SphinxKeys[9] allows the user to type keyboard keys and mouse clicks by speaking into their microphone.
  • VoxForge is a free speech corpus and acoustic model repository for open source speech recognition engines.
  • Simon[10] aims to be flexible enough to compensate for dialects or even speech impairments. It uses either HTK / Julius or CMU Sphinx, works on Windows and Linux, and supports training.
  • Speeral is a group of speech recognition tools developed at the University of Avignon.
  • Jasper (https://jasperproject.github.io/) is an open source platform for developing always-on, voice-controlled applications. It is an embedded Raspberry Pi front-end for CMU Sphinx or Julius.

It is possible for developers to create Linux speech recognition software by using existing packages derived from open-source projects.

Inactive projects:

  • CVoiceControl[11] is a KDE and X Window independent version of its predecessor KVoiceControl. The owner ceased development at the alpha stage.
  • Open Mind Speech,[12] a part of the Open Mind Initiative,[13] aimed to develop free (GPL) speech recognition tools and applications, as well as to collect speech data. Production ended in 2000.
  • PerlBox[14] is a Perl-based voice control and speech output application. Development ended in its early stages in 2004.
  • Xvoice[15] is a user application that provides dictation and command control for any X application. It requires the proprietary ViaVoice to function. Development ended in 2009 during early project testing.

Proprietary speech recognition engines

Voice control and keyboard shortcuts

Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language. Voice control may refer to software used for sending operational commands to a computer or appliance. Voice control typically requires a much smaller vocabulary and thus is much easier to implement.

Simple software combined with keyboard shortcuts has the earliest potential for practically accurate voice control in Linux.
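The idea of pairing a small command vocabulary with keyboard shortcuts can be sketched as follows. This is an illustrative sketch, not a description of any listed project: the COMMANDS table and the function names are hypothetical, the recognized phrase is assumed to arrive as plain text from some recognizer, and sending the keystroke relies on the xdotool command-line tool being installed (the sketch does nothing if it is missing).

```python
import shutil
import subprocess
from typing import Callable, Optional

# Hypothetical command vocabulary: spoken phrase -> keyboard shortcut.
COMMANDS = {
    "copy": "ctrl+c",
    "paste": "ctrl+v",
    "close window": "alt+F4",
}

def phrase_to_shortcut(phrase: str) -> Optional[str]:
    """Normalize case and whitespace, then look the phrase up in the vocabulary."""
    normalized = " ".join(phrase.lower().split())
    return COMMANDS.get(normalized)

def send_with_xdotool(shortcut: str) -> None:
    """Send the shortcut to the focused X11 window via xdotool, if available."""
    if shutil.which("xdotool") is None:
        return  # xdotool not installed; silently skip in this sketch
    subprocess.run(["xdotool", "key", shortcut], check=True)

def handle_phrase(phrase: str,
                  sender: Callable[[str], None] = send_with_xdotool) -> Optional[str]:
    """Map a recognized phrase to a shortcut and pass it to the sender."""
    shortcut = phrase_to_shortcut(phrase)
    if shortcut is not None:
        sender(shortcut)
    return shortcut
```

Because the vocabulary is a handful of fixed phrases, the recognizer only has to tell those phrases apart, which is why voice control is easier to implement than general dictation.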

Running Windows speech recognition software with Linux

Using a compatibility layer

It is possible to use programs such as Dragon NaturallySpeaking in Linux, by utilizing Wine, though some problems may arise, depending on which version is used.[21]

Using virtualized Windows

It is also possible to run Windows speech recognition software, such as NaturallySpeaking, inside a Windows virtual machine under Linux using no-cost virtualization software. VMware Server and VirtualBox support copy and paste to and from a virtual machine, making dictated text easily transferable.

See also

References

  1. ^ Lera KDE git repository - (2015) - https://cgit.kde.org/scratch/grasch/lera.git/ Retrieved 2017-07-25.
  2. ^ Speechpad.pw
  3. ^ Speech
  4. ^ Platypus
  5. ^ FreeSpeech
  6. ^ Vedics
  7. ^ GnomeVoiceControl
  8. ^ NatI (Natural Language Interface)
  9. ^ SphinxKeys
  10. ^ Simon KDE - Main Developer until 2015 Peter Grasch - (accessed 2017/09/04) - http://simon.kde.org/
  11. ^ CVoiceControl
  12. ^ Open Mind Speech
  13. ^ Open Mind Initiative
  14. ^ PerlBox
  15. ^ Xvoice
  16. ^ Verbio ASR
  17. ^ DynaSpeak
  18. ^ Janus Recognition Toolkit (JRTk)
  19. ^ "Speech Recognition Software - LumenVox". Retrieved 2013-02-28.
  20. ^ Speech-to-text software by Vocapia
  21. ^ Dragon NaturallySpeaking - Wine Application Database