Speech as a user interface


For humans, speech is the most natural means of communication. It is fast and easy, and it leaves your hands free for other tasks. Speech interfaces are also vital for users who cannot use a keyboard. Indeed, speech technology is one of the most important technologies of the future: according to MarketsandMarkets Research, the speech recognition market is growing rapidly, at an annual rate of about 20%.

Lingsoft has been developing its speech recognition technology for over two decades. We use speech recognition as part of our own services and offer our clients speech recognition solutions for organisation-wide use. Our solutions are always based on our clients' needs: although general-language speech recognition can be used as is in many applications, we also tailor language models to recognise the vocabulary of specific fields, such as healthcare terminology. Beyond customised language models, we also take the client's text production process into account when designing a solution. We provide speech recognition solutions in Finnish, Swedish and other Nordic languages.

Our solutions always focus on the purpose of the text being produced and on how it can be used during or after its production. After all, speech is ultimately just a fast way to produce text and preserve information.

Higher quality texts at a faster rate

Lingsoft's speech recognition technology was developed by Finland's top speech recognition experts, and we work in close co-operation with leading university researchers. Our current speech recognition solutions are based on machine learning and deep neural networks.

Machine-learning-based speech recognition is only as good as the material used to train its model. Training material is often full of spelling and grammatical errors, or of other constructions that the model should not reproduce in its output. Lingsoft's strength lies in its language analysis process, which can be used to preprocess training material, for example to correct errors or anonymise sensitive information.
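As a rough illustration of what preprocessing training text can involve, the sketch below masks e-mail addresses and phone numbers and fixes misspellings from a small lookup table. It is a hypothetical example: `preprocess_line`, `CORRECTIONS` and the regular expressions are assumptions made for illustration, not Lingsoft's language analysis process, which relies on full linguistic analysis rather than fixed patterns.

```python
import re

# Hypothetical correction table (assumption): a real system would use
# language analysis, not a fixed word list.
CORRECTIONS = {"recieve": "receive", "teh": "the"}

# Illustrative patterns for sensitive data (assumption, far from exhaustive).
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE = re.compile(r"\+?\d[\d -]{6,}\d")


def preprocess_line(line: str) -> str:
    """Mask e-mails and phone numbers, then fix known misspellings.

    Tokenisation is plain whitespace splitting, so a misspelling with
    punctuation attached (e.g. "teh,") is not corrected, and runs of
    whitespace are collapsed to single spaces.
    """
    line = EMAIL.sub("<EMAIL>", line)
    line = PHONE.sub("<PHONE>", line)
    words = [CORRECTIONS.get(w.lower(), w) for w in line.split()]
    return " ".join(words)


if __name__ == "__main__":
    sample = "Please recieve teh results at anna@example.com or call +358 40 123 4567"
    print(preprocess_line(sample))
    # Please receive the results at <EMAIL> or call <PHONE>
```

Masking is done before correction so that the placeholders, not the original personal data, are what end up in the cleaned training text.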

Speech recognition is based on probabilities: the model is trained on large amounts of audio and text data, from which it learns which words and word parts are most likely to appear in a given context. The vocabulary of a specialised field is more limited than that of the general language, so the model has fewer competing alternatives to choose between. Consequently, the narrower the domain, the higher the recognition accuracy. For example, the weather forecast speech recognition model that we provided to Swedish television produces approximately 2% incorrect words. In practice, the resulting text contains fewer errors than one written by a human.
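A percentage of incorrect words is commonly measured as the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognised text into a reference transcript, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. It assumes the 2% figure is a standard WER; the function name `word_error_rate` and the example sentences are illustrative, not taken from the model mentioned above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling one-row dynamic programming table: at the start of iteration i,
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution (or match if cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "tomorrow will be sunny in the north with light winds"
    hyp = "tomorrow will be sunny in the north with bright winds"
    print(word_error_rate(ref, hyp))  # one substitution in ten words: 0.1
```

On this scale, a 2% error rate corresponds to roughly one incorrect word in every fifty. Because insertions also count as errors, WER can exceed 100% for very poor output.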


Using high-quality speech recognition as part of writing and text processing can significantly speed up text production. For example, in healthcare services a text produced with speech recognition is ready in about the same time it takes to dictate it, which in practice makes real-time dictation a reality. High-quality speech recognition also makes it possible to transcribe large amounts of audio data cost-effectively.
