Speech that is heard (and seen) by everyone – in real time

Lingsoft developed a solution that uses speech recognition to subtitle weather reports for Sveriges Television (SVT), the Swedish public service television company. The project was part of the Swedish Post and Telecom Authority's (PTS) accessibility project, and it also helped generate a considerable amount of language capital for the Swedish Language Bank.

In 2015, the Swedish Post and Telecom Authority (PTS) and the Swedish public service television company (SVT) issued a call for tenders to develop a prototype solution that uses speech recognition to subtitle SVT's weather reports. The project was part of an extensive national undertaking to develop Sweden's national language infrastructure and to promote the development of speech-based solutions and services in the country. Lingsoft was selected as the vendor for the prototype.

Accessibility in real time

SVT is a national public television company and, as such, is required to subtitle 100% of all recorded programmes and 65–80% of live broadcasts for viewers who are hearing impaired or who need subtitles for other reasons. Each year, this amounts to approximately 18,000 hours of Swedish-language television programmes subtitled in Swedish. PTS and SVT share a common goal: to improve the accessibility and usability of public services so that information and content are available to all user groups regardless of age, functional capacity, disability or other special needs.

Readability and comprehensibility are key in subtitling. This poses a particular challenge in live broadcasts, as not even the fastest typist can produce flawless text at the pace of spoken dialogue, much less high-quality subtitles that keep up with the words and images on the screen. At SVT, live subtitlers use a specially designed keyboard (called a Velotype), but learning to use one, not to mention mastering it, takes months or even years of training.

Not just technology, but an entirely new process

The starting point for the solution was to update the subtitling process by combining automatic speech recognition with re-speaking. Using freely available speech and text material as well as SVT's own materials, Lingsoft developed a speech recognition solution based on machine learning and deep neural networks that could be integrated with SVT's own subtitling system. The solution was trained to pay particularly close attention to weather report terminology and to SVT's subtitling rules and practices. In addition, subtitlers were given voice commands that let them add punctuation marks, insert line breaks or change the colour of the text without touching the keyboard.
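
The article does not describe SVT's subtitling system or the actual command set, but the general idea of interpreting spoken commands as formatting actions rather than as text can be sketched roughly as follows. This is a minimal, hypothetical Python sketch: the command phrases, class and function names, and the rendering logic are illustrative assumptions, not Lingsoft's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SubtitleBlock:
    """One on-screen subtitle being built up from a re-spoken transcript."""
    colour: str = "white"
    lines: list = field(default_factory=lambda: [""])

    def append_text(self, text: str) -> None:
        # Attach punctuation directly; separate ordinary words with a space.
        if text in {".", ","} or not self.lines[-1]:
            self.lines[-1] += text
        else:
            self.lines[-1] += " " + text

    def new_line(self) -> None:
        self.lines.append("")

    def set_colour(self, colour: str) -> None:
        self.colour = colour


# Hypothetical spoken commands mapped to formatting actions; the real
# command set used in SVT's workflow is not described in the article.
COMMANDS = {
    "full stop": lambda block: block.append_text("."),
    "comma": lambda block: block.append_text(","),
    "new line": lambda block: block.new_line(),
    "colour yellow": lambda block: block.set_colour("yellow"),
}


def render(transcript: str) -> SubtitleBlock:
    """Interpret command phrases in a recognised transcript as formatting
    actions and treat everything else as subtitle text."""
    block = SubtitleBlock()
    tokens = transcript.lower().split()
    i = 0
    while i < len(tokens):
        two_word = " ".join(tokens[i:i + 2])
        if two_word in COMMANDS:          # match two-word commands first
            COMMANDS[two_word](block)
            i += 2
        elif tokens[i] in COMMANDS:
            COMMANDS[tokens[i]](block)
            i += 1
        else:
            block.append_text(tokens[i])
            i += 1
    return block


if __name__ == "__main__":
    result = render(
        "rain in the north comma sunny in the south full stop "
        "new line colour yellow strong winds tonight"
    )
    print(result.colour, result.lines)
    # yellow ['rain in the north, sunny in the south.', 'strong winds tonight']
```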

A successful IT project by any measure

The project ran for 13 months and was a resounding success: it was completed ahead of schedule, and the client's requirements were met with flying colours in terms of both the quality and the speed of the speech recognition. Success relied on close co-operation with the client and a genuine understanding of their needs.

A key part of the project was the production testing phase, in which viewers from different interest groups evaluated the subtitles produced using speech recognition. The results were promising to say the least, and viewer feedback was positive: the subtitles created using speech recognition were available more quickly than manual subtitles, and the errors made by the speech recognition were, perhaps a bit surprisingly, less disruptive to viewers than the typing errors and unexpected character combinations produced by a subtitler using a Velotype.

SVT put the prototype into use at the beginning of 2018 and is now further developing its own processes around the speech recognition solution. The project also generated a considerable amount of language capital, and this data will serve as the basis for the Swedish Language Bank.