MICROSERVICES AT YOUR SERVICE: BRIDGING THE GAP BETWEEN NLP RESEARCH AND INDUSTRY

Our mission is to simplify finding and using the many superb open source speech and language processing tools that the European Research Community has to offer.

The European Union’s Connecting Europe Facility has given us support to fill the ELRC-SHARE and European Language Grid with such tools! We are taking yet another step towards fulfilling the vision of a European digital single market!

Gradiant

Lingsoft

Reykjavik University

University of Tartu

OUR PLAN 2021-2023

  1. Reach out to the European research community to help us identify suitable open source tools (2021)
     
  2. Help researchers package the tools to facilitate re-use by other developers and researchers (2021-2022)
     
  3. Make the tools available on the European Union’s own platforms for language technology, ELRC-SHARE and the European Language Grid (2022-2023)

Do you know of open source tools that could be of interest to us? Perhaps tools you or your group have developed?

In early 2022 we are planning workshops on how to add a server API to your tool and package it as an easily distributable Docker image. Come join us!

Would you like to be kept up-to-date with the progress of our project? Join our concluding seminar in February 2023 to see which tools we found and made available!

 

Contact us

Docker and API workshop

Lingsoft arranged a Docker and API workshop for beginners as part of our "Microservices at Your Service" project. Filip Ginter and Juhani Luotolahti from the Turku NLP group assisted Lingsoft's Sebastian Andersson with the workshop presentations. The workshop was aimed at people interested in speech and language technology development who are new to REST APIs and Docker. The whole workshop was recorded and can be watched on this page or here. The presentations contain hands-on examples with code that can also be downloaded and tried.

Docker offers a convenient way of putting software tools and models into a package that other software developers and data scientists can use with very little effort. According to Wikipedia: “[Docker is] a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages [...]”. The created Docker package is called an "image". This image can be copied and shared with just your colleagues, or it can be published to a registry such as Docker Hub and shared with the world!

By adding a web service API to your software and packaging it as a Docker image, one can arrange many different types of tools and models into a microservice architecture similar to how the European Language Grid (ELG) and many other SaaS/PaaS offerings are organised, including Lingsoft's own SaaS. The main benefit is that it is very easy to replace outdated tools and models with state-of-the-art ones from, e.g., the great speech and language technology open source communities on GitHub, ELG, Hugging Face, etc. This is very important in the fast-moving field of artificial intelligence, including speech and language technology.
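As a rough illustration of what "adding a web service API" can look like, here is a minimal sketch using only the Python standard library. It is not the code from the workshop. The `tokenize` function is a hypothetical stand-in for a real language tool, and the `/process` path, the `text` and `tokens` field names, and port 8000 are assumptions chosen for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def tokenize(text: str) -> list[str]:
    """Hypothetical stand-in for a real language tool: split on whitespace."""
    return text.split()


def process_request(body: bytes) -> tuple[int, dict]:
    """Map a raw JSON request body to an HTTP status code and a JSON-ready reply."""
    try:
        text = json.loads(body)["text"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'text' field"}
    if not isinstance(text, str):
        return 400, {"error": "'text' must be a string"}
    return 200, {"tokens": tokenize(text)}


class ToolHandler(BaseHTTPRequestHandler):
    """Exposes the tool at POST /process (the path is an assumption of this sketch)."""

    def do_POST(self):
        if self.path != "/process":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, reply = process_request(self.rfile.read(length))
        data = json.dumps(reply).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(host: str = "0.0.0.0", port: int = 8000) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer((host, port), ToolHandler)
```

Calling `make_server().serve_forever()` starts the service. Because the tool is only reachable through the HTTP interface, swapping `tokenize` for a newer model does not affect the clients, which is the replaceability benefit described above. Inside a Docker image, a script like this would typically be the process the container runs.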

 

You can download the presentation slides here.

INTRODUCING OUR CONSORTIUM

 

Gradiant

Gradiant is a Spanish ICT technology centre that aims to improve the competitiveness of companies by transferring knowledge and technologies in the fields of connectivity, intelligence and security. With more than 100 professionals and 285 R&D&i projects, it is becoming one of the main engines of innovation in Galicia.

Gradiant is backed by a board that includes representatives of the three Galician universities (Vigo, Santiago and A Coruña) and seven companies from the telecommunications industry: Altia, Arteixo Telecom, Egatel, Indra, Plexus, R, Telefónica, Televés; and the INEO business association.

Gradiant is positioned as a technology partner for industry, oriented to its needs in the ICT field. It contributes national and international experience in technologies for security and privacy; processing of multimedia signals; Internet of Things; Natural Language Processing, biometrics and data analytics; and advanced communications systems.

 

Lingsoft

Lingsoft Oy and its sister company Lingsoft Language Services Oy are part of the Lingsoft Group, with a consolidated turnover of about 12.5 million euros in 2019, making us one of the 100 largest language service providers in the world. Founded in 1986, Lingsoft is a reliable, experienced and innovative partner. Lingsoft offers a wide variety of language technology solutions and language services designed for the analysis, processing and utilization of written and spoken language. Our solutions make text FAIR (Findable, Accessible, Interoperable and Reusable) in online society. Lingsoft's core technologies and solutions have been tested by tens of millions of users around the world as part of the Microsoft Office suite of proofing tools. Lingsoft is the coordinator of the “Microservices at Your Service” project.

Reykjavik University

Reykjavik University is a dynamic international university with 3800 registered students and 250 permanent faculty and staff. The university focuses on research, excellence in teaching, entrepreneurship, technology development, and co-operation with the business community. The Language and Voice Lab (LVL) was established in 2016 as a part of the research center in Artificial Intelligence with the aim of carrying out research and development in speech and language processing. LVL is part of the Icelandic National Language Technology Programme, a consortium of universities, institutions, associations, and private companies that aims to ensure that Icelandic is supported in modern language technology applications.

University of Tartu

The University of Tartu is the leading centre of research and training in Estonia. It preserves the culture of the Estonian people and spearheads the country's reputation in research and provision of higher education. The University of Tartu is the leading partner of the Center of Estonian Language Resources (CELR) consortium; the other partners are Tallinn University of Technology, the Institute of the Estonian Language and the Estonian Literary Museum. The goal of CELR is to create and manage an infrastructure that makes Estonian digital language resources (dictionaries, corpora, various language databases) and language technology tools (software) available to everyone working with digital language materials. The main users of CELR are researchers from Estonian R&D institutions and Social Sciences and Humanities researchers all over the world, who reach it via the CLARIN ERIC network of similar centers in Europe.