MICROSERVICES AT YOUR SERVICE: BRIDGING THE GAP BETWEEN NLP RESEARCH AND INDUSTRY

Our mission is to simplify finding and using the many superb open source speech and language processing tools that the European Research Community has to offer.

The European Union’s Connecting Europe Facility has given us support to fill the ELRC-SHARE and European Language Grid with such tools! We are taking yet another step towards fulfilling the vision of a European digital single market!

See also the project description at the INEA site.

Gradiant

Lingsoft

Reykjavik University

University of Tartu

OUR PLAN 2021-2023

  1. Reach out to the European research community to help us identify suitable open source tools (2021)
     
  2. Help researchers package the tools to facilitate re-use by other developers and researchers (2021-2022)
     
  3. Make the tools available on the European Union’s own platforms for language technology, ELRC-SHARE and European Language Grid (2022-2023)

Do you know of open source tools that could be of interest to us? Perhaps tools you or your group have developed?

In March 2022 we are organizing workshops on how to add a server API to your tool and package it as an easily distributable Docker image. Come join us!
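As a rough illustration of what adding a server API to a tool can look like, the sketch below wraps a toy function in a small HTTP service using only the Python standard library. Everything in it is an assumption made for illustration: `toy_tokenize` stands in for a real tool, and the `/process` path, port 8000, and the JSON request and response shapes are hypothetical. It is not the ELG API specification and not the workshop material.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def toy_tokenize(text: str) -> list[str]:
    """Hypothetical stand-in for a real NLP tool: split text on whitespace."""
    return text.split()


def handle_payload(payload: dict) -> tuple[int, dict]:
    """Turn a decoded JSON request into an (HTTP status, JSON body) pair."""
    content = payload.get("content")
    if not isinstance(content, str):
        return 400, {"error": "expected a string 'content' field"}
    return 200, {"tokens": toy_tokenize(content)}


class ToolHandler(BaseHTTPRequestHandler):
    """Serves the toy tool on POST /process (hypothetical endpoint name)."""

    def do_POST(self) -> None:
        if self.path != "/process":
            status, body = 404, {"error": "unknown path"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            try:
                payload = json.loads(self.rfile.read(length) or b"{}")
            except ValueError:
                # Covers malformed JSON and undecodable bytes.
                payload = None
            if isinstance(payload, dict):
                status, body = handle_payload(payload)
            else:
                status, body = 400, {"error": "request body must be a JSON object"}
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Start the service; binding to all interfaces keeps it reachable from outside a container."""
    HTTPServer((host, port), ToolHandler).serve_forever()
```

Calling `serve()` starts the service. Keeping the tool logic in `handle_payload`, separate from the HTTP handler, means the same function can be tested without a running server and reused if the transport layer changes.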

Would you like to be kept up-to-date with the progress of our project? Join our concluding seminar in February 2023 to see which tools we found and made available!

 

Contact us

Lingsoft's Tiina Lindh-Knuutila presented the project and some of the tools we have made available at the META-FORUM 2022 conference at the beginning of June.

Lingsoft's Sebastian Andersson presented the Microservices project at the 6th ELRC Conference on March 31, 2022. Watch his presentation here!

Workshops

We organize workshops in which we present our work, the tools, and ways to contribute to the ELG community.

Our newest workshop "ELG, a bridge for NLP development" was held in March. See the recording on our workshop site.

Project results

Our project contributes easy-to-use Docker containers and services on the ELG platform. This list is updated as new tools become available.

| Tool name | Description | Language | Docker image & ELG catalogue | Original creator | Partner |
|---|---|---|---|---|---|
| HeLI OTS | HeLI-OTS (off-the-shelf) is a language identifier with language models for 200 languages. | Multilingual | Docker image | University of Helsinki | Lingsoft |
| Finto AI | Finto AI suggests topics for a given text. It's based on Annif, a tool for automated subject indexing. | Finnish | Docker image, ELG catalogue | National Library of Finland | Lingsoft |
| Finto AI | Finto AI suggests topics for a given text. It's based on Annif, a tool for automated subject indexing. | Swedish | Docker image, ELG catalogue | National Library of Finland | Lingsoft |
| Finto AI | Finto AI suggests topics for a given text. It's based on Annif, a tool for automated subject indexing. | English | Docker image, ELG catalogue | National Library of Finland | Lingsoft |
| KB BERT NER SV | Swedish Named Entity Recognition tool. | Swedish | Docker image, ELG catalogue | National Library of Sweden (KBLab) | Lingsoft |
| KB BERT Senti SV | Swedish sentiment classification tool. | Swedish | Docker image, ELG catalogue | Martin Malmsten (National Library of Sweden / KBLab) | Lingsoft |
| KB BERT NER NO | Norwegian Named Entity Recognition tool. | Norwegian | Docker image, ELG catalogue | National Library of Norway (NbAiLab) | Lingsoft |
| Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Finnish | Docker image, ELG catalogue | Aalto University | Lingsoft |
| Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Estonian | Docker image, ELG catalogue | Aalto University | Lingsoft |
| Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Northern Sami | Docker image, ELG catalogue | Aalto University | Lingsoft |
| Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | English | Docker image, ELG catalogue | Aalto University | Lingsoft |
| Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Komi | Docker image, ELG catalogue | Aalto University | Lingsoft |
| MeMAD lidbox | Multilingual spoken language identification tool. | Multilingual | Docker image, ELG catalogue | Aalto University | Lingsoft |
| Lithuanian spaCy (tagger) | ELG endpoint to Lithuanian spaCy tagger. | Lithuanian | Docker image, ELG catalogue | Explosion | Lingsoft |
| Lithuanian spaCy (NER) | ELG endpoint to Lithuanian spaCy Named Entity Recognition. | Lithuanian | Docker image, ELG catalogue | Explosion | Lingsoft |
| FinBERT NER | Finnish Named Entity Recognition. | Finnish | Docker image, ELG catalogue | University of Turku | Lingsoft |
| Multi-label register classification | Tool to infer the genre (register) of a text (Finnish, Swedish, English, French). | Multilingual | Docker image, ELG catalogue | University of Turku | Lingsoft |
| Turku neural parser (proxy) | Turku Neural Parser Pipeline. | Finnish | Docker image, ELG catalogue | University of Turku | Lingsoft |
| Turku neural parser (LT tool hosted by UTU) | The ELG-compatible Docker version of the Turku Neural Parser hosted by University of Turku and accessed via the proxy. | Finnish | Docker image | University of Turku | Lingsoft |
| Morphological Analyzer for Latvian - word analysis | Morphological analysis tool for Latvian that analyses the word. | Latvian | Docker image, ELG catalogue | University of Latvia | Lingsoft |
| Morphological Analyzer for Latvian - wordforms | Morphological analysis tool that returns word forms for a given word. | Latvian | Docker image, ELG catalogue | University of Latvia | Lingsoft |
| LV Tagger | Tagger tool for Latvian. | Latvian | Docker image, ELG catalogue | University of Latvia | Lingsoft |
| Northern Sami grammar checker | Tool for checking the grammar in Northern Sami. | Northern Sami | Docker image, ELG catalogue | UiT The Arctic University of Norway | Lingsoft |
| EstNLTK tokenizer | Tokenizer for written Estonian; marks words, punctuation marks, numbers and other tokens one finds in text. | Estonian | Docker image, ELG catalogue | University of Tartu | University of Tartu |
| Vabamorf morf | Morphological analyser Vabamorf for Estonian. | Estonian | Docker image, ELG catalogue | Filosoft | University of Tartu |
| Vabamorf disambiguator | After words have been tagged by the morphological analyser Vabamorf, Vabamorf disambiguator chooses the most plausible analyses for these words, based on their context in the text. | Estonian | Docker image, ELG catalogue | Filosoft | University of Tartu |
| Est TTS preprocessor | Converts Estonian non-word tokens (numbers, symbols, abbreviations, acronyms) to words for subsequent speech synthesis. | Estonian | Docker image, ELG catalogue | University of Tartu | University of Tartu |
| HTS Speech Synthesiser | HMM-Based Speech Synthesis for Estonian. | Estonian | Docker image, ELG catalogue | Institute of the Estonian Language | University of Tartu |
| Vabamorf generator | Given a word lemma and grammatical categories, Vabamorf generator synthesizes the corresponding wordform. | Estonian | Docker image, ELG catalogue | Filosoft | University of Tartu |
| CG syntax parser | Syntactic analysis of a sentence, by a parser following Constraint Grammar rules. | Estonian | Docker image, ELG catalogue | University of Tartu | University of Tartu |
| spaCy tagger | SpaCy pipeline for morphological and syntactic analysis of Estonian; outputs UD-features as morphological categories. | Estonian | Docker image, ELG catalogue | University of Tartu | University of Tartu |
| Grapheme-to-phoneme engine | Converts graphemes to phonemes. | Estonian | Docker image, ELG catalogue | Tallinn University of Technology | University of Tartu |
| Syllabifier | Syllabifier marks syllables in written Estonian words. | Estonian | Docker image | University of Tartu | University of Tartu |
| Tokenizer | | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| Icenip | | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| Iceparser | | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| NER | | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| POS | | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| ABLTagger | | Faroese | Docker image, ELG catalogue | University of Iceland | Reykjavík University |
| Icesum | | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| Nefnir | | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| GreynirSeq (is -> en) | | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| GreynirSeq (en -> is) | | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| BinPackage | | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| GreynirCorrect | | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| TranslateAlignRetrieve - Spanish QA | Question answering in Spanish. | Spanish | Docker image, ELG catalogue | TALP - Center for Language and Speech Technologies and Applications | Gradiant |
| TWilBert | BERT specialization for the Spanish language and the Twitter domain. | Spanish | Docker image, ELG catalogue | ELiRF - Enginyeria del Llenguatge Natural i Reconeiximent de Formes | Gradiant |
| BETO: Spanish BERT | BERT model trained on Spanish. | Spanish | Docker image, ELG catalogue | ReLeLa - Departamento de Ciencias de la Computación Universidad de Chile | Gradiant |
| LM-SPANISH | Question answering in Spanish. | Spanish | Docker image, ELG catalogue | BSC - Barcelona Supercomputing Center - Text Mining Unit | Gradiant |
| Emoevales-iberlef2021 | Emotion Analysis of Spanish Tweets. | Spanish | Docker image, ELG catalogue | GSI - Grupo de Sistemas Inteligentes (UPM) | Gradiant |
| QAPTNET | BERT model for question-answering tasks, trained in Portuguese. | Portuguese | Docker image, ELG catalogue | Independent Development | Gradiant |
| QueLingua | Language identifier for several languages. | Multilingual | Docker image | CiTIUS - Centro Singular de Investigación en Tecnoloxías Intelixentes | Gradiant |
| BERTimbau | Pre-trained BERT models trained on the Portuguese language. | Portuguese | Docker image, ELG catalogue | NeuralMind Inteligencia Artificial | Gradiant |
| Bertinho | A pre-trained BERT model for Galician. | Galician | Docker image, ELG catalogue | LyS-CITIC - Lengua Y Sociedad de la Información | Gradiant |
| Nlpnet | It performs part-of-speech tagging, semantic role labeling and dependency parsing. Mostly language independent, but some tailored for Portuguese. | Portuguese | Docker image | NILC - Interinstitutional Center for Computational Linguistics (ICMC - University of São Paulo) | Gradiant |
| Julibert | BERT-like models trained in Catalan. | Catalan | Docker image, ELG catalogue | SOFTCATALA | Gradiant |
| FARO | Extracts sensitivity indicators from documents (e.g. document IDs, monetary quantities, personal emails) and gives a sensitivity score to the document. | Spanish | Docker image | Gradiant | Gradiant |
| Teco | Adapts selected Portuguese expressions enhancing relatedness, originality and, possibly, funniness. | Portuguese | Docker image | CISUC - Centre for Informatics and Systems of the University of Coimbra | Gradiant |
| Ixa Pipes | Part of Speech tagging in Portuguese. | Portuguese | Docker image | IXA NLP Group of the University of the Basque Country | Gradiant |
| False Friends | Distinguishing true and false friends between Spanish and Portuguese. | Portuguese | Docker image | Natural Language Processing Group from University of the Republic, Uruguay | Gradiant |
| Berteus | A pre-trained BERT model for Basque. | Basque | Docker image | IXA NLP Group of the University of the Basque Country | Gradiant |

INTRODUCING OUR CONSORTIUM

 

Gradiant

Gradiant is a Spanish ICT technology centre that aims to improve the competitiveness of companies by transferring knowledge and technologies in the fields of connectivity, intelligence and security. With more than 100 professionals and 285 R&D&i projects, it is becoming one of the main engines of innovation in Galicia.

Gradiant is backed by a board that includes representatives of the three Galician universities (Vigo, Santiago and A Coruña), seven companies from the telecommunications industry (Altia, Arteixo Telecom, Egatel, Indra, Plexus, R, Telefónica, Televés) and the INEO business association.

Gradiant is positioned as a technology partner for industry, oriented to its needs in the ICT field. It contributes national and international experience in technologies for security and privacy; processing of multimedia signals; Internet of Things; Natural Language Processing, biometrics and data analytics; and advanced communications systems.

 

Lingsoft

Lingsoft Oy and its sister company Lingsoft Language Services Oy are part of the Lingsoft Group, which had a consolidated turnover of about 12.5 million euros in 2019, making us one of the 100 largest language service providers in the world. Founded in 1986, Lingsoft is a reliable, experienced and innovative partner. Lingsoft offers a wide variety of language technology solutions and language services designed for the analysis, processing and utilization of written and spoken language. Our solutions make text FAIR (Findable, Accessible, Interoperable and Reusable) in the online society. Lingsoft's core technologies and solutions have been tested by tens of millions of users around the world as part of the Microsoft Office suite of proofing tools. Lingsoft is the coordinator of the “Microservices at Your Service” project.

Reykjavik University

Reykjavik University is a dynamic international university with 3800 registered students and 250 permanent faculty and staff. The university focuses on research, excellence in teaching, entrepreneurship, technology development, and co-operation with the business community. The Language and Voice Lab (LVL) was established in 2016 as a part of the research center in Artificial Intelligence with the aim of carrying out research and development in speech and language processing. LVL is part of the Icelandic National Language Technology Programme, a consortium of universities, institutions, associations, and private companies that aims to ensure that Icelandic is supported in modern language technology applications.

University of Tartu

University of Tartu is the leading centre of research and training in Estonia. It preserves the culture of the Estonian people and spearheads the country's reputation in research and provision of higher education. University of Tartu is the leading partner of the Center of Estonian Language Resources (CELR) consortium; the other partners are Tallinn University of Technology, the Institute of the Estonian Language and the Estonian Literary Museum. The goal of CELR is to create and manage an infrastructure that makes Estonian-language digital resources (dictionaries, corpora, various language databases) and language technology tools (software) available to everyone working with digital language materials. The main users of CELR are researchers from Estonian R&D institutions and Social Sciences and Humanities researchers all over the world, who reach it via the CLARIN ERIC network of similar centers in Europe.

The contents of this publication are the sole responsibility of the Microservices project and do not necessarily reflect the opinion of the European Union.