Microservices at Your Service: Bridging the Gap Between NLP Research and Industry

Our mission is to make it easier to find and use the many excellent open-source speech and language processing tools that the European research community has to offer.

The European Union’s Connecting Europe Facility supports our work to fill ELRC-SHARE and the European Language Grid with such tools. This is yet another step towards the vision of a European digital single market!

Gradiant

Lingsoft

 

Reykjavík University

University of Tartu

Our plan 2021-2023

  1. Reach out to the European research community to help us identify suitable open source tools (2021)
     
  2. Help researchers package their tools to facilitate reuse by other developers and researchers (2021-2022)
     
  3. Make the tools available on the European Union’s own language technology platforms, ELRC-SHARE and the European Language Grid (2022-2023)
     

Do you know of open source tools that could be of interest to us? Perhaps tools you or your group have developed?

In March 2022, we are organizing workshops on how to add a server API to your tool and package it as an easily distributable Docker image. Come join us!
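As a rough illustration of what "adding a server API" can involve, here is a minimal sketch using only Python's standard library. It wraps a toy regex tokenizer (a hypothetical stand-in for a real research tool) in a small HTTP service. The JSON message shapes (a `"type": "text"` request answered by an `"annotations"` response or a `"failure"` object) reflect our understanding of the ELG internal LT API and should be checked against the official ELG documentation; the names `tokenize`, `handle_request` and `ELGHandler`, the `Token` annotation type and port 8000 are our own illustrative assumptions.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical toy "tool": splits text into word and punctuation tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return (start, end) character offsets for each token in text."""
    return [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def failure(code, text):
    """Build an error message in the (assumed) ELG failure format."""
    return {"failure": {"errors": [{"code": code, "text": text}]}}


def handle_request(payload):
    """Turn an ELG-style text request into an ELG-style annotations response.

    Message shapes are an assumption based on the ELG internal LT API.
    """
    if not isinstance(payload, dict) or payload.get("type") != "text":
        return failure("elg.request.type.unsupported", "Only text requests are supported")
    content = payload.get("content")
    if not isinstance(content, str):
        return failure("elg.request.invalid", "Missing text content")
    tokens = [
        {"start": start, "end": end, "features": {"token": content[start:end]}}
        for start, end in tokenize(content)
    ]
    return {"response": {"type": "annotations", "annotations": {"Token": tokens}}}


class ELGHandler(BaseHTTPRequestHandler):
    """Accepts a JSON POST body on any path and replies with JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:  # covers malformed JSON and undecodable bytes
            payload = None
        body = json.dumps(handle_request(payload)).encode("utf-8")
        # Success or failure is carried in the JSON body; we always reply 200 here.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Inside a Docker image this would typically be the container's entry point.
    HTTPServer(("0.0.0.0", 8000), ELGHandler).serve_forever()
```

For example, posting `{"type": "text", "content": "Hello, world"}` returns three `Token` annotations covering "Hello", "," and "world". Keeping `handle_request` free of HTTP details makes the tool logic easy to test separately from the server.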

Would you like to stay up to date with our project's progress? Join our concluding seminar in February 2023 to see which tools we found and made available!

Workshops

Our final workshops are coming! See details on our workshop page. 

In our workshops, we present our work and the tools, and show how you can contribute to the ELG community.

Recordings of our previous workshops can be found on our workshop site.


Lingsoft's Tiina Lindh-Knuutila presented the project and some of the tools we have made available at the META-FORUM 2022 conference in early June.

Lingsoft's Sebastian Andersson presented the Microservices project at the 6th ELRC Conference on March 31, 2022. Watch his presentation here!

Project results

Our project contributes easy-to-use Docker containers and services to the ELG platform. This list is updated as new tools become available.

| Tool name | Description | Language | Docker image & ELG catalogue | Original creator | Partner |
|---|---|---|---|---|---|
| HeLI-OTS | HeLI-OTS (off-the-shelf) is a language identifier with language models for 200 languages. | Multilingual | Docker image, ELG catalogue | University of Helsinki | Lingsoft |
| Finto AI | Finto AI suggests topics for a given text. It is based on Annif, a tool for automated subject indexing. | Finnish | Docker image, ELG catalogue | National Library of Finland | Lingsoft |
| Finto AI | Finto AI suggests topics for a given text. It is based on Annif, a tool for automated subject indexing. | Swedish | Docker image, ELG catalogue | National Library of Finland | Lingsoft |
| Finto AI | Finto AI suggests topics for a given text. It is based on Annif, a tool for automated subject indexing. | English | Docker image, ELG catalogue | National Library of Finland | Lingsoft |
| KB BERT NER SV | Swedish named entity recognition tool. | Swedish | Docker image, ELG catalogue | National Library of Sweden (KBLab) | Lingsoft |
| KB BERT Senti SV | Swedish sentiment classification tool. | Swedish | Docker image, ELG catalogue | Martin Malmsten (National Library of Sweden / KBLab) | Lingsoft |
| KB BERT NER NO | Norwegian named entity recognition tool. | Norwegian | Docker image, ELG catalogue | National Library of Norway (NbAiLab) | Lingsoft |
| Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Finnish | Docker image, ELG catalogue | Aalto University | Lingsoft |
| Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Estonian | Docker image, ELG catalogue | Aalto University | Lingsoft |
| Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Northern Sami | Docker image, ELG catalogue | Aalto University | Lingsoft |
| Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | English | Docker image, ELG catalogue | Aalto University | Lingsoft |
| Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Komi | Docker image, ELG catalogue | Aalto University | Lingsoft |
| MeMAD lidbox | Multilingual spoken language identification tool. | Multilingual | Docker image, ELG catalogue | Aalto University | Lingsoft |
| Lithuanian spaCy (tagger) | ELG endpoint for the Lithuanian spaCy tagger. | Lithuanian | Docker image, ELG catalogue | Explosion | Lingsoft |
| Lithuanian spaCy (NER) | ELG endpoint for Lithuanian spaCy named entity recognition. | Lithuanian | Docker image, ELG catalogue | Explosion | Lingsoft |
| FinBERT NER | Finnish named entity recognition. | Finnish | Docker image, ELG catalogue | University of Turku | Lingsoft |
| Multi-label register classification | Tool to infer the genre (register) of a text (Finnish, Swedish, English, French). | Multilingual | Docker image, ELG catalogue | University of Turku | Lingsoft |
| Turku neural parser (proxy) | Turku Neural Parser Pipeline. | Finnish | Docker image, ELG catalogue | University of Turku | Lingsoft |
| Turku neural parser (LT tool hosted by UTU) | The ELG-compatible Docker version of the Turku Neural Parser, hosted by the University of Turku and accessed via the proxy. | Finnish | Docker image | University of Turku | Lingsoft |
| Morphological Analyzer for Latvian - word analysis | Morphological analysis tool for Latvian that analyses a given word. | Latvian | Docker image, ELG catalogue | University of Latvia | Lingsoft |
| Morphological Analyzer for Latvian - wordforms | Morphological analysis tool that returns the word forms of a given word. | Latvian | Docker image, ELG catalogue | University of Latvia | Lingsoft |
| LV Tagger | Tagger for Latvian. | Latvian | Docker image, ELG catalogue | University of Latvia | Lingsoft |
| Northern Sami grammar checker | Grammar checker for Northern Sami. | Northern Sami | Docker image, ELG catalogue | UiT The Arctic University of Norway | Lingsoft |
| EstNLTK tokenizer | Tokenizer for written Estonian that marks words, punctuation marks, numbers and other tokens found in text. | Estonian | Docker image, ELG catalogue | University of Tartu | University of Tartu |
| Vabamorf morf | Vabamorf morphological analyser for Estonian. | Estonian | Docker image, ELG catalogue | Filosoft | University of Tartu |
| Vabamorf disambiguator | After words have been tagged by the Vabamorf morphological analyser, the Vabamorf disambiguator chooses the most plausible analyses for these words based on their context in the text. | Estonian | Docker image, ELG catalogue | Filosoft | University of Tartu |
| Est TTS preprocessor | Converts Estonian non-word tokens (numbers, symbols, abbreviations, acronyms) to words for subsequent speech synthesis. | Estonian | Docker image, ELG catalogue | University of Tartu | University of Tartu |
| HTS Speech Synthesiser | HMM-based speech synthesis for Estonian. | Estonian | Docker image, ELG catalogue | Institute of the Estonian Language | University of Tartu |
| Vabamorf generator | Given a word lemma and grammatical categories, the Vabamorf generator synthesizes the corresponding word form. | Estonian | Docker image, ELG catalogue | Filosoft | University of Tartu |
| CG syntax parser | Syntactic analysis of a sentence by a parser following Constraint Grammar rules. | Estonian | Docker image, ELG catalogue | University of Tartu | University of Tartu |
| spaCy tagger | spaCy pipeline for morphological and syntactic analysis of Estonian; outputs UD features as morphological categories. | Estonian | Docker image, ELG catalogue | University of Tartu | University of Tartu |
| Grapheme-to-phoneme engine | Converts graphemes to phonemes. | Estonian | Docker image, ELG catalogue | Tallinn University of Technology | University of Tartu |
| Syllabifier | Marks syllables in written Estonian words. | Estonian | Docker image, ELG catalogue | University of Tartu | University of Tartu |
| Tokenizer | Tokenizer for Icelandic. | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| Icenip | Natural language processing toolkit. | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| Iceparser | Shallow parser for Icelandic. | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| NER | Named entity recognizer for Icelandic. | Icelandic | Docker image (old and new versions), ELG catalogue (old and new versions) | Reykjavík University | Reykjavík University |
| POS | Part-of-speech tagger for Icelandic. | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| ABLTagger | Part-of-speech tagger for Faroese. | Faroese | Docker image, ELG catalogue | University of Iceland | Reykjavík University |
| Icesum | Provides an extractive summary of an Icelandic input text. | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| Nefnir | Lemmatizer for Icelandic. | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| GreynirSeq (is -> en) | Machine translation model from Icelandic to English. | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| GreynirSeq (en -> is) | Machine translation model from English to Icelandic. | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| BinPackage | The Database of Icelandic Morphology encapsulated in a Python package. | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| GreynirCorrect | Spelling and grammar correction tool. | Icelandic | Docker image, ELG catalogue | Reykjavík University | Reykjavík University |
| TranslateAlignRetrieve - Spanish QA | Question answering in Spanish. | Spanish | Docker image, ELG catalogue | TALP - Center for Language and Speech Technologies and Applications | Gradiant |
| TWilBert | BERT specialization for the Spanish language and the Twitter domain. | Spanish | Docker image, ELG catalogue | ELiRF - Enginyeria del Llenguatge Natural i Reconeixement de Formes | Gradiant |
| BETO: Spanish BERT | BERT model trained on Spanish. | Spanish | Docker image, ELG catalogue | ReLeLa - Departamento de Ciencias de la Computación, Universidad de Chile | Gradiant |
| LM-SPANISH | Question answering in Spanish. | Spanish | Docker image, ELG catalogue | BSC - Barcelona Supercomputing Center - Text Mining Unit | Gradiant |
| Emoevales-iberlef2021 | Emotion analysis of Spanish tweets. | Spanish | Docker image, ELG catalogue | GSI - Grupo de Sistemas Inteligentes (UPM) | Gradiant |
| QAPTNET | BERT model for question-answering tasks, trained on Portuguese. | Portuguese | Docker image, ELG catalogue | Independent development | Gradiant |
| QueLingua | Language identifier for several languages. | Multilingual | Docker image, ELG catalogue | CiTIUS - Centro Singular de Investigación en Tecnoloxías Intelixentes | Gradiant |
| BERTimbau | Pre-trained BERT models for Portuguese. | Portuguese | Docker image, ELG catalogue | NeuralMind Inteligencia Artificial | Gradiant |
| Bertinho | A pre-trained BERT model for Galician. | Galician | Docker image, ELG catalogue | LyS-CITIC - Lengua y Sociedad de la Información | Gradiant |
| Nlpnet | Part-of-speech tagging, semantic role labeling and dependency parsing; mostly language-independent, but partly tailored to Portuguese. | Portuguese | Docker image, ELG catalogue | NILC - Interinstitutional Center for Computational Linguistics (ICMC - University of São Paulo) | Gradiant |
| Julibert | BERT-like models trained on Catalan. | Catalan | Docker image, ELG catalogue | SOFTCATALA | Gradiant |
| FARO | Extracts sensitivity indicators from documents (e.g. document IDs, monetary quantities, personal emails) and assigns the document a sensitivity score. | Spanish | Docker image, ELG catalogue | Gradiant | Gradiant |
| Teco | Adapts selected Portuguese expressions, enhancing relatedness, originality and, possibly, funniness. | Portuguese | Docker image, ELG catalogue | CISUC - Centre for Informatics and Systems of the University of Coimbra | Gradiant |
| Ixa Pipes | Part-of-speech tagging in Portuguese. | Portuguese | Docker image, ELG catalogue | IXA NLP Group of the University of the Basque Country | Gradiant |
| False Friends | Distinguishes true and false friends between Spanish and Portuguese. | Portuguese | Docker image, ELG catalogue | Natural Language Processing Group, University of the Republic, Uruguay | Gradiant |
| Berteus | A pre-trained BERT model for Basque. | Basque | Docker image, ELG catalogue | IXA NLP Group of the University of the Basque Country | Gradiant |
| berta_qa_catalan | Question answering in Catalan. | Catalan | Docker image, ELG catalogue | | Gradiant |

Introducing our Consortium

Gradiant

Gradiant is a Spanish ICT technology centre that aims to improve the competitiveness of companies by transferring knowledge and technologies in the fields of connectivity, intelligence and security. With more than 100 professionals and 285 R&D&i projects, it is becoming one of the main engines of innovation in Galicia.

Gradiant is backed by a board that includes representatives of the three Galician universities (Vigo, Santiago and A Coruña), seven companies from the telecommunications industry (Altia, Arteixo Telecom, Egatel, Indra, Plexus, R, Telefónica, Televés) and the INEO business association.

Gradiant positions itself as a technology partner for industry, focused on companies' needs in the ICT field. It contributes national and international experience in security and privacy technologies; multimedia signal processing; the Internet of Things; natural language processing, biometrics and data analytics; and advanced communication systems.

Lingsoft

Lingsoft Oy and its sister company Lingsoft Language Services Oy are part of the Lingsoft Group, with a consolidated turnover of about 12.5 million euros in 2019, making us one of the 100 largest language service providers in the world. Founded in 1986, Lingsoft is a reliable, experienced and innovative partner. Lingsoft offers a wide variety of language technology solutions and language services designed for the analysis, processing and utilization of written and spoken language. Our solutions help make text FAIR (Findable, Accessible, Interoperable and Reusable) in the online society. Lingsoft's core technologies and solutions have been used by tens of millions of users around the world as part of the Microsoft Office suite of proofing tools. Lingsoft is the coordinator of the “Microservices at Your Service” project.

Reykjavik University

Reykjavik University is a dynamic international university with 3,800 registered students and 250 permanent faculty and staff. The university focuses on research, excellence in teaching, entrepreneurship, technology development, and cooperation with the business community. The Language and Voice Lab (LVL) was established in 2016 as part of the research center in Artificial Intelligence, with the aim of carrying out research and development in speech and language processing. LVL is part of the Icelandic National Language Technology Programme, a consortium of universities, institutions, associations and private companies that aims to ensure Icelandic is supported in modern language technology applications.

University of Tartu

University of Tartu is the leading centre of research and training in Estonia. It preserves the culture of the Estonian people and spearheads the country's reputation in research and higher education. University of Tartu is the leading partner of the Center of Estonian Language Resources (CELR) consortium; the other partners are Tallinn University of Technology, the Institute of the Estonian Language and the Estonian Literary Museum. The goal of CELR is to create and manage an infrastructure that makes Estonian digital language resources (dictionaries, corpora, various language databases) and language technology tools (software) available to everyone working with digital language materials. The main users of CELR are researchers from Estonian R&D institutions and, via the CLARIN ERIC network of similar centers in Europe, Social Sciences and Humanities researchers all over the world.
 

The contents of this publication are the sole responsibility of the Microservices project and do not necessarily reflect the opinion of the European Union.