MICROSERVICES AT YOUR SERVICE: BRIDGING THE GAP BETWEEN NLP RESEARCH AND INDUSTRY
Our mission is to simplify finding and using the many superb open source speech and language processing tools that the European Research Community has to offer.
The European Union’s Connecting Europe Facility has given us support to fill the ELRC-SHARE and European Language Grid with such tools! We are taking yet another step towards fulfilling the vision of a European digital single market!
OUR PLAN 2021-2023
- Reach out to the European research community to help us identify suitable open source tools (2021)
- Help researchers package the tools to facilitate reuse by other developers and researchers (2021-2022)
- Make the tools available on the European Union’s own platforms for language technology, ELRC-SHARE and European Language Grid (2022-2023)
Do you know of open source tools that could be of interest to us? Perhaps tools you or your group have developed?
In March 2022 we are organizing workshops on how to add a server API to your tool and package it as an easily distributable Docker image. Come join us!
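As a rough illustration of what adding a server API to a tool can look like, the sketch below wraps a placeholder function in a small HTTP endpoint using only the Python standard library. Everything in it is an assumption made for illustration: the names `identify_language`, `handle_request` and `serve`, the port, and the JSON request and response shapes are hypothetical and are not the API specified by ELG or taught in the workshops. The language check is a toy stand-in for a real tool.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def identify_language(text):
    """Hypothetical stand-in for the wrapped tool (toy heuristic only)."""
    return "fi" if any(ch in "äöÄÖ" for ch in text) else "en"


def handle_request(payload):
    """Map a decoded JSON request to a (status code, response dict) pair.

    The request shape {"type": "text", "content": "..."} is an assumption
    made for this sketch.
    """
    text = payload.get("content")
    if payload.get("type") != "text" or not isinstance(text, str):
        return 400, {"error": "expected a JSON object with type 'text' and string content"}
    return 200, {"language": identify_language(text)}


class ToolHandler(BaseHTTPRequestHandler):
    """Accepts POST requests with a JSON body and answers with JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            # Malformed JSON or undecodable bytes: treat as an empty request.
            payload = {}
        if not isinstance(payload, dict):
            payload = {}
        status, body = handle_request(payload)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the server; the port number is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), ToolHandler).serve_forever()
```

Calling `serve()` starts the server. A script along these lines could then act as the entry point of a Docker image, so that other developers can run the tool without installing its dependencies.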
Would you like to be kept up-to-date with the progress of our project? Join our concluding seminar in February 2023 to see which tools we found and made available!
Read our blog articles on use cases of NLP tools!
Why is a Really Really Good Language Identification Tool Important when Training AI?
Workshops
Recordings of our workshops are available on our workshop page.
We have organized workshops in which we presented our work, the tools, and ways to contribute to the ELG community.
Lingsoft's Tiina Lindh-Knuutila presented the project and some of the tools we have made available at the META-FORUM 2022 conference at the beginning of June.
Lingsoft's Sebastian Andersson presented the Microservices project at the 6th ELRC Conference on March 31, 2022. Watch his presentation here!
Project results
Our project contributes easy-to-use Docker containers and services on the ELG platform. This list is updated as new tools become available.
Tool name | Description | Language | Docker image & ELG catalogue | Original creator | Partner |
HeLI OTS | HeLI-OTS (off-the-shelf) is a language identifier with language models for 200 languages. | Multilingual | Docker image ELG catalogue | University of Helsinki | Lingsoft |
Finto AI | Finto AI suggests topics for a given text. It's based on Annif, a tool for automated subject indexing. | Finnish | Docker image ELG catalogue | National Library of Finland | Lingsoft |
Finto AI | Finto AI suggests topics for a given text. It's based on Annif, a tool for automated subject indexing. | Swedish | Docker image ELG catalogue | National Library of Finland | Lingsoft |
Finto AI | Finto AI suggests topics for a given text. It's based on Annif, a tool for automated subject indexing. | English | Docker image ELG catalogue | National Library of Finland | Lingsoft |
KB BERT NER SV | Swedish Named Entity Recognition tool. | Swedish | Docker image ELG catalogue | National Library of Sweden (KBLab) | Lingsoft |
KB BERT Senti SV | Swedish sentiment classification tool. | Swedish | Docker image ELG catalogue | Martin Malmsten (National Library of Sweden / KBLab) | Lingsoft |
KB BERT NER NO | Norwegian Named Entity Recognition tool. | Norwegian | Docker image ELG catalogue | National Library of Norway (NbAiLab) | Lingsoft |
Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Finnish | Docker image ELG catalogue | Aalto University | Lingsoft |
Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Estonian | Docker image ELG catalogue | Aalto University | Lingsoft |
Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Northern Sami | Docker image ELG catalogue | Aalto University | Lingsoft |
Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | English | Docker image ELG catalogue | Aalto University | Lingsoft |
Aalto-kaldi-align | Aalto-kaldi-align aligns text and the corresponding audio. | Komi | Docker image ELG catalogue | Aalto University | Lingsoft |
MeMAD lidbox | Multilingual spoken language identification tool. | Multilingual | Docker image ELG catalogue | Aalto University | Lingsoft |
Lithuanian spaCy (tagger) | ELG endpoint to Lithuanian spaCy tagger. | Lithuanian | Docker image ELG catalogue | Explosion | Lingsoft |
Lithuanian spaCy (NER) | ELG endpoint to Lithuanian spaCy Named Entity Recognition. | Lithuanian | Docker image ELG catalogue | Explosion | Lingsoft |
FinBERT NER | Finnish Named Entity Recognition. | Finnish | Docker image ELG catalogue | University of Turku | Lingsoft |
Multi-label register classification | Tool to infer the genre (register) of a text (Finnish, Swedish, English, French). | Multilingual | Docker image ELG catalogue | University of Turku | Lingsoft |
Turku neural parser (proxy) | Turku Neural Parser Pipeline. | Finnish | Docker image ELG catalogue | University of Turku | Lingsoft |
Turku neural parser (LT tool hosted by UTU) | The ELG-compatible Docker version of the Turku Neural Parser, hosted by University of Turku and accessed via the proxy. | Finnish | Docker image | University of Turku | Lingsoft |
Morphological Analyzer for Latvian - word analysis | Morphological Analysis tool for Latvian that analyses the word. | Latvian | Docker image ELG catalogue | University of Latvia | Lingsoft |
Morphological Analyzer for Latvian - wordforms | Morphological analysis tool that returns word forms for a given word. | Latvian | Docker image ELG catalogue | University of Latvia | Lingsoft |
LV Tagger | Tagger tool for Latvian. | Latvian | Docker image ELG catalogue | University of Latvia | Lingsoft |
Northern Sami grammar checker | Tool for checking the grammar in Northern Sami. | Northern Sami | Docker image ELG catalogue | UiT The Arctic University of Norway | Lingsoft |
EstNLTK tokenizer | Tokenizer for written Estonian that marks words, punctuation marks, numbers and other tokens found in text. | Estonian | Docker image ELG catalogue | University of Tartu | University of Tartu |
Vabamorf morf | Morphological analyser Vabamorf for Estonian. | Estonian | Docker image ELG catalogue | Filosoft | University of Tartu |
Vabamorf disambiguator | After words have been tagged by the morphological analyser Vabamorf, Vabamorf disambiguator chooses the most plausible analyses for these words, based on their context in the text. | Estonian | Docker image ELG catalogue | Filosoft | University of Tartu |
Est TTS preprocessor | Converts Estonian non-word tokens (numbers, symbols, abbreviations, acronyms) to words for subsequent speech synthesis. | Estonian | Docker image ELG catalogue | University of Tartu | University of Tartu |
HTS Speech Synthesiser | HMM-Based Speech Synthesis for Estonian. | Estonian | Docker image ELG catalogue | Institute of the Estonian Language | University of Tartu |
Vabamorf generator | Given a word lemma and grammatical categories, Vabamorf generator synthesizes the corresponding wordform. | Estonian | Docker image ELG catalogue | Filosoft | University of Tartu |
CG syntax parser | Syntactic analysis of a sentence, by a parser following Constraint Grammar rules. | Estonian | Docker image ELG catalogue | University of Tartu | University of Tartu |
spaCy tagger | SpaCy pipeline for morphological and syntactic analysis of Estonian; outputs UD-features as morphological categories. | Estonian | Docker image ELG catalogue | University of Tartu | University of Tartu |
Grapheme-to-phoneme engine | Converts graphemes to phonemes. | Estonian | Docker image ELG catalogue | Tallinn University of Technology | University of Tartu |
Syllabifier | Syllabifier marks syllables in written Estonian words. | Estonian | Docker image ELG catalogue | University of Tartu | University of Tartu |
Tokenizer | Tokenizer for Icelandic. | Icelandic | Docker image ELG catalogue | Reykjavík University | Reykjavík University |
Icenip | Natural Language Processing Toolkit. | Icelandic | Docker image ELG catalogue | Reykjavík University | Reykjavík University |
Iceparser | Shallow parser for Icelandic. | Icelandic | Docker image ELG catalogue | Reykjavík University | Reykjavík University |
NER | Named Entity Recognizer for Icelandic. | Icelandic | Docker image (old version) Docker image (new version) ELG catalogue (old version) ELG catalogue (new version) | Reykjavík University | Reykjavík University |
POS | Part-of-Speech tagger for Icelandic. | Icelandic | Docker image ELG catalogue | Reykjavík University | Reykjavík University |
ABLTagger | Part-of-Speech tagger for Faroese. | Faroese | Docker image ELG catalogue | University of Iceland | Reykjavík University |
Icesum | Provides an extractive summary of an input text in Icelandic. | Icelandic | Docker image ELG catalogue | Reykjavík University | Reykjavík University |
Nefnir | Lemmatizer for Icelandic. | Icelandic | Docker image ELG catalogue | Reykjavík University | Reykjavík University |
GreynirSeq (is -> en) | Machine translation model for Icelandic to English. | Icelandic | Docker image ELG catalogue | Reykjavík University | Reykjavík University |
GreynirSeq (en -> is) | Machine translation model for English to Icelandic. | Icelandic | Docker image ELG catalogue | Reykjavík University | Reykjavík University |
BinPackage | The Database of Icelandic Morphology encapsulated in a Python package. | Icelandic | Docker image ELG catalogue | Reykjavík University | Reykjavík University |
GreynirCorrect | Spelling and grammar correction tool. | Icelandic | Docker image ELG catalogue | Reykjavík University | Reykjavík University |
TranslateAlignRetrieve - Spanish QA | Question answering in Spanish. | Spanish | Docker image ELG catalogue | TALP - Center for Language and Speech Technologies and Applications | Gradiant |
TWilBert | BERT specialization for the Spanish language and the Twitter domain. | Spanish | Docker image ELG catalogue | ELiRF - Enginyeria del Llenguatge Natural i Reconeiximent de Formes | Gradiant |
BETO: Spanish BERT | BERT model trained on Spanish. | Spanish | Docker image ELG catalogue | ReLeLa - Departamento de Ciencias de la Computación Universidad de Chile | Gradiant |
LM-SPANISH | Question answering in Spanish. | Spanish | Docker image ELG catalogue | BSC - Barcelona Supercomputing Center - Text Mining Unit | Gradiant |
Emoevales-iberlef2021 | Emotion Analysis of Spanish Tweets. | Spanish | Docker image ELG catalogue | GSI - Grupo de Sistemas Inteligentes (UPM) | Gradiant |
QAPTNET | BERT model for question-answering tasks, trained on Portuguese. | Portuguese | Docker image ELG catalogue | Independent Development | Gradiant |
QueLingua | Language identifier for several languages. | Multilingual | Docker image ELG catalogue | CiTIUS - Centro Singular de Investigación en Tecnoloxías Intelixentes | Gradiant |
BERTimbau | Pre-trained BERT models trained on the Portuguese language. | Portuguese | Docker image ELG catalogue | NeuralMind Inteligencia Artificial | Gradiant |
Bertinho | A pre-trained BERT model for Galician. | Galician | Docker image ELG catalogue | LyS-CITIC - Lengua Y Sociedad de la Información | Gradiant |
Nlpnet | It performs part-of-speech tagging, semantic role labeling and dependency parsing. Mostly language independent, but some tailored for Portuguese. | Portuguese | Docker image ELG catalogue | NILC - Interinstitutional Center for Computational Linguistics (ICMC - University of São Paulo) | Gradiant |
Julibert | BERT-like models trained on Catalan. | Catalan | Docker image ELG catalogue | SOFTCATALA | Gradiant |
FARO | Extracts sensitivity indicators from documents (e.g. document IDs, monetary quantities, personal emails) and gives the document a sensitivity score. | Spanish | Docker image ELG catalogue | Gradiant | Gradiant |
Teco | Adapts selected Portuguese expressions enhancing relatedness, originality and, possibly, funniness. | Portuguese | Docker image ELG catalogue | CISUC - Centre for Informatics and Systems of the University of Coimbra | Gradiant |
Ixa Pipes | Part of Speech tagging in Portuguese. | Portuguese | Docker image ELG catalogue | IXA NLP Group of the University of the Basque Country | Gradiant |
False Friends | Distinguishing true and false friends between Spanish and Portuguese. | Portuguese | Docker image ELG catalogue | Natural Language Processing Group from University of the Republic, Uruguay | Gradiant |
Berteus | A pre-trained BERT model for Basque. | Basque | Docker image ELG catalogue | IXA NLP Group of the University of the Basque Country | Gradiant |
berta_qa_catalan | Question answering in Catalan. | Catalan | Docker image ELG catalogue | Gradiant |
INTRODUCING OUR CONSORTIUM
Gradiant
Gradiant is a Spanish ICT technology centre that aims to improve the competitiveness of companies by transferring knowledge and technologies in the fields of connectivity, intelligence and security. With more than 100 professionals and 285 R&D&i projects, it is becoming one of the main engines of innovation in Galicia.
Gradiant is backed by a board that includes representatives of the three Galician universities (Vigo, Santiago and A Coruña) and seven companies from the telecommunications industry: Altia, Arteixo Telecom, Egatel, Indra, Plexus, R, Telefónica, Televés; and INEO business association.
Gradiant is positioned as a technology partner for the industry, oriented to their needs in the ICT field. They are contributing with national and international experience in technologies for security and privacy; processing of multimedia signals; Internet of Things; Natural Language Processing, biometrics and data analytics; and advanced communications systems.
Lingsoft
Lingsoft Oy and its sister company Lingsoft Language Services Oy are part of the Lingsoft Group, which had a consolidated turnover of about 12.5 million euros in 2019, making us one of the 100 largest language service providers in the world. Founded in 1986, Lingsoft is a reliable, experienced and innovative partner. Lingsoft offers a wide variety of language technology solutions and language services designed for the analysis, processing and utilization of written and spoken language. Our solutions make text FAIR (Findable, Accessible, Interoperable and Reusable) in online society. Lingsoft's core technologies and solutions have been tested by tens of millions of users around the world as part of the Microsoft Office suite of proofing tools. Lingsoft is the coordinator of the “Microservices at Your Service” project.
Reykjavik University
Reykjavik University is a dynamic international university with 3800 registered students and 250 permanent faculty and staff. The university focuses on research, excellence in teaching, entrepreneurship, technology development, and co-operation with the business community. The Language and Voice Lab (LVL) was established in 2016 as a part of the research center in Artificial Intelligence with the aim of carrying out research and development in speech and language processing. LVL is part of the Icelandic National Language Technology Programme. This is a consortium of universities, institutions, associations, and private companies with the aim of ensuring that Icelandic can be supported in modern language technology applications.
University of Tartu
University of Tartu is the leading centre of research and training in Estonia. It preserves the culture of the Estonian people and spearheads the country's reputation in research and provision of higher education. University of Tartu is the leading partner of the Center of Estonian Language Resources (CELR) consortium; the other partners are Tallinn University of Technology, the Institute of the Estonian Language and the Estonian Literary Museum. The goal of CELR is to create and manage an infrastructure that makes Estonian-language digital resources (dictionaries, corpora, various language databases) and language technology tools (software) available to everyone working with digital language materials. The main users of CELR are researchers from Estonian R&D institutions and Social Sciences and Humanities researchers all over the world, who reach it via the CLARIN ERIC network of similar centers in Europe.
The contents of this publication are the sole responsibility of the Microservices project and do not necessarily reflect the opinion of the European Union.