Why choose Lingsoft's morphological analyzers
Have you noticed how incomplete your search application is when dealing with inflecting and compounding languages? Would your text mining application benefit from knowing the base form of each inflected and compound word in your corpus, or from being able to isolate only nouns, only proper names, or only verbs in the past tense? Are you planning an ambitious service that needs rich information about the grammatical features of your textual content? Could Lingsoft's morphological analyzer be the solution for your needs? Yes, it could.
Lingsoft's analyzers provide base forms and grammatical features for requested words. The base forms may contain boundary characters for marking various types of morpheme boundaries. The morphosyntactic grammatical features are encoded with tags. The boundary characters and the tags are language-specific, reflecting the grammatical diversity of languages. The well-documented tags can easily be used for filtering, labeling and organizing the output in a desired manner.
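As a rough illustration of how tagged output can be filtered and organized, here is a minimal Python sketch. The analysis format below (a base form plus a list of tags such as "N" for noun or "PAST" for past tense) is a hypothetical simplification for the example, not Lingsoft's actual language-specific tag set or output format.

```python
# Each analysis: (surface word, base form, tags).
# The words and tags are invented illustrations.
analyses = [
    ("cats", "cat", ["N", "PL"]),
    ("ran", "run", ["V", "PAST"]),
    ("Helsinki", "Helsinki", ["N", "PROP", "SG"]),
    ("quickly", "quickly", ["ADV"]),
]

def filter_by_tags(analyses, required):
    """Keep analyses whose tag set contains all required tags."""
    return [a for a in analyses if set(required) <= set(a[2])]

nouns = filter_by_tags(analyses, ["N"])            # "cats", "Helsinki"
past_verbs = filter_by_tags(analyses, ["V", "PAST"])  # "ran"
proper_names = filter_by_tags(analyses, ["N", "PROP"])  # "Helsinki"
```

The same pattern extends to labeling or grouping: because the tags are documented and consistent, downstream code can select exactly the word classes or grammatical features it needs.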
The analyzer treats each input word in isolation and provides all possible analyses for that word. Removing syntactically inconsistent analyses, called disambiguation, can be achieved by applying Lingsoft's corresponding-language constraint-grammar parser to the tagged output of the analyzer.
Based on comprehensive language models

The analyzers are based on Lingsoft's proprietary language models, which are core sources of linguistic intelligence for Lingsoft's spelling and grammar checkers, thesauri and hyphenators. The continuous maintenance of the models and their lexical content benefits all of these application areas.
Each language model contains a number of lexical entries (lexemes) covering the central vocabulary of the language, including abbreviations, acronyms, proper names and numerals. The lexemes adhere to commonly known and accepted spelling norms, using the best resources and references available.
By adding domain-specific lexemes to the language model, we can build special-purpose analyzers, spelling checkers, and more. A good example is our research-driven development of a Finnish medical text analyzer, from which we produced a special-purpose spelling checker for the healthcare and clinical domains.
The inflectional, derivational and compositional features of the language are coded in the model, and two-level rules handle the morphophonological alternations. The inflectional mechanism provides correct inflections for words known to the lexicon. The derivational and compositional mechanisms allow new words to be recognized based on words the lexicon already knows. These generative mechanisms have been deliberately restricted to increase precision, meaning that not all theoretically acceptable compounds or derivatives are recognized.
Compact, fast and easy to integrate

The lexical content and the rules of the language model are compiled into a compact and fast finite-state transducer, which, along with additional data, is packaged in a language-specific binary file of only a few megabytes, yet produces thousands or even tens of thousands of analyses per second. An analyzer typically recognizes more than 95% of the correctly spelled words in typical running text.
Lingsoft's analyzers fit almost any integration scenario through the LSINDEX programming library, available for Windows, Linux, Mac and Java. Several language modules can be connected to a single instance of LSINDEX, and LSINDEX uses Unicode as its character set.
Copyright ©1986-2018, Lingsoft Ltd.