FINTWOL - Morphological Analyzer for Finnish

Introduction

FINTWOL is Lingsoft's morphological analyzer component for Finnish. It is based on the two-level model (TWOL). It is available for use with Lingsoft's proprietary LSINDEX application programming interface (API). Lingsoft may also provide support for some other software vendors' APIs with software based on LSINDEX or LSLING; this specification generally applies to such implementations as well.

FINTWOL adheres to the commonly known and accepted spelling norms of standard written Finnish, as presented in established reference works available at the date of the latest update. However, FINTWOL does not include all norms and rules presented in these references.

Dictionary

FINTWOL is Lingsoft's two-level model of Finnish morphology, originally developed by Kimmo Koskenniemi at the University of Helsinki. The FINTWOL dictionary contains over 55 000 entries (lexemes), covering the central vocabulary of Finnish, including abbreviations, acronyms, proper names and numerals.

The character set used with FINTWOL depends on the API used (Unicode with LSINDEX, ISO-8859-1 with LSLING). The internal character set used by FINTWOL is ISO-8859-1, with special features to accommodate words containing characters outside this character set.

Morphology

The morphology part of the FINTWOL lexicon is a comprehensive model of the inflectional, derivational and compositional morphology of Finnish. The inflectional morphology provides the correct inflections for the words in the dictionary. The derivational and compositional mechanisms allow new words to be formed from the words in the dictionary. The generative mechanisms have been restricted to increase precision, meaning that not all morphologically acceptable compound or derivative words are recognized.
On the other hand, the generative mechanisms are generally not semantically sensitive, meaning that some words may be recognized that seem odd or meaningless in practice.

The dictionary and the morphology together constitute the FINTWOL lexicon, which, along with other data, is included in the FINTWOL lexicon file. The size of the lexicon file is approximately 1.5 MB.

FINTWOL Analyses

FINTWOL is typically used to provide, for each input word, an analysis consisting of a base form and a list of morphosyntactic features for that particular form. The base form may contain special boundary characters for marking various types of morpheme boundaries. The morphosyntactic features are encoded with tags. Documentation for the boundary characters and the tags can be found in separate appendixes.

Used alone, FINTWOL does not disambiguate; that is, it analyzes each input word in isolation and provides all possible analyses for the word in question. Disambiguation can be achieved by using FINTWOL together with FINCG.

Performance

On an Intel Xeon @ 3.0 GHz running Linux, FINTWOL can analyze approximately 250 kB (approximately 30 000 words) of typical running text per second. FINTWOL recognizes over 95% of the correctly spelled words in typical running text.

Copyrights for FINTWOL

FINTWOL: Copyright © Lingsoft, Inc. year of latest update.
Two-Level Compiler: Copyright © Xerox Corporation 1994. All rights reserved.

Appendixes

Lingsoft is a registered trademark and FINTWOL, LSLING, LSINDEX and TWOL are trademarks of Lingsoft, Inc. Copyright © Lingsoft, Inc. 2006. All rights reserved. Details subject to change.
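
The character set note in the Dictionary section can be made concrete with a small sketch. The Python snippet below only checks which characters of a word cannot be encoded in ISO-8859-1; it does not show how FINTWOL itself handles such characters, which this specification does not describe. The function name `chars_outside_latin1` and the sample words are illustrative assumptions, not part of any Lingsoft API.

```python
def chars_outside_latin1(word: str) -> list[str]:
    """Return the characters of `word` that ISO-8859-1 cannot encode.

    Hypothetical helper for illustration only; not a FINTWOL function.
    """
    outside = []
    for ch in word:
        try:
            ch.encode("iso-8859-1")
        except UnicodeEncodeError:
            outside.append(ch)
    return outside


# Sample Finnish words: ä and ö are in ISO-8859-1, š and ž are not.
for word in ["äiti", "šakki", "džonkki"]:
    print(word, chars_outside_latin1(word))
```

A caller passing Unicode text to an interface that uses ISO-8859-1 could run a check like this first to find the words that depend on the special handling mentioned above.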