Lingsoft® 

FINTWOL - Morphological Analyzer for Finnish

Introduction

FINTWOL is Lingsoft's morphological analyzer component for Finnish. It is based on the two-level model (TWOL) and is available for use with Lingsoft's proprietary LSINDEX application programming interface (API). Lingsoft may also provide support for some other software vendors' APIs with software based on LSINDEX or LSLING; this specification generally applies to such implementations as well.

FINTWOL adheres to the commonly known and accepted spelling norms of standard written Finnish, as presented in the established reference works available at the date of the latest update. However, FINTWOL does not include all norms and rules presented in these references.

Dictionary

FINTWOL is Lingsoft's two-level model of Finnish morphology, originally developed by Kimmo Koskenniemi at the University of Helsinki. The FINTWOL dictionary contains over 55 000 entries (lexemes), covering the central vocabulary of Finnish, including abbreviations, acronyms, proper names and numerals.

The character set used with FINTWOL depends on the API used (Unicode with LSINDEX, ISO-8859-1 with LSLING). The internal character set used by FINTWOL is ISO-8859-1, with special features to accommodate words containing characters outside this character set.
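Because the internal character set is ISO-8859-1, a caller passing Unicode text may want to know in advance which words contain characters outside it. The following is a minimal sketch of such a check, not part of any Lingsoft API; the function names are hypothetical, and how FINTWOL itself handles such characters is not described here.

```python
from typing import List


def fits_latin1(word: str) -> bool:
    """Return True if every character of `word` can be encoded in ISO-8859-1."""
    try:
        word.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


def outside_latin1(word: str) -> List[str]:
    """List the characters of `word` that ISO-8859-1 cannot represent.

    ISO-8859-1 covers exactly the code points U+0000 to U+00FF.
    """
    return [ch for ch in word if ord(ch) > 0xFF]


# Hypothetical usage: flag words that would need special handling.
for w in ["äiti", "šakki"]:
    print(w, fits_latin1(w), outside_latin1(w))
```

The letters ä and ö are within ISO-8859-1, whereas š and ž, which occur in some Finnish loanwords, are not.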

Morphology

The morphology part of the FINTWOL lexicon is a comprehensive model of the inflectional, derivational and compositional morphology of Finnish. The inflectional morphology provides the correct inflections for the words in the dictionary. The derivational and compositional mechanisms allow new words to be formed from the words in the dictionary. These generative mechanisms have been restricted to increase precision, so not all morphologically acceptable compounds or derivatives are recognized. On the other hand, the generative mechanisms are generally not semantically sensitive, so FINTWOL may recognize words that are well formed but seem odd or meaningless in practice.

The dictionary and the morphology together constitute the FINTWOL lexicon, which is included, along with other data, in the FINTWOL lexicon file. The size of the lexicon file is approximately 1.5 MB.

FINTWOL Analyses

FINTWOL is typically used to provide analyses for input words. Each analysis consists of a base form and a list of morphosyntactic features for that particular word form. The base form may contain special boundary characters that mark various types of morpheme boundaries. The morphosyntactic features are encoded as tags. Documentation for the boundary characters and the tags can be found in separate appendixes.
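One way a client application might hold a single analysis is sketched below. This is an illustration only: the class, the `#` boundary character and the tag strings are assumptions made for the example, and the actual boundary characters and tags are those documented in the appendixes.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed boundary character, for illustration only.
COMPOUND_BOUNDARY = "#"


@dataclass(frozen=True)
class Analysis:
    """One analysis of a word form: a base form plus feature tags."""

    base_form: str         # may contain boundary characters
    tags: Tuple[str, ...]  # morphosyntactic feature tags

    def plain_base_form(self) -> str:
        """Base form with the boundary characters removed."""
        return self.base_form.replace(COMPOUND_BOUNDARY, "")

    def parts(self) -> List[str]:
        """Base form split at the boundary characters."""
        return self.base_form.split(COMPOUND_BOUNDARY)


# Placeholder values, not actual FINTWOL output.
example = Analysis(base_form="kirja#kauppa", tags=("N", "NOM", "SG"))
print(example.plain_base_form(), example.parts(), example.tags)
```

Keeping the boundary characters in the stored base form and stripping them on demand lets the same record serve both dictionary lookup and compound splitting.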

Used alone, FINTWOL does not disambiguate: it analyzes each input word in isolation and provides all possible analyses for that word. Disambiguation can be achieved by using FINTWOL together with FINCG.
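The consequence of analyzing words in isolation can be illustrated with a toy lookup table. The sketch below does not call FINTWOL; the lexicon, function names and tag strings are hypothetical placeholders, and the readings listed are not exhaustive. It only shows that an ambiguous word receives the same set of readings whatever its context.

```python
from typing import Dict, List, Tuple

Reading = Tuple[str, str]  # (base form, tag string); tags are placeholders

# Toy stand-in for an analyzer; not actual FINTWOL data.
TOY_LEXICON: Dict[str, List[Reading]] = {
    "kuusi": [("kuusi", "N NOM SG"), ("kuusi", "NUM NOM SG")],
    "kaunis": [("kaunis", "A NOM SG")],
}


def analyze_word(word: str) -> List[Reading]:
    """Return every reading of `word`, ignoring its context."""
    return list(TOY_LEXICON.get(word.lower(), []))


def analyze_text(text: str) -> List[Tuple[str, List[Reading]]]:
    """Analyze each whitespace-separated token independently."""
    return [(token, analyze_word(token)) for token in text.split()]


for token, readings in analyze_text("kaunis kuusi"):
    print(token, readings)
```

Here "kuusi" keeps both its noun reading ('spruce') and its numeral reading ('six') even after the adjective "kaunis"; choosing between them is the job of a separate disambiguation step.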

Performance

On an Intel Xeon @ 3.0 GHz running Linux, FINTWOL can analyze approximately 250 kB (approximately 30 000 words) of typical running text per second. FINTWOL recognizes over 95% of the correctly spelled words in typical running text.
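For rough capacity planning, the figures above can be turned into a back-of-the-envelope estimate. The sketch below assumes 1 kB = 1000 bytes and that the quoted throughput scales linearly; both are assumptions for illustration, and the function names are hypothetical.

```python
# Figures quoted above; kB is assumed to mean 1000 bytes.
BYTES_PER_SECOND = 250_000
WORDS_PER_SECOND = 30_000
MIN_COVERAGE = 0.95  # "over 95%" of correctly spelled words

# Implied average token size, including separators.
bytes_per_word = BYTES_PER_SECOND / WORDS_PER_SECOND


def estimated_seconds(n_words: int) -> float:
    """Estimated analysis time, assuming throughput scales linearly."""
    return n_words / WORDS_PER_SECOND


def max_unrecognized(n_correct_words: int) -> float:
    """Upper bound on correctly spelled words left without an analysis."""
    return n_correct_words * (1 - MIN_COVERAGE)


print(round(bytes_per_word, 2))
print(estimated_seconds(3_000_000))
print(round(max_unrecognized(1_000)))
```

Actual figures depend on the hardware and on the text being analyzed.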

Copyrights for FINTWOL

FINTWOL: Copyright © Lingsoft, Inc. year of latest update.
Two-Level Compiler: Copyright © Xerox Corporation 1994.
All rights reserved.

Appendixes



Lingsoft is a registered trademark and FINTWOL, LSLING, LSINDEX and TWOL are trademarks of Lingsoft, Inc.
Copyright © Lingsoft, Inc. 2006.
All rights reserved. Details subject to change.

 





Copyright ©1986-2010, Lingsoft Ltd.