Lingsoft's Spelling Checker Component for Swahili
Lingsoft® SWASPELL is Lingsoft's high-quality spelling checker component for Swahili, designed for checking basic spelling errors in standard written Swahili. It adheres to the commonly known and accepted spelling norms presented in established reference works.
Lingsoft endeavors to keep the subtle balance between recall (the rate of correctly spelled words recognized) and precision (the rate of errors detected) by performing rigorous regression testing whenever changes are made to the language model. Particular care has been taken to avoid masking, where a frequent spelling error goes undetected because a rare valid word happens to be spelled exactly like the misspelling.
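The trade-off described above can be made concrete with a small sketch. It is an illustration only, not Lingsoft's test suite: the function `evaluate`, the word lists and both lexicons are assumptions, and "kitbu" is treated as a rare valid word purely to show the effect of masking.

```python
def evaluate(accepts, correct_words, misspellings):
    """Return (recall, precision) using the definitions given in the text.

    recall    = share of correctly spelled words that the checker accepts
    precision = share of misspellings that the checker flags
    """
    recall = sum(1 for w in correct_words if accepts(w)) / len(correct_words)
    precision = sum(1 for w in misspellings if not accepts(w)) / len(misspellings)
    return recall, precision


# Hypothetical regression data.
correct_words = ["habari", "kitabu", "vitabu"]
misspellings = ["habri", "kitbu"]

small_lexicon = {"habari", "kitabu"}
# Pretend "kitbu" is a rare valid word (made up for illustration only).
masking_lexicon = small_lexicon | {"vitabu", "kitbu"}

before = evaluate(small_lexicon.__contains__, correct_words, misspellings)
after = evaluate(masking_lexicon.__contains__, correct_words, misspellings)
```

In this toy setup, enlarging the lexicon raises recall, but the added rare word hides a frequent misspelling and precision drops. Running such a comparison after every model change is one way regression testing can catch masking.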
Based on Lingsoft's Model of Swahili
SWASPELL uses Lingsoft's comprehensive two-level model of Swahili morphology to recognize inflected, derivative and compound word forms, and to generate correction suggestions. The model contains more than 25,000 lexical entries, covering the central vocabulary of Swahili, including abbreviations, acronyms, proper names and numerals. Two-level rules take care of word transformation issues like "kitabu, vitabu" (book, books).
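As a rough illustration of the "kitabu, vitabu" pair, the sketch below relates surface forms to a stem plus a number feature. It is a drastically simplified dictionary lookup, not a two-level rule system and not Lingsoft's model: the names `KI_VI_STEMS`, `PREFIXES`, `analyze` and `generate` are hypothetical, and only the one stem from the text is listed.

```python
# Hypothetical mini-lexicon of stems that take the ki-/vi- prefix pair.
KI_VI_STEMS = {"tabu": "book"}

# Prefix on the surface form -> grammatical number.
PREFIXES = {"ki": "singular", "vi": "plural"}


def analyze(surface):
    """Map a surface form to (stem, number, gloss), or None if unknown."""
    prefix, stem = surface[:2], surface[2:]
    if prefix in PREFIXES and stem in KI_VI_STEMS:
        return (stem, PREFIXES[prefix], KI_VI_STEMS[stem])
    return None


def generate(stem, number):
    """Map (stem, number) back to a surface form, or None if unknown."""
    if stem not in KI_VI_STEMS:
        return None
    for prefix, prefix_number in PREFIXES.items():
        if prefix_number == number:
            return prefix + stem
    return None
```

With this toy lexicon, `analyze("vitabu")` yields the stem "tabu" marked as plural, and `generate("tabu", "plural")` produces "vitabu" again. A real two-level model expresses such correspondences as rules compiled into a finite-state transducer that runs in both directions.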
The inflectional mechanism recognizes all the morphologically correct inflected word forms. The derivational and compositional mechanisms allow new words to be formed from words known to the model. The generative mechanisms have been restricted to increase precision, meaning that not all morphologically acceptable compound or derivative words are recognized. Given the productive compounding in Swahili, the number of recognized words can be measured in millions.
The lexical content and two-level rules of the language model are compiled into a fast and compact finite-state transducer, which, along with the program code and other data, is included in a binary file of only about 300 kB.
A Suggestion Mechanism that Works
SWASPELL attempts to suggest corrections for words it doesn't recognize as correctly spelled. The basic suggestion mechanism suggests all recognized words at an edit distance of one (one-letter addition, deletion or transposition, except for the first letter of the word). More wide-ranging and more specific suggestions are given for common spelling errors. Some particularly common spelling errors receive only the typically appropriate correction(s).
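The basic mechanism can be sketched as follows. This is a minimal illustration under stated assumptions, not SWASPELL's implementation: the functions `edit_distance_one` and `suggest` and the tiny lexicon are hypothetical, the alphabet is plain lowercase ASCII, and "except for the first letter" is read as "the first letter is never added, deleted or moved".

```python
import string


def edit_distance_one(word, alphabet=string.ascii_lowercase):
    """Return all strings one addition, deletion or transposition away
    from `word`, leaving the first letter untouched."""
    candidates = set()
    # Addition: insert a letter at any position after the first letter.
    for i in range(1, len(word) + 1):
        for ch in alphabet:
            candidates.add(word[:i] + ch + word[i:])
    # Deletion: drop any letter except the first.
    for i in range(1, len(word)):
        candidates.add(word[:i] + word[i + 1:])
    # Transposition: swap adjacent letters, never involving the first.
    for i in range(1, len(word) - 1):
        candidates.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    candidates.discard(word)
    return candidates


def suggest(word, lexicon):
    """Return the recognized words at edit distance one, sorted."""
    return sorted(edit_distance_one(word) & lexicon)


# Hypothetical lexicon standing in for the language model.
lexicon = {"kitabu", "vitabu", "habari"}
```

With this lexicon, `suggest("kitbu", lexicon)`, `suggest("kiatbu", lexicon)` and `suggest("kitaabu", lexicon)` each return only "kitabu", while `suggest("itabu", lexicon)` returns nothing because repairing it would require changing the first letter. Intersecting the candidate set with the lexicon keeps the sketch short; a production checker would more likely test candidates against the transducer directly.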
SWASPELL generally avoids suggesting words that may seem awkward or incomprehensible to the user. In particular, generated compounds and derivatives are only suggested based on segment-specific correction rules. SWASPELL also endeavors not to suggest words that may seem offensive to the user. If no suitable suggestions are found, none are given.
Software Integration Made Easy
SWASPELL can be integrated into almost any software application, including web-based services, to provide spell checking, using Lingsoft's proprietary LSPROOF-API application programming interface for Windows, Linux, Mac and Java. The character set used with LSPROOF is Unicode.
Lingsoft® SWASPELL: Copyright © Lingsoft, Inc. 1986-2010. Two-Level Compiler: Copyright © Xerox Corporation 1994. Lingsoft is a registered trademark and SWASPELL and LSPROOF are trademarks of Lingsoft, Inc. All rights reserved. Details subject to change.
Copyright ©1986-2016, Lingsoft Ltd.