Lingsoft's Spelling Checker Component for Danish
Lingsoft® DANSPELL is Lingsoft's high-quality spelling checker component for Danish, designed to detect basic spelling errors in standard written Danish. It follows the commonly accepted spelling norms set out in established reference works.
Lingsoft strives to maintain the delicate balance between recall (the proportion of correctly spelled words that are recognized) and precision (the proportion of spelling errors that are detected) by running rigorous regression tests whenever the language model is changed. Particular care has been taken to avoid masking, where a frequent misspelling goes undetected because it happens to be spelled exactly like a rare but valid word.
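To make the two rates and the masking effect concrete, here is a minimal sketch, assuming a checker is simply a function that accepts or rejects a word. The function name `evaluate`, the toy lexicon and the string "hsu" (standing in for a hypothetical rare word that masks a misspelling of "hus") are illustrative assumptions, not part of DANSPELL.

```python
from typing import Callable, Iterable


def evaluate(accepts: Callable[[str], bool],
             correct_words: Iterable[str],
             misspellings: Iterable[str]) -> tuple[float, float]:
    """Return (recall, precision) in the sense used above.

    recall:    share of correctly spelled words the checker accepts
    precision: share of misspellings the checker flags (rejects)
    """
    correct_words = list(correct_words)
    misspellings = list(misspellings)
    recall = sum(accepts(w) for w in correct_words) / len(correct_words)
    precision = sum(not accepts(w) for w in misspellings) / len(misspellings)
    return recall, precision


# Toy data; "hsu" is a hypothetical rare entry that masks a misspelling of "hus".
lexicon = {"bog", "bøger", "hus"}
masked_lexicon = lexicon | {"hsu"}
correct = ["bog", "bøger", "hus"]
errors = ["bgo", "hsu"]

print(evaluate(lexicon.__contains__, correct, errors))         # (1.0, 1.0)
print(evaluate(masked_lexicon.__contains__, correct, errors))  # (1.0, 0.5)
```

Adding the masking entry leaves recall unchanged but halves the share of detected errors, which is why a rare word that collides with a frequent misspelling can be worth leaving out of the lexicon.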
Based on Lingsoft's Model of Danish
DANSPELL uses Lingsoft's comprehensive two-level model of Danish morphology to recognize inflected, derived and compound word forms and to generate correction suggestions. The model contains more than 100 000 lexical entries covering the core vocabulary of Danish, including abbreviations, acronyms, proper names and numerals. Two-level rules handle stem alternations such as "bog, bøger" (book, books).
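The idea behind a two-level rule can be illustrated with a deliberately simplified sketch: a lexical form and a surface form are aligned symbol by symbol, and a rule constrains which pairs may occur in which context. The archiphoneme "O", the feasible-pair set and the single rule below are hypothetical and only model the "bog, bøger" alternation; they are not Lingsoft's actual rules or notation.

```python
# Hypothetical toy alphabet; a real model covers far more symbols.
LETTERS = "abcdefghijklmnopqrstuvwxyzæøå"

# Feasible lexical:surface pairs. "O" is a hypothetical archiphoneme that may
# surface as "o" or "ø"; the morpheme boundary "+" surfaces as "0" (empty).
FEASIBLE = {(c, c) for c in LETTERS} | {("O", "o"), ("O", "ø"), ("+", "0")}


def accepts(lexical: str, surface: str) -> bool:
    """Check an aligned lexical/surface pair against one toy two-level rule:
    O:ø <=> a plural boundary "+er" follows later in the lexical string."""
    if len(lexical) != len(surface):
        return False
    for i, (lex, surf) in enumerate(zip(lexical, surface)):
        if (lex, surf) not in FEASIBLE:
            return False
        if lex == "O":
            expected = "ø" if "+er" in lexical[i:] else "o"
            if surf != expected:
                return False
    return True


def surface_form(aligned: str) -> str:
    """Drop the empty symbol "0" to get the written word."""
    return aligned.replace("0", "")


print(accepts("bOg", "bog"), accepts("bOg+er", "bøg0er"))  # True True
print(accepts("bOg+er", "bog0er"))                         # False
print(surface_form("bøg0er"))                              # bøger
```

Because the rule is stated as a constraint on symbol pairs rather than as a sequential rewrite, the same description can be used both to analyze "bøger" and to generate it, which is the property a compiled two-level model exploits.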
The inflectional mechanism recognizes all morphologically correct inflected word forms. The derivational and compounding mechanisms allow new words to be formed from words known to the model. These generative mechanisms have been restricted to increase precision, so not all morphologically acceptable compounds or derivatives are recognized. Given how productive compounding is in Danish, the number of recognized words runs into the millions.
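A restricted compounding mechanism can be sketched as segmentation into known words with explicit limits. In this minimal sketch, assuming a plain word list, the minimum segment length and maximum number of parts are hypothetical restrictions chosen to show how limiting generation rejects more strings; DANSPELL's actual restrictions are not documented here, and real Danish compounds may also involve linking elements that this sketch ignores.

```python
from typing import Optional

# Toy lexicon of Danish base words (illustrative only).
LEXICON = {"skole", "bog", "tand", "børste"}


def segment(word: str, lexicon: set[str],
            min_segment: int = 3, max_parts: int = 3) -> Optional[list[str]]:
    """Return one split of `word` into lexicon entries, or None.

    min_segment and max_parts are hypothetical restrictions: segments shorter
    than min_segment are never tried, and compounds with more than max_parts
    parts are rejected.
    """
    if word in lexicon:
        return [word]
    if max_parts <= 1:
        return None
    # Try every head that leaves a tail of at least min_segment characters.
    for i in range(min_segment, len(word) - min_segment + 1):
        head = word[:i]
        if head in lexicon:
            rest = segment(word[i:], lexicon, min_segment, max_parts - 1)
            if rest is not None:
                return [head] + rest
    return None


print(segment("skolebog", LEXICON))    # ['skole', 'bog']
print(segment("tandbørste", LEXICON))  # ['tand', 'børste']
print(segment("skolebgo", LEXICON))    # None
```

Tightening `min_segment` or `max_parts` lowers the number of accepted strings, which is the trade-off described above: fewer accidental compounds that could mask misspellings, at the cost of some valid but unusual compounds going unrecognized.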
The lexical content and two-level rules of the language model are compiled into a fast, compact finite-state transducer, which, together with the program code and other data, fits in a binary file of only about 1.5 MB.
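Why a compiled finite-state machine gives fast lookup can be seen in a much simpler structure. The sketch below builds a trie-shaped deterministic finite-state acceptor from a word list and checks a word in time proportional to its length. It is an assumption-laden simplification: a real transducer also outputs analyses, and production models are minimized so that shared suffixes are stored only once, which this trie does not do.

```python
# State table: transitions[state] maps a character to the next state.
Acceptor = tuple[list[dict[str, int]], set[int]]


def compile_acceptor(words: list[str]) -> Acceptor:
    """Build a trie-shaped deterministic finite-state acceptor."""
    transitions: list[dict[str, int]] = [{}]  # state 0 is the start state
    final: set[int] = set()
    for word in words:
        state = 0
        for ch in word:
            nxt = transitions[state].get(ch)
            if nxt is None:  # add a new state for an unseen prefix
                nxt = len(transitions)
                transitions.append({})
                transitions[state][ch] = nxt
            state = nxt
        final.add(state)  # the state reached after the last char accepts
    return transitions, final


def accepts(acceptor: Acceptor, word: str) -> bool:
    """Follow one transition per character; accept only in a final state."""
    transitions, final = acceptor
    state = 0
    for ch in word:
        nxt = transitions[state].get(ch)
        if nxt is None:
            return False
        state = nxt
    return state in final


acc = compile_acceptor(["bo", "bog", "bøger"])
print(accepts(acc, "bog"), accepts(acc, "bøg"))  # True False
```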
A Suggestion Mechanism that Works
DANSPELL attempts to suggest corrections for words it does not recognize as correctly spelled. The basic suggestion mechanism suggests all recognized words within an edit distance of one (a one-letter addition, deletion or transposition, excluding the first letter of the word). Common spelling errors receive broader and more targeted suggestions, and some particularly common errors receive only the typically appropriate correction(s).
DANSPELL generally avoids suggesting words that may seem awkward or incomprehensible to the user. In particular, generated compounds and derivatives are suggested only on the basis of segment-specific correction rules. DANSPELL also tries not to suggest words the user may find offensive. If no suitable suggestions are found, none are given.
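The basic edit-distance-one mechanism can be sketched as follows, assuming a checker is a function that accepts or rejects a word. Following the description above, candidates come from one-letter additions, deletions and adjacent transpositions that leave the first letter untouched; they are kept only if recognized and not on a blocklist. The alphabet, the `blocked` parameter and the toy lexicon are hypothetical, and the specific rules for common errors and generated compounds are not modeled.

```python
from typing import Callable

ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"  # hypothetical lowercase alphabet


def edits1(word: str, alphabet: str = ALPHABET) -> set[str]:
    """One-letter additions, deletions and adjacent transpositions
    that leave the first letter of the word unchanged."""
    candidates = set()
    for i in range(1, len(word) + 1):      # addition after the first letter
        for ch in alphabet:
            candidates.add(word[:i] + ch + word[i:])
    for i in range(1, len(word)):          # deletion, never of the first letter
        candidates.add(word[:i] + word[i + 1:])
    for i in range(1, len(word) - 1):      # swap positions i and i+1, i >= 1
        candidates.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    candidates.discard(word)
    return candidates


def suggest(word: str, is_correct: Callable[[str], bool],
            blocked: frozenset[str] = frozenset()) -> list[str]:
    """Recognized, non-blocked candidates; an empty list means no suggestion."""
    return sorted(c for c in edits1(word) if is_correct(c) and c not in blocked)


lexicon = {"bog", "bøger", "hus", "huse"}
print(suggest("husse", lexicon.__contains__))  # ['huse']
print(suggest("bgo", lexicon.__contains__))    # ['bog']
print(suggest("uhs", lexicon.__contains__))    # [] (fixing it would change the first letter)
```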
Stunning Performance and Precision
DANSPELL can analyze more than 30 000 words per second on an Intel Xeon @ 3.0 GHz running Linux, and recognizes more than 95% of the correctly spelled words in typical running text.
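At 30 000 words per second, checking a 1 000 000-word text would take roughly 33 seconds. A throughput figure like this can be reproduced for any checker with a small timing harness; the sketch below assumes the checker is a plain Python callable and reports the best of several runs. It does not use LSPROOF, and its numbers depend entirely on the machine and the checker being measured.

```python
import time
from typing import Callable


def words_per_second(check: Callable[[str], object], words: list[str],
                     repeats: int = 3) -> float:
    """Best-of-N throughput of a spell-check callable over a word list."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for w in words:
            check(w)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on an extremely fast run.
    return len(words) / best if best > 0 else float("inf")


lexicon = {"bog", "bøger", "hus"}
sample = ["bog", "bgo", "hus", "bøger"] * 10_000
print(f"{words_per_second(lexicon.__contains__, sample):,.0f} words/s")
```

Taking the best of several runs reduces noise from other processes; for a fair comparison with the figure above, the sample should resemble typical running text rather than a repeated handful of words.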
Software Integration Made Easy
DANSPELL can be integrated to provide spell-checking in almost any software application, including web-based services, through Lingsoft's proprietary LSPROOF-API programming interface for Windows, Linux, Mac and Java. LSPROOF uses the Unicode character set.
Lingsoft® DANSPELL: Copyright © Lingsoft, Inc. 1986-2010. Two-Level Compiler: Copyright © Xerox Corporation 1994. Lingsoft is a registered trademark and DANSPELL and LSPROOF are trademarks of Lingsoft, Inc. All rights reserved. Details subject to change.
Copyright ©1986-2017, Lingsoft Ltd.