Day 2: Spellchecker notes
Wed, 23/02/2020 - 16:15
Spellcheckers (using Hunspell engine)
- paa.kwesi compiling league table of spellcheckers
- Tunde looking to work on Igbo spellchecker
- Tunde suggests creating a spellchecker how-to
- outline of manual
- explanation of how Hunspell works
- San-James asks for a checklist of steps for creating a spellchecker:
- work with a linguist, or get data from someone with linguistic
knowledge of the language
- large amount of text as corpus
- write tools to crawl the web as a way to collect text for corpus
- corpus catcher from ANLoc
- corpus crawler from Kevin Scannell
- generate word list from text
- Taha mentions there are about 400 stop-words or prepositions
that are worth considering for inclusion in the Hunspell word list
- build a conjugator to automatically generate verb forms to add to the word list
- Hunspell code is complicated and includes language-specific code
- Aspell is more structured, but not as advanced or as widely used
as Hunspell
- Tunde mentions spelt, an ANLoc tool that helps recognise patterns in words automatically
- paa.kwesi points out that spellcheckers that already exist for
Hunspell could be included in applications that are still being localised
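
The "generate word list from text" step in the checklist could be sketched roughly as follows. This is only a minimal illustration, not a tool discussed in the session: the function names (build_word_list, to_dic), the tokenising pattern and the min_count threshold are all assumptions made for the example. The one Hunspell-specific detail it relies on is that a .dic file starts with a line giving the (approximate) number of entries, followed by one word per line.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical tokeniser: runs of letters, optionally joined by an
# apostrophe or hyphen. [^\W\d_] means "word character that is not a
# digit or underscore", i.e. a letter in any script.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def build_word_list(text, min_count=2):
    """Return the sorted, lower-cased words seen at least min_count times.

    The frequency threshold is an assumed heuristic for dropping typos
    and one-off noise that a web-crawled corpus usually contains.
    """
    # NFC merges a base letter and a combining mark into one code point
    # where a precomposed form exists. Letters that have no precomposed
    # form would still be split by the pattern above, so a real tool
    # would need language-specific handling here.
    text = unicodedata.normalize("NFC", text)
    counts = Counter(m.group(0).lower() for m in WORD_RE.finditer(text))
    return sorted(word for word, n in counts.items() if n >= min_count)


def to_dic(words):
    """Format words as the body of a Hunspell .dic file.

    First line: number of entries. Then one word per line. No affix
    flags are attached, so this is a plain word list only.
    """
    return "\n".join([str(len(words)), *words]) + "\n"


if __name__ == "__main__":
    sample = "Kedu ka i mere? Kedu ka ha mere. Ndewo!"
    print(to_dic(build_word_list(sample, min_count=2)), end="")
```

A list produced this way still needs review by someone who knows the language, as the checklist notes, and it would need a matching .aff file before Hunspell can load it. Forms generated by a conjugator could be merged into the same list before calling to_dic.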