Day 2: Spellchecker notes

Spellcheckers (using Hunspell engine)

  • paa.kwesi compiling league table of spellcheckers
  • Tunde looking to work on Igbo spellchecker
  • Tunde suggests creating a spellchecker how-to

  • outline of manual
  • explanation of how Hunspell works
  • San-James asking for a checklist of the steps needed to create a spellchecker
  • work with a linguist, or get data from someone with linguistic knowledge of the language

  • collect a large amount of text as a corpus
    • write tools that crawl the web to collect text for the corpus
      • corpus catcher from ANLoc
      • corpus crawler from Kevin Scannell
  • generate a word list from the corpus text
  • Taha mentions there are about 400 stop-words or prepositions, which you could consider including in the Hunspell word list
  • build a conjugator to automatically generate verb forms to add to the word list
  • Hunspell code is complicated and includes language-specific code
  • Aspell is more structured, but not as advanced or as widely used as Hunspell
  • Tunde mentions spelt, which is a tool by ANLoc to help automatically recognise patterns in words
  • paa.kwesi points out that spellcheckers that already exist for Hunspell could be included in applications that are still being localised
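
The "generate word list from text" step could start as simply as the Python sketch below. It assumes the corpus has already been collected as plain text and that splitting on runs of letters is good enough for the language. The name build_wordlist and the min_count threshold are hypothetical choices, not part of any existing tool. extra_words is where a hand-checked list, such as the roughly 400 stop-words Taha mentioned, could be merged in.

```python
import re
import unicodedata
from collections import Counter
from typing import Iterable

# A "word" here is a run of letters: \w minus digits and underscore.
# Python's re is Unicode-aware, so precomposed letters such as "ọ" match.
# Assumption to check per language: combining marks (e.g. U+0301 used as a
# tone mark) are not matched by this pattern and would split a word.
WORD_RE = re.compile(r"[^\W\d_]+")


def build_wordlist(texts: Iterable[str], min_count: int = 2,
                   extra_words: Iterable[str] = ()) -> list[str]:
    """Return a sorted, de-duplicated word list built from corpus texts.

    Words seen fewer than min_count times are dropped, on the assumption
    that very rare tokens are more likely to be typos or foreign words.
    Everything in extra_words (e.g. a checked stop-word list) is kept.
    """
    counts = Counter()
    for text in texts:
        # Normalise to NFC so a letter always has one encoding, then
        # lowercase (a simplification that loses proper-noun capitals).
        text = unicodedata.normalize("NFC", text).lower()
        counts.update(WORD_RE.findall(text))

    words = {word for word, n in counts.items() if n >= min_count}
    words.update(unicodedata.normalize("NFC", w).lower() for w in extra_words)
    return sorted(words)


if __name__ == "__main__":
    corpus = ["The cat sat. The cat ran 3 times!", "the dog"]
    print(build_wordlist(corpus))  # ['cat', 'the']
```

Raising min_count trades coverage for precision. With a small corpus, a threshold of 1 plus manual review may work better.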
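
A first version of the conjugator idea might look like the Python sketch below. It combines every verb stem with every prefix and suffix from a pattern list and collects the results for the word list. The function names and the affixes in the example are hypothetical placeholders, not real morphology for Igbo or any other language. Real verb inflection (vowel harmony, for instance) usually needs conditional rules, which this sketch does not attempt.

```python
from itertools import product
from typing import Iterable, Sequence


def conjugate(stem: str, prefixes: Sequence[str],
              suffixes: Sequence[str]) -> set[str]:
    """Return every prefix + stem + suffix combination for one verb stem.

    Include "" in prefixes or suffixes to also produce forms without one.
    """
    return {p + stem + s for p, s in product(prefixes, suffixes)}


def expand_verbs(stems: Iterable[str], prefixes: Sequence[str],
                 suffixes: Sequence[str]) -> list[str]:
    """Conjugate every stem and return one sorted list of generated forms."""
    forms: set[str] = set()
    for stem in stems:
        forms |= conjugate(stem, prefixes, suffixes)
    return sorted(forms)


if __name__ == "__main__":
    # Placeholder English-like affixes, purely for illustration.
    print(expand_verbs(["walk", "talk"], [""], ["", "s", "ed", "ing"]))
    # ['talk', 'talked', 'talking', 'talks', 'walk', 'walked', 'walking', 'walks']
```

Sequences rather than one-shot iterators are used for the affix lists, because they are reused for every stem.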
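
For the manual's explanation of how Hunspell works, it may help to show the two files a Hunspell dictionary consists of:

- a .dic file, whose first line is an (approximate) word count, followed by one word per line, optionally tagged with affix flags after a slash;
- an .aff file, which declares the encoding and the affix rules those flags refer to.

The Python sketch below generates both as strings from a stem list and a single suffix class. The flag letter, the suffixes, and the function name are illustrative assumptions. A real dictionary's .aff file will contain many more rules and options. Letting Hunspell apply affix rules this way is an alternative to listing every inflected form in the word list.

```python
def make_hunspell_files(stems: list[str], flag: str,
                        suffixes: list[str]) -> tuple[str, str]:
    """Return (aff_text, dic_text) for a minimal UTF-8 Hunspell dictionary.

    Every stem gets the same suffix flag. Each suffix becomes one SFX rule
    that strips nothing ("0") and applies to any stem (condition ".").
    Suffixes must be non-empty: the bare stem is already a listed word.
    """
    if any(not s for s in suffixes):
        raise ValueError("suffixes must be non-empty strings")

    # Header "SFX <flag> <cross-product Y/N> <rule count>", then one line per rule.
    aff_lines = ["SET UTF-8", f"SFX {flag} Y {len(suffixes)}"]
    aff_lines += [f"SFX {flag} 0 {suffix} ." for suffix in suffixes]

    # .dic: word count first, then "word/flags" entries.
    words = sorted(set(stems))
    dic_lines = [str(len(words))] + [f"{w}/{flag}" for w in words]
    return "\n".join(aff_lines) + "\n", "\n".join(dic_lines) + "\n"


if __name__ == "__main__":
    aff, dic = make_hunspell_files(["walk", "talk"], "V", ["s", "ed", "ing"])
    print(aff)
    print(dic)
```

Saved under a shared base name (for example test.aff and test.dic), the output could be tried with `hunspell -d ./test`.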