Monthly Archives: April 2013

Integrating OpenMedSpel wordlist in Abiword spell checker


I started working on Abiword a few weeks back. I felt comfortable exploring the code base (apart from some difficulty with svn) and was looking for something interesting to get started on. Browsing through abiwiki, I found a suggestion asking for OpenMedSpel wordlist integration in the Abiword spell checker.

The problem looked clean and attractive for a starter like me. I accomplished the task to a fair extent, and I am writing this blog post to give a general idea of how the Abiword spell checker works, along with the fundamental building blocks and wrapper code underneath it.

The components involved, from the top layer down, are:

Spell Manager

——————

Spell Checker

——————

Enchant

——————

Aspell     |    Myspell   |   Ispell

 

  • The spell manager class assists in initiating language detection and loading corresponding dictionaries in the underlying spell checking library.
  • The spell checker class is responsible for analyzing words and detecting misspelled words. It also requests word suggestions from the underlying spell checking library.
  • Enchant is a wrapper used for configuring and exercising spell checking libraries like Aspell, Myspell and Ispell.
  • Enchant and spell checking libraries are system programs that enable Abiword’s spell checking functionality.

To integrate a custom wordlist like OpenMedSpel with the Abiword spell checker, it is enough to include the list in the dictionary of one of the spell checking libraries. Going by alphabetical order and impressive documentation, I chose Aspell for this prototyping task.

I made the following changes (hacks) to get things going:

1) Downloaded the OpenMedSpel wordlist from:
http://www.e-medtools.com/openmedspel100.zip

This had the following files:

vidhoon@vidhoonv:/usr/lib/aspell$ ls -l ~/Downloads/openmedspel100
total 2968
-rw-rw-r-- 1 vidhoon vidhoon 607041 Feb 14 2007 OpenMedSpel 100.csv
-rw-rw-r-- 1 vidhoon vidhoon 558312 Mar 14 01:38 OpenMedSpel 100.txt
-rw-rw-r-- 1 vidhoon vidhoon 1169 Feb 14 2007 README_OpenMedSpel.txt

2) The txt file in the download had DOS line endings (carriage returns) in it. Hence, I had
to do this:

$dos2unix OpenMedSpel\ 100.txt OpenMedSpelunix.txt
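
For systems without dos2unix, here is a minimal sketch of the same kind of conversion using only tr. The helper names strip_cr and has_cr and the throwaway file names are made up for illustration. It assumes the only DOS artifact in the file is the carriage return at the end of each line.

```shell
#!/bin/sh
# strip_cr INPUT OUTPUT: copy INPUT to OUTPUT with every carriage return removed.
# (Hypothetical helper, not part of dos2unix or aspell.)
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# has_cr FILE: succeed if FILE still contains at least one carriage return.
has_cr() {
    [ -n "$(tr -cd '\r' < "$1")" ]
}

# Demo on a throwaway file with DOS line endings.
tmpdir=$(mktemp -d)
printf 'abdomen\r\nzygoma\r\n' > "$tmpdir/dos.txt"
strip_cr "$tmpdir/dos.txt" "$tmpdir/unix.txt"
if has_cr "$tmpdir/unix.txt"; then
    echo "carriage returns still present"
else
    echo "clean"
fi
```

Running has_cr on the converted file before handing it to aspell is a cheap way to confirm the conversion really happened.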

3) Next, I created a wordlist for aspell using the command below:

$ aspell --lang=en create master ./openmedspel.rws < ~/Downloads/openmedspel100/OpenMedSpel\ 100.txt

The Aspell documentation was really helpful for this step.

4) After this, I ran locate to find aspell on my system:

/usr/lib/aspell

In this location I could find all the “.multi” dictionary files and the “.rws” word lists included in them.

A dictionary is composed of multiple word lists. “.multi” files represent such dictionaries and they contain wordlists which make up the dictionary.

I copied the openmedspel.rws list to this location.

5) I looked at en_US.multi (since the OpenMedSpel wordlist is also US English) and found that it includes “en_US-wo_accents.multi”.
Then I opened en_US-wo_accents.multi and added the newly created wordlist as shown below:

vidhoon@vidhoonv:/usr/lib/aspell$ cat en_US-wo_accents.multi
# Generated with Aspell Dicts "proc" script version 0.60.2
add en-common.rws
add en_US-wo_accents-only.rws
add openmedspel.rws
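
Editing the .multi file by hand works, but the same change can be sketched as a small script that appends the "add" line only if it is not already there. The function name add_wordlist is hypothetical, and the demo runs on a scratch file rather than the real one under /usr/lib/aspell (which would normally need root to modify).

```shell
#!/bin/sh
# add_wordlist MULTI_FILE RWS_NAME: append "add RWS_NAME" to MULTI_FILE
# unless an identical line is already present. (Hypothetical helper.)
add_wordlist() {
    multi_file=$1
    rws_name=$2
    # -x matches whole lines only, -F treats the pattern as a fixed string
    if ! grep -qxF "add $rws_name" "$multi_file"; then
        printf 'add %s\n' "$rws_name" >> "$multi_file"
    fi
}

# Demo on a scratch copy of the two original entries.
scratch=$(mktemp)
printf '%s\n' 'add en-common.rws' 'add en_US-wo_accents-only.rws' > "$scratch"
add_wordlist "$scratch" openmedspel.rws
add_wordlist "$scratch" openmedspel.rws   # second call changes nothing
cat "$scratch"
```

Making the step idempotent means it is safe to re-run, for example after a package update regenerates the dictionary files.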

6) I did a locate for enchant on my local system:

/usr/share/enchant

Enchant maintains an ordering of spell checking libraries, both globally and per user.
This ordering file determines which spell checker is chosen for a specific language.

If multiple libraries support a particular language, then the first specified spell checking library is chosen.
I learned more about this file from this link.

7) I found the “private” enchant.ordering file and hacked it to place “aspell” ahead of
“myspell” in the ordering, so that aspell gets picked for en_US; its dictionary now also contains the OpenMedSpel wordlist.

vidhoon@vidhoonv:/usr/lib/aspell$ cat /usr/share/enchant/enchant.ordering
*:aspell,myspell,ispell //-> order changed in this line
fi:voikko,ispell,myspell,aspell
fi_FI:voikko,ispell,myspell,aspell
he:hspell,myspell
he_IL:hspell,myspell
yi:uspell
tr:zemberek
tr_TR:zemberek
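
To make the effect of the ordering file concrete, here is a rough sketch that reads a file in this format and reports which library would come first for a given language tag. It follows the simplified rule described in this post (an exact tag match wins, otherwise the "*" line applies); Enchant's real lookup may differ in details. The function names providers_for and first_provider are made up for illustration, and the sample file is a scratch copy of the listing above.

```shell
#!/bin/sh
# providers_for ORDERING_FILE LANG_TAG: print the comma-separated provider list
# for LANG_TAG, falling back to the "*" entry when there is no exact match.
# (Hypothetical helper; a simplified reading of the ordering file.)
providers_for() {
    awk -F: -v tag="$2" '
        { sub(/[ \t]*\/\/.*/, "", $2) }   # drop a trailing // note, if any
        $1 == tag { exact = $2 }
        $1 == "*" { fallback = $2 }
        END { print (exact != "" ? exact : fallback) }
    ' "$1"
}

# first_provider ORDERING_FILE LANG_TAG: print only the first provider in the list.
first_provider() {
    providers_for "$1" "$2" | cut -d, -f1
}

# Demo on a scratch copy of the ordering shown above.
ordering=$(mktemp)
cat > "$ordering" <<'EOF'
*:aspell,myspell,ispell //-> order changed in this line
fi:voikko,ispell,myspell,aspell
fi_FI:voikko,ispell,myspell,aspell
he:hspell,myspell
he_IL:hspell,myspell
yi:uspell
tr:zemberek
tr_TR:zemberek
EOF

first_provider "$ordering" en_US   # no en_US line, so the "*" entry applies
first_provider "$ordering" fi      # exact match on the "fi" line
```

With the modified file, en_US has no line of its own, so it falls through to the "*" entry and aspell comes first, which is the point of the change in step 7.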

Now I can see that Abiword spell check does not underline words from the OpenMedSpel list, which indicates that the goal is achieved. That is, Abiword now spell checks US English against both the normal dictionary words and the OpenMedSpel words.

I have attached screenshots of abiword illustrating spell check and suggestion.

Image

Image