
Integrating OpenMedSpel wordlist in Abiword spell checker


I started working on Abiword a few weeks back. I felt comfortable exploring the code base (some difficulty with svn being the exception) and was looking for something interesting to get started on. Browsing through abiwiki, I found a suggestion asking for OpenMedSpel wordlist integration in the Abiword spell checker.

The problem looked clean and attractive for a beginner like me. I accomplished the task to a fair extent, and I am writing this blog post to give a general idea of how the Abiword spell checker works, along with the fundamental building blocks and wrapper code underneath it.

The components involved, from the top layer down, are:

Spell Manager

——————

Spell Checker

——————

Enchant

——————

Aspell     |    Myspell   |   Ispell

 

  • The spell manager class assists in initiating language detection and loading the corresponding dictionaries in the underlying spell checking library.
  • The spell checker class is responsible for analyzing words and detecting misspelled ones. In addition, it requests word suggestions from the underlying spell checking library.
  • Enchant is a wrapper used for configuring and exercising spell checking libraries like Aspell, Myspell and Ispell.
  • Enchant and the spell checking libraries are system programs that enable Abiword’s spell checking functionality.

To integrate a custom wordlist like OpenMedSpel with the Abiword spell checker, it is enough to include the list in the dictionary of one of the spell checking libraries. Going by alphabetical order and impressive documentation, I chose Aspell for this prototyping task.

I made the following changes (hacks) to get things going:

1) Downloaded the OpenMedSpel wordlist from:
http://www.e-medtools.com/openmedspel100.zip

This had the following files:

vidhoon@vidhoonv:/usr/lib/aspell$ ls -l ~/Downloads/openmedspel100
total 2968
-rw-rw-r-- 1 vidhoon vidhoon 607041 Feb 14 2007 OpenMedSpel 100.csv
-rw-rw-r-- 1 vidhoon vidhoon 558312 Mar 14 01:38 OpenMedSpel 100.txt
-rw-rw-r-- 1 vidhoon vidhoon 1169 Feb 14 2007 README_OpenMedSpel.txt

2) The txt file in the download had DOS line endings (CRLF) in it. Hence, I had
to do this:

$ dos2unix OpenMedSpel\ 100.txt OpenMedSpelunix.txt
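
If dos2unix is not installed, the same cleanup can be sketched with plain tr. This is only an illustration under assumptions: the file names and the three sample words are made up, and the script works in a scratch directory rather than on the real download. Also note that, depending on the dos2unix version, passing two file names may convert each file in place instead of writing the second one as output (newer versions have a -n option for that), so it is worth checking the man page.

```shell
#!/bin/sh
# Sketch: strip DOS carriage returns from a wordlist without dos2unix.
# File names and sample words are invented for illustration only.
set -eu

workdir=$(mktemp -d)
dos_list="$workdir/sample-dos.txt"
unix_list="$workdir/sample-unix.txt"

# Build a tiny CRLF-terminated wordlist standing in for "OpenMedSpel 100.txt".
printf 'abdominocentesis\r\nacetaminophen\r\nzygomatic\r\n' > "$dos_list"

# Delete every carriage return; the newlines stay in place.
tr -d '\r' < "$dos_list" > "$unix_list"

# Count lines that still contain a CR after the conversion.
cr=$(printf '\r')
remaining=$(grep -c "$cr" "$unix_list" || true)
echo "lines with CR after conversion: $remaining"
```

One word per line with plain newlines is the shape the converted list should have before it is fed to aspell in the next step.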

3) Next, I created a wordlist for aspell using the command below:

$ aspell --lang=en create master ./openmedspel.rws < ~/Downloads/openmedspel100/OpenMedSpel\ 100.txt

The documentation was really helpful for this step.

4) After this, I ran locate to find aspell on my system:

/usr/lib/aspell

In this location I could find all “multi” dictionary files and “rws” lists included in them.

A dictionary is composed of multiple word lists. “.multi” files represent such dictionaries and they contain wordlists which make up the dictionary.

I copied the openmedspel.rws list to this location.

5) I took en_US.multi (since the OpenMedSpel wordlist is also US English) and found that it includes “en_US-wo_accents.multi”.
Then I opened en_US-wo_accents.multi and added the newly created wordlist as shown below:

vidhoon@vidhoonv:/usr/lib/aspell$ cat en_US-wo_accents.multi
# Generated with Aspell Dicts "proc" script version 0.60.2
add en-common.rws
add en_US-wo_accents-only.rws
add openmedspel.rws
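
The edit to the .multi file can also be scripted so that running it twice does not add the line twice. The following is a minimal sketch, not something Aspell provides: add_wordlist is a hypothetical helper, and it operates on a scratch copy instead of the real file under /usr/lib/aspell (which would need root access).

```shell
#!/bin/sh
# Sketch: register an extra wordlist in an Aspell .multi file, only once.
# add_wordlist is a hypothetical helper; the file here is a scratch copy.
set -eu

# add_wordlist MULTI_FILE RWS_NAME
# Appends "add RWS_NAME" unless that exact line is already present.
add_wordlist() {
    multi_file=$1
    rws_name=$2
    if ! grep -qxF "add $rws_name" "$multi_file"; then
        printf 'add %s\n' "$rws_name" >> "$multi_file"
    fi
}

workdir=$(mktemp -d)
multi="$workdir/en_US-wo_accents.multi"

# Scratch copy holding the two entries the stock file had in this post.
printf 'add en-common.rws\nadd en_US-wo_accents-only.rws\n' > "$multi"

add_wordlist "$multi" openmedspel.rws
add_wordlist "$multi" openmedspel.rws   # second call changes nothing

cat "$multi"
```

Keeping a backup of the original .multi file before touching it is a good idea, since a package update may overwrite the change anyway.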

6) I did a locate for enchant on my local system:

/usr/share/enchant

Enchant maintains an ordering of spell checking libraries, both system-wide and per user.
This ordering file determines which spell checker is chosen for a specific language.

If multiple libraries support a particular language, then the first specified spell checking library is chosen.
I learned more about this file from this link.

7) I found the “private” enchant.ordering file and hacked it to place “aspell” ahead of
“myspell” in the ordering (the “*” line below), so that aspell gets picked for en_US, and its en_US dictionary now contains the OpenMedSpel wordlist too.

vidhoon@vidhoonv:/usr/lib/aspell$ cat /usr/share/enchant/enchant.ordering
*:aspell,myspell,ispell
fi:voikko,ispell,myspell,aspell
fi_FI:voikko,ispell,myspell,aspell
he:hspell,myspell
he_IL:hspell,myspell
yi:uspell
tr:zemberek
tr_TR:zemberek
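
To make the "first listed library wins" rule concrete, here is a small sketch that reads an ordering file and prints the first provider for a language tag. It is a simplified model and an assumption on my part: it only looks for the exact tag and then falls back to the “*” line, which may not match every detail of Enchant's real lookup. first_provider is a hypothetical helper, not an Enchant tool, and the ordering file is a shortened scratch copy. If the enchant-lsmod tool is installed, running it with -lang en_US should report the provider Enchant actually selects.

```shell
#!/bin/sh
# Sketch: which provider would be tried first for a language tag?
# Simplified model: use the line for the exact tag, else the "*" line.
# first_provider is a hypothetical helper, not part of Enchant.
set -eu

# first_provider ORDERING_FILE LANG_TAG
first_provider() {
    awk -F: -v lang="$2" '
        $1 == lang { exact = $2 }
        $1 == "*"  { fallback = $2 }
        END {
            list = (exact != "") ? exact : fallback
            split(list, providers, ",")
            print providers[1]
        }
    ' "$1"
}

workdir=$(mktemp -d)
ordering="$workdir/enchant.ordering"

# Shortened scratch copy of the ordering file shown in this post.
cat > "$ordering" <<'EOF'
*:aspell,myspell,ispell
fi:voikko,ispell,myspell,aspell
he:hspell,myspell
EOF

en_provider=$(first_provider "$ordering" en_US)
fi_provider=$(first_provider "$ordering" fi)
echo "en_US -> $en_provider"
echo "fi    -> $fi_provider"
```

Since en_US has no line of its own, it falls through to the “*” line, which is why moving aspell to the front of that line was enough.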

Now I can see that the Abiword spell check does not underline words from the OpenMedSpel list, which indicates that the goal is achieved. That is, Abiword now spell checks US English against both the normal words and the OpenMedSpel words.

I have attached screenshots of abiword illustrating spell check and suggestion.

Image

Image


Soon ‘To be’ sorry saga of GMAIL


GMAIL could easily slip into the role of protagonist in the movie “MY NAME IS GMAIL” and beat the glory of SRK, thanks to its sufferings in recent times. One of the earliest and most innovative products that Google introduced, GMAIL is not so interesting anymore. Once upon a time it was engaging and addictive enough to kill Google’s own product WAVE, which never hit the visible spectrum. But are you convinced that GMAIL gives the best user experience for an email client (which is what it originally is)? Then think twice!

Yes, they (the Google folks) do change the layout and replace tabs with sleek icons. But is that all they can do? Not even close!!
Some silly ideas like these could easily create Gmail’s crusader:

1. Recent conversations:

Computer memories are designed around the concept of locality of reference. We must not forget that computer theories drew a lot of inspiration from the human model as they evolved. How did GMAIL miss that we would need to see what we just saw a moment ago? Be it a conference PIN or a payment reference number or a super cool daily deal.

Why can’t they put my ten most recently opened mails to aid my amnesia or whatever?

For me, this causes frustration a humongous number of times each day. I find a conversation buried deep inside my inbox (GMAIL search: I will punch it in the face below) and when I am in the middle of it, my boss calls, “hey did you check my mail?” Without asking me, my lips blabber “yea just began on it” while my fingers guide the mouse to the inbox within pico-fucking-seconds!

Poof! The conversation that I had dug up from the center of the earth is back in its rightful place 😦 damn it!

A simple, pathetic looking <div> containing the mails I recently opened could save minutes of the hours we spend within GMAIL!

2. Relevant conversations:

This not only sounds close to the previous feature; they may well be twin sisters! 🙂 Tell me, how often do you end up wanting to check the previous meeting notes while you read the latest ones?? Again, I go searching for a conversation in my galactic inbox, wasting a minute or a second or whatever of my limited lifetime! 😦 We do generously stick labels on conversations.

When I am viewing a mail, why can’t GMAIL at least pull out a few mails that are tagged with the same label, within plus or minus five days, say?

I am sure you can do better with mining.

Honest advice: stop looking for new ways to suck up the bullshit data we have. Can’t you see that we give it for free to Mark Zuckerberg! We won’t mind doing the same for you. And try utilizing the data you already have.

3. Refer and compose:

Is it possible by any means to refer to a conversation while I am composing a new mail? I might want to send my friend quotes for a tour trip from different agents.

Why can’t they parallelize reading and composing?

There are many times when I need to collect different details from mail chains and draft a mail. They could very well introduce multitasking environments like tabs or windows for the gmail application, to which most computer users are already addicted. (Thanks to Steve Jobs and Bill Gates!!)

4. Auto suggest for mail search:

The powerhouse of Google is web search. But the mail search seems to be a blooper :p It is high time to incorporate content based search assistance rather than depending on the persistence of the user’s memory.

Why should I always remember the exact subject to pick a mail?

Can’t there at least be an auto suggest for mail search based on the subject column? I am sure a lot more can be done in this space.

5. All attachments at one stop:

The development of mail system protocols has enabled almost anything to be attached to a mail message.

But why can’t there be a one stop shop to search through all my mail attachments?

An attachment explorer could be handy for listing all of them ordered or categorized by label, size, type, date, subject etc. This would also provide additional scope for attachment based search and auto suggestion.

Some silly comments on the latest GMAIL interface update:

The most irritating thing is the new chat button to toggle between chat widget / labels.

If my grandma were alive, she would definitely be grunting every single time she tried to chat with me.

Please remember that unlike Google+, Gmail does have users outside the Google employee base. So it would be better to have buttons and dynamic flow designs suiting the needs of the common man!

The verdict is always simple. Do or die. Either you provide value to the customer by adapting to the needs of the time, or you kill yourself. As of today, nobody needs a jack of all trades in web services. If you are a master of one particular product or service, then there always is a loyal user base. A GMAIL user never cares about your giant Android cloak or your rugged social network suit.

Focus and differentiation will continue to be the basis of a good business whether it is Steve Jobs or Bill Gates or Jeff Bezos who controls the climate of silicon world.

                  

The new Documentation awareness


Certainly, this should have happened earlier!

Yes, there are no excuses for not recording my past research and extra-academic (mostly useless :p ) work! But a week ago it finally struck my dumb head to do this as soon as possible.

I have created a few blogs to record what I am working on. Feel free to take a look:

My research work at IITM

My Final year project work at CEG

It’s actually a good thing to record daily updates on such pages because they naturally monitor our progress and reflect our work style. So things become visible for correction. 😀 In addition, it becomes a record of the various sources used, which is indeed a time-saving resource for our future work.

Happy blogging!

regards,

Vidhoon V

 

 
