James Stanley


I made a Rhyming Dictionary

Mon 17 April 2017

Some friends and I were trying to come up with rhymes for "mowing", to help us think of a witty name for a lawnmower racing team. This seemed like a job for a machine, so I looked online for something that might help.

RhymeZone seemed to be the best option. It comes up with some good rhymes, but it also misses obvious ones and suggests some really bad ones. It is also slow, and the interface is annoying.

It seemed like a fun project to try to implement an alternative rhyming dictionary, so I did. You can play with it here: Rhyming Dictionary.

I started with the Simplest Thing That Could Possibly Work: I ran Soundex on the input word and the dictionary words, and output the dictionary words that have similar Soundex output.
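To make that first attempt concrete, here is a minimal sketch of the standard American Soundex coding in Python. It is not the code behind the site, and the function name and the example words are just illustrations. The thing to notice is that the code always begins with the word's first letter.

```python
# Letter groups for standard American Soundex: digit 1 is "bfpv", 2 is "cgjkqsxz", etc.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code for a word (illustrative sketch)."""
    word = word.lower()
    if not word:
        return ""
    result = word[0].upper()          # the first letter is kept verbatim
    prev = CODES.get(word[0], "")     # its code still suppresses an adjacent duplicate
    for ch in word[1:]:
        if ch in "hw":
            continue                  # h and w do not separate duplicate codes
        code = CODES.get(ch, "")      # vowels and other characters have no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]       # pad with zeros, truncate to 4 characters


for w in ["mowing", "going", "monk"]:
    print(w, soundex(w))
```

With this sketch, "mowing" and "monk" both come out as M520, while "going" comes out as G520: the non-rhyme matches exactly and the real rhyme differs in its very first character.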

Since Soundex is mostly concerned with the starts of words, this didn't work very well at all. I tried various ways to fix this, and various other sounds-like algorithms, and ended up going down quite a rabbit hole. It all boiled down to "find words with suffixes that are spelled the same", which clearly is not adequate (e.g. "pause" rhymes with "paws" but doesn't even share a 1-character suffix).
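A quick way to see why spelling is the wrong thing to compare is to count how many trailing letters two words share. This is a small Python sketch with a made-up helper name, and the "cough"/"though" pair is my own extra example rather than something from the experiments above.

```python
def shared_suffix_length(a, b):
    """Count how many trailing characters two strings have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


print(shared_suffix_length("pause", "paws"))     # rhyme, but no shared letters at the end
print(shared_suffix_length("mowing", "going"))   # rhyme, shares "ing"
print(shared_suffix_length("cough", "though"))   # shares "ough", but does not rhyme
```

Spelling fails in both directions: "pause" and "paws" score 0 despite rhyming, while "cough" and "though" score 4 despite not rhyming.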

My next thought was to use a text-to-speech engine to convert the input words to phoneme strings, and then compare suffixes of the phoneme strings. This worked much better.

eSpeak can output the phonemes it's generating (-x), and can suppress sound generation (-q). espeak -qx does exactly what we want: it takes words as input and gives phonemes as output, without actually speaking them. Perfect.

Example:

$ espeak -qx
hello world
 h@l'oU w'3:ld
pause
 p'O:z
paws
 p'O:z
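If you wanted to drive this from a script, something like the following Python sketch would do. It assumes espeak is installed and on the PATH, and it passes one word per invocation as a command-line argument instead of feeding words on standard input as in the session above. That is simpler to reason about but slow for a large word list. The function names are hypothetical.

```python
import subprocess


def clean_phonemes(raw):
    """Strip the leading space and trailing newline that surround the phoneme string."""
    return raw.strip()


def espeak_phonemes(word):
    """Return eSpeak's phoneme string for a single word.

    Assumes the espeak binary is available. -q suppresses audio and
    -x writes the phoneme mnemonics to standard output.
    Note: a word starting with "-" would be mistaken for an option.
    """
    result = subprocess.run(
        ["espeak", "-q", "-x", word],
        capture_output=True,
        text=True,
        check=True,
    )
    return clean_phonemes(result.stdout)


def build_phoneme_table(words):
    """Map each word to its phoneme string (one espeak process per word)."""
    return {word: espeak_phonemes(word) for word in words}
```

Going by the session above, espeak_phonemes("pause") and espeak_phonemes("paws") should both return p'O:z, though the exact strings may vary between eSpeak versions and voices.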

I preprocessed all of the words in my dictionary (101K words) to find their eSpeak phoneme strings, and wrote a function to find rhymes. It simply takes the eSpeak phoneme string for the input word and proceeds as follows:

1.) Output all dictionary words (if any) that have this phoneme string as a suffix.
2.) Remove the first character from the phoneme string.
3.) Loop back to (1.) if any characters remain in the phoneme string.
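The three steps above can be sketched in Python roughly as follows. This is not the site's actual code: the function name is made up, the linear scan over the whole dictionary is the naive approach, and skipping the input word and already-reported words is my own addition to keep the output tidy. The phoneme strings for "cause" and "laws" are assumed for illustration and were not checked against eSpeak; the others are taken from the example session.

```python
# Tiny stand-in for the preprocessed dictionary: word -> phoneme string.
PHONEMES = {
    "hello": "h@l'oU",
    "world": "w'3:ld",
    "pause": "p'O:z",
    "paws": "p'O:z",
    "cause": "k'O:z",   # assumed, not verified eSpeak output
    "laws": "l'O:z",    # assumed, not verified eSpeak output
}


def find_rhymes(word, phonemes):
    """Return dictionary words ordered from longest to shortest shared phoneme suffix."""
    target = phonemes[word]
    seen = {word}                  # do not report the input word itself
    results = []
    while target:                  # step 3: stop once no characters remain
        # Step 1: output every word whose phoneme string ends with the target.
        for candidate, candidate_phonemes in phonemes.items():
            if candidate not in seen and candidate_phonemes.endswith(target):
                seen.add(candidate)
                results.append(candidate)
        # Step 2: drop the first character and try the shorter suffix.
        target = target[1:]
    return results


print(find_rhymes("pause", PHONEMES))
```

On this toy table the result is "paws" first (the whole phoneme string matches), followed by "cause" and "laws" (which match once the leading "p" has been dropped).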

This means those dictionary words that share longer suffixes of phonemes (i.e. the best rhymes) are output first.

I then wrote a small Mojolicious app to expose the rhyme-finding function over HTTP, and a user interface.

The end result is pretty good. It comes up with better rhymes than RhymeZone, more quickly, and with a less annoying interface. Great success.

Here's the link again: Rhyming Dictionary.

And the lawnmower racing team? Mowing Nowhere Fast.


