Making SpellStack Smart by Dumbing it Down
by Kwasi Mensah
October 3rd, 2013
SpellStack, the new word game we’re working on, is inherently a two-player game. But what do you do if you’re on the subway, on a long trip, or anywhere else without an opponent? We knew we had to include an AI (artificial intelligence) in the game so you can play against the computer when no one else is around.
It actually wasn’t that hard to get the AI up and running. But very early in testing we noticed a big problem: the computer was just as likely to spell ‘apple’ as it was to spell ‘gerrymandering’.
SpellStack has a giant text file of all the words you’re allowed to spell. This is fine and dandy for human players, but the computer can’t tell which words are common and which are rare. We tried things like limiting the length of the word, but it turns out there are some really tricky 5-letter words (‘zymic’, anyone?).
Thankfully, through the amazing Boston Indies community and Darius Kazemi, twitterbot maker extraordinaire, I was pointed to Wordnik, a service that lets me query information about words (like how often they’re used!). I wrote a quick script, waited a long time for it to finish, and voilà, we have not only a list of words but also an approximation of how often they’re used in the real world. This lets us control which words the AI is allowed to use at each difficulty level.
Nuts and Bolts
This part is meant for the programmers and gets fairly technical. You might also want to take a look at the Wordnik API, which has the coolest documentation I’ve ever seen.
The only way to get meaningful values for how often a word is used was to use the count returned from words/search (word/frequencies didn’t seem to return reasonable values for fairly common words like ‘pizza’).
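To make that concrete, here is a minimal sketch of pulling a count out of a words/search response. It is not the script we actually ran: the helper name count_for is made up for illustration, and the response shape (a searchResults array whose entries carry word and count) is an assumption you should check against the Wordnik docs before relying on it.

```ruby
require 'json'

# Hypothetical helper: given the JSON body of a words/search response,
# return the usage count for `word`, or 0 if the word is not listed.
# ASSUMPTION: the body looks roughly like
#   {"searchResults": [{"word": "pizza", "count": 1234}, ...]}
def count_for(word, response_body)
  results = JSON.parse(response_body)['searchResults'] || []
  # A search can return near matches too, so pick the exact word.
  match = results.find { |r| r['word'].to_s.downcase == word.downcase }
  match ? match['count'].to_i : 0
end

# Made-up sample data, only here to show the call.
sample = '{"searchResults":[{"word":"pizza","count":1234},' \
         '{"word":"pizzas","count":56}]}'
puts count_for('pizza', sample)
puts count_for('zymic', sample)
```

Treating a missing word as a count of 0 keeps the later difficulty filtering simple: a word the service has never seen just sorts in with the rarest words.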
We originally wanted to just pull down all the words in Wordnik (words/search?query=*). But since its dictionary is made up of what’s actually used in the wild, we found that if a word gets misspelled often enough it ends up in the dictionary (do a quick Google search for ‘Eqypt’). And even when we were smart about setting a minimum number of times a word had to be used, a lot of our more obscure words weren’t found.
This meant we had to do a word search for every word in our dictionary. This Ruby function took ~20 hours to finish! I’d give my machine specs, but I’m pretty sure the bottleneck is talking to the Wordnik servers.
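For anyone attempting something similar, the per-word loop can be sketched roughly like this. To be clear, this is an illustration and not the function mentioned above: the name build_frequency_file, the tab-separated output format, and the resume-on-restart behavior are all assumptions added for the example. The lookup argument stands in for whatever code actually talks to Wordnik, so the sketch runs without a network connection.

```ruby
require 'tmpdir'

# Hypothetical sketch: look up every word and append "word<TAB>count"
# lines to out_path. `lookup` is any callable that takes a word and
# returns a count; in real use it would wrap the HTTP request.
# Words already present in out_path are skipped, so a run that dies
# partway through can be restarted without repeating finished lookups.
def build_frequency_file(words, out_path, lookup)
  done = {}
  if File.exist?(out_path)
    File.foreach(out_path) { |line| done[line.split("\t").first] = true }
  end

  File.open(out_path, 'a') do |out|
    words.each do |word|
      next if done[word]
      out.puts("#{word}\t#{lookup.call(word)}")
      out.flush # keep progress on disk in case the run is interrupted
      done[word] = true
    end
  end
end

# Stand-in lookup with made-up counts, used only to exercise the loop.
fake_counts = { 'apple' => 5000, 'zymic' => 2 }
lookup = ->(word) { fake_counts.fetch(word, 0) }
out_path = File.join(Dir.tmpdir, 'spellstack_word_counts.tsv')
build_frequency_file(%w[apple zymic], out_path, lookup)
puts File.read(out_path)
```

When a job runs for most of a day and the slow part is the network, being able to pick up where it left off matters more than raw speed, which is why the sketch flushes after every word.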