Making SpellStack Smart by Dumbing it Down

by Kwasi Mensah

October 3rd, 2013

Overview

A very rough mockup of SpellStack

SpellStack, the new word game we’re working on, is inherently a two-player game. But what do you do if you’re on the subway or on a long trip with no one to play against? We knew we had to include an AI (artificial intelligence) in the game so you can play against the computer when no one else is around.

It actually wasn’t that hard getting the AI up and running. But very early in testing we realized a big problem: the computer was just as likely to spell ‘apple’ as it was to spell ‘gerrymandering’.

Picture of a dog with a funny caption about being too smart.

“You may have a dog that won’t sit up, roll over, or even cook you breakfast, not because he’s too stupid to learn how, but because he’s too smart to bother.”
from http://www.flickr.com/photos/78428166@N00/8293186460/

SpellStack has a giant text file of all the words you’re allowed to spell. This is fine and dandy for human players, but the computer can’t tell which words are common and which are rare. We tried things like limiting the length of the word, but it turns out there are some really tricky five-letter words (‘zymic’, anyone?).

Thankfully, through the amazing Boston Indies community and Darius Kazemi, twitterbot maker extraordinaire, I was pointed to Wordnik, a service that lets me query information about words (like how often they’re used!). I wrote a quick script, waited a long time for it to finish, and voila: we have not only a list of words but an approximation of how often each one is used in the real world. This lets us control which words the AI is allowed to use at each difficulty level.

Nuts and Bolts

Picture of nuts and bolts

from http://www.flickr.com/photos/microassist/6990640490/

This part is meant for programmers and might get super technical. You also might want to take a look at the Wordnik API, which has the coolest documentation I’ve ever seen.

The only way to get meaningful values for how often a word is used was to use the count returned from words/search (word/frequencies didn’t seem to return reasonable values for fairly common words like ‘pizza’).

We originally wanted to just pull down all the words in Wordnik (words/search?query=*). But since its dictionary is built from what’s actually used in the wild, we found that if a word gets misspelled often enough it ends up in the dictionary (do a quick Google search for ‘Eqypt’). And even when we were smart about setting a minimum number of times a word had to be used, a lot of our more obscure words weren’t found.

This meant we had to do a word search for every word in our dictionary. The Ruby script that did this took ~20 hours to finish! I’d give my machine specs, but I’m pretty sure the bottleneck was talking to the Wordnik servers.
