Dinosaur Comics!

What are the haps my friends

February 16th, 2011: Here is a funny comic (sorry about the tumblr mirror; I linked to it on Twitter yesterday and broke the site :( )

Okay so Watson is IBM's natural-language question-answering computer, and he's playing Jeopardy and dominated the game last night! As someone who studied computational linguistics, I find it really interesting. I don't know precisely how Watson works, but it looks like they've got tons of texts that are used for knowledge and different algorithms for different types of text. Usually you'd set this up so that each algorithm reports its answer and its confidence level, and then another AI, an overseer AI, learns which algorithms it can trust in which situations. If more algorithms report the same answer, you'd be more likely to trust that response. You'd usually do this through a machine learning algorithm: give it tons of examples from past shows and let it know what the right answer is, so it can learn and adjust itself. Then, test it on unseen cases (the show we're now seeing).
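If you want to see the overseer idea in miniature, here's a rough Python sketch. To be clear, this is NOT how Watson actually works (I don't know that!). It's just the general setup described above, shrunk way down. The function names, the algorithm names ("keyword", "dates"), and the example data are all made up for illustration, and the "learning" is just counting how often each algorithm was right in each category on past examples, which is about the simplest thing that could count as learning.

```python
from collections import defaultdict


def learn_trust(past_examples):
    """Estimate how often each (hypothetical) algorithm is right per category.

    past_examples: list of (category, guesses, correct_answer), where
    guesses maps an algorithm name to its (answer, confidence) pair.
    """
    right = defaultdict(int)
    seen = defaultdict(int)
    for category, guesses, correct_answer in past_examples:
        for algo, (answer, _confidence) in guesses.items():
            seen[(algo, category)] += 1
            if answer == correct_answer:
                right[(algo, category)] += 1
    # Add-one smoothing, so an algorithm with few examples lands near 0.5.
    return {key: (right[key] + 1) / (seen[key] + 2) for key in seen}


def pick_answer(category, guesses, trust, default_trust=0.5):
    """Overseer: sum trust * confidence for each distinct answer, take the max.

    Algorithms that agree pool their votes, so agreement raises the score.
    """
    scores = defaultdict(float)
    for algo, (answer, confidence) in guesses.items():
        weight = trust.get((algo, category), default_trust)
        scores[answer] += weight * confidence
    best = max(scores, key=scores.get)
    return best, scores[best]


# Invented "past shows": the dates algorithm does well on history clues.
past_shows = [
    ("history", {"keyword": ("Napoleon", 0.9), "dates": ("Wellington", 0.6)}, "Wellington"),
    ("history", {"keyword": ("Caesar", 0.8), "dates": ("Augustus", 0.7)}, "Augustus"),
    ("wordplay", {"keyword": ("pun", 0.5), "dates": ("1066", 0.2)}, "pun"),
]
trust = learn_trust(past_shows)

# Unseen clue: keyword is more confident, but the overseer trusts dates more here.
answer, score = pick_answer(
    "history", {"keyword": ("Lincoln", 0.9), "dates": ("Grant", 0.5)}, trust
)
print(answer, score)
```

A real system would presumably use a proper machine learning model instead of simple counting, but the shape is the same: learn the trust from past shows, then apply it to clues you haven't seen.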

One thing I've noticed is that I'm pretty sure Watson is relying on having an enormous aggregate corpus rather than doing too much clever language stuff. Last night there were a few questions (including the one for Final Jeopardy) that were phrased unusually, and if you were doing keyword searches and trying to figure out what's being said from that, you'd end up with the wrong answer. And sure enough, on those questions Watson was stumped!
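Here's a toy Python example of the keyword-search failure I mean. Again, this is a guess at the general failure mode and not a claim about Watson's internals: the two-passage "corpus", the stopword list, and the clues are all invented for the demo. It scores each passage by how many keywords it shares with the clue and returns the best match.

```python
import re

# Tiny invented corpus: candidate answer -> a supporting passage.
PASSAGES = {
    "Richard Burbage": "Richard Burbage was the first actor to play Hamlet",
    "William Shakespeare": "William Shakespeare wrote the play Hamlet",
}

# Small hand-picked stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "but", "this", "of", "was", "to", "a"}


def keywords(text):
    """Lowercase the text, split it into words, and drop the stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def best_match(clue):
    """Return the answer whose passage shares the most keywords with the clue."""
    clue_words = keywords(clue)
    return max(PASSAGES, key=lambda name: len(clue_words & keywords(PASSAGES[name])))


# Plainly phrased clue: keyword overlap points at the right passage.
print(best_match("The first actor to play Hamlet"))

# Unusually phrased clue: the right answer is still Burbage, but the clue's
# words overlap more with the Shakespeare passage, so that one wins.
print(best_match("Shakespeare wrote the play, but this man originated the role of the Danish prince"))
```

The second clue never says "first actor" or "Hamlet", so counting shared words picks Shakespeare. Getting it right means knowing that "originated the role" means played it first and that the "Danish prince" is Hamlet, and that's the clever language stuff.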

This is not to belittle what IBM's done / is doing, but the ads they show interspersed with the game show are acting like THE PROBLEM OF SEMANTICS IS NOW SOLVED, and COMPUTERS NOW KNOW WHAT LOVE IS. I'd love for that to be the case, but it looks like while Watson knows what words are and how they relate to other words, the differences we see between the words "mistake", "error", "blunder" and "goof" aren't known by Watson. If he's lucky he'll have texts that disambiguate them in his database, but there'll always be questions that aren't found in your corpus and then you have to rely on intelligence, not just source material.

I'm writing this not because there are any great insights here, but because the news coverage of the game that I've seen has focused on "COMPUTERS CAN WIN CHESS AND JEOPARDY, BUT AT LEAST THEY'LL NEVER PAINT A PAINTING OR COMPOSE A SONATA", which is frankly ridiculous. Computers can do all those things; they're just doing them differently than we do, because doing them as we do is really hard and we haven't figured it out yet. Give it time.

One year ago today: originally i wrote "nobody knows who first played hamlet", but we do! it was richard burbage. good work, richard!

– Ryan