Sarah Palin’s coke habit, Elvis spottings on Hollywood Boulevard, and the moon landing conspiracy…these are some of the things you’d be seeing on Wavii if we believed everything that came in through our crawler.
It might seem obvious to you that these stories can’t be trusted, but one of the things that makes Wavii unique is that machines are creating the content for our feed, not people. This means we have to build systems that can make judgment calls about what info to trust, and what info to discard or put on hold to verify later.
How does our system know what to trust?
Sometimes it’s useful to think of our system as a small child learning about her world. Like this child, a machine learning algorithm can take everything it sees and hears into its view of the world. But, also like a child, the machine only knows what you’ve told it, and lots of things that are obvious to us adults aren’t so obvious to it.
Let’s pursue this analogy a little further and follow six-year-old Jane through a day in her life.
Jane
Jane’s day starts over breakfast one morning, when her older brother John tells her that the weatherman is predicting snow, and they are probably going to get an early dismissal from school. The “weatherman” sounds like a trustworthy source, but last week John told Jane that it was raining cats and dogs, so this is likely some kind of trick. Jane is learning…she decides not to believe him.
When she gets on the bus to go to school a few minutes later, she overhears someone in the back talking about snow. She doesn’t hear the full sentence, but she is certain that she hears the words “snow” and “early dismissal.” What are the odds, she thinks, especially after what John said this morning?
When she gets to school, two of her friends tell her that they heard there is going to be an early dismissal. And by snack time, there is an excited buzz around the classroom: everybody is talking about it! Jane overhears one teacher saying to another, “Right, we’re going home early. Preposterous!” This seems to be almost certain confirmation of Jane’s hopes — since the teacher never lies — except that Jane doesn’t know what “preposterous” means.
The day drags on, and, sadly, there’s no snow in sight. Jane wonders how something she heard from so many people could be wrong. Then at recess she runs into her brother. He’s laughing his heart out and tells her that he started the rumor on the bus and it spread like wildfire! Jane definitely learned her lesson…
Wait, is Wavii just a bunch of six-year-olds?? I’m confused!
No, but our systems face the same two problems that Jane does:
- Incomplete information about the world: In the same way that Jane didn’t know whether her brother was joking, where her classmates were getting their information from, or what a “weatherman” was, so too do our systems operate with only a partial picture of what the world looks like.
- Noisy stream of data: Just as Jane had to deal with overheard snippets of sentences on the bus and unfamiliar vocabulary from her teacher, our systems sometimes get only fragments of things or see unfamiliar language.
So Jane did exactly what all of us do on a daily basis when trying to process what we see and hear:
- Used her prior experience to inform present decisions
- Updated her beliefs as she got each new piece of evidence
- Maintained an awareness of her own confidence in her beliefs
At Wavii, we incorporate these same strategies into making decisions about what news to post. Prior experience can tell our technology that the New York Times is more trustworthy than the National Enquirer, or help it decide whether “Barack Obama’s engagement in Afghanistan” is the same type of engagement as “Prince William and Kate Middleton’s engagement”. And like Jane, our tech will update its beliefs as it gets confirmation of a story from more sources, while keeping an eye out for the Johns of the news world. Finally, our tech can quantify its own confidence in its output based on factors like how much data it has seen about a subject and how well it has performed in the past.
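To make the idea concrete, here is a toy sketch in Python of how a belief might be updated as reports come in. This is not Wavii's actual system. It is a simple Bayesian update under assumptions we made up for illustration: the reliability numbers, the source names, and the function `update_belief` are all hypothetical, and each independent source is assumed to be right with a fixed probability. Reports that trace back to the same origin are counted only once, which is the lesson Jane learned from John.

```python
import math

# Hypothetical prior experience: how often each origin has been right before.
# These numbers are invented for illustration only.
RELIABILITY = {"weatherman": 0.9, "john": 0.3}


def logit(p):
    """Convert a probability into log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Convert log-odds back into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def update_belief(prior, report_origins, reliability):
    """Return the updated probability that a story is true.

    prior          -- belief before seeing any reports (0 < prior < 1)
    report_origins -- the original source behind each report we saw
    reliability    -- maps an origin to P(its report is correct)

    Repeated reports from the same origin add no new evidence,
    so each distinct origin is counted once.
    """
    log_odds = logit(prior)
    for origin in set(report_origins):
        # A source that is right with probability r shifts the odds by r / (1 - r).
        log_odds += logit(reliability[origin])
    return sigmoid(log_odds)


# Jane hears the snow rumor three times, but every copy started with John.
rumor = update_belief(0.2, ["john", "john", "john"], RELIABILITY)

# One report that really does come from the weatherman.
forecast = update_belief(0.2, ["weatherman"], RELIABILITY)

print(round(rumor, 3), round(forecast, 3))
```

In this sketch, three copies of John's rumor leave Jane less convinced than she was to begin with, while a single trustworthy report moves her belief well above even odds. The confidence piece (how much data the system has seen, how well it has done in the past) is left out here to keep the example short.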
We’ve found that thinking about how humans approach a learning problem can often provide a good starting point for brainstorming computational solutions. A small child’s perspective is especially useful because it forces us to trim down our assumptions. And who knows, if we’re lucky, in a few years our small child will grow up and start being an insolent teenager!