
Giving big data a voice

From Nine To Noon, 9:38 am on 22 April 2016

Could the secret to making the most of big data be… a voice?

Photo: AFP

With applications like Siri we can now ask our phones for quick answers to simple questions, but the bigger game in town is Natural Language Generation – or NLG  – which enables big data to tell a rich and complex story.

Read an edited snapshot of Robert Dale and Kathryn Ryan's conversation:

We know our Google search will surf through a phenomenal amount of information and come back with a set of results. The next step that’s a bit more mind-bending is how the computer takes the information it has processed and communicates it back to a human being in what we recognise as language. How does that happen?

Really that’s driven by what we’ve come to learn about linguistics. It’s very much driven by science. And you can think of it as codifying a capability that we all have naturally. So when we talk we know how to tell a story, we know how to put things together in an order that makes sense to the person we’re speaking to, we know what information we should emphasise, what information we should leave out, we know how to phrase things and describe things so that the person we’re talking to will understand us. And of course that’s going to depend on just who the audience is, so it’s very much tailored to the audience we’re talking to. All of those things we do effortlessly, but they are actually driven by rules in our heads. And if we can get at those rules, we can codify those rules and put them in software. And that’s essentially what we’re doing when we develop these kinds of applications.
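As a toy illustration of what "codifying those rules" might look like in software, here is a minimal rule-based sketch. Everything in it (the rules, the thresholds, the function name) is invented for illustration and is not from any production NLG system; real systems encode far richer rules for content selection, ordering, and phrasing.

```python
# Hypothetical sketch of rule-based NLG: turn a raw number into a sentence,
# choosing what to emphasise and how to phrase it for a given audience.

def describe_temperature(reading_c, audience="general"):
    """Generate a tailored sentence from a temperature reading."""
    # Content selection: decide what is worth saying about the data.
    if reading_c >= 30:
        judgement = "unusually hot"
    elif reading_c <= 5:
        judgement = "unusually cold"
    else:
        judgement = "mild"

    # Audience tailoring: an expert gets the figure, a general reader the gist.
    if audience == "expert":
        return f"The reading is {reading_c} degrees Celsius ({judgement})."
    return f"It is {judgement} outside."

print(describe_temperature(32))            # It is unusually hot outside.
print(describe_temperature(32, "expert"))  # The reading is 32 degrees Celsius (unusually hot).
```

The same data produces different sentences for different audiences, which is the tailoring the interview describes.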

How sophisticated is the language capacity of the computer?

Already very sophisticated. Where there are still challenges is around the more nuanced aspects of language, like the use of analogies and metaphor and so on. We’re in a situation now where we know enough to be able to get the machine to talk very coherently and sensibly about factual information. So we can give the machine lots of data in many, many different domains. Lots of the work we do is in the finance domain; that’s a very popular area for this technology right now.

How much is what you’re talking about here linking in to the so-called internet of things?

In the last few years the driver here has been big data, now it’s the internet of things. We are surrounded by sensors and pieces of machinery and trackers on our wrists and so on that are all spewing out all this data. Really what we’ve got there is a situation where machines are talking to machines and we are left out of the loop. And in many cases – not all – but in many cases we’d like to be part of that conversation, too. For that to happen we have to understand what that data means. So the role here for NLG is to explain data in the world of the internet of things – to overcome that tsunami of information, if you like, and convey what’s important to the people who own the data in the first place.
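The idea of conveying "what’s important" from a sensor stream can be sketched in a few lines. This is a hypothetical toy, not any real IoT or NLG product: the sensor name, threshold, and phrasing are all assumptions made for illustration.

```python
# Hypothetical sketch: summarise an IoT sensor stream in plain language,
# surfacing the anomalies instead of relaying every raw reading.

def summarise_readings(name, readings, threshold):
    """Report anomalies in a sensor stream; stay brief if all is normal."""
    spikes = [r for r in readings if r > threshold]
    if not spikes:
        return f"{name}: all {len(readings)} readings within normal range."
    return (f"{name}: {len(spikes)} of {len(readings)} readings exceeded "
            f"{threshold}, peaking at {max(spikes)}.")

print(summarise_readings("Pump 3 temperature", [61, 63, 88, 91, 62], 80))
# Pump 3 temperature: 2 of 5 readings exceeded 80, peaking at 91.
```

Five numbers become one sentence a human can act on, which is the "part of the conversation" role the interview describes for NLG.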