
Written by John Lord

History of data science: 21st century and the future

In the first post of this series, ‘History of data science: pre-20th century’, we looked at data science from 5,000 years back and beyond. In the second post, ‘History of data science: 20th century and the modern computer’, we explained the modern data science revolution.

In this final post, we explore the current state of data science, look at the innovations taking place right now, and consider how they will shape the future of the field.

Big data

“Between the dawn of civilisation and 2003, we only created five billion gigabytes; now we’re creating that amount every two days.” – Hal Varian, Chief Economist at Google


The 21st century is the century of big data. As more and more of our lives move onto the Internet, we are creating ever-increasing amounts of data every day, whether we choose to or not. Data on and about you and your behaviour is created, collected, and processed every time you use your smartphone or purchase a product or service. Your digital footprints are likely now scattered across computers all over the world.

Browse our (or any) website and your IP address, the type of phone you use, and the pages you visit are all logged in the system. Purchase a new bed on Amazon and through your purchase history Amazon will predict that you might also want to buy sheets and pillows. Use a smartphone app to buy a cup of coffee at your favourite coffee shop, and the shop now knows when and where you buy your coffee, what coffee you like, and whether you are a loyal customer – or not.

With the advent of big data, it is infeasible for humans to search and utilise all of this valuable data manually. As the team described in previous blog posts, credit-issuing organisations are now struggling to manage the volume of applications they are receiving. And, as my colleague Nick has blogged, fraudsters are devising increasingly ingenious and fruitful methods of defrauding organisations. TruNarrative’s mission is to use big data to provide actionable insight into when people are legitimately representing themselves to our customers, and when there is reason to suspect fraud.


Machine Learning

You may have read about Google’s recent success in teaching a computer to play Go, the ancient strategy board game that originated in China. Like chess, Go takes significant skill to play, and it had been predicted that the game was too complicated for a computer to master in the foreseeable future. The prediction was wrong. Google’s AlphaGo recently beat Go grandmaster Lee Se-dol convincingly, winning four games and losing only one.


For AlphaGo to reach grandmaster-level ability, Google gathered as much data on the game as it could find, and then used state-of-the-art machine learning algorithms to learn winning strategies. Google did not program AlphaGo with how to play Go well; it programmed AlphaGo to learn how to play Go, much as a human would learn the game. This is machine learning using big data.


As an example of how machine learning is impacting our everyday lives, consider Amazon’s Echo or Google’s Home: voice recognition systems that can be purchased for less than £150 and sit in your home listening for your commands.

These systems, while not perfect, are the result of intense research into big data and machine learning. Amazon and Google have collected massive sets of voice-to-text transcriptions and applied machine learning algorithms to them on an unprecedented scale. Again, the voice recognition systems were not programmed explicitly; they were programmed to learn how to listen to human speech and predict what the speaker was most likely saying.

Soon they’ll be correcting our pronunciation and grammar, much as spell checkers automatically correct our spelling today.

How we use Machine Learning at TruNarrative

TruNarrative uses machine learning algorithms to train computers to detect fraud, much as Google trained AlphaGo to play Go. Machine learning is ideally suited to finding hidden patterns in data, such as fraudulent behaviour. Not only can it find existing patterns of fraud in historical data, it can also discover novel patterns of fraudulent conduct.

Fraudsters are continually modifying their methods to avoid detection, so it is vital that TruNarrative always remains one step ahead of our adversaries. TruNarrative’s data consortium, spanning multiple organisations, industries, and countries, provides the big data that makes machine learning capable of successfully detecting fraud. More data enables better fraud detection for all our customers.

Rules-based systems are, and will remain for a very long time, an excellent weapon against fraud. With a platform like TruNarrative, rules can be adapted within seconds and give subject matter experts (SMEs) complete control over how they work. However, they have downsides: they are operationally expensive to maintain (the SMEs require salaries), they can only work on specific data points rather than wider patterns, and they are entirely reactive, because rules are not written until the company has suffered one or more losses.

Machine learning models have shown themselves to be very adaptable to detecting fraud. They can identify patterns as well as specific data points, they are not subject to a human reviewer’s unconscious bias and, once in place, they do not need significant investment. But they can be complex to put in place initially, and they tend to be optimised for one particular target. For example, they can be so optimised for absolute detection rates that they generate many false positives, or so tailored to reducing false positives that they miss frauds.
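To make that trade-off concrete, here is a minimal Python sketch. The scores and outcomes are made up purely for illustration, and the function name `rates_at` is hypothetical; nothing here is taken from TruNarrative's platform. It shows how moving a single decision threshold on a model's fraud score trades catch rate against false positives.

```python
# Hypothetical model scores (0 = looks legitimate, 1 = looks fraudulent),
# each paired with the true outcome. The numbers are invented for illustration.
SCORED_CASES = [
    (0.95, True), (0.80, True), (0.65, False), (0.60, True),
    (0.40, False), (0.35, True), (0.20, False), (0.10, False),
]


def rates_at(threshold, cases=SCORED_CASES):
    """Return (catch_rate, false_positive_rate) when flagging score >= threshold."""
    frauds = [score for score, is_fraud in cases if is_fraud]
    legit = [score for score, is_fraud in cases if not is_fraud]
    caught = sum(score >= threshold for score in frauds)
    false_alarms = sum(score >= threshold for score in legit)
    return caught / len(frauds), false_alarms / len(legit)


for threshold in (0.3, 0.5, 0.7):
    catch_rate, false_positive_rate = rates_at(threshold)
    print(
        f"threshold={threshold}: catch rate={catch_rate:.2f}, "
        f"false positive rate={false_positive_rate:.2f}"
    )
```

On this toy data, a low threshold catches every fraud but flags half of the legitimate cases, while a high threshold raises no false alarms but misses half of the frauds. A model tuned for only one of those targets sits at one end of that range.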

TruNarrative blends these approaches in a way that maximises the benefits while almost eliminating the issues associated with the aforementioned systems. We use an approach called a “Jacquard”.

Different machine learning models are optimised for different goals: for example, the highest possible catch rate or the fewest false positives. We use different families of machine learning algorithms to achieve each of these. Instead of choosing one or the other, we use both and look at the results within a ‘Weights of Evidence’ system, where we essentially let the different algorithms compete against each other.

We take a similar approach with rules. We encourage our clients to build multiple sets of rules: for example, one set to detect fraud and a separate set to identify ‘honesty’. As with the machine learning models, these rule sets can be configured to compete against each other – another Jacquard.
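As a rough illustration of two rule sets competing, consider the Python sketch below. Every rule, field name, point value, and outcome label in it is hypothetical and chosen only to show the idea; it is an assumption about how such a contest could be scored, not a description of how the TruNarrative platform evaluates rules.

```python
# Hypothetical rules: (description, predicate, points). These are invented
# examples, not real TruNarrative rules.
FRAUD_RULES = [
    ("email created in the last day", lambda a: a["email_age_days"] < 1, 3),
    ("many applications from one device", lambda a: a["apps_from_device"] > 3, 2),
]
HONESTY_RULES = [
    ("customer for two years or more", lambda a: a["customer_years"] >= 2, 3),
    ("address seen on previous orders", lambda a: a["address_seen_before"], 2),
]


def score(rules, application):
    """Add up the points of every rule in the set that fires."""
    return sum(points for _, predicate, points in rules if predicate(application))


def compete(application):
    """Let the fraud and honesty rule sets compete; the higher score wins."""
    fraud = score(FRAUD_RULES, application)
    honesty = score(HONESTY_RULES, application)
    if fraud > honesty:
        return "suspect"
    if honesty > fraud:
        return "trusted"
    return "undecided"


new_applicant = {
    "email_age_days": 0, "apps_from_device": 5,
    "customer_years": 0, "address_seen_before": False,
}
print(compete(new_applicant))
```

The point of scoring 'honesty' separately is that a long-standing customer who happens to trip one fraud rule is not automatically treated as a suspect; the evidence on both sides is weighed.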

As a final step, we then balance out the best outcomes from the rules against the best results from the Machine Learning systems. Once again, we let them compete. This is our third Jacquard.
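The post does not spell out how the ‘Weights of Evidence’ balancing is calculated, so the following Python sketch is only one plausible reading. It uses the textbook definition of weight of evidence, the logarithm of how much more often a signal appears in fraud cases than in legitimate ones, and assumes the signals are independent, which is a strong simplification. The signal names, hit rates, and prior are all hypothetical.

```python
import math

# Hypothetical hit rates, as might be measured on labelled history:
# (P(signal fires | fraud), P(signal fires | legitimate)).
SIGNAL_RATES = {
    "fraud_rules_winner": (0.60, 0.05),
    "ml_high_catch_model": (0.90, 0.20),
    "ml_low_false_positive_model": (0.50, 0.01),
}


def weight_of_evidence(signal, fired):
    """Log-likelihood ratio of observing this signal state under fraud vs legitimate."""
    p_fraud, p_legit = SIGNAL_RATES[signal]
    if not fired:
        # A signal staying silent is evidence too, usually towards legitimacy.
        p_fraud, p_legit = 1 - p_fraud, 1 - p_legit
    return math.log(p_fraud / p_legit)


def fraud_probability(observations, prior=0.01):
    """Combine the prior odds with each signal's weight (independence assumed)."""
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(weight_of_evidence(s, fired) for s, fired in observations.items())
    return 1 / (1 + math.exp(-log_odds))


all_fire = {name: True for name in SIGNAL_RATES}
none_fire = {name: False for name in SIGNAL_RATES}
print(f"all signals fire: {fraud_probability(all_fire):.3f}")
print(f"no signals fire:  {fraud_probability(none_fire):.5f}")
```

Because each signal contributes an additive weight, the final score stays explainable: for any decision, one can list which rules or models pushed it towards fraud and by how much.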

The result of this method is a ‘best of both worlds’ outcome. We blend the expertise of the SMEs in the fraud rules world with the power of machine learning systems. We almost entirely eliminate the problems of over-fitting and over-optimisation that can come from machine learning, yet the outcomes are still explainable (a critical requirement for regulators in many of the countries we operate in).

Through our unique process and our three Jacquards, our platform produces a very high catch rate together with a very low false positive rate.

Get started with TruNarrative today and sign up for our free monthly Fraud Analytics Reporting.