Fraud detection and the false positive problem

One of the most useful and profitable applications of machine learning in finance is fraud detection, an age-old problem. Since the days of the Medici banking family, bad actors have been trying to beat the system and get money for nothing.

One challenge of this business problem is that such actors are motivated to cover their tracks. It’s a cat-and-mouse game where fraudsters continuously come up with innovative ways to beat the system.

For this reason, state-of-the-art systems in this area are trained for anomaly detection rather than to recognize past patterns of fraud. A system trained only on known patterns of fraud will detect only those patterns and will miss the new techniques criminals keep inventing.
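To make the distinction concrete, here is a minimal sketch of anomaly detection on a single feature, the transaction amount. It is not the method of any particular production system: the robust z-score (median and median absolute deviation), the 3.5 threshold, and the function names are all assumptions chosen for illustration. The point is only that the detector needs no labeled examples of past fraud; it flags whatever deviates strongly from the usual behavior.

```python
from statistics import median


def robust_z_scores(amounts):
    """Score each amount by its distance from the median, in scaled MAD units."""
    center = median(amounts)
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        # At least half of the values equal the median, so there is no
        # spread to measure against. This sketch simply scores everything 0.
        return [0.0 for _ in amounts]
    # 1.4826 makes the MAD comparable to a standard deviation
    # when the data is roughly normally distributed.
    return [(a - center) / (1.4826 * mad) for a in amounts]


def flag_anomalies(amounts, threshold=3.5):
    """Return one boolean per amount: True when it looks anomalous.

    The threshold is an illustrative assumption, not a tuned value.
    """
    return [abs(z) > threshold for z in robust_z_scores(amounts)]


# Hypothetical history for one customer, ending with an unusual amount.
history = [42.0, 38.5, 51.0, 47.25, 40.0, 44.9, 5000.0]
flags = flag_anomalies(history)
```

A real system would score many features at once (amount, location, device, time of day), but the principle is the same: no fraud labels are involved, only a notion of "normal".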

This approach is not without disadvantages. Many of today’s systems are highly effective at detecting fraud but have a detrimental side effect: a high number of false positives. This leaves two alternatives. Either legitimate transactions are denied, which is not the desired outcome, or the resulting queue of alerts has to be analyzed further by human operators. Depending on the queue size, that review is a huge undertaking in time and capital, and it introduces the potential for human error, which again leads to legitimate transactions being denied.
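A little arithmetic shows why the alert queue grows so quickly when fraud is rare. The sketch below uses made-up numbers (one million transactions, 1% fraud, a detector that catches 95% of fraud and wrongly flags 2% of legitimate transactions); none of these figures come from a real system.

```python
def alert_queue(n_fraud, n_legit, recall, false_positive_rate):
    """Estimate the size and quality of the alert queue.

    recall: share of fraudulent transactions the detector flags.
    false_positive_rate: share of legitimate transactions it flags.
    """
    true_positives = n_fraud * recall
    false_positives = n_legit * false_positive_rate
    alerts = true_positives + false_positives
    # Share of alerts that are actually fraud (precision).
    precision = true_positives / alerts if alerts else 0.0
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "alerts": alerts,
        "precision": precision,
    }


# Hypothetical volumes and rates, for illustration only.
queue = alert_queue(n_fraud=10_000, n_legit=990_000,
                    recall=0.95, false_positive_rate=0.02)
print(f"alerts: {queue['alerts']:.0f}, precision: {queue['precision']:.1%}")
```

Under these assumptions the detector raises about 29,300 alerts, of which about 19,800 are false. Even a detector that looks strong on paper sends analysts a queue in which roughly two out of three alerts are legitimate transactions.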

Baseline model

How do we solve this? Initially, the answer might seem to be a solution that minimizes false positives. But in fraud detection, arguably (1) the only thing worse than a false positive is a false negative. In this domain, a false negative is a fraudulent transaction that was labeled as legitimate. Ergo, the bad guys got away with it. Picture a system in which 99% of transactions are legitimate and 1% are fraudulent. We could build a naive model that labels every transaction as legitimate. Such a model would have 99% accuracy but 0% recall: it would never raise a false positive, but it would never catch a single fraud either. Our clients would be very happy because none of their transactions got denied, and our fraud detection department would be quite small. The bad guys would also be ecstatic. Not a good option. Let’s keep looking.
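The numbers for this baseline are easy to verify. The sketch below builds a hypothetical sample of 1,000 transactions with 1% fraud, applies the "everything is legitimate" model, and computes the standard metrics from the confusion matrix, treating fraud as the positive class. Note that the 99% figure the baseline earns is its accuracy; its precision on the fraud class is undefined because it never raises an alert.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall with fraud (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Precision is undefined when the model raises no alerts at all.
        "precision": tp / (tp + fp) if (tp + fp) else None,
        # Recall is undefined when the sample contains no fraud.
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Hypothetical sample: 10 fraudulent (1) and 990 legitimate (0) transactions.
y_true = [1] * 10 + [0] * 990
# The baseline labels every transaction as legitimate.
y_pred = [0] * len(y_true)

baseline = classification_metrics(y_true, y_pred)
print(baseline)
```

This is why accuracy alone is a misleading yardstick on heavily imbalanced data: a model can score 99% while catching no fraud at all.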

We have now established a (very poor) baseline. Let’s summarize some of the ways we can improve on it using artificial intelligence:

  • Improve the predictive model with one that yields a lower number of false positives AND a lower number of false negatives. In other words, improve the precision AND the recall.
  • Automate the handling of at least some of the alerts to minimize the number of analysts needed to handle them.
  • Provide better tools to the analysts to increase their accuracy and productivity.
  • Compute a danger score for the alerts in order to better prioritize, handle and route alerts. For example, an alert with a high danger score might be routed to a senior analyst versus one with a low score that can be routed to entry-level analysts.

Better modeling

For supervised learning cases such as the domain in question, our team of data scientists has been achieving impressive results using AutoML capabilities to generate models. Often (but not always) the “winning” model produced by these frameworks is a stacked ensemble combining several models. Some of the AutoML frameworks we recommend using to produce these models are:

We’ll save a deeper explanation of these tools and how they can be used for a future article. But in the meantime, we encourage the reader to visit the tools’ sites to learn more about this technique.

Automating the handling of alerts

For certain low-priority alerts, there are opportunities to use automation techniques to robo-analyze them. In most of these systems, certain patterns keep cropping up. In these instances, Robotic Process Automation (RPA) can be appropriate to handle the cases mechanically. As an example use case, fraud agents often need to communicate with a customer by phone to verify the validity of a transaction. Let’s say that a bank customer wants to transfer a large sum of money to a foreign bank. The customer has never performed a transaction of this amount or to this location before. Policies for such a transaction might prescribe a call to the client to ensure that it’s not a fraudulent transaction. A simple case of using RPA would be to automate the process by using Google Duplex or Amazon Lex/Polly to interact with the client instead of using a live agent.
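The policy in this example is simple enough to express as a rule. The sketch below decides whether a transfer should trigger a verification call; the Transfer type, the needs_verification function, and the 3x amount multiplier are all hypothetical names and values chosen for illustration. Placing the actual call through a voice service is deliberately left out, since it depends on the vendor’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    customer_id: str
    amount: float
    destination_country: str


def needs_verification(transfer, history, amount_multiplier=3.0):
    """Return True when policy would prescribe a verification call.

    A transfer is treated as unusual if the customer has no history, if the
    amount is far above anything they have sent before, or if the
    destination is new for them. The multiplier is an assumed policy value.
    """
    past = [t for t in history if t.customer_id == transfer.customer_id]
    if not past:
        return True
    largest_past_amount = max(t.amount for t in past)
    known_destinations = {t.destination_country for t in past}
    unusual_amount = transfer.amount > amount_multiplier * largest_past_amount
    new_destination = transfer.destination_country not in known_destinations
    return unusual_amount or new_destination


# Hypothetical customer history and an incoming large foreign transfer.
history = [Transfer("c1", 200.0, "US"), Transfer("c1", 450.0, "US")]
incoming = Transfer("c1", 25_000.0, "BR")
call_customer = needs_verification(incoming, history)
```

In an RPA setup, a True result would hand the alert to the automated calling workflow rather than to a live agent, and the customer’s answer would close or escalate the alert.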

Improving the handling of alerts

Another way to enhance alert handling, minimize costs, improve productivity and increase accuracy is to provide analysts with the right processing tools. This is more of a software engineering problem than a data science problem, but it is important to address it nonetheless. Some design elements that should be present in the system:

  • It should be intuitive.
  • Once a determination is made, the analyst should be able to quickly mark an alert as valid or invalid with a minimum number of clicks and keystrokes.
  • All the information required to make decisions should be easily accessible by the analyst, preferably in one screen.
  • It can make use of your customers as an extension of your analyst staff. No one knows better than your customer whether their card was used for nefarious purposes. As long as we don’t have too many false positives, customers usually don’t mind a text or a call asking them if a transaction is valid, and in the case of true positives they are quite grateful that you caught the bad guys before a fraudulent transaction occurred.

Prioritizing and routing

Up until this point, we have been assuming that the result of our model is binary. The transaction is either legitimate or fraudulent. Another way to enhance the model is to slice the results in a more granular fashion. Some potential divisions are:

  • Clustering into a predetermined number of cohorts using unsupervised learning techniques such as k-means clustering.
  • A more fine-grained classification where the possible values are not just black or white but shades of gray, and the darker the shade, the higher the scrutiny the alert will be given.
  • A numeric result on a continuous, bounded scale (for example, from 0 to 1000). We can think of this result as a “danger” score. For higher scores, a more rigorous process will be applied to determine if the transaction should be disallowed.

For the last two approaches, we can use the AutoML techniques previously mentioned.
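As a rough sketch of how a danger score could drive routing, the code below converts a model probability into a 0 to 1000 score and maps score bands to queues. The cutoffs (300 and 700), the queue names, and the function names are assumptions for illustration; in practice they would be set from analyst capacity and the cost of each kind of error.

```python
from bisect import bisect_right

# Hypothetical band boundaries on the 0-1000 danger scale.
CUTOFFS = [300, 700]
# One queue per band, from least to most scrutiny (names are illustrative).
QUEUES = ["automated_handling", "entry_level_analyst", "senior_analyst"]


def to_danger_score(fraud_probability):
    """Map a model probability in [0, 1] to an integer score in [0, 1000]."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(fraud_probability * 1000)


def route_alert(danger_score):
    """Pick the queue for a score; a score equal to a cutoff goes to the higher band."""
    if not 0 <= danger_score <= 1000:
        raise ValueError("danger score must be between 0 and 1000")
    return QUEUES[bisect_right(CUTOFFS, danger_score)]


def prioritize(alerts):
    """Order (alert_id, danger_score) pairs so the riskiest are handled first."""
    return sorted(alerts, key=lambda alert: alert[1], reverse=True)


# Hypothetical alerts: (alert id, model probability of fraud).
raw_alerts = [("a1", 0.12), ("a2", 0.95), ("a3", 0.41)]
scored = [(alert_id, to_danger_score(p)) for alert_id, p in raw_alerts]
for alert_id, score in prioritize(scored):
    print(alert_id, score, route_alert(score))
```

Keeping the cutoffs in one place makes it easy to rebalance the queues as staffing or fraud volume changes, without retraining the model.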


There are many benefits, costs, and risks associated with fraud prevention. The judicious application of state-of-the-art machine learning techniques can help your firm increase productivity, decrease costs and minimize headline, regulatory and other kinds of risk. Feel free to reach out if you would like to discuss ways to implement these techniques to help your organization streamline its fraud detection processes.

(1) By some measures (for example, the Javelin Strategy study), the losses incurred by denying legitimate transactions exceed the amount lost to fraud by a ratio of 10:1. That is, financial institutions lose a lot more money by denying legitimate transactions than by fraud.