Major Matters
AI in Financial Services
Module 2 of 6

AI in Fraud and Risk

Real-time fraud scoring. Behavioural biometrics. Network effects. Why every 1 percent improvement saves millions.


The Real-Time Fraud Scoring Challenge

A customer swipes their card. The transaction hits the payment processor. The fraud detection system has sub-100 milliseconds to decide: approve or decline. Decline a legitimate transaction and the merchant loses the sale and frustrates the customer. Approve a fraudulent one and the merchant eats the loss. The margin for error is measured in milliseconds, and the stakes in millions of dollars.

This is the hard problem of production fraud detection. Academic fraud detection papers optimize for accuracy on historical datasets. Production fraud detection optimizes for speed, accuracy, and business value in real time. A model that takes 5 seconds to score a transaction is useless. A model that is 99 percent accurate but declines 5 percent of legitimate transactions costs more in false declines than fraud prevention saves.

Real-time fraud scoring requires making high-stakes decisions in milliseconds with incomplete information. The entire pipeline (feature computation, model scoring, decision logic, and risk response) must complete within the window, with lookups parallelized wherever possible. This is not machine learning. This is systems engineering that happens to use machine learning.

The complexity is real. A card network like Visa or Mastercard processes hundreds of millions of transactions daily. A fintech lender processes millions of underwriting decisions. Each decision requires dozens or hundreds of features computed in real time. Each feature requires lookups across distributed systems: customer databases, transaction histories, device databases, network intelligence. All of this must happen in milliseconds.


Real-Time Feature Computation Within the Authorization Window

The authorization window is the time between when a transaction is submitted and when the authorization decision must be returned. For card transactions, this window is typically 50-100 milliseconds. For instant payments, it might be 10-50 milliseconds. The fraud detection system must compute every feature it needs within that window. No shortcuts. No lazy evaluation. Everything computed and ready to feed to the model.

Velocity Signals: How Fast, How Often, How Many

The simplest and often most effective fraud signals are velocity signals. How many transactions has this card had in the last hour? The last day? What is the average transaction size, and does this transaction match the average? Is the frequency of transactions accelerating (carding attack)? Has the customer ever made this many transactions in this short a window before?

Velocity signals are cheap to compute. Most require only a counter and a rolling window. A customer's transaction count is maintained in a high-speed cache. The fraud system increments it with each transaction. But velocity signals are also easy to game. An attacker can spread fraudulent transactions across time or across multiple cards to avoid triggering velocity thresholds. An attacker can find a stolen card with legitimate transaction history and make fraudulent purchases in the pattern of normal transactions.
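The counter-plus-rolling-window idea can be sketched in a few lines. This is an illustrative, in-memory version; in production these counters live in a low-latency cache such as Redis, and the class and field names here are assumptions, not any vendor's API:

```python
from collections import deque

class VelocityTracker:
    """Per-card rolling-window transaction counter (illustrative sketch)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = {}  # card_id -> deque of transaction timestamps

    def record(self, card_id, ts):
        q = self.events.setdefault(card_id, deque())
        q.append(ts)
        self._evict(q, ts)

    def count(self, card_id, now):
        q = self.events.get(card_id)
        if q is None:
            return 0
        self._evict(q, now)
        return len(q)

    def _evict(self, q, now):
        # Drop timestamps that have aged out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()

tracker = VelocityTracker(window_seconds=3600)
for ts in (0, 10, 20, 5000):            # three quick swipes, then one much later
    tracker.record("card_123", ts)
print(tracker.count("card_123", now=5000))  # only the last swipe is still in window
```

The eviction on every read and write keeps memory bounded; the same structure answers "how many in the last hour?" and, with a second window, "is the frequency accelerating?".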

Device Fingerprinting: Is This the Right Device?

Device fingerprinting identifies unique characteristics of the device making the transaction: the phone or computer. On a mobile app, the device fingerprint comes from the phone: device ID, OS version, device type, browser headers. On a website, it comes from browser signals: user agent, IP address, screen resolution, timezone, installed fonts. On a card network, it comes from the payment terminal or POS system.

Device fingerprinting is effective because fraudsters typically operate from different devices than legitimate customers. A stolen card used at a Target cash register is a different device from the legitimate cardholder's phone. A compromised account accessed from a botnet in Eastern Europe is a different device than the legitimate customer's laptop. Device consistency is a strong fraud signal.

The catch: device fingerprints can be spoofed. A determined attacker can clone device identifiers, rotate proxies to change IP addresses, use residential proxies to look like home networks. Sophistication matters. A low-effort attacker can be caught by device consistency. A sophisticated attacker will have invested in device spoofing infrastructure.
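One common construction hashes a canonical ordering of the collected signals into a single identifier. A minimal sketch, with illustrative field names rather than any real SDK's schema:

```python
import hashlib

def device_fingerprint(signals):
    """Hash a canonical ordering of device/browser signals into one stable
    identifier. Field names are illustrative assumptions."""
    canonical = "|".join(f"{key}={signals[key]}" for key in sorted(signals))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

laptop = {
    "user_agent": "Mozilla/5.0 (Macintosh)",
    "screen": "1440x900",
    "timezone": "America/New_York",
    "fonts_hash": "a1b2c3",
}
fp_home = device_fingerprint(laptop)
fp_same = device_fingerprint(dict(laptop))                       # identical signals
fp_spoofed = device_fingerprint({**laptop, "timezone": "Europe/Moscow"})

print(fp_home == fp_same)     # True: same device, same fingerprint
print(fp_home == fp_spoofed)  # False: one changed signal breaks the match
```

The brittleness cuts both ways: one changed signal breaks the match for a spoofing attacker, but also for a legitimate customer who updates their browser, which is why production systems score fingerprint similarity rather than demand exact equality.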

Behavioural Biometrics: Mouse Movements, Typing Patterns, Session Analysis

Behavioural biometrics captures how a user interacts with the system, not who the user is. Mouse movement patterns (speed, acceleration, curves), keyboard typing rhythms (time between keystrokes, keystroke pressure patterns), touch patterns (pressure, speed, accuracy on mobile), and browsing patterns (page views, scroll speed, time spent on pages) all vary from person to person.

A legitimate customer logs into their account from their home laptop. The mouse movements are consistent with that customer's historical pattern. The typing rhythm is consistent. A fraudster compromises the account and logs in from a botnet. The mouse movements are rigid and mechanical. The typing rhythm is perfectly consistent (a bot, not a human with natural variation). The patterns are detected as anomalous.

Behavioural biometrics is powerful but expensive to compute in real time. Extracting mouse movement patterns requires tracking pixel-level coordinates in real time. Processing this stream to detect deviations from baseline patterns requires comparing against historical patterns stored in low-latency databases. For platforms like Sardine, behavioural biometrics has become a core differentiator in synthetic identity and account takeover fraud detection.
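A toy version of the "bots are suspiciously uniform" signal: the coefficient of variation of inter-keystroke intervals. The thresholds below are illustrative, not calibrated values from any real system:

```python
import statistics

def keystroke_uniformity(inter_key_ms):
    """Coefficient of variation of inter-keystroke intervals.
    Humans show natural jitter; scripted input is near-uniform."""
    mean = statistics.mean(inter_key_ms)
    return statistics.stdev(inter_key_ms) / mean

human_intervals = [120, 95, 210, 150, 88, 175]   # milliseconds between keys
bot_intervals = [100, 100, 101, 100, 100, 100]

print(keystroke_uniformity(human_intervals) > 0.1)  # True: natural variation
print(keystroke_uniformity(bot_intervals) < 0.05)   # True: mechanically uniform
```

Real behavioural biometrics compares against the individual customer's historical baseline rather than a global threshold, but the principle is the same: too little variance is itself a signal.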

Geolocation: Is This Where We Expect You To Be?

Geolocation signals answer whether a transaction is happening where it should be. If a card registered in New York is used to make a purchase in Las Vegas, that is expected (travel). If a card in New York is used in Tokyo 3 hours later, that is impossible (unless the cardholder teleported) and very suspicious. Impossible geographic jumps are strong fraud indicators.

Geolocation comes from IP address (where is the request coming from), GPS coordinates (mobile only), or merchant location (for in-person transactions). The comparison is against historical patterns: where has this card been used before, what are the geographic boundaries, what is the travel speed between last transaction and current transaction.

The limitation: attackers use proxies and VPNs to change their apparent geographic origin. A fraudster in Russia uses a residential proxy in New York to make the transaction appear to originate from New York. Geolocation becomes useless if it is easily spoofed. But combining geolocation with other signals (device consistency, behavioural patterns) makes spoofing much harder. A single signal is easy to fake. A combination of signals is harder.
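The impossible-travel check reduces to a distance and a clock. A sketch using the haversine great-circle distance, with an assumed speed cap of roughly airliner cruise speed:

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between (lat, lon) pairs in degrees."""
    (lat1, lon1), (lat2, lon2) = a, b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def impossible_travel(prev_loc, curr_loc, hours, max_kmh=950):
    """Flag when the implied speed exceeds a rough airliner cruise speed.
    The 950 km/h cap is an illustrative assumption."""
    return haversine_km(prev_loc, curr_loc) / hours > max_kmh

new_york = (40.71, -74.01)
tokyo = (35.68, 139.69)
print(impossible_travel(new_york, tokyo, hours=3))   # True: ~10,800 km in 3 hours
print(impossible_travel(new_york, tokyo, hours=14))  # False: a plausible flight
```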

Transaction Graph Analysis: Is This Card Part of a Ring?

The frontier of fraud detection is network-level analysis. A single fraudulent card is one risk. But when 100 fraudulent cards are all linked through the same merchant account, the same device fingerprint, the same IP address, or the same customer profile data, we have a coordinated fraud ring.

Transaction graph analysis builds a network of relationships: this card has been used with these 10 other cards at the same merchant. This device has been used to make accounts with these 50 email addresses. This email address has been used to register accounts at these 20 merchants. By analyzing the graph structure, sophisticated fraud detection systems identify rings that individual transaction analysis would miss.

Graph analysis is expensive computationally. Building and querying a graph of billions of transactions and entities requires sophisticated infrastructure. But the signal is powerful. An attack that evades single-transaction detection (by spreading out velocity signals or randomizing device characteristics) can be caught at the graph level because the attacker is still connecting together their fraudulent resources.
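The ring-finding step can be illustrated as connected components over an entity graph. This is a plain DFS sketch; production systems run this over billions of edges in dedicated graph infrastructure:

```python
from collections import defaultdict

def fraud_rings(edges, min_size=3):
    """Group entities (cards, devices, emails) into connected components;
    components above a size threshold are candidate fraud rings."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, rings = set(), []
    for node in graph:
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n in component:
                continue
            component.add(n)
            stack.extend(graph[n] - component)
        seen |= component
        if len(component) >= min_size:
            rings.append(component)
    return rings

edges = [
    ("card_1", "device_A"),  # three cards all tied to one device:
    ("card_2", "device_A"),  # coordinated, even if each card looks clean alone
    ("card_3", "device_A"),
    ("card_9", "device_Z"),  # an unrelated, ordinary pairing
]
print(fraud_rings(edges))  # one ring of four linked entities
```

No single transaction among `card_1` through `card_3` need look suspicious; the shared device is what exposes the coordination.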


Sardine and Sift: Behavioural Biometrics and Network Effects

Two specialized fraud detection platforms illustrate different approaches to AI-driven fraud prevention in production.

Sardine: Behavioural Biometrics and Synthetic Identity Detection

Sardine focuses on behavioural biometrics and synthetic identity fraud. The insight: synthetic identities are created by fraudsters acting in patterns. They open accounts. They verify identity. They make transactions. The patterns are different from legitimate customers opening accounts for the first time.

Sardine captures mouse movements, typing patterns, touch interactions, and session analysis. A genuine user opening an account takes time, thinks about things, scrolls back and forth. A fraudster opening 100 synthetic identity accounts in parallel uses automation: fast, predictable, mechanical. The patterns differ. Sardine's models identify the mechanical patterns as synthetic identities or account takeovers.

The value is highest at onboarding. Detecting fraud after a customer has integrated into your system is expensive. Stopping a synthetic identity at account opening prevents all downstream fraud. Sardine's onboarding-focused approach prevents the fraud before it happens.

Sift: Trillion-Event Data Network and Network Effects

Sift operates a network where fraud detection is powered by collective intelligence. Sift has visibility into transactions from thousands of merchants worldwide. A card tested at one merchant is often tested at dozens of others simultaneously as part of a carding attack. Sift sees all of these tests and knows they are coordinated.

Sift's trillion-event data network creates a network effect moat. The more merchants on the platform, the more events Sift sees. The more events, the more patterns Sift can detect. The better the detection, the more merchants want to join. A merchant on Sift benefits from seeing fraud patterns visible only to Sift (because Sift sees activity across thousands of merchants). A merchant not on Sift only sees fraud from their own customers, missing the network-level patterns.

This is why fraud-as-a-service platforms have become dominant. Individual merchants cannot build trillion-event networks. Sift can. Individual merchants benefit from using Sift rather than building their own system.


Ensemble Methods: Rules, Models, and Network Signals Combined

No single fraud signal is perfect. Rules fail when attackers adapt. Models learn patterns but miss edge cases. Network signals are powerful but require access to network data. The best fraud detection systems combine all three.

Rule-Based Detection: Known Attack Patterns

Rules capture known fraud patterns. If the transaction amount exceeds a threshold and the account age is less than 24 hours, block it (new account, large transaction, likely fraud). If the card is from a known blocked country and the merchant is high-risk, flag it for review. If the device fingerprint matches a device blacklisted for fraud, decline the transaction. Rules are fast, explicit, and easy to explain to customers.

Rules are reactive. You write them in response to attacks you have seen. A new attack vector hits, you do not have a rule for it, you get caught flat-footed. But combined with machine learning, rules catch obvious fraud while ML catches subtle patterns.
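The example rules above translate directly into code. A minimal sketch; the field names and the $5,000 threshold are illustrative assumptions:

```python
def evaluate_rules(txn):
    """Encode the example rules from the text; returns (decision, reason)."""
    if txn["device_blacklisted"]:
        return "decline", "device previously blacklisted for fraud"
    if txn["amount"] > 5000 and txn["account_age_hours"] < 24:
        return "decline", "large transaction on a brand-new account"
    if txn["blocked_country"] and txn["high_risk_merchant"]:
        return "review", "blocked country at a high-risk merchant"
    return "approve", "no rule matched"

txn = {
    "amount": 9000,
    "account_age_hours": 2,
    "device_blacklisted": False,
    "blocked_country": False,
    "high_risk_merchant": False,
}
print(evaluate_rules(txn))  # declined: large transaction on a brand-new account
```

Note that each rule returns a reason string: this is what makes rules easy to explain to customers and auditors, and it is exactly what pure ML scores lack.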

Machine Learning Models: Pattern Detection at Scale

ML models learn from historical transaction data which combinations of features indicate fraud. A transaction with a velocity spike, from a new device, in a new geography, from a country with high fraud risk, to a merchant category known for fraud, gets a high fraud score. The model learns the weights without explicit programming. The pattern is emergent from the data.

ML models can adapt as patterns change. Historical rules captured fraud patterns from last year. ML models can be retrained on current data and detect current patterns without requiring explicit rule updates. But ML models are expensive: they require training data, they require retraining as patterns shift, and they require monitoring for model decay.

Network Intelligence: What Happened Everywhere

Network-level signals answer the question: has this card, device, or customer been associated with fraud elsewhere? If a card has been marked as fraud 100 times across 50 merchants, the next merchant trying to process a transaction from that card should know. Network intelligence provides that knowledge.

Network signals are powerful because fraud is coordinated. The same card is carded at multiple merchants. The same device is used for multiple fraud attacks. The same synthetic identity is replicated across multiple merchants. Seeing the coordination reveals the fraud.

Ensemble Voting: Combine and Decide

The ensemble combines signals. A transaction is scored by the rule engine (high/medium/low risk), the ML model (fraud probability), and network intelligence (has this been seen before). The ensemble votes. If all three agree (high risk), decline the transaction. If there is disagreement (rules say low risk, model says high risk, network says unknown), escalate to human review or apply a threshold decision (if two out of three say high risk, decline).

The ensemble is more robust than any single signal. If the rules miss something, the model catches it. If the model is overfit to old patterns, the network signal catches the attack. If the network is lagging, the model provides real-time detection.


Adaptive Models: Learning From New Fraud Patterns

The fundamental challenge in fraud detection is that the problem changes over time. New fraud tactics emerge. Attackers adapt. Old patterns stop working. Models trained on 2025 fraud may not recognise 2026 fraud.

Online Learning and Streaming Updates

Traditional models are retrained in batch: collect all data from the past week, retrain the model, deploy the new model. Online learning updates the model in real time as new data arrives. Each new transaction is used to update the model immediately. The model learns new fraud patterns without waiting for a weekly retraining cycle.

The catch: online learning models can be manipulated by adversarial input. A fraudster could intentionally submit transactions that trick the online learning algorithm into learning harmful patterns. Safeguards are required: learning rates are low (do not overfit to new data too quickly), updates are bounded (individual transactions do not swing the model too much), and monitoring tracks whether model performance is degrading due to adversarial input.
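The two safeguards named above (a low learning rate and bounded per-update steps) can be shown on a toy online logistic scorer. This is a simplified sketch, not a production algorithm:

```python
import math

class BoundedOnlineScorer:
    """Online logistic scorer with a low learning rate and per-update
    clipping, so no single (possibly adversarial) transaction can swing
    the model far."""

    def __init__(self, n_features, lr=0.01, max_step=0.05):
        self.w = [0.0] * n_features
        self.lr = lr
        self.max_step = max_step

    def predict(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label):
        err = label - self.predict(x)  # log-loss gradient signal
        for i, xi in enumerate(x):
            step = self.lr * err * xi
            # Bounded update: clip how far one transaction can move a weight.
            step = max(-self.max_step, min(self.max_step, step))
            self.w[i] += step

model = BoundedOnlineScorer(n_features=2)
for _ in range(500):
    model.update([1.0, 0.0], label=1)  # a recurring fraud pattern
    model.update([0.0, 1.0], label=0)  # a recurring legitimate pattern
print(model.predict([1.0, 0.0]) > 0.7)  # True: learned the fraud pattern
```

The clipping means an attacker would need many transactions, not one, to drag a weight anywhere, and the monitoring layer gets time to notice the drift.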

Feedback Loops and Ground Truth

Adaptive models require feedback. The system makes a decision (approve or decline). The outcome is observed (was it fraud?). The model uses the outcome to update. But outcomes are delayed and incomplete. You decline a transaction today because it looks like fraud. You do not know for 30 days whether it actually was fraud (if the cardholder disputes it). You approve a transaction and the cardholder might not notice the fraudulent charge for weeks.

The feedback loop is fundamental to model learning but imperfect. Some fraud goes undetected. Some legitimate transactions are marked as fraud incorrectly. Models must learn from noisy, delayed feedback. Weighting recent feedback more heavily than old feedback helps, and feedback from reliable sources (such as an external fraud report) should be weighted higher than feedback that is sparse or delayed.

Active Learning: Query the Hardest Cases

A model that is 99 percent confident about a transaction does not need human review. A model that is 51 percent confident does. Active learning prioritises cases where the model is uncertain, routes them to human analysts for review, and uses the analyst feedback to improve the model.

Active learning is efficient. Instead of reviewing every transaction, focus human effort on edge cases where the model is uncertain. The humans provide high-quality labels for the difficult cases. The model learns from these high-confidence labels and improves faster than if it learned from easy cases it was already confident about.
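The selection step is simple: route the scores closest to 0.5 to analysts. A sketch, where the uncertainty band and review budget are illustrative parameters:

```python
def review_queue(scored, band=(0.35, 0.65), budget=3):
    """Pick the model's least confident calls (scores nearest 0.5)
    for human review, up to a fixed analyst budget."""
    lo, hi = band
    uncertain = [(txn_id, p) for txn_id, p in scored if lo <= p <= hi]
    uncertain.sort(key=lambda item: abs(item[1] - 0.5))
    return [txn_id for txn_id, _ in uncertain[:budget]]

scored = [("t1", 0.99), ("t2", 0.51), ("t3", 0.40), ("t4", 0.02), ("t5", 0.62)]
print(review_queue(scored))  # ['t2', 't3', 't5']: most uncertain first
```

The confident calls (`t1`, `t4`) never reach a human; analyst time is spent only where a label actually teaches the model something.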


The Precision-Recall Trade-Off and Business Value

Fraud detection is not about maximizing accuracy. It is about optimizing the right business metric. Understanding precision, recall, and the precision-recall trade-off is understanding how to build a system that actually creates value.

The Precision-Recall Trade-Off in Fraud Detection
[Figure: fraud score threshold plotted against precision, recall, and F1. A permissive threshold yields high recall (catches fraud) but low precision (many false positives), losing revenue to false declines. A strict threshold yields high precision (few false positives) but low recall (misses fraud), losing revenue to fraud losses. The business optimum lies between the extremes, near the F1 balance point.]

Precision: Of the Fraud I Caught, How Much Was Actually Fraud?

Precision answers the question: when the model predicts fraud, how often is it right? If the model flags 1,000 transactions as fraud and 950 actually are fraud, precision is 95 percent. Precision is the cost of false positives. False positives are legitimate transactions flagged as fraud and declined, costing the merchant revenue and the customer frustration.

Recall: Of All the Fraud Out There, How Much Did I Catch?

Recall answers: when fraud happens, what percent does the model catch? If 1,000 fraudulent transactions occur and the model catches 850, recall is 85 percent. Recall is the cost of false negatives. False negatives are fraudulent transactions that slip through and cost the merchant (or consumer) direct fraud loss.

The Trade-Off: Precision vs. Recall

Precision and recall are inversely related. To improve precision (have fewer false positives), you can set a higher fraud threshold. Only flag transactions as fraud if the model is very confident. This reduces false positives, but increases false negatives. Fraudulent transactions with moderate fraud scores slip through.

To improve recall (catch more fraud), lower the threshold. Flag more transactions as fraud to catch fraud that is easy to miss. But this increases false positives. Legitimate transactions that look slightly suspicious get declined.
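The trade-off is easy to see by scoring the same transactions at two thresholds. A minimal sketch with made-up scores and labels:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall at a decision threshold.
    labels: 1 = fraud, 0 = legitimate."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1,    1,    0,    1,    0,    0]

print(precision_recall(scores, labels, threshold=0.7))
# strict: perfect precision, but the 0.40-scored fraud slips through
print(precision_recall(scores, labels, threshold=0.35))
# permissive: all fraud caught, but one legitimate order declined
```

Moving the threshold from 0.7 to 0.35 trades a false negative for a false positive; no threshold eliminates both.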

The Business Optimum Is Not Maximum of Either

The business optimum is not maximum precision or maximum recall. It is the threshold that maximizes business value. The value of catching 1 percent more fraud must outweigh the cost of declining 1 percent more legitimate transactions.

For a luxury retailer with high-value customers, false declines are devastating (customer loss, brand damage). The optimum is high precision and lower recall. For a commodity retailer with thin margins and high chargeback costs, fraud loss is devastating. The optimum is high recall, accepting some false declines.

The business reality: small improvements are worth real money at scale. A merchant processing $1 billion in volume annually at a 3 percent fraud loss rate loses $30 million a year. A 1 percent relative improvement trims that to $29.7 million, a $300,000 annual saving; a single percentage-point drop in the fraud rate is worth $10 million; and at card-network volumes the same gains are worth tens of millions. This is why specialized fraud detection vendors command premium pricing. The ROI is enormous.
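Picking the threshold is therefore a cost calculation, not an accuracy calculation. A sketch of the expected cost of a threshold choice; all dollar figures and the 10 percent margin are illustrative assumptions:

```python
def threshold_cost(fn_count, fp_count, avg_fraud_loss, avg_order_value,
                   margin=0.10):
    """Business cost of a threshold: fraud allowed through (false negatives)
    plus margin lost on falsely declined good orders (false positives)."""
    fraud_cost = fn_count * avg_fraud_loss
    false_decline_cost = fp_count * avg_order_value * margin
    return fraud_cost + false_decline_cost

# Strict threshold: little fraud escapes, but many good orders are declined.
strict = threshold_cost(fn_count=10, fp_count=500,
                        avg_fraud_loss=200, avg_order_value=100)
# Permissive threshold: more fraud, far fewer false declines.
permissive = threshold_cost(fn_count=80, fp_count=50,
                            avg_fraud_loss=200, avg_order_value=100)
print(strict, permissive)  # under these assumed costs, strict wins
```

Change the cost parameters (say, a luxury retailer where a false decline also costs a customer relationship) and the comparison flips, which is exactly the point: the optimum depends on the business, not the model.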


Credit Risk Modelling: From Logistic Regression to Gradient Boosted Trees

Credit risk is not real-time decision-making like fraud detection. A credit application might be scored once, underwritten, and approved or declined over hours or days. The ML problem is different: predict the probability of default over the loan lifetime.

From Logistic Regression to Nonlinear Models

Traditional credit scoring used logistic regression: a model that is linear in the log-odds and outputs a probability between 0 and 1. Each feature is assumed to shift risk by a constant amount: each additional dollar of income moves the predicted log-odds of default by the same amount, whether the applicant earns $20,000 or $200,000. This assumption is often wrong. The relationship between a feature and default is often nonlinear: additional income matters a lot for customers at the bottom of the income distribution, less for customers already well-off.

Gradient boosted trees capture nonlinear relationships without explicit programming. The model learns that income matters more for low-income applicants and less for high-income applicants. The relationship is learned from data, not assumed a priori.
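The contrast can be made concrete with a toy comparison. The tree below is hand-built for illustration (a real gradient boosted model learns hundreds of such splits from data), and all coefficients and split points are made-up assumptions:

```python
import math

def logistic_default_prob(income_k, w=-0.02, b=0.5):
    """Logistic regression: every extra $1k of income shifts the log-odds
    by the same amount w, regardless of starting income."""
    return 1.0 / (1.0 + math.exp(-(w * income_k + b)))

def tree_default_prob(income_k):
    """Hand-built stand-in for what a boosted tree can learn: income
    matters a lot at the bottom of the distribution, little at the top."""
    if income_k < 30:
        return 0.40
    if income_k < 60:
        return 0.15
    return 0.08

# An extra $30k moves a low earner's risk a lot, a high earner's barely at all.
print(round(tree_default_prob(25) - tree_default_prob(55), 2))   # 0.25
print(round(tree_default_prob(70) - tree_default_prob(100), 2))  # 0.0
```

The logistic model cannot express this shape without hand-engineered features (income buckets, interaction terms); the tree learns it directly.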

Feature Importance and Regulatory Compliance

A credit decision is a regulated decision. Regulators require that credit decisions be explainable and non-discriminatory. A model that denies someone credit must be able to explain why, and the reasons must not be discriminatory (directly or indirectly).

Gradient boosted trees provide feature importance: which features mattered most for this decision. But even when a model is explainable, it must not discriminate. Features such as race cannot be used. Features like zip code can be discriminatory if they are proxies for race. A well-built credit risk model filters out obviously discriminatory features and monitors for proxy discrimination.

Alternative Data and Underserved Populations

Traditional credit scoring relies on credit history. But millions of people have no credit history (recently immigrated, young, never borrowed). Credit models exclude them. Alternative data (utility payment history, rent payments, employment history) can expand credit to underserved populations. ML models can combine alternative data with traditional data to make more informed decisions.

The result: previously unbanked populations get access to credit. Credit companies expand their addressable market. Accuracy might be slightly lower (alternative data is noisier than traditional data) but coverage is much higher.


Anti-Money Laundering: Rules vs. Machine Learning

Anti-money laundering (AML) is regulatory compliance at scale. Financial institutions must detect suspicious transactions (potential money laundering, sanctions violations, terrorist financing) and file reports with regulators.

Rule-Based AML: High False Positive Rate

Traditional AML systems are rules-based. If a transaction exceeds $10,000 (federal reporting threshold in the US), file a report. If the customer is from a sanctioned country, file a report. If a customer's transaction volume spikes 10x beyond normal, file a report. These rules are explicit, auditable, and compliant. They are also noisy. Financial institutions file millions of reports annually, most of which are false positives (legitimate transactions flagged incorrectly). The signal-to-noise ratio can be 95 percent false positives, only 5 percent actually suspicious.

ML-Based AML: Learning Suspicious Patterns

ML models learn suspicious patterns without explicit programming. A transaction is suspicious not because it exceeds a threshold, but because it deviates from the customer's behaviour. A customer who normally spends $1,000 per month suddenly spends $50,000 and sends it to a sanctioned country. The pattern is suspicious even though the transaction might look normal in isolation.
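The "deviates from the customer's own behaviour" idea can be sketched as a simple z-score against the customer's own history; real AML models use far richer features, and the figures below are illustrative:

```python
import statistics

def spend_anomaly_score(monthly_history, current_month):
    """Z-score of this month's spend against the customer's own baseline.
    'Unusual for this customer' replaces a fixed dollar threshold."""
    mean = statistics.mean(monthly_history)
    stdev = statistics.stdev(monthly_history)
    return (current_month - mean) / stdev

history = [950, 1100, 1020, 980, 1050]  # roughly $1,000 per month

print(spend_anomaly_score(history, 50_000) > 10)    # True: wildly out of pattern
print(abs(spend_anomaly_score(history, 1000)) < 2)  # True: a normal month
```

A fixed $10,000 rule would miss a $9,000 transfer from this customer entirely; the baseline comparison flags it immediately, and leaves a $9,000 month from a customer who routinely spends that much alone.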

ML-based AML reduces false positives, allowing compliance teams to focus on genuine risk. But ML introduces new challenges: models must be explainable (regulators want to understand why something was flagged), models must not discriminate (flagging certain nationalities or religions is regulatory violation), and models must be monitored for decay (patterns of money laundering evolve).


Why Fraud Companies Command Premium Valuations

Specialized fraud and risk companies like Sardine, Sift, and Unit21 command high valuations because the ROI is enormous. A payment processor that cuts fraud losses by one percentage point of a $100 billion annual volume saves $1 billion. The customer will pay millions annually for that. The LTV of a fraud customer is measured in tens of millions. Acquisition cost is small relative to LTV. The business model works.

The moat is data and network effects. More transactions mean better models. Better models attract more customers. More customers mean more data. The leader in fraud detection (the company with the most transactions and most data) has an insurmountable advantage over competitors with less data. This is why Sift can charge premium prices. The trillion-event network cannot be replicated by a competitor with billions of events.

If your fraud detection improves by 1 percent tomorrow, how much revenue would that unlock? What would you be willing to pay for that 1 percent improvement?

Key Takeaways

Fraud Score
Numerical probability (0-1) that a transaction or customer is fraudulent. Produced by ML model or rules engine. Higher score indicates higher fraud likelihood.
Precision
Of the transactions flagged as fraud, what percentage actually were fraud? Precision = True Positives / (True Positives + False Positives).
Recall
Of all fraudulent transactions, what percentage did the system catch? Recall = True Positives / (True Positives + False Negatives). Also called sensitivity or detection rate.
F1 Score
Harmonic mean of precision and recall. Balances false positives and false negatives. Single metric for evaluating model performance on imbalanced problems like fraud.
Ensemble Method
System combining multiple fraud signals: rule-based detection, ML models, network intelligence. Votes on final decision. More robust than any single signal.
Behavioural Biometrics
Fraud signals derived from user interaction patterns: mouse movements, typing rhythm, touch patterns, session analysis. Distinguishes humans from bots and legitimate users from fraudsters.
Network Effect
Moat created when platform value increases with more users. Fraud networks see more data with more merchants, enabling better detection, attracting more merchants.
Device Fingerprinting
Unique identifier for the device making a transaction. Captures device ID, OS, browser, IP, screen resolution. Typically differs between the legitimate customer's device and a fraudster's device.
Velocity Signal
Fraud indicator based on transaction frequency or volume over time. High velocity (many transactions in short window) indicates possible carding or account takeover attack.
False Positive
Legitimate transaction flagged as fraud and declined. Costs merchant revenue and customer churn. False positive rate directly impacts customer experience and business metrics.
False Negative
Fraudulent transaction allowed through and not detected. Costs merchant fraud loss. False negative rate directly impacts fraud losses.
Credit Risk Model
Predictive model that estimates probability a borrower will default on loan. Used for underwriting and pricing decisions. Gradient boosted trees improved accuracy over traditional logistic regression.
Next Module
AI in Compliance and RegTech
Transaction monitoring automation. SAR generation. Sanctions screening. KYC automation. Regulatory requirements and the RegTech category.