Fraud Detection System

An AI-powered system that catches 84% of fraud while keeping false alarms under 0.05%, scoring each transaction in <50ms

Annual Savings

$2.7M

Fraud Caught

83.8%

Alert Precision

75.2%

What I Built

A fraud detection system built to compare feature performance across multiple algorithms and optimize for the highest fraud detection rate.

Credit card fraud costs businesses an estimated $32 billion annually, yet in this dataset only 0.17% of transactions are fraudulent. That scarcity is the core challenge.

This extreme imbalance makes many traditional approaches ineffective. Accuracy becomes meaningless (a model that never flags fraud is already 99.8% accurate), and most systems end up either missing fraud or drowning analysts in false alarms.
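To make the imbalance concrete, the sketch below computes the fraud rate, the accuracy of a model that never flags fraud, and the legitimate-to-fraud ratio that is a common starting value for XGBoost's scale_pos_weight parameter. The counts (492 frauds among 284,807 transactions) are assumed from the public Kaggle/ULB credit card dataset, which matches the figures on this page; whether this project actually set scale_pos_weight is not stated.

```python
# Class-imbalance arithmetic. The counts used below are an assumption
# (public Kaggle/ULB credit card dataset: 284,807 transactions, 492 frauds).


def imbalance_stats(n_total: int, n_fraud: int) -> dict:
    """Fraud rate, accuracy of an 'always legitimate' model, and the
    negative/positive ratio often used as XGBoost's scale_pos_weight."""
    n_legit = n_total - n_fraud
    return {
        "fraud_rate": n_fraud / n_total,
        # A "model" that predicts legitimate for every transaction
        "all_legit_accuracy": n_legit / n_total,
        # Common heuristic starting value for scale_pos_weight
        "scale_pos_weight": n_legit / n_fraud,
    }


if __name__ == "__main__":
    stats = imbalance_stats(n_total=284_807, n_fraud=492)
    print(f"fraud rate:         {stats['fraud_rate']:.4%}")
    print(f"all-legit accuracy: {stats['all_legit_accuracy']:.4%}")
    print(f"scale_pos_weight:   {stats['scale_pos_weight']:.1f}")
```

Run as a script, it reports a fraud rate of about 0.17% and an all-legit accuracy of about 99.8%, which is why recall and precision are the metrics that matter here.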

Selecting the right approach

Instead of jumping straight to testing algorithms, I started by asking: 'What makes a transaction suspicious?' This human-centered question shaped everything that followed.

Data Analysis

Analyzed 284K transactions to uncover risk patterns

Feature engineering

Created 21 custom features combining domain knowledge with statistical methods

Algorithm testing

Compared three algorithms and selected XGBoost

Business results

Calculated $2.7M in annual value and performed segment analysis to translate model performance into business impact

Data Analysis

Transactions

Analyzed 284K transactions over 2 days

Outliers

Found that transactions flagged as outliers by Isolation Forest had a 217× higher fraud concentration than the dataset overall

High risk

Identified that night transactions carry 3× higher fraud risk
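Both the outlier and the night-time findings are lifts: the fraud rate inside a segment divided by the overall fraud rate. A minimal sketch with NumPy, assuming a boolean fraud label, an outlier mask (for example, predictions of -1 from scikit-learn's IsolationForest), and a Time column of seconds since the first transaction. Deriving hour of day as (Time // 3600) % 24 and treating hours 0 to 5 as night are hypothetical choices; the page does not define them, nor whether "3× higher risk" is measured against the overall rate or against daytime transactions.

```python
import numpy as np


def fraud_lift(is_fraud, in_segment) -> float:
    """Fraud rate inside a segment divided by the overall fraud rate."""
    is_fraud = np.asarray(is_fraud, dtype=bool)
    in_segment = np.asarray(in_segment, dtype=bool)
    if not is_fraud.any():
        raise ValueError("no fraud cases: lift is undefined")
    if not in_segment.any():
        raise ValueError("empty segment: lift is undefined")
    overall_rate = is_fraud.mean()
    segment_rate = is_fraud[in_segment].mean()
    return float(segment_rate / overall_rate)


def night_mask(time_seconds, night_hours=range(0, 6)):
    """Boolean mask of transactions whose derived hour falls in night_hours.

    Hypothetical convention: time_seconds counts from a starting point
    treated as midnight, so hour = (seconds // 3600) % 24.
    """
    hour = (np.asarray(time_seconds) // 3600) % 24
    return np.isin(hour, list(night_hours))
```

Under these assumptions, fraud_lift(y, night_mask(time_s)) gives the night-time lift, and fraud_lift(y, outlier_mask) gives the outlier lift.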

Feature engineering

Created 21 custom features in 3 tiers. The top engineered feature, pca_magnitude, became the most important feature in the model (34.5% of feature importance).

Statistical

pca_magnitude
log_amount, amount_zscore
hour_sin, hour_cos, is_night
Isolation Forest outlier scores

Domain-Specific

amount_percentile
is_round_amount
V14_amount_interaction

Advanced

distance_to_fraud
feature_entropy
dominant_feature_value
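Several of the listed features have natural definitions. The sketch below shows hypothetical NumPy versions of the statistical and domain-specific tiers, assuming an (n, 28) array of PCA components V1 to V28, an Amount array, and a Time array in seconds. The exact formulas (pca_magnitude as the Euclidean norm of V1 to V28, is_night as hours 0 to 5, is_round_amount as whole-dollar amounts, V14_amount_interaction as V14 times log_amount) are assumptions, and the advanced tier is left out because its definitions are not given here.

```python
import numpy as np


def engineer_features(V, amount, time_s) -> dict:
    """Hypothetical versions of some listed features.

    V      : (n, 28) array of PCA components V1..V28
    amount : (n,) transaction amounts
    time_s : (n,) seconds since the first transaction
    """
    V = np.asarray(V, dtype=float)
    amount = np.asarray(amount, dtype=float)
    hour = (np.asarray(time_s) // 3600) % 24

    log_amount = np.log1p(amount)  # compresses the heavy right tail of amounts
    std = amount.std()
    zscore = (amount - amount.mean()) / std if std > 0 else np.zeros_like(amount)
    # Fraction of transactions with an amount <= this one (ties handled)
    sorted_amounts = np.sort(amount)
    percentile = np.searchsorted(sorted_amounts, amount, side="right") / len(amount)

    return {
        # Statistical tier
        "pca_magnitude": np.linalg.norm(V, axis=1),  # Euclidean norm of V1..V28
        "log_amount": log_amount,
        "amount_zscore": zscore,
        # Cyclical encoding so that hour 23 sits next to hour 0
        "hour_sin": np.sin(2 * np.pi * hour / 24),
        "hour_cos": np.cos(2 * np.pi * hour / 24),
        "is_night": (hour < 6).astype(int),
        # Domain-specific tier
        "amount_percentile": percentile,
        "is_round_amount": np.isclose(amount % 1, 0).astype(int),  # whole dollars
        "V14_amount_interaction": V[:, 13] * log_amount,  # V14 is column index 13
    }
```

In a real pipeline, the mean, standard deviation, and percentile reference would be computed on the training split only and reused at scoring time.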

Algorithm testing

Compared 3 algorithms and selected XGBoost, which achieved the highest recall (83.8%) despite the extreme class imbalance

Algorithm | Recall | Precision | ROC-AUC | Status
Logistic Regression | 79.4% | 63.2% | 0.951 | Lower recall
Random Forest | 81.7% | 71.8% | 0.963 | Slower inference
XGBoost | 83.8% | 75.2% | 0.968 | Best balance
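A comparison like the table above can be run with a small harness that fits each model on a stratified split and reports recall, precision, and ROC-AUC. This is a sketch using scikit-learn on synthetic stand-in data, not the project's pipeline; the 0.5 threshold, 70/30 split, and class_weight settings are assumptions. Because xgboost.XGBClassifier follows the scikit-learn estimator API, it can be added to the models dictionary when xgboost is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split


def compare_models(models: dict, X, y, threshold: float = 0.5, seed: int = 42) -> dict:
    """Fit each model on a stratified 70/30 split and report test metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        proba = model.predict_proba(X_te)[:, 1]  # fraud probability
        pred = (proba >= threshold).astype(int)
        results[name] = {
            "recall": recall_score(y_te, pred),
            "precision": precision_score(y_te, pred, zero_division=0),
            "roc_auc": roc_auc_score(y_te, proba),  # threshold-independent
        }
    return results


if __name__ == "__main__":
    # Synthetic, heavily imbalanced stand-in data (NOT the project's dataset)
    X, y = make_classification(
        n_samples=20_000, n_features=20, weights=[0.99], random_state=0
    )
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
        "random_forest": RandomForestClassifier(
            n_estimators=200, class_weight="balanced", n_jobs=-1, random_state=0
        ),
        # With xgboost installed, e.g.:
        # "xgboost": xgboost.XGBClassifier(scale_pos_weight=99, eval_metric="aucpr"),
    }
    for name, metrics in compare_models(models, X, y).items():
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```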

Business results

Real-time performance dashboard

Cost-Benefit Breakdown

How does catching fraud affect revenue, and how much can we save?

Without a System

All 492 frauds succeed: $3.3M lost per year

With a System

Fraud Prevented: 413 frauds → $2.77M saved

Missed: 79 frauds → $535K loss
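These figures are consistent with valuing each fraud at roughly the same average loss, about $6.7K ($3.3M / 492). Under that assumption the breakdown is simple arithmetic; the sketch below also accepts an optional per-alert review cost for false positives, which is a hypothetical parameter the page does not quantify.

```python
def cost_benefit(
    n_fraud: int,
    n_caught: int,
    avg_fraud_loss: float,
    n_false_alarms: int = 0,
    review_cost: float = 0.0,  # hypothetical cost of reviewing one false alarm
) -> dict:
    """Annual cost-benefit breakdown, assuming a flat average loss per fraud."""
    n_missed = n_fraud - n_caught
    saved = n_caught * avg_fraud_loss
    false_alarm_cost = n_false_alarms * review_cost
    return {
        "loss_without_system": n_fraud * avg_fraud_loss,
        "saved": saved,
        "missed_loss": n_missed * avg_fraud_loss,
        "false_alarm_cost": false_alarm_cost,
        "net_value": saved - false_alarm_cost,
    }
```

With the page's counts, cost_benefit(492, 413, 3_300_000 / 492) returns roughly $2.77M saved.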

Technical Performance

Comprehensive performance metrics and technical achievements of the fraud detection system.

Recall

Catches 413 out of 492 frauds

83.8%

Precision

3 out of 4 alerts are real fraud

75.2%

ROC-AUC

Strong discrimination between fraud and legitimate transactions

0.968

False Alarm Rate

Only 41 false positives per 85K transactions

0.048%

Latency

Real-time capable

<50ms
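The cards above reduce to ratios of confusion-matrix counts plus a timing measurement. A minimal sketch, assuming the false alarm rate is false positives over all legitimate transactions (with 85K transactions and so few frauds, dividing by all transactions gives nearly the same 0.048%), and assuming latency is reported as a high percentile of per-call timings. latency_percentile_ms and its parameters are hypothetical helpers, not part of any library.

```python
import time


def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Recall, precision, and false alarm rate from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),            # share of frauds caught
        "precision": tp / (tp + fp),         # share of alerts that are real fraud
        "false_alarm_rate": fp / (fp + tn),  # share of legitimate transactions flagged
    }


def latency_percentile_ms(predict_fn, sample, n_calls: int = 1000, percentile: float = 99) -> float:
    """Time predict_fn(sample) n_calls times and return the sorted timing
    closest to the requested percentile position, in milliseconds."""
    timings = []
    for _ in range(n_calls):
        start = time.perf_counter()
        predict_fn(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    idx = min(len(timings) - 1, round(percentile / 100 * (len(timings) - 1)))
    return timings[idx]
```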

Segment Analysis (Honest Assessment)

Balancing recall (catching fraud) against precision (minimizing false alarms) has no right answer without business context. I solved this by calculating the cost-benefit tradeoff at different decision thresholds.
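Business-driven threshold selection can be sketched as a sweep: for each candidate threshold, count missed frauds and false alarms, price them, and keep the cheapest threshold. The per-fraud loss and per-alert review cost are hypothetical inputs; the page does not state the values used.

```python
import numpy as np


def best_threshold(y_true, scores, fraud_loss: float, review_cost: float, thresholds=None):
    """Return (threshold, total_cost) minimizing missed-fraud plus review costs."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    best_t, best_cost = None, np.inf
    for t in thresholds:
        flagged = scores >= t
        fn = np.sum(y_true & ~flagged)   # frauds that slip through
        fp = np.sum(~y_true & flagged)   # legitimate transactions sent for review
        cost = fn * fraud_loss + fp * review_cost
        if cost < best_cost:
            best_t, best_cost = float(t), float(cost)
    return best_t, best_cost
```

Raising review_cost pushes the chosen threshold up (fewer alerts, lower recall); raising fraud_loss pushes it down (more alerts, higher recall).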

Strengths

High-value fraud (>$500): 94% recall

Medium transactions ($100-$500): 89% recall

Night transactions: 91% recall

Isolation Forest for feature creation: outlier scores had 217× fraud concentration

Weaknesses

Micro-transactions (<$10): 78% recall

Very small frauds are likely card-testing patterns
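Segment recall is recall computed only over the frauds that fall in each segment. A sketch for amount buckets, assuming edges at $10, $100, and $500 with lower bounds inclusive; the page lists <$10, $100-$500, and >$500 segments but does not say how boundaries or the $10-$100 range were handled.

```python
import numpy as np

# Hypothetical bucket edges: <$10, $10-$100, $100-$500, >$500
AMOUNT_BINS = [0, 10, 100, 500, np.inf]


def recall_by_amount(amount, y_true, y_pred, bins=AMOUNT_BINS) -> dict:
    """Recall per amount bucket; NaN where a bucket contains no frauds."""
    amount = np.asarray(amount, dtype=float)
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    out = {}
    for lo, hi in zip(bins[:-1], bins[1:]):
        frauds = y_true & (amount >= lo) & (amount < hi)
        n = int(frauds.sum())
        label = f">${lo:g}" if np.isinf(hi) else f"${lo:g}-${hi:g}"
        out[label] = float((y_pred & frauds).sum() / n) if n else float("nan")
    return out
```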

What Worked Well

Feature engineering over algorithm choice

Business-driven threshold optimization

Segment analysis

Isolation Forest for feature creation
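Using Isolation Forest for feature creation means feeding its anomaly score (and optionally its outlier flag) into the supervised model as extra columns. A minimal sketch with scikit-learn's IsolationForest, assuming it is fit on the training split only so test data never shapes the features; contamination=0.01 and n_estimators=200 are hypothetical settings, and the 217× concentration reported above is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def add_isolation_forest_features(X_train, X_other, contamination=0.01, seed=0):
    """Append [anomaly_score, is_outlier] columns to both feature matrices."""
    iso = IsolationForest(
        n_estimators=200, contamination=contamination, random_state=seed
    )
    iso.fit(X_train)  # fit on training data only

    def with_features(X):
        X = np.asarray(X, dtype=float)
        # score_samples is lower for more abnormal points, so negate it
        anomaly_score = -iso.score_samples(X)
        # predict returns -1 for outliers and 1 for inliers
        is_outlier = (iso.predict(X) == -1).astype(int)
        return np.column_stack([X, anomaly_score, is_outlier])

    return with_features(X_train), with_features(X_other)
```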


Technologies Used

Technologies

Python 3.13
XGBoost 3.1.0
scikit-learn
imbalanced-learn
pandas
numpy
scipy
Load balancer
Docker
plotly
SQLite
joblib

Categories

Machine Learning
Data Processing
Deployment & Infrastructure
