Operational Analytics

Fraud Detection Playbooks for National Rail

January 5, 2025 · 6 min read · By Jaafar Benabderrazak

Breaking down the experimentation, models, and change management that lowered fare evasion for SNCF.


Fare evasion costs France's national railway €300M+ annually. For SNCF's TER division, reducing fraud while maintaining passenger experience required sophisticated ML systems, careful experimentation, and cross-functional alignment.

This article details how RailGuard Fraud Studio reduced fraud rates from 7.2% to 5.9% through predictive analytics and interpretable dashboards.

The Business Context

The Challenge

SNCF TER (regional trains) faces unique fraud challenges:

  • Open-access platforms: No ticket gates in most stations
  • High passenger volume: Millions of journeys monthly
  • Limited inspectors: Can't check every passenger
  • Customer experience: Must balance enforcement with service quality

The Opportunity

Existing fraud detection relied on:

  • Manual inspector intuition
  • Random checking patterns
  • Reactive rather than predictive approaches

Goal: Build ML systems to predict high-risk journeys and optimize inspector deployment.

Solution Architecture

Phase 1: Data Foundation

We consolidated data from multiple sources:

from pandas import DataFrame

class FraudDataPipeline:
    def __init__(self):
        self.sources = {
            'ticket_sales': TicketDatabase(),
            'inspection_logs': InspectionSystem(),
            'passenger_patterns': AnalyticsDB(),
            'train_schedules': ScheduleAPI()
        }

    def build_training_data(self, start_date: str, end_date: str) -> DataFrame:
        # Merge ticket sales with inspection outcomes
        tickets = self.sources['ticket_sales'].query(start_date, end_date)
        inspections = self.sources['inspection_logs'].query(start_date, end_date)

        # Join and create labels; keep only inspected journeys, since
        # uninspected rows have no label and would otherwise be
        # silently treated as non-fraud
        merged = tickets.merge(inspections, on='journey_id', how='left')
        merged = merged.dropna(subset=['valid_ticket'])
        merged['fraud'] = merged['valid_ticket'].eq(False)

        # Feature engineering
        features = self.engineer_features(merged)

        return features

Phase 2: Feature Engineering

We identified key fraud indicators:

Temporal Features:

  • Time of day (late evening = higher risk)
  • Day of week (weekends = different patterns)
  • Holiday periods

Journey Features:

  • Route popularity
  • Ticket purchase timing (last-minute = higher risk)
  • Purchase channel (station vs online)

Behavioral Features:

  • Historical pattern analysis
  • Passenger frequency indicators
  • Unusual travel patterns

import pandas as pd
from pandas import DataFrame

def engineer_features(df: DataFrame) -> DataFrame:
    features = df.copy()

    # Temporal
    features['hour'] = features['departure_time'].dt.hour
    features['is_weekend'] = features['departure_time'].dt.dayofweek >= 5
    # Compare calendar dates, not full timestamps, against the holiday list
    features['is_holiday'] = features['departure_time'].dt.normalize().isin(HOLIDAYS)

    # Journey
    features['route_popularity'] = features.groupby('route')['journey_id'].transform('count')
    features['minutes_before_departure'] = (
        features['departure_time'] - features['purchase_time']
    ).dt.total_seconds() / 60

    # Risk scoring
    features['purchase_channel_risk'] = features['channel'].map(CHANNEL_RISK_SCORES)

    return features

Phase 3: Model Development

We experimented with multiple approaches:

Random Forest (Baseline)

from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    class_weight='balanced',  # Handle imbalanced data
    random_state=42
)

rf_model.fit(X_train, y_train)

Results: AUC-ROC 0.83, good interpretability

XGBoost (Production Model)

import xgboost as xgb

xgb_model = xgb.XGBClassifier(
    n_estimators=200,
    max_depth=8,
    learning_rate=0.1,
    scale_pos_weight=20,  # Handle rare positives (fraud is ~7% of journeys)
    eval_metric='auc',
    early_stopping_rounds=10  # Constructor argument as of XGBoost 2.0
)

xgb_model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)]
)

Results: AUC-ROC 0.87, selected for production

LSTM for Time Series Patterns

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

lstm_model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(sequence_length, n_features)),
    Dropout(0.3),
    LSTM(32),
    Dropout(0.3),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])

lstm_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['AUC']
)

Results: Best for temporal patterns, used in ensemble
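The write-up says the LSTM was "used in ensemble" without giving the combination rule. A minimal sketch, assuming a simple weighted average of the two models' fraud probabilities — the 0.7/0.3 weights here are illustrative assumptions, not the production values:

```python
import numpy as np

def ensemble_fraud_scores(xgb_probs, lstm_probs, w_xgb=0.7, w_lstm=0.3):
    """Blend XGBoost and LSTM fraud probabilities with a weighted average.

    The weights are illustrative; in practice they would be tuned on a
    validation set (or replaced by a stacked meta-model).
    """
    xgb_probs = np.asarray(xgb_probs, dtype=float)
    lstm_probs = np.asarray(lstm_probs, dtype=float)
    return w_xgb * xgb_probs + w_lstm * lstm_probs

print(ensemble_fraud_scores([0.8, 0.2], [0.6, 0.4]))  # [0.74 0.26]
```

A weighted average keeps the ensemble as interpretable as its parts, which matters when inspectors and legal teams need to understand scores.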

Phase 4: Volume Forecasting

Beyond fraud detection, we built forecasting models to predict:

  • Expected passenger volume per route
  • Optimal inspector allocation

from sklearn.ensemble import GradientBoostingRegressor

volume_forecaster = GradientBoostingRegressor(
    n_estimators=150,
    learning_rate=0.05,
    max_depth=6
)

volume_forecaster.fit(X_train, y_train_volume)

Results: relative RMSE of 14.7% on volume forecasting
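An RMSE quoted as a percentage is best read as relative RMSE: the raw error divided by the mean actual volume. A sketch of that computation (the function name is ours, not part of the original pipeline):

```python
import numpy as np

def relative_rmse(y_true, y_pred):
    """RMSE expressed as a percentage of the mean actual value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / y_true.mean()

# A forecast off by 10 passengers on routes averaging 100 -> 10% relative RMSE
print(relative_rmse([100, 100], [90, 110]))  # 10.0
```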

Deployment & Integration

Inspector Dashboard

We built an interpretable dashboard showing:

from typing import List

class InspectorDashboard:
    def get_high_risk_journeys(self, date: str, inspector_zone: str) -> List[dict]:
        # Get scheduled trains
        trains = self.get_trains(date, inspector_zone)

        # Predict fraud probability for each
        predictions = []
        for train in trains:
            features = self.extract_features(train)
            fraud_prob = self.model.predict_proba(features)[0][1]
            expected_volume = self.volume_model.predict(features)[0]

            predictions.append({
                'train_id': train.id,
                'route': train.route,
                'departure': train.departure_time,
                'fraud_probability': fraud_prob,
                'expected_volume': expected_volume,
                'priority_score': self.calculate_priority(fraud_prob, expected_volume)
            })

        # Sort by priority
        return sorted(predictions, key=lambda x: x['priority_score'], reverse=True)
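`calculate_priority` is referenced above but never shown. One plausible sketch, under the assumption that priority approximates expected fraudulent passengers caught per check — fraud probability times normalized volume, with `max_volume` an assumed scaling constant:

```python
def calculate_priority(fraud_probability: float, expected_volume: float,
                       max_volume: float = 1000.0) -> float:
    """Hypothetical scoring rule: fraud probability weighted by how busy
    the train is expected to be, normalized to [0, 1], so busy high-risk
    trains bubble to the top of the inspector's list."""
    volume_norm = min(expected_volume / max_volume, 1.0)
    return fraud_probability * volume_norm

print(calculate_priority(0.4, 500))  # 0.2
```

The multiplicative form encodes a simple trade-off: a half-empty train with high fraud risk can rank below a packed train with moderate risk.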

Model Monitoring

import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score

class FraudModelMonitor:
    def track_performance(self, predictions: List, actuals: List):
        # Predictions arrive as probabilities; binarize at the operating threshold
        probs = np.asarray(predictions)
        preds = (probs > 0.5).astype(int)

        # Calculate metrics
        metrics = {
            'auc_roc': roc_auc_score(actuals, probs),
            'precision': precision_score(actuals, preds),
            'recall': recall_score(actuals, preds),
            'false_positive_rate': self.calculate_fpr(actuals, preds)
        }

        # Alert if degradation
        if metrics['auc_roc'] < 0.80:
            self.send_alert("Model performance degraded")

        # Log for tracking
        self.log_metrics(metrics)

Results & Impact

Quantitative Outcomes

  • Fraud rate reduced: 7.2% → 5.9% (18% relative reduction)
  • Model accuracy: AUC-ROC above 0.87 for risk scoring
  • Forecast accuracy: relative RMSE of 14.7% on volume predictions
  • Inspector efficiency: 32% more fraudulent tickets caught per inspector

Qualitative Outcomes

Inspector Feedback:

"The dashboard transformed how we work. Instead of guessing which trains to check, we have data-driven priorities." — SNCF Inspector Team Lead

Stakeholder Alignment:

  • Product owners understood model decisions through feature importance
  • Inspectors trusted the system due to explainability
  • Legal teams approved due to audit trails

Technical Stack

  • Python: Data processing and modeling
  • XGBoost: Primary fraud detection model
  • LSTM (TensorFlow): Temporal pattern analysis
  • Scikit-learn: Volume forecasting
  • PostgreSQL: Data warehouse
  • Streamlit: Inspector dashboard
  • MLflow: Experiment tracking

Key Lessons

1. Class Imbalance Matters

With fraud at ~7%, naive models predict "no fraud" for 93% accuracy:

# Solution: balanced sampling + calibrated thresholds
from imblearn.over_sampling import SMOTE

# Oversample the minority class in the training split only,
# never the validation or test data
smote = SMOTE(sampling_strategy=0.3)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# Custom threshold tuning
optimal_threshold = find_optimal_threshold(
    y_val, y_pred_proba,
    target_metric='f1'  # Balance precision/recall
)
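`find_optimal_threshold` is a custom helper, not a scikit-learn function. A minimal sketch of what such a helper might do: scan candidate cutoffs and keep the one that maximizes the target metric on held-out data (only F1 is implemented here):

```python
import numpy as np
from sklearn.metrics import f1_score

def find_optimal_threshold(y_true, y_proba, target_metric='f1'):
    """Grid-search the decision threshold that maximizes F1 on
    held-out labels and predicted probabilities."""
    thresholds = np.linspace(0.05, 0.95, 91)
    scores = [
        f1_score(y_true, (np.asarray(y_proba) >= t).astype(int))
        for t in thresholds
    ]
    return float(thresholds[int(np.argmax(scores))])
```

In a fare-evasion setting the metric would likely weight recall more heavily, since a missed fraudster costs more than one extra ticket check.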

2. Interpretability Builds Trust

Feature importance helped inspectors understand predictions:

import shap

explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)

# Show top features for specific prediction
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0])

3. Change Management is Critical

Technical excellence means nothing without adoption:

  • Training sessions with inspector teams
  • Pilot program with friendly zones first
  • Feedback loops for continuous improvement
  • Success stories shared across teams

Future Enhancements

We're exploring:

  • Real-time prediction: Moving from batch to stream processing
  • Multi-modal data: Incorporating CCTV and turnstile data
  • Anomaly detection: Identifying new fraud patterns automatically
  • Reinforcement learning: Optimizing inspector patrol routes dynamically

Conclusion

Fraud detection at national scale requires more than just accurate models. It demands:

  1. Thoughtful feature engineering grounded in domain expertise
  2. Interpretable models that stakeholders can trust
  3. Practical deployment that fits existing workflows
  4. Continuous monitoring and improvement

RailGuard Fraud Studio demonstrates that ML can deliver real business value when combined with rigorous experimentation and change management.


Code & Notebooks: GitHub Repository

Full Article: Medium

Let's Connect: LinkedIn

Enjoyed this article?

Check out more technical deep dives on AI systems, or connect with me to discuss your AI initiatives.
