Operational Analytics

Fraud Detection Playbooks for National Rail

January 5, 2025 · 6 min read · By Jaafar Benabderrazak

Breaking down the experimentation, models, and change management that lowered fare evasion for SNCF.


Fare evasion costs France's national railway €300M+ annually. For SNCF's TER division, reducing fraud while maintaining passenger experience required sophisticated ML systems, careful experimentation, and cross-functional alignment.

This article details how RailGuard Fraud Studio reduced fraud rates from 7.2% to 5.9% through predictive analytics and interpretable dashboards.

The Business Context

The Challenge

SNCF TER (regional trains) faces unique fraud challenges:

  • Open-access platforms: No ticket gates in most stations
  • High passenger volume: Millions of journeys monthly
  • Limited inspectors: Can't check every passenger
  • Customer experience: Must balance enforcement with service quality

The Opportunity

Existing fraud detection relied on:

  • Manual inspector intuition
  • Random checking patterns
  • Reactive rather than predictive approaches

Goal: Build ML systems to predict high-risk journeys and optimize inspector deployment.

Solution Architecture

Phase 1: Data Foundation

We consolidated data from multiple sources:

from pandas import DataFrame

class FraudDataPipeline:
    def __init__(self):
        self.sources = {
            'ticket_sales': TicketDatabase(),
            'inspection_logs': InspectionSystem(),
            'passenger_patterns': AnalyticsDB(),
            'train_schedules': ScheduleAPI()
        }

    def build_training_data(self, start_date: str, end_date: str) -> DataFrame:
        # Merge ticket sales with inspection outcomes
        tickets = self.sources['ticket_sales'].query(start_date, end_date)
        inspections = self.sources['inspection_logs'].query(start_date, end_date)

        # Join and create labels; keep only inspected journeys, since
        # uninspected rows have no label and would otherwise be
        # silently treated as non-fraud
        merged = tickets.merge(inspections, on='journey_id', how='left')
        merged = merged.dropna(subset=['valid_ticket'])
        merged['fraud'] = merged['valid_ticket'].eq(False)

        # Feature engineering
        features = self.engineer_features(merged)

        return features

Phase 2: Feature Engineering

We identified key fraud indicators:

Temporal Features:

  • Time of day (late evening = higher risk)
  • Day of week (weekends = different patterns)
  • Holiday periods

Journey Features:

  • Route popularity
  • Ticket purchase timing (last-minute = higher risk)
  • Purchase channel (station vs online)

Behavioral Features:

  • Historical pattern analysis
  • Passenger frequency indicators
  • Unusual travel patterns

import pandas as pd
from pandas import DataFrame

def engineer_features(df: DataFrame) -> DataFrame:
    features = df.copy()

    # Temporal
    features['hour'] = features['departure_time'].dt.hour
    features['is_weekend'] = features['departure_time'].dt.dayofweek >= 5
    # Compare calendar dates, not full timestamps, against the holiday list
    features['is_holiday'] = features['departure_time'].dt.normalize().isin(HOLIDAYS)

    # Journey
    features['route_popularity'] = features.groupby('route')['journey_id'].transform('count')
    features['minutes_before_departure'] = (
        features['departure_time'] - features['purchase_time']
    ).dt.total_seconds() / 60

    # Risk scoring
    features['purchase_channel_risk'] = features['channel'].map(CHANNEL_RISK_SCORES)

    return features

Phase 3: Model Development

We experimented with multiple approaches:

Random Forest (Baseline)

from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    class_weight='balanced',  # Handle imbalanced data
    random_state=42
)

rf_model.fit(X_train, y_train)

Results: AUC-ROC 0.83, good interpretability

XGBoost (Production Model)

import xgboost as xgb

xgb_model = xgb.XGBClassifier(
    n_estimators=200,
    max_depth=8,
    learning_rate=0.1,
    scale_pos_weight=20,  # Handle rare positives (fraud is ~7% of journeys)
    eval_metric='auc',
    early_stopping_rounds=10  # Constructor argument as of XGBoost 2.0
)

xgb_model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)]
)

Results: AUC-ROC 0.87, selected for production

LSTM for Time Series Patterns

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

lstm_model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(sequence_length, n_features)),
    Dropout(0.3),
    LSTM(32),
    Dropout(0.3),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])

lstm_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['AUC']
)

Results: Best for temporal patterns, used in ensemble
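The write-up says the LSTM was "used in ensemble" without giving the combination rule. A minimal sketch, assuming a simple weighted average of the two models' fraud probabilities — the 0.7/0.3 weights here are illustrative assumptions, not the production values:

```python
import numpy as np

def ensemble_fraud_scores(xgb_probs, lstm_probs, w_xgb=0.7, w_lstm=0.3):
    """Blend XGBoost and LSTM fraud probabilities with a weighted average.

    The weights are illustrative; in practice they would be tuned on a
    validation set (or replaced by a stacked meta-model).
    """
    xgb_probs = np.asarray(xgb_probs, dtype=float)
    lstm_probs = np.asarray(lstm_probs, dtype=float)
    return w_xgb * xgb_probs + w_lstm * lstm_probs

print(ensemble_fraud_scores([0.8, 0.2], [0.6, 0.4]))  # [0.74 0.26]
```

A weighted average keeps the ensemble as interpretable as its parts, which matters when inspectors and legal teams need to understand scores.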

Phase 4: Volume Forecasting

Beyond fraud detection, we built forecasting models to predict:

  • Expected passenger volume per route
  • Optimal inspector allocation

from sklearn.ensemble import GradientBoostingRegressor

volume_forecaster = GradientBoostingRegressor(
    n_estimators=150,
    learning_rate=0.05,
    max_depth=6
)

volume_forecaster.fit(X_train, y_train_volume)

Results: relative RMSE of 14.7% on volume forecasting
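An RMSE quoted as a percentage is best read as relative RMSE: the raw error divided by the mean actual volume. A sketch of that computation (the function name is ours, not part of the original pipeline):

```python
import numpy as np

def relative_rmse(y_true, y_pred):
    """RMSE expressed as a percentage of the mean actual value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / y_true.mean()

# A forecast off by 10 passengers on routes averaging 100 -> 10% relative RMSE
print(relative_rmse([100, 100], [90, 110]))  # 10.0
```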

Deployment & Integration

Inspector Dashboard

We built an interpretable dashboard showing:

from typing import List

class InspectorDashboard:
    def get_high_risk_journeys(self, date: str, inspector_zone: str) -> List[dict]:
        # Get scheduled trains
        trains = self.get_trains(date, inspector_zone)

        # Predict fraud probability for each
        predictions = []
        for train in trains:
            features = self.extract_features(train)
            fraud_prob = self.model.predict_proba(features)[0][1]
            expected_volume = self.volume_model.predict(features)[0]

            predictions.append({
                'train_id': train.id,
                'route': train.route,
                'departure': train.departure_time,
                'fraud_probability': fraud_prob,
                'expected_volume': expected_volume,
                'priority_score': self.calculate_priority(fraud_prob, expected_volume)
            })

        # Sort by priority
        return sorted(predictions, key=lambda x: x['priority_score'], reverse=True)
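`calculate_priority` is referenced above but never shown. One plausible sketch, under the assumption that priority approximates expected fraudulent passengers caught per check — fraud probability times normalized volume, with `max_volume` an assumed scaling constant:

```python
def calculate_priority(fraud_probability: float, expected_volume: float,
                       max_volume: float = 1000.0) -> float:
    """Hypothetical scoring rule: fraud probability weighted by how busy
    the train is expected to be, normalized to [0, 1], so busy high-risk
    trains bubble to the top of the inspector's list."""
    volume_norm = min(expected_volume / max_volume, 1.0)
    return fraud_probability * volume_norm

print(calculate_priority(0.4, 500))  # 0.2
```

The multiplicative form encodes a simple trade-off: a half-empty train with high fraud risk can rank below a packed train with moderate risk.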

Model Monitoring

import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score

class FraudModelMonitor:
    def track_performance(self, predictions: List, actuals: List):
        # Predictions arrive as probabilities; binarize at the operating threshold
        probs = np.asarray(predictions)
        preds = (probs > 0.5).astype(int)

        # Calculate metrics
        metrics = {
            'auc_roc': roc_auc_score(actuals, probs),
            'precision': precision_score(actuals, preds),
            'recall': recall_score(actuals, preds),
            'false_positive_rate': self.calculate_fpr(actuals, preds)
        }

        # Alert if degradation
        if metrics['auc_roc'] < 0.80:
            self.send_alert("Model performance degraded")

        # Log for tracking
        self.log_metrics(metrics)

Results & Impact

Quantitative Outcomes

  • Fraud rate reduced: 7.2% → 5.9% (18% relative reduction)
  • Model accuracy: AUC-ROC above 0.87 for risk scoring
  • Forecast accuracy: relative RMSE of 14.7% on volume predictions
  • Inspector efficiency: 32% more fraudulent tickets caught per inspector

Qualitative Outcomes

Inspector Feedback:

"The dashboard transformed how we work. Instead of guessing which trains to check, we have data-driven priorities." — SNCF Inspector Team Lead

Stakeholder Alignment:

  • Product owners understood model decisions through feature importance
  • Inspectors trusted the system due to explainability
  • Legal teams approved due to audit trails

Technical Stack

  • Python: Data processing and modeling
  • XGBoost: Primary fraud detection model
  • LSTM (TensorFlow): Temporal pattern analysis
  • Scikit-learn: Volume forecasting
  • PostgreSQL: Data warehouse
  • Streamlit: Inspector dashboard
  • MLflow: Experiment tracking

Key Lessons

1. Class Imbalance Matters

With fraud at ~7%, naive models predict "no fraud" for 93% accuracy:

# Solution: balanced sampling + calibrated thresholds
from imblearn.over_sampling import SMOTE

# Oversample the minority class in the training split only,
# never the validation or test data
smote = SMOTE(sampling_strategy=0.3)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# Custom threshold tuning
optimal_threshold = find_optimal_threshold(
    y_val, y_pred_proba,
    target_metric='f1'  # Balance precision/recall
)
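`find_optimal_threshold` is a custom helper, not a scikit-learn function. A minimal sketch of what such a helper might do: scan candidate cutoffs and keep the one that maximizes the target metric on held-out data (only F1 is implemented here):

```python
import numpy as np
from sklearn.metrics import f1_score

def find_optimal_threshold(y_true, y_proba, target_metric='f1'):
    """Grid-search the decision threshold that maximizes F1 on
    held-out labels and predicted probabilities."""
    thresholds = np.linspace(0.05, 0.95, 91)
    scores = [
        f1_score(y_true, (np.asarray(y_proba) >= t).astype(int))
        for t in thresholds
    ]
    return float(thresholds[int(np.argmax(scores))])
```

In a fare-evasion setting the metric would likely weight recall more heavily, since a missed fraudster costs more than one extra ticket check.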

2. Interpretability Builds Trust

Feature importance helped inspectors understand predictions:

import shap

explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)

# Show top features for specific prediction
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0])

3. Change Management is Critical

Technical excellence means nothing without adoption:

  • Training sessions with inspector teams
  • Pilot program with friendly zones first
  • Feedback loops for continuous improvement
  • Success stories shared across teams

Future Enhancements

We're exploring:

  • Real-time prediction: Moving from batch to stream processing
  • Multi-modal data: Incorporating CCTV and turnstile data
  • Anomaly detection: Identifying new fraud patterns automatically
  • Reinforcement learning: Optimizing inspector patrol routes dynamically

Conclusion

Fraud detection at national scale requires more than just accurate models. It demands:

  1. Thoughtful feature engineering grounded in domain expertise
  2. Interpretable models that stakeholders can trust
  3. Practical deployment that fits existing workflows
  4. Continuous monitoring and improvement

RailGuard Fraud Studio demonstrates that ML can deliver real business value when combined with rigorous experimentation and change management.


Code & Notebooks: GitHub Repository

Full Article: Medium

Let's Connect: LinkedIn

Enjoyed this article?

Check out more technical deep dives on AI systems, or connect with me to discuss your AI initiatives.
