Two end-to-end deployments, one driven by regulation and one by physical scale. Both show that organisations can train a shared model without pooling their raw training data.
Fraud rings rarely confine themselves to a single institution. This case study explores how four major financial organisations built a shared fraud detection model using federated learning (FL), without any institution exposing its customer data to the others.
Four financial institutions jointly aim to detect fraud, but their data is both horizontally and vertically partitioned. Some customers are shared across institutions, while each organisation observes a different subset of features.
→ Overlapping IDs (e.g., C400–C600) appear in multiple banks, but no institution sees the full population (C1–C1200).
→ Features are disjoint across institutions; no single bank has a complete feature vector.
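To make the hybrid layout concrete, here is a toy Python sketch. Only the C1–C1200 population comes from the text; the bank names, their ID ranges and the feature names are hypothetical. The function checks that the banks jointly cover the population, that no bank covers it alone, and that feature sets are pairwise disjoint, then returns the customers seen by more than one bank.

```python
from itertools import combinations

# Hypothetical layout: integer n stands for customer ID Cn.
# Bank names, ID ranges and feature names are invented for illustration.
BANKS = {
    "bank_A": {"ids": range(1, 601), "features": {"txn_amount", "txn_hour"}},
    "bank_B": {"ids": range(400, 901), "features": {"account_age", "credit_limit"}},
    "bank_C": {"ids": range(500, 1001), "features": {"device_risk", "login_country"}},
    "bank_D": {"ids": range(700, 1201), "features": {"merchant_category", "chargebacks"}},
}
POPULATION = set(range(1, 1201))  # C1..C1200


def check_hybrid_partition(banks, population):
    """Validate the layout and return the customers seen by two or more banks."""
    id_sets = {name: set(bank["ids"]) for name, bank in banks.items()}
    # Horizontal aspect: together the banks cover everyone, but none does alone.
    if set().union(*id_sets.values()) != population:
        raise ValueError("banks do not jointly cover the population")
    if any(ids == population for ids in id_sets.values()):
        raise ValueError("one bank already sees every customer")
    # Vertical aspect: no feature is held by more than one bank.
    for a, b in combinations(banks, 2):
        if not banks[a]["features"].isdisjoint(banks[b]["features"]):
            raise ValueError(f"{a} and {b} share a feature")
    return {c for c in population if sum(c in ids for ids in id_sets.values()) >= 2}


shared = check_hybrid_partition(BANKS, POPULATION)
print(min(shared), max(shared), len(shared))  # 400 1000 601
```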
Each institution holds sensitive financial data governed by strict privacy and banking regulations. Centralising this data — even in encrypted form — would require complex approvals and is often not feasible. Meanwhile, fraud losses are significant, with a large share driven by cross-institution activity. The need for collaboration is clear, but raw data sharing is not viable.
Before training, a private set intersection (PSI) protocol identified the customer IDs shared across institutions without revealing any ID outside that intersection. The result was a shared index used to align each institution's features for common customers without centralising data.
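The text does not say which PSI protocol was used. One classic construction is Diffie-Hellman-style PSI, sketched below for two semi-honest parties. The modulus, hashing scheme and function names are illustrative assumptions, the group is a toy choice rather than a vetted one, and extending it to four institutions is not shown. It relies on exponentiation commuting, (h^a)^b = (h^b)^a mod p, so doubly blinded values match exactly for shared IDs while the other party's remaining IDs stay blinded.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1. Enough to show the algebra;
# a real deployment would use a vetted prime-order group or elliptic curve.
P = 2**127 - 1


def hash_to_group(customer_id: str) -> int:
    """Map an ID to a nonzero residue mod P via SHA-256."""
    digest = hashlib.sha256(customer_id.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Secret exponent coprime to P - 1, so x -> x**k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def dh_psi(ids_a, ids_b):
    """Return the IDs bank A holds in common with bank B."""
    a, b = random_exponent(), random_exponent()  # each bank keeps its own secret
    ids_a = sorted(ids_a)
    # Round 1: each bank blinds its hashed IDs and sends them to the other.
    from_a = [pow(hash_to_group(x), a, P) for x in ids_a]
    from_b = [pow(hash_to_group(y), b, P) for y in ids_b]
    # Round 2: B re-blinds A's values in order and returns them;
    # A re-blinds B's values. Since (h^a)^b == (h^b)^a mod P, shared IDs match.
    a_double = [pow(v, b, P) for v in from_a]  # computed by B
    b_double = {pow(v, a, P) for v in from_b}  # computed by A
    # A maps matches back to its own IDs; B's other IDs stay blinded.
    return {x for x, v in zip(ids_a, a_double) if v in b_double}


print(sorted(dh_psi({"C1", "C450", "C500", "C999"}, {"C450", "C500", "C1200"})))
# ['C450', 'C500']
```

The resulting intersection is what a shared index for vertical alignment would be built from; in this two-party form only bank A learns it, and B would need a symmetric run or a message from A to learn it too.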
Each round: local training on institution data → differential privacy (DP) noise injection → upload of gradients or intermediate representations to the server → FedAvg aggregation → global model broadcast. No raw data left any institution at any point.
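A single round of this loop can be sketched in plain Python for a horizontally partitioned logistic-regression model. This is a simplified stand-in, not the deployment's code: the model, learning rate, clipping bound `clip` and `noise_multiplier` are hypothetical, the vertical-feature alignment is omitted, and clipping each client's update before adding Gaussian noise is only one common way to inject DP noise. No privacy budget is computed here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_train(w, data, lr=0.5, epochs=5):
    """Full-batch gradient descent on logistic loss; data is [(features, label)]."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def privatise(delta, clip, noise_multiplier, rng):
    """Clip the update to L2 norm <= clip, then add Gaussian noise (std = noise_multiplier * clip)."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [d * scale + rng.gauss(0.0, noise_multiplier * clip) for d in delta]


def federated_round(global_w, clients, clip=1.0, noise_multiplier=0.5, rng=None):
    """One round: local training -> DP noise -> upload -> FedAvg -> new global model."""
    rng = rng or random.Random()
    total = sum(len(data) for data in clients)
    avg_update = [0.0] * len(global_w)
    for data in clients:  # runs inside each institution in a real deployment
        local_w = local_train(global_w, data)
        delta = [lw - gw for lw, gw in zip(local_w, global_w)]
        noisy = privatise(delta, clip, noise_multiplier, rng)  # only this leaves the client
        weight = len(data) / total  # FedAvg: weight by local sample count
        avg_update = [a + weight * n for a, n in zip(avg_update, noisy)]
    return [gw + a for gw, a in zip(global_w, avg_update)]  # broadcast to all clients
```

Setting `noise_multiplier=0` reduces the round to clipped FedAvg, which is a convenient sanity check; larger values trade accuracy for stronger privacy.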
Each institution evaluated the global model on its own labelled holdout set. Results were aggregated by the server into a single performance report — again without sharing institution-level predictions.
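One way to build a single report without sharing predictions is for each institution to send only its confusion-matrix counts, which the server sums before deriving metrics. The sketch assumes binary fraud labels and uses hypothetical names; the text does not specify which statistics were actually exchanged.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts; the only thing an institution sends."""
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0


def local_counts(predictions, labels):
    """Computed inside each institution on its own holdout set."""
    c = Counts()
    for pred, label in zip(predictions, labels):
        if pred and label:
            c.tp += 1
        elif pred:
            c.fp += 1
        elif label:
            c.fn += 1
        else:
            c.tn += 1
    return c


def aggregate_report(all_counts):
    """Server side: sum counts across institutions, then derive metrics."""
    tp = sum(c.tp for c in all_counts)
    fp = sum(c.fp for c in all_counts)
    fn = sum(c.fn for c in all_counts)
    tn = sum(c.tn for c in all_counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"n": tp + fp + fn + tn, "precision": precision, "recall": recall, "f1": f1}


report = aggregate_report([
    local_counts([1, 1, 0, 0], [1, 0, 1, 0]),  # institution 1
    local_counts([1, 0], [1, 0]),              # institution 2
])
print(report["n"], round(report["f1"], 3))  # 6 0.667
```

Summing counts yields micro-averaged metrics, identical to scoring the pooled predictions, whereas averaging per-institution F1 scores would weight a small institution as heavily as a large one.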
The federated model outperformed each institution's siloed baseline on every metric, showing that collaboration without data sharing is not only legally viable but can also yield a better model than any institution could train alone.
Modern sky surveys generate petabytes of imaging data that no single institution can centralise or process alone. This case study explores how three major astronomical research institutes trained a shared galaxy classification model using horizontal FL — keeping data local where the compute lives.
Three institutes independently operate large-scale sky survey telescopes, each accumulating imaging data at a rate that outpaces network transfer capacity. Each institute captures the same kinds of objects — galaxies, nebulae, stellar clusters — using the same feature schema and labelling conventions. The data is horizontally partitioned: same features, entirely different sky regions and samples.
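Unlike most FL scenarios, the barrier here is not legal or regulatory; it is purely physical. The combined dataset across the three institutes exceeds 5.5 petabytes of raw imaging data. Even at the full line rate of a dedicated 100 Gbps research network, moving 5.5 PB takes about five days of uninterrupted transfer (5.5 × 10^15 bytes × 8 bits ÷ 10^11 bit/s ≈ 4.4 × 10^5 s); the estimate of over 120 days corresponds to a sustained effective throughput of roughly 4 Gbps, well below line rate. Storage and preprocessing costs would run to tens of millions of dollars. Nor does any single compute facility have the storage and I/O capacity to host the combined dataset and stream it through CNN training; the constraint is storing and moving the data, since training proceeds in mini-batches rather than holding the whole dataset in GPU memory. The data must stay where the storage and local compute infrastructure already exists.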
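The transfer-time arithmetic is easy to check. This sketch assumes decimal units (1 PB = 10^15 bytes) and ignores protocol overhead; the second function inverts the calculation to find the sustained rate a given transfer time implies.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes: float, bits_per_second: float) -> float:
    """Days to move num_bytes at a sustained rate, ignoring protocol overhead."""
    return num_bytes * 8 / bits_per_second / SECONDS_PER_DAY


def implied_gbps(num_bytes: float, days: float) -> float:
    """Sustained throughput (Gbps) needed to move num_bytes in the given days."""
    return num_bytes * 8 / (days * SECONDS_PER_DAY) / 1e9


DATASET_BYTES = 5.5e15  # 5.5 PB, assuming decimal units
print(f"{transfer_days(DATASET_BYTES, 100e9):.1f} days at 100 Gbps")  # 5.1 days at 100 Gbps
print(f"{implied_gbps(DATASET_BYTES, 120):.1f} Gbps sustained for 120 days")  # 4.2 Gbps sustained for 120 days
```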
Each institute applied an identical preprocessing pipeline locally: image normalisation to a shared photometric scale, augmentation (random crop, flip, rotation) and resizing to 224×224 pixels. Alignment was verified by comparing summary statistics — not by sharing images — before the first training round.
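Summary statistics can be compared without sharing pixels: each institute reports only a count, mean and sum of squared deviations, and the server merges them exactly with the standard parallel-variance formula (Chan et al.). The single-channel simplification, tolerances and names below are assumptions for illustration, not the project's actual check.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    n: int
    mean: float
    m2: float  # sum of squared deviations from the mean

    @property
    def std(self) -> float:
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


def local_stats(values) -> Stats:
    """Welford's online algorithm, run inside each institute over its pixel values."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        d = x - mean
        mean += d / n
        m2 += d * (x - mean)
    return Stats(n, mean, m2)


def merge(a: Stats, b: Stats) -> Stats:
    """Combine two summaries exactly (Chan et al. parallel variance)."""
    n = a.n + b.n
    if n == 0:
        return Stats(0, 0.0, 0.0)
    d = b.mean - a.mean
    return Stats(n, a.mean + d * b.n / n, a.m2 + b.m2 + d * d * a.n * b.n / n)


def check_alignment(per_institute, mean_tol=0.05, std_tol=0.05):
    """Return pooled stats and the institutes whose stats stray from them."""
    pooled = Stats(0, 0.0, 0.0)
    for s in per_institute.values():
        pooled = merge(pooled, s)
    misaligned = [
        name for name, s in per_institute.items()
        if abs(s.mean - pooled.mean) > mean_tol or abs(s.std - pooled.std) > std_tol
    ]
    return pooled, misaligned
```

In practice each institute would run `local_stats` once per channel after normalisation; an institute flagged by `check_alignment` would re-check its photometric scaling before the first round.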
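Each round: local CNN training for 3 epochs on each institute's GPU cluster → weight upload to the CERN server (~280 MB per client) → FedProx aggregation → global model broadcast. Node 1, which has fewer GPUs, was permitted to skip up to 20% of rounds without destabilising convergence. This tolerance of uneven client resources is a key motivation for FedProx over FedAvg: its proximal term penalises local models for drifting far from the current global model, which stabilises training when clients differ in their data or in how much work they complete.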
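The proximal term is the only change FedProx makes to local training: each client minimises its local loss plus (mu/2)·‖w − w_global‖². The sketch applies this to a toy linear-regression model instead of a CNN and lets one client sit out rounds at random. The values of `mu`, the learning rate and step count, and modelling "skip up to 20% of rounds" as a 0.2 per-round skip probability are all hypothetical choices.

```python
import random


def grad_mse(w, data):
    """Gradient of mean squared error for a linear model y ~ w . x."""
    g = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2 * err * xj / len(data)
    return g


def fedprox_local(w_global, data, mu=0.01, lr=0.05, steps=30):
    """Local solver for F_k(w) + (mu / 2) * ||w - w_global||^2."""
    w = list(w_global)
    for _ in range(steps):
        g = grad_mse(w, data)
        # The proximal gradient mu * (w - w_global) pulls w back toward the global model.
        w = [wi - lr * (gi + mu * (wi - wg)) for wi, gi, wg in zip(w, g, w_global)]
    return w


def fedprox_round(w_global, clients, skip_prob, rng, mu=0.01):
    """clients: {name: data}; skip_prob: {name: chance of sitting out this round}."""
    updates = []
    for name, data in clients.items():
        if rng.random() < skip_prob.get(name, 0.0):
            continue  # e.g. an under-resourced node skips this round
        updates.append((len(data), fedprox_local(w_global, data, mu)))
    if not updates:
        return list(w_global)  # nobody reported; keep the current model
    total = sum(n for n, _ in updates)
    # Sample-weighted average over the clients that did participate.
    return [sum(n * w[j] for n, w in updates) / total for j in range(len(w_global))]
```

With `mu=0` the local step reduces to plain gradient descent, i.e. FedAvg-style local training, which makes the effect of the proximal term easy to compare.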
Each institute held back 10% of its labelled images for local validation. The global model was also evaluated on a small, jointly curated benchmark set of 12,000 images (under 20 GB), the only image data physically transferred during the entire project.
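The text does not say how the 10% holdout was drawn. A common approach, shown here as a hypothetical sketch, is to hash each image ID with a fixed salt so the split is reproducible across runs and needs no coordination; the salt and ID format are placeholders.

```python
import hashlib


def is_holdout(image_id: str, fraction: float = 0.10, salt: str = "holdout-v1") -> bool:
    """Deterministically place roughly `fraction` of images in the local validation set."""
    digest = hashlib.sha256(f"{salt}:{image_id}".encode()).digest()
    # Keep 53 bits so the division is exact and the result lies in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < fraction


ids = [f"img_{i:06d}" for i in range(20_000)]  # hypothetical image IDs
holdout = [i for i in ids if is_holdout(i)]
print(len(holdout) / len(ids))  # close to 0.10
```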
The federated CNN matched the accuracy projected for a hypothetical centralised model trained on all the data, without any training image leaving its institute of origin. Training that would have required 120+ days of data transfer was completed in under two weeks of federated rounds.