Technologies

The building blocks of privacy-preserving AI

From model selection to deployment frameworks — a breakdown of the core technologies that underpin federated learning systems in real-world environments.


ML Models

Compatible ML models

Federated learning (FL) is model-agnostic: the right model depends on your data type, task complexity and privacy requirements. Commonly used models include:

[Diagram: input mapped to P(y=1)]
01

Logistic Regression

A linear classifier suited to binary and multiclass tasks. Low complexity makes it ideal for resource-constrained FL participants with tabular data.
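
As a rough illustration of why this model is cheap for a participant to train, the sketch below runs plain gradient descent on a tiny local dataset using NumPy. The function names (`sigmoid`, `local_train`), the toy data and the hyperparameters are assumptions made for this example, not part of any particular FL framework.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def local_train(weights, X, y, lr=0.5, epochs=200):
    """Full-batch gradient descent on one participant's local data."""
    w = weights.copy()
    for _ in range(epochs):
        p = sigmoid(X @ w)               # predicted P(y=1) for each sample
        grad = X.T @ (p - y) / len(y)    # gradient of the mean log loss
        w -= lr * grad
    return w

# Hypothetical toy data: one feature plus a bias column of ones.
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = local_train(np.zeros(2), X, y)
predictions = (sigmoid(X @ w) >= 0.5).astype(int)
print(predictions)
```

In an FL setting, only `w` (two numbers here) would leave the participant, which is part of what makes this model attractive on constrained devices.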

[Diagram: input, hidden and output layers]
02

Neural Networks

Multi-layer perceptrons capable of learning complex, non-linear patterns from structured data. A common baseline architecture in FL research.

[Diagram: input, convolution, pooling and fully connected layers]
03

Convolutional Neural Networks

Spatial feature extractors designed for image and signal data. Widely used in federated medical imaging — training across hospital silos without sharing scans.

[Diagram: decision tree with splits x > 5?, x > 2? and x > 8?, yes/no branches, and leaf classes A and B]
04

Decision Trees

Interpretable rule-based classifiers that partition feature space. In FL, gradient-boosted variants such as XGBoost can be federated across participants without sharing raw data.


Data Partitioning

How data is split across participants

The way your data is distributed across participants determines which type of FL applies. There are three fundamental partitioning types:

[Diagram: same features · different samples; rows split by participant]
Participant   age   income   score
Client A      34    52k      0.81
Client B      29    38k      0.63
Client C      41    71k      0.77
01

Horizontal Partitioning

Participants share the same feature space but hold different samples. The most common FL scenario — e.g. multiple hospitals with the same patient schema but different patients.

[Diagram: same samples · different features; columns split by participant]
Participant   s1            s2            s3
Client A      age: 34       age: 29       age: 41
Client B      inc: 52k      inc: 38k      inc: 71k
Client C      score: 0.81   score: 0.63   score: 0.77
02

Vertical Partitioning

Participants share the same sample IDs but hold different feature sets. Common when different organisations hold complementary data on the same users — e.g. a bank and a retailer.

[Diagram: different samples · different features]
Client A: rows 1–3, cols 1–2
Client B: rows 1–3, cols 3–4
Client C: rows 4–6, cols 1–2
Client D: rows 4–6, cols 3–4
03

Combined Partitioning

Participants differ in both their samples and feature sets — the most complex and realistic scenario. Requires careful alignment strategies and is common in large cross-silo deployments.
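
The three partitioning types can be pictured as different ways of slicing one table. The sketch below is only an illustration: the 6 x 4 array and the client names are hypothetical, and a real deployment never holds the full table in one place.

```python
import numpy as np

# Hypothetical dataset: 6 samples (rows) x 4 features (columns).
data = np.arange(24).reshape(6, 4)

# Horizontal: every client sees all 4 features but only its own rows.
horizontal = {
    "A": data[0:2, :],
    "B": data[2:4, :],
    "C": data[4:6, :],
}

# Vertical: every client sees all 6 samples but only its own columns.
vertical = {
    "A": data[:, 0:2],
    "B": data[:, 2:4],
}

# Combined: each client holds one block of rows and one block of columns.
combined = {
    "A": data[0:3, 0:2],
    "B": data[0:3, 2:4],
    "C": data[3:6, 0:2],
    "D": data[3:6, 2:4],
}

for name, parts in [("horizontal", horizontal),
                    ("vertical", vertical),
                    ("combined", combined)]:
    shapes = {client: block.shape for client, block in parts.items()}
    print(name, shapes)
```

In the vertical and combined cases, clients must first agree on which rows refer to the same entity before training, which is the alignment step mentioned above.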


Privacy Techniques

How data is kept private

Privacy-preserving techniques limit how much sensitive data is exposed during the FL process, including to the aggregating server. Common techniques include:

[Diagram: true value (line) versus noisy output (points)]
01

Differential Privacy

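Calibrated statistical noise is added to model updates before sharing, typically after clipping each update to a fixed norm. This bounds how much any single record can influence the shared output, so inferring individual records becomes provably hard rather than strictly impossible. The privacy budget ε controls the trade-off between privacy and accuracy.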
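
A minimal sketch of the noise-adding step, assuming the common recipe of clipping each update to a maximum L2 norm and then adding Gaussian noise scaled to that norm. The function name `privatize_update` and the default values of `clip_norm` and `noise_multiplier` are illustrative assumptions; translating them into a concrete (ε, δ) guarantee requires a privacy accountant, which is not shown here.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an update to a maximum L2 norm, then add Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(update)
    # Scale down only if the update is longer than clip_norm.
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    # Noise standard deviation is tied to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(seed=0)
raw_update = np.array([3.0, -4.0])   # L2 norm is 5.0, so it will be clipped
noisy_update = privatize_update(raw_update, rng=rng)
print(noisy_update)
```

Clipping matters as much as the noise: without a bound on each update's norm, no fixed amount of noise can cap a single participant's influence.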

[Diagram: raw values versus masked values]
field       raw value   masked
name        Alice       ███
age         34          ██
diagnosis   T2DM        ████
postcode    SW1A        ████
02

Data Masking

Sensitive fields are replaced with redacted or pseudonymised values before any data is used in training. Identifiers, diagnoses and locations are hidden while retaining data utility.
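
One possible way to mask a record before training is sketched below, using only the Python standard library. The specific choices here are assumptions for illustration: a keyed hash (HMAC-SHA256) for the name, a ten-year band for age, full redaction of the diagnosis and a truncated postcode. The `SECRET_KEY` value and the helper names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be stored outside source control.
SECRET_KEY = b"replace-with-a-secret-key"

def pseudonymise(value: str) -> str:
    """Replace an identifier with a keyed hash so records stay linkable."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]

def mask_record(record: dict) -> dict:
    """Return a masked copy of the record; the input is left unchanged."""
    masked = dict(record)
    masked["name"] = pseudonymise(record["name"])
    masked["age"] = f"{(record['age'] // 10) * 10}s"     # 34 becomes "30s"
    masked["diagnosis"] = "REDACTED"
    masked["postcode"] = record["postcode"][:2] + "**"   # keep the area only
    return masked

record = {"name": "Alice", "age": 34, "diagnosis": "T2DM", "postcode": "SW1A"}
masked = mask_record(record)
print(masked)
```

Which fields to generalise rather than fully redact depends on how much utility the downstream model needs from them.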

[Diagram: ciphertext in, Enc(x); computation inside encrypted space; ciphertext out, Enc(f(x))]
03

Homomorphic Encryption

Model updates are encrypted before leaving the client. The server performs aggregation directly on ciphertexts, so it never sees an individual update in plaintext; only a holder of the decryption key can decrypt the aggregated result.
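
To make "aggregation on ciphertexts" concrete, here is a toy version of the Paillier scheme, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This is a teaching sketch under loud assumptions: the primes are far too small to be secure, the fixed-point `SCALE` is arbitrary, and key handling is collapsed into module-level constants. A real system would use a vetted cryptographic library.

```python
import math
import secrets

# Toy key generation. These primes are far too small for real use.
P, Q = 104_729, 1_299_709
N = P * Q
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)                               # valid because g = N + 1
SCALE = 1_000  # fixed-point scale for encoding floats as integers

def encrypt(value: float) -> int:
    """Encrypt a float under the public key N (client side)."""
    m = round(value * SCALE) % N          # negatives wrap around modulo N
    while True:
        r = secrets.randbelow(N - 1) + 1  # fresh randomness per ciphertext
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N_SQ) % N_SQ

def add_encrypted(c1: int, c2: int) -> int:
    """Add two plaintexts by multiplying their ciphertexts (server side)."""
    return c1 * c2 % N_SQ

def decrypt(c: int) -> float:
    """Decrypt with the private key (LAMBDA, MU); key holder only."""
    m = (pow(c, LAMBDA, N_SQ) - 1) // N * MU % N
    if m > N // 2:                        # map large residues back to negatives
        m -= N
    return m / SCALE

client_updates = [0.25, -0.5, 1.125]
ciphertexts = [encrypt(u) for u in client_updates]

encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = add_encrypted(encrypted_sum, c)

total = decrypt(encrypted_sum)
print(total)
```

The server only ever calls `add_encrypted`; it holds neither `LAMBDA` nor `MU`, so it cannot read any individual update or the sum.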


Aggregation Strategies

How local models are combined

After each training round, client model updates must be aggregated into a single global model. The aggregation strategy directly affects convergence speed, fairness and robustness. Two widely used strategies are:

[Diagram: Clients A to D feed the global model with aggregation weights 0.4, 0.3, 0.2 and 0.1; w_global = Σ_k (n_k / n) · w_k, where n_k is client k's sample count and n is the total]
01

FedAvg

Federated Averaging is the foundational FL aggregation algorithm. Each client trains locally for several steps, then sends its model weights to the server. The server computes a weighted average — proportional to each client's dataset size — to produce an updated global model.
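
The server-side weighted average can be sketched in a few lines of NumPy. The function name `fedavg`, the two-parameter toy models and the client sizes are assumptions chosen for this example so that the coefficients come out as 0.4, 0.3, 0.2 and 0.1.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average client weight vectors, weighted by local sample count."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)   # shape: (clients, parameters)
    coeffs = sizes / sizes.sum()         # n_k / n for each client
    return coeffs @ stacked

# Hypothetical local models from four clients, two parameters each.
client_weights = [
    np.array([1.0, 0.0]),
    np.array([0.0, 1.0]),
    np.array([1.0, 1.0]),
    np.array([2.0, 2.0]),
]
client_sizes = [400, 300, 200, 100]

global_weights = fedavg(client_weights, client_sizes)
print(global_weights)
```

Real models have many weight tensors rather than one vector; the same average is then applied tensor by tensor.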

[Diagram: Clients A to D each train a local w against the global w*, with the proximal term (μ/2) · ‖w − w*‖² added to the local objective]
02

FedProx

An extension of FedAvg designed for heterogeneous FL systems. It adds a proximal term to each client's local objective that penalises divergence from the global model, limiting how far any single client's non-IID data can pull the global model off course.
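
The effect of the proximal term is easiest to see in the client's gradient step: it adds μ · (w − w_global) to the ordinary loss gradient. The sketch below assumes a simple least-squares local loss purely for illustration; the function name `fedprox_local_train`, the toy data and the values of `mu`, `lr` and `steps` are hypothetical.

```python
import numpy as np

def fedprox_local_train(w_global, X, y, mu=0.1, lr=0.1, steps=500):
    """Minimise local squared error plus (mu / 2) * ||w - w_global||^2."""
    w = w_global.copy()
    for _ in range(steps):
        grad_loss = X.T @ (X @ w - y) / len(y)   # gradient of the local loss
        grad_prox = mu * (w - w_global)          # gradient of the proximal term
        w -= lr * (grad_loss + grad_prox)
    return w

# Hypothetical client whose local optimum (2, -2) is far from the global model.
X = np.eye(2)
y = np.array([2.0, -2.0])
w_global = np.zeros(2)

w_plain = fedprox_local_train(w_global, X, y, mu=0.0)   # FedAvg-style update
w_prox = fedprox_local_train(w_global, X, y, mu=1.0)    # FedProx update

print(np.linalg.norm(w_plain - w_global))
print(np.linalg.norm(w_prox - w_global))
```

With `mu=0.0` the client drifts all the way to its own optimum; with `mu=1.0` it settles closer to the global model, which is the behaviour the proximal term is meant to produce.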


FL Frameworks

Tools for building FL systems

Open-source frameworks handle the communication, aggregation and orchestration infrastructure, letting teams focus on model development rather than FL plumbing. Two widely used frameworks are:

[Diagram: Flower server connected to PyTorch, TensorFlow, JAX, scikit-learn and custom clients; framework-agnostic · any ML backend]
01

Flower (flwr)

A framework-agnostic FL library that works with any ML backend — PyTorch, TensorFlow, JAX, scikit-learn or custom code. Designed for real-world cross-silo and cross-device deployments with a simple client-server API that abstracts all communication and aggregation logic.

Tags: Python · cross-silo · cross-device · production-ready

[Diagram: TFF layers: TensorFlow Core; TFF Core (federated ops); FL simulation runtime; research / model API. TensorFlow-native · simulation-first]
02

TensorFlow Federated (TFF)

Google's FL framework built natively on TensorFlow. TFF provides a layered architecture — from low-level federated operators up to high-level simulation APIs — making it particularly well-suited for research and prototyping federated algorithms before production deployment.

Tags: Python · TensorFlow · simulation · research

Start your journey toward privacy-preserving AI

Whether you're exploring FL or ready to deploy, we can help you assess, design and implement the right solution for your organisation.
