Unlock the power of uncertainty quantification, density estimation, and generative modeling with TensorFlow Probability.
This repository is a collection of hands-on TensorFlow Probability (TFP) implementations for probabilistic deep learning. The primary goal is educational: to bridge the gap between traditional deterministic models and real-world uncertainty quantification, so that you can build models that not only make predictions but also quantify how confident they are in them.
Documentation: TFP API Docs
Traditional machine learning models provide point estimates without quantifying uncertainty. In critical applications like medical diagnosis, autonomous vehicles, or financial modeling, knowing how confident your model is can be the difference between success and catastrophic failure.
- Enables models to express confidence levels using probabilistic layers and Bayesian neural networks.
- Supports sampling, log-likelihood evaluation, and manipulation of complex distributions (univariate and multivariate).
- Powers VAEs and normalizing flows for density estimation, representation learning, and synthetic data generation.
This repository demonstrates how TensorFlow Probability transforms your standard neural networks into probabilistic powerhouses that:
- Quantify uncertainty in predictions
- Model complex distributions beyond simple Gaussian assumptions
- Perform Bayesian inference at scale
- Generate realistic synthetic data through advanced generative models
Real-world data is messy, incomplete, and uncertain. Probabilistic deep learning addresses these challenges by:
- Handling Data Scarcity: Bayesian approaches work well with limited data.
- Robust Decision Making: Uncertainty estimates guide better decisions.
- Interpretable AI: Understanding model confidence builds trust.
- Anomaly Detection: Identifying outliers and unusual patterns.
- Risk Assessment: Quantifying potential failure modes.
- Bayesian neural networks: Add priors to weights and calibrate predictive uncertainty for out-of-distribution robustness.
- Normalizing flows: Use invertible transforms for expressive density estimation and efficient sampling.
- Variational inference: Optimizes the evidence lower bound (ELBO) with the reparameterization trick for controllable generation and learning.
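As a rough illustration of the variational-inference point above, the sketch below shows the two ingredients of a Gaussian ELBO in plain Python (standard library only, not TFP): a reparameterized sample $z = \mu + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$, and the closed-form KL divergence from $\mathcal{N}(\mu, \sigma^2)$ to a standard normal prior. The function names are illustrative assumptions and do not come from this repository.

```python
import math
import random

def reparameterized_sample(mu, sigma, rng):
    """Draw z ~ N(mu, sigma^2) as a deterministic function of noise eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps

def kl_normal_to_standard(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) )."""
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0) - math.log(sigma)

rng = random.Random(0)
z = reparameterized_sample(mu=1.0, sigma=0.5, rng=rng)
kl_same = kl_normal_to_standard(0.0, 1.0)     # posterior equals the prior
kl_shifted = kl_normal_to_standard(1.0, 1.0)  # posterior mean shifted by one
print(z, kl_same, kl_shifted)
```

In an ELBO, this KL term is subtracted from the expected log-likelihood, which is estimated from reparameterized samples so that gradients can reach `mu` and `sigma`.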
- Higher memory and training time than deterministic models.
- Gains in interpretability, calibrated risk, and anomaly detection often outweigh the cost.
- Linear Algebra: Matrix operations, eigenvalues, SVD
- Calculus: Derivatives, gradients, optimization
- Statistics: Probability theory, Bayes' theorem, distributions
- Information Theory: KL divergence, entropy, mutual information
- Python 3.8+ and familiarity with object-oriented programming
- TensorFlow/Keras fundamentals
- NumPy/SciPy for numerical computing
- Matplotlib/Seaborn for visualization
- Pattern Recognition and Machine Learning by Christopher Bishop
- The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
- Probabilistic Machine Learning by Kevin Murphy
- Clone the repository:

git clone https://github.com/mohd-faizy/Probabilistic-Deep-Learning-with-TensorFlow.git
cd Probabilistic-Deep-Learning-with-TensorFlow

- Create a virtual environment (using uv, a faster alternative):

# Install uv if not already installed
pip install uv
# Create the virtual environment
uv venv
# Activate the environment
source .venv/bin/activate   # Linux/macOS
.venv\Scripts\activate      # Windows

- Install dependencies:

uv add -r requirements.txt

- Verify installation:

import tensorflow as tf
import tensorflow_probability as tfp
print(f"TensorFlow: {tf.__version__}")
print(f"TensorFlow Probability: {tfp.__version__}")

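import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Fixed standard-normal prior; DenseVariational calls this with (kernel_size, bias_size, dtype).
def make_prior(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=tf.zeros(n), scale=1.0),
            reinterpreted_batch_ndims=1)),
    ])

# Trainable mean-field normal posterior: one location and one scale per weight.
def make_posterior(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :n], scale=1e-5 + tf.nn.softplus(t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])

# Create a probabilistic model
def create_bayesian_model():
    model = tf.keras.Sequential([
        tfp.layers.DenseVariational(
            units=64,
            make_posterior_fn=make_posterior,
            make_prior_fn=make_prior,
            kl_weight=1 / 50000
        ),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    return model

# Train with uncertainty quantification
model = create_bayesian_model()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

Understanding core probability distributions is fundamental for designing probabilistic models. In TensorFlow Probability, distributions are treated as first-class computational objects. They are grouped here by their mathematical type: Discrete, Continuous, and Multivariate.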
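| Distribution | Type | Support | Parameters | TFP Class |
|---|---|---|---|---|
| Bernoulli | Discrete | $x \in \{0, 1\}$ | $p \in [0, 1]$ | `tfd.Bernoulli` |
| Binomial | Discrete | $k \in \{0, 1, \dots, n\}$ | $n \in \mathbb{N}^+$, $p \in [0, 1]$ | `tfd.Binomial` |
| Poisson | Discrete | $k \in \{0, 1, 2, \dots\}$ | $\lambda > 0$ | `tfd.Poisson` |
| Gaussian (Normal) | Continuous | $x \in \mathbb{R}$ | $\mu \in \mathbb{R}$, $\sigma > 0$ | `tfd.Normal` |
| Exponential | Continuous | $x \in [0, \infty)$ | $\lambda > 0$ | `tfd.Exponential` |
| Beta | Continuous | $x \in (0, 1)$ | $\alpha, \beta > 0$ | `tfd.Beta` |
| Multivariate Gaussian | Multivariate | $\mathbf{x} \in \mathbb{R}^k$ | $\boldsymbol{\mu} \in \mathbb{R}^k$, $\boldsymbol{\Sigma} \in \mathbb{R}^{k \times k}$ | `tfd.MultivariateNormalTriL` |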
Discrete distributions model count data, binary trials, or categorical events where outcomes are distinct, countable values.
Mathematical Formulation (Bernoulli):
- Support: $x \in \{0, 1\}$
- Parameters: Probability of success $p \in [0, 1]$ (or log-odds $\text{logits} \in \mathbb{R}$)
- Typical DL Use Cases: Binary classification outputs, Variational Autoencoder (VAE) decoder output for binary data (e.g. MNIST pixels), and stochastic dropout masks.
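For the Bernoulli distribution, the mass function implied by these parameters is $P(X = x) = p^x (1 - p)^{1 - x}$. The following is a minimal plain-Python sketch (standard library only, not TFP's implementation) of the log-probability and of how a logit maps to a success probability through the sigmoid. The helper names are illustrative assumptions.

```python
import math

def sigmoid(logit):
    """Map log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def bernoulli_log_prob(x, p):
    """log P(X = x) for x in {0, 1} with success probability p."""
    return x * math.log(p) + (1 - x) * math.log(1.0 - p)

p_from_logit = sigmoid(0.85)                 # a logit of 0.85 is roughly p = 0.70
log_p_success = bernoulli_log_prob(1, 0.7)
log_p_failure = bernoulli_log_prob(0, 0.7)
total_mass = math.exp(log_p_success) + math.exp(log_p_failure)
print(p_from_logit, log_p_success, log_p_failure, total_mass)
```

Parameterizing by logits lets a network emit any real number without clipping it into $[0, 1]$ first.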
TFP Implementation:
import tensorflow_probability as tfp
tfd = tfp.distributions
# Define via probability of success
bernoulli = tfd.Bernoulli(probs=0.7)
# Define via logits (log-odds, preferred for neural networks)
bernoulli_logits = tfd.Bernoulli(logits=0.85)

Mathematical Formulation (Binomial):
- Support: $k \in \{0, 1, \dots, n\}$
- Parameters: Number of trials $n \in \mathbb{N}^+$, success probability $p \in [0, 1]$
- Typical DL Use Cases: A/B testing models, click-through-rate (CTR) modeling, quality control and defect counts.
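For the Binomial distribution, the mass function is $P(K = k) = \binom{n}{k} p^k (1 - p)^{n - k}$. The sketch below evaluates it in plain Python (standard library only, not TFP) for $n = 15$ and $p = 0.4$, and checks that the masses sum to one and that the mean equals $np$. The function name is an illustrative assumption.

```python
import math

def binomial_pmf(k, n, p):
    """P(K = k) for k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

n, p = 15, 0.4
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total_mass = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))  # should match n * p
print(total_mass, mean)
```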
TFP Implementation:
import tensorflow_probability as tfp
tfd = tfp.distributions
# 15 independent trials with success probability 0.4
binomial = tfd.Binomial(total_count=15., probs=0.4)

Mathematical Formulation (Poisson):
- Support: $k \in \{0, 1, 2, \dots\}$
- Parameters: Average event rate $\lambda > 0$
- Typical DL Use Cases: Count regression models (Poisson regression), web traffic/API request volume forecasting, and anomaly detection in system logs.
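For the Poisson distribution, the mass function is $P(K = k) = \lambda^k e^{-\lambda} / k!$. The plain-Python sketch below (standard library only, not TFP) computes the log-mass in a numerically stable way with `math.lgamma` and checks that the mean and variance both equal the rate. Truncating the support at 100 is an assumption that is harmless for $\lambda = 5$.

```python
import math

def poisson_log_pmf(k, rate):
    """log P(K = k), using lgamma(k + 1) = log(k!)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

rate = 5.0
support = range(100)  # truncated support; the tail mass beyond this is negligible
pmf = [math.exp(poisson_log_pmf(k, rate)) for k in support]
total_mass = sum(pmf)
mean = sum(k * pk for k, pk in zip(support, pmf))
variance = sum(k * k * pk for k, pk in zip(support, pmf)) - mean ** 2
print(total_mass, mean, variance)
```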
TFP Implementation:
import tensorflow_probability as tfp
tfd = tfp.distributions
# Average rate of 5.0 events per interval
poisson = tfd.Poisson(rate=5.0)

Continuous distributions model real-valued quantities, waiting times, or data points that can take any value within a range.
Mathematical Formulation (Gaussian / Normal):
- Support: $x \in \mathbb{R}$
- Parameters: Mean $\mu \in \mathbb{R}$, standard deviation $\sigma > 0$
- Typical DL Use Cases: Weight priors/posteriors in Bayesian Neural Networks (BNNs), VAE continuous latent spaces (reparameterization trick), and continuous regression with uncertainty output.
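For the Normal distribution, the log-density is $\log p(x) = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(x - \mu)^2}{2\sigma^2}$. To illustrate the "regression with uncertainty output" use case, the plain-Python sketch below (standard library only, not TFP) computes a mean negative log-likelihood and shows that an overconfident predicted $\sigma$ is penalized much more heavily than a well-matched one. The targets and predictions are made-up values for illustration.

```python
import math

def normal_log_prob(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def mean_nll(targets, means, sigmas):
    """Average negative log-likelihood of the targets under per-point Normals."""
    log_probs = [normal_log_prob(y, m, s) for y, m, s in zip(targets, means, sigmas)]
    return -sum(log_probs) / len(log_probs)

targets = [1.0, 2.0, 3.0]
means = [1.5, 1.5, 3.5]  # hypothetical predictions, each off by 0.5
nll_matched = mean_nll(targets, means, [0.5, 0.5, 0.5])
nll_overconfident = mean_nll(targets, means, [0.05, 0.05, 0.05])
log_density_at_mode = normal_log_prob(0.0, 0.0, 1.0)
print(nll_matched, nll_overconfident, log_density_at_mode)
```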
TFP Implementation:
import tensorflow_probability as tfp
tfd = tfp.distributions
# Standard Normal (mean 0, std dev 1)
normal = tfd.Normal(loc=0.0, scale=1.0)

Mathematical Formulation (Exponential):
- Support: $x \in [0, \infty)$
- Parameters: Decay/arrival rate $\lambda > 0$
- Typical DL Use Cases: Survival analysis and time-to-event estimation, wait-time and queueing theory in networks, positive-valued prior assumptions.
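For the Exponential distribution, the density is $p(x) = \lambda e^{-\lambda x}$ and the survival function is $P(X > t) = e^{-\lambda t}$. The plain-Python sketch below (standard library only, not TFP) draws samples by inverting the CDF and checks that the sample mean is close to $1/\lambda$. The sample count and seed are arbitrary choices.

```python
import math
import random

def exponential_sample(rate, rng):
    """Inverse-CDF sampling: x = -log(1 - u) / rate with u uniform on [0, 1)."""
    u = rng.random()
    return -math.log(1.0 - u) / rate

def survival(t, rate):
    """P(X > t): probability that the event has not yet occurred at time t."""
    return math.exp(-rate * t)

rng = random.Random(0)
samples = [exponential_sample(1.0, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)  # close to 1 / rate
p_survive_one = survival(1.0, rate=1.0)
print(sample_mean, p_survive_one)
```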
TFP Implementation:
import tensorflow_probability as tfp
tfd = tfp.distributions
# Exponential decay rate of 1.0
exponential = tfd.Exponential(rate=1.0)

Mathematical Formulation (Beta):
- Support: $x \in (0, 1)$
- Parameters: Shape/concentration parameters $\alpha, \beta > 0$
- Typical DL Use Cases: Prior distribution for Binomial/Bernoulli probabilities, modeling bounded target proportions, and Bayesian inference of rates / success probabilities.
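The "prior for Binomial/Bernoulli probabilities" use case rests on conjugacy: a $\text{Beta}(\alpha, \beta)$ prior combined with $k$ successes in $n$ trials gives a $\text{Beta}(\alpha + k, \beta + n - k)$ posterior. A minimal plain-Python sketch of that update follows (standard library only, not TFP); the observed counts are made-up values.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def beta_binomial_update(alpha, beta, successes, trials):
    """Posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + (trials - successes)

prior = (2.0, 5.0)  # Beta(2, 5): prior mean 2/7
posterior = beta_binomial_update(*prior, successes=6, trials=10)
prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)  # pulled toward the observed rate 0.6
print(posterior, prior_mean, posterior_mean)
```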
TFP Implementation:
import tensorflow_probability as tfp
tfd = tfp.distributions
# Beta prior distribution
beta = tfd.Beta(concentration1=2.0, concentration0=5.0)

Multivariate distributions model higher-dimensional vectors of random variables, accounting for the correlations between different dimensions.
Mathematical Formulation (Multivariate Gaussian):
- Support: $\mathbf{x} \in \mathbb{R}^k$
- Parameters: Mean vector $\boldsymbol{\mu} \in \mathbb{R}^k$, covariance matrix $\boldsymbol{\Sigma} \in \mathbb{R}^{k \times k}$ (positive-definite)
- Typical DL Use Cases: Modeling correlated multidimensional features, generative modeling (Normalizing Flows and VAE priors), and Kalman filter state estimators.
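A multivariate Gaussian is often parameterized by a lower-triangular Cholesky factor $\mathbf{L}$ with $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^\top$, and sampled as $\mathbf{x} = \boldsymbol{\mu} + \mathbf{L}\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. The plain-Python sketch below (standard library only, not TFP) recovers the covariance implied by the factor `[[1.0, 0.0], [0.6, 0.8]]`, which has unit variances and correlation 0.6. The helper names are illustrative assumptions.

```python
import random

def cholesky_to_covariance(scale_tril):
    """Return L @ L^T for a lower-triangular matrix given as nested lists."""
    k = len(scale_tril)
    return [[sum(scale_tril[i][m] * scale_tril[j][m] for m in range(k))
             for j in range(k)] for i in range(k)]

def sample_mvn(loc, scale_tril, rng):
    """Draw x = loc + L @ eps with eps ~ N(0, I)."""
    k = len(loc)
    eps = [rng.gauss(0.0, 1.0) for _ in range(k)]
    return [loc[i] + sum(scale_tril[i][m] * eps[m] for m in range(k))
            for i in range(k)]

scale_tril = [[1.0, 0.0], [0.6, 0.8]]
covariance = cholesky_to_covariance(scale_tril)
sample = sample_mvn([0.0, 0.0], scale_tril, random.Random(0))
print(covariance, sample)
```

Working with the factor rather than the full covariance keeps the matrix positive-definite by construction, provided the diagonal entries are positive.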
TFP Implementation:
import tensorflow_probability as tfp
tfd = tfp.distributions
# Correlated bivariate normal via lower triangular Cholesky factor
mv_normal = tfd.MultivariateNormalTriL(
loc=[0.0, 0.0],
scale_tril=[[1.0, 0.0],
[0.6, 0.8]]
)

Every distribution object in TensorFlow Probability exposes a consistent API for seamless integration with TensorFlow graph execution:
- `sample(sample_shape)`: Draws Monte Carlo samples from the distribution.
- `log_prob(value)`: Computes the log probability density (or mass) function. Crucial for custom loss functions such as negative log-likelihood.
- `prob(value)`: Computes the probability density at `value` for continuous distributions, or the probability mass $P(X=x)$ for discrete ones.
- `mean()` / `variance()` / `stddev()`: Analytical moments of the distribution.
- `kl_divergence(other_dist)`: Analytical Kullback-Leibler divergence to another distribution, available for pairs that have a closed form registered (for example, two Normals).
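To show how these methods relate, the plain-Python sketch below (standard library only, not TFP) estimates a KL divergence between two Normals from samples and log-probabilities alone, then compares it with the closed form. The comments mark which TFP method each step stands in for; the function names, seed, and sample count are illustrative assumptions.

```python
import math
import random

def normal_log_prob(x, mu, sigma):
    """Stands in for dist.log_prob(x) on a Normal."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def monte_carlo_kl(mu_q, sigma_q, mu_p, sigma_p, num_samples, rng):
    """Estimate KL(q || p) as the average of log q(x) - log p(x) over x ~ q."""
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(mu_q, sigma_q)  # stands in for q.sample()
        total += normal_log_prob(x, mu_q, sigma_q) - normal_log_prob(x, mu_p, sigma_p)
    return total / num_samples

def analytic_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL between two Normals; stands in for q.kl_divergence(p)."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2) - 0.5)

rng = random.Random(0)
kl_mc = monte_carlo_kl(0.0, 1.0, 1.0, 1.0, 20000, rng)
kl_exact = analytic_kl(0.0, 1.0, 1.0, 1.0)
print(kl_mc, kl_exact)
```

When no closed form exists for a pair of distributions, the sampling-based estimate is the usual fallback, at the cost of added variance.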
| Task | Training Set Size | Standard NN | Bayesian NN | VAE | Normalizing Flow |
|---|---|---|---|---|---|
| MNIST Classification | 60k samples | 2 min | 8 min | 12 min | 15 min |
| CIFAR-10 Classification | 50k samples | 15 min | 45 min | 60 min | 90 min |
| CelebA Generation | 200k samples | N/A | N/A | 120 min | 180 min |

Training times benchmarked on an NVIDIA RTX 3090 GPU.
Probabilistic models typically require 2-4x more memory than standard models due to:
- Parameter uncertainty representation
- Additional forward/backward passes
- Sampling operations during training
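The first item can be made concrete with a parameter count. Assuming a mean-field normal posterior (one location and one scale per weight, which is an assumption about the posterior family rather than something stated here), a Bayesian dense layer stores twice as many trainable values as its deterministic counterpart. The sketch below uses a hypothetical 784-input, 64-unit layer; it does not model the extra memory used by sampling or additional passes.

```python
def dense_param_count(inputs, units, use_bias=True):
    """Trainable values in a deterministic dense layer."""
    return inputs * units + (units if use_bias else 0)

def mean_field_param_count(inputs, units, use_bias=True):
    """Trainable values when every weight gets a location and a scale."""
    return 2 * dense_param_count(inputs, units, use_bias)

deterministic = dense_param_count(784, 64)
bayesian = mean_field_param_count(784, 64)
ratio = bayesian / deterministic
print(deterministic, bayesian, ratio)
```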
| Aspect | TensorFlow Probability (TFP) | TensorFlow Core (TF) |
|---|---|---|
| Primary Focus | Probabilistic modeling, uncertainty quantification | Deterministic neural networks, optimization |
| Model Output | Distributions with uncertainty bounds | Point estimates |
| Key Strengths | Bayesian inference, generative modeling | Fast training, established workflows |
| Learning Curve | Steeper (requires probability theory) | Gentler (standard ML concepts) |
| Memory Usage | Higher (parameter distributions) | Lower (point parameters) |
| Training Time | Slower (sampling, variational inference) | Faster (direct optimization) |
| Interpretability | Higher (uncertainty quantification) | Lower (black box predictions) |
| Best Use Cases | Critical decisions, small data, research | Large datasets, production systems |
We welcome contributions from the community! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Auto-Encoding Variational Bayes - Kingma & Welling (2013)
- Stochastic Backpropagation and Approximate Inference in Deep Generative Models - Rezende et al. (2014)
- Weight Uncertainty in Neural Networks - Blundell et al. (2015)
- Variational Inference: A Review for Statisticians - Blei et al. (2017)
- Probabilistic Machine Learning and Artificial Intelligence - Ghahramani (2015)
- Normalizing Flows for Probabilistic Modeling and Inference - Papamakarios et al. (2019)
- Density estimation using Real NVP - Dinh et al. (2016)
- Glow: Generative Flow with Invertible 1x1 Convolutions - Kingma & Dhariwal (2018)
- Practical Variational Inference for Neural Networks - Graves (2011)
- What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? - Kendall & Gal (2017)
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles - Lakshminarayanan et al. (2017)
- Black Box Variational Inference - Ranganath et al. (2014)
- Automatic Differentiation Variational Inference - Kucukelbir et al. (2017)
- The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo - Hoffman & Gelman (2014)
- Gaussian Processes for Machine Learning - Rasmussen & Williams (2006)
- Variational Learning of Inducing Variables in Sparse Gaussian Processes - Titsias (2009)
- Predicting the Present with Bayesian Structural Time Series - Scott & Varian (2014)
- Deep State Space Models for Time Series Forecasting - Rangapuram et al. (2018)
- TensorFlow Distributions - Dillon et al. (2017)
- Probabilistic Programming and Bayesian Methods for Hackers - Davidson-Pilon (2015)
- Leveraging Heteroscedastic Aleatoric Uncertainties for Robust Real-Time LiDAR 3D Object Detection - Feng et al. (2019)
- Predictive Uncertainty Estimation via Prior Networks - Malinin & Gales (2018)
This project is licensed under the MIT License. See the LICENSE file for details.







