examples/survival_analysis/censored_data.myst.md
Lines changed: 10 additions & 29 deletions
@@ -5,9 +5,9 @@ jupytext:
     format_name: myst
     format_version: 0.13
 kernelspec:
-  display_name: Python [conda env:base] *
+  display_name: arviz_1
   language: python
-  name: conda-base-py
+  name: python3
 ---
 
 (censored_data)=
@@ -22,7 +22,7 @@ kernelspec:
 ```{code-cell} ipython3
 from copy import copy
 
-import arviz.preview as az
+import arviz as az
 import matplotlib.pyplot as plt
 import numpy as np
 import pymc as pm
@@ -36,42 +36,22 @@ rng = default_rng(1234)
 az.style.use("arviz-variat")
 ```
 
-[This example notebook on Bayesian survival
-analysis](https://www.pymc.io/projects/examples/en/latest/survival_analysis/survival_analysis.html) touches on the
-point of censored data. _Censoring_ is a form of missing-data problem, in which
-observations greater than a certain threshold are clipped down to that
-threshold, or observations less than a certain threshold are clipped up to that
-threshold, or both. These are called right, left and interval censoring,
-respectively. In this example notebook we consider interval censoring.
+[This example notebook on Bayesian survival analysis](https://www.pymc.io/projects/examples/en/latest/survival_analysis/survival_analysis.html) touches on the point of censored data. _Censoring_ is a form of missing-data problem, in which observations greater than a certain threshold are clipped down to that threshold, or observations less than a certain threshold are clipped up to that threshold, or both. These are called right, left and interval censoring, respectively. In this example notebook we consider interval censoring.
 
 Censored data arises in many modelling problems. Two common examples are:
 
-1. _Survival analysis:_ when studying the effect of a certain medical treatment
-on survival times, it is impossible to prolong the study until all subjects
-have died. At the end of the study, the only data collected for many patients
-is that they were still alive for a time period $T$ after the treatment was
-administered: in reality, their true survival times are greater than $T$.
+1. _Survival analysis:_ when studying the effect of a certain medical treatment on survival times, it is impossible to prolong the study until all subjects have died. At the end of the study, the only data collected for many patients is that they were still alive for a time period $T$ after the treatment was administered: in reality, their true survival times are greater than $T$.
 
-2. _Sensor saturation:_ a sensor might have a limited range and the upper and
-lower limits would simply be the highest and lowest values a sensor can
-report. For instance, many mercury thermometers only report a very narrow
-range of temperatures.
+2. _Sensor saturation:_ a sensor might have a limited range and the upper and lower limits would simply be the highest and lowest values a sensor can report. For instance, many mercury thermometers only report a very narrow range of temperatures.
 
 This example notebook presents two different ways of dealing with censored data
 in PyMC:
 
-1. An imputed censored model, which represents censored data as parameters and
-makes up plausible values for all censored values. As a result of this
-imputation, this model is capable of generating plausible sets of made-up
-values that would have been censored. Each censored element introduces a
-random variable.
+1. An imputed censored model, which represents censored data as parameters and makes up plausible values for all censored values. As a result of this imputation, this model is capable of generating plausible sets of made-up values that would have been censored. Each censored element introduces a random variable.
 
-2. An unimputed censored model, where the censored data are integrated out and
-accounted for only through the log-likelihood. This method deals more
-adequately with large amounts of censored data and converges more quickly.
+2. An unimputed censored model, where the censored data are integrated out and accounted for only through the log-likelihood. This method deals more adequately with large amounts of censored data and converges more quickly.
 
-To establish a baseline we compare to an uncensored model of the uncensored
-data.
+To establish a baseline we compare to an uncensored model of the uncensored data.
 
 ```{code-cell} ipython3
 # Produce normally distributed samples
@@ -238,6 +218,7 @@ As we can see, both censored models appear to capture the mean and variance of t
 - Updated by [Benjamin Vincent](https://github.com/drbenvincent) in May 2021.
 - Updated by [Benjamin Vincent](https://github.com/drbenvincent) in May 2022.
 - Updated by [Osvaldo Martin](https://github.com/aloctavodia) in Dec 2025.
+- Updated by [Osvaldo Martin](https://github.com/aloctavodia) in Apr 2026.
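
As context for the reflowed paragraphs in this diff, the "unimputed" idea (censored values integrated out and accounted for only through the log-likelihood) can be sketched in plain Python. This is a minimal illustration, not the notebook's PyMC code: it assumes a normal model, and the helper names (`censor`, `censored_loglike`) and the numeric values for the mean, standard deviation, and bounds are hypothetical choices made for the example.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of Normal(mu, sigma) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def censor(samples, low, high):
    """Interval censoring: values below `low` are clipped up, above `high` clipped down."""
    return [min(max(x, low), high) for x in samples]


def censored_loglike(data, low, high, mu, sigma):
    """Log-likelihood of interval-censored data with censored values integrated out."""
    n_left = sum(1 for x in data if x <= low)
    n_right = sum(1 for x in data if x >= high)
    # Uncensored points contribute their ordinary log density.
    total = sum(normal_logpdf(x, mu, sigma) for x in data if low < x < high)
    # Each left-censored point contributes log P(X <= low).
    if n_left:
        total += n_left * math.log(normal_cdf(low, mu, sigma))
    # Each right-censored point contributes log P(X >= high).
    if n_right:
        total += n_right * math.log(1.0 - normal_cdf(high, mu, sigma))
    return total


# Illustrative values only (assumptions for this sketch).
rng = random.Random(1234)
true_mu, true_sigma = 13.0, 5.0
low, high = 3.0, 16.0

samples = [rng.gauss(true_mu, true_sigma) for _ in range(500)]
censored = censor(samples, low, high)

ll_true = censored_loglike(censored, low, high, true_mu, true_sigma)
ll_off = censored_loglike(censored, low, high, 30.0, true_sigma)
print(f"log-likelihood at true parameters: {ll_true:.1f}")
print(f"log-likelihood at a shifted mean:  {ll_off:.1f}")
```

No per-observation latent variable is introduced here, which is the contrast with the imputed model, where each censored element adds a random variable. In PyMC the same likelihood is typically expressed by wrapping a distribution in `pm.Censored` with `lower` and `upper` bounds.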