src/examples-in-five-safes-tes/aggregating-statistics.md
Lines changed: 153 additions & 0 deletions
@@ -3,4 +3,157 @@ theme: air
style: ../entrust-style.css
title: Aggregating basic statistics
---
# Aggregating basic statistics

This tutorial can be run as a Jupyter notebook from the [5s-TES notebooks repository](https://github.com/Health-Informatics-UoN/5s-TES-notebooks/).

The outputs from TES tasks can be used to calculate basic statistics.

This example uses summary statistics from a dataset in the OMOP common data model.
A container is available that, given a SQL query filtering an OMOP table by your criteria, calculates the summary statistics needed for your final analysis.

These example data were collected using the [Custom Image wizard](submission-layer-wizards#custom-image) in the submission layer with these settings changed from default:
        \"--user-query=SELECT value_as_number FROM public.measurement \\nWHERE measurement_concept_id = 3000905\\nAND value_as_number IS NOT NULL\",
        \"--analysis=variance\",
        \"--output-filename=/outputs/output\",
        \"--output-format=json\"
      ],
      \"workdir\": \"/app\",
      \"stdin\": null,
      \"stdout\": null,
      \"stderr\": null,
      \"env\": {}
    }
  ],
  \"volumes\": null,
  \"tags\": {
    \"Project\": \"NottinghamDemo\",
    \"tres\": \"Nottingham TRE 01|Nottingham TRE 02\"
  },
  \"logs\": null,
  \"creation_time\": null
}
```
</details>

The `aggregate_utils` module provided with this notebook lets you calculate statistics for the overall population by aggregating intermediate results.
```python
from pathlib import Path
from aggregate_utils import VarianceIntermediate, TTestIntermediate, make_variance_intermediate_from_json, aggregate_variance_intermediates
import numpy as np
```
The example data are held in `./data`.

```python
paths = {
    "tre1": "./data/variance-tre-1.json",
    "tre2": "./data/variance-tre-2.json"
}
```

The `make_variance_intermediate_from_json` function reads the data from the JSON file and returns an intermediate result that can be aggregated.
The returned values hold the count (`n`), the sum (`total`), and the sum of squares (`sum_x2`) of the values read from the original table.
These three pieces of information are sufficient to calculate several other summary statistics.
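
As a rough illustration of why these three sums are enough, the sketch below uses a hypothetical `SketchIntermediate` class. It is not the `aggregate_utils` implementation, whose internals are not shown here, and the input values are made up. It assumes the sample variance with an `n - 1` denominator; the real module may use a different convention. Aggregating two sites amounts to adding the three sums, after which the mean and variance of the pooled sample follow directly, without either site sharing row-level values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchIntermediate:
    """Hypothetical stand-in for an intermediate; not the aggregate_utils class."""

    n: int          # number of values
    total: float    # sum of the values
    sum_x2: float   # sum of the squared values

    @classmethod
    def from_values(cls, values):
        # This is the only step that needs the row-level values.
        return cls(len(values), sum(values), sum(v * v for v in values))

    def __add__(self, other):
        # Aggregation is element-wise addition of the three sums.
        return SketchIntermediate(
            self.n + other.n,
            self.total + other.total,
            self.sum_x2 + other.sum_x2,
        )

    @property
    def mean(self):
        return self.total / self.n

    @property
    def variance(self):
        # Sample variance (n - 1 denominator): an assumption of this sketch.
        return (self.sum_x2 - self.total ** 2 / self.n) / (self.n - 1)


# Made-up values standing in for two sites.
site_a = SketchIntermediate.from_values([2.0, 4.0, 4.0])
site_b = SketchIntermediate.from_values([5.0, 7.0, 8.0])
pooled = site_a + site_b
print(pooled.mean, pooled.variance)
```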
The `mean` and `variance` properties of the aggregated intermediate describe the whole sample.
```python
aggregated_intermediate.mean
```

`35.69751154820444`

```python
aggregated_intermediate.variance
```

`1010.6365591480935`

The values are from a very skewed distribution.
To demonstrate how the same information can be used to conduct other common statistical analyses, random samples from a normal distribution can be generated with this code:

```python
mu, sigma = 50, 10
rng = np.random.default_rng()
s = rng.normal(mu, sigma, 10)
```

A `TTestIntermediate` uses the same three pieces of information as a `VarianceIntermediate`.
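
To make this concrete, here is a minimal sketch of a one-sample t statistic computed only from the count, the sum, and the sum of squares. The `one_sample_t` helper and the `popmean` argument are hypothetical names chosen for illustration, and the input values are made up; how `TTestIntermediate` is actually implemented, and which kind of t-test it supports, is not shown here. Note that the sum-of-squares formula for the variance can lose precision when the mean is large relative to the spread.

```python
import math


def one_sample_t(n, total, sum_x2, popmean):
    """Hypothetical helper: t statistic and degrees of freedom from three sums."""
    mean = total / n
    # Sample variance with an n - 1 denominator.
    variance = (sum_x2 - total ** 2 / n) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return (mean - popmean) / standard_error, n - 1


# Made-up values; in a federated setting only the three sums would leave a site.
values = [48.0, 52.0, 50.0, 54.0, 46.0]
n = len(values)
total = sum(values)
sum_x2 = sum(v * v for v in values)

t_stat, dof = one_sample_t(n, total, sum_x2, popmean=45.0)
print(t_stat, dof)
```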
src/examples-in-five-safes-tes/contingency-tables.md
Lines changed: 155 additions & 0 deletions
@@ -3,4 +3,159 @@ theme: air
style: ../entrust-style.css
title: Aggregating contingency tables
---
# Aggregating data for contingency tables

This tutorial can be run as a Jupyter notebook from the [5s-TES notebooks repository](https://github.com/Health-Informatics-UoN/5s-TES-notebooks/).

Federated analysis on contingency tables is relatively simple.
Counts are easy to federate: each TRE calculates its local count for some group, then these are aggregated by adding the counts together.
Each cell of a contingency table is a count, so the table can be federated by requesting these counts, and then statistical analyses can be performed on the aggregate.

```mermaid
graph TD
    subgraph 5s-TES
        sub(Submission layer)
        tre1(TRE 1)
        tre2(TREs ..n)
    end
    user -- Request counts --> sub
    sub -- Request counts --> tre1
    sub -- Request counts --> tre2
    tre1 -- counts --> agg(User)
    tre2 -- counts --> agg
    agg -- Sum counts --> Result
```

The example data were produced by running the [Custom Image wizard](submission-layer-wizards#custom-image) using the following parameters:
| Commands | --user-query=SELECT g.concept_name AS gender_name, r.concept_name AS race_name\\nFROM public.person p\\nJOIN public.concept g ON p.gender_concept_id = g.concept_id\\nJOIN public.concept r ON p.race_concept_id = r.concept_id\\nWHERE p.race_concept_id IN (8515, 8516, 8527)<br>--analysis=contingency_table<br>--output-filename=/outputs/output<br>--output-format=json |

The UI should look like this:

<details>
<summary>Expand for example JSON</summary>

```json
{
  "id": "504",
  "name": "test chi-sq",
  "description": "Federated analysis task",
  "inputs": null,
  "outputs": [
    {
      "name": "Query Results",
      "description": "Results from the requested query execution",
        "--user-query=SELECT g.concept_name AS gender_name, r.concept_name AS race_name\nFROM public.person p\nJOIN public.concept g ON p.gender_concept_id = g.concept_id\nJOIN public.concept r ON p.race_concept_id = r.concept_id\nWHERE p.race_concept_id IN (8515, 8516, 8527)",
        "--analysis=contingency_table",
        "--output-filename=/outputs/output",
        "--output-format=json"
      ],
      "workdir": "/app",
      "stdin": null,
      "stdout": null,
      "stderr": null,
      "env": {}
    }
  ],
  "volumes": null,
  "tags": {
    "Project": "NottinghamDemo",
    "tres": "Nottingham TRE 01|Nottingham TRE 02"
  },
  "logs": null,
  "creation_time": null
}
```
</details>

```python
import pandas as pd
from contingency_table_utils import read_contingency_table_from_json, aggregate_tables

from scipy.stats import chi2_contingency
```

The JSON produced by this analysis can be read into tables using the supplied `contingency_table_utils` module.
The data aren't very interesting: they simply report how many men and women of three ethnicities there are in the synthetic datasets. They do, however, show how contingency tables can be assembled.

`aggregate_tables` checks that your tables have the same variables, and sums the counts if they do.

```python
aggregate = aggregate_tables([tre1, tre2])
```
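
The idea behind this kind of aggregation can be sketched in a few lines. The `sum_tables` function and the dictionary layout below are hypothetical and chosen only for illustration; they are not the `contingency_table_utils` implementation, and the counts are made up rather than taken from the example datasets. The sketch first checks that every table describes the same variables, then adds the counts cell by cell, treating a cell missing from one site as zero.

```python
from collections import Counter


def sum_tables(tables):
    """Hypothetical sketch: sum cell counts across tables with matching variables."""
    variables = tables[0]["variables"]
    for table in tables[1:]:
        if table["variables"] != variables:
            raise ValueError("tables describe different variables")
    counts = Counter()
    for table in tables:
        # Counter.update adds to existing counts instead of replacing them.
        counts.update(table["counts"])
    return {"variables": variables, "counts": dict(counts)}


# Made-up counts standing in for the results returned by two TREs.
site_a = {
    "variables": ("race_name", "gender_name"),
    "counts": {("Asian", "FEMALE"): 3, ("Asian", "MALE"): 2, ("White", "FEMALE"): 5},
}
site_b = {
    "variables": ("race_name", "gender_name"),
    "counts": {("Asian", "FEMALE"): 1, ("White", "FEMALE"): 4, ("White", "MALE"): 6},
}
combined = sum_tables([site_a, site_b])
print(combined["counts"])
```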
The `contingency_table` property organises these data into the format for statistical analyses.

```python
aggregate.contingency_table
```

|                           | FEMALE | MALE |
|---------------------------|--------|------|
| Asian                     | 1011   | 982  |
| Black or African American | 1080   | 1022 |
| White                     | 1426   | 1441 |

This format can be used with `scipy.stats` contingency table functions.
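
To show what such a function computes, the following sketch works out Pearson's chi-squared statistic by hand from the aggregated counts in the table above, using only plain Python. It is an illustration, not the tutorial's own analysis code. Passing the same counts to `scipy.stats.chi2_contingency` should give the same statistic and degrees of freedom, since Yates' continuity correction only applies when there is a single degree of freedom, and it additionally returns a p-value and the expected counts.

```python
# Aggregated counts copied from the table above
# (rows: Asian, Black or African American, White; columns: FEMALE, MALE).
observed = [
    [1011, 982],
    [1080, 1022],
    [1426, 1441],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count per cell under independence:
# row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected.
chi2 = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)
dof = (len(row_totals) - 1) * (len(col_totals) - 1)
print(f"chi2 = {chi2:.3f}, dof = {dof}")
```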