paper/paper.md
Lines changed: 12 additions & 11 deletions
@@ -40,12 +40,12 @@ affiliations:
 # Summary

 Study Data Tabulation Model (SDTM), developed by the Clinical Data Interchange Standards Consortium (CDISC), is an internationally adopted
-data storage standard [@cdisc]. The model provides a coherent framework for harmonising data from
+data storage standard [@cdisc]. The model provides a coherent framework for harmonising data from
 multiple clinical studies. Using a modified implementation of SDTM, the Infectious Diseases Data Observatory (IDDO) has developed a standardised
 repository of individual participant data from clinical studies spanning diseases of global health importance [@iddo]. Generating datasets amenable
-to analysis (‘analysis datasets’) from SDTM format requires complex transformations, which necessitates expenditure of additional time and resource by end-users.
+to analysis (‘analysis datasets’) from SDTM format requires complex transformations, which requires additional time and resource by end-users.

-The `iddoverse` package provides a suite of functions that transforms datasets from SDTM format into analysis datasets. Assuming a foundational
+The `iddoverse` package provides a suite of functions that transforms datasets from SDTM into analysis datasets. Assuming a foundational
 familiarity with the R software [@r_core], the package is intended for epidemiologists, statisticians, and data scientists with a diverse
 range of programming ability. Advanced knowledge of SDTM is not required, thereby removing a potential challenge for data requesters, especially
 those in Low- and Middle- Income Countries (LMICs) where SDTM training and expertise is often more difficult to access. Overall, the `iddoverse`
@@ -58,7 +58,7 @@ and the adoption of a responsible open-access data model [@pisani_lessons; @pisa
 important questions using individual participant data meta-analysis (IPD-MA) techniques. To achieve this, IDDO standardises clinical study
 data from disparate sources into SDTM format, thereby providing a consistent and comprehensive method to harmonise data. The result is
 a controlled, open-access database, which enables the global research community to address questions of public health relevance that would
-otherwise not be possible using a standalone study. A key challenge, however, is the effort and time required for transformation of data
+otherwise not be possible using a standalone study. A key challenge, however, is the effort and time required for transformation of the curated data
 to an analysis-ready format. This mandates a working knowledge of the SDTM format and nomenclature, in addition to a good knowledge of IDDO’s
 specific implementation of SDTM (a necessary modification to existing implementation guidelines due to heterogeneity in legacy data [@iddo_KKSS]).
 End-users unfamiliar with IDDO’s SDTM implementation can therefore encounter challenges in utilising the data.
@@ -78,7 +78,7 @@ for pharmaceutical regulatory submission.

 Data stored in SDTM format comprises a series of subsets called domains, with each domain corresponding to a specific data topic.
 Domains are tabular and usually stored in a long format; typically a row per event per participant, which can result in multiple rows
-per participant, per day. The package provides several synthetic datasets generated for user familiarisation and documentation. SDTM
+per participant, per timepoint. The package provides several synthetic datasets generated for user familiarisation and documentation. SDTM
 findings domains, such as the microbiology (MB) domain, can have up to 4 columns for capturing the test results (i.e. `MBORRES`, `MBSTRESC` & `MBSTRESN`)
 and a selection of over 20 timing variables. The MB domain, for example (see `MB_RPTESTB` below), has one row for every microbiology test and the test
 result conducted in the study. Whilst this preserves the intricacies of the study data, it also creates complexity for analysis.
@@ -133,7 +133,7 @@ A key limitation is that the iddoverse functions cannot address every need of re
 in the datasets within, and across, diseases. The objective has been to provide assistance and automation of analysis
 datasets, whilst keeping the solution generalisable and customisable by the user.

-
+

 # iddoverse Functions

@@ -148,11 +148,11 @@ diseases and will not be relevant to most. The hierarchy of timing variables is
 parameter to enable researchers to select the most appropriate variable(s) for their analysis. By choosing a ‘best choice’ timing
 and result, potential confusion surrounding multiple columns is removed.

-
+

 The `prepare_domain()` function then pivots the rows by the best choice time variable (`TIME`, `TIME_SOURCE`), the study ID (`STUDYID`)
 and participant number (`USUBJID`). The different events/findings/tests are transformed into columns, and the dataset is populated
-with the associated result. Several domains can then be analysed separately, or joined together by the uniquely identifying keys: `STUDYID`, `USUBJID`, `TIME` and `TIME_SOURCE`.
+with the associated result. Several domains can then be analysed separately or joined together by the uniquely identifying keys: `STUDYID`, `USUBJID`, `TIME` and `TIME_SOURCE`.

 ```r
 > prepare_domain(MB_RPTESTB, "mb")
@@ -189,15 +189,16 @@ selection of parameters within `prepare_domain()`. Additionally, some utility an

 # Research Impact Statement

+The IDDO data repository contains over 1.3 million IPD records from over 600 studies and 70 countries, across 8 disease themes. Since February 2026,
+all researchers accessing data from the IDDO data repository have been provided with information on how to use and access the iddoverse package,
+thus increasing the number and diversity of users and organisations.
+
 Research which has used the `iddoverse` includes malaria studies conducted by the Liverpool School of Tropical Medicine and the
 Mahidol-Oxford Tropical Medicine Research Unit in Thailand (unpublished at time of writing). Published works which have used the `iddoverse`
 package include research on factors associated with death from Ebola [@trokon] and two visceral leishmaniasis meta-analyses
 [@munir; @kumar]. Additionally, the package is being used internally by IDDO to create summaries to check, and subsequently
 improve, the quality of the data standardisation.

-Since February 2026, all researchers accessing the IDDO data repository have been provided with information on how to use and access the
-`iddoverse` package, thus increasing the number and diversity of users and organisations.
-
 In a survey conducted by IDDO, several users of the IDDO data repository have reported confusion with the variety of timing variables and
 result columns. Users will benefit from the `iddoverse` package as a solution to this complexity.