Commit 56fb036

Addressing author feedback
1 parent 07c0fc2 commit 56fb036

1 file changed

Lines changed: 12 additions & 11 deletions

paper/paper.md

@@ -40,12 +40,12 @@ affiliations:
4040
# Summary
4141

4242
Study Data Tabulation Model (SDTM), developed by the Clinical Data Interchange Standards Consortium (CDISC), is an internationally adopted
43-
data storage standard [@cdisc]. The model provides a coherent framework for harmonising data from
43+
data storage standard [@cdisc]. The model provides a coherent framework for harmonising data from
4444
multiple clinical studies. Using a modified implementation of SDTM, the Infectious Diseases Data Observatory (IDDO) has developed a standardised
4545
repository of individual participant data from clinical studies spanning diseases of global health importance [@iddo]. Generating datasets amenable
46-
to analysis (‘analysis datasets’) from SDTM format requires complex transformations, which necessitates expenditure of additional time and resource by end-users.
46+
to analysis (‘analysis datasets’) from SDTM format requires complex transformations, which demand additional time and resources from end-users.
4747

48-
The `iddoverse` package provides a suite of functions that transforms datasets from SDTM format into analysis datasets. Assuming a foundational
48+
The `iddoverse` package provides a suite of functions that transforms datasets from SDTM into analysis datasets. Assuming a foundational
4949
familiarity with the R software [@r_core], the package is intended for epidemiologists, statisticians, and data scientists with a diverse
5050
range of programming ability. Advanced knowledge of SDTM is not required, thereby removing a potential challenge for data requesters, especially
5151
those in Low- and Middle-Income Countries (LMICs) where SDTM training and expertise are often more difficult to access. Overall, the `iddoverse`
@@ -58,7 +58,7 @@ and the adoption of a responsible open-access data model [@pisani_lessons; @pisa
5858
important questions using individual participant data meta-analysis (IPD-MA) techniques. To achieve this, IDDO standardises clinical study
5959
data from disparate sources into SDTM format, thereby providing a consistent and comprehensive method to harmonise data. The result is
6060
a controlled, open-access database, which enables the global research community to address questions of public health relevance that would
61-
otherwise not be possible using a standalone study. A key challenge, however, is the effort and time required for transformation of data
61+
otherwise not be possible using a standalone study. A key challenge, however, is the effort and time required for transformation of the curated data
6262
to an analysis-ready format. This mandates a working knowledge of the SDTM format and nomenclature, in addition to a good knowledge of IDDO’s
6363
specific implementation of SDTM (a necessary modification to existing implementation guidelines due to heterogeneity in legacy data [@iddo_KKSS]).
6464
End-users unfamiliar with IDDO’s SDTM implementation can therefore encounter challenges in utilising the data.
@@ -78,7 +78,7 @@ for pharmaceutical regulatory submission.
7878

7979
Data stored in SDTM format comprises a series of subsets called domains, with each domain corresponding to a specific data topic.
8080
Domains are tabular and usually stored in a long format; typically a row per event per participant, which can result in multiple rows
81-
per participant, per day. The package provides several synthetic datasets generated for user familiarisation and documentation. SDTM
81+
per participant, per timepoint. The package provides several synthetic datasets generated for user familiarisation and documentation. SDTM
8282
findings domains, such as the microbiology (MB) domain, can have up to 4 columns for capturing the test results (e.g. `MBORRES`, `MBSTRESC` and `MBSTRESN`)
8383
and a selection of over 20 timing variables. The MB domain, for example (see `MB_RPTESTB` below), has one row for every microbiology test and the test
8484
result conducted in the study. Whilst this preserves the intricacies of the study data, it also creates complexity for analysis.
@@ -133,7 +133,7 @@ A key limitation is that the iddoverse functions cannot address every need of re
133133
in the datasets within, and across, diseases. The objective has been to provide assistance and automation of analysis
134134
datasets, whilst keeping the solution generalisable and customisable by the user.
135135

136-
![Figure 1: Flowchart of functions within the `iddoverse` package.](figures/Function Flowchart.tif)
136+
![Figure 1: Flowchart of functions within the `iddoverse` package.](figures/Figure 1 - Function Flowchart.tif)
137137

138138
# iddoverse Functions
139139

@@ -148,11 +148,11 @@ diseases and will not be relevant to most. The hierarchy of timing variables is
148148
parameter to enable researchers to select the most appropriate variable(s) for their analysis. By choosing a ‘best choice’ timing
149149
and result, potential confusion surrounding multiple columns is removed.
150150

151-
![Figure 2: Hierarchy of best choice results/events/findings in `prepare_domain()`. `STRESN` or `DECOD` would be used in the first instance and, where rows are missing this information, they are populated with the variables under them in order. The two letter domain code preceeds these variable names.](figures/Hierarchy Choices.tif)
151+
![Figure 2: Hierarchy of best choice results/events/findings in `prepare_domain()`. `STRESN` or `DECOD` would be used in the first instance and, where rows are missing this information, they are populated with the variables under them in order. The two-letter domain code precedes these variable names.](figures/Figure 2 - Hierarchy Choices.tif)
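
The best choice hierarchy can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper `first_non_missing()` is hypothetical, the data are invented, and the ordering `MBSTRESN`, then `MBSTRESC`, then `MBORRES` is an assumption made for illustration, since the full ordering is given only in the figure.

```r
# Invented long-format microbiology rows; column names follow the SDTM
# pattern of a two-letter domain code plus STRESN / STRESC / ORRES.
mb <- data.frame(
  USUBJID  = c("P1", "P2", "P3"),
  MBSTRESN = c(12.5, NA, NA),
  MBSTRESC = c("12.5", "POSITIVE", NA),
  MBORRES  = c("12.5", "pos", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not an iddoverse function): walk the columns in the
# given order and keep, for each row, the first value that is not missing.
first_non_missing <- function(data, columns) {
  result <- rep(NA_character_, nrow(data))
  for (column in columns) {
    candidate <- as.character(data[[column]])
    fill <- is.na(result) & !is.na(candidate)
    result[fill] <- candidate[fill]
  }
  result
}

# Assumed hierarchy for this sketch: numeric standardised result first,
# then character standardised result, then the original result.
mb$RESULT <- first_non_missing(mb, c("MBSTRESN", "MBSTRESC", "MBORRES"))
print(mb[, c("USUBJID", "RESULT")])
```

Each row ends up with a single result column, so a row lacking a standardised numeric value falls back to the next populated variable rather than being dropped.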
152152

153153
The `prepare_domain()` function then pivots the rows by the best choice time variable (`TIME`, `TIME_SOURCE`), the study ID (`STUDYID`)
154154
and participant number (`USUBJID`). The different events/findings/tests are transformed into columns, and the dataset is populated
155-
with the associated result. Several domains can then be analysed separately, or joined together by the uniquely identifying keys: `STUDYID`, `USUBJID`, `TIME` and `TIME_SOURCE`.
155+
with the associated result. Several domains can then be analysed separately or joined together by the uniquely identifying keys: `STUDYID`, `USUBJID`, `TIME` and `TIME_SOURCE`.
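
As a rough sketch of the pivot and join described above, the following base R code reshapes invented long-format rows into one row per participant and timepoint, then joins a second table on the four keys. It does not call `prepare_domain()`. The columns `TEST`, `RESULT` and `TEMP`, and all values, are assumptions for illustration only.

```r
# Invented long-format rows: one row per test per participant per timepoint.
long <- data.frame(
  STUDYID     = "S1",
  USUBJID     = c("P1", "P1", "P2"),
  TIME        = c(0, 0, 0),
  TIME_SOURCE = "DY",
  TEST        = c("HIV", "TB", "HIV"),
  RESULT      = c("NEGATIVE", "POSITIVE", "POSITIVE"),
  stringsAsFactors = FALSE
)

keys <- c("STUDYID", "USUBJID", "TIME", "TIME_SOURCE")

# Pivot: each distinct test becomes a column populated with its result.
wide <- reshape(long, idvar = keys, timevar = "TEST", direction = "wide")
names(wide) <- sub("^RESULT\\.", "", names(wide))

# A second, already wide table standing in for another prepared domain.
vitals <- data.frame(
  STUDYID     = "S1",
  USUBJID     = c("P1", "P2"),
  TIME        = c(0, 0),
  TIME_SOURCE = "DY",
  TEMP        = c(37.2, 38.5),
  stringsAsFactors = FALSE
)

# Join the two tables on the uniquely identifying keys.
analysis <- merge(wide, vitals, by = keys, all = TRUE)
print(analysis)
```

Tests that were not recorded for a participant appear as `NA` in the wide table, which keeps one row per key combination and makes the join on the four keys straightforward.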
156156

157157
```r
158158
> prepare_domain(MB_RPTESTB, "mb")
@@ -189,15 +189,16 @@ selection of parameters within `prepare_domain()`. Additionally, some utility an
189189

190190
# Research Impact Statement
191191

192+
The IDDO data repository contains over 1.3 million IPD records from over 600 studies and 70 countries, across 8 disease themes. Since February 2026,
193+
all researchers accessing data from the IDDO data repository have been provided with information on how to use and access the iddoverse package,
194+
thus increasing the number and diversity of users and organisations.
195+
192196
Research which has used the `iddoverse` includes malaria studies conducted by the Liverpool School of Tropical Medicine and the
193197
Mahidol-Oxford Tropical Medicine Research Unit in Thailand (unpublished at time of writing). Published works which have used the `iddoverse`
194198
package include research on factors associated with death from Ebola [@trokon] and two visceral leishmaniasis meta-analyses
195199
[@munir; @kumar]. Additionally, the package is being used internally by IDDO to create summaries to check, and subsequently
196200
improve, the quality of the data standardisation.
197201

198-
Since February 2026, all researchers accessing the IDDO data repository have been provided with information on how to use and access the
199-
`iddoverse` package, thus increasing the number and diversity of users and organisations.
200-
201202
In a survey conducted by IDDO, several users of the IDDO data repository have reported confusion with the variety of timing variables and
202203
result columns. Users will benefit from the `iddoverse` package as a solution to this complexity.
203204
