Commit 7a46e16

Update to process ASA24 dietary records
Polyphenol Estimator can now process prospective dietary records, including multi-day records or multiple one-day records. The update also prefixes function calls with their package (for example, dplyr::filter) to clarify where each function comes from.
1 parent 27b63ad commit 7a46e16

32 files changed

Lines changed: 1151 additions & 654 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Polyphenol Estimator

-This repository contains scripts to automate the estimation of dietary polyphenol intake and calculation of the dietary inflammatory index from ASA24 or NHANES diet recalls.
+This repository contains scripts to automate the estimation of dietary polyphenol intake and calculation of the dietary inflammatory index from ASA24 recalls, ASA24 records, and NHANES WWEIA recalls.

 ### Releases
 - November 20, 2025 - Tutorial Draft Release
@@ -21,7 +21,7 @@ Want to estimate polyphenols in your dietary data? Please review our start-up gu

 | Inputs | Provided | About |
 |------------ |--------- |---------|
-| Diet Data | No | ASA24 Items File, or NHANES<br>Note: Current pipeline requires each participant to have at least two recalls. |
+| Diet Data | No | ASA24 Items File, or NHANES<br>Note: Current pipeline requires each participant to have at least two recalls, records, or record days. |
 | FDA Food Disaggregation Database V 3.1 | Yes | FDA's Food Disaggregation Database contains Ingredients and their percentages within FNDDS food codes. |
 | FooDB food polyphenol content | Yes | Contains polyphenol content in foods. Polyphenols were determined based off structure (an aromatic ring with at least two hydroxyl groups) with 9 compounds manually added to better reflect microbial enzyme substrates. |
 | FooDB polyphenol list | Yes | List of 3072 polyphenols. File includes FooDB compound ID, compound name, SMILES, InChI key, and taxonomic class. Taxonomic class is from ClassyFire, an automated chemical taxonomic classification application based on chemical structure. |

_config.yml

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 title: Polyphenol Estimator
-description: A tutorial on how to obtain estimates of polyphenol intake and DII from diet recall data.
+description: A tutorial on how to obtain estimates of polyphenol intake and DII from diet recall or record data.
 theme: just-the-docs
 markdown: kramdown

functions/calculate_DII.R

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ calculate_DII = function(report = c("none", "html", "md")) {

   # The Polyphenol Estimation Pipeline Needs to run first.
   # Output from this pipeline kicks off the DII calculation scripts
-  starting_file = "outputs/Recall_Disaggregated_mapped.csv.bz2"
+  starting_file = "outputs/Diet_Disaggregated_mapped.csv.bz2"

   # Check if it was by confirming Disaggregated Dietary Data File exists
   if (!file.exists(starting_file)) {

functions/estimate_polyphenols.R

Lines changed: 25 additions & 7 deletions
@@ -59,16 +59,34 @@ estimate_polyphenols = function(
     stop("The diet data file must contain the column '", id_var, "' for ", type, ".")
   }

-  # Obtain recall counts by user
-  recalls_per_user = diet_dat %>%
-    group_by(.data[[id_var]]) %>%
-    summarise(n_recalls = n_distinct(RecallNo), .groups = "drop")
+  # Determine if Recall or Record
+  if ("RecallNo" %in% names(diet_dat)) {
+    # Recall dataset
+    n_events = diet_dat %>%
+      group_by(.data[[id_var]]) %>%
+      summarise(n_events = n_distinct(RecallNo), .groups = "drop")
+
+  } else if ("RecordNo" %in% names(diet_dat) || "RecordDayNo" %in% names(diet_dat)) {
+    # Record dataset
+    n_events = diet_dat %>%
+      group_by(.data[[id_var]]) %>%
+      summarise(
+        n_events = max(
+          n_distinct(RecordNo %||% 0),
+          n_distinct(RecordDayNo %||% 0)
+        ),
+        .groups = "drop"
+      )
+
+  } else {
+    stop("Dataset has neither RecallNo nor RecordNo/RecordDayNo columns.")
+  }

   # Recall Status Message
-  if (max(recalls_per_user$n_recalls, na.rm = TRUE) < 2) {
-    stop("The diet recall file does not contain multiple recalls per participant.")
+  if (max(n_events$n_events, na.rm = TRUE) < 2) {
+    stop("Your diet file does not contain multiple recalls or records per participant.")
   } else {
-    message("Multiple recalls detected across participants.\n")
+    message("Multiple recalls or records detected across participants.\n")
   }

   ##########################################
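
One thing to watch in the record branch above: `%||%` only substitutes its right-hand side when the left-hand side is `NULL`, so referring to a column that is absent from the data would raise an "object not found" error rather than fall back to `0`. The sketch below is one way to count events while tolerating a missing `RecordNo` or `RecordDayNo` column. `count_diet_events()` is a hypothetical helper, not part of the repository, and it counts distinct `RecordNo`/`RecordDayNo` combinations, which is an assumption that differs from the committed `max()` of the two distinct counts.

```r
library(dplyr)

# Hypothetical helper (not in the repository): count distinct diet events
# per participant for either recall or record data.
count_diet_events = function(diet_dat, id_var) {
  if ("RecallNo" %in% names(diet_dat)) {
    event_cols = "RecallNo"
  } else {
    # Keep only the record columns that are actually present
    event_cols = intersect(c("RecordNo", "RecordDayNo"), names(diet_dat))
  }
  if (length(event_cols) == 0) {
    stop("Dataset has neither RecallNo nor RecordNo/RecordDayNo columns.")
  }

  diet_dat %>%
    # One row per participant and event
    distinct(across(all_of(c(id_var, event_cols)))) %>%
    group_by(across(all_of(id_var))) %>%
    summarise(n_events = n(), .groups = "drop")
}

# Toy data, invented for illustration: RecordNo is deliberately absent
toy_records = data.frame(
  UserName = c("a", "a", "a", "b"),
  RecordDayNo = c(1, 2, 2, 1)
)
events = count_diet_events(toy_records, "UserName")

# Participants with fewer than two events
too_few = events$UserName[events$n_events < 2]
```

The committed check passes as soon as any participant has two events, because it tests the maximum count. If the README note is meant per participant, a filter such as `too_few` would list who falls short.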

index.md

Lines changed: 3 additions & 7 deletions
@@ -6,7 +6,7 @@ nav_order: 1

 # Polyphenol Estimator

-This start guide shows you how to take your ASA24 or NHANES dietary data and estimate polyphenol intake using the [FooDB](https://foodb.ca/) and calculate the dietary inflammatory index[^1]. Example ASA24 data, borrowed from the DietDiveR Repository[^2], is provided for you to test. Check out [the example file here](https://github.com/SWi1/polyphenol_pipeline/blob/main/user_inputs/VVKAJ_Items.csv) to see the input structure required for Polyphenol Estimator.
+This start guide shows you how to take your ASA24 or NHANES dietary data and estimate polyphenol intake using the [FooDB](https://foodb.ca/) and calculate the dietary inflammatory index [(Shivappa et al. 2013)](https://doi.org/10.1017/S1368980013002115). Example ASA24 data, borrowed from the [DietDiveR Repository](https://computational-nutrition-lab.github.io/DietDiveR/), is provided for you to test. Check out [the example file here](https://github.com/SWi1/polyphenol_pipeline/blob/main/user_inputs/VVKAJ_Items.csv) to see the input structure required for Polyphenol Estimator.

 ### 1. Download the entire repository directly [here](https://github.com/SWi1/polyphenol_pipeline/archive/refs/heads/main.zip) then unzip the folder.
 The repository contains files and scripts used in the tutorial.
@@ -28,16 +28,12 @@ Find a list of expected outputs below:
 <details>
 <summary>Reports: See What's in Each Script</summary>
 <ul>
-For every script that was run, a report was generated in the reports folder.
-This online tutorial actually shows you what the reports look like if you navigate to pages under "Polyphenol Estimator" and "DII Calculation" in your sidebar.
+If you opt to generate md or html reports, a readable report of each script used is placed in your reports folder. You can preview the latest reports by navigating to the pages under "Polyphenol Estimator" and "DII Calculation" in your sidebar. Generating reports can help you keep a record of which scripts were used, since the tool will be updated periodically.
 </ul>
 </details>

 ### Want to test NHANES data instead?
 `estimate_polyphenols` can also be run on NHANES WWEIA data. To generate NHANES WWEIA data, follow the instructions in ["Preparing Diet Data - NHANES diet recalls"](https://swi1.github.io/polyphenol_pipeline/webpages/preparing_diet_data_NHANES.html#prepare-nhanes-diet-recall-data). When you've finished:
 1. Come back to run_pipeline.R and update `diet_input_file` with the NHANES output file name.
 2. In `estimate_polyphenols`, change type to "NHANES"
-3. Run the script.
-
-[^1]: [Shivappa et al. 2013. Designing and developing a literature-derived, population-based dietary inflammatory index](https://doi.org/10.1017/S1368980013002115)
-[^2]: [DietDiveR Repo](https://computational-nutrition-lab.github.io/DietDiveR/).
+3. Run the script.

scripts/DII_STEP1_Eugenol.Rmd

Lines changed: 32 additions & 16 deletions
@@ -1,7 +1,7 @@
 ---
 title: "DII_PREP - Eugenol"
 author: "Stephanie Wilson"
-date: "November 2025"
+date: "February 2026"
 output:
   md_document:
     variant: gfm
@@ -18,16 +18,16 @@ output:
 ---

 ## Calculate Eugenol Intake
-This script takes in your disaggregated dietary data and FooDB-linked descriptions to calculate eugenol intake per recall and subject.
+This script takes in your disaggregated dietary data and FooDB-linked descriptions to calculate eugenol intake per recall (or record) and subject.

 #### INPUTS

-- **Recall_Disaggregated_mapped.csv.bz2** - Disaggregated dietary data, mapped to FooDB foods, From Step 2 of the polyphenol estimation pipeline
+- **Diet_Disaggregated_mapped.csv.bz2** - Disaggregated dietary data, mapped to FooDB foods, From Step 2 of the polyphenol estimation pipeline
 - **FooDB_Eugenol_Content_Final.csv** - Eugenol Content for foods in FooDB, Provided File

 #### OUTPUTS

-- **Recall_DII_eugenol_by_recall.csv**: Sum eugenol content for each participant recall
+- **Diet_DII_eugenol_by_entry.csv**: Sum eugenol content for each participant recall or record

 ## SCRIPT

@@ -54,16 +54,30 @@ Load data
 source("provided_files.R")

 # Load Dietary data that has been disaggregated and connected to FooDB
-input_mapped = vroom::vroom('outputs/Recall_Disaggregated_mapped.csv.bz2',
+input_mapped = vroom::vroom('outputs/Diet_Disaggregated_mapped.csv.bz2',
                             show_col_types = FALSE)

 # Eugenol Content in FooDB
 # Note: Eugenol doesn't have retention factors from Phenol Explorer
 eugenol = vroom::vroom(FooDB_eugenol, show_col_types = FALSE) %>%
-  select(-c(source_type, food_name:orig_source_name))
+  dplyr::select(-c(source_type, food_name:orig_source_name))
 ```


+### Specify grouping variables
+Column grouping depends on whether output is from a record or recall.
+```{r}
+if ("RecallNo" %in% names(input_mapped)) {
+  group_vars = c("subject", "RecallNo")
+
+} else if ("RecordNo" %in% names(input_mapped)) {
+  group_vars = c("subject", "RecordNo", "RecordDayNo")
+
+} else {
+  stop("Data must contain RecallNo or RecordNo.")
+}
+```
+
 ### Merge FooDB-matched Ingredient Codes to FooDB Eugenol Content File.

 - Link between FooDB Polyphenol Content and code-matched data is *food_id*.
@@ -73,20 +87,22 @@ input_mapped_content = input_mapped %>%
   # Bring in the Polyphenol Content
   dplyr::left_join(eugenol, by = 'food_id') %>%
   # Remove content that is NA
-  filter(!is.na(orig_content_avg)) %>%
+  dplyr::filter(!is.na(orig_content_avg)) %>%
   # Calculate eugenol amount consumed in milligrams
-  mutate(eugenol_mg = (orig_content_avg * 0.01) * FoodAmt_Ing_g) %>%
-  # Sum eugenol content by subject and recall
-  group_by(subject, RecallNo) %>%
-  # Calculate EUGENOL summed per recall, and rename for dietaryindex function
-  mutate(EUGENOL= sum(eugenol_mg)) %>%
-  ungroup() %>%
-  select(c(subject, RecallNo, EUGENOL)) %>%
+  dplyr::mutate(eugenol_mg = (orig_content_avg * 0.01) * FoodAmt_Ing_g) %>%
+  # Recall - Sum by Subject, Recall
+  # Record - Sum by Subject, Record Number, Day in Record Number
+  dplyr::group_by(across(all_of(group_vars))) %>%
+  # Calculate EUGENOL summed per recall or record
+  # with rename for dietaryindex function
+  dplyr::mutate(EUGENOL = sum(eugenol_mg)) %>%
+  dplyr::ungroup() %>%
+  dplyr::select(c(subject, any_of(c("RecallNo", "RecordNo", "RecordDayNo")), EUGENOL)) %>%
   # Keep distinct entries
-  distinct(subject, RecallNo, .keep_all = TRUE)
+  dplyr::distinct(across(all_of(group_vars)), .keep_all = TRUE)
 ```

 ### Export Eugenol Intake File for DII Calculation
 ```{r}
-vroom::vroom_write(input_mapped_content, 'outputs/Recall_DII_eugenol_by_recall.csv', delim = ",")
+vroom::vroom_write(input_mapped_content, 'outputs/Diet_DII_eugenol_by_entry.csv', delim = ",")
 ```
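
The grouping-variable check in this file is repeated in the subclass script, and the `mutate()` / `select()` / `distinct()` sequence reduces to a grouped sum. Below is a minimal sketch of both ideas on made-up data. `get_group_vars()` is a hypothetical helper rather than a repository function, and the toy values are invented. The `* 0.01` step is read here as converting content expressed per 100 g, which the script's arithmetic implies but does not state.

```r
library(dplyr)

# Hypothetical helper (not in the repository): choose grouping columns
# from whichever event columns are present in the data.
get_group_vars = function(dat, extra = character()) {
  if ("RecallNo" %in% names(dat)) {
    event_cols = "RecallNo"
  } else if ("RecordNo" %in% names(dat)) {
    # Keep RecordDayNo only if the column actually exists
    event_cols = intersect(c("RecordNo", "RecordDayNo"), names(dat))
  } else {
    stop("Data must contain RecallNo or RecordNo.")
  }
  c("subject", event_cols, extra)
}

# Toy record data, invented for illustration
toy = data.frame(
  subject = c("A", "A", "A", "B"),
  RecordNo = c(1, 1, 1, 1),
  RecordDayNo = c(1, 1, 2, 1),
  orig_content_avg = c(10, 20, 5, 40),  # assumed mg per 100 g
  FoodAmt_Ing_g = c(50, 100, 200, 25)
)

group_vars = get_group_vars(toy)

eugenol_by_entry = toy %>%
  # Same arithmetic as the script: content * 0.01 * grams consumed
  mutate(eugenol_mg = (orig_content_avg * 0.01) * FoodAmt_Ing_g) %>%
  group_by(across(all_of(group_vars))) %>%
  # One row per subject and record day, holding the summed intake
  summarise(EUGENOL = sum(eugenol_mg), .groups = "drop")
```

When only the grouping columns and the sum are kept, `summarise()` returns the same rows as `mutate()` followed by `distinct()`. The `extra` argument would let the subclass script append `"component"` to the same helper.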

scripts/DII_STEP2_Polyphenol_Subclass.Rmd

Lines changed: 32 additions & 17 deletions
@@ -1,7 +1,7 @@
 ---
 title: "DII_PREP - Polyphenol Subclasses"
 author: "Stephanie Wilson"
-date: "November 2025"
+date: "February 2026"
 output:
   md_document:
     variant: gfm
@@ -18,16 +18,16 @@ output:
 ---

 ## Calculate DII Polyphenol Subclasses
-This script takes your data that has been mapped to FooDB polyphenol content, extracts compounds categorized under the six required DII subclasses (flavan-3-ols, Flavones, Flavonols, Flavonones, Anthocyanidins, Isoflavones), and calculates the total intake of these subclasses per participant recall.
+This script takes your data that has been mapped to FooDB polyphenol content, extracts compounds categorized under the six required DII subclasses (flavan-3-ols, Flavones, Flavonols, Flavonones, Anthocyanidins, Isoflavones), and calculates the total intake of these subclasses per participant recall or record.

 #### INPUTS

-- **Recall_FooDB_polyphenol_content.csv.bz2**: Disaggregated dietary data, mapped to FooDB polyphenol content, at the compound-level
+- **Diet_FooDB_polyphenol_content.csv.bz2**: Disaggregated dietary data, mapped to FooDB polyphenol content, at the compound-level
 - **FooDB_DII_polyphenol_list.csv** - Polyphenols under the six polyphenol subclasses required for DII-2014, Provided File

 #### OUTPUTS

-- **Recall_DII_subclass_by_recall.csv**: Sum DII polyphenol subclass content for each participant recall
+- **Diet_DII_subclass_by_entry.csv**: Sum DII polyphenol subclass content for each participant recall or record

 ## SCRIPT

@@ -54,20 +54,33 @@ Load data
 source("provided_files.R")

 # Load dietary data mapped to polyphenol content
-input_polyphenol_content = vroom::vroom('outputs/Recall_FooDB_polyphenol_content.csv.bz2',
+input_polyphenol_content = vroom::vroom('outputs/Diet_FooDB_polyphenol_content.csv.bz2',
                                         show_col_types = FALSE)

-# Eugenol Content in FooDB
-# Note: Eugenol doesn't have retention factors from Phenol Explorer
+# Polyphenol classifications
 subclasses = vroom::vroom(FooDB_DII_subclasses, show_col_types = FALSE)
 ```

 ### Filter in relevant DII subclass polyphenols
 ```{r}
 input_polyphenol_content_filtered = input_polyphenol_content %>%
-  filter(compound_public_id %in% subclasses$compound_public_id) %>%
+  dplyr::filter(compound_public_id %in% subclasses$compound_public_id) %>%
   # Merge the class information so subclasses are grouped correctly
-  left_join(subclasses)
+  dplyr::left_join(subclasses)
+```
+
+### Specify grouping variables
+Column grouping depends on whether output is from a record or recall.
+```{r}
+if ("RecallNo" %in% names(input_polyphenol_content_filtered)) {
+  group_vars = c("subject", "RecallNo", "component")
+
+} else if ("RecordNo" %in% names(input_polyphenol_content_filtered)) {
+  group_vars = c("subject", "RecordNo", "RecordDayNo", "component")
+
+} else {
+  stop("Data must contain RecallNo or RecordNo.")
+}
 ```


@@ -76,18 +89,20 @@ The column `component` contains the DII category.

 ```{r}
 subclass_intakes = input_polyphenol_content_filtered %>%
-  group_by(subject, RecallNo, component) %>%
+  dplyr::group_by(across(all_of(group_vars))) %>%
   # Sum polyphenol category intake, mg by recall
-  mutate(component_sum = sum(pp_consumed)) %>%
-  ungroup() %>%
+  dplyr::mutate(component_sum = sum(pp_consumed)) %>%
+  dplyr::ungroup() %>%
   # Keep distinct entries
-  distinct(subject, RecallNo, component, .keep_all = TRUE) %>%
+  dplyr::distinct(across(all_of(group_vars)), .keep_all = TRUE) %>%
   # Minimize the number of columns for pivoting to wide format
-  select(c(subject, RecallNo, component, component_sum)) %>%
+  dplyr::select(c(subject,
+                  any_of(c("RecallNo", "RecordNo", "RecordDayNo")),
+                  component, component_sum)) %>%
   # pivot to wide version
-  pivot_wider(names_from = component, values_from = component_sum) %>%
+  tidyr::pivot_wider(names_from = component, values_from = component_sum) %>%
   # Rename the columns to match the DII category names in the DietaryIndex function
-  rename(
+  dplyr::rename(
     ISOFLAVONES = Isoflavones,
     "FLA3OL" = "Flavan-3-ols",
     "FLAVONES" = "Flavones",
@@ -98,6 +113,6 @@ subclass_intakes = input_polyphenol_content_filtered %>%

 ### Export Polyphenol Subclass Intakes for DII Calculation
 ```{r EXPORT}
-vroom::vroom_write(subclass_intakes, 'outputs/Recall_DII_subclass_by_recall.csv', delim = ",")
+vroom::vroom_write(subclass_intakes, 'outputs/Diet_DII_subclass_by_entry.csv', delim = ",")
 ```
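
The subclass script sums `pp_consumed` per entry and subclass, then pivots to one column per subclass. The sketch below shows that reshaping on invented toy data. It assumes `pp_consumed` is in mg, which the script's comment suggests, and it uses `summarise()` in place of the committed `mutate()` / `distinct()` pair. A subclass that an entry never reports becomes `NA` after pivoting.

```r
library(dplyr)
library(tidyr)

# Toy recall data, invented for illustration
toy = data.frame(
  subject = c("A", "A", "A", "B"),
  RecallNo = c(1, 1, 1, 1),
  component = c("Flavones", "Flavones", "Isoflavones", "Flavones"),
  pp_consumed = c(1.5, 2.5, 4, 3)  # assumed mg
)

wide = toy %>%
  group_by(subject, RecallNo, component) %>%
  # One summed value per subject, recall and subclass
  summarise(component_sum = sum(pp_consumed), .groups = "drop") %>%
  # One column per subclass; combinations that are absent become NA
  pivot_wider(names_from = component, values_from = component_sum)
```

Here subject B has no isoflavone rows, so `wide$Isoflavones` is `NA` for B. `pivot_wider()` accepts `values_fill = 0` if a missing subclass should count as zero intake; whether that is right for the downstream DII calculation is not something this diff settles.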
