Envirotox #24

Open
beckyfisher wants to merge 62 commits into main from envirotox

@beckyfisher
Contributor

Closes #21

Depends on #22 - this branch is based off ref_headings. Please merge #22 into dev first, then merge this PR. GitHub will automatically re-target the diff to show only the changes on top of ref_headings once #22 is merged.

Summary

Subsumes the full reproducible build pipeline from poissonconsulting/envirotox directly into ssddata, making ssddata self-contained with no dependency on the (non-CRAN) envirotox package.

New datasets

Four new exported datasets are added under the envirotox prefix:

| Dataset | Rows | Cols | Description |
|---|---|---|---|
| envirotox_acute | ~14,949 | 6 | Acute toxicity (EC50/LC50), one geometric mean per species per chemical |
| envirotox_chronic | ~1,721 | 5 | Chronic toxicity (NOEC/NOEL), one geometric mean per species per chemical |
| envirotox_chemical | 744 | 2 | Chemical name to CAS Registry Number lookup |
| envirotox_data | - | - | Named list wrapper: $acute, $chronic, $chemical |

envirotox_acute and envirotox_chronic include logical flag columns Yanagihara24 and/or Iwasaki25 for subsetting to published benchmark subsets without requiring separate datasets.
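
As a minimal sketch of how the flag columns are meant to be used, the snippet below subsets a toy data frame on a logical flag. The toy data and every column name other than Yanagihara24 are assumptions made for illustration; they are not the real dataset contents.

```r
# Toy stand-in for envirotox_chronic. Only the Yanagihara24 column name
# comes from this PR; Chemical, Species and Conc are hypothetical here.
toy_chronic <- data.frame(
  Chemical = c("A", "A", "B", "B"),
  Species = c("sp1", "sp2", "sp1", "sp3"),
  Conc = c(1.2, 3.4, 0.5, 7.8),
  Yanagihara24 = c(TRUE, TRUE, FALSE, TRUE)
)

# Keep only the rows flagged as belonging to the benchmark subset.
yanagihara_subset <- toy_chronic[toy_chronic$Yanagihara24, ]
nrow(yanagihara_subset)
```

The same pattern should apply to the Iwasaki25 flag wherever that column is present.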

New function

  • envirotox_data_sets() - returns a character vector of all envirotox_* dataset names, analogous to ssd_data_sets().
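
One way to think about the split between the two listing functions is prefix-based filtering of the exported dataset names. The sketch below is not the code in R/get_ssddata.R; the vector of names is a hand-written example and the variable names are hypothetical.

```r
# Hand-written example of exported dataset names (an assumption for this
# sketch; the real functions would look the names up in the package).
all_names <- c(
  "ccme_boron", "anzg_metolachlor_fresh",
  "envirotox_acute", "envirotox_chronic",
  "envirotox_chemical", "envirotox_data"
)

# Names with the envirotox_ prefix go to one listing, the rest to the other.
is_envirotox <- startsWith(all_names, "envirotox_")
envirotox_names <- all_names[is_envirotox]
other_names <- all_names[!is_envirotox]
```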

Changes

  • data-raw/envirotox/DATASET.R - full pipeline ported from upstream; reads envirotox.xlsx, filters/converts/aggregates/flags, saves 4 .rda files
  • data-raw/source_all.R - "envirotox" added to datasets vector
  • data-raw/build_pkgdown_yml.R - envirotox_data added to derived_topics; three component datasets added to accounted_for and marked @keywords internal to suppress separate reference page entries
  • R/envirotox_acute.R, R/envirotox_chronic.R, R/envirotox_chemical.R, R/envirotox_data.R - manually authored roxygen2 docs; component datasets linked via @seealso from envirotox_data
  • R/get_ssddata.R - envirotox_data_sets() added; ssd_data_sets() updated to exclude envirotox_* datasets (count remains 28)
  • inst/REFERENCES.bib - added Connors2019, Yanagihara2024, Iwasaki2025
  • DESCRIPTION - added EnvStats, mousetrap, openxlsx, stringr to Suggests
  • tests/testthat/test-envirotox.R - 5 new tests covering key constraints, row/column counts, list structure, and ssd_data_sets() count stability
  • .gitattributes - * text=auto eol=lf to prevent line-ending noise in generated files
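
For readers unfamiliar with the aggregation step, the following base R sketch shows what "one geometric mean per species per chemical" amounts to. It is an illustration only, not the code in data-raw/envirotox/DATASET.R; the toy data, column names, and the geo_mean() helper are assumptions.

```r
# Toy raw records; the column names are hypothetical and are not
# necessarily those used in envirotox.xlsx.
raw <- data.frame(
  Chemical = c("A", "A", "A", "B"),
  Species = c("sp1", "sp1", "sp2", "sp1"),
  Conc = c(1, 100, 5, 2)
)

# Geometric mean, computed on the log scale (hypothetical helper).
geo_mean <- function(x) exp(mean(log(x)))

# Collapse to one row per species per chemical.
agg <- aggregate(Conc ~ Chemical + Species, data = raw, FUN = geo_mean)
```

Working on the log scale keeps a single large concentration from dominating the summary, which is the usual reason a geometric rather than arithmetic mean is used for toxicity values.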

aylapear and others added 30 commits October 31, 2025 10:58
This merge adds the wqbench datasets, as well as several new datasets to the anzg data. It also fixes some minor package documentation errors.
@beckyfisher beckyfisher requested a review from joethorley May 20, 2026 08:13
@joethorley
Collaborator

When I compare the two versions using diffdf, there are differences in the Yanagihara24 column of envirotox_chronic:

> diffdf::diffdf(ssddata::envirotox_chronic, envirotox::envirotox_chronic)
Warning message:
In diffdf::diffdf(ssddata::envirotox_chronic, envirotox::envirotox_chronic) :
  
Not all Values Compared Equal
Differences found between the objects!

Summary of BASE and COMPARE
  ======================================================================
    PROPERTY              BASE                         COMP             
  ----------------------------------------------------------------------
      Name     ssddata::envirotox_chronic  envirotox::envirotox_chronic 
     Class     "tbl_df, tbl, data.frame"    "tbl_df, tbl, data.frame"   
    Rows(#)               1721                         1721             
   Columns(#)              5                            5               
  ----------------------------------------------------------------------


Not all Values Compared Equal
  =================================
     Variable    No of Differences 
  ---------------------------------
   Yanagihara24         43         
  ---------------------------------


First 10 of 43 rows are shown in table below
  =============================================
     VARIABLE    ..ROWNUMBER..  BASE   COMPARE 
  ---------------------------------------------
   Yanagihara24       620       FALSE   TRUE   
   Yanagihara24       621       FALSE   TRUE   
   Yanagihara24       622       FALSE   TRUE   
   Yanagihara24       623       FALSE   TRUE   
   Yanagihara24       624       FALSE   TRUE   
   Yanagihara24       625       FALSE   TRUE   
   Yanagihara24       626       FALSE   TRUE   
   Yanagihara24       627       FALSE   TRUE   
   Yanagihara24       628       FALSE   TRUE   
   Yanagihara24       629       FALSE   TRUE   
  ---------------------------------------------
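
To see which chemicals the mismatched rows belong to, the flag columns can be compared directly. The sketch below uses two toy data frames in place of the two package versions; the data and the Chemical column are assumptions, and it relies on both objects having the same row order, which the matching row counts suggest but do not guarantee.

```r
# Toy stand-ins for the two versions of envirotox_chronic.
base <- data.frame(
  Chemical = c("A", "A", "B", "B"),
  Yanagihara24 = c(TRUE, FALSE, FALSE, TRUE)
)
comp <- data.frame(
  Chemical = c("A", "A", "B", "B"),
  Yanagihara24 = c(TRUE, TRUE, TRUE, TRUE)
)

# Row indices where the flag disagrees between the two versions.
diff_rows <- which(base$Yanagihara24 != comp$Yanagihara24)

# Tabulate the affected chemicals to see whether the mismatch is clustered.
table(base$Chemical[diff_rows])
```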

@joethorley
Collaborator

acute is good

> diffdf::diffdf(ssddata::envirotox_acute, envirotox::envirotox_acute)
No issues were found!

@joethorley
Collaborator

also chemical good

> diffdf::diffdf(ssddata::envirotox_chemical, envirotox::envirotox_chemical)
No issues were found!

Collaborator

@joethorley joethorley left a comment

Looks good.

I notice there is a discrepancy in the Yanagihara24 coding for 43 rows in envirotox_chronic (see comments).

Also, the ssddata::envirotox_data object just seems to be a list of the envirotox_chronic, envirotox_acute and envirotox_chemical datasets, so I'm not sure it's worth adding.

Finally, do we want to add the individual per-chemical datasets so they are returned by ssd_data_sets()? We could name them using make.names() so the names are syntactically valid in R, i.e.

> make.names("envirotox_acute_1,2,4-Trichlorobenzene")
[1] "envirotox_acute_1.2.4.Trichlorobenzene"

The names remain unique after conversion:

> length(unique(envirotox_chemical$Chemical))
[1] 744
> length(unique(make.names(envirotox_chemical$Chemical)))
[1] 744
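
A rough sketch of how the per-chemical datasets could be generated and named is given below. The toy data, the Conc column, and the envirotox_acute_ prefix applied this way are assumptions for illustration, not an agreed design.

```r
# Toy stand-in for envirotox_acute; Conc is a hypothetical column.
toy <- data.frame(
  Chemical = c("1,2,4-Trichlorobenzene", "1,2,4-Trichlorobenzene", "Copper"),
  Conc = c(1.5, 2.5, 0.1)
)

# One data frame per chemical, in a list named by chemical.
per_chemical <- split(toy, toy$Chemical)

# Convert the list names into syntactically valid R names.
names(per_chemical) <- make.names(paste0("envirotox_acute_", names(per_chemical)))

# Fail loudly if two chemicals collapse to the same name.
stopifnot(!anyDuplicated(names(per_chemical)))
names(per_chemical)
```

Checking with anyDuplicated() rather than calling make.names(unique = TRUE) means a name collision is reported instead of being silently resolved with a numeric suffix.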

Development

Successfully merging this pull request may close these issues.

Add envirotox dataset

3 participants