Envirotox #24
Open
beckyfisher wants to merge 62 commits into
…data and source naming conventions
This merge adds the wqbench datasets, as well as several new datasets to the anzg data. It also fixes some minor package documentation errors.
…e build and update pkgdown site
…ge in the appropriate section
…unction for immediate workflow continuity
Collaborator
When I compare using diffdf there are differences in the Yanagihara column for envirotox_chronic.
Collaborator
acute is good
Collaborator
also chemical good
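The comparison above was run with diffdf. As a dependency-free illustration of the same kind of check, here is a minimal base R sketch that lists the rows where a logical flag column disagrees between two versions of a dataset. The toy data frames, their values, and the use of Chemical as the only key are assumptions for illustration; the real datasets have more columns and a different key.

```r
# Toy stand-ins for the upstream and ported versions of a dataset
# (hypothetical values, not taken from the real data).
upstream <- data.frame(
  Chemical = c("A", "B", "C", "D"),
  Yanagihara24 = c(TRUE, FALSE, TRUE, FALSE)
)
ported <- data.frame(
  Chemical = c("A", "B", "C", "D"),
  Yanagihara24 = c(TRUE, TRUE, TRUE, FALSE)
)

# Align rows on the key column first, so row order cannot cause
# spurious differences.
both <- merge(upstream, ported, by = "Chemical",
              suffixes = c(".upstream", ".ported"))

# Keep only the rows where the flag disagrees between the two versions.
mismatch <- both[both$Yanagihara24.upstream != both$Yanagihara24.ported, ]
print(mismatch)
```

With real data the key would need to identify a row uniquely, and any NA flags would need handling before the comparison.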
joethorley (Collaborator) reviewed on May 21, 2026 and left a comment:
Looks good.
I notice there is a discrepancy in the Yanagihara24 coding for 43 rows in envirotox_chronic (see comments).
Also, the ssddata::envirotox_data object just seems to be a list of the envirotox_chronic, envirotox_acute and envirotox_chemical datasets, so I'm not sure if it's worth adding?
Finally, do we want to add the individual datasets so they are returned by ssd_data_sets()? We could name them using make.names() so they are syntactically valid in R, i.e.
> make.names("envirotox_acute_1,2,4-Trichlorobenzene")
[1] "envirotox_acute_1.2.4.Trichlorobenzene"
The names remain unique
> length(unique(envirotox_chemical$Chemical))
[1] 744
> length(unique(make.names(envirotox_chemical$Chemical)))
[1] 744
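The naming idea above could be sketched roughly as follows. The chemical names other than 1,2,4-Trichlorobenzene are placeholders, and the envirotox_acute_ prefix simply follows the example in the comment; neither reflects a decided naming scheme.

```r
# Hypothetical chemical names; only the first appears in the comment above.
chemicals <- c("1,2,4-Trichlorobenzene", "Copper", "Zinc sulfate")

# Prefix each name, then convert to syntactically valid R names.
dataset_names <- make.names(paste0("envirotox_acute_", chemicals))
print(dataset_names)

# anyDuplicated() returns 0 when every name is unique.
stopifnot(anyDuplicated(dataset_names) == 0)

# If two chemicals ever collapsed to the same name, unique = TRUE
# would disambiguate them by appending a numeric suffix.
deduplicated <- make.names(c("a-b", "a.b"), unique = TRUE)
print(deduplicated)
```

Checking with anyDuplicated() at build time would catch a collision if a future data update introduced one.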
Closes #21
Summary
Subsumes the full reproducible build pipeline from poissonconsulting/envirotox directly into ssddata, making ssddata self-contained with no dependency on the (non-CRAN) envirotox package.

New datasets
Four new exported datasets are added under the envirotox prefix:
envirotox_acute
envirotox_chronic
envirotox_chemical
envirotox_data ($acute, $chronic, $chemical)
envirotox_acute and envirotox_chronic include logical flag columns Yanagihara24 and/or Iwasaki25 for subsetting to published benchmark subsets without requiring separate datasets.

New function
envirotox_data_sets() - returns a character vector of all envirotox_* dataset names, analogous to ssd_data_sets().

Changes
data-raw/envirotox/DATASET.R - full pipeline ported from upstream; reads envirotox.xlsx, filters/converts/aggregates/flags, saves 4 .rda files
data-raw/source_all.R - "envirotox" added to datasets vector
data-raw/build_pkgdown_yml.R - envirotox_data added to derived_topics; three component datasets added to accounted_for and marked @keywords internal to suppress separate reference page entries
R/envirotox_acute.R, R/envirotox_chronic.R, R/envirotox_chemical.R, R/envirotox_data.R - manually authored roxygen2 docs; component datasets linked via @seealso from envirotox_data
R/get_ssddata.R - envirotox_data_sets() added; ssd_data_sets() updated to exclude envirotox_* datasets (count remains 28)
inst/REFERENCES.bib - added Connors2019, Yanagihara2024, Iwasaki2025
DESCRIPTION - added EnvStats, mousetrap, openxlsx, stringr to Suggests
tests/testthat/test-envirotox.R - 5 new tests covering key constraints, row/column counts, list structure, and ssd_data_sets() count stability
.gitattributes - * text=auto eol=lf to prevent line-ending noise in generated files