diff --git a/.Rbuildignore b/.Rbuildignore index 2e29672b..891f0386 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -12,3 +12,5 @@ ^LICENSE\.md$ ^cran-comments\.md$ ^\.zenodo\.json$ +^\.github$ +^precompile$ diff --git a/.github/.gitignore b/.github/.gitignore new file mode 100644 index 00000000..2d19fc76 --- /dev/null +++ b/.github/.gitignore @@ -0,0 +1 @@ +*.html diff --git a/.github/404.md b/.github/404.md new file mode 100644 index 00000000..60f5fd02 --- /dev/null +++ b/.github/404.md @@ -0,0 +1,5 @@ +# Page not found (404) + +The page you requested was not found. + +Please use the links in the navigation bar. diff --git a/.github/CODE_OF_CONDUCT.md b/.github/CODE_OF_CONDUCT.md new file mode 100644 index 00000000..1f188b3a --- /dev/null +++ b/.github/CODE_OF_CONDUCT.md @@ -0,0 +1,44 @@ +# Contributor Code of Conduct + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. 
+ +## Our Standards + +Examples of behavior that contributes to creating a positive environment include: + +* Using welcoming and inclusive language +* Being respectful of differing viewpoints and experiences +* Gracefully accepting constructive criticism +* Focusing on what is best for the community +* Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +* The use of sexualized language or imagery and unwelcome sexual attention or advances +* Trolling, insulting/derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or electronic address, without explicit permission +* Other conduct which could reasonably be considered inappropriate in a professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. + +Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 
+ +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at [ethan@weecology.org](mailto:ethan@weecology.org). All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4, +available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md new file mode 100644 index 00000000..e6aefd21 --- /dev/null +++ b/.github/CONTRIBUTING.md @@ -0,0 +1,90 @@ +# Guidelines for Contributing + +Thanks for checking out our project! + +## Contributor Code of Conduct +All contributors will be expected to follow our [code of conduct](CODE_OF_CONDUCT.md). + +## Workflow +#### For the General Public +If you're not a member of the Weecology lab, we ask that you use one of the following two methods for contributing: + +1. Create an issue if you spot any typos or bugs, or if you have general suggestions. You can also use issues to participate in ongoing discussions. For more info, please check out this GitHub [guide](https://guides.github.com/features/issues/). + +2. If you have suggested bug fixes or changes, fork weecology/LDATS and clone your copy, then use a branch to add your contributions and create a pull request. For more info, please check out this GitHub [guide](https://help.github.com/articles/about-pull-requests/). 
We ask that you follow our guidelines below on documentation and testing. + +We use the R package `devtools` to install, build, and test the changes in the repository: + +```r +install.packages("devtools") +install.packages(".", repos = NULL, type="source", quiet = FALSE, verbose = TRUE) +library(LDATS) +``` + +#### Weecologists + +If you're actively working on this repo, then you should have write access to create branches for any new features or bugfixes. Please see the lab-wiki for info on using branches in a shared repository. + +If you don't have write access and you would like to, please contact @gmyenni for access. + +## Documentation + +If you are contributing code to this project, you generally don't need any additional packages, since the documentation will be written as comments in the R scripts. If you are also building the package, see the [section below](#building) for more details. + +In most cases, you'll be creating a new function and then documenting it. You can check the existing functions for examples, but here's a basic template: +``` +#' @title {this is the heading for the help file} +#' +#' @description {A description of the function} +#' +#' @param {name of a function argument} {what the argument does} +#' +#' @return {what is returned from the function} +#' +#' @examples +#' {R code that is an example use of the function} +#' +#' @export +#' +newfunc <- function() ... +``` + +Note that you can also include links to other functions, math formatting, and more. For more details, see the [chapter on documentation](http://r-pkgs.had.co.nz/man.html) in Hadley Wickham's book for R packages. + + +## Building + +To fully build the package, including generating the documentation and running the tests, you will need the `roxygen2`, `testthat`, and `devtools` packages. 
+ +Specific operations are then done by calling the appropriate functions from within R, while your working directory is somewhere in the package folder. + +The suggested workflow is: +1.
Write code, documentation, and tests. +2. Run `devtools::document()` to generate the documentation files and update the `NAMESPACE` file. +3. Run `devtools::install()` to install the new version of the package. +4. Run `devtools::test()` to run the test scripts on the new version of the package. + +If you are also prepping the package as a whole, then you will also want to run `devtools::check()` and/or `devtools::check_cran()` to make sure that the package is complete. +Note that you need an up-to-date TeX/LaTeX distribution for running `devtools::check()` and/or `devtools::check_cran()` due to the rendering of the package manual. + +For more info, see the [GitHub repo](https://github.com/hadley/devtools) for the `devtools` package. + +## Testing + +If you are adding new functionality, please include automated tests to verify that some of the basic functionality is correct. + +Automated testing uses R scripts that live in the `tests/testthat/` subfolder of the package. If you are adding a new file, please name it `test-{concept}.R`. + +As a general rule, you don't need to test all possible inputs and outputs for a function, but you should test some important aspects: +* outputs are in the correct format (including dimensions and components) +* sample input produces the correct sample output + +You can see the existing tests as examples of how to organize your tests, but note that there are several different kinds of `expect_` functions that test for different things. For more details, see the [chapter on testing](http://r-pkgs.had.co.nz/tests.html) in Hadley Wickham's book for R packages. + +## Attribution + +This document is based on the [CONTRIBUTING +file](https://github.com/weecology/portalr/blob/master/CONTRIBUTING.md) +associated with the Beta release of the +[**portalr**](https://github.com/weecology/portalr/) package +and is used under the MIT License.
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md new file mode 100644 index 00000000..c571a627 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -0,0 +1,35 @@ +--- +name: Bug report +about: Let us know about faulty behavior and help us improve +title: '[bug] ' +labels: 'bug' +assignees: 'juniperlsimonis' + +--- + +**Describe the bug** +A clear and concise description of what the bug is. + +**To Reproduce** +Steps to reproduce the behavior: +1. Go to '...' +2. Click on '....' +3. Scroll down to '....' +4. See error + +**Expected behavior** +A clear and concise description of what you expected to happen. + +**Screenshots** +If applicable, add screenshots to help explain your problem. + +**Runner Information:** + - OS: [e.g. iOS] + - Browser [e.g. chrome, safari] + - Version [e.g. 22] + +**R Session Information:** + - Run `sessionInfo()` + +**Additional context** +Add any other context about the problem here. diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 00000000..0086358d --- /dev/null +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1 @@ +blank_issues_enabled: true diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 00000000..5fd8f5dc --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,20 @@ +--- +name: Feature request +about: Add functionality to the R package +title: '' +labels: 'feature' +assignees: '' + +--- + +**Is your feature request related to a problem? Please describe.** +A clear and concise description of what the problem is. + +**Describe the solution you'd like** +A clear and concise description of what you want to happen. + +**Describe alternatives you've considered** +A clear and concise description of any alternative solutions or features you've considered. 
+ +**Additional context** +Add any other context or screenshots about the feature request here. diff --git a/.github/ISSUE_TEMPLATE/question.md b/.github/ISSUE_TEMPLATE/question.md new file mode 100644 index 00000000..21f453fc --- /dev/null +++ b/.github/ISSUE_TEMPLATE/question.md @@ -0,0 +1,12 @@ +--- +name: Question +about: Ask a question about this project. +title: '' +labels: 'question' +assignees: 'juniperlsimonis' + +--- + + +Please search [existing issues](https://github.com/weecology/LDATS/issues) to avoid creating duplicates. + diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md new file mode 100644 index 00000000..57f3dfe4 --- /dev/null +++ b/.github/pull_request_template.md @@ -0,0 +1,43 @@ +# Pull Request Template + +## Description + +Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. + +Fixes # (issue) + + +## Type of change + +- [ ] Bug fix (non-breaking change which fixes an issue) +- [ ] New feature (non-breaking change which adds functionality) +- [ ] Not backwards compatible new feature (breaking change which adds functionality) +- [ ] New model +- [ ] New dataset +- [ ] Documentation edit + + +## How Has This Been Tested? + +Please describe the tests that you ran to verify your changes. We currently use GitHub Actions for automated testing, which should be running (and are required). Provide details about any tests added or altered for new functionality. + +We conduct testing cross-platform. Please indicate if any platforms were added to the matrix or any changes were made to build configurations. 
+ + +## Checklist: + +- [ ] My code follows [the style guidelines](https://github.com/weecology/LDATS/blob/main/.github/CONTRIBUTING.md) of this project +- [ ] I follow [the code of conduct](https://github.com/weecology/LDATS/blob/main/.github/CODE_OF_CONDUCT.md) for this project +- [ ] I have performed a self-review of my own code +- [ ] I have commented my code, particularly in hard-to-understand areas +- [ ] I have made corresponding changes to the documentation +- [ ] My changes generate no new warnings +- [ ] I have added tests that prove my fix is effective or that my feature works +- [ ] New and existing unit tests pass locally with my changes +- [ ] Any dependent changes have been merged and published in downstream modules +- [ ] I have checked my code and corrected any misspellings + + +## Review + +Please tag at least one code reviewer (@juniperlsimonis) in your PR. diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml new file mode 100644 index 00000000..f44885b6 --- /dev/null +++ b/.github/workflows/pkgdown.yaml @@ -0,0 +1,54 @@ +# building website + +on: + push: + branches: + - main + pull_request: + branches: + - main + release: + types: [published] + workflow_dispatch: + +name: pkgdown + +jobs: + pkgdown: + runs-on: ubuntu-latest + + # Only restrict concurrency for non-PR jobs + concurrency: + group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }} + + env: + GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} + + permissions: + contents: write + + steps: + - uses: actions/checkout@v3 + + - uses: r-lib/actions/setup-pandoc@v2 + + - uses: r-lib/actions/setup-r@v2 + with: + use-public-rspm: true + + - uses: r-lib/actions/setup-r-dependencies@v2 + with: + extra-packages: any::pkgdown, local::. 
+ needs: website + + - name: Build site + run: pkgdown::build_site_github_pages(new_process = FALSE, examples = FALSE) + shell: Rscript {0} + + - name: Deploy to GitHub pages 🚀 + if: github.event_name != 'pull_request' + uses: JamesIves/github-pages-deploy-action@v4.4.1 + with: + clean: false + branch: gh-pages + folder: docs diff --git a/.github/workflows/r-cmd-check.yaml b/.github/workflows/r-cmd-check.yaml new file mode 100644 index 00000000..c93081e1 --- /dev/null +++ b/.github/workflows/r-cmd-check.yaml @@ -0,0 +1,63 @@ +# R package checking +# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples + +on: + push: + branches: + - main + pull_request: + branches: + - main + +name: R-CMD-check + +jobs: + R-CMD-check: + runs-on: ${{ matrix.config.os }} + + name: ${{ matrix.config.os }} (${{ matrix.config.r }}) + + strategy: + fail-fast: false + matrix: + config: + - {os: macos-latest, r: 'release'} + + - {os: windows-latest, r: 'release'} + # Use 3.6 to trigger usage of RTools35 + - {os: windows-latest, r: '3.6'} + # use 4.1 to check with rtools40's older compiler + - {os: windows-latest, r: '4.1'} + + # - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} + # something weird happening with the cache for this version in github actions + + - {os: ubuntu-latest, r: 'release'} + - {os: ubuntu-latest, r: 'oldrel-1'} + - {os: ubuntu-latest, r: 'oldrel-2'} + - {os: ubuntu-latest, r: 'oldrel-3'} + - {os: ubuntu-latest, r: 'oldrel-4'} + + env: + GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} + R_KEEP_PKG_SOURCE: yes + + steps: + - uses: actions/checkout@v3 + + - uses: r-lib/actions/setup-pandoc@v2 + + - uses: r-lib/actions/setup-r@v2 + with: + r-version: ${{ matrix.config.r }} + http-user-agent: ${{ matrix.config.http-user-agent }} + use-public-rspm: true + + - uses: r-lib/actions/setup-r-dependencies@v2 + with: + extra-packages: any::rcmdcheck + needs: check + + - uses: r-lib/actions/check-r-package@v2 + with: + upload-snapshots: true \ No newline at 
end of file diff --git a/.github/workflows/test-coverage.yaml b/.github/workflows/test-coverage.yaml new file mode 100644 index 00000000..f613d755 --- /dev/null +++ b/.github/workflows/test-coverage.yaml @@ -0,0 +1,35 @@ +on: + push: + branches: + - main + pull_request: + branches: + - main + +name: test-coverage + +jobs: + test-coverage: + runs-on: ubuntu-latest + + env: + GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} + R_KEEP_PKG_SOURCE: yes + + steps: + - uses: actions/checkout@v3 + + - uses: r-lib/actions/setup-pandoc@v2 + + - uses: r-lib/actions/setup-r@v2 + with: + use-public-rspm: true + + - uses: r-lib/actions/setup-r-dependencies@v2 + with: + extra-packages: any::covr + needs: coverage + + - name: Test coverage + run: covr::codecov() + shell: Rscript {0} \ No newline at end of file diff --git a/.gitignore b/.gitignore index 9ec24e4f..aed4ab56 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,5 @@ Meta working/~$ats_ms.docx docs/* *.Rproj +/doc/ +/Meta/ diff --git a/.travis.yml b/.travis.yml deleted file mode 100644 index 20077cec..00000000 --- a/.travis.yml +++ /dev/null @@ -1,29 +0,0 @@ -language: r -cache: packages -sudo: required -warnings_are_errors: false - -before_install: - - sudo apt-get update -qq - - sudo apt-get install texlive-latex-base - - sudo apt-get install gsl-bin libgsl0-dev - -r_packages: - - covr - -matrix: - include: - - r: devel - - r: release - after_success: - - R CMD INSTALL . 
- - Rscript -e 'pkgdown::build_site(examples = FALSE)' - - Rscript -e 'library(covr); codecov()' - deploy: - provider: pages - skip-cleanup: true - github-token: $GITHUB_PAT - keep-history: true - local-dir: docs - on: - branch: master \ No newline at end of file diff --git a/DESCRIPTION b/DESCRIPTION index ec8fde3c..2adf69a7 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: LDATS Title: Latent Dirichlet Allocation Coupled with Time Series Analyses -Version: 0.2.7 +Version: 0.3.0 Authors@R: c( person(c("Juniper", "L."), "Simonis", email = "juniper.simonis@weecology.org", role = c("aut", "cre"), @@ -18,19 +18,10 @@ Authors@R: c( person(c("S.K.", "Morgan"), "Ernest", role = c("aut"), comment = c(ORCID = "0000-0002-6026-8530")), person(c("Weecology"), role = "cph")) -Description: Combines Latent Dirichlet Allocation (LDA) and Bayesian multinomial - time series methods in a two-stage analysis to quantify dynamics in - high-dimensional temporal data. LDA decomposes multivariate data into - lower-dimension latent groupings, whose relative proportions are modeled - using generalized Bayesian time series models that include abrupt - changepoints and smooth dynamics. The methods are described in Blei - et al. (2003) , Western and Kleykamp - (2004) , Venables and Ripley - (2002, ISBN-13:978-0387954578), and Christensen et al. - (2018) . -URL: https://weecology.github.io/LDATS, https://github.com/weecology/LDATS +Description: Combines Latent Dirichlet Allocation (LDA) and Bayesian multinomial time series methods in a two-stage analysis to quantify dynamics in high-dimensional temporal data. LDA decomposes multivariate data into lower-dimension latent groupings, whose relative proportions are modeled using generalized Bayesian time series models that include abrupt changepoints and smooth dynamics. The methods are described in Blei et al. (2003) , Western and Kleykamp (2004) , Venables and Ripley (2002, ISBN-13:978-0387954578), and Christensen et al. (2018) . 
+URL: https://weecology.github.io/LDATS/, https://github.com/weecology/LDATS BugReports: https://github.com/weecology/LDATS/issues -Depends: R (>= 3.2.3) +Depends: R (>= 3.5.0) License: MIT + file LICENSE Encoding: UTF-8 LazyData: true @@ -56,6 +47,7 @@ Suggests: rmarkdown, testthat, vdiffr +SystemRequirements: gsl VignetteBuilder: knitr -RoxygenNote: 6.1.1 +RoxygenNote: 7.2.3 diff --git a/NEWS.md b/NEWS.md index c7769356..068afda7 100644 --- a/NEWS.md +++ b/NEWS.md @@ -3,7 +3,16 @@ Version numbers follow [Semantic Versioning](https://semver.org/). -# LDATS 0.2.7(https://github.com/weecology/ldats/releases/tag/v0.2.7) +# LDATS 0.3.0 +*2023-09-16* + +## Patching CRAN issues with vignette building +* The paper comparison vignette is now pre-computed to avoid needing to access internet resources during the CRAN build (see https://ropensci.org/blog/2019/12/08/precompute-vignettes/). + +## Converting from Travis CI to GitHub Actions +* Travis CI no longer works, so builds now run on GitHub Actions + +# [LDATS 0.2.7](https://github.com/weecology/ldats/releases/tag/v0.2.7) *2020-03-18* ## Patching CRAN issues with vignette building @@ -11,14 +20,14 @@ Version numbers follow [Semantic Versioning](https://semver.org/). * For the paper comparison vignette, all of the code is pre-run and saved in the LDATS-replications repository * Allows removal of otherwise unused packages from this package's dependency list -# LDATS 0.2.6(https://github.com/weecology/ldats/releases/tag/v0.2.6) +# [LDATS 0.2.6](https://github.com/weecology/ldats/releases/tag/v0.2.6) *2020-03-02* ## Patching a bug in tests for r-devel * `straingsAsFactors` update * only involved patching one test -# LDATS 0.2.5(https://github.com/weecology/ldats/releases/tag/v0.2.5) +# [LDATS 0.2.5](https://github.com/weecology/ldats/releases/tag/v0.2.5) *2019-12-22* ## General editing of simulation functions @@ -146,12 +155,12 @@ Version numbers follow [Semantic Versioning](https://semver.org/).
* `plot.LDA_TS()` plots produce the combination of plots. ## Rodents data set -* Portal rodent data from [Christensen *et al.* (2018)](https://doi.org/10.1002/ecy.2373) are now provided in a pre-formatted and ready-to-roll data object. +* Portal rodent data from [Christensen *et al.* (2018)](https://pubmed.ncbi.nlm.nih.gov/29718539/) are now provided in a pre-formatted and ready-to-roll data object. * Access the data using `data(rodents)`. * Note, however, that the data in Christensen *et al.* 2018 are scaled according to trapping effort. The data included in LDATS are not, to allow for appropriate weighting. See [comparison vignette](https://weecology.github.io/LDATS/articles/paper-comparison.html) for further details. -## Comparison with [Christensen *et al.* (2018)](https://doi.org/10.1002/ecy.2373) +## Comparison with [Christensen *et al.* (2018)](https://pubmed.ncbi.nlm.nih.gov/29718539/) * The [comparison vignette](https://weecology.github.io/LDATS/articles/paper-comparison.html) provides a step-by-step comparison of the LDATS pipeline to the analysis in Christensen *et al.* 2018. * The key differences are as follows: @@ -163,4 +172,4 @@ Version numbers follow [Semantic Versioning](https://semver.org/). # [LDATS 0.0.1](https://github.com/weecology/LDATS/commit/326506b9d7fb3e0223948d0245381963f83a2b37) *2017-11-16* -* Beginning initial development of package from [original code](https://github.com/emchristensen/Extreme-events-LDA) used in [Christensen *et al.* (2018)](https://doi.org/10.1002/ecy.2373). +* Beginning initial development of package from [original code](https://github.com/emchristensen/Extreme-events-LDA) used in [Christensen *et al.* (2018)](https://pubmed.ncbi.nlm.nih.gov/29718539/). diff --git a/R/LDA.R b/R/LDA.R index 55f77fd7..8d0d7f92 100644 --- a/R/LDA.R +++ b/R/LDA.R @@ -40,7 +40,7 @@ #' Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet #' Allocation. \emph{Journal of Machine Learning Research} #' \strong{3}:993-1022. 
-#' \href{http://jmlr.csail.mit.edu/papers/v3/blei03a.html}{link}. +#' \href{https://jmlr.csail.mit.edu/papers/v3/blei03a.html}{link}. #' #' Grun B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic #' Models. \emph{Journal of Statistical Software} \strong{40}:13. @@ -91,7 +91,7 @@ LDA_set <- function(document_term_table, topics = 2, nseeds = 1, #' @references #' Buntine, W. 2002. Variational extensions to EM and multinomial PCA. #' \emph{European Conference on Machine Learning, Lecture Notes in Computer -#' Science} \strong{2430}:23-34. \href{https://bit.ly/327sltH}{link}. +#' Science} \strong{2430}:23-34. \href{https://link.springer.com/chapter/10.1007/3-540-36755-1_3}{link}. #' #' Grun B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic #' Models. \emph{Journal of Statistical Software} \strong{40}:13. @@ -100,7 +100,7 @@ LDA_set <- function(document_term_table, topics = 2, nseeds = 1, #' Hoffman, M. D., D. M. Blei, and F. Bach. 2010. Online learning for #' latent Dirichlet allocation. \emph{Advances in Neural Information #' Processing Systems} \strong{23}:856-864. -#' \href{https://bit.ly/2LEr5sb}{link}. +#' \href{https://papers.nips.cc/paper/3902-online-learning-for-latent-dirichlet-allocation}{link}. #' #' @examples #' data(rodents) diff --git a/R/LDATS.R b/R/LDATS.R index e43f6e96..90d7f725 100644 --- a/R/LDATS.R +++ b/R/LDATS.R @@ -26,25 +26,25 @@ #' 2002) following Christensen \emph{et al.} (2018). 
#' #' @section Documentation: -#' \href{https://bit.ly/30n9sRJ}{Technical mathematical manuscript} +#' \href{https://github.com/weecology/LDATS/blob/main/LDATS_model.pdf}{Technical mathematical manuscript} #' \cr \cr -#' \href{https://bit.ly/2Jvj9GS}{End-user-focused vignette worked example} +#' \href{https://weecology.github.io/LDATS/articles/rodents-example.html}{End-user-focused vignette worked example} #' \cr \cr -#' \href{https://bit.ly/2xFzJOW}{Computational pipeline vignette} +#' \href{https://weecology.github.io/LDATS/articles/LDATS_codebase.html}{Computational pipeline vignette} #' \cr \cr -#' \href{https://bit.ly/2NFTVLh}{Comparison to Christensen \emph{et al.}} +#' \href{https://weecology.github.io/LDATS/articles/paper-comparison.html}{Comparison to Christensen \emph{et al.}} #' #' @references #' #' Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet #' Allocation. \emph{Journal of Machine Learning Research} #' \strong{3}:993-1022. -#' \href{http://jmlr.csail.mit.edu/papers/v3/blei03a.html}{link}. +#' \href{https://jmlr.csail.mit.edu/papers/v3/blei03a.html}{link}. #' #' Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. #' Long-term community change through multiple rapid transitions in a #' desert rodent community. \emph{Ecology} \strong{99}:1523-1529. -#' \href{https://doi.org/10.1002/ecy.2373}{link}. +#' \href{https://pubmed.ncbi.nlm.nih.gov/29718539/}{link}. #' #' Venables, W. N. and B. D. Ripley. 2002. \emph{Modern and Applied #' Statistics with S}. Fourth Edition. Springer, New York, NY, USA. @@ -52,7 +52,7 @@ #' Western, B. and M. Kleykamp. 2004. A Bayesian change point model for #' historical time series analysis. \emph{Political Analysis} #' \strong{12}:354-374. -#' \href{https://doi.org/10.1093/pan/mph023}{link}. +#' \href{https://www.cambridge.org/core/journals/political-analysis/article/abs/bayesian-change-point-model-for-historical-time-series-analysis/F7D2EDBBC211278EC6C6CB43FE170812}{link}. 
#' #' @name LDATS #' diff --git a/R/LDA_TS.R b/R/LDA_TS.R index d9c7bb17..7b71c89d 100644 --- a/R/LDA_TS.R +++ b/R/LDA_TS.R @@ -94,12 +94,12 @@ #' Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet #' Allocation. \emph{Journal of Machine Learning Research} #' \strong{3}:993-1022. -#' \href{http://jmlr.csail.mit.edu/papers/v3/blei03a.html}{link}. +#' \href{https://jmlr.csail.mit.edu/papers/v3/blei03a.html}{link}. #' #' Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. #' Long-term community change through multiple rapid transitions in a #' desert rodent community. \emph{Ecology} \strong{99}:1523-1529. -#' \href{https://doi.org/10.1002/ecy.2373}{link}. +#' \href{https://pubmed.ncbi.nlm.nih.gov/29718539/}{link}. #' #' Grun B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic #' Models. \emph{Journal of Statistical Software} \strong{40}:13. @@ -108,7 +108,7 @@ #' Western, B. and M. Kleykamp. 2004. A Bayesian change point model for #' historical time series analysis. \emph{Political Analysis} #' \strong{12}:354-374. -#' \href{https://doi.org/10.1093/pan/mph023}{link}. +#' \href{https://www.cambridge.org/core/journals/political-analysis/article/abs/bayesian-change-point-model-for-historical-time-series-analysis/F7D2EDBBC211278EC6C6CB43FE170812}{link}. #' #' @examples #' data(rodents) diff --git a/R/TS.R b/R/TS.R index f963788c..f144de21 100644 --- a/R/TS.R +++ b/R/TS.R @@ -104,12 +104,12 @@ #' Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. #' Long-term community change through multiple rapid transitions in a #' desert rodent community. \emph{Ecology} \strong{99}:1523-1529. -#' \href{https://doi.org/10.1002/ecy.2373}{link}. +#' \href{https://pubmed.ncbi.nlm.nih.gov/29718539/}{link}. #' #' Earl, D. J. and M. W. Deem. 2005. Parallel tempering: theory, #' applications, and new perspectives. \emph{Physical Chemistry Chemical #' Physics} \strong{7}: 3910-3916. -#' \href{https://doi.org/10.1039/B509983H}{link}. 
+#' \href{https://pubs.rsc.org/en/content/articlelanding/2005/CP/b509983h}{link}. #' #' McCullagh, P. and J. A. Nelder. 1989. \emph{Generalized Linear Models}. #' 2nd Edition. Chapman and Hall, New York, NY, USA. @@ -120,7 +120,7 @@ #' Ruggieri, E. 2013. A Bayesian approach to detecting change points in #' climactic records. \emph{International Journal of Climatology} #' \strong{33}:520-528. -#' \href{https://doi.org/10.1002/joc.3447}{link}. +#' \href{https://mathcs.holycross.edu/~eruggier/joc3447.pdf}{link}. #' #' Venables, W. N. and B. D. Ripley. 2002. \emph{Modern and Applied #' Statistics with S}. Fourth Edition. Springer, New York, NY, USA. @@ -128,7 +128,7 @@ #' Western, B. and M. Kleykamp. 2004. A Bayesian change point model for #' historical time series analysis. \emph{Political Analysis} #' \strong{12}:354-374. -#' \href{https://doi.org/10.1093/pan/mph023}{link}. +#' \href{https://www.cambridge.org/core/journals/political-analysis/article/abs/bayesian-change-point-model-for-historical-time-series-analysis/F7D2EDBBC211278EC6C6CB43FE170812}{link}. #' #' @examples #' data(rodents) @@ -578,7 +578,7 @@ measure_rho_vcov <- function(rhos){ #' Western, B. and M. Kleykamp. 2004. A Bayesian change point model for #' historical time series analysis. \emph{Political Analysis} #' \strong{12}:354-374. -#' \href{https://doi.org/10.1093/pan/mph023}{link}. +#' \href{https://www.cambridge.org/core/journals/political-analysis/article/abs/bayesian-change-point-model-for-historical-time-series-analysis/F7D2EDBBC211278EC6C6CB43FE170812}{link}. #' #' @examples #' \donttest{ diff --git a/R/data.R b/R/data.R index 5e29bac2..fd4fb858 100644 --- a/R/data.R +++ b/R/data.R @@ -26,13 +26,13 @@ #' Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. #' Long-term community change through multiple rapid transitions in a #' desert rodent community. \emph{Ecology} \strong{99}:1523-1529. -#' \href{https://doi.org/10.1002/ecy.2373}{link}. 
+#' \href{https://pubmed.ncbi.nlm.nih.gov/29718539/}{link}. #' #' Ernest, S. K. M., \emph{et al}. 2016. #' Long-term monitoring and experimental manipulation of a Chihuahuan desert #' ecosystem near Portal, Arizona ({1977-2013}). #' \emph{Ecology} \strong{97}:1082. -#' \href{https://doi.org/10.1890/15-2115.1}{link}. +#' \href{https://experts.nebraska.edu/en/publications/long-term-monitoring-and-experimental-manipulation-of-a-chihuahua}{link}. #' "rodents" @@ -50,14 +50,14 @@ #' the document covariate table (called \code{document_covariate_table}) #' with columns of covariates (time step, year, season). #' -#' @source \url{https://jornada.nmsu.edu/lter/dataset/49798/view} +#' @source \url{https://lter.jornada.nmsu.edu/data-catalog/} #' #' @references #' Lightfoot, D. C., A. D. Davidson, D. G. Parker, L. Hernandez, and J. W. #' Laundre. 2012. Bottom-up regulation of desert grassland and shrubland #' rodent communities: implications of species-specific reproductive #' potentials. \emph{Journal of Mammalogy} \strong{93}:1017-1028. -#' \href{https://doi.org/10.1644/11-MAMM-A-391.1}{link}. +#' \href{https://academic.oup.com/jmammal/article/93/4/1017/957927}{link}. #' "jornada" diff --git a/R/ptMCMC.R b/R/ptMCMC.R index 6695312a..7e59f9a6 100644 --- a/R/ptMCMC.R +++ b/R/ptMCMC.R @@ -36,7 +36,7 @@ #' Katzgraber, H. G., S. Trebst, D. A. Huse. And M. Troyer. 2006. #' Feedback-optimized parallel tempering Monte Carlo. \emph{Journal of #' Statistical Mechanics: Theory and Experiment} \strong{3}:P03018 -#' \href{https://bit.ly/2LICGXh}{link}. +#' \href{https://iopscience.iop.org/article/10.1088/1742-5468/2006/03/P03018}{link}. #' #' @examples #' \donttest{ @@ -89,7 +89,7 @@ diagnose_ptMCMC <- function(ptMCMCout){ #' Katzgraber, H. G., S. Trebst, D. A. Huse. And M. Troyer. 2006. #' Feedback-optimized parallel tempering Monte Carlo. \emph{Journal of #' Statistical Mechanics: Theory and Experiment} \strong{3}:P03018 -#' \href{https://bit.ly/2LICGXh}{link}. 
+#' \href{https://iopscience.iop.org/article/10.1088/1742-5468/2006/03/P03018}{link}. #' #' @examples #' \donttest{ @@ -168,12 +168,12 @@ count_trips <- function(ids){ #' Earl, D. J. and M. W. Deem. 2005. Parallel tempering: theory, #' applications, and new perspectives. \emph{Physical Chemistry Chemical #' Physics} \strong{7}: 3910-3916. -#' \href{https://rsc.li/2XkxPCm}{link}. +#' \href{https://pubs.rsc.org/en/content/articlelanding/2005/cp/b509983h}{link}. #' #' Falcioni, M. and M. W. Deem. 1999. A biased Monte Carlo scheme for #' zeolite structure solution. \emph{Journal of Chemical Physics} #' \strong{110}: 1754-1766. -#' \href{https://aip.scitation.org/doi/10.1063/1.477812}{link}. +#' \href{https://pubs.aip.org/aip/jcp/article/110/3/1754/475486/A-biased-Monte-Carlo-scheme-for-zeolite-structure}{link}. #' #' Geyer, C. J. 1991. Markov Chain Monte Carlo maximum likelihood. \emph{In #' Computing Science and Statistics: Proceedings of the 23rd Symposium on @@ -293,7 +293,7 @@ swap_chains <- function(chainsin, inputs, ids){ #' #' Hastings, W. K. 1970. Monte Carlo sampling methods using Markov Chains #' and their applications. \emph{Biometrika} \strong{57}:97-109. -#' \href{https://doi.org/10.2307/2334940}{link}. +#' \href{https://academic.oup.com/biomet/article-abstract/57/1/97/284580}{link}. #' #' Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. #' Teller. 1953. Equations of state calculations by fast computing machines. 
diff --git a/README.md b/README.md index 69590b9d..266de006 100644 --- a/README.md +++ b/README.md @@ -1,17 +1,17 @@ # Latent Dirichlet Allocation coupled with Bayesian Time Series analyses -[![Build Status](https://travis-ci.org/weecology/LDATS.svg?branch=master)](https://travis-ci.org/weecology/LDATS) -[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/weecology/LDATS/master/LICENSE) -[![Lifecycle:maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing) -[![Codecov test coverage](https://img.shields.io/codecov/c/github/weecology/LDATS/master.svg)](https://codecov.io/github/weecology/LDATS/branch/master) +[![R-CMD-check](https://github.com/weecology/LDATS/actions/workflows/r-cmd-check.yaml/badge.svg)](https://github.com/weecology/LDATS/actions/workflows/r-cmd-check.yaml) +[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/weecology/LDATS/main/LICENSE) +[![Lifecycle:maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#maturing) +[![Codecov test coverage](https://img.shields.io/codecov/c/github/weecology/LDATS/main.svg)](https://app.codecov.io/github/weecology/LDATS/branch/main) [![CRAN downloads](https://cranlogs.r-pkg.org/badges/grand-total/LDATS)](https://CRAN.R-project.org/package=LDATS) -[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3286617.svg)](https://doi.org/10.5281/zenodo.3286617) +[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3286617.svg)](https://zenodo.org/record/3715386) ## Overview The **`LDATS`** package provides functionality for analyzing time series of high-dimensional data using a two-stage approach comprised of Latent Dirichlet Allocation (LDA) and Bayesian time series (TS) analyses. 
-For a full description of the math underlying the **`LDATS`** package, see the [technical document](https://github.com/weecology/LDATS/blob/master/LDATS_model.pdf). +For a full description of the math underlying the **`LDATS`** package, see the [technical document](https://github.com/weecology/LDATS/blob/main/LDATS_model.pdf). ## Status: Stable Version Available, Continuing Development @@ -54,25 +54,25 @@ prints the selected LDA and TS models and plot(r_LDATS) ``` produces a 4-panel figure of them a la Figure 1 from -[Christensen et al. 2018](https://doi.org/10.1002/ecy.2373). +[Christensen et al. 2018](https://pubmed.ncbi.nlm.nih.gov/29718539/). ## More Information -Based on initial work using [LDA to analyze time-series data at Portal by Erica M. Christensen, David J. Harris, and S. K. Morgan Ernest](https://github.com/emchristensen/Extreme-events-LDA), which has been [published in *Ecology*](https://doi.org/10.1002/ecy.2373) +Based on initial work using [LDA to analyze time-series data at Portal by Erica M. Christensen, David J. Harris, and S. K. Morgan Ernest](https://github.com/emchristensen/Extreme-events-LDA), which has been [published in *Ecology*](https://pubmed.ncbi.nlm.nih.gov/29718539/) ## Acknowledgements -The motivating study—the Portal Project—has been funded nearly continuously since 1977 by the [National Science Foundation](http://nsf.gov/), most recently by [DEB-1622425](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1622425) to S. K. M. Ernest, which also supported (in part) E. Christensen’s time. -Much of the computational work (including time of J. Simonis, D. Harris, and H. Ye) was supported by the [Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative](http://www.moore.org/programs/science/data-driven-discovery) through [Grant GBMF4563](http://www.moore.org/grants/list/GBMF4563) to E. P. White. 
+The motivating study—the Portal Project—has been funded nearly continuously since 1977 by the [National Science Foundation](https://www.nsf.gov/), most recently by [DEB-1622425](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1622425) to S. K. M. Ernest, which also supported (in part) E. Christensen’s time. +Much of the computational work (including time of J. Simonis, D. Harris, and H. Ye) was supported by the [Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative](https://www.moore.org/initiative-strategy-detail?initiativeId=data-driven-discovery) through [Grant GBMF4563](https://www.moore.org/grant-detail?grantId=GBMF4563) to E. P. White. R. Diaz was supported in part by a [National Science Foundation Graduate Research Fellowship](http://www.nsfgrfp.org/) (No. [DGE-1315138](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1315138) and [DGE-1842473](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1842473)). ## Author Contributions **J. L. Simonis** provided insight on LDA applications and feedback on technical writing during development of the first version of the LDATS model and application, led the coding and mathematical development of the model into an R package, and led writing on the technical model document. -**E. M. Christensen** led the project during development of the first version of the LDATS model and its application to the Portal data, specifically conceiving the project, coding the pipeline wrappers of the analysis, and writing and editing the first description of the model and its application ([Christensen *et al.* 2018](https://doi.org/10.1002/ecy.2373)). -**D. J. Harris** was involved in developing and applying the first version of the LDATS model, specifically suggesting the LDA and change point approaches, coding the first version of the change point model, and writing and editing the first description of the model ([Christensen *et al.* 2018](https://doi.org/10.1002/ecy.2373)). +**E. M. 
Christensen** led the project during development of the first version of the LDATS model and its application to the Portal data, specifically conceiving the project, coding the pipeline wrappers of the analysis, and writing and editing the first description of the model and its application ([Christensen *et al.* 2018](https://pubmed.ncbi.nlm.nih.gov/29718539/)). +**D. J. Harris** was involved in developing and applying the first version of the LDATS model, specifically suggesting the LDA and change point approaches, coding the first version of the change point model, and writing and editing the first description of the model ([Christensen *et al.* 2018](https://pubmed.ncbi.nlm.nih.gov/29718539/)). **R. Diaz** contributed code to the LDATS package, wrote vignettes, provided insight into model development, and conducted extensive end-user code application testing. **H. Ye** contributed code to the LDATS package, insight into data structures and LDA algorithms, and significant feedback on vignettes. **E. P. White** helped design, troubleshoot, and supervise initial methods development; provided big-picture feedback on development of the R package; contributed end-user application testing; and gave substantial editing feedback on the technical document. -**S. K. Morgan Ernest** provided managerial oversight and feedback on the project in both the initial and second stages of LDATS development, tested applications of the code to data sets, and assisted with writing and editing of the first description of the model and its application ([Christensen *et al.* 2018](https://doi.org/10.1002/ecy.2373)) as well as the technical model document. +**S. K. 
Morgan Ernest** provided managerial oversight and feedback on the project in both the initial and second stages of LDATS development, tested applications of the code to data sets, and assisted with writing and editing of the first description of the model and its application ([Christensen *et al.* 2018](https://pubmed.ncbi.nlm.nih.gov/29718539/)) as well as the technical model document. diff --git a/cran-comments.md b/cran-comments.md index 15a4aa8d..d3101759 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -1,20 +1,74 @@ -This resubmission addresses problems associated with vignette dependencies +This resubmission addresses problems associated with vignette building and remote resources. +We now pre-compile the paper-comparison vignette prior to the formal build process following https://ropensci.org/blog/2019/12/08/precompute-vignettes/. ## Test environments -* local Windows 10 home install, R 3.6.1 64-bit and 32-bit -* local Windows 10 home install, R-devel (2020-03-12 r77936) 64-bit and 32-bit -* ubuntu 16.04.6 LTS (on travis-ci), R 3.6.2 and R-devel (2020-03-13 r77948) -* win builder, R 3.5.3 -* win builder, R 3.6.3 -* win builder, R-devel (2020-03-11 r77925) -* R-hub builder, Ubuntu Linux 16.04 LTS, R-release, GCC -* R-hub builder, Windows Server 2008 R2 SP1, R-devel, 32/64 bit -* R-hub builder, Fedora Linux, R-devel, clang, gfortran -* R-hub builder, macOS 10.11 El Capitan, R-release (experimental) -* R-hub builder, Oracle Solaris 10, x86, 32 bit, R-patched (experimental) + +* Local: Windows 10 home install (build 19045), 64-bit R 4.3.0 (2023-04-21 ucrt) + +* GitHub Actions: MacOS 12.6.8 21G725, R 4.3.1 (2023-06-16) + +* GitHub Actions: Microsoft Windows Microsoft Windows Server 2022, 10.0.20348, Datacenter, x86_64-w64-mingw32 (64-bit), R 4.3.1 (2023-06-16 ucrt) +* GitHub Actions: Microsoft Windows Microsoft Windows Server 2022, 10.0.20348, Datacenter, x86_64-w64-mingw32 (64-bit), R 3.6.3 (2020-02-29) +* GitHub Actions: Microsoft Windows Microsoft 
Windows Server 2022, 10.0.20348, Datacenter, x86_64-w64-mingw32 (64-bit), R 4.1.3 (2022-03-10) + +* GitHub Actions: Ubuntu 22.04.3 LTS, x86_64-pc-linux-gnu (64-bit), R 4.3.1 (2023-06-16) +* GitHub Actions: Ubuntu 22.04.3 LTS, x86_64-pc-linux-gnu (64-bit), R 4.2.3 (2023-03-15) +* GitHub Actions: Ubuntu 22.04.3 LTS, x86_64-pc-linux-gnu (64-bit), R 4.1.3 (2022-03-10) +* GitHub Actions: Ubuntu 22.04.3 LTS, x86_64-pc-linux-gnu (64-bit), R 4.0.5 (2021-03-31) +* GitHub Actions: Ubuntu 22.04.3 LTS, x86_64-pc-linux-gnu (64-bit), R 3.6.3 (2020-02-29) + +* win-builder: Windows Server 2022 x64 (build 20348), x86_64-w64-mingw32, R Under development (unstable) (2023-09-16 r85157 ucrt) +* win-builder: Windows Server 2022 x64 (build 20348), x86_64-w64-mingw32, R 4.3.1 (2023-06-16 ucrt) +* win-builder: Windows Server 2022 x64 (build 20348), x86_64-w64-mingw32, R 4.2.3 (2023-03-15 ucrt) + +* mac-builder: macosx, macOS 13.3.1 (22E261), r-release-macosx-arm64, R 4.3.0 + +* R-hub builder: Windows Server 2022, R-devel, 64 bit +* R-hub builder: +* R-hub builder: + ## R CMD check results: -There were no ERRORs, WARNINGs, or NOTEs +There were no ERRORs, WARNINGs, or substantive NOTEs + +### There are spurious NOTES associated with URLs on the win-builder system: + +* checking CRAN incoming feasibility ... NOTE +Maintainer: 'Juniper L. Simonis ' + +Found the following (possibly) invalid URLs: + URL: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1315138 + From: README.md + Status: Error + Message: Operation timed out after 60004 milliseconds with 0 bytes received + +Found the following (possibly) invalid DOIs: + DOI: 10.1002/ecy.2373 + From: DESCRIPTION + Status: Forbidden + Message: 403 + +### There are spurious NOTES associated with URLs and the manual build on the Windows Server of R hub: + +* checking CRAN incoming feasibility ... [224s] NOTE +Maintainer: 'Juniper L. 
Simonis ' +Found the following (possibly) invalid URLs: + URL: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1622425 + From: README.md + Status: Error + Message: libcurl error code 28: + Operation timed out after 60001 milliseconds with 0 bytes received + +* checking HTML version of manual ... [14s] NOTE +Skipping checking math rendering: package 'V8' unavailable + +* checking for non-standard things in the check directory ... NOTE +Found the following files/directories: + ''NULL'' + +* checking for detritus in the temp directory ... NOTE +Found the following files/directories: + 'lastMiKTeXException' ## Downstream dependencies There are currently no downstream dependencies for this package. diff --git a/doc/LDATS_codebase.Rmd b/doc/LDATS_codebase.Rmd index 0c736efa..d1dcb541 100644 --- a/doc/LDATS_codebase.Rmd +++ b/doc/LDATS_codebase.Rmd @@ -1,9 +1,9 @@ --- -title: "Latent Dirichlet Allocation Time Series (LDATS)" +title: "LDATS Codebase" author: "Juniper L. Simonis" output: rmarkdown::html_vignette vignette: > - %\VignetteIndexEntry{LDATScodebase} + %\VignetteIndexEntry{LDATS Codebase} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- diff --git a/doc/LDATS_codebase.html b/doc/LDATS_codebase.html index 46e96341..5cb7f126 100644 --- a/doc/LDATS_codebase.html +++ b/doc/LDATS_codebase.html @@ -13,76 +13,105 @@ -Latent Dirichlet Allocation Time Series (LDATS) +LDATS Codebase + + + + - + - - - - - - - - - - - - - - - -

Comparison to Christensen et al. 2018

-

Renata Diaz, Juniper Simonis, and Hao Ye

- - - -
-

Introduction

-

This document provides a side-by-side comparison of LDATS (version 0.2.7) results with the analysis from Christensen et al. 2018. Because the model runs are large and take a long time, we use pre-generated model output from the LDATS-replications repo.

-
-

Summary

| Step | Changes from Christensen et al. 2018 to LDATS | Effect on comparison | Recommendations for future users |
|---|---|---|---|
| Data | Paper adjusts abundances according to unit effort, while LDATS uses raw capture numbers. | None: run comparison using adjusted data. | Use raw, unweighted abundances. |
| LDA model selection | Paper conservatively overestimated the number of parameters for calculating AIC for model selection. LDATS calculates AIC appropriately. | Paper LDA selects 4 topics, while LDATS finds 6. Compare all combinations of paper and LDATS LDA models and changepoint models. | Use LDATS AIC calculation. |
| Changepoint model document weights | Paper weighted all sampling periods equally regardless of abundance. LDATS by default weights the information from each sampling period according to the number of individuals captured (i.e., the amount of information gleaned about the underlying community composition). | None; use `weights = NULL` to set all weights equal to 1 for LDATS. | Weight sampling periods according to abundance. |
| Overall LDA + changepoint results | All combinations of LDA + changepoint model find 4 changepoints at approximately the same time steps. | Choice of LDA model has more of an effect than choice of changepoint model. | LDATS reflects best practices, but the paper methods will produce qualitatively similar results. |

Setup

-
-

LDATS Installation

-

To obtain the most recent version of LDATS, install it from GitHub:

-
install.packages("devtools")
-devtools::install_github("weecology/LDATS")
-

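Load the LDATS package, then set the random seed, the number of LDA seeds (nseeds), and the number of iterations (nit) used below.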

-
library(LDATS)
-set.seed(42)
-nseeds <- 200
-nit <- 10000
-

To run the analyses here, you will also need to install dplyr, gridExtra, lubridate, multipanel, RColorBrewer, RCurl, and reshape2, because the manuscript’s code and this vignette rely on these packages.

-
install.packages(c("dplyr", "gridExtra", "lubridate", "multipanel", "RColorBrewer", "RCurl", "reshape2"))
-
-
-

Running the Models

-

Because both the Latent Dirichlet Allocation (LDA) and time series components of the analysis can take a long time to run (especially with the settings above for the number of seeds and iterations), we use pre-generated model outputs. We turn off the code chunks that run the models by setting a global rmarkdown parameter, run_models = FALSE.

-

To run the models yourself instead, re-render this file with:

-
rmarkdown::render("paper-comparison.Rmd", params = list(run_models = TRUE))
-
-
-

Download Analysis Scripts and Data Files

-

Because we will download analysis scripts, data files, and model objects, we store them in a temporary directory:

-
vignette_files <- tempdir()
-

To replicate the Christensen et al. 2018 analysis, we download some of the original scripts and data files from the Extreme-events-LDA repo, as well as some raw data files from the PortalData repo. Copies of all of these are stored in the LDATS-replications repo:

-

Main Analysis Scripts:

* `rodent_LDA_analysis.R`: main script for analyzing rodent community change using LDA
* `rodent_data_for_LDA.R`: contains a function that creates the rodent data table used in analyses
* `AIC_model_selection.R`: contains functions for calculating AIC for different candidate LDA models
* `changepointmodel.r`: contains change-point model code
* `LDA-distance.R`: function for computing Hellinger distance analyses

Data:

* `Rodent_table_dat.csv`: table of rodent data, created by `rodent_data_for_LDA.R`
* `moon_dates.csv`: table of census dates (downloaded from the PortalData repository)
* `Portal_rodent_trapping.csv`: table of trapping effort (downloaded from the PortalData repository)
* `paper_dat.csv`: rodent data table from Christensen et al. 2018
* `paper_dates.csv`: dates used in Christensen et al. 2018
* `paper_covariates.csv`: table of dates and covariate data

Figure scripts:

* `LDA_figure_scripts.R`: contains functions for making the main plots in the manuscript (Fig. 1); called from `rodent_LDA_analysis.R`
-
test_file <- file.path(vignette_files, "scripts", "rodent_LDA_analysis.r")
-
-if (!file.exists(test_file)){
-
-  # scripts
-  dir.create(file.path(vignette_files, "scripts"), showWarnings = FALSE)
-  github_path <- "https://raw.githubusercontent.com/weecology/LDATS-replications/master/scripts/"
-  files_to_download <- c("rodent_LDA_analysis.r", "rodent_data_for_LDA.r", 
-                         "AIC_model_selection.R", "changepointmodel.r", 
-                         "LDA-distance.R", "LDA_figure_scripts.R")
-  
-  for (file in files_to_download)  {
-    download.file(url = paste0(github_path, file),
-                  destfile = file.path(vignette_files, "scripts", file))
-  }
-
-    
-  # data
-  dir.create(file.path(vignette_files, "data"), showWarnings = FALSE)
-  github_path <- "https://raw.githubusercontent.com/weecology/LDATS-replications/master/data/"
-  files_to_download <- c("moon_dates.csv", "Portal_rodent_trapping.csv", 
-                         "Rodent_table_dat.csv", "paper_dat.csv",
-                         "paper_dates.csv", "paper_covariates.csv")
-  
-  for (file in files_to_download)  {
-    download.file(url = paste0(github_path, file),
-                  destfile = file.path(vignette_files, "data", file))
-  }
-}
-
-
-

Download Model Outputs

-

We also download pre-generated model outputs from the LDATS-replications repo:

-

LDA models:

* `ldats_ldamodel.RDS`: the best LDA model as selected by LDATS
* `paper_ldamodel.RDS`: the best LDA model as selected by the Christensen et al. analysis

Changepoint outputs:

* `ldats_ldats.RDS`, `ldats_ldats_cpt.RDS`, `ldats_ldats_cpt_dates.RDS`: the posterior distribution, count, and dates of changepoints, using the LDATS LDA model and the LDATS changepoint selection
* `ldats_paper.RDS`, `ldats_paper_cpt.RDS`, `ldats_paper_cpt_dates.RDS`: the posterior distribution, count, and dates of changepoints, using the LDATS LDA model and the paper’s changepoint selection
* `paper_ldats.RDS`, `paper_ldats_cpt.RDS`, `paper_ldats_cpt_dates.RDS`: the posterior distribution, count, and dates of changepoints, using the paper LDA model and the LDATS changepoint selection
* `paper_paper.RDS`, `paper_paper_cpt.RDS`, `paper_paper_cpt_dates.RDS`: the posterior distribution, count, and dates of changepoints, using the paper LDA model and the paper’s changepoint selection
* `cpt_dates.RDS`: summary table of changepoint results across models

Figures

* `lda_distances.png`: figure showing the variance in the topics identified by the paper’s LDA model code
* `paper_paper_cpt_plot.png`: figure showing the time series results for the paper analysis of the paper LDA
* `ldats_paper_cpt_plot.png`: figure showing the time series results for the paper analysis of the LDATS LDA
* `annual_hist.RDS`: function for making a histogram of change points over years
-
test_file <- file.path(vignette_files, "output", "ldats_ldamodel.RDS")
-
-if (!file.exists(test_file)){
-
-  dir.create(file.path(vignette_files, "output"), showWarnings = FALSE)
-  github_path <- "https://raw.githubusercontent.com/weecology/LDATS-replications/master/output/"
-  files_to_download <- c("ldats_ldamodel.RDS", "paper_ldamodel.RDS", 
-                         "ldats_ldats.RDS", "ldats_paper.RDS", 
-                         "paper_ldats.RDS", "paper_paper.RDS", 
-                         "ldats_rodents_adjusted.RDS", "rodents.RDS",
-                         "ldats_paper_cpt.RDS", "ldats_paper_cpt_dates.RDS",
-                         "ldats_ldats_cpt.RDS", "ldats_ldats_cpt_dates.RDS",
-                         "paper_paper_cpt.RDS", "paper_paper_cpt_dates.RDS",
-                         "paper_ldats_cpt.RDS", "paper_ldats_cpt_dates.RDS",
-                         "annual_hist.RDS", "cpt_dates.RDS",
-                         "lda_distances.png", "paper_paper_cpt_plot.png",
-                         "ldats_paper_cpt_plot.png")
-
-  for (file in files_to_download){
-    download.file(url = paste0(github_path, file),
-                  destfile = file.path(vignette_files, "output", file), 
-                  mode = "wb")
-  }
-}
-
-
-
-

Data Comparison

-

The dataset of Portal rodents on control plots is included in the LDATS package:

-
data(rodents)
-
-head(rodents[[1]])
-#>   BA DM DO DS NA. OL OT PB PE PF PH PI PL PM PP RF RM RO SF SH SO
-#> 1  0 13  0  2   2  0  0  0  1  1  0  0  0  0  3  0  0  0  0  0  0
-#> 2  0 20  1  3   2  0  0  0  0  4  0  0  0  0  2  0  0  0  0  0  0
-#> 3  0 21  0  8   4  0  0  0  1  2  0  0  0  0  1  0  0  0  0  0  0
-#> 4  0 21  3 12   4  2  3  0  1  1  0  0  0  0  0  0  0  0  0  0  0
-#> 5  0 16  1  9   5  2  1  0  0  2  0  0  0  0  0  0  1  0  0  0  0
-#> 6  0 17  1 13   5  1  5  0  0  3  0  0  0  0  0  0  0  0  0  0  0
-

We can compare this against the data used in Christensen et al. 2018:

-
# parameters for subsetting the full Portal rodents data
-periods <- 1:436
-control_plots <- c(2, 4, 8, 11, 12, 14, 17, 22)
-species_list <- c("BA", "DM", "DO", "DS", "NA", "OL", "OT", "PB", "PE", "PF", 
-                  "PH", "PI", "PL", "PM", "PP", "RF", "RM", "RO", "SF", "SH", "SO")
-
-source(file.path(vignette_files, "scripts", "rodent_data_for_LDA.r"))
-
-# assemble `paper_dat`, the data from Christensen et al. 2018
-paper_dat <- create_rodent_table(period_first = min(periods),
-                                 period_last = max(periods),
-                                 selected_plots = control_plots,
-                                 selected_species = species_list)
-
-# assemble `paper_covariates`, the associated dates and covariate data
-moondat <- read.csv(file.path(vignette_files, "data", "moon_dates.csv"), stringsAsFactors = FALSE)
-
-paper_dates <- moondat %>%
-  dplyr::filter(period %>% dplyr::between(min(periods), max(periods))) %>% 
-  dplyr::pull(censusdate) %>%
-  as.Date()
-
-paper_covariates <- data.frame(
-  index = seq_along(paper_dates), 
-  date = paper_dates, 
-  year_continuous = lubridate::decimal_date(paper_dates)) %>%
-  dplyr::mutate( 
-    sin_year = sin(year_continuous * 2 * pi), 
-    cos_year = cos(year_continuous * 2 * pi)
-  )
-
-

Compare the data from Christensen et al. with the data included in LDATS:

-
compare <- rodents[[1]] == paper_dat
-
-length(which(rowSums(compare) < ncol(compare)))
-#> [1] 16
-

There are 16 rows in which the data included in LDATS differ from the paper data. This is because the LDATS data are not adjusted for trapping effort, whereas the paper data are. Each census count was divided by the number of control plots actually trapped and multiplied by 8, to account for incompletely trapped censuses.
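The adjustment and the row-by-row comparison can be sketched in base R on a small table. The counts and the numbers of plots trapped below are hypothetical values invented for illustration, not Portal data. The comparison uses the same `rowSums(...) < ncol(...)` idiom as the chunk above.

```r
# Hypothetical toy data: 3 census periods x 2 species (invented, not Portal data)
raw_counts <- data.frame(DM = c(12, 9, 15), PP = c(4, 6, 2))

# Hypothetical number of the 8 control plots actually trapped in each period
plots_trapped <- c(8, 6, 8)

# Effort adjustment: divide each period's counts by the plots trapped,
# then multiply by 8 to estimate captures as if all 8 plots were trapped
adjusted_counts <- round(raw_counts / plots_trapped * 8)

# A row differs if not all of its cells are equal between the two tables
same <- raw_counts == adjusted_counts
n_rows_differ <- length(which(rowSums(same) < ncol(same)))

print(adjusted_counts)
print(n_rows_differ)
```

Periods in which all 8 plots were trapped are unchanged. Only incompletely trapped censuses produce differing rows.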

-

To confirm this, refer to lines 36-46 in rodent_data_for_LDA.r:

-
  # retrieve data on number of plots trapped per month
-  trap_table = read.csv('https://raw.githubusercontent.com/weecology/PortalData/master/Rodents/Portal_rodent_trapping.csv')
-  trap_table_controls = filter(trap_table, plot %in% selected_plots)
-  nplots_controls = aggregate(trap_table_controls$sampled,by=list(period = trap_table_controls$period),FUN=sum)
-  
-  # adjust species counts by number of plots trapped that month
-  r_table_adjusted = as.data.frame.matrix(r_table)
-  for (n in 1:436) {
-    #divide by number of control plots actually trapped (should be 8) and multiply by 8 to estimate captures as if all plots were trapped
-    r_table_adjusted[n,] = round(r_table_adjusted[n,]/nplots_controls$x[n]*8)
-  }
-

We can run the same procedure on the LDATS data to verify that the result matches the paper data.

-
# get the trapping effort for each sample
-trap_table <- read.csv(file.path(vignette_files, "data", "Portal_rodent_trapping.csv"))
-trap_table_controls <- dplyr::filter(trap_table, plot %in% control_plots)
-nplots_controls <- aggregate(trap_table_controls$sampled, 
-                            by = list(period = trap_table_controls$period), 
-                            FUN = sum)
-
-# adjust species counts by number of plots trapped that month
-#   divide by number of control plots actually trapped (should be 8) and 
-#   multiply by 8 to estimate captures as if all plots were trapped
-ldats_rodents_adjusted <- as.data.frame.matrix(rodents[[1]])
-ldats_rodents_adjusted[periods, ] <- round(ldats_rodents_adjusted[periods, ] / nplots_controls$x[periods] * 8)
-

Now we can compare the adjusted LDATS dataset with both the original LDATS dataset and the dataset from the paper:

-
compare_raw <- rodents[[1]] == ldats_rodents_adjusted
-length(which(rowSums(compare_raw) < ncol(compare_raw)))
-
-compare_adjusted <- ldats_rodents_adjusted == paper_dat
-length(which(rowSums(compare_adjusted) < ncol(compare_adjusted)))
-

Because the LDA procedure weights the information from each document (census period) according to its number of words (rodents captured), we now believe it is most appropriate to run the LDA on unadjusted trapping data, and we recommend that LDATS users do so. However, to maintain consistency with Christensen et al. 2018, we proceed using the adjusted rodent table in this vignette.
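This minimal sketch shows why the adjustment matters for LDA. It compares document lengths, the total number of "words" (captures) per census, before and after the effort adjustment. The toy counts and plot numbers are hypothetical.

```r
# Hypothetical toy data: 3 census periods x 2 species, with the second
# period trapped on only 6 of the 8 control plots
raw_counts <- data.frame(DM = c(12, 9, 15), PP = c(4, 6, 2))
plots_trapped <- c(8, 6, 8)

# Effort-adjusted counts, as in the paper's procedure
adjusted_counts <- round(raw_counts / plots_trapped * 8)

# Document length = total captures per census period
doc_length <- data.frame(raw = rowSums(raw_counts),
                         adjusted = rowSums(adjusted_counts))
print(doc_length)
```

The incompletely trapped census grows from 15 observed captures to 20 adjusted ones. An LDA fit to the adjusted table would therefore treat that period as carrying more information than was actually collected.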

-
rodents[[1]] <- paper_dat
-

The LDATS rodent data come with a document_covariate_table, which we will later use as the predictor variables for the changepoint models. In this table, time is expressed as new moon numbers. We will want to interpret the results in terms of census dates, so we add a column to the document_covariate_table that maps new moon numbers to census dates. Because we do not reference this column in any of the formulas passed to the changepoint models, it is ignored until we need it.

-
head(rodents$document_covariate_table)
-#>   newmoon sin_year cos_year
-#> 1       1  -0.2470  -0.9690
-#> 2       2  -0.6808  -0.7325
-#> 3       3  -0.9537  -0.3008
-#> 4       4  -0.9813   0.1925
-#> 5       5  -0.7583   0.6519
-#> 6       6  -0.3537   0.9354
-
new_cov_table <- dplyr::left_join(rodents$document_covariate_table, 
-                                  dplyr::select(moondat, newmoonnumber, censusdate), 
-                                  by = c("newmoon" = "newmoonnumber")) %>%
-                                  dplyr::rename(date = censusdate)
-
-rodents$document_covariate_table <- new_cov_table
-
-
-
-

Identify community groups using LDA

-

While LDATS can run start-to-finish with LDATS::LDA_TS, here we work through the process function by function to isolate differences from the paper. For a breakdown of the LDA_TS pipeline, see the codebase vignette.

-

First, we run the LDA models from LDATS to identify the number of topics:

-
ldats_ldas <- LDATS::LDA_set(document_term_table = rodents$document_term_table, 
-                             topics = 2:6, nseeds = nseeds)
-ldats_ldamodel <- LDATS::select_LDA(LDA_models = ldats_ldas)[[1]]
-
-saveRDS(ldats_ldamodel, file = file.path(vignette_files, "output", "ldats_ldamodel.RDS"))
-

Second, we run the LDA models from Christensen et al. to do the same task:

-
source(file.path(vignette_files, "scripts", "AIC_model_selection.R"))
-source(file.path(vignette_files, "scripts", "LDA-distance.R"))
-
-# Some of the functions require the data to be stored in the `dat` variable
-dat <- paper_dat
-
-# Fit a bunch of LDA models with different seeds
-# Only use even numbers for seeds because consecutive seeds give identical results
-seeds <- 2 * seq(nseeds)
-
-# repeat LDA model fit and AIC calculation with a bunch of different seeds to test robustness of the analysis
-best_ntopic <- repeat_VEM(paper_dat,
-                          seeds,
-                          topic_min = 2,
-                          topic_max = 6)
-hist(best_ntopic$k, breaks = seq(from = 0.5, to = 9.5), 
-     xlab = "best # of topics", main = "")
-
-# 2b. how different is species composition of 4 community-types when LDA is run with different seeds?
-# ==================================================================
-# get the best 100 seeds where 4 topics was the best LDA model
-seeds_4topics <- best_ntopic %>%
-  filter(k == 4) %>%
-  arrange(aic) %>%
-  head(min(100, nseeds)) %>%
-  pull(SEED)
-
-# choose seed with highest log likelihood for all following analyses
-#    (also produces plot of community composition for "best" run compared to "worst")
-
-png(file.path(vignette_files, "output", "lda_distances.png"), width = 800, height = 400)
-dat <- paper_dat # calculate_LDA_distance has some required named variables
-best_seed <- calculate_LDA_distance(paper_dat, seeds_4topics)
-dev.off()
-mean_dist <- unlist(best_seed)[2]
-max_dist <- unlist(best_seed)[3]
-
-# ==================================================================
-# 3. run LDA model
-# ==================================================================
-ntopics <- 4
-SEED <- unlist(best_seed)[1]  # For the paper, use seed 206
-ldamodel <- LDA(paper_dat, ntopics, control = list(seed = SEED), method = "VEM")
-
-saveRDS(ldamodel, file = file.path(vignette_files, "output", "paper_ldamodel.RDS"))
-
knitr::include_graphics(file.path(vignette_files, "output", "lda_distances.png"))
-
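The seed-selection step above (keep the runs where 4 topics won, sort them by AIC, take the first few seeds) can be sketched in base R without dplyr. The `best_ntopic` table here is hypothetical toy data with the `SEED`, `k`, and `aic` columns that the pipeline assumes; it is not output from the paper.

```r
# Hypothetical stand-in for the output of repeat_VEM(): one row per seed
best_ntopic <- data.frame(
  SEED = c(2, 4, 6, 8, 10, 12),
  k    = c(4, 5, 4, 4, 3, 4),
  aic  = c(1500, 1490, 1480, 1520, 1510, 1470)
)
nseeds <- 3  # hypothetical cap on the number of seeds to keep

# Base-R version of filter(k == 4) %>% arrange(aic) %>% head(...) %>% pull(SEED)
four_topic    <- best_ntopic[best_ntopic$k == 4, ]
four_topic    <- four_topic[order(four_topic$aic), ]
seeds_4topics <- head(four_topic$SEED, min(100, nseeds))
seeds_4topics
```

Here the four 4-topic runs are ordered by AIC and the three lowest are kept, so seeds 12, 6 and 2 are returned in that order.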

-
-

Plots

-

To visualize the LDA assignment of species to topics, we load the LDA models saved earlier:

-
ldamodel <- readRDS(file.path(vignette_files, "output", "paper_ldamodel.RDS"))
-ldats_ldamodel <- readRDS(file.path(vignette_files, "output", "ldats_ldamodel.RDS"))
-

How the paper LDA model assigns species to topics:

-
plot(ldamodel, cols = NULL, option = "D")
-

-

How the LDATS LDA model assigns species to topics:

-
plot(ldats_ldamodel, cols = NULL, option = "D")
-

-

The paper method finds 4 topics and LDATS finds 6. This is because of an update to the model selection procedure. The paper conservatively overestimates the number of parameters (by counting all of the variational parameters) and therefore overpenalizes the AIC for models with more topics. By comparison, the LDATS method now uses the number of parameters remaining after the variational approximation, as returned by the LDA object. For this vignette, we will compare the results from using both LDA models.
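To see why the parameter count matters, here is a toy AIC calculation. The log-likelihoods and the two per-topic parameter counts (10 vs. 40) are invented for illustration and do not reproduce either the paper's or LDATS's actual counting rule; the point is only that a heavier penalty per topic can move the minimum-AIC model to fewer topics.

```r
# Hypothetical maximized log-likelihoods for LDA models with 2 to 6 topics
ntopics <- 2:6
loglik  <- c(-1000, -950, -900, -880, -860)

# AIC = -2 * log likelihood + 2 * (number of parameters)
aic <- function(loglik, n_params) -2 * loglik + 2 * n_params

# Two hypothetical parameter-counting rules: lighter and heavier per topic
aic_light <- aic(loglik, n_params = 10 * ntopics)
aic_heavy <- aic(loglik, n_params = 40 * ntopics)

best_light <- ntopics[which.min(aic_light)]
best_heavy <- ntopics[which.min(aic_heavy)]
c(light = best_light, heavy = best_heavy)
```

With the lighter count the 6-topic model has the lowest AIC; with the heavier count the 4-topic model does.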

-
-
-
-

Changepoint models

-

We will compare four combinations of LDA + changepoint models:

-
  • LDATS LDA + LDATS changepoint
  • LDATS LDA + paper changepoint
  • Paper LDA + LDATS changepoint
  • Paper LDA + paper changepoint
-

Because the data were divided to generate catch-per-effort, the paper changepoint model weighted all sample periods equally. LDATS also weights sample periods equally by default, but it can instead weight them according to how many individuals were captured (controlled by the weights argument to LDA_TS, and easily calculated for a document-term matrix using document_term_weights). We now believe it is more appropriate to weight periods in proportion to the number of captures in the time series (even though the LDA function returns only the proportion of each topic), and this is what we recommend for LDATS users. For the purposes of comparison, however, we will continue to set all weights = 1 for both changepoint models. For an example of LDATS run with proportional weights, see the rodents vignette.
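As a rough illustration of proportional weighting, the sketch below divides each sample period's total captures by the mean total, so that the weights average 1. This is a base-R approximation of the idea, not code taken from LDATS; `document_term_weights` may differ in detail (for example, in rounding). The document-term table is hypothetical.

```r
# Hypothetical document-term table: 3 sample periods (rows) x 2 species (columns)
dtt <- matrix(c(10, 30,
                 5, 15,
                 0, 40), nrow = 3, byrow = TRUE)

captures <- rowSums(dtt)                # total individuals captured per period
w_prop   <- captures / mean(captures)   # weights proportional to captures, mean 1
w_equal  <- rep(1, nrow(dtt))           # equal weights, as used in this vignette

rbind(proportional = w_prop, equal = w_equal)
```

The period with half as many captures gets half the weight of the other two, whereas the equal-weight scheme treats all three periods identically.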

-
-

Running paper changepoint models

-

We define a few helper functions for running the changepoint model of Christensen et al. and processing the output to obtain the dates:

-
#### Run changepoint ####
-source(file.path(vignette_files, "scripts", "changepointmodel.r"))
-
-find_changepoints <- function(lda_model, paper_covariates, n_changepoints = 1:6){
-  # set up parameters for model
-  x <- dplyr::select(paper_covariates, 
-                     year_continuous, 
-                     sin_year, 
-                     cos_year)
-
-  # run models with 1, 2, 3, 4, 5, 6 changepoints
-  cpt_results <- data.frame(n_changepoints = n_changepoints)
-  cpt_results$cpt_model <- lapply(cpt_results$n_changepoints,
-                                  function(n_changepoints){
-                                    changepoint_model(lda_model, x, n_changepoints, maxit = nit, 
-                                                      weights = rep(1, NROW(x)))
-                                  })
-  return(cpt_results)
-}
-
-# Among a selection of models with different # of changepoints, 
-#   - compute AIC
-#   - select the model with the best AIC
-#   - get the posterior distributions for the changepoints
-select_cpt_model <- function(cpt_results, ntopics){
-  # compute the log likelihood as the mean of the saved log likelihoods
-  cpt_results$mean_deviances <- vapply(cpt_results$cpt_model, 
-                                       function(cpt_model) {mean(cpt_model$saved_lls)}, 
-                                       0)
-
-  # compute AIC = ( -2 * log likelihood) + 2 * (#parameters)
-  cpt_results$AIC <- cpt_results$mean_deviances * -2 + 
-    2 * (3 * (ntopics - 1) * (cpt_results$n_changepoints + 1) +
-           (cpt_results$n_changepoints))
-  
-  # select the best model
-  cpt <- cpt_results$cpt_model[[which.min(cpt_results$AIC)]]
-  return(cpt)
-}
-
-# transform the output from `compute_cpt` and match up the time indices with 
-#   dates from the original data
-get_dates <- function(cpt, covariates = paper_covariates){
-  cpt$saved[,1,] %>%
-    t() %>%
-    as.data.frame() %>%
-    reshape::melt() %>%
-    dplyr::left_join(covariates, by = c("value" = "index"))
-}
-
-
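The parameter count used in `select_cpt_model` can be worked through on its own. The formula charges 3 coefficients for each of the `ntopics - 1` non-reference topics in each of the `n_changepoints + 1` segments, plus one parameter per changepoint location. The mean log-likelihoods below are hypothetical and only show how the count feeds into the AIC comparison.

```r
# Number of parameters charged by select_cpt_model() for the paper changepoint model
cpt_n_params <- function(ntopics, n_changepoints) {
  3 * (ntopics - 1) * (n_changepoints + 1) + n_changepoints
}

n_changepoints <- 1:6
n_params <- cpt_n_params(ntopics = 4, n_changepoints)

# Hypothetical mean log-likelihoods for models with 1 to 6 changepoints
mean_ll <- c(-3000, -2950, -2920, -2880, -2875, -2865)
cpt_aic <- -2 * mean_ll + 2 * n_params

data.frame(n_changepoints, n_params, cpt_aic)
n_changepoints[which.min(cpt_aic)]
```

With 4 topics, each extra changepoint adds 10 parameters (9 regression coefficients plus 1 location), so it must raise the mean log-likelihood by more than 10 units to lower the AIC.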

LDATS LDA and paper changepoint

-

Run the Christensen et al. time series model to identify changepoints on the LDA topics selected by LDATS:

-
ldats_paper_results <- find_changepoints(ldats_ldamodel, paper_covariates)
-
-saveRDS(ldats_paper_results, file = file.path(vignette_files, "output", "ldats_paper.RDS"))
-

Extract the dates of the changepoints:

-
ldats_paper_results <- readRDS(file.path(vignette_files, "output", "ldats_paper.RDS"))
-
-ldats_paper_cpt <- select_cpt_model(ldats_paper_results, 
-                                    ntopics = ldats_ldamodel@k)
-ldats_paper_cpt_dates <- get_dates(ldats_paper_cpt)
-
-
-

Paper LDA and paper changepoint

-

Run the Christensen et al. time series model to identify changepoints on the LDA topics selected by Christensen et al.:

-
paper_paper_results <- find_changepoints(ldamodel, paper_covariates)
-
-saveRDS(paper_paper_results, file = file.path(vignette_files, "output", "paper_paper.RDS"))
-

Extract the dates of the changepoints:

-
paper_paper_results <- readRDS(file.path(vignette_files, "output", "paper_paper.RDS"))
-
-paper_paper_cpt <- select_cpt_model(paper_paper_results, 
-                                    ntopics = ldamodel@k)
-paper_paper_cpt_dates <- get_dates(paper_paper_cpt)
-
-
-
-

Running LDATS changepoint models

-
-

LDATS LDA and LDATS changepoint

-

Run the LDATS time series model to identify changepoints on the LDA topics selected by LDATS:

-
ldats_ldats_results <- TS_on_LDA(LDA_models = ldats_ldamodel,
-                                 document_covariate_table = rodents$document_covariate_table,
-                                 formulas = ~ sin_year + cos_year,
-                                 nchangepoints = 1:6,
-                                 timename = "newmoon",
-                                 weights = NULL,
-                                 control = list(nit = nit))
-
-saveRDS(ldats_ldats_results, file = file.path(vignette_files, "output", "ldats_ldats.RDS"))
-

Unlike the paper changepoint model, LDATS can recognize that sampling periods may not be equidistant, and can place changepoint estimates at new moons that fall between nonconsecutive sampling periods. We can estimate the dates corresponding to those new moons by linearly interpolating between the census dates of the adjacent census periods.

-
# make the full sequence of possible newmoon values
-full_index <- seq(min(rodents$document_covariate_table$newmoon), 
-                  max(rodents$document_covariate_table$newmoon))
-
-# generate a lookup table with dates for the newmoons, using `approx` to 
-#   linearly interpolate the missing values
-ldats_dates <- approx(rodents$document_covariate_table$newmoon, 
-                     as.Date(rodents$document_covariate_table$date), 
-                     full_index) %>%
-  as.data.frame() %>%
-  mutate(index = x, 
-         date = as.Date(y, origin = "1970-01-01")) %>%
-  select(index, date)
-
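The interpolation can be checked on a toy example. Below, hypothetical newmoon 3 has no census, so `approx` places its date halfway between the census dates for newmoons 2 and 4. The newmoon numbers and dates are made up for illustration.

```r
# Hypothetical census table: newmoon 3 was not sampled
newmoon     <- c(1, 2, 4)
census_date <- as.Date(c("2000-01-01", "2000-01-31", "2000-03-31"))

# Full sequence of newmoons, including the unsampled one
full_index <- seq(min(newmoon), max(newmoon))

# Linearly interpolate dates (as day counts) at every newmoon
interp <- approx(newmoon, as.numeric(census_date), xout = full_index)
lookup <- data.frame(index = interp$x,
                     date  = as.Date(interp$y, origin = "1970-01-01"))
lookup
```

The census dates for newmoons 2 and 4 are 60 days apart, so newmoon 3 is assigned the date 30 days after 2000-01-31, which is 2000-03-01.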

Select the best time series model and extract the dates of the changepoints:

-
ldats_ldats_results <- readRDS(file.path(vignette_files, "output", "ldats_ldats.RDS"))
-  
-ldats_ldats_cpt <- select_TS(ldats_ldats_results)
-
-ldats_ldats_cpt_dates <- ldats_ldats_cpt$rhos %>%
-  as.data.frame() %>%
-  reshape::melt() %>%
-  dplyr::left_join(ldats_dates, by = c("value" = "index"))
-
-
-
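For readers without `reshape` or `dplyr`, the reshape-and-join step can be sketched in base R: `stack()` turns a wide table of posterior changepoint draws into long format, and `merge()` attaches the dates. Both the draws and the date lookup below are hypothetical, and note that `stack()` names its columns `values` and `ind` rather than `value` and `variable`.

```r
# Hypothetical posterior draws: rows are iterations, columns are changepoints
rhos <- matrix(c(3, 7,
                 3, 8,
                 4, 7), nrow = 3, byrow = TRUE)
rhos_df <- as.data.frame(rhos)          # columns are named V1, V2

# Hypothetical newmoon-to-date lookup table
dates <- data.frame(index = 1:10,
                    date  = seq(as.Date("2000-01-01"), by = "month", length.out = 10))

long      <- stack(rhos_df)             # columns: values, ind
cpt_dates <- merge(long, dates, by.x = "values", by.y = "index", all.x = TRUE)

nlevels(cpt_dates$ind)                  # number of changepoints
```

As with `nlevels(...$variable)` in the comparison below, the number of factor levels in the long table equals the number of changepoints.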

Paper LDA and LDATS changepoint

-

Run the LDATS time series model to identify changepoints on the LDA topics selected by Christensen et al.:

-
paper_ldats_results <- TS_on_LDA(LDA_models = ldamodel,
-                                 document_covariate_table = rodents$document_covariate_table,
-                                 formulas = ~ sin_year + cos_year,
-                                 nchangepoints = 1:6,
-                                 timename = "newmoon",
-                                 weights = NULL,
-                                 control = list(nit = nit))
-
-saveRDS(paper_ldats_results, file = file.path(vignette_files, "output", "paper_ldats.RDS"))
-

Select the best time series model and extract the dates of the changepoints:

-
paper_ldats_results <- readRDS(file.path(vignette_files, "output", "paper_ldats.RDS"))
-  
-paper_ldats_cpt <- select_TS(paper_ldats_results)
-
-paper_ldats_cpt_dates <- paper_ldats_cpt$rhos %>%
-  as.data.frame() %>%
-  reshape::melt() %>%
-  dplyr::left_join(ldats_dates, by = c("value" = "index"))
-
-
-
-

How many changepoints were identified?

-
nlevels(ldats_paper_cpt_dates$variable)
-#> [1] 4
-nlevels(paper_paper_cpt_dates$variable)
-#> [1] 4
-nlevels(ldats_ldats_cpt_dates$variable)
-#> [1] 4
-nlevels(paper_ldats_cpt_dates$variable)
-#> [1] 4
-

All of the models find four changepoints.

-
-
-

Plot changepoint models

-
-

Paper LDA and LDATS changepoint

-
plot(paper_ldats_cpt)
-

-
-
-

LDATS LDA and LDATS changepoint

-
plot(ldats_ldats_cpt)
-

-
annual_hist <- readRDS(file.path(vignette_files, "output", "annual_hist.RDS"))
-
-
-

Paper LDA and paper changepoint

-
paper_cpts <- find_changepoint_location(paper_paper_cpt)
-ntopics <- ldamodel@k
-
-png(file.path(vignette_files, "output", "paper_paper_cpt_plot.png"), width = 800, height = 600)
-get_ll_non_memoized_plot(ldamodel, paper_covariates, paper_cpts, make_plot = TRUE,
-                                           weights = rep(1, NROW(paper_covariates)))
-dev.off()
-
paper_paper_hist <- annual_hist(paper_paper_cpt, paper_covariates$year_continuous)
-

-
knitr::include_graphics(file.path(vignette_files, "output", "paper_paper_cpt_plot.png"))
-

-
-
-

LDATS LDA and paper changepoint

-
ldats_cpts <- find_changepoint_location(ldats_paper_cpt)
-ntopics <- ldats_ldamodel@k
-
-png(file.path(vignette_files, "output", "ldats_paper_cpt_plot.png"), width = 800, height = 600)
-get_ll_non_memoized_plot(ldats_ldamodel, paper_covariates, ldats_cpts, make_plot = TRUE,
-                                           weights = rep(1, NROW(paper_covariates)))
-dev.off()
-
ldats_paper_hist <- annual_hist(ldats_paper_cpt, paper_covariates$year_continuous)
-

-
knitr::include_graphics(file.path(vignette_files, "output", "ldats_paper_cpt_plot.png"))
-

-

The results of the changepoint model appear robust to both choice of LDA model and choice of changepoint model.

-
-
-
-

Report changepoint dates

-
knitr::kable(cpt_dates)
| changepoint | ldatsLDA_ldatsCPT | ldatsLDA_paperCPT | paperLDA_ldatsCPT | paperLDA_paperCPT |
|:------------|:------------------|:------------------|:------------------|:------------------|
| V1          | 1984-01-30        | 1984-01-11        | 1984-04-27        | 1984-01-11        |
| V2          | 1991-05-30        | 1991-05-29        | 1993-04-19        | 1991-05-29        |
| V3          | 1999-01-25        | 1998-12-21        | 1999-07-07        | 1998-12-21        |
| V4          | 2009-11-15        | 2009-09-29        | 2010-01-20        | 2009-09-29        |
-

The choice of LDA has more influence on the changepoint locations than the choice of changepoint model, probably because the LDATS LDA has 6 topics and the paper LDA has 4. However, all of the models agree to within about 6 months in most cases, and to within about two years for the broader early-1990s changepoint.
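One way to quantify this agreement is to compute, for each changepoint, the spread between the earliest and latest estimated dates across the four model combinations. The dates below are copied from the changepoint table above.

```r
# Changepoint dates from the table above (one column per LDA/changepoint combination)
cpt_table <- data.frame(
  ldatsLDA_ldatsCPT = as.Date(c("1984-01-30", "1991-05-30", "1999-01-25", "2009-11-15")),
  ldatsLDA_paperCPT = as.Date(c("1984-01-11", "1991-05-29", "1998-12-21", "2009-09-29")),
  paperLDA_ldatsCPT = as.Date(c("1984-04-27", "1993-04-19", "1999-07-07", "2010-01-20")),
  paperLDA_paperCPT = as.Date(c("1984-01-11", "1991-05-29", "1998-12-21", "2009-09-29")),
  row.names = c("V1", "V2", "V3", "V4")
)

# Days between the earliest and latest estimate of each changepoint
spread_days <- apply(cpt_table, 1, function(d) {
  d <- as.Date(d)                      # apply() passes each row as character
  as.numeric(max(d) - min(d))
})
spread_days
```

By this measure, V1, V3 and V4 each spread over less than seven months, while V2 spreads over nearly two years.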

-
-
diff --git a/doc/rodents-example.Rmd b/doc/rodents-example.Rmd
index 26fb9342..058021de 100644
--- a/doc/rodents-example.Rmd
+++ b/doc/rodents-example.Rmd
@@ -1,9 +1,9 @@
 ---
-title: "LDATS Rodents Example"
+title: "Rodents Example"
 author: "Renata Diaz and Juniper L. Simonis"
 output: rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{rodentsexample}
+  %\VignetteIndexEntry{Rodents Example}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
diff --git a/doc/rodents-example.html b/doc/rodents-example.html
index 5d7143f2..7b3892d4 100644
--- a/doc/rodents-example.html
+++ b/doc/rodents-example.html
@@ -13,76 +13,105 @@
-LDATS Rodents Example
+Rodents Example