Fixed warnings from reproducibility audit #99

Open
braebigge wants to merge 1 commit into main from bb_reproducibility

@braebigge
Contributor

This PR addresses a few small issues found in a reproducibility audit. I addressed them primarily by adding documentation about API calls. None of these changes should affect the functionality of the pipeline.

  1. External API dependencies are fragile: In search mode, the pipeline depends on the live Foldseek, remote BLAST, UniProt REST, and AlphaFold APIs. These are inherently non-reproducible over time (APIs may change, servers may go down, rate limits may be hit). The README partially addresses this (e.g., line 213: "Restarting Snakemake with the --rerun-incomplete flag usually resolves this"), but a reproducer running the pipeline months or years later may still encounter failures.
  2. Cluster mode with key_protids still requires Foldseek API calls: In Snakefile lines 586–591, get_aggregate_features_input() expands aggregate_foldseek_fraction_seq_identity and calculate_concordance for KEY_PROTIDS as a common input (not search-mode-only). This triggers the run_foldseek rule (a web API call) for each key_protid even in cluster mode. The README's cluster mode documentation (lines 119–168) does not mention this network requirement, which may surprise users who expect cluster mode to be fully offline.
  3. Additionally, I cleaned up a couple of things in the setup.py file, which isn't currently used.
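
To make point 2 concrete, here is a minimal Python sketch of the pattern described there. It is not the actual Snakefile code: only the names get_aggregate_features_input, KEY_PROTIDS, aggregate_foldseek_fraction_seq_identity, calculate_concordance, and run_foldseek come from the audit text. The mode argument, the file name patterns, and the search-only entry are assumptions made for illustration.

```python
# Hypothetical sketch of the behavior described in point 2.
# File name patterns and the "mode" argument are illustrative assumptions,
# not the pipeline's real paths or signature.

KEY_PROTIDS = ["EXAMPLE_PROTID"]  # placeholder ID, not a real pipeline value


def get_aggregate_features_input(mode, key_protids=KEY_PROTIDS):
    """Return the feature files requested by the aggregation step."""
    # Common inputs: built for every key protid in BOTH modes.
    # Each one depends on Foldseek results, so requesting it would
    # schedule run_foldseek (a web API call) for that protid.
    common = []
    for protid in key_protids:
        common.append(f"{protid}_aggregate_foldseek_fraction_seq_identity.tsv")
        common.append(f"{protid}_calculate_concordance.tsv")

    if mode == "search":
        # Search-only inputs (hypothetical example entry).
        return common + ["search_mode_only_features.tsv"]

    # Cluster mode still returns the Foldseek-dependent common inputs.
    return common


if __name__ == "__main__":
    for path in get_aggregate_features_input("cluster"):
        print(path)
```

In this sketch, cluster mode needs no network access only when no key protids are configured; with one or more key protids, the Foldseek-dependent inputs are still requested.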
