Draft
3 changes: 0 additions & 3 deletions PEPATACr/tests/testthat.R

This file was deleted.

79 changes: 0 additions & 79 deletions PEPATACr/tests/testthat/helper-fixtures.R

This file was deleted.

24 changes: 0 additions & 24 deletions PEPATACr/tests/testthat/test-summarizer.R

This file was deleted.

38 changes: 0 additions & 38 deletions PEPATACr/tests/testthat/test-utilities.R

This file was deleted.

78 changes: 0 additions & 78 deletions PEPATACr/tests/testthat/test-yamlToDT.R

This file was deleted.

34 changes: 24 additions & 10 deletions docs/assets.md
@@ -24,30 +24,44 @@

 ## Using `refgenie` managed assets

-`PEPATAC` can utilize [`refgenie`](http://refgenie.databio.org/) assets. Because assets are user-dependent, these files must be available natively. Therefore, you need to [install and initialize a refgenie config file.](http://refgenie.databio.org/en/latest/install/). For example:
+`PEPATAC` (this branch) targets [refgenie 1.0+](https://github.com/refgenie/refgenie1) (the SQLModel-backed reimplementation), not legacy refgenie 0.12.x.
+
+`refgenie` 1.0 splits genome registration from asset acquisition: you first `refgenie genome init` from a FASTA, then `refgenie add` each asset (which builds it locally from the registered recipes, or pulls from a subscribed source).
+
+Install and initialize refgenie 1.0:

 ```console
-pip install refgenie
-export REFGENIE=/path/to/your_genome_folder/genome_config.yaml
-refgenie init -c $REFGENIE
+pip install "refgenie>=1.0.0"
+export REFGENIE_HOME_PATH=/path/to/your_refgenie_home
+export REFGENIE_DB_CONFIG_PATH=$REFGENIE_HOME_PATH/refgenie_db_config.yaml
+refgenie init
 ```

-Add the `export REFGENIE` line to your `.bashrc` or `.profile` to ensure it persists.
+Add the `export REFGENIE_HOME_PATH` and `export REFGENIE_DB_CONFIG_PATH` lines to your `.bashrc` or `.profile` to ensure they persist. Note: legacy refgenie used `$REFGENIE` pointing at a YAML config; refgenie 1.0 uses `$REFGENIE_DB_CONFIG_PATH` pointing at the SQLite-backed db config. Update any inherited `.bashrc` accordingly.

-Next, pull the assets you need. Replace `hg38` in the example below if you need to use a different genome assembly. If these assets are not available automatically for your genome of interest, then you'll need to [build them](annotation.md). Download all standard assets for `hg38` like so:
+Next, register a genome and add assets. Replace `hg38` if you need a different assembly:

 ```console
-refgenie pull hg38/fasta hg38/bowtie2_index hg38/refgene_anno hg38/ensembl_gtf hg38/ensembl_rb hg38/blacklist
-refgenie build hg38/feat_annotation
+# Register a genome from a FASTA file
+refgenie genome init /path/to/hg38.fa --alias hg38
+
+# Add each asset (recipes ship in refgenie/recipes; subscribe to a source if pulling)
+refgenie add hg38/fasta --recipe fasta
+refgenie add hg38/bowtie2_index --recipe bowtie2_index
+refgenie add hg38/refgene_anno --recipe refgene_anno
+refgenie add hg38/blacklist --recipe blacklist
+refgenie add hg38/feat_annotation --recipe feat_annotation
 ```

 `PEPATAC` also requires a `bowtie2_index` asset for any prealignment genomes:

 ```console
-refgenie pull rCRSd/fasta rCRSd/bowtie2_index human_repeats/fasta human_repeats/bowtie2_index
+refgenie genome init /path/to/rCRSd.fa --alias rCRSd
+refgenie add rCRSd/fasta --recipe fasta
+refgenie add rCRSd/bowtie2_index --recipe bowtie2_index
 ```

-If you prefer `bwa` for alignment, you would use the [`refgenie bwa_index`](http://refgenie.databio.org/en/latest/available_assets/#bwa_index) instead.
+If you prefer `bwa` for alignment, you would use a `bwa_index` recipe instead. (Note: the `bwa_index` and `tallymer_index` asset classes may not yet ship in `refgenie/recipes`; check that repo or build manually.)

 Furthermore, you can [learn more about using `seqOutBias` and the required `tallymer_index` here](sob.md).
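
The new register-then-add flow in `docs/assets.md` repeats the same command pattern for each genome and asset. As a minimal sketch, a dry-run helper could print the commands for review before anything is run. The function name `print_refgenie_plan` is hypothetical, and the command shapes are copied from this diff rather than verified against the refgenie 1.0 CLI; the helper also assumes each asset name matches its recipe name, as in the examples above.

```shell
# Hypothetical dry-run helper: prints (does not execute) the commands needed
# to register one genome and add its assets.
# Assumption: command shapes are taken from this diff, not from the refgenie 1.0 CLI docs.
# Assumption: each asset is built with a recipe of the same name.
print_refgenie_plan() {
    local fasta_path="$1" genome_alias="$2"
    shift 2
    printf 'refgenie genome init %s --alias %s\n' "$fasta_path" "$genome_alias"
    local asset
    for asset in "$@"; do
        printf 'refgenie add %s/%s --recipe %s\n' "$genome_alias" "$asset" "$asset"
    done
}

# Print the plan for the primary genome used in the docs example.
print_refgenie_plan /path/to/hg38.fa hg38 \
    fasta bowtie2_index refgene_anno blacklist feat_annotation
```

Printing rather than executing keeps the sketch safe to run on a machine without refgenie installed; the output can be piped to `sh` once the commands have been checked.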

30 changes: 14 additions & 16 deletions docs/detailed-install.md
@@ -247,28 +247,26 @@ Before we analyze anything, we also need a reference genome. You can use our rec

 ### 4a: Initialize `refgenie` and download assets

-`PEPATAC` can utilize [`refgenie`](http://refgenie.databio.org/) assets. Because assets are user-dependent, these files must still be available natively. Therefore, we need to [install and initialize a refgenie config file.](http://refgenie.databio.org/en/latest/install/). For example:
+> **NOTE (refgenie1 branch):** This branch targets [refgenie 1.0+](https://github.com/refgenie/refgenie1). See [`docs/assets.md`](assets.md) for canonical setup.

 ```console
-pip install refgenie
-export REFGENIE=/path/to/your_genome_folder/genome_config.yaml
-refgenie init -c $REFGENIE
+pip install "refgenie>=1.0.0"
+export REFGENIE_HOME_PATH=/path/to/your_refgenie_home
+export REFGENIE_DB_CONFIG_PATH=$REFGENIE_HOME_PATH/refgenie_db_config.yaml
+refgenie init
+refgenie genome init /path/to/hg38.fa --alias hg38
+refgenie add hg38/fasta --recipe fasta
+refgenie add hg38/bowtie2_index --recipe bowtie2_index
+refgenie add hg38/refgene_anno --recipe refgene_anno
+refgenie add hg38/feat_annotation --recipe feat_annotation
 ```

-Add the `export REFGENIE` line to your `.bashrc` or `.profile` to ensure it persists.
-
-Next, pull the assets you need. Replace `hg38` in the example below if you need to use a different genome assembly. If these assets are not available automatically for your genome of interest, then you'll need to [build them](annotation.md). Download these required assets with this command:
-
-```console
-refgenie pull hg38/fasta hg38/bowtie2_index hg38/refgene_anno hg38/ensembl_gtf hg38/ensembl_rb
-refgenie build hg38/feat_annotation
-```
-
-`PEPATAC` also requires a `bowtie2_index` asset for any pre-alignment genomes:
+`PEPATAC` also requires `fasta` and `bowtie2_index` assets for any pre-alignment genomes:

 ```console
-refgenie pull rCRSd/fasta
-refgenie pull rCRSd/bowtie2_index
+refgenie genome init /path/to/rCRSd.fa --alias rCRSd
+refgenie add rCRSd/fasta --recipe fasta
+refgenie add rCRSd/bowtie2_index --recipe bowtie2_index
 ```
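
This hunk drops the sentence about persisting the `export` line in `.bashrc` or `.profile`, although `docs/assets.md` still recommends it for the two new variables. A minimal sketch of one way to persist them without duplicating lines on repeated runs follows. The helper name `persist_refgenie_env` is hypothetical; the variable names and the `refgenie_db_config.yaml` file name are taken from this diff.

```shell
# Hypothetical helper: append the two refgenie 1.0 export lines to a profile
# file, skipping any line that is already present (safe to run repeatedly).
# Assumption: variable names and config file name are as shown in this diff.
persist_refgenie_env() {
    local refgenie_home="$1" profile="$2"
    local line_home line_db line
    line_home="export REFGENIE_HOME_PATH=\"$refgenie_home\""
    # Single quotes keep $REFGENIE_HOME_PATH literal so it expands at login time.
    line_db='export REFGENIE_DB_CONFIG_PATH="$REFGENIE_HOME_PATH/refgenie_db_config.yaml"'
    for line in "$line_home" "$line_db"; do
        # -x matches whole lines, -F treats the pattern as a fixed string.
        grep -qxF -- "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
    done
}

# Example usage (commented out so the sketch does not modify your profile):
# persist_refgenie_env /path/to/your_refgenie_home "$HOME/.bashrc"
```

Writing the second line in terms of `$REFGENIE_HOME_PATH` mirrors the docs example, so relocating the refgenie home later only requires changing one line.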

### 4b: Download assets manually
20 changes: 13 additions & 7 deletions docs/howto/install-refgenie.md
@@ -13,9 +13,10 @@ You have two options for using `refgenie` assemblies with `PEPATAC`. If you're u

 Pre-built genome indices exist for common genomes including: `hg38`, `hg19`, `mm10`, and `mm9`. You may [download the corresponding pre-indexed references](http://refgenie.databio.org/en/latest/download/) directly from the web or using `refgenie` on the command line.

-For example, get the `hg38` bowtie2 index:
+For example, build the `hg38` bowtie2 index (refgenie 1.0):
 ```console
-refgenie pull hg38/bowtie2_index
+refgenie genome init /path/to/hg38.fa --alias hg38
+refgenie add hg38/bowtie2_index --recipe bowtie2_index
 ```

 ### Build custom `refgenie` assemblies
@@ -24,11 +25,16 @@ For complete and detailed information on indexing your own genomes and building

 ## 2: Configure the pipeline to use `refgenie` assemblies

-Once you've procured assemblies for all genomes you wish to use, you must point the pipeline to where you store these. You can do this in two ways, either: 1) with an environment variable, or 2) by adjusting a configuration option.
-The pipeline looks for genomes stored in a folder specified by the `resources.genome_config` attribute in the [pipeline config file](https://github.com/databio/pepatac/blob/dev/pipelines/pepatac.yaml). By default, this points to the shell variable `REFGENIE`, so all you have to do is set an environment variable to the location of your `refgenie` configuration file:
+Once you've registered assemblies and assets for all genomes you wish to use, the pipeline locates them via the refgenie 1.0 db config path:

 ```
-export REFGENIE="/path/to/genome_config.yaml"
+export REFGENIE_HOME_PATH="/path/to/your_refgenie_home"
+export REFGENIE_DB_CONFIG_PATH="$REFGENIE_HOME_PATH/refgenie_db_config.yaml"
 ```
-(Add this to your `.bashrc` or `.profile` to ensure it persists).
-Alternatively, you can skip the `REFGENIE` variable and simply change the value of that configuration option to point to the configuration file for `refgenie`. The advantage of using an environment variable is that it makes the configuration file portable, so the same pipeline can be run on any computing environment, as the location to reference assemblies is not hard-coded to a specific computing environment.
+
+(Add these to your `.bashrc` or `.profile` to ensure they persist.)
+
+The pipeline interface's `pre_submit` hook (`refgenie.looper_refgenie_populate_local`) reads `$REFGENIE_DB_CONFIG_PATH` from the environment and resolves all asset paths automatically.
+
+> **NOTE (refgenie1 branch):** The legacy `$REFGENIE` env var (pointing at a YAML config) is replaced by `$REFGENIE_DB_CONFIG_PATH` (pointing at refgenie 1.0's db config YAML). Update any inherited `.bashrc` accordingly.
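
Because the `pre_submit` hook depends on `$REFGENIE_DB_CONFIG_PATH`, a shell that still carries only the legacy `$REFGENIE` setting would presumably fail at submission time. A minimal pre-flight check, run before the pipeline is launched, might look like the sketch below. The function name `check_refgenie_env` and its exit codes are hypothetical conventions, not part of refgenie, looper, or PEPATAC; only the two variable names come from this diff.

```shell
# Hypothetical pre-flight check for the refgenie 1.0 environment.
# Exit codes (this sketch's own convention):
#   0 = REFGENIE_DB_CONFIG_PATH is set and the file exists
#   1 = neither variable is set
#   2 = only the legacy REFGENIE variable is set
#   3 = REFGENIE_DB_CONFIG_PATH is set but the file is missing
check_refgenie_env() {
    if [ -z "${REFGENIE_DB_CONFIG_PATH:-}" ]; then
        if [ -n "${REFGENIE:-}" ]; then
            echo "Only legacy \$REFGENIE is set; set REFGENIE_DB_CONFIG_PATH for refgenie 1.0." >&2
            return 2
        fi
        echo "REFGENIE_DB_CONFIG_PATH is not set." >&2
        return 1
    fi
    if [ ! -f "$REFGENIE_DB_CONFIG_PATH" ]; then
        echo "Config file not found: $REFGENIE_DB_CONFIG_PATH" >&2
        return 3
    fi
    return 0
}

# Example usage (commented out so sourcing this sketch has no side effects):
# check_refgenie_env && echo "refgenie environment looks usable"
```

The check only inspects variables in the current shell, so the variables still need to be exported for child processes such as the hook to see them.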
