pheweb/etc/detailed-loading-instructions.md at master · statgen/pheweb

Configuration options

assoc_min_maf (float): an association (between a phenotype and variant) will only be included if its MAF is greater than or equal to this value. (default: 0)
cache (string): a directory where files shared by all datasets can be cached. If you're loading multiple phewebs, setting cache = "~/.pheweb/cache/" will avoid downloading files multiples times. (default: None)
num_procs (int): the number of processes to use for parallel loading steps. (default: 2/3 of the number of cores on your machine)
loading_nice = True: sets nice=19 (reducing cpu priority) and sets ionice to class "Idle" (reducing IO when anything else is using disk)
debugging_limit_num_variants (int): only parses this many variants from each input association file and from the rsids file. This is convenient for quickly loading part of a dataset to check that it works as expected.
download_pheno_sumstats: explained in README
show_correlations: explained in README

Making pheno-list.json

There are four ways to make a pheno-list.json:

If you have a csv (or tsv, optionally gzipped) with a header that has exactly the right column names, just import it by running pheweb phenolist import-phenolist "/path/to/my/pheno-list.csv".

If you have multiple association files for each phenotype, you may put them all into a single column with | between them. For example, your file pheno-list.csv might look like this:
```
phenocode,assoc_files
a1c,/home/peter/data/a1c.autosomal.gz|/home/peter/data/a1c.X.gz
ear-length,/home/peter/data/ear-length.gz
```
If you have one association file per phenotype, you can use a shell-glob to get assoc-files. Suppose that your assocation files are at paths like:
- /home/peter/data/a1c.autosomal.gz
- /home/peter/data/ear-length.gz
Then you could run pheweb phenolist glob "/home/peter/data/*.gz" to get assoc-files.

To get phenocodes, you can use this command which will take the text after the last / and before the next .:
```
pheweb phenolist extract-phenocode-from-filepath --simple
```
If that doesn't work, see pheweb phenolist extract-phenocode-from-filepath -h for how to use a regex capture group.
If you have multiple association files for some phenotypes, you can follow the directions in 2 and then run pheweb phenolist unique-phenocode.

For example, if your association files are at:
- /home/peter/data/ear-length.gz
- /home/peter/data/a1c.autosomal.gz
- /home/peter/data/a1c.X.gz
then you can run:
```
pheweb phenolist glob "/home/peter/data/*.gz"
pheweb phenolist extract-phenocode-from-filepath --simple
pheweb phenolist unique-phenocode
```
If you want to do more advanced things, like merging in more information from another file, check out the tools in pheweb phenolist --help.

Distributing jobs across a cluster

pheweb process runs a bunch of steps, which you can see by running pheweb process -h. Some of those steps can instead be run distributed across a cluster. You can see which steps by running pheweb cluster -h.

The schedulers SLURM and SGE are natively supported. Use --engine=slurm or --engine=sge when you run pheweb cluster. For other schedulers, you'll have to modify the output of pheweb cluster.

For example, on SLURM you could run:

pheweb phenolist verify
pheweb cluster --engine=slurm --step=parse
pheweb sites && pheweb make-gene-aliases-sqlite3 && pheweb add-rsids && pheweb add-genes && pheweb make-cpras-rsids-sqlite3
pheweb cluster --engine=slurm --step=augment-phenos
pheweb cluster --engine=slurm --step=manhattan
pheweb cluster --engine=slurm --step=qq
pheweb process  # This won't re-create any files that are already up-to-date.

Annotating with VEP

Run the code in etc/annotate_vep/run.sh. It requires docker (and thus sudo) and only works on hg38. Read the comments at the top of that script.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Configuration options

Making pheno-list.json

Distributing jobs across a cluster

Annotating with VEP

FilesExpand file tree

detailed-loading-instructions.md

Latest commit

History

detailed-loading-instructions.md

File metadata and controls

Configuration options

Making pheno-list.json

Distributing jobs across a cluster

Annotating with VEP