A patient with respiratory symptoms seeks to find out the pathogen that is causing them these symptoms.
Make sure all bacterial reads needed to create your reference dataset also known as a training dataset are available.
bash 1_before_starting.shbash 2_before_starting.shNote: training and sample datasets are required to have the same ksize. Please note that since we are sketching from a list of genomes. We can use the following sourmash sketch command:
sourmash sketch fromfile genome_list.csv -p dna,k=31,scaled=1000,abund -o training_database.k31.sig.zipSketch your sample fasta file
yacht sketch sample --infile ./lung_sample.fasta --kmer 31 --scaled 1000 --outfile lung_sample.k31.sig.zipyacht train --ref_file training_database.k31.sig.zip --ksize 31 --num_threads 64 --ani_thresh 0.95 --prefix 'training_database.k31' --outdir ./ --forceyacht run --json training_database.k31_config.json --sample_file lung_sample.k31.sig.zip --significance 0.99 --num_threads 64 --min_coverage_list 1 0.6 0.2 0.1 --out ./k31_result.xlsxUsing a ksize of 31, YACHT finds that M. pneumoniae is present in the lung sample.
If we use small ksizes like 15, we would expect to not find that the patient is infected by M. pneumoniae. Let's set up the experiment. Note that a ksize below 7 may not produce results and is not recommend.
sourmash sketch fromfile genome_list.csv -p dna,k=15,scaled=1000,abund -o training_database.k15.sig.zipSketch your sample fasta file
yacht sketch sample --infile ./lung_sample.fasta --kmer 15 --scaled 1000 --outfile lung_sample.k15.sig.zipyacht train --ref_file training_database.k15.sig.zip --ksize 15 --num_threads 64 --ani_thresh 0.95 --prefix 'training_database.k15' --outdir ./ --forceIdentify whether the patient has a infectin and what pathogen is causing the disease.
yacht run --json training_database.k15_config.json --sample_file lung_sample.k15.sig.zip --significance 0.99 --num_threads 64 --min_coverage_list 1 0.6 0.2 0.1 --out ./k15_result.xlsxUsing a ksize of 15, YACHT finds/does not fine that M. pneumoniae
yacht train --ref_file training_database.k15.sig.zip --ksize 15 --num_threads 64 --ani_thresh 0.85 --prefix 'training_database.k15_ani0.85' --outdir ./ --forceIdentify whether the patient has a infectin and what pathogen is causing the disease.
yacht run --json training_database.k15_ani0.85_config.json --sample_file lung_sample.k15.sig.zip --significance 0.99 --num_threads 64 --min_coverage_list 1 0.6 0.2 0.1 --out ./k15_ani0.85_result.xlsxUsing a ksize of 15, YACHT finds/does not fine that M. pneumoniae