Commit f33e563

initial import runRAILS_lowcontiguityseqs.sh
1 parent 410832c commit f33e563

6 files changed

Lines changed: 51 additions & 10 deletions

RAILS_v1.2/RAILS

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ use Getopt::Std;
 use Net::SMTP;
 use vars qw($opt_f $opt_s $opt_d $opt_i $opt_e $opt_l $opt_a $opt_v $opt_b $opt_t $opt_p $opt_q);
 getopts('f:s:d:e:l:a:v:b:t:p:i:q:');
-my ($base_name,$frag_dist,$seqid,$insert_stdev,$min_links,$max_link_ratio,$verbose)=("",1000,0.9,1.0,1,0.0,0);
+my ($base_name,$frag_dist,$seqid,$insert_stdev,$min_links,$max_link_ratio,$verbose)=("",250,0.9,1.0,1,0.0,0);
 
 my $version = "[v1.2]";
 my $dev = "rwarren\@bcgsc.ca";

RAILS_v1.2/cobbler.pl

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@
 use Net::SMTP;
 use vars qw($opt_f $opt_s $opt_d $opt_i $opt_v $opt_b $opt_t $opt_q);
 getopts('f:s:d:v:b:t:i:q:');
-my ($base_name,$frag_dist,$seqid,$verbose)=("",1000,0.9,0);
+my ($base_name,$frag_dist,$seqid,$verbose)=("",250,0.9,0);
 
 my $version = "[v0.3]";
 my $dev = "rwarren\@bcgsc.ca";

RAILS_v1.2/readme.md

Lines changed: 9 additions & 4 deletions
@@ -6,6 +6,7 @@
 -------------
 
 RAILS: Radial Assembly Improvement by Long Sequence Scaffolding
+
 Cobbler: Gap-filling with long sequences
 
 
@@ -89,7 +90,7 @@ Software. doi: 10.21105/joss.00116
 -------------
 
 <pre>
-./runRAILS.sh
+./runRAILS.sh or runRAILS_lowcontiguityseqs.sh
 Usage: runRAILS.sh <FASTA assembly .fa> <FASTA long sequences .fa> <anchoring sequence length eg. 250> <min sequence identity 0.95>
 
 this pipeline will:
@@ -107,7 +108,7 @@ Usage: ./cobbler.pl [v0.3]
 -f Assembled Sequences to further scaffold (Multi-FASTA format NO LINE BREAKS, required)
 -q Long Sequences queried (Multi-FASTA format NO LINE BREAKS, required)
 -s BAM file (use v0.2 for reading SAM files)
--d Anchoring bases on contig edges (ie. minimum required alignment size on contigs, default -d 1000, optional)
+-d Anchoring bases on contig edges (ie. minimum required alignment size on contigs, default -d 250, optional)
 -i Minimum sequence identity, default -i 0.9, optional
 -t LIST of names/header, long sequences to avoid using for merging/gap-filling scaffolds (optional)
 -b Base name for your output files (optional)
@@ -117,7 +118,7 @@ Usage: ./RAILS [v1.2]
 -f Assembled Sequences to further scaffold (Multi-FASTA format NO LINE BREAKS, required)
 -q Long Sequences queried (Multi-FASTA format NO LINE BREAKS, required)
 -s BAM file (use v1.1 for reading SAM files)
--d Anchoring bases on contig edges (ie. minimum required alignment size on contigs, default -d 1000, optional)
+-d Anchoring bases on contig edges (ie. minimum required alignment size on contigs, default -d 250, optional)
 -i Minimum sequence identity, default -i 0.9, optional
 -t LIST of names/header, long sequences to avoid using for merging/gap-filling scaffolds (optional)
 -b Base name for your output files (optional)
@@ -127,7 +128,7 @@ Usage: ./RAILS [v1.2]
 ### How it works
 -------------
 
-The pipeline is detailed in the provided script runRAILS.sh. PLEASE ensure the draft assembly is FASTA-formatted with one sequence per line (NO LINE BREAKS)
+The pipeline is detailed in the provided script runRAILS.sh and runRAILS_lowcontiguityseqs.sh. PLEASE ensure the draft assembly is FASTA-formatted with one sequence per line (NO LINE BREAKS)
 
 Cobbler's process:
 
@@ -136,6 +137,10 @@ In the runRAILS.sh, these scaftigs are renamed, tracking their scaffold of origi
 A bwa index is created and the long sequence file, also re-numbered, is aligned to the scaftigs.
 Cobbler is supplied with the alignment file (-s sam file) and the long reads files (-q option), specifying the minimum length of anchoring bases (-d) aligning at the edge of scaftigs and the minimum sequence identity of the alignment (-i). When 1 or more long sequences align unambiguously to the 3'end of a scaftig and the 5'end of its neighbour, the gap is patched with the sequence of that long sequence. If no long sequences are suitable, or the -d and -i conditions are not met, the original Ns are placed back between those scaftigs.
 
+runRAILS.sh uses scaftigs for patching gaps, whereas runRAILS_lowcontiguityseqs.sh uses scaffold sequences (not broken at Ns) instead.
+If you intend to use an assembly with low contiguity for patching gaps, one that has few and short gaps (stretches of Ns), you may use runRAILS_lowcontiguityseqs.sh. This is because, depending on the -d parameter set, insufficient anchoring bases may align to patch a gap.
+
+
 RAILS process:
 
 In RAILS, the process is similar to Cobbler's, except that the draft assembly is not broken up at Ns, since the goal is to merge distinct sequences into larger ones. Long sequences are aligned to the draft assembly sequences, orienting and ordering sequences and simultaneously filling the gaps between them, using DNA bases from the long sequences.
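
The readme change above contrasts scaftigs (scaffolds broken at runs of Ns) with unbroken scaffold sequences. As a rough illustration of what that split means, here is a minimal sketch in bash and awk. The function name `split_scaftigs` is hypothetical and not part of RAILS. It follows the `>wga<scaffold>.<scaftig>,<length>` naming used by the Perl one-liner in the pipeline script, but it is not a byte-for-byte reimplementation: it assumes one sequence per line and skips empty pieces.

```shell
#!/bin/bash
# Hypothetical helper, not part of RAILS: split single-line FASTA records
# into scaftigs at runs of N/n and name them >wga<scaffold>.<scaftig>,<length>.
split_scaftigs() {
  awk '
    /^>/ { scaf++; next }               # every header starts a new scaffold
    {
      n = split($0, parts, /[Nn]+/)     # break the sequence at each gap
      k = 0
      for (i = 1; i <= n; i++) {
        if (parts[i] == "") continue    # ignore empty pieces (leading/trailing Ns)
        k++
        printf ">wga%d.%d,%d\n%s\n", scaf, k, length(parts[i]), parts[i]
      }
    }
  '
}

# Example: one scaffold with a single 4-N gap yields two scaftigs.
printf '>scaffold1\nACGTNNNNGGCC\n' | split_scaftigs
```

A scaffold with few and short gaps produces only a handful of scaftigs, which is the situation the low-contiguity script is described as targeting.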
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+#!/bin/bash
+#RLW 2016
+if [ $# -ne 4 ]; then
+echo "Usage: $(basename $0) <FASTA assembly .fa> <FASTA long sequences .fa> <anchoring sequence length eg. 250> <min sequence identity 0.95>"
+exit 1
+fi
+###Change line below to point to path of bwa executables
+export PATH=/gsc/btl/linuxbrew/bin:$PATH
+echo Resolving ambiguous bases -Ns- in $1 assembly using long sequences $2
+echo reformatting file $1
+cat $1 | perl -ne 'if(/^\>/){$scafnum++;}else{my $len=length($_);my @scaftigs=split(/N+/i,$_);my $scaftignum=0;foreach my $scaftig(@scaftigs){ my $len=length($scaftig);$scaftignum++; print ">wga$scafnum";print "."; print "$scaftignum,$len\n$scaftig\n";}}' > $1-formatted.fa
+echo reformatting file $2
+###THIS IS FOR PATCHING GAPS USING SEQUENCES FROM AN ASSEMBLY WITH LOW CONTIGUITY (AND FEW, VERY SHORT STRETCHES OF Ns)
+cat $2 | perl -ne 'if(/^\>/){$ct++;}else{my $len=length($_);print ">seq$ct,$len\n$_";}' > $2-formatted.fa
+echo Building sequence database index out of your $1-formatted.fa assembly contigs..
+bwa index $1-formatted.fa
+echo Aligning long sequences $2-formatted.fa to your contigs..
+bwa mem -a -t24 /projects/btl/rwarren/beluga/RAILS/$1-formatted.fa $2-formatted.fa | /gsc/btl/linuxbrew/bin/samtools view -Sb - > $2_vs_$1_gapfilling.bam
+echo Scaffolding $1-formatted.fa using $2-formatted.fa and filling gaps with sequences in $2-formatted.fa
+./cobbler.pl -f $1 -s $2_vs_$1_gapfilling.bam -d $3 -i $4 -b $2_vs_$1_$3_gapsFill -q $2-formatted.fa
+echo Process terminated.
+echo RAILS scaffolding $1.gapsFill.fa sequences using long seqs $2 -- anchoring sequence threshold $3 bp
+echo reformatting file $1.gapsFill.fa
+cat $2_vs_$1_$3_gapsFill.fa | perl -ne 'if(/^\>/){$ct++;}else{my $len=length($_);print ">wga$ct,$len\n$_";}' > $2_vs_$1_$3_gapsFill-formatted.fa
+echo Building sequence database index out of your $2_vs_$1_$3_gapsFill-formatted.fa assembly contigs..
+bwa index $2_vs_$1_$3_gapsFill-formatted.fa
+echo Aligning long sequences $2-formatted.fa to your contigs..
+bwa mem -a -t24 $2_vs_$1_$3_gapsFill-formatted.fa $2-formatted.fa | /gsc/btl/linuxbrew/bin/samtools view -Sb - > $2_vs_$1_scaffolding.bam
+echo Scaffolding $2_vs_$1_$3_gapsFill-formatted.fa using $2-formatted.fa and filling new gaps with sequences in $2-formatted.fa
+./RAILS -f $2_vs_$1_$3_gapsFill-formatted.fa -s $2_vs_$1_scaffolding.bam -d $3 -i $4 -b $2_vs_$1_$3_rails -q $2-formatted.fa &
+echo RAILS process terminated.
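
The script's Perl one-liners treat every non-header line as a complete sequence, which is why the readme insists on FASTA with NO LINE BREAKS. A pre-flight check along the following lines could catch wrapped input before bwa is run. This is only a sketch: `is_single_line_fasta` is a hypothetical name, not something the commit provides, and it assumes that any record with more than one non-empty sequence line counts as wrapped.

```shell
#!/bin/bash
# Hypothetical pre-flight check, not part of RAILS: succeed only if every
# FASTA record has at most one non-empty sequence line.
# Reads the file given as $1, or standard input when no argument is given.
is_single_line_fasta() {
  awk '
    /^>/ { seqlines = 0; next }                      # new record: reset the counter
    NF   { if (++seqlines > 1) { bad = 1; exit } }   # second sequence line: wrapped
    END  { exit bad + 0 }                            # 0 = single-line, 1 = wrapped
  ' "${1:--}"
}

# Example: the second record is wrapped over two lines, so the check fails.
printf '>a\nACGT\n>b\nACGT\nGGCC\n' | is_single_line_fasta \
  || echo "input has line breaks inside sequences; unwrap it first"
```

Running such a check on both input files before the pipeline starts would fail fast, rather than letting wrapped records be silently numbered and measured as separate sequences.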

rails_v1-2.tar.gz

346 Bytes
Binary file not shown.

readme.md

Lines changed: 9 additions & 4 deletions
@@ -6,6 +6,7 @@
 -------------
 
 RAILS: Radial Assembly Improvement by Long Sequence Scaffolding
+
 Cobbler: Gap-filling with long sequences
 
 
@@ -89,7 +90,7 @@ Software. doi: 10.21105/joss.00116
 -------------
 
 <pre>
-./runRAILS.sh
+./runRAILS.sh or runRAILS_lowcontiguityseqs.sh
 Usage: runRAILS.sh <FASTA assembly .fa> <FASTA long sequences .fa> <anchoring sequence length eg. 250> <min sequence identity 0.95>
 
 this pipeline will:
@@ -107,7 +108,7 @@ Usage: ./cobbler.pl [v0.3]
 -f Assembled Sequences to further scaffold (Multi-FASTA format NO LINE BREAKS, required)
 -q Long Sequences queried (Multi-FASTA format NO LINE BREAKS, required)
 -s BAM file (use v0.2 for reading SAM files)
--d Anchoring bases on contig edges (ie. minimum required alignment size on contigs, default -d 1000, optional)
+-d Anchoring bases on contig edges (ie. minimum required alignment size on contigs, default -d 250, optional)
 -i Minimum sequence identity, default -i 0.9, optional
 -t LIST of names/header, long sequences to avoid using for merging/gap-filling scaffolds (optional)
 -b Base name for your output files (optional)
@@ -117,7 +118,7 @@ Usage: ./RAILS [v1.2]
 -f Assembled Sequences to further scaffold (Multi-FASTA format NO LINE BREAKS, required)
 -q Long Sequences queried (Multi-FASTA format NO LINE BREAKS, required)
 -s BAM file (use v1.1 for reading SAM files)
--d Anchoring bases on contig edges (ie. minimum required alignment size on contigs, default -d 1000, optional)
+-d Anchoring bases on contig edges (ie. minimum required alignment size on contigs, default -d 250, optional)
 -i Minimum sequence identity, default -i 0.9, optional
 -t LIST of names/header, long sequences to avoid using for merging/gap-filling scaffolds (optional)
 -b Base name for your output files (optional)
@@ -127,7 +128,7 @@ Usage: ./RAILS [v1.2]
 ### How it works
 -------------
 
-The pipeline is detailed in the provided script runRAILS.sh. PLEASE ensure the draft assembly is FASTA-formatted with one sequence per line (NO LINE BREAKS)
+The pipeline is detailed in the provided script runRAILS.sh and runRAILS_lowcontiguityseqs.sh. PLEASE ensure the draft assembly is FASTA-formatted with one sequence per line (NO LINE BREAKS)
 
 Cobbler's process:
 
@@ -136,6 +137,10 @@ In the runRAILS.sh, these scaftigs are renamed, tracking their scaffold of origi
 A bwa index is created and the long sequence file, also re-numbered, is aligned to the scaftigs.
 Cobbler is supplied with the alignment file (-s sam file) and the long reads files (-q option), specifying the minimum length of anchoring bases (-d) aligning at the edge of scaftigs and the minimum sequence identity of the alignment (-i). When 1 or more long sequences align unambiguously to the 3'end of a scaftig and the 5'end of its neighbour, the gap is patched with the sequence of that long sequence. If no long sequences are suitable, or the -d and -i conditions are not met, the original Ns are placed back between those scaftigs.
 
+runRAILS.sh uses scaftigs for patching gaps, whereas runRAILS_lowcontiguityseqs.sh uses scaffold sequences (not broken at Ns) instead.
+If you intend to use an assembly with low contiguity for patching gaps, one that has few and short gaps (stretches of Ns), you may use runRAILS_lowcontiguityseqs.sh. This is because, depending on the -d parameter set, insufficient anchoring bases may align to patch a gap.
+
+
 RAILS process:
 
 In RAILS, the process is similar to Cobbler's, except that the draft assembly is not broken up at Ns, since the goal is to merge distinct sequences into larger ones. Long sequences are aligned to the draft assembly sequences, orienting and ordering sequences and simultaneously filling the gaps between them, using DNA bases from the long sequences.
