1 Department of Biology, Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences, D.H.A, Lahore 54792, Pakistan
2 Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3RB, UK
HANDS stands for "HSP base Assignment using NGS data through Diploid Similarity" and is a tool to characterize Homeolog-Specific Polymorphisms (HSPs) in allopolyploid genomes. HSPs are positions in a polyploid genome at which the homeologous subgenomes carry different bases. HANDS2 compares alignments of next-generation sequencing (NGS) reads from the polyploid and its diploid-progenitors against a suitable reference sequence, and uses diploid similarity to assign base-identities to the subgenomes at HSP positions. HANDS2 characterizes homoeallelic base-identities with high accuracy even in the absence of RNA-seq data for one of the diploid-progenitors, and supports up to ten diploid-progenitors. For queries regarding the HANDS2 tool, please contact Aziz Mithani at firstname.lastname@example.org
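To make the diploid-similarity idea concrete, here is a deliberately simplified Java sketch. It is not HANDS2's actual algorithm, which works with base patterns and proportion thresholds (see the -sp, -pm and -pa options of the assign command); the class and method names are hypothetical. It assumes that, at a single HSP position, we already know the set of bases seen in the polyploid reads and, for each diploid-progenitor, the base it carries (or null when that progenitor has no data).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of assignment by diploid similarity (not HANDS2 code).
final class DiploidSimilaritySketch {
    /**
     * Toy assignment at a single HSP position.
     *
     * @param polyploidBases bases observed in the polyploid reads at this position
     * @param diploidBases   base seen in each diploid-progenitor, or null when that
     *                       progenitor has no data (index 0 -> subgenome 1, ...)
     * @return one base per subgenome; '?' where no assignment could be made
     */
    static char[] assign(Set<Character> polyploidBases, Character[] diploidBases) {
        char[] result = new char[diploidBases.length];
        Set<Character> unexplained = new HashSet<>(polyploidBases);
        List<Integer> unassigned = new ArrayList<>();
        for (int i = 0; i < diploidBases.length; i++) {
            Character d = diploidBases[i];
            if (d != null && polyploidBases.contains(d)) {
                result[i] = d;            // copy the matching progenitor's base
                unexplained.remove(d);
            } else {
                result[i] = '?';
                unassigned.add(i);
            }
        }
        // Exactly one subgenome and one polyploid base left over: infer that base.
        if (unassigned.size() == 1 && unexplained.size() == 1) {
            result[unassigned.get(0)] = unexplained.iterator().next();
        }
        return result;
    }
}
```

With polyploid bases {A, G} and progenitor bases {A, null}, the sketch assigns A to subgenome 1 and infers G for subgenome 2, which is the intuition behind assigning bases even when one progenitor's data are missing.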
Download
HANDS2 can be downloaded from here.
C++ code to filter SAM and VCF files can be downloaded from here.
Current Version: v1.1.1 [16-June-2017]
Includes bug fixes and memory improvements, and adds support for BAM files.
HANDS2 Inputs
HANDS2 takes the following files as input for the assignment of homoeallelic base-identities:
- SAM/BAM file of the polyploid. This file can be obtained using standard alignment software such as BWA, BOWTIE or BOWTIE2. HANDS2 requires the alignment file to be position-sorted (see the SAMtools sort command).
- A General Feature Format (GFF) file containing start and end coordinates for each gene/contig. This file is automatically generated by HANDS2 when a transcriptomic reference is constructed from a set of unigenes/contigs using ‘seq2ref’ command. HANDS2 only uses entries of the ‘gene’ type from the GFF file.
- VCF files containing lists of HSPs and single base substitutions (SBSs) present in the polyploid and the diploid-progenitors respectively. These files can be obtained using standard variant calling tools such as SAMtools, GATK or FreeBayes. HANDS2 uses VCF version 4.0 or greater.
- Base coverage files containing the number of reads supporting each base at every position, in tab-delimited format, for HSP/SBS validation (optional). These files can be generated from a SAM/BAM file using the ‘coverage’ command available in HANDS2.
- An optional list of positions in the reference (a tab-delimited file containing the sequence/chromosome names, positions and reference bases) to be checked for HSPs during the pre-processing step, in addition to the positions present in the HSP list.
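HANDS2 only uses the entries of type ‘gene’ from the GFF file. The Java sketch below applies the same filter, which can be handy when checking a GFF file before running assign. It assumes standard nine-column, tab-delimited GFF lines; the class and record names are hypothetical and this is not HANDS2's own parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keep only 'gene' features from GFF lines.
final class GffGeneReader {
    /** One 'gene' feature; GFF coordinates are 1-based and inclusive. */
    record Gene(String seqId, long start, long end, String attributes) {}

    static List<Gene> readGenes(List<String> lines) {
        List<Gene> genes = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank() || line.startsWith("#")) continue; // comments and ## directives
            String[] cols = line.split("\t");
            if (cols.length < 9) continue;         // not a nine-column feature line
            if (!cols[2].equals("gene")) continue; // keep only entries of type 'gene'
            genes.add(new Gene(cols[0], Long.parseLong(cols[3]), Long.parseLong(cols[4]), cols[8]));
        }
        return genes;
    }
}
```

Reading the file itself is left to the caller, for example with `java.nio.file.Files.readAllLines(path)`.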
java -jar hands2.jar <command> <input-parameters>
Commands and Options:
|help||Display this help|
|assign||Assign homoeallelic base identities|
|coverage||Calculate the number of reads supporting a particular base at each position|
|seq2ref||Create an in silico reference using a set of unigenes, contigs or other sequences|
assign- Assign homoeallelic base identities
This command assigns the homoeallelic base-identities to the allopolyploid subgenomes. It uses the SAM/BAM file of the polyploid genome, along with the VCF files for the polyploid and the diploid-progenitors, to assign base-identities to the polyploid subgenomes. HANDS2 supports up to ten diploid-progenitors and can assign base-identities with high accuracy even when data for one diploid-progenitor are missing. Base assignment can be preceded by an optional pre-processing step, which validates the lists of HSPs and SBSs provided as input. The pre-processing step is turned on by providing the base coverage files for the polyploid and/or the diploid-progenitors. An additional list of positions to be checked for HSPs can also be provided for the pre-processing step.
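This page does not spell out the rule the pre-processing step uses to validate an HSP against the base coverage file. Purely as an illustration of the idea, the Java sketch below assumes a hypothetical rule: a position counts as a supported HSP if at least two bases each account for at least a given proportion of the reads covering it. Both the rule and the threshold parameter are assumptions, not HANDS2's validation logic.

```java
// Hypothetical validation rule for illustration only (not HANDS2's pre-processing).
final class HspValidationSketch {
    /**
     * @param counts        read counts in A, T, G, C, N order (N is ignored here)
     * @param minProportion assumed minimum share of reads a base needs to count
     * @return true if at least two bases reach minProportion
     */
    static boolean isSupportedHsp(int[] counts, double minProportion) {
        int total = counts[0] + counts[1] + counts[2] + counts[3];
        if (total == 0) return false; // no informative coverage at this position
        int supported = 0;
        for (int i = 0; i < 4; i++) {
            if ((double) counts[i] / total >= minProportion) supported++;
        }
        return supported >= 2;
    }
}
```

Under this assumed rule, counts of 10 A and 8 T pass at a 0.1 threshold, while 30 A and 1 T do not, because T makes up only about 3% of the reads.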
java -jar hands2.jar assign <input parameters>
|-h or -help||Display this help|
|-i <File>||Polyploid SAM/BAM file|
|-g <File>||GFF file containing gene start/end coordinates|
|-hsp <File>||Polyploid HSP file in VCF format|
|-snp<n> <File>||Diploid # n SNP file in VCF format, e.g. -snp1|
|-bc <File>||Polyploid base coverage file (optional). See coverage command.|
|-bc<n> <File>||Diploid # n base coverage file, e.g. -bc1 (optional). See coverage command.|
|-out<n> <File>||Sub-genome # n output file, e.g. -out1|
|-vcf <Boolean>||Generate VCF output (Default: TRUE). When FALSE, tab-delimited output is generated.|
|-sp <float>||SNP pair proportion threshold (Default: 0.05)|
|-pm <float>||Base pattern matching threshold (Default: 0.5)|
|-pa <char>||Base pattern assignment mode (M: Keep maximum proportion for a base or A: Add all proportions; Default: M)|
|-r <Boolean>||Rectify Assignment using reference genome (Default: FALSE)|
|-m <Boolean>||Merge Base Patterns before assignment (Default: FALSE)|
|-d <int>||Use genome <int> as distant genome (Default: <null>). Cannot be used with missing genome.|
|-p <File>||List of additional positions to be checked for HSPs during pre-processing (optional)|
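Because the per-genome options are numbered (-snp1, -bc1, -out1, and so on), scripting runs with a varying number of diploid-progenitors means generating these flags in a loop. The Java sketch below builds the assign argument list from the options above. The class name, the method signature and the convention that a null SNP file marks a missing progenitor (passed as an empty string, as in the missing-genome example further down) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles an "assign" command line (not part of HANDS2).
final class AssignArgsSketch {
    /**
     * Arrays are indexed by genome: element 0 -> -snp1 / -bc1 / -out1, and so on.
     * A null SNP file marks a missing diploid-progenitor and is passed as "".
     * polyBc and bcFiles are optional (null); if all coverage files are omitted,
     * the pre-processing step is skipped.
     */
    static List<String> build(String sam, String gff, String hsp,
                              String[] snpFiles, String polyBc, String[] bcFiles,
                              String[] outFiles, boolean merge) {
        List<String> args = new ArrayList<>(List.of(
                "java", "-jar", "hands2.jar", "assign", "-i", sam, "-g", gff, "-hsp", hsp));
        for (int n = 1; n <= snpFiles.length; n++) {
            args.add("-snp" + n);
            args.add(snpFiles[n - 1] == null ? "" : snpFiles[n - 1]);
        }
        if (polyBc != null) {
            args.add("-bc");
            args.add(polyBc);
        }
        if (bcFiles != null) {
            for (int n = 1; n <= bcFiles.length; n++) {
                if (bcFiles[n - 1] == null) continue; // no coverage file for this diploid
                args.add("-bc" + n);
                args.add(bcFiles[n - 1]);
            }
        }
        for (int n = 1; n <= outFiles.length; n++) {
            args.add("-out" + n);
            args.add(outFiles[n - 1]);
        }
        if (merge) {
            args.add("-m");
            args.add("true");
        }
        return args;
    }
}
```

The resulting list can be handed to `new ProcessBuilder(args)`; since each argument is a separate list element, no shell quoting is involved.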
coverage- Calculate base coverage for each position from a SAM/BAM file
This command creates a base coverage file for the given SAM/BAM file, containing the number of reads supporting each base at every position. Bases with a Phred quality below the specified threshold (-q) are ignored when calculating the base coverage; the base quality check can be turned off by passing ‘-q 0’. The output is in tab-delimited format and contains the number of reads supporting bases A, T, G, C and N. This file can be used to validate the variant lists during the pre-processing step of homoeallelic base assignment by HANDS2.
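The counting described above can be sketched in a few lines of Java. This sketch assumes that the read bases and their quality characters at one position have already been collected (as in a pileup), and that qualities use the Phred+33 encoding of the SAM QUAL field. The class name and the row layout (sequence name, position, then the A, T, G, C and N counts) are assumptions; the actual HANDS2 file may contain different columns.

```java
// Hypothetical sketch of per-position base counting (not the HANDS2 implementation).
final class BaseCoverageSketch {
    // Column order follows the description above: A, T, G, C, N.
    private static final String ORDER = "ATGCN";

    /** Counts bases at one position; bases with Phred quality below minQ are ignored. */
    static int[] count(String bases, String quals, int minQ) {
        if (bases.length() != quals.length()) {
            throw new IllegalArgumentException("bases and qualities differ in length");
        }
        int[] counts = new int[ORDER.length()];
        for (int i = 0; i < bases.length(); i++) {
            int phred = quals.charAt(i) - 33; // Phred+33, as in the SAM QUAL field
            if (phred < minQ) continue;       // minQ = 0 keeps every base
            int idx = ORDER.indexOf(Character.toUpperCase(bases.charAt(i)));
            counts[idx < 0 ? 4 : idx]++;      // unrecognized symbols are counted as N
        }
        return counts;
    }

    /** Formats one tab-delimited row: name, position, then the five counts. */
    static String toRow(String seqName, long pos, int[] counts) {
        StringBuilder sb = new StringBuilder(seqName).append('\t').append(pos);
        for (int c : counts) sb.append('\t').append(c);
        return sb.toString();
    }
}
```

For bases "AAGT" with qualities "II#I" (Phred 40, 40, 2, 40), a threshold of 20 drops the G and yields counts 2, 1, 0, 0, 0, while a threshold of 0 keeps it.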
java -jar hands2.jar coverage <input parameters>
seq2ref- Create an in silico reference from given sequences/contigs
This command creates an in silico reference from a set of sequences, which can then be used as the reference for aligning sequencing reads with standard alignment tools like BWA, BOWTIE or BOWTIE2. It reads a multi-fasta file (containing multiple sequences in fasta format) and generates a fasta file by concatenating the sequences such that two consecutive sequences are separated by a stretch of Ns, whose length is set by -g (Default: 200). It also generates a GFF file for the created reference.
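The concatenation and the matching GFF coordinates can be sketched as follows. The Java code below assumes Ns are placed only between consecutive sequences and writes one ‘gene’ line per input sequence with 1-based, inclusive coordinates. The class name, the "seq2ref" source column and the "ID=" attribute are hypothetical choices; the GFF that HANDS2 writes may differ in these details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of seq2ref-style concatenation (not the HANDS2 implementation).
final class Seq2RefSketch {
    record Result(String sequence, List<String> gffLines) {}

    /**
     * @param refId   sequence ID used in the GFF (first word of the fasta header)
     * @param contigs (name, sequence) pairs in the order they should be concatenated
     * @param gap     number of Ns inserted between two consecutive sequences
     */
    static Result build(String refId, List<Map.Entry<String, String>> contigs, int gap) {
        StringBuilder ref = new StringBuilder();
        List<String> gff = new ArrayList<>();
        String spacer = "N".repeat(gap);
        for (Map.Entry<String, String> e : contigs) {
            if (ref.length() > 0) ref.append(spacer); // Ns only between sequences
            long start = ref.length() + 1L;            // 1-based start of this contig
            ref.append(e.getValue());
            long end = ref.length();                   // inclusive end
            gff.add(String.join("\t", refId, "seq2ref", "gene",
                    Long.toString(start), Long.toString(end), ".", "+", ".", "ID=" + e.getKey()));
        }
        return new Result(ref.toString(), gff);
    }
}
```

With contigs ACGT and GG and a gap of 3, the reference becomes ACGTNNNGG and the second gene spans positions 8 to 9.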
java -jar hands2.jar seq2ref <input parameters>
|-h or -help||Display this help|
|-i <File>||Input sequence file (multi-fasta format)|
|-o <File>||Output file|
|-n <str>||Header for the in silico reference|
|-g <int>||Gap size between two sequences (Default: 200)|
Assign homoeallelic base-identities
To run HANDS2 on the wheat example data, go to the directory where you extracted the files and run the following command:
java -jar hands2.jar assign -i example/wheat/polyploid.sam -g example/wheat/reference.fa.gff -hsp example/wheat/polyploid.hsp -snp1 example/wheat/diploid1.snp -snp2 example/wheat/diploid2.snp -snp3 example/wheat/diploid3.snp -bc example/wheat/polyploid.bc -bc1 example/wheat/diploid1.bc -bc2 example/wheat/diploid2.bc -bc3 example/wheat/diploid3.bc -out1 example/wheat/out1.vcf -out2 example/wheat/out2.vcf -out3 example/wheat/out3.vcf -m true
To run the Brassica napus example data, which has two diploid-progenitors (B. rapa and B. oleracea), use:
java -jar hands2.jar assign -i example/brassica/napus.sam -g example/brassica/rapa_ref.fa.gff -hsp example/brassica/napus.hsp -snp1 example/brassica/rapa.snp -snp2 example/brassica/oleracea.snp -bc example/brassica/napus.bc -bc1 example/brassica/rapa.bc -bc2 example/brassica/oleracea.bc -out1 example/brassica/napus-a.vcf -out2 example/brassica/napus-c.vcf -m true
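If data for one of the diploid-progenitors are not available (here B. oleracea), provide an empty string for its SNP file and omit its base coverage file:
java -jar hands2.jar assign -i example/brassica/napus.sam -g example/brassica/rapa_ref.fa.gff -hsp example/brassica/napus.hsp -snp1 example/brassica/rapa.snp -snp2 "" -bc example/brassica/napus.bc -bc1 example/brassica/rapa.bc -out1 example/brassica/napus-a.vcf -out2 example/brassica/napus-c.vcf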
To run the wheat example without the pre-processing step, omit the base coverage files:
java -jar hands2.jar assign -i example/wheat/polyploid.sam -g example/wheat/reference.fa.gff -hsp example/wheat/polyploid.hsp -snp1 example/wheat/diploid1.snp -snp2 example/wheat/diploid2.snp -snp3 example/wheat/diploid3.snp -out1 example/wheat/out1.vcf -out2 example/wheat/out2.vcf -out3 example/wheat/out3.vcf -m true
To check additional reference positions for HSPs during the pre-processing step, provide a tab-delimited list of positions using -p:
java -jar hands2.jar assign -i example/brassica/napus.sam -g example/brassica/rapa_ref.fa.gff -hsp example/brassica/napus.hsp -snp1 example/brassica/rapa.snp -snp2 example/brassica/oleracea.snp -bc example/brassica/napus.bc -bc1 example/brassica/rapa.bc -bc2 example/brassica/oleracea.bc -out1 example/brassica/napus-a.vcf -out2 example/brassica/napus-c.vcf -m true -p positions.txt
Calculate base coverage
To calculate the base coverage file for a given SAM/BAM file, use the coverage command available in HANDS2:
java -jar hands2.jar coverage -i example/brassica/napus.sam -o example/brassica/coverage.bc
To turn off the base quality check, set the quality threshold to 0:
java -jar hands2.jar coverage -i example/brassica/napus.sam -o example/brassica/coverage.bc -q 0
Create in silico reference
To generate an in silico reference from a set of contigs/cDNA sequences, use the seq2ref command available in HANDS2:
java -jar hands2.jar seq2ref -i example/brassica/cdna.mfa -o example/brassica/reference.fa -n "my_ref my in silico reference"