1 Department of Biology, Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences, D.H.A, Lahore 54792, Pakistan
2 Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3RB, UK
HANDS stands for "HSP base Assignment using NGS data through Diploid Similarity" and is a tool to characterize Homeolog-Specific Polymorphisms (HSPs) in allopolyploid genomes. HSPs are the positions in a polyploid genome where the homeologous subgenomes have different bases. HANDS2 aligns next-generation sequencing (NGS) reads from the polyploid and its diploid-progenitors to a suitable reference sequence, compares the alignments, and uses diploid similarity to assign base-identities to subgenomes at HSP positions. HANDS2 characterizes homoeallelic base-identities with high accuracy even in the absence of RNA-seq data for one of the diploid-progenitors and supports up to ten diploid-progenitors.
For queries regarding the HANDS2 tool, please contact Aziz Mithani at aziz.mithani@lums.edu.pk.
Download
HANDS2 can be downloaded from here.
C++ code to filter SAM and VCF files can be downloaded from here.
Current Version: v1.1.1 [16-June-2017]
Includes bug fixes and memory improvements, and adds support for BAM files.
HANDS2 Inputs
HANDS2 takes the following files as input for the assignment of homoeallelic base-identities.
- SAM/BAM file of the polyploid. The SAM/BAM file can be obtained using standard alignment software such as BWA, BOWTIE or BOWTIE2. HANDS2 requires the alignment file to be position sorted (see the SAMtools sort command).
- A General Feature Format (GFF) file containing start and end coordinates for each gene/contig. This file is automatically generated by HANDS2 when a transcriptomic reference is constructed from a set of unigenes/contigs using the ‘seq2ref’ command. HANDS2 only uses entries of the ‘gene’ type from the GFF file.
- VCF files containing lists of HSPs and single base substitutions (SBSs) present in the polyploid and the diploid-progenitors respectively. These files can be obtained using standard variant calling tools such as SAMtools, GATK or FreeBayes. HANDS2 uses VCF version 4.0 or greater.
- Base coverage files (optional) containing, in a tab-delimited format, the number of reads supporting each base at each position. They are used for HSP/SBS validation and can be generated from a SAM file using the ‘coverage’ command available in HANDS2.
- An optional list of positions in the reference (a tab-delimited file containing the sequence/chromosome names, positions and reference bases) to be checked for HSPs, in addition to the positions present in the HSP list, during the pre-processing step.
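The note that only ‘gene’ entries of the GFF file are used can be illustrated with a minimal Python sketch. It is not part of HANDS2; the function name `read_gene_intervals` and the example records are hypothetical, and the only assumption about the file is the standard nine-column, tab-delimited GFF layout.

```python
def read_gene_intervals(gff_lines):
    """Return (seqid, start, end) for every 'gene' feature; skip everything else."""
    genes = []
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Standard GFF columns: seqid, source, type, start, end, ...
        if len(fields) < 5 or fields[2] != "gene":
            continue
        genes.append((fields[0], int(fields[3]), int(fields[4])))
    return genes


# Hypothetical records, for illustration only.
example = [
    "##gff-version 3",
    "my_ref\tseq2ref\tgene\t1\t500\t.\t+\t.\tID=contig1",
    "my_ref\tseq2ref\texon\t1\t200\t.\t+\t.\tParent=contig1",
    "my_ref\tseq2ref\tgene\t701\t1100\t.\t+\t.\tID=contig2",
]
print(read_gene_intervals(example))
```

Running such a filter over a GFF file before calling HANDS2 is a quick way to confirm that it contains the ‘gene’ entries the tool expects.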
Usage:
java -jar hands2.jar <command> <input-parameters>
Commands and Options:
help | Display this help |
assign | Assign homoeallelic base identities |
coverage | Calculate the number of reads supporting a particular base at each position |
seq2ref | Create an in silico reference using a set of unigenes, contigs or other sequences |
assign
- Assign homoeallelic base identities
This command assigns the homoeallelic base-identities to the allopolyploid subgenomes. It uses the SAM/BAM file of the polyploid genome along with the VCF files for the polyploid and the diploid-progenitors to assign base-identities to the polyploid subgenomes. HANDS2 supports up to ten diploid-progenitors and can assign base-identities with high accuracy even in the absence of a diploid-progenitor. Base assignment can be preceded by an optional pre-processing step, which validates the lists of HSPs and SBSs provided as input. The pre-processing step is turned on by providing the base coverage files for the polyploid and/or diploid-progenitors. An additional list of positions to be checked for HSPs can also be provided in the pre-processing step.
java -jar hands2.jar assign <input parameters>
Input Parameters
-h or -help | Display this help |
-i <File> | Polyploid SAM/BAM file |
-g <File> | GFF file containing gene start/end coordinates |
-hsp <File> | Polyploid HSP file in VCF Format. |
-snp<n> <File> | Diploid # n SNP file in VCF Format. |
-bc <File> | Polyploid Base coverage file (optional). See coverage command. |
-bc<n> <File> | Diploid # n Base coverage file, e.g. bc1 (optional). See coverage command. |
-out<n> <File> | Sub-Genome # n output file, e.g. out1 |
-vcf <Boolean> | Generate VCF output (Default: TRUE). When FALSE, tab-delimited output is generated. |
-sp <float> | SNP pair proportion threshold (Default: 0.05) |
-pm <float> | Base pattern matching threshold (Default: 0.5) |
-pa <char> | Base pattern assignment mode (M: Keep maximum proportion for a base or A: Add all proportions; Default: M) |
-r <Boolean> | Rectify Assignment using reference genome (Default: FALSE) |
-m <Boolean> | Merge Base Patterns before assignment (Default: FALSE) |
-d <int> | Use genome <int> as distant genome (Default: <null>). Cannot be used with missing genome. |
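The numbered options (-snp<n>, -bc<n>, -out<n>) follow one pattern per diploid-progenitor or subgenome. The Python sketch below shows one way to assemble the argument list for a scripted run. The helper `build_assign_args` is hypothetical and not part of HANDS2; it uses only the options listed in the table above and assumes hands2.jar is in the working directory.

```python
def build_assign_args(sam, gff, hsp, snps, outs, polyploid_bc=None, diploid_bcs=None):
    """Assemble the 'assign' command as an argument list (hypothetical helper)."""
    args = ["java", "-jar", "hands2.jar", "assign",
            "-i", sam, "-g", gff, "-hsp", hsp]
    # One -snp<n> option per diploid-progenitor, numbered from 1.
    for n, snp in enumerate(snps, start=1):
        args += [f"-snp{n}", snp]
    # Base coverage files are optional; supplying them enables pre-processing.
    if polyploid_bc:
        args += ["-bc", polyploid_bc]
    for n, bc in enumerate(diploid_bcs or [], start=1):
        if bc:
            args += [f"-bc{n}", bc]
    # One -out<n> option per subgenome, numbered from 1.
    for n, out in enumerate(outs, start=1):
        args += [f"-out{n}", out]
    return args


args = build_assign_args(
    "polyploid.sam", "reference.fa.gff", "polyploid.hsp",
    snps=["diploid1.snp", "diploid2.snp"],
    outs=["out1.vcf", "out2.vcf"],
)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run(args, check=True)`; the file names here are placeholders.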
coverage
- Calculate base coverage for each position from a SAM/BAM file
This command creates a base coverage file for the given SAM/BAM file, containing the number of reads supporting each base at each position. Bases with a Phred quality below the specified threshold (-q) are ignored when calculating the base coverage. The base quality check can be turned off by providing ‘-q 0’ as a parameter. The output is in tab-delimited format and contains the number of reads supporting bases A, T, G, C and N. This file can be used to validate the variant list during the pre-processing step of homoeallelic base assignment by HANDS2.
java -jar hands2.jar coverage <input parameters>
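The counting described above can be sketched in a few lines of Python. This is an illustration of the idea, not the HANDS2 implementation: it assumes each read is already reduced to a 1-based start position, a sequence and a list of Phred scores, and that reads align without insertions, deletions or clipping. The function name `base_coverage` and the printed column layout are hypothetical.

```python
from collections import defaultdict

BASES = "ATGCN"


def base_coverage(reads, min_quality):
    """Count reads supporting each base at each position.

    reads: iterable of (start, sequence, qualities); start is 1-based.
    Bases with quality below min_quality are ignored, so min_quality=0
    disables the quality check.
    """
    counts = defaultdict(lambda: dict.fromkeys(BASES, 0))
    for start, seq, quals in reads:
        for offset, (base, qual) in enumerate(zip(seq, quals)):
            if qual < min_quality:
                continue
            base = base.upper()
            if base not in BASES:
                base = "N"  # treat ambiguity codes as N
            counts[start + offset][base] += 1
    return dict(counts)


# Two hypothetical overlapping reads.
reads = [
    (1, "ACGT", [30, 30, 10, 30]),
    (2, "CCTN", [30, 5, 30, 30]),
]
coverage = base_coverage(reads, min_quality=20)
for pos in sorted(coverage):
    row = coverage[pos]
    print(pos, *(row[b] for b in BASES), sep="\t")
```

With the threshold set to 20, position 3 receives no counts because both bases covering it fall below the threshold; with a threshold of 0 they are counted.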
seq2ref
- Create an in silico reference from given sequences/contigs
This command creates an in silico reference from a set of sequences which can be used as a reference sequence for the alignment of sequencing reads using standard alignment tools like BWA, BOWTIE or BOWTIE2. This command reads a multi-fasta file (containing multiple sequences in fasta format) and generates a fasta file by concatenating the sequences such that two consecutive sequences are separated by a stretch of Ns. It also generates a GFF file for the created reference.
java -jar hands2.jar seq2ref <input parameters>
Input Parameters
-h or -help | Display this help |
-i <File> | Input sequence file (multi-fasta format) |
-o <File> | Output file |
-n <str> | Header for the in silico reference |
-g <int> | Gap size between two sequences (Default: 200) |
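The concatenation performed by seq2ref can be pictured with the Python sketch below. It is a simplified illustration, not the HANDS2 code: the function name `build_reference` is hypothetical, and it assumes that gaps are placed only between consecutive sequences and that coordinates are 1-based and inclusive, as in GFF.

```python
def build_reference(records, gap_size=200):
    """Join sequences with runs of N and record each sequence's coordinates.

    records: list of (name, sequence).
    Returns (reference_sequence, [(name, start, end), ...]).
    """
    parts = []
    intervals = []
    position = 0  # number of reference bases placed so far
    for index, (name, seq) in enumerate(records):
        if index > 0:
            # Separate consecutive sequences by a stretch of Ns.
            parts.append("N" * gap_size)
            position += gap_size
        start = position + 1
        end = position + len(seq)
        intervals.append((name, start, end))
        parts.append(seq)
        position = end
    return "".join(parts), intervals


# Tiny hypothetical contigs with a gap of 3 so the result is easy to read.
reference, genes = build_reference([("contig1", "ACGT"), ("contig2", "GG")], gap_size=3)
print(reference)
print(genes)
```

Each recorded interval corresponds to one ‘gene’ entry in the GFF file that accompanies the reference.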
Examples:
Assign homoeallelic base-identities
To run the wheat example data, go to the directory where you have extracted the files and run the following command:
java -jar hands2.jar assign -i example/wheat/polyploid.sam -g example/wheat/reference.fa.gff -hsp example/wheat/polyploid.hsp -snp1 example/wheat/diploid1.snp -snp2 example/wheat/diploid2.snp -snp3 example/wheat/diploid3.snp -bc example/wheat/polyploid.bc -bc1 example/wheat/diploid1.bc -bc2 example/wheat/diploid2.bc -bc3 example/wheat/diploid3.bc -out1 example/wheat/out1.vcf -out2 example/wheat/out2.vcf -out3 example/wheat/out3.vcf -m true
java -jar hands2.jar assign -i example/brassica/napus.sam -g example/brassica/rapa_ref.fa.gff -hsp example/brassica/napus.hsp -snp1 example/brassica/rapa.snp -snp2 example/brassica/oleracea.snp -bc example/brassica/napus.bc -bc1 example/brassica/rapa.bc -bc2 example/brassica/oleracea.bc -out1 example/brassica/napus-a.vcf -out2 example/brassica/napus-c.vcf -m true
java -jar hands2.jar assign -i example/brassica/napus.sam -g example/brassica/rapa_ref.fa.gff -hsp example/brassica/napus.hsp -snp1 example/brassica/rapa.snp -snp2 "" -bc example/brassica/napus.bc -bc1 example/brassica/rapa.bc -out1 example/brassica/napus-a.vcf -out2 example/brassica/napus-c.vcf
java -jar hands2.jar assign -i example/wheat/polyploid.sam -g example/wheat/reference.fa.gff -hsp example/wheat/polyploid.hsp -snp1 example/wheat/diploid1.snp -snp2 example/wheat/diploid2.snp -snp3 example/wheat/diploid3.snp -out1 example/wheat/out1.vcf -out2 example/wheat/out2.vcf -out3 example/wheat/out3.vcf -m true
java -jar hands2.jar assign -i example/brassica/napus.sam -g example/brassica/rapa_ref.fa.gff -hsp example/brassica/napus.hsp -snp1 example/brassica/rapa.snp -snp2 example/brassica/oleracea.snp -bc example/brassica/napus.bc -bc1 example/brassica/rapa.bc -bc2 example/brassica/oleracea.bc -out1 example/brassica/napus-a.vcf -out2 example/brassica/napus-c.vcf -m true -p positions.txt
Calculate base coverage
To calculate the base coverage file for a given SAM/BAM file, use the coverage command available in HANDS2:
java -jar hands2.jar coverage -i example/brassica/napus.sam -o example/brassica/coverage.bc
java -jar hands2.jar coverage -i example/brassica/napus.sam -o example/brassica/coverage.bc -q 0
Create in silico reference
To generate an in silico reference from a set of contigs/cDNA sequences, use the seq2ref command available in HANDS2:
java -jar hands2.jar seq2ref -i example/brassica/cdna.mfa -o example/brassica/reference.fa -n "my_ref my in silico reference"