ERVcaller: Identify endogenous retrovirus and other transposable element insertions
We developed a new software tool (ERVcaller) for identifying and
estimating the allele frequency of non-reference transposable element
insertions, particularly ERVs, from short read sequence data. It is
well known that standard reference-based alignment and assembly
approaches for short read sequence data generally fail to properly
assemble large sequence insertions that are not present in the
reference genome used. Reads derived from such insertions typically
either fail to align or align anomalously.
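As a rough illustration of what "fail to align or align anomalously" can mean in practice, the sketch below sorts alignment records into coarse categories using the flag bits and CIGAR string defined by the SAM specification. It is not ERVcaller's own code: the function name `classify_read`, the category labels, and the `min_clip` threshold are assumptions made for this example.

```python
import re

# Flag bits as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8

# Matches soft-clipped segments such as "40S" in a CIGAR string.
SOFT_CLIP = re.compile(r"(\d+)S")


def classify_read(flag, cigar, min_clip=20):
    """Return a coarse category for one alignment record.

    min_clip is a hypothetical threshold chosen for this sketch.
    """
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    clipped = [int(n) for n in SOFT_CLIP.findall(cigar)]
    if clipped and max(clipped) >= min_clip:
        return "soft-clipped"
    if (flag & FLAG_PAIRED) and (
        (flag & FLAG_MATE_UNMAPPED) or not (flag & FLAG_PROPER_PAIR)
    ):
        return "discordant"
    return "concordant"
```

Reads in the first three categories are the kind that a tool of this type would typically collect for remapping against transposable element sequences.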
ERVcaller takes such anomalous reads and remaps them to sets of reference
transposable element sequences of interest to identify the locations of
insertions not present in the original reference genome. It further uses these data
to determine whether each sequenced individual is homozygous for the
insertion allele, heterozygous, or homozygous for the pre-insertion
allele, thus allowing for estimates of population (or case-control
group) allele frequencies for each insertion.
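The step from per-individual genotypes to a population allele frequency is simple arithmetic, sketched below. The example assumes VCF-style diploid genotype strings ("0/0" for homozygous pre-insertion, "0/1" for heterozygous, "1/1" for homozygous insertion, "./." for missing); the function name `insertion_allele_frequency` is hypothetical and not part of ERVcaller.

```python
def insertion_allele_frequency(genotypes):
    """Estimate the insertion allele frequency from diploid genotype strings.

    Missing genotypes ("./.") are excluded from both numerator and
    denominator. Returns None when no sample has a called genotype.
    """
    insertion_alleles = 0
    called_alleles = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing call: contributes no information
        insertion_alleles += sum(1 for a in alleles if a == "1")
        called_alleles += len(alleles)
    if called_alleles == 0:
        return None
    return insertion_alleles / called_alleles


# Three called samples (6 alleles, 3 of them insertion alleles) and one missing.
print(insertion_allele_frequency(["0/0", "0/1", "1/1", "./."]))  # 0.5
```

Treating a missing genotype as "0/0" instead would bias the estimate downward, which is why the two cases are kept apart here.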
The overall idea used here is similar to that of a number of previous tools
for mobile element identification, such as RetroSeq, but ERVcaller is a
significant improvement over them: it incorporates the best aspects of
multiple older tools, uses a more diverse set of input data, and provides
more detailed output, most notably genotype data. Benchmark comparisons
with other tools also show notable improvements in sensitivity, precision,
and/or speed over each previous approach.
ERVcaller download
Download ERVcaller Version 1.4 and software manual and FAQ
Note:
We regularly update the software with new functions, bug fixes, and other
improvements. If you would like to use the latest version, please send your
email address to us (dawei.li at ttuhsc.edu) so that we can notify you when new versions become available.
Questions?
The software has been tested on multiple servers and by different users. If
you have any questions about installation, error messages, or
interpretation of results, feel free to contact the authors.
Recent major updates:
1) Further increased the accuracy
2) Added Phred-scale genotype quality and likelihoods
3) Significantly sped up the genotyping process
4) Added a function to distinguish missing genotypes from non-TE-insertion genotypes in the combined VCF file for population genomics studies
5) Corrected multiple bugs
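For readers unfamiliar with Phred-scale genotype values, the sketch below shows one common convention for deriving them from the likelihoods of the three genotypes (0/0, 0/1, 1/1): each likelihood L becomes -10 * log10(L), the values are shifted so the best genotype scores 0, and the genotype quality is taken as the second-smallest value, capped at 99. This is an assumption for illustration only; the function names are hypothetical and ERVcaller's exact calculation may differ.

```python
import math


def phred_scaled_likelihoods(likelihoods):
    """Convert genotype likelihoods to normalized Phred-scale values.

    The most likely genotype gets 0; larger values mean less likely.
    """
    # Guard against log10(0) for likelihoods that underflow to zero.
    raw = [-10.0 * math.log10(max(lk, 1e-300)) for lk in likelihoods]
    best = min(raw)
    return [int(round(r - best)) for r in raw]


def genotype_quality(pl, cap=99):
    """Genotype quality as the second-smallest Phred-scale value, capped."""
    return min(sorted(pl)[1], cap)


pl = phred_scaled_likelihoods([0.01, 1.0, 0.1])
print(pl, genotype_quality(pl))  # [20, 0, 10] 10
```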
Full update log:
Updates (v1.4):
02/15/2019: Re-designed the engineering process to significantly increase the genotyping speed
02/10/2019: Added scripts to distinguish missing genotypes from non-TE-insertion genotypes for all samples in the combined VCF file
02/06/2019: Corrected the output coordinates of TE insertions with TSD
02/02/2019: Further standardized the VCF format for use with bcftools
02/01/2019: Added Phred-scale genotype quality and likelihoods
01/29/2019: Adjusted the reciprocal-aligned reference genomic region length using the estimated insert size and SD, which significantly reduced false positives
01/29/2019: Added a function to estimate insert size and its standard deviation (SD)
01/38/2019: Corrected multiple bugs in the main Perl script
01/24/2019: Corrected a bug in the script combining VCF files from multiple samples

Updates (v1.3):
11/20/2018: Added scripts to merge various samples into a list of known TE loci or TE loci detected from the analyzed samples
11/12/2018: Updated the output to VCF v4.2 format
11/05/2018: Fixed support for BAM files generated by Bowtie2

Updates (v1.2):
11/01/2018: Further optimized the speed of the validation steps
10/21/2018: Added support for multiple BAM files as input
10/10/2018: Optimized the validation steps to increase specificity

Updates (v1.1):
09/02/2018: Optimized the validation steps to significantly increase the speed
08/28/2018: Updated the -S parameter to specify the length of split reads used (20 bp by default; >=40 bp is recommended for reads of 150 bp in length)
08/10/2018: Added a component to support BAM files that use different chromosome ID styles for the reference genome, such as "Chr1", "chr1", "1", and "NC_000001.11"
07/17/2018: Corrected bugs in input file checking
07/17/2018: Corrected errors in detecting and genotyping TE insertions using single-end sequencing data
07/16/2018: Re-formatted the output files
07/15/2018: Released ERVcaller Version 1.1 and software manual

Release (v1.0):
05/27/2018: Released ERVcaller Version 1.0 (a testing version) and software manual
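The chromosome ID styles mentioned in the v1.1 log ("Chr1", "chr1", "1", and "NC_000001.11") can be reconciled by mapping each to a single bare form. The sketch below is one possible way to do this and is not taken from ERVcaller: the function name `normalize_chrom` is hypothetical, and the accession handling assumes human RefSeq chromosome accessions (NC_000001 to NC_000022 for the autosomes, NC_000023 for X, NC_000024 for Y).

```python
import re

# Human RefSeq chromosome accessions, e.g. "NC_000001.11" (assumption: human only).
REFSEQ_HUMAN = re.compile(r"^NC_0000(\d{2})\.\d+$")


def normalize_chrom(name):
    """Map 'Chr1', 'chr1', '1', or 'NC_000001.11' to the bare form '1'."""
    m = REFSEQ_HUMAN.match(name)
    if m:
        n = int(m.group(1))
        if 1 <= n <= 22:
            return str(n)
        # Unknown accession numbers are returned unchanged.
        return {23: "X", 24: "Y"}.get(n, name)
    if name.lower().startswith("chr"):
        return name[3:]
    return name


for chrom in ["Chr1", "chr1", "1", "NC_000001.11"]:
    print(chrom, "->", normalize_chrom(chrom))
```

With names normalized this way, records from a BAM file and a reference genome that use different styles can be compared directly.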
Citation:
Chen X, Li D*. ERVcaller: Identifying polymorphic endogenous
retrovirus and other transposable element insertions using whole-genome
sequencing data. Bioinformatics. 2019 Oct 15;35(20):3913-3922. PMID: 30895294. (* corresponding author).
Please report any bugs to us at your earliest convenience! Thank you very much!