Utilizing aligner advising to create a new ensemble aligner
Dan DeBlasio and John Kececioglu
E-mail questions/comments: firstname.lastname@example.org
Multiple sequence alignment is an essential step in many biological analyses, but all of the standard formulations of the problem are NP-complete. Because of the importance and difficulty of the multiple sequence alignment problem, there are a large number of heuristic aligners that compute high-quality alignments. It is not always easy to know which aligner is best for a given input. In addition, each aligner has a large number of tunable parameters that can greatly affect the quality of the output alignment. Using the Facet accuracy estimator, we developed the first ensemble aligner to achieve significant increases in accuracy. The ensemble aligner uses aligner advising to choose not only the aligner but also the parameter choice that will produce a high-quality alignment for a given set of input sequences.
This page provides the software needed for the ensemble aligner that uses the Facet accuracy estimator, as well as the precomputed aligner advising set and the benchmarks we used for testing.
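The advising idea described above can be illustrated with a minimal Python sketch. It is not the Facet driver itself: `advise`, `run_aligner`, and `estimate_accuracy` are hypothetical names, and the caller is assumed to supply a function that runs a given (aligner, parameter setting) pair and a function that returns an estimated accuracy for an alignment (the role Facet plays in the ensemble aligner).

```python
from typing import Callable, Iterable, List, Tuple

Alignment = List[str]        # one gapped row per input sequence
Candidate = Tuple[str, str]  # (aligner name, parameter setting); hypothetical labels


def advise(
    sequences: List[str],
    candidates: Iterable[Candidate],
    run_aligner: Callable[[Candidate, List[str]], Alignment],
    estimate_accuracy: Callable[[Alignment], float],
) -> Tuple[Candidate, Alignment, float]:
    """Align with every candidate and keep the alignment with the
    highest estimated accuracy.

    `run_aligner` and `estimate_accuracy` are supplied by the caller;
    they stand in for the real aligners and for the accuracy estimator.
    """
    best = None
    for cand in candidates:
        aln = run_aligner(cand, sequences)   # compute one candidate alignment
        score = estimate_accuracy(aln)       # estimated, not true, accuracy
        if best is None or score > best[2]:
            best = (cand, aln, score)
    if best is None:
        raise ValueError("advising set is empty")
    return best
```

Because the choice is made per input, two different sets of sequences can end up with different aligners or parameter settings from the same advising set.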
Ensemble Multiple Sequence Alignment via Advising
Dan DeBlasio and John Kececioglu (b)
In Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB '15).
ACM, New York, NY, USA, pp. 452-461.
doi:10.1145/2808719.2808766 (pdf) (talk)
This poster was presented at ISMB/ECCB 2015 in Dublin, Ireland and describes ensemble alignment.
The Facet distribution includes the accuracy estimator (written in Java), as well as a driver script, a wrapper for the PSIPRED secondary structure predictor, and scripts for using Facet for aligner advising.
FACET v1.4 (tgz) (6 Aug 2015)
The ensemble alignment drivers found in the Facet package need to be modified to specify where each individual aligner is installed in the user's environment.
The development version can be found on GitHub (http://git.io/Facet)
Note this application requires a working copy of PSIPRED as well as BLAST. PSIPRED v3.2 can be downloaded here: (link).
Greedy aligner advising sets
Ensemble alignment scripts are included in FACET v1.4 and above. The sets provided here differ from those in DeBlasio and Kececioglu 2015 (b): they are the Greedy sets found using all benchmarks, rather than sets found under 12-fold cross-validation. The cross-validation sets can be found in this (tgz).
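To give a feel for what a greedily built advising set is, here is a generic greedy selection sketch in Python. It is an illustration under assumptions, not necessarily the procedure used to build the distributed sets: the tables `est` (estimated accuracy) and `acc` (true accuracy), indexed by benchmark and then by candidate, and the function names are all hypothetical.

```python
from typing import Dict, List

Table = Dict[str, Dict[str, float]]  # benchmark -> candidate -> value


def advising_accuracy(chosen: List[str], est: Table, acc: Table) -> float:
    """Mean true accuracy when, on each benchmark, the candidate in
    `chosen` with the highest estimated accuracy is the one picked."""
    total = 0.0
    for bench in acc:
        pick = max(chosen, key=lambda c: est[bench][c])
        total += acc[bench][pick]
    return total / len(acc)


def greedy_set(candidates: List[str], est: Table, acc: Table, k: int) -> List[str]:
    """Grow a set one candidate at a time, always adding the candidate
    that gives the best advising accuracy together with those already chosen."""
    chosen: List[str] = []
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda c: advising_accuracy(chosen + [c], est, acc))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

In a cross-validation setting, the same selection would be run on the training folds only and evaluated on the held-out fold; running it on all benchmarks uses every benchmark for selection.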
The protein benchmarks used for testing Facet are a combination of the BENCH (link) benchmark set of Edgar and the PALI (link) benchmark set. Attached is a tar file with both the input sequence sets and the reference alignments, annotated with the core columns in capital letters (tgz). The PSIPRED structure predictions can also be downloaded here: (tgz)
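Since the core columns are marked by capitalization, they can be recovered from a reference alignment with a few lines of Python. The sketch below makes assumptions that are not stated on this page: the alignment has already been read into equal-length strings, a column counts as core when it contains at least one letter and none of its letters is lowercase, and gap characters are ignored. The function name `core_columns` is hypothetical.

```python
from typing import List


def core_columns(rows: List[str]) -> List[int]:
    """Return 0-based indices of columns whose letters are all uppercase.

    Assumption: non-letter characters (such as '-' or '.') are gaps and
    do not affect whether a column is treated as core.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all alignment rows must have the same length")
    core = []
    for j in range(width):
        letters = [r[j] for r in rows if r[j].isalpha()]
        if letters and all(ch.isupper() for ch in letters):
            core.append(j)
    return core
```

Restricting an accuracy measurement to these columns means only the positions marked as core in the reference contribute to the score.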
Acknowledgements
Research supported by the NSF IGERT Grant in Comparative Genomics DGE-0654435 and NSF Grant