The University of Arizona

Facet

Multiple alignment accuracy estimation and parameter advising

Utilizing aligner advising to create a new ensemble aligner

Dan DeBlasio and John Kececioglu

E-mail questions/comments: deblasio@cs.arizona.edu


About

Multiple sequence alignment is an essential step in many biological analyses, but all of the standard formulations of the problem are NP-complete. Because of the importance and difficulty of the multiple sequence alignment problem, there are a large number of heuristic aligners that compute high-quality alignments. It is not always easy to know which aligner is best for a given input. In addition, each aligner has a large number of tunable parameters that can greatly impact the quality of the output alignment. Using the Facet accuracy estimator, we developed the first ensemble aligner to achieve significant increases in accuracy. The ensemble aligner uses aligner advising to choose not only the aligner but also the parameter choice that will produce a high-quality alignment for a given set of input sequences.
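
The advising idea can be illustrated with a minimal sketch. It is not the Facet driver code. It assumes that advising means running each (aligner, parameter choice) pair from an advisor set, estimating the accuracy of each resulting alignment, and keeping the one with the highest estimate. The names `advise`, `run_aligner`, and `estimate_accuracy` are hypothetical placeholders for whatever the real aligners and the estimator provide.

```python
from typing import Callable, List, Sequence, Tuple

Alignment = List[str]            # one gapped string per input sequence
Candidate = Tuple[str, str]      # (aligner name, parameter choice); hypothetical encoding


def advise(
    sequences: Sequence[str],
    advisor_set: Sequence[Candidate],
    run_aligner: Callable[[str, str, Sequence[str]], Alignment],
    estimate_accuracy: Callable[[Alignment], float],
) -> Tuple[Candidate, Alignment, float]:
    """Return the candidate whose alignment has the highest estimated accuracy.

    run_aligner and estimate_accuracy are supplied by the caller; in a real
    setting they would wrap an installed aligner and an accuracy estimator.
    """
    best = None
    for aligner, params in advisor_set:
        alignment = run_aligner(aligner, params, sequences)
        score = estimate_accuracy(alignment)
        # Keep the first candidate seen on ties.
        if best is None or score > best[2]:
            best = ((aligner, params), alignment, score)
    if best is None:
        raise ValueError("advisor set is empty")
    return best
```

Passing the aligner and estimator in as functions keeps the selection logic independent of how any particular tool is installed or invoked.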

This page provides the needed software for the ensemble aligner that uses the Facet accuracy estimator as well as the precomputed aligner advising set and the benchmarks we used for testing.

Publications

Please cite:

Ensemble Multiple Sequence Alignment via Advising
Dan DeBlasio and John Kececioglu (b)
In Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB '15).
ACM, New York, NY, USA, pp. 452-461.
doi:10.1145/2808719.2808766 (pdf) (talk)

Poster

This poster was presented at ISMB/ECCB 2015 in Dublin, Ireland, and describes ensemble alignment.
(link)

Download

The Facet distribution includes the accuracy estimator (written in Java), as well as a driver script, a wrapper for the PSIPRED secondary structure predictor, and scripts for using Facet for aligner advising.

FACET v1.4 (tgz) (6 Aug 2015)

The ensemble alignment drivers found in the Facet package need to be modified to specify where each individual aligner is installed in the user's environment.

The development version can be found on GitHub (http://git.io/Facet).

Note: this application requires working copies of both PSIPRED and BLAST. PSIPRED v3.2 can be downloaded here: (link).

Advisor Sets

Greedy aligner advising sets

Ensemble alignment scripts are now included in FACET v1.4 and above. These sets differ from those in DeBlasio and Kececioglu 2015 (b): they are the Greedy sets found using all benchmarks, as opposed to 12-fold cross-validation. The cross-validation sets can be found in this (tgz).

Supplemental Data

Benchmarks

The protein benchmarks used for testing Facet are a combination of the BENCH (link) benchmark set of Edgar and the PALI (link) benchmark set. Attached is a tar file with both the input sequence sets and the reference alignments, annotated with the core columns in capital letters (tgz). The PSIPRED structure predictions can also be downloaded here: (tgz)
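
As an illustration of the capital-letter annotation, the sketch below recovers core column indices from a reference alignment. It makes several assumptions not stated on this page: the alignment has already been parsed into equal-length strings, gaps are written as `-`, and a column counts as core when it contains at least one residue and every residue in it is uppercase. The function name `core_columns` is hypothetical and is not part of the Facet distribution.

```python
from typing import List, Sequence


def core_columns(alignment: Sequence[str]) -> List[int]:
    """Return 0-based indices of columns whose residues are all uppercase.

    Assumes rows are equal-length strings with '-' for gaps; a column made up
    only of gaps is not treated as core.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("alignment rows must all have the same length")
    core = []
    for j in range(width):
        # Collect residues in this column, skipping gap characters.
        residues = [row[j] for row in alignment if row[j].isalpha()]
        if residues and all(ch.isupper() for ch in residues):
            core.append(j)
    return core


# Toy alignment: only columns 2 and 3 are fully uppercase.
example = ["acDE-f", "a-DEgf", "acDE-f"]
print(core_columns(example))
```

The actual file layout in the tar file may differ, so the parsing step is deliberately left out.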

Acknowledgements

Research supported by the NSF IGERT Grant in Comparative Genomics DGE-0654435 and NSF Grant IIS-1217886.

Terms of Use

Facet is free for noncommercial use, and comes with neither warranty nor guarantee. Facet cannot be redistributed in any form without the consent of the authors. If you wish to use Facet for commercial purposes, you must first obtain permission from all authors. All noteworthy uses of Facet should cite the related paper.