The University of Arizona

Facet

Multiple alignment accuracy estimation and parameter advising

Facet: Feature-Based Accuracy Estimator

Dan DeBlasio and John Kececioglu

E-mail questions/comments: deblasio@cs.arizona.edu


About

We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment.
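Parameter advising can be pictured as a simple loop. Compute an alignment under each candidate parameter choice, estimate each alignment's accuracy, and keep the choice whose alignment has the highest estimate. The Java sketch below shows that loop, with the aligner and the estimator passed in as functions. The class ParameterAdvisor, its method advise, and the record Choice are hypothetical names for illustration only. They are not part of the Facet distribution.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of parameter advising; not Facet's actual API. */
final class ParameterAdvisor {

    /** The chosen parameter setting, its alignment, and that alignment's estimated accuracy. */
    record Choice<P, A>(P parameters, A alignment, double estimatedAccuracy) {}

    /**
     * Aligns under every candidate parameter setting, scores each alignment
     * with the accuracy estimator, and returns the setting whose alignment
     * has the highest estimated accuracy (ties keep the earlier setting).
     */
    static <P, A> Choice<P, A> advise(List<P> candidates,
                                      Function<P, A> aligner,
                                      ToDoubleFunction<A> estimator) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("need at least one parameter setting");
        }
        Choice<P, A> best = null;
        for (P params : candidates) {
            A alignment = aligner.apply(params);                  // run the aligner with this setting
            double estimate = estimator.applyAsDouble(alignment); // no reference alignment needed
            if (best == null || estimate > best.estimatedAccuracy()) {
                best = new Choice<>(params, alignment, estimate);
            }
        }
        return best;
    }
}
```

In practice, the aligner argument would wrap one run of an external aligner under a given parameter setting, and the estimator argument would score the resulting alignment. Both are left abstract here.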

For protein alignments, we consider twelve independent features that contribute to alignment quality. The accuracy estimator we learn is a polynomial function of these features, and its coefficients are determined by using mathematical optimization to minimize its error with respect to true accuracy. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure non-local properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples.
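In the simplest (degree-1) case, a polynomial estimator reduces to a weighted sum of feature values:

E(A) = c_1 g_1(A) + ... + c_k g_k(A)

Here g_i(A) is the value of the i-th feature on alignment A, and c_i is its learned coefficient. The Java sketch below evaluates such a linear estimator. It assumes the feature values are computed elsewhere, and it does not show how the coefficients are learned. The class LinearEstimator is hypothetical and is not Facet's implementation.

```java
/** Hypothetical sketch: evaluating a learned linear (degree-1) accuracy estimator. */
final class LinearEstimator {
    private final double[] coefficients; // c_1 .. c_k, learned beforehand

    LinearEstimator(double[] coefficients) {
        this.coefficients = coefficients.clone(); // defensive copy
    }

    /**
     * Returns E(A) = sum over i of c_i * g_i(A), where featureValues[i]
     * holds g_i(A), the i-th feature evaluated on alignment A.
     */
    double estimate(double[] featureValues) {
        if (featureValues.length != coefficients.length) {
            throw new IllegalArgumentException(
                "expected " + coefficients.length + " features, got " + featureValues.length);
        }
        double sum = 0.0;
        for (int i = 0; i < coefficients.length; i++) {
            sum += coefficients[i] * featureValues[i];
        }
        return sum;
    }
}
```

A higher-degree estimator would also include terms for products of features. The rest of the evaluation would be unchanged.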

We call our estimator Facet (for feature-based accuracy estimator). Our estimator can be used in a number of ways, including Parameter Advising with a single aligner, building an Ensemble Aligner that combines the output of several aligners, and Adaptive Local Realignment. This page provides the accuracy estimator as a standalone utility; please see the corresponding pages for the uses mentioned above.

Publications

Please cite:

Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment
John Kececioglu and Dan DeBlasio
Journal of Computational Biology 20(4), 259-279, 2013.
doi:10.1089/cmb.2013.0007 (pdf) (talk)

Previous Publication

Estimating the Accuracy of Multiple Alignments and its Use in Parameter Advising
Dan DeBlasio, Travis J. Wheeler, John Kececioglu
Proceedings of the 16th Conference on Research in Computational Molecular Biology (RECOMB), Springer-Verlag Lecture Notes in Bioinformatics 7262, 45-59, 2012. (pdf) (talk)

Download

The Facet distribution includes the accuracy estimator (written in Java) as well as a driver script and a wrapper for the PSIPRED secondary structure predictor.

FACET v1.4 (tgz) (6 Aug 2015)


Previous Versions

The development version can be found on GitHub (http://git.io/Facet).

Note that this application requires a working installation of both PSIPRED and BLAST. PSIPRED v3.2 can be downloaded here: (link).

Posters

This poster was presented at ISMB/ECCB 2011 in Vienna, Austria, and describes the estimator features.
(link)
This poster was presented at ISMB/ECCB 2013 in Berlin, Germany, and describes parameter advising.
(link)

Supplemental Data

Benchmarks

The protein benchmarks used for testing Facet combine the BENCH (link) benchmark set of Edgar and the PALI (link) benchmark set. The attached tar file contains both the input sequence sets and the reference alignments, with core columns annotated in capital letters (tgz). The PSIPRED structure predictions can also be downloaded here: (tgz)
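Because core columns are marked by capitalization, a scan over the columns of a reference alignment can locate them. The Java sketch below makes three assumptions:

* The alignment has already been read into rows of equal length.
* Gaps are written as '-' or '.'.
* A column counts as core when every residue in it is uppercase.

The actual file format may differ. The class CoreColumns is a hypothetical helper and is not part of the Facet distribution.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds capitalized (core) columns in a reference alignment. */
final class CoreColumns {

    /** Gap symbols assumed for this sketch; adjust to the actual file format. */
    private static boolean isGap(char c) {
        return c == '-' || c == '.';
    }

    /**
     * Returns the 0-based indices of columns that contain at least one
     * residue and in which every non-gap residue is uppercase.
     */
    static List<Integer> find(List<String> rows) {
        List<Integer> core = new ArrayList<>();
        if (rows.isEmpty()) {
            return core;
        }
        int width = rows.get(0).length();
        for (String row : rows) {
            if (row.length() != width) {
                throw new IllegalArgumentException("aligned rows must have equal length");
            }
        }
        for (int col = 0; col < width; col++) {
            boolean sawResidue = false;
            boolean allUpper = true;
            for (String row : rows) {
                char c = row.charAt(col);
                if (isGap(c)) {
                    continue; // gaps do not affect core status
                }
                sawResidue = true;
                if (!Character.isUpperCase(c)) {
                    allUpper = false; // a lowercase residue marks a non-core column
                    break;
                }
            }
            if (sawResidue && allUpper) {
                core.add(col);
            }
        }
        return core;
    }
}
```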

Acknowledgements

Research supported by the NSF IGERT Grant in Comparative Genomics DGE-0654435 and NSF Grant IIS-1217886.

Terms of Use

Facet is free for noncommercial use and comes with neither warranty nor guarantee. Facet cannot be redistributed in any form without the consent of the authors. If you wish to use Facet for commercial purposes, you must first obtain permission from all authors. All noteworthy uses of Facet should cite the related paper.