Facet: Feature-Based Accuracy Estimator
Dan DeBlasio and John Kececioglu
E-mail questions/comments: firstname.lastname@example.org
We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment.
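As a rough illustration of parameter advising, the sketch below picks, from a fixed set of parameter choices, the one whose computed alignment receives the highest estimated accuracy. Everything named here is an assumption made for illustration: the function `advise`, the stand-in `linear_estimator` with its weights, the parameter labels, and the feature values are hypothetical and are not taken from the Facet distribution.

```python
def advise(candidates, estimator):
    """Return the parameter choice with the highest estimated accuracy.

    candidates maps a parameter-choice label to the feature vector of the
    alignment computed under that choice (hypothetical data layout).
    estimator maps a feature vector to an estimated accuracy.
    """
    return max(candidates, key=lambda choice: estimator(candidates[choice]))


def linear_estimator(features):
    # Stand-in estimator with made-up weights; not Facet's learned coefficients.
    weights = [0.6, 0.4]
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical parameter choices and the feature values of their alignments.
candidates = {
    "gap_open=10": [0.70, 0.55],
    "gap_open=12": [0.82, 0.60],
    "gap_open=14": [0.75, 0.40],
}

best = advise(candidates, linear_estimator)
print(best)
```

The advisor never sees a reference alignment: it ranks the candidate alignments only by their estimated accuracy, so the quality of its choice depends on how well the estimator tracks true accuracy.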
For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure non-local properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples.
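To make "a polynomial function of these features" concrete, here is a minimal sketch of how such an estimator could be evaluated once its coefficients are known. The function name `polynomial_estimate`, the representation of monomials as tuples of feature indices, and all numeric values are assumptions chosen for illustration; the real estimator uses twelve features and learned coefficients, neither of which is reproduced here.

```python
def polynomial_estimate(features, coefficients):
    """Evaluate a polynomial estimator on one alignment's feature values.

    coefficients maps a monomial, written as a tuple of feature indices,
    to its weight: () is the constant term, (0,) is the linear term for
    feature 0, and (0, 1) is the product of features 0 and 1.
    """
    total = 0.0
    for monomial, weight in coefficients.items():
        term = weight
        for index in monomial:
            term *= features[index]
        total += term
    return total


# Two hypothetical feature values and made-up coefficients (degree 2).
features = [0.8, 0.5]
coefficients = {
    (): 0.1,       # constant term
    (0,): 0.5,     # linear term in feature 0
    (1,): 0.2,     # linear term in feature 1
    (0, 1): 0.1,   # quadratic cross term
}

estimate = polynomial_estimate(features, coefficients)
print(round(estimate, 4))
```

A linear estimator is the special case in which every monomial has at most one index. Learning the coefficients, which the text describes as an optimization against true accuracy, is not shown here.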
We call our estimator Facet (for feature-based accuracy estimator). Our estimator can be used in a number of ways, including for Parameter Advising using a single aligner, to create an Ensemble Aligner that combines the output of several aligners, or for Adaptive Local Realignment. This page provides the accuracy estimator as a standalone utility. Please see the corresponding pages for the uses mentioned above.
Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment
John Kececioglu and Dan DeBlasio
Journal of Computational Biology 20:(4), 259-279, 2013.
doi:10.1089/cmb.2013.0007 (pdf) (talk)
Estimating the Accuracy of Multiple Alignments and its Use in Parameter Advising
Dan DeBlasio, Travis J. Wheeler, John Kececioglu
Proceedings of the 16th Conference on Research in Computational Molecular Biology (RECOMB), Springer-Verlag Lecture Notes in Bioinformatics 7262, 45-59, 2012. (pdf) (talk)
The Facet distribution includes the accuracy estimator (written in Java) as well as a driver script, a wrapper for the PSIPRED secondary structure predictor.
FACET v1.4 (tgz) (6 Aug 2015)
The development version can be found on GitHub (http://git.io/Facet)
Note that this application requires working copies of both PSIPRED and BLAST. PSIPRED v3.2 can be downloaded here: (link).
This poster was presented at ISMB/ECCB 2011 in Vienna, Austria and describes the estimator features.
This poster was presented at ISMB/ECCB 2013 in Berlin, Germany and describes parameter advising.
The protein benchmarks used for testing Facet are a combination of the BENCH (link) benchmark set of Edgar and the PALI (link) benchmark set. Attached is a tar file with both the input sequence sets and the reference alignments, annotated with the core columns in capital letters: (tgz). The PSIPRED structure predictions can also be downloaded here: (tgz)
Acknowledgements
Research supported by the NSF IGERT Grant in Comparative Genomics DGE-0654435 and NSF Grant