Facet: Feature-Based Accuracy Estimator
Dan DeBlasio and John Kececioglu
E-mail questions/comments: deblasio@cs.arizona.edu
About
We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment.
For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure non-local properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples.
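As a rough illustration of what "a polynomial function of these features" can look like, the sketch below evaluates a polynomial of degree at most two over a vector of feature values. It is not the Facet implementation: the class name `EstimatorSketch`, the method `estimate`, the three feature values, and the coefficients are all hypothetical, and the coefficient-learning step is not shown.

```java
/** Illustrative sketch only; names and numbers are assumptions, not Facet's code. */
final class EstimatorSketch {

    /**
     * Evaluates c0 + sum_i linear[i]*f[i] + sum_{i<=j} quadratic[i][j]*f[i]*f[j].
     * Pass quadratic == null for a purely linear estimator.
     */
    static double estimate(double constant, double[] linear,
                           double[][] quadratic, double[] features) {
        if (linear.length != features.length) {
            throw new IllegalArgumentException("one linear coefficient per feature is required");
        }
        double value = constant;
        // Linear terms: one coefficient per feature.
        for (int i = 0; i < features.length; i++) {
            value += linear[i] * features[i];
        }
        // Optional quadratic terms over the upper triangle (i <= j).
        if (quadratic != null) {
            for (int i = 0; i < features.length; i++) {
                for (int j = i; j < features.length; j++) {
                    value += quadratic[i][j] * features[i] * features[j];
                }
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Hypothetical feature values for one alignment (three features for brevity).
        double[] features = {0.8, 0.5, 0.9};
        // Hypothetical coefficients; in practice these would be learned from examples.
        double[] linear = {0.25, 0.5, 0.25};
        System.out.println("estimated accuracy: " + estimate(0.0, linear, null, features));
    }
}
```

In practice the coefficients would come from fitting against alignments whose true accuracy is known, as described in the cited publications.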
We call our estimator Facet (for feature-based accuracy estimator). Our estimator can be used in a number of ways, including for Parameter Advising using a single aligner, to create an Ensemble Aligner that combines the output of several aligners, or for Adaptive Local Realignment. This page provides the accuracy estimator as a stand-alone utility. Please see the corresponding pages for the uses mentioned earlier.
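Parameter advising, as defined above, amounts to computing an alignment under each candidate parameter choice and keeping the choice whose alignment has the highest estimated accuracy. The sketch below shows only that selection loop. The class `AdvisorSketch`, the method `advise`, and the toy aligner and estimator in `main` are hypothetical stand-ins, not part of the Facet distribution.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

/** Illustrative sketch only; not the advisor shipped with Facet. */
final class AdvisorSketch {

    /**
     * Returns the parameter choice whose resulting alignment receives the
     * highest estimated accuracy. Ties keep the earliest choice.
     *
     * @param choices   candidate parameter settings (assumed non-null elements)
     * @param aligner   computes an alignment for a given parameter setting
     * @param estimator scores an alignment without a reference alignment
     */
    static <P, A> P advise(List<P> choices, Function<P, A> aligner,
                           ToDoubleFunction<A> estimator) {
        if (choices.isEmpty()) {
            throw new IllegalArgumentException("no parameter choices given");
        }
        P best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (P choice : choices) {
            A alignment = aligner.apply(choice);
            double score = estimator.applyAsDouble(alignment);
            if (best == null || score > bestScore) {
                best = choice;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy stand-ins: the "alignment" is just the parameter value itself,
        // and the "estimator" prefers values close to 5.
        List<Integer> choices = Arrays.asList(1, 5, 9);
        Integer advised = AdvisorSketch.<Integer, Integer>advise(
                choices, p -> p, a -> -Math.abs(a - 5));
        System.out.println("advised parameter choice: " + advised);
    }
}
```

With a real aligner and estimator plugged in, the same loop would run the aligner once per candidate setting, so the cost grows linearly with the number of choices.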
Publications
Please cite:
Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment
John Kececioglu and Dan DeBlasio
Journal of Computational Biology 20(4), 259-279, 2013.
doi:10.1089/cmb.2013.0007 (pdf) (talk)
Previous Publication
Estimating the Accuracy of Multiple Alignments and its Use in Parameter Advising
Dan DeBlasio, Travis J. Wheeler, John Kececioglu
Proceedings of the 16th Conference on Research in Computational Molecular Biology (RECOMB),
Springer-Verlag Lecture Notes in Bioinformatics 7262, 45-59, 2012. (pdf) (talk)
Download
The Facet distribution includes the accuracy estimator (written in Java) as well as a driver script, a wrapper for the PSIPRED secondary structure predictor.
FACET v1.4 (tgz) (6 Aug 2015)
Previous Versions
The development version can be found on GitHub (http://git.io/Facet).
Note that this application requires a working copy of PSIPRED as well as BLAST. PSIPRED v3.2 can be downloaded here: (link).
Posters
This poster was presented at ISMB/ECCB 2011 in Vienna, Austria and describes the estimator features.
This poster was presented at ISMB/ECCB 2013 in Berlin, Germany and describes parameter advising.
Supplemental Data
Benchmarks
The protein benchmarks used for testing Facet are a combination of the BENCH (link) benchmark set of Edgar and the PALI (link) benchmark set. Attached is a tar file with both the input sequence sets and the reference alignments, annotated with the core columns in capital letters: (tgz). The PSIPRED structure predictions can also be downloaded here: (tgz).
Acknowledgements
Research supported by the NSF IGERT Grant in Comparative Genomics DGE-0654435 and NSF Grant IIS-1217886.