The University of Arizona

Facet

Multiple alignment accuracy estimation and parameter advising

Choosing aligner parameters for specific input protein sequences

Dan DeBlasio and John Kececioglu

E-mail questions/comments: deblasio@cs.arizona.edu


About

Parameter advising is the task of choosing settings for an aligner's tunable parameters that will generate a high-quality alignment for a given input set of sequences. Our parameter advisor first generates a set of candidate alignments for the input by using the aligner to compute one alignment for each parameter choice in the advisor set. We then use Facet to estimate the accuracy of each candidate and choose the alignment with the highest estimated accuracy. For this to work, the advisor set must be chosen carefully, so that for each input at least one parameter choice produces a high-accuracy alignment. Finding an optimal set of parameter choices is NP-complete, so we provide a greedy approximation algorithm that finds sets whose advising accuracy is close to optimal.
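The advising loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the callables align and estimate_accuracy are hypothetical stand-ins for running the aligner (for example Opal) and the estimator (Facet), and they are not the actual interfaces of either tool.

```python
def advise(sequences, advisor_set, align, estimate_accuracy):
    """Pick the candidate alignment with the highest estimated accuracy.

    `align(sequences, params)` and `estimate_accuracy(alignment)` are
    hypothetical callables supplied by the caller; they are assumptions
    made for this sketch, not the real Opal or Facet APIs.
    Returns a (params, alignment, estimated_accuracy) tuple.
    """
    if not advisor_set:
        raise ValueError("advisor set must contain at least one parameter choice")

    best = None
    for params in advisor_set:
        # One candidate alignment per parameter choice in the advisor set.
        alignment = align(sequences, params)
        score = estimate_accuracy(alignment)
        # Keep the first candidate seen with the strictly highest score.
        if best is None or score > best[2]:
            best = (params, alignment, score)
    return best
```

Because each candidate alignment is computed independently, the loop body is a natural unit to run in parallel.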

Facet, together with greedy advisor sets, yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice and that, for parameter advising, significantly outperforms the best prior approaches to assessing alignment quality.

This page provides resources for using Facet for parameter advising, including the Facet executable (written in Java), precomputed advisor sets, and the benchmarks used in testing our estimator. In addition, we provide a new version of the Opal aligner (v3.+) that can run parameter advising in parallel.

Publications

Please cite:

For accuracy estimation and oracle advisor sets:
Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment
John Kececioglu and Dan DeBlasio
Journal of Computational Biology 20(4), 259-279, 2013.
doi:10.1089/cmb.2013.0007 (pdf)

For greedy advisor sets:
Learning Parameter-Advising Sets for Multiple Sequence Alignment
Dan DeBlasio and John Kececioglu (a)
IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015
doi:10.1109/TCBB.2015.2430323 (pdf)

Previous Publications

Estimating the Accuracy of Multiple Alignments and its Use in Parameter Advising
Dan DeBlasio, Travis J. Wheeler, John Kececioglu
Proceedings of the 16th Conference on Research in Computational Molecular Biology (RECOMB), Springer-Verlag Lecture Notes in Bioinformatics 7262, 45-59, 2012. (pdf) (talk)

Learning Parameter Sets for Alignment Advising
Dan DeBlasio and John Kececioglu
In Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB '14).
ACM, New York, NY, USA, pp. 230-239.
doi:10.1145/2649387.2649448 (pdf) (talk)

Poster

This poster was presented at ISMB/ECCB 2013 in Berlin, Germany, and describes parameter advising.
(link)

Download

The Facet distribution includes the accuracy estimator (written in Java), a driver script, and a wrapper for the PSIPRED secondary structure predictor. Advisor sets can be downloaded below.

FACET v1.4 (tgz) (6 Aug 2015)

The Opal distribution includes the Opal and Facet software as a .jar file as well as a driver and the scripts needed to predict secondary structure. Advisor sets can be downloaded below.

Opal v3.1.b0 (tgz) (14 Jun 2016)


Previous Versions

The development version can be found on GitHub (http://git.io/Facet)

Note that this application requires a working copy of PSIPRED as well as BLAST. PSIPRED v3.2 can be downloaded here: (link).

Advisor Sets

Greedy parameter advisor sets

The greedy sets that can be used with Opal (v3.0 or higher) for parameter advising can be found in this folder. The methods are described in DeBlasio and Kececioglu 2015 (a).
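To give a feel for how a greedy advisor set can be built, here is a simplified Python sketch. It is not the published algorithm of DeBlasio and Kececioglu 2015, which may differ in details such as how ties and the starting set are handled. The data layout is an assumption made for illustration: est[b][p] and true[b][p] are hypothetical tables holding the estimated and true accuracy of the alignment of benchmark b under parameter choice p.

```python
def advising_accuracy(param_set, est, true):
    """Average true accuracy of the alignments an advisor would choose.

    For each benchmark the advisor picks the parameter choice in
    `param_set` with the highest *estimated* accuracy (first one wins
    on ties), and is credited with that choice's *true* accuracy.
    """
    total = 0.0
    for b in true:
        chosen = max(param_set, key=lambda p: est[b][p])
        total += true[b][chosen]
    return total / len(true)


def greedy_advisor_set(candidates, est, true, k):
    """Grow an advisor set one parameter choice at a time.

    At each step, add the remaining candidate that gives the highest
    advising accuracy when joined with the choices already selected.
    """
    chosen = []
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        best = max(remaining,
                   key=lambda p: advising_accuracy(chosen + [p], est, true))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

The objective depends on the estimator as well as on the true accuracies, so a set that is good for one estimator is not necessarily good for another.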

Oracle advisor sets

This file contains the parameter sets used for the analysis in the Facet paper (Kececioglu and DeBlasio 2013). The attached file also includes a README that explains the format. These parameter choices were used to generate the alternate alignments of the benchmarks used for evaluation. (tgz)

Supplemental Data

Benchmarks

The protein benchmarks used for testing Facet are a combination of the BENCH (link) benchmark set of Edgar and the PALI (link) benchmark set. Attached is a tar file with both the input sequence sets and the reference alignments, annotated with the core columns in capital letters. (tgz) The PSIPRED structure predictions can also be downloaded here: (tgz)
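Since core columns are marked by capital letters, they can be recovered from the rows of a reference alignment with a small helper such as the following Python sketch. It makes several assumptions that are not stated on this page: the rows have already been read into equal-length strings, gaps are written as '-' or '.', and a column counts as core when it contains at least one residue and none of its residues is lowercase.

```python
def core_columns(rows, gap_chars="-."):
    """Return the 0-based indices of the core columns of an alignment.

    `rows` is a list of equal-length aligned sequence strings. A column
    is treated as core when it has at least one non-gap character and
    every non-gap character in it is uppercase (an assumption about the
    annotation convention, made for this sketch).
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all alignment rows must have the same length")

    core = []
    for j in range(width):
        residues = [r[j] for r in rows if r[j] not in gap_chars]
        if residues and all(c.isupper() for c in residues):
            core.append(j)
    return core
```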

Alternate Alignments

This file contains the alternate alignments produced by Opal that were used for the analysis in the Facet paper (Kececioglu and DeBlasio 2013). The folder naming format is the same as described in the README above:
<repl>.<lambda>.<lambda_term>.<gamma>.<gamma_term>, where the fields are the replacement matrix (repl), the gap open and gap extension penalties (lambda & gamma), and the corresponding penalties for terminal gaps (lambda_term & gamma_term). (tgz)
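A folder name in this format can be split into its five fields with a short Python helper like the sketch below. The example name VTML200.45.11.42.40 used in the usage line is hypothetical, and treating the four penalty fields as integers is an assumption; check the README in the download for the actual values.

```python
from typing import NamedTuple


class OpalParams(NamedTuple):
    repl: str          # replacement matrix name
    lambda_: int       # <lambda> field ("lambda" is a Python keyword)
    lambda_term: int   # <lambda_term> field (terminal gaps)
    gamma: int         # <gamma> field
    gamma_term: int    # <gamma_term> field (terminal gaps)


def parse_folder_name(name):
    """Parse '<repl>.<lambda>.<lambda_term>.<gamma>.<gamma_term>'.

    Splits from the right so that a matrix name containing dots would
    still end up whole in `repl`. Assumes the four penalty fields are
    integers; int() raises ValueError otherwise.
    """
    parts = name.rsplit(".", 4)
    if len(parts) != 5 or not parts[0]:
        raise ValueError(f"expected 5 dot-separated fields, got: {name!r}")
    repl, *numbers = parts
    return OpalParams(repl, *(int(x) for x in numbers))
```

For example, parse_folder_name("VTML200.45.11.42.40").gamma evaluates to 42.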

Acknowledgements

Research supported by the NSF IGERT Grant in Comparative Genomics DGE-0654435 and NSF Grant IIS-1217886.

Terms of Use

Facet is free for noncommercial use, and comes with neither warranty nor guarantee. Facet cannot be redistributed in any form without the consent of the authors. If you wish to use Facet for commercial purposes, you must first obtain permission from all authors. All noteworthy uses of Facet should cite the related paper.