Choosing aligner parameters for specific input protein sequences
Dan DeBlasio and John Kececioglu
E-mail questions/comments: firstname.lastname@example.org
Parameter advising is the task of choosing settings of an aligner's tunable parameters that will generate a high-quality alignment for a given input set of sequences. Our parameter advisor first generates a set of candidate alignments for the input by using the aligner to compute one alignment for each parameter choice in the advisor set. It then uses Facet to estimate the accuracy of each candidate and chooses the alignment with the highest estimated accuracy. For this to work, the advisor set must be chosen carefully so that, for each input, at least one parameter choice produces a high-accuracy alignment. Finding the optimal set of parameter choices is NP-complete, so we provide a greedy approximation algorithm that finds sets whose advising accuracy is close to optimal.
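The advising step described above can be sketched in a few lines. This is a minimal illustration, not the Facet or Opal implementation: the `align` and `estimate` callables are hypothetical stand-ins for running the aligner with one parameter choice and for scoring the resulting alignment with an accuracy estimator.

```python
from typing import Callable, Sequence, Tuple, TypeVar

P = TypeVar("P")  # a parameter choice (hypothetical representation)
A = TypeVar("A")  # an alignment (hypothetical representation)


def advise(
    sequences: Sequence[str],
    advisor_set: Sequence[P],
    align: Callable[[Sequence[str], P], A],
    estimate: Callable[[A], float],
) -> Tuple[P, A, float]:
    """Align once per parameter choice and keep the best-scoring alignment.

    `align` and `estimate` are assumptions of this sketch; they stand in
    for the aligner and the accuracy estimator respectively.
    """
    best = None
    for params in advisor_set:
        alignment = align(sequences, params)   # one candidate per choice
        score = estimate(alignment)            # estimated, not true, accuracy
        if best is None or score > best[2]:
            best = (params, alignment, score)
    if best is None:
        raise ValueError("advisor set must not be empty")
    return best
```

Because the choice is driven by estimated accuracy, the quality of the advisor depends both on the estimator and on which parameter choices are in the advisor set.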
Facet, combined with greedy advisor sets, yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice. For parameter advising, Facet significantly outperforms the best prior approaches to assessing alignment quality.
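To give a sense of how a greedy advisor set can be built, here is a simplified sketch under stated assumptions. It is not the published algorithm: it assumes that, for every training benchmark and every candidate parameter choice, both the estimated accuracy (`est`) and the true accuracy (`true`) of the resulting alignment have been precomputed, and it breaks ties in estimated accuracy by list order. All function and variable names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence

Table = Dict[str, Dict[Hashable, float]]  # benchmark -> parameter choice -> accuracy


def advising_accuracy(param_set: Sequence[Hashable], est: Table, true: Table) -> float:
    """Mean true accuracy of the alignments an advisor would pick.

    For each benchmark, the advisor picks the choice in `param_set` with the
    highest estimated accuracy (ties go to the earliest choice in the list).
    """
    total = 0.0
    for bench in est:
        picked = max(param_set, key=lambda p: est[bench][p])
        total += true[bench][picked]
    return total / len(est)


def greedy_advisor_set(universe: Sequence[Hashable], est: Table, true: Table, k: int) -> List[Hashable]:
    """Grow a set one choice at a time, always adding the choice that gives
    the highest advising accuracy for the enlarged set."""
    chosen: List[Hashable] = []
    remaining = list(universe)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda p: advising_accuracy(chosen + [p], est, true))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Note that adding a parameter choice can lower advising accuracy when the estimator prefers a worse alignment, which is why each step re-evaluates the whole enlarged set rather than scoring choices in isolation.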
This page provides resources for using Facet for parameter advising, including the Facet executable written in Java, precomputed advisor sets, and the benchmarks used in testing our estimator. In addition, we provide a new version of the Opal aligner (v3.+) that includes the ability to run parameter advising in parallel.
For accuracy estimation and oracle advisor sets:
Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment
John Kececioglu and Dan DeBlasio
Journal of Computational Biology 20(4), 259-279, 2013.
For greedy advisor sets:
Learning Parameter-Advising Sets for Multiple Sequence Alignment
Dan DeBlasio and John Kececioglu (a)
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2015.
Estimating the Accuracy of Multiple Alignments and its Use in Parameter Advising
Dan DeBlasio, Travis J. Wheeler, John Kececioglu
Proceedings of the 16th Conference on Research in Computational Molecular Biology (RECOMB), Springer-Verlag Lecture Notes in Bioinformatics 7262, 45-59, 2012. (pdf) (talk)
Learning Parameter Sets for Alignment Advising
Dan DeBlasio and John Kececioglu
In Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB '14).
ACM, New York, NY, USA, pp. 230-239.
doi:10.1145/2649387.2649448 (pdf) (talk)
This poster was presented at ISMB/ECCB 2013 in Berlin, Germany, and describes parameter advising.
The Facet distribution includes the accuracy estimator (written in Java) as well as a driver script and a wrapper for the PSIPRED secondary structure predictor. Advisor sets can be downloaded below.
FACET v1.4 (tgz) (6 Aug 2015)
The Opal distribution includes the Opal and Facet software as a .jar file as well as a driver and the scripts needed to predict secondary structure. Advisor sets can be downloaded below.
Opal v3.1.b0 (tgz) (14 Jun 2016)
- FACET v1.3 (tgz) (3 Jul 2015)
- FACET v1.1 (tgz) (15 Jan 2014)
- FACET v1.0 (tgz)
- Opal v3.0.b0 (tgz) (3 Aug 2015)
The development version can be found on GitHub (http://git.io/Facet)
Note that this application requires a working copy of PSIPRED as well as BLAST. PSIPRED v3.2 can be downloaded here: (link).
Greedy parameter advisor sets
Oracle advisor sets
This file contains the parameter sets used for the analysis of Facet in Kececioglu and DeBlasio (2013). A README in the attached file explains the format. These parameter sets were used to generate the alternate alignments of the benchmarks used for evaluation. (tgz)
The protein benchmarks used for testing Facet are a combination of the BENCH (link) benchmark set of Edgar and the PALI (link) benchmark set. Attached is a tar file with both the input sequence sets and the reference alignments, annotated with the core columns in capital letters. (tgz) The PSIPRED structure predictions can also be downloaded here: (tgz)
This file contains the alternate alignments produced by Opal that were used for the analysis of Facet in Kececioglu and DeBlasio (2013). The folder format is the same as described in the README above:
<repl>.<lambda>.<lambda_term>.<gamma>.<gamma_term>, where repl is the replacement matrix, lambda and gamma are the gap open and gap extension penalties, and lambda_term and gamma_term are the corresponding penalties for terminal gaps.
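When scripting over the downloaded alignments, a folder name in this format can be split back into its five fields. The sketch below is illustrative only: it assumes the four penalties are written as integers, and the example name `VTML200.45.11.42.40` is hypothetical rather than taken from the archive. Splitting from the right keeps the parse correct even if a replacement matrix name were to contain a dot.

```python
from typing import NamedTuple


class ParameterChoice(NamedTuple):
    """Fields in the order they appear in the folder name."""
    repl: str        # replacement matrix name
    lam: int         # <lambda>
    lam_term: int    # <lambda_term>, terminal-gap counterpart of <lambda>
    gamma: int       # <gamma>
    gamma_term: int  # <gamma_term>, terminal-gap counterpart of <gamma>


def parse_folder_name(name: str) -> ParameterChoice:
    """Split '<repl>.<lambda>.<lambda_term>.<gamma>.<gamma_term>'.

    Assumes integer penalties; raises ValueError on any other shape.
    """
    parts = name.rsplit(".", 4)  # split from the right: 4 numeric fields
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated fields, got: {name!r}")
    repl, *numbers = parts
    lam, lam_term, gamma, gamma_term = (int(x) for x in numbers)
    return ParameterChoice(repl, lam, lam_term, gamma, gamma_term)
```

For example, `parse_folder_name("VTML200.45.11.42.40").repl` gives `"VTML200"`, and the remaining fields are available by name.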
Acknowledgements
Research supported by the NSF IGERT Grant in Comparative Genomics DGE-0654435 and NSF Grant