Estimating the Accuracy of Multiple Alignments and its Use in Parameter Advising
We develop a novel and general approach to estimating the accuracy of protein multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new problem that we call parameter advising. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. We evaluate this approach by applying it to the task of parameter advising: the problem of choosing alignment scoring parameters from a collection of parameter values to maximize the accuracy of a computed alignment. Our estimator, which we call Facet (for “feature-based accuracy estimator”), yields a parameter advisor that on the hardest benchmarks provides more than a 20% improvement in accuracy over the best default parameter choice, and outperforms the best prior approaches to selecting good alignments for parameter advising.
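The parameter-advising step described above can be illustrated with a minimal sketch. It is not the Facet implementation: it assumes a degree-1 polynomial estimator, uses three made-up feature values per alignment rather than the twelve features mentioned in the abstract, and takes hypothetical coefficients as given instead of fitting them by optimization. The names `estimate_accuracy`, `advise`, and the `params_*` labels are illustrative only.

```python
from typing import Dict, Sequence, Tuple


def estimate_accuracy(features: Sequence[float],
                      coeffs: Sequence[float],
                      constant: float = 0.0) -> float:
    """Evaluate a degree-1 polynomial in the feature values.

    Assumption: a linear form is used here for brevity; the abstract
    only states that the estimator is a polynomial in the features.
    """
    if len(features) != len(coeffs):
        raise ValueError("features and coeffs must have the same length")
    return constant + sum(c * f for c, f in zip(coeffs, features))


def advise(candidates: Dict[str, Sequence[float]],
           coeffs: Sequence[float]) -> Tuple[str, float]:
    """Return the parameter choice whose alignment has the highest
    estimated accuracy, together with that estimate.

    `candidates` maps a parameter-choice label to the feature vector of
    the alignment computed under that choice.
    """
    best = max(candidates,
               key=lambda name: estimate_accuracy(candidates[name], coeffs))
    return best, estimate_accuracy(candidates[best], coeffs)


# Hypothetical coefficients and feature values, for illustration only.
coeffs = [0.5, 0.3, 0.2]
candidates = {
    "params_A": [0.60, 0.70, 0.50],
    "params_B": [0.80, 0.60, 0.40],
    "params_C": [0.50, 0.80, 0.70],
}

choice, score = advise(candidates, coeffs)
print(choice, round(score, 2))
```

In this sketch the advisor never sees a reference alignment: it ranks the candidate alignments purely by the estimator's value, which is the setting the abstract describes.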
Keywords: Integer Linear Program, Structural Alignment, Accuracy Estimator, Parameter Choice, Balance Weight