Prediction of Hot Spots Based on Physicochemical Features and Relative Accessible Surface Area of Amino Acid Sequence
Hot spots are key to understanding the mechanism of protein-protein interactions and can serve as targets in drug design. Because experimental methods are costly and time-consuming, computational methods are widely used to predict hot spots from sequence or structure information. Here, we propose a new sequence-based model that combines physicochemical features with the relative accessible surface area of the amino acid sequence. The model consists of 83 classifiers based on the IBk algorithm, where the instances for each classifier are encoded by a corresponding property extracted from the 544 properties in the AAindex1 database. The top-performing classifiers with respect to F1 score are then combined into an ensemble by majority voting. The model outperforms other state-of-the-art computational methods, yielding an F1 score of 0.80 on the BID test set.
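The selection-and-voting scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: a plain nearest-neighbour routine stands in for IBk, the property tables are made-up toy numbers rather than AAindex1 entries, relative accessible surface area is omitted, and the names `TOY_PROPERTIES`, `build_ensemble` and `ensemble_predict`, as well as the defaults `top_n=3` and `k=1`, are assumptions made for illustration only.

```python
from collections import Counter

# Toy property tables: made-up numbers, NOT real AAindex1 entries.
TOY_PROPERTIES = {
    "prop_a": {"A": 0.6, "W": 0.9, "Y": 0.8, "G": 0.1, "S": 0.2, "R": 0.7},
    "prop_b": {"A": 0.3, "W": 1.0, "Y": 0.7, "G": 0.0, "S": 0.1, "R": 0.5},
    "prop_c": {"A": 0.5, "W": 0.4, "Y": 0.6, "G": 0.5, "S": 0.4, "R": 0.9},
}


def encode(window, prop):
    """Encode a residue window as the list of one property's values."""
    return [prop[aa] for aa in window]


def knn_predict(train_x, train_y, x, k=1):
    """Nearest-neighbour vote (a simple stand-in for an IBk-style learner)."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_vote(predictions):
    """Return 1 if strictly more than half of the 0/1 votes are 1."""
    return 1 if 2 * sum(predictions) > len(predictions) else 0


def build_ensemble(train, valid, properties, top_n=3, k=1):
    """Score one classifier per property on `valid`; keep the top_n by F1."""
    train_y = [label for _, label in train]
    valid_y = [label for _, label in valid]
    scored = []
    for name, prop in properties.items():
        train_x = [encode(w, prop) for w, _ in train]
        preds = [knn_predict(train_x, train_y, encode(w, prop), k)
                 for w, _ in valid]
        scored.append((f1_score(valid_y, preds), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


def ensemble_predict(window, train, selected, properties, k=1):
    """Majority vote over the selected per-property classifiers."""
    train_y = [label for _, label in train]
    votes = []
    for name in selected:
        prop = properties[name]
        train_x = [encode(w, prop) for w, _ in train]
        votes.append(knn_predict(train_x, train_y, encode(window, prop), k))
    return majority_vote(votes)
```

Using an odd number of selected classifiers avoids tied votes; in this sketch a tie is resolved as a non-hot-spot prediction.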
Keywords: Hot spots, Physicochemical features, Majority voting, IBk algorithm
This work was supported by the National Natural Science Foundation of China (Nos. 61300058, 61472282, 61271098 and 61374181).
- 12. Aha, D., Kibler, D., Albert, M.: Instance-based learning algorithms. Mach. Learn. 6(1), 37–66 (1991)
- 14. Fischer, T.B., Arunachalam, K.V., Bailey, D., Mangual, V., Bakhru, S., Russo, R., Huang, D., Paczkowski, M., Lalchandani, V., Ramachandra, C.: The binding interface database (BID): a compilation of amino acid hot spots in protein interfaces. Bioinformatics 19(11), 1453–1454 (2003)