Mohammad M. Sultan, Gert Kiss, Diwakar Shukla & Vijay S. Pande
Journal of Chemical Theory and Computation, 12, 10, 5217-5223, 2014.
Publication year: 2014

Abstract

Given the large number of crystal structures and NMR ensembles that have been solved to date, classical molecular dynamics (MD) simulations have become powerful tools in the atomistic study of the kinetics and thermodynamics of biomolecular systems on ever-increasing time scales. Because the conformational state space explored is high-dimensional, the interpretation of large-scale simulations faces difficulties not unlike those in the Big Data community. We address this challenge by introducing a Clustering Based Feature Selection (CB-FS) method that employs a posterior analysis approach. It combines Supervised Machine Learning (SML) and feature selection with Markov State Models to identify the dynamic characteristics of conformational states. We highlight the utility of the method in the evaluation of large-scale simulations and show that it can be used for the rapid and automated identification of relevant order parameters involved in functional transitions, illustrated with applications to two exemplary cell-signaling proteins central to human disease states.
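
To make the general idea of clustering followed by supervised feature ranking concrete, the following is a minimal sketch and not the authors' implementation. It rests on several assumptions: a synthetic three-feature "trajectory" stands in for MD data, plain k-means stands in for Markov State Model state assignment, and a Fisher-style score (between-cluster over within-cluster variance) stands in for the supervised learner and feature selection step. All function and variable names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(frames, k, n_iter=20):
    """Lloyd's k-means with deterministic farthest-point initialization.

    Assumption: this replaces MSM state assignment for illustration only.
    """
    centers = [list(frames[0])]
    while len(centers) < k:
        # Next center: the frame farthest from all centers chosen so far.
        far = max(frames, key=lambda f: min(sq_dist(f, c) for c in centers))
        centers.append(list(far))
    for _ in range(n_iter):
        # Assignment step: label each frame with its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(f, centers[j]))
                  for f in frames]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fisher_scores(frames, labels):
    """Per-feature ratio of between-cluster to within-cluster variance.

    Assumption: a simple filter score used here in place of a trained
    classifier's feature importances.
    """
    scores = []
    for i in range(len(frames[0])):
        values = [f[i] for f in frames]
        grand_mean = sum(values) / len(values)
        between = within = 0.0
        for c in sorted(set(labels)):
            vals = [f[i] for f, lab in zip(frames, labels) if lab == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - grand_mean) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Synthetic data (hypothetical): feature 0 switches between two states,
# features 1 and 2 are uninformative noise.
rng = random.Random(0)
frames, true_state = [], []
for t in range(200):
    s = t % 2
    true_state.append(s)
    frames.append([5.0 * s + rng.uniform(-0.5, 0.5),
                   rng.uniform(-1.0, 1.0),
                   rng.uniform(-1.0, 1.0)])

labels = kmeans(frames, k=2)            # unsupervised state labels
scores = fisher_scores(frames, labels)  # rank features against those labels
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print("top-ranked feature index:", ranking[0])
```

In this toy setting the feature that actually distinguishes the two states receives the highest score, which mirrors the goal of identifying relevant order parameters. In a real analysis the features would typically be structural descriptors such as distances or dihedral angles, and the labels would come from a properly built state model.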