Bozena Malysiak-Mrozek*, Kamil Zur and Dariusz Mrozek
Pages 175-189 (15)
Background: The Protein Data Bank is a worldwide repository that collects and provides macromolecular data on protein structures and other molecules for the life sciences community. Manipulating vast numbers of 3D protein structures and exploring their properties requires parsing thousands of flat files that describe these macromolecular structures every time calculations are performed.
Objective: Expecting more protein structures to appear in open-access repositories such as the Protein Data Bank, and to meet the demands of the era of fast data analytics, we propose an in-memory management system for protein structures that predominantly uses the main memory of the host server to store, manage and manipulate data. This eliminates the overhead of loading data from hard drives and storing them in a buffer cache.
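The parse-once idea can be illustrated with a minimal Python sketch. It is not the IMPSMS implementation: the names `Atom`, `parse_atoms` and `InMemoryStructureStore` are hypothetical, and the sketch assumes only the fixed-column layout of `ATOM` records in the PDB flat-file format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Atom:
    """One ATOM record; a hypothetical in-memory representation."""
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Parse the fixed-column ATOM records of a PDB flat file."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6] != "ATOM  ":
            continue  # skip HETATM, TER, END and header records
        atoms.append(Atom(
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


class InMemoryStructureStore:
    """Keeps parsed structures in main memory, keyed by identifier."""

    def __init__(self) -> None:
        self._structures: dict[str, list[Atom]] = {}

    def insert(self, pdb_id: str, pdb_text: str) -> None:
        # The flat file is parsed once, at insertion time.
        self._structures[pdb_id.upper()] = parse_atoms(pdb_text)

    def select(self, pdb_id: str) -> list[Atom]:
        # Later operations reuse the parsed atoms without touching disk.
        return self._structures[pdb_id.upper()]

    def search(self, predicate: Callable[[list[Atom]], bool]) -> list[str]:
        # Return identifiers of all structures satisfying the predicate.
        return [pid for pid, atoms in self._structures.items()
                if predicate(atoms)]
```

In this sketch the parsing cost is paid once in `insert`, so repeated calls to `select` or `search` work directly on the stored objects.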
Method: In this paper, we present an in-memory protein structure management system (IMPSMS) that supports basic operations such as selecting, inserting, updating and searching protein structures, as well as more sophisticated functions such as batch calculation of the root mean square deviation (RMSD) between proteins stored in the database, batch calculation of torsion angles, structure comparison, structural alignment, and superposition of a given molecule onto molecules stored in the in-memory database.
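Two of the calculations named above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the IMPSMS code: the function names `rmsd`, `batch_rmsd` and `torsion_angle` are hypothetical, coordinates are plain `(x, y, z)` tuples, and the RMSD is computed over atoms that are already paired and superposed (no optimal superposition is performed here).

```python
import math

Point = tuple[float, float, float]


def rmsd(a: list[Point], b: list[Point]) -> float:
    """Root mean square deviation between two equally sized, paired
    coordinate lists. Assumes the structures are already superposed."""
    if not a or len(a) != len(b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
                for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def batch_rmsd(query: list[Point],
               stored: dict[str, list[Point]]) -> dict[str, float]:
    """RMSD of the query against every stored structure of the same size."""
    return {pid: rmsd(query, coords)
            for pid, coords in stored.items()
            if len(coords) == len(query)}


def _sub(u: Point, v: Point) -> Point:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _cross(u: Point, v: Point) -> Point:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u: Point, v: Point) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def torsion_angle(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Torsion (dihedral) angle in degrees defined by four consecutive atoms,
    e.g. C(i-1), N(i), CA(i), C(i) for the backbone phi angle."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

With structures already held in memory, `batch_rmsd` reduces to a loop over stored coordinate lists; a production system would typically superpose each pair first and use vectorised arithmetic.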
Results: In the experimental part, we show that, with dedicated in-memory data structures, particular operations on proteins can be performed as much as a hundred times faster than analogous operations that must first load and parse macromolecular data from standard PDB flat files.
Conclusion: Our work demonstrates that designing dedicated data structures and management systems for frequent protein data manipulation brings significant time savings and increases the capacity for fast data analytics in bioinformatics.
Keywords: Databases, in-memory, management system, proteins, 3D protein structure, structural bioinformatics.
Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice