22 Mar 2016

Modelling Static Networks in Bioinformatics Applications Using Advanced Computing

Alhadi Bustamam, June 2011
The University of Queensland

Abstract
Molecular Biology and Bioinformatics or Computational Biology have helped enrich each other and together have helped us in understanding the fundamental question in biology, ”what is life?”. They have also provided insights at the molecular level and via systems biology to improve the quality of human health/life and the cure of diseases. Starting with DNA sequences analysis, Bioinformatics has become a major tool to support far wider research areas, thus allowing us to deal with the rapid increase of biological dataset sizes, including genomes, proteomes and metabolomes. Furthermore, advanced computing technology and tools underpins the progress of supporting molecular biology and systems biology research. Meanwhile, the empirical study of networks has enlightened our understanding of many application domains, including the topology of biological systems. Many successful network algorithms have been adopted and applied in bioinformatics applications including the Markov clustering algorithm (MCL). In this thesis, we introduce some different approaches for parallel implementations of fast Markov clustering algorithm, using two different advanced computing environments and sparse data structures, in order to improve theMCL performance to enable the analysis of massive, sparse biological networks. These two approaches include: (1) the MPI-MCL implementation in the supercomputer SGI Altix symmetric multiprocessing (SMP) clusters using the message passing interface (MPI) tool, Fortran 90 language, and compressed storage row (CSR) sparse data structure; and (2) the CUDA-MCL implementation in the graphic processing unit (GPU) streaming processors on desktop and laptop computers using the computed device architecture (CUDA) tool from NVIDIA, C language, and ELLPACK-R sparse data structure. The first environment for MPI-MCL represents a classical, huge, high maintenance cost and very expensive supercomputer machines with multi-core CPUs (SMP clusters). The second one for CUDA-MCL represents a new architecture, low maintenance cost, cheaper, scalable and is widely-available in the market with massively-parallel streaming processors (many-core GPUs). Recently, the original MCL algorithm developed for clustering graphs, has been successfully adopted for clustering biological networks. The MCL is now becoming an effective algorithm for the extraction of complexes from interaction networks and also a key algorithm within Bioinformatics for determining clusters in networks. For instance, clustering protein-protein interaction (PPI) networks is helping to find genes implicated in diseases such as cancer. However, with fast sequencing and other technologies generating vast amounts of data in the biological networks, performance and scalability issues are becoming a critical limiting factor in bioinformatics applications. The highest computational cost of the MCL algorithm is in the expansion and inflation operations on the stochastic (Markov)matrix associated with a graph. In this thesis, we introduce parallel implementation approaches, especially for reducing the computational costs of the expansion and inflation parts. The results show the significant improvements of the parallel Markov clustering implementation in both the SPM clusters or the GPUs. However, with the low maintenance cost, scalable architecture and widely-availability of the GPUs unit in the market, the CUDA-MCL is considered to be the best option for improving the performance in the future. In MPI-MCL, the central construct is message passing by packaging information into a message and sending it from one process or processor to another and computing the data concurrently. To achieve an efficient MPI implementation, we evaluate different parallel schemes such as point-to-point or collective communication approaches using the Single Program Multiple Data (SPMD) model. The SPMD model is the dominant method for structuring parallel programs. We evaluate the scalability and performance of our parallel MPI-MCL using CSR sparse data format on small, medium, and large PPI-network datasets. Our results demonstrate a performance of about 80% in terms of efficiency over a wide range of processors. In addition, this good performance is highly scalable with an increasing number of processors. Meanwhile, the GPU as a new advanced computing environment which uses massively-parallel thread is becoming a very powerful, efficient and a low-cost option for achieving substantial performance gains over the CPU approaches. The CUDA-MCL uses the Single Instruction Multiple Threads (SIMT) model adopted from SPMD model. The central construct is using CUDA tool to allow the GPU to perform parallel sparse matrix-matrix computations, and parallel sparse Markov matrix normalizations which are at the heart of the clustering algorithm. The key to optimizing our CUDA Markov Clustering (CUDA-MCL) is in using the ELLACK-R sparse data format to allow for effective and fine-grain massively-parallel processing. The CUDA also allows us to use the on-chip memory of the GPU efficiently to lower the latency time, thus circumventing a major issue in other parallel computing environments, such as the Message Passing Interface. In CUDA-MCL, comparing the GPU computation times against a modern quad-core CPU on the published (relatively sparse) standard BIOGRID protein interaction networks with 5156 and 23175 nodes, speed factors of four and nine times are obtained, respectively. On the Human Protein Reference Database, the speed of the clustering of 19599 proteins was improved by a factor of seven with the GPU algorithm. However, on artificially-generated densely-connected networks with 1600 to 4800 nodes, speedups by a factor of 40 to 120 times were readily obtained. As the results show, in all cases, the GPU implementation is significantly faster than the original MCL running on a CPU. In conclusion, our parallel Markov clustering implementations have a significant performance improvement and are scalable on both multiple CPU processors using MPI and (even better) in many-core GPU streaming processors. Such approaches, especially the GPU implementation, are allowing large-scale parallel-computations on off-the-shelf desktop machines that were previously only possible on super-computing architectures. They have the potential to significantly change the way bioinformaticians and biologists compute and interact with their data. Moreover, with the economically low setup and maintenance costs and the rapid improvement and continuous support from their vendor (such as NVIDIA CUDA), GPU computing provides considerable hope for further implementations of this parallel Markov clustering algorithm.

 

Keywords: MCL, MPI-MCL, CUDA-MCL, PPI networks, clustering, bioinformatics, computational biology, system biology, parallel computing, GPU computing. Australian and New Zealand Standard Research Classification (ANZSRC): 060102 Bioinformatics 50%, 080301 Bioinformatics Software 25%, 080501 Distributed and Grid Systems 25%.

Share this article on:

Related Article


Back to Top