NVIDIA GPU Cluster for Genomic Research - WB

1. Background and Objective
The institute, dedicated to advancing genomic research, identified the need for a cutting-edge High Performance Computing (HPC) infrastructure to support large-scale genomic data processing and analysis.
With the increasing volume and complexity of genomic datasets, traditional computational resources had become inadequate for analyses such as whole genome sequencing, variant calling, and functional genomics. The new HPC installation aimed to provide GPU-accelerated computing and robust storage to enable faster, more scalable genomic workflows and to speed discoveries in health and disease biology.
2. Installation Overview
Hardware Configuration
- Compute: 8-GPU NVIDIA DGX A100 system for accelerating analyses such as whole genome sequencing, variant calling, and functional genomics.
- Storage: 2 PiB parallel file system (PFS) from DDN, delivering high-throughput, scalable storage for massive genomic datasets.
- Interconnect: High-speed InfiniBand switches provide low-latency, high-bandwidth communication essential for parallel processing and data exchange across nodes.
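To put the 2 PiB figure in context, the arithmetic below converts it to bytes and estimates how many sequencing samples might fit. This is only a rough sketch: the per-sample footprint (100 GiB) and the usable fraction of raw capacity (0.8) are illustrative assumptions, not figures from this installation, and the function names are hypothetical.

```python
# Rough capacity arithmetic for a 2 PiB (raw) parallel file system.
# The per-sample size and usable fraction are ASSUMED values for illustration only.

PIB = 2**50   # bytes in one pebibyte (binary prefix)
PB = 10**15   # bytes in one petabyte (decimal prefix)
GIB = 2**30   # bytes in one gibibyte

RAW_CAPACITY_PIB = 2  # raw capacity stated for the DDN file system


def raw_capacity_bytes(pib: int = RAW_CAPACITY_PIB) -> int:
    """Return the raw capacity in bytes."""
    return pib * PIB


def samples_that_fit(per_sample_gib: float, usable_fraction: float) -> int:
    """Estimate how many samples fit, given an assumed footprint per sample
    and an assumed fraction of raw capacity that is usable after overheads."""
    usable_bytes = raw_capacity_bytes() * usable_fraction
    return int(usable_bytes // (per_sample_gib * GIB))


if __name__ == "__main__":
    total = raw_capacity_bytes()
    print(f"Raw capacity: {total} bytes (~{total / PB:.2f} PB decimal)")
    # Hypothetical inputs: 100 GiB per sample, 80% of raw capacity usable.
    print(f"Samples at 100 GiB each: {samples_that_fit(100, 0.8)}")
```

Under these assumed inputs the estimate is on the order of tens of thousands of samples; the real number depends on data formats, retention of intermediate files, and the file system's actual usable capacity.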
Software and Management
- The DGX A100 system was installed to support GPU-accelerated genomics pipelines.
- The NVIDIA Clara Parabricks suite was deployed, providing GPU-accelerated genomic analysis workflows such as sequence alignment, variant calling, and RNA-Seq analysis.
- The parallel file system (PFS) from DDN, with a total raw capacity of 2 PiB, was integrated to deliver high-throughput, scalable storage for massive genomic datasets.
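As an illustration of how an alignment job might be launched on such a system, the sketch below assembles a Parabricks `pbrun fq2bam` command line without executing it. The flag names follow the Parabricks documentation as commonly published and should be checked against the installed version; the helper name and all file paths (including the `/pfs/...` mount point) are hypothetical.

```python
# Sketch: build (but do not run) a Parabricks alignment command.
# Paths are hypothetical; verify flags against the installed Parabricks version.
import shlex
from typing import List

MAX_GPUS = 8  # the DGX A100 system described above has 8 GPUs


def build_fq2bam_command(ref: str, fq1: str, fq2: str,
                         out_bam: str, num_gpus: int = MAX_GPUS) -> List[str]:
    """Return the argument list for a paired-end FASTQ-to-BAM alignment."""
    if not 1 <= num_gpus <= MAX_GPUS:
        raise ValueError(f"num_gpus must be between 1 and {MAX_GPUS}")
    return [
        "pbrun", "fq2bam",
        "--ref", ref,            # reference genome FASTA
        "--in-fq", fq1, fq2,     # paired-end FASTQ files
        "--out-bam", out_bam,    # sorted BAM output
        "--num-gpus", str(num_gpus),
    ]


if __name__ == "__main__":
    cmd = build_fq2bam_command(
        ref="/pfs/ref/genome.fasta",            # hypothetical path
        fq1="/pfs/samples/S1_R1.fastq.gz",      # hypothetical path
        fq2="/pfs/samples/S1_R2.fastq.gz",      # hypothetical path
        out_bam="/pfs/results/S1.bam",          # hypothetical path
    )
    # Print a shell-quoted command for review or for a job script.
    print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps paths with spaces safe and makes it straightforward to hand the result to `subprocess.run` or a scheduler wrapper once the paths and flags have been verified.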
3. Performance and Research Impact
- The DGX A100 cluster provides multi-petaflop mixed-precision GPU performance, enabling rapid execution of computationally intensive genomics workloads.
- Clara Parabricks accelerates genomic pipelines by orders of magnitude compared to CPU-only methods, reducing analysis time from days to hours.
- The 2 PiB DDN parallel file system delivers sustained throughput for simultaneous high-volume data streaming and analysis by multiple users.
- Researchers benefit from faster turnaround in genomic data processing, enabling timely insights into genetic disease mechanisms and precision medicine.
- The infrastructure has catalyzed collaborative projects across institutes focusing on population genomics, cancer genomics, and rare genetic disorders.
This HPC installation has enabled the Institute to remain at the forefront of genomic research by providing the computational speed and scalability needed to analyze large and complex biological datasets.