Lightning-fast iterations with HP Z8+NVIDIA RAPIDS

Junior Data Scientist

Hello there, I’m Ruchi Bhatia!
I’m the 6th Grandmaster in the Datasets tier on Kaggle, and I currently work at Colgate-Palmolive as an Executive Associate in the Global Information Technology sector.

I’m also one of the 16 data science global ambassadors selected by Z by HP and NVIDIA. HP and NVIDIA have provided us with state-of-the-art hardware to run our data science workflows smoothly and locally.

My current gear includes:
- HP Z8 G4 Workstation: an indomitable beast integrated with
● dual Intel Xeon Gold 6234 3.3 GHz 8-core processors
● NVIDIA Quadro RTX 6000
● 96 GB memory
● 2 TB storage

- HP ZBook Studio: an extremely compact mobile workstation integrated with
● NVIDIA Quadro RTX 5000
● Intel Core i9-10885H @ 2.4 GHz (8 cores / 16 threads)
● 32 GB memory
● 2 TB storage

- HP Z38c: a rich and immersive curved display

To fully utilize the potential of my HP Z8 workstation, I decided to explore the NVIDIA RAPIDS suite. As someone who has used Pandas extensively, I found the switch to RAPIDS easy: only minor code changes are needed to get end-to-end speed improvements across the data science life cycle.

RAPIDS makes use of NVIDIA CUDA and high-bandwidth GPU memory. It is open source, free for anyone to use, and comprehensively documented.

The first time I used RAPIDS was for the competition I am currently working on, the WiDS Datathon 2021. The dataset contains patient records from the first 24 hours of intensive care, and the task is to determine whether a patient admitted to an ICU has been diagnosed with a particular type of diabetes, Diabetes Mellitus.

The tabular data contains 181 columns, making Exploratory Data Analysis a crucial step for spotting patterns and special cases and for picking the columns that make a statistically significant difference to model performance. Libraries tailored to the CPU become a bottleneck in both memory usage and runtime as datasets grow.

This is a snapshot of how I visualized the distribution of a few features in the dataset:
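The original snapshot isn’t reproduced here, but a minimal sketch of that kind of distribution plot looks like the following. The column names are illustrative stand-ins for WiDS-style patient features, not the actual dataset:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-ins for a few patient features (illustrative only).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=1000),
    "bmi": rng.normal(27, 5, size=1000),
    "glucose": rng.normal(160, 40, size=1000),
})

# One histogram per feature to eyeball each distribution.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, col in zip(axes, df.columns):
    ax.hist(df[col], bins=30)
    ax.set_title(col)
fig.tight_layout()
```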

End-to-end workflow acceleration
Before I was introduced to RAPIDS, the steps I followed for a problem statement that required GPUs were:

    1. Loading data 
    2. ETL tasks: data cleaning; feature extraction, generation and selection
    3. Converting the output to a format specific to the GPU-accelerated machine learning library
    4. Moving the data to GPU memory
    5. Training the model + Hyperparameter tuning using the GPU
    6. Moving the data back to host memory
    7. Model deployment
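Sketched in code, steps 1–4 of that pre-RAPIDS workflow looked roughly like this. The toy DataFrame and derived feature are placeholders; the point is the CPU-side ETL followed by an explicit format conversion before anything reaches the GPU:

```python
import numpy as np
import pandas as pd

# Step 1: load data on the CPU (a toy frame standing in for the real CSV).
df = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0],
                   "x2": [0.5, 0.1, 0.9, 0.3],
                   "y":  [0, 1, 0, 1]})

# Step 2: ETL on the CPU -- cleaning plus a simple generated feature.
df = df.dropna()
df["x1_x2"] = df["x1"] * df["x2"]

# Step 3: convert to the array format the GPU-accelerated library expects.
X = df[["x1", "x2", "x1_x2"]].to_numpy(dtype=np.float32)
y = df["y"].to_numpy(dtype=np.float32)

# Steps 4-6 happen inside the GPU library: it copies X and y to device
# memory, trains there, and copies results back to the host. Each of
# those host<->device conversions is overhead that RAPIDS removes.
```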

Steps 1 and 2 were carried out on the CPU. Although the speedup for model training was decent and much better than training on the CPU, the operations for converting between data formats created a lot of overhead.

RAPIDS deals with this issue elegantly by using a GPU DataFrame (cuDF) built on the Apache Arrow columnar data format, so data stays in GPU memory across the whole pipeline instead of being converted and copied at each step.
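Because cuDF mirrors the pandas API, the switch is usually little more than an import change. The sketch below runs on plain pandas so it works anywhere; on a machine with RAPIDS installed, importing `cudf` in place of `pandas` runs the same operations on the GPU. The columns are illustrative, not the real WiDS schema:

```python
import pandas as pd  # with RAPIDS installed: import cudf as pd

# Illustrative records: target label plus one feature.
df = pd.DataFrame({
    "diabetes_mellitus": [0, 1, 1, 0, 1],
    "age": [34, 61, 58, 45, 70],
})

# Identical groupby/aggregation code either way; cuDF keeps the data
# in GPU memory in Arrow's columnar layout, so there is no conversion
# step between ETL and training.
mean_age = df.groupby("diabetes_mellitus")["age"].mean()
print(mean_age.loc[1])  # → 63.0
```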

Performance speedup
As for the difference I noticed while running various code segments for WiDS, functions ran faster locally on my Z8 workstation than in Kaggle’s hosted environment.

The speedup achieved by using Z8 for my workflow was tremendous.

By combining powerful Quadro RTX GPUs with the acceleration of RAPIDS, I can run multiple experiments in parallel and gain insights quicker!

Everything from reading a large dataset to tuning hyperparameters happens in a flash! Since the entire training pipeline runs on GPUs, projects run very smoothly, enabling lightning-fast iterations.
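A rough way to quantify such speedups on your own hardware is to time the same operation in both environments. The sketch below times a pandas aggregation; on a RAPIDS machine, building `df` as a `cudf.DataFrame` and timing the identical call gives the GPU number for comparison. No specific timings are claimed here, since they depend entirely on the hardware:

```python
import time
import numpy as np
import pandas as pd

def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# A million-row frame to make the aggregation non-trivial.
rng = np.random.default_rng(0)
df = pd.DataFrame({"key": rng.integers(0, 100, size=1_000_000),
                   "val": rng.normal(size=1_000_000)})

# Time one representative operation; with cuDF the call is identical.
agg, elapsed = timed(lambda: df.groupby("key")["val"].mean())
print(f"groupby/mean over 1M rows: {elapsed:.3f}s")
```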

Data Science Software Stack Preload

The Z8 even comes preloaded with a data science software stack, which includes
- libraries like TensorFlow, PyTorch and RAPIDS
- developer tools like PyCharm and Visual Studio Code
- GPU support tools like CUDA and more in a cloud-interoperable kit


Resolving dependency tree issues can be extremely time-consuming and may involve uninstalling and reinstalling libraries and packages again and again. 

This was one of the primary reasons I used to prefer working in the Google Colab or Kaggle environments. The preloaded software stack lets us dive right into our workflow without spending time installing data science tools and libraries one by one. Furthermore, future updates of the stack provide seamless, convenient transitions to updated packages and dependencies. This eliminates the time we spend fixing and resolving installation issues from day 0 to day n and ensures productivity all along.

Local GPU perks
There is a weekly limit on the maximum number of GPU hours on Kaggle and the number of GPU instances that can be used at a time. 

Having a local GPU gives us the flexibility to run experiments without any such time constraint or restriction on the number of experiments running simultaneously. With the amount of data growing in real-world projects, the cost and time factors need to be accounted for.

This workstation is definitely a solution to these commonly faced issues and more.
The link to my Kaggle notebook for WiDS is: https://www.kaggle.com/ruchi798/wids-datathon-2021-rapids-xgb-lightgbm

You can find regular updates about my work on LinkedIn and Twitter. Stay tuned!