How I use my HP workstation and my competition plan for 2021

Algorithm Engineer

I am very excited and happy to be invited by HP and NVIDIA to be one of the Z by HP & NVIDIA Global Data Science Ambassadors for the period of 2020-2021. I would like to express my gratitude to them for their support. As an Ambassador, I’ve been able to work with my first deep learning workstation, and my first impression is that the Z8 is very solid and techy, and the hardware deployment is very modular and tidy. 

At the same time, I was very surprised that a machine with the small form factor of the ZBook Studio G7 can be equipped with a high-performance GPU like the NVIDIA Quadro RTX 5000 with 16GB of GPU memory. I believe this mobile workstation will let me easily handle many small-scale and even medium-scale data tasks.

Oh, here's another thing that surprised me even more: the Z by HP Data Science Software Stack that comes pre-loaded on the Z8 and ZBook Studio. It is a really nice solution that integrates most of the current data science tools, such as NVIDIA drivers, CUDA, RAPIDS, and the deep learning frameworks PyTorch and TensorFlow, which lets me quickly set up whatever working environment I need. For developers, it avoids many of the strange problems that come up when configuring an environment by hand.

Figure 1 - My Workstation

Here is how I currently use this data science workstation. I have deployed my Z8 workstation as a server with both local and remote connections. When I'm not at home, I can use my laptop to connect to the Z8 via SSH, which allows me to debug my experiments no matter where I am. I use the ZBook more for quick validation of my ideas. For example, when I come up with a novel idea, the ZBook has enough hardware for me to write the code and debug experiments very quickly. When the validation is complete, I upload the complete code to the Z8 and train it there.
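The upload-and-train step described above could be scripted roughly as follows. This is only a sketch under assumptions: the post does not say which tools are used for the upload, so `rsync` over SSH is a stand-in, and the helper name `build_sync_and_train_commands`, the host `me@z8`, and the paths are hypothetical.

```python
import shlex


def build_sync_and_train_commands(local_dir, remote, remote_dir, train_cmd):
    """Return two argument lists: one to copy the project, one to start training.

    Assumes rsync and ssh are installed locally and that `remote` is an
    SSH destination such as "user@host" (hypothetical values in this sketch).
    """
    # Trailing slash on the source copies the directory contents, not the directory itself.
    source = local_dir.rstrip("/") + "/"
    sync = ["rsync", "-az", "--exclude", ".git", source, f"{remote}:{remote_dir}/"]
    # Quote the remote path so spaces or special characters survive the remote shell.
    run = ["ssh", remote, f"cd {shlex.quote(remote_dir)} && {train_cmd}"]
    return sync, run


# Example usage (not executed here; requires a reachable SSH host):
#   import subprocess
#   sync, run = build_sync_and_train_commands("./exp", "me@z8", "/home/me/exp", "python train.py")
#   subprocess.run(sync, check=True)
#   subprocess.run(run, check=True)
```

Building the commands as argument lists rather than one shell string keeps local quoting simple; only the remote part needs `shlex.quote`.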

In Kaggle competitions, I have focused mostly on deep learning and computer vision. For 2021, I have a new plan for my competitions and my research.

First, I will keep putting most of my effort into computer vision competitions, because I now have a more powerful server-class HP Z8 workstation with dual NVIDIA Quadro RTX 6000 GPUs and dual Xeon 6254 CPUs. In a computer vision competition, two steps take up most of the time in my workflow:

       1. Loading large datasets that need complex pre-processing or heavy data augmentation. This operation consumes a lot of CPU resources, but the Z8 workstation with dual Xeon 6254 CPUs lets me assign more workers to data augmentation, so the data is processed very efficiently and I save a lot of time.

As shown in the figure below, the difference in time consumed for data processing between num_workers=36 and num_workers=8 is significant:

Figure 2 - num_workers=36 VS num_workers=8
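The `num_workers` setting compared above is, in PyTorch, an argument of `torch.utils.data.DataLoader`. The sketch below shows one way to derive it from the number of CPU cores on the machine. The helper name `dataloader_kwargs` and its `reserve_cores` and `max_workers` parameters are hypothetical and not taken from my actual training code; the block only builds the keyword arguments, so it runs without PyTorch installed.

```python
import os


def dataloader_kwargs(batch_size, reserve_cores=2, max_workers=None):
    """Build keyword arguments for torch.utils.data.DataLoader.

    Leaves `reserve_cores` logical cores free for the main process and
    optionally caps the worker count at `max_workers`.
    """
    cores = os.cpu_count() or 1
    workers = max(0, cores - reserve_cores)
    if max_workers is not None:
        workers = min(workers, max_workers)

    kwargs = {
        "batch_size": batch_size,
        "shuffle": True,
        "num_workers": workers,  # worker processes doing loading and augmentation
        "pin_memory": True,      # page-locked host memory for faster GPU transfer
    }
    if workers > 0:
        # Only valid when num_workers > 0; keeps workers alive between epochs.
        kwargs["persistent_workers"] = True
    return kwargs


# Example usage (requires PyTorch and a Dataset object, both assumed here):
#   from torch.utils.data import DataLoader
#   loader = DataLoader(train_dataset, **dataloader_kwargs(32, max_workers=36))
```

More workers only help while the augmentation is CPU-bound, so it is worth timing a few epochs with different values, as in the comparison above.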

       2. Training deep, heavy models with high computational complexity. This requires a relatively large amount of GPU memory, and because both the scale of the data and the depth of the model strongly affect training time, it also requires GPUs with a very large number of CUDA cores. The two NVIDIA Quadro RTX 6000 GPUs provide a total of 2 x 24 GB of memory and 2 x 4608 CUDA cores, so training is very fast, which makes it easy for me to solve most of the problems I encounter in competitions. With enough computing power, I can also keep track of the latest developments in this field, read relevant papers, and reproduce some of the novel ideas in them, which helps a lot in improving my own abilities. Seeing the CPU and GPU at full load keeps me enthusiastic and pushes me to put more effort into competitions.

Figure 3 - CPU and GPU at full load

It is worth mentioning that I recently used the Z8 with the NVIDIA Quadro RTX 6000 to win 3rd place in the NFL 1st and Future - Impact Detection competition on Kaggle. This was the first time I used the Z8 workstation, and I won a gold medal. I joined only in the last month of the competition, and there is no doubt that the efficiency of the Z8 workstation saved me the time I needed.

Figure 4 - My competition

Second, I also plan to participate in some data mining competitions, which I have not done before, mainly because the tabular data is very large and complex and demands more CPU performance than image data does. There is no doubt that the Z8 workstation with its dual Xeon 6254 CPUs can handle this data processing workload very easily.

Figure 5 - RAPIDS VS (Scikit-learn & Pandas)

Regarding the processing of tabular data, I have to mention RAPIDS, a suite of open source machine learning libraries developed by NVIDIA researchers and other contributors. RAPIDS is designed to build on skills you already have, such as working with tabular data in SQL or Pandas and building models with scikit-learn, and to deliver large speedups by running them on GPUs. Because the RAPIDS data science libraries run directly on the GPU, I can easily get a 100x or higher speedup of my machine learning pipeline. These advantages definitely help me learn new things more efficiently than I ever have before.
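To illustrate how closely RAPIDS follows the Pandas way of working, here is a small sketch that runs the same group-by code on cuDF (the RAPIDS DataFrame library) when it is usable and on Pandas otherwise. The helper names `pick_dataframe_backend` and `mean_value_per_group` and the sample data are hypothetical and not taken from my competition code; the sketch shows the API similarity only and does not demonstrate any speedup.

```python
def pick_dataframe_backend():
    """Return (module, name): cuDF if usable, else pandas, else (None, "none")."""
    try:
        import cudf  # GPU DataFrame library from RAPIDS
        return cudf, "cudf"
    except Exception:
        # cuDF is not installed, or it cannot start (for example, no usable GPU).
        pass
    try:
        import pandas
        return pandas, "pandas"
    except ImportError:
        return None, "none"


def mean_value_per_group(xdf, keys, values):
    """Group `values` by `keys` and return {key: mean} using the given backend."""
    df = xdf.DataFrame({"key": keys, "value": values})
    means = df.groupby("key")["value"].mean()
    # cuDF results live in GPU memory; to_pandas() copies them back to the host.
    if hasattr(means, "to_pandas"):
        means = means.to_pandas()
    return {k: float(v) for k, v in means.items()}


if __name__ == "__main__":
    backend, name = pick_dataframe_backend()
    if backend is not None:
        print(name, mean_value_per_group(backend, ["a", "b", "a", "b"], [1.0, 2.0, 3.0, 4.0]))
```

The only backend-specific line is the copy back to host memory; the DataFrame construction and the group-by are identical for both libraries.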

Figure 6 - cuML

That is how I am currently using the HP Z8 data science workstation. Looking ahead to my new competition plan for this year, I believe that with the support of Z by HP & NVIDIA AI, I can carry it out very well.