Hi, my name is Qishen Ha. I work at LINE Corp. as a Machine Learning Engineer and am currently the 11th-ranked Kaggle Grandmaster in the world. I mainly work on computer vision problems such as image classification, semantic segmentation, and object detection.
I am very honored to be a Z by HP & NVIDIA data science ambassador, and I am very grateful to Z by HP & NVIDIA for giving me this opportunity and providing me with the Z8G4 workstation and ZBook Studio. This has increased my competitiveness in Kaggle competitions.
I used the Z8G4 workstation and ZBook Studio in the NFL 1st and Future - Impact Detection competition that ended last month (Jan. 2021), where our team finished in 3rd place. Now (Feb. 2021) I'm using them for another Kaggle competition -- Cassava Leaf Disease Classification. So here I'd like to talk about my experience using the Z8G4 workstation and ZBook Studio in real Kaggle competitions.
I'll start with a brief overview of the specific configuration of the Z8G4 workstation and ZBook Studio provided to me by Z by HP & NVIDIA.
Z8G4 workstation:
- Dual NVIDIA Quadro RTX 6000 GPUs (2 x 24 GB)
- Dual Intel Xeon Gold 6254 CPUs (2 x 18 cores)
- Memory 96 GB
- Storage 2 TB
ZBook Studio:
- NVIDIA Quadro RTX 5000 GPU (16 GB)
- Intel Core i9-10885H (8 cores)
- Memory 32 GB
- Storage 2 TB
My main area of interest is computer vision, so the most important thing I look for in a workstation is GPU performance, followed by CPU performance.
The importance of GPU performance needs no introduction, but some people may underestimate the need for CPU performance in computer vision tasks. Usually the CPU is responsible for reading and pre-processing the data in multiple threads and then handing it over to the GPU for training. If the images are high-resolution, or if many augmentation methods are used, this puts more pressure on the CPU. Once the CPU reads and preprocesses data more slowly than the GPU trains the model, CPU performance becomes the bottleneck of the whole system. In my opinion, a workstation needs a minimum of 8 CPU cores per GPU to prevent the CPU from becoming a bottleneck, while 16+ CPU cores per GPU is more desirable.
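The bottleneck argument above can be put into a few lines of arithmetic. The following is only a rough sketch: the helper names `loader_workers_per_gpu` and `pipeline_bottleneck`, the `reserve` parameter, and all throughput figures are assumptions made up for illustration, not measurements from my machines. The worker count it produces is the kind of value one might pass to a data loader setting such as PyTorch's `num_workers`.

```python
import os


def loader_workers_per_gpu(total_cores: int, num_gpus: int, reserve: int = 2) -> int:
    """Split CPU cores evenly across GPUs, keeping a few for the main processes."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    usable = max(total_cores - reserve, 1)
    return max(usable // num_gpus, 1)


def pipeline_bottleneck(imgs_per_sec_per_worker: float, workers: int,
                        gpu_imgs_per_sec: float) -> str:
    """Return 'cpu' if preprocessing cannot keep up with training, else 'gpu'."""
    cpu_rate = imgs_per_sec_per_worker * workers
    return "cpu" if cpu_rate < gpu_imgs_per_sec else "gpu"


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    workers = loader_workers_per_gpu(cores, num_gpus=2)
    print(f"{cores} logical cores -> {workers} loader workers per GPU")
    # Hypothetical rates: 40 images/s per worker, 500 images/s consumed by the GPU.
    print("bottleneck:", pipeline_bottleneck(40.0, workers, 500.0))
```

With these made-up rates, 8 workers deliver 320 images/s and starve a GPU that consumes 500 images/s, while 16 workers deliver 640 images/s and keep it busy. Real numbers depend heavily on image size and augmentation, so they are worth measuring on your own pipeline.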
Large system memory (RAM) is crucial in tabular competitions, because it is usually necessary to load the entire dataset into memory during feature engineering. Therefore, workstations for large-scale tabular competitions usually need hundreds of GB of memory. Computer vision, however, is much less demanding on memory, and about 24 GB of memory per GPU is sufficient in most cases.
Regarding laptops, for a long time I just used them as a tool to connect to servers (cloud instances, etc.). Because the performance of laptops is usually positively correlated with their size and weight, powerful laptops are necessarily bigger and heavier. As a result, using a lightweight laptop only to connect to a server, and debugging the code and training the model on the server, has become the mainstream practice today.
But when I got my hands on the ZBook Studio, I was amazed at how thin and light a laptop can be with this much performance - I hope you haven't forgotten that this laptop has a Quadro RTX 5000 GPU (16 GB) inside. This means that the ZBook Studio can not only be a tool for connecting to a server or workstation, but can also be used to debug deep learning code itself.
How I use the Z8G4 Workstation and ZBook Studio
A quick note: I set up Jupyter Notebook on my Z8G4 workstation and then connected to it from the ZBook Studio.
Jupyter Notebook is a web application that makes it easy to write and debug code cell by cell through your browser. It's so well known that I believe many machine learning engineers have used it, or at least heard of it. Before the partnership with Z by HP and NVIDIA, I always wrote code and trained models on cloud instances via Jupyter Notebook when I was kaggling last year. So when I got the Z8G4 workstation from Z by HP and NVIDIA, the first thing I did was to set up Jupyter Notebook on it.
When I connect to Jupyter Notebook on a cloud instance, I have to go through the public internet, whereas connecting to my Z8G4 only requires the LAN. So one obvious advantage of a workstation at home is that it does not depend on the external network and has very low latency. When I rent instances from vast.ai, I don't know where on earth the instance is. It could be in the US, in China or Europe, or even in the Arctic. Unless the instance is in Japan, I usually experience significant latency when I connect to it via Jupyter Notebook. But connecting to my workstation at home over the LAN is almost indistinguishable from running Jupyter Notebook directly on my laptop -- I don't feel any latency.
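If you want to put a number on this difference, one simple approach is to time how long it takes to open a TCP connection to the notebook server. The sketch below uses only the Python standard library. The function name `tcp_connect_latency_ms` and its parameters are my own for illustration, and connect time is only a rough proxy for how responsive the notebook feels.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host: str, port: int, samples: int = 5,
                           timeout: float = 2.0) -> float:
    """Median time in milliseconds to open a TCP connection to host:port."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open the connection and close it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

For example, `tcp_connect_latency_ms("192.168.1.50", 8888)` would measure a workstation on the LAN. The address here is hypothetical, and 8888 is Jupyter's default port. Running the same call against a cloud instance's address gives the comparison.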
As for why I connect to the workstation over the LAN instead of using it directly, the reason is simple. Ubuntu's GUI takes up some GPU memory, usually 300-500 MB, which is not a huge amount. But once I start training a model on the GPU, the GUI becomes so laggy that I can hardly do anything else. Turning off Ubuntu's GUI and connecting to the workstation from a laptop therefore not only solves the lag problem during training, but also saves a few hundred MB of GPU memory.
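To check how much GPU memory is in use before any training starts (for example, by the GUI), you can query `nvidia-smi`. The following is a small sketch, assuming an NVIDIA driver is installed and `nvidia-smi` is on the PATH. The helper names `parse_gpu_memory` and `query_gpu_memory` are made up for this example.

```python
import subprocess


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' pairs in MiB, one line per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory() -> list[tuple[int, int]]:
    """Ask nvidia-smi for per-GPU memory usage in MiB (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` once with the desktop running and once after switching it off shows how much memory the GUI was holding on each GPU.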
In my previous workflow, I had no way to debug new experimental code if all the GPUs in my rented cloud instance were fully loaded. That's no longer the case with the ZBook Studio, which not only has a 16 GB GPU but is also very thin and light. Now I can use the ZBook Studio to debug the code for a new experiment even when the GPUs on my workstation and instance are fully loaded. Once the code runs on the ZBook Studio, I copy it to the workstation or instance so that I can start the next experiment as soon as the current one finishes. This effectively improves my GPU utilization.
Of course, it is possible to train models directly on the ZBook Studio as well, but I personally don't recommend it, because laptops have limited heat dissipation and running at high temperatures for long periods can shorten the laptop's lifespan.
Speed of the NVIDIA Quadro RTX 6000
I compared the speed of the NVIDIA Quadro RTX 6000 and NVIDIA V100 GPUs while kaggling in the NFL and Cassava competitions. The comparison was done by running some of the same experiments on an NVIDIA Quadro RTX 6000 in my Z8G4 workstation and on a V100 from vast.ai. The CNN architectures used were EfficientNet and EfficientDet. To state the conclusion directly, the NVIDIA Quadro RTX 6000 is about 90% to 100% as fast as the V100, with slight variations depending on the CNN architecture. So I'm also very happy with the speed of the Z8G4's NVIDIA GPUs.
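A comparison like this boils down to measuring training throughput on each machine and taking the ratio. Below is a minimal sketch of that arithmetic, not the exact procedure I used. `steps_per_second` and `relative_speed` are hypothetical helper names, and `step` stands for whatever runs one training iteration. For GPU work, `step` should wait for the device to finish (for example by calling `torch.cuda.synchronize()` in PyTorch), otherwise the timing is misleading.

```python
import time
from typing import Callable


def steps_per_second(step: Callable[[], object], n_steps: int = 20,
                     warmup: int = 3) -> float:
    """Time n_steps calls of step() after a few untimed warm-up calls."""
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(n_steps):
        step()
    return n_steps / (time.perf_counter() - start)


def relative_speed(candidate: float, baseline: float) -> float:
    """Candidate throughput expressed as a fraction of the baseline throughput."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline
```

Running `steps_per_second` with the same model and batch size on both machines, then passing the two results to `relative_speed`, gives a figure such as 0.9, meaning the candidate GPU runs at 90% of the baseline's speed.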
About Z by HP Data Science SW Stack
To help users get started with the workstation or ZBook more quickly, the Z by HP team can pre-install a set of software called "Z by HP Data Science SW Stack", which includes CUDA, RAPIDS, cuDNN, TensorFlow, PyTorch, Docker and dozens of other software packages commonly used by data scientists. Of course, users can also install this software themselves.
However, as of now, users are not able to select the software they need to install from the SW Stack themselves, so it may not be so friendly for data scientists who don't like to have a lot of software installed on their computers that they won’t use.