Catheter Line Classification with Z by HP & NVIDIA

Bachelor of Science and Arts

Late last year, I was lucky enough to join an ambassador program with Z by HP & NVIDIA in which I was given a Z8 Workstation with an NVIDIA Quadro RTX 8000 GPU and a ZBook Studio with an NVIDIA Quadro RTX 5000 GPU to enhance my data science skills on Kaggle. In this blog post, I will discuss how I used these incredible machines in the recently completed RANZCR CLiP Catheter Line Detection competition.

Z by HP DS Software Stack
Before getting into the blog, I want to mention the Z by HP Data Science Software stack. If you are like me (and you probably are), you don't like spending tons of time installing the packages and tools needed to get your data science project up and running. With the Z by HP Data Science Software stack, your computer comes pre-loaded with a host of data science libraries, developer tools, and Docker container support. So not only do you have access to programming languages like Python 2 / 3, you also get libraries like TensorFlow, PyTorch, scikit-learn, and XGBoost from day one. As someone who has spent hours trying to get IT to do something as simple as install Python on a co-worker's computer so they could run code I wrote, I find this software stack invaluable: it saves a ton of time. Because of this, I was able to take pre-existing workflows I wrote on Kaggle kernels and run them on my local machines without installing a single package.

RANZCR CLiP Intro
In this competition, we were tasked with categorizing line catheters into 11 different classes based on chest x-rays. Complications arise when these line catheters are incorrectly positioned in patients. Typically, doctors and nurses manually inspect the position of these catheters / tubes in the x-rays, but this is prone to human error and highly time-consuming, so how do we remedy these issues? With deep learning, of course!

We were given just over 30,000 chest x-rays of line catheters and their corresponding labels and had to infer on 40,000 unseen chest x-rays; quite a task indeed. Patients could have several catheters present at once, making this a multi-label classification problem. The competition metric was a column-wise average of the AUC for each predicted label and the observed target.
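To make the metric concrete, here is a minimal sketch of a column-wise mean AUC in plain Python. It is only an illustration of the idea, not the competition's scoring code: the function names `auc` and `mean_columnwise_auc` are my own, and in practice you would more likely reach for a library routine such as scikit-learn's `roc_auc_score`.

```python
def auc(y_true, y_score):
    """ROC AUC for one binary label, using the rank-sum (Mann-Whitney U) form."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_columnwise_auc(y_true, y_pred):
    """Compute the AUC separately for each label column, then average."""
    n_labels = len(y_true[0])
    scores = []
    for c in range(n_labels):
        col_true = [row[c] for row in y_true]
        col_pred = [row[c] for row in y_pred]
        scores.append(auc(col_true, col_pred))
    return sum(scores) / n_labels
```

Each row of `y_true` holds the 0/1 labels for one x-ray and each row of `y_pred` holds the predicted probabilities, so a model is rewarded for ranking positives above negatives within every label independently.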

I joined this competition a month and a half late, so I was able to peruse existing solutions / discussions before beginning my initial experiments. It seemed that deep CNNs were performing well; in particular the fairly new ResNet200D model was outperforming other architectures. Now, x-rays are typically very high resolution images; in this competition, the original x-rays were around 2500 by 2500 pixels. One cannot fit these images into memory, even with large GPUs and smaller batch sizes, so it was necessary to resize the images. 

This is particularly problematic for this competition: the classification of catheters is largely based on their endpoints, and in some cases, the difference between the endpoints of a correctly placed catheter and a malpositioned catheter is only several pixels. So if you downsize your images too much, this pixel difference cannot be picked up by a CNN. One potential solution is to crop images at the original resolution, so you can train on smaller inputs while retaining high resolution features. This worked well in the Kaggle Melanoma classification competition, where most moles were at the center of the image. However, many catheters are inserted in the arm, which sits at the fringes of the chest x-ray (see below image), so you cannot crop too much or you lose this information. What to do?

Luckily, my Z8 Workstation has an NVIDIA Quadro RTX 8000 with 48 gigabytes of memory, so I was able to use large image resolutions with a decent batch size. The largest resolution I was able to fit into memory with a batch size of 32 was 736 by 736. This was without using any sort of gradient accumulation, so I could have easily trained on larger resolution images if I wanted to. For reference, the Kaggle NVIDIA Tesla P100 GPU only has 16 gigabytes of memory and most Kagglers were only able to train 512 by 512 images with a batch size of 16 before running out of memory, which was far from ideal for this competition.
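For a rough sense of scale, the two setups can be compared by the number of input pixels in each batch. This is back-of-the-envelope arithmetic only, under the assumption that activation memory in a CNN grows roughly with pixels per batch; it ignores model weights, optimizer state, and mixed precision, and the helper name `pixels_per_batch` is just for illustration.

```python
def pixels_per_batch(side, batch_size):
    """Input pixels per channel in one batch of square images."""
    return side * side * batch_size

local = pixels_per_batch(736, 32)   # Quadro RTX 8000 setup described above
kaggle = pixels_per_batch(512, 16)  # typical Kaggle P100 setup described above
ratio = local / kaggle

print(f"{local:,} vs {kaggle:,} pixels per batch, ratio {ratio:.2f}")
```

So each local batch carried a little over four times as many pixels as a typical Kaggle batch, on a card with three times the memory.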

Having this much GPU memory allowed me to run many experiments with the same batch size to compare model performance at higher image resolutions. I saw an increase in local CV every time I bumped up the image size. I would not have been able to observe this without my large local GPU. Many of my experiments took over 9 hours to run, which is longer than the 9-hour maximum runtime for a Kaggle notebook with GPU and close to the maximum time that a Google Colab notebook can run (12 hours).


I learned late in the competition that the public test and private test images were repurposed chest x-rays from the NIH Chest X-ray dataset. This was very useful information; if you can somehow label these images for line catheters and train on them, your leaderboard score should improve. How would one do this? Simple: you just use a model (or better yet, an ensemble of models) to pseudo-label the images in the NIH x-ray dataset.
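As a minimal sketch of that idea, the snippet below averages the per-label probabilities from several models into soft pseudo-labels. The function name `pseudo_label` and the nested-list layout are assumptions made for illustration, and whether to keep soft probabilities or threshold them into hard 0/1 labels is a design choice this sketch does not settle.

```python
def pseudo_label(model_probs):
    """Average predictions from several models into soft pseudo-labels.

    model_probs[m][i][c] is model m's predicted probability that
    unlabeled image i has label c.
    """
    n_models = len(model_probs)
    n_images = len(model_probs[0])
    n_labels = len(model_probs[0][0])
    return [
        [sum(m[i][c] for m in model_probs) / n_models for c in range(n_labels)]
        for i in range(n_images)
    ]


# Two hypothetical models scoring two unlabeled images on two labels
model_a = [[0.25, 0.75], [1.0, 0.0]]
model_b = [[0.75, 0.25], [0.5, 0.5]]
soft_labels = pseudo_label([model_a, model_b])
```

The averaged rows can then be attached to the external images and used as training targets alongside the original labeled data.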

That being said, this is risky: many of the training images in the competition also came from the NIH dataset, so if you naively train on pseudo-labels, you may be training on images in your validation set and therefore have CV leakage. So it is necessary to remove from the NIH dataset every image that also appears in the original RANZCR training dataset.

One way to do this is with RAPIDS. You can create CNN image embeddings by extracting features from all training images and NIH images and use RAPIDS cuML NearestNeighbors to find duplicate images. To do this, you simply remove the classification head from an ImageNet pretrained model and use it to infer on the images in both datasets. You then fit a GPU-accelerated NearestNeighbors model on the training image embeddings and use it to find neighbors in the external dataset. (You can also use this technique to find duplicates inside a single dataset to prevent fold leakage.)
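The logic of the duplicate search can be sketched on the CPU in plain Python. To be clear, this is a brute-force stand-in and not the RAPIDS cuML NearestNeighbors workflow itself: the function names, the use of cosine similarity, and the 0.99 threshold are all assumptions chosen for illustration, and the embeddings are taken as already extracted.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def find_duplicates(train_emb, external_emb, threshold=0.99):
    """Return indices of external embeddings whose nearest training
    embedding has cosine similarity >= threshold (hypothetical cutoff)."""
    train_unit = [l2_normalize(v) for v in train_emb]
    duplicates = []
    for idx, vec in enumerate(external_emb):
        unit = l2_normalize(vec)
        # Brute-force nearest neighbor: best similarity against every training image
        best = max(sum(a * b for a, b in zip(unit, t)) for t in train_unit)
        if best >= threshold:
            duplicates.append(idx)
    return duplicates


# Toy 2-D "embeddings"; real CNN embeddings have hundreds or thousands of dimensions
train = [[1.0, 0.0], [0.0, 1.0]]
external = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]
dupes = find_duplicates(train, external)
```

Brute force like this scales with the product of the two dataset sizes, which is exactly why a GPU-accelerated nearest-neighbor search is attractive once you have tens of thousands of images on each side. Any threshold should be checked by eye on a sample of flagged pairs before images are dropped.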

Thanks to RAPIDS, this fitting step only takes seconds: the time-consuming part is actually extracting the CNN embeddings, but this only needs to be done once, after which you can save them and load them later. You can use these CNN embeddings with RAPIDS to do even more: train a KNearestNeighbors model to establish a baseline score, cluster samples together with KMeans, and reduce dimensionality with t-SNE (all GPU accelerated through RAPIDS).

End Result
After removing duplicates and generating pseudo-labels for the external data, I started running experiments to compare performance between models trained solely on the RANZCR dataset and those trained on both the RANZCR images and the pseudo-labeled NIH external dataset. Thanks to my GPU, I quickly saw that models trained on both datasets drastically outperformed those trained only on the dataset given to us in the competition. My largest CV increase for a single model was from 0.917 to 0.9636.

I re-trained my top 6 models on these newly created pseudo-labels and added them to my ensembling algorithm. In the end, I had around 20 models trained on different image sizes (ensembling models trained on different image resolutions typically works well). For my final submissions, I first chose my best public leaderboard submission and then chose my best local CV submission. It is not surprising that my best CV score submission gave me the highest private leaderboard score.

I managed to get 60th place (top 4%), which gave me a solo silver medal. I am proud of my performance in this competition, but far from satisfied. Thank you, Z by HP & NVIDIA, for the powerful hardware. Without it, I would not have secured a solo silver medal in this competition. I look forward to using my hardware in upcoming computer vision competitions!