For 2021, I have set a target of taking part in 25 data science competitions on Kaggle, DrivenData, AIcrowd, and other platforms. As a Z by HP & Nvidia global data science ambassador, I am fortunate to have received a Z8 specialized data science workstation for my competitions.
My first competition is “Wind-dependent Variables: Predict Wind Speeds of Tropical Storms” - hosted by Radiant Earth Foundation at DrivenData:
Every competition tells a story and this one is about:
“Two roads diverged in a wood, and I ... I took the one less traveled by, And that has made all the difference.” - Robert Frost, American poet
The organization of the blog is as follows:
● Problem and solution: a summary of the data science problem I am solving and my proposed solution
● Experience on the Z8 HP & Nvidia workstation: how new computation power has empowered me to try something different
Let’s begin the story!
Problem and Solution
Tropical cyclones are among the costliest natural disasters globally. A single hurricane can cause upwards of 1,000 deaths and $50 billion in damages. The task for this competition is to build a model that estimates the maximum sustained surface wind speed (storm intensity) from new satellite images of tropical storms. The history of past images and wind speeds can also be used when available.
Single-band (10.3 micron) satellite images of 496 storms are provided. These are captured by the Geostationary Operational Environmental Satellites (GOES) orbiting the earth. Each storm consists of about 20 to 700 images, each measuring 336 x 336 pixels. In total, the training set has 70,257 images and the test set has 44,377 images.
My solution is an ensemble of LSTM and transformer image-based regression models.
Figure 1 shows the construction of the LSTM encoder-decoder model. An ImageNet-pretrained resnet-18d encodes each input image into a 512-dimensional embedding. Next, a 2-layer bidirectional LSTM encodes the history of past images, time stamps, and wind speeds into a context vector. Finally, another 2-layer LSTM performs sequence decoding over the forecast interval: given the encoder context vector, the decoder predicts the current wind speed from the current input image, time stamp, and the last predicted wind speed.
Figure 1. LSTM encoder-decoder model
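The architecture above can be sketched in PyTorch as follows. This is a minimal illustration of the design, not the competition code: the tiny CNN is a stand-in for the pretrained resnet-18d backbone, and the layer sizes and input shapes are my reading of the description.

```python
import torch
import torch.nn as nn

class LSTMEncoderDecoder(nn.Module):
    """Sketch: CNN image embedding -> bidirectional LSTM encoder over the
    history (embedding + time stamp + wind speed) -> LSTM decoder that
    steps through the forecast interval, feeding back its last prediction."""

    def __init__(self, emb_dim=512, hidden=256):
        super().__init__()
        # Stand-in for the ImageNet-pretrained resnet-18d encoder.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, emb_dim),
        )
        # Encoder input per step: image embedding + time stamp + wind speed.
        self.encoder = nn.LSTM(emb_dim + 2, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        # Decoder input per step: embedding + time stamp + last prediction.
        self.decoder = nn.LSTM(emb_dim + 2, hidden * 2, num_layers=2,
                               batch_first=True)
        self.head = nn.Linear(hidden * 2, 1)

    def forward(self, hist_img, hist_t, hist_wind, fut_img, fut_t):
        B, T, C, H, W = hist_img.shape
        emb = self.cnn(hist_img.view(B * T, C, H, W)).view(B, T, -1)
        enc_in = torch.cat([emb, hist_t.unsqueeze(-1),
                            hist_wind.unsqueeze(-1)], dim=-1)
        _, (h, c) = self.encoder(enc_in)
        # Merge the two encoder directions into the decoder's initial
        # (context) state: (layers*2, B, H) -> (layers, B, 2H).
        h = h.view(2, 2, B, -1).permute(0, 2, 1, 3).reshape(2, B, -1)
        c = c.view(2, 2, B, -1).permute(0, 2, 1, 3).reshape(2, B, -1)
        preds = []
        last = hist_wind[:, -1:]           # seed with the last known speed
        for s in range(fut_img.shape[1]):  # step through the forecast
            e = self.cnn(fut_img[:, s])
            dec_in = torch.cat([e, fut_t[:, s:s + 1], last], dim=-1)
            out, (h, c) = self.decoder(dec_in.unsqueeze(1), (h, c))
            last = self.head(out[:, 0])    # fed back at the next step
            preds.append(last)
        return torch.cat(preds, dim=1)     # (B, forecast_steps)

# Shape check with toy inputs: batch of 2, 3 history steps, 4 forecast steps.
model = LSTMEncoderDecoder(emb_dim=32, hidden=16)
preds = model(torch.randn(2, 3, 1, 36, 36), torch.randn(2, 3),
              torch.randn(2, 3), torch.randn(2, 4, 1, 36, 36),
              torch.randn(2, 4))
print(preds.shape)  # torch.Size([2, 4])
```

Feeding the last prediction back into the decoder is what makes this a classic autoregressive decoder, and it is the point where the transformer variant below differs.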
Figure 2 shows the construction of the transformer encoder-decoder model. Similarly, resnet-18d encodes the input image, and sinusoidal positional encoding is applied to the time stamp. A 2-layer multi-head dot-product attention (MHA) encoder encodes the history into a set of attention-weighted output values.
Another 2-layer MHA decoder predicts the wind speed using the current input image and time stamp as the query, and the encoder output as the key and value.
A triangular mask is used in the decoder's self-attention. This is standard practice to prevent the decoder from “cheating” by accessing any future image or time stamp.
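The triangular mask is the standard causal mask: entries above the diagonal are set to negative infinity so that the softmax inside attention zeroes them out. A small PyTorch sketch (illustrative sizes, not the competition code):

```python
import torch
import torch.nn as nn

T = 4
# Strictly upper-triangular -inf mask: query position i can attend
# only to key positions j <= i, so no future step leaks backwards.
causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(1, T, 8)
_, attn = mha(x, x, x, attn_mask=causal_mask)
# Every attention weight above the diagonal is forced to exactly zero.
print(attn[0])
```

The same mask shape is what `nn.TransformerDecoder` expects in its `tgt_mask` argument.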
Note that, unlike the LSTM, I make a subtle but important modification: the previously predicted wind speed is not fed into the transformer decoder. Experimental results show that this reduces error propagation. The side effect is that the transformer's predictions can become bumpier and less smooth.
Fortunately, the results of the LSTM and the transformer are complementary, giving a quite significant improvement when the two models are combined in an ensemble.
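Combining the two models can be as simple as averaging their per-image predictions. A minimal numpy sketch; the equal weights and the numbers here are purely illustrative, not the competition's:

```python
import numpy as np

# Hypothetical wind-speed predictions (knots) from the two models.
pred_lstm        = np.array([45.0, 52.0, 60.0, 71.0])  # smoother
pred_transformer = np.array([47.0, 50.0, 63.0, 69.0])  # bumpier

# Weighted average; 0.5/0.5 is a placeholder -- in practice the
# weights would be tuned on a validation split.
ensemble = 0.5 * pred_lstm + 0.5 * pred_transformer
print(ensemble)  # [46.  51.  61.5 70. ]
```

Because the two models make different kinds of errors (smooth drift vs. local bumpiness), the average tends to cancel part of both.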
The training procedure has three stages:
1. First, train the baseline CNN encoder using single images only.
2. Freeze the CNN encoder and train only the LSTM or transformer, decaying the learning rate from 0.001 to 0.0001. Here I use the RAdam optimizer with a Lookahead implementation.
3. Finally, unfreeze the CNN encoder and train end-to-end with a learning rate of 0.0001.
These freezing and unfreezing steps prevent the LSTM and transformer from overfitting.
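Stages 2 and 3 can be sketched as follows in PyTorch. The modules are tiny stand-ins for the real resnet-18d encoder and sequence head, and since the Lookahead wrapper lives in third-party packages (e.g. pytorch_optimizer) rather than torch itself, plain `torch.optim.RAdam` is used here:

```python
import torch
import torch.nn as nn

# Stand-ins for the real modules (resnet-18d encoder, sequence head).
cnn = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(8, 16))
head = nn.LSTM(16, 8, num_layers=2, batch_first=True)

def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

# Stage 2: freeze the CNN encoder and train only the sequence model,
# decaying the learning rate from 1e-3 toward 1e-4.
set_trainable(cnn, False)
opt = torch.optim.RAdam(head.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=10,
                                                   eta_min=1e-4)
# ... training loop for stage 2 would go here ...

# Stage 3: unfreeze everything and fine-tune end-to-end at the lower rate.
set_trainable(cnn, True)
opt = torch.optim.RAdam(list(cnn.parameters()) + list(head.parameters()),
                        lr=1e-4)
```

Rebuilding the optimizer at each stage keeps its state consistent with the set of parameters actually being updated.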
Finally, Figure 3 shows an example of the input images, the wind-speed history, and the predicted results.