Methods: Building on the work of Zhang et al. and Toni et al. on machine learning for wildfire detection, we implemented our CNN in TensorFlow. We scraped 202 images captured by the NASA MODIS sensors on the Aqua and Terra satellites, labelled each one as containing a wildfire or not, and mounted them to our Colab notebook for training. Our convolutional base is a stack of Conv2D and MaxPooling2D layers, with several dense layers added on top. This configuration reached a peak accuracy of 76.216%. Because our training dataset is built from only 202 satellite images, we find this accuracy somewhat encouraging, even though the dataset is less comprehensive than the one we had hoped to train on. To train a CNN that works specifically on satellite images, we used the following data augmentation techniques: horizontal and vertical flipping, rotation, cropping, adding Gaussian noise, and adjusting brightness. This expanded our base image set by a factor of 6.
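As a rough illustration of the architecture described above, the following is a minimal Keras sketch of a Conv2D/MaxPooling2D base with dense layers on top. The input size, number of layers, filter counts, optimizer, and the function name `build_cnn` are all assumptions made for this example; it shows the general shape of the model, not our exact tuned configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_cnn(input_shape=(128, 128, 3)):
    """Build a small binary classifier (wildfire vs. no wildfire).

    All sizes below are illustrative assumptions, not the tuned values.
    """
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Convolutional base: alternating Conv2D and MaxPooling2D layers
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        # Dense layers on top of the convolutional base
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        # Single sigmoid unit for the two-category problem
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_cnn().summary()` prints the layer stack, which is a quick way to check how the pooling layers shrink the feature maps before the dense layers.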
Results: We achieved an accuracy of 76.216% through our hyperparameter tuning, layer configuration, and data augmentation. Since our base dataset of 202 images contains only about 100 images in each of the two categories (wildfire and non-wildfire), we consider this a strong result for such a small training set. In addition, data augmentation alone gave us a 6% increase in accuracy. We feel this is significant and speaks to the strength of our model’s architecture and the techniques used, even though we did not reach our original goal of 80% accuracy.
Discussion: Our largest hurdle in the supervised portion of this project was data collection. We quickly realized we would need many more images than we had collected in order to train our model successfully. We explored several approaches to data collection, including open-source image databases for computer vision projects, CIFAR-10-style data libraries, and manually scraping the images ourselves. However, we still struggled to collect enough high-quality data to reach high accuracy.
We specifically wanted to train a CNN for satellite images, which made data collection harder still, as data sources are limited and often provide these images in difficult formats. The layout of the NASA website also made image scraping harder than a typical web-scraping task (for example, downloading a thousand images of lilies from Google Images).
Many neural networks have already been trained to detect fires in images with high accuracy, so we hope that our satellite-image-specific model will add something unique to these efforts. Because collecting our dataset was a significant portion of this battle, we will also be publishing our training and testing image sets on our public git repository for future work on this topic.
Data augmentation did, however, help considerably in expanding our dataset and increasing the model’s accuracy. Because our dataset was so small to begin with, we were never able to surpass 80% accuracy, as we had hoped to do for a balanced two-category classification problem. Even so, our methods did increase the accuracy of the model, and with more time we are confident we could have achieved higher accuracy by adding new data and applying our data augmentation techniques to it.
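To make the augmentation step more concrete, here is a minimal NumPy sketch of the six techniques listed in the Methods (horizontal flip, vertical flip, rotation, cropping, Gaussian noise, brightness adjustment). The function name `augment`, the 90-degree rotation, the 80% center crop, the noise level, and the brightness factor are assumptions chosen for illustration, as is the choice to produce one variant per technique.

```python
import numpy as np


def augment(image, rng):
    """Return six augmented variants of an HxWxC float image in [0, 1].

    Parameter values are illustrative assumptions, not tuned settings.
    """
    h, w = image.shape[:2]

    flipped_h = image[:, ::-1]   # horizontal flip (mirror left/right)
    flipped_v = image[::-1, :]   # vertical flip (mirror top/bottom)
    rotated = np.rot90(image)    # 90-degree rotation in the image plane

    # Center crop to 80% of each side; resizing back to the network's
    # input size is left to the data loader.
    ch, cw = int(h * 0.8), int(w * 0.8)
    top, left = (h - ch) // 2, (w - cw) // 2
    cropped = image[top:top + ch, left:left + cw]

    # Additive Gaussian noise, clipped back into the valid range
    noisy = np.clip(image + rng.normal(0.0, 0.05, image.shape), 0.0, 1.0)

    # Brightness adjustment by a constant factor, clipped to [0, 1]
    brighter = np.clip(image * 1.2, 0.0, 1.0)

    return [flipped_h, flipped_v, rotated, cropped, noisy, brighter]
```

A seeded generator such as `np.random.default_rng(0)` can be passed as `rng` so that the noisy variants are reproducible between runs.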