maxbeber/deep-water

Deep Water

This project aims to track changes in water level using satellite imagery and deep learning. During my studies, I worked on it with my friend Karl as our portfolio project at the Data Science Retreat, a three-month intensive in-person data science bootcamp in Berlin, Germany.

Table of Contents:

  1. Introduction
  2. Datasets
  3. Labeling
  4. Data Augmentation
  5. Metrics
  6. Baseline
  7. Model Optimization
  8. Model Results
  9. Dashboard
  10. Technical Stack
  11. Virtual Environment
  12. Next Steps

Introduction

This project was motivated by the article "Some of the World's Biggest Lakes Are Drying Up", published in the March 2018 edition of National Geographic magazine.

"Freshwater is the most important resource for mankind, cross-cutting all social, economic and environmental activities. It is a condition for all life on our planet, an enabling or limiting factor for any social and technological development, a possible source of welfare or misery, cooperation or conflict." (UNESCO)

The exponential growth of satellite-based information over the past four decades has provided unprecedented opportunities to improve water resource management.

Datasets

NWPU-RESISC45 is a publicly available benchmark dataset for Remote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). It contains 31,500 images covering 45 scene classes (including water classes), with 700 images in each class.

The second dataset is a time series of cloudless Sentinel-2 imagery covering 17 critically endangered lakes.

Labeling

The MakeSense online tool was used to label the images in both datasets. It only requires a web browser, which makes it an excellent choice for small computer vision projects: preparing the dataset becomes easier and faster.

Data Augmentation

The following techniques have been applied during training:

  • Height shift up to 30%;
  • Horizontal flip;
  • Rotation up to 45 degrees;
  • No shear;
  • Vertical flip;
  • Width shift up to 30%;
  • Zoom between 75% and 125%.
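The settings above can be written down as a configuration, shown here as a minimal sketch. The keyword names follow the Keras `ImageDataGenerator` arguments, which is an assumption: the README does not say which augmentation tool was used. The helper `random_flip_pair` is hypothetical and only illustrates a point that matters for segmentation: the image and its mask must receive the same transform.

```python
import numpy as np

# Assumed mapping of the listed settings onto Keras ImageDataGenerator keyword names.
AUGMENTATION = dict(
    rotation_range=45,          # rotation up to 45 degrees
    width_shift_range=0.3,      # width shift up to 30%
    height_shift_range=0.3,     # height shift up to 30%
    shear_range=0.0,            # no shear
    zoom_range=[0.75, 1.25],    # zoom between 75% and 125%
    horizontal_flip=True,
    vertical_flip=True,
)


def random_flip_pair(image, mask, rng):
    """Apply the same random horizontal/vertical flips to an image and its mask."""
    if rng.random() < 0.5:      # horizontal flip: reverse the column axis
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:      # vertical flip: reverse the row axis
        image, mask = image[::-1, :], mask[::-1, :]
    return image, mask
```

With a Keras generator, the usual way to keep images and masks aligned is to create two generators with identical settings and pass the same `seed` to both.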

Metrics

The following metrics have been used to evaluate the semantic segmentation model:

  • Jaccard Index
  • Dice Coefficient

More information about both of these metrics can be found here.
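As an illustration, both metrics can be computed on binary masks with a few lines of NumPy. This is a sketch, not the project's own implementation; the function names and the convention of returning 1.0 for two empty masks are assumptions.

```python
import numpy as np


def jaccard_index(y_true, y_pred):
    """Intersection over union of two binary masks."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    union = np.logical_or(y_true, y_pred).sum()
    if union == 0:              # both masks empty: treat as a perfect match
        return 1.0
    return float(np.logical_and(y_true, y_pred).sum() / union)


def dice_coefficient(y_true, y_pred):
    """Twice the overlap divided by the total number of positive pixels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    total = y_true.sum() + y_pred.sum()
    if total == 0:              # both masks empty: treat as a perfect match
        return 1.0
    return float(2 * np.logical_and(y_true, y_pred).sum() / total)


truth = [1, 1, 0, 0]
prediction = [1, 0, 1, 0]
print(round(jaccard_index(truth, prediction), 3))   # 0.333
print(dice_coefficient(truth, prediction))          # 0.5
```

The two metrics are monotonically related: Dice = 2J / (1 + J), so they always rank models in the same order, but Dice gives more weight to the overlap.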

Baseline

The baseline consists of a simple U-Net model architecture. This strategy allows us to modify the model and fine-tune it as necessary for our own purposes. By using this network architecture, we could spend more time understanding the optimization strategies.

Without Data Augmentation

Train/validation/test splits based on the RESISC45 dataset only:

  • training set: 489 images;
  • validation set: 140 images;
  • test set: 71 images.
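The three counts sum to 700, which is the number of images in a single NWPU-RESISC45 class. A split with these sizes could be produced as in the sketch below; the helper name, the fixed seed, and the use of a simple random permutation are assumptions, since the README does not describe how the split was drawn.

```python
import numpy as np


def split_indices(n_images, n_val, n_test, seed=42):
    """Shuffle image indices once and cut them into train/validation/test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_images)
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test


train, val, test = split_indices(700, n_val=140, n_test=71)
print(len(train), len(val), len(test))  # 489 140 71
```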

Model performance:

[Figure: Baseline results without image augmentation]

With Data Augmentation

Train/validation/test splits based on the RESISC45 dataset only:

  • training set: 979 images;
  • validation set: 280 images;
  • test set: 122 images.

Model performance:

[Figure: Baseline results using image augmentation]

The results show clearly that the baseline model overfits when trained with image augmentation.

Model Optimization

The following strategies have been explored:

  1. Using Early Stopping and adaptive learning rates;
  2. Using a bigger model (and dropout);
  3. Using regularization (Batch Normalization);
  4. Using residual connections;
  5. Dealing with class imbalance using dice loss;
  6. Refining label images using CRFs;
  7. Ensemble predictions.
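The first strategy in the list can be illustrated without any framework. The sketch below replays a list of validation losses and applies two rules: reduce the learning rate when the loss stops improving, and stop training when it has not improved for several epochs. The function name, the default patience values, and the reduction factor are illustrative assumptions, not the settings used in the project. In Keras, the `EarlyStopping` and `ReduceLROnPlateau` callbacks provide comparable behavior.

```python
def run_schedule(val_losses, lr=1e-3, lr_patience=2, lr_factor=0.5, stop_patience=4):
    """Replay validation losses; return the best loss and the (epoch, lr) history."""
    best = float("inf")
    epochs_since_best = 0
    history = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            # Adaptive learning rate: shrink it every `lr_patience` stale epochs.
            if epochs_since_best % lr_patience == 0:
                lr *= lr_factor
        history.append((epoch, lr))
        # Early stopping: give up after `stop_patience` epochs without improvement.
        if epochs_since_best >= stop_patience:
            break
    return best, history


losses = [1.0, 0.8, 0.9, 0.85, 0.7, 0.75, 0.8, 0.9, 0.95, 0.6]
best, history = run_schedule(losses)
print(best, len(history))  # 0.7 9
```

In this run, training stops after the ninth epoch, so the final loss of 0.6 is never reached. That is the usual trade-off of early stopping: the patience value decides how long a plateau is tolerated.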

Train/Validation/Test splits:

  • training set: 489 images from the RESISC45 dataset, randomly transformed at each epoch using one of the techniques described in the fourth section, Data Augmentation;
  • validation set: 211 images from the RESISC45 dataset;
  • test set: 359 images from the Sentinel-2 dataset.

Model performance using binary cross entropy as the loss function:

[Figure: Model optimization using binary cross entropy]

Model performance using dice loss as the loss function:

[Figure: Model optimization using dice loss]
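To see why dice loss helps with class imbalance, consider a mask in which only 1% of the pixels are water and a model that predicts almost no water anywhere. The NumPy sketch below compares the two losses on that case. The function names, the smoothing constant, and the toy data are assumptions for illustration; a training loop would use the framework's tensor operations instead.

```python
import numpy as np


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean per-pixel binary cross entropy."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))


def soft_dice_loss(y_true, y_prob, smooth=1.0):
    """One minus the Dice coefficient computed on predicted probabilities."""
    intersection = np.sum(y_true * y_prob)
    dice = (2 * intersection + smooth) / (np.sum(y_true) + np.sum(y_prob) + smooth)
    return float(1 - dice)


# Toy example: a 100x100 mask where only 1% of the pixels are water.
y_true = np.zeros((100, 100))
y_true[:10, :10] = 1.0
y_prob = np.full((100, 100), 0.01)   # the model predicts "almost no water" everywhere

print(binary_cross_entropy(y_true, y_prob) < 0.1)   # True: the loss looks small
print(soft_dice_loss(y_true, y_prob) > 0.9)         # True: the loss stays high
```

Binary cross entropy averages over all pixels, so the many correctly predicted background pixels dominate. Dice loss depends on the overlap with the water pixels and therefore keeps penalizing the missed lake.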

Model Results

The results below were measured on a test set of 182 images from the Sentinel-2 dataset.

Model 1: U-Net residual model trained without label correction:

[Figure: Model results: unet residual large dice]

Model 2: U-Net residual model trained with label correction using Conditional Random Fields:

[Figure: Model results: unet residual large dice]

Model 3: Ensemble model based on the two previous models:

[Figure: Model results: unet residual large dice]

The ensemble model has the highest accuracy (97.15%) and is the one used in the Dashboard application covered in the next section.
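The README does not state how the two models are combined. One common approach, shown here purely as an assumption, is to average the per-pixel water probabilities of both models and threshold the result. The helper names and the 0.5 threshold are hypothetical.

```python
import numpy as np


def ensemble_mask(prob_maps, threshold=0.5):
    """Average per-pixel probabilities from several models, then threshold."""
    mean_prob = np.mean(np.stack(prob_maps), axis=0)
    return mean_prob >= threshold


def pixel_accuracy(y_true, y_pred):
    """Fraction of pixels whose predicted class matches the label."""
    return float(np.mean(np.asarray(y_true, dtype=bool) == np.asarray(y_pred, dtype=bool)))


# Toy 2x2 probability maps from two hypothetical models.
model_1 = np.array([[0.9, 0.4], [0.2, 0.6]])
model_2 = np.array([[0.7, 0.8], [0.1, 0.2]])
truth = np.array([[1, 1], [0, 0]])

mask = ensemble_mask([model_1, model_2])
print(pixel_accuracy(truth, mask))  # 1.0
```

In this toy case each model alone misclassifies one pixel at the 0.5 threshold, while the averaged prediction gets all four right.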

Dashboard

The dashboard can be executed with the following command:

python app.py

A demo is available here.

Use Case 1: Lake Copais, Greece (2019)

[Figure: Use Case 1: Lake Copais]

Use Case 2: Lake Di Cancano, Italy (2019)

[Figure: Use Case 2: Lake Di Cancano]

Use Case 3: Lake Salda, Turkey (2016)

[Figure: Use Case 3: Lake Salda]

Technical Stack

The following libraries are required to create the virtual environment. The creation of the virtual environment is detailed in the next section.

  • Cython
  • Dash
  • Matplotlib
  • NumPy
  • Pillow
  • Plotly
  • Pydensecrf
  • Rasterio
  • Requests
  • TensorFlow 2.4

Virtual Environment

To set up your local environment, it is recommended to create a virtual environment using conda. Make sure you have it installed on your computer, then execute the command below:

conda env create -f environment.yml

The environment.yml file ensures that all dependencies will be downloaded.

After the environment is created, it is necessary to activate it as follows:

conda activate deep-water
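With the environment active, a short script can confirm that the libraries from the Technical Stack section are available. This is an optional sketch rather than part of the repository; the import names in `REQUIRED_MODULES` are the usual ones for these packages, and the function name is hypothetical.

```python
from importlib.util import find_spec

# Usual import names for the libraries listed in the Technical Stack section.
REQUIRED_MODULES = {
    "Cython": "Cython",
    "Dash": "dash",
    "Matplotlib": "matplotlib",
    "NumPy": "numpy",
    "Pillow": "PIL",
    "Plotly": "plotly",
    "Pydensecrf": "pydensecrf",
    "Rasterio": "rasterio",
    "Requests": "requests",
    "TensorFlow": "tensorflow",
}


def check_environment(modules=REQUIRED_MODULES):
    """Return a {library: is_installed} report without importing the packages."""
    return {name: find_spec(module) is not None for name, module in modules.items()}


if __name__ == "__main__":
    for name, installed in check_environment().items():
        print(f"{name:12s} {'ok' if installed else 'MISSING'}")
```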

The virtual environment can be deactivated with a single command:

conda deactivate

Next Steps

The topics below can be studied and analysed in the context of the project:

  • Apply post-processing techniques such as defrosting;
  • Collect satellite imagery with clouds;
  • Collect more data using the sentinelsat package;
  • Estimate the volume of a given water body.
