Deep Water

This project aims to track changes in water level using satellite imagery and deep learning. During my studies, I worked on it with my friend Karl as our portfolio project at the Data Science Retreat, a three-month, intensive, in-person data science bootcamp in Berlin, Germany.

Table of Contents:

Introduction
Datasets
Labeling
Data Augmentation
Metrics
Baseline
Model Optimization
Model Results
Dashboard
Technical Stack
Virtual Environment
Next Steps

Introduction

The motivation for this project is the article "Some of the World's Biggest Lakes Are Drying Up", published in the March 2018 edition of National Geographic magazine.

Freshwater is the most important resource for mankind, cross-cutting all social, economic and environmental activities. It is a condition for all life on our planet, an enabling limiting factor for any social and technological development, a possible source of welfare or misery, cooperation or conflict. (UNESCO)

The exponential growth of satellite-based information over the past four decades has provided unprecedented opportunities to improve water resource management.

Datasets

The NWPU-RESISC45 dataset (referred to below as Resic-45) is a publicly available benchmark for Remote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). It contains 31,500 images covering 45 scene classes (including water classes), with 700 images in each class.

The second dataset is a time series of cloudless Sentinel-2 imagery covering 17 critically endangered lakes:

Lake Poopo, Bolivia;
Lake Urmia, Iran;
Lake Mojave, USA;
Aral Sea, Kazakhstan;
Lake Copais, Greece;
Lake Ramganga, India;
Qinghai Lake, China;
Salton Sea, USA;
Lake Faguibine, Mali;
Mono Lake, USA;
Walker Lake, USA;
Lake Balaton, Hungary;
Lake Koroneia, Greece;
Lake Salda, Turkey;
Lake Burdur, Turkey;
Lake Mendocino, USA;
Elephant Butte Reservoir, USA.

Labeling

The images of both datasets were labeled with the MakeSense online tool. It only requires a web browser, which makes it an excellent choice for small computer vision and deep learning projects: preparing the dataset becomes easier and faster.

Data Augmentation

The following techniques have been applied during training:

Height shift up to 30%;
Horizontal flip;
Rotation up to 45 degrees;
No shear;
Vertical flip;
Width shift up to 30%;
Zoom between 75% and 125%.
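The ranges above can be read as a random parameter draw per image. The sketch below is only an illustration in plain Python: the function name `sample_augmentation`, the symmetric shift and rotation ranges, and the 50% flip probability are assumptions, not details taken from the project code. For segmentation, the same drawn parameters would normally be applied to both the image and its label mask.

```python
import random


def sample_augmentation(rng):
    """Draw one random set of augmentation parameters within the listed ranges."""
    return {
        # Shifts are fractions of the image size; both directions assumed.
        "height_shift": rng.uniform(-0.30, 0.30),
        "width_shift": rng.uniform(-0.30, 0.30),
        # Rotation in degrees; clockwise and counter-clockwise assumed.
        "rotation_deg": rng.uniform(-45.0, 45.0),
        # Shear is disabled.
        "shear": 0.0,
        # Zoom factor between 75% and 125%.
        "zoom": rng.uniform(0.75, 1.25),
        # Each flip is applied with an assumed probability of 50%.
        "horizontal_flip": rng.random() < 0.5,
        "vertical_flip": rng.random() < 0.5,
    }


rng = random.Random(42)  # fixed seed so the draw is reproducible
params = sample_augmentation(rng)
for name, value in params.items():
    print(name, value)
```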

Metrics

The following metrics have been used to evaluate the semantic segmentation model:

Jaccard Index
Dice Coefficient

More information about both of these metrics can be found here.
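As a quick illustration of how the two metrics relate, here is a minimal sketch in plain Python that works on flattened binary masks (1 = water, 0 = background). The function names and the convention of returning 1.0 when both masks are empty are assumptions made for this example; the project's own implementation may differ.

```python
def jaccard_index(y_true, y_pred):
    """Intersection over union of two binary masks given as flat 0/1 sequences."""
    intersection = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    union = sum(1 for t, p in zip(y_true, y_pred) if t or p)
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def dice_coefficient(y_true, y_pred):
    """Twice the overlap divided by the total number of positive pixels."""
    intersection = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    total = sum(y_true) + sum(y_pred)
    # Assumed convention: two empty masks count as a perfect match.
    return 2 * intersection / total if total else 1.0


truth = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]

print(jaccard_index(truth, pred))     # 2 overlapping pixels / 4 in the union = 0.5
print(dice_coefficient(truth, pred))  # 2 * 2 / (3 + 3), about 0.667
```

For any pair of masks, the two values are linked by Dice = 2J / (1 + J), so they rank models the same way; Dice simply weights the overlap more heavily.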

Baseline

The baseline is a simple U-Net architecture. Starting from a simple model allows us to modify and fine-tune it as needed, and leaves more time to understand the optimization strategies.

Without Data Augmentation

Train/Validation/Test splits based on Resic-45 dataset only:

training set: 489 images;
validation set: 140 images;
test set: 71 images.
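The three sizes above add up to 700 images. A split with fixed sizes could be produced along the following lines; this is a sketch only, and the helper `split_dataset`, the seed, and the `image_ids` placeholders are hypothetical rather than taken from the repository.

```python
import random


def split_dataset(items, n_train, n_val, n_test, seed=0):
    """Shuffle the items once and cut them into three disjoint subsets."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must add up to the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical identifiers standing in for the labeled images.
image_ids = [f"image_{i:03d}" for i in range(700)]
train, val, test = split_dataset(image_ids, n_train=489, n_val=140, n_test=71)
print(len(train), len(val), len(test))
```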

Model performance:

With Data Augmentation

Train/Validation/Test splits based on Resic-45 dataset only:

training set: 979 images;
validation set: 280 images;
test set: 122 images.

Model performance:

The baseline model clearly overfits when trained with image augmentation.

Model Optimization

The following strategies have been explored:

Using Early Stopping and adaptive learning rates;
Using a bigger model (and dropout);
Using regularization (Batch Normalization);
Using residual connections;
Dealing with class imbalance using dice loss;
Refining label images using CRFs;
Ensemble predictions.
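To illustrate why dice loss helps with class imbalance, here is a minimal sketch of a soft dice loss on flattened masks, written in plain Python rather than as a TensorFlow loss. The function name and the `smooth` term (added to avoid division by zero) are assumptions for this example and not necessarily what the project uses.

```python
def soft_dice_loss(y_true, y_prob, smooth=1.0):
    """1 minus the soft dice score between a 0/1 mask and predicted probabilities."""
    intersection = sum(t * p for t, p in zip(y_true, y_prob))
    denominator = sum(y_true) + sum(y_prob)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


# A mostly-background mask: only one of four pixels is water.
truth = [1, 0, 0, 0]

perfect = [1.0, 0.0, 0.0, 0.0]
all_background = [0.0, 0.0, 0.0, 0.0]

print(soft_dice_loss(truth, perfect))         # no loss for a perfect prediction
print(soft_dice_loss(truth, all_background))  # penalized despite 75% pixel accuracy
```

Predicting only background gets three of the four pixels right, yet the dice loss stays high because the water pixel is missed. That is the property that makes it attractive when water covers a small part of the image.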

Train/Validation/Test splits:

training set: 489 images from Resic-45 dataset, randomly transformed at each epoch using one of the techniques described in the Data Augmentation section;
validation set: 211 images from Resic-45 dataset;
test set: 359 images from Sentinel-2 dataset.

Model performance using binary cross entropy as the loss function:

Model performance using dice loss as the loss function:

Model Results

The results presented below are measured on a test set of 182 images from the Sentinel-2 dataset.

Model 1: U-Net residual model trained without label correction:

Model 2: U-Net residual model trained with label correction using Conditional Random Fields:

Model 3: Ensemble model based on the two previous models:

The ensemble model has the highest accuracy (97.15%) and is the one used in the Dashboard application covered in the next section.
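How the two models are combined is not described here. One common approach, shown below purely as an assumption, is to average the per-pixel water probabilities of both models and threshold the mean. The function names, the 0.5 threshold, and the toy values are all hypothetical.

```python
def ensemble_mask(prob_a, prob_b, threshold=0.5):
    """Average two per-pixel water probabilities and threshold the mean."""
    return [1 if (a + b) / 2.0 >= threshold else 0 for a, b in zip(prob_a, prob_b)]


def pixel_accuracy(y_true, y_pred):
    """Fraction of pixels where the predicted label matches the ground truth."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy probabilities for four pixels from two hypothetical models.
model_1 = [0.9, 0.4, 0.2, 0.7]
model_2 = [0.8, 0.7, 0.1, 0.2]
truth = [1, 1, 0, 0]

combined = ensemble_mask(model_1, model_2)
print(combined)
print(pixel_accuracy(truth, combined))
```

In this toy case each model alone misclassifies one pixel at the 0.5 threshold, while the averaged prediction gets all four right, which is the kind of error cancellation an ensemble relies on.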

Dashboard

The dashboard can be executed with the following command:

python app.py

A demo is available here.

Use Case 1: Lake Copais, Greece (2019)

Use Case 2: Lake Di Cancano, Italy (2019)

Use Case 3: Lake Salda, Turkey (2016)

Technical Stack

The following libraries are required. They are installed when the virtual environment is created, as detailed in the next section.

Cython
Dash
Matplotlib
NumPy
Pillow
Plotly
Pydensecrf
Rasterio
Requests
TensorFlow 2.4

Virtual Environment

To set up your local environment, it is recommended to create a virtual environment using conda. Make sure you have it installed on your computer and then execute the command below:

conda env create -f environment.yml

The environment.yml file ensures that all dependencies are downloaded.

After the environment is created, activate it as follows:

conda activate deep-water

The virtual environment can be deactivated with a single command:

conda deactivate

Next Steps

The topics below can be studied and analysed in the context of the project:

Apply post-processing techniques such as defrosting;
Collect satellite imagery with clouds;
Collect more data using the sentinelsat package;
Estimate the volume of a given water body.
