Air Quality Analysis

Jupyter notebooks and Python code for analyzing air quality (fine particles, PM2.5)

Table of contents

1. Basic data visualization
2. Correlation of PM2.5
2.1 Correlation of PM2.5 with time
2.2 Correlation of PM2.5 with wind and temperature (data cleaning)
2.2 Correlation with wind and temperature (analysis)
2.3 Correlation with MERRA-2 data
2.4 Conversion of wind (U, V) components, RH from temperatures
3.1 Data selection
3.2 Regression

Tools and packages
4. Credits

PDF versions of the notebooks are in the PDF folder; likewise, HTML versions are in the HTML folder.

1. Basic data visualization

  • introduce the basic folder setup and install pandas, matplotlib, and seaborn (using pip for Python packages). Anaconda is a good choice if you are using Windows (or even Mac or Linux). Alternatively, try out Google Colaboratory

  • basic use of those tools (clean, explore, plot, interpret)

  • work with a CSV file from

  • here are some graphs produced from this exercise

    • basic line chart

    • line chart with a band for standard deviation

    • pie chart with Air Quality Index (AQI)
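
The AQI pie chart needs each PM2.5 reading mapped to an AQI category first. A minimal sketch of that step follows, assuming the US EPA 24-hour PM2.5 breakpoints that were in use before the 2024 revision (worth checking against the current AirNow documentation). The names `AQI_CATEGORIES`, `aqi_category`, and `category_shares`, and the sample readings, are illustrative and not taken from the notebooks.

```python
from collections import Counter

# Upper bounds in µg/m³ for each category (assumed pre-2024 US EPA breakpoints).
AQI_CATEGORIES = [
    (12.0, "Good"),
    (35.4, "Moderate"),
    (55.4, "Unhealthy for Sensitive Groups"),
    (150.4, "Unhealthy"),
    (250.4, "Very Unhealthy"),
]


def aqi_category(pm25):
    """Return the AQI category label for a PM2.5 concentration in µg/m³."""
    for upper_bound, label in AQI_CATEGORIES:
        if pm25 <= upper_bound:
            return label
    return "Hazardous"


def category_shares(values):
    """Return the fraction of readings that fall into each AQI category."""
    counts = Counter(aqi_category(value) for value in values)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up readings, only to show the shape of the result.
shares = category_shares([8.0, 10.5, 30.0, 60.0])
# shares == {'Good': 0.5, 'Moderate': 0.25, 'Unhealthy': 0.25}
```

The resulting shares can be handed to a pie chart, for example `matplotlib.pyplot.pie(list(shares.values()), labels=list(shares.keys()))`.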

2. Correlation of PM2.5

2.1 Correlation of PM2.5 with time

  • continue to work with the .CSV file from AirNow.Gov to explore the correlation between PM2.5 and time such as:
    • peak-traffic hours vs. non-peak traffic hours
    • weekends vs. weekdays
    • variation across months
  • here are some graphs produced from this exercise

  • a summary graph of this dataset
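
The time comparisons listed above can be sketched with pandas `groupby`. The data below is synthetic (one week of hourly values built by formula), and the definition of peak-traffic hours in `PEAK_HOURS` is a hypothetical choice, not the one used in the notebooks.

```python
import pandas as pd

PEAK_HOURS = {7, 8, 9, 17, 18, 19}  # hypothetical definition of peak-traffic hours

# Synthetic hourly PM2.5 values for one week starting Monday 2018-01-01.
index = pd.date_range("2018-01-01", periods=7 * 24, freq=pd.Timedelta(hours=1))
values = [
    (50.0 if ts.hour in PEAK_HOURS else 30.0) - (10.0 if ts.dayofweek >= 5 else 0.0)
    for ts in index
]
df = pd.DataFrame({"pm25": values}, index=index)

# Label every hour by day type and by traffic period.
df["day_type"] = ["weekend" if d >= 5 else "weekday" for d in df.index.dayofweek]
df["traffic"] = ["peak" if h in PEAK_HOURS else "off-peak" for h in df.index.hour]

# Mean PM2.5 for each group.
by_day_type = df.groupby("day_type")["pm25"].mean()
by_traffic = df.groupby("traffic")["pm25"].mean()
by_month = df.groupby(df.index.month)["pm25"].mean()
```

With real AirNow data, the same grouping would follow after parsing the timestamp column into the index.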

2.2 Correlation of PM2.5 with wind and temperature (data cleaning)

  • explore the data source (specifically, working with archived meteorological data from NOAA.GOV)

  • clean the data, which is formatted in the Integrated Surface Data (ISD) style

  • use the windrose package to make a wind rose plot
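
As a rough illustration of the cleaning step, the sketch below parses two ISD fields. It assumes the CSV flavor of ISD, where `WND` and `TMP` are comma-separated strings with values scaled by 10 and `999`, `9999`, or `+9999` marking missing data; the README does not say which ISD file layout the notebooks use, so treat this as one possible approach. The function names are hypothetical.

```python
def parse_wnd(field):
    """Parse an ISD WND field such as '160,1,N,0046,1'.

    Returns (direction in degrees, speed in m/s); missing values become None.
    """
    direction, _dir_quality, _type_code, speed, _speed_quality = field.split(",")
    direction_deg = None if direction == "999" else int(direction)
    speed_ms = None if speed == "9999" else int(speed) / 10.0
    return direction_deg, speed_ms


def parse_tmp(field):
    """Parse an ISD TMP field such as '+0230,1' into degrees Celsius."""
    value, _quality = field.split(",")
    return None if value == "+9999" else int(value) / 10.0


wind = parse_wnd("160,1,N,0046,1")   # direction 160 degrees, speed 4.6 m/s
temperature = parse_tmp("+0230,1")   # 23.0 degrees Celsius
```

The cleaned direction and speed columns are the two inputs a wind rose plot needs.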

2.2 Correlation with wind and temperature (analysis)

  • explore the correlation between meteorological parameters (such as wind, temperature, and height above ground) and the observed PM2.5 concentration

  • capture episodes and examine the relevant inputs alongside PM2.5

  • some examples from this exercise

    • correlation graph:

    • what method in *that* correlation?

    • a high PM2.5 and a cloudy day

    • or, I want to see other inputs such as..
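
The question of which method sits behind a correlation value matters because Pearson measures linear association while Spearman measures monotonic association. The dependency-free sketch below contrasts the two on made-up values where one variable decays exponentially with the other; the variable names and data are illustrative only and do not come from the notebooks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up example: a quantity that decays exponentially as wind speed rises.
wind_speed = list(range(1, 11))
pm25_like = [math.exp(-v) for v in wind_speed]

pearson_value = pearson(wind_speed, pm25_like)
spearman_value = spearman(wind_speed, pm25_like)
```

Here the relationship is perfectly monotonic, so Spearman reaches -1 while Pearson stays noticeably weaker. In pandas, `DataFrame.corr` takes a `method` argument to switch between such methods.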

2.3 Correlation with MERRA-2 data

  • work with MERRA-2 reanalysis data from NASA

  • find the correlation between the main variable groups (single level, surface turbulent flux, aerosol mixing ratio) and PM2.5

  • here are the three summary graphs:

    • single-level diagnostics

    • surface turbulent flux

    • aerosol mixing ratio

2.4 Conversion of wind (U, V) components, RH from temperatures

  • a detour to look at converting wind (U, V) vector components to speed and direction in degrees

  • how to use the MetPy package to calculate such conversions instead of doing them manually

  • explore data for the next step, which is selecting relevant data for predicting PM2.5

  • some graph examples:

    • relation of height (to the ground) vs. pressure

    • compare values from different sources (such as from observed station, a public API, or reanalysis product)

    • correlation of wind speed at different altitudes with PM2.5 concentration
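
For reference, the manual version of both conversions fits in a few lines. This is a sketch under stated assumptions: direction follows the meteorological convention (the direction the wind blows from, 0° = north, 90° = east), and relative humidity uses the Magnus approximation with the coefficients 17.625 and 243.04 °C, which may differ slightly from the formula MetPy applies. The function names are hypothetical.

```python
import math


def wind_speed_direction(u, v):
    """Convert eastward (u) and northward (v) wind components in m/s to
    (speed in m/s, direction in degrees the wind is blowing from)."""
    speed = math.hypot(u, v)
    if speed == 0:
        return 0.0, 0.0  # calm: direction is undefined, reported here as 0
    direction = math.degrees(math.atan2(-u, -v)) % 360
    return speed, direction


def _magnus(temp_c):
    """Exponential term of the Magnus approximation (assumed coefficients)."""
    return math.exp(17.625 * temp_c / (243.04 + temp_c))


def relative_humidity(temp_c, dewpoint_c):
    """Approximate relative humidity in percent from air temperature and
    dew point temperature, both in degrees Celsius."""
    return 100.0 * _magnus(dewpoint_c) / _magnus(temp_c)


speed, direction = wind_speed_direction(3.0, 3.0)  # wind blowing toward the northeast
rh = relative_humidity(20.0, 10.0)
```

MetPy provides `metpy.calc.wind_speed`, `metpy.calc.wind_direction`, and `metpy.calc.relative_humidity_from_dewpoint` for the same tasks on unit-aware arrays, which avoids sign and unit mistakes.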

3.1 Data selection

  • combine three sources of data from the previous exercises
    • PM2.5 from
    • Ground observed data from
    • Reanalysis data from MERRA-2 product, SLV and FLX groups (or tags)
  • remove dependent data and data with weak (or very weak) correlation with PM2.5
  • here is the outcome of this exercise:
    • preliminary heatmap (of almost all input parameters; don't worry about the names just yet):

    • a final version of the selected data and their correlation with PM2.5

    • and if you are curious about the full name of each parameter, here it is. Note that in the final version of the CSV data, all temperatures were converted from Kelvin (K) to degrees Celsius (°C).

    {'TQV': 'total_precipitable_water_vapor, kg m-2',
    'T2MDEW': 'dew_point_temperature_at_2_m, K',
    'HLML': 'surface_layer_height, m',
    'FRCAN': 'areal_fraction_of_anvil_showers, 1',
    'T2M': '2-meter_air_temperature, K',
    'WS': 'observed ground wind speed, m/s',
    'DISPH': 'zero_plane_displacement_height, m',
    'TQL': 'total_precipitable_liquid_water, kg m-2',
    'v_50m': 'wind speed at 50m, m/s',
    'v_850': 'wind speed at 850hPa (~1450m)',
    'v_2m': 'wind speed at 2m, m/s',
    'CLDCR': 'cloud cover, 1',
    'CIG': 'ceiling height dimension, m',
    'PS': 'surface_pressure, Pa',
    'RHOA': 'air_density_at_surface, kg m-3',
    'H1000': 'height_at_1000_mb, m'}
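
The selection rule described in this section (drop inputs that barely correlate with PM2.5, and drop inputs that duplicate one already kept) can be sketched with a pandas correlation matrix. The thresholds `min_target_corr` and `max_pair_corr`, the function name, and the synthetic columns below are assumptions made for illustration; the notebooks may have selected features by inspecting the heatmap instead.

```python
import numpy as np
import pandas as pd


def select_features(df, target, min_target_corr=0.2, max_pair_corr=0.9):
    """Keep features that correlate with the target and are not near-duplicates
    of a feature that was already kept (Pearson correlation)."""
    corr = df.corr()
    target_corr = corr[target].drop(target).abs().sort_values(ascending=False)
    selected = []
    for name, strength in target_corr.items():
        if strength < min_target_corr:
            continue  # too weakly related to the target
        if any(abs(corr.loc[name, kept]) > max_pair_corr for kept in selected):
            continue  # nearly the same information as a kept feature
        selected.append(name)
    return selected


# Synthetic columns: T2MDEW is built to almost duplicate T2M, and "noise" is
# built to be unrelated to the target. These are not real measurements.
x = np.arange(200)
t2m = np.sin(0.1 * x)
ws = np.cos(0.37 * x)
data = pd.DataFrame({
    "T2M": t2m,
    "T2MDEW": t2m + 0.01 * np.cos(1.3 * x),
    "WS": ws,
    "noise": np.sin(2.1 * x),
    "pm25": 2.0 * t2m - 3.0 * ws,
})

selected = select_features(data, target="pm25")
```

On this synthetic table the function keeps `WS` plus one of the two temperature columns and drops `noise`.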

3.2 Regression

  • Work with the scikit-learn library and regression models such as Linear, DecisionTree, and RandomForest

  • Evaluate the performance of each model and of an ensemble using PM2.5 and meteorological data for Hanoi, 2018. The datasets were cleaned and reduced in the previous exercise

  • Apply a model with fewer features (DarkSky) that are easier to extract via an API.

  • Graphs from this exercise:

    • performance on the training dataset (using ensemble regression)

    • performance on test dataset

    • relative standard deviation on each model (lower is better)

  • an hourly updated web interface that uses the same concept for selected sites can be found on my personal website

    • screenshot example:
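
A comparison along these lines can be sketched with scikit-learn. The block below trains the three model types named above plus an averaging ensemble (`VotingRegressor`) on synthetic data and scores each with RMSE on a held-out split. The data, hyperparameters, and the choice of RMSE and `VotingRegressor` are assumptions for illustration; the notebooks may build the ensemble and measure performance differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cleaned dataset: three features, linear signal plus noise.
rng = np.random.default_rng(42)
n_samples = 500
X = rng.normal(size=(n_samples, 3))
y = 40.0 + 8.0 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(scale=2.0, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
# The ensemble averages the predictions of the three base models.
models["ensemble"] = VotingRegressor(list(models.items()))

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))
```

Scoring on the held-out split rather than the training split is what makes the comparison between models meaningful, since tree-based models can fit training data almost perfectly.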

Tools and packages

  • the analysis was carried out in Jupyter Notebook (and later Jupyter Lab 2.2) on Ubuntu 18.04 LTS.
  • Python (3.6.9)
  • Matplotlib (3.1.2)
  • pandas (1.1.0)
  • Seaborn (0.9.0)
  • windrose (N/A)
  • MetPy (1.0.0)
  • scikit-learn (sklearn - 0.22.1)
  • scipy (1.4.1)


If this work is helpful to your research

  • Admittedly, citing a GitHub repository or other open project is still new, but if this work is helpful to yours, I would appreciate an attribution, a link, or a mention.
  • To cite this work, use: Binh Nguyen, Air Quality Analysis, GitHub repository:


Keras (with TensorFlow)

  • the experiment with LSTM has not yet produced promising results.