The code and data for PowerGenome are under active development and some changes may break existing functions. Keep up to date with major code and data releases by joining PowerGenome on groups.io. And check out the growing documentation on the Wiki for helpful background information.
Power system optimization models can be used to explore the cost and emission implications of different regulations in future energy systems. One of the most difficult parts of running these models is assembling all the data. A typical model will define several regions, each of which needs data such as:
Because computational complexity and run times increase as the number of regions and generating unit clusters increases, a user might want to disaggregate only the regions and generating units close to the primary region of interest. For example, a study focused on clean electricity regulations in New Mexico might combine several states in the Pacific Northwest into a single region while also splitting Arizona combined cycle units into multiple clusters.
The goal of PowerGenome is to let a user make all of these choices in a settings file and then run a single script that generates input files for the power system model. PowerGenome currently generates input files for GenX, and we hope to expand to other models in the near future.
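As a rough illustration of the aggregation idea described above, the sketch below sums demand from several base regions into a smaller set of model regions. The mapping, region names, demand values, and the `aggregate_demand` helper are all hypothetical and are not PowerGenome settings or functions.

```python
# Hypothetical mapping from model regions to the base regions they combine.
# Region names are placeholders, not PowerGenome or IPM identifiers.
REGION_AGGREGATIONS = {
    "pacific_nw": ["WA", "OR", "ID"],
    "new_mexico": ["NM"],
    "arizona": ["AZ"],
}


def aggregate_demand(base_demand, aggregations):
    """Sum base-region demand into each aggregated model region."""
    return {
        model_region: sum(base_demand[region] for region in members)
        for model_region, members in aggregations.items()
    }


# Made-up demand values, used only to show the mechanics.
demand = {"WA": 90.0, "OR": 50.0, "ID": 25.0, "NM": 25.0, "AZ": 80.0}
model_demand = aggregate_demand(demand, REGION_AGGREGATIONS)
```

Fewer model regions mean a smaller optimization problem, at the cost of losing detail within each combined region.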
PowerGenome uses data from a number of different sources, including EIA, NREL, and EPA. The data are accessed through a combination of sqlite databases, CSV files, and parquet data files. All data files are available here.
An additional sqlite database (pg_misc_tables_efs.sqlite) has tables with new resource costs from NREL ATB, transmission constraints between IPM regions from EIA, and hourly demand within each IPM region derived from NREL or FERC data.
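One quick way to see what a downloaded sqlite database contains is to list its tables with Python's built-in `sqlite3` module. This is a minimal sketch; the `list_tables` helper is hypothetical and is not part of PowerGenome.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in a sqlite database, sorted by name."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist,
    # so double-check the path before calling this.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `list_tables("YOUR_PATH_HERE/pg_misc_tables_efs.sqlite")` would return the table names in the additional PowerGenome database.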
This project pulls data from PUDL. As such, it requires installation of PUDL to access a normalized sqlite database and some of PUDL's convenience functions. The catalystcoop.pudl package is included in the environment.yml file and will be installed automatically in the conda environment (see instructions below). Catalyst Cooperative will be creating versioned data releases of PUDL, which can be accessed on Zenodo. Download the zip file from Zenodo, unzip it, and find the sqlite database under pudl_data/sqlite/pudl.sqlite. Note that the version of the catalystcoop.pudl software you need depends on the database version you use. Look on the right-hand side of the Zenodo archive to see what software version was used to compile the data. If the version in your conda environment does not match the version used to compile the data, you can change it in the environment.yml file or install a different version using mamba install catalystcoop.pudl=<your_version>.
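The version check described above can be scripted with the standard library. The sketch below compares the installed catalystcoop.pudl version with the one listed on the Zenodo archive. The `pudl_version_matches` helper is hypothetical, and it assumes that a leading "v" in the archive's version label can be ignored.

```python
from importlib.metadata import PackageNotFoundError, version


def pudl_version_matches(expected, package="catalystcoop.pudl", lookup=version):
    """Return True if the installed version of `package` equals `expected`.

    A leading "v" (as in "v2022.11.30") is ignored on both sides.
    Returns False if the package is not installed.
    """
    try:
        installed = lookup(package)
    except PackageNotFoundError:
        return False
    return installed.strip().lstrip("vV") == expected.strip().lstrip("vV")
```

Inside the powergenome environment, `pudl_version_matches("v2022.11.30")` would report whether the installed software matches that data release.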
Clone this repository to your local machine and navigate to the top level (PowerGenome) folder.
Create a conda environment named powergenome using the provided environment.yml file. If you don't already use conda, it is easiest to download and install Mambaforge, which will install conda with mamba in the base environment. See this description for more information on the different ways to install conda and mamba. Conda usually fails to resolve the dependencies in under a day, so I highly recommend that you either start with Mambaforge or install mamba in your base environment and use it instead.
mamba env create -f environment.yml
conda activate powergenome
pip install -e .
Download the PUDL database from Zenodo or the PowerGenome data repository, unzip it, and copy /pudl_data/sqlite/pudl.sqlite to wherever you would like to store PowerGenome data on your computer. The zip file contains other data sets that aren't needed for PowerGenome and can be deleted. Note that as of May 2023 the most recent version of this database (v2022.11.30) is compatible with catalystcoop.pudl version v2022.11.30 and may not work if an earlier software version is included in your conda environment.
Download the additional PowerGenome database from the PowerGenome data repository. It includes NREL ATB cost data, transmission constraints between IPM regions, and hourly demand for each IPM region. Hourly demand is based on a 2012 weather year and was constructed either directly from FERC 714 data (load_curves_ferc) or from NREL EFS data (load_curves_nrel_efs) that also sources back to FERC 714. The NREL load curves, which separate hourly demand by sector and subsector, are now the default source for load curves in PowerGenome. See the wiki for more information. These files will eventually be provided through a data repository with citation information.
Download the appropriate renewable resource data files from the PowerGenome data repository. There is a single set of generation profiles and resource group folders specific to different regional aggregations. Read through the included README for more background. This folder contains:
generation_profiles can be saved in a single place and used across multiple studies.
resource_groups has CSV files that tell PowerGenome the metro that each potential wind/solar site will deliver power to based on a set of regional aggregations. Use the corresponding regional aggregations in your settings file. You can request new resource group files for different regional aggregations on the PowerGenome repository discussion page.
Download data files derived from NREL's EFS from the PowerGenome data repository. These provide hourly demand profiles for growing electrification technologies like electric vehicles and heat pumps and are used to both build up demand profiles in the future and create flexible demand resources that can shift their load.
Download distributed generation profiles from the PowerGenome data repository compiled from NREL Cambium 2022 scenarios.
Create the file PowerGenome/powergenome/.env. In this file, add:
PUDL_DB=YOUR_PATH_HERE (your path to the PUDL database downloaded in step 5)
PG_DB=YOUR_PATH_HERE (your path to the additional PowerGenome data downloaded in step 6)
RESOURCE_GROUP_PROFILES=YOUR_PATH_HERE (your path to the folder with hourly wind/solar generation parquet files)
EFS_DATA=YOUR_PATH_HERE (your path to the folder with EFS derived data files)
DISTRIBUTED_GEN_DATA=YOUR_PATH_HERE (your path to the folder with distributed generation profiles)
RESOURCE_GROUPS=YOUR_PATH_HERE (your path to the resource groups data for a project -- this can be included in your settings file instead of the .env file)
Quotation marks are only needed if your values contain spaces. The .env file is included in .gitignore and will not be synced with the repository.
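Before running anything, it can help to confirm that every path in the .env file points to something that exists. The sketch below is one possible check using only the standard library. The `read_env_file` and `missing_paths` helpers are hypothetical, and the parsing shown here is an assumption for illustration, not a description of how PowerGenome itself reads the file.

```python
from pathlib import Path


def read_env_file(path):
    """Parse KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotation marks, which are used for paths with spaces.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_paths(env):
    """Return the keys whose values do not point to an existing file or folder."""
    return [key for key, value in env.items() if not Path(value).exists()]
```

Calling `missing_paths(read_env_file("PowerGenome/powergenome/.env"))` returns a list of the parameters that still need a valid path. An empty list means every path was found.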
Installing PowerGenome with pip has only been tested within a conda environment, but it should work in other environment management systems. Make sure that you have an updated version of pip installed. If you hit dependency errors, I suggest trying to install the missing packages using mamba or conda. PowerGenome has catalystcoop.pudl as a dependency, which has a large number of its own dependencies. I have not (yet) had to install catalystcoop.pudl using mamba, but doing so may help clear up errors.
Depending on your operating system you might also have issues installing some other packages from pip. The example code below is what works for me on a Mac, where python-snappy fails to build wheels.
(base) conda create -n powergenome python=3.10 pip python-snappy
(base) conda activate powergenome
(powergenome) pip install powergenome
PowerGenome has been submitted to conda-forge but is not yet available.
If you are installing a packaged version of PowerGenome you won't be able to easily use a .env file. Instead, add the environment parameters (PG_DB, etc) to a YAML file in the same folder as the rest of your settings. It doesn't really matter which file these parameters are included in, but creating a new file such as env_params.yml will help keep them separate from other settings parameters that might be shared with other PowerGenome users.
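A minimal sketch of creating such a file from Python is shown below. It assumes the parameters are stored as a flat mapping of names to paths. The `write_env_params` helper and the example paths are hypothetical. Values are written with `json.dumps` because a JSON string is also a valid double-quoted YAML scalar, which keeps paths with spaces intact.

```python
import json
from pathlib import Path


def write_env_params(params, path="env_params.yml"):
    """Write a flat mapping of parameter names to paths as simple YAML lines."""
    lines = [f"{key}: {json.dumps(str(value))}" for key, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines
```

For example, `write_env_params({"PG_DB": "YOUR_PATH_HERE", "PUDL_DB": "YOUR_PATH_HERE"}, "settings/env_params.yml")` writes one quoted entry per parameter, provided the settings folder already exists.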
It is best practice to set up project folders outside of the cloned repository so that git doesn't track any new/changed files within the upper-level PowerGenome folder. Try copying one of the example systems (settings file and extra inputs) and modifying it. Copy the notebooks folder into your project folder, change the path to the settings file as needed, and run code in the notebooks. This can also be a good way to learn how data are created in PowerGenome and to debug problems.
Keeping project folders separate from the cloned PowerGenome folder will also make it easier to pull changes as they are released.
A few example systems are included under PowerGenome/example_systems. Each system has settings files in a folder (settings) and a folder with extra user inputs (extra_inputs). The different example systems are not meant to be accurate for real-world analysis, so please do not blindly use the external data files included with them in your own studies!
Settings are controlled in a set of YAML files within a folder or combined into a single file. An example folder of settings files (settings) and a folder with extra user inputs (extra_inputs) are included in each of the example systems. Scenario options across different planning years are defined in the file test_scenario_inputs.csv. Documentation on extra inputs is included in the folder of each example system.
A series of example notebooks included in PowerGenome/notebooks describe how to access different functions within PowerGenome to create resource clusters, variable generation profiles, fuel costs, hourly demand, and transmission constraints. They include a description of how the data are compiled and the settings parameters that are required for each type of data.
The outputs are all formatted for GenX. We hope to make the data formatting code more modular to allow users to easily switch between outputs for different power system models.
Functions from each module can be imported and used in an interactive environment (e.g. JupyterLab). Examples of how to load data in this way are included in PowerGenome/notebooks. To run from the command line, navigate to a project folder that contains a settings file and extra inputs (e.g. myproject/powergenome), activate the powergenome conda environment, and use the command run_powergenome_multiple with flags for the settings file name and where the results should be saved. Since the powergenome package is installed in the powergenome conda environment, you can run the command line function from anywhere on your computer (not just within the cloned PowerGenome folder).
run_powergenome_multiple --settings_file settings --results_folder test_system
The command line arguments --settings_file and --results_folder can be shortened to -sf and -rf respectively. For all options, run:
run_powergenome_multiple --help
A folder with extra user inputs is required when using the run_powergenome_multiple command. The name of this folder is defined in the settings YAML file with the input_folder parameter. Look at the files in each example system for test cases to follow.
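If you prefer to launch runs from a script, the command line call described above can be wrapped with Python's `subprocess` module. This is a sketch only. The `build_command` and `run_powergenome` helpers are hypothetical, and the code assumes the powergenome conda environment is already active so that run_powergenome_multiple is on the PATH.

```python
import subprocess


def build_command(settings_file, results_folder):
    """Build the argument list for the run_powergenome_multiple command."""
    return [
        "run_powergenome_multiple",
        "--settings_file", settings_file,
        "--results_folder", results_folder,
    ]


def run_powergenome(settings_file, results_folder, project_dir="."):
    """Run the command from inside a project folder.

    check=True raises CalledProcessError if the command exits with an error.
    """
    return subprocess.run(
        build_command(settings_file, results_folder),
        cwd=project_dir,
        check=True,
    )
```

Calling `run_powergenome("settings", "test_system", project_dir="myproject/powergenome")` is then equivalent to running the command shown earlier from within that project folder.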
If you have previously installed PowerGenome and the run_powergenome_multiple command doesn't work, try reinstalling it using pip install -e . as described above. If you downloaded the custom PUDL database before May of 2020, some errors may be resolved by downloading a new version.
PowerGenome is released under the MIT License. Most data inputs are from US government sources (EIA, EPA, FERC, etc), which should not be subject to copyright in the US. Hourly FERC demand data has been cleaned using techniques developed by Tyler Ruggles and David Farnham, and allocated to IPM regions using methods developed by Catalyst Cooperative. Hourly generation profiles for wind and solar resources were created by Vibrant Clean Energy and provided without usage restrictions. All PowerGenome data outputs are released under the CC-BY-4.0 license.
Contributions are welcome! There is significant work to do on this project and additional perspective on user needs will help make it better. If you see something that needs to be improved, open an issue. If you have questions or need assistance, join PowerGenome on groups.io and post a message there.
Pull requests are always welcome. To start modifying/adding code, make a fork of this repository, create a new branch, and submit a pull request.
All code added to the project should be formatted with black. After making a fork and cloning it to your own computer, run pre-commit install to install the git hook scripts that will run every time you make a commit. These hooks will automatically run black (in case you forgot), fix trailing whitespace, check yaml formatting, etc.