|Project Name||Stars||Downloads||Repos Using This||Packages Using This||Most Recent Commit||Total Releases||Latest Release||Open Issues||License||Language|
|Q||9,962||2 months ago||1||February 27, 2018||107||gpl-3.0||Python|
|q - Run SQL directly on delimited files and multi-file sqlite databases|
|Usql||8,341||2||13||4 days ago||162||December 03, 2023||69||mit||Go|
|Universal command-line interface for SQL databases|
|Visidata||6,976||5||9||2 days ago||52||July 17, 2023||16||gpl-3.0||Python|
|A terminal spreadsheet multitool for discovering and arranging data|
|Mergestat Lite||3,372||2||9 days ago||36||May 15, 2023||40||mit||Go|
|Query git repositories with SQL. Generate reports, perform status checks, analyze codebases. 🔍 📊|
|Sqlcheck||2,077||2 years ago||18||September 01, 2020||6||apache-2.0||C++|
|Automatically identify anti-patterns in SQL queries|
|Trdsql||1,698||7||18 days ago||65||November 10, 2023||15||mit||Go|
|CLI tool that can execute SQL queries on CSV, LTSV, JSON, YAML and TBLN. Can output to various formats.|
|Manage Fastapi||1,494||a month ago||14||September 02, 2021||20||mit||Python|
|:rocket: CLI tool for FastAPI. Generating new FastAPI projects & boilerplates made easy.|
|Sqlite Utils||1,348||12||186||a month ago||122||November 04, 2023||88||apache-2.0||Python|
|Python CLI utility and library for manipulating SQLite databases|
|Zsh Histdb||1,057||a year ago||33||mit||Shell|
|A slightly better history for zsh|
|Qsv||883||4 hours ago||148||November 20, 2023||25||unlicense||Rust|
|CSVs sliced, diced & analyzed.|
The PUDL Project is an open source data processing pipeline that makes US energy data easier to access and use programmatically.
Hundreds of gigabytes of valuable data are published by US government agencies, but the data are often difficult to work with. PUDL takes the original spreadsheets, CSV files, and databases and turns them into a unified resource. This allows users to spend more time on novel analysis and less time on data preparation.
PUDL currently integrates data from:
Thanks to support from the Alfred P. Sloan Foundation Energy & Environment Program, from 2021 to 2024 we will be integrating the following data as well:
The project is focused on serving researchers, activists, journalists, policy makers, and small businesses that might not otherwise be able to afford access to this data from commercial sources and who may not have the time or expertise to do all the data processing themselves from scratch.
We want to make this data accessible and easy to work with for as wide an audience as possible: anyone from grassroots youth climate organizers working with Google Sheets to university researchers with access to scalable cloud computing resources, and everyone in between!
There are several ways to access PUDL outputs. For more details you'll want to check out the complete documentation, but here's a quick overview:
We publish much of the data on https://data.catalyst.coop using a tool called Datasette, which lets us wrap our databases in a relatively friendly web interface. You can browse and query the data, make simple charts and maps, and download portions of the data as CSV or JSON files so you can work with it locally. For a quick introduction to what you can do with the Datasette interface, check out this 17-minute video.
This access mode is good for casual data explorers or anyone who just wants to grab a small subset of the data. It also lets you share links to a particular subset of the data and provides a REST API for querying the data from other applications.
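As a rough illustration of the REST API mentioned above, the sketch below builds a Datasette JSON query URL and defines a small helper to fetch the rows. The `/<database>.json?sql=...` endpoint and the `_shape=array` option are standard Datasette features, but the database name `pudl` and the table name `example_table` are assumptions made for illustration; browse the site to find the real database and table names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def datasette_query_url(base_url, database, sql, shape="array"):
    """Build a Datasette JSON API URL that runs a read-only SQL query."""
    query = urlencode({"sql": sql, "_shape": shape})
    return f"{base_url.rstrip('/')}/{database}.json?{query}"


def fetch_rows(url, timeout=30):
    """Fetch the query result; with _shape=array this is a list of row dicts."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# "pudl" and "example_table" are hypothetical names, not confirmed by the text.
url = datasette_query_url(
    "https://data.catalyst.coop",
    "pudl",
    "select * from example_table limit 5",
)
print(url)

# Uncomment to run the query over the network:
# rows = fetch_rows(url)
```

The same URL can be pasted into a browser or shared as a link, which is what makes this access mode convenient for small subsets of the data.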
Want access to all the published data in bulk? If you're familiar with Python and Jupyter Notebooks and are willing to install Docker you can:
The PUDL Examples repository has more detailed instructions on how to work with the Zenodo data archive and Docker image.
If you're more familiar with the Python data science stack and are comfortable working with conda environments and the Unix command line, then you can set up the whole PUDL Development Environment on your own computer. This will allow you to run the full data processing pipeline yourself, tweak the underlying source code, and (we hope!) make contributions back to the project.
If you are less concerned with reproducibility and want the freshest possible data, we automatically upload the outputs of our nightly builds to public S3 storage buckets as part of the AWS Open Data Registry. This data is based on the dev branch of PUDL and is updated most weekday mornings. It is also the data used to populate Datasette.
The nightly build outputs can be accessed using the AWS CLI, the S3 API, or downloaded directly via the web. See Accessing Nightly Builds for links to the individual SQLite, JSON, and Apache Parquet outputs.
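Once one of the SQLite outputs has been downloaded, a first step is often to see which tables it contains. The sketch below uses only Python's standard `sqlite3` module. Because the real file names and table names are not given here, it runs against a small stand-in database created on the fly; the table names `plants` and `generators` are placeholders, not actual PUDL tables.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file, sorted."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Build a tiny stand-in database so the example runs without any download.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.sqlite")
    with closing(sqlite3.connect(demo_path)) as conn:
        conn.execute("CREATE TABLE plants (plant_id INTEGER, name TEXT)")
        conn.execute("CREATE TABLE generators (generator_id TEXT)")
        conn.commit()
    tables = list_tables(demo_path)

print(tables)
```

To inspect a real nightly build output, pass the path of the downloaded SQLite file to `list_tables` instead of the stand-in.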
Find PUDL useful? Want to help make it better? There are lots of ways to help!
In general, our code, data, and other work are permissively licensed for use by anybody, for any purpose, so long as you give us credit for the work we've done.
Catalyst Cooperative is a small group of data wranglers and policy wonks organized as a worker-owned cooperative consultancy. Our goal is a more just, livable, and sustainable world. We integrate public data and perform custom analyses to inform public policy (Hire us!). Our focus is primarily on mitigating climate change and improving electric utility regulation in the United States.