Git2pg

Git to PostgreSQL repository migration.

git2pg

Migrate git repositories to a PostgreSQL database.

Install

Using go get:

go get github.com/erizocosmico/git2pg/cmd/git2pg/...

Or build the binary by hand:

# at the repository root folder
go build -o git2pg ./cmd/git2pg/main.go

When the project is more stable, a pre-built binary will be provided on the releases page.

Usage

git2pg reads the database details from environment variables and takes command line flags to control the rest of its behavior.

Environment variables

  • DBHOST: PostgreSQL database host, 127.0.0.1 by default.
  • DBPORT: PostgreSQL database port, 5432 by default.
  • DBUSER: PostgreSQL database user, postgres by default.
  • DBPASS: PostgreSQL database password, empty by default.
  • DBNAME: PostgreSQL database name, postgres by default.
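As an illustration of how these variables and their defaults fit together, here is a minimal Go sketch that turns them into a PostgreSQL connection URL. It is not taken from the git2pg source: the helper names `envOr` and `dsn`, and the choice of a `postgres://` URL rather than a key/value connection string, are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"os"
)

// envOr returns the value that getenv reports for key, or fallback when the
// variable is unset or empty. (Hypothetical helper, not part of git2pg.)
func envOr(getenv func(string) string, key, fallback string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return fallback
}

// dsn builds a postgres:// URL from the DB* variables, using the defaults
// listed in the README. getenv is injected so the function is easy to test.
func dsn(getenv func(string) string) string {
	u := url.URL{
		Scheme: "postgres",
		User: url.UserPassword(
			envOr(getenv, "DBUSER", "postgres"),
			envOr(getenv, "DBPASS", ""),
		),
		Host: net.JoinHostPort(
			envOr(getenv, "DBHOST", "127.0.0.1"),
			envOr(getenv, "DBPORT", "5432"),
		),
		Path: "/" + envOr(getenv, "DBNAME", "postgres"),
	}
	return u.String()
}

func main() {
	// Print the URL that the current environment would produce.
	fmt.Println(dsn(os.Getenv))
}
```

Running it with no DB* variables set shows the connection target implied by the defaults; exporting any of the variables overrides the corresponding part of the URL.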

Command line flags

  • -d <path> path to the collection of repositories that will be migrated. For example, -d /home/myuser/repos. This must be a folder containing non-bare git repositories.
  • -siva whether the collection of repositories is using the siva archiving format. Not enabled by default.
  • -rooted whether the collection of repositories is rooted because it was collected with gitcollector. Not enabled by default.
  • -buckets=N number of characters for bucketing in case the repositories are in buckets. By default, 0. For example, -buckets=2 for a structure like the following:
|- go
   |- goofy
   |- goober
|- py
   |- pytorch
   |- pylint
  • -workers=N number of parallel workers to use, that is, the number of repositories migrated at the same time. Defaults to the number of CPU cores / 2. See the note on worker numbers at the end of this section.
  • -repo-workers=N number of workers to use while processing each single repository. Defaults to the number of CPU cores / 2. See the note on worker numbers at the end of this section.
  • -v verbose mode that prints more logs. Only meant for debugging purposes. Not enabled by default.
  • -create create the tables necessary in the schema.
  • -drop drop the tables if they exist before creating them again. This option cannot be used unless -create is used as well.
  • -full migrate all the trees in the repository for each commit of each reference. By default, only the tree at the HEAD of each reference is migrated, because this is the most common case and takes dramatically less space and time. If you need the full repository data, use this option.
  • -max-blob-size=N migrate only blobs smaller than the given size in megabytes.
  • -no-binary-blobs do not migrate blobs of files that are binaries.
  • -cstore=CSTORE_FDW_SERVER_NAME if the data should be imported in columnar format to cstore_fdw, provide the server name. e.g. -cstore=cstore_server.
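To make the -buckets layout more concrete, the following Go sketch computes where a repository would be expected on disk for a given bucket size. It only mirrors the directory structure shown in the -buckets example; the function name `bucketDir` is hypothetical, and the handling of names shorter than the bucket size is an assumption, not documented git2pg behavior.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// bucketDir returns the expected directory of the repository called name
// under root when the first n characters of the name form the bucket.
// With n <= 0 the repository sits directly under root.
// Assumptions: names are ASCII (bytes are treated as characters), and a
// name shorter than n is used whole as its own bucket.
func bucketDir(root, name string, n int) string {
	if n <= 0 {
		return filepath.Join(root, name)
	}
	if n > len(name) {
		n = len(name)
	}
	return filepath.Join(root, name[:n], name)
}

func main() {
	// Repository names from the -buckets=2 example.
	for _, name := range []string{"goofy", "goober", "pytorch", "pylint"} {
		fmt.Println(bucketDir("/home/myuser/repos", name, 2))
	}
}
```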

Note on setting worker numbers

Since each repository can have more than one worker, WORKERS * REPOWORKERS should be less than or equal to the number of cores of your machine.

For example, on a 32-core machine where you want 2 repo workers per repository, you could have 16 workers, since 16 workers with 2 repo workers each equals the number of cores of the machine.

Example of usage:

git2pg -d /path/to/repos -workers=4 -repo-workers=2

Docker image usage

Pull the image from the docker registry:

docker pull erizocosmico/git2pg

And then run the image providing the following data:

  • Database configuration via environment variables (described in the environment variables section).
  • Mount your repository folder as a volume to /repositories.
  • Provide the command line flags you need.

For example:

docker run --name git2pg -v /path/to/repositories:/repositories \
    -e DBUSER=dbuser \
    -e DBPASS=dbpass \
    -e DBPORT=5432 \
    -e DBNAME=dbname \
    -e DBHOST=postgres \
    erizocosmico/git2pg -workers=4 -repo-workers=2 -create -drop -v

Schema

The schema is provided in schema.sql for reference purposes, but you can create it directly using the tool with the -create command line flag.

The schema contains the following tables:

  • repositories: containing only ids of repositories.
  • remotes: containing the remotes with their URLs and fetch refspecs.
  • refs: containing the references of each repository and the commits they point to. References to objects other than commits are not included.
  • ref_commits: containing each commit in each reference in each repository, along with a history_index, which is the commit's offset from the HEAD of the reference.
  • commits: containing all the commit information. Each commit has the reference of its root tree, which can be used to join with the other tables that have root tree information.
  • tree_entries: containing all the tree entries in each repository. This table is rarely useful for queries and is migrated only so that this git data is preserved.
  • tree_blobs: containing the blob hashes that are in each root tree of each repository.
  • tree_files: containing the files that are in each root tree of each repository.
  • blobs: containing all the blobs in each repository, including its file content.

LICENSE

Apache 2.0, see LICENSE
