Git Heat Map

Visualise a git repository by diff activity

Git-Heat-Map

Map showing the files in cpython that Guido van Rossum changed the most; full SVG image available in repo

Website now available

A version of this program is now available for use at heatmap.jonathanforsythe.co.uk

Basic use guide

  • Generate database with python generate_db.py {path_to_repo_dir}
  • Install flask from pip
  • Run web server with python app.py or flask run (flask run --host=<ip> to run on that ip address, with 0.0.0.0 being used for all addresses on that machine)
  • Connect on 127.0.0.1:5000
  • Available repos will be displayed, select the one you want to view
  • Add emails, commits, filenames, and date ranges you want to highlight by using the form on the right, with % acting as a wildcard
  • Clicking on any of these entries will cause the query to exclude results matching that entry
  • Choose the minimum size of box to draw, with smaller numbers resulting in greater detail at the cost of performance
  • Choose the hue that you want the chart to use for highlighting
  • Press submit query
  • Click on directories to zoom in, and the back button in the sidebar to zoom out
  • Update text rendering depth as desired
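The % wildcard in the filter form behaves like the wildcard in an SQL LIKE pattern. The sketch below illustrates that matching and the "exclude" behaviour with Python's built-in sqlite3 module. It assumes the filters end up in LIKE / NOT LIKE clauses; the one-column `author` table, the sample emails, and both function names are hypothetical and not taken from the project's code.

```python
import sqlite3

# Hypothetical one-column table, used only to demonstrate % matching.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (email TEXT)")
conn.executemany(
    "INSERT INTO author VALUES (?)",
    [("alice@example.com",), ("bob@example.org",), ("carol@example.com",)],
)


def match_emails(conn, pattern):
    """Return emails matching a LIKE pattern; % matches any run of characters."""
    rows = conn.execute(
        "SELECT email FROM author WHERE email LIKE ? ORDER BY email", (pattern,)
    )
    return [email for (email,) in rows]


def exclude_emails(conn, pattern):
    """Return emails that do NOT match the pattern (an excluded entry)."""
    rows = conn.execute(
        "SELECT email FROM author WHERE email NOT LIKE ? ORDER BY email", (pattern,)
    )
    return [email for (email,) in rows]


print(match_emails(conn, "%@example.com"))
print(exclude_emails(conn, "%@example.com"))
```

Passing the pattern as a bound parameter (the `?` placeholder) rather than formatting it into the query string keeps user input out of the SQL text.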

Project Structure

This project consists of two parts:

  1. Git log -> database
  2. Database -> treemap

Git log -> database

Scans through an entire git history using git log, and creates a database using five tables:

  • Files, which just keeps track of filenames
  • Commits, which stores commit hash, author, committer
  • CommitFile, which stores an instance of a certain file being changed by a certain commit, and tracks how many lines were added/removed by that commit
  • Author, which stores an author name and email
  • CommitAuthor, which links commits and Author in order to support coauthors on commits
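One way these tables could be laid out is sketched below with Python's built-in sqlite3 module. Only the table names and the kind of data each holds come from the list above; every column name, type, and key is an assumption made for illustration, and the project's real schema may differ.

```python
import sqlite3

# Column names and constraints are assumptions; only the table names
# and their rough contents come from the README.
SCHEMA = """
CREATE TABLE Files (
    file_id   INTEGER PRIMARY KEY,
    file_path TEXT UNIQUE NOT NULL
);
CREATE TABLE Commits (
    hash      TEXT PRIMARY KEY,
    author    TEXT,
    committer TEXT
);
CREATE TABLE CommitFile (
    hash    TEXT REFERENCES Commits(hash),
    file_id INTEGER REFERENCES Files(file_id),
    added   INTEGER NOT NULL,  -- lines added by this commit to this file
    removed INTEGER NOT NULL,  -- lines removed by this commit from this file
    PRIMARY KEY (hash, file_id)
);
CREATE TABLE Author (
    author_id INTEGER PRIMARY KEY,
    name      TEXT,
    email     TEXT
);
CREATE TABLE CommitAuthor (
    hash      TEXT REFERENCES Commits(hash),
    author_id INTEGER REFERENCES Author(author_id),
    PRIMARY KEY (hash, author_id)  -- several rows per commit allow coauthors
);
"""


def create_db(path=":memory:"):
    """Create an empty database with the five tables and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_db()
tables = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
print(sorted(tables))
```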

Using these we can keep track of which files/commits changed the repository the most, which in itself can provide useful insight
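As a rough illustration of the scanning step, the sketch below totals line changes per file from text shaped like the output of `git log --numstat --format=%H`. That exact invocation is an assumption, since the README only says git log is used. The sample log is hand-written, and `parse_numstat` and `lines_changed_per_file` are hypothetical names.

```python
from collections import Counter

# Hand-written sample: a hash line, then "added<TAB>removed<TAB>path" lines.
SAMPLE_LOG = (
    "a1b2c3\n"
    "\n"
    "10\t2\tsrc/app.py\n"
    "-\t-\tdocs/logo.png\n"
    "\n"
    "d4e5f6\n"
    "\n"
    "3\t1\tsrc/app.py\n"
    "7\t0\tREADME.md\n"
)


def parse_numstat(log_text):
    """Yield (commit_hash, path, added, removed) for each file change."""
    current_hash = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) == 3:
            added, removed, path = parts
            # git reports binary files as "-"; count them as zero lines here.
            added = 0 if added == "-" else int(added)
            removed = 0 if removed == "-" else int(removed)
            yield current_hash, path, added, removed
        else:
            # Any other non-empty line is treated as a commit hash.
            current_hash = line.strip()


def lines_changed_per_file(log_text):
    """Sum added + removed lines for every path seen in the log."""
    totals = Counter()
    for _commit, path, added, removed in parse_numstat(log_text):
        totals[path] += added + removed
    return totals


totals = lines_changed_per_file(SAMPLE_LOG)
for path, total in totals.most_common():
    print(path, total)
```

In the real tool each yielded tuple would become a row in CommitFile rather than being summed in memory; the Counter here just shows the "which files changed the most" idea in a few lines.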

Database -> treemap

Taking the database above, the program uses an SQL query to generate a JSON object with the following structure:

directory:
  "name": <Directory name>
  "val": <Sum of sizes of children>
  "children": [<directory or file>, ...]

file:
  "name": <File name>
  "val": <Total number of line changes for this file over all commits>

It then uses this to generate an inline SVG image representing a treemap of the file system, with the area of each rectangle proportional to the val described above.
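The nested structure can be built from flat (path, val) rows, such as an SQL query summing line changes per file might return. This is a minimal sketch under that assumption: `build_tree`, the sample rows, and the "/" root name are hypothetical, and the project may assemble its JSON differently.

```python
import json


def build_tree(rows, root_name="/"):
    """Nest (path, val) rows into directory and file nodes.

    Directory nodes carry "name", "val" (sum of their children) and
    "children"; file nodes carry only "name" and "val".
    """
    root = {"name": root_name, "val": 0, "children": []}
    dirs = {(): root}  # tuple of path components -> directory node
    for path, val in rows:
        parts = tuple(path.split("/"))
        node = root
        node["val"] += val
        # Walk (and create as needed) every directory above the file.
        for depth in range(1, len(parts)):
            key = parts[:depth]
            if key not in dirs:
                child = {"name": parts[depth - 1], "val": 0, "children": []}
                dirs[key] = child
                node["children"].append(child)
            node = dirs[key]
            node["val"] += val
        node["children"].append({"name": parts[-1], "val": val})
    return root


rows = [("src/app.py", 16), ("src/lib/util.py", 4), ("README.md", 7)]
tree = build_tree(rows)
print(json.dumps(tree, indent=2))
```

Because each directory's val is accumulated while its files are added, no second pass over the tree is needed to compute the sums.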

It then generates a second JSON object in the same manner, but filtered for the things we want (only certain emails, date ranges, etc.), and uses this to highlight the rectangles with an intensity based on the vals returned, e.g. highlighting the files changed most by a certain author.
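One plausible way to turn the filtered vals into highlight colours is to scale each val against the largest one and vary the lightness of a fixed hue. The mapping below is an assumption for illustration only; the README does not say how intensity is computed, and `highlight_colours` is a hypothetical name.

```python
def highlight_colours(filtered_vals, hue=0):
    """Map each path's filtered val to an HSL fill string.

    The largest val gets the full colour (50% lightness) and a val of
    zero stays white (100% lightness). This linear scale is an assumption.
    """
    peak = max(filtered_vals.values(), default=0)
    colours = {}
    for path, val in filtered_vals.items():
        intensity = val / peak if peak else 0.0
        lightness = 100 - 50 * intensity
        colours[path] = f"hsl({hue}, 100%, {lightness:.0f}%)"
    return colours


colours = highlight_colours({"src/app.py": 10, "README.md": 5, "LICENSE": 0}, hue=120)
for path, fill in colours.items():
    print(path, fill)
```

A linear scale lets a single heavily edited file wash out everything else; a logarithmic scale would be an alternative if that becomes a problem.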

Performance

These speeds were attained on my personal computer.

Database generation

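Repo    | Number of commits | Git log time | Git log size | Database time   | Database size | Total time
--------|-------------------|--------------|--------------|-----------------|---------------|------------
linux   | 1,154,884         | 60 minutes   | 444MB        | 462.618 seconds | 733MB         | 68 minutes
cpython | 115,874           | 4.6 minutes  | 44.6MB       | 36.607 seconds  | 74.3MB        | 5.2 minutes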

Time taken seems to scale linearly, going through approximately 300 commits/second, or requiring 0.0033 seconds/commit. Git log size also scales linearly, at approximately 2600 commits/MB, or 384 B/commit. The database is larger, at roughly 1570 commits/MB, or about 640 B/commit.
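These per-commit rates can be rechecked from the table with a few lines of arithmetic. The figures below are copied from the table; treating MB as 10^6 bytes is an assumption, and `rates` is a hypothetical helper name.

```python
MB = 1_000_000  # assumption: MB means 10**6 bytes

# Figures copied from the database generation table.
REPOS = {
    "linux": {"commits": 1_154_884, "total_min": 68, "log_mb": 444, "db_mb": 733},
    "cpython": {"commits": 115_874, "total_min": 5.2, "log_mb": 44.6, "db_mb": 74.3},
}


def rates(stats):
    """Derive throughput and per-commit sizes for one repository."""
    commits = stats["commits"]
    return {
        "commits_per_second": commits / (stats["total_min"] * 60),
        "log_bytes_per_commit": stats["log_mb"] * MB / commits,
        "db_bytes_per_commit": stats["db_mb"] * MB / commits,
    }


for name, stats in REPOS.items():
    r = rates(stats)
    print(
        f"{name}: {r['commits_per_second']:.0f} commits/s, "
        f"{r['log_bytes_per_commit']:.0f} B/commit in the git log, "
        f"{r['db_bytes_per_commit']:.0f} B/commit in the database"
    )
```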

Querying database and displaying treemap

For this test I filtered each repo by its most prominent authors:

Repo    | Author filter     | Drawing treemap time | Highlighting treemap time
--------|-------------------|----------------------|--------------------------
linux   | [email protected] | 19.7 s               | 54.3 s
cpython | [email protected] | 842 ms               | 1238 ms

These times are with minimum size drawn = 0, on very large repositories, so the performance is not completely unreasonable. They do not include the time for the browser to actually render the SVG, which can take longer.

Wanted features

Submodule tracking

Currently the only submodule changes that can be seen are the top level commit pointer changes. In the future I would like to recursively explore submodules and add their files to the database.

Faster database generation

Database generation is currently done using git log, which can take a very long time for large repos. I will look into other ways of getting the needed information on files.

Multiple filters per query

Currently the user can submit only a single query for the highlighting. Ideally they could have a separate filter dictating which boxes to draw in the first place, and possibly multiple filters that could result in multiple colour highlighting on the same image.
