“Mining software repositories is an increasingly popular and important area of software engineering research aimed at retrieving, integrating, and analyzing data available in various kinds of software repositories.” (Massimiliano Di Penta)


Tools

  • maven-miner: mines Maven Central and creates a global dependency graph.

  • source{d} Engine: powerful, language-agnostic analysis of your source code and git history.

  • reaper: calculates a score for a repository based on engineering best practices.

Datasets

  • GH Archive: records the public GitHub timeline, archives it, and makes it easily accessible for further analysis.

  • The GHTorrent project: an effort to create a scalable, queryable, offline mirror of the data offered through the GitHub REST API.

  • jsDelivr: a content delivery network (CDN) that can be used to download files from GitHub repositories without any rate limit. (Ask Zimin how to get the difference between two versions of the same file.)
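As a starting point for working with GH Archive, the sketch below streams one hour of the public timeline and tallies events by type. It assumes GH Archive's published layout of one gzipped, newline-delimited JSON file per hour at https://data.gharchive.org/YYYY-MM-DD-H.json.gz (hour not zero-padded), with a "type" field on each event. The helper names (gharchive_url, iter_events, count_event_types) are hypothetical, and only iter_events needs network access.

```python
import gzip
import json
import urllib.request
from collections import Counter
from datetime import datetime
from typing import Iterable, Iterator


def gharchive_url(hour: datetime) -> str:
    # Assumed naming scheme: YYYY-MM-DD-H.json.gz, hour in 0..23 without padding.
    return f"https://data.gharchive.org/{hour:%Y-%m-%d}-{hour.hour}.json.gz"


def iter_events(hour: datetime) -> Iterator[str]:
    # Stream and decompress the hourly archive line by line (requires network).
    with urllib.request.urlopen(gharchive_url(hour)) as resp:
        with gzip.GzipFile(fileobj=resp) as gz:
            for raw in gz:
                yield raw.decode("utf-8")


def count_event_types(lines: Iterable[str]) -> Counter:
    # Each non-empty line is one JSON event; tally them by their "type" field.
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line)["type"]] += 1
    return counts
```

For example, `count_event_types(iter_events(datetime(2015, 1, 1, 15))).most_common(5)` would list the five most frequent event types in that hour. Streaming keeps memory use flat, since a single hourly file can be large once decompressed.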
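One possible way to compare two versions of the same file via jsDelivr is to fetch both revisions through its GitHub endpoint and diff them locally. This is only a sketch, not necessarily the approach the note above refers to. It assumes jsDelivr's URL pattern https://cdn.jsdelivr.net/gh/USER/REPO@REF/PATH, where REF is a tag, branch, or commit. The helper names and the example arguments are hypothetical, and the diff itself uses Python's standard difflib.

```python
import difflib
import urllib.request


def jsdelivr_url(user: str, repo: str, ref: str, path: str) -> str:
    # Assumed jsDelivr pattern for serving a file from a GitHub repo at a given ref.
    return f"https://cdn.jsdelivr.net/gh/{user}/{repo}@{ref}/{path}"


def fetch_text(url: str) -> str:
    # Download a file as UTF-8 text (requires network).
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def diff_texts(old: str, new: str, old_name: str = "old", new_name: str = "new") -> str:
    # Produce a unified diff between two versions of the same file's contents.
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=old_name,
            tofile=new_name,
        )
    )


def diff_file_versions(user: str, repo: str, path: str, old_ref: str, new_ref: str) -> str:
    # Fetch both revisions through jsDelivr and diff them locally.
    old = fetch_text(jsdelivr_url(user, repo, old_ref, path))
    new = fetch_text(jsdelivr_url(user, repo, new_ref, path))
    return diff_texts(old, new, f"{path}@{old_ref}", f"{path}@{new_ref}")
```

For instance, `diff_file_versions("octocat", "Hello-World", "README", "<old-commit>", "<new-commit>")` (placeholder refs) would print a standard unified diff. Diffing locally keeps the CDN usage limited to plain file downloads.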

Influential papers