Awesome Open Source Software Research Data

This is a curated list of relevant datasets, data sources, and empirical research on open source software development. We prioritize sources in which (1) the raw data is publicly accessible or (2) the published metrics are derived from public sources. We also include data sources for which only high-level insights are available.

An excellent list of datasets used for empirical software engineering / mining software repositories exists at dspinellis/awesome-msr. Several relevant data sources from this list are included here.

Contents

  1. Databases and archives
  2. Metrics
  3. Contribution patterns
  4. Public policy
  5. Platforms
  6. Survey data

Databases and archives

Software development activity

Contributor communication

Security

Software Faults

Other

Software dependency networks

  • deps.dev - Open Source Insights
  • Libraries.io
    • Data on software package dependency relationships over time, sourced from a number of different ecosystems.
    • Data releases
  • Repology
    • Monitors software package vintages (i.e. versioning) across a number of ecosystems.

Package downloads

Metrics

General

Valuation

Community Health

Contributor Experience

  • Denivan Campos, Luana Martins, & Ivan Machado. (2022). An empirical study on the influence of developers' experience on software test code quality [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.7110141
  • Perez, Quentin, Urtado, Christelle, Vauttier, Sylvain, 2022. Dataset of Open-Source Software Developers Labeled by their Experience Level in the Project and their Associated Software Metrics. https://doi.org/10.5281/zenodo.6966195

Project Characteristics

  • Munaiah, N., Kroh, S., Cabrey, C. and Nagappan, M., 2017. Curating GitHub for engineered software projects. Empirical Software Engineering, 22(6), pp.3219-3253. project website
  • Dabic, Ozren, Aghajani, Emad, Bavota, Gabriele, 2021. GHS (GitHub Search): Sampling Projects in GitHub for MSR Studies. https://doi.org/10.5281/zenodo.4588464

Contribution patterns

  • Choudhary, Samridhi; Bogart, Christopher; Rose, Carolyn; Herbsleb, James (2020): Modeling Productivity in Open Source GitHub Projects: A Dataset and Codebase. Carnegie Mellon University. Dataset. https://doi.org/10.1184/R1/6397013.v1
  • Marco Ortu, Giuseppe Destefanis, Daniel Graziotin, Michele Marchesi, Marco Tonelli, 2020. Dataset - How do you propose your code changes? Empirical Analysis of Affect Metrics of Pull Requests on GitHub. https://doi.org/10.5281/zenodo.3825044
  • Champion, K. and Hill, B.M., 2021. Underproduction: An approach for measuring risk in open source software. In 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (pp. 388-399). IEEE.
  • Wachs, J., Nitecki, M., Schueller, W. and Polleres, A., 2022. The geography of open source software: Evidence from GitHub. Technological Forecasting and Social Change, 176, p.121478.

OSS contribution by private firms

  • Open Source Contributor Index (OSCI)
    • Tracks GitHub contributions by commercial firms
    • Measures active and total contributors
    • Drawn from GH Archive (events from GitHub's public timeline)
  • Spinellis, Diomidis, Kotti, Zoe, Kravvaritis, Konstantinos, Theodorou, Georgios, & Louridas, Panos. (2020). Enterprise-Driven Open Source Software (1.1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3742962
  • Angermeir, F., Voggenreiter, M., Moyón, F. and Mendez, D., 2021, May. Enterprise-driven open source software: a case study on security automation. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) (pp. 278-287). IEEE.
  • Shimels Garomssa, Rathimala Kannan, Ian Chai, Dirk Riehle, 2022. How Software Quality Mediates the Impact of Intellectual Capital on Commercial Open Source Software Company Success. Available at: https://dx.doi.org/10.21227/3rwb-vg72.

Public Policy

Platforms

Contribution and Bug Bounty platforms

Funding

Wikipedia

Q&A Platforms

Survey Data

Other Resources