Mining and Creating a Software Repositories Dataset

2020 
Mining software repositories to extract meaningful information has become an important topic in software engineering. This paper presents a study that mines a very large dataset of over three million software repositories spanning many version control systems and creates derived data for future studies. As part of this study, we propose a method for detecting forks and duplicates among repositories. We also conduct a preliminary investigation of possible correlations between forking patterns, software health and risks, and success indicators.