Mining and Creating a Software Repositories Dataset
2020
Mining software repositories for meaningful information has become an important topic in software engineering. This paper presents our study mining a very large dataset of over three million software repositories, hosted across many version control systems, and creating derived data for future studies. Through this study, we propose a method for detecting forks and duplicates among repositories. We also conduct a preliminary investigation of possible correlations between forking patterns, software health and risks, and success indicators.