Project Entity Matching across FLOSS Repositories

Conklin, Megan

doi:10.1007/978-0-387-72486-7_4

Part of the book series: IFIP — The International Federation for Information Processing (IFIPAICT, volume 234)

Included in the following conference series:

IFIP International Conference on Open Source Systems

Abstract

Much of the data about free, libre, and open source (FLOSS) software development comes from studies of code repositories used for managing projects. This paper presents a method for integrating data about open source projects by way of matching projects (entities) and deleting duplicates across multiple code repositories. After a review of the relevant literature, a few of the methods are chosen and applied to the FLOSS domain, including a simple scoring system for confidence in pairwise project matches. Finally, the paper describes limitations of this approach and recommendations for future work.
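The idea of a simple confidence score for pairwise project matches can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the `Project` fields, the normalization rules, the point values, and the threshold are all hypothetical, and the chapter's actual scoring scheme may differ.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass(frozen=True)
class Project:
    """Hypothetical project record; the field names are assumptions."""
    repo: str
    name: str
    homepage: str = ""
    developers: frozenset = field(default_factory=frozenset)


def normalize(text):
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def normalize_url(url):
    """Strip scheme, leading 'www.', and trailing slashes from a URL."""
    url = url.lower().strip().rstrip("/")
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url


def match_score(a, b):
    """Add up evidence that two records describe the same project.

    The point values are illustrative assumptions, not the paper's weights.
    """
    score = 0
    if normalize(a.name) == normalize(b.name):
        score += 2  # same name after normalization
    if a.homepage and normalize_url(a.homepage) == normalize_url(b.homepage):
        score += 2  # same homepage after normalization
    if a.developers & b.developers:
        score += 1  # at least one developer name in common
    return score


def candidate_matches(left, right, threshold=3):
    """Score every cross-repository pair and keep those at or above threshold."""
    pairs = [(a, b, match_score(a, b)) for a, b in product(left, right)]
    return sorted((p for p in pairs if p[2] >= threshold), key=lambda p: -p[2])
```

Scoring every pair is quadratic in the number of projects, so a sketch like this would only suit small samples; larger collections would need some way of narrowing the candidate pairs first.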

Author information

Authors and Affiliations

Elon University, Campus Box 2126, Elon, NC, 27244, USA
Megan Conklin

Editor information

Editors and Affiliations

Business Information Systems, University College Cork, Ireland
Joseph Feller
Lero - the Irish Software Engineering Research Centre, University of Limerick, Ireland
Brian Fitzgerald
Institute for Software Research, Donald Bren School of Information and Computer Sciences, University of California, Irvine, USA
Walt Scacchi
Free University of Bolzano-Bozen, Italy
Alberto Sillitti

About this paper

Cite this paper

Conklin, M. (2007). Project Entity Matching across FLOSS Repositories. In: Feller, J., Fitzgerald, B., Scacchi, W., Sillitti, A. (eds) Open Source Development, Adoption and Innovation. OSS 2007. IFIP — The International Federation for Information Processing, vol 234. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-72486-7_4

DOI: https://doi.org/10.1007/978-0-387-72486-7_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-72485-0
Online ISBN: 978-0-387-72486-7
eBook Packages: Computer Science, Computer Science (R0)