Abstract

This paper develops PAC (probably approximately correct) error bounds for network classifiers in the transductive setting, where the network node inputs and links are all known, the training nodes' class labels are known, and the goal is to classify a working set of nodes that have unknown class labels. The bounds are valid for any model of network generation. They require working nodes to be selected independently, but not necessarily uniformly at random. For example, they allow different regions of the network to have different densities of unlabeled nodes.

Keywords

network classifier, collective classification, validation, error bound, worst likely assignment

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • James Li (1, 2, 3)
  • Abdullah Sonmez (1, 2, 3)
  • Zehra Cataltepe (1, 2, 3)
  • Eric Bax (1, 2, 3)
  1. Cornell University, USA
  2. Istanbul Technical University, Turkey
  3. Yahoo! Inc., USA
