Collective regression for handling autocorrelation of network data in a transductive setting

Published in: Journal of Intelligent Information Systems

Abstract

Sensor networks, communication and financial networks, and web and social networks are becoming increasingly important in our day-to-day life. They contain entities that may interact with one another. These interactions are often characterized by a form of autocorrelation, where the value of an attribute at a given entity depends on the values at the entities it interacts with. In this situation, the collective inference paradigm offers a unique opportunity to improve the performance of predictive models on network data, since interacting instances are labeled simultaneously and autocorrelation is dealt with explicitly. Several recent works have shown that collective inference is a powerful paradigm, but it has mainly been developed for fully labeled training networks. In practice, while it may be cheap to acquire the network topology, it may be costly to acquire the node labels needed for training. In this paper, we examine how to explicitly account for autocorrelation when performing regression inference on network data. In particular, we study collective regression in the transductive setting, where sparsely labeled networks are the common situation. We present an algorithm, called CORENA (COllective REgression in Network dAta), which assigns a numeric label to each instance in the network. It iteratively augments the representation of each instance with instances sharing correlated representations across the network. In this way, the learned model is able to capture autocorrelations of labels over groups of related instances and to feed the more reliable labels predicted by the transduction back into the labeled part of the network. Empirical studies demonstrate that the proposed approach can boost regression performance on several spatial and social tasks.
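The iterative collective inference loop described in the abstract can be sketched on a toy network. Everything below is an illustrative assumption, not the paper's exact CORENA procedure: the adjacency list, the sparse label set, and the mean-of-neighbors update rule merely show how estimates on a sparsely labeled network can reinforce each other across iterations.

```python
import statistics

# Toy network: adjacency list and a sparse set of known numeric labels.
# All names and the update rule are illustrative, not the paper's CORENA algorithm.
edges = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
labels = {0: 1.0, 4: 5.0}                     # labeled nodes (sparse)
unlabeled = [n for n in edges if n not in labels]

# Initialize unlabeled nodes with the global mean of the known labels.
estimates = dict(labels)
mean_label = statistics.mean(labels.values())
for n in unlabeled:
    estimates[n] = mean_label

# Collective inference: repeatedly re-estimate each unlabeled node from the
# current estimates of its neighbors, so autocorrelated labels propagate
# across the network over the iterations.
for _ in range(20):
    for n in unlabeled:
        estimates[n] = statistics.mean(estimates[m] for m in edges[n])

print({n: round(v, 2) for n, v in sorted(estimates.items())})
```

On this toy graph the estimates settle quickly: node 1 ends up close to the labeled node 0, node 3 close to the labeled node 4, and node 2 in between, which is the qualitative behavior autocorrelation-aware inference is meant to produce.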

Notes

  1. Vapnik introduced an alternative transductive setting which is distributional, since both the known set and the unknown set are assumed to be drawn independently and identically from some unknown distribution. As shown in Vapnik (1998) (Theorem 8.1), error bounds for learning algorithms in the distribution-free setting also apply to the more popular distributional transductive setting. This justifies our focus on the distribution-free setting.

  2. It is noteworthy that this phase can be skipped when links are defined a priori in the input network data.

  3. The dissimilarity weights associated with the least-cost paths can be pre-computed before the iterative learning starts. Since they depend only on the descriptive values, they do not change across the collective inference iterations.

  4. We use the Java implementation of M5’ included in the WEKA toolkit (Witten and Frank 2005). We consider the default configuration setup with the pruning option enabled.
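The pre-computation mentioned in footnote 3 could be sketched as follows, with Dijkstra's algorithm (Dijkstra 1959) computing the least-cost paths once, up front. The graph, its edge costs (standing in for dissimilarities derived from the fixed descriptive values), and the function name are hypothetical:

```python
import heapq

# Hypothetical dissimilarity-weighted graph; edge costs stand in for
# dissimilarities computed from the (fixed) descriptive attribute values.
graph = {
    'a': {'b': 0.2, 'c': 0.9},
    'b': {'a': 0.2, 'c': 0.3},
    'c': {'a': 0.9, 'b': 0.3},
}

def least_cost_paths(graph, source):
    """Dijkstra's algorithm: cheapest cumulative dissimilarity from source."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# One pass per node caches every pairwise dissimilarity; since the costs do
# not change, the cache is reused unchanged across all inference iterations.
cache = {u: least_cost_paths(graph, u) for u in graph}
print(cache['a'])   # 'c' is cheaper via 'b': 0.2 + 0.3 ≈ 0.5, not the direct 0.9
```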

References

  • Anselin, L. (1995). Local indicators of spatial association: LISA. Geographical Analysis, 27(2), 93–115.

  • Antulov-Fantulin, N., Bošnjak, M., žnidaric, M., Grcar, M., Morzy, M., & Šmuc, T. (2011). Discovery challenge overview. In ECML-PKDD 2011 Discovery Challenge Workshop (pp. 7–20): Springer.

  • Appice, A., & Malerba, D. (2014). Leveraging the power of local spatial autocorrelation in geophysical interpolative clustering. Data Mining and Knowledge Discovery, 28(5-6), 1266–1313.

  • Appice, A., Ceci, M., & Malerba, D. (2009a). An iterative learning algorithm for within-network regression in the transductive setting. In J. Gama, V.S. Costa, A.M. Jorge, & P. Brazdil (Eds.) Discovery Science, 12th International Conference, DS 2009, Springer, Lecture Notes in Computer Science, (Vol. 5808 pp. 36–50).

  • Appice, A., Pravilovic, S., Malerba, D., & Lanza, A. (2013). Enhancing regression models with spatio-temporal indicator additions. In Proceedings of the 13th International Conference of the Italian Association for Artificial Intelligence on Advances in Artificial Intelligence, AI*IA 2013, Springer, Lecture Notes in Computer Science, (Vol. 8249 pp. 433–444).

  • Arthur, G. (2008). A history of the concept of spatial autocorrelation: A geographer’s perspective. Geographical Analysis, 40(3), 297–309.

  • Bilgic, M., Namata, G.M., & Getoor, L. (2007). Combining collective classification and link prediction. In Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, ICDMW 2007, IEEE Computer Society (pp. 381–386).

  • Bilgic, M., Mihalkova, L., & Getoor, L. (2010). Active learning for networked data. In J. Fürnkranz, & T. Joachims (Eds.) Proceedings of the 27th International Conference on Machine Learning, ICML 2010, Omnipress (pp. 79–86).

  • Blockeel, H., Raedt, L.D., & Ramon, J. (1998). Top-down induction of clustering trees. In J.W. Shavlik (Ed.) Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, USA, July 24-27, 1998, Morgan Kaufmann (pp. 55–63).

  • Chopra, S.P. (2008). Factor graphs for relational regression. ProQuest.

  • Cressie, N. (1993). Statistics for Spatial Data, 1st edn. Wiley.

  • Demšar, D., Debeljak, M., Lavigne, C., & Džeroski, S. (2005). Modelling pollen dispersal of genetically modified oilseed rape within the field. In Abstracts of the 90th ESA Annual Meeting, The Ecological Society of America (p. 152).

  • Dijkstra, E.W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1), 269–271.

  • Epperson, B. (2000). Spatial and space-time correlations in ecological models. Ecological Modelling, 132, 63–76.

  • Fang, M., Yin, J., & Zhu, X. (2013). Transfer learning across networks for collective classification. In Proceedings of the 13th International Conference on Data Mining, ICDM 2013 (pp. 161–170): IEEE Computer Society.

  • Gallagher, B., Tong, H., Eliassi-Rad, T., & Faloutsos, C. (2008). Using ghost edges for classification in sparsely labeled networks. In Proc. 14th ACM SIGKDD Intl. Conf. Knowledge Discovery and Data Mining (pp. 256–264): ACM.

  • Getoor, L. (2005). Link-based classification. In Advanced Methods for Knowledge Discovery from Complex Data, Advanced Information and Knowledge Processing (pp. 189–207). London: Springer.

  • Getoor, L., & Taskar, B. (2007). Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning): The MIT Press.

  • Goodchild, M. (1986). Spatial autocorrelation: Geo Books.

  • Grouplens (1998). http://www.grouplens.org/node/12.

  • Intel Berkeley Lab (2004). http://db.csail.mit.edu/labdata/labdata.html.

  • Jensen, D., Neville, J., & Gallagher, B. (2004a). Why collective inference improves relational classification. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’04. doi:10.1145/1014052.1014125 (pp. 593–598).

  • Kuwadekar, A., & Neville, J. (2011). Relational active learning for joint collective classification models. In L. Getoor, & T. Scheffer (Eds.) Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Omnipress (pp. 385–392).

  • Legendre, P. (1993). Spatial autocorrelation: Trouble or new paradigm? Ecology, 74(6), 1659–1673.

  • Loglisci, C., Appice, A., & Malerba, D. (2014). Collective inference for handling autocorrelation in network regression. In T. Andreasen, H. Christiansen, J.C.C. Talavera, & Z.W. Ras (Eds.) Foundations of Intelligent Systems - 21st International Symposium, ISMIS 2014, Springer, Lecture Notes in Computer Science, (Vol. 8502 pp. 542–547).

  • Macskassy, S., & Provost, F. (2007). Classification in networked data: a toolkit and a univariate case study. Journal of Machine Learning Research, 8, 935–983.

  • Macskassy, S.A. (2007). Improving learning in networked data by combining explicit and mined links. In Proc. 22nd Intl. Conf. on Artificial Intelligence (pp. 590–595): AAAI Press.

  • Malerba, D., Ceci, M., & Appice, A. (2009). A relational approach to probabilistic classification in a transductive setting. Engineering Applications of Artificial Intelligence, 22(1), 109–116. doi:10.1016/j.engappai.2008.04.005.

  • May, M., & Savinov, A.A. (2003). SPIN! - An enterprise architecture for spatial data mining. In Proceedings of the 7th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, KES 2003, Part I (pp. 510–517).

  • McDowell, L., & Aha, D.W. (2012). Semi-supervised collective classification via hybrid label regularization. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Omnipress.

  • McDowell, L., & Aha, D.W. (2013). Labels or attributes?: rethinking the neighbors for collective classification in sparsely-labeled networks. In Q. He, A. Iyengar, W. Nejdl, J. Pei, & R. Rastogi (Eds.) Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013, ACM (pp. 847–852).

  • McDowell, L., Gupta, K.M., & Aha, D.W. (2007). Case-based collective classification. In D. Wilson, & G. Sutcliffe (Eds.) Proceedings of the 20th International Florida Artificial Intelligence Research Society Conference, AAAI Press (pp. 399–404).

  • McDowell, L., Gupta, K.M., & Aha, D.W. (2009). Cautious collective classification. Journal of Machine Learning Research, 10, 2777–2836.

  • McPherson, M., Smith-Lovin, L., & Cook, J. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.

  • Neville, J., & Jensen, D. (2000). Iterative classification in relational data. In Proc. 17th Intl. Joint Conf. on Artificial Intelligence: AAAI Press.

  • Neville, J., & Jensen, D. (2007). Relational dependency networks. Journal of Machine Learning Research, 8, 653–692.

  • Ohashi, O., & Torgo, L. (2012). Wind speed forecasting using spatio-temporal indicators. In ECAI 2012, IOS Press, (Vol. 242 pp. 975–980).

  • Orkin, M., & Drogin, R. (1990). Vital Statistics: McGraw Hill.

  • Rattigan, M., Maier, M., & Jensen, D. (2007). Exploiting network structure for active inference in collective classification. In Seventh IEEE International Conference on Data Mining - ICDM Workshops 2007 (pp. 429–434).

  • Saha, T., Rangwala, H., & Domeniconi, C. (2012). Multi-label collective classification using adaptive neighborhoods. In Proceedings of the 11th International Conference on Machine Learning and Applications, ICMLA 2012, (Vol. 1 pp. 427–432).

  • Saha, T., Rangwala, H., & Domeniconi, C. (2014). FLIP: active learning for relational network classification. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2014, Part III, Springer, Lecture Notes in Computer Science, (Vol. 8726 pp. 1–18).

  • Seeger, M. (2001). Learning with labeled and unlabeled data. Technical Report.

  • Sen, P., Namata, G., Bilgic, M., Getoor, L., Gallagher, B., & Eliassi-Rad, T. (2008). Collective classification in network data. AI Magazine, 29(3), 93–106.

  • Shi, X., Li, Y., & Yu, P. (2011a). Collective prediction with latent graphs. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM 2011, ACM (pp. 1127–1136).

  • Simons, R.A. (2011). ERDDAP - The Environmental Research Division’s Data Access Program. http://coastwatch.pfeg.noaa.gov/erddap. Pacific Grove, CA: NOAA/NMFS/SWFSC/ERD.

  • Steinhaeuser, K., Chawla, N.V., & Ganguly, A.R. (2011). Complex networks as a unified framework for descriptive analysis and predictive modeling in climate science. Statistical Analysis and Data Mining, 4(5), 497–511.

  • Stojanova, D., Ceci, M., Appice, A., & Dzeroski, S. (2012). Network regression with predictive clustering trees. Data Mining and Knowledge Discovery, 25(2), 378–413.

  • Taskar, B., Abbeel, P., & Koller, D. (2002). Discriminative probabilistic models for relational data. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, UAI 2002, Morgan Kaufmann Publishers Inc. (pp. 485–492).

  • Tobler, W. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(2), 234–240.

  • Vapnik, V. (1998). Statistical Learning Theory: Wiley.

  • Wang, Y., & Witten, I. (1997). Induction of model trees for predicting continuous classes. In Proc. Poster Papers of the European Conference on Machine Learning, Faculty of Informatics and Statistics (pp. 128–137). Prague: University of Economics.

  • Weiss, Y. (2001). Comparing the mean field method and belief propagation for approximate inference in mrfs. In M. Opper, & D. Saad (Eds.) Advanced Mean Field Methods (pp. 229–243): MIT Press.

  • Witten, I., & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. San Francisco: Morgan Kaufmann.

  • Xiang, R., & Neville, J. (2008). Pseudolikelihood em for within-network relational learning. In Proceedings of the 8th IEEE International Conference on Data Mining, ICDM 2008, IEEE (pp. 1103–1108).

Acknowledgments

This work fulfills the research objectives of the PON 02_00563_3470993 project “VINCENTE - A Virtual collective INtelligenCe ENvironment to develop sustainable Technology Entrepreneurship ecosystems” funded by the Italian Ministry of University and Research (MIUR), as well as the ATENEO 2012 project “Mining Complex Patterns” funded by University of Bari Aldo Moro. The authors wish to thank Antonella Montinari for her support in developing the software, Saso Dzeroski for providing SIGMEA data and Lynn Rudd for her help in reading the manuscript.

Author information

Corresponding author

Correspondence to Corrado Loglisci.

Cite this article

Loglisci, C., Appice, A. & Malerba, D. Collective regression for handling autocorrelation of network data in a transductive setting. J Intell Inf Syst 46, 447–472 (2016). https://doi.org/10.1007/s10844-015-0361-8
