Journal of Intelligent Information Systems, Volume 51, Issue 2, pp 207–234

Robust learning in expert networks: a comparative analysis

  • Ashiqur R. KhudaBukhsh
  • Jaime G. Carbonell
  • Peter J. Jansen


Human experts, as well as autonomous agents, in a referral network must decide whether to accept a task or refer it to a more appropriate expert, and if the latter, to whom. In order for the referral network to improve over time, the experts must learn to estimate the topical expertise of other experts. This article extends concepts from Multi-agent Reinforcement Learning and Active Learning to distributed learning in referral networks. Among a wide array of algorithms evaluated, Distributed Interval Estimation Learning (DIEL), based on Interval Estimation Learning, proved superior at learning appropriate referral choices, outperforming ε-Greedy, Q-learning, Thompson Sampling, and Upper Confidence Bound (UCB) methods. In addition to a synthetic data set, we compare the performance of the stronger learning-to-refer algorithms on a referral network of high-performance Stochastic Local Search (SLS) SAT solvers, where expertise does not obey any known parameterized distribution. An evaluation of overall network performance and a robustness analysis are conducted across the learning algorithms, with an emphasis on capacity constraints and evolving networks, in which experts with known expertise drop off and new experts of unknown performance enter: situations that arise in real-world scenarios but were heretofore ignored.
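The interval-estimation rule underlying DIEL can be sketched as follows: each expert keeps, per colleague, the sample mean of observed referral rewards and the upper bound of a confidence interval on that mean, and refers to the colleague with the highest upper bound. This is a minimal illustrative sketch; the class name `DIELReferrer`, the `z` parameter, and the optimistic/pessimistic pseudo-reward seeding are assumptions of this sketch, not details taken from the paper.

```python
import math


class DIELReferrer:
    """Sketch of Interval Estimation Learning applied to referral choice.

    Per colleague we track observed rewards; referrals go to the
    colleague whose confidence-interval upper bound on the mean
    reward is highest (optimism under uncertainty).
    """

    def __init__(self, colleagues, z=1.96):
        # Seed each colleague with one optimistic and one pessimistic
        # pseudo-reward so every interval starts wide and each
        # colleague gets explored before being ruled out.
        self.stats = {c: [1.0, 0.0] for c in colleagues}
        self.z = z

    def _upper_bound(self, rewards):
        # Mean plus z standard errors: wide for rarely-tried
        # colleagues, tight for frequently-tried ones.
        n = len(rewards)
        mean = sum(rewards) / n
        var = sum((r - mean) ** 2 for r in rewards) / max(n - 1, 1)
        return mean + self.z * math.sqrt(var / n)

    def choose_referral(self):
        # Refer to the colleague with the highest interval upper bound.
        return max(self.stats, key=lambda c: self._upper_bound(self.stats[c]))

    def observe(self, colleague, reward):
        # Record the reward of a completed referral.
        self.stats[colleague].append(reward)
```

In a simulation with one strong and one weak colleague, referrals quickly concentrate on the stronger one as its interval tightens around a high mean, while the weaker colleague's upper bound drops below it after a few trials.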


Keywords: Referral networks · Active learning · Reinforcement learning



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Carnegie Mellon University, Pittsburgh, USA
