BJR-tree: fast skyline computation algorithm using dominance relation-based tree structure

  • Kenichi Koizumi
  • Peter Eades
  • Kei Hiraki
  • Mary Inaba
Regular Paper

Abstract

High-throughput, label-free, single-cell screening technology has been studied for the noninvasive analysis of various kinds of cells. Selecting the prominent cells with extreme features from a large number of cells is an important and interesting problem, which we call the serendipitous searching problem (SSP). In the SSP, it is important to find entries located near the periphery of the population in a multi-dimensional feature space. We tackle the SSP as a continuous skyline computation. Skyline computation was originally designed to extract interesting entries from a database with multiple attributes. In the continuous setting, the skyline points are updated as existing entries disappear and new entries arrive. In this paper, we propose a balanced jointed rooted tree (BJR-tree) algorithm and a non-dominated relation cache (ND-cache) for continuous skyline computation. The BJR-tree expresses the dominance relation as an arc and stores the “dominated” relations. The ND-cache complements the BJR-tree by reducing the recalculation of the dominance relations. We compare the execution times of the BJR-tree and existing continuous skyline computation algorithms on randomly constructed synthetic datasets with multiple temporal and spatial features. We then evaluate the BJR-tree on measured data from blood cells. On the two- and eight-dimensional synthetic datasets, the BJR-tree computed the continuous skylines approximately 3 and 70 times faster than LookOut, respectively. On real-world datasets, the BJR-tree was approximately 2.4–3.2 times faster than LookOut.
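To make the terms in the abstract concrete, the following is a minimal sketch of the dominance relation and of a naive continuous skyline over a stream. It is not the BJR-tree or the ND-cache; it is only a brute-force baseline that recomputes the skyline from scratch after every arrival. Two assumptions are made that the abstract does not state: smaller attribute values are treated as better, and entries disappear through a count-based sliding window. The names `dominates`, `skyline`, and `continuous_skyline` are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p dominates q: p is no worse in every attribute and
    strictly better in at least one (assumption: smaller is better)."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Iterable[Point]) -> List[Point]:
    """Brute-force skyline: keep each point that no other point dominates.
    Quadratic in the number of points; shown for clarity only."""
    pts = list(points)
    return [p for p in pts if not any(dominates(q, p) for q in pts)]


def continuous_skyline(stream: Iterable[Point], window_size: int) -> Iterator[List[Point]]:
    """Yield the skyline after each arrival, keeping only the most recent
    window_size entries (assumption: count-based sliding window)."""
    window: Deque[Point] = deque()
    for p in stream:
        window.append(p)          # new entry arrives
        if len(window) > window_size:
            window.popleft()      # oldest entry disappears
        yield skyline(window)


if __name__ == "__main__":
    stream = [(3, 3), (1, 4), (2, 2), (4, 1), (5, 5)]
    for step, sky in enumerate(continuous_skyline(stream, window_size=3), start=1):
        print(step, sky)
```

Recomputing the whole skyline on every update is what makes this baseline slow. Storing dominance relations between entries, as the abstract describes for the BJR-tree, is one way to avoid repeating those comparisons.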

Keywords

Algorithm, Streaming application, Continuous skyline computation

Notes

Acknowledgements

This work was partially funded by the ImPACT Program of the Council for Science, Technology and Innovation (Cabinet Office, Government of Japan). We would like to acknowledge Dr. Lei, Dr. Ozeki, Dr. Sugimura, and Dr. Goda for providing measurement results of blood cells. We thank H. Tezuka for constructive comments.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Creative Informatics, The University of Tokyo, Tokyo, Japan
  2. School of Information Technologies, University of Sydney, Sydney, Australia
  3. Department of Chemistry, The University of Tokyo, Tokyo, Japan
