A Method and Tool for Automated Induction of Relations from Quantitative Performance Logs

  • Joshua Kimball
  • Calton Pu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11513)

Abstract

Operators use performance logs to manage large-scale web service infrastructures. Detecting, isolating, and diagnosing fine-grained performance anomalies requires integrating system performance measures across space and time. The diversity of these logs' layouts impedes efficient processing and hinders such analyses. Performance logs have unique features that challenge current log parsing techniques. In addition, most current techniques stop at extraction, leaving relational definition as a post-processing activity, which can be a substantial effort at web scale. To achieve scale, we introduce our perftables approach, which automatically interprets performance log data and transforms the text into structured relations. We interpret the signals provided by the layout using our template catalog to induce an appropriate relation. We evaluate our method on a large sample obtained from our experimental computer science infrastructure and on a sample drawn from the wild. On average, we successfully extracted over 97% and 85% of the data, respectively.
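
As a rough illustration of what inducing a relation from a log layout can mean, the following minimal Python sketch matches a log's layout against a small catalog of templates and emits rows as dictionaries. It is not the perftables implementation: the two templates, the function names (induce_relation, looks_tabular, looks_key_value), and the sample inputs are all assumptions made for illustration, and the actual template catalog is not described in the abstract.

```python
import re

# Hypothetical pattern for "key: value" or "key=value" lines (an assumption,
# not taken from the paper).
KEY_VALUE = re.compile(r"^\s*([^:=]+?)\s*[:=]\s*(\S.*)$")


def looks_key_value(lines):
    """Layout signal: every non-blank line is a single key/value pair."""
    return bool(lines) and all(KEY_VALUE.match(line) for line in lines)


def parse_key_value(lines):
    """Fold all key/value lines into one tuple of the relation."""
    return [{m.group(1): m.group(2) for m in map(KEY_VALUE.match, lines)}]


def looks_tabular(lines):
    """Layout signal: a digit-free header row, then rows of equal field count."""
    if len(lines) < 2 or any(ch.isdigit() for ch in lines[0]):
        return False
    return len({len(line.split()) for line in lines}) == 1


def parse_tabular(lines):
    """Use the header fields as attribute names for each following row."""
    header = lines[0].split()
    return [dict(zip(header, row.split())) for row in lines[1:]]


# Illustrative two-entry template catalog: (layout test, parser) pairs.
CATALOG = [
    (looks_key_value, parse_key_value),
    (looks_tabular, parse_tabular),
]


def induce_relation(text):
    """Return rows (list of dicts) using the first template that matches."""
    lines = [line for line in text.splitlines() if line.strip()]
    for matches, parse in CATALOG:
        if matches(lines):
            return parse(lines)
    raise ValueError("no template in the catalog matches this layout")
```

Calling induce_relation on a whitespace-delimited block with a header row yields one dictionary per data row, while a block of "key: value" lines yields a single dictionary. Values stay as strings here; type inference is left out to keep the sketch short.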

Keywords

Information integration, Data cleaning, Data extraction

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Georgia Institute of Technology, Atlanta, USA
