Discussion Tracking in Enron Email Using PARAFAC

In this chapter, we apply a nonnegative tensor factorization algorithm to extract and detect meaningful discussions from one year of electronic mail messages. For the publicly released Enron electronic mail collection, we encode a sparse term-author-month array for subsequent three-way factorization using the PARAllel FACtors (PARAFAC) decomposition first proposed by Harshman. Using nonnegative tensors, we preserve the natural nonnegativity of the data and avoid the subtractive interactions between basis vectors and encodings that arise in techniques such as principal component analysis. Results in thread detection and interpretation are discussed in the context of published Enron business practices and activities, and benchmarks addressing the computational complexity of our approach are provided. The resulting tensor factorizations can be used to produce Gantt-like charts for assessing the duration, order, and dependencies of focused discussions over time.
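The pipeline described above (encode a sparse term-author-month array, fit a nonnegative PARAFAC model, read a Gantt-like timeline off the month factor) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the chapter's implementation: it uses raw term counts with no weighting, Lee-Seung-style multiplicative updates as the nonnegative fitting method, and a 50%-of-peak threshold for marking a discussion as active in a month. All helper names (`build_tensor`, `mttkrp`, `nn_parafac`, `recon_error`, `gantt_rows`) and the toy messages are hypothetical.

```python
import random
from collections import Counter


def build_tensor(messages, n_months):
    """Sparse term-author-month counts as {(term, author, month): count}."""
    terms, authors, counts = {}, {}, Counter()
    for author, month, text in messages:
        j = authors.setdefault(author, len(authors))
        for word in text.lower().split():
            i = terms.setdefault(word, len(terms))
            counts[(i, j, month)] += 1.0  # raw counts (assumption: no term weighting)
    return dict(counts), (len(terms), len(authors), n_months), terms, authors


def mttkrp(entries, factors, mode, rank):
    """Unfolded tensor times Khatri-Rao product of the other factors, over nonzeros only."""
    out = [[0.0] * rank for _ in factors[mode]]
    for idx, val in entries.items():
        for r in range(rank):
            prod = val
            for m in range(3):
                if m != mode:
                    prod *= factors[m][idx[m]][r]
            out[idx[mode]][r] += prod
    return out


def gram(F, rank):
    """F^T F for a factor matrix stored as a list of rows."""
    return [[sum(row[p] * row[q] for row in F) for q in range(rank)] for p in range(rank)]


def nn_parafac(entries, shape, rank, iters=200, eps=1e-9, seed=0):
    """Nonnegative PARAFAC via multiplicative updates, one mode at a time."""
    rng = random.Random(seed)
    factors = [[[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)] for n in shape]
    for _ in range(iters):
        for mode in range(3):
            num = mttkrp(entries, factors, mode, rank)
            # Gram matrix of the Khatri-Rao product = Hadamard product of the
            # Gram matrices of the two factors held fixed in this step.
            H = [[1.0] * rank for _ in range(rank)]
            for m in range(3):
                if m != mode:
                    G = gram(factors[m], rank)
                    H = [[H[p][q] * G[p][q] for q in range(rank)] for p in range(rank)]
            F = factors[mode]
            for i, row in enumerate(F):
                den = [sum(row[q] * H[q][r] for q in range(rank)) for r in range(rank)]
                # Multiply by a ratio of nonnegative terms, so entries stay nonnegative.
                F[i] = [row[r] * num[i][r] / (den[r] + eps) for r in range(rank)]
    return factors


def recon_error(entries, shape, factors):
    """Squared Frobenius error between the tensor and its PARAFAC model."""
    A, B, C = factors
    err = 0.0
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                model = sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))
                err += (entries.get((i, j, k), 0.0) - model) ** 2
    return err


def gantt_rows(month_factor, threshold=0.5):
    """Mark months where a component's weight is >= threshold * its peak."""
    lines = []
    for r in range(len(month_factor[0])):
        col = [row[r] for row in month_factor]
        peak = max(col) or 1.0
        bar = "".join("#" if v >= threshold * peak else "." for v in col)
        lines.append(f"discussion {r}: {bar}")
    return lines


if __name__ == "__main__":
    msgs = [("alice", 0, "pipeline capacity"), ("alice", 1, "pipeline capacity pipeline"),
            ("bob", 2, "power trading california"), ("bob", 3, "california power")]
    entries, shape, terms, authors = build_tensor(msgs, n_months=4)
    factors = nn_parafac(entries, shape, rank=2)
    print("fit error:", round(recon_error(entries, shape, factors), 4))
    for line in gantt_rows(factors[2]):
        print(line)
```

Only the nonzero counts are visited when forming the numerator of each update, which is why a sparse encoding keeps the cost proportional to the number of stored entries rather than to the full tensor size. Each column of the month factor gives one time profile per extracted discussion; thresholding it is just one simple way to turn those profiles into Gantt-like bars.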

References

  • E. Acar, S.A. Çamtepe, M.S. Krishnamoorthy, and B. Yener. Modeling and multiway analysis of chatroom tensors. In ISI 2005: IEEE International Conference on Intelligence and Security Informatics, volume 3495 of Lecture Notes in Computer Science, pages 256-268. Springer, New York, 2005.

  • M.W. Berry and M. Browne. Email surveillance using non-negative matrix factorization. In Workshop on Link Analysis, Counterterrorism and Security, SIAM Conf. on Data Mining, Newport Beach, CA, 2005.

  • M.W. Berry and M. Browne. Email surveillance using nonnegative matrix factorization. Computational & Mathematical Organization Theory, 11:249-264, 2005.

  • M.W. Berry and M. Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval. SIAM, Philadelphia, second edition, 2005.

  • M.W. Berry, M. Browne, A.N. Langville, V.P. Pauca, and R.J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis, 52(1):155-173, 2007.

  • B.W. Bader and T.G. Kolda. Efficient MATLAB computations with sparse and factored tensors. Technical Report SAND2006-7592, Sandia National Laboratories, Albuquerque, New Mexico and Livermore, California, December 2006. Available from World Wide Web: http://csmr.ca.sandia.gov/tgkolda/pubs.html#SAND2006-7592.

  • B.W. Bader and T.G. Kolda. MATLAB Tensor Toolbox, version 2.1. http://csmr.ca.sandia.gov/tgkolda/TensorToolbox/, December 2006.

  • J.D. Carroll and J.J. Chang. Analysis of individual differences in multidimensional scaling via an N-way generalization of ‘Eckart-Young’ decomposition. Psychometrika, 35:283-319, 1970.

  • W.W. Cohen. Enron email dataset. Web page. http://www.cs.cmu.edu/~enron/.

  • N.M. Faber, R. Bro, and P.K. Hopke. Recent developments in CANDECOMP/PARAFAC algorithms: a critical review. Chemometr. Intell. Lab. Syst., 65(1):119-137, January 2003.

  • Federal Energy Regulatory Commission. FERC: Information released in Enron investigation. http://www.ferc.gov/industries/electric/indus-act/wec/enron/info-release.asp.

  • T. Grieve. The decline and fall of the Enron empire. Salon, October 14, 2003. Available from World Wide Web: http://www.salon.com/news/feature/2003/10/14/enron/index_np.html.

  • J.T. Giles, L. Wo, and M.W. Berry. GTP (General Text Parser) Software for Text Mining. In H. Bozdogan, editor, Software for Text Mining, in Statistical Data Mining and Knowledge Discovery, pages 455-471. CRC Press, Boca Raton, FL, 2003.

  • R.A. Harshman. Foundations of the PARAFAC procedure: models and conditions for an “explanatory” multi-modal factor analysis. UCLA Working Papers in Phonetics, 16:1-84, 1970. Available at http://publish.uwo.ca/~harshman/wpppfac0.pdf.

  • T.G. Kolda and B.W. Bader. The TOPHITS model for higher-order web link analysis. In Workshop on Link Analysis, Counterterrorism and Security, 2006. Available from World Wide Web: http://www.cs.rit.edu/~amt/linkanalysis06/accepted/21.pdf.

  • T.G. Kolda, B.W. Bader, and J.P. Kenny. Higher-order web link analysis using multilinear algebra. In ICDM 2005: Proceedings of the 5th IEEE International Conference on Data Mining, pages 242-249. IEEE Computer Society, Los Alamitos, CA, 2005.

  • D. Lee and H. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788-791, 1999.

  • B. McLean and P. Elkind. The Smartest Guys in the Room: The Amazing Rise and Scandalous Fall of Enron. Portfolio, New York, 2003.

  • M. Mørup, L.K. Hansen, and S.M. Arnfred. Sparse higher order non-negative matrix factorization. Neural Computation, 2006. Submitted.

  • M. Mørup. Decomposing event related EEG using parallel factor (PARAFAC). Presentation, August 29, 2005. Workshop on Tensor Decompositions and Applications, CIRM, Luminy, Marseille, France.

  • C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park. Enron dataset. Web page, February 2006. http://cis.jhu.edu/~parky/Enron/enron.html.

  • J. Shetty and J. Adibi. Ex employee status report. Online, 2005. http://www.isi.edu/~adibi/Enron/Enron Employee Status.xls.

  • A. Smilde, R. Bro, and P. Geladi. Multi-Way Analysis: Applications in the Chemical Sciences. Wiley, West Sussex, England, 2004. Available from World Wide Web: http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471986917.html.

  • F. Shahnaz, M.W. Berry, V.P. Pauca, and R.J. Plemmons. Document clustering using non-negative matrix factorization. Information Processing & Management, 42(2):373-386, 2006.

  • N.D. Sidiropoulos, G.B. Giannakis, and R. Bro. Blind PARAFAC receivers for DS-CDMA systems. IEEE Transactions on Signal Processing, 48(3):810-823, 2000.

  • J.-T. Sun, H.-J. Zeng, H. Liu, Y. Lu, and Z. Chen. CubeSVD: a novel approach to personalized Web search. In WWW 2005: Proceedings of the 14th International Conference on World Wide Web, pages 382-390. ACM Press, New York, 2005.

  • G. Tomasi and R. Bro. PARAFAC and missing values. Chemometr. Intell. Lab. Syst., 75(2):163-180, February 2005.

  • L.R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31:279-311, 1966.

Copyright information

© 2008 Springer-Verlag London Limited

About this chapter

Cite this chapter

Bader, B.W., Berry, M.W., Browne, M. (2008). Discussion Tracking in Enron Email Using PARAFAC. In: Berry, M.W., Castellanos, M. (eds) Survey of Text Mining II. Springer, London. https://doi.org/10.1007/978-1-84800-046-9_8

  • DOI: https://doi.org/10.1007/978-1-84800-046-9_8

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-045-2

  • Online ISBN: 978-1-84800-046-9

  • eBook Packages: Computer Science, Computer Science (R0)
