In this chapter, we apply a nonnegative tensor factorization algorithm to extract and detect meaningful discussions from electronic mail messages over a period of one year. For the publicly released Enron electronic mail collection, we encode a sparse term-author-month array for subsequent three-way factorization using the PARAllel FACtors (PARAFAC) decomposition first proposed by Harshman. Using nonnegative tensors, we preserve the natural nonnegativity of the data and avoid the subtractive basis vector and encoding interactions present in techniques such as principal component analysis. Results in thread detection and interpretation are discussed in the context of published Enron business practices and activities, and benchmarks addressing the computational complexity of our approach are provided. The resulting tensor factorizations can be used to produce Gantt-like charts for assessing the duration, order, and dependencies of focused discussions over time.
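As a rough illustration of the kind of factorization the abstract describes, the following sketch fits a nonnegative three-way PARAFAC model to a small dense array using multiplicative updates in the style of Lee and Seung. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function names (`khatri_rao`, `nonneg_parafac`, `rel_error`), the dense NumPy storage, the random initialization, and the fixed iteration count are all choices made here for illustration, and the array in the chapter is sparse and far larger.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product: row (i * V.shape[0] + j) equals U[i] * V[j]."""
    rank = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, rank)


def nonneg_parafac(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] with A, B, C >= 0.

    Illustrative multiplicative-update scheme (an assumption of this sketch);
    X is a dense nonnegative array of shape (terms, authors, months).
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))

    # Mode-n unfoldings; column order matches the khatri_rao row order above.
    X1 = X.reshape(I, J * K)                     # columns indexed by (j, k)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # columns indexed by (i, k)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # columns indexed by (i, j)

    for _ in range(n_iter):
        Z = khatri_rao(B, C)
        A *= (X1 @ Z) / (A @ (Z.T @ Z) + eps)    # ratio of nonnegative terms
        Z = khatri_rao(A, C)
        B *= (X2 @ Z) / (B @ (Z.T @ Z) + eps)
        Z = khatri_rao(A, B)
        C *= (X3 @ Z) / (C @ (Z.T @ Z) + eps)
    return A, B, C


def rel_error(X, A, B, C):
    """Relative Frobenius-norm error of the rank-R reconstruction."""
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)


def toy_tensor(shape=(6, 5, 4), rank=2, seed=1):
    """Synthetic nonnegative term-author-month array with known rank (made-up data)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((n, rank)) for n in shape]
    return np.einsum("ir,jr,kr->ijk", *factors)
```

Under these assumptions, calling `A, B, C = nonneg_parafac(toy_tensor(), rank=2)` returns one factor matrix per mode. Each column of `C` is then a nonnegative activity profile over the month axis for one extracted component, which is the kind of quantity a Gantt-like chart of discussions would plot, while the matching columns of `A` and `B` weight the terms and authors associated with that component.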
References
E. Acar, S.A. Çamtepe, M.S. Krishnamoorthy, and B. Yener. Modeling and multiway analysis of chatroom tensors. In ISI 2005: IEEE International Conference on Intelligence and Security Informatics, volume 3495 of Lecture Notes in Computer Science, pages 256-268. Springer, New York, 2005.
M.W. Berry and M. Browne. Email surveillance using non-negative matrix factorization. In Workshop on Link Analysis, Counterterrorism and Security, SIAM Conf. on Data Mining, Newport Beach, CA, 2005.
M.W. Berry and M. Browne. Email surveillance using nonnegative matrix factorization. Computational & Mathematical Organization Theory, 11:249-264, 2005.
M.W. Berry and M. Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval. SIAM, Philadelphia, second edition, 2005.
M.W. Berry, M. Browne, A.N. Langville, V.P. Pauca, and R.J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis, 52(1):155-173, 2007.
B.W. Bader and T.G. Kolda. Efficient MATLAB computations with sparse and factored tensors. Technical Report SAND2006-7592, Sandia National Laboratories, Albuquerque, New Mexico and Livermore, California, December 2006. Available from World Wide Web: http://csmr.ca.sandia.gov/~tgkolda/pubs.html#SAND2006-7592.
B.W. Bader and T.G. Kolda. MATLAB Tensor Toolbox, version 2.1. http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox/, December 2006.
J.D. Carroll and J.J. Chang. Analysis of individual differences in multidimensional scaling via an N-way generalization of ‘Eckart-Young’ decomposition. Psychometrika, 35:283-319, 1970.
W.W. Cohen. Enron email dataset. Web page. http://www.cs.cmu.edu/~enron/.
N.M. Faber, R. Bro, and P.K. Hopke. Recent developments in CANDECOMP/PARAFAC algorithms: a critical review. Chemometr. Intell. Lab. Syst., 65(1):119-137, January 2003.
Federal Energy Regulatory Commission. FERC: Information released in Enron investigation. http://www.ferc.gov/industries/electric/indus-act/wec/enron/info-release.asp.
T. Grieve. The decline and fall of the Enron empire. Slate, October 14, 2003. Available from World Wide Web: http://www.salon.com/news/feature/2003/10/14/enron/index_np.html.
J.T. Giles, L. Wo, and M.W. Berry. GTP (General Text Parser) Software for Text Mining. In H. Bozdogan, editor, Software for Text Mining, in Statistical Data Mining and Knowledge Discovery, pages 455-471. CRC Press, Boca Raton, FL, 2003.
R.A. Harshman. Foundations of the PARAFAC procedure: models and conditions for an “explanatory” multi-modal factor analysis. UCLA Working Papers in Phonetics, 16:1-84, 1970. Available at http://publish.uwo.ca/~harshman/wpppfac0.pdf.
T.G. Kolda and B.W. Bader. The TOPHITS model for higher-order web link analysis. In Workshop on Link Analysis, Counterterrorism and Security, 2006. Available from World Wide Web: http://www.cs.rit.edu/~amt/linkanalysis06/accepted/21.pdf.
T.G. Kolda, B.W. Bader, and J.P. Kenny. Higher-order web link analysis using multilinear algebra. In ICDM 2005: Proceedings of the 5th IEEE International Conference on Data Mining, pages 242-249. IEEE Computer Society, Los Alamitos, CA, 2005.
D. Lee and H. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788-791, 1999.
B. McLean and P. Elkind. The Smartest Guys in the Room: The Amazing Rise and Scandalous Fall of Enron. Portfolio, New York, 2003.
M. Mørup, L. K. Hansen, and S. M. Arnfred. Sparse higher order non-negative matrix factorization. Neural Computation, 2006. Submitted.
M. Mørup. Decomposing event related EEG using parallel factor (PARAFAC). Presentation, August 29, 2005. Workshop on Tensor Decompositions and Applications, CIRM, Luminy, Marseille, France.
C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park. Enron dataset. Web page, February 2006. http://cis.jhu.edu/~parky/Enron/enron.html.
J. Shetty and J. Adibi. Ex employee status report. Online, 2005. http://www.isi.edu/~adibi/Enron/Enron_Employee_Status.xls.
A. Smilde, R. Bro, and P. Geladi. Multi-Way Analysis: Applications in the Chemical Sciences. Wiley, West Sussex, England, 2004. Available from World Wide Web: http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471986917.html.
F. Shahnaz, M.W. Berry, V.P. Pauca, and R.J. Plemmons. Document clustering using non-negative matrix factorization. Information Processing & Management, 42(2):373-386, 2006.
N.D. Sidiropoulos, G.B. Giannakis, and R. Bro. Blind PARAFAC receivers for DS-CDMA systems. IEEE Transactions on Signal Processing, 48(3):810-823, 2000.
J.-T. Sun, H.-J. Zeng, H. Liu, Y. Lu, and Z. Chen. CubeSVD: a novel approach to personalized Web search. In WWW 2005: Proceedings of the 14th International Conference on World Wide Web, pages 382-390. ACM Press, New York, 2005.
G. Tomasi and R. Bro. PARAFAC and missing values. Chemometr. Intell. Lab. Syst., 75(2):163-180, February 2005.
L.R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31:279-311, 1966.
Copyright information
© 2008 Springer-Verlag London Limited
About this chapter
Cite this chapter
Bader, B.W., Berry, M.W., Browne, M. (2008). Discussion Tracking in Enron Email Using PARAFAC. In: Berry, M.W., Castellanos, M. (eds) Survey of Text Mining II. Springer, London. https://doi.org/10.1007/978-1-84800-046-9_8
DOI: https://doi.org/10.1007/978-1-84800-046-9_8
Publisher Name: Springer, London
Print ISBN: 978-1-84800-045-2
Online ISBN: 978-1-84800-046-9
eBook Packages: Computer Science, Computer Science (R0)