Pattern-Based Causal Feature Extraction
The cause-effect pairs challenge was motivated by the contrast between the high cost of performing controlled experiments to determine causality and the abundance of observational data. Our goal was to produce, from observational data alone, a score representing our confidence in a causal relationship, which would help identify the most promising variables for experimental verification of that relationship. By identifying patterns in the functions that generate relevant features, we designed a feature extraction pipeline that creates large numbers of complex features with minimal human intervention. Using this pipeline, we finished second on the public leaderboard and first on the private leaderboard. By default, this process generates over 20,000 features. In this paper, we analyze which aspects of the pipeline matter most and build a new pipeline that achieves comparable performance with only 324 features.
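The abstract does not list the pipeline's actual building blocks, so the following is only a minimal sketch of the general idea under our own assumptions: each feature is a combination of interchangeable parts (a preprocessing step, a statistic, and a direction between the two variables A and B), so every new part is automatically combined with all existing ones. The three stage dictionaries (`PREPROCESSORS`, `UNARY`, `BINARY`) and every function in them, including the `conditional_spread` asymmetry measure, are hypothetical illustrations, not the authors' code.

```python
import math
import statistics
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def standardize(v: Sequence[float]) -> Vector:
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = statistics.fmean(v)
    sd = statistics.pstdev(v) or 1.0  # avoid division by zero on constant input
    return [(x - mu) / sd for x in v]


def ranks(v: Sequence[float]) -> Vector:
    """Replace each value by its position in sorted order (ties broken by index)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    out = [0.0] * len(v)
    for r, i in enumerate(order):
        out[i] = float(r)
    return out


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def conditional_spread(a: Sequence[float], b: Sequence[float], bins: int = 4) -> float:
    """Hypothetical asymmetric measure: mean spread of b within equal-size bins of a."""
    paired = sorted(zip(a, b))
    size = max(1, len(paired) // bins)
    chunks = [paired[i:i + size] for i in range(0, len(paired), size)]
    return statistics.fmean(statistics.pstdev([y for _, y in c]) for c in chunks)


# Hypothetical stage lists; the real pipeline's functions are not given in the abstract.
PREPROCESSORS: Dict[str, Callable[[Sequence[float]], Vector]] = {
    "raw": list,
    "std": standardize,
    "rank": ranks,
}
UNARY: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.fmean,
    "pstdev": statistics.pstdev,
    "range": lambda v: max(v) - min(v),
}
BINARY: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "pearson": pearson,
    "cond_spread": conditional_spread,
}


def extract_features(a: Sequence[float], b: Sequence[float]) -> Dict[str, float]:
    """Apply every (preprocessor, statistic) combination in both directions."""
    feats: Dict[str, float] = {}
    for p_name, prep in PREPROCESSORS.items():
        pa, pb = prep(a), prep(b)
        for f_name, f in UNARY.items():
            fa, fb = f(pa), f(pb)
            feats[f"{p_name}|{f_name}|A"] = fa
            feats[f"{p_name}|{f_name}|B"] = fb
            feats[f"{p_name}|{f_name}|A-B"] = fa - fb  # asymmetry between variables
        for g_name, g in BINARY.items():
            ab, ba = g(pa, pb), g(pb, pa)
            feats[f"{p_name}|{g_name}|A->B"] = ab
            feats[f"{p_name}|{g_name}|B->A"] = ba
            feats[f"{p_name}|{g_name}|diff"] = ab - ba  # direction asymmetry
    return feats


if __name__ == "__main__":
    xs = [float(i) for i in range(20)]
    ys = [x ** 2 for x in xs]
    print(len(extract_features(xs, ys)))  # 45
```

With 3 preprocessors, 3 unary statistics and 2 pairwise statistics, each producing 3 outputs, the sketch yields 3 × (3 × 3 + 2 × 3) = 45 features. Because every new entry in one stage is crossed with every entry in the others, a design of this shape can grow to very large feature counts without each feature being written by hand.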
Keywords: Feature extraction, Machine learning, Causality
Special thanks to the organizers of the ChaLearn Cause-Effect Pair Challenge hosted by Kaggle.