Abstract
Due to the recent explosion of research data produced by novel scientific instruments and the corresponding experiments, automatic features, in particular in data analysis, have become more essential than ever. In this paper we present a new Automatic Analysis Framework (AAF) that increases the productivity of data analysis. The AAF can be used for classification, prediction and clustering. It is built upon the workflow engine Taverna, which is widely used in different domains and for which a large number of activities exist for various kinds of analytical methods. The AAF enables scientists to modify our predefined Taverna workflow and to extend it with other available activities. To execute the analytical methods, in particular to compute the results, we use our own cloud-based Code Execution Framework (CEF). It provides web services that execute problem solving environment code, such as MATLAB, Octave, and R scripts, in parallel in the cloud. The combination of the AAF and the CEF enables scientists to conduct time-consuming calculations easily, without having to enumerate the possible combinations of independent variables by hand. It furthermore automatically evaluates all identified models and provides a service for the scientists conducting the analysis. The framework has been tested and evaluated with real breath gas data.
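The idea of enumerating combinations of independent variables and evaluating every resulting model automatically can be illustrated with a minimal sketch. This is not the AAF or CEF implementation: all function names are hypothetical, a nearest-centroid classifier stands in for whatever analytical method a workflow would call, k-fold cross-validation is assumed as the evaluation measure, a local thread pool stands in for parallel execution in the cloud, and the data are synthetic rather than breath gas measurements.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import random


def variable_subsets(names, max_size):
    """Yield every non-empty combination of variable names up to max_size."""
    for k in range(1, max_size + 1):
        yield from combinations(names, k)


def nearest_centroid_predict(train_rows, train_labels, row):
    """Placeholder classifier: pick the class whose centroid is closest."""
    sums, counts = {}, {}
    for r, y in zip(train_rows, train_labels):
        acc = sums.setdefault(y, [0.0] * len(r))
        for i, v in enumerate(r):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    best_label, best_dist = None, float("inf")
    for y, acc in sums.items():
        centroid = [s / counts[y] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(row, centroid))
        if dist < best_dist:
            best_label, best_dist = y, dist
    return best_label


def cv_accuracy(data, labels, subset, folds=5):
    """k-fold cross-validated accuracy using only the variables in subset."""
    rows = [[sample[name] for name in subset] for sample in data]
    correct = 0
    for f in range(folds):
        train_idx = [i for i in range(len(rows)) if i % folds != f]
        test_idx = [i for i in range(len(rows)) if i % folds == f]
        train_rows = [rows[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        for i in test_idx:
            if nearest_centroid_predict(train_rows, train_labels, rows[i]) == labels[i]:
                correct += 1
    return correct / len(rows)


def rank_models(data, labels, names, max_size=2, workers=4):
    """Evaluate one model per variable subset in parallel, best score first."""
    subsets = list(variable_subsets(names, max_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda s: cv_accuracy(data, labels, s), subsets))
    return sorted(zip(subsets, scores), key=lambda pair: pair[1], reverse=True)


# Synthetic example: variable "a" separates the two classes, "b" and "c" are noise.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
data = [
    {"a": 10 * y + rng.uniform(-1, 1), "b": rng.uniform(-1, 1), "c": rng.uniform(-1, 1)}
    for y in labels
]
ranked = rank_models(data, labels, ["a", "b", "c"])
for subset, score in ranked:
    print(subset, round(score, 2))
```

In this sketch each variable subset is an independent task, which is why the evaluations can be dispatched in parallel; a remote execution service could replace the thread pool without changing the enumeration or ranking steps.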
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ludescher, T., Feilhauer, T., Amann, A., Brezany, P. (2013). Towards a High Productivity Automatic Analysis Framework for Classification: An Initial Study. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2013. Lecture Notes in Computer Science, vol 7987. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39736-3_3
DOI: https://doi.org/10.1007/978-3-642-39736-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39735-6
Online ISBN: 978-3-642-39736-3
eBook Packages: Computer Science, Computer Science (R0)