Pattern-Based Energy Consumption Analysis by Chaining Principle Component Analysis and Logistic Regression

Analytics for Smart Energy Management

Part of the book series: Springer Series in Advanced Manufacturing (SSAM)

Abstract

It is often necessary to carry out sensor-based condition monitoring of machines or operations (e.g., machining centre, foundry) during production to ensure their effectiveness. However, because installation must be non-invasive and must not interrupt production, it may be difficult to fully instrument a machine or production equipment with monitoring sensors. As an alternative to direct monitoring, power or temperature data, and data from other easy-to-install sensors, measured at a relatively high time resolution (~2 s), can provide enough information to infer events and other properties effectively. For this reason, the ability to infer becomes important. To show how inference technology can be used in energy management, this chapter presents a pattern-based energy consumption analysis that chains Principal Component Analysis (PCA) and logistic regression. PCA provides an unsupervised dimension reduction that mitigates multicollinearity (high dependence) among the explanatory variables, while logistic regression makes predictions from the reduced dataset, which is expressed on orthogonal axes: the uncorrelated principal components given by the eigenvectors found in the PCA. By chaining PCA and logistic regression, it is possible to train on manually time-logged energy data and to infer the events associated with manufacturing operations. The proposed analysis method is expected to enable manufacturing companies to correlate energy and operations and to use power data to predict when operation events of interest (e.g., start up, idle, peak operation) occur, and thereby to determine how current energy usage levels in manufacturing operations compare with optimal usage patterns. This chapter also provides a short introduction to Python and IPython Notebook. It illustrates a supervised learning process that uses Python to pipeline PCA and logistic regression and applies a grid search to train on and infer energy consumption patterns.

References

  • Aguilera AM, Escabias M, Valderrama MJ (2006) Using principal components for estimating logistic regression with high-dimensional multicollinear data. Comput Stat Data Anal 50:1905–1924

  • Cangelosi R, Goriely A (2007) Component retention in principal component analysis with application to cDNA microarray data. Available online: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1797006/. Accessed 18 August 2015

  • Camminatiello I, Lucadamo A (2010) Estimating multinomial logit model with multicollinear data. Asian J Math Stat 3:93–101

  • Dahmus JB, Gutowski TC (2004) An environmental analysis of machining. In: ASME international mechanical engineering congress and RD&D Expo

  • Fang K, Uhan N, Zhao F, Sutherland WJ (2011) A new approach to scheduling in manufacturing for power consumption and carbon footprint reduction. J Manuf Syst 30:234–240

  • Greene WH (2012) Econometric analysis, 7th edn. Prentice Hall, New Jersey

  • Gutowski TC, Murphy C, Allen D, Bauer D, Bras B, Piwonka T, Sheng P, Sutherland J, Thurston D, Wolff E (2005) Environmentally benign manufacturing: observations from Japan, Europe and the United States. J Cleaner Prod 13:1–17

  • Oh S-C, Hildreth AJ (2013) Decisions on energy demand response option contracts in smart grids based on activity-based costing and stochastic programming. Energies 6:425–443

  • Oh S-C, Hildreth AJ (2014) Estimating the technical improvement of energy efficiency in the automotive industry: stochastic and deterministic frontier benchmarking approaches. Energies 9:6198–6222

  • Oh S-C, D’Arcy JB, Arinez JF, Biller SR, Hildreth AJ (2011) Assessment of energy demand response options in smart grid utilizing the stochastic programming approach. In: Proceedings of IEEE power and energy society general meeting, Detroit, MI, USA, 24–28 July

  • Raschka S (2014) Implementing a Principal Component Analysis (PCA) in Python step by step. Available online: http://sebastianraschka.com/Articles/2014_pca_step_by_step.html. Accessed 12 August 2015

  • Scikit-Learn Machine Learning in Python. Available online: http://scikit-learn.org/stable/. Accessed 6 August 2015

  • Smith LI (2002) A tutorial on Principal Components Analysis. Available online: http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf. Accessed 11 August 2015

  • Train KE (2003) Discrete choice methods with simulation. Cambridge University Press, Cambridge, USA

  • Turk M, Pentland A (1991) Eigenfaces for recognition. J Cogn Neurosci 3(1):71–86

  • Weisstein E (2014) K-Means clustering algorithm. Available online: http://mathworld.wolfram.com/K-MeansClusteringAlgorithm.html. Accessed 18 August 2015

Author information

Corresponding author

Correspondence to Seog-Chan Oh.

Appendix: Getting Started with IPython Notebook for Energy Pattern Analysis

5.1.1 Introducing, Getting and Installing IPython Notebook

This chapter uses Python to illustrate the process of training on and inferring energy consumption patterns for machining operations. Specifically, it uses the Scikit-Learn machine learning library. Scikit-Learn is a powerful tool for machine learning that provides several modules for working with classification, regression and clustering problems. Technically speaking, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number, for instance a multi-dimensional entry (also known as multivariate data), it is said to have several attributes or features. In general, learning problems can be separated into a few large categories:

  • Supervised learning, in which the data comes with additional attributes that are targets for prediction. Supervised learning problems include classification and regression. Classification is concerned with samples belonging to two or more classes, and the goal is to learn from already labeled data how to predict the class of unlabeled data. Regression applies when the desired output consists of one or more continuous variables. An example of a regression problem would be the prediction of the length of a salmon as a function of its age and weight.

  • Unsupervised learning, in which the training data consists of a set of input vectors x without any corresponding or desired target values. The goal of unsupervised learning may be to discover groups of similar examples within the data (clustering), to determine the distribution of data within the input space (density estimation), or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization.

This Appendix illustrates a supervised learning process that uses Python to pipeline PCA and logistic regression and applies a grid search to train on and infer energy consumption patterns, as described in this chapter. However, this Appendix is not a complete introduction to Python. For further instruction on the rest of the Scikit-Learn machine learning library, see Scikit-Learn Machine Learning in Python, which is available at http://scikit-learn.org/stable/. Many articles are available on programming in Python. Raschka (2014) shared Python code that implements PCA step by step. A Computer Science course offered by Stanford University, CS231n: Convolutional Neural Networks for Visual Recognition, provides a Python tutorial that focuses on programming with a few popular libraries such as numpy, scipy and matplotlib; it is available at http://cs231n.github.io/python-numpy-tutorial/.

IPython notebook extends the console-based approach of the original Python to interactive computing by providing a web-based application. IPython notebook therefore lets users write and execute Python code in their web browser. Because it makes it very easy to tinker with code and execute it in bits and pieces, IPython notebook is widely used in scientific computing.

There are many ways to install IPython notebook, but the most convenient on Windows is to download WinPython, which is available at https://winpython.github.io/. Once WinPython is downloaded and installed, IPython notebook can be started by launching IPython Notebook.exe as in Fig. 5.14.

Fig. 5.14 IPython Notebook in MS Windows

Once IPython is running, users should point their web browser at http://localhost:8888 to start using IPython notebooks. If everything worked correctly, a user should see a screen as in Fig. 5.15, showing all available IPython notebooks in the current directory. Note that there is Tutorial.ipynb available already.

Fig. 5.15 IPython Notebook in operation in a web browser and its server console

If a user clicks through to the built-in notebook file, Tutorial.ipynb, the user will see a screen as in Fig. 5.16.

Fig. 5.16 IPython Notebook cells

An IPython notebook is made up of a number of cells, each containing Python code. A user can execute a cell by clicking on Cell|Run or by pressing Shift-Enter. The code in the cell then runs, and its output is displayed beneath the cell as in Fig. 5.16.

5.1.2 An Introductory Python Session

The final goal of this Python tutorial is to make users familiar with the Python machine learning library Scikit-Learn. As prerequisites, users need to learn how to manipulate the various Python data types and some scientific libraries, for example numpy.array and matplotlib.pyplot. Like most languages, Python has a number of basic types including integers, floats, booleans, and strings. These data types behave in ways that are familiar from other programming languages. The following examples show how to manipulate the various Python data types. They modify and extend the examples available at http://cs231n.github.io/python-numpy-tutorial/ to match the theme of this book.

Note that Python does not provide unary increment (x++) or decrement (x--) operators.
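The basic numeric types can be sketched as follows. This is a minimal illustration, and the variable names x and y are chosen here rather than taken from the chapter.

```python
# Integers and floats; x and y are illustrative names.
x = 3
print(type(x))   # <class 'int'>
print(x + 1)     # 4
print(x * 2)     # 6
print(x ** 2)    # 9 (exponentiation)

# Python has no x++ or x--; use augmented assignment instead.
x += 1
print(x)         # 4

y = 2.5
print(type(y))   # <class 'float'>
print(y * 2)     # 5.0
```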

Python implements the usual operators for Boolean logic with the English words ‘and’, ‘or’ and ‘not’ rather than symbols; an exclusive OR (XOR) of two Boolean values can be written with ‘!=’.
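A short sketch of the Boolean operators, using two illustrative variables t and f:

```python
t, f = True, False

print(t and f)   # False (logical AND)
print(t or f)    # True  (logical OR)
print(not t)     # False (logical NOT)
print(t != f)    # True  (XOR of two Booleans)
```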

Python has great support for strings and provides many useful methods, for example converting to upper or lower case, replacing substrings, and sprintf-style string formatting. Note that string literals can use either single quotes or double quotes.
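The string methods mentioned above might be used as in the following sketch. The event and site names are illustrative values, not data from the chapter.

```python
event = 'idle'               # single quotes
site = "machining centre"    # double quotes work the same way

print(event.upper())                     # IDLE
print(event.capitalize())                # Idle
print(site.replace('centre', 'center'))  # machining center

# sprintf-style formatting with the % operator
label = '%s at %s: %d kW' % (event, site, 12)
print(label)                             # idle at machining centre: 12 kW
```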

A powerful feature of Python is its built-in container types: lists, dictionaries, sets, and tuples. A list is the equivalent of an array, but it is resizable and can contain elements of different types. Accessing sublists of a list, called slicing, is easy. Building a new list from an existing one with an optional condition, called a list comprehension, is also available.
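Lists, slicing and list comprehensions can be sketched as follows. The readings are made-up power values used only for illustration.

```python
# A list may hold mixed types and can grow or shrink.
readings = [3.0, 5.5, 'n/a', 7.2]
readings.append(6.1)       # add an element at the end
readings.remove('n/a')     # drop the non-numeric entry
print(readings)            # [3.0, 5.5, 7.2, 6.1]

# Slicing returns a sublist.
first_two = readings[:2]
print(first_two)           # [3.0, 5.5]
print(readings[-1])        # 6.1 (negative index counts from the end)

# List comprehension with a condition.
high = [r for r in readings if r > 5.0]
print(high)                # [5.5, 7.2, 6.1]
```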

A dictionary is another container type provided by Python. It stores (key, value) pairs, similar to a lookup table in other languages. Values can be looked up efficiently by their keys, and the entries of a dictionary can be iterated over.
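A brief sketch of dictionary use follows. The mapping from event names to integer codes is hypothetical and only serves as an example.

```python
# Hypothetical mapping from event name to an integer code.
event_codes = {'start up': 0, 'idle': 1}
event_codes['peak operation'] = 2          # add a new entry

print(event_codes['idle'])                 # 1
print('shutdown' in event_codes)           # False
print(event_codes.get('shutdown', -1))     # -1 (default for a missing key)

# Iterate over (key, value) pairs.
for name, code in event_codes.items():
    print('%s -> %d' % (name, code))
```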

A set is another container type provided by Python. It holds an unordered collection of distinct elements. Iterating over a set has the same syntax as iterating over a list, but because sets are unordered, no assumption can be made about the order in which the elements are visited.
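Sets can be sketched as below, again with illustrative event names. Sorting before printing is a choice made here to obtain a repeatable order.

```python
events = {'idle', 'start up'}
events.add('idle')             # duplicates are ignored
events.add('peak operation')

print(len(events))             # 3
print('idle' in events)        # True

# Iteration order of a set is not guaranteed, so sort for display.
for e in sorted(events):
    print(e)
```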

A tuple is another container type provided by Python. It is an immutable ordered list of values. Unlike a list, a tuple can be used as a key in a dictionary and as an element of a set.
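The following sketch uses tuples as dictionary keys. The (start, end) time windows and their labels are hypothetical.

```python
# A tuple describing a hypothetical time window (start, end) in seconds.
window = (0, 64)

# Tuples are hashable, so they can serve as dictionary keys.
labels = {(0, 64): 'start up', (64, 128): 'idle'}
print(labels[window])          # start up

# Tuples are immutable: item assignment raises TypeError.
try:
    window[0] = 1
except TypeError as err:
    print('tuples are immutable:', err)
```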

Python is an object-oriented language, so users can define their own classes with the class keyword. Functions, including the methods inside a class, are defined using the def keyword.
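A minimal class might look like the sketch below. EnergyWindow and its method mean_power are names invented for this illustration and do not appear in the chapter.

```python
class EnergyWindow:
    """Hypothetical container for the power readings in one time window."""

    def __init__(self, label, readings):
        self.label = label          # event name for this window
        self.readings = readings    # list of power values

    def mean_power(self):
        # Average of the readings in the window.
        return sum(self.readings) / len(self.readings)


w = EnergyWindow('idle', [2.0, 4.0, 6.0])
print(w.label)         # idle
print(w.mean_power())  # 4.0
```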

Python has a core library for scientific computing, Numpy. It provides a high-performance multidimensional array object and tools for working with these arrays. A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is called the rank of the array, while the shape of an array is a tuple of integers giving the size of the array along each dimension. Numpy also provides many functions to create arrays.
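Rank, shape, indexing and a few array-creation functions can be sketched as follows, using small arrays chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.ndim)     # 2 (rank: number of dimensions)
print(a.shape)    # (2, 3)
print(a[0, 1])    # 2 (indexed by a tuple of integers)

# A few functions that create arrays.
zeros = np.zeros((2, 2))   # all zeros
ones = np.ones((1, 3))     # all ones
eye = np.eye(2)            # 2 x 2 identity matrix
print(zeros.shape, ones.shape, eye.shape)
```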

Transposing a matrix is simple in Numpy, using the T attribute of an array object.

Computing the sum of each column or row is simple in Numpy, using the sum function.

Basic mathematical functions operate elementwise on arrays and are available both as operator overloads and as functions in Numpy.
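Transposition, column and row sums, and elementwise arithmetic can be sketched together on two small matrices chosen for illustration.

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0, 6.0], [7.0, 8.0]])

print(x.T)                  # transpose: rows become columns

print(np.sum(x, axis=0))    # [4. 6.]  sum of each column
print(np.sum(x, axis=1))    # [3. 7.]  sum of each row

# Elementwise operations: operator and function forms are equivalent.
print(x + y)
print(np.add(x, y))
print(x * y)                # elementwise product, not a matrix product

# Matrix product uses dot.
print(x.dot(y))
```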

Broadcasting is a powerful mechanism in Numpy that allows arithmetic operations between arrays of different shapes. For example, adding a vector y to a matrix x with broadcasting gives the same result as adding y to each row of x with an explicit loop.
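The equivalence between broadcasting and an explicit loop can be checked with a small sketch; the values of x and y are arbitrary.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([1, 0, 1])

# Explicit loop: add y to each row of x.
looped = np.empty_like(x)
for i in range(x.shape[0]):
    looped[i, :] = x[i, :] + y

# Broadcasting: y is stretched across the rows of x automatically.
broadcast = x + y

print(broadcast)
print(np.array_equal(looped, broadcast))   # True
```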

Matplotlib is a plotting library in Python. The most important function in matplotlib is plot, which allows you to plot 2D data.
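A minimal plotting sketch follows. The power profile is synthetic and the 2 s sampling interval is borrowed from the abstract; the Agg backend line is an assumption made so that the snippet runs without a display and can be omitted inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic power profile sampled every 2 s (illustrative values only).
t = np.arange(0, 128, 2)
power = 5.0 + 2.0 * np.sin(t / 10.0)

fig, ax = plt.subplots()
ax.plot(t, power)               # 2D line plot of power against time
ax.set_xlabel('time (s)')
ax.set_ylabel('power (kW)')
ax.set_title('Synthetic energy pattern')
```

In an interactive session, calling plt.show() after these lines displays the figure.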

SciPy is a library that builds on Numpy and provides a large number of functions that are useful for different types of scientific and engineering applications. SciPy also provides some basic functions to work with images. For further instructions on image processing using SciPy, see http://www.scipy-lectures.org/advanced/image_processing/.

5.1.3 A Python Script for Energy Pattern Analysis

This section illustrates a supervised learning process that uses Python to pipeline PCA and logistic regression and applies a grid search to train on and infer energy consumption patterns, as described in this chapter. First, the training datasets are prepared; they show the energy patterns, in bitmap format and in vector format, that correspond to the events of interest to monitor. In total, 18 bitmap energy patterns are prepared, each with 8 × 8 pixels, which yields 18 vectors of 64 dimensions. These vectors are used as input data for the proposed pattern-based energy consumption analysis.

Once the patterns for training are prepared, PCA can be performed. Note that the training dataset should be clustered into the six energy consumption events of interest to monitor, as described in Table 5.1 and shown in Figs. 5.4 and 5.5. Further, each pattern must be parsed into a one-dimensional vector representing the power within the time window. For this reason, this section converts patterns_img, which is a list, into patterns, which is a numpy.array. patterns has a shape of (18, 64), meaning that it holds 18 vectors, each with 64 elements.
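The conversion from a list of bitmaps to an (18, 64) array can be sketched as follows. The chapter's actual patterns_img is not reproduced here, so the sketch assumes a list of 18 bitmaps, each an 8 × 8 nested list, and fills it with random 0/1 values as a stand-in.

```python
import numpy as np

# Assumption: patterns_img is a list of 18 bitmaps, each an 8 x 8 nested list.
# Random 0/1 pixels stand in for the chapter's real energy patterns.
rng = np.random.RandomState(0)
patterns_img = [rng.randint(0, 2, size=(8, 8)).tolist() for _ in range(18)]

# Flatten each 8 x 8 bitmap into one 64-element row.
patterns = np.array(patterns_img).reshape(len(patterns_img), -1)

print(type(patterns_img))   # <class 'list'>
print(patterns.shape)       # (18, 64)
```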

For training, each vector in patterns should be associated with its pertinent target.
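One way to build the target vector is sketched below. The grouping of patterns into events is given in Table 5.1, which is not reproduced here, so the sketch assumes that the 18 patterns are ordered by event with three patterns per event; the integer codes 0 to 5 are likewise an assumption.

```python
import numpy as np

# Assumption: 18 patterns ordered by event, three per event, coded 0..5.
n_events, per_event = 6, 3
targets = np.repeat(np.arange(n_events), per_event)

print(targets.shape)   # (18,)
print(targets[:6])     # [0 0 0 1 1 1]
```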

This pattern analysis approach chains Principal Component Analysis (PCA) and logistic regression. As introduced in the previous sections, PCA provides an unsupervised dimension reduction that mitigates multicollinearity (high dependence) among the explanatory variables, while logistic regression makes predictions from the reduced dataset, which is expressed on orthogonal axes: the principal components given by the eigenvectors found in the PCA. Therefore, chaining PCA and logistic regression may improve the accuracy of classification.
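The decorrelating effect of PCA can be checked with a small sketch. The data below are synthetic: two columns are built to be almost collinear, which is an assumption made purely to mimic multicollinear explanatory variables.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: column 1 is almost a multiple of column 0 (multicollinearity).
rng = np.random.RandomState(0)
base = rng.randn(50, 1)
X = np.hstack([base, 2.0 * base + 0.1 * rng.randn(50, 1), rng.randn(50, 1)])

corr = np.corrcoef(X, rowvar=False)[0, 1]
print(corr)                      # close to 1: strongly correlated inputs

# Project onto the first two principal components.
Z = PCA(n_components=2).fit_transform(X)
cov = np.cov(Z, rowvar=False)
print(abs(cov[0, 1]) < 1e-8)     # True: the components are uncorrelated
```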

For the detailed theoretical background of PCA and logistic regression, see Sect. 5.3. Because each input vector is 64-dimensional, up to 64 eigenvectors are possible, but the grid search found that the first 7 eigenvectors are optimal for compressing the data efficiently and effectively. The final dataset therefore has 7 dimensions, which reduces the space by approximately 89 % (= (64 - 7)/64). Tables 5.2, 5.3, 5.4 and 5.5 describe the classes used in this example.

Table 5.2 Methods and options of sklearn.decomposition.PCA used in this example
Table 5.3 Methods and options of sklearn.linear_model.LogisticRegression used in this example
Table 5.4 Methods and options of sklearn.pipeline.Pipeline used in this example
Table 5.5 Methods and options of sklearn.grid_search.GridSearchCV used in this example

With the model parameters for PCA and logistic regression found by the grid search over the pipeline, it is possible to infer (predict) events from their power profiles. Figure 5.13 shows the results of inference on energy patterns. The inference accuracy for the patterns used for training is 100 %.
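The whole chain of pipeline, grid search and inference can be sketched as follows. This is not the chapter's script: the data are synthetic stand-ins (six random prototype patterns, each repeated three times with small noise), the step names pca and logistic and the candidate parameter values are chosen here for illustration, and GridSearchCV is imported from sklearn.model_selection because the sklearn.grid_search module named in Table 5.5 has been removed from current Scikit-Learn releases.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the chapter's data (an assumption): six prototype
# 64-element patterns, each repeated three times with small noise.
rng = np.random.RandomState(0)
prototypes = rng.randint(0, 2, size=(6, 64)).astype(float)
patterns = np.repeat(prototypes, 3, axis=0) + 0.05 * rng.randn(18, 64)
targets = np.repeat(np.arange(6), 3)

# Chain PCA (dimension reduction) and logistic regression (classification).
pipe = Pipeline([
    ("pca", PCA()),
    ("logistic", LogisticRegression(max_iter=1000)),
])

# Illustrative candidate values. With 3-fold cross-validation each training
# fold holds 12 samples, so n_components must not exceed 12.
param_grid = {
    "pca__n_components": [2, 5, 7],
    "logistic__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(patterns, targets)
print(search.best_params_)

# Inference on the training patterns with the refitted best pipeline.
predicted = search.predict(patterns)
print((predicted == targets).mean())   # fraction of patterns inferred correctly
```

The selected number of components depends on the data and the grid, so this sketch will not necessarily reproduce the value of 7 reported for the chapter's patterns.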

This example used four classes of the Scikit-Learn machine learning library that play a central role in training on and inferring energy consumption patterns. Tables 5.2, 5.3, 5.4 and 5.5 give a brief summary of those classes.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Oh, SC., Hildreth, A.J. (2016). Pattern-Based Energy Consumption Analysis by Chaining Principle Component Analysis and Logistic Regression. In: Analytics for Smart Energy Management. Springer Series in Advanced Manufacturing. Springer, Cham. https://doi.org/10.1007/978-3-319-32729-7_5

  • DOI: https://doi.org/10.1007/978-3-319-32729-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32728-0

  • Online ISBN: 978-3-319-32729-7

  • eBook Packages: Engineering, Engineering (R0)
