1 Introduction

On July 1, 1957, the Tamil Nadu Electricity Board (TNEB) came into being, and it has remained the state's energy provider and distributor ever since. After 53 years, the TNEB was restructured as per the mandatory requirements of the Electricity Act 2003. The restructuring established a holding company, TNEB Limited, and two subsidiary corporations: Tamil Nadu Generation and Distribution Corporation Limited (TANGEDCO) and Tamil Nadu Transmission Corporation Limited (TANTRANSCO).

Power scarcity is now a critical problem in Tamil Nadu because of increasing per capita energy consumption. When demand increases, the price or rate tends to increase as well, and the balance between price and demand becomes unstable. Electricity is needed for the everyday activities of all people. Because of the power shortage, power cuts have become frequent, whereas consumers need an uninterrupted power supply. The available load (capacity) does not satisfy the demand (need) because too little electricity is generated. So, before power plants are constructed, the electricity board has to know the demand. Electricity demand therefore has to be forecasted so that all activities in society run smoothly, and such forecasts can also be put to use in a smart grid environment.

2 Related Work

Conejo et al. [1] used the wavelet transform and AutoRegressive Integrated Moving Average (ARIMA) models to predict the day-ahead electricity price. Aggarwal et al. [2] divided each day into segments and applied a multiple linear regression model to the original series obtained by the wavelet transform depending on the segment. Equally noticeable was the approach proposed by García et al. [3], in which a forecasting technique based on a generalized autoregressive conditional heteroscedasticity (GARCH) model was presented. Transfer function models based on past electricity prices and demand were proposed to forecast day-ahead electricity prices by Nogales and Conejo [4], but the prices of all 24 hours of the previous day were not known.

Weron and Misiorek [5] presented 12 parametric and semiparametric time series models to predict electricity prices for the next day. Amjady [6] proposed a hybrid model that combined artificial neural networks (ANN) and fuzzy logic. Recently, Pindoria and Singh [7] proposed an artificial neural network in which the output of the hidden layer neurons was based on wavelets that adapted their shape to the training data. A modification of the weighted nearest neighbor (WNN) methodology was proposed by Troncoso et al. [8]. Zhao et al. [9] proposed a data mining framework based on both support vector machines (SVM) and a probability classifier.

Gonzalez et al. [10] proposed a forecasting technique to predict next-day electricity spot prices and volatilities. Romero and Hernandez [11] suggested that knowledge of future events allows preventive measures to be taken. Their primary goal is to realize the need for prediction of a set of data, and the prediction is performed by an ANN using backpropagation as the learning algorithm. Saigal and Mehrotra [12] implemented four models, namely multiple regression in Excel, multiple linear regression of dedicated time series analysis in Weka, a vector autoregressive model in R, and a neural network model using NeuralWorks Predict. All these models are compared on the basis of the forecasting errors generated by them.

The paper is organized as follows. Sections 1 and 2 cover the introduction and related work. Section 3 describes the structure of the electricity domain dataset. Section 4 describes the proposed system. Section 5 shows the experimental results. Section 6 presents the performance evaluation and comparison. Finally, Section 7 concludes the work.

3 Electricity Domain Dataset Structure

Consumer numbers are generated for all consumers statewide. For electricity usage, Tamil Nadu State is divided into 9 regions, as listed in Table 1. Electricity bills can now be paid online, which makes payment easy. Each region is divided into a number of circles, each circle into sections, and each section into distributions. Distributions maintain the customers' electricity consumption details. A consumer number is formed by concatenating the region number, section number, distribution number, and service number.
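The concatenation rule for consumer numbers can be illustrated with a minimal sketch. The component order follows the text, but the zero-padded field widths (2, 3, 3, and 4 digits) and the function names are assumptions made only for illustration; the actual TNEB field lengths are not stated here.

```python
# Hypothetical field widths (digits) for region, section, distribution,
# and service number. The source gives the order, not the lengths.
ASSUMED_WIDTHS = (2, 3, 3, 4)


def build_consumer_number(region, section, distribution, service,
                          widths=ASSUMED_WIDTHS):
    """Concatenate the four zero-padded components into one string."""
    parts = (region, section, distribution, service)
    for value, width in zip(parts, widths):
        if value < 0 or len(str(value)) > width:
            raise ValueError(f"{value} does not fit in {width} digits")
    return "".join(str(v).zfill(w) for v, w in zip(parts, widths))


def split_consumer_number(number, widths=ASSUMED_WIDTHS):
    """Recover (region, section, distribution, service) from the string."""
    if len(number) != sum(widths):
        raise ValueError("unexpected consumer number length")
    fields, start = [], 0
    for width in widths:
        fields.append(int(number[start:start + width]))
        start += width
    return tuple(fields)


if __name__ == "__main__":
    number = build_consumer_number(5, 12, 7, 345)
    print(number)
    print(split_consumer_number(number))
```

With fixed-width fields, the hierarchy (region, section, distribution) can be read back from the number itself, which is what makes grouping consumption records by region or section straightforward.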

Table 1 Region names and their numbers

In this study, the Madurai Metro Circle in the Madurai region is taken as the input dataset for predicting the electricity load demand, using real-time data for the domestic category from 2008 to 2013. The study area currently has 206,947 electricity consumers. The dataset contains information about consumption in terms of period and units assessed. Monthly data for the domestic category were collected from the TNEB and stored in CSV format. Data are available for all 60 months of the 5 years listed in Table 2.

Table 2 Statistical data for the period 2008–2013

4 Proposed System

Here, we forecast the electricity demand using four predictive data mining models: support vector machine, multilayer perceptron, linear regression, and Gaussian processes. The model that performs best on the train data is then applied to the test data to produce the final forecast. Figure 1 shows how the proposed system finds the best model when the dataset is applied to it.
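The selection step can be sketched as follows: fit every candidate on the training months, forecast the held-out months, and keep the candidate with the lowest error. This is only an illustrative sketch. The three forecasters below (last value, historical mean, least-squares trend) are simple stand-ins chosen so the example runs with the Python standard library; they are not the four Weka models used in this study, and all function names are hypothetical.

```python
import math


def naive_last(history, horizon):
    """Repeat the last observed value for every future month."""
    return [history[-1]] * horizon


def historical_mean(history, horizon):
    """Forecast the mean of the history for every future month."""
    mean = sum(history) / len(history)
    return [mean] * horizon


def linear_trend(history, horizon):
    """Fit y = a + b*t by ordinary least squares and extrapolate."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n, n + horizon)]


def rmse(actual, predicted):
    """Root-mean-squared error between two equally long sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def select_best_model(train, holdout, candidates):
    """Return the name of the lowest-RMSE candidate and all scores."""
    scores = {name: rmse(holdout, model(train, len(holdout)))
              for name, model in candidates.items()}
    return min(scores, key=scores.get), scores


# Stand-in candidates; the study itself compares four Weka models.
CANDIDATES = {
    "naive_last": naive_last,
    "historical_mean": historical_mean,
    "linear_trend": linear_trend,
}

if __name__ == "__main__":
    series = [10 + 2 * t for t in range(12)]  # synthetic, not TNEB data
    best, scores = select_best_model(series[:9], series[9:], CANDIDATES)
    print(best, scores)
```

Keeping each candidate behind the same `(history, horizon)` interface means the comparison loop does not change when a different forecaster is plugged in.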

Fig. 1 Finding the best model using training data and testing data

5 Experimental Results

The Weka tool divides the dataset into two sets, called train data and test data. The train data contain the data for months 1–42. From the train data, the four models predict the next 24 months, i.e., months 43–66. The original data for months 43–60 are available and are compared with the predicted data. The train data are used to build the model, and the test data are used to confirm the soundness of the model. The ability of the model to accurately predict the test output is a measure of model performance.
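The month ranges used in the experiments can be checked with a short sketch. It assumes the 60 monthly records are numbered 1 to 60 in chronological order; the function names are hypothetical and the values are placeholders, not the TNEB data.

```python
def chronological_split(series, n_train):
    """Split a time-ordered series into train and test parts, no shuffling."""
    if not 0 < n_train < len(series):
        raise ValueError("n_train must be between 1 and len(series) - 1")
    return series[:n_train], series[n_train:]


def forecast_months(last_observed_month, horizon):
    """Return the 1-based month indices covered by a forecast."""
    first = last_observed_month + 1
    return list(range(first, first + horizon))


if __name__ == "__main__":
    months = list(range(1, 61))  # placeholder month indices 1..60
    train, test = chronological_split(months, 42)

    from_train = forecast_months(train[-1], 24)  # months 43..66
    from_test = forecast_months(test[-1], 24)    # months 61..84

    # Only months 43..60 of the first forecast have actual values
    # against which the predictions can be compared.
    comparable = [m for m in from_train if m in test]
    print(len(train), len(test), len(comparable))
    print(from_train[0], from_train[-1], from_test[0], from_test[-1])
```

The split is chronological rather than random because a forecast must only ever use months that precede the ones it predicts.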

Figure 2 compares the future predictions obtained from the training data. It reveals that the Gaussian processes model predicts demands closer to the actual demands than the other three models. However, this decision has to be confirmed on the test data, which cover months 43–60. From the test data, the next 24 months, i.e., months 61–84, are predicted using the four models. Figure 3 compares the future predictions obtained from the testing data.

Fig. 2 Comparison between the future predictions of training data

Fig. 3 Comparison between the future predictions of testing data

6 Performance Evaluation and Comparison

All the models are compared on the basis of the forecasting error measures generated by Weka, and the comparison between the four models is given in Fig. 4. The accuracy measures are mean absolute error (MAE), root-mean-squared error (RMSE), and root relative squared error (RRSE). The statistical dataset is run through the four models, and the resulting evaluation metrics are presented graphically. The results show that all the models predict the units to be assessed accurately, but the Gaussian processes model outperforms the other three models. Future predictions generated by Gaussian processes are shown in Fig. 5.
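For reference, the three error measures can be computed as in the following sketch. It uses the standard textbook definitions, with the root relative squared error expressed as a percentage relative to a predictor that always outputs the mean of the actual values; whether Weka computes its baseline in exactly this way is an assumption here. The numbers in the usage example are made up.

```python
import math


def mean_absolute_error(actual, predicted):
    """MAE: average magnitude of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """RMSE: square root of the average squared forecast error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def root_relative_squared_error(actual, predicted):
    """RRSE in percent, relative to always predicting the actual mean."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    baseline_error = sum((a - mean_actual) ** 2 for a in actual)
    return 100.0 * math.sqrt(model_error / baseline_error)


if __name__ == "__main__":
    actual = [2, 4, 6, 8]       # made-up values, not TNEB data
    predicted = [3, 4, 5, 10]
    print(mean_absolute_error(actual, predicted))
    print(root_mean_squared_error(actual, predicted))
    print(root_relative_squared_error(actual, predicted))
```

MAE and RMSE are in the same unit as the data (units assessed), while RRSE is scale-free: a value below 100% means the model beats the mean predictor.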

Fig. 4 Comparison between the models for mean absolute error (MAE), root-mean-squared error (RMSE), and root relative squared error (RRSE)

Fig. 5 Series chart of the future prediction of statistical data using the best model, Gaussian processes

7 Conclusion

In this paper, four models, namely support vector machine, Gaussian processes, multilayer perceptron, and linear regression, are analyzed using real-time data. All these models show favorable forecasting accuracy, with the Gaussian processes model outperforming the other three. We suggest developing a repository of information for each region so that comparing data across regions becomes easier. Other data mining tools such as R, Rapid Miner, and Orange should also be examined for their applicability to these forecasting models.