Introduction

Extra virgin olive oil is the healthiest and most sought-after of all the categories of olive oil, and it fetches higher prices than seed and nut oils. Extra virgin olive oil is a virgin olive oil with a free acidity, expressed as oleic acid, of no more than 0.8 g per 100 g, and whose other characteristics correspond to those fixed for the category in the International Olive Council (IOC) standard. It does not undergo any refining process. Refined olive oil is obtained from virgin olive oils by refining methods which do not lead to alterations in the initial glyceridic structure. It has a free acidity, expressed as oleic acid, of no more than 0.3 g per 100 g, and its other characteristics correspond to those fixed for this category in the IOC standard. This kind of olive oil may be sold directly to the consumer only if permitted in the country of retail sale [1].

Adulteration of extra virgin olive oil undermines consumer confidence and decreases the profits of honest producers. It is therefore one of the main issues for the olive oil industry, and there is a need for analytical techniques to control the quality and authenticity of virgin olive oils. Several instrumental techniques have been developed to classify, authenticate, and control the quality of olive oils, for example, GC [2, 3], HPLC [4, 5], NMR [6, 7], NIR [8], MIR [9], FTIR spectroscopy [10, 11], Raman spectroscopy [11], fluorescence spectroscopy [12, 13, 14], UV-IMS [15], LIBS [16], microwave reflectometry [17], and Vis spectroscopy [18]. Most of these instrumental techniques require harmful reagents or expensive equipment with high operating and maintenance costs. In this context, fluorescence spectrometry is a simpler and less costly alternative. This technique has been successfully applied to the classification of honeys [19], coffees [20], oils [21], wines [22], beers [23], and cereal products [24], as well as to freshness estimation of food products such as fish, meat, eggs, olive oil, and rapeseed oil [25, 26].

Various statistical methods can be used for classification purposes, such as linear discriminant analysis (LDA) [20, 21], quadratic discriminant analysis (QDA) [27], regularized discriminant analysis (RDA), k-nearest neighbors (KNN) [21], support vector machines (SVM) [28], or random forest (RF) [29]. In discriminant analysis, spectra are assigned to definite classes, so that qualitative information complements quantitative spectral data. If the samples are numerous enough, they may be divided into two sets: a training set to construct the classification method (calibration stage) and a testing set to validate it (validation stage). The purpose of the classification method is to obtain weighted combinations of data that minimize variances within classes and maximize variances between classes. The classification rules are then used to assign new or unknown samples to the most probable classes. Prior to discriminant analysis, principal component analysis (PCA) is often applied to spectral data sets to reduce data set size and collinearity. The validity of a classification method can be verified by a comparison of distances or by testing.
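For LDA in particular, the idea of minimizing within-class variance while maximizing between-class variance is usually written as Fisher's criterion. The formula below is the standard textbook formulation, given here only as an illustration; the symbols (weight vector w, class means μ_k, overall mean μ, class sizes n_k, number of classes K) are notation introduced for this sketch and do not appear in the original text. The other methods listed above (KNN, SVM, RF) rely on different decision rules.

```latex
J(\mathbf{w}) = \frac{\mathbf{w}^{\top} S_B \, \mathbf{w}}{\mathbf{w}^{\top} S_W \, \mathbf{w}},
\qquad
S_W = \sum_{k=1}^{K} \sum_{i \in C_k} (\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^{\top},
\qquad
S_B = \sum_{k=1}^{K} n_k \, (\boldsymbol{\mu}_k - \boldsymbol{\mu})(\boldsymbol{\mu}_k - \boldsymbol{\mu})^{\top}
```

Here S_W is the within-class scatter matrix and S_B the between-class scatter matrix; the discriminant directions are the vectors w that maximize J(w).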

The objective of the present study is to evaluate the potential of synchronous fluorescence spectrometry for the classification of olive oil samples with respect to type (extra virgin/refined) and shelf-life condition (expired/non-expired), and to compare the accuracy of six chemometric methods for this classification purpose. To the best of our knowledge, this is the first time that the RDA and RF methods have been applied to synchronous fluorescence data. Moreover, a comparison of the classification methods most frequently used for fluorescence spectra in the literature can greatly facilitate the selection of an appropriate method.

Experimental

Characteristics of experimental olive oil samples

A total of 92 samples of extra virgin and refined olive oils (labeled as olive oils and pomace olive oils) were purchased in local supermarkets. It was verified that the fresh samples satisfied the IOC requirements for free fatty acid content, fatty acid profile, and UV absorption at 232 nm (K232) and 270 nm (K270) [30]. The fresh samples were kept at room temperature in the dark and analyzed shortly after their arrival at the laboratory. Samples intended to be analyzed as expired had been stored in the original commercial container without strict environmental control for 18–24 months, until they had passed the expiration date. Hereafter, these samples are termed ‘expired’. For the four-class classification study, the following classes were considered: non-expired extra virgin olive oil (NEEV), non-expired refined olive oil (NER), expired extra virgin olive oil (EEV), and expired refined olive oil (ER), with 24, 20, 34, and 14 samples, respectively. In the three-class classification study, expired extra virgin and expired refined olive oils were analyzed together as expired olive oils. All reagents used in the experiment were of analytical grade.
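The relation between the four-class and three-class designs can be checked with a few lines of arithmetic. The sketch below uses only the class sizes stated above; the pooled label "E" is a hypothetical name chosen for illustration, not one used in the study.

```python
from collections import Counter

# Class sizes as reported for the four-class study.
four_class_counts = Counter({"NEEV": 24, "NER": 20, "EEV": 34, "ER": 14})

# Hypothetical label "E" for the pooled expired oils (three-class study).
MERGE = {"EEV": "E", "ER": "E"}

three_class_counts = Counter()
for label, n in four_class_counts.items():
    three_class_counts[MERGE.get(label, label)] += n

total_samples = sum(four_class_counts.values())
```

The four class sizes sum to 92, the number of samples used in the validation procedure, and pooling the two expired classes yields a single class of 48 samples next to the 24 NEEV and 20 NER samples.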

Synchronous fluorescence spectra measurement of experimental olive oil samples

Fluorescence spectra were recorded on a Fluorolog 3–11 spectrofluorometer (Spex-Jobin Yvon S.A.) with a xenon lamp as the excitation source. The excitation and emission slits were 2 nm wide each. The acquisition interval was 1 nm and the integration time was 0.1 s. The excitation wavelength covered the range of 240–700 nm (461 data points). The oil samples, diluted in n-hexane (1% v/v) in a 10 mm fused-quartz cuvette, were measured in right-angle geometry. Synchronous fluorescence spectra were collected by simultaneously scanning the excitation and emission monochromators with a constant wavelength interval Δλ between the excitation and emission wavelengths. For each sample, four spectra were collected, at wavelength intervals of 10, 30, 60, and 80 nm. All measurements were performed in duplicate and reported as mean values. Fluorescence intensities were plotted as a function of the excitation wavelength.
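The scanning scheme can be illustrated with a short sketch: each synchronous scan pairs every excitation wavelength with an emission wavelength shifted by a fixed Δλ, and the two replicate traces are averaged point by point. The function names are hypothetical, and the code only reproduces the wavelength grid described above; it is not the instrument software.

```python
def synchronous_scan(delta_lambda, ex_start=240, ex_stop=700, step=1):
    """Return (excitation, emission) wavelength pairs in nm for one scan."""
    excitation = range(ex_start, ex_stop + 1, step)
    return [(ex, ex + delta_lambda) for ex in excitation]


def mean_of_duplicates(scan_a, scan_b):
    """Average two replicate intensity traces point by point."""
    if len(scan_a) != len(scan_b):
        raise ValueError("replicate scans must have the same number of points")
    return [(a + b) / 2.0 for a, b in zip(scan_a, scan_b)]


# One wavelength grid per interval used in the study (10, 30, 60, 80 nm).
pairs_by_offset = {d: synchronous_scan(d) for d in (10, 30, 60, 80)}
```

With a 1 nm step from 240 to 700 nm, each grid contains 461 excitation points, which matches the number of data points stated above.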

Statistical analysis of spectroscopic measurements

Prior to classification analysis, PCA was employed to reduce the number of variables. PCA was followed by six different classification methods: LDA, QDA, RDA, KNN, SVM, and RF. Principal components (PCs) are ordered by decreasing variance (a measure of their linear information capacity), so that the first principal component explains the highest percentage of the total variance of the observed variables. The number of PCs retained for classification was chosen on the basis of the Kaiser criterion, since PCs with eigenvalues higher than one provide more information than the average single variable [31]. In the first test, samples were classified into one of four classes: non-expired extra virgin, non-expired refined, expired extra virgin, and expired refined; in the second test, they were classified into three classes: non-expired extra virgin, non-expired refined, and expired olive oils. In the latter test, expired extra virgin and expired refined oils were combined into one group and analyzed as oils not fit for consumption.
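A minimal sketch of PCA with Kaiser-criterion selection is given below. It is written in Python with NumPy rather than R, so it is not the authors' code. It assumes that PCA is performed on autoscaled variables (i.e., on the correlation matrix), which is the setting in which an eigenvalue of one equals the variance of a single variable; the function name `kaiser_pca` is hypothetical.

```python
import numpy as np


def kaiser_pca(spectra):
    """PCA on autoscaled variables; keep components with eigenvalue > 1."""
    X = np.asarray(spectra, dtype=float)
    std = X.std(axis=0, ddof=1)
    usable = std > 0  # constant columns cannot be autoscaled, so drop them
    Z = (X[:, usable] - X[:, usable].mean(axis=0)) / std[usable]

    corr = (Z.T @ Z) / (Z.shape[0] - 1)      # correlation matrix of variables
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    n_keep = int(np.sum(eigvals > 1.0))      # Kaiser criterion
    scores = Z @ eigvecs[:, :n_keep]         # sample scores on retained PCs
    explained = eigvals[:n_keep].sum() / eigvals.sum()
    return scores, eigvals[:n_keep], explained
```

Calling `kaiser_pca` on a samples-by-wavelengths intensity matrix returns the scores that would be passed on to the classifiers, the retained eigenvalues, and the fraction of total variance they explain.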

Classification models were validated using 80–20% split validation with 100 repeats. The 92 samples were randomly split 100 times into two disjoint subsets: a training (calibration) set (80% of all samples) and a test set (20%). Each time, all classification methods (LDA, QDA, RDA, KNN, SVM, and RF) were fitted on the training set to estimate the parameters of the discriminant functions, and the classification error was then calculated on the test set. The mean classification error over the 100 repeats (RMSEV 80:20) was then calculated. It was not necessary to evaluate all possible 80–20% splits of the 92 samples, as simulations showed that RMSEV 80:20 stabilizes as the number of repeats approaches 100. All statistical analyses were carried out using R, version 3.4.1 patched, a software environment for statistical computing.
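The repeated split validation can be sketched as follows. This is an illustrative Python version, not the R code used in the study. The 1-nearest-neighbor classifier is only a stand-in for the six methods, the data are synthetic toy points rather than measured spectra, and rounding 20% of 92 samples to 18 test samples is an assumption, since the exact set sizes are not stated.

```python
import random
from statistics import mean


def nearest_neighbor_predict(train_X, train_y, x):
    """Stand-in classifier: label of the closest training sample (1-NN)."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    return train_y[min(range(len(train_X)), key=sq_dist)]


def repeated_split_error(X, y, predict, n_repeats=100, test_fraction=0.2, seed=0):
    """Mean test-set misclassification rate (%) over repeated random splits."""
    rng = random.Random(seed)
    n = len(X)
    n_test = round(test_fraction * n)  # assumption: 92 samples -> 18 test, 74 training
    errors = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]  # disjoint by construction
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        wrong = sum(predict(train_X, train_y, X[i]) != y[i] for i in test_idx)
        errors.append(100.0 * wrong / n_test)
    return mean(errors)


# Synthetic, well-separated toy data (NOT the measured spectra), generated
# with the four class sizes reported for the samples.
_rng = random.Random(1)
centers = {"NEEV": (0, 0), "NER": (10, 0), "EEV": (0, 10), "ER": (10, 10)}
sizes = {"NEEV": 24, "NER": 20, "EEV": 34, "ER": 14}
toy_X, toy_y = [], []
for label, (cx, cy) in centers.items():
    for _ in range(sizes[label]):
        toy_X.append((cx + _rng.uniform(-1, 1), cy + _rng.uniform(-1, 1)))
        toy_y.append(label)

toy_error = repeated_split_error(toy_X, toy_y, nearest_neighbor_predict)
```

Any classifier exposing the same `predict(train_X, train_y, x)` signature can be passed in, so the six methods could be compared on identical splits by reusing the same seed.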

Results and discussion

Synchronous fluorescence spectra of olive oil samples

Figure 1 shows synchronous fluorescence intensities acquired for olive oil samples as a function of excitation wavelength. The intensity of fluorescence of edible oils depends on, for example, the content of tocopherols, tocotrienols, chlorophylls, and pheophytins, as well as phenolic compounds [32]. Non-expired extra virgin, non-expired refined, expired extra virgin, and expired refined olive oils exhibit differences in fluorescence spectra caused by the different contents of tocochromanols, polyphenols, fatty acids, and chlorophylls [33]. A band observed in the range of 270–300 nm was attributed to the emission of tocopherols and tocotrienols, while a band in the range of 660–700 nm is characteristic of chlorophylls and pheophytins a and b [34]. Refining and similar processes are supposedly responsible for changes in the content and structure of the minor compounds mentioned above, as well as for the conjugation of double bonds in fatty acids, which enables the distinction of extra virgin from the other categories of olive oils. Extraction methods might also be responsible for some of the differences in olive oil quality, since different final products are obtained from the same raw material [35]. The intensity and shape of synchronous fluorescence spectra of olive oils depend on the difference between the excitation and emission wavelengths. As a result of oxidation, characteristic changes in synchronous fluorescence spectra are observed: the tocopherol and chlorophyll band intensities decrease and, simultaneously, a new emission band appears in the 320–380 nm wavelength range. These findings are in line with those presented by Sikorska et al. [36].

Principal component analysis of synchronous fluorescence spectra

Fig. 1 Synchronous fluorescence spectra of fresh and expired olive oils (diluted in n-hexane, 1% v/v)

Figure 2 presents the score plots of the first two PCs obtained by applying PCA to the synchronous fluorescence intensities. As can be seen, the classes overlap, especially the expired extra virgin and expired refined olive oil samples. PCA was employed for exploratory spectral analysis, and subsequently LDA, QDA, RDA, KNN, SVM, and RF were performed. According to the Kaiser criterion [31], six PCs were retained for further statistical analysis at every wavelength interval. Together, the selected PCs account for over 96% of the total variance of the fluorescence intensities measured at each wavelength interval.

Fig. 2 Score plots of the first two PCs obtained for the four-class (4) and three-class (3) models for synchronous fluorescence intensities measured at a Δλ = 10 nm, b Δλ = 30 nm, c Δλ = 60 nm, and d Δλ = 80 nm

Comparison of different classification analyses of fluorescence data

Six classification methods were applied to the principal components obtained previously by PCA. Classification analyses were carried out separately for synchronous fluorescence data acquired at each wavelength interval (10, 30, 60, and 80 nm). Figure 3 visualizes, in the respective coordinate systems, how each of the six methods classifies the samples measured at Δλ = 30 nm into four clusters: non-expired extra virgin, non-expired refined, expired extra virgin, and expired refined (Fig. 3). It is interesting to note that the expired extra virgin and expired refined olive oil samples lie very close to each other. This may be because these samples share some characteristic factor, e.g., products of oxidation. The results presented in Fig. 3 were confirmed by a comparison of the classification error rates of the three-class and four-class models obtained by the six methods, as shown in Table 1.

Table 1 Classification errors for different classification methods

Fig. 3 Classification plots of synchronous fluorescence intensities acquired at Δλ = 30 nm for a LDA, b QDA, c RDA, d KNN, e SVM, and f RF

For all classification methods except RDA, the best discrimination ability was observed for the synchronous fluorescence measurements obtained at Δλ = 30 nm, with classification errors in the range from 5.4 to 10.4% for the four-class models and from 2.9 to 9.4% for the three-class models. The results obtained with the RDA method were slightly better for measurements obtained at Δλ = 80 nm (four-class models) and Δλ = 60 nm (three-class models).

In most cases, the errors obtained for the three-class models were lower than those for the four-class models. Oxidation products formed during storage may be the reason why the expired extra virgin and expired refined olive oil samples are very similar to each other and should be analyzed jointly as expired oils.

Among all methods, the best classification results were obtained with KNN and SVM for synchronous fluorescence measurements acquired at a wavelength interval of 30 nm, with classification errors of 5.4 and 5.7% (four classes) and 2.9 and 3.2% (three classes) for KNN and SVM, respectively. LDA and QDA did not allow for very good classification performance, with classification errors not lower than 9.4 and 8.7%, respectively. The results obtained with the RDA method were satisfactory, with the lowest classification errors equaling 6.4 and 4.4% for the four-class and three-class models, respectively.

The results obtained in this study are in agreement with the findings of Wu et al. [27], who established that RDA always gives results equivalent to or better than LDA and QDA, and of Zheng et al. [28], who achieved a comparable and very satisfactory accuracy for the classification of food products using the KNN and LS-SVM methods. However, the latter authors established that KNN, as opposed to LS-SVM, is very sensitive to the nature of the spectral data: the KNN method performed rather poorly on the fruit data set.

Conclusion

The study has shown that non-expired extra virgin, non-expired refined, expired extra virgin, and expired refined olive oil samples exhibit significant differences in their synchronous fluorescence spectral patterns. The KNN and SVM methods clearly outperformed the LDA, QDA, RDA, and RF methods. Moreover, the best classification accuracy was obtained for the fluorescence intensities measured at a wavelength interval of 30 nm. The lowest classification error rates were obtained with the KNN and SVM methods for measurements acquired at this interval and equaled 5.4 and 5.7% for the four-class models and 2.9 and 3.2% for the three-class models, respectively. The findings provide a technical tool for consumer protection against profit-driven deceitful practices in the food market by contributing to better freshness control and more efficient detection of olive oil fraud.