Geochemical wolframite fingerprinting – the likelihood ratio approach for laser ablation ICP-MS data
Abstract
Wolframite has been specified as a ‘conflict mineral’ by a U.S. Government Act, which obliges companies that use these minerals to report their origin. Minerals originating from conflict regions in the Democratic Republic of the Congo shall be excluded from the market, as their illegal mining, trading, and taxation are supposed to fuel ongoing violent conflicts. The German Federal Institute for Geosciences and Natural Resources (BGR) developed a geochemical fingerprinting method for wolframite based on laser ablation inductively coupled plasma-mass spectrometry. Concentrations of 46 elements in about 5300 wolframite grains from 64 mines were determined. The issue of verifying the declared origins of the wolframite samples may be framed as a forensic problem by considering two contrasting hypotheses: the examined sample and a sample collected from the declared mine originate from the same mine (H_{1}), and the two samples come from different mines (H_{2}). The solution is found using the likelihood ratio (LR) theory. On account of the multidimensionality, the lack of normal distribution of data within each sample, and the huge within-sample dispersion in relation to the dispersion between samples, the classic LR models had to be modified. Robust principal component analysis and linear discriminant analysis were used to characterize samples. The similarity of two samples was expressed by Kolmogorov-Smirnov distances, which were interpreted in view of the H_{1} and H_{2} hypotheses within the LR framework. The performance of the models, controlled by the levels of incorrect responses and the empirical cross entropy, demonstrated that the proposed LR models are successful in verifying the authenticity of the wolframite samples.
Keywords
Wolframite · Fingerprinting · Laser ablation ICP-MS · Likelihood ratio approach · Chemometrics

Introduction
In the eastern provinces (North Kivu, South Kivu, and Maniema) of the Democratic Republic of the Congo (DRC), ongoing violent conflicts are fuelled by illegal mining, trading, and taxation of natural resources (e.g., tin, tantalum, and tungsten, their ores, and gold). Foreign and local armed groups profit from mining activities and use the revenue from mineral trade to finance their troops [1, 2]. In 2010 the US Congress passed the Dodd-Frank Wall Street Reform and Consumer Protection Act and charged the Securities and Exchange Commission (SEC) with taking action to address virtually all of the mandatory rulemaking provisions of the Act. Section 1502 of this Act requires US-listed companies to exercise due diligence on the traceability of so-called “conflict minerals” (coltan, cassiterite, and wolframite mined to obtain Ta, Sn, and W, respectively, and gold) or their derivatives originating from the DRC or adjoining countries if these minerals are necessary for the functionality or production of their products [3]. On the one hand, the Dodd-Frank Act intends to reduce income from mineral trade for armed groups; on the other hand, this Act will also have a great impact on regular artisanal miners whose livelihood is strongly dependent on mining of these minerals. However, a combination of court opinions, regulatory reversals, and legislative proposals has recently weakened the conflict mineral regulations under Section 1502 [4]. In 2017, the European Parliament and the Council laid down supply chain due diligence obligations for Union importers of tin, tantalum, and tungsten, their ores, and gold originating from conflict-affected and high-risk areas [5].
Traceability systems for mineral supply chains are designed (1) to indicate shipments which are of reliable origin and not conflict-affected, and (2) to hamper market access for illegally mined and traded ores. Within such systems each ore mineral shipment is accompanied by a document which provides information about the origin of the minerals. An analytical fingerprinting (AFP) approach has been developed at the German Federal Institute for Geosciences and Natural Resources (BGR) as a document-independent tool to verify the declared origin of a shipment in case of doubt [6, 7, 8]. AFP can be implemented as an optional proof of origin within the framework of traceability systems.
For AFP, a sample is taken from a shipment in doubt, the sample is analyzed, and the results are evaluated by comparison with data from a reference sample database where minespecific information on ore minerals is stored. The result is a statement whether the documented origin of the shipment in doubt is credible or not.
Wolframite (Fe,Mn)WO_{4} is the most important ore mineral for tungsten in Central Africa. Tungsten is a metal of high economic importance with major applications in cutting tools as tungsten carbide, in the production of various steel grades as an alloying component, or as filaments in light bulbs. Wolframite is traded as an ore concentrate which is produced by miners at the mine site.
Recently, Gäbler et al. [8] presented an approach for the analytical fingerprinting of wolframite ore concentrates based on laser-ablation inductively coupled plasma-mass spectrometry data, the evaluation of Kolmogorov-Smirnov distances of two-sample comparisons, and an empirically derived decision criterion. The data from wolframite concentrates are multivariate and not normally distributed, and due to the mining process samples cannot be regarded as representative aliquots of a population, which poses an additional challenge for data evaluation [8]. This study presents an alternative data evaluation approach based on the likelihood ratio concept (e.g., [9, 10, 11]) and uses a nearly identical data set to that of Gäbler et al. [8].

The problem of verifying the declared origin of a questioned sample E against a sample D collected from the declared mine site can be stated as a choice between two hypotheses:

H_{1} – samples D and E come from the same source S, i.e., mine site,

H_{2} – samples D and E originate from different sources.

Samples originating from the same mine site are further referred to as brother samples, so the hypotheses can be restated as:

H_{1} – samples D and E are brother samples,

H_{2} – samples D and E are not brother samples.
One of the solutions to this issue requires comparing the similarity of samples E and D with the similarity of sample E and each individual sample X remaining in the reference database, based on the samples' elemental composition. First, a characteristic of samples D and X is derived by a chemometric procedure (robust principal component analysis (rPCA) combined with linear discriminant analysis (LDA); details are given below) that records the difference between them. The data of sample E are then projected onto the variable characterizing and differentiating samples D and X. The idea is that if samples D and E are brother samples, both should behave similarly relative to each individual sample X from the reference sample database, and dissimilarly if they are not brother samples. The final, conclusive stage involves deciding whether this similarity of samples E, D, and X is more likely to occur when E and D are brother samples (H_{1}) or when they are not (H_{2}). Such a problem, framed in the perspective of two equivalent hypotheses, H_{1} and H_{2}, as typically issued in the forensic sciences, should preferably be solved using the likelihood ratio theory of hypothesis testing [9]. The equivalence of both hypotheses stated in the LR approach remains in contrast to the commonly applied statistical tests (e.g., the t-test), in which the hypotheses are not equally weighted. These tests only indicate whether the null hypothesis (on which the emphasis is put) is rejected or fails to be rejected. No conclusions can be made about the acceptance or rejection of the alternative hypothesis.
The likelihood ratio is not a probability but a ratio of probabilities, and hence it takes values between 0 and infinity. Values of the likelihood ratio above one support H_{1}, values below one support H_{2}, and those equal to one support neither of the hypotheses. The higher the value of the likelihood ratio, the stronger the support for the H_{1} proposition; the lower the value, the stronger the support for the H_{2} proposition.
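For reference, the generic definition of the likelihood ratio for evidence E under the two competing propositions, standard in the LR literature (e.g., [9, 11]) rather than specific to this paper's notation, can be written as:

```latex
LR = \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)}
```

The posterior odds of H_{1} then follow from Bayes' theorem as the prior odds multiplied by the LR.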
Another advantage of the LR approach over other statistical tests is the consideration of the rarity of the samples' data. This rarity is available from databases storing information about the same parameters measured for a representative set of samples. Observing similar features for both compared samples must always be carefully controlled, as the match between characteristics may be just a coincidence. This danger grows for features commonly observed in the relevant population and decreases with their increasing rarity. Thus the value of the evidence in support of the proposition that the compared samples have a common origin is greater when the determined values are similar and rare in the relevant population than when the physicochemical values are equally similar but common in the same population [9, 11]. Rarity considerations are unfortunately ignored in the score-based LR models, where the similarity between the characteristics of two samples is expressed by their distance. Since the distance is measured identically for rare and common data, the score-based LR models' virtue mainly boils down to computational efficiency. Nevertheless, the score-based LR models still keep their superiority over other statistical tests by viewing the data from two equivalent contrasting perspectives (hypotheses).
LR is a method for commenting on the evidential value of the evidence material, which is recommended by the forensic community, including the European Network of Forensic Science Institutes [15, 16, 17, 18, 19]. The most successful application of the LR approach in the forensic sphere is found in the evaluation of DNA profiling for forensic purposes [20]. This approach has also been used in the analysis of earprints, fingerprints, firearms and tool marks, hair, documents, and handwriting (a review can be found in [9]), as well as in speaker recognition [21]. An increasing number of applications of this approach is found in the evaluation of physicochemical data recorded for microtraces of glass [12, 13, 14, 22, 23, 24, 25, 26, 27], explosives [28], car paints [29, 30, 31, 32, 33], polymers [31, 32], fire debris [34], inks [35, 36], fibers [29], drugs [37, 38, 39], food samples [40, 41], and biological samples [42].
Since the work of Aitken and Lucy [10] was published, LR models have been widely developed for data sets described by a limited number of variables. Commonly analyzed evidence in the form of glass fragments characterized by their elemental composition [12, 13, 14, 22, 23], concerning only oxygen, sodium, magnesium, aluminium, silicon, potassium, calcium, and iron, may serve as an example. Similar to most statistical methods, the classic, so-called feature-based LR suffers from the curse of dimensionality when dealing with highly multidimensional data, which currently dominate the outcomes of most analytical techniques. Moreover, difficulties emerge when the data are not normally distributed within each sample and their variance structure becomes complex. This may be the case when the dispersion of data within each sample, and for samples from the same source (e.g., mine site), is comparable to the dispersion of data for samples from different sources. Some strategies for dealing with the multidimensionality have been proposed in [31, 32] for infrared and Raman spectra. They engage chemometric tools for reducing data dimensionality by studying various sources of variability and extracting the most relevant information in the form of a few latent variables. The outcomes of the chemometric techniques are then incorporated in what is referred to as hybrid LR models [31, 32]. The issues of the lack of normality and significant within-sample data dispersion have not been tackled yet. However, some strategies have been studied recently for keeping the proper relation of the within- and between-samples variability, which is easily violated by the chemometric tools applied for reducing data dimensionality.
The multidimensionality and lack of data normality within each sample are not regarded as an obstacle in the score-based LR models. These models maximally reduce data dimensionality to only a single score describing two compared samples. The score, which is for instance the distance between the characteristics of two samples, is then interpreted in the light of the two hypotheses, H_{1} and H_{2}. In the score-based LR models constructed for the examined wolframite data, the score is the similarity metric between the questioned sample, the sample from the declared mine site, and each of the remaining samples collected in the database. These similarity metrics must be significantly different for brother and non-brother samples. This is possible only when the distances are computed in the space defined by variables that differentiate well between samples from different locations and effectively group brother samples. Thus the dispersion of data for brother samples should be kept much lower than for non-brother samples. This is easily achieved using chemometric tools that optimally separate classes (or samples, if each sample is regarded as a class), such as linear discriminant analysis (LDA). The only requirements for applying LDA are a prior reduction of data dimensionality and handling of the non-normal distribution within each sample. Even when such care must be taken to work with normally distributed data of reduced dimensionality, the use of score-based LR models is not purposeless. This is because scores provide an improved description of the similarity between samples and consequently enable a better decision on whether they are brothers or not than conventional, feature-based LR models.
Thus the aim of this work is to demonstrate that hybrid score-based likelihood ratio models are capable of verifying the authenticity of wolframite concentrate origins declared in official documents. The issue is tackled with a combination of chemometric tools and the LR approach in the form of hybrid LR models [31, 32]. They utilize various chemometric techniques for (1) reducing data dimensionality, and (2) dealing with different aspects of the database structure, i.e., the lack of normality and the significant dispersion arising from the huge ranges of element content observed within each sample and between them. The models engaged (i) a robust variant of principal component analysis for reducing data dimensionality [43, 44, 45], (ii) linear discriminant analysis (LDA) [43] for finding the directions that capture the differences between samples, and (iii) the Kolmogorov-Smirnov distance [46] for expressing their similarity, which, as a score, was then viewed within the LR framework.
Materials and methods
Samples, sample preparation, and analysis
Throughout this study, a sample is referred to as an aliquot of an ore concentrate which contains several hundred or several thousand individual mineral grains. The majority of those grains are wolframite grains if a good ore concentrate is obtained. Sample properties in terms of distributions of element concentrations in wolframite are obtained from about 40 to 50 individual wolframite grains of a sample.
For analysis, a polished section is prepared for each sample. Wolframite grains are identified by scanning electron microscopy and analyzed by laser-ablation inductively coupled plasma-mass spectrometry. Details on sample preparation, grain identification, and grain analyses are given by Gäbler et al. [8].
The database used for this study consists of information on the elemental composition of 104 wolframite samples and is nearly identical to the database used by Gäbler et al. [8]. The wolframite ore concentrate samples originate from 45 different mine sites from 10 countries worldwide, with special emphasis on Central Africa (30 mine sites). In total, 5327 wolframite grains have been analyzed for the elements Mg, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, As, Sr, Y, Zr, Nb, Mo, Ag, Cd, In, Sn, Sb, Ba, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Hf, Ta, Tl, Pb, Bi, Th, and U. There were 105 pairs of brother samples (samples coming from the same mine site) and 4972 pairs of non-brother samples (samples coming from different mine sites).
LR models construction protocol
Score-based LR models successfully distinguish samples only if the characteristics among brother samples are much less dispersed than the characteristics between non-brother samples. As will be evidenced in the “Descriptive statistics” section, the dispersion of the data within brother samples is, for many elements, basically comparable to the dispersion of data observed for the non-brother samples. Moreover, the distributions of data within each sample cannot be considered normal, and the number of variables needs to be reduced. Thus the key to building appropriate LR models for making inferences on whether the samples are brothers or not is, first, to reduce data dimensionality and deal with the lack of normality, and, second, to find the most informative variables with the best discrimination power, which uniquely characterize each mine site and differentiate it well from the others. Maximizing the similarity of the brother samples and minimizing the similarity of the non-brother samples is thus of crucial importance.
First, the original variables were log-transformed to reduce the huge data ranges (up to 6 orders of magnitude). Then robust PCA (rPCA) [43, 44, 45] was applied with the aim of exploring and finding patterns in a multivariate dataset containing many extreme values. Its principle is to expose projections of the original data that maximize their variation in a few components and hence reduce the number of variables. In rPCA, robust measures of location and dispersion (namely the median and the median absolute deviation (MAD) [43]) are used to autoscale the data so that the variables introduce an equal amount of variation and none is favored. The autoscaling formula is expressed as z_{ij} = (x_{ij} − median(x_{i}))/MAD(x_{i}), where x_{ij} is the jth observation of the ith variable, and median(x_{i}) and MAD(x_{i}) are the median and median absolute deviation of the ith variable. The utmost advantage of the algorithms for rPCA is that they seek the directions along which the robust measure of spread (MAD) is maximized. This ensures that the creation of the PCA space is minimally affected by extreme values, since robust measures of dispersion are resistant to them.
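As a minimal illustration of the autoscaling step, the median/MAD formula above can be sketched in Python (the authors' actual scripts were written in R with pcaPP; the toy concentration matrix below is hypothetical, and no consistency factor is applied to the MAD):

```python
import numpy as np

def robust_autoscale(X):
    """Autoscale each column (variable) of X with robust location and spread:
    z_ij = (x_ij - median(x_i)) / MAD(x_i), as defined in the text."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    # raw MAD = median(|x - median(x)|), without the 1.4826 consistency factor
    mad = np.median(np.abs(X - med), axis=0)
    return (X - med) / mad

# toy data: log-transformed concentrations of 3 elements in 5 grains
X = np.log10([[12.0, 8.7, 50.4],
              [22.0, 28.0, 72.7],
              [311.0, 57.6, 122.0],
              [16.0, 153.1, 51.1],
              [25.0, 546.6, 78.3]])
Z = robust_autoscale(X)
```

After scaling, each column has median 0 and MAD 1, so no variable dominates the subsequent rPCA.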
Even though in many cases PCA is reported as sufficient for visualizing data and finding grouping patterns, the method, when applied to the entire database, was much more successful in catching the significant within-samples variability than the variability responsible for the differences between samples. Consequently, the first few PCs carrying the highest part of the variability usually did not address the part of the information associated with the discrepancies between samples, as illustrated schematically in Fig. S1 in the Electronic Supplementary Material (ESM). Thus, instead of applying the rPCA to the entire database, it was used for reducing data dimensionality for pairs of samples D and each of its non-brother samples available in the database (X_{f}, with f = 1 to k_{D}, where k_{D} is the number of non-brother samples of D in the database) to the number of components explaining 95% of MAD^{2}. Thanks to this treatment it was easier to handle the problem of huge dispersion within and between samples for a pair of samples than for the entire database.
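The pairwise reduction can be sketched as follows. Note this is a rough stand-in: the paper maximizes MAD^{2} directly via projection-pursuit rPCA (pcaPP in R), whereas the sketch applies classical SVD-based PCA to robustly scaled data and keeps components up to a 95% cumulative-spread cutoff; the grain counts and element numbers are hypothetical:

```python
import numpy as np

def reduce_pair(D, Xf, explained=0.95):
    """Stack the grains of samples D and X_f, scale robustly, and keep the
    leading components up to the chosen fraction of total spread.
    Stand-in for projection-pursuit rPCA (pcaPP), which maximizes MAD^2."""
    M = np.vstack([D, Xf])
    med = np.median(M, axis=0)
    mad = np.median(np.abs(M - med), axis=0)
    Z = (M - med) / np.where(mad > 0, mad, 1.0)
    Zc = Z - Z.mean(axis=0)
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    var = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(var), explained)) + 1
    scores = Zc @ Vt[:k].T
    return scores[:len(D)], scores[len(D):], k

rng = np.random.default_rng(0)
D = rng.lognormal(mean=1.0, sigma=0.5, size=(40, 6))   # 40 grains, 6 elements
Xf = rng.lognormal(mean=2.0, sigma=0.5, size=(45, 6))
sD, sX, k = reduce_pair(np.log10(D), np.log10(Xf))
```

The reduced scores for D and X_f then feed the LDA step that defines the direction separating the two samples.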
Table 1 Possible configurations of brother samples (B) and non-brother samples (nB) and the Kolmogorov-Smirnov distance (KSD) values they generate

Case | D and X_f^{a} | D and E^{a} | X_f and E^{a} | KSD(ED)^{b} | KSD(EX_f)^{b} | ΔKSD^{b} | Considered under
I    | B             | B           | nB            | impossible  |               |          |
II   | B             | nB          | B             | impossible  |               |          |
III  | nB            | B           | B             | impossible  |               |          |
IV   | B             | B           | B             | ↓           | ↓             | ~0       |
V    | nB            | nB          | nB            | ↑           | ↑             | ~0       | H_{2}
VI   | nB            | nB          | B             | ↑           | ↓             | >0       | H_{2}
VII  | nB            | B           | nB            | ↓           | ↑             | <0       | H_{1}
VIII | B             | nB          | nB            | ↑           | ↑             | ~0       |
For a single case, when the source of sample E is declared as common with the location of sample D, k_{D} ΔKSD values (\( \Delta {\mathrm{KSD}}_{{\mathrm{EDX}}_1} \), ..., \( \Delta {\mathrm{KSD}}_{{\mathrm{EDX}}_{k_D}} \)) were produced. All these k_{D} ΔKSD values must be interpreted jointly in the context of H_{1} and H_{2} for commenting on whether E and D come from the same source or not. Unfortunately, incorporating all k_{D} ΔKSD values at once for producing a single LR value is not feasible, since the LR is computed only for a single value (as in Fig. 1); hence, each value generates a single LR. Thus, dealing with a set of k_{D} ΔKSD values either results in receiving k_{D} LR values or in one LR value when all k_{D} ΔKSD values are somehow aggregated in a single number and subsequently interpreted within the LR framework.
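The KSD scores themselves can be sketched with SciPy's two-sample Kolmogorov-Smirnov statistic. The sign convention assumed here (ΔKSD = KSD(ED) − KSD(EX_f), so brother pairs tend toward negative values) follows Table 1; the projected LDA scores below are toy data:

```python
import numpy as np
from scipy.stats import ks_2samp

def ksd(a, b):
    """Kolmogorov-Smirnov distance: maximum vertical gap between the ECDFs."""
    return ks_2samp(a, b).statistic

def delta_ksd(e, d, x):
    """Assumed convention: KSD(E,D) - KSD(E,X_f); if E and D are brothers
    and X_f is not, this tends to be negative (case VII in Table 1)."""
    return ksd(e, d) - ksd(e, x)

rng = np.random.default_rng(1)
e = rng.normal(0.0, 1.0, 40)   # projected scores of questioned sample E (toy)
d = rng.normal(0.1, 1.0, 45)   # brother-like sample D from the declared mine
x = rng.normal(3.0, 1.0, 50)   # non-brother sample X_f from the database
```

Repeating `delta_ksd(e, d, x_f)` over all k_D database samples X_f yields the set of values that is aggregated in the next step.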
The aggregation was based on the common (overlapping) areas between:
 (a)
the distribution of ΔKSD for a random selection of brother samples (the distribution considered under H_{1}) and the distribution of the k_{D} ΔKSD values obtained for the studied set of D, E, and k_{D} samples X (Fig. 3a),
 (b)
the distribution of ΔKSD for a random selection of non-brother samples (the distribution considered under H_{2}) and the distribution of the k_{D} ΔKSD values obtained for the studied set of D, E, and k_{D} samples X (Fig. 3a).
In the first model, referred to as ΔKSD-AR (Fig. 3a), the ratio of both areas (AR) was computed to indicate which of the hypotheses is supported. It should exceed 1 when E and D are brother samples and should remain below 1 for non-brother samples. Though it may appear that this is an LR approach, it is not. This is a consequence of the fact that conventional LR models are computed as a ratio of probability density functions, not of the areas below the probability density curves. Thus the proper LR model (denoted as ΔKSD-AR-LR; Fig. 3) was developed, in which the sets of common areas ratios received when E and D samples are brothers and when they are not are stored to find the distributions under H_{1} and H_{2}, respectively. Then, for the studied set of E and D samples and all k_{D} X samples, the areas ratio is computed and interpreted in the context of the modeled distributions under H_{1} and H_{2}.
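A sketch of the area-ratio idea, assuming kernel density estimates for all three distributions and numerical integration of their pointwise minimum (the ΔKSD values below are hypothetical; the paper's exact density estimator may differ):

```python
import numpy as np
from scipy.stats import gaussian_kde

def overlap_area(a, b, n=2000):
    """Common area under two kernel density estimates: integral of min(f_a, f_b)."""
    lo = min(a.min(), b.min()) - 1.0
    hi = max(a.max(), b.max()) + 1.0
    grid = np.linspace(lo, hi, n)
    fa, fb = gaussian_kde(a)(grid), gaussian_kde(b)(grid)
    return float(np.sum(np.minimum(fa, fb)) * (grid[1] - grid[0]))

def area_ratio(query, dksd_h1, dksd_h2):
    """AR = overlap(query, H1 distribution) / overlap(query, H2 distribution);
    AR > 1 should point to brother samples, AR < 1 to non-brother samples."""
    return overlap_area(query, dksd_h1) / overlap_area(query, dksd_h2)

rng = np.random.default_rng(2)
dksd_h1 = rng.normal(-0.5, 0.3, 500)   # toy ΔKSD values for brother pairs
dksd_h2 = rng.normal(0.0, 0.3, 500)    # toy ΔKSD values for non-brother pairs
query = rng.normal(-0.5, 0.3, 60)      # toy k_D ΔKSD values for a questioned pair
ar = area_ratio(query, dksd_h1, dksd_h2)
```

Because the query values here sit on top of the brother distribution, the resulting AR exceeds 1.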
The LR value was obtained by univariate kernel density estimation with a Gaussian kernel K:

\( LR=\frac{\frac{1}{m_1{h}_1{c}_1}\sum_{i=1}^{m_1}K\left(\frac{y-{x}_{1i}}{h_1{c}_1}\right)}{\frac{1}{m_2{h}_2{c}_2}\sum_{i=1}^{m_2}K\left(\frac{y-{x}_{2i}}{h_2{c}_2}\right)} \)

where: y is the common areas ratio (AR) under assessment for E, D, and k_{D} X samples; c^{2}_{1}, c^{2}_{2} are the variances of the m_{1} and m_{2} common areas ratios (iterated x_{1i}, x_{2i}) considered under H_{1} for the numerator and H_{2} for the denominator, respectively; h_{1}, h_{2} are the smoothing parameters for a single variable (p = 1), \( {h}_g={h}_{opt}={\left(\frac{4}{m_g\left(2p+1\right)}\right)}^{\frac{1}{p+4}} \) (g = 1 for the numerator, 2 for the denominator) [47].
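The kernel density LR can be sketched numerically; the Gaussian kernel and the h_opt bandwidth follow the definitions above, while the AR training values are hypothetical toy data:

```python
import numpy as np

def h_opt(m, p=1):
    """Smoothing parameter from the text: h = (4 / (m (2p + 1)))**(1 / (p + 4))."""
    return (4.0 / (m * (2 * p + 1))) ** (1.0 / (p + 4))

def kde_lr(y, x1, x2):
    """Score-based LR: Gaussian-kernel density of the AR score y under H1
    divided by that under H2; bandwidth is h_g * c_g with c_g the standard
    deviation of the respective training scores."""
    def dens(y, x):
        h = h_opt(len(x)) * np.std(x, ddof=1)
        k = np.exp(-0.5 * ((y - x) / h) ** 2) / (h * np.sqrt(2 * np.pi))
        return k.mean()
    return dens(y, x1) / dens(y, x2)

rng = np.random.default_rng(3)
ar_h1 = rng.normal(2.0, 0.5, 200)   # toy AR values for brother pairs (H1)
ar_h2 = rng.normal(0.5, 0.2, 200)   # toy AR values for non-brother pairs (H2)
```

An AR score near the H_{1} cloud (e.g., y = 2.0) then yields LR > 1, while a score near the H_{2} cloud (e.g., y = 0.5) yields LR < 1.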
Measure of performance
Validation scheme
Separate sets of training data were employed for building up the rPCA space, finding the LDA direction (t), and modeling the ΔKSD distributions. It is worth emphasizing that the training sets are composed of randomly selected grains from each sample. Thus the dispersion of the data subset after the random selection is kept at the same level as observed for the entire database.
The process of model construction and performance testing is repeated for several training and test sets. This procedure averages the results and makes the conclusions robust against cases in which the selection of grains is not representative enough and delivers extremely high or low rates of false responses.
 (a)
set A, consisting of 2b pairs of D and E samples (b where E and D are brothers and b where they are not), each pair with k_{D} X_{f} samples (Table 1), for computing 2b∑k_{D} ΔKSD values. These ΔKSD values are used for modeling the distributions when E and D are brothers and when they are not (black distributions in Fig. 3a, each composed of b∑k_{D} ΔKSD values);
 (b)
set B, consisting of 2Z pairs of D and E samples (Z where E and D are brothers and Z where they are not), each pair with k_{D} X_{f} samples (Table 1), for computing 2Z∑k_{D} ΔKSD values; the 2Z sets of k_{D} ΔKSD values, one for each of the 2Z pairs of E and D samples, are used for computing 2Z common areas ratios (AR) with the distributions of set A when E and D are brothers and when they are not. The distribution of k_{D} ΔKSD values for one of the 2Z pairs of E and D samples is shown in green in Fig. 3a. The areas taken for computing the ratios are illustrated in orange in Fig. 3a. These AR values are used for estimating the levels of false positive answers (when AR should be lower than unity but demonstrates values above 1) and false negative rates (when AR should exceed unity but does not reach 1);
 (c)
set C, which is constructed in the same way as set A. These ΔKSD values are used for modeling the distributions when E and D are brothers and when they are not (black distributions in Fig. 3a, each composed of b∑k_{D} ΔKSD values);
 (d)
set D, consisting of 2Z pairs of D and E samples (Z where E and D are brothers and Z where they are not), each pair with k_{D} X_{f} samples (Table 1), for computing 2Z∑k_{D} ΔKSD values. The 2Z sets of k_{D} ΔKSD values, one for each of the 2Z pairs of E and D samples, are used for computing 2Z common areas ratios (AR) with the distributions of set C when E and D are brothers and when they are not. The AR value for one of the 2Z pairs of E and D samples is shown as a green line in Fig. 3b. The ARs are interpreted under H_{1} and H_{2} [distributions generated in (b)] to give LR. These LR values are used for estimating the levels of false positive answers (when LR should be lower than unity but demonstrates values above 1) and false negative rates (when LR should exceed unity but does not reach 1), and for producing empirical cross entropy curves.
For ΔKSD-AR-LR there must be 2b+2Z pairs of brother D and E samples and 2b+2Z non-brother D and E samples. There is a limit of 210 pairs of brother D and E samples; thus b was arbitrarily set as 30, 50, and 65, and Z as 40, so that the database is exploited quite efficiently (2∙65 + 2∙40 = 210). Test and training sets were developed s = 10 times for averaging the results. For the ΔKSD-AR model there must be b + Z pairs of brother D and E samples and b + Z non-brother D and E samples. The limit of 210 brother pairs still holds; thus b was arbitrarily set as 60, 120, and 170, and Z as 40 (170 + 40 = 210).
False positive and false negative answers
The performance of the proposed models was initially evaluated by estimating the levels of false positive responses for a set of Z non-brother samples and false negative responses for a set of Z brother samples, randomly selected in set B for the ΔKSD-AR model and set D for the ΔKSD-AR-LR model. False positive answers are observed when AR > 1 or LR > 1 for samples coming from different sources, which should yield AR < 1 or LR < 1. False negative answers are received when AR < 1 or LR < 1 for samples sharing the same origin, which should yield AR > 1 or LR > 1.
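The error-rate bookkeeping is straightforward; a sketch with hypothetical AR/LR values (the threshold of 1 applies to both models):

```python
import numpy as np

def error_rates(scores_same, scores_diff, threshold=1.0):
    """False negative rate: same-source scores falling below the threshold;
    false positive rate: different-source scores exceeding it.
    Works for both AR and LR values, where the neutral threshold is 1."""
    fn = float(np.mean(np.asarray(scores_same) < threshold))
    fp = float(np.mean(np.asarray(scores_diff) > threshold))
    return fp, fn

# hypothetical scores: one error in each group of four
fp, fn = error_rates([5.2, 0.8, 3.1, 12.0], [0.2, 0.9, 1.4, 0.05])
# fp = 0.25, fn = 0.25
```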
Empirical cross entropy approach
Empirical cross entropy (ECE) [11, 48, 49] is a procedure that allows the assessment of the qualitative and the quantitative aspect (strength of the support) of the model performance.
ECE is based on the idea of rewarding and penalizing the obtained LR values. The penalty is defined by logarithmic strictly proper scoring rules (if H_{1} is true: −log_{2}(Pr(H_{1}|E)); if H_{2} is true: −log_{2}(Pr(H_{2}|E))) and grows with stronger support for the incorrect hypothesis.
An ECE plot contains three curves:
 (a)
Observed curve (solid red) – represents the ECE values calculated in accordance with equation (5) for the LR values subjected to evaluation.
 (b)
Calibrated curve (dashed blue) – corresponds to the ECE values calculated for the LR values which have been transformed with the use of the pool adjacent violators (PAV) algorithm [48, 49]. The calibrated curve serves as an indicator of the LR values with the best performance of all LR sets that offer the same discriminating power.
 (c)
Null or reference curve (dotted black) – refers to the situation in which no evidential value is assigned to the data, i.e., LR = 1. Always being the same, the null curve should be treated as a reference.
The performance of the chosen LR method can be evaluated through ECE plot analysis, where the observed curve is assessed in terms of its position with respect to the calibrated and null curves. Figure 4 presents two ECE plots for LR models with satisfactory (Fig. 4a) and poor (Fig. 4b) performance. The arrows indicate how much information is left unexplained by each model. In other words, they demonstrate the uncertainty about the correct hypothesis that remains when using a particular LR model. For the satisfactory LR model the observed curve lies between the calibrated and null lines and indicates some reduction of information loss. Such a reduction of information loss resulting from the employed LR method can be represented by the ECE value from the observed curve at the point log_{10}Odds(H_{1}) = 0, which is referred to as the \( {C}_{llr}^{exp} \) value. Likewise, the value denoted as \( {C}_{llr}^{\mathrm{min}} \) refers to the same point, but with respect to the calibrated curve. For the example shown in Fig. 4a, ca. 23% of the information is still unexplained by the model; hence, the reduction of information loss reaches 100% − 23% = 77%. For the LR model with poor performance, the observed curve exceeds the null curve, indicating that using such a model for data evaluation may deliver more misleading information than assuming that the data do not support either of the hypotheses (LR = 1, as in the null method illustrated by the dotted black curve).
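The ECE at a given prior can be sketched from its standard definition in the calibration literature [48, 49] (symbols here are generic, not the paper's equation (5); the LR inputs are hypothetical):

```python
import numpy as np

def ece(lr_h1_true, lr_h2_true, log10_odds):
    """Empirical cross entropy at a prior log10-odds value: the average
    -log2 Pr(true hypothesis | E) over LR sets where H1 (resp. H2) is true."""
    odds = 10.0 ** log10_odds
    p1 = odds / (1.0 + odds)                     # prior Pr(H1)
    lr1 = np.asarray(lr_h1_true, dtype=float)    # LRs computed when H1 is true
    lr2 = np.asarray(lr_h2_true, dtype=float)    # LRs computed when H2 is true
    term1 = p1 * np.mean(np.log2(1.0 + 1.0 / (lr1 * odds)))
    term2 = (1.0 - p1) * np.mean(np.log2(1.0 + lr2 * odds))
    return term1 + term2

# null (reference) method: LR = 1 everywhere; at log10 odds = 0 its ECE is 1 bit
null_ece = ece([1.0], [1.0], 0.0)
```

Sweeping `log10_odds` over a range and plotting the three LR sets (observed, PAV-calibrated, null) reproduces the structure of an ECE plot such as Fig. 4.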
The ECE approach was applied for controlling the performance of the LR-based model, i.e., ΔKSD-AR-LR. The ΔKSD-AR model yields the area ratio only, which just indicates which hypothesis is supported but does not give the strength of the support towards the hypotheses.
Software
The scripts were prepared in the R programming language [50] using the pcaPP and MASS packages.
Results and discussion
Descriptive statistics
The data matrix consists of concentrations of 46 elements (Mg, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, As, Sr, Y, Zr, Nb, Mo, Ag, Cd, In, Sn, Sb, Ba, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Hf, Ta, Tl, Pb, Bi, Th, and U) in 5327 wolframite grains analyzed by LA-ICP-MS. In LA-ICP-MS, limits of detection (LOD) are obtained individually for each element in each grain and depend on the day-to-day performance of the instrument. For each element, the results below LOD have been replaced by the median value of all element-specific LODs.
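The LOD substitution rule can be sketched as follows (the numbers are hypothetical, and below-LOD results are represented as NaN for a single element):

```python
import numpy as np

def impute_below_lod(conc, lod):
    """conc: measured concentrations for one element (NaN where below LOD);
    lod: grain-specific LODs for the same element.
    Below-LOD results are replaced by the median of all LODs for that element."""
    out = np.asarray(conc, dtype=float).copy()
    out[np.isnan(out)] = np.median(lod)
    return out

conc = np.array([12.4, np.nan, 0.9, np.nan, 5.1])   # toy concentrations, mg/kg
lod = np.array([0.3, 0.4, 0.2, 0.5, 0.3])           # toy grain-specific LODs
imputed = impute_below_lod(conc, lod)               # NaNs become median(lod) = 0.3
```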
Examples of distributions of selected element concentrations in wolframite ore concentrates from different mine sites. Capital letters represent different mine sites. A1 and A2 represent two ore concentrates independently taken from the same mine site
Element      | Zn [mg kg^{-1}]    | As [mg kg^{-1}]        | Lu [mg kg^{-1}]      | Pb [mg kg^{-1}]
Percentile   | 10th  50th  90th   | 10th   50th   90th     | 10th   50th   90th   | 10th   50th   90th
Rwanda       |                    |                        |                      |
A1           | 11    22    311    | 8.7    28.0   57.6     | 2.0    4.9    8.2    | 50.4   72.7   122.0
A2           | 7     13    151    | 14.1   35.4   94.1     | 0.9    5.2    9.4    | 44.5   70.5   110.0
B            | 16    25    109    | 153.1  546.6  2895.8   | 1.8    4.4    10.3   | 51.1   78.3   119.3
C            | 12    29    216    | 20.6   95.5   840.4    | 2.7    3.8    5.6    | 54.1   116.1  214.9
D            | 45    51    63     | <0.3   <0.3   0.5      | 0.0    0.1    0.2    | <0.2   0.9    2.9
DR Congo^{a} |                    |                        |                      |
E            | 67    113   156    | <0.3   <0.3   <0.3     | 0.2    9.4    19.2   | <0.2   <0.2   0.9
F            | 138   159   213    | <0.3   <0.3   1.1      | 0.2    0.3    0.7    | 0.5    2.8    13.9
G            | 96    167   219    | <0.3   <0.3   <0.8     | 3.5    8.3    18.8   | <0.2   <0.2   <0.2
H            | 142   226   1375   | 0.3    0.5    1.9      | 0.3    0.4    0.9    | 3.0    6.8    20.0
Australia    |                    |                        |                      |
I            | 47    73    135    | <0.3   12.2   162.6    | 68.9   186.2  423.6  | 0.4    20.6   291.4
K            | 124   137   159    | <0.3   <0.3   <0.3     | 0.2    0.3    0.9    | <0.2   0.5    1.3
(a) uMAD² – the within-samples MAD², computed as the median of the MAD² within each of the samples;
(b) uMAD_B² – the MAD² within brother samples, computed as the median of the MAD² estimated within the sets combined of brother samples;
(c) cMAD_nB² – the MAD² between non-brother samples (different mine sites), computed as the MAD² of the medians representing each mine site;
(d) cMAD_B² – the MAD² between brother samples, computed as the median of the MAD² of the medians representing each sample within each of the brother sample sets.
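These dispersion measures can be sketched with numpy for a single variable. The function names below are ours, not the authors', and `samples` stands for per-sample arrays of one element's concentrations:

```python
import numpy as np

def mad2(x):
    # squared median absolute deviation about the median
    return np.median(np.abs(x - np.median(x))) ** 2

def u_mad2(samples):
    # within-sample dispersion (uMAD^2-type):
    # median of the per-sample MAD^2 values
    return np.median([mad2(s) for s in samples])

def c_mad2(samples):
    # between-sample dispersion (cMAD^2-type):
    # MAD^2 of the medians representing each sample
    return mad2(np.array([np.median(s) for s in samples]))
```

Because `c_mad2` works on one median per sample, the within-sample scatter does not contribute to it, which is exactly the point made in observation (ii) below.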
(i) The dispersion of the data between brother samples, cMAD_B², is much lower than the dispersion between non-brother samples, cMAD_nB². This result is advantageous from the perspective of LR models, which easily differentiate non-brother samples and detect brother samples when the similarity of the data observed for brother samples is greater than that for non-brother samples.
(ii) The within-samples dispersion, uMAD², and the dispersion within brother samples, uMAD_B², are comparable, but much greater than the variability between brother samples, cMAD_B² (which is hardly visible in the plot in Fig. 5b). This proves that the collective variability in the data for all brother samples is well reflected in the data recorded for a single sample. This is a promising finding: despite brother samples being collected as separate samples from a single mine site, their data variability remains at the level observed for the grains collected as one sample. It also clearly points to the huge variability of the data within each sample. Both uMAD² and uMAD_B² are computed using all the measurements (i.e., grains) recorded for each sample. In contrast, cMAD_B² is estimated from the medians representing the measurements recorded for each sample, so the dispersion within each sample does not contribute to cMAD_B². This is why cMAD_B² is lower than uMAD² and uMAD_B².
(iii) The desired relation, i.e., lower dispersion of the data between brother samples than between non-brother samples, is only observed when the samples are described by their medians, which summarize all measurements recorded for the sample grains. The significant dispersion of these measurements is then not accounted for, and the non-brother samples become less similar than the brother samples.
Even though working with medians sounds like a solution to the problem, reducing a sample's data to a single number may be regarded as a loss of information. For this reason, the proposed LR models are constructed for pairs of samples instead of accounting for the entire database. The huge dispersion within each sample is then easily managed using, e.g., LDA. Another issue concerns the lack of normality of the data within each sample and their multidimensionality, which must be sorted out to enable LDA. To handle these problems, rPCA was applied to the log-data to reduce the dimensionality while studying all variables at once. LDA was then used to find the direction that captures the differences between samples and is expected to demonstrate greater similarity between brother samples than between non-brother samples. Finally, the similarity between samples was expressed by Kolmogorov–Smirnov distances.
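The rPCA → LDA → KSD chain for one pair of samples can be illustrated with a simplified sketch. Plain SVD on median-centered log-data stands in for the robust PCA actually performed with pcaPP, and a two-class Fisher discriminant stands in for MASS's LDA; this is an assumption-laden illustration of the idea, not the authors' implementation:

```python
import numpy as np

def ks_distance(x, y):
    # two-sample Kolmogorov-Smirnov statistic between 1-D projections
    grid = np.sort(np.concatenate([x, y]))
    cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

def lda_direction(X1, X2):
    # two-class Fisher discriminant: w = Sw^{-1} (m1 - m2)
    Sw = np.cov(X1.T) + np.cov(X2.T)
    return np.linalg.solve(Sw + 1e-9 * np.eye(Sw.shape[0]),
                           X1.mean(axis=0) - X2.mean(axis=0))

def sample_ksd(A, B, n_pc=3):
    # log-transform and median-center (a crude nod to robustness),
    # reduce dimensionality by SVD, then compare LDA projections by KSD
    X = np.log(np.vstack([A, B]))
    X = X - np.median(X, axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = X @ Vt[:n_pc].T
    PA, PB = P[:len(A)], P[len(A):]
    w = lda_direction(PA, PB)
    return ks_distance(PA @ w, PB @ w)
```

Under this construction, grains drawn from the same source should yield smaller KSD values than grains drawn from clearly different sources, which is the behavior the LR models exploit.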
Model performance
The undesirable shape of the ECE curves, which exceed the neutral (null) curve for some ranges of the logarithm of the prior odds, i.e., log_{10}Odds(H_{1}), was studied in depth to determine whether the model truly performs poorly, or whether this impression is exaggerated because a single sample delivers strong misleading support for the incorrect hypothesis. It appears that in most cases the deteriorated curvature of the ECE plots is the consequence of only a few LR values that support the incorrect hypothesis (usually H_{2}) much more strongly than the remaining values support the correct hypothesis (usually H_{1}). This drawback of the ECE plots forces the researcher to be careful when the performance of a model assessed by the ECE approach appears to be poor.
Observable differences between the experimental (also known as observed) and calibrated curves indicate that there is still room to develop the proposed methodology toward more reliable outcomes.
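For readers who wish to reproduce such plots, one point of the empirical cross entropy curve can be computed directly from its information-theoretic definition (cf. [48, 49]). The sketch below assumes base-2 logarithms and prior odds O = Pr(H_{1})/Pr(H_{2}):

```python
import numpy as np

def ece(lr_h1, lr_h2, prior_odds):
    """Empirical cross entropy for LR values obtained when H1 is true
    (lr_h1) and when H2 is true (lr_h2), at the given prior odds."""
    lr_h1 = np.asarray(lr_h1, dtype=float)
    lr_h2 = np.asarray(lr_h2, dtype=float)
    p1 = prior_odds / (1.0 + prior_odds)   # Pr(H1)
    p2 = 1.0 - p1                          # Pr(H2)
    # penalty for H1-true comparisons grows as LR*O falls below 1,
    # penalty for H2-true comparisons grows as LR*O rises above 1
    cost_h1 = np.mean(np.log2(1.0 + 1.0 / (lr_h1 * prior_odds)))
    cost_h2 = np.mean(np.log2(1.0 + lr_h2 * prior_odds))
    return p1 * cost_h1 + p2 * cost_h2
```

A neutral system (all LR = 1) gives an ECE of 1 bit at prior odds of 1; informative LR values pull the curve below this reference, while a few strongly misleading LR values can push it above, which is exactly the effect discussed above.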
Casework example
 (i)
Five samples from the wolframite trading chain with reliable source documents (origin M) were used as evidence samples E. The database comprises nine reference samples from mine site M, which were regarded as D samples. The question was whether the E samples really came from the declared source (mine site M); in other words, whether E and D were brother samples (H_{1}) or not (H_{2}). To answer this query, 9 · 5 = 45 pairs of E and D samples were tested. Since they are labeled as brother samples (H_{1}), they are expected to deliver LR or AR values greater than 1.
 (ii)
For comparison with the results obtained in (i), 45 non-brother pairs of samples coming from two different mine sites were selected at random from the database. One sample of each pair was treated as sample E, the other as sample D. LR or AR values below 1 (H_{2}) were expected for these comparisons.
Both developed models were applied to the casework data. Unfortunately, they cannot be directly compared with regard to the strength of the support for the hypotheses, because the ΔKSD-AR model, unlike the ΔKSD-AR-LR model, does not yield an LR value.
The area ratios (AR) obtained for the sample pairs in case (i) are all above 1 (or above 0 on the log scale) and oscillate around 2. The ΔKSD-AR-LR model supports H_{1} quite strongly, though there are a few false negative responses. These, however, support the incorrect hypothesis only moderately and are therefore rather incidental.
For the non-brother pairs in case (ii), hypothesis H_{2} is supported for the majority of the pairs, but a few outcomes misleadingly suggest that the samples originate from the same mine site, although they truly come from different sources. It is also observable, however, that these results of the ΔKSD-AR-LR model do not support the incorrect hypothesis H_{1} strongly, and that this support is comparable to the support for the incorrect hypothesis H_{2} generated for brother samples.
This example clearly illustrates that the models place an emphasis on minimizing the levels of false negative answers, which occur when the samples are brothers. This seems quite important for real casework, where accusing a person or company of declaring the wrong origin of a wolframite delivery, in a situation when the declared origin is actually true, should always be avoided. Conversely, the reverse situation, in which the fact finder is deceived about the origin of the wolframites, has no legal consequences and simply allows the deception to go undetected in that instance. For this reason, the levels of false negative rates must be strictly controlled, while it is acceptable for the levels of false positive answers to be slightly greater.
Conclusions
The research presented herein addresses the issue of verifying the authenticity of the declared origins of wolframite samples based on their elemental composition determined by LA-ICP-MS. In the case of a database with multivariate data, huge dispersion of the samples, and clearly non-normal distribution of the data, the evaluation of the evidential value can be supported by hybrid likelihood ratio models that take the best from chemometric tools and smartly apply the results within the LR framework. The robust PCA and LDA used in this study efficiently reduce the data dimensionality and extract the features that maximally differentiate samples coming from different mine sites (non-brother samples). A score-based LR model that incorporates similarity metrics such as the Kolmogorov–Smirnov distance (KSD) into the likelihood ratio approach was developed to conclude whether a sample in question with a declared origin and a reference sample (truly coming from the declared location) are brother samples or not.
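In general terms, a score-based LR evaluates an observed similarity score under the densities of scores for same-source (brother) and different-source (non-brother) comparisons. A minimal sketch of this generic construction, using a Gaussian KDE with Silverman's rule-of-thumb bandwidth [47] (not the exact model proposed here), could look as follows:

```python
import numpy as np

def kde_pdf(x, data):
    # Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth
    data = np.asarray(data, dtype=float)
    h = 1.06 * data.std(ddof=1) * len(data) ** (-0.2)
    z = (x - data[:, None]) / h
    return np.mean(np.exp(-0.5 * z ** 2) / (h * np.sqrt(2.0 * np.pi)), axis=0)

def score_lr(score, same_source_scores, diff_source_scores):
    # LR = density of the observed score among same-source (brother)
    #      comparisons over its density among different-source
    #      (non-brother) comparisons
    num = kde_pdf(np.array([score]), same_source_scores)[0]
    den = kde_pdf(np.array([score]), diff_source_scores)[0]
    return num / den
```

A score typical of brother comparisons then yields LR > 1 (support for H_{1}), and a score typical of non-brother comparisons yields LR < 1 (support for H_{2}).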
Two models, called ΔKSD-AR and ΔKSD-AR-LR, were proposed. The ΔKSD-AR model uses the ratio of the common areas of the distributions of similarity metrics found for the sample in question (E) compared with its reference sample (D) and typical brother or non-brother samples, respectively. The ΔKSD-AR-LR model extends this model by coupling it with the likelihood ratio approach. It is then possible not only to conclude which hypothesis is supported (as in the ΔKSD-AR model), but also to express the strength of that support.
Both models deliver acceptable results, with false positive and false negative rates oscillating around 10%–15%. The ΔKSD-AR-LR model significantly reduces the information loss expressed by the empirical cross entropy curves. The only drawback of the ΔKSD-AR model is that it yields a ratio which cannot be treated directly as an LR. The advantage of the ΔKSD-AR-LR model is that its performance can be objectively assessed by the ECE approach, which stresses the magnitude of the support for each of the hypotheses. In a casework example, both models were tested successfully, confirming the brother nature of reliable samples from the trading chain relative to their respective reference samples.
The evaluation of the models' performance indicates that the levels of false negative rates are minimized relative to the false positive rates. This avoids the situation in which the true declared origins of samples are regarded as spurious and the declaring person or company is branded a liar. This stands in contrast to typical forensic issues, where the emphasis is put on lowering the levels of false positive rates, which lead to the accusation of an innocent person or company. The reason is that in the wolframite case innocence means finding two samples supporting H_{1} (stating that they come from the same source), whilst in forensic science innocence typically involves finding, e.g., two pieces of evidence as coming from different sources, hence supporting H_{2}.
The proposed models have been developed for the conflict mineral wolframite. They also work for other minerals that are traded as ore concentrates, such as coltan or cassiterite, because, just like wolframite, these minerals are not chemically modified at the mine site and keep their chemical signature along the trade chain down to the smelter or metal refinery. Applying the proposed models to minerals like heterogenite or gold, which are often chemically modified at the mine site, seems more difficult, as the chemical modification might change the characteristic geochemical signature of the mined ore.
Notes
Compliance with ethical standards
The authors declare that they have no conflict of interest.
References
1. United Nations Security Council. Final report of the Group of Experts on the Democratic Republic of the Congo. United Nations; 2016. S/2016/466.
2. Vogel C, Raeymaekers T. Terr(it)or(ies) of peace? The Congolese mining frontier and the fight against "conflict minerals". Antipode. 2016;48(4):1102–21.
3. Dodd-Frank Wall Street Reform and Consumer Protection Act. United States Securities and Exchange Commission (SEC), H.R. 4173, Public Law 111-203, 111th Cong., 849; 2010.
4. Horvath J. Latest updates in conflict minerals law. Lexology. 2017. Available at: https://www.lexology.com/library/detail.aspx?g=d27d2d5fdf96450683024b2958cb92a4.
5. Regulation (EU) 2017/821 of the European Parliament and of the Council of 17 May 2017 laying down supply chain due diligence obligations for Union importers of tin, tantalum, and tungsten, their ores, and gold originating from conflict-affected and high-risk areas. O. J. 2017;L130:19.5.2017.
6. Gäbler HE, Melcher F, Graupner T, Bahr A, Sitnikova MA, Henjes-Kunst F, Oberthür T, Brätz H, Gerdes A. Speeding up the analytical workflow for coltan fingerprinting by an integrated mineral liberation analysis/LA-ICP-MS approach. Geostand Geoanal Res. 2016;35(4):431–48.
7. Gäbler HE, Rehder S, Bahr A, Melcher F, Goldmann S. Cassiterite fingerprinting by LA-ICP-MS. J Anal At Spectrom. 2013;28(8):1247–55.
8. Gäbler HE, Schink W, Goldmann S, Bahr A, Gawronski T. Analytical fingerprint of wolframite ore concentrates. J Forensic Sci. 2017;62(4):881–8.
9. Aitken CGG, Taroni F. Statistics and the evaluation of evidence for forensic scientists. 2nd ed. Chichester: Wiley; 2004.
10. Aitken CGG, Lucy D. Evaluation of trace evidence in the form of multivariate data. J R Stat Soc Ser C (Applied Statistics). 2004;53:109–22.
11. Zadora G, Martyna A, Ramos D, Aitken CGG. Statistical analysis in forensic science: evidential values of multivariate physicochemical data. Chichester: John Wiley and Sons; 2014.
12. Aitken CGG, Zadora G, Lucy D. A two-level model for evidence evaluation. J Forensic Sci. 2007;52:412–9.
13. Zadora G, Neocleous T. Likelihood ratio model for classification of forensic evidences. Anal Chim Acta. 2009;64:266–78.
14. Zadora G. Classification of glass fragments based on elemental composition and refractive index. J Forensic Sci. 2009;54:49–59.
15. Evett IW, Jackson G, Lambert JA, McCrossan S. The impact of the principles of evidence interpretation and the structure and content of statements. Sci Justice. 2000;40:233–9.
16. Aitken CGG, Roberts P, Jackson G. Fundamentals of probability and statistical evidence in criminal proceedings: guidance for judges, lawyers, forensic scientists, and expert witnesses. Practitioner Guide No. 1. London: Royal Statistical Society; 2012.
17. ENFSI guideline for evaluative reporting in forensic science: strengthening the evaluation of forensic results across Europe (STEOFRAE). Project (EU ISEC 2010) supported by the Prevention of and Fight against Crime Program of the European Union, European Commission – Directorate-General Justice, Freedom, and Security (Agreement Number: HOME/2010/ISEC/MO/4000001759); 2015.
18. Jackson G, Aitken CGG, Roberts P. Case assessment and interpretation of expert evidence: guidance for judges, lawyers, forensic scientists, and expert witnesses. Practitioner Guide No. 4. London: Royal Statistical Society; 2014.
19. Roberts P, Aitken CGG. The logic of forensic proof: inferential reasoning in criminal evidence and forensic science: guidance for judges, lawyers, forensic scientists, and expert witnesses. Practitioner Guide No. 3. London: Royal Statistical Society; 2013.
20. Puch-Solis R, Roberts P, Pope S, Aitken CGG. Assessing the probative value of DNA evidence: guidance for judges, lawyers, forensic scientists, and expert witnesses. Practitioner Guide No. 2. London: Royal Statistical Society; 2012.
21. Ramos D. Forensic evaluation of the evidence using automatic speaker recognition systems. PhD thesis. Depto. de Ingenieria Informatica, Escuela Politecnica Superior, Universidad Autonoma de Madrid, Madrid, Spain; 2007.
22. Zadora G, Ramos D. Evaluation of glass samples for forensic purposes – an application of likelihood ratio model and information-theoretical approach. Chemom Intell Lab Syst. 2010;102:63–83.
23. Zadora G, Neocleous T. Evidential value of physicochemical data – comparison of methods of glass database creation. J Chemom. 2010;24:367–78.
24. van Es A, Wiarda W, Hordijk M, Alberink I, Vergeer P. Implementation and assessment of a likelihood ratio approach for the evaluation of LA-ICP-MS evidence in forensic glass analysis. Sci Justice. 2017;57:181–92.
25. Lucy D, Zadora G. Mixed effects modeling for glass category estimation from glass refractive indices. Forensic Sci Int. 2011;212:189–97.
26. Zadora G, Wilk D. Evaluation of evidence value of refractive index measured before and after annealing for container and float glass fragments. Problems Forensic Sci. 2009;78:365–85.
27. Martyna A, Sjastad KE, Zadora G, Ramos D. Analysis of lead isotopic ratios of glass objects with the aim of comparing them for forensic purposes. Talanta. 2013;105:158–66.
28. Pierrini G, Doyle S. Evaluation of preliminary isotopic analysis (13C and 15N) of explosives. A likelihood ratio approach to assess the links between Semtex samples. Forensic Sci Int. 2007;167:43–8.
29. Zadora G. Evaluation of evidential value of physicochemical data by a Bayesian network approach. J Chemom. 2010;24:346–66.
30. Zięba-Palus J, Zadora G, Milczarek JM. Differentiation and evaluation of evidence value of styrene acrylic urethane topcoat car paints analyzed by pyrolysis-gas chromatography. J Chromatogr A. 2008;1179:47–58.
31. Martyna A, Michalska A, Zadora G. Interpretation of FTIR spectra of polymers and Raman spectra of car paints by means of likelihood ratio approach supported by wavelet transform for reducing data dimensionality. Anal Bioanal Chem. 2015;407:3357–76.
32. Martyna A, Zadora G, Neocleous T, Michalska A, Dean N. Hybrid approach combining chemometrics and likelihood ratio framework for reporting the evidential value of spectra. Anal Chim Acta. 2016;931:34–46.
33. Michalska A, Martyna A, Zięba-Palus J, Zadora G. Application of a likelihood ratio approach in solving a comparison problem of Raman spectra recorded for blue automotive paints. J Raman Spectrosc. 2015;46:772–83.
34. Zadora G, Borusiewicz R, Zięba-Palus J. Differentiation between weathered kerosene and diesel fuel using automatic thermal desorption-GC-MS analysis and the likelihood ratio approach. J Sep Sci. 2005;28:1467–75.
35. Martyna A, Lucy D, Zadora G, Trzcinska BM, Ramos D, Parczewski A. The evidential value of microspectrophotometry measurements made for pen inks. Anal Methods. 2013;5:6788–95.
36. Neumann C, Margot P. New perspectives in the use of ink evidence in forensic science. Part III: Operational applications and evaluation. Forensic Sci Int. 2009;192:29–42.
37. Bolck A, Ni H, Lopatka M. Evaluating score- and feature-based likelihood ratio models for multivariate continuous data: applied to forensic MDMA comparison. Law Probab Risk. 2015;14:243–66.
38. Hibbert DB, Blackmore D, Li J, Ebrahimi D, Collins M, Vujic S, et al. A probabilistic approach to heroin signatures. Anal Bioanal Chem. 2010;396:765–73.
39. Bolck A, Alberink I. Variation in likelihood ratios for forensic evidence evaluation of XTC tablets comparison. J Chemom. 2010;25:41–9.
40. Własiuk P, Martyna A, Zadora G. A likelihood ratio model for the determination of the geographical origin of olive oil. Anal Chim Acta. 2015;853:187–99.
41. Martyna A, Zadora G, Stanimirova I, Ramos D. Wine authenticity verification as a forensic problem. An application of likelihood ratio approach to label verification. Food Chem. 2014;150:287–95.
42. Alladio E, Martyna A, Salomone A, Pirro V, Vincenti M, Zadora G. Evaluation of direct and indirect ethanol biomarkers using a likelihood ratio approach to identify chronic alcohol abusers for forensic purposes. Forensic Sci Int. 2017;271:13–22.
43. Varmuza K, Filzmoser P. Multivariate statistical analysis in chemometrics. Boca Raton: CRC Press; 2008.
44. Hubert M, Rousseeuw PJ, Verboven S. A fast method for robust principal components with applications to chemometrics. Chemom Intell Lab Syst. 2002;60:101–11.
45. Hubert M, Engelen S. Robust PCA and classification in biosciences. Bioinformatics. 2004;20:1728–36.
46. Hazewinkel M, Subbotin Y, editors. Encyclopedia of mathematics. New York: Springer; 2001.
47. Silverman BW. Density estimation for statistics and data analysis. London: Chapman and Hall; 1986.
48. Brümmer N, du Preez J. Application-independent evaluation of speaker detection. Comput Speech Lang. 2006;20:230–75.
49. Ramos D, Gonzalez-Rodriguez J, Zadora G, Aitken C. Information-theoretical assessment of the performance of likelihood ratio computation methods. J Forensic Sci. 2013;58:1503–18.
50. R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2012. Available at: http://www.R-project.org. Accessed 20 Jan 2018.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.