Independent-model diagnostics for a priori identification and interpretation of outliers from a full pharmacokinetic database: correspondence analysis, Mahalanobis distance and Andrews curves

  • Nabil Semmar
  • Saik Urien
  • Bernard Bruguerolle
  • Nicolas Simon


Population pharmacokinetic (PK) (or pharmacodynamic (PD)) modelling aims to analyse the variability of drug kinetics (or dynamics) across the numerous subjects of a population. This variability includes inter- and intra-individual sources, leading to important differences in the variation ranges, the relative concentrations and the global shapes of PK profiles. These various sources of variability suggest that the distances between subjects can be examined from different perspectives. Some subjects are so distant from the majority that they can be considered atypical, or outliers. This paper presents three multivariate statistical methods for diagnosing outliers within a full population PK dataset, prior to any modelling step. Each method combined all the concentration–time variables to analyse the differences between patients with reference to a distance criterion: (a) correspondence analysis (CA) used the chi-square distance to highlight the profiles that are most atypical in terms of relative concentrations; (b) the Mahalanobis distance was calculated to extract PK profiles whose shapes are atypical because of atypical variations in concentration; (c) the Andrews method combined all the concentration variables in a Fourier series to give sine–cosine curves that reveal the clustering behaviour of subjects under the Euclidean distance criterion. After outlier subjects have been identified, these methods can also be used to extract the concentration values responsible for the atypical status of the patients. The outliers will therefore capture different sources of variability in the PK dataset according to each method, independently of any PK modelling. Finally, a significant positive trend was found between the number of times outlier concentrations were detected a priori (by one, two or three diagnostics) and the NPDE metrics of these concentrations (after PK modelling): NPDE values were highest when the corresponding concentration had been detected by more diagnostics a priori.
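To make diagnostic (b) concrete, the sketch below flags atypically shaped profiles by their squared Mahalanobis distance from the multivariate mean, screening against a chi-square quantile. This is a minimal illustration, not the authors' implementation; the concentration matrix and the 97.5% cutoff are hypothetical choices.

```python
import numpy as np

def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row (one subject's
    concentration-time profile) from the mean profile."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# Hypothetical dataset: 12 typical subjects plus one atypical profile,
# each sampled at 3 concentration-time points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[1.0, 2.0, 1.5], scale=0.1, size=(12, 3)),
    [6.0, 0.2, 5.0],  # deliberately atypical shape (last row, index 12)
])

d2 = mahalanobis_d2(X)
# 9.348 is the 97.5% chi-square quantile for 3 degrees of freedom,
# a common screening cutoff for 3-dimensional profiles.
outliers = np.where(d2 > 9.348)[0]
```

Note that with the outlier included in the mean and covariance, the classical distance is bounded by (n-1)^2/n, so with very few subjects an extreme profile can be masked; robust estimators (e.g. minimum covariance determinant) mitigate this.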
The application of these a priori outlier diagnostics is illustrated here on two PK datasets: cortisol stimulated by Synacthen, and capecitabine administered orally.
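Diagnostic (c) maps each subject's concentration vector x = (x1, x2, ...) to the curve f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ... over t in [-pi, pi]. A useful property, sketched below on hypothetical profiles (this is not the authors' code), is that the integrated squared difference between two Andrews curves is proportional to the squared Euclidean distance between the underlying vectors, which is why subjects whose curves stay close together form clusters and an isolated curve signals an outlier.

```python
import numpy as np

def andrews_curve(x, t):
    """Andrews (1972) curve: f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t)
    + x4*sin(2t) + x5*cos(2t) + ..."""
    f = np.full_like(t, x[0] / np.sqrt(2.0))
    for i, xi in enumerate(x[1:], start=1):
        k = (i + 1) // 2  # harmonic order: 1, 1, 2, 2, ...
        f += xi * (np.sin(k * t) if i % 2 == 1 else np.cos(k * t))
    return f

# Two hypothetical 4-point concentration profiles.
x = np.array([1.0, 2.0, 0.5, -1.0])
y = np.array([0.5, 1.0, 1.5, 0.0])

# Over a full period, the squared L2 distance between the two curves
# equals pi * ||x - y||^2 (the sine/cosine terms are orthogonal),
# so Euclidean proximity of profiles is preserved by the curves.
t = np.linspace(-np.pi, np.pi, 4000, endpoint=False)
dt = t[1] - t[0]
curve_dist2 = np.sum((andrews_curve(x, t) - andrews_curve(y, t)) ** 2) * dt
euclid_dist2 = np.pi * np.sum((x - y) ** 2)
```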


Outliers · Variability analysis · Independent-model diagnostics · Full pharmacokinetic database · Correspondence analysis · Mahalanobis distance · Andrews curves





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Nabil Semmar (1, 2)
  • Saik Urien (3, 4)
  • Bernard Bruguerolle (1)
  • Nicolas Simon (1)
  1. Laboratory of Clinical Pharmacology, EA3784, Medical School of Marseilles, Marseilles, France
  2. ISSBAT, Institut Supérieur des Sciences Biologiques Appliquées de Tunis, Tunis, Tunisia
  3. Pharmacology Department, Centre René Huguenin, Saint-Cloud, France
  4. INSERM, Paris, France
