Data-driven tree structure for PIN models


Probability of informed trading (PIN) models use a tree structure to characterize trading driven by certain types of information. The literature has proposed different tree structures with different numbers of market-participant groups, and no single tree is used consistently. One of the main causes of this inconsistency is that these trees are proposed artificially through a bottom-up approach rather than implied by actual market data. In this paper, we therefore propose a method that infers a tree structure directly from empirical data. More precisely, we use hierarchical clustering to construct a tree for each individual firm and then infer an aggregate tree through a voting mechanism. We test this method on US data from January 2002 for 7608 companies, which yields a tree with two layers and four groups. The characteristics of the resulting aggregate tree lie between those of several tree structures proposed in the literature, demonstrating that each of those trees reflects only part of the market and that the proposed empirically driven method should be considered when seeking a tree that represents the whole market.
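The two steps named in the abstract, a hierarchical clustering per firm followed by a vote across firms, can be illustrated with a minimal sketch. It is written in Python for illustration and is not the authors' implementation. The complete-linkage rule, the cut at four groups, the string encoding of a tree's shape, and the toy daily (buys, sells) counts are all assumptions made here.

```python
from collections import Counter
from itertools import combinations
from math import dist


def complete_linkage(a, b):
    """Distance between two clusters: the largest pairwise point distance."""
    return max(dist(p, q) for p in a for q in b)


def tree_shape(days, n_groups=4):
    """Cluster one firm's daily (buys, sells) points bottom-up and return the
    shape of the tree joining its last n_groups clusters, as a canonical
    string in which '.' is a leaf group and '(xy)' is a merge of x and y."""
    clusters = [([p], ".") for p in days]  # (member points, shape so far)
    while len(clusters) > 1:
        # Pick the pair of clusters that are closest under complete linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: complete_linkage(clusters[ij[0]][0], clusters[ij[1]][0]),
        )
        (a, shape_a), (b, shape_b) = clusters[i], clusters[j]
        if len(clusters) > n_groups:
            shape = "."  # merges below the cut stay inside one leaf group
        else:
            # Sorting the two sub-shapes makes mirror-image trees compare equal.
            shape = "(" + "".join(sorted([shape_a, shape_b])) + ")"
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((a + b, shape))
    return clusters[0][1]


def vote(shapes):
    """Return the most frequent tree shape across firms."""
    return Counter(shapes).most_common(1)[0][0]


# Hypothetical daily (buys, sells) counts for three firms (illustrative only).
firms = {
    "A": [(10, 10), (11, 10), (10, 20), (11, 20),
          (100, 10), (101, 10), (100, 20), (101, 20)],
    "B": [(5, 5), (6, 5), (5, 30), (6, 30),
          (80, 5), (81, 5), (80, 30), (81, 30)],
    "C": [(10, 10), (11, 10), (20, 10), (21, 10),
          (60, 10), (61, 10), (200, 10), (201, 10)],
}
shapes = {name: tree_shape(days) for name, days in firms.items()}
aggregate = vote(shapes.values())
print(shapes, aggregate)
```

In this toy data, firms A and B produce a balanced tree (two pairs of groups) and firm C produces a chain, so the vote selects the balanced shape. Voting on a canonical shape string is only one possible way to compare trees across firms.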



Availability of data and material

The data were collected from the TAQ database.

Code availability

The code for this study was written in R.


Notes

  1. Note that when employing the BIC, one assumes that all data come from the same model with the same parameters, that is, that the samples are independent and identically distributed. This is not the case with heterogeneous data.

  2. This idea comes from Chen et al. (2016). In their case, the experiment yields different estimates of the number of states, so they use a majority vote (the term they use in their conference presentation) to select one estimate. Here we borrow this idea but vote on the structure of trees.

  3. More precisely, the data here are time series data collected through an experimental technique called Förster resonance energy transfer (FRET). See Chen et al. (2016) for more details.

  4. Two general approaches (commonly known as tick tests) are used to infer the direction of a trade: (1) comparing the trade price to the bid/ask prices of the prevailing quote or (2) comparing the trade price to that of adjacent trades.
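The two approaches to inferring trade direction can be sketched as follows. In the trade-classification literature, approach (1) is usually called the quote rule and approach (2) the tick rule. The function names, the use of the quote midpoint as the comparison point, and the choice to leave midpoint trades and the first trade unclassified are assumptions of this sketch, not details taken from the paper.

```python
def quote_rule(price, bid, ask):
    """Approach (1): compare the trade price with the prevailing quote.
    Above the midpoint counts as a buy, below as a sell; a trade exactly
    at the midpoint is left unclassified (None)."""
    mid = (bid + ask) / 2
    if price > mid:
        return "buy"
    if price < mid:
        return "sell"
    return None


def tick_rule(prices):
    """Approach (2): compare each trade price with the preceding trade.
    An uptick is a buy, a downtick is a sell, and an unchanged price
    carries the previous classification forward."""
    signs = []
    last = None  # the first trade has no predecessor, so it stays None
    for k, price in enumerate(prices):
        if k > 0 and price > prices[k - 1]:
            last = "buy"
        elif k > 0 and price < prices[k - 1]:
            last = "sell"
        signs.append(last)
    return signs


print(quote_rule(11, 8, 12))         # trade above the midpoint of 10
print(tick_rule([10, 11, 11, 9]))    # uptick, unchanged, downtick
```

A common practical design is to apply the quote rule first and fall back to the tick rule for trades at the midpoint, which is the combination associated with Lee and Ready (1991).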


References

  1. Aslan H, Easley D, Hvidkjaer S, O’Hara M (2011) The characteristics of informed trading: implications for asset pricing. J Empir Finance 18:782–801.

  2. Bosque L, Albuquerque P, Peng Y, Silva CD, Nakano E (2020) Probability of informed trading: a Bayesian approach. Int J Appl Decis Sci 13:183–214.

  3. Brennan MJ, Huh SW, Subrahmanyam A (2018) High-frequency measures of informed trading and corporate announcements. Rev Financ Stud 31:2326–2376.

  4. Chang C, Lin E (2014) On the determinants of basis spread for Taiwan index futures and the role of speculators. Rev Pac Basin Financ Mark Policies 17:1–30.

  5. Chang C, Lin E (2015) Cash-futures basis and the impact of market maturity, informed trading, and expiration effects. Int Rev Econ Finance 35:197–213.

  6. Chen Y, Shen K, Shan SO, Kou SC (2016) Analyzing single-molecule protein transportation experiments via hierarchical hidden Markov models. J Am Stat Assoc 111:951–966.

  7. Cheng TC, Lai HN (2020) Improvements in estimating the probability of informed trading models. Quant Finance.

  8. Duarte J, Young L (2009) Why is PIN priced? J Financ Econ 91:119–138.

  9. Easley D, Kiefer NM, O’Hara M, Paperman JB (1996) Liquidity, information, and infrequently traded stocks. J Finance 51:1405–1436.

  10. Easley D, Hvidkjaer S, O’Hara M (2002) Is information risk a determinant of asset returns? J Finance 57:2185–2221.

  11. Gan Q, Wei WC, Johnstone D (2015) A faster estimation method for the probability of informed trading using hierarchical agglomerative clustering. Quant Finance 15:1805–1821.

  12. Johnson RA, Wichern DW (2007) Applied multivariate statistical analysis. Prentice Hall, Upper Saddle River.

  13. Kaufman L, Rousseeuw PJ (2009) Finding groups in data: an introduction to cluster analysis. Wiley, Hoboken.

  14. Lee CMC, Ready MJ (1991) Inferring trade direction from intraday data. J Finance 46:733–746.

  15. Lin HWW, Ke WC (2011) A computing bias in estimating the probability of informed trading. J Financ Mark 14:625–640.

  16. Lin E, Lee CF (2015) Application of Poisson mixtures in the estimation of probability of informed trading. In: Lee CF, Lee JC (eds) Handbook of financial econometrics and statistics. Springer, New York, pp 2601–2619.

  17. Yan Y, Zhang S (2012) An improved estimation method and empirical properties of the probability of informed trading. J Bank Finance 36:454–467.


The research of Chu-Lan Michael Kao is partly supported by MOST 107-2118-M-009-003-MY2.

Author information



Corresponding author

Correspondence to Chu-Lan Michael Kao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Lin, E., Kao, CL.M. & Adityarini, N.S. Data-driven tree structure for PIN models. Rev Quant Finan Acc (2021).



Keywords

  • PIN model
  • Hierarchical clustering
  • Tree voting
  • Data-driven method

JEL Classification

  • G14