Probability of informed trading (PIN) models characterize trading with certain types of information through a tree structure. Different tree structures with different numbers of groups for market participants have been proposed, with no clear, consistent tree used in the literature. One of the main causes of this inconsistency is that these trees are artificially proposed through a bottom-up approach rather than implied by actual market data. Therefore, in this paper, we propose a method that infers a tree structure directly from empirical data. More precisely, we use hierarchical clustering to construct a tree for each individual firm and then infer an aggregate tree through a voting mechanism. We test this method on US data from January 2002 for 7608 companies, which results in a tree with two layers and four groups. The characteristics of the resulting aggregate tree are between those of several proposed tree structures in the literature, demonstrating that these proposed trees all reflect only part of the market, and one should consider the proposed empirically driven method when seeking a tree representing the whole market.
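The per-firm clustering step can be illustrated with a minimal sketch. It is not the authors' R code: the complete-linkage rule, the Euclidean distance, the function names `agglomerate` and `cut`, and the daily (buys, sells) counts are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def agglomerate(points):
    """Agglomerative clustering with complete linkage (an assumed choice).

    Returns the merge history as (cluster_a, cluster_b, height) tuples.
    Original points have ids 0..n-1; each merge creates the next free id.
    """
    clusters = {i: [i] for i in range(len(points))}  # id -> member indices
    merges = []
    next_id = len(points)

    def linkage(a, b):
        # Complete linkage: largest pairwise distance between two clusters.
        return max(dist(points[i], points[j])
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: linkage(*pair))
        height = linkage(a, b)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
        next_id += 1
    return merges


def cut(points, merges, k):
    """Replay the merge history until k clusters remain."""
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    for a, b, _ in merges[: len(points) - k]:
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical daily (buys, sells) counts for one firm.
days = [(100, 100), (105, 98), (300, 110), (310, 105), (102, 290), (98, 305)]
merges = agglomerate(days)
print(cut(days, merges, 3))  # [[0, 1], [2, 3], [4, 5]]
```

The merge history plays the role of one firm's tree; cutting it at different heights gives candidate groupings of trading days.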
Availability of data and material
The data are collected from the TAQ database.
The code for this study is written using R.
Note that employing the BIC assumes that all data come from the same model with the same parameters, that is, that the samples are independent and identically distributed. This is not the case with heterogeneous data.
This idea comes from Chen et al. (2016). Their experiment yields different estimates of the number of states, so they select one estimate by majority vote (the term they use in their conference presentation). Here we borrow this idea but vote on the structure of the trees.
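A majority vote over tree structures can be sketched as follows. The nested-tuple encoding of a tree, the `"leaf"` marker, and the names `canonical` and `vote` are hypothetical choices made for this illustration, and the firm-level trees are invented; none of this is taken from the authors' code.

```python
from collections import Counter


def canonical(tree):
    """Return an order-independent form of a nested-tuple tree.

    A leaf (one group of trading days) is the string "leaf"; an internal
    node is a tuple of child subtrees. Sorting the children makes two trees
    that differ only in child order compare equal.
    """
    if tree == "leaf":
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def vote(trees):
    """Return the most frequent tree shape, its vote count, and all counts."""
    counts = Counter(canonical(tree) for tree in trees)
    winner, n_votes = counts.most_common(1)[0]
    return winner, n_votes, counts


# Hypothetical per-firm trees: three firms with two layers and four groups,
# two firms with three groups (written in different child order), one with two.
two_by_two = (("leaf", "leaf"), ("leaf", "leaf"))
firm_trees = [
    two_by_two,
    two_by_two,
    two_by_two,
    ("leaf", ("leaf", "leaf")),
    (("leaf", "leaf"), "leaf"),
    ("leaf", "leaf"),
]
winner, n_votes, counts = vote(firm_trees)
print(winner, n_votes)  # (('leaf', 'leaf'), ('leaf', 'leaf')) 3
```

Voting on a canonical shape rather than on the raw tree means that firms whose trees differ only in how the branches are ordered support the same candidate.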
More precisely, the data here are time series data collected through an experimental technique called Förster resonance energy transfer (FRET). See Chen et al. (2016) for more details.
Two general approaches (commonly known as tick tests) are used to infer the direction of a trade: (1) comparing the trade price to the bid/ask prices of the prevailing quote or (2) comparing the trade price to that of adjacent trades.
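The two approaches can be sketched as below. This is a simplified illustration, not the classification procedure used in the paper: the function names, the +1/-1/0 coding, and the rule of falling back to the adjacent-trade comparison only for trades at the quote midpoint are assumptions, and the prices are invented.

```python
def quote_test(price, bid, ask):
    """Compare the trade price with the midpoint of the prevailing quote.

    Returns +1 (buy), -1 (sell), or 0 when the trade is at the midpoint.
    """
    mid = (bid + ask) / 2
    if price > mid:
        return 1
    if price < mid:
        return -1
    return 0


def tick_test(prices):
    """Compare each trade price with the previous trade price.

    An uptick is coded +1 and a downtick -1. A trade at an unchanged price
    inherits the sign of the last price change; 0 means no earlier change.
    """
    signs = []
    last_sign = 0
    for i, price in enumerate(prices):
        if i > 0 and price > prices[i - 1]:
            last_sign = 1
        elif i > 0 and price < prices[i - 1]:
            last_sign = -1
        signs.append(last_sign)
    return signs


def classify(trades):
    """Use the quote comparison first; fall back to adjacent trades at the midpoint.

    trades is a list of (price, bid, ask) tuples in time order.
    """
    ticks = tick_test([price for price, _, _ in trades])
    result = []
    for (price, bid, ask), tick in zip(trades, ticks):
        sign = quote_test(price, bid, ask)
        result.append(sign if sign != 0 else tick)
    return result


# Hypothetical trades as (price, bid, ask).
trades = [
    (10.0, 10.0, 10.5),
    (10.5, 10.0, 10.5),
    (10.25, 10.0, 10.5),
    (10.25, 10.0, 10.5),
]
print(classify(trades))  # [-1, 1, -1, -1]
```

Summing the +1 and -1 labels per day gives daily buy and sell counts of the kind PIN models take as input.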
Aslan H, Easley D, Hvidkjaer S, O’Hara M (2011) The characteristics of informed trading: implications for asset pricing. J Empir Finance 18:782–801. https://doi.org/10.1016/j.jempfin.2011.08.001
Bosque L, Albuquerque P, Peng Y, Silva CD, Nakano E (2020) Probability of informed trading: a Bayesian approach. Int J Appl Decis Sci 13:183–214. https://doi.org/10.1504/ijads.2020.106415
Brennan MJ, Huh SW, Subrahmanyam A (2018) High-frequency measures of informed trading and corporate announcements. Rev Financ Stud 31:2326–2376. https://doi.org/10.1093/rfs/hhy005
Chang C, Lin E (2014) On the determinants of basis spread for Taiwan index futures and the role of speculators. Rev Pac Basin Finance Mark Policies 17:1–30. https://doi.org/10.1142/s0219091514500027
Chang C, Lin E (2015) Cash-futures basis and the impact of market maturity, informed trading, and expiration effects. Int Rev Econ Finance 35:197–213. https://doi.org/10.1016/j.iref.2014.09.003
Chen Y, Shen K, Shan SO, Kou SC (2016) Analyzing single-molecule protein transportation experiments via hierarchical hidden Markov models. J Am Stat Assoc 111:951–966. https://doi.org/10.1080/01621459.2016.1140050
Cheng TC, Lai HN (2020) Improvements in estimating the probability of informed trading models. Quant Finance. https://doi.org/10.1080/14697688.2020.1800805
Duarte J, Young L (2009) Why is PIN priced? J Financ Econ 91:119–138. https://doi.org/10.1016/j.jfineco.2007.10.008
Easley D, Kiefer NM, O’Hara M, Paperman JB (1996) Liquidity, information, and infrequently traded stocks. J Finance 51:1405–1436. https://doi.org/10.1111/j.1540-6261.1996.tb04074.x
Easley D, Hvidkjaer S, O’Hara M (2002) Is information risk a determinant of asset returns? J Finance 57:2185–2221. https://doi.org/10.1111/1540-6261.00493
Gan Q, Wei WC, Johnstone D (2015) A faster estimation method for the probability of informed trading using hierarchical agglomerative clustering. Quant Finance 15:1805–1821. https://doi.org/10.1080/14697688.2015.1023336
Johnson RA, Wichern DW (2007) Applied multivariate statistical analysis. Prentice Hall, Upper Saddle River
Kaufman L, Rousseeuw PJ (2009) Finding groups in data: an introduction to cluster analysis. Wiley, Hoboken
Lee CMC, Ready MJ (1991) Inferring trade direction from intraday data. J Finance 46:733–746. https://doi.org/10.1111/j.1540-6261.1991.tb02683.x
Lin HWW, Ke WC (2011) A computing bias in estimating the probability of informed trading. J Financ Mark 14:625–640. https://doi.org/10.1016/j.finmar.2011.03.001
Lin E, Lee CF (2015) Application of Poisson mixtures in the estimation of probability of informed trading. In: Lee CF, Lee JC (eds) Handbook of financial econometrics and statistics. Springer, New York, pp 2601–2619
Yan Y, Zhang S (2012) An improved estimation method and empirical properties of the probability of informed trading. J Bank Finance 36:454–467. https://doi.org/10.1016/j.jbankfin.2011.08.003
The research of Chu-Lan Michael Kao is partly supported by MOST 107-2118-M-009-003-MY2.
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Lin, E., Kao, CL.M. & Adityarini, N.S. Data-driven tree structure for PIN models. Rev Quant Finan Acc (2021). https://doi.org/10.1007/s11156-021-00961-w
Keywords
- PIN model
- Hierarchical clustering
- Tree voting
- Data-driven method