Scoring and Predicting Risk Preferences

Ertek, Gürdal; Kaya, Murat; Kefeli, Cemre; Onur, Özge; Uzer, Kerem

doi:10.1007/978-1-4471-2969-1_9

Abstract

This study presents a methodology for determining the risk scores of individuals from a given financial risk preference survey. To this end, we use a regression-based iterative algorithm to determine the weights of the survey questions in the scoring process. Next, we generate classification models that classify individuals into risk-averse and risk-seeking categories, using a subset of the survey questions. We illustrate the methodology through a sample survey with 656 respondents. We find that the demographic (indirect) questions can be almost as successful as the risk-related (direct) questions in predicting the risk preference classes of respondents. Using a decision-tree-based classification model, we discuss how one can generate actionable business rules based on the findings.

Notes

1. See, for example, http://www.paragonwealth.com/risk_tolerance.php.
2. Also referred to as classification trees.
3. Science high school: specially designated high schools that heavily implement a math- and science-oriented curriculum.

Acknowledgements

The authors thank Sabancı University (SU) alumni Levent Bora, Kıvanc Kılınç, Onur Özcan, and Feyyaz Etiz for their work on earlier phases of the study, and students Serpil Çetin and Nazlı Ceylan Ersöz for collecting the data for the case study. The authors also thank SU students Gizem Gürdeniz, Havva Gözde Ekşioğlu and Dicle Ceylan for their assistance. This chapter is dedicated to the memory of Mr. Turgut Uzer, a leading industrial engineer in Turkey, who passed away in February 2011. Mr. Turgut Uzer inspired the authors greatly with his vision, unmatched know-how, and dedication to the advancement of decision sciences.

Author information

Authors and Affiliations

Faculty of Engineering and Natural Sciences, Sabancı University, Orhanli, Tuzla, 34956, Istanbul, Turkey
Gürdal Ertek, Murat Kaya, Cemre Kefeli & Özge Onur
School of Management, Sabancı University, Orhanli, Tuzla, 34956, Istanbul, Turkey
Kerem Uzer

Corresponding author

Correspondence to Gürdal Ertek.

Editor information

Editors and Affiliations

Advanced Analytics Institute, University of Technology, Broadway, Ultimo, Sydney, NSW2007, New South Wales, Australia
Longbing Cao
Department of Computer Science, University of Illinois, S. Morgan Street 851, Chicago, 60607, Illinois, USA
Philip S. Yu

Appendices

Appendix A: Selected Survey Questions

Following are selected direct (risk-related) questions from the survey of the case study, which constitute the corresponding direct (risk-related) attributes.

Q34

Over the long term, typically, investments which are more volatile (i.e., that tend to fluctuate more in value) have greater potential for return (Stocks, for example, have high volatility; whereas government bonds have low volatility). Given this trade-off, what would be the level of volatility you would prefer for your investment?

a. Less than 3%
b. 3% to 5%
c. 5% to 7%
d. 7% to 13%
e. More than 13%

Q32

What is your most important investment priority?

a. I aim to protect my capital; I cannot stand losing money.
b. I am OK with small growth; I cannot take much risk.
c. I aim for an investment that delivers the market return rate.
d. I want higher than market return; I am OK with volatility.
e. Return is the most important for me. I am ready to take high risk for high return.

Q23

Compared to others, how do you rate your willingness to take risk?

a. Very low
b. Low
c. Average
d. High
e. Very high

Q42

What is your most preferred investment strategy?

a. I want my investments to be secure. I also need my investments to provide me with modest income now, or to fund a large expense within the next few years.
b. I want my investments to grow and I am less concerned about income. I am comfortable with moderate market fluctuations.
c. I am more interested in having my investments grow over the long-term. I am comfortable with short-term return volatility.
d. I want long-term aggressive growth and I am willing to accept significant short-term market fluctuations.

Appendix B: ScoringAlgorithm

Following is the mathematical presentation of the developed scoring algorithm:

Sets

$\mathcal{I}$: set of respondents (observations, rows) in the sample; $i=1,\ldots,I$
$\mathcal{J}$: set of attributes (questions, columns); $j=1,\ldots,J$
$\mathcal{V}$: set of ordinal values for each attribute; $v=1,\ldots,V$. For the presented case study, $\mathcal{V}=(a,b,c,d,e)$, where $a\le b\le c\le d\le e$

Inputs

$\mathbf{O}=[o_{ij}]_{I\times J}$: matrix of ordinal values of all attributes for all respondents
$m_j$: number of possible ordinal values for attribute $j$; $m_j\le 5$ in this study

Internal Variables

$\mathbf{A}=[a_{ij}]_{I\times J}$: matrix of numerical (nominal) values of all attributes for all respondents
$y_i$: temporary adjusted risk score for respondent $i$, to be used in regression

Parameters

$E$: threshold on absolute percentage error (falling below this value will terminate the algorithm)
$\alpha$: threshold for type-1 error (probability of rejecting a hypothesis when the hypothesis is in fact true)
$M$: a very large number
$\mathbf{B}$: transformation matrix for converting the ordinal input value matrix $\mathbf{O}$ into the numerical (nominal) value matrix $\mathbf{A}$

Outputs

$z_j$: whether attribute $j$ is to be included in computing the risk score; $z_j\in\{0,1\}$
$w_j$: weight for attribute $j$; $w_j\ge 0$
$\beta_{0j}$: intercept value for attribute $j$
$\beta_{1j}$: slope value for attribute $j$
$\varGamma_j$: sign multiplier for attribute $j$; $\varGamma_j\in\{-1,1\}$
$x_i$: risk score for respondent $i$

Functions

$f(v,n):(\mathcal{V},\{2,\ldots,V\})\to[0,3]$: mapping function for an attribute with $n$ possible values, which transforms the ordinal value $v$ collected for that attribute to a nominal value $b_{v,n-1}$.

$$f (v,n )=b_{v,n-1}$$

where, for V=5,

$$\mathbf{B}=[b_{vn}]_{V\times(V-1)}=\left [\begin{array}{c@{\quad}c@{\quad}c@{\quad}c}0.00 & 0.00 & 0.00 & 0.00 \\3.00 & 1.50 & 1.00 & 0.75 \\\cdot & 3.00 & 2.00 & 1.50 \\\cdot & \cdot & 3.00 & 2.25 \\\cdot & \cdot & \cdot & 3.00\end{array}\right ]$$
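As a reading aid, the entries of $\mathbf{B}$ shown above are consistent with an evenly spaced scale on $[0,3]$: an attribute with $n$ possible values has its $v$-th value placed at the fraction $(v-1)/(n-1)$ of the range. This closed form is inferred from the tabulated entries and is not stated in the source:

$$b_{v,n-1}=f(v,n)=\frac{3\,(v-1)}{n-1},\qquad v=1,\ldots,n,\quad n=2,\ldots,5$$

For example, $f(2,5)=3\cdot 1/4=0.75$ and $f(3,4)=3\cdot 2/3=2.00$, which match the second and third rows of $\mathbf{B}$. The dotted entries correspond to $v>n$, which cannot occur.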

$\mathit{regression}(\mathbf{y},\mathbf{a}')$

solve regression model $\mathbf{y}={\beta}_{0}+{\beta }_{1}\mathbf{a}'+\varepsilon$ for vectors $\mathbf{y}$ and $\mathbf{a}'$

return $(p,\beta_0,\beta_1)$, where $p$ is the p-value for the regression model
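The source does not spell out how the regression is fitted. Assuming ordinary least squares with a single regressor, the returned quantities would take the standard form below, where $\bar{y}$ and $\bar{a}'$ denote the sample means over the $I$ respondents:

$$\beta_1=\frac{\sum_i (a'_i-\bar{a}')(y_i-\bar{y})}{\sum_i (a'_i-\bar{a}')^2},\qquad \beta_0=\bar{y}-\beta_1\,\bar{a}'$$

$$t=\frac{\beta_1}{\sqrt{\dfrac{\sum_i (y_i-\beta_0-\beta_1 a'_i)^2}{(I-2)\sum_i (a'_i-\bar{a}')^2}}}$$

Under this assumption, $p$ is the two-sided tail probability of $t$ under a Student $t$ distribution with $I-2$ degrees of freedom. With one regressor, this coincides with the p-value of the overall model $F$-test.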

$\mathit{preprocess}()$

// transform ordinal attribute values to nominal values

$a_{ij}=f(o_{ij},m_{j});\ \forall (i,j)\in\mathcal{I}\times\mathcal{J}$
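The pre-processing step can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the closed form used for `f` is inferred from the entries of $\mathbf{B}$, the helper `letters_to_ordinals` is a hypothetical convenience, and the sample answers are made up (the two columns stand for Q34 with five options and Q42 with four options).

```python
def f(v: int, n: int) -> float:
    """Map ordinal value v (1-based) of an n-valued attribute into [0, 3].

    ASSUMPTION: evenly spaced closed form inferred from the matrix B.
    """
    if not 2 <= n <= 5 or not 1 <= v <= n:
        raise ValueError(f"invalid ordinal value v={v} for n={n}")
    return 3.0 * (v - 1) / (n - 1)


def preprocess(O, m):
    """Convert the ordinal matrix O into the numerical matrix A, column by column."""
    return [[f(o, m[j]) for j, o in enumerate(row)] for row in O]


def letters_to_ordinals(row):
    """Hypothetical helper: 'a' -> 1, 'b' -> 2, ..., 'e' -> 5."""
    return [ord(c) - ord("a") + 1 for c in row]


# Made-up answers of three respondents to Q34 (5 options) and Q42 (4 options).
answers = [["c", "d"], ["a", "a"], ["e", "b"]]
O = [letters_to_ordinals(r) for r in answers]
A = preprocess(O, m=[5, 4])
```

Because every attribute is rescaled to the same range, a four-option question such as Q42 and a five-option question such as Q34 contribute on a comparable scale before any weights are applied.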

Iteration-Related Notation

$k$: iteration count
$N$: number of attributes included in risk score computations at a given iteration
$W$: sum of weights for attributes
$\varepsilon_k$: absolute error at a given iteration $k$
$e_k$: absolute percentage error at a given iteration $k$
$\overline{e}_{k}$: average absolute percentage error at a given iteration $k$

$\mathit{ScoringAlgorithm}({\mathbf{O}},\ m_{j} )$

BEGIN

// perform pre-processing to transform ordinal data to nominal data

$\mathit{preprocess}()$

// initialization:

// initially, all attributes are included in scoring,

// with unit weight of 1 and sign multiplier of 1.

// all of the regression intercepts are 0.

$z_j=1$, $w_j=1$, $\varGamma_j=1$, $\beta_{0j}=0$; $\forall j\in\mathcal{J}$

$N=\sum_j z_j$

// begin with iteration count of 1

$k=1$

$\mathit{Begin\_Iteration}$

// standardize the weights, so that their sum W will equal N

$W=\sum_j w_j z_j$

$w_j\leftarrow (N w_j)/W$; $\forall j\in\mathcal{J}$

// compute the average of the intercepts

${\overline{\beta}}_{0\cdot}={(\sum_{j}{{\beta}_{0j}z_{j}})}/{N}$

// compute/update the risk scores at iteration k,

// which is composed of the average intercept value

// and the sum of weighted values for attributes

$x_{ik}=\overline{\beta}_{0\cdot}+\sum_{j}\varGamma_{j} w_{j} a_{ij}$; $\forall i\in\mathcal{I}$

// compute total absolute error

$\varepsilon_k=\sum_i |x_{ik}-x_{i,k-1}|$

// correction for the initial error values

if $k=1$ then

    $\varepsilon_0=\varepsilon_1$

// termination condition

if $\varepsilon_k=0$ then

    go to $\mathit{Iterations\_Completed}$

// compute absolute percentage error,

// and then its average over the last two iterations

${\overline{x}}_{\cdot k}={\sum_{i}{x_{ik}}}/{I}$

$e_{k}={100{\varepsilon}_{k}}/{{\overline{x}}_{\cdot k}}$

${\overline{e}}_{k}=(e_{k}+e_{k-1})/2$

// if the stopping criterion is satisfied, terminate the algorithm

if $\overline{e}_{k}<E$ then

    go to $\mathit{Iterations\_Completed}$

// otherwise, continue with the regression modeling for each attribute j,

// and then go to next iteration

$\forall j\in\mathcal{J}$

    // if the attribute is included in the risk score calculation

    if $z_j=1$ then

        // first remove the attribute value from the incumbent score

        // to eliminate its effect

        $y_{i}=x_{ik}-a_{ij};\ \forall i\in\mathcal{I}$

        // then define the vectors for the regression model of that attribute

        $\mathbf{y}=(y_{i})$; $\mathbf{a}'=(\varGamma_{j} a_{\cdot j})$

        $(p,\beta_{0},\beta_{1})=\mathit{regression}(\mathbf{y},\mathbf{a}')$

        // if the regression yields a high p value

        // that is greater than the type-1 error,

        // this means that attribute j does not contribute significantly

        // to the risk scores

        if $p>\alpha$ then

            // and the attribute should not be included in risk calculations

            $z_j=0$

        else

            // else it will be included (will just keep its default value)

            $z_j=1$

            // and weight for the attribute will be the slope value

            // obtained from the regression

            $w_j=\beta_1$

            // the sign of the slope is important;

            // if it is negative, this should be noted

            if $\beta_1<0$ then

                // record the sign change in the sign multiplier

                $\varGamma_j=-1$

            else

                $\varGamma_j=1$

// advance the iteration count and begin the next iteration

k++

go to $\mathit{Begin\_Iteration}$

$\mathit{Iterations\_Completed}$

$x_i=x_{ik}$

return $x_i$, $z_j$, $w_j$, $\varGamma_j$, $\beta_{0j}$

END
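The listing above can be approximated in Python as follows. This is a rough sketch under several assumptions that go beyond the listing and should be checked against the chapter itself:

- The p-value uses a normal approximation to the slope t-test.
- The scores before the first iteration are taken as zero.
- Excluded attributes (z_j = 0) are dropped from the score sum, and N is recomputed in every iteration.
- The regression intercept is stored in β_0j, which the listing returns but never assigns.
- The weight is taken as |β_1| so that w_j ≥ 0 holds, with the sign multiplier flipped when the slope is negative.
- `max_iter` is a hypothetical safety cap.

```python
from statistics import NormalDist, fmean


def f(v, n):
    # ASSUMPTION: evenly spaced mapping into [0, 3], inferred from matrix B.
    return 3.0 * (v - 1) / (n - 1)


def regression(y, a):
    """Simple least-squares fit y = b0 + b1*a; returns (p, b0, b1)."""
    n, a_bar, y_bar = len(y), fmean(a), fmean(y)
    sxx = sum((ai - a_bar) ** 2 for ai in a)
    if sxx == 0:  # a constant attribute carries no information
        return 1.0, y_bar, 0.0
    b1 = sum((ai - a_bar) * (yi - y_bar) for ai, yi in zip(a, y)) / sxx
    b0 = y_bar - b1 * a_bar
    sse = sum((yi - b0 - b1 * ai) ** 2 for ai, yi in zip(a, y))
    if sse == 0:  # perfect fit, or y is constant
        return (0.0 if b1 != 0 else 1.0), b0, b1
    t = b1 / (sse / (n - 2) / sxx) ** 0.5
    # ASSUMPTION: normal approximation to the t distribution (large samples).
    return 2.0 * (1.0 - NormalDist().cdf(abs(t))), b0, b1


def scoring_algorithm(O, m, E=1.0, alpha=0.05, max_iter=50):
    I, J = len(O), len(m)
    A = [[f(O[i][j], m[j]) for j in range(J)] for i in range(I)]  # preprocess()
    z, w, gamma, beta0 = [1] * J, [1.0] * J, [1] * J, [0.0] * J
    x = x_prev = [0.0] * I  # ASSUMPTION: scores before iteration 1 are zero
    e_prev = None
    for _ in range(max_iter):  # max_iter is a hypothetical safety cap
        N = sum(z)
        W = sum(w[j] * z[j] for j in range(J))
        if N == 0 or W == 0:
            break
        w = [N * wj / W for wj in w]  # standardize weights so they sum to N
        b0_bar = sum(beta0[j] * z[j] for j in range(J)) / N
        # ASSUMPTION: excluded attributes (z_j = 0) do not enter the score.
        x = [b0_bar + sum(z[j] * gamma[j] * w[j] * A[i][j] for j in range(J))
             for i in range(I)]
        eps = sum(abs(xi - xp) for xi, xp in zip(x, x_prev))
        x_bar = fmean(x)
        if eps == 0 or x_bar == 0:
            break
        e_k = 100.0 * eps / abs(x_bar)  # abs() guards a negative mean score
        e_avg = e_k if e_prev is None else (e_k + e_prev) / 2.0
        if e_avg < E:
            break
        for j in range(J):
            if z[j] == 1:
                y = [x[i] - A[i][j] for i in range(I)]
                a = [gamma[j] * A[i][j] for i in range(I)]
                p, b0, b1 = regression(y, a)
                if p > alpha:
                    z[j] = 0  # attribute is not significant; drop it
                else:
                    # ASSUMPTION: |slope| as weight, intercept stored in beta0.
                    w[j], beta0[j] = abs(b1), b0
                    if b1 < 0:
                        gamma[j] = -gamma[j]
        x_prev, e_prev = x, e_k
    return x, z, w, gamma, beta0
```

A call such as `scoring_algorithm(O, m=[5, 5])`, with `O` holding 1-based ordinal answers, returns the scores together with the inclusion flags, weights, sign multipliers, and intercepts. The simple regression needs at least three respondents, and with very small samples the normal approximation would overstate significance.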

About this chapter

Cite this chapter

Ertek, G., Kaya, M., Kefeli, C., Onur, Ö., Uzer, K. (2012). Scoring and Predicting Risk Preferences. In: Cao, L., Yu, P. (eds) Behavior Computing. Springer, London. https://doi.org/10.1007/978-1-4471-2969-1_9

DOI: https://doi.org/10.1007/978-1-4471-2969-1_9
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2968-4
Online ISBN: 978-1-4471-2969-1
eBook Packages: Computer Science, Computer Science (R0)
