Transferring recommendations through privacy user models across domains

Raber, Frederic; Krüger, Antonio

doi:10.1007/s11257-021-09307-6

Open access
Published: 08 November 2021

Volume 32, pages 25–90, (2022)

User Modeling and User-Adapted Interaction

Abstract

Although privacy settings are important not only for data privacy, but also to prevent hacking attacks like social engineering that depend on leaked private data, most users do not care about them. Research has tried to help users in setting their privacy settings by using some settings that have already been adapted by the user or individual factors like personality to predict the remaining settings. But in some cases, neither is available. However, the user might have already configured privacy settings in another domain; for example, she may have already adapted the privacy settings on her smartphone, but not on her social network account. In this article, we investigate with the example of four domains (social network posts, location sharing, smartphone app permission settings and data of an intelligent retail store) whether and how precisely the privacy settings of a domain can be predicted across domains. We performed an exploratory study to examine which privacy settings of the aforementioned domains could be useful, and validated our findings in a validation study. Our results indicate that such an approach works with a prediction precision about 15%–20% better than random and a prediction without input coefficients. We identified clusters of domains that allow model transfer between their members, and discuss which kind of privacy settings (general or context-based) leads to a better prediction accuracy. Based on the results, we would like to conduct user studies to find out whether the prediction precision is perceived by users as a significant improvement over a “one-size-fits-all” solution, where every user is given the same privacy settings.

1 Introduction

In the past, privacy settings were often neglected by users, especially when they perceive the shared data as non-harmful, like their posts in social networks (Majeski et al. 2011). Although users have become increasingly engaged in privacy settings, there is still some space for improvement (Dey et al. 2012; Stutzman et al. 2013). Studies have shown that even data inside a social network can be used for various attacks like stalking, identity theft, social engineering attacks, or face re-identification (Gross and Acquisti 2005). Users prefer more restrictive privacy policies; however, the default settings in social network sites are very permissive, significantly more permissive than desired by users (Watson et al. 2015). Social network sites and location services tried to tackle the problem by offering automatically generated friend lists, which allow an easy audience selection for new content to be published. However, studies have shown that only 17% of all posts are published using at least one of these automatic friend lists (Mondal et al. 2014), either because the additional effort is perceived as too high, or because the friend lists are not perceived as suitable for selecting the post audience (Mondal et al. 2014). Manually created friend lists typically lead to better results; unfortunately, they need even more user interaction and are thus not often used either (Paul et al. 2011). As a consequence, users have to pay attention when, for example, a photograph of them is taken which could be uploaded in social media, and apply workarounds to prevent being photographed and published on a website without their consent (Rashidi et al. 2018).

Research has tried to support users by automating the generation of privacy settings using machine learning, for example, by using privacy settings from earlier posts as an input or by asking the user for feedback on privacy settings (Sinha et al. 2013; Fang and LeFevre 2010; Lugano and Saariluoma 2007). Also, context factors like the occasion when sharing a location, or the recipients of a post or location, have been found to have an influence on privacy settings (Benisch 2011; Consolvo et al. 2005; Patil et al. 2012) and are therefore taken into account for the prediction. Other approaches take the personality of the user and her general privacy preferences according to the IUIPC privacy questionnaire into account for deducing the privacy settings (Raber et al. 2017; Raber and Krüger 2018).

However, all those approaches need either domain knowledge (for example, privacy settings from earlier posts) from the domain where the privacy settings have to be predicted, or personality and privacy profiles of the user. In some cases, especially in the critical moment when a user has just created a new account and is overwhelmed by the privacy settings that she should adjust, none of this data is available; this is known as the cold start problem (Schein et al. 2002). However, most users have already used other systems where privacy settings had to be selected, and these settings can serve as an input for the prediction, an approach known as cross-domain user modeling. Other models use ontologies to describe a user model, which allows recommending the degree of information shown to the user based on the current user state and a reasoner (Heckmann 2006). Cross-domain recommender systems or ontology-based systems that are tailored especially towards privacy settings have so far, to the best of our knowledge, not been part of research. In this article, we took four domains as an example, to find out whether and how well the privacy settings of a domain can be predicted using the privacy settings from one or several other domains.

As we have shown in earlier work, privacy decisions are not ultimately binary (Raber et al. 2017; Raber and Krüger 2018). In social network posts, a user might not just hide or show the complete post, but might want to take a middle road and hide only the post image or comments, while still sharing the post text with a certain friend group. Furthermore, different domains have different privacy options to set, which makes it hard to directly compare the privacy settings per se. We therefore use privacy levels in our prediction; these can be resolved to concrete privacy settings after a prediction, depending on the domain (see Sect. 3). In our work, we discuss two different granularities of privacy levels. First, mean domain privacy levels describe an average over all privacy levels of a user in a specific domain, independent of context factors. There is exactly one mean domain privacy level per domain. In contrast, there are multiple context-based privacy levels for a domain, one for each combination of context factor instances; for example, there is one context-based privacy level for the post topic “family affairs” (context factor 1) in combination with the recipient group “school friends” (context factor 2). Currently, social media or location sharing services offer their users a “one size fits all” solution, where everything is set to a specific default value at the beginning. Using the user’s mean domain privacy level, the service could already tailor all privacy settings to be more restrictive if the user has a high mean domain privacy level for that domain, or use a looser set of default privacy settings if the mean domain privacy level is low. Using the context-based privacy levels, one could tailor the privacy settings even better to the user, by providing different privacy settings for different contexts, for example, when a new post about “family affairs” has to be shared with “school friends”. Privacy recommenders are often based on the same general privacy attitudes, for example, the IUIPC questionnaire (Malhotra et al. 2004). We therefore speculate that mean domain privacy levels and context-based privacy levels, although being specific to the domain, all contain a specific privacy attitude which is unique to the user and can therefore be used to transfer desired privacy settings to other domains.
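
The two granularities of privacy levels can be illustrated with a small data structure. The following is only a sketch under stated assumptions: the numeric scale (0 to 4, higher meaning more restrictive), the example topics and recipient groups beyond those named in the text, and the function names are hypothetical and are not taken from the studies.

```python
from statistics import mean

# Hypothetical context-based privacy levels of one user in the social network
# domain. Key: (post topic, recipient group), i.e. one instance per context
# factor. Value: privacy level on an ASSUMED 0..4 scale (higher = more restrictive).
context_levels = {
    ("family affairs", "school friends"): 3,
    ("family affairs", "family"): 0,
    ("hobbies", "school friends"): 1,
    ("hobbies", "family"): 2,
}


def mean_domain_privacy_level(levels):
    """Collapse all context-based levels of a domain into a single average."""
    return mean(levels.values())


def level_for_context(levels, context):
    """Use the context-based level if one exists, else fall back to the mean."""
    return levels.get(context, mean_domain_privacy_level(levels))


print(mean_domain_privacy_level(context_levels))
print(level_for_context(context_levels, ("family affairs", "school friends")))
```

There is exactly one mean value per domain, whereas the dictionary holds one entry per combination of context factor instances; the fallback in `level_for_context` is an illustrative design choice, not a procedure described in the article.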

In this article, we build upon our previous work on deriving privacy settings using context factors and individual factors (such as user personality or privacy attitude). Based on the user studies presented in this article, we compare the four domains of social networks, location sharing, intelligent retail data, and mobile app permissions regarding how well the privacy levels for each of those domains can be predicted using the privacy levels from the other domains, and to what extent the usage of context-based privacy levels, using different values for each context factor, plays a role. In contrast to previous work, we do not use solely context or individual factors as a source for the prediction. Instead, we predict the privacy settings of a domain using the privacy settings from another of the aforementioned domains. We investigate which of the two mentioned granularities, mean domain privacy levels or context-based privacy levels, works best for predicting the mean domain privacy levels and the context-based privacy levels; which domains and, inside a domain, which of the privacy levels should be used for the prediction; and how precise a prediction can be. For this purpose, we trained a cross-domain recommender system with a small training set, which can be seen as a lower bound for the precision that could be achieved using a large data set. According to our results, it is possible to use privacy levels from other domains for a prediction, but interestingly, the results also show that using more fine-grained privacy levels, i.e., more data, does not always lead to a better prediction. The approaches allow a prediction about 15–20% better than random, which looks small at first glance. However, a traditional within-domain prediction, which forms an upper bound for the precision, only achieves a precision of 20–25% better than random.
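
The basic idea of predicting one domain's privacy level from another domain's can be sketched as a simple regression. The example below is a minimal illustration, assuming ordinary least squares with a single predictor (the mean domain privacy level of a source domain) and invented training values; the model, the data, and the names `fit_line` and `predict` are assumptions for illustration and do not reproduce the recommender system described in this article.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new source-domain level."""
    slope, intercept = model
    return slope * x + intercept


# INVENTED training data: mean domain privacy levels of four users in a
# source domain (e.g. smartphone app permissions) and a target domain
# (e.g. social network posts).
source_levels = [0.0, 1.0, 2.0, 3.0]
target_levels = [0.5, 1.0, 1.5, 2.0]

model = fit_line(source_levels, target_levels)

# A new user has settings only in the source domain (cold start in the target).
print(predict(model, 4.0))
```

In such a setup, the predicted level would still have to be resolved to concrete privacy settings of the target domain.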

To conclude, the article has two main contributions: To the best of our knowledge, we discuss the first approach on cross-domain privacy recommendations on four exemplary domains. Furthermore, we compare four different techniques for the recommendation, involving either an average privacy level for a domain, or a multivariate set of context-based privacy levels for a domain.

In the next sections, we will first discuss related work, as well as our own previous work on predicting privacy levels using a user’s personality and privacy attitude, before we present the user studies and results that we conducted in order to answer the aforementioned questions.

2 Related work

First approaches supporting users in choosing their privacy settings used existing privacy settings, for example, from social networks, to predict the remaining privacy settings on Facebook using machine learning. Sinha et al. used Latent Dirichlet Allocation (LDA) and Maximum Entropy to propose privacy settings for a user’s friend groups on Facebook (Sinha et al. 2013). Other approaches rely on user-generated input for the prediction of privacy settings, for which the user has to label some social network friends with privacy privileges. These partially complete settings are then used as an input to generate the privacy privileges for the remaining users (Fang and LeFevre 2010). Although such a supervised method in general produces more accurate results (Barnes 2006), it comes with an increased user burden. This is especially crucial, as most users need a trigger rather than an additional user burden, like social triggers where they interact with or observe other users, in order to become active (Das et al. 2019). Studies have shown that fully automated privacy recommenders are preferred to those that need user interaction (Zou et al. 2020) and that users also tend to abandon privacy recommendations if they are perceived as low-value or inconvenient (Zou et al. 2020). However, an automated prediction of privacy settings always has to be accompanied by salient privacy notices, especially about risky privacy practices (Ebert et al. 2021), or by a privacy user interface that gives the user a quick overview of the current privacy state (Christin et al. 2013) and that allows the user to review and adapt the privacy or permission settings (Tsai et al. 2017). In the mobile phone domain, shoulder surfing is also a privacy issue, which can be moderated by informing the user about such attacks (Zhou et al. 2016).

Some approaches even rely on a questionnaire that has to be filled out before the prediction can be used (Lugano and Saariluoma 2007). As a combination of both, there also exist semi-supervised methods that use existing social network profile information of the user and graph properties together with active learning methods to reduce the user burden (Shehab and Touati 2012). Using crowdsourcing for gathering training data for the prediction has also been discussed to find an effective tradeoff between usability and data privacy (Ismail et al. 2015). Privacy recommenders using the permission type, app name and the current state of a smartphone app (i.e., which app is currently in the foreground and how visible the app sending the permission request is) can reduce the amount of unwanted disclosures by about 75% compared to the standard permission dialogue on Android (Wijesekera et al. 2018). Data retention, i.e., obfuscating or deleting data after a certain time span defined by the user, is also a privacy technique which is appreciated by users (Ebada Mohamed and Chiasson 2018), especially if they have the possibility to actively send data into retirement (Murillo et al. 2018).

Research also proposes the use of privacy stereotypes, which should simplify the process of choosing privacy settings: In the location sharing domain, most of the time, locations can be shared with only a few different sets of location sharing settings (Ravichandran et al. 2009). Only three privacy stereotypes match the user’s privacy settings with an accuracy of 90% at any given time (Ravichandran et al. 2009). Another recent publication showed that it is possible to cluster users into five groups using a questionnaire, which allows assigning each of them a privacy policy that is tailored to their respective privacy needs (Lynn Dupree et al. 2016). This can also be done for the fitness domain using recommender systems and machine learning (Sanchez et al. 2020). Privacy decisions can be predicted by what the authors call cognitive heuristics (Shyam Sundar et al. 2020), which are shortcuts that allow fast decision-making for the user. If, for example, the website provider is a popular name, brand or organization, users often assume that the website provider can guarantee the security of the website and are therefore more willing to disclose private data. Lying about private questions online is also a widespread privacy protection behavior which can be predicted using the results from a privacy questionnaire (Sannon et al. 2018).

Research on privacy stereotypes has shown that users often follow a multi-dimensional approach (Knijnenburg et al. 2013; Wisniewski et al. 2017) when deciding which data to share. This makes it possible to distinguish users into user stereotypes based on their sharing behavior (Knijnenburg et al. 2013) or the used privacy strategy (Wisniewski et al. 2017). For example, one stereotype may be fine sharing location-related but not interest-related items, whereas another group may behave exactly the other way around. Which stereotype a user belongs to can be decided based, for example, on personality or demographic factors (Knijnenburg et al. 2013). In our opinion, such individual and context factors like the data type or the personality of the user are among many (context) factors that play a role in the user’s decision. In Sect. 3, we will point out some exemplary context factors which are present in different domains according to recent research, and which we will investigate in our work. However, which context factors influence the users’ privacy decisions is an active research field; we therefore cannot use an exhaustive list of context factors for our study. In our work, the context-based privacy levels form a multidimensional table of privacy decisions, accounting for the multidimensionality of privacy decisions in the aforementioned publications. We discuss both the recommendation of such multidimensional privacy levels (mean-based context-aware regression analysis (MCR) and context-factor-based context-aware regression analysis (CCR) method) as well as the usage of multidimensional privacy levels as a source for the prediction (context-factor-based generic regression analysis (CGR) and context-factor-based context-aware regression analysis (CCR) method).
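
One way to read the four method names is as a two-by-two grid: the granularity of the privacy levels used as a source, and the granularity of the privacy levels being predicted. The sketch below encodes that reading; the field names, the labels `"mean"` and `"context"`, and the helper `methods_for` are hypothetical, and the mapping itself is an interpretation of the method names rather than a definition given in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    source: str  # granularity of the input privacy levels
    target: str  # granularity of the predicted privacy levels


# Assumed reading: "mean-based"/"context-factor-based" names the source,
# "generic"/"context-aware" names the target granularity.
METHODS = {
    "MGR": Method("mean-based generic regression", "mean", "mean"),
    "CGR": Method("context-factor-based generic regression", "context", "mean"),
    "MCR": Method("mean-based context-aware regression", "mean", "context"),
    "CCR": Method("context-factor-based context-aware regression", "context", "context"),
}


def methods_for(source=None, target=None):
    """Return the method abbreviations matching the given granularities."""
    return sorted(
        key
        for key, method in METHODS.items()
        if (source is None or method.source == source)
        and (target is None or method.target == target)
    )


print(methods_for(target="context"))  # methods that recommend context-based levels
print(methods_for(source="context"))  # methods that use context-based levels as a source
```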

The aforementioned approaches all predict privacy settings for a single domain, and are therefore also called single-domain recommender systems. If no or only sparse information is given about the user in the domain for which the recommendation has to be performed, those approaches fail. Cross-domain recommender systems use user models from several domains in order to derive recommendations even if the data about the user is insufficient in the recommendation domain. Although single-domain recommender systems should be preferred, if available, due to their higher prediction precision (Sahebi and Brusilovsky 2013), cross-domain recommender systems have the advantage that they are able to predict settings for more than one domain, leading to an increased user engagement and satisfaction (Adomavicius and Tuzhilin 2005).

Other approaches do not try to transfer the user model from one domain to another, but rather collect the user models from several domains in a common format like an ontology, so that an ontology reasoner can be used to infer a user model based on the data (Heckmann 2006). An example of such an approach is the Ubiquitous User Model by Heckmann (2006), consisting of two parts: first, a general user model (GUMO) containing general information about the user like demographic data, personality and characteristics, emotions, etc., together with domain-specific interests (like favorite movies or books); and second, the SituationReports, which describe the current situation the user is in, according to sensor data from the ubiquitous environment, for example, whether she is stressed according to the heart rate monitor, or whether she is in a hurry according to video cameras detecting the walking speed of the person. Based on these two concepts, the Ubiquitous User Model can give recommendations for a user interface, for example, that the navigation at the airport should be simplified if the user is in a hurry (Heckmann 2006).

Apart from approaches to predict privacy settings, researchers also found that context factors play a significant role in users’ privacy decisions (Ebert et al. 2020), and identified different context factors that are important for the choice of privacy settings in different domains. In location sharing, several studies found that the person requesting the location is one of the main context factors (Benisch 2011; Consolvo et al. 2005). Some also state that the time and day of the week as well as the location play a role (Benisch 2011). Whether the person is in a relationship, and in which stage of the relationship, also plays a significant role in the sharing behavior with the user’s partner (Young Park et al. 2018). Later studies reviewed these results and found that it is not the time and day that is the appropriate context factor (Patil et al. 2012), but besides the requestor, the user’s occasion or activity is the second main context factor that is important for the decision whether to share the location or not (Consolvo et al. 2005). The granularity of the shared location also plays an important role (Patil et al. 2012), meaning that the option to share a coarse-grained location like “only the city name” or “only the country” should also be available. Similar context factors could also be found in the social media domain, where the topic of the post, as well as the receiving friend of the post, are important (Raber et al. 2017). However, studies also found indications that the topic of the post plays only a minor role. The age of the user also plays a role: Whereas younger users tend to decide based on trust, older users decide based on the perceived benefits of disclosing their data (Ghaiumy Anaraky et al. 2021). Another context factor found to be highly significant is whether users are paid for disclosing their data. Even if the negative consequences are clarified, people are likely to share their data when they are paid (Hutton et al. 2014). However, we do not want to support users being paid for disclosing more data than they desire, which led us to the decision to exclude this context factor from our research. In the mobile app domain, researchers achieved the best results when using the permission type and app id or category for the prediction (Liu et al. 2014, 2016), or a combination of app name, permission type, the foreground app, and the visibility of the app making the permission request (Wijesekera et al. 2018). Using a large database of about 4.8 million users, and those two context factors, a prediction accuracy of 64.28% to 87.8% is possible using machine learning (Liu et al. 2014). If user feedback is integrated, a similar semi-supervised approach was able to achieve an acceptance rate of 78.7% of the proposed settings. The domain of intelligent shopping data has been investigated in our previous research (Raber et al. 2018). In a study, we found the data type in question, as well as the requesting stakeholder (e.g., the retailer, third parties like marketing agencies, etc.), to have a significant influence on the privacy decisions.

All work presented here used different approaches to predict the privacy settings in a binary deny-or-allow fashion, using either existing privacy settings from the same domain, context factors that influence the privacy decision, or additional user input for their prediction. Other recommender systems instead use user behavioral data, for example, for developing user models for adaptive cybersecurity (Addae et al. 2019) or to infer a degree of diversity for recommender systems suitable to the user’s personality (Wu et al. 2018). In the past, we already did some research on how the personality and privacy desires (according to the IUIPC^{Footnote 1} scale) can be used together with context factors as an input to predict privacy levels for specifying the correct audience for a social network post (Raber et al. 2017), the detail level of a shared location (Raber and Krüger 2018), which data out of an intelligent retail store should be shared with whom (Raber et al. 2018) and to assist the user in adapting the permission settings for her smartphone apps (Raber and Krüger 2017). Another approach to inferring privacy settings might use self-reported privacy measures as a basis for the prediction, allowing a better inference of the user’s actual privacy behavior rather than their reported privacy desires (Faklaris et al. 2019). Other researchers already explored whether form-based profiles (e.g., personal profile data entered into forms) and tag-based profiles (e.g., tagged photographs) can be inferred between different social web providers like Twitter, Facebook and Tumblr (Fabian et al. 2013). So far, to the best of our knowledge, it is unknown whether the privacy settings of a domain can be predicted using the privacy settings from multiple other domains, especially when considering privacy levels instead of binary privacy decisions.

3 Background

In earlier publications, we concentrated on deriving the privacy levels using what we call context factors (like the topic of a post or the occasion when a location is shared) together with individual factors (like the personality or privacy attitude of the user) to infer privacy levels for four different domains, namely for posts in a social network, location sharing services, the data recorded inside an intelligent retail store like Amazon Go^{Footnote 2}, and the permissions of smartphone apps. Although individual factors are usually not available, both the big five personality traits and the IUIPC privacy measures can be derived using data from the social web (Farnadi et al. 2016; Raber and Krüger 2018) without adding any user burden. As stated in the introduction, the disclosure decision is not ultimately binary. Research has shown that when users express their privacy policy in a free-text form, they tend to have fine-grained privacy policies that also allow them to share an obfuscated position like the street or city center instead of the exact location (Patil et al. 2012). Recently, social network providers like Facebook have also started to offer sharing an obfuscated location, for example, only the city of the current location, allowing the users to share their position without disclosing too much information. Inspired by the aforementioned work, our earlier studies also offered multiple privacy levels wherever this is technically possible.

Within all our studies, people actually used the offered fine-grained privacy levels. In the location sharing domain, for example, the intermediate privacy level “city only” was used most frequently, even before the two binary options “exact location” and “no location”, indicating the user acceptance of this approach. Details on the actual frequency of use can be found in Raber et al. (2017), Raber and Krüger (2018).

For each of the four domains, we used a set of context factors, as well as the individual factors, as an input for the prediction of the privacy levels. Table 1 summarizes the investigated domains, the used context factors and the privacy levels. Note that the input for each recommendation always consists of the listed context factors and the individual measures (IUIPC questionnaire and big five personality inventory). For a detailed description of the studies and their outcomes, please refer to the following subsections.

Due to the privacy paradox (Barnes 2006), the user’s online behavior when setting privacy settings significantly differs from their actual privacy desire. Therefore, for all our studies, we investigate the user’s privacy desire, i.e., the desired privacy settings using questionnaires rather than investigating their actual privacy behavior.

In this section, we will discuss our published research that is of importance for this article. Research described in later sections is new and unpublished so far.

Table 1 Overview of the earlier work on recommending privacy levels using individual measures and context factors
