Introduction

Social Networking Sites (SNS) have rapidly become an important stage for social interaction and now influence people’s daily lives [1, 2]. Facebook, MySpace, and Twitter are notably successful examples [3, 4]. Facebook, with more than one billion users as of October 2012, is the most prevalent social network among the many Internet-based programs people use to communicate socially with others [5]. The main purpose of Facebook is to let users share their personal thoughts and stories in order to establish new relationships and preserve existing ones [6]. Facebook therefore makes connecting and communicating with others considerably easier than in the past. For this reason, the relationship between Facebook use and psychosocial outcomes has attracted researchers’ attention over time [7, 8].

Personality is one of the interesting characteristics that can be considered for adaptation purposes. In research, a person’s personality can be described as a set of characteristics that impose a tendency on the person’s behavior; this tendency is stable across time and situations [9]. Information about one’s personality gives hints about how he would react when encountering different situations. Detecting a user’s personality can help in identifying his potential needs on different occasions [10]. Therefore, adaptive applications may take advantage of models of users’ personality to adapt their behavior accordingly. There is a wide range of application areas, such as marketing, healthcare, and recommender systems, among others.

On the other hand, we believe that personality can be traced by investigating users’ interactions in Online Social Networks (OSNs). In this context, an essential consideration is whether virtual relationships and communication reflect the user’s personality in the real world, or offline life. For example, Barkhuus and Tashiro [11] and Cherubini et al. [12] state that Facebook, which is currently the most frequently used OSN, is apparently a good reflection of the user’s offline life. If a given user occasionally interacts with another user through the Internet, it does not mean that they have many more real-life interactions. Additionally, it has been shown that people use OSNs to maintain already existing relationships rather than to establish new ones (around 77% of Facebook users’ real-life social relationships are replicated in the virtual environment) [11, 12]. For example, these studies show that if a user has many connections in a social network such as Facebook, he regularly interacts with only a small percentage of them [13].

In this context, we developed a movie recommender, a Facebook application intended to acquire evidence about the user’s personality through a personality test, as well as to collect all the data available from the user’s interactions within the social network; as an incentive, the application shows users premiere Hollywood movies. This research attempts to discover guidelines for evaluating user personality on Facebook without requiring the user to complete explicit personality assessments. It uses machine learning methods to construct user personality classifiers. The classifiers were built from the data of 100 users collected through the application.

Most research on the relationship between user personality and user interactions in social networks focuses on investigating how single features correlate, on average, with personal properties. In this approach, having data on a given user’s interactions in social networks would make it possible to predict his personality, at least regarding some personality traits.

The main contributions of the research are as follows:

  • Developing a machine learning approach to predict the user personality through a personality test in the social network.

  • Presenting a recommender system for friend relationships in social groups.

  • Analyzing five kinds of personality features as their profile photos in the social groups.

The rest of this paper is organized as follows. The “Related work” section presents a brief literature review. The “Personality models and interaction data collection” section describes the personality models and the data collection in the Facebook application. The “Collected data description” section presents the data collected from Facebook users. The “Data analysis” section explains data preprocessing, and the “Building the personality classifier” section shows how the personality classifiers are built. The “Analysis of results” section provides the experimental results of personality prediction. Finally, the “Conclusion and future works” section presents conclusions and future research.

Related work

This section presents a brief literature review of related studies on personality analysis in social networks. For example, the personality trait called extroversion is positively correlated with both the size of an individual’s social network and the number of social interactions in which the individual is engaged (Asendorpf and Wilpers [14]).

In a study conducted in 2010 [15], the participants were asked to report on their regular use of Facebook and to complete the NEO-PI-R as well as the Coopersmith Self-Esteem Inventory [16]. The results show that extroverted people reported higher levels of both SNS use and Internet addiction.

In most of these research studies, especially the recent ones, information is collected by asking users to complete offline surveys. For example, Lampe et al. [17], Nosko et al. [18] and De Brabander and Boone [19] state that, while university students fill in most of the profile items (59%), a sample including university and non-university users completes only 25% of the information requested in profiles. More interestingly, they show that whether a user fills in his age and relationship status can help infer whether he keeps his profile public or private. Lampe et al. [17] reported an association between the number of groups and the amount of information in a user’s profile. Specifically, this correlation is strongest for reference data (place of birth, school, etc.), followed by contact data (relationship status, address, etc.) and lastly preference data (music, movies, books, etc.).

Another work [20] analyzes the relationship between users from a different perspective: it investigates who uses Facebook and the relationship between the Big Five, shyness, narcissism, loneliness, and Facebook usage. The results indicate that Facebook users tend to be more extroverted and narcissistic, but less conscientious and less socially lonely, than nonusers. Preferences for specific Facebook features were also shown to vary with certain characteristics, such as narcissism, loneliness, and shyness.

Another research study used personality data to examine the relationship between personality and various types of Twitter users, including popular and influential ones [21]. This study collected data from just 335 users who specified their Twitter accounts in their Facebook profiles.

However, the study presented some interesting results: users who are popular and influential are extroverted and emotionally stable (they score low on the Neuroticism trait); popular users are highly ‘imaginative’ (scoring high on Openness), while influential users tend to be ‘organized’ (scoring high on Conscientiousness).

A previously conducted study [22] indicates that extroverted individuals belong to more Facebook groups, but do not necessarily have more Facebook friends. It also finds that Neuroticism is not related to the posting of personality-revealing information, and that those who are low in Neuroticism tend to put photos on their Facebook profiles. While Ross et al.’s study relies on self-reports by participants, in a follow-up study Amichai-Hamburger and Vinitzky [23] showed that extroversion has a positive effect on the number of friends but is not related to the use of Facebook groups, and that individuals with a high level of Neuroticism show a greater tendency to put their photos on Facebook than individuals with low Neuroticism.

Also, Nie et al. [24] presented an approach for measuring personality from visual attributes among the user’s social features extracted from social media. Measuring personality from visual attributes involves three challenges: (1) feature selection, (2) feature fusion, and (3) feature absence. Addressing these challenges yields a novel approach for evaluating the personality distance between descriptive images in social media.

Huang et al. [25] analyzed the effect of personality characteristics on online social associations. Their research uses the five-factor personality theory to measure the collected personality data. In addition, Bleidorn and Hopwood [26] presented a literature review of machine learning techniques for personality evaluation in social networks. In this review, aspects such as data collection, data extraction, and data prediction are analyzed. Finally, Lo Coco et al. [27] presented a homogeneous classification of the personality characteristics of Facebook users. This classification examines associations between profiles of Facebook usage, relational characteristics, and personality characteristics in online social interactions.

Personality models and interaction data collection

As mentioned above, the intention of this investigation is to find an approach to recognize user personality without asking users to respond to a specific questionnaire. We therefore use a model of personality that describes its structure. A total of 117 volunteers from among Facebook users agreed to participate in this study. Their ages ranged from 18 to 50. After discarding incomplete data, the profile information of 100 users formed the final dataset.

The traditional way of modeling personality structure is through factors. Three of the best-known models of personality structure are the Eysenck three-factor model (known as the P.E.N. model, standing for Psychoticism, Extroversion, Neuroticism) [28], the Big Five model [29] and the Alternative Five [30]. There is no consensus about which model describes personality best. Nevertheless, it is generally accepted that their items or traits frequently correspond to one another; all three provide information about people’s reactions to different situations, and they help decide which academic procedure is better suited to different personalities.

In this work, the Big Five factor model is chosen for measuring personality traits. It classifies the personality of users along five factors: Conscientiousness, Extraversion, Agreeableness, Neuroticism, and Openness to Experience. Highly extraverted individuals are self-assured and warm, rather than calm and cautious. Agreeable individuals are cooperative and courteous. Conscientious individuals are organized and precise. Neurotic individuals tend not to be emotionally resilient. Lastly, highly open individuals are receptive and prefer innovation to routine. The Big Five can account for as much of the variation in individuals’ personalities as possible using a small set of trait dimensions.

To evaluate the relationship between user behavior in social networks and personality, a classification technique is needed; and to classify personality along the five factors explained above, it is necessary to collect interaction data from users’ profiles. To this end, profile data were collected through a Facebook application programmed using the Facebook API. The application was distributed among our contacts, including classmates, friends, and coworkers, and their friends. Participants were directed to our application link, where the aim of the research was explained. Before the application could run, it asked the user for permission to access information in their profile, such as the number of friends or posts. The application also included the NEO-FF-R-60 questionnaire, which participants had to answer. The application collected the data available on each profile up to the moment it was run and stored them in a database. The data collected for each user and used to build the classifiers are: Likes, Favorites, Language, Book, Job, Education, Sport, Activity, Game, Group, Cinema and movies, Music, Subscriber, Friends, Interests and hobbies, Links, TV shows, Question, Post, number of text-only posts, number of photos in the timeline, number of photos without text, and news feed shown in the timeline, as listed in Table 1. To encourage users to take part in our research, we promised to email them their personality test results. These data were used as the input variables for classification in RapidMiner.

Table 1 The components of Facebook used in research

Modeling techniques were provided by the RapidMiner toolbox. Attributes such as age and gender were excluded because including them did not improve classifier accuracy.

Collected data description

Among the users who participated in the survey, after removing incomplete data, just 100 responded reliably and correctly to the personality inventory and gave permission to access their profile. Attributes such as age and gender are reported only as general information and were not otherwise used (see Table 2).

Table 2 Distribution by sex

Data analysis

Today’s real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their enormous size and their origin in multiple heterogeneous sources [31]. Low-quality data leads to low-quality outcomes. There are numerous methods for preprocessing data. Data preprocessing can be used to eliminate outliers and noise and to resolve inconsistencies [32].

To handle missing values, the following mechanism is applied: each missing value is filled with a plausible value, namely the attribute mean over all samples belonging to the same class as the given tuple.
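The class-wise mean imputation described above can be sketched as follows. This is a minimal illustration and not the RapidMiner configuration used in the study: the function name `impute_class_mean`, the use of `None` to mark a missing value, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict

def impute_class_mean(rows, labels):
    """Fill None entries with the attribute mean of samples sharing the same class."""
    n_attrs = len(rows[0])
    sums = defaultdict(lambda: [0.0] * n_attrs)
    counts = defaultdict(lambda: [0] * n_attrs)
    # First pass: accumulate per-class sums and counts of the observed values.
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not None:
                sums[label][j] += value
                counts[label][j] += 1
    # Second pass: replace each missing value with its class-wise attribute mean.
    filled = []
    for row, label in zip(rows, labels):
        new_row = []
        for j, value in enumerate(row):
            if value is None:
                if counts[label][j] == 0:
                    raise ValueError(f"no observed value for attribute {j} in class {label!r}")
                value = sums[label][j] / counts[label][j]
            new_row.append(value)
        filled.append(new_row)
    return filled

# Hypothetical profile attributes: [number of friends, number of posts]
rows = [[10, 200], [None, 220], [30, None], [50, 400]]
labels = ["high", "high", "high", "low"]
filled = impute_class_mean(rows, labels)
print(filled)
```

In this toy input, the missing friend count of the second user is replaced by the mean of the other "high" users (20.0), and the missing post count of the third user by 210.0; the "low" user is left unchanged.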

Because the dataset is small, inspecting it allowed us to find values that lay far from the average, i.e., noise and outliers. These records were ignored (removed), because they might influence the accuracy of the model and lead to inaccurate and unrealistic results.
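One simple way to make "far from the average" concrete is a distance-from-the-mean rule. The sketch below is only an assumption about how such a filter could look: the text does not state the rule that was applied, and the threshold `k` (in standard deviations), the function name `drop_outliers`, and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def drop_outliers(rows, k=2.0):
    """Drop rows in which any attribute lies more than k standard deviations from its mean."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    spreads = [pstdev(col) for col in columns]
    kept = []
    for row in rows:
        is_outlier = any(
            spread > 0 and abs(value - center) > k * spread
            for value, center, spread in zip(row, centers, spreads)
        )
        if not is_outlier:
            kept.append(row)
    return kept

# Hypothetical [friends, posts] rows; the last user has an extreme friend count.
rows = [
    [120, 40], [150, 60], [130, 50], [140, 45], [135, 55],
    [145, 52], [125, 48], [155, 58], [138, 42], [5000, 50],
]
kept = drop_outliers(rows)
print(len(rows), "->", len(kept))
```

With very small samples a single extreme value inflates the standard deviation, so a fairly low threshold such as `k=2.0` is needed for the rule to trigger at all; this is one reason manual inspection, as described above, is a reasonable choice for a dataset of this size.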

Building the personality classifier

As stated above, most of the related works in the psychology field try to find correlations between the personality of users and their interactions in social networks through statistical approaches. In contrast, our study focuses on finding a criterion for predicting users’ personality without asking them to complete a personality inventory.

Previously, different machine learning algorithms were used to build classifiers of user personality. Techniques such as Naive Bayes [33], decision trees [34], neural networks [35], and support vector machines (SVM) [36] were used to analyze the dataset [37, 38]. In this research, we applied some techniques for boosting classification accuracy, focusing on ensemble methods. An ensemble is a composite model made up of a set of classifiers. The individual classifiers vote, and the ensemble returns a class label prediction based on the collected votes. Ensembles tend to be more accurate than their component classifiers. AdaBoost [39], one of the best-known ensemble methods, was used in the present study.
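To illustrate the weighted voting that boosting performs, the following is a minimal AdaBoost sketch with one-feature threshold rules (decision stumps) as weak learners. It is not the RapidMiner operator used in the study; the function names, the choice of stumps, the ±1 label encoding, and the one-dimensional toy data are all assumptions made for the example.

```python
import math

def train_stump(X, y, w):
    """Return (weighted error, feature index, threshold, sign) of the best threshold rule."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] <= threshold else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, threshold, sign)
    return best

def stump_predict(stump, row):
    _, j, threshold, sign = stump
    return sign if row[j] <= threshold else -sign

def adaboost_fit(X, y, rounds=10):
    """Train up to `rounds` stumps, re-weighting misclassified samples after each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:              # weak learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps; returns +1 or -1."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data: one hypothetical feature, labels +1 ("high") and -1 ("low").
X = [[x] for x in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
single = train_stump(X, y, [0.1] * 10)
single_errors = sum(stump_predict(single, row) != yi for row, yi in zip(X, y))
ensemble = adaboost_fit(X, y, rounds=3)
predictions = [adaboost_predict(ensemble, row) for row in X]
ensemble_errors = sum(p != yi for p, yi in zip(predictions, y))
print("best single stump errors:", single_errors)
print("3-round ensemble errors:", ensemble_errors)
```

On this toy pattern no single threshold rule can separate the classes, while the weighted vote of three stumps does, which is the sense in which an ensemble can be more accurate than its components.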

All models were evaluated with split validation: 70% of the data was used to train the model and 30% to test it. By comparing the F-measure and accuracy of the classifiers obtained with these techniques, one classifier was selected as the proposed model for each personality factor.
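A 70/30 split validation of this kind can be sketched as below. The shuffling step, the fixed seed, and the function name `split_validation` are assumptions for illustration; the sampling options actually set in RapidMiner are not stated here.

```python
import random

def split_validation(samples, train_ratio=0.7, seed=42):
    """Shuffle a copy of the samples and split it into training and test parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# 100 user identifiers stand in for the 100 collected profiles.
users = list(range(100))
train, test = split_validation(users)
print(len(train), len(test))
```

With 100 users this yields 70 training and 30 test instances, so each test-set misclassification changes the accuracy by about 3.33 percentage points.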

Data about user profiles and the personality inventory were used to train the personality classifiers. The first step was to define the kind of prediction the classifiers were expected to make.

All parameters that must be set in the RapidMiner software for modeling are shown in Table 3.

Table 3 Adjusting modeling parameters

Analysis of results

In this step, after running RapidMiner, the results were analyzed to find out which classifier is the most accurate for each personality trait. After setting the parameters for each target variable, we ran all eight classification techniques for each of the five factors on the same test data; therefore, a total of 40 models were run. The reason for this repetition was to find the most appropriate classifier for each personality factor, since a classifier that performs best on one factor may not perform best on another. For instance, if boosting-decision tree is selected as the appropriate classifier for extraversion, it cannot be taken for granted that it is also appropriate for the conscientiousness factor.
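The per-trait selection can be sketched as a simple ranking over (accuracy, F-measure) pairs. The values below are the ones quoted later in this section for openness and neuroticism; only two of the eight classifiers per trait are included, and the ranking rule (F-measure first, accuracy as tie-breaker) is an assumption, since the text only says that both measures were compared.

```python
# (accuracy %, F-measure %) per classifier, as quoted in the text of this section.
results = {
    "openness": {
        "neural network": (86.67, 66.67),
        "Naive Bayes": (60.00, 25.00),
    },
    "neuroticism": {
        "boosting-decision tree": (86.67, 77.78),
        "decision tree": (76.67, 36.36),
    },
}

def best_classifier(scores):
    """Pick the classifier with the highest F-measure, breaking ties by accuracy."""
    return max(scores, key=lambda name: (scores[name][1], scores[name][0]))

chosen = {trait: best_classifier(scores) for trait, scores in results.items()}
for trait, name in chosen.items():
    print(f"{trait}: {name}")
```

Because the selection is made separately for each trait, different traits can end up with different classifiers, as happens here for openness and neuroticism.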

The experimental results were evaluated using the standard measures of F-measure and accuracy, as reported in Tables 4, 5, 6, 7 and 8. The F-measure combines the two indexes of precision and recall, as defined in Eqs. (1) and (2).

$$\begin{aligned} precision &= \frac{TP}{TP + FP} \\ recall &= \frac{TP}{TP + FN} = \frac{TP}{P} \end{aligned} \tag{1}$$
$$F = \frac{2 \times precision \times recall}{precision + recall} \tag{2}$$
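As a worked instance of Eqs. (1) and (2), the sketch below computes precision, recall, and F-measure, together with accuracy in its usual form (TP + TN) / (TP + TN + FP + FN). The confusion-matrix counts are hypothetical values for a 30-user test set and are not taken from the tables of this paper.

```python
def precision_recall_f(tp, fp, fn):
    """Precision and recall as in Eq. (1) and F-measure as in Eq. (2)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

def accuracy(tp, tn, fp, fn):
    """Share of all test instances that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 25 true positives, 3 true negatives,
# 1 false positive, and 1 false negative (30 test users in total).
tp, tn, fp, fn = 25, 3, 1, 1
precision, recall, f_measure = precision_recall_f(tp, fp, fn)
acc = accuracy(tp, tn, fp, fn)
print(f"precision={precision:.4f} recall={recall:.4f} F={f_measure:.4f} accuracy={acc:.4f}")
```

With these counts the F-measure (about 0.96) is higher than the accuracy (about 0.93), which can happen when the positive class dominates the test set: the F-measure ignores true negatives, whereas accuracy counts every instance.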
Table 4 Accuracy and F-measure for the extraversion factor
Table 5 Accuracy and F-measure for the openness factor
Table 6 Accuracy and F-measure for the conscientiousness factor
Table 7 Accuracy and F-measure for the agreeableness factor
Table 8 Accuracy and F-measure for the neuroticism factor

In comparing the models, higher accuracy and F-measure are preferable. For example, as shown in Table 4, the boosting-decision tree, with an accuracy of 93.33% and an F-measure of 96.15%, is the best model for extraversion prediction, whereas the Boosting-Naïve Bayesian model, with an accuracy of 46.67% and an F-measure of 40%, is not recommended.

Given the interaction data from the profile of a certain user, the personality classifiers are able to predict which class the user belongs to for each of the five personality factors.

According to Table 4 and Fig. 1, boosting-decision tree, with an accuracy of 93.33% and an F-measure of 96.15%, could be selected as the proposed model to predict extraversion, whereas the Boosting-Naïve Bayesian model, with an accuracy of 67.46% and an F-measure of 40%, was not proposed as an appropriate model.

Fig. 1 Comparing the accuracy and F-measure of extraversion

According to Table 5 and Fig. 2, a neural network, with an accuracy of 86.67% and an F-measure of 66.67%, could be selected as the proposed model to predict openness, whereas the Naïve Bayesian model, with an accuracy of 60% and an F-measure of 25%, was not proposed as an appropriate model.

Fig. 2 Comparing the accuracy and F-measure of openness

According to Table 6 and Fig. 3, boosting-decision tree and boosting-Naïve Bayesian, with an accuracy of 97.83% and an F-measure of 97.14%, could be selected as the proposed models to predict conscientiousness, whereas the decision tree and SVM models, with an accuracy of 95.56% and an F-measure of 93.33%, were not proposed as appropriate models.

Fig. 3 Comparing the accuracy and F-measure of conscientiousness

According to Table 7 and Fig. 4, boosting-decision tree, as well as the Boosting-Naïve Bayesian model, with an accuracy of 96.67% and an F-measure of 98.31%, could be selected to predict agreeableness, whereas the Naïve Bayesian model, with an accuracy of 66.67% and an F-measure of 79.17%, was not proposed as an appropriate model.

Fig. 4 Comparing the accuracy and F-measure of agreeableness

According to Table 8 and Fig. 5, boosting-decision tree, as well as the Boosting-Naïve Bayesian model, with an accuracy of 86.67% and an F-measure of 77.78%, could be selected to predict neuroticism, whereas the decision tree model, with an accuracy of 76.67% and an F-measure of 36.36%, was not proposed as an appropriate model.

Fig. 5 Comparing the accuracy and F-measure of neuroticism

These results were obtained from a limited number of samples; other or larger samples, or a repetition of the experiment with other people, may yield different results. After finding the right classifier for each personality trait, personality can be predicted on the five factors. There is no correlation between personality traits, so each one is predicted independently. It should be noted that the result for one trait cannot influence the results for the others. For instance, knowing that a person is low in extraversion does not determine whether he is in the low or high group of Conscientiousness. The same holds for the other traits, so each trait acts autonomously. Considering the output of NEO-60, each user had five values associated with the five traits of the Big Five model.

It is worth mentioning that, for our prediction goal, it was not important to know the exact score of a user on each factor; hence, the possible scores for each personality trait were categorized into two classes: low and high. For example, for a value of 2 on the extroversion trait, the classifier would predict that the user had a low extroversion tendency. As each trait was interpreted independently, the dataset was entered into the software five times, each time with one of the five factors as the label attribute, according to Table 9.

Table 9 Description of personality traits in two classes
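The two-class labeling and the five separate runs can be sketched as follows. The cut-off value, the score scale, the feature values, and the function names are hypothetical; the actual class boundaries are those given in Table 9.

```python
TRAITS = ["extraversion", "openness", "conscientiousness", "agreeableness", "neuroticism"]

def to_class(score, cutoff):
    """Map a trait score to one of the two classes."""
    return "high" if score >= cutoff else "low"

def build_trait_datasets(features, scores, cutoff):
    """Build one labeled dataset per trait: same features, a different label attribute each time."""
    datasets = {}
    for trait in TRAITS:
        labels = [to_class(user_scores[trait], cutoff) for user_scores in scores]
        datasets[trait] = list(zip(features, labels))
    return datasets

# Hypothetical profile features ([friends, posts]) and questionnaire scores for two users.
features = [[120, 35], [480, 210]]
scores = [
    {"extraversion": 2, "openness": 4, "conscientiousness": 3, "agreeableness": 5, "neuroticism": 1},
    {"extraversion": 4, "openness": 2, "conscientiousness": 5, "agreeableness": 3, "neuroticism": 4},
]
datasets = build_trait_datasets(features, scores, cutoff=3)  # cut-off of 3 is an assumption
for trait in TRAITS:
    print(trait, [label for _, label in datasets[trait]])
```

Each of the five resulting datasets can then be passed to a classifier on its own, which mirrors entering the dataset into the software five times with a different label attribute.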

For example, to find the personality of a person on the five factors, the variables of the user’s Facebook profile are first entered into the model selected as the best predictor. In parallel, the person fills in the NEO questionnaire and receives a score for each factor; based on this score, the person is assigned to the high or low class described above, and this class is entered into the model as the target variable. After running the modeling five times, once for each personality trait, it can be seen that the model predicts the person’s personality correctly with a certain percentage of accuracy.

The key novelty of this paper is that, by modeling and classifying individuals, we can predict the personality of users on social networks without having any history of them and without them filling in psychological questionnaires. The following example, based on Table 10, helps to illustrate this. A user with these attribute values is entered into the model, and it is expected that, for instance, the extraversion trait is predicted correctly by the boosting-decision tree classifier with up to 90% accuracy. The other traits are handled in the same way.

Table 10 Example of the variable of one user

Table 11 compares the results of our work with those of similar studies.

Table 11 Comparing with related work

Conclusion and future works

Ultimately, we tried to identify the personality of users indirectly, without the use of traditional methods; user personality was therefore not recognized merely from a self-report questionnaire, but could be predicted from the profile the user had built up over time. To this end, we drew on data mining, which discovers useful information from a collection of seemingly irrelevant data. Among the results achieved, the boosting-decision tree was our proposed model; with 82.2% accuracy, it was more accurate than the models of previous studies that predicted the five personality factors from profile variables. In future work, we intend to repeat the study with more samples, different nationalities and different conditions, and to compare the outcomes with our results. Once personality is known, this model can be used for other purposes, such as recommender systems for friends and social groups, and even for promotional purposes.

The proposed method can be improved in several respects. We plan to predict personality with other techniques, such as text mining of the words in posts and comments on users’ timelines. Another proposal is to investigate what kinds of photos are used as profile photos by each of the five personality types. Finally, to increase the accuracy of the classifiers, we intend to use fuzzy classification in future work.