1 Introduction

Information and language technology can play an important role in assessing the changing behavior of individuals and can help transform cities into environments that support green and healthy lifestyles. In the context of the European project Grage (Gray and Green in Europe: elderly living in urban areas - www.grageproject.eu), we are carrying out a behavioral analysis of Dutch users based on social media data extracted from Twitter. We focus on the Netherlands because a growing number of older adults there use social media, which places the country at the forefront compared to other European countries.

Our goal is to achieve a better understanding of people’s behavior on the basis of language use, since previous literature has shown that several linguistic features change as life progresses. In addition, we consider the use of hashtags, which are a feature specific to social media.

2 Previous Literature

In researching ageing behavior, it is important to define age, which is usually assumed to be the time someone has lived from one’s birthdate up until the moment of inquiry (i.e. chronological age). The studies that consider language use often focus on chronological age [1,2,3]. However, individuals of the same age can use language in completely different ways, for example because they take different places in society. Therefore, the focus of various researchers has shifted to grouping subjects on life stages rather than on chronological age [4].

Previous studies have analyzed the linguistic features that change with age. In particular, [1] found that aging is associated with a decline in negative emotion words and, at the same time, an increase in positive ones. Pronouns are quite revealing of social relationships and identity: subjects younger than 14 and older than 70 used the first-person plural most often, while individuals between these two age groups used it less. Another significant finding was that the use of self-references (that is, first-person singular pronouns) decreased dramatically with age, with an extreme decline for subjects older than 70.

On the other hand, [2] found that younger people’s vocabulary contains a wide variety of slang words, such as swear words or non-derogatory slang. With respect to personal pronouns, younger people used ‘I’, ‘my’, ‘myself’, ‘me’, and ‘you’ significantly more than the older group, whereas the older group showed a higher frequency of third-person singular and plural pronouns, as well as the first-person plural.

While these papers focused mainly on spoken and written resources for their analyses, there is a growing number of studies relying on a computational analysis of social media. They also focus on the use of pronouns as a variable and they suggest a tendency for younger people to use more self-references in the form of first-person pronouns in their language [3]. As people grow older, they tend to use more first person plural pronouns [3, 4].

As for social-media-specific features such as hashtags, [4] found that hashtags are used more often by older Twitter users: low usage in the teens, a steep climb in the 20s, and then the highest level of use, sustained through the years up to the oldest category of participants (over 60 years of age). Furthermore, [5] noticed that on Instagram the group of teens (13–19 years) posted fewer photos but added more hashtags to them than the adults (25–39 years) did. With respect to content, the adult group displays a wider and more diverse range of topics (arts/photos/design, locations, mood/emotion, nature, social/people), while the majority of the teens’ hashtags concern mood/emotion and follow/like.

3 Social Media Use Among the Elderly

The Netherlands is at the forefront with respect to internet and social network use, and this also holds for the elderly [6]. Social network use at the European level in 2013 is shown in Fig. 1: with 55% of users between 16 and 75 years old, the Netherlands ranks among the top countries. It is behind the Scandinavian countries (with Denmark scoring 63%) and England (less than 60%), and it scores above the European average (43%).

Fig. 1. Social media use in EU countries in 2013

If we consider the social media use of the Dutch population in recent years (this includes online chatting, writing or reading weblogs, e-mailing and the use of social and professional platforms; cf. Fig. 2), we notice a clear increase in all age groups. Almost 60% of those between 65 and 75 are active on social media, as are 20% of those above 75.

Fig. 2. Social media use 2014–2016

The development in the use of the various social networks across age groups between 2012 and 2016 is illustrated in Fig. 3, where the two years are plotted on top of each other and partially overlap. Younger people use platforms such as Facebook and Twitter less, while the older population uses social networks more, especially instant messaging. Furthermore, the use of professional social networks has increased in the age group 22–35 years old.

Fig. 3. Development of social network use

4 Language Analysis

Since social media are used by an increasing number of people, especially in the Netherlands, it seems relevant to employ social network data to analyze their behavior through language use. Users on a social media platform write spontaneously about their interests, providing information about their needs and their behavior.

In our work, we make use of the chronological age of users, but we group them into three classes that reflect three life stages: active working life (below 55), the pre-retirement stage (between 55 and 67) and post-retirement (above 67). In this way, we address possible criticism of the use of chronological age [4, 7]. The users we selected from Twitter had to meet several requirements: their tweets had to be publicly accessible and written in Dutch; they had to have posted at least 400 tweets; and they had to show social activity, defined as having more than 300 followers. Users were assigned to age groups manually on the basis of their profile information and profile picture, and gender was determined in the same way. Two people verified the selection, and only those users on whose classification they agreed were retained.
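The selection criteria and life-stage grouping described above can be sketched as follows. This is only an illustration: the `User` record and its field names are hypothetical, the age is taken as a plain number although in the study it was estimated manually from profiles, and treating the boundary ages 55 and 67 as part of the middle group is an assumption.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical record; field names are illustrative, not from the study.
    handle: str
    age: int              # in the study, age group was assigned manually
    language: str         # assumed ISO code, e.g. "nl" for Dutch
    is_public: bool
    tweet_count: int
    follower_count: int


def life_stage(age: int) -> str:
    """Map a chronological age to one of the three life stages.

    Assumption: ages 55 and 67 themselves fall in the middle group.
    """
    if age < 55:
        return "under-55"
    if age <= 67:
        return "55-67"
    return "above-67"


def is_eligible(user: User) -> bool:
    """Apply the stated requirements: public, Dutch, >= 400 tweets, > 300 followers."""
    return (
        user.is_public
        and user.language == "nl"
        and user.tweet_count >= 400
        and user.follower_count > 300
    )


if __name__ == "__main__":
    candidate = User("voorbeeld", 70, "nl", True, 1200, 450)
    print(is_eligible(candidate), life_stage(candidate.age))
```

The manual verification step (two annotators agreeing on each classification) is not modelled here.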

Fig. 4. Frequent words of under-55

We have carried out a preliminary analysis of the most frequent words in the tweets of the selected users and we have focused initially on two age groups. We have carried out a pilot based on 18 users below 55 and 10 users above 67 in order to increase the chances of identifying different behaviour in language use.

Fig. 5. Frequent words of above-67

The word clouds (cf. Figs. 4 and 5) reveal that Dutch articles ‘de’ (‘the’), ‘het’ (‘the’) and ‘een’ (‘a’) are very common in the text in both age groups, as well as the use of prepositions: ‘in’ (‘in’), ‘van’ (‘of’, ‘from’), ‘voor’ (‘before’, ‘in front of’), ‘op’ (‘on’) and to a lesser degree ‘over’ (‘over’) and ‘bij’ (‘by’) are all used frequently. However, to assess behaviour, the use of pronouns is the most interesting to analyse and we notice that in the case of the group under-55 the pronoun ‘je’ (‘you’ sing.) is more frequent than ‘ik’ (‘I’). In the above-67 group, the presence of the pronoun ‘ik’ (‘I’) is attested as well as that of the first-person plural ‘wij’ (‘we’). To assess whether one group uses a word more frequently than the other, a statistical metric suggested by [8] has been used:

Pov67(w) = relative frequency of word w in tweets of subjects over 67

Pun55(w) = relative frequency of word w in tweets of subjects under 55

$$ \text{variance} = \sqrt{\frac{P_{\text{ov67}}(w)}{N_{\text{ov67}}} + \frac{P_{\text{un55}}(w)}{N_{\text{un55}}}} $$
$$ t = \frac{P_{\text{ov67}}(w) - P_{\text{un55}}(w)}{\text{variance}} \qquad (1) $$

If the score t is positive, the word is used more often in the tweets written by people older than 67; if it is negative, the word is more frequent in the under-55 group.
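As a minimal sketch of how this score could be computed, the snippet below takes two lists of tokens, one per age group. It assumes that the N terms denote the total number of tokens in each group's tweets (the text does not define them), and it follows the text in calling the denominator "variance". The function names and the toy token lists are illustrative only.

```python
import math
from collections import Counter


def relative_frequencies(tokens):
    """Return (word -> relative frequency, total token count) for one group."""
    total = len(tokens)
    if total == 0:
        raise ValueError("token list must not be empty")
    counts = Counter(tokens)
    return {word: count / total for word, count in counts.items()}, total


def t_score(word, tokens_ov67, tokens_un55):
    """Score from Eq. (1): positive means more frequent in the over-67 group.

    Assumption: N is the total number of tokens in each group.
    """
    p_ov67, n_ov67 = relative_frequencies(tokens_ov67)
    p_un55, n_un55 = relative_frequencies(tokens_un55)
    p1 = p_ov67.get(word, 0.0)
    p2 = p_un55.get(word, 0.0)
    variance = math.sqrt(p1 / n_ov67 + p2 / n_un55)
    if variance == 0.0:
        # The word occurs in neither group, so there is nothing to compare.
        return 0.0
    return (p1 - p2) / variance


if __name__ == "__main__":
    # Toy data, not taken from the study.
    over_67 = ["ik", "wij", "ik", "de"]
    under_55 = ["je", "je", "ik", "de"]
    for w in ("ik", "je", "de"):
        print(w, round(t_score(w, over_67, under_55), 3))
```

On the toy data, 'ik' receives a positive score, 'je' a negative one, and 'de', which is equally frequent in both lists, a score of zero.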

Our analysis revealed that overall pronoun use increased with age and that the pronoun category with the highest frequency in both groups was the first-person singular. This usage might imply that people use social media to convey their own interests and experiences, even as they become older. In addition, as can be seen in Fig. 6, the two age groups make very similar use of pronouns, contrary to what is assumed in previous literature.

Fig. 6. Pronoun use in the two age groups

5 Hashtags Analysis

Our preliminary analysis of language use has shown that the two extreme age groups behave very similarly in this respect, more specifically in the case of pronoun use. It is thus relevant to assess whether this is also the case with respect to the topics being addressed, and an analysis of hashtag use might be revealing. Hashtags are a social-media-specific feature and a relevant source of information: since they are used to index keywords or topics on Twitter, they are indicative of the debate being carried out on the platform. We have conducted a preliminary analysis based on the two age groups, which allows for a manual investigation given the limited number of users. Out of the 103,097 words in the tweets of the under-55 group, 5318 were hashtags, that is 5.2%. For the users over 67, the share was about 1% (i.e. 532 hashtags out of a total of 52,150 words). One possible explanation for the higher usage of hashtags in the younger group is that Dutch citizens over 67 are usually retired, whereas those in the younger group still have a working life. Many of the younger group's hashtags relate to professions or the labour market, and these users might use hashtags to target specific audiences, which might not be a goal of the older group. We have selected the 100 most frequent hashtags in each group and divided them into categories, loosely based on the categories illustrated by [5], as can be seen in Fig. 7.

Fig. 7. Hashtag categories
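The hashtag share and the list of most frequent hashtags per group could be obtained along the following lines. This is a sketch under assumptions: tweets are split on whitespace, a hashtag is any token starting with `#` followed by word characters, and hashtags are lowercased before counting. The function name and the toy tweets are illustrative, and the manual assignment of hashtags to categories is not covered.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by one or more word characters.
HASHTAG = re.compile(r"#\w+")


def hashtag_stats(tweets, top_n=100):
    """Return (share of hashtags among all words, top_n most frequent hashtags)."""
    total_words = 0
    tags = Counter()
    for tweet in tweets:
        tokens = tweet.split()
        total_words += len(tokens)  # hashtags count as words, as in the text
        for token in tokens:
            match = HASHTAG.match(token)
            if match:
                # Trailing punctuation is dropped; case is normalised.
                tags[match.group(0).lower()] += 1
    share = sum(tags.values()) / total_words if total_words else 0.0
    return share, tags.most_common(top_n)


if __name__ == "__main__":
    # Toy tweets, not taken from the study.
    sample = ["Vandaag naar #Utrecht voor #werk", "Mooi weer in #utrecht"]
    share, top = hashtag_stats(sample)
    print(f"{share:.1%}", top)
```

With the counts reported above, 5318 / 103097 gives roughly 5.2% for the under-55 group and 532 / 52150 roughly 1.0% for the over-67 group.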

There are clear differences in the topics being addressed by the two groups. The most popular ones amongst elderly users are politics, locations, entertainment and news. Together, these categories comprise 75% of the hashtags. The under-55 group focuses more on work- and world-related tweets, with news, occupational terms, entertainment and companies as the most popular topics. The most noticeable difference between the two groups is that in the over-67 group none of the hashtag topics relate to occupations or companies, and references to congresses or fairs are much less frequent than in the under-55 group. Users under 55 write more tweets related to their profession, which is understandably no longer the case for the older adults.

Differences are even more obvious when we look at the theme of sustainability: hashtags related to it are present only in the younger age group, whose users tweet about this subject in a work-related context. A similar difference can be noticed for hashtags related to nature, which are used more by the younger group than by the older adults. The over-67 group makes extensive use of location tags, more than three times as much as the younger group, and of entertainment tags. One might conclude that for those over 67, Twitter is a means to communicate about leisure activities and interests, and that nature and sustainability do not occupy an important place in their communication (cf. [9] for further details).

If we extend this preliminary analysis to the three age groups and additional users (i.e. 100 users), this behavior is confirmed, as can be seen in the word-clouds in Figs. 8 and 9.

Fig. 8. Hashtag use for the groups under-55 (left) and between 55-67 (right)

Fig. 9. Hashtag use for the group above-67

We notice that the two groups below 55 and between 55 and 67 exhibit similar behavior with respect to Twitter use, in that their discussion revolves around work-related and technology-related themes; this is even more the case for the 55+ group. Hashtag use is different for the group above 67, which, as can be seen in Fig. 9, is especially interested in politics. This is to be expected, since this group was more confronted with politics in its youth. Leisure activities and television programs are also mentioned.

6 Conclusions

Previous literature has identified several linguistic features that change as life progresses, the use of pronouns being one important discriminator. We do not find this difference between the two age groups analyzed, and we claim that this might be because, as more elderly people become active on social media, they tend to conform to the language used on the platform regardless of age [10]. In a subsequent analysis, it would be relevant to assess when users joined the platform, in order to evaluate whether membership age indeed plays a role.

We have also considered the use of hashtags and noticed that there are differences in the topics addressed by the groups analyzed: while the elderly use hashtags mainly in relation to leisure and politics, the younger users use them mainly in relation to their working life. Another important difference is that the younger group uses hashtags related to nature and sustainability, whereas the elderly do not; the elderly do, however, use location tags, which indicates an interest in the places they live in but perhaps less in the environment. We claim that social media could play an important role in changing this attitude and behavior. We have observed that these differences are also attested when the analysis is extended to additional users and to the three age groups.