1 Introduction

Various mobile dating applications (hereafter "dating APPs") have emerged in recent years [1, 2] as a result of the rise of smartphones and Internet technologies. Tinder, a popular dating APP, has attracted 50 million users, more than 100 million downloads, and 10 million DAUs (daily active users) [3]. Despite the name "dating APPs," recent research reveals that users may utilize dating APPs for purposes other than dating [4,5,6,7]. For instance, Van et al. [7] conducted a case study with more than 500 participants. According to the results, there are primarily six uses for a dating application: friendship, entertainment, social etiquette, romantic relationships, location-related searches, and business activities.

Knowing the usage purpose allows us to enhance the user experience, comprehend user behavior, and optimize the functionality of dating APPs. First, understanding the user's intentions can help dating APP service providers reduce information overload and provide personalized service, which may improve the user experience [8, 9]. For instance, when a user searching for Mr. Right views nearby users on a dating application, it is more appropriate to recommend friends with the same goal. When a user of a dating APP searches for "chat" with the intention of chatting, it is more appropriate to display people who also want to chat than to display only "chat"-related words. Second, usage purposes can assist service providers in customizing promotional efforts to grow and retain their user base [5, 10]. Sumter et al. [5], for instance, assert that a comprehensive understanding of the usage purposes of dating APP users is necessary for analyzing why a user selects a particular platform. By investigating the usage purposes of a dating application, Gudelunas et al. [11] discuss how these usage purposes and needs attract users to dating applications and how to increase user satisfaction using the uses and gratifications approach. Third, usage purposes can help optimize the HCI design of dating applications [12, 13]. For instance, Hardy et al. [12] investigate the methodology of optimizing the functionality of dating APPs from the perspective of HCI design in order to meet the needs of dating APP users with varying usage goals.

Due to the following two factors, it is difficult to determine the usage purposes of dating application users despite their importance. First, the majority of dating APPs do not provide users with the functionality to report their usage purposes. Second, despite the fact that some dating applications allow users to self-report their usage purposes, many users continue to disregard this functionality and do not report their usage purposes. To better understand user behavior, optimize dating APPs, and enhance user experience, it is necessary to infer dating APP users' intentions, which has not yet been investigated.

In this paper, we intend to infer users' usage purposes by analyzing the heterogeneous data they leave behind on dating applications. However, achieving this objective is not straightforward, as this work must confront the following obstacles. (1) The social media data generated by users of dating applications are abundant and diverse (e.g., text and images). It is difficult to fully leverage this heterogeneous data for usage purpose inference. (2) Given that users may simultaneously have multiple usage purposes, the inference of usage purposes is a multi-label problem, which distinguishes it from other user attribute inference tasks. (3) Numerous dating APP users fail to disclose their usage purposes, resulting in a labeling problem: only 24.67% of users disclosed their usage purposes (specifics are provided in Sect. 3.1). The absence of labeled data may subsequently hinder the accuracy of usage purpose inference.

To address the issue of dating APP usage purpose inference, we propose a framework with three modules that are ideally suited to address the aforementioned obstacles. (1) To fully leverage dating APP users' abundant data, we propose a heterogeneous feature extraction module capable of leveraging multiple strategies to extract semantic representations from users' various dating APP data (i.e., portrait images, profiles, and posts) and then fusing all extracted features for further analysis. Specifically, we extract three types of features from the portrait images of dating APP users: easily interpretable features, such as computational aesthetics-based (CA) features [14] and content-based features [15], and those that are difficult to interpret, such as convolutional neural network-based features (CNN). In addition, we extract over 50 features from user profile data, which can be further categorized as personal attributes and social interactions. Regarding posts, we employ post attributes and Text-CNN to discover their semantic representation. (2) With the intent of addressing the multi-label problem and enhancing inference performance, we propose a multi-task usage purpose learning module capable of handling all three tasks concurrently. Specifically, we consider a two-part feature transformation approach for multi-task learning [16], namely joint embedding learning and task-oriented learning. The joint embedding fuses the common representation for all three binary classification tasks, while task-oriented learning further extracts task-specific knowledge from joint embedding. (3) We propose a semi-supervised usage purpose learning module [17], which can leverage unlabeled data to alleviate the problem of label insufficiency and improve the performance of usage purposes inference, given that a large amount of unlabeled data generated by users who do not report their usage purposes has not been utilized.

To evaluate our framework, we collect a large dataset from a popular dating APP containing 34,364 active dating app users and their publicly available heterogeneous information, such as portrait images, profiles, and posts. There are 8,477 users who have disclosed their usage purposes. Then, to determine the efficacy of our framework's modules, we pose six research questions and conduct extensive experiments. The outcome indicates that all three modules can significantly enhance usage purpose inference performance.

This study's contributions can be summarized as follows:

  • Our research is, to the best of our knowledge, the first to automatically infer users' usage purposes from heterogeneous data left on dating APPs.

  • We propose the novel T-SSMTL framework with three modules for dating APP usage purpose inference, which can extract semantic representations from users' digital traces, handle the multi-label problem, and alleviate the label-missing problem.

  • We amass a large dataset encompassing 34,364 active dating APP users in four cities. We conduct experiments and evaluate the rationality and efficacy of the T-SSMTL framework using the real-world dataset. The result indicates that all three modules in this framework can significantly enhance usage purpose inference performance.

2 Related work

2.1 Inference of attributes in social network

Previous studies [18,19,20,21,22] have successfully inferred users' personality, mental health, intelligence, and emotions from social media data. For instance, Wei et al. [22] studied the relationship between the portrait images of Facebook users and their intelligence. By analyzing portraits, they predicted both the measured and perceived intelligence of social media users. Using data-driven methods, Liang et al. [23] predicted users' personalities based on their social media posts. However, the majority of previous research in this field neither considered users' online behavior patterns from the standpoint of heterogeneous information nor extracted extensive social media features. In recent years, a number of studies have attempted to utilize heterogeneous social media data to infer user attributes [20, 21, 24, 25]. Wei et al. [21] predicted the personalities of social network users by employing heterogeneous feature engineering on posts, portrait images, and responsive patterns. Using Twitter users' profiles, portrait images, and posts, Shen et al. [20] developed a multi-model technique to identify depressed Twitter users. However, these methods are unsuitable for inferring the usage purpose of a dating application because they cannot solve the multi-label and label-missing problems.

In a nutshell, the majority of previous studies did not infer user characteristics by fully leveraging diverse social media data. Although some prior research has utilized heterogeneous data successfully, the inference of dating APP usage purposes remains unexplored. As far as we are aware, our work is the first to use heterogeneous information to infer the usage purposes of users automatically.

2.2 Studies on the usage purposes of dating apps

There have recently been a number of studies examining the usage purposes of dating APPs. Van et al. [7] conducted a case study with more than 500 participants and found that there are primarily six usage purposes for the dating application Grindr: friendship, entertainment, social etiquette, romantic relationships, location-related search, and business activities. Timmermans et al. [6] examined individual differences in Tinder use and motivations, whereas Sumter et al. [5] found that "love" may be a stronger usage purpose than other motivations when utilizing dating APPs. Gudelunas et al. [11] used the uses and gratifications approach to discuss the motivations and individual needs that draw users to dating APPs, the gratifications associated with users' activities on dating APPs, and how users manage their multiple online identities. Hardy et al. [12] investigate the methodology for optimizing the functionality of dating APPs from the perspective of HCI design in an effort to meet the needs of users of dating APPs with varying usage goals.

In conclusion, previous research has revealed a variety of dating application usage purposes and the effects of these usage purposes on dating application users. The primary focus of our work is determining how to infer dating APP usage purpose from social media data left on dating APPs, which remains unexplored.

3 Dataset description

To present the details of our dataset, we first describe the data collection method and provide an overview. Then, we describe the measures we've taken to protect user privacy in this work. In order to demonstrate the feasibility of this work, the reliability and generalizability of the collected dataset are also discussed.

3.1 Data collection

We collect user information from a popular Chinese dating application with over six million active users. This application provides numerous social features, such as viewing nearby users and following others. Users can also reveal their usage purposes to others by selecting from the three options provided by the application: looking for the Mr. Right, dating, or chatting. Users are permitted to select multiple usage purposes simultaneously. According to the description of the APP, the purpose of looking for the Mr. Right is to find someone with an emotional commitment or future involvement, the purpose of dating is to find someone to date, and the purpose of chatting is to find someone to chat with online.

In order to collect the heterogeneous information left on dating APPs by users and their usage purposes to carry out further analysis, we develop a custom-written Python application that emulates the target dating APP client. Every 30 min, the application can log in at a random location (called a pin location) in our target city, check for visible users nearby and collect their public information. This method of data collection has been widely utilized in previous research [1]. Each user's information includes a unique user identifier, the self-reported usage purpose (if reported), and heterogeneous information (i.e., portrait, profile, and posts).

The original dataset spans from June 12, 2019 to June 22, 2019, encompassing 34,364 active users in four large cities with heterogeneous dating APP data and self-reported usage purposes. Each usage purpose can be marked with a binary label, indicating whether the user has this purpose or not (e.g., want to chat with someone or not). 24.7% (N = 8,477) of users in this dataset have disclosed their usage purpose. 52.5% (N = 4,451) of these labeled users claim to be looking for the Mr. Right, 35.2% (N = 2,980) try to find a date, and 47.4% (N = 4,015) want to chat. Figure 1 depicts the distribution of usage purposes, which indicates overlap between users with different purposes. Then, we summarize the number of purposes per user, which reveals that 69.7% of users (N = 5,906) have only one purpose, 25.6% of users (N = 2,171) have two purposes, and 4.7% of users (N = 400) report all three purposes.

Fig. 1 Distributions of usage purposes within the dataset. (a) The distribution of different usage purposes. (b) The distribution of the number of usage purposes per user
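The label statistics reported above can be derived from a multi-hot label matrix. The following is a minimal sketch, assuming each labeled user is stored as a 0/1 triple; the purpose names and the toy data are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical names for the three self-reported purposes.
PURPOSES = ("mr_right", "dating", "chatting")


def summarize_labels(labels):
    """Summarize multi-hot labels (one 0/1 triple per labeled user).

    Returns the share of users reporting each purpose and the share of
    users reporting exactly one, two, or three purposes.
    """
    n = len(labels)
    per_purpose = {
        name: sum(row[j] for row in labels) / n
        for j, name in enumerate(PURPOSES)
    }
    per_count = Counter(sum(row) for row in labels)
    count_share = {k: v / n for k, v in sorted(per_count.items())}
    return per_purpose, count_share


# Toy labels for four users (illustrative only).
toy = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
per_purpose, count_share = summarize_labels(toy)
print(per_purpose)
print(count_share)
```

Because one user can report several purposes, the per-purpose shares need not sum to one, which is why the three percentages in the text add up to more than 100%.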

3.2 Privacy-related factors

In this study, we have taken numerous precautions to maintain the ethics and integrity of our work and safeguard user privacy. First, we adhere to the dating APP provider's published privacy policies. Only public data that can be viewed by any user who downloads the dating application are collected and used. Second, the collected dating APP user data have been completely anonymized, and they are all stored and processed on a private server with no public access. For the sensitive image data, we have removed all EXIF (Exchangeable image file format) information from the original images to guarantee that none of the images contain sensitive information such as the shooting equipment, time, or location. In addition, we encode and store all images with a base64 encoder so that they cannot be viewed directly, without losing information. Third, we have permission from the institution's research ethics committee to conduct this investigation. In addition, in accordance with the spirit of the 2018 European Union General Data Protection Regulation (GDPR), we have anonymized the names of the dating application and cities in this study so as not to stigmatize users and the dating application service provider.
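As an illustration of the storage step, the sketch below encodes image bytes with Python's standard base64 module and checks that decoding restores them exactly. The helper names and the toy byte string are hypothetical, and EXIF removal is not shown because it would require an image library.

```python
import base64


def encode_image_bytes(raw: bytes) -> str:
    """Encode raw image bytes as base64 text for storage."""
    return base64.b64encode(raw).decode("ascii")


def decode_image_bytes(encoded: str) -> bytes:
    """Recover the original image bytes from the stored base64 text."""
    return base64.b64decode(encoded)


# Toy payload standing in for an image file (illustrative only).
raw = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"toy image payload"
stored = encode_image_bytes(raw)
print(stored)
print(decode_image_bytes(stored) == raw)
```

The encoding is lossless and reversible, so it only prevents the stored files from being opened directly as images.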

3.3 Reliability

One concern is the reliability of the self-reported labels in our dataset. If the majority of dating APP users lie about their usage purposes, the collected data may not be reliable, and any analysis based on this dataset will not be convincing. To address this concern, we conducted 50 interviews with prospective users of our target dating application. According to the interviews, 94% of interviewees (N = 47) stated that they would not report fake usage purposes on dating applications, in order to avoid being bothered by users with different motivations. For example, when users who only want to chat indicate that they are looking for a date, other users who want to date will interact with them. Based on the interviewees' responses, we believe that the usage purpose labels in our dataset are reliable.

4 System framework

4.1 Overview

The proposed framework consists of the components listed below.

4.1.1 Heterogeneous feature extraction

In this module, we implement multiple strategies to extract semantic representations from the portrait images, profiles, and posts of dating APP users. For portrait images, we extract computational aesthetics-based features, content-based features, and CNN-based features, ranging from easy to difficult to interpret. For profiles, we extract personal attributes and social interactions to represent the essential and straightforward characteristics of dating application users. Regarding posts, we employ post attributes and Text-CNN to extract semantic information.

4.1.2 Multi-task usage purpose learning

Since a dating APP user may have multiple usage purposes, we employ the multi-task usage purpose learning module, which consists of joint embedding learning and task-oriented learning, to handle the three-purpose inference simultaneously. Joint embedding learning fuses the knowledge learned from heterogeneous feature extraction as the common representation for all tasks, while task-oriented learning captures task-specific knowledge from the joint embedding.

4.1.3 Semi-supervised usage purpose learning

Given that our dataset contains a substantial amount of unlabeled data, we employ a semi-supervised learning module to maximize their utility. The supervised architecture serves as the student model, and a copy of it serves as the teacher model. The student model continues to learn in the usual manner, whereas the teacher model generates targets for unlabeled data. The two models can learn from one another through a consistency loss.

4.2 Heterogeneous feature extraction

To fully leverage the heterogeneous information of dating APP users, the first module of this framework extracts effective and diverse features from these data. In this paper, we utilized image data, text data, and structured data, all of which are heterogeneous data types. Image data include the portrait images of dating app users, text data include the massive posts published by users, and structured data include the personal characteristics and social interactions of dating app users. In the remainder of this section, we demonstrate how to extract semantic representations from the portrait images, profiles, and posts of dating APP users using various feature extraction strategies.

4.2.1 Portrait images

As a less-sensitive and important channel for users to share their self-expression, portrait images have been shown to contain numerous behavioral cues that may be related to the usage purposes of dating APP users [22, 26, 27]. Figure 2 depicts random samples of user portraits uploaded for the purposes of looking for the Mr. Right, dating, and chatting, respectively. We extract three types of features from the portrait images of dating APP users, ranging from those that are easily interpretable, such as features based on computational aesthetics (CA) [14] and features based on content [15], to those that are difficult to interpret, such as CNN-based features (CNN) [28]. The characteristics of the CA are summarized and described in detail in Table 1.

Fig. 2 Random samples of profile images of users whose purposes are looking for the Mr. Right (a), dating (b), and chatting (c). For privacy reasons, all images have been blurred

Table 1 Description of CA's characteristics and attributes

4.2.1.1 Computational Aesthetics (CA) based features

According to the taxonomy proposed in [29], these features can be grouped primarily into three categories: color, textural, and compositional. This type of feature describes easily interpretable portrait image properties, such as the localized presence of edges, the number of regions in an image, and the color usage pattern. Recent studies have made extensive use of CA-based features, such as correlating Facebook users' portrait images with their personalities [19] and estimating intelligence from social network portrait images [22]. The specifics of the CA-based features are detailed in Table 1.

4.2.1.2 Content based features

Portrait images on people-nearby services can depict a wide variety of subjects, including faces and everyday objects. To represent image content, we used the Imagga tagging system [30], which has been utilized successfully in prior research [15], to automatically assign tags to images. The Imagga tagging API can generate a list of words related to the content of an image and the confidence level of each word. Figure 3 illustrates two examples of tags assigned to portrait images by this system, with the font size of more confident tags being larger. We tag all images using the Imagga tagging API and generate for each image a bag-of-tags containing the tags with a confidence greater than 0.3, as recommended by the Imagga developers. We only retain the 1,000 most frequent tags from our extensive dataset.

Fig. 3 Example of assigning tags to portrait images
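The bag-of-tags construction can be sketched as follows. This is a minimal illustration that assumes the tagging results are already available as (tag, confidence) pairs with confidences in [0, 1]; this input format, the function names, and the toy tags are hypothetical and do not reproduce the Imagga API response.

```python
from collections import Counter


def build_vocab(tagged_images, min_conf=0.3, top_k=1000):
    """Keep the top_k tags that most often exceed the confidence threshold."""
    counts = Counter()
    for tags in tagged_images:
        # Count each tag at most once per image; sort for a stable order.
        counts.update(sorted({t for t, c in tags if c > min_conf}))
    return [t for t, _ in counts.most_common(top_k)]


def bag_of_tags(tags, vocab, min_conf=0.3):
    """Binary vector marking which vocabulary tags describe one image."""
    kept = {t for t, c in tags if c > min_conf}
    return [1 if t in kept else 0 for t in vocab]


# Toy tagging results for three images (illustrative only).
images = [
    [("person", 0.9), ("smile", 0.6), ("car", 0.2)],
    [("person", 0.8), ("beach", 0.5)],
    [("person", 0.7), ("smile", 0.4), ("beach", 0.1)],
]
vocab = build_vocab(images, top_k=2)
print(vocab)
print([bag_of_tags(tags, vocab) for tags in images])
```

A binary indicator is used here for simplicity; weighting each entry by the tag confidence would be an equally plausible reading of the text.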

4.2.1.3 CNN based features

Convolutional Neural Networks (CNNs) are a Deep Learning (DL) tool that has become a popular method for learning image representations. Training a CNN from scratch typically requires a large number of samples (in many cases, hundreds of thousands), so this approach cannot be applied directly to our user portrait images. In this study, we therefore apply the pretrained VGGNet model [31, 32] to our profile images. When a profile image is provided, the penultimate layer (the last fully connected layer before the output layer of the VGGNet) generates 4096-dimensional features that are used to represent the profile image. Finally, the extracted feature vector of the portrait image, denoted as \(f_{{{\text{portrait}}}}\), is composed of CA-based, content-based, and CNN-based features.

4.2.2 Profile

As the most fundamental and straightforward characteristics of dating APP users, profile data have been extensively used in previous social network attribute inferences, which may relate to the usage purposes of dating APP users [20]. We extract two types of profile data from the digital footprints left by users on a dating application, including personal attributes and social interactions.

4.2.2.1 Personal attributes

Personal attributes describe the basic information of dating APP users, including their height, weight, age, level of education, income, and personal interests. Typically, these characteristics are filled out by users when they register for the dating application. Additionally, users actively maintain and update this information to ensure they are sufficiently attractive on dating APPs.

4.2.2.2 Social interactions

Social interactions between users of dating applications provide a great deal of information about their social relationships. These characteristics describe the willingness and level of social engagement of dating APP users. In this work, we extract a broad set of typical social interactions, such as the number of friends (under the rules of the target dating APP, two users who follow each other are defined as friends), the number of blocked users, the number of followers, and the likelihood of a message response. Finally, 57 profile features are extracted, including 22 personal attributes and 23 social interactions. Due to space constraints, we cannot list all of these features in this paper. The final profile feature vector is denoted as \(f_{{{\text{profile}}}}\).

4.2.3 Posts

Previous research indicates that the linguistic expression of social network posts is one of the most important indicators for analyzing user behavior. To comprehensively exploit dating APP users’ posts, we consider extracting two kinds of representations, i.e., post attributes and text-CNN.

4.2.3.1 Post attributes

Users of a dating APP with varying usage purposes have varying preferences for publishing posts, which are directly reflected in the meta information of their posts [33]. To capture the posting behavior of users and the feedback their posts receive from other users, we extract the eight post attributes detailed in Table 2 as a feature vector \(f_{{{\text{post}}\_{\text{attr}}}}\).

Table 2 Description of post attributes

4.2.3.2 Text-CNN

In recent years, the deep learning approach has demonstrated superior performance on language representation learning in numerous natural language processing tasks [21]. Figure 4 depicts how we use a convolutional network to extract posts' semantic embeddings [34] in order to obtain the semantic representation of post content.

Fig. 4 The structure of Text-CNN

Specifically, we concatenate each user's posts into a single text as \(T = \left( {t_{1} ,t_{2} , \cdots ,t_{n} } \right)\), where \(t_{i}\) is a word and \(n\) is the total length. Then, we encode \(T\) into \(W_{1:n} = \left[ {e_{1} ,e_{2} , \cdots ,e_{n} } \right] \in R^{d_{w} \times n}\) using a word embedding layer, where \(d_{w}\) represents the embedding size. We apply a convolution layer with filter \(r \in R^{d_{w} \times k}\) to \(W_{1:n}\), where \(k\) (\(k < n\)) represents the filter’s window size. The feature \(c_{i}^{r}\) can be learned to capture the local information of the posts from the submatrix \(W_{i:i + k - 1}\) as

$$c_{{\text{i}}}^{r} = \sigma \left( {r*W_{{{\text{i}}:{\text{i}} + {\text{k}} - 1}} + b} \right),$$
(1)

where \(\sigma\) represents an activation function, * represents the convolution operator, and \(b \in R\) represents the bias. After applying Eq. (1) to all feasible positions of \(W_{1:n}\), max-over-time pooling selects the most salient feature of filter \(r\) as:

$$\hat{c}^{r} = \max \left( {c_{1}^{r} ,c_{2}^{r} , \cdots ,c_{n - k + 1}^{r} } \right) \in R.$$
(2)

Then, we concatenate the pooled features \(\hat{c}^{r}\) from multiple filters \(r_{1}, r_{2}, \cdots\) with multiple window sizes to represent each user's post content information as:

$$f_{{{\text{post}}\_{\text{content}}}} = \left[ {\hat{c}^{r_{1}} ,\hat{c}^{r_{2}} , \cdots } \right].$$
(3)

Finally, we concatenate \(f_{{{\text{post}}\_{\text{attr}}}}\) with \(f_{{{\text{post}}\_{\text{content}}}}\) to capture the post-related features for prediction of usage purpose

$$f_{{{\text{post}}}} = f_{{{\text{post}}\_{\text{attr}}}} \oplus f_{{{\text{post}}\_{\text{content}}}} ,$$
(4)

where ⊕ represents the concatenation operator for vectors.
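The convolution, max-over-time pooling, and concatenation steps of Eqs. (1) to (4) can be sketched in plain Python. This is an illustrative sketch only: the embedding matrix is stored as n rows of size d_w (the transpose of the notation above), ReLU is assumed as the activation, and all values, filters, and function names are toy assumptions rather than learned parameters.

```python
def relu(x):
    return max(0.0, x)


def conv_feature_map(W, r, b=0.0):
    """Eq. (1): slide a k-word filter over the n word embeddings.

    W is a list of n embeddings (each of size d_w); r is a list of k
    weight rows (each of size d_w). Returns c_1 ... c_{n-k+1}.
    """
    n, k = len(W), len(r)
    out = []
    for i in range(n - k + 1):
        s = b
        for j in range(k):
            s += sum(rw * ww for rw, ww in zip(r[j], W[i + j]))
        out.append(relu(s))
    return out


def text_cnn_features(W, filters):
    """Eqs. (2)-(3): max-over-time pooling per filter, then concatenation."""
    return [max(conv_feature_map(W, r, b)) for r, b in filters]


def concat(*vectors):
    """Eq. (4): vector concatenation."""
    out = []
    for v in vectors:
        out.extend(v)
    return out


# Toy text of n = 4 words with embedding size d_w = 2.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),               # window size k = 2
    ([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], -1.0),  # window size k = 3
]
f_post_content = text_cnn_features(W, filters)
f_post_attr = [0.5, 2.0]  # placeholder post attributes
f_post = concat(f_post_attr, f_post_content)
print(f_post_content, f_post)
```

Each filter contributes exactly one pooled value regardless of the text length, so users with different numbers of posts are mapped to vectors of the same size.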

4.3 Multi-task usage purpose learning

Given the features learned from the three heterogeneous perspectives (portrait images, profiles, and posts), we formulate the prediction of dating APP usage purposes as three binary classification tasks, one per usage purpose. To handle the three tasks simultaneously, we consider a feature transformation approach for multi-task learning [16, 35] that consists of two parts: joint embedding learning and task-oriented learning. Figure 5 shows the structure of our proposed multi-task learning module.

Fig. 5 The structure of multi-task learning module

4.3.1 Joint embedding learning

In this part, we attempt to learn a joint embedding that fuses the information learned from the three distinct perspectives. Specifically, we concatenate the embeddings from the three views into a single feature vector as \(f_{{{\text{con}}}} = f_{{{\text{portrait}}}} \oplus f_{{{\text{profile}}}} \oplus f_{{{\text{post}}}}\), where \(f_{{{\text{portrait}}}} \in R^{{d_{{\text{portrait}}} }}\), \(f_{{{\text{profile}}}} \in R^{{d_{{\text{profile}}} }}\), and \(f_{{{\text{post}}}} \in R^{{d_{{\text{post}}} }}\) are the embeddings of the portrait, profile, and posts, respectively.

Then, we pass \(f_{{{\text{con}}}}\) through a multilayer perceptron (MLP) to learn a joint embedding of all views as \(f_{{{\text{joint}}}} = {\text{MLP}}\left( {f_{{{\text{con}}}} ;\theta_{{{\text{joint}}}} } \right)\), where \(\theta_{{{\text{joint}}}}\) represents the network parameters.

The joint embedding \(f_{{{\text{joint}}}}\) fuses the knowledge from the three perspectives, which serves as the common representation for all three binary classification tasks.

4.3.2 Task-oriented learning

Given the learned joint embedding \(f_{{{\text{joint}}}}\), it is simple to directly connect three logits for three tasks, as \(f_{{{\text{joint}}}}\) contains all the information from all perspectives. Nonetheless, \(f_{{{\text{joint}}}}\) fails to acquire a task-specific representation, which may lead to an unsatisfactory classification. Comparing Fig. 2a and Fig. 2c, for instance, reveals that face-related features play a greater role in predicting the purpose of looking for the Mr. Right than chatting. Consequently, it is crucial to learn a task-specific embedding from \(f_{{{\text{joint}}}}\).

To achieve efficient multi-task learning, we consider a feature transformation strategy for extracting task-specific knowledge from the joint embedding. Specifically, we use an MLP to learn a task-specific embedding for each task.

$$f_{{{\text{task}}}} = {\text{MLP}}\left( {f_{{{\text{joint}}}} ;\theta_{{{\text{task}}}} } \right),$$
(5)

where the subscript task refers to any of the three tasks, and \(\theta_{{{\text{task}}}}\) denotes the parameters of the corresponding task-oriented learning layer. The prediction for each task is then provided by

$$p_{{{\text{task}}}} = sigmoid\left( {f_{{{\text{task}}}} } \right).$$
(6)

In the end, we used the binary cross entropy loss as the supervised loss for multi-task learning

$${\mathcal{L}}_{{{\text{super}}}} = - \mathop \sum \limits_{{{\text{task}}}} \left[ {y_{{{\text{task}}}} \log p_{{{\text{task}}}} + \left( {1 - y_{{{\text{task}}}} } \right)\log \left( {1 - p_{{{\text{task}}}} } \right)} \right],$$
(7)

where \(y_{{{\text{task}}}} \in \left\{ {0,1} \right\}\) represents the task-specific ground truth. Note that Eq. (7) implicitly gives each of the three tasks the same weight.
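The forward pass and the supervised loss of this module can be illustrated with a small pure-Python sketch. Several simplifications are assumed here and are not taken from the paper: each MLP is reduced to a single dense layer, each task head ends in one logit so that the sigmoid yields a scalar probability, and the task names, dimensions, and weights are toy values.

```python
import math

TASKS = ("mr_right", "dating", "chatting")  # hypothetical task names


def relu(v):
    return max(0.0, v)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, bias, activation=None):
    """One fully connected layer: weights has one row per output unit."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out


def forward(f_portrait, f_profile, f_post, params):
    """Return p_task for every task from the three view embeddings."""
    f_con = f_portrait + f_profile + f_post  # concatenation of the views
    f_joint = dense(f_con, *params["joint"], activation=relu)
    probs = {}
    for task in TASKS:
        w, b = params[task]
        logit = dense(f_joint, w, b)[0]  # assumed: one logit per task head
        probs[task] = sigmoid(logit)
    return probs


def supervised_loss(probs, labels, eps=1e-12):
    """Binary cross entropy summed over tasks with equal weights, Eq. (7)."""
    loss = 0.0
    for task in TASKS:
        p = min(max(probs[task], eps), 1.0 - eps)  # avoid log(0)
        y = labels[task]
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss


# Toy parameters: 4-dimensional f_con, 2-dimensional joint embedding.
params = {
    "joint": ([[0.1, -0.2, 0.3, 0.0], [0.0, 0.5, -0.1, 0.2]], [0.0, 0.1]),
    "mr_right": ([[0.4, -0.3]], [0.0]),
    "dating": ([[-0.2, 0.1]], [0.0]),
    "chatting": ([[0.3, 0.3]], [-0.1]),
}
probs = forward([1.0], [0.5, -1.0], [2.0], params)
labels = {"mr_right": 1, "dating": 0, "chatting": 1}
print(probs, supervised_loss(probs, labels))
```

All three heads read the same joint embedding, so gradients from every task would update the shared layer, while each head keeps its own parameters.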

4.4 Semi-supervised usage purpose learning

In order to improve the performance of dating APP usage purpose inference using unlabeled data, we employ a widely applicable semi-supervised approach called mean teacher [17] in this module. In this approach, the model itself is used to generate training targets for unlabeled samples. To accomplish this objective, we use the supervised architecture described in Sect. 4.2 as the student model, then create a copy of it as the teacher model. The student model continues to learn normally, while the teacher model generates learning targets that are then utilized by the student model. By applying a consistency loss between the two predictions, the student and the teacher can improve one another.

Figure 6 depicts a training batch containing a single labeled instance. First, we compare the output of the student model to the ground truth (i.e., the label) using the classification loss, which is defined as \({\mathcal{L}}_{{{\text{super}}}}\) in Sect. 4.3. Then, we compare the outputs of the student model and the teacher model using the consistency loss as follows:

$${\mathcal{L}}_{{{\text{consis}}}} = \mathop \sum \limits_{{{\text{task}}}} ||f_{{{\text{task}}}} - f_{{{\text{task}}}}^{^{\prime}} ||^{2} ,$$
(8)

where \(f_{{{\text{task}}}}\) and \(f_{{{\text{task}}}}^{^{\prime}}\) represent the last layers of the student model and the teacher model for each task, respectively.

Fig. 6 A training batch containing a single labeled instance in the mean teacher model

After the weights of the student model have been updated using the gradient descent method, the weights of the teacher model are updated using the exponential moving average (EMA) of the weights of the student model. The EMA is defined as \(\theta_{{\text{t}}}^{^{\prime}} = \alpha \theta_{{{\text{t}} - 1}}^{^{\prime}} + \left( {1 - \alpha } \right)\theta_{{\text{t}}}\), where \(\theta_{{\text{t}}}^{^{\prime}}\) and \(\theta_{{\text{t}}}\) represent the network parameters of the teacher model and the student model at training step t, respectively. α represents the smoothing coefficient hyper-parameter, and it is defined as \(\alpha = \min \left( {1 - 1/\left( {t + 1} \right),0.99} \right)\). Similar steps are followed when training with an unlabeled sample, except that the classification loss is not used. Once training is complete, we employ the teacher model for inference, as previous research demonstrated that this model is more likely to make accurate predictions [17].
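The EMA update of the teacher weights can be sketched as follows. This is a minimal illustration that treats the parameters as flat lists of floats and assumes the step counter t starts at 0; the function names and values are hypothetical.

```python
def ema_alpha(step, max_alpha=0.99):
    """Smoothing coefficient: alpha = min(1 - 1 / (t + 1), 0.99)."""
    return min(1.0 - 1.0 / (step + 1), max_alpha)


def ema_update(teacher, student, step, max_alpha=0.99):
    """theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t."""
    a = ema_alpha(step, max_alpha)
    return [a * t + (1.0 - a) * s for t, s in zip(teacher, student)]


teacher = [1.0, 2.0]
student = [3.0, 4.0]
print(ema_alpha(0), ema_alpha(1), ema_alpha(1000))
print(ema_update(teacher, student, step=1))
```

With this schedule, the teacher simply copies the student at the first step and then averages over a progressively longer history until the coefficient saturates at 0.99.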

4.5 The loss function

The usage purpose prediction system consists of three modules and is optimized end-to-end. The loss function of the entire framework consists of two components: the supervised component and the consistency component. The objective of the supervised component is to minimize the gap between the output of the student model and the ground truth (i.e., labeled data). The objective of the consistency component is to minimize the difference between the output of the teacher model and the output of the student model, thereby enabling semi-supervised learning with unlabeled data. The overall objective is to minimize the following loss function:

$${\mathcal{L}} = {\mathcal{L}}_{{{\text{super}}}} + \lambda_{{{\text{consis}}}} {\mathcal{L}}_{{{\text{consis}}}} ,$$
(9)

where \(\lambda_{{{\text{consis}}}} > 0\) is a hyper-parameter that weights the consistency loss. Because the Adam optimizer [36] automatically adapts the learning rate while training the model's weights, we use it in this work.
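The following sketch shows how Eqs. (8) and (9) combine for a single sample. It is illustrative only: the supervised term is assumed here to be a per-task binary cross-entropy, the function names are hypothetical, and plain Python lists stand in for the last-layer outputs \(f_{{{\text{task}}}}\) and \(f_{{{\text{task}}}}^{^{\prime}}\).

```python
import math


def bce(prob: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one task (assumed form of the supervised loss)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def consistency_loss(student_out: dict, teacher_out: dict) -> float:
    """Eq. (8): sum over tasks of the squared L2 distance between
    the student's and the teacher's last-layer outputs."""
    return sum(
        sum((s - t) ** 2 for s, t in zip(student_out[task], teacher_out[task]))
        for task in student_out
    )


def total_loss(student_probs, labels, student_out, teacher_out, lambda_consis=1.0):
    """Eq. (9): L = L_super + lambda_consis * L_consis.

    For an unlabeled sample pass labels=None; the supervised term is then skipped.
    """
    supervised = 0.0
    if labels is not None:
        supervised = sum(bce(student_probs[task], labels[task]) for task in labels)
    return supervised + lambda_consis * consistency_loss(student_out, teacher_out)
```

Passing `labels=None` mirrors the unlabeled case, in which only the consistency term contributes to the objective.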

5 Experiment and results

In this section, we design and conduct extensive experiments to evaluate the T-SSMTL framework. The experiments address the following six research questions:

  • RQ1: Can heterogeneous information improve the performance of dating APP usage purpose inference when compared to a single type of dating APP data?

  • RQ2: Can the multitasking strategy outperform the single-task technique?

  • RQ3: Does the semi-supervised module improve the performance of usage purpose inference?

  • RQ4: Does our method perform better than other machine learning techniques?

  • RQ5: Does our computer-based method outperform the human-based strategy?

  • RQ6: Can we explain why extracted features contribute to usage purpose inference?

5.1 Experimental settings

The framework we developed for inferring the purpose of APP usage is implemented in PyTorch [4]. The dataset used in this study comprises a labeled dataset (N = 8,477) and an unlabeled dataset (N = 25,887). The labeled dataset is randomly divided into a training set (70%), a validation set (15%), and a test set (15%). The combined training set consists of the labeled training set and the unlabeled dataset. The validation set is used to tune the hyper-parameters, and performance on the test set is reported. To train the proposed method, we employ the Adam optimizer [36] with a fixed batch size of 16 and an initial learning rate of 0.0005. The embedding size \(d_{{\text{w}}}\) of the Text-word CNN is set to 128, and the convolutional kernel sizes are set to {3, 4, 5} with 64 filters for each size. The size of the joint embedding learning layer is set to 512, while the size of the task-oriented learning layer is set to 256. The consistency loss weight \(\lambda_{{{\text{consis}}}}\) is set to 1.0. Because ReLU [37] performs best on the validation set, we use it as the activation function in all reported experiments. We evaluate our framework with the widely used area under the precision-recall curve (Precision-Recall AUC), which jointly accounts for precision and recall. A larger AUC indicates better inference performance. We run each experiment twenty times and report the mean AUC and standard deviation.
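The data partitioning can be sketched as follows. This is a minimal illustration under stated assumptions: users are represented by integer ids, the seed and the rounding of the split sizes are arbitrary choices (the paper does not state them), and `split_labeled` is a hypothetical helper.

```python
import random


def split_labeled(ids, seed=0, train_frac=0.70, val_frac=0.15):
    """Randomly split labeled users into train / validation / test sets.

    The test set receives whatever remains after the first two splits.
    """
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = int(train_frac * len(ids))
    n_val = int(val_frac * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


# Placeholder ids standing in for the 8,477 labeled and 25,887 unlabeled users.
labeled_ids = range(8477)
unlabeled_ids = range(8477, 8477 + 25887)

train, val, test = split_labeled(labeled_ids)

# The combined training set is the labeled training split plus all unlabeled users.
combined_train = train + list(unlabeled_ids)
```

Only the labeled training split is merged with the unlabeled users; the validation and test splits stay fully labeled so that hyper-parameter tuning and reporting remain supervised.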

5.2 Effectiveness of leveraging heterogeneous information (RQ1)

We begin by investigating the efficacy of leveraging heterogeneous information to answer RQ1. By changing \(f_{{{\text{con}}}}\) before passing it through the multilayer perceptron, we can test a variety of extracted feature combinations. For instance, when evaluating the efficacy of using only portrait images and posts for inference, we modify the definition of \(f_{{{\text{con}}}}\) as \(f_{{{\text{con}}}} = f_{{{\text{portrait}}}} \oplus f_{{{\text{post}}}}\).
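The ablation over feature combinations can be enumerated programmatically. The sketch below assumes three named feature vectors (portrait, profile, post) stored as Python lists; the helper names are hypothetical and the vectors are toy values.

```python
from itertools import combinations

FEATURES = ("portrait", "profile", "post")


def feature_subsets(names=FEATURES):
    """All non-empty combinations of feature types (7 for three types)."""
    return [c for r in range(1, len(names) + 1) for c in combinations(names, r)]


def build_f_con(features: dict, subset) -> list:
    """Concatenate the selected feature vectors in a fixed order,
    i.e. f_con = f_a ⊕ f_b ⊕ ... for the chosen subset."""
    f_con = []
    for name in subset:
        f_con.extend(features[name])
    return f_con


# Toy feature vectors; real ones would come from the feature extraction module.
toy = {"portrait": [0.1, 0.2], "profile": [0.3], "post": [0.4, 0.5]}
portrait_and_post = build_f_con(toy, ("portrait", "post"))
```

Each subset yields a different input width, so in this sketch the first layer of the downstream multilayer perceptron would need to be sized per combination.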

Table 3 displays the AUC of our classification for each feature type and their combinations. Comparing the feature combinations, we find that every combination of two feature types achieves a higher AUC than any single feature type, while the combination of all three yields the highest AUC. This result suggests that the performance of usage purpose inference can be enhanced by incorporating more heterogeneous data, which answers RQ1. In addition, by evaluating each individual feature type, we find that post-based features achieve the best classification results across all three usage purpose inference tasks, whereas profile-based features perform the worst, particularly when inferring the usage purpose of looking for Mr. Right. This is consistent with the intuition that the content of dating APP users' posts typically conveys more information than their profile features, as profile features are fewer and consist primarily of basic information about the users.

Table 3 Performance of leveraging different heterogeneous information

5.3 Effectiveness of multi-task module (RQ2)

To determine whether our multi-task approach can improve the performance of dating APP usage purpose inference, we compare our framework to a single-task approach. In our study, we develop three single-task classification models, one for each usage purpose. The heterogeneous feature extraction and semi-supervised learning modules of each single-task model are the same as those in our framework; only the multi-task learning module differs. Specifically, the task-oriented learning layer is omitted, and the probability of each usage purpose is calculated from \(f_{{{\text{joint}}}}\) instead of \(f_{{{\text{task}}}}\) when computing the supervised loss for training.

Table 4 displays the AUC of the single-task learning method and our multi-task learning method for each dating APP usage purpose. First, the multi-task approach outperforms the single-task approach significantly when inferring all three usage purposes. These improvements indicate that the multi-task learning module in our framework is effective at enhancing dating APP usage purpose inference. Second, we find that the single-task model's performance differs across the three usage purposes. The AUC for predicting the purpose of looking for Mr. Right is 0.764, whereas the AUC for inferring the purpose of chatting is only 0.733.

Table 4 Comparison between classification performed by single-task approach and our multi-task approach

5.4 Effectiveness of semi-supervised module (RQ3)

To determine whether our semi-supervised method can improve the performance of dating APP usage purpose inference, we compare it to a supervised method. Specifically, the semi-supervised learning module is omitted when implementing the supervised method: the teacher model is removed and only the classification loss is used for training. The heterogeneous feature extraction and multi-task modules are identical in both approaches. The AUCs of the supervised and semi-supervised approaches are compared in Table 5. The results demonstrate that our semi-supervised method outperforms the supervised method when inferring all three purposes. Thus, we can conclude that leveraging unlabeled user data can improve the classification of dating APP usage purposes, which answers RQ3.

Table 5 Comparison between classification performed by supervised approach and our semi-supervised approach

Additionally, we conduct experiments to determine how the number of labeled users affects the performance of our semi-supervised method. After partitioning all labeled users (N = 8,477) into training, validation, and test sets, we randomly select a subset of the labeled training data to control its size. As the final training set, we concatenate the selected labeled data and all unlabeled data (N = 25,887). Figure 7 depicts the comparison results. The performance of all three usage purpose inferences increases as the number of labeled users increases.
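A minimal sketch of this label-budget experiment is given below. The helper name `subsample_labeled`, the seed, and the placeholder ids are assumptions made for illustration.

```python
import random


def subsample_labeled(labeled_train, n_labeled, unlabeled, seed=0):
    """Keep `n_labeled` randomly chosen labeled training users and
    append all unlabeled users to form the final training set."""
    labeled_train = list(labeled_train)
    if n_labeled > len(labeled_train):
        raise ValueError("n_labeled exceeds the size of the labeled training set")
    chosen = random.Random(seed).sample(labeled_train, n_labeled)
    return chosen + list(unlabeled)


# Toy example: 10 labeled training users, 5 unlabeled users, budget of 4 labels.
final_train = subsample_labeled(range(10), 4, range(100, 105))
```

Repeating the call with increasing `n_labeled` while holding the unlabeled pool fixed gives one training set per label budget.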

Fig. 7
Performance of applying a different number of labeled users to a training set

5.5 Comparison with other learning methods (RQ4)

Several widely used machine learning (ML) models, including Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Gradient Boosting Classifier (GBC), are used as baselines to evaluate the efficacy of our T-SSMTL. Using the generated features (i.e., \(f_{{{\text{joint}}}}\)) as input, we build three classifiers, one per usage purpose, for each ML model. We also tuned the hyper-parameters of each baseline during training so that it achieved its best results. We then compared the baselines to our proposed method. Table 6 displays the comparison results. The results indicate that our proposed T-SSMTL outperforms the other learning techniques on all three inference tasks.

Table 6 Comparison between our semi-supervised multi-task approach and other widely used learning methods

5.6 Comparison with human-based inference (RQ5)

We compare the performance of human-based and computer-based usage purpose inference. First, we construct a reduced test dataset by randomly sampling 120 users together with their portrait images, profiles, posts, and usage purposes, which include 57 purposes for finding Mr. Right, 56 for dating, and 59 for chatting (a user may have more than one purpose). Then, we present the information of the 120 dating APP users on a website interface, where a human rater can choose between one and three potential usage purposes from the three options. Eleven volunteers are enlisted to infer the usage purposes of each of the 120 dating APP users based on their portraits, profiles, and posts, and the final results are determined by majority vote (at least 6 of the 11 raters). Additionally, we evaluate the computer-based approach (introduced in Sect. 4) using the same 120 users as the test set and the remaining dataset as the training set. The AUC of both methods is computed in order to compare their performance, as shown in Table 7.
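The aggregation of rater judgments can be sketched as a per-purpose majority vote. The purpose labels, the function name, and the toy ratings below are illustrative assumptions; only the threshold of 6 out of 11 raters comes from the text.

```python
PURPOSES = ("mr_right", "dating", "chatting")


def majority_vote(ratings, purposes=PURPOSES, threshold=6):
    """Aggregate multi-label ratings for one user.

    `ratings` is a list with one set per rater, each holding the one to three
    purposes that rater selected. A purpose is assigned to the user when at
    least `threshold` raters selected it.
    """
    return {p: sum(p in r for r in ratings) >= threshold for p in purposes}


# Toy example with 11 raters: everyone picks "dating", 5 also pick "chatting".
toy_ratings = [{"dating"}] * 6 + [{"dating", "chatting"}] * 5
result = majority_vote(toy_ratings)
```

Because each purpose is voted on independently, a user can end up with several purposes or with none.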

Table 7 The sub-samples of 120 profile images are classified and compared between human and computer

Computer-based classification performs significantly better than human-based classification on all three inference tasks. This result is not surprising, given that previous research has demonstrated that computers have several significant advantages over humans when solving classification problems. According to [19, 38], humans have difficulty storing and processing large quantities of high-dimensional data, whereas computers possess this capability. Moreover, humans are susceptible to biases and prejudices [22], whereas computers can utilize more data to optimize classification accuracy.

Nonetheless, the performance of human-based classification yields two intriguing findings. First, performance is uneven across the three usage purposes. The AUC of human-based classification reaches 0.55 when predicting the dating purpose, whereas the AUC for the chatting purpose is only 0.48. It appears that human raters differ in their ability to predict different usage purposes. Second, human performance when inferring the chatting purpose is worse than random guessing. Specifically, of the 79 users whom raters judged to have a chatting purpose, only 38 actually want to chat. In other words, the precision of the inferences made for the chatting purpose is only 0.48 (38/79). This suggests that humans are prone to error when predicting dating APP usage purposes.
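As a worked check of the precision figure above, and assuming the standard definition of precision (true positives divided by all positive predictions), the 79 chat predictions split into 38 correct and 79 − 38 = 41 incorrect ones:

$$\text{precision}_{{\text{chat}}} = \frac{TP}{TP + FP} = \frac{38}{38 + 41} = \frac{38}{79} \approx 0.48$$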

5.7 Explanation analysis (RQ6)

When verifying the effectiveness of leveraging heterogeneous information for usage purpose inference, we found that users' portrait images and posts contribute the most. To determine why these features are useful and to answer RQ6, we employ explanation analysis to examine the relationship between interpretable profile image features (i.e., CA-based features and Content-based features), high-frequency words in posts, and usage purposes.

5.7.1 Portrait images

We calculate the coefficient between each individual feature and the three usage purposes using the Wilcoxon rank-sum test. To account for multiple comparisons, the Bonferroni correction is applied to the resulting p-values. Because presenting all significant features is impractical, Fig. 8 shows 50 typical statistically significant correlations between portrait image features and each usage purpose. Several observations stand out:

  • The proportion of pink and red pixels is positively correlated with the intention of looking for Mr. Right, whereas it is negatively correlated with the intention of chatting. This suggests that dating APP users who are looking for Mr. Right are more likely to use portraits with warm colors, whereas users who are only interested in chatting are not.

  • Tags relating to happiness, self-assurance, and contentment are positively correlated with the usage purpose of looking for Mr. Right. In contrast, these tags have a negative correlation with the chatting purpose. This result suggests that users who are looking for Mr. Right are more likely to use emotive images than those who are simply interested in chatting.

  • The tags sexy, body, muscular, and naked have a positive correlation with the dating usage purpose. This suggests that users who are interested in dating are more likely to present an athletic physique in their portraits.
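The statistical procedure behind these observations can be sketched with the standard library alone. The code below is a minimal illustration, not the authors' analysis script: it uses the normal approximation to the two-sided Wilcoxon rank-sum test without a tie correction to the variance, and it is an assumption that the sign of the resulting z-statistic corresponds to the direction of the reported correlation. In practice a library routine such as `scipy.stats.ranksums` could be used instead.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    x: feature values for users with the purpose; y: values for users without.
    Returns (z, p); z > 0 means x tends to have larger values than y.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values and give them the average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


def bonferroni(p_values):
    """Bonferroni correction: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With one test per image feature and usage purpose, `bonferroni` would be applied to the full list of p-values before deciding which correlations to treat as significant.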

Fig. 8
The statistical correlation coefficients of each profile image feature for users looking for (a) Mr. Right, (b) a date, and (c) a chat partner. Correlations that remain significant after Bonferroni correction are indicated by opaque bars. Positive correlations are marked in orange, and negative correlations are marked in blue

5.7.2 Posts

To determine whether posting preferences vary with usage purpose, we calculate the frequency of words in posts published by users with distinct usage purposes. The t-test is then used to determine the significance of the differences, and the Bonferroni correction is applied. The findings are as follows:

  • Users of dating APPs who are interested in chatting are more likely to include the word "weather" in their posts, whereas users who are interested in dating are not. (p-value < \(10^{-3}\))

  • Users who want to date frequently use the word "disappointment" in their posts, whereas users who are searching for Mr. Right rarely do so. (p-value < \(10^{-5}\))

  • Users searching for Mr. Right are more likely to use the word "weekend," whereas other users are not. (p-value < \(10^{-3}\))

6 Discussion

In this section, we discuss a few limitations and highlight two intriguing phenomena.

6.1 Assessing and improving the generalizability

One concern is the representativeness of our collected dataset: are our sampled users representative of the millions of users of dating apps? To address this concern, we compared the age distribution of our sampled users with published statistics. According to [39], the majority of dating APP users are between the ages of 25 and 40, constituting 57.0% of all users; 28.4% are under the age of 25, and older users (those over the age of 40) account for 14.6%. Among our sampled users, 28.2% claim to be younger than 25 years old, 59.1% claim to be between 25 and 40 years old, and 12.7% claim to be older than 40 years old. A chi-square test indicates that the difference between the two age distributions is not statistically significant (\(\chi^{2}\) = 3.496, p = 0.174 > 0.05), suggesting that the sampled dating APP users are reasonably representative. In addition, the experimental results indicate that there is no statistically significant difference between self-reporters and non-reporters in terms of behavior. However, additional testing is still required to determine whether the T-SSMTL is effective for users who do not self-report their usage purposes.
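The reported p-value can be checked from the test statistic alone. With three age groups the goodness-of-fit test has 2 degrees of freedom, and for 2 degrees of freedom the chi-square survival function has the closed form exp(−x/2). The sketch below uses hypothetical function names; the observed counts behind the reported statistic are not given in the text, so only the p-value is reproduced.

```python
import math


def chi_square_gof(observed_counts, expected_props):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E with E = n * expected proportion."""
    n = sum(observed_counts)
    return sum(
        (o - n * p) ** 2 / (n * p) for o, p in zip(observed_counts, expected_props)
    )


def chi2_sf_df2(stat):
    """P(X > stat) for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# Reported statistic from the text; three age groups give 2 degrees of freedom.
p_value = chi2_sf_df2(3.496)  # about 0.174
```

The closed form holds only for 2 degrees of freedom; other cases would need a general chi-square distribution function.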

6.2 Human bias

The performance of inferring different usage purposes is unequal, and human-based classification is even worse than random guessing when inferring the usage purpose of chatting. These results indicate a human bias between users of a dating APP who upload personal information and users who view the descriptions of others. When a user uploads a portrait image to communicate their usage purposes, others may fail to perceive, or may even misinterpret, those intentions. An intriguing research direction is therefore to investigate the difference between the usage purposes communicated by users and those perceived by other users (perceived usage purposes).

7 Conclusion and prospect

In this paper, we propose a three-module framework with a semi-supervised and multi-task learning mechanism (T-SSMTL) for predicting the usage purposes of dating application users based on the heterogeneous information disclosed by users, including portrait images, profiles, and posts. First, a module for extracting heterogeneous features from user data is designed to capture their semantic representations. Second, a multi-task learning module that extracts task-specific knowledge is employed to improve the performance of multi-label inference. Third, a semi-supervised learning module that exploits unlabeled data is introduced for further performance enhancement. To validate our methodology, we apply our framework to a large dataset containing 34,364 active dating APP users and evaluate the effectiveness of each module within the framework. Through correlation analysis, we also uncover some intriguing insights into how users with different purposes select their profile images. Finally, we discuss the implications of our inference model and findings.

Dating software is, in fact, a type of social software. Where legally permitted, the T-SSMTL method is applicable to any heterogeneous data related to social software and can address the problem of user classification in order to better serve various types of users, for example by pushing relevant instant messages. We believe that the T-SSMTL method is applicable to a wide variety of fields. Future work has two components. The first is to expand the application fields to spatio-temporal big data with obvious multi-source and heterogeneous characteristics, such as remote sensing image data, weather forecast data, and socio-economic and humanistic data. The second is to further optimize and refine the T-SSMTL method.