
Table 21 List of selected features with their descriptions on the Stack Overflow dataset

From: Predicting closed questions on community question answering sites using convolutional neural network

Features Descriptions
Nouns The number of nouns in each question was counted
Verbs The number of verbs in each question was counted
Adjectives The number of adjectives in each question was counted
Entropy The average entropy of the words in the question text, which indicates the amount of information the text conveys. The entropy is calculated using Eq. (18):
\(H(X|Y)=\sum\limits_{i,j}p(x_i,y_j)\log\frac{p(y_j)}{p(x_i,y_j)}\quad \quad \quad (18)\)
where \(p(x_i,y_j)\) is the joint word occurrence probability in the question text and \(p(y_j)\) is the marginal probability of \(y_j\)
Difficult words [52] A word is considered difficult if it does not appear in a predefined list of 3000 familiar words
Lex Diversity The ratio of the number of unique words to the total number of words in the question text, computed using Eq. (19):
\(\text{Lex\_Div}=\frac{{\text{total\_unique\_words}}}{{\text{total\_words}}}\quad \quad \quad (19)\)
One-letter words The total number of one-letter words, such as “I” and “a”, in the question
Two-letter words The total number of two-letter words, such as “am”, “an”, “as”, “if” and “we”, in the question
Longer letter words The total number of words in the question text that have more than two letters
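Flesch reading ease [53] A widely used readability measure. For each question, FRE was calculated using Eq. (20):
\(\text{FRE}=206.835-(1.015\times\text{ASL})-(84.6\times\text{ASW})\quad \quad (20)\)
where ASL is the average sentence length (number of words divided by number of sentences) and ASW is the average number of syllables per word. The score generally lies between 0 and 100, where 0 indicates very difficult text and 100 very easy text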
Stop words The total number of stop words, such as “any”, “about”, “before”, “can”, “did”, “each”, “for”, “good” and “had”, in the question
Length The total number of words in the question, excluding stop words
Set length The number of unique words in the question was counted
Wrong words The number of words that do not appear in an English dictionary
Dale–Chall reading score [52] The score lies between 0 and 100, where a score of 0 indicates text that is easy to read
li_tags In HTML, the <li> tag marks a list item, which is typically rendered as a bullet point for easy reading and understanding; we counted all <li> tags in the question
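The Nouns, Verbs and Adjective rows count part-of-speech categories. The table does not name the tagger used, so the following is only a minimal sketch that assumes the question has already been tagged with Penn Treebank tags (the tagset returned by, for example, NLTK's `nltk.pos_tag`). The function name `pos_counts` is hypothetical.

```python
from collections import Counter

def pos_counts(tagged_tokens):
    """Count nouns, verbs and adjectives in a list of (word, tag) pairs.

    Assumption: tags follow the Penn Treebank tagset, where
    NN* = noun, VB* = verb and JJ* = adjective.
    """
    counts = Counter()
    for _word, tag in tagged_tokens:
        if tag.startswith("NN"):
            counts["nouns"] += 1
        elif tag.startswith("VB"):
            counts["verbs"] += 1
        elif tag.startswith("JJ"):
            counts["adjectives"] += 1
    # Always report all three keys, even when a count is zero
    return {k: counts[k] for k in ("nouns", "verbs", "adjectives")}

tagged = [("How", "WRB"), ("do", "VBP"), ("I", "PRP"), ("sort", "VB"),
          ("a", "DT"), ("large", "JJ"), ("list", "NN")]
print(pos_counts(tagged))  # {'nouns': 1, 'verbs': 2, 'adjectives': 1}
```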
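Eq. (18) does not state how the word variables \(X\) and \(Y\) are paired or which logarithm base is used. The sketch below is one possible reading, not the authors' method: it assumes each \((y_j, x_i)\) is a pair of adjacent words (previous word, next word), estimates probabilities from the pair counts, and uses base 2. The function name is hypothetical.

```python
import math
from collections import Counter

def conditional_entropy(tokens):
    """H(X|Y) = sum_{i,j} p(x_i, y_j) * log2(p(y_j) / p(x_i, y_j)).

    Assumption: each (y, x) pair is an adjacent bigram (previous word y,
    next word x); probabilities are relative frequencies over all bigrams.
    """
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    joint = Counter(pairs)                 # counts of (y, x)
    marginal_y = Counter(y for y, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (y, _x), c in joint.items():
        p_xy = c / n
        p_y = marginal_y[y] / n
        h += p_xy * math.log2(p_y / p_xy)
    return h

print(conditional_entropy("a b a c".split()))  # ≈ 0.667
```

A fully predictable sequence such as "a b a b" gives 0, because each previous word determines the next one.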
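The Difficult words and Wrong words rows both count words missing from a reference list: the 3000 familiar words for the former, an English dictionary for the latter. A minimal sketch of that shared step follows. The regex tokenizer, case folding, and the tiny `FAMILIAR_WORDS` set are hypothetical stand-ins for the real resources.

```python
import re

def count_out_of_list(text, word_list):
    """Count words in text that do not appear in word_list.

    Assumptions: words are runs of letters/apostrophes, matching is
    case-insensitive, and word_list is a set of lowercase words.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    return sum(1 for w in words if w not in word_list)

# Hypothetical stand-in for the 3000-word familiar list
FAMILIAR_WORDS = {"how", "do", "i", "sort", "a", "list"}

print(count_out_of_list("How do I sort a dictionary?", FAMILIAR_WORDS))  # 1
```

Passing a full English dictionary as `word_list` instead gives the Wrong words count.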
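Eq. (19) for lexical diversity translates directly into code. This sketch assumes a simple regex tokenizer and case folding, neither of which the table specifies. The function name is hypothetical.

```python
import re

def lexical_diversity(text):
    """Lex_Div = total_unique_words / total_words, per Eq. (19).

    Assumptions: words are runs of letters/apostrophes, compared
    case-insensitively; an empty question returns 0.0.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)

print(lexical_diversity("the cat and the hat"))  # 0.8
```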
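The one-letter, two-letter and longer-word rows partition the question's words by length. A minimal sketch, assuming words are runs of letters and apostrophes so that punctuation does not affect the length. The function name is hypothetical.

```python
import re

def word_length_counts(text):
    """Count one-letter, two-letter and longer (>2 letters) words.

    Assumption: punctuation is excluded by tokenizing on letters/apostrophes.
    """
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "one_letter": sum(1 for w in words if len(w) == 1),
        "two_letter": sum(1 for w in words if len(w) == 2),
        "longer": sum(1 for w in words if len(w) > 2),
    }

print(word_length_counts("I am a programmer, and I need help"))
# {'one_letter': 3, 'two_letter': 1, 'longer': 4}
```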
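Eq. (20) needs word, sentence and syllable counts. The table does not say how syllables were counted, so this sketch uses a rough heuristic (one syllable per run of consecutive vowels) and naive sentence splitting on `.`, `!` and `?`. Both are assumptions, and the function names are hypothetical.

```python
import re

def count_syllables(word):
    # Rough heuristic (assumption): one syllable per run of vowels, minimum 1
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """FRE = 206.835 - 1.015 * ASL - 84.6 * ASW, per Eq. (20)."""
    # Naive sentence split (assumption); drop empty fragments
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    asl = len(words) / len(sentences)                          # words per sentence
    asw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw

print(round(flesch_reading_ease("The cat sat."), 2))  # 119.19
```

Very short, monosyllabic text can score above 100, which is why the 0 to 100 range is only the usual one.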
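The Stop words, Length and Set length rows can be derived from one token list. In this sketch the stop-word set contains only the examples listed in the Stop words row; a real system would use a complete list. Tokenization and case folding are assumptions, and the function name is hypothetical.

```python
import re

# Hypothetical stop-word subset: only the examples given in Table 21
STOP_WORDS = {"any", "about", "before", "can", "did", "each", "for", "good", "had"}

def stop_length_features(text):
    """Return stop-word count, length without stop words, and unique-word count."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    stop = sum(1 for w in words if w in STOP_WORDS)
    return {
        "stop_words": stop,
        "length": len(words) - stop,    # all words except stop words
        "set_length": len(set(words)),  # number of unique words
    }

print(stop_length_features("Can anyone help? Any help before Friday is good"))
# {'stop_words': 4, 'length': 5, 'set_length': 8}
```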
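The li_tags feature counts list items in the question's HTML body. A minimal sketch using Python's standard-library `html.parser`, which lowercases tag names, so `<LI>` is counted as well. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class LiTagCounter(HTMLParser):
    """Counts opening <li> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag == "li":
            self.count += 1

def count_li_tags(html):
    parser = LiTagCounter()
    parser.feed(html)
    parser.close()
    return parser.count

print(count_li_tags("<ul><li>one</li><LI>two</li></ul><p>text</p>"))  # 2
```

A parser is used instead of a regex so that `<li>` inside attribute values or comments is not miscounted.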