Abstract
Up to this point, the book has laid the groundwork for building intercultural social simulations: it described the purposes of the book and surveyed existing approaches, introduced different scenarios of social interaction in social simulation, and discussed data that can be used in intercultural experiments. However, something is still missing, namely a robust framework that builds on these findings and implements flexible prototypes of social systems. Such a framework would, for example, compose social systems that realize the required simulation behavior and address the shortcomings of existing approaches. This chapter describes SocioFramework, the framework for statistical processing and prototyping. Moreover, it presents additional findings focusing on intercultural processing.
Keywords
- Support Vector Machine Classifier
- Configuration File
- Affective Behavior
- Statistical Dataset
- Forward Feature Selection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Diaz-Agudo, B., Recio-García, J. A., & Gonzalez-Calero, P. A. (2007). Natural language queries in CBR systems. In 19th IEEE international conference on tools with artificial intelligence, ICTAI 2007 (Vol. 2, pp. 468–472). doi:10.1109/ICTAI.2007.27.
Elidan, G., Ninio, M., Friedman, N., & Schuurmans, D. (2002). Data perturbation for escaping local maxima in learning. In Eighteenth national conference on artificial intelligence (pp. 132–139). Menlo Park: American Association for Artificial Intelligence. ISBN:978-0-262-51129-0. http://dl.acm.org/citation.cfm?id=777092.777116.
Elliott, J., Eckstein, R., Loy, M., Wood, D., & Cole, B. (2002). Java swing, second edition (2nd ed.). New York: O’Reilly Media. ISBN:978-0-596-00408-8. http://amazon.com/o/ASIN/0596004087/.
François, J. M. (2012). JAHMM. An implementation of hidden Markov models in Java. https://code.google.com/p/jahmm/.
Friedl, J. E. F. (2006). Mastering regular expressions (3rd ed.). New York: O’Reilly Media. ISBN:978-0-596-52812-6. http://amazon.com/o/ASIN/0596528124/.
Hall, M. A. (1999). Correlation-based feature selection for machine learning. PhD thesis, Department of Computer Science, The University of Waikato.
Joachims, T. (1999). Making large-scale support vector machine learning practical. In B. Schölkopf, C. J. C. Burges & A. J. Smola (Eds.), Advances in kernel methods (pp. 169–184). Cambridge: MIT Press. ISBN:0-262-19416-3. http://dl.acm.org/citation.cfm?id=299094.299104.
Juang, B. H., & Rabiner, L. R. (1985). A probabilistic distance measure for hidden Markov models. AT&T Technical Journal, 64(2), 391–408. http://citeseer.ist.psu.edu/context/244209/0.
Mitchell, T. M. (1997). Machine learning (1st ed.). New York: McGraw-Hill Science/Engineering/Math. ISBN:978-0-070-42807-2. http://amazon.com/o/ASIN/0070428077/.
Osherenko, A. (2011). Opinion mining and lexical affect sensing: computer-aided analysis of opinions and emotions in texts. Berlin: Südwestdeutscher Verlag für Hochschulschriften. ISBN:978-3-838-12488-9. http://amazon.de/o/ASIN/383812488X/.
Petri, J. (2010). Netbeans platform 6.9 developer’s guide. New York: Packt. ISBN:978-1-849-51176-6. http://amazon.com/o/ASIN/1849511764/.
Ray, E. T. (2003). Learning XML (2nd ed.). New York: O’Reilly Media. ISBN:978-0-596-00420-0. http://amazon.com/o/ASIN/0596004206/.
Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In Proceedings of the international conference on new methods in language processing, Manchester, UK.
Witten, I., & Frank, E. (2005). Data mining: practical machine learning tools and techniques (2nd ed.). San Francisco: Morgan Kaufmann.
Appendices
Appendix A: The HMM Classifier
This appendix shows an implementation of an HMM Classifier that relies on JAHMM and is compatible with WEKA:
import java.util.List;

import be.ac.ulg.montefiore.run.jahmm.Hmm;
import be.ac.ulg.montefiore.run.jahmm.ObservationInteger;
import be.ac.ulg.montefiore.run.jahmm.OpdfIntegerFactory;
import be.ac.ulg.montefiore.run.jahmm.learn.BaumWelchLearner;
import be.ac.ulg.montefiore.run.jahmm.learn.BaumWelchScaledLearner;
import be.ac.ulg.montefiore.run.jahmm.learn.KMeansLearner;

import weka.classifiers.AbstractClassifier;
import weka.core.Instance;
import weka.core.Instances;

public class JAHMM extends AbstractClassifier {

    // number of observations; in the case of the emotional E/A space: 5
    protected int m_NumClasses;
    // number of states; in the case of the emotional E/A space: 5
    protected int m_States;
    // index of the sequence attribute
    protected int m_SeqAttr = -1;
    // 0 -- k-means, 1 -- Baum-Welch, 2 -- scaled Baum-Welch
    protected int m_LearningMethod = 0;
    // the HMM learnt by buildClassifier
    protected Hmm<ObservationInteger> learntHmm_ = null;

    // Builds the classifier; m_SeqAttr specifies the sequential attribute.
    @Override
    public void buildClassifier(Instances data) throws Exception {
        ...
        // build the initial HMM
        OpdfIntegerFactory factory = new OpdfIntegerFactory(m_NumClasses);
        Hmm<ObservationInteger> hmm =
                new Hmm<ObservationInteger>(m_States, factory);
        hmm.getOpdf(0).fit(new ObservationInteger(4));
        List<List<ObservationInteger>> sequences = extractSequences(data);
        switch (m_LearningMethod) {
            case 0:
                // k-means learning
                KMeansLearner<ObservationInteger> kml =
                        new KMeansLearner<ObservationInteger>(
                                m_NumClasses, factory, sequences);
                learntHmm_ = kml.learn();
                break;
            case 1:
                // Baum-Welch learning
                BaumWelchLearner bwl = new BaumWelchLearner();
                learntHmm_ = bwl.learn(hmm, sequences);
                break;
            case 2:
                // scaled Baum-Welch learning
                BaumWelchScaledLearner bwsl = new BaumWelchScaledLearner();
                learntHmm_ = bwsl.learn(hmm, sequences);
                break;
        }
    }

    // Classifies the given test instance: the predicted class is the last
    // state of the most likely state sequence.
    @Override
    public double classifyInstance(Instance instance) throws Exception {
        Instances seq = instance.relationalValue(m_SeqAttr);
        // extract an observation sequence from the given instance
        List<ObservationInteger> sequence =
                extractSequenceFromInstance(seq.instance(0));
        int[] states = learntHmm_.mostLikelyStateSequence(sequence);
        return states[states.length - 1];
    }
} // end class JAHMM
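The call to mostLikelyStateSequence in classifyInstance delegates Viterbi decoding to JAHMM. To illustrate what that decoding computes, the following self-contained sketch implements the standard Viterbi algorithm over toy parameters; the method name mirrors JAHMM's, but the parameters (pi, a, b) and the class are hypothetical and not part of SocioFramework:

```java
public class ViterbiSketch {
    // Returns the most likely hidden-state sequence for the observations.
    // pi: initial state probabilities; a[i][j]: transition i -> j;
    // b[i][k]: probability that state i emits symbol k.
    public static int[] mostLikelyStateSequence(double[] pi, double[][] a,
                                                double[][] b, int[] obs) {
        int n = pi.length, T = obs.length;
        double[][] delta = new double[T][n]; // best path probability
        int[][] psi = new int[T][n];         // backpointers
        for (int i = 0; i < n; i++)
            delta[0][i] = pi[i] * b[i][obs[0]];
        for (int t = 1; t < T; t++)
            for (int j = 0; j < n; j++) {
                int best = 0;
                for (int i = 1; i < n; i++)
                    if (delta[t - 1][i] * a[i][j] > delta[t - 1][best] * a[best][j])
                        best = i;
                delta[t][j] = delta[t - 1][best] * a[best][j] * b[j][obs[t]];
                psi[t][j] = best;
            }
        int[] states = new int[T];
        int last = 0;
        for (int i = 1; i < n; i++)
            if (delta[T - 1][i] > delta[T - 1][last]) last = i;
        states[T - 1] = last;
        for (int t = T - 1; t > 0; t--)
            states[t - 1] = psi[t][states[t]];
        return states;
    }

    public static void main(String[] args) {
        // Two states that mostly emit "their own" symbol
        int[] states = mostLikelyStateSequence(
                new double[]{0.5, 0.5},
                new double[][]{{0.9, 0.1}, {0.1, 0.9}},
                new double[][]{{0.9, 0.1}, {0.1, 0.9}},
                new int[]{0, 0, 1, 1, 1});
        System.out.println(java.util.Arrays.toString(states)); // [0, 0, 1, 1, 1]
    }
}
```

As in classifyInstance above, the predicted class would be the last entry of the returned state sequence.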
Appendix B: The ARFF Wrapper
This appendix presents the Java source of a wrapper that maintains a WEKA-compatible classifier and ARFF data in the prototypes of SS systems. Note that the wrapper uses the name of the dataset file to decide how statistical features are extracted and evaluated, in particular, whether POS tagging and lemmatization by TreeTagger (Schmid 1994) are required. For example, the wrapper assumes that POS tagging is necessary for evaluating features in a dataset whose name contains the string _grammar_:
package coreEmotionalEngine.emotext;
//imports emotext
//import TreeTagger
//imports java
//imports WEKA
public class ARFFWrapper {

    // a base classifier used for analyzing texts
    private Classifier c_ = null;
    // name of the dataset file
    private String datasetName_ = null;
    // WEKA instances used for training and testing
    protected Instances instances_ = null;
    // interface to evaluate statistical features
    private IFeatureEvaluation fe_ = null;
    // POS tagger and lemmatizer (Schmid 1994)
    private TreeTagger treetagger_ = null;

    public Classifier buildClassifier(Classifier clsr,
                                      String datasetName) throws Exception {
        ...
        return classifier;
    }

    public ARFFWrapper(Classifier c, TreeTagger tagger,
                       String datasetName) {
        treetagger_ = tagger;
        try {
            if (datasetName.contains("fusion") &&
                    datasetName.endsWith(".spec")) {
                // build a fusion classifier
                ...
            } else {
                c_ = buildClassifier(c, datasetName);
                datasetName_ = datasetName;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public double classifyInstance(Instance instance) throws Exception {
        return c_.classifyInstance(instance);
    }

    public Instance buildInstance(String text) {
        Instance i = null;
        if (datasetName_.contains("_fusion")) {
            // a spec is found that identifies the part datasets
            // used for fusion
            i = buildFusedInstance(text);
        } else if (datasetName_.contains("_lexical_")) {
            // build a lexical instance of the processed data
            i = buildLexicalInstance(text);
        } else if (datasetName_.contains("_stylometry_")) {
            // build a stylometric instance of the processed data
            i = buildStylometricInstance(text);
        } else if (datasetName_.contains("_grammar_")) {
            // build a grammatical instance of the processed data
            i = buildGrammarInstance(text);
        } else if (datasetName_.contains("_deixis_")) {
            // build a deictic instance of the processed data
            i = buildDeixisInstance(text);
        } else {
            assert (false);
        }
        return i;
    }

    private Instance buildFusedInstance(String text) {
        ...
        return instance;
    }

    private Instance buildDeixisInstance(String text) {
        ...
        return instance;
    }

    private Instance buildGrammarInstance(String text) {
        ...
        return instance;
    }

    private Instance buildLexicalInstance(String text) {
        ...
        return instance;
    }

    private Instance buildStylometricInstance(String text) {
        ...
        return instance;
    }
}
To build statistical instances, the ARFFWrapper class references the variable fe_, which refers to the IFeatureEvaluation interface that maintains feature evaluation:
package coreEmotionalEngine.emotext;
public interface IFeatureEvaluation {
public abstract double value(double original);
}
Three implementations of this interface are available: presence evaluation, which scores a feature as 1 or 0 according to its presence in the analyzed text (PresenceFeatureEvaluation); inverse evaluation, which scores a feature as the reciprocal of its frequency value (InverseFeatureEvaluation); and frequency evaluation, which scores a feature as its frequency value (FrequencyFeatureEvaluation). See Osherenko (2011, p. 80) for details of feature evaluation.
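The three evaluations described above can be sketched as follows. The class names come from the text; the bodies are minimal illustrations of the stated semantics, not the SocioFramework sources:

```java
// The interface as given in the text; 'original' is the raw feature value
// (its frequency in the analyzed text).
interface IFeatureEvaluation {
    double value(double original);
}

// presence evaluation: 1 if the feature occurs in the text, 0 otherwise
class PresenceFeatureEvaluation implements IFeatureEvaluation {
    public double value(double original) {
        return original > 0 ? 1.0 : 0.0;
    }
}

// inverse evaluation: the reciprocal of the frequency value
class InverseFeatureEvaluation implements IFeatureEvaluation {
    public double value(double original) {
        return original > 0 ? 1.0 / original : 0.0;
    }
}

// frequency evaluation: the frequency value itself
class FrequencyFeatureEvaluation implements IFeatureEvaluation {
    public double value(double original) {
        return original;
    }
}

public class FeatureEvaluationDemo {
    public static void main(String[] args) {
        double freq = 4.0; // a feature occurring four times
        System.out.println(new PresenceFeatureEvaluation().value(freq));  // 1.0
        System.out.println(new InverseFeatureEvaluation().value(freq));   // 0.25
        System.out.println(new FrequencyFeatureEvaluation().value(freq)); // 4.0
    }
}
```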
Appendix C: Storing Configuration
This appendix shows a configuration file used to store the parameters of SocioFramework (&#9; is interpreted by the XML engine in Java as a tab character; &#10; as a line break; &quot; as a quotation mark):
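To illustrate how such character references are decoded, the following self-contained sketch parses a minimal XML fragment with Java's standard DOM parser; the element and attribute names (config, param, separator) are hypothetical and do not reproduce the actual SocioFramework configuration file:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

public class EntityDemo {
    public static void main(String[] args) throws Exception {
        // &#9; encodes a tab character inside the attribute value
        String xml =
            "<config><param name=\"separator\" value=\"&#9;\"/></config>";
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(
            xml.getBytes(StandardCharsets.UTF_8)));
        String value = doc.getElementsByTagName("param").item(0)
            .getAttributes().getNamedItem("value").getNodeValue();
        // the XML engine has decoded the character reference to "\t"
        System.out.println(value.equals("\t")); // true
    }
}
```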
Copyright information
© 2014 Springer-Verlag London
About this chapter
Cite this chapter
Osherenko, A. (2014). Framework for Data Processing. In: Social Interaction, Globalization and Computer-Aided Analysis. Human–Computer Interaction Series. Springer, London. https://doi.org/10.1007/978-1-4471-6260-5_5
DOI: https://doi.org/10.1007/978-1-4471-6260-5_5
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6259-9
Online ISBN: 978-1-4471-6260-5
eBook Packages: Computer Science, Computer Science (R0)