Mining Good Sliding Window for Positive Pathogens Prediction in Pathogenic Spectrum Analysis

  • Lei Duan
  • Changjie Tang
  • Chi Gou
  • Min Jiang
  • Jie Zuo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7121)


Positive pathogens prediction is the basis of pathogenic spectrum analysis, which is meaningful work in public health. Gene Expression Programming (GEP) can develop a model without predetermined assumptions, so applying GEP to positive pathogens prediction is desirable. However, the traditional time-adjacent sliding window may not be suitable for evolving an accurate prediction model with GEP. The main contributions of this work include: (1) applying a GEP-based prediction method to the prediction of pathogens related to diarrhea syndrome, (2) analyzing the disadvantages of the traditional time-adjacent sliding window in GEP prediction, (3) proposing a heuristic method to mine a good sliding window for generating the training set used for GEP evolution, (4) proving that the problem of training set selection is NP-hard, and (5) giving an experimental study on both real-world and simulated data to demonstrate the effectiveness of the proposed method, and discussing some future studies.
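For readers unfamiliar with the term, a traditional time-adjacent sliding window can be sketched as follows. This is only an illustration of the baseline idea, not the authors' implementation or their heuristic for mining a good window: the function name `time_adjacent_windows`, the parameter `window_size`, and the sample counts are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def time_adjacent_windows(
    series: Sequence[float], window_size: int
) -> List[Tuple[List[float], float]]:
    """Build (inputs, target) training pairs from a time series.

    Each target series[t] is paired with the window_size values that
    immediately precede it in time (the "time-adjacent" window).
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    samples = []
    for t in range(window_size, len(series)):
        inputs = list(series[t - window_size:t])  # the preceding values
        samples.append((inputs, series[t]))       # value to be predicted
    return samples


# Hypothetical counts of positive detections per period (made-up numbers).
positive_counts = [3, 5, 4, 6, 8, 7]
for inputs, target in time_adjacent_windows(positive_counts, window_size=3):
    print(inputs, "->", target)
```

Under this scheme every training sample uses only the most recent consecutive observations, which is the restriction the abstract identifies as potentially unsuitable for GEP.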


Data Mining, Time Series, Sliding Window, Pathogens Prediction



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Lei Duan (1)
  • Changjie Tang (1)
  • Chi Gou (1)
  • Min Jiang (2)
  • Jie Zuo (1)
  1. School of Computer Science, Sichuan University, Chengdu, China
  2. West China School of Public Health, Sichuan University, Chengdu, China
