Journal of the Indian Institute of Science, Volume 99, Issue 2, pp 215–224

Sequence Segmentation Using Semi-Markov Conditional Random Fields

  • Sunita Sarawagi
Review Article

Abstract

Many applications in natural language, speech processing and data integration require model-based segmentation of sequences. Semi-Markov conditional random fields (semi-CRFs) are a generalization of CRFs and provide a full conditional distribution over all possible segmentations of a sequence. Semi-CRFs are particularly suitable for tasks that entail segment-level features, such as a match with an existing dictionary of segments. Empirical results on real-life named entity recognition (NER) tasks show that they yield higher accuracy than CRFs, but the straightforward forward–backward inference algorithm requires 3–10 times the computation cost of CRFs. This running time can be reduced significantly by exploiting overlapping features across segments. We present a succinct representation of overlapping features and an efficient training algorithm that can sum over all possible input segmentations in time that is sub-quadratic in the input length, even while imposing no bound on the maximum segment length. Consequently, the running time becomes comparable to that of CRFs even with the addition of useful entity-level features on large input segments.
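
To make the "sum over all possible segmentations" concrete, the following is a minimal Python sketch of the straightforward forward recursion with a bounded segment length. It is not the sub-quadratic algorithm the abstract refers to. The function name `semi_markov_log_partition`, the `score(start, end, label, prev_label)` callback and the `max_len` bound are all assumptions introduced here for illustration; they do not come from the article.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def semi_markov_log_partition(n, labels, max_len, score):
    """Log of the summed exponentiated scores of all labelled segmentations.

    n        : number of tokens in the input sequence
    labels   : list of segment labels
    max_len  : assumed upper bound on segment length (illustrative only)
    score    : hypothetical callback score(start, end, label, prev_label)
               returning the log-potential of segment [start, end) with the
               given label; prev_label is None for the first segment
    """
    # alpha[i][y]: log-sum over segmentations of the first i tokens
    # whose last segment carries label y.
    alpha = [{y: -math.inf for y in labels} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for y in labels:
            terms = []
            for d in range(1, min(max_len, i) + 1):
                start = i - d
                if start == 0:
                    # Segment covers the whole prefix: no previous label.
                    terms.append(score(0, i, y, None))
                else:
                    for y_prev in labels:
                        terms.append(
                            alpha[start][y_prev] + score(start, i, y, y_prev)
                        )
            alpha[i][y] = logsumexp(terms)
    return logsumexp([alpha[n][y] for y in labels])


if __name__ == "__main__":
    # With all scores zero, exp(log Z) simply counts labelled segmentations.
    def flat(start, end, label, prev_label):
        return 0.0

    log_z = semi_markov_log_partition(3, ["A", "B"], 3, flat)
    print(round(math.exp(log_z)))
```

With `max_len` set to 1 every segment is a single token, so the recursion reduces to the ordinary token-level forward pass of a linear-chain model. The extra loop over segment lengths `d` is where the additional cost relative to CRFs arises in this naive form.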

Copyright information

© Indian Institute of Science 2019

Authors and Affiliations

  1. IIT Bombay, Mumbai, India