Abstract
V(D)J gene segments undergo combinatorial recombination in T cells and B cells, providing humans and other vertebrates with the large repertoire of antibodies required for immunity. Each recombined sequence then accumulates mutations in its DNA so that the resulting antibodies can recognize diverse antigens. Predicting which combination of gene segments formed a particular antibody is an essential task for studying and analyzing disease propagation. We propose a model based on conditional random fields (CRFs) for predicting the boundary positions between the V, D and J gene segments. We train the CRFs on synthetic gene recombinations generated from all alleles of the V, D and J gene segments. The alleles corresponding to a read can then be determined by mapping the segmented read to the DNA sequences of the gene segments using software such as BLAST and USEARCH. We test our method on a simulated dataset as well as on real data from the Stanford_S22 individual.
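As a rough illustration of the training-data idea described in the abstract, the following Python sketch builds one synthetic recombination and labels each base with the segment it came from, so that the label changes mark the boundaries a sequence labeler such as a CRF would be trained to predict. It is a minimal sketch under stated assumptions, not the authors' pipeline: the allele sequences are invented toy strings, the function names (`synth_read`, `mutate`, `random_insert`, `boundaries`) are hypothetical, and the choices of random inserted nucleotides labeled `N`, substitution-only mutations, and the default rates are assumptions made for the example. Exonuclease trimming of segment ends is omitted.

```python
import random

# Toy allele sequences, invented for illustration only (not real germline genes).
V_ALLELES = {"V1*01": "CAGGTGCAGCTGGTG", "V2*01": "GAGGTGCAGCTGTTG"}
D_ALLELES = {"D1*01": "GGTATAACTGG", "D2*01": "AGGATATTGTA"}
J_ALLELES = {"J1*01": "ACTACTTTGACTACTGG", "J2*01": "ACAACTGGTTCGACCCC"}


def random_insert(rng, max_len):
    """Random nucleotides inserted at a junction (assumed model)."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def mutate(seq, rate, rng):
    """Replace each base by a different base with probability `rate`.

    Substitutions only, so the sequence length (and label alignment) is kept.
    """
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def synth_read(rng, mutation_rate=0.05, max_insert=4):
    """Return (sequence, per-base labels, allele names) for one synthetic read."""
    names, segments = [], []
    for label, alleles in (("V", V_ALLELES), ("D", D_ALLELES), ("J", J_ALLELES)):
        name = rng.choice(sorted(alleles))
        names.append(name)
        segments.append((label, alleles[name]))
    # Interleave the chosen segments with random junction insertions.
    pieces = [
        segments[0],
        ("N", random_insert(rng, max_insert)),
        segments[1],
        ("N", random_insert(rng, max_insert)),
        segments[2],
    ]
    seq = "".join(s for _, s in pieces)
    labels = "".join(lab * len(s) for lab, s in pieces)
    return mutate(seq, mutation_rate, rng), labels, names


def boundaries(labels):
    """Positions where the label changes, i.e. the segment boundaries."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


if __name__ == "__main__":
    rng = random.Random(0)
    seq, labels, names = synth_read(rng)
    print(names)
    print(seq)
    print(labels)
    print(boundaries(labels))
```

Pairs of `seq` and `labels` produced this way could serve as training examples for a sequence labeler; at prediction time, the segments cut at the predicted boundaries would be mapped back to the allele database with an alignment tool, as the abstract describes.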
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Malhotra, R., Prabhakara, S., Acharya, R. (2012). Predicting V(D)J Recombination Using Conditional Random Fields. In: Shibuya, T., Kashima, H., Sese, J., Ahmad, S. (eds) Pattern Recognition in Bioinformatics. PRIB 2012. Lecture Notes in Computer Science, vol 7632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34123-6_19
DOI: https://doi.org/10.1007/978-3-642-34123-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34122-9
Online ISBN: 978-3-642-34123-6
eBook Packages: Computer Science, Computer Science (R0)