pDHS-ELM: computational predictor for plant DNase I hypersensitive sites based on extreme learning machines

Zhang, Shanxin; Chang, Minjun; Zhou, Zhiping; Dai, Xiaofeng; Xu, Zhenghong

doi:10.1007/s00438-018-1436-3

pDHS-ELM: computational predictor for plant DNase I hypersensitive sites based on extreme learning machines

Methods Paper
Published: 29 March 2018

Volume 293, pages 1035–1049, (2018)
Cite this article

Molecular Genetics and Genomics Aims and scope Submit manuscript

Shanxin Zhang^1,2,
Minjun Chang³,
Zhiping Zhou¹,
Xiaofeng Dai^4,5 &
…
Zhenghong Xu^2,4,5

563 Accesses
15 Citations
1 Altmetric
Explore all metrics

Abstract

DNase I hypersensitive sites (DHSs) are hallmarks of chromatin zones containing transcriptional regulatory elements, making them critical in understanding the regulatory mechanisms of gene expression. Although large amounts of DHSs in the plant genome have been identified by high-throughput techniques, current DHSs obtained from experimental methods cover only a fraction of plant species and cell processes. Furthermore, these experimental methods are both time-consuming and expensive. Hence, it is urgent to develop automated computational means to efficiently and accurately predict DHSs in the plant genome. Recently, several methods have been proposed to predict the DHSs. However, all these methods took a lot of time to build the model, making them inappropriate for data with massive volume. In the present work, a new ensemble extreme learning machine (ELM)-based model called pDHS-ELM was proposed to predict the DHSs in the plant genome by fusing two different modes of pseudo-nucleotide composition. Here, two kinds of features including reverse complement kmer and pseudo-nucleotide composition were used to represent the DHSs. The ELM model was used to build the base classifiers. Then, an ensemble framework was employed to combine the outputs of these base classifiers. When applied to DHSs in Arabidopsis thaliana and rice (Oryza sativa) genome, the proposed method could obtain accuracies up to 88.48 and 87.58%, respectively. Compared with the state-of-the-art techniques, pDHS-ELM achieved higher sensitivity, specificity, and Matthew’s correlation coefficient with much less training and test time. By employing pDHS-ELM, we identified 42,370 and 103,979 DHSs in A. thaliana and rice genome, respectively. The predicted DHSs were depleted of bulk nucleosomes and were tightly associated with transcription factors. Approximately 90% of the predicted DHSs were overlapped with transcription factors. Meanwhile, we demonstrated that the predicted DHSs were also associated with DNA methylation, nucleosome positioning/occupancy, and histone modification. This result suggests that pDHS-ELM can be considered as a new promising and powerful tool for transcriptional regulatory elements analysis. Our pDHS-ELM tool is available from the following website https://github.com/shanxinzhang/pDHS-ELM/.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Use Chou’s 5-steps rule to identify DNase I hypersensitive sites via dinucleotide property matrix and extreme gradient boosting

Article 19 July 2020

Shengli Zhang & Tian Xue

iNuc-ext-PseTNC: an efficient ensemble model for identification of nucleosome positioning by extending the concept of Chou’s PseAAC to pseudo-tri-nucleotide composition

Article 05 October 2018

Muhammad Tahir, Maqsood Hayat & Sher Afzal Khan

i6mA-VC: A Multi-Classifier Voting Method for the Computational Identification of DNA N6-methyladenine Sites

Article 08 April 2021

Tian Xue, Shengli Zhang & Huijuan Qiao

References

Cao J, Lin Z et al (2012) Voting based extreme learning machine. Inf Sci 185(1):66–77
Article Google Scholar
Celniker SE, Dillon LAL et al (2009) Unlocking the secrets of the genome. Nature 459(7249):927–930
Article PubMed PubMed Central CAS Google Scholar
Chen W, Zhang X et al (2015) PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions. Bioinformatics 31(1):119–120
Article PubMed CAS Google Scholar
Chen W, Tang H et al (2016) iRNA-PseU: identifying RNA pseudouridine sites. Mol Ther 5(7):e332. https://doi.org/10.1038/mtna.2016.37
Article CAS Google Scholar
Cheng X, Zhao S-G et al (2017) iATC-mISF: a multi-label classifier for predicting the classes of anatomical therapeutic chemicals. Bioinformatics 33(3):341–346
PubMed Google Scholar
Chou K-C (2011) Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol 273(1):236–247
Article PubMed CAS Google Scholar
Fan YX, Shen HB (2014) Predicting pupylation sites in prokaryotic proteins using pseudo-amino acid composition and extreme learning machine. Neurocomputing 128(5):267–272
Article Google Scholar
Feng P, Jiang N et al (2014) Prediction of DNase I hypersensitive sites by using pseudo nucleotide compositions. Sci World J. https://doi.org/10.1155/2014/740506
Article Google Scholar
Freeling M, Subramaniam S (2009) Conserved noncoding sequences (CNSs) in higher plants. Curr Opin Plant Biol 12(2):126–132
Article PubMed CAS Google Scholar
Henikoff S, Henikoff JG et al (2009) Genome-wide profiling of salt fractions maps physical properties of chromatin. Genome Res 19(3):460–469
Article PubMed PubMed Central CAS Google Scholar
Huang GB, Zhu QY et al (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501
Article Google Scholar
Huang GB, Wang DH et al (2011) Extreme learning machines: a survey. Int J Mach Learn Cybern 2(2):107–122
Article Google Scholar
Huang GB, Zhou H et al (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B 42(2):513–529
Article Google Scholar
Jia J, Liu Z et al (2016) pSuc-Lys: predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. J Theor Biol 394:223–230
Article PubMed CAS Google Scholar
Jiang J (2015) The ‘dark matter’ in the plant genomes: non-coding and unannotated DNA sequences associated with open chromatin. Curr Opin Plant Biol 24:17–23
Article PubMed CAS Google Scholar
Jin C, Zang C et al (2009) H3.3/H2A.Z double variant-containing nucleosomes mark ‘nucleosome-free regions’ of active promoters and other regulatory regions. Nat Genet 41(8):941–945
Article PubMed PubMed Central CAS Google Scholar
Jin W, Tang Q et al (2015) Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples. Nature 528(7580):142–146
PubMed PubMed Central CAS Google Scholar
Kabir M, Yu D-J (2017) Predicting DNase I hypersensitive sites via un-biased pseudo trinucleotide composition. Chemom Intell Lab Syst 167:78–84
Article CAS Google Scholar
Lan Y, Soh YC et al (2009) Ensemble of online sequential extreme learning machine. Neurocomputing 72(13):3391–3395
Article Google Scholar
Liu N, Wang H (2010) Ensemble based extreme learning machine. IEEE Signal Process Lett 17(8):754–757
Article Google Scholar
Liu B, Liu F et al (2015a) Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Res 43(W1):W65–W71
Article PubMed PubMed Central CAS Google Scholar
Liu G, Xing Y et al (2015b) Using weighted features to predict recombination hotspots in Saccharomyces cerevisiae. J Theor Biol 382:15–22
Article PubMed Google Scholar
Liu B, Long R et al (2016) iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework. Bioinformatics 32(16):2411–2418
Article PubMed Google Scholar
Liu Y, Tian T et al (2017) PCSD: a plant chromatin state database. Nucleic Acids Res 46(D1):D1157–D1167
Article PubMed Central Google Scholar
Noble WS, Kuehn S et al (2005) Predicting the in vivo signature of human gene regulatory sequences. Bioinformatics 21(Suppl 1):i338–i343
Article PubMed CAS Google Scholar
Pajoro A, Madrigal P et al (2014) Dynamics of chromatin accessibility and gene regulation by MADS-domain transcription factors in flower development. Genome Biol 15(3):R41. https://doi.org/10.1186/gb-2014-15-3-r41
Article PubMed PubMed Central CAS Google Scholar
Qiu W-R, Sun B-Q et al (2016) iPTM-mLys: identifying multiple lysine PTM sites and their different types. Bioinformatics 32(20):3116–3123
Article PubMed CAS Google Scholar
Savojardo C, Fariselli P et al (2011) Improving the detection of transmembrane β-barrel chains with N-to-1 extreme learning machines. Bioinformatics 27(22):3123–3128
Article PubMed CAS Google Scholar
Sullivan AM, Arsovski AA et al (2014) Mapping and dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell Rep 8(6):2015–2030
Article PubMed CAS Google Scholar
Turco G, Schnable JC et al (2013) Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses. Front Plant Sci 4:170. https://doi.org/10.3389/fpls.2013.00170
Article PubMed PubMed Central CAS Google Scholar
Wang DD, Wang R et al (2014) Fast prediction of protein–protein interaction sites based on extreme learning machines. Neurocomputing 128(128):258–266
Article Google Scholar
Xing P, Su R et al (2017) Identifying N(6)-methyladenosine sites using multi-interval nucleotide pair position specificity and support vector machine. Sci Rep 7:46757. https://doi.org/10.1038/srep46757
Article PubMed PubMed Central Google Scholar
You Z-H, Lei Y-K et al (2013) Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis. BMC Bioinform 14(8):S10. https://doi.org/10.1186/1471-2105-14-s8-s10
Article Google Scholar
Zhang W, Wu Y et al (2012a) High-resolution mapping of open chromatin in the rice genome. Genome Res 22(1):151–162
Article PubMed PubMed Central CAS Google Scholar
Zhang W, Zhang T et al (2012b) Genome-wide identification of regulatory DNA elements and protein-binding footprints using signatures of open chromatin in Arabidopsis. Plant Cell 24(7):2719–2731
Article PubMed PubMed Central CAS Google Scholar
Zhang T, Marand A et al (2015) PlantDHS: a database for DNase I hypersensitive sites in plants. Nucleic Acids Res 44:D1148–D1153
Article PubMed PubMed Central CAS Google Scholar
Zhang S, Zhou Z et al (2017) pDHS-SVM: a prediction method for plant DNase I hypersensitive sites based on support vector machine. J Theor Biol 426:126–133
Article PubMed CAS Google Scholar

Download references

Funding

This work was supported by the Fundamental Research Funds for the Central Universities (No. JUSRP115A27).

Author information

Authors and Affiliations

Engineering Research Center of Internet of Things Technology Applications (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, Wuxi, 214122, Jiangsu, China
Shanxin Zhang & Zhiping Zhou
School of Medicine and Pharmaceuticals, Jiangnan University, Wuxi, 214122, Jiangsu, China
Shanxin Zhang & Zhenghong Xu
College of Letters and Science, University of California, Berkeley, Berkeley, CA, 94720, USA
Minjun Chang
School of Biotechnology, Jiangnan University, Wuxi, 214122, Jiangsu, China
Xiaofeng Dai & Zhenghong Xu
National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, Wuxi, 214122, Jiangsu, China
Xiaofeng Dai & Zhenghong Xu

Authors

Shanxin Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Minjun Chang
View author publications
You can also search for this author in PubMed Google Scholar
Zhiping Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Xiaofeng Dai
View author publications
You can also search for this author in PubMed Google Scholar
Zhenghong Xu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shanxin Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by S. Hohmann.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 5499 KB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, S., Chang, M., Zhou, Z. et al. pDHS-ELM: computational predictor for plant DNase I hypersensitive sites based on extreme learning machines. Mol Genet Genomics 293, 1035–1049 (2018). https://doi.org/10.1007/s00438-018-1436-3

Download citation

Received: 06 September 2017
Accepted: 27 March 2018
Published: 29 March 2018
Issue Date: August 2018
DOI: https://doi.org/10.1007/s00438-018-1436-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

pDHS-ELM: computational predictor for plant DNase I hypersensitive sites based on extreme learning machines

Abstract

Access this article

Similar content being viewed by others

Use Chou’s 5-steps rule to identify DNase I hypersensitive sites via dinucleotide property matrix and extreme gradient boosting

iNuc-ext-PseTNC: an efficient ensemble model for identification of nucleosome positioning by extending the concept of Chou’s PseAAC to pseudo-tri-nucleotide composition

i6mA-VC: A Multi-Classifier Voting Method for the Computational Identification of DNA N6-methyladenine Sites

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Electronic supplementary material

Supplementary material 1 (DOCX 5499 KB)

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Abstract

Access this article

Similar content being viewed by others

Use Chou’s 5-steps rule to identify DNase I hypersensitive sites via dinucleotide property matrix and extreme gradient boosting

iNuc-ext-PseTNC: an efficient ensemble model for identification of nucleosome positioning by extending the concept of Chou’s PseAAC to pseudo-tri-nucleotide composition

i6mA-VC: A Multi-Classifier Voting Method for the Computational Identification of DNA N6-methyladenine Sites

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Electronic supplementary material

Supplementary material 1 (DOCX 5499 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation