The evolution of CpG islands by tandem duplications

V. N. Babenko, Yu. L. Orlov, Zh. T. Isakova, D. A. Antonov, M. I. Voevoda

CG-rich islands (CpG islands, or CGIs) are important functional elements of vertebrate genomes. In particular, they (a) initiate transcription as promoters of more than 50% of vertebrate genes, in some cases bidirectionally because of the self-complementarity of the CG dinucleotide; (b) form a global methylation landscape; and (c) “switch off” transcription via methylation. The degenerate composition of CpG islands (elevated CG content) increases the probability that tandem repeats and palindromes occur within a CpG island. In this work, we identify tandem duplications of complete CpG islands (megamonomers 400–5000 bp in length) in the human genome. We found both intergenic and intragenic tandem duplications of CpG islands. The CGI duplications we discovered are mediated by CG-rich subcentromeric and telomeric satellites and by SINEs. In some cases, the similarity of the monomers within a tandem repeat suggests selection pressure on the structure of such loci. The sequence context of intergenic tandem CGI repeats points to a potential role in leveling the CG composition of the genome segment. The identified tandem CGIs are transcriptionally active in a wide range of tissues and cell lines. This cluster organization of CGIs is most pronounced on chromosome 19, which is known for abundant segmental duplications and gene expansions. Another unique genome segment is the DXZ4 megasatellite on the long (q) arm of chromosome X, which likewise belongs to the CGIs generated by tandem duplication.
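As an illustration of what "elevated CG content" means in practice, the sketch below scores a sequence by its GC fraction and its observed/expected CpG ratio. The thresholds (length of at least 200 bp, GC fraction of at least 0.5, observed/expected ratio of at least 0.6) are the commonly cited Gardiner-Garden and Frommer criteria and are an assumption here; the abstract does not state which CGI definition was used. The function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cgi(seq: str, min_len: int = 200,
                   min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply the assumed length, GC, and obs/exp thresholds to one sequence."""
    return (len(seq) >= min_len
            and gc_fraction(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_oe)


if __name__ == "__main__":
    toy_island = "CG" * 150   # artificial 300 bp CG-only sequence
    toy_at_rich = "AT" * 150  # artificial 300 bp AT-only sequence
    print(looks_like_cgi(toy_island), looks_like_cgi(toy_at_rich))
```

Both toy sequences are artificial and serve only to exercise the functions; real candidate monomers would be scored in the same way, for instance in sliding windows along a genomic segment.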

