WikiJournal Preprints/Non-canonical base pairing

From Wikiversity

WikiJournal Preprints
Open access • Publication charge free • Public peer review

WikiJournal User Group is a publishing group of open-access, free-to-publish, Wikipedia-integrated academic journals.


Article information

Authors: Dhananjay Bhattacharyya[a][i], Abhijit Mitra[b]

Bhattacharyya, D; Mitra, A. 


Non-canonical base pairs are planar base-base associations, stabilized by hydrogen bonds, that differ from Watson-Crick base pairs. The 3-D structures of DNA and RNA are generally known to contain base pairs between complementary bases, Ade:Thy (Ade:Ura in RNA) and Gua:Cyt, formed through their canonical or regular Watson-Crick pairing edges. More recent structural studies, however, reveal that nucleotide bases are also capable of forming a large number of other types of pairing between non-complementary bases. These base pairs, held together by multiple hydrogen bonds, are reasonably planar and quite stable. They are generally referred to as non-canonical base pairs.




Figure 1 | A) IUPAC atom numbering scheme of the natural nucleotide units; B) atom numbering scheme of the nucleotide sugar-phosphate backbone

Double helical structures of DNA, and even of RNA, are known to be stabilized by Watson-Crick base pairing between the purines, Ade and Gua, and the pyrimidines, Thy (Ura in RNA) and Cyt, respectively. In this scheme, the N1 atom of the purine forms a hydrogen bond with the N3 atom of the pyrimidine in both A:T and G:C pairs. The second hydrogen bond in an A:T base pair involves the N6 atom of Ade and the O4 atom of Thy. Similarly, the second hydrogen bond in a G:C base pair involves the O6 atom of Gua and the N4 atom of Cyt, and G:C base pairs also have a third hydrogen bond between the N2 atom of Gua and the O2 atom of Cyt. However, although guanine and cytosine showed the same Watson-Crick hydrogen bonding pattern when co-crystallized in vitro, adenine and thymine showed a different hydrogen bonding pattern when similarly co-crystallized. Thus, the first high-resolution structure of an Ade:Thy base pair, solved by Hoogsteen, showed two hydrogen bonds involving the N7 and N6 atoms of Ade and the N3 and O4 (or O2) atoms of Thy, respectively (Figure 1). This led to the term Hoogsteen base pairing being used whenever a hydrogen bond involves the N7 atom of a purine residue. It was also noticed that even in double-stranded DNA, where canonical Watson-Crick base pairs hold the two complementary strands together, Hoogsteen and other non-Watson-Crick base pairs occasionally occur.[1]
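The atom pairs described above can be gathered into a small lookup table. The Python sketch below is illustrative only: the table layout and the helper `is_hoogsteen_type` are hypothetical names introduced here, and the helper simply encodes the naming convention just described (a pair is called Hoogsteen when one of its hydrogen bonds involves the purine N7 atom).

```python
# Hydrogen-bonded atom pairs, written as (purine atom, pyrimidine atom),
# taken from the description in the text. The table layout is hypothetical.
HBOND_ATOMS = {
    "Ade:Thy Watson-Crick": [("N1", "N3"), ("N6", "O4")],
    "Gua:Cyt Watson-Crick": [("N1", "N3"), ("O6", "N4"), ("N2", "O2")],
    "Ade:Thy Hoogsteen": [("N7", "N3"), ("N6", "O4")],
}


def is_hoogsteen_type(bonds):
    """True if any hydrogen bond uses the purine N7 atom (Hoogsteen naming rule)."""
    return any(purine_atom == "N7" for purine_atom, _ in bonds)


for name, bonds in HBOND_ATOMS.items():
    print(f"{name}: {len(bonds)} H-bonds, Hoogsteen-type: {is_hoogsteen_type(bonds)}")
```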

Another example of non-Watson-Crick base pairing is found in chromosomal DNA. Chromosomes consist mainly of double helical DNA stabilized by Watson-Crick base pairing, but their extreme ends, particularly the 3′ ends, are single stranded. These chromosome ends, called telomeres, contain a short sequence motif with a stretch of three to four guanine residues (such as TTAGGG in humans) repeated several times. Structures of such telomeric sequences, solved by X-ray crystallography as well as NMR spectroscopy, are stabilized by Hoogsteen base pairing between Gua residues. In these four-stranded mini-helical structures, four Gua residues form a nearly planar base quartet, in which each Gua pairs with its neighboring Gua through their Watson-Crick and Hoogsteen edges in a cyclic manner.

While non-canonical base pairs are relatively rare in DNA, they are far more prevalent in RNA molecules, where a single polymeric strand folds back onto itself. As early as the early 1970s, the crystal structure of yeast tRNAPhe showed that RNA structures possess significant variations in base pairing schemes, generally known as non-canonical base pairing. Subsequent structures of ribozymes, ribosomes, riboswitches and other RNAs have highlighted the abundance of non-canonical base pairs, and hence the need for their comprehensive characterization.


Classification based on geometric family


Figure 2 |  Three hydrogen bonding edges of a typical nucleotide (Gua), showing nomenclature of each edge
Frédéric Dardel, CC-BY-SA

The nucleotide bases are nearly flat heterocyclic moieties with a conjugated π-electron cloud, and are by and large hydrophobic in nature. Each base has several hydrogen bond donor and acceptor atoms distributed along three edges: the edge that forms the Watson-Crick base pair (Watson-Crick or W edge), the edge that forms the Hoogsteen base pair (Hoogsteen or H edge), and the edge that includes the 2′-OH group of the ribose sugar (Sugar or S edge) (Figure 2). In principle, a base can therefore pair with another base through hydrogen bonds involving any one of its three edges. Unlike the Hoogsteen edge of purines, the corresponding edge of the pyrimidine bases has no polar hydrogen bond acceptor atom such as N7. However, pyrimidines have C–H groups at their C5 or C6 atoms, which can act as weak hydrogen bond donors. The Hoogsteen edge is therefore also called the Hoogsteen/C-H edge, in a unified scheme that designates equivalent positions of purines and pyrimidines. The total number of possible edge combinations for two bases to pair is thus 6, namely W:W, W:H, W:S, H:H, H:S and S:S.

Hydrogen bonding between two bases may give rise to a base pair similar to the canonical Watson-Crick one, in which the sugar moieties attached to the two bases lie on the same side of the hydrogen bonding interaction axis. Such base pairs are called cis base pairs. The two sugars may instead lie on opposite sides of this axis, giving trans base pairs, which are also called reverse base pairs in Saenger's nomenclature.[2] With 2 glycosidic bond orientations (cis and trans) for each of the 6 edge combinations, there are 12 geometric families of base pairs.
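The counting in the two paragraphs above can be checked by enumerating edge combinations and glycosidic orientations directly. This is a minimal Python sketch; the one-letter codes W, H, S and the c/t prefixes follow the family labels used elsewhere in this article (e.g. cWW, tHS), and the variable names are illustrative.

```python
from itertools import combinations_with_replacement

EDGES = ("W", "H", "S")    # Watson-Crick, Hoogsteen/C-H and Sugar edges
ORIENTATIONS = ("c", "t")  # cis and trans glycosidic bond orientation

# Unordered edge pairs: W:W, W:H, W:S, H:H, H:S, S:S
edge_combinations = list(combinations_with_replacement(EDGES, 2))

# Each edge combination occurs in a cis and a trans form, e.g. "cWH", "tHS"
families = [o + e1 + e2 for (e1, e2) in edge_combinations for o in ORIENTATIONS]

print(len(edge_combinations))  # 6
print(len(families))           # 12
print(families)
```

Unordered combinations are used because a W:H pair and an H:W pair belong to the same family; which base carries which edge is recorded separately in the annotation of an individual base pair.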

Accordingly, any base pair can be systematically characterized with the syntax <Base 1:Base 2> <Edge 1:Edge 2> <Glycosidic Bond Orientation>, where Base 1 and Base 2 are ordered by nucleotide residue number. With 4 possible bases, Ade, Ura, Gua and Cyt, there are 16 ordered base combinations for each of the 12 geometric families, giving 16 × 12 = 192 combinations. In the 6 families where both bases use the same edge (cWW, tWW, cHH, tHH, cSS and tSS), however, swapping two different bases does not give a new pair (Ade:Gua cWW and Gua:Ade cWW describe the same pair), so 6 × 6 = 36 of these combinations are duplicates, leaving 192 − 36 = 156 distinct base pairs in principle. This number is only an indicator. It includes base-edge combinations that cannot form base pairs because the hydrogen bond donors and acceptors are not complementary, and it excludes multimodal and bifurcated base pairs, as well as base pairs involving modified bases, protonated bases, or water- or ion-mediated hydrogen bonds.
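The annotation syntax described above can be sketched as a small formatting function. This is a hypothetical helper, not part of any program cited here: `PairedBase` and `annotate` are names introduced for illustration, and the compact output style (e.g. "Ade:Gua tHS") follows the way families are written elsewhere in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedBase:
    """One partner of a base pair (hypothetical container used only in this sketch)."""
    residue_number: int
    base: str  # e.g. "Ade", "Gua", "Cyt", "Ura"
    edge: str  # "W", "H" or "S"


def annotate(x, y, orientation):
    """Format a pair as '<Base 1>:<Base 2> <c|t><Edge 1><Edge 2>'.

    Base 1 is the residue with the lower residue number; the edges are
    reported in the same order as the bases.
    """
    if orientation not in ("cis", "trans"):
        raise ValueError("orientation must be 'cis' or 'trans'")
    first, second = sorted((x, y), key=lambda b: b.residue_number)
    return f"{first.base}:{second.base} {orientation[0]}{first.edge}{second.edge}"


print(annotate(PairedBase(12, "Gua", "S"), PairedBase(9, "Ade", "H"), "trans"))  # Ade:Gua tHS
```

Ordering by residue number makes the label deterministic regardless of the order in which the two residues are passed in.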

Classification based on isostericity

Although base pairs belonging to different geometric families differ significantly in structure, some base pairs within the same geometric family have been found to substitute for each other without disrupting the overall structure. These are called isosteric base pairs. Isosteric base pairs always belong to the same geometric family, but not all base pairs in a given geometric family are isosteric. Two base pairs are called isosteric if they meet three criteria: (i) their C1′–C1′ distances are the same; (ii) the paired bases are related by the same rotation in 3D space; and (iii) their hydrogen bonds form between equivalent base positions.[3][4]
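Of the three criteria, only the first is a simple geometric comparison. The Python sketch below checks criterion (i) alone, under stated assumptions: each base pair is given as the two C1′ coordinates in Å, the 0.5 Å tolerance is a hypothetical placeholder rather than a published cutoff, and the coordinates in the example are made up. Criteria (ii) and (iii) need base reference frames and tables of equivalent atoms and are not attempted here.

```python
import math


def c1_c1_distance(c1_a, c1_b):
    """Distance in Å between the C1' atoms of the two nucleotides of one base pair."""
    return math.dist(c1_a, c1_b)


def similar_c1_c1(pair_1, pair_2, tolerance=0.5):
    """Criterion (i) only: do two base pairs have (nearly) the same C1'-C1' distance?

    Each pair is a tuple of two C1' coordinates (x, y, z) in Å.
    The 0.5 Å tolerance is a hypothetical placeholder, not a published cutoff.
    """
    d1 = c1_c1_distance(*pair_1)
    d2 = c1_c1_distance(*pair_2)
    return abs(d1 - d2) <= tolerance


# Made-up coordinates, for illustration only
pair_a = ((0.0, 0.0, 0.0), (10.5, 0.0, 0.0))
pair_b = ((1.0, 1.0, 1.0), (1.0, 11.7, 1.0))
print(similar_c1_c1(pair_a, pair_b))  # True
```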

Classification based on local strand orientation

Because of the geometric relationship of the bases with the sugar-phosphate backbone, the 12 geometric families of base pairs are associated with two possible local strand orientations, parallel and antiparallel. For the 6 families whose edge combinations involve only Watson-Crick and Sugar edges (W:W, W:S and S:S), cis and trans families are associated with antiparallel and parallel local strand orientations, respectively. Introducing the Hoogsteen edge as one of the partners inverts this relationship: for W:H and H:S, cis and trans correspond to parallel and antiparallel local strand orientations, respectively. As expected, when both edges are H, the inversion occurs twice, so H:H cis and trans correspond to antiparallel and parallel local strand orientations, respectively.[5] Annotating the local strand orientation as parallel or antiparallel helps to identify which faces of the individual bases are seen on a given face of the base pair. It also divides the 12 geometries into two groups of 6, within each of which the geometries can potentially interconvert by in-plane relative rotation of the bases.
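The rule in the paragraph above reduces to a parity check: start from the W/S relationship (cis gives antiparallel) and invert it once for every Hoogsteen edge in the pair. The Python sketch below encodes that rule and reproduces the two groups of six families. The function name and the three-letter family codes (e.g. "cWW", "tHS") are conventions assumed for this illustration.

```python
from itertools import combinations_with_replacement


def local_strand_orientation(family):
    """Local strand orientation for a family code such as 'cWW' or 'tHS'.

    Rule from the text: with W and S edges only, cis -> antiparallel and
    trans -> parallel; each Hoogsteen (H) edge inverts the relationship once.
    """
    if len(family) != 3 or family[0] not in ("c", "t") or any(e not in "WHS" for e in family[1:]):
        raise ValueError(f"unrecognised family code: {family!r}")
    is_cis = family[0] == "c"
    h_edges = family[1:].count("H")
    antiparallel = is_cis if h_edges % 2 == 0 else not is_cis
    return "antiparallel" if antiparallel else "parallel"


groups = {"antiparallel": [], "parallel": []}
for e1, e2 in combinations_with_replacement("WHS", 2):
    for orientation in "ct":
        family = orientation + e1 + e2
        groups[local_strand_orientation(family)].append(family)

print(groups["antiparallel"])  # ['cWW', 'tWH', 'cWS', 'cHH', 'tHS', 'cSS']
print(groups["parallel"])      # ['tWW', 'cWH', 'tWS', 'tHH', 'cHS', 'tSS']
```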


Figure 3 |  Descriptions of the hydrogen bonding atoms, along with their precursors, for a typical non-canonical base pair (as used by BPFIND)


Several algorithms detect base pairs in structures solved by X-ray crystallography, NMR or other methods. Essentially, these programs detect hydrogen bonds between two bases and confirm that the two bases are nearly coplanar before declaring a base pair. Because most RNA structures in the public domain are solved by X-ray crystallography, the positions of hydrogen atoms are rarely reported, which makes hydrogen bond detection a non-trivial task. The algorithm by Lu and Olson[20] declares two bases paired when it detects one or more hydrogen bonds between them, by explicitly modelling the positions of the hydrogen atoms, and when the normals to the two base planes are nearly parallel to each other. The base pairs detected by this method are listed in the NDB.
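The coplanarity part of such a check can be sketched as the angle between the normals of the two base planes. This is a simplified illustration, not the 3DNA implementation: each base plane is taken here from just three ring atoms (real programs typically fit all ring atoms or use a standard reference frame), and the 30° cutoff is a hypothetical value. Stacked bases also have nearly parallel normals, so a test like this only complements hydrogen bond detection.

```python
import math


def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_normal(p1, p2, p3):
    """Unit normal to the plane through three non-collinear ring atoms."""
    n = _cross(_sub(p2, p1), _sub(p3, p1))
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)


def normal_angle_deg(n1, n2):
    """Angle between two unit normals, ignoring their sign (0 to 90 degrees)."""
    dot = abs(sum(a * b for a, b in zip(n1, n2)))
    return math.degrees(math.acos(min(1.0, dot)))


def nearly_parallel_normals(base_1_atoms, base_2_atoms, max_angle_deg=30.0):
    """True if the two base planes are close to parallel.

    Each argument holds three ring-atom coordinates; the 30-degree cutoff is hypothetical.
    """
    n1 = plane_normal(*base_1_atoms)
    n2 = plane_normal(*base_2_atoms)
    return normal_angle_deg(n1, n2) <= max_angle_deg


# Made-up coordinates: two almost coplanar "bases"
base_1 = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
base_2 = ((5.0, 0.0, 0.0), (6.0, 0.0, 0.0), (5.0, 1.0, 0.1))
print(nearly_parallel_normals(base_1, base_2))  # True
```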

The algorithm by Das and co-workers instead requires at least two hydrogen bonds between the bases.[6] This hypothesis-driven algorithm uses the distances between two pairs of hydrogen bond donor (D1, D2) and acceptor (A1, A2) atoms, together with four suitably chosen precursor atoms (PD1, PD2, PA1, PA2) corresponding to the donors and acceptors (Figure 3). Requiring these distances to be small (within 3.8 Å) and the angles PD1–D1–A1, D1–A1–PA1, PD2–D2–A2 and D2–A2–PA2 to be nearly linear simultaneously ensures that i) the hydrogen bonds are strong and linear, and ii) the two bases are coplanar. In some cases, two closely spaced bases were found to be oriented such that two electronegative hydrogen bond acceptor atoms are very close to each other, which would cause severe electrostatic repulsion. The concept of protonated base pairing was introduced for such cases, and protonation appears to give sufficient stabilization to these systems.
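The two-hydrogen-bond test described above can be sketched as follows. This is a minimal illustration, not BPFIND itself: the 3.8 Å distance comes from the text, but the 120° "near-linear" angle cutoff is a hypothetical placeholder, the choice of donor, acceptor and precursor atoms (Figure 3) is left to the caller, and the actual cutoffs and any further checks used by BPFIND are not reproduced. The coordinates in the example are made up.

```python
import math

MAX_DA_DISTANCE = 3.8  # Å, donor-acceptor cutoff quoted in the text
MIN_ANGLE_DEG = 120.0  # hypothetical "near-linear" cutoff, not BPFIND's actual value


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    cos_t = sum(p * q for p, q in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def hbond_ok(pd, d, a, pa):
    """Check one donor-acceptor pair together with its two precursor atoms."""
    if math.dist(d, a) > MAX_DA_DISTANCE:
        return False
    return angle_deg(pd, d, a) >= MIN_ANGLE_DEG and angle_deg(d, a, pa) >= MIN_ANGLE_DEG


def is_base_pair(hbond_1, hbond_2):
    """Declare a pair only if both candidate H-bonds pass (the two-H-bond requirement).

    Each argument is a tuple of coordinates (PD, D, A, PA).
    """
    return hbond_ok(*hbond_1) and hbond_ok(*hbond_2)


# Made-up, perfectly linear geometry for two hydrogen bonds
hb1 = ((-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.9, 0.0, 0.0), (3.9, 0.0, 0.0))
hb2 = ((-1.0, 2.0, 0.0), (0.0, 2.0, 0.0), (3.0, 2.0, 0.0), (4.0, 2.0, 0.0))
print(is_base_pair(hb1, hb2))  # True
```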

Strengths and stabilities

As stated above, A:U or A:T base pairs, as proposed by Watson and Crick, are stabilized by two hydrogen bonds; hence all the base pairs detected by the second method are expected to be about as stable as the canonical A:U base pair. Several groups have estimated the binding energies of these non-canonical base pairs using quantum chemistry based approaches such as density functional theory (DFT).[7][8][9][10][11][12][13][14] Base pairs with two or more hydrogen bonds were generally found to retain the same hydrogen bonding pattern after geometry optimization in the absence of the rest of the RNA chain, indicating the inherent stability of non-canonical base pairs. Interaction energies calculated after geometry optimization also indicate good stability for most non-canonical base pairs.[7][8][9][10][11][12][13][14]
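A common way to define the interaction energy of a base pair is the supermolecular difference between the energy of the pair and the energies of the two isolated bases. The Python sketch below shows only that arithmetic and the hartree-to-kcal/mol conversion; it does not reproduce the protocols of the cited studies, which may differ in method, basis set and corrections (for example, counterpoise correction for basis set superposition error, which is not applied here). The input energies are made-up placeholders.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # 1 hartree expressed in kcal/mol


def interaction_energy_kcal(e_pair, e_base_1, e_base_2):
    """Supermolecular interaction energy E(pair) - E(base 1) - E(base 2).

    Inputs are total energies in hartree; the result is in kcal/mol.
    Negative values mean the pair is lower in energy than the separated bases.
    No basis set superposition error (BSSE) correction is applied here.
    """
    return (e_pair - e_base_1 - e_base_2) * HARTREE_TO_KCAL_PER_MOL


# Made-up placeholder energies, for illustration only
print(round(interaction_energy_kcal(-100.030, -50.000, -50.010), 2))  # -12.55
```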

As canonical and even non-canonical base pairs are sufficiently planar, they can stack on top of other base pairs, giving rise to double helices. Several instances of non-canonical base pairs involved in stacking interactions have been found in the structures of different functional RNA molecules.[15] Not all non-canonical base pairs are strong and stable, however. Several base pairs have been detected on the basis of weak C–H···O/N hydrogen bonds, whose interaction energies are rather small. Similarly, several detected base pairs are stabilized by hydrogen bonds involving the 2′-OH group of the sugar; these are quite flexible because of the inherent variability of the sugar pucker and the glycosidic torsion angle. Structural features of some of the observed base pairs and their stacks can be found in a database.[16]



Figure 4 | A) Ade:Gua trans H:S base pair, an example of a frequently observed non-canonical base pair; B) Ade:Ura trans H:W base pair, another frequently observed one


Some non-canonical base pairs appear very frequently in different RNA structures, such as Ade:Gua trans Hoogsteen/Sugar edge (tHS) and Ade:Ura trans Hoogsteen/Watson-Crick (tHW) (Figure 4). The Ade:Gua tHS base pair is often found at the ends of double helices, where it acts as a capping element that maintains the double helical structure; it is also a major constituent of GNRA tetraloops. The occurrence frequency of a non-canonical base pair, however, does not necessarily correlate with its strength and stability. Information on the geometry, hydrogen bonding, interaction energy, frequency and position of the observed non-canonical base pairs can be found in databases such as NDB,[17] RNABP COGEST[18] and RNABPDB.[16]

Structural features


Figure 5 | IUPAC-recommended intra-base-pair parameters used to describe the geometry of a Watson-Crick or non-canonical base pair

The structural features of a base pair, formed by two planar rigid units, can be measured and expressed by six parameters, three translational and three rotational. The IUPAC-recommended parameters are Propeller, Buckle, Open Angle, Stagger, Shear and Stretch (Figure 5). Several programs, such as Curves,[19] 3DNA[20] and NUPARM,[21] calculate these parameters. The first two calculate the parameters of canonical and non-canonical base pairs in the same way, assuming they are similar, whereas the NUPARM algorithm uses an axis system specific to the base pairing edges. Hence, for most non-canonical base pairs, some of the parameters (Open, Shear and Stretch) calculated by Curves or 3DNA take unusually large values, whereas the small values calculated by NUPARM reflect the quality of hydrogen bonding and the planarity of the two bases. NUPARM's Stretch values, however, are always around 3 Å or more for all types of base pairs.

Higher order structures formed

A base that has formed a proper planar base pair with a second base can often pair with a third base as well, forming a base triple. A classic example is the DNA triple helix, in which bases from two antiparallel strands form consecutive Watson-Crick base pairs in a double helix and a base from a third strand forms a Hoogsteen base pair with the purine of each Watson-Crick pair. Many types of base triples have been seen in the available RNA structures, and their full variety has been systematically classified in the literature.[22] Higher order structures are not limited to base triples. Base quartets, formed by four bases, are now well documented in the G-quadruplex structures of telomeres. Here four guanine residues pair with one another in a cyclic form through the cis Watson-Crick/Hoogsteen (cWH) base pairing scheme, and three to four such quartets stack on top of one another to form a four-stranded DNA structure. In addition to this cyclic topology, several other topologies of base:base pairing have been detected in RNA structures.[23]
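The "cyclic" topology of a base quartet can be expressed as a closed ring in the network of base pairing contacts. The Python sketch below is a hypothetical helper, not the method of any cited tool: it takes a list of paired residue numbers, ignores edges and geometry entirely (so it does not check for cWH pairing), and brute-forces all four-residue subsets, which is adequate only for small examples. The residue numbers in the example are made up.

```python
from itertools import combinations


def cyclic_quartets(pairs):
    """Find groups of four residues whose base pairs close a four-membered ring.

    `pairs` is an iterable of (residue_i, residue_j) base-pair contacts.
    Brute force over all four-residue subsets; suitable for small examples only.
    """
    partners = {}
    for i, j in pairs:
        partners.setdefault(i, set()).add(j)
        partners.setdefault(j, set()).add(i)

    def paired(x, y):
        return y in partners.get(x, set())

    quartets = []
    for a, b, c, d in combinations(sorted(partners), 4):
        # The three distinct ways of arranging four residues in a ring
        for ring in ((a, b, c, d), (a, b, d, c), (a, c, b, d)):
            if all(paired(ring[k], ring[(k + 1) % 4]) for k in range(4)):
                quartets.append((a, b, c, d))
                break
    return quartets


# Made-up contacts: a closed G-quartet-like ring plus an open base triple
contacts = [(1, 5), (5, 9), (9, 13), (13, 1), (20, 21), (20, 22)]
print(cyclic_quartets(contacts))  # [(1, 5, 9, 13)]
```

An open chain of pairs, such as a base triple, has no closing contact and is therefore not reported.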

Non-canonical base pairs in double helical regions

Non-canonical base pairs most often appear in RNA structures as isolated contacts between different residues that stabilize the appropriate fold. For example, some of the non-canonical base pairs in tRNA form between the D and TψC loops, which are close together in the three-dimensional structure. Such base pairing interactions stabilize the L-shaped structure of tRNA. However, G:U cWW non-canonical base pairs are seen quite frequently within double helical regions, as this base pair is nearly isosteric to the canonical ones.[24]

Similarly, many non-canonical base pairs, e.g. Ade:Gua tHS (trans Hoogsteen/Sugar edge), Ade:Ura tHW (trans Hoogsteen/Watson-Crick) and Ade:Gua cWW, are often seen within double helical regions, giving rise to symmetric internal-bulge-like motifs. Attempts have recently been made to classify all situations in which two base pairs (canonical or non-canonical) stack in an antiparallel sense, potentially giving rise to double helical regions in RNA structures.[25]

A proper understanding of these non-canonical base pairs is expected to improve methods for predicting RNA structure from sequence, and could therefore prove useful for human health.

Additional information


Debasish Mukherjee, Saha Institute of Nuclear Physics, Kolkata, INDIA and Antarip Halder, International Institute of Information Technology, Hyderabad, INDIA


References

  1. Burley, Stephen K.; Kodadek, Thomas; Yang, Sang-Hwa; Sun, Liping; Kim, Joseph L.; Patikoglou, Georgia A. (1999-12-15). "TATA element recognition by the TATA box-binding protein has been conserved throughout evolution". Genes & Development 13 (24): 3217–3230. doi:10.1101/gad.13.24.3217. ISSN 1549-5477. PMID 10617571.
  2. Saenger, Wolfram (1984). Principles of Nucleic Acid Structure. New York, NY: Springer New York. pp. 1–8. ISBN 9780387907611.
  3. Leontis, N. B. (2002-08-15). "The non-Watson-Crick base pairs and their associated isostericity matrices". Nucleic Acids Research 30 (16): 3497–3531. doi:10.1093/nar/gkf481. ISSN 1362-4962.
  4. Leontis, Neocles B.; Zirbel, Craig L.; Stombaugh, Jesse; Nasalean, Lorena (2009). Non-Protein Coding RNAs. Springer Series in Biophysics. Springer, Berlin, Heidelberg. pp. 1–26. doi:10.1007/978-3-540-70840-7_1. ISBN 9783540708339.
  5. Leontis, Neocles B.; Westhof, Eric (2001-04). "Geometric nomenclature and classification of RNA base pairs". RNA 7 (4): 499–512. doi:10.1017/s1355838201002515. ISSN 1355-8382.
  6. Das, Jhuma; Mukherjee, Shayantani; Mitra, Abhijit; Bhattacharyya, Dhananjay (2006-10). "Non-Canonical Base Pairs and Higher Order Structures in Nucleic Acids: Crystal Structure Database Analysis". Journal of Biomolecular Structure and Dynamics 24 (2): 149–161. doi:10.1080/07391102.2006.10507108. ISSN 0739-1102.
  7. Roy, Ashim; Panigrahi, Swati; Bhattacharyya, Malyasri; Bhattacharyya, Dhananjay (2008-03). "Structure, Stability, and Dynamics of Canonical and Noncanonical Base Pairs: Quantum Chemical Studies". The Journal of Physical Chemistry B 112 (12): 3786–3796. doi:10.1021/jp076921e. ISSN 1520-6106.
  8. Marino, Tiziana (2014-06-01). "DFT investigation of the mismatched base pairs (T-Hg-T)3, (U-Hg-U)3, d(T-Hg-T)2, and d(U-Hg-U)2". Journal of Molecular Modeling 20 (6): 2303. doi:10.1007/s00894-014-2303-8. ISSN 0948-5023.
  9. Brovarets’, Ol’ha O.; Yurenko, Yevgen P.; Hovorun, Dmytro M. (2013-06-03). "Intermolecular CH···O/N H-bonds in the biologically important pairs of natural nucleobases: a thorough quantum-chemical study". Journal of Biomolecular Structure and Dynamics 32 (6): 993–1022. doi:10.1080/07391102.2013.799439. ISSN 0739-1102.
  10. Šponer, Judit E.; Leszczynski, Jerzy; Sychrovský, Vladimír; Šponer, Jiří (2005-10). "Sugar Edge/Sugar Edge Base Pairs in RNA: Stabilities and Structures from Quantum Chemical Calculations". The Journal of Physical Chemistry B 109 (39): 18680–18689. doi:10.1021/jp053379q. ISSN 1520-6106.
  11. Bhattacharyya, Dhananjay; Koripella, Siv Chand; Mitra, Abhijit; Rajendran, Vijay Babu; Sinha, Bhabdyuti (2007-08). "Theoretical analysis of noncanonical base pairing interactions in RNA molecules". Journal of Biosciences 32 (S1): 809–825. doi:10.1007/s12038-007-0082-4. ISSN 0250-5991.
  12. Kelly, R. E. A.; Kantorovich, L. N. (2006-02). "Homopairing Possibilities of the DNA Base Thymine and the RNA Base Uracil: An ab Initio Density Functional Theory Study". The Journal of Physical Chemistry B 110 (5): 2249–2255. doi:10.1021/jp055552o. ISSN 1520-6106.
  13. Oliva, R. (2006-02-06). "Accurate energies of hydrogen bonded nucleic acid base pairs and triplets in tRNA tertiary interactions". Nucleic Acids Research 34 (3): 865–879. doi:10.1093/nar/gkj491. ISSN 0305-1048.
  14. Sharma, Purshotam; Šponer, Judit E.; Šponer, Jiří; Sharma, Sitansh; Bhattacharyya, Dhananjay; Mitra, Abhijit (2010-03-11). "On the Role of the cis Hoogsteen:Sugar-Edge Family of Base Pairs in Platforms and Triplets—Quantum Chemical Insights into RNA Structural Biology". The Journal of Physical Chemistry B 114 (9): 3307–3320. doi:10.1021/jp910226e. ISSN 1520-6106.
  15. Halder, Sukanya; Bhattacharyya, Dhananjay (2012-09-20). "Structural Variations of Single and Tandem Mismatches in RNA Duplexes: A Joint MD Simulation and Crystal Structure Database Analysis". The Journal of Physical Chemistry B 116 (39): 11845–11856. doi:10.1021/jp305628v. ISSN 1520-6106.
  16. "RNA Base Pair Database (RNABPDB)". Retrieved 2019-01-07.
  17. "RNA Basepair Catalog". Retrieved 2019-01-07.
  18. "RNABP COGEST". Retrieved 2019-01-07.
  19. Lavery, Richard; Zakrzewska, Krystyna; Pasi, Marco; Blanchet, Christophe (2011-07-01). "CURVES+ web server for analyzing and visualizing the helical, backbone and groove parameters of nucleic acid structures". Nucleic Acids Research 39 (suppl_2): W68–W73. doi:10.1093/nar/gkr316. ISSN 0305-1048.
  20. Lu, X.-J. (2003-09-01). "3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures". Nucleic Acids Research 31 (17): 5108–5121. doi:10.1093/nar/gkg680. ISSN 1362-4962. PMID 12930962. PMC 212791.
  21. "NUPARM-Plus-A Program for analyzing sequence dependent variations in nucleic acids". Retrieved 2019-01-09.
  22. Abu Almakarem, Amal S.; Petrov, Anton I.; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B. (2011-11-02). "Comprehensive survey and geometric classification of base triples in RNA structures". Nucleic Acids Research 40 (4): 1407–1423. doi:10.1093/nar/gkr810. ISSN 1362-4962. PMID 22053086. PMC 3287178.
  23. "QUARNA". Retrieved 2019-01-09.
  24. Stombaugh, Jesse; Zirbel, Craig L.; Westhof, Eric; Leontis, Neocles B. (2009-02-24). "Frequency and isostericity of RNA base pairs". Nucleic Acids Research 37 (7): 2294–2312. doi:10.1093/nar/gkp011. ISSN 0305-1048. PMID 19240142. PMC 2673412.
  25. "RNA Base Pair Database (RNABPDB)". Retrieved 2019-01-07.