WikiJournal Preprints/Non-canonical base pairing

From Wikiversity


WikiJournal Preprints
Open access • Publication charge free • Public peer review


WikiJournal User Group is a publishing group of open-access, free-to-publish, Wikipedia-integrated academic journals.


Article information

Authors: Dhananjay Bhattacharyya[a][i], Abhijit Mitra[b]

WikiJournal Preprints/Non-canonical base pairing, Wikidata Q39049436




Abstract

Non-canonical base pairs are planar, hydrogen-bonded pairs of nucleobases whose hydrogen bonding patterns differ from those observed in Watson-Crick base pairs, as in the classic double helical DNA. The structures of polynucleotide strands of both DNA and RNA molecules can be understood in terms of sugar-phosphate backbones consisting of phosphodiester-linked D-2’-deoxyribofuranose (D-ribofuranose in RNA) sugar moieties, with purine or pyrimidine nucleobases covalently linked to them. Here, the N9 atoms of the purines, guanine and adenine, and the N1 atoms of the pyrimidines, cytosine and thymine (uracil in RNA), form glycosidic linkages with the C1’ atom of the sugars. These nucleobases can be schematically represented as triangles with one of their vertices linked to the sugar, and the three sides accounting for the three edges through which they can form hydrogen bonds with other moieties, including other nucleobases. As explained in greater detail later in this article, the side opposite to the sugar-linked vertex is traditionally called the Watson-Crick edge, since it is involved in forming the Watson-Crick base pairs which constitute the building blocks of double helical DNA. The two sides adjacent to the sugar-linked vertex are referred to, respectively, as the sugar and Hoogsteen (C-H for pyrimidines) edges.

Each of the four nucleobases is characterized by a distinct edge-specific distribution pattern of its hydrogen bond donor and acceptor atoms, and complementarity between these patterns, in turn, defines the hydrogen bonding patterns involved in base pairing. The double helical structures of DNA or RNA are generally known to have base pairs between complementary bases, Adenine:Thymine (Adenine:Uracil in RNA) or Guanine:Cytosine. They involve specific hydrogen bonding patterns corresponding to their respective Watson-Crick edges, and are considered Canonical Base Pairs. At the same time, the helically twisted backbones in double helical duplex DNA form two grooves, major and minor, through which the hydrogen bond donor and acceptor atoms corresponding respectively to the Hoogsteen and sugar edges are accessible for additional potential molecular recognition events. Experimental evidence reveals that the nucleotide bases are also capable of forming a wide variety of pairs in various geometries, having hydrogen bonding patterns different from those observed in Canonical Base Pairs [Figure 1]. These base pairs, which are generally referred to as Non-Canonical Base Pairs, are held together by multiple hydrogen bonds, and are mostly planar and stable. Most of these play very important roles in shaping the structure and function of different functional RNA molecules. In addition to their occurrence in several double stranded stem regions, most of the loops and bulges that appear in single-stranded RNA secondary structures form recurrent 3D motifs, where non-canonical base pairs play a central role. Non-canonical base pairs also play crucial roles in mediating the tertiary contacts in RNA 3D structures.

Adenine:Guanine Trans HS, found in many GNRA tetraloops; Adenine:Uracil Trans HW, found in many recurrent structural motifs; Guanine:Guanine Cis WH, found in G-Quadruplexes of DNA or RNA and in many other motifs; Cytosine:Cytosine Trans WW, found in i-motif DNA and other forms (one of the Cyt bases needs to be protonated to avoid electrostatic repulsion).

Figure 1: Examples of a few frequently observed non-canonical base pairs: Adenine:Guanine trans Hoogsteen/Sugar-edge, Adenine:Uracil trans Hoogsteen/Watson-Crick, Guanine:Guanine cis Watson-Crick/Hoogsteen, and protonated Cytosine(+):Cytosine trans Watson-Crick/Watson-Crick.

History

Double helical structures in DNA, as well as in folded single stranded RNA, are now known to be stabilized by Watson-Crick base pairing of the purines, Adenine and Guanine, with the pyrimidines, Thymine (or Uracil in RNA) and Cytosine. In this scheme, the N1 atoms of the purine residues form hydrogen bonds with the N3 atoms of the pyrimidine residues in A:T and G:C complementarity (see Figure 2 for the atom labeling scheme according to the IUPAC-IUB convention). The second hydrogen bond in A:T base pairs involves the N6 amino group of Adenine and the O4 atom of Thymine (or Uracil in RNA). Similarly, the second hydrogen bond in G:C base pairs involves the O6 atom of Guanine and the N4 amino group of Cytosine. The G:C base pairs also have a third hydrogen bond involving the N2 amino group of Guanine and the O2 atom of Cytosine. However, even until about twenty years after this scheme was initially proposed by James D. Watson and Francis H.C. Crick,[1] experimental evidence suggesting other forms of base-base interactions continued to draw the attention of researchers investigating the structure of DNA.[2][3] The first high resolution structure of an Adenine:Thymine base pair, solved by Karst Hoogsteen by single crystal X-ray crystallography in 1959,[4] revealed a structure with two hydrogen bonds involving the N7 and N6 atoms of Adenine and the N3 and O4 (or O2) atoms of Thymine, respectively [Figure 1b and 2], which was very different from what was proposed by Watson and Crick. In order to distinguish this alternate base pairing scheme from the Watson-Crick scheme, base pairs where a hydrogen bond involves the N7 atom of a purine residue have been referred to as Hoogsteen base pairs, and later, the purine base edge which includes its N7 atom came to be referred to as its Hoogsteen edge. The first high resolution structure of a Guanine:Cytosine pair, obtained by W. Guschelbauer, was also similar to the Hoogsteen base pair, although this structure required an unusual protonation of the N3 imino nitrogen of Cytosine, which is possible only at significantly lower pH.[5] Experimental evidence supporting Watson-Crick base pairing, including low resolution NMR studies[6] as well as high resolution X-ray crystallographic studies,[7] was obtained as late as the early 1970s. Almost a decade later, with the advent of efficient DNA synthesis methods, Richard Dickerson,[8] followed by several other groups, solved structures of the physiological double helical B-DNA spanning a complete helical turn, based on crystals of synthetic DNA oligomers.[9][10][11] The pairing geometries of the A:T (A:U in RNA) and G:C pairs in these structures confirmed the common or canonical form of base pairing as proposed by Watson and Crick, while those with all other geometries and compositions are now referred to as non-canonical base pairs.

It was noticed that even in double stranded DNA, where canonical Watson-Crick base pairs hold the two complementary antiparallel strands together, there were occasional occurrences of Hoogsteen and other non-Watson-Crick base pairs.[12][13][14][15][16][17] It was also proposed that Hoogsteen base pair formation could be a transient phenomenon within DNA double helices dominated by Watson-Crick base pairs.[17]

While canonical Watson-Crick base pairs are the most prevalent, forming the majority of chromosomal DNA and most functional RNAs, the presence of stable non-canonical base pairs in DNA biology is also extremely important. An example of non-Watson-Crick, or non-canonical, base pairing can be found at the ends of chromosomal DNA. The 3'-ends of chromosomes contain single stranded overhangs with some conserved sequence motifs (such as TTAGGG in most vertebrates). The single stranded region adopts definite three-dimensional structures, which have been solved by X-ray crystallography as well as by NMR spectroscopy.[18][19][20] The single strands containing the above sequence motifs are found to form four stranded mini-helical structures stabilized by Hoogsteen base pairing between Guanine residues. In these structures, four Guanine residues form a near planar base quartet, referred to as a G-quadruplex, where each Guanine participates in base pairing with its neighboring Guanine (Figure 3), involving their Watson-Crick and Hoogsteen edges in a cyclic manner. The four central carbonyl groups are often stabilized by potassium ions (K+). From the full genomic sequences of different organisms, it has been observed that telomere-like sequences sometimes also interrupt double helical regions near the transcription start sites of some oncogenes, such as c-myc. It is possible that these sequence stretches form G-quadruplex-like structures that can suppress the expression of the related genes. The complementary Cytosine-rich sequences, on the other strand, may adopt another similar four stranded structure, the i-motif, stabilized by Cytosine:Cytosine non-canonical base pairs.

Figure 3. (a) Structure of a representative G-Quadruplex consisting of Hoogsteen base pairs between neighboring Guanine residues (from PDB ID: 1KF1). (b) Three G-quadruplexes stack to form a four stranded telomere with different topologies for the d(GGGATTGGGATTGGGATTGGG) sequence.

While non-canonical base pairs are still relatively rare in DNA, in RNA molecules, where generally a single polymeric strand folds onto itself to form various secondary and tertiary structures, the occurrence of non-Watson-Crick base pairs turns out to be far more prevalent. As early as in the 1970’s, analysis of the crystal structure of Yeast tRNAPhe showed that RNA structures possess significant non-canonical variations in base pairing schemes. Subsequently, the structures of Ribozymes, Ribosome, Riboswitches, etc. have highlighted their abundance, and hence the need for a comprehensive characterization of Non-Canonical Base Pairs. These three-dimensional RNA structures generally possess several secondary structural motifs, such as double helical stems, stems with hairpin loops, symmetric and asymmetric internal loops, kissing loops between two hairpin motifs, pseudoknots, continuous stacks between two segments of helices, multi helix junctions[21][22] etc. along with single stranded regions. These secondary structural motifs, except for the single stranded motifs, are stabilized by hydrogen bonded base pairs and several of these are non-canonical base pairs, including G:U Wobble base pairs.

It is notable in this context that the Wobble hypothesis of Francis Crick predicted the possibility of the G:U base pair, in place of the canonical G:C or A:U base pairs, also mediating the recognition between mRNA codons and tRNA anticodons during protein synthesis. Today, as can be seen in the corresponding Wiki page, the G:U wobble base pair is the most frequently observed non-canonical base pair. Because of its geometric similarity with the canonical base pairs, it frequently occurs in the double helical stem regions of RNA structures, while its geometric differences continue to draw the attention of nucleic acid researchers, providing new insights into its structural significance. It may be noted that although, as in DNA, the base pairs in folded RNA structures give rise to double helical stems, the two cleft regions of these stems, the major groove and the minor groove, differ in their respective dimensions from those in DNA double helices. Unlike those in DNA, the sequence-discriminating major grooves in RNA double helices are very narrow and deep. On the other hand, the minor groove regions, though wide and shallow, do not carry much sequence specific information in terms of the hydrogen bonding donor-acceptor positioning of the corresponding base pair edges.[23] The G:U wobble base pairs, along with the various other non-canonical base pairs, introduce variations in the structures of RNA double helices, thus enhancing the accessibility of the discriminating major groove edges of associated base pairs. This has been seen to be very important for molecular recognition steps during tRNA aminoacylation as well as in ribosome functions.[24]

Considering the immense importance of the non-canonical base pairs in RNA structure, folding and functions, researchers from multiple domains – biology, chemistry, physics, mathematics, computer science, etc., have joined in the effort to understand their structure, dynamics, function and their consequences. The complexities associated with experimental handling of RNA further underline the importance of diverse theoretical inputs towards addressing these issues.

Types

Two bases may approach each other in various ways, eventually leading to specific molecular recognition mediated by base pairing interactions, which are often non-canonical, in addition to strong stacking interactions. These are essential for the process of RNA single strands folding into three-dimensional structures. Early studies on such unusual base pairs by Jiri Sponer, Pavel Hobza and their group were somewhat disadvantaged by the unavailability of suitable, unambiguous, systematic naming schemes.[25] While some of the observed base pairs were assigned names following the Saenger nomenclature scheme,[26] others were arbitrarily assigned names by different researchers. It may be mentioned that some attempts were also made by Michael Levitt and coworkers to classify base-base association in terms of adjacency of bases, through either pairing or stacking interactions.[27] There was clearly a need for a classification scheme for different types of non-canonical base pairs, which could comprehensively and unambiguously handle newer variants coming up due to the rapid increase in the sampling space. Different approaches which have evolved in response to this need are discussed below.

Classification based on hydrogen bonding

The nucleotide bases are nearly planar heterocyclic moieties, with a conjugated pi-electron cloud, and with several hydrogen bonding donors and acceptors distributed around the edges. These edges are usually designated as W, H or S, based on whether the edge can be involved in forming a Watson-Crick base pair, a Hoogsteen base pair, or whether the edge is adjacent to the C2’-OH group of the ribose sugar, respectively. Eric Westhof and Neocles Leontis[29] used these edge designations to propose a nomenclature scheme for base pairs that is currently widely accepted. The hydrogen bonding donor and acceptor atoms could thus be classified in terms of their positioning along the three edges, namely the Watson-Crick or W edge, the Hoogsteen or H edge, and the Sugar or S edge [Figure 4]. Since base pairs are mediated through hydrogen bonding interactions based on hydrogen bond donor-acceptor complementarity, this, in turn, provides a convenient bottom-up approach towards classifying base pair geometries in terms of the respective interacting edges of the participating bases. It may be noted that, unlike the Hoogsteen edge of purines, the corresponding edges of the pyrimidine bases do not have any polar hydrogen bond acceptor atom such as N7. However, these bases have C-H groups at their C6 and C5 atoms, which can act as weak hydrogen bond donors, as proposed by Gautam Desiraju.[30] The Hoogsteen edge, hence, is also called the Hoogsteen/C-H edge in a unified scheme for designating equivalent positions of purines as well as pyrimidines. Thus, the total number of possible edge combinations involved in base pairing is 6, namely Watson-Crick/Watson-Crick (or W:W), Watson-Crick/Hoogsteen (or W:H), Watson-Crick/Sugar (or W:S), Hoogsteen/Hoogsteen (or H:H), Hoogsteen/Sugar (or H:S) and Sugar/Sugar (or S:S).

Figure 4: (a) The three hydrogen bonding edges of a nucleotide base (Guanine), showing the nomenclature of each edge, and (b) Cis and Trans orientations of the glycosidic bonds of the two nucleotide residues of a base pair with respect to the hydrogen bonding direction. The arrows in (b) indicate glycosidic bonds as vectors.

In the canonical Watson-Crick base pairs, the glycosidic bonds attaching the N9 (of purine) and N1 (of pyrimidine) of the paired bases to their respective sugar moieties are on the same side of the mean hydrogen bonding axis, and are hence called Cis Watson-Crick base pairs. However, the relative orientations of the two sugars may also be Trans with respect to the mean hydrogen bonding axis, giving rise to a distinct Trans Watson-Crick geometric class, consisting of species which were earlier referred to as reverse Watson-Crick base pairs according to the Saenger nomenclature.[27] The possibility of both Cis and Trans glycosidic bond orientations for each of the 6 possible edge combinations gives rise to 12 geometric families of base pairs (Table 1).
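The count of 12 geometric families can be reproduced with a short enumeration. The following is only an illustrative sketch: the names `EDGES`, `ORIENTATIONS` and `geometric_families` are hypothetical and not part of any published tool.

```python
from itertools import combinations_with_replacement

EDGES = ("W", "H", "S")          # Watson-Crick, Hoogsteen (C-H for pyrimidines), Sugar
ORIENTATIONS = ("cis", "trans")  # relative orientation of the two glycosidic bonds


def geometric_families():
    """Return (orientation, edge_1, edge_2) for every geometric family."""
    # Unordered edge pairs: W:W, W:H, W:S, H:H, H:S, S:S
    edge_pairs = combinations_with_replacement(EDGES, 2)
    return [(o, e1, e2) for e1, e2 in edge_pairs for o in ORIENTATIONS]


families = geometric_families()
for orientation, e1, e2 in families:
    print(f"{orientation} {e1}:{e2}")
print(len(families), "geometric families")
```

Treating the edge pair as unordered gives the 6 edge combinations listed earlier; doubling for cis and trans gives the 12 families.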

According to the Leontis-Westhof scheme,[28] any base pair can be systematically and unambiguously named using the syntax <Base_1: Base_2><Edge_1: Edge_2><Glycosidic Bond Orientation> where Base_1 and Base_2 carry information on the respective base identities and their nucleotide numbers. This nomenclature scheme also allows us to enumerate the total number of distinct possible base pair types. For a given glycosidic bond orientation, say Cis, the four naturally occurring bases each have three possible edges for the formation of base pairs, giving rise to 12 possible base pairing edge identities, each of which can in principle pair with any edge of another base, irrespective of complementarity. This gives rise to a 12 × 12 symmetric matrix displaying 144 pairwise permutations of base pairing edge identities, in which the 132 off-diagonal entries occur as 66 mirror-image pairs. Thus, there are 78 (= 12 + 132/2) unique entries corresponding to the cis glycosidic bond orientation. Considering both cis and trans glycosidic bond orientations, the number of base pair types amounts to 156.
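The arithmetic above can be checked by direct enumeration. This is a minimal sketch; `BASES`, `EDGES` and `count_base_pair_types` are illustrative names only, and RNA bases are assumed (U rather than T).

```python
from itertools import combinations_with_replacement

BASES = ("A", "G", "C", "U")  # naturally occurring RNA bases (assumption: T omitted)
EDGES = ("W", "H", "S")


def count_base_pair_types():
    """Return (base-edge identities, types per orientation, total types)."""
    base_edges = [(b, e) for b in BASES for e in EDGES]  # 12 identities
    # Unordered pairs including self-pairs: 12 diagonal + 66 off-diagonal
    per_orientation = len(list(combinations_with_replacement(base_edges, 2)))
    return len(base_edges), per_orientation, 2 * per_orientation


print(count_base_pair_types())
```

As the text notes, this total is only an upper bound on nomenclature slots, not a count of pairs that can actually form.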

Of course, this number, 156, is only an indicator. It includes base-edge combinations where base pairs cannot be formed due to the absence of hydrogen bond donor-acceptor complementarity. For example, potential pairing between two Guanine residues utilizing their Watson-Crick edges in cis form (cWW) is not supported by hydrogen bonding donor-acceptor complementarity, and is never observed. This method of enumerating the possible number of distinct base pair types also does not consider possibilities of multimodality or bifurcated base pairs, or instances of base pairs involving modified bases, protonated bases, or water or ion mediation in hydrogen bond formation. Two Cytosine bases can form trans Watson-Crick/Watson-Crick (tWW) base pairs in their neutral as well as hemiprotonated forms, possibly both, giving rise to the i-motif DNA. However, C(+):C tWW and C:C tWW are counted as one type among the 156 possible types.

Classification based on isostericity

Although there are significant differences between the structures of non-canonical base pairs belonging to different geometric families, some base pairs within the same geometric family have been found to substitute for each other without disrupting the overall structure. These base pairs are called isosteric base pairs. Isosteric base pairs always belong to the same geometric family, but not all base pairs in a particular geometric family are isosteric. Two base pairs are called isosteric if they meet the following three criteria: (i) the C1′–C1′ distances should be similar; (ii) the paired bases should be related by similar rotations in 3D space; and (iii) H-bond formation should occur between equivalent base positions.[29][30] A detailed approach towards quantifying isostericity, in terms of an IsoDiscrepancy Index (IDI), which can facilitate reliable prediction of which base pair substitutions can potentially occur in conserved motifs, was formulated by Neocles Leontis, Craig Zirbel and Eric Westhof.[31] Based on IDI values and available base pair structural data, the group maintains a curated online base pair catalogue and an updated set of Isostericity Matrices (IM) corresponding to each of the 12 geometric families. Using this resource, one can comprehensively classify different types of canonical and non-canonical base pairs in terms of their positions in the Isostericity Matrices. This approach, for example, indicates that the four base pair types A:U cWW, U:A cWW, G:C cWW and C:G cWW are isosteric to each other. Thus, as also confirmed by detailed sequence comparisons, double mutations altering A:U cWW to U:A cWW or even to G:C cWW may not disturb the structure and, unless stability issues are involved, the function of the related RNA. It was also found that G:U cWW is not really isosteric to U:G cWW, indicating that such double mutations may significantly affect the functioning of the corresponding RNA.[32] On the other hand, some of the base pairs which are stabilized through the Sugar edges of the bases are mutually isosteric.
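As a rough illustration of how part of this definition could be screened computationally, the sketch below tests only two necessary conditions: membership of the same geometric family and criterion (i), similar C1′–C1′ distances. It does not compute the IsoDiscrepancy Index. The function names, the 0.5 Å tolerance and the coordinates are all hypothetical assumptions, not values taken from the cited work.

```python
import math


def c1_c1_distance(pair):
    """pair: (C1' coordinates of base 1, C1' coordinates of base 2), in angstroms."""
    return math.dist(pair[0], pair[1])


def may_be_isosteric(pair_a, pair_b, family_a, family_b, tolerance=0.5):
    """Necessary (not sufficient) screen for isostericity.

    tolerance is a hypothetical cut-off in angstroms; criteria (ii) and (iii)
    of the definition are not checked here.
    """
    if family_a != family_b:
        return False
    return abs(c1_c1_distance(pair_a) - c1_c1_distance(pair_b)) <= tolerance


# Made-up C1' coordinates, for illustration only
pair_1 = ((0.0, 0.0, 0.0), (10.5, 0.0, 0.0))
pair_2 = ((0.0, 0.0, 0.0), (10.3, 0.0, 0.0))
pair_3 = ((0.0, 0.0, 0.0), (12.5, 0.0, 0.0))
print(may_be_isosteric(pair_1, pair_2, "cWW", "cWW"))
print(may_be_isosteric(pair_1, pair_3, "cWW", "cWW"))
```

A pair that passes this screen would still need to satisfy the rotation and hydrogen bond equivalence criteria before being treated as isosteric.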

Classification based on local strand direction

It may be noted here that because of the geometric relationship of the bases with the sugar phosphate backbone, these 12 geometric families of base pairs are associated with two possible local strand orientations, namely parallel and antiparallel. For the 6 families with edge combinations involving Watson-Crick and Sugar edges, W:W, W:S and S:S, cis and trans families are respectively associated with antiparallel and parallel local strand orientations [Table 1]. Introduction of the Hoogsteen edge, as one of the partners in the combination, causes an inversion in the relationship. Thus, for W:H and H:S, cis and trans respectively correspond to parallel and antiparallel local strand orientation. As expected, when both the edges are H, a double inversion is observed, and H:H cis and trans correspond respectively to antiparallel and parallel local strand orientations[33]. The annotation of local strand orientation in terms of parallel and antiparallel directions helps to understand which faces of the individual bases can be seen for a given base pair from the 5’- or the 3’ sides [Table 1].  This annotation also helps in classifying the 12 geometries into two groups of 6 each, where the geometries can potentially interconvert within each group, by in-plane relative rotation of the bases. However, one should note that the above theory is applicable only when the glycosidic torsion angles of both the nucleotide residues are anti. Notably, crystallographic observations[34] and energetic[35] considerations indicate that syn glycosidic torsions are also quite possible.  Hence the above classification of parallel or antiparallel nature of strand directions, by itself, does not always provide the correct understanding.
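The inversion rule described above can be expressed as a parity check: starting from the cis W:W case (antiparallel), a trans orientation and each Hoogsteen edge contribute one inversion. The sketch below encodes this rule; the function name is illustrative, and, as stated in the text, the result holds only when both glycosidic torsions are anti.

```python
def local_strand_orientation(orientation, edge_1, edge_2):
    """Local strand orientation for a geometric family, assuming anti glycosidic torsions."""
    if orientation not in ("cis", "trans"):
        raise ValueError("orientation must be 'cis' or 'trans'")
    if edge_1 not in "WHS" or edge_2 not in "WHS" or len(edge_1) != 1 or len(edge_2) != 1:
        raise ValueError("edges must be 'W', 'H' or 'S'")
    # One inversion for trans, plus one inversion per Hoogsteen edge
    inversions = (orientation == "trans") + (edge_1 == "H") + (edge_2 == "H")
    return "antiparallel" if inversions % 2 == 0 else "parallel"


for family in [("cis", "W", "W"), ("trans", "W", "S"), ("cis", "W", "H"),
               ("trans", "H", "S"), ("cis", "H", "H"), ("trans", "H", "H")]:
    print(family, local_strand_orientation(*family))
```

Families sharing the same result form the two groups of 6 within which in-plane rotation of the bases can, in principle, interconvert the geometries.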

Various functional RNA molecules are stabilized, in their specific folded pattern, by both canonical as well as non-canonical base pairs. Most tRNA molecules, for example, are known to have four short double helical segments, giving rise to a cloverleaf like two-dimensional structure. The three-dimensional structure of tRNA, however, takes an L-shape. As shown in Figure 5, this is mediated by several non-canonical base pairs and base triplets. The D-loop and TψC loop are held together by several such base pairs.  While it is not possible to include here the complete range of non-canonical base pair varieties, some of the frequently occurring representatives are shown in Figure 1. Interested readers are encouraged to browse through different websites such as NDB,[36] RNABPDB,[37] RNABP COGEST,[38] etc., to get a better understanding.

It may be noted that the above scheme is valid for naturally occurring nucleotide bases. However, there are plenty of examples of post-transcriptional chemical modifications of the bases, many of which are seen in tRNAs or ribosomes. It may be important to understand their structural features also.[39][40]

Identification

In the case of double helical DNA, identification of base pairs is quite trivial using molecular visualizers such as VMD, RasMol, etc. It is, however, not so simple for single stranded, folded, functional RNA molecules. Several algorithms have been implemented in software tools for the automated detection of base pairs in RNA structures solved by X-ray crystallography, NMR or other methods. Essentially, the programs detect hydrogen bonds between two bases, and ensure their (near) planar orientation, before reporting that they constitute a base pair. Since most of the RNA structures available in the public domain are solved by X-ray crystallography, the positions of hydrogen atoms are rarely reported. Hence, detection of hydrogen bonds becomes a non-trivial job.

The DSSR algorithm[41] by Lu and Wilma K. Olson considers two bases to be paired when it detects one or more hydrogen bonds between the bases, by actually modeling the positions of the hydrogen atoms, and when the perpendiculars to the two base planes are nearly parallel to each other. The positions of the hydrogen atoms can be deduced by converting internal coordinates (bond length, bond angle and torsion angle, i.e. a Z-matrix), together with the positions of precursor atoms, such as amino group nitrogen atoms and the atoms bonded to them, to external Cartesian coordinates. The base pairs identified by this method are listed in the NDB[42] and FR3D[43] databases.

A unique way of identifying base pairs in RNA was incorporated in MC-Annotate[43] by Francois Major. This algorithm makes use of the positions of the hydrogen atoms as well as of the lone-pair electrons, obtained using suitable molecular mechanics/dynamics force-fields,[44] and derives hydrogen bond formation probabilities for them. The final identification of base pairs is based on these probabilities and on the approach of hydrogen atoms to the lone-pair electrons of nitrogen or oxygen. This method also attempted to extend the base pair nomenclature with additional information on each interacting edge, such as Ws indicating the sugar edge corner of the Watson-Crick edge, Wh representing the Hoogsteen edge corner of the Watson-Crick edge, Bw indicating a bifurcated three-center hydrogen bond in which both hydrogen atoms of an amino group form hydrogen bonds with a carbonyl oxygen involving both of its lone pairs, etc. As claimed by the authors, this nomenclature scheme adds some additional features to the Leontis-Westhof (LW)[45] scheme and may be referred to as the LW+ scheme. A major advantage of this scheme lies in its ability to distinguish between alternative base pairing geometries, where multimodality is observed within an LW family. This method, however, does not consider the possible participation of the 2'-OH group of the ribose sugars in base pair formation.

Another algorithm, namely BPFIND by Bhattacharyya and coworkers,[46] demands at least two hydrogen bonds using two distinct sets of donor and acceptor atoms between the bases. This hypothesis-driven algorithm considers the distances between two pairs of atoms, the hydrogen bond donors (D1 and D2) and acceptors (A1 and A2), together with four suitably chosen precursor atoms (PD1, PD2, PA1, PA2) corresponding to the D's and A's (as shown for a representative base pair in Figure 6). Small values of such distances, in conjunction with large values of the angles defined by PD1-D1-A1, D1-A1-PA1, PD2-D2-A2 and D2-A2-PA2 (close to 180° or π radians), ensure two structural features which characterize well defined base pairs: i) the hydrogen bonds are strong and linear and ii) the two bases are co-planar. Notably, so long as one restricts the search to base pairs which are stabilized by at least two distinct hydrogen bonds, the above algorithms, by and large, yield the same set of base pairs in different RNA structures.
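The distance-and-angle test described above can be sketched as follows. This is not the BPFIND code: the function names are illustrative, and the cut-offs of 3.8 Å and 120° are assumed values chosen for the example; the source text does not state the thresholds.

```python
import math


def angle_deg(p, q, r):
    """Angle p-q-r in degrees, measured at the vertex q."""
    v1 = [p[i] - q[i] for i in range(3)]
    v2 = [r[i] - q[i] for i in range(3)]
    cosang = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))


def hbond_ok(pd, d, a, pa, max_dist=3.8, min_angle=120.0):
    """Check one donor/acceptor set (PD, D, A, PA) using heavy atoms only.

    max_dist (angstroms) and min_angle (degrees) are assumed cut-offs.
    """
    return (math.dist(d, a) <= max_dist
            and angle_deg(pd, d, a) >= min_angle
            and angle_deg(d, a, pa) >= min_angle)


def is_base_pair(hbonds):
    """hbonds: list of (PD, D, A, PA) coordinate tuples.

    The caller is assumed to supply sets with distinct donors and acceptors;
    at least two of them must satisfy the criteria.
    """
    return sum(hbond_ok(*hb) for hb in hbonds) >= 2


# Made-up coordinates: two linear hydrogen bonds lying in one plane
hb1 = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.9, 0.0, 0.0), (5.1, 0.0, 0.0))
hb2 = ((0.0, 2.5, 0.0), (1.0, 2.5, 0.0), (3.9, 2.5, 0.0), (5.1, 2.5, 0.0))
bent = ((1.0, 1.0, 0.0), (1.0, 0.0, 0.0), (3.9, 0.0, 0.0), (5.1, 0.0, 0.0))
print(is_base_pair([hb1, hb2]))
print(is_base_pair([hb1, bent]))
```

Because only heavy-atom positions are used, a test of this kind does not depend on hydrogen coordinates, which are rarely reported in crystal structures.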

Sometimes, in crystal structures, two closely spaced bases are observed to be oriented in such a way that, apart from the regular hydrogen bonds, two additional electronegative hydrogen bond acceptor atoms are very close to each other, which may cause electrostatic repulsion. The concept of protonated base pairing, implicating a possible protonation of one of these electronegative (potential) hydrogen bond acceptor atoms, thus converting it into a hydrogen bond donor, was introduced to explain the stability of such geometries.[47][48][46] Some of the NMR derived structures also support the protonation hypothesis, but more rigorous studies using neutron diffraction or other techniques would possibly be needed to confirm it. Where the quality of the crystal structures permits, some algorithms have also attempted to detect water or cation mediated base pair formation.[45][49]


Figure 3 |  Descriptions of the hydrogen bonding atoms, along with their precursors, for a typical non-canonical base pair (as used by BPFIND)

Strengths and stabilities

The canonical Watson-Crick base pairs, G:C and A:T/U, as well as most of the non-canonical ones, are stabilized by two or more (e.g. 3 in the case of G:C) hydrogen bonds. Justifiably, a significant amount of research on non-canonical base pairs has been carried out towards benchmarking their strengths (interaction energies) and (geometric) stabilities against those of the canonical base pairs. It may be noted here that base pair geometries, as observed in crystal structures, are often influenced by several interactions present in the crystal environment, which perturb the intrinsically stable geometries arising out of the hydrogen bonding and related interactions between the two bases. Therefore, in principle, it is possible that the observed geometries in some cases are intrinsically unstable, and that they are stabilized by other interactions provided by the environment. Several groups have attempted to determine the interaction energies in these non-canonical base pairs using different quantum chemistry based approaches, such as Density Functional Theory (DFT) or MP2 methods.[50][51][52][53][54][55][56][57][58] These methods were applied to suitably truncated, hydrogen-added, and geometry optimized models of the base (or nucleoside) pairs extracted from PDB structures. Depending upon the optimization protocol, typically three types of interaction energies have been reported. In the first protocol, the base pair model geometries, isolated from their respective environments, are fully optimized without any constraints,[59][51][53][56][57] thus providing the intrinsic geometries and interaction energies of the isolated models. This procedure, however, sometimes leads to optimized base pair geometries involving edges different from those in the initial crystal geometry. Abhijit Mitra and collaborators also used a second protocol, where the heavy atom (non-hydrogen) coordinates are retained as in the crystal geometries, and only the positions of the added hydrogen atoms are optimized.[52][55][58] In the third protocol, followed mostly by Jiri Sponer and his group,[50] optimization was carried out with constraints on some angles and dihedrals. Given that the models are extracted from their respective crystal structures, and are isolated from their crystal environments, the second and the third protocols provide two different approaches towards approximating the environmental effects, without explicit consideration of any specific environmental interactions. This has further been addressed in some reports by considering specific environmental factors, such as coordination with Magnesium, or even some covalent modifications to the bases.[51]
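For orientation, the interaction energy in such studies is commonly defined in the supermolecular sense, as the energy of the pair minus the energies of the two isolated bases, optionally with a basis set superposition error (BSSE) correction. The text above does not spell out the formula used by each group, so the sketch below assumes this standard definition; the function name and the energy values are hypothetical.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509  # approximate conversion factor


def interaction_energy(e_pair, e_base_1, e_base_2, bsse=0.0):
    """Supermolecular interaction energy in kcal/mol.

    e_pair, e_base_1, e_base_2 are total energies in hartree.
    bsse is an optional positive counterpoise correction in hartree.
    Negative results indicate a stabilizing interaction.
    """
    return (e_pair - e_base_1 - e_base_2 + bsse) * HARTREE_TO_KCAL_PER_MOL


# Made-up total energies, for illustration only
print(round(interaction_energy(-10.02, -5.0, -5.0), 2))
```

Under the three protocols described above, the same expression would be evaluated on differently optimized geometries, which is why the reported values can differ for the same base pair.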

All three protocols are useful in their respective contexts. Further, a comparison of the model geometries obtained by the different protocols provides an idea of both the stabilities of the corresponding base pair geometries and the probable extent and nature of environmental influences. It was found that most non-canonical base pairs having two or more hydrogen bonds maintain the same hydrogen bonding pattern in the crystal geometry and in the geometry fully optimized in isolation, thus indicating their intrinsic geometric stability. Interaction energies calculated from these optimized models also indicated the energetic stabilities of the corresponding non-canonical base pairs. The earlier notion that non-canonical base pairs are weaker than the Watson-Crick base pairs was found to be incorrect. The interaction energies of several base pairs, such as G:G tWW, G:G cWH, A:U cHW, G:A cWW, G:U cWW, etc., are found to be larger than that of the canonical A:U cWW base pair.[60]

Of course, not all non-canonical base pairs are strong or stable in terms of interaction energy. Several base pairs have been detected on the basis of weak hydrogen bonds involving C-H…O/N atoms, where the interaction energies are rather small. Further, some of the observed base pairs, in particular, but not limited to, those involving weak hydrogen bonds or those stabilized by single hydrogen bonds, were found to adopt alternate geometries upon geometry optimization,[61][62][63] thus indicating their intrinsic lack of geometric stability. These alterations of hydrogen bonding schemes, giving rise to changes in base pairing family upon free optimization, may have some functional implications in RNA, such as action as conformational switches. Accordingly, as mentioned above for Sponer’s protocol, there have been some attempts to restrain the experimentally observed geometry while carrying out geometry optimization[64] for interaction energy calculations. Interestingly, in several cases, interaction energies calculated for these ‘away from intrinsically stable’ geometries also indicate good energetic stability.

Though the energetics and geometric stabilities of different non-canonical base pairs do not show any generalized correlations, analysis of several databases, such as RNABPDB[65] and RNABP COGEST,[66] which catalogue structural and energetic features of some of the observed base pairs and their stacks, reveals some interesting general trends.

For example, geometry optimizations of several base pairs involving the 2’-OH group of the sugar residue resulted in significant alterations from their initial geometries. This is possibly due to the flexibility of the sugar puckers and glycosidic torsions. The significantly high interaction energies of protonated base pairs, despite the high energy cost of base protonation, also deserve special mention in this context. These can mostly be attributed to the additional charge-induced dipole interactions associated with protonated base pairs.[67]

Non-canonical base pairing Fig4a.png

A

Non-canonical base pairing Fig4b.png

B

Figure 4 |  A) Ade:Gua Trans H:S base pair, an example of frequently observed non-canonical base pair B) Ade:Ura Trans H:W base pair, another frequently observed one

Structural features

Non-canonical base pairing Fig5.png

Figure 5 |  IUPAC recommended Intra Base Pair parameters used to describe geometry of Watson-Crick or Non-Canonical base pair

Structural features of a base pair, formed by two planar rigid units, can be quantified using six parameters: three translational and three rotational. The IUPAC recommended parameters are Propeller, Buckle, Open Angle, Stagger, Shear and Stretch (Figure 7).[68] A brief description of these in the context of DNA double helical structure can be found in Wiki. There are several publicly available software packages, such as Curves[69] by Richard Lavery, 3DNA[70] by Wilma Olson, NUPARM[71][72] by Manju Bansal, etc., which may be used to calculate these parameters. While the first two calculate the parameters of canonical and non-canonical base pairs relative to the standard canonical Watson-Crick base pair geometry, the NUPARM algorithm calculates them in absolute terms using a base pairing edge specific axis system. Hence, for most non-canonical base pairs, which involve non-Watson-Crick edges, some of the parameters (Open, Shear and Stretch) calculated by Curves or 3DNA are usually large even in their respective intrinsically most stable geometries. On the other hand, the values provided by NUPARM indicate the quality of hydrogen bonding and planarity of the two bases in a more realistic fashion. Thus, the NUPARM Stretch values, which indicate the separation of the two bases of a base pair and depend on optimal hydrogen bonding distances, are always around 3 Å. Some other general trends in the values of these parameters are worth noting. Most of the cis base pairs are seen to have Propeller values around -10° and small values of Buckle and Stagger. The Open and Shear values often depend on the positions of the hydrogen bonding atoms. For example, G:U cWW wobble base pairs have Shear values around -2.2 Å, while G:C or A:U cWW base pairs have Shear values around zero. The Open values for most base pairs are close to zero, but they are often rather large for those involving the 2’-OH group of the sugar in the NUPARM derived parameter set. The trans base pairs, however, do not show any systematic trend in their Propeller values.
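
The idea of describing a base pair by three translations and three rotations can be illustrated with a small rigid-body calculation. The sketch below is not the algorithm of Curves, 3DNA or NUPARM: it simply expresses the frame of the second base in the frame of the first, and it assumes (as labeled in the comments) that Shear, Stretch and Stagger lie along the local x, y and z axes, that Buckle, Propeller and Open are rotations about x, y and z, and that the relative rotation is decomposed in the order Rz·Ry·Rx. The function names and the input frames are hypothetical.

```python
import math

def rot(axis, deg):
    """Right-handed rotation matrix about the x, y or z axis (angle in degrees)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return {"x": [[1, 0, 0], [0, c, -s], [0, s, c]],
            "y": [[c, 0, s], [0, 1, 0], [-s, 0, c]],
            "z": [[c, -s, 0], [s, c, 0], [0, 0, 1]]}[axis]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def pair_parameters(origin1, frame1, origin2, frame2):
    """Six rigid-body parameters of base 2 expressed in the frame of base 1.

    frame1 and frame2 are 3x3 matrices whose columns are the x, y, z axes
    of each base in laboratory coordinates (an assumed input convention).
    """
    inv1 = transpose(frame1)  # inverse of an orthonormal frame
    d = [origin2[i] - origin1[i] for i in range(3)]
    # Translations: displacement of origin 2 seen from frame 1 (assumed x, y, z).
    shear, stretch, stagger = (sum(inv1[i][k] * d[k] for k in range(3))
                               for i in range(3))
    # Rotations: relative rotation, decomposed as Rz(open) * Ry(propeller) * Rx(buckle).
    r = matmul(inv1, frame2)
    propeller = math.degrees(math.asin(max(-1.0, min(1.0, -r[2][0]))))
    buckle = math.degrees(math.atan2(r[2][1], r[2][2]))
    opening = math.degrees(math.atan2(r[1][0], r[0][0]))
    return {"shear": shear, "stretch": stretch, "stagger": stagger,
            "buckle": buckle, "propeller": propeller, "open": opening}

# Usage: a made-up pair with a wobble-like shear and a propeller of -10 degrees.
frame_a = rot("z", 40.0)
frame_b = matmul(frame_a, rot("y", -10.0))
print(pair_parameters([0.0, 0.0, 0.0], frame_a, [1.0, 1.0, 0.0], frame_b))
```

The choice of reference frame is what distinguishes the real programs: the same pair of bases yields different Open, Shear and Stretch values depending on whether the axes are tied to the Watson-Crick edge or to the edges actually involved in pairing.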

Roles

RNA

The structural hierarchy in RNA is usually described in terms of a stem-loop 2D secondary structure, which further folds to form its 3D tertiary structure, stabilized by what are referred to as long range tertiary contacts. Most often, the non-canonical base pairs are involved in those tertiary contacts or extra-stem base pairs. For example, some of the non-canonical base pairs in tRNA appear between the D-stem and TψC loops (Figure 5), which are close in the three-dimensional structure. Such base pairing interactions give stability to the L-shaped structure of tRNA. In this region, some base pairs are found to be additionally hydrogen bonded to a third base. Thus, as shown in Figure 5, the 23rd residue is simultaneously paired to the 9th and 12th residues, together forming a base triple, the smallest member of the class of higher order multiplets.

Multiplets

One base, in addition to forming a proper planar base pair with a second base, can often pair with a third base, forming a base triple. One classic example is the DNA triple helix, where the bases of two antiparallel strands form consecutive Watson-Crick base pairs in a double helix and the bases of a third strand form Hoogsteen base pairs with the purine bases of the Watson-Crick base pairs. Many different types of base triples have been reported in the available RNA structures and have been elegantly classified in the literature.[73] Multiplets are, however, not limited to triplets. Four bases giving rise to a base quartet is now well documented in the structure of the G-quadruplex (Figure 3), characteristically found in telomeres. Here, four guanine residues pair among themselves in a cyclic form through the Watson-Crick/Hoogsteen cis (cWH) base pairing scheme, so that each guanine base interacts with two other guanine bases. Three to four such G-quartets stack on top of one another to form a four-stranded DNA structure. In addition to such a cyclic topology, several other topologies of base:base pairing are possible for higher order multiplets such as quartets, pentets, etc.[74]
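
Given a list of detected base pairs, multiplets such as the tRNA triple mentioned earlier (residue 23 paired with residues 9 and 12) can be recovered by treating residues as nodes and base pairs as edges of a graph, and collecting connected groups of three or more residues. The following is a minimal sketch under that assumption; the function name and the residue numbers in the usage lines are illustrative, and a real analysis would also have to check planarity and the hydrogen bonding edges, which this sketch ignores.

```python
from collections import defaultdict

def find_multiplets(pairs):
    """Return sorted residue groups (size >= 3) linked through base pairs.

    pairs: iterable of (residue_i, residue_j) tuples, one per base pair.
    """
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    multiplets = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over every residue reachable through pairings.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        if len(group) >= 3:  # an isolated base pair is not a multiplet
            multiplets.append(sorted(group))
    return multiplets

# Usage: one triple (9, 12, 23), one ordinary pair, and one cyclic quartet.
example_pairs = [(9, 23), (12, 23), (1, 72),
                 (101, 102), (102, 103), (103, 104), (104, 101)]
print(find_multiplets(example_pairs))
```

Both the open topology of the triple and the cyclic topology of the quartet come out as single groups; distinguishing the two would require inspecting how many partners each residue has within its group.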

Double helical regions

Non-canonical base pairs quite frequently appear within double helical regions of RNA. The G:U cWW non-canonical base pairs are seen very frequently within double helical regions, as this base pair is nearly isosteric to the canonical ones.[75][76][77] Owing to complications of strand direction, as elaborated in the Classification section (Table 1), not all types of non-canonical base pairs can be accommodated within double helical regions with anti glycosidic torsion angles. However, many non-canonical base pairs, e.g. A:G tHS (trans Hoogsteen/Sugar edge), A:U tHW (trans Hoogsteen/Watson-Crick), A:G cWW, etc., are often seen within double helical regions, giving rise to symmetric internal loop like motifs. Attempts have been made to classify all such situations where two base pairs (canonical or non-canonical) stack in an antiparallel sense, possibly giving rise to double helical regions in RNA structures.[78] These base pairs are quite stable, and they are able to maintain the helical property quite well. The backbone torsion angles around these residues are also generally within reasonable limits: C3'-endo sugar pucker with anti glycosidic torsion, α/γ around -60°/60°, and β/ε around 180°.

Recurrent structural motifs

Non-canonical base pairs often appear in different structural motifs, including pseudoknots, with their special hydrogen bonding features. Structural features of these recurrent motifs have been archived in searchable databases, such as FR3D[79] and RNA FRABASE.[80] Also, several of these motifs can be identified in a given query PDB file by the NASSAM[81] web-server. Non-canonical base pairs are most frequently detected at the termini of double helical segments, acting as capping residues, often preceding hairpin loops. The most frequently found non-canonical base pair, namely G:A tSH, is an integral part of GNRA tetraloops, where N can be any nucleotide residue and R is a purine residue. This motif shows some flexibility, with its structural features changing depending on whether the guanine and adenine are paired or not. Several other types of tetraloop motifs, such as UNCG, YNMG, GNAC, CUYG, etc. (where Y stands for pyrimidine and M is either adenine or cytosine), have been found in available RNA structures. However, these do not generally show involvement of non-canonical base pairing. In addition to these common hairpin motifs, where the loop residues largely remain unpaired, there are also a few motifs where the loop residues make extensive interactions among themselves or with other residues external to the loop. A common example is the C-loop motif,[82][83] where the bulging loop residues form non-canonical base pairs with the bases of double helical regions (Figure 8). The extra base pairs in these cases give additional stabilization to the composite, double helix containing motif. Non-canonical base pairs are also involved in receptor-loop interactions, such as in the T-loop motif[82] shown in Figure 9.

Figure 8 |  An example of higher order structure (C-loop) in RNA, formed by base triples involving non-canonical base pairs, from PDB ID 1KOG: (a) schematic representation and (b) molecular visualizer view.
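
The tetraloop sequence patterns named above (GNRA, UNCG, YNMG, GNAC, CUYG) use IUPAC ambiguity codes, so candidate loops can be located in a sequence by expanding each code into a character class. The sketch below does only that; it is not how FR3D, RNA FRABASE or NASSAM work, the function names are hypothetical, and a sequence match by itself does not show that the four residues actually form a hairpin loop closed by a base pair.

```python
import re

# IUPAC nucleotide codes needed for the motifs listed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "N": "[ACGU]", "R": "[AG]", "Y": "[CU]", "M": "[AC]"}

TETRALOOP_MOTIFS = ["GNRA", "UNCG", "YNMG", "GNAC", "CUYG"]

def motif_regex(motif):
    """Compile a motif into a regex; the lookahead lets matches overlap."""
    body = "".join(IUPAC[code] for code in motif)
    return re.compile("(?=(" + body + "))")

def find_tetraloop_candidates(sequence):
    """Return sorted (0-based start, motif, matched residues) tuples."""
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input
    hits = []
    for motif in TETRALOOP_MOTIFS:
        for match in motif_regex(motif).finditer(seq):
            hits.append((match.start(), motif, match.group(1)))
    return sorted(hits)

# Usage: a short made-up hairpin sequence with a GAAA loop.
for start, motif, residues in find_tetraloop_candidates("CCGAAAGG"):
    print(start, motif, residues)
```

Because the classes overlap (for example, every UNCG loop with C or U at the second position also satisfies YNMG), one stretch of sequence may be reported under more than one motif name.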
Figure 9 |  An example of the T-loop motif from an extended RNA 2D structural profile, from PDB ID 1U9S: (a) schematic representation and (b) molecular visualizer view. The loop residues 105 and 106 form base triples with the 61:84 Watson-Crick base pair.

Another interesting example of the involvement of non-canonical base pairs in recurrent contexts was detected as the GAAA receptor motif, which consists of an A:A cHS base pair followed by a U:A tWH base pair, stacked on both sides by G:C cWW base pairs. Here we have successive non-canonical base pairs within an antiparallel RNA double helical domain. Similarly, there is an A:A cSH base pair involving two consecutive residues in this motif. Such pairing between consecutive residues, also termed a dinucleotide platform motif, is quite commonly observed. Such platforms appear in many RNA structures; the pairing can also be between other bases and can involve other base pairing edges. Dinucleotide platforms have been reported for A:A, A:G, A:U, G:A and G:U base pairs belonging to the cSH class and also for A:A cHH base pairs. These motifs can alter the strand direction within a double helix by forming kinks. A dinucleotide platform, along with triplet formation, is also an integral component of the Sarcin-ricin motif.[29]

Modeling of RNA structures containing non-canonical base pairs

Prediction of biomolecular structure from sequence alone is a long-term goal of scientists working in the fields of bioinformatics, computational chemistry, statistical physics and computer science. Prediction of protein structures from amino acid sequences by methods like homology modeling, comparative modeling, threading, etc. was successful to some extent owing to the availability of about 1200 unique protein folds. Inspired by the protein experience, there are now several approaches towards predicting RNA structures, albeit with varying degrees of success.
Any comprehensive discussion on RNA modeling is beyond the scope of this article, and one may browse the “List of RNA structure prediction software” to get an idea of the growing interest in this area. Nevertheless, some general observations, as summarized below, may be useful in the current context. Most of the approaches are essentially limited to the prediction of the RNA 2D stem-loop structure, also referred to as RNA secondary structure. For example, minimum free energy prediction of the double helical regions of RNA sequences, based on the energies of base pairing and stacking interactions that are themselves computationally derived from experimental thermodynamic data, was initially introduced by Ruth Nussinov and later by Michael Zuker. This, in turn, has inspired several related modified algorithms, including ones that use data on neighboring group interactions, etc.[75] Most of these approaches, however, mainly consider data on canonical base pairing, with only a few also considering thermodynamic data on Hoogsteen base pairs. Thus, in addition to the computational costs and complications associated with the identification of pseudoknots, all these methods also suffer from the paucity of experimental data on non-canonical base pairs. However, there are also several approaches which attempt to predict the tertiary 3D structure corresponding to a given predicted 2D structure. There are also a few involving 3D fragment based modeling,[84] which are being further facilitated by the increasing availability of motif-wise curated RNA 3D structure data.[75] It is also encouraging to note that there are now some software and servers, such as MC-Fold,[85] RNAPDBee,[86] RNAWolfe,[87] etc., available for exploring non-canonical base pairing in RNA 3D structures.
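
The dynamic programming idea behind the secondary structure prediction methods mentioned above can be shown with the simplest member of that family, the base pair maximization recursion usually associated with Nussinov. The sketch below is not a free energy minimization: it counts pairs instead of summing thermodynamic parameters, it cannot represent pseudoknots, and the only non-canonical pair it allows is the G:U wobble. The minimum hairpin loop length of three residues and the function name are assumptions made for illustration.

```python
def nussinov(sequence, min_loop=3):
    """Maximize the number of nested base pairs.

    Returns (number_of_pairs, dot_bracket_string). Allowed pairs are
    A:U, G:C and the G:U wobble; at least min_loop unpaired residues
    must separate the two partners of any pair.
    """
    seq = sequence.upper()
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0, ""
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs within seq[i..j]

    def with_pair(i, k, j):
        """Pairs in seq[i..j] if residue j pairs with residue k."""
        left = best[i][k - 1] if k > i else 0
        return left + 1 + best[k + 1][j - 1]

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: residue j stays unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    score = max(score, with_pair(i, k, j))
            best[i][j] = score

    # Traceback: recover one optimal set of pairs.
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in allowed and best[i][j] == with_pair(i, k, j):
                structure[k], structure[j] = "(", ")"
                todo.append((i, k - 1))
                todo.append((k + 1, j - 1))
                break
    return best[0][n - 1], "".join(structure)

# Usage: a short made-up hairpin whose innermost pair is a G:U wobble.
print(nussinov("GGGAAAUCC"))
```

Replacing the pair count with stacking and loop free energies turns this recursion into the thermodynamic approach described in the text; extending the `allowed` set to other non-canonical pairs would additionally need energy data that, as noted above, are scarce.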
Some of these methods depend on structural databases of RNA, such as FRABASE,[80] to obtain 3D coordinates of motifs containing non-canonical base pairs and stitch this information together with the 3D structures of double helices containing canonical base pairs. It is relevant in this context to mention the approach towards 3D model building of double helical regions with both canonical and non-canonical base pairs used in 3DNA[88] by Olson or in RNAHelix[89] by Bhattacharyya and Bansal. These software suites use base pair parameters to generate 3D coordinates of individual dinucleotide steps, which can be extended to model double helices of arbitrary length with canonical or non-canonical base pairs.

The above mentioned methods attempt to model a single structure (2D or 3D) of a given RNA sequence. However, growing evidence indicates that a given RNA sequence can adopt an ensemble of structures and possibly interconvert between them.[90] These ensembles adopt different base pairing patterns between different sets of residues.[91] Thus, there are enough pointers to suggest that the focus on modeling single structures has been a bottleneck for accurate modeling of RNA structure. The theoretical prediction of RNA 2D structure, and consequently 3D structure, can also be confirmed by different chemical probing methods. One of the latest such tools is SHAPE (Selective 2′-hydroxyl acylation analyzed by primer extension), and SHAPE-directed RNA secondary structure prediction[92] appears to be most promising. Coupled with mutational profiling, ensembles of RNA structures, which often include non-canonical base pairing, can be experimentally studied using the SHAPE-MaP approach.[93] One of the ways ahead today appears to be an integration of Zuker’s minimum free energy approach with experimentally derived SHAPE data, including simulated SHAPE data, as outlined elsewhere.[94][95]
Conclusion

Hydrogen bond mediated interactions between nucleotide bases, leading to base pair formation, constitute one of the most important classes of attractive interactions which shape the structure, dynamics and function of nucleic acids. With the determination of the structure of double stranded DNA fueling the development and phenomenal growth of molecular biology, nucleic acid research was for a long time focused primarily on the canonical G:C and A:T/U base pairs. However, even in DNA, other types of base pairing, involving different geometries and base pairing partners, have been drawing attention in the context of structural and functional diversity. Non-canonical base pairs are far more abundant in RNA, where a single strand folds on to itself, often without the possibility of complementary canonical base pairs to stabilize the folds. The picture that emerges from ongoing research on the diverse structure, dynamics and function of RNA is that this diversity may be rationalized in terms of the structure, dynamics and stabilities of more than 100 types of base pairs, including non-canonical base pairs. The role of G:U W:W cis base pairs in the context of the Wobble hypothesis, and of Hoogsteen base pairing in the context of triple helices and G-quartet formation, were initial indicators. Most of the tertiary interactions shaping the complex folding and functions of 3D RNA are mediated through non-canonical base pairs. What is particularly notable is that non-canonical base pairs are capable of creating appropriate localized distortions to provide functionally important structural variations, not only in RNA, but even in double stranded DNA. This becomes even more significant in the context of non-canonical base pairs occurring in the A-type double stranded regions of functional RNAs, which play an important role in molecular recognition of base sequence by locally distorting the otherwise inaccessible major groove. Thus, the field of non-canonical base pairing is still quite open for scientific contributions from different directions. In particular, a comprehensive characterization of non-canonical base pairs will have a far reaching impact on RNA biotechnology, both in terms of prediction of structure and in terms of enriching our molecular level understanding of the functioning of non (protein) coding RNA.


Additional information

Acknowledgements

Debasish Mukherjee and Satyabrata Maiti, Saha Institute of Nuclear Physics, Kolkata, INDIA and Antarip Halder and Sohini Bhattacharya, International Institute of Information Technology, Hyderabad, INDIA

Competing interests

The authors declare no competing interests

References

  1. Watson, J. D.; Crick, F. H. C. (1953-04). "Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid". Nature 171 (4356): 737–738. doi:10.1038/171737a0. ISSN 1476-4687. https://www.nature.com/articles/171737a0. 
  2. Nikolova, Evgenia N.; Zhou, Huiqing; Gottardo, Federico L.; Alvey, Heidi S.; Kimsey, Isaac J.; Al-Hashimi, Hashim M. (2013-07). "A historical account of hoogsteen base-pairs in duplex DNA". Biopolymers: n/a–n/a. doi:10.1002/bip.22334. ISSN 0006-3525. http://dx.doi.org/10.1002/bip.22334. 
  3. Westhof, Eric; Fritsch, Valérie (2000-03). "RNA folding: beyond Watson–Crick pairs". Structure 8 (3): R55–R65. doi:10.1016/s0969-2126(00)00112-x. ISSN 0969-2126. http://dx.doi.org/10.1016/s0969-2126(00)00112-x. 
  4. Hoogsteen, K. (1959-10-10). "The structure of crystals containing a hydrogen-bonded complex of 1-methylthymine and 9-methyladenine". Acta Crystallographica 12 (10): 822–823. doi:10.1107/s0365110x59002389. ISSN 0365-110X. http://dx.doi.org/10.1107/s0365110x59002389. 
  5. Courtois, Y.; Fromageot, P.; Guschlbauer, W. (1968-12). "Protonated Polynucleotide Structures. 3. An Optical Rotatory Dispersion Study of the Protonation of DNA". European Journal of Biochemistry 6 (4): 493–501. doi:10.1111/j.1432-1033.1968.tb00472.x. ISSN 0014-2956. http://dx.doi.org/10.1111/j.1432-1033.1968.tb00472.x. 
  6. Patel, Dinshaw J.; Tonelli, Alan E. (1974-10). "Assignment of the proton nmr chemical shifts of the T?N3H and G?N1H proton resonances in isolated AT and GC Watson-Crick base pairs in double-stranded deoxy oligonucleotides in aqueous solution". Biopolymers 13 (10): 1943–1964. doi:10.1002/bip.1974.360131003. ISSN 0006-3525. http://dx.doi.org/10.1002/bip.1974.360131003. 
  7. Seeman, Nadrian C.; Rosenberg, John M.; Suddath, F.L.; Kim, Jung Ja Park; Rich, Alexander (1976-06). "RNA double-helical fragments at atomic resolution: I. The crystal and molecular structure of sodium adenylyl-3′,5′-uridine hexahydrate". Journal of Molecular Biology 104 (1): 109–144. doi:10.1016/0022-2836(76)90005-x. ISSN 0022-2836. http://dx.doi.org/10.1016/0022-2836(76)90005-x. 
  8. Drew, H.R.; Wing, R.M.; Takano, T.; Broka, C.; Tanaka, S.; Itakura, K.; Dickerson, R.E. (1981-05-21). "STRUCTURE OF A B-DNA DODECAMER. CONFORMATION AND DYNAMICS". dx.doi.org. Retrieved 2019-12-17.
  9. Wang, A.H.-J.; Fujii, S.; Van Boom, J.H.; Van Der Marel, G.A.; Van Boeckel, S.A.A.; Rich, A. (1993-07-15). "MOLECULAR STRUCTURE OF R(GCG)D(TATACGC): A DNA-RNA HYBRID HELIX JOINED TO DOUBLE HELICAL DNA". dx.doi.org. Retrieved 2019-12-17.
  10. Heinemann, Udo; Alings, Claudia (1989-11). "Crystallographic study of one turn of G/C-rich B-DNA". Journal of Molecular Biology 210 (2): 369–381. doi:10.1016/0022-2836(89)90337-9. ISSN 0022-2836. http://dx.doi.org/10.1016/0022-2836(89)90337-9. 
  11. Dock-Bregeon, A.C.; Chevrier, B.; Podjarny, A.; Johnson, J.; de Bear, J.S.; Gough, G.R.; Gilham, P.T.; Moras, D. (1989-10). "Crystallographic structure of an RNA helix: [U(UA)6A2"]. Journal of Molecular Biology 209 (3): 459–474. doi:10.1016/0022-2836(89)90010-7. ISSN 0022-2836. http://dx.doi.org/10.1016/0022-2836(89)90010-7. 
  12. Patikoglou, G. A.; Kim, J. L.; Sun, L.; Yang, S.-H.; Kodadek, T.; Burley, S. K. (1999-12-15). "TATA element recognition by the TATA box-binding protein has been conserved throughout evolution". Genes & Development 13 (24): 3217–3230. doi:10.1101/gad.13.24.3217. ISSN 0890-9369. http://dx.doi.org/10.1101/gad.13.24.3217. 
  13. Aishima, J.; Gitti, R.K.; Noah, J.E.; Gan, H.H.; Schlick, T.; Wolberger, C. (2002-12-11). "MATALPHA2 HOMEODOMAIN BOUND TO DNA". dx.doi.org. Retrieved 2019-12-17.
  14. Nair, Deepak T.; Johnson, Robert E.; Prakash, Satya; Prakash, Louise; Aggarwal, Aneel K. (2004-07). "Replication by human DNA polymerase-ι occurs by Hoogsteen base-pairing". Nature 430 (6997): 377–380. doi:10.1038/nature02692. ISSN 0028-0836. http://dx.doi.org/10.1038/nature02692. 
  15. Kitayner, Malka; Rozenberg, Haim; Rohs, Remo; Suad, Oded; Rabinovich, Dov; Honig, Barry; Shakked, Zippora (2010-04). "Diversity in DNA recognition by p53 revealed by crystal structures with Hoogsteen base pairs". Nature Structural & Molecular Biology 17 (4): 423–429. doi:10.1038/nsmb.1800. ISSN 1545-9993. http://dx.doi.org/10.1038/nsmb.1800. 
  16. Ethayathulla, A.S.; Tse, P.W.; Nguyen, S.; Viadiu, H. (2012-04-18). "structure of p73 DNA binding domain tetramer modulates p73 transactivation". dx.doi.org. Retrieved 2019-12-17.
  17. Xu, Yu; McSally, James; Andricioaei, Ioan; Al-Hashimi, Hashim M. (2018-12). "Modulation of Hoogsteen dynamics on DNA recognition". Nature Communications 9 (1): 1473. doi:10.1038/s41467-018-03516-1. ISSN 2041-1723. PMID 29662229. PMC PMC5902632. http://www.nature.com/articles/s41467-018-03516-1. 
  18. Parkinson, Gary N.; Lee, Michael P. H.; Neidle, Stephen (2002-05-26). "Crystal structure of parallel quadruplexes from human telomeric DNA". Nature 417 (6891): 876–880. doi:10.1038/nature755. ISSN 0028-0836. http://dx.doi.org/10.1038/nature755. 
  19. Luu, Kim Ngoc; Phan, Anh Tuân; Kuryavyi, Vitaly; Lacroix, Laurent; Patel, Dinshaw J. (2006-08). "Structure of the Human Telomere in K+Solution: An Intramolecular (3 + 1) G-Quadruplex Scaffold". Journal of the American Chemical Society 128 (30): 9963–9970. doi:10.1021/ja062791w. ISSN 0002-7863. http://dx.doi.org/10.1021/ja062791w. 
  20. Phan, Anh Tuân; Kuryavyi, Vitaly; Luu, Kim Ngoc; Patel, Dinshaw J. (2007-09-25). "Structure of two intramolecular G-quadruplexes formed by natural human telomere sequences in K + solution †". Nucleic Acids Research 35 (19): 6517–6525. doi:10.1093/nar/gkm706. ISSN 1362-4962. http://dx.doi.org/10.1093/nar/gkm706. 
  21. Hendrix, Donna K.; Brenner, Steven E.; Holbrook, Stephen R. (2005-08). "RNA structural motifs: building blocks of a modular biomolecule". Quarterly Reviews of Biophysics 38 (3): 221–243. doi:10.1017/s0033583506004215. ISSN 0033-5835. http://dx.doi.org/10.1017/s0033583506004215. 
  22. Laing, Christian; Jung, Segun; Iqbal, Abdul; Schlick, Tamar (2009-10). "Tertiary Motifs Revealed in Analyses of Higher-Order RNA Junctions". Journal of Molecular Biology 393 (1): 67–82. doi:10.1016/j.jmb.2009.07.089. ISSN 0022-2836. http://dx.doi.org/10.1016/j.jmb.2009.07.089. 
  23. Halder, Sukanya; Bhattacharyya, Dhananjay (2013-11). "RNA structure and dynamics: A base pairing perspective". Progress in Biophysics and Molecular Biology 113 (2): 264–283. doi:10.1016/j.pbiomolbio.2013.07.003. ISSN 0079-6107. http://dx.doi.org/10.1016/j.pbiomolbio.2013.07.003. 
  24. Ananth, P.; Goldsmith, G.; Yathindra, N. (2013-07-16). "An innate twist between Crick's wobble and Watson-Crick base pairs". RNA 19 (8): 1038–1053. doi:10.1261/rna.036905.112. ISSN 1355-8382. http://dx.doi.org/10.1261/rna.036905.112. 
  25. Šponer, Jiří; Leszczynski, Jerzy; Hobza, Pavel (1996-01). "Structures and Energies of Hydrogen-Bonded DNA Base Pairs. A Nonempirical Study with Inclusion of Electron Correlation". The Journal of Physical Chemistry 100 (5): 1965–1974. doi:10.1021/jp952760f. ISSN 0022-3654. http://dx.doi.org/10.1021/jp952760f. 
  26. Saenger, Wolfram (1984). Principles of Nucleic Acid Structure. New York, NY: Springer New York. pp. 1–8. ISBN 978-0-387-90761-1.
  27. Sykes, Michael T.; Levitt, Michael (2005-08). "Describing RNA Structure by Libraries of Clustered Nucleotide Doublets". Journal of Molecular Biology 351 (1): 26–38. doi:10.1016/j.jmb.2005.06.024. ISSN 0022-2836. http://dx.doi.org/10.1016/j.jmb.2005.06.024. 
  28. Cite error: Invalid <ref> tag; no text was provided for refs named :13
  29. Leontis, N. B. (2002-08-15). "The non-Watson-Crick base pairs and their associated isostericity matrices". Nucleic Acids Research 30 (16): 3497–3531. doi:10.1093/nar/gkf481. ISSN 1362-4962. http://dx.doi.org/10.1093/nar/gkf481. 
  30. Nasalean, Lorena; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B. Non-Protein Coding RNAs. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 1–26. ISBN 978-3-540-70833-9.
  31. Stombaugh, Jesse; Zirbel, Craig L.; Westhof, Eric; Leontis, Neocles B. (2009-02-24). "Frequency and isostericity of RNA base pairs". Nucleic Acids Research 37 (7): 2294–2312. doi:10.1093/nar/gkp011. ISSN 0305-1048. http://dx.doi.org/10.1093/nar/gkp011. 
  32. Cite error: Invalid <ref> tag; no text was provided for refs named :02
  33. Cite error: Invalid <ref> tag; no text was provided for refs named :14
  34. Sokoloski, J. E.; Godfrey, S. A.; Dombrowski, S. E.; Bevilacqua, P. C. (2011-08-26). "Prevalence of syn nucleobases in the active sites of functional RNAs". RNA 17 (10): 1775–1787. doi:10.1261/rna.2759911. ISSN 1355-8382. http://dx.doi.org/10.1261/rna.2759911. 
  35. Reichert, J. (2002-01-01). "The IMB Jena Image Library of Biological Macromolecules: 2002 update". Nucleic Acids Research 30 (1): 253–254. doi:10.1093/nar/30.1.253. ISSN 1362-4962. http://dx.doi.org/10.1093/nar/30.1.253. 
  36. "RNA Basepair Catalog". ndbserver.rutgers.edu. Retrieved 2019-12-17.
  37. "RNA Base Pair Database(RNABPDB)". hdrnas.saha.ac.in. Retrieved 2019-12-17.
  38. Bhattacharya, Sohini; Mittal, Shriyaa; Panigrahi, Swati; Sharma, Purshotam; S. P., Preethi; Paul, Rahul; Halder, Sukanya; Halder, Antarip et al. (2015-01-01). "RNABP COGEST: a resource for investigating functional RNAs". Database 2015. doi:10.1093/database/bav011. ISSN 1758-0463. PMID 25776022. PMC PMC4360618. https://academic.oup.com/database/article/doi/10.1093/database/bav011/2433143. 
  39. Chawla, Mohit; Oliva, Romina; Bujnicki, Janusz M.; Cavallo, Luigi (2015-06-27). "An atlas of RNA base pairs involving modified nucleobases with optimal geometries and accurate energies". Nucleic Acids Research 43 (14): 6714–6729. doi:10.1093/nar/gkv606. ISSN 0305-1048. http://dx.doi.org/10.1093/nar/gkv606. 
  40. S. P., Preethi; Sharma, Purshotam; Mitra, Abhijit (2017-01-06). "Structural landscape of base pairs containing post-transcriptional modifications in RNA". dx.doi.org. Retrieved 2019-12-17.
  41. Lu, X.-J. (2003-09-01). "3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures". Nucleic Acids Research 31 (17): 5108–5121. doi:10.1093/nar/gkg680. ISSN 1362-4962. http://dx.doi.org/10.1093/nar/gkg680. 
  42. Berman, H.M.; Olson, W.K.; Beveridge, D.L.; Westbrook, J.; Gelbin, A.; Demeny, T.; Hsieh, S.H.; Srinivasan, A.R. et al. (1992-09). "The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids". Biophysical Journal 63 (3): 751–759. doi:10.1016/s0006-3495(92)81649-1. ISSN 0006-3495. http://dx.doi.org/10.1016/s0006-3495(92)81649-1. 
  43. Lemieux, S. (2002-10-01). "RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire". Nucleic Acids Research 30 (19): 4250–4263. doi:10.1093/nar/gkf540. ISSN 1362-4962. http://dx.doi.org/10.1093/nar/gkf540. 
  44. Cornell, Wendy D.; Cieplak, Piotr; Bayly, Christopher I.; Gould, Ian R.; Merz, Kenneth M.; Ferguson, David M.; Spellmeyer, David C.; Fox, Thomas et al. (1995-05). "A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules". Journal of the American Chemical Society 117 (19): 5179–5197. doi:10.1021/ja00124a002. ISSN 0002-7863. http://dx.doi.org/10.1021/ja00124a002. 
  45. Cite error: Invalid <ref> tag; no text was provided for refs named :15
  46. Das, Jhuma; Mukherjee, Shayantani; Mitra, Abhijit; Bhattacharyya, Dhananjay (2006-10). "Non-Canonical Base Pairs and Higher Order Structures in Nucleic Acids: Crystal Structure Database Analysis". Journal of Biomolecular Structure and Dynamics 24 (2): 149–161. doi:10.1080/07391102.2006.10507108. ISSN 0739-1102. http://www.tandfonline.com/doi/abs/10.1080/07391102.2006.10507108. 
  47. Chawla, Mohit; Sharma, Purshotam; Halder, Sukanya; Bhattacharyya, Dhananjay; Mitra, Abhijit (2011-02-17). "Protonation of Base Pairs in RNA: Context Analysis and Quantum Chemical Investigations of Their Geometries and Stabilities". The Journal of Physical Chemistry B 115 (6): 1469–1484. doi:10.1021/jp106848h. ISSN 1520-6106. https://pubs.acs.org/doi/10.1021/jp106848h. 
  48. Kelly, R. E. A.; Lee, Y. J.; Kantorovich, L. N. (2005-06). "Homopairing Possibilities of the DNA Base Adenine". The Journal of Physical Chemistry B 109 (24): 11933–11939. doi:10.1021/jp050962y. ISSN 1520-6106. http://dx.doi.org/10.1021/jp050962y. 
  49. Cite error: Invalid <ref> tag; no text was provided for refs named :33
  50. Šponer, Judit E.; Leszczynski, Jerzy; Sychrovský, Vladimír; Šponer, Jiří (2005-10). "Sugar Edge/Sugar Edge Base Pairs in RNA: Stabilities and Structures from Quantum Chemical Calculations". The Journal of Physical Chemistry B 109 (39): 18680–18689. doi:10.1021/jp053379q. ISSN 1520-6106. http://dx.doi.org/10.1021/jp053379q.
  51. Oliva, R. (2006-02-06). "Accurate energies of hydrogen bonded nucleic acid base pairs and triplets in tRNA tertiary interactions". Nucleic Acids Research 34 (3): 865–879. doi:10.1093/nar/gkj491. ISSN 0305-1048. http://dx.doi.org/10.1093/nar/gkj491.
  52. Bhattacharyya, Dhananjay; Koripella, Siv Chand; Mitra, Abhijit; Rajendran, Vijay Babu; Sinha, Bhabdyuti (2007-08). "Theoretical analysis of noncanonical base pairing interactions in RNA molecules". Journal of Biosciences 32 (S1): 809–825. doi:10.1007/s12038-007-0082-4. ISSN 0250-5991. http://dx.doi.org/10.1007/s12038-007-0082-4.
  53. Roy, Ashim; Panigrahi, Swati; Bhattacharyya, Malyasri; Bhattacharyya, Dhananjay (2008-03). "Structure, Stability, and Dynamics of Canonical and Noncanonical Base Pairs: Quantum Chemical Studies". The Journal of Physical Chemistry B 112 (12): 3786–3796. doi:10.1021/jp076921e. ISSN 1520-6106. https://pubs.acs.org/doi/10.1021/jp076921e.
  54. Sharma, Purshotam; Mitra, Abhijit; Sharma, Sitansh; Singh, Harjinder; Bhattacharyya, Dhananjay (2008-06). "Quantum Chemical Studies of Structures and Binding in Noncanonical RNA Base pairs: The Trans Watson-Crick:Watson-Crick Family". Journal of Biomolecular Structure and Dynamics 25 (6): 709–732. doi:10.1080/07391102.2008.10507216. ISSN 0739-1102. http://www.tandfonline.com/doi/abs/10.1080/07391102.2008.10507216. 
  55. Sharma, Purshotam; Šponer, Judit E.; Šponer, Jiří; Sharma, Sitansh; Bhattacharyya, Dhananjay; Mitra, Abhijit (2010-03-11). "On the Role of the cis Hoogsteen:Sugar-Edge Family of Base Pairs in Platforms and Triplets—Quantum Chemical Insights into RNA Structural Biology". The Journal of Physical Chemistry B 114 (9): 3307–3320. doi:10.1021/jp910226e. ISSN 1520-6106. http://dx.doi.org/10.1021/jp910226e.
  56. Brovarets’, Ol’ha O.; Yurenko, Yevgen P.; Hovorun, Dmytro M. (2013-06-03). "Intermolecular CH···O/N H-bonds in the biologically important pairs of natural nucleobases: a thorough quantum-chemical study". Journal of Biomolecular Structure and Dynamics 32 (6): 993–1022. doi:10.1080/07391102.2013.799439. ISSN 0739-1102. http://dx.doi.org/10.1080/07391102.2013.799439.
  57. Marino, Tiziana (2014-06). "DFT investigation of the mismatched base pairs (T-Hg-T)3, (U-Hg-U)3, d(T-Hg-T)2, and d(U-Hg-U)2". Journal of Molecular Modeling 20 (6): 2303. doi:10.1007/s00894-014-2303-8. ISSN 1610-2940. http://link.springer.com/10.1007/s00894-014-2303-8.
  58. Mládek, Arnošt; Sharma, Purshotam; Mitra, Abhijit; Bhattacharyya, Dhananjay; Šponer, Jiří; Šponer, Judit E. (2009-02-12). "Trans Hoogsteen/Sugar Edge Base Pairing in RNA. Structures, Energies, and Stabilities from Quantum Chemical Calculations". The Journal of Physical Chemistry B 113 (6): 1743–1755. doi:10.1021/jp808357m. ISSN 1520-6106. https://pubs.acs.org/doi/10.1021/jp808357m.
  59. Cite error: Invalid <ref> tag; no text was provided for refs named :124
  60. Cite error: Invalid <ref> tag; no text was provided for refs named :73
  61. Cite error: Invalid <ref> tag; no text was provided for refs named :152
  62. Cite error: Invalid <ref> tag; no text was provided for refs named :16
  63. Cite error: Invalid <ref> tag; no text was provided for refs named :20
  64. Cite error: Invalid <ref> tag; no text was provided for refs named :132
  65. Cite error: Invalid <ref> tag; no text was provided for refs named :53
  66. Cite error: Invalid <ref> tag; no text was provided for refs named :74
  67. Cite error: Invalid <ref> tag; no text was provided for refs named :112
  68. Dickerson, R.E. (1989). "Definitions and nomenclature of nucleic acid structure components". Nucleic Acids Research 17 (5): 1797–1803. doi:10.1093/nar/17.5.1797. ISSN 0305-1048. http://dx.doi.org/10.1093/nar/17.5.1797. 
  69. Blanchet, C.; Pasi, M.; Zakrzewska, K.; Lavery, R. (2011-05-10). "CURVES+ web server for analyzing and visualizing the helical, backbone and groove parameters of nucleic acid structures". Nucleic Acids Research 39 (suppl): W68–W73. doi:10.1093/nar/gkr316. ISSN 0305-1048. http://dx.doi.org/10.1093/nar/gkr316. 
  70. Lu, Xiang-Jun; Olson, Wilma K (2008-07). "3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures". Nature Protocols 3 (7): 1213–1227. doi:10.1038/nprot.2008.104. ISSN 1754-2189. http://dx.doi.org/10.1038/nprot.2008.104. 
  71. Bansal, M.; Bhattacharyya, D.; Ravi, B. (1995). "NUPARM and NUCGEN: software for analysis and generation of sequence dependent nucleic acid structures". Bioinformatics 11 (3): 281–287. doi:10.1093/bioinformatics/11.3.281. ISSN 1367-4803. http://dx.doi.org/10.1093/bioinformatics/11.3.281. 
  72. Mukherjee, Shayantani; Bansal, Manju; Bhattacharyya, Dhananjay (2006-11-24). "Conformational specificity of non-canonical base pairs and higher order structures in nucleic acids: crystal structure database analysis". Journal of Computer-Aided Molecular Design 20 (10-11): 629–645. doi:10.1007/s10822-006-9083-x. ISSN 0920-654X. http://dx.doi.org/10.1007/s10822-006-9083-x. 
  73. Abu Almakarem, Amal S.; Petrov, Anton I.; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B. (2011-11-02). "Comprehensive survey and geometric classification of base triples in RNA structures". Nucleic Acids Research 40 (4): 1407–1423. doi:10.1093/nar/gkr810. ISSN 1362-4962. http://dx.doi.org/10.1093/nar/gkr810. 
  74. Bhattacharya, Sohini; Jhunjhunwala, Ayush; Halder, Antarip; Bhattacharyya, Dhananjay; Mitra, Abhijit (2019-02-21). "Going beyond base-pairs: topology-based characterization of base-multiplets in RNA". RNA 25 (5): 573–589. doi:10.1261/rna.068551.118. ISSN 1355-8382. http://dx.doi.org/10.1261/rna.068551.118. 
  75. Tabei, Y.; Tsuda, K.; Kin, T.; Asai, K. (2006-05-11). "SCARNA: fast and accurate structural alignment of RNA sequences by matching fixed-length stem fragments". Bioinformatics 22 (14): 1723–1729. doi:10.1093/bioinformatics/btl177. ISSN 1367-4803. http://dx.doi.org/10.1093/bioinformatics/btl177.
  76. Cite error: Invalid <ref> tag; no text was provided for refs named :34
  77. Cite error: Invalid <ref> tag; no text was provided for refs named :43
  78. Cite error: Invalid <ref> tag; no text was provided for refs named :63
  79. Sarver, Michael; Zirbel, Craig L.; Stombaugh, Jesse; Mokdad, Ali; Leontis, Neocles B. (2007-08-11). "FR3D: finding local and composite recurrent structural motifs in RNA 3D structures". Journal of Mathematical Biology 56 (1-2): 215–252. doi:10.1007/s00285-007-0110-x. ISSN 0303-6812. http://dx.doi.org/10.1007/s00285-007-0110-x. 
  80. Popenda, Mariusz; Szachniuk, Marta; Blazewicz, Marek; Wasik, Szymon; Burke, Edmund K; Blazewicz, Jacek; Adamiak, Ryszard W (2010). "RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures". BMC Bioinformatics 11 (1): 231. doi:10.1186/1471-2105-11-231. ISSN 1471-2105. PMID 20459631. PMC 2873543. http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-231.
  81. Hamdani, H. Y.; Appasamy, S. D.; Willett, P.; Artymiuk, P. J.; Firdaus-Raih, M. (2012-07-01). "NASSAM: a server to search for and annotate tertiary interactions and motifs in three-dimensional structures of complex RNA molecules". Nucleic Acids Research 40 (W1): W35–W41. doi:10.1093/nar/gks513. ISSN 0305-1048. PMID 22661578. PMC 3394293. https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gks513.
  82. Petrov, A. I.; Zirbel, C. L.; Leontis, N. B. (2013-08-22). "Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas". RNA 19 (10): 1327–1340. doi:10.1261/rna.039438.113. ISSN 1355-8382. http://dx.doi.org/10.1261/rna.039438.113.
  83. "RNA 3D Motif Atlas". rna.bgsu.edu. Retrieved 2019-12-17.
  84. Baulin, Eugene; Yacovlev, Victor; Khachko, Denis; Spirin, Sergei; Roytberg, Mikhail (2016). "URS DataBase: universe of RNA structures and their motifs". Database 2016: baw085. doi:10.1093/database/baw085. ISSN 1758-0463. PMID 27242032. PMC 4885603. https://academic.oup.com/database/article-lookup/doi/10.1093/database/baw085.
  85. Parisien, Marc; Major, François (2008-03). "The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data". Nature 452 (7183): 51–55. doi:10.1038/nature06684. ISSN 0028-0836. http://dx.doi.org/10.1038/nature06684. 
  86. Zok, Tomasz; Antczak, Maciej; Zurkowski, Michal; Popenda, Mariusz; Blazewicz, Jacek; Adamiak, Ryszard W; Szachniuk, Marta (2018-04-30). "RNApdbee 2.0: multifunctional tool for RNA structure annotation". Nucleic Acids Research 46 (W1): W30–W35. doi:10.1093/nar/gky314. ISSN 0305-1048. http://dx.doi.org/10.1093/nar/gky314. 
  87. Rybarczyk, Agnieszka; Szostak, Natalia; Antczak, Maciej; Zok, Tomasz; Popenda, Mariusz; Adamiak, Ryszard; Blazewicz, Jacek; Szachniuk, Marta (2015-09-02). "New in silico approach to assessing RNA secondary structures with non-canonical base pairs". BMC Bioinformatics 16 (1). doi:10.1186/s12859-015-0718-6. ISSN 1471-2105. http://dx.doi.org/10.1186/s12859-015-0718-6. 
  88. Cite error: Invalid <ref> tag; no text was provided for refs named :8
  89. Bhattacharyya, Dhananjay; Halder, Sukanya; Basu, Sankar; Mukherjee, Debasish; Kumar, Prasun; Bansal, Manju (2017-02). "RNAHelix: computational modeling of nucleic acid structures with Watson–Crick and non-canonical base pairs". Journal of Computer-Aided Molecular Design 31 (2): 219–235. doi:10.1007/s10822-016-0007-0. ISSN 0920-654X. http://link.springer.com/10.1007/s10822-016-0007-0. 
  90. Ray, Partho Sarothi; Jia, Jie; Yao, Peng; Majumder, Mithu; Hatzoglou, Maria; Fox, Paul L. (2008-12-21). "A stress-responsive RNA switch regulates VEGFA expression". Nature 457 (7231): 915–919. doi:10.1038/nature07598. ISSN 0028-0836. http://dx.doi.org/10.1038/nature07598. 
  91. Cruz, José Almeida; Westhof, Eric (2009-02). "The Dynamic Landscapes of RNA Architecture". Cell 136 (4): 604–609. doi:10.1016/j.cell.2009.02.003. ISSN 0092-8674. http://dx.doi.org/10.1016/j.cell.2009.02.003. 
  92. Low, Justin T.; Weeks, Kevin M. (2010-10). "SHAPE-directed RNA secondary structure prediction". Methods 52 (2): 150–158. doi:10.1016/j.ymeth.2010.06.007. ISSN 1046-2023. http://dx.doi.org/10.1016/j.ymeth.2010.06.007. 
  93. Siegfried, Nathan A; Busan, Steven; Rice, Greggory M; Nelson, Julie A E; Weeks, Kevin M (2014-07-13). "RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP)". Nature Methods 11 (9): 959–965. doi:10.1038/nmeth.3029. ISSN 1548-7091. http://dx.doi.org/10.1038/nmeth.3029. 
  94. Montaseri, Soheila; Ganjtabesh, Mohammad; Zare-Mirakabad, Fatemeh (2016-11-28). "Evolutionary Algorithm for RNA Secondary Structure Prediction Based on Simulated SHAPE Data". PLOS ONE 11 (11): e0166965. doi:10.1371/journal.pone.0166965. ISSN 1932-6203. http://dx.doi.org/10.1371/journal.pone.0166965. 
  95. Spasic, Aleksandar; Assmann, Sarah M; Bevilacqua, Philip C; Mathews, David H (2017-11-21). "Modeling RNA secondary structure folding ensembles using SHAPE mapping data". Nucleic Acids Research 46 (1): 314–323. doi:10.1093/nar/gkx1057. ISSN 0305-1048. http://dx.doi.org/10.1093/nar/gkx1057.