The rapidly increasing wealth of structural information on RNA and knowledge of its varying roles in biology have facilitated the study of RNA structure using computational methods. Here we present a new method to describe RNA structure based on nucleotide doublets, where a doublet is any two nucleotides in a structure. We restrict our search to doublets which are close together in space, but not necessarily in sequence, and obtain doublet libraries of various sizes by clustering a large set of doublets taken from a data set of high resolution RNA structures. We demonstrate that these libraries are able to both capture structural features present in RNA and fit local RNA structure with a high level of accuracy. Libraries ranging in size from 10 to 100 doublets are examined, and a detailed analysis shows that a library with as few as 30 doublets is sufficient to capture the most common structural features, while larger libraries would be more appropriate for accurate modeling. We anticipate many uses for these libraries, from annotation to structure refinement and prediction.
Michael T. Sykes and Michael Levitt. Describing RNA structure by libraries of clustered nucleotide doublets. J. Molecular Biology, 351, pp. 26-38, doi:10.1016/j.jmb.2005.06.024 (2005)
Our Nucleotide Doublet Libraries are available for download. Library sizes from 10 to 100 are provided as tar/gzipped files. Each library doublet keeps its original sequence, but all library doublets have been renumbered. Doublets that are connected in the chain are renumbered so that the first residue is "1" and the second residue is "2". Doublets that are not connected in the chain are renumbered so that the first residue is "1" and the second residue is "3".
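The renumbering convention above means chain connectivity can be read straight from the residue numbers. The following is a minimal sketch of that check, not part of the distributed libraries. It assumes the doublet files use PDB-style fixed-width `ATOM`/`HETATM` records, with the residue number in columns 23-26; the helper names `residue_numbers` and `is_connected` are hypothetical.

```python
def residue_numbers(pdb_lines):
    """Collect the distinct residue numbers from PDB-style ATOM/HETATM lines.

    Assumption: the library files follow the PDB fixed-width layout, where
    the residue sequence number occupies columns 23-26 (1-indexed).
    """
    numbers = set()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            numbers.add(int(line[22:26]))
    return sorted(numbers)


def is_connected(numbers):
    """Apply the renumbering convention: (1, 2) is connected, (1, 3) is not."""
    pair = tuple(numbers)
    if pair == (1, 2):
        return True
    if pair == (1, 3):
        return False
    raise ValueError(f"unexpected residue numbering for a doublet: {pair}")
```

For example, `is_connected(residue_numbers(lines))` would return `True` for a doublet whose records carry residue numbers 1 and 2.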
- Size 10 Library - Information File: HTML/Text
- Size 20 Library - Information File: HTML/Text
- Size 30 Library - Information File: HTML/Text
- Size 40 Library - Information File: HTML/Text
- Size 50 Library - Information File: HTML/Text
- Size 100 Library - Information File: HTML/Text
- Information files include the PDB ID, residue and chain information, cluster size, and <RMSD> to the cluster center.
- Key to Annotations:
- W = Watson-Crick base-pair
- P = Non-canonical base-pair
- S = Stacked Interaction
- D = Diagonal Interaction
- I = Tertiary Packing Interaction
- C = Connected in chain, no easily definable structure
- N = Not connected in chain, no easily definable structure
- A = Platform, similar to an A-platform
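When processing the information files in a script, the key above can be held as a simple lookup table. This is a sketch only: the dictionary mirrors the key as listed here, while the name `describe_annotation` and the assumption that each annotation is a single letter are illustrative and not taken from the information files themselves.

```python
# Annotation codes and their meanings, copied from the key above.
ANNOTATION_KEY = {
    "W": "Watson-Crick base-pair",
    "P": "Non-canonical base-pair",
    "S": "Stacked Interaction",
    "D": "Diagonal Interaction",
    "I": "Tertiary Packing Interaction",
    "C": "Connected in chain, no easily definable structure",
    "N": "Not connected in chain, no easily definable structure",
    "A": "Platform, similar to an A-platform",
}


def describe_annotation(code):
    """Return the description for a one-letter annotation code.

    Assumption: annotations appear as single letters; surrounding
    whitespace and letter case are ignored.
    """
    key = code.strip().upper()
    if key not in ANNOTATION_KEY:
        raise ValueError(f"unknown annotation code: {code!r}")
    return ANNOTATION_KEY[key]
```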