Michael T. Sykes

sykes_at_scripps_dot_edu



Nucleotide Doublets

The rapidly increasing wealth of structural information on RNA and knowledge of its varying roles in biology have facilitated the study of RNA structure using computational methods. Here we present a new method to describe RNA structure based on nucleotide doublets, where a doublet is any two nucleotides in a structure. We restrict our search to doublets which are close together in space, but not necessarily in sequence, and obtain doublet libraries of various sizes by clustering a large set of doublets taken from a data set of high resolution RNA structures. We demonstrate that these libraries are able to both capture structural features present in RNA and fit local RNA structure with a high level of accuracy. Libraries ranging in size from 10 to 100 doublets are examined, and a detailed analysis shows that a library with as few as 30 doublets is sufficient to capture the most common structural features, while larger libraries would be more appropriate for accurate modeling. We anticipate many uses for these libraries, from annotation to structure refinement and prediction.

Michael T. Sykes and Michael Levitt. Describing RNA structure by libraries of clustered nucleotide doublets. J. Molecular Biology, 351, pp. 26-38, doi:10.1016/j.jmb.2005.06.024 (2005) PDF

Nucleotide Doubletspace

Download Libraries

Our Nucleotide Doublet Libraries are available for download. Library sizes from 10 to 100 are provided as tar/gzipped files. Each library doublet keeps its original sequence, but all library doublets have been renumbered. Doublets that are connected in chain are renumbered so that the first residue is "1" and the second residue is "2". Doublets that are not connected in chain are renumbered so that the first residue is "1" and the second residue is "3".
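The renumbering convention above can be checked programmatically. The following is a minimal sketch, assuming each library doublet is stored as PDB-format text with standard fixed-column ATOM records (residue number in columns 23-26); the file format and the function names `residue_numbers` and `is_connected` are assumptions made for illustration, not part of the distributed libraries.

```python
def residue_numbers(pdb_text):
    """Collect the distinct residue numbers found in ATOM/HETATM records.

    Assumes standard PDB fixed-column layout, where the residue sequence
    number occupies columns 23-26 (0-based slice 22:26).
    """
    numbers = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            numbers.add(int(line[22:26]))
    return numbers


def is_connected(pdb_text):
    """Return True if the doublet is connected in chain, False otherwise.

    Follows the renumbering convention described above:
    residues 1 and 2 -> connected in chain,
    residues 1 and 3 -> not connected in chain.
    """
    numbers = residue_numbers(pdb_text)
    if numbers == {1, 2}:
        return True
    if numbers == {1, 3}:
        return False
    raise ValueError(f"unexpected residue numbering: {sorted(numbers)}")
```

Calling `is_connected` on the text of a doublet file would then report whether its two nucleotides are adjacent in the chain, without any need to inspect coordinates.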

Terms of Use: The doublet libraries are available free of charge for personal or academic use. Any publication that makes use of these libraries must cite the reference given above (Sykes and Levitt, JMB 2005). For corporate or industrial applications, please contact Michael T. Sykes.

  • Size 10 Library - Information File: HTML/Text
  • Size 20 Library - Information File: HTML/Text
  • Size 30 Library - Information File: HTML/Text
  • Size 40 Library - Information File: HTML/Text
  • Size 50 Library - Information File: HTML/Text
  • Size 100 Library - Information File: HTML/Text
  • Information files include the PDB ID, residue and chain information, as well as the cluster size and <RMSD> to the center.
  • Key to Annotations:
    • W = Watson-Crick base-pair
    • P = Non-canonical base-pair
    • S = Stacked Interaction
    • D = Diagonal Interaction
    • I = Tertiary Packing Interaction
    • C = Connected in chain, no easily definable structure
    • N = Not connected in chain, no easily definable structure
    • A = Platform, similar to an A-platform
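When processing the information files in a script, the annotation key above can be held in a simple lookup table. This is a minimal sketch: the names `ANNOTATION_KEY` and `describe_annotation` are hypothetical, and it assumes each annotation is given as a single letter, which may not match how the information files actually lay them out.

```python
# Annotation letters and their meanings, copied from the key above.
ANNOTATION_KEY = {
    "W": "Watson-Crick base-pair",
    "P": "Non-canonical base-pair",
    "S": "Stacked Interaction",
    "D": "Diagonal Interaction",
    "I": "Tertiary Packing Interaction",
    "C": "Connected in chain, no easily definable structure",
    "N": "Not connected in chain, no easily definable structure",
    "A": "Platform, similar to an A-platform",
}


def describe_annotation(code):
    """Return the description for a single-letter annotation code.

    Leading/trailing whitespace is ignored; an unknown code raises KeyError
    rather than being silently guessed at.
    """
    return ANNOTATION_KEY[code.strip()]
```

For example, `describe_annotation("W")` returns the Watson-Crick description, while a letter outside the key raises `KeyError`.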