Structural Alignment with Theseus

Developed at Volkamer Lab, Charité/FU Berlin

Enes Kurnaz

Reference

Theseus

Theseus superposes a set of macromolecular structures simultaneously using the method of maximum like-lihood (ML), rather than the conventional least-squares criterion. Theseus assumes that the structures are distributed according to a matrix Gaussian distribution and that the eigenvalues of the atomic covariancematrix are hierarchically distributed according to an inverse gamma distribution. This ML superpositioning model produces much more accurate results by essentially downweighting variable regions of the structuresand by correcting for correlations among atoms.

  • https://theobald.brandeis.edu/theseus/

  • Optimal simultaneous superpositioning of multiple structures with missing data. Theobald, Douglas L. & Steindel, Philip A. (2012) Bioinformatics 28 (15): 1972-1979 (Link)

  • Accurate structural correlations from maximum likelihood superpositions. Theobald, Douglas L. & Wuttke, Deborah S. (2008) PLOS Computational Biology 4(2):e43 (Link)

Introduction

In this tutorial we use Theseus as a python wrapper.

What are the chosen structures

Names of the structures

  • 2BBM (Drosophila melanogaster)

  • 1CFC (Xenopus laevis)

Information of the structures

  • Length: 148

  • 97% sequence identity (145/148), 99% similar

  • Contain calmodulin

Why have they been chosen

  • In 2BBM, the two calcium-binding domains are wrapped around a peptide.

  • In 1CFC is no calcium and no peptide, and the linker between the two domains is flexible.

(Source)

Theory

RMSD

The RMSD is the average distance between the atoms of superposed structures in Angstrom.

coverage

The coverage of the aligned structures.

Preparation

Getting the structure in python

First thing you need to do is to download the proteins and pass them to opencadd. We do that with the Structure objects and the .from_pdbid() class method.

[1]:
from opencadd.structure.core import Structure

structure1 = Structure.from_pdbid("2BBM")
structure2 = Structure.from_pdbid("1CFC")

2BBM has a single segment, but 1CFC has 25 different chains!

[3]:
len(structure1.models), len(structure2.models)
[3]:
(1, 25)

We can splice a Structure into a sub-AtomGroup and recreate the Structure:

[4]:
structure2_onemodel = Structure.from_atomgroup(structure2.models[0])

Alignment

Use your method and explain the steps it takes.

First we import TheseusAligner and instantiate the engine. Then we use the .calculate() method which needs structures. Here we will use the structures structure1 and structure2_onemodel declared before. We will do a dry run first (to assess the initial state) and then perform the superposition itself.

[5]:
from opencadd.structure.superposition.engines.theseus import TheseusAligner
dry_aligner = TheseusAligner(statistics_only=True)
dry_aligner.calculate([structure1, structure2_onemodel])
[5]:
{'scores': {'rmsd': 34.88939},
 'metadata': {'transformation': array([[ 1.,  0.,  0.,  0.],
         [ 0.,  1.,  0., -0.],
         [-0.,  0.,  1., -0.]]),
  'least_squares': 10.0717,
  'maximum_likelihood': 6.41057,
  'log_marginal_likelihood': -3381.82,
  'aic': -4325.57,
  'bic': -4933.1,
  'omnibus_chi_square': 7.65,
  'hierarchical_var_chi_square': 5.38,
  'rotational_translational_covar_chi_square': 7.65,
  'hierarchical_minimum_var': 7.21,
  'hierarchical_minimum_sigma': 2.68,
  'skewness': -0.0,
  'skewness_z': 0.0,
  'kurtosis': -0.9,
  'kurtosis_z': 5.46,
  'data_pts': 888.0,
  'free_params': 457.0,
  'd_p': 1.9,
  'median_structure': 1.0,
  'n_total': 296.0,
  'n_atoms': 148.0,
  'n_structures': 2.0,
  'total_rounds': 101.0},
 'superposed': [<Universe with 2704 atoms>, <Universe with 2262 atoms>]}
[5]:
aligner = TheseusAligner()
results = aligner.calculate([structure1, structure2_onemodel])
results
/home/jaime/.conda/envs/structuralalignment/lib/python3.8/site-packages/MDAnalysis/coordinates/PDB.py:914: UserWarning: Found no information for attr: 'tempfactors' Using default value of '0.0'
  warnings.warn("Found no information for attr: '{}'"
[5]:
{'scores': {'rmsd': 9.02821},
 'metadata': {'transformation': array([[ -0.924331 ,   0.3733212,   0.0790154, -10.305    ],
         [ -0.3013137,  -0.5869994,  -0.7514265,  -6.0737   ],
         [ -0.2341415,  -0.7183753,   0.6550685,  -1.669    ]]),
  'least_squares': 2.60622,
  'maximum_likelihood': 1.70454,
  'log_marginal_likelihood': -2029.66,
  'aic': -2973.42,
  'bic': -3580.94,
  'omnibus_chi_square': 3.82,
  'hierarchical_var_chi_square': 1.36,
  'rotational_translational_covar_chi_square': 3.82,
  'hierarchical_minimum_var': 0.762,
  'hierarchical_minimum_sigma': 0.873,
  'skewness': -0.0,
  'skewness_z': 0.0,
  'kurtosis': -1.11,
  'kurtosis_z': 6.74,
  'data_pts': 888.0,
  'free_params': 457.0,
  'd_p': 1.9,
  'median_structure': 1.0,
  'n_total': 296.0,
  'n_atoms': 148.0,
  'n_structures': 2.0,
  'total_rounds': 28.0},
 'superposed': [<Universe with 2704 atoms>, <Universe with 2262 atoms>]}

Analysis

NGLview

If you have trouble with NGLview, follow this troubleshooting guide.

[6]:
import nglview as nv
view = nv.show_mdanalysis(results["superposed"][0].atoms)
view.add_component(results["superposed"][1].atoms)
view

In this case, the scores are not too great. We go from 34.9A to 9A, but the visual overlap is not that great.