Structural Alignment with Theseus
Developed at Volkamer Lab, Charité/FU Berlin
Enes Kurnaz
Reference
Theseus
Theseus superposes a set of macromolecular structures simultaneously using the method of maximum like-lihood (ML), rather than the conventional least-squares criterion. Theseus assumes that the structures are distributed according to a matrix Gaussian distribution and that the eigenvalues of the atomic covariancematrix are hierarchically distributed according to an inverse gamma distribution. This ML superpositioning model produces much more accurate results by essentially downweighting variable regions of the structuresand by correcting for correlations among atoms.
Optimal simultaneous superpositioning of multiple structures with missing data. Theobald, Douglas L. & Steindel, Philip A. (2012) Bioinformatics 28 (15): 1972-1979 (Link)
Accurate structural correlations from maximum likelihood superpositions. Theobald, Douglas L. & Wuttke, Deborah S. (2008) PLOS Computational Biology 4(2):e43 (Link)
Introduction
In this tutorial we use Theseus as a python wrapper.
What are the chosen structures
Names of the structures
2BBM (Drosophila melanogaster)
1CFC (Xenopus laevis)
Information of the structures
Length: 148
97% sequence identity (145/148), 99% similar
Contain calmodulin
Why have they been chosen
In 2BBM, the two calcium-binding domains are wrapped around a peptide.
In 1CFC is no calcium and no peptide, and the linker between the two domains is flexible.
(Source)
Theory
RMSD
The RMSD is the average distance between the atoms of superposed structures in Angstrom.
coverage
The coverage of the aligned structures.
Preparation
Getting the structure in python
First thing you need to do is to download the proteins and pass them to opencadd
. We do that with the Structure
objects and the .from_pdbid()
class method.
[1]:
from opencadd.structure.core import Structure
structure1 = Structure.from_pdbid("2BBM")
structure2 = Structure.from_pdbid("1CFC")
2BBM
has a single segment, but 1CFC
has 25 different chains!
[3]:
len(structure1.models), len(structure2.models)
[3]:
(1, 25)
We can splice a Structure into a sub-AtomGroup and recreate the Structure:
[4]:
structure2_onemodel = Structure.from_atomgroup(structure2.models[0])
Alignment
Use your method and explain the steps it takes.
First we import TheseusAligner
and instantiate the engine. Then we use the .calculate()
method which needs structures. Here we will use the structures structure1
and structure2_onemodel
declared before. We will do a dry run first (to assess the initial state) and then perform the superposition itself.
[5]:
from opencadd.structure.superposition.engines.theseus import TheseusAligner
dry_aligner = TheseusAligner(statistics_only=True)
dry_aligner.calculate([structure1, structure2_onemodel])
[5]:
{'scores': {'rmsd': 34.88939},
'metadata': {'transformation': array([[ 1., 0., 0., 0.],
[ 0., 1., 0., -0.],
[-0., 0., 1., -0.]]),
'least_squares': 10.0717,
'maximum_likelihood': 6.41057,
'log_marginal_likelihood': -3381.82,
'aic': -4325.57,
'bic': -4933.1,
'omnibus_chi_square': 7.65,
'hierarchical_var_chi_square': 5.38,
'rotational_translational_covar_chi_square': 7.65,
'hierarchical_minimum_var': 7.21,
'hierarchical_minimum_sigma': 2.68,
'skewness': -0.0,
'skewness_z': 0.0,
'kurtosis': -0.9,
'kurtosis_z': 5.46,
'data_pts': 888.0,
'free_params': 457.0,
'd_p': 1.9,
'median_structure': 1.0,
'n_total': 296.0,
'n_atoms': 148.0,
'n_structures': 2.0,
'total_rounds': 101.0},
'superposed': [<Universe with 2704 atoms>, <Universe with 2262 atoms>]}
[5]:
aligner = TheseusAligner()
results = aligner.calculate([structure1, structure2_onemodel])
results
/home/jaime/.conda/envs/structuralalignment/lib/python3.8/site-packages/MDAnalysis/coordinates/PDB.py:914: UserWarning: Found no information for attr: 'tempfactors' Using default value of '0.0'
warnings.warn("Found no information for attr: '{}'"
[5]:
{'scores': {'rmsd': 9.02821},
'metadata': {'transformation': array([[ -0.924331 , 0.3733212, 0.0790154, -10.305 ],
[ -0.3013137, -0.5869994, -0.7514265, -6.0737 ],
[ -0.2341415, -0.7183753, 0.6550685, -1.669 ]]),
'least_squares': 2.60622,
'maximum_likelihood': 1.70454,
'log_marginal_likelihood': -2029.66,
'aic': -2973.42,
'bic': -3580.94,
'omnibus_chi_square': 3.82,
'hierarchical_var_chi_square': 1.36,
'rotational_translational_covar_chi_square': 3.82,
'hierarchical_minimum_var': 0.762,
'hierarchical_minimum_sigma': 0.873,
'skewness': -0.0,
'skewness_z': 0.0,
'kurtosis': -1.11,
'kurtosis_z': 6.74,
'data_pts': 888.0,
'free_params': 457.0,
'd_p': 1.9,
'median_structure': 1.0,
'n_total': 296.0,
'n_atoms': 148.0,
'n_structures': 2.0,
'total_rounds': 28.0},
'superposed': [<Universe with 2704 atoms>, <Universe with 2262 atoms>]}
Analysis
NGLview
If you have trouble with NGLview, follow this troubleshooting guide.
[6]:
import nglview as nv
view = nv.show_mdanalysis(results["superposed"][0].atoms)
view.add_component(results["superposed"][1].atoms)
view
In this case, the scores are not too great. We go from 34.9A
to 9A
, but the visual overlap is not that great.