I am frequently asked to set up and parameterize difficult systems for molecular dynamics simulation, and thought I would make it easier for others to do the same in the future by writing some tutorials.
Here, we will set up a diubiquitin system with a K11 isopeptide linkage joining two chains. The info here can be applied to any systems that have residues that join in ways different from the usual peptide sequence.
Here we will use the AMBER forcefield, parameters, and file types. You’ll learn about how AMBER defines new resides in perhaps more detail than is necessary. You should have basic experience preparing PDB structures for MD simulation (tutorials here) and have a general understanding of what atom types and forcefields are (helpful review article).
If you want to follow along and check your work against my input files, they are all available here
Necessary software
You’ll need a system setup program of your choice. I use Maestro which is available for free to academics with a pared-down feature set. You can use any system setup program to add hydrogens and define the isopeptide bond.
The system will be solvated and parameterized using Dabble. Dabble can accept input in a variety of formats but is most extensively tested with Maestro format input (.mae). This tutorial was run with Dabble 2.2.1, so your output may be slightly different with a newer version. I like Dabble because I wrote it and it handles both system building and atom naming, and will call tleap for you.
To generate files representing our modified residues we will use the
VMD Python text interface.
It’s a dependency of Dabble, so if you have that installed you can import vmd
in
Python with no issues.
As we will be using the AMBER molecular dynamics forcefields, you’ll need the free AmberTools suite of programs, which includes the system setup program that Dabble invokes as well as force field files.
Setting up the structure
Download the 3NOB structure from the PDB. There are several repeats of the diubiquitin, so delete all the protein chains except E and F. Note the isopeptide bond between chain E Lys11:NZ and chain F Gly76.
Retain the crystallographic waters close to the protein, and delete the sulfate ions. Add hydrogens (I use Maestro’s Protein Preparation wizard) and fill in missing atoms (chain E Arg74). Double check that the NZ on Lys11 has the correct number of hydrogens (1) with the isopeptide bond.
I also add ACE and NMA caps on the protein termini. Note that chain F only gets one cap since the bonded glycine is a terminal residue.
Save your prepared structure as a .mae
file, prepped.mae
.
Develop a plan for parameterization
Attempt to dabble the system
Let’s see if Dabble will handle the system as-is, or at least give an error message that indicates where to start.
dabble -i prepped.mae -o dabbled.prmtop -M TIP3 -w 10. -ff amber
A full explanation of the options for Dabble is found in the documentation. For this system:
-i prepped.mae
Our input file-o dabbled.prmtop
Desired output file and format. This will produce a matching input coordinate file, dabbled.inpcrd.-M TIP3
Build the system in a TIP3 water box, with no membrane-w 10.
Make the water box extend 10 Å from each side of the protein. (For real simulations, you will want a larger value here, as the protein may rotate within a rectangular box this size and interact with itself over the short dimensions).-ff amber
Use AMBER parameters for this system as opposed to charmm, which Dabble also supports.
You will see output where Dabble solvates the protein in a water box and adds 0.150M NaCl. It identifies the special Lys11 as LYN (protonated lysine), and the bonded Gly as a GLY, which is topologically correct but not chemically.
At the end you’ll get a lot of output from tleap with a lot of errors about being unable to find bond, angle, and torsion terms. This is to be expected! We are defining a bond that isn’t in the protein forcefield.
Decide what to do next
The simplest approach seems to find analogies for each of the missing parameters, throw them in a frcmod file, and pass that to Dabble. This will result in a built system, but it will be scientifically invalid! Partial charges for LYS (more so for the protonated LYN dabble thinks it is) and GLY are not accurate here.
Instead, we will do a charge derivation for the two residues bonded together, then generate topological definitions of each residue separately. The workflow will be:
- Get charges and atom types for the linked 2 residues
- Split that into 2 residue definitions
- Modify the residue definitions to be part of a peptide chain
- Collect missing parameters between gaff2 and ff14SB proteins
Understanding how AMBER residue definitions and atom types work will be very helpful here. I’ll cover that as we go along.
Parameterize isopeptide bonded residues
Define new peptide chain residues
AMBER understands protein residues in a linear chain fashion. It’s easy to define any new residue with two bonds, forward and backwards, but pretty much impossible to create a single residue that connects two different parts of the chain.
Taking inspiration from how AMBER handles disulfide bonds, we will define two new residues, LYX and GLX, and manually add a bond between them once the protein chain is built in leap (or rather, Dabble will do this bonding for us).
Obtain a mol2 file containing the two residues and the isopeptide bond with the VMD Python interface:
1 2 3 4 |
|
Now that we have the “residue” to parameterize, we can follow the same steps as the Antechamber tutorial to generate partial charges and assign atom types.
Run antechamber to generate partial charges and assign gaff2 atom types:
antechamber -i join.mol2 -fi mol2 -o join_typed.mol2 -fo mol2 -c bcc -s 2
Then, use parmchk2 to make analogies for any missing parameters in the system:
parmchk2 -i join_typed.mol2 -f mol2 -o join.frcmod -s 2
Examining the output mol2 file shows the assigned atom types seem reasonable. However, the partial charges for the backbone residues are probably suspicious, as we simply omitted the peptide bond and threw in our linked residues with the ends charged. In the next step, we will take partial charges for the backbone from the canonical ff14SB definitions to correct this.
Split residue and assign partial charges
Split the joined residue into two in python. You could also manually edit the file and renumber the atoms, but this is a little easier.
1 2 3 4 5 |
|
As previously mentioned, the partial charges for the backbone residues are incorrect as the molecules were run through antechamber without the peptide bond.
Take the partial charge values from ff14SB, as defined in
$AMBERHOME/dat/leap/parm/amino12.lib
file. Charges are the last number on
each line following the !entry.GLY.unit.atoms
for glycine.
Take only the partial charges for the following backbone atoms– we don’t want to mess up the partial charges just derived for atoms near the isopeptide bond.
Atom Name | antechamber | ff14SB |
---|---|---|
Lys:N | -0.5690 | -0.3479 |
Lys:H | 0.3488 | 0.2747 |
Lys:CA | 0.0956 | -0.2400 |
Lys:HA | 0.0957 | 0.1426 |
Lys:O | -0.3955 | -0.5894 |
Lys:C | 0.4254 | 0.7341 |
Gly:N | -0.1320 | -0.4157 |
Gly:H | 0.4568 | 0.2719 |
Gly:CA | -0.1122 | -0.0252 |
Gly:HA2 | 0.2967 | 0.0698 |
Gly:HA3 | 0.2967 | 0.0698 |
Directly edit the charge column (the last one) in your mol2 files to contain these ff14SB charges for the backbone. You could also edit the charges in the off files obtained in the next step instead.
Generate and refine library files
Now we’ll use that to define library files in tleap:
1 2 3 4 5 6 |
|
Since our modified residues are nearly identical to normal protein ones, we will add the forward and backward peptide linkages by manually editing the library file. We’ll also need to make sure naming is consistent within this file so tleap correctly recognizes our modified residue. Fortunately, the .off library file format is well documented and relatively easy to read.
You could also follow the steps described in this tutorial to use the prepgen program to specify the peptide bond atoms, but it’s much simpler in our case to edit the off files, since the atom names correspond to the usual amino acid ones.
To make sure the modified residues are correctly identified by tleap, change the
unit name (entry.GLX.unit.name
) from "generated"
to "GLX"
.
!entry.GLX.unit.name single str
"GLX"
You’ll also want to update the residue name in the residue table, as it defaults to “****”, which will prevent tleap from correctly identifying it.
!entry.GLX.unit.residues table str name int seq int childseq int startatomx str restype int imagingx
"GLX" 1 8 1 "?" 0
Do the same thing for the residue names in the LYX off file.
To add linkages in the residue definition in the + and - direction in the
amino acid chain, look at the entry.GLX.unit.connect
section. There are two integers,
which initially are both zero. These are the atom indices of the atoms in the residue
that are bound in the C and N directions in the peptide chain, respectively. Indexing
here starts from 1, and a value of 0 indicates that there is no such bond.
Since GLX is a C-terminal residue, it only has a bond in the previous direction.
The atom that is bonded is named “N”, which is the first in the list of atoms in
entry.GLX.unit.atoms
. So the section then becomes:
!entry.GLX.unit.connect array int
1
0
The connectivity section is similar for the bonded lysine residue, except that this residue is in the middle of the chain and the “N” atom (index 1) is bonded to the previous residue, and the “C” atom (index 3) is bonded to the next one.
!entry.LYX.unit.connect array int
1
3
Collect missing parameters
Let’s try to dabble again with our new residue definitions and gaff2 parameters:
dabble -i prepped.mae -o dabbled.prmtop -M TIP3 -w 10. -ff amber -top glx.off -top lyx.off -par join.frcmod
Looking through the output, it recognizes the GLX and LYX residues, but now we get new parameter not found issues from the tleap invocation. All the parameters not found are between lower case (gaff2) atom types and the uppercase ff14SB ones.
These missing parameters are those that link the GLX and LYX into the protein chain. Since mixing forcefields within a residue is not a good idea (and is generally unsupported), we’ll manually make analogies between the gaff2 types and the protein backbond, and generate a frcmod.
This is perhaps the most time consuming step. I drew out both residues and labelled the backbone with both the gaff2 assigned atom types and the ff14SB protein types.
gaff2 type | ff14SB type |
---|---|
n2 | N |
c3 | CT |
hn | H |
c1 | C |
n | N3 |
Then, for each parameter tleap/Dabble said was missing, I searched through the ff14SB parameter list, made an analogy, and added the appropriate line to the parameter modification file.
To find this parameter list, I examined the tleap input file for the forcefield
at $AMBERHOME/dat/leap/cmd/leaprc.protein.ff14SB
. All leaprc files are in this
directory. Then, look for the lines starting with loadamberparams
to find the
actual files containing the bonded terms. In this case, I searched
$AMBERHOME/dat/leap/parm/parm10.dat
and
$AMBERHOME/dat/leap/parm/frcmod.ff14SB
for the analogous parameters.
For example, one of the missing terms is the c1-N-CX angle parameter. Using our
atom type analogies, we look in the parameter files for the C-N-CX angle term and
add the following line to our frcmod in the ANGLE
section:
c1-N -CX 50.0 121.90 ! from C-N-CX
If you want to check your work, here’s my join.frcmod.
Helpful hints for parameter analogies
You may end up making the same analogy twice. For example, the c1-N and C-n2 bonds are both defined based on the C-N ff14SB parameter.
Some terms may be listed in the opposite order than the tleap error message specifies, so if you can’t find a matching parameter, search the opposite way. For example, tleap says it cannot find the O-C-n2 angle, which corresponds to the O-C-N parameter in ff14SB. This is listed in
parm10.dat
as N-C-O.For torsion terms, sometimes there are terms with multiple periodicities for one dihedral type. Don’t forget to take all of them. For example, the O-C-n2-hn dihedral matches the H-N-C-O parameter, which as two terms.
When making analogies, especially for dihedral torsion terms, don’t forget to check for wildcard parameters. I found many of my missing torsion terms were described by the
X-C-N-X
dihedral. Try grepping for the central two atoms in the bond, like “-C- N -”. Make sure that there is no better matching term before taking any wildcard parameters!
Build final system
Pass the special residue library files, the analogous parameters frcmod file, and the original system to Dabble, and it will finally build:
dabble -i prepped.mae -o dabbled.prmtop -M TIP3 -w 10. -ff amber -top glx.off -top lyx.off -par join.frcmod -par analogies.frcmod
Again, the LYX and GLX residues are correctly identified, tleap completes successfully, and when visualizing the resulting system in VMD, the isopeptide bond can be seen.
Hopefully this tutorial helped you set up a more challenging system for simulation, and learn more about how to approach parameterization. It’s not sufficient to get the preparatory programs to give you a prmtop that looks approximately correct– thinking about the parameters within, partial charges, etc, will make sure that the resulting simulations are meaningful.