Running the classical MD simulations
Optional: If you want to learn how to run the classical MD simulations for the solvated lysine then you can follow this optional tutorial. For the main tutorial nothing on this page is necessary.
You will need to install GROMACS; these
simulations are not demanding so you can simply install the version
that comes with your package manager or from conda-forge (via conda
install -c conda-forge gromacs
).
- All necessary input files are in directories under files/MD/topology.
- We make a separate directory for each step and we assume that all
these directories are located in one directory (e.g.,
MD
).
Force field updates
Adding the CHARMM36 force field
We want to use the latest CHARMM36 forcefield. Download the tar file
from
http://mackerell.umaryland.edu/charmm_ff.shtml#gromacs
and get charmm36-jul2022.ff.tgz. Unpack and drop it into your
$GMXDATA/top
directory where the other force fields live.
What is the GMXDATA
environment variable?
$GMXDATA
is an environment variable that is set when you
source the GMXRC
file (which you normally do anyway to set up the environment
for
GROMACS)
With $GMXDATA/top
indicating the directory where the force field
files are stored, the installation of the force field is just
What to do if you cannot install into GMXDATA
?
If you don't have write permission in the GROMACS installation directory then you have a number of options.
- Drop the forcefield files into the working directory, i.e., the
same directory where you are building your topologies and where
you run
pdb2gmx
; in our case, just in./top
. - Install a local copy of GROMACS in your own home directory. That works very well and is recommended.
- Ask your friendly system administrator (or anyone with the
power to use
sudo
to install the files for you). -
Install it in an arbitrary directory, e.g.,
$HOME/gromacs/forcefields
and then set theGMXLIB
environment variable to inform GROMACS where to find force field files:
Updating the GROMACS residue database
We are simulating a neutral lysine, which will be named LYSN but in
order for GROMACS to properly recognize it we have to fix an entry in
the residuetypes.dat
database:
Add to $GMXDATA/top/residuetypes.dat
a line to recognize LYSN as
protein:
LYSN Protein
GROMACS would normally recognize LSN as a protein residue but the new CHARMM force field seems to prefer LYSN so we need to update GROMACS ourselves. (This may be fixed in a future version of GROMACS.)
Alternative: Modify a local copy of residuetypes.dat
.
Instead of modifying residuetypes.dat
in the GROMACS
installation directory, you can also first copy it to your current
working directory
and then modify the local copy. This local copy,
./residuetypes.dat
will be read by pdb2gmx
before the one in
the installation directory.
Input files
Input files for all steps can be found under files/MD.
Topology files: If you want to perform all steps starting from the
topology files and the initial coordinates then you will only need the
input files in the topology
directory (files/MD/topology) and no
other intermediate files. You don't need to change any files in
topology/
. The topology files are:
lys_capped.pdb
: Original structure of the capped lysine (generated with Avogadro peptide builder, neutralized and capped with CHARMM-GUI, and manually relabelled so that ACE and NME are separated "residues" as required for GROMACS).topol.top
: GROMACS primary topology file which contains information about all the molecules and force fields ("charmm36-jul2022.ff/forcefield.itp"
). The file that was originally produced bypdb2gmx
(see below) was manually split into thistopol.top
file as a base top file andaceKnme.itp
for the molecule aceKnme. The ITP file is now included with a#include "aceKnme.itp"
line. The file has also been updated with the number of solvent molecules (SOL 987
) from the solvation step.aceKnme.itp
: GROMACS molecular topology include file (ITP), containing atom types, partial charges, and bonded interactions for the peptide.posre.itp
: GROMACS position restraints (only necessary for equilibration simulations with position restraints); not needed here but included for completeness.aceKnme.pdb
: Final structure of the peptide with atoms ordered according to the force field ITP file and atoms and residues also labelled according to the force field; guaranteed to include all hydrogens and lysine in its correct deprotonated state. This is the structure that is used as the starting point for simulations.
Set up GROMACS
Before running GROMACS, you need to set up the GROMACS environment by
sourcing the GMXRC
file that should be on your PATH
:
You should now be able to run the gmx
executable so try
:-) GROMACS - gmx, 2021.3-macports (-:
GROMACS is written by:
Andrey Alekseenko Emile Apol Rossen Apostolov
Paul Bauer Herman J.C. Berendsen Par Bjelkmar
...
Topology generation
Note
You do not have to execute the following steps in Topology generation because all files are already present in files/MD/topology. These steps are listed for completeness so that you know exactly how the files were generated.
cd topology
printf "0\n8\n7n" | gmx pdb2gmx -ff charmm36-jul2022 -water tip3p -f lys_capped.pdb -o aceKnme.pdb -ter -lys
We use the gmx
pdb2gmx
command to create the topology files for the charmm36-jul2022 force
field. The printf
command selects for us
- 0: neutral Lys
- 8: None for N-term (because we already have ACE)
- 7: None for C-term (because we already have NME)
The output should show a total charge of 0.
For easier maintenance we can manually split the topol.top
file into
a base top file and the aceKnme.itp
included topology file for the
molecule aceKnme.
Solvate with water
Put in cubic box with at least 1.0 nm between atoms and box edge with
gmx solvate
:
One cannot make the box much smaller without reducing the standard
cutoff/rlist required for the CHARMM force field (the rvdw = 1.2 nm
will not fit into a much smaller box).
Add water:
The -p ../topology/topol.top
adds a line for 987 solvent molecules
with resname (SOL) to the topology file
(../topology/topol.top
). Note the output and remember how many
molecules were added.
Warning
If you ran this step with the existing topology/topol.top
file then
you have to manually fix the file. If you don't fix the file,
the gmx grompp
commands in the following steps will complain
that the number of atoms in the topology file and the coordinate
file do not agree. Follow the next steps to fix your file.
gmx solvate
does not alwats properly update the topology.top
file
if there is already a line with solvent molecules that it added
previously.
-
Open
topology.top
in your favorite text editor. -
Find the
[ molecule ]
section at the end of the file. Make sure that it has one line for the solvent moleculesSOL
with the correct number of molecules (which you remember from thegmx solvate
step). Your[ molecules ]
section should look similar toat the end of the file. Your number of solvent molecules may slightly differ. (If you want to count yourself, use
grep 'OW *SOL' solvated.pdb | wc -l
.) -
Save the file.
Energy minimization
The initial structure contains some close contacts so we definitely want to remove steric clashes and prepare the system in a low energy conformation so that our MD will not crash on the first few steps.
We run a simple steepest descent minimization with the included
emin.mdp
file for the run parameters and the solvated.pdb
from
the previous step.
We first generate the portable binary run input TPR file emin.tpr
that uses
the initial coordinates (PDB or GRO structure file) together with the
topology/force field information (TOP) and the
run parameters in the MDP file with the gmx
grompp
command:
mkdir energy_minimization
cd energy_minimization
gmx grompp -f emin.mdp -c ../solvate/solvated.pdb -p ../topology/topol.top -o emin.tpr
We can now take the TPR file to our favorite supercomputer and run it,
or, more likely, just run it on the same machine. We use the GROMACS
MD program gmx
mdrun
:
Here I am using -nt 4
for four threads but you can omit it and let
GROMACS choose what you have available. In particular if a GPU is
available, GROMACS will use it.
gmx mdrun parameters
See the manual page for gmx
mdrun
for explanation of all the command line options.
You should see the energy minimization converge with information such as
Steepest Descents converged to Fmax < 1000 in 158 steps
Potential Energy = -4.3777473e+04
Maximum force = 9.8633698e+02 on atom 5
Norm of force = 5.9427117e+01
NVT MD (equilibration)
Equilibrate for 500 ps at 300 K with Berendsen thermostat and a 2 fs time step; this is fairly short and for a real project you might want to run for longer. However, it seems sufficient for the temperature to equilibrate to the target value and for the water to reasonably solvate the peptide.
The run parameters are listed in nvt.mdp
and we first generate the
md.tpr
file and then run the actual MD simulation, which shouldn't
take longer than a minute or two:
mkdir NVT
cd NVT
gmx grompp -f nvt.mdp -c ../energy_minimization/emin.gro -p ../topology/topol.top -o md.tpr
gmx mdrun -v -deffnm md -nt 4 -cpi -pin on
gmx mdrun parameters
See the manual page for gmx
mdrun
for explanation of all the command line options.
The output are the final coordinates (in md.gro
) together with other
restart information, including velocities (in the md.cpt
checkpoint/restart file). Both are used as input for the production
simulation.
Warning
For bigger systems we would run at least one additional equilibration simulation where we would switch on pressure coupling with the Berendsen barostat algorithm (which does not produce the correct thermodynamic ensemble but robustly moves the system to the right pressure and is therefore useful for equilibration). However, for this tutorial we directly go to production simulations with the Parrinello-Rahman barostat (which produces the correct thermodynamic ensemble but may result in unstable simulations if one starts too far from the correct density/volume).
MDP parameters
For further details on setting simulation parameters see Molecular dynamics parameters (.mdp options) in the GROMACS manual.
NPT MD (production)
We are now ready to run a production simulation in the NPT ensemble at 300 K and 1 bar with a 2 fs integrator timestep. The 2 fs step is made possible by constraining all bonds involving hydrogens with the LINCS or SETTLE algorithm. We use the Bussi-Parrinello stochastic velocity rescaling thermostat and the Parrinello-Rahman barostat, which produce the correct ensemble, so all frames from our trajectory are samples from the equilibrium ensemble of the peptide. We can therefore pick any of the frames to start ab-initio MD.
The run input parameters are in file md.mdp
and we use the
md.gro
and md.cpt
file from the equilibration step as starting
values. Here we only run the simulation for 1 ns but because it is
such a small system, it should be easily possible to run this
simulation for microseconds, especially when a GPU is available:
mkdir NPT
cd NPT
gmx grompp -f md.mdp -c ../NVT/md.gro -t ../NVT/md.cpt -p ../topology/topol.top -o md.tpr
gmx mdrun -v -deffnm md -nt 4 -cpi -pin on
gmx mdrun parameters and performance tips
See the manual page for gmx
mdrun
for explanation of all the command line options.
See Getting good performance form mdrun for important information on how to make the most out of your hardward when running MD simulations with GROMACS.
The MDP file was set up so that GROMACS would write coordinates to a
compressed trajectory XTC file (md.xtc
) every 1 ps, and coordinates
and velocities to the full-precision TRR file (md.trr
) every 100
ps. The XTC file saves disk space with a lossy compression that only
saves coordinates with a precision of 0.001 nm, which is sufficient
for most analysis of classical MD trajectories; however, as starting
points for ab-initio MD we want to use full float32 precision and
velocities so we use the TRR file.