Skip to content

Running the classical MD simulations

Optional: If you want to learn how to run the classical MD simulations for the solvated lysine then you can follow this optional tutorial. For the main tutorial nothing on this page is necessary.

You will need to install GROMACS; these simulations are not demanding so you can simply install the version that comes with your package manager or from conda-forge (via conda install -c conda-forge gromacs).

  1. All necessary input files are in directories under files/MD/topology.
  2. We make a separate directory for each step and we assume that all these directories are located in one directory (e.g., MD).

Force field updates

Adding the CHARMM36 force field

We want to use the latest CHARMM36 forcefield. Download the tar file from http://mackerell.umaryland.edu/charmm_ff.shtml#gromacs and get charmm36-jul2022.ff.tgz. Unpack and drop it into your $GMXDATA/top directory where the other force fields live.

What is the GMXDATA environment variable?

$GMXDATA is an environment variable that is set when you source the GMXRC file (which you normally do anyway to set up the environment for GROMACS)

. GMXRC

# show where the data files live
echo $GMXDATA

With $GMXDATA/top indicating the directory where the force field files are stored, the installation of the force field is just

cd ${GMXDATA}/top
tar xvf ~/Downloads/charmm36-jul2022.ff.tgz
What to do if you cannot install into GMXDATA?

If you don't have write permission in the GROMACS installation directory then you have a number of options.

  1. Drop the forcefield files into the working directory, i.e., the same directory where you are building your topologies and where you run pdb2gmx; in our case, just in ./top.
  2. Install a local copy of GROMACS in your own home directory. That works very well and is recommended.
  3. Ask your friendly system administrator (or anyone with the power to use sudo to install the files for you).
  4. Install it in an arbitrary directory, e.g., $HOME/gromacs/forcefields and then set the GMXLIB environment variable to inform GROMACS where to find force field files:

    export GMXLIB=$HOME/gromacs/forcefields
    

Updating the GROMACS residue database

We are simulating a neutral lysine, which will be named LYSN but in order for GROMACS to properly recognize it we have to fix an entry in the residuetypes.dat database:

Add to $GMXDATA/top/residuetypes.dat a line to recognize LYSN as protein:

 LYSN Protein

GROMACS would normally recognize LSN as a protein residue but the new CHARMM force field seems to prefer LYSN so we need to update GROMACS ourselves. (This may be fixed in a future version of GROMACS.)

Alternative: Modify a local copy of residuetypes.dat.

Instead of modifying residuetypes.dat in the GROMACS installation directory, you can also first copy it to your current working directory

cp $GMXDATA/top/residuetypes.dat .

and then modify the local copy. This local copy, ./residuetypes.dat will be read by pdb2gmx before the one in the installation directory.

Input files

Input files for all steps can be found under files/MD.

Topology files: If you want to perform all steps starting from the topology files and the initial coordinates then you will only need the input files in the topology directory (files/MD/topology) and no other intermediate files. You don't need to change any files in topology/. The topology files are:

  • lys_capped.pdb: Original structure of the capped lysine (generated with Avogadro peptide builder, neutralized and capped with CHARMM-GUI, and manually relabelled so that ACE and NME are separated "residues" as required for GROMACS).
  • topol.top: GROMACS primary topology file which contains information about all the molecules and force fields ("charmm36-jul2022.ff/forcefield.itp"). The file that was originally produced by pdb2gmx (see below) was manually split into this topol.top file as a base top file and aceKnme.itp for the molecule aceKnme. The ITP file is now included with a #include "aceKnme.itp" line. The file has also been updated with the number of solvent molecules (SOL 987) from the solvation step.
  • aceKnme.itp: GROMACS molecular topology include file (ITP), containing atom types, partial charges, and bonded interactions for the peptide.
  • posre.itp: GROMACS position restraints (only necessary for equilibration simulations with position restraints); not needed here but included for completeness.
  • aceKnme.pdb: Final structure of the peptide with atoms ordered according to the force field ITP file and atoms and residues also labelled according to the force field; guaranteed to include all hydrogens and lysine in its correct deprotonated state. This is the structure that is used as the starting point for simulations.

Set up GROMACS

Before running GROMACS, you need to set up the GROMACS environment by sourcing the GMXRC file that should be on your PATH:

. GMXRC

You should now be able to run the gmx executable so try

gmx -version
It should show something like
                     :-) GROMACS - gmx, 2021.3-macports (-:

                            GROMACS is written by:
     Andrey Alekseenko              Emile Apol              Rossen Apostolov
         Paul Bauer           Herman J.C. Berendsen           Par Bjelkmar
...

Topology generation

Note

You do not have to execute the following steps in Topology generation because all files are already present in files/MD/topology. These steps are listed for completeness so that you know exactly how the files were generated.

cd topology
printf "0\n8\n7n" | gmx pdb2gmx -ff charmm36-jul2022 -water tip3p -f lys_capped.pdb -o aceKnme.pdb  -ter -lys

We use the gmx pdb2gmx command to create the topology files for the charmm36-jul2022 force field. The printf command selects for us

  • 0: neutral Lys
  • 8: None for N-term (because we already have ACE)
  • 7: None for C-term (because we already have NME)

The output should show a total charge of 0.

For easier maintenance we can manually split the topol.top file into a base top file and the aceKnme.itp included topology file for the molecule aceKnme.

Solvate with water

Put in cubic box with at least 1.0 nm between atoms and box edge with gmx solvate:

mkdir solvate
cd solvate

gmx editconf -f ../topology/aceKnme.pdb -bt cubic -d 1.0 -o boxed.pdb

One cannot make the box much smaller without reducing the standard cutoff/rlist required for the CHARMM force field (the rvdw = 1.2 nm will not fit into a much smaller box).

Add water:

gmx solvate -cp boxed.pdb -cs -p ../topology/topol.top -o solvated.pdb

The -p ../topology/topol.top adds a line for 987 solvent molecules with resname (SOL) to the topology file (../topology/topol.top). Note the output and remember how many molecules were added.

Warning

If you ran this step with the existing topology/topol.top file then you have to manually fix the file. If you don't fix the file, the gmx grompp commands in the following steps will complain that the number of atoms in the topology file and the coordinate file do not agree. Follow the next steps to fix your file.

gmx solvate does not alwats properly update the topology.top file if there is already a line with solvent molecules that it added previously.

  1. Open topology.top in your favorite text editor.

  2. Find the [ molecule ] section at the end of the file. Make sure that it has one line for the solvent molecules SOL with the correct number of molecules (which you remember from the gmx solvate step). Your [ molecules ] section should look similar to

    [ molecules ]
    ; Compound        #mols
    aceKnme           1
    SOL               987
    

    at the end of the file. Your number of solvent molecules may slightly differ. (If you want to count yourself, use grep 'OW *SOL' solvated.pdb | wc -l.)

  3. Save the file.

Energy minimization

The initial structure contains some close contacts so we definitely want to remove steric clashes and prepare the system in a low energy conformation so that our MD will not crash on the first few steps.

We run a simple steepest descent minimization with the included emin.mdp file for the run parameters and the solvated.pdb from the previous step.

We first generate the portable binary run input TPR file emin.tpr that uses the initial coordinates (PDB or GRO structure file) together with the topology/force field information (TOP) and the run parameters in the MDP file with the gmx grompp command:

mkdir energy_minimization
cd energy_minimization

gmx grompp -f emin.mdp -c ../solvate/solvated.pdb -p ../topology/topol.top -o emin.tpr

We can now take the TPR file to our favorite supercomputer and run it, or, more likely, just run it on the same machine. We use the GROMACS MD program gmx mdrun:

gmx mdrun -deffnm emin -nt 4 

Here I am using -nt 4 for four threads but you can omit it and let GROMACS choose what you have available. In particular if a GPU is available, GROMACS will use it.

gmx mdrun parameters

See the manual page for gmx mdrun for explanation of all the command line options.

You should see the energy minimization converge with information such as

Steepest Descents converged to Fmax < 1000 in 158 steps
Potential Energy  = -4.3777473e+04
Maximum force     =  9.8633698e+02 on atom 5
Norm of force     =  5.9427117e+01

NVT MD (equilibration)

Equilibrate for 500 ps at 300 K with Berendsen thermostat and a 2 fs time step; this is fairly short and for a real project you might want to run for longer. However, it seems sufficient for the temperature to equilibrate to the target value and for the water to reasonably solvate the peptide.

The run parameters are listed in nvt.mdp and we first generate the md.tpr file and then run the actual MD simulation, which shouldn't take longer than a minute or two:

mkdir NVT
cd NVT

gmx grompp -f nvt.mdp -c ../energy_minimization/emin.gro -p ../topology/topol.top -o md.tpr
gmx mdrun -v -deffnm md -nt 4 -cpi -pin on
gmx mdrun parameters

See the manual page for gmx mdrun for explanation of all the command line options.

The output are the final coordinates (in md.gro) together with other restart information, including velocities (in the md.cpt checkpoint/restart file). Both are used as input for the production simulation.

Warning

For bigger systems we would run at least one additional equilibration simulation where we would switch on pressure coupling with the Berendsen barostat algorithm (which does not produce the correct thermodynamic ensemble but robustly moves the system to the right pressure and is therefore useful for equilibration). However, for this tutorial we directly go to production simulations with the Parrinello-Rahman barostat (which produces the correct thermodynamic ensemble but may result in unstable simulations if one starts too far from the correct density/volume).

MDP parameters

For further details on setting simulation parameters see Molecular dynamics parameters (.mdp options) in the GROMACS manual.

NPT MD (production)

We are now ready to run a production simulation in the NPT ensemble at 300 K and 1 bar with a 2 fs integrator timestep. The 2 fs step is made possible by constraining all bonds involving hydrogens with the LINCS or SETTLE algorithm. We use the Bussi-Parrinello stochastic velocity rescaling thermostat and the Parrinello-Rahman barostat, which produce the correct ensemble, so all frames from our trajectory are samples from the equilibrium ensemble of the peptide. We can therefore pick any of the frames to start ab-initio MD.

The run input parameters are in file md.mdp and we use the md.gro and md.cpt file from the equilibration step as starting values. Here we only run the simulation for 1 ns but because it is such a small system, it should be easily possible to run this simulation for microseconds, especially when a GPU is available:

mkdir NPT
cd NPT

gmx grompp -f md.mdp -c ../NVT/md.gro -t ../NVT/md.cpt -p ../topology/topol.top -o md.tpr
gmx mdrun -v -deffnm md -nt 4 -cpi -pin on
gmx mdrun parameters and performance tips

See the manual page for gmx mdrun for explanation of all the command line options.

See Getting good performance form mdrun for important information on how to make the most out of your hardward when running MD simulations with GROMACS.

The MDP file was set up so that GROMACS would write coordinates to a compressed trajectory XTC file (md.xtc) every 1 ps, and coordinates and velocities to the full-precision TRR file (md.trr) every 100 ps. The XTC file saves disk space with a lossy compression that only saves coordinates with a precision of 0.001 nm, which is sufficient for most analysis of classical MD trajectories; however, as starting points for ab-initio MD we want to use full float32 precision and velocities so we use the TRR file.