Introduction
Find the file top_all27_prot_lipid.rtf HERE. (same as .inp)
Simulation programs have natural limitations in both total size and in the counting variables used to label atoms and molecules. Because simulations can contain thousands of molecules with many tens or hundreds of thousands of atoms, it is convenient to split molecules into segments or chains (which may act as a subset designation, and effectively allow for double-counting of labels). A segment may be all the solvent, a subset of the solute molecules, the protein, etc. “Chain” usually refers to part of a protein—for example a pore protein may consist of six similar iris structures which change shape to open and close the pore. Each of these structures may be specified as a chain.

The focus of the simulation is usually on the solute—the protein, or lipid micelle or bilayer, etc. However, the nature of the solvent is also very important. Even if you decide that your simulation will be an aqueous system, there are many different models of “water”. Things to consider are complexity (“better” models are more expensive in CPU terms) and how the different water models physicality affect the simulation results. As with most uses of theory, you want to use a solvent model which is just “good enough” to display the physical features you require, and limits the computational expense you must pay.

The work this week will focus on these subdivisions necessary to run large simulations, and the effect that different solvent models may have on the overall simulation.

Splitting large systems

The simplest case of splitting a large system might be taking a RCSB/PDB system and splitting and renumbering the protein, solvent, and counter-ions seperately. While it is not necessary to split small proteins up into segments, it is often done if a protein contains more than one similar part.

Example 1: Salicylic Acid Binding Protein 2

1y7i_bio_r_250
This protein, (1y7i at the RCSB) consists of two mirror protein chains, some crystallographic waters, and several salicylic acid molecules, thus it is a good candidate for re-segmenting. Please note that we will not be running an actual simulation on this molecule, and will not be generating a full PSF/PDB pair. Also recall that all of these commands could be entered into a text file which could then be run (or modified for different proteins, and run), instead of re-entering text commands. Using VMD/tkcon, the steps are:

# STEP 1: Build Protein

# Script to split 1y7i into two pdb files, one for each chain.
# (you can run this file with "vmd -dispdev text -e ")

# Load the PDB file
mol delete all

# mol load pdb input/1JNO.pdb
mol pdbload 1y7i

# Select chain A and chain B
# (these chains were found using the Graphics -> Graphical Representations -> Selections menu)
set other "not resname HOH and not resname SAL and not resname MSE"
# this is to extract residues we do not wish to see
# as part of the protein...

set chainA [atomselect top "chain A and $other"]
set chainB [atomselect top "chain B and $other"]

# Write these atoms to separate pdb files
$chainA writepdb chainA.pdb
$chainB writepdb chainB.pdb

# Script to build the protein structure
# Run with "psfgen < >
package require psfgen

# Use the specified CHARMM27 topology file.
topology top_all27_prot_lipid.rtf

# D-Leucine and D-Valine have the same topology as the
# usual L-Leu and L-Val residues, they're just mirror
# images. Hence we can use the existing topology file.
# We also pdbalias an atom in the ethanolamide residue so
# that psfgen doesn't have to guess the position of the atom.
pdbalias residue DLE LEU
pdbalias residue DVA VAL
pdbalias residue HIS HSD
pdbalias atom ETA O OG
pdbalias atom ILE CD1 CD

# Build two segments, one for each chain.
segment SAB1 {
first NONE
last NONE
pdb chainA.pdb
}

segment SAB2 {
first NONE
last NONE
pdb chainB.pdb
}

# Load the coordinates for each segment.
coordpdb chainA.pdb SAB1
coordpdb chainB.pdb SAB2

# Write out the psf file
writepsf protein_SAB.psf

# Guess the positions of missing atoms. As long as all the heavy
# atoms are present, psfgen usually does a very good job of this.
guesscoord
writepdb protein.pdb

mol delete all
mol load psf protein.psf pdb protein.pdb


Although we will not run a simulation on this protein containing the undesirable residues (in the case of the non-water molecules, “undesirable” because we do not have parameters for them), we will create new segments for them and write out their PDBs in case we care to develop parameters later on. (You could also do these steps before you delete the old PDB data, above.)

mol pdbload 1y7i
set wat [atomselect top "resname HOH"]
set sal [atomselect top "resname SAL"]
# the MSE is a bit different— it is part
# of the protein. So we will extract the protein chains
# again, this time with the MSE residue.
set $other "not residue HOH and not residue SAL"
set fullA [atomselect top "chain A and $other"]
set fullB [atomselect top "chain B and $other"]

$wat writepdb protein_water.pdb
$sal writepdb protein_sal.pdb
$fullA writepdb protein_chainA_mse.pdb
$fullB writepdb protein_chainB_mse.pdb

Next: Solvation