next up previous contents index
Next: Using HFB in your research Up: Up   Contents   Index

The Harmonic Fourier Beads Method as implemented under the TREK module of CHARMM

The implementation is based on the following papers:

Paper #1: The gradient-free HFB
Harmonic Fourier beads method for studying rare events on rugged energy surfaces
Ilja V. Khavrutskii, Karunesh Arora, and Charles L. Brooks, III
J. Chem. Phys. 125, 174108 (2006)
Paper #2: The gradient-augmented HFB
Gradient-augmented harmonic Fourier beads optimization for quantitative studies of reaction path ensembles
Ilja V. Khavrutskii and Charles L. Brooks, III
J. Chem. Phys. submitted in 2006. (expected 2007)
Paper #3: The generalized gradient-augmented HFB (atomic positions; center of mass positions)
Generalized gradient-augmented harmonic Fourier beads method with multiple atomic and/or center-of-mass positional restraints
Ilja V. Khavrutskii and J. Andrew McCammon
J. Chem. Phys. 127, 124901 (2007)
Paper #4: The generalized gradient-augmented HFB (generalized coordinates)
Computing accurate potentials of mean force in electrolyte solutions with the gradient-augmented harmonic Fourier beads method
Ilja V. Khavrutskii, Joachim Dzubiella and J. Andrew McCammon
J. Chem. Phys. submitted in 2007

Anyone considering using the HFB method is strongly encouraged to read the above two papers to get an idea of the method. Nevertheless, the documentation page attempts to be sufficient for an anxious person to get started running HFB optimization. Reading the papers will still increase your productivity, though.

General purpose of the Harmonic Fourier Beads (HFB) method

The purpose of the HFB method is to study various transformations between any two distinct states of a given system that could be described by some degrees of freedom, at present Cartesian coordinates of a subset of atoms of the full system, that are actively involved in or characteristic of the studied transformation.

The HFB method can generate and optimize reaction paths between the given end-point states of a particular system.
To generate an initial path the HFB method uses a procedure called activated evolution (see below).
Alternatively, the HFB method can use a path obtained by any other means in the form of a sequence of structures progressing from one end-point state to the other (for example a protein unfolding trajectory that would connect the native state to some arbitrary unfolded state of the protein).
The subsequent HFB path optimization provides either

  1. the Steepest Descent (SD) path on the corresponding potential energy surface (minimum potential energy path)
  2. the SD path on the corresponding free energy surface (minimum free energy reaction path ensemble)

Therefore, the HFB method is in the spirit of Kenichi Fukui's IRC reaction path approach further generalized to free energy surfaces.

Setting up the HFB simulation

Partitioning the system

For the HFB method to provide meaningful free-energy information, the system under investigation must be partitioned into the reaction or reactive coordinate space (RCS) and the complementary spectator coordinate space (SCS). To partition the system into RCS/SCS, one simply includes atoms (degrees of freedom) that appear to be important (active) for the studied transformation in the RCS, leaving the remaining (passive) atoms (degrees of freedom) in the SCS. The chosen RCS/SCS partition should be able to distinguish the two end-point states based on the RCS degrees of freedom (it will become clear within the first HFB optimization steps if appropriate RCS has been chosen). For example, for a study of a protein backbone conformational transitions one would put heavy backbone atoms (those that define Ramachandran backbone dihedral angles) into the RCS, leaving the side-chains and other remaining atoms in the SCS. If not sure what to put into the RCS then put all the heavy atoms, except for rotationaly degenerate ones (HARI space, see my Line Integral paper for the definition).

For the path optimization on the potential energy surface, partitioning the system might also be useful (for example, when the two end-point states come from different sources, such as different pdb structures of the same system where certain atoms of one structure might require permutations to match the sequence of the other structure). However, in the case of the potential energy path optimization one needs to be aware that the final potential energy profile calculation (see below) assumes that the coordinates in the SCS change continuously rather than abruptly. Therefore, it is a good idea when working on potential energy surfaces to keep as many atoms in the RCS as possible so that there are no surprises in the SCS. One popular partitioning scheme for the potential energy path optimization would include all the atoms unique with respect to any internal rotations into the RCS (HARI space). Another obvious partitioning scheme is to include all the atoms into the RCS.

It might be worth noting that energy-degenerate permutations of atoms within a functional group such as methyl, phenyl, tyrosyl etc., including rotations of the group as a whole do not need to be continuous in the SCS. However, when permutational discontinuities are encountered, the SCS atoms in the newly generated reference path might have significantly distorted corresponding bonds and angles. This behavior is considered to be normal and is expected. Should one wish to remove such distortions of the SCS, the RCS coordinates must be fixed and the SCS coordinates then optimized. The energy profile as computed by the HFB method will be accurate despite the permutations, indicating the robustness of the HFB method. Finally, in the case of the free energy optimization, the SCS coordinates will always be scrambled by averaging procedure. This is the correct behavior for the ensemble.

Path (ensemble) optimization

To optimize any path the HFB method discretizes the Fourier path into a sequence of reference beads, each bead representing the complete copy of the studied system. The sequence of beads progresses from the reactant to the product end-point state.
During a single HFB optimization step each bead in the sequence is evolved with respect to its corresponding reference independently of any other beads. To evolve the beads the HFB method employs mass-weighted harmonic restraints on the RCS degrees of freedom that are centered at the corresponding reference beads. The harmonic restraints modify the underlying surface into a strongly convex surface (even in the vicinity of transition states or saddle points of the original surface).
To be able to drive the evolution of the beads, the HFB method requires appropriate engines. Standard potential energy optimization engines such as SD, CG, ABNR available in CHARMM, or any other optimizers from any other program can be used with mode (1). For the mode (2) any molecular dynamics (MD), Langevin dyanmics (LD) or Monte Carlo (MC) engine in CHARMM or any other software package can be used. The only requirement is that the engines are compatible with the harmonic restraints applied in the RCS.
A path optimization driver is required to propagate the HFB optimization steps. The driver must be tailored to the specifics of the task performed and the specifics of the computer resources available, and could be written in any scripting language such as PERL, Python, a shell script or even as a CHARMM script.

The HFB method does not systematically provide the ultimate (global) SD path, but rather a local SD path to which the initially guessed path converges given the parameters utilized during the path optimization. In some cases, depending on the density of beads per path length in the RCS, one can expect variations of the extent of the path migration from the initially guessed pathway prior to settling down in the local minimum energy valley. The sparser the path (the lower the density) the larger global freedom the path has during the HFB optimization.

(Free-)energy profile

In addition to optimization of the reaction path, the HFB method provides a unique feature that allows computing the corresponding energy profiles along any given (reasonable) path.

In the potential energy mode, the potential energy profile is immediately available for every HFB step, provided you have sufficient beads to integrate the forces properly (typically 2-4 beads per energy well, but more is always better).

In the free-energy mode, the accuracy of the free-energy profile depends on the number of MD steps made during the evolution. To compute a reliable (converged) free-energy profile, it is necessary to collect sufficient data with MD simulation engine (beyond the number of steps typically used during the path optimization). I typically do this incrementally, by computing averages for say 500 ps trajectories, and then combining them using a charmm script that I developed. The only requirement is that the averages from different runs be statistically equivalent, i.e. contain the same number of frames and correspond to the run of the same length (if you start with 500ps then stick with 500ps increments). In this mode the HFB method provides some competition to the Weighted Histogram Analysis Method (WHAM) employing umbrella sampling in the appropriate coordinate space. Just like with potential energy, for the free energy profile to be meaningfull one has to provide sufficient resolution for all the features in the path. Typically 3-4 beads per energy well is sufficient and safe. To make sure your resolution is sufficient you can either compute the free energy profile with higher resolution (insert a new bead between each two beads in the path) or try different truncation indices to make sure that your profile does not change significantly after some number that is still less than K.

Examples of applications of the HFB method

  1. Protein folding
  2. Ligand binding
  3. Transport of a particular molecule through some media or channel
  4. Any other conformational changes of any molecule
  5. Chemical transformations of various kind that involve structural changes in participating molecule(s) e.g. enzyme catalysis (one would need a QM or QM/MM potential function so that bonds can be broken in one place and reformed in another for such applications)
  6. etc

Things that might be difficult to do are phase transitions, barrier-less transformations on very shallow essentially flat free energy surfaces such as for example diffusion of a water molecule through bulk water (difficult might still be possible if applied in a clever way).

Operations involved in the HFB method

The HFB method implementation under the TREK module (as a sub-command of TREK, such as for example CPR) does not provide either the engine or the complete driver. The HFB method implementation provides only those essential tasks that are required to propagate path optimization, i.e. making a new step during the optimization of the path given the information from the engines regarding the outcome of the beads' evolution. Therefore, the HFB method is an essential part of the optimization driver.

Important HFB tasks

As there are two papers on the HFB method, there are two versions of the HFB method - the original (paper #1) and the gradient-augmented (paper #2).

The original HFB method (paper #1)

  1. Reading-in the evolved (or initial, if first step) set of beads. Read in the whole path of K total beads using standard TREK options (binary trajectory is my personal preference).
  2. Parameterization of the path using the discrete set of beads. Fourier transform of the sequence of the degrees of freedom (Cartesian coordinates at present) of the structures connecting the end-point states that were read-in. Note that in this version all the structures are reoriented with respect to the reactant end-point state to best-fit their RCS coordinates. This procedure yields a continuos Fourier path representation in terms of the progress variable (alpha) given the set of the thus computed Fourier amplitudes. Thus, path parameterization performs Fourier transform of the discrete set of beads from the coordinate space into the Fourier amplitude space.
  3. Redistribution of the beads based on the Fourier parameterization. Given the end-point states and the set of Fourier amplitudes (path parameterization), for each of the degrees of freedom compute the length of the path curve in the RCS as a function of the progress variable. Find the values of the progress variable that split the path curve in RCS into desired number of equal segments using simple bracketing and bisection. Use the found values to generate a new discrete set of structures that connect the end-point states. At this point paths that have holes (have fewer beads than desired) in them can be patched.
  4. Writing-out the newly generated reference beads. Write out the Cartesian coordinates of the new reference beads into the binary CHARMM trajectory (standard format).

The gradient-enhanced HFB method (paper #2)

The gradient enhanced HFB method (paper #2) includes a number of additional tasks that have to do with the gradient.

  1. Reading in the evolved path trajectory followed by the corresponding reference path trajectory. The latter are required to compute the gradient and must be read after the corresponding evolved path beads. This two-step process fills in a single trajectory with the TREK module that holds double the number of beads in the path.
  2. Fourier transform (parameterization) of the evolved beads (the coordinates are used as read in, without any best-fit alignment), and computation of the normal vector to the path for subsequent use in projection and reversible work integral calculations.
  3. Computing the free energy gradient vectors at the discrete points (the evolved beads) given the type of the restraint used. Fourier transform of the computed discrete set of gradient vectors in the RCS. Optionally, computing projections of the gradient vectors orthogonal to the path.
  4. Calculate the reversible work integral using the Fourier representations of the path and the corresponding gradient in the RCS. This gives the energy profile as a function of the progress variable alpha.
  5. Enhanced SD step. Generating an SD-like step to create the new reference path, using the original discrete path points, their corresponding gradients (to minimize possible errors) and a uniform step size parameter.
  6. Perform steps (1-4) of the original HFB implementation on the new reference path.

The following are the tasks that could be accomplished with the HFB method as currently implemented in CHARMM

Initial path generation

Generating an initial path by linear interpolation

To generate an initial path, given only a reactant and a product the simplest choice is to perform a linear interpolation.

First, you decide on the number of points in the path and select an appropriate RCS.

Second, you read in the reactant and the product (2 beads) assuming they are reasonable (somewhat relaxed, without steric clashes etc.).

Finally, you perform linear interpolation between the read-in points (points are automatically superposed in the RCS prior to interpolation). That generates all the missing path points in the correct orientation. See below for the actual command syntax.

On the TREK level (the highest level) decide:

  1. decide and then specify the number of points in the path (MAXP)
  2. decide and specify the RCS selection (will be used to superimpose structures before linear interpolation)
  3. read in the reactant and the product structures only - do not need to be aligned (HFB will do that)
  4. invoke FP INIT (the FP will initialize by superposing all the structures (regardless of the INIT keyword) in the RCS space first

This will create a trajectory that holds MAXP-long path of line-interpolation generated reference beads.

NOTE: It is good to remember that the FP request will always superimpose all of the structures that are read-in.

Generating an initial path by activated evolution

The idea behind the activated evolution procedure is a simple one. Given only a reactant and a product states (2 beads), we already know how to make a linear interpolation between them. To take this one step further, consider that a few structures generated by linear interpolation that are closest to the end-points on either end of the path are still reasonably good.

Let's say you have a sequence of R good structures at the reactant side and a sequence of P good structures at the product side and a bunch of bad structures (the gap) in between. In such a case, it is perfectly reasonable to evolve the endpoints and the adjacent good points (R + P) and discard the bad ones.

To generate a new full step following such truncated evolution, you should, first, read in only the good evolved points and then perform a linear interpolation between the two adjacent points that correspond to the gap boundaries. In other words, fill the gap with a line segment. Linear interpolation of the gap will maintain the desired number of total points in the path and will make sure the distances between the points are consistent throughout.

As you get better and better points into the gap you gradually increase the number of the evolved points thus shrinking the gap of the bad structures until completely removed.

In its simplest (non-automated) version, the activated evolution procedure starts out as a line interpolation between the reactant and the product. One then decides, by inspection of the path or by actually running the evolution on all the beads and observing what happens, where to put the boundaries between the reasonable (good) and unreasonable (bad) structures on both ends. A note is made where the boundaries are in terms of the actual numbering of the resulting sequence of good structures. Only the good structures are then read in sequentially (never mind the gap) and then the corresponding boundary, typically adjacent, structures are used for line-interpolation to fill the gap. This procedure is repeated until all the beads are activated. See below for additional tips.

An example of activated evolution: Let's say you performed linear interpolation and then run optimization of the path. You find that 5 points after the reactant and 8 points before the product are reasonable in terms of their structure/energy (structure-wise one needs to make sure there are no enantiomeric inversion; costly cis/trans isomerizations and things of that sort unless those are actually desirable). Including the endpoints you have 6 points on the reactant and 9 points on the product side. Let's say your total number of points in the path is 32 (MAXP = 32). You then read in coordinates of the first 6 points followed by the coordinates of the last 9 points (total 15 points) sequentially. Note that the point 9 from the product end becomes the point 7 in the truncated trajectory. Now issue a command:

FP INIT STRT 6 STOP 7

This will insert a line segment between the points 6 and 7 with 17 additional points on it. This path will then be processed and reference points redistributed according to the length in the RCS and provided back to the user for further optimization (all 32 points, not just the 15 good points that have been read in).

Alternatively, you can use the dens keyword to fill all the gaps. Make sure that the parameter P never exceeds the number of good beads. This is also a good option when you have multiple gaps at the same time. See notes below regarding inserting new points into the path.

Inserting new points into the path

Sometimes it is desirable to insert new points into the given path. In that case the MAXP must be increased such that it is equal to the new desired number of points in the path say K. Then a regular path trajectory is read in that contains a smaller number of points and a FP DENS is invoked to fill the trajectory up to the MAXP. There are two cases where I needed this option.

  1. When I have to double (better to increase 2xK-1 to be able to utilize the data already obtained) the number of points (or just increase N/K-times). This is handy when one tries to increase the resolution of the initial rough/sparse exploratory path for further refinement followed by data collection to get accurate free energy profile. It might also be useful if one tries to set up a path along which standard umbrella sampling is performed followed by WHAM processing procedure where you need to improve on the overlap between windows (this is rudimentary now).
  2. Another reason to use this option was when for some reason a particular bead fails to return a usable structure. In this case it can be skipped during the reading procedure (resulting in K-1 total points) and compensated by the DENS to the MAXP = K. This might be very helpful in initial explorations and in conjunction with the activated evolution procedure.

At present all of the above tasks that involve manipulation with the beads should be performed with the original HFB implementation (FP). Particularly, because any sort of path can be read in, reference or evolved or anything else. In contrast the gradient-enhanced implementation is designed to evolve the beads and not as much to manipulate them. This is such because of the intrinsic dependence on the reference set of coordinates.

Some examples of using the HFB tool

The HFB path optimization for the alanine dipeptide

!define RCS
define rcs sele type n .or. type nt .or. type c .or. type ca .or. type cy end

!define some variables
set beads 32
set name diala
set prevjob 1
set runjob 2
calc maxbeads = 2 * @beads

! ready to start trek :
trek maxpoint @maxbeads select rcs end mass !**

! reading the evolved beads, first
traj read name dcd/@name_evolved_r@{prevjob}.dcd begin 1 skip 1

! reading the corresponding reference beads, second
traj read name dcd/@name_reference_r@{prevjob}.dcd begin 1 skip 1

!calling the gradient-enhanced HFB method
wrkp trnc 24 rtyp 1 rfrc 10.0 step 0.00125 !***

! writing the new reference path to a trajectory file
traj write name dcd/@name_reference_r@{jobrun}.dcd restart res/@name.cpr

quit

Note: the res/@name.cpr is junk for the HFB.
Note: If you set the STEP parameter to 0.0 this will become equivalent with the FP subcommand.

If one wants to see the energy profile and the corresponding structures computed on the grid of 256 points then alter the lines !** and !*** as follows.

trek maxpoint 256 select rcs end mass !**
wrkp trnc 24 rtyp 1 rfrc 10.0 step 0.00125 bead 32 grid 256 iprf 6 ptrj !***

This will print into the final reference trajectory dcd/@name_reference_r@{jobrun}.dcd 256 structures exactly matching the computed energy profile (printed in the output as 256 raws of progress variable versus corresponding energy). Only the PTRJ keyword should be removed to allow path propagation by generation of the new reference path.
Note: that the MAXP = GRID (in general MAXP >= GRID is required).

Optimization with the FP version of the HFB method is similar.

! ready to start trek :
trek maxpoint @beads select rcs end mass !**

! reading the evolved beads, first
traj read name dcd/@name_evolved_r@{prevjob}.dcd begin 1 skip 1

!calling the gradient-enhanced HFB method
fp trnc 24 !***

! writing the new reference path to a trajectory file
traj write name dcd/@name_reference_r@{jobrun}.dcd restart res/@name.cpr

quit

Using your imagination

The HFB method can be used to minimize a sinlge structure, such as a complex of a protein with a docked ligand for example. To do that one can use only two or three points an allow evolution only of one of the endpoints. Guess what you should specify for the reference and the evolved coordinates of the remaning (dummy) points and you will be ready to go.
Alternatively, one can construct the gradient using the MAIN, COMP, SCALAR, MASS without the HFB at all, since you would not need the path anyway in this case.