The implementation is based on the following papers:
Anyone considering using the HFB method is strongly encouraged to read the above two papers to get an idea of the method. Nevertheless, the documentation page attempts to be sufficient for an anxious person to get started running HFB optimization. Reading the papers will still increase your productivity, though.
The purpose of the HFB method is to study various transformations between any two distinct states of a given system that could be described by some degrees of freedom, at present Cartesian coordinates of a subset of atoms of the full system, that are actively involved in or characteristic of the studied transformation.
The HFB method can generate and optimize reaction paths between the given end-point states of a particular system.
To generate an initial path the HFB method uses a procedure called activated evolution (see below).
Alternatively, the HFB method can use a path obtained by any other means in the form of a sequence of structures progressing from one end-point state to the other (for example a protein unfolding trajectory that would connect the native state to some arbitrary unfolded state of the protein).
The subsequent HFB path optimization provides either
Therefore, the HFB method is in the spirit of Kenichi Fukui's IRC reaction path approach further generalized to free energy surfaces.
For the HFB method to provide meaningful free-energy information, the system under investigation must be partitioned into the reaction or reactive coordinate space (RCS) and the complementary spectator coordinate space (SCS). To partition the system into RCS/SCS, one simply includes atoms (degrees of freedom) that appear to be important (active) for the studied transformation in the RCS, leaving the remaining (passive) atoms (degrees of freedom) in the SCS. The chosen RCS/SCS partition should be able to distinguish the two end-point states based on the RCS degrees of freedom (it will become clear within the first HFB optimization steps if appropriate RCS has been chosen). For example, for a study of a protein backbone conformational transitions one would put heavy backbone atoms (those that define Ramachandran backbone dihedral angles) into the RCS, leaving the side-chains and other remaining atoms in the SCS. If not sure what to put into the RCS then put all the heavy atoms, except for rotationaly degenerate ones (HARI space, see my Line Integral paper for the definition).
For the path optimization on the potential energy surface, partitioning the system might also be useful (for example, when the two end-point states come from different sources, such as different pdb structures of the same system where certain atoms of one structure might require permutations to match the sequence of the other structure). However, in the case of the potential energy path optimization one needs to be aware that the final potential energy profile calculation (see below) assumes that the coordinates in the SCS change continuously rather than abruptly. Therefore, it is a good idea when working on potential energy surfaces to keep as many atoms in the RCS as possible so that there are no surprises in the SCS. One popular partitioning scheme for the potential energy path optimization would include all the atoms unique with respect to any internal rotations into the RCS (HARI space). Another obvious partitioning scheme is to include all the atoms into the RCS.
It might be worth noting that energy-degenerate permutations of atoms within a functional group such as methyl, phenyl, tyrosyl etc., including rotations of the group as a whole do not need to be continuous in the SCS. However, when permutational discontinuities are encountered, the SCS atoms in the newly generated reference path might have significantly distorted corresponding bonds and angles. This behavior is considered to be normal and is expected. Should one wish to remove such distortions of the SCS, the RCS coordinates must be fixed and the SCS coordinates then optimized. The energy profile as computed by the HFB method will be accurate despite the permutations, indicating the robustness of the HFB method. Finally, in the case of the free energy optimization, the SCS coordinates will always be scrambled by averaging procedure. This is the correct behavior for the ensemble.
To optimize any path the HFB method discretizes the Fourier path into a sequence of reference beads, each bead representing the complete copy of the studied system. The sequence of beads progresses from the reactant to the product end-point state.
During a single HFB optimization step each bead in the sequence is evolved with respect to its corresponding reference independently of any other beads. To evolve the beads the HFB method employs mass-weighted harmonic restraints on the RCS degrees of freedom that are centered at the corresponding reference beads. The harmonic restraints modify the underlying surface into a strongly convex surface (even in the vicinity of transition states or saddle points of the original surface).
To be able to drive the evolution of the beads, the HFB method requires appropriate engines. Standard potential energy optimization engines such as SD, CG, ABNR available in CHARMM, or any other optimizers from any other program can be used with mode (1). For the mode (2) any molecular dynamics (MD), Langevin dyanmics (LD) or Monte Carlo (MC) engine in CHARMM or any other software package can be used. The only requirement is that the engines are compatible with the harmonic restraints applied in the RCS.
A path optimization driver is required to propagate the HFB optimization steps. The driver must be tailored to the specifics of the task performed and the specifics of the computer resources available, and could be written in any scripting language such as PERL, Python, a shell script or even as a CHARMM script.
The HFB method does not systematically provide the ultimate (global) SD path, but rather a local SD path to which the initially guessed path converges given the parameters utilized during the path optimization. In some cases, depending on the density of beads per path length in the RCS, one can expect variations of the extent of the path migration from the initially guessed pathway prior to settling down in the local minimum energy valley. The sparser the path (the lower the density) the larger global freedom the path has during the HFB optimization.
In addition to optimization of the reaction path, the HFB method provides a unique feature that allows computing the corresponding energy profiles along any given (reasonable) path.
In the potential energy mode, the potential energy profile is immediately available for every HFB step, provided you have sufficient beads to integrate the forces properly (typically 2-4 beads per energy well, but more is always better).
In the free-energy mode, the accuracy of the free-energy profile depends on the number of MD steps made during the evolution. To compute a reliable (converged) free-energy profile, it is necessary to collect sufficient data with MD simulation engine (beyond the number of steps typically used during the path optimization). I typically do this incrementally, by computing averages for say 500 ps trajectories, and then combining them using a charmm script that I developed. The only requirement is that the averages from different runs be statistically equivalent, i.e. contain the same number of frames and correspond to the run of the same length (if you start with 500ps then stick with 500ps increments). In this mode the HFB method provides some competition to the Weighted Histogram Analysis Method (WHAM) employing umbrella sampling in the appropriate coordinate space. Just like with potential energy, for the free energy profile to be meaningfull one has to provide sufficient resolution for all the features in the path. Typically 3-4 beads per energy well is sufficient and safe. To make sure your resolution is sufficient you can either compute the free energy profile with higher resolution (insert a new bead between each two beads in the path) or try different truncation indices to make sure that your profile does not change significantly after some number that is still less than K.
Things that might be difficult to do are phase transitions, barrier-less transformations on very shallow essentially flat free energy surfaces such as for example diffusion of a water molecule through bulk water (difficult might still be possible if applied in a clever way).
The HFB method implementation under the TREK module (as a sub-command of TREK, such as for example CPR) does not provide either the engine or the complete driver. The HFB method implementation provides only those essential tasks that are required to propagate path optimization, i.e. making a new step during the optimization of the path given the information from the engines regarding the outcome of the beads' evolution. Therefore, the HFB method is an essential part of the optimization driver.
As there are two papers on the HFB method, there are two versions of the HFB method - the original (paper #1) and the gradient-augmented (paper #2).
The gradient enhanced HFB method (paper #2) includes a number of additional tasks that have to do with the gradient.
The following are the tasks that could be accomplished with the HFB method as currently implemented in CHARMM
To generate an initial path, given only a reactant and a product the simplest choice is to perform a linear interpolation.
First, you decide on the number of points in the path and select an appropriate RCS.
Second, you read in the reactant and the product (2 beads) assuming they are reasonable (somewhat relaxed, without steric clashes etc.).
Finally, you perform linear interpolation between the read-in points (points are automatically superposed in the RCS prior to interpolation). That generates all the missing path points in the correct orientation. See below for the actual command syntax.
On the TREK level (the highest level) decide:
This will create a trajectory that holds MAXP-long path of line-interpolation generated reference beads.
NOTE: It is good to remember that the FP request will always superimpose all of the structures that are read-in.
The idea behind the activated evolution procedure is a simple one. Given only a reactant and a product states (2 beads), we already know how to make a linear interpolation between them. To take this one step further, consider that a few structures generated by linear interpolation that are closest to the end-points on either end of the path are still reasonably good.
Let's say you have a sequence of R good structures at the reactant side and a sequence of P good structures at the product side and a bunch of bad structures (the gap) in between. In such a case, it is perfectly reasonable to evolve the endpoints and the adjacent good points (R + P) and discard the bad ones.
To generate a new full step following such truncated evolution, you should, first, read in only the good evolved points and then perform a linear interpolation between the two adjacent points that correspond to the gap boundaries. In other words, fill the gap with a line segment. Linear interpolation of the gap will maintain the desired number of total points in the path and will make sure the distances between the points are consistent throughout.
As you get better and better points into the gap you gradually increase the number of the evolved points thus shrinking the gap of the bad structures until completely removed.
In its simplest (non-automated) version, the activated evolution procedure starts out as a line interpolation between the reactant and the product. One then decides, by inspection of the path or by actually running the evolution on all the beads and observing what happens, where to put the boundaries between the reasonable (good) and unreasonable (bad) structures on both ends. A note is made where the boundaries are in terms of the actual numbering of the resulting sequence of good structures. Only the good structures are then read in sequentially (never mind the gap) and then the corresponding boundary, typically adjacent, structures are used for line-interpolation to fill the gap. This procedure is repeated until all the beads are activated. See below for additional tips.
An example of activated evolution: Let's say you performed linear interpolation and then run optimization of the path. You find that 5 points after the reactant and 8 points before the product are reasonable in terms of their structure/energy (structure-wise one needs to make sure there are no enantiomeric inversion; costly cis/trans isomerizations and things of that sort unless those are actually desirable). Including the endpoints you have 6 points on the reactant and 9 points on the product side. Let's say your total number of points in the path is 32 (MAXP = 32). You then read in coordinates of the first 6 points followed by the coordinates of the last 9 points (total 15 points) sequentially. Note that the point 9 from the product end becomes the point 7 in the truncated trajectory. Now issue a command:
FP INIT STRT 6 STOP 7
This will insert a line segment between the points 6 and 7 with 17 additional points on it. This path will then be processed and reference points redistributed according to the length in the RCS and provided back to the user for further optimization (all 32 points, not just the 15 good points that have been read in).
Alternatively, you can use the dens keyword to fill all the gaps. Make sure that the parameter P never exceeds the number of good beads. This is also a good option when you have multiple gaps at the same time. See notes below regarding inserting new points into the path.
Sometimes it is desirable to insert new points into the given path. In that case the MAXP must be increased such that it is equal to the new desired number of points in the path say K. Then a regular path trajectory is read in that contains a smaller number of points and a FP DENS is invoked to fill the trajectory up to the MAXP. There are two cases where I needed this option.
At present all of the above tasks that involve manipulation with the beads should be performed with the original HFB implementation (FP). Particularly, because any sort of path can be read in, reference or evolved or anything else. In contrast the gradient-enhanced implementation is designed to evolve the beads and not as much to manipulate them. This is such because of the intrinsic dependence on the reference set of coordinates.
!define RCS
define rcs sele type n .or. type nt .or. type c .or. type ca .or. type cy end
!define some variables
set beads 32
set name diala
set prevjob 1
set runjob 2
calc maxbeads = 2 * @beads
! ready to start trek :
trek maxpoint @maxbeads select rcs end mass !**
! reading the evolved beads, first
traj read name dcd/@name_evolved_r@{prevjob}.dcd begin 1 skip 1
! reading the corresponding reference beads, second
traj read name dcd/@name_reference_r@{prevjob}.dcd begin 1 skip 1
!calling the gradient-enhanced HFB method
wrkp trnc 24 rtyp 1 rfrc 10.0 step 0.00125 !***
! writing the new reference path to a trajectory file
traj write name dcd/@name_reference_r@{jobrun}.dcd restart res/@name.cpr
quit
Note: the res/@name.cpr is junk for the HFB.
Note: If you set the STEP parameter to 0.0 this will become equivalent with the FP subcommand.
If one wants to see the energy profile and the corresponding structures computed on the grid of 256 points then alter the lines !** and !*** as follows.
trek maxpoint 256 select rcs end mass !**
wrkp trnc 24 rtyp 1 rfrc 10.0 step 0.00125 bead 32 grid 256 iprf 6 ptrj !***
This will print into the final reference trajectory dcd/@name_reference_r@{jobrun}.dcd 256 structures exactly matching the computed energy profile (printed in the output as 256 raws of progress variable versus corresponding energy). Only the PTRJ keyword should be removed to allow path propagation by generation of the new reference path.
Note: that the MAXP = GRID (in general MAXP >= GRID is required).
! ready to start trek :
trek maxpoint @beads select rcs end mass !**
! reading the evolved beads, first
traj read name dcd/@name_evolved_r@{prevjob}.dcd begin 1 skip 1
!calling the gradient-enhanced HFB method
fp trnc 24 !***
! writing the new reference path to a trajectory file
traj write name dcd/@name_reference_r@{jobrun}.dcd restart res/@name.cpr
quit
The HFB method can be used to minimize a sinlge structure, such as a complex of a protein with a docked ligand for example.
To do that one can use only two or three points an allow evolution only of one of the endpoints. Guess what you should specify
for the reference and the evolved coordinates of the remaning (dummy) points and you will be ready to go.
Alternatively, one can construct the gradient using the MAIN, COMP, SCALAR, MASS without the HFB at all, since you would not need the path anyway in this case.