Little ISIM-HowTo


Table of Contents

  1. Installation Instructions
  2. Overview
  3. Input Specifications
  4. Output Files
  5. Visualization of Results
  6. Limitations

How to install ISIM ...

Since ISIM relies on the two software packages MALOC (Minimal hardware Abstraction Layer for clean Object-oriented C) and APBS (Adaptive Poisson-Boltzmann Solver) you first have to make sure to have these programs installed with a directory structure as given in the APBS installation instructions. This includes that you have a common top-level directory and the corresponding environment variable "TOP" set. After you have downloaded the archive simply do the following commands on a C-like shell:

# cd ${TOP}
# gzip -dc isim.tar.gz | tar xvf -

This decompresses and extracts the archive and creates the directory skeleton with all necessary files. You should find them within the directory "${TOP}/isim/" afterwards. Then do:

# setenv GEN_INCLUDE ${TOP}/dist/include
# setenv GEN_LIBRARY ${TOP}/dist/lib

These two variables are needed to let the configuration program know, where your shared directory for header files, libraries and executables is (if you chose a different path when installing MALOC and APBS make sure to modify the commands correspondingly). Explicitly the linkage to the other applications will fail if you do not specify these paths. Then do:

# cd ${TOP}/isim/
# ./configure --prefix=${TOP}/dist

The configuration program does a lot of checks for the system environment, all of which may not be necessary. As long as the routine terminates without abort or exit-messages everything should be fine. Finally do:

# make
# make install

After that you should find the ISIM-header files as well as the library and executable within the "${TOP}/dist/" subdirectory tree. At any time you can cancel the whole procedure either by doing (you have to start all over again):

# cd ${TOP}
# rm -r isim/
# rm dist/lib/{bla}/libisim.*
# rm dist/bin/{bla}/isim
# rm dist/include/isim/*

Or try (preferably, because then you only have to redo the installation steps):

# cd ${TOP}/isim/
# make distclean

What ISIM is ...

In general you should read the reference publication (once it is available), if you want to know more about ISIM. It covers the theoretical background, presents the algorithms should give you any information to understand the program. This document is supposed to tell you, how to use it, so this part will be fairly short.

The original idea behind ISIM is as simple as old. Since it is a well-established fact that the behaviour of ions in solution can be described within an ensemble concept provided by statistical mechanics, various schemes have been extensively used to simulate electrolyte solutions. The grand canonical ensemble (subsequently abbreviated as GCE) has the fundamental advantage that numbers are allowed to change (it is apparently not a good representation of a microscopic subsystem of an electrolyte solution, if the numbers of present ions are held strictly constant, even if it is "boring bulk territory").

A Monte Carlo (MC) procedure within the GCE (then called GCMC) is normally built up by deriving probabilities of creation and destruction for inserting or deleting particles, which are compared to a random number in order to accept or reject the change. This decision is governed by both bulk properties (such as excess chemical potentials of solvation or bulk concentration) and by the microscopic sorroundings (via explicit pair interactions with other species). Technically schemes can be separated into those treating particle types dependently (e.g. in electrolyte solutions only neutral combinations would be subject to deletion or insertion in one elementary step to inherently maintain electroneutrality) or independently, which is the way ISIM works. Experience tells that in addition conventional Metropolis MC moves are needed to provide sufficient sampling of the configuration space, which yields three types of elementary moves: inserting, deleting and moving particles. There is a set of parameters determining the internal balance of the whole procedure, for which - apart from some negative statements - no "best solution" exists. Some aspects of this discussion can be found in the section describing just these (input) parameters.

There are thousands of phenomena in biochemistry which are dominated by electrostatics. Ions in solution always play a role here: if not in a microscopic picture then at least via bulk effects (charge compensation/screening). Structural data on biomolecules has become more and more available. ISIM's typical application is a simulation "droplet" containing any macromolecule of interest surrounded by an arbitrarily composed electrolyte solution. The analysis focuses on the equilibrium state/structure of the latter with quantities like pair correlation functions, ion number densities in full spatial resolution (e.g. to monitor occupation of binding sites) or the mean potential (e.g. to detect deviations from continuum theories) being covered.

In order to compute interactions between explicit GCMC particles and fixed solute (i.e., its atoms with their force field radii and charges) the electrostatic potential in pure solvent as calculated by a Poisson-Boltzmann solver (e.g. APBS) is used as reference state (i.e., the total potential acting on a GCMC particle has contributions from all other explicit particles (soft-core steric interaction + Coulomb's law) and from the calculated field of the macromolecule mentioned before and from hard-sphere steric interactions with the fixed solute (step function)).

The term "arbitrarily composed electrolyte solution" used above brings up the issue of parameterization. Every GCMC procedure has some crucial parameter, which is closely related to the excess chemical potential of solvation. It governs in how far target concentrations are matched and is therefore crucial for the sanity of the procedure (when treating particle types independently like ISIM does this implies that bad parameters might yield (on average) non-neutral solutions and therefore physically irrelavant results). Calculating or estimating self-consistent a priori values for excess chemical potentials is very difficult. Therefore ISIM has its own parameterization scheme, which in principle allows unrestricted choices (of course the necessary computer time scales with complexity).

GCMC simulations are of course useful in many other problems in science. The design of the software and its coding style (a C-flavor mimicking the object-oriented concept of C++) is chosen to allow an easy implementation of new features. Similarly refinements of the model(s) used in ISIM should be possible without too much internal "programming hassle".

How to make ISIM run ...

There are some things you have to make sure, before you can use ISIM. Since ISIM creates a lot of output files with filenames independent of the job name, it is useful to create a directory for the job. In this directory either create an input file (see below) or copy one of the input files from the "examples/" subdirectory into it and make the necessary modifications. Then you can do something like:

# .${TOP}/dist/bin/{bla}/isim yourjob.in

The process should terminate abnormally with a statement like

Files for excess chemical potentials invalid.
in the file "yourjob.ERR". You have to provide external files in the working directory named YOURNAME_EXCESS, where YOURNAME is the name you gave an explicit particle within the IONS section (also see below). The format of these files is relatively easy. Just give N+1 columns with the first N of them containing the concentration values (in mM) for a particular scheme and the last one the excess chemical potential value (in kcal/mol) for the species the file is made for. Two (possibly annoying) features are the necessity for a header line and that the interpolation procedure relies on a strict order of the concentration values. An example file for one within a system of three species over a narrow concentration range might look like:

#        SODIUM      MAGNESIUM       CHLORINE       kcal/mol
       120.0000        35.0000       190.0000  -0.2123558011
       120.0000        40.0000       200.0000  -0.2159573955
       120.0000        45.0000       210.0000  -0.2222860737
       130.0000        35.0000       200.0000  -0.2116926512
       130.0000        40.0000       210.0000  -0.2183961345
       130.0000        45.0000       220.0000  -0.2310599633
       140.0000        35.0000       210.0000  -0.2124611651
       140.0000        40.0000       220.0000  -0.2185297560
       140.0000        45.0000       230.0000  -0.2336664020
		

You will always get the right format as the output of a parametrization run (see below) for the system of your interest and this is usually the way to obtain these files, as external data is only poorly available and/or applicable for this particular purpose. Within a normal simulation the program reads in these values only once, for off-table values multidimensional, linear interpolation is done. Right now you cannot be sure to be able to treat mixtures of only neutral particles correctly (because the existence of only N-1 independent concentration variables for N species (gross electroneutrality constraint) enters the procedure implicitly).

When you are working in parametrization mode you have to provide two different types of input files for explicit particles. One gives guesses for the excess chemical potential, which the parametrization algorithm (see reference publication for details) uses as seed values. These are N files named YOURNAME_GUESS, which are simple two-column data tables (first for concentration values in mM, second for chemical potential values in kcal/mol) without a header line. The second one is used to specify the range of concentrations you want to parametrize. These are N-1 files for the first N-1 species you specify in the IONS section of your input file (see below) named YOURNAME_SCHEME. The concentrations of the last particle type are assumed to be dependent, so make sure it is not a neutral (and therefore independent) one. Dependent on those files and your general input file the parametrization run will perform the necessary empty-box (bulk) simulations. For more details please refer to the reference publication and the input parameters section.

For runs in normal working mode another preliminary restriction is the data format for atom-composed macromolecules, which has to be PQR. Fortunately there is a web-based conversion routine for PDB files (with some limitations concerning atom types, ununsual contents), which you can use to overcome this difficulty. It removes water and adds hydrogens (see references mentioned on the website), but has no rule for removing ions or crystallization artifacts, which you partially might be willing to do manually instead. Then you simply have to copy the PQR-file into your job directory.

As outlined in the introductory section ISIM's energy terms are split into three parts. The inner electrolyte energy is calculated as the sum of all pair potentials in a combination of Coulombic and Lennard-Jones interactions, the steric interface energy with the macromolecule is in a simplest assumption a step function (better models at this point represent a severe numerical challenge when cutoffs are to be avoided) and finally the electrostatic interface energy needs something like a electrostatic potential of the macroion in pure solvent. This last component is provided by APBS and therefore you will need another input file containing just that quantity grid written out onto a grid in dx-format. Eventually you will want to compare Poisson-Boltzmann theory with the ISIM results (since the excplicit treatment has the difference of microscopic detail and steric restrictions). Given that case you will have to run another APBS calculation with implicit ions. ISIM is able to write out deviations between both results on its own analysis grid (for more details on these refer to the description of the input parameters).

Once you have created all necessary files and specified their paths in the input file you should be able to do an ISIM run, which does not terminate with a statement like the one above. The following subsections cover aspects of the input file preparation in detail.

Summary of input file specifications (many parameters are self-explaining so use this only as reference):

  1. Section GENERAL: The general appearance of a GENERAL section is like:

    #GENERAL
    
    FORM                          S
    RADIUS                        66.0  
    LENGTH                        180.0   /*redundant, since 'S' is selected*/
    TEMPERATURE                   298.0
    MEDIUM_PERMITTIVITY           78.36
    GENERAL_MODE                  N
    
    #ENDGENERAL
    		

    The various input parameters are described in the following paragraph.

    FORM
    Defines shape of the simulation system. You can choose between a sphere ('S') and a cylinder with axis being the coordinate system's z-axis ('Z'). The implementation of cylindrical systems was done more for reasons of computational efficiency, but nevertheless the symmetry is specifically included in some of ISIM's analysis routines. You should always specify this, unless you work in parametrization mode (systems are always spherical then).
    RADIUS
    This is either the spherical radius or the cylindrical radius around the symmetry axis (both in Å) of the simulation system. You should always specify this, unless you work in parametrization mode (it is simply ignored then).
    LENGTH
    In case of cylindrical systems you have to specify the full length of the simulation system along the symmetry axis. The cylinder will be placed symmetrically around the coordinate system's origin. This keyword is redundant for the selection 'S' for FORM and for the parametrization mode in general.
    TEMPERATURE
    Defines the simulation temperature in K. You should always specify this.
    MEDIUM_PERMITTIVITY
    Defines the relative permittivity of the medium (usually continuum solvent model) in the simulation system. You should always specify this.
    GENERAL_MODE
    The only accepted choices are 'N' for "normal" and 'P' for "parametrization" right now. The input file is handled a little bit differently in case of the latter. In fact all parameters, which do not have the explicit comment that they are ignored in parametrization mode, are treated as general simulation setup parameters. Those require a very careful choice then to achieve a good tradeoff between quality and efficiency. You should always specify this.
  2. Section TECHNICAL: The general appearance of a TECHNICAL section is like:

    #TECHNICAL
    
    TOTAL_STEPS                   1000
    MINIMIZATION_STEPS            35 
    CREATION_DESTRUCTION_CYCLES   10    
    SHUFFLING_STEPS               10
    SHUFFLING_MODE                N
    SO_MANY_ARE_SOME              40      /*redundant, since 'N' is selected*/
    MAXIMUM_DISPLACEMENT          2.0
    PARAMETRIZATION_MINIMUM       50
    
    #ENDTECHNICAL
    		

    The various input parameters are described in the following paragraph.

    TOTAL_STEPS
    Defines number of total steps to be performed with each step consisting of the specified amount of creation/destruction cycles as well as shuffling steps. There are some analysis tasks which are done every single step, too (such as collecting number and charge density data or just monnitoring energy and ion numbers). You should always specify this.
    MINIMIZATION_STEPS
    Defines number of preliminary minimization steps to be done. These are essentially Monte Carlo one-particle shuffling steps at 0K. This functionality is not to be confused with the equilibration procedure. You should always specify this.
    CREATION_DESTRUCTION_CYCLES
    Defines number of creation/destruction cycles within each step. In such a cycle one ion of each type in the order of your input is attempted to be created followed by analagous attempts to destroy one particle of each type. You should always specify this.
    SHUFFLING_STEPS
    Defines number of moving step within each total step. Indepedent of the mode (see below) on steps always means, that every present particle was attempted to be moved once. You should always specify this.
    SHUFFLING_MODE
    The shuffling mode defines when an energy evaluation is done during moving attempts. 'N' means that one particle is moved with an subsequent energy evaluation to decide whether to accept the cahnge or not. 'A' means that all present particles are moved randomly and independently, which is followed by an energy evaluation for the complete change. Finally 'S' stands for an intermediate method where the user specifies a number of ions to be moved simultanously. This is essentially separated for the different types (so never two ions of different types are within one attempt). You should always specify this.
    SO_MANY_ARE_SOME
    In case of shuffling mode 'S' you will need to define the block size for the shuffling, which is done by this keyword. It's redundant for other modes but necessary for 'S'.
    MAXIMUM_DISPLACEMENT
    The random move attempts the particles are subject of are restricted to a particle-centered cube, for which this parameter specifies the sidelength (in Å). Unless you set to the number of shuffling and minimization steps to zero, you should always specify this.
    PARAMETRIZATION_MINIMUM
    Within parametrization mode this defines to the minimum number of particles of every present type to appear within the parametrization simulations. Effectively the system dimensions are adjusted, so that the mimimum requirements are fulfilled (it does not make sense to simulate an ion pair to obtain bulk properties). It is redundant for the normal working mode (for modes see section GENERAL).
  3. Section ANALYSIS: The general appearance of an ANALYSIS section is like:

    #ANALYSIS
    
    EQUILIBRATION_STEPS           8000 
    GRID_RESOLUTION               60           
    COORDINATES_OUTPUT            2500 
    NUMBER_DENSITIES_OUTPUT	      2500
    CHARGE_DENSITY_OUTPUT         10000       
    POTENTIAL_OUTPUT              2500
    PAIR_CORRELATION_OUTPUT	      2500  
    POTENTIAL_FREQUENCY           50
    PAIR_CORRELATION_FREQUENCY    50		
    PAIR_CORRELATION_RESOLUTION   0.3
    PAIR_CORRELATION_SHELLS	      5
    
    #ENDANALYSIS
    		

    The various input parameters are described in the following paragraph.

    EQUILIBRATION_STEPS
    This parameter defines the step, after which all monitored averages are flushed. This is essentially a cutoff of the first steps to assure the significance of the averaged quantities (although this effect is usually of minor importance, as the equilibration is surprisingly fast, no typical number can be given here though, since it relies too much on the choices in other parameters). You should always specify this.
    GRID_RESOLUTION
    Many quantities like average number densities or the average potential are collected on an analysis grid, whose symmetry is equivalent to the system's shape. Unfortunately cylindrical and spherical grids suffer from inefficiency by unequal point distributions, so you have to choose a rather high resolution to get detailed pictures (as a rough estimate: 60 gives moderate resolution, 120 should be more than enough for most systems). Moreover the resolution can only be chosen once for all three dimensions. The grid-based analysis will most likely be a target of significant changes in the near future. You should always specify this, unless you are in parametrization mode (no grid-based analysis then).
    COORDINATES_OUTPUT
    The program takes the specification here to write out a snapshot file of the coordinate of all present ions every "nth" step, where "n" is the specification. You should always specify this, unless you are in parametrization mode (no output then).
    NUMBER_DENSITIES_OUTPUT
    This is interpreted analagously to COORDINATES_OUTPUT for writing out updated versions of the averaged number densities for all specified ion types on the analysis grid as well as a radially averaged version. The gridpoints are explicitly contained only in the corresponding potential output files. You should always specify this, unless you are in parametrization mode (no output then).
    CHARGE_DENSITY_OUTPUT
    This is the same as NUMBER_DENSITIES_OUTPUT for the averaged charge density due to explicit ions. You should always specify this, unless you are in parametrization mode (no output then).
    POTENTIAL_OUTPUT
    This is the same as NUMBER_DENSITIES_OUTPUT for the averaged potential due to explicit ions and the macromolecule (if one is present). In addition it contains the explicit position information. You should always specify this, unless you are in parametrization mode (no output then).
    PAIR_CORRELATION_OUTPUT
    This is also treated analagously to the other output parameters. It controls the frequency for writing out the pair correlation functions for all combinations of types of particles. You should always specify this, unless you are in parametrization mode (no output then).
    POTENTIAL_FREQUENCY
    Since the calculation of the grid-based mean potential (contributions of explicit particles and fixed solute) is computationally tedious (scales with number of gridpoints times number of particles!) you will not want to calculate it in every single step. Good averaging can be achieved with a number of snap shots throughout the whole simulation. Therefore this parameters allows you to tell the program to perform the calculation only every nth step, where n is your specification. You should always specify this, unless you are in parametrization mode (no potential analysis then).
    PAIR_CORRELATION_FREQUENCY
    This is treated analagously to the potential frequency parameter. The pair correlation functions need an operation, which scales roughly in multiples (following resolution) of number of types squared times number of particles squared, to be calculated. Additionally to have good statistics you will usually need more snapshots than for the potential. You should always specify this, unless you are in parametrization mode (no pair correlation analysis then).
    PAIR_CORRELATION_FREQUENCY
    This parameter defines the thickness of the shells around an arbitrary particle, for which the other particles are counted separately thereby defining the distance resolution for the plots. You should always specify this, unless you are in parametrization mode (no pair correlation analysis then).
    PAIR_CORRELATION_SHELLS
    This parameter defines the number of shells the system is divided into. The concept here is to avoid averaging of pair correlation functions throughout structurally different layers of the solution. In other words does not give much insight to have a plot, which includes both ions in a dense condensation layer and in bulk property. Therefore the system is always divided into at least two shells, one within the maximum dimensions of the fixed solute and the other one being the remaining part. The latter domain can be divided into n subshells with equal thickness, where is your specification here. You should always specify this, unless you are in parametrization mode (no pair correlation analysis then).
  4. Section IONS: The general appearance of an IONS section is like (the term might be a bit misleading, since the code is not restricted to ionic particles):

    #IONS
    
    MODEL                         L
    TYPES                         2 
    CALCIUM                       75.0    2.0  2.41  0.450 
    CHLORINE                      150.0  -1.0  4.86  0.168
    
    #ENDIONS
    		

    The various input parameters are described in the following paragraph.

    MODEL
    Currently two choices are available, both representing spherical particles, which interact electrostatically by Coulomb's law and sterically by either a simple hard-sphere potential (choice H) or by a soft-core potential (Lennard-Jones 12/6, choice L). You should always specify this.
    TYPES
    This gives the number of different particle types you want to specify in the following. You should always specify this.
    YOURNAME
    This is entirely variable, as long as you specify so many different types as given by the corresponding parameter (separated by linebreaks) with all the necessary data. This includes a name (blank-free string), its concentration in mM, its charge in multiples of e, its size parameter (either Lennard-Jones diameter (model L) or hard sphere radius (model H) both in Å) and in case of the soft-core model the Lennard-Jones well-depth parameter in kJ/mol. This part is the Achilles heel of the input file (although none of the other sections is close to being foolproof), so you might want to check for the exact form here, in case something weird happens.
  5. Section MACROMOLECULE: The general appearance of a MACROMOLECULE section is like:

    #MACROMOLECULE
    
    TYPE_ID                               P    
    STERIC_MODE                           S    
    BORN_ION_CHARGE                       30.0                /*redundant, since 'P' is selected*/
    BORN_ION_RADIUS                       10.0                /*redundant, since 'P' is selected*/
    PQR_FILENAME                          somename.pqr 
    PQR_DX_POTENTIAL_FILENAME             someothername.dx
    PQR_DX_REFERENCE_POTENTIAL_FILENAME   yetanothername.dx
    PQR_GRID_RESOLUTION                   60
    PQR_OUTSIDE_MESH_MODE                 1
    
    #ENDMACROMOLECULE
    		

    The various input parameters are described in the following paragraph.

    TYPE_ID
    Defines the type of macromolecule you want to use. You can choose between a file-based molecule (PQR-format, see earlier comments, choice P) and a uniformly charged sphere (Born ion, choice B). The implementation of charged cylinders (choice Z) is currently forzen, but be concluded soon. The recommended way to realize an empty-box simulation (no macromolecule) is by choosing a Born ion with zero charge and zero radius. You should always specify this.
    STERIC_MODE
    This parameter is slightly redundant (currently), since the steric interactions between explicit GCMC particles and the fixed solute are always step functions (choice S). Originally there were other choices for the Born ion case, but their use is not recommended. The relevant case (file-based species) poses significant problems in the course of introducing more sophisticated models. You should always specify this.
    BORN_ION_RADIUS
    This is the radius (in Å) of the spherical macroion (Born ion), for which Debye Hückel theory works (analytical reference). It is redundant unless you have specified B for the macromolecule identifier.
    BORN_ION_CHARGE
    This is the charge (in multiples of e) of the spherical macroion (Born ion). It is redundant, unless you have specified B for the macromolecule identifier.
    PQR_FILENAME
    Specifies the filename, where ISIM can find the PQR-file you wish to use as fixed solute. It is redundant, unless you have specified P for the macromolecule identifier.
    PQR_DX_POTENTIAL_FILENAME
    Specifies the filename, where ISIM can find the dx-file (IBM data explorer format) most likely written out by APBS, which contains the grid-based potential information for the same fixed solute in a medium without all of the explicit GCMC particles. It is important to have some agreement between the grid-size here and the size of the simulation system additinally considering the "toughness" of the given case (it might be more important to have a big APBS-grid for DNA than for a fullerene). This parameter is redundant, unless you have specified P for the macromolecule identifier.
    PQR_DX_REFERENCE_POTENTIAL_FILENAME
    As mentioned above you can give another file containing any kind of reference potential information (most often the identical system treated entirely implicitly). ISIM will then write out another grid-file with the deviations. This is truely optional.
    PQR_GRID_RESOLUTION
    The name is slightly misleading, as this parameter does not control the resolution of the analysis grid, but only the resolution of the rectangular volume calculation grid (same for all dimensions). It is redundant, unless you have specified P for the macromolecule identifier.
    PQR_OUTSIDE_MESH_MODE
    This can be very important. It tells the program, what to do if a point, for which it wants to estimate the potential contribution by the fixed solute, is outside the range defined by the APBS grid. Choice 0 simply assumes the potential to be zero everywhere off the mesh, choice 1 treats the macromolecule as one Debye-Hückel sphere with the net charge uniformly distributed over it and finally choice 2 has all atoms with its partial force field charges as Debye-Hückel spheres. The same choice has to be made (consistently) when setting up the APBS calculation. In general you usually want to select mode 1, because mode 0 is either a bad approximation or the same and choice 2 will seriously slow down the whole simulation already for very small macromolecules. The parameter is redundant, unless you have specified P for the macromolecule identifier.

What ISIM spits out ...

ISIM has the sometimes annoying capability of inevitably creating a lot of output files. Job monitoring files, energy files and number files are essentials (meaning that you cannot control their existence). In detail these files, which are visibly updated every 100th step, include for a input file named YOURFANCYJOBNAME.IN:
YOURFANCYJOBNAME.LOG
This the log-file for your current run. It usually tells you, how the program interpreted your input file and what the program is currently doing. If you get strange and unexpected results, you should probably check this file to search for a misunderstood or just wrong specification.
YOURFANCYJOBNAME.ERR
This the error file for your current run. It writes more or less clear error messages, on what went wrong within a run (e.g. missing files, specifications, ...). Unfortunately there are almost certainly hundreds of errors, which the program does not really recognize and which therefore lead to "undefined behaviour". In these cases write a bug report including the input file and both the log- and the error file to ????????????.
NUMBERS
A columnar output of the actual ion numbers of every present particle type in every single step. You can use it in both normal and in parametrization runs to monitor the development of the respective concentrations. This is particularly useful, since "explosive" developments of those due to bad input or an inappropriate model can slow down the simulation extremely.
MINIMIZATION_ENERGIES
An energy protocol of the first minimization, which is usually negligible. The data format is - identical to all other energy files - columnar with five columns for step number, total energy, inner electrolyte energy, steric interface energy and finally electrostatic interface energy all in kJ/mol (for the explanation of these terms refer to the input section).
GLOBAL_ENERGIES
This is an analagous energy protocol of the whole simulation in terms of total steps.
SHUFFLING_ENERGIES
This is also an analogous energy protocol, which in addition lists values for every moving cycle. The first column therefore contains fractional step numbers. The file is useful to get a rough idea of the effectiveness of both moving and creation/destruction attempts.

In a parametrization run there are only few relevant files. The two files for monitoring the job's progress are more important than in normal simulations, as a parametrization will always be a series of several single simulations within the framework of a processing scheme, which creates additional opportunities for input interpretation, mistakes and so on ... . The other ones include the following:

PRIMARY_RESULTS
What the parametrization algorithm does, is running a series of empty-box simulations for every particle type within every single specified concentration scheme as defined by the N-1 files NAME_SCHEME for every independent GCMC species, which spans an interval approximately symmetric around the number expectation value for the currently parametrized type (for details see the reference publication). In other words data points sufficient to fit the value for the excess chemical potential within the approximation of the decomposed treatment are collected. These primary results are written out in this file to allow the user to make manual refinements or just to identify suspicious results (the amount of noise in the fitted function varies strongly dependent on the particle parameters including concentration).
NEW_NAME_EXCESS
These N files for N types of GCMC particles are in the right format to be used in normal simulations (after renaming them). They represent the processed information of the file PRIMARY_RESULTS, i.e., after linear regression on the data points to obtain the best value.

Most of the other files within normal simulations are essentially the results of analysis functions and only meaningful in case this type of analysis has been performed properly (if applicable). The general concept is that the user makes selections determining the ouput frequency. Underlying restrictions are the choice of equilibration steps (at this point all averaged quantities are flushed) meaning that some files in the course of a simulation might contain little or no data and the irrepressible output of a standard set of files in the very last step (these might be redundant sometimes, but make sure that a run is less likely to be completely wasted). In detail the output includes:

GRID_POTENTIAL_xxxxxxx
This file contains the mean electrostatic potential due to the fixed solute and all explicit ions as calculated on the spherical or cylindrical analysis grid. It has four columns - three for the Cartesian coordinates in Å and the last one for the corresponding potential value in kJ/mol. The 'x's are the step number (preceded by '0's), for which the respective file was produced. The output frequency is controlled by the input parameter POTENTIAL_OUTPUT.
GRID_NUMBER_DENSITIES_xxxxxxx
A similar, N-columnar file for the individual number densities of all present GCMC particles in full spatial resolution (unit is particles/ų). It contains no explicit position information (solely given in the potential files). The output frequency is controlled by the input parameter NUMBER_DENSITIES_OUTPUT.
GRID_CHARGE_DENSITY_xxxxxxx
Another similar file with the only column being the result of charge-weighted integration of the number densities for all particle types (consequently in e/ų). The output frequency is controlled by the input parameter CHARGE_DENSITY_OUTPUT.
RADIAL_POTENTIAL_xxxxxxx
This file contains the radially averaged (either spherical or cylindrical around the symmetry axis) mean, electrostatic potential in a two-columnar fashion (distance in Å vs. potential value in kJ/mol). For more sophisticated problems the quantities averaged in this way are often of limited use. The output frequency is controlled by the input parameter POTENTIAL_OUTPUT.
RADIAL_NUMBER_DENSITIES_xxxxxxx
A similar file containing averaged number densities for individual GCMC particles in a (N+1)-columnar fashion (distance in Švs. N number densities in particles/ų). The output frequency is controlled by the input parameter NUMBER_DENSITIES_OUTPUT.
RADIAL_CHARGE_DENSITY_xxxxxxx
Another similar file containing the averaged charge density in a two-columnar fashion (distance in Švs. charge density in e/ų). The output frequency is controlled by the input parameter CHARGE_DENSITY_OUTPUT.
GFUNC_nnmm_ss_xxxxxxx
Pair correlation functions are usually collected for all possible combinations of GCMC particle types and consequently there are a lot of output files all following the same scheme. The 'nnmm' part is the combination code - for instance '0102' would mean that the file contains the information of particles of type '2' around particles of type '1'. 'ss' has to be interpreted as indicator of the simulation system shell with their number specified by the input parameter PAIR_CORRELATION_SHELLS. '00' stands for the inevitable, inner shell (see description), '01' is the first and possibly only outer shell. Finally the 'x's are consistent to the previous descriptions. The files are all two-columnar (distance in Å with a resolution specified in the corresponding input parameter vs. the dimensionless g-function values (normalized, radial distribution function)).
COORDINATES_xxxxxxx
Snapshot files for configurations of all GCMC particles listed by their coordinates (three columns) and type number (fourth column).
GRID_ACCESSIBILITY
A (digital) accessiblity map for the whole simulation system in full spatial resolution based on the gridpoints explicitly given in any of the grid-based potential files.
POTENTIAL_DEVIATION
In case of selection P for the input parameter MACRO_ID and a specification of PQR_DX_REFERENCE_POTENTIAL_FILENAME this file contains the differences between the GCMC simulation and the reference calculation (most often Poisson-Boltzmann study) in full spatial resolution based on the gridpoints explicitly given in any of the grid-based potential files.
pot.general
General data format import file for the IBM data explorer, which allows to read in the grid-based potential data file (see visualization section).
nd.general
Similar file for the grid-based number densities.
cd.general
Similar file for the grid-based charge density.
acc.general
Similar file for the accessibility map.
dev.general
Similar file for the potential deviation map.

How to create fancy pictures ...

The output files described in the previous section are generally in very simple formats. Therefore simple plot programs are usually sufficient to visualize two-dimensional data. The grid-based information needs more advanced visualization software, but there are no fundamental restrictions, since - if necessary - the data format can always be modified (scripts) to match individual requirements.

The most natural program to work with is the IBM data explorer (DX). However, it follows the common rule that an increase in flexibility always implies a decrease in comfort using it. Consequently this section covers some of the things to keep in mind when using the "visual programs" (working scripts in DX), which can be found in the subdirectoy "examples/visual/". All the *.general output files mentioned before allow DX to read in the grid-based data. If you look at these configuration files you will notice a line like

file = GRID_POTENTIAL_0150000
which is the pointer to the corresponding data file. By default this is the last output of its type (i.e., after the last step). If you want to use files created earlier in the simulation, just change this line; everything else in the file should be correct and sufficient, unless you have special needs like looking at selected parts of the data (grid). The only exception to this rule occurs if you want to add explicit coordinate representations (for example snapshots of ion configurations or selected atoms of a macromolecule) to your pictures, because there simply is no configuration file. In case you have the simple data format
x | y | z | val
where val is some value identifying the particle, whose position is specified by the x, y and z values, you would create a file yourname.general like:
file = /yourlocation/yourdatafile
points = N
format = ascii
interleaving = field
field = locations, ID
structure = 3-vector, scalar
type = float, float
dependency = positions, positions

end
Here N is the number of entries in your data file (for instance the number of ions in the snapshot file).

To use the visual programs in the example directory, switch to the simulation directory (with all the necessary files) and do either

cp ${TOP}/isim/examples/visual/basicisim.* .
in case you do not want to include explicit coordinate representations and
cp ${TOP}/isim/examples/visual/coordisim.* .
in case you do want to. Then you can simply run DX in your current working directory. Select Run Visual Programs... from the menu and choose the corresponding program in the file selection dialogue. This will start the automatic execution of the program with the default values. Most likely you might want to select Reset Server from the Connections menu in the master panel (stops execution). All the selections to be made in the program's main control panel are self-explaining and after choosing the desired values/modes you can just run Execute Once from the Execute menu. It might take some time (speed is of course strongly dependent on grid resolution) to create the images. Once they are there their appearance can be adjusted (zoom, rotation, ...) by using the View Control panel from the Options.

OF COURSE DX provides a lot more options and features, which are ignored here, although they might be useful. For very specific tasks it is also possible to write new modules (in C). Furthermore the visual programs are far from being optimized and are - just like this section - thought to allow users, who do not know DX, to visualize their results without getting too much in contact with the difficulties using DX.

What's not so nice about ISIM (currently) ...

Some comments about the limitations of the software and especially its underlying, theoretical assumptions certainly have to be made (the list might be rather disordered):
  1. Monte Carlo procedures to whatever they might be applied share some common aspects to keep in mind. Within the numeric limitations of randomness the series of snapshots generated by such an algorithm is non-deterministic and must not be misunderstood as a sequence in time. Nevertheless the gross (energy) criterion, which usually enters the procedure, assures that averaged quantities represent reasonable values for various system properties (e.g. it is perfectly valid to look at essentially microscopic properties such as pair correlation functions as long as the configuration space sampling is sufficient).
  2. The current concept of the program implies that equilibrated electrolyte solutions around biomolecules are studied. There is no dynamical aspect in the treatment. Extensions towards translational or rigid-body dynamics are imaginable, but require the implementation of separate algorithms and most likely the introduction of local control volumes. The latter concept was developed to study off-equilibrium systems (i.e., the necessarily equilibrating GCMC procedure is spatially restricted to certain bulk reservoirs, the remaining parts follow force-based dynamics like Brownian or molecular dynamics). All these comments hide major changes in the code, which are not straightforward.
  3. Using ISIM it should not be forgotten that the underlying model is extremely simple. The solvent is usually represented by the relative dielectric constant of the medium negelecting all explicit solvent effects, which are known to be important in numerous cases in molecular biology. Ions are uniformly charged spheres interacting via simple model potentials. Polarization effects, deviations from spherical symmetry, three- or more-particle interactions, or specific binding between ions and parts of the fixed solute are all completely iggnored. Still the so-called "primitive model" has proven to be useful in many cases.
  4. When parameterizing complicated mixtures of electrolytes the computer time increases significantly with the number of particle types and the number of concentration values you want to cover for each indepedent species. In an example with three species and four concentration values you would end up with 3· 4² = 48 single parameterization tasks.
  5. The droplet model implies that you get boundary effects, because the exterior of the simulation system is treated as vacuum. The missing attractive field usually lowers the particle density there. This is more important for higher concentrations and can be a problem in parameterization, because the scheme uses total ion numbers (which include that boundary effect) instead of actual bulk concentration as visible in radially averaged plots of number denisities (excluding it) to check, whether the concentration criterion is fulfilled. In other words: if you parameterize an electrolyte solution at 1M you will always end up overestimating bulk concentrations in "normal" simulation, as the same thing happened in the parameterization simulation, but the program "thought" the concentration would be fine by only looking at total numbers.
  6. Microscopic convergence can become a problem. Consider a protein with a tight ion binding site. The accessibility map will be generally complicated and especially because of the (very restrictive) hard-sphere interactions it might require a lot of random events to create or move an ion in(to) it. Still quantities like the total energy and particle numbers show no indication that the system is not converged.
  7. The problems handling neutral particles were mentioned at different points in one of the previous sections. In general it is possible that the code implicitly assumes that particles are charged, but the general design is flexible (those weaknesses are not yet eliminated, because we did not apply the code to neutral particles so far at all). Especially in case of simplified solvent molecules it might also become a problem that the efficiency of MC methods decreases with increasing particle density, although explicit solvent models have been studied via GCMC.