Addaptive Rate Monte Carlo

The general idea of the MonteCarlo class, and its subclasses, is to fit a classical potential energy surface (PES) to an ab-initio PES by optimizing the classical forcefield parameters. This forcefield optimization is conducted using the Addaptive Rate Monte Carlo (ARMC, 1) method described by S. Cosseddu et al in J. Chem. Theory Comput., 2017, 13, 297–308.

The implemented algorithm can be summarized as following:

The algorithm

  1. A trial state, \(S_{l}\), is generated by moving a random parameter retrieved from a user-specified parameter set (e.g. atomic charge).
  2. It is checked whether or not the trial state has been previously visited.
    • If True, retrieve the previously calculated PES.
    • If False, calculate a new PES with the generated parameters \(S_{l}\).
(1)\[p(k \leftarrow l) = \Biggl \lbrace { 1, \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \lt \; \Delta \varepsilon_{QM-MM} ( S_{l} ) \atop 0, \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \gt \; \Delta \varepsilon_{QM-MM} ( S_{l} ) }\]
  1. The move is accepted if the new set of parameters, \(S_{l}\), lowers the auxiliary error (\(\Delta \varepsilon_{QM-MM}\)) with respect to the previous set of accepted parameters, \(S_{k}\) (see (1)). Given a PES descriptor, \(r\), consisting of a matrix with \(N\) elements, the auxiliary error is defined in (2).
(2)\[\Delta \varepsilon_{QM-MM} = \frac{ \sum_{i}^{N} \left| r_{i}^{QM} - r_{i}^{MM} \right |^2} {\sum_{i}^{N} r_{i}^{QM} }\]
  1. The parameter history is updated. Based on whether or not the new parameter set is accepted the auxiliary error of either \(S_{l}\) or \(S_{k}\) is increased by the variable \(\phi\) (see (3)). In this manner, the underlying PES is continuously modified, preventing the optimizer from getting stuck in a (local) minima in the parameter space.
(3)\[\Delta \varepsilon_{QM-MM} ( S_{k} ) + \phi \quad \text{if} \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \lt \; \Delta \varepsilon_{QM-MM} ( S_{l} ) \atop \Delta \varepsilon_{QM-MM} ( S_{l} ) + \phi \quad \text{if} \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \gt \; \Delta \varepsilon_{QM-MM} ( S_{l} )\]
  1. The parameter \(\phi\) is updated at regular intervals in order to maintain a constant acceptance rate, \(\alpha_{t}\). This is illustrated in (4), where \(\phi\) is updated the begining of every super-iteration \(\kappa\). In this example the total number of iterations, \(\kappa \omega\), is divided into \(\kappa\) super- and \(\omega\) sub-iterations.
(4)\[\phi_{\kappa \omega} = \phi_{ ( \kappa - 1 ) \omega} * \gamma^{ \text{sgn} ( \alpha_{t} - \overline{\alpha}_{ ( \kappa - 1 ) }) } \quad \kappa = 1, 2, 3, ..., N\]

Parameters

param:
    charge:
        param: charge
        constraints:
            - '0.5 < Cd < 1.5'
            - '-0.5 > Se > -1.5'
            - '0 > O_1 > -1'
        Cd: 0.9768
        Se: -0.9768
        O_1: -0.47041
        frozen:
            C_1: 0.4524
    lennard_jones:
        -   unit: kjmol
            param: epsilon
            Cd Cd: 0.3101
            Se Se: 0.4266
            Cd Se: 1.5225
            Cd O_1: 1.8340
            Se O_1: 1.6135
        -   unit: nm
            param: sigma
            Cd Cd: 0.1234
            Se Se: 0.4852
            Cd Se: 0.2940
            Cd O_1: 0.2471
            Se O_1: 0.3526

psf:
    str_file: ligand.str
    ligand_atoms: [C, O, H]

pes:
    rdf:
        func: FOX.MultiMolecule.init_rdf
        kwargs:
            atom_subset: [Cd, Se, O]

job:
    molecule: .../mol.xyz

    geometry_opt:
        template: qmflows.templates.geometry.specific.cp2k_mm
        settings:
            cell_parameters: [50, 50, 50]
            prm: .../ligand.prm
    md:
        template: qmflows.templates.md.specific.cp2k_mm
        settings:
            cell_parameters: [50, 50, 50]
            prm: .../ligand.prm

A comprehensive overview of all available input parameters is provided in Monte Carlo Parameters.

Once a the .yaml file with the ARMC settings has been sufficiently customized the parameter optimization can be started via the command prompt with: init_armc my_settings.yaml.

Previous caculations can be continued with init_armc my_settings.yaml --restart True.

The pes block

Potential energy surface (PES) descriptors can be descriped in the pes block. Provided below is an example where the radial dsitribution function (RDF) is used as PES descriptor, more specifically the RDF constructed from all possible combinations of cadmium, selenium and oxygen atoms.

pes:
    rdf:
        func: FOX.MultiMolecule.init_rdf
        kwarg:
            atom_subset: [Cd, Se, O]

Depending on the system of interest it might be of interest to utilize a PES descriptor other than the RDF, or potentially even multiple PES descriptors. In the latter case the the total auxiliary error is defined as the sum of the auxiliary errors of all individual PES descriptors, \(R\) (see (5)).

(5)\[\Delta \varepsilon_{QM-MM} = \sum_{r}^{R} \Delta \varepsilon_{r}^{QM-MM}\]

An example is provided below where both radial and angular distribution functions (RDF and ADF, respectively) are are used as PES descriptors. In this example the RDF is construced for all combinations of cadmium, selenium and oxygen atoms (Cd, Se & O), whereas the ADF is construced for all combinations of cadmium and selenium atoms (Cd & Se).

pes:
    rdf:
        func: FOX.MultiMolecule.init_rdf
        kwargs:
            atom_subset: [Cd, Se, O]

    adf:
        func: FOX.MultiMolecule.init_adf
        kwargs:
            atom_subset: [Cd, Se]

In principle any function, class or method can be provided here, as type object, as long as the following requirements are fulfilled:

  • The name of the block must consist of a user-specified string (rdf and adf in the example(s) above).
  • The func key must contain a string representation of thee requested function, method or class. Auto-FOX will internally convert the string into a callable object.
  • The supplied callable must be able to operate on NumPy arrays or instances of its FOX.MultiMolecule subclass.
  • Keyword argument can be provided with the kwargs key. The kwargs key is entirely optional and can be skipped if desired.

An example of a custom, albit rather nonsensical, PES descriptor involving the numpy.sum() function is provided below:

pes:
  numpy_sum:
      func: numpy.sum
      kwargs:
          axis: 0

This .yaml input, given a MultiMolecule instance mol, is equivalent to:

>>> import numpy
>>> from FOX import MultiMolecule

>>> func = numpy.sum
>>> kwargs = {'axis': 0}

>>> mol = MultiMolecule(...)
>>> func(mol, **kwargs)

The param block

param:
    charge:
        param: charge
        constraints:
            - Cs == -0.5 * Br
            - 0 < Cs < 2
            - 1 < Pb < 3
        Cs: 1.000
        Pb: 2.000
    lennard_jones:
        - param: epsilon
          unit: kjmol
          Cs Cs: 0.1882
          Cs Pb: 0.7227
          Pb Pb: 2.7740
        - unit: nm
          param: sigma
          constraints: Cs Cs == Pb Pb
          Cs Cs: 0.60
          Cs Pb: 0.50
          Pb Pb: 0.60

The block key in the .yaml input contains all user-specified to-be optimized parameters.

There are three critical (and two optional) components to the "param" block:

Together, these three components point to the appropiate path of the forcefield parameter(s) of interest. As of the moment, all bonded and non-bonded potentials implemented in CP2K can be accessed via this section of the input file. For example, the following input is suitable if one wants to optimize a torsion potential (starting from \(k = 10 \ kcal/mol\)) for all C-C-C-C bonds:

param:
    torsion:
        param: k
        unit: kcalmol
        C C C C: 10

Besides the three above-mentioned mandatory components, one can (optionally) supply the unit of the parameter and/or constrain its value to a certain range. When supplying units, it is the responsibility of the user to ensure the units are supported by CP2K.

param:
    charge:
        constraints:
            - Cd == -2 * $LIGAND
            - 0 < Cd < 2
            - -2 < Se < 0

Lastly, a number of constraints can be applied to the various parameters in the form of minima/maxima and fixed ratios. The special $LIGAND string can herein be used as an alias representing all atoms within a single ligand. For example, when the formate anion is used as ligand (O2CH), $LIGAND is equivalent to 2 * O + C + H.

Note

The charge parameter is unique in that the total molecular charge is always constrained; it will remain constant with respect to the initial charge of the system. It is the users responsibiliy to ensure that the initial charge is actually integer.

Parameter Guessing

param:
    lennard_jones:
        - unit: kjmol
          param: epsilon
          Cs Cs: 0.1882
          Cs Pb: 0.7227
          Pb Pb: 2.7740
          guess: rdf
        - unit: nm
          param: sigma
          frozen:
              guess: uff
\[V_{LJ} = 4 \varepsilon \left( \left( \frac{\sigma}{r} \right )^{12} - \left( \frac{\sigma}{r} \right )^6 \right )\]

Non-bonded interactions (i.e. the Lennard-Jones \(\varepsilon\) and \(\sigma\) values) can be guessed if they’re not explicitly by the user. There are currently two implemented guessing procedures: "uff" and "rdf". Parameter guessing for parameters other than \(\varepsilon\) and \(\sigma\) is not supported as of the moment.

The "uff" approach simply takes all missing parameters from the Universal Force Field (UFF)[2]. Pair-wise parameters are construcetd using the standard combinatorial rules: the arithmetic mean for \(\sigma\) and the geometric mean for \(\varepsilon\).

The "rdf" approach utilizes the radial distribution function for estimating \(\sigma\) and \(\varepsilon\). \(\sigma\) is taken as the base of the first RDF peak, while the first minimum of the Boltzmann-inverted RDF is taken as \(\varepsilon\).

"crystal_radius" and "ion_radius" use a similar approach to "uff", the key difference being the origin of the parameters: 10.1107/S0567739476001551: R. D. Shannon, Revised effective ionic radii and systematic studies of interatomic distances in halides and chalcogenides, Acta Cryst. (1976). A32, 751-767. Note that:

  • Values are averaged with respect to all charges and coordination numbers per atom type.
  • These two guess-types can only be used for estimating \(\sigma\) parameters.

If "guess" is placed within the "frozen" block, than the guessed parameters will be treated as constants rather than to-be optimized variables.

State-averaged ARMC

...

molecule:
    - /path/to/md_acetate.xyz
    - /path/to/md_phosphate.xyz
    - /path/to/md_sulfate.xyz

psf:
    rtf_file:
        - acetate.rtf
        - phosphate.rtf
        - sulfate.rtf
    ligand_atoms: [S, P, O, C, H]

pes:
    rdf:
        func: FOX.MultiMolecule.init_rdf
        kwargs:
            - atom_subset: [Cd, Se, O]
            - atom_subset: [Cd, Se, P, O]
            - atom_subset: [Cd, Se, S, O]

...