The general idea of the MonteCarlo class, and its subclasses, is to fit a classical potential energy surface (PES) to an ab-initio PES by optimizing the classical forcefield parameters. This forcefield optimization is conducted using the Addaptive Rate Monte Carlo (ARMC, 1) method described by S. Cosseddu et al in J. Chem. Theory Comput., 2017, 13, 297–308.

The implemented algorithm can be summarized as following:

## The algorithm

1. A trial state, $$S_{l}$$, is generated by moving a random parameter retrieved from a user-specified parameter set (e.g. atomic charge).

2. It is checked whether or not the trial state has been previously visited.

• If True, retrieve the previously calculated PES.

• If False, calculate a new PES with the generated parameters $$S_{l}$$.

(1)$p(k \leftarrow l) = \Biggl \lbrace { 1, \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \lt \; \Delta \varepsilon_{QM-MM} ( S_{l} ) \atop 0, \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \gt \; \Delta \varepsilon_{QM-MM} ( S_{l} ) }$
1. The move is accepted if the new set of parameters, $$S_{l}$$, lowers the auxiliary error ($$\Delta \varepsilon_{QM-MM}$$) with respect to the previous set of accepted parameters, $$S_{k}$$ (see (1)). Given a PES descriptor, $$r$$, consisting of a matrix with $$N$$ elements, the auxiliary error is defined in (2).

(2)$\Delta \varepsilon_{QM-MM} = \frac{ \sum_{i}^{N} \left| r_{i}^{QM} - r_{i}^{MM} \right |^2} {\sum_{i}^{N} r_{i}^{QM} }$
1. The parameter history is updated. Based on whether or not the new parameter set is accepted the auxiliary error of either $$S_{l}$$ or $$S_{k}$$ is increased by the variable $$\phi$$ (see (3)). In this manner, the underlying PES is continuously modified, preventing the optimizer from getting stuck in a (local) minima in the parameter space.

(3)$\Delta \varepsilon_{QM-MM} ( S_{k} ) + \phi \quad \text{if} \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \lt \; \Delta \varepsilon_{QM-MM} ( S_{l} ) \atop \Delta \varepsilon_{QM-MM} ( S_{l} ) + \phi \quad \text{if} \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \gt \; \Delta \varepsilon_{QM-MM} ( S_{l} )$
1. The parameter $$\phi$$ is updated at regular intervals in order to maintain a constant acceptance rate, $$\alpha_{t}$$. This is illustrated in (4), where $$\phi$$ is updated the begining of every super-iteration $$\kappa$$. In this example the total number of iterations, $$\kappa \omega$$, is divided into $$\kappa$$ super- and $$\omega$$ sub-iterations.

(4)$\phi_{\kappa \omega} = \phi_{ ( \kappa - 1 ) \omega} * \gamma^{ \text{sgn} ( \alpha_{t} - \overline{\alpha}_{ ( \kappa - 1 ) }) } \quad \kappa = 1, 2, 3, ..., N$

## Parameters

param:
charge:
param: charge
constraints:
- '0.5 < Cd < 1.5'
- '-0.5 > Se > -1.5'
- '0 > O_1 > -1'
Cd: 0.9768
Se: -0.9768
O_1: -0.47041
frozen:
C_1: 0.4524
lennard_jones:
-   unit: kjmol
param: epsilon
Cd Cd: 0.3101
Se Se: 0.4266
Cd Se: 1.5225
Cd O_1: 1.8340
Se O_1: 1.6135
-   unit: nm
param: sigma
Cd Cd: 0.1234
Se Se: 0.4852
Cd Se: 0.2940
Cd O_1: 0.2471
Se O_1: 0.3526

psf:
str_file: ligand.str
ligand_atoms: [C, O, H]

pes:
rdf:
func: FOX.MultiMolecule.init_rdf
kwargs:
atom_subset: [Cd, Se, O]

job:
molecule: .../mol.xyz

geometry_opt:
template: qmflows.templates.geometry.specific.cp2k_mm
settings:
cell_parameters: [50, 50, 50]
prm: .../ligand.prm
md:
template: qmflows.templates.md.specific.cp2k_mm
settings:
cell_parameters: [50, 50, 50]
prm: .../ligand.prm


A comprehensive overview of all available input parameters is provided in Monte Carlo Parameters.

Once a the .yaml file with the ARMC settings has been sufficiently customized the parameter optimization can be started via the command prompt with: init_armc my_settings.yaml.

Previous caculations can be continued with init_armc my_settings.yaml --restart True.

## The pes block

Potential energy surface (PES) descriptors can be descriped in the pes block. Provided below is an example where the radial dsitribution function (RDF) is used as PES descriptor, more specifically the RDF constructed from all possible combinations of cadmium, selenium and oxygen atoms.

pes:
rdf:
func: FOX.MultiMolecule.init_rdf
kwarg:
atom_subset: [Cd, Se, O]


Depending on the system of interest it might be of interest to utilize a PES descriptor other than the RDF, or potentially even multiple PES descriptors. In the latter case the the total auxiliary error is defined as the sum of the auxiliary errors of all individual PES descriptors, $$R$$ (see (5)).

(5)$\Delta \varepsilon_{QM-MM} = \sum_{r}^{R} \Delta \varepsilon_{r}^{QM-MM}$

An example is provided below where both radial and angular distribution functions (RDF and ADF, respectively) are are used as PES descriptors. In this example the RDF is construced for all combinations of cadmium, selenium and oxygen atoms (Cd, Se & O), whereas the ADF is construced for all combinations of cadmium and selenium atoms (Cd & Se).

pes:
rdf:
func: FOX.MultiMolecule.init_rdf
kwargs:
atom_subset: [Cd, Se, O]

kwargs:
atom_subset: [Cd, Se]


In principle any function, class or method can be provided here, as type object, as long as the following requirements are fulfilled:

• The name of the block must consist of a user-specified string (rdf and adf in the example(s) above).

• The func key must contain a string representation of thee requested function, method or class. Auto-FOX will internally convert the string into a callable object.

• The supplied callable must be able to operate on NumPy arrays or instances of its FOX.MultiMolecule subclass.

• Keyword argument can be provided with the kwargs key. The kwargs key is entirely optional and can be skipped if desired.

An example of a custom, albit rather nonsensical, PES descriptor involving the numpy.sum() function is provided below:

pes:
numpy_sum:
func: numpy.sum
kwargs:
axis: 0


This .yaml input, given a MultiMolecule instance mol, is equivalent to:

>>> import numpy
>>> from FOX import MultiMolecule

>>> func = numpy.sum
>>> kwargs = {'axis': 0}

>>> mol = MultiMolecule(...)
>>> func(mol, **kwargs)


## The param block

param:
charge:
param: charge
constraints:
- Cs == -0.5 * Br
- 0 < Cs < 2
- 1 < Pb < 3
Cs: 1.000
Pb: 2.000
lennard_jones:
- param: epsilon
unit: kjmol
Cs Cs: 0.1882
Cs Pb: 0.7227
Pb Pb: 2.7740
- unit: nm
param: sigma
constraints: Cs Cs == Pb Pb
Cs Cs: 0.60
Cs Pb: 0.50
Pb Pb: 0.60


The block key in the .yaml input contains all user-specified to-be optimized parameters.

There are three critical (and two optional) components to the "param" block:

Together, these three components point to the appropiate path of the forcefield parameter(s) of interest. As of the moment, all bonded and non-bonded potentials implemented in CP2K can be accessed via this section of the input file. For example, the following input is suitable if one wants to optimize a torsion potential (starting from $$k = 10 \ kcal/mol$$) for all C-C-C-C bonds:

param:
torsion:
param: k
unit: kcalmol
C C C C: 10


Besides the three above-mentioned mandatory components, one can (optionally) supply the unit of the parameter and/or constrain its value to a certain range. When supplying units, it is the responsibility of the user to ensure the units are supported by CP2K.

param:
charge:
constraints:
- Cd == -2 * $LIGAND - 0 < Cd < 2 - -2 < Se < 0  Lastly, a number of constraints can be applied to the various parameters in the form of minima/maxima and fixed ratios. The special $LIGAND string can herein be used as an alias representing all atoms within a single ligand. For example, when the formate anion is used as ligand (O2CH), \$LIGAND is equivalent to 2 * O + C + H.

Note

The charge parameter is unique in that the total molecular charge is always constrained; it will remain constant with respect to the initial charge of the system. It is the users responsibiliy to ensure that the initial charge is actually integer.

## Parameter Guessing

param:
lennard_jones:
- unit: kjmol
param: epsilon
Cs Cs: 0.1882
Cs Pb: 0.7227
Pb Pb: 2.7740
guess: rdf
- unit: nm
param: sigma
frozen:
guess: uff

$V_{LJ} = 4 \varepsilon \left( \left( \frac{\sigma}{r} \right )^{12} - \left( \frac{\sigma}{r} \right )^6 \right )$

Non-bonded interactions (i.e. the Lennard-Jones $$\varepsilon$$ and $$\sigma$$ values) can be guessed if they’re not explicitly by the user. There are currently two implemented guessing procedures: "uff" and "rdf". Parameter guessing for parameters other than $$\varepsilon$$ and $$\sigma$$ is not supported as of the moment.

The "uff" approach simply takes all missing parameters from the Universal Force Field (UFF)[2]. Pair-wise parameters are construcetd using the standard combinatorial rules: the arithmetic mean for $$\sigma$$ and the geometric mean for $$\varepsilon$$.

The "rdf" approach utilizes the radial distribution function for estimating $$\sigma$$ and $$\varepsilon$$. $$\sigma$$ is taken as the base of the first RDF peak, while the first minimum of the Boltzmann-inverted RDF is taken as $$\varepsilon$$.

"crystal_radius" and "ion_radius" use a similar approach to "uff", the key difference being the origin of the parameters: 10.1107/S0567739476001551: R. D. Shannon, Revised effective ionic radii and systematic studies of interatomic distances in halides and chalcogenides, Acta Cryst. (1976). A32, 751-767. Note that:

• Values are averaged with respect to all charges and coordination numbers per atom type.

• These two guess-types can only be used for estimating $$\sigma$$ parameters.

If "guess" is placed within the "frozen" block, than the guessed parameters will be treated as constants rather than to-be optimized variables.

## State-averaged ARMC

...

molecule:
- /path/to/md_acetate.xyz
- /path/to/md_phosphate.xyz
- /path/to/md_sulfate.xyz

psf:
rtf_file:
- acetate.rtf
- phosphate.rtf
- sulfate.rtf
ligand_atoms: [S, P, O, C, H]

pes:
rdf:
func: FOX.MultiMolecule.init_rdf
kwargs:
- atom_subset: [Cd, Se, O]
- atom_subset: [Cd, Se, P, O]
- atom_subset: [Cd, Se, S, O]

...