The general idea of the MonteCarlo class, and its subclasses, is to fit a classical potential energy surface (PES) to an ab-initio PES by optimizing the classical forcefield parameters. This forcefield optimization is conducted using the Addaptive Rate Monte Carlo (ARMC, 1) method described by S. Cosseddu et al in J. Chem. Theory Comput., 2017, 13, 297–308.

The implemented algorithm can be summarized as following:

## The algorithm¶

1. A trial state, $$S_{l}$$, is generated by moving a random parameter retrieved from a user-specified parameter set (e.g. atomic charge).
2. It is checked whether or not the trial state has been previously visited.
• If True, retrieve the previously calculated PES.
• If False, calculate a new PES with the generated parameters $$S_{l}$$.
(1)$p(k \leftarrow l) = \Biggl \lbrace { 1, \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \lt \; \Delta \varepsilon_{QM-MM} ( S_{l} ) \atop 0, \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \gt \; \Delta \varepsilon_{QM-MM} ( S_{l} ) }$
1. The move is accepted if the new set of parameters, $$S_{l}$$, lowers the auxiliary error ($$\Delta \varepsilon_{QM-MM}$$) with respect to the previous set of accepted parameters, $$S_{k}$$ (see (1)). Given a PES descriptor, $$r$$, consisting of a matrix with $$N$$ elements, the auxiliary error is defined in (2).
(2)$\Delta \varepsilon_{QM-MM} = \frac{ \sum_{i}^{N} \left| r_{i}^{QM} - r_{i}^{MM} \right |^2} {\sum_{i}^{N} r_{i}^{QM} }$
1. The parameter history is updated. Based on whether or not the new parameter set is accepted the auxiliary error of either $$S_{l}$$ or $$S_{k}$$ is increased by the variable $$\phi$$ (see (3)). In this manner, the underlying PES is continuously modified, preventing the optimizer from getting stuck in a (local) minima in the parameter space.
(3)$\Delta \varepsilon_{QM-MM} ( S_{k} ) + \phi \quad \text{if} \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \lt \; \Delta \varepsilon_{QM-MM} ( S_{l} ) \atop \Delta \varepsilon_{QM-MM} ( S_{l} ) + \phi \quad \text{if} \quad \Delta \varepsilon_{QM-MM} ( S_{k} ) \; \gt \; \Delta \varepsilon_{QM-MM} ( S_{l} )$
1. The parameter $$\phi$$ is updated at regular intervals in order to maintain a constant acceptance rate, $$\alpha_{t}$$. This is illustrated in (4), where $$\phi$$ is updated the begining of every super-iteration $$\kappa$$. In this example the total number of iterations, $$\kappa \omega$$, is divided into $$\kappa$$ super- and $$\omega$$ sub-iterations.
(4)$\phi_{\kappa \omega} = \phi_{ ( \kappa - 1 ) \omega} * \gamma^{ \text{sgn} ( \alpha_{t} - \overline{\alpha}_{ ( \kappa - 1 ) }) } \quad \kappa = 1, 2, 3, ..., N$

## Arguments¶

Parameter Default Parameter description
param.prm_file
The path+filename of a CHARMM parameter file.
param.charge
A dictionary with atoms and matching atomic charges.
param.epsilon
A dictionary with atom-pairs and the matching Lennard-Jones $$\epsilon$$ parameter.
param.sigma
A dictionary with atom-pairs and the matching Lennard-Jones $$\sigma$$ parameter.
psf.str_file
The path+filename to one or more stream file; used for assigning atom types and charges to ligands.
psf.rtf_file
The path+filename to one or more MATCH-produced rtf file; used for assigning atom types and charges to ligands.
psf.psf_file
The path+filename to one or more psf files; used for assigning atom types and charges to ligands.
psf.ligand_atoms
All atoms within a ligand, used for defining residues.
pes
A dictionary holding one or more functions for constructing PES descriptors.
molecule
A list of one or more MultiMolecule instances or .xyz filenames of a reference PES.
job.logfile armc.log The path+filename for the to-be created PLAMS logfile.
job.job_type scm.plams.Cp2kJob The job type, see Job.
job.name armc The base name of the various molecular dynamics jobs.
job.path . The base path for storing the various molecular dynamics jobs.
job.folder MM_MD_workdir The name of the to-be created directory for storing all molecular dynamics jobs.
job.keepfiles False Whether the raw MD results should be saved or deleted.
job.md_settings
A dictionary with the MD job settings. Alternativelly, the filename of YAML file can be supplied.
job.preopt_setting
A dictionary of geometry preoptimization job settings. Suplemented by job.md_settings.
hdf5_file ARMC.hdf5 The filename of the to-be created HDF5 file with all ARMC results.
armc.iter_len 50000 The total number of ARMC iterations $$\kappa \omega$$.
armc.sub_iter_len 100 The length of each ARMC subiteration $$\omega$$.
armc.gamma 2.0 The constant $$\gamma$$, see (4).
armc.a_target 0.25 The target acceptance rate $$\alpha_{t}$$, see (4).
armc.phi 1.0 The initial value of the variable $$\phi$$, see (3) and (4).
move.range.start 0.005 Controls the minimum stepsize of Monte Carlo moves.
move.range.stop 0.1 Controls the maximum stepsize of Monte Carlo moves.
move.range.step 0.005 Controls the allowed stepsize values between the minima and maxima.

Once a the .yaml file with the ARMC settings has been sufficiently customized the parameter optimization can be started via the command prompt with: init_armc my_settings.yaml.

Previous caculations can be continued with init_armc my_settings.yaml --restart True.

## The pes block¶

Potential energy surface (PES) descriptors can be descriped in the "pes" block. Provided below is an example where the radial dsitribution function (RDF) is used as PES descriptor, more specifically the RDF constructed from all possible combinations of cadmium, selenium and oxygen atoms.

pes:
rdf:
func: FOX.MultiMolecule.init_rdf
kwarg:
atom_subset: [Cd, Se, O]


Depending on the system of interest it might be of interest to utilize a PES descriptor other than the RDF, or potentially even multiple PES descriptors. In the latter case the the total auxiliary error is defined as the sum of the auxiliary errors of all individual PES descriptors, $$R$$ (see (5)).

(5)$\Delta \varepsilon_{QM-MM} = \sum_{r}^{R} \Delta \varepsilon_{r}^{QM-MM}$

An example is provided below where both radial and angular distribution functions (RDF and ADF, respectively) are are used as PES descriptors. In this example the RDF is construced for all combinations of cadmium, selenium and oxygen atoms (Cd, Se & O), whereas the ADF is construced for all combinations of cadmium and selenium atoms (Cd & Se).

pes:
rdf:
func: FOX.MultiMolecule.init_rdf
args: []
kwargs:
atom_subset: [Cd, Se, O]

args: []
kwargs:
atom_subset: [Cd, Se]


In principle any function, class or method can be provided here, as type object, as long as the following requirements are fulfilled:

• The name of the block must consist of a user-specified string ("rdf" and "adf" in the example(s) above).
• The "func" key must contain a string representation of thee requested function, method or class. Auto-FOX will internally convert the string into a callable object.
• The supplied callable must be able to operate on NumPy arrays or instances of its MultiMolecule subclass.
• Arguments and keyword argument can be provided with the "args" and "kwargs" keys, respectively. The "args" and "kwargs" keys are entirely optional and can be skipped if desired.

An example of a custom, albit rather nonsensical, PES descriptor involving the numpy.sum function is provided below:

pes:
numpy_sum:
func: numpy.sum
kwargs:
axis: 0


This .yaml input, given a MultiMolecule instance mol, is equivalent to:

>>> import numpy

>>> func = numpy.sum
>>> args = []
>>> kwargs = {'axis': 0}

>>> func(mol, *arg, **kwarg)


## The param block¶

param:
charge:
keys: [input, force_eval, mm, forcefield, charge]
constraints:
-  0 < Cs < 2
-  1 < Pb < 3
-  Cs == 0.5 * Br
Cs: 1.000
Pb: 2.000
epsilon:
unit: kjmol
keys: [input, force_eval, mm, forcefield, nonbonded, lennard-jones]
Cs Cs: 0.1882
Cs Pb: 0.7227
Pb Pb: 2.7740
sigma:
unit: nm
keys: [input, force_eval, mm, forcefield, nonbonded, lennard-jones]
constraints: 'Cs Cs == Pb Pb'
Cs Cs: 0.60
Cs Pb: 0.50
Pb Pb: 0.60


The "param" key in the .yaml input contains all user-specified to-be optimized parameters.

There are three critical (and two optional) components to the "param" block:

Together, these three components point to the appropiate path of the forcefield parameter(s) of interest. As of the moment, all bonded and non-bonded potentials implemented in CP2K can be accessed via this section of the input file. For example, the following input is suitable if one wants to optimize a torsion potential (starting from $$k = 10 \ kcal/mol$$) for all C-C-C-C bonds:

param:
k:
keys: [input, force_eval, mm, forcefield, torsion]
unit: kcalmol
C C C C: 10


Besides the three above-mentioned mandatory components, one can (optionally) supply the unit of the parameter and/or constrain its value to a certain range. When supplying units, it is the responsibility of the user to ensure the units are supported by CP2K. Furthermore, parameter constraints are, as of the moment, limited to specifying minimum and/or maximum values (e.g. 0 < Cs < 2). Additional (more elaborate) constrainst are currently already available for atomic charges in the move.charge_constraints block (see below).

## Parameter Guessing¶

param:
epsilon:
unit: kjmol
keys: [input, force_eval, mm, forcefield, nonbonded, lennard-jones]
Cs Cs: 0.1882
Cs Pb: 0.7227
Pb Pb: 2.7740
guess: rdf
sigma:
unit: nm
keys: [input, force_eval, mm, forcefield, nonbonded, lennard-jones]
frozen:
guess: uff

$V_{LJ} = 4 \varepsilon \left( \left( \frac{\sigma}{r} \right )^{12} - \left( \frac{\sigma}{r} \right )^6 \right )$

Non-bonded interactions (i.e. the Lennard-Jones $$\varepsilon$$ and $$\sigma$$ values) can be guessed if they’re not explicitly by the user. There are currently two implemented guessing procedures: "uff" and "rdf". Parameter guessing for parameters other than $$\varepsilon$$ and $$\sigma$$ is not supported as of the moment.

The "uff" approach simply takes all missing parameters from the Universal Force Field (UFF)[2]. Pair-wise parameters are construcetd using the standard combinatorial rules: the arithmetic mean for $$\sigma$$ and the geometric mean for $$\varepsilon$$.

The "rdf" approach utilizes the radial distribution function for estimating $$\sigma$$ and $$\varepsilon$$. $$\sigma$$ is taken as the base of the first RDF peak, while the first minimum of the Boltzmann-inverted RDF is taken as $$\varepsilon$$.

"crystal_radius" and "ion_radius" use a similar approach to "uff", the key difference being the origin of the parameters: 10.1107/S0567739476001551: R. D. Shannon, Revised effective ionic radii and systematic studies of interatomic distances in halides and chalcogenides, Acta Cryst. (1976). A32, 751-767. Note that:

• Values are averaged with respect to all charges and coordination numbers per atom type.
• These two guess-types can only be used for estimating $$\sigma$$ parameters.

If "guess" is placed within the "frozen" block, than the guessed parameters will be treated as constants rather than to-be optimized variables.

Note

The guessing procedure requires the presence of both a .prm and .psf file. See the "prm_file" and "psf" blocks, respectively.

## State-averaged ARMC¶

...

molecule:
- /path/to/md_acetate.xyz
- /path/to/md_phosphate.xyz
- /path/to/md_sulfate.xyz

psf:
rtf_file:
- acetate.rtf
- phosphate.rtf
- sulfate.rtf
ligand_atoms: [S, P, O, C, H]

pes:
rdf:
func: FOX.MultiMolecule.init_rdf
kwargs:
- atom_subset: [Cd, Se, O]
- atom_subset: [Cd, Se, P, O]
- atom_subset: [Cd, Se, S, O]

...


## FOX.MonteCarlo API¶

class FOX.classes.monte_carlo.MonteCarlo(molecule, param, md_settings, preopt_settings=None, rmsd_threshold=5.0, job_type=<class 'scm.plams.interfaces.thirdparty.cp2k.Cp2kJob'>, hdf5_file='ARMC.hdf5', apply_move=<ufunc 'multiply'>, move_range=None, keep_files=False, logger=None, pes_post_process=None)[source]

The base MonteCarlo class.

property job_name

Get the (lowered) name of MonteCarlo.job_type.

Return type: str
property molecule

Get value or set value as a tuple of MultiMolecule instances.

Return type: Tuple[MultiMolecule, …]
property md_settings

Get value or set value as a plams.Settings instance.

Return type: Tuple[Settings, …]
property preopt_settings

Get value or set value as a plams.Settings instance.

Return type: Tuple[Settings, …]
property move_range

Get value or set value as a np.ndarray.

Return type: ndarray
property logger

Get or set the logger.

Return type: Logger
keys()[source]

Return a view of MonteCarlo.history_dict’s keys.

Return type: KeysView
items()[source]

Return a view of MonteCarlo.history_dict’s items.

Return type: ItemsView
values()[source]

Return a view of MonteCarlo.history_dict’s values.

Return type: ValuesView
add_pes_evaluator(name, func, args=(), kwargs=mappingproxy({}))[source]

Add a callable to this instance for constructing PES-descriptors.

Examples

>>> from FOX import MonteCarlo, MultiMolecule

>>> mc = MonteCarlo(...)
>>> mol = MultiMolecule.from_xyz(...)

# Prepare arguments
>>> name = 'rdf'
>>> func = FOX.MultiMolecule.init_rdf
>>> atom_subset = ['Cd', 'Se', 'O']  # Keyword argument for func


Parameters: name (str) – The name under which the PES-descriptor will be stored (e.g. "RDF"). func (Callable) – The callable for constructing the PES-descriptor. The callable should take an array-like object as input and return a new array-like object as output. args (Sequence) – A sequence of positional arguments. kwargs (dict or Iterable [dict]) – A dictionary or an iterable of dictionaries with keyword arguments. Providing an iterable allows one to use a unique set of keyword arguments for each molecule in MonteCarlo.molecule. None
move()[source]

Update a random parameter in self.param by a random value from self.move.range.

Performs in inplace update of the 'param' column in self.param. By default the move is applied in a multiplicative manner. self.job.md_settings and self.job.preopt_settings are updated to reflect the change in parameters.

Examples

>>> print(armc.param['param'])
charge   Br      -0.731687
Cs       0.731687
epsilon  Br Br    1.045000
Cs Br    0.437800
Cs Cs    0.300000
sigma    Br Br    0.421190
Cs Br    0.369909
Cs Cs    0.592590
Name: param, dtype: float64

>>> for _ in range(1000):  # Perform 1000 random moves
>>>     armc.move()

>>> print(armc.param['param'])
charge   Br      -0.597709
Cs       0.444592
epsilon  Br Br    0.653053
Cs Br    1.088848
Cs Cs    1.025769
sigma    Br Br    0.339293
Cs Br    0.136361
Cs Cs    0.101097
Name: param, dtype: float64

Returns: A tuple with the (new) values in the 'param' column of self.param. tuple [float]
clip_move(idx, value)[source]

Ensure that value falls within a user-specified range.

Return type: float
run_md()[source]

Run a geometry optimization followed by a molecular dynamics (MD) job.

Returns a new MultiMolecule instance constructed from the MD trajectory and the path to the MD results. If no trajectory is available (i.e. the job crashed) return None instead.

• The MD job is constructed according to the provided settings in self.job.
Returns: A list of MultiMolecule instance(s) constructed from the MD trajectory & a list of paths to the PLAMS results directories. The MultiMolecule list is replaced with None if the job crashes. FOX.MultiMolecule and tuple [str]
clear_job_cache()[source]

Clear MonteCarlo.job_cache and, optionally, delete all cp2k output files.

Return type: None
get_pes_descriptors(key, get_first_key=False)[source]

Check if a key is already present in history_dict.

If True, return the matching list of PES descriptors; If False, construct and return a new list of PES descriptors.

• The PES descriptors are constructed by the provided settings in self.pes.
Parameters: key (tuple [float]) – A key in history_dict. get_first_key (bool) – Keep the both the files and the job_cache if this is the first ARMC iteration. Usefull for manual inspection in case cp2k hard-crashes at this point. A previous value from history_dict or a new value from an MD calculation & a MultiMolecule instance constructed from the MD simulation. Values are set to np.inf if the MD job crashed.

## FOX.ARMC API¶

class FOX.classes.armc.ARMC(iter_len=50000, sub_iter_len=100, gamma=200, a_target=0.25, phi=1.0, apply_phi=<ufunc 'add'>, **kwargs)[source]

The Addaptive Rate Monte Carlo class (ARMC).

A subclass of MonteCarlo.

Parameters: iter_len (int) – The total number of ARMC iterations $$\kappa \omega$$. sub_iter_len (int) – The length of each ARMC subiteration $$\omega$$. gamma (float) – The constant $$\gamma$$. a_target (float) – The target acceptance rate $$\alpha_{t}$$. phi (float) – The variable $$\phi$$. apply_phi (Callable) – The callable used for applying $$\phi$$ to the auxiliary error. The callable should be able to take 2 floats as argument and return a new float. **kwargs (|Any|_) – Keyword arguments for the MonteCarlo superclass.
property super_iter_len

Get ARMC.iter_len // ARMC.sub_iter_len.

Return type: int
classmethod from_yaml(filename)[source]

Create a ARMC instance from a .yaml file.

Parameters: filename (str) – The path+filename of a .yaml file containing all ARMC settings. A new ARMC instance and a dictionary with keyword arguments for run_armc(). FOX.ARMC and dict
to_yaml(filename, logfile=None, path=None, folder=None)[source]

Convert an ARMC instance into a .yaml readable by ARMC.from_yaml.

Parameters: filename (str, bytes, os.pathlike or io.IOBase) – A filename or a file-like object. None
do_inner(kappa, omega, acceptance, key_old)[source]

Run the inner loop of the ARMC.__call__() method.

Parameters: kappa (int) – The super-iteration, $$\kappa$$, in ARMC.__call__(). omega (int) – The sub-iteration, $$\omega$$, in ARMC.__call__(). history_dict (dict [tuple [float], np.ndarray [np.float64]]) – A dictionary with parameters as keys and a list of PES descriptors as values. key_new (tuple [float]) – A tuple with the latest set of forcefield parameters. The latest set of parameters.
get_aux_error(pes_dict)[source]

Return the auxiliary error $$\Delta \varepsilon_{QM-MM}$$.

The auxiliary error is constructed using the PES descriptors in values with respect to self.ref.

The default function is equivalent to:

$\Delta \varepsilon_{QM-MM} = \frac{ \sum_{i}^{N} |r_{i}^{QM} - r_{i}^{MM}|^2 } {r_{i}^{QM}}$
Parameters: pes_dict ([dict [str, np.ndarray [np.float64]]) – An dictionary with $$m*n$$ PES descriptors each. An array with $$m*n$$ auxilary errors $$m*n$$ np.ndarray [np.float64]
update_phi(acceptance)[source]

Update the variable $$\phi$$.

$$\phi$$ is updated based on the target accepatance rate, $$\alpha_{t}$$, and the acceptance rate, acceptance, of the current super-iteration.

• The values are updated according to the provided settings in self.armc.

The default function is equivalent to:

$\phi_{\kappa \omega} = \phi_{ ( \kappa - 1 ) \omega} * \gamma^{ \text{sgn} ( \alpha_{t} - \overline{\alpha}_{ ( \kappa - 1 ) }) }$
Parameters: acceptance (np.ndarray [bool]) – A 1D boolean array denoting the accepted moves within a sub-iteration. None
restart()[source]

Restart a previously started Addaptive Rate Monte Carlo procedure.

Return type: None