Addaptive Rate Monte Carlo¶
The general idea of the MonteCarlo class, and its subclasses, is to fit a classical potential energy surface (PES) to an ab-initio PES by optimizing the classical forcefield parameters. This forcefield optimization is conducted using the Addaptive Rate Monte Carlo (ARMC, 1) method described by S. Cosseddu et al in J. Chem. Theory Comput., 2017, 13, 297–308.
The implemented algorithm can be summarized as following:
The algorithm¶
- A trial state, \(S_{l}\), is generated by moving a random parameter retrieved from a user-specified parameter set (e.g. atomic charge).
- It is checked whether or not the trial state has been previously visited.
- If
True
, retrieve the previously calculated PES. - If
False
, calculate a new PES with the generated parameters \(S_{l}\).
- If
- The move is accepted if the new set of parameters, \(S_{l}\), lowers the auxiliary error (\(\Delta \varepsilon_{QM-MM}\)) with respect to the previous set of accepted parameters, \(S_{k}\) (see (1)). Given a PES descriptor, \(r\), consisting of a matrix with \(N\) elements, the auxiliary error is defined in (2).
- The parameter history is updated. Based on whether or not the new parameter set is accepted the auxiliary error of either \(S_{l}\) or \(S_{k}\) is increased by the variable \(\phi\) (see (3)). In this manner, the underlying PES is continuously modified, preventing the optimizer from getting stuck in a (local) minima in the parameter space.
- The parameter \(\phi\) is updated at regular intervals in order to maintain a constant acceptance rate, \(\alpha_{t}\). This is illustrated in (4), where \(\phi\) is updated the begining of every super-iteration \(\kappa\). In this example the total number of iterations, \(\kappa \omega\), is divided into \(\kappa\) super- and \(\omega\) sub-iterations.
Arguments¶
Parameter | Default | Parameter description |
---|---|---|
param.prm_file | The path+filename of a CHARMM parameter file. | |
param.charge | A dictionary with atoms and matching atomic charges. | |
param.epsilon | A dictionary with atom-pairs and the matching Lennard-Jones \(\epsilon\) parameter. | |
param.sigma | A dictionary with atom-pairs and the matching Lennard-Jones \(\sigma\) parameter. | |
psf.str_file | The path+filename to one or more stream file; used for assigning atom types and charges to ligands. | |
psf.rtf_file | The path+filename to one or more MATCH-produced rtf file; used for assigning atom types and charges to ligands. | |
psf.psf_file | The path+filename to one or more psf files; used for assigning atom types and charges to ligands. | |
psf.ligand_atoms | All atoms within a ligand, used for defining residues. | |
pes | A dictionary holding one or more functions for constructing PES descriptors. | |
molecule | A list of one or more MultiMolecule instances or .xyz filenames of a reference PES. |
|
job.logfile | armc.log | The path+filename for the to-be created PLAMS logfile. |
job.job_type | scm.plams.Cp2kJob | The job type, see Job. |
job.name | armc | The base name of the various molecular dynamics jobs. |
job.path | . | The base path for storing the various molecular dynamics jobs. |
job.folder | MM_MD_workdir | The name of the to-be created directory for storing all molecular dynamics jobs. |
job.keepfiles | False | Whether the raw MD results should be saved or deleted. |
job.md_settings | A dictionary with the MD job settings. Alternativelly, the filename of YAML file can be supplied. | |
job.preopt_setting | A dictionary of geometry preoptimization job settings. Suplemented by job.md_settings. | |
hdf5_file | ARMC.hdf5 | The filename of the to-be created HDF5 file with all ARMC results. |
armc.iter_len | 50000 | The total number of ARMC iterations \(\kappa \omega\). |
armc.sub_iter_len | 100 | The length of each ARMC subiteration \(\omega\). |
armc.gamma | 2.0 | The constant \(\gamma\), see (4). |
armc.a_target | 0.25 | The target acceptance rate \(\alpha_{t}\), see (4). |
armc.phi | 1.0 | The initial value of the variable \(\phi\), see (3) and (4). |
move.range.start | 0.005 | Controls the minimum stepsize of Monte Carlo moves. |
move.range.stop | 0.1 | Controls the maximum stepsize of Monte Carlo moves. |
move.range.step | 0.005 | Controls the allowed stepsize values between the minima and maxima. |
Once a the .yaml file with the ARMC settings has been sufficiently customized
the parameter optimization can be started via the command prompt with:
init_armc my_settings.yaml
.
Previous caculations can be continued with init_armc my_settings.yaml --restart True
.
The pes block¶
Potential energy surface (PES) descriptors can be descriped in the "pes"
block.
Provided below is an example where the radial dsitribution function (RDF) is
used as PES descriptor, more specifically the RDF constructed from all possible
combinations of cadmium, selenium and oxygen atoms.
pes:
rdf:
func: FOX.MultiMolecule.init_rdf
kwarg:
atom_subset: [Cd, Se, O]
Depending on the system of interest it might be of interest to utilize a PES descriptor other than the RDF, or potentially even multiple PES descriptors. In the latter case the the total auxiliary error is defined as the sum of the auxiliary errors of all individual PES descriptors, \(R\) (see (5)).
An example is provided below where both radial and angular distribution functions (RDF and ADF, respectively) are are used as PES descriptors. In this example the RDF is construced for all combinations of cadmium, selenium and oxygen atoms (Cd, Se & O), whereas the ADF is construced for all combinations of cadmium and selenium atoms (Cd & Se).
pes:
rdf:
func: FOX.MultiMolecule.init_rdf
args: []
kwargs:
atom_subset: [Cd, Se, O]
adf:
func: FOX.MultiMolecule.init_adf
args: []
kwargs:
atom_subset: [Cd, Se]
In principle any function, class or method can be provided here, as type object, as long as the following requirements are fulfilled:
- The name of the block must consist of a user-specified string
(
"rdf"
and"adf"
in the example(s) above). - The
"func"
key must contain a string representation of thee requested function, method or class. Auto-FOX will internally convert the string into a callable object. - The supplied callable must be able to operate on NumPy arrays or
instances of its
MultiMolecule
subclass. - Arguments and keyword argument can be provided with the
"args"
and"kwargs"
keys, respectively. The"args"
and"kwargs"
keys are entirely optional and can be skipped if desired.
An example of a custom, albit rather nonsensical, PES descriptor involving the numpy.sum function is provided below:
pes:
numpy_sum:
func: numpy.sum
kwargs:
axis: 0
This .yaml input, given a MultiMolecule
instance mol
, is equivalent to:
>>> import numpy
>>> func = numpy.sum
>>> args = []
>>> kwargs = {'axis': 0}
>>> func(mol, *arg, **kwarg)
The param block¶
param:
charge:
keys: [input, force_eval, mm, forcefield, charge]
constraints:
- 0 < Cs < 2
- 1 < Pb < 3
- Cs == 0.5 * Br
Cs: 1.000
Pb: 2.000
epsilon:
unit: kjmol
keys: [input, force_eval, mm, forcefield, nonbonded, lennard-jones]
Cs Cs: 0.1882
Cs Pb: 0.7227
Pb Pb: 2.7740
sigma:
unit: nm
keys: [input, force_eval, mm, forcefield, nonbonded, lennard-jones]
constraints: 'Cs Cs == Pb Pb'
Cs Cs: 0.60
Cs Pb: 0.50
Pb Pb: 0.60
The "param"
key in the .yaml input contains all user-specified
to-be optimized parameters.
There are three critical (and two optional) components to the "param"
block:
- The key of each block (charge, epsilon & sigma).
- The
"keys"
sub-block, which points to the section path in the CP2K settings (e.g. [‘input’, ‘force_eval’, ‘mm’, ‘forcefield’, ‘charge’]).- The sub-blocks containing either singular atoms or atom pairs.
Together, these three components point to the appropiate path of the forcefield parameter(s) of interest. As of the moment, all bonded and non-bonded potentials implemented in CP2K can be accessed via this section of the input file. For example, the following input is suitable if one wants to optimize a torsion potential (starting from \(k = 10 \ kcal/mol\)) for all C-C-C-C bonds:
param:
k:
keys: [input, force_eval, mm, forcefield, torsion]
unit: kcalmol
C C C C: 10
Besides the three above-mentioned mandatory components, one can
(optionally) supply the unit of the parameter and/or constrain
its value to a certain range.
When supplying units, it is the responsibility of the user to ensure
the units are supported by CP2K.
Furthermore, parameter constraints are, as of the moment, limited to specifying
minimum and/or maximum values (e.g. 0 < Cs < 2
).
Additional (more elaborate) constrainst are currently already available for
atomic charges in the move.charge_constraints
block (see below).
Parameter Guessing¶
param:
epsilon:
unit: kjmol
keys: [input, force_eval, mm, forcefield, nonbonded, lennard-jones]
Cs Cs: 0.1882
Cs Pb: 0.7227
Pb Pb: 2.7740
guess: rdf
sigma:
unit: nm
keys: [input, force_eval, mm, forcefield, nonbonded, lennard-jones]
frozen:
guess: uff
Non-bonded interactions (i.e. the Lennard-Jones \(\varepsilon\) and
\(\sigma\) values) can be guessed if they’re not explicitly by the user.
There are currently two implemented guessing procedures: "uff"
and
"rdf"
.
Parameter guessing for parameters other than \(\varepsilon\) and
\(\sigma\) is not supported as of the moment.
The "uff"
approach simply takes all missing parameters from
the Universal Force Field (UFF)[2].
Pair-wise parameters are construcetd using the standard combinatorial rules:
the arithmetic mean for \(\sigma\) and the geometric mean for
\(\varepsilon\).
The "rdf"
approach utilizes the radial distribution function for
estimating \(\sigma\) and \(\varepsilon\).
\(\sigma\) is taken as the base of the first RDF peak,
while the first minimum of the Boltzmann-inverted RDF is taken as
\(\varepsilon\).
"crystal_radius"
and "ion_radius"
use a similar approach to "uff"
,
the key difference being the origin of the parameters:
10.1107/S0567739476001551:
R. D. Shannon, Revised effective ionic radii and systematic studies of
interatomic distances in halides and chalcogenides, Acta Cryst. (1976). A32, 751-767.
Note that:
- Values are averaged with respect to all charges and coordination numbers per atom type.
- These two guess-types can only be used for estimating \(\sigma\) parameters.
If "guess"
is placed within the "frozen"
block, than the guessed
parameters will be treated as constants rather than to-be optimized variables.
Note
The guessing procedure requires the presence of both a .prm and .psf file.
See the "prm_file"
and "psf"
blocks, respectively.
State-averaged ARMC¶
...
molecule:
- /path/to/md_acetate.xyz
- /path/to/md_phosphate.xyz
- /path/to/md_sulfate.xyz
psf:
rtf_file:
- acetate.rtf
- phosphate.rtf
- sulfate.rtf
ligand_atoms: [S, P, O, C, H]
pes:
rdf:
func: FOX.MultiMolecule.init_rdf
kwargs:
- atom_subset: [Cd, Se, O]
- atom_subset: [Cd, Se, P, O]
- atom_subset: [Cd, Se, S, O]
...
FOX.MonteCarlo API¶
-
class
FOX.classes.monte_carlo.
MonteCarlo
(molecule, param, md_settings, preopt_settings=None, rmsd_threshold=5.0, job_type=<class 'scm.plams.interfaces.thirdparty.cp2k.Cp2kJob'>, hdf5_file='ARMC.hdf5', apply_move=<ufunc 'multiply'>, move_range=None, keep_files=False, logger=None, pes_post_process=None)[source]¶ The base
MonteCarlo
class.-
property
molecule
¶ Get value or set value as a tuple of MultiMolecule instances.
Return type: Tuple
[MultiMolecule
, …]
-
property
md_settings
¶ Get value or set value as a plams.Settings instance.
Return type: Tuple
[Settings
, …]
-
property
preopt_settings
¶ Get value or set value as a plams.Settings instance.
Return type: Tuple
[Settings
, …]
-
property
move_range
¶ Get value or set value as a np.ndarray.
Return type: ndarray
-
values
()[source]¶ Return a view of
MonteCarlo.history_dict
’s values.Return type: ValuesView
-
add_pes_evaluator
(name, func, args=(), kwargs=mappingproxy({}))[source]¶ Add a callable to this instance for constructing PES-descriptors.
Examples
>>> from FOX import MonteCarlo, MultiMolecule >>> mc = MonteCarlo(...) >>> mol = MultiMolecule.from_xyz(...) # Prepare arguments >>> name = 'rdf' >>> func = FOX.MultiMolecule.init_rdf >>> atom_subset = ['Cd', 'Se', 'O'] # Keyword argument for func # Add the PES-descriptor constructor >>> mc.add_pes_evaluator(name, func, kwargs={'atom_subset': atom_subset})
Parameters: - name (str) – The name under which the PES-descriptor will be stored (e.g.
"RDF"
). - func (Callable) – The callable for constructing the PES-descriptor. The callable should take an array-like object as input and return a new array-like object as output.
- args (
Sequence
) – A sequence of positional arguments. - kwargs (
dict
orIterable
[dict
]) – A dictionary or an iterable of dictionaries with keyword arguments. Providing an iterable allows one to use a unique set of keyword arguments for each molecule inMonteCarlo.molecule
.
Return type: - name (str) – The name under which the PES-descriptor will be stored (e.g.
-
move
()[source]¶ Update a random parameter in self.param by a random value from self.move.range.
Performs in inplace update of the
'param'
column in self.param. By default the move is applied in a multiplicative manner. self.job.md_settings and self.job.preopt_settings are updated to reflect the change in parameters.Examples
>>> print(armc.param['param']) charge Br -0.731687 Cs 0.731687 epsilon Br Br 1.045000 Cs Br 0.437800 Cs Cs 0.300000 sigma Br Br 0.421190 Cs Br 0.369909 Cs Cs 0.592590 Name: param, dtype: float64 >>> for _ in range(1000): # Perform 1000 random moves >>> armc.move() >>> print(armc.param['param']) charge Br -0.597709 Cs 0.444592 epsilon Br Br 0.653053 Cs Br 1.088848 Cs Cs 1.025769 sigma Br Br 0.339293 Cs Br 0.136361 Cs Cs 0.101097 Name: param, dtype: float64
Returns: A tuple with the (new) values in the 'param'
column of self.param.Return type: tuple [float]
-
clip_move
(idx, value)[source]¶ Ensure that value falls within a user-specified range.
Return type: float
-
run_md
()[source]¶ Run a geometry optimization followed by a molecular dynamics (MD) job.
Returns a new
MultiMolecule
instance constructed from the MD trajectory and the path to the MD results. If no trajectory is available (i.e. the job crashed) return None instead.- The MD job is constructed according to the provided settings in self.job.
Returns: A list of MultiMolecule
instance(s) constructed from the MD trajectory & a list of paths to the PLAMS results directories. TheMultiMolecule
list is replaced withNone
if the job crashes.Return type: FOX.MultiMolecule and tuple [str]
-
clear_job_cache
()[source]¶ Clear
MonteCarlo.job_cache
and, optionally, delete all cp2k output files.Return type: None
-
get_pes_descriptors
(key, get_first_key=False)[source]¶ Check if a key is already present in history_dict.
If
True
, return the matching list of PES descriptors; IfFalse
, construct and return a new list of PES descriptors.- The PES descriptors are constructed by the provided settings in self.pes.
Parameters: Returns: A previous value from history_dict or a new value from an MD calculation & a
MultiMolecule
instance constructed from the MD simulation. Values are set tonp.inf
if the MD job crashed.Return type: dict [str, np.ndarray [np.float64]) and FOX.MultiMolecule
-
property
FOX.ARMC API¶
-
class
FOX.classes.armc.
ARMC
(iter_len=50000, sub_iter_len=100, gamma=200, a_target=0.25, phi=1.0, apply_phi=<ufunc 'add'>, **kwargs)[source]¶ The Addaptive Rate Monte Carlo class (
ARMC
).A subclass of
MonteCarlo
.Parameters: - iter_len (int) – The total number of ARMC iterations \(\kappa \omega\).
- sub_iter_len (int) – The length of each ARMC subiteration \(\omega\).
- gamma (float) – The constant \(\gamma\).
- a_target (float) – The target acceptance rate \(\alpha_{t}\).
- phi (float) – The variable \(\phi\).
- apply_phi (Callable) – The callable used for applying \(\phi\) to the auxiliary error. The callable should be able to take 2 floats as argument and return a new float.
- **kwargs (|Any|_) – Keyword arguments for the
MonteCarlo
superclass.
-
classmethod
from_yaml
(filename)[source]¶ Create a
ARMC
instance from a .yaml file.Parameters: filename (str) – The path+filename of a .yaml file containing all ARMC
settings.Returns: A new ARMC
instance and a dictionary with keyword arguments forrun_armc()
.Return type: FOX.ARMC and dict
-
to_yaml
(filename, logfile=None, path=None, folder=None)[source]¶ Convert an
ARMC
instance into a .yaml readable byARMC.from_yaml
.Parameters: filename ( str
,bytes
,os.pathlike
orio.IOBase
) – A filename or a file-like object.Return type: None
-
do_inner
(kappa, omega, acceptance, key_old)[source]¶ Run the inner loop of the
ARMC.__call__()
method.Parameters: - kappa (int) – The super-iteration, \(\kappa\), in
ARMC.__call__()
. - omega (int) – The sub-iteration, \(\omega\), in
ARMC.__call__()
. - history_dict (dict [tuple [float], np.ndarray [np.float64]]) – A dictionary with parameters as keys and a list of PES descriptors as values.
- key_new (tuple [float]) – A tuple with the latest set of forcefield parameters.
Returns: The latest set of parameters.
Return type: - kappa (int) – The super-iteration, \(\kappa\), in
-
get_aux_error
(pes_dict)[source]¶ Return the auxiliary error \(\Delta \varepsilon_{QM-MM}\).
The auxiliary error is constructed using the PES descriptors in values with respect to self.ref.
The default function is equivalent to:
\[\Delta \varepsilon_{QM-MM} = \frac{ \sum_{i}^{N} |r_{i}^{QM} - r_{i}^{MM}|^2 } {r_{i}^{QM}}\]Parameters: pes_dict ([dict [str, np.ndarray [np.float64]]) – An dictionary with \(m*n\) PES descriptors each. Returns: An array with \(m*n\) auxilary errors Return type: \(m*n\) np.ndarray [np.float64]
-
update_phi
(acceptance)[source]¶ Update the variable \(\phi\).
\(\phi\) is updated based on the target accepatance rate, \(\alpha_{t}\), and the acceptance rate, acceptance, of the current super-iteration.
- The values are updated according to the provided settings in self.armc.
The default function is equivalent to:
\[\phi_{\kappa \omega} = \phi_{ ( \kappa - 1 ) \omega} * \gamma^{ \text{sgn} ( \alpha_{t} - \overline{\alpha}_{ ( \kappa - 1 ) }) }\]Parameters: acceptance (np.ndarray [bool]) – A 1D boolean array denoting the accepted moves within a sub-iteration. Return type: None