Recipes

Various recipes implemented in Auto-FOX.

FOX.recipes.param A set of functions for analyzing and plotting ARMC results.
FOX.recipes.psf A set of functions for creating .psf files.
FOX.recipes.ligands A set of functions for analyzing ligands.
FOX.recipes.time_resolution A set of functions for calculating time-resolved distribution functions.

FOX.recipes.param

A set of functions for analyzing and plotting ARMC results.

Examples

A general overview of the functions within this module.

>>> import pandas as pd
>>> from FOX.recipes import get_best, overlay_descriptor, plot_descriptor

>>> hdf5_file: str = ...

>>> param: pd.Series = get_best(hdf5_file, name='param')  # Extract the best parameters
>>> rdf: pd.DataFrame = get_best(hdf5_file, name='rdf')  # Extract the matching RDF

# Compare the RDF to its reference RDF and plot
>>> rdf_dict = overlay_descriptor(hdf5_file, name='rdf')
>>> plot_descriptor(rdf_dict)
_images/rdf.png

Examples

A small workflow for calculating for calculating free energies using distribution functions such as the radial distribution function (RDF).

>>> import pandas as pd
>>> from FOX import get_free_energy
>>> from FOX.recipes import get_best, overlay_descriptor, plot_descriptor

>>> hdf5_file: str = ...

>>> rdf: pd.DataFrame = get_best(hdf5_file, name='rdf')
>>> G: pd.DataFrame = get_free_energy(rdf, unit='kcal/mol')

>>> rdf_dict = overlay_descriptor(hdf5_file, name='rdf)
>>> G_dict = {key: get_free_energy(value) for key, value in rdf_dict.items()}
>>> plot_descriptor(G_dict)
_images/G_rdf.png

Examples

A workflow for plotting parameters as a function of ARMC iterations.

>>> import numpy as np
>>> import pandas as pd
>>> from FOX import from_hdf5
>>> from FOX.recipes import plot_descriptor

>>> hdf5_file: str = ...

>>> param: pd.DataFrame = from_hdf5(hdf5_file, 'param')
>>> param.index.name = 'ARMC iteration'
>>> param_dict = {key: param[key] for key in param.columns.levels[0]}

>>> plot_descriptor(param_dict)
_images/param.png

This approach can also be used for the plotting of other properties such as the auxiliary error.

>>> ...

>>> err: pd.DataFrame = from_hdf5(hdf5_file, 'aux_error')
>>> err.index.name = 'ARMC iteration'
>>> err_dict = {'Auxiliary Error': err}

>>> plot_descriptor(err_dict)
_images/err.png

On occasion it might be desirable to only print the error of, for example, accepted iterations. Given a sequence of booleans (bool_seq), one can slice a DataFrame or Series (df) using df.loc[bool_seq].

>>> ...

>>> acceptance: np.ndarray = from_hdf5(hdf5_file, 'acceptance')  # Boolean array
>>> err_slice_dict = {key: df.loc[acceptance], value for key, df in err_dict.items()}

>>> plot_descriptor(err_slice_dict)

Index

get_best(hdf5_file, name[, i, err_dset]) Return the PES descriptor or ARMC property which yields the lowest error.
overlay_descriptor(hdf5_file[, name, i, …]) Return the PES descriptor which yields the lowest error and overlay it with the reference PES descriptor.
plot_descriptor(descriptor[, show_fig, …]) Plot a DataFrame or iterable consisting of one or more DataFrames.

API

FOX.recipes.get_best(hdf5_file, name, i=0, err_dset='aux_error')[source]

Return the PES descriptor or ARMC property which yields the lowest error.

Parameters:
  • hdf5_file (str) – The path+filename of the ARMC .hdf5 file.
  • name (str) – The name of the PES descriptor, e.g. "rdf". Alternatively one can supply an ARMC property such as "acceptance", "param" or "aux_error".
  • i (int) – The index of the desired PES. Only relevant for PES-descriptors of state-averaged ARMCs.
  • err_dset (str) – The name of the dataset containing the errors. Generally speaking one should pick either "aux_error" or "validation/aux_error".
Returns:

A DataFrame of the optimal PES descriptor or other (user-specified) ARMC property.

Return type:

pandas.DataFrame or pd.Series

FOX.recipes.overlay_descriptor(hdf5_file, name='rdf', i=0, err_dset='aux_error')[source]

Return the PES descriptor which yields the lowest error and overlay it with the reference PES descriptor.

Parameters:
  • hdf5_file (str) – The path+filename of the ARMC .hdf5 file.
  • name (str) – The name of the PES descriptor, e.g. "rdf".
  • i (int) – The index of desired PES. Only relevant for state-averaged ARMCs.
  • err_dset (str) – The name of the dataset containing the errors. Generally speaking one should pick either "aux_error" or "validation/aux_error".
Returns:

A dictionary of DataFrames. Values consist of DataFrames with two keys: "MM-MD" and "QM-MD". Atom pairs, such as "Cd Cd", are used as keys.

Return type:

dict [str, pandas.DataFrame]

FOX.recipes.plot_descriptor(descriptor, show_fig=True, kind='line', sharex=True, sharey=False, **kwargs)[source]

Plot a DataFrame or iterable consisting of one or more DataFrames.

Requires the matplotlib package.

Parameters:
Returns:

A matplotlib Figure.

Return type:

Figure

See also

get_best()
Return the PES descriptor or ARMC property which yields the lowest error.
overlay_descriptor()
Return the PES descriptor which yields the lowest error and overlay it with the reference PES descriptor.

FOX.recipes.psf

A set of functions for creating .psf files.

Examples

Example code for generating a .psf file. Ligand atoms within the ligand .xyz file and the qd .xyz file should be in the exact same order. For example, implicit hydrogen atoms added by the from_smiles functions are not guaranteed to be ordered, even when using canonical SMILES strings.

>>> from scm.plams import Molecule, from_smiles
>>> from FOX import PSFContainer
>>> from FOX.recipes import generate_psf

# Accepts .xyz, .pdb, .mol or .mol2 files
>>> qd = Molecule(...)
>>> ligand: Molecule = Molecule(...)
>>> rtf_file : str = ...
>>> psf_file : str = ...

>>> psf: PSFContainer = generate_psf(qd_xyz, ligand_xyz, rtf_file=rtf_file)
>>> psf.write(psf_file)

Examples

If no ligand .xyz is on hand, or its atoms are in the wrong order, it is possible the extract the ligand directly from the quantum dot. This is demonstrated below with oleate (\(C_{18} H_{33} O_{2}^{-}\)).

>>> from scm.plams import Molecule
>>> from FOX import PSFContainer
>>> from FOX.recipes import generate_psf, extract_ligand

>>> qd = Molecule(...)  # Accepts an .xyz, .pdb, .mol or .mol2 file
>>> rtf_file : str = ...

>>> ligand_len = 18 + 33 + 2
>>> ligand_atoms = {'C', 'H', 'O'}
>>> ligand: Molecule = extract_ligand(qd, ligand_len, ligand_atoms)

>>> psf: PSFContainer = generate_psf(qd, ligand, rtf_file=rtf_file)
>>> psf.write(...)

Examples

Example for multiple ligands.

>>> from typing import List
>>> from scm.plams import Molecule
>>> from FOX import PSFContainer
>>> from FOX.recipes import generate_psf2

>>> qd = Molecule(...)  # Accepts an .xyz, .pdb, .mol or .mol2 file
>>> ligands = ('C[O-]', 'CC[O-]', 'CCC[O-]')
>>> rtf_files = (..., ..., ...)

>>> psf: PSFContainer = generate_psf2(qd, *ligands, rtf_file=rtf_files)
>>> psf.write(...)

If the the psf construction with generate_psf2() failes to identify a particular ligand, it is possible to return all (failed) potential ligands with the ret_failed_lig parameter.

>>> ...

>>> ligands = ('CCCCCCCCC[O-]', 'CCCCBr')
>>> failed_mol_list: List[Molecule] = generate_psf2(qd, *ligands, ret_failed_lig=True)

Index

generate_psf(qd, ligand[, rtf_file, str_file]) Generate a PSFContainer instance for qd.
generate_psf2(qd, *ligands[, rtf_file, …]) Generate a PSFContainer instance for qd with multiple different ligands.
extract_ligand(qd, ligand_len, ligand_atoms) Extract a single ligand from qd.

API

FOX.recipes.generate_psf(qd, ligand, rtf_file=None, str_file=None)[source]

Generate a PSFContainer instance for qd.

Parameters:
  • qd (str or Molecule) – The ligand-pacifated quantum dot. Should be supplied as either a Molecule or .xyz file.
  • ligand (str or Molecule) – A single ligand. Should be supplied as either a Molecule or .xyz file.
  • rtf_file (str, optional) – The path+filename of the ligand’s .rtf file. Used for assigning atom types. Alternativelly, one can supply a .str file with the str_file argument.
  • str_file (str, optional) – The path+filename of the ligand’s .str file. Used for assigning atom types. Alternativelly, one can supply a .rtf file with the rtf_file argument.
Returns:

A PSFContainer instance with the new .psf file.

Return type:

PSFContainer

FOX.recipes.generate_psf2(qd, *ligands, rtf_file=None, str_file=None, ret_failed_lig=False)[source]

Generate a PSFContainer instance for qd with multiple different ligands.

Parameters:
  • qd (str or Molecule) – The ligand-pacifated quantum dot. Should be supplied as either a Molecule or .xyz file.
  • *ligands (str, Molecule or Chem.Mol) – One or more PLAMS/RDkit Molecules and/or SMILES strings representing ligands.
  • rtf_file (str or Iterable [str], optional) – The path+filename of the ligand’s .rtf files. Filenames should be supplied in the same order as ligands. Used for assigning atom types. Alternativelly, one can supply a .str file with the str_file argument.
  • str_file (str or Iterable [str], optional) – The path+filename of the ligand’s .str files. Filenames should be supplied in the same order as ligands. Used for assigning atom types. Alternativelly, one can supply a .rtf file with the rtf_file argument.
  • ret_failed_lig (bool) – If True, return a list of all failed (potential) ligands if the function cannot identify any ligands within a certain range. Usefull for debugging. If False, raise a MoleculeError.
Returns:

A single ligand Molecule.

Return type:

Molecule

Raises:

MoleculeError – Raised if the function fails to identify any ligands within a certain range. If ret_failed_lig = True, return a list of failed (potential) ligands instead and issue a warning.

FOX.recipes.extract_ligand(qd, ligand_len, ligand_atoms)[source]

Extract a single ligand from qd.

Parameters:
  • qd (str or Molecule) – The ligand-pacifated quantum dot. Should be supplied as either a Molecule or .xyz file.
  • ligand_len (int) – The number of atoms within a single ligand.
  • ligand_atoms (str or Iterable [str]) – One or multiple strings with the atomic symbols of all atoms within a single ligand.
Returns:

A single ligand Molecule.

Return type:

Molecule

FOX.recipes.ligands

A set of functions for analyzing ligands.

Examples

An example for generating a ligand center of mass RDF.

>>> import numpy as np
>>> import pandas as pd
>>> from FOX import MultiMolecule, example_xyz
>>> from FOX.recipes import get_lig_center

>>> mol = MultiMolecule.from_xyz(example_xyz)
>>> start = 123  # Start of the ligands
>>> step = 4  # Size of the ligands

# Add dummy atoms to the ligand-center of mass and calculate the RDF
>>> lig_centra: np.ndarray = get_lig_center(mol, start, step)
>>> mol_new: MultiMolecule = mol.add_atoms(lig_centra, symbols='Xx')
>>> rdf: pd.DataFrame = mol_new.init_rdf(atom_subset=['Xx'])
_images/ligand_rdf.png

Or the ADF.

>>> ...

>>> adf: pd.DataFrame = mol_new.init_rdf(atom_subset=['Xx'], r_max=np.inf)
_images/ligand_adf.png

Or the potential of mean force (i.e. Boltzmann-inverted RDF).

>>> ...

>>> from scipy import constants
>>> from scm.plams import Units

>>> RT: float = 298.15 * constants.Boltzmann
>>> kj_to_kcal: float = Units.conversion_ratio('kj/mol', 'kcal/mol')

>>> with np.errstate(divide='ignore'):
>>>     rdf_invert: pd.DataFrame = -RT * np.log(rdf) * kj_to_kcal
>>>     rdf_invert[rdf_invert == np.inf] = np.nan  # Set all infinities to not-a-number
_images/ligand_rdf_inv.png

Focus on a specific ligand subset is possible by slicing the new ligand Cartesian coordinate array.

>>> ...

>>> keep_lig = [0, 1, 2, 3]  # Keep these ligands; disgard the rest
>>> lig_centra_subset = lig_centra[:, keep_lig]

# Add dummy atoms to the ligand-center of mass and calculate the RDF
>>> mol_new2: MultiMolecule = mol.add_atoms(lig_centra_subset, symbols='Xx')
>>> rdf: pd.DataFrame = mol_new2.init_rdf(atom_subset=['Xx'])
_images/ligand_rdf_subset.png

Examples

An example for generating a ligand center of mass RDF from a quantum dot with multiple unique ligands. A .psf file will herein be used as starting point.

>>> import numpy as np
>>> from FOX import PSFContainer, MultiMolecule, group_by_values
>>> from FOX.recipes import get_multi_lig_center

>>> mol = MultiMolecule.from_xyz(...)
>>> psf = PSFContainer.read(...)

# Gather the indices of each ligand
>>> idx_dict: dict = group_by_values(enumerate(psf.residue_id, start=1))
>>> del idx_dict[1]  # Delete the core

# Use the .psf segment names as symbols
>>> symbols = [psf.segment_name[i].iloc[0] for i in idx_dict.values()]

# Add dummy atoms to the ligand-center of mass and calculate the RDF
>>> lig_centra: np.ndarray = get_multi_lig_center(mol, idx_dict.values())
>>> mol_new: MultiMolecule = mol.add_atoms(lig_centra, symbols=symbols)
>>> rdf = mol_new.init_rdf(atom_subset=set(symbols))

Index

get_lig_center(mol, start, step[, stop, …]) Return an array with the (mass-weighted) mean position of each ligands in mol.
get_multi_lig_center(mol, idx_iter[, …]) Return an array with the (mass-weighted) mean position of each ligands in mol.

API

FOX.recipes.get_lig_center(mol, start, step, stop=None, mass_weighted=True)[source]

Return an array with the (mass-weighted) mean position of each ligands in mol.

Parameters:
  • mol (MultiMolecule) – A MultiMolecule instance.
  • start (int) – The atomic index of the first ligand atoms.
  • step (int) – The number of atoms per ligand.
  • stop (int, optional) – Can be used for neglecting any ligands beyond a user-specified atomic index.
  • mass_weighted (bool) – If True, return the mass-weighted mean ligand position rather than its unweighted counterpart.
Returns:

A new array with the ligand’s centra of mass. If mol.shape == (m, n, 3) then, given k new ligands, the to-be returned array’s shape is (m, k, 3).

Return type:

numpy.ndarray

FOX.recipes.get_multi_lig_center(mol, idx_iter, mass_weighted=True)[source]

Return an array with the (mass-weighted) mean position of each ligands in mol.

Contrary to get_lig_center(), this function can handle molecules with multiple non-unique ligands.

Parameters:
  • mol (MultiMolecule) – A MultiMolecule instance.
  • idx_iter (Iterable [Sequence [int]]) – An iterable consisting of integer sequences. Each integer sequence represents a single ligand (by its atomic indices).
  • mass_weighted (bool) – If True, return the mass-weighted mean ligand position rather than its unweighted counterpart.
Returns:

A new array with the ligand’s centra of mass. If mol.shape == (m, n, 3) then, given k new ligands (aka the length of idx_iter) , the to-be returned array’s shape is (m, k, 3).

Return type:

numpy.ndarray

FOX.recipes.time_resolution

A set of functions for calculating time-resolved distribution functions.

Index

time_resolved_rdf(mol[, start, stop, step]) Calculate the time-resolved radial distribution function (RDF).
time_resolved_adf(mol[, start, stop, step]) Calculate the time-resolved angular distribution function (ADF).

API

FOX.recipes.time_resolved_rdf(mol, start=0, stop=None, step=500, **kwargs)[source]

Calculate the time-resolved radial distribution function (RDF).

Examples

>>> from FOX import MultiMolecule, example_xyz
>>> from FOX.recipes import time_resolved_rdf

# Calculate each RDF over the course of 500 frames
>>> time_step = 500
>>> mol = MultiMolecule.from_xyz(example_xyz)

>>> rdf_list = time_resolved_rdf(
...     mol, step=time_step, atom_subset=['Cd', 'Se']
... )
Parameters:
  • mol (MultiMolecule) – The trajectory in question.
  • start (int) – The initial frame.
  • stop (int, optional) – The final frame. Set to None to iterate over all frames.
  • step (int) – The number of frames per individual RDF. Note that lower step values will result in increased numerical noise.
  • **kwargs (Any) – Further keyword arguments for init_rdf().
Returns:

A list of dataframes, each containing an RDF calculated over the course of step frames.

Return type:

List[pandas.DataFrame]

See also

init_rdf()
Calculate the radial distribution function.
FOX.recipes.time_resolved_adf(mol, start=0, stop=None, step=500, **kwargs)[source]

Calculate the time-resolved angular distribution function (ADF).

Examples

>>> from FOX import MultiMolecule, example_xyz
>>> from FOX.recipes import time_resolved_adf

# Calculate each ADF over the course of 500 frames
>>> time_step = 500
>>> mol = MultiMolecule.from_xyz(example_xyz)

>>> rdf_list = time_resolved_adf(
...     mol, step=time_step, atom_subset=['Cd', 'Se']
... )
Parameters:
  • mol (MultiMolecule) – The trajectory in question.
  • start (int) – The initial frame.
  • stop (int, optional) – The final frame. Set to None to iterate over all frames.
  • step (int) – The number of frames per individual RDF. Note that lower step values will result in increased numerical noise.
  • **kwargs (Any) – Further keyword arguments for init_adf().
Returns:

A list of dataframes, each containing an ADF calculated over the course of step frames.

Return type:

List[pandas.DataFrame]

See also

init_adf()
Calculate the angular distribution function.