Ward, A. Agrawal, A. Choudhary, and C. Wolverton, “A general-purpose machine learning framework for predicting properties of inorganic materials,” npj Computational Materials, vol. 2, no. 1, Aug. 2016.
For more information regarding the Python version of Magpie, please see https://github.com/ramv2/magpie_python.
The chemml.chem.magpie_python module includes (please click on links adjacent to function names for more information):
The atomic packing efficiency (APE), as defined by Laws et al. [1], is determined based
on the ideal and actual ratio between the central and shell atoms of an
atomic cluster with a certain number of atoms. Often, the packing
efficiency is described as a ratio between these two quantities:
The ideal ratio is determined based on the ratio between the size of a
central atom and the neighboring atoms such that the packing around the
central atom is maximized. These optimal ratios for clusters for
different numbers of atoms have been tabulated by Miracle et al. [2].
The actual ratio is computed by dividing the radius of the central atom
by the average radius of the neighboring atoms.
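The two ratios can be sketched as follows. This is only an illustration: the function names are hypothetical, the ideal ratio is passed in by the caller rather than looked up in the tabulation of Miracle et al. [2], and the convention APE = ideal ratio / actual ratio is an assumption that should be checked against the module before relying on it.

```python
def actual_radius_ratio(center_radius, shell_radii):
    """Radius of the central atom divided by the mean radius of the shell atoms."""
    if not shell_radii:
        raise ValueError("shell_radii must not be empty")
    mean_shell_radius = sum(shell_radii) / len(shell_radii)
    return center_radius / mean_shell_radius


def packing_efficiency(ideal_ratio, center_radius, shell_radii):
    """Assumed convention: APE = ideal ratio / actual ratio.

    Under this convention a value of 1.0 means the cluster is packed
    exactly as efficiently as the ideal cluster of the same size.
    """
    return ideal_ratio / actual_radius_ratio(center_radius, shell_radii)
```

With a central atom of radius 1.0 surrounded by four atoms of radius 1.25, the actual ratio is 0.8, so an ideal ratio of 0.8 would give an APE of 1 under the assumed convention.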
We currently use this framework to create two types of features:
Distance to nearest clusters with a packing efficiency better than a
certain threshold. If there are fewer than a requested number of
efficiently packed clusters in an alloy system, the average is taken to
be the average distance to all of the clusters. These features are
designed to measure the availability of efficiently-packed atomic
configurations in the liquid.
Mean packing efficiency of the system assuming that the composition of
the first nearest neighbor shell is equal to the composition of the
system. Each atom type is surrounded by the number of atoms that
maximizes the packing efficiency. As shown in recent work by Laws et al.
[1], bulk metallic glasses are known to form when the clusters around all
types of atom have the same composition as the alloy and are efficiently
packed. We compute the average APE for each atom in the system, under this
assumption, and the average deviation from perfect packing.
This algorithm currently evaluates all possible clusters provided a list
of elements. As the number of clusters scales with N!, the runtime of
this algorithm scales with N!.
For now, we only search for clusters with up to 7 atom types (max_n_types)
in order to avoid this combinatorial problem. In practice, the algorithm
picks the 7 elements with the highest fractions. While not ideal,
this is likely adequate since most alloys have fewer than 7 main
components. Many alloys have more than 10 elements in the specification,
but many of these are impurities that may not be present in large enough
amounts to really affect the determination of efficiently packed clusters.
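The truncation step might look like the sketch below. The function name and the dictionary input are hypothetical; only the max_n_types name and the "largest fractions first" rule come from the description above.

```python
def select_main_elements(fractions, max_n_types=7):
    """Return the ids of the max_n_types elements with the largest fractions.

    fractions: dict mapping element id -> atomic fraction.
    """
    ranked = sorted(fractions, key=fractions.get, reverse=True)
    return sorted(ranked[:max_n_types])
```

For an alloy with fractions {26: 0.6, 24: 0.2, 28: 0.15, 6: 0.05} and max_n_types=2, only elements 24 and 26 would be kept for the cluster search.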
model for bulk metallic glasses,” Nature Communications, vol. 6, p. 8123,
Sep. 2015.
.. [2] D. B. Miracle, E. A. Lord, and S. Ranganathan, “Candidate Atomic
Cluster Configurations in Metallic Glass Structures,” MATERIALS
TRANSACTIONS, vol. 47, no. 7, pp. 1737 – 1742, 2006.
.. [3] D. B. Miracle, D. V. Louzguine-Luzgin, L. V. Louzguina-Luzgina,
and A. Inoue, “An assessment of binary metallic glasses: correlations
between structure, glass forming ability and stability,” International
Materials Reviews, vol. 55, no. 4, pp. 218–256, Jul. 2010.
Ids of the elements from which clusters are composed. A list of
int values.
clusters : array-like
Clusters to convert. List of identity shell compositions for each
type of central atom. Ex: clusters[1][2] is an array defining the
number of atoms of each type for clusters with an atom of type 1
in the center. A list containing a list of int values.
This algorithm finds the number of atoms in the shell such that the
APE of the cluster is closest to 1. Note: This calculation assumes
that sites in the first nearest-neighbor shell can be
partially-occupied.
A list of efficiently packed structures for each atom type as the
central atom. Ex: x[0][1] is the 2nd efficiently packed cluster
with atom type 0 as the central atom. A list containing a list of
int values.
Function to compute the maximum and minimum possible cluster sizes,
given a list of radii.
The smallest possible cluster has the smallest
atom in the center and the largest on the outside. The largest
possible has the largest on the inside and the smallest on the outside.
Class to generate attributes based on the Atomic Property Weighted
Radial Distribution Function (AP-RDF) approach of Fernandez et al. [1].
Users can specify the cutoff distance for the AP-RDF, the number of points
at which to evaluate it, the smoothing factors for the RDF peaks, and the
properties used for weighting. The recommended values of these
parameters have yet to be determined; please contact Logan Ward or the
authors of this paper if you have questions or ideas for these parameters.
Weighted Radial Distribution Functions Descriptors of Metal–Organic
Frameworks for the Prediction of Gas Uptake Capacity,” The Journal of
Physical Chemistry C, vol. 117, no. 27, pp. 14095–14105, Jul. 2013.
These features are based on the formal charges of materials determined
using the OxidationStateGuesser. Currently implemented features:
Statistics of formal charges (min, max, range, mean, variance)
Cumulative ionization energies / electron affinities
Difference in electronegativities between cation and anion.
For materials for which the algorithm fails to find charge states, all
features are set to NaN.
“Predicting density functional theory total energies and enthalpies of
formation of metal-nonmetal compounds by linear regression,” Physical
Review B, vol. 93, no. 8, Feb. 2016.
Class to compute attributes based on chemical ordering of structure.
Determines average Warren-Cowley ordering parameter for the bond network
defined by the Voronoi tessellation of a structure.
For each atom in the structure, the average Warren-Cowley ordering
parameter is determined by computing the average magnitude of ordering
parameter for each type for all atoms in a structure. The ordering
parameter is 0 for a perfectly-random distribution, so this average
represents an average degree of “ordering” in the structure. This
attribute is computed for several nearest-neighbor shells (1st, 2nd,
and 3rd by default).
There are two options for computing order parameters: weighted and
unweighted. The former is computed by weighting the contribution of each
neighboring atom by the fraction of surface area corresponding to
boundaries between that atom and the central atom. The latter considers
all neighbors weighted equally, which makes it very sensitive to
the introduction of small faces due to numerical problems inherent to the
Voronoi tessellation. Full details are available in the Vassal
documentation for VoronoiCellBasedAnalysis.getNeighborOrderingParameters().
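The difference between the two options can be illustrated for a single atom and its first shell. The sketch assumes the common Warren-Cowley form, 1 - P_t / x_t, where P_t is the (optionally face-area-weighted) share of neighbors of type t and x_t is the overall fraction of type t. The function name and input layout are hypothetical, and the Vassal implementation may differ in detail.

```python
def neighbor_ordering(neighbors, fractions, weighted=True):
    """Warren-Cowley-style parameters for one atom.

    neighbors: list of (atom_type, face_area) for the Voronoi neighbors.
    fractions: dict mapping atom_type -> overall fraction in the structure.
    Returns a dict mapping atom_type -> 1 - P_t / x_t.
    """
    # Unweighted: every neighbor counts equally, whatever its face area.
    weights = [area if weighted else 1.0 for _, area in neighbors]
    total = sum(weights)
    params = {}
    for atom_type, x_t in fractions.items():
        share = sum(w for (t, _), w in zip(neighbors, weights) if t == atom_type)
        params[atom_type] = 1.0 - (share / total) / x_t
    return params
```

For neighbors [("A", 1.0), ("B", 1.0), ("B", 0.001)] in a 50/50 structure, the weighted parameters stay close to 0, while the unweighted ones shift to roughly +/-0.33 because the tiny extra face counts as a full neighbor.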
Function to parse a string containing the composition.
Supports parentheses and addition compounds (ex: Na_2CO_3-10H_2O).
Note, will not properly parse addition compounds inside parentheses
(ex: Na_2(CO_3 - 10H_2O)_1).
Function to sort the element ids based on their electronegativity
order and normalize the fractions.
Makes sure the entry is in a proper format. Must be run from
constructor.
This method works by computing an approximation for the Coulomb matrix
that considers periodicity. Specifically, we use the Coulomb Sine matrix,
which is described in detail in Faber et al. [1]. For molecules,
the Coulomb matrix is defined as
The eigenvalues of this matrix are then used as attributes. In order to
provide a fixed number of attributes, the first N attributes are defined
to be the N eigenvalues from the Coulomb matrix. The remaining attributes
are defined to be zero.
The Coulomb Matrix attributes are dependent on unit cell choice.
Please consider transforming your input crystal structures to the primitive
cell before using these attributes.
“Crystal structure representations for machine learning models of
formation energies,” International Journal of Quantum Chemistry,
vol. 115, no. 16, pp. 1094–1101, Apr. 2015.
Compute attributes based on the effective coordination number.
The effective coordination number can be thought of as a face-size-weighted
coordination number. It is computed by the formula
.. math:: N_{eff} = \frac{1}{\sum_i \left( \frac{f_i}{SA} \right)^2}

where :math:`f_i` is the area of face :math:`i` and :math:`SA` is the
surface area of the entire cell.
The effective coordination number has a major benefit: stability against the
addition of a very small face. Small perturbations in atomic positions
can break symmetry in a crystal, and lead to the introduction of small
faces. The conventional coordination number treats all faces equally,
so the coordination number changes even when one of these small faces is
added.
One approach in the literature is to first apply a screen on small
faces (e.g., remove any smaller than 1% of the total face area),
which still runs into problems with discontinuity for larger displacements.
Our approach is differentiable with respect to the addition of a small
face (ask Logan if you want the math), and also captures another
interesting effect: smaller coordination numbers for Voronoi cells with a
dispersity in face sizes. For example, BCC has 14 faces on its Voronoi
cell: 8 large faces and 6 small ones. Our effective coordination number is
smaller than the 14 reported by the conventional measure, moving toward 8,
the commonly accepted value of the BCC coordination number.
Additionally, for systems with equal-sized faces (e.g., FCC), this measure
agrees exactly with conventional reports.
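A minimal sketch of the effective coordination number, written directly from the definition as one over the sum of squared face-area fractions. The function name and the example face lists are illustrative; the BCC areas assume an ideal truncated octahedron with unit edge length (8 hexagons and 6 squares), and the FCC cell is taken as 12 equal faces.

```python
import math


def effective_coordination_number(face_areas):
    """1 / sum((f_i / SA)^2), where SA is the total surface area of the cell."""
    total_area = sum(face_areas)
    if total_area <= 0:
        raise ValueError("face areas must sum to a positive value")
    return 1.0 / sum((f / total_area) ** 2 for f in face_areas)


# Ideal FCC Voronoi cell: 12 equal faces.
fcc_faces = [1.0] * 12

# Ideal BCC Voronoi cell: 8 regular hexagons and 6 squares, unit edge length.
hexagon_area = 1.5 * math.sqrt(3.0)
bcc_faces = [hexagon_area] * 8 + [1.0] * 6

n_fcc = effective_coordination_number(fcc_faces)
n_bcc = effective_coordination_number(bcc_faces)
# Adding a nearly vanishing face barely changes the result.
n_fcc_perturbed = effective_coordination_number(fcc_faces + [1e-6])
```

Because each face contributes through its squared area fraction, a vanishingly small extra face shifts the result only slightly, whereas a plain face count would jump from 12 to 13.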
Class to generate attributes based on the properties of constituent
binary systems.
Computes the minimum, maximum and range of all pairs in
the material, and the fraction-weighted mean and variance of all pairs.
Variance is defined as the mean absolute deviation from the mean over all
pairs. If an entry has only one element, the value of NaN is used for all
attributes.
Class to set up and generate descriptors based on elemental property
statistics.
Computes the mean, maximum, minimum, range, mode and mean
absolute deviation of all elemental properties provided.
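The six statistics can be sketched for a single elemental property as below. The function name and dictionary inputs are hypothetical. The sketch assumes that the mean and the mean absolute deviation are weighted by element fraction and that "mode" means the property of the most abundant element; both assumptions should be checked against the module.

```python
def property_statistics(fractions, values):
    """Statistics of one elemental property over a composition.

    fractions: dict mapping element -> atomic fraction (summing to 1).
    values: dict mapping element -> property value.
    """
    mean = sum(fractions[e] * values[e] for e in fractions)
    lowest = min(values[e] for e in fractions)
    highest = max(values[e] for e in fractions)
    # Assumption: deviation is fraction-weighted, like the mean.
    mean_abs_dev = sum(fractions[e] * abs(values[e] - mean) for e in fractions)
    # Assumption: mode is the property of the most abundant element.
    most_abundant = max(fractions, key=fractions.get)
    return {
        "mean": mean,
        "maximum": highest,
        "minimum": lowest,
        "range": highest - lowest,
        "mode": values[most_abundant],
        "mean_abs_dev": mean_abs_dev,
    }
```

For Fe2O3 with atomic number as the property (fractions 0.4 and 0.6, values 26 and 8), this gives a mean of 15.2, a range of 18, and a mode of 8.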
Features:
1. Formation energy.
2. Number of phases in equilibrium.
3. Distance from closest composition (i.e., ||x_i - x_{i,f}||_2 for each
component i for phase f).
4. Average distance from all neighbors.
5. Quasi-entropy (sum x_i * ln(x_i) where x_i is fraction of phase).
Certain values of the number of phases in equilibrium and “quasi-entropy”
are only accessible to systems with a larger number of elements. Useful if
you do not want to consider the number of components in an alloy as a
predictive variable.
Function to set variable to count number of phases at equilibrium.
In some cases, you may want to exclude this as a feature because it is
tied to the number of components in the compound.
This generator only computes a single feature: the L_1 distance between the
composition of an entry and the nearest ionic compound (determined using
IonicCompoundFinder). For a compound where it is not possible to form an
ionic compound (e.g., only metallic elements), the entry is assigned an
arbitrarily large distance (equal to the number of elements in the alloy).
The one adjustable parameter in this calculation is the maximum number of
atoms per formula unit used when looking for ionic compounds. For binary
compounds, the maximum conceivable number of atoms in a formula unit
is for a compound with a 9+ and a 5- species, which has 14 atoms in the
formula unit. Consequently, we recommend using 14 or larger for this
parameter.
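The figure of 14 follows from charge balance: a 9+ cation and a 5- anion need 5 cations and 9 anions to be neutral. A small sketch of that arithmetic, using a hypothetical helper name:

```python
from math import gcd


def atoms_in_binary_formula_unit(cation_charge, anion_charge):
    """Atoms in the smallest charge-neutral formula unit of two ionic species."""
    a, b = abs(cation_charge), abs(anion_charge)
    if a == 0 or b == 0:
        raise ValueError("both species must carry a nonzero charge")
    common = gcd(a, b)
    n_cations = b // common  # each cation is balanced by anion charge
    n_anions = a // common
    return n_cations + n_anions
```

Here atoms_in_binary_formula_unit(9, -5) evaluates to 14, while a 3+/2- pair needs only 5 atoms.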
Class to generate the attributes based on the ionicity of a compound.
Creates attributes based on whether it is possible to form a
charge-neutral ionic compound, and two features based on a simple measure
of “bond ionicity” (see Ref. [1]).
Bond ionicity is defined as:
.. math:: I(x,y) = 1 - \exp(-0.25 [\chi(x) - \chi(y)]^2)

Maximum ionic character: Max I(x,y) between any two constituents.
Mean ionic character: :math:`\sum x_i x_j I(i,j)` where :math:`x_i`
is the fraction of element :math:`i` and :math:`\chi(x)` is the
electronegativity of element :math:`x`.
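The two ionicity measures can be sketched as follows. The function names are hypothetical, the electronegativities in the usage note are approximate Pauling values supplied by the caller, and the mean is assumed to run over all ordered pairs (i, j), which counts each distinct pair twice; the module may use a different convention.

```python
import math
from itertools import product


def bond_ionicity(chi_x, chi_y):
    """I(x, y) = 1 - exp(-0.25 * (chi(x) - chi(y))^2)."""
    return 1.0 - math.exp(-0.25 * (chi_x - chi_y) ** 2)


def ionic_character(fractions, electronegativity):
    """Return (maximum, mean) ionic character of a composition.

    fractions: dict mapping element -> atomic fraction.
    electronegativity: dict mapping element -> electronegativity.
    """
    pairs = list(product(fractions, repeat=2))  # assumption: ordered pairs
    ionicity = {
        (a, b): bond_ionicity(electronegativity[a], electronegativity[b])
        for a, b in pairs
    }
    max_ionic = max(ionicity.values())
    mean_ionic = sum(fractions[a] * fractions[b] * ionicity[a, b] for a, b in pairs)
    return max_ionic, mean_ionic
```

For NaCl with electronegativities of roughly 0.93 (Na) and 3.16 (Cl), the maximum ionic character is about 0.71, and the mean under the ordered-pair assumption is half of that.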
Determined by comparing the shape of each coordination polyhedron in the
structure (as determined using a Voronoi tessellation) to those in a
reference lattice.
Similarity is computed by summing the difference in the number of faces
with each number of edges between a certain Voronoi cell and that of the
reference lattice. This difference is then normalized by the number of
faces in the reference lattice, and averaged over all atoms to produce a
“similarity index”. In this form, structures based on the reference
lattice have a match of 0, which becomes larger with increasing
dissimilarity.
For now we consider the BCC, FCC (which has the same coordination
polyhedron shape as HCP), and SC lattices.
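The comparison can be sketched by describing each Voronoi cell as a map from "number of edges" to "number of faces with that many edges". The function names and the dictionary representation are hypothetical. The reference shapes are the ideal cells: a truncated octahedron for BCC, a rhombic dodecahedron for FCC, and a cube for SC.

```python
# Ideal reference cells: {edges per face: number of such faces}.
BCC = {4: 6, 6: 8}
FCC = {4: 12}
SC = {4: 6}


def shape_dissimilarity(cell_faces, reference_faces):
    """Summed difference in face counts, normalized by the reference face count."""
    edge_counts = set(cell_faces) | set(reference_faces)
    difference = sum(
        abs(cell_faces.get(k, 0) - reference_faces.get(k, 0)) for k in edge_counts
    )
    return difference / sum(reference_faces.values())


def similarity_index(cells, reference_faces):
    """Average dissimilarity over all atoms (one cell per atom)."""
    return sum(shape_dissimilarity(c, reference_faces) for c in cells) / len(cells)
```

A structure whose cells all match the reference gives 0, and a cubic cell compared against the FCC reference gives 6/12 = 0.5.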
Class to compute attributes based on the difference in elemental
properties between neighboring atoms.
elemental_properties : list
Elemental properties to be associated with this class for the
generation of features.
shells : array-like
Shells to consider. A list of int values.
attr_name : str
Property Name.
For an atom, its “local property difference” is computed by:
.. math:: \frac{\sum_n f_n \left| p_{atom} - p_n \right|}{\sum_n f_n}

where :math:`f_n` is the area of the face associated with neighbor
:math:`n`, :math:`p_{atom}` is the elemental property of the central atom,
and :math:`p_n` is the elemental property of the neighbor atom.
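As a small illustration of this face-area-weighted average for one atom, the sketch below uses a hypothetical function name and input layout; in the module the face areas come from the Voronoi tessellation.

```python
def local_property_difference(center_property, neighbors):
    """Face-area-weighted mean of |p_atom - p_n| over the neighbors.

    neighbors: list of (face_area, neighbor_property) tuples.
    """
    total_area = sum(area for area, _ in neighbors)
    if total_area <= 0:
        raise ValueError("neighbor face areas must sum to a positive value")
    weighted = sum(area * abs(center_property - p) for area, p in neighbors)
    return weighted / total_area
```

For a central atom with property 2.0 and neighbors [(1.0, 4.0), (3.0, 2.0)], the result is (1.0 * 2.0 + 3.0 * 0.0) / 4.0 = 0.5: the dissimilar neighbor contributes little because it shares only a small face.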
For shells past the 1st nearest neighbor shell, the neighbors are
identified by finding all of the unique faces on the outside of the
polyhedron formed by the previous neighbor shell. This list of faces
will contain faces corresponding to all of the atoms in the desired shell, and the
total weight for each atom is defined by the total area of the faces
corresponding to that atom (there may be more than one).
By default, this class considers only the 1st nearest neighbor shell.
This parameter is computed for all elemental properties stored in
Composition Entry ElementalProperties.
This class is meant to be used in conjunction with
ElementFractionAttributeGenerator and ValenceShellAttributeGenerator.
To match the attributes from the Meredig et al. [1] paper, use all three
attribute generators.
and E. K. U. Gross, “How to represent crystal structures for machine
learning: Towards fast prediction of electronic properties,” Physical
Review B, vol. 89, no. 20, May 2014.
Class to compute attributes based on packing efficiency.
Packing efficiency is determined by finding the largest sphere that would
fit inside each Voronoi cell and comparing the volume of that sphere to the
volume of the cell.
For now, the only attribute computed by this generator is the maximum
packing efficiency for the entire cell. This is computed by summing the
total volume of all spheres in all cells, and dividing by the volume of
the unit cell.
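A sketch of this ratio, assuming the radius of the largest inscribed sphere in each Voronoi cell is already known. The function name is hypothetical. For ideal lattices the inscribed radius is half the nearest-neighbor distance, which the usage note relies on.

```python
import math


def max_packing_efficiency(inscribed_radii, cell_volume):
    """Total volume of the inscribed spheres divided by the unit cell volume.

    inscribed_radii: one radius per atom in the unit cell.
    """
    sphere_volume = sum(4.0 / 3.0 * math.pi * r ** 3 for r in inscribed_radii)
    return sphere_volume / cell_volume
```

For a simple cubic cell with unit lattice parameter (one atom, radius 0.5) this gives pi/6, about 0.524. For FCC (four atoms, radius sqrt(2)/4) it gives about 0.740, the familiar close-packing value.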
Class to set up and generate descriptors based on the stoichiometry of a
given material.
Includes features that are only based on fractions of elements, but not
what those elements actually are.
Class to compute attributes based on heterogeneity in structure.
Measures variance in bond lengths (both for a single atom and between
different atoms) and atomic volumes. Also considers the number of unique
coordination polyhedron shapes.
Bond lengths, atomic volumes, and coordination polyhedra are based on the
Voronoi tessellation of the structure.
Current attributes:
1. Mean absolute deviation in average bond length for each atom, normalized
by mean for all atoms.
2. Minimum in average bond length, normalized by mean for all atoms.
3. Maximum in average bond length, normalized by mean for all atoms.
4. Mean bond length variance between bonds across all atoms.
5. Mean absolute deviation in bond length variance.
6. Minimum bond length variance.
7. Maximum bond length variance.
8. Mean absolute deviation in atomic volume, normalized by mean atomic
volume.
Here, bond length variation for a single atom is defined as:
.. math:: \hat{l} = \langle l_i - \bar{l} \rangle

where :math:`l_i` is the distance between an atom and one of its
neighbors.
Class that generates attributes based on fraction of electrons in
valence shell of constituent elements.
Creates 4 features: [Composition-weighted mean # of electrons in the
{s, p, d, f} shells]/[Mean # of Valence Electrons]
Originally presented by: Meredig et al. [1].
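The four features can be sketched as below. The function name and the per-element valence table are hypothetical inputs supplied by the caller; the module reads these counts from its own elemental property data.

```python
def valence_shell_fractions(fractions, valence):
    """Fraction of valence electrons in the s, p, d and f shells.

    fractions: dict mapping element -> atomic fraction.
    valence: dict mapping element -> (n_s, n_p, n_d, n_f) valence electrons.
    """
    # Composition-weighted mean number of electrons in each shell.
    means = [
        sum(fractions[e] * valence[e][shell] for e in fractions)
        for shell in range(4)
    ]
    total = sum(means)  # mean number of valence electrons
    return [m / total for m in means]
```

For NaCl, taking Na as (1, 0, 0, 0) and Cl as (2, 5, 0, 0), the weighted means are 1.5 s and 2.5 p electrons out of 4, so the features are 0.375, 0.625, 0 and 0.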
Class to compute the attributes \(\Omega\) and \(\delta\)
developed by Yang and Zhang [1].
These parameters are based on the liquid formation enthalpy and atomic
sizes of elements respectively and were originally developed to predict
whether a metal alloy will form a solid solution or bulk metallic glass.
:math:`\Omega` is derived from the melting temperature, ideal mixing
entropy, and regular solution interaction parameter
(:math:`\Omega_{i,j}`) predicted by the Miedema model for binary liquids.
Specifically, it is computed using the relationship:

.. math:: \Omega = \frac{T_m \Delta S_{mix}}{|\Delta H_{mix}|}

where :math:`T_m` is the composition-weighted average of the melting
temperature, :math:`\Delta S_{mix}` is the ideal solution entropy,
and :math:`\Delta H_{mix}` is the mixing enthalpy. The mixing enthalpy
is computed using the Miedema mixing enthalpies tabulated by Takeuchi and
Inoue [2] where:

.. math:: \Delta H_{mix} = \sum \Omega_{i,j} c_i c_j

and :math:`\Omega_{i,j} = 4 \Delta H_{mix}`.
:math:`\delta` is related to the polydispersity of atomic sizes, and is
computed using the relationship:

.. math:: \delta = \left[ \sum c_i \left( 1 - \frac{r_i}{r_{average}} \right)^2 \right]^{0.5}

where :math:`r_i` is the atomic size. Here, we use the atomic radii
compiled by Miracle et al. [3] rather than those compiled by Kittel,
as in the original work.
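Both parameters can be sketched as below. The function names and input tables are hypothetical and must be supplied by the caller; no real Miedema enthalpies or radii are included. The sketch assumes that the mixing-enthalpy sum runs over unordered pairs of distinct elements, that the 4x factor applies to the binary mixing enthalpy of each pair, that the average radius is composition-weighted, and that the ideal mixing entropy is -R times the sum of c_i ln c_i.

```python
import math
from itertools import combinations

R = 8.314462618  # gas constant, J / (mol K)


def yang_omega(fractions, melting_temperature, binary_enthalpy):
    """Omega = T_m * dS_mix / |dH_mix|.

    fractions: dict mapping element -> atomic fraction c_i.
    melting_temperature: dict mapping element -> melting point in K.
    binary_enthalpy: dict mapping frozenset({a, b}) -> binary mixing
        enthalpy in kJ/mol (caller-supplied, e.g. from a Miedema table).
    """
    t_m = sum(c * melting_temperature[e] for e, c in fractions.items())
    ds_mix = -R * sum(c * math.log(c) for c in fractions.values() if c > 0)
    dh_mix = sum(
        4.0 * binary_enthalpy[frozenset((a, b))] * fractions[a] * fractions[b]
        for a, b in combinations(fractions, 2)  # assumption: unordered pairs
    )
    if dh_mix == 0:
        return math.inf
    return t_m * ds_mix / abs(dh_mix * 1000.0)  # convert kJ/mol to J/mol


def yang_delta(fractions, radii):
    """delta = sqrt(sum c_i * (1 - r_i / r_average)^2)."""
    r_average = sum(c * radii[e] for e, c in fractions.items())
    return math.sqrt(
        sum(c * (1.0 - radii[e] / r_average) ** 2 for e, c in fractions.items())
    )
```

With made-up inputs for an equiatomic binary (melting points 1000 K and 2000 K, a binary mixing enthalpy of -5 kJ/mol, radii 1.0 and 1.2), the mixing enthalpy is -5 kJ/mol, Omega is about 1.73, and delta is about 0.091.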
solid-solution in multi-component alloys,” Materials Chemistry and
Physics, vol. 132, no. 2–3, pp. 233–238, Feb. 2012.
.. [2] A. Takeuchi and A. Inoue, “Classification of Bulk Metallic Glasses
by Atomic Size Difference, Heat of Mixing and Period of Constituent
Elements and Its Application to Characterization of the Main Alloying
Element,” MATERIALS TRANSACTIONS, vol. 46, no. 12, pp. 2817–2829, 2005.
.. [3] D. B. Miracle, D. V. Louzguine-Luzgin, L. V. Louzguina-Luzgina,
and A. Inoue, “An assessment of binary metallic glasses: correlations
between structure, glass forming ability and stability,” International
Materials Reviews, vol. 55, no. 4, pp. 218–256, Jul. 2010.