Magpie_Python module

This module is adapted entirely from Magpie (https://bitbucket.org/wolverton/magpie). If you are using this module, please cite Magpie as:

  L. Ward, A. Agrawal, A. Choudhary, and C. Wolverton, “A general-purpose machine learning framework for predicting properties of inorganic materials,” npj Computational Materials, vol. 2, no. 1, Aug. 2016.

For more information regarding the Python version of Magpie, please see https://github.com/ramv2/magpie_python. The chemml.chem.magpie_python module includes:

class chemml.chem.magpie_python.APEAttributeGenerator

Class to compute features using Atomic Packing Efficiency (APE) of nearby clusters.

packing_thresholdfloat

Threshold at which to define a cluster as efficiently packed.

n_nearest_to_evalarray_like

Number of nearest clusters to assess. An array or list of int values. Default : [1, 3, 5]

max_n_typesint

Maximum number of types over which to search for clusters.

radius_propertystr

Name of elemental property to use as atomic radius.

The APE, as defined by Laws et al. [1], is determined based on the ideal and actual ratio between the central and shell atoms of an atomic cluster with a certain number of atoms. Often, the packing efficiency is described as a ratio between these two quantities:

[Packing efficiency] = [Ideal radius ratio] / [Actual radius ratio]

The ideal ratio is determined based on the ratio between the size of a central atom and the neighboring atoms such that the packing around the central atom is maximized. These optimal ratios for clusters for different numbers of atoms have been tabulated by Miracle et al. [2].

The actual ratio is computed by dividing the radius of the central atom by the average radius of the neighboring (shell) atoms.

We currently use this framework to create two types of features:

Distance to nearest clusters with a packing efficiency better than a certain threshold. If there are fewer than a requested number of efficiently packed clusters in an alloy system, the average is taken to be the average distance to all of the clusters. These features are designed to measure the availability of efficiently-packed atomic configurations in the liquid.

Mean packing efficiency of the system assuming that the composition of the first nearest neighbor shell is equal to the composition of the system. Each atom type is surrounded by the number of atoms that maximizes the packing efficiency. As shown in recent work by Laws et al. [1], bulk metallic glasses are known to form when the clusters around all types of atom have the same composition as the alloy and are efficiently packed. We compute the average APE for each atom in the system, under this assumption, and the average deviation from perfect packing.

This algorithm currently evaluates all possible clusters that can be formed from a given list of elements. As the number of clusters scales with N!, where N is the number of elements, the runtime of this algorithm also scales with N!.

For now, we only search for clusters with up to 7 atom types (max_n_types) in order to avoid this combinatorial problem. In practice, the algorithm picks the 7 elements with the highest fractions. While not ideal, this might work in practice since most alloys have fewer than 7 main components. Many alloys have more than 10 elements in their specification, but many of these are impurities that may not be present in large enough amounts to affect the determination of efficiently packed clusters.
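The truncation to the most abundant elements described above can be sketched as follows. This is a minimal illustration rather than the module's implementation: the function name `select_main_elements`, the dict-based composition, and the renormalization of the kept fractions are all assumptions made for the example.

```python
def select_main_elements(fractions, max_n_types=7):
    """Keep only the max_n_types elements with the largest fractions.

    `fractions` is a hypothetical dict {element symbol: atomic fraction}.
    Renormalizing the kept fractions to sum to 1 is an assumption.
    """
    # Sort by fraction, largest first, and truncate the list.
    kept = sorted(fractions.items(), key=lambda kv: kv[1], reverse=True)
    kept = kept[:max_n_types]
    total = sum(frac for _, frac in kept)
    return {elem: frac / total for elem, frac in kept}


# Made-up alloy, truncated to 3 element types for brevity.
alloy = {"Zr": 0.55, "Cu": 0.30, "Al": 0.10, "Ni": 0.04, "Fe": 0.01}
main = select_main_elements(alloy, max_n_types=3)
```

Limiting the number of element types bounds the number of candidate clusters, at the cost of ignoring minor components.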

[1] K. J. Laws, D. B. Miracle, and M. Ferry, “A predictive structural model for bulk metallic glasses,” Nature Communications, vol. 6, p. 8123, Sep. 2015.

[2] D. B. Miracle, E. A. Lord, and S. Ranganathan, “Candidate Atomic Cluster Configurations in Metallic Glass Structures,” Materials Transactions, vol. 47, no. 7, pp. 1737–1742, 2006.

[3] D. B. Miracle, D. V. Louzguine-Luzgin, L. V. Louzguina-Luzgina, and A. Inoue, “An assessment of binary metallic glasses: correlations between structure, glass forming ability and stability,” International Materials Reviews, vol. 55, no. 4, pp. 218–256, Jul. 2010.

classmethod compute_APE(n_neighbors=None, center_radius=None, neigh_eff_radius=None, radii=None, center_type=None, shell_types=None)

Function to compute the APE of a cluster, given the identities of the central and 1st neighbor atoms or just the number of neighbors and the radii.

Here, we follow the formulation given by Laws et al. [1].

APE = ideal radius ratio / (radius of central atom / effective radius of nearest neighbors)

n_neighborsint

Number of 1st nearest neighbors in the cluster.

center_radiusfloat

Radius of the central atom.

neigh_eff_radiusfloat

Effective radius of the 1st shell. Usually computed as the average radius of all atoms in the shell.

radiiarray-like

Radius of each atom type. A list of float values.

center_typeint

Type of atom in the center.

shell_typesarray-like

Number of atoms of each type in the outer shell. Must be same length as radii.

outputfloat

APE, as defined in the function description.
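The APE formula above can be written out in a few lines. The sketch below is illustrative only: the function name `atomic_packing_efficiency` is hypothetical, and the ideal radius ratio is passed in by the caller instead of being looked up from the tabulated values of Miracle et al. [2]. The value 0.902 used for a 12-neighbor cluster and the radius 1.28 are example inputs.

```python
def atomic_packing_efficiency(ideal_ratio, center_radius, shell_radii):
    """APE = ideal radius ratio / (center radius / effective shell radius)."""
    # Effective shell radius: the average radius of the atoms in the shell.
    neigh_eff_radius = sum(shell_radii) / len(shell_radii)
    actual_ratio = center_radius / neigh_eff_radius
    return ideal_ratio / actual_ratio


# A central atom surrounded by 12 neighbors of the same size.
ape = atomic_packing_efficiency(0.902, 1.28, [1.28] * 12)
deviation = abs(1.0 - ape)  # distance from perfect packing (APE = 1)
```

With equally sized atoms the actual ratio is 1, so the APE reduces to the ideal ratio itself.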

classmethod compute_cluster_compositions(e_ids, clusters)

Function to compute the compositions of a list of atomic clusters.

The composition includes both atoms in the first nearest neighbor shell and the atom in the center of the cluster.

e_ids: array-like

Ids of the elements from which clusters are composed. A list of int values.

clustersarray-like

Clusters to convert. List of identity shell compositions for each type of central atom. Ex: clusters[1][2] is an array defining the number of atoms of each type for clusters with an atom of type 1 in the center. A list containing a list of int values.

outputarray-like

Compositions found in this cluster. A list of CompositionEntry’s.

classmethod determine_optimal_APE(central_atom_type, shell_composition, radii)

Function to compute the optimal APE for a cluster with a certain atom type in the center and a given composition in the shell.

central_atom_typeint

Element id (Z - 1) of the central atom.

shell_compositionCompositionEntry

Composition of the nearest-neighbor shell.

radiiarray-like

Lookup table of elemental radii. A list of float values.

outputfloat

The optimal APE.

This algorithm finds the number of atoms in the shell such that the APE of the cluster is closest to 1. Note: This calculation assumes that sites in the first nearest-neighbor shell can be partially-occupied.

classmethod find_efficiently_packed_clusters(radii, packing_threshold)

Function to find all clusters with better APE than a certain threshold, given a list of atomic radii.

The packing efficiency is defined as abs(1 - APE).

radiiarray-like

Radii of elements. A list of float values.

packing_thresholdfloat

Desired packing limit threshold. A “default” choice would be 0.05.

outputarray-like

A list of efficiently packed structures for each atom type as the central atom. Ex: x[0][1] is the 2nd efficiently packed cluster with atom type 0 as the central atom. A list containing a list of int values.
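As a small illustration of the threshold test, the sketch below keeps the clusters whose deviation abs(1 - APE) does not exceed the threshold. The function name, the dict of labeled APE values, and the use of `<=` rather than `<` are assumptions; the APE values are made up.

```python
def efficiently_packed(apes, packing_threshold=0.05):
    """Return labels of clusters whose abs(1 - APE) is within the threshold."""
    return [label for label, ape in apes.items()
            if abs(1.0 - ape) <= packing_threshold]


# Hypothetical clusters and APE values.
candidates = {
    "A-centered, 12 neighbors": 0.98,
    "A-centered, 9 neighbors": 1.21,
    "B-centered, 10 neighbors": 1.04,
}
packed = efficiently_packed(candidates, packing_threshold=0.05)
```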

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CompositionEntry instances.

get_closest_compositions(target_composition, other_compositions, n_closest, p_norm)

Function to get closest compositions from a given target composition.

target_compositionCompositionEntry

Composition from which to measure distance.

other_compositionsarray-like

Compositions whose distance from the target will be ranked. A list of CompositionEntry’s.

n_closestint

Number of closest compounds to return.

p_normint

P-norm to use when computing distance.

distarray-like

Distances of the compositions from the target composition. A list of float values.

compsarray-like

A list of CompositionEntry’s.
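The ranking by p-norm distance can be sketched as below. This is not the module's code: compositions are represented as plain dicts of element fractions instead of CompositionEntry objects, and both function names are hypothetical.

```python
def p_norm_distance(comp_a, comp_b, p=2):
    """p-norm of the difference between two compositions (dicts of fractions)."""
    elements = set(comp_a) | set(comp_b)
    total = sum(abs(comp_a.get(e, 0.0) - comp_b.get(e, 0.0)) ** p
                for e in elements)
    return total ** (1.0 / p)


def closest_compositions(target, others, n_closest, p=2):
    """Return the n_closest compositions, nearest first."""
    ranked = sorted(others, key=lambda c: p_norm_distance(target, c, p))
    return ranked[:n_closest]


target = {"Cu": 0.5, "Zr": 0.5}
others = [
    {"Cu": 1.0},
    {"Cu": 0.6, "Zr": 0.4},
    {"Zr": 1.0},
    {"Cu": 0.5, "Zr": 0.3, "Al": 0.2},
]
nearest = closest_compositions(target, others, n_closest=2)
```

Elements missing from one composition are treated as having a fraction of zero, so compositions with different element sets can still be compared.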

classmethod get_cluster_range(radii, packing_threshold)

Function to compute the maximum and minimum possible cluster sizes, given a list of radii.

The smallest possible cluster has the smallest atom in the center and the largest in the outside. The largest possible has the largest in the inside and the smallest in the outside.

radiiarray-like

Radii of elements in the system. A list of float values.

packing_thresholdfloat

APE defining maximum packing threshold.

min_cluster_sizeint

Minimum cluster size as defined by number of atoms in the shell.

max_cluster_sizeint

Maximum cluster size as defined by number of atoms in the shell.

set_n_nearest_to_eval(values)

Function to define the number of nearest neighbor clusters to evaluate when computing features.

valuesarray-like

Number of nearest clusters to assess. An array or list of int values.

set_packing_threshold(threshold)

Function to define the threshold at which a cluster is considered efficiently packed.

thresholdfloat

Desired threshold. Default: 0.01

ValueError

If threshold value is negative.

set_radius_property(prop)

Function to set the name of the elemental property used to define radii.

By default uses the “MiracleRadius” property which is from an assessment by Miracle et al. [2].

propstr

Name of property used to define radii.

class chemml.chem.magpie_python.APRDFAttributeGenerator

Class to generate attributes based on the Atomic Property Weighted Radial Distribution Function (AP-RDF) approach of Fernandez et al. [1].

The user can specify the cutoff distance for the AP-RDF, the number of points at which to evaluate it, the smoothing factor for the RDF peaks, and the properties used for weighting. The recommended values of these parameters have yet to be determined. Please contact Logan Ward or the authors of this paper if you have questions or ideas for these parameters.

cut_off_distancefloat

Cutoff distance for RDF.

num_pointsint

Number of points to evaluate.

smooth_parameterfloat

Smoothing parameter for AP-RDF.

elemental_propertieslist

Elemental properties to be associated with this class for the generation of features.

[1] M. Fernandez, N. R. Trefiak, and T. K. Woo, “Atomic Property Weighted Radial Distribution Functions Descriptors of Metal–Organic Frameworks for the Prediction of Gas Uptake Capacity,” The Journal of Physical Chemistry C, vol. 117, no. 27, pp. 14095–14105, Jul. 2013.
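To make the roles of the cutoff distance, the number of points, and the smoothing parameter concrete, here is a simplified sketch of a property-weighted, Gaussian-smoothed radial distribution function. It is an assumption-laden illustration, not the class's implementation: it takes a precomputed list of pair distances, ignores periodic images and normalization, and places the evaluation points on an evenly spaced grid ending at the cutoff.

```python
import math


def ap_rdf(pairs, cut_off_distance, num_points, smooth_parameter):
    """Evaluate a property-weighted RDF on an evenly spaced grid.

    pairs: iterable of (distance, prop_i, prop_j) for each atom pair,
           a hypothetical input format chosen for this sketch.
    """
    step = cut_off_distance / num_points
    grid = [step * (k + 1) for k in range(num_points)]
    values = []
    for r in grid:
        total = 0.0
        for dist, p_i, p_j in pairs:
            if dist <= cut_off_distance:
                # Each pair contributes a Gaussian peak weighted by P_i * P_j.
                total += p_i * p_j * math.exp(-smooth_parameter * (dist - r) ** 2)
        values.append(total)
    return grid, values


# One pair at distance 1.0 with property values 2.0 and 3.0.
grid, values = ap_rdf([(1.0, 2.0, 3.0)], cut_off_distance=2.0,
                      num_points=2, smooth_parameter=4.0)
```

A larger smoothing parameter gives narrower peaks, so more evaluation points are needed to resolve them.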

add_elemental_properties(properties)

Function to provide a list of elemental properties to be used to compute features.

propertiesarray-like

Properties to be included. A list of strings containing property names.

add_elemental_property(property_name)

Function to add an elemental property to self.elemental_properties in order to be used to compute features.

property_namestr

Property to be added.

clear_elemental_properties()

Function to clear the list of elemental properties.

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Crystal structures for which features are to be generated. A list of CrystalStructureEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CrystalStructureEntry instances.

set_cut_off_distance(d)

Function to set the cutoff distance used when computing the AP-RDF.

dfloat

Cut off distance.

set_num_points(num_points)

Function to set the number of points at which to evaluate AP-RDF.

num_pointsint

Desired number of evaluation points.

set_smoothing_parameter(b)

Function to set the smoothing factor used when computing the AP-RDF.

bfloat

Smoothing factor.

class chemml.chem.magpie_python.ChargeDependentAttributeGenerator

Class to generate attributes derived from the oxidation states of elements in a material. Based on work by Deml et al. [1].

These features are based on the formal charges of materials determined using the OxidationStateGuesser. Currently implemented features: statistics of formal charges (min, max, range, mean, variance); cumulative ionization energies and electron affinities; and the difference in electronegativity between cation and anion. For materials for which the algorithm fails to find charge states, all features are set to NaN.

[1] A. M. Deml, R. O’Hayre, C. Wolverton, and V. Stevanović, “Predicting density functional theory total energies and enthalpies of formation of metal-nonmetal compounds by linear regression,” Physical Review B, vol. 93, no. 8, Feb. 2016.

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CompositionEntry instances.

class chemml.chem.magpie_python.ChemicalOrderingAttributeGenerator

Class to compute attributes based on chemical ordering of structure. Determines average Warren-Cowley ordering parameter for the bond network defined by the Voronoi tessellation of a structure.

shellslist

Index of shells to compute features for.

weightedbool

Whether to compute features using weighting or not.

VoronoiCellBasedAnalysis.get_neighbor_ordering_parameters : Computes Warren-Cowley ordering parameters.

For each atom in the structure, the average Warren-Cowley ordering parameter is determined by computing the average magnitude of ordering parameter for each type for all atoms in a structure. The ordering parameter is 0 for a perfectly-random distribution, so this average represents an average degree of “ordering” in the structure. This attribute is computed for several nearest-neighbor shells (1st, 2nd, and 3rd by default).

There are two options for computing order parameters: weighted and unweighted. The former is computed by weighting the contribution of each neighboring atom by the fraction of surface area corresponding to the boundary between that atom and the central atom. The latter weights all neighbors equally, which makes it very sensitive to the introduction of small faces due to numerical problems inherent to the Voronoi tessellation. Full details are available in the Vassal documentation for VoronoiCellBasedAnalysis.getNeighborOrderingParameters().
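The difference between the two options can be illustrated for a single atom. The sketch below uses the standard Warren-Cowley form, alpha = 1 - (local fraction of a type) / (overall fraction of that type), which is 0 for a random distribution. It is a simplified first-shell illustration, not the Vassal algorithm; the function name and the input format are assumptions.

```python
def warren_cowley(neighbors, composition, weighted=True):
    """Warren-Cowley parameters around one atom.

    neighbors:   list of (atom type, Voronoi face area), hypothetical format.
    composition: dict {atom type: overall fraction in the structure}.
    """
    # Unweighted: every neighbor counts equally. Weighted: by face area.
    weights = [(t, area if weighted else 1.0) for t, area in neighbors]
    total = sum(w for _, w in weights)
    alphas = {}
    for atom_type, overall_fraction in composition.items():
        local = sum(w for t, w in weights if t == atom_type) / total
        alphas[atom_type] = 1.0 - local / overall_fraction
    return alphas


composition = {"A": 0.5, "B": 0.5}
# The last B neighbor shares only a tiny face with the central atom.
neighbors = [("A", 1.0), ("A", 1.0), ("B", 1.0), ("B", 0.01)]
unweighted = warren_cowley(neighbors, composition, weighted=False)
weighted = warren_cowley(neighbors, composition, weighted=True)
```

In the unweighted case the tiny face counts as a full neighbor and the environment looks perfectly random. The weighted variant largely ignores it.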

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Crystal structures for which features are to be generated. A list of CrystalStructureEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CrystalStructureEntry instances.

set_shells(shells)

Function to set which nearest-neighbor shells to consider when generating features.

shells: list

Desired shell indices.

set_weighted(weighted)

Function to set whether to consider face sizes when computing ordering parameters.

weighted: bool

Whether to weigh using face sizes.

class chemml.chem.magpie_python.CompositionEntry(composition=None, element_ids=None, element_names=None, fractions=None)

Class that defines a CompositionEntry object. Mainly used to store ids, names and fractions of elements belonging to a single compound.

lp_element_namesarray-like

Names of each element. A list of string values.

lp_sorting_orderarray-like

Rank of each element (used in display order). A list of int values.

element_idsarray-like

Element ids present in composition. A list of int values.

element_namesarray-like

Element names present in composition. A list of string values.

fractionsarray-like

Fraction of each element. A list of float values.

number_in_cellfloat

Number of atoms in cell (used to convert when printing).

combine_compositions(total_comp, add_comp, multiplier)

Function to add one CompositionEntry to another.

total_compdict

Dictionary containing element ids and fractions as keys and values respectively. Composition to be added to.

add_compdict

Dictionary containing element ids and fractions as keys and values respectively. Composition to add.

multiplierfloat

Factor to multiply with.

get_element_fraction(name=None, id=None)

Function to get the element fraction given either the name or the id of the element.

namestr

Name of the element.

idint

Id of the element.

fractionfloat

Elemental fraction.

get_element_fractions()

Function to get the element fractions in the composition.

element_fractionsarray-like

List of element fractions (float).

get_element_ids()

Function to get the element ids in the composition.

element_idsarray-like

List of element ids (int).

get_element_names()

Function to get the element names in the composition.

element_namesarray-like

List of element names (strings).

classmethod import_composition_list(file_path)

Function to read a list of compositions from a file.

file_pathstr

Path to the file containing the list of compositions.

composition_listarray-like

A list of CompositionEntry’s corresponding to the file contents.

classmethod import_values_list(file_path)

Function to read a list of target property values from a file.

Target property values are used to develop machine learning models.

file_pathstr

Path to the file containing the list of target property values.

property_listarray-like

A list of target property values (floats) corresponding to the file contents.

parse_composition(composition)

Function to parse a string containing the composition. Supports parentheses and addition compounds (ex: Na_2CO_3-10H_2O). Note, will not properly parse addition compounds inside parentheses (ex: Na_2(CO_3 - 10H_2O)_1).

compositionstr

The chemical formula of a material.

outputdict

Dictionary containing element ids and fractions as keys and values respectively.

ValueError

If closing parenthesis is missing. If parenthesis is not recognized.

parse_element_amounts(composition)

Function to compute fractions of element given a string of elements and amounts.

compositionstr

Composition of a material.

tmp_entrydict

Dictionary containing element ids and fractions as keys and values respectively.

ValueError

If either element names or ids are not recognized. If element amount is not a number.
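The conversion from a string of elements and amounts to fractions can be sketched as below. This simplified parser is an illustration only: it handles flat formulas such as "Fe2O3" (no parentheses, no addition compounds), keys the result by element symbol instead of element id, and does not check that a symbol is a real element. The function name is hypothetical.

```python
import re

# An element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_simple_formula(formula):
    """Return {element symbol: fraction} for a flat chemical formula."""
    amounts = {}
    pos = 0
    while pos < len(formula):
        match = _TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"Cannot parse {formula!r} at position {pos}")
        symbol, amount = match.groups()
        # A missing amount means one atom of that element.
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
        pos = match.end()
    total = sum(amounts.values())
    return {symbol: amount / total for symbol, amount in amounts.items()}


fractions = parse_simple_formula("Fe2O3")
```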

classmethod print_number(fraction, n_in_formula_unit)

Function to print out the number of atoms in a formula unit for each element given its fraction.

fractionarray-like

List of element fractions (floats) to be printed.

n_in_formula_unitint

Number of atoms in a formula unit.

outputstr

Formatted fractions.

set_composition(amounts, element_ids=None, element_names=None, to_sort=True)

Function to set the composition of this entry. Checks to make sure all elements have positive amounts.

amountsarray-like

List of amounts (float) for each element.

element_idsarray-like

List of element ids (integers).

element_namesarray-like

List of element names (strings).

ValueError

If either element names or ids are missing. If lists have different lengths.

sort_and_normalize(to_sort=True)

Function to sort the element ids based on their electronegativity order and normalize the fractions. Makes sure the entry is in a proper format. Must be run from constructor.

to_sortbool

Whether to sort as well as normalize or just normalize this instance.

class chemml.chem.magpie_python.CoordinationNumberAttributeGenerator

Class to compute attributes based on the coordination number. Uses the Voronoi tessellation to define the coordination network.

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Crystal structures for which features are to be generated. A list of CrystalStructureEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CrystalStructureEntry instances.

class chemml.chem.magpie_python.CoulombMatrixAttributeGenerator

Class to compute attributes using the Coulomb Sine Matrix representation. Based on work by Faber et al. [1].

n_eigenvaluesint

Maximum number of atoms to consider. Defines number of attributes.

This method works by computing an approximation of the Coulomb matrix that considers periodicity. Specifically, we use the Coulomb Sine matrix, which is described in detail in Faber et al. [1]. For molecules, the Coulomb matrix is defined as

\[C_{i,j} = \begin{cases} Z_i^{2.4} & \text{if } i = j \\ Z_i Z_j / r_{ij} & \text{if } i \neq j \end{cases}\]

The eigenvalues of this matrix are then used as attributes. In order to provide a fixed number of attributes, the first N attributes are defined to be the N eigenvalues of the Coulomb matrix. The remaining attributes are defined to be zero.

The Coulomb matrix attributes are dependent on the choice of unit cell. Please consider transforming your input crystal structures to the primitive cell before using these attributes.

[1] F. Faber, A. Lindmaa, O. A. von Lilienfeld, and R. Armiento, “Crystal structure representations for machine learning models of formation energies,” International Journal of Quantum Chemistry, vol. 115, no. 16, pp. 1094–1101, Apr. 2015.
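The molecular Coulomb matrix defined in this class description, and the zero-padding to a fixed number of attributes, can be sketched as follows. This covers only the non-periodic molecular form, not the Coulomb Sine matrix used by the class. The function names are hypothetical, the eigenvalues are assumed to be computed elsewhere, and truncating surplus eigenvalues is an assumption.

```python
import math


def coulomb_matrix(atomic_numbers, positions):
    """Molecular Coulomb matrix, following the definition given in the text."""
    n = len(atomic_numbers)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = atomic_numbers[i] ** 2.4
            else:
                r_ij = math.dist(positions[i], positions[j])
                matrix[i][j] = atomic_numbers[i] * atomic_numbers[j] / r_ij
    return matrix


def pad_eigenvalues(eigenvalues, n_eigenvalues):
    """Truncate or zero-pad a list of eigenvalues to a fixed length."""
    kept = list(eigenvalues)[:n_eigenvalues]
    return kept + [0.0] * (n_eigenvalues - len(kept))


# Two atoms with Z = 1 separated by a distance of 2 (arbitrary units).
matrix = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)])
attributes = pad_eigenvalues([1.5, 0.5], n_eigenvalues=4)
```

Padding with zeros lets structures with different numbers of atoms share one attribute vector length.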

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Crystal structures for which features are to be generated. A list of CrystalStructureEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CrystalStructureEntry instances.

set_n_eigenvalues(x)

Function to set the number of eigenvalues used in representation.

xint

Desired number.

class chemml.chem.magpie_python.CrystalStructureEntry(structure, name, radii)

Class to represent a crystal structure.

structureCell

Crystal structure represented in the form of a Cell.

namestr

Name given to denote this structure. Mainly used for book-keeping.

radiiarray-like

List of radii (floats) for various atoms in the periodic table.

voronoiVoronoiCellBasedAnalysis

Tool used to query properties of the tessellation.

clear_representations()

Function to clear out the representations used when computing attributes.

compute_composition()

Function to compute the composition of this crystal.

Exception

If element name is not recognized.

compute_voronoi_tessellation()

Function to compute the voronoi tessellation of this structure.

voronoiVoronoiCellBasedAnalysis

Tool used to query properties of the tessellation.

get_name()

Function to get the name of this entry.

namestr

Name given to denote this structure. Mainly used for book-keeping.

get_structure()

Function to get link to the structure.

structureCell

Structure this entry represents.

classmethod import_structures_list(dir_path)

Function to read a list of crystal structures from a directory.

dir_pathstr

Path to the directory containing the list of VASP files.

structures_listarray-like

A list of CrystalStructureEntry’s corresponding to the file contents.

replace_elements(replacements)

Function to create a new entry by replacing elements on this entry.

replacementsdict

Dictionary of elements to replace. Key: Old element, Value: New element.

new_entryCrystalStructureEntry

New entry formed by replacing the current elements with the replacement map.

class chemml.chem.magpie_python.EffectiveCoordinationNumberAttributeGenerator

Compute attributes based on the effective coordination number.

The effective coordination number can be thought of as a face-size-weighted coordination number. It is computed by the formula

\[N_{eff} = \frac{1}{\sum_i \left( \frac{f_i}{SA} \right)^2}\]

where \(f_i\) is the area of face \(i\) and \(SA\) is the surface area of the entire cell.

The effective coordination number has a major benefit: stability against the addition of a very small face. Small perturbations in atomic positions can break symmetry in a crystal and lead to the introduction of small faces. The conventional coordination number treats all faces equally, so it changes even when one of these small faces is added.

One approach in the literature is to first apply a screen on small faces (e.g., remove any smaller than 1% of the total face area), which still runs into problems with discontinuity for larger displacements.

Our approach is differentiable with respect to the addition of a small face (ask Logan if you want the math), and also captures another interesting effect: smaller coordination numbers for Voronoi cells with a dispersity in face sizes. For example, the Voronoi cell of BCC has 14 faces: 8 large ones and 6 small ones. Because the small faces carry less weight, our effective coordination number falls below the 14 reported by the conventional measure, in the direction of 8, the commonly accepted value of the BCC coordination number. Additionally, for systems with equal-sized faces (e.g., FCC), this measure agrees exactly with conventional reports.
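The formula for the effective coordination number is short enough to write out directly. The sketch below is illustrative: the function name is hypothetical and the face areas are made-up numbers, not the output of a Voronoi tessellation.

```python
def effective_coordination_number(face_areas):
    """N_eff = 1 / sum((f_i / SA)^2), with SA the total surface area."""
    surface_area = sum(face_areas)
    return 1.0 / sum((f / surface_area) ** 2 for f in face_areas)


# Twelve equal faces: N_eff matches the conventional count of 12.
equal_faces = [1.0] * 12
n_equal = effective_coordination_number(equal_faces)

# A tiny thirteenth face raises the conventional count to 13,
# but changes N_eff only slightly.
perturbed_faces = equal_faces + [1e-4]
n_perturbed = effective_coordination_number(perturbed_faces)
```

Because each face enters through the square of its area fraction, a face with a vanishing area has a vanishing effect on the result.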

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Crystal structures for which features are to be generated. A list of CrystalStructureEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CrystalStructureEntry instances.

mean_abs_dev(data)

Function to compute the mean absolute deviation of an array-like collection of numbers.

dataarray-like

A NumPy array of float values.

outputfloat

The mean absolute deviation.
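For reference, the mean absolute deviation is the average distance of the values from their mean. A minimal stand-alone sketch, using a plain list instead of a NumPy array and a hypothetical function name:

```python
def mean_absolute_deviation(data):
    """Average of |x - mean(data)| over all values."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


mad = mean_absolute_deviation([1.0, 2.0, 3.0, 4.0])
```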

class chemml.chem.magpie_python.ElementFractionAttributeGenerator

Class to set the element fractions as the features of materials.

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CompositionEntry instances.

class chemml.chem.magpie_python.ElementPairPropertyAttributeGenerator

Class to generate attributes based on the properties of constituent binary systems. Computes the minimum, maximum and range of all pairs in the material, and the fraction-weighted mean and variance of all pairs. Variance is defined as the mean absolute deviation from the mean over all pairs. If an entry has only one element, the value of NaN is used for all attributes.

elemental_pair_propertieslist

Elemental properties to be associated with this class for the generation of features.

pair_lookup_datadict

Dictionary containing the property name as the key and a list of floats as the value.

add_elemental_pair_properties(properties)

Function to provide a list of elemental pair properties to be used to compute features.

propertiesarray-like

Properties to be included. A list of strings containing property names.

add_elemental_pair_property(property)

Function to add an elemental pair property to be used to compute features.

propertystr

Property to be added.

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If no elemental properties are set. If input is not of type list. If items in the list are not CompositionEntry instances.

load_pair_lookup_data()

Function to load the property values into self.lookup_data for the computation of features.

remove_elemental_pair_properties(properties)

Function to remove a list of elemental pair properties from the list of elemental properties.

propertiesarray-like

Properties to be removed. A list of strings containing property names.

remove_elemental_pair_property(property)

Function to remove an elemental pair property from the list of elemental properties.

propertystr

Property to be removed.

class chemml.chem.magpie_python.ElementalPropertyAttributeGenerator(use_default_properties=True)

Class to set up and generate descriptors based on elemental property statistics. Computes the mean, maximum, minimum, range, mode and mean absolute deviation of all elemental properties provided.

elemental_propertiesarray-like

Elemental properties to be associated with this class for the generation of features.

lookup_datadict

Dictionary containing the property name as the key and a list of floats as the value.
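The statistics listed in the class description can be sketched for a single property as below. The exact definitions are assumptions made for this example: the mean and the mean absolute deviation are taken to be fraction-weighted, and the mode is taken to be the property value of the most abundant element. The function name is hypothetical.

```python
def property_statistics(fractions, values):
    """Statistics of one elemental property over a composition.

    fractions: atomic fraction of each element (summing to 1).
    values:    value of the property for each element, in the same order.
    """
    mean = sum(f * v for f, v in zip(fractions, values))
    most_abundant = max(range(len(values)), key=lambda i: fractions[i])
    return {
        "mean": mean,
        "maximum": max(values),
        "minimum": min(values),
        "range": max(values) - min(values),
        "mode": values[most_abundant],
        "mean_abs_dev": sum(f * abs(v - mean) for f, v in zip(fractions, values)),
    }


# Fe2O3 with the atomic number as the property: Fe = 26, O = 8.
stats = property_statistics([0.4, 0.6], [26, 8])
```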

add_elemental_properties(properties)

Function to provide a list of elemental properties to be used to compute features.

propertiesarray-like

Properties to be included. A list of strings containing property names.

add_elemental_property(property)

Function to add an elemental property to self.elemental_properties in order to be used to compute features.

propertystr

Property to be added.

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If no elemental properties are set. If input is not of type list. If items in the list are not CompositionEntry instances.

load_lookup_data()

Function to load the property values into self.lookup_data for the computation of features.

remove_elemental_properties(properties)

Function to remove a list of elemental properties from the list of elemental properties.

propertiesarray-like

Properties to be removed. A list of strings containing property names.

remove_elemental_property(property)

Function to remove an elemental property from self.elemental_properties.

propertystr

Property to be removed.

class chemml.chem.magpie_python.GCLPAttributeGenerator

Class to compute features based on the T=0K ground state.

GCLPCalculatorGCLPCalculator

A GCLPCalculator instance.

count_phasesbool

Flag to include or exclude the number of phases at equilibrium.

Features:

1. Formation energy.
2. Number of phases in equilibrium.
3. Distance from closest composition (i.e., ||x_i - x_{i,f}||_2 for each component i for phase f).
4. Average distance from all neighbors.
5. Quasi-entropy (sum x_i * ln(x_i), where x_i is the fraction of phase i).

Certain values of the number of phases in equilibrium and the “quasi-entropy” are only accessible to systems with a larger number of elements. Excluding the number of phases (see count_phases) is useful if you do not want to consider the number of components in an alloy as a predictive variable.
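The quasi-entropy feature can be computed directly from the phase fractions of the ground state. The sketch below uses the expression exactly as stated in the feature list (without a leading minus sign) and assumes the phase fractions are already known; the function name is hypothetical.

```python
import math


def quasi_entropy(phase_fractions):
    """Sum of x * ln(x) over the phases present at equilibrium."""
    # Phases with zero fraction contribute nothing (x * ln(x) -> 0).
    return sum(x * math.log(x) for x in phase_fractions if x > 0)


single_phase = quasi_entropy([1.0])
two_phase = quasi_entropy([0.5, 0.5])
```

A single-phase ground state gives zero. The value becomes more negative as the composition splits into more phases, which is why it correlates with the number of components.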

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CompositionEntry instances.

set_count_phases(count_phases)

Function to set variable to count number of phases at equilibrium. In some cases, you may want to exclude this as a feature because it is tied to the number of components in the compound.

count_phasesbool

Desired setting.

set_phases(phases, energies)

Function to define phases used when computing ground states.

phasesarray-like

Compositions to consider. A list of CompositionEntry’s.

energiesarray-like

Corresponding energies. A list of float values.

class chemml.chem.magpie_python.IonicCompoundProximityAttributeGenerator

Class to generate attributes based on the distance of a composition from compositions that can form charge-neutral ionic compounds.

max_formula_unitint

Maximum number of atoms per formula unit.

This generator computes a single feature: the L_1 distance between the composition of an entry and the nearest ionic compound (determined using IonicCompoundFinder). For a compound where it is not possible to form an ionic compound (e.g., only metallic elements), the entry is assigned an arbitrarily large distance (equal to the number of elements in the alloy). The one adjustable parameter in this calculation is the maximum number of atoms per formula unit used when looking for ionic compounds. For binary compounds, the maximum conceivable number of atoms in a formula unit is for a compound with a 9+ and a 5- species, which has 14 atoms in the formula unit. Consequently, we recommend using 14 or larger for this parameter.
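The 14-atom figure and the L_1 distance can be checked with a short sketch. Both helper functions are hypothetical and written only for illustration; the class itself relies on IonicCompoundFinder to locate the nearest ionic compound.

```python
from math import gcd

def atoms_in_neutral_binary(cation_charge, anion_charge):
    """Smallest number of atoms in a charge-neutral binary formula unit."""
    anion_charge = abs(anion_charge)
    g = gcd(cation_charge, anion_charge)
    n_cations = anion_charge // g   # cations needed to balance the anions
    n_anions = cation_charge // g   # anions needed to balance the cations
    return n_cations + n_anions

def l1_distance(a, b):
    """L_1 distance between two compositions given as {element: fraction}."""
    elements = set(a) | set(b)
    return sum(abs(a.get(e, 0.0) - b.get(e, 0.0)) for e in elements)

print(atoms_in_neutral_binary(9, -5))   # 5 cations + 9 anions
print(l1_distance({"Na": 0.6, "Cl": 0.4}, {"Na": 0.5, "Cl": 0.5}))
```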

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CompositionEntry instances.

set_max_formula_unit(size)

Function to define the maximum number of atoms per formula unit.

sizeint

Desired size.

class chemml.chem.magpie_python.IonicityAttributeGenerator

Class to generate the attributes based on the ionicity of a compound. Creates attributes based on whether it is possible to form a charge-neutral ionic compound, and two features based on a simple measure of “bond ionicity” (see Ref. [1]).

Bond ionicity is defined as:

\[I(x,y) = 1 - \exp\left(-0.25 \left[\chi(x) - \chi(y)\right]^2\right)\]

Maximum ionic character: \(\max I(x,y)\) between any two constituents.

Mean ionic character: \(\sum x_i x_j I(i,j)\), where \(x_i\) is the fraction of element \(i\) and \(\chi(x)\) is the electronegativity of element \(x\).

[1] William D. Callister, Materials Science and Engineering: An Introduction, Hoboken: John Wiley, 2014.
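As a rough illustration of the two ionicity measures, the sketch below evaluates them for NaCl. The function names and the small electronegativity table (Pauling values) are assumptions for this example, and the mean is taken here as a sum over all ordered element pairs, which may differ from the class's own bookkeeping.

```python
import math
from itertools import product

# Pauling electronegativities, included only for illustration.
ELECTRONEGATIVITY = {"Na": 0.93, "Cl": 3.16}

def bond_ionicity(chi_x, chi_y):
    """I(x, y) = 1 - exp(-0.25 * (chi(x) - chi(y))^2)."""
    return 1.0 - math.exp(-0.25 * (chi_x - chi_y) ** 2)

def ionicity_features(fractions, chi=ELECTRONEGATIVITY):
    """Return (maximum ionic character, mean ionic character)."""
    pairs = list(product(fractions, repeat=2))  # all ordered element pairs
    max_ionic = max(bond_ionicity(chi[a], chi[b]) for a, b in pairs)
    mean_ionic = sum(
        fractions[a] * fractions[b] * bond_ionicity(chi[a], chi[b])
        for a, b in pairs
    )
    return max_ionic, mean_ionic

print(ionicity_features({"Na": 0.5, "Cl": 0.5}))
```

Pairs of identical elements contribute zero, so for an equimolar binary the mean ionic character is half of the maximum.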

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CompositionEntry instances.

class chemml.chem.magpie_python.LatticeSimilarityAttributeGenerator

Compute similarity of structure to several simple lattices.

Determined by comparing the shape of each coordination polyhedron in the structure (as determined using a Voronoi tessellation) to those in a reference lattice.

Similarity is computed by summing the difference in the number of faces with each number of edges between a certain Voronoi cell and that of the reference lattice. This difference is then normalized by the number of faces in the reference lattice, and averaged over all atoms to produce a “similarity index”. In this form, structures based on the reference lattice have a match of 0, and the index grows with increasing dissimilarity.

For now we consider the BCC, FCC (which has the same coordination polyhedron shape as HCP), and SC lattices.
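The similarity index described above can be sketched as follows. Each Voronoi cell is represented as a mapping from the number of edges on a face to the number of such faces; this representation and the function names are assumptions for illustration. The reference shapes are the ideal Voronoi cells of each lattice: a truncated octahedron (BCC), a rhombic dodecahedron (FCC), and a cube (SC).

```python
from collections import Counter

# Ideal Voronoi cells: {edges per face: number of faces}.
REFERENCE_CELLS = {
    "BCC": Counter({4: 6, 6: 8}),  # truncated octahedron
    "FCC": Counter({4: 12}),       # rhombic dodecahedron
    "SC": Counter({4: 6}),         # cube
}

def cell_dissimilarity(cell_faces, reference_faces):
    """Summed face-count difference, normalized by the reference face count."""
    edge_counts = set(cell_faces) | set(reference_faces)
    diff = sum(
        abs(cell_faces.get(k, 0) - reference_faces.get(k, 0))
        for k in edge_counts
    )
    return diff / sum(reference_faces.values())

def similarity_index(cells, reference_faces):
    """Average the per-atom dissimilarity over all Voronoi cells."""
    return sum(cell_dissimilarity(c, reference_faces) for c in cells) / len(cells)

# A structure whose two atoms both have BCC-like cells.
cells = [REFERENCE_CELLS["BCC"], REFERENCE_CELLS["BCC"]]
for name, ref in REFERENCE_CELLS.items():
    print(name, similarity_index(cells, ref))
```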

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Crystal structures for which features are to be generated. A list of CrystalStructureEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CrystalStructureEntry instances.

class chemml.chem.magpie_python.LocalPropertyDifferenceAttributeGenerator(shells=None)

Class to compute attributes based on the difference in elemental properties between neighboring atoms.

elemental_propertieslist

Elemental properties to be associated with this class for the generation of features.

shellsarray-like

Shells to consider. A list of int values.

attr_namestr

Property Name.

For an atom, its “local property difference” is computed by:

\[\frac{\sum_n f_n \left|p_{atom} - p_n\right|}{\sum_n f_n}\]

where \(f_n\) is the area of the face associated with neighbor \(n\), \(p_{atom}\) is the elemental property of the central atom, and \(p_n\) is the elemental property of the neighbor atom.

For shells past the 1st nearest neighbor shell, the neighbors are identified by finding all of the unique faces on the outside of the polyhedron formed by the previous neighbor shell. This list of faces will include faces corresponding to all of the atoms in the desired shell, and the total weight for each atom is defined by the total area of the faces corresponding to that atom (there may be more than one).

By default, this class considers only the 1st nearest neighbor shell.

This parameter is computed for all elemental properties stored in Composition Entry ElementalProperties.
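The face-area-weighted difference defined above can be written as a small sketch. The function name and the list-of-tuples input are assumptions for illustration; in the class itself the face areas come from the Voronoi tessellation.

```python
def local_property_difference(p_atom, neighbors):
    """Face-area-weighted mean of |p_atom - p_n|.

    neighbors is a list of (face_area, neighbor_property) tuples.
    """
    total_area = sum(area for area, _ in neighbors)
    weighted = sum(area * abs(p_atom - p_n) for area, p_n in neighbors)
    return weighted / total_area

# Hypothetical central atom with property 1.0 and three neighbors.
neighbors = [(2.0, 3.0), (1.0, 1.0), (1.0, 0.0)]
print(local_property_difference(1.0, neighbors))  # (2*2 + 1*0 + 1*1) / 4 = 1.25
```

An atom surrounded only by atoms of its own type has a local property difference of zero.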

add_elemental_properties(properties)

Function to provide a list of elemental properties to be used to compute features.

propertiesarray-like

Properties to be included. A list of strings containing property names.

add_elemental_property(prop)

Function to add an elemental property to self.elemental_properties in order to be used to compute features.

propstr

Property to be added.

add_shell(shell)

Function to add shell to the list used when computing attributes.

shellint

Index of nearest-neighbor shell.

ValueError

If shell is negative.

add_shells(shells)

Function to add a list of shells to be used when computing attributes.

shellsarray-like

Shells to be considered. A list of int values.

clear_elemental_properties()

Function to clear all the elemental properties.

clear_shells()

Function to clear the list of shells.

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Crystal structures for which features are to be generated. A list of CrystalStructureEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CrystalStructureEntry instances.

get_atom_properties(voro, shell, prop_values)

Function to compute the properties of a certain neighbor cell for each atom, given the Voronoi tessellation and properties of each atom type.

voroVoronoiCellBasedAnalysis

Analysis tool.

shellint

Index of shell.

prop_valuesarray-like

Properties of each atom type. A list or NumPy array of float values.

outputarray-like

Properties of each atom. A list or NumPy array of float values.

remove_elemental_properties(properties)

Function to remove a list of elemental properties from the list of elemental properties.

propertiesarray-like

Properties to be removed. A list of strings containing property names.

remove_elemental_property(property)

Function to remove an elemental property from self.elemental_properties.

propertystr

Property to be removed.

class chemml.chem.magpie_python.LocalPropertyVarianceAttributeGenerator(shells=None)

Class to compute attributes based on the local variance in elemental properties around each atom.

LocalPropertyDifferenceAttributeGenerator : Super class of this class.

get_atom_properties(voro, shell, prop_values)

Function to compute the properties of a certain neighbor cell for each atom, given the Voronoi tessellation and properties of each atom type.

voroVoronoiCellBasedAnalysis

Analysis tool.

shellint

Index of shell.

prop_valuesarray-like

Properties of each atom type. A list or NumPy array of float values.

outputarray-like

Properties of each atom. A list or NumPy array of float values.

class chemml.chem.magpie_python.MeredigAttributeGenerator

Class to generate attributes as described by Meredig et al. [1].

This class is meant to be used in conjunction with ElementFractionAttributeGenerator and ValenceShellAttributeGenerator. To match the attributes from the Meredig et al. [1] paper, use all three attribute generators.

[1] B. Meredig et al., “Combinatorial screening for new materials in unconstrained composition space with machine learning,” Physical Review B, vol. 89, no. 9, Mar. 2014.

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CompositionEntry instances.

class chemml.chem.magpie_python.PRDFAttributeGenerator

Class to compute attributes based on the Pair Radial Distribution Function (PRDF). Based on work by Schütt et al. [1].

cut_off_distancefloat

Cutoff distance for PRDF.

n_pointsint

Number of distance points to evaluate.

element_listarray-like

Elements to use in PRDF. A list of int values.

[1] K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K. R. Müller, and E. K. U. Gross, “How to represent crystal structures for machine learning: Towards fast prediction of electronic properties,” Physical Review B, vol. 89, no. 20, May 2014.

add_element(id=None, name=None)

Function to add element to list used when computing PRDF.

idint

ID of element (Atomic number - 1).

namestr

Name of the element.

ValueError

If both arguments are None. If the entered element name cannot be found in the database.

clear_element_list()

Function to clear out the elements in element list.

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Crystal structures for which features are to be generated. A list of CrystalStructureEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CrystalStructureEntry instances.

set_cut_off_distance(d)

Function to set the maximum distance to consider when computing the PRDF.

dfloat

Desired cutoff distance.

set_elements(entries)

Function to set the elements when computing PRDF.

entriesarray-like

A list of CompositionEntry’s containing each element to be added.

set_n_points(n_p)

Function to set the number of points on each PRDF to store.

n_pint

Number of evaluation points.

class chemml.chem.magpie_python.PackingEfficiencyAttributeGenerator

Class to compute attributes based on packing efficiency. Packing efficiency is determined by finding the largest sphere that would fit inside each Voronoi cell and comparing the volume of that sphere to the volume of the cell.

For now, the only attribute computed by this generator is the maximum packing efficiency for the entire cell. This is computed by summing the total volume of all spheres in all cells, and dividing by the volume of the unit cell.
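A minimal sketch of this calculation is shown below, assuming the radius of the largest sphere inside each Voronoi cell is already known. The function name and inputs are hypothetical; the class obtains these radii from the Voronoi tessellation.

```python
import math

def max_packing_efficiency(inscribed_radii, cell_volume):
    """Total volume of the inscribed spheres divided by the unit cell volume."""
    sphere_volume = sum(4.0 / 3.0 * math.pi * r ** 3 for r in inscribed_radii)
    return sphere_volume / cell_volume

a = 1.0  # lattice parameter of a cubic cell
# Simple cubic: one atom per cell, inscribed sphere radius a / 2.
print(max_packing_efficiency([a / 2], a ** 3))
# FCC: four atoms per cell, radius equal to half the nearest-neighbor distance.
print(max_packing_efficiency([a * math.sqrt(2) / 4] * 4, a ** 3))
```

The two results match the textbook packing fractions of the simple cubic (pi / 6) and FCC (pi / (3 * sqrt(2))) lattices.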

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Crystal structures for which features are to be generated. A list of CrystalStructureEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CrystalStructureEntry instances.

class chemml.chem.magpie_python.StoichiometricAttributeGenerator(use_default_norms=True)

Class to set up and generate descriptors based on the stoichiometry of a given material. Includes features that are only based on fractions of elements, but not what those elements actually are.

p_normslist

Exponents to be used in computing various norms.
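As an illustration of what a p norm of the element fractions looks like, the sketch below uses a hypothetical helper, p_norm, that is not part of the module.

```python
def p_norm(fractions, p):
    """L_p norm of a list of element fractions: (sum x_i^p)^(1/p)."""
    return sum(x ** p for x in fractions) ** (1.0 / p)

fe2o3 = [0.4, 0.6]  # element fractions of Fe2O3
for p in (2, 3, 5):
    print(p, p_norm(fe2o3, p))
```

Because the fractions sum to one, the p = 1 norm is always 1 and carries no information, which is presumably why add_p_norm rejects it.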

add_p_norm(norm)

Function to add a p norm to be computed.

normint

Desired norm.

ValueError

If norm is 1.

add_p_norms(norms)

Function to add a list of p norms to be computed.

normsarray-like

Desired norms. A list of int values.

clear_p_norms()

Function to clear out the list of p norms to be computed.

generate_features(entries)

Function to generate the stoichiometric features.

Computes the norms based on elemental fractions.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CompositionEntry instances.

class chemml.chem.magpie_python.StructuralHeterogeneityAttributeGenerator

Class to compute attributes based on heterogeneity in structure. Measures variance in bond lengths (both for a single atom and between different atoms) and atomic volumes. Also considers the number of unique coordination polyhedron shapes. Bond lengths, atomic volumes, and coordination polyhedra are based on the Voronoi tessellation of the structure.

Current attributes:

1. Mean absolute deviation in average bond length for each atom, normalized by mean for all atoms.
2. Minimum in average bond length, normalized by mean for all atoms.
3. Maximum in average bond length, normalized by mean for all atoms.
4. Mean bond length variance between bonds across all atoms.
5. Mean absolute deviation in bond length variance.
6. Minimum bond length variance.
7. Maximum bond length variance.
8. Mean absolute deviation in atomic volume, normalized by mean atomic volume.

Here, bond length variation for a single atom is defined as:

\[\hat{l} = \langle l_i - \bar{l} \rangle\]

where \(l_i\) is the distance between an atom and one of its neighbors.
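The first three attributes can be sketched from a list of per-atom average bond lengths. This is a simplified illustration with a hypothetical function name; it uses unweighted averages, whereas the class derives bond lengths from the Voronoi tessellation and may weight them differently.

```python
def normalized_bond_length_stats(mean_bond_lengths):
    """Return MAD, minimum, and maximum of the per-atom average bond
    lengths, each normalized by the mean over all atoms."""
    n = len(mean_bond_lengths)
    overall = sum(mean_bond_lengths) / n
    mad = sum(abs(l - overall) for l in mean_bond_lengths) / n
    return {
        "mad": mad / overall,
        "min": min(mean_bond_lengths) / overall,
        "max": max(mean_bond_lengths) / overall,
    }

# Hypothetical structure with two kinds of local environment.
print(normalized_bond_length_stats([2.0, 2.0, 3.0, 3.0]))
```

A structure in which every atom has the same average bond length gives a deviation of 0 and minimum and maximum values of 1.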

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Crystal structures for which features are to be generated. A list of CrystalStructureEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CrystalStructureEntry instances.

class chemml.chem.magpie_python.ValenceShellAttributeGenerator

Class that generates attributes based on the fraction of electrons in the valence shell of constituent elements.

Creates 4 features: [Composition-weighted mean # of electrons in the {s, p, d, f} shells] / [Mean # of valence electrons].

Originally presented by Meredig et al. [1].

[1] B. Meredig et al., “Combinatorial screening for new materials in unconstrained composition space with machine learning,” Physical Review B, vol. 89, no. 9, Mar. 2014.
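The four valence-shell fractions can be illustrated with a short sketch. The valence electron counts below (Fe: 4s2 3d6, O: 2s2 2p4) and the function name are assumptions for this example; the class reads these counts from its elemental property tables.

```python
# Valence electrons per shell, included only for illustration.
VALENCE = {
    "Fe": {"s": 2, "p": 0, "d": 6, "f": 0},
    "O": {"s": 2, "p": 4, "d": 0, "f": 0},
}

def valence_shell_fractions(fractions, valence=VALENCE):
    """Composition-weighted mean electrons per shell, divided by the
    composition-weighted mean number of valence electrons."""
    means = {
        shell: sum(x * valence[el][shell] for el, x in fractions.items())
        for shell in "spdf"
    }
    total = sum(means.values())
    return {shell: m / total for shell, m in means.items()}

# Fe2O3: 40% Fe and 60% O by atom fraction.
print(valence_shell_fractions({"Fe": 0.4, "O": 0.6}))
```

The four values always sum to 1, so they describe how the valence electrons are distributed among the shells rather than how many there are.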

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CompositionEntry instances.

class chemml.chem.magpie_python.YangOmegaAttributeGenerator
Class to compute the attributes \(\Omega\) and \(\delta\) developed by Yang and Zhang [1]. These parameters are based on the liquid formation enthalpy and atomic sizes of elements, respectively, and were originally developed to predict whether a metal alloy will form a solid solution or a bulk metallic glass.

\(\Omega\) is derived from the melting temperature, ideal mixing entropy, and regular solution interaction parameter (\(\Omega_{i,j}\)) predicted by the Miedema model for binary liquids. Specifically, it is computed using the relationship:

\[\Omega = \frac{T_m \Delta S_{mix}}{|\Delta H_{mix}|}\]

where \(T_m\) is the composition-weighted average of the melting temperature, \(\Delta S_{mix}\) is the ideal solution entropy, and \(\Delta H_{mix}\) is the mixing enthalpy. The mixing enthalpy is computed using the Miedema mixing enthalpies tabulated by Takeuchi and Inoue [2], where:

\[\Delta H_{mix} = \sum \Omega_{i,j} c_i c_j\]

and \(\Omega_{i,j} = 4 \Delta H_{mix}\), with \(\Delta H_{mix}\) taken here as the tabulated binary value for the pair \(i, j\) and \(c_i\) the fraction of element \(i\).

\(\delta\) is related to the polydispersity of atomic sizes, and is computed using the relationship:

\[\delta = \left[\sum c_i \left(1 - \frac{r_i}{r_{average}}\right)^2\right]^{0.5}\]

where \(r_i\) is the atomic size. Here, we use the atomic radii compiled by Miracle et al. [3] rather than those compiled by Kittel, as in the original work.

[1] X. Yang and Y. Zhang, “Prediction of high-entropy stabilized solid-solution in multi-component alloys,” Materials Chemistry and Physics, vol. 132, no. 2–3, pp. 233–238, Feb. 2012.

[2] A. Takeuchi and A. Inoue, “Classification of Bulk Metallic Glasses by Atomic Size Difference, Heat of Mixing and Period of Constituent Elements and Its Application to Characterization of the Main Alloying Element,” Materials Transactions, vol. 46, no. 12, pp. 2817–2829, 2005.

[3] D. B. Miracle, D. V. Louzguine-Luzgin, L. V. Louzguina-Luzgina, and A. Inoue, “An assessment of binary metallic glasses: correlations between structure, glass forming ability and stability,” International Materials Reviews, vol. 55, no. 4, pp. 218–256, Jul. 2010.
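The two parameters can be evaluated with a minimal sketch under stated assumptions: the function name and all input values are hypothetical, the ideal mixing entropy is taken as -R * sum(c_i * ln(c_i)), the mixing enthalpy is summed over unique element pairs, and the average radius is the composition-weighted mean. The class itself takes melting temperatures, radii, and Miedema enthalpies from its lookup tables.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)

def yang_omega_delta(fractions, melting_temps, radii, binary_enthalpies):
    """Return (Omega, delta) for an alloy.

    binary_enthalpies maps frozenset({a, b}) to the binary mixing
    enthalpy of that pair in J/mol.
    """
    elements = list(fractions)
    t_m = sum(fractions[e] * melting_temps[e] for e in elements)
    s_mix = -R * sum(c * math.log(c) for c in fractions.values() if c > 0)
    h_mix = 0.0
    for i, a in enumerate(elements):
        for b in elements[i + 1:]:
            omega_ab = 4.0 * binary_enthalpies[frozenset((a, b))]
            h_mix += omega_ab * fractions[a] * fractions[b]
    omega = t_m * s_mix / abs(h_mix)  # undefined when h_mix is zero
    r_avg = sum(fractions[e] * radii[e] for e in elements)
    delta = math.sqrt(sum(
        fractions[e] * (1.0 - radii[e] / r_avg) ** 2 for e in elements
    ))
    return omega, delta

# Hypothetical equimolar binary alloy; none of these values are real data.
omega, delta = yang_omega_delta(
    fractions={"A": 0.5, "B": 0.5},
    melting_temps={"A": 1000.0, "B": 2000.0},
    radii={"A": 1.0, "B": 1.2},
    binary_enthalpies={frozenset(("A", "B")): -5000.0},
)
print(omega, delta)
```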

generate_features(entries)

Function to generate features as mentioned in the class description.

entriesarray-like

Compositions for which features are to be generated. A list of CompositionEntry’s.

featuresDataFrame

Features for the given entries. Pandas data frame containing the names and values of the descriptors.

ValueError

If input is not of type list. If items in the list are not CompositionEntry instances.