Datasets module
- The chemml.datasets module includes the following loader functions, each documented below:
load_cep_homo()
load_organic_density()
load_xyz_polarizability()
load_comp_energy()
load_crystal_structures()
- chemml.datasets.load_cep_homo()
Load and return a small sample of HOMO energies of organic photovoltaic candidates from the CEP database (regression). The Clean Energy Project (CEP) database is available at: https://cepdb.molecularspace.org The HOMO (highest occupied molecular orbital) energies are given in electronvolts (eV). The photovoltaic candidates are provided as SMILES strings.
rows: 500
Columns: 2
headers: smiles,homo_eV
molecules rep.: SMILES
Features: 0
Returns: 2 dataframes
- smiles : pandas dataframe
  The SMILES representation of molecules, shape: (500, 1)
- homo : pandas dataframe
  The HOMO energies of the molecules (eV), shape: (500, 1)
>>> from chemml.datasets import load_cep_homo
>>> smi, homo = load_cep_homo()
>>> print(list(smi.columns))
['smiles']
>>> print(homo.shape)
(500, 1)
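Because the SMILES strings and the HOMO energies come back as two separate dataframes with matching rows, a regression workflow needs to split both with the same row positions. The following is a minimal sketch of one way to do that using only the standard library. The helper `split_indices` and its parameters are hypothetical names introduced here and are not part of chemml.

```python
import random


def split_indices(n_rows, test_fraction=0.2, seed=0):
    """Return (train, test) lists of row positions, shuffled reproducibly.

    Hypothetical helper for illustration; it is not a chemml function.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    positions = list(range(n_rows))
    # A private Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(positions)
    n_test = int(round(n_rows * test_fraction))
    return sorted(positions[n_test:]), sorted(positions[:n_test])


# 500 matches the documented number of rows in the CEP sample.
train_idx, test_idx = split_indices(500, test_fraction=0.2)
print(len(train_idx), len(test_idx))

# With ChemML installed, the same positions can index both dataframes:
#   smi, homo = load_cep_homo()
#   smi_train, homo_train = smi.iloc[train_idx], homo.iloc[train_idx]
#   smi_test, homo_test = smi.iloc[test_idx], homo.iloc[test_idx]
```

Indexing both dataframes with one shared list of positions keeps each SMILES string paired with its own HOMO energy.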
- chemml.datasets.load_comp_energy()
Load and return composition entries and formation energies (eV). From Magpie https://bitbucket.org/wolverton/magpie
rows: 630
header: formation_energy
molecules rep.: composition
Features: 0
Returns: 1 dataframe and 1 list
- entries : list
  The list of composition entries from CompositionEntry class.
- energy : pandas dataframe
  The formation energy for each composition.
>>> from chemml.datasets import load_comp_energy
>>> entries, df = load_comp_energy()
>>> print(df.shape)
(630, 1)
- chemml.datasets.load_crystal_structures()
Load and return crystal structure entries. From Magpie https://bitbucket.org/wolverton/magpie
length: 18
header: formation_energy
molecules rep.: composition
Features: 0
Returns: 1 list
- entries : list
  The list of crystal structure entries from CrystalStructureEntry class.
>>> from chemml.datasets import load_crystal_structures
>>> entries = load_crystal_structures()
>>> print(len(entries))
18
- chemml.datasets.load_organic_density()
Load and return 500 small organic molecules with their density and molecular descriptors.
rows: 500
Columns: 202
last two headers: smiles,density_Kg/m3
molecules rep.: SMILES
Features: 200
Returns: 3 dataframes
- smiles : pandas dataframe
  The SMILES representation of molecules, shape: (500, 1)
- density : pandas dataframe
  The density of molecules (kg/m3), shape: (500, 1)
- features : pandas dataframe
  The molecular descriptors of molecules, shape: (500, 200)
>>> from chemml.datasets import load_organic_density
>>> smi, density, features = load_organic_density()
>>> print(list(smi.columns))
['smiles']
>>> print(features.shape)
(500, 200)
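The densities are reported in kg/m3, whereas densities of organic liquids are often quoted in g/cm3. As a small sketch, the conversion is a division by 1000. The function name `kg_m3_to_g_cm3` is hypothetical, and the sample values are stand-ins rather than entries from the dataset.

```python
def kg_m3_to_g_cm3(density_kg_m3):
    """Convert a density from kg/m^3 to g/cm^3 (1 g/cm^3 = 1000 kg/m^3)."""
    return density_kg_m3 / 1000.0


# Stand-in values in kg/m^3, chosen for illustration only.
sample = [789.0, 1000.0, 1260.0]
converted = [kg_m3_to_g_cm3(value) for value in sample]
print(converted)

# With ChemML installed, and assuming the density dataframe's single column
# carries the documented header, the same conversion would read:
#   smi, density, features = load_organic_density()
#   density_g_cm3 = density["density_Kg/m3"] / 1000.0
```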
- chemml.datasets.load_xyz_polarizability()
Load and return xyz files and polarizability (Bohr^3). The xyz coordinates of small organic molecules are optimized at the BP86/def2svp level of theory. The polarizabilities of the molecules are calculated at the same level of theory.
rows: 50
Columns: 1
header: polarizability
molecules rep.: xyz
Features: 0
Returns: 1 list and 1 dataframe
- molecules : list
  The list of molecule objects with xyz coordinates.
- pol : pandas dataframe
  The polarizability of each molecule as a column of the dataframe.
>>> from chemml.datasets import load_xyz_polarizability
>>> molecules, polarizabilities = load_xyz_polarizability()
>>> print(len(molecules))
50
>>> print(polarizabilities.shape)
(50, 1)
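The polarizabilities are given in Bohr^3 (atomic units of volume), while many references report them in cubic angstroms. The sketch below shows the conversion, assuming the CODATA 2018 value of the Bohr radius. The function name `bohr3_to_angstrom3` is hypothetical and not part of chemml.

```python
# Bohr radius in angstrom (CODATA 2018 recommended value).
BOHR_IN_ANGSTROM = 0.529177210903


def bohr3_to_angstrom3(alpha_bohr3):
    """Convert a polarizability volume from Bohr^3 to angstrom^3."""
    return alpha_bohr3 * BOHR_IN_ANGSTROM ** 3


# One Bohr^3 corresponds to roughly 0.148 cubic angstrom.
print(round(bohr3_to_angstrom3(1.0), 4))

# With ChemML installed, and assuming the dataframe column carries the
# documented header, the same conversion would read:
#   molecules, pol = load_xyz_polarizability()
#   pol_angstrom3 = pol["polarizability"] * BOHR_IN_ANGSTROM ** 3
```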