Datasets module

The chemml.datasets module includes the following dataset loaders:
chemml.datasets.load_cep_homo()

Load and return a small sample of HOMO energies of organic photovoltaic candidates from the CEP database (regression). The Clean Energy Project (CEP) database is available at https://cepdb.molecularspace.org. HOMO (highest occupied molecular orbital) energies are given in electron volts (eV). The photovoltaic candidates are provided as SMILES strings.

Rows: 500
Columns: 2
Headers: smiles, homo_eV
Molecule representation: SMILES
Features: 0
Returns: 2 dataframes

smiles : pandas dataframe
    The SMILES representation of the molecules, shape: (500, 1)

homo : pandas dataframe
    The HOMO energies of the molecules (eV), shape: (500, 1)

>>> from chemml.datasets import load_cep_homo
>>> smi, homo  = load_cep_homo()
>>> print(list(smi.columns))
['smiles']
>>> print(homo.shape)
(500, 1)
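
Because the SMILES strings and the HOMO energies come back as two separate dataframes, it can be convenient to pair them by row. The following is a minimal sketch and not part of ChemML: build_lookup and cep_homo_lookup are hypothetical helper names, and the code assumes that row i of the first dataframe describes the same molecule as row i of the second.

```python
def build_lookup(smiles, energies):
    """Map each SMILES string to its HOMO energy (eV), pairing by position.

    If the same SMILES string appears more than once, the last value wins.
    """
    if len(smiles) != len(energies):
        raise ValueError("smiles and energies must have the same length")
    return dict(zip(smiles, energies))


def cep_homo_lookup():
    """Load the CEP sample and return a {smiles: homo_eV} dictionary.

    Requires chemml; the import is deferred so that build_lookup can be
    used (and tested) without it.
    """
    from chemml.datasets import load_cep_homo

    smi, homo = load_cep_homo()
    # Each dataframe has a single column, so take column 0 of each.
    return build_lookup(smi.iloc[:, 0].tolist(), homo.iloc[:, 0].tolist())
```

With chemml installed, cep_homo_lookup() returns a plain dictionary that can be queried by SMILES string. Selecting columns by position avoids relying on the exact column names.
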
chemml.datasets.load_comp_energy()

Load and return composition entries and formation energies (eV). The data come from Magpie: https://bitbucket.org/wolverton/magpie

Rows: 630
Header: formation_energy
Molecule representation: composition
Features: 0
Returns: 1 dataframe and 1 list

entries : list
    The list of composition entries from the CompositionEntry class.

energy : pandas dataframe
    The formation energy for each composition.

>>> from chemml.datasets import load_comp_energy
>>> entries, df = load_comp_energy()
>>> print(df.shape)
(630, 1)
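
Since the entries are returned as a list and the energies as a dataframe, ranking compositions by formation energy means pairing the two by position. The sketch below is illustrative only: lowest_energy and lowest_energy_compositions are hypothetical helpers, and the code assumes that entries[i] corresponds to row i of the dataframe.

```python
import heapq


def lowest_energy(entries, energies, k=5):
    """Return the k (energy, entry) pairs with the lowest formation energy."""
    if len(entries) != len(energies):
        raise ValueError("entries and energies must have the same length")
    # Compare on the energy only, so the entries never need to be orderable.
    return heapq.nsmallest(k, zip(energies, entries), key=lambda pair: pair[0])


def lowest_energy_compositions(k=5):
    """Load the Magpie sample and rank its compositions by formation energy.

    Requires chemml; the import is deferred so that lowest_energy can be
    used without it.
    """
    from chemml.datasets import load_comp_energy

    entries, df = load_comp_energy()
    # The dataframe has a single column holding the formation energies.
    return lowest_energy(entries, df.iloc[:, 0].tolist(), k)
```
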
chemml.datasets.load_crystal_structures()

Load and return crystal structure entries. The data come from Magpie: https://bitbucket.org/wolverton/magpie

Length: 18
Header: formation_energy
Molecule representation: composition
Features: 0
Returns: 1 list

entries : list
    The list of crystal structure entries from the CrystalStructureEntry class.

>>> from chemml.datasets import load_crystal_structures
>>> entries = load_crystal_structures()
>>> print(len(entries))
18
chemml.datasets.load_organic_density()

Load and return 500 small organic molecules with their density and molecular descriptors.

Rows: 500
Columns: 202
Last two headers: smiles, density_Kg/m3
Molecule representation: SMILES
Features: 200
Returns: 3 dataframes

smiles : pandas dataframe
    The SMILES representation of the molecules, shape: (500, 1)

density : pandas dataframe
    The density of the molecules (Kg/m3), shape: (500, 1)

features : pandas dataframe
    The molecular descriptors of the molecules, shape: (500, 200)

>>> from chemml.datasets import load_organic_density
>>> smi, density, features = load_organic_density()
>>> print(list(smi.columns))
['smiles']
>>> print(features.shape)
(500, 200)
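
A common next step with this dataset is to hold out part of the rows for testing a regression model. The following is a minimal sketch under stated assumptions, not a ChemML utility: split_indices and split_organic_density are hypothetical helpers, the 80/20 split and the seed are arbitrary choices, and the rows of the three dataframes are assumed to be aligned by position.

```python
import random


def split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row positions and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the split reproducible for a seed.
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


def split_organic_density(test_fraction=0.2, seed=0):
    """Load the density dataset and split features and target consistently.

    Requires chemml; the import is deferred so that split_indices can be
    used without it.
    """
    from chemml.datasets import load_organic_density

    smi, density, features = load_organic_density()
    train, test = split_indices(len(features), test_fraction, seed)
    # .iloc selects rows by position, keeping features and density aligned.
    return (features.iloc[train], features.iloc[test],
            density.iloc[train], density.iloc[test])
```

Using one list of row positions for both dataframes ensures that every descriptor row stays matched to its density value.
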
chemml.datasets.load_xyz_polarizability()

Load and return xyz files and polarizability (Bohr^3). The xyz coordinates of small organic molecules are optimized at the BP86/def2svp level of theory. The polarizabilities of the molecules are calculated at the same level of theory.

Rows: 50
Columns: 1
Header: polarizability
Molecule representation: xyz
Features: 0
Returns: 1 list and 1 dataframe

molecules : list
    The list of molecule objects with xyz coordinates.

pol : pandas dataframe
    The polarizability of each molecule, as a single dataframe column.

>>> from chemml.datasets import load_xyz_polarizability
>>> molecules, polarizabilities = load_xyz_polarizability()
>>> print(len(molecules))
50
>>> print(polarizabilities.shape)
(50, 1)
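
The polarizabilities are reported in Bohr^3, while many references quote them in cubic angstroms. The sketch below shows one way to convert them. It is not part of ChemML: bohr3_to_angstrom3 and polarizabilities_in_angstrom3 are hypothetical helpers, and the conversion uses the CODATA 2018 Bohr radius.

```python
BOHR_IN_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius, in angstrom


def bohr3_to_angstrom3(value):
    """Convert a volume-like quantity from Bohr^3 to cubic angstroms."""
    return value * BOHR_IN_ANGSTROM ** 3


def polarizabilities_in_angstrom3():
    """Load the dataset and return the polarizabilities as a list in A^3.

    Requires chemml; the import is deferred so that bohr3_to_angstrom3 can
    be used without it.
    """
    from chemml.datasets import load_xyz_polarizability

    molecules, pol = load_xyz_polarizability()
    # The dataframe has a single column holding the polarizabilities.
    return [bohr3_to_angstrom3(v) for v in pol.iloc[:, 0].tolist()]
```
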