The following are the steps for using AutoML for a regression task:
Note: Setting featurization=True represents each molecule using five representation techniques.
Requires an input pandas dataframe consisting of two columns:
SMILES strings
target property values
Molecules are represented as:
Coulomb matrix
RDKit Morgan fingerprints
MACCS keys
RDKit hashed topological torsion fingerprints
RDKit molecular descriptors (all)
Screens through various sklearn regressor models:
Yields the ‘n-best’ models, with optimized hyperparameters.
Returns a dataframe of error metrics, machine learning model, algorithm, tuned hyperparameter values, and featurization technique.
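The ranking idea behind "n-best" can be sketched in plain Python. This is an illustration only, not ChemML's implementation: the helper names `rmse` and `top_n`, the model names, and all numbers below are made up for the example. It assumes candidates are ranked by RMSE, lowest first.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def top_n(results, n_best):
    """Return the n_best result records with the lowest RMSE."""
    return sorted(results, key=lambda r: r["RMSE"])[:n_best]

# Made-up target values and predictions, for illustration only.
y_true = [1200.0, 1250.0, 1300.0]
predictions = {
    "model_a": [1210.0, 1240.0, 1310.0],
    "model_b": [1200.0, 1250.0, 1330.0],
    "model_c": [1150.0, 1300.0, 1350.0],
}

# One record per candidate model, then keep the two best.
results = [{"model": name, "RMSE": rmse(y_true, y_pred)}
           for name, y_pred in predictions.items()]
best = top_n(results, n_best=2)
```

Here `best` holds the two records with the smallest error, in ascending RMSE order.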
Load your data
[1]:
import pandas as pd
import numpy as np
from chemml.chem import Molecule
from chemml.datasets import load_organic_density
[2]:
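# Load the organic density dataset (SMILES, target densities, descriptor subset)
molecules, target, dragon_subset = load_organic_density()
# Combine SMILES and target into the two-column dataframe that AutoML expects
df = pd.concat([molecules, target], axis=1)
# Work with a random subset of 25 molecules
df = df.sample(25)
df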
[2]:
| | smiles | density_Kg/m3 |
|---|---|---|
| 188 | n1ccc(cc1)c1scnc1c1ncccc1c1cccc2c1cccc2 | 1203.16 | 
| 111 | Cc1cc2ccccc2c(c1)c1sccc1c1cscc1 | 1199.41 | 
| 0 | C1CSC(CS1)c1ncc(s1)CC1CCCC1 | 1184.64 | 
| 30 | Cc1c(cc2c(c1c1cnccn1)cccc2)c1cscn1 | 1213.76 | 
| 328 | c1ccc(nc1)c1nnc(s1)Sc1cccs1 | 1374.07 | 
| 270 | Oc1ccc(c2c1cccc2c1ncsc1)c1ccco1 | 1290.56 | 
| 68 | SC1CCC(C1c1cnccn1)C1CSCCS1 | 1238.45 | 
| 293 | OC1NCN(CN1)c1ccc(cc1)c1scnn1 | 1366.07 | 
| 379 | C1CSC(CS1)c1ccccc1c1ccc(cc1)c1ccco1 | 1193.12 | 
| 253 | c1cnc(cn1)c1ccc2c(c1)cccc2c1csc(n1)c1ccco1 | 1258.63 | 
| 107 | n1ccc(cc1)c1cscc1c1cccc2c1cc(cc2)c1cccnc1 | 1186.29 | 
| 115 | c1cnc(cn1)c1csc(n1)C1(CCCC1)c1ccncc1 | 1209.59 | 
| 15 | CC1CCC(C1)C1CCCC1c1cccs1 | 1005.60 | 
| 218 | CC1(CCCC1)c1ccc(s1)c1scnn1 | 1199.65 | 
| 86 | c1cnc(cn1)c1coc(c1)c1nccc(c1)c1cccc2c1cccc2 | 1209.81 | 
| 391 | c1cnc(cn1)c1nc(c(s1)c1nncs1)c1cnccn1 | 1403.86 | 
| 28 | s1cnc(c1)c1cc(cc2c1cccc2)c1csc(c1)c1scnc1 | 1322.37 | 
| 366 | n1ccc(cc1)C1NCNCN1c1cncc(c1)c1cccnc1 | 1235.90 | 
| 309 | C1NC(NC(N1)c1nncs1)c1ccc2c(c1)cc(cc2)c1cccnc1 | 1282.84 | 
| 401 | C1CCC(C1)(c1cncs1)c1cc2ccccc2c(c1)c1cscn1 | 1211.16 | 
| 25 | Cc1c2ccccc2ccc1C1(CSCCS1)C1NCNCN1 | 1232.82 | 
| 216 | Cc1ccc2c(c1C1NCNCN1C1NCNCN1)cccc2 | 1196.18 | 
| 469 | C1SCC(SC1)c1nsnc1C1(CCCC1)c1ccco1 | 1270.09 | 
| 47 | o1ccc(c1)CC1CCCC1c1cccs1 | 1088.77 | 
| 186 | SC1CCC(C1)C1CSCC(S1)c1cccnc1 | 1198.04 | 
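Because `df.sample(25)` draws a different subset on every run, it can help to validate the input and fix the random seed before screening. The sketch below assumes only pandas. The helper `check_input` is hypothetical and not part of ChemML, and the three rows are copied from the table above.

```python
import pandas as pd

def check_input(df, smiles_col, target_col):
    """Return the two required columns, or raise ValueError if they are unusable."""
    missing = [c for c in (smiles_col, target_col) if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if df[smiles_col].isna().any() or df[target_col].isna().any():
        raise ValueError("SMILES and target columns must not contain missing values")
    if not pd.api.types.is_numeric_dtype(df[target_col]):
        raise ValueError("target column must be numeric")
    return df[[smiles_col, target_col]]

# Three rows taken from the sampled table shown above.
toy = pd.DataFrame({
    "smiles": [
        "C1CSC(CS1)c1ncc(s1)CC1CCCC1",
        "CC1CCC(C1)C1CCCC1c1cccs1",
        "o1ccc(c1)CC1CCCC1c1cccs1",
    ],
    "density_Kg/m3": [1184.64, 1005.60, 1088.77],
})

checked = check_input(toy, "smiles", "density_Kg/m3")

# A fixed random_state makes the subset identical across runs.
sample_a = checked.sample(2, random_state=0)
sample_b = checked.sample(2, random_state=0)
```

With the same `random_state`, `sample_a` and `sample_b` contain the same rows, so a failed run can be repeated on exactly the same molecules.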
Run AutoML for a regression task
[3]:
from chemml.autoML import ModelScreener
MS = ModelScreener(df, target="density_Kg/m3", featurization=True, smiles="smiles",
                   screener_type="regressor", output_file="testing.txt")
scores = MS.screen_models(n_best=4)
featurizing molecules in batches of 2 ...
25/25 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15s 614ms/step
Merging batch features ...    [DONE]
[09:29:45] DEPRECATION WARNING: please use MorganGenerator
[09:29:45] DEPRECATION WARNING: please use TopologicalTorsionGenerator
split done!
--- 1460.8564009666443 seconds ---
split done!
--- 1754.8138766288757 seconds ---
split done!
--- 556.0454123020172 seconds ---
split done!
--- 2021.2931699752808 seconds ---
split done!
--- 450.1350498199463 seconds ---
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 4
      1 from chemml.autoML import ModelScreener
      2 MS = ModelScreener(df, target="density_Kg/m3", featurization=True, smiles="smiles",
      3                    screener_type="regressor", output_file="testing.txt")
----> 4 scores = MS.screen_models(n_best=4)
File c:\users\nitin\documents\ub\hachmann_group\chemml_dev_nitin\chemml\chemml\autoML\model_screener.py:459, in ModelScreener.screen_models(self, n_best)
    456     print("\n--- %s seconds ---" % (time.time() - start_time))
    458 # aggregate scores list
--> 459 best_models = self.aggregate_scores(scores_list=scores_list_final, n_best=n_best)
    461 return best_models
File c:\users\nitin\documents\ub\hachmann_group\chemml_dev_nitin\chemml\chemml\autoML\model_screener.py:374, in ModelScreener.aggregate_scores(self, scores_list, n_best)
    350 def aggregate_scores(self,  scores_list, n_best):
    351     """ 
    352     This function aggregates a list of scores, combines them into a pandas dataframe, sorts them by
    353     RMSE in ascending order, and returns the top n_best scores.
   (...)
    370         the top n_best scores from the combined scores list, sorted by RMSE in ascending order.
    371     """
--> 374     scores_combined = pd.concat(scores_list, ignore_index=True)
    376     if self.screener_type == "regressor":
    377         self.scores_combined = scores_combined.sort_values(by='RMSE', ascending=True)
File c:\Users\nitin\anaconda3\envs\chemml_dev_env\Lib\site-packages\pandas\core\reshape\concat.py:382, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    379 elif copy and using_copy_on_write():
    380     copy = False
--> 382 op = _Concatenator(
    383     objs,
    384     axis=axis,
    385     ignore_index=ignore_index,
    386     join=join,
    387     keys=keys,
    388     levels=levels,
    389     names=names,
    390     verify_integrity=verify_integrity,
    391     copy=copy,
    392     sort=sort,
    393 )
    395 return op.get_result()
File c:\Users\nitin\anaconda3\envs\chemml_dev_env\Lib\site-packages\pandas\core\reshape\concat.py:445, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    442 self.verify_integrity = verify_integrity
    443 self.copy = copy
--> 445 objs, keys = self._clean_keys_and_objs(objs, keys)
    447 # figure out what our result ndim is going to be
    448 ndims = self._get_ndims(objs)
File c:\Users\nitin\anaconda3\envs\chemml_dev_env\Lib\site-packages\pandas\core\reshape\concat.py:507, in _Concatenator._clean_keys_and_objs(self, objs, keys)
    504     objs_list = list(objs)
    506 if len(objs_list) == 0:
--> 507     raise ValueError("No objects to concatenate")
    509 if keys is None:
    510     objs_list = list(com.not_none(*objs_list))
ValueError: No objects to concatenate
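The traceback shows `pd.concat` receiving an empty `scores_list`, which means no score table reached the aggregation step. The sketch below is not ChemML's code. It is a hypothetical, standalone version of the aggregation described in the docstring (concatenate, sort by RMSE ascending, keep the top `n_best`) with an explicit guard for the empty case. The function name `aggregate_scores_safe`, the `Model` column, and the numbers are assumptions for illustration.

```python
import pandas as pd

def aggregate_scores_safe(scores_list, n_best):
    """Combine score frames and keep the n_best rows with the lowest RMSE."""
    # Drop missing or empty frames so pd.concat never sees an empty list silently.
    frames = [s for s in scores_list if s is not None and not s.empty]
    if not frames:
        raise RuntimeError(
            "screening produced no score tables; inspect the per-model output "
            "before aggregating"
        )
    combined = pd.concat(frames, ignore_index=True)
    ranked = combined.sort_values(by="RMSE", ascending=True)
    return ranked.head(n_best).reset_index(drop=True)

# Made-up score tables, one per featurization, for illustration only.
scores_list = [
    pd.DataFrame({"Model": ["m1", "m2"], "RMSE": [30.0, 12.5]}),
    pd.DataFrame({"Model": ["m3"], "RMSE": [20.0]}),
]
best = aggregate_scores_safe(scores_list, n_best=2)
```

The guard turns the generic pandas error into a message that points at the real problem: the screening step returned nothing to rank.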
[4]:
scores
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 scores
NameError: name 'scores' is not defined
Save scores to CSV
[5]:
scores.to_csv("autoML_test.csv",index=False)
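This cell only works if the screening cell completed and defined `scores`. A minimal sketch of a guarded save follows, assuming only pandas. `save_scores` is a hypothetical helper, and the one-row table is made up so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

def save_scores(scores, path):
    """Write scores to CSV and return True; return False if there is nothing to save."""
    if scores is None or scores.empty:
        return False
    scores.to_csv(path, index=False)
    return True

# Made-up score table, for illustration only.
scores_demo = pd.DataFrame({"Model": ["m1"], "RMSE": [12.5]})

# Write to a temporary directory and read the file back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "autoML_test.csv")
    saved = save_scores(scores_demo, out_path)
    reloaded = pd.read_csv(out_path)
```

Passing `index=False`, as in the cell above, keeps the row index out of the file so the reloaded table has the same columns as the original.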