The following steps describe how AutoML handles a regression task.
Note: Setting featurization=True represents each molecule using five featurization techniques.
- Requires an input pandas dataframe consisting of two columns:
  - SMILES strings
  - target property values
- Represents molecules as:
  - Coulomb matrix
  - RDKit Morgan fingerprints
  - MACCS keys
  - RDKit hashed topological torsion fingerprints
  - RDKit molecular descriptors (all)
- Screens various scikit-learn regressor models:
  - Yields the 'n-best' models, with optimized hyperparameters.
- Returns a dataframe of error metrics, machine learning model, algorithm, tuned hyperparameter values, and featurization technique.
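The two-column input requirement can be checked before any featurization starts. The sketch below is only an illustration: check_input_frame is a hypothetical helper, not part of ChemML, and the specific checks (string SMILES, numeric target, no missing targets) are assumptions about what a sensible input looks like. The two example rows are taken from the sampled table in this notebook.

```python
import pandas as pd


def check_input_frame(df, smiles_col, target_col):
    """Hypothetical pre-flight check (not part of ChemML)."""
    # Both named columns must be present.
    missing = [c for c in (smiles_col, target_col) if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    # Assumption: every SMILES entry should be a Python string.
    if not df[smiles_col].map(lambda s: isinstance(s, str)).all():
        raise TypeError("every SMILES entry must be a string")
    # Assumption: a regression target should be numeric and complete.
    if not pd.api.types.is_numeric_dtype(df[target_col]):
        raise TypeError("target column must be numeric")
    if df[target_col].isna().any():
        raise ValueError("target column contains missing values")
    return True


example = pd.DataFrame({
    "smiles": ["CC1CCC(C1)C1CCCC1c1cccs1", "o1ccc(c1)CC1CCCC1c1cccs1"],
    "density_Kg/m3": [1005.60, 1088.77],
})
ok = check_input_frame(example, smiles_col="smiles", target_col="density_Kg/m3")
```

Running a check like this first is cheap compared with featurizing every molecule and only then discovering a malformed column.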
Load your data
[1]:
import pandas as pd
import numpy as np
from chemml.chem import Molecule
from chemml.datasets import load_organic_density
[2]:
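# Load the organic density dataset (dragon_subset is not used below)
molecules, target, dragon_subset = load_organic_density()
# Combine SMILES strings and target values into one two-column dataframe
df = pd.concat([molecules, target], axis=1)
# Keep a random sample of 25 molecules
df = df.sample(25)
df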
[2]:
| | smiles | density_Kg/m3 |
|---|---|---|
188 | n1ccc(cc1)c1scnc1c1ncccc1c1cccc2c1cccc2 | 1203.16 |
111 | Cc1cc2ccccc2c(c1)c1sccc1c1cscc1 | 1199.41 |
0 | C1CSC(CS1)c1ncc(s1)CC1CCCC1 | 1184.64 |
30 | Cc1c(cc2c(c1c1cnccn1)cccc2)c1cscn1 | 1213.76 |
328 | c1ccc(nc1)c1nnc(s1)Sc1cccs1 | 1374.07 |
270 | Oc1ccc(c2c1cccc2c1ncsc1)c1ccco1 | 1290.56 |
68 | SC1CCC(C1c1cnccn1)C1CSCCS1 | 1238.45 |
293 | OC1NCN(CN1)c1ccc(cc1)c1scnn1 | 1366.07 |
379 | C1CSC(CS1)c1ccccc1c1ccc(cc1)c1ccco1 | 1193.12 |
253 | c1cnc(cn1)c1ccc2c(c1)cccc2c1csc(n1)c1ccco1 | 1258.63 |
107 | n1ccc(cc1)c1cscc1c1cccc2c1cc(cc2)c1cccnc1 | 1186.29 |
115 | c1cnc(cn1)c1csc(n1)C1(CCCC1)c1ccncc1 | 1209.59 |
15 | CC1CCC(C1)C1CCCC1c1cccs1 | 1005.60 |
218 | CC1(CCCC1)c1ccc(s1)c1scnn1 | 1199.65 |
86 | c1cnc(cn1)c1coc(c1)c1nccc(c1)c1cccc2c1cccc2 | 1209.81 |
391 | c1cnc(cn1)c1nc(c(s1)c1nncs1)c1cnccn1 | 1403.86 |
28 | s1cnc(c1)c1cc(cc2c1cccc2)c1csc(c1)c1scnc1 | 1322.37 |
366 | n1ccc(cc1)C1NCNCN1c1cncc(c1)c1cccnc1 | 1235.90 |
309 | C1NC(NC(N1)c1nncs1)c1ccc2c(c1)cc(cc2)c1cccnc1 | 1282.84 |
401 | C1CCC(C1)(c1cncs1)c1cc2ccccc2c(c1)c1cscn1 | 1211.16 |
25 | Cc1c2ccccc2ccc1C1(CSCCS1)C1NCNCN1 | 1232.82 |
216 | Cc1ccc2c(c1C1NCNCN1C1NCNCN1)cccc2 | 1196.18 |
469 | C1SCC(SC1)c1nsnc1C1(CCCC1)c1ccco1 | 1270.09 |
47 | o1ccc(c1)CC1CCCC1c1cccs1 | 1088.77 |
186 | SC1CCC(C1)C1CSCC(S1)c1cccnc1 | 1198.04 |
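Because df.sample(25) draws a new random subset on every run, the rows shown above will differ between executions. A minimal sketch of a reproducible draw, assuming a fixed seed is acceptable for the tutorial: the toy dataframe, its placeholder values, and the seed 42 are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the density dataframe; all values are placeholders.
toy = pd.DataFrame({
    "smiles": ["C", "CC", "CCC", "CCCC", "CCCCC", "CCCCCC"],
    "density_Kg/m3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Passing the same random_state selects the same rows on every call.
first = toy.sample(3, random_state=42)
second = toy.sample(3, random_state=42)
```

With a fixed random_state, the sampled table and any downstream scores can be compared across runs.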
Run AutoML for a regression task
[3]:
from chemml.autoML import ModelScreener
MS = ModelScreener(df, target="density_Kg/m3", featurization=True, smiles="smiles",
                   screener_type="regressor", output_file="testing.txt")
scores = MS.screen_models(n_best=4)
featurizing molecules in batches of 2 ...
25/25 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15s 614ms/step
Merging batch features ... [DONE]
[09:29:45] DEPRECATION WARNING: please use MorganGenerator
[09:29:45] DEPRECATION WARNING: please use TopologicalTorsionGenerator
split done!
--- 1460.8564009666443 seconds ---
split done!
--- 1754.8138766288757 seconds ---
split done!
--- 556.0454123020172 seconds ---
split done!
--- 2021.2931699752808 seconds ---
split done!
--- 450.1350498199463 seconds ---
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[3], line 4
1 from chemml.autoML import ModelScreener
2 MS = ModelScreener(df, target="density_Kg/m3", featurization=True, smiles="smiles",
3 screener_type="regressor", output_file="testing.txt")
----> 4 scores = MS.screen_models(n_best=4)
File c:\users\nitin\documents\ub\hachmann_group\chemml_dev_nitin\chemml\chemml\autoML\model_screener.py:459, in ModelScreener.screen_models(self, n_best)
456 print("\n--- %s seconds ---" % (time.time() - start_time))
458 # aggregate scores list
--> 459 best_models = self.aggregate_scores(scores_list=scores_list_final, n_best=n_best)
461 return best_models
File c:\users\nitin\documents\ub\hachmann_group\chemml_dev_nitin\chemml\chemml\autoML\model_screener.py:374, in ModelScreener.aggregate_scores(self, scores_list, n_best)
350 def aggregate_scores(self, scores_list, n_best):
351 """
352 This function aggregates a list of scores, combines them into a pandas dataframe, sorts them by
353 RMSE in ascending order, and returns the top n_best scores.
(...)
370 the top n_best scores from the combined scores list, sorted by RMSE in ascending order.
371 """
--> 374 scores_combined = pd.concat(scores_list, ignore_index=True)
376 if self.screener_type == "regressor":
377 self.scores_combined = scores_combined.sort_values(by='RMSE', ascending=True)
File c:\Users\nitin\anaconda3\envs\chemml_dev_env\Lib\site-packages\pandas\core\reshape\concat.py:382, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
379 elif copy and using_copy_on_write():
380 copy = False
--> 382 op = _Concatenator(
383 objs,
384 axis=axis,
385 ignore_index=ignore_index,
386 join=join,
387 keys=keys,
388 levels=levels,
389 names=names,
390 verify_integrity=verify_integrity,
391 copy=copy,
392 sort=sort,
393 )
395 return op.get_result()
File c:\Users\nitin\anaconda3\envs\chemml_dev_env\Lib\site-packages\pandas\core\reshape\concat.py:445, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
442 self.verify_integrity = verify_integrity
443 self.copy = copy
--> 445 objs, keys = self._clean_keys_and_objs(objs, keys)
447 # figure out what our result ndim is going to be
448 ndims = self._get_ndims(objs)
File c:\Users\nitin\anaconda3\envs\chemml_dev_env\Lib\site-packages\pandas\core\reshape\concat.py:507, in _Concatenator._clean_keys_and_objs(self, objs, keys)
504 objs_list = list(objs)
506 if len(objs_list) == 0:
--> 507 raise ValueError("No objects to concatenate")
509 if keys is None:
510 objs_list = list(com.not_none(*objs_list))
ValueError: No objects to concatenate
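The traceback ends inside pd.concat, which raises "No objects to concatenate" when it receives an empty list. So scores_list_final was empty when aggregate_scores was called; the output above does not show why. The sketch below illustrates a guarded aggregation that follows the behaviour described in the docstring (combine, sort by RMSE ascending, keep the top n_best). aggregate_scores_safely is a hypothetical helper, not ChemML code, and the "model" column and all values are made up.

```python
import pandas as pd


def aggregate_scores_safely(scores_list, n_best):
    """Hypothetical helper (not part of ChemML): combine per-run score
    frames, sort by RMSE ascending, and keep the top n_best rows."""
    # Drop None entries and empty frames before concatenating.
    frames = [s for s in scores_list if s is not None and not s.empty]
    if not frames:
        # pd.concat([]) would raise "ValueError: No objects to concatenate".
        raise ValueError("No screening run produced scores; check the earlier output.")
    combined = pd.concat(frames, ignore_index=True)
    ranked = combined.sort_values(by="RMSE", ascending=True)
    return ranked.head(n_best).reset_index(drop=True)


# Toy score frames; the "model" column and all values are illustrative only.
run_a = pd.DataFrame({"model": ["A1", "A2"], "RMSE": [12.0, 8.5]})
run_b = pd.DataFrame({"model": ["B1"], "RMSE": [10.1]})
best = aggregate_scores_safely([run_a, run_b], n_best=2)
```

Failing early with a descriptive message points back to the screening runs rather than to pandas internals.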
[4]:
scores
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[4], line 1
----> 1 scores
NameError: name 'scores' is not defined
Save scores to CSV
[5]:
scores.to_csv("autoML_test.csv",index=False)
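Note that this cell only works once the screening cell has completed and scores is defined. As a small illustration of what index=False does, the sketch below writes a toy score frame to an in-memory buffer and reads it back. The frame, its "model" column, and its values are made up; only the RMSE column name comes from the traceback above.

```python
import io

import pandas as pd

# Toy score frame; column "model" and all values are illustrative only.
scores_demo = pd.DataFrame({"model": ["A2", "B1"], "RMSE": [8.5, 10.1]})

# Write without the row index, using an in-memory buffer instead of a file.
buffer = io.StringIO()
scores_demo.to_csv(buffer, index=False)

# Read the CSV text back to confirm no extra index column was written.
buffer.seek(0)
restored = pd.read_csv(buffer)
```

Without index=False, the row index would be written as an additional unnamed first column.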