Generate Morgan fingerprints from SMILES codes
This tutorial shows how to generate Morgan fingerprints from SMILES codes stored in an Excel file, using the ChemML GUI. The Excel sheet consists of a single column of SMILES codes for a few molecules. The workflow reads the SMILES codes, generates their Morgan fingerprints with the RDKit library, and saves them.
[1]:
from chemml.wrapper.notebook import ChemMLNotebook
ui = ChemMLNotebook()
The computation graph will be displayed here:
Please ensure that you are supplying an excel file from your PC.
Template includes a random file which is not a part of the ChemML library.
This template will not work if a custom file is not supplied.
The ChemML Wrapper's config file has been successfully saved ...
config file path: chemML_config.txt
current directory: /mnt/c/Aatish/UB/Mr. Hachmann/master_chemml_wrapper_v2/chemml/docs/ipython_notebooks
what's next? run the ChemML Wrapper using the config file with the following codes:
>>> from chemml.wrapper.engine import run
>>> run(INPUT_FILE = 'path_to_the_config_file', OUTPUT_DIRECTORY = 'CMLWrapper_out')
... you can also create a python script of the above codes and run it on any cluster that ChemML is installed.
The workflow graph shows every intermediate step: the blocks used, the data saved, and the inputs and outputs of each block. Once the workflow is finalized, we save the input script (the config file) under our desired file name in .txt format.
Note: In this case, we specify our output directory as ‘read_excel’.
[2]:
from chemml.wrapper.engine import run
run(INPUT_FILE = '/mnt/c/Aatish/UB/Mr. Hachmann/master_chemml_wrapper_v2/chemml/docs/ipython_notebooks/chemML_config.txt', OUTPUT_DIRECTORY = 'read_excel')
=================================================
=================================================
Fri Jun 4 14:53:11 2021
parsing the input file: /mnt/c/Aatish/UB/Mr. Hachmann/master_chemml_wrapper_v2/chemml/docs/ipython_notebooks/chemML_config.txt ...
=================================================
======= block#1: (pandas, read_excel)
| run ...
| ... done!
| execution time: 6.30s (0h 0m 6.30s)
=======
======= block#2: (chemml, SaveFile)
| run ...
| ... done!
| execution time: 0.04s (0h 0m 0.04s)
=======
======= block#3: (chemml, RDKitFingerprint)
| run ...
| ... done!
| execution time: 0.02s (0h 0m 0.02s)
=======
======= block#4: (chemml, SaveFile)
| run ...
| ... done!
| execution time: 0.01s (0h 0m 0.01s)
=======
Total execution time: 6.38s (0h 0m 6.38s)
2021-06-04 14:53:18
[3]:
import pandas as pd
# Load the saved fingerprints from the output directory
df = pd.read_csv("read_excel/fingerprints.csv")
df
[3]:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 1014 | 1015 | 1016 | 1017 | 1018 | 1019 | 1020 | 1021 | 1022 | 1023 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7 rows × 1024 columns
The file named ‘fingerprints.csv’ contains the Morgan fingerprint of each SMILES code provided, one row per molecule and one column per bit. These fingerprints can now be used for further calculations.
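As one example of such a further calculation, the sketch below computes pairwise Tanimoto similarities between binary fingerprints using only the Python standard library. It is an illustration under stated assumptions, not part of ChemML: the helper names `load_fingerprints`, `tanimoto` and `pairwise_tanimoto` are hypothetical, and the loader assumes the CSV has one header row of bit indices followed by one row of 0/1 values per molecule, as the table above suggests. The demo uses toy 8-bit vectors in place of the real 1024-bit rows.

```python
import csv
from itertools import combinations


def load_fingerprints(path):
    """Read binary fingerprints from a CSV file.

    Assumes one header row (bit indices) and one row of 0/1 values per molecule.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row of bit indices
        return [[int(float(value)) for value in row] for row in reader]


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: shared on-bits divided by on-bits in either vector."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Two all-zero fingerprints share no on-bits; report 0.0 by convention here.
    return both / either if either else 0.0


def pairwise_tanimoto(fingerprints):
    """Return {(i, j): similarity} for every pair of rows with i < j."""
    return {
        (i, j): tanimoto(fingerprints[i], fingerprints[j])
        for i, j in combinations(range(len(fingerprints)), 2)
    }


# Toy 8-bit fingerprints standing in for rows of fingerprints.csv.
demo = [
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
]
similarities = pairwise_tanimoto(demo)
for pair, value in sorted(similarities.items()):
    print(pair, round(value, 3))
```

To apply this to the tutorial output, one would pass the result of `load_fingerprints("read_excel/fingerprints.csv")` to `pairwise_tanimoto`, assuming the file layout described above.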