Generate Morgan fingerprints from SMILES codes
This is a tutorial on generating Morgan fingerprints from SMILES codes provided in an Excel file, using the ChemML GUI. The Excel sheet consists of a single column of SMILES codes for a few molecules. We read the SMILES codes, generate their Morgan fingerprints (computed with the RDKit library), and save them.
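To make the idea concrete, here is a toy sketch of how a Morgan-style fingerprint turns a molecular graph into a fixed-length bit vector. It is not RDKit's implementation. The molecule is given by hand as atomic numbers plus bonded index pairs (no SMILES parsing). Bond orders, charges and hydrogens are ignored, and duplicate environments are not pruned. The name `toy_morgan_bits` and the defaults `radius=2` and `n_bits=1024` are illustrative assumptions; 1024 only mirrors the number of columns in the fingerprint table at the end of this tutorial.

```python
from typing import Dict, List, Tuple


def toy_morgan_bits(atoms: List[int], bonds: List[Tuple[int, int]],
                    radius: int = 2, n_bits: int = 1024) -> List[int]:
    """Toy Morgan/ECFP-style bit vector (hypothetical helper, not RDKit).

    atoms: atomic number of each heavy atom, used as its initial invariant.
    bonds: (i, j) index pairs of bonded atoms.
    """
    # Build an adjacency list from the bond pairs
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: each atom's identifier depends only on its atomic number
    ids = [hash((0, z)) for z in atoms]
    features = set(ids)

    # Each iteration grows every atom's environment by one bond; neighbor ids
    # are sorted so the result does not depend on how atoms are numbered
    for r in range(1, radius + 1):
        ids = [
            hash((r, ids[a], tuple(sorted(ids[n] for n in neighbors[a]))))
            for a in range(len(atoms))
        ]
        features.update(ids)

    # Fold the unbounded identifiers into a fixed-length 0/1 vector
    bits = [0] * n_bits
    for f in features:
        bits[f % n_bits] = 1
    return bits
```

For ethanol (CCO), `toy_morgan_bits([6, 6, 8], [(0, 1), (1, 2)])` returns 1024 bits with at most nine of them set (three atoms times three radii). Numbering the atoms in the opposite order, `toy_morgan_bits([8, 6, 6], [(0, 1), (1, 2)])`, gives the same vector because neighbor identifiers are sorted before hashing.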
[1]:
from chemml.wrapper.notebook import ChemMLNotebook
ui = ChemMLNotebook()
The computation graph will be displayed here:
Please ensure that you are supplying an excel file from your PC.
Template includes a random file which is not a part of the ChemML library.
This template will not work if a custom file is not supplied.
The ChemML Wrapper's config file has been successfully saved ...
config file name: read_excel.txt
current directory: c:\Users\nitin\Documents\UB\Hachmann_Group\chemml_dev_nitin\chemml\docs\ipython_notebooks
what's next? run the ChemML Wrapper using the config file with the following codes:
>>> from chemml.wrapper.engine import run
>>> run(INPUT_FILE = 'path_to_the_config_file', OUTPUT_DIRECTORY = 'CMLWrapper_out')
... you can also create a python script of the above codes and run it on any cluster that ChemML is installed.
The workflow graph gives a precise representation of all the intermediate steps: the blocks used to build the workflow, the data they save, and the inputs/outputs of each block. Once the workflow is finalized, we save the input script (the ChemML Wrapper config file) under our desired file name in .txt format.
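The run log printed by the Wrapper lists each block with lines such as `df -> send (id=0)` and `df <- recv (id=0)`: a block publishes its output under a numbered id, and a later block receives it by that id. The sketch below is a hypothetical, plain-Python model of that routing, assuming blocks run in the listed order. `run_blocks` and the tuple layout of a block are our own illustration, not ChemML's engine, and the stand-in lambdas only mimic the four blocks of this workflow.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical block spec: (name, function, ids to receive from, id to send to or None)
Block = Tuple[str, Callable[..., Any], List[int], Optional[int]]


def run_blocks(blocks: List[Block]) -> Dict[int, Any]:
    """Execute blocks in order, routing outputs to later blocks through numbered ids."""
    channels: Dict[int, Any] = {}
    for name, func, recv_ids, send_id in blocks:
        missing = [i for i in recv_ids if i not in channels]
        if missing:
            raise RuntimeError(f"block {name!r} waits on ids {missing} that were never sent")
        # Gather this block's inputs from earlier outputs ("<- recv")
        result = func(*(channels[i] for i in recv_ids))
        # Publish this block's output under its id ("-> send")
        if send_id is not None:
            channels[send_id] = result
        print(f"block {name!r} done")
    return channels


if __name__ == "__main__":
    # Stand-ins for the four blocks of this tutorial's workflow (not real ChemML calls)
    run_blocks([
        ("read_excel", lambda: ["CCO", "c1ccccc1"], [], 0),
        ("SaveFile (smi)", lambda df: "smiles.smi", [0], 1),
        ("RDKitFingerprint", lambda path: [[0, 1], [1, 0]], [1], 2),
        ("SaveFile (csv)", lambda df: None, [2], None),
    ])
```

Failing fast on a missing id mirrors why the order of blocks matters: a block cannot run until every id it receives has been sent.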
Note: in this case, we set the output directory to ‘read_excel’ when running the workflow.
[2]:
from chemml.wrapper.engine import run
run(INPUT_FILE = './template_workflows/read_excel.txt', OUTPUT_DIRECTORY = 'read_excel')
=================================================
=================================================
Tue Nov 5 15:17:15 2024
parsing the input file: ./template_workflows/read_excel.txt ...
1 Task: (Input,table)
<<<<<<<
host = pandas
function = read_excel
io = pi_smiles.xlsx
engine = openpyxl
>>>>>>>
df -> send (id=0)
:nothing to receive:
2 Task: (Output,file)
<<<<<<<
host = chemml
function = SaveFile
format = smi
output_directory = .
header = False
filename = smiles
>>>>>>>
filepath -> send (id=1)
df <- recv (id=0)
3 Task: (Represent,molecular descriptors)
<<<<<<<
host = chemml
function = RDKitFingerprint
>>>>>>>
df -> send (id=2)
molfile <- recv (id=1)
4 Task: (Output,file)
<<<<<<<
host = chemml
function = SaveFile
filename = fingerprints
>>>>>>>
:nothing to send:
df <- recv (id=2)
=================================================
======= block#1: (pandas, read_excel)
| run ...
| ... done!
| execution time: 5.74s (0h 0m 5.74s)
=======
======= block#2: (chemml, SaveFile)
| run ...
| ... done!
| execution time: 0.01s (0h 0m 0.01s)
=======
======= block#3: (chemml, RDKitFingerprint)
| run ...
| ... done!
| execution time: 0.02s (0h 0m 0.02s)
=======
======= block#4: (chemml, SaveFile)
| run ...
| ... done!
| execution time: 0.00s (0h 0m 0.00s)
=======
Total execution time: 5.76s (0h 0m 5.76s)
2024-11-05 15:17:20
[15:17:20] DEPRECATION WARNING: please use MorganGenerator
[3]:
import pandas as pd
# Load the fingerprints saved by block #4 into a DataFrame (one row per molecule)
df = pd.read_csv("read_excel/fingerprints.csv")
df
[3]:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 1014 | 1015 | 1016 | 1017 | 1018 | 1019 | 1020 | 1021 | 1022 | 1023 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7 rows × 1024 columns
The file ‘fingerprints.csv’ contains the Morgan fingerprint of each SMILES code provided: one row per molecule and one column per bit (1024 bits). These fingerprints can now be used as input features for further calculations.
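As one example of such a calculation, the Tanimoto similarity of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. Tanimoto is our choice of illustration, not something this workflow computes, and the function below is a plain-Python sketch (RDKit offers `DataStructs.TanimotoSimilarity` for its own bit-vector objects). Returning 0.0 for two all-zero fingerprints is an assumption, since the ratio is undefined there.

```python
from typing import Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity of two equal-length 0/1 fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same number of bits")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Assumption: two empty fingerprints are given similarity 0.0
    return both / either if either else 0.0
```

With the DataFrame loaded above, `tanimoto(df.iloc[0].tolist(), df.iloc[1].tolist())` compares the first two molecules; values close to 1.0 indicate that most of their set bits coincide.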