Generate Morgan fingerprints from SMILES codes

This tutorial shows how to generate Morgan fingerprints from SMILES codes stored in an Excel file, using the ChemML GUI. The Excel sheet consists of a single column of SMILES codes for a few molecules. We read the SMILES codes, generate their Morgan fingerprints (available through the RDKit library), and save them.

[1]:
from chemml.wrapper.notebook import ChemMLNotebook
ui = ChemMLNotebook()
The computation graph will be displayed here:

Please ensure that you are supplying an excel file from your PC.
Template includes a random file which is not a part of the ChemML library.
This template will not work if a custom file is not supplied.


The ChemML Wrapper's config file has been successfully saved ...
    config file name: read_excel.txt
    current directory: c:\Users\nitin\Documents\UB\Hachmann_Group\chemml_dev_nitin\chemml\docs\ipython_notebooks
    what's next? run the ChemML Wrapper using the config file with the following codes:
        >>> from chemml.wrapper.engine import run
        >>> run(INPUT_FILE = 'path_to_the_config_file', OUTPUT_DIRECTORY = 'CMLWrapper_out')
... you can also create a python script of the above codes and run it on any cluster that ChemML is installed.

The workflow gives a precise representation of all the intermediate steps: the blocks used to develop the model, the saved data, and the inputs and outputs of each block. Once the workflow is finalized, we save the input script under our desired file name in .txt format.

Note: In this case, we specify our output directory as ‘read_excel’.
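
The wrapper's message above suggests putting the `run` call into a Python script so the workflow can be executed on a cluster. A minimal sketch of such a script follows. Only `run` with its `INPUT_FILE` and `OUTPUT_DIRECTORY` arguments comes from the message above; the helper names `run_workflow` and `main`, the command-line options, and the early path check are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def run_workflow(config_file, output_directory):
    """Run a saved ChemML Wrapper config file after a basic path check."""
    config_path = Path(config_file)
    if not config_path.is_file():
        # Fail early with a clear message instead of deep inside the wrapper.
        raise FileNotFoundError(f"config file not found: {config_path}")
    # Imported lazily so the path check also works where ChemML is not installed.
    from chemml.wrapper.engine import run
    run(INPUT_FILE=str(config_path), OUTPUT_DIRECTORY=output_directory)


def main(argv=None):
    """Parse command-line arguments (hypothetical options) and run the workflow."""
    parser = argparse.ArgumentParser(description="Run a ChemML Wrapper workflow.")
    parser.add_argument("config_file", help="path to the saved config file")
    parser.add_argument("--output-directory", default="CMLWrapper_out",
                        help="directory that receives the workflow outputs")
    args = parser.parse_args(argv)
    run_workflow(args.config_file, args.output_directory)


# When saving this as a command-line script, finish the file with: main()
```

For the workflow in this tutorial, the equivalent call would be `main(["./template_workflows/read_excel.txt", "--output-directory", "read_excel"])`.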

[2]:
from chemml.wrapper.engine import run
run(INPUT_FILE = './template_workflows/read_excel.txt', OUTPUT_DIRECTORY = 'read_excel')
=================================================
=================================================
Tue Nov  5 15:17:15 2024

parsing the input file: ./template_workflows/read_excel.txt ...

1   Task: (Input,table)
        <<<<<<<
        host = pandas
        function = read_excel
        io = pi_smiles.xlsx
        engine = openpyxl
        >>>>>>>
        df -> send (id=0)
         :nothing to receive:

2   Task: (Output,file)
        <<<<<<<
        host = chemml
        function = SaveFile
        format = smi
        output_directory = .
        header = False
        filename = smiles
        >>>>>>>
        filepath -> send (id=1)
        df <- recv (id=0)

3   Task: (Represent,molecular descriptors)
        <<<<<<<
        host = chemml
        function = RDKitFingerprint
        >>>>>>>
        df -> send (id=2)
        molfile <- recv (id=1)

4   Task: (Output,file)
        <<<<<<<
        host = chemml
        function = SaveFile
        filename = fingerprints
        >>>>>>>
         :nothing to send:
        df <- recv (id=2)

=================================================

======= block#1: (pandas, read_excel)
| run ...

| ... done!
| execution time: 5.74s (0h 0m 5.74s)
=======


======= block#2: (chemml, SaveFile)
| run ...

| ... done!
| execution time: 0.01s (0h 0m 0.01s)
=======


======= block#3: (chemml, RDKitFingerprint)
| run ...

| ... done!
| execution time: 0.02s (0h 0m 0.02s)
=======


======= block#4: (chemml, SaveFile)
| run ...

| ... done!
| execution time: 0.00s (0h 0m 0.00s)
=======


Total execution time: 5.76s (0h 0m 5.76s)
2024-11-05 15:17:20

[15:17:20] DEPRECATION WARNING: please use MorganGenerator
[15:17:20] DEPRECATION WARNING: please use MorganGenerator
[15:17:20] DEPRECATION WARNING: please use MorganGenerator
[15:17:20] DEPRECATION WARNING: please use MorganGenerator
[15:17:20] DEPRECATION WARNING: please use MorganGenerator
[15:17:20] DEPRECATION WARNING: please use MorganGenerator
[15:17:20] DEPRECATION WARNING: please use MorganGenerator
[3]:
import pandas as pd
df = pd.read_csv("read_excel/fingerprints.csv")
df
[3]:
   0  1  2  3  4  5  6  7  8  9  ...  1014  1015  1016  1017  1018  1019  1020  1021  1022  1023
0  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0
1  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0
2  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     1     0     0     0
3  0  1  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0
4  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0
5  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0
6  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0

7 rows × 1024 columns

The file named ‘fingerprints.csv’ contains the Morgan fingerprints for each of the SMILES codes provided. These can now be used for further calculations.
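
As one example of such a further calculation, bit-vector fingerprints are often compared with the Tanimoto (Jaccard) similarity: the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below is a plain-Python illustration, not part of ChemML; the function name `tanimoto` and the toy 8-bit vectors are made up for the example and do not come from the tutorial's data.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two equal-length bit vectors."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Two all-zero fingerprints have no bits to compare; return 0.0 by convention here.
    return both / either if either else 0.0


# Toy 8-bit vectors standing in for two fingerprint rows (not real data).
fp_1 = [0, 1, 0, 0, 1, 0, 1, 0]
fp_2 = [0, 1, 0, 0, 0, 0, 1, 1]
similarity = tanimoto(fp_1, fp_2)
print(similarity)  # 2 shared bits / 4 bits set in either = 0.5
```

With the DataFrame loaded above, two molecules could be compared in the same way, for instance with `tanimoto(df.iloc[0].tolist(), df.iloc[1].tolist())`.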