Generate Morgan fingerprints from SMILES codes

This tutorial shows how to generate Morgan fingerprints from SMILES codes stored in an Excel file, using the ChemML GUI. The Excel sheet consists of a single column of SMILES codes for a few molecules. We read the SMILES codes, generate their Morgan fingerprints with the RDKit library, and save the results.
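To make the fingerprint step less of a black box, here is a minimal sketch of computing a Morgan fingerprint for a single SMILES string directly with RDKit, outside the GUI. It assumes a reasonably recent RDKit release that provides `rdFingerprintGenerator`. The helper name `morgan_bits` is hypothetical, and `radius=2` is an assumed setting that may differ from what the ChemML block uses; the 1024-bit length matches the output shown later in this tutorial.

```python
def morgan_bits(smiles, radius=2, n_bits=1024):
    """Return the Morgan fingerprint of one SMILES string as a list of 0/1 ints.

    Returns None when RDKit cannot parse the SMILES string.
    radius=2 is an assumed setting, not read from the ChemML workflow.
    """
    # Imported inside the function so that this sketch can be loaded even
    # where RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import rdFingerprintGenerator

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    fingerprint = generator.GetFingerprint(mol)
    # ToBitString() gives a string of '0' and '1' characters, one per bit.
    return [int(bit) for bit in fingerprint.ToBitString()]
```

Calling `morgan_bits("CCO")` (ethanol) would give a list of 1024 bits, which corresponds to one row of the table produced at the end of this tutorial.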

[1]:
from chemml.wrapper.notebook import ChemMLNotebook
ui = ChemMLNotebook()
The computation graph will be displayed here:

Please ensure that you are supplying an excel file from your PC.
Template includes a random file which is not a part of the ChemML library.
This template will not work if a custom file is not supplied.


The ChemML Wrapper's config file has been successfully saved ...
    config file path: chemML_config.txt
    current directory: /mnt/c/Aatish/UB/Mr. Hachmann/master_chemml_wrapper_v2/chemml/docs/ipython_notebooks
    what's next? run the ChemML Wrapper using the config file with the following codes:
        >>> from chemml.wrapper.engine import run
        >>> run(INPUT_FILE = 'path_to_the_config_file', OUTPUT_DIRECTORY = 'CMLWrapper_out')
... you can also create a python script of the above codes and run it on any cluster that ChemML is installed.

The workflow graph shows every intermediate step: the blocks used, the data saved, and the inputs and outputs of each block. Once the workflow is finalized, we save the input script as a .txt file under a name of our choice (here, chemML_config.txt).

Note: In this case, we specify our output directory as ‘read_excel’.

[2]:
from chemml.wrapper.engine import run
run(INPUT_FILE = '/mnt/c/Aatish/UB/Mr. Hachmann/master_chemml_wrapper_v2/chemml/docs/ipython_notebooks/chemML_config.txt', OUTPUT_DIRECTORY = 'read_excel')
=================================================
=================================================
Fri Jun  4 14:53:11 2021

parsing the input file: /mnt/c/Aatish/UB/Mr. Hachmann/master_chemml_wrapper_v2/chemml/docs/ipython_notebooks/chemML_config.txt ...

=================================================

======= block#1: (pandas, read_excel)
| run ...

| ... done!
| execution time: 6.30s (0h 0m 6.30s)
=======


======= block#2: (chemml, SaveFile)
| run ...

| ... done!
| execution time: 0.04s (0h 0m 0.04s)
=======


======= block#3: (chemml, RDKitFingerprint)
| run ...

| ... done!
| execution time: 0.02s (0h 0m 0.02s)
=======


======= block#4: (chemml, SaveFile)
| run ...

| ... done!
| execution time: 0.01s (0h 0m 0.01s)
=======


Total execution time: 6.38s (0h 0m 6.38s)
2021-06-04 14:53:18

[3]:
import pandas as pd
df = pd.read_csv("read_excel/fingerprints.csv")
df
[3]:
   0  1  2  3  4  5  6  7  8  9  ...  1014  1015  1016  1017  1018  1019  1020  1021  1022  1023
0  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0
1  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0
2  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     1     0     0     0
3  0  1  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0
4  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0
5  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0
6  0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0     0     0     0     0

7 rows × 1024 columns

The file ‘fingerprints.csv’ contains the Morgan fingerprints for each of the SMILES codes provided: one row per molecule and one column per fingerprint bit (7 rows × 1024 columns here). These can now be used for further calculations.
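As one example of such a further calculation, the sketch below computes the Tanimoto similarity between two fingerprints using only the Python standard library. It assumes the CSV has a header row of bit positions and no index column, which is consistent with how `pd.read_csv` displays the file above. The helper names `read_fingerprints` and `tanimoto` are hypothetical, and a toy 8-bit CSV stands in for ‘read_excel/fingerprints.csv’ so that the example runs on its own.

```python
import csv
import io


def read_fingerprints(handle):
    """Read a CSV of 0/1 bits (with one header row) into a list of bit lists."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row of bit positions
    # int(float(...)) also accepts values written as "0.0" or "1.0"
    return [[int(float(value)) for value in row] for row in reader]


def tanimoto(a, b):
    """Tanimoto similarity: bits set in both vectors / bits set in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Toy stand-in for the real file: a header row followed by two 8-bit rows.
toy_csv = io.StringIO(
    "0,1,2,3,4,5,6,7\n"
    "1,0,1,0,0,1,0,0\n"
    "1,0,0,0,0,1,0,1\n"
)
fps = read_fingerprints(toy_csv)
similarity = tanimoto(fps[0], fps[1])
print(similarity)  # 2 shared bits out of 4 set in either row -> 0.5
```

To run this on the real output, replace the `io.StringIO` object with `open("read_excel/fingerprints.csv", newline="")`.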