{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate Morgan fingerprints from SMILES codes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial shows how to generate Morgan fingerprints from SMILES codes stored in an Excel file using the ChemML GUI. The Excel sheet consists of a single column of SMILES codes for a few molecules. We read the SMILES codes, generate their Morgan fingerprints, which are available through the RDKit library, and save them." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f9f651f96f9142a7b513a79e3bd6b0a1", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Accordion(children=(VBox(children=(Label(value='Choose how to start:', layout=Layout(width='50%')), Tab(childr…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "The computation graph will be displayed here:\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Image(value=b'')" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", " Please ensure that you are supplying an excel file from your PC.\n", " \n", " Template includes a random file which is not a part of the ChemML library.\n", "\n", " This template will not work if a custom file is not supplied.\n", " \n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a2a7aadf61b74489b30823682e310b96", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Image(value=b'\\x89PNG\\r\\n\\x1a\\n\\x00\\x00\\x00\\rIHDR\\x00\\x00\\x016\\x00\\x00\\x01\\x97\\x08\\x06\\x00\\x00\\x009\\xfc|\\xe2\\x…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The ChemML Wrapper's config file has been 
successfully saved ...\n", " config file path: chemML_config.txt\n", " current directory: /mnt/c/Aatish/UB/Mr. Hachmann/master_chemml_wrapper_v2/chemml/docs/ipython_notebooks\n", " what's next? run the ChemML Wrapper using the config file with the following codes:\n", " >>> from chemml.wrapper.engine import run\n", " >>> run(INPUT_FILE = 'path_to_the_config_file', OUTPUT_DIRECTORY = 'CMLWrapper_out')\n", "... you can also create a python script of the above codes and run it on any cluster that ChemML is installed.\n" ] } ], "source": [ "from chemml.wrapper.notebook import ChemMLNotebook\n", "ui = ChemMLNotebook()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The workflow graph shows all the intermediate steps: the blocks used, the saved data, and the inputs and outputs of each block. Once the workflow is finalized, we save the input script under our desired file name in _.txt_ format." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Note:_ In this case, we specify the output directory as `read_excel`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=================================================\n", "=================================================\n", "Fri Jun 4 14:53:11 2021\n", "\n", "parsing the input file: /mnt/c/Aatish/UB/Mr. Hachmann/master_chemml_wrapper_v2/chemml/docs/ipython_notebooks/chemML_config.txt ...\n", "\n", "=================================================\n", "\n", "======= block#1: (pandas, read_excel)\n", "| run ...\n", "\n", "| ... done!\n", "| execution time: 6.30s (0h 0m 6.30s)\n", "=======\n", "\n", "\n", "======= block#2: (chemml, SaveFile)\n", "| run ...\n", "\n", "| ... done!\n", "| execution time: 0.04s (0h 0m 0.04s)\n", "=======\n", "\n", "\n", "======= block#3: (chemml, RDKitFingerprint)\n", "| run ...\n", "\n", "| ... 
done!\n", "| execution time: 0.02s (0h 0m 0.02s)\n", "=======\n", "\n", "\n", "======= block#4: (chemml, SaveFile)\n", "| run ...\n", "\n", "| ... done!\n", "| execution time: 0.01s (0h 0m 0.01s)\n", "=======\n", "\n", "\n", "Total execution time: 6.38s (0h 0m 6.38s)\n", "2021-06-04 14:53:18\n", "\n" ] } ], "source": [ "from chemml.wrapper.engine import run\n", "run(INPUT_FILE = '/mnt/c/Aatish/UB/Mr. Hachmann/master_chemml_wrapper_v2/chemml/docs/ipython_notebooks/chemML_config.txt', OUTPUT_DIRECTORY = 'read_excel')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | 0 | \n", "1 | \n", "2 | \n", "3 | \n", "4 | \n", "5 | \n", "6 | \n", "7 | \n", "8 | \n", "9 | \n", "... | \n", "1014 | \n", "1015 | \n", "1016 | \n", "1017 | \n", "1018 | \n", "1019 | \n", "1020 | \n", "1021 | \n", "1022 | \n", "1023 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
2 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "
3 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
4 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
5 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
6 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
7 rows × 1024 columns
\n", "