{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate Morgan fingerprints from SMILES codes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a tutorial on generating Morgan fingerprints from SMILES codes provided in an Excel file using the GUI. The Excel sheet consists of a single column of SMILES codes for a few molecules. We read the SMILES codes, generate their Morgan fingerprints, which are available through the RDKit library, and save them. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ceff07c8ef90404a8090babd586a6d69", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Accordion(children=(VBox(children=(Label(value='Choose how to start:', layout=Layout(width='50%')), Tab(childr…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "The computation graph will be displayed here:\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "5b0d957c00494c629cea36d11b4b3503", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Image(value=b'')" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", " Please ensure that you are supplying an excel file from your PC.\n", " \n", " Template includes a random file which is not a part of the ChemML library.\n", "\n", " This template will not work if a custom file is not supplied.\n", " \n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4acf223b9e5149f49a37a4a6f1838f7d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Image(value=b'\\x89PNG\\r\\n\\x1a\\n\\x00\\x00\\x00\\rIHDR\\x00\\x00\\x00\\xfc\\x00\\x00\\x01\\x9d\\x08\\x06\\x00\\x00\\x006t\\x1e,\\x…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The ChemML Wrapper's 
config file has been successfully saved ...\n", " config file name: read_excel.txt\n", " current directory: c:\\Users\\nitin\\Documents\\UB\\Hachmann_Group\\chemml_dev_nitin\\chemml\\docs\\ipython_notebooks\n", " what's next? run the ChemML Wrapper using the config file with the following codes:\n", " >>> from chemml.wrapper.engine import run\n", " >>> run(INPUT_FILE = 'path_to_the_config_file', OUTPUT_DIRECTORY = 'CMLWrapper_out')\n", "... you can also create a python script of the above codes and run it on any cluster that ChemML is installed.\n" ] } ], "source": [ "from chemml.wrapper.notebook import ChemMLNotebook\n", "ui = ChemMLNotebook()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The workflow gives a precise representation of all the intermediate steps, blocks used to develop the model, the saved data and the inputs/outputs to each block. Once the workflow is finalized, we save the input script with our desired file name in _.txt_ format." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Note:_ In this case, we specify our output directory as ‘read_excel’." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=================================================\n", "=================================================\n", "Tue Nov 5 15:17:15 2024\n", "\n", "parsing the input file: ./template_workflows/read_excel.txt ...\n", "\n", "1 Task: (Input,table)\n", " <<<<<<<\n", " host = pandas\n", " function = read_excel\n", " io = pi_smiles.xlsx\n", " engine = openpyxl\n", " >>>>>>>\n", " df -> send (id=0)\n", " :nothing to receive:\n", " \n", "2 Task: (Output,file)\n", " <<<<<<<\n", " host = chemml\n", " function = SaveFile\n", " format = smi\n", " output_directory = .\n", " header = False\n", " filename = smiles\n", " >>>>>>>\n", " filepath -> send (id=1)\n", " df <- recv (id=0)\n", " \n", "3 Task: (Represent,molecular descriptors)\n", " <<<<<<<\n", " host = chemml\n", " function = RDKitFingerprint\n", " >>>>>>>\n", " df -> send (id=2)\n", " molfile <- recv (id=1)\n", " \n", "4 Task: (Output,file)\n", " <<<<<<<\n", " host = chemml\n", " function = SaveFile\n", " filename = fingerprints\n", " >>>>>>>\n", " :nothing to send:\n", " df <- recv (id=2)\n", " \n", "=================================================\n", "\n", "======= block#1: (pandas, read_excel)\n", "| run ...\n", "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "| ... done!\n", "| execution time: 5.74s (0h 0m 5.74s)\n", "=======\n", "\n", "\n", "======= block#2: (chemml, SaveFile)\n", "| run ...\n", "\n", "| ... done!\n", "| execution time: 0.01s (0h 0m 0.01s)\n", "=======\n", "\n", "\n", "======= block#3: (chemml, RDKitFingerprint)\n", "| run ...\n", "\n", "| ... done!\n", "| execution time: 0.02s (0h 0m 0.02s)\n", "=======\n", "\n", "\n", "======= block#4: (chemml, SaveFile)\n", "| run ...\n", "\n", "| ... 
done!\n", "| execution time: 0.00s (0h 0m 0.00s)\n", "=======\n", "\n", "\n", "Total execution time: 5.76s (0h 0m 5.76s)\n", "2024-11-05 15:17:20\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[15:17:20] DEPRECATION WARNING: please use MorganGenerator\n", "[15:17:20] DEPRECATION WARNING: please use MorganGenerator\n", "[15:17:20] DEPRECATION WARNING: please use MorganGenerator\n", "[15:17:20] DEPRECATION WARNING: please use MorganGenerator\n", "[15:17:20] DEPRECATION WARNING: please use MorganGenerator\n", "[15:17:20] DEPRECATION WARNING: please use MorganGenerator\n", "[15:17:20] DEPRECATION WARNING: please use MorganGenerator\n" ] } ], "source": [ "from chemml.wrapper.engine import run\n", "run(INPUT_FILE = './template_workflows/read_excel.txt', OUTPUT_DIRECTORY = 'read_excel')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 1014 | 1015 | 1016 | 1017 | 1018 | 1019 | 1020 | 1021 | 1022 | 1023 |\n",
"---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
"1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
"2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |\n",
"3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
"4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
"5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
"6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
"\n",
"7 rows × 1024 columns\n", "