{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Streamlined API performance tuning\n",
    "\n",
    "\n",
    "## Introduction\n",
    "\n",
    "This Notebook provides some tips and tricks on improving the performance of the Streamlined API.\n",
    "\n",
    "The Streamlined API is a productivity layer that wraps the Foundation Layer. It typically loads a lot of information for the user in anticipation that the user will need it, which  makes exploratory work in interactive sessions (such as in Jupyter Notebooks) very powerful as a tool. The examples below show some coding approaches that may improve the speed and performance of your programs.\n",
    "\n",
    "\n",
    "## Disclaimer\n",
    "\n",
    "Execution times for some of the functions are shown, however, note that performance depends fundamentally on your hardware setup and connection speed, and on previously-executed code in the same Notebook, amongst other factors. We cannot guarantee that the saved outputs in this Notebook will match what you will see, and they are intended primarily as a guide rather than a benchmark."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Establishing a Session\n",
    "\n",
    "The first steps in any MI Scripting Toolkit script are to make a connection to a Granta MI Service Layer, then fetch a database and table. The three objects, and the fastest ways of acquiring them, are as follows:\n",
    "\n",
    "1. `Session` - `mpy.connect()` is the fastest way to create a session.\n",
    "2. `Database` - `Session.get_db()` is the fastest way to fetch a database\n",
    "3. `Table` - `Database.get_table()` is the fastest way to fetch a table\n",
    "\n",
    "`get_db` and `get_table`, however, actually fetch *all* the databases and *all* the tables (for the selected database) and just return the one that was asked for. The results are then cached; this means those calls to the Service Layer will not be repeated when the objects are accessed again. You can see the effect of this when re-running the cell below multiple times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3.0003 s - Time elapsed to reach the first milestone\n",
      "0.0000 s - Time elapsed between milestone 1 and the end\n",
      "0.0000 s - Time elapsed getting the MaterialUniverse table\n"
     ]
    }
   ],
   "source": [
    "from GRANTA_MIScriptingToolkit import granta as mpy\n",
    "import time\n",
    "\n",
    "s = mpy.connect('http://localhost/mi_servicelayer', autologon=True)\n",
    "# all tables & databases are cached when requested\n",
    "start = time.time()\n",
    "db = s.get_db(db_key='MI_Training')\n",
    "tab = db.get_table('Design Data')\n",
    "milestone1 = time.time()\n",
    "db = s.get_db(db_key='MI_Training')\n",
    "tab = db.get_table('Design Data')\n",
    "end = time.time()\n",
    "\n",
    "print('{:4.4f} s - Time elapsed to reach the first milestone'.format(milestone1 - start))\n",
    "print('{:4.4f} s - Time elapsed between milestone 1 and the end'.format(end - milestone1))\n",
    "\n",
    "# ALL are cached, so subsequent requests for different tables will still be quick.\n",
    "start = time.time()\n",
    "tab = db.get_table('MaterialUniverse')\n",
    "end = time.time()\n",
    "print('{:4.4f} s - Time elapsed getting the MaterialUniverse table'.format(end - milestone1))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Locating records\n",
    "\n",
    "There are many search methods in the Streamlined API and all of them make a single Service Layer call when executed.\n",
    "\n",
    "There are, however, some methods that get records from a table and also populate caches which can be useful. For example, if you have some very specific filtering that you want to apply to all records that cannot be found using the existing functionality, such as a list of all records with 3 or more children. In this case, you could use `Table.all_records()` to return a list of all records in the table and populate the children of each record, in one call. You would then apply your own search filters to the record list to execute a search."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.1926 s - Time elapsed to reach the first milestone\n",
      "0.0000 s - Time elapsed between milestone 1 and the end\n"
     ]
    }
   ],
   "source": [
    "start = time.time()\n",
    "recs = tab.all_records(include_folders=True, include_generics=True)\n",
    "milestone1 = time.time()\n",
    "filtered_recs = [r for r in recs if len(r.children) > 3]\n",
    "end = time.time()\n",
    "print('{:4.4f} s - Time elapsed to reach the first milestone'.format(milestone1 - start))\n",
    "print('{:4.4f} s - Time elapsed between milestone 1 and the end'.format(end - milestone1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Record long name:Alumino silicate glass>,\n",
       " <Record long name:Glasses>,\n",
       " <Record long name:Ceramics and glasses>,\n",
       " <Record long name:Wrought>,\n",
       " <Record long name:Low alloy steel>,\n",
       " <Record long name:Ferrous alloys>,\n",
       " <Record long name:Wrought aluminum alloy>,\n",
       " <Record long name:Aluminum>,\n",
       " <Record long name:Titanium alpha-beta alloy>,\n",
       " <Record long name:Titanium>,\n",
       " <Record long name:Non-ferrous alloys>,\n",
       " <Record long name:Metals and alloys>,\n",
       " <Record long name:PVC-elastomer (Polyvinyl Chloride elastomer)>,\n",
       " <Record long name:Thermoplastic elastomers (TPE)>,\n",
       " <Record long name:Elastomers>,\n",
       " <Record long name:ABS - unfilled>,\n",
       " <Record long name:PMMA - unfilled>,\n",
       " <Record long name:PMMA - unfilled>,\n",
       " <Record long name:Thermoplastics>,\n",
       " <Record long name:Plastics>,\n",
       " <Record long name:Polymers: plastics, elastomers>]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "filtered_recs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating records\n",
    "\n",
    "Records are normally created using the `table` object through `Table.create_record`, however, when creating a record at a specific (known) location in the tree, it can be time consuming to fetch the record objects in the path needed to assign the correct parent. For this use case, we have added two methods which take a path (or paths) through the tree and traverse it, creating any nodes that don't already exist along the way.\n",
    "\n",
    "These two methods exist on the `Table` object:\n",
    "\n",
    "1. `Table.path_from` - Check for the existence of a path through the table and create it if it does not exist\n",
    "2. `Table.paths_from` - Create multiple paths in one call"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import datetime\n",
    "\n",
    "# create a new top-level record\n",
    "tab = db.get_table('Training Exercise for Import')\n",
    "rec = s.update([tab.create_record('Top-level folder {}'.format(str(datetime.datetime.now())), folder=True)],\n",
    "                                   refresh_attributes=False)[0]\n",
    "rec.children"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Top-level folder 2021-03-23 16:37:42.744735', 'our', 'new', 'path']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "end_leaf = tab.path_from(rec, ['our', 'new', 'path'], end_node='and optional record', color='Fuchsia')\n",
    "end_leaf.path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Top-level folder 2021-03-23 16:37:42.744735', 'our', 'new', 'route']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# any folders that already exist in a path will not be altered, even if we alter the arguments\n",
    "end_leaf = tab.path_from(rec, ['our', 'new', 'route'], end_node='and optional record', color='Silver')\n",
    "end_leaf.path"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Accessing attributes and links\n",
    "\n",
    "The first time you call `Record.attributes`, the Streamlined API checks to see if the `attributes` property has been populated yet. If not, it will automatically  populate it with all the attributes on the record. This is inefficient if you are only interested in one or two attributes. The same is true of `links`. We provide two `Table` methods which can be used to ensure only the objects you are interested in are fetched.\n",
    "\n",
    "* `Table.bulk_fetch` - fetches specified attributes for specified records\n",
    "    * If an attribute is not populated, an empty attribute object is added to the dictionary instead\n",
    "* `Table.bulk_link_fetch` - fetches specified link groups for specified records\n",
    "\n",
    "These will fetch data for _just_ the attributes and records you specify; the properties will be populated in the record objects you provide and the methods do not return anything.\n",
    "\n",
    "The most efficient way to access attributes and links is using these two methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "11.5268 s - Time elapsed to reach the first milestone\n",
      "5.2345 s - Time elapsed between milestone 1 and 2\n",
      "0.5472 s - Time elapsed between milestone 2 and 3\n",
      "0.2509 s - Time elapsed between milestone 2 and the end\n"
     ]
    }
   ],
   "source": [
    "tab = db.get_table('MaterialUniverse')\n",
    "recs = tab.all_records()\n",
    "\n",
    "# bulk_fetch (& all the bulk_xxx methods) can be parallelized by the MI Scripting Toolkit IF you have the cores available\n",
    "start = time.time()\n",
    "# populate all attributes on every record\n",
    "for r in recs:\n",
    "    _ = r.attributes\n",
    "    r.decache_attributes\n",
    "milestone1 = time.time()\n",
    "\n",
    "# populate all attributes on every record using bulk_fetch\n",
    "tab.bulk_fetch(recs)\n",
    "\n",
    "milestone2 = time.time()\n",
    "# just populate Base and Young's Modulus on every record\n",
    "tab.bulk_fetch(recs, attributes=['Base', \"Young's modulus\"], batch_size=5, parallelise=False)\n",
    "\n",
    "milestone3 = time.time()\n",
    "# Parallelise the previous command\n",
    "tab.bulk_fetch(recs, attributes=['Base', \"Young's modulus\"], batch_size=5, parallelise=True, max_num_threads=3)\n",
    "\n",
    "end = time.time()\n",
    "print('{:4.4f} s - Time elapsed to reach the first milestone'.format(milestone1 - start))\n",
    "print('{:4.4f} s - Time elapsed between milestone 1 and 2'.format(milestone2 - milestone1))\n",
    "print('{:4.4f} s - Time elapsed between milestone 2 and 3'.format(milestone3 - milestone2))\n",
    "print('{:4.4f} s - Time elapsed between milestone 2 and the end'.format(end - milestone3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Oxide'] {'low': 84.8, 'high': 89.1} Alumino silicate - 1720\n",
      "['Oxide'] {'low': 83.9, 'high': 88.1} Alumino silicate - 1723\n",
      "['Oxide'] {'low': 84.0, 'high': 88.0} Lithium aluminosilicate\n",
      "['Oxide'] {'low': 57.3, 'high': 75.9} Soda barium glass\n",
      "['Oxide'] {'low': 66.30000305175781, 'high': 69.69999694824219} Barium silicate\n",
      "['Fe (Iron)'] {'low': 183.0, 'high': 192.38461303710938} 250 maraging steel, maraged at 900F\n",
      "['Fe (Iron)'] {'low': 200.0, 'high': 210.25640869140625} Low alloy steel, AISI 4130, air melted, normalized\n",
      "['Fe (Iron)'] {'low': 200.0, 'high': 210.25640869140625} Low alloy steel, AISI 4130, air melted, quenched & tempered\n",
      "['Al (Aluminum)'] {'low': 69.0, 'high': 76.0} Aluminum, 7075, wrought, T6\n",
      "['Al (Aluminum)'] {'low': 69.0, 'high': 72.53845977783203} Aluminum, 7075, wrought, T73\n",
      "['Ti (Titanium)'] {'low': 111.0, 'high': 119.0} Titanium, alpha-beta alloy, Ti-6Al-4V, aged\n",
      "['Ti (Titanium)'] {'low': 110.0, 'high': 119.0} Titanium, alpha-beta alloy, Ti-6Al-4V, annealed, generic\n",
      "['Ti (Titanium)'] {'low': 110.0, 'high': 117.0} Titanium, alpha-beta alloy, Ti-6Al-4V, solution treated & aged\n",
      "['Polymer'] {'low': 0.0006011593504808843, 'high': 0.0009381033014506102} PVC-elastomer (Shore A35)\n",
      "['Polymer'] {'low': 0.002864203881472349, 'high': 0.0029362563509494066} PVC-elastomer (Shore A55)\n",
      "['Polymer'] {'low': 0.00871698185801506, 'high': 0.011692171916365623} PVC-elastomer (Shore A75)\n",
      "['Polymer'] {'low': 0.0035647887270897627, 'high': 0.005562817677855492} PVC-elastomer (Shore A75, flame retarded)\n",
      "['Polymer'] {'low': 12.399999618530273, 'high': 13.800000190734863} ABS (20% carbon fiber, EMI shielding, conductive)\n",
      "['Polymer'] {'low': 5.099999904632568, 'high': 6.070000171661377} ABS (20% glass fiber, injection molding)\n",
      "['Polymer'] {'low': 5.796008586883545, 'high': 6.0} ABS (20% glass fiber, injection molding, flame retarded)\n",
      "['Polymer'] {'low': 6.800000190734863, 'high': 8.300000190734863} ABS (30% glass fiber, injection molding)\n",
      "['Polymer'] {'low': 2.0, 'high': 2.9000000953674316} ABS (extrusion)\n",
      "['Polymer'] {'low': 1.9500000476837158, 'high': 2.049999952316284} ABS (transparent, injection molding)\n",
      "['Polymer'] {'low': 2.700000047683716, 'high': 2.9000000953674316} PMMA (cast sheet)\n",
      "['Polymer'] {'low': 2.4100000858306885, 'high': 3.799999952316284} PMMA (heat resistant)\n",
      "['Polymer'] {'low': 1.600000023841858, 'high': 3.299999952316284} PMMA (impact modified)\n",
      "['Polymer'] {'low': 2.240000009536743, 'high': 3.240000009536743} PMMA (molding and extrusion)\n",
      "None None PMMA - unfilled\n",
      "['Other'] {'low': 0.10000000149011612, 'high': 0.5} Leather\n"
     ]
    }
   ],
   "source": [
    "for r in recs:\n",
    "    print(r.attributes['Base'].value, r.attributes['Young\\'s modulus'].value, r.name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.4504 s - Time elapsed to reach the first milestone\n",
      "0.4842 s - Time elapsed between milestone 1 and 2\n",
      "1.0611 s - Time elapsed between milestone 2 and 3\n",
      "0.5790 s - Time elapsed between milestone 2 and the end\n"
     ]
    }
   ],
   "source": [
    "tab = db.get_table('Metals Pedigree')\n",
    "recs = tab.all_records(include_folders=False)\n",
    "\n",
    "start = time.time()\n",
    "\n",
    "# manually fetch the links on each record, one record at a time\n",
    "for r in recs:\n",
    "    _ = r.links\n",
    "    r.decache_links\n",
    "milestone1 = time.time()\n",
    "\n",
    "# bulk fetch all links on every record\n",
    "tab.bulk_link_fetch(recs)\n",
    "\n",
    "milestone2 = time.time()\n",
    "\n",
    "# bulk_fetch (& all the bulk_xxx methods) can be parallelized by the MI Scripting Toolkit IF you have the cores available\n",
    "tab.bulk_link_fetch(recs,\n",
    "                    link_groups=['Tensile Test Data'],\n",
    "                    batch_size=5,\n",
    "                    parallelise=False)\n",
    "milestone3 = time.time()\n",
    "\n",
    "# parallelise\n",
    "tab.bulk_link_fetch(recs,\n",
    "                    link_groups=['Tensile Test Data'],\n",
    "                    batch_size=5,\n",
    "                    parallelise=True,\n",
    "                    max_num_threads=3)\n",
    "end = time.time()\n",
    "print('{:4.4f} s - Time elapsed to reach the first milestone'.format(milestone1 - start))\n",
    "print('{:4.4f} s - Time elapsed between milestone 1 and 2'.format(milestone2 - milestone1))\n",
    "print('{:4.4f} s - Time elapsed between milestone 2 and 3'.format(milestone3 - milestone2))\n",
    "print('{:4.4f} s - Time elapsed between milestone 2 and the end'.format(end - milestone3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "40 links on record \"Batch 948\"\n",
      "36 links on record \"Batch 943\"\n",
      "36 links on record \"Batch 921\"\n",
      "32 links on record \"Batch 978\"\n",
      "0 links on record \"2X2301A\"\n"
     ]
    }
   ],
   "source": [
    "for r in recs:\n",
    "    print('{} links on record \"{}\"'.format(len(r.links['Tensile Test Data']), r.name))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import records\n",
    "\n",
    "There is one method to import/update records: `Session.update`. In MI Scripting Toolkit 2.1, we added an additional argument which can be used to streamline calls to `Session.update` in specific cases. Normally, after importing records, the MI Scripting Toolkit will fetch the newly imported data on those records to return the results of your changes back to you. However, if you set `refresh_attributes` to `False`, it won't do this, and you can fetch attributes you're interested in (if you want to) using `bulk_fetch`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.7440 s - Time elapsed to reach the first milestone\n",
      "0.6523 s - Time elapsed between milestone 1 and the end\n"
     ]
    }
   ],
   "source": [
    "tab = db.get_table('MaterialUniverse')\n",
    "start = time.time()\n",
    "# Let's create a record and give it some data on a single attribute\n",
    "new_rec1 = tab.create_record('My new record {}'.format(str(datetime.datetime.now())))\n",
    "a = new_rec1.attributes['Density']\n",
    "a.points = [1.3]\n",
    "new_rec1.set_attributes([a])\n",
    "# This will re-fetch the record's attributes after setting them, in case anything has changed on the server!\n",
    "new_rec1 = s.update([new_rec1])[0]\n",
    "milestone1 = time.time()\n",
    "\n",
    "# Do the same again\n",
    "new_rec2 = tab.create_record('My new record {}'.format(str(datetime.datetime.now())))\n",
    "a = new_rec2.attributes['Density']\n",
    "a.points = [1.3]\n",
    "new_rec2.set_attributes([a])\n",
    "# This time we won't refetch the attributes because we only care about the attribute we edited: 'Density'\n",
    "new_rec2 = s.update([new_rec2], refresh_attributes=False)[0]\n",
    "# Instead we do it ourselves in an additional line of code\n",
    "tab.bulk_fetch([new_rec2], attributes=['Density'])\n",
    "end = time.time()\n",
    "\n",
    "# How long did both processes take?\n",
    "print('{:4.4f} s - Time elapsed to reach the first milestone'.format(milestone1 - start))\n",
    "print('{:4.4f} s - Time elapsed between milestone 1 and the end'.format(end - milestone1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deleting records & release states\n",
    "\n",
    "In addition to the methods shown above, record deletion and fetching of release states can also be performed in bulk. Unlike link fetching and attribute fetching, these are **NOT** table-specific methods, and so they exist as methods on the `Session` class. This means, for example, that records can be deleted from multiple tables and databases simultaneously. Both `Session`-level bulk methods can be parallelized and batched like their `Table`-level siblings. The two methods are called:\n",
    "\n",
    "1. `Session.bulk_fetch_release_states`\n",
    "1. `Session.bulk_delete_or_withdraw_records`\n",
    "\n",
    "On the whole, these methods are less pervasive, but are still worth demonstrating should you ever need to use them!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Individual Records \n",
      " ------------------ \n",
      "\n",
      "Unversioned\n",
      "Unversioned\n",
      "0.1682 s - Time to fetch release states\n",
      "0.4825 s - Time to manually delete the records\n",
      "\n",
      " Bulk batched Records \n",
      " -------------------- \n",
      "\n",
      "Unversioned\n",
      "Unversioned\n",
      "0.0982 s - Time to bulk fetch release states\n",
      "0.7699 s - Time to bulk delete the records\n"
     ]
    }
   ],
   "source": [
    "# Let's take some of the record's we've created in this example, fetch their release states, then delete them!\n",
    "# but let's split them into two batches, and compare the times taken doing things individually versus in bulk\n",
    "individuals = [new_rec1, new_rec2]\n",
    "bulk_batch = [end_leaf, rec]\n",
    "\n",
    "print(' Individual Records ')\n",
    "print(' ------------------ ')\n",
    "print()\n",
    "start = time.time()\n",
    "for r in individuals:\n",
    "    print(r.release_state)\n",
    "milestone1 = time.time()\n",
    "for r in individuals:\n",
    "    r.delete_or_withdraw_record_on_server()\n",
    "end = time.time()\n",
    "print('{:4.4f} s - Time to fetch release states'.format(milestone1 - start))\n",
    "print('{:4.4f} s - Time to manually delete the records'.format(end - milestone1))\n",
    "\n",
    "print()\n",
    "\n",
    "print(' Bulk batched Records ')\n",
    "print(' -------------------- ')\n",
    "print()\n",
    "start = time.time()\n",
    "s.bulk_fetch_release_states(bulk_batch)\n",
    "for r in bulk_batch:\n",
    "    print(r.release_state)\n",
    "milestone1 = time.time()\n",
    "s.bulk_delete_or_withdraw_records(bulk_batch)\n",
    "end = time.time()\n",
    "print('{:4.4f} s - Time to bulk fetch release states'.format(milestone1 - start))\n",
    "print('{:4.4f} s - Time to bulk delete the records'.format(end - milestone1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "The optimizations shown above are not significant when working with small amounts of data, and the ease of use provided by the Streamlined API makes it a very powerful tool for small operations. However, as you scale up the operations being undertaken to hundreds, thousands, and tens of thousands of records, the time saved by using these more efficient methods becomes more significant."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
