Streamlined API performance tuning

Introduction

This Notebook provides some tips and tricks on improving the performance of the Streamlined API.

The Streamlined API is a productivity layer that wraps the Foundation Layer. It typically loads a lot of information up front in anticipation that the user will need it, which makes it a powerful tool for exploratory work in interactive sessions (such as Jupyter Notebooks). This eager loading can cost time in larger scripts, so the examples below show some coding approaches that may improve the speed and performance of your programs.

Disclaimer

Execution times are shown for some of the functions. Note, however, that performance depends fundamentally on your hardware setup and connection speed, and on previously executed code in the same Notebook, amongst other factors. We cannot guarantee that the saved outputs in this Notebook will match what you see; they are intended as a guide rather than a benchmark.
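Because a single timing is sensitive to the factors listed above, it can help to repeat a measurement and compare the first call with the typical one. The following is a minimal sketch using only the Python standard library; the helper name `time_call` and the stand-in workload are hypothetical and are not part of the MI Scripting Toolkit.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Call fn() `repeats` times and return the elapsed wall-clock time of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload (an assumption); replace it with the call you want to measure.
durations = time_call(lambda: sum(range(100_000)), repeats=5)
print('first call: {:4.4f} s'.format(durations[0]))
print('median:     {:4.4f} s'.format(statistics.median(durations)))
```

If the first call is much slower than the median, a cache populated by that first call is one likely explanation.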

Establishing a Session

The first steps in any MI Scripting Toolkit script are to make a connection to a Granta MI Service Layer, then fetch a database and table. The three objects, and the fastest ways of acquiring them, are as follows:

  1. Session - mpy.connect() is the fastest way to create a session.

  2. Database - Session.get_db() is the fastest way to fetch a database.

  3. Table - Database.get_table() is the fastest way to fetch a table.

get_db and get_table, however, actually fetch all the databases and all the tables (for the selected database) and just return the one that was asked for. The results are then cached; this means those calls to the Service Layer will not be repeated when the objects are accessed again. You can see the effect of this when re-running the cell below multiple times.

[1]:
from GRANTA_MIScriptingToolkit import granta as mpy
import time

s = mpy.connect('http://localhost/mi_servicelayer', autologon=True)
# all tables & databases are cached when requested
start = time.time()
db = s.get_db(db_key='MI_Training')
tab = db.get_table('Design Data')
milestone1 = time.time()
db = s.get_db(db_key='MI_Training')
tab = db.get_table('Design Data')
end = time.time()

print('{:4.4f} s - Time elapsed to reach the first milestone'.format(milestone1 - start))
print('{:4.4f} s - Time elapsed between milestone 1 and the end'.format(end - milestone1))

# ALL are cached, so subsequent requests for different tables will still be quick.
start = time.time()
tab = db.get_table('MaterialUniverse')
end = time.time()
# Measure from the most recent start, not from the earlier milestone
print('{:4.4f} s - Time elapsed getting the MaterialUniverse table'.format(end - start))

3.0003 s - Time elapsed to reach the first milestone
0.0000 s - Time elapsed between milestone 1 and the end
0.0000 s - Time elapsed getting the MaterialUniverse table
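The fetch-everything-then-cache behavior described above can be illustrated with a toy model. This is only a sketch of the general pattern, not the toolkit's implementation: the class `ToyDatabase`, its `remote_calls` counter, and the simulated fetch are all hypothetical stand-ins.

```python
class ToyDatabase:
    """Hypothetical stand-in for a database object; not the toolkit's implementation."""

    def __init__(self, table_names):
        self._remote_table_names = list(table_names)
        self._tables = None      # cache, empty until the first request
        self.remote_calls = 0    # counts simulated Service Layer round trips

    def _fetch_all_tables(self):
        # Simulates one expensive call that returns every table at once.
        self.remote_calls += 1
        return {name: 'table:' + name for name in self._remote_table_names}

    def get_table(self, name):
        # Only the first request triggers the simulated remote call.
        if self._tables is None:
            self._tables = self._fetch_all_tables()
        return self._tables[name]


toy_db = ToyDatabase(['Design Data', 'MaterialUniverse'])
toy_db.get_table('Design Data')        # first request populates the cache
toy_db.get_table('Design Data')        # served from the cache
toy_db.get_table('MaterialUniverse')   # a different table, still from the cache
print(toy_db.remote_calls)             # prints 1
```

In this model the cost of the first request is paid once, which matches the pattern in the saved timings above: a slow first milestone followed by near-zero times.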

Locating records

There are many search methods in the Streamlined API and all of them make a single Service Layer call when executed.

There are, however, some methods that get records from a table and also populate caches, which can be useful. For example, you may want to apply some very specific filtering that the existing search functionality does not support, such as finding all records with more than 3 children. In this case, you could use Table.all_records() to return a list of all records in the table and populate the children of each record in one call. You would then apply your own filters to the record list.

[2]:
start = time.time()
recs = tab.all_records(include_folders=True, include_generics=True)
milestone1 = time.time()
filtered_recs = [r for r in recs if len(r.children) > 3]
end = time.time()
print('{:4.4f} s - Time elapsed to reach the first milestone'.format(milestone1 - start))
print('{:4.4f} s - Time elapsed between milestone 1 and the end'.format(end - milestone1))
1.1926 s - Time elapsed to reach the first milestone
0.0000 s - Time elapsed between milestone 1 and the end
[3]:
filtered_recs
[3]:
[<Record long name:Alumino silicate glass>,
 <Record long name:Glasses>,
 <Record long name:Ceramics and glasses>,
 <Record long name:Wrought>,
 <Record long name:Low alloy steel>,
 <Record long name:Ferrous alloys>,
 <Record long name:Wrought aluminum alloy>,
 <Record long name:Aluminum>,
 <Record long name:Titanium alpha-beta alloy>,
 <Record long name:Titanium>,
 <Record long name:Non-ferrous alloys>,
 <Record long name:Metals and alloys>,
 <Record long name:PVC-elastomer (Polyvinyl Chloride elastomer)>,
 <Record long name:Thermoplastic elastomers (TPE)>,
 <Record long name:Elastomers>,
 <Record long name:ABS - unfilled>,
 <Record long name:PMMA - unfilled>,
 <Record long name:PMMA - unfilled>,
 <Record long name:Thermoplastics>,
 <Record long name:Plastics>,
 <Record long name:Polymers: plastics, elastomers>]

Creating records

Records are normally created using the table object through Table.create_record. However, when creating a record at a specific (known) location in the tree, it can be time-consuming to fetch the record objects in the path needed to assign the correct parent. For this use case, we have added two methods which take a path (or paths) through the tree and traverse it, creating any nodes that don’t already exist along the way.

These two methods exist on the Table object:

  1. Table.path_from - Check for the existence of a path through the table and create it if it does not exist.

  2. Table.paths_from - Create multiple paths in one call.

[4]:
import datetime

# create a new top-level record
tab = db.get_table('Training Exercise for Import')
rec = s.update([tab.create_record('Top-level folder {}'.format(str(datetime.datetime.now())), folder=True)],
               refresh_attributes=False)[0]
rec.children
[4]:
[]
[5]:
end_leaf = tab.path_from(rec, ['our', 'new', 'path'], end_node='and optional record', color='Fuchsia')
end_leaf.path
[5]:
['Top-level folder 2021-03-23 16:37:42.744735', 'our', 'new', 'path']
[6]:
# any folders that already exist in a path will not be altered, even if we alter the arguments
end_leaf = tab.path_from(rec, ['our', 'new', 'route'], end_node='and optional record', color='Silver')
end_leaf.path
[6]:
['Top-level folder 2021-03-23 16:37:42.744735', 'our', 'new', 'route']
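The traverse-and-create behavior described above can be sketched with plain nested dictionaries. This is an illustration of the idea only, assuming a tree modeled as dicts; the function `ensure_path` is hypothetical and says nothing about how Table.path_from is implemented.

```python
def ensure_path(tree, path):
    """Walk `path` through nested dicts, creating missing nodes.

    Returns the final node and the list of names that had to be created.
    """
    node = tree
    created = []
    for name in path:
        if name not in node:
            node[name] = {}        # create only the nodes that are missing
            created.append(name)
        node = node[name]          # existing nodes are reused untouched
    return node, created


tree = {}
_, created_first = ensure_path(tree, ['our', 'new', 'path'])
_, created_second = ensure_path(tree, ['our', 'new', 'route'])
print(created_first)    # ['our', 'new', 'path']
print(created_second)   # ['route']
```

The second call reuses the existing 'our' and 'new' nodes and creates only 'route', mirroring the note in the cell above that existing folders in a path are not altered.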

Importing records

There is one method to import/update records: Session.update. In MI Scripting Toolkit 2.1, we added an argument, refresh_attributes, which can be used to streamline calls to Session.update in specific cases. Normally, after importing records, the MI Scripting Toolkit fetches the newly imported data on those records to return the results of your changes to you. If you set refresh_attributes to False, it skips this step, and you can instead fetch only the attributes you are interested in using bulk_fetch.

[11]:
tab = db.get_table('MaterialUniverse')
start = time.time()
# Let's create a record and give it some data on a single attribute
new_rec1 = tab.create_record('My new record {}'.format(str(datetime.datetime.now())))
a = new_rec1.attributes['Density']
a.points = [1.3]
new_rec1.set_attributes([a])
# This will re-fetch the record's attributes after setting them, in case anything has changed on the server!
new_rec1 = s.update([new_rec1])[0]
milestone1 = time.time()

# Do the same again
new_rec2 = tab.create_record('My new record {}'.format(str(datetime.datetime.now())))
a = new_rec2.attributes['Density']
a.points = [1.3]
new_rec2.set_attributes([a])
# This time we won't refetch the attributes because we only care about the attribute we edited: 'Density'
new_rec2 = s.update([new_rec2], refresh_attributes=False)[0]
# Instead we do it ourselves in an additional line of code
tab.bulk_fetch([new_rec2], attributes=['Density'])
end = time.time()

# How long did both processes take?
print('{:4.4f} s - Time elapsed to reach the first milestone'.format(milestone1 - start))
print('{:4.4f} s - Time elapsed between milestone 1 and the end'.format(end - milestone1))
0.7440 s - Time elapsed to reach the first milestone
0.6523 s - Time elapsed between milestone 1 and the end

Deleting records & release states

In addition to the methods shown above, record deletion and fetching of release states can also be performed in bulk. Unlike link fetching and attribute fetching, these are NOT table-specific methods, and so they exist as methods on the Session class. This means, for example, that records can be deleted from multiple tables and databases simultaneously. Both Session-level bulk methods can be parallelized and batched like their Table-level siblings. The two methods are called:

  1. Session.bulk_fetch_release_states

  2. Session.bulk_delete_or_withdraw_records

On the whole, these methods are needed less often, but they are still worth demonstrating should you ever need to use them!

[12]:
# Let's take some of the records we've created in this example, fetch their release states, then delete them!
# but let's split them into two batches, and compare the times taken doing things individually versus in bulk
individuals = [new_rec1, new_rec2]
bulk_batch = [end_leaf, rec]

print(' Individual Records ')
print(' ------------------ ')
print()
start = time.time()
for r in individuals:
    print(r.release_state)
milestone1 = time.time()
for r in individuals:
    r.delete_or_withdraw_record_on_server()
end = time.time()
print('{:4.4f} s - Time to fetch release states'.format(milestone1 - start))
print('{:4.4f} s - Time to manually delete the records'.format(end - milestone1))

print()

print(' Bulk batched Records ')
print(' -------------------- ')
print()
start = time.time()
s.bulk_fetch_release_states(bulk_batch)
for r in bulk_batch:
    print(r.release_state)
milestone1 = time.time()
s.bulk_delete_or_withdraw_records(bulk_batch)
end = time.time()
print('{:4.4f} s - Time to bulk fetch release states'.format(milestone1 - start))
print('{:4.4f} s - Time to bulk delete the records'.format(end - milestone1))
 Individual Records
 ------------------

Unversioned
Unversioned
0.1682 s - Time to fetch release states
0.4825 s - Time to manually delete the records

 Bulk batched Records
 --------------------

Unversioned
Unversioned
0.0982 s - Time to bulk fetch release states
0.7699 s - Time to bulk delete the records
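With only two records per group, the saved timings above show little benefit from the bulk methods. The general idea of batching is that the number of requests grows with the number of batches rather than the number of records. The sketch below illustrates that arithmetic only; the helper `batched`, the integer stand-ins for records, and the batch size of 100 are hypothetical and do not reflect how the toolkit batches internally.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


record_ids = list(range(250))            # integers standing in for records
batches = list(batched(record_ids, 100))

print(len(record_ids))                   # 250 requests if handled one by one
print(len(batches))                      # 3 requests if handled in batches of 100
print([len(b) for b in batches])         # [100, 100, 50]
```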

Conclusion

The optimizations shown above are not significant when working with small amounts of data, and the ease of use provided by the Streamlined API makes it a very powerful tool for small operations. However, as you scale up the operations being undertaken to hundreds, thousands, and tens of thousands of records, the time saved by using these more efficient methods becomes more significant.