Picking up where we left off in the last post, dated October 23, 2019 (Azure, AWS and GCP snapshot, Q3 2019).
Premise - here we shall focus on productivity (min coding + max use of ML).
Scenario - take a spreadsheet or table-like (structured) dataset, where the rows are wells identified by API number and the columns are location, well depth, lateral length (for horizontal wells), formation thickness, porosity, velocity, density and other subsurface properties.
Objective - train an ML model to rank some proposed well locations.
Azure - Power BI Desktop (free on Windows 10) is a powerful tool to QC, visualize and do quick-and-dirty cleanup of missing data (without laboring through SQL). It comes with an impressive list of connectors that handle input formats ranging from text, csv and xls to SQL databases and websites. While Excel has been the workhorse for the past three decades, Power BI may well be the workhorse for the next three. https://powerbi.microsoft.com/en-us/desktop/
To get started, type Power BI in the search box next to the Windows 10 Start icon.
GCP - the geek-less AutoML Tables is a very nice, user-friendly ML front end that does not require writing a single line of code. Start with our csv file, or the cleaned-up version exported from Power BI, choose the one column to predict (the target), the columns to use in training the ML model, and the maximum amount of time to run. AutoML will do the rest. It is handy as a quick look to see whether ML can help with your problem, or as a benchmark against your own ML development or someone else's. For ranking proposed well locations, choose classification (if each input well is graded, e.g., A, B, C, D, E) or regression (if each is assigned a score, say between 1 and 100). https://cloud.google.com/automl-tab...quickstart
The quickstart walks through the nuts and bolts, including evaluating ML results.
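Whichever model type is chosen, its predictions still have to be turned into a ranking of the proposed wells. Here is a minimal Python sketch of that last step. It assumes the predictions have already been exported into plain dictionaries. The well names, the A-is-best grade order and the per-well confidence value are all hypothetical, and none of this is AutoML's own API.

```python
# Hypothetical post-processing of exported predictions; not an AutoML API.
GRADE_ORDER = "ABCDE"  # assumption: A is the best grade, E the worst


def rank_by_score(predicted_scores):
    """Regression case: well -> predicted score, higher is better."""
    return sorted(predicted_scores, key=predicted_scores.get, reverse=True)


def rank_by_grade(predicted_grades):
    """Classification case: well -> (grade, confidence).

    Sort by grade first, then by confidence (higher first) within a grade.
    """
    def sort_key(well):
        grade, confidence = predicted_grades[well]
        return (GRADE_ORDER.index(grade), -confidence)

    return sorted(predicted_grades, key=sort_key)


# Made-up predictions for three proposed locations.
scores = {"Proposed-1": 62.5, "Proposed-2": 88.0, "Proposed-3": 71.3}
grades = {
    "Proposed-1": ("C", 0.70),
    "Proposed-2": ("A", 0.55),
    "Proposed-3": ("A", 0.80),
}

score_ranking = rank_by_score(scores)
grade_ranking = rank_by_grade(grades)
print(score_ranking)
print(grade_ranking)
```

With regression the ranking falls out of the score directly. With classification, wells that share a grade need a tie-breaker, and confidence is only one possible choice.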
Caution - because AutoML is so easy to click and run, it is really important to QC the input properly: no missing values in the column set as target, and all values valid (watch out for zeros or constants like 9999 used as filler with no meaning). Remember, “strong” data means meaningful input.
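For those who prefer a scripted check before uploading, the QC above can be sketched in a few lines of standard-library Python. The column names, the sample rows and the list of filler codes (9999, -9999, -999.25) are assumptions for illustration only. Substitute whatever your own dataset uses.

```python
import csv
import io
import math

# Hypothetical filler codes; adjust to whatever your data source uses.
SENTINELS = {9999.0, -9999.0, -999.25}


def qc_column(rows, column, sentinels=SENTINELS, zero_is_suspect=True):
    """Return a dict of issue name -> list of 0-based row indices."""
    issues = {"missing": [], "not_numeric": [], "sentinel": [], "zero": []}
    for i, row in enumerate(rows):
        raw = (row.get(column) or "").strip()
        if raw == "":
            issues["missing"].append(i)
            continue
        try:
            value = float(raw)
        except ValueError:
            issues["not_numeric"].append(i)
            continue
        if math.isnan(value):
            issues["missing"].append(i)
        elif value in sentinels:
            issues["sentinel"].append(i)
        elif zero_is_suspect and value == 0.0:
            issues["zero"].append(i)
    return issues


# Made-up sample standing in for the csv exported from Power BI.
SAMPLE_CSV = """api_number,porosity,score
42-001-00001,0.12,78
42-001-00002,9999,65
42-001-00003,0.09,
42-001-00004,0,91
42-001-00005,n/a,55
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
target_issues = qc_column(rows, "score", zero_is_suspect=False)
porosity_issues = qc_column(rows, "porosity")
print("target:", target_issues)
print("porosity:", porosity_issues)
```

Rows flagged in the target column are best dropped or fixed before training. Flags on input columns are a prompt to go back and look at the data, since a zero can be a real measurement for some properties.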
AWS - Alexa voice is still the most interesting yet, with improved AI assistance for managing conversations: more back-and-forth, like real-life dialog, and a much simpler alternative to rule-based, linear, step-by-step structured coding. On the web application side, AWS Amplify makes CI / CD (continuous integration / continuous deployment) straightforward and easy. I once updated a microservice (Lambda) and committed the release on a Saturday afternoon (thinking of it as a slow-traffic window). To my surprise, Amplify got it all done in an instant, before I could take a sip of coffee. https://aws.amazon.com/amplify/
Seeing the progress from GCP’s geeky Bigtable to AutoML, Azure’s Power Apps platform (so whatever you have put together in Power BI on your own desktop or laptop can be published to the web with one click), or AWS’s plethora of commercial-strength offerings (the latest being OSDU, the Open Subsurface Data Universe for seismic, well logs and other G&G data), the rate of innovation is on steroids across the major cloud platforms.
Wish list for 2022 - convergence in one place of Power BI-like visuals with simple data filtering / cut-offs, the ease of ML use of AutoML Tables, and an enchanting Alexa-like voice interface to probe further and do deep learning.
Let's share our experience on any and all of the platforms, so we learn more quickly and become more focused and more productive in using ML to tackle the challenges.