15 April, 2020 Texas United States

Part of AAPG's GeoAnalytics Credentialing Program

Excel-Based Random Forest Machine Learning Algorithms: Programming and Application

14-15 April 2020
Texas, United States


Who Should Attend
  • Geoscientists, engineers, and data scientists who want a powerful tool that works with data in a familiar spreadsheet format
  • Individuals who need to develop models, simulations, and scenarios that can quickly and effectively accommodate rapid change, uncertainty and complex interactions
  • Professionals who need to develop models to manage risk and to support data-driven decision-making
  • Individuals involved in supply chain, transportation, and water / emissions management
  • Industry leaders seeking clarity on the do’s and don’ts of machine learning
  • Consultants seeking to boost their capacities and provide high value to clients
  • Students and members in transition seeking to build their resumes

This course will teach attendees to:

  • Build and use a powerful tool capable of making valuable predictions and classifications from data in the workplace
  • Continue to adapt the tool to add new functionality or customization
  • Develop and deploy basic software in Microsoft Excel using Visual Basic (VBA)
  • Recognize opportunities for new data science software to add value in the firm
  • Collect and organize data for use in data science projects in Excel
  • Identify free online tools to continue their learning outside of the course
  • Determine when machine learning should be used on a problem
  • Describe the random forest method and component steps in rigorous detail
  • Explain random forest’s place in the world of machine learning
  • Create effective pseudocode and commenting in larger-scale software
  • Judge the effort and time requirements of software and data science projects
Why This Course Stands Out
  • You keep the tool: The random forest tool that is developed in the class and its master copy from the instructor are production-ready and will be given to all attendees.
  • The course is built for you: There are many courses teaching software engineering and data science today. Rather than dilute the lessons in order to teach to a general audience, this course has been built for an oil and gas attendee skillset and use cases. All examples and exercises will reflect common workplace problems faced by attendees.
  • Move quicker in Excel: Learning to deploy random forest can take several weeks of training in graduate data science programs, but this course makes use of your existing knowledge of Excel to accelerate the learning. You can also skip all the parts about learning to create a robust back end and just rely on Excel spreadsheets for data storage.
  • Random forests add considerable value: Random forests are widely applicable, do not require extremely large data sets, can be used for both regression and categorization, and are typically capable of yielding high accuracy results without extensive tuning. They are generally not available in Excel, but this workshop makes them so.
  • Understand every step: Rather than teach advanced data science methodologies using off-the-shelf tools that do not allow visualization of intermediate steps, this course deliberately pushes all intermediate output into Excel, making it easy to see, understand, trust, and adapt everything created in the course for deployment on the job.
Web-Based Participation

This course will be available for online participation only.

Course Content
Web-Based Participation

This course will be available for online participation only.

Software engineering is becoming a key skillset in the oil and gas industry. These skills are used to implement data science projects that increase production, lower costs, and improve safety across the industry. As a result, employment rates and salaries are significantly higher for those with industry-focused backgrounds who also have software engineering skills. Not only is it important to be able to build and utilize software tools directly, but it is also important to recognize and define opportunities to add value with software in the incumbent workflow. For this reason, those with an understanding of data science methods and the capacity to implement them have superior leadership qualifications and are increasingly likely to control teams and budgets.

Random Machine Learning Using Excel/VBA
Hear from the instructor and get a sneak peek into the course.

The random forest is a powerful and widely applicable machine learning method capable of adding tremendous value in the workplace through both regression and classification. Regression involves using previous samples to predict variables such as well performance, source rock maturity, non-productive time, drill bit fracture, clay swelling, the swab-off effect, likelihood of human accidents, commodity prices, etc. Classification involves assigning a new sample to the correct category, such as tagging reports, rating reserves, deciding between development plans, dividing work among teams, choosing from a list of tools, etc. Random forests are computationally intensive but well within the capacity of common modern computers. Even without any customization at all, a blank slate random forest is easily capable of yielding accuracies in the top 20% of some of the most competitive global machine learning competitions from as recently as 2012. Random forests can also derive strong accuracy without overfitting on small datasets, of which there are many in the oil and gas industry. Using random forests, industry practitioners will be able to drive more accurate data informed decision making across their firms.

There are many benefits to the industry in working with Microsoft Excel. While machine learning efforts on large datasets are often undertaken using computer languages such as Python and R, most of the industry’s data is already stored and manipulated in Microsoft Excel spreadsheets. Today’s powerful processors in standard work computers allow even computationally intensive machine learning methods to be implemented in Excel within tolerable timeframes. Practitioners used to working in Excel can also avoid the sharp learning curve that new computer languages impose and can avoid the security limitations of using new software in the firm. Finally, Excel allows the user to visualize all the data stored in even the intermediate steps of machine learning implementation, increasing transparency and credibility of results. Practitioners can demonstrate exactly how they have arrived at their recommendations from a random forest in Excel.

Despite Excel’s status as the most popular data storage and data manipulation tool in the industry, many practitioners have not been taught how to use the built-in scripting language Visual Basic (VBA) to develop software tools in Excel. In addition, despite Excel’s capabilities, many of the more powerful data science methods have yet to be adapted to Excel. To boost the value of oil and gas practitioners in the industry, this course teaches software development in VBA for machine learning and the creation of a random forest tool in Excel.

Module 1: Practical Software Development Using Visual Basic for Data Science in Excel

This module is an in-depth exploration of coding and data science techniques available in the Visual Basic (VBA) scripting language in Microsoft Excel. The course will teach attendees to build tools which can programmatically analyze and manipulate dynamic data samples. Attendees will learn to perform tasks such as finding and correcting errors in data, conditionally sorting data into categories, automating calculations and formatting for single-button report generation, and even implementation of more advanced data science techniques.

Beginning with an overview of the contemporary utility of Excel, the module will then progress to cover the additional value VBA adds to Excel and how it is set up and used in conjunction with Excel. The next section touches on best practices in setting up data in Excel for use of data science to extract value. This section also covers the challenges presented by conducting data science projects in Excel and how VBA can help to solve them. From there, the module moves to a deep dive into VBA, covering how to reference the Excel worksheet, how to arrange software for maximal effectiveness and for ease of debugging, and the specific scripting techniques available in VBA with their variety of uses. The module includes regular individual and group exercises as well as introductions of valuable free online tools to help troubleshoot implementations of VBA in Excel. This module is designed for attendees who are already proficient Excel users, but the material is appropriate for attendees who are new to coding in general or new to VBA in particular. Proficient Excel users are those who are familiar with creating, saving, and accessing Excel files; with manipulating rows and columns; as well as with using formulas in cells and across tabs. Examples throughout the module are taken from the oil and gas industry.

Module 2: Random Forest Machine Learning Implementation in Visual Basic for Excel

Module 2 walks attendees through using the techniques from Module 1 in VBA and Excel to create a random forest tool, which is a powerful and flexible machine learning algorithm typically unavailable in Excel. Random forests are capable of working even with smaller data sets to achieve accurate results without overfitting the data.

The module begins with a brief history of machine learning and the general reasons behind its relevance and utility today. After comparative advantages and drawbacks of different machine learning approaches are explored, the module moves to a deep dive into random forest. This technique is covered thoroughly in interactive sections to develop an intuitive understanding of the appropriate use cases, the overall purpose of the method, and each of the component parts. Following a brief section on practical considerations for random forests in Excel, the module then covers the pseudo code for each component in visual basic. Participants engage in activities to create and connect these components from their pseudocode, and results are compared to a master random forest file shared with each attendee. The module then progresses through the entire back end of the tool with intermediary outputs into Excel to help build visual understanding. After another round of exercises putting the tool to use on data sets, the module concludes with a recap of the tool, resources for continued development, and suggested projects for using VBA in Excel as well as the random forest tool in the workplace. Examples and datasets used throughout are taken from the oil and gas industry.

The modules are recommended to be taken together in series, but each can be taken independently depending on the goals of the attendee. The first module covers general methods for scripting in VBA, so it is still useful on its own to attendees planning on building small scale software to manipulate and learn from data in Excel even if they have no other interest in random forests or in practicing larger projects in a course setting. Module 2 involves putting together the various building blocks covered in Module 1, however, so it is still good practice even for those uninterested in random forests. Likewise, the examples and context in Module 1 will prepare attendees for specific challenges presented by the random forest in Module 2. For this reason, it is recommended that even attendees already familiar with scripting in VBA take both modules.

Module 1: Practical Software Development Using Visual Basic for Data Science in Excel

  1. Microsoft Excel refresher
    1. What is Excel used for
    2. What is Excel best at
  2. Intro to Visual Basic (VBA) for Excel
    1. When to write scripts instead of using formulas
    2. How to set up and run VBA
  3. Data science in Excel
    1. Data engineering
    2. Data clean and prep
    3. Methods for insight extraction from data
  4. How to implement data science projects in VBA
    1. Worksheet as backend
    2. Functions and subfunctions
    3. Data types
    4. Loops and conditional statements
    5. Debugging best practices
    6. DIY learning tools
  5. Curated practice projects

Module 2: Random Forest Machine Learning Implementation in Visual Basic for Excel

  1. Machine learning introduction
    1. History and contemporary relevance of machine learning
    2. Case studies for effective use of machine learning in the industry
    3. Implementation methods and their drawbacks
  2. Random forest deep dive
    1. What are random forests used for in machine learning
    2. How random forests work
    3. Example use cases and results
  3. How to set up a random forest in Excel using VBA
    1. Practicalities of random forest in Excel
    2. Intuition and pseudocode
    3. Random forest implementation walkthrough
  4. Curated practice projects
  5. Conclusion
    1. Recap of the tool you now have and how you can use it
    2. Resources for continued development
    3. Suggested projects using VBA and using random forests
Expires on
15 April, 2020
Professional Fee
Expires on
15 April, 2020
Displaced Professionals Fee
Expires on
15 April, 2020
Student Fee
30 People
Web-Based Participation

This course will be available for online participation only.

  • Course Materials
Online Meeting - WebEx
Web-Based Participation

This course will be available for online participation only.


Desktop /Portals/0/PackFlashItemImages/WebReady/walker-alec.jpg?width=75&quality=90&encoder=freeimage&progressive=true 52655
Events Coordinator +1 918 560-9431
Desktop /Portals/0/PackFlashItemImages/WebReady/mahan-amy2.jpg?width=75&quality=90&encoder=freeimage&progressive=true 29 Amy Mahan
Director, Innovation and Emerging Science and Technology +1 918 560 2604
Desktop /Portals/0/PackFlashItemImages/WebReady/nash-susan.jpg?width=75&quality=90&encoder=freeimage&progressive=true 28 Susan Nash, Ph.D.

What Can I Do?

Add Item

Enter Notes:
* You must be logged in to name and customize your collection.
Recommend Recommend
Printable Version Printable Version Email to a friend Email to a friend

See Also: Events Blog

Events Blog Excel-Based Random Forest Machine Learning Algorithms: Programming and Application Back by Popular Demand – AAPG’s Online Machine Learning Short Course Back by Popular Demand – AAPG’s Online Machine Learning Short Course Desktop /Portals/0/PackFlashItemImages/WebReady/Excel-Based-Machine-Learning_Hero-2-New-Date.jpg?width=100&h=100&mode=crop&anchor=middlecenter&quality=75amp;encoder=freeimage&progressive=true 56878

See Also: Explorer Article

Explorer Article Big Data, Deep Learning Workshop Coming in May Big Data, Deep Learning Workshop Coming in May Desktop /Portals/0/PackFlashItemImages/WebReady/big-data-deep-learning-workshop-coming-in-may-hero.jpg?width=100&h=100&mode=crop&anchor=middlecenter&quality=75amp;encoder=freeimage&progressive=true 39057

See Also: Explorer Emphasis

Explorer Emphasis Article Machine Learning Changes the Job Requirements for Geoscientists Machine Learning Changes the Job Requirements for Geoscientists Desktop /Portals/0/PackFlashItemImages/WebReady/Machine-Learning-Changes-the-Job-Requirements-for-Geoscientists-hero.jpg?width=100&h=100&mode=crop&anchor=middlecenter&quality=75amp;encoder=freeimage&progressive=true 55292

See Also: Explorer Geophysical Corner

Explorer Geophysical Corner The Importance of Parameterization in Machine Learning The Importance of Parameterization in Machine Learning Desktop /Portals/0/PackFlashItemImages/WebReady/Importance-of-Parameterization-in-Machine-Learning-hero.jpg?width=100&h=100&mode=crop&anchor=middlecenter&quality=75amp;encoder=freeimage&progressive=true 63603

AAPG Non-endorsement Policy

The American Association of Petroleum Geologists (AAPG) does not endorse or recommend any products and services that may be cited, used or discussed in AAPG publications or in presentations at events associated with AAPG.