Howdy, everyone. It's my pleasure to be able to talk to you today about
subsurface data analytics and machine learning. Now, I get to do this as part
of the AAPG/AAPG Foundation 2020 Distinguished Lecture series, and I'm grateful
to the AAPG and the AAPG Foundation for supporting it. I'm Michael Pyrcz. I'm an
associate professor
at the University of Texas at Austin.

Now, what's my motivation? What are my goals in giving this talk? Well, it's
not just fame and fortune. I am actually interested in how we build new digital
competencies for geoscientists. I want to help people be ready for the digital
transformation.

I want to also demystify data analytics and machine learning. In other
words, what I want to do is provide you with an anti-baffling defense, because
I see a lot of people out there, in our industry specifically, getting baffled
by the new technologies. I want to share the benefits and limitations of this
technology with you, so you can be an informed consumer, so you can understand
what it can do and what it cannot do.

And I also want to provide some useful ideas and concepts. In other words, I
want this talk to be a call to action, to invite you to try something out, to
try some data analytics and machine learning. And you'll see, even later in the
lecture, I'll go through some examples that I've provided to you on GitHub, so
you can actually follow along. So this is going to be highly interactive.

How are we going to accomplish this? Well, this is what I'll cover. I'll go
through and introduce myself. You'll see, it's a little bit of shameless
bragging. I promise, I'll keep that to a minimum.

But it does kind of set up my perspective and kind of get us started. Then
I'll talk about the prerequisites. I'll treat you like you're in one of my
courses. And we'll go through the fundamental concepts, terminology, so we can
communicate and move forward from there.

Then I'll talk about energy data analytics and machine learning. And I'll
make statements around how we're unique, how we're different, and how we should
see and use this technology. Then I'll provide a whole set of examples in data
analytics and machine learning, and even some of them, you'll be able to work
on and follow along with me.

Then I'll go into a bit of philosophy. I'll get on my soapbox. I'll talk
about data analytics and machine learning best practice. I'll end up with a
little bit of euphoria-- a little bit excited about all of the great
opportunities in data analytics and machine learning, by showing you more
advanced research, mainly from my research group at the University of Texas at
Austin.

Let me start with a couple of slides about myself. First and foremost, let's
talk about the most important topic, that's my name, Pyrcz. And it's pronounced
just perch, just like the fish. If you say it like that, you're saying it just
as well as my family that's been in Canada for about 100 years. I've run into
Ukrainians who tell me, we don't actually pronounce it properly. Apparently, we
lost that along with the language.

Now, the other thing is, I do have practical experience. I am a professor
right now, but it's only been for a short time. I have grown up within the
industry, conducting projects in this topic, data analytics, geostatistics,
statistical methodologies. I have worked on a wide range of consulting and
teaching, and even at one point, leading an R and D-- research and development
team, developing and deploying technologies in this area. So I know something
about using this technology in our industry.

Another thing about myself is availability. I have an open door policy. Now,
my wife has cautioned me about this, because she does notice I come home many
days at 8 o'clock at night from campus, because I'm always working with
students, always engaged. And it does force me to get some of my work done at
night and on the weekends.

But I do think it's great to have an open door policy. You can drop by my
office, drop me a line. Recently, actually just a month or so ago, a vice
president from Noble Energy called me up and said, you know, Michael, you're
one of my favorite professors. You actually pick up your phone. You're
available here. And try it, you'll find, I actually pick up my phone. I'm happy
to help out.

Background. I left the industry just a couple of years ago, really motivated
by the idea of giving back, being involved in education. I love teaching. I
love sharing my knowledge. And I thought over those years, I learned something
important, and I could share with the next generation.

I'm working quite a bit to help basically change and modernize some of the
curriculum that we're teaching at undergrad and graduate levels at the
University of Texas at Austin. I'm on a bit of a mission here. But at the same
time, I still teach and consult and work in industry as much as I can; I think I
did 20 separate teaching engagements in industry last year alone. I love industry.
I'm very comfortable in industry.

I'm very active in outreach and social media and professional organizations
like the AAPG. I'm working very hard to support geoscientists and engineers in
our field with the digital transformation. I think a professor's role is one
of service to society, and part of my role is to serve and help our industry
and our people, our experts, in gaining new skills and working in the digital
transformation.

So I've recorded all of my university lectures, every single one of them,
and I put them on my YouTube channel. And every single example workflow is put
up on GitHub. Anyone, anywhere in the world, can follow along with my courses.
And I believe in that. I think it should be open.

Now, what's funny is, I visited Chevron recently. And a manager at Chevron
made a joke-- said, Michael, you're more famous around here since you left,
because apparently, there's many people around the company using my materials.
And I welcome you to use them. I hope you find value in them.

I've been at the university just a couple of years now. And I've already built,
I think, a really great, exciting team. I've seen a germination of something
great, something start-- a wave, in a way. And so I already have 12 PhD
students. I've had to stop letting new students in, because it's just too much.
And I'm trying to keep up with it. I have a brand new consortium that just kicked
off.

I also have a Freshman Research Initiative and Ventures Program-- the only
fully funded one from industry, funded by ConocoPhillips-- in the College of
Natural Sciences, working on energy analytics. And so there's a whole movement
going on right now. I think we're changing things, and we're getting
geoscientists and engineers in our field ready to work with data analytics and
machine learning, digital technologies.

So let's talk about some prerequisites. So I'm going to go into professor
mode now. We're going to cover the basics of data analytics and machine
learning. I want to cover those fundamental concepts and terms, so that you're
able to follow along with the rest of the discussion. But you're going to see,
there's a lot of nuggets nested in there too that are going to help you
understand the overall technology and what it can do.

First of all, everybody is talking about big data. Now, the question people
have is, do I have big data? Now, if you look up big data online, you're going
to find out that the criteria for big data are a set of these, the v's--
volume, velocity, variety, variability, veracity. And if you have enough of
these v's, you can claim that you have big data.

So let's talk about a couple of them. Volume. Volume would mean that you
have a large number of samples. It's difficult to handle. It's difficult to
visualize. Now, I've been to tech meet-ups where people will say, hey, if it
fits on your laptop, it's not big data. It's not big enough. I don't really
have that feeling about it. I'd say, if it's difficult to handle and visualize,
that's big data.

Velocity. This is the idea that the rate of collection of the data is high,
continuous relative to the decision-making cycle. Now, many people in a tech
meeting will say it has to be real-time data. What I'd suggest is that for
energy we have velocity, too, given the complexity of our workflows and our
decisions, the rate at which we gather information, and the sheer quantity of
data we collect when we do things like shoot seismic and drill new wells.

Variety. Data from various sources with various types and scales. We clearly
win at that. We handle data all the way from the intergranular pore scale up to
the drainage radius of individual producing wells, and beyond that, to basin
analysis and even the detection of basins that we should be working in. We
definitely have variety.

Variability. Data acquisition changes during the project. We have that, too.
I think many of us have worked with projects where you start with
two-dimensional seismic lines, three-dimensional seismic surveys. Maybe there's
a reprocessing. Maybe there's a reshooting of the seismic to improve it. And
then you're installing ocean bottom nodes. You're doing 4D seismic. We have a
lot of variability in our projects.

And veracity. Data has various levels of accuracy. We work in the
subsurface. Every one of our data sets is uncertain. It's actually very
difficult to find any of our data that's truly hard data. It all has some level
of softness.

And many times, we work with core data, samples and measurements we get
directly from cores. That does have a pretty high level of precision. But
you can still wonder about the issues around recovery of the core from the
subsurface. And then we work with seismic, where we have to do very
complicated, physics-based inversions. There's a lot of uncertainty.

So what do I say from all this about big data? Every time I go and meet with
a tech company, I proudly exclaim, energy has had big data since long before
tech companies even learned what big data was. We know something about big data.

Statistics. Well, if you go back and you look at the fundamental definition
of statistics, it's just about collecting, organizing, and interpreting data.
You draw conclusions from it, but everything comes down to making a decision.
If you don't impact the decision, you don't add value. And so statistics is all
about supporting decision-making.

Geostatistics is a branch of applied statistics. It was developed back in the
1950s, focused on the practical needs of subsurface estimation, first in gold
mining and later in oil and gas.

Now, the great thing about it is, because it was based on the practice, it's
actually quite intuitive. The theory was just added on later. So we have the
math. But really, it's applied statistics, where we have a spatial context, a
geologic context, spatial relationships, volumetric support, and uncertainty,
which is always there for us. So it's a subset of statistics. And we can draw a
Venn diagram like I show over there, where you can see geostatistics as being a
subset of statistics.

Now, data analytics, if you look it up online and you try to get a
definition of it, you'll find it's all about analysis of data to support
decision-making. Now, often, people will talk about business decision-making.
They'll put a business slant on it. But what's very interesting, if you look at
that definition, I have trouble distinguishing that from statistics. In fact, I
would say, data analytics really is the use of statistics and visualization.

Now, big data analytics is the process of examining large and varied data
sets, the big data we just talked about, to discover patterns and make
decisions. There shouldn't be any surprises there. And spatial big data analytics is
the expert use of spatial statistics, geostatistics on big data to support
decision-making.

Now, what can we say about all this? Well, given the fact that data
analytics is really the use of statistics and spatial data analytics is the use
of geostatistics with visualization to support decision-making, I would say, go
back home and update your CV, because you work in data analytics if you
understand geostatistics and statistics and you use it in your job.

Machine learning. Well, I did what everybody else does. I looked up machine
learning on Google, and the first page was Wikipedia. So I went ahead and copy
and pasted from Wikipedia, and this is what I got. Let's break it down and do a
little bit of an analysis of what it means.

Well, machine learning is a study of algorithms and mathematical models.
Now, you notice that's plural. So what it's telling us is that it's a tool kit.
It's not one method. It's many methods that computer systems use to
progressively improve their performance on a specific task.

It's learning; it's improving its performance. Machine learning algorithms
build a mathematical model of sample data, known as training data. It's learning
from data. It's training with data in order to make predictions or decisions
without being explicitly programmed to perform the task.

It's general. It can be applied on a wide range of problems. It doesn't need
to be programmed how to solve that exact problem, but it can be general and
applied to many different problems.

Now, many people stop there. I read to the end of the article. Near the end
of the article, you'll find the following phrase, where it is infeasible to
develop an algorithm of specific instructions for performing the task. What
does that mean? It means this. If you understand the physics, if you understand
the geological concepts, use your knowledge. Don't let a machine decide for
you.

It's not a panacea. Machine learning is not supposed to be just used on
every problem. It's really best for those problems for which the data is too
big, the problem is too complicated, or we don't understand the physics. But
you'll see, we'll talk a little bit more about how we can put it all together.

Machine learning nuts and bolts. What does a machine learning model look
like? Well, that's it right there. A machine learning model is simply going to
be a function where we take a set of x's and we get a y. Now, let me just
define each one of those components.

The x's, x1 through xm, they're the predictor features. Now, before it was
all cool and trendy to talk like that, we would have said, those are the
independent variables. But in machine learning, we'll call them predictors for
the inputs and features instead of variables.

Now, what we're predicting on the other side of the equation, the y, is the
response feature. And there might be more than one, but it'll be a response
feature or features. And that, back in the good old days, was just the
dependent variable. And so that's the output from the model. So when you look
at it, machine learning is all about estimating a mathematical model f for the
two purposes, inference or prediction. Let's talk about inference first.

Machine learning inference. What is the relationship between each one of the
predictor features? That's important in itself. That's understanding the
system, making sense of the relationships.

Are they positive or negative relationships? As one goes up, does the other
feature go down? What's the shape of the relationship? Are there sweet spots?
Are there specific locations where you get certain concentrations of samples?

Maybe there are other combinations of features that don't happen because of
physical constraints. And understanding all of the complicated relationships
between each one of the predictor features is very powerful. That's about
understanding the system.
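One simple way to start on the inference question of whether relationships are positive or negative is a correlation matrix. The sketch below is a minimal illustration under stated assumptions: the porosity, log-permeability, and depth values are hypothetical synthetic data generated with NumPy, with a positive and a negative trend built in. Keep in mind that the Pearson correlation coefficient only measures linear association, so it says nothing about the shape of a relationship, sweet spots, or feature combinations ruled out by physical constraints.

```python
import numpy as np

# Hypothetical synthetic data (not from the talk): 200 samples with built-in trends.
rng = np.random.default_rng(seed=42)
porosity = rng.uniform(0.05, 0.30, size=200)
log_perm = 10.0 * porosity + rng.normal(0.0, 0.3, size=200)           # positive relationship
depth = 3000.0 - 2000.0 * porosity + rng.normal(0.0, 50.0, size=200)  # negative relationship

# Pearson correlation matrix; rows and columns are porosity, log_perm, depth.
corr = np.corrcoef(np.vstack([porosity, log_perm, depth]))
print(np.round(corr, 2))
```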

Now, if you want to understand a little bit deeper about what inference is,
think about inferential statistics. I'll give you a very simple example. If I
give you three heads and seven tails, and I tell you this is the coin that did
that, tell me what the probability is that that coin is a fair coin, 50/50
chance of having heads and tails.

That's inference, given the sample, describe the population. Given the
result of the coin tosses, tell me about the coin. That's inferential
statistics, which actually is very difficult to do. It is a bit more
complicated.
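The coin example can be made concrete with a few lines of Python. This is a hedged sketch using a frequentist two-sided binomial test, which answers a slightly different question than the one posed: not the probability that the coin is fair, but how surprising three heads in ten tosses would be if it were fair. (A Bayesian posterior would be another way to approach it.) The function name `binomial_two_sided_p` is hypothetical, and only the standard library is used.

```python
from math import comb


def binomial_two_sided_p(heads: int, n: int) -> float:
    """Two-sided p-value for observing `heads` in `n` tosses of a fair coin.

    Sums the probabilities of every outcome that is at least as unlikely
    as the observed one under the fair-coin assumption.
    """
    probs = [comb(n, k) / 2**n for k in range(n + 1)]
    observed = probs[heads]
    return sum(p for p in probs if p <= observed + 1e-12)


p_value = binomial_two_sided_p(heads=3, n=10)
print(p_value)  # 0.34375
```

A p-value of about 0.34 means an outcome this lopsided is not unusual for a fair coin, so ten tosses give only weak evidence about the coin.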

Prediction is actually easier. What is prediction? Well, in the case of
machine learning, what we're doing is we're estimating that function f, for the
purpose of predicting y. Our focus is on getting the most accurate estimate of
y. That's what it's all about. We want to get the best estimate of y.

Now, if you want to understand a little bit deeper once again, think about
predictive statistics. It's the case of, given a fair coin, and I tell you that
coin is fair, what's the probability of an outcome such as three heads and
seven tails? In other words, given an assumption about the population,
predict the outcome of the next sample. That's prediction.
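The prediction direction is the easier calculation: assume the population (a fair coin) and compute the probability of a specific outcome. A minimal sketch with the binomial probability mass function; `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k heads in n tosses when each toss lands heads with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


prob = binomial_pmf(k=3, n=10)  # 120 ways out of 2**10 = 1024 equally likely sequences
print(prob)  # 0.1171875
```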

So when we're doing that, building those functions, f for machine learning,
inference and prediction, we've got two different types of functions that we
can work with. We've got parametric models and non-parametric models. It's very
straightforward.

A parametric model, we make an assumption about the functional form or shape
of the model. We gain from simplicity. We have an advantage, because now we can
describe the entire system with very few parameters. And because of that, we
typically can build our models with fewer data.

Now, here's an example right here. Our function is simply a linear model,
where we just say that our y is equal to a set of coefficients, multiplied by
each one of the predictor features. Not a big deal. I show a model right
there, a simple linear model that describes the relationship between elevation
and standardized porosity, which would be equivalent to some type of
compaction trend.
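As a sketch of how few parameters a parametric model needs, the example below assumes a straight-line relationship between elevation and standardized porosity and estimates just the slope and intercept by least squares with NumPy's `polyfit`. The data are hypothetical and synthetic, generated from a known line plus noise so the recovered coefficients can be checked against the true ones; this is not the data set shown on the slide.

```python
import numpy as np

# Hypothetical synthetic data: standardized porosity rising with elevation (a compaction-like trend).
rng = np.random.default_rng(seed=7)
elevation = np.linspace(-3000.0, -1000.0, 50)  # assumed units: metres relative to a datum
true_b1, true_b0 = 0.001, 2.0
porosity_std = true_b1 * elevation + true_b0 + rng.normal(0.0, 0.1, size=elevation.size)

# Parametric model: assume y = b1 * x + b0, so only two parameters are estimated (least squares).
b1, b0 = np.polyfit(elevation, porosity_std, deg=1)  # coefficients come back highest power first
print(f"estimated b1 = {b1:.5f} (true {true_b1}), b0 = {b0:.3f} (true {true_b0})")
```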

Now, working with non-parametric models, that's our other option. In that
case, what we do is we don't make any assumption about the functional form or
shape. It's much more flexible, because we can fit any possible shape or
function.

You could imagine, if we had data like the data we're showing right here,
that if we tried to fit a linear model to that, we miss a lot of those cycles,
because we already assumed our model was linear. We don't have the
flexibility to fit that. With a non-parametric model, we have less risk that our
estimate of the function is a poor fit for the actual natural system. But there
is always a trade-off, no free lunch: typically, you need a lot more data for an
accurate estimate of that function.
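To see the flexibility difference, the sketch below fits a straight line and a non-parametric model to the same cyclic data. It assumes hypothetical synthetic data and uses k-nearest-neighbour regression as one example of a non-parametric model; the talk does not name a specific method, and `knn_regress` is a hypothetical helper. Both models are scored on the same points they were fit to, so this shows flexibility, not predictive accuracy.

```python
import numpy as np


def knn_regress(x_train, y_train, x_query, k=5):
    """Non-parametric k-nearest-neighbour regression: average the k closest training responses."""
    dist = np.abs(x_query[:, None] - x_train[None, :])  # (n_query, n_train) distances
    nearest = np.argsort(dist, axis=1)[:, :k]            # indices of the k nearest samples
    return y_train[nearest].mean(axis=1)


# Hypothetical synthetic data with cycles that a straight line cannot follow.
rng = np.random.default_rng(seed=0)
x = np.sort(rng.uniform(0.0, 10.0, size=200))
y = np.sin(2.0 * x) + 0.1 * x + rng.normal(0.0, 0.1, size=x.size)

linear_pred = np.polyval(np.polyfit(x, y, deg=1), x)  # parametric: assumes a line
knn_pred = knn_regress(x, y, x, k=5)                  # non-parametric: no assumed shape

mse_linear = np.mean((y - linear_pred) ** 2)
mse_knn = np.mean((y - knn_pred) ** 2)
print(f"linear MSE = {mse_linear:.3f}, kNN MSE = {mse_knn:.3f}")
```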

So how do we build a machine learning model? How do we do it? It's just like
this. This slide right here shows you the entire process. What we'll do is
we'll take all of our available data, and we'll split it into train and test
subsets.

Now, I know there are some experts out there saying, well, it could be much
more complicated. There could be train, validation, and test subsets and all of that.
I'm showing the simplest workflow possible.

So we take the data set. We separate it. Usually, about 80% goes in the
train. 20% or so goes in the test. There are papers written about the best
proportions for the split. And then what you'll do is you'll take the
train data, and you'll build models with a variety of different levels of
complexity.
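The random split described here can be sketched in a few lines. This is a minimal, hypothetical helper assuming NumPy arrays and a purely random 80/20 split; scikit-learn provides an equivalent, more complete function, `sklearn.model_selection.train_test_split`.

```python
import numpy as np


def train_test_split(x, y, test_fraction=0.2, seed=0):
    """Randomly withhold `test_fraction` of the samples for testing; the rest are for training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_test = int(round(test_fraction * len(x)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]


x = np.linspace(0.0, 1.0, 100)
y = 2.0 * x + 1.0
x_train, x_test, y_train, y_test = train_test_split(x, y)
print(len(x_train), len(x_test))  # 80 20
```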

So at the top, what I'm showing is a very simple model. It's a polynomial
that's just linear, a first order polynomial. And at the bottom, it's more like
a seventh order polynomial. And then we'll take the parameters of that model,
and we will set them so that we get the best fit model to the training data. So
for each level of complexity, from the top to the bottom, we're getting the
very best model to fit the data. We're minimizing the error.

And then what we do is we take those best fit models, and we check them
against the withheld testing data. That testing data was not used to get the
best fit model. And we can calculate the error over each one of those testing
data, and we'll do that for each level of complexity.

Now we'll pick the model that performs best with the data not used to train
it. What we're doing is we're picking the very best level of complexity. In
other words, we're tuning the model complexity, or, as I'll show right away,
we're tuning the hyperparameters.
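The whole workflow (split the data, fit each level of complexity to the training data, keep the complexity that does best on the withheld testing data) can be sketched with NumPy polynomial fits. This is a minimal illustration assuming hypothetical synthetic data (a sine trend plus noise) and polynomial orders 1 through 9 as the candidate hyperparameter values. The coefficients returned by `np.polyfit` are the parameters; the polynomial order is the hyperparameter.

```python
import numpy as np

# Hypothetical synthetic data: a smooth nonlinear trend plus noise.
rng = np.random.default_rng(seed=1)
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, size=x.size)

# Withhold 20% (12 of 60 samples) for testing; train on the rest.
idx = rng.permutation(x.size)
test, train = idx[:12], idx[12:]

test_mse = {}
for order in range(1, 10):                             # hyperparameter: polynomial order
    coefs = np.polyfit(x[train], y[train], deg=order)  # parameters: best fit to training data only
    pred = np.polyval(coefs, x[test])                  # predict the withheld testing data
    test_mse[order] = np.mean((y[test] - pred) ** 2)

best_order = min(test_mse, key=test_mse.get)           # tuned hyperparameter
print("test MSE by order:", {k: round(float(v), 4) for k, v in test_mse.items()})
print("selected polynomial order:", best_order)
```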

So now we have to define a couple terms, parameters and hyperparameters. The
parameters are not a big deal. The model parameters are simply, in the case
of our linear model, those coefficients, b3, b2, b1, and the constant
term. And so we will set those parameters in training such that
we minimize the error with regard to the training data. We're getting the very
best fit model.

The model hyperparameters are totally different. They're the constraint on
the model complexity. We're going to select hyperparameters that maximize
accuracy when we're testing against the testing data.

Now, for the case of our polynomial model that we're showing here in this
example, the hyperparameter is simply going to be the order of the polynomial.
Are we working with a first order, a second order, a third order? I show fifth
and seventh orders in the bottom right-hand corner. So that's our
hyperparameter. It's the degree of complexity of the model. And we tune our
hyperparameter with the testing data.

Now, when we're tuning the hyperparameters, why are we doing
that? What's going on here? What it all comes down to is a variance-bias
trade-off. We want to get the best estimates, the most accurate estimates in
testing.

Now, testing really means that we're trying to mimic the idea of using the
model in cases not used to build the model. That's real world application of
the model. Now, you could take the mathematics of expected test mean squared
error and expand it out. I won't go through the derivation here.

But when you do that, you get three additive components of error for the
real-world use of your model: model variance, model bias, and irreducible
error. Now, let me explain each of them.
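For reference, the expansion mentioned here is the standard bias-variance decomposition. Assuming a new observation y_0 = f(x_0) + ε at location x_0, with noise ε of zero mean that is independent of the training data, the expected test mean squared error splits into three additive terms; note that the bias enters squared:

$$
\mathbb{E}\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
= \operatorname{Var}\left[\hat{f}(x_0)\right]
+ \left(\mathbb{E}\left[\hat{f}(x_0)\right] - f(x_0)\right)^2
+ \operatorname{Var}(\epsilon)
$$

From left to right on the right-hand side: model variance, squared model bias, and irreducible error. The expectation is taken over both the training sets used to fit \hat{f} and the noise at x_0.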

Model variance is the error due to sensitivity to the data set. In other
words, if you were to collect slightly different data, how much would your
model change? Now, you could imagine that if you have a situation where you
have a linear regression model, a simple linear model, if you change the
training data a little bit, it might wiggle a little bit but not that much.

If I increase the complexity and I have a ninth order polynomial-- if you
change the data a little bit, that ninth order polynomial will swing around
radically. It will change quite a bit. It's more sensitive. Model variance
increases as the model complexity increases, and that's the orange line in the
bottom right.
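Model variance can be estimated directly by refitting the same model to many independently sampled training sets and measuring how much its prediction at one location moves. The sketch below assumes a hypothetical linear system with Gaussian noise and compares a first-order fit with a ninth-order fit; `prediction_variance` is a hypothetical helper. Because both calls use the same seed, they see exactly the same training sets, so the difference in spread comes from model complexity alone.

```python
import numpy as np


def prediction_variance(order, n_repeats=200, n_samples=20, x0=0.5, seed=3):
    """Estimate model variance: refit a polynomial of the given order to many
    independently sampled training sets and return the variance of its prediction at x0."""
    rng = np.random.default_rng(seed)
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        x = rng.uniform(0.0, 1.0, size=n_samples)
        y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, size=n_samples)  # hypothetical linear system plus noise
        coefs = np.polyfit(x, y, deg=order)
        preds[i] = np.polyval(coefs, x0)
    return preds.var()


var_linear = prediction_variance(order=1)
var_ninth = prediction_variance(order=9)
print(f"prediction variance at x0: linear = {var_linear:.4f}, 9th order = {var_ninth:.4f}")
```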

Now, model bias is the other side of the coin. It's the error due to the
fact that your model is too simple, an approximate model. And so if I use
a linear regression model but I have a complicated natural phenomenon, I have
high model bias, because my model is not flexible enough.

As model complexity goes up, what actually happens, as you can see with the
blue line there in the bottom right corner, model bias goes down. So we're
balancing model bias and model variance with complexity. In other words, as we
tune our hyperparameter, we're shifting along that line.

Now, irreducible error is another component, and that's error just due to
the data, the limitations of the data themselves. You could have features you
didn't sample that are central to understanding the natural system. If you
don't have that information, of course, you're going to have error in the
model.

Or there could be combinations of features you've never sampled. Maybe you
didn't have enough samples. In other words, this is just the limitation of the
data. And even if you've got the world's leading expert in machine learning
here to help you out, they can't reduce irreducible error. And so irreducible
error is just constant over all possible models you choose, over all levels of
complexity.

Now, we've got to talk about overfit, because that's what it's really all
about when we're talking about balancing variance and bias. It's about avoiding
overfit. So what is overfit? Overfit could be defined as fitting the data noise
in the model. And data idiosyncrasies now become part of the model, and that's
a problem. It's going to lead to very bad predictions with your model.

If you increase the complexity-- as you can see on the top right over
there-- if you increase the complexity, you'll generally decrease the error
with respect to training data. If you look at the example I'm showing below,
you can see where we use a very complicated model, we perfectly fit the data.
We have no error at the data locations.

Now, when we decrease the complexity of the model-- over towards the right
on the bottom-- you can see, we start to have error, but we still have a good
model. So as we increase the complexity, we will reduce the training error to
zero. But what will happen-- and we'll find this when we do validation of our
model-- we'll find that the testing error will go up.

And so you can see, the blue line-- we reach a point in model complexity
where the training error is low, but the testing error is going very high. In
other words, you're fooling yourself. You think you know more than you
actually do. You have a model that's going to perform very poorly in real-world
circumstances. That's an overfit model.
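The pattern of training error falling to zero while testing error does not can be reproduced with a small polynomial example. This sketch assumes hypothetical noisy samples of a sine function. With 8 training points, a seventh-order polynomial has 8 coefficients and can pass through every training point, so its training error is essentially zero. Its testing error cannot drop below the noise in the test data, and in runs like this it is often larger than that of a lower-order fit, though the exact numbers depend on the random seed.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
# Hypothetical noisy samples of a smooth system: 8 training points, 200 testing points.
x_train = np.linspace(0.0, 1.0, 8)
y_train = np.sin(2.0 * np.pi * x_train) + rng.normal(0.0, 0.2, size=x_train.size)
x_test = rng.uniform(0.0, 1.0, size=200)
y_test = np.sin(2.0 * np.pi * x_test) + rng.normal(0.0, 0.2, size=x_test.size)

train_mse, test_mse = {}, {}
for order in (1, 3, 7):  # order 7 has 8 coefficients: enough to pass through all 8 training points
    coefs = np.polyfit(x_train, y_train, deg=order)
    train_mse[order] = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
    test_mse[order] = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    print(f"order {order}: train MSE = {train_mse[order]:.4f}, test MSE = {test_mse[order]:.4f}")
```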

So we covered some of the prerequisites of data analytics and machine
learning. Now, let's talk about the specifics of energy data analytics and
machine learning. First of all, if no one else has done this yet, let me
welcome you to the fourth paradigm of scientific discovery. It just started.
Isn't that exciting, to actually be alive during a brand new scientific
paradigm, to actually be there and see it get started?

When did we have other paradigms? Well, you learned about this back in high
school. The empirical science approach, well, you can go way back to antiquity,
and you can see they were running experiments and learning about the natural
setting by doing that.

The theoretical science came along later. We started to develop the
equations, the analytical expressions, and discover the laws of classical
mechanics and electrodynamics, to start to understand our natural system. Now,
what we found out, back when computers started to get more powerful, was that
there are many cases for which these natural laws are not sufficient.

Complicated heterogeneous systems can't be solved using just the
analytical expressions. You have to run computer simulations. And so that's the
computational science simulation paradigm, and that would be the third
paradigm.

The fourth paradigm is the data-driven science approach, the idea of detecting
patterns and anomalies in big data sets. In modern society, we're surrounded by
data, and where we lack physical explanations, we're now using data-driven
science to try to understand it. Artificial intelligence is really starting to take off.
And so that's the fourth paradigm, and we're there now.

Now, what does that mean for society? Well, if you look at all sectors of
our economy, everybody is facing a digital transformation. Deloitte did a
recent study. Just last year, they looked at the preparedness of different
sectors of our economy.

And when they looked at oil and gas, they put us in the middle. We're not at
the lead. We're not trailing behind. We're somewhere in the middle of the
overall readiness ranking. They could see that we're making efforts.

Now, the good thing is we're not alone. Everybody, in all sectors of our
economy, is rushing right now to try to find new ways to add value with
digital technologies and to add capabilities to their teams around
digitalization. We're all doing it together.

Now, I have some biases, though. I should be honest about that. I get asked
to speak a bit about the topic. And PricewaterhouseCoopers, just last summer,
had me come and sit on a panel. You can see, I kind of stand out. I have the
long hair.

And they asked me to stand up there in front of a bunch of energy executives
and talk about what's going on with energy digitalization. And this is what I
tell them. I think I disappoint and surprise people when I say these things. I
say, there are opportunities to do more with our data. I think that's the
low case: we'll just do more with our data, and we'll treat our
data better.

There are opportunities to teach data analytics, statistics, and machine
learning methods to engineers and geoscientists to improve their capabilities.
And I mean the students, and I also mean the working professionals. I think
we'd all be better off if we understood these methods better.

And geoscience and engineering knowledge and expertise remains core to our
business. I don't think we should all be replaced by data scientists. And I'll
make a couple more comments around that. I think it's necessary to retain that
strong level of geoscience and engineering knowledge.

Now, what am I saying when I make that strong statement? I'm saying that
just because we discovered the paradigm of computational science and
simulation, it didn't mean we abandoned theoretical science and empirical
science. We actually add new scientific paradigms to our toolkit. We don't
replace the older paradigms.

When we have the analytical and theoretical expressions, we're going to use
them. When we can solve the system by first principles, we'll do it. When we
have to work just based on observations and trying to sample the problem, we're
going to do that, too. And when we just have data available to us and we lack
physical explanations, we'll use the data-driven approaches, too. They all work
together, and they can augment and support each other.

But what's crazy about this, in this data-driven science world that we live
in right now, it needs data. And in fact, back in the good old days when I was
building subsurface models at Chevron, we knew that 80%, and sometimes 90%,
of our effort was data preparation: getting the data ready for the model, data
interpretation, and so forth.

We continue to face challenges with data, data curation, the large volumes
of data, the large volumes of metadata (we have a lot of metadata, the data
about the data), the variety of data scales, collection methods, and
interpretations, and the transmission controls and security of our data. They still challenge us.

In other words, clean databases are a prerequisite to all of our data
analytics and machine learning, and we've got to focus there. We have to work
there. That's our foundation for everything we do with our data. And you
remember the old adage of garbage in, garbage out? It still stands, even in
this modern fourth paradigm.

Now, I also do believe that energy is unique. I would argue that many of the
tools and technologies for data analytics and machine learning are not quite
ready for what we do, because we are so unique in what we do. We need unique
solutions. Why is that? We have sparse, uncertain data, complicated,
heterogeneous, open-earth systems.

Compare us to Google or Amazon: they see every click. They have exhaustive
data sets. The people working with satellite images, they see every pixel. We
sample one trillionth of the subsurface, and even that sample relies a lot on
interpretation. And then we have to interpret all kinds of physics-based
inversions that go on between those samples to try to understand what's going
on.

Because of that, we need a high degree of geoscience and engineering
interpretation and physics. And our decisions are extremely expensive. They are
very, very high value decisions. I remember, just a few years ago, drilling a
single well in the Gulf of Mexico. We're talking about hundreds of millions of
dollars for a single well in the deepwater Gulf of Mexico, specifically if you
add a production test onto it.

Now, let's compare and contrast that with the very common use of artificial
intelligence. I don't know how many of you are using Spotify, but I know many
people shop at Amazon. If you've done that, you've encountered what's known as
a recommender system. What it does is it tries to look at your behaviors and
tries to suggest what you want next.

Maybe this is too much information, but this is my Spotify recommender
system back from the summer of 2019. What it does is look at what I listen to
at work and try to figure out what I want to listen to next.
Now, I'm going to tell you something. I'm Canadian. I do listen to a lot of
Canadian music. And because of that, every once in a while, it recommends
Nickelback. I want to assure you that not a single Canadian likes Nickelback
any longer.

Now, what happens when it recommends Nickelback to me? Well, it starts
playing. And you know Nickelback. We all liked it before. Your head starts to
bob. Your foot starts to tap. You think, hey, this is pretty good hard rock.
And then you remember, this is Nickelback. You fast forward it. You kind of
say, darn, I did that again, and you move on.

What was the consequence? Well, clearly, Spotify got it completely wrong.
It's like drilling in the completely wrong location. What was the cost of that
mistake? There's actually no cost at all. I didn't cancel my Spotify account.
Nobody will. We just move on.

They work in the space of very low value decisions. It would make no sense
to have human interaction in those decisions. So we have to recognize energy is
quite different. We have to be critical users, consumers, and developers of
this technology. It was developed for very different applications.

Don't jump to complexity. Now, you remember, I showed you this idea of the
variance-bias trade-off. Let's look at that equation again. I already
explained model variance, model bias, and irreducible error.

Now, I did mention that they're all additive, which means the testing error,
in other words the error in real-world use of your model, is the sum of those
three curves. So that red line over there is the sum of irreducible error,
model bias, and model variance.
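The equation from the slide isn't reproduced in this transcript. For reference, the standard textbook form for the expected squared error of a prediction at a single point x_0, with noise variance sigma squared, is below; strictly speaking, it is the square of the model bias that adds in.

```latex
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \underbrace{\operatorname{Var}\left(\hat{f}(x_0)\right)}_{\text{model variance}}
  + \underbrace{\left[\operatorname{Bias}\left(\hat{f}(x_0)\right)\right]^2}_{\text{model bias}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```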

Now, what do you see? You'll notice that the best performance, the lowest
error in real world use is often not the most complicated model. In fact, model
variance usually eats your lunch. It really is a problem. Many of the advanced
methods of machine learning are all about trying to defeat model variance.

And so by using a simpler model, we have lower model variance, which is very
powerful. What else do we gain? We gain a high degree of interpretability. We
understand the model, because it's a simpler model. And so don't jump to
complexity.
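To see "don't jump to complexity" in numbers, here is a minimal sketch on synthetic data. The assumptions are a straight-line truth plus noise, 20 training samples, and polynomial models of increasing degree. The test error of the simplest model should beat that of the most flexible one, because model variance grows with complexity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)

def make_data(n):
    # Hypothetical truth: a straight line plus noise (the irreducible error)
    x = rng.uniform(0.0, 1.0, size=(n, 1))
    y = 2.0 * x[:, 0] + rng.normal(0.0, 0.5, size=n)
    return x, y

x_train, y_train = make_data(20)    # sparse training data
x_test, y_test = make_data(1000)    # stand-in for "real world use"

test_mse = {}
for degree in range(1, 13):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    test_mse[degree] = mean_squared_error(y_test, model.predict(x_test))

best_degree = min(test_mse, key=test_mse.get)
print("lowest test error at degree:", best_degree)
print(f"test MSE, degree 1: {test_mse[1]:.3f}, degree 12: {test_mse[12]:.3f}")
```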

Interpretability is critical. When you develop methods and workflows, it's
very important to have diagnostics to understand what the model's doing. Yet
the interpretability of complicated machine learning models is, in fact, very
low. Sometimes, it's just absent. And still, application of a machine may
become routine and trusted.

Now, here's a problem. When a workflow or method becomes routine-- it
becomes the preferred workflow-- what happens is you'll find that if you don't
use that workflow in an organization, it becomes kind of a red flag. You have a
lot of explaining to do. The machine becomes trusted. And when you don't
understand it, you can't interpret it, it becomes an unquestioned authority.
That's very dangerous.

Now, there's a really interesting study by Ribeiro and others, back in 2016.
They took a bunch of pictures, about 20 pictures of wolves and dogs,
standardized them, and put them through a logistic regression classifier. The
machine's output was the probability of dog and the probability of wolf.

And so what happened when they did this was they put this image into it. And
what they found was that it came back with a high probability of wolf, like
90-something percent wolf. And so they looked at it. And if you're a dog
person, you recognize immediately, that's not a wolf. That's a husky.

And so they went back to the machine, and they said, tell us which pixels in
this image gave it a high probability of wolf. And you can see in the image on
the right that those pixels are, in fact, the snow in the background. The
reason this was classified as wolf was because it was standing in snow. All
those pictures of wolves, they're always up in Canada, far in the North, in
those scary, dark places, that's why.

And so a bias has been put into the system. And if you could not interpret
it, you would not understand the machine was getting it completely wrong. Now,
this example was shared by Peter Haas in his TED Talk, which I really
appreciated. Go ahead and check that out; it's a great talk about why he was
afraid of AI approaches.

And what he says is that even the developers who work on this stuff have no
idea what it's doing. And sometimes, I believe that's true of some of the more
complicated methods. What he also says is that these systems do not fail
gracefully. When they get it wrong, they get it completely wrong.
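The diagnostic in the study asked which inputs drove the prediction. A much simpler stand-in shows the same idea on tabular data. This sketch is not the method Ribeiro and others used. It uses permutation importance, with two hypothetical features: a weak "shape_score" signal, and a "snow_fraction" that, in this deliberately biased synthetic training set, almost always comes with wolves. Shuffling a feature and measuring the accuracy drop reveals that the classifier leans on the snow.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
is_wolf = rng.integers(0, 2, size=n)   # 1 = wolf, 0 = husky

# Hypothetical features: a weak genuine signal and a spurious background signal
shape_score = 0.3 * is_wolf + rng.normal(0.0, 1.0, size=n)
snow_fraction = 0.9 * is_wolf + rng.normal(0.0, 0.1, size=n)
X = np.column_stack([shape_score, snow_fraction])

clf = LogisticRegression().fit(X, is_wolf)

# Shuffle one feature at a time and record how much the accuracy drops
result = permutation_importance(clf, X, is_wolf, n_repeats=20, random_state=0)
for name, imp in zip(["shape_score", "snow_fraction"], result.importances_mean):
    print(f"{name:>14s}: accuracy drop when shuffled = {imp:.3f}")
```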

Now, it's also important to talk about meeting technology expectations. The
Gartner Technology Hype Cycle is really important to look at. This is a
well-known plot that shows us time on the x-axis and expectations on the
y-axis. We go from the innovation trigger, the discovery of a technology,
through a peak of inflated expectations, down into a trough of
disillusionment, and then up a slope to a plateau of productivity.

When I go to different companies (I mentioned before, I think I visited 20
companies or so last year), I like to show this chart and ask, where are you
right now? What I find is that the answer depends on the company. And often,
it actually depends on the group within the company or the individual within
the company.

But I'll tell you what, nobody has told me yet that we're at the plateau. In
fact, most people suggest that we're somewhere between the innovation trigger
and the peak of inflated expectations, if not starting to come down the other
side a little bit. Globally, the expectations are very high for this
technology, and we need to manage that.

So how are we going to meet these expectations? How are we going to harness
this technology? Well, we need operational capability, which, inside of a
company, is just fancy speak for saying that we need the skill sets among our
people to be able to use these technologies. And I agree, we need data
scientists.

The Venn diagram for data scientists is shown here on the right. It's the
idea of combining domain expertise (understanding the geoscience and
engineering), statistics (probability, data analytics, and so forth), and
coding (the ability to put together workflows, automating, scripting, and so
forth). That magic individual who can do all three of those is considered a
data scientist.

And I'll tell you what, when I go to companies right now, I'm running into
data scientists all the time. They're being hired into our companies. What I
find, in general, is that often their domain expertise may be a bit low. And so
what I've seen a lot of companies do is partner them up with experts who have
good domain expertise.

Now, this is what I think. I took that Venn diagram, and I wrecked it. It's
no longer a Venn diagram, because I made some adjustments to it. And I said,
well, let me change the size of the circles to represent where I think we
should focus and let me make comments about how we can grow capabilities with
our geoscientists and engineers, because I think that's a great idea, too.

In our graduate and undergraduate education at the University of Texas at
Austin, I've worked to revamp our program and to put in brand new courses that
teach the concepts of data analytics and machine learning, specifically for
geoscientists and engineers. I think that's a great idea. We continue to
develop the subsurface geoscience and engineering expertise. That's critical.

But we also teach them the statistics. We enhance their understanding of
statistics and the practice of data analytics. And we encourage them to code,
and we get them coding, which is essential, so that they're able to use the
very best tools and build workflows that add value.

So I think that's what we need to do with our students, and that's what I'm
doing right now. But we can also build capability among our existing
geoscience and engineering workforce. That's why I put every one of my
lectures and all of my workflows online: to support this effort to build that
capability. I've seen great companies working to build those capabilities, and
I have very much enjoyed being part of teaching those individuals.

Let me talk about some energy data analytics and machine learning examples. I
want to show you a bunch of them. Now, what are some of the things we can do
with data analytics? One of the great things we can do is work on the problem
of feature selection.

It turns out that throwing every possible feature, plus the kitchen sink,
into your model is not a good idea. You'll have a very weak model. It can have
very low interpretability. It's way too complicated. And so it's better to
choose the best features to work with, the ones that communicate the most
information.

So we use data analytics to help pick the very best features to work with.
We improve interpretability, we reduce model variance, and we get improved
model accuracy while doing it. This matrix scatter plot right here shows the
wide variety of variables, or features, we'd be working with in an
unconventional setting, and we may want to go through those and figure out
which ones are most indicative or predictive of the production of individual
wells.

And so that's what we've done right here. We used data analytics. We said,
we can do a correlation analysis. We can look at things such as rank
correlation coefficients, which are robust in the presence of outliers and of
any non-linearity that behaves monotonically.
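As a small illustration of that robustness, here is a sketch with hypothetical numbers. A rank (Spearman) correlation is just the ordinary Pearson correlation computed on the ranks. The first case pairs it with a monotonic but non-linear response. The second case adds one bad data point.

```python
import numpy as np
from scipy.stats import rankdata

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    # Rank correlation: Pearson correlation of the ranks
    return pearson(rankdata(x), rankdata(y))

porosity = np.linspace(0.05, 0.25, 30)
production = np.exp(20.0 * porosity)     # hypothetical monotonic, non-linear response
production_outlier = production.copy()
production_outlier[0] = 1.0e4            # one bad data point at the lowest porosity

r_clean, rho_clean = pearson(porosity, production), spearman(porosity, production)
r_out, rho_out = pearson(porosity, production_outlier), spearman(porosity, production_outlier)
print(f"monotonic:    Pearson={r_clean:.3f}  Spearman={rho_clean:.3f}")
print(f"with outlier: Pearson={r_out:.3f}  Spearman={rho_out:.3f}")
```

The rank correlation reports a perfect monotonic relationship in the clean case and degrades only moderately with the outlier. The Pearson coefficient is pulled around by the single extreme value.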

We can use measures such as partial correlation coefficients-- which I think
are way underused-- which are able to actually isolate the influence of
individual predictor features on the response. In other words, understand how
porosity alone impacts production, very powerful stuff. We can also use
model-based importance measures, which are really great. Not only do machine
learning models give you great methodologies to make predictions, but they
actually can help us rank our features and understand which ones are most
impactful.
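Here is a minimal sketch of both ideas on synthetic, standardized data. In this made-up data set, brittleness is correlated with porosity, but production depends only on porosity. A partial correlation removes the linear influence of porosity from both variables before correlating them, which exposes brittleness as a proxy. A random forest's impurity-based feature importance serves as one example of a model-based measure. The residual-regression route to partial correlation is one standard way to compute it, not necessarily the one used in the lecture.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 500
# Hypothetical standardized well data: brittleness is correlated with porosity,
# but production is driven by porosity alone
porosity = rng.normal(0.0, 1.0, n)
brittleness = 0.7 * porosity + 0.7 * rng.normal(0.0, 1.0, n)
production = 2.0 * porosity + 0.3 * rng.normal(0.0, 1.0, n)

def residual(target, control):
    """Remove the linear influence of the control feature by least squares."""
    A = np.column_stack([np.ones(len(target)), control])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

def partial_corr(x, y, control):
    """Correlation of x and y after both have been adjusted for the control feature."""
    return np.corrcoef(residual(x, control), residual(y, control))[0, 1]

raw_corr = np.corrcoef(brittleness, production)[0, 1]
pcorr = partial_corr(brittleness, production, porosity)
print(f"brittleness vs production: raw r = {raw_corr:.3f}, partial r | porosity = {pcorr:.3f}")

# Model-based importance (impurity-based in scikit-learn's random forest)
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(np.column_stack([porosity, brittleness]), production)
print("importance [porosity, brittleness]:", rf.feature_importances_.round(3))
```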

Now, I've got to tell you, sometimes I just go back to traditional
statistics. I go back to the idea of just conditional distributions. And so
this is an example right here from that same data set, where we have a violin
plot. What we've done is we've taken the wells and split them up into low
producing wells and high producing wells. And we just look at the conditional
distributions of each one of the predictor features.

And what you can do is evaluate which of these predictor features have the
most unique or distinct behavior between those conditional distributions.
Porosity is distinctly different, whereas for acoustic impedance, it may be
difficult to tell them apart. They're only a little bit different from each
other. This tells us about the sensitivity. It's a good indicator of the
importance of each one of the features.
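The same comparison can be put into a number. This sketch uses synthetic wells, where production is made to depend on porosity but not on acoustic impedance. It splits the wells at the median production, then scores how far apart the low-producer and high-producer conditional distributions are for each feature. The Kolmogorov-Smirnov distance is one reasonable choice of score, not necessarily what the lecture used.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)
n = 300
# Hypothetical wells: production driven by porosity, not by acoustic impedance
porosity = rng.normal(0.15, 0.03, n)
acoustic_impedance = rng.normal(6000.0, 500.0, n)
production = 5000.0 * porosity + rng.normal(0.0, 50.0, n)

high = production > np.median(production)   # split into low and high producers
ks_distance = {}
for name, feature in [("porosity", porosity), ("acoustic impedance", acoustic_impedance)]:
    # KS statistic: largest gap between the two conditional CDFs (0 = identical, 1 = disjoint)
    ks_distance[name] = ks_2samp(feature[~high], feature[high]).statistic
    print(f"{name:>18s}: median low={np.median(feature[~high]):.3f}, "
          f"median high={np.median(feature[high]):.3f}, KS={ks_distance[name]:.2f}")
```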

Now, there's another type of approach we can use in machine learning:
inferential approaches, such as cluster analysis. Cluster analysis is really
great. It's an automatic assignment of categories or groups, looking at how
things group within your data set. It's a first step to finding patterns.

And so we give ourselves two predictor features. One of them is well average
porosity, and the other one is well average acoustic impedance. We have a
petrophysical property on the x-axis and a geophysical property on the y-axis.
We could do a grouping of rock in such a manner that it's observable in the
wells and observable in seismic at the same time.

Now, I use the term rock type. I'm not suggesting facies. I know there's a
lot that goes into facies, but I'm just suggesting a grouping of the rock. So
we can go ahead and do that. And if we use some type of knowledge about the
setting and we say that we expect to see three groups, we can get something
like this. We have low porosity, high acoustic impedance, high porosity, low
acoustic impedance and something in-between.

Now, this can be very useful. This is a very simple example, though. And
those who know something about cluster analysis will recognize that I've just
used k-means clustering here.
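Here is a minimal k-means sketch of that grouping, using made-up well averages drawn around three hypothetical centers. The features are standardized first. Otherwise, acoustic impedance (values in the thousands) would dominate porosity (fractions) in the distance calculation. The choice of three groups plays the role of the prior knowledge about the setting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Hypothetical well averages: three loose groups in (porosity, acoustic impedance)
centers = np.array([[0.06, 8000.0], [0.14, 6500.0], [0.22, 5000.0]])
X = np.vstack([c + rng.normal(0.0, [0.015, 300.0], size=(30, 2)) for c in centers])

# Standardize so both features contribute comparably to the distances
X_std = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_std)

for k in range(3):
    members = X[kmeans.labels_ == k]
    print(f"group {k}: n={len(members)}, mean porosity={members[:, 0].mean():.3f}, "
          f"mean AI={members[:, 1].mean():.0f}")
```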

Now, there are a lot of methods we can use: methods that can impose prior
information, expert knowledge about the setting, and methods that can also
weight the features based on importance. Many methods can also integrate
complicated group geometries. So there are a lot of powerful ways to do this.

Now, of course, we can also do prediction with machine learning, and this is
very powerful, too. Now, here's our very first machine learning model for
prediction, a linear regression model. We've got density and porosity. And it's
useful to be able to go between the two, to infer petrophysical properties
directly from measures of the rock.

Now, we have our training data shown there, and we're able to make
predictions at unobserved cases, just by using that line right there. That's
our model. What's really fun is if you think about the fundamental definition
of machine learning, it's hard to argue that linear regression is not machine
learning. In fact, a really good quote from one of the professors I work with,
Dr. Foster, is when you think of machine learning, just think of a glorified or
enhanced version of linear regression, and you'll probably be on the right
path.

I challenge my students every term to try to prove to me that linear
regression is not machine learning. Nobody's been successful yet. Now, of
course, linear regression is very simple. We can go to much more advanced
methodologies.
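Here is a minimal version of that first machine. The training data is synthetic and assumes the common density-porosity relation with a quartz matrix density of 2.65 g/cm^3 and a water density of 1.0 g/cm^3, plus measurement noise. The fitted line is the model, and predicting at an unobserved density is just reading off the line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
# Hypothetical training data: bulk density (g/cm^3) and porosity (fraction),
# assuming matrix density 2.65 and fluid density 1.0, plus noise
density = rng.uniform(2.0, 2.6, size=40)
porosity = (2.65 - density) / (2.65 - 1.0) + rng.normal(0.0, 0.01, size=40)

model = LinearRegression().fit(density.reshape(-1, 1), porosity)
print(f"porosity = {model.intercept_:.3f} + ({model.coef_[0]:.3f}) * density")
print("prediction at 2.3 g/cm^3:", model.predict([[2.3]]).round(3))
```

Under the stated assumption, the recovered slope should be close to -1/1.65, or about -0.61.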

Isotonic regression is very cool. It's a piecewise linear regression method,
but it allows you to capture certain physics of the problem, in this case, the
monotonic relationship between porosity and density. You expect it to be
negatively correlated. In other words, as one is high, the other should be low.

And we can capture that within this model in a very flexible manner. It's
actually still parametrically a pretty simple model, not a lot of parameters to
estimate. And so it's not a difficult model to work with sparse data, like we
have right here.
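In scikit-learn, the physics constraint is a single argument. This sketch fits an isotonic regression with increasing=False to sparse, noisy, hypothetical density-porosity data, so the fitted curve can only stay flat or decrease as density rises.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(5)
density = np.sort(rng.uniform(2.0, 2.6, size=25))
porosity = (2.65 - density) / 1.65 + rng.normal(0.0, 0.02, size=25)  # hypothetical, noisy

# increasing=False imposes the expected physics: porosity does not rise as density rises
iso = IsotonicRegression(increasing=False, out_of_bounds="clip").fit(density, porosity)

grid = np.linspace(2.0, 2.6, 7)
pred = iso.predict(grid)
print(np.column_stack([grid, pred]).round(3))
```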

Now, of course, you can go to polynomial regression. The great thing about
polynomial regression is that we have a much more flexible model. It's not
linear any longer. But we retain the benefit of having just a few parameters to
work with.

Now, there's all kinds of devilish details. If you watch my video on
polynomial regression, I get into the idea of using orthogonal polynomial-basis
expansion and so forth and why that would be beneficial. But we'll just leave
it right here, that we have a very flexible method to work with.
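A minimal polynomial regression sketch follows, with a hypothetical cubic truth. It expands x into [x, x^2, x^3] and fits an ordinary linear model on that expanded basis, so the model is still linear in its few parameters. It uses the plain monomial basis for simplicity. An orthogonal polynomial basis, as mentioned above, would be a refinement and is not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(6)
x = rng.uniform(-1.0, 1.0, size=(40, 1))
y = 1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 0] ** 3 + rng.normal(0.0, 0.1, size=40)  # hypothetical

# Expand to [x, x^2, x^3], then fit an ordinary linear model on the expanded basis
poly = make_pipeline(PolynomialFeatures(degree=3, include_bias=False), LinearRegression())
poly.fit(x, y)

lin = poly.named_steps["linearregression"]
print("fitted parameters:", lin.coef_.size + 1, "(3 slopes + 1 intercept)")
print("coefficients for x, x^2, x^3:", lin.coef_.round(2))
print("training R^2:", round(poly.score(x, y), 3))
```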

Ridge regression. You might have heard of this before-- not complicated at
all. It's actually linear regression, where we add what's known as a shrinkage
term, or regularization term, to the equation that we minimize in order to get
the very best fit line. What does that mean? Well, if you look at the fit when
we do regularization, it doesn't actually look great. It looks like the slope
is too low, like it should actually be steeper, and that's on purpose.

What regularization does is shrink the parameters towards zero. And so it
makes the slope shallower. In doing that, it reduces model variance. You
remember the model variance, model bias trade-off? But it increases model bias.
In many noisy data sets, we get a much better prediction when we go ahead and
shrink and reduce that slope. That's ridge regression.
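Here is a sketch of that shrinkage on sparse, noisy, hypothetical data. In scikit-learn, alpha is the weight on the regularization term. As alpha grows, the fitted slope is pulled towards zero, trading a little more model bias for less model variance.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)
x = rng.uniform(0.0, 1.0, size=(15, 1))
y = 3.0 * x[:, 0] + rng.normal(0.0, 0.5, size=15)   # sparse, noisy hypothetical data

slopes = []
for alpha in [0.01, 1.0, 10.0, 100.0]:
    slope = Ridge(alpha=alpha).fit(x, y).coef_[0]
    slopes.append(slope)
    print(f"alpha={alpha:>6.2f}  slope={slope:.3f}")
```

The value of alpha that predicts best is a hyperparameter, normally tuned on withheld testing data rather than picked by eye.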

Now, I'll switch to an example that has three features. For the following
machines, it's really cool to look at their behavior over two predictor
features. We're going to work with standardized porosity and standardized
brittleness. That could be any type of geomechanical property that indicates
something about the frackability or brittleness of the rock. What we're
plotting against them, the color here, is the production rate of the individual
wells, and these are the samples we're working with from an unconventional data
set.

So we'll start with linear regression. When you look at that model, it
should look very straightforward. It basically is just a plane in space. It's a
linear model. If it were a higher dimensional model, you would see a
hyperplane, which we could project into different dimensions. Very intuitive,
very few parameters.

Now, let's move to more complicated machines. We have k-nearest neighbors.
What's really cool about k-nearest neighbors is that you can think about it as
basically doing interpolation or mapping in the predictor feature space.

Now, we, as geoscientists and engineers working on subsurface problems, are
very used to the idea of doing interpolation or mapping. And so k-nearest
neighbors is just trying to make a map. We're choosing a set of nearest sample
data, or training data in this case, to make local predictions.

Now, what's very cool about that is we have a very intuitive hyperparameter.
We'll understand this if we've done any type of interpolation. If you use more
nearest neighbors, you get a smoother map. If you use fewer nearest neighbors,
you get a rougher map, and you fit the data more closely. So we have a very
intuitive hyperparameter, k, the number of nearest neighbors.
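Here is a minimal sketch of that hyperparameter, on a hypothetical stand-in for the standardized porosity and brittleness data. With k = 1, each training well is its own nearest neighbor, so the map reproduces the training data exactly. As k grows, the map smooths out and the fit to the training data loosens.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(8)
# Hypothetical data: standardized porosity and brittleness -> production
X = rng.normal(0.0, 1.0, size=(100, 2))
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0.0, 0.5, size=100)

scores = {}
for k in [1, 5, 25]:
    knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    scores[k] = knn.score(X, y)   # R^2 on the training data itself
    print(f"k={k:>2d}  training R^2={scores[k]:.3f}")
```

A perfect training fit at k = 1 is a warning sign, not a goal. The k to use is the one that does best on withheld testing data.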

Decision tree. Decision trees are pretty simple, but they're used to build
much more complicated models, like random forest. So let's start with the tree,
so we can understand the forest.

So it's a hierarchical binary segmentation of the predictor feature space
into blocks. So if you think about it, we just go through the feature space,
and we just make splits-- 1 split, the next split, the next split, the next
split. And by doing that, we break it up into a bunch of regions. And inside of
each region, we predict with the average of the training data within the
region.

It's a very intuitive, very simple model. It's non-parametric, but under the
hood it doesn't require many coefficients, so the number of parameters
actually used in the model is pretty low. It's very intuitive.

The hyperparameter in this case controls the pruning of the tree. I love the
terminology they use. You overgrow the tree, and then you prune it back down
to the level of complexity that does the best in testing. That's
hyperparameter tuning.
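Here is a minimal overgrow-then-prune sketch on hypothetical data. It uses scikit-learn's cost-complexity pruning, which is one pruning scheme among several. The script grows a full tree, gets the sequence of pruning strengths (ccp_alpha), and keeps the level of pruning that scores best on withheld data. Only every tenth alpha is checked, to keep it quick.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(9)
X = rng.normal(0.0, 1.0, size=(300, 2))   # hypothetical standardized porosity, brittleness
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0.0, 0.5, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Overgrow the tree, then get the sequence of cost-complexity pruning strengths
full_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

best = None  # (alpha, held-out R^2, number of leaves)
for alpha in path.ccp_alphas[::10]:
    tree = DecisionTreeRegressor(random_state=0, ccp_alpha=float(max(alpha, 0.0)))
    tree.fit(X_train, y_train)
    score = tree.score(X_test, y_test)   # judge each level of pruning on withheld wells
    if best is None or score > best[1]:
        best = (alpha, score, tree.get_n_leaves())

print("leaves in the overgrown tree:", full_tree.get_n_leaves())
print(f"best ccp_alpha = {best[0]:.4f}, held-out R^2 = {best[1]:.3f}, leaves = {best[2]}")
```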

Random forest, what we do is we take a set of trees, and we put them
together to get the very best estimate. It's an ensemble learner. We'll take a
bunch of trees-- take the average of all of the trees' estimates, and we'll say
that's the best. What that does, it reduces model variance through averaging, a
very powerful concept.

Now, what's very cool about random forest is you enhance ensemble diversity
through randomly subsetting the predictor features. In other words, every time
you make a split, you don't get to use all of the features. You only use a
random subset.

What's the message here? Diversity is strength. It turns out, by having
diversity within all of the estimators, you get a better estimator. And random
forest, in fact, competes with many of the leading machine learning
methodologies. It's very powerful.
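The averaging can be checked directly in scikit-learn. This sketch uses hypothetical data. With two predictor features, max_features=1 forces each split to consider only one randomly chosen feature. By scikit-learn's default, each tree is also trained on a bootstrap resample of the wells. The forest's prediction is then the plain average of its trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(10)
X = rng.normal(0.0, 1.0, size=(300, 2))   # hypothetical standardized porosity, brittleness
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0.0, 0.5, size=300)

# max_features=1: every split in every tree sees only one randomly chosen feature
rf = RandomForestRegressor(n_estimators=100, max_features=1, random_state=0).fit(X, y)

X_new = np.array([[0.5, -1.0], [1.5, 0.0]])
tree_preds = np.array([tree.predict(X_new) for tree in rf.estimators_])
print("first three trees:", tree_preds[:3].round(2).tolist())
print("forest:", rf.predict(X_new).round(2), " average of trees:", tree_preds.mean(axis=0).round(2))
```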

Now, gradient boosting, what do we do here? It's another ensemble learning
method, and it's actually very cool. You take a very weak model, a very weak
learner, and you calculate the error from that first learner. Then you train
another weak learner on that error. You fit the error. And then you have the
second order error, the third order error, and you keep going.

Now, people around my generation will remember that there was a razor blade
commercial that said, the first blade cuts very close, the second blade, even
closer, the third, even closer. That's exactly what gradient boosting does. The
first model takes a cut at the estimate. The second gets even closer, the
third, closer still. And it turns out that by learning slowly, and using
methods related to gradient descent optimization, we can get a very good
estimate. These are very, very powerful methods.
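Here is a bare-bones sketch of the blade-by-blade idea on hypothetical data. It uses squared-error loss, where the negative gradient is simply the residual, and depth-1 trees (stumps) as the weak learners. Each stump fits what is left over, and only a small fraction of it (the learning rate) is added, which is the "learning slowly" part. This is a teaching sketch. scikit-learn's GradientBoostingRegressor is the full implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(11)
X = rng.normal(0.0, 1.0, size=(300, 2))   # hypothetical standardized porosity, brittleness
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0.0, 0.5, size=300)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # start from the mean
train_mse = [np.mean((y - prediction) ** 2)]
for _ in range(100):
    residual = y - prediction                                   # error left over so far
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)  # weak learner fits the error
    prediction += learning_rate * stump.predict(X)              # take a small step
    train_mse.append(np.mean((y - prediction) ** 2))

print("training MSE after 0, 10, 100 blades:", [round(train_mse[i], 3) for i in (0, 10, 100)])
```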

So let's talk about how we're going to use this in practice. Let me give
some philosophy, some advice on using machine learning within energy. First of
all, fit-for-purpose modeling. I've been involved in subsurface modeling for a
long time. We knew this.

Right at the very beginning, before you even start making modeling choices,
you've got to understand the goals of the model, and you've got to put that
into the model. It'll affect all of the decisions you make if you're
goal-focused. You may also consider future goals, though, because maybe you'll
want to grow a little bit into the model. That's fine. But you've got to
account for the resources. You're always resource-constrained-- time, people,
expertise.

This is the old Venn diagram again, good, fast, cheap. You can't have good,
fast, and cheap. It's not possible. You get to have fast and cheap, good and
cheap, or fast and good, but you don't get all three.

Modeling for discomfort. I'm a big fan of Mark Bentley. If you see this
video, Mark, hey, howdy. He was worried about this idea of modeling for
comfort. He said models often become-- subsurface models often become tools for
verification of decisions already partially or fully made. We're just proving
to ourselves what we already think.

This is what's known as modeling for comfort. It makes us all feel good, but
it's very dangerous. It's really a form of confirmation bias. Mark Bentley
actually recommended that we model for discomfort. He wants to make us
uncomfortable at work, which I think is great. We're stress testing our current
concepts and our decision-making.

So in other words, when we do that, we're really testing the extreme cases
for identifying and understanding the upside potential, and we're securing
ourselves against the worst case. When I teach this in my courses, I talk about
MythBusters, that show-- I don't know if you guys watch that. My kids watched
it, so I watched it.

Whenever something didn't happen on the show, what would they do? They'd put
more TNT in. They'd put more pressure in. They'd make it break. Push it to the
breaking point; I think we should do that. And in doing all of this, we really
need to recognize our biases, as Mark Bentley reminded us. Thanks, Mark.

Now, we've got to remember, too, that the foundation of everything we do is
probability and statistics. If you use a methodology like naive Bayes
classification, it's, in fact, derived directly from Bayesian statistics. Here
are the equations right here, with the independence assumption. All of the
methods have a statistical foundation. To understand the method, you have to
understand the statistics.
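The slide's equations aren't reproduced in this transcript. For reference, the standard form for a response y given m predictor features is below. It is Bayes' theorem, followed by the naive step of treating the features as conditionally independent given y, which turns the joint likelihood into a product.

```latex
\begin{aligned}
P(y \mid x_1, \dots, x_m) &= \frac{P(y)\, P(x_1, \dots, x_m \mid y)}{P(x_1, \dots, x_m)} \\
P(x_1, \dots, x_m \mid y) &\approx \prod_{\alpha=1}^{m} P(x_\alpha \mid y)
  \quad \text{(conditional independence)} \\
P(y \mid x_1, \dots, x_m) &\propto P(y) \prod_{\alpha=1}^{m} P(x_\alpha \mid y)
\end{aligned}
```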

Now, there are times you don't really need complicated machines. If you have
enough samples, you can just work directly with conditional, joint and marginal
probabilities to make predictions. Here's an example right here on the bottom
right, where we have acoustic impedance versus porosity.

We have enough data. We could just calculate the conditional distributions
and make predictions with that model. We don't need to build a machine to do
that. But remember, machine learning is statistical learning.
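Here is a minimal sketch of predicting straight from conditional distributions, with no machine at all. It uses a dense, hypothetical data set in which porosity decreases with acoustic impedance. Acoustic impedance is binned into quintiles. Each bin holds a conditional distribution of porosity, summarized here by its mean and its P10 and P90. A new value is predicted by looking up its bin.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 5000
# Hypothetical dense data set: porosity decreases with acoustic impedance
acoustic_impedance = rng.normal(6000.0, 800.0, n)
porosity = 0.45 - 5.0e-5 * acoustic_impedance + rng.normal(0.0, 0.02, n)

# Bin acoustic impedance into quintiles; each bin holds a conditional distribution of porosity
edges = np.quantile(acoustic_impedance, [0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
bin_id = np.clip(np.digitize(acoustic_impedance, edges[1:-1]), 0, 4)
cond_mean = np.array([porosity[bin_id == b].mean() for b in range(5)])
cond_p10_p90 = np.array([np.percentile(porosity[bin_id == b], [10, 90]) for b in range(5)])

def predict_porosity(ai):
    """Predict with the conditional mean of the bin that the new value falls in."""
    b = int(np.clip(np.digitize(ai, edges[1:-1]), 0, 4))
    return cond_mean[b]

print("conditional mean porosity by AI quintile:", cond_mean.round(3))
print("conditional P10/P90 by quintile:", cond_p10_p90.round(3).tolist())
print("prediction at AI = 7000:", round(predict_porosity(7000.0), 3))
```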

Now, let me give a couple of warnings. First, the concept of parsimony.
Start simple: build the simplest model you can, and then build up from that.
Models must be understandable and interpretable. Don't jump to complexity. In
fact, you may even do worse, as we showed with the variance-bias trade-off.

And in fact, I always challenge my students. If they build a complicated
model, I always ask, well, did you build a linear regression model first? When
I did research and development within Chevron's energy technology company, I
always had to demonstrate the incremental value of every workflow or
methodology we proposed. It should be the same way here. We've got to show that
it performs better than a simpler tool, because we're going to lose
interpretability, too.
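One way to make that incremental-value check routine is to cross-validate the simple baseline and the complex machine side by side, on the same folds. This sketch uses a small, hypothetical data set with a linear truth and compares linear regression against scikit-learn's gradient boosting with default settings. The decision rule in the final comment is a suggestion, not a fixed threshold.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(13)
X = rng.normal(0.0, 1.0, size=(60, 2))   # hypothetical, small data set
y = 1.5 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0.0, 0.5, size=60)

baseline = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
complex_model = cross_val_score(GradientBoostingRegressor(random_state=0), X, y, cv=5, scoring="r2")
print(f"linear regression: mean CV R^2 = {baseline.mean():.3f}")
print(f"gradient boosting: mean CV R^2 = {complex_model.mean():.3f}")
# Adopt the complex model only if it clearly beats the simple, interpretable baseline
```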

Scientific method. Go back and just think about the rigor of the scientific
method, the robust use of statistics, the highly critical approach that should
be used. We should always be trying to disprove our models. We should always be
trying to disprove.

And we should take inspiration from the trades. My twin brother and my
grandfather were both machinists. And I'll tell you what, I very much respect
the trades. Their knowledge of their tools, the tools they carry in their
toolbox, the tools that they use every day, is exceptional.

They don't wreck a lot of metal. They know exactly how a tool will perform
in a wide variety of circumstances. They know the best tool to use. We should
have the same level of competency when it comes to machine learning tools, when
we use them in practice. We should be inspired by the trades.

Getting started. There's a lot of resources available to help you get
started. This is a great time to get started. This is my talk, so I'm going to
promote my resources. But I know, it's shameless. Many people have great
resources out there, and you can find them.

All of my lectures are available. Here, on the left-hand side, are three
example lectures from my three courses: spatial data analytics, data analytics
and geostatistics, and machine learning. All of the examples are available on
GitHub, so you can follow along and work through them. Everything in the
course, you can work out at home. Everything is there for you.

My advice to you, if you haven't done it already: go home, or do it at work
if it's allowed, and download and install Anaconda. It's a one-stop shop.
You'll get Jupyter notebooks. You get the scikit-learn package. You get all the
standard packages, NumPy, pandas. Everything will be there for you to get
started. You can run the workflows from my courses.

The most powerful, flexible methods are in Python packages. Open source is
really, really awesome in this area. So you'll need to know a little bit of
coding if you want to really maximize your impact. Don't worry about it. It's
not like in the dark ages when we were all doing Fortran and C++. I spent many
years doing that myself. It's Python and R. And really, it's more about
scripting and putting workflows together.

If you don't believe me, well, look at this example right here. So the first
line of code is just simply importing the package for the decision tree from
scikit-learn, which is awesome. I love scikit-learn. The next line of code is
simply loading your data up. And you can actually load that data, too. It's on
my GitHub account. It's just a simple, comma-delimited file. You can load it up
in Excel. It's very simple.

The next line is building your decision tree. It's actually instantiating or
making a decision tree. You're setting the hyperparameters, the degree of
complexity in those parameters. The next step-- you're going to fit your data.
I had porosity and brittleness data, and I'm trying to predict production of
wells.

And the next line-- I'm just taking a combination of porosity and
brittleness percentages, and I'm making a prediction at a new location with my model.
That's it. That's it. I did machine learning in five lines of code. It's that
simple.
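The five steps just described might look something like the sketch below. The data here is synthetic, generated in place of the comma-delimited file from the GitHub account. That file's name isn't given here, so none is assumed. With the real file, the DataFrame construction becomes a single pd.read_csv call. The column names Por, Brittle, and Prod and the hyperparameter values are hypothetical. Without the synthetic-data stand-in, the import, load, instantiate, fit, and predict steps are the five lines.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Stand-in for loading the course's comma-delimited file with pd.read_csv;
# the column names Por (%), Brittle (%) and Prod are hypothetical
rng = np.random.default_rng(14)
df = pd.DataFrame({"Por": rng.uniform(5.0, 25.0, 200), "Brittle": rng.uniform(10.0, 80.0, 200)})
df["Prod"] = 200.0 * df["Por"] + 10.0 * df["Brittle"] + rng.normal(0.0, 300.0, 200)

tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=5)       # instantiate, set hyperparameters
tree.fit(df[["Por", "Brittle"]], df["Prod"])                        # fit porosity, brittleness -> production
prediction = tree.predict(pd.DataFrame({"Por": [15.0], "Brittle": [40.0]}))  # predict a new combination
print(prediction)
```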

Now, just in case you want to get started-- if you're really excited and you
want to start right now, go to that link right there. For the last five
machines that I showed you, from linear regression all the way to gradient
boosting, there's a very simple workflow in a Jupyter notebook; the header of
the notebook is shown in the top right there. It's got instructions, and it's
pretty well documented.

And you can go ahead and work through the examples that I just showed you,
play around with the hyperparameters, change them up, overtrain the model,
undertrain or underfit or overfit the model. Have a little fun with it, it can
be a lot of fun. The best way to learn machine learning is just like being a
machinist-- is to actually use the tools and get practice with the tools.

Just in case you need a little bit more motivation to start coding, here are
a couple of points. I like to talk to my students about my top reasons to learn
to code. First reason, transparency. No compiler, no computer accepts hand
waving. You don't get to hand wave. Coding forces you to lay your logic bare.
You have to show people exactly what you mean.

Reproducibility. Run it, get an answer, hand it over to somebody else. They
run it. They get the same answer. This is a main principle of the scientific
method. Quantification. Programs need numbers, and so you've got to feed the
program. You'll discover new ways to look at the world through quantification.
I think that's powerful.

Open source. As I said before, the very best methods for machine learning
are open source. You can leverage the world's brilliance. I needed to build a
subsurface trend model, and the best code I could find to get the job done was
Astropy. It's an astronomy and astrophysics package that was being used for all
kinds of mapping in space, literally space. And I started using it for mapping
in the subsurface. It was perfect. It worked great. And I didn't have to code
it myself.

Deployment. If you have a great idea, you build a workflow and you code it
up, you can share it with others. And you can multiply your impact. You may be
concerned about your performance metric at the end of the year, or you may just
be a super nice person who wants to help people out. It doesn't matter. Both
are going to happen. You're going to look great while helping others.

Let me talk about just a couple of examples of more advanced machine
learning, again, to give the idea of what can be done with machine learning.
This is the work of one of my PhD students, Wendy Liu. She was developing a
brand new methodology for spatial data analytics: spatial anomaly detection.

Now, it turns out that if you have a data set-- I show here porosity, but it
could be production. It could be a dense data set with many wells. You may have
one well that tends to be a little bit high, and you wonder, is that anomalous?
Is that unusual, that something different happened here, or is this something
we would expect?

Now, what she's done is develop a methodology for building maps. In the map
on the bottom right, the purple areas are locations where the well-to-well
transitions are unusual, meaning they have a low probability and are therefore
likely anomalous. You can detect spatial discontinuities in your data set. You
can also detect an unusual well. Maybe something went wrong with the
completion, or maybe something went right. What is the difference that makes a
difference?

Here's another example. Honggeun Jo, another one of my PhD students, is
working on building subsurface models using machine learning, specifically on
the challenge of precise conditioning to wells. This is very important, because
we need to match the well data at the well locations. Remember how expensive
that well data is.

The methodology uses semantic inpainting. In fact, it's the same technology
that's used for image restoration when there's a rip or a tear in an image.
You know, when the old pictures of grandma and granddad have worn out and
someone wants to fix them using the machine. That's exactly what we use here.

What we do is build models that approximately honor the well data. We mask,
or remove, the model around the wells. Then we fill in the masked zones so that
we match the contextual information, the model around the mask, and the
perceptual information, the model elsewhere. That way we match the well data,
but we don't mess up the geologic concepts. This example is deepwater lobes
with compensational stacking. It works quite well.
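Semantic inpainting is usually posed as an optimization problem. You search for a generator output that matches the known, unmasked part of the image (a contextual loss) and that a discriminator judges realistic (a perceptual loss). The sketch below only evaluates such an objective in NumPy; it does not reproduce the published method. The well-mismatch term, the weights, the square mask, and the `disc_prob` input, which stands in for a trained discriminator's output, are all hypothetical.

```python
import numpy as np

def well_mask(shape, wells, radius):
    """Boolean mask, True in a square window of half-width `radius` per well."""
    mask = np.zeros(shape, dtype=bool)
    for i, j in wells:
        mask[max(i - radius, 0):i + radius + 1, max(j - radius, 0):j + radius + 1] = True
    return mask

def inpainting_loss(candidate, realization, mask, wells, well_values,
                    disc_prob, lam=0.1, well_weight=10.0):
    """Hypothetical objective for re-filling the masked zones around wells.

    contextual: L1 mismatch to the original realization outside the mask
    well:       L1 mismatch to the hard data at the well cells (assumed term)
    perceptual: log(1 - D), lower when the discriminator output D, passed in
                here as disc_prob, rates the candidate as more realistic
    """
    contextual = np.abs(candidate - realization)[~mask].sum()
    rows, cols = zip(*wells)
    well = np.abs(candidate[list(rows), list(cols)] - well_values).sum()
    perceptual = np.log(1.0 - disc_prob + 1e-12)  # epsilon avoids log(0)
    return contextual + well_weight * well + lam * perceptual

# Toy check: edits confined to the masked zones leave the contextual term at 0.
realization = np.zeros((20, 20))
wells = [(5, 5), (15, 12)]
well_values = np.array([1.0, 2.0])
mask = well_mask(realization.shape, wells, radius=2)
candidate = realization.copy()
candidate[5, 5], candidate[15, 12] = 1.0, 2.0
loss = inpainting_loss(candidate, realization, mask, wells, well_values, disc_prob=0.5)
```

In GAN-based inpainting, an objective like this is minimized over the generator's latent input, and the filled zones are then blended back into the original model. That optimization loop is omitted here.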

Seismic downscaling. Another one of my students, Wen Pan, is working on
this. The idea is to train a machine using pix2pix, an image-to-image
translation method built on convolutional neural nets, on a wide variety of
high-resolution models.

Now, the model in the bottom center is a high-resolution truth model of the
architecture, maybe a meandering channel belt in a fluvial system. We upscale
it to seismic, the image shown above, and we train on these pairs so that the
machine learns the mapping between the seismic and the high-resolution
architecture.

Once we've done that, we can give the machine a variety of seismic images,
and it can predict what the architecture might look like within the seismic.
That's seismic downscaling. We can get many, many models very fast, and they're
actually very good. The model on the top right looks really good. It puts the
architecture in quite well and honors the wells at the same time.

Now, I'd like to be transparent about this. The image on the bottom
right-hand side doesn't look so great, and the reason is that the seismic was
ambiguous. It's more of a blob. You don't have the nice arcuate shapes, so the
machine didn't know where to put the channels. Still, seismic downscaling is
really interesting.
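The talk doesn't say how the truth models are upscaled to seismic to build the training pairs. A common choice is a 1D convolutional forward model. You assign an acoustic impedance to each facies, compute normal-incidence reflection coefficients, and convolve them with a wavelet. Here is a minimal NumPy sketch. The Ricker wavelet, its 30 Hz peak frequency, the impedance values, and the function names are assumptions.

```python
import numpy as np

def ricker(f_peak, dt, n_half):
    """Ricker wavelet: peak frequency f_peak (Hz), sample interval dt (s)."""
    t = np.arange(-n_half, n_half + 1) * dt
    a = (np.pi * f_peak * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def synthetic_seismic(facies, impedance, f_peak=30.0, dt=0.002, n_half=25):
    """1D convolutional forward model applied trace by trace.

    facies:    2D integer array, rows = time samples, columns = traces
    impedance: dict mapping facies code -> acoustic impedance (assumed values)
    """
    imp = np.array([[impedance[int(f)] for f in row] for row in facies], dtype=float)
    # Normal-incidence reflection coefficient between consecutive samples.
    refl = np.zeros_like(imp)
    refl[1:] = (imp[1:] - imp[:-1]) / (imp[1:] + imp[:-1])
    w = ricker(f_peak, dt, n_half)
    if imp.shape[0] < len(w):
        raise ValueError("traces must be at least as long as the wavelet")
    # mode="same" keeps each synthetic trace aligned with the input samples.
    return np.column_stack(
        [np.convolve(refl[:, k], w, mode="same") for k in range(imp.shape[1])]
    )

# Hypothetical model: a sand body (code 1) encased in shale (code 0).
facies = np.zeros((100, 5), dtype=int)
facies[40:60, 1:4] = 1
seismic = synthetic_seismic(facies, {0: 6.0e6, 1: 7.5e6})
```

The band-limited wavelet is what blurs sharp architecture into the blob-like responses described above. Thin or closely spaced bodies can produce nearly identical traces, which is one source of the ambiguity the machine then has to resolve.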

Now, we're going beyond that. We're actually building reservoir models
completely using machines. Shown on the right-hand side are a variety of
realizations and models built by one of my PhD students, Wen Pan.

And so this is incredible, because these models are much more complicated
than the models we've been building up till now, so they can honor much more
of the geologic, geophysical, and engineering information. We get better models
by using machines than with our traditional geostatistical methodologies, and
we get a fast way to explore uncertainty spaces.

Here's another example (just two more examples): production forecasting.
One of my PhD students, who has graduated and is now working in industry,
worked on making forecasts of production over time. We trained a long
short-term memory (LSTM) network with 2,500 days of production and injection
information. There were nine injectors.

So if you look here on the left-hand side, we have nine injectors with
complicated injection histories and a lot of cycling. After we trained the
system, we forecast 1,000 days into the future. What was really cool is that
the system learned the complicated interactions between the injectors and the
producers, to the point where it was able to make very good predictions of flow
into the future.

What's kind of spooky about this is that, if you look really carefully at the
image, the red well, injector number four, has injection behavior during the
training stage that's totally different from its behavior during the testing
stage. And we're still able to make good predictions. It learned the system
very well.
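Before an LSTM can be trained, the daily histories have to be cut into fixed-length input windows. They also have to be split by time, so that the test period lies strictly after the training period. Here is a minimal NumPy sketch. It assumes the inputs are only the past injection rates and the target is the next day's production. The 30-day window, the function names, and the random data are hypothetical placeholders, not the study's setup.

```python
import numpy as np

def make_windows(injection, production, lookback):
    """Cut daily histories into supervised samples (hypothetical helper).

    injection:  (n_days, n_injectors) daily injection rates
    production: (n_days,) daily production
    Sample k uses injection on days [k, k + lookback) and targets
    production on day k + lookback, so inputs are strictly in the past.
    """
    n_days = len(injection)
    X = np.stack([injection[t - lookback:t] for t in range(lookback, n_days)])
    y = production[lookback:]
    return X, y

def chronological_split(X, y, split_day, lookback):
    """Train on targets before split_day, test on targets from split_day on."""
    n_train = split_day - lookback  # sample k targets day k + lookback
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

# Synthetic placeholder data: 3,500 days, nine injectors.
rng = np.random.default_rng(0)
injection = rng.uniform(0.0, 1000.0, size=(3500, 9))
production = rng.uniform(0.0, 500.0, size=3500)

X, y = make_windows(injection, production, lookback=30)
(X_train, y_train), (X_test, y_test) = chronological_split(X, y, 2500, 30)
print(X.shape, X_train.shape, X_test.shape)  # (3470, 30, 9) (2470, 30, 9) (1000, 30, 9)
```

The resulting `X` has the shape (samples, timesteps, features), which is the layout recurrent layers such as Keras's `LSTM` expect. Splitting chronologically rather than shuffling is what makes a 1,000-day test a genuine forecast.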

Finally, multi-scale flow proxy models. This is very exciting. Another one of
my students, Javier Santos, has developed a model that makes accurate
predictions of flow velocity in the pore space between the grains in rock.
We're taking very small-scale models, like this one right here, which is 2
centimeters by 2 centimeters by 2 centimeters. We've imaged the grains, and
we're able to make good predictions of the flow velocities for given pressure
and fluid conditions.

This is really, really cool, because if we did this using lattice Boltzmann
or other more complete physics-based calculations, the calculations would take
a long time. These predictions take well under a second. By making very quick
calculations of small-scale flow, we hope to work out problems around
multi-scale permeability, moving from the very fine scale to the more
production-relevant scales.

Let me make a couple of concluding remarks. First of all, data analytics and
machine learning methods provide new tools. They're a fourth paradigm of
scientific discovery for all of us to use to add value. How are we going to
add value? Efficiency and automation.

What I like to say is, geoscientists will do more geoscience. We'll be able
to automate a lot of the more mundane parts of our jobs and focus more on the
science. In fact, we can do things such as detecting anomalies.

Now, that's great, because we work with very large data sets. We'll be able
to focus on what matters, find the locations that matter most, and spend our
professional time finding new patterns. We can use these methodologies to see
our data in totally different ways and to pose new scientific questions, then
go back to our geoscience and engineering concepts and try to work them out.
It's a very nice balance between the two.

Assisted interpretation. We've been doing that for a while, and it helps us a
lot. I'm surrounded by people, and I've seen this a lot, who have injured
themselves with repetitive stress injuries from all that clicking during
interpretation. We'll have machines to support us and to guide our
interpretation even further, so we can spend more time focused on the
geoscience questions.

Now, new, improved models. I hope I've shown a couple of examples of those.
I'm excited, because these models better integrate our expert knowledge and
also provide us with real-time feedback. They're very, very fast, so as we're
building our models, we'll get feedback.

I like to tell my students it's like TurboTax. When you do TurboTax at the
beginning of the year, you put in your information from a W-2 or W-4 (I forget
which now). Basically, it immediately tells you how much you're getting back.
And you know what happens next. You start answering questions, and all of it
disappears. No, I'm kidding. I'm kidding. But it starts to change.

That's how I would like to see subsurface modeling with machines. We make
decisions, such as fault transmissibility, fault throws or offsets, structural
shape, or any of our interpretations, and we immediately see how they impact
the result. Or maybe they don't, and then we know we don't need to focus there.

Geoscience and engineering expertise remains core to our business. We'll
have augmented abilities and capabilities with the new digital technologies.
People used to say, those with the best data win. What I think we say now is,
those with the best data who also use that data best win.

So I'd like to acknowledge the AAPG and the AAPG Foundation, and all of the
host organizations, for letting me get on the road and tour around a little bit
(I'm excited to do that), and for this great opportunity to share this message
about machine learning and data analytics. Thank you very much.
