03 May, 2019

Interview with Alec Walker, DelfinSia: Unstructured Data Mining and Text Analytics for Improved Upstream Efficiencies

 

There is valuable information locked up in seemingly inaccessible places, and not all of the data is compatible or even organized consistently. How can one use this information to make good decisions in oil and gas operations?

Welcome to an interview with Alec Walker of the AI startup DelfinSia, which has developed a new artificial intelligence-driven tool for unstructured data mining and text analytics. The result is a virtual advisor that can help improve efficiency, safety, and operational effectiveness.

What is your name and your background?

Alec Walker
I’m Alec Walker. Thanks for the opportunity to be interviewed here! I’m much less interesting than our product, Sia, and there are others in my firm who have arguably more interesting backgrounds, but in short, I got my BS in chemical engineering and my BA in Asian studies from Rice University in 2009. I then worked for Shell for about four years, starting in catalysis R&D and refinery technical service engineering based out of Houston, then doing internal software product management, and then moving to Colorado to be a reservoir engineer for a big asset in Wyoming. I noticed a lot of potential for internal entrepreneurship and digital transformation, so I left Shell in 2013 for an MBA at Stanford and then started a consulting firm helping multinationals with these two topics. Eventually, I got back in touch with Astron International, an oil and gas software consultancy we’d hired while I was at Shell to build tech tools for us, and together we founded Delfin. I encourage readers to check out our growing team on our company website.

What is DelfinSia?

DelfinSia, Inc. is the official name for our company, but everyone just calls us Delfin. We’re an unstructured data mining and text analytics company based in Houston, Texas (and sort of Silicon Valley, too) helping the oil and gas industry. Sia is the product, a virtual advisor (think “Alexa” or “Siri”) that interacts with a client’s unstructured data in real time to provide relevant and accurate answers to deep technical questions. Users ask Sia questions to help research their decisions in the same way they would go ask a subject matter expert. Whenever Sia hears a question, it is capable of interpreting the actual intent of that question, unlike the typical enterprise search the industry is using. Armed with a sense of what matters to the asker, Sia can then look through all the unstructured data it’s been given access to and come back with just the relevant answers.

What is unstructured data?

Thanks for asking this one! Unstructured data is what data scientists call information that lacks data models or organizational consistency. Practically, it’s the knowledge stored in reports, emails, and other documents that is not organized in standard templates or attached to consistent labels. It’s usually textual, but it can contain some numbers, too. To access unstructured data in a traditional setting, the employee has to do some digging and research. Maybe they call someone, knock on a subject matter expert’s door, play around for a while in SharePoint or equivalent, etc.

Of course, we’re happy that our tool is able to do things like interpret people’s voices and turn that into text, or present results in clean and clear formatting on both mobile and desktop devices, but these are pretty common capabilities. What we’re really proud of is Sia’s capability to quickly and deftly sort through unstructured data in the oil and gas industry.

What is structured data?

Structured data is the kind of data that has consistent prescribed labels and organizational templates and is easily passed between and pulled up within traditional tools. Cells within spreadsheets can be found via reference to their rows and columns. Operation data can be automatically pushed into databases and called using SQL queries. A pretty accurate way to think of structured data is that it is the kind of data that gets automatically added to databases, and it tends to be numerical and highly contextual.
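As a rough illustration of the point about calling structured data with SQL queries, here is a minimal Python sketch using the standard sqlite3 module. It is not taken from the interview: the table name, column names, and production figures are all invented for the example, and a real operations database would be far larger and fed automatically.

```python
import sqlite3

# Hypothetical production records, invented for illustration:
# (well, cluster, day, oil_bbl, gas_mcf, water_bbl)
ROWS = [
    ("W-1", "A", "2019-01-01", 120.0, 300.0, 40.0),
    ("W-1", "A", "2019-01-02", 110.0, 290.0, 45.0),
    ("W-2", "A", "2019-01-01", 80.0, 150.0, 20.0),
]


def total_oil_by_well(rows):
    """Load labeled rows into an in-memory table and total oil per well."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE production ("
        "well TEXT, cluster TEXT, day TEXT, "
        "oil_bbl REAL, gas_mcf REAL, water_bbl REAL)"
    )
    conn.executemany("INSERT INTO production VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Because every value sits under a consistent label, one query answers
    # the question with no manual digging.
    result = conn.execute(
        "SELECT well, SUM(oil_bbl) FROM production GROUP BY well ORDER BY well"
    ).fetchall()
    conn.close()
    return result


print(total_oil_by_well(ROWS))
```

The consistent column labels are what make the single query possible; the same figures scattered through the prose of a report could not be retrieved this way.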

We’ve found that many operators are good at manipulating their structured data. They have moved from emailing each other spreadsheet attachments that have multiple unresolved versions to instead storing their data on the cloud (just one live version) and setting up ways to directly or automatically import structured data as it is generated. For example, oil, gas, and water production data from a given well in a given cluster with a given first production date and with many other givens (labels) gets automatically imported into the database that employees look through as they try to optimize performance and learn how to better handle the next well. Many are implementing machine learning techniques to make predictions, prescribe categories, and inform decisions with this structured data. The biggest challenges come from the unstructured data. For example, when someone does all this work and creates a nice report with a lot of distilled knowledge, that report may become buried beneath subsequent reports, with the insight it contains buried along with it.

Why and how is the DelfinSia capability different than what others can offer? What do you hope to achieve?

We’ve done some exploration of our close and distant competitors, but we can’t claim to have perfect information. We tend to rely on what we hear about how we differ from them when talking to clients and prospective clients. These groups may have considered one or more of our competitors in addition to Sia. This is to say that our information is mostly second-hand. From what we have heard and explored, there are five major differences.

First, the majority of our competitors seem to follow consulting business models, aiming for longer projects with lots of hours to maximize their returns. They charge for each new user, for each new document added to a use case, and/or for each time the tool is used. The setup of the tool seems to carry on even into the implementation. Sometimes, they outsource the consulting part to others who can help implement the tool in the client company for them, creating more complications. Sia is built to scale, so it’s easy and quick to set up, and the business model follows the model people are used to for software: a standard setup fee and subscription service rewarding clients for increased use.

Second, we understand that several of our competitors require drastic and permanent disruptions to existing workflow practices, essentially changing the job descriptions of their users. Some groups have to work on collecting and connecting data, while others focus on deriving tools from the connected data the first group has put together, and then yet others have to learn to use these tools and offer feedback to the other two groups. We don’t require our clients to essentially implement a completely new workflow (they’re busy enough already!); we just enable them to make better decisions faster.

Third, it seems that many vendors offer solutions based on deterministic programming. This means that every question and answer pair must be hardcoded into the solution, requiring the client to know ahead of time all the possible questions that users might ask. From our perspective, if the client has such a list, they are prepared to automate their entire workflow. The reason why people are still relevant to knowledge work is that even in operations-driven industries like oil and gas (with lots of things to learn from precedents), workflows are necessarily dynamic, and no one can predict comprehensively what information will be valuable in future decision-making. Hence, researching on the fly requires flexibility in the tool that decision-trees in deterministic programming will never provide. Our tool “thinks” much more flexibly than such deterministic tools, so it can offer a lot more value.
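To make the contrast concrete, here is a small Python sketch. It is an editorial illustration only and does not represent how Sia works: the hardcoded question, the two corpus snippets, and the simple word-overlap scoring are all assumptions chosen to show why a fixed question-and-answer table fails on unanticipated phrasing, while even a crude retrieval step can still find something relevant.

```python
import re

# Hypothetical hardcoded question/answer pair (the deterministic approach).
HARDCODED = {
    "what causes pump cavitation": "Low suction pressure; see report R-12.",
}

# Hypothetical snippets standing in for a client's document corpus.
CORPUS = [
    "Report R-12: pump cavitation was traced to low suction pressure.",
    "Memo M-3: exchanger fouling increased after the crude slate change.",
]


def tokens(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def deterministic_answer(question):
    """Return an answer only if the exact question was anticipated."""
    key = question.lower().strip(" ?")
    return HARDCODED.get(key)


def overlap_answer(question):
    """Return the corpus snippet sharing the most words with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(doc)), doc) for doc in CORPUS]
    best_score, best_doc = max(scored, key=lambda pair: pair[0])
    return best_doc if best_score > 0 else None


question = "Why did the pump lose suction pressure?"
print(deterministic_answer(question))  # no hardcoded entry for this phrasing
print(overlap_answer(question))        # still surfaces the R-12 snippet
```

Word overlap is only a stand-in here; it ignores synonyms and intent, which is exactly the gap that more capable language models aim to close.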

Fourth, there are several players offering services that span multiple industries simultaneously, lacking oil and gas expertise and focus. It’s possible that generally applicable algorithms can be trained on oil and gas data to eventually constitute useful virtual advisors, but they have to start from scratch and learn everything on the client’s time, and even when they’re up to speed, they differ drastically from algorithms built specifically to handle oil and gas. Consider this analogy: If you recruited a general athlete to a basketball team, the person might learn to play good basketball, but you’d probably rather recruit someone who developed their athleticism purely in a basketball context. They won’t have to unlearn anything, and their instincts and ideas will make sense through and through in the context of basketball. Our tool is like this basketball player.

Fifth and finally, we’ve heard that the time to implement some tools is measured in quarters, whereas we measure in days. All in all, while we are a data science company and part of the general data science community, we are first and foremost an oil and gas company, and everything we’ve built from the very start has been for direct use in oil and gas. This focus has paid off in that it helps us implement very quickly. Our tool starts off already very knowledgeable about the client’s domain, so it doesn’t need to train very much. The basketball player analogy holds here, too, but it applies to speed of delivery just as it applies to accuracy of results.

What we hope to achieve is empowerment of the next generation of oil and gas employees through unstructured data mining. We want to help create a future in which employees across the industry make decisions backed by the full knowledge and expertise of their companies, in which they can always satisfy their curiosity and build their competency through instantaneous research, and in which process downtime and project delays have been pushed to an absolute minimum.

Can you give a few examples of what DelfinSia has done? Any use cases?

There are a few ways in which Sia is being used so far. For one, some folks retiring from Linde are using it to create a surrogate version of themselves so that people can still ask questions and learn how best to do their jobs. This is the classic HR application of Sia, spanning onboarding, job training, and retention of critical knowledge. Another example is some technical service engineers at BASF using Sia to reference the myriad recommendations they have made to their clients in the past. This is the classic operations application of Sia, ensuring timely, consistent, and accurate decision-making where key stakeholders are overseeing the decisions. It seems to us that the software solution space for oil and gas is currently rather frothy, so we’re proud that these folks are happy to be a reference for us to help others do their diligence on our capability. Finally, Sia can help companies simply populate their databases with data that is scattered across unstructured files. We’re excited by all the attention from operators we’re starting to get as they solidify their understanding of the challenges unstructured data is presenting to their people.
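The last use case, populating a database from scattered files, can be pictured with a toy Python sketch. This is an editorial illustration under stated assumptions, not Delfin’s method: the note texts, the well naming scheme, and the regular expression are invented, and a fixed pattern like this handles only one rigid phrasing, where real reports vary far more.

```python
import re

# Hypothetical free-text notes; real reports would be far messier.
NOTES = [
    "Well W-7 produced 350 bbl/d of oil after the workover.",
    "Operator called about a noisy compressor; no action taken.",
    "After cleanup, well W-9 produced 120.5 bbl/d of oil.",
]

# Assumed phrasing: "well <ID> produced <rate> bbl/d".
PATTERN = re.compile(
    r"well\s+(?P<well>[A-Z]+-\d+)\s+produced\s+(?P<rate>\d+(?:\.\d+)?)\s+bbl/d",
    re.IGNORECASE,
)


def extract_rows(notes):
    """Turn matching sentences into labeled rows ready for a database."""
    rows = []
    for note in notes:
        match = PATTERN.search(note)
        if match:
            rows.append(
                {
                    "well": match.group("well"),
                    "oil_bbl_per_day": float(match.group("rate")),
                }
            )
    return rows


print(extract_rows(NOTES))
```

Notes that do not match the assumed phrasing are simply skipped, which shows how brittle hand-written patterns are on free text.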

We have created a free version of Sia that we encourage readers to play with. This Sia sits atop a bunch of unstructured data we pulled from a free and public online forum for oil refinery operations engineers, so it’s good at answering the kinds of technical questions that those people have. We love to demo this tool, but we want users to be aware that any version of Sia is only as smart as the knowledge contained in the data corpus to which it has access. In other words, don’t try asking this tool how to optimize a well! Take a look here for a live example of an application of the tool, showing some of what it can do.

What are your plans for the future?

We believe in the growth value of a solid network of happy customers. While we of course oblige our clients’ secrecy demands, positive reception to our tool nonetheless spreads across the oil and gas community. While many prospective customers are facing pressure from their investors to hurry up and carry out comprehensive digital transformation, we don’t want to rush anyone into something that they aren’t confident will work for them. We’ve seen artificial intelligence hype rise and fall just like the oil and gas industry has seen oil prices rise and fall. The groups that survive in our industry in the long haul are those that develop strong reputations for delivering consistently high value. So, in short, our plans for the future are to grow our company across the industry implementing the tool where it can add value in a forthright, thorough, and service-focused way.

We also believe in the growth value of a solid community of happy teammates. Our employees are real people with real lives, and while they work hard and take pride in what they do, they also each have a healthy and sustainable work-life balance. Overcoming new challenges requires creativity, which in turn is fueled by diverse thinking, so we have deliberately built our company culture around transparency, exploration, and respect for the diverse individual. Our chief scientist likes to ask people what the difference is between how they spend their free time recovering from stress or procrastinating from work versus how they spend their free time when they’re doing what they really want. After some deliberation, he likes to encourage people to go pursue that second category more. We’ve found that makes us all more productive and creative. Natural language processing and other parts of data science will continue to evolve quickly, so we are always working to upgrade our capabilities and remain on the cutting edge. In the future, it will continue to be important to retain our exploratory culture even as the tool is widely adopted and implementations become repetitive. Raising Sia requires all the responsibility and devotion of nurturing a really smart daughter, but we still aim to be a fun and happy family.