One of the most challenging areas of research involves combing through articles to develop new insights and ideas. Welcome to an interview with Paul Cleverley, geologist and data scientist, who has developed a machine learning algorithm designed to comb through unstructured geological texts and pull together new intellectual property. Machine learning programs and start-ups will be included in AAPG’s U-Pitch events at ACE and URTeC.
What is your name and your background?
My name is Paul Cleverley, a Geologist by degree (University College Swansea, Wales, UK) with a Masters in Computing in Earth Sciences (Keele University, England, UK). For my dissertation in 1991, I wrote a computer program to automatically interpret lithology from wireline log data, and automatically generate 'well summary sheets'. At the time, this summarization task was handled manually, which was time-consuming and often inconsistent.
I worked as a Geoscientist and Data Manager at various Oil and Gas (O&G) Exploration companies during the 'first' digital transformation wave in the early 1990s, as the industry moved from paper to workstations. I joined IBM's Petroleum Group in the late 1990s working on government data banks. It was here that my imagination was first sparked towards unstructured information (documents), subsequently working on integration with structured data and GIS at Enterprise Oil.
During the mid-2000s my attention was increasingly drawn to the potential to exploit latent patterns that exist in unstructured text to stimulate new knowledge. I took up the role of Head of Knowledge Management at a Supermajor for a few years. Subsequently I consulted extensively on the topic to numerous O&G companies over a period of 20 years, in Europe, North America, South America and the Middle East. Working with some great people and in some amazing teams (I still do!) led to picking up business innovation awards from the British Computer Society and Her Majesty Queen Elizabeth II.
As the industry entered a 'second' digital transformation wave, from rule-based towards data-driven intelligent assistants, so-called Artificial Intelligence (AI), I was excited by the opportunities. In 2013 I embarked (self-funded) on a part-time PhD on Enterprise Search & Text analytics at Robert Gordon University (Aberdeen, Scotland, UK). This gave me the opportunity to research and collaborate with other sectors such as Life Sciences, Retail, Aerospace and NASA. I completed my dissertation in 2017, and it was recognized among the top PhDs internationally in Information Science & Technology.
Late last year (2018) I founded a new tech start-up 'Infoscience Technologies Ltd'. The aim is to conduct research & development (a tech lab) into extracting knowledge from Geoscience text and produce Intellectual Property (IP) that can be licensed. I also hold several unpaid roles related to geoscience and computer science, having the privilege to serve on the Board of the non-profit cooperative GeoscienceWorld and as a Lecturer at Robert Gordon University.
How did you get started in innovation?
Writing computer games at an early age in the early 1980s, at the birth of home computing, probably helped me adopt an experimental mindset in this area. Being of ‘dual discipline’ (geoscience and computer science) has no doubt substantially helped the process of innovating and the opportunities it has afforded me.
What is your product?
What surprises one Geoscientist in text may not necessarily surprise another; however, certain algorithms may have a greater propensity to surface 'the surprising'. My research indicated a strong need from geoscientists to have search engines “show me something I don’t already know”. Explorers are, after all, in the ideas business. With over 90% of people never clicking past page 2 of search results in information volumes too vast for us to ever read, some knowledge may be hidden from us by ‘standard’ search engine ranking algorithms. If we can also surface what are likely to be the ‘most surprising’ extracts and suggest these as well as the standard ‘ten blue links’, it may stimulate a geoscientist to learn something that is unexpected, insightful and valuable: something they would not otherwise have encountered had it not been for the algorithm. Facilitating Serendipity.
I'm currently working on an algorithm written in Python to detect 'surprising' sentences in Geoscience text. The algorithm is called GEODE. This can be plugged into an existing search technology deployment to surface potentially surprising information buried deep in search results.
What does it do?
The program reads geoscience text from XML, PPT, PDF and Word documents, assigns a potential 'surprisingness score' to each sentence, and records the URL/URI of the document the sentence originated from. Natural Language Processing (NLP) to detect ‘meaning’ and labelled examples in Machine Learning (ML) classifiers for ‘prediction’ are combined in GEODE. There are currently over 5,000 informative features utilized in GEODE.
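To make the idea of a per-sentence score concrete, here is a minimal Python sketch. It is not GEODE: it swaps the NLP features and trained ML classifiers described above for a small hand-written list of cue phrases. The cue phrases, their weights, and the names `CUE_WEIGHTS`, `ScoredSentence`, `split_sentences`, `surprisingness_score` and `score_document` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical cue phrases and weights, chosen for illustration only.
# A real system would learn its features from labelled examples.
CUE_WEIGHTS = {
    "unexpected": 1.0,
    "than previously thought": 1.0,
    "not apparent": 0.8,
    "could suggest": 0.6,
    "now interpreted": 0.6,
}


@dataclass
class ScoredSentence:
    text: str
    score: float
    source_uri: str  # URL/URI of the document the sentence came from


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def surprisingness_score(sentence: str) -> float:
    """Sum the weights of every cue phrase found in the sentence."""
    lowered = sentence.lower()
    return sum(w for cue, w in CUE_WEIGHTS.items() if cue in lowered)


def score_document(text: str, source_uri: str) -> list[ScoredSentence]:
    """Score each sentence and keep a pointer back to its source document."""
    return [
        ScoredSentence(s, surprisingness_score(s), source_uri)
        for s in split_sentences(text)
    ]
```

Calling `score_document(text, uri)` on already-extracted plain text returns one `ScoredSentence` per sentence; extracting that text from XML, PPT, PDF or Word files is outside the scope of this sketch.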
Can you give a few examples / case studies?
Using public domain petroleum geoscience reports to illustrate, the sentences below all received a high ‘surprisingness score’ based on the GEODE algorithm.
- …the offshore Cenozoic sedimentary section is now interpreted as much thicker than previously thought…
- …it is probably located at depth larger than 4000 ms in the graben and even deeper in other parts of the study area…
- …the presence of these markers could suggest a link to the upper Triassic petroleum system…
- …combined stratigraphic and structural traps that are not apparent on structure maps may exist along the eastern edge of…
- …reservoir targets within the Cyprus basin will be similar to those in the Levantine basin...
- …found a completely unexpected 64 meter thick section of middle solling sandstone member…
As these are generated from document search results, there will be some ‘contextual closeness’ to the query terms a geoscientist has entered, reducing the risk of overt distraction. If a geoscientist is somewhat ‘surprised’ by any of these suggestions, they can subsequently click through from the sentence to read the context; this may give rise to a learning event they might not otherwise have had. The algorithm acts like a creative assistant.
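As a rough illustration of how such suggestions could sit alongside an existing result list, the Python sketch below takes search results as (URI, text) pairs, scores every sentence with whatever scoring function is supplied, and returns the highest-scoring sentences together with the rank and URI of the document they came from, so a reader can click through. The function names, the tuple layout and the toy scorer are assumptions for illustration; they do not describe how GEODE integrates with any particular search product.

```python
import heapq
import re


def toy_score(sentence):
    """Placeholder scorer (an assumption): flags one obvious cue word."""
    return 1.0 if "unexpected" in sentence.lower() else 0.0


def surface_surprising(results, score_fn, top_k=3):
    """Return up to top_k (score, rank, uri, sentence) tuples.

    `results` is an iterable of (uri, text) pairs in search-rank order.
    Sentences are pooled across all results, so a high-scoring sentence
    from a deeply ranked document can still be surfaced.
    """
    candidates = []
    for rank, (uri, text) in enumerate(results, start=1):
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                candidates.append((score_fn(sentence), rank, uri, sentence))
    best = heapq.nlargest(top_k, candidates, key=lambda c: c[0])
    # Drop sentences that scored zero: there is nothing to suggest.
    return [c for c in best if c[0] > 0]


if __name__ == "__main__":
    demo_results = [
        ("doc://a", "Routine drilling report. Nothing new."),
        ("doc://b", "Logs were run. An unexpected sandstone was found."),
    ]
    for score, rank, uri, sentence in surface_surprising(demo_results, toy_score):
        print(f"{score:.1f}  (result #{rank}, {uri})  {sentence}")
```

Because the sentences come only from documents the query already matched, the suggestions stay close to the query context, which is the ‘contextual closeness’ point made above.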
What are your plans for the future?
More experimentation! I am also working on another algorithm (GEOSAPIEN) to surface geological analogues from text using theory-guided machine learning. There are more ideas in the pipeline, and I am having fun while I do it.
Infoscience Technologies can be found at www.infosciencetechnologies.com and Paul’s research blog at www.paulhcleverley.com