Nested Tables & Machine Drawing Text Extraction for an Oil & Gas Company

Project Overview

The client regularly handled large volumes of PDF documents containing intricate diagrams of drilling-machine parts, along with data in nested tables and other formats. They wanted this data extracted and stored in a form suitable for later analysis. For all three use cases, text processing was done using Indium’s text analytics accelerator, teX-ai.

About Client

The Client is one of the pioneers in the oil and gas business, with a focus on innovation that helps its customers fuel progress in agriculture, industry, medicine, science, space, technology, and transportation. The combination of engineering disciplines, computer science, geophysics, and metallurgy helps create a winning formula for all stakeholders in such projects.

Business Challenges

  • The client had hundreds of PDF documents, each ranging from 2 to 100 pages. In some cases, the required data appeared on only some of the pages of a document.
  • There were 5 different document formats, including engineering drawings, nested tables, and un-demarcated tables. Each format required its own model.