Big Data Processing for Real-Time Consumer Engagement

Project Overview

Due to the client’s rapid growth in recent years, the volume (and characteristics) of the data it collects and the analytics it must perform have increased sharply. The original big data infrastructure was being pushed beyond what it was designed to support. As a result, errors were being thrown, basic analytics work was taking far longer, and maintenance costs were spiraling out of control. Indium Software was charged with upgrading the client’s infrastructure to guarantee the lowest possible latency, superior responsiveness, seamless integration, and cost effectiveness.

About Client

The client is a pioneer in the field of marketing, having invented a mobile development platform.

Business Challenges

  • The client is a pioneer that developed a Mobile Engagement Platform, which enables enterprises to drive their marketing outreach through mobile messaging technology. It is an aggregator in the US with direct connectivity to all major wireless carriers and a best-in-class campaign management platform.
  • The client’s existing architecture used a MySQL (RDBMS) server for storing messages. This limited the insertion rate and led to an IO bottleneck. The system needed to scale horizontally, i.e. by adding more hardware, and to be configured with time-sensitive features to meet marketing SLAs. Although it handled key functions (disaster recovery, backup, and reporting), the data container stagnated beyond a capacity of 100 million, leading to process inefficiencies.
  • A Pentaho server ran the ETL process, which required heavy IO and computation. The ETL process took 11 hours on average, with spikes of up to 15 hours on promotional days and during special campaigns. This significantly reduced productivity and efficiency.
  • A PostgreSQL server handled reporting and stored the aggregated logs. Business users were unable to access the reporting data in real time. The aggregator logs held only the most recent 2 months of data for reporting; anything older required a special data access request, which typically took 24-48 hours to fulfill.