Solutions delivering competitive advantage and differentiation through faster business insight
Online Media Providers
In an ad-driven revenue model, the information needed to select the correct ad comes from many places, in the form of both structured information and log messages. Analyzing this data torrent and serving the right ads in the right place at the right time as the user base expands is crucial to keeping the business growing.
The Business Problem
Online media providers face a challenge every day: many users are streaming content, and that content is supported by ad-based revenue. The ads that users see should be targeted to them based on their listening patterns, locations, and more, and selected in real time to deliver maximum revenue. The traditional approach analyzes logs stored in Hadoop/Hive, with key information extracted and stored in a structured database. The ads application needs to detect trends in near real time, which is incompatible with an ETL process that has large data ingest requirements.
Every day, the media company gets 2TB of logs collected from everywhere, with peak ingest of more than 42GB per hour. Processing these logs in Hive and writing them separately to a traditional RDBMS just can’t keep up with the growing volume of data. The initial application misses critical data, and keeping data in both Hive tables and the RDBMS requires a great deal of redundant data, processing, and operational effort.
EsgynDB provides a solution with direct SQL access to native Hive tables and the capability to store query results directly in EsgynDB tables, removing the ETL bottleneck and the data duplication. All the data sits in the same Hadoop data lake, and from an application design perspective it is no longer necessary to specify and implement a separate ETL process: the data is dynamically accessible through regular queries. The result is the near real-time ad processing that the media provider was looking for, in a solution that scales.
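
The pattern described above, querying the raw log table and storing the result in a second table with a single SQL statement, can be illustrated with a minimal sketch. This is not EsgynDB code: Python's built-in sqlite3 module stands in for the data store so the example runs anywhere, and the table names (stream_logs, genre_trends), the columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Stand-in for the shared data store (assumption: sqlite3 is used only
# so the sketch is runnable; it is not how EsgynDB or Hive is accessed).
conn = sqlite3.connect(":memory:")

# Hypothetical raw log table, playing the role of a native Hive table.
conn.execute("CREATE TABLE stream_logs (user_id INTEGER, region TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO stream_logs VALUES (?, ?, ?)",
    [(1, "US", "jazz"), (2, "US", "jazz"), (3, "US", "rock"), (4, "DE", "rock")],
)

# Hypothetical results table, playing the role of an EsgynDB table.
conn.execute("CREATE TABLE genre_trends (region TEXT, genre TEXT, plays INTEGER)")

# One statement reads the logs and stores the aggregated result;
# there is no separate export-and-load (ETL) step in between.
conn.execute(
    "INSERT INTO genre_trends "
    "SELECT region, genre, COUNT(*) FROM stream_logs GROUP BY region, genre"
)

# The ads application would then read trends with a regular query.
trends = conn.execute(
    "SELECT region, genre, plays FROM genre_trends ORDER BY region, genre"
).fetchall()
print(trends)
```

The point of the sketch is the shape of the workflow rather than the engine: the summary table is populated by a regular query over the log table, so there is no second copy of the raw data to keep in sync.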