Blog 2018-03-14T17:18:50+00:00
10 October 2018

It is all about the Cloud

By |Technology|

Introduction: When Cloudera and Hortonworks announced their merger, bloggers masquerading as pundits were quick to beat their drums and claim that this spelled the slow demise of Hadoop in the face of the Cloud revolution: Hadoop was too complex, they argued, while Cloud solutions were easy for people to use. But this analysis is too naive and somewhat self-serving for the Cloud vendors. Just like it was in the best interest of Oracle, Teradata, and other [...]

27 June 2018

The Withering Data Warehouse Appliance

By |Technology|

At Esgyn we have seen many customers wanting to transition applications from their Data Warehouse Appliance, such as Greenplum or Netezza, to Hadoop. Many of these customers have already deployed Big Data applications on Hadoop, either in the Cloud or on-prem with distributions from Cloudera or Hortonworks. They are motivated by the tremendous cost savings for these data warehouse applications, given the cost of specialized hardware, steep licensing and support fees, and the cost of operating specialized environments; ease [...]

18 April 2018

Using Machine Learning Libraries in EsgynDB

By |Technology|

Machine Learning (ML) libraries are becoming very popular. There is a wide variety of such libraries; Wikipedia names 49 of them [1]. These ML libraries need data, and often that is business data stored in an RDBMS like EsgynDB or in some other form in a Hadoop Data Lake. Simple Integration – JDBC and HDFS: There are several ways to connect ML libraries and EsgynDB. One way is to use JDBC, a method supported in most software [...]

27 February 2018

Why You Should UPSERT to Put Apache™ Kafka Data into Apache™ Trafodion Tables

By |Technology|

A Brief Introduction to Apache™ Kafka: Kafka is a stream processing service in which Producers publish records into topics that are read and processed by Consumers. Kafka topics are timestamped, replicated logs of the published messages. Topics can be partitioned for increased storage capacity and improved parallelism. As depicted in Figure 1, producer processes publishing to the same topic can choose to (a) write their message to a specific partition (blue arrows) or (b) allow Kafka to distribute/load balance [...]
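The two producer options in this excerpt can be pictured with a small in-memory sketch. This is a toy model only: the `Topic` class and its `publish` method are hypothetical names invented for illustration, not the Kafka client API, and round-robin is merely an assumed stand-in for however Kafka actually balances messages when no partition is chosen.

```python
from itertools import count


class Topic:
    """Toy in-memory stand-in for a partitioned topic (hypothetical, not Kafka's API)."""

    def __init__(self, num_partitions):
        # Each partition is modeled as an append-only list of messages.
        self.partitions = [[] for _ in range(num_partitions)]
        # Cursor used only when the producer does not pick a partition.
        self._cursor = count()

    def publish(self, message, partition=None):
        """Append a message and return the index of the partition it landed in."""
        if partition is None:
            # Option (b): no partition given, so spread messages evenly.
            # Round-robin is an assumption made for this sketch.
            partition = next(self._cursor) % len(self.partitions)
        # Option (a): an explicit partition is used exactly as requested.
        self.partitions[partition].append(message)
        return partition


topic = Topic(num_partitions=3)

# Option (a): the producer targets partition 2 directly.
explicit = topic.publish("sensor-1 reading", partition=2)

# Option (b): the producer leaves placement to the topic.
balanced = [topic.publish(f"event-{i}") for i in range(6)]
```

In this sketch the explicitly placed message always lands where the producer asked, while the remaining messages are spread evenly across the three partitions.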

17 October 2017

EsgynDB Now Supports a Tight Integration with ORC

By |Technology|

One of the great strengths of the Apache Hadoop™ ecosystem is that it glues together diverse technologies to solve an unlimited set of big data problems. Gluing things together well requires attention to ease of use and to how quickly and efficiently the pieces can exchange data. EsgynDB™, a web-scale enterprise SQL-on-Apache Hadoop™ solution from Esgyn Corporation, now supports a tight integration with Apache ORC™ files. In this blog post, I’ll describe what benefits come from marrying EsgynDB and Optimized Row [...]

16 May 2017

EsgynDB is the only SQL-on-Hadoop solution to run the entire TPC-DS benchmark.

By |Technology|

The best benchmark to assess the capabilities of a database for BI/Analytics workloads is the TPC-DS benchmark. While the deep integration with Apache ORC and the performance tuning of EsgynDB for TPC-DS-type workloads are still very early in their maturity cycle, the results are already very impressive. For the operational benchmarks there was really no established competitor to compare against; for TPC-DS we used Hive, running on the Tez engine against ORC, to compare the performance against [...]

31 October 2016

Using Apache Zeppelin to build visualizations on top of Apache Trafodion – Updated

By |Technology|

Introduction: Apache Trafodion (Incubating) and EsgynDB (commercial version from Esgyn) support data visualization tools such as Apache Zeppelin and Tableau through standard JDBC/ODBC connectivity. In this article, we walk you through the steps for using Apache Zeppelin as a data visualization tool on top of Trafodion. Apache Zeppelin is a modern web-based tool that lets data scientists collaborate on large-scale data exploration and visualization. A large-scale data analysis workflow includes multiple steps such as data acquisition, [...]

8 July 2016

Though Maturing, Hadoop Ecosystem Has Room to Grow

By |Thought Leadership|

With the recent announcement of Hortonworks offering Apache HAWQ as Hortonworks HDB, the ecosystem around Hortonworks and Hadoop is continuing to evolve. At 10 years old, Hadoop has become a critical foundation for many Big Data initiatives at most global enterprises, even as its relevance is questioned from time to time in light of the latest innovations in Big Data technologies. As the foundational technologies continue to become better, and innovations continue to add value [...]

8 July 2016

Choosing the Right SQL Engine to Replace MapReduce Jobs

By |Thought Leadership|

Everybody's Doing It: Moving away from MapReduce has become a trend, driven by the quest to reduce the complexity of building and maintaining MapReduce jobs and to increase performance while leveraging existing IT resources. The move has many angles and should be treated as a strategic decision about how to move away, what should replace MapReduce jobs, and for which workloads, while considering the strategic role of Hadoop in enabling data monetization for an enterprise. MapReduce is a [...]

30 June 2016

Installing Apache Trafodion using Docker Containers

By |Product Announcements, Technology|

Using Docker to Install Apache Trafodion: We are excited to announce the availability of Apache Trafodion 2.0 (Incubating) on Docker. Developers around the world can now easily install a single-node version of Apache Trafodion on their Linux boxes in a snap. Apache Trafodion and EsgynDB (commercial version supported by Esgyn) enable you to use SQL on Hadoop and reduce or remove the need for MapReduce jobs to access and process the data. A key differentiation of Apache [...]

9 March 2016

A Design Pattern to Manage Log, IoT and Event Data

By |Technology|

A common use case for Trafodion in the IoT (Internet of Things) space, in telecommunications, and in internet security is a single very large table that records real-time events. Customers want to ingest new data at a fast rate, run queries, and age out obsolete data.