Apache Trafodion (Incubating) and EsgynDB (commercial version from Esgyn) support data visualization tools such as Apache Zeppelin and Tableau with standard JDBC/ODBC connectivity. In this article, we focus on walking you through the steps in leveraging Apache Zeppelin as a data visualization tool on top of Trafodion.
Apache Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization. Large scale data analysis workflow includes multiple steps such as data acquisition, pre-processing and visualization. Zeppelin allows users to create these multiple steps in different execution blocks/paragraphs. This entire sequence of steps/workflow is called a Notebook. Each paragraph is processed by an Interpreter. Zeppelin comes with several default interpreters. To integrate Trafodion or EsgynDB with Zeppelin, we can use the shell interpreter or the PostgreSQL interpreter or JDBC interpreter.
The Zeppelin UI (Client) connects to a Zeppelin Server which then interacts with one or more Interpreters to execute the commands in the paragraphs.
For the Trafodion/EsgynDB integration, we can use the JDBC Interpreter or the Postgres Interpreter to run SQL commands and the shell interpreter to run shell commands on Trafodion/EsgynDB instance.
- If the execution fails with connection errors or socket errors.
- Verify that HBase and Trafodion/EsgynDB is fully up and running.
- Check if the user name and password is correct.
- The interpreters timeout after a while and when this happens, you might get socket connection errors. Restarting the interpreter will fix this.
- If you still see exceptions, you can check the zeppelin log folder and check the zeppelin server and jdbc interpreter log messages.
By default, when queries are not being executed (or Zeppelin UI is idle) the interpreter closes the connection to the database. Please remember to restart the interpreter before you begin to run queries again
Sample DDL/Ingestion/Reporting workflow
You can create a workflow in Zeppelin using a sequence of paragraphs with each paragraph implementing a step in your workflow. In the following example we’ll see how you can create a new table using SQL DDL statement and then load data into and finally run a report on the loaded data, all using different paragraphs in the same notebook.
1) Creating the target Trafodion table
This step is using the JDBC interpreter to run a CREATE DDL statement.
2) View the HBase attributes of the table using HBase Shell command
This step is using the shell interpreter to run a hbase shell command.
3) Load data into the table using ODB and CSV files
This step is using the shell interpreter to run ODB tool on the EsgynDB instance.
4) Run a report on the loaded data
This step is using the JDBC interpreter to run a SQL query.