Apache Trafodion (Incubating) and EsgynDB (commercial version from Esgyn) support data visualization tools such as Apache Zeppelin and Tableau with standard JDBC/ODBC connectivity. In this article, we focus on walking you through the steps in leveraging Apache Zeppelin as a data visualization tool on top of Trafodion.
Apache Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization. Large scale data analysis workflow includes multiple steps such as data acquisition, pre-processing and visualization. Zeppelin allows users to create these multiple steps in different execution blocks/paragraphs. This entire sequence of steps/workflow is called a Notebook. Each paragraph is processed by an Interpreter. Zeppelin comes with several default interpreters. To integrate Trafodion or EsgynDB with Zeppelin, we can use the shell interpreter or the PostgreSQL interpreter or JDBC interpreter.
The Zeppelin UI (Client) connects to a Zeppelin Server which then interacts with one or more Interpreters to execute the commands in the paragraphs.
For the Trafodion/EsgynDB integration, we can use the JDBC Interpreter or the Postgres Interpreter to run SQL commands and the shell interpreter to run shell commands on Trafodion/EsgynDB instance.
Step to Configure Zeppelin for Trafodion/EsgynDB
- Copy the Trafodion T4 JDBC driver (jdbcT4.jar) to the lib directory under the Zeppelin install folder.
- Change ownership of the jdbcT4.jar to the zeppelin user id. When using HDP/Ambari a separate Zeppelin user id is created. On your system, check the ownership of the other jar files in the zeppelin lib folder and set the same ownership to the jdbcT4.jar.
- Restart Zeppelin. If you installed Zeppelin using Ambari, you can restart it using Ambari Web console.
- Open a browser and login to http://myhost:9995. Zeppelin by default runs on 9995 port. Check your Ambari configuration for details.
- You should see a welcome page and the status should show as connected.
- First step is to configure a JDBC interpreter to connect to Trafodion/EsgynDB. Click on the Interpreter menu.
- Click on the Create.
- Fill in the details as shown and click Save to create the interpreter.
- Enter a name for your interpreter
- From the interpreter drop-down list, select JDBC.
- Fill in the Trafodion JDBC driver details and specify the JDBC URL as it matches your Trafodion instance
- default.driver org.trafodion.jdbc.t4.T4Driver
- default.url jdbc:t4jdbc://myhost:23400/:
- default.user trafodion
- default.password traf123
- Click Save.
- Now Create a new notebook using the “Notebook -> Create new note” menu.
- Select your new notebook.
- In the top far right corner of the notebook, click on the Interpreter Binding icon.
- Select the new Trafodion interpreter you added and drag it to the top of the interpreter list and click Save.
- Now enter a sql command like “get schemas” and click on the Run button. You should see a list of all schemas available on your Trafodion/EsgynDB database.
- If the execution fails with connection errors or socket errors.
- Verify that HBase and Trafodion/EsgynDB is fully up and running.
- Check if the user name and password is correct.
- The interpreters timeout after a while and when this happens, you might get socket connection errors. Restarting the interpreter will fix this.
- If you still see exceptions, you can check the zeppelin log folder and check the zeppelin server and jdbc interpreter log messages.
By default, when queries are not being executed (or Zeppelin UI is idle) the interpreter closes the connection to the database. Please remember to restart the interpreter before you begin to run queries again
Sample DDL/Ingestion/Reporting workflow
You can create a workflow in Zeppelin using a sequence of paragraphs with each paragraph implementing a step in your workflow. In the following example we’ll see how you can create a new table using SQL DDL statement and then load data into and finally run a report on the loaded data, all using different paragraphs in the same notebook.
1) Creating the target Trafodion table
This step is using the JDBC interpreter to run a CREATE DDL statement.
2) View the HBase attributes of the table using HBase Shell command
This step is using the shell interpreter to run a hbase shell command.
3) Load data into the table using ODB and CSV files
This step is using the shell interpreter to run ODB tool on the EsgynDB instance.
4) Run a report on the loaded data
This step is using the JDBC interpreter to run a SQL query.