Connect and query Cloudera Impala using PHP ODBC on CentOS 7

Cloudera Impala is a SQL query engine for Hadoop. Impala is designed for real-time (interactive) SQL queries, in contrast to MapReduce-based batch processing tools such as Hive or Pig, because Impala does not depend on MapReduce. In one of my ongoing projects, which runs Impala on Hadoop, I had to set up access to Impala from PHP so that the web developers could start using it.

This blog post explains how to query Impala using PHP and ODBC. Cloudera, the company behind Impala, provides an RPM of its ODBC driver for Impala. The following steps show how to install, configure and use the Impala ODBC driver with PHP 5.4 on the 64-bit CentOS 7 Linux distribution.

Download the Impala ODBC driver from http://www.cloudera.com/content/cloudera/en/downloads/connectors/impala/odbc/impala-odbc-v2-5-29.html. On this page, click the “Download Bits” button next to Linux RHEL6 – 64 Bit. After you fill in the popup form, the RPM starts downloading. As of this writing, the RPM's filename is ClouderaImpalaODBC-2.5.29.1009-1.el6.x86_64.rpm.

Install the RPM on your PHP-powered web server:

rpm -ivh ClouderaImpalaODBC-2.5.29.1009-1.el6.x86_64.rpm

On CentOS 7, the above command will fail with the following error:

cyrus-sasl-gssapi >= 2.1.22 is needed by ClouderaImpalaODBC-2.5.29.1009-1.x86_64
cyrus-sasl-plain >= 2.1.22 is needed by ClouderaImpalaODBC-2.5.29.1009-1.x86_64
libsasl2.so.2()(64bit) is needed by ClouderaImpalaODBC-2.5.29.1009-1.x86_64

Now install cyrus-sasl-gssapi and cyrus-sasl-plain:

yum install cyrus-sasl-gssapi cyrus-sasl-plain

Then run the installation again:

rpm -ivh ClouderaImpalaODBC-2.5.29.1009-1.el6.x86_64.rpm

This should now leave you with only one error:

libsasl2.so.2()(64bit) is needed by ClouderaImpalaODBC-2.5.29.1009-1.x86_64

Next, install the RPM with the --nodeps option, which tells rpm to skip its dependency check:

rpm -ivh ClouderaImpalaODBC-2.5.29.1009-1.el6.x86_64.rpm --nodeps

The driver still needs libsasl2.so.2 at runtime, but CentOS 7 ships libsasl2.so.3 instead. To satisfy the driver, create a symlink:

ln -s /usr/lib64/libsasl2.so.3 /usr/lib64/libsasl2.so.2
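The symlink step can be made safe to re-run. The sketch below uses a hypothetical helper, link_libsasl2, which takes the library directory as an argument (defaulting to /usr/lib64) so it can be tried on a scratch directory first. It refuses to create a dangling link when libsasl2.so.3 is missing and leaves an existing libsasl2.so.2 alone.

```shell
# Hypothetical helper: link libsasl2.so.2 to the libsasl2.so.3 shipped with CentOS 7.
# The library directory defaults to /usr/lib64; pass another directory to test it.
link_libsasl2() {
    local libdir="${1:-/usr/lib64}"

    # Do not create a dangling link if the CentOS 7 library is not there
    if [ ! -e "$libdir/libsasl2.so.3" ]; then
        echo "libsasl2.so.3 not found in $libdir" >&2
        return 1
    fi

    # An existing file or link is left untouched, so re-running is a no-op
    if [ -e "$libdir/libsasl2.so.2" ] || [ -L "$libdir/libsasl2.so.2" ]; then
        return 0
    fi

    ln -s "$libdir/libsasl2.so.3" "$libdir/libsasl2.so.2"
}
```

Run it as root with no argument to act on /usr/lib64, which is equivalent to the ln -s command above.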

Install unixODBC as follows:

yum install unixODBC

Configure the ODBC
Copy the files odbc.ini and odbcinst.ini from /opt/cloudera/impalaodbc/Setup to the /etc directory, overwriting the existing files. Next, open the file cloudera.impalaodbc.ini, found in the directory /opt/cloudera/impalaodbc/lib/64, in a text editor. By default it points the driver at the installer library of the iODBC driver manager, while this setup uses unixODBC, so comment out the line that reads ODBCInstLib=libiodbcinst.so by prefixing it with a #:

#ODBCInstLib=libiodbcinst.so

Next, add the following line towards the end of the file:

ODBCInstLib=libodbcinst.so

Save the file.
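The configuration steps above can be scripted. This is a minimal sketch using a hypothetical helper, configure_impala_odbc, whose default paths are the ones used in this post; passing other paths lets you try it on scratch copies first. It assumes GNU sed (for in-place editing with -i), which CentOS 7 ships, and appends the unixODBC line at the end of the file.

```shell
# Hypothetical helper automating the "Configure the ODBC" steps.
# Arguments (all optional): Setup directory, target etc directory, driver ini file.
configure_impala_odbc() {
    local setup_dir="${1:-/opt/cloudera/impalaodbc/Setup}"
    local etc_dir="${2:-/etc}"
    local driver_ini="${3:-/opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini}"

    # Copy the sample DSN and driver definitions, overwriting existing files
    cp -f "$setup_dir/odbc.ini" "$setup_dir/odbcinst.ini" "$etc_dir/" || return 1

    # Comment out the iODBC installer library; a line already starting with # is skipped
    sed -i 's/^ODBCInstLib=libiodbcinst\.so/#&/' "$driver_ini" || return 1

    # Append the unixODBC installer library unless that exact line already exists
    if ! grep -qx 'ODBCInstLib=libodbcinst\.so' "$driver_ini"; then
        echo 'ODBCInstLib=libodbcinst.so' >> "$driver_ini"
    fi
}
```

Because of the two guards, running it a second time leaves the driver ini file unchanged.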

The PHP side
Install php-odbc as:

yum install php-odbc

Reload the Apache web server:

service httpd reload

Now the following PHP code should work and query data in Impala via ODBC:

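<?php

// DSN name from /etc/odbc.ini, plus the Impala host, port and database to use
$dsn = "DSN=Sample Cloudera Impala DSN 64;host=192.168.0.5;port=21050;database=bigdata;";

$conn = odbc_connect($dsn, '', '');
if ($conn === false) {
    die("Connection failed: " . odbc_errormsg() . "\n");
}

$result = odbc_exec($conn, "select * from tbl limit 10");
if ($result === false) {
    die("Query failed: " . odbc_errormsg($conn) . "\n");
}

// Print each row as an associative array (column name => value)
while ($row = odbc_fetch_array($result)) {
    print_r($row);
}

odbc_close($conn);
?>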

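In the above code, replace 192.168.0.5 with the name or IP address of a DataNode that runs the Impala daemon (impalad), which accepts ODBC connections on port 21050 by default. If you point it at a host that does not run impalad, such as the NameNode here, you will get the following error: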

PHP Warning:  odbc_connect(): SQL error: [unixODBC][Cloudera][ImpalaODBC] (100) Error from the Impala Thrift API: 
connect() failed: Connection refused, SQL state S1000 in SQLConnect

Replace bigdata in the DSN with the name of your database, and replace the table name (tbl) in the query select * from tbl limit 10 with the name of your table.
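When odbc_connect() fails with Connection refused, it can help to check from the web server whether anything is listening on the Impala port at all, independently of PHP and ODBC. This is a minimal sketch assuming bash (for its /dev/tcp redirection) and the coreutils timeout command; impala_port_open is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Arguments: host, port (default 21050), timeout in seconds (default 3).
impala_port_open() {
    local host="$1" port="${2:-21050}" secs="${3:-3}"
    # bash opens a TCP socket when redirecting from /dev/tcp/HOST/PORT;
    # host and port are passed as $0 and $1 to avoid quoting problems
    timeout "$secs" bash -c ': < "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, impala_port_open 192.168.0.5 21050 && echo open prints open only when something accepts connections on that host and port. If it fails, the problem lies with the host or port rather than with the ODBC configuration.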

This blog post aims to get you started with PHP, ODBC and Impala using the default configuration with minimal changes. So feel free to change the ugly DSN name “Sample Cloudera Impala DSN 64” in /etc/odbcinst.ini, /etc/odbc.ini and the PHP script.
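Because the DSN name must match in every file, renaming it is a good fit for one sed pass over all of them. The sketch below uses a hypothetical helper, rename_dsn, and assumes GNU sed (-i). It escapes characters such as /, & and . so that the names are treated literally.

```shell
# Hypothetical helper: replace every occurrence of one DSN name with another,
# in place, in all files given after the two names.
rename_dsn() {
    local old="$1" new="$2"
    shift 2
    local old_re new_re
    # Escape regex metacharacters in the old name and / or & in the new one
    old_re=$(printf '%s\n' "$old" | sed 's/[]\/$*.^[]/\\&/g')
    new_re=$(printf '%s\n' "$new" | sed 's/[\/&]/\\&/g')
    sed -i "s/$old_re/$new_re/g" "$@"
}
```

For example, rename_dsn "Sample Cloudera Impala DSN 64" "Impala" /etc/odbcinst.ini /etc/odbc.ini /var/www/html/impala.php, where the PHP script path is only an illustration. Afterwards the PHP connection string starts with DSN=Impala;.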

Tweet your comments/questions to me @shekharg

Discussion

2 comments for “Connect and query Cloudera Impala using PHP ODBC on CentOS 7”

  1. RT @shekharg: New blog post: Connect and query Cloudera Impala using PHP ODBC on CentOS 7 http://t.co/3jtnkSNJss http://t.co/r5jcHo9cD7

    Posted by pankaj_shukla | August 20, 2015, 8:54 pm
  2. RT @shekharg: New blog post: Connect and query Cloudera Impala using PHP ODBC on CentOS 7 http://t.co/3jtnkSNJss http://t.co/r5jcHo9cD7

    Posted by sharat_j | August 21, 2015, 7:34 pm
