Discussion:
Shark create table error "FAILED: Hive Internal Error: java.lang.reflect.InvocationTargetException"
Patrick Liu
2014-08-07 10:06:07 UTC
Hi,

Please help me with this problem!

*Base setup environment:*
[***@SVR4036HW2285 ~]$ lsb_release -a
LSB Version:
:base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.4 (Final)
Release: 6.4
Codename: Final

[***@SVR4036HW2285 ~]$ java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)


*Running Mode:*
Spark-standalone
Shark on Spark (we want to use Shark to run ad-hoc queries)

*Version:*
Hadoop: Cloudera 2.0.0-cdh4.6.0
Spark: 0.9.1
Shark: 0.9.1

*What I have done:*
*First,*
I set up Spark in standalone mode.
I compiled Spark 0.9.1 from source with: *"SPARK_HADOOP_VERSION=2.0.0-cdh4.6.0 sbt/sbt assembly"*
The Spark standalone cluster works well: I can connect a spark-shell to it or
submit applications to it, and those applications read data from HDFS and
return correct results.
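
For reference, this is roughly how I verify the standalone cluster from a
spark-shell (a minimal sketch; the HDFS path below is just an example, not the
real data set):

$ MASTER=spark://SVR4036HW2285.hadoop.uat.qa.nt.ctripcorp.com:17777 ./bin/spark-shell
scala> // count the lines of a sample file on HDFS (path is hypothetical)
scala> val lines = sc.textFile("hdfs://ns/tmp/sample.txt")
scala> lines.count()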

*Second,*
I compiled Shark with: *"SHARK_HADOOP_VERSION=2.0.0-cdh4.6.0 sbt/sbt package"*
I start SharkServer2, and I can see the Shark application running.
Then I connect to SharkServer2 with beeline.
However, only SELECT statements work, for example:
select * from src;
select count(*) from src;
select * from src where key > 300;
select count(*) from src where key > 300;

When I run "create table t1(c1 int, c2 string)", the following error occurs.

*Exception:*
FAILED: Hive Internal Error: java.lang.reflect.InvocationTargetException(null)
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Hive Internal Error: java.lang.reflect.InvocationTargetException(null)
    at shark.server.SharkSQLOperation.run(SharkSQLOperation.scala:45)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:180)
    at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:152)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:203)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TUGIContainingProcessor$2.run(TUGIContainingProcessor.java:64)
    at org.apache.hive.service.auth.TUGIContainingProcessor$2.run(TUGIContainingProcessor.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:524)
    at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:61)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)


How can I solve this problem?

This is my *shark-env.sh*:
# (Required) Amount of memory used per slave node. This should be in the
# same format as the JVM's -Xmx option, e.g. 300m or 1g.
export SPARK_MEM=1g

# (Required) Set the master program's memory
export SHARK_MASTER_MEM=1g

# For running Shark in distributed mode, set the following:
export HADOOP_HOME="/usr/lib/hadoop"
export SPARK_HOME="/opt/app/spark"
export MASTER="spark://SVR4036HW2285.hadoop.uat.qa.nt.ctripcorp.com:17777"

# Java options
# On EC2, change the local.dir to /mnt/tmp
SPARK_JAVA_OPTS=" -Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS

This is my *spark-env.sh*:
# If using the standalone deploy mode, you can also set variables for it here:
SPARK_MASTER_IP=SVR4036HW2285.hadoop.com   #, to bind the master to a different IP address or hostname
SPARK_MASTER_PORT=17777   # / SPARK_MASTER_WEBUI_PORT, to use non-default ports
SPARK_MASTER_WEBUI_PORT=18888
SPARK_WORKER_CORES=2   #, to set the number of cores to use on this machine
SPARK_WORKER_MEMORY=4g   #, to set how much memory to use (e.g. 1000m, 2g)
SPARK_WORKER_PORT=18889   # / SPARK_WORKER_WEBUI_PORT
SPARK_WORKER_INSTANCES=1   #, to set the number of worker processes per node
SPARK_WORKER_DIR=/opt/app/spark-worker   #, to set the working directory of worker processes
SPARK_LOG_DIR=/opt/log/spark

# SPARK_CONF_DIR=/etc/hadoop/conf

# SSH
SPARK_SSH_OPTS=" -p 1022 "

export SPARK_JAR=/opt/app/spark/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.0.0-cdh4.6.0.jar


This is my *hive-site.xml*:
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive Execution Parameters -->
<property>
<name>fs.default.name</name>
<value>hdfs://ns</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>ns</value>
</property>
<!--for hadoop ha-->
<property>
<name>dfs.ha.namenodes.ns</name>
<value>Node1,Node2</value>
</property>
<!--dfs.namenode.rpc-address.[nameservice ID].-->
<property>
<name>dfs.namenode.rpc-address.ns.Node1</name>
<value>datanode01:54310</value>
<!--value>SVR2368HP360:54310</value-->
</property>
<property>
<name>dfs.namenode.rpc-address.ns.Node2</name>
<value>datanode02:54310</value>
<!--value>SVR2369HP360:54310</value-->
</property>
<!--failover -->
<property>
<name>dfs.client.failover.proxy.provider.ns</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hivemeta2db</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>admin_123</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>false</value>
</property>
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>zookeeper01,zookeeper02,zookeeper03</value>
<description>Zookeeper quorum used by Hive's Table Lock
Manager</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>Hive warehouse directory</description>
</property>
<property>
<name>mapred.job.tracker</name>
<value>datanode01:8032</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.type</name>
<value>BLOCK</value>
</property>
<property>
<name>hive.exec.show.job.failure.debug.info</name>
<value>true</value>
<description>If a job fails, whether to provide a link in the CLI to
the task with the
most failures, along with debugging hints if applicable.</description>
</property>
<property>
<name>hive.security.authorization.createtable.owner.grants</name>
<value>ALL</value>
<description>the privileges automatically granted to the owner whenever
a table gets created.
An example like &quot;select,drop&quot; will grant select and drop
privilege to the owner of the table</description>
</property>
<property>
<name>hive.hwi.listen.host</name>
<value>0.0.0.0</value>
<description>This is the host address the Hive Web Interface will
listen on</description>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
<description>This is the port the Hive Web Interface will listen
on</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/lib/hive-hwi-0.10.0-cdh4.2.0.war</value>
<description>This is the WAR file with the jsp content for Hive Web
Interface</description>
</property>
<property>
<name>hive.aux.jars.path</name>

<value>file:///usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.6.0.jar,file:///usr/lib/hbase/hbase-0.94.15-cdh4.6.0-security.jar,file:///usr/lib/zookeeper/zookeeper.jar</value>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.execute.setugi</name>
<value>true</value>
<description>In unsecure mode, setting this property to true causes
the metastore to execute DFS operations using the client's reported user
and group permissions. Note that this property must be set on both the
client and server sides. Further note that it is best effort: if the client
sets it to true and the server sets it to false, the client setting will be
ignored.</description>
</property>
<property>
<name>hive.security.authorization.enabled</name>
<value>true</value>
<description>enable or disable the hive client
authorization</description>
</property>
<property>
<name>hive.metastore.authorization.storage.checks</name>
<value>true</value>
</property>
</configuration>



Thanks so much!