Installing Hive and Impala on CDH 5.5 with yum: A Detailed Walkthrough

Updated: 2016-10-20 10:36:05  Author: Osc_Yumi

This article walks through installing Hive and Impala on CDH 5.5 with yum, step by step. Readers setting up a similar cluster may find it a useful reference.

I. Installing Hive

The components are laid out as follows:

172.16.57.75 bd-ops-test-75 mysql-server
172.16.57.77 bd-ops-test-77 Hiveserver2 HiveMetaStore

1. Install Hive

Install Hive on node 77:

# yum install hive hive-metastore hive-server2 hive-jdbc hive-hbase -y

On the other nodes, install the client packages:

# yum install hive hive-server2 hive-jdbc hive-hbase -y

2. Install MySQL

Install MySQL with yum:

# yum install mysql mysql-devel mysql-server mysql-libs -y

Start the database:

# Enable start at boot, then start the service
# chkconfig mysqld on
# service mysqld start

Install the JDBC driver:

# yum install mysql-connector-java
# ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar

Set the initial MySQL root password to bigdata:

# mysqladmin -uroot password 'bigdata'

Log in to MySQL and run the following:

CREATE DATABASE metastore;
USE metastore;
SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-1.1.0.mysql.sql;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'localhost';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%' IDENTIFIED BY 'hive';
FLUSH PRIVILEGES;

Note: this creates a user named hive with the password hive; change both as needed.

Edit the following properties in hive-site.xml:

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://172.16.57.75:3306/metastore?useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
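
The two properties above cover the JDBC URL and the driver class; the metastore also needs the credentials of the MySQL account created earlier. The following is a minimal sketch, assuming the standard Hive properties javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword and the hive/hive account from the SQL above. The variable and function names are illustrative only, and the script merely prints the fragment so it can be reviewed before being pasted into hive-site.xml.

```shell
#!/bin/sh
# Print the extra metastore properties for hive-site.xml.
# METASTORE_USER / METASTORE_PASS are illustrative variable names; the
# defaults match the 'hive' account created in MySQL earlier.
METASTORE_USER="${METASTORE_USER:-hive}"
METASTORE_PASS="${METASTORE_PASS:-hive}"

metastore_credentials_xml() {
  cat <<EOF
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>${METASTORE_USER}</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>${METASTORE_PASS}</value>
</property>
EOF
}

metastore_credentials_xml
```

If the account name or password was changed in MySQL, export METASTORE_USER and METASTORE_PASS before running the script.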

3. Configure Hive

Edit /etc/hadoop/conf/hadoop-env.sh and add the HADOOP_MAPRED_HOME environment variable. Without it, running MapReduce jobs on YARN fails with an UNKNOWN RPC TYPE exception:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

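Create the Hive warehouse directory in HDFS:

Hive's warehouse defaults to /user/hive/warehouse in HDFS. Setting its permissions to 1777 is recommended, so that every user can create and access tables but cannot delete tables owned by someone else.

Every user who queries Hive must have an HDFS home directory under /user (for example, /user/root for the root user).
/tmp on the node that runs Hive must be world-writable.

Create the directories and set the permissions: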

# sudo -u hdfs hadoop fs -mkdir /user/hive
# sudo -u hdfs hadoop fs -chown hive /user/hive
# sudo -u hdfs hadoop fs -mkdir /user/hive/warehouse
# sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
# sudo -u hdfs hadoop fs -chown hive /user/hive/warehouse
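
The requirement that every Hive user has an HDFS home directory can be scripted along the same lines. This is a sketch under assumptions: the user list is made up for illustration (only root appears in the text above), and DRY_RUN defaults to 1 so the script prints the hadoop commands instead of executing them.

```shell
#!/bin/sh
# Create an HDFS home directory for each user that will query Hive.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 on a
# node with the hadoop client installed to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

make_hdfs_home() {
  user="$1"
  run sudo -u hdfs hadoop fs -mkdir -p "/user/${user}"
  run sudo -u hdfs hadoop fs -chown "${user}" "/user/${user}"
}

# 'root' comes from the example above; 'alice' is a made-up account name.
for u in root alice; do
  make_hdfs_home "$u"
done
```

Review the printed commands first, then rerun with DRY_RUN=0 once the user list is correct.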

Set the JDK environment variable in hive-env.sh:

# vim /etc/hive/conf/hive-env.sh
export JAVA_HOME=/opt/programs/jdk1.7.0_67

Start the metastore and HiveServer2:

# service hive-metastore start
# service hive-server2 start

4. Testing

$ hive -e 'create table t(id int);'
$ hive -e 'select * from t limit 2;'
$ hive -e 'select id from t;'

Connect with beeline:

$ beeline
beeline> !connect jdbc:hive2://localhost:10000

5. Integrating with HBase

First install hive-hbase:

# yum install hive-hbase -y

If you are on CDH4, run the following in the Hive shell to add the jars:

ADD JAR /usr/lib/hive/lib/zookeeper.jar;
ADD JAR /usr/lib/hive/lib/hbase.jar;
ADD JAR /usr/lib/hive/lib/hive-hbase-handler-<hive_version>.jar;
-- Use the guava version that is actually installed.
ADD JAR /usr/lib/hive/lib/guava-11.0.2.jar;

If you are on CDH5, run the following in the Hive shell to add the jars:

ADD JAR /usr/lib/hive/lib/zookeeper.jar;
ADD JAR /usr/lib/hive/lib/hive-hbase-handler.jar;
ADD JAR /usr/lib/hbase/lib/guava-12.0.1.jar;
ADD JAR /usr/lib/hbase/hbase-client.jar;
ADD JAR /usr/lib/hbase/hbase-common.jar;
ADD JAR /usr/lib/hbase/hbase-hadoop-compat.jar;
ADD JAR /usr/lib/hbase/hbase-hadoop2-compat.jar;
ADD JAR /usr/lib/hbase/hbase-protocol.jar;
ADD JAR /usr/lib/hbase/hbase-server.jar;

Alternatively, configure these jars through the hive.aux.jars.path property in hive-site.xml, or by setting export HIVE_AUX_JARS_PATH= in hive-env.sh.
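
As a rough illustration of the hive-env.sh route, the sketch below joins the CDH5 jar list into a single comma-separated value. It assumes that HIVE_AUX_JARS_PATH accepts a comma-separated list of jar paths; the paths are copied from the ADD JAR statements above and the guava version should be adjusted to what is installed.

```shell
#!/bin/sh
# Build a comma-separated HIVE_AUX_JARS_PATH from the CDH5 jar list above.
JARS="
/usr/lib/hive/lib/zookeeper.jar
/usr/lib/hive/lib/hive-hbase-handler.jar
/usr/lib/hbase/lib/guava-12.0.1.jar
/usr/lib/hbase/hbase-client.jar
/usr/lib/hbase/hbase-common.jar
/usr/lib/hbase/hbase-hadoop-compat.jar
/usr/lib/hbase/hbase-hadoop2-compat.jar
/usr/lib/hbase/hbase-protocol.jar
/usr/lib/hbase/hbase-server.jar
"

HIVE_AUX_JARS_PATH=""
for jar in $JARS; do
  # Warn about jars missing on this node, but keep them in the list.
  [ -e "$jar" ] || echo "warning: $jar not found on this node" >&2
  HIVE_AUX_JARS_PATH="${HIVE_AUX_JARS_PATH:+${HIVE_AUX_JARS_PATH},}${jar}"
done

# This is the line to place in /etc/hive/conf/hive-env.sh.
echo "export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}"
```

Setting the variable once in hive-env.sh avoids repeating the ADD JAR statements in every Hive session.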

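II. Installing Impala

Like Hive, Impala can work directly with HDFS and HBase. The difference is that Hive and other frameworks built on MapReduce suit long-running batch jobs, such as extract, transform, load (ETL) jobs, whereas Impala is mainly used for real-time queries.

The components are distributed as follows: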

172.16.57.74 bd-ops-test-74 impala-state-store impala-catalog impala-server 
172.16.57.75 bd-ops-test-75 impala-server
172.16.57.76 bd-ops-test-76 impala-server
172.16.57.77 bd-ops-test-77 impala-server

1. Installation

Install on node 74:

# yum install impala-state-store impala-catalog impala-server -y

Install on nodes 75, 76, and 77:

# yum install impala-server -y

2. Configuration

2.1 Edit the configuration files

Find the installation paths:

# find / -name impala
/var/run/impala
/var/lib/alternatives/impala
/var/log/impala
/usr/lib/impala
/etc/alternatives/impala
/etc/default/impala
/etc/impala
/etc/default/impala

The impalad configuration directory is set by the IMPALA_CONF_DIR environment variable and defaults to /usr/lib/impala/conf. Impala's default settings live in /etc/default/impala; edit IMPALA_CATALOG_SERVICE_HOST and IMPALA_STATE_STORE_HOST in that file:

IMPALA_CATALOG_SERVICE_HOST=bd-ops-test-74
IMPALA_STATE_STORE_HOST=bd-ops-test-74
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala
IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} -sentry_config=/etc/impala/conf/sentry-site.xml"
IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-use_local_tz_for_unix_timestamp_conversions=true \
-convert_legacy_hive_parquet_utc_timestamps=true \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} \
-server_name=server1 \
-sentry_config=/etc/impala/conf/sentry-site.xml"
ENABLE_CORE_DUMPS=false
# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib
# MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
# IMPALA_BIN=/usr/lib/impala/sbin
# IMPALA_HOME=/usr/lib/impala
# HIVE_HOME=/usr/lib/hive
# HBASE_HOME=/usr/lib/hbase
# IMPALA_CONF_DIR=/etc/impala/conf
# HADOOP_CONF_DIR=/etc/impala/conf
# HIVE_CONF_DIR=/etc/impala/conf
# HBASE_CONF_DIR=/etc/impala/conf

To cap the memory Impala may use, append -mem_limit=70% to the IMPALA_SERVER_ARGS value above.

To set the maximum number of requests per queue, append -default_pool_max_requests=-1 to the IMPALA_SERVER_ARGS value above. The parameter sets the request limit for each queue; -1 means no limit.
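
Since /etc/default/impala is a shell-style file, one way to add both flags is to append them to the existing variable rather than editing the long multi-line definition. A small sketch, in which the initial IMPALA_SERVER_ARGS value is a shortened stand-in for the full definition shown earlier:

```shell
#!/bin/sh
# Fragment in the style of /etc/default/impala: append the two optional flags
# to whatever IMPALA_SERVER_ARGS already holds.
# The initial value below is a shortened stand-in for the full definition.
IMPALA_LOG_DIR=/var/log/impala
IMPALA_SERVER_ARGS=" -log_dir=${IMPALA_LOG_DIR} -use_statestore"

# Cap impalad memory at 70% and leave the default pool without a request limit.
IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} -mem_limit=70% -default_pool_max_requests=-1"

echo "$IMPALA_SERVER_ARGS"
```

In the real file only the appending line would be added, placed after the existing IMPALA_SERVER_ARGS definition.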

On node 74, symlink hive-site.xml, core-site.xml, and hdfs-site.xml into /etc/impala/conf, then add the following to hdfs-site.xml:

<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>

Sync the files above to the other nodes.
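
The symlink and sync steps could be scripted roughly as follows. Several things here are assumptions rather than part of the original procedure: the source locations /etc/hive/conf and /etc/hadoop/conf, the use of scp as root with passwordless SSH, and syncing /etc/default/impala along with the XML files. DRY_RUN defaults to 1, so the script only prints the commands.

```shell
#!/bin/sh
# Link the Hive/Hadoop client configs into /etc/impala/conf on node 74 and
# copy them to the other Impala nodes. Prints commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

IMPALA_CONF=/etc/impala/conf
OTHER_NODES="bd-ops-test-75 bd-ops-test-76 bd-ops-test-77"

link_and_sync() {
  # Source directories are assumed, not taken from the article.
  run ln -sf /etc/hive/conf/hive-site.xml "$IMPALA_CONF/hive-site.xml"
  run ln -sf /etc/hadoop/conf/core-site.xml "$IMPALA_CONF/core-site.xml"
  run ln -sf /etc/hadoop/conf/hdfs-site.xml "$IMPALA_CONF/hdfs-site.xml"
  # scp follows the symlinks, so the remote nodes receive regular files.
  for host in $OTHER_NODES; do
    for f in hive-site.xml core-site.xml hdfs-site.xml; do
      run scp "$IMPALA_CONF/$f" "root@${host}:$IMPALA_CONF/$f"
    done
    run scp /etc/default/impala "root@${host}:/etc/default/impala"
  done
}

link_and_sync
```

Any other distribution mechanism (rsync, configuration management) works equally well; the point is that all four nodes end up with identical files.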

2.2 Create the socket path

Create /var/run/hadoop-hdfs on every node:

# mkdir -p /var/run/hadoop-hdfs

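2.3 User requirements

Installing Impala creates a user and a group named impala; do not delete them.

If Impala is to work together with YARN and Llama, add the impala user to the hdfs group.

When Impala runs DROP TABLE it has to move files to the HDFS trash, so create the HDFS directory /user/impala and make it writable by the impala user. Likewise, Impala needs to read data under the Hive warehouse, so add the impala user to the hive group.

Impala cannot run as root, because root is not permitted to use direct reads.

Create the impala user's HDFS home directory and set its ownership: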

sudo -u hdfs hadoop fs -mkdir /user/impala
sudo -u hdfs hadoop fs -chown impala /user/impala

Check which groups the impala user belongs to:

# groups impala
impala : impala hadoop hdfs hive

As shown above, the impala user belongs to the impala, hadoop, hdfs, and hive groups.
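
If the groups output on a node lacks hdfs or hive, the membership has to be added by hand. The sketch below is one way to do that with usermod -a -G; the function name is hypothetical, and DRY_RUN defaults to 1 so the usermod command is only printed.

```shell
#!/bin/sh
# Add the impala user to the hdfs and hive groups if it is not a member yet.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# $1 = space-separated list of the groups impala currently belongs to.
ensure_impala_groups() {
  current="$1"
  missing=""
  for g in hdfs hive; do
    case " $current " in
      *" $g "*) ;;                                  # already a member
      *) missing="${missing:+${missing},}${g}" ;;   # collect missing groups
    esac
  done
  if [ -n "$missing" ]; then
    run usermod -a -G "$missing" impala
  else
    echo "impala already belongs to: hdfs hive"
  fi
}

# On a real node: ensure_impala_groups "$(id -nG impala)"
ensure_impala_groups "impala hadoop"
```

The literal group list in the last line is sample input; on a real node, pass the output of id -nG impala instead.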

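2.4 Start the services

Start the state store and the catalog service on node 74:

# service impala-state-store start
# service impala-catalog start

Then start impalad on every node where impala-server was installed (74 to 77):

# service impala-server start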
2.5 Using impala-shell

Start the Impala shell with impala-shell, connect to node 74, and refresh the metadata:

#impala-shell 
Starting Impala Shell without Kerberos authentication
Connected to bd-dev-hadoop-70:21000
Server version: impalad version 2.3.0-cdh5.5.1 RELEASE (build 73bf5bc5afbb47aa7eab06cfbf6023ba8cb74f3c)
***********************************************************************************
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.
(Impala Shell v2.3.0-cdh5.5.1 (73bf5bc) built on Wed Dec 2 10:39:33 PST 2015)
After running a query, type SUMMARY to see a summary of where time was spent.
***********************************************************************************
[bd-dev-hadoop-70:21000] > invalidate metadata;

After creating a table in Hive, run INVALIDATE METADATA the first time you start impala-shell so that Impala picks up the new table. (In Impala 1.2 and later, running INVALIDATE METADATA on one node is enough; there is no need to run it on every Impala node.)

impala-shell accepts further options; to list them:

#impala-shell -h
Usage: impala_shell.py [options]
Options:
-h, --help show this help message and exit
-i IMPALAD, --impalad=IMPALAD
<host:port> of impalad to connect to
[default: bd-dev-hadoop-70:21000]
-q QUERY, --query=QUERY
Execute a query without the shell [default: none]
-f QUERY_FILE, --query_file=QUERY_FILE
Execute the queries in the query file, delimited by ;
[default: none]
-k, --kerberos Connect to a kerberized impalad [default: False]
-o OUTPUT_FILE, --output_file=OUTPUT_FILE
If set, query results are written to the given file.
Results from multiple semicolon-terminated queries
will be appended to the same file [default: none]
-B, --delimited Output rows in delimited mode [default: False]
--print_header Print column names in delimited mode when pretty-
printed. [default: False]
--output_delimiter=OUTPUT_DELIMITER
Field delimiter to use for output in delimited mode
[default: \t]
-s KERBEROS_SERVICE_NAME, --kerberos_service_name=KERBEROS_SERVICE_NAME
Service name of a kerberized impalad [default: impala]
-V, --verbose Verbose output [default: True]
-p, --show_profiles Always display query profiles after execution
[default: False]
--quiet Disable verbose output [default: False]
-v, --version Print version information [default: False]
-c, --ignore_query_failure
Continue on query failure [default: False]
-r, --refresh_after_connect
Refresh Impala catalog after connecting
[default: False]
-d DEFAULT_DB, --database=DEFAULT_DB
Issues a use database command on startup
[default: none]
-l, --ldap Use LDAP to authenticate with Impala. Impala must be
configured to allow LDAP authentication.
[default: False]
-u USER, --user=USER User to authenticate with. [default: root]
--ssl Connect to Impala via SSL-secured connection
[default: False]
--ca_cert=CA_CERT Full path to certificate file used to authenticate
Impala's SSL certificate. May either be a copy of
Impala's certificate (for self-signed certs) or the
certificate of a trusted third-party CA. If not set,
but SSL is enabled, the shell will NOT verify Impala's
server certificate [default: none]
--config_file=CONFIG_FILE
Specify the configuration file to load options. File
must have case-sensitive '[impala]' header. Specifying
this option within a config file will have no effect.
Only specify this as a option in the commandline.
[default: /root/.impalarc]
--live_summary Print a query summary every 1s while the query is
running. [default: False]
--live_progress Print a query progress every 1s while the query is
running. [default: False]
--auth_creds_ok_in_clear
If set, LDAP authentication may be used with an
insecure connection to Impala. WARNING: Authentication
credentials will therefore be sent unencrypted, and
may be vulnerable to attack. [default: none]

Export data with Impala:

impala-shell -i '172.16.57.74:21000' -r -q "select * from test" -B --output_delimiter="\t" -o result.txt
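
The export command can be wrapped in a small helper that also writes a header row, using the --print_header option from the help output above. This is a sketch only: the function name and the output file name are made up, a comma delimiter does not quote or escape field values, and DRY_RUN defaults to 1 so the command is printed (without its original quoting) rather than run.

```shell
#!/bin/sh
# Export a query result to a delimited file with a header row.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# export_query HOST:PORT QUERY OUTFILE
export_query() {
  run impala-shell -i "$1" -q "$2" -B --print_header \
    --output_delimiter=',' -o "$3"
}

export_query '172.16.57.74:21000' 'select * from test' result.csv
```

For data that may itself contain commas, keeping the tab delimiter from the original command is the safer choice.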

