Tutorial: Installing and Configuring Sqoop for MySQL in a Hadoop Cluster
Sqoop is a tool for moving data between Hadoop and relational databases: it can import data from a relational database (e.g. MySQL, Oracle, Postgres) into HDFS, and export data from HDFS back into a relational database.
A highlight of Sqoop is that it performs these transfers with Hadoop MapReduce, so imports from the relational database into HDFS run as parallel map tasks.
I. Installing Sqoop
1. Download the Sqoop tarball and extract it
The packages used here are sqoop-1.2.0-CDH3B4.tar.gz, hadoop-0.20.2-CDH3B4.tar.gz, and the MySQL JDBC driver mysql-connector-java-5.1.10-bin.jar.
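The extraction step itself is not shown in the listing below; assuming the tarballs were downloaded to /root, it would look something like this:
[root@node1 ~]# tar -zxvf sqoop-1.2.0-CDH3B4.tar.gz
[root@node1 ~]# tar -zxvf hadoop-0.20.2-CDH3B4.tar.gz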
[root@node1 ~]# ll
drwxr-xr-x 15 root root   4096 Feb 22  2011 hadoop-0.20.2-CDH3B4
-rw-r--r--  1 root root 724225 Sep 15 06:46 mysql-connector-java-5.1.10-bin.jar
drwxr-xr-x 11 root root   4096 Feb 22  2011 sqoop-1.2.0-CDH3B4
2. Copy the MySQL JDBC driver and hadoop-core-0.20.2-CDH3B4.jar (from hadoop-0.20.2-CDH3B4) into sqoop-1.2.0-CDH3B4/lib, change the owner to the hadoop user, and move the sqoop-1.2.0-CDH3B4 directory to /home/hadoop.
[root@node1 ~]# cp mysql-connector-java-5.1.10-bin.jar sqoop-1.2.0-CDH3B4/lib
[root@node1 ~]# cp hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar sqoop-1.2.0-CDH3B4/lib
[root@node1 ~]# chown -R hadoop:hadoop sqoop-1.2.0-CDH3B4
[root@node1 ~]# mv sqoop-1.2.0-CDH3B4 /home/hadoop
[root@node1 ~]# ll /home/hadoop
total 35748
-rw-rw-r--  1 hadoop hadoop      343 Sep 15 05:13 derby.log
drwxr-xr-x 13 hadoop hadoop     4096 Sep 14 16:16 hadoop-0.20.2
drwxr-xr-x  9 hadoop hadoop     4096 Sep 14 20:21 hive-0.10.0
-rw-r--r--  1 hadoop hadoop 36524032 Sep 14 20:20 hive-0.10.0.tar.gz
drwxr-xr-x  8 hadoop hadoop     4096 Sep 25  2012 jdk1.7
drwxr-xr-x 12 hadoop hadoop     4096 Sep 15 00:25 mahout-distribution-0.7
drwxrwxr-x  5 hadoop hadoop     4096 Sep 15 05:13 metastore_db
-rw-rw-r--  1 hadoop hadoop      406 Sep 14 16:02 scp.sh
drwxr-xr-x 11 hadoop hadoop     4096 Feb 22  2011 sqoop-1.2.0-CDH3B4
drwxrwxr-x  3 hadoop hadoop     4096 Sep 14 16:17 temp
drwxrwxr-x  3 hadoop hadoop     4096 Sep 14 15:59 user
3. Edit configure-sqoop and comment out the checks for HBase and ZooKeeper (this setup uses neither, and the script would otherwise exit when those directories are missing)
[root@node1 bin]# pwd
/home/hadoop/sqoop-1.2.0-CDH3B4/bin
[root@node1 bin]# vi configure-sqoop
#!/bin/bash
#
# Licensed to Cloudera, Inc. under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
.
.
.
# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_HOME}" ]; then
echo "Error: $HADOOP_HOME does not exist!"
echo 'Please set $HADOOP_HOME to the root of your Hadoop installation.'
exit 1
fi
#if [ ! -d "${HBASE_HOME}" ]; then
# echo "Error: $HBASE_HOME does not exist!"
# echo 'Please set $HBASE_HOME to the root of your HBase installation.'
# exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
# echo "Error: $ZOOKEEPER_HOME does not exist!"
# echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
# exit 1
#fi
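If you would rather script this change than edit the file by hand, a GNU sed sketch like the following comments out both blocks. It assumes the stock configure-sqoop layout shown above (each check ends at the next line starting with fi); back the file up first:
[root@node1 bin]# cp configure-sqoop configure-sqoop.bak
[root@node1 bin]# sed -i '/^if \[ ! -d "${HBASE_HOME}" \]/,/^fi/ s/^/#/' configure-sqoop
[root@node1 bin]# sed -i '/^if \[ ! -d "${ZOOKEEPER_HOME}" \]/,/^fi/ s/^/#/' configure-sqoop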
4. Edit /etc/profile and .bash_profile to set HADOOP_HOME and adjust PATH
[hadoop@node1 ~]$ vi .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
HADOOP_HOME=/home/hadoop/hadoop-0.20.2
PATH=$HADOOP_HOME/bin:$PATH:$HOME/bin
export HIVE_HOME=/home/hadoop/hive-0.10.0
export MAHOUT_HOME=/home/hadoop/mahout-distribution-0.7
export PATH HADOOP_HOME
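Reload the profile so the new variables take effect in the current shell, and sanity-check the result (the echoed path simply mirrors the assignment above):
[hadoop@node1 ~]$ source ~/.bash_profile
[hadoop@node1 ~]$ echo $HADOOP_HOME
/home/hadoop/hadoop-0.20.2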
II. Testing Sqoop
1. List the databases in MySQL:
[hadoop@node1 bin]$ ./sqoop list-databases --connect jdbc:mysql://192.168.1.152:3306/ --username sqoop --password sqoop
13/09/15 07:17:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 07:17:17 INFO manager.MySQLManager: Executing SQL statement: SHOW DATABASES
information_schema
mysql
performance_schema
sqoop
test
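Note the warning on the first line: passing --password on the command line is insecure. The same check can be run with -P, which makes Sqoop prompt for the password interactively instead:
[hadoop@node1 bin]$ ./sqoop list-databases --connect jdbc:mysql://192.168.1.152:3306/ --username sqoop -P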
2. Import a MySQL table into Hive:
[hadoop@node1 bin]$ ./sqoop import --connect jdbc:mysql://192.168.1.152:3306/sqoop --username sqoop --password sqoop --table test --hive-import -m 1
13/09/15 08:15:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 08:15:01 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
13/09/15 08:15:01 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
13/09/15 08:15:01 INFO tool.CodeGenTool: Beginning code generation
13/09/15 08:15:01 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2/bin/..
13/09/15 08:15:02 INFO orm.CompilationManager: Found hadoop core jar at: /home/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
13/09/15 08:15:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a71936fd2bb45ea6757df22751a320e3/test.jar
13/09/15 08:15:03 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/09/15 08:15:03 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/09/15 08:15:03 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/09/15 08:15:03 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/09/15 08:15:03 INFO mapreduce.ImportJobBase: Beginning import of test
13/09/15 08:15:04 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:05 INFO mapred.JobClient: Running job: job_201309150505_0009
13/09/15 08:15:06 INFO mapred.JobClient:  map 0% reduce 0%
13/09/15 08:15:34 INFO mapred.JobClient:  map 100% reduce 0%
13/09/15 08:15:36 INFO mapred.JobClient: Job complete: job_201309150505_0009
13/09/15 08:15:36 INFO mapred.JobClient: Counters: 5
13/09/15 08:15:36 INFO mapred.JobClient:   Job Counters
13/09/15 08:15:36 INFO mapred.JobClient:     Launched map tasks=1
13/09/15 08:15:36 INFO mapred.JobClient:   FileSystemCounters
13/09/15 08:15:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=583323
13/09/15 08:15:36 INFO mapred.JobClient:   Map-Reduce Framework
13/09/15 08:15:36 INFO mapred.JobClient:     Map input records=65536
13/09/15 08:15:36 INFO mapred.JobClient:     Spilled Records=0
13/09/15 08:15:36 INFO mapred.JobClient:     Map output records=65536
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Transferred 569.6514 KB in 32.0312 seconds (17.7842 KB/sec)
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Retrieved 65536 records.
13/09/15 08:15:36 INFO hive.HiveImport: Removing temporary files from import process: test/_logs
13/09/15 08:15:36 INFO hive.HiveImport: Loading uploaded data into Hive
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:41 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/hadoop/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
13/09/15 08:15:41 INFO hive.HiveImport: Hive history file=/tmp/hadoop/hive_job_log_hadoop_201309150815_1877092059.txt
13/09/15 08:16:10 INFO hive.HiveImport: OK
13/09/15 08:16:10 INFO hive.HiveImport: Time taken: 28.791 seconds
13/09/15 08:16:11 INFO hive.HiveImport: Loading data to table default.test
13/09/15 08:16:12 INFO hive.HiveImport: Table default.test stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 583323, raw_data_size: 0]
13/09/15 08:16:12 INFO hive.HiveImport: OK
13/09/15 08:16:12 INFO hive.HiveImport: Time taken: 1.704 seconds
13/09/15 08:16:12 INFO hive.HiveImport: Hive import complete.
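To confirm the import, one option is to count the rows from the Hive CLI; given the job counters above (Map input records=65536), this should return 65536. Hive is invoked here via $HIVE_HOME/bin, since the profile above only adds Hadoop's bin directory to PATH:
[hadoop@node1 bin]$ $HIVE_HOME/bin/hive -e "SELECT COUNT(*) FROM test;"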
III. Sqoop Commands
Sqoop provides roughly 13 commands, plus several groups of generic arguments that all of these commands accept (in later 1.x releases the commands are: codegen, create-hive-table, eval, export, help, import, import-all-tables, job, list-databases, list-tables, merge, metastore, and version). The generic arguments break down into Common arguments, Incremental import arguments, Output line formatting arguments, Input parsing arguments, Hive arguments, HBase arguments, and Generic Hadoop command-line arguments; on top of these, each command has its own specific options. A few commonly used commands are described below:
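The exact command set varies slightly between releases, so it is worth checking what your build supports. The built-in help tool lists the available commands, and passing a command name prints that command's options:
sqoop help
sqoop help import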
1. Common arguments
These are the generic options, mainly for configuring the connection to the relational database.
1) List all databases on the MySQL server
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456
2) Connect to MySQL and list the tables in the test database
sqoop list-tables --connect jdbc:mysql://localhost:3306/test --username root --password 123456
In this command, test is the name of the test database in MySQL; username and password are the MySQL user's credentials.
3) Copy a relational table's structure into Hive. Only the table structure is copied; the rows themselves are not.
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/test --table sqoop_test --username root --password 123456 --hive-table test
Here --table sqoop_test is the table in the MySQL database test, and --hive-table test is the name of the newly created table in Hive.
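To verify that only the structure came across, describe the new table from the Hive CLI (assuming hive is on your PATH); it should list sqoop_test's columns while a SELECT COUNT(*) would return 0:
hive -e "DESCRIBE test;"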
4) Import data from a relational database into Hive
sqoop import --connect jdbc:mysql://localhost:3306/zxtest --username root --password 123456 --table sqoop_test --hive-import --hive-table s_test -m 1
5) Export a Hive table's data into MySQL. Before running the export, the target table hive_test must already exist in MySQL (a hypothetical DDL sketch follows the command below).
sqoop export --connect jdbc:mysql://localhost:3306/zxtest --username root --password root --table hive_test --export-dir /user/hive/warehouse/new_test_partition/dt=2012-03-05
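A minimal sketch of creating that target table; the two-column layout (id, name) is purely illustrative, and the real columns and types must match the fields in the exported Hive directory:
mysql -u root -p -e "CREATE TABLE zxtest.hive_test (id INT, name VARCHAR(100));"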
6) Import a table's data from the database into files on HDFS
./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 --target-dir /user/test
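Once the job finishes you can inspect the result directly in HDFS; with -m 1 the output is typically a single file named part-m-00000 under the target directory:
hadoop fs -ls /user/test
hadoop fs -cat /user/test/part-m-00000 | head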
7) Incrementally import table data from the database into HDFS
./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 --target-dir /user/test --check-column id --incremental append --last-value 3
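Each incremental run only pulls rows whose check column exceeds --last-value, and Sqoop logs the value to feed into the next run. For example, if the previous run ended at id 100, the follow-up run (same connection details, purely illustrative value) would be:
./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 --target-dir /user/test --check-column id --incremental append --last-value 100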