sqoop2 etl oracle to hdfs
I followed the Sqoop 5 Minutes Demo and ran an Oracle ETL test.
You create the links first, then create a job, specifying the from link and the to link and filling in the job details.
The ETL use case for this test: generate temporary test data in an Oracle DB, then load that data into HDFS through a Sqoop job.
The Oracle JDBC driver has to be installed under SQOOP_SERVER_EXTRA_LIB.
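A minimal sketch of putting the driver in place; the directory, jar name, and the assumption that SQOOP_SERVER_EXTRA_LIB is exported in the server's environment are specific to this setup and may differ in yours:
[sqoop2@big-master ~]$ mkdir -p /sqoop/extra
[sqoop2@big-master ~]$ cp ojdbc7.jar /sqoop/extra/
[sqoop2@big-master ~]$ export SQOOP_SERVER_EXTRA_LIB=/sqoop/extra
[sqoop2@big-master ~]$ sqoop2-server stop
[sqoop2@big-master ~]$ sqoop2-server start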
Creating a link fails when the password contains special characters... or is that only a problem in the shell?
When I tried the oracle-jdbc-connector, the mapper could not load oracle.jdbc.OracleDriver.
As far as I know, Sqoop2 is server-based, so the JDBC driver should only need to be on the server side...
The official manual does not even list that connector. Strange.
My test environment is probably just set up wrong.
In any case, there is a client API, so the job could be developed in Java and scheduled as well; a simpler alternative is sketched right below.
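A simple alternative to a Java scheduler, sketched under the assumption that sqoop2-shell's batch (script file) mode is available; the script name, path, and cron entry are made up for illustration:
[sqoop2@big-master ~]$ cat etl_oracle2hdfs.sqoop
set server --url http://big-master:12000/sqoop
start job -n oracle2hdfs -s
[sqoop2@big-master ~]$ crontab -e
0 2 * * * /sqoop/bin/sqoop2-shell /home/sqoop2/etl_oracle2hdfs.sqoop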
The run below just uses the defaults.
Later, setting loaders to 1 merged the output into a single file; apparently everything gets merged into one file in the reduce stage.
If so, should loaders be chosen against the HDFS block size, estimated from the RDB table's row count times its average row size?
For a rough estimate, after computing statistics on the source table: 33,557 Oracle blocks, and 1,000,000 rows * 232 bytes average row length ≈ about 220 MB.
The HDFS default block size is 128 MB, and there are 4 YARN nodes, so: extractors 8, loaders 2.
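For reference, those values go into the Throttling resources prompts of update job; a sketch showing only the relevant lines:
sqoop:000> update job -n oracle2hdfs
...
Throttling resources
Extractors: 8
Loaders: 2
...
Job was successfully updated with status OK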
To raise the source table's hit ratio I ran a grouping SQL against it first. I should also try a run with mapreduce.job.ubertask.enable=true; a config sketch follows.
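A minimal sketch of that setting, assuming mapred-site.xml sits at the same path used elsewhere in this post; note that uber mode only applies to small jobs by default (mapreduce.job.ubertask.maxmaps is 9), so the limits may need raising as well:
[hadoop@big-master ~]$ vi /hadoop/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
:wq!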
Running with map only, without setting loaders, is actually faster — though then the output files have to be merged separately, for example with getmerge as sketched below.
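A minimal merge sketch; the local target path is just an example:
[hadoop@big-master ~]$ hdfs dfs -getmerge /oracle/tb_sample_source /tmp/tb_sample_source.txt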
[sqoop2@big-master ~]$ sqoop2-shell
Setting conf dir: /sqoop/bin/../conf
Sqoop home directory: /sqoop
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000> set server --url http://big-master:12000/sqoop
Server is set successfully
sqoop:000> show server -all
Server host: big-master
Server port: 12000
Server webapp: sqoop
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
| Name | Version | Class | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
| kite-connector | 1.99.7 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
| oracle-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO |
| ftp-connector | 1.99.7 | org.apache.sqoop.connector.ftp.FtpConnector | TO |
| hdfs-connector | 1.99.7 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
| kafka-connector | 1.99.7 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
| sftp-connector | 1.99.7 | org.apache.sqoop.connector.sftp.SftpConnector | TO |
+------------------------+---------+------------------------------------------------------------+----------------------+
sqoop:000> show link
+------+----------------+---------+
| Name | Connector Name | Enabled |
+------+----------------+---------+
+------+----------------+---------+
sqoop:000> show job
+----+------+----------------+--------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+------+----------------+--------------+---------+
+----+------+----------------+--------------+---------+
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: oracleLink
Database connection
Driver class: oracle.jdbc.OracleDriver
Error message: String must start with: jdbc:
Connection String: jdbc:oracle:thin:@xxx.xxx.xxx.xxx:1521/ora1
Username: dev
Password: ********
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose:
New link was successfully created with validation status OK and name oracleLink
sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfsLink
HDFS cluster
URI: hdfs://hadoop-cluster
Conf directory: /hadoop/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfsLink
sqoop:000> create job -f oracleLink -t hdfsLink
Creating job for links with from name oracleLink and to name hdfsLink
Please fill following values to create new job object
Name: oracle2hdfs
Database source
Schema name: DEV
Table name: TB_SAMPLE_SOURCE
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /oracle/tb_sample_source
Append mode:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name oracle2hdfs
sqoop:000> show job
+----+-------------+-------------------------------------+---------------------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+-------------+-------------------------------------+---------------------------+---------+
| 6 | oracle2hdfs | oracleLink (generic-jdbc-connector) | hdfsLink (hdfs-connector) | true |
+----+-------------+-------------------------------------+---------------------------+---------+
sqoop:000> start job -n oracle2hdfs -s
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: GENERIC_JDBC_CONNECTOR_0025:No primary key - Please specify partition column.
sqoop:000> update job -n oracle2hdfs
Updating job with name oracle2hdfs
Please update job:
Name: oracle2hdfs
Database source
Schema name: DEV
Table name: TB_SAMPLE_SOURCE
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column: SEQ
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /oracle/tb_sample_source
Append mode:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
Job was successfully updated with status OK
sqoop:000> start job -n oracle2hdfs -s
Submission details
Job Name: oracle2hdfs
Server URL: http://big-master:12000/sqoop/
Created by: sqoop2
Creation date: 2018-04-22 21:03:21 KST
Lastly updated by: sqoop2
External ID: job_1524396616145_0003
http://0.0.0.0:8089/proxy/application_1524396616145_0003/
2018-04-22 21:03:21 KST: BOOTING - Progress is not available
2018-04-22 21:03:36 KST: RUNNING - 0.00 %
2018-04-22 21:03:46 KST: RUNNING - 0.00 %
2018-04-22 21:03:56 KST: RUNNING - 0.00 %
2018-04-22 21:04:07 KST: RUNNING - 0.00 %
2018-04-22 21:04:17 KST: RUNNING - 0.00 %
2018-04-22 21:04:27 KST: RUNNING - 0.00 %
2018-04-22 21:04:37 KST: RUNNING - 5.00 %
2018-04-22 21:04:47 KST: FAILED
Exception: Job Failed with status:3
# checking the mapreduce task log
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hadoop is not allowed to impersonate sqoop2
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy10.delete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:545)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy11.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2053)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
at org.apache.sqoop.connector.hdfs.HdfsToDestroyer$1.run(HdfsToDestroyer.java:69)
at org.apache.sqoop.connector.hdfs.HdfsToDestroyer$1.run(HdfsToDestroyer.java:52)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1758)
at org.apache.sqoop.connector.hdfs.HdfsToDestroyer.destroy(HdfsToDestroyer.java:52)
... 12 more
2018-04-22 21:04:42,726 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1524396616145_0003Job Transitioned from FAIL_ABORT to FAILED
A permission error. Compared with two years ago, only the Hadoop version has changed, and I did nothing special during installation...
The server runs as the sqoop2 user, though... hmm.
[hadoop@big-master ~]$ vi /hadoop/etc/hadoop/core-site.xml
<property>
<name>hadoop.proxyuser.sqoop2.users</name>
<value>*</value>
</property>
:wq!
[hadoop@big-master ~]$ rsync -az /hadoop/etc/hadoop/core-site.xml big-slave1:/hadoop/etc/hadoop/core-site.xml
[hadoop@big-master ~]$ rsync -az /hadoop/etc/hadoop/core-site.xml big-slave2:/hadoop/etc/hadoop/core-site.xml
[hadoop@big-master ~]$ rsync -az /hadoop/etc/hadoop/core-site.xml big-slave3:/hadoop/etc/hadoop/core-site.xml
[hadoop@big-master ~]$ rsync -az /hadoop/etc/hadoop/core-site.xml big-slave4:/hadoop/etc/hadoop/core-site.xml
The same error again. My English is not great, so I just kept trying things: I removed the property above and allowed the hadoop user to impersonate instead (the error says user hadoop — which the YARN containers run as here — is not allowed to impersonate sqoop2, so it is the hadoop user that needs the proxy-user rights), and set /tmp/hadoop-yarn/staging to permission 777 as sketched below.
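The staging-directory change was a recursive chmod on HDFS (a sketch of the command; the path is the MapReduce default staging directory):
[hadoop@big-master ~]$ hdfs dfs -chmod -R 777 /tmp/hadoop-yarn/staging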
[hadoop@big-master ~]$ vi /hadoop/etc/hadoop/core-site.xml
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
:wq!
[hadoop@big-master ~]$ rsync -az /hadoop/etc/hadoop/core-site.xml big-slave1:/hadoop/etc/hadoop/core-site.xml
[hadoop@big-master ~]$ rsync -az /hadoop/etc/hadoop/core-site.xml big-slave2:/hadoop/etc/hadoop/core-site.xml
[hadoop@big-master ~]$ rsync -az /hadoop/etc/hadoop/core-site.xml big-slave3:/hadoop/etc/hadoop/core-site.xml
[hadoop@big-master ~]$ rsync -az /hadoop/etc/hadoop/core-site.xml big-slave4:/hadoop/etc/hadoop/core-site.xml
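The proxy-user change in core-site.xml does not take effect until the NameNode and ResourceManager reload it — either restart the daemons or refresh the configuration; a sketch of the refresh commands:
[hadoop@big-master ~]$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
[hadoop@big-master ~]$ yarn rmadmin -refreshSuperUserGroupsConfiguration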
[sqoop2@big-master logs]$ sqoop2-shell
Setting conf dir: /sqoop/bin/../conf
Sqoop home directory: /sqoop
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000> start job -n oracle2hdfs -s
Submission details
Job Name: oracle2hdfs
Server URL: http://localhost:12000/sqoop/
Created by: sqoop2
Creation date: 2018-04-22 21:42:55 KST
Lastly updated by: sqoop2
External ID: job_1524400886451_0001
http://0.0.0.0:8089/proxy/application_1524400886451_0001/
2018-04-22 21:42:55 KST: BOOTING - Progress is not available
2018-04-22 21:43:13 KST: RUNNING - 0.00 %
2018-04-22 21:43:23 KST: RUNNING - 0.00 %
2018-04-22 21:43:34 KST: RUNNING - 0.00 %
2018-04-22 21:43:44 KST: RUNNING - 0.00 %
2018-04-22 21:43:54 KST: RUNNING - 0.00 %
2018-04-22 21:44:04 KST: RUNNING - 0.00 %
2018-04-22 21:44:14 KST: RUNNING - 20.00 %
2018-04-22 21:44:25 KST: RUNNING - 50.00 %
2018-04-22 21:44:39 KST: SUCCEEDED
Counters:
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_LARGE_READ_OPS: 0
FILE_WRITE_OPS: 0
HDFS_READ_OPS: 10
HDFS_BYTES_READ: 1463
HDFS_LARGE_READ_OPS: 0
FILE_READ_OPS: 0
FILE_BYTES_WRITTEN: 3009410
FILE_BYTES_READ: 0
HDFS_WRITE_OPS: 10
HDFS_BYTES_WRITTEN: 267040561
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.JobCounter
TOTAL_LAUNCHED_MAPS: 10
MB_MILLIS_MAPS: 619500544
VCORES_MILLIS_MAPS: 604981
SLOTS_MILLIS_MAPS: 604981
OTHER_LOCAL_MAPS: 10
MILLIS_MAPS: 604981
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_READ: 1000000
ROWS_WRITTEN: 1000000
org.apache.hadoop.mapreduce.TaskCounter
SPILLED_RECORDS: 0
MERGED_MAP_OUTPUTS: 0
VIRTUAL_MEMORY_BYTES: 21283766272
MAP_INPUT_RECORDS: 0
SPLIT_RAW_BYTES: 1463
MAP_OUTPUT_RECORDS: 1000000
FAILED_SHUFFLE: 0
PHYSICAL_MEMORY_BYTES: 2831265792
GC_TIME_MILLIS: 15254
CPU_MILLISECONDS: 413050
COMMITTED_HEAP_BYTES: 1956118528
Job executed successfully
Checking the sessions on the DB server.
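One quick way to check, assuming a privileged sqlplus login on the DB server (the host name and login are illustrative):
[oracle@dbserver ~]$ sqlplus / as sysdba
SQL> SELECT sid, serial#, username, program, machine FROM v$session WHERE username = 'DEV';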
Checking the files in HDFS:
[hadoop@big-master ~]$ hdfs dfs -ls /oracle/
Found 1 items
drwxr-xr-x - sqoop2 supergroup 0 2018-04-22 21:44 /oracle/tb_sample_source
[hadoop@big-master ~]$ hdfs dfs -ls /oracle/tb_sample_source
Found 10 items
-rw-r--r-- 3 sqoop2 supergroup 26702975 2018-04-22 21:44 /oracle/tb_sample_source/0e3c4a84-c74a-42b4-930a-a7c07e4aaac4.txt
-rw-r--r-- 3 sqoop2 supergroup 26702355 2018-04-22 21:44 /oracle/tb_sample_source/3f5b4f72-f2b1-4dd2-9053-6addce447acc.txt
-rw-r--r-- 3 sqoop2 supergroup 26704060 2018-04-22 21:44 /oracle/tb_sample_source/7b295085-d2d3-464a-b10f-3f323393ed43.txt
-rw-r--r-- 3 sqoop2 supergroup 26704927 2018-04-22 21:44 /oracle/tb_sample_source/8741c3d1-8b65-4e3b-8790-7cb7adeb986f.txt
-rw-r--r-- 3 sqoop2 supergroup 26703584 2018-04-22 21:44 /oracle/tb_sample_source/acb516e5-d0e2-435d-87dd-28eead64fb79.txt
-rw-r--r-- 3 sqoop2 supergroup 26705387 2018-04-22 21:44 /oracle/tb_sample_source/b23171f1-7868-42f0-8f70-80f422b12f9d.txt
-rw-r--r-- 3 sqoop2 supergroup 26705138 2018-04-22 21:44 /oracle/tb_sample_source/bac996f7-eea7-4766-b963-0cda9eacd754.txt
-rw-r--r-- 3 sqoop2 supergroup 26703466 2018-04-22 21:44 /oracle/tb_sample_source/bc1b3284-0b7b-4790-91c5-31af70906fb7.txt
-rw-r--r-- 3 sqoop2 supergroup 26704407 2018-04-22 21:44 /oracle/tb_sample_source/f3ca861e-90d7-4f87-aa8e-5d6232c514ac.txt
-rw-r--r-- 3 sqoop2 supergroup 26704262 2018-04-22 21:44 /oracle/tb_sample_source/fd6896a5-6f9b-492a-b093-e131a5a38b11.txt
[hadoop@big-master ~]$ hdfs dfs -tail /oracle/tb_sample_source/0e3c4a84-c74a-42b4-930a-a7c07e4aaac4.txt
(tail output: comma-separated rows of the generated sample data — a 0E+127 numeric column, 'USER'/'COMPANY' values, timestamps such as 2017-09-22 18:11:55.741, and several random-string columns)