
sqoop2 etl hdfs to oracle

DBILITY 2018. 5. 1. 20:54

It looks like only a single process handles the move from the staging table to the target table.
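If so, the staging-to-target step would amount to a single serial statement on the Oracle side. A guess at what the connector effectively runs, using the table names from the job below (hypothetical; I have not confirmed this against the connector source):

```sql
-- Hypothetical sketch of the staging-to-target step:
-- copy the staged rows into the target in one statement,
-- then clear the staging table (when "Clear stage table" is true)
INSERT INTO DEV.TB_SAMPLE_TARGET SELECT * FROM DEV.TB_SAMPLE_STAGE;
DELETE FROM DEV.TB_SAMPLE_STAGE;
COMMIT;
```

A single serial INSERT...SELECT like this would explain why only one process appears to do that part of the work.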

The extractors run as mapper tasks, and when loaders is non-zero, reduce tasks run as well.

With extractors 8 / loaders 0 the job is faster than with extractors 4 / loaders 4.

Presumably, configuring loaders adds significant sort/shuffle time in the reduce phase.
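So for this data it seems better to leave loaders unset (map-only) and raise extractors instead. The throttling values of an existing job can be changed from sqoop2-shell, something like the following (prompts abbreviated; the `-n` name option matches the shell version used below):

```
sqoop:000> update job -n testjob
...
Throttling resources

Extractors: 8
Loaders:
...
```

Leaving Loaders blank keeps the job map-only, so no reduce-side sort/shuffle runs at all.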

On top of that, a heap OOM sometimes occurs. I should increase the memory...
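Since the loaders run as reduce tasks, the OOM would be in the reduce tasks' JVM heap. One way is to raise the MapReduce task memory in mapred-site.xml (example values only, not tuned for this cluster):

```xml
<!-- Example only: give reduce tasks a 2 GB container and ~1.6 GB heap -->
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1638m</value>
</property>
```

The container size (`memory.mb`) should stay comfortably above the `-Xmx` heap so the JVM's non-heap memory does not get the container killed by YARN.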

Of course this will vary by test environment. I should compare it against an Oracle parallel load.

For a bulk-style load, it would also be worth turning LOGGING off among the RDB table's storage parameters first.
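A sketch of that idea on the Oracle side (run as the schema owner; note that NOLOGGING only skips redo for direct-path operations, and rows loaded that way are not recoverable from redo, so take a backup after the load):

```sql
-- Before the load: stop generating redo for direct-path writes to this table
ALTER TABLE DEV.TB_SAMPLE_TARGET NOLOGGING;

-- ... run the sqoop job ...

-- After the load: restore normal logging
ALTER TABLE DEV.TB_SAMPLE_TARGET LOGGING;
```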

Loading a CSV file of about 700 MB, the partitioner throws an error. What the... a NullPointerException?

[sqoop2@big-master ~]$ hdfs dfs -ls /oracle
Found 1 items
drwxr-xr-x   - sqoop2 supergroup          0 2018-05-01 21:54 /oracle/tb_sample_source
[sqoop2@big-master ~]$ hdfs dfs -ls /oracle/tb_sample_source
Found 10 items
-rw-r--r--   3 sqoop2 supergroup   26705138 2018-05-01 21:54 /oracle/tb_sample_source/1ac4d27c-90bf-40be-9d3b-a047bad4c3b0.txt
-rw-r--r--   3 sqoop2 supergroup   26705387 2018-05-01 21:54 /oracle/tb_sample_source/36b6933e-943b-4983-9ca7-a54599374de3.txt
-rw-r--r--   3 sqoop2 supergroup   26704060 2018-05-01 21:53 /oracle/tb_sample_source/4372e8ee-0333-4177-bbea-1ca4ef36bee6.txt
-rw-r--r--   3 sqoop2 supergroup   26704407 2018-05-01 21:54 /oracle/tb_sample_source/59d61c28-aaef-431b-bba2-c5004cf8bcff.txt
-rw-r--r--   3 sqoop2 supergroup   26703584 2018-05-01 21:53 /oracle/tb_sample_source/6d41c8d0-938d-41c0-8aa1-d6b9c0433508.txt
-rw-r--r--   3 sqoop2 supergroup   26704927 2018-05-01 21:53 /oracle/tb_sample_source/739e7d9b-aec0-412f-a4ac-5ad79299c60d.txt
-rw-r--r--   3 sqoop2 supergroup   26703466 2018-05-01 21:54 /oracle/tb_sample_source/90e14d9b-797e-48a2-99a9-a090a95ea0f5.txt
-rw-r--r--   3 sqoop2 supergroup   26704262 2018-05-01 21:54 /oracle/tb_sample_source/9dce73a0-06ac-4ea9-9162-a12bb586b286.txt
-rw-r--r--   3 sqoop2 supergroup   26702975 2018-05-01 21:53 /oracle/tb_sample_source/cde47636-322c-4dfa-ae2c-6f14f22728c7.txt
-rw-r--r--   3 sqoop2 supergroup   26702355 2018-05-01 21:54 /oracle/tb_sample_source/df97e84e-395f-48e9-82e5-d3683aaa1cd4.txt
[sqoop2@big-master ~]$ sqoop2-shell
Setting conf dir: /sqoop/bin/../conf
Sqoop home directory: /sqoop
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000> show job
+----+-------------+-------------------------------------+-------------------------------------+---------+
| Id |    Name     |           From Connector            |            To Connector             | Enabled |
+----+-------------+-------------------------------------+-------------------------------------+---------+
| 6  | oracle2hdfs | oracleLink (generic-jdbc-connector) | hdfsLink (hdfs-connector)           | true    |
| 13 | hdfs2oracle | hdfsLink (hdfs-connector)           | oracleLink (generic-jdbc-connector) | true    |
+----+-------------+-------------------------------------+-------------------------------------+---------+
sqoop:000> create job -f hdfsLink -t oracleLink
Creating job for links with from name hdfsLink and to name oracleLink
Please fill following values to create new job object
Name: testjob

Input configuration

Input directory: /oracle/tb_sample_source/
Override null value: false
Null value:

Incremental import

Incremental type:
  0 : NONE
  1 : NEW_FILES
Choose: 0
Last imported date:

Database target

Schema name: DEV
Table name: TB_SAMPLE_TARGET
Column names:
There are currently 0 values in the list:
element#
Staging table: TB_SAMPLE_STAGE
Clear stage table: true

Throttling resources

Extractors: 10
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK  and name testjob
sqoop:000> show job
+----+-------------+-------------------------------------+-------------------------------------+---------+
| Id |    Name     |           From Connector            |            To Connector             | Enabled |
+----+-------------+-------------------------------------+-------------------------------------+---------+
| 6  | oracle2hdfs | oracleLink (generic-jdbc-connector) | hdfsLink (hdfs-connector)           | true    |
| 13 | hdfs2oracle | hdfsLink (hdfs-connector)           | oracleLink (generic-jdbc-connector) | true    |
| 14 | testjob     | hdfsLink (hdfs-connector)           | oracleLink (generic-jdbc-connector) | true    |
+----+-------------+-------------------------------------+-------------------------------------+---------+
sqoop:000> start job -n testjob -s
Submission details
Job Name: testjob
Server URL: http://localhost:12000/sqoop/
Created by: sqoop2
Creation date: 2018-05-01 21:57:23 KST
Lastly updated by: sqoop2
External ID: job_1525170330372_0018
        http://big-master:8088/proxy/application_1525170330372_0018/
2018-05-01 21:57:23 KST: BOOTING  - Progress is not available
2018-05-01 21:57:35 KST: RUNNING  - 0.00 %
2018-05-01 21:57:45 KST: RUNNING  - 0.00 %
2018-05-01 21:57:55 KST: RUNNING  - 0.00 %
2018-05-01 21:58:05 KST: RUNNING  - 0.00 %
2018-05-01 21:58:15 KST: RUNNING  - 0.00 %
2018-05-01 21:58:26 KST: RUNNING  - 0.00 %
2018-05-01 21:58:36 KST: RUNNING  - 0.00 %
2018-05-01 21:58:46 KST: RUNNING  - 0.00 %
2018-05-01 21:58:56 KST: RUNNING  - 0.00 %
2018-05-01 21:59:06 KST: RUNNING  - 18.75 %
2018-05-01 21:59:16 KST: RUNNING  - 25.00 %
2018-05-01 21:59:26 KST: RUNNING  - 37.50 %
2018-05-01 21:59:36 KST: RUNNING  - 37.50 %
2018-05-01 21:59:46 KST: RUNNING  - 50.00 %
2018-05-01 21:59:56 KST: RUNNING  - 50.00 %
2018-05-01 22:00:06 KST: RUNNING  - 50.00 %
2018-05-01 22:00:16 KST: RUNNING  - 50.00 %
2018-05-01 22:00:27 KST: RUNNING  - 50.00 %
2018-05-01 22:00:37 KST: RUNNING  - 50.00 %
2018-05-01 22:00:47 KST: RUNNING  - 50.00 %
2018-05-01 22:00:57 KST: RUNNING  - 50.00 %
2018-05-01 22:01:07 KST: RUNNING  - 50.00 %
2018-05-01 22:01:17 KST: RUNNING  - 50.00 %
2018-05-01 22:01:27 KST: RUNNING  - 50.00 %
2018-05-01 22:01:38 KST: RUNNING  - 50.00 %
2018-05-01 22:01:48 KST: RUNNING  - 50.00 %
2018-05-01 22:01:58 KST: RUNNING  - 50.00 %
2018-05-01 22:02:08 KST: RUNNING  - 50.00 %
2018-05-01 22:02:18 KST: RUNNING  - 50.00 %
2018-05-01 22:02:28 KST: RUNNING  - 50.00 %
2018-05-01 22:02:38 KST: RUNNING  - 50.00 %
2018-05-01 22:02:48 KST: RUNNING  - 50.00 %
2018-05-01 22:02:58 KST: RUNNING  - 50.00 %
2018-05-01 22:03:08 KST: RUNNING  - 50.00 %
2018-05-01 22:03:19 KST: RUNNING  - 50.00 %
2018-05-01 22:03:29 KST: RUNNING  - 50.00 %
2018-05-01 22:03:39 KST: RUNNING  - 50.00 %
2018-05-01 22:03:49 KST: RUNNING  - 50.00 %
2018-05-01 22:03:59 KST: RUNNING  - 50.00 %
2018-05-01 22:04:09 KST: RUNNING  - 50.00 %
2018-05-01 22:04:19 KST: RUNNING  - 50.00 %
2018-05-01 22:04:29 KST: RUNNING  - 50.00 %
2018-05-01 22:04:39 KST: RUNNING  - 50.00 %
2018-05-01 22:04:49 KST: RUNNING  - 50.00 %
2018-05-01 22:05:00 KST: RUNNING  - 50.00 %
2018-05-01 22:05:10 KST: RUNNING  - 50.00 %
2018-05-01 22:05:20 KST: RUNNING  - 50.00 %
2018-05-01 22:05:33 KST: SUCCEEDED
Counters:
        org.apache.hadoop.mapreduce.FileSystemCounter
                FILE_LARGE_READ_OPS: 0
                FILE_WRITE_OPS: 0
                HDFS_READ_OPS: 88
                HDFS_BYTES_READ: 267145144
                HDFS_LARGE_READ_OPS: 0
                FILE_READ_OPS: 0
                FILE_BYTES_WRITTEN: 2412112
                FILE_BYTES_READ: 0
                HDFS_WRITE_OPS: 0
                HDFS_BYTES_WRITTEN: 0
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
                BYTES_WRITTEN: 0
        org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
                BYTES_READ: 0
        org.apache.hadoop.mapreduce.JobCounter
                TOTAL_LAUNCHED_MAPS: 8
                MB_MILLIS_MAPS: 835633152
                VCORES_MILLIS_MAPS: 816048
                SLOTS_MILLIS_MAPS: 816048
                OTHER_LOCAL_MAPS: 8
                MILLIS_MAPS: 816048
        org.apache.sqoop.submission.counter.SqoopCounters
                ROWS_READ: 1000000
                ROWS_WRITTEN: 1000000
        org.apache.hadoop.mapreduce.TaskCounter
                SPILLED_RECORDS: 0
                MERGED_MAP_OUTPUTS: 0
                VIRTUAL_MEMORY_BYTES: 17054339072
                MAP_INPUT_RECORDS: 0
                SPLIT_RAW_BYTES: 2480
                MAP_OUTPUT_RECORDS: 1000000
                FAILED_SHUFFLE: 0
                PHYSICAL_MEMORY_BYTES: 2260348928
                GC_TIME_MILLIS: 10541
                CPU_MILLISECONDS: 335120
                COMMITTED_HEAP_BYTES: 1550319616
Job executed successfully
sqoop:000>

 
