DBILITY
Tajo vs MapReduce comparison test after installation
TajoWorker would not start due to memory constraints, so I set the minimum heap (MIN-HEAP) to 1000 MB.
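For reference, worker memory is normally configured in conf/tajo-env.sh. This is only a sketch: the variable name below is an assumption (the post's "MIN-HEAP" may map to a different setting), so verify it against the tajo-env.sh template shipped with your Tajo release.

```shell
# conf/tajo-env.sh -- sketch; variable name is an assumption,
# check the tajo-env.sh template for your Tajo version
export TAJO_WORKER_HEAPSIZE=1000   # worker JVM heap in MB
```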
The test used the 2008 flight records from the US commercial airline on-time statistics (the Data Expo dataset) and computed the number of delayed-arrival flights per month.
Results will naturally vary with server specs, configuration, and the quality of the MapReduce1 program (written long ago; it applies a sort comparator but apparently no combiner), but on this test system Tajo was roughly 10x faster:

- 2 workers: Tajo ~6 sec, MapReduce1 63 sec
- 4 workers: Tajo ~3 sec, MapReduce1 60~65 sec

They say performance scales almost linearly as nodes are added...
Or maybe it is just because the Tajo worker heap alone was raised to 5000 MB.
The MapReduce side probably needs parameter tuning, which is beyond my ability for now.
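The source of the MapReduce jar used below is not shown in this post. As a minimal, Hadoop-free Java sketch of the same per-month counting logic (column positions assumed from the table DDL below: Year=0, Month=1, ArrDelay=14; the class and method names are illustrative only):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DelayCounter {
    // Count flights with ArrDelay > 0 per (year, month), mirroring what
    // the mapper/reducer pair does: map emits (year\tmonth, 1), reduce sums.
    static Map<String, Integer> countDelays(List<String> csvLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : csvLines) {
            String[] f = line.split(",");
            if (f[0].equals("Year")) continue;            // skip CSV header
            int arrDelay;
            try {
                arrDelay = Integer.parseInt(f[14]);       // ArrDelay column
            } catch (NumberFormatException e) {
                continue;                                 // "NA" etc. -> skip
            }
            if (arrDelay > 0) {
                counts.merge(f[0] + "\t" + f[1], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "2008,1,3,4,2003,1955,2211,2225,WN,335,N712SW,128,150,116,-14,8,IAD,TPA,810,4,8,0,,0,0,0,0,0,0",
            "2008,1,3,4,754,735,1002,1000,WN,3231,N772SW,128,145,113,2,19,IAD,TPA,810,5,10,0,,0,0,0,0,0,0");
        System.out.println(countDelays(sample)); // per-month delayed-arrival counts
    }
}
```

Only the second sample row has a positive ArrDelay, so the sketch counts one delayed flight for 2008-01; the real job does the same aggregation, just distributed across map and reduce tasks.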
Create the test table
CREATE EXTERNAL TABLE dataexpo2008 (
Year int,
Month int,
DayofMonth int,
DayOfWeek int,
DepTime int,
CRSDepTime int,
ArrTime int,
CRSArrTime int,
UniqueCarrier varchar(5),
FlightNum int,
TailNum varchar(8),
ActualElapsedTime int,
CRSElapsedTime int,
AirTime int,
ArrDelay int,
DepDelay int,
Origin varchar(3),
Dest varchar(3),
Distance int,
TaxiIn int,
TaxiOut int,
Cancelled int,
CancellationCode varchar(1),
Diverted varchar(1),
CarrierDelay int,
WeatherDelay int,
NASDelay int,
SecurityDelay int,
LateAircraftDelay int
)
USING TEXT WITH ('text.delimiter'=',', 'text.skip.headerlines'='1')
LOCATION 'hdfs://hadoop-cluster/dataexpo/2008.csv';
Run the query in tsql
default> select Year,Month,count(*) as Cnt from dataexpo2008 where arrdelay > 0 group by Year,Month order by Year,Month;
Progress: 0%, response time: 0.45 sec
Progress: 0%, response time: 0.451 sec
Progress: 0%, response time: 0.853 sec
Progress: 0%, response time: 1.655 sec
Progress: 11%, response time: 2.657 sec
Progress: 11%, response time: 3.66 sec
Progress: 22%, response time: 4.661 sec
Progress: 22%, response time: 5.663 sec
Progress: 100%, response time: 6.139 sec
Year, Month, cnt
-------------------------------
2008, 1, 279427
2008, 2, 278902
2008, 3, 294556
2008, 4, 256142
2008, 5, 254673
2008, 6, 295897
2008, 7, 264630
2008, 8, 239737
2008, 9, 169959
2008, 10, 183582
2008, 11, 181506
2008, 12, 280493
(12 rows, 6.139 sec, 384 B selected)
Progress can also be checked in the Tajo Web UI at http://big-master:26080
Run the MapReduce job
[hadoop@big-master ~]$ yarn jar asa-flight-statistics-1.0.0.jar /dataexpo/2008.csv /dataexpo2008
18/04/11 21:47:40 INFO client.RMProxy: Connecting to ResourceManager at big-master/192.168.100.180:8035
18/04/11 21:47:41 INFO input.FileInputFormat: Total input paths to process : 1
18/04/11 21:47:41 INFO mapreduce.JobSubmitter: number of splits:6
18/04/11 21:47:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1523443857162_0002
18/04/11 21:47:42 INFO impl.YarnClientImpl: Submitted application application_1523443857162_0002
18/04/11 21:47:42 INFO mapreduce.Job: The url to track the job: http://big-master:8088/proxy/application_1523443857162_0002/
18/04/11 21:47:42 INFO mapreduce.Job: Running job: job_1523443857162_0002
18/04/11 21:47:50 INFO mapreduce.Job: Job job_1523443857162_0002 running in uber mode : false
18/04/11 21:47:50 INFO mapreduce.Job: map 0% reduce 0%
18/04/11 21:48:13 INFO mapreduce.Job: map 17% reduce 0%
18/04/11 21:48:14 INFO mapreduce.Job: map 35% reduce 0%
18/04/11 21:48:18 INFO mapreduce.Job: map 58% reduce 0%
18/04/11 21:48:21 INFO mapreduce.Job: map 72% reduce 0%
18/04/11 21:48:22 INFO mapreduce.Job: map 94% reduce 0%
18/04/11 21:48:23 INFO mapreduce.Job: map 100% reduce 0%
18/04/11 21:48:40 INFO mapreduce.Job: map 100% reduce 69%
18/04/11 21:48:43 INFO mapreduce.Job: map 100% reduce 91%
18/04/11 21:48:44 INFO mapreduce.Job: map 100% reduce 100%
18/04/11 21:48:45 INFO mapreduce.Job: Job job_1523443857162_0002 completed successfully
18/04/11 21:48:45 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=53631078
FILE: Number of bytes written=108120702
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=689434154
HDFS: Number of bytes written=171
HDFS: Number of read operations=21
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=1
Launched map tasks=7
Launched reduce tasks=1
Data-local map tasks=7
Total time spent by all maps in occupied slots (ms)=178545
Total time spent by all reduces in occupied slots (ms)=27647
Total time spent by all map tasks (ms)=178545
Total time spent by all reduce tasks (ms)=27647
Total vcore-milliseconds taken by all map tasks=178545
Total vcore-milliseconds taken by all reduce tasks=27647
Total megabyte-milliseconds taken by all map tasks=182830080
Total megabyte-milliseconds taken by all reduce tasks=28310528
Map-Reduce Framework
Map input records=7009728
Map output records=2979504
Map output bytes=47672064
Map output materialized bytes=53631108
Input split bytes=630
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=53631108
Reduce input records=2979504
Reduce output records=12
Spilled Records=5959008
Shuffled Maps =6
Failed Shuffles=0
Merged Map outputs=6
GC time elapsed (ms)=3270
CPU time spent (ms)=63440
Physical memory (bytes) snapshot=4056547328
Virtual memory (bytes) snapshot=24308379648
Total committed heap usage (bytes)=3921674240
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=689433524
File Output Format Counters
Bytes Written=171
[hadoop@big-master ~]$ hdfs dfs -cat /dataexpo2008/part-r-00000
2008 1 279427
2008 2 278902
2008 3 294556
2008 4 256142
2008 5 254673
2008 6 295897
2008 7 264630
2008 8 239737
2008 9 169959
2008 10 183582
2008 11 181506
2008 12 280493