Comparing Tajo and MapReduce after installation

DBILITY 2018. 4. 11. 21:36

Because of memory constraints the TajoWorker would not start, so I set MIN-HEAP to 1000M.

I ran the test on the 2008 flight records from the US commercial airline on-time statistics and counted the number of flights that arrived late in each month.

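The results will vary with server specs, configuration, and the performance of the MapReduce1 program (I wrote it a long time ago; it seems to apply a sort comparator but no combiner), but on my test system the difference is about 10x.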

With 2 workers: Tajo about 6 seconds, MapReduce1 63 seconds
With 4 workers: Tajo about 3 seconds, MapReduce1 60 to 65 seconds
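To make the ratios explicit, here is a small arithmetic sketch in Python. The timings are the ones reported in this post; using 6.139 seconds for the 2-worker Tajo run (the tsql response time shown later) and a flat 3.0 seconds for the 4-worker run are my assumptions, so the results are only rough.

```python
# Timings taken from this post, in seconds.
tajo_2_workers = 6.139       # assumed: the tsql run shown later was the 2-worker run
tajo_4_workers = 3.0         # assumed: "about 3 seconds", exact value not recorded
mapreduce_2_workers = 63.0   # MapReduce1 job with 2 workers


def speedup(baseline_s, improved_s):
    """How many times faster the improved run is than the baseline."""
    return baseline_s / improved_s


# Tajo versus MapReduce1 on the same 2-worker cluster.
tajo_vs_mapreduce = speedup(mapreduce_2_workers, tajo_2_workers)

# Tajo going from 2 to 4 workers; perfectly linear scaling would give 2.0.
tajo_scaling = speedup(tajo_2_workers, tajo_4_workers)
scaling_efficiency = tajo_scaling / (4 / 2)

print(f"Tajo vs MapReduce1 (2 workers): {tajo_vs_mapreduce:.1f}x")
print(f"Tajo 2 -> 4 workers: {tajo_scaling:.2f}x, efficiency {scaling_efficiency:.2f}")
```

An efficiency slightly above 1.0 here only reflects the rounded 3.0-second figure, not real super-linear scaling.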

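I had heard that performance scales linearly as you add nodes, and that really does seem to be the case...

Or maybe it is just because I raised only the Tajo worker heap to the default 5000M..

The MapReduce side apparently needs parameter(?) tuning. That is beyond my ability for now.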
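One candidate for that tuning is the missing combiner. The job counters further down show Combine input records=0 and 2979504 records going through the shuffle. The sketch below is a plain-Python simulation, not the actual MapReduce program: the function names and the tiny input are hypothetical, and it assumes the reducer simply sums counts per (year, month) key. It only illustrates how local pre-aggregation shrinks what each map task sends to the reducer.

```python
from collections import Counter


def map_phase(rows):
    """Emit one ((year, month), 1) pair for every flight that arrived late."""
    for year, month, arr_delay in rows:
        if arr_delay > 0:
            yield (year, month), 1


def combine(pairs):
    """Sum the pairs of a single map task locally, as a combiner would."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def reduce_phase(shuffled):
    """Sum all values per key after the shuffle."""
    totals = Counter()
    for key, value in shuffled:
        totals[key] += value
    return dict(sorted(totals.items()))


# Hypothetical input split across two map tasks: (year, month, arr_delay).
splits = [
    [(2008, 1, 5), (2008, 1, -3), (2008, 1, 12), (2008, 2, 7)],
    [(2008, 1, 1), (2008, 2, 0), (2008, 2, 30), (2008, 2, 4)],
]

without_combiner = [pair for split in splits for pair in map_phase(split)]
with_combiner = [pair for split in splits for pair in combine(map_phase(split))]

result_without = reduce_phase(without_combiner)
result_with = reduce_phase(with_combiner)

print(len(without_combiner), "records shuffled without a combiner")
print(len(with_combiner), "records shuffled with a combiner")
print(result_with)
```

For a single year of data each map task has at most 12 distinct keys, so a combiner would cap the shuffle at 12 records per map task. How much wall-clock time that would save on this cluster is not something I measured.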

Creating the test table

create EXTERNAL table dataexpo2008 (\
  Year int,
  Month int,
  DayofMonth int,
  DayOfWeek int,
  DepTime  int,
  CRSDepTime int,
  ArrTime int,
  CRSArrTime int,
  UniqueCarrier varchar(5),
  FlightNum int,
  TailNum varchar(8),
  ActualElapsedTime int,
  CRSElapsedTime int,
  AirTime int,
  ArrDelay int,
  DepDelay int,
  Origin varchar(3),
  Dest varchar(3),
  Distance int,
  TaxiIn int,
  TaxiOut int,
  Cancelled int,
  CancellationCode varchar(1),
  Diverted varchar(1),
  CarrierDelay int,
  WeatherDelay int,
  NASDelay int,
  SecurityDelay int,
  LateAircraftDelay int
)\
USING TEXT WITH ('text.delimiter'=',', 'text.skip.headerlines'='1')\
LOCATION 'hdfs://hadoop-cluster/dataexpo/2008.csv';

Running the query in tsql

default> select Year,Month,count(*) as Cnt from dataexpo2008 where arrdelay > 0 group by Year,Month order by Year,Month;
Progress: 0%, response time: 0.45 sec
Progress: 0%, response time: 0.451 sec
Progress: 0%, response time: 0.853 sec
Progress: 0%, response time: 1.655 sec
Progress: 11%, response time: 2.657 sec
Progress: 11%, response time: 3.66 sec
Progress: 22%, response time: 4.661 sec
Progress: 22%, response time: 5.663 sec
Progress: 100%, response time: 6.139 sec
Year,  Month,  cnt
-------------------------------
2008,  1,  279427
2008,  2,  278902
2008,  3,  294556
2008,  4,  256142
2008,  5,  254673
2008,  6,  295897
2008,  7,  264630
2008,  8,  239737
2008,  9,  169959
2008,  10,  183582
2008,  11,  181506
2008,  12,  280493
(12 rows, 6.139 sec, 384 B selected)

Opening the WebUI at http://big-master:26080

Running MapReduce

[hadoop@big-master ~]$ yarn jar asa-flight-statistics-1.0.0.jar /dataexpo/2008.csv /dataexpo2008
18/04/11 21:47:40 INFO client.RMProxy: Connecting to ResourceManager at big-master/192.168.100.180:8035
18/04/11 21:47:41 INFO input.FileInputFormat: Total input paths to process : 1
18/04/11 21:47:41 INFO mapreduce.JobSubmitter: number of splits:6
18/04/11 21:47:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1523443857162_0002
18/04/11 21:47:42 INFO impl.YarnClientImpl: Submitted application application_1523443857162_0002
18/04/11 21:47:42 INFO mapreduce.Job: The url to track the job: http://big-master:8088/proxy/application_1523443857162_0002/
18/04/11 21:47:42 INFO mapreduce.Job: Running job: job_1523443857162_0002
18/04/11 21:47:50 INFO mapreduce.Job: Job job_1523443857162_0002 running in uber mode : false
18/04/11 21:47:50 INFO mapreduce.Job:  map 0% reduce 0%
18/04/11 21:48:13 INFO mapreduce.Job:  map 17% reduce 0%
18/04/11 21:48:14 INFO mapreduce.Job:  map 35% reduce 0%
18/04/11 21:48:18 INFO mapreduce.Job:  map 58% reduce 0%
18/04/11 21:48:21 INFO mapreduce.Job:  map 72% reduce 0%
18/04/11 21:48:22 INFO mapreduce.Job:  map 94% reduce 0%
18/04/11 21:48:23 INFO mapreduce.Job:  map 100% reduce 0%
18/04/11 21:48:40 INFO mapreduce.Job:  map 100% reduce 69%
18/04/11 21:48:43 INFO mapreduce.Job:  map 100% reduce 91%
18/04/11 21:48:44 INFO mapreduce.Job:  map 100% reduce 100%
18/04/11 21:48:45 INFO mapreduce.Job: Job job_1523443857162_0002 completed successfully
18/04/11 21:48:45 INFO mapreduce.Job: Counters: 50
        File System Counters
                FILE: Number of bytes read=53631078
                FILE: Number of bytes written=108120702
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=689434154
                HDFS: Number of bytes written=171
                HDFS: Number of read operations=21
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Killed map tasks=1
                Launched map tasks=7
                Launched reduce tasks=1
                Data-local map tasks=7
                Total time spent by all maps in occupied slots (ms)=178545
                Total time spent by all reduces in occupied slots (ms)=27647
                Total time spent by all map tasks (ms)=178545
                Total time spent by all reduce tasks (ms)=27647
                Total vcore-milliseconds taken by all map tasks=178545
                Total vcore-milliseconds taken by all reduce tasks=27647
                Total megabyte-milliseconds taken by all map tasks=182830080
                Total megabyte-milliseconds taken by all reduce tasks=28310528
        Map-Reduce Framework
                Map input records=7009728
                Map output records=2979504
                Map output bytes=47672064
                Map output materialized bytes=53631108
                Input split bytes=630
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=53631108
                Reduce input records=2979504
                Reduce output records=12
                Spilled Records=5959008
                Shuffled Maps =6
                Failed Shuffles=0
                Merged Map outputs=6
                GC time elapsed (ms)=3270
                CPU time spent (ms)=63440
                Physical memory (bytes) snapshot=4056547328
                Virtual memory (bytes) snapshot=24308379648
                Total committed heap usage (bytes)=3921674240
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=689433524
        File Output Format Counters
                Bytes Written=171

[hadoop@big-master ~]$ hdfs dfs -cat /dataexpo2008/part-r-00000
2008    1       279427
2008    2       278902
2008    3       294556
2008    4       256142
2008    5       254673
2008    6       295897
2008    7       264630
2008    8       239737
2008    9       169959
2008    10      183582
2008    11      181506
2008    12      280493
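As a quick sanity check, the monthly counts can be cross-checked against the job counters in the log above. This is a small sketch in Python with the numbers copied from the output; the variable names are mine. It assumes the mapper emits exactly one record per late-arriving flight, which is what the counters suggest.

```python
# Monthly late-arrival counts from part-r-00000 (identical to the Tajo result).
monthly_counts = {
    1: 279427, 2: 278902, 3: 294556, 4: 256142,
    5: 254673, 6: 295897, 7: 264630, 8: 239737,
    9: 169959, 10: 183582, 11: 181506, 12: 280493,
}

# Counters copied from the MapReduce job log.
map_input_records = 7009728   # lines read, including the CSV header line
map_output_records = 2979504
reduce_output_records = 12

total_delayed = sum(monthly_counts.values())

# One map output record per late flight means the two totals should match.
counts_match = total_delayed == map_output_records
delayed_share = total_delayed / map_input_records

print("sum of monthly counts:", total_delayed)
print("matches Map output records:", counts_match)
print(f"share of input lines with ArrDelay > 0: {delayed_share:.1%}")
```

The totals agree, so both engines counted the same 2979504 late arrivals, roughly 42.5% of the input lines.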