First Steps with Hadoop, Part 2: The Official wordcount Example

Published: 2023-10-30 23:02:58  Author: 数据的反抗精神疗法

1. Command
[hadoop@namenode mapreduce]$ hadoop jar hadoop-mapreduce-examples-3.3.6.jar wordcount /wordcount/input /wordcount/output
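The command hadoop jar hadoop-mapreduce-examples-3.3.6.jar wordcount /wordcount/input /wordcount/output breaks down as follows:

  • hadoop jar: runs a program packaged in a jar file
  • hadoop-mapreduce-examples-3.3.6.jar: the jar that contains the wordcount program
  • wordcount: the name of the example program to run (an alias registered inside the examples jar, not a Java class name)
  • /wordcount/input: the input directory on HDFS
  • /wordcount/output: the output directory on HDFS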
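The positional structure of the command can be made explicit with a small sketch. The helper below is hypothetical (its name and parameters are not part of Hadoop); it only assembles the argument list shown above and runs nothing. One point to keep in mind when rerunning: job submission fails if the output directory already exists, so a rerun needs a fresh output path or the old directory removed first.

```python
import shlex


def build_wordcount_command(jar, input_dir, output_dir):
    """Assemble the argv list for the bundled wordcount example.

    Hypothetical helper for illustration only: it mirrors the command
    line shown above and does not execute anything.
    """
    return [
        "hadoop", "jar",   # launcher: run a program packaged in a jar
        jar,               # jar that contains the example programs
        "wordcount",       # name of the example program to run
        input_dir,         # HDFS input directory
        output_dir,        # HDFS output directory (must not exist yet)
    ]


argv = build_wordcount_command(
    "hadoop-mapreduce-examples-3.3.6.jar",
    "/wordcount/input",
    "/wordcount/output",
)
print(shlex.join(argv))
```

On a machine with Hadoop installed, the resulting list could be handed to `subprocess.run(argv, check=True)` from the directory that contains the jar.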

2. Execution log

2023-10-30 05:20:34,833 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at namenode/192.168.42.134:8032
2023-10-30 05:20:35,992 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1698655691785_0002
2023-10-30 05:20:36,400 INFO input.FileInputFormat: Total input files to process : 1                      # one input file
2023-10-30 05:20:36,614 INFO mapreduce.JobSubmitter: number of splits:1
2023-10-30 05:20:37,219 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1698655691785_0002     # the Hadoop job ID is job_1698655691785_0002
2023-10-30 05:20:37,219 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-10-30 05:20:37,478 INFO conf.Configuration: resource-types.xml not found                              # seen on 3.3.6; this optional configuration file is simply absent, which does not affect this run
2023-10-30 05:20:37,478 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-10-30 05:20:37,621 INFO impl.YarnClientImpl: Submitted application application_1698655691785_0002
2023-10-30 05:20:37,690 INFO mapreduce.Job: The url to track the job: http://namenode:8088/proxy/application_1698655691785_0002/
2023-10-30 05:20:37,691 INFO mapreduce.Job: Running job: job_1698655691785_0002
2023-10-30 05:20:55,419 INFO mapreduce.Job: Job job_1698655691785_0002 running in uber mode : false
2023-10-30 05:20:55,429 INFO mapreduce.Job:  map 0% reduce 0%                                              # MapReduce runs in two phases: map, then reduce
2023-10-30 05:21:01,510 INFO mapreduce.Job:  map 100% reduce 0%
2023-10-30 05:21:10,630 INFO mapreduce.Job:  map 100% reduce 100%
2023-10-30 05:21:11,650 INFO mapreduce.Job: Job job_1698655691785_0002 completed successfully              # finished successfully; the results are under /wordcount/output/ on HDFS
2023-10-30 05:21:11,831 INFO mapreduce.Job: Counters: 54
        File System Counters
                FILE: Number of bytes read=77                                                              # bytes the job read from the local file system
                FILE: Number of bytes written=553735                                                       # bytes the job's tasks wrote to the local file system
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=159
                HDFS: Number of bytes written=47
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
                HDFS: Number of bytes read erasure-coded=0
        Job Counters 
                Launched map tasks=1                                                                     # number of map tasks launched (depends on the number and size of the input files)
                Launched reduce tasks=1                                                                  # number of reduce tasks launched
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=4253
                Total time spent by all reduces in occupied slots (ms)=6324
                Total time spent by all map tasks (ms)=4253
                Total time spent by all reduce tasks (ms)=6324
                Total vcore-milliseconds taken by all map tasks=4253
                Total vcore-milliseconds taken by all reduce tasks=6324
                Total megabyte-milliseconds taken by all map tasks=4355072
                Total megabyte-milliseconds taken by all reduce tasks=6475776
        Map-Reduce Framework
                Map input records=3
                Map output records=8
                Map output bytes=80
                Map output materialized bytes=77
                Input split bytes=111
                Combine input records=8
                Combine output records=6
                Reduce input groups=6
                Reduce shuffle bytes=77
                Reduce input records=6
                Reduce output records=6
                Spilled Records=12
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=228
                CPU time spent (ms)=1190
                Physical memory (bytes) snapshot=326664192
                Virtual memory (bytes) snapshot=5482606592
                Total committed heap usage (bytes)=141778944
                Peak Map Physical memory (bytes)=212803584
                Peak Map Virtual memory (bytes)=2737623040
                Peak Reduce Physical memory (bytes)=113860608
                Peak Reduce Virtual memory (bytes)=2744983552
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=48
        File Output Format Counters 
                Bytes Written=47
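
The record counters in the log can be reproduced with a local simulation of the map, combine, and reduce steps. This is a plain-Python sketch, not Hadoop code. The input text is an assumption: the original input file is not shown in this post, so the three lines below were chosen only because they are consistent with the reported numbers (3 input records, 8 map output records, 48 bytes read) and with the final word counts.

```python
from collections import Counter

# Assumed input: the real file is not shown in the post. These three lines
# are merely consistent with the reported counters and final word counts.
ASSUMED_INPUT = "hello world\nhello hadoop\nhadoop happy new years\n"


def map_phase(text):
    """Emit one (word, 1) pair per whitespace-separated token."""
    lines = text.splitlines()
    pairs = [(word, 1) for line in lines for word in line.split()]
    return lines, pairs


def sum_by_key(pairs):
    """Sum the values per key and return the totals sorted by key.

    wordcount applies the same summing logic as combiner and as reducer.
    """
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


lines, map_output = map_phase(ASSUMED_INPUT)
combined = sum_by_key(map_output)  # combiner: runs on the single map's output
reduced = sum_by_key(combined)     # reducer: sums the combined partial counts

print("Bytes Read             =", len(ASSUMED_INPUT.encode("utf-8")))
print("Map input records      =", len(lines))
print("Map output records     =", len(map_output))
print("Combine input records  =", len(map_output))
print("Combine output records =", len(combined))
print("Reduce input records   =", len(combined))
print("Reduce output records  =", len(reduced))
```

With a single map task, the combiner already collapses the 8 emitted pairs into 6 distinct words, which is why the reducer receives 6 records rather than 8.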

3. Viewing the results
Method 1: read the block file directly from the DataNode's local storage
[hadoop@namenode dfs]$ cat /export/data/hadoop-3.3.6/dfs/data/current/BP-484505762-192.168.42.134-1698145927355/current/finalized/subdir0/subdir0/blk_1073741847

hadoop  2
happy   1
hello   2
new     1
world   1
years   1

Reference: https://blog.csdn.net/weixin_43114954/article/details/115571939

Method 2: read the output file through HDFS
[hadoop@namenode ~]$ hadoop fs -cat /wordcount/output/part-r-00000

hadoop  2
happy   1
hello   2
new     1
world   1
years   1
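
As a cross-check, the listing above can be tied back to the `Bytes Written=47` counter in the log. The sketch below assumes the default text output format, which writes each record as the key, a tab, the value, and a newline; the variable names are illustrative.

```python
# Final counts copied from the part-r-00000 listing above.
counts = [
    ("hadoop", 2), ("happy", 1), ("hello", 2),
    ("new", 1), ("world", 1), ("years", 1),
]

# Assumption: default text output, one "key<TAB>value" line per record.
part_file = "".join(f"{word}\t{n}\n" for word, n in counts)

print(part_file, end="")
print("bytes:", len(part_file.encode("utf-8")))  # 47, matching Bytes Written
```

The match also explains why both viewing methods print the same text: the output file fits in a single HDFS block, so the block file on the DataNode holds exactly these 47 bytes.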

4. Further reading

Detailed walkthrough of the official example
Meaning of the MapReduce counters
map task