Hadoop streaming is a utility that comes with the Hadoop distribution. It allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer; see the official documentation for more information. For example:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper /bin/cat \
-reducer /bin/wc
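To get a feel for what such a job computes, the data flow can be approximated on a single machine: input lines go to the mapper on standard input, the mapper output is sorted by key, and the sorted stream is piped into the reducer. The following is a minimal sketch under that assumption; the function name local_streaming is hypothetical, and the pipeline behaves like a job with a single reducer (no partitioning across reducers).

```shell
# Hypothetical helper, not part of Hadoop: emulate map -> sort -> reduce locally.
# $1: mapper command, $2: reducer command; input lines are read from stdin.
local_streaming() {
    # LC_ALL=C sorts byte-wise, which roughly matches how Hadoop orders text keys.
    sh -c "$1" | LC_ALL=C sort | sh -c "$2"
}
```

For instance, `local_streaming /bin/cat /bin/wc < some-local-file` would mimic the cat/wc job above on a local file (some-local-file is a placeholder).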
We have created a simple example of a mapper and reducer written in C which together determine the number of HDF5 objects (groups, datasets, datatypes) in each file in a collection of HDF5 (and NetCDF-4) files. The source code can be obtained from GitHub.
The HDF5 files to be examined are listed in two text files, input1 and input2. (The reason we use two files is to create more than one split to be processed. The input text files are just too small for Hadoop to create more than one split on its own.)
$HADOOP_HOME/bin/hdfs dfs -cat hdfs://jelly.ad.hdfgroup.org:8020/tmp/input*
/tmp/GSSTF_NCEP.3.1987.12.07.he5
/tmp/GSSTF_NCEP.3.1987.12.08.he5
/tmp/GSSTF_NCEP.3.1987.12.09.he5
/tmp/GSSTF_NCEP.3.1987.12.10.he5
/tmp/GSSTF_NCEP.3.1987.12.11.he5
/tmp/GSSTF_NCEP.3.1987.12.12.he5
/tmp/GSSTF_NCEP.3.1987.12.13.he5
/tmp/GSSTF_NCEP.3.1987.12.14.he5
/tmp/GSSTF_NCEP.3.1987.12.15.he5
/tmp/GSSTF_NCEP.3.1987.12.16.he5
/tmp/GSSTF_NCEP.3.1987.12.17.he5
/tmp/GSSTF_NCEP.3.1987.12.18.he5
/tmp/GSSTF_NCEP.3.1987.12.19.he5
/tmp/GSSTF_NCEP.3.1987.12.20.he5
/tmp/GSSTF_NCEP.3.1987.12.21.he5
/tmp/GSSTF_NCEP.3.1987.12.22.he5
/tmp/GSSTF_NCEP.3.1987.12.23.he5
/tmp/GSSTF_NCEP.3.1987.12.24.he5
/tmp/GSSTF_NCEP.3.1987.12.25.he5
/tmp/GSSTF_NCEP.3.1987.12.26.he5
/tmp/GSSTF_NCEP.3.1987.12.27.he5
/tmp/GSSTF_NCEP.3.1987.12.28.he5
/tmp/GSSTF_NCEP.3.1987.12.29.he5
/tmp/GSSTF_NCEP.3.1987.12.30.he5
/tmp/GSSTF_NCEP.3.1987.12.31.he5
/tmp/foo.h5
/tmp/sample.h5
/tmp/t.h5
/tmp/efitOut.nc
/tmp/GSSTF_NCEP.3.1987.12.01.he5
/tmp/GSSTF_NCEP.3.1987.12.02.he5
/tmp/GSSTF_NCEP.3.1987.12.03.he5
/tmp/GSSTF_NCEP.3.1987.12.04.he5
/tmp/GSSTF_NCEP.3.1987.12.05.he5
/tmp/GSSTF_NCEP.3.1987.12.06.he5
The mapper, implemented in hdfs-vfd-mapper.c and wrapped in mapper.sh, generates key-value pairs of the form
<FILENAME> [G,D,T]
...
where the codes represent groups (G), datasets (D), or datatypes (T). The reducer, implemented in hdfs-vfd-reducer.c, just counts the number of codes in each category and presents the final result as records of the form
<FILENAME> G #G D #D T #T
...
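The counting step can be illustrated with a small shell stand-in. This is only a sketch and not the C implementation in hdfs-vfd-reducer.c: the function name count_codes is hypothetical, and it assumes that each input line holds a file name and a single code separated by a tab (the default Hadoop streaming key/value separator) and that lines arrive sorted by file name, as they do after the shuffle.

```shell
# Hypothetical stand-in for the reducer logic (an assumption, not the original C code).
# Reads "<FILENAME><TAB><CODE>" lines sorted by file name from stdin and
# prints one "<FILENAME><TAB>G #G D #D T #T" record per file.
count_codes() {
    awk -F '\t' '
        # Print the counts accumulated for the current file name, if any.
        function emit() {
            if (key != "") printf "%s\tG %d D %d T %d\n", key, n["G"], n["D"], n["T"]
        }
        # A new file name starts: flush the previous record and reset the counters.
        $1 != key { emit(); key = $1; n["G"] = n["D"] = n["T"] = 0 }
        # Tally the code on this line (codes other than G, D, T are never printed).
        { n[$2]++ }
        END { emit() }
    '
}
```

Because the function only reads standard input and writes standard output, a script that calls count_codes could itself be passed to the -reducer option.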
Hadoop streaming can be invoked as follows:
HDFS_DIR=hdfs://jelly.ad.hdfgroup.org:8020/tmp
INPUT1=$HDFS_DIR/input1
INPUT2=$HDFS_DIR/input2
OUTPUT=$HDFS_DIR/hdfs-vfd-output
MAPPER=./mapper.sh
REDUCER=./hdfs-vfd-reducer
MAPTASKS=2
REDTASKS=3
# Delete output from previous runs
$HADOOP_HOME/bin/hdfs dfs -rm $OUTPUT/*
$HADOOP_HOME/bin/hdfs dfs -rmdir $OUTPUT
$HADOOP_HOME/bin/hadoop jar \
$HADOOP_HOME/share/hadoop/tools/lib/hadoop-*streaming*.jar \
-D mapred.map.tasks=$MAPTASKS \
-D mapred.reduce.tasks=$REDTASKS \
-input $INPUT1 -input $INPUT2 \
-output $OUTPUT \
-mapper $MAPPER \
-reducer $REDUCER
$HADOOP_HOME/bin/hdfs dfs -cat $OUTPUT/part-*
Deleted hdfs://jelly.ad.hdfgroup.org:8020/tmp/hdfs-vfd-output/_SUCCESS
Deleted hdfs://jelly.ad.hdfgroup.org:8020/tmp/hdfs-vfd-output/part-00000
Deleted hdfs://jelly.ad.hdfgroup.org:8020/tmp/hdfs-vfd-output/part-00001
Deleted hdfs://jelly.ad.hdfgroup.org:8020/tmp/hdfs-vfd-output/part-00002
2018-10-25 10:28:31,885 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2018-10-25 10:28:31,935 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2018-10-25 10:28:31,935 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2018-10-25 10:28:31,948 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2018-10-25 10:28:32,508 INFO mapred.FileInputFormat: Total input files to process : 2
2018-10-25 10:28:32,542 INFO mapreduce.JobSubmitter: number of splits:2
2018-10-25 10:28:32,562 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2018-10-25 10:28:32,563 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2018-10-25 10:28:32,628 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local970602474_0001
2018-10-25 10:28:32,629 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-10-25 10:28:32,708 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2018-10-25 10:28:32,709 INFO mapreduce.Job: Running job: job_local970602474_0001
2018-10-25 10:28:32,711 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2018-10-25 10:28:32,713 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
2018-10-25 10:28:32,719 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-10-25 10:28:32,719 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-10-25 10:28:32,836 INFO mapred.LocalJobRunner: Waiting for map tasks
2018-10-25 10:28:32,840 INFO mapred.LocalJobRunner: Starting task: attempt_local970602474_0001_m_000000_0
2018-10-25 10:28:32,872 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-10-25 10:28:32,872 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-10-25 10:28:32,888 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2018-10-25 10:28:32,898 INFO mapred.MapTask: Processing split: hdfs://jelly.ad.hdfgroup.org:8020/tmp/input1:0+825
2018-10-25 10:28:32,917 INFO mapred.MapTask: numReduceTasks: 3
2018-10-25 10:28:32,956 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2018-10-25 10:28:32,956 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2018-10-25 10:28:32,956 INFO mapred.MapTask: soft limit at 83886080
2018-10-25 10:28:32,956 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2018-10-25 10:28:32,956 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
MapOutputBuffer
2018-10-25 10:28:32,965 INFO streaming.PipeMapRed: PipeMapRed exec [/mnt/wrk/gheber/Bitbucket/ghorg/ESE/././mapper.sh]
2018-10-25 10:28:32,970 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2018-10-25 10:28:32,970 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
2018-10-25 10:28:32,971 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2018-10-25 10:28:32,971 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2018-10-25 10:28:32,972 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
2018-10-25 10:28:32,972 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2018-10-25 10:28:32,972 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
2018-10-25 10:28:32,972 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2018-10-25 10:28:32,972 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
2018-10-25 10:28:32,973 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2018-10-25 10:28:32,973 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
2018-10-25 10:28:32,973 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
/usr/bin/bash: ml: line 1: syntax error: unexpected end of file
/usr/bin/bash: error importing function definition for `BASH_FUNC_ml'
/usr/bin/bash: module: line 1: syntax error: unexpected end of file
/usr/bin/bash: error importing function definition for `BASH_FUNC_module'
2018-10-25 10:28:33,057 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:33,058 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:33,713 INFO mapreduce.Job: Job job_local970602474_0001 running in uber mode : false
reduce 0%
2018-10-25 10:28:36,212 INFO streaming.PipeMapRed: Records R/W=25/1
2018-10-25 10:28:38,643 INFO streaming.PipeMapRed: MRErrorThread done
2018-10-25 10:28:38,644 INFO streaming.PipeMapRed: mapRedFinished
2018-10-25 10:28:38,648 INFO mapred.LocalJobRunner:
2018-10-25 10:28:38,648 INFO mapred.MapTask: Starting flush of map output
2018-10-25 10:28:38,648 INFO mapred.MapTask: Spilling map output
2018-10-25 10:28:38,648 INFO mapred.MapTask: bufstart = 0; bufend = 11375; bufvoid = 104857600
2018-10-25 10:28:38,648 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26213100(104852400); length = 1297/6553600
2018-10-25 10:28:38,664 INFO mapred.MapTask: Finished spill 0
2018-10-25 10:28:38,678 INFO mapred.Task: Task:attempt_local970602474_0001_m_000000_0 is done. And is in the process of committing
2018-10-25 10:28:38,683 INFO mapred.LocalJobRunner: Records R/W=25/1
2018-10-25 10:28:38,683 INFO mapred.Task: Task 'attempt_local970602474_0001_m_000000_0' done.
2018-10-25 10:28:38,692 INFO mapred.Task: Final Counters for attempt_local970602474_0001_m_000000_0: Counters: 22
File System Counters
FILE: Number of bytes read=176593
FILE: Number of bytes written=688995
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=825
HDFS: Number of bytes written=0
HDFS: Number of read operations=7
HDFS: Number of large read operations=0
HDFS: Number of write operations=1
Map-Reduce Framework
Map input records=25
Map output records=325
Map output bytes=11375
Map output materialized bytes=12043
Input split bytes=96
Combine input records=0
Spilled Records=325
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=1547698176
File Input Format Counters
Bytes Read=825
2018-10-25 10:28:38,692 INFO mapred.LocalJobRunner: Finishing task: attempt_local970602474_0001_m_000000_0
2018-10-25 10:28:38,693 INFO mapred.LocalJobRunner: Starting task: attempt_local970602474_0001_m_000001_0
2018-10-25 10:28:38,694 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-10-25 10:28:38,694 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-10-25 10:28:38,695 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2018-10-25 10:28:38,696 INFO mapred.MapTask: Processing split: hdfs://jelly.ad.hdfgroup.org:8020/tmp/input2:0+251
2018-10-25 10:28:38,699 INFO mapred.MapTask: numReduceTasks: 3
reduce 0%
2018-10-25 10:28:38,734 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2018-10-25 10:28:38,734 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2018-10-25 10:28:38,734 INFO mapred.MapTask: soft limit at 83886080
2018-10-25 10:28:38,734 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2018-10-25 10:28:38,734 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
MapOutputBuffer
2018-10-25 10:28:38,740 INFO streaming.PipeMapRed: PipeMapRed exec [/mnt/wrk/gheber/Bitbucket/ghorg/ESE/././mapper.sh]
/usr/bin/bash: ml: line 1: syntax error: unexpected end of file
/usr/bin/bash: error importing function definition for `BASH_FUNC_ml'
2018-10-25 10:28:38,750 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:38,750 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
/usr/bin/bash: module: line 1: syntax error: unexpected end of file
/usr/bin/bash: error importing function definition for `BASH_FUNC_module'
2018-10-25 10:28:41,126 INFO streaming.PipeMapRed: Records R/W=10/1
2018-10-25 10:28:42,275 INFO streaming.PipeMapRed: MRErrorThread done
2018-10-25 10:28:42,276 INFO streaming.PipeMapRed: mapRedFinished
2018-10-25 10:28:42,277 INFO mapred.LocalJobRunner:
2018-10-25 10:28:42,277 INFO mapred.MapTask: Starting flush of map output
2018-10-25 10:28:42,277 INFO mapred.MapTask: Spilling map output
2018-10-25 10:28:42,277 INFO mapred.MapTask: bufstart = 0; bufend = 9096; bufvoid = 104857600
2018-10-25 10:28:42,277 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26212668(104850672); length = 1729/6553600
2018-10-25 10:28:42,281 INFO mapred.MapTask: Finished spill 0
2018-10-25 10:28:42,283 INFO mapred.Task: Task:attempt_local970602474_0001_m_000001_0 is done. And is in the process of committing
2018-10-25 10:28:42,287 INFO mapred.LocalJobRunner: Records R/W=10/1
2018-10-25 10:28:42,287 INFO mapred.Task: Task 'attempt_local970602474_0001_m_000001_0' done.
2018-10-25 10:28:42,288 INFO mapred.Task: Final Counters for attempt_local970602474_0001_m_000001_0: Counters: 22
File System Counters
FILE: Number of bytes read=176804
FILE: Number of bytes written=699055
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1076
HDFS: Number of bytes written=0
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=1
Map-Reduce Framework
Map input records=10
Map output records=433
Map output bytes=9096
Map output materialized bytes=9980
Input split bytes=96
Combine input records=0
Spilled Records=433
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=1547698176
File Input Format Counters
Bytes Read=251
2018-10-25 10:28:42,288 INFO mapred.LocalJobRunner: Finishing task: attempt_local970602474_0001_m_000001_0
2018-10-25 10:28:42,288 INFO mapred.LocalJobRunner: map task executor complete.
2018-10-25 10:28:42,295 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2018-10-25 10:28:42,296 INFO mapred.LocalJobRunner: Starting task: attempt_local970602474_0001_r_000000_0
2018-10-25 10:28:42,307 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-10-25 10:28:42,307 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-10-25 10:28:42,308 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2018-10-25 10:28:42,314 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@439c1391
2018-10-25 10:28:42,317 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2018-10-25 10:28:42,348 INFO reduce.MergeManagerImpl: The max number of bytes for a single in-memory shuffle cannot be larger than Integer.MAX_VALUE. Setting it to Integer.MAX_VALUE
2018-10-25 10:28:42,348 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=20041957376, maxSingleShuffleLimit=2147483647, mergeThreshold=13227692032, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2018-10-25 10:28:42,352 INFO reduce.EventFetcher: attempt_local970602474_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
1 about to shuffle output of map attempt_local970602474_0001_m_000000_0 decomp: 3850 len: 3854 to MEMORY
2018-10-25 10:28:42,391 INFO reduce.InMemoryMapOutput: Read 3850 bytes from map-output for attempt_local970602474_0001_m_000000_0
map-output of size: 3850, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->3850
1 about to shuffle output of map attempt_local970602474_0001_m_000001_0 decomp: 964 len: 968 to MEMORY
2018-10-25 10:28:42,396 INFO reduce.InMemoryMapOutput: Read 964 bytes from map-output for attempt_local970602474_0001_m_000001_0
map-output of size: 964, inMemoryMapOutputs.size() -> 2, commitMemory -> 3850, usedMemory ->4814
2018-10-25 10:28:42,397 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2018-10-25 10:28:42,398 INFO mapred.LocalJobRunner: 2 / 2 copied.
2018-10-25 10:28:42,398 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
2018-10-25 10:28:42,406 INFO mapred.Merger: Merging 2 sorted segments
2018-10-25 10:28:42,406 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 4744 bytes
2018-10-25 10:28:42,409 INFO reduce.MergeManagerImpl: Merged 2 segments, 4814 bytes to disk to satisfy reduce memory limit
2018-10-25 10:28:42,409 INFO reduce.MergeManagerImpl: Merging 1 files, 4816 bytes from disk
2018-10-25 10:28:42,410 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2018-10-25 10:28:42,410 INFO mapred.Merger: Merging 1 sorted segments
2018-10-25 10:28:42,411 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 4777 bytes
2018-10-25 10:28:42,411 INFO mapred.LocalJobRunner: 2 / 2 copied.
2018-10-25 10:28:42,415 INFO streaming.PipeMapRed: PipeMapRed exec [/mnt/wrk/gheber/Bitbucket/ghorg/ESE/././hdfs-vfd-reducer]
2018-10-25 10:28:42,417 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2018-10-25 10:28:42,418 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2018-10-25 10:28:42,548 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:42,549 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:42,551 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:42,554 INFO streaming.PipeMapRed: MRErrorThread done
2018-10-25 10:28:42,557 INFO streaming.PipeMapRed: Records R/W=130/1
2018-10-25 10:28:42,557 INFO streaming.PipeMapRed: mapRedFinished
2018-10-25 10:28:42,660 INFO mapred.Task: Task:attempt_local970602474_0001_r_000000_0 is done. And is in the process of committing
2018-10-25 10:28:42,663 INFO mapred.LocalJobRunner: 2 / 2 copied.
2018-10-25 10:28:42,663 INFO mapred.Task: Task attempt_local970602474_0001_r_000000_0 is allowed to commit now
2018-10-25 10:28:42,700 INFO output.FileOutputCommitter: Saved output of task 'attempt_local970602474_0001_r_000000_0' to hdfs://jelly.ad.hdfgroup.org:8020/tmp/hdfs-vfd-output
reduce
2018-10-25 10:28:42,701 INFO mapred.Task: Task 'attempt_local970602474_0001_r_000000_0' done.
2018-10-25 10:28:42,702 INFO mapred.Task: Final Counters for attempt_local970602474_0001_r_000000_0: Counters: 29
File System Counters
FILE: Number of bytes read=189972
FILE: Number of bytes written=703871
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1076
HDFS: Number of bytes written=452
HDFS: Number of read operations=14
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=10
Reduce shuffle bytes=4822
Reduce input records=130
Reduce output records=10
Spilled Records=130
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=0
Total committed heap usage (bytes)=1547698176
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=452
2018-10-25 10:28:42,702 INFO mapred.LocalJobRunner: Finishing task: attempt_local970602474_0001_r_000000_0
2018-10-25 10:28:42,703 INFO mapred.LocalJobRunner: Starting task: attempt_local970602474_0001_r_000001_0
2018-10-25 10:28:42,705 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-10-25 10:28:42,705 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-10-25 10:28:42,705 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2018-10-25 10:28:42,705 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@578316ce
2018-10-25 10:28:42,706 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2018-10-25 10:28:42,707 INFO reduce.MergeManagerImpl: The max number of bytes for a single in-memory shuffle cannot be larger than Integer.MAX_VALUE. Setting it to Integer.MAX_VALUE
2018-10-25 10:28:42,707 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=20041957376, maxSingleShuffleLimit=2147483647, mergeThreshold=13227692032, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2018-10-25 10:28:42,708 INFO reduce.EventFetcher: attempt_local970602474_0001_r_000001_0 Thread started: EventFetcher for fetching Map Completion Events
2 about to shuffle output of map attempt_local970602474_0001_m_000000_0 decomp: 3850 len: 3854 to MEMORY
2018-10-25 10:28:42,713 INFO reduce.InMemoryMapOutput: Read 3850 bytes from map-output for attempt_local970602474_0001_m_000000_0
map-output of size: 3850, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->3850
2 about to shuffle output of map attempt_local970602474_0001_m_000001_0 decomp: 8008 len: 8012 to MEMORY
2018-10-25 10:28:42,716 INFO reduce.InMemoryMapOutput: Read 8008 bytes from map-output for attempt_local970602474_0001_m_000001_0
map-output of size: 8008, inMemoryMapOutputs.size() -> 2, commitMemory -> 3850, usedMemory ->11858
2018-10-25 10:28:42,717 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2018-10-25 10:28:42,717 INFO mapred.LocalJobRunner: 2 / 2 copied.
2018-10-25 10:28:42,717 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
2018-10-25 10:28:42,719 INFO mapred.Merger: Merging 2 sorted segments
2018-10-25 10:28:42,719 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 11788 bytes
2018-10-25 10:28:42,721 INFO reduce.MergeManagerImpl: Merged 2 segments, 11858 bytes to disk to satisfy reduce memory limit
2018-10-25 10:28:42,722 INFO reduce.MergeManagerImpl: Merging 1 files, 11860 bytes from disk
2018-10-25 10:28:42,722 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2018-10-25 10:28:42,722 INFO mapred.Merger: Merging 1 sorted segments
2018-10-25 10:28:42,722 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 11821 bytes
2018-10-25 10:28:42,722 INFO mapred.LocalJobRunner: 2 / 2 copied.
2018-10-25 10:28:42,726 INFO streaming.PipeMapRed: PipeMapRed exec [/mnt/wrk/gheber/Bitbucket/ghorg/ESE/././hdfs-vfd-reducer]
reduce 33%
2018-10-25 10:28:42,766 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:42,766 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:42,767 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:42,773 INFO streaming.PipeMapRed: MRErrorThread done
2018-10-25 10:28:42,774 INFO streaming.PipeMapRed: Records R/W=483/1
2018-10-25 10:28:42,775 INFO streaming.PipeMapRed: mapRedFinished
2018-10-25 10:28:42,832 INFO mapred.Task: Task:attempt_local970602474_0001_r_000001_0 is done. And is in the process of committing
2018-10-25 10:28:42,835 INFO mapred.LocalJobRunner: 2 / 2 copied.
2018-10-25 10:28:42,835 INFO mapred.Task: Task attempt_local970602474_0001_r_000001_0 is allowed to commit now
2018-10-25 10:28:42,857 INFO output.FileOutputCommitter: Saved output of task 'attempt_local970602474_0001_r_000001_0' to hdfs://jelly.ad.hdfgroup.org:8020/tmp/hdfs-vfd-output
reduce
2018-10-25 10:28:42,858 INFO mapred.Task: Task 'attempt_local970602474_0001_r_000001_0' done.
2018-10-25 10:28:42,859 INFO mapred.Task: Final Counters for attempt_local970602474_0001_r_000001_0: Counters: 29
File System Counters
FILE: Number of bytes read=215100
FILE: Number of bytes written=715731
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1076
HDFS: Number of bytes written=984
HDFS: Number of read operations=19
HDFS: Number of large read operations=0
HDFS: Number of write operations=5
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=13
Reduce shuffle bytes=11866
Reduce input records=483
Reduce output records=13
Spilled Records=483
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=0
Total committed heap usage (bytes)=1547698176
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=532
2018-10-25 10:28:42,859 INFO mapred.LocalJobRunner: Finishing task: attempt_local970602474_0001_r_000001_0
2018-10-25 10:28:42,859 INFO mapred.LocalJobRunner: Starting task: attempt_local970602474_0001_r_000002_0
2018-10-25 10:28:42,861 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-10-25 10:28:42,861 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-10-25 10:28:42,862 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2018-10-25 10:28:42,862 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6aff7da3
2018-10-25 10:28:42,862 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2018-10-25 10:28:42,863 INFO reduce.MergeManagerImpl: The max number of bytes for a single in-memory shuffle cannot be larger than Integer.MAX_VALUE. Setting it to Integer.MAX_VALUE
2018-10-25 10:28:42,863 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=20041957376, maxSingleShuffleLimit=2147483647, mergeThreshold=13227692032, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2018-10-25 10:28:42,864 INFO reduce.EventFetcher: attempt_local970602474_0001_r_000002_0 Thread started: EventFetcher for fetching Map Completion Events
3 about to shuffle output of map attempt_local970602474_0001_m_000000_0 decomp: 4331 len: 4335 to MEMORY
2018-10-25 10:28:42,886 INFO reduce.InMemoryMapOutput: Read 4331 bytes from map-output for attempt_local970602474_0001_m_000000_0
map-output of size: 4331, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->4331
3 about to shuffle output of map attempt_local970602474_0001_m_000001_0 decomp: 996 len: 1000 to MEMORY
2018-10-25 10:28:42,891 INFO reduce.InMemoryMapOutput: Read 996 bytes from map-output for attempt_local970602474_0001_m_000001_0
map-output of size: 996, inMemoryMapOutputs.size() -> 2, commitMemory -> 4331, usedMemory ->5327
2018-10-25 10:28:42,892 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2018-10-25 10:28:42,893 INFO mapred.LocalJobRunner: 2 / 2 copied.
2018-10-25 10:28:42,894 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
2018-10-25 10:28:42,895 INFO mapred.Merger: Merging 2 sorted segments
2018-10-25 10:28:42,895 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5257 bytes
2018-10-25 10:28:42,897 INFO reduce.MergeManagerImpl: Merged 2 segments, 5327 bytes to disk to satisfy reduce memory limit
2018-10-25 10:28:42,897 INFO reduce.MergeManagerImpl: Merging 1 files, 5329 bytes from disk
2018-10-25 10:28:42,898 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2018-10-25 10:28:42,898 INFO mapred.Merger: Merging 1 sorted segments
2018-10-25 10:28:42,898 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 5290 bytes
2018-10-25 10:28:42,899 INFO mapred.LocalJobRunner: 2 / 2 copied.
2018-10-25 10:28:42,903 INFO streaming.PipeMapRed: PipeMapRed exec [/mnt/wrk/gheber/Bitbucket/ghorg/ESE/././hdfs-vfd-reducer]
2018-10-25 10:28:42,934 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:42,934 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:42,934 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
2018-10-25 10:28:42,936 INFO streaming.PipeMapRed: MRErrorThread done
2018-10-25 10:28:42,937 INFO streaming.PipeMapRed: Records R/W=145/1
2018-10-25 10:28:42,938 INFO streaming.PipeMapRed: mapRedFinished
2018-10-25 10:28:42,982 INFO mapred.Task: Task:attempt_local970602474_0001_r_000002_0 is done. And is in the process of committing
2018-10-25 10:28:42,985 INFO mapred.LocalJobRunner: 2 / 2 copied.
2018-10-25 10:28:42,985 INFO mapred.Task: Task attempt_local970602474_0001_r_000002_0 is allowed to commit now
2018-10-25 10:28:43,024 INFO output.FileOutputCommitter: Saved output of task 'attempt_local970602474_0001_r_000002_0' to hdfs://jelly.ad.hdfgroup.org:8020/tmp/hdfs-vfd-output
reduce
2018-10-25 10:28:43,025 INFO mapred.Task: Task 'attempt_local970602474_0001_r_000002_0' done.
2018-10-25 10:28:43,026 INFO mapred.Task: Final Counters for attempt_local970602474_0001_r_000002_0: Counters: 29
File System Counters
FILE: Number of bytes read=225924
FILE: Number of bytes written=721060
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1076
HDFS: Number of bytes written=1505
HDFS: Number of read operations=24
HDFS: Number of large read operations=0
HDFS: Number of write operations=7
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=12
Reduce shuffle bytes=5335
Reduce input records=145
Reduce output records=12
Spilled Records=145
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=18
Total committed heap usage (bytes)=1560805376
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=521
2018-10-25 10:28:43,026 INFO mapred.LocalJobRunner: Finishing task: attempt_local970602474_0001_r_000002_0
2018-10-25 10:28:43,026 INFO mapred.LocalJobRunner: reduce task executor complete.
reduce 100%
2018-10-25 10:28:43,741 INFO mapreduce.Job: Job job_local970602474_0001 completed successfully
2018-10-25 10:28:43,769 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=984393
FILE: Number of bytes written=3528712
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=5129
HDFS: Number of bytes written=2941
HDFS: Number of read operations=73
HDFS: Number of large read operations=0
HDFS: Number of write operations=17
Map-Reduce Framework
Map input records=35
Map output records=758
Map output bytes=20471
Map output materialized bytes=22023
Input split bytes=192
Combine input records=0
Combine output records=0
Reduce input groups=35
Reduce shuffle bytes=22023
Reduce input records=758
Reduce output records=35
Spilled Records=1516
Shuffled Maps =6
Failed Shuffles=0
Merged Map outputs=6
GC time elapsed (ms)=18
Total committed heap usage (bytes)=7751598080
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1076
File Output Format Counters
Bytes Written=1505
2018-10-25 10:28:43,770 INFO streaming.StreamJob: Output directory: hdfs://jelly.ad.hdfgroup.org:8020/tmp/hdfs-vfd-output
/tmp/GSSTF_NCEP.3.1987.12.02.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.05.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.08.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.11.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.14.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.17.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.20.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.23.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.26.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.29.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.03.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.06.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.09.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.12.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.15.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.18.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.21.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.24.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.27.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.30.he5 G 8 D 5 T 0
/tmp/efitOut.nc G 35 D 305 T 7
/tmp/sample.h5 G 4 D 0 T 0
/tmp/t.h5 G 1 D 1 T 0
/tmp/GSSTF_NCEP.3.1987.12.01.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.04.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.07.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.10.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.13.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.16.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.19.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.22.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.25.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.28.he5 G 8 D 5 T 0
/tmp/GSSTF_NCEP.3.1987.12.31.he5 G 8 D 5 T 0
/tmp/foo.h5 G 2 D 0 T 0