Job Monitoring and Cancellation

The following simple commands show how to interact with the YARN resource manager to monitor and cancel MapReduce or Spark jobs scheduled from the command line.

MapReduce job monitoring and cancellation

To list MapReduce jobs and check their status:

$ mapred job -list
15/05/18 10:28:00 INFO client.RMProxy: Connecting to ResourceManager at /10.33.2.246:8032
Total jobs:1
JobId State StartTime UserName Queue Priority UsedContainers RsvdContainers UsedMem RsvdMem NeededMem AM info
job_1430333061340_0099 RUNNING 1431959273906 myusername root.default NORMAL 6 0 7168M 0M 7168M http://bd-appnode:8088/proxy/application_1430333061340_0099/
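
When the job list needs to be consumed by a script, the job IDs can be filtered out of the listing. The following is a minimal sketch, not part of the Hadoop tooling: the helper name jobs_in_state is hypothetical, and it assumes the column layout shown above (JobId in the first column, State in the second).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of all jobs in a given state.
# Reads `mapred job -list` output on stdin.
# Assumption: JobId is column 1 and State is column 2, as in the listing above.
jobs_in_state() {
    local state="$1"
    # Only rows whose first field starts with "job_" are job entries; the
    # log line, the "Total jobs" line, and the header row are skipped.
    awk -v state="$state" '$1 ~ /^job_/ && $2 == state { print $1 }'
}

# Usage (requires a configured Hadoop client on the PATH):
#   mapred job -list 2>/dev/null | jobs_in_state RUNNING
```

Filtering on the job_ prefix rather than on a fixed line offset keeps the helper independent of how many log lines the client prints before the table.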

To view the status of a specific MapReduce job (example output is shown below):

$ mapred job -status job_1430333061340_0099
Job: job_1430333061340_0099

Job File: hdfs://bdcluster:8020/user/history/done/2015/05/18/000000/job_1430333061340_0099_conf.xml
Job Tracking URL : bd-jobnode:19888/jobhistory/job/job_1430333061340_0099
Uber job : false
Number of maps: 10
Number of reduces: 1
map() completion: 1.0
reduce() completion: 1.0
Job state: SUCCEEDED
retired: false
reason for failure:
Counters: 50
... Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
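
The "Job state:" line of the status output can be used to wait for a job to finish. The sketch below is one possible approach under stated assumptions: the helper names job_state and wait_for_job are hypothetical, the output format is assumed to match the example above, and the 30-second default polling interval is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the value of the "Job state:" line.
# Reads `mapred job -status` output on stdin.
job_state() {
    awk -F': *' '$1 == "Job state" { print $2; exit }'
}

# Hypothetical helper: poll a job until it reaches a final state.
# Returns 0 on SUCCEEDED, 1 on FAILED or KILLED, and 2 if no state
# could be read (for example, unknown job ID or unreachable cluster).
wait_for_job() {
    local job_id="$1" interval="${2:-30}" state
    while true; do
        state=$(mapred job -status "$job_id" 2>/dev/null | job_state)
        case "$state" in
            SUCCEEDED)     return 0 ;;
            FAILED|KILLED) return 1 ;;
            "")            return 2 ;;
        esac
        sleep "$interval"   # still PREP or RUNNING; check again later
    done
}

# Usage (requires a configured Hadoop client on the PATH):
#   wait_for_job job_1430333061340_0099 && echo "job finished successfully"
```

Returning a distinct code when no state can be read avoids polling forever on a mistyped job ID.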

To cancel a MapReduce job:

$ mapred job -kill <job id>

To retrieve the log files of a completed job (either a MapReduce or a Spark job):

$ yarn logs -applicationId <application id>

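Note that yarn logs expects an application ID rather than a job ID. In the job listing above, the AM info URL shows that job_1430333061340_0099 runs as application_1430333061340_0099, so the two differ only in their prefix. The sketch below converts one into the other; the helper name job_to_app_id is hypothetical. Spark jobs do not appear in the mapred job listing, so the comments also show the generic yarn application commands, which work on any YARN application.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the YARN application ID from a MapReduce job ID
# by swapping the "job_" prefix for "application_".
job_to_app_id() {
    local job_id="$1"
    printf 'application_%s\n' "${job_id#job_}"
}

# Usage (requires a configured Hadoop client on the PATH):
#   yarn logs -applicationId "$(job_to_app_id job_1430333061340_0099)"
#
# For Spark jobs, or any other YARN application:
#   yarn application -list
#   yarn application -kill <application id>
```
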
Workflow management with Oozie