Hadoop: measuring shuffle time from Java

Is there a way to get the shuffle time of each reduce task on the client side using the Hadoop API (Hadoop 1.2.1)? I can get the execution times of the reduce tasks from the JobClient using the getReduceTaskReports(JobID jobID) method, but I'm wondering if there is a way to get the percentage of that time that corresponds to the shuffle. Thank you in advance.


1 answer


The solution was to use Apache Rumen ( http://hadoop.apache.org/docs/r1.2.1/rumen.html ). This tool converts the job history logs into JSON, which can then be read with simple JSON parsing. With it I was able to get the information I needed.
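
Once the timestamps of a reduce attempt have been read from the Rumen JSON, the shuffle share is simple arithmetic. Below is a minimal sketch of that last step only. It assumes each reduce attempt exposes a start time, a shuffle-finished time and a finish time in milliseconds (the parameter names startTime, shuffleFinished and finishTime are my assumption about the trace fields, so check them against your own output). The class name ShuffleShare and the sample values are hypothetical, and the JSON parsing itself is left out.

```java
// Hypothetical helper: computes how much of a reduce attempt was spent in shuffle.
// All timestamps are assumed to be epoch milliseconds taken from the parsed trace.
class ShuffleShare {

    /** Shuffle duration in milliseconds for one reduce attempt. */
    static long shuffleMillis(long startTime, long shuffleFinished) {
        return shuffleFinished - startTime;
    }

    /** Percentage of the attempt's total run time that was spent in shuffle. */
    static double shufflePercent(long startTime, long shuffleFinished, long finishTime) {
        long total = finishTime - startTime;
        if (total <= 0) {
            // Missing or malformed timestamps: report 0 instead of dividing by zero.
            return 0.0;
        }
        return 100.0 * shuffleMillis(startTime, shuffleFinished) / total;
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        long start = 1_000L;
        long shuffleDone = 4_000L;
        long finish = 11_000L;

        System.out.println("shuffle ms: " + shuffleMillis(start, shuffleDone));
        System.out.println("shuffle %:  " + shufflePercent(start, shuffleDone, finish));
    }
}
```

Note that this measures shuffle per attempt. If a task had several attempts (failures or speculative execution), you would have to decide which attempt to count.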


