Sqoop import job fails due to map task timeout
I was trying to import a 1 TB table from MySQL into HDFS using Sqoop. Command used:
sqoop import --connect jdbc:mysql://xx.xx.xxx.xx/MyDB --username myuser --password mypass --table mytable --split-by rowkey -m 14
After the boundary-values query executes, all mappers start, but after a while the tasks are killed due to a timeout (1200 seconds). I think this is because the SELECT query running in each mapper takes longer than the configured timeout (in Sqoop it appears to be 1200 seconds), so the task never reports progress and is eventually killed. (I also tried it with a 100 GB dataset; it still failed with timeouts on multiple mappers.) A single-mapper import works fine, since no range-filtered split queries are required. Is there a way to override the map task timeout (e.g. set it to 0
or to a very high value) when using multiple mappers in Sqoop?
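
Concretely, what I am hoping for is something along these lines (a sketch only; I have not verified on my cluster that this property name and placement are correct):

```shell
# Assumed sketch: raise the per-task timeout to 1 hour (3600000 ms).
# Setting it to 0 should disable the timeout entirely.
# mapreduce.task.timeout is the Hadoop 2.x property name; older
# clusters use mapred.task.timeout instead.
# Generic -D options must come immediately after "import".
sqoop import \
  -D mapreduce.task.timeout=3600000 \
  --connect jdbc:mysql://xx.xx.xxx.xx/MyDB \
  --username myuser --password mypass \
  --table mytable --split-by rowkey -m 14
```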