Sqoop import job fails due to task timeout

I was trying to import a 1TB table from MySQL into HDFS using Sqoop. Command used:

sqoop import --connect jdbc:mysql://xx.xx.xxx.xx/MyDB --username myuser --password mypass --table mytable --split-by rowkey -m 14

After the boundary-values query executes, all the mappers start, but after a while the tasks get killed due to a timeout (1200 seconds). I think this is because the SELECT query running in each mapper takes longer than the configured timeout (in Sqoop it seems to be 1200 seconds); hence the task does not report status and gets killed. (I also tried this on 100 GB datasets, and it still failed with a timeout on several mappers.) An import with a single mapper works fine, since no filtered result sets are needed. Is there a way to override the map task timeout (e.g. set it to 0 or a very high value) when using multiple mappers in Sqoop?
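
One idea I am considering is passing a generic Hadoop option right after the import keyword, assuming mapred.task.timeout is the property behind this limit (I'm not sure it is the right one):

sqoop import -D mapred.task.timeout=0 --connect jdbc:mysql://xx.xx.xxx.xx/MyDB --username myuser --password mypass --table mytable --split-by rowkey -m 14

As far as I understand, setting the value to 0 disables the timeout entirely, and a large value in milliseconds would raise it. Would that be the right approach?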





1 answer


Sqoop uses a dedicated thread to send status updates so that the map task is not killed by the JobTracker. I would be interested in exploring your problem further. Could you share the Sqoop log, one of the map task logs, and the table schema?
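
If it helps to capture a detailed client-side log, re-running the import with Sqoop's --verbose flag should produce one, for example (same connection parameters as in your original command):

sqoop import --verbose --connect jdbc:mysql://xx.xx.xxx.xx/MyDB --username myuser --password mypass --table mytable --split-by rowkey -m 14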



Jarcec









