xargs does not work with the hadoop put command and multiple input files

Hadoop supports copying multiple local files to HDFS with the command

hadoop fs -put localfile1 localfile2 /user/hadoop/hdfsdir


We need to copy hundreds of thousands of files; to avoid memory problems, we want to copy them in chunks using xargs.

But the command below gives an error:

echo "localfile1 localfile2" |xargs  -t -I {} hadoop fs -put {} /user/hadoop/hdfsdir


It gives a put: unexpected URISyntaxException error.

Here localfile1 and localfile2 are files in my working directory.

The single-file command works, i.e.

echo "localfile1" |xargs  -t -I {} hadoop fs -put {} /user/hadoop/hdfsdir
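A likely cause: with `-I {}`, xargs substitutes the entire input line as a single argument, so hadoop receives "localfile1 localfile2" as one path containing a space, which fails URI parsing. One hedged workaround is to drop `-I`, let xargs split on whitespace and batch with `-n`, and use a small `sh -c` wrapper so the HDFS destination stays last. This sketch uses `echo` as a stand-in for `hadoop fs` just to show the argument handling; the batch size 2 is arbitrary:

```shell
# Batch files N at a time while keeping the destination as the last
# argument. In practice, replace the echo with:
#   hadoop fs -put "$@" /user/hadoop/hdfsdir
printf '%s\n' localfile1 localfile2 localfile3 \
  | xargs -n 2 sh -c 'echo hadoop fs -put "$@" /user/hadoop/hdfsdir' _
# Each sh invocation receives up to 2 filenames in "$@", followed by the
# fixed destination directory.
```

The trailing `_` fills `$0` of the inner shell so that all filenames land in `"$@"`.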


1 answer


It may be too late, but I stumbled upon your question while trying to do the same.

I followed this tutorial and wrote the following command to upload all the text files, 4 in parallel:

find . -name '*.textile' -print0 |xargs  -0 -P 4 -I % hadoop fs -put % /user/myName/




  • -print0: make find emit a null-separated list of filenames
  • -0: so xargs recognizes the null separator
  • -P 4: so that up to 4 puts are executed in parallel
  • -I: so that each token is substituted into hadoop fs -put TOKEN_GOES_HERE
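The null-separated pipeline can be checked without a cluster by substituting a stand-in for `hadoop fs -put`. This sketch (the `demo` directory and filenames are made up for illustration) shows that `-print0`/`-0` keeps a filename containing a space intact, which is exactly the case that whitespace splitting would break:

```shell
# Create two sample files, one with a space in its name.
mkdir -p demo
touch demo/'file one.textile' demo/two.textile
# echo stands in for 'hadoop fs -put'; /user/myName/ is the destination
# from the command above. Each filename arrives as a single token.
find demo -name '*.textile' -print0 \
  | xargs -0 -P 4 -I % echo hadoop fs -put % /user/myName/
```

With `-P 4` the order of the printed lines is not deterministic, but each line carries one complete filename.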

I don't think this approach handles folder structures, i.e. the folder structure from your local system is not preserved on the cluster. Also, if you have duplicate filenames in different folders, you will get "file already exists" errors.
