xargs does not work with the hadoop put command and multiple input files
hadoop supports copying multiple local files to HDFS using the command
hadoop fs -put localfile1 localfile2 /user/hadoop/hdfsdir
We need to copy hundreds of thousands of files, and to avoid memory problems we want to copy them in chunks using xargs. But the command below gives an error:
echo "localfile1 localfile2" |xargs -t -I {} hadoop fs -put {} /user/hadoop/hdfsdir
It gives a "put: unexpected URISyntaxException" error.
Here localfile1 and localfile2 are files in my working directory.
The single-file command works, i.e.
echo "localfile1" |xargs -t -I {} hadoop fs -put {} /user/hadoop/hdfsdir
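A likely cause: with -I {}, xargs substitutes the entire input line, space included, as a single argument, so hadoop fs -put receives one filename literally named "localfile1 localfile2" and fails to parse it as a URI. A sketch of the batching alternative with -n instead of -I (echo is used as a stand-in for hadoop fs -put so the pattern can be tried without a cluster; the batch size of 2 and the _ placeholder for $0 are illustrative):

```shell
# Each whitespace-separated token becomes its own argument; -n 2 groups
# two files per invocation. The inner sh keeps the destination directory
# as the last argument of every batch, which `put` requires.
printf '%s\n' localfile1 localfile2 localfile3 |
  xargs -n 2 sh -c 'echo "put-batch: $@ -> /user/hadoop/hdfsdir"' _
```

Against a real cluster this would become something like `xargs -n 500 sh -c 'hadoop fs -put "$@" /user/hadoop/hdfsdir' _`, with the file list fed in on stdin.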
I followed this tutorial and wrote the following command to upload the text files, 4 at a time:
find . -name '*.textile' -print0 |xargs -0 -P 4 -I % hadoop fs -put % /user/myName/
- -print0: print the file names as a null-separated list
- -0: so xargs can recognize the null separator
- -P 4: so that up to 4 puts are executed in parallel
- -I %: so that each token is substituted into hadoop fs -put TOKEN_GOES_HERE
I don't think this approach handles folder structures, meaning the folder structure from your local system is not preserved on the cluster. Also, if you have files with the same name in multiple folders, you will get a "file already exists" error.
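To preserve the local folder layout (and avoid the "file already exists" collisions), one possible sketch is to recreate each directory in HDFS first and then put every file at its mirrored path. The echo prefix makes this a dry run; the /user/myName prefix is assumed from the command above:

```shell
# Dry-run sketch (remove the `echo` to actually execute against HDFS).
# 1. Recreate every local directory under /user/myName.
find . -type d -print0 | xargs -0 -I % echo hadoop fs -mkdir -p /user/myName/%
# 2. Copy each file to its mirrored remote path, 4 uploads in parallel.
find . -name '*.textile' -print0 | xargs -0 -P 4 -I % echo hadoop fs -put % /user/myName/%
```

Because each file lands at its own mirrored path rather than a single flat target directory, identically named files in different folders no longer collide.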