How can I load many (100K+) XML documents using mlcp without hitting an "Argument list too long" error?

When I try to load 160,000 XML documents into MarkLogic 8.0-2 using mlcp on Mac OS X 10.10.4, an error is thrown:

mlcp-Hadoop2-1.3-1/bin/mlcp.sh: line 16: /usr/bin/java: Argument list too long

The command I issue is:

mlcp import -database FO -username sss4r -password ******* -host localhost -port 8003 -mode local -input_file_pattern '*\.xml' -output_uri_replace "/Users/sss4r/Documents/FOPOC,''" -input_file_path .

I realize this is probably a Unix shell problem: mlcp uses the filesystem facilities to return a list of names, and there is a system limit on the total size of the argument list that can be passed to a command.

What do MarkLogic experts recommend to fix this problem? Loading in smaller chunks? Changing the system limit?

Thanks.



2 answers


MLCP does not rely on shell expansion to be able to load files. I'm afraid shell expansion is taking place inside mlcp.sh, but only unintentionally. If you drop the -input_file_pattern parameter, you will likely see that it loads all files. A quick fix might be to put the files in a sub-dir, not use the file pattern, and just point to the sub-dir as the -input_file_path.
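One caveat with the sub-dir quick fix: moving the files with something like `mv *.xml sub-dir/` would run into the same "Argument list too long" limit, because the shell expands the glob before `mv` starts. A minimal sketch that avoids this is below; the function name `move_xml_to_subdir` and the directory name `xml-input` are made up for illustration.

```shell
# Move every .xml file directly under $1 into directory $2
# without expanding a glob on the command line.
move_xml_to_subdir() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  # The pattern is quoted, so find (not the shell) does the matching,
  # and mv runs once per file, so no single command line grows too long.
  find "$src" -maxdepth 1 -type f -name '*.xml' -exec mv {} "$dest"/ \;
}
```

After running `move_xml_to_subdir . ./xml-input`, mlcp could be pointed at the new directory with `-input_file_path ./xml-input` and no `-input_file_pattern`. Running `mv` once per file is slow for 160,000 files, but it is a one-time cost.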

Rob S. gives another solution that prevents this. Put your options in a file, each option on a separate line, and point to that file with the -options_file option on the command line. It also saves you the trouble of quotes and other special characters being unintentionally interpreted by the shell.

More details here: https://docs.marklogic.com/guide/ingestion/content-pump#id_36150
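As a rough sketch of what such an options file could look like for the import in the question: the file name `import.opts` is an arbitrary choice, the password placeholder is kept as in the question, and the pattern is written as the regex `.*\.xml`. The `-output_uri_replace` option is left out here; it would be added in the same way, with the option on one line and its value on the next.

```shell
# Write an mlcp options file: the command first, then each option
# and its value on separate lines. The quoted 'EOF' keeps the shell
# from interpreting anything inside the here-document.
cat > import.opts <<'EOF'
import
-host
localhost
-port
8003
-username
sss4r
-password
*******
-database
FO
-mode
local
-input_file_path
.
-input_file_pattern
.*\.xml
EOF
```

Assuming mlcp.sh is on your PATH, the import would then be started with `mlcp.sh -options_file import.opts`. Since the pattern never appears on the command line, neither your interactive shell nor the mlcp.sh wrapper script gets a chance to expand it.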



HTH!

PS: I filed a bug to improve MLCP (#33670).



First, you will save yourself a lot of grief if you use an options file whenever command line argument values contain characters that the shell can interpret. Otherwise, you end up fighting an uphill battle against shell quoting. Geert has already provided a link to the syntax, so I won't repeat it.

Second, -input_file_pattern requires a Java regex. *\.xml is a shell-style glob, not a regex, and is probably not what you want. You probably mean .*\.xml. For a reference on the pattern language used by mlcp, see:



https://docs.marklogic.com/guide/ingestion/content-pump#id_10243
