Best approach using Spring Batch to handle a large file
I am using Spring Batch to process a large file. The scenario is pretty simple:
1. Download the file via HTTP
2. Process it (validations, transformations)
3. Send it to a queue
- There is no need to persist data from the input files.
- Multiple instances of the same job may run at the same time.
I am looking for the best practice to solve this problem.
Should I create a Tasklet to download the file locally and then process it with the usual steps?
In that case I need to address some temp-file issues (making sure I delete it, making sure I don't overwrite another instance's temp file, etc.).
On the other hand, I could download it and keep it in memory, but I'm afraid that if I run many job instances, I will soon run out of memory.
How would you suggest approaching this? Should I be using a Tasklet at all?
Thanks.
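The temp-file concerns from the question (unique names, guaranteed cleanup) can largely be delegated to `java.nio.file.Files.createTempFile`, which generates a collision-free name on every call, so concurrent job instances never overwrite each other. A minimal stdlib-only sketch; the helper name, file prefix, and sample payload are illustrative, not from the original post:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempDownload {

    // Copies a stream (e.g. an HTTP response body) to a collision-free temp file.
    public static Path downloadToTemp(InputStream in) throws IOException {
        // createTempFile generates a unique name, so concurrent job
        // instances cannot clash on the same path.
        Path tmp = Files.createTempFile("batch-input-", ".dat");
        tmp.toFile().deleteOnExit(); // safety net; still delete explicitly when done
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "line1\nline2\n".getBytes();
        Path tmp = downloadToTemp(new ByteArrayInputStream(payload));
        System.out.println(Files.exists(tmp));                  // true
        System.out.println(Files.size(tmp) == payload.length);  // true
        Files.delete(tmp);                                      // explicit cleanup
        System.out.println(Files.exists(tmp));                  // false
    }
}
```

Deleting explicitly at the end of the job (rather than relying on `deleteOnExit`) matters for long-running JVMs, since `deleteOnExit` only fires at shutdown.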
If you have a large file, I would recommend storing it on disk unless there is a good reason not to. By saving the file to disk, you can restart the job without having to re-download the file if an error occurs.
As far as Tasklet vs Spring Integration is concerned, we generally recommend Spring Integration for this type of functionality, since the FTP support is already available there. That being said, Spring XD uses a Tasklet for FTP, so that approach can be used as well.
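If you go the Tasklet route, a download step might look roughly like the sketch below. This is a framework wiring sketch, not runnable on its own: it assumes Spring Batch on the classpath, and `sourceUrl` and the context key `"input.file.path"` are illustrative names chosen here, not part of any API:

```java
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class DownloadTasklet implements Tasklet {

    private final String sourceUrl; // where to fetch the input file (illustrative)

    public DownloadTasklet(String sourceUrl) {
        this.sourceUrl = sourceUrl;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution,
                                ChunkContext chunkContext) throws Exception {
        // createTempFile guarantees a unique name per job instance.
        Path tmp = Files.createTempFile("batch-input-", ".dat");
        try (InputStream in = new URL(sourceUrl).openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        // Publish the path in the job's ExecutionContext so later steps
        // (reader, processor) can find it, and a listener can delete it
        // when the job finishes.
        chunkContext.getStepContext().getStepExecution()
                .getJobExecution().getExecutionContext()
                .putString("input.file.path", tmp.toString());
        return RepeatStatus.FINISHED;
    }
}
```

Pairing this with a `JobExecutionListener` that deletes the file in `afterJob` covers the cleanup concern from the question.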
A good video on integrating Spring Batch and Spring Integration is the talk Gunnar Hillert and I gave at SpringOne2GX. You can find the entire video here: https://www.youtube.com/watch?v=8tiqeV07XlI . The section that covers using Spring Batch Integration to FTP a file before the Spring Batch job runs is around 29:37.
I believe the example below is a classic solution to your problem: http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#launching-batch-jobs-through-messages
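For reference, the message-driven launch described at that link boils down to wrapping a `Job` and its `JobParameters` in a `JobLaunchRequest` and sending it to a channel handled by a `JobLaunchingGateway`. A rough configuration-style sketch, assuming Spring Batch Integration on the classpath; the parameter names are illustrative:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.integration.launch.JobLaunchRequest;

public class LaunchRequestFactory {

    // Builds the message payload that Spring Batch Integration's
    // JobLaunchingGateway knows how to execute.
    public static JobLaunchRequest forUrl(Job job, String fileUrl) {
        JobParameters params = new JobParametersBuilder()
                .addString("input.url", fileUrl)               // illustrative parameter
                .addLong("run.id", System.currentTimeMillis()) // makes each launch unique
                .toJobParameters();
        return new JobLaunchRequest(job, params);
    }
}
```

Making the parameters unique per launch (here via `run.id`) is what lets multiple instances of the same job run concurrently, since Spring Batch identifies a job instance by its parameters.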