Best approach using Spring Batch to handle a large file

I am using Spring Batch to download and process a large file. The scenario is pretty simple:

1. Download the file via HTTP
2. Process it (validations, transformations)
3. Send it to a queue


  • There is no need to persist data from the input files.
  • Multiple instances of the same job may run at the same time.

I am looking for the best practice to solve this problem.

Should I create a Tasklet to download the file locally and then process it with the usual steps?
In that case I need to address some temp-file issues (make sure it gets deleted, make sure I don't overwrite another instance's temp file, etc.).
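If you do go the Tasklet route, the temp-file concerns can be handled with `java.nio.file`: `Files.createTempFile` generates a unique name so concurrent job instances never collide, and a `finally` block guarantees cleanup. A minimal sketch (the class and the stubbed "processing" step are placeholders; in a real job this logic would live in the Tasklet's `execute` method):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempFileDownload {

    // Copies the given stream to a uniquely named temp file,
    // processes it, and always cleans up afterwards.
    public static int downloadAndProcess(InputStream source) throws IOException {
        // createTempFile appends a random suffix, so concurrent job
        // instances never overwrite each other's files
        Path tempFile = Files.createTempFile("batch-input-", ".dat");
        try {
            Files.copy(source, tempFile, StandardCopyOption.REPLACE_EXISTING);
            // placeholder for the real validation/transformation steps
            return (int) Files.size(tempFile);
        } finally {
            // delete the temp file even if processing throws
            Files.deleteIfExists(tempFile);
        }
    }
}
```

In a real Tasklet the `InputStream` would come from the HTTP connection rather than being passed in.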

On the other hand, I could download it and keep it in memory, but I'm afraid that if I run many job instances at once I will soon run out of memory.

How would you suggest tackling this? Should I be using a Tasklet at all?

Thanks.



2 answers


If you have a large file, I would recommend storing it on disk unless there is a good reason not to. By saving the file to disk, you can restart the job without having to re-download it if an error occurs.
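One way to get that restartability is a sketch like the following (the class name and the idea of deriving the target path from a job parameter are my assumptions, not from the answer): skip the download when the file already exists on disk, so a restarted job execution reuses it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class RestartableDownload {

    // Copies the source to the target path only if it is not already
    // there; a restarted job execution reuses the existing file.
    // Returns true if a download actually happened.
    public static boolean downloadIfAbsent(InputStream source, Path target)
            throws IOException {
        if (Files.exists(target)) {
            return false; // restart case: file was already downloaded
        }
        Files.copy(source, target);
        return true;
    }
}
```

In a real job the target path would typically encode a job parameter (e.g. the source URL or run id) so parallel instances of the same job don't clash on one file name.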

As far as a Tasklet vs. Spring Integration is concerned, we generally recommend Spring Integration for this type of functionality, since FTP support is already available there. That being said, Spring XD uses a Tasklet for FTP, so that approach can be used as well.
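As an illustration of the Spring Integration route, an inbound channel adapter can poll a remote FTP directory and drop files locally for the batch job to pick up. This is only a sketch: the `ftpSessionFactory` bean, channel name, directories, and file pattern are placeholders you would define for your own environment.

```xml
<!-- sketch: assumes an ftpSessionFactory bean pointing at the remote host -->
<int-ftp:inbound-channel-adapter id="ftpInbound"
        channel="ftpChannel"
        session-factory="ftpSessionFactory"
        local-directory="file:/tmp/ftp-inbound"
        remote-directory="/incoming"
        auto-create-local-directory="true"
        filename-pattern="*.csv">
    <int:poller fixed-rate="5000"/>
</int-ftp:inbound-channel-adapter>
```

A message arriving on `ftpChannel` (carrying the downloaded `File`) can then be used to launch the batch job, which keeps the download concern out of the job itself.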



A good video on integrating Spring Batch and Spring Integration is Gunnar Hillert's SpringOne2GX talk. You can find the entire video here: https://www.youtube.com/watch?v=8tiqeV07XlI. The section on using Spring Batch Integration for FTP ahead of a Spring Batch job starts around 29:37.



I believe the example below is a classic solution to your problem: http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#launching-batch-jobs-through-messages









