How reliable would it be to download over 100,000 files with wget from a bash file over ssh?

I have a bash file containing wget commands to download over 100,000 files for a total of about 20GB.

The bash file looks something like this:

wget http://something.com/path/to/file.data

wget http://something.com/path/to/file2.data

wget http://something.com/path/to/file3.data

wget http://something.com/path/to/file4.data

And there are exactly 114,770 lines like that. How reliable would it be to ssh into the server I have an account on and run it? Will my ssh session eventually time out? Do I have to stay connected the whole time? What if my local computer crashes or goes offline?

Also, does anyone know how many resources this will require? Am I crazy to do this on a shared server?

I know this is a weird question, just wondering if anyone has any ideas. Thanks!

0




7 replies


Use

nohup ./scriptname &> logname.log &

This will ensure:

  • The process will continue even if the ssh session is interrupted
  • You can monitor it while it is running (via the log file)


I would also recommend printing progress markers at regular intervals; they will be useful when analyzing the log afterwards, e.g. echo "1000 files copied".
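For instance, a rough sketch of such a counter, assuming the URLs have been pulled out of the script into a file called urls.txt (a made-up name, one URL per line):

# Loop over the URL list and print a marker every 1000 downloads.
count=0
while read -r url; do
    wget "$url"
    count=$((count + 1))
    if [ $((count % 1000)) -eq 0 ]; then
        echo "$count files copied"
    fi
done < urls.txt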


As far as resource usage goes, it depends entirely on the system, and mainly on the characteristics of the network. In theory you can estimate the time from just the data size and the bandwidth, but in real life there will be latency, retries and data loss.

So make some assumptions, do some math, and you should have an answer :)
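As a hedged back-of-envelope example, assuming a sustained 10 Mbit/s to that server (substitute your own figure):

# 20 GB at an assumed 10 Mbit/s: 20 * 1024 MB * 8 bits per byte / 10 Mbit/s
echo $(( 20 * 1024 * 8 / 10 )) seconds   # about 16384 s, roughly 4.5 hours

On top of that, 114,770 separate HTTP requests add per-connection overhead, so the real figure will be noticeably higher.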

+4




It depends on the reliability of the communication medium, the hardware, ...!



You can use screen to keep the script running after you disconnect from the remote computer.
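A minimal screen workflow, for example (the session name downloads is just an illustration):

screen -S downloads    # start a named screen session on the server
./scriptname           # run the download script inside it
# detach with Ctrl-A then d; the script keeps running on the server
screen -r downloads    # reattach later to check on it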

+1




Start it with

nohup ./scriptname &

      

and everything should be all right. I would also recommend that you log the progress so you know where it left off if it stops.

wget url >> logfile.log 2>&1

      

may be enough (wget writes its progress messages to stderr, hence the 2>&1).

To track your progress in real time, you can:

tail -f logfile.log

      

0




You want to detach the script from your shell and run it in the background (using nohup) so that it keeps running after you log out.

You also want some sort of progress indicator, such as a log file, that records every file downloaded as well as all error messages. nohup sends stderr and stdout to a file. With such a file you can pick out the broken and aborted downloads and retry them later.

First try a test run with a small set of files to make sure you have the command right and to see what the output looks like.
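As a rough sketch of that retry step, assuming the URLs live in a file called urls.txt and the downloads land in the current directory (both assumptions, not part of the original setup):

# Keep only the URLs whose files never arrived, then re-run those.
while read -r url; do
    [ -f "$(basename "$url")" ] || echo "$url"
done < urls.txt > retry.txt
wget -i retry.txt

Note this only checks that each file exists, not that it is complete.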

0




I suggest you detach it from the shell with nohup.

$ nohup myLongRunningScript.sh > script.stdout 2>script.stderr &
$ exit

      

The script will run until it finishes - you don't need to stay logged in.

Do check what options you can give wget to make it retry failed downloads.
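For example (a sketch only; urls.txt and wget.log are made-up names, and the URL list would have to be extracted from your script first):

# --tries retries each URL a few times; --continue resumes partial files and
# skips ones already complete, so the same list can safely be re-run after a crash.
wget --tries=3 --timeout=30 --continue --input-file=urls.txt --output-file=wget.log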

0




If possible, create MD5 checksums for all files and use them to check if they were transferred correctly.
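A sketch of what that check could look like, assuming a checksum list named checksums.md5 with the usual "<md5>  <filename>" lines (the file name is made up):

md5sum -c checksums.md5 > md5-report.txt
grep -v ': OK$' md5-report.txt   # show only the files that failed verification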

0




It might be worth looking at an alternative tool like rsync. I've used it on many projects and it works really well.
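Bear in mind rsync needs shell or rsync-daemon access to the machine that holds the files, which plain HTTP does not give you. If you do have that access, a sketch might look like this (user, host and paths are placeholders):

# --partial keeps interrupted transfers so the next run can resume them.
rsync -av --partial user@something.com:/path/to/ ./data/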

0








