How reliable would it be to download over 100,000 files with wget from a bash script over ssh?
I have a bash file containing wget commands to download over 100,000 files for a total of about 20GB.
The bash file looks something like this:
wget http://something.com/path/to/file.data
wget http://something.com/path/to/file2.data
wget http://something.com/path/to/file3.data
wget http://something.com/path/to/file4.data
And there are exactly 114,770 lines like that. How reliable would it be to ssh into a server I have an account on and run the script there? Would my ssh session eventually time out? Would I have to stay connected the whole time? What if my local computer crashes or goes offline?
Also, does anyone know roughly how many resources this would take? Am I crazy to do this on a shared server?
I know this is a weird question; I'm just wondering if anyone has any ideas. Thanks!
Use
nohup ./scriptname &> logname.log &
This will ensure that:
- The process continues even if the ssh session is interrupted
- You can monitor it while it is running
I would also recommend printing a progress marker at regular intervals; it will be useful when analyzing the log, e.g. echo "1000 files copied".
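A minimal sketch of that idea, assuming the 114,770 URLs have been collected into a file called urls.txt, one per line (the file name is an assumption, not something from the original script):

#!/bin/bash
# a sketch: download each URL and report progress every 1,000 files
count=0
while read -r url; do
    wget "$url"
    count=$((count + 1))
    if (( count % 1000 == 0 )); then
        echo "$count files copied"
    fi
done < urls.txt

Run it with the nohup command above and the progress markers end up in logname.log.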
As far as resource usage goes, it depends on the system and mainly on the characteristics of the network. In theory you can estimate the time from just the data size and the bandwidth, but in real life latency, retries, and data loss come into play.
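As a rough illustration (the bandwidth figure here is an assumption, not something you stated): on an effective 10 Mbit/s connection, 20 GB is about 160,000 Mbit, so the raw transfer alone takes around 16,000 seconds, or roughly 4.5 hours; and 114,770 separate requests at even 100 ms of per-file overhead would add another ~3 hours on top of that.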
So take some measurements, do some math, and you should have an answer :)
You want to detach the script from your shell and run it in the background (using nohup) so that it keeps running after you log out.
You also want some sort of progress indicator, such as a log file that records every file downloaded as well as any error messages. nohup sends stderr and stdout to a file. With such a file, you can pick up broken downloads and aborted runs later.
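One way to get the logging and the restartability in a single command is to let wget read the URL list itself, instead of invoking it 114,770 times. A sketch, assuming the URLs have been extracted into urls.txt (both file names here are assumptions):

# -i reads the URLs from a file; -nc (no-clobber) skips files that already
# exist locally, so rerunning the same command picks up where it left off;
# -o writes wget's progress and error messages to download.log
nohup wget -nc -i urls.txt -o download.log &

Afterwards you can search the log for "ERROR" lines to find the downloads that failed.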
First, run a test with a small set of files to check that you have the command right and that the output is what you want.
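For instance, something like this (a sketch, reusing the assumed urls.txt from above):

# test with the first 100 URLs and inspect the log before committing to all 114,770
head -n 100 urls.txt > test-urls.txt
wget -nc -i test-urls.txt -o test.log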