How to copy and extract .gz files using Python
I am just starting to learn Python and have a question.
How do I write a script that does the following? (I'll show how I do each step in bash.)
- Copy <file>.gz from remote server1 to local storage.
  cp /dumps/server1/file1.gz /local/
- Then extract this file locally.
  gunzip /local/file1.gz
- Then copy the extracted file to remote server2 (for archiving and deduplication purposes).
  cp /local/file1.dump /dedupmount
- Delete the local copy of the .gz file to free space on the "temporary" storage.
  rm -rf /local/file1.gz
I need to run all of this in a loop over all files. All files and directories are mounted via NFS on one server.
A for loop goes through the folder /dump/ and looks for .gz files. Each .gz file is first copied to the directory /local and then extracted. Once extracted, the unpacked .dmp file is copied to the archive folder /dedupmount.
I'm banging my head against the wall trying to work out how to write it.
Python solution
Although the shell code may be shorter, the entire process can be performed natively in Python. The key points of the Python solution are:
- With the gzip module, gzipped files are as easy to read as regular files.
- The glob module is used to list the source files. It is modeled after the shell's glob function.
- The os.path module is used to manipulate the paths. It provides an OS-independent interface to the file system.
Here's some sample code:
import gzip
import glob
import os.path

source_dir = "/dumps/server1"
dest_dir = "/dedupmount"

for src_name in glob.glob(os.path.join(source_dir, '*.gz')):
    base = os.path.basename(src_name)
    # Strip the trailing ".gz" to get the name of the extracted file.
    dest_name = os.path.join(dest_dir, base[:-3])
    # Decompress while reading and write the result straight to the destination.
    with gzip.open(src_name, 'rb') as infile:
        with open(dest_name, 'wb') as outfile:
            for line in infile:
                outfile.write(line)
This code reads from server1 and writes directly to server2. No local copy is needed unless you want one.
In this code, all decompression is done by the CPU on the local computer.
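One caveat with the line-by-line loop: dump files are often binary, and iterating over a file opened in binary mode splits it on newline bytes, so data with few newlines can produce very large "lines" in memory. A minimal sketch of a chunked alternative using shutil.copyfileobj follows. The function name gunzip_to and the 1 MiB chunk size are illustrative choices, not part of the original answer.

```python
import gzip
import shutil


def gunzip_to(src_name, dest_name, chunk_size=1024 * 1024):
    """Decompress src_name (a .gz file) into dest_name in fixed-size chunks.

    gunzip_to and chunk_size are hypothetical names chosen for this sketch.
    """
    with gzip.open(src_name, 'rb') as infile, open(dest_name, 'wb') as outfile:
        # copyfileobj reads and writes up to chunk_size bytes at a time, so
        # memory use stays bounded wherever newlines fall in the data.
        shutil.copyfileobj(infile, outfile, chunk_size)
```

Inside the loop, the two nested with statements could then be replaced by a single call such as gunzip_to(src_name, dest_name).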
Shell code
For comparison, here's the equivalent shell code:
for src in /dumps/server1/*.gz
do
    base=${src##*/}
    dest="/dedupmount/${base%.gz}"
    zcat "$src" >"$dest"
done
Three-step Python code
This slightly more sophisticated approach implements the OP's three-step algorithm that uses a temporary file on the local machine:
import gzip
import glob
import os.path
import shutil

source_dir = "./dumps/server1"
dest_dir = "./dedupmount"
tmpfile = "/tmp/delete.me"

for src_name in glob.glob(os.path.join(source_dir, '*.gz')):
    base = os.path.basename(src_name)
    # Strip the trailing ".gz" to get the name of the extracted file.
    dest_name = os.path.join(dest_dir, base[:-3])
    # Copy the compressed file to local temporary storage first.
    shutil.copyfile(src_name, tmpfile)
    # Decompress the local copy into the destination directory.
    with gzip.open(tmpfile, 'rb') as infile:
        with open(dest_name, 'wb') as outfile:
            for line in infile:
                outfile.write(line)
This copies the source file to a temporary file on the local computer, tmpfile, and then decompresses it from there to the destination file. tmpfile is overwritten each time through the loop, so only one compressed file is held locally at a time.
Temporary files can be a security issue. To avoid this, place the temporary file in a directory that is only writable by the user who runs this script.
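One way to follow this advice is to let the tempfile module create the temporary file: mkstemp picks an unpredictable name and creates the file so that only the current user can read and write it. The sketch below also deletes the local copy after each file, matching the last step in the question. The function name archive_all and its parameters are hypothetical names for this sketch, and it decompresses straight into the destination directory instead of extracting locally first.

```python
import glob
import gzip
import os
import shutil
import tempfile


def archive_all(source_dir, dest_dir, tmp_dir=None):
    """Copy each .gz locally, decompress it into dest_dir, delete the local copy.

    archive_all is a hypothetical helper for this sketch. tmp_dir=None lets
    tempfile pick the system default temporary directory.
    Returns the list of destination paths written.
    """
    written = []
    for src_name in sorted(glob.glob(os.path.join(source_dir, '*.gz'))):
        base = os.path.basename(src_name)
        dest_name = os.path.join(dest_dir, base[:-3])  # strip ".gz"
        # mkstemp creates the file with an unpredictable name, accessible
        # only by the current user.
        fd, tmp_name = tempfile.mkstemp(suffix='.gz', dir=tmp_dir)
        os.close(fd)
        try:
            # Local copy of the compressed file.
            shutil.copyfile(src_name, tmp_name)
            # Decompress the local copy straight into the archive directory.
            with gzip.open(tmp_name, 'rb') as infile, \
                    open(dest_name, 'wb') as outfile:
                shutil.copyfileobj(infile, outfile)
        finally:
            # Free the local space even if the copy or extraction failed.
            os.remove(tmp_name)
        written.append(dest_name)
    return written
```

With the paths from the question, a call might look like archive_all("/dumps/server1", "/dedupmount", "/local").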