How to copy and extract .gz files using Python

I am just starting to learn Python and have a question.

How do I create a script to do the following? (I'll show how I do it in bash.)

  • Copy <file>.gz from remote server1 to local storage.

    cp /dumps/server1/file1.gz /local/

  • Then extract this file locally.

    gunzip /local/file1.gz

  • Then copy the extracted file to remote server2 (for archiving and deduplication purposes).

    cp /local/file1.dump /dedupmount

  • Delete the local copy of the .gz file to free space on "temporary" storage.

    rm -rf /local/file1.gz

I need to run all of this in a loop for all files. All files and directories are mounted via NFS from one server.

A for loop goes through the /dumps/ folder and looks for .gz files. Each .gz file will first be copied to the /local directory and then extracted. Once extracted, the unpacked .dmp file will be copied to the /dedupmount archive folder.

I'm just banging my head against the wall trying to work out how to write it.





2 answers


Python solution

Although the shell code may be shorter, the entire process can be done natively in Python. The key points of the Python solution are:

  • With the gzip module, gzipped files can be read like regular files.

  • The glob module is used to list the source files. It is modeled after the shell's glob function.

  • The os.path module is used to manipulate paths. It provides an OS-independent interface to the file system.
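As a quick illustration of how these modules fit together, here is how a source path maps to a destination path on a POSIX system (the paths are just the examples from the question):

```python
import os.path

src_name = "/dumps/server1/file1.gz"
base = os.path.basename(src_name)                    # "file1.gz"
dest_name = os.path.join("/dedupmount", base[:-3])   # drop the ".gz" suffix
print(dest_name)                                     # → /dedupmount/file1
```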

Here's some sample code:

import gzip
import glob
import os.path

source_dir = "/dumps/server1"
dest_dir = "/dedupmount"

for src_name in glob.glob(os.path.join(source_dir, '*.gz')):
    base = os.path.basename(src_name)
    dest_name = os.path.join(dest_dir, base[:-3])   # strip the '.gz' suffix
    with gzip.open(src_name, 'rb') as infile, open(dest_name, 'wb') as outfile:
        for line in infile:
            outfile.write(line)


This code reads from remote server1 and writes directly to remote server2; no local copy is made unless you want one.

In this code, all decompression is done by the CPU on the local computer.
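Since dump files are binary and may contain few newlines, iterating line by line is not required; as a sketch, shutil.copyfileobj copies in fixed-size chunks instead, which keeps memory use bounded (the helper name here is made up):

```python
import gzip
import shutil

def gunzip_to(src_name, dest_name):
    """Decompress src_name (a .gz file) into dest_name, copying in chunks."""
    with gzip.open(src_name, 'rb') as infile, open(dest_name, 'wb') as outfile:
        shutil.copyfileobj(infile, outfile)  # copies in fixed-size buffers
```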

Shell code

For comparison, here's the equivalent shell code:

for src in /dumps/server1/*.gz
do
    base=${src##*/}
    dest="/dedupmount/${base%.gz}"
    zcat "$src" >"$dest"
done


Three-step Python code

This slightly more sophisticated approach implements the OP's three-step algorithm, using a temporary file on the local machine:

import gzip
import glob
import os.path
import shutil

source_dir = "/dumps/server1"
dest_dir = "/dedupmount"
tmpfile = "/tmp/delete.me"

for src_name in glob.glob(os.path.join(source_dir, '*.gz')):
    base = os.path.basename(src_name)
    dest_name = os.path.join(dest_dir, base[:-3])   # strip the '.gz' suffix
    shutil.copyfile(src_name, tmpfile)              # step 1: copy to local storage
    with gzip.open(tmpfile, 'rb') as infile, open(dest_name, 'wb') as outfile:
        for line in infile:                         # steps 2-3: extract to archive
            outfile.write(line)


This copies the source file to a temporary file, tmpfile, on the local computer, and then decompresses it from there into the target file. tmpfile is overwritten each time this script runs.

Temporary files can be a security issue. To avoid this, place the temporary file in a directory that is writable only by the user who runs this script.
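Another option, as a hedged sketch for POSIX systems, is to let the standard tempfile module create a private temporary file (created with mode 0600 and deleted automatically) instead of using a fixed path; the helper name here is made up:

```python
import gzip
import shutil
import tempfile

def extract_via_tempfile(src_name, dest_name):
    """Copy src_name to a private temp file, then decompress it to dest_name."""
    with tempfile.NamedTemporaryFile() as tmp:   # private file, removed on close
        shutil.copyfile(src_name, tmp.name)
        with gzip.open(tmp.name, 'rb') as infile, open(dest_name, 'wb') as outfile:
            shutil.copyfileobj(infile, outfile)
```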





You can use urllib's urlretrieve function (in Python 3 it lives in urllib.request):

from urllib.request import urlretrieve  # Python 2: urllib.urlretrieve

# urlretrieve will save the file to the local drive
urlretrieve(url, file_name_to_save)




Now you can extract it with the gunzip utility, e.g. via os.system.
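Putting the two steps together, a minimal sketch (the function name is made up, the URL and file name are placeholders, and subprocess.run is used instead of os.system for safer argument handling):

```python
import subprocess
from urllib.request import urlretrieve

def fetch_and_gunzip(url, local_gz):
    """Download a .gz file to local_gz, then extract it with the gunzip utility."""
    urlretrieve(url, local_gz)
    # gunzip replaces local_gz with the decompressed file (minus the '.gz')
    subprocess.run(["gunzip", "-f", local_gz], check=True)
```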









