Is there any advantage to using gsutil or the Google Cloud Storage API in production?

Which is better to use when transferring files: gsutil, or the Google Cloud Storage API?



2 answers


gsutil uses the Google Cloud Storage API to transfer data, specifically the JSON API (you can change the default). Its main advantage over calling the API directly is that it is tuned for fast data transfer. For example, it can open multiple concurrent connections to GCS, each uploading or downloading a portion of a file at the same time, which in many cases can significantly increase overall throughput.
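This tuning is configurable. As one illustration, gsutil's parallel transfer settings live in its .boto configuration file; the values below are examples only, not recommendations:

```ini
[GSUtil]
# Files larger than this are uploaded as parallel composite objects.
parallel_composite_upload_threshold = 150M
# Process/thread counts used by parallel operations such as -m.
parallel_process_count = 4
parallel_thread_count = 8
```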



There is no reason to believe that programming against the API directly cannot achieve the same or even better performance, but I would expect gsutil to be at least slightly faster on average if you implement things in the simplest possible way.
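For completeness, the kind of concurrency gsutil provides can be approximated in your own code with a thread pool over per-object API calls. This is only a sketch: download_blob is a stand-in for a real JSON API request, and the names and worker count are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def download_blob(name):
    # Stand-in for a real per-object download via the GCS JSON API.
    # A real implementation would fetch the object and write it to disk.
    return f"downloaded {name}"

def download_all(names, workers=8):
    # Download many objects concurrently, similar to what gsutil -m does.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(download_blob, names))

results = download_all(["a.csv", "b.csv", "c.csv"])
```

pool.map preserves input order, so results line up with the requested object names even though the downloads run concurrently.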



I'm not sure this adds much to what Brandon said. I'm fairly new to gcloud and the Python client library, but I quickly found myself preferring the gsutil command line over the library. I spin up compute instances that copy several GB of input from Cloud Storage once they have booted. I found it both simpler and faster to use the gsutil command line where possible, so in my Python code I use:

import subprocess

# Copy the whole bucket prefix to local disk; -m runs the copies in
# parallel. shell=True keeps the command on one line, and gsutil itself
# expands the gs:// wildcard.
subprocess.call("gsutil -m cp gs://my-uberdata-archive/* /home/<username>/rawdata/", shell=True)



The main reasons are that I can execute the command on one line, whereas it takes multiple lines with the client library, and, as Brandon points out, gsutil supports multi-threaded transfers with the '-m' flag. I have not yet found an equivalent way to do this using the Python client library.
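One caveat with subprocess.call is that it ignores the exit status, so a failed copy passes silently. A sketch of a stricter variant (the helper name and paths are illustrative, not from the original answer) uses subprocess.run with check=True and list arguments:

```python
import subprocess

def run_cmd(args):
    # Run a command and raise CalledProcessError on a nonzero exit status.
    return subprocess.run(args, check=True)

# With list arguments shell=True is unnecessary; gsutil expands the
# gs:// wildcard itself. Bucket and destination are placeholders:
# run_cmd(["gsutil", "-m", "cp", "gs://my-uberdata-archive/*", "/home/<username>/rawdata/"])
```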







