How can I loop through all filenames in a Google Cloud Storage subdirectory using Python?
Let's say I have a bucket / subdirectory in Google Cloud Storage and this bucket address is:
gs://test-monkeys-example/training_data/cats
In this cat subdirectory, I have a bunch of cat images, all of which are jpg. How could I, in python, loop through the cats subdirectory and print all the filenames in it?
Something like:
for x in directory('gs://test-monkeys-example/training_data/cats'):
    print(x)
Obviously directory('gs://test-monkeys-example/training_data/cats') is not a real function; that's just pseudocode. How would I actually do this?
Use the Datalab storage module:
import google.datalab.storage as storage
cats = [o.key for o in storage.Bucket('test-monkeys-example').objects()
        if o.key.startswith('training_data/cats')]
This gives you a list of the cat object keys.
Alternatively, you can use the Objects class:
cats = [o.key for o in storage.Objects('test-monkeys-example', '', '')
        if o.key.startswith('training_data/cats')]
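Since the question is specifically about jpg images, the same client-side filtering can also drop non-image objects. A minimal sketch of that key-filtering logic, run here against a hypothetical list of keys rather than a live bucket:

```python
# Hypothetical object keys, standing in for what
# storage.Bucket('test-monkeys-example').objects() would yield
keys = [
    'training_data/cats/cat001.jpg',
    'training_data/cats/cat002.jpg',
    'training_data/dogs/dog001.jpg',
    'training_data/cats/notes.txt',
]

# Keep only jpg images under the cats "subdirectory"
cat_jpgs = [k for k in keys
            if k.startswith('training_data/cats/') and k.endswith('.jpg')]

for k in cat_jpgs:
    print(k)
```

The same filter works unchanged on the `o.key` values from either listing approach above.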
If you don't need the list in a variable, the %gcs magic is simpler:
%gcs list -o gs://test-monkeys-example/training_data/cats/*
This outputs an HTML table of the keys. Note that it takes the full GCS path, starting with gs://.
Google Cloud Storage only supports listing objects that start with a given prefix. You can do that from the client library like this:
from google.cloud import storage
client = storage.Client()
bucket = client.bucket('mybucket')
for blob in bucket.list_blobs(prefix='training_data/cats'):
    print(blob.name)
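Note that blob.name is the full object name, prefix included, so stripping the prefix gives you just the bare filenames. A sketch of that last step, using a hypothetical list of blob names in place of a live client:

```python
# Hypothetical blob names, as bucket.list_blobs(prefix=...) would return them
blob_names = [
    'training_data/cats/cat001.jpg',
    'training_data/cats/cat002.jpg',
]

prefix = 'training_data/cats/'

# Strip the prefix to keep only the filename portion
filenames = [name[len(prefix):] for name in blob_names if name.startswith(prefix)]

for f in filenames:
    print(f)
```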