How can I loop through all filenames in a Google Cloud Storage subdirectory using Python?

Let's say I have a bucket / subdirectory in Google Cloud Storage and this bucket address is:

gs://test-monkeys-example/training_data/cats


In this cats subdirectory, I have a bunch of cat images, all of which are jpg. How could I, in Python, loop through the cats subdirectory and print all the filenames in it?

Something like:

for x in directory('gs://test-monkeys-example/training_data/cats'):
    print(x)


Obviously directory('gs://test-monkeys-example/training_data/cats') is not real code, just pseudocode; how would I actually do that?


2 answers


Use the google.datalab.storage module:

import google.datalab.storage as storage
cats = [o.key for o in storage.Bucket('test-monkeys-example').objects()
  if o.key.startswith('training_data/cats')]


This gives you a list of the keys of the cat objects.
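If you want just the bare filenames rather than the full object keys, you can trim the prefix with ordinary string handling. A sketch with hypothetical keys (posixpath works because GCS keys are '/'-separated):

```python
import posixpath

# Hypothetical keys, shaped like what the listing above returns.
keys = [
    'training_data/cats/tabby.jpg',
    'training_data/cats/siamese.jpg',
]

# posixpath.basename strips everything up to the last '/'.
filenames = [posixpath.basename(k) for k in keys]
print(filenames)  # ['tabby.jpg', 'siamese.jpg']
```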

Alternatively, you can use the Objects class:



cats = [o.key for o in storage.Objects('test-monkeys-example', '', '')
  if o.key.startswith('training_data/cats')]


If you don't need the list in a variable, the %gcs magic is easier:

%gcs list -o gs://test-monkeys-example/training_data/cats/*


This outputs the list of keys as an HTML table. Note that the magic takes the full GCS path, starting with gs://.



Google Cloud Storage only supports listing objects that start with a given prefix. You can do that from the client library like this:



from google.cloud import storage

client = storage.Client()
bucket = client.bucket('mybucket')
for blob in bucket.list_blobs(prefix='training_data/cats'):
  print(blob.name)
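
Since blob.name is the full object path, and the question asks about jpg images specifically, you can filter the names client-side. A sketch with hypothetical object names standing in for what list_blobs would yield:

```python
# Hypothetical object names, as list_blobs(prefix=...) would yield them.
names = [
    'training_data/cats/tabby.jpg',
    'training_data/cats/siamese.jpg',
    'training_data/cats/notes.txt',
]

# Keep only the jpg images, as in the original question.
jpgs = [n for n in names if n.endswith('.jpg')]
print(jpgs)  # ['training_data/cats/tabby.jpg', 'training_data/cats/siamese.jpg']
```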

