Load external data into Colab from PC, Google Drive, Google Sheets, and Google Cloud Storage

Ying-Ting 2018/04/18


There are four ways to load files into the Colab platform:

1 Upload from your PC (load data from local file system)

2 Load from Google Drive

3 Load from Google Sheets

4 Load from Google Cloud Storage


a. Load data from local file system

Uploading files from the computer local file system

files.upload returns a dictionary of the files that were uploaded. The dictionary is keyed by file name, and each value is the data that was uploaded.

from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))

Result: files.upload() uploads the selected files into the Colab runtime environment.
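
Because each value in the returned dictionary is the raw bytes of a file, the content can be parsed directly in memory. The following is a minimal sketch, assuming a UTF-8 encoded CSV was uploaded. The `uploaded` dictionary here is a hypothetical stand-in for the return value of files.upload(), and `parse_csv_bytes` is an illustrative helper, not part of google.colab.

```python
import csv
import io


def parse_csv_bytes(data, encoding='utf-8'):
    """Decode uploaded bytes and parse them as CSV rows (a list of dicts)."""
    text = data.decode(encoding)
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical stand-in for `uploaded = files.upload()`:
# file name -> file content as bytes.
uploaded = {'scores.csv': b'name,score\nalice,90\nbob,85\n'}

for fn, data in uploaded.items():
    rows = parse_csv_bytes(data)
    print('{} has {} rows; first row: {}'.format(fn, len(rows), rows[0]))
```

All values come back as strings, so numeric columns still need to be converted (for example with int or float) before use.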

Downloading files to the computer local file system

files.download triggers a browser download of the file to the user's local computer.

from google.colab import files

with open('example.txt', 'w') as f:
    f.write('some content')

files.download('example.txt')

Result: the code opens "example.txt", writes the line "some content" to it, and then downloads the file to your own computer.
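
The same pattern can be used for results produced in the notebook. The following is a minimal sketch, assuming the data is a small list of rows. The file name `results.csv` and the rows are hypothetical, and the import is guarded only so that the snippet also runs outside Colab.

```python
import csv

# Hypothetical results to save; the first tuple is the header row.
rows = [('epoch', 'loss'), (1, 0.52), (2, 0.31)]

# newline='' lets the csv module control line endings itself.
with open('results.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)

try:
    from google.colab import files  # only available inside Colab
    files.download('results.csv')
except ImportError:
    print('Not running in Colab; results.csv was written locally.')
```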

b. Load data from Google Drive

Access files in Google Drive using the native REST API or a wrapper like PyDrive.

PyDrive

The example below shows how to #1 authenticate, #2 upload a file, and #3 download a file from Google Drive. More examples are available in the PyDrive documentation.

!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# PyDrive reference:
# https://googledrive.github.io/PyDrive/docs/build/html/index.html

# 2. Create & upload a text file.
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))

# 3. Load a file by ID and print its contents.
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))

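Result: during execution you choose the Google account to use, follow the prompts to obtain the verification code for that account, and paste it back into the field below the code.

Result: a new file "Sample upload.txt" is created in Google Drive.

Result: the "Sample upload.txt" file as it appears in Google Drive.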

Drive REST API

Authentication is the first step

from google.colab import auth
auth.authenticate_user()

Then construct a Drive API client.

from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

Once the client is created, you can use any of the functions in the Google Drive API reference.

Create a new Google Drive file with data from Python

# Create a local file.
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

# print out the file content
print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt


After executing the code below, a new file named "Sample file" will appear in the drive.google.com file list.

from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

# Upload the file to Drive. See:
#
# https://developers.google.com/drive/v3/reference/files/create
# https://developers.google.com/drive/v3/web/manage-uploads
from googleapiclient.http import MediaFileUpload

file_metadata = {
  'name': 'Sample file',
  'mimeType': 'text/plain'
}
media = MediaFileUpload('/tmp/to_upload.txt', 
                        mimetype='text/plain',
                        resumable=True)
created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()
print('File ID: {}'.format(created.get('id')))

The file ID in your execution result will differ from the one shown in the example.
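
The metadata in the upload code hard-codes the MIME type. When uploading files of different kinds, the type could instead be guessed from the file name with the standard-library mimetypes module. The following is a minimal sketch: `build_file_metadata` is a hypothetical helper, and using the local base name as the Drive file name is an assumption. The returned dictionary has the same shape as `file_metadata`.

```python
import mimetypes
import os


def build_file_metadata(path, default_type='application/octet-stream'):
    """Build a metadata dict ('name', 'mimeType') for a local file path."""
    guessed, _ = mimetypes.guess_type(path)
    return {
        'name': os.path.basename(path),       # assumed: reuse the local name
        'mimeType': guessed or default_type,  # fall back for unknown types
    }


print(build_file_metadata('/tmp/to_upload.txt'))
```

The same guessed type would then also be passed as the mimetype argument of MediaFileUpload.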


Downloading data from a Google Drive file into Python

Before executing the code below, replace the value of file_id with the ID of the target file in Google Drive.

# Download the file we just uploaded.
#
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz
file_id = 'target_file_id'

import io
from googleapiclient.http import MediaIoBaseDownload

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, done = downloader.next_chunk()

downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))
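
If only a sharing link is at hand rather than a bare ID, the ID can usually be pulled out of the link. The following is a minimal sketch, assuming the link has one of two common shapes (`.../file/d/<id>/view` or `...open?id=<id>`); other link formats are not handled. `extract_file_id` is a hypothetical helper, not part of any Google library.

```python
import re

# Patterns for two common Drive link shapes (assumed, not exhaustive):
#   https://drive.google.com/file/d/<id>/view
#   https://drive.google.com/open?id=<id>
_ID_PATTERNS = [
    re.compile(r'/d/([A-Za-z0-9_-]+)'),
    re.compile(r'[?&]id=([A-Za-z0-9_-]+)'),
]


def extract_file_id(link_or_id):
    """Return the file ID found in a Drive link, or the input unchanged."""
    for pattern in _ID_PATTERNS:
        match = pattern.search(link_or_id)
        if match:
            return match.group(1)
    return link_or_id


file_id = extract_file_id(
    'https://drive.google.com/file/d/1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz/view')
print(file_id)
```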


c. Google Sheets

d. Google Cloud Storage (GCS)


Mount Google Drive to Colab

google-drive-ocamlfuse[1] is a FUSE filesystem backed by Google Drive, written in OCaml. It lets you mount your Google Drive on Linux.

Features (see what's new in version 0.6.x)

  • Full read/write access to ordinary files and folders
  • Read-only access to Google Docs, Sheets, and Slides (exported to configurable formats)
  • Multiple account support
  • Duplicate file handling
  • Access to trash (.Trash directory)
  • Unix permission and ownership
  • Symbolic links
  • Read-ahead buffers when streaming
  • Accessing content shared with you (requires configuration)
  • Team Drive Support
  1. Install Fuse Wrapper
    # Install a Drive FUSE wrapper.
    # https://github.com/astrada/google-drive-ocamlfuse
    !apt-get update -qq 2>&1 > /dev/null
    !apt-get install -y -qq software-properties-common python-software-properties module-init-tools
    !add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
    !apt-get update -qq 2>&1 > /dev/null
    !apt-get -y install -qq google-drive-ocamlfuse fuse
    
  2. Generate auth token
    # Generate auth tokens for Colab
    from google.colab import auth
    auth.authenticate_user()
    
  3. Generate Creds for Drive FUSE library

    # Generate creds for the Drive FUSE library.
    from oauth2client.client import GoogleCredentials
    creds = GoogleCredentials.get_application_default()
    import getpass
    
    # Work around misordering of STREAM and STDIN in Jupyter.
    # https://github.com/jupyter/notebook/issues/3159
    prompt = !google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
    vcode = getpass.getpass(prompt[0] + '\n\nEnter verification code: ')
    !echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
    

    Command Line Output:

    Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
    
    Enter verification code: ··········
    Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
    Please enter the verification code: Access token retrieved correctly.
    
  4. Mount Drive to Colab

    # Create a directory and mount Google Drive using that directory.
    !mkdir -p drive
    !google-drive-ocamlfuse drive
    
    print('Files in Drive:')
    !ls drive/
    
    # Create a file in Drive.
    !echo "This newly created file will appear in your Drive file list." > drive/created.txt
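
Once mounted, Drive behaves like a normal directory, so ordinary Python file I/O works on it. The following is a minimal sketch, assuming the mount point is the `drive` directory created in step 4. `list_files` and `read_text_file` are hypothetical helpers; outside Colab, where no `drive` directory exists, the listing is simply empty.

```python
from pathlib import Path


def list_files(mount_point):
    """Return the sorted names of regular files directly under mount_point."""
    root = Path(mount_point)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file())


def read_text_file(mount_point, name, encoding='utf-8'):
    """Read one text file from the mounted directory."""
    return (Path(mount_point) / name).read_text(encoding=encoding)


# 'drive' is the mount point assumed from the step above.
print(list_files('drive'))
```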
    

Reference

[0] https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb&scrollTo=KHeruhacFpSU

[1] https://github.com/astrada/google-drive-ocamlfuse
