Load external data into Colab from PC, Google Drive, Google Sheets, and Google Cloud Storage
Ying-Ting 2018/04/18
There are four ways to load files into the Colab platform:
1 Upload from a PC (load data from the local file system)
2 Load from Google Drive
3 Load from Google Sheets
4 Load from Google Cloud Storage (GCS)
a. Load data from local file system
Uploading files from your computer's local file system
files.upload
returns a dictionary of the files that were uploaded. The dictionary is keyed by file name, and each value is the data that was uploaded (as bytes).
from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))
Result: files.upload() uploads the selected files into the Colab runtime environment.
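Because each value in the returned dictionary is the raw bytes of a file, the data usually has to be decoded before it can be used. The following is a minimal sketch, assuming the uploaded file is a UTF-8 encoded CSV. The dictionary below is only a stand-in for the result of files.upload(), and the file name scores.csv and the helper parse_uploaded_csv are hypothetical.

```python
import csv
import io


def parse_uploaded_csv(uploaded, filename, encoding="utf-8"):
    """Decode one uploaded file's bytes and parse it into a list of row dicts."""
    text = uploaded[filename].decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))


# Stand-in for the dictionary returned by files.upload() (assumption:
# one small CSV file named scores.csv was selected in the upload dialog).
uploaded = {"scores.csv": b"name,score\nAda,90\nLin,85\n"}

rows = parse_uploaded_csv(uploaded, "scores.csv")
for row in rows:
    print(row["name"], row["score"])
```

In a real Colab session, the stand-in dictionary would be replaced by uploaded = files.upload().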
Downloading files to your computer's local file system
files.download
invokes a browser download of the file to the user's local computer.
from google.colab import files
with open('example.txt', 'w') as f:
    f.write('some content')
files.download('example.txt')
Result: the code opens the file "example.txt" and writes the line "some content" to it. The file is then downloaded to your own computer.
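The same call can bring computed results back to the PC. Below is a sketch, assuming the results are a small table saved as CSV. The names save_rows and results.csv are hypothetical. The import is guarded so that the cell still runs outside Colab, where the google.colab package is not available.

```python
import csv


def save_rows(path, rows, fieldnames):
    """Write a list of row dicts to a CSV file and return the path."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


path = save_rows("results.csv", [{"name": "Ada", "score": 90}], ["name", "score"])

try:
    # Only available inside a Colab runtime.
    from google.colab import files
    files.download(path)
except ImportError:
    print("Not running in Colab; file saved at", path)
```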
b. Load data from Google Drive
Access files in Google Drive using the native REST API or a wrapper like PyDrive.
PyDrive
The example below shows how to (1) authenticate, (2) upload a file, and (3) download a file from Google Drive. More examples are available in the PyDrive documentation.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# PyDrive reference:
# https://googledrive.github.io/PyDrive/docs/build/html/index.html
# 2. Create & upload a text file.
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
# 3. Load a file by ID and print its contents.
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))
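Result: during execution you choose the Google account to use, follow the prompts to obtain a verification code for that account, and paste the code into the field below the cell.
Result: a new file named "Sample upload.txt" is created in Google Drive.
Result: the file "Sample upload.txt" appears in Google Drive.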
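To load a file that already exists in Drive, rather than one that was just created, its ID has to be looked up first. One possible sketch uses PyDrive's ListFile call with a Drive search query. PyDrive talks to the Drive v2 API, where the file name field is called title. The helpers build_title_query and find_files are hypothetical, and only single quotes are escaped in the search text.

```python
def build_title_query(substring):
    """Build a Drive v2 search query for non-trashed files whose title contains substring."""
    # Single quotes inside the search text are escaped with a backslash.
    escaped = substring.replace("'", "\\'")
    return "title contains '{}' and trashed=false".format(escaped)


def find_files(drive, substring):
    """Return (title, id) pairs for matching files.

    `drive` is assumed to be an authenticated pydrive.drive.GoogleDrive client.
    """
    file_list = drive.ListFile({'q': build_title_query(substring)}).GetList()
    return [(f['title'], f['id']) for f in file_list]


print(build_title_query("Sample upload"))
```

With the authenticated drive object from the example above, find_files(drive, 'Sample upload') would return the matching titles and IDs, and any of those IDs can be passed to drive.CreateFile({'id': ...}).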
Drive REST API
Authentication is the first step
from google.colab import auth
auth.authenticate_user()
Then construct a Drive API client.
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
Once the client is created, you can use any of the functions in the Google Drive API reference.
Create a new Google Drive file with data from Python
# Create a local file.
with open('/tmp/to_upload.txt', 'w') as f:
    f.write('my sample file')
# print out the file content
print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt
After executing the code below, a new file named "Sample file" will appear in your drive.google.com file list.
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
# Upload the file to Drive. See:
#
# https://developers.google.com/drive/v3/reference/files/create
# https://developers.google.com/drive/v3/web/manage-uploads
from googleapiclient.http import MediaFileUpload
file_metadata = {
    'name': 'Sample file',
    'mimeType': 'text/plain'
}
media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)
created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()
print('File ID: {}'.format(created.get('id')))
The file ID in your execution result will differ from the one shown in the example, because you will have created a new, distinct file.
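The upload example hardcodes text/plain for both the file metadata and the media body. For other file types, one option is to guess the MIME type from the file name with Python's standard mimetypes module. The sketch below only builds the arguments and does not call the Drive API. The helper describe_upload is hypothetical, and the fallback to application/octet-stream is an assumption rather than something the Drive API requires.

```python
import mimetypes
import os


def describe_upload(path, default_type="application/octet-stream"):
    """Return (file_metadata, mime_type) for uploading the file at path."""
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None:
        # Unknown extension: fall back to a generic binary type (assumption).
        mime_type = default_type
    file_metadata = {"name": os.path.basename(path), "mimeType": mime_type}
    return file_metadata, mime_type


file_metadata, mime_type = describe_upload("/tmp/to_upload.txt")
print(file_metadata, mime_type)
```

The two values can then be passed to MediaFileUpload(path, mimetype=mime_type, resumable=True) and to drive_service.files().create(body=file_metadata, ...), as in the example above.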
Downloading data from a Google Drive file into Python
Before executing the code below, replace the value of file_id with the ID of a file in your Google Drive.
# Download the file we just uploaded.
#
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz
file_id = 'target_file_id'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))
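The download loop leaves the file contents in an in-memory buffer, and printing downloaded.read() shows a bytes object. To use the data with tools that expect a file path, the buffer can be written to the runtime's disk. A minimal sketch follows, in which the BytesIO object stands in for the downloaded buffer, and save_buffer and the target file name are hypothetical.

```python
import io
import os
import tempfile


def save_buffer(buffer, path):
    """Write the full contents of an in-memory buffer to path; return bytes written."""
    data = buffer.getvalue()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


# Stand-in for the buffer filled by MediaIoBaseDownload (assumption).
downloaded = io.BytesIO(b"my sample file")

target = os.path.join(tempfile.gettempdir(), "downloaded_copy.txt")
written = save_buffer(downloaded, target)

# Read the saved copy back and decode it to text.
with open(target, "rb") as f:
    print(f.read().decode("utf-8"))
```

getvalue() returns the whole buffer regardless of the current read position, so no seek(0) is needed before saving.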
c. Google Sheets
d. Google Cloud Storage (GCS)
Mount Google Drive to Colab
google-drive-ocamlfuse[1] is a FUSE filesystem backed by Google Drive, written in OCaml. It lets you mount your Google Drive on Linux.
Features (see what's new in version 0.6.x)
- Full read/write access to ordinary files and folders
- Read-only access to Google Docs, Sheets, and Slides (exported to configurable formats)
- Multiple account support
- Duplicate file handling
- Access to trash (.Trash directory)
- Unix permission and ownership
- Symbolic links
- Read-ahead buffers when streaming
- Accessing content shared with you (requires configuration)
- Team Drive Support
Install FUSE wrapper
# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse
!apt-get update -qq 2>&1 > /dev/null
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
Generate auth token
# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()
Generate Creds for Drive FUSE library
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
# Work around misordering of STREAM and STDIN in Jupyter.
# https://github.com/jupyter/notebook/issues/3159
prompt = !google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass(prompt[0] + '\n\nEnter verification code: ')
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
Command Line Output
Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
Enter verification code: ··········
Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
Please enter the verification code: Access token retrieved correctly.
Mount Drive to Colab
# Create a directory and mount Google Drive using that directory.
!mkdir -p drive
!google-drive-ocamlfuse drive
print('Files in Drive:')
!ls drive/
# Create a file in Drive.
!echo "This newly created file will appear in your Drive file list." > drive/created.txt
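Once Drive is mounted, its contents behave like ordinary files, so standard Python file APIs work on them as well as shell commands. The sketch below lists every file under a mount point. It assumes the mount point is the drive directory created above. A temporary directory stands in for it here so that the snippet runs anywhere, and list_files is a hypothetical helper.

```python
import os
import tempfile


def list_files(root):
    """Return the sorted paths of all files under root, relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


# Stand-in for the mounted Drive directory (in Colab, use 'drive' instead).
mount_point = tempfile.mkdtemp()
with open(os.path.join(mount_point, "created.txt"), "w") as f:
    f.write("This newly created file will appear in your Drive file list.\n")
os.makedirs(os.path.join(mount_point, "data"))
with open(os.path.join(mount_point, "data", "a.csv"), "w") as f:
    f.write("x\n1\n")

files_found = list_files(mount_point)
print(files_found)
```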
Reference