Load external data: Google Drive, Sheets, and Cloud Storage
Ying-Ting 2018/04/18
a. Load data from local file system
Uploading files from your computer's local file system
files.upload returns a dictionary of the files that were uploaded. The dictionary is keyed by file name, and each value is the data that was uploaded.
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
Result: files.upload() uploads files into the Colab runtime environment.
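Because each value in the returned dictionary is the raw bytes of a file, a common next step is to decode those bytes. The sketch below assumes the upload was a UTF-8 CSV file. The helper name `rows_from_upload` and the sample dictionary are hypothetical stand-ins for what `files.upload()` returns, so the snippet also runs outside Colab.

```python
import csv
import io


def rows_from_upload(uploaded, filename, encoding='utf-8'):
    """Decode one uploaded file and parse it as CSV rows (lists of strings)."""
    text = uploaded[filename].decode(encoding)
    return list(csv.reader(io.StringIO(text)))


# Stand-in for the dictionary returned by files.upload() (hypothetical data).
uploaded = {'scores.csv': b'name,score\nada,90\nlin,85\n'}

rows = rows_from_upload(uploaded, 'scores.csv')
print(rows)
```

In Colab, replace the stand-in dictionary with `uploaded = files.upload()` and pass the name of the file you selected.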
Downloading files to your computer's local file system
files.download triggers a browser download of the file to the user's local computer.
from google.colab import files

with open('example.txt', 'w') as f:
    f.write('some content')

files.download('example.txt')
Result: The code opens "example.txt" and writes the line "some content" to it, then downloads the file to your own computer.
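The same write-then-download pattern can be wrapped so that it degrades gracefully when run outside Colab. This is only a sketch: the helper name `write_text_file` is hypothetical, and the `try`/`except` guard assumes that `google.colab` is importable only inside a Colab runtime.

```python
import os


def write_text_file(path, content):
    """Write text to path and return the resulting file size in bytes."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(content)
    return os.path.getsize(path)


size = write_text_file('example.txt', 'some content')
print('Wrote {} bytes'.format(size))

try:
    # Assumption: google.colab is only importable inside a Colab runtime.
    from google.colab import files
except ImportError:
    files = None

if files is not None:
    # Inside Colab, this triggers the browser download.
    files.download('example.txt')
```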
b. Load data from Google Drive
Access files in Google Drive using the native REST API or a wrapper like PyDrive.
PyDrive
The example below shows how to (1) authenticate, (2) upload a file, and (3) download a file from Google Drive. More examples are available in the PyDrive documentation.
!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# PyDrive reference:
# https://googledrive.github.io/PyDrive/docs/build/html/index.html

# 2. Create & upload a text file.
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))

# 3. Load a file by ID and print its contents.
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))
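Result: During execution, you choose which Google account to use, follow the prompts to obtain that account's verification code, and paste it back into the field below the code.
Result: A new file, "Sample upload.txt", is created in Google Drive.
Result: The "Sample upload.txt" file appears in Google Drive.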
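To confirm from code that the upload landed in Drive, one option is PyDrive's `ListFile(...).GetList()` with a search query. The sketch below is not part of the original example. The helper names `build_title_query` and `find_files_by_title` are hypothetical, and the query uses the `title` field because PyDrive talks to version 2 of the Drive API.

```python
def build_title_query(title):
    """Build a Drive search query matching non-trashed files with this title."""
    # Drive query strings escape backslashes and single quotes with a backslash.
    escaped = title.replace('\\', '\\\\').replace("'", "\\'")
    return "title = '{}' and trashed = false".format(escaped)


def find_files_by_title(drive, title):
    """Return (title, id) pairs using an authenticated PyDrive GoogleDrive."""
    file_list = drive.ListFile({'q': build_title_query(title)}).GetList()
    return [(f['title'], f['id']) for f in file_list]


# The query string can be inspected without any credentials.
print(build_title_query('Sample upload.txt'))
```

In Colab, after the authentication step above, `find_files_by_title(drive, 'Sample upload.txt')` should include the file that was just created.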
Drive REST API
Authentication is the first step.
from google.colab import auth

auth.authenticate_user()
Then construct a Drive API client.
from googleapiclient.discovery import build

drive_service = build('drive', 'v3')
Once the client is created, you can use any of the functions in the Google Drive API reference.
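As one illustration of a call from that reference, the sketch below lists a few files with the v3 `files().list` method. The helper name `list_file_names` is hypothetical, and a `MagicMock` stands in for `drive_service` so that the snippet runs without credentials. The ID in the fake response is made up.

```python
from unittest.mock import MagicMock


def list_file_names(drive_service, page_size=10):
    """Return (name, id) pairs for up to page_size files visible to the user."""
    response = drive_service.files().list(
        pageSize=page_size,
        fields='files(id, name)',
    ).execute()
    return [(f['name'], f['id']) for f in response.get('files', [])]


# Offline stand-in (an assumption for this sketch only). In Colab, pass the
# real drive_service built with build('drive', 'v3') instead.
fake_service = MagicMock()
fake_service.files.return_value.list.return_value.execute.return_value = {
    'files': [{'id': 'hypothetical-id', 'name': 'Sample file'}]
}

listing = list_file_names(fake_service)
print(listing)
```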
Create a new Google Drive file with data from Python
# Create a local file.
with open('/tmp/to_upload.txt', 'w') as f:
    f.write('my sample file')

# Print out the file content.
print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt
After executing the code below, a new file named "Sample file" will appear in your drive.google.com file list.
from googleapiclient.discovery import build

drive_service = build('drive', 'v3')

# Upload the file to Drive. See:
#
# https://developers.google.com/drive/v3/reference/files/create
# https://developers.google.com/drive/v3/web/manage-uploads
from googleapiclient.http import MediaFileUpload

file_metadata = {
    'name': 'Sample file',
    'mimeType': 'text/plain'
}
media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)
created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()
print('File ID: {}'.format(created.get('id')))
The file ID in your execution result will differ from the one shown in this example.
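The upload code hard-codes both the file name and the `text/plain` MIME type. For other file types, one possible approach is to derive the metadata from the path with the standard-library `mimetypes` module. The helper name `build_file_metadata` and the fallback type are assumptions made for this sketch, and unlike the example above it names the Drive file after the local file.

```python
import mimetypes
import os


def build_file_metadata(path, default_mime='application/octet-stream'):
    """Build a Drive v3 metadata dict from a local path."""
    # guess_type returns (type, encoding); type is None for unknown extensions.
    mime_type, _ = mimetypes.guess_type(path)
    return {
        'name': os.path.basename(path),
        'mimeType': mime_type or default_mime,
    }


metadata = build_file_metadata('/tmp/to_upload.txt')
print(metadata)
```

The resulting dictionary can be passed as `body=` to `drive_service.files().create(...)`, with `metadata['mimeType']` reused as the `mimetype` argument of `MediaFileUpload`.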
Downloading data from a Google Drive file into Python
Before executing the code below, replace the value of file_id with the ID of a file in your Google Drive.
# Download the file we just uploaded.
#
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz
file_id = 'target_file_id'

import io
from googleapiclient.http import MediaIoBaseDownload

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()

downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))
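Note that `downloaded.read()` returns bytes, so under Python 3 the printed contents carry a `b'...'` prefix. A minimal sketch of decoding the buffer to text follows. It assumes the file is UTF-8 encoded, and it fills a stand-in `io.BytesIO` by hand so that the snippet runs without a Drive connection.

```python
import io

# Stand-in for the buffer filled by MediaIoBaseDownload (hypothetical content).
downloaded = io.BytesIO()
downloaded.write(b'my sample file')

# Rewind before reading, exactly as in the download example.
downloaded.seek(0)
raw = downloaded.read()

# Assumption: the file is UTF-8 text; binary files should stay as bytes.
text = raw.decode('utf-8')

print('Raw bytes: {!r}'.format(raw))
print('Decoded text: {}'.format(text))
```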
c. Google Sheets
d. Google Cloud Storage (GCS)
Reference