Load external data: Google Drive, Sheets, and Cloud Storage

Ying-Ting 2018/04/18

a. Load data from local file system

Uploading files from your local file system

files.upload returns a dictionary of the files that were uploaded. The dictionary is keyed by file name, and each value is the data that was uploaded.

from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))

Result: files.upload() uploads files into the Colab runtime environment.
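Because each value in the dictionary is a bytes object, the uploaded data can be parsed in memory without writing it to disk first. The following is a minimal sketch, assuming the uploaded file is a UTF-8 encoded CSV. The `uploaded` dictionary here is a hand-built stand-in for the return value of files.upload(), and the file name, contents, and the helper `parse_uploaded_csv` are hypothetical.

```python
import csv
import io

# Stand-in for the dictionary returned by files.upload():
# keys are file names, values are the raw bytes of each file.
# (Hypothetical file name and contents, for illustration only.)
uploaded = {'scores.csv': b'name,score\nalice,90\nbob,85\n'}


def parse_uploaded_csv(data):
    """Decode uploaded bytes as UTF-8 and parse them as CSV rows."""
    text = io.StringIO(data.decode('utf-8'))
    return list(csv.DictReader(text))


rows = parse_uploaded_csv(uploaded['scores.csv'])
for row in rows:
    print(row['name'], row['score'])
```

In a real Colab session, the stand-in dictionary would be replaced by `uploaded = files.upload()`.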

Downloading files to your local file system

files.download triggers a browser download of the file to the user's local computer.

from google.colab import files

with open('example.txt', 'w') as f:
    f.write('some content')

files.download('example.txt')

Result: the code opens "example.txt", writes the line "some content" to it, and then downloads the file to your own computer.
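files.download is only available inside a Colab runtime. The sketch below wraps the write-then-download steps in a helper that triggers the browser download only when google.colab can be imported, so the same code also runs elsewhere. The name `save_and_download` is a hypothetical helper, not part of the Colab API, and the temporary path is an assumption for illustration.

```python
import os
import tempfile


def save_and_download(path, content):
    """Write content to path, then request a browser download (Colab only).

    Returns True if a download was triggered, False otherwise.
    """
    with open(path, 'w') as f:
        f.write(content)
    try:
        from google.colab import files  # present only in a Colab runtime
    except ImportError:
        return False  # not running in Colab; the file is only saved locally
    files.download(path)
    return True


path = os.path.join(tempfile.gettempdir(), 'example.txt')
in_colab = save_and_download(path, 'some content')
print('download triggered' if in_colab else 'saved to ' + path)
```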

b. Load data from Google Drive

Access files in Google Drive using the native REST API or a wrapper like PyDrive.

PyDrive

The example below shows how to (1) authenticate, (2) upload a file, and (3) download a file from Google Drive. More examples are available in the PyDrive documentation.

!pip install -U -q PyDrive


from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials


# 1. Authenticate and create the PyDrive client.

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


# PyDrive reference:
# https://googledrive.github.io/PyDrive/docs/build/html/index.html

# 2. Create & upload a text file.

uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))


# 3. Load a file by ID and print its contents.

downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))

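Result: during execution you choose which Google account to use, follow the prompts to obtain that account's verification code, and paste it into the field below the code.

Result: a new file named "Sample upload.txt" is created in Google Drive.

Result: the "Sample upload.txt" file is now in Google Drive.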
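To find the uploaded file again later without keeping its ID, PyDrive's ListFile accepts a Drive search query. The following is a hedged sketch rather than part of the original example: `build_title_query` and `find_files_by_title` are hypothetical helper names, the query assumes the `title` field of the Drive v2 API that PyDrive wraps, and `drive` is assumed to be an authenticated GoogleDrive client like the one created in step 1.

```python
def build_title_query(title, include_trashed=False):
    """Build a Drive search query matching files with exactly this title."""
    # Backslashes and single quotes must be escaped inside the quoted value.
    escaped = title.replace('\\', '\\\\').replace("'", "\\'")
    query = "title = '{}'".format(escaped)
    if not include_trashed:
        query += ' and trashed = false'
    return query


def find_files_by_title(drive, title):
    """Return PyDrive file objects with this title (needs an authenticated client)."""
    return drive.ListFile({'q': build_title_query(title)}).GetList()


print(build_title_query('Sample upload.txt'))
```

In a Colab session, `find_files_by_title(drive, 'Sample upload.txt')` would return a list of file objects whose IDs can be read with `f['id']`.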

Drive REST API

The first step is to authenticate.

from google.colab import auth
auth.authenticate_user()

Then construct a Drive API client.

from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

Once the client is created, you can use any of the functions in the Google Drive API reference.

Create a new Google Drive file with data from Python

# Create a local file.
with open('/tmp/to_upload.txt', 'w') as f:
    f.write('my sample file')

# Print out the file content.
print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt


After executing the code below, a new file named "Sample file" will appear in your drive.google.com file list.

from googleapiclient.discovery import build
drive_service = build('drive', 'v3')


# Upload the file to Drive. See:
#
# https://developers.google.com/drive/v3/reference/files/create
# https://developers.google.com/drive/v3/web/manage-uploads
from googleapiclient.http import MediaFileUpload

file_metadata = {
    'name': 'Sample file',
    'mimeType': 'text/plain'
}
media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)
created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()
print('File ID: {}'.format(created.get('id')))

The file ID in your execution result will differ from the one shown in the example, because you will have created a new, distinct file.
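The upload code hard-codes 'text/plain' for the MIME type. When uploading other kinds of files, one option is to derive the metadata from the file name. The sketch below uses the standard library mimetypes module; `build_file_metadata` is a hypothetical helper, and falling back to 'application/octet-stream' for unknown extensions is an assumption, not something the Drive API requires.

```python
import mimetypes
import os


def build_file_metadata(path, name=None):
    """Build a Drive v3 metadata dict, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(path)
    return {
        # Default to the local file name when no display name is given.
        'name': name or os.path.basename(path),
        # Fall back to a generic binary type when the extension is unknown.
        'mimeType': mime_type or 'application/octet-stream',
    }


file_metadata = build_file_metadata('/tmp/to_upload.txt', name='Sample file')
print(file_metadata)
```

The resulting dictionary can be passed as the `body` argument of `drive_service.files().create(...)`, and its 'mimeType' value reused for the `mimetype` argument of MediaFileUpload.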


Downloading data from a Google Drive file into Python

Before executing the code below, set the file_id variable to the ID of a file in your Google Drive.

# Download the file we just uploaded.
#
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz

file_id = 'target_file_id'

import io
from googleapiclient.http import MediaIoBaseDownload

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()

downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))
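Note that downloaded.read() returns a bytes object, so the final print shows a b'...' literal rather than plain text. A minimal sketch of turning the buffer into a string, assuming the file is UTF-8 encoded; the BytesIO here is filled by hand as a stand-in for the buffer that MediaIoBaseDownload writes into.

```python
import io

# Stand-in for the buffer filled by MediaIoBaseDownload during the download.
downloaded = io.BytesIO()
downloaded.write(b'my sample file')

downloaded.seek(0)            # rewind to the start before reading
raw = downloaded.read()       # bytes, as returned by the download buffer
text = raw.decode('utf-8')    # str, assuming the content is UTF-8 text
print(text)
```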


c. Google Sheets

d. Google Cloud Storage (GCS)


Reference

[0] https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb&scrollTo=KHeruhacFpSU
