Using Google Cloud Platform and BigQuery with Jupyter Notebook

MIKE ARMISTEAD
2 min read · Nov 9, 2020

I recently took a deep dive down the rabbit hole that is SQL. While I was practicing my query skills, a practice problem mentioned BigQuery. I had no idea what BigQuery was, so I looked into it and found that Google hosts a lot of large public datasets you can use for projects. This was perfect for me because I had just started looking for a new project. Usually when I find a dataset it is a CSV file: I download it and analyze it with Pandas in Jupyter Notebook. That is not the case with BigQuery, because the datasets are extremely large, so I had to put my SQL skills to the test and pull only what I wanted into my Jupyter Notebook.

But how do we get the dataset into Jupyter Notebook? The first thing we need to do is create a Google Cloud account. After that you need to install the Google Cloud SDK, which gives you the tools and libraries to work with BigQuery. Once that is squared away, you'll want to install the BigQuery client library. This can be done in the terminal with

pip install --upgrade google-cloud-bigquery

From there, go to your Google Cloud Console, open IAM & Admin, and go to Service Accounts. You are going to create a new service account to connect to Google Cloud from Jupyter Notebook. Name the account and hit Create. Next, select a role: choose BigQuery and then BigQuery User. Then grant access to the email you set up the Google Cloud account with, create a JSON key, and download it locally so we can connect later.

Now we go back to our terminal and point our environment at the JSON key.

nano ~/.bash_profile

Go to the bottom of the bash profile and add

export GOOGLE_APPLICATION_CREDENTIALS='the_location_of_where_you_downloaded_the_JSON_key'

Make sure the location to the JSON key is exact or else you are going to have trouble connecting.
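It can help to confirm the variable actually resolved before opening a notebook. A minimal sketch (the helper name is my own; it just checks that the variable is set and points to a real file):

```python
import os

def credentials_path_ok(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return True if the env var is set and points to an existing file."""
    path = os.environ.get(env_var, "")
    return bool(path) and os.path.isfile(path)

# After `source ~/.bash_profile`, this should report True;
# False usually means a typo in the exported path.
print(credentials_path_ok())
```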

Finally we can go into our Jupyter Notebook and connect to BigQuery

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client(project='name of project you will be working on in Google Cloud')
dataset_ref = client.dataset('name of empty dataset where you will store data taken from BigQuery')

QUERY = """SELECT *
FROM name_of_public_dataset_on_bigquery
LIMIT 100"""
q = client.query(QUERY)   # starts the query job
results = q.result()      # waits for the job to finish
df = results.to_dataframe()
df.head()

Once you get through all of that, you should see the first five rows of the data you pulled from BigQuery.
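One last tip: BigQuery bills by bytes scanned, so on these large datasets it pays to name the columns you need instead of SELECT *. A small sketch of building such a query string (the helper is my own, and the table here is one of Google's real public datasets, though any table works):

```python
def build_query(table, columns, limit=100):
    """Build a BigQuery SQL string that scans only the named columns."""
    cols = ", ".join(columns)
    return f"SELECT {cols}\nFROM `{table}`\nLIMIT {limit}"

query = build_query(
    "bigquery-public-data.usa_names.usa_1910_current",
    ["name", "year", "number"],
    limit=100,
)
print(query)
```

The resulting string can be passed to client.query() exactly like QUERY above.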
