Google Sheets is a very powerful (and free) tool for creating spreadsheets. I’ve almost replaced LibreOffice Calc with Sheets, because it’s very comfortable to work with. Sometimes, a data scientist has to pull some data from a Google Sheet into a Python notebook. In this article, I’ll show you how to do it using just Pandas.
The first thing to do is to create a Google Sheet. For this example, it will contain just 2 columns, one of which (the Age) has one missing value.
This is the dataset we’re going to work with.
Now we have to make it visible over the Internet. For doing this, we just have to go to File->Publish to the web.
Let’s choose “Microsoft Excel (.xlsx)” from the dropdown menu and make the window look like this:
When we click on “Publish”, we’ll be able to copy an URL like the one below:
This is the URL that Google Sheet has created to make our sheet downloadable in Excel format.
We can now move to a Python terminal like Google Colaboratory and use Pandas library to get the contents of the sheet. We can simply use the URL as a value for the first argument of the read_excel function of Pandas.
And this is the result
As you can see, data has been properly injected into our data frame. We have a NaN related to the missing value as well. We don’t need to worry about data types, because they are read from the excel format directly.
With particular arguments, we can read specific sheets, skip some rows and other stuff.
In this simple way, we can connect to a Google Sheet directly from Python without using particular API integration. Obviously, the URL generated by Google Sheet makes the Sheet public, so be careful when you give it to anybody.
Awesome blog I am definitely enjoying your website. keep sharing more articles
Thanks!! This was all I was after rather than needing to connect to GCP or APIs etc, nice and simple.