The Google Colab notebook is available here:
Getting Started and Compiling the Data Set
00:00 I’ve set up a Jupyter Notebook on Google Colab, a free service for editing and hosting Jupyter Notebooks. I’ll be referring to it throughout the course. You can follow along by cloning the Notebook located at the URL at the bottom of the screen. I’m using Google Colab for your convenience.
00:18 If you’d like, you can still download the Notebook and run it on other services like Microsoft Azure or locally using a stock Jupyter Notebook server. The advantage of Google Colab is that all of the packages needed to complete the demo are pre-installed. To get started, you just need to connect to a runtime hosted on Google servers with a single click.
00:43 Before starting any data science or machine learning project, you should fully understand the data you are working with. This course will use the Sentiment Labelled Sentences Data Set from the Machine Learning Repository, located at the University of California, Irvine.
You won’t need to labor over the get_data() function. It simply uses modules from the Python standard library to download the data set ZIP file and save it to disk, and then calls the extract_data() function to extract it.
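If you’re curious what a pair of functions like this might look like, here is a minimal sketch that uses only urllib.request and zipfile from the standard library. It is not the code from the Notebook: the URL, the parameter names, and the default file name are all assumptions made for illustration, so check them against the Notebook before relying on them.

```python
import urllib.request
import zipfile

# Assumed location of the ZIP file; the transcript does not give the URL.
DATA_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "00331/sentiment%20labelled%20sentences.zip"
)


def extract_data(zip_path, target_dir="."):
    """Unpack every member of the ZIP archive into target_dir."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)


def get_data(url=DATA_URL, zip_path="sentiment.zip", target_dir="."):
    """Download the archive, save it to disk, then extract it.

    The parameters are hypothetical; they only make the sketch easy to reuse.
    """
    urllib.request.urlretrieve(url, zip_path)  # download and save to disk
    extract_data(zip_path, target_dir)
```

Calling get_data() in a Notebook cell would then leave the extracted folder next to the downloaded ZIP file, provided the assumed URL is still valid.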
01:16 At this point, you can open the Files tab in Google Colab and see a folder named sentiment labelled sentences, which holds the contents of the data set ZIP file. Note that the free tier of Google Colab is a shared service. If your session times out and the resources are reclaimed, you will lose any files that you downloaded. In this case, it’s not a big deal because you can just download them again, but if you save any data generated by your Notebook, it’s a good idea to download it to your own machine.
02:06 Each line has a sentence and then a sentiment score. For negative sentiment, the score is 0, while positive sentiment has a score of 1. There are two other files with the same structure, holding data from the Internet Movie Database and Yelp.
Notice that even though you are using the read_csv() function from pandas, you can specify any separator. The files in the data set are tab-separated, so use the sep keyword argument to tell pandas to use the tab character instead.
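As a rough illustration of the sep argument, the sketch below reads two made-up, tab-separated lines from an in-memory buffer instead of one of the real files. The sample sentences and the column names sentence and label are assumptions, not taken from the Notebook. header=None is passed on the assumption that the first line is data rather than a header row, which matches the sentence-then-score layout described above.

```python
import io

import pandas as pd

# Two invented lines in the "sentence<TAB>score" layout described above.
sample = io.StringIO("I loved this phone.\t1\nThe battery died in a day.\t0\n")

# sep="\t" switches the separator from the default comma to the tab character.
# The column names are hypothetical and chosen only for readability.
df = pd.read_csv(sample, sep="\t", header=None, names=["sentence", "label"])

print(df)
```

To read one of the real files, you would pass its path in place of the in-memory buffer and keep the same sep value.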