Analyzing and Visualizing

00:00 In the previous lesson, I showed you some machine learning and scientific computing libraries. In this lesson, I’ll cover more data libraries.

00:09 Data analysis is the process of gathering and analyzing data to make predictions and to find trends to help make decisions. The line between this and some of the scientific computing stuff is blurry, as you can do data analysis on your scientific data.

00:25 A common thing to do with your data is to visualize it. Creating graphs and presentations can help you analyze your data as well as present it to others.

00:33 Data science is a bit of a catch-all term for people who muck with data and create visualizations. There’s a lot of growth in this space, and it’s an active area in Python, both with coders learning data science and data folks learning to code, taking them past Excel to the next level.

00:52 I already mentioned NumPy in the previous lesson, and it sort of fits in this space as well. Data analysis means number crunching. In fact, NumPy has multidimensional arrays, which allow you to work with data in memory, much like a spreadsheet.
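
As a rough illustration of the spreadsheet comparison, here is a minimal sketch using a two-dimensional NumPy array. The quiz scores and variable names are made up for the example and aren't from the lesson.

```python
import numpy as np

# Hypothetical data: each row is a student, each column is a quiz score.
scores = np.array([
    [80, 90, 70],
    [60, 75, 85],
    [95, 85, 90],
])

# Average across each row, similar to a formula filled down a spreadsheet column.
student_means = scores.mean(axis=1)

# Highest value in each column, similar to a summary row at the bottom of a sheet.
quiz_max = scores.max(axis=0)

print(student_means)
print(quiz_max)
```

The `axis` argument picks the direction of the calculation: `axis=1` works across each row, while `axis=0` works down each column.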

01:06 This kind of code is so common that libraries have come about to make it easier. pandas is one of the more popular ones out there that does this. The idea of a spreadsheet in memory is called a DataFrame, and there are several different libraries out there that manipulate DataFrames.

01:22 pandas is kind of the granddaddy of them.
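
To make the DataFrame idea a little more concrete, here is a small sketch in pandas. The column names and values are invented for illustration, and this is only one of several ways to filter and summarize a DataFrame.

```python
import pandas as pd

# Hypothetical table: one row per person, with named columns like a spreadsheet.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dev"],
    "team": ["red", "blue", "red", "blue"],
    "score": [82, 91, 77, 69],
})

# Keep only the rows where the score is above 75.
high = df[df["score"] > 75]

# Average score per team, similar to a pivot table in a spreadsheet.
team_means = df.groupby("team")["score"].mean()

print(high)
print(team_means)
```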

01:25 Another such library is Polars. This is newer than pandas, and in fact, I find it easier to use personally. pandas has some stuff in it that doesn’t quite feel Pythonic to me, and it occasionally leaves me scratching my head about how to accomplish certain things.

01:40 Polars was inspired by pandas, but took a more object-oriented approach. There’s less black magic in it, and I find the code easier to read. Add to that, it’s a very speedy little polar bear and it’s capable of dealing with really large datasets.

01:54 That’s why I tend to use it if I need a DataFrame myself.

02:00 Visualization is the art of turning your data into pictures. This pair of graphs is from our course on using pandas to do astrophysics. This particular graph was built with a library called Matplotlib, and as you can see, it’s capable of doing multiple graphs in a set, multiple lines in a graph, shading sections for highlighting, and much more.

02:21 As I said, that graph I just showed was built with Matplotlib. This is probably the most popular graphing library for Python. It has some quirks to it, but once you get used to those quirks, it’s a very powerful tool.
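
Here is a minimal sketch of the features mentioned above: two graphs in one figure, multiple lines in a graph, and a shaded section. The sine and cosine data is a stand-in chosen for the example and is not the astrophysics data from the course.

```python
import math

import matplotlib.pyplot as plt

# Stand-in data: sample sine and cosine from 0 to about 2*pi.
xs = [i / 10 for i in range(63)]
sines = [math.sin(x) for x in xs]
cosines = [math.cos(x) for x in xs]

# Two graphs stacked vertically in one figure, sharing the x-axis.
fig, (top, bottom) = plt.subplots(2, 1, sharex=True)

# Multiple lines in the top graph, plus a shaded vertical band as a highlight.
top.plot(xs, sines, label="sin(x)")
top.plot(xs, cosines, label="cos(x)")
top.axvspan(1.0, 2.0, alpha=0.2)
top.legend()

# A single line in the bottom graph.
bottom.plot(xs, [s * c for s, c in zip(sines, cosines)])
bottom.set_xlabel("x")
```

Calling `plt.show()` afterward opens the figure in a window, and `fig.savefig()` with a filename writes it to disk instead.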

02:34 Because Matplotlib is a little quirky, there are alternatives out there. I haven’t used Seaborn myself, but I’m told it wraps Matplotlib, allowing you to build your pretty graphs with less code and fewer quirks.

02:48 The next two tools help you build visualizations in your browser. This can be quite handy if you want to share your work with others. I haven’t used either Bokeh or Dash, so I can’t speak to what you should use.

02:59 Both are popular and have loads of good documentation to help you get started.

03:04 Jupyter Notebooks aren’t quite visualization tools specifically, but more of an interactive Python session. It’s like a cross between a REPL and a spreadsheet.

03:13 You put values in cells, do calculations, and then you can go back, change the values, and rerun the cells to get everything redone. Jupyter often gets lumped in with visualization tools as it integrates well with Matplotlib, so if you’re mucking with a graph, you can do it in a notebook and see the updated result as you tinker.

03:32 Real Python has loads of content in the data analysis and visualization space. This learning path shows you how to collect and store data, while this one shows you how to do visualization.

03:43 This learning path is a general introduction to data science, while this one is specific to pandas.

03:51 If you want to learn to use Dash, this tutorial’s a good place to start, while this one teaches you how to use Polars. If you’re looking for some practice, this project-based tutorial shows you how to evaluate grades in a course, while this teaches you a bit about space using pandas and Matplotlib.

04:09 Next up, I’ll talk about crawling and scraping websites.