Python for social scientists

Python for Social Scientists

by Real Python data-science python

Python is an increasingly popular tool for data analysis in the social scientists. Empowered by a number of libraries that have reached maturity, R and Stata users are increasingly moving to Python in order to take advantage of the beauty, flexibility, and performance of Python without sacrificing the functionality these older programs have accumulated over the years.

But while Python has much to offer, existing Python resources are not always well-suited to the needs of social scientists. With that in mind, I’ve recently created a new resource—www.data-analysis-in-python.org (DAP)—tailored specifically to the goals and desires of the social scientist Python user.

The site is not a new set of tutorials, however—there are more than enough Python tutorials in the world. Rather, the aim of the site is to curate and annotate existing resources, and to provide users guidance on what topics to focus on and which to skip.

Why a Site for Social Scientists?

Social scientists—and indeed, most data scientists—spend most of their time trying to wrestle individual, idiosyncratic datasets into the shape needed to run statistical analyses. This makes the way most social scientists use Python fundamentally different from how it is used by most software developers.

Social scientists are primarily interested in writing relatively simple programs (scripts) that execute a series of commands (recoding variables, merging datasets, parsing text documents, etc.) to wrangle their data into a form they can analyze. And because they are usually writing their scripts for a specific, idiosyncratic application and set of data, they are generally not focused on writing code with lots of abstractions.

Social scientists, in other words, tend to be primarily interested in learning to use existing tools effectively, not develop new ones.

Because of this, social scientists learning Python tend to have different priorities in terms of skill development than software developers. Yet most tutorials online were written for developers or computer science students, so one of the aims of DAP is to provide social scientists with some guidance on the skills they should prioritize in their early training. In particular, DAP suggests:

Need immediately:

  • Data types: integers, floats, strings, booleans, lists, dictionaries, and sets (tuples are kinda optional)
  • Defining functions
  • Writing loops
  • Understanding mutable versus immutable data types
  • Methods for manipulating strings
  • Importing third party modules
  • Reading and interpreting errors

Things you’ll want to know at some point, but not necessary immediately:

  • Advanced debugging utilities (like pdb)
  • File input / output (most libraries you’ll use have tools to simplify this for you)

Don’t need:

  • Defining or writing classes
  • Understanding exceptions

Pandas

Today, most empirical social science remains organized around tabular data, meaning data that is presented with a different variable in each column and a different observation in each row. As a result, many social scientists using Python are a little confused when they don’t find a tabular data structure covered in their intro to Python tutorial. To address this confusion, DAP does its best to introduce users to the pandas library as fast as possible, providing links to tutorials and a few tips on gotchas to watch out for.

The pandas library replicates much of the functionality that social scientists are used to finding in Stata or R—data can be represented in a tabular format, column variables can be easily labeled, and columns of different types (like floats and strings) can be combined in the same dataset.

pandas is also the gateway to many other tools social scientists are likely to use, like graphing libraries (seaborn and ggplot2) and the statsmodels econometrics library.

Other Libraries by Research Area

While all social scientists who wish to work with Python will need to understand the core language and most will want to be familiar with pandas, the Python eco-system is full of application-specific libraries that will only be of use to a subset of users. With that in mind, DAP provides an overview of libraries to help researchers working in different topic areas, along with links to materials on optimal use, and guidance on relevant considerations:

Want to Get Involved?

This site is young, so we are anxious for as much input as possible on content and design. If you have experience in this area you want to share please drop me an email or comment on Github.


This is a guest blog post by Nick Eubank, a Post-Doctoral Fellow at the Center for the Study of Democratic Institutions at Vanderbilt University

🐍 Python Tricks 💌

Get a short & sweet Python Trick delivered to your inbox every couple of days. No spam ever. Unsubscribe any time. Curated by the Real Python team.

Python Tricks Dictionary Merge

About The Team

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are:

Support Free Python Education...

Real Python brings you free, book-quality tutorials and in-depth articles about Python programming every single week. Everyone on our editorial team gets paid for their work—from our authors, to our editors and proof readers, our designers, community team, and web developers.

We do not believe in spammy ad banners from the big advertising networks. We don’t secretly mine Bitcoin in your browser to cover our hosting costs… And unlike many other publications, we haven’t put up a paywall—we want to keep our educational content as open as we can.

Help make sustainable programming journalism and education a reality by supporting us with a small monthly contribution. For as little as $1, you can support Real Python—and it only takes a minute. Thank you.

VISA Discover American Express Maestro PayPal
Support Real Python →

What Do You Think?

Real Python Comment Policy: The most useful comments are those written with the goal of learning from or helping out other readers—after reading the whole article and all the earlier comments. Complaints and insults generally won’t make the cut here.

Boost Your Python Skills

Master Python 3 and write more Pythonic code with our in-depth books and video courses:

Get Python Books & Courses »

Keep Reading