
Make Toy Data Structures With Pandas' Testing Module


Ever find yourself setting up fake data to test functions you’ve written? Let Pandas do that for you! Pandas’ testing module provides a number of convenient functions for building quasi-realistic Series and DataFrames. After watching this video, you’ll know how to quickly create a simple Pandas DataFrame and how to find out which functions the testing module provides for creating fake data.
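One way to find out which fake-data helpers your installed pandas offers is to inspect the module for names starting with `make`. The sketch below assumes the helpers live in the private `pandas._testing` module (older releases exposed them as `pandas.util.testing`). Because the module is private, the names it returns, if any, depend on your pandas version. `list_make_helpers` is a hypothetical helper, not part of pandas.

```python
import importlib


def list_make_helpers(module_name="pandas._testing"):
    """Return the sorted names of make* helpers found in a testing module.

    Hypothetical helper. Returns an empty list if the module can't be imported.
    pandas._testing is private, so its contents vary between pandas versions.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return []
    # Keep only attributes whose names follow the make* convention
    return sorted(name for name in dir(module) if name.startswith("make"))


for name in list_make_helpers():
    print(name)
```

On an older pandas, you can pass `"pandas.util.testing"` as the module name instead.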

Sciencificity on April 4, 2020

Hello! Thanks for teaching me about creating fake data. I’m not sure if this is the result of a newer version of pandas (I have version 1.0.0), but N and K can no longer be set as they are in the video. The testing module now has _K and _N attributes, set to 4 and 30 respectively. You can change the number of rows (_N) by passing nper=15 in the method call, but I can’t see how to change _K. If you know of a way, please let me know! Thanks!

Brad Solomon RP Team on April 7, 2020

Hi @Sciencificity, what error are you seeing?

It looks like pandas.util.testing was deprecated in Pandas 1.0 (pandas.pydata.org/docs/whatsnew/v1.0.0.html#deprecations), though you can still set those attributes:

>>> import pandas.util.testing as tm
__main__:1: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
>>> tm.N, tm.K = 15, 3
>>> tm.makeTimeDataFrame(freq='M').head()
                   A         B         C         D
2000-01-31  1.446057  0.660831 -1.395632  0.576446
2000-02-29  0.336925 -0.705131 -0.438653  0.336438
2000-03-31  0.534070  0.433786 -1.367734  0.292544
2000-04-30 -0.508290 -0.130769  0.079307 -0.815311
2000-05-31  1.277667  0.878491  1.372388 -1.640210

This is from pandas 1.0.3 on a python:3-slim-jessie Docker container.

Note also that the “replacement” module (pandas.testing) only exposes assert_extension_array_equal, assert_frame_equal, assert_series_equal, and assert_index_equal.
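The assert_* functions in pandas.testing are meant for checking results in your own tests. Here is a minimal sketch of `assert_frame_equal` in action. `frames_match` is a hypothetical wrapper that turns the raised `AssertionError` into a boolean. It relies on the fact that float columns are compared with a small tolerance unless `check_exact=True` is passed.

```python
import pandas as pd
import pandas.testing as pdt


def frames_match(left, right, **kwargs):
    """Hypothetical wrapper: True if assert_frame_equal passes, else False."""
    try:
        pdt.assert_frame_equal(left, right, **kwargs)
    except AssertionError:
        return False
    return True


expected = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})

# A float difference far below the default tolerance still counts as equal
nearly = expected.copy()
nearly.loc[1, "B"] = 4.0 + 1e-10

# A renamed column is a real difference and fails the check
renamed = expected.rename(columns={"B": "C"})

print(frames_match(expected, nearly))                    # True
print(frames_match(expected, renamed))                   # False
print(frames_match(expected, nearly, check_exact=True))  # False
```

`assert_series_equal` and `assert_index_equal` are used in the same way for Series and Index objects.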

Brad Solomon RP Team on April 7, 2020

@Sciencificity oops, my mistake, I see what you’re saying now. It looks like setting N and K directly won’t have any effect, because the new attributes (which the Pandas developers don’t seem to want to be part of the public API) are _N and _K: github.com/pandas-dev/pandas/blob/master/pandas/_testing.py.
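Since _N and _K are private, one workaround is to build a similar time-indexed frame yourself with a shape of your choosing, using only public NumPy and pandas calls. `make_time_frame` below is a hypothetical helper, not part of pandas. Its defaults of 30 rows and 4 columns just mirror the _N and _K values mentioned earlier in the thread, and the business-day default frequency is an assumption.

```python
import string

import numpy as np
import pandas as pd


def make_time_frame(nper=30, ncols=4, freq="B", start="2000-01-01", seed=None):
    """Hypothetical helper: random normal data on a DatetimeIndex.

    nper sets the number of rows and ncols the number of columns,
    labelled A, B, C, ... (single letters, so at most 26 columns).
    """
    if not 1 <= ncols <= 26:
        raise ValueError("ncols must be between 1 and 26")
    rng = np.random.default_rng(seed)
    # nper consecutive timestamps at the requested frequency
    index = pd.date_range(start=start, periods=nper, freq=freq)
    columns = list(string.ascii_uppercase[:ncols])
    data = rng.standard_normal((nper, ncols))
    return pd.DataFrame(data, index=index, columns=columns)


df = make_time_frame(nper=15, ncols=3, freq="D", seed=0)
print(df.shape)  # (15, 3)
print(df.head())
```

Passing a `seed` makes the random values reproducible between runs, which helps when the frame feeds a test.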

Sciencificity on April 14, 2020

Yip, thanks for the feedback, Brad. It’s a pity (it would be cool to generate fake data with as many columns and rows as you want), but it’s not a train smash ;). As an aside, if you’re looking for fake data, this website is very cool: www.mockaroo.com/ (I’ve generated dummy data for testing via their site before :)).
