Automate Python Data Analysis With YData Profiling

Automate Python Data Analysis With YData Profiling

The YData Profiling package generates an exploratory data analysis (EDA) report with a few lines of code. The report provides dataset and column-level analysis, including plots and summary statistics to help you quickly understand your dataset. These reports can be exported to HTML or JSON so you can share them with other stakeholders.

By the end of this tutorial, you’ll understand that:

  • YData Profiling generates interactive reports containing EDA results, including summary statistics, visualizations, correlation matrices, and data quality warnings from DataFrames.
  • ProfileReport creates a profile you can save with .to_file() for HTML or JSON export, or display inline with .to_notebook_iframe().
  • Setting tsmode=True and specifying a date column with sortby enables time series analysis, including stationarity tests and seasonality detection.
  • The .compare() method generates side-by-side reports highlighting distribution shifts and statistical differences between datasets.

To get the most out of this tutorial, you’ll benefit from having knowledge of pandas.

You can install this package using pip:

Shell
$ python -m pip install ydata-profiling

Once installed, you’re ready to transform any pandas DataFrame into an interactive report. To follow along, download the example dataset you’ll work with by clicking the link below:

The following example generates a profiling report from the 2024 flight delay dataset and saves it to disk:

Python flight_report.py
import pandas as pd
from ydata_profiling import ProfileReport

df = pd.read_csv("flight_data_2024_sample.csv")

profile = ProfileReport(df)
profile.to_file("flight_report.html")

This code generates an HTML file containing interactive visualizations, statistical summaries, and data quality warnings:

Dataset overview displaying statistics and variable types. Statistics include 35 variables, 10,000 observations, and 3.2% missing cells. Variable types: 5 categorical, 23 numeric, 1 DateTime, 6 text.

You can open the file in any browser to explore your data’s characteristics without writing additional analysis code.

There are a number of tools available for high-level dataset exploration, but not all are built for the same purpose. The following table highlights a few common options and when each one is a good fit:

Use case Pick Best for
You want to quickly generate an exploratory report ydata-profiling Generating exploratory data analysis reports with visualizations
You want an overview of a large dataset skimpy or df.describe() Providing fast, lightweight summaries in the console
You want to enforce data quality pandera Validating schemas and catching errors in data pipelines

Overall, YData Profiling is best used as an exploratory report creation tool. If you’re looking to generate an overview for a large dataset, using SkimPy or a built-in DataFrame library method may be more efficient. Other tools, like Pandera, are more appropriate for data validation.

If YData Profiling looks like the right choice for your use case, then keep reading to learn about its most important features.

Building a Report With YData Profiling

A YData Profiling report is composed of several sections that summarize different aspects of your dataset. Before customizing a report, it helps to understand the main components it includes and what each one is designed to show.

Locked learning resources

Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Article

Already a member? Sign-In

Locked learning resources

The full article is for members only. Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Article

Already a member? Sign-In

About

Isabel is an open source tool builder passionate about data science.

» More about

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are:

What Do You Think?

What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment below and let us know.

Commenting Tips: The most useful comments are those written with the goal of learning from or helping out other students. Get tips for asking good questions and get answers to common questions in our support portal.


Looking for a real-time conversation? Visit the Real Python Community Chat or join the next “Office Hours” Live Q&A Session. Happy Pythoning!

Become a Member to join the conversation.

Keep Learning