Python Data Science Tutorials

Use Python to explore, analyze, and visualize data with pandas, NumPy, SciPy, and Jupyter. Create clear charts with Matplotlib and Seaborn, clean messy datasets, and write tests so your analyses are repeatable. Work through practical tasks like feature engineering, time series, and text processing, and use virtual environments to keep your tooling reliable.

When you are ready to model, apply scikit-learn for classification, regression, clustering, and pipelines. For deep learning, train with Keras, TensorFlow, or PyTorch and track results. Scale workloads with Dask, store data in SQLite or PostgreSQL, and deploy predictions with FastAPI and Docker.

Browse all resources below, or commit to a guided Learning Path with progress tracking:

Learning Path

Data Science With Python Core Skills

20 Resources ⋅ Skills: Pandas, NumPy, Data Cleaning, Data Visualization, Statistics

Learning Path

Math for Data Science

5 Resources ⋅ Skills: Statistics, Correlation, Linear Regression, Logistic Regression, NumPy, SciPy, pandas, Gradient Descent

Learning Path

pandas for Data Science

15 Resources ⋅ Skills: pandas, Data Science, Data Visualization, DataFrame, GroupBy, Data Cleaning

Install pandas with python -m pip install pandas. Read files using pd.read_csv() or pd.read_parquet(), inspect with df.info() and df.describe(), and summarize with groupby() and agg().
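As a rough illustration of that workflow, the sketch below reads a small CSV, inspects it, and summarizes it by group. The column names and values are made up for the example, and an in-memory string stands in for a file on disk, so treat it as a starting point rather than a recipe.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """city,product,units,price
Oslo,apples,10,1.5
Oslo,pears,4,2.0
Bergen,apples,7,1.5
Bergen,pears,3,2.0
"""

# pd.read_csv() accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
df["revenue"] = df["units"] * df["price"]

# Inspect column dtypes, non-null counts, and basic statistics.
df.info()
print(df.describe())

# Summarize per city using named aggregations.
summary = df.groupby("city").agg(
    total_units=("units", "sum"),
    mean_revenue=("revenue", "mean"),
)
print(summary)
```

Named aggregations keep the output column names explicit, which tends to make the summary easier to read in reports.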

Use scikit-learn for classical ML tasks and pipelines. Choose TensorFlow or PyTorch for deep learning, and consider XGBoost for strong tabular baselines.
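A minimal sketch of a scikit-learn pipeline for a classical classification task might look like the following. The choice of the bundled iris dataset, a standard scaler, and logistic regression is an assumption made for illustration, not a recommendation for any particular problem.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a test set; stratify keeps class proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The pipeline fits the scaler on the training data only,
# then applies the same scaling before every prediction.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Bundling preprocessing and the estimator in one pipeline object helps avoid leaking test data into the scaling step.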

Start with Matplotlib for full control and Seaborn for quick, statistical plots. Set styles, labels, and legends, and export figures with plt.savefig() for reports and dashboards.
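The sketch below shows one way to set a style, label a Matplotlib chart, and export it with plt.savefig(). The data, the "ggplot" style, and the output file name are placeholders chosen for the example.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # Non-interactive backend, so this runs without a display.
import matplotlib.pyplot as plt

# Made-up monthly figures for two hypothetical product lines.
months = ["Jan", "Feb", "Mar", "Apr"]
widgets = [12, 15, 11, 18]
gadgets = [8, 9, 14, 13]

plt.style.use("ggplot")
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, widgets, marker="o", label="Widgets")
ax.plot(months, gadgets, marker="s", label="Gadgets")
ax.set_title("Units sold per month")
ax.set_xlabel("Month")
ax.set_ylabel("Units")
ax.legend()
fig.tight_layout()

# Export the current figure; dpi controls the raster resolution.
out_path = Path(tempfile.gettempdir()) / "example_units_sold.png"
plt.savefig(out_path, dpi=150)
plt.close(fig)
```

Saving to PNG suits dashboards and slides. Passing a file name that ends in .pdf or .svg produces vector output instead.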

Use Dask for pandas-like processing on larger-than-memory data, or PySpark when you need a cluster. For single-machine workflows, stream with chunksize, downcast dtypes, and store data as Parquet.
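For the single-machine case, a minimal sketch of streaming with chunksize and downcasting dtypes could look like this. The tiny in-memory CSV and its column names are assumptions for illustration. A real file would be far larger and the chunk size correspondingly bigger.

```python
import io

import pandas as pd

# Hypothetical sensor log standing in for a large CSV file.
rows = "\n".join(f"{i % 3},{i * 0.5}" for i in range(10))
csv_text = "sensor_id,reading\n" + rows

totals = {}
chunk_count = 0

# With chunksize set, read_csv() yields DataFrames of at most that many rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_count += 1

    # Downcast to the smallest numeric dtype that can hold the values.
    chunk["sensor_id"] = pd.to_numeric(chunk["sensor_id"], downcast="integer")
    chunk["reading"] = pd.to_numeric(chunk["reading"], downcast="float")

    # Aggregate each chunk, then fold the partial sums into running totals.
    partial = chunk.groupby("sensor_id")["reading"].sum()
    for sensor_id, subtotal in partial.items():
        key = int(sensor_id)
        totals[key] = totals.get(key, 0.0) + float(subtotal)

print(chunk_count, totals)
```

Only one chunk is held in memory at a time, so peak usage depends on the chunk size rather than the file size. Writing results with DataFrame.to_parquet() additionally requires a Parquet engine such as pyarrow or fastparquet to be installed.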

To deploy a trained model, serialize it with joblib.dump(), load it in a FastAPI app, and expose a POST /predict endpoint. Run the app with Uvicorn, optionally behind Gunicorn, and containerize it with Docker for consistent releases.

Python Data Science Tutorials

The Real Python Podcast – Episode #171: Making Each Line of Code Efficient & Python In Excel

The Real Python Podcast – Episode #169: Improving Classification Models With XGBoost

Python Polars: A Lightning-Fast DataFrame Library

The Real Python Podcast – Episode #167: Exploring pandas 2.0 & Targets for Apache Arrow

Creating Web Maps From Your Data With Python Folium

The Real Python Podcast – Episode #162: Exploring the Zen of Python & pandas Features for Finance

Using the NumPy Random Number Generator

Using k-Nearest Neighbors (kNN) in Python

The Real Python Podcast – Episode #150: Lessons Learned From Four Years Programming With Python

The Real Python Podcast – Episode #146: Using NumPy and Linear Algebra for Faster Python Code

How to Iterate Over Rows in pandas, and Why You Shouldn't

The Real Python Podcast – Episode #142: Orchestrating Large and Small Projects With Apache Airflow

Linear Algebra in Python: Matrix Inverses and Least Squares

Working With Linear Systems in Python With scipy.linalg

The Real Python Podcast – Episode #140: Speeding Up Your DataFrames With Polars

The Real Python Podcast – Episode #135: Preparing Data to Measure True Machine Learning Model Performance

Microsoft Power BI and Python: Two Superpowers Combined

ChatterBot: Build a Chatbot With Python

The Real Python Podcast – Episode #121: Moving NLP Forward With Transformer Models and Attention

The Real Python Podcast – Episode #119: Natural Language Processing and How ML Models Understand Text

Combining Data in pandas With concat() and merge()

The Real Python Podcast – Episode #113: Build Streamlit Data Science Dashboards & Verbose Regex f-Strings

A First Look at PyScript: Python in the Web Browser

The Real Python Podcast – Episode #112: Managing Large Python Data Science Projects With Dask

Data Cleaning With pandas and NumPy

Linear Regression in Python

Combining Data in pandas With merge(), .join(), and concat()

The Real Python Podcast – Episode #103: Becoming More Effective at Manipulating Data With Pandas

Sorting Data in Python With pandas

Starting With Linear Regression in Python

The Real Python Podcast – Episode #96: Manipulating and Analyzing Audio in Python

Data Visualization Interfaces in Python With Dash

Building a Neural Network & Making Predictions With Python AI

Graph Your Data With Python and ggplot

The Real Python Podcast – Episode #76: Harnessing Python's math Module and Exposing Practical Pandas Functions

Splitting Datasets With scikit-learn and train_test_split()

Reading and Writing Files With pandas

The pandas DataFrame: Working With Data Efficiently

Speech Recognition With Python

The Real Python Podcast – Episode #68: Exploring the functools Module and Complex Numbers in Python

The Real Python Podcast – Episode #65: Expanding the International Python Community With the PSF

Using pandas to Make a Gradebook in Python

The Real Python Podcast – Episode #64: Detecting Deforestation With Python & Using GraphQL With Django and Vue

Explore Your Dataset With pandas

The Real Python Podcast – Episode #61: Scaling Data Science and Machine Learning Infrastructure Like Netflix

Natural Language Processing With Python's NLTK Package

Learn Text Classification With Python and Keras

The k-Nearest Neighbors (kNN) Algorithm in Python