
Episode 264: Large Language Models on the Edge of the Scaling Laws
The Real Python Podcast
What’s happening with the latest releases of large language models? Is the industry hitting the edge of the scaling laws, and do the current benchmarks provide reliable performance assessments? This week on the show, Jodie Burchell returns to discuss the current state of LLM releases.
The recent release of GPT-5 has been a wake-up call for the LLM industry. We discuss how scaling up these systems is now yielding diminishing returns. Jodie also explains why many AI model assessments and benchmarks are flawed. We also take a sober look at the productivity gains from using these tools for software development within companies.
We discuss the broader factors newer developers should consider when evaluating the current job market. Jodie digs into how economic changes and rising interest rates are influencing layoffs and hiring freezes. Then we share a wide collection of resources for you to continue exploring these topics.
This episode is sponsored by InfluxData.
Course Spotlight: Exploring Python Closures: Examples and Use Cases
Learn about Python closures: function-like objects with extended scope used for decorators, factories, and stateful functions.
Topics:
- 00:00:00 – Introduction
- 00:03:00 – Recent conferences and talks
- 00:04:18 – What’s going on with LLMs?
- 00:06:06 – What happened with the GPT-5 release?
- 00:08:14 – Simon Willison - 2025 in LLMs so far
- 00:09:00 – How did we get here?
- 00:10:37 – OpenAI and scaling laws
- 00:12:25 – Pivoting to post-training
- 00:16:01 – Some history of AI eras
- 00:17:54 – Issues with measuring performance and benchmarks
- 00:22:19 – Chatbot Arena
- 00:24:06 – Languages are finite
- 00:26:22 – LLMs and the illusion of humanity
- 00:30:41 – Sponsor: InfluxData
- 00:31:34 – Types of solutions to move past these limits
- 00:36:57 – Does AI actually boost developer productivity?
- 00:44:19 – Agentic AI Programming with Python
- 00:48:02 – Results of non-programmers vibe coding
- 00:50:18 – Back to the concept of overfitting
- 00:52:52 – The money involved in training
- 00:56:50 – Video Course Spotlight
- 00:58:21 – DeepSeek and new methods of training
- 01:01:02 – Quantizing and fitting on a local machine
- 01:04:48 – The layoffs and the economic changes
- 01:10:32 – AI implementation failures
- 01:21:01 – Don’t doubt yourself as a developer
- 01:24:06 – What are you excited about in the world of Python?
- 01:25:39 – What do you want to learn next?
- 01:26:42 – What’s the best way to follow your work online?
- 01:27:04 – Thanks and goodbye
Show Links:
- EuroPython 2025 - July 14th-20th 2025 - Prague, Czech Republic & Remote
- Episode #232: Exploring Modern Sentiment Analysis Approaches in Python
- GPT-5: Overdue, overhyped and underwhelming. And that’s not the worst of it.
- GPT 5’s Rocky Launch Highlights AI Disillusionment - IEEE Spectrum
- 2025 in LLMs so far, illustrated by Pelicans on Bicycles — Simon Willison
- Attention is All You Need - Google
- Scaling laws for neural language models - OpenAI
- What if AI Doesn’t Get Much Better Than This? - Cal Newport
- Hiltzik: AI hype is fading fast - Los Angeles Times
- Does AI Actually Boost Developer Productivity? (100k Devs Study) - Yegor Denisov-Blanch, Stanford - YouTube
- Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - METR
- Amazon Cloud Chief: Replacing Junior Staff With AI Is ‘Dumbest’ Idea - Business Insider
- 20 LLM evaluation benchmarks and how they work
- MMLU - Measuring Massive Multitask Language Understanding
- HellaSwag: Can a Machine Really Finish Your Sentence?
- Mechanical Turk - Wikipedia
- Amazon Mechanical Turk
- Chatbot Arena - LMArena
- LLMs Can’t Reason - The Reversal Curse, The Alice In Wonderland Test, And The ARC-AGI Challenge - CustomGPT
- Mirror, mirror: LLMs and the illusion of humanity - Jodie Burchell - NDC Oslo 2024 - YouTube
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
- Context Rot: How Increasing Input Tokens Impacts LLM Performance - YouTube
- AWS CEO says no more programmers in 2 years - Tech Industry - Blind
- MIT report: 95% of generative AI pilots at companies are failing - Fortune
- Agentic AI Programming with Python - Talk Python To Me Podcast
- Vibe coding through the GPT-5 mess - The Verge
- Overfitting - Wikipedia
- Andrej Karpathy - Busy Person’s Intro to LLMs - YouTube
- AI Isn’t Taking Your Job – The Economy Is - Andrew Stiefel
- Commonwealth Bank backtracks on AI job cuts, apologizes for ‘error’ as call volumes rise - ABC News
- Klarna CEO Reverses Course By Hiring More Humans, Not AI - Entrepreneur
- Has Duolingo Lost Its Streak? - Matt Jones - Medium
- McDonald’s removes AI drive-throughs after order errors
- OpenAI Usage Plummets in the Summer, When Students Aren’t Cheating on Homework
- What Happened When I Tried to Replace Myself with ChatGPT in My English Classroom - Literary Hub
- Learning to code in the age of AI — Sheena O’Connell - YouTube
- Jodie Burchell - The JetBrains Blog
- Jodie Burchell’s Blog - Standard error
- Jodie Burchell (@t-redactyl.bsky.social) — Bluesky
- Jodie Burchell 🇦🇺🇩🇪 (@t_redactyl@fosstodon.org) - Fosstodon
- JetBrains: Essential tools for software developers and teams