February is the shortest month, but it brought no shortage of activity in the Python world! Exciting developments include a new company aiming to improve cloud services for developers, publication of the PyCon US 2023 schedule, and the first release candidate for pandas 2.0.0.
In the world of artificial intelligence, OpenAI has continued to make strides. But while the Big Fix has worked to reduce vulnerabilities for programmers, more malicious programs showed up on PyPI.
Read on to dive into the biggest Python news from the last month.
Join Now: Click here to join the Real Python Newsletter and you'll never miss another Python tutorial, course update, or post.
Pydantic Launches Commercial Venture
With over 40 million downloads per month, pydantic is the most-used data validation library in Python. So when its founder, Samuel Colvin, announces successful seed funding, there’s plenty of reason to believe there’s a game changer in the works.
In his announcement, Colvin compares the current state of cloud services to that of a tractor fifteen years after its invention. In both cases, the technology gets the job done, but without much consideration for the person in the driver’s seat.
Colvin’s new company builds on what pydantic has learned about putting developer experience and expertise first. The exact details of this new venture are still under wraps, but it sets out to answer these questions:
What if we could build a platform with the best of all worlds? Taking the final step in reducing the boilerplate and busy-work of building web applications — allowing developers to write nothing more than the core logic which makes their application unique and valuable? (Source)
This doesn’t mean that the open-source project is going anywhere. In fact, pydantic V2 is on its way and will be around seventeen times faster than V1, thanks to being rewritten in Rust.
To stay up to date on what’s happening with pydantic’s new venture, subscribe to the GitHub issue.
OpenAI Continues to Develop Its Technologies
OpenAI, the company behind DALL·E 2 and ChatGPT, has continued to bring the power of artificial intelligence to programmers.
In February, OpenAI published its tutorials page. Currently, there’s only one tutorial, on using embeddings to answer questions, but more are on the way. As you wait for new tutorials, you can check out the examples gallery and the OpenAI Cookbook on GitHub to get ideas.
Also in February, OpenAI launched ChatGPT Plus for twenty US dollars a month. While free access is still available, the paid version offers use during peak hours, as well as faster response times and priority access to new features and improvements.
If you’re interested in learning more about AI, then check out Real Python’s tutorials on machine learning. You might also enjoy building a chatbot with Python or creating an unbeatable tic-tac-toe player with AI.
PyCon US 2023 Schedule Announced
PyCon US is coming up in April, but you can already register for the in-person or online conference and start planning your learning journey.
The full schedule was released in February, and you may notice some familiar names. In fact, Real Python team member Geir Arne Hjelle is giving a tutorial on Python decorators that you won’t want to miss.
Note: For a sneak peek, check out his Primer on Python Decorators and the associated video course.
This year marks the twentieth anniversary of PyCon US. To celebrate, the conference team is putting together a slideshow, and you’re invited to be part of it. This invitation is open to all attendees, including first-timers, so be sure to contribute!
If you’re planning to attend the conference and want to make sure you have a good experience, then check out How to Get the Most Out of PyCon US.
Malicious PyPI Packages Continue to Appear
The team over at Phylum is staying busy as wrongdoers continue to target programmers, largely through typosquatting and impersonating legitimate packages. Back in August 2022, we reported malware attacks on PyPI, and in November, Phylum reported a separate incident targeting cryptocurrency programs:
After installation, a malicious Javascript file is dropped to the system and executed in the background of any web browsing session. When a developer copies a cryptocurrency address, the address is replaced in the clipboard with the attacker’s address. (Source)
At that time, there were just over two dozen malicious packages. But in early February, Phylum reported a new attack involving over 451 unique packages, largely in cryptocurrency, finance, and web development.
These attacks work similarly to those in November, but automation allowed malicious PyPI users to register several packages almost simultaneously. This is how they were able to target so many packages. For a full list of affected packages, see the Malicious Package List on Phylum’s blog. And to learn more about safety when using PyPI, check out How to Evaluate the Quality of Python Packages.
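Typosquatting depends on a package name that is almost, but not quite, a well-known one. As a rough illustration only (this is not Phylum's detection method), the sketch below uses the standard library's `difflib` to flag a requested name that closely resembles a name on a short allowlist. The list `KNOWN_PACKAGES`, the function name `find_lookalike`, and the 0.85 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist; a real project would use its own dependency list.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "selenium"]


def find_lookalike(name, known=KNOWN_PACKAGES, threshold=0.85):
    """Return the known package that `name` closely resembles, or None.

    An exact match returns None, because the name is then the real package.
    """
    name = name.lower()
    if name in known:
        return None
    # Score the candidate against every known name and keep the best match.
    scored = [(SequenceMatcher(None, name, k).ratio(), k) for k in known]
    best_ratio, best_name = max(scored)
    return best_name if best_ratio >= threshold else None


for candidate in ["pandas", "requessts", "flask"]:
    print(candidate, "->", find_lookalike(candidate))
```

Of the three candidates, only `requessts` is flagged, as a lookalike of `requests`. A similarity check like this can prompt a second look before installing, but it can't tell you whether a package is actually malicious.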
The Big Fix Boosts Software Security
From February 14 to March 14, the Big Fix was on. Its mission was to fix vulnerabilities in open- and closed-source software. The event aimed to fix over 200,000 vulnerabilities, and it did. At the time of writing, the event boasted 275,924 fixes!
Participants could join a Discord group, watch a fix-a-thon live stream, and win prizes. Anyone who fixed at least one security vulnerability was awarded a limited edition Big Fix t-shirt.
While the event is just about over for 2023, be sure to watch for it in 2024. In the meantime, you can learn about creating more secure applications by checking out the learning paths that event sponsor Snyk offers. You can also explore the Open Web Application Security Project (OWASP) playlist that Snyk created for the Big Fix.
Release Candidate for pandas 2.0.0 Announced
If you work with data, you’ve probably pip installed pandas into countless virtual environments. So a new version of pandas is cause for celebration! In late February, pandas maintainers announced a version 2.0.0 release candidate and strongly encouraged developers who rely on pandas to run their test suites with the release candidate and report any breaking changes before the official release.
This new version brings a handful of exciting developments: interchangeable backends, nullable datatypes, and copy-on-write improvements.
Note: To hear from pandas core developer Marc Garcia about the release of pandas 2.0, give The Real Python Podcast Episode 167: Exploring pandas 2.0 & Targets for Apache Arrow a listen.
Traditionally, pandas has stored data in NumPy arrays. Over time, pandas has decoupled more and more from NumPy and can now use Apache Arrow for in-memory data storage instead of NumPy. Some advantages of using Arrow are richer datatypes, better interoperability with other DataFrame libraries, and faster operations. For now, you need to opt in to use Arrow datatypes, for example by setting a global mode:
>>> import pandas as pd
>>> pd.options.mode.dtype_backend = "pyarrow"
Learn more about the Arrow backend in pandas 2.0 and the Arrow revolution.
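Besides a global setting, Arrow-backed data can also be requested for a single column. The sketch below assumes pandas 1.5 or later with the optional `pyarrow` package installed, and it uses the `"int64[pyarrow]"` dtype string. If `pyarrow` is missing, it skips the demonstration rather than failing.

```python
import importlib.util

import pandas as pd

# pyarrow is an optional dependency, so check for it before using Arrow dtypes.
HAS_PYARROW = importlib.util.find_spec("pyarrow") is not None

if HAS_PYARROW:
    # Request Arrow-backed storage for this one Series.
    arrow_values = pd.Series([1, 2, None], dtype="int64[pyarrow]")
    print(arrow_values.dtype)
    print(arrow_values.isna().sum())
else:
    arrow_values = None
    print("pyarrow is not installed, so the Arrow example was skipped")
```

Opting in per column like this lets you try Arrow datatypes on a small part of a project without changing behavior everywhere.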
When you’re working with real-world datasets, you’ll often run into missing data. Previously, missing data posed a challenge if it represented, say, a Boolean or integer value. That’s because only floating-point data had a null value (NaN). Now, you can set use_nullable_dtypes to True to automatically convert values to nullable dtypes:
>>> import pandas as pd
>>> pd.read_csv("numbers.csv")
        name      value
0   thousand     1000.0
1    million  1000000.0
2  bajillion        NaN
>>> pd.read_csv("numbers.csv", use_nullable_dtypes=True)
        name    value
0   thousand     1000
1    million  1000000
2  bajillion     <NA>
The first example shows that the integer column value gets converted to floating-point numbers because of the missing value in the last row. The second example shows how this is handled better with the new data types.
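You can reproduce the same effect without a file on disk. This sketch reads the CSV from an in-memory string whose contents are assumed to match numbers.csv. As a further assumption that differs from the keyword shown above, it requests the nullable integer type explicitly with `dtype={"value": "Int64"}`, which is also available in pandas 1.x.

```python
import io

import pandas as pd

# Assumed contents of numbers.csv, held in memory for the example.
CSV_TEXT = "name,value\nthousand,1000\nmillion,1000000\nbajillion,\n"

# Default parsing: the missing value forces the column to float64.
default_df = pd.read_csv(io.StringIO(CSV_TEXT))

# Explicit nullable integer dtype: the integers stay integers.
nullable_df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"value": "Int64"})

print(default_df["value"].dtype)
print(nullable_df["value"].dtype)
```

The first column reports float64 and the second reports Int64, with the missing entry stored as `<NA>` instead of NaN.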
The final big change is increased support for copy-on-write, or lazy copying, meaning that pandas will only copy an object when it’s modified. Copying an object is memory intensive, yet previous implementations of pandas were inconsistent about when an operation would return a view vs a copy. Lazy copying brings a couple of enhancements:
1) a clear and consistent user API (a clear rule: any subset or returned series/dataframe always behaves as a copy of the original, and thus never modifies the original) and 2) improving performance by avoiding excessive copies (eg a chained method workflow would no longer return an actual data copy at each step). (Source)
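To see the first of those rules in action, here's a minimal sketch. It assumes a pandas version that provides the `mode.copy_on_write` option (pandas 1.5 or later), and the column name and values are made up for illustration.

```python
import pandas as pd

# Opt in to copy-on-write (available as an option since pandas 1.5).
pd.options.mode.copy_on_write = True

df = pd.DataFrame({"value": [1, 2, 3]})

# Selecting a column doesn't copy the data yet; the copy is deferred.
column = df["value"]

# Modifying the subset triggers the copy, so the original stays untouched.
column.iloc[0] = 100

print(df["value"].tolist())
print(column.tolist())
```

With copy-on-write enabled, the DataFrame still holds 1, 2, 3 while the selected column holds 100, 2, 3, so the subset behaves as a copy even though no data was duplicated until the assignment.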
For a full list of updates, check out What’s new in 2.0.0. Note that some of these changes are also partially available in version 1.5 of pandas.
Conclusion
The Python news desk is always brimming with updates! What’s on your radar from this past month? Are you excited to try the new version of pandas or keep up with what’s happening over at pydantic? Are you building something exciting with OpenAI or working to improve your app’s security? Will we see you at PyCon US 2023? Let us know in the comments!