Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Using Python's pathlib Module
Working with files and interacting with the file system are common tasks for Python developers. Some cases may involve only reading or writing files, but sometimes more complex tasks are at hand. Maybe you need to list all files of a given type in a directory, find the parent directory of a given file, or create a unique filename that doesn’t already exist. That’s where pathlib
comes in.
The pathlib
module is part of Python’s standard library, and it helps you deal with all those challenges. It gathers the necessary functionality in one place and makes it available through methods and properties on a convenient Path
object.
In this tutorial, you’ll learn how to:
- Work with file and directory paths in Python
- Instantiate a
Path
object in different ways - Use
pathlib
to read and write files - Carefully copy, move, and delete files
- Manipulate paths and the underlying file system
- Pick out components of a path
You’ll also explore a bunch of code examples in this tutorial, which you can use for your everyday file operations. For example, you’ll dive into counting files, finding the most recently modified file in a directory, and creating unique filenames.
It’s great that pathlib
offers so many methods and properties, but they can be hard to remember on the fly. That’s where a cheat sheet can come in handy. To get yours, click the link below:
Free Download: Click here to claim your pathlib
cheat sheet so you can tame the file system with Python.
The Problem With Representing Paths as Strings
With Python’s pathlib
, you can save yourself some headaches. Its flexible Path
class paves the way for intuitive semantics. Before you have a closer look at the class, take a moment to see how Python developers had to deal with paths before pathlib
was around.
Traditionally, Python has represented file paths using regular text strings. However, since paths are more than plain strings, important functionality was spread all around the standard library, including in libraries like os
, glob
, and shutil
.
As an example, the following code block moves files into a subfolder:
import glob
import os
import shutil
for file_name in glob.glob("*.txt"):
new_path = os.path.join("archive", file_name)
shutil.move(file_name, new_path)
You need three import
statements in order to move all the text files to an archive directory.
Python’s pathlib
provides a Path
class that works the same way on different operating systems.
Instead of importing different modules such as glob
, os
, and shutil
, you can perform the same tasks by using pathlib
alone:
from pathlib import Path
for file_path in Path.cwd().glob("*.txt"):
new_path = Path("archive") / file_path.name
file_path.replace(new_path)
Just as in the first example, this code finds all the text files in the current directory and moves them to an archive/
subdirectory.
However, with pathlib
, you accomplish these tasks with fewer import
statements and more straightforward syntax, which you’ll explore in depth in the upcoming sections.
Path Instantiation With Python’s pathlib
One motivation behind pathlib
is to represent the file system with dedicated objects instead of strings. Fittingly, the official documentation of pathlib
is called pathlib
— Object-oriented filesystem paths.
The object-oriented approach is already quite visible when you contrast the pathlib
syntax with the old os.path
way of doing things. It gets even more obvious when you note that the heart of pathlib
is the Path
class:
If you’ve never used this module before or just aren’t sure which class is right for your task,
Path
is most likely what you need. (Source)
In fact, Path
is so frequently used that you usually import it directly:
>>> from pathlib import Path
>>> Path
<class 'pathlib.Path'>
Because you’ll mainly be working with the Path
class of pathlib
, this way of importing Path
saves you a few keystrokes in your code. This way, you can work with Path
directly, rather than importing pathlib
as a module and referring to pathlib.Path
.
There are a few different ways of instantiating a Path
object. In this section, you’ll explore how to create paths by using class methods, passing in strings, or joining path components.
Using Path Methods
Once you’ve imported Path
, you can make use of existing methods to get the current working directory or your user’s home directory.
The current working directory is the directory in the file system that the current process is operating in. You’ll need to programmatically determine the current working directory if, for example, you want to create or open a file in the same directory as the script that’s being executed.
Additionally, it’s useful to know your user’s home directory when working with files. Using the home directory as a starting point, you can specify paths that’ll work on different machines, independent of any specific usernames.
To get your current working directory, you can use .cwd()
:
When you instantiate pathlib.Path
, you get either a WindowsPath
or a PosixPath
object.
The kind of object will depend on which operating system you’re using.
On Windows, .cwd()
returns a WindowsPath
. On Linux and macOS, you get a PosixPath
.
Despite the differences under the hood, these objects provide identical interfaces for you to work with.
It’s possible to ask for a WindowsPath
or a PosixPath
explicitly, but you’ll only be limiting your code to that system without gaining any benefits. A concrete path like this won’t work on a different system:
>>> import pathlib
>>> pathlib.WindowsPath("test.md")
Traceback (most recent call last):
...
NotImplementedError: cannot instantiate 'WindowsPath' on your system
But what if you want to manipulate Unix paths on a Windows machine, or vice versa? In that case, you can directly instantiate PureWindowsPath
or PurePosixPath
on any system.
When you make a path like this, you create a PurePath
object under the hood. You can use such an object if you need a representation of a path without access to the underlying file system.
Generally, it’s a good idea to use Path
. With Path
, you instantiate a concrete path for the platform that you’re using while also keeping your code platform-independent. Concrete paths allow you to do system calls on path objects, but pure paths only allow you to manipulate paths without accessing the operating system.
Working with platform-independent paths means that you can write a script on Windows that uses Path.cwd()
, and it’ll work correctly when you run the file on macOS or Linux.
The same is true for .home()
:
With Path.cwd()
and Path.home()
, you can conveniently get a starting point for your Python scripts.
In cases where you need to spell paths out or reference a subdirectory structure, you can instantiate Path
with a string.
Passing in a String
Instead of starting in your user’s home directory or your current working directory, you can point to a directory or file directly by passing its string representation into Path
:
This process creates a Path
object. Instead of having to deal with a string, you can now work with the flexibility that pathlib
offers.
On Windows, the path separator is a backslash (\
). However, in many contexts, the backslash is also used as an escape character to represent non-printable characters. To avoid problems, use raw string literals to represent Windows paths:
>>> r"C:\Users"
'C:\\Users'
A string with an r
in front of it is a raw string literal. In raw string literals, the \
represents a literal backslash. In a normal string, you’d need to use two backslashes (\\
) to indicate that you want to use the backslash literally and not as an escape character.
Note: An idiomatic way of working with the current module’s location as the path is using __file__
:
# hello.py
from pathlib import Path
print(f"You can find me here: {Path(__file__).parent}!")
The __file__
attribute contains the path to the file that Python is currently importing or executing. You can pass in __file__
to Path
when you need to work with the path to the module itself. For example, maybe you want to get the parent directory with .parent
.
You may have already noticed that although you enter paths on Windows with backslashes, pathlib
represents them with the forward slash (/
) as the path separator. This representation is named POSIX style.
POSIX stands for Portable Operating System Interface, which is a standard for maintaining the compability between operating systems. The standard covers much more than path representation. You can learn more about it in Open Group Base Specifications Issue 7.
Still, when you convert a path back to a string, it’ll use the native form—for example, with backslashes on Windows:
>>> from pathlib import Path
>>> str(Path(r"C:\Users\gahjelle\realpython\file.txt"))
'C:\\Users\\gahjelle\\realpython\\file.txt'
In general, you should try to use Path
objects as much as possible in your code to take advantage of their benefits, but converting them to strings can be necessary in certain contexts. Some libraries and APIs still expect you to pass file paths as strings, so you may need to convert a Path
object to a string before passing it to certain functions.
Joining Paths
A third way to construct a path is to join the parts of the path using the special forward slash operator (/
), which is possibly the most unusual part of the pathlib
library. You may have already raised your eyebrows about it in the example at the beginning of this tutorial:
from pathlib import Path
for file_path in Path.cwd().glob("*.txt"):
new_path = Path("archive") / file_path.name
file_path.rename(new_path)
The forward slash operator can join several paths or a mix of paths and strings as long as you include one Path
object.
You use a forward slash regardless of your platform’s actual path separator.
If you don’t like the special slash notation, then you can do the same operation with the .joinpath()
method:
>>> from pathlib import Path
>>> Path.home().joinpath("python", "scripts", "test.py")
PosixPath('/home/gahjelle/python/scripts/test.py')
This notation is closer to os.path.join()
, which you may have used in the past. It can feel more familiar than a forward slash if you’re used to backslashed paths.
After you’ve instantiated Path
, you probably want to do something with your path. For example, maybe you’re aiming to perform file operations or pick parts from the path. That’s what you’ll do next.
File System Operations With Paths
You can perform a bunch of handy operations on your file system using pathlib
.
In this section, you’ll get a broad overview of some of the most common ones. But before you start performing file operations, have a look at the parts of a path first.
Picking Out Components of a Path
A file or directory path consists of different parts. When you use pathlib
, these parts are conveniently available as properties. Basic examples include:
.name
: The filename without any directory.stem
: The filename without the file extension.suffix
: The file extension.anchor
: The part of the path before the directories.parent
: The directory containing the file, or the parent directory if the path is a directory
Here, you can observe these properties in action:
Note that .parent
returns a new Path
object, whereas the other properties return strings. This means, for instance, that you can chain .parent
in the last example or even combine it with the slash operator to create completely new paths:
>>> path.parent.parent / f"new{path.suffix}"
PosixPath('/home/gahjelle/new.md')
That’s quite a few properties to keep straight. If you want a handy reference for these Path
properties, then you can download the Real Python pathlib
cheat sheet by clicking the link below:
Free Download: Click here to claim your pathlib
cheat sheet so you can tame the file system with Python.
Reading and Writing Files
Consider that you want to print all the items on a shopping list that you wrote down in a Markdown file. The content of shopping_list.md
looks like this:
<!-- shopping_list.md -->
# Shopping List
## Fruit
* Banana
* Apple
* Peach
## Candy
* Chocolate
* Nougat Bits
Traditionally, the way to read or write a file in Python has been to use the built-in open()
function.
With pathlib
, you can use open()
directly on Path
objects.
So, a first draft of your script that finds all the items in shopping_list.md
and prints them may look like this:
# read_shopping_list.py
from pathlib import Path
path = Path.cwd() / "shopping_list.md"
with path.open(mode="r", encoding="utf-8") as md_file:
content = md_file.read()
groceries = [line for line in content.splitlines() if line.startswith("*")]
print("\n".join(groceries))
In fact, Path.open()
is calling the built-in open()
function behind the scenes. That’s why you can use parameters like mode
and encoding
with Path.open()
.
On top of that, pathlib
offers some convenient methods to read and write files:
.read_text()
opens the path in text mode and returns the contents as a string..read_bytes()
opens the path in binary mode and returns the contents as a byte string..write_text()
opens the path and writes string data to it..write_bytes()
opens the path in binary mode and writes data to it.
Each of these methods handles the opening and closing of the file. Therefore, you can update read_shopping_list.py
using .read_text()
:
# read_shopping_list.py
from pathlib import Path
path = Path.cwd() / "shopping_list.md"
content = path.read_text(encoding="utf-8")
groceries = [line for line in content.splitlines() if line.startswith("*")]
print("\n".join(groceries))
You can also specify paths directly as filenames, in which case they’re interpreted relative to the current working directory. So you can condense the example above even more:
# read_shopping_list.py
from pathlib import Path
content = Path("shopping_list.md").read_text(encoding="utf-8")
groceries = [line for line in content.splitlines() if line.startswith("*")]
print("\n".join(groceries))
If you want to create a plain shopping list that only contains the groceries, then you can use .write_text()
in a similar fashion:
# write_plain_shoppinglist.py
from pathlib import Path
content = Path("shopping_list.md").read_text(encoding="utf-8")
groceries = [line for line in content.splitlines() if line.startswith("*")]
Path("plain_list.md").write_text("\n".join(groceries), encoding="utf-8")
When using .write_text()
, Python overwrites any existing files on the same path without giving you any notice. That means you could erase all your hard work with a single keystroke!
As always, when you write files with Python, you should be cautious of what your code is doing. The same is true when you’re renaming files.
Renaming Files
When you want to rename files, you can use .with_stem()
, .with_suffix()
, or .with_name()
. They return the original path but with the filename, the file extension, or both replaced.
If you want to change a file’s extension, then you can use .with_suffix()
in combination with .replace()
:
>>> from pathlib import Path
>>> txt_path = Path("/home/gahjelle/realpython/hello.txt")
>>> txt_path
PosixPath("/home/gahjelle/realpython/hello.txt")
>>> md_path = txt_path.with_suffix(".md")
PosixPath('/home/gahjelle/realpython/hello.md')
>>> txt_path.replace(md_path)
Using .with_suffix()
returns a new path. To actually rename the file, you use .replace()
. This moves txt_path
to md_path
and renames it when saving.
If you want to change the complete filename, including the extension, then you can use .with_name()
:
>>> from pathlib import Path
>>> txt_path = Path("/home/gahjelle/realpython/hello.txt")
>>> txt_path
PosixPath("/home/gahjelle/realpython/hello.txt")
>>> md_path = txt_path.with_name("goodbye.md")
PosixPath('/home/gahjelle/realpython/goodbye.md')
>>> txt_path.replace(md_path)
The code above renames hello.txt
to goodbye.md
.
If you want to rename the filename only, keeping the suffix as it is, then you can use .with_stem()
. You’ll explore this method in the next section.
Copying Files
Surprisingly, Path
doesn’t have a method to copy files. But with the knowledge that you’ve gained about pathlib
so far, you can create the same functionality with a few lines of code:
>>> from pathlib import Path
>>> source = Path("shopping_list.md")
>>> destination = source.with_stem("shopping_list_02")
>>> destination.write_bytes(source.read_bytes())
You’re using .with_stem()
to create the new filename without changing the extension.
The actual copying takes place in the highlighted line, where you use .read_bytes()
to read the content of source
and then write this content to destination
using .write_bytes()
.
While it’s tempting to use pathlib
for everything path related, you may also consider using shutil
for copying files. It’s a great alternative that also knows how to work with Path
objects.
Moving and Deleting Files
Through pathlib
, you also have access to basic file system–level operations like moving, updating, and even deleting files. For the most part, these methods don’t give a warning or wait for confirmation before getting rid of information or files. So, be careful when using these methods.
To move a file, you can use .replace()
. Note that if the destination already exists, then .replace()
will overwrite it. To avoid possibly overwriting the destination path, you can test whether the destination exists before replacing:
from pathlib import Path
source = Path("hello.py")
destination = Path("goodbye.py")
if not destination.exists():
source.replace(destination)
However, this does leave the door open for a possible race condition. Another process may add a file at the destination
path between the execution of the if
statement and the .replace()
method. If that’s a concern, then a safer way is to open the destination path for exclusive creation then explicitly copy the source data and delete the source file afterward:
from pathlib import Path
source = Path("hello.py")
destination = Path("goodbye.py")
try:
with destination.open(mode="xb") as file:
file.write(source.read_bytes())
except FileExistsError:
print(f"File {destination} exists already.")
else:
source.unlink()
If destination
already exists, then the code above catches a FileExistsError
and prints a warning. To perform a move, you need to delete source
with .unlink()
after the copy is done. Using else
ensures that the source file isn’t deleted if the copying fails.
Creating Empty Files
To create an empty file with pathlib
, you can use .touch()
.
This method is intended to update a file’s modification time, but you can use its side effect to create a new file:
>>> from pathlib import Path
>>> filename = Path("hello.txt")
>>> filename.exists()
False
>>> filename.touch()
>>> filename.exists()
True
>>> filename.touch()
In the example above, you instantiate a Path
object and create the file using .touch()
. You use .exists()
both to verify that the file didn’t exist before and then to check that it was successfully created.
If you use .touch()
again, then it updates the file’s modification time.
If you don’t want to modify files accidentally, then you can use the exist_ok
parameter and set it to False
:
>>> filename.touch(exist_ok=False)
Traceback (most recent call last):
...
FileExistsError: [Errno 17] File exists: 'hello.txt'
When you use .touch()
on a file path that doesn’t exist, you create a file without any content.
Creating an empty file with Path.touch()
can be useful when you want to reserve a filename for later use, but you don’t have any content to write to it yet. For example, you may want to create an empty file to ensure that a certain filename is available, even if you don’t have content to write to it at the moment.
Python pathlib Examples
In this section, you’ll see some examples of how to use pathlib
to deal with everyday challenges that you’re facing as a Python developer.
You can use these examples as starting points for your own code or save them as code snippets for later reference.
Counting Files
There are a few different ways to get a list of all the files in a directory with Python. With pathlib
, you can conveniently use the .iterdir()
method, which iterates over all the files in the given directory. In the following example, you combine .iterdir()
with the collections.Counter
class to count how many files of each file type are in the current directory:
>>> from pathlib import Path
>>> from collections import Counter
>>> Counter(path.suffix for path in Path.cwd().iterdir())
Counter({'.md': 2, '.txt': 4, '.pdf': 2, '.py': 1})
You can create more flexible file listings with the methods .glob()
and .rglob()
. For example, Path.cwd().glob("*.txt")
returns all the files with a .txt
suffix in the current directory. In the following, you only count file extensions starting with p
:
>>> Counter(path.suffix for path in Path.cwd().glob("*.p*"))
Counter({'.pdf': 2, '.py': 1})
If you want to recursively find all the files in both the directory and its subdirectories, then you can use .rglob()
. This method also offers a cool way to display a directory tree, which is the next example.
Displaying a Directory Tree
In this example, you define a function named tree()
, which will print a visual tree representing the file hierarchy, rooted at a given directory. This is useful when, for example, you want to peek into the subdirectories of a project.
To traverse the subdirectories as well, you use the .rglob()
method:
# display_dir_tree.py
def tree(directory):
print(f"+ {directory}")
for path in sorted(directory.rglob("*")):
depth = len(path.relative_to(directory).parts)
spacer = " " * depth
print(f"{spacer}+ {path.name}")
Note that you need to know how far away from the root directory a file is located. To do this, you first use .relative_to()
to represent a path relative to the root directory. Then, you use the .parts
property to count the number of directories in the representation. When run, this function creates a visual tree like the following:
>>> from pathlib import Path
>>> from display_dir_tree import tree
>>> tree(Path.cwd())
+ /home/gahjelle/realpython
+ directory_1
+ file_a.md
+ directory_2
+ file_a.md
+ file_b.pdf
+ file_c.py
+ file_1.txt
+ file_2.txt
If you want to push this code to the next level, then you can try building a directory tree generator for the command line.
Finding the Most Recently Modified File
The .iterdir()
, .glob()
, and .rglob()
methods are great fits for generator expressions and list comprehensions. To find the most recently modified file in a directory, you can use the .stat()
method to get information about the underlying files. For instance, .stat().st_mtime
gives the time of last modification of a file:
>>> from pathlib import Path
>>> from datetime import datetime
>>> directory = Path.cwd()
>>> time, file_path = max((f.stat().st_mtime, f) for f in directory.iterdir())
>>> print(datetime.fromtimestamp(time), file_path)
2023-03-28 19:23:56.977817 /home/gahjelle/realpython/test001.txt
The timestamp returned from a property like .stat().st_mtime
represents seconds since January 1, 1970, also known as the epoch. If you’d prefer a different format, then you can use time.localtime
or time.ctime
to convert the timestamp to something more usable. If this example has sparked your curiosity, then you may want learn more about how to get and use the current time in Python.
Creating a Unique Filename
In the last example, you’ll construct a unique numbered filename based on a template string. This can be handy when you don’t want to overwrite an existing file if it already exists:
# unique_path.py
def unique_path(directory, name_pattern):
counter = 0
while True:
counter += 1
path = directory / name_pattern.format(counter)
if not path.exists():
return path
In unique_path()
, you specify a pattern for the filename, with room for a counter. Then, you check the existence of the file path created by joining a directory and the filename, including a value for the counter. If it already exists, then you increase the counter and try again.
Now you can use the script above to get unique filenames:
>>> from pathlib import Path
>>> from unique_path import unique_path
>>> template = "test{:03d}.txt"
>>> unique_path(Path.cwd(), template)
PosixPath("/home/gahjelle/realpython/test003.txt")
If the directory already contains the files test001.txt
and test002.txt
, then the above code will set path
to test003.txt
.
Conclusion
Python’s pathlib
module provides a modern and Pythonic way of working with file paths, making code more readable and maintainable.
With pathlib
, you can represent file paths with dedicated Path
objects instead of plain strings.
In this tutorial, you’ve learned how to:
- Work with file and directory paths in Python
- Instantiate a
Path
object in different ways - Use
pathlib
to read and write files - Carefully copy, move, and delete files
- Manipulate paths and the underlying file system
- Pick out components of a path
The pathlib
module makes dealing with file paths convenient by providing helpful methods and properties.
Peculiarities of the different systems are hidden by the Path
object, which makes your code more consistent across operating systems.
If you want to get an overview PDF of the handy methods and properties that pathlib
offers, then you can click the link below:
Free Download: Click here to claim your pathlib
cheat sheet so you can tame the file system with Python.
Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Using Python's pathlib Module