Stop Wrestling with os.path: A Practical Deep Dive into Python's pathlib

Learn how Python's pathlib turns clumsy string-based file handling into clean, readable, object-oriented code. A hands-on tour of building paths, reading and writing files, globbing, and the pitfalls to avoid.

Anton Jacker

Jun 16, 2026

If your file-handling code is still a tangle of os.path.join, os.path.dirname, and string slicing to pull out an extension, you are working harder than Python asks you to. Since version 3.4, the standard library has shipped pathlib: an object-oriented way to represent filesystem paths that is shorter to write, easier to read, and far less error-prone across operating systems. This guide walks through the parts of pathlib you will actually reach for every day, with runnable examples and the pitfalls that trip people up.

Why pathlib instead of os.path?

The classic os.path module treats paths as plain strings. That works, but it pushes all the structure into your head: you have to remember which function returns the directory, which returns the extension, and how to glue segments together safely. pathlib flips this around by making a path a first-class object that knows about its own parts. Compare the two styles:

# The old way
import os
config = os.path.join(os.path.dirname(__file__), "config", "settings.ini")
name = os.path.basename(config)
ext = os.path.splitext(config)[1]

# The pathlib way
from pathlib import Path
config = Path(__file__).parent / "config" / "settings.ini"
name = config.name
ext = config.suffix

The star of the show is the / operator. Instead of calling a function to join segments, you divide one path by the next. It reads naturally, and pathlib inserts the correct separator for the current platform, so the same code produces valid paths on Linux, macOS, and Windows.
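
To see the separator handling without switching machines, here is a minimal sketch that uses the pure path classes covered later in this guide. The app/config/settings.ini layout is an assumption chosen purely for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure path classes let us preview each platform's separator on any machine.
posix_cfg = PurePosixPath("app") / "config" / "settings.ini"
windows_cfg = PureWindowsPath("app") / "config" / "settings.ini"

print(posix_cfg)    # app/config/settings.ini
print(windows_cfg)  # app\config\settings.ini

# joinpath() is the method form of the / operator.
same_cfg = PurePosixPath("app").joinpath("config", "settings.ini")
print(same_cfg == posix_cfg)  # True
```

With a concrete Path, the flavour is chosen automatically to match the machine the code runs on.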

Constructing and inspecting paths

A Path object exposes its components as simple attributes. Once you internalise these names you rarely need to slice a string again:

from pathlib import Path

p = Path("/home/anton/projects/report.tar.gz")

print(p.name)      # report.tar.gz   (final component)
print(p.stem)      # report.tar      (name without final suffix)
print(p.suffix)    # .gz             (last extension)
print(p.suffixes)  # ['.tar', '.gz'] (all extensions)
print(p.parent)    # /home/anton/projects
print(p.parts)     # ('/', 'home', 'anton', 'projects', 'report.tar.gz')

You can also derive new paths from an existing one without manual string surgery. These methods return brand-new Path objects and never mutate the original:

p = Path("/data/report_v1.csv")

print(p.with_suffix(".parquet"))  # /data/report_v1.parquet
print(p.with_name("summary.csv")) # /data/summary.csv
print(p.with_stem("report_v2"))   # /data/report_v2.csv   (Python 3.9+)
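
These methods also compose. The sketch below, one possible approach rather than a built-in feature, strips every extension from a multi-suffix name and builds a backup name next to the original. The helper strip_all_suffixes is hypothetical, and PurePosixPath is used only so the printed output is the same on every platform.

```python
from pathlib import PurePosixPath


def strip_all_suffixes(path):
    """Return path with every extension removed (hypothetical helper)."""
    # with_suffix("") drops only the last suffix, so repeat until none remain.
    while path.suffix:
        path = path.with_suffix("")
    return path


archive = PurePosixPath("/home/anton/projects/report.tar.gz")

bare = strip_all_suffixes(archive)
backup = archive.with_name(archive.name + ".bak")

print(bare)     # /home/anton/projects/report
print(backup)   # /home/anton/projects/report.tar.gz.bak
print(archive)  # /home/anton/projects/report.tar.gz (unchanged)
```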

Reading and writing files the short way

For everyday text and binary I/O, pathlib offers convenience methods that open, read or write, and close the file in a single call. They are perfect for small-to-medium files where you do not need streaming:

from pathlib import Path

notes = Path("notes.txt")
notes.write_text("hello\nworld\n", encoding="utf-8")

content = notes.read_text(encoding="utf-8")
print(content.splitlines())   # ['hello', 'world']

# Binary equivalents
Path("blob.bin").write_bytes(b"\x00\x01\x02")
data = Path("blob.bin").read_bytes()

Always pass encoding="utf-8" explicitly. If you omit it, Python generally falls back to the locale's preferred encoding, which can differ between machines (UTF-8 on most Linux systems, often a legacy code page on Windows) and lead to subtle decoding bugs that only appear in production. For larger files, fall back to Path.open(), which behaves like the built-in open() but starts from a path object and works well as a context manager:

line_count = 0
with Path("big.log").open(encoding="utf-8") as f:
    for line in f:
        line_count += 1
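
Path.open() takes the same mode argument as the built-in open(), so appending works too. The following sketch assumes a scratch file named app.log inside a temporary directory (both are illustrative choices): it appends a few lines, then streams them back to count errors.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "app.log"  # hypothetical file name

    # Mode "a" appends; each with-block closes the file on exit.
    for message in ("start", "ERROR disk full", "stop"):
        with log.open("a", encoding="utf-8") as f:
            f.write(message + "\n")

    # Stream line by line instead of loading the whole file into memory.
    with log.open(encoding="utf-8") as f:
        error_count = sum(1 for line in f if line.startswith("ERROR"))

print(error_count)  # 1
```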

Creating directories and cleaning up

Two flags make directory creation painless. parents=True builds any missing intermediate folders (like mkdir -p), and exist_ok=True stops pathlib from raising if the folder already exists:

from pathlib import Path

out = Path("build/artifacts/2026")
out.mkdir(parents=True, exist_ok=True)

# Remove a single file
(out / "temp.txt").write_text("scratch", encoding="utf-8")
(out / "temp.txt").unlink()

# Remove an empty directory
(out / "empty").mkdir(exist_ok=True)
(out / "empty").rmdir()

Note that unlink() only removes files and symbolic links, and rmdir() only removes empty directories. To delete a folder and everything inside it, reach for shutil.rmtree(); the pathlib methods shown here have no recursive counterpart, which keeps that destructive step explicit. In Python 3.8+ you can also use unlink(missing_ok=True) to avoid an error when the target is already gone.
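
Here is a small sketch of those rules in action. It works inside a scratch directory created with tempfile so nothing real is deleted; the build/artifacts layout and the file names are assumptions for the demo.

```python
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # scratch directory for the demo
tree = root / "build" / "artifacts"
tree.mkdir(parents=True, exist_ok=True)
(tree / "temp.txt").write_text("scratch", encoding="utf-8")

try:
    (root / "build").rmdir()  # refused: the directory is not empty
except OSError as exc:
    print("rmdir refused:", type(exc).__name__)

shutil.rmtree(root / "build")  # removes the folder and everything in it
build_exists = (root / "build").exists()
print(build_exists)  # False

(root / "gone.txt").unlink(missing_ok=True)  # no error (Python 3.8+)
root.rmdir()  # root is empty now, so rmdir() succeeds
```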

Finding files with glob and rglob

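This is where pathlib really shines. glob() matches a pattern relative to the directory you call it on, so a plain pattern such as *.py covers only that directory, while rglob() recurses into every subdirectory (as if the pattern began with **/). Both return generators, so they stay memory-efficient even over large trees: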

from pathlib import Path

project = Path("src")

# Every Python file directly inside src/
for py in project.glob("*.py"):
    print(py.name)

# Every Python file anywhere under src/
for py in project.rglob("*.py"):
    print(py)

# Combine with comprehensions for quick filtering
big_files = [f for f in project.rglob("*") if f.is_file() and f.stat().st_size > 1_000_000]

Because each result is a Path, you can immediately chain other operations — read it, check its size with f.stat().st_size, or move it — without converting types. To list a directory without a pattern, use iterdir(), which yields each child entry once.
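
As a rough illustration of iterdir() next to rglob(), the sketch below builds a tiny tree in a temporary directory (the file and folder names are made up) and lists it both ways. The results are sorted because neither method guarantees an order.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "pkg").mkdir()
    (base / "pkg" / "mod.py").write_text("", encoding="utf-8")
    (base / "main.py").write_text("", encoding="utf-8")
    (base / "README.md").write_text("", encoding="utf-8")

    # iterdir() yields direct children only, files and directories alike.
    children = sorted(p.name for p in base.iterdir())
    subdirs = sorted(p.name for p in base.iterdir() if p.is_dir())

    # rglob() recurses; relative_to() trims the temporary prefix.
    py_files = sorted(p.relative_to(base).as_posix() for p in base.rglob("*.py"))

print(children)  # ['README.md', 'main.py', 'pkg']
print(subdirs)   # ['pkg']
print(py_files)  # ['main.py', 'pkg/mod.py']
```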

Resolving, comparing, and relative paths

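When you need an absolute, symlink-free canonical path, call resolve(). To express one path relative to another, use relative_to(), and to test containment, use is_relative_to() (Python 3.9+). Both of those compare path components as written and never consult the filesystem, so call resolve() first whenever .. segments or symlinks might be involved: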

from pathlib import Path

messy = Path("src/../src/app/./main.py")
print(messy.resolve())   # e.g. /abs/path/src/app/main.py (depends on the working directory)

full = Path("/var/www/site/static/logo.png")
print(full.relative_to("/var/www/site"))  # static/logo.png
print(full.is_relative_to("/var/www"))    # True
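
One caveat is worth a sketch: is_relative_to() does not collapse .. segments on its own. The helper is_inside below is hypothetical, and it shows one way to resolve both sides before comparing; the site directory and file names are assumptions for the demo.

```python
import tempfile
from pathlib import Path


def is_inside(base, candidate):
    """Hypothetical helper: True if candidate resolves to a location under base."""
    return candidate.resolve().is_relative_to(base.resolve())


with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp) / "site"
    site.mkdir()

    honest = site / "static" / "logo.png"
    sneaky = site / ".." / "secret.txt"

    lexical = sneaky.is_relative_to(site)  # ".." is compared as written
    honest_ok = is_inside(site, honest)
    sneaky_ok = is_inside(site, sneaky)

print(lexical)    # True
print(honest_ok)  # True
print(sneaky_ok)  # False
```

resolve() does not require the target to exist by default, so the check works for paths that have not been created yet.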

Pure paths: manipulation without touching the disk

Sometimes you want to reason about paths that belong to a different operating system, or simply manipulate strings without any filesystem access — for example in tests. PurePosixPath and PureWindowsPath give you all the parsing and joining logic with none of the I/O methods:

from pathlib import PurePosixPath

p = PurePosixPath("/etc/nginx/nginx.conf")
print(p.parent)          # /etc/nginx
print(p.match("*.conf")) # True
print(p.parts)           # ('/', 'etc', 'nginx', 'nginx.conf')
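
The Windows flavour works the same way on any machine. A short sketch, using a made-up C:\Users\anton path, shows the drive handling and the case-insensitive comparison that PureWindowsPath applies:

```python
from pathlib import PureWindowsPath

w = PureWindowsPath(r"C:\Users\anton\report.TXT")

print(w.drive)       # C:
print(w.parts)       # ('C:\\', 'Users', 'anton', 'report.TXT')
print(w.as_posix())  # C:/Users/anton/report.TXT

# Windows paths compare case-insensitively, and / is accepted as a separator.
same = w == PureWindowsPath("c:/users/ANTON/report.txt")
print(same)          # True
```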

Common pitfalls

A few gotchas catch newcomers. First, many libraries historically expected strings, not Path objects; modern Python (3.6+) accepts Path almost everywhere through the os.PathLike protocol, but if you hit a stubborn third-party API, wrap the path in str(). Second, the / operator needs a Path on at least one side: "folder" / "file" raises a TypeError because both operands are plain strings, whereas Path("folder") / "file" and "folder" / Path("file") both work, so the simplest habit is to start the chain with Path(...). Third, remember that Path objects are immutable: p.with_suffix(".txt") returns a new object and leaves p unchanged, so assign the result. Finally, do not forget the explicit encoding on text I/O, as mentioned above.
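
The first three gotchas can be reproduced in a few lines. This is a minimal sketch with an invented data/report.csv path; PurePosixPath is used only to keep the output identical across platforms.

```python
import os
from pathlib import PurePosixPath

p = PurePosixPath("data") / "report.csv"

# Pitfall 1: hand a plain string to APIs that reject path objects.
as_text = str(p)          # 'data/report.csv'
also_text = os.fspath(p)  # same string via the os.PathLike protocol

# Pitfall 2: two plain strings cannot be joined with the / operator.
try:
    "data" / "report.csv"
except TypeError:
    joined_ok = False
else:
    joined_ok = True

# Pitfall 3: methods return new objects, so keep the result.
p.with_suffix(".txt")            # result discarded, p is unchanged
renamed = p.with_suffix(".txt")  # assigned, so the new path is kept

print(as_text)    # data/report.csv
print(joined_ok)  # False
print(p)          # data/report.csv
print(renamed)    # data/report.txt
```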

Wrap-up and next steps

The pathlib module replaces a scattered set of string functions with a single, coherent object that knows how to join, split, search, read, and write. Adopting it tends to make file-handling code noticeably shorter and easier to follow, while quietly fixing cross-platform separator bugs along the way. A good next step is to refactor one real script: swap every os.path.join for the / operator, replace manual open()/read()/close() blocks with read_text(), and convert directory walks to rglob(). From there, explore the related standard-library tools — shutil for copying and recursive deletion, tempfile for scratch directories, and os.scandir when you need raw performance. Once pathlib is in your fingers, you will wonder how you ever managed paths as bare strings.