How to Get the First Match From a Python List or Iterable

by Ian Currie intermediate

Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Getting the First Match From a Python List or Iterable

At some point in your Python journey, you may need to find the first item that matches a certain criterion in a Python iterable, such as a list or dictionary.

The simplest case is that you need to confirm that a particular item exists in the iterable. For example, you want to find a name in a list of names or a substring inside a string. In these cases, you’re best off using the in operator. However, there are many use cases when you may want to look for items with specific properties. For instance, you may need to:

  • Find a non-zero value in a list of numbers
  • Find a name of a particular length in a list of strings
  • Find and modify a dictionary in a list of dictionaries based on a certain attribute

This tutorial will cover how best to approach all three scenarios. One option is to transform your whole iterable to a new list and then use .index() to find the first item matching your criterion:

Python
>>> names = ["Linda", "Tiffany", "Florina", "Jovann"]
>>> length_of_names = [len(name) for name in names]
>>> idx = length_of_names.index(7)
>>> names[idx]
'Tiffany'

Here, you’ve used .index() to find that "Tiffany" is the first name in your list with seven characters. This solution isn’t great, partly because you calculate the criterion for all elements, even if the first item is a match.
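To make the wasted work visible, here’s a minimal sketch that records every name whose length gets computed. The counted_len() helper and the call_log list are hypothetical names introduced only for this illustration:

```python
names = ["Linda", "Tiffany", "Florina", "Jovann"]
call_log = []

def counted_len(name):
    """Return len(name) and record that the length was computed."""
    call_log.append(name)
    return len(name)

# Transforming the whole list computes the length of every name.
length_of_names = [counted_len(name) for name in names]
idx = length_of_names.index(7)
eager_calls = len(call_log)

# Stopping at the first match only computes what's needed.
call_log.clear()
first_match = None
for name in names:
    if counted_len(name) == 7:
        first_match = name
        break
lazy_calls = len(call_log)

print(names[idx], eager_calls)  # Tiffany 4
print(first_match, lazy_calls)  # Tiffany 2
```

Note also that .index() raises a ValueError when no item matches, so the list-based approach needs extra handling for the no-match case.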

In the above situations, you’re searching for a calculated property of the items you’re iterating over. In this tutorial, you’ll learn how to match such a derived attribute without needing to do unnecessary calculations.

How to Get the First Matching Item in a Python List

You may already know about Python’s in operator, which can tell you if an item is in an iterable. While this is the most efficient method that you can use for this purpose, sometimes you may need to match based on a calculated property of the items, like their lengths.

For example, you might be working with a list of dictionaries, typical of what you might get when processing JSON data. Check out this data that was obtained from country-json:

Python
>>> countries = [
...     {"country": "Austria", "population": 8_840_521},
...     {"country": "Canada", "population": 37_057_765},
...     {"country": "Cuba", "population": 11_338_138},
...     {"country": "Dominican Republic", "population": 10_627_165},
...     {"country": "Germany", "population": 82_905_782},
...     {"country": "Norway", "population": 5_311_916},
...     {"country": "Philippines", "population": 106_651_922},
...     {"country": "Poland", "population": 37_974_750},
...     {"country": "Scotland", "population": 5_424_800},
...     {"country": "United States", "population": 326_687_501},
... ]

You might want to grab the first dictionary that has a population of over one hundred million. The in operator isn’t a great choice for two reasons. One, you’d need to have the full dictionary to match it, and two, it wouldn’t return the actual object but a Boolean value:

Python
>>> target_country = {"country": "Philippines", "population": 106_651_922}
>>> target_country in countries
True

There’s no way to use in if you need to find the dictionary based on an attribute of the dictionary, such as population.

The most readable way to find and manipulate the first element in the list based on a calculated value is to use a humble for loop:

Python
>>> for country in countries:
...     if country["population"] > 100_000_000:
...         print(country)
...         break
...
{'country': 'Philippines', 'population': 106651922}

Instead of printing the target object, you can do anything you like with it in the for loop body. After you’re done, be sure to break the for loop so that you don’t needlessly search the rest of the list.
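As a small illustration of doing something other than printing, the sketch below updates the first matching dictionary in place and uses the else clause of the for loop to handle the case where nothing matches. The shortened countries list and the added "large" key are assumptions made for this example:

```python
countries = [
    {"country": "Norway", "population": 5_311_916},
    {"country": "Philippines", "population": 106_651_922},
    {"country": "United States", "population": 326_687_501},
]

for country in countries:
    if country["population"] > 100_000_000:
        # The loop variable refers to the same object that's in the list,
        # so this change is visible through countries as well.
        country["large"] = True
        break
else:
    # Runs only if the loop finished without hitting break.
    print("No match found")

print(countries[1])
print(countries[2])
```

Only the first match is modified. Because of the break, the loop never reaches the United States entry, even though it would also satisfy the condition.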

The for loop approach is the one taken by the first package, which is a tiny package that you can download from PyPI that exposes a general-purpose function, first(). This function returns the first truthy value from an iterable by default, with an optional key parameter to return the first value that’s truthy after it’s been passed through the key argument.

Later in the tutorial, you’ll implement your own variation of the first() function. But first, you’ll look into another way of returning a first match: using generators.

Using Python Generators to Get the First Match

Python generator iterators are memory-efficient iterables that can be used to find the first element in a list or any iterable. They’re a core feature of Python, being used extensively under the hood. It’s likely you’ve already used generators without even knowing it!

The potential issue with generators is that they’re a bit more abstract and, as such, not quite as readable as for loops. You do get some performance benefits from generators, but these benefits are often negligible when the importance of readability is taken into consideration. That said, using them can be fun and really level up your Python game!

In Python, you can make a generator in various ways, but in this tutorial you’ll be working with generator comprehensions, which the official documentation calls generator expressions:

Python
>>> gen = (country for country in countries)
>>> next(gen)
{'country': 'Austria', 'population': 8840521}

>>> next(gen)
{'country': 'Canada', 'population': 37057765}

Once you’ve defined a generator iterator, you can then call the next() function with the generator, producing the countries one by one until the countries list is exhausted.
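It can help to see what happens at the point of exhaustion. The following sketch, which uses a shortened two-item list as an assumption for brevity, shows that next() raises StopIteration once nothing is left, unless you pass a default as the second argument:

```python
countries = [
    {"country": "Austria", "population": 8_840_521},
    {"country": "Canada", "population": 37_057_765},
]

gen = (country for country in countries)
first_item = next(gen)
second_item = next(gen)

try:
    next(gen)  # Nothing left to yield
except StopIteration:
    exhausted = True
else:
    exhausted = False

# With a default, next() returns it instead of raising.
fallback = next(gen, None)

print(first_item["country"], second_item["country"], exhausted, fallback)
```

The default argument is what makes next() practical for searching, since a search may well come up empty.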

To find the first element matching a certain criterion in a list, you can add a condition to the generator comprehension so that the resulting iterator will only yield items that match your criterion. In the following example, you use an if clause to generate items based on whether their population attribute is over one hundred million:

Python
>>> gen = (
...     country for country in countries
...     if country["population"] > 100_000_000
... )
>>> next(gen)
{'country': 'Philippines', 'population': 106651922}

So now the generator will only produce dictionaries with a population attribute of over one hundred million. This means that the first time you call next() with the generator iterator, it’ll yield the first element that you’re looking for in the list, just like the for loop version.
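To check that the generator really stops at the first match, you can route the condition through a helper that records which items it inspected. The is_large() helper and the checked list are hypothetical names used only for this sketch, and the list is shortened for brevity:

```python
countries = [
    {"country": "Norway", "population": 5_311_916},
    {"country": "Philippines", "population": 106_651_922},
    {"country": "Poland", "population": 37_974_750},
    {"country": "United States", "population": 326_687_501},
]

checked = []

def is_large(country):
    """Record the country that was inspected, then test its population."""
    checked.append(country["country"])
    return country["population"] > 100_000_000

gen = (country for country in countries if is_large(country))

match = next(gen)
checked_after_first = list(checked)
print(match["country"])      # Philippines
print(checked_after_first)   # ['Norway', 'Philippines']

# Calling next() again resumes the search where it left off.
second_match = next(gen)
print(second_match["country"])  # United States
print(checked)  # ['Norway', 'Philippines', 'Poland', 'United States']
```

Nothing is inspected when the generator is created. The work only happens when next() asks for a value, and it pauses again as soon as one is found.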

In terms of readability, a generator isn’t quite as natural as a for loop. So why might you want to use one for this purpose? In the next section, you’ll be doing a quick performance comparison.

Comparing the Performance Between Loops and Generators

As always when measuring performance, you shouldn’t read too much into any one set of results. Instead, design a test for your own code with your own real-world data before you make any important decisions. You also need to weigh performance against readability. Perhaps shaving off a few milliseconds just isn’t worth it!

For this test, you’ll want to create a function that can create lists of an arbitrary size with a certain value at a certain position:

Python
>>> from pprint import pp

>>> def build_list(size, fill, value, at_position):
...     return [value if i == at_position else fill for i in range(size)]
...

>>> pp(
...     build_list(
...         size=10,
...         fill={"country": "Nowhere", "population": 10},
...         value={"country": "Atlantis", "population": 100},
...         at_position=5,
...     )
... )
[{'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Atlantis', 'population': 100},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10}]

The build_list() function creates a list filled with identical items. All items in the list, except for one, are references to the same fill argument. The single outlier is the value argument, and it’s placed at the index provided by the at_position argument.

You imported pp() from the pprint module and used it to output the built list to make it more readable. Otherwise, the list would appear on one single line by default.

With this function, you’ll be able to create a large set of lists with the target value at various positions in the list. You can use this to compare how long it takes to find an element at the start and at the end of the list.

To compare for loops and generators, you’ll want two more basic functions that are hard-coded to find a dictionary with a population attribute over fifty:

Python
def find_match_loop(iterable):
    for value in iterable:
        if value["population"] > 50:
            return value
    return None

def find_match_gen(iterable):
    return next(
        (value for value in iterable if value["population"] > 50),
        None,
    )

The functions are hard-coded to keep things simple for the test. In the next section, you’ll be creating a reusable function.
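Before timing anything, it’s worth confirming that both functions behave the same way. A quick check along these lines, with sample data made up for the purpose, covers the match and no-match cases:

```python
def find_match_loop(iterable):
    for value in iterable:
        if value["population"] > 50:
            return value
    return None

def find_match_gen(iterable):
    return next(
        (value for value in iterable if value["population"] > 50),
        None,
    )

nowhere = {"country": "Nowhere", "population": 10}
atlantis = {"country": "Atlantis", "population": 100}

with_match = [nowhere, nowhere, atlantis, nowhere]
without_match = [nowhere, nowhere]

# Both strategies return the very same object when there's a match ...
assert find_match_loop(with_match) is atlantis
assert find_match_gen(with_match) is atlantis

# ... and None when there isn't one, including for an empty list.
assert find_match_loop(without_match) is None
assert find_match_gen(without_match) is None
assert find_match_gen([]) is None
```

If the assertions pass silently, any timing difference you measure later comes from the search strategy and not from a difference in results.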

With these basic components in place, you can set up a script with timeit to test both matching functions with a series of lists that have the target at different positions in the list:

Python
from timeit import timeit

TIMEIT_TIMES = 100
LIST_SIZE = 500
POSITION_INCREMENT = 10

def build_list(size, fill, value, at_position): ...

def find_match_loop(iterable): ...

def find_match_gen(iterable): ...

looping_times = []
generator_times = []
positions = []

for position in range(0, LIST_SIZE, POSITION_INCREMENT):
    print(
        f"Progress {position / LIST_SIZE:.0%}",
        end=f"{3 * ' '}\r",  # Clear previous characters and reset cursor
    )

    positions.append(position)

    list_to_search = build_list(
        LIST_SIZE,
        {"country": "Nowhere", "population": 10},
        {"country": "Atlantis", "population": 100},
        position,
    )

    looping_times.append(
        timeit(
            "find_match_loop(list_to_search)",
            globals=globals(),
            number=TIMEIT_TIMES,
        )
    )
    generator_times.append(
        timeit(
            "find_match_gen(list_to_search)",
            globals=globals(),
            number=TIMEIT_TIMES,
        )
    )

print("Progress 100%")

This script will produce two parallel lists, each containing the time it took to find the element with either the loop or the generator. The script will also produce a third list that’ll contain the corresponding position of the target element in the list.

You aren’t doing anything with the results yet, and ideally you want to chart these out. So, check out the following completed script that uses matplotlib to produce a couple of charts from the output:

Python
# chart.py

from timeit import timeit

import matplotlib.pyplot as plt

TIMEIT_TIMES = 1000  # Increase number for smoother lines
LIST_SIZE = 500
POSITION_INCREMENT = 10

def build_list(size, fill, value, at_position):
    return [value if i == at_position else fill for i in range(size)]

def find_match_loop(iterable):
    for value in iterable:
        if value["population"] > 50:
            return value

def find_match_gen(iterable):
    return next(value for value in iterable if value["population"] > 50)

looping_times = []
generator_times = []
positions = []

for position in range(0, LIST_SIZE, POSITION_INCREMENT):
    print(
        f"Progress {position / LIST_SIZE:.0%}",
        end=f"{3 * ' '}\r",  # Clear previous characters and reset cursor
    )

    positions.append(position)

    list_to_search = build_list(
        size=LIST_SIZE,
        fill={"country": "Nowhere", "population": 10},
        value={"country": "Atlantis", "population": 100},
        at_position=position,
    )

    looping_times.append(
        timeit(
            "find_match_loop(list_to_search)",
            globals=globals(),
            number=TIMEIT_TIMES,
        )
    )
    generator_times.append(
        timeit(
            "find_match_gen(list_to_search)",
            globals=globals(),
            number=TIMEIT_TIMES,
        )
    )

print("Progress 100%")

fig, ax = plt.subplots()

plot = ax.plot(positions, looping_times, label="loop")
plot = ax.plot(positions, generator_times, label="generator")

plt.xlim([0, LIST_SIZE])
plt.ylim([0, max(max(looping_times), max(generator_times))])

plt.xlabel("Index of element to be found")
plt.ylabel(f"Time in seconds to find element {TIMEIT_TIMES:,} times")
plt.title("Raw Time to Find First Match")
plt.legend()

plt.show()

# Ratio

looping_ratio = [loop / loop for loop in looping_times]
generator_ratio = [
    gen / loop for gen, loop in zip(generator_times, looping_times)
]

fig, ax = plt.subplots()

plot = ax.plot(positions, looping_ratio, label="loop")
plot = ax.plot(positions, generator_ratio, label="generator")

plt.xlim([0, LIST_SIZE])
plt.ylim([0, max(max(looping_ratio), max(generator_ratio))])

plt.xlabel("Index of element to be found")
plt.ylabel("Speed to find element, relative to loop")
plt.title("Relative Speed to Find First Match")
plt.legend()

plt.show()

Depending on the system that you’re running and the values for TIMEIT_TIMES, LIST_SIZE, and POSITION_INCREMENT that you use, running the script can take a while, but it should produce one chart that shows the times plotted against each other:

Chart showing the time taken to find first match in iterable, loop vs generator

Additionally, after closing the first chart, you’ll get another chart that shows the ratio between the two strategies:

Chart showing the relative time taken to find first match in iterable, loop vs generator

This last chart clearly illustrates that in this test, when the target item is near the beginning of the iterable, generators are far slower than for loops. However, once the element to find is at position 100 or greater, generators beat the for loop quite consistently and by a fair margin:

Zoomed in chart showing the relative time taken to find first match in iterable, loop vs generator

You can interactively zoom in on the previous chart with the magnifying glass icon. The zoomed chart shows that there’s a performance gain of around five or six percent. Five percent may not be anything to write home about, but it’s also not negligible. Whether it’s worth it for you depends on the specific data that you’ll be using, and how often you need to use it.

With those results, you can tentatively say that generators are faster than for loops, even though generators can be significantly slower when the item to find is in the first hundred elements of the iterable. When you’re dealing with small lists, the overall difference in terms of raw milliseconds lost isn’t much. Yet for large iterables where a 5 percent gain can mean minutes, it’s something to bear in mind:

Chart showing the relative time taken to find first match in iterable, loop vs generator, very large list size

As you can see in this last chart, for very large iterables, the increase in performance stabilizes at around six percent. You can ignore the spikes, which appear because TIMEIT_TIMES was decreased substantially to test this large iterable.

Making a Reusable Python Function to Find the First Match

Say that the iterables you expect to use are going to be on the large side, and you’re interested in squeezing every bit of performance out of your code. For that reason, you’ll use generators instead of a for loop. You’ll also be dealing with a variety of different iterables with a variety of items and want flexibility in the way you match, so you’ll design your function to be able to accomplish various goals:

  • Returning the first truthy value
  • Returning the first match
  • Returning the first truthy result of values being passed through a key function
  • Returning the first match of values being passed through a key function
  • Returning a default value if there’s no match

While there are many ways to implement this, here’s a way to do it with structural pattern matching, which requires Python 3.10 or later:

Python
def get_first(iterable, value=None, key=None, default=None):
    match value is None, callable(key):
        case (True, True):
            gen = (elem for elem in iterable if key(elem))
        case (False, True):
            gen = (elem for elem in iterable if key(elem) == value)
        case (True, False):
            gen = (elem for elem in iterable if elem)
        case (False, False):
            gen = (elem for elem in iterable if elem == value)

    return next(gen, default)

You can call the function with up to four arguments, and it’ll behave differently depending on the combination of arguments that you pass into it.

The function’s behavior mainly depends on the value and key arguments. That’s why the match statement checks if value is None and uses the callable() function to learn whether key is a function.

For example, if both match conditions are True, then it means that you’ve passed in a key but no value. This means that you want each item in the iterable to be passed through the key function, and the return value should be the first item for which the result is truthy.

As another example, if both match conditions are False, that means that you’ve passed in a value but not a key. Passing a value and no key means that you want the first element in the iterable that’s a direct match with the value provided.

Once match is over, you have your generator. All that’s left to do is to call next() with the generator and the default argument for the first match.

With this function, you can search for matches in four different ways:

Python
>>> countries = [
...     {"country": "Austria", "population": 8_840_521},
...     {"country": "Canada", "population": 37_057_765},
...     {"country": "Cuba", "population": 11_338_138},
...     {"country": "Dominican Republic", "population": 10_627_165},
...     {"country": "Germany", "population": 82_905_782},
...     {"country": "Norway", "population": 5_311_916},
...     {"country": "Philippines", "population": 106_651_922},
...     {"country": "Poland", "population": 37_974_750},
...     {"country": "Scotland", "population": 5_424_800},
...     {"country": "United States", "population": 326_687_501},
... ]

>>> # Get first truthy item
>>> get_first(countries)
{'country': 'Austria', 'population': 8840521}

>>> # Get first item matching the value argument
>>> get_first(countries, value={"country": "Germany", "population": 82_905_782})
{'country': 'Germany', 'population': 82905782}

>>> # Get first item for which key(item) equals the value argument
>>> get_first(
...     countries, value=5_311_916, key=lambda country: country["population"]
... )
{'country': 'Norway', 'population': 5311916}

>>> # Get first item for which key(item) is truthy
>>> get_first(
...     countries, key=lambda country: country["population"] > 100_000_000
... )
{'country': 'Philippines', 'population': 106651922}

With this function, you have lots of flexibility in how to match. For instance, you could deal with only values, or only key functions, or both!

In the first package mentioned earlier, the function signature is slightly different. It doesn’t have a value parameter. You can still accomplish the same effect as above by relying on the key parameter:

Python
>>> from first import first
>>> first(
...     countries,
...     key=lambda item: item == {"country": "Cuba", "population": 11_338_138}
... )
{'country': 'Cuba', 'population': 11338138}

In the downloadable materials, you can also find an alternative implementation of get_first() that mirrors the first package’s signature.
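The downloadable version isn’t reproduced here. As a rough idea of what a key-only variant could look like, here’s a minimal sketch. It’s an assumption about the general shape, not the code from the materials or from the first package, and the name get_first_by_key() is made up for this example:

```python
def get_first_by_key(iterable, default=None, key=None):
    """Return the first truthy item, or the first where key(item) is truthy."""
    if key is None:
        gen = (elem for elem in iterable if elem)
    else:
        gen = (elem for elem in iterable if key(elem))
    return next(gen, default)

countries = [
    {"country": "Cuba", "population": 11_338_138},
    {"country": "Germany", "population": 82_905_782},
]

# Matching on a value is expressed through the key function.
cuba = get_first_by_key(
    countries, key=lambda item: item["country"] == "Cuba"
)
print(cuba)  # {'country': 'Cuba', 'population': 11338138}

# Without a key, the first truthy item is returned.
print(get_first_by_key([0, "", "hello"]))  # hello

# Without a match, the default is returned.
huge = get_first_by_key(
    countries,
    default="no match",
    key=lambda item: item["population"] > 1_000_000_000,
)
print(huge)  # no match
```

With only a key parameter, every kind of match goes through one code path, at the cost of writing a small lambda for simple equality checks.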

Regardless of which implementation you ultimately use, you now have a performant, reusable function that can get the first item you need.

Conclusion

In this tutorial, you’ve learned how to find the first element in a list or any iterable in a variety of ways. You learned that the fastest and most basic way to match is by using the in operator, but you’ve seen that it’s limited for anything more complex. So you’ve examined the humble for loop, which will be the most readable and straightforward way. However, you’ve also looked at generators for that extra bit of performance and swagger.

Finally, you’ve looked at one possible implementation of a function that gets the first item from an iterable, whether that be the first truthy value or a value transformed by a function that matches on certain criteria.
