Tips and Tricks for Python Lists
This article is for Python programmers who have gone beyond the basics of using lists, but would like to see some interesting concepts. It isn’t very advanced and should be easy for you to understand.
Ensure that a List Contains Unique Elements.
Sets contain unique elements but are unordered. On the other hand, lists can contain duplicate elements but retain their insertion order. This example shows one way to get the benefits of both sets and lists — an ordered sequence of unique elements.
We are subclassing the Python built-in collections.UserList
object. We could subclass list
, or use collections.abc
to create a new abstract base class of type MutableSequence
. But those two approaches entail significantly more work to correctly implement all of the __dunder__
methods.
Here, we’re only interested in ensuring that sequences passed to the constructor, or any new elements appended to the end of the list, are unique.
|
|
Line #39 is doing all the work. If the count() method is false: greater than 1, then skip the append.
Find all the Index Values of a Matching a Test Condition
What do you do if you want to efficiently test a large list to find the indices of all matching elements? This can be quite easy to get wrong: either by being too slow or consuming too much memory.
This is one way, to use an iterator and a generator.
Line #13 coalesces all the iterator’s yielded results into a list.
import typing
def get_indices(the_list: list, test_value: object) -> typing.Iterable[int]:
"""
Returns the indices of matching list items.
Uses a generator to create an iterator.
Args:
the_list: the list containing search elements
test_value: what we want to find
Returns: the index of matching list items
>>> print(list(get_indices("The jolly green giant", "e")))
[2, 12, 13]
"""
generator = (key for key, val in enumerate(the_list) if test_value == val)
for key in generator:
yield key
It is also possible to use the get_indices() method in a for loop:
for k in get_indices("The jolly green giant", "e"):
print(k)
Flatten a List of Lists into one Super List
What do we do if we want to flatten a really long list, perhaps many thousands of items, and each one is itself a list? Again, I’ve opted to use the technique of not using return, and instead using yield.
The function flatten_nested_lists doesn’t really complete in a traditional sense. Every time a value is computed it is yielded for the calling programming to make use of. This is a really important trick, also known as lazy evaluation. If my code decided I only needed the first 10 values, I can stop there and not need to calculate everything.
from itertools import chain
def flatten_nested_lists(*input_list: list) -> list:
"""flatten_nested_lists.
Args:
input_list:
Returns:
list:
>>> l1: list = []
>>> l1.append([1, "2", 3])
>>> l1.append([4, 5, 6, 7])
>>> l1.append(["Eight", {"this one": 9}])
>>> l1.append([10, 11, 12])
>>> print(list(flatten_nested_lists(*l1)))
[1, '2', 3, 4, 5, 6, 7, 'Eight', {'this one': 9}, 10, 11, 12]
"""
for i in chain.from_iterable(input_list):
yield i
l2: list = []
l2.append([1, 2, 3])
l2.append([4, 5, 6])
l2.append([10, 11, 12])
for list_item in flatten_nested_lists(*l2):
print(list_item)
The chain.from_iterable() method is very useful, and it does what the name implies! According to the docs, this method is roughly equivalent to:
def from_iterable(iterables):
# chain.from_iterable(['ABC', 'DEF']) --> A B C D E F
for it in iterables:
for element in it:
yield element
How get first n item from generator
1 - itertools.islice (preferred method)
import itertools
list(itertools.islice(generator, n))
example:
import itertools
>>> generator = (i for i in range(10))
>>> list(itertools.islice(generator, 4))
[0, 1, 2, 3]
>>> list(itertools.islice(generator, 4))
[4, 5, 6, 7]
>>> list(itertools.islice(generator, 4))
[8, 9]
>>> list(itertools.islice(generator, 4))
[]
2 - use next function
>>> generator = (i for i in range(10))
>>> list(next(generator) for _ in range(4))
[0, 1, 2, 3]
>>> list(next(generator) for _ in range(4))
[4, 5, 6, 7]
>>> list(next(generator) for _ in range(4))
Traceback (most recent call last):
File "<stdin>", line 1, in <genexpr>
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: generator raised StopIteration
3 - zip function with list
>>> generator = (i for i in range(10))
>>> [x for _, x in zip(range(4), generator)]
[0, 1, 2, 3]
>>> [x for _, x in zip(range(4), generator)]
[4, 5, 6, 7]
>>> [x for _, x in zip(range(4), generator)]
[8, 9]
>>> [x for _, x in zip(range(4), generator)]
[]
4 - zip function with generator
>>> generator = (i for i in range(10))
>>> gen = (x for _, x in zip(range(4), generator))
>>> list(gen)
[0, 1, 2, 3]
>>> gen = (x for _, x in zip(range(4), generator))
>>> list(gen)
[4, 5, 6, 7]
>>> gen = (x for _, x in zip(range(4), generator))
>>> list(gen)
[8, 9]
>>> gen = (x for _, x in zip(range(4), generator))
>>> list(gen)
[]
Implement a FrozenList
Python doesn’t contain a FrozenList object out of the box. The suggestion was rejected by the Python community, not least because it’s easy for us to implement ourselves.
I’ve implemented a subclass of the collections.UserList object, and overriden the methods which allow changes to the list’s elements.
from collections.abc import Iterable
from collections import UserList
def immutable_decorator(f):
def wrapper(self, *args, **kwargs):
raise TypeError("Object is frozen")
return wrapper
class FrozenList(UserList): # pylint: disable=too-many-ancestors
"""
A List which is immutable.
>>> fl: FrozenList = FrozenList("hello")
>>> fl:FrozenList = FrozenList([1, 2, 4])
>>> print(fl[1:2])
[2]
>>> print(fl)
[1, 2, 4]
>>> fl.append(1)
Traceback (most recent call last):
...
TypeError: Object is frozen
>>> fl.extend(1)
Traceback (most recent call last):
...
TypeError: Object is frozen
"""
@immutable_decorator
def __setitem__(self, i: int, o) -> None:
pass
@immutable_decorator
def __add__(self, other):
pass
@immutable_decorator
def __iadd__(self, other):
pass
@immutable_decorator
def __mul__(self, n: int):
pass
@immutable_decorator
def __imul__(self, n: int):
pass
@immutable_decorator
def append(self, item) -> None:
pass
@immutable_decorator
def insert(self, i: int, item) -> None:
pass
@immutable_decorator
def pop(self, i: int):
pass
@immutable_decorator
def remove(self, item) -> None:
pass
@immutable_decorator
def clear(self) -> None:
pass
@immutable_decorator
def reverse(self) -> None:
pass
@immutable_decorator
def extend(self, other) -> None:
pass
l: list = [1, 2, 4]
fl: FrozenList = FrozenList(l)
assert fl[1:2] == [2]
fl: FrozenList = FrozenList("help")
assert fl[1::2] == ["e", "p"]
Lines 5–8 define a simple decorator, which raises an exception on whatever methods it decorates.
Create a List Which Only Accepts Specific Object Types
Of course, it is possible to create a List which only accepts certain object types or only certain values.
Again, subclassing UserList is an easy way to accomplish this: simply by selecting all the methods which accept input and validating it contains what we want.
from collections import UserList
from urllib.parse import urlparse
import json
import requests
def recursive_key_values(dictionary):
for key, value in dictionary.items():
i = 0
if type(value) is str:
yield (key, value)
elif type(value) is dict:
yield from recursive_key_values(value)
elif type(value) in (list, tuple, set):
for seq_item in value:
yield from recursive_key_values({f"{key}_{str(i)}": seq_item})
i = i + 1
else:
yield (key, str(value))
class URLFilteredList(UserList):
"""
URLFilteredList. Will only accept URLs via append.
"""
def __init__(self):
self.data = []
def append(self, item) -> None:
if self._is_url(item):
super().append(item)
def __setitem__(self, i: int, item):
if self._is_url(item):
super().append(item)
@staticmethod
def _is_url(value: str) -> bool:
if value and isinstance(value, str):
validation = urlparse(value)
if all([validation.scheme, validation.netloc]):
return True
return False
dict1 = dict(
json.loads(
requests.get("http://ergast.com/api/f1/2014/5/results.json").text))
ul: URLFilteredList = URLFilteredList()
for k, v in recursive_key_values(dict1):
ul.append(v)
assert "http://en.wikipedia.org/wiki/2014_Spanish_Grand_Prix" in ul
assert "http://en.wikipedia.org/wiki/Daniel_Ricciardo" in ul
ul[0] = "definitely not a url"
assert ul[0] == 'http://ergast.com/mrd/1.4'
print(ul)
# output:
['http://ergast.com/mrd/1.4', 'http://ergast.com/api/f1/2014/5/results.json', 'http://en.wikipedia.org/wiki/2014_Spanish_Grand_Prix', 'http://en.wikipedia.org/wiki/Circuit_de_Barcelona-Catalunya', 'http://en.wikipedia.org/wiki/Lewis_Hamilton', 'http://en.wikipedia.org/wiki/Mercedes-Benz_in_Formula_One', 'http://en.wikipedia.org/wiki/Nico_Rosberg', 'http://en.wikipedia.org/wiki/Mercedes-Benz_in_Formula_One', 'http://en.wikipedia.org/wiki/Daniel_Ricciardo', 'http://en.wikipedia.org/wiki/Red_Bull_Racing', 'http://en.wikipedia.org/wiki/Sebastian_Vettel', 'http://en.wikipedia.org/wiki/Red_Bull_Racing', 'http://en.wikipedia.org/wiki/Valtteri_Bottas', 'http://en.wikipedia.org/wiki/Williams_Grand_Prix_Engineering', 'http://en.wikipedia.org/wiki/Fernando_Alonso', 'http://en.wikipedia.org/wiki/Scuderia_Ferrari', 'http://en.wikipedia.org/wiki/Kimi_R%C3%A4ikk%C3%B6nen', 'http://en.wikipedia.org/wiki/Scuderia_Ferrari', 'http://en.wikipedia.org/wiki/Romain_Grosjean', 'http://en.wikipedia.org/wiki/Lotus_F1', 'http://en.wikipedia.org/wiki/Sergio_P%C3%A9rez', 'http://en.wikipedia.org/wiki/Racing_Point_Force_India', 'http://en.wikipedia.org/wiki/Nico_H%C3%BClkenberg', 'http://en.wikipedia.org/wiki/Racing_Point_Force_India', 'http://en.wikipedia.org/wiki/Jenson_Button', 'http://en.wikipedia.org/wiki/McLaren', 'http://en.wikipedia.org/wiki/Kevin_Magnussen', 'http://en.wikipedia.org/wiki/McLaren', 'http://en.wikipedia.org/wiki/Felipe_Massa', 'http://en.wikipedia.org/wiki/Williams_Grand_Prix_Engineering', 'http://en.wikipedia.org/wiki/Daniil_Kvyat', 'http://en.wikipedia.org/wiki/Scuderia_Toro_Rosso', 'http://en.wikipedia.org/wiki/Pastor_Maldonado', 'http://en.wikipedia.org/wiki/Lotus_F1', 'http://en.wikipedia.org/wiki/Esteban_Guti%C3%A9rrez', 'http://en.wikipedia.org/wiki/Sauber', 'http://en.wikipedia.org/wiki/Adrian_Sutil', 'http://en.wikipedia.org/wiki/Sauber', 'http://en.wikipedia.org/wiki/Jules_Bianchi', 'http://en.wikipedia.org/wiki/Marussia_F1', 'http://en.wikipedia.org/wiki/Max_Chilton', 'http://en.wikipedia.org/wiki/Marussia_F1', 'http://en.wikipedia.org/wiki/Marcus_Ericsson', 'http://en.wikipedia.org/wiki/Caterham_F1', 'http://en.wikipedia.org/wiki/Kamui_Kobayashi', 'http://en.wikipedia.org/wiki/Caterham_F1', 'http://en.wikipedia.org/wiki/Jean-%C3%89ric_Vergne', 'http://en.wikipedia.org/wiki/Scuderia_Toro_Rosso']