API Reference

Grouping

These tools yield groups of items from a source iterable.


New itertools

more_itertools.chunked(iterable, n)

Break iterable into lists of length n:

>>> list(chunked([1, 2, 3, 4, 5, 6], 3))
[[1, 2, 3], [4, 5, 6]]

If the length of iterable is not evenly divisible by n, the last returned list will be shorter:

>>> list(chunked([1, 2, 3, 4, 5, 6, 7, 8], 3))
[[1, 2, 3], [4, 5, 6], [7, 8]]

To use a fill-in value instead, see the grouper() recipe.

chunked() is useful for splitting up a computation on a large number of keys into batches, to be pickled and sent off to worker processes. One example is operations on rows in MySQL, which does not implement server-side cursors properly and would otherwise load the entire dataset into RAM on the client.
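
The batching behavior described above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation; chunked_sketch and the sample keys are hypothetical names used only for illustration.

```python
from itertools import islice


def chunked_sketch(iterable, n):
    """Yield lists of up to n items; a hypothetical stand-in for chunked()."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, n))
        if not chunk:  # the source is exhausted
            return
        yield chunk


# Split a stream of (made-up) keys into batches of three
keys = range(1, 9)
batches = list(chunked_sketch(keys, 3))
print(batches)
```

Each batch is an ordinary list, so it can be handed to a worker without keeping the rest of the source in memory.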

more_itertools.sliced(seq, n)

Yield slices of length n from the sequence seq.

>>> list(sliced((1, 2, 3, 4, 5, 6), 3))
[(1, 2, 3), (4, 5, 6)]

If the length of the sequence is not divisible by the requested slice length, the last slice will be shorter.

>>> list(sliced((1, 2, 3, 4, 5, 6, 7, 8), 3))
[(1, 2, 3), (4, 5, 6), (7, 8)]

This function will only work for iterables that support slicing. For non-sliceable iterables, see chunked().

more_itertools.distribute(n, iterable)

Distribute the items from iterable among n smaller iterables.

>>> group_1, group_2 = distribute(2, [1, 2, 3, 4, 5, 6])
>>> list(group_1)
[1, 3, 5]
>>> list(group_2)
[2, 4, 6]

If the length of iterable is not evenly divisible by n, then the length of the returned iterables will not be identical:

>>> children = distribute(3, [1, 2, 3, 4, 5, 6, 7])
>>> [list(c) for c in children]
[[1, 4, 7], [2, 5], [3, 6]]

If the length of iterable is smaller than n, then the last returned iterables will be empty:

>>> children = distribute(5, [1, 2, 3])
>>> [list(c) for c in children]
[[1], [2], [3], [], []]

This function uses itertools.tee() and may require significant storage. If you need the order of items in the smaller iterables to match the original iterable, see divide().
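
To see where the storage cost comes from, here is a minimal sketch of a tee()-based distribution. It is written under the assumption that each child simply takes every n-th item; distribute_sketch is a hypothetical name, not the library function.

```python
from itertools import islice, tee


def distribute_sketch(n, iterable):
    """Deal items round-robin into n iterators (hypothetical helper)."""
    children = tee(iterable, n)  # n independent views of the same source
    # Child i takes items i, i + n, i + 2n, ...
    return [islice(child, index, None, n) for index, child in enumerate(children)]


groups = [list(g) for g in distribute_sketch(3, [1, 2, 3, 4, 5, 6, 7])]
print(groups)
```

Because tee() must retain every item that some child has not yet passed, consuming one child completely before the others buffers most of the source.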

more_itertools.divide(n, iterable)

Divide the elements from iterable into n parts, maintaining order.

>>> group_1, group_2 = divide(2, [1, 2, 3, 4, 5, 6])
>>> list(group_1)
[1, 2, 3]
>>> list(group_2)
[4, 5, 6]

If the length of iterable is not evenly divisible by n, then the length of the returned iterables will not be identical:

>>> children = divide(3, [1, 2, 3, 4, 5, 6, 7])
>>> [list(c) for c in children]
[[1, 2, 3], [4, 5], [6, 7]]

If the length of the iterable is smaller than n, then the last returned iterables will be empty:

>>> children = divide(5, [1, 2, 3])
>>> [list(c) for c in children]
[[1], [2], [3], [], []]

This function will exhaust the iterable before returning and may require significant storage. If order is not important, see distribute(), which does not first pull the iterable into memory.
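
The part sizes shown in the examples above follow from a divmod() split. The sketch below illustrates that arithmetic only; divide_sketch is a hypothetical helper and may differ from the library's actual code.

```python
def divide_sketch(n, iterable):
    """Split items into n ordered parts (hypothetical helper)."""
    items = tuple(iterable)  # exhausts the source up front
    quotient, remainder = divmod(len(items), n)
    parts = []
    stop = 0
    for index in range(n):
        start = stop
        # The first `remainder` parts each receive one extra item
        stop += quotient + 1 if index < remainder else quotient
        parts.append(iter(items[start:stop]))
    return parts


parts = [list(p) for p in divide_sketch(3, [1, 2, 3, 4, 5, 6, 7])]
print(parts)
```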

more_itertools.split_at(iterable, pred)

Yield lists of items from iterable, where each list is delimited by an item where callable pred returns True. The lists do not include the delimiting items.

>>> list(split_at('abcdcba', lambda x: x == 'b'))
[['a'], ['c', 'd', 'c'], ['a']]
>>> list(split_at(range(10), lambda n: n % 2 == 1))
[[0], [2], [4], [6], [8], []]

more_itertools.split_before(iterable, pred)

Yield lists of items from iterable, where each list starts with an item where callable pred returns True:

>>> list(split_before('OneTwo', lambda s: s.isupper()))
[['O', 'n', 'e'], ['T', 'w', 'o']]
>>> list(split_before(range(10), lambda n: n % 3 == 0))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

more_itertools.split_after(iterable, pred)

Yield lists of items from iterable, where each list ends with an item where callable pred returns True:

>>> list(split_after('one1two2', lambda s: s.isdigit()))
[['o', 'n', 'e', '1'], ['t', 'w', 'o', '2']]
>>> list(split_after(range(10), lambda n: n % 3 == 0))
[[0], [1, 2, 3], [4, 5, 6], [7, 8, 9]]

more_itertools.bucket(iterable, key, validator=None)

Wrap iterable and return an object that buckets it into child iterables based on a key function.

>>> iterable = ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'b3']
>>> s = bucket(iterable, key=lambda x: x[0])
>>> a_iterable = s['a']
>>> next(a_iterable)
'a1'
>>> next(a_iterable)
'a2'
>>> list(s['b'])
['b1', 'b2', 'b3']

The original iterable will be advanced and its items will be cached until they are used by the child iterables. This may require significant storage.

By default, attempting to select a bucket to which no items belong will exhaust the iterable and cache all values. If you specify a validator function, selected buckets will instead be checked against it.

>>> from itertools import count
>>> it = count(1, 2)  # Infinite sequence of odd numbers
>>> key = lambda x: x % 10  # Bucket by last digit
>>> validator = lambda x: x in {1, 3, 5, 7, 9}  # Odd digits only
>>> s = bucket(it, key=key, validator=validator)
>>> 2 in s
False
>>> list(s[2])
[]

Itertools recipes

more_itertools.grouper(n, iterable, fillvalue=None)

Collect data into fixed-length chunks or blocks.

>>> list(grouper(3, 'ABCDEFG', 'x'))
[('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]

more_itertools.partition(pred, iterable)

Returns a 2-tuple of iterables derived from the input iterable. The first yields the items that have pred(item) == False. The second yields the items that have pred(item) == True.

>>> is_odd = lambda x: x % 2 != 0
>>> iterable = range(10)
>>> even_items, odd_items = partition(is_odd, iterable)
>>> list(even_items), list(odd_items)
([0, 2, 4, 6, 8], [1, 3, 5, 7, 9])
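
For readers who want to see how such a split can be built, the following sketch follows the well-known itertools recipe using tee(), filterfalse(), and filter(). The name partition_sketch is hypothetical.

```python
from itertools import filterfalse, tee


def partition_sketch(pred, iterable):
    """Return (items failing pred, items passing pred) as two lazy iterators."""
    failing, passing = tee(iterable)
    return filterfalse(pred, failing), filter(pred, passing)


evens, odds = partition_sketch(lambda x: x % 2 != 0, range(10))
even_list, odd_list = list(evens), list(odds)
print(even_list, odd_list)
```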

Lookahead and lookback

These tools peek at an iterable’s values without advancing it.


New itertools

more_itertools.spy(iterable, n=1)

Return a 2-tuple with a list containing the first n elements of iterable, and an iterator with the same items as iterable. This allows you to “look ahead” at the items in the iterable without advancing it.

There is one item in the list by default:

>>> iterable = 'abcdefg'
>>> head, iterable = spy(iterable)
>>> head
['a']
>>> list(iterable)
['a', 'b', 'c', 'd', 'e', 'f', 'g']

You may use unpacking to retrieve items instead of lists:

>>> (head,), iterable = spy('abcdefg')
>>> head
'a'
>>> (first, second), iterable = spy('abcdefg', 2)
>>> first
'a'
>>> second
'b'

The number of items requested can be larger than the number of items in the iterable:

>>> iterable = [1, 2, 3, 4, 5]
>>> head, iterable = spy(iterable, 10)
>>> head
[1, 2, 3, 4, 5]
>>> list(iterable)
[1, 2, 3, 4, 5]
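
The "look ahead without advancing" effect can be imitated by reading the head and then chaining it back in front of the remaining items. This is a minimal sketch under that assumption; spy_sketch is a hypothetical name.

```python
from itertools import chain, islice


def spy_sketch(iterable, n=1):
    """Return (first n items as a list, iterator over all items)."""
    iterator = iter(iterable)
    head = list(islice(iterator, n))
    # Re-attach the head so the caller still sees every item
    return head, chain(head, iterator)


head, numbers = spy_sketch(iter([1, 2, 3, 4, 5]), 2)
everything = list(numbers)
print(head, everything)
```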

class more_itertools.peekable(iterable)

Wrap an iterator to allow lookahead and prepending elements.

Call peek() on the result to get the value that will be returned by next(). This won’t advance the iterator:

>>> p = peekable(['a', 'b'])
>>> p.peek()
'a'
>>> next(p)
'a'

Pass peek() a default value to return that instead of raising StopIteration when the iterator is exhausted.

>>> p = peekable([])
>>> p.peek('hi')
'hi'

peekables also offer a prepend() method, which “inserts” items at the head of the iterable:

>>> p = peekable([1, 2, 3])
>>> p.prepend(10, 11, 12)
>>> next(p)
10
>>> p.peek()
11
>>> list(p)
[11, 12, 1, 2, 3]

peekables can be indexed. Index 0 is the item that will be returned by next(), index 1 is the item after that, and so on. The values up to the given index will be cached.

>>> p = peekable(['a', 'b', 'c', 'd'])
>>> p[0]
'a'
>>> p[1]
'b'
>>> next(p)
'a'

Negative indexes are supported, but be aware that they will cache the remaining items in the source iterator, which may require significant storage.

To check whether a peekable is exhausted, check its truth value:

>>> p = peekable(['a', 'b'])
>>> if p:  # peekable has items
...     list(p)
['a', 'b']
>>> if not p:  # peekable is exhausted
...     list(p)
[]

class more_itertools.seekable(iterable)

Wrap an iterator to allow for seeking backward and forward. This progressively caches the items in the source iterable so they can be re-visited.

Call seek() with an index to seek to that position in the source iterable.

To “reset” an iterator, seek to 0:

>>> from itertools import count
>>> it = seekable((str(n) for n in count()))
>>> next(it), next(it), next(it)
('0', '1', '2')
>>> it.seek(0)
>>> next(it), next(it), next(it)
('0', '1', '2')
>>> next(it)
'3'

You can also seek forward:

>>> it = seekable((str(n) for n in range(20)))
>>> it.seek(10)
>>> next(it)
'10'
>>> it.seek(20)  # Seeking past the end of the source isn't a problem
>>> list(it)
[]
>>> it.seek(0)  # Resetting works even after hitting the end
>>> next(it), next(it), next(it)
('0', '1', '2')

The cache grows as the source iterable progresses, so beware of wrapping very large or infinite iterables.

You may view the contents of the cache with the elements() method. That returns a SequenceView, a view that updates automatically:

>>> it = seekable((str(n) for n in range(10)))
>>> next(it), next(it), next(it)
('0', '1', '2')
>>> elements = it.elements()
>>> elements
SequenceView(['0', '1', '2'])
>>> next(it)
'3'
>>> elements
SequenceView(['0', '1', '2', '3'])
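
The caching behavior can be pictured with a toy wrapper that stores every item it has produced. The class below is only an illustrative sketch (SeekableSketch and its attributes are hypothetical); it omits elements() and other features of the real class.

```python
class SeekableSketch:
    """A toy iterator wrapper that caches items so they can be revisited."""

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._cache = []
        self._index = None  # None means "read fresh items from the source"

    def __iter__(self):
        return self

    def __next__(self):
        if self._index is not None:
            if self._index < len(self._cache):
                item = self._cache[self._index]
                self._index += 1
                return item
            self._index = None  # caught up with the cache
        item = next(self._source)
        self._cache.append(item)
        return item

    def seek(self, index):
        self._index = index
        # Pull items from the source until the cache reaches the target
        while len(self._cache) < index:
            try:
                self._cache.append(next(self._source))
            except StopIteration:
                break


it = SeekableSketch(str(n) for n in range(20))
first_three = [next(it), next(it), next(it)]
it.seek(0)
replayed = [next(it), next(it), next(it)]
it.seek(10)
tenth = next(it)
print(first_three, replayed, tenth)
```

The cache is never trimmed, which is why wrapping a very large or infinite source this way can use a lot of memory.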

Windowing

These tools yield windows of items from an iterable.


New itertools

more_itertools.windowed(seq, n, fillvalue=None, step=1)

Return a sliding window of width n over the given iterable.

>>> all_windows = windowed([1, 2, 3, 4, 5], 3)
>>> list(all_windows)
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]

When the window is larger than the iterable, fillvalue is used in place of missing values:

>>> list(windowed([1, 2, 3], 4))
[(1, 2, 3, None)]

Each window will advance in increments of step:

>>> list(windowed([1, 2, 3, 4, 5, 6], 3, fillvalue='!', step=2))
[(1, 2, 3), (3, 4, 5), (5, 6, '!')]
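
The basic sliding-window idea can be sketched with a bounded deque. This simplified version is an assumption-laden illustration: windowed_sketch is a hypothetical name, and it deliberately ignores fillvalue and step, so it yields nothing when the input is shorter than the window.

```python
from collections import deque


def windowed_sketch(iterable, n):
    """Yield overlapping tuples of width n (no fillvalue or step support)."""
    window = deque(maxlen=n)  # the oldest item drops out automatically
    for item in iterable:
        window.append(item)
        if len(window) == n:
            yield tuple(window)


windows = list(windowed_sketch([1, 2, 3, 4, 5], 3))
print(windows)
```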

more_itertools.stagger(iterable, offsets=(-1, 0, 1), longest=False, fillvalue=None)

Yield tuples whose elements are offset from iterable. The amount by which the i-th item in each tuple is offset is given by the i-th item in offsets.

>>> list(stagger([0, 1, 2, 3]))
[(None, 0, 1), (0, 1, 2), (1, 2, 3)]
>>> list(stagger(range(8), offsets=(0, 2, 4)))
[(0, 2, 4), (1, 3, 5), (2, 4, 6), (3, 5, 7)]

By default, the sequence will end when the final element of a tuple is the last item in the iterable. To continue until the first element of a tuple is the last item in the iterable, set longest to True:

>>> list(stagger([0, 1, 2, 3], longest=True))
[(None, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, None), (3, None, None)]

By default, None will be used to replace offsets beyond the end of the sequence. Specify fillvalue to use some other value.


Itertools recipes

more_itertools.pairwise(iterable)

Returns an iterator of paired items, overlapping, from the original iterable.

>>> take(4, pairwise(count()))
[(0, 1), (1, 2), (2, 3), (3, 4)]
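
A self-contained version of the example above, using only the standard library, might look like the following. It follows the familiar tee()-and-zip() recipe; pairwise_sketch is a hypothetical name.

```python
from itertools import count, islice, tee


def pairwise_sketch(iterable):
    """Yield (item, next item) pairs from iterable."""
    first, second = tee(iterable)
    next(second, None)  # shift the second copy forward by one item
    return zip(first, second)


pairs = list(islice(pairwise_sketch(count()), 4))
print(pairs)
```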

Augmenting

These tools yield items from an iterable, plus additional data.


New itertools

more_itertools.count_cycle(iterable, n=None)

Cycle through the items from iterable up to n times, yielding the number of completed cycles along with each item. If n is omitted the process repeats indefinitely.

>>> list(count_cycle('AB', 3))
[(0, 'A'), (0, 'B'), (1, 'A'), (1, 'B'), (2, 'A'), (2, 'B')]

more_itertools.intersperse(e, iterable, n=1)

Intersperse filler element e among the items in iterable, leaving n items between each filler element.

>>> list(intersperse('!', [1, 2, 3, 4, 5]))
[1, '!', 2, '!', 3, '!', 4, '!', 5]
>>> list(intersperse(None, [1, 2, 3, 4, 5], n=2))
[1, 2, None, 3, 4, None, 5]

more_itertools.padded(iterable, fillvalue=None, n=None, next_multiple=False)

Yield the elements from iterable, followed by fillvalue, such that at least n items are emitted.

>>> list(padded([1, 2, 3], '?', 5))
[1, 2, 3, '?', '?']

If next_multiple is True, fillvalue will be emitted until the number of items emitted is a multiple of n:

>>> list(padded([1, 2, 3, 4], n=3, next_multiple=True))
[1, 2, 3, 4, None, None]

If n is None, fillvalue will be emitted indefinitely.

more_itertools.adjacent(predicate, iterable, distance=1)

Return an iterable over (bool, item) tuples where the item is drawn from iterable and the bool indicates whether that item satisfies the predicate or is adjacent to an item that does.

For example, to find whether items are adjacent to a 3:

>>> list(adjacent(lambda x: x == 3, range(6)))
[(False, 0), (False, 1), (True, 2), (True, 3), (True, 4), (False, 5)]

Set distance to change what counts as adjacent. For example, to find whether items are two places away from a 3:

>>> list(adjacent(lambda x: x == 3, range(6), distance=2))
[(False, 0), (True, 1), (True, 2), (True, 3), (True, 4), (True, 5)]

This is useful for contextualizing the results of a search function. For example, a code comparison tool might want to identify lines that have changed, but also surrounding lines to give the viewer of the diff context.

The predicate function will only be called once for each item in the iterable.

See also groupby_transform(), which can be used with this function to group ranges of items with the same bool value.

more_itertools.groupby_transform(iterable, keyfunc=None, valuefunc=None)

An extension of itertools.groupby() that transforms the values of iterable after grouping them. keyfunc is a function used to compute a grouping key for each item. valuefunc is a function for transforming the items after grouping.

>>> iterable = 'AaaABbBCcA'
>>> keyfunc = lambda x: x.upper()
>>> valuefunc = lambda x: x.lower()
>>> grouper = groupby_transform(iterable, keyfunc, valuefunc)
>>> [(k, ''.join(g)) for k, g in grouper]
[('A', 'aaaa'), ('B', 'bbb'), ('C', 'cc'), ('A', 'a')]

keyfunc and valuefunc default to identity functions if they are not specified.

groupby_transform() is useful when grouping elements of an iterable using a separate iterable as the key. To do this, zip() the iterables and pass a keyfunc that extracts the first element and a valuefunc that extracts the second element:

>>> from operator import itemgetter
>>> keys = [0, 0, 1, 1, 1, 2, 2, 2, 3]
>>> values = 'abcdefghi'
>>> iterable = zip(keys, values)
>>> grouper = groupby_transform(iterable, itemgetter(0), itemgetter(1))
>>> [(k, ''.join(g)) for k, g in grouper]
[(0, 'ab'), (1, 'cde'), (2, 'fgh'), (3, 'i')]

Note that the order of items in the iterable is significant. Only adjacent items are grouped together, so if you don’t want any duplicate groups, you should sort the iterable by the key function.
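
The effect of sorting first can be checked with a small stand-in built on itertools.groupby(). This is a sketch, not the library's code; groupby_transform_sketch and summarize are hypothetical helpers.

```python
from itertools import groupby


def groupby_transform_sketch(iterable, keyfunc=None, valuefunc=None):
    """Group adjacent items by keyfunc, then map valuefunc over each group."""
    if valuefunc is None:
        valuefunc = lambda x: x
    return ((key, map(valuefunc, group)) for key, group in groupby(iterable, keyfunc))


def summarize(iterable):
    grouped = groupby_transform_sketch(iterable, str.upper, str.lower)
    return [(key, ''.join(group)) for key, group in grouped]


unsorted_result = summarize('AaaABbBCcA')  # 'A' appears as two groups
sorted_result = summarize(sorted('AaaABbBCcA', key=str.upper))  # one group per key
print(unsorted_result)
print(sorted_result)
```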


Itertools recipes

more_itertools.padnone(iterable)

Returns the sequence of elements and then returns None indefinitely.

>>> take(5, padnone(range(3)))
[0, 1, 2, None, None]

Useful for emulating the behavior of the built-in map() function.

See also padded().

more_itertools.ncycles(iterable, n)

Returns the sequence elements n times.

>>> list(ncycles(["a", "b"], 3))
['a', 'b', 'a', 'b', 'a', 'b']

Combining

These tools combine multiple iterables.


New itertools

more_itertools.collapse(iterable, base_type=None, levels=None)

Flatten an iterable with multiple levels of nesting (e.g., a list of lists of tuples) into non-iterable types.

>>> iterable = [(1, 2), ([3, 4], [[5], [6]])]
>>> list(collapse(iterable))
[1, 2, 3, 4, 5, 6]

String types are not considered iterable and will not be collapsed. To avoid collapsing other types, specify base_type:

>>> iterable = ['ab', ('cd', 'ef'), ['gh', 'ij']]
>>> list(collapse(iterable, base_type=tuple))
['ab', ('cd', 'ef'), 'gh', 'ij']

Specify levels to stop flattening after a certain level:

>>> iterable = [('a', ['b']), ('c', ['d'])]
>>> list(collapse(iterable))  # Fully flattened
['a', 'b', 'c', 'd']
>>> list(collapse(iterable, levels=1))  # Only one level flattened
['a', ['b'], 'c', ['d']]
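
The rules above (strings are atomic, base_type stops descent, levels caps depth) can be expressed as a short recursive walk. The sketch below is one way to do it; collapse_sketch and walk are hypothetical names and the real implementation may differ.

```python
def collapse_sketch(iterable, base_type=None, levels=None):
    """Recursively flatten nested iterables, treating strings as atoms."""

    def walk(node, level):
        too_deep = levels is not None and level > levels
        is_atom = isinstance(node, (str, bytes)) or (
            base_type is not None and isinstance(node, base_type)
        )
        if too_deep or is_atom:
            yield node
            return
        try:
            children = iter(node)
        except TypeError:  # not iterable, so it is a leaf
            yield node
            return
        for child in children:
            yield from walk(child, level + 1)

    yield from walk(iterable, 0)


flat = list(collapse_sketch([(1, 2), ([3, 4], [[5], [6]])]))
one_level = list(collapse_sketch([('a', ['b']), ('c', ['d'])], levels=1))
print(flat, one_level)
```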

more_itertools.sort_together(iterables, key_list=(0, ), reverse=False)

Return the input iterables sorted together, with key_list as the priority for sorting. All iterables are trimmed to the length of the shortest one.

This can be used like the sorting function in a spreadsheet. If each iterable represents a column of data, the key list determines which columns are used for sorting.

By default, all iterables are sorted using the 0-th iterable:

>>> iterables = [(4, 3, 2, 1), ('a', 'b', 'c', 'd')]
>>> sort_together(iterables)
[(1, 2, 3, 4), ('d', 'c', 'b', 'a')]

Set a different key list to sort according to another iterable. Specifying multiple keys dictates how ties are broken:

>>> iterables = [(3, 1, 2), (0, 1, 0), ('c', 'b', 'a')]
>>> sort_together(iterables, key_list=(1, 2))
[(2, 3, 1), (0, 0, 1), ('a', 'c', 'b')]

Set reverse to True to sort in descending order.

>>> sort_together([(1, 2, 3), ('c', 'b', 'a')], reverse=True)
[(3, 2, 1), ('a', 'b', 'c')]
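
The spreadsheet analogy maps onto a transpose, sort, transpose pattern. The following sketch shows that pattern with zip() and operator.itemgetter(); sort_together_sketch is a hypothetical name used for illustration.

```python
from operator import itemgetter


def sort_together_sketch(iterables, key_list=(0,), reverse=False):
    """Sort several columns together, using the columns in key_list as keys."""
    rows = zip(*iterables)  # columns -> rows (trims to the shortest column)
    ordered = sorted(rows, key=itemgetter(*key_list), reverse=reverse)
    return list(zip(*ordered))  # rows -> columns again


by_first = sort_together_sketch([(4, 3, 2, 1), ('a', 'b', 'c', 'd')])
by_two_keys = sort_together_sketch([(3, 1, 2), (0, 1, 0), ('c', 'b', 'a')], key_list=(1, 2))
print(by_first)
print(by_two_keys)
```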

more_itertools.interleave(*iterables)

Return a new iterable yielding from each iterable in turn, until the shortest is exhausted.

>>> list(interleave([1, 2, 3], [4, 5], [6, 7, 8]))
[1, 4, 6, 2, 5, 7]

For a version that doesn’t terminate after the shortest iterable is exhausted, see interleave_longest().

more_itertools.interleave_longest(*iterables)

Return a new iterable yielding from each iterable in turn, skipping any that are exhausted.

>>> list(interleave_longest([1, 2, 3], [4, 5], [6, 7, 8]))
[1, 4, 6, 2, 5, 7, 3, 8]

This function produces the same output as roundrobin(), but may perform better for some inputs (in particular when the number of iterables is large).
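
One way to obtain this "skip the exhausted ones" behavior is to zip with a sentinel and filter it out afterwards. The sketch below assumes that approach; interleave_longest_sketch and _MISSING are hypothetical names.

```python
from itertools import chain, zip_longest

_MISSING = object()  # sentinel that cannot collide with real items


def interleave_longest_sketch(*iterables):
    """Yield one item from each iterable in turn, skipping exhausted ones."""
    rounds = zip_longest(*iterables, fillvalue=_MISSING)
    return (item for item in chain.from_iterable(rounds) if item is not _MISSING)


mixed = list(interleave_longest_sketch([1, 2, 3], [4, 5], [6, 7, 8]))
print(mixed)
```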

more_itertools.collate(*iterables, key=lambda a: a, reverse=False)

Return a sorted merge of the items from each of several already-sorted iterables.

>>> list(collate('ACDZ', 'AZ', 'JKL'))
['A', 'A', 'C', 'D', 'J', 'K', 'L', 'Z', 'Z']

Works lazily, keeping only the next value from each iterable in memory. Use collate() to, for example, perform an n-way merge sort of items that don’t fit in memory.

If a key function is specified, the iterables will be sorted according to its result:

>>> key = lambda s: int(s)  # Sort by numeric value, not by string
>>> list(collate(['1', '10'], ['2', '11'], key=key))
['1', '2', '10', '11']

If the iterables are sorted in descending order, set reverse to True:

>>> list(collate([5, 3, 1], [4, 2, 0], reverse=True))
[5, 4, 3, 2, 1, 0]

If the elements of the passed-in iterables are out of order, you might get unexpected results.

On Python 2.7, this function delegates to heapq.merge() if neither of the keyword arguments is specified. On Python 3.5+, this function is an alias for heapq.merge().
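
Since heapq.merge() accepts key and reverse on Python 3.5+, the examples above can be reproduced with the standard library directly. The variable names below are illustrative only.

```python
from heapq import merge

# Merge already-sorted inputs, comparing by numeric value
by_number = list(merge(['1', '10'], ['2', '11'], key=int))

# Merge inputs that are sorted in descending order
descending = list(merge([5, 3, 1], [4, 2, 0], reverse=True))

print(by_number, descending)
```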

more_itertools.zip_offset(*iterables, offsets, longest=False, fillvalue=None)

Zip the input iterables together, but offset the i-th iterable by the i-th item in offsets.

>>> list(zip_offset('0123', 'abcdef', offsets=(0, 1)))
[('0', 'b'), ('1', 'c'), ('2', 'd'), ('3', 'e')]

This can be used as a lightweight alternative to SciPy or pandas to analyze data sets in which some series have a lead or lag relationship.

By default, the sequence will end when the shortest iterable is exhausted. To continue until the longest iterable is exhausted, set longest to True.

>>> list(zip_offset('0123', 'abcdef', offsets=(0, 1), longest=True))
[('0', 'b'), ('1', 'c'), ('2', 'd'), ('3', 'e'), (None, 'f')]

By default, None will be used to replace offsets beyond the end of the sequence. Specify fillvalue to use some other value.


Itertools recipes

more_itertools.dotproduct(vec1, vec2)

Returns the dot product of the two iterables.

>>> dotproduct([10, 10], [20, 20])
400

more_itertools.flatten(listOfLists)

Return an iterator flattening one level of nesting in a list of lists.

>>> list(flatten([[0, 1], [2, 3]]))
[0, 1, 2, 3]

See also collapse(), which can flatten multiple levels of nesting.

more_itertools.roundrobin(*iterables)

Yields an item from each iterable, alternating between them.

>>> list(roundrobin('ABC', 'D', 'EF'))
['A', 'D', 'E', 'B', 'F', 'C']

This function produces the same output as interleave_longest(), but may perform better for some inputs (in particular when the number of iterables is small).

more_itertools.prepend(value, iterator)

Yield value, followed by the elements in iterator.

>>> value = '0'
>>> iterator = ['1', '2', '3']
>>> list(prepend(value, iterator))
['0', '1', '2', '3']

To prepend multiple values, see itertools.chain().

Summarizing

These tools return summarized or aggregated data from an iterable.


New itertools

more_itertools.ilen(iterable)

Return the number of items in iterable.

>>> ilen(x for x in range(1000000) if x % 3 == 0)
333334

This consumes the iterable, so handle with care.

more_itertools.first(iterable[, default])

Return the first item of iterable, or default if iterable is empty.

>>> first([0, 1, 2, 3])
0
>>> first([], 'some default')
'some default'

If default is not provided and there are no items in the iterable, raise ValueError.

first() is useful when you have a generator of expensive-to-retrieve values and want any arbitrary one. It is marginally shorter than next(iter(iterable), default).

more_itertools.last(iterable[, default])

Return the last item of iterable, or default if iterable is empty.

>>> last([0, 1, 2, 3])
3
>>> last([], 'some default')
'some default'

If default is not provided and there are no items in the iterable, raise ValueError.

more_itertools.one(iterable, too_short=None, too_long=None)

Return the first item from iterable, which is expected to contain only that item. Raise an exception if iterable is empty or has more than one item.

one() is useful for ensuring that an iterable contains only one item. For example, it can be used to retrieve the result of a database query that is expected to return a single row.

If iterable is empty, ValueError will be raised. You may specify a different exception with the too_short keyword:

>>> it = []
>>> one(it)  
Traceback (most recent call last):
...
ValueError: too few items in iterable (expected 1)
>>> too_short = IndexError('too few items')
>>> one(it, too_short=too_short)  
Traceback (most recent call last):
...
IndexError: too few items

Similarly, if iterable contains more than one item, ValueError will be raised. You may specify a different exception with the too_long keyword:

>>> it = ['too', 'many']
>>> one(it)  
Traceback (most recent call last):
...
ValueError: too many items in iterable (expected 1)
>>> too_long = RuntimeError
>>> one(it, too_long=too_long)  
Traceback (most recent call last):
...
RuntimeError

Note that one() attempts to advance iterable twice to ensure there is only one item. If there is more than one, both items will be discarded. See spy() or peekable() to check iterable contents less destructively.
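
The destructive double advance is easy to observe with a small stand-in. The sketch below is an assumption about the general shape of such a function, not the library's code; one_sketch and its error messages are hypothetical.

```python
def one_sketch(iterable, too_short=None, too_long=None):
    """Return the only item of iterable, or raise if there are zero or several."""
    iterator = iter(iterable)
    try:
        value = next(iterator)
    except StopIteration:
        raise too_short or ValueError('iterable is empty') from None
    try:
        next(iterator)  # a second item means the iterable is too long
    except StopIteration:
        return value
    raise too_long or ValueError('iterable has more than one item')


source = iter(['too', 'many', 'items'])
try:
    one_sketch(source)
except ValueError as error:
    print(error)
remaining = list(source)  # the first two items are already gone
print(remaining)
```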

more_itertools.unique_to_each(*iterables)

Return the elements from each of the input iterables that aren’t in the other input iterables.

For example, suppose you have a set of packages, each with a set of dependencies:

{'pkg_1': {'A', 'B'}, 'pkg_2': {'B', 'C'}, 'pkg_3': {'B', 'D'}}

If you remove one package, which dependencies can also be removed?

If pkg_1 is removed, then A is no longer necessary - it is not associated with pkg_2 or pkg_3. Similarly, C is only needed for pkg_2, and D is only needed for pkg_3:

>>> unique_to_each({'A', 'B'}, {'B', 'C'}, {'B', 'D'})
[['A'], ['C'], ['D']]

If there are duplicates in one input iterable that aren’t in the others, they will be duplicated in the output. Input order is preserved:

>>> unique_to_each("mississippi", "missouri")
[['p', 'p'], ['o', 'u', 'r']]

It is assumed that the elements of each iterable are hashable.
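
The hashability requirement becomes clear when the operation is written with a Counter. The following is an illustrative sketch, assuming elements are counted once per input; unique_to_each_sketch is a hypothetical name.

```python
from collections import Counter
from itertools import chain


def unique_to_each_sketch(*iterables):
    """For each input, keep the elements that appear in no other input."""
    pool = [list(iterable) for iterable in iterables]
    # Count how many inputs contain each element (set() ignores repeats)
    counts = Counter(chain.from_iterable(map(set, pool)))
    uniques = {element for element, count in counts.items() if count == 1}
    return [[element for element in items if element in uniques] for items in pool]


letters = unique_to_each_sketch('mississippi', 'missouri')
packages = unique_to_each_sketch(['A', 'B'], ['B', 'C'], ['B', 'D'])
print(letters, packages)
```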

more_itertools.locate(iterable, pred=bool)

Yield the index of each item in iterable for which pred returns True.

pred defaults to bool(), which will select truthy items:

>>> list(locate([0, 1, 1, 0, 1, 0, 0]))
[1, 2, 4]

Set pred to a custom function to, e.g., find the indexes for a particular item:

>>> list(locate(['a', 'b', 'c', 'b'], lambda x: x == 'b'))
[1, 3]

Use with windowed() to find the indexes of a sub-sequence:

>>> from more_itertools import windowed
>>> iterable = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
>>> sub = [1, 2, 3]
>>> pred = lambda w: w == tuple(sub)  # windowed() returns tuples
>>> list(locate(windowed(iterable, len(sub)), pred=pred))
[1, 5, 9]

Use with seekable() to find indexes and then retrieve the associated items:

>>> from itertools import count
>>> from more_itertools import seekable
>>> source = (3 * n + 1 if (n % 2) else n // 2 for n in count())
>>> it = seekable(source)
>>> pred = lambda x: x > 100
>>> indexes = locate(it, pred=pred)
>>> i = next(indexes)
>>> it.seek(i)
>>> next(it)
106

more_itertools.rlocate(iterable, pred=bool)

Yield the index of each item in iterable for which pred returns True, starting from the right and moving left.

pred defaults to bool(), which will select truthy items:

>>> list(rlocate([0, 1, 1, 0, 1, 0, 0]))  # Truthy at 1, 2, and 4
[4, 2, 1]

Set pred to a custom function to, e.g., find the indexes for a particular item:

>>> iterable = iter('abcb')
>>> pred = lambda x: x == 'b'
>>> list(rlocate(iterable, pred))
[3, 1]

Beware, this function won’t return anything for infinite iterables. If iterable is reversible, rlocate will reverse it and search from the right. Otherwise, it will search from the left and return the results in reverse order.

See locate() for other example applications.

more_itertools.consecutive_groups(iterable, ordering=lambda x: x)

Yield groups of consecutive items using itertools.groupby(). The ordering function determines whether two items are adjacent by returning their position.

By default, the ordering function is the identity function. This is suitable for finding runs of numbers:

>>> iterable = [1, 10, 11, 12, 20, 30, 31, 32, 33, 40]
>>> for group in consecutive_groups(iterable):
...     print(list(group))
[1]
[10, 11, 12]
[20]
[30, 31, 32, 33]
[40]

For finding runs of adjacent letters, try using the index() method of a string of letters:

>>> from string import ascii_lowercase
>>> iterable = 'abcdfgilmnop'
>>> ordering = ascii_lowercase.index
>>> for group in consecutive_groups(iterable, ordering):
...     print(list(group))
['a', 'b', 'c', 'd']
['f', 'g']
['i']
['l', 'm', 'n', 'o', 'p']
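
The groupby() trick behind this kind of grouping is that, within a run, the difference between an item's index and its position stays constant. Here is a minimal sketch of that idea; consecutive_groups_sketch is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter
from string import ascii_lowercase


def consecutive_groups_sketch(iterable, ordering=lambda x: x):
    """Yield iterators over runs of items whose positions increase by one."""
    # index - position is constant inside a run and changes between runs
    keyed = groupby(enumerate(iterable), key=lambda pair: pair[0] - ordering(pair[1]))
    for _, group in keyed:
        yield map(itemgetter(1), group)


number_runs = [list(g) for g in consecutive_groups_sketch([1, 10, 11, 12, 20, 30, 31])]
letter_runs = [list(g) for g in consecutive_groups_sketch('abcdfgi', ascii_lowercase.index)]
print(number_runs, letter_runs)
```

As with itertools.groupby(), each group shares the underlying iterator, so it should be consumed before moving on to the next one.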

more_itertools.exactly_n(iterable, n, predicate=bool)

Return True if exactly n items in the iterable are True according to the predicate function.

>>> exactly_n([True, True, False], 2)
True
>>> exactly_n([True, True, False], 1)
False
>>> exactly_n([0, 1, 2, 3, 4, 5], 3, lambda x: x < 3)
True

The iterable will be advanced until n + 1 truthy items are encountered, so avoid calling it on infinite iterables.

class more_itertools.run_length

run_length.encode() compresses an iterable with run-length encoding. It yields groups of repeated items with the count of how many times they were repeated:

>>> uncompressed = 'abbcccdddd'
>>> list(run_length.encode(uncompressed))
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]

run_length.decode() decompresses an iterable that was previously compressed with run-length encoding. It yields the items of the decompressed iterable:

>>> compressed = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> list(run_length.decode(compressed))
['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd']
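
Both directions of run-length encoding can be sketched in a few lines with itertools. The function names rle_encode and rle_decode below are hypothetical and are not the class methods themselves.

```python
from itertools import chain, groupby, repeat


def rle_encode(iterable):
    """Yield (item, run length) pairs for each run of equal adjacent items."""
    return ((key, sum(1 for _ in group)) for key, group in groupby(iterable))


def rle_decode(pairs):
    """Expand (item, run length) pairs back into the original items."""
    return chain.from_iterable(repeat(item, count) for item, count in pairs)


encoded = list(rle_encode('abbcccdddd'))
decoded = ''.join(rle_decode(encoded))
print(encoded, decoded)
```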

more_itertools.map_reduce(iterable, keyfunc, valuefunc=None, reducefunc=None)

Return a dictionary that maps the items in iterable to categories defined by keyfunc, transforms them with valuefunc, and then summarizes them by category with reducefunc.

valuefunc defaults to the identity function if it is unspecified. If reducefunc is unspecified, no summarization takes place:

>>> keyfunc = lambda x: x.upper()
>>> result = map_reduce('abbccc', keyfunc)
>>> sorted(result.items())
[('A', ['a']), ('B', ['b', 'b']), ('C', ['c', 'c', 'c'])]

Specifying valuefunc transforms the categorized items:

>>> keyfunc = lambda x: x.upper()
>>> valuefunc = lambda x: 1
>>> result = map_reduce('abbccc', keyfunc, valuefunc)
>>> sorted(result.items())
[('A', [1]), ('B', [1, 1]), ('C', [1, 1, 1])]

Specifying reducefunc summarizes the categorized items:

>>> keyfunc = lambda x: x.upper()
>>> valuefunc = lambda x: 1
>>> reducefunc = sum
>>> result = map_reduce('abbccc', keyfunc, valuefunc, reducefunc)
>>> sorted(result.items())
[('A', 1), ('B', 2), ('C', 3)]

You may want to filter the input iterable before applying the map/reduce procedure:

>>> all_items = range(30)
>>> items = [x for x in all_items if 10 <= x <= 20]  # Filter
>>> keyfunc = lambda x: x % 2  # Evens map to 0; odds to 1
>>> categories = map_reduce(items, keyfunc=keyfunc)
>>> sorted(categories.items())
[(0, [10, 12, 14, 16, 18, 20]), (1, [11, 13, 15, 17, 19])]
>>> summaries = map_reduce(items, keyfunc=keyfunc, reducefunc=sum)
>>> sorted(summaries.items())
[(0, 90), (1, 75)]

Note that all items in the iterable are gathered into a list before the summarization step, which may require significant storage.

The returned object is a collections.defaultdict with the default_factory set to None, such that it behaves like a normal dictionary.
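
The three stages and the default_factory detail can be illustrated with a short stand-in. This is a sketch under the assumptions stated in the text above; map_reduce_sketch is a hypothetical name.

```python
from collections import defaultdict


def map_reduce_sketch(iterable, keyfunc, valuefunc=None, reducefunc=None):
    """Categorize with keyfunc, transform with valuefunc, summarize with reducefunc."""
    if valuefunc is None:
        valuefunc = lambda x: x
    result = defaultdict(list)
    for item in iterable:
        result[keyfunc(item)].append(valuefunc(item))  # gather everything first
    if reducefunc is not None:
        for key, values in result.items():
            result[key] = reducefunc(values)
    result.default_factory = None  # missing keys now raise KeyError
    return result


counts = map_reduce_sketch('abbccc', str.upper, lambda x: 1, sum)
print(sorted(counts.items()))
```

With default_factory cleared, looking up a category that never occurred raises KeyError instead of silently creating an empty list.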


Itertools recipes

more_itertools.all_equal(iterable)

Returns True if all the elements are equal to each other.

>>> all_equal('aaaa')
True
>>> all_equal('aaab')
False

more_itertools.first_true(iterable, default=False, pred=None)

Returns the first true value in the iterable.

If no true value is found, returns default.

If pred is not None, returns the first item for which pred(item) == True.

>>> first_true(range(10))
1
>>> first_true(range(10), pred=lambda x: x > 5)
6
>>> first_true(range(10), default='missing', pred=lambda x: x > 9)
'missing'

more_itertools.nth(iterable, n, default=None)

Returns the nth item or a default value.

>>> l = range(10)
>>> nth(l, 3)
3
>>> nth(l, 20, "zebra")
'zebra'

more_itertools.quantify(iterable, pred=bool)

Return how many times the predicate is true.

>>> quantify([True, False, True])
2

Selecting

These tools yield certain items from an iterable.


New itertools

more_itertools.islice_extended(iterable, start, stop, step)

An extension of itertools.islice() that supports negative values for stop, start, and step.

>>> iterable = iter('abcdefgh')
>>> list(islice_extended(iterable, -4, -1))
['e', 'f', 'g']

Slices with negative values require some caching of iterable, but this function takes care to minimize the amount of memory required.

For example, you can use a negative step with an infinite iterator:

>>> from itertools import count
>>> list(islice_extended(count(), 110, 99, -2))
[110, 108, 106, 104, 102, 100]

more_itertools.strip(iterable, pred)

Yield the items from iterable, but strip any from the beginning and end for which pred returns True.

For example, to remove a set of items from both ends of an iterable:

>>> iterable = (None, False, None, 1, 2, None, 3, False, None)
>>> pred = lambda x: x in {None, False, ''}
>>> list(strip(iterable, pred))
[1, 2, None, 3]

This function is analogous to str.strip().

more_itertools.lstrip(iterable, pred)

Yield the items from iterable, but strip any from the beginning for which pred returns True.

For example, to remove a set of items from the start of an iterable:

>>> iterable = (None, False, None, 1, 2, None, 3, False, None)
>>> pred = lambda x: x in {None, False, ''}
>>> list(lstrip(iterable, pred))
[1, 2, None, 3, False, None]

This function is analogous to str.lstrip(), and is essentially a wrapper for itertools.dropwhile().
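
The relationship to itertools.dropwhile() can be checked directly with the standard library; the variable names below are illustrative.

```python
from itertools import dropwhile

iterable = (None, False, None, 1, 2, None, 3, False, None)
pred = lambda x: x in {None, False, ''}

# dropwhile() discards leading items while pred is true, then yields the rest
left_stripped = list(dropwhile(pred, iterable))
print(left_stripped)
```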

more_itertools.rstrip(iterable, pred)

Yield the items from iterable, but strip any from the end for which pred returns True.

For example, to remove a set of items from the end of an iterable:

>>> iterable = (None, False, None, 1, 2, None, 3, False, None)
>>> pred = lambda x: x in {None, False, ''}
>>> list(rstrip(iterable, pred))
[None, False, None, 1, 2, None, 3]

This function is analogous to str.rstrip().


Itertools recipes

more_itertools.take(n, iterable)

Return first n items of the iterable as a list.

>>> take(3, range(10))
[0, 1, 2]
>>> take(5, range(3))
[0, 1, 2]

Effectively a short replacement for next()-based iterator consumption when you want more than one item, but fewer than the whole iterator.

more_itertools.tail(n, iterable)[source]

Return an iterator over the last n items of iterable.

>>> t = tail(3, 'ABCDEFG')
>>> list(t)
['E', 'F', 'G']
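
A bounded collections.deque is one common way to keep only the last n items of an iterable. The sketch below uses the hypothetical name tail_sketch and is an illustration under that assumption, not the library's source:

```python
from collections import deque


def tail_sketch(n, iterable):
    """Hypothetical stand-in for tail(): iterate over the last n items."""
    # A deque with maxlen=n discards older items as new ones arrive,
    # so at most n items are held at any time.
    return iter(deque(iterable, maxlen=n))


result = list(tail_sketch(3, 'ABCDEFG'))
print(result)
```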

more_itertools.unique_everseen(iterable, key=None)[source]

Yield unique elements, preserving order.

>>> list(unique_everseen('AAAABBBCCDAABBB'))
['A', 'B', 'C', 'D']
>>> list(unique_everseen('ABBCcAD', str.lower))
['A', 'B', 'C', 'D']

Sequences with a mix of hashable and unhashable items can be used. The function will be slower (i.e., O(n^2)) for unhashable items.
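
The difference in speed comes from how seen items can be tracked. A minimal sketch, assuming a set is used for hashable keys and a list as the fallback for unhashable ones (unique_everseen_sketch is a hypothetical name, not the library's source):

```python
def unique_everseen_sketch(iterable, key=None):
    """Hypothetical stand-in for unique_everseen()."""
    seen_set = set()   # constant-time membership tests for hashable keys
    seen_list = []     # linear-time fallback for unhashable keys
    for element in iterable:
        k = element if key is None else key(element)
        try:
            if k not in seen_set:
                seen_set.add(k)
                yield element
        except TypeError:
            # Unhashable key: membership in a list needs a linear scan,
            # which is where the O(n^2) behavior comes from.
            if k not in seen_list:
                seen_list.append(k)
                yield element


mixed = [[1], [1], 2, 2, [3], 'a', 'a']
result = list(unique_everseen_sketch(mixed))
print(result)
```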

more_itertools.unique_justseen(iterable, key=None)[source]

Yield elements in order, ignoring serial duplicates.

>>> list(unique_justseen('AAAABBBCCDAABBB'))
['A', 'B', 'C', 'D', 'A', 'B']
>>> list(unique_justseen('ABBCcAD', str.lower))
['A', 'B', 'C', 'A', 'D']

Combinatorics

These tools yield combinatorial arrangements of items from iterables.


New itertools

more_itertools.distinct_permutations(iterable)[source]

Yield successive distinct permutations of the elements in iterable.

>>> sorted(distinct_permutations([1, 0, 1]))
[(0, 1, 1), (1, 0, 1), (1, 1, 0)]

Equivalent to set(permutations(iterable)), except duplicates are not generated and thrown away. For larger input sequences this is much more efficient.

Duplicate permutations arise when there are duplicated elements in the input iterable. The number of items returned is n! / (x_1! * x_2! * … * x_k!), where n is the total number of items input, k is the number of distinct items, and each x_i is the count of a distinct item in the input sequence.
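
The count formula above can be checked against a brute-force enumeration using only the standard library. The helper name distinct_permutation_count is hypothetical and exists only for this illustration:

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinct_permutation_count(items):
    """Evaluate n! divided by the factorial of each distinct item's count."""
    items = list(items)
    count = factorial(len(items))
    for multiplicity in Counter(items).values():
        count //= factorial(multiplicity)
    return count


expected = distinct_permutation_count('aabbc')    # 5! / (2! * 2! * 1!)
brute_force = len(set(permutations('aabbc')))     # generate all, then dedupe
print(expected, brute_force)
```

The brute-force version generates all 120 permutations before discarding duplicates, which is the work distinct_permutations() avoids.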

more_itertools.circular_shifts(iterable)[source]

Return a list of circular shifts of iterable.

>>> circular_shifts(range(4))
[(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2)]

Itertools recipes

more_itertools.powerset(iterable)[source]

Yields all possible subsets of the iterable.

>>> list(powerset([1,2,3]))
[(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
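
One way to produce this ordering is to chain together the combinations of every length from 0 up to the size of the input. A minimal sketch, with powerset_sketch as a hypothetical name rather than the library's source:

```python
from itertools import chain, combinations


def powerset_sketch(iterable):
    """Hypothetical stand-in for powerset()."""
    items = list(iterable)
    # Subsets of size 0, then size 1, and so on up to the full set.
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


subsets = list(powerset_sketch([1, 2, 3]))
print(subsets)
```

An input of n items yields 2**n subsets, so the output grows quickly with the input size.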

more_itertools.random_product(*args, **kwds)[source]

Draw an item at random from each of the input iterables.

>>> random_product('abc', range(4), 'XYZ')  
('c', 3, 'Z')

If repeat is provided as a keyword argument, that many items will be drawn from each iterable.

>>> random_product('abcd', range(4), repeat=2)  
('a', 2, 'd', 3)

This is equivalent to taking a random selection from itertools.product(*args, **kwds).
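
The equivalence can be illustrated without building the full product: picking one item at random from each pool gives a tuple that is always a member of the product. This is a sketch with the hypothetical name random_product_sketch, and it assumes repeat is the only keyword argument of interest:

```python
import random
from itertools import product


def random_product_sketch(*args, repeat=1):
    """Hypothetical stand-in for random_product()."""
    # Repeat the list of pools, mirroring itertools.product(..., repeat=n).
    pools = [tuple(pool) for pool in args] * repeat
    return tuple(random.choice(pool) for pool in pools)


draw = random_product_sketch('abcd', range(4), repeat=2)
all_products = set(product('abcd', range(4), repeat=2))
print(draw, draw in all_products)
```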

more_itertools.random_permutation(iterable, r=None)[source]

Return a random r length permutation of the elements in iterable.

If r is not specified or is None, then r defaults to the length of iterable.

>>> random_permutation(range(5))  
(3, 4, 0, 1, 2)

This is equivalent to taking a random selection from itertools.permutations(iterable, r).

more_itertools.random_combination(iterable, r)[source]

Return a random r length subsequence of the elements in iterable.

>>> random_combination(range(5), 3)  
(2, 3, 4)

This is equivalent to taking a random selection from itertools.combinations(iterable, r).

more_itertools.random_combination_with_replacement(iterable, r)[source]

Return a random r length subsequence of elements in iterable, allowing individual elements to be repeated.

>>> random_combination_with_replacement(range(3), 5) 
(0, 0, 1, 2, 2)

This is equivalent to taking a random selection from itertools.combinations_with_replacement(iterable, r).

more_itertools.nth_combination(iterable, r, index)[source]

Equivalent to list(combinations(iterable, r))[index].

The subsequences of iterable that are of length r can be ordered lexicographically. nth_combination() computes the subsequence at sort position index directly, without computing the previous subsequences.
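
To show how a combination can be located without enumerating its predecessors, here is a minimal sketch that counts how many combinations start with each candidate item and skips whole blocks at a time. The name nth_combination_sketch and the use of math.comb (Python 3.8+) are assumptions made for this illustration; the library may compute it differently:

```python
from itertools import combinations
from math import comb


def nth_combination_sketch(iterable, r, index):
    """Hypothetical stand-in for nth_combination()."""
    pool = tuple(iterable)
    n = len(pool)
    total = comb(n, r)
    if index < 0:
        index += total  # allow negative indexes, as with list indexing
    if not 0 <= index < total:
        raise IndexError('combination index out of range')

    result = []
    position = 0
    while r:
        # Number of combinations whose next item is pool[position].
        block = comb(n - position - 1, r - 1)
        if index < block:
            result.append(pool[position])
            r -= 1
        else:
            index -= block  # skip every combination in this block
        position += 1
    return tuple(result)


direct = nth_combination_sketch(range(5), 3, 5)
enumerated = list(combinations(range(5), 3))[5]
print(direct, enumerated)
```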

Wrapping

These tools provide wrappers to smooth working with objects that produce or consume iterables.


New itertools

more_itertools.always_iterable(obj, base_type=(<type 'unicode'>, <type 'str'>))[source]

If obj is iterable, return an iterator over its items:

>>> obj = (1, 2, 3)
>>> list(always_iterable(obj))
[1, 2, 3]

If obj is not iterable, return a one-item iterable containing obj:

>>> obj = 1
>>> list(always_iterable(obj))
[1]

If obj is None, return an empty iterable:

>>> obj = None
>>> list(always_iterable(obj))
[]

By default, binary and text strings are not considered iterable:

>>> obj = 'foo'
>>> list(always_iterable(obj))
['foo']

If base_type is set, objects for which isinstance(obj, base_type) returns True won’t be considered iterable.

>>> obj = {'a': 1}
>>> list(always_iterable(obj))  # Iterate over the dict's keys
['a']
>>> list(always_iterable(obj, base_type=dict))  # Treat dicts as a unit
[{'a': 1}]

Set base_type to None to avoid any special handling and treat objects Python considers iterable as iterable:

>>> obj = 'foo'
>>> list(always_iterable(obj, base_type=None))
['f', 'o', 'o']

more_itertools.consumer(func)[source]

Decorator that automatically advances a PEP-342-style “reverse iterator” to its first yield point so you don’t have to call next() on it manually.

>>> @consumer
... def tally():
...     i = 0
...     while True:
...         print('Thing number %s is %s.' % (i, (yield)))
...         i += 1
...
>>> t = tally()
>>> t.send('red')
Thing number 0 is red.
>>> t.send('fish')
Thing number 1 is fish.

Without the decorator, you would have to call next(t) before t.send() could be used.
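
The priming step can be sketched as a small decorator. consumer_sketch is a hypothetical name used for illustration, and this is one plausible way to write it rather than the library's source:

```python
from functools import wraps


def consumer_sketch(func):
    """Hypothetical stand-in for consumer(): prime a generator for send()."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first yield so send() works immediately
        return gen
    return wrapper


received = []


@consumer_sketch
def collector():
    while True:
        received.append((yield))


c = collector()
c.send('red')   # no explicit next(c) is needed first
c.send('fish')
print(received)
```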

more_itertools.with_iter(context_manager)[source]

Wrap an iterable in a with statement, so it closes once exhausted.

For example, this will close the file when the iterator is exhausted:

upper_lines = (line.upper() for line in with_iter(open('foo')))

Any context manager which returns an iterable is a candidate for with_iter.
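
The behavior described above can be sketched as a generator that keeps the with block open while items are being yielded. with_iter_sketch is a hypothetical name, and io.StringIO stands in for a real file so the example is self-contained:

```python
import io


def with_iter_sketch(context_manager):
    """Hypothetical stand-in for with_iter()."""
    with context_manager as iterable:
        # The with block stays open until the iterable is exhausted.
        yield from iterable


source = io.StringIO('foo\nbar\n')
upper_lines = [line.upper() for line in with_iter_sketch(source)]
print(upper_lines, source.closed)
```

In a sketch like this, the context manager exits only when the generator finishes or is closed, so an iterator that is abandoned partway through may keep the resource open longer than expected.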


Itertools recipes

more_itertools.iter_except(func, exception, first=None)[source]

Yields results from a function repeatedly until an exception is raised.

Converts a call-until-exception interface to an iterator interface. Like iter(func, sentinel), but uses an exception instead of a sentinel to end the loop.

>>> l = [0, 1, 2]
>>> list(iter_except(l.pop, IndexError))
[2, 1, 0]

Others

New itertools

more_itertools.numeric_range(start, stop, step)[source]

An extension of the built-in range() function whose arguments can be any orderable numeric type.

With only stop specified, start defaults to 0 and step defaults to 1. The output items will match the type of stop:

>>> list(numeric_range(3.5))
[0.0, 1.0, 2.0, 3.0]

With only start and stop specified, step defaults to 1. The output items will match the type of start:

>>> from decimal import Decimal
>>> start = Decimal('2.1')
>>> stop = Decimal('5.1')
>>> list(numeric_range(start, stop))
[Decimal('2.1'), Decimal('3.1'), Decimal('4.1')]

With start, stop, and step specified, the output items will match the type of start + step:

>>> from fractions import Fraction
>>> start = Fraction(1, 2)  # Start at 1/2
>>> stop = Fraction(5, 2)  # End at 5/2
>>> step = Fraction(1, 2)  # Count by 1/2
>>> list(numeric_range(start, stop, step))
[Fraction(1, 2), Fraction(1, 1), Fraction(3, 2), Fraction(2, 1)]

If step is zero, ValueError is raised. Negative steps are supported:

>>> list(numeric_range(3, -1, -1.0))
[3.0, 2.0, 1.0, 0.0]

Be aware of the limitations of floating point numbers; the representation of the yielded numbers may be surprising.

more_itertools.always_reversible(iterable)[source]

An extension of reversed() that supports all iterables, not just those which implement the Reversible or Sequence protocols.

>>> print(*always_reversible(x for x in range(3)))
2 1 0

If the iterable is already reversible, this function returns the result of reversed(). If the iterable is not reversible, this function will cache the remaining items in the iterable and yield them in reverse order, which may require significant storage.
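
The two cases can be sketched with a try/except around reversed(). always_reversible_sketch is a hypothetical name for illustration, not the library's source:

```python
def always_reversible_sketch(iterable):
    """Hypothetical stand-in for always_reversible()."""
    try:
        # Sequences and other reversible objects need no extra storage.
        return reversed(iterable)
    except TypeError:
        # Not reversible (for example, a generator): cache everything first.
        return reversed(list(iterable))


from_generator = list(always_reversible_sketch(x for x in range(3)))
from_list = list(always_reversible_sketch([0, 1, 2]))
print(from_generator, from_list)
```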

more_itertools.side_effect(func, iterable, chunk_size=None, before=None, after=None)[source]

Invoke func on each item in iterable (or on each chunk_size group of items) before yielding the item.

func must be a function that takes a single argument. Its return value will be discarded.

before and after are optional functions that take no arguments. They will be executed before iteration starts and after it ends, respectively.

side_effect can be used for logging, updating progress bars, or anything that is not functionally “pure.”

Emitting a status message:

>>> from more_itertools import consume
>>> func = lambda item: print('Received {}'.format(item))
>>> consume(side_effect(func, range(2)))
Received 0
Received 1

Operating on chunks of items:

>>> pair_sums = []
>>> func = lambda chunk: pair_sums.append(sum(chunk))
>>> list(side_effect(func, [0, 1, 2, 3, 4, 5], 2))
[0, 1, 2, 3, 4, 5]
>>> list(pair_sums)
[1, 5, 9]

Writing to a file-like object:

>>> from io import StringIO
>>> from more_itertools import consume
>>> f = StringIO()
>>> func = lambda x: print(x, file=f)
>>> before = lambda: print(u'HEADER', file=f)
>>> after = f.close
>>> it = [u'a', u'b', u'c']
>>> consume(side_effect(func, it, before=before, after=after))
>>> f.closed
True

more_itertools.iterate(func, start)[source]

Return start, func(start), func(func(start)), …

>>> from itertools import islice
>>> list(islice(iterate(lambda x: 2*x, 1), 10))
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

more_itertools.difference(iterable, func=operator.sub)[source]

By default, compute the first difference of iterable using operator.sub().

>>> iterable = [0, 1, 3, 6, 10]
>>> list(difference(iterable))
[0, 1, 2, 3, 4]

This is the opposite of accumulate()’s default behavior:

>>> from more_itertools import accumulate
>>> iterable = [0, 1, 2, 3, 4]
>>> list(accumulate(iterable))
[0, 1, 3, 6, 10]
>>> list(difference(accumulate(iterable)))
[0, 1, 2, 3, 4]

By default func is operator.sub(), but other functions can be specified. They will be applied as follows:

A, B, C, D, ... --> A, func(B, A), func(C, B), func(D, C), ...

For example, to do progressive division:

>>> iterable = [1, 2, 6, 24, 120]  # Factorial sequence
>>> func = lambda x, y: x // y
>>> list(difference(iterable, func))
[1, 2, 3, 4, 5]
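
The mapping A, B, C, D, ... --> A, func(B, A), func(C, B), func(D, C), ... can be sketched as a generator that remembers the previous item. difference_sketch is a hypothetical name used for illustration, not the library's source:

```python
import operator


def difference_sketch(iterable, func=operator.sub):
    """Hypothetical stand-in for difference()."""
    it = iter(iterable)
    try:
        previous = next(it)
    except StopIteration:
        return  # empty input yields nothing
    yield previous  # the first item is passed through unchanged
    for current in it:
        yield func(current, previous)
        previous = current


sums = list(difference_sketch([0, 1, 3, 6, 10]))
ratios = list(difference_sketch([1, 2, 6, 24, 120], lambda x, y: x // y))
print(sums, ratios)
```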

more_itertools.make_decorator(wrapping_func, result_index=0)[source]

Return a decorator version of wrapping_func, which is a function that modifies an iterable. result_index is the position in that function’s signature where the iterable goes.

This lets you use itertools on the “production end,” i.e. at function definition. This can augment what the function returns without changing the function’s code.

For example, to produce a decorator version of chunked():

>>> from more_itertools import chunked
>>> chunker = make_decorator(chunked, result_index=0)
>>> @chunker(3)
... def iter_range(n):
...     return iter(range(n))
...
>>> list(iter_range(9))
[[0, 1, 2], [3, 4, 5], [6, 7, 8]]

To only allow truthy items to be returned:

>>> truth_serum = make_decorator(filter, result_index=1)
>>> @truth_serum(bool)
... def boolean_test():
...     return [0, 1, '', ' ', False, True]
...
>>> list(boolean_test())
[1, ' ', True]

The peekable() and seekable() wrappers make for practical decorators:

>>> from more_itertools import peekable
>>> peekable_function = make_decorator(peekable)
>>> @peekable_function()
... def str_range(*args):
...     return (str(x) for x in range(*args))
...
>>> it = str_range(1, 20, 2)
>>> next(it), next(it), next(it)
('1', '3', '5')
>>> it.peek()
'7'
>>> next(it)
'7'
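
To make the role of result_index concrete, here is a minimal sketch of how such a decorator factory could be written. make_decorator_sketch and the example functions are hypothetical names for illustration, and the library's own code may differ:

```python
def make_decorator_sketch(wrapping_func, result_index=0):
    """Hypothetical stand-in for make_decorator()."""
    def decorator(*wrapping_args, **wrapping_kwargs):
        def outer_wrapper(f):
            def inner_wrapper(*args, **kwargs):
                result = f(*args, **kwargs)
                # Place the decorated function's result at result_index
                # among the arguments passed to wrapping_func.
                call_args = list(wrapping_args)
                call_args.insert(result_index, result)
                return wrapping_func(*call_args, **wrapping_kwargs)
            return inner_wrapper
        return outer_wrapper
    return decorator


# filter(function, iterable): the iterable is the second argument (index 1).
truth_serum = make_decorator_sketch(filter, result_index=1)


@truth_serum(bool)
def boolean_test():
    return [0, 1, '', ' ', False, True]


# enumerate(iterable, start): the iterable is the first argument (index 0).
numbered = make_decorator_sketch(enumerate)


@numbered(1)
def letters():
    return 'ab'


truthy = list(boolean_test())
pairs = list(letters())
print(truthy, pairs)
```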

class more_itertools.SequenceView(target)[source]

Return a read-only view of the sequence object target.

SequenceView objects are analogous to Python’s built-in “dictionary view” types. They provide a dynamic view of a sequence’s items, meaning that when the sequence updates, so does the view.

>>> seq = ['0', '1', '2']
>>> view = SequenceView(seq)
>>> view
SequenceView(['0', '1', '2'])
>>> seq.append('3')
>>> view
SequenceView(['0', '1', '2', '3'])

Sequence views support indexing, slicing, and length queries. They act like the underlying sequence, except they don’t allow assignment:

>>> view[1]
'1'
>>> view[1:-1]
['1', '2']
>>> len(view)
4

Sequence views are useful as an alternative to copying, as they don’t require (much) extra storage.


Itertools recipes

more_itertools.consume(iterator, n=None)[source]

Advance iterator by n steps. If n is None, consume it entirely.

Efficiently exhausts an iterator without returning values. Defaults to consuming the whole iterator, but an optional second argument may be provided to limit consumption.

>>> i = (x for x in range(10))
>>> next(i)
0
>>> consume(i, 3)
>>> next(i)
4
>>> consume(i)
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

If the iterator has fewer items remaining than the provided limit, the whole iterator will be consumed.

>>> i = (x for x in range(3))
>>> consume(i, 5)
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
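
Both modes can be sketched with standard-library tools that discard items without storing them. consume_sketch is a hypothetical name, and this follows the well-known pattern from the itertools documentation recipes rather than quoting the library's source:

```python
from collections import deque
from itertools import islice


def consume_sketch(iterator, n=None):
    """Hypothetical stand-in for consume()."""
    if n is None:
        # A zero-length deque pulls every item and keeps none of them.
        deque(iterator, maxlen=0)
    else:
        # An empty slice starting at position n advances past n items.
        next(islice(iterator, n, n), None)


i = iter(range(10))
first = next(i)
consume_sketch(i, 3)     # skips 1, 2, 3
after_skip = next(i)
consume_sketch(i)        # exhausts the rest
leftover = list(i)
print(first, after_skip, leftover)
```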

more_itertools.accumulate(iterable, func=operator.add)[source]

Return an iterator whose items are the accumulated results of a function (specified by the optional func argument) that takes two arguments. By default, returns accumulated sums with operator.add().

>>> import operator
>>> list(accumulate([1, 2, 3, 4, 5]))  # Running sum
[1, 3, 6, 10, 15]
>>> list(accumulate([1, 2, 3], func=operator.mul))  # Running product
[1, 2, 6]
>>> list(accumulate([0, 1, -1, 2, 3, 2], func=max))  # Running maximum
[0, 1, 1, 2, 3, 3]

This function is available in the itertools module for Python 3.2 and greater.

more_itertools.tabulate(function, start=0)[source]

Return an iterator over the results of function(start), function(start + 1), function(start + 2), …

function should be a function that accepts one integer argument.

If start is not specified it defaults to 0. It will be incremented each time the iterator is advanced.

>>> square = lambda x: x ** 2
>>> iterator = tabulate(square, -3)
>>> take(4, iterator)
[9, 4, 1, 0]

more_itertools.repeatfunc(func, times=None, *args)[source]

Call func with args repeatedly, returning an iterable over the results.

If times is specified, the iterable will terminate after that many repetitions:

>>> from operator import add
>>> times = 4
>>> args = 3, 5
>>> list(repeatfunc(add, times, *args))
[8, 8, 8, 8]

If times is None the iterable will not terminate:

>>> from random import randrange
>>> times = None
>>> args = 1, 11
>>> take(6, repeatfunc(randrange, times, *args))  
[2, 4, 8, 1, 8, 4]
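
The finite and infinite cases can both be sketched with itertools.starmap() over a repeated argument tuple. repeatfunc_sketch is a hypothetical name for illustration, not the library's source:

```python
from itertools import islice, repeat, starmap
from operator import add
from random import randrange


def repeatfunc_sketch(func, times=None, *args):
    """Hypothetical stand-in for repeatfunc()."""
    if times is None:
        # repeat(args) never ends, so neither does the result.
        return starmap(func, repeat(args))
    return starmap(func, repeat(args, times))


fixed = list(repeatfunc_sketch(add, 4, 3, 5))
# The infinite form must be bounded by the caller, here with islice().
rolls = list(islice(repeatfunc_sketch(randrange, None, 1, 11), 6))
print(fixed, rolls)
```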