Four Tricks for Comprehensions in Python

jieforest · posted 2013-9-20 11:51

The direct translation to a list comprehension is

matrix = [[i * j for j in range(1, 5)] for i in range(1, 4)]


That is about as good as we can make it. No itertools magic will make this significantly better, as it's really just two list comprehensions that, by themselves, are already as simple as they can get.
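For comparison, here is a sketch of the explicit nested loop that the comprehension stands in for. The loop is a reconstruction for illustration only (matrix_loop is a made-up name); it builds the same 3x4 multiplication table row by row.

```python
# Explicit nested loops (reconstructed for illustration; matrix_loop is a hypothetical name)
matrix_loop = []
for i in range(1, 4):
    row = []
    for j in range(1, 5):
        row.append(i * j)
    matrix_loop.append(row)

# The comprehension from the text
matrix = [[i * j for j in range(1, 5)] for i in range(1, 4)]

print(matrix_loop == matrix)  # True
print(matrix)  # [[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]
```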

However, what if we wanted to transpose the matrix? Here's how:

transposed = [list(row) for row in zip(*matrix)]


We need to pass row to list() because zip() returns an iterator of tuples, not lists (if tuples are fine for you, list(zip(*matrix)) is enough). And what's with the *? It unpacks matrix, so that zip() does not receive the matrix as a single argument but its elements (that is, its rows) as separate arguments. (I assume you know zip(); if not, see the documentation.)

As I have said before, in cases like this, we could use map() and write

transposed = list(map(list, zip(*matrix)))


While that is very succinct, I find the list comprehension more readable.
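To see what the * does, it can help to spell out the call by hand. A small sketch, assuming the 3x4 matrix built earlier (written out literally here so the snippet runs on its own):

```python
matrix = [[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]

# zip(*matrix) is the same as passing each row as its own argument
unpacked = list(zip(*matrix))
by_hand = list(zip(matrix[0], matrix[1], matrix[2]))
print(unpacked == by_hand)  # True
print(unpacked)  # [(1, 2, 3), (2, 4, 6), (3, 6, 9), (4, 8, 12)]

# Both spellings from the text give the same list of lists
with_comprehension = [list(row) for row in zip(*matrix)]
with_map = list(map(list, zip(*matrix)))
print(with_comprehension == with_map)  # True
```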

Note that if you are working with large matrices, or if speed is an issue, or if you are doing anything mathematically interesting, you really should be using numpy.


3 - Grouping Data

I admit it: I have Gentoo installed. On most machines I use Fedora these days, but there's still one computer I use that's running on Gentoo. For some strange reason I enjoy tinkering with it. But enough of that – it's just a prelude for the example we're going to examine. In Gentoo, there's a file that lists all the installed packages (minus the dependencies), and that is /var/lib/portage/world. It looks like this (not my actual world file):

app-editors/vim
dev-lang/python
dev-lang/ruby
dev-python/cython
dev-python/pyatspi
dev-python/pygments
dev-python/setuptools
dev-python/virtualenv
dev-util/ccache
dev-util/cunit
dev-util/meld
dev-util/perf
dev-vcs/bzr
dev-vcs/tig
dev-vcs/git
[...snip...]


and so on. Every package has a category, and the file is neatly ordered. That is important: for what we are going to do now, all packages of a category must sit on adjacent lines, and sorting guarantees that. What I want is to construct a dictionary from this file that looks like this:

{
    "app-editors": [
        "app-editors/vim"
    ],
    "dev-lang": [
        "dev-lang/python",
        "dev-lang/ruby"
    ],
    "dev-python": [
        "dev-python/cython",
        "dev-python/pyatspi",
        "dev-python/pygments",
        "dev-python/setuptools",
        "dev-python/virtualenv"
    ],
    "dev-util": [
        "dev-util/ccache",
        "dev-util/cunit",
        "dev-util/meld",
        "dev-util/perf"
    ],
    "dev-vcs": [
        "dev-vcs/bzr",
        "dev-vcs/tig",
        "dev-vcs/git"
    ],
    # ... snip ...
}


Without further explanation, here's the code:

from itertools import groupby

def get_category(line):
    # "dev-lang/python" -> "dev-lang"
    return line.split("/")[0]

with open("/var/lib/portage/world") as worldfile:
    # consume groupby() while the file is open; lines keep their trailing newline
    packages = {category: list(packages) for category, packages in groupby(worldfile, get_category)}


As you probably noticed, we used a dictionary comprehension. It works just like a list comprehension, with the difference that we don't only add elements, but also keys for them. That last line is a bit long because the words are long, but I do believe it is quite readable (and would not get more readable if we shortened the names).
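If dictionary comprehensions are new to you, a smaller sketch may help. The word list is made up for illustration:

```python
words = ["vim", "python", "ruby"]

# key: value pairs take the place of the single element in a list comprehension
lengths = {word: len(word) for word in words}
print(lengths)  # {'vim': 3, 'python': 6, 'ruby': 4}
```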


The other new gimmick we used is itertools.groupby(). The official documentation is rather complicated and I didn't get it at first, but our example shows the gist of it: groupby() takes an iterable (in our case the open worldfile) and returns an iterator of (key, group) pairs, where each group is itself an iterable (here, the packages of one category).

The grouping is done with a function we pass to groupby() (our function get_category). That function is called on every element in the iterable passed to groupby() (that is: for every line in worldfile), and the return values of that function are the keys groupby() uses to build its groups.

One of the nice things about groupby() is that it does not build the groups in memory up front; it consumes the underlying iterator lazily as we build the dictionary.
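Why the ordering of the input matters is easiest to see with a deliberately shuffled list. The sample lines below are hypothetical and stand in for the world file, so the sketch runs without Gentoo:

```python
from itertools import groupby

def get_category(line):
    return line.split("/")[0]

# Hypothetical sample lines; "dev-lang" appears in two separate runs
lines = ["dev-lang/python", "dev-vcs/git", "dev-lang/ruby"]

# groupby() only merges neighbouring items with equal keys
runs = [(key, list(group)) for key, group in groupby(lines, get_category)]
print(runs)
# [('dev-lang', ['dev-lang/python']), ('dev-vcs', ['dev-vcs/git']), ('dev-lang', ['dev-lang/ruby'])]

# In a dict comprehension the later run silently replaces the earlier one
unsorted_result = {key: list(group) for key, group in groupby(lines, get_category)}
print(unsorted_result)
# {'dev-lang': ['dev-lang/ruby'], 'dev-vcs': ['dev-vcs/git']}

# Sorting by the same key first restores the expected grouping
sorted_result = {key: list(group)
                 for key, group in groupby(sorted(lines, key=get_category), get_category)}
print(sorted_result)
# {'dev-lang': ['dev-lang/python', 'dev-lang/ruby'], 'dev-vcs': ['dev-vcs/git']}
```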

4 - Breaking the Loop

This is going to take a bit longer to explain, and could very well be considered a hack, but it's cool and maybe even useful.

Imagine a sequence of words. Again, we want to iterate over that sequence and get the first character of each word, but this time we stop once we find the word “and”. With an explicit for loop, that might look like this:

text = "I can't tell the difference between Whizzo butter and this dead crab."
first_chars = []
for word in text.split():
    if word != "and":
        first_chars.append(word[0])
    else:
        break

That works just fine:

>>> first_chars
['I', 'c', 't', 't', 'd', 'b', 'W', 'b']

How about a list comprehension? Let's try it:

# raises SyntaxError
first_chars = [word[0] for word in text.split() if word != "and" else break]


Unfortunately, that is illegal syntax, because the if clause of a list comprehension does not permit an else. That's because it is a “special” kind of if, meant as a filter. But in our case the if is not really a filter, rather something else. So we could try the other kind of if, the conditional expression (the “ternary operator”), in place of the comprehension's filter clause. And on we go:

# raises SyntaxError
first_chars = [word[0] if word != "and" else break for word in text.split()]


Again, that raises a SyntaxError. This time, the if/else is correct, but unfortunately the break statement is not allowed in a comprehension or generator expression. We need to dig deeper – how does an iterator signal that it is done and that there are no more elements left? Let's try to find out by constructing a simple example.

>>> a_list = [1, 2, 3]
>>> a_iter = iter(a_list)
>>> next(a_iter)
1
>>> next(a_iter)
2
>>> next(a_iter)
3
>>> next(a_iter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

So there we are: when an iterator is done, it simply raises a StopIteration exception. In fact, Python's for loop is just syntactic sugar. This for loop

text = "My hovercraft is full of eels."
first_chars = []
for word in text.split():
    first_chars.append(word[0])

is shorthand for this:

text = "My hovercraft is full of eels."
first_chars = []
try:
    text_iter = iter(text.split())
    while True:
        word = next(text_iter)
        first_chars.append(word[0])
except StopIteration:
    pass


The for loop does the try/except magic behind our backs, but the difference between those two loops is really just syntax.
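The same protocol applies to iterators you write yourself. A minimal sketch, using a hypothetical Countdown class that is not part of the original article: its __next__() method raises StopIteration when it runs out, and the for machinery (here inside a list comprehension) catches that for us.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that the iterator is exhausted
        value = self.current
        self.current -= 1
        return value

countdown_values = [n for n in Countdown(3)]
print(countdown_values)  # [3, 2, 1]
```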

Can we raise a StopIteration in our comprehensions or generator expressions? Not directly, because raise is a statement, but only expressions are allowed in comprehensions. A function call is an expression, though, so let's define a little function to raise StopIteration:

def stop():
    raise StopIteration

See if it works:

>>> first_chars = [word[0] if word != "and" else stop() for word in text.split()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 2, in stop
StopIteration

No. Strange. But how about a generator expression we pass to list()?

>>> first_chars = list(word[0] if word != "and" else stop() for word in text.split())
>>> first_chars
['I', 'c', 't', 't', 'd', 'b', 'W', 'b']


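That works! The reason the former does not work is really something of a bug; the Python developer Nick Coghlan explained why in a mail to the python-ideas mailing list. Be aware that the trick depends on the Python version: since Python 3.7 (PEP 479), a StopIteration raised inside a generator is converted into a RuntimeError, so the list() call above fails there instead of stopping quietly.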
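Where stop() is no longer usable (Python 3.7 and later turn the stray StopIteration into a RuntimeError), itertools.takewhile from the standard library gives the same early exit. A minimal sketch, not taken from the original article:

```python
from itertools import takewhile

text = "I can't tell the difference between Whizzo butter and this dead crab."

# takewhile() yields words until the predicate is false for the first time
first_chars = [word[0] for word in takewhile(lambda w: w != "and", text.split())]
print(first_chars)  # ['I', 'c', 't', 't', 'd', 'b', 'W', 'b']
```

Unlike a plain if filter, takewhile() stops consuming the input at the first “and” and never looks at the remaining words.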

If one wanted to move the stop expression to the end of the generator expression, one could use this function instead:

def stopif(expr):
    if expr:
        raise StopIteration
    return True
