Original poster: jieforest

Four Tricks for Comprehensions in Python

Posted by the original poster on 2013-9-20 11:51
The direct translation to a list comprehension is:

```python
matrix = [[i * j for j in range(1, 5)] for i in range(1, 4)]
```
That is about as good as we can make it. No itertools magic will make this significantly better, as it's really just two list comprehensions that, by themselves, are already as simple as they can get.
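For comparison, here is a sketch of the explicit nested loops that such a comprehension replaces. The loop version is a reconstruction for illustration, and the names `matrix_loops` and `row` are hypothetical. Both versions build the same 3-by-4 matrix:

```python
# Explicit nested loops: the outer loop builds one row per i,
# the inner loop fills that row with i * j for j = 1..4.
matrix_loops = []
for i in range(1, 4):
    row = []
    for j in range(1, 5):
        row.append(i * j)
    matrix_loops.append(row)

# The equivalent nested list comprehension.
matrix = [[i * j for j in range(1, 5)] for i in range(1, 4)]

print(matrix)  # [[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]
```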

However, what if we wanted to transpose the matrix? Here's how:
```python
transposed = [list(row) for row in zip(*matrix)]
```

We need to pass each row to list() because zip() returns an iterator of tuples, not lists (if you don't mind tuples, just use zip(*matrix)). And what's with the *? It unpacks matrix, so that zip() does not receive the matrix itself but its elements (that is, its rows) as separate arguments. (I assume you know zip(); if not, see the documentation.)
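A small sketch of what the * does, reusing the 3-by-4 matrix from above: zip(*matrix) is the same call as passing each row explicitly, and every tuple it yields collects one column. Wrapping each tuple in list() then gives the transposed matrix:

```python
matrix = [[i * j for j in range(1, 5)] for i in range(1, 4)]

# zip(*matrix) passes the rows as separate arguments, so it is the
# same call as zip(matrix[0], matrix[1], matrix[2]).
assert list(zip(*matrix)) == list(zip(matrix[0], matrix[1], matrix[2]))

# zip() yields tuples; each one holds the n-th element of every row.
print(list(zip(*matrix)))
# [(1, 2, 3), (2, 4, 6), (3, 6, 9), (4, 8, 12)]

# Converting each tuple with list() gives the transposed matrix as lists.
transposed = [list(row) for row in zip(*matrix)]
print(transposed)
# [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]
```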

As I have said before, in cases like this we could also use map() and write:

```python
transposed = list(map(list, zip(*matrix)))
```
While that is very succinct, I find the list comprehension more readable.
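Both spellings produce the same result. A quick sketch to check this; note that in Python 3 map() returns a lazy map object rather than a list, which is why the map() version needs the outer list() call:

```python
matrix = [[i * j for j in range(1, 5)] for i in range(1, 4)]

# List comprehension: each tuple produced by zip() becomes a list.
via_comprehension = [list(row) for row in zip(*matrix)]

# map() is lazy in Python 3, so wrap it in list() to materialize it.
lazy = map(list, zip(*matrix))
via_map = list(lazy)

print(via_comprehension == via_map)  # True
```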

Note that if you are working with large matrices, or if speed is an issue, or if you are doing anything mathematically interesting, you really should be using numpy.

3 - Grouping Data

I admit it: I have Gentoo installed. On most machines I use Fedora these days, but there's still one computer I use that runs Gentoo. For some strange reason I enjoy tinkering with it. But enough of that; it's just a prelude to the example we're going to examine. In Gentoo, there's a file that lists all the installed packages (minus their dependencies): /var/lib/portage/world. It looks like this (not my actual world file):
```
app-editors/vim
dev-lang/python
dev-lang/ruby
dev-python/cython
dev-python/pyatspi
dev-python/pygments
dev-python/setuptools
dev-python/virtualenv
dev-util/ccache
dev-util/cunit
dev-util/meld
dev-util/perf
dev-vcs/bzr
dev-vcs/tig
dev-vcs/git
[...snip...]
```

and so on. Every package has a category, and the file is neatly ordered by it. That is important: the input data needs to be sorted for what we are going to do now (strictly speaking, all packages of one category must be adjacent). What I want is to construct a dictionary from this file that looks like this:
```python
{
    "app-editors": [
        "app-editors/vim"
    ],
    "dev-lang": [
        "dev-lang/python",
        "dev-lang/ruby"
    ],
    "dev-python": [
        "dev-python/cython",
        "dev-python/pyatspi",
        "dev-python/pygments",
        "dev-python/setuptools",
        "dev-python/virtualenv"
    ],
    "dev-util": [
        "dev-util/ccache",
        "dev-util/cunit",
        "dev-util/meld",
        "dev-util/perf"
    ],
    "dev-vcs": [
        "dev-vcs/bzr",
        "dev-vcs/tig",
        "dev-vcs/git"
    ],
    # ... snip ...
}
```

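Without further explanation, here's the code:

```python
from itertools import groupby

def get_category(line):
    # "dev-lang/python" -> "dev-lang"
    return line.split("/")[0]

with open("/var/lib/portage/world") as worldfile:
    # Strip the trailing newline from each line so the stored package
    # names match the dictionary shown above.
    lines = (line.strip() for line in worldfile)
    packages = {category: list(group) for category, group in groupby(lines, get_category)}
```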
As you probably noticed, we used a dictionary comprehension. It works just like a list comprehension, except that each entry consists of a key and a value, written key: value. That last line is a bit long because the names are long, but I do believe it is quite readable (and it would not get more readable if we shortened the names).
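To make the key: value part concrete, here is a sketch that builds the same dictionary twice, once with the comprehension and once with an explicit loop. The four sample lines are hypothetical stand-ins for the contents of /var/lib/portage/world, so the example runs without a Gentoo system:

```python
from itertools import groupby

def get_category(line):
    return line.split("/")[0]

# Hypothetical sample lines standing in for /var/lib/portage/world.
lines = ["app-editors/vim", "dev-lang/python", "dev-lang/ruby", "dev-vcs/git"]

# Dictionary comprehension: "key: value" instead of a single element.
by_comprehension = {category: list(group) for category, group in groupby(lines, get_category)}

# The same dictionary built with an explicit loop.
by_loop = {}
for category, group in groupby(lines, get_category):
    by_loop[category] = list(group)

print(by_comprehension)
# {'app-editors': ['app-editors/vim'], 'dev-lang': ['dev-lang/python', 'dev-lang/ruby'], 'dev-vcs': ['dev-vcs/git']}
```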

The other new gimmick we used is itertools.groupby(). The official documentation is rather complicated and I didn't get it at first, but our example shows the gist of it: groupby() takes an iterable (in our case the lines of worldfile) and returns an iterator of (key, group) pairs, where each group is itself an iterator over consecutive elements that share that key (here: the packages of one category).

The grouping is done with a function we pass to groupby() (our function get_category). That function is called on every element of the iterable passed to groupby() (that is, on every line of worldfile), and its return values are the keys groupby() uses to build its groups. A new group starts whenever the key changes, which is why the input has to be sorted by category.

One of the nice things about groupby() is that it does not build intermediate lists of its own: it consumes the file lazily, line by line, while we build the dictionary. The only lists created are the ones we make explicitly with list().
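To see why the sorting matters, here is a sketch with a hypothetical three-line input in which the dev-lang packages are not adjacent. groupby() emits two separate dev-lang groups, the dictionary comprehension silently keeps only the last one, and sorting with the same key function first restores one group per category:

```python
from itertools import groupby

def get_category(line):
    return line.split("/")[0]

# Hypothetical input in which the dev-lang packages are NOT adjacent.
unsorted = ["dev-lang/python", "dev-vcs/git", "dev-lang/ruby"]

groups = [(key, list(group)) for key, group in groupby(unsorted, get_category)]
print(groups)
# [('dev-lang', ['dev-lang/python']), ('dev-vcs', ['dev-vcs/git']), ('dev-lang', ['dev-lang/ruby'])]

# In a dict comprehension the second 'dev-lang' group overwrites the first.
as_dict = {key: list(group) for key, group in groupby(unsorted, get_category)}
print(as_dict)
# {'dev-lang': ['dev-lang/ruby'], 'dev-vcs': ['dev-vcs/git']}

# Sorting by the same key first gives one group per category.
ordered = sorted(unsorted, key=get_category)
fixed = {key: list(group) for key, group in groupby(ordered, get_category)}
print(fixed)
# {'dev-lang': ['dev-lang/python', 'dev-lang/ruby'], 'dev-vcs': ['dev-vcs/git']}
```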

4 - Breaking the Loop

This is going to take a bit longer to explain, and could very well be considered a hack, but it's cool and maybe even useful.

Imagine a sequence of words. Again, we want to iterate over that sequence and collect the first character of each word, but this time we stop as soon as we reach the word “and”. With an explicit for loop, that might look like this:
```python
text = "I can't tell the difference between Whizzo butter and this dead crab."
first_chars = []
for word in text.split():
    if word != "and":
        first_chars.append(word[0])
    else:
        break
```

That works just fine:

```python
>>> first_chars
['I', 'c', 't', 't', 'd', 'b', 'W', 'b']
```

How about a list comprehension? Let's try it:

```python
# raises SyntaxError
first_chars = [word[0] for word in text.split() if word != "and" else break]
```

Unfortunately, that is illegal syntax, because the if clause of a list comprehension does not permit an else. It is a “special” kind of if, meant as a filter that decides which elements are included. In our case, though, the if is not really a filter but something else. So instead of the comprehension's filtering if, we could try a conditional expression (Python's ternary if/else). And on we go:
```python
# raises SyntaxError
first_chars = [word[0] if word != "and" else break for word in text.split()]
```
Again, that raises a SyntaxError. This time the if/else is correct, but unfortunately the break statement is not allowed in a comprehension or generator expression. We need to dig deeper: how does an iterator signal that it is done and that there are no more elements left? Let's find out by constructing a simple example:
```python
>>> a_list = [1, 2, 3]
>>> a_iter = iter(a_list)
>>> next(a_iter)
1
>>> next(a_iter)
2
>>> next(a_iter)
3
>>> next(a_iter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```

So there we are: when an iterator is exhausted, it simply raises the exception StopIteration. In fact, Python's for loop is little more than syntactic sugar around this protocol. This for loop
```python
text = "My hovercraft is full of eels."
first_chars = []
for word in text.split():
    first_chars.append(word[0])
```

is shorthand for this:

```python
text = "My hovercraft is full of eels."
first_chars = []
try:
    text_iter = iter(text.split())
    while True:
        word = next(text_iter)
        first_chars.append(word[0])
except StopIteration:
    pass
```

The for loop does the try/except magic behind our backs, but the difference between those two loops is really just syntax.
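One subtlety: the desugared version wraps the whole loop body in the try block, so a StopIteration raised by the body itself would also end the loop silently. A slightly more faithful sketch guards only the next() call. This is an approximation for illustration, not CPython's actual implementation of for:

```python
text = "My hovercraft is full of eels."

# The ordinary for loop.
first_chars_for = []
for word in text.split():
    first_chars_for.append(word[0])

# Closer desugaring: only next() is guarded, so a StopIteration raised
# inside the body would propagate, just as it does in a real for loop.
first_chars_while = []
text_iter = iter(text.split())
while True:
    try:
        word = next(text_iter)
    except StopIteration:
        break
    first_chars_while.append(word[0])

print(first_chars_while)  # ['M', 'h', 'i', 'f', 'o', 'e']
```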

Can we raise a StopIteration in our comprehensions or generator expressions? Not directly, because raise is a statement, but only expressions are allowed in comprehensions. A function call is an expression, though, so let's define a little function to raise StopIteration:
```python
def stop():
    raise StopIteration
```

See if it works:

```python
>>> first_chars = [word[0] if word != "and" else stop() for word in text.split()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 2, in stop
StopIteration
```

No. Strange. But how about a generator expression we pass to list()?
```python
>>> first_chars = list(word[0] if word != "and" else stop() for word in text.split())
>>> first_chars
['I', 'c', 't', 't', 'd', 'b', 'W', 'b']
```
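That works! The reason the list comprehension version does not work is really something of a bug; Python developer Nick Coghlan explained why in a mail to the python-ideas mailing list. Be aware, however, that this behaviour has since changed: as of Python 3.7 (PEP 479), a StopIteration raised inside a generator expression is converted into a RuntimeError, so on current Python versions the generator expression version fails as well.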
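A sketch of how this plays out on a current interpreter (Python 3.7 or later assumed), followed by itertools.takewhile(), a standard-library alternative that stops at the first word for which a predicate is false, without raising anything by hand:

```python
from itertools import takewhile

text = "I can't tell the difference between Whizzo butter and this dead crab."

def stop():
    raise StopIteration

# On Python 3.7+ the StopIteration escaping the generator expression
# is turned into a RuntimeError instead of quietly ending iteration.
try:
    list(word[0] if word != "and" else stop() for word in text.split())
except RuntimeError as exc:
    print("RuntimeError:", exc)  # RuntimeError: generator raised StopIteration

# takewhile() yields words until the predicate first returns False.
first_chars = [word[0] for word in takewhile(lambda w: w != "and", text.split())]
print(first_chars)  # ['I', 'c', 't', 't', 'd', 'b', 'W', 'b']
```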

If one wanted to move the stop expression to the end of the generator expression, one could use this function instead:

```python
def stopif(expr):
    if expr:
        raise StopIteration
    return True
```
