Dive Into Python (Chinese edition: Python 研究)

lastwinner · posted 2006-7-19 00:40

15.2. Handling changing requirements
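No matter how hard you work to analyze your customers, or how much midnight oil you burn distilling precise requirements from them, requirements will keep changing. Most customers don't know what they want until they see the product. Even if they do, they aren't good at articulating what they want precisely enough to be useful. And even if they can, they will certainly want more features in the next release. So be prepared to update your test cases as requirements change.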

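Suppose you want to expand the range of the Roman numeral conversion functions. Remember the rule that no character can be repeated more than three times? Well, the Romans now want an exception to that rule: 4 M characters in a row to represent 4000. If you make this change, you can expand the range of convertible numbers from 1..3999 to 1..4999. But first, you need to make some changes to the test cases.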

Example 15.6. Modifying test cases for new requirements (romantest71.py)
This file is available in py/roman/stage7/ in the examples directory.

If you have not already done so, you can download this and other examples used in this book.

[PHP]
import roman71
import unittest

class KnownValues(unittest.TestCase):
    knownValues = ( (1, 'I'),
                    (2, 'II'),
                    (3, 'III'),
                    (4, 'IV'),
                    (5, 'V'),
                    (6, 'VI'),
                    (7, 'VII'),
                    (8, 'VIII'),
                    (9, 'IX'),
                    (10, 'X'),
                    (50, 'L'),
                    (100, 'C'),
                    (500, 'D'),
                    (1000, 'M'),
                    (31, 'XXXI'),
                    (148, 'CXLVIII'),
                    (294, 'CCXCIV'),
                    (312, 'CCCXII'),
                    (421, 'CDXXI'),
                    (528, 'DXXVIII'),
                    (621, 'DCXXI'),
                    (782, 'DCCLXXXII'),
                    (870, 'DCCCLXX'),
                    (941, 'CMXLI'),
                    (1043, 'MXLIII'),
                    (1110, 'MCX'),
                    (1226, 'MCCXXVI'),
                    (1301, 'MCCCI'),
                    (1485, 'MCDLXXXV'),
                    (1509, 'MDIX'),
                    (1607, 'MDCVII'),
                    (1754, 'MDCCLIV'),
                    (1832, 'MDCCCXXXII'),
                    (1993, 'MCMXCIII'),
                    (2074, 'MMLXXIV'),
                    (2152, 'MMCLII'),
                    (2212, 'MMCCXII'),
                    (2343, 'MMCCCXLIII'),
                    (2499, 'MMCDXCIX'),
                    (2574, 'MMDLXXIV'),
                    (2646, 'MMDCXLVI'),
                    (2723, 'MMDCCXXIII'),
                    (2892, 'MMDCCCXCII'),
                    (2975, 'MMCMLXXV'),
                    (3051, 'MMMLI'),
                    (3185, 'MMMCLXXXV'),
                    (3250, 'MMMCCL'),
                    (3313, 'MMMCCCXIII'),
                    (3408, 'MMMCDVIII'),
                    (3501, 'MMMDI'),
                    (3610, 'MMMDCX'),
                    (3743, 'MMMDCCXLIII'),
                    (3844, 'MMMDCCCXLIV'),
                    (3888, 'MMMDCCCLXXXVIII'),
                    (3940, 'MMMCMXL'),
                    (3999, 'MMMCMXCIX'),
                    (4000, 'MMMM'),
                    (4500, 'MMMMD'),
                    (4888, 'MMMMDCCCLXXXVIII'),
                    (4999, 'MMMMCMXCIX'))

    def testToRomanKnownValues(self):
        """toRoman should give known result with known input"""
        for integer, numeral in self.knownValues:
            result = roman71.toRoman(integer)
            self.assertEqual(numeral, result)

    def testFromRomanKnownValues(self):
        """fromRoman should give known result with known input"""
        for integer, numeral in self.knownValues:
            result = roman71.fromRoman(numeral)
            self.assertEqual(integer, result)

class ToRomanBadInput(unittest.TestCase):
    def testTooLarge(self):
        """toRoman should fail with large input"""
        self.assertRaises(roman71.OutOfRangeError, roman71.toRoman, 5000)

    def testZero(self):
        """toRoman should fail with 0 input"""
        self.assertRaises(roman71.OutOfRangeError, roman71.toRoman, 0)

    def testNegative(self):
        """toRoman should fail with negative input"""
        self.assertRaises(roman71.OutOfRangeError, roman71.toRoman, -1)

    def testNonInteger(self):
        """toRoman should fail with non-integer input"""
        self.assertRaises(roman71.NotIntegerError, roman71.toRoman, 0.5)

class FromRomanBadInput(unittest.TestCase):
    def testTooManyRepeatedNumerals(self):
        """fromRoman should fail with too many repeated numerals"""
        for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
            self.assertRaises(roman71.InvalidRomanNumeralError, roman71.fromRoman, s)

    def testRepeatedPairs(self):
        """fromRoman should fail with repeated pairs of numerals"""
        for s in ('CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV'):
            self.assertRaises(roman71.InvalidRomanNumeralError, roman71.fromRoman, s)

    def testMalformedAntecedent(self):
        """fromRoman should fail with malformed antecedents"""
        for s in ('IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV',
                  'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'):
            self.assertRaises(roman71.InvalidRomanNumeralError, roman71.fromRoman, s)

    def testBlank(self):
        """fromRoman should fail with blank string"""
        self.assertRaises(roman71.InvalidRomanNumeralError, roman71.fromRoman, "")

class SanityCheck(unittest.TestCase):
    def testSanity(self):
        """fromRoman(toRoman(n))==n for all n"""
        for integer in range(1, 5000):
            numeral = roman71.toRoman(integer)
            result = roman71.fromRoman(numeral)
            self.assertEqual(integer, result)

class CaseCheck(unittest.TestCase):
    def testToRomanCase(self):
        """toRoman should always return uppercase"""
        for integer in range(1, 5000):
            numeral = roman71.toRoman(integer)
            self.assertEqual(numeral, numeral.upper())

    def testFromRomanCase(self):
        """fromRoman should only accept uppercase input"""
        for integer in range(1, 5000):
            numeral = roman71.toRoman(integer)
            roman71.fromRoman(numeral.upper())
            self.assertRaises(roman71.InvalidRomanNumeralError,
                              roman71.fromRoman, numeral.lower())

if __name__ == "__main__":
    unittest.main()
[/PHP]

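The existing known values don't change (they are all still reasonable values to test), but you need to add a few more in the 4000 range. Here I've included 4000 (the shortest), 4500 (the second shortest), 4888 (the longest), and 4999 (the largest).
The definition of "large input" has changed. This test used to call toRoman with 4000 and expect an error; now that 4000-4999 are valid inputs, you need to raise this to 5000.
The definition of "too many repeated numerals" has also changed. This test used to call fromRoman with 'MMMM' and expect an error; now that MMMM is considered a valid Roman numeral, you need to raise this to 'MMMMM'.
The sanity check and the case checks used to loop through every number from 1 to 3999. Now that the range has expanded, these for loops need to go up to 4999 as well.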
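All three boundary changes above follow from the new maximum. As a rough sketch (the helper `boundary_inputs` is hypothetical, not part of romantest71.py), you can derive the first out-of-range integer, the shortest illegal run of M, and the loop range from one number, which makes it easy to see why 4000 and 'MMMM' became 5000 and 'MMMMM':

```python
def boundary_inputs(max_value):
    """Hypothetical helper: derive the edge-case test inputs from the
    largest value the converter accepts (e.g. 3999 or 4999).

    Assumes thousands are written purely as repeated M characters.
    """
    # The first integer toRoman must reject.
    first_too_large = max_value + 1
    # The longest legal run of M is max_value // 1000 characters,
    # so one more M must be rejected by fromRoman.
    too_many_ms = "M" * (max_value // 1000 + 1)
    # Every value the sanity and case checks should loop over.
    loop_range = range(1, max_value + 1)
    return first_too_large, too_many_ms, loop_range
```

For example, `boundary_inputs(3999)` gives `(4000, 'MMMM', range(1, 4000))`, the values the old tests used, while `boundary_inputs(4999)` gives `(5000, 'MMMMM', range(1, 5000))`, matching the updated tests.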

Now your test cases are up to date with the new requirements, but your code is not, so you should expect several of the test cases to fail.

Example 15.7. Output of romantest71.py against roman71.py

fromRoman should only accept uppercase input ... ERROR
toRoman should always return uppercase ... ERROR
fromRoman should fail with blank string ... ok
fromRoman should fail with malformed antecedents ... ok
fromRoman should fail with repeated pairs of numerals ... ok
fromRoman should fail with too many repeated numerals ... ok
fromRoman should give known result with known input ... ERROR
toRoman should give known result with known input ... ERROR
fromRoman(toRoman(n))==n for all n ... ERROR
toRoman should fail with non-integer input ... ok
toRoman should fail with negative input ... ok
toRoman should fail with large input ... ok
toRoman should fail with 0 input ... ok
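Our case checks now fail because they loop from 1 to 4999, but toRoman only accepts numbers from 1 to 3999, so they fail as soon as the loop reaches 4000.
The fromRoman known values test fails as soon as it reaches 'MMMM', because fromRoman still considers this an invalid Roman numeral.
The toRoman known values test fails as soon as it reaches 4000, because toRoman still considers this out of range.
The sanity check also fails as soon as it reaches 4000, because toRoman still considers this out of range.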
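All five errors trace back to the two guards in roman71.py that this section is about to relax. The sketch below mirrors them in isolation, assuming the old range check was `0 < n < 4000` and the old pattern had three optional Ms (one fewer than the roman72.py pattern); the function names are hypothetical:

```python
import re

# Assumed roman71.py pattern: only three optional M characters.
OLD_PATTERN = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'


def old_to_roman_accepts(n):
    """Mirror of the old range check at the top of toRoman."""
    return 0 < n < 4000


def old_from_roman_accepts(s):
    """Mirror of the old regular-expression check in fromRoman."""
    return re.search(OLD_PATTERN, s) is not None
```

Both checks accept everything up to 3999 ('MMMCMXCIX') and reject the first new value (4000 or 'MMMM'), which is exactly where the failing tests stop.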

======================================================================
ERROR: fromRoman should only accept uppercase input
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 161, in testFromRomanCase
    numeral = roman71.toRoman(integer)
  File "roman71.py", line 28, in toRoman
    raise OutOfRangeError, "number out of range (must be 1..3999)"
OutOfRangeError: number out of range (must be 1..3999)
======================================================================
ERROR: toRoman should always return uppercase
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 155, in testToRomanCase
    numeral = roman71.toRoman(integer)
  File "roman71.py", line 28, in toRoman
    raise OutOfRangeError, "number out of range (must be 1..3999)"
OutOfRangeError: number out of range (must be 1..3999)
======================================================================
ERROR: fromRoman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 102, in testFromRomanKnownValues
    result = roman71.fromRoman(numeral)
  File "roman71.py", line 47, in fromRoman
    raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s
InvalidRomanNumeralError: Invalid Roman numeral: MMMM
======================================================================
ERROR: toRoman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 96, in testToRomanKnownValues
    result = roman71.toRoman(integer)
  File "roman71.py", line 28, in toRoman
    raise OutOfRangeError, "number out of range (must be 1..3999)"
OutOfRangeError: number out of range (must be 1..3999)
======================================================================
ERROR: fromRoman(toRoman(n))==n for all n
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 147, in testSanity
    numeral = roman71.toRoman(integer)
  File "roman71.py", line 28, in toRoman
    raise OutOfRangeError, "number out of range (must be 1..3999)"
OutOfRangeError: number out of range (must be 1..3999)
----------------------------------------------------------------------
Ran 13 tests in 2.213s

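FAILED (errors=5)

Now that you have test cases that fail due to the new requirements, you can think about fixing the code to bring it in line with the test cases. (One thing that takes some getting used to when you first start writing unit tests is that the code being tested is never "ahead" of the test cases. While it lags behind, you still have work to do; as soon as it catches up with the test cases, you stop coding.)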

Example 15.8. Coding the new requirements (roman72.py)
This file is available in py/roman/stage7/ in the examples directory.

[PHP]
"""Convert to and from Roman numerals"""
import re

#Define exceptions
class RomanError(Exception): pass
class OutOfRangeError(RomanError): pass
class NotIntegerError(RomanError): pass
class InvalidRomanNumeralError(RomanError): pass

#Define digit mapping
romanNumeralMap = (('M',  1000),
                   ('CM', 900),
                   ('D',  500),
                   ('CD', 400),
                   ('C',  100),
                   ('XC', 90),
                   ('L',  50),
                   ('XL', 40),
                   ('X',  10),
                   ('IX', 9),
                   ('V',  5),
                   ('IV', 4),
                   ('I',  1))

def toRoman(n):
    """convert integer to Roman numeral"""
    # The only change in toRoman: the upper bound is now 5000, not 4000
    if not (0 < n < 5000):
        raise OutOfRangeError("number out of range (must be 1..4999)")
    if int(n) != n:
        raise NotIntegerError("non-integers can not be converted")

    result = ""
    for numeral, integer in romanNumeralMap:
        while n >= integer:
            result += numeral
            n -= integer
    return result

#Define pattern to detect valid Roman numerals
# The only change for fromRoman: a fourth optional M at the start
romanNumeralPattern = '^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

def fromRoman(s):
    """convert Roman numeral to integer"""
    if not s:
        raise InvalidRomanNumeralError('Input can not be blank')
    if not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: %s' % s)

    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result
# ......
[/PHP]

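toRoman needs only one small change, in the range check. Where you used to check 0 < n < 4000, you now check 0 < n < 5000. You also change the error message you raise to reflect the new acceptable range (1..4999 instead of 1..3999). You don't need to change the rest of the function; it already handles the new cases. (It happily adds an 'M' for each thousand it finds; given 4000, it returns 'MMMM'. The only reason it didn't do this before is that the range check explicitly stopped it.)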
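You don't need to change fromRoman much either. The only change is to romanNumeralPattern: if you look closely, you'll notice that you added another optional M to the first part of the regular expression. This allows up to 4 M characters instead of 3, meaning you now accept the Roman numeral equivalents of numbers up to 4999 instead of 3999. The fromRoman function itself is completely general: it just looks for repeated Roman numeral characters and adds up their values, without caring how many times they repeat. The only reason it didn't handle 'MMMM' before is that the regular expression check explicitly stopped it.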
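To watch both points in action, here is a minimal sketch that reuses the mapping and the roman72.py pattern. `trace_to_roman` and `sum_numerals` are hypothetical helpers that copy the loops from toRoman and fromRoman with the input checks removed:

```python
import re

# Same digit mapping as romanNumeralMap in roman72.py.
ROMAN_NUMERAL_MAP = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

# The roman72.py pattern: four optional Ms instead of three.
ROMAN_NUMERAL_PATTERN = ('^M?M?M?M?(CM|CD|D?C?C?C?)'
                         '(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')


def trace_to_roman(n):
    """toRoman's greedy loop, recording (numeral emitted, value left)."""
    steps = []
    for numeral, value in ROMAN_NUMERAL_MAP:
        while n >= value:
            n -= value
            steps.append((numeral, n))
    return steps


def sum_numerals(s):
    """fromRoman's accumulation loop with no validation in front of it."""
    result = 0
    index = 0
    for numeral, value in ROMAN_NUMERAL_MAP:
        while s[index:index + len(numeral)] == numeral:
            result += value
            index += len(numeral)
    return result
```

`trace_to_roman(4000)` returns `[('M', 3000), ('M', 2000), ('M', 1000), ('M', 0)]`: the loop simply emits one M per thousand. `sum_numerals('MMMMM')` returns 5000 without complaint, so it is only the regular expression (which does not match 'MMMMM') that keeps fromRoman from accepting it.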

You may be skeptical that these two small changes are all you need. Hey, don't take my word for it; see for yourself:

Example 15.9. Output of romantest72.py against roman72.py
fromRoman should only accept uppercase input ... ok
toRoman should always return uppercase ... ok
fromRoman should fail with blank string ... ok
fromRoman should fail with malformed antecedents ... ok
fromRoman should fail with repeated pairs of numerals ... ok
fromRoman should fail with too many repeated numerals ... ok
fromRoman should give known result with known input ... ok
toRoman should give known result with known input ... ok
fromRoman(toRoman(n))==n for all n ... ok
toRoman should fail with non-integer input ... ok
toRoman should fail with negative input ... ok
toRoman should fail with large input ... ok
toRoman should fail with 0 input ... ok

----------------------------------------------------------------------
Ran 13 tests in 3.685s

OK

All the test cases pass. Stop coding.

Comprehensive unit testing means never having to rely on a programmer who says "Trust me."

15.3. Refactoring
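The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the confidence of being able to prove you didn't break someone else's code when they blame you for it. The best thing about unit testing is that it gives you the freedom to refactor mercilessly.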

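Refactoring is the process of taking working code and making it work better. Usually, "better" means "faster", although it can also mean "using less memory", "using less disk space", or simply "more elegant code". Whatever it means to you, to your project, in your environment, refactoring is important to the long-term health of any program.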

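Here, "better" means "faster". Specifically, the fromRoman function is slower than it needs to be, because of that big, ugly regular expression you use to validate Roman numerals. It's probably not worth trying to do away with the regular expression altogether (it would be difficult, and it might not end up any faster), but you can speed up the function by precompiling the regular expression.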

Example 15.10. Compiling regular expressions
>>> import re
>>> pattern = '^M?M?M?$'
>>> re.search(pattern, 'M')
<SRE_Match object at 01090490>
>>> compiledPattern = re.compile(pattern)
>>> compiledPattern
<SRE_Pattern object at 00F06E28>
>>> dir(compiledPattern)
['findall', 'match', 'scanner', 'search', 'split', 'sub', 'subn']
>>> compiledPattern.search('M')
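<SRE_Match object at 01104928>

This is the syntax you've seen before: re.search takes a regular expression as a string (pattern) and a string to match against it ('M'). If the pattern matches, the function returns a match object, which you can query to find out exactly what matched and how.
This is the new syntax: re.compile takes a regular expression as a string and returns a pattern object. Note that there is no string to match here. Compiling a regular expression has nothing to do with matching it against any specific string (like 'M'); it involves only the regular expression itself.
The compiled pattern object returned by re.compile has several useful-looking methods, including several (like search and sub) that are also available directly in the re module.
Calling the compiled pattern object's search method with the string 'M' gives the same result as calling re.search with both the regular expression and the string 'M', only much faster. (In fact, the re.search function simply compiles the regular expression and calls the resulting pattern object's search method for you.)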

Whenever you are going to use a regular expression more than once, you should compile it to get a pattern object, then call the methods on the pattern object directly.
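A quick way to convince yourself that the two forms are interchangeable is to run the same inputs through both. This is a small sketch using only `re.search`, `re.compile`, and the match object's `group` and `span` methods; `same_result` is a hypothetical helper:

```python
import re

PATTERN = '^M?M?M?$'  # the pattern from the interactive session above
compiled_pattern = re.compile(PATTERN)


def same_result(s):
    """True if re.search and the compiled pattern agree on string s."""
    plain = re.search(PATTERN, s)
    compiled = compiled_pattern.search(s)
    if plain is None or compiled is None:
        # Agreement means both failed to match.
        return plain is None and compiled is None
    # Both matched: compare the matched text and where it was found.
    return plain.group() == compiled.group() and plain.span() == compiled.span()
```

Every string in `['', 'M', 'MM', 'MMM', 'MMMM', 'X']` gives the same answer both ways, and the compiled object keeps its source text in `compiled_pattern.pattern`.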

Example 15.11. Compiled regular expressions in roman81.py
This file is available in py/roman/stage8/ in the examples directory.

If you have not already done so, you can download this and other examples used in this book.

[PHP]
# toRoman and rest of module omitted for clarity

# Compiled once, when the module is first imported
romanNumeralPattern = \
    re.compile('^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')

def fromRoman(s):
    """convert Roman numeral to integer"""
    if not s:
        raise InvalidRomanNumeralError('Input can not be blank')
    # Call search on the precompiled pattern object instead of re.search
    if not romanNumeralPattern.search(s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: %s' % s)

    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result
# ......
[/PHP]

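This looks very similar, but in fact a lot has changed. romanNumeralPattern is no longer a string; it is a pattern object returned by re.compile.
That means you can call methods on romanNumeralPattern directly, which is much faster than calling re.search every time. The regular expression is compiled once and stored in romanNumeralPattern when the module is first imported; after that, every call to fromRoman can immediately match the input string against the regular expression, without repeating the compilation work behind the scenes.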

So how much faster does compiling the regular expression make it? See for yourself:

Example 15.12. Output of romantest81.py against roman81.py
.............
----------------------------------------------------------------------
Ran 13 tests in 3.385s

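OK

Just a note in passing: this time I ran the unit tests without the -v option, so instead of the full doc string for each test, you only get a dot for each test that passes. (A failed test is shown as F and a test with an error as E. You still get complete tracebacks for each failure and error, so you can track down any problems.)
The 13 tests ran in 3.385 seconds, compared to 3.685 seconds without precompiling the regular expression. That's an 8% improvement overall, and remember that most of the time spent during the unit tests goes to other work. (I separately timed the regular expression by itself, apart from the rest of the unit tests, and found that compiling it speeds up the search by an average of 54%.) Not bad for such a small change.
Oh, and in case you were wondering, precompiling the regular expression didn't break anything, and you just proved it.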
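If you want to repeat the regular-expression timing on your own machine, the standard `timeit` module is a convenient tool. This is a rough sketch, not the author's benchmark; the sample strings are arbitrary choices. Keep in mind that the `re` module caches recently compiled patterns, so `re.search` does not rebuild the pattern from scratch on every call, and the gap you measure may differ from the 54% quoted above depending on Python version and hardware.

```python
import re
import timeit

PATTERN = '^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'
COMPILED = re.compile(PATTERN)
# Arbitrary sample inputs: three valid numerals, two invalid ones.
SAMPLES = ['MMMMCMXCIX', 'MCMXCIII', 'XLII', 'MMMMM', 'IIII']


def search_uncompiled():
    """Validate every sample with module-level re.search."""
    return [re.search(PATTERN, s) is not None for s in SAMPLES]


def search_compiled():
    """Validate every sample with the precompiled pattern object."""
    return [COMPILED.search(s) is not None for s in SAMPLES]


def compare(number=20000):
    """Return (seconds for re.search, seconds for compiled search)."""
    t_plain = timeit.timeit(search_uncompiled, number=number)
    t_compiled = timeit.timeit(search_compiled, number=number)
    return t_plain, t_compiled
```

Call `compare()` and look at the two numbers; both functions return identical results, so only the time differs.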

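There is one other performance optimization I want to try. Given the complexity of regular expression syntax, it should come as no surprise that there is often more than one way to write the same expression. After some discussion of this module on comp.lang.python, someone suggested that I try using the {m,n} syntax for the optional repeated characters.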
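For reference, a {m,n} version of the pattern might look like the one below, where `M{0,4}` means "between zero and four M characters" and `C{0,3}` means "between zero and three Cs". This is a sketch of one possible rewrite, not necessarily the exact pattern the book settles on; the script checks that it accepts and rejects the same strings as the roman81.py pattern:

```python
import re

ROMAN_NUMERAL_MAP = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

# roman81.py pattern: each optional character spelled out.
OPTIONAL_PATTERN = ('^M?M?M?M?(CM|CD|D?C?C?C?)'
                    '(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')
# Assumed {m,n} rewrite of the same pattern.
BRACE_PATTERN = '^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'


def to_roman(n):
    """Greedy conversion, same algorithm as toRoman (no input checks)."""
    result = ""
    for numeral, value in ROMAN_NUMERAL_MAP:
        while n >= value:
            result += numeral
            n -= value
    return result


def patterns_agree(candidates):
    """True if both patterns accept or reject every candidate string."""
    optional = re.compile(OPTIONAL_PATTERN)
    brace = re.compile(BRACE_PATTERN)
    return all((optional.search(s) is None) == (brace.search(s) is None)
               for s in candidates)
```

Because `M?M?M?M?` and `M{0,4}` describe the same set of strings, `patterns_agree` should hold for all 4999 valid numerals and for invalid samples such as 'MMMMM' or 'IIII'. Whether the {m,n} form is also faster is something only a timing run can show.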
