I'm testing the following negative lookbehind assertion and would like to understand the result:
echo "foo foofoo" | grep -Po '(?<!foo)foo'
It prints:
foo
foo
foo
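I expected only the first two foo (the standalone one and the first half of foofoo) to be printed, not the third, because my assertion is supposed to mean "find a foo that is not preceded by foo".
What am I missing? Why is the third foo matched?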
Note: grep -P tells grep to interpret the pattern as a Perl-compatible regular expression (PCRE), and grep -o prints only the matched part of each line. My grep is version 2.5.1.
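For comparison, here is the same pattern run through Python's `re` module, a backtracking engine that has nothing to do with grep or PCRE. It is a minimal sketch, assuming only that Python treats a fixed-width `(?<!foo)` the same way Perl does; it lists each match with its start offset, so you can see which foo are expected to match.

```python
import re

# Same pattern as the grep command: "foo" not immediately preceded by "foo".
pattern = re.compile(r"(?<!foo)foo")

line = "foo foofoo"
# finditer keeps searching inside the same string, so the lookbehind can
# always inspect the characters before the current position.
matches = [(m.start(), m.group()) for m in pattern.finditer(line)]
print(matches)  # [(0, 'foo'), (4, 'foo')]
```

Offset 7 (the second half of foofoo) is rejected because the three characters before it are foo.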
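I can't reproduce this: running the exact command, I get only two matches.
I'm using GNU grep 2.6.3.
That said, a trick I've found useful for working out a regex like this is that Perl lets you run a regex in debug mode: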
#!/usr/bin/env perl
use strict;
use warnings;

# dump results
use Data::Dumper;

# put the regex engine into debug mode
use re 'debug';

# iterate over the __DATA__ lines below
while ( <DATA> ) {
    # apply the regex to the current line
    my @matches = m/(?<!foo)(foo)/g;
    print Dumper \@matches;
}

__DATA__
foo foofoo
This gives us the following output:
Compiling REx "(?<!foo)(foo)"
Final program:
   1: UNLESSM[-3] (7)
   3:   EXACT <foo> (5)
   5:   SUCCEED (0)
   6:   TAIL (7)
   7: OPEN1 (9)
   9:   EXACT <foo> (11)
  11: CLOSE1 (13)
  13: END (0)
anchored "foo" at 0 (checking anchored) minlen 3
Matching REx "(?<!foo)(foo)" against "foo foofoo"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [0..10] gave 0
  Found anchored substr "foo" at offset 0 (rx_origin now 0)...
  (multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 0
   0 <> <foo foofoo>     |   1:UNLESSM[-3](7)
   0 <> <foo foofoo>     |   7:OPEN1(9)
   0 <> <foo foofoo>     |   9:EXACT <foo>(11)
   3 <foo> < foofoo>     |  11:CLOSE1(13)
   3 <foo> < foofoo>     |  13:END(0)
Match successful!
Matching REx "(?<!foo)(foo)" against " foofoo"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [3..10] gave 4
  Found anchored substr "foo" at offset 4 (rx_origin now 4)...
  (multiline anchor test skipped)
  try at offset...
Intuit: Successfully guessed: match at offset 4
   4 <foo > <foofoo>     |   1:UNLESSM[-3](7)
   1 <f> <oo foofoo>     |   3:  EXACT <foo>(5)
                                   failed...
   4 <foo > <foofoo>     |   7:OPEN1(9)
   4 <foo > <foofoo>     |   9:EXACT <foo>(11)
   7 <foo foo> <foo>     |  11:CLOSE1(13)
   7 <foo foo> <foo>     |  13:END(0)
Match successful!
Matching REx "(?<!foo)(foo)" against "foo"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [7..10] gave 7
  Found anchored substr "foo" at offset 7 (rx_origin now 7)...
  (multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 7
   7 <foo foo> <foo>     |   1:UNLESSM[-3](7)
   4 <foo > <foofoo>     |   3:  EXACT <foo>(5)
   7 <foo foo> <foo>     |   5:  SUCCEED(0)
                                   subpattern success...
                                   failed...
Match failed
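Reading the trace: at offset 4, UNLESSM[-3] steps back to offset 1, finds "oo " instead of foo, so the negative assertion succeeds and foo matches. At offset 7 it steps back to offset 4, finds foo, the subpattern succeeds, and therefore the negative assertion fails. The same check can be written out by hand. The Python below is a minimal sketch of that logic for the literal foo only (not a general regex engine); the helper name `unless_preceded_by_foo` is hypothetical.

```python
NEEDLE = "foo"
WIDTH = len(NEEDLE)  # 3, the "-3" in UNLESSM[-3]


def unless_preceded_by_foo(text):
    """Hypothetical hand-written (?<!foo)foo with /g-style scanning.

    Returns the start offsets of the accepted matches.
    """
    offsets = []
    pos = 0
    while pos + WIDTH <= len(text):
        candidate = text.startswith(NEEDLE, pos)
        # Step back WIDTH characters, like UNLESSM[-3] in the trace.
        # Near the start of the string there is nothing to look at,
        # so the negative assertion succeeds.
        preceded = pos >= WIDTH and text[pos - WIDTH:pos] == NEEDLE
        if candidate and not preceded:
            offsets.append(pos)
            pos += WIDTH  # like /g: resume after the end of this match
        else:
            pos += 1  # try the next start position
    return offsets


print(unless_preceded_by_foo("foo foofoo"))  # [0, 4]
```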
After a lot of discussion about this question, I came to the conclusion that my understanding of negative lookbehind assertions is correct:
echo "foo foofoo" | grep -Po '(?<!foo)foo'
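should return foo twice.
Either my version of grep or the PCRE library it was compiled against is buggy.
Several people tested this command on their machines with different versions of grep and got different results: some saw two foo, and some saw three, as I did.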
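One explanation that would fit both the Perl trace and the version-dependent results, offered here as an assumption rather than something confirmed in grep's source: if a -o loop restarts the search on a copy of the leftover text instead of resuming at an offset inside the original line, a lookbehind at the start of that copy has nothing to look at. The Python sketch below contrasts the two strategies; both function names are hypothetical and neither is grep's actual code.

```python
import re

PATTERN = re.compile(r"(?<!foo)foo")


def resume_in_place(line):
    # Search the whole line and resume at an offset:
    # the lookbehind can still see the text before each match.
    return [m.group() for m in PATTERN.finditer(line)]


def restart_on_remainder(line):
    # Hypothetical buggy loop: re-search a *copy* of the rest of the line,
    # so the characters already consumed are invisible to (?<!foo).
    found = []
    rest = line
    while True:
        m = PATTERN.search(rest)
        if m is None:
            break
        found.append(m.group())
        rest = rest[m.end():]  # the pattern never matches empty, so this terminates
    return found


print(resume_in_place("foo foofoo"))       # ['foo', 'foo']
print(restart_on_remainder("foo foofoo"))  # ['foo', 'foo', 'foo']
```

In the second function the final search runs against just "foo", exactly the string shown in the last "Matching REx ... against" line, but without the earlier characters that Perl still consults.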
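I tested this regex with Perl and got the expected result: foo twice.
The grep man page states that the -P option is experimental.
My lesson: if you want PCRE that really works, use Perl.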