Replace newlines in quoted strings with \n

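I need to write a quick (by tomorrow) filter script that replaces line breaks (LF or CRLF) inside double-quoted strings with the escape sequence \n. The content is a (broken) JavaScript program, so I need to allow for escape sequences such as "ab\"cd" and "ab\\"cd"ef" within strings.

I understand that sed is not well suited to this job because it works line by line, so I turned to perl, about which I know nothing :)

I wrote this regular expression: "(((\\.)|[^"\\\n])*\n?)*" and tested it with http://regex.powertoy.org. However, perl -p -e 's/"(((\\.)|[^"\\\n])*(\n)?)*"/TEST/g' does not match quoted strings that contain line breaks.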

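So my questions are:

  1. How do I make perl match line breaks?
  2. How do I write the "replace-by" part so that it keeps the original string and replaces only the newlines?

There is a similar question with an awk solution, but it is not what I need.

Note: I usually don't ask "please do this for me" questions, but I really don't feel like learning perl / awk by tomorrow … 🙂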

Edit: sample data

    "abc\"def" - matches as one string
    "abc\\"def"xy" - match "abc\\" and "xy"
    "ab
    cd
    ef" - is replaced by "ab\ncd\nef"

Here is a simple Perl solution:

    s§
        \G                      # match from the beginning of the string or the last match
        ([^"]*+)                # till we get to a quote
        "((?:[^"\\]++|\\.)*+)"  # match the whole quote
    §
        $a = $1;
        $b = $2;
        $b =~ s/\r?\n/\\n/g;    # replace what you want inside the quote
        "$a\"$b\"";
    §gex;
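To try the idea on a whole string, the same substitution can be wrapped in a small sub. This is only a sketch under a few assumptions: the sub name escape_newlines_in_strings and the sample text are made up for illustration, and plain {} delimiters replace § so that the script does not depend on the encoding of the source file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): returns a copy of the
# text in which every LF or CRLF inside a double-quoted string is replaced
# by the two characters backslash + n.
sub escape_newlines_in_strings {
    my ($text) = @_;
    $text =~ s{
        \G                        # continue where the previous match ended
        ([^"]*+)                  # text outside quotes
        "((?:[^"\\]++|\\.)*+)"    # one quoted string, escapes skipped in pairs
    }{
        my ($outside, $inside) = ($1, $2);
        $inside =~ s/\r?\n/\\n/g; # only touch newlines inside the quotes
        qq{$outside"$inside"};
    }gex;
    return $text;
}

# Illustrative sample: one string with LF and CRLF, one with an escaped quote.
my $js = qq{var s = "ab\ncd\r\nef"; var t = "x\\"y";\n};
print escape_newlines_in_strings($js);
```

To use this as a filter, one option is to read all of the input at once (for instance with local $/; my $input = <STDIN>;) and print the result of the sub, since the pattern has to see a whole quoted string, line breaks included.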

Here is another solution, if you don't want to use /e and prefer to do it with a single regular expression:

    use strict;

    $_=<<'_quote_';
    hai xtest "aa xx aax" baix "xx" x "axa\"x\\" xa "x\\\\\"x" ax xbai!x
    _quote_

    print "Original:\n", $_, "\n";

    s/
    (
        (?:
            # at the beginning of the string match till inside the quotes
            ^(?&outside_quote) "
            # or continue from last match which always stops inside quotes
            | (?!^)\G
        )
        (?&inside_quote)  # eat things up till we find what we want
    )
    x  # the thing we want to replace
    (
        (?&inside_quote)  # eat more possibly till end of quote
        # if going out of quote make sure the match stops inside them
        # or at the end of string
        (?: " (?&outside_quote) (?:"|\z) )?
    )

    (?(DEFINE)
        (?<outside_quote> [^"]*+ )               # just eat everything till quoting starts
        (?<inside_quote> (?:[^"\\x]++|\\.)*+ )   # handle escapes
    )
    /$1Y$2/xg;

    print "Replaced:\n", $_, "\n";

Output:

    Original:
    hai xtest "aa xx aax" baix "xx" x "axa\"x\\" xa "x\\\\\"x" ax xbai!x

    Replaced:
    hai xtest "aa YY aaY" baix "YY" x "aYa\"Y\\" xa "Y\\\\\"Y" ax xbai!x

To replace newlines instead of x, just substitute it in the regular expression like this:

    s/
    (
        (?:
            # at the beginning of the string match till inside the quotes
            ^(?&outside_quote) "
            # or continue from last match which always stops inside quotes
            | (?!^)\G
        )
        (?&inside_quote)  # eat things up till we find what we want
    )
    \r?\n  # the thing we want to replace
    (
        (?&inside_quote)  # eat more possibly till end of quote
        # if going out of quote make sure the match stops inside them
        # or at the end of string
        (?: " (?&outside_quote) (?:"|\z) )?
    )

    (?(DEFINE)
        (?<outside_quote> [^"]*+ )                  # just eat everything till quoting starts
        (?<inside_quote> (?:[^"\\\r\n]++|\\.)*+ )   # handle escapes
    )
    /$1\\n$2/xg;

Until the OP posts some sample content, try adding the "m" (and possibly the "s") flag to the end of your regular expression; from perldoc perlreref (reference):

    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
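A small experiment may help show what these flags do and do not change. Note that perl -p hands the input to the substitution one line at a time, so no flag lets a pattern see past the end of the current line; reading the whole input at once (for example with the -0777 switch) is what makes a match across line breaks possible. The sample string below is made up for illustration.

```perl
use strict;
use warnings;

# Illustrative sample: a quoted string that spans two lines.
my $text = qq{first "a\nb" line\nsecond\n};

# Without /s, "." stops at a newline, so the quoted string is not matched.
my $dot_plain = $text =~ /"(.*?)"/  ? 1 : 0;
# With /s, "." also matches the newline.
my $dot_s     = $text =~ /"(.*?)"/s ? 1 : 0;

# Without /m, ^ matches only at the very start of the string.
my $caret_plain = $text =~ /^second/  ? 1 : 0;
# With /m, ^ also matches right after an internal newline.
my $caret_m     = $text =~ /^second/m ? 1 : 0;

# What a line-by-line loop sees: no single line holds both quotes,
# so even /s cannot match the string there.
my @lines    = split /^/m, $text;
my $per_line = grep { /"(.*?)"/s } @lines;

print "dot: $dot_plain, dot with /s: $dot_s\n";
print "caret: $caret_plain, caret with /m: $caret_m\n";
print "lines matching when read one at a time: $per_line\n";
```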

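For testing, you may also find it useful to add the command-line argument "-i.bak", so that a backup of the original file (now with the extension ".bak") is kept.

Also note that if you want to group without capturing, you can use (?:PATTERN) instead of (PATTERN). Once you have captured content, use $1 through $9 to access the stored matches from the matching part.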
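The difference between the two kinds of groups can be seen in a short sketch. The sub name parse_assignment and the sample line are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper for this sketch: pulls the variable name and the
# string value out of a simple assignment line.
sub parse_assignment {
    my ($line) = @_;
    # (?:var|let) only groups the alternation and creates no capture.
    # (\w+) is therefore the first capture and ([^"]*) the second.
    if ($line =~ /^(?:var|let)\s+(\w+)\s*=\s*"([^"]*)"/) {
        my ($name, $value) = ($1, $2);   # copy before any other match runs
        return ($name, $value);
    }
    return;
}

my ($name, $value) = parse_assignment('var name = "value";');
print "name:  $name\n";
print "value: $value\n";
```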

For more information, see the reference above as well as perldoc perlretut (tutorial) and perldoc perlre (full-ish documentation).

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Regexp::Common;

    $_ = '"abc\"def"' . '"abc\\\\"def"xy"' . qq("ab\ncd\nef");
    print "before: {{$_}}\n";
    s{($RE{quoted})}
     { (my $x=$1) =~ s/\n/\\n/g;
       $x
     }ge;
    print "after: {{$_}}\n";

With Perl 5.14.0 (installed with perlbrew) it can be done like this:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use 5.14.0;
    use Regexp::Common qw/delimited/;

    my $data = <<'END';
    "abc\"def"
    "abc\\"def"xy"
    "ab
    cd
    ef"
    END

    my $output = $data =~ s/$RE{delimited}{-delim=>'"'}{-keep}/$1=~s!\n!\\n!rg/egr;
    print $output;

I need 5.14.0 for the /r flag on the inner substitution. If anyone knows how to avoid this, please let me know.
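One way to avoid the /r flag on older perls is the copy-then-modify idiom that the Regexp::Common answer above already uses: assign the text to a new variable and run the substitution on that copy. A minimal sketch follows; the sub name escaped_copy is hypothetical, and a hand-written pattern stands in for $RE{delimited} only so that the example has no module dependency.

```perl
use strict;
use warnings;

# Hypothetical helper for this sketch: same result as s/.../.../r, but the
# substitution runs on a copy, so no /r flag is needed.
sub escaped_copy {
    my ($string) = @_;
    (my $copy = $string) =~ s/\r?\n/\\n/g;   # modify the copy, not $string
    return $copy;
}

# Illustrative sample data.
my $data = qq{"ab\ncd\nef" and "x\\"y"\n};

# Stand-in pattern for a double-quoted string with backslash escapes.
# The outer substitution also works on a copy instead of using /r.
(my $output = $data) =~ s/("(?:[^"\\]|\\.)*")/escaped_copy($1)/ge;

print $output;
```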