How to format the date field of a .CSV file when string fields contain multiple commas

I have a .CSV file (file.csv) in which every data value is enclosed in double quotes. A sample of the file looks like this:

    column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
    "12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
    "4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","12-OCT-11","55"
    "11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"

The ninth field is a date field in "DD-MMM-YY" format. I have to convert it to YYYY/MM/DD format. I am trying the code below, but it does not work.

    awk -F, '
    BEGIN {
        split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
        for (i=1; i<=12; i++) mdigit[month[i]]=i
    }
    {
        m=substr($9,4,3)
        $9 = sprintf("%02d/%02d/"20"%02d",mdigit[m],substr($9,1,2),substr($9,8,20))
        print
    }' OFS="," file.csv > temp_file.csv

After running the code above, the output file temp_file.csv looks like this:

    column1,column2,column3,column4,column5,column6,column7,Column8,00/00/2000,Column10
    "12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1,00/00/2000,"890","88","11-OCT-11","12"
    "4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455",00/00/2002, name","12","455","12-OCT-11","55"
    "11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3,00/00/2000,"333","22","13-OCT-11","232"

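As far as I can tell, the problem is the commas inside the double quotes, because my code treats them as field separators too. Please advise on the following questions: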

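1) Do the double quotes around every value in every field make any difference? If they do, how can I remove them from all values except the strings that contain commas?
2) How should I modify my code so that the 9th field is converted from "DD-MMM-YY" format to YYYY/MM/DD?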

You could try the following awk command:

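    awk '
        BEGIN {
            FS = OFS = ","
            split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, / /)
            for (i=1; i<=12; i++) { mm[month[i]]=i }
        }
        # Data rows only (NR>1 skips the header); the date is the second-to-last field
        NR>1 {
            gsub(/\"/, "", $(NF-1))
            split($(NF-1), d, /-/)
            # %02d zero-pads single-digit months
            $(NF-1) = sprintf("%s20%s/%02d/%s%s", q, d[3], mm[d[2]], d[1], q)
        }1' q='"' file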

Output:

    column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
    "12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","2011/10/11","12"
    "4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","2011/10/12","55"
    "11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","2011/10/13","232"
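
Why counting from the end helps can be seen by printing the field count for one of the sample rows. The following is only a small illustration (the variable names `row`, `nf` and `date_field` are made up for it): with `-F,` the commas inside the quoted strings inflate the number of fields, so `$9` falls inside a quoted string, while `$(NF-1)` still points at the date as long as the last two columns contain no commas.

```shell
# One sample row from the question; two quoted strings contain extra commas.
row='"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"'

# Splitting on every comma yields more than the 10 real columns.
nf=$(printf '%s\n' "$row" | awk -F, '{ print NF }')

# The second-to-last field is unaffected by commas earlier in the row.
date_field=$(printf '%s\n' "$row" | awk -F, '{ print $(NF-1) }')

echo "fields seen by awk: $nf"
echo "second-to-last field: $date_field"
```

This relies on the date column sitting at a fixed distance from the end of the line; if one of the last two columns could contain a comma, the approach would break in the same way as `$9` does.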

I strongly recommend using a proper CSV parser. For example, Text::CSV_XS in Perl will do the job properly. For example, this command:

    perl -MText::CSV_XS -E'
        $csv = Text::CSV_XS->new({eol=>"\n", allow_whitespace=>1});
        @m = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);
        @m{@m} = (1 .. @m);                     # month name => month number
        while (my $row = $csv->getline(ARGV)) {
            ($d,$m,$y) = split("-", $row->[8]);
            # two-digit years are assumed to be 20YY
            $row->[8] = sprintf "%04d/%02d/%02d", 2000+$y, $m{$m}, $d if $m{$m};
            $csv->print(STDOUT, $row)
        }' file.csv > temp_file.csv
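
Where installing a CSV module is not an option, a rougher fallback may be enough for this particular file. The sketch below is not a real CSV parser: it assumes every field on a data row is quoted, that no field contains the three-character sequence `","`, and that all years are 20YY. Under those assumptions the sequence `","` can serve as the field separator, so the date is simply `$9`. The function name `convert_dates` is made up for the example.

```shell
convert_dates() {
    awk '
    BEGIN {
        FS = OFS = "\",\""     # separator is the three characters ","
        # Map month abbreviations to zero-padded month numbers.
        n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", name, " ")
        for (i = 1; i <= n; i++) mm[name[i]] = sprintf("%02d", i)
    }
    # Only touch rows that have 10 fields and a DD-MMM-YY value in column 9.
    NF == 10 && split($9, d, "-") == 3 && (d[2] in mm) {
        $9 = "20" d[3] "/" mm[d[2]] "/" d[1]   # assumption: years 2000-2099
    }
    { print }
    ' "$@"
}

# Usage: reads the named files, or standard input when none are given.
printf '%s\n' \
    'column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10' \
    '"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"' |
    convert_dates
```

The unquoted header line has no `","` in it, so it is seen as a single field and passes through unchanged.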

You can try this awk:

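    awk -F"\"" 'BEGIN { OFS="\"" }
    NF >= 18 {
        # splitting on " puts the 9th quoted field in $18; date -d requires GNU date
        cmd = "date -d " $18 " +%Y/%m/%d"
        cmd | getline $18
        close(cmd)
    }
    { print }' yourfile.txt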

Output:

    "12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1,name","890","88","2011/10/11","12"
    "4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2,name","12","455","2011/10/12","55"
    "11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3,name","333","22","2011/10/13","232"