Excluding columns with awk

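I want to delete a few columns and then deduplicate the file content. The columns I want to remove are the month, day, time and epoch time; they differ on every line and keep me from getting unique file content.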

Sample content of sample.log:

    Jun 5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
    Jun 5 05:13:14 AAA AAA AAAA 1433495594.306612 XXXX CCCC CCCC AAAA SDDDD DFFFFF222
    Jun 5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
    Jun 5 05:13:15 AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
    Jun 5 05:13:16 AAA AAA AAAA XXXXX 1433495597.306615 XXXX CCCC CCCC AAAA SDDDD DFFFFF333
    Jun 5 05:13:17 AAA AAA AAAA XXXXX 1433495598.306616 XXXX CCCC CCCC AAAA SDDDD DFFFFF444

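Problem:

The month, date and time are fixed columns, but the epoch time switches between column 7 and column 8. I would like to know how to handle this.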

Sample output:

    Jun 5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
    Jun 5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
    Jun 5 05:13:15 AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111

If the above is too much to ask, then the following would do:

    AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
    AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
    AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111

I am trying something along the following lines, but it is not getting me very far.

    while read line
    do
        seven=$(echo $line |awk '{print $7}')
        eight=$(echo $line |awk '{print $8}')
        if [[ "$seven" =~ "^[0-9]" ]];then
            #echo "seventh column starts with number"
            echo $line|awk '$1=$2=$3=$7=" " {print}'
        else
            #echo "Eighth column starts with number"
            echo $line|awk '$1=$2=$3=$8=" " {print}'
        fi
    done < $1

More examples:

Input file content:

    Jun 5 05:13:13 AAA BBB CCC 142222222222.000 DDD EEE FFFF
    Jun 5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE FFFF
    Jun 5 05:13:14 AAA BBB CCC 142222222224.000 DDD EEE GGGG
    Jun 5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE GGGG
    Jun 5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE FFFF
    Jun 5 05:13:13 AAA BBB CCC XXX 142222222226.000 DDD EEE FFFF

Output:

    Jun 5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE FFFF
    Jun 5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE GGGG
    Jun 5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE GGGG
    Jun 5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE FFFF

or

Output:

    AAA BBB CCC DDD EEE FFFF
    AAA BBB CCC DDD EEE GGGG
    AAA BBB CCC XXX DDD EEE GGGG
    AAA BBB CCC XXX DDD EEE FFFF

If I understand the question correctly, there is no need for Bash here, just Awk:

    % awk '
    {
        for (f = 4; f <= NF; ++f) {              # Start at column 4
            if (f == 7 || f == 8) {              # Treat columns 7 or 8 differently
                if ($f !~ /^[0-9]+\.[0-9]+$/) {  # Only print if non-numeric
                    printf $f " "
                }
            } else {
                printf $f " "
            }
        }
        printf "\n"
    }
    ' sample.log
    AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF111
    AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF222
    AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF111
    AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF111
    AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF333
    AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF444

To get only the unique lines:

    % awk '
    {
        for (f = 4; f <= NF; ++f) {              # Start at column 4
            if (f == 7 || f == 8) {              # Treat columns 7 or 8 differently
                if ($f !~ /^[0-9]+\.[0-9]+$/) {  # Only print if non-numeric
                    printf $f " "
                }
            } else {
                printf $f " "
            }
        }
        printf "\n"
    }
    ' sample2.log | sort -u
    AAA BBB CCC DDD EEE FFFF
    AAA BBB CCC DDD EEE GGGG
    AAA BBB CCC XXX DDD EEE FFFF
    AAA BBB CCC XXX DDD EEE GGGG

Handling %s

If your input file contains % signs, as per your comment, you need to escape them before passing the fields to printf as a format string. You can do that with a function like this:

    % awk '
    function escape_percents(s) {
        gsub("%", "%%", s)
        return s
    }
    {
        for (f = 4; f <= NF; ++f) {              # Start at column 4
            if (f == 7 || f == 8) {              # Treat columns 7 or 8 differently
                if ($f !~ /^[0-9]+\.[0-9]+$/) {  # Only print if non-numeric
                    printf escape_percents($f) " "
                }
            } else {
                printf escape_percents($f) " "
            }
        }
        printf "\n"
    }
    ' sample2.log | sort -u
    AAA BBB CCC DDD %E%E%E FFFF
    AAA BBB CCC DDD %E%E%E GGGG
    AAA BBB CCC XXX DDD %E%E%E FFFF
    AAA BBB CCC XXX DDD %E%E%E GGGG
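As an alternative to escaping, and only as a sketch rather than part of the answer above, the field can be passed as an argument to a fixed "%s" format, so that its contents are never interpreted as a format string. The function name strip_fields is made up for this example, and the epoch test is the same regular expression used above.

```shell
# Hypothetical helper: drop columns 1-3 and the epoch column (7 or 8).
strip_fields() {
    awk '{
        sep = ""
        for (f = 4; f <= NF; ++f) {
            # Skip the epoch time, which sits in column 7 or 8
            if ((f == 7 || f == 8) && $f ~ /^[0-9]+\.[0-9]+$/)
                continue
            # The field is data for "%s", so a literal % is printed as-is
            printf "%s%s", sep, $f
            sep = " "
        }
        printf "\n"
    }' "$@"
}
```

It would be used like the commands above, for example strip_fields sample2.log | sort -u. Printing the separator before each field also avoids the trailing space at the end of every line.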

A very basic approach is to check the format of the field: if it consists of digits + . + digits, that's the one!

    awk '{$1=$2=$3=""; if ($7 ~ /^[0-9]+\.[0-9]+$/) {$7=""} else {$8=""} } 1' file

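Note that this leaves some extra spaces around, because when you empty a field the field separators on either side of it stay in place. To remove the columns cleanly, see Ed Morton's answer to "Print all but the first three columns".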
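One way to get rid of the leftover separators, offered as a sketch and not as the method from the linked answer: after emptying the fields, assign $0 = $0 so awk re-splits the record, then $1 = $1 so the record is rebuilt with single spaces. This assumes the default whitespace FS and OFS; the function name drop_time_fields is hypothetical.

```shell
# Hypothetical helper: empty columns 1-3 and the epoch column, then
# squeeze the separators that the emptied fields leave behind.
drop_time_fields() {
    awk '{
        $1 = $2 = $3 = ""
        if ($7 ~ /^[0-9]+\.[0-9]+$/) { $7 = "" } else { $8 = "" }
        $0 = $0   # re-split: with the default FS, runs of blanks are one separator
        $1 = $1   # rebuild the record, joining the fields with a single OFS
    } 1' "$@"
}
```

Called as drop_time_fields file, it prints each line without the timestamp columns and without the extra blanks.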


To make sure that lines are not repeated with respect to the 1st, 2nd, 3rd and last columns, use the awk '!uniq[$0]++' file approach:

    awk '!uniq[$1 $2 $3 $(NF-4) $(NF-2) $(NF-1) $NF]++{$1=$2=$3=""; if ($7 ~ /^[0-9]+\.[0-9]+$/) {$7=""} else {$8=""}; print}' file
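The !seen[key]++ idiom prints a line only the first time its key appears. As a sketch, assuming uniqueness should be judged on what remains once the timestamp columns are gone, the key can be the cleaned line itself; this keeps the original line order and needs no sort. The name unique_without_time is hypothetical.

```shell
# Hypothetical helper: strip the timestamp columns, then keep only the
# first occurrence of each resulting line.
unique_without_time() {
    awk '{
        $1 = $2 = $3 = ""
        if ($7 ~ /^[0-9]+\.[0-9]+$/) { $7 = "" } else { $8 = "" }
        $0 = $0; $1 = $1   # re-split and rebuild to squeeze leftover blanks
    }
    !seen[$0]++            # true (so the line prints) only on first sight
    ' "$@"
}
```

Run on the input from "More examples", this gives the four lines of the second expected output, in the same order as listed in the question.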

If the columns after the epoch time stay constant, the simplest approach is to work relative to NF.

Using the input from "More examples":

 awk '{NewLine=$4; for(i=(NF-5);i>=0;i--){ if(i!=3){ NewLine=NewLine" "$(NF-i) } } print NewLine }' Sample.log | sort | uniq 

With the input

    Jun 5 05:13:13 AAA BBB CCC 142222222222.000 DDD EEE FFFF
    Jun 5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE FFFF
    Jun 5 05:13:14 AAA BBB CCC 142222222224.000 DDD EEE GGGG
    Jun 5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE GGGG
    Jun 5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE FFFF
    Jun 5 05:13:13 AAA BBB CCC XXX 142222222226.000 DDD EEE FFFF

you will get

    AAA BBB CCC DDD EEE FFFF
    AAA BBB CCC DDD EEE GGGG
    AAA BBB CCC XXX DDD EEE FFFF
    AAA BBB CCC XXX DDD EEE GGGG
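The same NF-relative idea can be written without the loop. This is only a sketch and rests on an assumption taken from the "More examples" input: exactly three columns follow the epoch time, so it is always field NF-3, and every line has at least seven fields. The name drop_epoch_from_end is hypothetical.

```shell
# Hypothetical helper: address the epoch column from the end of the line.
drop_epoch_from_end() {
    awk '{
        $(NF - 3) = ""       # assumed: epoch is the fourth field from the end
        $1 = $2 = $3 = ""    # month, day, time
        $0 = $0; $1 = $1     # re-split and rebuild to squeeze leftover blanks
    } 1' "$@"
}
```

Piping the result through sort -u, as in drop_epoch_from_end Sample.log | sort -u, should reproduce the four lines shown above.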