Sorting a file in Linux

Say I have a file sort_me.txt:

```
a
d
b
c
f
g
// dont mix the two sections
a
c
d
b
```

Right now I do the obvious `sort sort_me.txt`, and I get:

```
a
a
b
b
c
c
d
d
// dont mix the two sections
f
g
```

That is of course not what I want. I want the lines before the comment and the lines after the comment to be sorted separately.

The desired result is:

```
a
b
c
d
f
g
// dont mix the two sections
a
b
c
d
```

I was thinking of using csplit to split the sections into separate files, but surely there is a simpler way to do this:

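```bash
#!/bin/bash
# Split the file before every line that starts with "//" into xx00, xx01, ...
# csplit prints the size of each piece it writes, one per line.
linenum=$(csplit -z "$1" '/^\/\//' '{*}')
count=0
output=''
for line in $linenum; do
    file=$(printf "xx%.2d" "$count")
    # Sort each piece on its own. In the C locale "/" sorts before letters,
    # so the "//" line stays at the top of its section.
    sorted=$(LC_ALL=C sort "$file")
    # $(...) strips the trailing newline, so add it back between sections.
    output="$output$sorted"$'\n'
    ((count++))
done
printf '%s' "$output"
```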

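Note that csplit creates a temporary file for each section (xx00, xx01 and so on), so you may want to update the script above to delete each one after sorting it, e.g. with `unlink $file`.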
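One way to do that cleanup is sketched below, under a few assumptions: GNU csplit (for `-z`, `-s`, `-f` and `'{*}'`), sections that start at lines beginning with `//` as in the example file, and a hypothetical function name `sort_sections_csplit`. The pieces go into a private directory created by `mktemp -d`, and a trap removes it when the function exits, so nothing is left behind in the working directory.

```bash
#!/bin/bash
# Hypothetical helper: sort each "//"-delimited section of a file separately.
# The body runs in a subshell ( ... ) so the EXIT trap fires when it returns.
sort_sections_csplit() (
    input=$1
    tmpdir=$(mktemp -d) || exit 1
    # Remove the temporary pieces however the subshell exits.
    trap 'rm -rf "$tmpdir"' EXIT
    # -s: do not print piece sizes; -z: skip empty pieces;
    # -f: name the pieces $tmpdir/xx00, $tmpdir/xx01, ...
    # Split before each line starting with "//"; '{*}' repeats the pattern.
    csplit -s -z -f "$tmpdir/xx" "$input" '/^\/\//' '{*}' || exit 1
    for piece in "$tmpdir"/xx*; do
        # In the C locale "/" sorts before letters and digits, so for content
        # like the example the "//" line stays at the top of its section.
        LC_ALL=C sort "$piece"
    done
)

# Usage: ./sort_sections.sh sort_me.txt
if [ "$#" -gt 0 ]; then
    sort_sections_csplit "$1"
fi
```

Keeping the pieces out of the current directory also avoids clobbering (or picking up) unrelated files that happen to be named xx00, xx01 and so on.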

Perl to the rescue:

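```bash
perl -0777 -nE '
    @sections = map [ split /\n/ ], split m{^(?=//)}m;
    say join "\n", sort @$_ for @sections;
' -- file
```

`-0777` reads the whole file at once instead of line by line (so this only works if the file fits in memory).
`@sections` is an array of arrays: each element of the outer array is one section, and each inner array holds that section's lines.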

If the file is too large to fit in memory, process it line by line and keep only the current section:

```bash
perl -ne '
    # Print the buffered section sorted; the current line starts the next one.
    sub out { print sort @lines; @lines = $_ }
    if (m{^//}) { out() } else { push @lines, $_ }
    END { out() }
' -- file
```

Without Perl, you can do it with a script like this:

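```bash
#!/bin/bash
FILE_NAME=$1
SEPARATOR='^//'
# Line number of the first separator line.
LINE_NUMBER=$(grep -n -m1 "$SEPARATOR" "$FILE_NAME" | cut -f1 -d:)
# Section before the separator.
head -n "$((LINE_NUMBER - 1))" "$FILE_NAME" | sort
# The separator line itself, unsorted.
grep -m1 "$SEPARATOR" "$FILE_NAME"
# Section after the separator: everything from the next line to the end.
tail -n "+$((LINE_NUMBER + 1))" "$FILE_NAME" | sort
```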

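It finds the separator line and sorts the sections on either side of it separately. Of course, it won't work correctly if there are more than two sections.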
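For any number of sections, a bash-only sketch can follow the same idea as the line-by-line Perl version: buffer the current section, and whenever a separator line arrives, print the buffer sorted and start a new one. The function names `sort_sections` and `print_sorted` are hypothetical, and the sketch assumes separators are lines beginning with `//`. Unlike the Perl one-liners, it prints each separator line unsorted and in place.

```bash
#!/bin/bash
# Hypothetical helper: print its arguments one per line, sorted.
print_sorted() {
    if (($# > 0)); then
        printf '%s\n' "$@" | sort
    fi
}

# Hypothetical helper: sort every "//"-delimited section of file "$1" on its own.
sort_sections() {
    local line
    local -a section=()
    # IFS= and -r keep leading spaces and backslashes intact;
    # the || test also handles a last line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == //* ]]; then
            print_sorted "${section[@]}"   # finish the current section
            printf '%s\n' "$line"          # keep the separator where it was
            section=()
        else
            section+=("$line")
        fi
    done < "$1"
    print_sorted "${section[@]}"           # the last section
}

# Usage: ./sort_sections.sh sort_me.txt
if [ "$#" -gt 0 ]; then
    sort_sections "$1"
fi
```

Only the current section is held in memory, but a bash `read` loop is slow, so on very large files the Perl version is likely the better choice.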