I have the following problem. I have two files of keys:
file1: aa, bb, cc, dd, ee, ff, gg;
file2: aa, bb, cc, zz, yy, ww, oo;
res1.txt should contain the keys common to both files: aa, bb, cc;
res2.txt should contain ONLY the keys from file2 that are not in file1: zz, yy, ww, oo.
Can I do this with this tool, or do I need a Python script for the job? Thanks.
I am using Windows.
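In Python, you can do the following. List comprehensions are used here so that the results are real lists on both Python 2 and Python 3 (on Python 3, filter returns a lazy iterator rather than a list).
    >>> string1 = "aa, bb, cc, dd, ee, ff, gg;"
    >>> string2 = "aa, bb, cc, zz, yy, ww, oo;"
    >>> list1 = string1.rstrip(';').split(', ')
    >>> list2 = string2.rstrip(';').split(', ')
    >>> common_words = [x for x in list2 if x in list1]
    >>> unique_words = [x for x in list2 if x not in list1]
    >>> common_words
    ['aa', 'bb', 'cc']
    >>> unique_words
    ['zz', 'yy', 'ww', 'oo']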
If needed, you can write them to a file. For example:
    common_string = ', '.join(common_words) + ';'
    with open("common.txt", 'w') as common_file:
        common_file.write(common_string)
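Since the question is about files rather than string literals, the steps above can be put together into one script. This is only a minimal sketch, assuming each input file holds comma-separated keys terminated by a semicolon, as in the question. The function names (read_keys, write_keys, split_keys, compare_files) are made up for illustration, and the set lookup is a small substitution for the list lookup used above.

```python
def read_keys(path):
    """Return the keys from a file like 'aa, bb, cc;' as a list, in file order."""
    with open(path) as f:
        text = f.read()
    # Treat ';' and line breaks as separators too, then drop empty pieces.
    for ch in ";\n\r":
        text = text.replace(ch, ",")
    return [key.strip() for key in text.split(",") if key.strip()]


def write_keys(path, keys):
    """Write keys back out in the same 'aa, bb, cc;' format."""
    with open(path, "w") as f:
        f.write(", ".join(keys) + ";\n")


def split_keys(keys1, keys2):
    """Split keys2 into (keys also in keys1, keys only in keys2), keeping order."""
    seen = set(keys1)  # set membership tests are O(1) on average
    common = [k for k in keys2 if k in seen]
    only_in_2 = [k for k in keys2 if k not in seen]
    return common, only_in_2


def compare_files(file1, file2, res1="res1.txt", res2="res2.txt"):
    """Read both key files and write the common and file2-only keys."""
    common, only_in_2 = split_keys(read_keys(file1), read_keys(file2))
    write_keys(res1, common)
    write_keys(res2, only_in_2)
```

Calling compare_files("file1", "file2") from the directory that holds the two files should produce res1.txt and res2.txt. It needs nothing beyond a standard Python install, so it runs the same way on Windows.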
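You can use comm to show the common lines, but you have to sort the files first (and use tr to convert them to one key per line). comm -12 prints only the lines found in both inputs, and comm -13 prints only the lines unique to the second input. The <( ... ) process substitution needs a shell such as bash, and ';' is included in the tr set so that the trailing semicolon does not stay attached to the last key:
    comm -12 <(tr -s ' ,;' '\n' < file1 | sort) <(tr -s ' ,;' '\n' < file2 | sort)
    comm -13 <(tr -s ' ,;' '\n' < file1 | sort) <(tr -s ' ,;' '\n' < file2 | sort)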
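To see why comm needs sorted input, here is a rough Python sketch of the kind of single-pass merge it performs over two sorted sequences. It is an illustration only, not comm's actual implementation: comm_columns is a made-up name, and Python compares strings by code point whereas comm follows the locale's collation order.

```python
def comm_columns(sorted1, sorted2):
    """Walk two sorted lists in step and classify every item.

    Returns (only_in_1, only_in_2, in_both), like comm's three columns.
    """
    only1, only2, both = [], [], []
    i = j = 0
    while i < len(sorted1) and j < len(sorted2):
        if sorted1[i] == sorted2[j]:
            both.append(sorted1[i])
            i += 1
            j += 1
        elif sorted1[i] < sorted2[j]:
            # The smaller item cannot appear later in the other list.
            only1.append(sorted1[i])
            i += 1
        else:
            only2.append(sorted2[j])
            j += 1
    # Whatever is left over in either list has no partner.
    only1.extend(sorted1[i:])
    only2.extend(sorted2[j:])
    return only1, only2, both


keys1 = sorted(["aa", "bb", "cc", "dd", "ee", "ff", "gg"])
keys2 = sorted(["aa", "bb", "cc", "zz", "yy", "ww", "oo"])
only1, only2, both = comm_columns(keys1, keys2)
print(both)   # the lines comm -12 would keep
print(only2)  # the lines comm -13 would keep
```

Because each step only compares the current heads of the two lists, unsorted input would make the walk miss matches. Note also that the file2-only keys come out in sorted order, not in their original file order.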
An ugly GNU sed job. Each command turns file1 into a sed script, which the second sed then runs against file2:
    sed -r 's#(\w+)[,;]\s*#/\1/{x;s/.*/\&\1,/;x};#g;s#.*#&x;s/,$/;/#' file1 | sed -rf - file2 > res1.txt
    sed -r 's#(\w+),\s#\1[,;]\\s*|#g;s#(.*);#s/\1//g#' file1 | sed -rf - file2 > res2.txt
Example:
    $ cat file1 file2
    aa,bb,cc,dd,ee,ff,gg;
    aa,bb,cc,zz,yy,ww,oo;
    $ sed -r 's#(\w+)[,;]\s*#/\1/{x;s/.*/\&\1,/;x};#g;s#.*#&x;s/,$/;/#' file1 | sed -rf - file2
    aa,bb,cc;
    $ sed -r 's#(\w+),\s#\1[,;]\\s*|#g;s#(.*);#s/\1//g#' file1 | sed -rf - file2
    zz,yy,ww,oo;
With Windows quoting (double quotes instead of single quotes):
    sed -r "s#(\w+)[,;]\s*#/\1/{x;s/.*/\&\1,/;x};#g;s#.*#&x;s/,$/;/#" file1 | sed -rf - file2 > res1.txt
    sed -r "s#(\w+),\s#\1[,;]\\s*|#g;s#(.*);#s/\1//g#" file1 | sed -rf - file2 > res2.txt
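The general-purpose text-processing tool that comes with every UNIX installation is named awk:
    awk -F', *|;' '
    # First file: remember every key (the last field is the empty text after ";").
    NR==FNR { for (i=1; i<NF; i++) file1[$i]; next }
    # Second file: send each key to res1.txt if it was seen in file1, else to res2.txt.
    {
        for (i=1; i<NF; i++) {
            sfx = ($i in file1 ? 1 : 2)
            printf "%s%s", sep[sfx], $i > ("res" sfx ".txt")
            sep[sfx] = ", "
        }
    }
    # Terminate each output file that received at least one key.
    END { for (sfx in sep) print ";" > ("res" sfx ".txt") }
    ' file1 file2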