
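I usually run grep -rIn pattern_str big_source_code_dir to find things. But grep does not run in parallel; how can I make it parallel? My system has 4 cores, so the search would be faster if grep could use all of them.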


However, if you really want to do a parallel grep, this site gives two hints on how to do it with find and xargs. For example:

 find . -type f -print0 | xargs -0 -P 4 -n 40 grep -i foobar 
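As a minimal sketch of the same find/xargs approach, the snippet below takes the -P value from the detected core count instead of hard-coding 4. The demo directory, the file names and the fallback value of 4 are assumptions made up for illustration; in practice you would point find at big_source_code_dir and search for your own pattern.

```shell
#!/bin/sh
# Hypothetical demo tree; replace with big_source_code_dir in real use.
demo_dir=$(mktemp -d)
printf 'int foobar;\n'   > "$demo_dir/a.c"
printf 'nothing here\n'  > "$demo_dir/b.c"
printf 'FooBar again\n'  > "$demo_dir/name with space.c"

# Detected number of online cores; fall back to 4 if detection fails.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)

# -print0 / -0 keep file names with spaces intact.
# -H forces the file name prefix even if a batch holds a single file.
# -I skips binary files, as in the original grep -rIn call.
matches=$(find "$demo_dir" -type f -print0 |
  xargs -0 -P "$cores" -n 40 grep -I -i -n -H foobar | sort)

printf '%s\n' "$matches"
rm -rf "$demo_dir"
```

With several grep processes writing at once, the order of the output lines is not deterministic, which is why the sketch sorts the result.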

The GNU parallel command is very useful for this.

 sudo apt-get install parallel # if not available on debian based systems 

The parallel man page then gives an example:

 EXAMPLE: Parallel grep

 grep -r greps recursively through directories. On multicore CPUs GNU
 parallel can often speed this up.

   find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}

 This will run 1.5 job per core, and give 1000 arguments to grep.


 find big_source_code_dir -type f | parallel -k -j150% -n 1000 -m grep -H -n pattern_str {} 


 DIFFERENCES BETWEEN xargs AND GNU Parallel

 xargs offer some of the same possibilities as GNU parallel.

 xargs deals badly with special characters (such as space, ' and "). To
 see the problem try this:

   touch important_file
   touch 'not important_file'
   ls not* | xargs rm
   mkdir -p "My brother's 12\" records"
   ls | xargs rmdir

 You can specify -0 or -d "\n", but many input generators are not
 optimized for using NUL as separator but are optimized for newline as
 separator. Eg head, tail, awk, ls, echo, sed, tar -v, perl (-0 and \0
 instead of \n), locate (requires using -0), find (requires using
 -print0), grep (requires user to use -z or -Z), sort (requires using -z).

 So GNU parallel's newline separation can be emulated with:

   cat | xargs -d "\n" -n1 command

 xargs can run a given number of jobs in parallel, but has no support
 for running number-of-cpu-cores jobs in parallel.

 xargs has no support for grouping the output, therefore output may run
 together, eg the first half of a line is from one process and the last
 half of the line is from another process. The example Parallel grep
 cannot be done reliably with xargs because of this.

 ...
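The whitespace problem described in that excerpt is easy to reproduce without deleting anything. The sketch below only counts how many arguments xargs hands to echo for a single file name containing a space; the file name is the one from the man page, and the variable names are made up for illustration.

```shell
#!/bin/sh
# One file name containing a space, delivered as a newline-terminated line.
name='not important_file'

# Default xargs splits on whitespace: the name turns into two arguments,
# so echo is called twice and prints two lines.
split_count=$(printf '%s\n' "$name" | xargs -n1 echo | wc -l | tr -d ' ')

# Turning the newline into NUL and using -0 keeps the name as one argument.
nul_count=$(printf '%s\n' "$name" | tr '\n' '\0' | xargs -0 -n1 echo | wc -l | tr -d ' ')

echo "default: $split_count argument(s), with -0: $nul_count argument(s)"
```

This is why the find examples here are safer with -print0 on the find side and -0 on the xargs side.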


 parallel --pipe --block 10M --ungroup LC_ALL=C grep -F 'PostTypeId=\"1\"' < ~/Downloads/Posts.xml > questions.xml

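With standalone grep, grep -F 'PostTypeId="1"' works without escaping the double quotes. The backslashes are only needed under parallel, which passes the command through a shell once more. It took me a while to figure this out!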
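The quoting difference can be reproduced without parallel installed. In the sketch below, sh -c stands in for the extra round of shell evaluation that the command goes through under parallel; that stand-in is an assumption for illustration, and the sample XML line is made up.

```shell
#!/bin/sh
# Made-up sample line in the style of Posts.xml.
line='<row Id="4" PostTypeId="1" />'

# Direct call: the shell removes the single quotes once,
# so grep searches for PostTypeId="1" and finds it.
direct=$(printf '%s\n' "$line" | grep -c -F 'PostTypeId="1"')

# Second round of shell evaluation with unescaped double quotes:
# the inner shell consumes them, grep searches for PostTypeId=1, no match.
cmd_unescaped='grep -c -F PostTypeId="1"'
unescaped=$(printf '%s\n' "$line" | sh -c "$cmd_unescaped" || true)

# Same, but with the double quotes escaped so they survive the second round.
cmd_escaped='grep -c -F PostTypeId=\"1\"'
escaped=$(printf '%s\n' "$line" | sh -c "$cmd_escaped")

echo "direct=$direct unescaped=$unescaped escaped=$escaped"
```

The counts show that the unescaped form silently matches nothing once it is evaluated twice, while the escaped form behaves like the direct call.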
