Splitting a .json file into multiple files on a Mac

I'm working on a Mac and have a very large .json file containing more than 100k objects.

I'd like to split the file into many smaller files (ideally 50-100).

Source file

The original .json file is a single array of objects and looks a bit like this:

    [{
        "id": 1,
        "item_a": "this1",
        "item_b": "that1"
    }, {
        "id": 2,
        "item_a": "this2",
        "item_b": "that2"
    }, {
        "id": 3,
        "item_a": "this3",
        "item_b": "that3"
    }, {
        "id": 4,
        "item_a": "this4",
        "item_b": "that4"
    }, {
        "id": 5,
        "item_a": "this5",
        "item_b": "that5"
    }]

Desired output

If this were split into three files, I'd like the output to look like this:

File 1:

    [{
        "id": 1,
        "item_a": "this1",
        "item_b": "that1"
    }, {
        "id": 2,
        "item_a": "this2",
        "item_b": "that2"
    }]

File 2:

    [{
        "id": 3,
        "item_a": "this3",
        "item_b": "that3"
    }, {
        "id": 4,
        "item_a": "this4",
        "item_b": "that4"
    }]

File 3:

    [{
        "id": 5,
        "item_a": "this5",
        "item_b": "that5"
    }]

Any ideas would be greatly appreciated. Thanks!

Perl to the rescue:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use JSON;

    my $file_count = 5; # You probably want 50 - 100 here.

    # Slurp the whole input file into a single string.
    my $json_text = do {
        local $/;
        open my $IN, '<', '1.json' or die $!;
        <$IN>
    };

    my $arr  = decode_json($json_text);
    my $size = int(@$arr / $file_count);  # base number of objects per file
    my $rest = @$arr % $file_count;       # this many files get one extra object

    my $i = 1;
    while (@$arr) {
        open my $OUT, '>', "file$i.json" or die $!;
        my @chunk = splice @$arr, 0, $size;
        # Bump the size exactly once, so the last $rest files hold $size + 1 objects.
        ++$size if $i++ == $file_count - $rest;
        print {$OUT} encode_json(\@chunk);
        close $OUT or die $!;
    }
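The arithmetic behind an even split can be checked on its own before running anything against the real file. The following is a minimal sketch in plain shell; chunk_sizes is a hypothetical helper, not part of the answer above. It assumes the goal is for every file to receive either floor(N / K) or one more than that, with the larger chunks going last.

```shell
#!/bin/sh
# chunk_sizes TOTAL FILES (hypothetical helper)
# Prints one chunk size per line for TOTAL objects spread over FILES files.
chunk_sizes() {
    total=$1
    files=$2
    size=$((total / files))   # base number of objects per file
    rest=$((total % files))   # this many files need one extra object
    i=1
    while [ "$i" -le "$files" ]; do
        if [ "$i" -gt $((files - rest)) ]; then
            echo $((size + 1))   # one of the last $rest files
        else
            echo "$size"
        fi
        i=$((i + 1))
    done
}

chunk_sizes 5 3    # prints 1, 2, 2 (one per line)
chunk_sizes 14 5   # prints 2, 3, 3, 3, 3 (one per line)
```

Note that this distribution puts the smaller chunks first, so five objects in three files come out as 1, 2, 2 rather than the 2, 2, 1 shown in the question.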

@choroba's answer is very efficient and flexible. I have a bash solution that uses jq.

    #!/bin/bash

    i=0
    file=0
    # jq -c prints one object per line; read -r keeps values that contain spaces intact.
    while IFS= read -r f; do
        if [ "$i" -eq 2 ]; then
            jq --slurp '.' /tmp/0.json /tmp/1.json > "File$file.json"
            rm /tmp/0.json /tmp/1.json   # cleanup
            ((file = file + 1))
            i=0
        fi
        printf '%s\n' "$f" > "/tmp/$i.json"
        ((i = i + 1))
    done < <(jq -c -M '.[]' data.json)

    # Flush whatever is left over: either one or two objects.
    if [ "$i" -gt 0 ]; then
        jq --slurp '.' /tmp/[01].json > "File$file.json"
        rm /tmp/[01].json   # cleanup
    fi

    $ cat tst.awk
    /{/ && (++numOpens % 2) {
        if (++numOuts > 1) {
            print out, "}]"
            close(out)
        }
        out = "out" numOuts
        $0 = "[{"
    }
    {
        # print > out
        print out, $0
    }

    $ awk -f tst.awk file
    out1 [{
    out1     "id": 1,
    out1     "item_a": "this1",
    out1     "item_b": "that1"
    out1 }, {
    out1     "id": 2,
    out1     "item_a": "this2",
    out1     "item_b": "that2"
    out1 }]
    out2 [{
    out2     "id": 3,
    out2     "item_a": "this3",
    out2     "item_b": "that3"
    out2 }, {
    out2     "id": 4,
    out2     "item_a": "this4",
    out2     "item_b": "that4"
    out2 }]
    out3 [{
    out3     "id": 5,
    out3     "item_a": "this5",
    out3     "item_b": "that5"
    out3 }]

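Once you have tested it and are happy with the result, delete the print out, $0 line and uncomment the # print > out line. The print out, "}]" line needs the same change, to print "}]" > out, so that the closing bracket ends up in the file rather than on the terminal.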
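For reference, here is a sketch of how the awk script might look once file output is switched on, wrapped in a small shell script so it can be run as-is. The sample.json name and the inline test data are assumptions for illustration. The pattern is written as [{] instead of a bare {, which should match the same lines while avoiding any doubt about how a particular awk treats an unescaped brace in a regex. Like the original, it relies on the input having its braces on separate lines, exactly as in the question.

```shell
#!/bin/sh
# Sample input in the layout the awk script relies on (illustrative data).
cat > sample.json <<'EOF'
[{
    "id": 1,
    "item_a": "this1",
    "item_b": "that1"
}, {
    "id": 2,
    "item_a": "this2",
    "item_b": "that2"
}, {
    "id": 3,
    "item_a": "this3",
    "item_b": "that3"
}, {
    "id": 4,
    "item_a": "this4",
    "item_b": "that4"
}, {
    "id": 5,
    "item_a": "this5",
    "item_b": "that5"
}]
EOF

awk '
/[{]/ && (++numOpens % 2) {    # every odd-numbered brace line starts a new file
    if (++numOuts > 1) {
        print "}]" > out       # terminate the array in the previous file
        close(out)
    }
    out = "out" numOuts        # out1, out2, ...
    $0 = "[{"                  # open the array in the new file
}
{ print > out }                # write the current line to the current file
' sample.json
```

Running this in an empty directory should leave out1, out2 and out3 next to sample.json, with two objects in each of the first two files and one in the last. To put more objects in each file, the % 2 would need to change to the desired count, with the test adjusted so that it is true on the first brace line of each group.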