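I'm on a Mac and have a very large .json file containing more than 100k objects.
I'd like to split the file into many smaller files (ideally 50-100).
Source file
The original .json file is a multidimensional array (an array of objects) that looks something like this: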
[{ "id": 1, "item_a": "this1", "item_b": "that1" }, { "id": 2, "item_a": "this2", "item_b": "that2" }, { "id": 3, "item_a": "this3", "item_b": "that3" }, { "id": 4, "item_a": "this4", "item_b": "that4" }, { "id": 5, "item_a": "this5", "item_b": "that5" }]
Desired output
If this were split into three files, I'd like the output to look like this:
File 1:
[{ "id": 1, "item_a": "this1", "item_b": "that1" }, { "id": 2, "item_a": "this2", "item_b": "that2" }]
File 2:
[{ "id": 3, "item_a": "this3", "item_b": "that3" }, { "id": 4, "item_a": "this4", "item_b": "that4" }]
File 3:
[{ "id": 5, "item_a": "this5", "item_b": "that5" }]
Any ideas would be greatly appreciated. Thanks!
Perl to the rescue:
#!/usr/bin/perl
use warnings;
use strict;

use JSON;

my $file_count = 5; # You probably want 50 - 100 here.

# Slurp the whole input file into one string.
my $json_text = do {
    local $/;
    open my $IN, '<', '1.json' or die $!;
    <$IN>
};

my $arr = decode_json($json_text);

my $size = int(@$arr / $file_count); # base number of objects per file
my $rest = @$arr % $file_count;      # this many files get one extra object

my $i = 1;
while (@$arr) {
    open my $OUT, '>', "file$i.json" or die $!;
    my @chunk = splice @$arr, 0, $size;
    # Grow the chunk size only once, so the last $rest files each get one extra object.
    ++$size if $i++ == $file_count - $rest;
    print {$OUT} encode_json(\@chunk);
    close $OUT or die $!;
}
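For anyone who finds the size/rest arithmetic easier to follow outside Perl, here is a minimal sketch in Python of the even split the script aims for: the first `file_count - rest` files get `size` objects and the remaining `rest` files get `size + 1`. The function names `chunk_sizes` and `split_json_array` and the `file{i}.json` naming are my own choices for illustration, and it assumes the whole file fits in memory, as the Perl version does.

```python
import json
from pathlib import Path


def chunk_sizes(total, file_count):
    """Return one chunk length per output file, differing by at most one."""
    size, rest = divmod(total, file_count)
    # The first (file_count - rest) files get `size` objects,
    # the last `rest` files get one extra.
    return [size] * (file_count - rest) + [size + 1] * rest


def split_json_array(src, out_dir, file_count):
    """Split the top-level JSON array in `src` into `file_count` files."""
    items = json.loads(Path(src).read_text())
    written = []
    start = 0
    for i, n in enumerate(chunk_sizes(len(items), file_count), start=1):
        out = Path(out_dir) / f"file{i}.json"  # hypothetical naming scheme
        out.write_text(json.dumps(items[start:start + n]))
        written.append(out)
        start += n
    return written
```

For the five-object example split into three files, `chunk_sizes(5, 3)` yields chunks of 1, 2 and 2 objects rather than the 2, 2, 1 shown in the question; the contents and order are the same, only the position of the short file differs.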
@choroba's answer is very efficient and flexible. I have a bash solution using jq.
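#!/bin/bash
# Split the top-level array in data.json into files of two objects each.
i=0
file=0
# Read one compact object per line; read -r keeps values that contain spaces intact.
while IFS= read -r f; do
    if [ "$i" -eq 2 ]; then
        # The chunk is full: merge the buffered objects into one array.
        jq --slurp '.' /tmp/0.json /tmp/1.json > "File$file.json"
        rm /tmp/0.json /tmp/1.json # cleanup
        ((file = file + 1))
        i=0
    fi
    printf '%s\n' "$f" > "/tmp/$i.json"
    ((i = i + 1))
done < <(jq -c -M '.[]' data.json)
# Flush whatever is left in the buffer (one or two objects).
if [ "$i" -gt 0 ]; then
    jq --slurp '.' /tmp/[01].json > "File$file.json"
    rm /tmp/[01].json # cleanup
fi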
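Note that the jq script fixes the number of objects per file (two) rather than the number of files. A rough sketch of that fixed-size idea in Python, which none of the answers here use, might look like the following; `chunks_of` and `write_chunks` are hypothetical helper names, and the `File{n}.json` naming simply mirrors the script above.

```python
import json
from pathlib import Path


def chunks_of(items, per_file):
    """Slice `items` into consecutive lists of at most `per_file` elements."""
    return [items[i:i + per_file] for i in range(0, len(items), per_file)]


def write_chunks(src, out_dir, per_file):
    """Write each chunk of the top-level array in `src` to File0.json, File1.json, ..."""
    items = json.loads(Path(src).read_text())
    paths = []
    for n, chunk in enumerate(chunks_of(items, per_file)):
        path = Path(out_dir) / f"File{n}.json"
        path.write_text(json.dumps(chunk))
        paths.append(path)
    return paths
```

With the five-object sample and `per_file=2`, this produces three files holding 2, 2 and 1 objects, which is the layout shown in the question. To land in the 50-100 file range for roughly 100k objects, `per_file` would be somewhere between 1000 and 2000.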
$ cat tst.awk
/{/ && (++numOpens % 2) {
    if (++numOuts > 1) {
        print out, "}]"
        close(out)
    }
    out = "out" numOuts
    $0 = "[{"
}
{
    # print > out
    print out, $0
}
$ awk -f tst.awk file
out1 [{
out1 "id": 1,
out1 "item_a": "this1",
out1 "item_b": "that1"
out1 }, {
out1 "id": 2,
out1 "item_a": "this2",
out1 "item_b": "that2"
out1 }]
out2 [{
out2 "id": 3,
out2 "item_a": "this3",
out2 "item_b": "that3"
out2 }, {
out2 "id": 4,
out2 "item_a": "this4",
out2 "item_b": "that4"
out2 }]
out3 [{
out3 "id": 5,
out3 "item_a": "this5",
out3 "item_b": "that5"
out3 }]
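Once you have tested it and are happy with the output, just delete the print out, $0 line and uncomment the # print > out line. The closing print out, "}]" needs the same treatment (print "}]" > out), otherwise the terminating bracket goes to stdout instead of the file.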