Output JSON from big datasets
05 Sep 2017 in Algorithms, JSON, Software Architecture
The problem: some time ago we were building a RESTful service that returned JSON-encoded data to a client. The resulting payload was so large that the web server process ran out of memory.
The script was something like:
<?php
header("Content-Type: application/json");
// everything is loaded and transformed in memory, then encoded in one go
$data = fetchDataFromDatabaseAndTransformIt();
echo json_encode($data);
The proposed solution was to break the data into pieces and echo those pieces one at a time. This way memory consumption stays constant, regardless of the size of the data:
<?php
$jsonWriter = new JsonWriter();
$jsonWriter->start();
while (($data = fetchDataFromDatabaseAndTransformItOnePieceAtATime())) {
    $jsonWriter->push($data);
}
$jsonWriter->end();
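The `fetchDataFromDatabaseAndTransformItOnePieceAtATime` function is not shown here; one hypothetical way to implement it is as a PHP generator, which keeps only the current row in memory. With a generator, the loop above would become a `foreach` over the returned iterator rather than repeated calls. The stand-in rows and the `transformRow` helper below are illustrative, not part of the original code:

```php
<?php
// Hypothetical sketch of the piece-at-a-time fetch. A generator yields one
// transformed row at a time, so only the current row is held in memory.
// A plain array stands in for the database cursor (e.g. an unbuffered
// PDO result set) purely for illustration.
function fetchDataFromDatabaseAndTransformItOnePieceAtATime(): Generator {
    $rows = [["id" => 1], ["id" => 2], ["id" => 3]]; // stand-in for DB rows
    foreach ($rows as $row) {
        yield transformRow($row); // transform lazily, one row at a time
    }
}

// Illustrative per-row transformation
function transformRow(array $row): array {
    return ["id" => $row["id"], "squared" => $row["id"] ** 2];
}

// Usage with the writer would then look like:
//   foreach (fetchDataFromDatabaseAndTransformItOnePieceAtATime() as $piece) {
//       $jsonWriter->push($piece);
//   }
```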
with the JsonWriter class looking something like:

class JsonWriter {
    public function start() {
        header("Content-Type: application/json");
        echo "[";
    }

    public function end() {
        echo "]";
    }

    public function push($data) {
        echo json_encode($data) . ",";
    }
}

The only problem was that for an input like ["a", "b", "c"] the output JSON string was invalid: ["a","b","c",].
So we needed to get rid of the last comma without using any output buffering functions. The final solution was simple: output the first element last. With that change, the JsonWriter class becomes:
class JsonWriter {
    // the first pushed element, held back until end()
    private $element;
    private $hasElement = false;

    public function start() {
        header("Content-Type: application/json");
        echo "[";
    }

    public function end() {
        // emit the held-back first element last, with no trailing comma
        if ($this->hasElement) {
            echo json_encode($this->element);
        }
        echo "]";
    }

    public function push($data) {
        if (!$this->hasElement) {
            // hold back the first element instead of printing it
            // (a flag is used so falsy values like 0 or "" are handled too)
            $this->element = $data;
            $this->hasElement = true;
        } else {
            echo json_encode($data) . ",";
        }
    }
}
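One side effect worth noting: because the first element is emitted last, the order of elements in the output changes. A common alternative that preserves order, sketched below (not from the original solution), is to print the separator before every element after the first. Output is collected in a string here so the result can be checked; the streaming version would echo instead:

```php
<?php
// Alternative sketch: print the comma *before* each element except the
// first, so no trailing comma is ever produced and order is preserved.
class CommaFirstJsonWriter {
    private $first = true; // true until the first element has been written
    public $out = "";      // captured output (a real writer would echo)

    public function start() {
        $this->out .= "[";
    }

    public function push($data) {
        if (!$this->first) {
            $this->out .= ","; // separator before every element but the first
        }
        $this->first = false;
        $this->out .= json_encode($data);
    }

    public function end() {
        $this->out .= "]";
    }
}

$w = new CommaFirstJsonWriter();
$w->start();
foreach (["a", "b", "c"] as $piece) {
    $w->push($piece);
}
$w->end();
// $w->out now holds the valid JSON string ["a","b","c"]
```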