Introduction to Noise’s Node.js API
2017-12-21 22:35
In the previous blog post about Noise we imported data with the help of some already prepared scripts. This time it’s an introduction to using Noise’s Promise-based Node.js API directly yourself.
The dataset we use is not a single ready-to-use file, but consists of several files. The data is the “Realized Cost Savings and Avoidance” for US government agencies. I’m really excited that such data gets openly published as JSON. I wish Germany were that advanced in this regard. If you want to know more about the structure of the data, there’s documentation about the JSON Schema; they even have an “OFCIO JSON User Guide for Realized Cost Savings” on how to produce the data out of Excel.
I’ve prepared a repository containing the final code and the data. But feel free to follow along with this tutorial yourself and just point to the data directory of that repository when running the script.
Let’s start with the boilerplate code for reading in those files and parsing them as JSON. But first create a new package:
mkdir noise-cost-savings
cd noise-cost-savings
npm init --force
You can use --force here as you probably won’t publish this package anyway. Put the boilerplate code below into a file called index.js. Please note that the code is kept as simple as possible; for a real-world application you surely want better error handling.
#!/usr/bin/env node
'use strict';
const fs = require('fs');
const path = require('path');
// The only command line argument is the directory where the data files are
const inputDir = process.argv[2];
console.log(`Loading data from ${inputDir}`);
fs.readdir(inputDir, (_err, files) => {
    files.forEach(file => {
        fs.readFile(path.join(inputDir, file), (_err, data) => {
            console.log(file);
            const json = JSON.parse(data);
            processFile(json);
        });
    });
});
const processFile = (data) => {
    // This is where our actual code goes
};
This code should already run. Check out my repository with the data into some directory first:
git clone https://github.com/vmx/blog-introduction-to-noises-nodejs-api
Now run the script from above as:
node index.js <path-to-directory-from-my-repo-mentioned-above>/data
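As mentioned, the boilerplate skips error handling. Here is a minimal sketch of what a more defensive loader could look like. It is not the code from my repository: the name loadJsonDir is made up for this illustration, it assumes the data files end in .json, and it needs a Node.js version that ships fs.promises (10 or later).

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Hypothetical helper: read every *.json file in `dir` and resolve to
// [{file, json}, ...]. The returned Promise rejects if the directory or a
// file can't be read, or if a file doesn't contain valid JSON.
const loadJsonDir = async (dir) => {
    const files = (await fs.promises.readdir(dir))
        // Assumption: only files with a .json extension are data files
        .filter(file => path.extname(file) === '.json')
        .sort();
    return Promise.all(files.map(async file => {
        const fullPath = path.join(dir, file);
        const data = await fs.promises.readFile(fullPath, 'utf8');
        try {
            return {file, json: JSON.parse(data)};
        } catch (err) {
            // Add the file name, a bare SyntaxError doesn't tell you much
            throw new Error(`Cannot parse ${fullPath}: ${err.message}`);
        }
    }));
};
```

You would call it as loadJsonDir(inputDir) and handle failures in a single .catch() instead of checking an error argument in every callback.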
Before we take a closer look at the data, let’s install the Noise module. Please note that you need to have Rust installed (easiest is probably through rustup) before you can install Noise.
npm install noise-search
This will take a while. So let’s get back to the code. Load the noise-search module by adding:
const noise = require('noise-search');
A Noise index needs to be opened and closed properly, else your script will hang and not terminate. Opening a new Noise index is easy. Just put this before reading the files:
const index = noise.open('costsavings', true);
This opens an index called costsavings and creates it if it doesn’t exist yet (that’s the boolean true). Closing the index is more difficult due to the asynchronous nature of the code. We can close the index only after all the processing is done. Hence we wrap the fs.readFile(…) call in a Promise. The new code looks like this:
fs.readdir(inputDir, (_err, files) => {
    const promises = files.map(file => {
        return new Promise((resolve, reject) => {
            fs.readFile(path.join(inputDir, file), (err, data) => {
                if (err) {
                    // Reject and stop here; throwing inside the callback
                    // would crash the process instead
                    reject(err);
                    return;
                }
                console.log(file);
                const json = JSON.parse(data);
                resolve(processFile(json));
            });
        });
    });
    Promise.all(promises).then(() => {
        console.log("Done.");
        index.close();
    });
});
If you run the script now it should print out the file names as before and terminate with a Done. A directory called costsavings was created when you ran the script. This is where the Noise index is stored.
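Since a script with an unclosed index hangs, it can be worth making sure the index gets closed even when something fails along the way. The following is only a sketch of that idea using try/finally. The helper name withClosing is made up, and the object passed to it in the example is a stand-in with a close() method, not a real Noise index.

```javascript
'use strict';

// Hypothetical helper: run `work(resource)` and call `resource.close()`
// afterwards, no matter whether `work` succeeded or failed.
const withClosing = async (resource, work) => {
    try {
        return await work(resource);
    } finally {
        // `await` is harmless if close() is synchronous and does the
        // right thing if it returns a Promise
        await resource.close();
    }
};

// Stand-in for an index, for illustration only. In the script above the
// object would come from noise.open('costsavings', true).
const fakeIndex = {
    closed: false,
    close() { this.closed = true; }
};

withClosing(fakeIndex, async () => {
    throw new Error('loading failed');
}).catch(err => {
    // The error still reaches the caller, but the resource is closed
    console.log(`${err.message}, closed: ${fakeIndex.closed}`);
});
```

In the script above, the work function would be the place where the files get read and processed.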
Now let’s have a look at the data files, e.g. the cost savings file from the Department of Commerce (or the JSON Schema). You’ll see that each file has a single field called "strategies", which contains an array with all strategies. We are free to pre-process the data as much as we want before we insert it into Noise. So let’s create a separate document for every strategy. Our processFile() function now looks like:
const processFile = (data) => {
    // Use auto-generated Ids for the documents. Return a Promise so that
    // the caller can wait until all documents of this file are added.
    return Promise.all(data.strategies.map(strategy => index.add(strategy)));
};
Now all the strategies get inserted. Make sure you delete the index (the costsavings directory) if you re-run the script, else you would end up with duplicate entries, as different Ids will be generated on every run.
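An alternative to deleting the index would be to give every document a stable Id yourself. The sketch below derives one from the file name and the position of the strategy within that file. Two things are assumptions here: that Noise takes the Id from an _id field if the document has one (please check the README before relying on it), and the helper name withStableId, which is made up for this example.

```javascript
'use strict';
const crypto = require('crypto');

// Hypothetical helper: return a copy of `strategy` with a deterministic
// `_id`, derived from the source file name and the position of the
// strategy within that file.
// Assumption: Noise uses the `_id` field as the document Id if present.
const withStableId = (file, position, strategy) => {
    const hash = crypto.createHash('sha1')
        .update(`${file}#${position}`)
        .digest('hex');
    return Object.assign({}, strategy, {_id: hash});
};
```

To use it, processFile() would additionally need the file name passed in, and the second argument of the map() callback gives you the position.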
To query the index you could use the Noise indexserve script that I’ve also used in the last blog post about Noise. Or we just add a small query at the end of the script after the loading is done. Our query function will do the query and output the result:
const queryNoise = async (query) => {
    const results = await index.query(query);
    for (const result of results) {
        console.log(result);
    }
};
There’s not much to say, except that it’s again a Promise-based API. Now hook up this function after the loading and before the index is closed. For that, replace the Promise.all(…) call with:
Promise.all(promises).then(async () => {
    await queryNoise('find {} return count()');
    console.log("Done.");
    index.close();
});
It’s a really simple query; it just returns the number of documents that are in there (644). After all this hard work, it’s time to make a more complicated query on this dataset to show that it was worth doing all this. Let’s return the total net savings of all agencies in 2017. Replace the query find {} return count() with:
find {fy2017: {netOrGross: == "Net"}} return sum(.fy2017.amount)
That’s $845m in savings. Not bad at all!
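If you want to convince yourself what the query does, here is roughly the same computation in plain JavaScript: keep the strategies whose fy2017.netOrGross is "Net" and add up their fy2017.amount. It’s a sketch for cross-checking, not how Noise works internally. The function name netSavings2017 and the sample documents are made up, and skipping non-numeric amounts is my own choice.

```javascript
'use strict';

// Sum `fy2017.amount` over all strategies whose `fy2017.netOrGross` is
// "Net". Strategies without numeric 2017 data are skipped.
const netSavings2017 = (strategies) => {
    return strategies
        .filter(s => s.fy2017 &&
                     s.fy2017.netOrGross === 'Net' &&
                     typeof s.fy2017.amount === 'number')
        .reduce((total, s) => total + s.fy2017.amount, 0);
};

// Made-up sample documents, not taken from the real dataset
const sample = [
    {fy2017: {netOrGross: 'Net', amount: 1.5}},
    {fy2017: {netOrGross: 'Gross', amount: 10}},
    {fy2017: {netOrGross: 'Net', amount: 2.5}},
    {fy2016: {netOrGross: 'Net', amount: 7}}
];
console.log(netSavings2017(sample)); // prints 4
```

Only the first and the third sample document count: the second one is a gross amount and the fourth one has no data for 2017.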
You can learn more about the Noise Node.js API from the README at the corresponding repository. If you want to learn more about possible queries, have a look at the Noise Query Language reference.
Happy cost saving!
Categories: en, Noise, Node, JavaScript, Rust