This week we doubled the number of stores available, published the API on the Programmable Web, and hooked up New Relic to measure usage.
The Albert Heijn visitor
Albert Heijn is probably the largest retail supermarket brand in the Netherlands. They run several store formats, depending on location and size. This is what their store locations page looks like:
The page has a single feed that gives access to a list of almost 1000 shops. The list is embedded in simple, readable HTML and can easily be parsed using Cheerio:
var request = require('request');
var cheerio = require('cheerio');
var storeClient = require('./StoreClient');

exports.Crawl = function () {
    var visitor = 'ALBERTHEIJN';
    var url = 'http://www.ah.nl/data/winkelinformatie/winkels/lijst';

    request(url, function (err, resp, body) {
        if (err) {
            console.error('error: ' + err);
            return;
        }
        var $ = cheerio.load(body);

        // Every table row describes one store.
        $('tr').each(function (i, element) {
            var store = new storeClient.Store();
            store.identifier = element.attribs['id'].toString();
            store.brand = element.attribs['data-format'].toString();
            // Prefix the sub-formats with the main brand name.
            if (store.brand !== 'AH') {
                store.brand = 'AH ' + store.brand;
            }
            store.latitude = parseFloat(element.attribs['data-lat']);
            store.longitude = parseFloat(element.attribs['data-lon']);
            store.address = $(element).find('th').find('h3').text();

            // Assumption: the paragraph holds "zipcode, city", so the
            // first part is the zipcode and the second part is the city.
            var location = $(element).find('th').find('p').text().split(', ');
            store.zipcode = location[0].replace(/\s+/g, '');
            store.city = location[1];
            store.phone = $(element).find('td.tel').text()
                .replace('-', '').replace(/\s+/g, '');
            // store.name = store.city + ' - ' + ... (the listing breaks off here)
        });
    });
};
Nothing much to say: just some regular selectors and string manipulation routines that give us a data representation consistent with the previously created visitors.
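To make the string manipulation a bit more concrete, here is a minimal sketch of the zipcode, city and phone clean-up as stand-alone functions. The helper names and the sample values are hypothetical, and the sketch assumes the location text has the form "zipcode, city"; it is an illustration, not the code the visitor actually runs.

```javascript
// Hypothetical helpers, not part of the original visitor.

// Split a text such as "1234 AB, Amsterdam" (assumed format) into
// a zipcode without whitespace and a city name.
function parseLocation(text) {
    var parts = text.split(', ');
    return {
        zipcode: parts[0].replace(/\s+/g, ''),
        city: parts.length > 1 ? parts[1].trim() : ''
    };
}

// Remove every dash and all whitespace from a phone number.
function normalizePhone(text) {
    return text.replace(/-/g, '').replace(/\s+/g, '');
}

// Made-up sample values, for illustration only.
var loc = parseLocation('1234 AB, Amsterdam');
console.log(loc.zipcode + ' / ' + loc.city);
console.log(normalizePhone('020-123 45 67'));
```

Keeping these steps in small functions makes it easier to reuse the same normalization in the other visitors, so that all brands end up with identically formatted zipcodes and phone numbers.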
Publishing the API
Now that the data model has stabilized and we are gaining some volume (we’re at almost 2000 entries, half of what we expect to have in the end), it’s time to start publishing our service interface and let the world know we’re here. Well, for testing purposes at least.
The de facto standard for publishing and finding public service APIs on the net is the Programmable Web site. Registering a new API comes down to filling in a single-page form:
After the form is submitted, the API needs to be ‘approved’, which means it is not available directly after registration. I’m not sure what happens during the review, but I guess someone will try to connect to the service using the URL and instructions provided. No problem, we’re ready for it, and we’ve opened up access to the data in Azure Mobile Services:
I will keep the application key secret for private use. This allows my scripts to be the only ones able to make changes to the tables. Public access is read only.
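As a rough sketch of what a read-only client call could look like, the function below builds a query URL for a table endpoint with an OData-style filter. The service name, table name and column name are hypothetical placeholders, and the URL layout is an assumption about how the tables are exposed rather than a description of the actual service.

```javascript
// Sketch only: 'example-service', 'stores' and the 'city' column are
// hypothetical; substitute the real service and table names.
function buildStoreQuery(serviceName, table, city, top) {
    var base = 'https://' + serviceName + '.azure-mobile.net/tables/' + table;
    // OData string literals escape a single quote by doubling it.
    var filter = "city eq '" + city.replace(/'/g, "''") + "'";
    return base + '?$filter=' + encodeURIComponent(filter) + '&$top=' + top;
}

var queryUrl = buildStoreQuery('example-service', 'stores', 'Amsterdam', 10);
console.log(queryUrl);
```

A client would issue a plain GET request on such a URL. Anything that modifies the tables would still require the application key, which stays private.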
Getting some metrics
An important part of publishing a service API for public consumption is knowing how much it is used. That certainly holds in a cloud-based environment, where in the end I will need to cough up the money if consumption limits are exceeded and usage goes through the roof. I don’t expect that to happen that quickly, but better safe than sorry.
My favorite ‘light-weight’ metrics and performance tool for services is New Relic. It provides a nice analytics dashboard in the cloud. Support for Azure is built in, and activation is clearly described in this article.
Now we can follow the number of requests made to the published service APIs and the duration each request takes. Quite elementary, and the ease of activation makes it a low investment to get started. We’ll look at some more detailed metrics acquired in the coming week in the next post.
My plan for next week is to increase the number of available store locations in our database by another 1000 stores. If we set the same target for the week after, we will be done with this exercise two weeks from now, after which we can start to operationalize our store finder app as a first step in achieving our Smarter Groceries experience.