One variant of the Pareto Principle says that “finishing the last 20% of the job is likely to take 80% of your time”. That certainly seems to apply to the last stores left to index, and to the complexity of getting their locations. Last week was a battle, and while adding the Plus retail brand was a breeze this week, the Attent franchise chain is posing new challenges again. As the number of stores each visitor brings in gets lower, the effort per store goes up.
The Plus visitor
Plus is one of the bigger supermarket brands in the Netherlands, which is easy to tell from its professional-looking retail website. The store locator on the website looks like this:
Luckily, there is a very simple JSON data feed behind their search page that lets us quickly load the store locations into our database. The visitor is one of the simplest we have created so far:
var request = require('request');
var storeClient = require('./StoreClient');

exports.Crawl = function () {
    var visitor = 'PLUS';
    var url = /* the feed URL is left out of this post */ '';

    request(url, function (err, resp, body) {
        if (err) {
            console.error('error: ' + err);
            return;
        }
        // Assumption: the feed body is a JSON array with one item per store.
        JSON.parse(body).forEach(function (jItem) {
            var store = new storeClient.Store();
            store.brand = 'Plus';
            store.name = jItem.StoreName.toString();
            store.address = jItem.Address.toString()
                + ' ' + jItem.AddressNumber.toString();
            store.zipcode = jItem.PostalCode.replace(/\s+/g, '');
            store.city = jItem.City.toString();
            // Strip every hyphen and all whitespace from the phone number.
            store.phone = jItem.Phone.replace(/[-\s]/g, '');
            store.latitude = jItem.y.toFixed(5).toString();
            store.longitude = jItem.x.toFixed(5).toString();
            store.identifier = store.name + ' - ' +
                store.address + ' - ' + store.city;
            // Handing the store over to the StoreClient is not shown in this listing.
        });
    });
};
As you can see, the site offers a simple JSON feed with all the information embedded in it. The big supermarket brands understand the value of easy access to store information: the easier it is shared, the more it is copied, the better the stores are found, and the more customers come in.
The Attent visitor (Work-In-Progress)
The challenge with Attent is directly visible when you take a look at their store locator:
The site only allows searching for stores within a radius of 20 km around a city or postal code. With only 100 or so stores expected, how do we safely find them all? Certainly not every position searched will give a hit. The plan:
1. Calculate the smallest rectangular bounding box around the Netherlands.
2. Within the bounding box, iterate over the latitude and longitude in such a way that circles with a radius of 20 km cover all of the Netherlands.
3. For each circle’s center latitude and longitude, get the closest matching postal code.
4. For each valid Dutch postal code returned, perform a scripted search on the site.
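The first two steps can be sketched roughly as follows. This is not necessarily how my own script does it: it is a minimal version that lays a square grid over the bounding box, with a spacing of the radius times √2 so that every grid cell fits inside its circle. The bounding box values are approximate, and the names NL_BOX and circleCenters are made up for this sketch.

```javascript
// Approximate bounding box of the European Netherlands (assumed values).
var NL_BOX = { south: 50.75, north: 53.56, west: 3.35, east: 7.23 };

// Rough length of one degree of latitude in km (approximation).
var KM_PER_DEGREE_LAT = 111.32;

// Returns the centers of circles with the given radius that together
// cover the whole bounding box.
function circleCenters(box, radiusKm) {
    // A square cell with side r * sqrt(2) has a half-diagonal of exactly r,
    // so a circle around the cell center covers the complete cell.
    var stepKm = radiusKm * Math.SQRT2;
    var latStep = stepKm / KM_PER_DEGREE_LAT;

    // In the northern hemisphere a degree of longitude is longest at the
    // southern edge, so using that edge keeps every cell small enough.
    var kmPerDegreeLon = KM_PER_DEGREE_LAT * Math.cos(box.south * Math.PI / 180);
    var lonStep = stepKm / kmPerDegreeLon;

    var centers = [];
    for (var lat = box.south + latStep / 2; lat - latStep / 2 < box.north; lat += latStep) {
        for (var lon = box.west + lonStep / 2; lon - lonStep / 2 < box.east; lon += lonStep) {
            centers.push({ latitude: lat, longitude: lon });
        }
    }
    return centers;
}

var centers = circleCenters(NL_BOX, 20);
console.log(centers.length + ' circles needed');
```

A hexagonal layout would need fewer circles for the same coverage, but the square grid is easier to reason about, and a few extra searches hardly matter at this scale.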
I did finish step 2, which gives a picture like this:
Now I need an automatic mechanism that gives me the postal codes for the circle centers … at least once. I tried to use Google Maps for that, but I seem to be overrunning my quota earlier than expected. Maybe I am sending too many requests at the same time; I still need to figure out why it only returns information for some of the queries I perform.
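To find out which queries fail and why, it helps to look at the status field of each reply instead of only at the results. Below is a small sketch of that, assuming the JSON shape of the Google Geocoding web service: a top-level status (for example OK, ZERO_RESULTS or OVER_QUERY_LIMIT) and a results array whose entries carry address_components with a types list. The function name postalCodeFromGeocode is made up for this sketch, and the HTTP call itself is left out.

```javascript
// Takes the raw JSON body of a reverse-geocoding reply and returns the
// reply status together with the first postal code found (or null).
function postalCodeFromGeocode(body) {
    var response = JSON.parse(body);

    // Anything other than OK (e.g. OVER_QUERY_LIMIT) is passed back to the
    // caller, so it can decide to wait and retry that circle center later.
    if (response.status !== 'OK') {
        return { status: response.status, postalCode: null };
    }

    var results = response.results || [];
    for (var i = 0; i < results.length; i++) {
        var components = results[i].address_components || [];
        for (var j = 0; j < components.length; j++) {
            var types = components[j].types || [];
            if (types.indexOf('postal_code') !== -1) {
                // Same normalisation as in the Plus visitor: drop whitespace.
                var code = components[j].long_name.replace(/\s+/g, '');
                return { status: 'OK', postalCode: code };
            }
        }
    }
    return { status: 'OK', postalCode: null };
}
```

Logging the returned status per circle center should show whether the missing answers are quota errors or simply positions without a nearby postal code.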
One lesson learned this week already: while NodeJS is completely asynchronous and we’re supposed to write our code to leverage that to the fullest, the websites we’re visiting do not like being stressed that much to return their information. Hence, a good timeout inside a closure is an inevitable, even valuable, asset in your toolkit when applying NodeJS to other, non-NodeJS web interfaces.
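As an illustration of such a timeout inside a closure, here is a minimal sketch that works through a list one item at a time and pauses between items. The name visitSequentially and its parameters are made up for this example; it is one possible shape, not the exact code in the visitors.

```javascript
// Calls worker(item, finished) for each item in turn. The next item is
// only started delayMs milliseconds after the worker reports it finished.
function visitSequentially(items, delayMs, worker, done) {
    var index = 0; // kept alive by the closure below

    function next() {
        if (index >= items.length) {
            if (done) {
                done();
            }
            return;
        }
        var item = items[index++];
        worker(item, function () {
            // Give the remote site some breathing room before the next call.
            setTimeout(next, delayMs);
        });
    }

    next();
}

// Usage with placeholder postal codes; a real worker would perform the
// HTTP request and call finished() from inside its callback.
visitSequentially(['1000AA', '2000BB'], 50, function (postalCode, finished) {
    console.log('searching around ' + postalCode);
    finished();
}, function () {
    console.log('all searches done');
});
```

Because only one request is in flight at a time, the total run takes longer, but the site (or a quota-limited service) never sees a burst of simultaneous calls.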
In the next post we will likely finish the Attent visitor, and hopefully one or two more, which puts us behind schedule by a week … because I forgot about the golden 80/20 rule to start with …