Facts of Life

Due to circumstances, progress has been a bit slow this week. I did improve the NodeJS implementation stack considerably, and added the Digros / Bas / Dirk visitor as promised. But there was not enough time to create the log email sender.

The Digros / Bas / Dirk visitor

In a previous post we already created a site crawler for the Digros / Bas / Dirk retail brand, but that one was written in C#. I have now moved it over to the NodeJS part of the universe, like this:

The structure of the visitors has become a little simpler because of modifications to the StoreClient. Essentially, the visitor is more and more becoming a plugin to a crawler framework, implementing only the overridable functions.
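
To make the plugin idea concrete, here is a minimal sketch of how such a framework could look. The names `BaseVisitor`, `ExampleVisitor`, `startUrl`, `parseStores` and `runVisitor` are hypothetical and are not the actual StoreClient API. The page fetch is an injected function, so the example runs without network access.

```javascript
// Hypothetical base class: the framework owns the crawl loop,
// a visitor only overrides the hooks below.
class BaseVisitor {
  // Name used in logs; override per retail brand.
  get name() { return "base"; }
  // URL of the store locator page; must be overridden.
  startUrl() { throw new Error("startUrl() not implemented"); }
  // Turns the raw page body into store records; must be overridden.
  parseStores(body) { throw new Error("parseStores() not implemented"); }
}

// A visitor for an imaginary brand (assumption, for illustration only).
class ExampleVisitor extends BaseVisitor {
  get name() { return "example-brand"; }
  startUrl() { return "https://example.com/stores"; }
  parseStores(body) {
    // Assumed page format: one "name;city" pair per line.
    return body.trim().split("\n").map((line) => {
      const [name, city] = line.split(";");
      return { name, city };
    });
  }
}

// Framework side: fetch the page, hand it to the visitor, return the result.
// `fetchPage` is injected, so it can be a real HTTP call or a stub.
async function runVisitor(visitor, fetchPage) {
  const body = await fetchPage(visitor.startUrl());
  const stores = visitor.parseStores(body);
  return { visitor: visitor.name, stores };
}
```

With a stub such as `async () => "Store A;Amsterdam\nStore B;Utrecht"` passed as `fetchPage`, `runVisitor(new ExampleVisitor(), stub)` resolves to an object holding the visitor name and two store records. Adding a new brand then means writing one more small subclass.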

Furthermore, we see the first application of the Cheerio library, which helps us parse and interpret the DOM of web pages with jQuery-like selectors. The page we're indexing here is more structured than the earlier ones, and lends itself well to Cheerio.

This exercise provides a nice opportunity to compare the C# code from the earlier post, which piloted this visitor, with the new JavaScript code. The port demonstrates that the constructs can be kept quite comparable, and that only language details need to be massaged a bit.

Azure Mobile Data and Logging

I thought it would be nice to share some of the status we have currently achieved with the store crawlers. Let's do that with some screenshots, easy and factual.

The database currently contains the location information for 555 stores. Measured against our estimated end total, this is about 15% of the data we expect to hold, which puts that estimate at roughly 3,700 stores. Just a quick glance at part of the data:


The logging for the nightly job that is crawling the sites is shown right here:


We can see which changes are made to the database for each visitor syncing its content, both in summary format and in detailed format for the changed stores. Just enough not to clutter the logs. I just want to push this info to the admins as an email, and then we're done.
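
A summary-plus-detail log of this kind could be produced along the following lines. This is a sketch under assumptions, not the actual sync code: the `diffStores` and `formatLog` helpers, the `id` key and the `{ id, name, address }` record shape are all made up for illustration.

```javascript
// Compare the stores already in the database with the freshly crawled ones.
// Both arguments are arrays of { id, name, address } records (assumed shape).
function diffStores(existing, crawled) {
  const before = new Map(existing.map((s) => [s.id, s]));
  const after = new Map(crawled.map((s) => [s.id, s]));
  const added = crawled.filter((s) => !before.has(s.id));
  const removed = existing.filter((s) => !after.has(s.id));
  const updated = crawled.filter((s) => {
    const old = before.get(s.id);
    return old !== undefined &&
      (old.name !== s.name || old.address !== s.address);
  });
  return { added, removed, updated };
}

// One summary line per visitor, then one detail line per changed store.
// Unchanged stores are not logged at all.
function formatLog(visitorName, diff) {
  const lines = [
    `${visitorName}: ${diff.added.length} added, ` +
      `${diff.updated.length} updated, ${diff.removed.length} removed`,
  ];
  for (const s of diff.added) lines.push(`  + ${s.id} ${s.name}`);
  for (const s of diff.updated) lines.push(`  ~ ${s.id} ${s.name}`);
  for (const s of diff.removed) lines.push(`  - ${s.id} ${s.name}`);
  return lines;
}
```

Because `formatLog` returns an array of lines, the same output can go to the console now and be joined into an email body later.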

Now that the scripts are starting to stabilize and become more maintainable, and the effort of adding a new visitor is decreasing, we will index some of the larger retail brands in the Netherlands to increase the volume of our database. Also planned is an investigation into how we can register our data at the programmable web site. I hope to spend some time on that real soon, and share our effort with the world!

