A continual improvement process, also often called a continuous improvement process (abbreviated as CIP or CI), is an ongoing effort to improve products, services, or processes. These efforts can seek “incremental” improvement over time or “breakthrough” improvement all at once. Delivery (customer valued) processes are constantly evaluated and improved in the light of their efficiency, effectiveness and flexibility. – en.wikipedia.org
Got you thinking there! “What the h**l does that have to do with your app?!” Well, now that we are closing in on the end of our list of stores to index, less attention is going into scripting the actual store visitors, and more focus will go into improving the quality of the data. You will learn about the why and how later in this post.
I’ll stop listing the visitor code in the blog posts, because the way the visitors work is becoming quite repetitive. Typically, I repeat the following sequence of events:
- Visit the website for the retailer, and open the page with the store locator
- In Fiddler, look at how the location data is loaded into the page
- Copy the completed visitor that most closely matches the data pattern used
- Adjust the visitor to match the specific attributes for the brand
- Commit and push the new code to the Azure Mobile platform and execute the script to add the stores using the new visitor
Below I will just list the visitors added this week, with their specifics.
The Attent visitor (completed)
The Troefmarkt visitor
Troefmarkt uses an external site that brings together a number of store brands that I suspect are all supplied by the same retailer. The advantage is that our system was already prepared for these situations by the introduction of the brand attribute on the Store entity in the information model. A single visitor to the external site can easily identify and index multiple brands, which it now does for both Troefmarkt and Dagwinkel at the same time.
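To illustrate how one visitor can serve several brands, here is a minimal Python sketch. It is not the actual visitor code: the field names (`formula`, `name`, `city`), the sample records and the helper names are all hypothetical, chosen only to show the brand attribute doing the work.

```python
from collections import defaultdict

# Hypothetical sample of what a shared store locator might return.
# The field names and values are assumptions, not the real site's format.
RAW_STORES = [
    {"name": "Troefmarkt Jansen", "formula": "Troefmarkt", "city": "Ede"},
    {"name": "Dagwinkel De Vries", "formula": "dagwinkel ", "city": "Urk"},
    {"name": "Some Other Shop", "formula": "Unknown", "city": "Goes"},
]

# Brands this visitor is allowed to index, keyed by a normalized label.
KNOWN_BRANDS = {"troefmarkt": "Troefmarkt", "dagwinkel": "Dagwinkel"}


def to_store(raw):
    """Map one raw record to a Store-like dict, or None for unknown brands."""
    brand = KNOWN_BRANDS.get(raw["formula"].strip().lower())
    if brand is None:
        return None
    return {"brand": brand, "name": raw["name"], "city": raw["city"]}


def index_by_brand(raw_stores):
    """Group the recognized stores by their brand attribute."""
    by_brand = defaultdict(list)
    for raw in raw_stores:
        store = to_store(raw)
        if store is not None:
            by_brand[store["brand"]].append(store)
    return dict(by_brand)
```

Calling `index_by_brand(RAW_STORES)` would yield one list per recognized brand and silently skip the unknown one; a real visitor would probably want to log those skipped records instead.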
This is what the lekkermakkelijk.nl site looks like:
Indexing the site itself was a nightmare similar to the one above, because it only returns the 9 stores closest to a postal code at a time. For this, we reuse the pattern of the Attent visitor created last week. Nice, but we need to add some validation logic that warns us if the most distant store in the list of 9 is less than 20 km from the center. In that case, there might be more stores in the circle than were returned. Thoughts for later …
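The validation idea could look roughly like the Python sketch below. It assumes each store record carries `lat` and `lon` keys and that the center is a (latitude, longitude) pair; those names, and the use of the haversine formula for distance, are my assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0
PAGE_SIZE = 9             # the site returns at most 9 stores per query
SEARCH_RADIUS_KM = 20.0   # radius each query is expected to cover


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def may_be_truncated(center, stores,
                     page_size=PAGE_SIZE, radius_km=SEARCH_RADIUS_KM):
    """True if the result list is full and every store lies inside the radius.

    In that case the circle may contain more stores than were returned.
    """
    if len(stores) < page_size:
        return False  # the list was not cut off by the page size
    farthest = max(
        haversine_km(center[0], center[1], s["lat"], s["lon"])
        for s in stores
    )
    return farthest < radius_km
```

When `may_be_truncated` returns true, the visitor could log a warning and query again from a few nearby postal codes.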
The DekaMarkt visitor
The DekaMarkt visitor is almost a one-to-one copy of the Hoogvliet visitor described in this post. Quite easy to index once you get the JSON structure from the body of the webpage:
The Coop visitor
Coop has three store formats that are easily identifiable in the JSON stream: Coop (regular), Super Coop (big) and Coop Compact (small). The brand attribute is used to distinguish between these variants, because we can expect price differences between them.
The Deen visitor
Deen is a relatively small brand, with a simple JSON stream that delivers all store locations at once. Pattern-wise it is compatible with the C1000 visitor described in this post. Nothing more to say, here is the site:
The Boni visitor
Last this week: the Boni store locator is a twin sister of the Jumbo one in terms of data pattern. The visitor was therefore made within 10 minutes, without much thinking. As you can see from the long list of visitors this week, bringing the total to 3619 stores indexed, it is becoming a quick, repetitive job.
This is what the store locator on their site looks like:
As I go through the effort of creating the visitors, I encounter different levels of quality in the way the retailers maintain and expose their store location data. As one of our targets is to provide a high quality data stream, there are a number of points that I want to start working on after next week, when I expect the majority of store indexing to be completed:
- Fill in the blanks
A number of sites do not expose all their data in the stream. Some sites are missing postal codes or phone numbers. Some are missing the coordinates. We will use appropriate additional data sources to fill in these gaps.
- Duplicate check
Some of the retailers have duplicate entries for the same store. The name of the store is sometimes subtly different, but the address is exactly the same. We will search our database for such instances and prompt for resolution.
- Check position
Some latitude and longitude data retrieved from the websites looks incorrect. Positions may be rough estimates, or just plain wrong. We will use reverse geocoding to identify suspect instances and correct them.
- Address validity
The addresses can be validated using other sources than the retailer. We will check if the address, zip code and city are found in national databases and prompt for further investigation if not found.
- Store completeness
As indicated in this post for the Troefmarkt visitor, some of our mechanisms might miss a store or two in their indexing effort. Appropriate warnings need to be raised when certain boundary conditions are met that expose a risk of missing essential data.
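As an illustration of one of these checks, the duplicate check could be sketched in Python as below. The record fields (`brand`, `name`, `address`, `postal_code`) are hypothetical, and normalizing case and whitespace before comparing addresses is my own assumption; a stricter or looser match may turn out to fit the real data better.

```python
import re
from collections import defaultdict


def address_key(store):
    """Build a comparison key from brand plus normalized address fields."""
    street = re.sub(r"\s+", " ", store["address"].strip().lower())
    postal = re.sub(r"\s+", "", store["postal_code"]).upper()
    return (store["brand"], street, postal)


def find_duplicates(stores):
    """Return groups of stores that share the same brand and address."""
    groups = defaultdict(list)
    for store in stores:
        groups[address_key(store)].append(store)
    # Only groups with more than one entry are duplicate candidates.
    return [group for group in groups.values() if len(group) > 1]
```

Each returned group is only a candidate: the sketch flags entries for manual resolution rather than merging them automatically.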
It is my intent to make retailers aware of mistakes in their data, so they can correct them at the source and everybody can benefit. Furthermore, in the near future, we will investigate contributing to the OpenStreetMap initiative, which I feel is a visual counterpart of the Programmable Web that we already use to give away our data freely.
So much to do, so little time. See you next post!