Thursday 22 December 2016

Open data is the new solar energy

Data is the new oil. This phrase has been with us for several years now. Just googling the term gives you about 197 million hits. Data is fueling our digital economy, making it a valuable asset for anyone able to leverage its potential. Right now, organizations everywhere are improving their data infrastructure, trying to gain an advantage over their competitors.

But viewing data as the new oil has a downside. These initiatives are mainly aimed inwards, keeping data within the confines of the company. Like oil, data mostly benefits the large players that own it. It requires vast investments in acquiring, transporting, processing and refining data to truly benefit an organization. An organization that isn't able to produce its own data might buy it from a data vendor, but that makes it dependent on that vendor.

Open data, on the other hand, provides a level playing field and can advance society as a whole. Open data, like sunlight, is democratic and can be captured by anyone, companies and citizens alike. It can serve both big and small data needs. You don't need big investments in software to profit from available open data sources. Open data spurs new initiatives that can become self-sustaining.

If data is the new oil, then open data is the new solar energy. And we need to move to sustainable data sources quickly.

Source: United Nations Photo,  https://www.flickr.com/photos/un_photo/5105193914/

Tuesday 20 December 2016

Spatial business keys

In the previous examples we've considered the location of a business object as just another one of its attributes. Most geographical data sources are indeed fitted with a proper business key. Business datasets like cadastral parcel registrations, addresses and assets such as watercourses and culverts contain a unique attribute that is used in natural business language.

Sometimes a geographical dataset doesn't seem to contain a clear natural key. A technical key is surely present in the dataset (because most GIS-oriented software requires one), but it is mostly meaningless for the business. However, in business discussions, these objects are referenced by their location. People are literally pointing at a map to identify the object.

This makes me wonder: can location itself be the natural key of an object? And therefore a candidate business key for a hub entity?

To answer this question, we need to take a look at the definition of the natural business key. There are several different definitions for the business key, but they all share the same characteristics. Business keys:
  • are unique for the object (at least within the dataset, preferably enterprise-wide, ideally worldwide)
  • exist in the real world
  • have meaning for the business users
With these criteria in mind, let's take a look at some examples.

Example #1: local zoning ordinance plans


Dutch municipalities regulate building activities by issuing zoning ordinance plans. All zoning ordinance plans are published in a nationwide interactive webservice on the Dutch national open webservices portal www.pdok.nl. The webservice used is https://www.pdok.nl/nl/service/wms-ruimtelijke-plannen.

Each zone in a plan defines which activities are allowed within the zone, like opening a shop or building a new home. A zoning ordinance plan consists of a map and a set of zoning regulations. Each polygon on the map refers to any number of these regulations. An extract of the map may look like this.

Example of local zoning ordinance zones

In the picture we can see various zones with the same legend color. Two of the yellow zones with designation 'urban housing' (in Dutch: 'wonen') have been labeled for the example. If we look at the attributes for these two zones, we see the following:

Results for FeatureType 'http://plu.geonovum.nl:Enkelbestemming':
--------------------------------------------
bestemmingshoofdgroep = wonen
historisch = false
hoofdfuncties = null
identificatie = NL.IMRO.0846.EP30158152437-00
naam = Wonen
ondergeschiktefuncties = null
plangebied = NL.IMRO.0846.BP2008BOS01Boskant-oh02
planstatus = onherroepelijk
typeplan = bestemmingsplan
versieimro = IMRO2008
verwijzingnaarobjectgerichteteksturl = null
verwijzingnaarteksturl = http://ruimtelijkeplannen.nl/documents/NL.IMRO.0846.BP2008BOS01Boskant-oh02/r_NL.IMRO.0846.BP2008BOS01Boskant-oh02_2.13.html
dossierid = NL.IMRO.0846.BP2008BOS01Boskant
dossierstatus = geheel onherroepelijk in werking
datum = 2012-03-14
verwijzingnaarexternplan = null
geometrie = [GEOMETRY (Polygon) with 8 points]
--------------------------------------------
 
Results for FeatureType 'http://plu.geonovum.nl:Enkelbestemming':
--------------------------------------------
bestemmingshoofdgroep = wonen
historisch = false
hoofdfuncties = null
identificatie = NL.IMRO.0846.EP34158152442-00
naam = Wonen
ondergeschiktefuncties = null
plangebied = NL.IMRO.0846.BP2008BOS01Boskant-oh02
planstatus = onherroepelijk
typeplan = bestemmingsplan
versieimro = IMRO2008
verwijzingnaarobjectgerichteteksturl = null
verwijzingnaarteksturl = http://ruimtelijkeplannen.nl/documents/NL.IMRO.0846.BP2008BOS01Boskant-oh02/r_NL.IMRO.0846.BP2008BOS01Boskant-oh02_2.13.html
dossierid = NL.IMRO.0846.BP2008BOS01Boskant
dossierstatus = geheel onherroepelijk in werking
datum = 2012-03-14
verwijzingnaarexternplan = null
geometrie = [GEOMETRY (Polygon) with 10 points]
--------------------------------------------


The attributes of these two yellow polygons are identical, except for two columns: the 'identificatie' column and the 'geometrie' column, which contains the geometry. The first attribute, 'identificatie', contains what looks like a business key but in reality is a composite technical key with business aspects. The first part of the key contains a reference to country ('NL'), domain ('IMRO') and plan ('0846'), while the last part of the key ('EP30158152437-00') is essentially a sequence.
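To make the composite structure visible, here is a minimal Python sketch that splits the two 'identificatie' values from the results above on their dots. The names `ImroId` and `split_identificatie` are made up for this illustration, and the field labels simply follow the reading given in the text; this is not an official parser for the format.

```python
from typing import NamedTuple


class ImroId(NamedTuple):
    """Parts of an 'identificatie' value, labelled as in the text (assumption)."""
    country: str   # e.g. 'NL'
    domain: str    # e.g. 'IMRO'
    plan: str      # e.g. '0846'
    sequence: str  # e.g. 'EP30158152437-00'


def split_identificatie(value: str) -> ImroId:
    """Split an 'identificatie' value into its four dot-separated parts."""
    parts = value.split(".", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected identificatie format: {value!r}")
    return ImroId(*parts)


zone_a = split_identificatie("NL.IMRO.0846.EP30158152437-00")
zone_b = split_identificatie("NL.IMRO.0846.EP34158152442-00")

# The prefix is shared; only the trailing sequence tells the zones apart.
same_prefix = zone_a[:3] == zone_b[:3]
same_sequence = zone_a.sequence == zone_b.sequence
print(zone_a)
print(same_prefix, same_sequence)
```

For these two zones the prefix is identical and only the sequence part differs, which is why the key carries little business meaning on its own.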

What makes each ordinance zone truly unique is its location, captured in the second attribute 'geometrie'. Without the geometry of the zones, the zones don't even exist and the 'identificatie' attribute is useless. This might make the geometry of the zones the true business key of this dataset. Let's check how it holds up to the three characteristics of a business key:
  • The geometry is unique for each zone. There is no overlap between zones.
  • Each zone references real public space. You can live and work in a zone. If you want to change the designated purpose of a building within the zone, you need to file a zoning request.
  • The map is the main entry point for using the business object. Each zoning request is assessed by comparing the location of the request to the zone it is in. This can only be done with the geometry of the zone.
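The comparison of a request location to a zone boils down to a point-in-polygon test. The sketch below uses the classic ray-casting technique in plain Python. The zone names and coordinates are invented for illustration, and a real GIS would use its own spatial functions rather than this hand-rolled version.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_zone(point: Point, zones: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first zone containing the point, or None."""
    for name, ring in zones.items():
        if point_in_polygon(point, ring):
            return name
    return None


# Hypothetical, non-overlapping zones (made-up coordinates).
zones = {
    "zone-A": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "zone-B": [(20.0, 0.0), (30.0, 0.0), (30.0, 10.0), (20.0, 10.0)],
}

print(find_zone((5.0, 5.0), zones))   # request inside zone-A
print(find_zone((15.0, 5.0), zones))  # request outside every zone
```

Because the zones don't overlap, a location resolves to at most one zone, which is the behaviour you would expect from a key.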

 

Example #2: Watercourse signs 


Along the watercourses in our area, bridges are being fitted with name signs. Each sign contains the name of the watercourse that is crossed. Below is a picture of a sign on a bridge crossing the river Dommel.

Name sign for river Dommel


Of course, a single watercourse can be crossed many times by different bridges. For example, there are currently 53 locations where one or more signs with the description 'Dommel' are fitted. So the signs themselves are not unique. In fact, they're manufactured in bulk.

All signs are managed in a table in our GIS database, which looks like this. Most columns in the table hold purely descriptive data, which is not very suitable as a business key. Again there are two candidate business keys present: the fields 'PNT' and 'GEOMETRY'.

Table WATERCOURSE_NAMESIGN

The first candidate is the field named 'PNT'. It is clearly a unique key but it lacks business meaning. The fact that its values are equal to the field 'OBJECTID' (which is a sequence populated by the GIS-software) indicates that it is a technical key at best.

Watercourse signs on the map


The second candidate 'GEOMETRY' is, as the name suggests, a spatial column holding the point location for each sign location. Again, let's check if it conforms to the definition elements of a business key:
  • Each sign location is unique, as no two bridges share the same location.
  • The location undoubtedly exists in the real world, as you can physically touch the objects at their location.
  • And it is definitely used by the business users to refer to the signs and to navigate to them.
Positioning each sign's location on the map gives it its specific meaning. When the signs at a location are unscrewed from the bridge railing and put away in a storage locker, the record ceases to exist.
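The uniqueness criterion can be checked mechanically. Below is a minimal Python sketch that looks for duplicate values in a column of a sign table. The rows are made up for illustration (only the column names 'PNT' and 'GEOMETRY' come from the table above; 'NAME' and all values are assumptions), and the point geometry is simplified to an (x, y) tuple.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def duplicated_values(rows: List[Row], column: str) -> List[Any]:
    """Return the values that occur more than once in the given column."""
    counts = Counter(row[column] for row in rows)
    return [value for value, count in counts.items() if count > 1]


def is_candidate_key(rows: List[Row], column: str) -> bool:
    """A column can only serve as a key if it has no duplicate values."""
    return not duplicated_values(rows, column)


# Hypothetical rows: 'NAME' and all values are invented for this sketch.
signs = [
    {"PNT": 1, "NAME": "Dommel", "GEOMETRY": (160250.0, 398120.0)},
    {"PNT": 2, "NAME": "Dommel", "GEOMETRY": (160980.5, 399410.0)},
    {"PNT": 3, "NAME": "Example stream", "GEOMETRY": (148300.0, 395775.5)},
]

for column in ("NAME", "PNT", "GEOMETRY"):
    print(column, is_candidate_key(signs, column))
```

In this toy data the name repeats while the location does not. Comparing raw floating-point coordinates is fragile, so in practice you would probably round or otherwise normalize the geometry before treating it as a key; that choice is left open here.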

In conclusion


So, can a spatial column be the business key for a table? I think it can, and in the case of geospatial data sources, I expect this to prove very common. Of course, this will have implications for the way information with a spatial business key is handled in methods like Data Vault. That is a topic for a future post.