Monday, 18 July 2016

"There is no geodata"

Last week I had a conversation with some colleagues about a project. While we were talking about the inventory of all attributes for a particular object across all systems, one colleague said: "Our main focus is limited to the geodata."

At this point, my reply was: "But there is no 'geodata'!"

Everyone was shocked. "Then what have we been doing for the past year?", they replied. I explained that doing a data inventory is always a good thing, but if you limit yourself to only part of the attributes, your inventory will be incomplete. The discussion ended quickly after this, but the subject kept nagging at me. So, time for a post.


"80 of all data has a spatial component." Every geographer has heard this quote somewhere in his career. It has been cited numerous times. The big GIS companies use it in their commercial outings to sell geospatial products. The validity of the statistic has been questioned repeatedly (e.g. here and here). It might or might not be true, but that's not the point. The quote obscures the simple fact that, even in a dataset that features geographic data, most of the data is non-spatial.

Instead, location is just one attribute of an object.

Just think about it. Every geospatial data table consists of columns, ranging from a few to several hundred. Typically, only one column in the table contains the location of the objects. All other columns contain descriptive data about the objects. They're filled with numbers, strings, hyperlinks and binaries. Data that is clearly non-spatial.
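To make this concrete, here is a minimal sketch in plain Python. The table, its column names and its values are all invented for illustration (a hypothetical tree inventory, with the location stored as a WKT string), so treat it as a picture of the idea rather than a real dataset.

```python
# Hypothetical "geodata" table: a small tree inventory.
# All column names and values are made up for illustration.
trees = [
    {
        "id": 1,
        "species": "Quercus robur",
        "height_m": 18.5,
        "planted": "1987-03-14",
        "photo_url": "https://example.org/trees/1.jpg",
        "geometry": "POINT (5.1214 52.0907)",  # the only spatial attribute
    },
    {
        "id": 2,
        "species": "Tilia cordata",
        "height_m": 12.0,
        "planted": "2001-11-02",
        "photo_url": "https://example.org/trees/2.jpg",
        "geometry": "POINT (5.1230 52.0911)",
    },
]

# Assumption: the location lives in a single column named "geometry".
GEOMETRY_COLUMN = "geometry"

columns = list(trees[0].keys())
spatial = [c for c in columns if c == GEOMETRY_COLUMN]
non_spatial = [c for c in columns if c != GEOMETRY_COLUMN]

print(f"spatial columns:     {spatial}")
print(f"non-spatial columns: {non_spatial}")

# A perfectly ordinary, non-spatial question about the same "geodata":
tall_tree_ids = [t["id"] for t in trees if t["height_m"] > 15]
print(f"trees taller than 15 m: {tall_tree_ids}")
```

One of the six columns is spatial. The other five could sit in any ordinary business database, and the last query never touches the location at all.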

In GIS, data is used and presented in the form of maps. Most datasets with that spatial component are locked up in specific geospatial databases, accessible only through GIS software operated by trained geospatial specialists. GIS has staked out its own domain in the IT landscape. It's only loosely coupled with mainstream IT. More often than not, there is a lack of mutual understanding between GIS and IT people. I suspect this has something to do with the fact that GIS emerged from the domain of environmental land use management, rather than from the IT domain.

Because of this, mainstream IT concepts, methodologies and technologies tend to be underused in the GIS domain. Which is a pity, because these can provide so much more value to spatial data. On the other hand, in mainstream data solutions the spatial potential of data is underused as well. The main reason for me to start this blog was to try and connect spatial data to BI concepts and methodologies.

I have always felt, and still feel, very strongly that GIS should reconnect with mainstream IT. We can start by no longer isolating the datasets that we call 'geodata' and by acknowledging that spatial data is part of something bigger. Our spatial data must be integrated into our mainstream IT solutions, applications and databases. After all, spatial data is data too. Who's with me?