woensdag 9 december 2015

The types of geospatial information in Business Intelligence

Location plays an inportant role in many aspects of our daily lives. In Business Intelligence, every so often you will encounter geospatial data. From a BI perspective, this geospatial data is different from traditional data stacked in source systems. It will even look different, or even unreadable when you query it. But there's a growing awareness that location is valuable and useful for business purposes. So as a BI specialist you can deliver added business value by using this data in reporting, analytics and presentation.

What is geospatial data anyway? Geospatial data is used to locate an item or event on, below or over the earth's surface. Whether it's the location of a store, the perimeter of a building or the route of a delivery truck, if you can pinpoint it anywhere on earth it is geospatial data. It doesn't even have to represent a real-world object. County borders, property boundaries and environmental zones are clearly geospatial objects.Though you can't literally point to them and say “there it is” you can print them on a map.

The geospatial location of an object is often stored as one of three geometric representations: a point, line or polygon. In essence, all geography consists of coordinates. A point is simply a coordinate pair. Lines are made of 2 or more coordinate pairs linked in a specific order, while polygons are made of 1 or more connecting lines. There are more elaborate type objects like arcs, multigeometries, raster grids and TINs (and there's also 3D).

But let's keep it simple for now. In BI, geospatial data can be encountered in roughly three different ways.

Geocoding

Any object that contains a reference to an address, town, zipcode, county, state or country can be located on the map using that reference. There are several geocoding services out there that are happy to provide you with the location of objects. You simply enter your reference into the geocoding service and it will return the location as point, line or polygon. You don't need to maintain geography. Of course your reference needs to be complete to return a valid result. For example, if the name of a state is misspelled, the geocoding service might return no or an incorrect result.

Below is an example of a geocoding request to the Google Maps Geocoding API with the output it generates. You can see the location coordinates returned by the service, as well as some additional geographical info.

https://maps.googleapis.com/maps/api/geocode/json?address=bosscheweg+56+boxtel+the+netherlands

{
   "results" : [
      {
         "address_components" : [
            {
               "long_name" : "56",
               "short_name" : "56",
               "types" : [ "street_number" ]
            },
            {
               "long_name" : "Bosscheweg",
               "short_name" : "Bosscheweg",
               "types" : [ "route" ]
            },
            {
               "long_name" : "Boxtel",
               "short_name" : "Boxtel",
               "types" : [ "locality", "political" ]
            },
            {
               "long_name" : "Boxtel",
               "short_name" : "Boxtel",
               "types" : [ "administrative_area_level_2", "political" ]
            },
            {
               "long_name" : "Noord-Brabant",
               "short_name" : "NB",
               "types" : [ "administrative_area_level_1", "political" ]
            },
            {
               "long_name" : "Netherlands",
               "short_name" : "NL",
               "types" : [ "country", "political" ]
            },
            {
               "long_name" : "5283 WB",
               "short_name" : "5283 WB",
               "types" : [ "postal_code" ]
            }
         ],
         "formatted_address" : "Bosscheweg 56, 5283 WB Boxtel, Netherlands",
         "geometry" : {
            "location" : {
               "lat" : 51.60054179999999,
               "lng" : 5.323087699999999
            },
            "location_type" : "ROOFTOP",
            "viewport" : {
               "northeast" : {
                  "lat" : 51.60189078029149,
                  "lng" : 5.324436680291502
               },
               "southwest" : {
                  "lat" : 51.59919281970849,
                  "lng" : 5.321738719708497
               }
            }
         },
         "place_id" : "ChIJWxp4d1noxkcRm32m9pvecrs",
         "types" : [ "street_address" ]
      }
   ],
   "status" : "OK"
}
Geocoding services also work the other way around. You can give them a location (e.g. a coordinate pair) and request for the return of an address, town, county, state or country. This is called reverse geocoding. Professional geocoding services provide additional functionality like returning the distance or shortest route between two objects. Both Google (Google Maps Geocoding API ) and Microsoft (Geocode Dataflow API) provide a geocode service, but there are other free and/or paid services. Free geocoding services can have a limitation in the number of feature requests you can place with them.

In addition, geocoding services are integrated in off-the-shelf BI solutions. An object can be presented on the map using on-the-fly geocoding. You simply throw your location reference onto the map component and it automatically adds location through a geocoding service.

 

Coordinate pairs

Another common form of geospatial data is the usage of coordinate pairs to store the location of an object. In a table, this is noticeable by the presence of two columns which each carry one part of the coordinate pair. One column contains an X-value or latitude, the other contains an Y-value or longitude. A big advantage of coordinate pairs is that both columns are number type columns that can easily be read or edited by anyone. The downside is that coordinate pairs can only be used for point locations. But for many uses, like store locations, addresses or incident reports, this is more than enough.

Coordinate pairs are also the lifeblood of big data. Every post, tweet, photo or video made with your smartphone can (and probably is) geographically tagged by adding the latitude and longitude recorded by your phones GPS unit. So as a business intelligence specialist you will be running into this data once in a while.

Using coordinate pairs is fairly straightforward. Most BI tools for reporting and dashboarding will support some kind of mapping component that accept two columns for location reference. Even spreadsheet software has plugins enabling the use of maps with coordinate pairs. The usage of coordinate pairs is inherently limited to plotting placemarkers on the map. These placemarkers can carry additional information through use of different sizes, colors and fonts. 
In the above image, you see a map as part of a dashboard environment. The dots on the map are positioned by two fields (latitude and longitude), as seen on the right of the image. 

Spatial attributes

We're entering the real spatial domain now. Some data sources contain spatial information stored in table attribute columns. These columns will contain actual points, lines and polygons. From the outside, the columns will probably look like to contain some unreadable BLOB type information. But in reality, these columns contain what we call spatial attributes. Below are examples of a geometry column in Oracle, MS SQL Server and PostgreSQL.

These spatial attributes can be used for complex geospatial analysis through querying or with use of specialized geospatial tooling. Nowadays, all major RDBMS vendors support spatial information through a special datatype, which allows you to define attribute columns specifically for storage of spatial information. Besides the datatype, these products will also offer functions and procedures for definining spatial relationships and spatial indexes.

When using spatial attributes, some basic knowledge of geospatial techniques is required. Spatial attributes will contain more relevant information than the point, line or polygon alone, like a defined coordinate system, map projection, rectangular bouding box, precision settings and some other stuff. Storage can be an issue too as spatial data can grow quite large.

All this makes handling of true spatial attributes more of a specialty, but it also gives you the most possibilities in terms of analysis and defining relationships between datasets. I will elaborate on the possibilities in future posts.

In conclusion

These three types of geospatial data all have their pros and cons. But they don't mutually exclude one another. In fact, they complement and even merge with one another on a regular basis. If you obtain the coordinate pair of an object with a request to a geocoding service, it is fairly simple to subsequently store the coordinate pair in your system. And coordinate pairs can be translated into spatial point attributes with the same amount of ease. Coordinate pairs can also be derived from spatial attributes. Furthermore, mapping components can also use spatial attributes for visualisation purposes. So whatever geospatial data you encounter, you always have options.

 If you have any comments or suggestions, feel free to do so below!