Geospatial Data Analysis

Geographical data are data with a location attribute, which can be an absolute location (coordinates) or a relative position (distance). Because observations that are close together tend to be more alike than observations that are far apart, the location attribute of each observation gives rise to spatial dependence. If we ignore this dependence, which is typically positive, we underestimate the standard errors of the parameter estimates and thus overestimate the significance of the covariates.
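
The effect on standard errors can be illustrated with the simplest estimator, the sample mean. The sketch below is only an illustration under stated assumptions: the 5 x 5 grid of sites, the exponential correlation function exp(-distance / range) and the range value of 2 are all hypothetical choices, not part of these notes. It compares the standard error obtained under independence with the one obtained when positive spatial correlation is taken into account.

```python
import math

def se_of_mean(coords, sigma=1.0, corr_range=None):
    """Standard error of the sample mean of observations taken at `coords`.

    Assumption (illustrative only): the correlation between two sites decays
    as exp(-distance / corr_range). corr_range=None means independence.
    """
    n = len(coords)
    total = 0.0  # sum of all entries of the covariance matrix
    for i in range(n):
        for j in range(n):
            if i == j:
                rho = 1.0
            elif corr_range is None:
                rho = 0.0
            else:
                rho = math.exp(-math.dist(coords[i], coords[j]) / corr_range)
            total += sigma ** 2 * rho
    # Var(mean) = (sum of all covariances) / n^2
    return math.sqrt(total) / n

# Hypothetical sites on a 5 x 5 unit grid.
coords = [(x, y) for x in range(5) for y in range(5)]

naive_se = se_of_mean(coords)                     # ignores spatial dependence
spatial_se = se_of_mean(coords, corr_range=2.0)   # accounts for it

print(f"naive SE:   {naive_se:.3f}")
print(f"spatial SE: {spatial_se:.3f}")
```

With positive correlation every off-diagonal covariance is positive, so the spatial standard error is necessarily larger than the naive one; a test based on the naive value would therefore look more significant than it should.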


Types of spatial data


A. Areal data, which arise when the study region is divided into a set of areal units connected by borders, e.g. countries, districts, census zones.

The aims of analysis are to:

1. Identify spatial patterns (e.g. disease mapping).

2. Assess association between an outcome (e.g. mortality) and factors that may vary gradually over geographical regions (ecological studies).


B. Geostatistical data, which arise when the data are collected at a fixed set of locations within a continuous study region.

The aims of analysis are to:

1. Assess geographical variation in the data.

2. Identify covariates significantly associated with the outcome in the presence of spatial variation.

3. Predict the outcome at new locations.

C. Point pattern data, which arise when the locations of particular events are not fixed but random quantities, e.g. locations of disease cases or of trees of a given species.

The aims of analysis are to assess:

1. Whether there is any pattern in the locations themselves.

2. Whether events occur sporadically or in clusters.

3. Which risk factors such clusters are associated with.
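
One simple way to make the question of clustering concrete is a quadrat count: divide the study region into equal cells, count the events in each cell, and compare the variance of the counts with their mean. Under complete spatial randomness the counts are approximately Poisson, so the ratio is close to 1; values well above 1 suggest clustering and values well below 1 suggest regularity. The sketch below is only an illustration: the technique is not prescribed by these notes, and the 4 x 4 grid and both point sets are hypothetical.

```python
from collections import Counter

def dispersion_index(points, cell=1.0, nx=4, ny=4):
    """Variance-to-mean ratio of event counts over an nx-by-ny grid of quadrats.

    Points are (x, y) pairs assumed to lie in [0, nx*cell) x [0, ny*cell).
    """
    counts = Counter((int(x // cell), int(y // cell)) for x, y in points)
    per_cell = [counts.get((i, j), 0) for i in range(nx) for j in range(ny)]
    mean = sum(per_cell) / len(per_cell)
    var = sum((k - mean) ** 2 for k in per_cell) / (len(per_cell) - 1)
    return var / mean

# Hypothetical patterns with 16 events each on a 4 x 4 region.
regular = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]  # one event per quadrat
clustered = [(0.1 + 0.05 * k, 0.2) for k in range(16)]              # all events in one quadrat

print("regular:  ", dispersion_index(regular))
print("clustered:", dispersion_index(clustered))
```

The result depends on the chosen quadrat size, so in practice the index is best read as a first descriptive check rather than a formal test.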


Analysis of areal data


Geographical data are correlated in space. Spatial proximity can be defined in terms of:

  • Neighbours (more suitable for areal data). For each region we specify the set of neighbouring regions.

  • Distance (more suitable for geostatistical data, but also applicable to areal data). For each location (for areal data, the centroid of the region) we define its distance from all the other locations.
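
Both definitions of proximity can be sketched for a toy example. The code below assumes a hypothetical 2 x 3 grid of square regions of unit size, treats two regions as neighbours when they share an edge (one possible convention among several), and uses Euclidean distance between centroids. All names are illustrative.

```python
import math

# Hypothetical study region: a 2 x 3 grid of unit squares, identified by (row, col).
regions = [(r, c) for r in range(2) for c in range(3)]

def edge_neighbours(regions):
    """Map each region to the set of regions that share an edge with it."""
    cells = set(regions)
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (r, c): {(r + dr, c + dc) for dr, dc in steps if (r + dr, c + dc) in cells}
        for r, c in regions
    }

def distance_matrix(locations):
    """Euclidean distance between every pair of locations."""
    return [[math.dist(a, b) for b in locations] for a in locations]

neighbours = edge_neighbours(regions)

# Centroid of the unit square in row r, column c.
centroids = [(c + 0.5, r + 0.5) for r, c in regions]
D = distance_matrix(centroids)

print(neighbours[(0, 1)])   # neighbours of the middle region in the top row
print(D[0][1], D[0][4])     # distances from region (0, 0) to (0, 1) and to (1, 1)
```

The neighbour sets give a binary notion of proximity (adjacent or not), while the distance matrix gives a continuous one, which is why the latter carries over naturally to geostatistical data observed at arbitrary locations.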