Mapping in R

There are a lot of advantages to learning how to map in R. Most of our work already occurs in this platform and for simple applications it is most efficient to keep everything in one place. This tutorial goes over a fairly straight forward example that might be of interest to those studying international politics, but can be applied to many other areas of study.

For Stanford affiliates, you can also download the commercial grade mapping software ArcGIS for free, here. ArcGIS only works on a PC.

I also recommend QGIS, which is open source, cross-platform, and tends to crash a little less in my experience. It is available here.

Country Maps

Here’s a simple example of mapping the world, coloring countries by some value, and drawing lines between them.

First we need to get some shapefiles, which are maps that have already been turned into manageable polygons. For general applications, Natural Earth probably has what you’re looking for.

If your project is simple, you can also get maps of North American and the World from the maps library in R. My example will use the shapefile because learning how to set the project is pretty important, but here’s the code for the maps included in R:

map('world', fill = TRUE, col = "grey")

For this example, I’m looking for country-level data, which is called ADM0 (provinces and districts are ADM1 and ADM2 respectively). I don’t particularly care whether the country borders are accurate down to the meter so I’m selecting the coarsest level of granularity, which is data point on border locations every 110 meters, this is give the best visual experience. The most fine-grained border data available at Natural Earth is every 10 meters, which is useful when your analysis is at the district level. You can find the data that I’m loading here.

Keep everything inside the zip file together, they rely upon each other, but we are going to load the .shp file into R. You’ll also notice that I’m setting the projection of the shapefile. There are a lot of different ways to display maps but the WGS (World Geodetic System) 1984 is usually the default.

Here are the steps that I’m going going through the following code:

  1. I’m plotting the map using the plot command just like a scatter plot. When data is projected as WGS 1984, the x and y limits are latitude and longitude.

  2. Then I’m adding points on top, in this case just a random location but these could be the location of country capitals, or just the central point of the county (there are commands to calculate that for you).

  3. Then I’m drawing some random lines between things, which might be useful for the level of trade between different countries.

  4. Finally, I’m coloring countries by making the color command a vector of color names. It will color the pieces in the same order as the countries are listed in the vector.

library(maptools)   #mapping: plotting
library(rgeos)      #mapping: simplifying geometry
library(rgdal)      #mapping: projecting
library(spdep)      #Moran's I, Geary's C, poly2nb
library(rnaturalearth)  #natural earth API


#loading natural earth shape file
unproj <- CRS("+proj=longlat +datum=WGS84") #default WGS84 projection

countries <- ne_countries(scale=110)

par(bg = "grey95")
plot(countries, col="grey60", xlim=c(-80,80), ylim=c(-50,85)) 
points(x=0, y=0, col="red") #add a point
segments(x0=0, y0=0, x1=10, y1=10, col="red") #draw some lines
col <- c(rep("blue", 20), rep("green", 27), rep("grey60", 130)) #color some countries
plot(countries, col=col, add=TRUE)  #countries from this data

You can also plot your map using ggPlot and make it iteractive by using plotly. Here’s a simple example.


p <- ggplot() +  geom_polygon(data=countries, aes(x=long, y=lat, group=group),  color="white", lwd = .25)
#%>% config(displaylogo = FALSE, displayModeBar = TRUE)

Within Country Analysis

Here are some steps for carrying out analysis with point data, which is presumably data within a country. One important difference here is that I’m going to project the data to UTM (Universal Transverse Mercator) coordinates for Afghanistan, which allows us to measure distance in meters instead of degrees. You’ll have to Google to figure out what UMT zone is usually used for your country.

Here I’m going to make up some data located within Afghanistan, project it to the right coordinate system, and then add a 40km buffer around these data points.

unproj <- CRS("+proj=longlat +datum=WGS84") #default WGS84 projection
proj <- CRS("+init=epsg:32642")  #projected to UTM 42N (Afghanistan)

#administrative data
districts <- readShapeSpatial("/Users/vincentbauer/Desktop/Data/Afghanistan/Boundaries/Districts/district398.shp", proj4string=unproj)
## Warning: use rgdal::readOGR or sf::st_read

## Warning: use rgdal::readOGR or sf::st_read
districts <- spTransform(districts, proj) #reprojecting shapefile to WGS84 UTM 42N

#events data
n <- 100
lat <- runif(n, 32, 36)
long <- runif(n, 62, 70)
x <- rpois(n, 10)

dat <- data.frame("Latitude"=lat, "Longitude"=long)
##   Latitude Longitude
## 1 32.29619  63.17151
## 2 32.44747  68.45875
## 3 34.49578  67.61976
## 4 34.68433  68.39581
## 5 33.46358  66.24641
## 6 32.73273  65.68124
coordinates(dat) <- c(x="Longitude", y="Latitude") #converts into a shapefile
proj4string(dat) <- unproj  #assign a projection
dat <- spTransform(dat, proj) #reprojecting shapefile to WGS84 UTM 42N (Afghanistan)

#create a buffer
dat.buffer <- gBuffer(dat, width=40000, byid=TRUE)

plot(dat.buffer, col="blue", add=TRUE) 
plot(dat, col="red", pch=19, add=TRUE) 

For spatial statistics, you need to have a measurement of the distance between your points. This code computes a standard measure of distance, the inverse distance between points, which can then be used to calculate spatially weighted averages. I also compute the row standardized district matrix, which creates weights for a given observation that sum to 1.

#compute inverse distance

dist.mat <-as.matrix(dist(coordinates(dat)))
dist.mat <- 1/dist.mat  #inverting the distance, higher numbers mean closer
diag(dist.mat) <- 0  #setting the diagonals to zero (instead of infinity)

row.tot <- apply(dist.mat, 1, sum)  #standardizing by row, getting row total
dist.mat.stan<- dist.mat/row.tot #standardizing by row, dividing by row total

Here’s a quick look at the first 5 rows and columns of the spatial weights matrix.

dist.mat[1:5, 1:5] %>% round(4)
##   1 2 3 4 5
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0

Here I use the inverse-distance weights to compute the spatially weighted average of the (random) x-values I created before. Values of x that are nearest to a given point are given the highest weight, while points further away are given lower weights, and these weights follow linearly from the distance between observations. Because I set the distance of each point to itself to be zero, the points own value does not affect the spatially-weighted average but it is instead an average of all the other points.

(dist.mat %*% x) %>% head
##          [,1]
## 1 0.004514669
## 2 0.004740885
## 3 0.006118191
## 4 0.005526393
## 5 0.005368478
## 6 0.004964802

Here’s how to turn this matrix into a spatial weights object that can be used by functions like the splm spatial panel models.

dist.w <- mat2listw(dist.mat, style="W") #converting matrix to a weight list object

Next I zoom into southern Afghanistan and generate a new cloud of points so that I can show show to subset these points by the district that they are located in. Here’s the full cloud.

plot(districts, add=TRUE)

The code to subset the data is actually very simple. As long as our points are correctly formatted as a SpatialPoints object, then we can just subset our districts shape file to the locations that we want, and then use this to select the points.

## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"
select <- districts[districts$DIST_34_NA=="Chahar Burjak",]
plot(districts, add=TRUE)