Map671 Module 09: Advanced Mapping Techniques with QGIS and CartoDB

Note: the following lesson material is drawn from the New Maps Plus Map671: Introduction to New Mapping course, instructed during the Fall 2015 semester. In addition to this lesson, students enrolled within the New Maps Plus Graduate Certificate in Digital Mapping are provided instructional video resources, additional laboratory components and instruction, and individualized support through online office hours and Q&A discussion boards. Lesson material posted here may not reflect further updates to course material nor changes to the technologies described. Use at your own risk, and please do not reuse for institutional education without our permission. Happy mapping!

Overview

This lesson continues our exploration of CartoDB. We'll use CartoDB to symbolize some point-level data of US Greenhouse Gas emissions as Graduated Symbols. We'll then explore possibilities for bringing in another data layer and adjusting the visibility of these layers as the user zooms in and out of the map.

Data files

Note: this dataset was downloaded from the EPA's Facility Level Information on GreenHouse gases Tool (FLIGHT) and includes 2014 greenhouse gas emissions from large facilities.

TOC

  • Symbolizing point data and "Bubble Maps" in CartoDB
    • Loading point-level data into CartoDB
    • Creating Graduated Symbols in CartoDB
  • Strategies for dealing with visual information overload
    • Filtering data in CartoDB
    • Harnessing zoom levels of an interactive map as a design solution
  • Performing a spatial join in CartoDB
  • Creating a multi-scale web map with different thematic data types
    • Controlling zoom level visibility with CartoCSS
  • Finishing up the web map

Symbolizing point data and "Bubble Maps" in CartoDB

Loading point-level data into CartoDB

Let's first load a new dataset into our CartoDB account. To begin, log in to CartoDB and switch to the datasets view (e.g., https://username.cartodb.com/dashboard/datasets).

Click on the New Dataset button, and select the emissions_2014.csv file that accompanies this module. Leave the option to let CartoDB "automatically guess data types and content on import" checked. Click CONNECT DATASET.

Figure 01: Selecting the local data file and connecting the database to CartoDB

Once the dataset is connected, CartoDB will open the file in the Data View. Take a moment to complete the metadata information (see above under Data files).

Note that the the_geom column is populated with geometry values drawn from the latitude and longitude fields contained within the CSV file. CartoDB automatically detected these values and georeferenced the data. How convenient!

Take a moment to scan through the rest of the field attributes. Note the one titled "ghg_quantity_metric_tons_co2e." This is a quantitative measure of the amount of greenhouse gas emissions reported by large facilities across the US in 2014. It is a raw number (i.e., not a standardized rate) attached to a point location. A suitable method for cartographically representing such data is a graduated or proportional symbol map.

Graduated and proportional symbols are a useful alternative to choropleth maps and are used to map either total or ratio data. They have some important advantages over choropleth maps (e.g., the data need not be standardized). We can use true point data to make a proportional symbol map (such as the location of a coal plant), or conceptual point data (e.g., a wind farm covers a large area but we can still represent it as a point). These maps are also good at showing relative magnitudes (i.e., "I can tell that this one is larger than that one").

We can make graduated or proportional symbol maps with any shape, though circles are the most common. Graduated symbol maps are simply proportional symbol maps that use a series of class breaks. Currently CartoDB creates graduated symbol maps with its Map layer wizard and playfully (or unknowingly) calls this thematic technique "bubble maps."

Creating Graduated Symbols in CartoDB

Switch to the Map View. We see that CartoDB has already styled these point locations as orange circles by default. This is already a fairly revealing thematic map: it shows us the spatial distribution of these large GHG emitting facilities (or at least those contained within this dataset).

Figure 02: CartoDB displaying point data of facilities emissions

Within this module we're going to be making use of some Structured Query Language (SQL), which CartoDB uses to select features from the dataset for displaying on the map. Take a little time to read through CartoDB's Learning SQL through the CartoDB Editor.

A quick peek at the SQL query that's responsible for the current display of orange dots reveals that we're selecting all data from the emissions_2014 dataset table (this is the most basic of SQL queries).

Figure 03: CartoDB displaying point data of facilities emissions
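For reference, a query that selects all data from the table can be written as follows. This is a minimal sketch based on the description above, not a copy of what the editor displays. The second statement is an assumption about which columns CartoDB needs to draw the layer (cartodb_id and the_geom_webmercator), shown only to illustrate selecting specific columns instead of everything.

```sql
-- Select every column of every row in the emissions table
SELECT * FROM emissions_2014;

-- Assumed variation: select only the columns the map needs plus the emissions value
SELECT cartodb_id, the_geom_webmercator, ghg_quantity_metric_tons_co2e
FROM emissions_2014;
```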

A quick peek at the CartoCSS that's responsible for styling these facility data shows us that we're using a marker-type of ellipse with various property values that define the dots' size, opacity, color, etc. Feel free to play around with these values and apply the changes (Ctrl + S) to change the visual appearance of the ellipse markers. Don't be afraid to break something. You can always reset the map with the Map layer wizard.

Figure 04: CartoCSS default rules for displaying point-level data

Let's now make this thematic map a bit more meaningful by changing the sizes of these circles based upon the quantitative measure of those greenhouse gas emissions. To do so within CartoDB, go to the Map layer wizard. Scroll through the map types to the right and choose Bubble. Ensure that our "ghg_quantity_metric_tons_co2e" is selected in the Column.

Figure 05: Using Map layer wizard to make a "Bubble Map" (a.k.a. Graduated Symbol Map)

Here you can play with the quantification method and are given a few options. You should already recognize Quantile, Equal Interval, and Jenks. CartoDB has also included a newer classification method called Heads/Tails, which is good for classifying datasets with a long distribution tail (like this particular set). You can also adjust the radius size of the bubbles, as well as their fill and stroke color and opacity levels to make the map more readable and aesthetically pleasing. These changes are reflected in the "bubbles" on the map.

After re-symbolizing the point data using the Map layer wizard, switch back to the CartoCSS to see the actual style rules responsible for their new appearance. We can see that CartoDB is creating classed proportional symbols (10 classes by default) and using a series of conditional statements (within the square brackets) to select certain symbols based upon the value of their associated ghg_quantity_metric_tons_co2e data attribute. A new value for the marker-width (the symbol's diameter) is applied within each class break. Here you can manually adjust these values as well.

Figure 06: CartoCSS rules used to make a Graduated Symbol Map

Like in Module 08, you can now choose to Visualize and publish this map to share on the web, add title information, edit the menus, and add additional popup functionality (let's hold off on this for now though).

CartoDB has quickly given us the power to make a thematic graduated symbol map. However, one problem we see when zoomed out is data overload. There's either too much information being displayed on the map, or we haven't effectively designed it.

Figure 07: Too much information being displayed too densely

Let's now consider some options to make this map more user-friendly.

Strategies for dealing with visual information overload

Filtering data in CartoDB

To improve the legibility of the map, one option is to filter our data so that we only display the larger facilities. While this could be handled within the CartoCSS rules (e.g., "if the ghg_quantity_metric_tons_co2e value is below a certain threshold, don't display the marker"), we can also use SQL so that only the features we want are selected for display in the first place.

Again, CartoDB's web interface offers us a wizard for first approaching tasks using SQL (i.e., we don't need to write the statements ourselves, but can let CartoDB do this). Click on the Filters icon at the bottom of our menu. Here you can choose a data field with which to filter. CartoDB then shows you a histogram of the data distribution. Slide the filter to only display facilities that emitted 1 million metric tons of GHG or more. These changes are reflected in the map itself.

Figure 08: Using CartoDB's Filters wizard to apply a SQL query

Clicking back to the Custom SQL query tab reveals the SQL used to accomplish this. CartoDB is using an SQL query to select all entries from the dataset where the value of ghg_quantity_metric_tons_co2e is between 1000000 and the max value of 20482460. Again, you could modify this SQL query manually if you chose.

Figure 09: Using an SQL query to filter the data
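A filter like the one described above can be sketched as the following query. The exact text CartoDB generates may differ (for example, it may use two comparisons rather than BETWEEN), so treat this as an illustration of the logic rather than the wizard's literal output.

```sql
-- Keep only facilities reporting at least 1,000,000 metric tons CO2e.
-- 20482460 is the dataset maximum quoted in this lesson; BETWEEN is inclusive.
SELECT *
FROM emissions_2014
WHERE ghg_quantity_metric_tons_co2e BETWEEN 1000000 AND 20482460
```

Because the upper bound is simply the largest value in the table, replacing the BETWEEN clause with ghg_quantity_metric_tons_co2e >= 1000000 should select the same facilities.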

CartoDB now gives you an option to create a new dataset using this query. You can also choose clear view to remove the SQL filter query. Go ahead and clear the view and we'll consider another option for solving the problem of visual data overload.

Harnessing zoom levels of an interactive map as a design solution

We can play with the visual design of the symbols (adjust their size and appearance) and filter out data, but web maps offer another potential solution that static maps lack: the ability to zoom into the map.

Notice that the sizes of the bubbles re-adjust when zooming in and out of the map. Because of the density of the facility points, the map is fairly legible when zoomed in, but not so when zoomed out.

Figure 10: Symbols legible when zoomed in, but too crowded when zoomed out

Users can easily make visual comparisons between the facilities when zoomed into the map (say, zoom level 6 or greater). But what can we do when zoomed out (let's say zoom levels less than 6)?

The rest of the module will test a design solution. Our aim is to display the data using a different thematic technique at these lower (i.e., zoomed out) zoom levels. Let's aggregate these point-level data to a larger unit (e.g., US States). With that symbolization technique, the user can see the general distribution when zoomed out. The user can then derive more specific information about the data when zooming in. We'll now walk ourselves through that process.

Performing a spatial join in CartoDB

While we could bring these point-level data down into QGIS and do a spatial join or point-in-polygon analysis using US states before pushing it back up to CartoDB, we can also do this directly within the CartoDB web interface.

First, let's get some US States geometries. Again, let's go to the US Census Cartographic Boundary files and this time import the US states file at the 1:20,000,000 scale, labeled "20m" (https://www.census.gov/geo/maps-data/data/cbf/cbf_state.html). Remember we can import these directly into CartoDB without first downloading to our local machine (refer back to the instructions in module 08).

Once the data is loaded, from either the data or the map view choose Edit and then select Merge with dataset.

Figure 11: Merging a dataset in CartoDB

CartoDB offers the option to merge using either a tabular Column join or a Spatial join. Let's choose to do a spatial join.

Figure 12: Choosing a spatial join

On the left side are the US states data. Choose Select dataset and select the dataset of the large emission facilities. You're given the opportunity to preserve some of the columns from the US States dataset. In this case, since we're anticipating making a choropleth, we'll want some unit to normalize our data with, so select aland to keep. You may wish to preserve the state FIPS id (statefp) in case we want to do any tabular joins later, as well as the state name.

Importantly, you're given the option to perform some spatial analysis during this spatial join. In this case we want to aggregate or SUM the amount of GHG being emitted within each state. To do so, select the ghg_quantity_metric_tons_co2e and SUM before clicking MERGE DATASETS.

Figure 13: Doing a spatial join with US States and the facilities point data, while aggregating the numeric column
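For those curious what this merge amounts to, a spatial join with a SUM can be sketched in PostGIS-flavored SQL roughly as follows. This is an illustration under assumptions, not the query CartoDB actually runs: the imported states table is assumed to be named cb_2014_us_state_20m, "name" is assumed to be the state-name column, and the summed column is aliased intersect_sum to match the name CartoDB gives it in the merged table.

```sql
-- Sketch only: sum facility emissions falling inside each state polygon.
-- Assumed names: cb_2014_us_state_20m (states table), name (state name column).
SELECT
  s.cartodb_id,
  s.the_geom,
  s.name,
  s.statefp,
  s.aland,
  SUM(e.ghg_quantity_metric_tons_co2e) AS intersect_sum
FROM cb_2014_us_state_20m AS s
LEFT JOIN emissions_2014 AS e
  ON ST_Intersects(s.the_geom, e.the_geom)  -- point-in-polygon test
GROUP BY s.cartodb_id, s.the_geom, s.name, s.statefp, s.aland
```

The LEFT JOIN keeps states that contain no facilities (their sum comes back as NULL), which matters if every state should still be drawn on the choropleth.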

CartoDB will then perform the spatial join, creating and opening a new copy of the US states dataset (which it names cb_2014_us_state_20m_merge) with the GHG column summed. We can see these raw totals in a column named "intersect_sum". Rename this column to something like "ghg".

Figure 14: Renaming the field with aggregated GHG emissions

Now switch to the MAP VIEW. Using the CartoDB Map layer wizard and choosing Choropleth, we can quickly create a choropleth map of these data.

Figure 15: Creating a quick choropleth map of GHG using the Map layer wizard

However, there's one problem with this map. Unfortunately, CartoDB still allows us to do cartographically silly things, like create a choropleth map of un-normalized data. To make this map more honest, we should divide those raw GHG totals by another number (in this case, the area of the land makes sense).

Unfortunately, for now the CartoDB Map layer wizard doesn't let us do this with a simple dropdown, so we're going to need to create a new column and make the calculation manually.

We already know how to do this within QGIS using the field calculator, and that's certainly one option for us here. This would require:

  1. importing the merged dataset into QGIS using the CartoDB plugin
  2. saving the imported file as a local Shapefile
  3. using the field calculator to make the calculation and save into a new field
  4. saving the changes to that shapefile
  5. exposing the new shapefile from QGIS back to CartoDB

There's nothing wrong with this workflow. But wouldn't it be nice if we could just do that quickly here in CartoDB? That's where the usefulness of knowing a little SQL comes in.

Switch back to the DATA View and choose add column in the lower right corner of the interface.

Figure 16: Adding a new column in the Data View

Name this new column something like "ghg_normalized" and change the type to be a number type.
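If you prefer SQL to the interface, the same column could likely be added with a statement along these lines. This is a sketch that assumes the SQL panel accepts table-altering statements for your account, and that double precision is an acceptable match for CartoDB's "number" type.

```sql
-- Add a floating-point column to hold the normalized values
ALTER TABLE cb_2014_us_state_20m_merge
  ADD COLUMN ghg_normalized double precision
```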

Next, open the Custom SQL query panel. We're going to apply an SQL query to our dataset that updates the existing table and assigns new values within our newly created "ghg_normalized" field by dividing our "ghg" field by our "aland" field (this is essentially the same thing we do using the field calculator in QGIS).

Figure 17: An SQL statement setting values of a new field by dividing integer values

The SQL code for this query looks like this:

UPDATE cb_2014_us_state_20m_merge SET ghg_normalized = ghg/aland

However, applying this query will not yield the result we are hoping for. We can see the values in the "ghg_normalized" column are all zeros (not what we expected). The reason for this is that CartoDB is using a PostgreSQL database to do this calculation, and when PostgreSQL divides two integers it chops off the remainder (not very helpful indeed!).

To correct this problem we need to convert one of these field values to a floating point data type as part of the calculation. We do this by writing two colons and the word "float" after the field name.

UPDATE cb_2014_us_state_20m_merge SET ghg_normalized = ghg::float/aland

Applying this query yields our anticipated results, and we can see the calculated normalized values for our "ghg_normalized" field.
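To see the integer-division behavior in isolation, a tiny query with literal numbers can help. The values 3 and 10 are made up for illustration; the point is that the numerator is smaller than the denominator, just as a state's GHG total is smaller than its land area in square meters.

```sql
-- Integer / integer truncates toward zero; casting one operand to float keeps the fraction
SELECT
  3 / 10        AS integer_division,  -- 0
  3::float / 10 AS float_division     -- 0.3
```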

If we now switch back to the Map View and try applying the Choropleth quantification/classification to our map, we're now looking at a successfully normalized choropleth (it's being displayed in a non-equal-area projection, but we can fix that later if we wish). Note that you may need to switch back and forth between Quantification options for the map to update within CartoDB.

Before moving on to complete our intended task, let's take a moment to reflect on these calculated data values. We see the legend is showing a scale of 0.00 - 0.00 (not particularly useful for understanding the measure). This is because the calculated values are quite small (e.g., 0.000295813718277517). Remember that we're dividing metric tons of GHG emissions by the area of land (the cartographic boundary file encodes area in square meters within the "aland" field). A more meaningful measure would be the amount of GHG per square kilometer. To get it, we divide the values of "aland" by 1,000,000 to convert them to square kilometers (yes, 1,000,000 and not 1,000: one kilometer is 1,000 meters, so one square kilometer is 1,000 × 1,000 square meters).

We thereby update our SQL query to be (we write 1000000.0 to avoid the division by integers problem):

UPDATE cb_2014_us_state_20m_merge SET ghg_normalized = ghg::float/(aland/1000000.0)

Figure 18: Converting square meters to square km during the calculation

Now our "ghg_normalized" field is populated with more meaningful numbers, and we can anticipate adding map and legend titles to the effect of "Metric tons of GHG per square kilometer."
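As a sanity check, the same arithmetic can be previewed with a read-only SELECT, either before or after running the UPDATE. This is a sketch that assumes the "name" column was preserved during the merge; the aliases area_sq_km and ghg_per_sq_km are made up for illustration.

```sql
-- Preview the per-square-kilometer values without modifying the table
SELECT
  name,
  ghg,
  aland / 1000000.0                AS area_sq_km,    -- square meters to square kilometers
  ghg::float / (aland / 1000000.0) AS ghg_per_sq_km
FROM cb_2014_us_state_20m_merge
ORDER BY ghg_per_sq_km DESC
```

Sorting by the computed value puts the states with the most emissions per unit of land at the top, which makes implausible results easier to spot.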

Let's continue on with making our web map of GHG emissions at different zoom levels now.

Creating a multi-scale web map with different thematic data types

Let's create a new map from this dataset. Click Visualize in the upper right hand corner. CartoDB will inform you that A map is required to publish, so click OK, CREATE MAP.

Figure 19: Publishing a dataset as a map

First, rename our map to something like US GHG Emissions. We can see our choropleth map we just created.

Once our dataset is published within CartoDB, we can add additional layers to it (you're likely limited to 4 layers with your academic account). In the upper right corner of the interface, click on the blue plus sign to add an additional layer.

Figure 20: Adding a new layer

Select the point-level data of facility emissions (i.e., the emissions_2014 layer) we were working with earlier and click ADD LAYER.

Figure 21: Selecting our point-level data and adding as a new layer to the map

We see that the facilities have once again been plotted on our map as simple circles.

Figure 22: Facilities layer plotted atop the choropleth layer

Open up the panels to the right and take a moment to rename the two layers (e.g., facility emissions and state emissions).

Figure 23: Renaming a map layer

Now quickly repeat the steps used above to create a graduated "bubble" map with the facility emissions layer. This will again create our overly-crowded data map. We're now ready to employ our UI zoom design strategy proposed above.

Controlling zoom level visibility with CartoCSS

Switch to the CartoCSS editor of the facility emissions layer. Here we see the CartoCSS we examined earlier. We're going to make a minor edit to the first rule, which applies to all the symbols. Just as the CartoCSS uses square brackets to select only features with values of ghg_quantity_metric_tons_co2e less than or equal to some number, we can use square brackets to select features within certain zoom levels (for example, a [zoom < 6] condition). Examine the following code carefully, and then edit your CartoCSS to match:

Figure 24: Adding a zoom conditional to the facility CartoCSS

What we've done is tell the CartoCSS that when the zoom level is less than 6, make the fill and line opacity values 0 for all the features (essentially rendering them invisible). Apply the style and test the result by zooming the map in and out.

If applied correctly, the graduated symbols should only appear when the zoom level is 6 or above.

We can apply this same technique to our state emissions layer's CartoCSS, though rather than making the zoom conditional less than 6, we'll make it greater than or equal to 6.

Figure 25: Adding a zoom conditional to the choropleth CartoCSS

And, rather than applying the opacity changes to the "marker" properties, we're applying them to the "polygon" properties (we didn't need to look these up; these CartoCSS statements are copied and modified from what the Map layer wizard produced for us).

The map now should switch between these two different thematic types as the user zooms in and out of the map. Click Publish, copy the link, and test the map in a new browser window.

Finishing up the web map

From here we have various options for improving the appearance of and experience using the map (basemap, titles, legends, etc). We can as well add some popup functionality for the user to derive specific values from features, either from the facilities or the states.

Additionally, we could consider reprojecting the map to an equal-area projection. However, the trade-off is that once we zoom to level 6 and beyond and lose the choropleth, a user may want the tiled basemap in order to spatially locate and identify certain facilities. After we gain more advanced programming skills in MAP672 and MAP673 we could consider a script that would reproject the map at different zoom levels. But for now we can ask ourselves if the trade-off is worth it. How important is it to conform to a cartographic principle (e.g., equal-area projections for choropleth maps) at the expense of allowing the user to locate specific facilities near their hometown?
