Learning GIS: Quantitative Data Classification & Data Uncertainty

For this course with Dr. Hermansen, I was tasked with classifying housing costs in Metro Vancouver using four different methods. In this post, I will discuss how these methods influence map interpretation and how data uncertainty and suppression create "no data" census areas on my maps.

Quantitative Data Classification

One's choice of quantitative data classification scheme depends on both the task at hand and the intended audience. For example, a journalist publishing a Metro Vancouver housing cost map for the public would probably use manual breaks, because the intervals can be set at round, easily read values that are audience-friendly. An analysis for, say, a panel of city planners might instead use a standard deviation classification scheme, which highlights census tracts whose dwelling costs sit substantially above or below the average. Another example is a cross-city comparison of housing costs (as shown in my Affordability Learning GIS activity), which may warrant manual breaks in order to establish a consistent baseline of comparison between the cities. All in all, no classification method is inherently the "best," since each can be useful in different situations.
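As a rough illustration of how two of these schemes differ, the sketch below computes equal interval and standard deviation breakpoints for a handful of made-up dwelling values. This is a simplified hand-rolled version, not the exact algorithm a GIS package uses:

```python
import statistics

# Hypothetical median dwelling values (CAD) for a few census tracts
values = [450_000, 520_000, 610_000, 780_000, 900_000, 1_250_000, 1_600_000]

def equal_interval_breaks(data, classes):
    """Split the full data range into equally wide intervals;
    returns the upper bound of each class."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / classes
    return [lo + width * i for i in range(1, classes + 1)]

def std_dev_breaks(data):
    """Breakpoints at one standard deviation below the mean,
    the mean itself, and one standard deviation above it."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return [mean - sd, mean, mean + sd]

print(equal_interval_breaks(values, 4))
print(std_dev_breaks(values))
```

Note how equal interval breaks are driven entirely by the range, so a single very expensive tract stretches every class, while standard deviation breaks are centred on the average, which is why they suit an "above or below normal" question.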

Data Classification Comparison

Data Uncertainty

Data uncertainty and error can factor into mapping housing costs because of how the data are sourced. Statistics Canada, the source of the median dwelling value data, has data suppression rules which exclude standard areas with a population under 40 (or under 100 for postal code areas). Moreover, the population considered for these statistics is limited to the estimated count of people in private households. Other suppression rules apply to questions beyond dwelling value: for individuals on Indian reserves, survey results about citizenship, landed immigrant status, or year of immigration may be excluded from the census results.
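The population thresholds described above can be expressed as a small helper. This is a hypothetical sketch for illustration only, not a Statistics Canada API, and it covers only the two thresholds mentioned in this post:

```python
def is_suppressed(population, area_type):
    """Return True if an area's data would be withheld under the
    thresholds summarized above: standard areas with a population
    under 40, postal code areas under 100 (assumed simplification)."""
    threshold = 100 if area_type == "postal" else 40
    return population < threshold

# A standard census area of 35 people falls below the 40-person cutoff
print(is_suppressed(35, "standard"))  # True
# The same population in a postal code area is also below its 100 cutoff
print(is_suppressed(35, "postal"))    # True
print(is_suppressed(120, "postal"))   # False
```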

Statistics Canada also has certain standards for delineating census tract boundaries which may introduce error. While these standards state that CT zones should have populations that are as socioeconomically homogeneous as possible, the modifiable areal unit problem should still be considered. When census data are aggregated over a census administrative area, the entire population is generalized into a single mean or median, which can lead to visual problems like blank or unrepresentative areas.
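A toy example makes the aggregation problem concrete: two hypothetical tracts with invented dwelling values can report the identical median even though one is uniform and the other is highly mixed, so the map shades them the same:

```python
import statistics

# Invented dwelling values (CAD) for two hypothetical census tracts:
tract_a = [500_000, 505_000, 510_000]        # tightly clustered
tract_b = [200_000, 505_000, 1_400_000]      # wildly spread out

# Both tracts collapse to the same single summary value
print(statistics.median(tract_a))  # 505000
print(statistics.median(tract_b))  # 505000
```

The aggregated statistic hides the within-tract variation entirely, which is one face of the modifiable areal unit problem.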

While Statistics Canada's data is extremely comprehensive and useful for the analysis conducted in this activity, it is important to note that 19 census tracts on my map have "0" recorded for shelter due to data suppression. This equates to approximately 4.06% of Metro Vancouver's census tracts.
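For transparency, the percentage works out as follows; the total tract count of 468 is my back-calculation from the two figures above, not a number stated in the source data:

```python
suppressed = 19
total_tracts = 468  # assumed: implied by 19 suppressed tracts being ~4.06%

share = suppressed / total_tracts * 100
print(round(share, 2))  # 4.06
```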


Learning Significance

  1. I was tasked with comparing four methods of visual classification of vector datasets and interpreting how data uncertainty and error may originate from Canadian census data. By executing standard deviation, manual breaks, equal interval, and natural breaks classifications and referencing Statistics Canada's rules for data suppression, I developed a more critical eye with respect to data and improved my understanding of data visualization principles.