DataViz about Data Literacy

This semester I am teaching a course called Data for Good. This is a fully online master's level course, and most of the students are pursuing a degree in philanthropy or nonprofit studies. To kick off the course, the students completed a data inventory in which they self-assessed their standing on several data skills. I thought it would be fun to show some DataViz of these data literacy data. My aim here is to highlight the differences between standard outputs for data visualization and more compelling, aesthetically appealing options.


Background

First, a bit of background on the data. The questions on this inventory were drawn from two sources. One is from Tableau's list of 17 key traits of data literacy. These items are ranked on a 5-point scale and grouped into four categories: knowledge, skills, attitudes, and behaviors. The self-assessment includes a ranking of the importance of each item, along with one's proficiency on that item.

The second data source is adapted from Gemignani and colleagues' 2014 data fluency inventory. From the full list of questions, I drew a block of items rated on a 7-point Likert agree/disagree scale, with statements such as: I know the most important metrics for measuring success of my organization. I have access to the data I need.

The inventory concludes with a block of original items that request a rating of current ability on a scale from 0-10 (0 lowest and 10 highest) for data literacy skills such as: Synthesize complex data from multiple sources; Visualize data and tell compelling data narratives. I developed these student learning outcomes based on a review of existing literature (example).

Standard Visualizations

Using a common survey tool, a report was automatically generated that included the bar chart to the left, which represents student ratings for the importance of each data literacy trait. To put it bluntly, this is a terrible visual that leaves the viewer with little sense of how to meaningfully understand the results.

From this visual, a data fluent viewer may be able to decipher that the means across these items hovered around 4 (indeed, the class average was 4.25), but beyond that it would be hard to discern much from this clutter.

If we were interested in seeing which particular items people ranked the highest versus lowest, we could dig into the settings of the survey software to output the below set of visuals. Though the data are not truly continuous as the line implies, this visual helps to identify peaks and valleys across the data literacy traits.

For example, a viewer could better see that there is a noticeable peak in both the importance (purple) and proficiency (orange) ratings for the item that is fifth from the right side (Peak 1), which concerns the ethical use of data. An astute viewer may also identify a discrepancy in ratings of advocating for the effective use of data, by comparing the relatively high peak for the importance of the item that is second from the right (purple, Peak 2) with its lower proficiency rating (orange, Not Really Peak 2). Yet drawing such conclusions from these visuals requires a great deal of cognitive attention, which is a considerable cost to even the most passionately interested consumer of this information.

More Appealing DataViz

Several contemporary data visualization tools markedly improve on traditional outputs, producing visuals that are more aesthetically appealing and more readily understood. For example, Datawrapper is an excellent and free tool that helps users make interactive charts with no coding and relatively few advanced data skills. The clarity of this data tool enables its widespread use, including by the New York Times.

Using this tool, I was able to copy and paste the data from the survey, perform a few modifications to the default visual, test for various forms of color blindness (as my red-green colorblind husband would appreciate), and output the visual to the right.

This visual compares importance (dark green) to proficiency (light green), sorted by importance. Bolded text emphasizes items with the greatest difference between importance and proficiency. For example, the average rating for importance of creating clear visuals was 4.31, whereas average proficiency in this skill was 2.63.
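
The comparison behind this chart can be sketched in a few lines of Python. This is only an illustration, not the workflow used in Datawrapper: the item names marked "hypothetical" and their values are made up, and only the "creates clear visuals" pair (4.31 and 2.63) comes from the class averages reported above.

```python
# Each entry maps an item to (average importance, average proficiency).
# Only the first pair is taken from the post; the others are invented
# placeholders so that the sorting and gap logic have something to work on.
ratings = {
    "creates clear visuals": (4.31, 2.63),  # from the post
    "hypothetical item A": (4.50, 4.10),    # made-up values
    "hypothetical item B": (3.90, 3.60),    # made-up values
}


def gaps_sorted_by_importance(ratings):
    """Return (item, importance, proficiency, gap) rows, highest importance first."""
    rows = [
        (item, importance, proficiency, round(importance - proficiency, 2))
        for item, (importance, proficiency) in ratings.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


rows = gaps_sorted_by_importance(ratings)

# The item with the largest gap is the one a chart would emphasize in bold.
largest_gap = max(rows, key=lambda row: row[3])

for item, importance, proficiency, gap in rows:
    print(f"{item}: importance={importance}, proficiency={proficiency}, gap={gap}")
```

Sorting by importance while flagging the largest gap mirrors the two design choices described for the chart: the order shows what students value most, and the emphasis shows where they feel least prepared.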

The visual is a step forward from the previous sets and is worth venturing to an external tool to create. Yet, further DataViz improvements are possible from there.

For example, I entered the numerical values for the class averages into the Tableau Public data literacy profile, subtracting one from each value to convert my survey's 1-5 scale to the tool's 0-4 scale. The resulting visual is below.
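
The scale adjustment is a simple shift, sketched here in Python. The function name and its parameters are hypothetical; the two sample values are the class averages for creating clear visuals reported earlier in this post.

```python
def rescale_to_zero_based(score, old_min=1, old_max=5):
    """Shift a score from a scale starting at old_min to one starting at 0."""
    if not old_min <= score <= old_max:
        raise ValueError(f"score {score} is outside the {old_min}-{old_max} scale")
    # Rounding keeps floating-point noise out of the shifted averages.
    return round(score - old_min, 2)


importance = rescale_to_zero_based(4.31)   # average importance on the 1-5 scale
proficiency = rescale_to_zero_based(2.63)  # average proficiency on the 1-5 scale
print(importance, proficiency)
```

Because every value moves down by the same amount, the distance between importance and proficiency is unchanged by the conversion.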

This visual displays the four quadrants possible for combining low-high importance with low-high proficiency. Most importantly, the right side of the figure draws ready attention to the outliers from the cluster of items for high importance-high proficiency to identify items 10, 12, 13, 15, and 17 as those with the greatest discrepancy between the two. This visual story is further accentuated by the display on the left side of the figure, which aids identification of the item titles, such as: continuously improves data, enthusiastically spreads data literacy.
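
The quadrant logic can be approximated as follows. This is a sketch under a stated assumption: it splits the 0-4 scale at its midpoint of 2.0, which may not match the cutoffs the Tableau Public tool actually uses. The sample values are the class averages for creating clear visuals after shifting them to the 0-4 scale.

```python
def quadrant(importance, proficiency, midpoint=2.0):
    """Label an item by whether each rating is above the assumed midpoint."""
    importance_level = "high" if importance > midpoint else "low"
    proficiency_level = "high" if proficiency > midpoint else "low"
    return f"{importance_level} importance, {proficiency_level} proficiency"


# 4.31 and 2.63 on the 1-5 survey scale become 3.31 and 1.63 on the 0-4 scale.
label = quadrant(3.31, 1.63)
print(label)
```

Items that land in the high importance, low proficiency quadrant are the ones the figure separates from the main cluster.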

Most Compelling DataViz

While those data visualizations were fun to make and begin to tell the story, important aspects of the narrative are missing. What cannot be seen from these averages alone is the wide degree of variation among students in the class in their ratings on each item. Measures of central tendency (mean, median, mode) tell us something important, but they do not describe everything we need to know. Indeed, most traditional statistics courses pick up on this exact point as a justification for more complex modeling. While advanced stats is great for what it needs to do, it often fails the typical philanthropic practitioner, who is interested in extracting something from the data but usually lacks the time or expertise to delve into complicated statistics.

Yet, there is a comfortable middle ground between the two extremes of advanced statistics and data oversimplification. For example, the two visuals below draw on a simple table of data in Excel that records more than basic averages, namely the minimum and maximum for each item. Again employing Datawrapper, this time with more time invested in diving into the modification settings (and again with an eye out for my color-blind viewers), I outputted the two figures below, both of which highlight the range of class responses for each item. The visuals are both sorted from the lowest range (bottom) to the highest range (top).
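
A table like the one described can be built from raw responses with nothing more than minimum, maximum, and mean. The Python sketch below uses entirely hypothetical items and responses (they are not the class data) and stands in for the Excel step, not for the Datawrapper chart itself.

```python
from statistics import mean

# Hypothetical responses on a 7-point agree/disagree scale, one list per item.
responses = {
    "hypothetical item X": [6, 7, 7, 7],
    "hypothetical item Y": [3, 4, 5, 5],
    "hypothetical item Z": [3, 5, 6, 7],
}


def summarize(responses):
    """Return one row per item with min, max, mean, and range, narrowest range first."""
    rows = []
    for item, values in responses.items():
        lowest, highest = min(values), max(values)
        rows.append({
            "item": item,
            "min": lowest,
            "max": highest,
            "mean": round(mean(values), 2),
            "range": highest - lowest,
        })
    return sorted(rows, key=lambda row: row["range"])


summary = summarize(responses)
for row in summary:
    print(row)
```

Two items can share a similar mean and still differ sharply in range, which is exactly the information an averages-only chart hides.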

To pinpoint a few of the visual insights, the first chart (above) shows that the students have resounding agreement that it is important that data sources be understood, reliable, and valid (ranging only from 6 to 7, with a mean of 6.81). Likewise, in the middle of the figure, the class mostly coheres around viewing themselves as relatively accurate consumers of data (ranging only from 3 to 5, with a mean of 4.38). In contrast, the class range is tremendous in the extent of knowing the most important metrics for measuring organizational success. While the mean is fairly high (5.06), the range extends from 3-Disagree to 7-Strongly Agree. This spread is important and entirely invisible in the previous set of visuals.

In terms of the scale from 0-Low to 10-High on data literacy abilities, the second chart (below) again tells an important story about variation: the students range from 0-2 all the way to 7-10 in ratings of their ability to carry out each of these aspects of data literacy. It is telling to know that the class average ranges from a low 3.94 for ability to analyze the size and scope of funding for data technology to a high of 6.25 for ability to comprehend, interpret, assess, and apply data. Yet, the tremendous range on each of these items would have been lost if only the previous set of visuals was engaged. This kind of compelling DataViz takes more time and attention to create, but it is crucial for drawing attention to key aspects of the data narrative.

In summary, the kind of data visuals we engage matters. The easiest outputs from standard visualizations are often problematic, and they can readily be improved upon with free and accessible tools to return more aesthetically pleasing DataViz. Yet, beyond what can be easily outputted from those tools, the field needs people who can think critically about what the accurate data story is and use visuals to tell that narrative compellingly.

Patricia Snell Herzog, PhD

Sociological Imagination in Action: Advancing Data and Tech for Good
