Lesson 5. An example of content analysis
From: Privy Council Office
Introduction
In the previous lessons, you learned about using Python and preparing data for content analysis. This lesson will walk you through an example of content analysis conducted by Open North. It will demonstrate how you can use these methods to find insights in qualitative consultation data.
Context
In the fall of 2016, the Government of Canada consulted the public about national security laws and policies. During the consultation, the government asked people whether they felt that the current section 38 procedures of the Canada Evidence Act properly balanced fairness with security in legal proceedings. For this analysis, Open North used the responses to that question to gather insights into people’s feelings about security.
Initial investigations
When you first look at the responses, you may not know where to start. There were 106 responses to this question, averaging 39 words each, for a total of 5,037 words. Open North’s analysis began with a simple question: are there any topics that come up often in the results?
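A first look of this kind can be sketched in a few lines of Python. The example below is only an illustration, not Open North's code: the `responses` list holds invented placeholder text rather than the consultation data, the `describe` function is a hypothetical helper, and splitting on whitespace is an assumption about how words were counted.

```python
# Placeholder responses, invented for illustration (not the consultation data).
responses = [
    "No, the current procedures do not seem fair to the accused.",
    "Yes, I think the balance is about right.",
    "Judges should be able to see all of the evidence.",
]


def describe(texts):
    """Return the number of responses, total words and mean words per response."""
    # Assumption: a "word" is any run of characters separated by whitespace.
    word_counts = [len(text.split()) for text in texts]
    total = sum(word_counts)
    mean = total / len(texts) if texts else 0.0
    return len(texts), total, mean


n_responses, total_words, mean_words = describe(responses)
print(f"{n_responses} responses, {total_words} words, "
      f"{mean_words:.1f} words per response")
```

Running the same function on the real responses would give the figures to compare against the counts reported for the dataset.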
Topic modelling
You’ll recall from Lesson 2 that topic modelling uses statistical models to look for similar “parts” in a dataset. Topic modelling tries to locate parts that humans can interpret as logical “topics.” At first glance, you will not know how many topics there are in the dataset or how many words will fit into each topic.
In the first test, Open North asked the model to look for 3 topics in the dataset, each containing 3 words. The results are as follows:
Topic 1 | Topic 2 | Topic 3 |
---|---|---|
no, information, evidence | yes, information, criminal | offence, guilty, right |
Although these topics may reflect real statistical patterns in the text, an analyst can draw little meaning from them. Given the context, they do not give any deeper understanding of respondents’ feelings on security.
Remember, topic modelling is an exploratory process. Imagine that you are working on the problem by testing different arrangements of the data for insights.
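To make this kind of exploration concrete, here is a minimal sketch of one common topic model, latent Dirichlet allocation fitted by collapsed Gibbs sampling, written in plain Python. The lesson does not say which model or library Open North used, so treat this as an assumption: the function name `fit_topics`, the settings `alpha`, `beta` and `n_iter`, and the tokenized `docs` are all invented for illustration.

```python
import random
from collections import Counter

# Placeholder tokenized responses, invented for illustration.
docs = [
    ["no", "information", "evidence", "court"],
    ["yes", "security", "law", "government"],
    ["offence", "guilty", "law", "punishment"],
    ["no", "evidence", "disclosure", "judge"],
    ["yes", "information", "security", "need"],
    ["offence", "guilty", "right", "court"],
    ["information", "government", "case", "right"],
    ["security", "information", "judge", "case"],
]


def fit_topics(docs, n_topics, n_words, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the top words per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                       # all tokens in topic k
    assignments = []
    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight each topic by its fit to this document and this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Keep only words still assigned to the topic, most frequent first.
    return [
        [w for w, c in topic_word[t].most_common() if c > 0][:n_words]
        for t in range(n_topics)
    ]


# Try several parameter combinations, as an analyst exploring the data might.
for n_topics, n_words in [(3, 3), (3, 5), (7, 3), (7, 5)]:
    print(f"{n_topics} topics, {n_words} words:")
    for topic in fit_topics(docs, n_topics, n_words):
        print("  " + ", ".join(topic))
```

With so little placeholder text, some topics may come back short or empty, especially when 7 topics are requested. The point of the sketch is the loop at the end: changing the number of topics and words is cheap, so you can compare several arrangements side by side.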
In the table below, Open North tested the same data using different parameters. See whether any categories emerge.
| 3 topics, 3 words | 3 topics, 5 words | 7 topics, 3 words | 7 topics, 5 words |
|---|---|---|---|
| no, information, evidence | information, government, no, right, court | no, criminal, canadian | yes, information, security, could, law |
| yes, information, criminal | offence, yes, law, right, guilty | law, criminal, no | section, security, believe, canadian |
| offence, guilty, right | information, security, no, case, judge | security, information, canadian | information, government, case, need, right |
| | | information, if, judge | no, evidence, security, disclosure, information |
| | | yes, government, information | offence, guilty, law, right, punishment |
| | | no, fair, evidence | u, right, no, tell, a |
| | | offence, guilty, law | le, de, police, nous, la |
Ideally, you want to label certain topics with a title. You could say, for example, that the “no, fair, evidence” topic is generally related to “injustice.” However, few clear categories emerged from this topic modelling.
Instead, several broad themes emerge. The qualitative responses appear to address things like criminality, security and judiciary processes. You can explore this further by looking at word frequencies.
Word frequencies
You will recall from Lesson 4 that there are two different ways to determine the “root” of a word.
- Lemmatization looks for the smallest “root” in the dictionary.
- Stemming removes common prefixes and suffixes, to produce a root that does not necessarily exist in the dictionary.
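The difference between the two approaches can be sketched with a deliberately simplified example. This is not a real stemmer or lemmatizer: the suffix list, the small lemma dictionary and the function names `toy_stem` and `toy_lemmatize` are all hypothetical, and in practice you would use the stemmer and lemmatizer from an established text-processing library.

```python
# Hypothetical suffix list and lemma dictionary, kept tiny for illustration.
SUFFIXES = ("ation", "ment", "ity", "al", "es", "s")
LEMMAS = {"offences": "offence", "courts": "court", "laws": "law"}


def toy_stem(word):
    """Strip the first matching suffix; the result may not be a real word."""
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in a dictionary of known forms; unknown words pass through."""
    return LEMMAS.get(word, word)


for word in ["information", "security", "government", "offences"]:
    print(f"{word}: stem={toy_stem(word)}, lemma={toy_lemmatize(word)}")
```

Here the stem of “information” is “inform,” which is not the same word, while the lemma of “offences” is the dictionary form “offence.”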
When applied to this dataset, the following lists emerge:
Top 5, stemming | Top 5, lemmatization | Top 15, stemming | Top 15, lemmatization |
---|---|---|---|
42 inform, 31 secur, 29 no, 27 evid, 26 ye | 38 information, 29 security, 29 no, 26 evidence, 26 yes | 42 inform, 31 secur, 29 no, 27 evid, 26 ye, 25 right, 24 case, 24 govern, 22 offenc, 21 fair, 18 law, 17 court, 16 nation, 16 need, 15 crimin | 38 information, 29 security, 29 no, 26 evidence, 26 yes, 25 right, 24 case, 24 government, 22 offence, 18 law, 17 court, 15 criminal, 15 if, 15 need, 15 guilty |
The themes of information and security emerge from these lists. An interesting pattern also appears in the use of the words “no” (29) and “yes” (26). Both words may have been used in direct response to the prompt, which asked whether the current section 38 procedures of the Canada Evidence Act properly balanced fairness with security in legal proceedings. If so, the counts suggest that slightly more respondents feel that a proper balance was not achieved than feel that it was.
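Frequency lists like these can be produced with Python's standard library. The sketch below is an illustration under stated assumptions, not Open North's code: the `responses` are invented placeholders, the stop-word list is hypothetical, and the helper names `tokenize`, `top_words` and `responses_containing` are made up for this example.

```python
import re
from collections import Counter

# Placeholder responses, invented for illustration (not the consultation data).
responses = [
    "No. Information is withheld and the evidence is hidden.",
    "Yes, security matters, but no information should be secret forever.",
    "No, the accused needs the evidence.",
]

# Hypothetical stop-word list; "yes" and "no" are kept on purpose.
STOP_WORDS = frozenset({"the", "is", "and", "but", "be", "should"})


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def top_words(texts, n, stop_words=frozenset()):
    """Return the n most frequent words across all texts, with their counts."""
    counts = Counter(
        word for text in texts for word in tokenize(text) if word not in stop_words
    )
    return counts.most_common(n)


def responses_containing(texts, word):
    """Count the responses that use the word at least once."""
    return sum(1 for text in texts if word in tokenize(text))


print(top_words(responses, 3, STOP_WORDS))
print("responses with 'no': ", responses_containing(responses, "no"))
print("responses with 'yes':", responses_containing(responses, "yes"))
```

In this placeholder data, the second response opens with “yes” but also contains “no,” so a raw count of “no” overstates the number of negative answers. Reading a sample of responses alongside the counts helps confirm what the counts actually mean.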
Conclusion
You have now explored the text using several content analysis tools. As initial analyses, these tools raised questions that you can explore further by using other content analysis techniques. Content analysis may require several rounds of analysis. Automating the techniques and experimenting with the tools in these lessons can make the process faster.