Lesson 5. An example of content analysis

From: Privy Council Office

Introduction

In the previous lessons, you learned about using Python and preparing data for content analysis. This lesson will walk you through an example of content analysis conducted by Open North. It will demonstrate how you can use these methods to find insights in qualitative consultation data.

Context

In the fall of 2016, the Government of Canada consulted the public about national security laws and policies. During the consultation, the government asked people if they felt that the current section 38 procedures of the Canada Evidence Act properly balanced fairness with security in legal proceedings. For this analysis, Open North used the responses to this question to gather insights on people’s feelings about security.

Initial investigations

When you first look at the responses, you may not know where to start. There were 106 responses to this question, each averaging 39 words, resulting in 5,037 words altogether. Open North’s analysis began with a simple question: are there any topics that come up often in the results?
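Summary figures like these are a quick first check on any set of responses. The snippet below is a minimal sketch of how you might compute them. The sample responses are invented for illustration and are not taken from the consultation data, and splitting on whitespace is only one simple way to count words.

```python
# Hypothetical sample responses; the real consultation data is not reproduced here.
responses = [
    "No, the procedures favour secrecy over fairness.",
    "Yes, I think the balance is about right.",
    "It depends on the case.",
]

# Count words per response by splitting on whitespace (a simplifying assumption).
word_counts = [len(response.split()) for response in responses]

n_responses = len(responses)
total_words = sum(word_counts)
mean_words = total_words / n_responses

print(f"{n_responses} responses, {total_words} words, "
      f"{mean_words:.1f} words per response on average")
```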

Topic modelling

You’ll recall from Lesson 2 that topic modelling uses statistical models to look for similar “parts” in a dataset. Topic modelling tries to locate parts that humans can interpret as logical “topics.” At first glance, you will not know how many topics there are in the dataset or how many words will fit into each topic.

In the first test, Open North asked the model to look for 3 topics in the dataset, each containing 3 words. The results are as follows:

Topic 1: no, information, evidence
Topic 2: yes, information, criminal
Topic 3: offence, guilty, right

Although these topics might be statistically significant, they are mostly meaningless when an analyst interprets them. Given the context, these topics do not give any deeper understanding of respondents’ feelings on security.

Remember, topic modelling is an exploratory process. Imagine that you are working on the problem by testing different arrangements of the data for insights.

In the table below, Open North tested the same data using different parameters. See whether any clear categories emerge.

3 topics, 3 words:
Topic 1: no, information, evidence
Topic 2: yes, information, criminal
Topic 3: offence, guilty, right

3 topics, 5 words:
Topic 1: information, government, no, right, court
Topic 2: offence, yes, law, right, guilty
Topic 3: information, security, no, case, judge

7 topics, 3 words:
Topic 1: no, criminal, canadian
Topic 2: law, criminal, no
Topic 3: security, information, canadian
Topic 4: information, if, judge
Topic 5: yes, government, information
Topic 6: no, fair, evidence
Topic 7: offence, guilty, law

7 topics, 5 words:
Topic 1: yes, information, security, could, law
Topic 2: section, security, believe, canadian
Topic 3: information, government, case, need, right
Topic 4: no, evidence, security, disclosure, information
Topic 5: offence, guilty, law, right, punishment
Topic 6: u, right, no, tell, a
Topic 7: le, de, police, nous, la
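The parameter sweep in the table can be sketched in Python. The lesson does not say which model or library Open North used, so everything below is an assumption: a very small latent Dirichlet allocation (LDA) model fitted by collapsed Gibbs sampling, written with the standard library only. The function names (lda_gibbs, top_words), the priors alpha and beta, and the sample responses are hypothetical. For real work you would more likely rely on a tested topic modelling library.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Fit a tiny LDA model by collapsed Gibbs sampling (illustrative only)."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per topic, per document
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # tokens per topic overall

    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(n_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1
                # Resample the topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1
    return topic_word


def top_words(topic_word, n_words):
    """Return up to n_words of the most frequent words for each topic."""
    return [
        [word for word, count in counts.most_common(n_words) if count > 0]
        for counts in topic_word
    ]


# Hypothetical, already-tokenized responses (not the consultation data).
docs = [
    "no the evidence is hidden from the accused".split(),
    "yes the government needs to protect information".split(),
    "the court and the judge decide the case".split(),
    "security information must stay with the government".split(),
    "no it is not fair to hide evidence".split(),
    "the offence and the law decide who is guilty".split(),
]

# Try the same four parameter combinations as the table.
results = {}
for n_topics, n_words in [(3, 3), (3, 5), (7, 3), (7, 5)]:
    model = lda_gibbs(docs, n_topics)
    results[(n_topics, n_words)] = top_words(model, n_words)
    print(f"{n_topics} topics, {n_words} words:")
    for number, words in enumerate(results[(n_topics, n_words)], start=1):
        print(f"  Topic {number}: {', '.join(words)}")
```

With so little text, the topics this sketch produces will be at least as hard to interpret as those in the table. The point is only to show how changing the number of topics and the number of words per topic gives you different arrangements of the same data to inspect.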

Ideally, you want to label certain topics with a title. You could say, for example, that the “no, fair, evidence” topic is generally related to “injustice.” However, few clear categories emerged from this round of topic modelling.

Instead, several themes emerge. The qualitative responses appear to address things like criminality, security and judiciary processes. You can explore this further by looking at word frequencies.

Word frequencies

You will recall from Lesson 4 that there are two different ways to determine the “root” of a word: stemming and lemmatization.

When applied to this dataset, the following lists emerge:

Top 5, Stemming:
42 inform
31 secur
29 no
27 evid
26 ye

Top 5, Lemmatize:
38 information
29 security
29 no
26 evidence
26 yes

Top 15, Stemming:
42 inform
31 secur
29 no
27 evid
26 ye
25 right
24 case
24 govern
22 offenc
21 fair
18 law
17 court
16 nation
16 need
15 crimin

Top 15, Lemmatize:
38 information
29 security
29 no
26 evidence
26 yes
25 right
24 case
24 government
22 offence
18 law
17 court
15 criminal
15 if
15 need
15 guilty
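Frequency lists like the ones above can be produced with a few lines of Python. The following is a minimal sketch, not the code Open North used. The sample responses, the stop-word list and the TOY_LEMMAS lookup are all hypothetical. The lookup simply stands in for a real lemmatizer or stemmer, which you would pass in through the normalize parameter.

```python
import re
from collections import Counter

# Hypothetical stand-ins for a real stop-word list and a real lemmatizer.
STOP_WORDS = {"the", "on", "from"}
TOY_LEMMAS = {"laws": "law", "courts": "court"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(responses, n, normalize=lambda word: word):
    """Return the n most frequent normalized terms as (count, term) pairs."""
    counts = Counter(
        normalize(word)
        for response in responses
        for word in tokenize(response)
        if word not in STOP_WORDS
    )
    return [(count, term) for term, count in counts.most_common(n)]


# Hypothetical sample responses (not the consultation data).
responses = [
    "No, the courts need more information.",
    "Yes, the law protects security.",
    "No. Laws on security hide information from the court.",
]

top = top_terms(responses, 5, normalize=lambda word: TOY_LEMMAS.get(word, word))
for count, term in top:
    print(count, term)
```

Swapping the normalize function is what produces the two kinds of list: a stemmer cuts words down to a shared stem such as “inform”, while a lemmatizer maps them to a dictionary form such as “information”.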

The themes of information and security emerge from these lists. An interesting pattern also appears in the use of the words “no” (29) and “yes” (26). Both words might be used in direct response to the prompt, which asked if the current section 38 procedures of the Canada Evidence Act properly balanced fairness with security in legal proceedings. If so, you can tentatively infer that slightly more respondents feel that a “proper balance” was not achieved than feel that it was. Treat this as a hypothesis to test, because a word count alone does not show how each word was used in a response.
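One way to check whether “yes” and “no” are direct answers is to look at the first word of each response instead of counting every occurrence. The snippet below is a rough sketch under that assumption. The sample responses are invented, and a leading “yes” or “no” is only a proxy for a respondent’s position.

```python
import re
from collections import Counter


def words(response):
    """Lowercase the response and return its alphabetic words."""
    return re.findall(r"[a-z]+", response.lower())


# Hypothetical sample responses (not the consultation data).
responses = [
    "No. The judge sees evidence the accused cannot.",
    "Yes, but there is no perfect answer.",
    "No, it is not fair.",
    "Security matters more than yes or no questions.",
]

# Raw counts: every occurrence of each word, anywhere in a response.
raw = Counter(word for response in responses for word in words(response))

# Stance tally: only the first word of each response is considered.
stance = Counter()
for response in responses:
    tokens = words(response)
    lead = tokens[0] if tokens else ""
    stance[lead if lead in ("yes", "no") else "other"] += 1

print("Raw counts:", raw["no"], "no,", raw["yes"], "yes")
print("Leading word:", dict(stance))
```

In this invented sample, “no” appears four times but only two responses open with it, which shows why raw word counts can overstate how many respondents took a given position.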

Conclusion

You have now explored the text using several content analysis tools. As initial analyses, these tools raised questions that you can explore further by using other content analysis techniques. Content analysis may require several rounds of analysis. Automating the techniques and experimenting with the tools in these lessons can make the process faster.
