Computational Methods in Interpretive Research

Computational algorithms, specifically machine learning algorithms, can effectively augment qualitative or interpretive analysis of massive corpora of texts. They can help with sensemaking through representation (e.g., machine summarization), category creation (e.g., machine classification), purposeful sampling (e.g., topic prevalence estimation), temporal mapping, and relational analysis using metadata.

Unfortunately, no good out-of-the-box solutions exist for this kind of work. So I'd like to build open source knowledge and tools - enterprise-quality data science infrastructure - for qualitative researchers who have deep knowledge of theory and phenomena. I'd like to create educational vignettes and tools for qualitative researchers as I continue to learn from doing.

An open question for qualitative researchers who are grappling with rapidly expanding volumes of textual and audiovisual data: What kinds of decisions are you comfortable allowing a machine to make? Would you want a machine to categorize your data for you? Or simply to apply your categorization method, such as it is, to an existing data set? Machine methods for systematic classification, like probabilistic topic modeling, can also be used to create information-dense representations of corpora of texts, using tables, heatmaps, and phylograms, which can be used to derive meaning from texts beyond reading them. It should be remembered, however, that they are just one tool in the researcher's toolbelt. We exercise agency when we cut a literature or corpus in a particular way. Letting machines do all the cutting means giving up some of that agency.


Check out my Topics in Management Research tool. It's a machine-categorized corpus of research abstracts from 20 prominent strategy, organization theory, organizational behavior, and information systems journals over 20 years. The categorization is done using LDA topic modeling. You can use it to analyze recent research or generate reading lists.
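
To make the idea concrete, here is a minimal sketch of how per-abstract topic proportions (the kind of output an LDA model produces) could be turned into categories and a reading list. This is not the code behind the tool: the abstract titles, topic labels, proportions, function names, and the 0.3 threshold are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical document-topic proportions, standing in for LDA output.
# Each row sums to 1; the titles, topic labels, and numbers are invented.
TOPIC_LABELS = ["technology", "strategy", "teams"]
DOC_TOPICS = {
    "Abstract A": [0.70, 0.20, 0.10],
    "Abstract B": [0.15, 0.75, 0.10],
    "Abstract C": [0.55, 0.05, 0.40],
    "Abstract D": [0.10, 0.30, 0.60],
}


def dominant_topic(proportions):
    """Return the index of the topic with the largest proportion."""
    return max(range(len(proportions)), key=lambda k: proportions[k])


def categorize(doc_topics):
    """Group document titles under the label of their dominant topic."""
    groups = defaultdict(list)
    for title, proportions in doc_topics.items():
        groups[TOPIC_LABELS[dominant_topic(proportions)]].append(title)
    return dict(groups)


def reading_list(doc_topics, label, threshold=0.3):
    """List titles whose share of `label` is at least `threshold`, highest first."""
    k = TOPIC_LABELS.index(label)
    hits = [(p[k], title) for title, p in doc_topics.items() if p[k] >= threshold]
    return [title for _, title in sorted(hits, reverse=True)]


if __name__ == "__main__":
    print(categorize(DOC_TOPICS))
    print(reading_list(DOC_TOPICS, "technology"))
```

Note that assigning each abstract to a single dominant topic throws away its mixed membership ("Abstract C" is also substantially about teams in this toy data). The threshold-based reading list keeps more of that nuance, and choosing the threshold is one of the cuts that stays with the researcher.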


Also check out my LDA-assisted analysis of how technology has been featured in recent management research. This is a code notebook with an accompanying dataset that will allow you to replicate my analysis to some extent.

I'm also interested in developing a PhD course around this subject. I want to create something that is part practical "performance-based learning" methods course, part defence against the dark arts with a more agreeable Snape, part machine/human cognition seminar. My thanks to the brilliant undergraduates who took the McGill - MGPO434 Emerging Technologies: Organizing and Societal Stakes elective taught by Samer in Winter 2022 for inspiring me.


Have you read the paper Paul DiMaggio wrote with David Blei and Manish Nag that uses LDA to analyze newspaper coverage of government arts funding? You should. I loved it, and if you're into what this page is about, I think you will too. I also think there's something cool about that combination of complementary expertise: sage and tinker. That's right, I compared David Blei to a techno-gypsy. I went there.