Code Library

Open-source data science tools for students and scholars

Everything here runs on free or open-source software, on a reasonably modern laptop or desktop. I'll try to illustrate my code with interesting real data; you can choose to read or code along.

Please install R and RStudio on your machine, in that order. Welcome to programming: we'll be doing a lot of copy + paste.

Recommended guides to complement the content on this page

Requirements: An internet connection and a Twitter developer account with an approved Academic Track project.


Requirements: A tabular text data set, with one row per document, and all the text in one column.

Applications: Analyzing and representing changes in texts over time


Requirements: A tabular text data set, with one row per document, and all the text in one column (Analysis dataset provided in-page)

Applications: Inductively identifying key themes; identifying "fingerprints" of authors or publications in a large longitudinal text dataset; measuring and representing similarity, dissimilarity, and change over time (a good starting point for literature reviews).
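
If you're not sure what "a tabular text data set, with one row per document, and all the text in one column" looks like in practice, here is a minimal sketch in base R. The column names (doc_id, year, text) and the example texts are made up for illustration; your own data can use different names, as long as each row is one document and one column holds the full text.

```r
# Hypothetical example data: one row per document, all the text in one column.
# The column names and contents are placeholders, not a required schema.
docs <- data.frame(
  doc_id = c("doc1", "doc2", "doc3"),
  year   = c(2019, 2020, 2021),
  text   = c(
    "First document text goes here.",
    "Second document text goes here.",
    "Third document text goes here."
  ),
  stringsAsFactors = FALSE  # keep text as character strings, not factors
)

# Quick checks that the table has the expected shape.
stopifnot(is.character(docs$text))            # text column holds plain strings
stopifnot(anyDuplicated(docs$doc_id) == 0)    # each document appears only once

# Inspect the structure: 3 rows (documents) and 3 columns.
str(docs)
```

If your documents live in a spreadsheet, exporting it to CSV and loading it with read.csv() should give you a data frame of the same shape.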


To cite this page for code or research methodology, please use:

Bhardwaj, A. (2023) Code Library: Open-source data science tools for students and scholars. https://www.abhardwaj.net/code



Coming soon...

Using an LDA Topic Model to sample from a large document dataset

Comparing the performance of LDA and BERT topic models

Creating a co-author network from data exported from Web of Science/SCOPUS

Creating a co-citation network from data exported from Web of Science/SCOPUS