At TCPD, I worked at the intersection of technology and political science, designing tools to collect, clean, and standardize data so researchers could ask new questions about Indian political life.

Souvenirs from the field

About TCPD

The Trivedi Centre for Political Data (TCPD) at Ashoka University was a research centre dedicated to building high-quality datasets on Indian political life. Raw election and legislative data published by the Indian government is fragmented, inconsistent, and often locked in PDFs or poorly structured files, making it very difficult to analyze directly. TCPD was founded to bridge this gap.

My work focused on building reliable data pipelines and infrastructure to make this information usable for research and the public.

TCPD was dissolved in 2023.
The data is now hosted under the Centre for Data Science and Analytics, Ashoka University.


My Role

I started as an intern in December 2021, assisting with data wrangling.
By June 2022, I was leading multiple projects as the centre’s primary data and software engineer.


Key Contributions

My work spanned election datasets, social media research, and collaborative projects, and involved building pipelines, analyses, and research tools.


1. Lok Dhaba

Lok Dhaba was TCPD’s flagship dataset: a repository of Indian election results, widely used by researchers, journalists, and policy analysts. In addition to outcomes, it included candidate-level details such as education, profession, and unique IDs to track political careers over time.

Political Career Tracker, built on top of Lok Dhaba

My contributions included:

  • Maintaining the dataset’s pipelines and website, ensuring timely integration of new election data.
  • Rewriting the pipeline architecture to improve reliability and reduce errors during scraping.
  • Designing novel data-quality checks that improved the accuracy of career trajectories across elections.
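
To illustrate the kind of data-quality check described above, here is a minimal sketch rather than the actual TCPD implementation. It assumes each candidate row carries a unique person ID and flags IDs whose rows look like they mix different people: either a supposedly stable attribute changes between elections, or the career spans an implausible number of years. The field names (`pid`, `year`, `sex`), the sample rows, and the 50-year threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical candidate rows; field names are illustrative, not Lok Dhaba's schema.
rows = [
    {"pid": "P001", "year": 2012, "sex": "M"},
    {"pid": "P001", "year": 2017, "sex": "M"},
    {"pid": "P002", "year": 2012, "sex": "F"},
    {"pid": "P002", "year": 2017, "sex": "M"},  # attribute conflict
    {"pid": "P003", "year": 1957, "sex": "M"},
    {"pid": "P003", "year": 2022, "sex": "M"},  # implausibly long career
]


def find_suspect_ids(rows, stable_field="sex", max_span_years=50):
    """Return {pid: [reasons]} for IDs that may have been assigned to more than one person."""
    by_pid = defaultdict(list)
    for row in rows:
        by_pid[row["pid"]].append(row)

    suspects = {}
    for pid, group in by_pid.items():
        reasons = []
        # A stable attribute should take exactly one value per person.
        values = {r[stable_field] for r in group}
        if len(values) > 1:
            reasons.append(f"conflicting {stable_field}: {sorted(values)}")
        # A very long gap between first and last contest suggests a wrongly merged ID.
        years = [r["year"] for r in group]
        span = max(years) - min(years)
        if span > max_span_years:
            reasons.append(f"career span of {span} years")
        if reasons:
            suspects[pid] = reasons
    return suspects


suspects = find_suspect_ids(rows)
for pid, reasons in sorted(suspects.items()):
    print(pid, reasons)
```

Flagged IDs are candidates for manual review, not automatic corrections: a long career or a changed attribute can be genuine, so the check only narrows down where to look.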


2. Social Media Project (Meta, Twitter)

This project tracked political advertising on Meta and Twitter across 8–10 state elections, offering one of the first systematic looks at India’s digital campaigning ecosystem.

Gujarat 2022 elections: Meta ad expenditure by party
Gujarat 2022 elections: Meta ad expenses by page type
Punjab 2022 elections: Twitter activity
Gujarat 2022 elections: Twitter presence

My contributions and responsibilities included:

  • Scraping political ads through Meta’s API and building structured datasets of advertiser activity.
  • Classifying advertiser pages (party, candidate, satire, etc.) and linking them to specific candidates and constituencies.
  • Extracting metadata on spending, content, and media formats (e.g., text, images, video).
  • Designing annotation protocols for ad content (themes, presence of national leaders, instances of hate speech).
  • Producing visual analyses showing trends such as:
    • BJP’s initially dominant ad spending, followed by declines in later cycles.
    • Different campaign strategies (party-driven vs. candidate-driven).
    • Use of satire and offshore-funded ads.
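
As a rough illustration of the spending analysis, the sketch below totals ad spend by page type. It assumes each ad record has already been classified and carries its spend as a lower and upper bound rather than an exact figure, so the totals are kept as ranges too. The record layout, field names, and numbers are hypothetical sample data, not the project's dataset or schema.

```python
from collections import defaultdict

# Hypothetical, already-classified ad records with spend given as a range.
ads = [
    {"page_type": "party", "spend": {"lower_bound": "1000", "upper_bound": "1999"}},
    {"page_type": "party", "spend": {"lower_bound": "0", "upper_bound": "99"}},
    {"page_type": "candidate", "spend": {"lower_bound": "500", "upper_bound": "999"}},
    {"page_type": "satire", "spend": {"lower_bound": "100", "upper_bound": "499"}},
]


def spend_bounds_by_type(ads):
    """Sum the lower and upper spend bounds separately for each page type."""
    totals = defaultdict(lambda: [0, 0])
    for ad in ads:
        low = int(ad["spend"]["lower_bound"])
        high = int(ad["spend"]["upper_bound"])
        totals[ad["page_type"]][0] += low
        totals[ad["page_type"]][1] += high
    return {page_type: tuple(bounds) for page_type, bounds in totals.items()}


bounds = spend_bounds_by_type(ads)
for page_type, (low, high) in sorted(bounds.items()):
    print(f"{page_type}: between {low} and {high}")
```

Keeping both bounds avoids implying more precision than range-based spend data supports; a midpoint can be derived later if a single number is needed for a chart.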

Findings were presented at the Social Media and Society conference (University of Michigan, Apr ’22) and published in the Hindustan Times (Mar ’22, Dec ’22).

These analyses informed understanding of digital campaigning strategies in India.


3. Digital Society Project (DSP)

TCPD collaborated with DSP to study the social media presence of Indian politicians, focusing on Twitter activity during election cycles. The data is published here.

My contributions included:

  • Leading TCPD’s role in the collaboration, serving as point of contact with DSP researchers.
  • Recruiting and managing a team of annotators to label Twitter accounts and activity.
  • Merging annotated data with Lok Dhaba’s electoral datasets, transforming it to conform to DSP’s schema.
  • Delivering cleaned datasets in multiple phases aligned with election timelines, while documenting data limitations and quality issues.
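
The merge step can be sketched as a keyed join that also reports what failed to match, which feeds directly into documenting data limitations. This is a simplified illustration under assumed names: the join key (`pid`), the input fields, and the output field names are hypothetical and do not reproduce Lok Dhaba's or DSP's actual schemas.

```python
# Hypothetical electoral rows and annotator output, joined on a shared person ID.
candidates = [
    {"pid": "P001", "constituency": "Constituency A"},
    {"pid": "P002", "constituency": "Constituency B"},
    {"pid": "P003", "constituency": "Constituency C"},
]
annotations = [
    {"pid": "P001", "handle": "@example_one", "label": "candidate"},
    {"pid": "P003", "handle": "@example_three", "label": "candidate"},
]


def merge_annotations(candidates, annotations, key="pid"):
    """Join annotations onto candidate rows; return (merged rows, unmatched IDs)."""
    by_key = {a[key]: a for a in annotations}
    merged, unmatched = [], []
    for cand in candidates:
        ann = by_key.get(cand[key])
        if ann is None:
            # Keep track of candidates with no annotated account.
            unmatched.append(cand[key])
            continue
        # Rename fields into the (hypothetical) target schema.
        merged.append({
            "candidate_id": cand[key],
            "constituency": cand["constituency"],
            "twitter_handle": ann["handle"],
            "account_type": ann["label"],
        })
    return merged, unmatched


merged, unmatched = merge_annotations(candidates, annotations)
print(len(merged), "merged;", "unmatched:", unmatched)
```

Returning the unmatched IDs alongside the merged rows makes coverage gaps explicit in each delivery instead of silently dropping candidates.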

While the Social Media Project initially focused on Meta ads, the DSP collaboration allowed us to extend these analyses to Twitter, providing a comprehensive view of politicians’ online presence.


Publications

News Articles

  1. Hindustan Times / Dec 12, 2022: “Number Theory: How Parties used Social Media in Gujarat Elections”
  2. Hindustan Times / Mar 22, 2022: “Punjab Election: How Candidates and Parties used Social Media”

Dataset Contributions

  1. “TCPD Indian Elections dataset (TCPD-IED), 1951–1962”. Trivedi Centre for Political Data, Ashoka University.
  2. “TCPD Rajya Sabha dataset (TCPD-RSD), 1952–2022”. Trivedi Centre for Political Data, Ashoka University.
  3. “TCPD Judiciary dataset (TCPD-IJD), 1950–2021”. Trivedi Centre for Political Data, Ashoka University.
  4. “TCPD and DSP Indian Politicians’ Social Media (TCPD-DSP IPSM) Dataset”. Digital Society Project.