TN affidavit downloads - Buzz or bot?

Curiosity by the educated or an outlier?

Data from Election Commission

The Election Commission of India publishes the list of candidates for each constituency, along with their affidavits detailing their assets, income for the past 5 years, and pending cases.

The link has been communicated by the New Media so that the public can visit the website, download the scanned copy of an affidavit, and learn about their preferred candidates. However, because downloading the PDFs is cumbersome and viewing them demands a laptop or a large screen, the information in the affidavits is usually decoded by the New Media or NGOs.

Article by Puthiya Thalaimurai

On 1-April-2021, the news channel "Puthiya Thalaimurai" published an article reporting that the affidavit of candidate Padma Priya from Makkal Needhi Maiam (MNM) had been downloaded 300,000+ times. Taking that article as a reference, let us view the downloads for the leaders of the major parties in Tamil Nadu.


Questions to understand:

This article raises two questions:

  • Do these downloads indicate interest among the educated public in independently learning about their preferred candidates? or
  • Were these downloads done by bots or paid people to create a "News Buzz"?
Either way, let us dig into this data point and explore the BUZZ!

Important dates:

  • Voting day: 6-April-2021
  • Vote counting: 2-May-2021


Top Downloads by Candidates (above 2k downloads)

Comparing the number of downloads on 1-April-21 and on 6-April-21, the increase has been massive for at least the two candidates below.

  • Seeman - a massive 730 times increase from 10,000+ to 7,300,000+
  • Kamalhaasan - another abrupt 25 times increase from 15,000+ to 380,000+
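
The multiples quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the counts are the rounded-down figures from the list ("10,000+" and so on), and the function name `fold_increase` is made up for illustration.

```python
# Rounded-down download counts quoted in the list above: (1-April-21, 6-April-21).
counts = {
    "Seeman": (10_000, 7_300_000),
    "Kamalhaasan": (15_000, 380_000),
}


def fold_increase(before: int, after: int) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


for name, (before, after) in counts.items():
    print(f"{name}: {fold_increase(before, after):.1f}x")
```

Because the published counts are lower bounds, the resulting multiples are approximate.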

Does the increase reflect natural curiosity among the educated public, or is it a gamified outlier?


Definition for Buzz or Bot

The download counts are cumulative counters that only ever increase, so extreme values, and therefore high variability, are possible.

Here we want to:

  • Bots: Isolate the extreme download counts, identified by comparing each count against all download counts at the "State level", and show them separately.
  • Buzz: After excluding the State level extreme values, apply the same logic as for Bots to isolate and show the extreme values at the "Constituency level".
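
The two passes described above could be sketched roughly as follows. This is an illustration, not the code behind this article: the record layout `(constituency, candidate, downloads)`, the function names, and the sample numbers are all hypothetical, and for brevity the only extreme-value rule used is a box-plot upper limit (Q3 + 1.5 × IQR).

```python
import statistics
from collections import defaultdict


def upper_fence(values):
    """Box-plot upper limit: Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def split_bots_and_buzz(records):
    """records: list of (constituency, candidate, downloads) tuples."""
    # Pass 1 (Bots): compare every count against all counts in the state.
    state_fence = upper_fence([d for _, _, d in records])
    bots = [r for r in records if r[2] > state_fence]
    rest = [r for r in records if r[2] <= state_fence]

    # Pass 2 (Buzz): repeat the same rule within each constituency,
    # using only the records that survived pass 1.
    by_constituency = defaultdict(list)
    for r in rest:
        by_constituency[r[0]].append(r)
    buzz = []
    for rows in by_constituency.values():
        if len(rows) < 4:  # assumption: too few candidates for quartiles
            continue
        fence = upper_fence([d for _, _, d in rows])
        buzz.extend(r for r in rows if r[2] > fence)
    return bots, buzz


# Hypothetical sample: three constituencies with six candidates each.
records = (
    [("A", f"A{i}", d) for i, d in enumerate([100, 120, 130, 150, 160, 2000], 1)]
    + [("B", f"B{i}", d) for i, d in enumerate([90, 110, 140, 150, 170, 900000], 1)]
    + [("C", f"C{i}", d) for i, d in enumerate([3000, 3500, 4000, 4200, 5000, 5500], 1)]
)
bots, buzz = split_bots_and_buzz(records)
print("Bots:", bots)
print("Buzz:", buzz)
```

In this made-up sample, the count of 900,000 stands out against the whole state, while the count of 2,000 looks ordinary at the State level and only stands out once it is compared within its own constituency.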

Method

To identify extreme values:

  • The download count should be above the upper limit (upper whisker) of a box plot, and
  • The cumulative distribution function (CDF) value of the download count should be above 0.95, indicating that the value is a statistically significant outlier.
Note: Other methods, such as the Hampel filter, Grubbs's test, Dixon's test, and Rosner's test, did not help to isolate the outliers; they flagged almost all of the values that the box plot had flagged. A QQ plot was helpful, but too complex for this article.
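
A minimal sketch of how the two criteria could be combined is shown below. The article does not state which distribution the CDF refers to, so this sketch assumes an empirical CDF (the fraction of observations less than or equal to a value); the function names and the sample counts are likewise hypothetical.

```python
import statistics


def upper_fence(values):
    """Upper limit of a box plot: Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def empirical_cdf(values, x):
    """Assumed CDF: fraction of observations less than or equal to x."""
    return sum(v <= x for v in values) / len(values)


def extreme_values(values, cdf_threshold=0.95):
    """Keep values that pass BOTH criteria: above the fence and CDF > threshold."""
    fence = upper_fence(values)
    return [
        v for v in values
        if v > fence and empirical_cdf(values, v) > cdf_threshold
    ]


# Hypothetical download counts: 19 ordinary values and 3 large ones.
sample = [
    120, 150, 180, 200, 210, 230, 250, 260, 300, 320,
    340, 360, 400, 420, 450, 480, 500, 520, 550,
    9000, 15000, 380000,
]
print("Upper fence:", upper_fence(sample))
print("Extreme values:", extreme_values(sample))
```

In this sample, 9,000 lies above the box-plot limit but falls below the 0.95 cut-off, so only the two largest counts pass both checks. That is the narrowing effect of the second criterion.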


Bots (State level outliers)

One large "Box plot" to view the extreme values.

Stats from the box plot, showing the lower and upper limits.



Alternate view, to look at the extreme values by District.



Final list of State level extreme values (Bots): the download counts that were above the upper whisker of the box plot and also met the 0.95 threshold for statistically significant outliers.


Buzz (Constituency level outliers)

To be added (in progress).