Studying and Auditing Social Media Algorithms

Dr. Martin Degeling
07.04.2026

Slides: slides.degeling.com/tiktok-research-methods

Overview

  • Part 1: Platform design, the TikTok algorithm, and research methods
  • Part 2: Hands-on mini study on TikTok

About me

  • Senior Researcher with a background in computer science.
  • Studied profiling, data protection, recommender systems (Ruhr University, Carnegie Mellon University)
  • Since 2022 working with civil society organizations, currently at auditing algorithmic systems
  • tiktok-audit.com: Everything I've learned about TikTok.

About You

  • Who are you? (Name, Background)
  • What social media platforms do you use regularly?
  • Which social media effects do you think are worth studying?
1 minute each

Why Independent Social Media Analysis is Warranted

Platform Influence

  • Billions of people use social media daily as a primary source for news and discussion
  • Platforms shape what information reaches whom — and in what order
  • Technology is not neutral: algorithmic choices have societal consequences

Systemic Risks

  • Disinformation: false narratives often spread faster than corrections
  • Hate speech: amplification of extremist content at scale; toxic comment sections limit who can speak
  • Mental health: risks especially for young users
  • Election interference: political content ranking, foreign influence

The Transparency Problem

  • Platforms rarely release data openly
  • Data access for researchers is limited, the data is often inaccurate, and platforms act as gatekeepers
  • Researchers must develop independent methods to study these systems

Platform Design: Recommendation Models

Three Models of Content Delivery

Subscriptions          Network                        Algorithm
e.g. Podcasts, RSS     e.g. the "old" Facebook, IG    e.g. TikTok FYP, IG Reels, Twitter
active selection       selection of others            weights (based on implicit feedback)

Graphic: Arvind Narayanan: Understanding Social Media Recommendation Algorithms, 3.9.2023

The Shift to Algorithmic Feeds

  • Subscription and network models give users explicit control over content sources
  • Algorithmic feeds use implicit signals (watch time, engagement) to decide what to show
  • This creates a black box: users don't know why they see what they see
  • And researchers face the same opacity

TikTok - What we know

Scale and Impact

  • 1 billion+ monthly active users globally.
  • 19.3 million in Malaysia.
  • Average watch time per day is more than one hour
  • Major platform for youth, politics, news, and commerce

What makes TikTok different?

  • Algorithm-driven For You Page (FYP) — no subscription required to see content
  • Low barrier to reach a large audience: anyone can (in theory) go viral
  • Content is served in a constant vertical stream
  • Other platforms (IG Reels, YouTube Shorts, Twitter) have copied this model

A Recommendation Algorithm (Simplified)

At TikTok's Scale

  • 34 billion videos uploaded to TikTok daily
  • The average watch time per video: 18 seconds
  • The recommender selects ~190 out of 34,000,000,000 videos for each user session
  • This is done in real-time, per user
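
The figures above can be cross-checked with simple arithmetic. A minimal sketch, assuming exactly one hour of daily watch time (the slides only say "more than one hour") and taking the upload and selection figures as stated on this slide; the variable names are illustrative.

```python
# Back-of-the-envelope check using the figures from the slides.
DAILY_WATCH_SECONDS = 60 * 60            # assumption: exactly one hour per day
AVG_SECONDS_PER_VIDEO = 18               # average watch time per video (slide)
VIDEOS_UPLOADED_DAILY = 34_000_000_000   # upload figure as stated on the slide
VIDEOS_SELECTED = 190                    # videos selected per user (slide)

# How many videos fit into one hour at 18 seconds each?
videos_per_day = DAILY_WATCH_SECONDS / AVG_SECONDS_PER_VIDEO

# What fraction of the daily uploads does a single user get to see?
selection_fraction = VIDEOS_SELECTED / VIDEOS_UPLOADED_DAILY

print(f"Videos watched per day at 18 s each: {videos_per_day:.0f}")
print(f"Fraction of daily uploads shown: {selection_fraction:.2e}")
```

One hour at 18 seconds per video gives about 200 videos, which is in the same range as the ~190 quoted above.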

Official Documentation

Parameters TikTok states publicly:

  • User interactions: likes, shares, comments, follows
  • Video information: captions, sounds, hashtags
  • Device and account settings: language, country, device type (less important)

Internal Memo ("Algo 101")

  • Optimizes for usage time (how long you are on the platform) and user retention (how often you return)
  • Each video is assigned a value score per user based on predicted interaction probabilities
  • Key finding: calculations rely mostly on metadata (who posted, existing likes), not on actual content
  • View time is measured to the millisecond
Ben Smith, NYTimes: How TikTok Reads Your Mind, 6.12.2021 — Translated Memo
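
The "value score per user" idea can be illustrated as a weighted sum of predicted interactions, which is the general shape of the simplified formula reported in the NYTimes article. This is only a sketch: the weights, the example predictions, and all names (`Prediction`, `WEIGHTS`, `value_score`, `rank`) are hypothetical, and the real values are not public.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Model outputs for one (user, video) pair. All values are made up."""
    p_like: float      # predicted probability of a like
    p_comment: float   # predicted probability of a comment
    p_play: float      # predicted probability that the video is played
    e_playtime: float  # expected play time in seconds


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"like": 1.0, "comment": 2.0, "play": 0.5, "playtime": 0.1}


def value_score(p: Prediction) -> float:
    """Weighted sum of predicted interactions for one candidate video."""
    return (
        p.p_like * WEIGHTS["like"]
        + p.p_comment * WEIGHTS["comment"]
        + p.p_play * WEIGHTS["play"]
        + p.e_playtime * WEIGHTS["playtime"]
    )


def rank(candidates: dict[str, Prediction]) -> list[str]:
    """Return video ids sorted from highest to lowest value score."""
    return sorted(candidates, key=lambda vid: value_score(candidates[vid]), reverse=True)


candidates = {
    "video_a": Prediction(p_like=0.10, p_comment=0.01, p_play=0.9, e_playtime=12.0),
    "video_b": Prediction(p_like=0.30, p_comment=0.05, p_play=0.6, e_playtime=5.0),
}
print(rank(candidates))
```

With these invented weights, the video with the longer expected play time ranks first even though it is less likely to be liked, which shows how watch time can dominate explicit feedback.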

"Anyone Can Go Viral" — Not Quite

  • No video goes viral without review (SMW, 14.03.23)
  • TikTok employees can manually "heat" certain videos, boosting them
  • Consistent platform presence is hard to maintain
"The total video views of heated videos accounts for a large portion of the daily total video views, around 1-2%, which can have a significant impact on overall core metrics."
Emily Baker-White: TikTok's Secret 'Heating' Button Can Make Anyone Go Viral, Forbes, 20.01.2023

Examples of Studies on TikTok

(* that I was involved in)

Advertising on TikTok

  • How many ads do users actually see?
  • Only measurable by simulating user behavior
  • Result: ~20% of For You Feed videos are paid ads
  • For some interest categories: up to 1 in 3 videos
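
An ad share like the one above is an estimate from a sample of feed videos, so it helps to report an uncertainty range. A minimal sketch, assuming each collected video is stored as a record with a hypothetical `is_ad` flag; the sample data is invented and the Wilson score interval is one common choice, not necessarily the method used in the study.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        raise ValueError("need at least one observation")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Invented sample: 200 feed videos, every fifth one flagged as an ad.
feed = [{"video_id": str(i), "is_ad": i % 5 == 0} for i in range(200)]

ads = sum(item["is_ad"] for item in feed)
share = ads / len(feed)
low, high = wilson_interval(ads, len(feed))

print(f"Ad share: {share:.1%} (95% interval {low:.1%} to {high:.1%})")
```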

Depression Rabbit Holes

  • Although new accounts always see funny videos and cookie recipes at first, you can end up in a depression rabbit hole within 20 minutes.
  • Methodology: manual tests with sock puppet accounts that interact with depression-related content

From FYP to WW3

  • Compared FYP vs. Search results.
  • FYPs show more military content, weapons, and war speculation; searches show content on NATO perspectives, ongoing conflicts, and news.
  • Methodology: manual and automated tests
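
A comparison like this boils down to labelling videos from both sources and comparing category shares. A minimal sketch, assuming the videos were already labelled by hand; the category names, the sample labels, and the function names are invented for illustration.

```python
from collections import Counter


def category_shares(labels: list[str]) -> dict[str, float]:
    """Share of each category among the labelled videos."""
    counts = Counter(labels)
    total = len(labels)
    return {category: n / total for category, n in counts.items()}


def share_differences(fyp: list[str], search: list[str]) -> dict[str, float]:
    """FYP share minus search share per category (positive = more on the FYP)."""
    fyp_shares = category_shares(fyp)
    search_shares = category_shares(search)
    categories = sorted(set(fyp_shares) | set(search_shares))
    return {c: fyp_shares.get(c, 0.0) - search_shares.get(c, 0.0) for c in categories}


# Invented labels for ten videos from each source.
fyp_labels = ["military"] * 6 + ["news"] * 2 + ["speculation"] * 2
search_labels = ["news"] * 7 + ["military"] * 2 + ["nato"] * 1

diffs = share_differences(fyp_labels, search_labels)
for category, diff in diffs.items():
    print(f"{category:12s} {diff:+.0%}")
```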

Audit Types for Platform Research

A. Meßmer & M. Degeling: "Auditing Recommender Systems" (2023)

Document Audit

Analyze publicly available platform documentation

  • Terms of service, privacy policies, newsroom posts
  • Leaked internal documents
  • Transparency reports

Goal: Understand the platform's stated processes and motivations

Limits: Tells you what the platform claims, not what it actually does

Automated Audit

What: Simulate user behavior with automated scripts (aka scraping)

  • Create sock puppet accounts with controlled behavior
  • Measure how the algorithm responds to specific interactions

Manual or Crowd-Sourced Audit

What: Gather data from real users (via data donations)

  • Example: Washington Post, data from 800 users
  • Understand what users actually see
  • Challenges: recruitment, sample size, privacy
Caitlin Gilbert, Richard Sima and Clara Ence Morse (2025), First report, Second report, Third Report

Architecture Audit

What: Examine how different software components create the platform experience

  • Cool Down and Gravedigging mechanisms
  • Multi-step content moderation

Quick Intro: Study Design

How to Guide Your Research

Create concrete scenarios from abstract risks:

  1. Define the systemic risk you want to study
  2. Identify measurable indicators
  3. Choose the right audit type (or combination)
  4. Plan for null results and counterevidence
A. Meßmer & M. Degeling: "Auditing Recommender Systems" (2023)

Practical Considerations

  • Fresh accounts: use a research phone, register new email accounts, e.g. ProtonMail
  • Control conditions: Can you confirm that what you found does not happen to everyone?
  • Ethical concerns: to understand the algorithm, we only search and swipe; no liking, commenting, etc.
A. Meßmer & M. Degeling: "Auditing Recommender Systems" (2023)
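
The control-condition question can be made concrete by comparing how often a content type appears for a test account and for a control account. A minimal sketch using a two-proportion z-test from the standard library; the counts are invented, and the test assumes independent observations, which feed data only approximates.

```python
import math


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data, test is undefined")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Invented counts: topic videos among 150 feed videos per account.
z, p_value = two_proportion_z(hits_a=45, n_a=150, hits_b=12, n_b=150)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

A small p-value suggests the difference between the accounts is unlikely to be chance alone; it does not rule out platform interference such as A/B tests.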

What Is Good Evidence?

  • Replicability: can others reproduce your findings with the same method?
  • Representativeness: Are your findings specific to one group, or do they generalize?
  • Causal vs. correlational: Is the connection you observed causal or incidental?
  • Platform interference: bot detection, A/B tests, and geo-variation can confound results

Hands-On Research

Data Collection & Analysis

Data Collection

  • Zeeschuimer: Laptop+Firefox+Plugin; works for TikTok and others
  • TikTok only: Use TikTok's data export tool

Data Analysis

  • 4CAT: Analysis platform designed to be used with Zeeschuimer
  • Chatbot: Use Claude or something similar to explore the data

Zeeschuimer - Content Collection

  • Browser extension that captures content from social media platforms
  • Supports TikTok, Instagram, Twitter/X, YouTube, and more
  • Saves structured files of what you browse
  • Published by the Digital Methods Initiative (digitalmethodsinitiative)
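
Zeeschuimer exports its captures as NDJSON, one JSON object per line. A minimal sketch for a first look at such a file, assuming nothing about the fields beyond that format; the sample records and their keys (`item_id`, `source_platform`, `data`) are invented stand-ins, so inspect your own export before relying on any field name.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def load_ndjson(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line."""
    items = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                items.append(json.loads(line))
    return items


def key_overview(items: list[dict]) -> Counter:
    """Count how often each top-level key occurs, to discover the structure."""
    return Counter(key for item in items for key in item)


# Invented sample records, written to a temporary file for the demo.
sample_lines = [
    json.dumps({"item_id": "1", "source_platform": "tiktok.com", "data": {"desc": "first video"}}),
    json.dumps({"item_id": "2", "source_platform": "tiktok.com", "data": {"desc": "second video"}}),
]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.ndjson"
    sample_path.write_text("\n".join(sample_lines) + "\n", encoding="utf-8")
    items = load_ndjson(sample_path)

print(len(items), "items")
print(key_overview(items).most_common())
```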

Data Export

4CAT — Analysis

  • Web-based tool for capturing and analyzing social media data
  • Works with Zeeschuimer output
  • Example analyses for TikTok data: search/hashtag counting, network analysis, topic modeling
  • No coding required for common analyses; reproducible methods
Test instance: https://4cat.tiktok-audit.com user: aod, pw: aodtest
4CAT on GitHub
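
For orientation, this is roughly what a hashtag count does under the hood. The sketch below is not 4CAT's implementation; it is a plain regular-expression count over invented captions, and `count_hashtags` is a hypothetical helper name.

```python
import re
from collections import Counter

# A hashtag is a '#' followed by word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#(\w+)")


def count_hashtags(captions: list[str]) -> Counter:
    """Count hashtags across captions, ignoring case."""
    counts: Counter = Counter()
    for caption in captions:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(caption))
    return counts


captions = [
    "Election night #politics #news",
    "#Politics explained",
    "no tags here",
]
hashtag_counts = count_hashtags(captions)
print(hashtag_counts.most_common())
```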

ChatBot data exploration

  • Upload the data
  • Works with any structured data
  • No coding required
  • Good for exploring the data, but be cautious if you don't understand what Claude did.

1. Test a bubble

Create a new account and explore how fast you can get into one bubble

  • Register a new TikTok account in a new private browser tab
  • Focus on a topic that shows up within the first 8 videos
  • Data Approach: Use TikTok Data Export and explore the data with a chatbot
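
"How fast" needs an operational definition. One option, sketched below, is the first position in the feed at which a topic makes up a set share of the most recent videos. The window size, the threshold, the function name, and the labels are assumptions for illustration; the labels would come from your own coding of the videos, since the export itself does not contain topics.

```python
from typing import Optional


def first_bubble_position(
    labels: list[str], topic: str, window: int = 10, threshold: float = 0.5
) -> Optional[int]:
    """1-based position where `topic` first reaches `threshold` of the last `window` videos."""
    for end in range(window, len(labels) + 1):
        recent = labels[end - window:end]
        if recent.count(topic) / window >= threshold:
            return end
    return None  # the topic never dominated a full window


# Invented, hand-coded labels for the first 16 videos of a new account.
labels = ["other"] * 8 + [
    "fitness", "other", "fitness", "fitness",
    "other", "fitness", "fitness", "fitness",
]
position = first_bubble_position(labels, "fitness", window=5, threshold=0.6)
print(position)
```

Report the window and threshold together with the result, because different choices give different answers.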

2. Topic Exploration

What videos does TikTok recommend via search on a political topic?

  • Decide on 3 search terms for a current hot topic
  • Use your existing TikTok account in Firefox with Zeeschuimer installed, collect 50 videos per term
  • Data approach: Collect and analyse data with Zeeschuimer and 4CAT.
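
One simple first analysis is to check how much the result sets of the three search terms overlap. A minimal sketch using Jaccard similarity on video ids; the search terms and ids are invented, and how you extract ids from your collected data depends on the export.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(results: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every pair of search terms."""
    return {
        (t1, t2): jaccard(results[t1], results[t2])
        for t1, t2 in combinations(sorted(results), 2)
    }


# Invented video ids per search term.
results = {
    "term a": {"v1", "v2", "v3"},
    "term b": {"v2", "v3", "v4"},
    "term c": {"v5"},
}
overlap = pairwise_overlap(results)
for pair, score in overlap.items():
    print(pair, f"{score:.2f}")
```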

What to Watch For

  • Search results and the FYP are different data sources
  • Scraping is a snapshot: ordering and availability change over time
  • Platform ToS: be aware of what is and isn't permitted for your use case
  • Document your methodology carefully for reproducibility

Thanks!