Tweasel open data Datasette instance

Tweasel is a project building infrastructure for detecting and complaining about tracking and privacy violations in mobile apps on Android and iOS. Among other things, we are developing a suite of tools and libraries for automated app analysis and tracking detection, and maintaining a wiki of HTTP endpoints used by tracking companies. For a full overview of what we’re doing, have a look at our documentation.

For our work, we regularly run large-scale traffic analyses on mobile apps. We use this data, for example, to maintain the tracking endpoint adapters of our TrackHAR library. Our goal is to shine a light on how trackers work and what they collect, so we want as many people as possible researching them. In addition, we want to document how and why we concluded what certain values transmitted to a tracking endpoint mean, in a way that others can replicate.

We are therefore publishing our datasets as open data for other researchers, activists, and anyone else interested in understanding the inner workings of trackers. We hope this lowers the barrier to entry for people who want to start investigating trackers themselves.

Database: data (2 tables: requests, datasets)

Datasets

Currently, requests from the following datasets are available, of which the first three were collected as part of student research projects at the Institute for Application Security at TU Braunschweig (more details in the datasets table):

Do they track? Automated analysis of Android apps for privacy violations (data from January 2021, view requests)
iOS watching you: Automated analysis of “zero-touch” privacy violations under iOS (data from June to July 2021, view requests)
Informed Consent? A Study of “Consent Dialogs” on Android and iOS (data from March to April 2022, view requests)
Worrying confessions: A look at data safety labels on Android (data from September 2022, view requests)
Traffic collection for TrackHAR adapter work (July 2023) (data from July 2023, view requests)
Traffic collection for TrackHAR adapter work (April 2024) (data from April 2024, view requests)

Note: We have decided to only publish requests to endpoints that are contacted by apps from at least two different vendors, using Apple’s definition for determining the vendor from the app ID. As such, our data is not suited for reverse-engineering internal app APIs.
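The publication filter described in the note can be illustrated with a minimal sketch. It is not the project's actual implementation. It assumes that the vendor is everything in a reverse-DNS app ID except the last component (the rule Apple documents for deriving a vendor from a bundle ID), and that an endpoint can be represented as a plain string. The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def vendor_from_app_id(app_id: str) -> str:
    """Derive a vendor key from a reverse-DNS app ID.

    Assumption: all components except the last identify the vendor;
    an ID with a single component is used whole.
    """
    head, sep, _tail = app_id.rpartition(".")
    return head if sep else app_id


def shared_endpoints(observations, min_vendors=2):
    """Return endpoints contacted by apps from at least `min_vendors` vendors.

    `observations` is an iterable of (app_id, endpoint) pairs.
    """
    vendors_by_endpoint = defaultdict(set)
    for app_id, endpoint in observations:
        vendors_by_endpoint[endpoint].add(vendor_from_app_id(app_id))
    return {
        endpoint
        for endpoint, vendors in vendors_by_endpoint.items()
        if len(vendors) >= min_vendors
    }


# Made-up sample data: two apps by one vendor, one app by another.
observations = [
    ("com.example.game", "tracker.example/v1/events"),
    ("com.example.chat", "tracker.example/v1/events"),
    ("org.other.news", "tracker.example/v1/events"),
    ("com.example.game", "api.example.com/internal"),
    ("com.example.chat", "api.example.com/internal"),
]
print(shared_endpoints(observations))
```

In this sample, the internal API is contacted by two apps, but both belong to the same vendor, so only the tracker endpoint passes the filter.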

Web interface

We are publishing the data as a Datasette instance, which allows you to interactively explore the full data online, including running arbitrary SQL queries against it. Here are just a few examples of interesting things you can look at:
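As a small sketch of how a SQL query could also be sent from a script rather than typed into the web interface, the snippet below builds a URL for Datasette's JSON API. The base URL is a hypothetical placeholder, and the query only relies on the `requests` table of the `data` database named above. The helper name `sql_query_url` is made up for this example.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the address of the instance you are using.
BASE_URL = "https://datasette.example"


def sql_query_url(sql: str, database: str = "data", base_url: str = BASE_URL) -> str:
    """Build a URL that asks Datasette to run `sql` and return the rows as JSON."""
    # `_shape=array` requests a plain JSON array of row objects.
    params = urlencode({"sql": sql, "_shape": "array"})
    return f"{base_url}/{database}.json?{params}"


url = sql_query_url("select count(*) as n from requests")
print(url)
```

Fetching the resulting URL, for instance with `urllib.request.urlopen`, and decoding the response with `json.load` should then yield the result rows.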

Datasette and the plugins we have installed offer many additional features that you may find helpful. For example, you can:

Copy data in various formats, e.g. some details about the ten latest requests as a Markdown table, CSV, or JSON
Use the data as a GraphQL API
Query using regular expressions thanks to the sqlite-regex extension
Query based on URLs and paths thanks to the sqlite-url and sqlite-path extensions
Query JSON values using jq thanks to datasette-jq
Download the full SQLite database for local analysis
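For local analysis of the downloaded database, a regular-expression query might look like the following sketch. Python's built-in `sqlite3` module does not load the sqlite-regex extension, so the example registers a small `regexp` function backed by Python's `re` module as a stand-in. The in-memory table and its `host` and `path` columns are hypothetical and only serve to make the example self-contained. The real `requests` table may use different columns.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Back SQLite's REGEXP operator with Python's `re` (a stand-in for sqlite-regex)."""
    return value is not None and re.search(pattern, value) is not None


# For real analysis, connect to the downloaded database file instead of ":memory:".
conn = sqlite3.connect(":memory:")
# SQLite evaluates `X REGEXP Y` by calling regexp(Y, X).
conn.create_function("regexp", 2, regexp)

# Hypothetical miniature stand-in for the `requests` table.
conn.execute("create table requests (host text, path text)")
conn.executemany(
    "insert into requests values (?, ?)",
    [
        ("tracker.example", "/v1/events"),
        ("tracker.example", "/v2/events"),
        ("cdn.example", "/static/logo.png"),
    ],
)

rows = conn.execute(
    "select host, path from requests where path regexp ? order by path",
    (r"^/v\d+/events$",),
).fetchall()
print(rows)
```

Passing the pattern as a bound parameter keeps the backslashes in the regular expression out of the SQL string itself.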