Soda vs. Pop with Twitter

One of the great things about Twitter is that it’s a global conversation anyone can join anytime. Eavesdropping on the world, what what!

Of course, it gets even better when you can mine all this chatter to study the way humans live and interact.

For example, how do people in New York City differ from those in Silicon Valley? We tend to think they’re more financially driven and restless with the world – is this true, and if so, how much more?

Or how does language change as you travel to different regions? Recall the classic soda vs. pop. vs. coke question: some people use the word “soda” to describe their soft drinks, others use “pop”, and still others use “coke”. Who says what where?

Let’s take a look.

United States

To make this map, I sampled geo-tagged tweets containing the words “soda”, “pop”, or “coke”, performed some state-of-the-art NLP technology to ensure the tweets were soft drink related (e.g., the tweets had to contain “drink soda” or “drink a pop”), and tried to filter out coke tweets that were specifically about the Coke brand (e.g., Coke Zero).

It’s a little cluttered, though, so let’s clean it up by aggregating nearby tweets.

United States Binned

Here, I bucketed all tweets within a 0.333 latitude/longitude radius, calculated the term distribution within each bucket, and colored each bucket with the word furthest from its overall mean. I also sized each point according to the (log-transformed) number of tweets in the bucket.

We can see that:

  • The South is pretty Coke-heavy.
  • Soda belongs to the Northeast and far West.
  • Pop gets the mid-West, except for some interesting spots of blue around Wisconsin and the Illinois-Missouri border.

For comparison, here’s another map based on a survey at

Pop vs. Soda Map

We can see similar patterns, though interestingly, our map has less Coke in the Southeast and less pop in the Northwest.

Finally, here’s a world map of the terms, bucketed again. Notice that “pop” seems to be prevalent only in parts of the United States and Canada.


As some astute readers noted, though, the seeming dominance of coke is probably due to the difficulty in distinguishing the generic use of coke for soft drinks in general from the particular use of coke for referring to the Coca-Cola brand.

So let’s instead look at a world map of a couple other soft drink terms (“fizzy drink”, “mineral”, and “tonic”):

Fizzy Drink vs. Mineral vs. Tonic

Notice that:

  • “Fizzy drink” shows up for the UK, New Zealand, and Maine.
  • “Tonic” appears in Massachusetts.
  • While South Africa gets “fizzy drink”, Nigeria gets “mineral”.

I’ve been getting a lot of questions lately about interesting things you can do with the Twitter API, so this was just one small project I’ve worked on to illustrate. This paper contains another awesome application of Twitter data to geographic language variation, and just for fun, here are a few other cute mini-projects:

What do people eat during the Super Bowl? (wings and beer, apparently)

Superbowl Snacks

What do people want for Christmas, compared to what they actually get?


What do guys and girls really say?

Shit Guys and Girls Say

When were people losing and gaining power during Hurricane Sandy? (click the image to interact)

Sandy Power Outages

How does information of a geographic-specific nature spread? (click the image to see a dynamic visualization of when and where tweets related to surviving Hurricane Sandy were shared)

Hurricane Sandy Retweets

Can we use Twitter to measure presidential votes? (yes!)

Electoral Map

Edwin Chen

Founder at Surge AI: a unified data labeling platform and workforce, designed for the richness of AI.

Need high-quality, human-powered data? Reach out! We help top AI companies and research labs aroun the world create high-skill, human-labeled datasets.

Former AI, data science, and engineering lead at Google, Facebook, Twitter, Dropbox, and MSR. Pure math, theoretical CS, and linguistics at MIT.

Surge AI
Surge AI Blog
Surge AI Twitter
Surge AI LinkedIn
Surge AI Github


Recent Posts

How Could Facebook Align its ML Systems to Human Values? A Data-Driven Approach

A Visual Tool for Exploring Word Embeddings

Surge AI: A New Data Labeling Platform and Workforce for NLP

Exploring LSTMs

Moving Beyond CTR: Better Recommendations Through Human Evaluation

Propensity Modeling, Causal Inference, and Discovering Drivers of Growth

Product Insights for Airbnb

Improving Twitter Search with Real-Time Human Computation

Edge Prediction in a Social Graph: My Solution to Facebook's User Recommendation Contest on Kaggle

Soda vs. Pop with Twitter

Infinite Mixture Models with Nonparametric Bayes and the Dirichlet Process

Instant Interactive Visualization with d3 + ggplot2

Movie Recommendations and More via MapReduce and Scalding

Quick Introduction to ggplot2

Introduction to Conditional Random Fields

Winning the Netflix Prize: A Summary

Stuff Harvard People Like

Information Transmission in a Social Network: Dissecting the Spread of a Quora Post

Introduction to Latent Dirichlet Allocation

Introduction to Restricted Boltzmann Machines

Topic Modeling the Sarah Palin Emails

Filtering for English Tweets: Unsupervised Language Detection on Twitter

Choosing a Machine Learning Classifier

Kickstarter Data Analysis: Success and Pricing

A Mathematical Introduction to Least Angle Regression

Introduction to Cointegration and Pairs Trading

Counting Clusters

Hacker News Analysis

Layman's Introduction to Measure Theory

Layman's Introduction to Random Forests

Netflix Prize Summary: Factorization Meets the Neighborhood

Netflix Prize Summary: Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights

Prime Numbers and the Riemann Zeta Function

Topological Combinatorics and the Evasiveness Conjecture

Item-to-Item Collaborative Filtering with Amazon's Recommendation System