Surge: Data Labeling You Can Trust

tl;dr I started Surge earlier this year to fix the problems I've always encountered with getting high-quality, human-labeled data at scale. Think MTurk 2.0, but with an obsessive focus on quality and speed, and an elite workforce you can trust. If you've ever had problems getting human-annotated data, or wish you had a labeling platform you could use with your own workforce, reach out or get started! I'd love to hear from you and chat.

Getting trustworthy human-labeled data has consistently been one of my biggest blockers through a decade of working on real-world AI. Whether at Google, Facebook, or Twitter:

  • Getting ground truth for training our ML models, and measuring their relevance and precision, invariably took months from backlogged teams of in-house labelers.
  • Speed and scale weren't the only issues—in-house labelers often weren't high quality either! When your labeling software is just an Excel spreadsheet, it's difficult to monitor and motivate performance.
  • The quality and speed of external labeling companies were even worse. We'd wait 3 months to get 10K rows of text labeled for a new NLP model, only to find that 50% of the labels were complete spam.

This was a huge bottleneck for our AI and data science teams. If it takes months to get the data your ML needs, how can you iterate and move fast?

So I built our human evaluation platforms at YouTube and Twitter to fix these problems. And we used them to do amazing things: from real-time human/AI search and advertising systems, to shifting our recommender objectives from clickbait to human preferences (especially important in this age of political polarization!), and more.

A cornerstone of these data labeling platforms was the premise that to build increasingly sophisticated real-world AI for complex problems like hate speech and misinformation, we need skilled, motivated human workforces to train and measure these systems. So when COVID hit, and a huge educated population was suddenly out of work or stuck at home, we realized there was an opportunity to turn the platforms we'd built into a startup, and created Surge.

We've grown 10x in the past 6 months, and have worked with a great set of early customers on amazing things:

  • We've helped companies boost their ML model performance by 50%, simply by relabeling their existing datasets.
  • Our customers have dropped their time waiting for new labels—their biggest bottleneck in training new models—from 3-6 months to just a few days.
  • We label millions of images and pieces of text every week, in over a dozen languages.
  • We're not limited to content labeling: we help our customers with everything from content moderation and AI fairness, to making phone calls to gather business and medical information, to core image recognition and NLP.
  • Teams even use our API to gather human judgments in real time!

So if you need a high-quality labeling workforce you can trust, or want to use our labeling software for your own agents, please reach out or get started! I'd love to help you get the datasets and machine learning models you need, or even just hear more about your own problems and experiences.

Edwin Chen

Building human/AI infrastructure at Surge.

Need obsessively high-quality human-labeled data? Interested in a self-serve data labeling platform? Just reach out! We help top companies create massive datasets to train and measure their AI.

Former AI & engineering lead at Google, Facebook, Twitter, and Dropbox. Pure math and linguistics research at MIT.

