Surge AI: A New Data Labeling Platform and Workforce for NLP

tl;dr I started Surge AI to fix the problems I've always encountered with getting high-quality, human-labeled data at scale. Think MTurk 2.0, but with an obsessive focus on quality and speed, and an elite workforce you can trust. If you've ever had problems getting human-annotated data, or wish you had a labeling platform you could use with your own workforce, reach out or get started! We work with amazing AI companies like OpenAI, Airbnb, Amazon, and more on cutting-edge data and ML problems, and we'd love to help you as well.

Getting trustworthy human-labeled data has consistently been one of my biggest blockers through a decade of working on real-world AI. Whether at Google, Facebook, or Twitter:

  • Getting ground truth for training our ML models, and measuring their relevance and precision, invariably took months from backlogged teams of in-house labelers.
  • Speed and scale weren't the only issues—in-house labelers often weren't high quality either! When your labeling software is just an Excel spreadsheet, it's difficult to monitor and motivate performance.
  • The quality and speed of external labeling companies were even worse. We'd wait 3 months to get 10K rows of text labeled for a new NLP model, only to find that 50% of the labels were complete spam.

This was a huge bottleneck for our AI and data science teams. If it takes months to get the data your ML needs, how can you iterate and move fast?

So I built our human evaluation platforms at YouTube and Twitter to fix these problems. And we used them to do amazing things: from real-time human/AI search and advertising systems, to shifting our recommender objectives from clickbait to human preferences (especially important in this age of political polarization!), and more.

A cornerstone of these data labeling platforms was the premise that to build increasingly sophisticated real-world AI for complex problems like hate speech and misinformation, we need skilled, motivated human workforces to measure and train it. So when COVID hit, and a large, educated population was suddenly out of work or stuck at home, we realized there was an opportunity to turn the platforms we'd built into a startup, and created Surge AI.

We've grown 10x in the past 6 months, and have worked with a great set of early customers on amazing things:

  • We've helped companies boost their ML model performance by 50%, simply by relabeling their existing datasets.
  • Our customers have dropped their time waiting for new labels—their biggest bottleneck in training new models—from 3-6 months to just a few days.
  • We label millions of images and pieces of text every week, in over a dozen languages.
  • We're not limited to content labeling: we help our customers with everything from content moderation and AI fairness, to making phone calls to gather business and medical information, to core image recognition and NLP.
  • Teams even use our API to gather human judgments in real time!

So if you need a high-quality labeling workforce you can trust, or want to use our labeling software for your own agents, please reach out or get started! I'd love to help you get the datasets and machine learning models you need, or even just hear more about your own problems and experiences.

Edwin Chen

Surge AI CEO: data labeling and RLHF, designed for the next generation of AI.

Need high-quality, human-powered data? We help top AI and LLM companies around the world create powerful, human-labeled datasets.

Ex: AI, data science at Google, Facebook, Twitter, Dropbox, MSR. Pure math and linguistics at MIT.
