Helio: Where does the data come from?

Ryan Caldbeck
Oct 19, 2017

Earlier this year, we introduced our machine learning platform Helio to the world. We’ve been building Helio for years, so we were thrilled to finally be able to talk about it publicly. Since that introduction, we’ve published a wide variety of insights powered by Helio on everything from fashion to food to product reviews. One of the things that’s really exciting about Helio is that it isn’t limited to a particular category or a particular way of looking at the world. The diversity of the billions of data points that go into Helio means we can look at any given sector from countless angles. This lets us be holistic in our evaluation of brands and ensures that we catch things other investors miss.

When people see the insights produced by Helio, they often ask us about the inputs behind them. In other words, where does the data come from? Our team at CircleUp has worked for years to build a complex web of hundreds of sources that collectively make up Helio. While we can’t share every source for privacy reasons, we thought it would be helpful to describe some of the data inputs that go into Helio and explain why we think they’re important.

Public data

Important signals about a brand are created every day, and the consumer goods sector is unique in that many of these signals are created publicly. When a consumer reviews a product, that matters. When a brand posts on social media, that is an important indicator. When a company hires a new employee, that is a signal too. Understanding these signals is vital to fully evaluating a brand. At the same time, the data equivalent of 250,000 Libraries of Congress is created every single day, so these signals are surrounded by a great deal of less meaningful noise. The availability and relevance of data points can also change significantly over time. Do a quick Google search of your favorite brand today and you’ll see information on that brand’s products, distribution, team, consumer reviews, industry, competitive set, and social media presence. Run the same search a month from now, and the results are likely to look quite different. Capturing these changes at scale is incredibly difficult, but it’s important. Finding the consequential information in this deluge of data can seem like finding a needle in a haystack, and in many ways it is. That’s where Helio comes in.

Helio draws in billions of publicly available data points on a regular basis, collecting information on things like how customers are responding to content, how the company is describing itself, where the company’s products can be bought or where its stores are located, the work experience of people who are employed at the company, and much more.

As difficult as collecting massive amounts of information is, making sense of that information and storing it in a way that lends itself to analysis can be even harder. For example, if two different data sources refer to the same product in different ways, how do you systematically recognize that they are the same product? This challenge, known as entity resolution, is familiar to anyone who works with a diverse set of data sources, and it gets worse as the sources become more disparate. There is also the problem of deciding how best to transform and store the data so that it is most useful to various algorithms. Normalizing data is a critical step in making it actionable. None of this is easy, and we’ve spent years getting Helio to the point where it can perform all these operations quickly and accurately.
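To make the entity resolution problem concrete, here is a minimal sketch in Python. It is not how Helio resolves entities; it assumes product names are the only field available and uses a simple, swapped-in technique: normalize each name into a set of tokens (lowercasing, stripping punctuation, splitting digits from letters, and mapping a few unit spellings to one form), then treat two records as the same product when the Jaccard similarity of their token sets meets a threshold. The `UNIT_ALIASES` table, the 0.6 threshold, and all function names are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical unit spellings mapped to one canonical form ("" means drop the token).
UNIT_ALIASES = {"ounce": "oz", "ounces": "oz", "fl": "", "fluid": ""}


def normalize(name: str) -> frozenset:
    """Lowercase, split into letter runs and digit runs, and canonicalize unit words."""
    tokens = re.findall(r"[a-z]+|[0-9]+", name.lower())  # "12oz" -> "12", "oz"
    canonical = (UNIT_ALIASES.get(t, t) for t in tokens)
    return frozenset(t for t in canonical if t)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of tokens two normalized names have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def resolve(records: list, threshold: float = 0.6) -> list:
    """Return pairs of raw names similar enough to be treated as the same product."""
    normalized = [normalize(r) for r in records]
    matches = []
    for i, j in combinations(range(len(records)), 2):
        if jaccard(normalized[i], normalized[j]) >= threshold:
            matches.append((records[i], records[j]))
    return matches


names = ["Acme Cold Brew Coffee, 12 fl. oz.", "ACME cold-brew coffee 12oz", "Acme Green Tea 16 oz"]
print(resolve(names))
```

Here the first two names normalize to the same token set and are paired, while the tea shares only two of nine distinct tokens with them and is left alone. Comparing every pair is quadratic in the number of records, so larger systems usually add a blocking step that only compares records sharing a key such as brand.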

Partnership data

We’re fortunate to have fruitful partnerships with many consumer goods, retail, data, and research organizations that share information with us so that we can better understand the industry and return insights to them. Some of our partnerships help Helio better identify high-potential companies, while others help open opportunities for companies once Helio has identified them, but all of our partnerships create substantial value for us and for the entrepreneurs we serve.

Our partnerships give us access to data that no other for-profit institution has, and they add powerful inputs to Helio’s predictive models that let us look into areas we otherwise couldn’t. These partnerships take time to develop and are possible because the organizations we work with see the enormous value of access to the CircleUp network of promising brands. They also recognize that we treat their data with the utmost care and never share private information without the owner’s explicit consent. As we continue to work with more partners, we’ll be able to add more and better data to our models. We believe this will have a snowball effect, encouraging additional organizations to recognize the value of working with us.

Practitioner data

While a lot of valuable information about companies is publicly available, other data that is important to our models comes from our relationships with entrepreneurs. A private company typically wouldn’t share certain information, like its revenue, with the world at large, but this type of data is key to understanding how businesses launch and grow.

At CircleUp, we’ve helped hundreds of entrepreneurs raise capital and grow their businesses. In working with these companies, we collect proprietary information that we use to help them raise capital, and we’re able to use this information, in an aggregated form that does not publicly identify any company, to improve our models. We never use their private data publicly, but it does give us an insider’s look at businesses and helps our models make more accurate predictions across all consumer companies.
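The post doesn’t say how practitioner data is aggregated. As one illustration of what using data in an aggregated, non-identifiable way can look like, here is a minimal Python sketch that averages a metric per category and suppresses any category backed by fewer companies than a minimum group size, so no single company’s figure can be read off the output. The row format, the `aggregate_growth` function, and the minimum group size of 3 are all assumptions for illustration, not CircleUp’s actual practice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical threshold: categories backed by fewer companies are suppressed.
MIN_GROUP_SIZE = 3


def aggregate_growth(rows, min_group_size=MIN_GROUP_SIZE):
    """Average revenue growth per category, hiding small categories.

    rows: iterable of (company_id, category, revenue_growth) tuples (assumed format).
    Returns {category: mean growth} only for categories with at least
    min_group_size distinct companies.
    """
    by_category = defaultdict(dict)
    for company_id, category, growth in rows:
        # Keyed by company_id so each company counts once; its last row wins.
        by_category[category][company_id] = growth
    return {
        category: mean(per_company.values())
        for category, per_company in by_category.items()
        if len(per_company) >= min_group_size
    }


# Hypothetical example rows.
rows = [
    ("co-1", "snacks", 0.10),
    ("co-2", "snacks", 0.20),
    ("co-3", "snacks", 0.30),
    ("co-4", "beverages", 0.50),
]
print(aggregate_growth(rows))
```

In this example the snacks category reports a mean of about 0.2 across three companies, while beverages, backed by a single company, is omitted entirely. A minimum group size is a basic safeguard on its own, not a formal privacy guarantee.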

How does this all come together?

Helio pulls the public, proprietary, and partnership data we have into algorithms that help predict a company’s success across a variety of metrics, including brand, distribution, product, and team. For example, data from a company’s website may intersect in interesting ways with data from similar companies we have worked with, or with information on that company available through some of our partners. This allows us to weave together insights from many different sources and develop a discerning narrative on a brand, creating a system of information that is more valuable than the sum of its parts.
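As a rough illustration of combining sources, here is a minimal Python sketch that merges per-company features from several sources into one profile, assuming each record has already been matched to a shared company ID by an entity resolution step. The source names, feature names, `CompanyProfile`, and `merge_sources` are hypothetical, and the rule that later sources override earlier ones for the same feature is a simplifying assumption, not a description of how Helio weighs its inputs.

```python
from dataclasses import dataclass, field


@dataclass
class CompanyProfile:
    company_id: str
    features: dict = field(default_factory=dict)  # feature name -> value
    sources: dict = field(default_factory=dict)   # feature name -> source that supplied it


def merge_sources(sources):
    """Merge {source_name: {company_id: {feature: value}}} into one profile per company.

    Sources are applied in insertion order, so a later source overwrites an
    earlier one when both supply the same feature (a simplifying assumption).
    """
    profiles = {}
    for source_name, records in sources.items():
        for company_id, features in records.items():
            if company_id not in profiles:
                profiles[company_id] = CompanyProfile(company_id)
            profile = profiles[company_id]
            for name, value in features.items():
                profile.features[name] = value
                profile.sources[name] = source_name
    return profiles


# Hypothetical inputs keyed by an already-resolved company ID.
sources = {
    "public": {"acme": {"review_count": 1200, "retail_doors": 40}},
    "partnership": {"acme": {"retail_doors": 55}},
    "proprietary": {"acme": {"revenue_growth": 0.35}},
}
acme = merge_sources(sources)["acme"]
print(acme.features)
print(acme.sources)
```

Tracking which source supplied each feature makes a merged profile easier to audit later; a production system would more likely reconcile conflicting values (for example, by recency or source reliability) than simply let the last source win.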

By doing this at scale, we’ve already been able to successfully identify hundreds of promising brands that traditional investors would have missed. In future posts, we’ll discuss some of these companies and how Helio led us to spot their potential early on.

Ryan Caldbeck

Founder and Executive Chairman @CircleUp. Follow me on Twitter @ryan_caldbeck