Predictive Lead Scoring: Discovering Best-Fit Prospects with Machine Learning

B2B sales strategies can be roughly divided into two activities: lead generation and lead conversion. It’s clear how each works. The former, attracting visitors to your website and then helping them take certain actions, is almost automated and works through carefully placed calls to action. The latter, supporting a lead to make the purchasing decision, is done by professional sales people with their arsenal of personalized tactics.

But somewhere in between there’s a process of understanding which visitors your company values the most. Because a salesperson’s work has a high cost -- both in time and money -- it should be focused on nurturing only the most engaged and fitting leads, those most likely to deliver ROI. This process is called lead scoring, and, combined with data analytics, it allows you to predict the weight each lead carries far more accurately than guesswork. You will learn more about how it works and why you should use it in this article.

How does lead scoring work? Manual vs predictive lead scoring

Lead scoring is a tool that helps marketers pinpoint the potential customers among the leads. That way, the sales team doesn’t waste time contacting all prospects and focuses only on the most valuable ones.

A lead scoring platform. Source: HubSpot

The name gives away the main idea behind this technique. All prospects get points based on how well they fit the attributes of a potential customer. The ones that score the highest are determined to be more qualified. But the methods used for determining attributes and assigning points can be different.

Traditional lead scoring

In traditional or manual lead scoring, the job is done by marketers themselves. It includes:
  1. Determining key attributes. Website visitors leave tons of data behind, both explicit (like email or name in the online form) and implicit (based on their actions on the page). For example, the number of visits to your contact page shows their engagement, while their industry or position in the company indicates how well they fit with your business. Marketers figure out what attributes can help identify the most promising customers.
  2. Assigning points to attributes. Then, you need to determine which attributes tell you the most and assign points based on their value. For example, someone from your targeted industry or a person in a higher position should get more points. The same for those who subscribe to your email list or visit the contact page. Points can also be deducted, for example, when leads have generic email domains.
  3. Contacting the most qualified leads. Leads with the highest scores are prioritized by the sales team.

Traditionally, leads are scored based on how well they fit the company’s customer profile (demographic data) and their engagement (behavioral data)
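
To make the manual approach concrete, here is a minimal sketch in Python. The attribute names and point values are hypothetical, chosen only to mirror the examples above; a real marketing team would pick its own.

```python
# Hypothetical point values; these numbers are illustrative assumptions.
POINTS = {
    "target_industry": 20,        # lead works in an industry you target
    "senior_position": 15,        # lead holds a higher position
    "subscribed_to_emails": 10,   # lead joined the email list
    "visited_contact_page": 10,   # lead viewed the contact page
    "generic_email_domain": -10,  # points are deducted for generic domains
}


def manual_score(lead: dict) -> int:
    """Sum the points for every attribute the lead exhibits."""
    return sum(points for attr, points in POINTS.items() if lead.get(attr))


# Two invented leads, described only by the attributes they exhibit.
leads = [
    {"name": "A", "target_industry": True, "senior_position": True,
     "visited_contact_page": True},
    {"name": "B", "subscribed_to_emails": True, "generic_email_domain": True},
]

# Highest scores first, which is the order the sales team would work in.
ranked = sorted(leads, key=manual_score, reverse=True)
for lead in ranked:
    print(lead["name"], manual_score(lead))
```

Every number in `POINTS` is a human judgment call, which is exactly the weakness of this method.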

Traditional lead scoring is better than having no lead scoring, but it’s not a perfect system either. For one, marketers cannot always accurately determine what attributes matter most -- people can be subjective, relying on false or outdated knowledge.

And even if they were entirely unbiased, there’s a limit to human perception and finding sense in thousands of data points. Enter predictive lead scoring.

Predictive lead scoring

Predictive lead scoring is a marketing application of a statistical technique called propensity modeling that helps forecast behavior of target audiences. When combined with machine learning and data mining, it can make forecasts based on historical and existing data to identify the likelihood of conversion. So, the main difference from traditional lead scoring is the model’s ability to determine more reliable attributes based on expansive data.

Here’s how predictive lead scoring works:
  1. Automatically picking attributes. ML algorithms process your historic sales data to discover patterns and determine which attributes and combinations of attributes indicate a customer’s propensity to convert.
  2. Preparing an ML model. Once the attributes are selected, a model is created and tested. Most commonly, predictive lead scoring uses a logistic regression model, which gives a probability of conversion for each lead. Usually, multiple models are trained and validated to pick the one that produces the fewest errors, such as false positives.
  3. Running and updating the model. As more leads enter the system, the model corrects its attributes and their weights, allowing it to stay relevant in changing markets.

The main differences between traditional and predictive lead scoring
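
As a rough illustration of the modeling step, the sketch below fits a tiny logistic regression with plain gradient descent and turns lead attributes into a conversion probability. The feature names and the historical data are invented for the example, and a production system would more likely rely on an established ML library than on hand-written training code.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            predicted = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = predicted - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def conversion_probability(x, weights, bias):
    """Probability that a lead with attributes x converts."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Invented history: [visited_contact_page, target_industry, generic_email]
history = [
    [1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 1, 0],
    [0, 0, 1], [0, 0, 1], [1, 0, 1], [0, 1, 1],
]
converted = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(history, converted)
hot = conversion_probability([1, 1, 0], weights, bias)
cold = conversion_probability([0, 0, 1], weights, bias)
print(f"engaged, good-fit lead: {hot:.2f}; generic-email lead: {cold:.2f}")
```

Unlike manual scoring, nobody assigns the weights here: they are learned from which past leads converted, and retraining on fresh data is what keeps them up to date.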

Now, to highlight the benefits of the ML-based method, let’s look at some implementation examples from well-known companies.

Predictive lead scoring success stories

InVision converts more users from free to paid plan

InVision is a massively popular design and prototyping tool that operates on a subscription-based, freemium model. With 7 million users, InVision deals with huge volumes of leads. They were pretty successful with identifying their Product Qualified Leads, but the conversions remained low as most clients used free or self-serve versions. When the sales team started contacting PQLs, they kept hearing the same: Clients just weren’t ready to commit to paid plans.

So, they teamed up with marketing prediction vendor MadKudu, whose predictive model was able to isolate the accounts that were ready to buy a license from the whole pool of PQLs. The model used historical firmographic and behavioral data to identify winning attribute combinations.

Scoring reports help the team understand why the account scored high and who they should contact. Source: MadKudu

InVision created alerts notifying their team whenever an account is ready to buy, enabling their expert sales representatives to reach out and upsell the most promising candidates.

Hotjar automates personalized message sending

A marketing software product itself, Hotjar didn’t have a problem with lead generation -- but had limited resources to segment their leads and create personalized messages for them. This posed a serious problem, since just like with InVision, its freemium pricing strategy required targeted communication with leads.

Hotjar collaborated with Infer on a custom predictive model, which scoured their historical and external data and produced predictions on how each lead fits the company’s perfect customer. Now, Hotjar receives scores directly in their HubSpot CRM and can schedule demos during the first two weeks of a free trial with the best prospects.

DocuSign achieves 38 percent increase of SQLs

A leading eSignature provider, DocuSign used to do traditional lead scoring and assigned a letter ranking (A for most valuable and D for least valuable) for their leads. But sales reps struggled to understand how one A lead is better than another and picked them randomly. To increase accuracy and provide more clarity, DocuSign worked with Lattice to transition to the purchasing likelihood percentage with predictive analytics.

DocuSign started by running predictive scoring models on one customer persona. They increased the number of indicators from 4 to 10 and achieved a 38 percent increase in the number of SQLs. They also reported a 22-times ROI in the first two months.

If you feel predictive analytics is a great match, keep reading to learn a crucial part: what data the system will require to determine winning attributes.

Key data points for predictive lead scoring

In traditional lead scoring, you decide what data matters to you; the predictive approach needs the largest datasets possible to work its magic. Let’s review all data points that can help the engine identify key attributes.

Demographic data

Lead scoring relies heavily on determining your customer persona, which often starts with understanding their most basic characteristics:
  • Job title
  • Date of birth/Age
  • Location
  • Education
  • Gender
  • Marital status
  • Income
  • Lifestyle and interests, etc.
Sources: Google Analytics, website forms, social media, direct communication.

Purchase history

You probably already collect detailed purchase data about your past customers and that will be a great source for building a model. At the same time, you can gain more information about user purchase habits from third-party sources. This includes such data points as:
  • Minimum, maximum, and average amount of money spent
  • Purchase frequency
  • Bought full price or with coupon/on sale
  • Transaction date and time
  • Product quantity purchased, etc.
Sources: CRM, aggregated data from card processors and other vendors.

Engagement and activity

So-called implicit data, engagement information is collected from customer behavior and actions on your website. It can be divided into two groups: usage and brand engagement.

Usage data often includes:
  • Number of pages viewed
  • Number of times the same page is viewed
  • Number of downloads
  • Number of logins in a period
  • Free trial sign-up
  • Features used, etc.
Engagement data covers marketing activities:
  • Emails opened
  • Email links clicked
  • CTA reactions
  • Contact page visits
  • Chat messages sent, etc.
Sources: Google Analytics, CRM, email sending tool.

Account profile/Firmographic data

For B2B companies, it’s important to distinguish a customer persona -- created from demographic data -- from a business account profile, or firmographic data, which includes such information as:
  • Industry type
  • Organization size
  • Sales and revenue
  • Location
  • Ownership framework
Sources: website forms, market studies, data aggregators, LinkedIn.

These key data points will power the predictive model and finally help make sense of the relationships you have with your client. But what happens after you get the scores?

Using predicted lead scores day to day

Lead scores by themselves are nothing more than numbers that the sales team will need to turn into a tool that can be used in daily operations. This tool should help marketers easily identify which prospects to contact next. For this, we need to define how high quality customers will be different from low quality customers and where this threshold lies. Here’s what marketers do with predicted lead scores.

Calculate the model’s cumulative gains

You can calculate your cumulative gains by applying the predictive model to customers who already made the purchase. The cumulative gains curve evaluates the model’s success by comparing its results with random picks.

On the graph below:
  • The blue line indicates the predicted “gain” from a certain percentage of customers: the 10 percent with the highest scores (above 90) bring you 25 percent of customer purchases, the top 20 percent bring over 40 percent, the top 30 percent are responsible for 60 percent of all purchases, and so on. The more this curve bows toward the top-left corner, the greater the gain.
  • The orange line shows what we achieve by contacting customers at random, basically “no gain.”

Cumulative gains curve

After 50 percent of customers, however, the lift starts to decline, which we can see as the graph flattens. Predictive lead scores and the cumulative gains curve indicate where you should apply segmentation.
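
A cumulative gains curve like the one described above can be computed in a few lines. The sketch below is only an illustration: the scores and purchase flags are invented toy data, and `cumulative_gains` is a hypothetical helper name, not part of any scoring platform.

```python
def cumulative_gains(scores, purchased, steps=10):
    """Return (share of leads contacted, share of purchases captured) pairs.

    Leads are ranked from the highest score to the lowest, then we count how
    many of the actual purchases fall within each top slice of the ranking.
    Assumes at least one purchase, otherwise the shares are undefined.
    """
    ranked = [bought for _, bought in
              sorted(zip(scores, purchased), key=lambda pair: pair[0], reverse=True)]
    total_purchases = sum(ranked)
    curve = []
    for step in range(1, steps + 1):
        cutoff = round(len(ranked) * step / steps)
        curve.append((step / steps, sum(ranked[:cutoff]) / total_purchases))
    return curve


# Invented toy data: ten scored customers and whether each one purchased.
scores = [95, 88, 80, 72, 65, 58, 50, 41, 30, 12]
purchased = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

gains = cumulative_gains(scores, purchased)
for share_contacted, share_captured in gains:
    # With random picks, the captured share simply equals the contacted share.
    print(f"top {share_contacted:.0%} of leads: {share_captured:.0%} of purchases "
          f"(random: {share_contacted:.0%})")
```

The gap between the model column and the random column at each step is the gain; where that gap stops growing is a natural place to draw a segment boundary.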

Segment customers

According to cumulative gains, we can put our customers into three buckets:
  • High quality leads: prospects with lead score points from 90 to 55, who are responsible for 80 percent of all purchases.
  • Medium quality leads: prospects with 54-45 scores that bring around 15 percent of all purchases.
  • Low quality leads: prospects with the lowest scores of 44-0 that bring 5 percent of all purchases.
Depending on the bucket, different nurturing tactics are applied. For example, high quality leads will be contacted directly by sales people, medium quality leads will receive a series of automatic personalized emails, and low quality leads will be ignored or pushed to provide more data, for example, by filling in the form.
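
In practice, these buckets boil down to a couple of thresholds. Here is a minimal sketch that uses the score ranges from the example above; the function and dictionary names are hypothetical, and treating every score of 55 or more as high quality is an assumption made so that scores above 90 also land in a bucket.

```python
def segment(score: float) -> str:
    """Map a predicted lead score to a bucket using the example thresholds."""
    if score >= 55:
        return "high"
    if score >= 45:
        return "medium"
    return "low"


# Nurturing tactic per bucket, as described in the text above.
TACTICS = {
    "high": "direct contact by a salesperson",
    "medium": "series of automatic personalized emails",
    "low": "ignore or ask for more data via a form",
}

for score in (92, 55, 50, 44, 10):
    bucket = segment(score)
    print(f"score {score}: {bucket} quality, {TACTICS[bucket]}")
```

The thresholds should come from your own cumulative gains curve rather than from these sample numbers.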

Now that we’ve learned where predictive modeling begins (data collection) and ends (insights and practices), let’s choose the implementation method. There are two main approaches.

Predictive lead scoring implementation scenarios

Predictive lead scoring can be applied in two ways: building a custom predictive engine in-house or using a ready-made predictive scoring platform. This is a classic build vs buy question, and we will give you some pros and cons of each method to help you choose.

Using predictive lead scoring software

Here’s what you get when working with tools such as those from Infer, HubSpot, MadKudu, or one of the many other vendors.

  • Easy implementation. You connect such software to your CRM or other marketing tools via API.
  • Access to external databases. The platform will often complement your data with firmographic and demographic data from their own sources.
  • One ecosystem. If you buy such software from your CRM provider (HubSpot or Salesforce, for example), you will have a unified experience and easier data exchange.
  • Limited integrations. The list of software tools the platform will be able to seamlessly integrate with is limited. If you’re using a custom or unconventional CRM or more specific marketing platforms, you’ll have trouble setting up connections.
  • Data security. Your prospect data is your biggest asset. When giving an external system access to it, you don’t control how secure it is and face the risk of breach.

Building a custom predictive lead scoring engine

Custom development is not for everyone, especially concerning machine learning. Here’s what you need to consider.

  • Data sources of your choice. As you will be building your own data infrastructure, you get to decide where you pump your data from.
  • Custom integrations. You can build connections to virtually any software and won’t have to opt for a short list provided by a vendor.
  • Full control over machine learning. With a ready platform, you get to use the model the provider already tested on generic data and algorithms that they picked. By taking care of ML in house, you get to be more creative and innovative with how you apply one of the most useful technologies a business can ask for.
  • Long and expensive process. Building an ML system will take longer and require more money. You will have to prove to your stakeholders that the investment will bring the desired outcome.

Predictive lead scoring requirements

There are two things to consider before implementing lead scoring:
  • Can you implement predictive lead scoring? This question determines whether you have enough resources to do predictive lead scoring today. Basically, prerequisites for adoption.
  • Should you implement predictive lead scoring? This question concerns whether it actually makes sense for you to invest time and money into this endeavor.
Let’s help you answer them. Here are the requirements that show that you’ll be able to create and successfully use an ML model to predict your leads.

You have high quality historical data. A machine learning process starts with gathering data. While in some cases, public datasets might be enough, lead scoring requires historical data about your sales activity to understand what activities lead to sales in your exact case. So, if you never logged all your customer behavior and profile information or weren’t vigilant about keeping it accurate, you’re behind the eight ball. The section on key data points covers the types of data needed in detail.

You have a lot of data. The biggest advantage AI models have over traditional analytics is the sheer volume of data they are able to process and derive value from. A model needs the maximum amount of high quality data about your best leads to understand hidden patterns of success. How much data is enough? It varies, but HubSpot recommends having at least three months’ worth of customer data or 500 contacts before you try an ML model, and only once the next condition is also met.

You haven’t changed your target market and product recently. Your data should be relevant to the product you’re currently selling and its positioning. If you’ve changed it not so long ago, wait at least another three months before applying analytics.

You have the skills and resources, or the budget to get them. A predictive analytics engine is a complex mechanism that requires a whole data infrastructure to pump live data and give you up-to-date insights on your leads. You can build such a system in house by hiring skilled data scientists, or you can opt for a ready-made platform like those offered by HubSpot or Infer. Either way, you need to have the budget and specialists to build, integrate, and maintain the engine.

You have a lot of leads. If you have hundreds of leads every month, you won’t be able to manually determine their common attributes or analyze a lot of data points. You need a stronger mechanism to go through that much information.

You’ve been using traditional lead scoring with mixed results. One of the problems with manual lead scoring is that good leads may not be obviously good, so marketers can’t pinpoint what actually matters and create key attributes. MadKudu proposes the following method to determine if your traditional lead scoring is doing a good enough job: Pull a random list of 10 recently closed deals and measure the percent that come from non-obviously good leads. If 50 percent or more of SQLs come from non-obviously good contacts, you’re losing a lot of deals and can improve your metrics with predictive lead scoring.
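
The check itself is simple arithmetic. The sketch below assumes you have already labeled each closed deal by hand as coming from an obviously good lead or not; the flags are invented sample data and the function name is hypothetical.

```python
def share_from_non_obvious(deals):
    """Fraction of closed deals that came from non-obviously good leads."""
    return sum(1 for deal in deals if not deal["obviously_good"]) / len(deals)


# Ten recently closed deals, labeled by hand (invented sample data).
flags = [True, False, False, True, False, False, True, False, False, True]
deals = [{"deal_id": i, "obviously_good": flag} for i, flag in enumerate(flags, start=1)]

share = share_from_non_obvious(deals)
# The 50 percent threshold is the one quoted above.
needs_predictive = share >= 0.5
print(f"{share:.0%} of closed deals came from non-obviously good leads")
print("predictive scoring likely worth it" if needs_predictive else "manual scoring is holding up")
```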