So here’s the scenario: you’re Company A, and you're on the hunt for a data scientist who can jump in with minimal hand-holding. Naturally, you start looking at other companies as potential sources of talent. But which ones develop talent like yours, not just in job title, but in real, usable skills?
That’s where this comes in handy. We’re looking at how closely the skills and proficiencies of data scientists at other companies (let’s call them Companies B through O) align with yours at Company A.
The idea? The more similar the skill profile, the smoother the transition. Less ramp-up time, fewer surprises, and fewer “wait...you’ve never worked with an API before?” moments.
Of course, there’s a big ol’ asterisk here: we’re assuming that similarity in skills equals readiness to perform. Not always true, but a reasonable shortcut when you're trying to hire at scale and want to stack the odds in your favor.
How we got here
Finding the right fit for modern, AI-augmented roles isn’t just challenging, it’s like trying to solve a puzzle with pieces from different boxes.
In a recent article on TalStrat (you can check it out here), I explored how today’s technical roles don’t map cleanly to where skills are heading in the next few years. The overlap wasn’t great, which means talent acquisition teams have their work cut out for them.
That led me to another question: Which companies are already producing talent that aligns with the skills your roles truly need, not just the job titles?
Because while some logos look great on a résumé, the day-to-day work might be nothing like what your positions require.
Getting the data
To test this out, I used ChatGPT (GPT-4o) to gather my data. I’ll share what the prompts looked like and some of the fixes and tweaks that were necessary along the way.
To be clear, this is not meant to replace your enterprise analytics stack or guide high-dollar decisions. It’s a directional look. Think of it like a sketch, not a blueprint.
The numbers you’ll see reflect the estimated percentage of each company’s data scientists who are proficient in each skill, based on assessments from a large language model. Is it perfect? Certainly not. But is it useful for pattern-spotting and shaking loose a few insights? Definitely.
For the real heavy lifting, like headcount planning, workforce transformation, or hiring strategy, use robust tools like Lightcast, LinkedIn Talent Insights, or Draup.
With that, here is the initial prompt to get things going, as shown in Figure 1:
I did this for 15 companies, none of them a current or former employer of mine. I have anonymized them because I want the focus to be on the method, not the data, which, as noted above, should be replaced by data from a reputable vendor.
I then appended the data in Excel and used Tableau to pivot it. To get oriented, the example below shows that 46% of data scientists at Company A are proficient in APIs. That’s the general format: skills are the rows of the table, and the proficiency percentages for each company are in the columns.
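If you’d rather script the pivot than click through Excel and Tableau, here’s a minimal sketch in plain Python. It assumes the prompt output has been collected as (company, skill, percent) rows, which is my assumption about the layout, not something the tools above enforce. Only the 46% for Company A and APIs comes from the example; every other skill name and value is made up for illustration.

```python
from collections import defaultdict

# Long-format rows: (company, skill, percent of data scientists proficient).
# Only the 46% for Company A / APIs comes from the article; the rest is hypothetical.
rows = [
    ("Company A", "APIs", 46),
    ("Company A", "Python", 90),
    ("Company A", "Deep Learning", 35),
    ("Company B", "APIs", 52),
    ("Company B", "Python", 85),
    ("Company B", "Deep Learning", 60),
]

def pivot(rows):
    """Return {skill: {company: pct}} so skills are rows and companies are columns."""
    table = defaultdict(dict)
    for company, skill, pct in rows:
        table[skill][company] = pct
    return dict(table)

table = pivot(rows)
companies = sorted({company for company, _, _ in rows})

# Print a simple skills-by-company grid (missing cells show as 0).
print(f"{'Skill':<15}" + "".join(f"{c:>12}" for c in companies))
for skill, by_company in table.items():
    print(f"{skill:<15}" + "".join(f"{by_company.get(c, 0):>12}" for c in companies))
```

Each column of that grid is one company’s skill profile, which is the shape the similarity step needs.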
Applying cosine similarity
Now that we have skill profiles for each company, the next step is to measure similarity. I used cosine similarity. Imagine each company’s version of a role, say “data scientist”, as a vector of skills. Each skill gets a weight based on the share of people at that company who are proficient in it. Cosine similarity then compares these skill vectors by measuring the angle between them, so it reflects the mix of skills rather than the overall level.
Here’s the cheat sheet: If the value is 1.0, it means the company being compared is perfectly aligned in its skill orientation. If it’s 0, the composition of the data scientist talent pool at that comparison company is completely unrelated, with no overlap in skills. In general, a negative value means the two are essentially opposites in skill orientation: they don’t just lack overlap; they pull in entirely different directions. Because the inputs here are percentages, which can never be negative, the scores in this exercise will always land between 0 and 1.
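For anyone who wants to see the calculation itself, here’s a small sketch of cosine similarity in plain Python: the dot product of the two skill vectors divided by the product of their lengths. The two company profiles are hypothetical numbers, and returning 0.0 for an all-zero profile is a convention I picked for this sketch, not part of the definition.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length skill vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must cover the same skills in the same order")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for this sketch: an empty profile matches nothing
    return dot / (norm_u * norm_v)

# Hypothetical percent-proficient values, skills listed in the same order for both.
company_a = [46, 90, 35]
company_b = [52, 85, 60]

print(round(cosine_similarity(company_a, company_b), 3))

# Two profiles with no skills in common score 0.0.
print(cosine_similarity([100, 0], [0, 100]))

# Same skill mix at double the level scores 1.0, up to floating-point rounding.
print(math.isclose(cosine_similarity([10, 20], [20, 40]), 1.0))
```

The last check is the reason cosine similarity suits this job: a company with the same skill mix but a higher overall proficiency level still counts as a close match.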
In Figure 3, you’ll see the results for Company A compared against several others. I used Tableau to do the k-means clustering (with k=3) and to visualize the clusters. The top three matches came in around 0.6, which is decent, but to get into strong fit territory, I’d want to see something closer to 0.75 or higher.
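If you don’t have Tableau handy, the grouping step can be approximated in a few lines. This is a sketch under assumptions: it clusters on the similarity score alone, uses a basic version of Lloyd’s k-means algorithm with a fixed starting point, and runs on made-up scores. Tableau’s own clustering may initialize and scale things differently, so don’t expect identical groups.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Basic Lloyd's k-means on a list of numbers; returns (labels, centroids)."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical cosine similarity scores against Company A.
scores = {
    "Company B": 0.62, "Company C": 0.60, "Company D": 0.58,
    "Company E": 0.41, "Company F": 0.39,
    "Company G": 0.22, "Company H": 0.18,
}

labels, centroids = kmeans_1d(list(scores.values()), k=3)
for (name, score), label in zip(scores.items(), labels):
    print(f"{name}: similarity {score:.2f}, cluster {label}")
```

Because the starting centroids are sorted from low to high, the highest-numbered cluster holds the closest matches in this sketch.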
Tweak your pipeline for where you want to be
So far, we’ve looked at which companies are producing talent that matches your current team’s skill profile. But what if your goal isn’t to maintain the status quo? What if you want to reshape your team by bringing in new capabilities, especially around AI and emerging technologies? That’s where we shift from matching today to planning for tomorrow.
To illustrate this, I adjusted the skill profile of Company A to emphasize AI-related capabilities. I manually set all AI-tagged skills to 100%, essentially telling the system: “Assume we want every future hire to be fully proficient in AI.” This tweak doesn’t reflect reality; it’s aspirational. But it helps us target companies already building that kind of talent.
"I skate to where the puck is going to be, not where it has been." Wayne Gretzky
So here's the talent strategy version:
"I build my talent pipeline for the future state of our team, not the current snapshot." Future-Thinking Talent Sourcers
In practice, I overwrite the cells (competencies and/or skills) where I want new hires to add capabilities to the talent pool. I chose to ride the hype rocket for this example and emphasize every competency/skill related to AI. That means overwriting each item mapped to AI with 100 in the data for Company A (as if 100% of the data scientists in your talent pool were proficient in the item). Is that true? No. But I want to force the match against other companies’ talent pools as if it were.
I put the new cosine similarity results next to the original calculations. You can see that we are still looking at roughly the same priority order. That said, results will certainly differ if you use real data and as you toggle between roles. You may also want to compare against different job titles at the pipeline companies, which makes things more interesting, and your intuition about the outcome may turn out to be far off.
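Here’s a rough sketch of that overwrite, scored before and after. Everything in it is illustrative: the skill names, the set of skills tagged as AI, and the percentages are all hypothetical, and the helper `future_state` is a name I made up for this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length skill vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

skills = ["APIs", "Python", "Machine Learning", "Generative AI"]
ai_skills = {"Machine Learning", "Generative AI"}  # hypothetical AI tagging

# Hypothetical percent-proficient profiles, in the same order as `skills`.
profiles = {
    "Company A": [46, 90, 40, 10],
    "Company B": [50, 85, 35, 15],
    "Company C": [20, 60, 80, 70],
}

def future_state(profile, skills, ai_skills, target=100):
    """Return a copy of the profile with every AI-tagged skill set to `target`."""
    return [target if s in ai_skills else p for s, p in zip(skills, profile)]

current_a = profiles["Company A"]
future_a = future_state(current_a, skills, ai_skills)  # AI skills become 100

for name in ("Company B", "Company C"):
    now = cosine_similarity(current_a, profiles[name])
    later = cosine_similarity(future_a, profiles[name])
    print(f"{name}: current {now:.2f}, future-state {later:.2f}")
```

With these made-up numbers, Company B is the closer match to today’s profile and Company C overtakes it once the AI skills are pushed to 100. That flip is built into the toy data to show the mechanism; how much real rankings move depends entirely on the data you feed in.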
The gist is that you may need to change your pipeline companies depending on where you’re headed.
So what?
This walkthrough wasn’t just about playing with data. It was about making better decisions, faster. First, we explored how to identify the best pipeline companies based on the current composition of your role: essentially, who’s already developing talent that maps well to your existing needs. Then, we shifted the lens to a future-focused view, modifying your internal profile to prioritize the high-value, AI-specific skills that you have determined are important for that role going forward. That simple tweak can sharpen your pipeline thinking and may reveal a whole new set of sourcing targets that you had not previously recognized. It may also be proof that where you look for talent should change depending on where you’re headed.
That's a wrap. I hope that you found this interesting, and THANK YOU FOR READING!
-Scott
Disclaimer: The views and opinions expressed in this article are solely those of the author and do not reflect the official policy or position of any current or former employer. Any content provided in this article is for informational purposes only and should not be taken as professional advice.