How To Structure Data Science Teams for Scalable Growth

With the “unicorn” definition of the data scientist breaking down, businesses are beginning to scale up data science teams by hiring candidates with specialized skill sets. The roles within these teams are becoming more clearly defined, and the technical skills needed are becoming more niche.

On the business skills side, positions such as analytics translators are beginning to serve as product managers. These individuals are armed with a practical understanding of AI capabilities, a solid business background, and an understanding of the company’s business drivers. This combination allows them to act as a liaison between the business stakeholders and the data science team, prioritizing projects and features according to how well they address the company’s business challenges. This central position helps the business and technical sides align.

On the IT skills side, the evolution of data engineers from batch ETL data processing to more API- and streaming-based applications has helped with the transition to supporting AI-enabled applications. The introduction of machine learning engineers has dramatically helped IT departments to understand the analytic products being built by the data science algorithm teams.

Successful data science teams typically take a blended approach. These teams encompass business, data science, and IT team members who can align goals, resources, and skill sets to execute data science projects successfully. Finding the talent necessary to build these teams is, of course, a challenge in its own right. But even when companies find the talent, scaling a data science team is far different from scaling other types of teams. When a team is scaled incorrectly or sloppily, the business ends up with operational inefficiencies and misunderstandings between teams.

Decide how to scale

One mistake companies make when scaling a data science team is hiring without a plan. This leads to overlapping skill sets in some areas and gaps in others. Businesses usually opt to scale by function or by expertise. The former means hiring by role, according to whether the candidate will work on the business intelligence team, the IT side, or the product side. The latter means hiring by knowledge and specialty. Whoever handles building out the data team, whether the CIO or the CDO, should always make sure information gaps are filled, meaning the IT, business, and data science teams work together with aligned interests. Once management decides how to scale, hiring can begin.
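
The overlap-and-gap problem described above can be made concrete with a simple skills inventory. The following is a minimal sketch, not a prescribed method: the skill names, the roster, and the skill_coverage helper are all hypothetical placeholders to be replaced with a team's own data.

```python
from collections import Counter

# Hypothetical list of skills the team needs; replace with your own.
REQUIRED_SKILLS = {
    "data engineering",
    "ml engineering",
    "statistics",
    "product management",
}

# Hypothetical roster: team member -> skills that person covers.
TEAM = {
    "analyst_a": {"statistics", "visualization"},
    "scientist_b": {"statistics", "ml engineering"},
    "engineer_c": {"data engineering"},
}


def skill_coverage(team, required):
    """Return (gaps, overlaps) for a roster measured against required skills."""
    # Count how many team members cover each skill.
    counts = Counter(skill for skills in team.values() for skill in skills)
    # Gaps: required skills that nobody on the roster covers.
    gaps = sorted(required - set(counts))
    # Overlaps: skills covered by more than one person.
    overlaps = sorted(skill for skill, n in counts.items() if n > 1)
    return gaps, overlaps


gaps, overlaps = skill_coverage(TEAM, REQUIRED_SKILLS)
print("Gaps to hire for:", gaps)
print("Overlapping skills:", overlaps)
```

Some overlap is healthy for coverage and review, so the overlap list is best read as a prompt for discussion rather than a list of redundancies.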

Ensure proper infrastructure

Data officers should examine the state of the company’s data infrastructure and review the IT infrastructure in place. It’s important to review items such as network compatibility (especially between legacy systems and newer software), extract-transform-load (ETL) processes, and the documentation (or lack thereof) of processes and databases. Does the infrastructure support data science projects? Is there a data lake? Will it run on-premises, in a public cloud, or using some hybrid approach? What is the plan to support both batch and streaming use cases? Will new hires be able to build and test models and move them into production? If not, what hinders that process? Scaling up a data science team without proper infrastructure will bottleneck the process, frustrate the team members, and result in immense production delays.

Assign clear roles and responsibilities

Failing to assign roles inside the data team causes confusion, and failing to assign external roles causes many big data projects to fail. Business teams will need to interact with the data science team; who is responsible for aligning business and data science teams? If the sales department has a request, who handles that? Who prioritizes work requests? Who executes them? And who sends reports? Who, ultimately, is responsible for the final product?

Especially when companies keep the IT, data, and business teams separate, questions like these cause delays in production. Of course, if a CDO arms the data science team with an IT and business support lead, then the confusion diminishes. Each member of the team must have an established role, and while that role may shift, each person should understand his or her responsibilities. And while it’s not necessarily the data science team’s responsibility, it’s crucial to establish a clear line between the IT and business teams and continuously review goals to set expectations on all ends.

Establish a standardized workflow and toolkit

Tool sprawl is a common problem among data scientists. Leaders must organize and standardize the tools and technologies that the team uses, including deployment tools, programming languages, model serialization formats, and code editors. Each new hire should be onboarded with a standardized workflow, a set of coding standards, and a set of development tools.
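
One lightweight way to keep a standard toolkit enforceable is to write it down as data and check each environment against it. The sketch below assumes a hypothetical team standard; the tool names, the version numbers, and the find_deviations helper are illustrative only and are not drawn from any particular company’s setup.

```python
# Hypothetical team standard: tool name -> approved major.minor version.
TEAM_STANDARD = {
    "python": "3.11",
    "pandas": "2.1",
    "scikit-learn": "1.4",
}


def find_deviations(environment, standard):
    """List tools that are missing or do not match the approved version."""
    problems = []
    for tool, approved in sorted(standard.items()):
        installed = environment.get(tool)
        if installed is None:
            problems.append(f"{tool}: missing (approved {approved}.x)")
        elif not (installed == approved or installed.startswith(approved + ".")):
            problems.append(f"{tool}: found {installed}, approved {approved}.x")
    return problems


# Hypothetical environment reported by a new hire's machine.
new_hire_env = {"python": "3.11.7", "pandas": "1.5.3"}

for line in find_deviations(new_hire_env, TEAM_STANDARD):
    print(line)
```

In practice the environment dictionary would be filled in from whatever the team already uses to pin dependencies, such as a lock file or requirements file, and the check could run as part of onboarding.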

The data science challenge

The challenge of scaling a data science team will likely linger for years to come as more companies invest in big data. Research found that 78% of companies claim that big data has the potential to fundamentally change the way they do business in the next one to three years, and 71% of companies believe big data will generate new revenue opportunities.

Data science team leaders should ensure proper infrastructure and organization before taking on many new hires. Furthermore, establishing a clear way of working will lessen confusion and lead to a more cohesive team.