After spending years navigating the complexities of evolving technologies and business demands in the data world, I’ve gathered some wisdom of my own—much of it centered on driving new developments or re-engineering existing ones.

I can’t say it was always an intuitive or straightforward journey to improve or create something from scratch, or that this path was free of mistakes. Yet, through these experiences, I have learned what it takes to build a new and high-performing data team, lead a new data project, or design an ecosystem of new data products that drive business value up.

With all the business and technical variables, the positive outcome usually depends on one constant—the right people who will support and shadow you.

At its core, the success of the new data platforms boils down to the proper selection of individual contributors with multidisciplinary and complementary skills. These individuals go beyond job titles; they bring together expertise across domains and a shared innovation mindset.

Because of this, one of my favorite sayings related to data staffing is:

“The best data engineers are runaway software engineers; the best data analysts, scientists, and solution (data) architects are runaway data engineers; and the best data product managers are runaway data analysts or scientists.”

Or in semi-visual format: SWE → DE → DA/DS/SA → DPM.

This flow motivated me to write this article, where I want to focus on answering the question: Why do you need multidisciplinary data roles and how do they navigate and balance their responsibilities in new settings?

Or, to say it bluntly, how different data roles “split the bill” when building a new data platform.

Moreover, I will address how they act as leads, shadows, or sparring (LSS) partners during different phases of the new data setting.

The New Data Settings and the Role Dynamics

There are usually four phases in building a new data platform: 1) preparation, 2) prototyping, 3) productionalizing, and 4) monetizing.

Speaking from my experience of working in medium to large data projects, to arrive at phase four, you mostly need a few key data roles:

Engineer/Architect
Analyst/Scientist
Business/Technical Analyst
Team/Product/Project Lead
Domain Expert/Stakeholder

Now you probably wonder, “Which roles take on the lead, shadow, and sparring tasks? And what’s the significance of the slashes in every role?”

Let me address the latter question first and explain the slashes. With project budgets in mind, few companies nowadays can afford to dedicate a single position to a single role in new developments. Hence, you will mostly find hybrid roles in these settings, held by people with an “M-shaped” profile who bring both depth and breadth of expertise.

Consequently, every role in a new setting can be a lead, a shadow, or a sparring partner in some capacity or scenario (which answers the first question above). This is where a multidisciplinary profile becomes crucial; it’s no longer about focusing on small project components but about taking on bigger responsibility and contributing to multiple areas.

This leads me back to the introduction, where I explained the wisdom of the SWE → DE → DA/DS/SA → DPM flow.

In other words, if you get a chance, staff the people with knowledge spanning multiple areas. As knowledge is power, they will understand what comes “before” and what comes “after.” Their ability to lead, shadow, and spar effectively will enhance the quality and efficiency of project delivery.

With this said, it’s important to note that the intensity of every role isn’t uniform across all phases.

But why?

Because the workload distribution shifts as development evolves. The technical roles are most active during the core technical phases, while the non-technical roles maintain consistent engagement across all phases.

To better understand how these roles evolve, I’ve created a matrix that maps their involvement across the four phases of new developments.

LSS dynamics per new data setting phase and data role. Image by author.

This matrix shows how LSS involvement shifts across the four development phases. The involvement levels can be summarized as follows:

Lead: Takes the primary ownership and direction.
Lead/Sparring: Primarily leads but also actively provides feedback.
Sparring: Provides parallel delivery, support, and/or feedback.
Shadow: Observes with limited participation, preparing to continue the delivery process.
Shadow/Sparring: Combines observation, preparation, parallel delivery, and occasional feedback.

Let’s dive into their tasks and a more detailed scope of the work per project phase.

1. Preparation Phase

This phase is about laying the groundwork. It’s where you identify the potential threats (business and technical problems that can be expected), outline probable solutions, and then translate these into project/product work packages, architectural plans, and budget estimates. With this in mind, the work-task distribution of the different data roles usually looks like this:

Lead roles:

Business/Technical Analyst: Typically leads the preparation by gathering requirements and assessing feasibility. The role bridges business and development inputs and helps set boundaries. More precisely, it provides input on what you need, which functionalities to focus on, and which timelines are realistic enough to meet business and technical expectations.
Team/Product/Project Lead: Coordinates tasks, timelines, approvals, and overall planning. This role maps out the big picture: project timelines, team composition, communication channels, and high-level solution design. Their main task is to organize the other roles according to their strengths and ensure the structure is in place to keep development on track in future phases. They are responsible for communicating with project sponsors and serve as a single point of contact.

Shadow/Sparring roles:

Domain Expert/Stakeholder: Functions as a shadow and sparring role, advising on specific business inquiries, and works closely with other business and technical roles in full plan creation. It usually pinpoints relevant data sources and supports setting the project success metrics.
Engineer/Architect: Acts as a sparring role here, contributing to the technical plan (data architecture and service definition) and feasibility assessment, but not as a lead. Their early architectural decisions impact the important aspects of the data platform (e.g., data quality, platform scalability or even delivery speed).
Analyst/Scientist: Has a shadow role with limited involvement, contributing in a similar way to the domain expert. They usually provide input to support the creation of the architecture by listing the analytical/data features that data services (e.g., BI or ML platforms) should have.

2. Prototyping Phase

This is where technical plans become tangible, and the technical data roles start their hands-on part in delivering proofs of concept and minimum viable products that will be brought to life in the next phase.

Lead roles:

Engineer/Architect: Responsible for building the initial prototype of the integration pipelines, with a definition of the data storage and management architecture, data quality tools, and orchestration services. In this phase, they aim to ensure that the selected technical infrastructure supports smooth data flow and scalability, especially if this is a first-time build.
Analyst/Scientist: Steps up to a lead role, building and prototyping data models or algorithms that will be new data products. They run preliminary analyses to explore data’s potential and design early-stage self-service models, or BI/ML reports/dashboards.

Shadow/Sparring roles:

Business/Technical Analyst: Becomes a sparring role, focusing on refining business requirements based on prototyping inputs from technical colleagues. In summary, they gather and share feedback from the business and technical side to ensure new data products are aligned with end-user expectations.
Team/Product/Project Lead: Acts as a sparring role, overseeing the work and resolving project blockers (e.g., connectivity issues to the source/target systems). In addition, this role keeps track of the project scope and ensures that business requirements don’t change too much.
Domain Expert/Stakeholder: Maintains a shadow/sparring role, with the task of validating the data product prototypes and providing feedback for their design. They check for the relevance of the delivered MVPs, ensuring the initial data products align with the expected business impact.

3. Productionalizing Phase

The production phase brings data products to life. This stage also covers the last-mile development and testing/improvement of new developments. It is a critical phase focused on deployment and its coordination, requiring collaboration between technical and business roles.

Lead roles:

Engineer/Architect: Remains in the lead seat by overseeing the deployment of the different data pipelines, performance adjustments, and maintenance setups to ensure a smooth production run.
Analyst/Scientist: Also a lead role, responsible for supporting the deployment of data models and their performance monitoring. Accordingly, they implement the necessary tweaks to deliver fully functional data products to end-users.
Team/Product/Project Lead: Has a lead role and responsibility for managing release coordination and communication with stakeholders. They manage the final handover to operations and conduct a post-launch review to assess the project’s performance.

Shadow/Sparring roles:

Business/Technical Analyst: Moves to a sparring role, supporting user acceptance testing (UAT) and documenting processes. They assist with creating user guides and training resources for new developments.
Domain Expert/Stakeholder: Remains in shadow mode, validating final outputs and giving the data products a sign-off, ensuring they align with business goals and are ready for market use.

4. Monetizing Phase

In the monetizing stage, non-technical roles are essential to ensure the new data platform and new products start creating actual business value.

Lead roles:

Business/Technical Analyst: A lead role that defines and collaborates on pricing strategy. If necessary, it also conducts market analysis to identify potential revenue streams. They articulate the product’s value proposition, ensuring it resonates with potential customers and stakeholders.
Domain Expert/Stakeholder: Takes on a lead/sparring role, sharing market insights and helping differentiate the new developments within the market. Their industry knowledge directly supports the product’s positioning.
Team/Product/Project Lead: As a lead role, manages the monetization strategy, explores partnerships, tracks revenue, and reports performance back to the team and project stakeholders.

Shadow/Sparring roles:

Engineer/Architect: As a shadow/sparring role, ensures the data infrastructure can scale and remain secure as the product is monetized. They set up the necessary compliance measures to align with market standards, especially if data privacy regulations are involved.
Analyst/Scientist: Acts as a sparring role and provides analytical insights and data product development insights to support the business side in monetizing new developments.
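
The role-by-phase assignments described in the four phases above can also be written down as a small lookup table. The Python sketch below is a hypothetical reconstruction based only on the phase descriptions in this article, not a copy of the matrix image. The names `Involvement`, `LSS_MATRIX`, `involvement`, and `leads_in` are illustrative assumptions, and treating Lead/Sparring as a lead position is a simplification made here.

```python
from enum import Enum


class Involvement(Enum):
    """The five involvement levels listed in the article."""
    LEAD = "Lead"
    LEAD_SPARRING = "Lead/Sparring"
    SPARRING = "Sparring"
    SHADOW_SPARRING = "Shadow/Sparring"
    SHADOW = "Shadow"


PHASES = ("Preparation", "Prototyping", "Productionalizing", "Monetizing")

I = Involvement  # short alias to keep the table readable

# Assumption: each tuple follows the order of PHASES and is read off the
# per-phase role descriptions in the text, not from the original image.
LSS_MATRIX = {
    "Engineer/Architect": (I.SPARRING, I.LEAD, I.LEAD, I.SHADOW_SPARRING),
    "Analyst/Scientist": (I.SHADOW, I.LEAD, I.LEAD, I.SPARRING),
    "Business/Technical Analyst": (I.LEAD, I.SPARRING, I.SPARRING, I.LEAD),
    "Team/Product/Project Lead": (I.LEAD, I.SPARRING, I.LEAD, I.LEAD),
    "Domain Expert/Stakeholder": (
        I.SHADOW_SPARRING, I.SHADOW_SPARRING, I.SHADOW, I.LEAD_SPARRING,
    ),
}


def involvement(role: str, phase: str) -> Involvement:
    """Return the involvement level of a role in a given phase."""
    return LSS_MATRIX[role][PHASES.index(phase)]


def leads_in(phase: str) -> list[str]:
    """List the roles that lead a phase (Lead/Sparring counts as leading)."""
    leading = {Involvement.LEAD, Involvement.LEAD_SPARRING}
    return [role for role in LSS_MATRIX if involvement(role, phase) in leading]


if __name__ == "__main__":
    for phase in PHASES:
        print(f"{phase}: {', '.join(leads_in(phase))}")
```

Reading the table row by row shows how a single hybrid role moves between leading, sparring, and shadowing as the platform matures; reading it column by column shows who owns each phase.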

Final Remarks

Building a new data platform isn’t just about modern technology or tools. It’s more about selecting people with multidisciplinary expertise who can collaborate and balance responsibilities in new settings to drive business value up.

By exploring the different hybrid data roles and their dynamics across the four delivery phases, I wanted to show how the success of a new data platform depends on each role’s LSS capabilities.

So, the next time you find yourself at the beginning of a new development, ask yourself one question: Do I have the right people with the right knowledge?

Originally published in Towards Data Science.

The Lead, Shadow, and Sparring Roles in New Data Settings
