Is Distributed Data Ownership Right for Your Organization?
You go to your nearest supermarket, grab a basket, fill it with everything you need and head to the checkout. You scan each item's barcode, pay, bag your shopping and leave, all without interacting with a single member of staff.
Now imagine that your business stakeholders want to answer a specific question. They bring their data into a data lake or data warehouse, curate it and build a BI report on it, again without relying on another team to enable them. Welcome to the self-service data platform: “a platform that enables stakeholders to ingest, curate, share and report on their own data without relying on another team.”
Traditionally, data platforms are built, maintained and iterated on by central teams that manage both the infrastructure and the data. This makes the central team a dependency and a bottleneck for every new initiative and requirement. To address this challenge, many organizations have embarked on the path of distributed data ownership or, in simpler words, multi-tenancy.
Each data owner is responsible for building, maintaining and operating its own data pipelines. The platform team is solely responsible for the underlying infrastructure and for providing new capabilities that make life easier for data owners. In theory, this is a great idea: you empower your end users to be self-reliant. In practice, it is not all roses.
Most of the challenges with self-service have little to do with the technology that underpins the platform. Let's look at the most notable ones.
Senior management buy-in
Moving to a self-service data platform is more of a mindset shift than a technological one. Buy-in from your senior management is essential for the success of your initiative. It helps steer the organization, and its use of data, in the right direction.
Senior leadership is primarily responsible for championing the concept of self-service analytics. Unless they see value in it, they will not be able to persuade their peers to adopt it.
Lack of relevant skills
One of the most common complaints from data owners is: “Our team does not have people skilled in data engineering.” If you think about the premise behind self-service, you realize that its whole point is to empower and enable our users.
These users have traditionally been analysts or business users who do not come from an engineering background. Asking them to do data engineering is no easy feat, so teams that hired analysts now need to start hiring data engineers. While that is doable, it still takes a few months to hire people, ramp them up and start delivering.
You’d think that hiring data engineers and building a team that can take care of everything is a one-time investment. Now imagine tens of groups each hiring data engineers, and multiply that by the cost of acquiring those resources. You are now looking at dozens of skilled engineers spread across your organization.
Each team adopts ways of working based on its own backlog, priorities and objectives, and distributed data engineering teams are no exception. Standard ingestion, curation, sharing and reporting patterns are therefore difficult to enforce. More often than not, your data platform will suffer from a lack of common data engineering practices.
This is a significant problem because internal and external teams depend on one another to share and distribute data. Multiple teams apply different patterns in silos to solve the same problem, essentially reinventing the wheel.
Delegating control of data and pipelines to multiple business teams creates a new challenge for data access and security. No matter how many rules you apply, the risk of data spreading beyond its intended audience is always there. The more access teams have, the greater the risk of data leakage and inappropriate access across the organization.
There is a fine line between restricting access and enabling users to do their jobs. With too many restrictions, nothing gets done; with too few, the data is more accessible than it needs to be. Adequate controls are essential to reduce your operational risk while still enabling self-service analytics.
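In practice these controls live in your platform's own access management tooling. The following is only a minimal Python sketch of one common way to strike that balance: default-deny grants. A team may perform an action on a dataset only if that exact grant exists. The `AccessPolicy` class, the team and dataset names and the action strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    # Hypothetical policy store: maps (team, dataset) to the set of granted actions.
    grants: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def grant(self, team: str, dataset: str, action: str) -> None:
        # Record an explicit permission for one team, one dataset, one action.
        self.grants.setdefault((team, dataset), set()).add(action)

    def revoke(self, team: str, dataset: str, action: str) -> None:
        # Remove a permission if present; revoking a missing grant is a no-op.
        self.grants.get((team, dataset), set()).discard(action)

    def is_allowed(self, team: str, dataset: str, action: str) -> bool:
        # Default deny: anything not explicitly granted is refused.
        return action in self.grants.get((team, dataset), set())


# Hypothetical usage: the owning team can read and write, another team can only read.
policy = AccessPolicy()
policy.grant("marketing", "campaign_results", "read")
policy.grant("marketing", "campaign_results", "write")
policy.grant("finance", "campaign_results", "read")

print(policy.is_allowed("finance", "campaign_results", "read"))   # True
print(policy.is_allowed("finance", "campaign_results", "write"))  # False: never granted
```

Denying by default pushes errors toward the "too many restrictions" side. A missing grant blocks someone, and they will ask for it. An over-broad grant, by contrast, tends to go unnoticed.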
The most powerful tool for democratizing your data platform is winning promoters, and the way to do that is to socialize your idea with them. If you demonstrate the capabilities and benefits your platform will bring before it is even built, you will start off on the right foot.
Start small
As good as it sounds, it may not be wise to ask your business teams to adopt end-to-end self-service from day one. The goal is full self-service, but breaking it into smaller steps will only help your cause. You might start by enabling self-service reporting while the platform team creates standard frameworks and patterns for ingestion and curation. Delegate responsibilities gradually so that you do not overwhelm business teams, balancing where you are today with where you want to go.
If you give a group of people pen and paper and ask them to draw a picture, rest assured that no two drawings will be the same. The same applies to data engineering. To prevent chaos, implement standard practices and patterns, and train engineering teams to use them. The aim is for teams to follow an existing path where one exists and to create a new one only when there is none.
When distributed teams work towards a common goal, the need for continuous collaboration grows. Training sessions, chat groups and regular workshops where teams share findings, lessons, processes and practices will increase adoption of your data platform.
The data platform has become an essential component of business strategy. Organizations large and small are hungrier than ever for faster insight. Investing in a data platform requires careful thought and implementation, and the last thing you want is to build something slow and unusable. However you design your platform, give equal priority to usability by focusing on how stakeholders will actually use it.
“Build a platform. Prepare for the unexpected… You’ll know you’re successful when the platform you build serves you in unexpected ways.” (Pierre Omidyar)
#lessons #learned #building #selfservice #data #platform #Manvik #Kathuria #July