Disclaimer: Many points made in this post have been derived from discussions with various parties, but they do not represent the views of any individual or organisation.
Defining clear roles, responsibilities and ways of working is very important. Although my other post has already described the Engine and the Driver, it is worth understanding which capabilities should remain centralised and which should be decentralised for an organisation to become more effective in its data analytics journey.
Let's start by looking at the essential functions required to facilitate a data-driven organisation.
- Infrastructure - with the exception of a few highly regulated sectors, data infrastructure has been moving towards the cloud. Thanks to serverless architecture and container technology, the shift not only reduces operational complexity but also allows for higher reliability, availability and scalability, which are essential attributes of a data platform.
- Data Pipelines - data must be moved and processed robustly from one point to another to make it suitable for consumption. A data pipeline can be either a simple ELT/ETL process or a complex orchestration that includes real-time streaming and modelling. The emergence of streaming engines such as Storm, Flink and Spark has also made real-time analysis easier.
- Reporting and Analysis - the ultimate goal of investing in data is to generate additional value. This can be a simple process that turns data into informational summaries or a complex analysis that extracts meaningful insights in a descriptive, predictive or prescriptive way. The output of such reporting or analysis can be presented in different ways, depending on the usability and functionality requirements.
- Other Functions - security and governance are considered intrinsic functions of the data platform. Access controls and appropriate policies must be in place to safeguard against attacks and unintended use of sensitive data. Suitable capabilities for monitoring, auditing and billing are also essential, depending on the operational requirements of each organisation.
Before considering which capabilities should be decentralised or remain centralised, it is worth understanding what can happen in different contexts.
The quadrant above maps some products and capabilities against the level of automation and the data literacy within the business. The two extreme corners represent sluggish insights (top-left) and over-engineering (bottom-right). When a centralised function is unable to serve the organisation quickly, it is common for the business to set up a shadow IT function or turn to third-party vendors, which often results in duplicated effort and expensive tactical solutions. Similarly, without adequate data literacy, a highly automated data platform is just a costly investment that delivers little value.
What should remain centralised?
Regardless of the maturity of an organisation, the main functions that should remain centralised are infrastructure, security and governance. However, stakeholders may perceive centralisation as causing higher overhead and lower productivity, and this perception can be valid in specific scenarios. For example, the function may be underbudgeted and, since it is likely to be a cost centre, it can be difficult to justify more resources. Another example is when the team is governed by a very rigid process or operates on a "waterfall for everything" principle.
With the right level of automation and process optimisation, the organisation can still be agile. Generally, it is more cost-effective to have a dedicated DataOps team to develop, operate and maintain the infrastructure and platform. Also, from a non-IT point of view, running a cybersecurity team or a governance team may not seem essential at first, as its impact is less tangible than that of commercial functions. However, these functions are parts of the engine that must strictly follow best practices and the relevant policies and regulations. Delegating or ignoring these capabilities is likely to put the organisation at risk.
A dedicated team often provides a more consistent and robust long-term service as its expertise continues to develop. If in doubt, ask simple questions such as: "Do I need cybersecurity personnel in logistics?" and "Do I want to be responsible for platform security?". If the business decides to invest in these areas, it is more practical to become a sponsor than to try to embed the capabilities within the department.
What can be decentralised?
Reporting and analysis are natural functions that individual business units can perform based on their domain expertise. However, reports that are useful to the whole organisation can be delegated to a centralised team for automation and optimisation. For example, that team can build meaningful data assets that all business areas can use.
Once the data platform becomes more mature, with advanced features such as a self-service portal and a data science infrastructure, the business can start experimenting with data. Such experimentation enables teams to create a better semantic layer, built on existing data sets or data assets, that is more relevant to end users. This is the first step towards accelerating the discovery of additional insights.
With a well-developed governance framework, the analytics teams can start to conduct more data exploration and cross-functional analytics involving data beyond their own area. Needless to say, this requires significant effort and strong collaboration between departments, both technically and politically. Depending on the maturity and skill set of an individual team, the orchestration of last-mile data pipelines can potentially be decentralised to create a micro data warehouse.
It is worthwhile for leaders on a data journey to regularly consider whether the business would benefit from decentralising the capabilities mentioned above. If the teams can gain more agility to demonstrate value while being held accountable, why not?