Being agnostic has become a recurring trend in the technology sector: manufacturing that is vendor-agnostic, applications that are platform-agnostic, workloads that are cloud-agnostic, and the list goes on. However, when it comes to a data environment, there are a few things to consider.
From an architecture point of view, being technology-agnostic allows an organisation to focus on its strategy through a set of principles and patterns, which promotes the reuse of solutions to recurring problems. It also helps the development team to think more deeply about the true value proposition when assessing a new tool or a new library. However, regular improvements and assessments of the technology stack are needed to maintain that impartiality. The following elements are worth looking at when it comes to a data environment.
Cloud Infrastructure

Building a data environment in the cloud is generally more enticing than creating an on-premises solution, as it minimises operational complexity while maximising speed of delivery. Major cloud providers such as AWS, Azure, and GCP offer a good range of the services required to create a scalable and reliable environment, and their offerings usually catch up relatively quickly with the latest technologies. With a well-designed architecture, a tool like Terraform can make the provisioning workflow largely cloud-agnostic, even though individual resource definitions remain provider-specific.
Since comparable products exist across the cloud infrastructure providers, an organisation can base its selection on criteria such as compliance, manageability, proprietary services, and the relevant skills that already exist within the organisation. For other factors, such as cost, service level, and security, there is no significant difference between the major providers.
Suggestion: Choose a cloud provider for the data infrastructure based on existing capabilities and the target landscape, and invest in infrastructure as code to become cloud-agnostic.
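To illustrate the intent behind infrastructure as code, the sketch below uses Python as a stand-in for a declarative tool like Terraform (the `BucketSpec`, `to_aws`, and `to_gcp` names are hypothetical): the desired state is captured once in a provider-neutral spec, and the provider-specific details are confined to a thin translation layer.

```python
from dataclasses import dataclass

@dataclass
class BucketSpec:
    """Provider-neutral description of an object storage bucket."""
    name: str
    versioned: bool

def to_aws(spec: BucketSpec) -> dict:
    # Rough shape of an AWS S3 bucket resource as Terraform would declare it.
    return {
        "resource": "aws_s3_bucket",
        "bucket": spec.name,
        "versioning": {"enabled": spec.versioned},
    }

def to_gcp(spec: BucketSpec) -> dict:
    # Rough shape of the equivalent Google Cloud Storage bucket resource.
    return {
        "resource": "google_storage_bucket",
        "name": spec.name,
        "versioning": {"enabled": spec.versioned},
    }

# One declaration of intent, two provider-specific realisations.
spec = BucketSpec(name="analytics-raw", versioned=True)
aws_cfg = to_aws(spec)
gcp_cfg = to_gcp(spec)
```

Swapping providers then means replacing one translation function, not rewriting the rest of the stack.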
Workload

Making the workload platform-agnostic is relatively straightforward thanks to container technology. Engineers develop the code once and deploy it on any platform that supports containers. While the build configuration may need slight modification to achieve such repeatable behaviour, the effort to migrate the workload is low, as little to no code change is required.
However, some workloads may benefit from serverless or other managed computing models; the trade-off is that developers give up full control over the runtime environment.
Suggestion: Build and run the workload in containers to become platform-agnostic, and design the code to be modular with well-defined interfaces to achieve excellent portability.
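A minimal sketch of the "well-defined interfaces" point, assuming Python and a hypothetical `ObjectStore` interface: the workload codes against an abstraction, so moving between platforms or vendor SDKs changes only which implementation is wired in.

```python
from typing import Protocol

class ObjectStore(Protocol):
    """The minimal interface the workload depends on, not a vendor SDK."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...

class InMemoryStore:
    """Stand-in implementation; an S3- or GCS-backed class would satisfy
    the same interface without any change to the calling code."""
    def __init__(self) -> None:
        self._objects: dict = {}
    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
    def get(self, key: str) -> bytes:
        return self._objects[key]

def archive_report(store: ObjectStore, report: bytes) -> None:
    # The workload only knows about the interface, so swapping the
    # backing platform is a deployment decision, not a code change.
    store.put("reports/latest", report)

store = InMemoryStore()
archive_report(store, b"quarterly numbers")
```

The same `archive_report` function runs unchanged in a container on any platform, with the storage backend chosen at deployment time.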
Data Integration

Plenty of data integration tools with different approaches are available in the market. A good one should be scalable, portable, and easy to use, and it should support a wide range of source and target systems without compromising data integrity.
Ensuring that the data integration is tool-agnostic can be a challenge at present, as many implementations are vendor-specific. Nevertheless, the market is moving in a direction where customers can plug and play these tools with little effort.
Suggestion: Choose a fully managed service tool that implements standard patterns with abundant integration options over a self-hosted solution with proprietary designs.
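One way to see what standard patterns buy you, as a hedged sketch with hypothetical stage names: if extract, transform, and load are plain callables with agreed shapes, any tool that can be wrapped in those shapes becomes swappable.

```python
from typing import Callable, Iterable

# The pipeline only depends on three stage shapes, not on any vendor.
Extract = Callable[[], Iterable[dict]]
Transform = Callable[[dict], dict]
Load = Callable[[Iterable[dict]], int]

def run_pipeline(extract: Extract, transform: Transform, load: Load) -> int:
    return load(transform(row) for row in extract())

# Stand-in stages; a real pipeline would wrap a source system and a warehouse.
def extract_orders() -> Iterable[dict]:
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "7.0"}]

def normalise(row: dict) -> dict:
    # Cast the amount from the source's string format to a number.
    return {**row, "amount": float(row["amount"])}

sink = []
def load_rows(rows: Iterable[dict]) -> int:
    sink.extend(rows)
    return len(sink)

loaded = run_pipeline(extract_orders, normalise, load_rows)
```

Replacing a source, target, or transformation means providing a different callable with the same shape; the orchestration code stays put.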
Data Warehouse

Many companies still use traditional relational databases (RDBMS) to drive their analytics agenda, and that is perfectly fine as long as it meets the business requirements. However, organisations that want to gain additional insights from large, cross-functional data sets may have to consider a solution that scales better in terms of data volume, schema complexity, and concurrency. Although most data warehouses support querying with SQL, there are always constraints and variations in the analytics functions. Of course, achieving similar capabilities across vendors is just a matter of time, so these differences should not be significant deciding factors.
Suggestion: Choose a data warehouse solution that fits within the business context, aligns with the strategy, and is suitable given existing capabilities.
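The "constraints and variations" point can be made concrete with a common case, the median, where dialects genuinely diverge. The snippet below is an illustration only; the exact syntax should be verified against each vendor's documentation.

```python
# Same logical question ("median order amount"), different dialect syntax.
# Hedged illustration only; check each vendor's docs before relying on it.
MEDIAN_SQL = {
    # PostgreSQL: ordered-set aggregate.
    "postgresql": (
        "SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY amount) "
        "FROM orders"
    ),
    # BigQuery: analytic (window) function, so DISTINCT collapses the rows.
    "bigquery": (
        "SELECT DISTINCT percentile_cont(amount, 0.5) OVER () "
        "FROM orders"
    ),
}

def median_query(dialect: str) -> str:
    try:
        return MEDIAN_SQL[dialect]
    except KeyError:
        raise ValueError(f"no median template for dialect: {dialect}")
```

Keeping such dialect-specific fragments in one place, rather than scattered through reports and dashboards, limits the cost of a future migration.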
Business Intelligence (BI) Tools
Selecting BI tools can be a bit trickier if the goal is to make them technology-agnostic. Analytics teams tend to build their semantics on top of the enterprise data virtualisation layer where and when they can. This semantic layer helps them understand the data better, using business terms common to each domain. However, building it with features available only in a specific tool often results in vendor lock-in and suboptimal performance, as many of these features act like a black box.
Suggestion: Build the semantic layer on top of a data warehouse instead of inside a BI tool to make it tool-agnostic.
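As a small sketch of what a warehouse-side semantic layer looks like (using an in-memory SQLite database as a stand-in warehouse; the table and column names are hypothetical): plain SQL views translate physical names into business terms, and any BI tool that speaks SQL can query them.

```python
import sqlite3

# Stand-in warehouse with a physically named fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_sales (cust_id INTEGER, amt_net REAL)")
conn.executemany(
    "INSERT INTO fct_sales VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)

# The semantic layer: a view that renames cryptic physical columns
# into domain terms, independent of any particular BI tool.
conn.execute("""
    CREATE VIEW customer_revenue AS
    SELECT cust_id AS customer_id, SUM(amt_net) AS net_revenue
    FROM fct_sales
    GROUP BY cust_id
""")

rows = conn.execute(
    "SELECT customer_id, net_revenue FROM customer_revenue "
    "ORDER BY customer_id"
).fetchall()
```

Because the business terms live in the warehouse, switching BI tools means repointing connections, not rebuilding the semantic model.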
It is of utmost importance to weigh the benefits of being technology-agnostic against the practicality of achieving such a trait within a reasonable time frame.
Perfect is the enemy of good - Voltaire
There are important topics not covered here that are worth discussing separately, for example data catalogues, data governance tools, and portable data science infrastructure. Whatever the element, the selection criteria should follow the guidelines written in the data architecture document to avoid unnecessary technical debt. I hope the above suggestions are useful for leaders with long-term ambitions in data analytics.