Transforming the Future of Data Architecture

Design your data management architecture for the way your data and analytics (D&A) teams organize, share and analyze data.

5 Ways a Data Architecture Discipline Can Assist the CDAO

Download this research to operationalize your data architecture discipline and align D&A capabilities with organizational goals.

Data Architecture is a strategic enabler for data and analytics success

To effectively connect business strategy with technical execution, chief data and analytics officers (CDAOs) must establish a dedicated data architecture discipline. This discipline enables D&A teams to plan effectively, connect strategy to actions and create long-term value for the organization. Download this research to:

  • Connect D&A initiatives to tangible business outcomes.

  • Facilitate cross-organizational technical collaboration.

  • Maintain a robust risk posture for D&A assets.

Design your data architecture for modern business needs

Modern organizations need a modular data architecture that supports complex enterprise environments while delivering data access to business users. Here are some key considerations.

Data architecture is evolving to deliver data self-service enabled by metadata

Data and analytics architecture best practices have evolved through several eras over the past few decades, as digital transformation initiatives have highlighted the need to modernize data strategy and capitalize on new opportunities to use data. These eras include:

  • Pre-2000 period — Enterprise Data Warehouse Era: Data architecture centered on the success of the enterprise data warehouse (EDW).

  • 2000-2010 — Post-EDW Era: This period was defined by fragmented data analysis. Data marts dependent on the data warehouse proliferated, and each data mart consolidation created yet another silo; depending on whom you asked, you got a different version of the truth, leading to inconsistent analytics.

  • 2010-2020 — Logical Data Warehouse (LDW) Era: This period saw more unified analysis of data via a common semantic layer that enabled access to data warehouses, data marts and data lakes. This remains the current best practice.

  • 2020-future — Active Metadata Era: The future will see augmented analysis of data using all the relevant data sources, accessed and enabled by advanced analytics, recommendation engines, data and AI orchestration, adaptive practices and metadata analysis. 

The drive to democratize data access and enable self-service analytics is motivating the current evolution from the LDW Era to the Active Metadata Era. CDAOs likewise hope to expand data use cases beyond what LDWs can handle, including master data management, interenterprise data sharing, B2B data integration, partner data sharing and application data integration, among others.

But what is metadata, and what role does it play in this evolution?

Metadata describes different facets of data, such as the data’s context. It is produced as a byproduct of data moving through enterprise systems. There are four types of metadata: technical, operational, business and social. Each type can be either “passive” metadata, which organizations collect but do not actively analyze, or “active” metadata, which identifies actions across two or more systems utilizing the same data.
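To make the passive/active distinction concrete, here is a minimal, hypothetical Python sketch of one way to model metadata records. The four types mirror the categories above, and the "active when consumed by two or more systems" test follows the definition above; the class names and attributes are illustrative assumptions, not any specific product's model.

    from dataclasses import dataclass, field
    from enum import Enum

    class MetadataType(Enum):
        TECHNICAL = "technical"      # schemas, data types, lineage
        OPERATIONAL = "operational"  # job runs, access logs, freshness
        BUSINESS = "business"        # glossary terms, ownership
        SOCIAL = "social"            # ratings, comments, peer usage

    @dataclass
    class MetadataRecord:
        asset: str                   # e.g., a table or report name
        type: MetadataType
        attributes: dict
        consumed_by_systems: set = field(default_factory=set)

        @property
        def is_active(self) -> bool:
            # Active metadata drives actions across two or more systems
            # using the same data; otherwise it is passive.
            return len(self.consumed_by_systems) >= 2

    # An operational freshness metric that both a catalog and an
    # orchestration tool react to counts as active metadata.
    freshness = MetadataRecord(
        asset="sales.orders",
        type=MetadataType.OPERATIONAL,
        attributes={"last_updated": "2024-01-15T06:00:00Z"},
        consumed_by_systems={"catalog", "orchestrator"},
    )
    assert freshness.is_active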

[Figure: The evolution of data management architecture beyond enterprise data warehouses and into the age of active metadata.]

Active metadata can enable automation, deliver insights and optimize user engagement, and is a key enabler of self-service analytics. Realizing its potential, however, requires a data architecture that balances requirements for repeatability, reusability, governance, authority, provenance and optimized delivery.

Data and analytics leaders see two options for evolving their data architecture from the LDW Era, where most operate today, toward the Active Metadata Era: data fabric and data mesh. These distinct concepts share the goal of providing easier access to data for everyone who uses it, including data scientists, data analysts and data engineers, as well as business data consumers. Though many data leaders talk about data fabric and data mesh as competing data architecture approaches, they are more accurately viewed as complementary.

Data fabric leverages existing assets from the logical data warehouse era

Data fabric is an emerging data management and data integration design concept (also see “Modernize Data Management to Increase Value and Reduce Costs”). Its goal is to attain flexible, reusable and augmented data integration to support data access across the business. 

Data fabric is a natural evolution for many organizations from their logical data warehouse models because it leverages existing technology and metadata in a modernized data architecture. There is no “rip and replace” with a data fabric design. Instead, it capitalizes on prior investments while providing prioritization and cost-control guidance for new data management spending.

Data fabrics deliver benefits across different perspectives:

  • Business Perspective: Enables less technical business users (including analysts) to quickly find, integrate, analyze and share data

  • Data Management Team Perspective: Delivers productivity gains for data engineers through automated data access and integration, and increases agility, allowing teams to close more data requests per day, week and year

  • Overall Organization Perspective: Faster time to insight from data and analytics investments; improved utilization of organizational data; reduced cost by analyzing the metadata across all participating systems and providing insights on effective data design, delivery and utilization

[Figure: The key components of a data fabric, including the layers of data integration, cataloging and analysis.]

Two factors determine whether a data fabric design is right for a given organization: metadata completeness and in-house data fabric subject matter expertise. Organizations with too little metadata will not see the benefits of data fabric. A lack of metadata also increases dependency on subject matter experts (SMEs) who can assist in discovering, inferring and even authoring metadata, which can negate the relatively low SME requirements of a data fabric design.

Data mesh, while appealing, requires a disciplined approach

Data mesh is an architectural approach that allows for decentralized data management. Its goal is to support efforts to define, deliver, maintain and govern data products in a way that makes them easy for data consumers to find and use. Data mesh architecture is based on the concept of decentralizing and distributing data responsibility to the people closest to the data, and sharing that data as a service.

The most common drivers for data mesh are: more data autonomy for lines of business (LOBs), less dependency on central IT, and leveraging decentralization of data to break down silos (though some data centralization within a mesh architecture may be warranted). Despite its obvious appeal, be aware of the following prerequisites and challenges.

Data mesh architecture is not yet an established best practice.

The term is associated with varied approaches that differ in organizational model, management of the data and technology implementation. The organizational drivers also vary: they include removing IT as a bottleneck, rationalizing siloed datasets resulting from LOB-led data pipeline creation, and cloud-modernization data management initiatives.

Data and analytics leaders should not adopt data mesh architecture as a seemingly easy solution to their data management challenges. Although it formalizes common practices, it devolves data accountability to LOB experts, which risks proliferating siloed data uses.

Data mesh success depends on the organizational model and data skills in LOBs.

If data literacy, autonomy and data skills vary greatly across departments, and if organizations lack the ability to operationalize data management activities, central IT will need to provide more support — at least at first. LOBs can evolve toward greater autonomy within a data mesh environment by creating new roles, such as data product owners, to manage the definition, creation and governance of data products. Organizations that lack commitment to building distributed data skills, however, should avoid data mesh.

[Figure: How data mesh architecture integrates federated governance across data products and lines of business.]

Data mesh architecture, design and technology implementation vary greatly.

Data mesh architecture implementations are generally cloud-based and use shared storage and processing. However, the tools used within each LOB for the delivery, maintenance and governance of data will vary greatly based on the use cases and the contract between producer and consumer. These contracts define the scope, SLAs and cost of operations for data products, covering attributes such as availability, compute costs, concurrency of access, governance and quality policies, context and semantics. Organizations that proceed without clear contracts in place often face shareability and reusability constraints, which undermines the goals of a data mesh architecture.
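As an illustration, such a contract can be captured as a small, machine-readable structure. The Python sketch below is hypothetical: the fields map to the contract elements named above (availability, compute costs, concurrency of access, governance and quality policies, context and semantics), but the names and values are assumptions for illustration only.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class DataProductContract:
        # Producer-consumer contract for one data product.
        product: str                   # e.g., "customer-360"
        owner: str                     # accountable LOB data product owner
        availability_slo: float        # uptime target, e.g., 0.999
        max_latency_minutes: int       # freshness SLA for consumers
        max_concurrent_readers: int    # concurrency of access
        monthly_compute_budget: float  # cost-of-operations ceiling
        quality_policies: tuple        # governance and quality policies
        semantics_url: str             # where context and semantics live

    contract = DataProductContract(
        product="customer-360",
        owner="marketing-lob",
        availability_slo=0.999,
        max_latency_minutes=60,
        max_concurrent_readers=50,
        monthly_compute_budget=2500.0,
        quality_policies=("no_null_keys", "pii_masked"),
        semantics_url="https://example.internal/catalog/customer-360",
    )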

Organizations need a federated governance model.

Data mesh shifts responsibility for data governance to domain application designers and users. For an LOB to autonomously build and expose data products, it must define local data governance and data management that comply with central guidance from the chief information security officer (CISO) and the chief data officer (CDO) or central governance board. In mature data mesh organizations, the business organization enforces its own governance policies with central IT support, not the other way around.
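Here is a minimal sketch of what local governance complying with central guidance could look like in code, assuming a simple dictionary-based policy model; the guardrail names and checks are illustrative, not the API of any real governance product.

    # Central guardrails set by the CISO and the CDO or governance board.
    CENTRAL_GUARDRAILS = {
        "required_controls": ("encryption_at_rest", "pii_classification"),
        "max_retention_days": 365,
    }

    def violations(local_policy: dict) -> list:
        """Return violations; an empty list means the LOB policy complies."""
        found = []
        for control in CENTRAL_GUARDRAILS["required_controls"]:
            if not local_policy.get(control, False):
                found.append(f"missing required control: {control}")
        if local_policy.get("retention_days", 0) > CENTRAL_GUARDRAILS["max_retention_days"]:
            found.append("retention exceeds central maximum")
        return found

    # An LOB policy that retains data too long fails the central check.
    print(violations({
        "encryption_at_rest": True,
        "pii_classification": True,
        "retention_days": 730,
    }))  # -> ['retention exceeds central maximum']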

Data mesh is a viable option for organizations with incomplete metadata: as long as they have data architects with subject matter expertise, they can start with data mesh and build their active metadata stores in parallel.

The complexity of modern environments demands a flexible data architecture

Data leaders operating with on-premises, cloud, multicloud, intercloud and hybrid deployments will need to revise their existing data architecture strategy to support their present and future complexity. A carefully planned and robust data architecture ensures that new technologies cohere with existing infrastructure and can support future demands — including integration and interoperability across cloud providers, SaaS solutions and on-premises resource deployments, among others. Focus your planning around the following activities:

  • Devise a strategy that addresses the whole data ecosystem. It's common even for organizations with initial cloud deployments to grow into a hybrid and multicloud environment over time. An overarching cloud strategy that prioritizes providers can govern additional cloud deployments and mitigate the risks that unsanctioned deployments pose to your data architecture.

  • Align data requirements to use cases. Distributed and complex use cases are now driving newer innovations that deliver business value — in particular, by enabling self-service data access. Success in the cloud will depend on the ability to satisfy business consumer use cases, which are most likely distributed in nature, close to data sources and operating on edge networks and devices.

  • Evaluate integration patterns. Rapid data growth and self-service data access have exacerbated the challenge of moving data across different cloud and on-premises systems with the right bandwidth, latency and throughput. Evaluate your integration patterns to identify a reliable and efficient data architecture that serves evolving business use cases and meets data compliance and sovereignty needs.

  • Embrace open source and open standards to future-proof data investments. Get familiar with open-source pricing models in the cloud, including charges for compute and storage resources. Use standards that are open or provider-neutral, and understand the options for open-source data stores, as well as open-source metadata standards that make metadata shareable across platforms in an enterprise environment (see the sketch after this list). Finally, have a support plan in place to address issues with open-source solutions.
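To illustrate the shareability point, the sketch below serializes catalog metadata as plain JSON using only the Python standard library. The field names are illustrative assumptions, not drawn from any specific open metadata standard.

    import json

    # Platform-neutral metadata for one data asset. Because it is plain
    # JSON, any catalog or pipeline tool can read it without vendor lock-in.
    asset_metadata = {
        "name": "sales.orders",
        "description": "Order fact table, one row per order line",
        "columns": [
            {"name": "order_id", "type": "string", "nullable": False},
            {"name": "amount", "type": "decimal(18,2)", "nullable": False},
        ],
        "owner": "finance-lob",
        "tags": ["gold"],
    }

    with open("sales_orders_metadata.json", "w") as f:
        json.dump(asset_metadata, f, indent=2)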

Data architecture FAQs

What are the latest trends and technologies in data architecture?

The latest trends and technologies in data architecture include:

  • Data Fabric: This design concept supports data access across the business through flexible, reusable, and augmented data integration. It leverages existing technology and metadata to modernize data architecture without a complete overhaul. 
  • Data Mesh: An architectural approach that decentralizes data management, assigning data ownership to business domains. It aims to support efforts to define, deliver, maintain, and govern data products, making them easy to find and use by data consumers. 
  • Active Metadata: The shift from passive to active metadata enables automation, delivers insights, and optimizes user engagement. Active metadata identifies actions across systems utilizing the same data, facilitating self-service analytics. 

What are the best practices for ensuring data architecture scalability and flexibility?

  • Modular Design: Build a modular architecture to allow independent scaling of components as demands evolve.
  • Microservices: Use microservices to deploy and scale specific data services independently, enhancing flexibility and agility (see the sketch after this list).
  • Elastic Scaling: Adopt cloud-native solutions for automatic scaling to handle varying workloads efficiently.
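As a minimal illustration of the microservices point, the sketch below exposes one narrowly scoped data service that can be deployed and scaled independently of the rest of the architecture. It assumes the FastAPI framework; the endpoint and in-memory dataset are hypothetical.

    from fastapi import FastAPI

    app = FastAPI()

    # Each microservice owns one narrow data capability, so it can be
    # versioned, deployed and scaled on its own.
    ORDERS = {"1001": {"status": "shipped"}, "1002": {"status": "pending"}}

    @app.get("/orders/{order_id}")
    def get_order(order_id: str) -> dict:
        return ORDERS.get(order_id, {"error": "not found"})

    # Run (assuming this file is saved as orders_service.py):
    #   uvicorn orders_service:app --reload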

How does data architecture comply with data privacy regulations like GDPR and CCPA?

  • Data Governance Frameworks: Implement strong governance policies to manage the data lifecycle, ensuring its proper creation, consumption and control in alignment with regulations.
  • Data Masking and Encryption: Apply these techniques to protect sensitive data and maintain compliance with privacy standards (see the sketch after this list).
  • Audit Trails: Keep detailed logs of data access and changes to support compliance monitoring and reporting.
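Here is a minimal sketch of the masking and encryption point: one-way hashing masks identifiers for analytics use, while reversible encryption preserves controlled access to the original value. It assumes the third-party cryptography package; the record and field names are illustrative.

    import hashlib
    from cryptography.fernet import Fernet

    def mask_email(email: str) -> str:
        # Irreversible mask, suitable for joining or counting in analytics.
        return hashlib.sha256(email.encode()).hexdigest()[:12]

    key = Fernet.generate_key()  # in practice, manage keys in a KMS
    cipher = Fernet(key)

    record = {"customer_id": "C-42", "email": "jane@example.com"}
    protected = {
        "customer_id": record["customer_id"],
        "email_masked": mask_email(record["email"]),
        "email_encrypted": cipher.encrypt(record["email"].encode()),
    }
    # Only holders of the key can recover the original email.
    assert cipher.decrypt(protected["email_encrypted"]).decode() == record["email"]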
