Data Architecture
Data architecture describes the structure of an organization’s logical and physical data assets and data management resources, according to The Open Group Architecture Framework (TOGAF). It is an offshoot of enterprise architecture that comprises the models, policies, rules, and standards that govern the collection, storage, arrangement, integration, and use of data in organizations. An organization’s data architecture is the purview of data architects.
The goal of data architecture is to translate business needs into data and system requirements and to manage data and its flow through the enterprise. Many organizations today are looking to modernize their data architecture as a foundation to fully leverage AI and enable digital transformation. Consulting firm McKinsey Digital notes that many organizations fall short of their digital and AI transformation goals due to process complexity rather than technical complexity.
Data Architecture Principles
Data is a shared asset. A modern data architecture needs to eliminate departmental data silos and give all stakeholders a complete view of the company.
Users require adequate access to data. Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs.
Security is essential. Modern data architectures must be designed for security, and they must support data policies and access controls directly on the raw data.
Common vocabularies ensure common understanding. Shared data assets, such as product catalogs, fiscal calendar dimensions, and KPI definitions, require a common vocabulary to help avoid disputes during analysis.
Data should be curated. Invest in core functions that perform data curation (modeling important relationships, cleansing raw data, and curating key dimensions and measures).
Data flows should be optimized for agility. Reducing the number of times data must be moved lowers cost, increases data freshness, and improves enterprise agility.
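As a rough illustration of the security principle above, the following sketch applies a field-level access policy directly to raw records rather than to downstream copies. The roles, the field names, and the `apply_policy` helper are all hypothetical and chosen only for illustration; a production system would typically enforce such rules in the storage or query layer.

```python
from typing import Any

# Hypothetical policy: the fields each role may read on the raw data.
POLICY: dict[str, set[str]] = {
    "analyst": {"order_id", "region", "amount"},
    "support": {"order_id", "customer_email"},
}


def apply_policy(record: dict[str, Any], role: str) -> dict[str, Any]:
    """Return only the fields the given role is allowed to see."""
    allowed = POLICY.get(role, set())  # unknown roles see nothing
    return {key: value for key, value in record.items() if key in allowed}


# Illustrative raw record (made-up values).
raw = {
    "order_id": 1,
    "region": "EMEA",
    "amount": 120.0,
    "customer_email": "a@example.com",
}

print(apply_policy(raw, "analyst"))
print(apply_policy(raw, "support"))
```

Keeping the policy in one place, next to the raw data, means every consumer sees the same rules, which also supports the shared-asset and common-vocabulary principles.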
Data Architecture Components
A modern data architecture consists of the following components, according to IT consulting firm BMC:
Data pipelines. A data pipeline is the process by which data is collected, moved, and refined. It spans data collection, refinement, storage, analysis, and delivery.
Cloud storage. Not all data architectures leverage cloud storage, but many modern data architectures use public, private, or hybrid clouds to provide agility.
Cloud computing. In addition to using cloud for storage, many modern data architectures make use of cloud computing to analyze and manage data.
APIs. Modern data architectures use APIs to make it easy to expose and share data.
AI and ML models. AI and ML are used to automate tasks such as data collection and labeling. At the same time, modern data architectures can help organizations unlock the ability to leverage AI and ML at scale.
Data streaming. Data streaming is the continuous flow of data from a source to a destination for processing and analysis in real time or near real time.
Container orchestration. A container orchestration system such as open-source Kubernetes is often used to automate software deployment, scaling, and management.
Real-time analytics. The goal of many modern data architectures is to deliver real-time analytics: the ability to analyze new data as it arrives in the environment.
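To make the pipeline, streaming, and real-time analytics components more concrete, here is a minimal sketch that chains three stages with Python generators so that each record is processed as it arrives. The stage names (`collect`, `refine`, `rolling_mean`), the sample sensor readings, and the window size are assumptions made for illustration; real architectures would typically rely on a dedicated streaming platform rather than in-process generators.

```python
from collections import deque
from typing import Iterable, Iterator


def collect() -> Iterator[dict]:
    """Collection stage: yield raw events one at a time, as a stream would."""
    raw_events = [
        {"sensor": "a", "value": "10"},
        {"sensor": "a", "value": "bad"},  # malformed reading
        {"sensor": "a", "value": "20"},
        {"sensor": "a", "value": "30"},
    ]
    yield from raw_events


def refine(events: Iterable[dict]) -> Iterator[float]:
    """Refinement stage: parse values and drop records that fail to parse."""
    for event in events:
        try:
            yield float(event["value"])
        except ValueError:
            continue


def rolling_mean(values: Iterable[float], window: int = 2) -> Iterator[float]:
    """Analysis stage: emit an updated mean as each new value arrives."""
    recent: deque = deque(maxlen=window)  # keeps only the last `window` values
    for value in values:
        recent.append(value)
        yield sum(recent) / len(recent)


# Delivery stage: here the results are simply gathered and printed.
results = list(rolling_mean(refine(collect()), window=2))
print(results)  # [10.0, 15.0, 25.0]
```

Because each stage is lazy, no intermediate copy of the full data set is created, which mirrors the principle of reducing how often data must be moved.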