How to build a data architecture to drive innovation—today and tomorrow
Yesterday’s data architecture can’t meet today’s need for speed, flexibility, and innovation. The key to a successful upgrade—and significant potential rewards—is agility.
Over the past several years, organizations have had to move quickly to deploy new data technologies alongside legacy infrastructure to drive market-driven innovations such as personalized offers, real-time alerts, and predictive maintenance.
However, these technical additions—from data lakes to customer analytics platforms to stream processing—have increased the complexity of data architectures enormously, often significantly hampering an organization’s ongoing ability to deliver new capabilities, maintain existing infrastructures, and ensure the integrity of artificial intelligence (AI) models.
Current market dynamics don’t allow for such slowdowns. Leaders such as Amazon and Google have been making use of technological innovations in AI to upend traditional business models, requiring laggards to reimagine aspects of their own business to keep up. Cloud providers have launched cutting-edge offerings, such as serverless data platforms that can be deployed instantly, enabling adopters to enjoy a faster time to market and greater agility. Analytics users are demanding more seamless tools, such as automated model-deployment platforms, so they can more quickly make use of new models. Many organizations have adopted application programming interfaces (APIs) to expose data from disparate systems to their data lakes and rapidly integrate insights directly into front-end applications. Now, as companies navigate the unprecedented humanitarian crisis caused by the COVID-19 pandemic and prepare for the next normal, the need for flexibility and speed has only amplified.
For companies to build a competitive edge, or even to maintain parity, they will need a new approach to defining, implementing, and integrating their data stacks, leveraging both cloud (beyond infrastructure as a service) and new concepts and components.
Six shifts to create a game-changing data architecture
We have observed six foundational shifts companies are making to their data-architecture blueprints that enable more rapid delivery of new capabilities and vastly simplify existing architectural approaches. They touch nearly all data activities, including acquisition, processing, storage, analysis, and exposure. Even though organizations can implement some shifts while leaving their core technology stack intact, many require careful re-architecting of the existing data platform and infrastructure, including both legacy technologies and newer technologies previously bolted on.
Such efforts are not insignificant. Investments often range from tens of millions of dollars, to build capabilities for basic use cases such as automated reporting, to hundreds of millions of dollars, to put in place the architectural components for bleeding-edge capabilities such as real-time services that can compete with the most innovative disruptors. It is therefore critical for organizations to have a clear strategic plan, and data and technology leaders will need to make bold choices: prioritizing the shifts that most directly advance business goals and investing in the right level of architectural sophistication. As a result, data-architecture blueprints often look very different from one company to another.
When done right, the return on investment can be significant (more than $500 million annually in the case of one US bank, and 12 to 15 percent profit-margin growth in the case of one oil and gas company). We find these types of benefits can come from any number of areas: IT cost savings, productivity improvements, reduced regulatory and operational risk, and the delivery of wholly new capabilities, services, and even entire businesses.
So what key changes do organizations need to consider?
1. From on-premises to cloud-based data platforms
Cloud is probably the most disruptive driver of a radically new data-architecture approach, as it offers companies a way to rapidly scale AI tools and capabilities for competitive advantage. Major global cloud providers such as Amazon (with Amazon Web Services), Google (with the Google Cloud Platform), and Microsoft (with Microsoft Azure) have revolutionized the way organizations of all sizes source, deploy, and run data infrastructure, platforms, and applications at scale.
One utility-services company, for example, combined a cloud-based data platform with container technology, which packages microservices (such as searching billing data or adding new properties to an account) to modularize application capabilities. This enabled the company to deploy new self-service capabilities to approximately 100,000 business customers in days rather than months, deliver large amounts of real-time inventory and transaction data to end users for analytics, and reduce costs by “buffering” transactions in the cloud rather than on more expensive on-premises legacy systems.
Enabling concepts and components
— Serverless data platforms, such as Amazon S3 and Google BigQuery, allow organizations to build and operate data-centric applications at virtually unlimited scale, without the hassle of installing and configuring solutions or managing workloads. Such offerings can lower the expertise required, speed deployment from several weeks to as little as a few minutes, and require virtually no operational overhead.
— Containerized data solutions using Kubernetes (which are available via cloud providers as well as open source and can be integrated and deployed quickly) enable companies to decouple and automate deployment of additional compute power and data-storage systems. This capability is particularly valuable in ensuring that data platforms with more complicated setups, such as those required to retain data from one application session to another and those with intricate backup and recovery requirements, can scale to meet demand.
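As an illustration of the second bullet, a stateful data store on Kubernetes is typically declared as a StatefulSet, which couples the compute pods to per-replica persistent storage so the platform can scale and recover automatically. The sketch below is a minimal, hypothetical example (names, image, and sizes are invented for illustration), not a production configuration:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: analytics-db          # hypothetical service name
spec:
  serviceName: analytics-db
  replicas: 3                 # scale compute by changing this value
  selector:
    matchLabels:
      app: analytics-db
  template:
    metadata:
      labels:
        app: analytics-db
    spec:
      containers:
      - name: postgres
        image: postgres:15
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:       # each replica gets its own durable volume
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

Because storage is declared alongside compute, data survives pod restarts, which is exactly the "retain data from one session to another" property mentioned above.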
2. From batch to real-time data processing
The costs of real-time data messaging and streaming capabilities have decreased significantly, paving the way for mainstream use. These technologies enable a host of new business applications: transportation companies, for instance, can inform customers as their taxi approaches with accurate-to-the-second arrival predictions; insurance companies can analyze real-time behavioral data from smart devices to individualize rates; and manufacturers can predict infrastructure issues based on real-time sensor data.
Real-time streaming functions, such as a subscription mechanism, allow data consumers, including data marts and data-driven employees, to subscribe to “topics” so they can obtain a constant feed of the transactions they need. A common data lake typically serves as the “brain” for such services, retaining all granular transactions.
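The subscription mechanism described above can be sketched in a few lines of Python. This is a toy, in-memory stand-in for what a platform such as Kafka provides durably and at scale; the topic name and payload are invented for illustration:

```python
from collections import defaultdict


class TopicBus:
    """Minimal in-memory publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        # A data mart or an analyst's tool registers interest in a topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber to the topic receives the message as it arrives.
        for callback in self._subscribers[topic]:
            callback(message)


# Usage: a hypothetical "payments" topic feeding a downstream data mart.
bus = TopicBus()
received = []
bus.subscribe("payments", received.append)
bus.publish("payments", {"account": "A-123", "amount": 42.0})
```

A real messaging platform adds what this toy omits: persistence, partitioning, ordering guarantees, and replay for late-joining consumers.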
Enabling concepts and components
— Messaging platforms such as Apache Kafka provide fully scalable, durable, and fault-tolerant publish/subscribe services that can process and store millions of messages every second for immediate or later consumption. They support real-time use cases, bypass existing batch-based solutions, and carry a much lighter footprint (and cost base) than traditional enterprise messaging queues.
— Stream-processing and analytics solutions such as Apache Kafka Streams, Apache Flume, Apache Storm, and Apache Spark Streaming allow for direct analysis of messages in real time. This analysis can be rule based or involve advanced analytics to extract events or signals from the data. Analysis often incorporates historic data to compare patterns, which is especially vital in recommendation and prediction engines.
— Alerting platforms such as Graphite or Splunk can trigger business actions to users, such as notifying sales representatives if they’re not meeting their daily sales targets, or integrate these actions into existing processes that may run in enterprise resource planning (ERP) or customer relationship management (CRM) systems.
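To make the last two bullets concrete, the sketch below combines a rolling window over a message stream (comparing each value against recent history) with a simple rule-based trigger. The threshold, window size, and sensor values are invented for illustration; a real deployment would run this logic in a framework such as Spark Streaming or Kafka Streams rather than plain Python:

```python
from collections import deque


def stream_alerts(readings, window=3, threshold=1.5):
    """Yield an alert whenever a reading exceeds `threshold` times the
    rolling average of the previous `window` readings (a rule-based check
    against historic data, as described above)."""
    history = deque(maxlen=window)
    for value in readings:
        if len(history) == window:
            baseline = sum(history) / window  # rolling average of recent data
            if value > threshold * baseline:
                yield ("ALERT", value, baseline)
        history.append(value)


# Usage: a spike in real-time sensor data triggers a business action.
sensor_feed = [10, 11, 10, 30, 10]
alerts = list(stream_alerts(sensor_feed))
```

In a production pipeline, the yielded alerts would be pushed to an alerting platform or written back into an ERP or CRM workflow rather than collected in a list.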
3. From pre-integrated commercial solutions to modular, best-of-breed platforms
To scale applications, companies often need to push well beyond the boundaries of legacy data ecosystems from large solution vendors.
Many are now moving toward a highly modular data architecture that uses best-of-breed and, frequently, open-source components that can be replaced with new technologies as needed without affecting other parts of the data architecture.
The utility-services company mentioned earlier is transitioning to this approach to rapidly deliver new, data-heavy digital services to millions of customers and to connect cloud-based applications at scale. For example, it offers accurate daily views of customer energy consumption and real-time analytics insights comparing individual consumption with that of peer groups. The company set up an independent data layer that includes both commercial databases and open-source components. Data is synced with back-end systems via a proprietary enterprise service bus, and microservices hosted in containers run business logic on the data.
Enabling concepts and components
— Data pipeline and API-based interfaces simplify integration between disparate tools and platforms by shielding data teams from the complexity of the different layers, speeding time to market, and reducing the chance of causing new problems in existing applications. These interfaces also allow for easier replacement of individual components as requirements change.
— Analytics workbenches such as Amazon SageMaker and Kubeflow simplify the building of end-to-end solutions in a highly modular architecture; such tools can connect with a wide variety of underlying databases and services.
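The interface idea in the first bullet can be sketched as a thin data-access API that hides which store actually serves a request, so an individual component can be replaced without touching its consumers. The class and method names below are hypothetical, invented purely to illustrate the pattern:

```python
class CustomerDataAPI:
    """Thin API layer: consumers call get_profile(); which backing
    store answers the call is an implementation detail they never see."""

    def __init__(self, store):
        self._store = store  # any object exposing fetch(customer_id)

    def get_profile(self, customer_id):
        return self._store.fetch(customer_id)


class LegacyWarehouse:
    """Stand-in for an existing on-premises store."""
    def fetch(self, customer_id):
        return {"id": customer_id, "source": "warehouse"}


class CloudDocumentStore:
    """Stand-in for a best-of-breed replacement component."""
    def fetch(self, customer_id):
        return {"id": customer_id, "source": "cloud"}


# Swapping the component changes nothing for API consumers.
api = CustomerDataAPI(LegacyWarehouse())
profile_before = api.get_profile("C-1")

api = CustomerDataAPI(CloudDocumentStore())
profile_after = api.get_profile("C-1")
```

The same decoupling applies at architecture scale: as long as the interface contract holds, the store behind it can be retired or upgraded without a ripple effect through existing applications.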
See you in the next post, which will cover the following topics:
- From point-to-point to decoupled data access
- From an enterprise warehouse to domain-based architecture
- From rigid data models toward flexible, extensible data schemas
- How to get started
Kadıköy, İstanbul – TURKEY
Author: M. Temel AYGÜN, Ph.D. in Aerospace Engineering
Copyright belongs to the author.