By Mike Gianfagna, Sr. Director, Synopsys
High-performance computing (HPC) has taken on many meanings over the years. The primary goal of HPC is to provide the computational power needed to run a data center, a utilitarian facility dedicated to storing, processing, and distributing data. Historically, the data being processed was the output of business operations for a given organization: transactions, customer profiles, sales details, and the like. The goal was typically to create business intelligence out of a sea of transactional information.
Sometimes, the data represented research. The business intelligence in this case was the pursuit of knowledge of some physical effect: how could we infer more from measured data to advance our knowledge, and our profits? Drug discovery, oil field analysis, and weather prediction are all examples of this kind of data processing.
For many, many years, data centers performed these kinds of operations with a complement of hardware consisting primarily of storage, computation, and communication in an arrayed configuration. More data meant more copies of storage, computation, and communication.
There are two key attributes of this type of data center. First, the data being processed is generated by real-world events. Sales, transactions, and physical observational data for research are all examples of data that is generated at speeds consistent with human interaction. Second, the processing of the data to create information was done using procedural software systems, written and debugged by humans.
Over about the past 10 years, the paradigm described above has undergone a fundamental shift, both in the volume of data to process and how it is processed. Let’s look at both phenomena.
Data is no longer generated primarily by human events. Thanks to widespread sensor deployment, coupled with a hyperconnected environment, all types of devices are generating data at an exponentially increasing rate. Your smartwatch captures details about your exercise regimen and your health. According to one study, an autonomous vehicle can generate 5TB of data per hour of operation. If you consider how many such vehicles will be in operation in the coming years, you can clearly see a data avalanche.
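To get a feel for the avalanche, here is a back-of-the-envelope sketch using the cited 5TB-per-hour figure. The fleet size and daily operating hours are hypothetical assumptions chosen purely for illustration.

```python
# Illustrative estimate of fleet-wide data generation.
# Only TB_PER_HOUR comes from the article; the rest are assumptions.

TB_PER_HOUR = 5          # per-vehicle rate cited in the text
HOURS_PER_DAY = 8        # assumed operating hours per day (hypothetical)
FLEET_SIZE = 100_000     # assumed number of vehicles (hypothetical)

daily_tb = TB_PER_HOUR * HOURS_PER_DAY * FLEET_SIZE
daily_pb = daily_tb / 1_000  # 1 PB = 1,000 TB (decimal units)

print(f"{daily_tb:,} TB/day, or about {daily_pb:,.0f} PB/day")
```

Even with these modest assumptions, a single fleet produces petabytes of data every day, which is the scale that data center architects now have to plan for.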
Statista puts the composite of these trends in perspective. The chart below shows the projected volume of data created, captured, copied, and consumed worldwide from 2010 to 2025, in zettabytes. One zettabyte is approximately one thousand exabytes, or one billion terabytes. Note that in 2010, the planet contained two zettabytes of data, against a projected volume of 181 zettabytes by 2025.
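A quick sanity check on those units and figures (using decimal SI prefixes, where 1 ZB = 1,000 EB = 1,000,000,000 TB):

```python
# Unit conversions and growth factor for the Statista figures in the text.

TB_PER_ZB = 10**9    # 1 zettabyte = one billion terabytes
EB_PER_ZB = 10**3    # 1 zettabyte = one thousand exabytes

data_2010_zb = 2     # worldwide data volume in 2010 (ZB)
data_2025_zb = 181   # projected worldwide data volume in 2025 (ZB)

growth_factor = data_2025_zb / data_2010_zb
print(f"2010: {data_2010_zb * TB_PER_ZB:,} TB")
print(f"2025: {data_2025_zb * TB_PER_ZB:,} TB projected")
print(f"Roughly {growth_factor:.1f}x growth in 15 years")
```

That is roughly a 90-fold increase in a decade and a half, which is why replicating yesterday's storage architecture simply cannot keep up.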
And the way intelligence is extracted has shifted as well. Various forms of artificial intelligence (AI) are used to extract key, actionable insights from all this data. Inference approaches can discern speech and visual patterns. Reinforcement learning techniques can identify the best outcomes from a massive number of possibilities.
This kind of processing doesn’t look like traditional software either. It isn’t code written by a human. Rather, it’s a massive array of processing elements that learn by examining huge quantities of information and outcomes.
With this fundamental shift in information generation and information processing and storage over the past decade, it’s no surprise that data center architectures have changed dramatically over the same period. The strategy of replicating storage, computation, and communication elements to meet the need simply doesn’t work.
The explosion in data volumes demands new approaches to storage that rely on distributed networks. Computation can no longer be done with a central processing unit, no matter how fast it is. Instead, custom processing elements that are optimized to specific workloads are needed. Lots of them, doing different tasks in a synchronized way on massive, distributed data sets. Communication is also quite different. Concepts such as a discrete network interface card (NIC) and top-of-rack switches in a server rack in a data center are no longer efficient from a performance (latency) point of view for moving data within the data center.
There are now organizations where inferring intelligence from massive amounts of data is part of the core business. These companies have led the way into the data center renaissance. Known as hyperscalers, these companies have re-defined both the architecture of data centers and their place in society. Google, Amazon, Facebook, Microsoft, Alibaba, Baidu, and Tencent are hyperscalers.
They have each advanced the state of data center design and information processing in their own way. Google built its Tensor Processing Unit, or TPU, to provide the right architecture to run AI algorithms. Amazon built AWS Trainium for the same reason. In fact, virtually all the hyperscalers are building custom chips to power their data centers.
The way data centers are configured is also changing. Key elements such as memory, storage, processing power, and network bandwidth are now pooled. These resources can then be combined and deployed based on the needs of a particular workload as opposed to configuring the right mix of those resources in a server. As the workloads change, the architecture of the data center changes. This approach is known as a composable data center.
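The pooling idea described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration of the composable concept, not any vendor's actual resource manager; all class names, workloads, and capacity numbers are invented for the example.

```python
# Minimal sketch of composable infrastructure: resources live in shared
# pools and are carved out per workload, rather than being fixed per server.
from dataclasses import dataclass

@dataclass
class ResourcePool:
    cpu_cores: int
    memory_gb: int
    storage_tb: int

    def allocate(self, cpu_cores: int, memory_gb: int, storage_tb: int) -> bool:
        """Reserve a workload's slice of the pool if capacity allows."""
        if (cpu_cores <= self.cpu_cores
                and memory_gb <= self.memory_gb
                and storage_tb <= self.storage_tb):
            self.cpu_cores -= cpu_cores
            self.memory_gb -= memory_gb
            self.storage_tb -= storage_tb
            return True
        return False

pool = ResourcePool(cpu_cores=1024, memory_gb=8192, storage_tb=500)

# Two very different workloads draw different mixes from the same pool.
pool.allocate(cpu_cores=512, memory_gb=4096, storage_tb=50)   # AI training job
pool.allocate(cpu_cores=64, memory_gb=1024, storage_tb=200)   # database service

print(pool)  # whatever remains is free for the next workload
```

The key point is that the ratio of compute to memory to storage is decided per workload at allocation time, not baked into the server hardware up front.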
The business model for data centers has also changed. While on-premises, private facilities are still quite prevalent, the extreme cost of building and operating next-generation facilities can be prohibitive. As a result, those who can build them do, and they also sell capacity to those who can’t. This was the birth of cloud computing. The process is similar to what happened with chip fabrication. Many companies owned and operated wafer fabs until the cost became prohibitive and the technology became very complex. At that point, a few key players emerged who provided wafer fab capability to whoever needed it.
As they say, the devil is in the details. That is definitely true of the data center renaissance. The end result is quite spectacular in scope and impact. But implementing all this capability brings its own set of challenges.
It is well known that Moore’s law is slowing. Going to the next process node won’t get you the performance, energy efficiency, and cost reduction needed to succeed. The scaling benefits of Moore’s law are still important, but other strategies come into play.
The scale complexity of Moore’s law is now supplemented with a series of strategies that exploit systemic complexity. Purpose-built, custom devices that execute specific AI algorithms are one; TPU and Trainium are examples of this. Methods to compose multi-die designs, built from chips, dense memories, or chiplets, into a single system are another daunting challenge, as are adding massive amounts of memory as 3D stacks and synchronizing large, highly complex software stacks to run on these new architectures.
Public cloud computing has also put a high emphasis on security. The information and insights created in public cloud data centers are highly valuable. Hardware and software systems are needed to safeguard the security of that information.
This new era of innovation combines the scale complexity of Moore’s law with new approaches that exploit systemic complexity. We call this the SysMoore era and it is changing life as we know it.
Taming the design complexity of large-scale chips and systems of chips is a critical item for success. Methods to integrate all this technology into a single, unified system are another. Robust verification, strong security, and reliability are needed as well, along with low energy consumption and a trusted source of pre-verified building blocks.
The good news is that Synopsys has a comprehensive focus on high-performance computing and data center development. We provide end-to-end solutions for the entire process, with robust design optimization and productivity. Our portfolio of pre-verified IP covers all requirements. We can even provide design services to help you build your next venture, with skilled resources across design, implementation, and verification EDA tools and methodologies, as well as the integration of complex IP in advanced technology nodes.
We can also show you how to monitor and optimize the performance of your design after it’s deployed in the field and integrate the latest photonics capabilities as well.
Data centers have indeed come a long way. They are no longer just a utility. The insights produced by state-of-the-art facilities improve our health and safety, bring our society together, and even improve the planet’s environment. They are truly now the center of the universe.
Visit the Synopsys High-Performance Computing & Data Center Solution site and see what your future might look like.