Google infrastructure chief Urs Hölzle: This is the future of software and the cloud

  • admin
  • September 3, 2018


Urs Hölzle knows a thing or two about the rapid evolution of data centers and the infrastructure behind them. As a Google Fellow and senior vice president of technical infrastructure, he has helped define that evolution over the past couple of decades as the search and ad giant built its own industry-leading global infrastructure. Now, Google aims to leverage all that to become a bigger force in public cloud computing.

For the typical enterprise, we see a lot of move to the cloud, we see colocation, we see hyperconvergence, we see containers, we see composable infrastructures. A lot of trends are coming together right now and I wonder how this is all going to look in five years.

Cloud actually is about a software platform. Obviously, it involves hosting or virtual machines, but that’s not what’s going to be the lasting value. It really is about a software stack that is more uniform, makes you more productive, makes you more secure.

Look at the history of open source. Twenty years ago there was nothing that was relevant to an enterprise that was open source. Maybe BSD [Berkeley Software Distribution version of Unix], but basically nothing. Five years later, 2003, Linux and the LAMP stack [Linux, the Apache HTTP Server, the MySQL relational database management system and the PHP programming language] was pretty common already. Java wasn’t quite open source, but I’ll throw it in there. Basically, every five years afterwards, the amount of IT where open source was relevant was bigger.

And the part about open source isn’t so much that it’s free. It’s the way to standardize things. Linux is portable not because there is a big book that documents what Linux is, but because everyone’s using the same source code and therefore obviously they’re all compatible and they have portability. That’s really been our angle on cloud: “Look, it’s a software stack, first with Kubernetes and now with Istio and cloud services platform. Look, here’s all these things where it makes no sense to be different, because there’s no differentiation.” You actually get a working piece of code that is not encumbered by commercial or IT concerns.

How does Google’s approach fit in there?

We’re moving that up the stack and we’re saying that, before Kubernetes and before Istio, there were too many things, like orchestrating workloads, load balancing, how you configure load balancing, how you configure logging or access control, that everyone was doing. And everyone was doing more or less the same thing, but they were all doing it in different ways.

Our approach to a data center on-premises or in the cloud is to say, up to the service level, what is the service? Who can call it? How does it log? How does audit logging or debug logging work? How does tracing requests between services work? How do you instantiate services? How do you discover a service? How do you load balance the service? All these basic things. None of them new. All of them done for decades. How are they standardized? And the answer is through Kubernetes and through Istio, which is really the next stage of the higher level part of Kubernetes.

What we announced at [the recent conference Cloud] Next is that we’re going to have both an open source version but then also a managed version of that so that we actually can manage your workloads on-premises or you can manage the workloads in the cloud. What we demonstrated was an application that ran across both. And when you looked at the administration of that application, you could not tell the difference between the on-premises world and the off-premises world. They were really the same because they were ultimately both running a Kubernetes workload and the Kubernetes API is standardized and we were running the same Kubernetes code on both sides. I think five years from now, everyone’s going to run the stack.

The stack is Kubernetes, it’s Istio, it’s the LAMP stack. Are there elements of the stack that you think are going to go mainstream like that?

Kubernetes and Istio don’t really do anything except support services, either services you built or services you consume from a third party. It’s really the infrastructure that runs services without implementing the service itself. Five years from now, all the services you consume from SaaS providers will run on top of that stack because that’s good for SaaS providers because [they can] implement it once. They can deploy it on any cloud and they can deploy it on-premises.

And then all the services that you write yourself as an enterprise, obviously you’re going to use that tool for the same reason, because when you write the service, you don’t have to commit [to whether] this service is going to run on-premises or in the cloud, and if so, which cloud. Kubernetes and Istio basically free you from that coupling.

Do the distinctions between on-prem and cloud and colo and whatever options you have just disappear in that scenario?

They don’t disappear because there still are things that may not work on-premises, that may work only on one cloud. Let’s say one second you are using zero CPUs and in the next second you are using a thousand CPUs because you’re running a query and a second afterwards, you’re running zero. You can’t implement that on-premises because there’s just not enough slack.

But everything else…. If you have a Java app or a legacy app or a newly written microservice, then you’re not going to have these gratuitous differences between the different environments. It really looks like what you got with Linux. If you have a Linux app, you’re not asking, “Do I have to rewrite it if I move it from on-premise to Google or to Amazon?” No, of course not, because Linux is the same anyplace. And that will be true at the higher level as well in terms of how you instantiate the service, how do you secure it and how you scale it.

Now there are cloud providers who don’t want that to happen and will try to pull customers back in a proprietary direction. How do you think that conflict works out?

Open source has been winning in pretty much every category where it had a strong product. You can see it in Kubernetes. Something like 75 percent of enterprises are using Kubernetes and four years ago it didn’t exist. It’s so obvious that it’s the right thing. I don’t actually think that resistance will last five years. I think it will last two years because the customer interest is so large. One customer after another is telling us that they really are struggling with the discrepancy between how things work in the cloud and how things work on-premises, even on simple things.

Both Google and Amazon this summer came out with on-premises versions of their cloud. What was the thinking there on your part?

We saw a lot of demand where people said, “Look, Kubernetes is awesome. But when I use it on-premises I have to administer it myself and I wish I didn’t. And I wish I could have an experience that’s more like Google Kubernetes Engine, which basically automates all the maintenance and upgrades and things of the Kubernetes system itself.” That’s what we built.

It’s actually quite different from what Amazon has or what Microsoft has because Amazon’s system or Microsoft’s system is a proprietary API. If you use it on-premises, there’s only one way you can go into the cloud because it’s not actually a standard. It’s what they did and nobody else has that. Kubernetes you can run on-premises and you can run on any cloud, not just the three big ones. It works on any Chinese cloud. It works on IBM mainframes. It really works everywhere.

In the world that you’re describing where essentially anything can run anywhere, how should IT organizations prepare to train people appropriately for that?

That’s actually one of the other really big advantages of the Kubernetes ecosystem because, when a company adopts that on-premises and in the cloud, it is a new stack. But they can train everyone on the same stack. And as applications move around, they don’t have to retrain people because all the details remain the same.

You’re much less likely to make mistakes. You don’t have the gaps between the environments that otherwise happen where you thought you were implementing the same thing on both sides. But because they work differently, you actually misunderstood one of them and now you have a gap in your security setup. That’s why I actually think training is gonna be another area that really pushes people toward Kubernetes.

With Kubernetes, you obviously filled a critical gap with the orchestration process. But what technology gaps still exist before we reach this perfect world you’re describing?

Kubernetes was great for orchestration, but Kubernetes was more about the “how” than the “what.” You could package things up into a container and deploy it. It’s actually very useful for that.

But Kubernetes had no opinion about what the thing actually is that you’re deploying. Most of the time it’s a service or an application. Istio really talks directly about that service or the application that’s inside the container and lets you manage that service or the application and its security properties. That’s the biggest gap actually that we’re filling with Istio.

There’s quite a few things on top that then actually have a chance of emerging because once you agree on what services are, and how to secure them, how to discover them, then you can build, for example, more industry-specific things on top of it. I would expect those things to emerge too over time. Once everyone agrees on what a service is, how to discover it, et cetera, then, for example, multiple SaaS providers can get together and integrate their applications in a better way for their customers and this integration will then work in every cloud and on-premises.

Why is that so important?

You have your phone, right? Your phone updates applications overnight and you never look at that. You don’t run a program and it just works. It’s not because the software engineers who write mobile apps are perfect. It’s because, by and large, all of the users are on the same platform. So they can test things really well against that platform. Then they can do a 1 percent rollout… and find problems early before they affect the whole population. And as a result, you get a real frictionless update experience.
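[Editor's illustration: the interview does not describe how a 1 percent rollout is implemented. One common approach, sketched here under stated assumptions, is to hash a stable identifier into a bucket and enable the update only for identifiers whose bucket falls below the rollout percentage. The function name `in_rollout`, the salt `update-v2` and the `device-N` identifiers are hypothetical and not drawn from any Google system.]

```python
import hashlib


def in_rollout(user_id: str, percent: float, salt: str = "update-v2") -> bool:
    """Return True if this user falls inside the first `percent` of buckets.

    The hash makes the choice stable: the same user gets the same answer on
    every call, and raising `percent` only ever adds users, never removes them.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto 10,000 buckets (0.01% granularity).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < int(percent * 100)


# Hypothetical population of 100,000 devices, with the update at 1 percent.
users = [f"device-{i}" for i in range(100_000)]
selected = [u for u in users if in_rollout(u, 1.0)]
share = len(selected) / len(users)
print(f"{len(selected)} of {len(users)} devices get the update ({share:.2%})")
```

Because the bucket depends only on the salt and the identifier, widening the rollout from 1 percent to 10 percent keeps every early recipient in the group, which is what lets problems be found on a small slice before the whole population is affected.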

Today in the enterprise, God help you. It’s not frictionless and you’re afraid of any change because it’s going to have a problem. And the reason is because every enterprise is different. This whole ecosystem hasn’t happened yet and therefore, even if it’s the same software provider with the same updates, every enterprise disagrees on pretty much everything. For IT to be back in the information curve that is more like the consumer [side], the update has to be like a consumer update.

What is the motivation for a company like Google to encourage customers to build apps that are multicloud-friendly, that can run on your competitors’ infrastructure?

First of all, we’re smaller than AWS, so … that is more likely to benefit us than them. But really, the key thing is that, if … enterprises adopt a standardized system, then the value we can deliver just gets much higher. Going against a single platform is much, much easier than doing the traditional enterprise things for everyone that’s different. It really helps us innovate and helps our customers innovate.

And last but not least, we, of course, hope that if you’re adopting Kubernetes on-premises, you’re going to look to the Google Kubernetes Engine to manage that on-premises environment and your cloud environment as well. Therefore, we are a likely choice for those on-premises users to be their cloud users because they’re already using our tools to administer things on-premises and in the cloud. I think the times of everything being proprietary, they ended 15 years ago, right? It’s not going to work.

What stays proprietary in this world?

Let’s say today I’d say [Google’s] BigQuery is the world’s best data warehouse. If you want the world’s best data warehouse … you got to come to Google. Cloud TPUs [Google’s Tensor Processing Unit chips] are the fastest hardware [for machine learning]. If you want to have them, you got to come to Google. Spanner, the horizontally scalable SQL [database], exists in only one place. It’s Google Cloud.

Everyone has their differentiators and there is plenty of room to differentiate. What we’re really seeing is that, in the bread-and-butter things that every enterprise is doing, that is not the place to differentiate. It’s really just the next era that has not yet been standardized through open source that we’re addressing here.