Emad Benjamin

Chief Technologist, Application Platforms, VMware

Emad has spent the past 25 years in software engineering roles building application platforms and distributed systems for industries such as finance, health, IT, and heavy industry, in a range of international locations. He is currently Sr. Director and Chief Technologist of Application Platforms in the Office of the CTO at VMware, focusing on building application-aware hybrid cloud distributed runtimes.

Presentations

Come to this session to learn about building application platforms capable of handling new deployment paradigms such as microservices, fast data, big data, and functions. While these paradigms have offered immense developer velocity and productivity, they often lead to runtime challenges from performance and scalability perspectives.

I will demonstrate how the complexity of the old monolith has shifted to a new complexity of a distributed nature. What used to be an in-memory call is now a network hop away to another service, and this comes at a cost; can the platform be made smarter to handle it? It turns out the answer is absolutely yes! The rise of the application platform has made way for specialized runtime layers to be encoded into the platform, handling network distribution complexities so that you don’t have to worry about them. For example, if two microservices happen to call each other 99% of the time across the network within one application domain (or multiple domains, for that matter), then why do they need to be distributed so far from each other? Why suffer such a latency burden? What if a specialized network layer could detect this and bring the two services close to each other so that the latency between them is minimal? There are many other patterns I will discuss here, but this essentially gives rise to a specialized layer the industry is calling the service mesh. What if I could specify a latency-tolerance limit across a sequence of calls between microservices, and indicate that regardless of what happens I want this layer to guarantee the limit is never exceeded? That would be super cool!
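The co-location idea above can be sketched in a few lines. This is a hypothetical illustration only: the class, method names, and thresholds are assumptions for the sake of the example, not the actual mesh implementation.

```java
// Hypothetical sketch: decide whether a mesh layer should co-locate two
// microservices, assuming it tracks per-pair call counts and the measured
// network latency between the hosts they currently run on.
public class ColocationSketch {

    // callsBetween: calls from service A to service B in an observation window
    // totalCallsFromA: all outbound calls from A in that window
    // crossNodeLatencyMs: measured latency of the hop between their hosts
    static boolean shouldColocate(long callsBetween, long totalCallsFromA,
                                  double crossNodeLatencyMs) {
        double share = (double) callsBetween / totalCallsFromA;
        // Co-locate when most of A's traffic flows to B (the "99% of the
        // time" case from the abstract) and the network hop is expensive.
        return share >= 0.9 && crossNodeLatencyMs > 1.0;
    }

    public static void main(String[] args) {
        System.out.println(shouldColocate(990, 1000, 5.0)); // true: mostly A->B, costly hop
        System.out.println(shouldColocate(100, 1000, 5.0)); // false: traffic too diffuse
    }
}
```

A real mesh would of course base this on sliding windows and percentile latencies rather than a single threshold, but the decision shape is the same.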

Come to this session to learn how we solved a fairly complex problem: maintaining predictable response times across a set of service calls spread across multiple clouds. Over the past few years, many have embraced microservices-based architectures to increase flexibility and speed of feature delivery.

However, with this comes the challenge of maintaining consistent performance, scale, and good response times. In this session we will talk about a specialized controller we have written, called the Predictable Response Time Controller (PRTC), to help with the challenges of maintaining scale and response times.
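To make the idea concrete, here is a minimal sketch of the kind of feedback loop such a response-time controller might run. The class name, the proportional-scaling assumption, and the numbers are all hypothetical; PRTC's actual internals are not described here.

```java
// Hypothetical response-time feedback loop: compare measured p99 latency
// against a target and adjust the instance count, assuming latency improves
// roughly in proportion to added capacity.
public class ResponseTimeController {

    static int desiredInstances(int current, double measuredP99Ms,
                                double targetP99Ms, int maxInstances) {
        if (measuredP99Ms <= targetP99Ms) {
            return current;                     // within SLA, hold steady
        }
        double scale = measuredP99Ms / targetP99Ms;
        int desired = (int) Math.ceil(current * scale);
        return Math.min(desired, maxInstances); // never exceed the fleet cap
    }

    public static void main(String[] args) {
        System.out.println(desiredInstances(4, 300.0, 200.0, 20)); // scale out: 6
        System.out.println(desiredInstances(4, 150.0, 200.0, 20)); // hold: 4
    }
}
```

A production controller would also need smoothing and a scale-in path so it does not oscillate, but the loop above captures the core contract: measure, compare to the SLA, and act.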

There is no doubt that much debate and opinion exists in our industry as to what a cloud native app platform really is. Regardless of your definition, it is certain that we simply don't have time to re-write all of the second-generation monolithic apps, yet we must modernize at the same time. With this reality in mind, I begin the discussion of how to build platforms that are both 3rd-generation and 2nd-generation capable while delivering feasible enterprise application platforms.

As for the cloud native movement, it is important to understand the elements of this phenomenon moving rapidly through our industry: a phenomenon of building platforms, not just business logic software but infrastructure as software. I humbly believe that the drive towards these platform solutions is due to the following fact: approximately half of new applications fail to meet their performance objectives, and almost all of these have 2x more cloud capacity provisioned than what is actually needed. As developers we live with this fact every day, always chasing performance and feasible scalability, but never actually cementing it into a scientific equation where it is predictable; instead it has always been trial-based, and heavily prone to error. As a result we find ourselves grappling with some interesting platforming patterns of this decade, and unfortunately we are led to believe that patterns such as microservices, 3rd platforms, cloud native, and 12-factor are mainly a change in coding patterns. To the contrary, these patterns reveal a much deeper shift in the way developers view and consume infrastructure. In fact these patterns represent a major change in “deployment” approach: a change in how we deploy and structure code artifacts within application runtimes, and how those application runtimes can leverage the underlying cloud capacity. These are not code design patterns, but rather platform engineering patterns, with a drive towards using APIs/software to define application platform policies that manage scalability, availability, and performance in a predictable manner. In this session I will briefly inspect the platform patterns we built over the last decade, and the ones we are building for the next.
The main objective of the session will be to lay out the definition of platform engineering as the software engineering science needed to understand how to precisely deploy application components onto application runtimes, and how, in turn, one should appropriately map the application runtimes onto the infrastructure they need. With this knowledge you should be able to more effectively decide when to scale out (and by how many instances) and when to scale up, both manually and dynamically as needed within your software-defined platform. These are key ingredients for knowing how to run 2nd- and 3rd-gen application platforms at the same time under one platform.
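The scale-out/scale-up decision above is ultimately arithmetic. The following is an illustrative sketch under assumed inputs (all names and numbers are hypothetical, not a sizing formula from the session itself):

```java
// Hypothetical sizing arithmetic: instances needed from throughput demand
// (scale-out) and heap needed from live data (scale-up).
public class SizingSketch {

    // How many instances to meet peak demand, with headroom for bursts.
    static int instancesNeeded(double peakRequestsPerSec,
                               double perInstanceRequestsPerSec,
                               double headroomFactor) {
        return (int) Math.ceil(peakRequestsPerSec * headroomFactor
                               / perInstanceRequestsPerSec);
    }

    // Common rule of thumb: reserve heap beyond live data so the GC has
    // room to work; the exact factor is workload-dependent.
    static double heapGbNeeded(double liveDataGb, double gcHeadroomFactor) {
        return liveDataGb * gcHeadroomFactor;
    }

    public static void main(String[] args) {
        System.out.println(instancesNeeded(10000, 1500, 1.25)); // 9 instances
        System.out.println(heapGbNeeded(2.0, 1.5));             // 3.0 GB heap
    }
}
```

The point of writing it down as code is the session's thesis: once the inputs are measured rather than guessed, scaling stops being trial-and-error and becomes a predictable calculation.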

We often meet customers that have migrated to the public cloud only to later determine that some of their critical legacy application patterns have simply carried over into the public cloud implementation, and that they are now paying higher costs due to this design flaw. Regardless of cloud location, what really matters is how well you have abstracted the application platform nature of your enterprise workloads. If you don’t understand your application workloads in terms of scalability, performance, reliability, security, and overall management, then you are simply shifting the problem from one cloud to another.

IT practitioners are bringing their old habits to new problems. This problem is deeply rooted in the knowledge gap that exists between development and operations organizations. In this session, we talk about the notion of the application platform and how it helps close the gap between developers and infrastructure architects. At the most fundamental level you can think of application platforms as an abstraction of three major parts: 1) application code logic; 2) the application runtime where the code runs; and 3) infrastructure abstractions such as CaaS, K8s, and fundamental IaaS. We will also cover the notion of the Hybrid Cloud Runtime (HCR) as a common control plane that helps provide common observability across such multi-cloud distributed applications. At the most fundamental level, HCR is made up of a service mesh and a set of application-runtime-aware controllers that manage SLAs and help SREs optimize their day-to-day interactions with such systems.

In-memory databases have now become permanent components of the enterprise application stack, and knowing how to size, scale, and tune them in VMware vSphere or bare-metal environments is a paramount skill. In recent years, we have seen in-memory clusters of 1 to 5 TB of memory within a single cluster driving millions of transactions per day. Not only do these systems have zero tolerance for failure, most also expect predictable throughput and response times. In this session, we visit the most common deployment patterns and the choices you have to make in placing the server components vs. the consumption/ingestion clients. We will also inspect various transaction volumes and discuss common administration tasks.

This session will be a sizing deep dive: how to best size the cache nodes, how to size the virtual environment, and other considerations for making these systems highly available and scalable, with predictable performance. In the case of Java-based in-memory DBs, we will take a deep dive into various GC algorithms and how to best configure JVMs.
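As a back-of-the-envelope illustration of cache-node sizing, the cluster's total memory footprint is roughly the data volume times the replication factor times a per-entry overhead factor, divided across nodes. The names and factors below are illustrative assumptions, not recommendations from the session:

```java
// Hypothetical cluster-sizing sketch for an in-memory data cluster.
public class ClusterSizingSketch {

    // dataTb: logical data volume; replicationFactor: copies kept for HA;
    // overheadFactor: per-entry memory overhead (indexes, object headers);
    // perNodeUsableTb: memory per node left after OS/JVM reservations.
    static int nodesNeeded(double dataTb, int replicationFactor,
                           double overheadFactor, double perNodeUsableTb) {
        double totalTb = dataTb * replicationFactor * overheadFactor;
        return (int) Math.ceil(totalTb / perNodeUsableTb);
    }

    public static void main(String[] args) {
        // e.g. 5 TB of data, 2 copies, 30% overhead, 1 TB usable per node
        System.out.println(nodesNeeded(5.0, 2, 1.3, 1.0)); // 13 nodes
    }
}
```

The real exercise in the session goes deeper (NUMA boundaries, GC behavior at large heaps), but this is the starting arithmetic.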

In this 2-part session (each part 1.5 hours), I will take a deep dive into the deployment architectures of large-scale Java platforms. I will first set the context of the discussion around what problems exist in our industry before I proceed to lay out the concept of platform engineering and its renaissance among the developer/deployment community today. It is astonishing to see that the majority of new application platforms being rolled out today miss their SLA, and that 90% of these systems require 2x more hardware than they actually need in order to run. This is an industry suffering from a double whammy, where you spend 2x on hardware and still miss your SLA; clearly something is completely broken. Prior to delving into new concepts such as microservices, cloud native, 3rd platform, and the 12-factor app, it is imperative to first understand the problem at hand before we apply these deployment architectural patterns. I will lay out the definition of platform engineering as the software engineering science needed to understand how to precisely deploy application components onto application runtimes, and how, in turn, one should appropriately map the application runtimes onto the infrastructure they need. With this knowledge you should be able to more effectively decide when to scale out (and by how many instances), and when to scale up.

We will conclude by covering various GC tuning techniques and how to best build platform-engineered systems; in particular, the focus will be on tuning large-scale JVM deployments and various sizing techniques. While most enterprise-class Java workloads can fit into a scaled-out set of JVM instances of less than 4GB heap each, there are workloads in the in-memory database space that require fairly large JVMs. We will look at various Java platform scales: some holding a few large JVMs of 90GB heap space, servicing large multi-terabyte in-memory DBs, while other platforms consist of thousands of JVM instances of less than 4GB heap space each, typical of web-app deployments. We will also take a close look at an example, XYZCars.com, where a microservices approach was designed and deployed. The discussion will cover how to more correctly deploy microservices without causing fragmentation of scale, and hence without impeding performance. In this session, we take a deep dive into the issues and the optimal configurations for tuning large JVMs in the range of 4GB to 360GB, using GC tuning recipes gained over the past 15 years of GC tuning engagements. You should be able to walk away with the ability to commence a decent GC tuning exercise on your own, and be on your way to platform-engineering your application runtimes to feasibly meet your SLAs.
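Any GC tuning exercise of the kind described above starts with measurement, and the JVM exposes the basic numbers through the standard java.lang.management API. This small example uses only that standard API; it is a starting point for observation, not one of the session's tuning recipes:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Print each garbage collector's collection count and cumulative pause
// time, the raw inputs to any GC tuning exercise.
public class GcObserver {
    public static void main(String[] args) {
        List<GarbageCollectorMXBean> gcs =
                ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : gcs) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(),
                    gc.getCollectionTime());
        }
    }
}
```

Which beans appear (and their names) depends on the collector the JVM was started with, which is exactly why the session's recipes differ between throughput/parallel GC and CMS deployments.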

Books

  • Virtualizing and Tuning Large-Scale Java Platforms


    Technical best practices and real-world tips for optimizing enterprise Java applications on VMware vSphere®


    Enterprises no longer ask, “Can Java be virtualized?” Today, they ask, “Just how large can we scale virtualized Java application platforms, and just how efficiently can we tune them?” Now, the leading expert on Java virtualization answers these questions, offering detailed technical information you can apply in any production or QA/test environment.


    Emad Benjamin has spent nine years virtualizing VMware’s own enterprise Java applications and working with nearly 300 leading VMware customers on projects of all types and sizes—from 100 JVMs to 10,000+, with heaps from 1GB to 360GB, and including massive big-data applications built on clustered JVMs. Reflecting all this experience, he shows you how to successfully size and tune any Java workload.


    This reference and performance “cookbook” identifies high-value optimization opportunities that apply to physical environments, virtual environments, or both. You learn how to rationalize and scale existing Java infrastructure, modernize architecture for new applications, and systematically benchmark and improve every aspect of virtualized Java performance. Throughout, Benjamin offers real performance studies, specific advice, and “from-the-trenches” insights into monitoring and troubleshooting.


    Coverage includes

    --Performance issues associated with large-scale Java platforms, including consolidation, elasticity, and flexibility

    --Technical considerations arising from theoretical and practical limits of Java platforms

    --Building horizontal in-memory databases with VMware vFabric SQLFire to improve scalability and response times

    --Tuning large-scale Java using throughput/parallel GC and Concurrent Mark and Sweep (CMS) techniques

    --Designing and sizing a new virtualized Java environment

    --Designing and sizing new large-scale Java platforms when migrating from physical to virtualized deployments

    --Designing and sizing large-scale Java platforms for latency-sensitive in-memory databases

    --Real-world performance studies: SQLFire vs. RDBMS, Spring-based Java web apps, vFabric SpringTrader, application tiers, data tiers, and more

    --Performance differences between ESXi3, 4.1, and 5

    --Best-practice considerations for each type of workload: architecture, performance, design, sizing, and high availability

    --Identifying bottlenecks in the load balancer, web server, Java application server, or DB Server tiers

    --Advanced vSphere Java performance troubleshooting with esxtop

    --Performance FAQs: answers to specific questions enterprise customers have asked


  • This book is the culmination of 7 years of experience in running Java on VMware vSphere, both internally at VMware and at VMware customer sites. In fact, many of VMware’s customers run critical enterprise Java applications on VMware vSphere, where they have achieved better TCO and SLAs. This book covers high-level architecture and implementation details, such as design and sizing, high-availability designs, automation of deployments, best practices, tuning, and troubleshooting techniques.