Rohit Bhardwaj is a Director of Architecture at Salesforce. Rohit has extensive experience architecting multi-tenant, cloud-native solutions built on resilient microservices and service-oriented architectures using the AWS stack. In addition, Rohit has a proven ability to design solutions and to execute and deliver transformational programs that reduce costs and increase efficiency.
As a trusted advisor, leader, and collaborator, Rohit applies problem-resolution, analytical, and operational skills to all initiatives, and develops strategic requirements and solution analysis through all stages of the project life cycle, from product readiness to execution.
Rohit excels in designing scalable cloud microservice architectures with Spring Boot and Netflix OSS technologies on AWS and Google Cloud. As a Security Ninja, Rohit looks for ways to resolve application security vulnerabilities using ethical hacking and threat modeling. Rohit is excited about architecting cloud technologies using Docker, Redis, NGINX, RightScale, RabbitMQ, Apigee, Azul Zing, Actuate BIRT reporting, Chef, Splunk, Rest-Assured, SoapUI, Dynatrace, and EnterpriseDB. In addition, Rohit has developed lambda architecture solutions using Apache Spark, Cassandra, and Camel for real-time analytics and integration projects.
Rohit holds an MBA in Corporate Entrepreneurship from Babson College and a Masters in Computer Science from Boston University and Harvard University. Rohit is a regular speaker at No Fluff Just Stuff, UberConf, RichWeb, GIDS, and other international conferences.
Rohit loves to connect at http://www.productivecloudinnovation.com, on LinkedIn at http://linkedin.com/in/rohit-bhardwaj-cloud, or on Twitter at rbhardwaj1.
Classic system design teaches you how to scale requests. AI-era architecture teaches you how to scale reasoning, retrieval, tokens, tools, trust, and cost.
In the AI era, the best architects do not just draw boxes. They design authority, evidence, fallback, observability, and cost controls into every system.
Modern system design has entered a new era. It’s no longer enough to optimize for uptime and latency — today’s systems must also be AI-ready, token-efficient, trustworthy, and resilient. Whether building global-scale apps, powering recommendation engines, or integrating GenAI agents, architects need new skills and playbooks to design for scale, speed, and reliability.
This full-day workshop blends classic distributed systems knowledge with AI-native thinking. Through case studies, frameworks, and hands-on design sessions, you’ll learn to design systems that balance performance, cost, resilience, and truthfulness — and walk away with reusable templates you can apply to interviews and real-world architectures.
Learning Outcomes
By the end of this workshop, participants will be able to:
AI inference is no longer a simple model call—it is a multi-hop DAG of planners, retrievers, vector searches, large models, tools, and agent loops. With this complexity comes new failure modes: tail-latency blowups, silent retry storms, vector store cold partitions, GPU queue saturation, exponential cost curves, and unmeasured carbon impact.
In this talk, we unveil ROCS-Loop, a practical architecture designed to close the four critical loops of enterprise AI:
• Reliability (Predictable latency, controlled queues, resilient routing)
• Observability (Full DAG tracing, prompt spans, vector metrics, GPU queue depth)
• Cost-Awareness (Token budgets, model tiering, cost attribution, spot/preemptible strategies)
• Sustainability (SCI metrics, carbon-aware routing, efficient hardware, eliminating unnecessary work)
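A minimal sketch of the token-budget and model-tiering idea listed under Cost-Awareness, assuming a fixed per-request budget and two model tiers. The names `TokenBudget` and `pick_tier`, the tier labels, and the 4x headroom threshold are illustrative assumptions, not part of ROCS-Loop as presented.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks tokens spent against a fixed per-request limit (hypothetical)."""
    limit: int
    spent: int = 0

    def remaining(self) -> int:
        return self.limit - self.spent

    def charge(self, tokens: int) -> bool:
        """Record usage; refuse the charge if it would exceed the limit."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.spent + tokens > self.limit:
            return False
        self.spent += tokens
        return True


def pick_tier(estimated_tokens: int, budget: TokenBudget) -> str:
    """Choose a model tier from the remaining budget (thresholds are assumptions)."""
    if estimated_tokens > budget.remaining():
        return "reject"          # not enough budget left for this hop
    if estimated_tokens * 4 <= budget.remaining():
        return "large-model"     # plenty of headroom
    return "small-model"         # tight budget: route to the cheaper tier
```

Checking the budget before each hop of the inference DAG, rather than after the whole request, is what keeps an agent loop from overspending silently.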
KEY TAKEAWAYS
• Understand the four forces behind AI outages (latency, visibility, cost, carbon).
• Learn the ROCS-Loop framework for enterprise-grade AI reliability.
• Apply 19 practical patterns to reduce P99, prevent retry storms, and control GPU spend.
• Gain a clear view of vector store + agent observability and GPU queue metrics.
• Learn how ROCS-Loop maps to GCP, Azure, Databricks, FinOps & SCI.
• Leave with a 30-day action plan to stabilize your AI workloads.
⸻
AGENDA
1. The Quiet Outage: Why AI inference fails
2. X-Ray of the inference pipeline (RAG, agents, vector, GPUs)
3. Introducing the ROCS-Loop framework
4. 19 patterns for Reliability, Observability, FinOps & GreenOps
5. Cross-cloud mapping (GCP, Azure, Databricks)
6. Hands-on: Diagnose an outage with ROCS
7. Your 30-day ROCS stabilization plan
8. Closing: Becoming a ROCS AI Architect
Autonomous LLM agents don’t just call APIs — they plan, retry, chain, and orchestrate across multiple services.
That fundamentally changes how we architect microservices, define boundaries, and operate distributed systems.
This session delivers a practical architecture playbook for Agentic AI integration — showing how to evolve from simple request/response designs to resilient, event-driven systems.
You’ll learn how to handle retry storms, contain failures with circuit breakers and bulkheads, implement sagas and outbox patterns for correctness, and version APIs safely for long-lived agents.
You’ll leave with reference patterns, guardrails, and operational KPIs to integrate agents confidently—without breaking production systems.
Problems Solved
Why Now
What Is Agentic AI in Microservices
Agenda
Opening: The Shift to Agent-Driven Systems
How autonomous agents change microservice assumptions.
Why request/response architectures fail when faced with planning, chaining, and self-healing agents.
Pattern 1: Event-Driven Flows
Use events, queues, and replay-safe designs to decouple agents from synchronous APIs.
Patterns: pub/sub, event sourcing, and replay-idempotency.
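Replay-idempotency can be sketched as a consumer that remembers which event IDs it has already applied. This is a minimal in-memory illustration; `Event`, `InventoryConsumer`, and the field names are hypothetical, and a real consumer would keep the seen-ID set in durable storage updated together with the state change.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str   # unique per logical event; reused when the event is replayed
    sku: str
    quantity: int


@dataclass
class InventoryConsumer:
    """Applies each event at most once, so replays and retries are harmless."""
    stock: dict = field(default_factory=dict)
    seen: set = field(default_factory=set)

    def handle(self, event: Event) -> bool:
        if event.event_id in self.seen:
            return False                      # duplicate or replayed delivery
        self.stock[event.sku] = self.stock.get(event.sku, 0) - event.quantity
        self.seen.add(event.event_id)
        return True
```

With this in place, an agent that retries the same action, or a queue that redelivers it, changes the stock level only once.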
Pattern 2: Saga and Outbox Patterns
Manage long workflows with compensations.
Ensure atomicity and reliability between DB and event bus.
Outbox → reliable publish; Saga → rollback on failure.
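The outbox half of this pattern can be sketched with the standard-library `sqlite3` module: the business row and the event row are written in one transaction, and a separate relay publishes pending rows. Table names, the `OrderPlaced` event, and the `publish` callback are assumptions made for illustration.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, order_id: str) -> None:
    """Write the state change and its event in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "PLACED"))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "OrderPlaced", "order_id": order_id}),),
        )


def relay(conn: sqlite3.Connection, publish) -> int:
    """Publish pending outbox rows, marking each only after publish succeeds."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))          # may raise; the row then stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

The relay gives at-least-once delivery: if it crashes between publishing and marking a row, the event is sent again, so consumers still need to tolerate duplicates.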
Pattern 3: Circuit Breakers and Bulkheads
Contain agent-triggered failure storms.
Apply timeout, retry, and fallback policies per domain.
Prevent blast-radius amplification across services.
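A circuit breaker can be sketched in a few lines: after a run of failures it rejects calls immediately, then allows one trial call once a cool-down has passed. The class name, thresholds, and injectable `clock` are assumptions chosen to keep the example small and testable; production libraries add per-domain policies, metrics, and thread safety.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after      # seconds to stay open before a trial call
        self.clock = clock                  # injectable so tests can control time
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast matters more with agents than with human callers, because an agent that sees an error tends to retry or re-plan immediately and can multiply load on a service that is already struggling.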
Pattern 4: Service Boundary Design
Shape services around tasks and domains, not low-level entities.
Example: ReserveInventory, ScheduleAppointment, SubmitClaim.
Responses must return reason codes + next actions for agent clarity.
Avoid polymorphic or shape-shifting payloads.
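One way to picture a task-shaped response with reason codes and next actions is a single fixed schema used for both success and failure. The sketch below uses the `ReserveInventory` example from the text; the field names, reason codes, and the `ConfirmReservation` operation are hypothetical.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class NextAction:
    operation_id: str        # task-level operation the agent may call next
    description: str


@dataclass
class TaskResponse:
    """One fixed shape for success and failure alike (no shape-shifting payloads)."""
    status: str                               # "succeeded" or "rejected"
    reason_code: str                          # machine-readable and stable
    reservation_id: Optional[str] = None
    next_actions: List[NextAction] = field(default_factory=list)


def reserve_inventory(sku: str, quantity: int, available: int) -> TaskResponse:
    if quantity <= available:
        return TaskResponse(
            status="succeeded",
            reason_code="RESERVED",
            reservation_id=f"res-{sku}-{quantity}",
            next_actions=[NextAction("ConfirmReservation", "Confirm before it expires")],
        )
    return TaskResponse(
        status="rejected",
        reason_code="INSUFFICIENT_STOCK",
        next_actions=[NextAction("ReserveInventory", f"Retry with quantity <= {available}")],
    )
```

Because both branches return the same keys, an agent can parse every response the same way and decide its next step from `reason_code` and `next_actions` instead of guessing from free text.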
Pattern 5: Integrating Agent Frameworks
Connect LLM frameworks (Agentforce, LangGraph) safely to services.
Use operationId as the agent tool name; enforce strict schemas.
Supervisor/planner checks between steps.
Asynchronous jobs: job IDs, progress endpoints, webhooks.
Pattern 6: Infrastructure and Operations
Wrap-Up: KPIs and Guardrails for Production
Key metrics: retry rate, success ratio, agent throughput, event replay lag.
Lifecycle governance: monitoring, versioning, deprecation, and sunset plans.
Key Framework References
Takeaways
Enterprises are moving from single AI agents to networks of agents that trigger thousands of API calls, retries, and tool-chains per prompt. Without orchestration discipline and APIs built for AI-scale, systems buckle under bursty load, retry storms, cache-miss spikes, inconsistent decisions, and runaway costs.
This talk shows how to combine MCP (Model Context Protocol) with proven inter-agent orchestration patterns — Supervisor, Pub/Sub, Blackboard, Capability Router — and how to harden APIs for autonomous traffic using rate limits, dedupe, backpressure, async workflows, resilient caching, and autoscaling without bill shock.
You’ll also learn the AIRLOCK Framework for governing multi-agent behavior with access boundaries, identity checks, rate controls, least-privilege routing, observability, compliance filters, and kill-switches.
You will walk away with a practical blueprint for building multi-agent systems that are fast, safe, reliable, and cost-predictable.
KEY TAKEAWAYS
Pattern Literacy: When to use Orchestrator, Pub/Sub, Blackboard, Router
MCP Fluency: Standardize agent↔tool integration
API Scaling: Rate limits, dedupe, backpressure, async, caching
Resilience: Bulkheads, jitter, circuit breakers, autoscaling guardrails
Observability: Trace chain-ID/tool-ID across agents & tools
AIRLOCK Governance: Access boundaries, identity, rate controls, least-privilege routing, compliance, kill-switches
AGENDA
Why AI Changes Load Patterns
Bursty workloads · fan-out · retry amplification · cost spikes
MCP 101
Standardized agent→tool access · hot-swappable tools
Orchestration Patterns
Supervisor · Pub/Sub · Blackboard · Capability Router
Architecting APIs for AI Traffic
Multi-dimensional rate limits · dedupe · backpressure · SWR caching · async
Resilience & Autoscaling
Circuit breakers · bulkheads · kill-switches · budget caps
Observability & Governance
Chain-ID tracing · anomaly detection · AIRLOCK boundaries
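As an illustration of the multi-dimensional rate limits mentioned in the agenda, the sketch below keeps one token bucket per (agent, tool) pair. The class names, the choice of dimensions, and the injectable `clock` are assumptions; a real deployment would typically enforce this at a gateway with shared state.

```python
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        # Refill for the time that has passed, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class AgentRateLimiter:
    """One bucket per (agent_id, tool) pair: a simple two-dimensional limit."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}

    def allow(self, agent_id: str, tool: str) -> bool:
        key = (agent_id, tool)
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[key].allow()
```

Keying on both agent and tool means one noisy agent cannot starve the others, and one expensive tool can be throttled without slowing the rest.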
A live, end-to-end walkthrough of an AWS Well-Architected Review for a GenAI app. You’ll learn how to apply the AWS Generative AI Lens across the six pillars, then add Bedrock Guardrails and Knowledge Bases (RAG) to raise reliability, safety, and accuracy. You’ll leave with a reusable checklist and a prioritized remediation plan.
Who it’s for & why
What you’ll learn
What you’ll take away
Certification-readiness talk with architecture scenarios, exam-domain mapping, practical examples, and production-design guidance.
Claude is no longer just a chatbot for writing answers. It is becoming part of how developers design, build, review, and automate software. Claude Code can help developers work across repositories, Claude Code GitHub Actions can respond to issues and pull requests, MCP can connect Claude to external tools and systems, and the Claude Agent SDK enables developers to build custom agentic workflows. This creates a new skill requirement for architects: knowing how to design Claude-powered systems that are safe, measurable, governable, and production-ready.
This talk provides a practical readiness roadmap for developers and architects preparing for Claude architecture work and Claude certification-style expectations. We will cover Claude platform fundamentals, Claude Code workflows, MCP/tool governance, Agent SDK patterns, API design, RAG, evals, observability, security, and enterprise deployment concerns. Participants will also work through certification-style scenarios that test architectural judgment, not memorization.
The goal is simple: do not just learn Claude. Learn how to architect with Claude.
Claude's certification should not be treated as a badge. It should be treated as proof that an architect can design safe, production-ready Claude-powered systems.
Main audience promise
By the end of the talk, participants will understand what they need to study, practice, and demonstrate to become Claude architecture-ready.
They will leave with:
Most enterprise LLM failures aren’t technical — they’re trust failures. Models hallucinate, drift from source truth, or produce outputs with no provenance. For regulated industries, that’s unacceptable. This session introduces GraphRAG — a breakthrough approach combining knowledge graphs (Neo4j) with retrieval-augmented generation to deliver traceable, explainable, and auditable AI outputs. You’ll learn how to design, evaluate, and deploy GraphRAG architectures aligned with the EU AI Act, NIST AI Risk Management Framework, and enterprise AI governance standards.
Problems Solved
Why Now
What GraphRAG Is
Where It Applies
Why It’s Valuable
Agenda
Opening & Problem Context
Why trust is the bottleneck for enterprise AI.
Examples of LLMs failing in regulated use cases — what breaks when outputs lack provenance.
Pattern 1: Anatomy of GraphRAG
Understanding how GraphRAG extends RAG with Neo4j graphs.
Schema design for entities, relationships, and evidence paths.
Structured retrieval from graph → vector → generator.
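The idea of evidence paths can be sketched without a database: every edge carries the document it was extracted from, so anything retrieved can be cited. The in-memory list below stands in for a Neo4j graph, and the entities, relations, and file names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    obj: str
    source_doc: str      # provenance: the document this fact was extracted from


# Hypothetical graph contents, standing in for a graph database.
GRAPH = [
    Edge("Policy-12", "REQUIRES", "IdentityCheck", "policy-manual.pdf"),
    Edge("Policy-12", "SUPERSEDES", "Policy-07", "change-notice-3.pdf"),
    Edge("Policy-07", "REQUIRES", "ManualReview", "archive-2019.pdf"),
]


def retrieve(entity: str, graph: List[Edge]) -> List[Edge]:
    """Return every edge touching the entity; each edge is one piece of evidence."""
    return [e for e in graph if entity in (e.subject, e.obj)]


def build_grounded_context(entity: str, graph: List[Edge]) -> str:
    """Format retrieved facts with citations, ready to pass to a generator."""
    lines = [
        f"{e.subject} {e.relation} {e.obj} [source: {e.source_doc}]"
        for e in retrieve(entity, graph)
    ]
    return "\n".join(lines)
```

Passing the cited lines to the generator, and returning the same edges alongside the answer, is what lets a reviewer trace each claim back to a node and a document.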
Pattern 2: Architecture & Data Flow
End-to-end GraphRAG blueprint: Ingestion → Entity extraction → Graph population → Retrieval orchestration → Response grounding.
Contrast with plain RAG and vector-only approaches.
Pattern 3: Explainability & Evaluation
Metrics for evaluating explainability: Faithfulness, groundedness, and coverage.
How to trace model answers back to graph nodes and documents.
Integration with AI observability platforms (PromptLayer, Arize, etc.).
Pattern 4: Compliance & Governance Alignment
Connecting GraphRAG design to regulatory frameworks:
Pattern 5: Real-World Scenarios
Industry case patterns:
Wrap-Up & Discussion
Recap of GraphRAG architecture and design patterns.
Checklist for adoption: schema templates, metrics, and governance integration.
Q/A and enterprise discussion on explainable AI roadmaps.
Key Framework References
Takeaways
Coding interviews and production systems share the same challenge: transforming vague problems into correct, efficient, and explainable solutions.
This talk introduces a 7-step algorithmic thinking framework that begins with a brute-force baseline and evolves toward an optimized, production-grade solution—using AI assistants like ChatGPT and GitHub Copilot to accelerate ideation, edge-case discovery, and documentation, without sacrificing rigor.
Whether you’re solving array or graph problems, optimizing data pipelines, or refactoring legacy logic, this framework builds the discipline of clarity before optimization—and shows how to use AI responsibly as a thinking partner, not a shortcut.
Why This Talk Now (in the AI Era)
Problems Solved
The 7-Step Algorithmic Thinking Playbook
Clarify – Define inputs, outputs, and constraints precisely.
Baseline – Write the simplest brute-force solution for correctness.
Measure – Analyze time and space complexity; identify bottlenecks.
Map Patterns – Recognize the family (array, tree, graph, DP, greedy).
Refactor – Apply the optimal pattern or data structure.
Validate – Test edge cases and boundary conditions automatically.
Explain – Communicate trade-offs, scalability, and readability.
Learning Outcomes
Agenda
Opening: The AI-Accelerated Engineer
How AI is reshaping developer workflows—and why algorithmic clarity matters more than ever.
Examples of AI-generated code that is syntactically correct but logically wrong.
Pattern 1: Clarify and Baseline
Turning vague questions into crisp specifications.
Why starting with brute force improves correctness and confidence.
Pattern 2: Measure and Map Patterns
How to quickly estimate complexity and identify known solution families.
Mapping problems to arrays, graphs, or DP templates.
Pattern 3: Refactor with AI as a Partner
Using Copilot or ChatGPT to suggest refactors, not replace reasoning.
Prompt patterns for safe collaboration (“generate + verify + explain”).
Spotting hallucinated optimizations.
Pattern 4: Validate and Explain
Building automated test scaffolds and benchmark harnesses.
AI-assisted edge-case discovery.
How to articulate trade-offs in interviews or design docs.
Pattern 5: Framework in Action
Live problem walkthrough: From brute-force substring search → optimized sliding window solution → complexity and trade-off explanation.
Demonstrate where AI adds value and where human logic rules.
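The brute-force-to-sliding-window progression can be sketched on one common substring problem. The choice of problem (length of the longest substring without repeating characters) is an assumption made for illustration; the abstract does not say which problem the live walkthrough uses.

```python
def longest_unique_brute(s: str) -> int:
    """Baseline: check every substring; O(n^3) time in the worst case."""
    best = 0
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            window = s[i:j]
            if len(set(window)) == len(window):   # no repeated characters
                best = max(best, j - i)
    return best


def longest_unique_sliding(s: str) -> int:
    """Sliding window: O(n) time, O(k) space for k distinct characters."""
    last_seen = {}     # character -> index of its most recent occurrence
    start = 0          # left edge of the current repeat-free window
    best = 0
    for i, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1             # jump past the earlier duplicate
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best
```

Keeping the baseline around is useful beyond the first step: it serves as the oracle in the Validate step, where the optimized version is checked against it on edge cases such as the empty string or `"abba"`.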
Pattern 6: Guardrails for AI-Assisted Coding
Version control hygiene, reproducibility, test coverage.
Ensuring deterministic, reviewable AI suggestions.
Avoiding “hallucination debt” in production codebases.
Wrap-Up: From Algorithms to Systems Thinking
How this framework extends from whiteboard problems to microservices, pipelines, and data workflows.
Checklist for using AI as a disciplined amplifier of human reasoning.
Key Framework References
Takeaways
Claude Code is not just a coding assistant. Used casually, it can create fast prototypes. Used architecturally, it can become a powerful engineering accelerator for discovery, refactoring, test generation, documentation, architecture reviews, and modernization.
This talk teaches architects, tech leads, and senior developers how to use Claude Code as part of a governed software delivery system. We will explore how to structure repositories, write effective CLAUDE.md guidance, create architecture guardrails, generate tests, review AI-produced code, and use Claude Code without turning your codebase into an ungoverned “vibe coding” experiment.
The core message is simple: Claude Code should not replace architecture judgment. It should amplify it.
Anthropic’s own Claude documentation emphasizes prompting clarity, examples, structured guidance, and agentic workflows, which makes architecture-level instructions especially important when using Claude in engineering systems.
Learning Outcomes
Participants will learn how to:
Agenda
LLM agents don’t just fetch data—they decide and act. To support planning and chaining, microservices must expose not only endpoints but also semantic context: what entities mean, which states are valid, which actions come next, and why decisions were made. This talk shows how to evolve from data-only APIs to MCP-aware, semantically rich services using JSON-LD/Schema.org, Hydra-style affordances, domain events, and OpenAPI metadata. You’ll learn retrofit vs greenfield paths, see cross-industry demos, and leave with a migration checklist that makes your services truly agent-ready.
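A rough sketch of what a semantically rich response might look like: the payload states what the entity is, which state it is in, and which actions are valid next. The `@context`, `Order`, and order-status values come from Schema.org; the `operation` entries are only Hydra-style, and their keys, the transition table, and the URL layout are illustrative assumptions that have not been validated against either vocabulary.

```python
import json

# Allowed actions per state (hypothetical domain rules).
TRANSITIONS = {
    "OrderProcessing": ["CancelOrder", "ShipOrder"],
    "OrderInTransit": ["TrackShipment"],
    "OrderCancelled": [],
}


def describe_order(order_id: str, status: str) -> dict:
    """Return the order plus the actions that are valid in its current state."""
    return {
        "@context": "https://schema.org",
        "@type": "Order",
        "orderNumber": order_id,
        "orderStatus": f"https://schema.org/{status}",
        # Hydra-style affordances: what the agent may do next, and how.
        "operation": [
            {
                "@type": "Operation",
                "method": "POST",
                "operationId": op,
                "target": f"/orders/{order_id}/{op}",
            }
            for op in TRANSITIONS[status]
        ],
    }


if __name__ == "__main__":
    print(json.dumps(describe_order("o-42", "OrderProcessing"), indent=2))
```

An agent reading this response does not need prior knowledge of the order workflow: a cancelled order simply advertises no operations, so there is nothing invalid to attempt.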
Agenda
Takeaways
Large Language Models unlock new capabilities—and expose brand-new attack surfaces.
From prompt injection and data exfiltration to model denial-of-service and insecure plugin calls, adversaries are exploiting weaknesses traditional AppSec never anticipated.
The new OWASP LLM Top-10 provides a shared vocabulary for AI risks; this session turns that list into actionable engineering practice.
You’ll learn how to threat-model LLM endpoints, design guardrails that actually block malicious behavior, sandbox tools and plug-ins with least privilege, and align your mitigations to the NIST AI Risk Management Framework for audit-ready governance.
Problems Solved
Why Now
What You’ll Learn
Agenda
Opening: The New AI Attack Surface
How LLMs change the threat model. Examples of real-world attacks: prompt injections, indirect injections, model DoS, and exfiltration via vector stores.
Pattern 1: Threat Modeling LLM Endpoints
Identify assets, trust boundaries, and high-risk flows.
Apply STRIDE-inspired analysis to prompts, context windows, retrieval layers, and plugin calls.
Pattern 2: Designing Input/Output Guardrails
Policy filtering, schema validation, and content moderation.
Runtime vs compile-time guardrails: what actually works in production.
Enforcing determinism and fail-safe defaults.
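Schema validation with a fail-safe default can be sketched as a small runtime guardrail on model output: anything that is not a well-formed, allow-listed tool call is replaced by a refusal. The allow-list, field names, and the 64-character limit are hypothetical choices for illustration.

```python
import json

ALLOWED_ACTIONS = {"lookup_order", "refund_status"}   # hypothetical allow-list
SAFE_DEFAULT = {"action": "refuse", "argument": ""}


def validate_tool_call(raw_model_output: str) -> dict:
    """Accept only well-formed, allow-listed tool calls; otherwise fail safe."""
    try:
        data = json.loads(raw_model_output)
    except json.JSONDecodeError:
        return dict(SAFE_DEFAULT)
    # Exactly the expected keys: no missing fields and no extras.
    if not isinstance(data, dict) or set(data) != {"action", "argument"}:
        return dict(SAFE_DEFAULT)
    if not isinstance(data["action"], str) or data["action"] not in ALLOWED_ACTIONS:
        return dict(SAFE_DEFAULT)
    if not isinstance(data["argument"], str) or len(data["argument"]) > 64:
        return dict(SAFE_DEFAULT)
    return data
```

The validator never tries to repair bad output. Rejecting to a known-safe value keeps the behavior deterministic, which is easier to audit than best-effort correction.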
Pattern 3: Sandboxing and Least Privilege Plugins
Secure function calling: scoped IAM, network egress rules, per-plugin secrets, and API key vaulting.
Container isolation and ephemeral agent sandboxes.
Pattern 4: Data Protection and Tenancy in RAG
Redacting sensitive data before embedding.
Segregating tenant vectors and access policies.
Auditing data lineage and evidence paths.
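Redaction before embedding and per-tenant segregation can be sketched together. The regular expressions below catch only simple email and US-SSN-like patterns, the `TenantStore` class is hypothetical, and a keyword match stands in for vector similarity so the example stays self-contained.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Mask sensitive values before the text is embedded or stored."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)


class TenantStore:
    """Keeps each tenant's chunks in a separate partition (illustrative only)."""

    def __init__(self):
        self._chunks = {}            # tenant_id -> list of redacted chunks

    def add(self, tenant_id: str, text: str) -> None:
        self._chunks.setdefault(tenant_id, []).append(redact(text))

    def search(self, tenant_id: str, term: str) -> list:
        # Only the caller's own partition is ever searched.
        return [
            chunk for chunk in self._chunks.get(tenant_id, [])
            if term.lower() in chunk.lower()
        ]
```

Redacting on the write path means the raw value never reaches the index, so a retrieval bug or a prompt injection cannot leak what was never stored.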
Pattern 5: Red Team & Evaluation Frameworks
Running adversarial simulations aligned with OWASP LLM Top-10.
Common exploits and how to detect them.
Integrating automated red-team tests into CI/CD pipelines.
Pattern 6: Governance & Framework Mapping
Mapping mitigations to the NIST AI RMF core functions (Govern, Map, Measure, Manage).
Building dashboards and executive summaries for risk reporting.
Wrap-Up & Action Plan
Summarize practical controls that can be implemented within 30 days.
Introduce the Guardrail Policy Starter Kit + Red-Team Runbook templates.
Live checklist review for readiness maturity.
Key Framework References
Takeaways