Key Responsibilities Inference Platform & Optimization: Build and optimize enterprise LLM serving platforms (e.g., vLLM, TensorRT-LLM) using techniques like PagedAttention, continuous batching, and quantization (AWQ/FP8) for high throughput and low latency. GPU Pooling & AI Infra: Design
Some careers have more impact than others. If you’re looking for a career where you can make a real impression, join HSBC and discover how valued you’ll be. We are currently seeking an experienced professional to
Some careers have more impact than others. If you’re looking for a career where you can make a real impression, join HSBC and discover how valued you’ll be. We are currently seeking an experienced professional to