Job Description
Senior AI Engineer (Infrastructure)

About Us
We run an AI platform with 50K+ daily active users and millions of generations per day, entirely powered by open-weight models running on our own GPU fleet. New models drop weekly. New hardware ships quarterly. Our job is to be fast at adopting both.

The team
You'll be the AI/ML Infrastructure Engineer. Our first engineer built the system from scratch (dynamic LoRA serving with 100+ adapters hot-swapped per request; inference optimization with DeepCache, torch.compile, quantization, and abliteration) and keeps us on the latest GPU hardware as it ships. Together, you'll own everything between "a new model just dropped" and "it's live, fast, and cost-efficient."

A typical week
- Benchmark a new open-weight model, quantize it, test LoRA compatibility, decide ship or skip
- Tune block-level caching for the Blackwell architecture, measure quality/speed tradeoffs
- Dig into GPU utilization data, find wasted spend, redesign auto-scaling
- Debug a 3 AM latency spike: OOM on two pods, fix it, write up what happened

You'll thrive here if you
- Have shipped open-weight models to production at scale (not notebooks, not demos). LLMs, VLMs, image models: the more architectures the better.
- Can show real optimization results with numbers: Xs faster, $Y/month saved, Z% latency reduction.
- Think in cost-per-generation, not just raw performance. We care about both.
- Pick up new models and hardware fast. The ecosystem won't wait for you.
- Work independently. You'll figure out what to optimize; we won't hand you a roadmap.
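The "cost-per-generation" mindset above is simple unit economics: GPU spend per hour divided by generations served per hour. A minimal sketch; the function name and the $2.50/hr and throughput figures are illustrative assumptions, not numbers from this posting:

```python
def cost_per_generation(gpu_hourly_usd: float, generations_per_hour: float) -> float:
    """Convert raw GPU spend into a per-generation unit cost (USD)."""
    return gpu_hourly_usd / generations_per_hour

# Hypothetical example: a $2.50/hr GPU serving 1,000 generations per hour.
unit_cost = cost_per_generation(2.50, 1000)
print(f"${unit_cost:.4f} per generation")  # $0.0025 per generation
```

The point of the metric is that a 2x throughput win and a 2x cheaper GPU look identical here, which is why it complements raw latency numbers rather than replacing them.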
Bonus points
- Built or worked on dynamic adapter serving (LoRA hot-loading, multi-model routing)
- Model surgery beyond default settings: custom quantization, abliteration, architectural pruning
- Evaluated and migrated workloads across GPU generations

What we run
- Models: various open-weight LLMs, VLMs, and image models; changes constantly
- Optimization: PyTorch, torch.compile, DeepCache, GPTQ/AWQ
- Serving: custom dynamic LoRA system
- Hardware: RTX 6000 Blackwell, H100; we evaluate and migrate as new GPUs ship
- Infra: RunPod + on-prem · Docker · Python · Go backend
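Serving 100+ hot-swapped LoRA adapters per request typically hinges on keeping a bounded set of adapters resident and evicting the least recently used. A toy sketch of that caching pattern; everything here (the AdapterCache name, string stand-ins for weights) is hypothetical, and real adapter loading would read safetensors onto the GPU rather than return a string:

```python
from collections import OrderedDict

class AdapterCache:
    """Hypothetical LRU cache for hot-swapped LoRA adapters (illustrative only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._cache: OrderedDict[str, str] = OrderedDict()
        self.loads = 0  # counts slow loads from disk/registry

    def _load_from_disk(self, adapter_id: str) -> str:
        # Stand-in for reading adapter weights and attaching them to the base model.
        self.loads += 1
        return f"weights:{adapter_id}"

    def get(self, adapter_id: str) -> str:
        if adapter_id in self._cache:
            self._cache.move_to_end(adapter_id)  # mark as most recently used
            return self._cache[adapter_id]
        weights = self._load_from_disk(adapter_id)
        self._cache[adapter_id] = weights
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return weights

cache = AdapterCache(capacity=2)
cache.get("style-a")  # slow load
cache.get("style-b")  # slow load
cache.get("style-a")  # cache hit
cache.get("style-c")  # slow load; evicts style-b
print(cache.loads)    # 3
```

Hit rate versus eviction cost is the whole game here: per-request routing only stays cheap if most requests find their adapter already resident.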
Job Requirements
Why us over a bigger company
You won't spend 6 months getting access to a GPU cluster. You won't write design docs that never ship. You'll push to production this week. The problems are real, the scale is real, and you'll see your work in the numbers every morning.
Employment type
Full-time
Benefits
- Social insurance, health insurance & private health insurance
- 13th month salary + year-end bonus based on real contribution
- Breakfast, lunch & afternoon snacks provided
- Flexible working hours
- AI Learning Budget: tools, courses, subscriptions to level up your skills
- Birthday leave
- Competitive pay + bonuses tied directly to impact
- MacBook, iMac and monitors provided
Salary
Negotiable