Job Description
Duration: Full-time for 6 months

The Role
Your primary mission will be to own the design and implementation of our LLM Evaluation Platform, a critical system that will serve as the quality gate for all our AI features. You will be a key builder on a new initiative, working alongside dedicated Data Engineering and DevOps experts to deliver a tangible, high-impact platform.

This role is for a hands-on engineer who thrives on building robust systems that provide leverage. You will be fully empowered to own the implementation and success of this project.

Responsibilities
- Design and develop the core backend systems for a new LLM Evaluation Platform, leveraging Arize Phoenix as the foundational framework for traces, evaluations, and experiments.
- Architect and implement the observability backbone for AI services, integrating Phoenix with OpenTelemetry to build a centralized system for logging, tracing, and evaluating LLM behavior in production.
- Design and implement a CI/CD framework for versioning, testing, and deploying prompt-based logic and LLM configurations, ensuring reproducible and auditable deployments across all AI features.
- Make pragmatic technical decisions that prioritize business value and delivery speed, in line with an early-stage startup environment.
- Work closely with the Data Science team to understand their workflow and ensure the platform meets their core needs for experiment tracking and validation.
- Help define and document the initial technical patterns for MLOps and model evaluation, creating the foundation for future development.
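To illustrate the "quality gate" idea at the heart of this platform, here is a minimal, hypothetical pure-Python sketch: run named evaluators over recorded LLM traces and fail the gate when any mean score drops below its threshold. All names here (`Trace`, `run_gate`, `contains_answer`) are invented for illustration only; the actual platform would build on Arize Phoenix's evaluators, datasets, and OpenTelemetry traces rather than hand-rolled classes.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Trace:
    """A recorded LLM interaction (hypothetical stand-in for a Phoenix trace)."""
    prompt: str
    response: str

def contains_answer(trace: Trace) -> float:
    """Toy evaluator: 1.0 if the response is non-empty, else 0.0."""
    return 1.0 if trace.response.strip() else 0.0

def run_gate(traces, evaluators: dict[str, tuple[Callable[[Trace], float], float]]):
    """Score each evaluator's mean over all traces against its threshold.

    Returns (passed, scores): passed is False if any mean score falls
    below that evaluator's threshold.
    """
    scores = {}
    passed = True
    for name, (evaluator, threshold) in evaluators.items():
        mean = sum(evaluator(t) for t in traces) / len(traces)
        scores[name] = mean
        passed = passed and mean >= threshold
    return passed, scores

traces = [Trace("What is 2+2?", "4"), Trace("Capital of France?", "Paris")]
ok, scores = run_gate(traces, {"non_empty": (contains_answer, 0.9)})
```

In a CI/CD pipeline, a gate like this would run against a versioned dataset of traces before a prompt or model-configuration change is deployed, making each deployment reproducible and auditable.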
Job Requirements
- 5+ years of professional software engineering experience, with a strong focus on backend or platform systems.
- Proven expertise in Python, with a track record of building robust, testable, and maintainable production systems.
- Hands-on experience with modern observability frameworks, especially OpenTelemetry.
- Strong problem-solving skills with a pragmatic approach that avoids over-engineering while ensuring robustness.

Nice to have
- Experience in MLOps/LLMOps, including productionizing and evaluating modern ML/LLM systems, ideally with exposure to frameworks like Arize Phoenix, LangSmith, or similar platforms.
- Hands-on experience with Arize Phoenix, including setting up custom evaluators, managing datasets, and implementing trace-based debugging.
- Familiarity with AWS core services in a platform context (Kubernetes/EKS, RDS, S3, IAM).
- Experience in a startup environment, with comfort in ambiguity and a fast-paced setting.
Benefits
- Competitive salary
- Mon-Fri, flexible remote working
- English-speaking, professional working environment