Job Description
We're seeking an experienced Lead DevOps Engineer to spearhead our critical infrastructure transformation as PAVE.ai scales to enterprise level. This role will lead the strategic migration from Google Cloud Platform to AWS while building and managing a high-performing DevOps team. As Lead DevOps Engineer at PAVE.ai, you'll architect enterprise-grade infrastructure, establish site reliability engineering practices, and ensure 99.9%+ uptime for our vehicle inspection platform serving global automotive enterprises. This is a pivotal role that will define our infrastructure strategy and operational excellence as we process millions of vehicle inspections for dealerships, fleet operators, insurers, and vehicle marketplaces worldwide.

Cloud Migration Leadership
- Lead and execute the complete migration strategy from GCP to AWS, ensuring zero downtime
- Design and implement AWS enterprise architecture following Well-Architected Framework principles
- Create detailed migration roadmaps with clear milestones, risk assessments, and rollback plans
- Architect hybrid cloud solutions during the transition phase to maintain business continuity
- Optimize costs during and after migration while improving performance and reliability
- Document migration processes and create runbooks for knowledge transfer

Team Leadership & Development
- Build and lead a world-class DevOps team, including hiring, mentoring, and performance management
- Define team structure, roles, and responsibilities for 24/7 operational coverage
- Establish DevOps culture and best practices across the engineering organization
- Create career development paths and training programs for team members
- Foster collaboration between DevOps, development, and security teams
- Lead incident response and post-mortem processes to drive continuous improvement

Site Reliability Engineering (SRE)
- Establish and maintain SLIs, SLOs, and SLAs for all critical services
- Design and implement comprehensive monitoring and observability strategies
- Build automated incident detection and response systems
- Ensure 99.9%+ uptime for production systems through proactive reliability engineering
- Implement chaos engineering practices to identify and fix potential failures
- Create capacity planning models to support 10x growth

Infrastructure & Automation
- Design scalable, secure, and cost-effective AWS infrastructure for enterprise workloads
- Implement Infrastructure as Code (IaC) using Terraform/CloudFormation
- Build CI/CD pipelines supporting multiple deployment strategies (blue-green, canary)
- Automate security compliance and governance using AWS native tools
- Implement auto-scaling and self-healing infrastructure
- Design disaster recovery and business continuity strategies
- Develop and enhance logging systems and observability tools (ongoing improvement initiative)

Enterprise Platform Development
- Architect multi-tenant infrastructure supporting enterprise isolation requirements
- Implement enterprise-grade security including VPN, SSO, and zero-trust networking
- Design data residency and compliance solutions for global operations
- Build platform services for logging, monitoring, secrets management, and service mesh
- Create developer self-service platforms to accelerate delivery
- Establish FinOps practices for cloud cost optimization

Strategic Planning
- Develop a long-term infrastructure roadmap aligned with business objectives
- Partner with leadership to define technology strategy and investments
- Evaluate and introduce new technologies to improve operational efficiency
- Create business cases for infrastructure investments with ROI analysis
- Establish vendor relationships and manage AWS enterprise support
- Drive infrastructure standardization and consolidation initiatives

Success Metrics
- Complete GCP to AWS migration within 6 months with zero critical incidents
- Achieve and maintain 99.9% uptime across all production services
- Reduce infrastructure costs by 30% while improving performance
- Build and retain a high-performing DevOps team with <10% attrition
- Increase deployment frequency from weekly to multiple times daily
- Reduce MTTR (Mean Time To Recovery) by 50%
Job Requirements
Technical Skills

Cloud Expertise:
- Expert-level AWS knowledge (Solutions Architect Professional preferred)
- Strong GCP experience with migration expertise
- Multi-cloud architecture and management
- AWS services mastery: EC2, ECS/EKS, Lambda, RDS, S3, CloudFront, Route53
- Cloud networking: VPC, Transit Gateway, Direct Connect, Global Accelerator
- Security services: IAM, KMS, WAF, Shield, GuardDuty, Security Hub

DevOps & Automation:
- Infrastructure as Code: Terraform, CloudFormation, AWS CDK
- Configuration management: Ansible, Chef, or Puppet
- CI/CD platforms: Jenkins, GitLab CI, GitHub Actions, AWS CodePipeline
- Container orchestration: Kubernetes (EKS), Docker, Helm
- GitOps practices with ArgoCD or Flux
- Scripting languages: Bash, Go, Python

Site Reliability:
- Monitoring/Observability: Prometheus, Grafana, ELK, Datadog, New Relic
- APM and distributed tracing: OpenTelemetry, Jaeger
- Incident management: PagerDuty, Opsgenie
- SRE practices: Error budgets, SLI/SLO definition, toil reduction
- Performance tuning and capacity planning
- Chaos engineering tools: Gremlin, Chaos Monkey

Leadership Skills
- Proven ability to lead and inspire technical teams
- Experience managing remote and distributed teams
- Strong project management and organizational skills
- Excellent stakeholder management across technical and business teams
- Budget management and cost optimization experience
- Change management expertise for large-scale transformations

Soft Skills
- Excellent written and verbal communication skills in both English and Vietnamese
- Strategic thinking with ability to balance long-term vision with immediate needs
- Strong problem-solving skills with calm demeanor during incidents
- Ability to influence and drive consensus across organizations
- Mentoring mindset with passion for developing talent
- Adaptable to rapidly changing requirements and technologies

Experience
- 7+ years of DevOps/SRE experience with 3+ years in a leadership role
- Proven experience leading large-scale cloud migrations (GCP to AWS preferred)
- Track record of managing DevOps teams of 4+ engineers
- Experience with enterprise B2B SaaS platforms at scale (millions of requests/day)
- Demonstrated success improving system reliability from <99% to 99.9%+

Preferred Qualifications
- AWS Certified DevOps Engineer or Solutions Architect Professional
- Experience with AI/ML workload infrastructure and GPU clusters
- Knowledge of automotive industry compliance and regulations
- Experience with computer vision and image processing pipelines
- Serverless architecture and event-driven systems
- FinOps certification or demonstrated cost optimization achievements
- Experience with regulated environments (SOC2, ISO 27001, GDPR)
- Contributions to open-source DevOps/SRE tools
- Public speaking experience at DevOps/SRE conferences
- Experience scaling startups to enterprise level
Employment Type
Full-time
Benefits
1. Competitive Compensation & Perks
- Attractive salary package.
- 15 days of annual leave.
- 13th-month bonus.
- Premium healthcare coverage for you and your family.
- Thoughtful appreciation gifts throughout the year.

2. Growth & Learning Opportunities
- Work on cutting-edge, large-scale products in the car inspection field.
- Clear career paths for both technical experts and aspiring leaders.
- Continuous learning programs to sharpen your skills and grow your career.
- Learn from everything, everywhere, but be a smart copy-paster, not a copycat!
- Be ready to embrace and implement new ideas in a fast-paced environment.

3. An Inspiring Workplace
- Flexible hybrid work model and a strong focus on work-life balance.
- A modern, fully equipped office with a well-stocked pantry.
- Be motivated, creative, and passionate; we can't ask for more!
- Respect and care for your teammates, your environment, and even yourself.
- Treat yourself well, and while you're at it, save the Earth too.

4. A Mindset for Growth
- Have the courage to move fast, stay flexible, and take full responsibility for every single line of code.
- Always look back at your work and strive to make it better. Nothing is perfect, and that's where you come in.
- It's okay to be late sometimes, but make sure you're fully accountable and aware of your actions.

5. A Dynamic and Open Culture
- We don't stick rigidly to the game plan, so feel free to add or remove your own "blah blah" from this list. 😉
Salary
Negotiable