Introduction
With over a decade of experience in IT and fintech, Blue Belt has become a leading software development company, delivering innovative technology solutions to a diverse global clientele. We specialize in developing web, mobile, payment, and blockchain applications that offer seamless user experiences. Headquartered in Tokyo, Japan, with a state-of-the-art Technology Hub in Hanoi, Vietnam, Blue Belt operates in more than ten countries, including Japan, Thailand, Indonesia, the Philippines, Malaysia, Taiwan, and Brazil. Our team of over 200 professionals brings a wealth of expertise to drive our global operations.
We are looking for a highly skilled DevOps Engineer to join our team, with a focus on deploying, scaling, and maintaining infrastructure for conversational AI and chatbot systems. You will work closely with AI engineers, software developers, and product teams to automate workflows, ensure high availability, and optimize performance for AI-driven applications.
Job Description
Infrastructure Automation & CI/CD
Design, implement, and maintain CI/CD pipelines for chatbot and AI services.
Automate environment provisioning using tools like Terraform, Ansible, or Pulumi.
Integrate testing and deployment workflows to support agile delivery cycles.
Cloud Infrastructure Management
Build and manage infrastructure on cloud platforms such as AWS, tailored for AI workloads.
Implement secure and scalable architectures for real-time chatbot interactions.
Monitoring, Logging & Incident Management
Set up centralized logging using the EFK stack (Elasticsearch, Fluent Bit, Kibana).
Set up monitoring tools (Prometheus, Grafana, ELK, or Datadog) for proactive alerting.
Define and enforce SLOs/SLAs for chatbot uptime and response time.
Lead incident response and root cause analysis for system failures.
Security & Compliance
Ensure best practices in infrastructure security (IAM, VPC, secrets management).
Support compliance efforts for data protection (GDPR, SOC2) in chatbot data pipelines.
Perform ad-hoc DevOps tasks as required, including emergency patches, incident support, or rapid deployment of security updates.
AI Model Deployment
Collaborate with teams to containerize and deploy NLP models (e.g., with Docker, Kubernetes).
Manage GPU/TPU workloads, including dynamic scaling and resource optimization.
Monitor model inference performance and latency across staging and production environments.
Optimize cost, compute, and storage strategies for high-volume inference and training.