Infrastructure Automation & CI/CD
Design, implement, and maintain CI/CD pipelines for scalable backend and data services.
Automate infrastructure provisioning using tools like Terraform, Terragrunt, or Ansible.
Integrate automated testing, deployment workflows, and rollback strategies to support agile development.
Maintain GitOps practices and CI/CD infrastructure using Jenkins, GitLab, and related tooling.

Kubernetes & EKS Orchestration
Manage and optimize Kubernetes clusters on AWS (EKS), including node autoscaling, namespace management, and resource allocation.
Build, configure, and maintain Helm charts for deployment automation and the lifecycle of cluster applications.
Implement best practices for Kubernetes security, multi-tenant isolation, and cluster upgrades.
Work with developers to containerize services and deploy them reliably in production.

DataOps (nice to have)
Collaborate with data engineers to automate data pipeline deployment using tools like Apache Airflow, ensuring end-to-end scheduling, dependency management, and monitoring of data workflows.
Implement and manage transformation pipelines, supporting versioned SQL models, testing, and documentation across environments.
Integrate and manage data warehouse platforms optimized for analytical and operational workloads.
Support Metabase as the core business intelligence tool, including integration with data sources, permission management, and dashboard reliability.
Ensure data quality, lineage, and observability by collaborating on validation frameworks and metrics integration.
Operationalize and maintain databases such as MySQL and PostgreSQL, and streaming/message brokers such as Kafka, ActiveMQ, and Redis.
Automate and document backup, restore, failover, and disaster recovery strategies for critical infrastructure and data assets.
Contribute to the deployment and tuning of data storage solutions (e.g., MinIO, S3) and metadata/catalog tools to enhance discoverability and governance.

Cloud Infrastructure Management
Architect and manage cloud infrastructure primarily on AWS (including VPC, EC2, EKS, RDS, MSK, ElastiCache).
Design high-availability (HA) and fault-tolerant infrastructure for critical backend and data workloads.
Support multi-region deployment patterns and network configuration (e.g., DNS, VPN, routing, load balancing).
Drive cost optimization efforts in compute, storage, and networking.

Monitoring, Logging & Incident Management
Set up centralized logging using EFK (Elasticsearch, Fluent Bit, Kibana).
Set up monitoring tools (Prometheus, Grafana, ELK, or Datadog) for proactive alerting.
Define and enforce SLOs/SLAs for chatbot uptime and response time.
Lead incident response and root cause analysis for system failures.

Security & Compliance
Ensure best practices in infrastructure security (IAM, VPC, secrets management).
Support compliance efforts for data protection (GDPR, SOC 2) in chatbot data pipelines.
Perform ad-hoc DevOps tasks as required, including emergency patches, incident support, or rapid deployment of security updates.