Job Description
- Deploy and manage observability stacks (Prometheus (Mimir), Grafana, Loki, OpenTelemetry) to define SLOs/SLIs, alerts, and dashboards.
- Operate and maintain Kubernetes clusters (EKS + on-prem), ensuring scalability, service discovery, ingress, and network policy compliance.
- Build and maintain hybrid networking solutions with secure, low-latency connectivity.
- Implement reliability-focused practices including autoscaling, load balancing, disaster recovery, and fault-tolerant system designs.
- Define and enforce DevSecOps policies, including secrets management, RBAC, Pod Security Standards, and secure container runtimes.
- Lead incident response, root cause analysis, postmortems, and continuous reliability improvements.
- Optimize cost, performance, and security across AWS services and on-prem resources.
- Harden Linux servers and infrastructure with security best practices (firewalls, SELinux/AppArmor, TLS/mTLS).
- Collaborate with developers to improve system reliability, operational efficiency, and secure application design.
- Participate in on-call rotations to ensure 24/7 system availability and quick resolution of production issues.
- Document system design and procedures.
Job Requirements
Must-Have
- Strong observability skills: Prometheus, Grafana, Loki, OpenTelemetry.
- Expertise and experience in developing, operating, and troubleshooting Kubernetes clusters (EKS) at scale.
- Expertise in AWS services (VPC, EC2, IAM, CloudWatch, EKS, ElastiCache, MSK).
- Expertise in infrastructure-as-code tools (e.g., Terraform, CloudFormation).
- Solid knowledge of CI/CD pipelines (GitHub Actions, ArgoCD) with a focus on operational reliability and security.
- Deep understanding of Linux internals, networking (iptables, nftables, routing), and security hardening.
- Experience supporting high-availability, high-concurrency production systems.
- Experience in incident management, postmortems, and continuous improvement of reliability metrics.
- Willingness to participate in on-call rotations to ensure 24/7 system uptime.

Nice-to-Have
- Experience with Redis and Kafka in production at scale.
- Familiarity with secure networking automation and compliance frameworks.
- Knowledge of DevSecOps practices, Vault, IAM policy enforcement, and vulnerability management.
- Experience with service mesh tools such as Istio on Kubernetes.
- Experience designing and testing BCP/DR for cloud workloads (Site Recovery, Backup vaults, zone/region redundancy, RTO/RPO adherence).

Qualifications
- Bachelor's degree in Computer Science, Information Systems, or equivalent experience.
- 5+ years in SRE, DevOps, Site Reliability, or Infrastructure Engineering roles.
Employment Type
Full-time
Benefits
What we offer
- Competitive salary package aligned with your experience and market standards
- Performance-based reviews and clear growth opportunities
- A global working environment with exposure to international teams and projects
- Opportunities for personal and professional development, including training and new-skill learning
- Flexible working culture, built on trust and responsibility
- Supportive, open-minded team culture, where your ideas and contributions are valued
- Chances to travel or collaborate across offices, depending on project needs
- And more benefits tailored to help you thrive at Secuwall
Salary
Negotiable