Get to Know the Team

We maintain Grab's data infrastructure, a dependable and economical platform that supports internal data processes and company-wide data lake access. This includes managing compute systems such as Apache Spark, Trino, and StarRocks, scheduling via Airflow, and an AWS S3-based storage layer. Our storage solutions use modern open-source table formats such as Apache Iceberg and Delta Lake, alongside traditional Apache Hive Parquet tables.

We simplify data operations for internal users by providing managed Spark, StarRocks, Trino, and Airflow services. Our team also collaborates with Grab's Data Catalog team to offer self-service data lake capabilities powered by DataHub, and partners closely with Grab's AI platforms to ensure a smooth experience for users developing AI applications.

Get to Know the Role

You will support this mission by maintaining and extending the platform's capabilities through new features and continuous improvements. You will also explore new developments in the space and bring them to our platform, thereby helping the data community at Grab. You will work onsite in the Grab Vietnam office (CMC Creative Space, District 7, Ho Chi Minh City) and report to a Data Engineering Manager based in Singapore.

The Critical Tasks You Will Perform
You will maintain and extend the Python/Go/Scala backends for Grab's Airflow, Spark, Trino, and StarRocks platforms.
You will build, modify, and extend Python/Scala Spark applications and Airflow pipelines to improve performance, reliability, and cost efficiency (see the illustrative sketch after this list).
You will design and implement architectural improvements to support new use cases or improve efficiency.
You will build platforms that can scale to the 3 Vs of Big Data (Volume, Velocity, Variety).
You will follow testing and SRE best practices to ensure system stability and reliability.
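Purely as an illustration of the kind of pipeline work described above, here is a minimal sketch of an Airflow DAG that submits a Spark job, assuming Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed; the DAG name, S3 job path, and connection ID are hypothetical and not taken from Grab's platform.

```python
# Illustrative sketch only: a minimal Airflow DAG that submits a Spark job.
# All names below (DAG id, S3 path, connection id) are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_orders_compaction",        # hypothetical pipeline name
    schedule="@daily",                        # assumes Airflow 2.4+ (use schedule_interval on older versions)
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    # Submit a PySpark application through a preconfigured Spark connection.
    compact_orders = SparkSubmitOperator(
        task_id="compact_orders",
        application="s3://example-bucket/jobs/compact_orders.py",  # hypothetical job artifact
        conn_id="spark_default",
        conf={"spark.sql.shuffle.partitions": "200"},
    )
```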