ABOUT THE ROLE
We are building an E-commerce Data Platform following a Lakehouse architecture, leveraging Airflow, Airbyte, dbt, and Redshift on AWS. The system is in its early development stage, giving you a unique opportunity to be involved from the ground up: shaping the architecture, defining data pipelines, and implementing best practices. Example data domains include Storefront, Payment Gateway, Inventory, Catalog, Logistics, Marketing Insights, and CRM.

WHAT YOU WILL DO
Design, develop, and maintain scalable data pipelines and ETL/ELT processes that collect, clean, and transform large-scale datasets across our data systems (a minimal pipeline sketch follows this list).
Collaborate with Data Scientists, Data Analysts, and other stakeholders to understand data requirements and develop Data Lake and Data Warehouse solutions for analytics, reporting, and AI/ML. Utilize BI tools like Grafana, AWS QuickSight, and Google Sheets for data visualization.
Build and optimize the data architecture, including the Lakehouse (AWS S3, Delta Lake), the Data Warehouse (Redshift), and streaming and batch processing with technologies such as Kafka (see the ingestion sketch below).
Monitor and troubleshoot data pipeline issues to ensure data integrity and timely delivery, using tools such as Sentry, Grafana, Prometheus, and AWS CloudWatch to track logs and send alerts via Slack (see the alerting sketch below).
Document technical processes, maintain data catalogs, and ensure compliance with data governance policies. Use the Atlassian toolset (Jira, Confluence) or Slack for work management and team collaboration.
Optimize the performance, reliability, and cost-efficiency of our data systems, improving the scalability and efficiency of every part of the stack: Kafka, dbt, Airflow, Airbyte, storage layers such as PostgreSQL, Elasticsearch, Redis, and AWS S3, and the data processing and orchestration frameworks around them (see the cost-efficiency sketch below).
Continuously research and implement new technologies to optimize the current system, applying best practices in Data Engineering and DataOps.
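To give a concrete feel for the pipeline work above, here is a minimal sketch of a daily ELT DAG using Airflow's TaskFlow API (Airflow 2.x assumed). The DAG id, schedule, S3 paths, and table names are hypothetical, and the extract/load logic is reduced to placeholders rather than a description of our actual pipelines.

```python
# Hypothetical daily ELT pipeline sketch using Airflow's TaskFlow API.
# DAG id, schedule, S3 paths, and table names are placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="storefront_orders_elt",          # hypothetical DAG name
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["elt", "storefront"],
)
def storefront_orders_elt():
    @task
    def extract_orders() -> str:
        # Pull the previous day's orders from the storefront source and stage
        # them on S3; returns the S3 key of the staged file (placeholder logic).
        return "raw/storefront/orders/2024-01-01.parquet"

    @task
    def load_to_redshift(s3_key: str) -> None:
        # Issue a Redshift COPY from the staged file into a staging table
        # (connection handling omitted for brevity).
        print(f"COPY staging.orders FROM 's3://data-lake/{s3_key}'")

    load_to_redshift(extract_orders())


storefront_orders_elt()
```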
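For the lakehouse and streaming responsibility, the ingestion sketch below shows one possible micro-batch sink: consuming events from Kafka and appending them to a Delta Lake table on S3. The broker address, topic, consumer group, bucket path, and event schema are all made up, and AWS credentials plus the S3 locking configuration needed for safe Delta writes are omitted.

```python
# Hypothetical micro-batch sink: consume Kafka events and append them to a
# Delta Lake table on S3. Topic, bucket, and schema are illustrative only;
# AWS credentials and Delta S3 locking configuration are omitted.
import json

import pyarrow as pa
from confluent_kafka import Consumer
from deltalake import write_deltalake

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",       # placeholder broker
    "group.id": "inventory-lake-sink",       # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["inventory.stock_updates"])  # hypothetical topic

events = []
for _ in range(1000):                        # bounded poll for one micro-batch
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    events.append(json.loads(msg.value()))

if events:
    batch = pa.Table.from_pylist(events)
    # Append the batch to the lakehouse table; Delta provides ACID commits.
    write_deltalake("s3://data-lake/inventory/stock_updates", batch, mode="append")

consumer.close()
```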
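For pipeline monitoring, one common alerting pattern is an Airflow on_failure_callback that posts task failures to a Slack incoming webhook; the sketch below assumes such a webhook exists, and the URL and message wording are placeholders.

```python
# Hypothetical failure alerting: an Airflow on_failure_callback that posts
# task failures to a Slack incoming webhook. URL and wording are placeholders.
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder


def notify_slack_on_failure(context: dict) -> None:
    ti = context["task_instance"]
    message = (
        f":red_circle: Task `{ti.task_id}` in DAG `{ti.dag_id}` failed "
        f"(run {context['run_id']}). Log: {ti.log_url}"
    )
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)


# Attached via default_args so every task in a DAG inherits the alert:
default_args = {"on_failure_callback": notify_slack_on_failure}
```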
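As one example of the cost-efficiency work, the sketch below uses boto3 to attach an S3 lifecycle rule that tiers aged raw-zone data to cheaper storage classes and expires it after a year. The bucket name, prefix, and retention windows are illustrative, not actual policy.

```python
# Hypothetical cost-efficiency tweak: tier aged raw data on S3 to cheaper
# storage classes and expire it after a year. Bucket, prefix, and day counts
# are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="ecommerce-data-lake",                     # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-zone",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```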