Design, develop, and maintain scalable data pipelines and ETL/ELT processes to collect, clean, and process large-scale datasets across data systems.
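As a rough illustration of the kind of pipeline this covers, the Airflow DAG below chains extract, transform, and load steps on a daily schedule; the DAG name, schedule, and helper functions are hypothetical placeholders rather than part of an existing system.

```python
# Minimal sketch of a daily ETL DAG in Apache Airflow.
# DAG id, schedule, and helper bodies are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull raw records from a source system and stage them (e.g. to S3).
    ...


def transform(**context):
    # Placeholder: clean and normalize the staged records.
    ...


def load(**context):
    # Placeholder: load the cleaned data into the warehouse (e.g. Redshift).
    ...


with DAG(
    dag_id="example_daily_etl",        # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # Airflow 2.4+ parameter; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```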
Collaborate with Data Scientists, Data Analysts, and other stakeholders to understand data requirements and develop Data Lake and Data Warehouse solutions for analytics, reporting, and AI/ML. Utilize BI tools such as Grafana, Amazon QuickSight, and Google Sheets for data visualization.
Build and optimize the data architecture, including a Lakehouse (AWS S3, Delta Lake), a Data Warehouse (Amazon Redshift), and Kafka for both streaming and batch data processing.
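One common pattern for landing Kafka streams in the Lakehouse layer is Spark Structured Streaming appending to a Delta table on S3. The sketch below assumes a PySpark environment with the Kafka and Delta Lake connectors available; the broker address, topic, and bucket paths are hypothetical.

```python
# Sketch: stream events from a Kafka topic into a Delta table on S3.
# Assumes the Kafka and Delta Lake connectors are on the Spark classpath;
# brokers, topic, and S3 paths are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka_to_delta").getOrCreate()

raw_events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")   # placeholder broker
    .option("subscribe", "events")                         # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers binary key/value columns; keep the payload as a string here.
events = raw_events.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

query = (
    events.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "s3a://example-lake/checkpoints/events/")  # placeholder path
    .start("s3a://example-lake/bronze/events/")                              # placeholder path
)

query.awaitTermination()
```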
Monitor and troubleshoot data pipeline issues to ensure data integrity and timely delivery. Utilize tools such as Sentry, Grafana, Prometheus, and AWS CloudWatch to track logs and metrics and send alerts via Slack.
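A typical alerting path is a small helper that posts to a Slack incoming webhook when a data-quality check fails; the webhook URL, function names, and row-count check below are hypothetical, and the same pattern can also be triggered from Airflow failure callbacks or Grafana/Prometheus alert rules.

```python
# Sketch: send a Slack alert when a pipeline check fails.
# The webhook URL and the check itself are hypothetical placeholders.
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder


def notify_slack(message: str) -> None:
    # Slack incoming webhooks accept a simple JSON payload with a "text" field.
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)


def check_row_count(actual: int, expected_min: int) -> None:
    # Placeholder data-integrity check: alert if a load delivered too few rows.
    if actual < expected_min:
        notify_slack(
            f":warning: Pipeline check failed: loaded {actual} rows, "
            f"expected at least {expected_min}."
        )


if __name__ == "__main__":
    check_row_count(actual=120, expected_min=1000)  # example invocation
```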
Document technical processes, maintain data catalogs, and ensure compliance with data governance policies. Use the Atlassian toolset (Jira, Confluence) or Slack for work management and team collaboration.
Optimize the performance, reliability, cost-efficiency, and scalability of data systems across all components: processing and orchestration tools such as Kafka, dbt, Airflow, and Airbyte, and storage solutions such as PostgreSQL, Elasticsearch, Redis, and AWS S3.
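As one concrete example of this kind of tuning, the sketch below adjusts batching and compression on a Kafka producer using kafka-python to trade a little latency for higher throughput and lower network cost; the broker, topic, and specific values are hypothetical and would depend on message size and latency requirements.

```python
# Sketch: throughput- and cost-oriented tuning of a Kafka producer (kafka-python).
# Broker, topic, and values are illustrative; appropriate settings depend on workload.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker-1:9092"],  # placeholder broker
    compression_type="gzip",              # compress batches to reduce network and storage cost
    linger_ms=50,                         # wait up to 50 ms to accumulate larger batches
    batch_size=64 * 1024,                 # 64 KiB batches instead of the 16 KiB default
    acks="all",                           # accept extra latency in exchange for durability
)

for i in range(1000):
    producer.send("events", value=f"message {i}".encode("utf-8"))  # placeholder topic

producer.flush()
producer.close()
```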
Continuously research and implement new technologies to optimize the current system, applying best practices in Data Engineering and DataOps.