As the AWS Data Engineer, you will be instrumental in building, optimizing, and maintaining scalable data pipelines and lakehouse architectures on AWS to enable reliable, high-performance data flows for analytics and reporting. Your primary focus will be designing robust data models, implementing lakehouse solutions using AWS services, and ensuring efficient ingestion, transformation, and storage of data from diverse sources. This is an excellent opportunity to shape a modern data ecosystem using cutting-edge AWS technologies. This position reports directly to the Manager of Engineering.
What is expected in this role:
- Data Modeling: Design and implement dimensional and normalized data models optimized for analytical workloads in lakehouse environments.
- Lakehouse Architecture: Build and manage modern data lakehouse solutions using AWS services including Amazon S3, AWS Glue, Amazon Redshift Spectrum, and Lake Formation for governance, cataloging, and fine-grained access control.
- ETL/ELT Pipelines: Develop, monitor, and optimize batch and near-real-time data pipelines using AWS Glue, AWS Lambda, Amazon Kinesis, and AWS Step Functions.
- Data Ingestion & Integration: Ingest structured, semi-structured, and unstructured data from APIs, databases, streaming sources, and third-party systems into the lakehouse.
- Infrastructure as Code & Automation: Define and manage data infrastructure using AWS CloudFormation and the AWS CDK, deployed through CI/CD pipelines.
- Analytics & ML Support: Support model training and analytics initiatives by preparing datasets, performing analysis with strong attention to detail, and creating visualizations using Amazon Quick Suite.
- Perform other duties as assigned.
How success is measured in this role:
- Deliver scalable, maintainable data models that support evolving analytical and reporting needs.
- Implement a secure, governed lakehouse architecture with centralized metadata, access controls, and auditability via AWS Lake Formation.
- Build high-throughput, fault-tolerant data pipelines with monitoring, alerting, and automated recovery using Amazon CloudWatch and EventBridge.
- Optimize query performance and cost using Redshift, Athena, and S3 partitioning/compression strategies.
- Ensure data quality, lineage, and documentation through automated testing and catalog integration.
- Collaborate effectively with analytics engineers, data scientists, and business stakeholders to align data solutions with organizational objectives.