Capnexus is a comprehensive services provider. Our team consists of outstanding professionals who are highly experienced in designing, building, and supporting retail software. We see ourselves as a build-as-a-service provider that follows a repeatable business pattern applicable to a variety of platforms and verticals. With a culture built on outcomes and delivery at the core of the business, Capnexus provides its customers with a complete suite of services for software development, system analysis, integration, implementation, and support, as well as the option to engage a single team to perform all the services they require.
Who You Are and What You'll Do:
Capnexus is looking for a highly skilled Senior AWS Data Engineer to lead data architecture, pipeline development, and data integrations. This is an exciting opportunity to apply advanced cloud data engineering skills on a platform that leverages generative AI to automate and modernize enterprise workflows.
Responsibilities:
- Participate in data discovery workshops to inventory source systems including property management platforms, marketing channels, and CRM data, and translate findings into data lake architecture requirements.
- Design and implement a multi-zone enterprise data lake on Amazon S3 (raw, conformed, enriched, aggregated) with ingest, cleansing, and business layers including schema versioning, checksum validation, business rule validation, and quarantine/notify workflows on failure.
- Build batch and streaming data ingestion pipelines using AWS Glue, Amazon Kinesis, and containerized ingestion applications across CDP, marketing, and property management data sources.
- Write PySpark and Python ETL code for AWS Glue jobs to transform, cleanse, and enrich data at scale; use the Apache Iceberg table format for ACID-compliant, schema-evolving data lake tables.
- Implement data transformation and orchestration frameworks using AWS Glue ETL and AWS Step Functions; configure AWS Glue Data Catalog with crawlers for automated metadata management and discovery.
- Implement AWS Lake Formation for fine-grained data governance including table-level and column-level permissions, data filters, and resource links — not just IAM-level access controls.
- Configure Amazon Athena for serverless SQL querying across the data lake with performance optimization (Parquet format, partitioning, column pruning, file size management, caching); implement Amazon DynamoDB for sub-second customer profile lookups, with DAX where latency requirements demand it.
- Develop and deploy AWS Lambda functions using AWS Lambda Powertools for structured logging, handler routing, and observability; implement error handling patterns including exponential backoff, retries, dead-letter queues, and CloudWatch alarms.
- Write and maintain Terraform (or CloudFormation/CDK) modules to provision and deploy AWS data infrastructure as part of the CI/CD pipeline — data engineers own their infrastructure deployment, not DevOps.
- Integrate CI/CD pipelines using GitHub Actions for automated deployment of Glue jobs, Lambda functions, and Step Functions workflows with lint checks and validation gates.
- Support Azure Data Lake migration: conduct discovery of ADLS assets, schemas, and transformation logic; provision AWS target environments; execute migration via AWS DataSync; perform row-count reconciliation, schema validation, and checksum comparison post-migration.
- Design and implement entity resolution pipelines to identify, deduplicate, and merge customer records into unified golden records using deterministic and fuzzy matching with lineage tracking and manual review pathways.
- Build and maintain data models to support Customer 360 views and executive analytics dashboards via Amazon QuickSight.
- Ensure data quality, validation, and integrity across all pipeline stages; support UAT for data-dependent features.
- Collaborate with Full Stack, DevOps/MLOps, and AI/ML team members working with Bedrock and SageMaker; contribute to architecture documentation, pipeline runbooks, and data governance documentation.
Qualifications:
- 5+ years of hands-on data engineering experience, including at least 2 years in AWS cloud environments.
- Strong proficiency in Python and SQL; hands-on PySpark or Scala coding experience for AWS Glue ETL — this is a coding role, not a configuration role.
- Hands-on experience with AWS Glue (jobs, crawlers, Data Catalog), AWS Step Functions, AWS Lambda, and Amazon S3 data lake architecture.
- Proficiency with AWS Lambda Powertools for structured logging, handler management, and observability in production serverless workloads.
- Working knowledge of Apache Iceberg table format including schema evolution, time travel, and partition management.
- Hands-on experience with Terraform, AWS CloudFormation, or AWS CDK for infrastructure as code integrated into CI/CD pipelines — candidates who have only consumed pre-made DevOps templates will not meet this requirement.
- Experience with AWS Lake Formation for fine-grained access control including table-level and column-level permissions, data filters, and resource links.
- Solid understanding of DynamoDB data modeling and key design patterns for sub-second lookups; familiarity with DAX for caching.
- Experience with Amazon Athena performance tuning: file formats, partitioning strategies, query optimization, and understanding of when Athena is and is not the right tool.
- Experience with GitHub Actions or comparable CI/CD tooling for automated deployment of data pipeline code.
- Strong understanding of data quality patterns: schema validation, checksum validation, business rule validation, quarantine workflows, and lineage tracking.
- Strong analytical, problem-solving, and communication skills; comfortable working in Agile/Scrum teams alongside AWS Professional Services.
Nice to Have:
- Experience with Azure Data Lake Storage (ADLS) and Azure-to-AWS migration using AWS DataSync.
- Familiarity with AWS Entity Resolution service — specifically matching workflows, rule-based and ML-based matching, and output schema features.
- Exposure to Amazon Bedrock or Amazon SageMaker in a data engineering support capacity (pipeline integration, feature stores, inference data prep).
- Knowledge of Amazon QuickSight for dataset preparation, SPICE optimization, and embedded dashboard development.
- Familiarity with Kiro CLI or AI-assisted development tooling for pipeline automation.
- AWS Certification (Data Analytics Specialty, Database Specialty, or Solutions Architect).
- Background in real estate, property management, marketing technology, or CRM data platforms.
Our Culture:
At Capnexus, the central principles that we all adhere to, and the glue that holds us together, are our keystones. Our four keystones are:
"A Customer-Obsessed, Delivery-Focused Culture"
- We’re driven to exceed our customers’ expectations by listening, leading, solving problems, and delivering what we promise.
- We aim to be the most dependable and trusted partner serving our customers. TRUST = CONSISTENCY x TIME.
"A Culture of Learning and Sharing"
- We value “Lifetime Learners”: those who are hungry, competitive, curious, and self-motivated in their pursuit of knowledge.
- Personal and professional growth depends on teamwork and continuous learning. By sharing knowledge, skills, ideas, and effort, we benefit our customers, ourselves, and our communities.
- We recognize that the thoughts, feelings, and backgrounds of others are as important as our own. Everyone has something to learn and everyone has something they can teach.
- Knowledge and ability are valued. Sharing knowledge and helping others learn new capabilities is valued exponentially.
"A Culture of Growth and Scalability"
- Growth comes from not establishing barriers in your role. Cross-functional skill sets are valued and help us deliver to our customers in a truly agile fashion. This also means understanding that when you are asked to do something new, you will need support, have questions, and make some mistakes along the way.
- The most elegant solution is a simple solution. Simple doesn’t mean easy. It’s often more difficult to break a complex problem down into simple, scalable terms. We don’t value over-architected solutions or superfluous code.
- Time is one of our most precious commodities. Scalability means respecting this and being passionate about making the most efficient use of every team member’s time.
"All Work is Strategic"
- No matter how small a project or assignment appears, every single engagement is an opportunity for us to prove ourselves, build trust, and develop relationships that last and grow.
- Every task, interaction, and commitment matters.
- Big or small, we execute our plans and strategies with focus, commitment, and passion.
We offer:
Job Type: Full-time, 1099
Capnexus is an equal opportunity employer. We embrace and celebrate diversity and are committed to creating an inclusive and safe environment for all employees. Experience comes in many forms, and we’re dedicated to adding new perspectives to the team. We encourage you to apply even if your experience doesn’t perfectly align with what we have listed. We look forward to hearing from you.
No Agencies Please!