About Baselayer
Trusted by 2,200+ financial institutions, Baselayer is the intelligent business identity platform that helps verify any business, automate KYB, and monitor real-time risk. Baselayer’s B2B risk solutions and identity graph network leverage state and federal government filings and proprietary data sources to prevent fraud, accelerate onboarding, and lower credit losses.
About the Role
We are looking for a Data Engineer to design, build, and operate the data infrastructure that powers Baselayer’s analytics and machine learning capabilities. You will own robust, scalable pipelines that ingest, transform, and validate structured and unstructured data from internal systems and external sources, with a strong focus on reliability, observability, and data quality.
This is a hands-on role for someone who thrives in complexity, cares deeply about correctness, and wants to work close to AI and product workflows in a regulated domain.
What You’ll Do
Design, build, and maintain robust ETL and ELT pipelines that power analytics and machine learning use cases
Own and improve the architecture and tooling for storing, processing, and querying large-scale datasets in cloud data platforms
Implement orchestration and automation for data workflows using tools such as Airflow, dbt, or similar
Build and maintain reusable data models to enable faster experimentation and reliable reporting
Implement data quality checks, observability, and alerting to ensure integrity and reliability across environments
Partner with Data Science, ML Engineering, Product, and Engineering to ensure reliable data delivery and feature readiness for modeling
Optimize warehouse and query performance, scalability, and cost as data volumes grow
Maintain clear documentation, runbooks, and operational processes for pipelines and datasets
Partner with security and compliance stakeholders to ensure pipelines and access controls meet regulatory and internal standards
About You
You want to learn fast, take ownership, and build systems that other teams can rely on. You are not in it just for the win; you are here because you have something to prove and want to be great.
You care about data integrity and reliability, you enjoy turning messy inputs into clean systems, and you are comfortable operating without a playbook. You are curious about AI and ML infrastructure and want to build the foundation that powers it.
Required Experience and Skills
4 to 12 years of experience in data engineering or analytics engineering
Strong Python and SQL skills, with experience building production-grade data workflows
Experience building and maintaining ETL or ELT pipelines and working with cloud data warehouses or analytics databases
Familiarity with orchestration, workflow scheduling, and transformation tooling (for example Airflow, dbt, Dagster, Prefect, or similar)
Comfort working with both structured and unstructured data and designing scalable data architectures
Strong understanding of data quality, testing, observability, and operational best practices
Ability to communicate clearly across technical and non-technical audiences
What Sets You Apart
Experience working in regulated environments or with sensitive identity, risk, fraud, compliance, or financial services data
Experience integrating external data sources and APIs, including government or registry data
Familiarity with near-real-time or streaming data patterns
Highly feedback-oriented with a desire for continuous improvement
Strong bias toward ownership and building systems that scale
Work Location
Hybrid in San Francisco, in office 3 days per week
Compensation and Benefits
Salary range of $135,000 to $220,000
Equity package
Unlimited vacation
Fully paid health, dental, and vision insurance
401(k) with company match