Biohub·6 days ago
Biohub is leading the new era of AI-powered biology to cure or prevent disease through its 501(c)(3) medical research organization, with the support of the Chan Zuckerberg Initiative.
Biohub supports the science and technology that will make it possible to help scientists cure, prevent, or manage all diseases by the end of this century. While this may seem like an audacious goal, in the last 100 years, biomedical science has made tremendous strides in understanding biological systems, advancing human health, and treating disease.
Achieving our mission will only be possible if scientists are able to better understand human biology. To that end, we have identified four grand challenges that will unlock the mysteries of the cell and how cells interact within systems, paving the way for new discoveries that will change medicine in the decades that follow.
The Data Pipelines team processes scientific datasets specifically designed to enable biological modeling and support AI research. It is responsible for data ETL, validation, testing, and storage, and it partners with the data management team on retrieval. We handle single-cell transcriptomic data for over 89 million unique cells and over 15 thousand cryoET tomograms, in imaging datasets as large as 20TB and counting, and we will be expanding to support larger scale and additional imaging, sequencing, and literature modalities. Our resources provide access to structured open source data that tens of thousands of scientists use each month to quickly query and form hypotheses about how genetic variants in cells impact disease risk, to define drug toxicities, and eventually to discover better therapies.
As a software engineer on the Data Engineering team, you will contribute to architecture and help implement all of the above-mentioned data needs for our platforms: CELLxGENE Discover, CryoET, and the new platform we are building, which focuses on data for AI and the virtual cell. The goal is to enable scientists to further interrogate our very large and growing corpus of data without needing to download the data or have any computational expertise. You will work on a collaborative, multidisciplinary team to develop solutions that accelerate our scientist users' workflows and the pace of scientific discovery.
No prior biology experience is needed for this role. You will have the opportunity to pair with Computational Biologists to develop solutions for our users and be able to learn about biology from experts on our team.
Our tech stack: Python, Terraform, AWS infrastructure, Argo CD and Argo Workflows, and TileDB.
Nice to Have
The Redwood City, CA base pay range for a new hire in this role is $214,000 - $294,800. New hires are typically hired into the lower portion of the range, enabling employee growth in the range over time. Actual placement in range is based on job-related skills and experience, as evaluated throughout the interview process.
As we grow, we’re excited to strengthen in-person connections and cultivate a collaborative, team-oriented environment. This role is a hybrid position requiring you to be onsite for at least 60% of the working month, approximately 3 days a week, with specific in-office days determined by the team’s manager. The exact schedule will be at the hiring manager's discretion and communicated during the interview process.
We’re thankful to have an incredible team behind our work. To honor their commitment, we offer a wide range of benefits to support the people who make all we do possible.
If you’re interested in a role but your previous experience doesn’t perfectly align with each qualification in the job description, we still encourage you to apply as you may be the perfect fit for this or another role.