
Your impact 

This is an exciting opportunity to join the newly established Data team at IsoLabs, working closely with world-leading AI experts and drug discovery scientists to establish machine-learning-ready datasets that power the discovery of the next generation of medicines. As a data curator specialising in chemistry and biochemical properties, you will be foundational in ensuring data quality and will lead our efforts to represent biochemical information in the most impactful way for IsoLabs, an AI-driven drug discovery platform.

What you will do 

  • Integrate large-scale biochemical datasets and curate them to enhance their quality and create interoperable data assets that fuel IsoLabs' research efforts.
  • Work in partnership across research teams to create ML-ready datasets and use your expertise in biochemistry to maximise the quality and scale of available training data.
  • Partner with drug discovery project teams to streamline experimental assay data integration and management processes.
  • Contribute to the data team’s efforts to identify, evaluate and assess new data sources and data generation opportunities.
  • Collaborate to devise novel ways to couple machine-learning-based data extraction methods with human domain expertise to build large-scale, high-quality datasets.
  • Develop innovative approaches to generate high-quality synthetic data to supplement experimental data in ML development.
  • Communicate your work and raise awareness of opportunities to improve data quality.

Skills and qualifications 

Essential:

  • Proven experience working in industry at a biotech or pharmaceutical company, or working closely with industry at a research institution.
  • PhD in Chemistry or Cheminformatics, or equivalent experience in scientific research.
  • Expert in (bio)chemical data representation, analysis and curation including chemical structure, biochemical properties and retrosynthesis data.
  • Experience working with a broad range of experimental assay readouts used in the Hit Identification and Lead Optimisation stages of drug discovery (e.g. binding assays, ADMET properties).
  • Deep knowledge of chemistry databases and data sources and approaches to improve their interoperability for machine learning use cases.
  • Experience curating structure-activity data and other molecular properties from unstructured sources, including patents and scientific literature.
  • Good understanding of data quality in the context of machine learning for chemistry.
  • Experience using cheminformatics and data science toolkits, and the ability to apply them in data pipelining tools (e.g. KNIME), Python scripts and/or SQL.

Nice to have:

  • Expertise in structural bioinformatics and protein structure data processing.
  • Familiarity with using APIs from biochemical databases such as PubChem and ChEMBL.
  • Familiarity with data engineering concepts and experience running jobs on cloud-based infrastructure.

Job Overview


Isomorphic Labs
