Part-Time Research Data Scientist

Texas Health Institute
Austin, US
On-site

Job Description

We’re hiring a part-time Research Data Scientist to lead end-to-end preparation of complex, large-scale health datasets for peer-reviewed publication. This role centers on cleaning, harmonizing, and structuring messy, multi-source datasets, followed by advanced statistical analysis and machine learning to generate publishable insights.

You’ll work with survey, observational, and real-world health data, building reproducible analytical workflows that meet academic research standards. This role is best suited for a PhD‑trained data scientist or quantitative researcher with deep experience in machine learning, advanced statistics, and real-world data analysis.

Key Responsibilities

  • Data Cleaning & Harmonization
      • Clean, normalize, and integrate messy datasets from multiple sources (e.g., survey data from longitudinal studies)
      • Resolve inconsistencies and schema mismatches across datasets
      • Design scalable approaches to dataset harmonization for cross‑study comparability
  • Data Pipeline Development
      • Build and maintain reproducible data processing workflows for large‑scale datasets
      • Structure datasets for downstream statistical modeling and publication‑ready outputs
      • Implement version‑controlled workflows for data processing and analysis
  • Statistical Analysis & Machine Learning
      • Apply advanced statistical methods (e.g., mixed‑effects models, causal inference, longitudinal modeling)
      • Develop, validate, and interpret machine learning models for large‑scale observational data as needed
      • Ensure methodological rigor aligned with peer‑reviewed research standards
  • Research Collaboration
      • Partner with researchers to refine hypotheses, define analytic strategies, and interpret findings
      • Translate complex analyses into clear, defensible results for academic publication
  • Reproducibility & Publication Support
      • Develop reproducible codebases and documentation (e.g., notebooks, pipelines)
      • Prepare datasets, figures, and statistical outputs for manuscripts, abstracts, and reports
      • Contribute to methodological transparency and auditability of analyses
      • Write publication‑ready technical prose, including Methods and Results sections for manuscripts (required)

Qualifications

  • PhD (preferred) in Data Science, Statistics, Biostatistics, Epidemiology, Computer Science, Experimental Psychology, or a related quantitative field
  • 3–5+ years of experience working with large, complex datasets in research, healthcare, or applied data science
  • Strong expertise in data cleaning, preprocessing, and dataset harmonization at scale
  • Advanced proficiency in Python or R (e.g., pandas, tidyverse, scikit‑learn, statsmodels) or related software/programming experience
  • Deep experience with machine learning and advanced statistical methods
  • Strong foundation in reproducible research practices
  • Ability to communicate technical findings clearly to interdisciplinary teams and to collaborate with team members on high‑quality publications

Preferred Qualifications

  • Prior experience preparing analyses for peer‑reviewed publication
  • Familiarity with survey data (Qualtrics, REDCap) and/or healthcare data standards (FHIR)
  • Background in public health, epidemiology, or biostatistics
  • Experience with causal inference, longitudinal analysis, or real‑world evidence studies
  • Experience working with messy, real‑world observational datasets across multiple sources
  • Familiarity with cloud or distributed data tools (AWS, GCP, Spark)
  • Background in or familiarity with cannabinoid research

Skills & Requirements

Technical Skills

Python, R, Pandas, Tidyverse, Scikit-learn, Statsmodels, Data science, Statistics, Machine learning, Biostatistics, Epidemiology

Employment Type

Part-time

Level

Senior

Posted

4/29/2026
