Part-Time Research Data Scientist

Texas Health Institute

Austin, US

On-site

Job Description

We’re hiring a part-time Research Data Scientist to lead end-to-end preparation of complex, large-scale health datasets for peer-reviewed publication. This role centers on cleaning, harmonizing, and structuring messy, multi-source datasets, followed by advanced statistical analysis and machine learning to generate publishable insights.

You’ll work with survey, observational, and real-world health data, building reproducible analytical workflows that meet academic research standards. This role is best suited for a PhD‑trained data scientist or quantitative researcher with deep experience in machine learning, advanced statistics, and real-world data analysis.

Key Responsibilities

Data Cleaning & Harmonization
Clean, normalize, and integrate messy datasets from multiple sources (e.g., survey data from longitudinal studies)
Resolve inconsistencies and schema mismatches across datasets
Design scalable approaches to dataset harmonization for cross‑study comparability
Data Pipeline Development
Build and maintain reproducible data processing workflows for large‑scale datasets
Structure datasets for downstream statistical modeling and publication‑ready outputs
Implement version‑controlled workflows for data processing and analysis
Statistical Analysis & Machine Learning
Apply advanced statistical methods (e.g., mixed‑effects models, causal inference, longitudinal modeling)
Develop, validate, and interpret machine learning models for large‑scale observational data as needed
Ensure methodological rigor aligned with peer‑reviewed research standards
Research Collaboration
Partner with researchers to refine hypotheses, define analytic strategies, and interpret findings
Translate complex analyses into clear, defensible results for academic publication
Reproducibility & Publication Support
Develop reproducible codebases and documentation (e.g., notebooks, pipelines)
Prepare datasets, figures, and statistical outputs for manuscripts, abstracts, and reports
Contribute to methodological transparency and auditability of analyses
Write publication‑ready Results and Methods sections for manuscripts (strong technical writing ability required)

Qualifications

PhD (preferred) in Data Science, Statistics, Biostatistics, Epidemiology, Computer Science, Experimental Psychology, or a related quantitative field
3–5+ years of experience working with large, complex datasets in research, healthcare, or applied data science
Strong expertise in data cleaning, preprocessing, and dataset harmonization at scale
Advanced proficiency in Python or R (e.g., pandas, tidyverse, scikit‑learn, statsmodels), or equivalent programming experience
Deep experience with machine learning and advanced statistical methods
Strong foundation in reproducible research practices
Ability to communicate technical findings clearly to interdisciplinary teams and to collaborate with team members on high‑quality publications

Preferred Qualifications

Prior experience preparing analyses for peer‑reviewed publication
Familiarity with survey data (Qualtrics, REDCap) and/or healthcare data standards (FHIR)
Background in public health, epidemiology, or biostatistics
Experience with causal inference, longitudinal analysis, or real‑world evidence studies
Experience working with messy, real‑world observational datasets across multiple sources
Familiarity with cloud or distributed data tools (AWS, GCP, Spark)
Background in, or familiarity with, cannabinoid research


Skills & Requirements

Technical Skills

Python, R, Pandas, Tidyverse, Scikit-learn, Statsmodels, Data science, Statistics, Machine learning, Biostatistics, Epidemiology

Employment Type

PART TIME

Level

senior

Posted

4/29/2026
