We’re hiring a part-time Research Data Scientist to lead end-to-end preparation of complex, large-scale health datasets for peer-reviewed publication. This role centers on cleaning, harmonizing, and structuring messy, multi-source datasets, followed by advanced statistical analysis and machine learning to generate publishable insights.
You’ll work with survey, observational, and real-world health data, building reproducible analytical workflows that meet academic research standards. This role is best suited for a PhD‑trained data scientist or quantitative researcher with deep experience in machine learning, advanced statistics, and real-world data analysis.
Key Responsibilities
- Data Cleaning & Harmonization
  - Clean, normalize, and integrate messy datasets from multiple sources (e.g., survey data from longitudinal studies)
  - Resolve inconsistencies and schema mismatches across datasets
  - Design scalable approaches to dataset harmonization for cross‑study comparability
- Data Pipeline Development
  - Build and maintain reproducible data processing workflows for large‑scale datasets
  - Structure datasets for downstream statistical modeling and publication‑ready outputs
  - Implement version‑controlled workflows for data processing and analysis
- Statistical Analysis & Machine Learning
  - Apply advanced statistical methods (e.g., mixed‑effects models, causal inference, longitudinal modeling)
  - Develop, validate, and interpret machine learning models for large‑scale observational data as needed
  - Ensure methodological rigor aligned with peer‑reviewed research standards
- Research Collaboration
  - Partner with researchers to refine hypotheses, define analytic strategies, and interpret findings
  - Translate complex analyses into clear, defensible results for academic publication
- Reproducibility & Publication Support
  - Develop reproducible codebases and documentation (e.g., notebooks, pipelines)
  - Prepare datasets, figures, and statistical outputs for manuscripts, abstracts, and reports
  - Contribute to methodological transparency and auditability of analyses
  - Write publication‑ready Methods and Results sections (strong technical writing ability is required)
Qualifications
- PhD (preferred) in Data Science, Statistics, Biostatistics, Epidemiology, Computer Science, Experimental Psychology, or a related quantitative field
- 3–5+ years of experience working with large, complex datasets in research, healthcare, or applied data science
- Strong expertise in data cleaning, preprocessing, and dataset harmonization at scale
- Advanced proficiency in Python or R (e.g., pandas, tidyverse, scikit‑learn, statsmodels), or equivalent experience with related software or programming languages
- Deep experience with machine learning and advanced statistical methods
- Strong foundation in reproducible research practices
- Ability to communicate technical findings clearly to interdisciplinary teams and to collaborate with team members on high‑quality publications
Preferred Qualifications
- Prior experience preparing analyses for peer‑reviewed publication
- Familiarity with survey data (Qualtrics, REDCap) and/or healthcare data standards (FHIR)
- Background in public health, epidemiology, or biostatistics
- Experience with causal inference, longitudinal analysis, or real‑world evidence studies
- Experience working with messy, real‑world observational datasets across multiple sources
- Familiarity with cloud or distributed data tools (AWS, GCP, Spark)
- Background in, or familiarity with, cannabinoid research