Senior Data Engineer — Hybrid (San Francisco Bay Area / New York Metropolitan Area)

PraxisPro Inc.
San Francisco, US
Hybrid

Job Description

Full-Time / Hybrid — San Francisco Bay Area or New York Metropolitan Area

Senior Data Engineer

Full-Time

Engineering

$190,000 - $250,000

Apply Now

Who We Are

Praxis Pro, a data intelligence company, is on a mission to heal the fractured state of Life Sciences commercial data by surfacing undocumented and previously inaccessible datasets to drive novel commercial intelligence and improve patient outcomes. Praxis Pro has begun surfacing undocumented and previously inaccessible data with the industry’s first purpose-built Learning Experience Platform (LXP).

By serving as the commercial intelligence backbone across commercial, medical, and compliance functions, our industry-specific AI models for therapeutic areas and disease states form the foundation for a new standard of commercial intelligence, one that enables disciplined execution at scale while allowing Life Sciences organizations to focus on what matters most: advancing patient outcomes.

The Role

We’re looking for a Senior Data Engineer (5–7 years) who is fluent in both streaming and batch paradigms on AWS or GCP. You’ll design and operate data platforms that power analytics, personalization, and recommendation use cases—partnering closely with ML engineers to move models from notebooks to production.

What You’ll Do

  • Design & build pipelines: Low-latency streaming and reliable batch ETL/ELT for multi-tenant datasets across AWS or GCP.
  • Own data quality: Implement contracts, validation, observability, lineage, backfills, and SLAs/SLOs.
  • Operationalize ML: Productionize features, embeddings, and model I/O for personalization/recommendation (feature stores, real-time inference paths, batch retraining).
  • Model the warehouse/lake: Create well-governed schemas (e.g., medallion/lakehouse patterns) to support BI and experimentation.
  • Harden & scale: Optimize cost/perf, implement autoscaling, partitioning, compaction, and tiered storage; champion reliability and incident response.
  • Security & compliance: Build with least‑privilege IAM, encryption, PII handling, and auditability aligned to SOC 2 and healthcare data expectations.
  • Collaborate: Partner with product, ML, and app teams; contribute to data platform roadmap and coding standards.

Required Qualifications

  • 5–7 years building and running production streaming + batch data pipelines.
  • Cloud: Expertise in AWS (Kinesis/MSK, Glue/EMR, Lambda, S3, Redshift) or GCP (Pub/Sub, Dataflow/Dataproc, GCS, BigQuery).
  • Polyglot engineering: Strong hands‑on in Python plus one or more of Scala/Java/Go/TypeScript.
  • Distributed processing: Solid with Spark/Flink/Beam and related performance tuning (checkpointing, state, watermarking).
  • Orchestration & ELT: Airflow/Dagster and dbt or equivalent; CI/CD for data (tests, contracts).
  • ML‑adjacent experience: Shipping data features for personalization/recs (e.g., candidate generation, ranking features, user/item embeddings, offline/online consistency).
  • Data foundations: Schema design, partitioning, CDC, late/duplicate data handling, idempotency, backfills.
  • Reliability: Monitoring/alerting, on‑call familiarity, cost/perf optimization.
  • Communication: Clear written/spoken communication across engineering and product stakeholders.

Nice to Have

  • Feature stores (e.g., Feast), vector DBs (Qdrant, Pinecone, FAISS), or real-time retrieval for recs.
  • Event bus & contract tooling (Kafka + Protobuf/Avro), schema registry.
  • Data governance/lineage (OpenLineage, DataHub, Collibra, or similar).
  • MLOps/model serving (Vertex AI, SageMaker, Ray Serve, Triton, custom microservices).
  • Infra‑as‑code (Terraform/CDK), containers (Docker/Kubernetes/ECS/GKE).
  • Experience with regulated data (HIPAA‑adjacent), multi‑tenant SaaS, and privacy‑preserving analytics.
  • Experimentation/platform work for ranking systems (A/B testing, counterfactual logging).

Our Current Stack (illustrative)

  • AWS: S3, Kinesis/MSK, Glue/EMR, Lambda, Redshift; GCP: GCS, Pub/Sub, Dataflow/Dataproc, BigQuery
  • Processing: Spark, Flink, Beam; Transform: dbt
  • Orchestration: Airflow or Dagster; Contracts/Obs: Great Expectations/Deequ, OpenLineage
  • Serving: REST/gRPC services, model inference endpoints; Storage: Postgres, Redis, vector stores (e.g., Qdrant)

Work Style & Hours

  • Remote‑first U.S. team; preference for Pacific Time overlap (West Coast strongly preferred).
  • Collaboration via docs, async updates, and crisp incident/ops playbooks.

If you are excited by the intersection of AI research and real‑world product building, we’d love to hear from you.

How to Apply

Click "Apply Now" on this page, or email us at careers

. We welcome candidates from all backgrounds and identities who share our passion for innovation and continuous learning.

Praxis Pro is proud to be an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.


Skills & Requirements

Technical Skills

Python, Scala, Java, Go, TypeScript, Spark, Flink, Beam, Airflow, Dagster, dbt, Feast, Qdrant, Pinecone, FAISS, Kafka, communication, collaboration, data engineering, cloud computing, machine learning, data pipelines, data quality, data governance, data security, data compliance

Salary

$190,000 - $250,000 per year

Employment Type

Full-Time

Level

mid

Posted

4/8/2026

Apply Now

You will be redirected to PraxisPro Inc.'s application portal.