GenAI Data Engineer - PySpark/Python

Vallum Associates

London, GB

Hybrid

Job Description

The Role:

GenAI Data Engineer

Location:

London or Edinburgh, UK

Position Type:

Contract, inside IR35

Remote work option available:

Hybrid – 2 Days Onsite

Job Description:

Essential skills/knowledge/experience:

Strong experience with PySpark, distributed data processing, and large-scale ETL/ELT pipelines.

Strong SQL expertise including star/snowflake schema design, indexing strategies, writing optimized queries, and implementing CDC, SCD Type 1/2/3 patterns for reliable data warehousing.

Advanced proficiency in Python for data engineering, automation, and ML/GenAI integration.

Hands‑on expertise with AWS services (S3, Glue, Lambda, EMR, Bedrock / custom model hosting).

Practical experience with GenAI/LLM model creation, fine-tuning, benchmarking, and evaluation.

Solid understanding of RAG architectures, embeddings, vector stores, and LLM evaluation methods.

Experience working with structured and unstructured datasets (documents, logs, text, images).

Familiarity with scalable data storage solutions (Delta Lake, Parquet, Redshift, DynamoDB).

Understanding of model optimization techniques (quantization, distillation, inference optimization).

Strong capability to debug, tune, and optimize distributed systems and AI pipelines.

Desirable skills/knowledge/experience:

PySpark, Python, SQL, AWS, GenAI

Skills & Requirements

Technical Skills

PySpark, Distributed data processing, ETL/ELT pipelines, SQL, Star/snowflake schema design, Indexing strategies, CDC, SCD Type 1/2/3, Python, AWS, S3, Glue, Lambda, EMR, Bedrock, GenAI/LLM model creation, Fine-tuning, Benchmarking, Evaluation, RAG architectures, Embeddings, Vector stores, LLM evaluation methods, Structured and unstructured datasets, Delta Lake, Parquet, Redshift, DynamoDB, Model optimization techniques, Quantization, Distillation, Inference optimization

Employment Type

Contract

Level

Mid

Posted

4/29/2026
