Dice is the leading career destination for tech experts at every stage of their careers. Our client, Everest Technologies, is seeking the following. Apply via Dice today!
We are seeking a QA Engineer with a strong background in API testing and LLM fine-tuning/evaluation. You will be responsible for the quality assurance of our Agent Mesh infrastructure, ensuring that enterprise business logic is correctly translated into machine-readable actions. Your goal is to ensure that AI agents interact with this infrastructure reliably, securely, and without hallucinating tool calls.
Key Responsibilities
- AI Tool Validation: Test tool-calling accuracy by verifying that LLMs correctly interpret OpenAPI specifications and trigger the right C#/.NET backend logic.
- Fine-Tuning Data Preparation: Curate and clean high-quality datasets (JSON/JSONL) in Python to fine-tune models for specific domain tasks and tool-calling accuracy.
- Prompt Regression Testing: Develop automated test suites to ensure that updates to underlying APIs or MCP servers do not break the reasoning or planning capabilities of the AI agents.
- Security & Auth QA: Validate that Gravitee correctly enforces OAuth 2.1 and OpenFGA authorization, preventing data leakage through agent conversations.
- Performance Testing: Measure latency in the agent-to-API loop and identify bottlenecks in MCP server responses.
Technical Qualifications
- API Testing Mastery: Expert knowledge of REST, OpenAPI, and tools like Postman or Insomnia.
- Scripting: Proficiency in Python (for data processing and eval frameworks) and familiarity with C# (to understand backend MCP implementation).
- LLM Evaluation: Experience with frameworks like DeepEval, Ragas, or LangSmith to measure model performance (faithfulness, relevancy, and tool-call precision).
- API Management: Hands-on experience with Gravitee or similar gateways to monitor and intercept traffic.
- Model Context Protocol: Understanding of MCP and how it standardizes the way LLMs access external data.
Preferred Skills
- Experience with Red Teaming AI agents to identify prompt injection vulnerabilities.
- Knowledge of Vector Databases and how RAG (Retrieval-Augmented Generation) interacts with live API tools.
- Familiarity with GitHub Actions for CI/CD integration of AI evaluation pipelines.