JobsAisle

Arabic LLM Evaluator - Bilingual QA & Research Analyst

Mercor

Ajman, UAE | AED 10,000-16,667/mo | Today
UAE | IT & Technology | Full Time

Skills Required

Excel, Arabic

Job Description

<div><p><strong>About the job</strong></p><p><strong>Mercor</strong> connects elite creative and technical talent with leading AI research labs. We are headquartered in San Francisco, and our investors include <strong>Benchmark</strong>, <strong>General Catalyst</strong>, <strong>Peter Thiel</strong>, <strong>Adam D'Angelo</strong>, <strong>Larry Summers</strong>, and <strong>Jack Dorsey</strong>.</p><p><strong>Position:</strong> Language Model Evaluator<br><strong>Type:</strong> <strong>Full-time or Part-time Contract Work</strong><br><strong>Compensation:</strong> <strong>$23/hour</strong><br><strong>Location:</strong> <strong>Geography restricted to Egypt, Saudi Arabia, UAE, and USA</strong></p><p><strong>Role Responsibilities</strong></p><ul><li>Evaluate <strong>LLM-generated responses</strong> on their ability to effectively answer user queries.</li><li>Conduct fact-checking using trusted public sources and external <strong>tools</strong>.</li><li>Generate high-quality human evaluation data by annotating response strengths, areas for improvement, and factual inaccuracies.</li><li>Assess the reasoning quality, clarity, tone, and completeness of responses.</li><li>Ensure model responses align with expected conversational behavior and system guidelines.</li><li>Apply consistent annotations by following clear taxonomies, benchmarks, and detailed evaluation guidelines.</li></ul><p><strong>Qualifications</strong></p><p><strong>Must-Have</strong></p><ul><li><strong>Bachelor's degree</strong></li><li><strong>Native speaker</strong> or <strong>ILR 5/primary fluency (C2 on the CEFR scale)</strong> in <strong>Arabic</strong></li><li><strong>Significant experience using large language models</strong> (LLMs)</li><li><strong>Excellent writing skills</strong></li><li><strong>Strong attention to detail</strong></li><li><strong>Adaptable</strong> and comfortable moving across topics, domains, and customer requirements</li><li>Background or experience in domains requiring <strong>structured analytical thinking</strong></li><li><strong>Excellent college-level mathematics skills</strong></li></ul><p><strong>Preferred</strong></p><ul><li>Prior experience with <strong>RLHF, model evaluation, or data annotation work</strong></li><li>Experience writing or editing <strong>high-quality written content</strong></li><li>Experience comparing multiple outputs and making <strong>fine-grained qualitative judgments</strong></li><li><strong>Familiarity with evaluation rubrics</strong>, benchmarks, or quality scoring systems</li></ul><p><strong>Application Process (Takes 20-30 mins to complete)</strong></p><ul><li>Upload resume</li><li>AI interview based on your resume</li><li>Submit form</li></ul><p><strong>Resources &amp; Support</strong></p><ul><li>For details about the interview process and platform information, please check:</li><li>For any help or support, reach out to:</li></ul><p><em>PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.</em></p></div>