about company
I am currently working with a company focusing on AI, LLMs and computer vision.
Salary structure: base + bonus. Hybrid working arrangement: 2 days in the office, 3 days WFH. Office located in the CBD. 4 rounds of interviews to offer stage.
about job
- Design and implement robust frameworks to evaluate the performance of generative AI systems, covering text and multi-modal models as well as Large Language Models (LLMs) such as GPT-based models, BERT, T5, and other state-of-the-art architectures
- Perform technical AI evaluations, benchmarking and “red-team” tests on LLMs, including assessing them for robustness in performance, embedded biases, and vulnerability to jailbreaks and prompt injection attacks
- Work with stakeholders to design strong LLMs, custom evaluation approaches, and a suite of technical and analytical AI evaluation frameworks and tools
- Define and refine metrics for evaluating model performance, such as perplexity, BLEU, ROUGE, accuracy, coherence, factual consistency, and bias detection
- Lead efforts in curating and managing large, high-quality datasets for evaluating LLMs
skills and requirements
- Minimum 5 years of total working experience, including at least 2 years building and deploying LLMs
- Strong experience in evaluating LLMs using metrics such as perplexity, BLEU, ROUGE, and human-centered evaluation techniques
- Proven track record of managing and analyzing large, complex language datasets, including text preprocessing and tokenization
- Solid programming skills in Python and experience building automated pipelines for continuous model evaluation
To apply online, please use the 'apply' function. Alternatively, you may contact Stella at 96554170 (EA: 94C3609 / R1875382).