Koantek

Data Scientist

📍Hyderabad, Telangana, India

About the Role:

We are looking for a Data Scientist with 3–9 years of experience in developing Natural Language Processing (NLP) and Generative AI (GenAI) solutions. The ideal candidate is hands-on, with a proven track record of designing and developing agentic AI solutions in customer-facing roles. You will research and build state-of-the-art AI solutions that perform multi-step reasoning to solve complex business challenges. Experience with Databricks (especially MLOps Stacks) is highly desirable.

Key Responsibilities:

• Translate business challenges into solvable NLP and GenAI use cases, such as document understanding, web search, automated Q&A, summarisation, and workflow automation.
• Stay updated with the latest GenAI/LLM advancements and evaluate them for feasibility and potential use.
• Design, build, and deploy LLM-powered retrieval-augmented generation (RAG) pipelines and agentic AI solutions, including multi-step reasoning systems, tool-using agents, and associated pipelines.
• Build basic UI frontends (e.g., using Streamlit or Flask) for internal demos or client-facing pilot GenAI applications.
• Apply MLOps best practices including MLflow-based tracking, Docker containerization, and CI/CD for GenAI pipelines.
• Develop customer demos and prototypes using the Databricks Mosaic AI suite.
• Contribute to both internal R&D efforts and customer implementations, including rapid POCs and scalable production deployments.

Required Qualifications:

• 3–9 years of implementation experience in machine learning, with a strong focus on NLP and GenAI applications in a customer-facing role.
• Must have productionized machine learning or deep learning models.
• Familiarity with SQL and experience working with large, complex datasets.
• Proficiency in Python and NLP/LLM libraries/tools such as HuggingFace Transformers, LangChain, LangGraph, LlamaIndex, etc.
• Practical experience with prompt engineering, chunking, vector embeddings, semantic search, RAG pipelines, and LLM fine-tuning.
• Understanding of GenAI-specific challenges such as hallucination, prompt security, rate limits, and cost optimisation.

Strong foundation in statistics, including:

• Model assumptions and diagnostics.
• Evaluation metrics and error analysis.
• Probabilistic modelling, hypothesis testing, and uncertainty quantification.
• Feature importance and interpretability techniques.

Experience in MLOps tools and processes, including:

• Model versioning and experiment tracking (e.g., MLflow)
• Containerization (Docker)
• CI/CD for ML workflows (e.g., GitHub Actions, Azure DevOps, or similar)
• Model monitoring and retraining workflows
• Desirable: Hands-on experience with Databricks for model development and deployment.
• Desirable: Familiarity with cloud environments (Azure, AWS, or GCP) and their native AI/ML-related tools and services.
• Strong analytical and communication skills, with a demonstrated ability to convert business requirements into NLP/GenAI solutions.

Educational Background:

• Bachelor’s or Master’s degree in Computer Science, Data Science, Mathematics, Statistics, Operational Research, or a related quantitative discipline.
• Relevant certifications (e.g., Databricks certifications, AWS/Azure/GCP AI/ML certifications) are a plus.

Workplace Flexibility:

• This is a hybrid role with remote flexibility.
• On-site presence at customer locations will be required based on project and business needs. Candidates should be willing and able to travel for short- or medium-term assignments when necessary.