Introduction
Job Role: Data Analyst (Databricks, Apache Spark, Delta Lake, and GenAI or AI/ML integrations).
Location: Manila.
Duration: 6+ month contract.
Job Description:
Scope of Work/Responsibilities
1. Data Pipeline Development:
- Design, implement, and optimize end-to-end data pipelines using Databricks and related technologies.
- Build workflows to handle large-scale data ingestion, transformation, and storage.
2. Data Preparation for LLMs:
- Preprocess, clean, and structure diverse datasets (text, structured, and unstructured) for LLM training and fine-tuning.
- Implement feature engineering, tokenization, and vectorization techniques to support NLP models.
3. Performance Optimization:
- Use Databricks features, including Delta Lake and MLflow, to streamline data workflows.
- Optimize data infrastructure for high availability, scalability, and cost-efficiency.
4. Collaboration with Teams:
- Work closely with data scientists, ML engineers, and other stakeholders to understand the data requirements of LLM technology.
- Ensure alignment between engineering pipelines and machine learning goals.
5. Data Quality & Governance:
- Implement processes to ensure data quality, consistency, and compliance with governance policies.
- Monitor and maintain data integrity throughout the pipeline lifecycle.
6. Emerging Technology Adoption:
- Stay updated on advancements in Databricks, generative AI, and LLM technologies.
- Contribute to the adoption of innovative tools and practices to improve workflows.
Requirements and Qualifications (Education & Work Experience)
Experience:
- 7+ years of experience in data engineering roles, including at least 2 years in a leadership role and on projects involving Databricks.
- Proven expertise in data pipelines, feature engineering, and dataset preparation for machine learning, specifically LLMs.
- Experience building enterprise-grade applications with GenAI or AI/ML integrations.
Technical Skills:
- Expertise in Databricks, Apache Spark, and Delta Lake.
- Strong programming skills in Python and SQL; knowledge of libraries like pandas, NumPy, or PyTorch is a plus.
- Understanding of state management libraries like Redux, Recoil, or Zustand, as well as Cypress and version control (Git).
- Understanding of web security principles and compliance requirements for enterprise applications.
Soft Skills:
- Exceptional problem-solving and decision-making abilities.
- Excellent communication and leadership skills, with the ability to guide technical discussions and mentor team members.
- Strong focus on delivering quality.
Job Requirements
Please refer to the job description.