Data Engineer / Data Analyst (Databricks, Apache Spark, Delta Lake, GenAI or AI/ML integrations)

Datamatics

Salary: Negotiable
Remote | 3-5 years of experience | Diploma | Contract

Remote Work Details

Open to candidates in: Philippines

Language requirement: English

This remote job is open only to candidates in specific countries, so location restrictions may apply.

Job Description

Overview

Job Role: Data Analyst (Databricks, Apache Spark, Delta Lake, GenAI or AI/ML integrations)

Location: Manila.

Duration: 6+ Months Contract.


Job Description:

Scope of Work/Responsibilities

1. Data Pipeline Development:

  • Design, implement, and optimize end-to-end data pipelines using Databricks and related technologies.
  • Build workflows to handle large-scale data ingestion, transformation, and storage.


2. Data Preparation for LLMs:

  • Preprocess, clean, and structure diverse datasets (text, structured, and unstructured) for LLM training and fine-tuning.
  • Implement feature engineering, tokenization, and vectorization techniques to support NLP models.


3. Performance Optimization:

  • Use Databricks features, including Delta Lake and MLflow, to streamline data workflows.
  • Optimize data infrastructure for high availability, scalability, and cost-efficiency.


4. Collaboration with Teams:

  • Work closely with data scientists, ML engineers, and other stakeholders to understand the data requirements of LLM initiatives.
  • Ensure alignment between engineering pipelines and machine learning goals.


5. Data Quality & Governance:

  • Implement processes to ensure data quality, consistency, and compliance with governance policies.
  • Monitor and maintain data integrity throughout the pipeline lifecycle.


6. Emerging Technology Adoption:

  • Stay updated on advancements in Databricks, generative AI, and LLM technologies.
  • Contribute to the adoption of innovative tools and practices to improve workflows.



Requirements and Qualifications (Education & Work Experience)


Experience:

  • 7+ years of experience in data engineering roles, with at least 2 years in a leadership role and projects involving Databricks.
  • Proven expertise in data pipelines, feature engineering, and dataset preparation for machine learning, specifically LLMs.
  • Experience building enterprise-grade applications with GenAI or AI/ML integrations.


Technical Skills:


  • Expertise in Databricks, Apache Spark, and Delta Lake.
  • Strong programming skills in Python and SQL; knowledge of libraries like pandas, NumPy, or PyTorch is a plus.
  • Understanding of state management libraries like Redux, Recoil, or Zustand; testing tools (e.g., Cypress); and version control (Git).
  • Understanding of web security principles and compliance requirements for enterprise applications.


Soft Skills:

  • Exceptional problem-solving and decision-making abilities.
  • Excellent communication and leadership skills, with the ability to guide technical discussions and mentor team members.
  • Strong focus on delivering quality.

Job Requirements

Please refer to job description.

Skills: Data Modeling, ETL Processes, SQL, Python, Data Warehousing, Big Data Technologies, Cloud Computing, Data Pipeline Automation, NoSQL, Data Quality Assurance

Posted by: HR Manager, Datamatics

Posted on 23 April 2025

