Information Technology - Lead Data Engineer

Job Description

The Lead Data Engineer is a senior software developer with strong software engineering skills who is responsible for building custom open-source-based data ingestion and MLOps platforms. He/she has a deep appreciation of the complexity of the data engineering process, including the challenges of ingesting large or near-real-time datasets, the maintenance of high data quality, and the importance of automation for increasing pipeline robustness and reducing the need for human intervention.

Key Responsibilities

  • Be an effective distributed-system implementer in the following core activities:

    o Design and develop data engineering services and their ecosystem using distributed databases (relational, columnar, graph, in-memory); orchestration (Apache Airflow); and distributed stream/batch data processing (Kafka, Kinesis, Spark).

    o Design and develop MLOps production pipelines; provide technical support to data scientists/ML engineers by getting their ML/DL models deployed at scale and meeting SLAs on both cloud and on-premises GPU and CPU instances.

    o Design data models for mission-critical, high-volume, near-real-time/batch data; build idempotent/atomic production data pipelines to make data ingestion more fault tolerant.

    o Design and develop intuitive, highly automated, self-service data platform functions for business users.

    o Design, build, and operate scalable and reliable data pipelines on the Databricks platform.

  • Explore, evaluate and champion the introduction of next-generation technologies in the data-ingestion workflow. Participate in project planning and provide technical guidance on cloud architecture for data projects.

 

 

Requirements

  • BS in Computer Science or a related discipline is required. Advanced degrees in Computer Science (PhD, MS) are highly desirable.
  • 5+ years of relevant industry experience in some or most of the following technical areas:
    o Advanced programming skills in Python. Conversant with data structures and algorithm design.
    o Experience in building data pipelines (including data collection, warehousing, processing, analysis, monitoring, and governance) using open-source data ingestion platforms.
    o Intermediate-level knowledge of and experience with AWS cloud components and best practices. Good understanding of deploying data stores and services such as S3, Redshift, ElastiCache, PostgreSQL, and EMR.
    o Hands-on experience with the Databricks workspace, cluster management, AI Agent capabilities, and job orchestration.
    o Prior experience in modern software development is required (such as web frontend UI, backend API microservices, and an understanding of CI/CD and Scrum/Kanban agile development). Strong grasp of object-oriented or functional programming (using e.g. Python, Java, Scala, or C#).

 


We thank all candidates for their interest in Singapore Airlines, and regret that only shortlisted candidates will be notified.
