Information Technology - Lead Data Engineer

Job Description

The Lead Data Engineer is a senior software developer with strong software engineering skills who is responsible for building custom open-source-based data ingestion and MLOps platforms. He/she has a deep appreciation of the complexity of the data engineering process, such as the challenges of ingesting large or near-real-time datasets, the maintenance of high data quality, and the importance of automation for increasing pipeline robustness and reducing the need for human intervention.

Key Responsibilities

Be an effective distributed-system implementer in the following core activities:

o Design and develop data engineering services and their ecosystem using distributed databases (relational, columnar, graph, in-memory); orchestration (Apache Airflow); and distributed stream/batch data processing (Kafka, Kinesis, Spark).

o Design and develop MLOps production pipelines; provide technical support to data scientists/ML engineers by deploying their ML/DL models at scale, within agreed SLAs, on both cloud and on-premises GPU and CPU instances.

o Design data models for mission-critical, high-volume, near-real-time/batch data; build idempotent/atomic production data pipelines to make data ingestion more fault tolerant.

o Design and develop intuitive, highly automated, self-service data platform functions for business users.

o Design, build, and operate scalable and reliable data pipelines on the Databricks platform.

Explore, evaluate and champion the introduction of next-generation technologies in the data-ingestion workflow. Participate in project planning and provide technical guidance on cloud architecture for data projects.

Requirements

A BS in Computer Science or a related discipline is required. An advanced degree in Computer Science (MS, PhD) is highly desirable.
5+ years of relevant industry experience in some or most of the following technical areas:
o Advanced programming skills in Python. Conversant with data structures and algorithm design.
o Experience in building data pipelines (including data collection, warehousing, processing, analysis, monitoring, and governance) using open-source data ingestion platforms.
o Intermediate-level knowledge of and experience with AWS cloud components and best practices. Good understanding of deploying data stores and services such as S3, Redshift, ElastiCache, PostgreSQL, and EMR.
o Hands-on experience with the Databricks workspace, cluster management, AI Agent capabilities, and job orchestration.
o Prior experience in modern software development is required (such as web frontend UI, backend API microservices, understanding of CI/CD and Scrum/Kanban agile development). Strong grasp of object-oriented or functional programming (using e.g. Python, Java, Scala, or C#).

We thank all candidates for your interest in Singapore Airlines, and regret that only shortlisted candidates will be notified.
