Information Technology - Lead Data Engineer
Job Description
The Lead Data Engineer is a senior software developer with strong software engineering skills who is responsible for building custom open-source-based data ingestion and MLOps platforms. He/she has a deep appreciation of the complexity of the data engineering process, including the challenges of ingesting large or near-real-time datasets, the maintenance of high data quality, and the importance of automation for increasing pipeline robustness and reducing the need for human intervention.
Key Responsibilities
- Be an effective distributed-system implementer in the following core activities:
o Design and develop data engineering services and their ecosystem using distributed databases (relational, columnar, graph, in-memory); orchestration (Apache Airflow); and distributed stream/batch data processing (Kafka, Kinesis, Spark).
o Design and develop MLOps production pipelines; provide technical support to data scientists/ML engineers by deploying their ML/DL models at scale, in line with SLAs, on both cloud and on-premises GPU and CPU instances.
o Design data models for mission-critical, high-volume, near-real-time/batch data; build idempotent/atomic production data pipelines to make data ingestion more fault-tolerant.
o Design and develop intuitive, highly automated, self-service data platform functions for business users.
o Design, build, and operate scalable and reliable data pipelines on the Databricks platform.
- Explore, evaluate and champion the introduction of next-generation technologies in the data-ingestion workflow. Participate in project planning and provide technical guidance on cloud architecture for data projects.
Requirements
- A BS in Computer Science or a related discipline is required. Advanced degrees in Computer Science (PhD, MS) are highly desirable.
- 5+ years of relevant industry experience in some or most of the following technical areas:
o Advanced programming skills in Python. Conversant with data structures and algorithm design.
o Experience in building data pipelines (including data collection, warehousing, processing, analysis, monitoring, and governance) using open-source data ingestion platforms.
o Intermediate-level knowledge of and experience with AWS cloud components and best practices. Good understanding of deploying data stores and services such as S3, Redshift, ElastiCache, PostgreSQL, and EMR.
o Hands-on experience with the Databricks workspace, cluster management, AI Agent capabilities, and job orchestration.
o Prior experience in modern software development is required (such as web frontend UI, backend API microservices, and an understanding of CI/CD and Scrum/Kanban agile development). Strong grasp of object-oriented or functional programming (e.g. using Python, Java, Scala, or C#).
We thank all candidates for their interest in Singapore Airlines and regret that only shortlisted candidates will be notified.