DeepLight AI is a specialist AI and data consultancy with extensive experience implementing intelligent enterprise systems across multiple industries, with particular depth in financial services and banking. Our team combines deep expertise in data science, statistical modeling, AI/ML technologies, workflow automation, and systems integration with a practical understanding of complex business operations.
The Data Engineer is responsible for designing, implementing, and optimising data pipelines and infrastructure to support our cutting-edge AI systems. The role holder collaborates closely with our multidisciplinary team to ensure the efficient collection, storage, processing, and analysis of large-scale data, enabling us to unlock valuable insights and drive innovation across various domains.
Responsibilities of the role:
- Design, build, and optimise scalable data solutions, primarily utilising the Lakehouse architecture to unify data warehousing and data lake capabilities. Advise stakeholders on the strategic choice between Data Warehouse, Data Lake, and Lakehouse architectures based on specific business needs, cost, and latency requirements.
- Design, develop, and maintain scalable and reliable data pipelines to ingest, transform, and load diverse datasets from a variety of sources, including structured, unstructured, streaming, and real-time data.
- Implement standards and tooling to ensure ACID transaction guarantees, controlled schema evolution, and high data quality within the Lakehouse environment (see the first sketch after this list), and establish robust data governance frameworks covering security, privacy, integrity, compliance, and auditing.
- Continuously optimise data storage, compute resources, and query performance across the data platform to reduce cost and latency for both BI and ML workloads, leveraging techniques such as indexing, partitioning, and parallel processing (see the partitioning sketch after this list).
- Develop and maintain CI/CD pipelines to automate the entire machine learning lifecycle, from data validation and model training to deployment and infrastructure provisioning.
- Deploy, manage, and scale machine learning models in production environments, utilising MLOps principles for reliable and repeatable operations.
- Establish and manage monitoring systems to track model performance metrics, detect data drift (shifts in the distribution of input data), and flag model decay (degradation in prediction accuracy); see the drift-monitoring sketch after this list.
- Ensure rigorous version control and tracking for all components: code, datasets, and trained model artifacts, using tools such as MLflow (see the final sketch after this list).
- Create comprehensive documentation, including technical specifications, data flow diagrams, and operational procedures, to facilitate understanding, collaboration, and knowledge sharing.
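The sketches below illustrate some of the techniques named above; all paths, table names, column names, and thresholds are placeholder assumptions, not project specifics. First, a minimal sketch of ACID writes and schema evolution on a Lakehouse table, assuming PySpark with the open-source Delta Lake package (`delta-spark`) installed:

```python
# Minimal sketch: ACID writes plus schema evolution on a Delta table.
# Assumes `pip install delta-spark pyspark`; the path and columns are illustrative.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Initial write: every Delta commit is atomic and isolated via the transaction log.
spark.createDataFrame(
    [(1, "alice"), (2, "bob")], ["customer_id", "name"]
).write.format("delta").save("/tmp/lakehouse/customers")

# A later batch arrives with an extra column; mergeSchema evolves the table
# schema within the same atomic commit, with no manual ALTER TABLE.
spark.createDataFrame(
    [(3, "carol", "UK")], ["customer_id", "name", "country"]
).write.format("delta").mode("append") \
    .option("mergeSchema", "true").save("/tmp/lakehouse/customers")
```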
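Next, a sketch of the partitioning technique mentioned in the optimisation bullet: laying a table out by an assumed `event_date` column so that date-filtered queries prune whole directories rather than scanning the full dataset. Parquet is used for brevity; the same layout applies to Delta tables:

```python
# Minimal sketch: date partitioning for partition pruning. Paths and columns
# are illustrative assumptions.
from datetime import date
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-sketch").getOrCreate()

events = spark.createDataFrame(
    [(1, date(2024, 1, 15), 9.99), (2, date(2024, 1, 16), 4.50)],
    ["event_id", "event_date", "amount"],
)

# Physical layout: one directory per event_date value, so filters on that
# column skip entire directories instead of reading every file.
(events.write
    .partitionBy("event_date")
    .mode("overwrite")
    .parquet("/tmp/lakehouse/events_by_date"))

# Only the 2024-01-15 directory is read to answer this query.
daily = (spark.read.parquet("/tmp/lakehouse/events_by_date")
         .where("event_date = DATE'2024-01-15'"))
daily.show()
```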
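For drift monitoring, one common approach (an illustrative choice, not a prescribed one) is a two-sample Kolmogorov-Smirnov test comparing a feature's training-time baseline against recent production values; the alert threshold is a tunable assumption:

```python
# Minimal sketch: detecting distribution shift in one feature with a KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=42)
baseline = rng.normal(loc=0.0, scale=1.0, size=10_000)  # training-time values
live = rng.normal(loc=0.4, scale=1.0, size=2_000)       # recent, shifted production values

stat, p_value = ks_2samp(baseline, live)
if p_value < 0.01:  # alert threshold is an assumption, tuned per feature
    print(f"Drift detected: KS statistic={stat:.3f}, p={p_value:.2e}")
else:
    print("No significant drift detected.")
```

In practice a check like this would run per feature on a schedule, alongside accuracy-based decay monitoring once ground-truth labels arrive.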
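Finally, a minimal sketch of run and artifact tracking with MLflow; the parameters, tags, and dataset identifier are placeholders showing how a trained model can be tied back to the exact code and data that produced it:

```python
# Minimal sketch: logging params, metrics, tags, and a model artifact to MLflow.
# The git commit and dataset version are placeholders, normally set by CI.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

with mlflow.start_run(run_name="baseline-logreg"):
    mlflow.log_param("C", 1.0)
    mlflow.set_tag("git_commit", "abc1234")        # placeholder: injected by CI
    mlflow.set_tag("dataset_version", "v2024-01")  # placeholder: e.g. a DVC tag

    model = LogisticRegression(C=1.0).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))

    # The serialized model is stored and versioned alongside its metadata.
    mlflow.sklearn.log_model(model, "model")
```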