You're a key individual driving a Data Warehouse project, focusing on the design, development, and maintenance of a new data lakehouse platform.
about the job.
- Design and implement a robust, scalable, resilient, and high-performance data storage and management system specifically for genomic data. This includes defining and documenting system architecture, data models, data flow diagrams, and system interfaces.
- Evaluate and recommend optimal storage technologies (e.g., object storage, parallel file systems, databases) considering performance, cost, scalability, and reliability. Identify and address performance bottlenecks in data storage and retrieval, and develop scalability plans for rapid genomic data growth.
- Work closely with bioinformaticians, software engineers, and DevOps teams to ensure the data infrastructure meets their needs.
- Perform additional tasks as assigned by senior officers.
skills & experiences required.
- Bachelor's degree in Computer Science or a related field with at least 6 years of relevant work experience.
- Strong understanding of data warehousing principles, data modeling, ETL processes, and cloud-based data technologies.
- Deep expertise in various storage technologies like object storage, parallel file systems, and databases.
- Proficiency in Linux system administration (shell scripting, monitoring, performance tuning), experience with data security and access control mechanisms, and knowledge of containerization with Docker and Kubernetes.
- A keen focus on platform-level optimization for enhanced performance and scalability.
- Experience with genomic data analysis and bioinformatics tools is preferred but not required.