Senior Analyst - Data Engineer
Puma Energy Holdings Limited
India · ₹50,000–₹130,000/mo (≈ AED 2.2K–5.7K/mo) · Posted today
India · Spark · Data Governance · Data Security · BI Tools · Delta Lake · Structured Streaming · Azure Databricks · Full Time
Skills Required
Python · SQL · Azure
Job Description
As a Data Engineer at our company, your role will involve collaborating with data scientists and business stakeholders to design, develop, and maintain efficient data pipelines feeding into the organization's data lake. You will be responsible for maintaining the integrity and quality of the data lake to enable accurate insights for data scientists and informed decision-making for business stakeholders. Your extensive knowledge of data engineering and cloud technologies will be crucial in enhancing the organization's data infrastructure and promoting a culture of data-driven decision-making. You will apply your data engineering expertise to define and optimize data pipelines using advanced concepts to improve the efficiency and accessibility of data storage.
Key Responsibilities:
- Contribute to the development of scalable and performant data pipelines on Databricks, leveraging Delta Lake, Delta Live Tables (DLT), and other core Databricks components.
- Develop data lakes/warehouses designed for optimized storage, querying, and real-time updates using Delta Lake.
- Implement effective data ingestion strategies from various sources (streaming, batch, API-based), ensuring seamless integration with Databricks.
- Ensure the integrity, security, quality, and governance of data across Databricks-centric platforms.
- Collaborate with stakeholders to translate business requirements into Databricks-native data solutions.
- Build and maintain ETL/ELT processes heavily utilizing Databricks, Spark (Scala or Python), SQL, and Delta Lake for transformations.
- Monitor and optimize the cost-efficiency of data operations on Databricks, ensuring optimal resource utilization.
- Utilize a range of Databricks tools, including the Databricks CLI and REST API, alongside Apache Spark, to develop, manage, and optimize data engineering solutions.
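To illustrate the kind of REST API automation the responsibilities above describe, the sketch below builds a one-time run request for the Databricks Jobs API 2.1 `runs/submit` endpoint. The workspace URL, token, notebook path, and cluster spec are all hypothetical placeholders, and the `dry_run` flag is an illustrative convention for inspecting the request body without a live workspace:

```python
import json
from urllib import request

def submit_notebook_run(host, token, notebook_path, cluster_spec, dry_run=False):
    """Build (and optionally send) a one-time notebook run via the
    Databricks Jobs API 2.1 runs/submit endpoint."""
    payload = {
        "run_name": "adhoc-pipeline-run",
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": notebook_path},
            "new_cluster": cluster_spec,  # ephemeral job cluster for this run
        }],
    }
    if dry_run:
        # Return the request body for inspection instead of calling the API.
        return payload
    req = request.Request(
        f"{host}/api/2.1/jobs/runs/submit",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Dry run with a hypothetical Azure node type and notebook path:
body = submit_notebook_run(
    "https://adb-123.azuredatabricks.net", "TOKEN",
    "/pipelines/ingest_sales",
    {"spark_version": "13.3.x-scala2.12",
     "node_type_id": "Standard_DS3_v2",
     "num_workers": 2},
    dry_run=True,
)
print(body["tasks"][0]["task_key"])  # → main
```

The same payload shape works from the Databricks CLI (`databricks jobs submit --json ...`), which is often preferable for ad-hoc runs; the programmatic form shown here suits orchestration code that needs to inspect or retry submissions.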
Qualifications Required:
- 5+ years of overall experience, with at least 3 years of relevant experience
- 3+ years of experience working with Databricks on Azure or another cloud platform
- Proficiency in Spark, Delta Lake, Structured Streaming, and other Azure Databricks functionalities for sophisticated data pipeline construction.
- Strong capability in diagnosing and optimizing Spark applications and Databricks workloads, including strategic cluster sizing and configuration.
- Expertise in delivering data solutions that leverage Azure Databricks ecosystem technologies for enhanced data management and processing efficiency
- Profound knowledge of data governance and data security, coupled with an understanding of large-scale distributed systems and cloud architecture design
- Experience with a variety of data sources and BI tools
Please note that you will also collaborate with internal departments including the Data Engineering Manager, developers across various departments, and managers of departments in other regional hubs of Puma Energy.