Talent Leads HR Solutions Pvt Ltd

Junior Data Engineer - Chennai

  • Job Type: Full Time
  • Industry Type: IT Sector
  • Industry Location: Chennai
  • Experience: 3-6 yrs
  • No. of Positions: 1
  • Salary Range: 8-10 lac
  • Primary Skills: Java, C, Hadoop, Big Data, Machine Learning, SQL, Hive, Algorithms
  • Secondary Skills: NoSQL, Spark, AWS, Python, PySpark
  • Job Location: Chennai
  • Posted Date: 384 days ago
Job Description

Roles and Responsibilities

Skills (Mandatory):

  • SQL, PL/SQL and NoSQL/Hadoop-oriented databases
  • Python, PySpark, Big Data
  • DevOps/MLOps
  • Apache Kafka, Apache Spark Streaming, Apache Samza

Skills (Nice to have):

  • AWS
  • Cognos/QlikView, Business Objects

Key Responsibilities:

The bulk of the data engineer’s work will be building, managing and optimizing data pipelines, and then moving those pipelines into production for key data and analytics consumers.

Data engineers also need to guarantee compliance with data governance and data security requirements while creating, improving and operationalizing these integrated and reusable data pipelines.

Primary Responsibilities:

Sourcing, loading, transforming and storing data: Managed data pipelines consist of a series of stages through which data flows (for example, from data sources or acquisition endpoints, through integration, to consumption for specific use cases). These pipelines have to be created, maintained and optimized as workloads move from development to production. Architecting, creating and maintaining data pipelines will be the primary responsibility of the data engineer.
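For illustration only (not part of the role description), here is a minimal PySpark sketch of such a source-transform-store pipeline, using PySpark since it is one of the mandatory skills above. The storage paths, the "orders" dataset and its columns are hypothetical; a real pipeline would add scheduling, monitoring and credentials.

```python
# Minimal source -> transform -> store pipeline sketch in PySpark.
# All paths, dataset and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Source: acquire raw data from an ingestion endpoint (CSV files here).
raw = spark.read.option("header", True).csv("s3://raw-zone/orders/")

# Transform: enforce types, drop unusable rows, derive analysis columns.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount").isNotNull())
       .withColumn("order_date", F.to_date("order_ts"))
)

# Store: publish an integrated, partitioned dataset for consumers.
(clean.write.mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://curated-zone/orders/"))
```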

Drive automation through effective metadata management:

The data engineer will be responsible for using innovative and modern tools, techniques and architectures to partially or completely automate the most common, repeatable and tedious data preparation and integration tasks, in order to minimize manual, error-prone processes and improve productivity. The data engineer will also need to assist with renovating the data management infrastructure to drive automation in data integration and management. This includes (see the sketch after the list):

  • Learning and using modern data preparation, integration and AI-enabled metadata management tools and techniques.
  • Tracking data consumption patterns.
  • Performing intelligent sampling and caching.
  • Recommending and automating existing and future integration flows.
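Purely as an illustration of the metadata-driven idea above, the toy Python sketch below executes generic integration flows from a small metadata registry, so adding a new flow means adding metadata rather than writing new pipeline code. The registry format, paths and flow names are all invented for illustration and do not refer to any specific product.

```python
# Toy metadata-driven automation sketch: a registry of flow specs is
# executed by one generic runner, replacing hand-written per-flow jobs.
# All names here are illustrative, not a specific tool's API.
from dataclasses import dataclass

@dataclass
class FlowSpec:
    name: str
    source_path: str   # where raw data lands
    target_path: str   # where the curated copy goes
    file_format: str   # output format, e.g. "parquet"

REGISTRY = [
    FlowSpec("orders", "s3://raw/orders/", "s3://curated/orders/", "parquet"),
    FlowSpec("customers", "s3://raw/customers/", "s3://curated/customers/", "parquet"),
]

def run_flow(spark, spec: FlowSpec) -> None:
    # One reusable step: read the raw source, write the curated target.
    df = spark.read.option("header", True).csv(spec.source_path)
    df.write.mode("overwrite").format(spec.file_format).save(spec.target_path)

def run_all(spark) -> None:
    for spec in REGISTRY:
        run_flow(spark, spec)
```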

Data compliance and governance:

It will be the responsibility of the data engineer to ensure that data users and consumers use the data provisioned to them responsibly through data governance and compliance initiatives. Data engineers should work with data governance teams (and the information stewards within these teams) to move data pipelines into production with the appropriate data quality, governance and security standards in place.


Education:

A bachelor's degree in computer science, statistics, applied mathematics, data management, information systems, information science or a related quantitative field [or equivalent work experience] is required.

Experience (Essential):

5-8 years of work experience in data management disciplines including [data integration, modeling, optimization and data quality], and/or other areas directly relevant to data engineering responsibilities and tasks.

At least three years of experience working in cross-functional teams and collaborating with business stakeholders in support of a departmental and/or multi-departmental data management and analytics initiative.

Strong experience with advanced analytics tools for object-oriented/object function scripting using languages such as [R, Python, Java, C++, Scala, others].

Strong ability to design, build and manage data pipelines for data structures encompassing data transformation, data models, schemas, metadata and workload management. The ability to work with both IT and business in integrating analytics and data science output into business processes and workflows.

Strong experience with popular database programming languages including [SQL, PL/SQL, others] for relational databases and certifications on upcoming [NoSQL/Hadoop oriented databases like MongoDB, Cassandra, others] for non-relational databases.

Strong experience in working with large, heterogeneous datasets in building and optimizing data pipelines, pipeline architectures and integrated datasets using traditional data integration technologies. These should include [ETL/ELT, data replication/CDC, message-oriented data movement, API design and access] and upcoming data ingestion and integration technologies such as [stream data integration, CEP and data virtualization].

Strong experience in working with SQL on Hadoop tools and technologies including [Hive, Impala, Presto, others] from an open source perspective and [Hortonworks Data Flow (HDF), Dremio, Informatica, Talend, others] from a commercial vendor perspective.
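As a concrete example of SQL on Hadoop in this stack, the sketch below queries a Hive-managed table through Spark SQL. It assumes a configured Hive metastore, and the sales.orders table and its columns are hypothetical.

```python
# Sketch of SQL-on-Hadoop access via Spark SQL with Hive support.
# The metastore configuration and the sales.orders table are assumed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive_query")
    .enableHiveSupport()  # resolve tables through the Hive metastore
    .getOrCreate()
)

daily_totals = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM sales.orders
    GROUP BY order_date
    ORDER BY order_date
""")
daily_totals.show()
```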

Basic experience in working with [data governance/data quality] and [data security] teams and specifically [information stewards] and [privacy and security officers] in moving data pipelines into production with appropriate data quality, governance and security standards and certification.

Preferred:

A master's degree or PhD in computer science, statistics, applied mathematics, data management, information systems, information science or a related quantitative field [or equivalent work experience] is preferred.

Strong experience in working with both open-source and commercial message queuing technologies [such as Kafka, JMS, Azure Service Bus, Amazon Simple Queue Service, others], stream data integration technologies such as [Apache NiFi, Apache Beam, Apache Kafka Streams, Amazon Kinesis, others] and stream analytics technologies such as [Apache Kafka KSQL, Apache Spark Streaming, Apache Samza, others].
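To give a feel for the streaming side of this stack, below is a minimal PySpark Structured Streaming sketch that consumes a Kafka topic and maintains a windowed event count. The broker address and topic name are hypothetical, and the job additionally needs the spark-sql-kafka connector package available to Spark.

```python
# Minimal Kafka -> Spark Structured Streaming sketch with a windowed
# aggregate. Broker address and topic name are hypothetical; the
# spark-sql-kafka connector package must be available to Spark.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stream_counts").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Kafka delivers raw bytes; keep the event timestamp for windowing.
counts = (
    events.select(F.col("timestamp"))
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .start()
)
query.awaitTermination()
```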

Demonstrated ability to work across multiple deployment environments including [cloud, on-premises and hybrid], multiple operating systems and through containerization techniques such as [Docker, Kubernetes, AWS Elastic Container Service and others].

Basic experience working with popular data discovery, analytics and BI software tools like [Tableau, Qlik, PowerBI and others] for semantic-layer-based data discovery.

Strong experience in working with data science teams in refining and optimizing data science and machine learning models and algorithms.

Ideally, the candidates are adept in agile methodologies and well-versed in applying DevOps/MLOps methods to the construction of ML and data science pipelines.

Knowledge of industry-standard BI tools, including Cognos, QlikView, Business Objects, and other tools that could be used for enterprise solutions.

Should exhibit superior presentation skills, including storytelling and other techniques, to guide, inspire and explain analytics capabilities and techniques to the organization.

