Roles and Responsibilities
Key Objectives/Responsibilities:
Understand the source systems and the data they contain. Gather the data requirements for the stage and master layers from the business, then plan and implement the ETL or ELT strategy to bring and transform data from source to stage and master
Data cleaning, mining, and enrichment
Create data pipelines and a data lake that support multiple business and visualization use cases
Collaborate with other team members (business product owners, technical product owners, architects, and the data science team) to work in coherence
Proactively raise risks and their resolutions upfront
Proactively create backlog items, epics, stories, and spikes after analyzing the future roadmap and planned implementations
Working experience with Agile and Scrum methodologies
Should be flexible in solving algorithmic and technical challenges outside the comfort area
Mandatory Skillset & Tools: ETL tools (preferably Azure, Informatica, Pentaho), Spark 2.0, Spark Streaming, Hue, Spark Hive, Spark SQL, SQL, Hive, Hadoop (Apache, Hortonworks), Cassandra, HBase, Apache YARN, Presto, Spark job logic written in Java or Scala, Spark engine configuration and fine-tuning, Azure ETL, MySQL, MS SQL Server, MongoDB, NoSQL, Java 1.8 and above, Scala, Apache Kafka, microservices, RESTful web services, Git
Primary Skills: Java 1.8 and above, Scala, Spark jobs, Spark tuning, RESTful web services, microservices, Docker, Docker Hub, Kubernetes, Git, basic shell commands, Linux