Roles and Responsibilities
- 3+ years of experience with big data processing frameworks such as Hadoop, Spark, and Kafka.
- 3+ years of strong hands-on experience configuring and maintaining AWS or GCP cloud services: EC2, EMR, RDS, and Redshift (AWS), or Dataproc and BigQuery (GCP).
- 3+ years of experience handling, analyzing, and transforming large, disconnected datasets using Python and PySpark (a minimal sketch follows this list).
- Experience with data pipeline and workflow management tools such as Airflow (see the DAG sketch after this list).
- Experience with stream-processing systems such as Storm and Spark Streaming.
- Experience building and optimizing big data pipelines, architectures, and datasets.
- 3+ years of experience working on data warehousing projects, with a good understanding of dimensional modeling. Prior experience as an ETL developer is preferred.
- 5+ years of advanced SQL experience with relational databases such as Oracle and MS SQL Server.
- Strong analytical skills for working with unstructured datasets.
- Build processes supporting data transformation, data structures, metadata, dependency, and workload management.
- Working knowledge of message queuing, stream processing, and highly scalable big data stores.
- Excellent knowledge of Linux, AIX, or other Unix flavors.
- Deep understanding of Hadoop and Spark cluster security, network connectivity, and I/O throughput, along with other factors that affect distributed system performance.
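
To ground the PySpark expectation above, here is a minimal sketch of the kind of dataset transformation the role involves; the S3 paths and column names are hypothetical, and this is an illustration rather than a prescribed pipeline.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example-etl").getOrCreate()

    # Hypothetical inputs: raw order events and a customer dimension.
    orders = spark.read.parquet("s3://example-bucket/raw/orders/")
    customers = spark.read.parquet("s3://example-bucket/raw/customers/")

    # Deduplicate, join the disconnected datasets, and aggregate for reporting.
    daily_revenue = (
        orders
        .dropDuplicates(["order_id"])
        .withColumn("order_date", F.to_date("order_ts"))
        .join(customers, on="customer_id", how="left")
        .groupBy("order_date", "region")
        .agg(F.sum("amount").alias("revenue"))
    )

    daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://example-bucket/curated/daily_revenue/"
    )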
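
Likewise for Airflow, a minimal daily DAG sketch; the DAG id, schedule, and task callables are illustrative assumptions, and the parameter names follow Airflow 2.x.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Placeholder: land raw data from a source system.
        pass

    def transform():
        # Placeholder: trigger a Spark job such as the sketch above.
        pass

    with DAG(
        dag_id="daily_revenue_pipeline",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # "schedule_interval" on Airflow releases before 2.4
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        extract_task >> transform_task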
Desired Candidate Profile
Data engineering on either GCP or AWS, proficient in Python, PySpark, and big data tools, with strong SQL programming and BigQuery skills.
Mandatory Skills
Spark, Python, SQL, GCP/AWS, Big Data, BigQuery, Data Engineering, Data Warehousing
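
As a small illustration of the BigQuery and SQL skills listed above, a sketch using the google-cloud-bigquery Python client against a hypothetical star schema (the project, dataset, and table names are made up):

    from google.cloud import bigquery

    client = bigquery.Client()  # assumes application-default credentials

    # Hypothetical dimensional model: fact table joined to a date dimension.
    sql = """
        SELECT d.calendar_date, SUM(f.amount) AS revenue
        FROM `example-project.sales.fact_orders` AS f
        JOIN `example-project.sales.dim_date` AS d
          ON f.date_key = d.date_key
        GROUP BY d.calendar_date
        ORDER BY d.calendar_date
    """

    for row in client.query(sql).result():
        print(row.calendar_date, row.revenue)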