Primary Skills: Hadoop, HDFS, Spark, PySpark, Apache Airflow, Big Data
Secondary Skills: Data Engineering, Apache Hadoop, Apache HBase, Agile, Data Visualization
Job Location:
Irving, Texas
Posted Date:
Posted today
Job Description
Candidates should possess strong knowledge of, and interest in, big data technologies, and have a background in data engineering.
• Build data pipeline frameworks to automate high-volume and real-time data delivery for our Spark and streaming data hub
• Transform complex analytical models into scalable, production-ready solutions
• Provide support and enhancements for an advanced anomaly detection machine learning platform
• Continuously integrate and ship code into our cloud production environments
• Develop cloud-based applications from the ground up using a modern technology stack
• Work directly with Product Owners and customers to deliver data products in a collaborative and agile environment
Skills:
• At least 4 years of experience with the following Big Data frameworks: file formats (Parquet, Avro, ORC), resource management, distributed processing, and RDBMS
• At least 4 years of experience developing applications with monitoring, build tools, version control, unit testing, TDD, and change management to support DevOps
• At least 2 years of experience with SQL and shell scripting
• Experience designing, building, and deploying production-level data pipelines using tools from the Hadoop stack (HDFS, Hive, Spark, HBase, Kafka, NiFi, Oozie, Apache Beam, Apache Airflow, etc.)
• Experience with Spark programming (PySpark, Scala, or Java)
• Experience troubleshooting JVM-related issues
• Experience with strategies for handling mutable data in Hadoop
• Familiarity with Spark Structured Streaming and/or Kafka Streams
• Familiarity with machine learning implementation using PySpark
• Experience with data visualization tools such as Cognos, Arcadia, and Tableau
• Experience with Ab Initio technologies including, but not limited to, Ab Initio graph development, EME, Co-Op, BRE, and Continuous Flow