Graduate degree in Computer Science, Information Systems, or an equivalent quantitative field, and 5+ years of experience.
Experience working with and extracting value from large, disconnected and/or unstructured datasets.
Experience selecting and integrating the Big Data tools and frameworks required to deliver requested capabilities.
Demonstrated ability to build processes that support data transformation, data structures, metadata, dependency and workload management.
Strong interpersonal skills and the ability to manage projects and work with cross-functional teams.
Advanced SQL knowledge and experience working with relational databases, including query authoring, as well as working familiarity with a variety of database systems.
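For illustration only, a minimal sketch of the level of query authoring implied, using Python's built-in sqlite3 module (window functions require SQLite 3.25 or newer); the orders table, its columns, and the sample rows are hypothetical:

    import sqlite3

    # Hypothetical 'orders' table, populated in memory purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT,"
        " amount REAL, placed_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, amount, placed_at) VALUES (?, ?, ?)",
        [("acme", 120.0, "2024-01-05"), ("acme", 80.0, "2024-02-01"),
         ("globex", 200.0, "2024-01-20")],
    )

    # A windowed aggregate: running spend per customer, ordered by order date.
    query = """
        SELECT customer, placed_at, amount,
               SUM(amount) OVER (
                   PARTITION BY customer ORDER BY placed_at
               ) AS running_total
        FROM orders
        ORDER BY customer, placed_at
    """
    for row in conn.execute(query):
        print(row)
    conn.close()

The same windowed-aggregate pattern carries over directly to production engines such as PostgreSQL, Hive, or Redshift.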
Experience building and optimizing "big data" data pipelines, architectures, and data sets.
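As a sketch of what pipeline optimization can look like in practice, the following PySpark fragment prunes columns early, pre-partitions on a reused key, and caches an intermediate result; the S3 paths, column names, and partition key are hypothetical placeholders:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

    events = spark.read.parquet("s3://example-bucket/events/")

    # Prune columns early and cache, since two aggregations read this frame;
    # pre-partitioning by user_id spares the per-user rollup an extra shuffle.
    daily = (
        events.select("user_id", "event_date", "revenue")
        .repartition("user_id")
        .cache()
    )

    per_user = daily.groupBy("user_id").agg(F.sum("revenue").alias("total_revenue"))
    per_day = daily.groupBy("event_date").agg(F.count("*").alias("event_count"))

    per_user.write.mode("overwrite").parquet("s3://example-bucket/per_user/")
    per_day.write.mode("overwrite").parquet("s3://example-bucket/per_day/")
    spark.stop()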
Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
Experience integrating data from multiple data sources.
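A minimal sketch of multi-source integration, joining a hypothetical CSV export against a hypothetical relational table with pandas; the file, database, table, and column names are placeholders:

    import sqlite3
    import pandas as pd

    crm = pd.read_csv("crm_export.csv")  # e.g. columns: customer_id, region

    conn = sqlite3.connect("warehouse.db")
    orders = pd.read_sql_query("SELECT customer_id, amount FROM orders", conn)
    conn.close()

    # Conform the shared key before joining; type drift across sources is
    # a common failure mode in multi-source integration.
    crm["customer_id"] = crm["customer_id"].astype("int64")
    merged = orders.merge(crm, on="customer_id", how="left")
    revenue_by_region = merged.groupby("region", dropna=False)["amount"].sum()
    print(revenue_by_region)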
Experience with the following tools and technologies:
Hadoop, Spark, and Kafka
Data ingestion tools such as Apache NiFi
Big Data query tools such as Pig, Hive, and Impala, along with the underlying Hadoop stack (YARN, HDFS)
Relational SQL and NoSQL databases
Change Data Capture (CDC) tools and technologies
Data pipeline/workflow management tools such as Azkaban and Airflow (a minimal Airflow sketch follows this list)
AWS cloud services such as EC2, EMR, RDS and Redshift
Stream-processing systems such as Storm and Spark Streaming
API integration with big data platforms
Object-oriented or functional scripting languages such as Python, Java, C++, etc.
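To ground the pipeline/workflow item above, here is a minimal Airflow sketch (assuming Airflow 2.x, where DAGs are plain Python); the DAG id, schedule, and task bodies are hypothetical placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical task bodies; in practice these would call out to the
    # actual ingestion and load logic.
    def extract():
        print("pull data from the source systems")

    def load():
        print("write conformed data to the warehouse")

    with DAG(
        dag_id="example_daily_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)

        extract_task >> load_task  # extract must finish before load starts

The explicit extract_task >> load_task dependency is what tools like Airflow and Azkaban manage at scale: ordering, retries, and scheduling across many such tasks.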