Job Title: Hadoop Admin
Job Location: Bellevue, Washington, United States (remote as of now)
Job Description:
Roles & Responsibilities:
• Responsible for managing a large Hadoop cluster and ensuring 100% component availability (see the monitoring sketch after this list):
a) HDFS: check the NameNode UI for under-replicated / corrupted blocks and for DataNode availability;
b) YARN: ResourceManager and NodeManager availability;
c) Storm: Nimbus and Supervisor availability;
d) HBase: HBase Master, RegionServer, and Phoenix Query Server availability;
e) Hive: HiveServer2 availability; check jstat output and heap size to gauge response time and health;
f) ZooKeeper and JournalNode availability;
g) Spark and Kafka services.
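A minimal sketch of how these availability checks might be scripted, assuming the NameNode JMX servlet (default port 50070 on HDP 2.x), the ResourceManager REST API (default port 8088), and ZooKeeper's four-letter commands are reachable; all hostnames are placeholders:

```python
#!/usr/bin/env python3
"""Sketch: availability probe for core cluster services (hosts/ports are placeholders)."""
import json
import socket
import urllib.request

NAMENODE = "http://nn-host:50070"
RESOURCE_MANAGER = "http://rm-host:8088"
ZOOKEEPERS = [("zk1-host", 2181), ("zk2-host", 2181), ("zk3-host", 2181)]

def fetch_json(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def check_hdfs():
    # FSNamesystemState exposes live/dead DataNode counts and
    # under-replicated block totals via the NameNode JMX servlet.
    data = fetch_json(NAMENODE + "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState")
    bean = data["beans"][0]
    print(f"HDFS: {bean['NumLiveDataNodes']} live / {bean['NumDeadDataNodes']} dead DataNodes, "
          f"{bean['UnderReplicatedBlocks']} under-replicated blocks")

def check_yarn():
    # The ResourceManager cluster-metrics REST endpoint reports NodeManager health.
    metrics = fetch_json(RESOURCE_MANAGER + "/ws/v1/cluster/metrics")["clusterMetrics"]
    print(f"YARN: {metrics['activeNodes']} active / {metrics['unhealthyNodes']} unhealthy NodeManagers")

def check_zookeeper():
    # ZooKeeper's four-letter 'ruok' command returns 'imok' from a healthy server.
    for host, port in ZOOKEEPERS:
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(b"ruok")
            status = s.recv(4).decode()
        print(f"ZooKeeper {host}:{port} -> {status}")

if __name__ == "__main__":
    check_hdfs()
    check_yarn()
    check_zookeeper()
```

Storm, HBase, and Kafka expose analogous UIs and endpoints that can be probed the same way.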
• Cluster maintenance, including the addition and removal of nodes and the installation of services on new nodes (a decommissioning sketch follows below).
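For node removal, a rough decommissioning sketch, assuming exclusions flow through the host files referenced by dfs.hosts.exclude and yarn.resourcemanager.nodes.exclude-path; the file paths are placeholders:

```python
#!/usr/bin/env python3
"""Sketch: decommission a DataNode/NodeManager pair (paths are placeholders)."""
import subprocess
import sys

# Assumed locations of the exclude files referenced by
# dfs.hosts.exclude and yarn.resourcemanager.nodes.exclude-path.
HDFS_EXCLUDE = "/etc/hadoop/conf/dfs.exclude"
YARN_EXCLUDE = "/etc/hadoop/conf/yarn.exclude"

def add_to_exclude(path, host):
    with open(path, "a") as f:
        f.write(host + "\n")

def decommission(host):
    add_to_exclude(HDFS_EXCLUDE, host)
    add_to_exclude(YARN_EXCLUDE, host)
    # Tell the NameNode and ResourceManager to re-read the host lists;
    # HDFS then re-replicates the node's blocks elsewhere before it drops out.
    subprocess.run(["hdfs", "dfsadmin", "-refreshNodes"], check=True)
    subprocess.run(["yarn", "rmadmin", "-refreshNodes"], check=True)
    print(f"{host} marked for decommission; watch the NameNode UI until it completes")

if __name__ == "__main__":
    decommission(sys.argv[1])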
• Performance tuning (e.g., YARN is slow, Tez jobs are slow, slow data loading) and maintaining platform integrity; see the triage sketch below.
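When YARN feels slow, one quick first check is the ResourceManager's cluster-metrics endpoint; a sketch, with rm-host as a placeholder:

```python
#!/usr/bin/env python3
"""Sketch: first-pass YARN slowness triage via the ResourceManager REST API."""
import json
import urllib.request

RM = "http://rm-host:8088"  # placeholder ResourceManager address

with urllib.request.urlopen(RM + "/ws/v1/cluster/metrics", timeout=10) as resp:
    m = json.load(resp)["clusterMetrics"]

# A large pending backlog with little available memory usually points at
# queue capacity or container sizing rather than at individual jobs.
print(f"apps pending={m['appsPending']} running={m['appsRunning']}")
print(f"containers pending={m['containersPending']}")
print(f"memory available={m['availableMB']} MB / allocated={m['allocatedMB']} MB")
```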
• Review industry best practices and recommendations, and roll them out as appropriate.
• Manage the alerts on the Ambari page and take corrective and preventive actions; an alert-polling sketch follows.
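A sketch of pulling the current CRITICAL alerts from the Ambari REST API; the Ambari host, cluster name, and credentials are placeholders:

```python
#!/usr/bin/env python3
"""Sketch: list CRITICAL alerts from the Ambari REST API (host/cluster/credentials are placeholders)."""
import base64
import json
import urllib.request

AMBARI = "http://ambari-host:8080"
CLUSTER = "mycluster"
AUTH = base64.b64encode(b"admin:admin").decode()  # replace with real credentials

url = f"{AMBARI}/api/v1/clusters/{CLUSTER}/alerts?Alert/state=CRITICAL"
req = urllib.request.Request(url, headers={"Authorization": "Basic " + AUTH})
with urllib.request.urlopen(req, timeout=10) as resp:
    items = json.load(resp)["items"]

for item in items:
    alert = item["Alert"]
    print(f"{alert['host_name']}: {alert['label']} -> {alert['text']}")
```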
• HDFS disk space management.
• HDFS disk utilization: produce a weekly utilization report for capacity planning (see the sketch below).
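A sketch of the weekly capacity snapshot, assuming it runs from cron against the NameNode JMX servlet; the NameNode address and CSV path are placeholders:

```python
#!/usr/bin/env python3
"""Sketch: append a weekly HDFS capacity snapshot to a CSV for trend analysis."""
import csv
import datetime
import json
import urllib.request

NAMENODE = "http://nn-host:50070"              # placeholder NameNode address
REPORT = "/var/reports/hdfs_capacity.csv"      # placeholder report path

url = NAMENODE + "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState"
with urllib.request.urlopen(url, timeout=10) as resp:
    bean = json.load(resp)["beans"][0]

used_pct = 100.0 * bean["CapacityUsed"] / bean["CapacityTotal"]
with open(REPORT, "a", newline="") as f:
    csv.writer(f).writerow([
        datetime.date.today().isoformat(),
        bean["CapacityTotal"],
        bean["CapacityUsed"],
        bean["CapacityRemaining"],
        round(used_pct, 2),
    ])
print(f"HDFS used: {used_pct:.1f}%")
```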
• User access management: set up new Hadoop users, and add and maintain access for both new and existing users (a provisioning sketch follows).
• Manage and maintain layered access through authentication, authorization, and auditing.
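A provisioning sketch for a new user's HDFS home directory, assuming the OS/LDAP account already exists; the group name and quota are placeholders:

```python
#!/usr/bin/env python3
"""Sketch: provision an HDFS home directory for a new Hadoop user."""
import subprocess
import sys

def run(*cmd):
    subprocess.run(cmd, check=True)

def provision(user, group="hadoop-users"):  # placeholder group name
    home = f"/user/{user}"
    run("hdfs", "dfs", "-mkdir", "-p", home)
    run("hdfs", "dfs", "-chown", f"{user}:{group}", home)
    run("hdfs", "dfs", "-chmod", "750", home)
    # Optional: cap the user's footprint for capacity-planning purposes.
    run("hdfs", "dfsadmin", "-setSpaceQuota", "1t", home)

if __name__ == "__main__":
    provision(sys.argv[1])
```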
• Maintain and manage High Availability.
• Manage permissions and roll over Ranger KMS keys.
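For the key rollover above, a minimal sketch using the standard hadoop key CLI (Ranger KMS implements the Hadoop KMS API); the key name comes in as an argument:

```python
#!/usr/bin/env python3
"""Sketch: roll a Ranger KMS encryption-zone key via the 'hadoop key' CLI.

Rolling creates a new key version; files written earlier stay readable
because their encrypted data encryption keys were wrapped with the
version that was current at write time.
"""
import subprocess
import sys

def roll_key(key_name):
    subprocess.run(["hadoop", "key", "roll", key_name], check=True)
    # Confirm the new key version landed.
    subprocess.run(["hadoop", "key", "list", "-metadata"], check=True)

if __name__ == "__main__":
    roll_key(sys.argv[1])
```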
• Monitor the automated audit-forwarding job.
• Audit log cleanup as directed by the security information and event management (SIEM) system; a retention sketch follows.
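A retention sketch for local audit logs, assuming the SIEM has already ingested the files being removed; the directory and retention window are placeholders:

```python
#!/usr/bin/env python3
"""Sketch: purge local audit logs older than the SIEM-mandated retention."""
import pathlib
import time

AUDIT_DIR = pathlib.Path("/var/log/hadoop/audit")  # placeholder log location
RETENTION_DAYS = 30                                # placeholder policy

cutoff = time.time() - RETENTION_DAYS * 86400
for path in AUDIT_DIR.glob("*.log*"):
    if path.is_file() and path.stat().st_mtime < cutoff:
        print(f"removing {path}")
        path.unlink()
```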
• Manage and coordinate Hadoop-related trouble tickets with Hortonworks.
• New switch configuration on the FTP servers.
• Set up folders and permissions on the FTP servers.
• Monitor and manage file transfers from FTP and writes onto HDFS (see the transfer sketch below).
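A transfer sketch pulling one file from FTP and landing it in HDFS; the host, credentials, and paths are placeholders, and a production version would add checksums, retries, and a completion marker for downstream jobs:

```python
#!/usr/bin/env python3
"""Sketch: pull a file from an FTP server and land it in HDFS (all names are placeholders)."""
import ftplib
import subprocess
import tempfile

FTP_HOST = "ftp-host"
REMOTE_FILE = "incoming/data.csv"
HDFS_TARGET = "/landing/data.csv"

with tempfile.NamedTemporaryFile() as tmp:
    # Download to a local staging file first, then push to HDFS.
    with ftplib.FTP(FTP_HOST, "ftpuser", "ftppass") as ftp:
        ftp.retrbinary("RETR " + REMOTE_FILE, tmp.write)
    tmp.flush()
    # -f overwrites any partial file left by a previous failed run.
    subprocess.run(["hdfs", "dfs", "-put", "-f", tmp.name, HDFS_TARGET], check=True)
```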
• Monitor and manage data transfer from HDFS to Hive through RabbitMQ using Storm processing; a queue-depth check is sketched below.
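A queue-depth check against RabbitMQ using the pika client; the host, credentials, and queue name are placeholders. A steadily growing backlog usually means the Storm topology is falling behind:

```python
#!/usr/bin/env python3
"""Sketch: check the depth of the RabbitMQ queue feeding the Storm HDFS-to-Hive topology."""
import pika

params = pika.ConnectionParameters(
    host="rabbitmq-host",  # placeholder broker address
    credentials=pika.PlainCredentials("monitor", "secret"),
)
conn = pika.BlockingConnection(params)
try:
    channel = conn.channel()
    # passive=True only inspects the queue; it never creates it.
    q = channel.queue_declare(queue="hdfs-to-hive", passive=True)
    print(f"messages waiting: {q.method.message_count}")
finally:
    conn.close()
```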
• Monitor the dashboard to ensure data loading completes before aggregation kicks off.
• Point of contact for vendor escalations.
• Familiarity with open source configuration management and deployment tools such as Puppet or Chef, and with Linux scripting.
• Responsible for data ingestion, loading, and extraction; address any slowness promptly.
Skills & Qualifications:
• Good troubleshooting skills; an understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networking.
• Experience managing the Hortonworks distribution.
• Hadoop ecosystem skills such as HBase, Hive, Pig, Mahout, etc.
• Experience deploying Hadoop clusters, adding/removing nodes, keeping track of jobs, and monitoring the critical parts of the cluster.
• Good knowledge of Linux, since Hadoop runs on Linux.
• Knowledge of troubleshooting core Java applications is a plus.