What you will do
- Perform real-time proactive monitoring of servers, databases and network devices
- Troubleshoot diverse technologies across highly diverse infrastructure environments, following existing procedures
- Accurately log incidents in a ticketing system, documenting symptoms and performing event rating
- Update assigned tasks in a timely manner; escalate as required to management, vendors and clients.
- Monitor multiple logs across diverse platforms to uncover specific activities as they occur from platform to platform.
- Ensure proper escalation, problem resolution and customer satisfaction during outages or periods of degraded system performance.
- Interact with systems engineers, developers and other personnel to quickly troubleshoot, triage and resolve issues
- Develop and maintain positive communications between Global Service Desk, Site Services and IT Infrastructure teams.
- Manage the intake and recording of information to ensure that it is accurate. Continuously evaluate the services provided and capture this information in the department’s reports and metrics.
- Create and implement documentation and procedures for alerting and escalation.
Required knowledge & skills
- Fundamental understanding of Windows and Linux operating systems (server side), for example job control, soft and hard links, and the distinction between kernel and user space.
- Willingness to learn new things and technologies (middleware, network, storage)
- Good spoken and written English.
- Experience using an enterprise ticketing system (SCCD, ServiceNow, Remedy, JIRA, etc.)
- Strong interpersonal and communication skills; capable of training users in applications, operating system fundamentals, and writing basic documentation
- Understanding of ITIL processes, with appropriate foundation-level accreditation
- Excellent follow-up and time-management skills
- Ability to work effectively with team members, multiple departments and vendors
- Should be able to mentor junior members of the team.
- Should act as a role model
- Ability to conceptualize problems
- Extremely flexible when workflows change
- Proactive communication skills
- Team player
- Self-motivation
Nice to have
- Ability to assess the impact of different network problems
- Experience with various monitoring and alerting systems and the operating procedures needed to get optimal results from them.
- Previous experience with Fault and Performance Management tools (Nagios, Zenoss, Cacti, MRTG, IBM Tivoli Monitoring, Netcool or equivalent)
- Knowledge of web applications and frameworks.