Senior Network Observability Engineer

at Marriott in Springfield, Illinois, United States

Job Description

Job Number 24124359

Job Category Information Technology

Location Marriott International HQ, 7750 Wisconsin Avenue, Bethesda, Maryland, United States

Schedule Full-Time

Located Remotely? Y

Relocation? N

Position Type Management

JOB SUMMARY

The Senior Network Observability Engineer of Network Site Reliability Engineering (SRE) is the subject matter expert in designing and implementing the network monitoring strategy and platforms for next-generation operations across all Marriott International (MI) networks, including the Property Networks, Datacenter/Cloud Networks, and Corporate Networks. This role works closely and collaboratively with a matrix team of expert network architects and engineers to drive adoption of SRE practices and operating models across all network product towers and around the globe. The Senior Network Observability Engineer brings a focus on proactive monitoring: leveraging telemetry data to detect anomalies, identifying potential issues before they impact users, and enabling faster incident response. Reporting to senior leadership, the engineer is responsible for articulating and effectively executing a vision that addresses the increasing complexity and scale of modern IT systems and overcomes the limitations of traditional monitoring approaches. The successful candidate is expected to bridge that gap by implementing telemetry strategies and advanced monitoring techniques to understand system behavior at a granular level. This role is accountable for establishing technical-level relationships and partnering with all business disciplines, other MI teams, outsourcing vendors, and external suppliers to define and implement the Network SRE observability roadmap, and for building and enforcing the standards defined in it.

CANDIDATE PROFILE

Required Education and Experience

+ Undergraduate degree in an engineering or computer science discipline and/or equivalent experience/certification

+ 7+ years’ experience in information technology including:

+ 6+ years’ experience in collecting, processing, and monitoring telemetry data with a focus on analyzing, troubleshooting, and driving continuous improvements in mission-critical networks

+ 6+ years’ experience with network and application monitoring tools and related products

+ Experience in installing, configuring, and troubleshooting network and application monitoring tools (NetScout, ThousandEyes, SolarWinds/Broadcom DX NetOps, BigPanda, and AI/ML-based network performance monitoring tools or other similar tools)

+ Experience in developing, documenting, and managing the requirements gathering process and providing detailed design and implementation plans to support the requirements throughout the project life cycle

+ Field experience and knowledge of foundational data networking and IP technologies, including ARP, TCP/IP, UDP, DHCP, DNS, NAT, and others

+ Experience with common routing and switching platforms (Cisco, Juniper, HP/Aruba etc.)

+ Experience with one or more Cloud Computing platforms (e.g. Amazon AWS, Microsoft Azure, Google Compute Engine)

+ Demonstrated experience in delivering written documents detailing network solutions and diagrams

+ Knowledge and experience in NetFlow related configuration practices

+ Must possess expertise in administration of devices and policies in Network Tools.

+ Technical knowledge of common routing protocols (e.g., OSPF, BGP)

+ Experience in Agile methodologies, daily stand-up meetings, sprint planning sessions and user story preparations

Preferred:

+ Advanced Degree (e.g., MS, PhD) in Computer Science or other technical discipline or MBA, preferably with a focus on technology

+ Experience managing monitoring tools in the hospitality industry is a plus

+ Experience in leveraging public APIs for developing automation scripts

+ Team player with the ability to collaborate and work with cross functional teams in multiple time zones

+ Experience in researching emerging technologies and trends, standards, and products and synthesizing into clear technology roadmaps and strategies

+ Strong knowledge of emerging tools, applications, and systems for attaining best-in-class network observability across the enterprise

+ Excellent problem-solving skills working independently and through leading outcomes for cross functional teams

+ Excellent understanding of change management, testing requirements and techniques, to ensure high availability and business readiness of platforms

+ Strong attention to detail with an ability to operate effectively across multiple priorities

+ Ability to perform independently, as a member of a team, and through cross functional initiatives

+ Proven track record of driving transformation in network technologies, tools, and processes through a data driven continuous improvement methodology

+ Demonstrated experience in improving reliability, performance, and agility of complex enterprise networks

+ Strong understanding of network infrastructure automation, instrumentation, and monitoring platforms and the emerging technologies in this area

+ Strong influencing skills and an ability to overcome barriers while driving change

+ Excellent verbal and written communication skills for a wide range of audiences including executives, business stakeholders, and IT teams

CORE WORK ACTIVITIES

+ Develop complex global distributed infrastructure monitoring, management, and automation solutions to manage our global network.

+ Lead the design, writing, and building of tools to improve the reliability, availability, and scalability of Datacenter/Cloud Networks, Property Networks, and Corporate Networks

+ Serve as technical lead for the development of complex global distributed infrastructure monitoring, management, and automation solutions to manage our global network.

+ Serve as technical lead for the design of new monitoring tools and smart alerts that help discover failures or issues before our customers do.

+ Collaborate with other Network teams to develop network SRE solutions with a focus on production integration

+ Conduct network analysis, configuration management and develop improvements for system software performance, availability, and reliability

+ Provide program management assistance and contribute input to help manage project schedules, risks, and costs.

+ Manage Network SRE products and solutions, including the design, low level engineering, and delivery of new hardware systems for Marriott applications across the network.

+ Define and implement an operational Recovery Time Objective (RTO) and Recovery Point Objective (RPO) strategy for all Network Infrastructure areas.

+ Establish management-level relationships and partner with all Business disciplines and other MI teams to define Network SRE services, meet service level requirements, and serve as an escalation point to resolve service delivery and operational issues.

+ Develop, document, and manage the requirements gathering process and provide detailed design and business processes to support the requirements throughout the project life cycle

+ Drive accountability with strategic sourcing partners, vendors, telco/ISPs, etc., launching and managing Performance Improvement initiatives where appropriate.

+ Create functional strategies and specific objectives for the sub-function and contribute to the development of budgets/policies/procedures to support the functional Network SRE tools, systems, and infrastructure.

+ Perform network troubleshooting and upgrades. Coordinate with local teams and vendors to solve problems and restore services as needed.

+ Foster an environment of continuous improvement and structured processes and procedures that support a zero-fault culture.

Maintaining Goals

+ Submits reports in a timely manner, ensuring delivery deadlines are met.

+ Promotes the documentin

Job Posting: JC262758606

Posted On: Jul 18, 2024

Updated On: Jul 23, 2024
