• Staff Site Reliability Engineer

    Job Location: US-CA-Pleasanton
    ID: 2018-9482
    Category: Engineering
    Position Type: Full-Time
  • About Blackhawk Network:

    Blackhawk Network Holdings, Inc. is a global financial technology company and a leader in connecting brands and people through branded value solutions. Blackhawk platforms and solutions enable the management of stored value products, promotions and rewards programs in retail, ecommerce, financial services and mobile wallets. Blackhawk’s Hawk Commerce division offers technology solutions to businesses and direct to consumers. The Hawk Incentives division offers enterprise, SMB and reseller partners an array of platforms and branded value products to incent and reward consumers, employees and sales channels. Headquartered in Pleasanton, Calif., Blackhawk operates in 26 countries. For more information, please visit blackhawknetwork.com, cashstar.com, hawkcommerce.com, hawkincentives.com or our product websites GiftCards.com, giftcardmall.com, GiftCardLab.com and OmniCard.com.

    Overview:

    For assigned application platforms, the Staff Site Reliability Engineer (SRE) conducts root cause analyses (RCA) and recommends engineering changes to improve site reliability. In addition, he/she addresses technical escalations that cannot be handled by L2 support. Furthermore, he/she develops and monitors core application metrics that support preventative maintenance as well as analytics.

    Responsibilities:

    • The SRE is expected to drive sustainable solutions across functions with minimal or no supervision.
    • Directs RCA and gauges system status against an established baseline.
    • Works with design engineering to propose architectural changes, and fosters communication between different organizational units.
    • Mentors and trains junior administrators and Operations Control Center personnel.
    • The SRE provides hands-on leadership during service-impacting events and technical escalations (e.g., analysis and troubleshooting of systems and servers). This also includes configuration changes to production systems, as well as standardization of troubleshooting procedures.
    • The SRE standardizes and maintains internal technical documentation (e.g., wiki, runbooks) and improves technical situational awareness (i.e., implementing and monitoring key metrics).
    • The SRE continuously improves his/her specialist knowledge and takes on new areas to support the team. The SRE also educates his/her co-workers and serves as a subject matter expert in his/her field of specialty for the organization.
    • The SRE identifies “areas of interest” for additional investigation to improve long-term availability and integrity.
    • Oversees and drives Root Cause Analyses and Corrective Actions to improve site availability and integrity.
    • Creates backlog items for IT to resolve, to be prioritized and tracked through sprint planning.

    Qualifications:

    • Bachelor’s degree in Computer Science preferred, or an equivalent combination of education and work experience necessary
    • Strong experience in Unix systems administration and/or networking
    • 7+ years of experience independently managing Linux systems
    • Knowledge of Unix systems analytics and performance management
    • Cloud experience with AWS and GCP
    • Experience with one or more monitoring, analytics, and reporting systems (e.g., New Relic, Zabbix, Splunk, ExtraHop, ThousandEyes)
    • Proven troubleshooting skills, including the ability to execute root cause analysis
    • Experience in managing distributed services (e.g., high-performance NFS, LDAP, DNS)
    • Exposure to management of complex network equipment (e.g., load balancers or firewalls)
    • Operational experience supporting production networks is required
    • Ability to identify code and design-pattern errors
    • Experience with payment systems is preferred; experience with gift card, prepaid, and credit card acquiring and issuing systems is a plus.
    • Operational knowledge of software revision tracking and release management is desired.
    • This role requires the right candidate to demonstrate an analytical mindset, natural curiosity, initiative and the willingness to go “beyond” in determining trigger events and root cause.
    • Excellent written and verbal communication skills are required.
    • Ability to work collaboratively and to persuade others, including experts in their own fields, is expected.
    • Applied practical experience with systems thinking (the ability to analyze systems and their components), as well as with analytical methods such as Failure Mode and Effects Analysis (FMEA), is beneficial.
    • The Staff SRE is expected to be available for on call duty as well as for application support during off-hours, as needed.

    Blackhawk Network is an Equal Opportunity/Affirmative Action Employer. Blackhawk Network believes that diversity leads to strength.