Site Reliability Engineer

Information Technology
Position Type: Full Time

About Blackhawk Network:

At Blackhawk Network, we shape the future of global branded payments through the prepaid products, technologies, and networks that connect brands and people. Our collaborative innovation and scalable, security-minded solutions help our partners increase reach, loyalty, and revenue. We believe our future holds great things for Blackhawk Network and its partners, and that together we can shape the future. Our beliefs? Win as one team, be innovative, strive for global excellence, and be inspiring!

So, what are you waiting for? Shape your career and join our global network.


Blackhawk Network is building a digital platform and products that bring people and brands together. We facilitate cross-channel payments via cash-in, cash-out, and mobile payments. By leveraging blockchain, smart contracts, serverless technology, and real-time payment systems, we are unlocking the next million users through innovation.


Our employees are our biggest asset! Come find out how we engage with the biggest brands in the world. We look for people who collaborate, who inspire others, and whose passion makes a difference as they work as a team and strive for global excellence.


For given application platforms, the Site Reliability Engineer (SRE) conducts root cause analyses (RCAs) and recommends engineering changes to improve site reliability. In addition, the SRE works with a cross-functional group of leaders and individual contributors to analyze incident data, provide expert guidance, and lead solutions for proactive detection and prevention.

Responsibilities:

  • Designs and provides guidance to engineers on monitoring and improving observability for scalable, high-performance software designs in on-premises or cloud environments.
  • Drives sustainable solutions across functions with minimal or no supervision.
  • Works with design engineering to propose architectural changes and fosters communication between different organizational units.
  • Makes and adjusts configuration changes to production systems, and standardizes troubleshooting procedures.
  • Standardizes and maintains internal technical documentation (e.g., wiki, runbooks) and improves technical situational awareness (e.g., by implementing and monitoring key metrics).
  • Continuously improves his/her specialist knowledge, and takes on new areas to support the team.
  • Identifies “areas of interest” for additional investigation to improve long-term availability and integrity.
  • Works directly with the Operations Control Center to define best practices for escalations and for minimizing Mean Time to Detect (MTTD) and Mean Time to Resolution (MTTR).

Qualifications:

    • Bachelor’s degree in Computer Science or Information Technology Management, or an equivalent combination of education and work experience, is required.
    • 1-3 years of experience in Windows or Unix systems administration and/or networking.
    • Knowledge of Unix systems analytics and performance management.
    • Works with peers within the Site Reliability team to analyze performance metrics, incident data, and system logs to deliver informed corrective actions for engineers and infrastructure administrators.
    • Develops best practices for monitoring performance and availability through new or existing tools.
    • Proven troubleshooting skills, including the ability to perform root cause analysis of recent incidents.
    • Partners with development teams to research and provide proof-of-concept support for new and emerging technologies.
    • Experience with payment systems is ideal; experience with gift card, prepaid, and credit card acquiring and issuing systems is a plus.
    • Must have an analytical mindset, natural curiosity, initiative, and the willingness to go "beyond" in learning and applying new knowledge for systems monitoring and fast recovery.
    • Must demonstrate a code of ethics and maintain the highest standards of confidentiality in dealing with sensitive data and proprietary information.
    • Must maintain security awareness and support efforts to maintain network and system security measures.
    • Excellent written and verbal communication skills are required.
    • The ability to work collaboratively and to persuade others on matters within one's area of expertise is expected.
    • Must be available for on-call duty and for application support during off-hours, as needed.

Technical Skills:

    • Hands-on experience with at least one of the following is a plus: Shell, PHP, Python, JavaScript, Go, or a similar language.
    • Strong understanding of key AWS technologies, including RDS, S3, Glacier, EC2, ELBs, Auto Scaling groups, and Route 53.