Awesome Open Source

sysadmin-reading-list


A reading list for the larval stage sysadmin/SRE. This list is focused on the UNIX family of OSes, but PRs about other OSes are welcome.

So you've got your first sysadmin/SRE job or internship. Congratulations, it's going to be an interesting ride.

Articles and Books to Read

  • A Few Ops Lessons We All Learn the Hard Way - A collection of lessons that everyone in ops and SRE inevitably learns. You may not personally experience all of them, but they'll ring true after you're in ops for a while.
  • Clean Code - Every year, countless hours and significant resources are lost because of poorly written code. But it doesn't have to be that way. Martin has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code "on the fly" into a book that will instill within you the values of a software craftsman and make you a better programmer, but only if you work at it.
  • Continuous Delivery - A book that has rapidly become the guide to planning and implementing build pipelines.
  • DevOps Roadmap - Community driven, articles, resources, guides, interview questions, quizzes for DevOps. Learn to become a modern DevOps engineer by following the steps, skills, resources and guides listed in this roadmap.
  • Effective DevOps - A practical guide for creating affinity among teams and promoting efficient tool usage in your company.
  • Git Magic (free ebook) - git is a version control Swiss army knife. A reliable versatile multipurpose revision control tool whose extraordinary flexibility makes it tricky to learn, let alone master.
  • Hello DNS - Every sysadmin/SRE needs to know how DNS works. Start with DNS Basics; it's a good introduction.
  • Lean Startup or Lean Enterprise - This pair describes the process surrounding implementation and use of Lean principles in Startup and Enterprise organizations. There are a number of companion pieces that extend the principles to specific fields of study and implementation, such as Lean Analytics.
  • LinkedIn's School of SRE - LinkedIn is using this curriculum for onboarding non-traditional hires and new college grads into the SRE role.
  • Site Reliability Engineering - How Google Runs Production Systems. This can be read online for free at Google's SRE site.
  • Systems Performance: Enterprise and Cloud - Brendan Gregg's book is an award winner and a favorite of many a sysadmin and SRE. It addresses systems performance at scale.
  • The Art of Capacity Planning - John Allspaw's book is a hands-on and practical guide to planning for such growth, with many techniques and considerations to help you plan, deploy, and manage web application infrastructure.
  • The Art of Monitoring - James Turnbull's book on the art of modern application and infrastructure monitoring and metrics.
  • The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations - The results of a multi-user case study on DevOps and the practical-oriented sequel to The Phoenix Project.
  • The Goal - A foundational novel on the Theory of Constraints and many other operational concerns.
  • The Phoenix Project - know why your projects are important to the business.
  • The Practice of Cloud System Administration, by Tom Limoncelli. Focuses on distributed or cloud computing and brings a DevOps/SRE sensibility to the practice of system administration. Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that are useful to all enterprises.
  • Time Management for System Administrators, by Tom Limoncelli. You're going to be pulled in a dozen different directions; if you can't manage your time, you and your job performance are going to suffer.
  • UNIX and Linux System Administration Handbook by Evi Nemeth is a great book, targeted at larger system deployments and real-world large systems.
  • Web Operations: Keeping the Data on Time - A collection of essays and interviews, with web veterans such as Theo Schlossnagle, Baron Schwartz, and Alistair Croll that will teach you strategies for designing your web site to scale up smoothly to web-scale load.
  • Wizard Zines - Julia Evans has a great set of zines she's published about many topics useful to a starting (or even an experienced) sysadmin/SRE.

Languages

The Dev part of DevOps means you're going to inevitably end up writing some code. Here's a list of free programming books for many languages.

Here are some of the scripting languages you're most likely to see in your infrastructure, with links to some good references and tutorials.

Awk

The awk family (awk, gawk, nawk and I'm sure I've missed other implementations) of scripting languages is one of the oldest - the first version of awk was written in 1977, but it's on pretty much any unix (even minimal variants that might not have perl, python or ruby) and is still very useful.

I still use it frequently for pulling columns out of tabular output. By default it treats consecutive runs of whitespace characters as a single delimiter (unlike cut, where you have to count spaces), so you can, for example, pipe things to awk '{print $3}'. It is also Turing-complete: people can and have written complex programs in it.
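As a small illustration of that difference, the sketch below feeds the same space-padded table to both tools. The sample data and the function names are invented for this example.

```shell
# Hypothetical sample: columns padded with a variable number of spaces,
# similar to what df or ps prints.
sample_table() {
  printf '%s\n' \
    'web01    running   10.0.0.5' \
    'db01     stopped   10.0.0.17' \
    'cache01  running   10.0.0.23'
}

# awk treats each run of whitespace as a single delimiter,
# so the third column is simply $3.
third_column_awk() {
  sample_table | awk '{print $3}'
}

# cut treats every single space as a delimiter, so -f3 lands on an empty
# field or on the wrong column, depending on how much padding a line has.
third_column_cut() {
  sample_table | cut -d' ' -f3
}

third_column_awk
```

Running third_column_cut on the same input shows why counting spaces for cut gets tedious quickly.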

Here are some good references to get you started:

Bash

bash is objectively a terrible programming language. All variables default to being globals, there is no module system built into the language, dealing with hashes is horrible, and there are other horrors resulting from it trying to be backward compatible with sh.

That said, it is on every system, so every *NIX sysadmin needs to know bash.
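The global-by-default behaviour is easy to trip over, so here is a minimal sketch of it. The function and variable names are made up for illustration; the point is only that an assignment inside a function leaks out unless it is declared with local.

```shell
count=1

# Without 'local', the assignment overwrites the caller's variable.
leaky() {
  count=99
}

# With 'local', the assignment stays inside the function.
contained() {
  local count=42
}

contained
after_contained=$count   # still 1

leaky
after_leaky=$count       # now 99

echo "after contained: $after_contained, after leaky: $after_leaky"
```

Declaring function variables with local by habit avoids this class of bug, and shellcheck can help spot related mistakes.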

Here are some useful resources to help you step up your shell scripting game:

  • The Art of the Command Line - A good set of notes and tips on using the command-line that is useful when working on Linux/Unix.
  • Advancing in the Bash Shell - Sam Rowe's bash as CLI tutorial.
  • Bash Guide For Beginners - A practical guide which, while not always being too serious, tries to give real-life instead of theoretical examples.
  • Bash Guide - Gives examples of good practice when writing bash scripts. It is targeted at beginning users with no advanced knowledge.
  • Bash Pitfalls - Greg Wooledge has a great list of unpleasant surprises in bash.
  • Commandlinefu - An extensive list of bash oneliners for almost every task you may need to accomplish.
  • Google's Shell Style Guide lists what Google's developers consider best practices for bash scripts.
  • Learning the Bash Shell - It's hard to go wrong with an O'Reilly reference on anything, really.
  • Pure Bash Bible - A collection of pure bash alternatives to external processes.
  • Safe Ways to do Things in Bash - An excellent set of tips from the authors of shellharden.
  • shellcheck is a linter for bash. It'll help you find unused variables, deprecated syntax and other things that make your bash scripts less stable. You can install it with apt-get, brew, cabal, or yum.
  • shellharden - A syntax highlighter and a tool to semi-automate the rewriting of scripts to ShellCheck conformance, mainly focused on quoting.
  • zshelldoc - Documentation generator for Bash & ZSH, with call-trees, comment extraction, etc.

Finally, remember that bash is not sh. If you're writing a script in bash, and testing it with bash, don't use #!/bin/sh as the shebang. Firstly, bash behaves differently when called as sh; secondly, not all *NIX systems (and not even all Linux distributions) use bash as their /bin/sh any more.

Powershell

Often you'll find yourself in a Windows environment, like it or not. These resources might help you in those cases:

Python

Python has much better support for string manipulation and system infrastructure than Bash. In addition, there is a rich library of modules supporting various tasks you can use in your scripts that are just a pip3 install away.

A few places to start learning:

Python Books

Tutorials @ Python

  • Google's Python Course, an introduction to Python, assuming little programming experience.
  • Python Koans is an interactive tutorial for learning the Python programming language by making tests pass.

Python & Sysadmin

Python & Deployment Utils

Ruby

Ruby also has a rich ecosystem of gems you can use in your programs, and like Python, much better string and data structure manipulation than Bash.

Ruby Books

If you're in a Ruby shop, you'll want these books:

Perl

Perl has a long history of being the system administrator's friend, bringing the best of bash, sed and awk together. It is also suitable for building tools for the system administrator to utilise in their work.

Perl Books

Toolchain

  • carton - A dependency management tool.
  • cpanm - An alternative and very friendly tool for installing modules from CPAN. This pairs well with perlbrew.
  • perlbrew - A tool for managing one or more Perl installations, without needing to modify the system-level Perl.
  • pinto - A tool for managing a private CPAN repository.

Web

  • Mojolicious - A rich framework for doing all things web, from building web services and sites to building HTTP client applications.
  • Plack/PSGI - The Perl implementation of WSGI, with many Plack servers available for use.

Modules

  • Task::Kensho - A list of recommended modules for many purposes, including reading configuration files, connecting to databases, logging, sending email, web crawling and development, and handling XML.

Blog Posts

Tools

Cloud

Multi-Platform

  • Terraform is a tool that allows you to configure your infrastructure as code, just like Chef/Puppet/etc allow you to manage the configuration of individual machines as code, with all the benefits of being able to diff, code review, etc. Terraform works with (as of this edit) AWS, Google Cloud, Microsoft Azure, vSphere and many other systems.

AWS

  • AWSCli provides a unified command line interface to Amazon Web Services. Wean yourself off of the webui if you want to be truly productive.
  • og-aws is an excellent guide to AWS, written by and for engineers who use AWS extensively.
  • S3cmd is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage, Backblaze B2 or DreamHost DreamObjects.

Azure

Google Cloud

OpenStack

  • Introduction - The OpenStack project's official introductory overview.
  • Installing - The list of tools you should consider if you want to install and operate OpenStack yourself.
  • Community - Where to go and who to ask for help.
  • Planet OpenStack - An aggregated feed from across the Internet of OpenStack-related content, including contributions from individuals.
  • Public Clouds - Similar to AWS, GCP or Azure, this is a list of providers who offer cloud services running on OpenStack.
  • Stackalytics - Code contribution statistics to OpenStack and related projects.
  • SuperUser - An online 'publication' aggregating and editorialising content related to OpenStack and Open Infrastructure.

Configuration Management

Quite simply, if you aren't using configuration management, you're doing it wrong.

You don't want to manually configure any servers - no matter how hard you try, they won't end up truly identical and having meat typing in commands takes far too long per server, doesn't scale, and the manual labor will discourage you from standing up new VMs for testing.

Treating your configuration as something described in text files allows you to treat it like code. You can do pull-requests, get your changes reviewed by your team, view the differences between your configuration at different times, and, perhaps most importantly, find out who changed the configuration, when, and (if they wrote good commit messages) why.

There are several good options:

  • Ansible is designed to be minimal in nature, consistent, secure, and highly reliable. Owned & supported by Red Hat.
  • CFEngine has been in continuous development since 1993. Unlike some of its peers on this list, it is written in C and is built with speed and scalability in mind. It should be considered for very, very large systems and for very small (think embedded) systems.
  • Chef is written in Ruby and Erlang and uses a Ruby DSL to describe system configuration.
  • Chocolatey is a Windows software management tool.
  • Puppet makes it easy to automate the provisioning, configuration and ongoing management of your machines and the software running on them. Make rapid, repeatable changes and automatically enforce the consistency of systems and devices across physical and virtual machines, on premise or in the cloud.
  • Salt orchestrates the build and ongoing management of your infrastructure.

Container Tooling

Containers package software and all its dependencies in a single package that can be run in isolation from other containers or applications running on the server, without the overhead of a full virtual machine.

Containerd

Containerd is an industry-standard container runtime with an emphasis on simplicity, robustness and portability. It is available as a daemon for Linux and Windows, which can manage the complete container lifecycle of its host system: image transfer and storage, container execution and supervision, low-level storage and network attachments, etc.

Installing and Learning Containerd

Follow the installation instructions for your preferred platform (currently, only Linux and Windows are directly supported) and start learning how to use Containerd:

On macOS, you can use Lima, which launches Linux virtual machines with automatic file sharing, port forwarding, and containerd installed. You can use the lima xbar plugin for a simple menubar application to control your Lima VMs.

Docker

Docker is a tool for running and managing containers. Containers are rapidly growing in popularity for local development (as an alternative to virtual machines), and can also run software in production with tools like Kubernetes or Amazon ECS.

Installing Docker

Follow the installation instructions for your preferred platform:

Learning Docker
  • The Docker Book - An excellent resource for getting started with Docker. This book is quick & easy to read.

Kubernetes

Kubernetes is a portable open-source container orchestration system used to automate deployment, scaling, and management of containerized applications.

Tutorials

There are many good tutorials at kubernetes.io. I recommend you start either with the minikube walkthrough, since it will get you a running test cluster quickly, or by enabling the Kubernetes cluster option in Docker Desktop.

VMware sponsors a free set of online Kubernetes courses at https://kube.academy/courses.

If you want to understand everything that is involved in getting a Kubernetes cluster up and running, Kubernetes the Hard Way by Kelsey Hightower is a must-read.

Have you ever wondered exactly what happens when you type something like kubectl run nginx --image=nginx --replicas=3 to make everything happen? What happens when K8s... is an in-depth guide that leads you through the full lifecycle of a request from the client to the kubelet, linking off to the source code where necessary to illustrate what's going on.

Utilities

  • krew - Makes it easy to use kubectl plugins. krew helps you discover plugins, install and manage them on your machine. It is similar to tools like apt, dnf or brew. Today, over 70 kubectl plugins are available on krew.
  • kubectx - Provides the kubectx command, which makes it easy to switch between clusters specified in your .kube/config, and kubens, which helps you switch between Kubernetes namespaces smoothly.

Monitoring

There are several good projects for monitoring.

  • Grafana - Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.
  • OSquery for Windows, Linux, macOS, and FreeBSD - Use SQL queries to look into items such as installed programs, running processes, and other events for inventory and monitoring.
  • Prometheus - Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company.

Articles/Tutorials

  • Impactful Dashboards - It's easy to make monitoring dashboards that are a jumble of poorly presented information; this article gives guidelines on making good dashboards.

JSON parsing with jq

Many of the tools you're going to use have JSON output options. Trying to parse JSON with grep or awk is a world of pain. Fortunately there is jq, a lightweight JSON processor you can use to slice out useful bits of the output for use in scripts, much as you would use awk or sed on text files.
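A minimal sketch of that kind of slicing follows. The JSON document, its field names and the function names are all invented for the example; only the jq filter syntax is real.

```shell
# Made-up JSON, shaped like the output of a cloud CLI's "list instances" call.
sample_json() {
  cat <<'EOF'
{
  "instances": [
    {"id": "i-001", "name": "web01",   "state": "running"},
    {"id": "i-002", "name": "db01",    "state": "stopped"},
    {"id": "i-003", "name": "cache01", "state": "running"}
  ]
}
EOF
}

# Print the name of every instance whose state is "running", one per line.
# -r emits raw strings instead of JSON-quoted ones.
running_names() {
  sample_json | jq -r '.instances[] | select(.state == "running") | .name'
}

# Only call jq if it is installed, so the sketch does not abort without it.
if command -v jq >/dev/null 2>&1; then
  running_names
fi
```

Because the filter works on the structure rather than on line layout, it keeps working if the tool reorders keys or changes its indentation, which is where grep and awk approaches tend to break.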

Regular Expressions

Inevitably you're going to find yourself in a situation where you have to look at logs to see what's going wrong with a service. When it's a multi-gigabyte logfile, that can be extremely painful.

Enter regexes and the grep family of tools.

When you have a multi-gigabyte logfile, it's a lot less painful to look at just the entries generated by the service that you got alerted about. Even better is to look only at the error messages from that service, and something as basic as grep -i yourservice < log | grep -i errorcode can convert a potentially multi-hour ordeal into a task of a minute or two.
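Here is a small sketch of that two-step filter on a throwaway file. The log format, service names and error codes are invented; a real log will need its own patterns and field positions.

```shell
# Build a tiny stand-in for a real logfile (contents are invented).
log=$(mktemp)
cat > "$log" <<'EOF'
2021-03-01T10:00:01 nginx INFO request served in 12ms
2021-03-01T10:00:02 billing ERROR E1042 upstream timeout
2021-03-01T10:00:03 nginx ERROR E5001 worker exited
2021-03-01T10:00:04 billing INFO invoice 881 created
2021-03-01T10:00:05 Billing error E1042 upstream timeout
EOF

# Keep only lines from the service, then only its errors (case-insensitive).
billing_errors=$(grep -i 'billing' "$log" | grep -i 'error')

# Count how often each error code shows up; the code is the 4th field here.
error_counts=$(printf '%s\n' "$billing_errors" | awk '{print $4}' | sort | uniq -c)

printf '%s\n' "$billing_errors"
rm -f "$log"
```

Summarising with sort | uniq -c is often the next step after filtering, since it shows whether you are looking at one recurring failure or many different ones.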

  • autoregex - This site will let you paste a regex in and have it translate it to English, or make an English statement like "First character A, second character B, up to three more B characters, then a C and end of line" and have that translated to ^ABB{0,3}C$.
  • debuggex.com will visualize regular expressions graphically.
  • Introducing Regular Expressions - Michael Fitzgerald's O'Reilly Book is a good place to start.
  • Regex for Noobs - An illustrated guide to regex that aims to provide a gentle introduction for people who never have fiddled with regex, want to, but are kind of intimidated by the whole thing.
  • Regular Expressions Cookbook
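Whatever tool you use to write a regex, it is worth checking it against strings that should and should not match. The sketch below does that with grep -E for one reading of "first character A, second character B, up to three more B characters, then a C and end of line"; the pattern, the helper name and the test strings are assumptions made for this example.

```shell
# Return success if the whole string matches the pattern.
# -E selects extended regex syntax, -q suppresses output.
matches() {
  printf '%s\n' "$1" | grep -Eq '^ABB{0,3}C$'
}

for s in ABC ABBBBC AC ABBBBBC AXBC; do
  if matches "$s"; then
    echo "$s: match"
  else
    echo "$s: no match"
  fi
done
```

ABC and ABBBBC match, while AC (no B), ABBBBBC (five Bs) and AXBC (a stray character) do not.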

Sed & Awk

  • Sed and Awk Pocket Reference presents a concise summary of regular expressions and pattern matching, and summaries of sed and awk and how to use them to edit files and convert data from one format to another.

Serverless

Serverless doesn't mean no sysadmins, even though there aren't instances to administer. We need to change the common processes that we rely on to monitor and manage services that run on serverless platforms. There are no system-level metrics to help us understand how our application is working.

Here are a few resources to help:

Source control

No matter what source control system you use (git, hg, perforce, whatever), you're going to have to write commit messages. Make them good. It may be obvious today why you made the change, but in six months or a year you won't have that context.

  • Explain why you made the change, not just what you changed. And no, the diff is not an explanation.
  • Start your commit messages with a single line that explains what you were trying to do in general.
  • Go into more detail about your changes in the message body. Talk about what you intend the change to do and why more than how you did it. If there's an issue or ticket number, include that in your commit message too; it'll give more context to your coworkers (or you in a year).

Good commit messages help the rest of your team understand what you're trying to do and make it easier for them to find logic errors in your pull requests - the code may be technically correct, but if they understand what you're trying to do, they can see when your code isn't actually doing what you say you want it to do, even when it is syntactically correct.
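To make the guidelines above concrete, here is a sketch of a message that follows them. The change, the numbers and the ticket reference are hypothetical; writing the message to a file and passing it with git commit -F is just one convenient way to use it.

```shell
# Write a sample commit message to a temporary file (all details invented).
msg_file=$(mktemp)
cat > "$msg_file" <<'EOF'
Raise nginx worker_connections to stop refused connections

The nightly batch upload opens more simultaneous connections than the
old limit of 1024 allowed, so clients saw connection errors around
02:00. Raising the limit to 4096 gives headroom for the current peak
without changing anything else about the proxy.

Refs: OPS-1234
EOF

# Show the summary line that one-line log views would display.
head -n 1 "$msg_file"

# Then, inside a repository with staged changes:
#   git commit -F "$msg_file"
```

The first line says what the change is for, the blank line separates it from the body, and the body explains why the change was needed rather than restating the diff.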

Here are a few articles that, while focused on git commit messages, apply to any source control system:

Git

Whether or not your shop uses git internally, you're going to end up needing to use it for the many useful things on GitHub and GitLab.

SSH

Testing

Testing is incredibly important and you should undertake this for your infrastructure as well as your applications.

Test Harnesses

  • Test Kitchen https://kitchen.ci - Test your configuration management tooling. Test Kitchen was originally written to test Chef cookbooks, but can be used for other configuration management systems as well.

Text Editors

Don't get involved in the Editor Wars. Just. Don't. Your choice of tool does not need defending. Nor does anyone else's choice.

However, you should care about your tools. You should be able to use them efficiently.

Vim

vim is a reality of life for SysAdmins. It is the one editor you can be sure is installed in even the most minimal *NIX or Linux install. You must be able to do at least basic edits with it. You don't need to love it, but you will have to use it.

Emacs

Emacs is an extremely extensible editor. In jest, it is frequently referred to as an operating system with a half-decent editor.

If you want to get a taste of what emacs can do, you can turn to Magnars and his excellent video tutorials/demos:

One of the biggest problems with emacs is that the defaults present a fairly different experience to what people are used to in other editors. Your first stop should be learning the basics using the built-in tutorial, followed by the mini-manual from tuhdo:

  • Type ctrl-h, followed closely by t, from within emacs to see the tutorial
  • http://tuhdo.github.io/index.html

Emacs can be made to look and act relatively modern if that's your desire:

If you're looking for emacs packages, the following online package index is the most popular, and tracks many:

There are several excellent starter kits out there, with varying degrees of whiz-bang. Here are some starter kits, with spacemacs being the most popular:

Here are some emacs configurations for inspiration:

Visual Editors and IDEs

Use tools with which you are productive. If you want to use a GUI Text Editor or IDE, don't let anyone give you a hard time about that.

There are GUI versions of vim and emacs that have ardent followers.

  • Sublime Text is another editor with an extensive plugin ecosystem and arguably one of the inspirations for Atom.
  • Visual Studio Code is a cross platform editor that is gaining traction in the community.

Blogs and Podcasts

  • Arrested Devops - hosted by Matt Stratton, Trevor Hess, and Bridget Kromhout. ADO is the podcast that helps you achieve understanding, develop good practices, and operate your team and organization for maximum DevOps awesomeness.
  • Code as Craft - Etsy's ops blog, full of well-written examples of dealing with real-world problems at scale.
  • Corecursive - Each episode someone shares the fascinating story behind a piece of software being built.
  • DevOps'ish - A weekly newsletter assembled by open source contributor, DevOps leader, and Cloud Native Computing Foundation (CNCF) Ambassador Chris Short.
  • Hey, Scripting Guy! Blog is a blog that answers common (and some uncommon) PowerShell queries.
  • Julia Evans' Blog - Julia writes a great blog where she dives into interesting ops topics and explains them clearly.
  • Kitchen Soap - John Allspaw, formerly the CTO at Etsy, writes a great blog about web operations, operating at scale, and other things that are interesting to ops types.
  • Last Week in AWS - Corey Quinn's weekly newsletter about the latest goings-on in the world of AWS.
  • Last Week in Kubernetes Development - Weekly newsletter summarizing code activity in the Kubernetes project: merges, PRs, deprecations, version updates, release schedules, and the weekly community meeting.
  • Monitoring Weekly - Weekly compilation of curated articles, news and tools related to monitoring.
  • On the Metal - Bryan Cantrill and Jessie Frazelle host a podcast about all sorts of interesting aspects of computing.
  • PowerScripting Podcast - hosted by Jon Walz and Hal Rottenberg.
  • SRE Weekly - SRE Weekly is a newsletter devoted to everything related to keeping a site or service available as consistently as possible.

Online Communities

  • DevOpsChat Slack is another community of DevOps minded folk with a diverse set of topic specific chat rooms. Home to Arrested DevOps.
  • Hangops Slack is a community of DevOps minded folk with many subject focused chat rooms.
  • PowerShell Slack is a community of PowerShell enthusiasts and Windows centric DevOps topics.

Windows Administration

Help wanted here.

Other Resources

Packetlife has some great cheat sheets and posters for a lot of applications (wireshark and tcpdump, for example) and networking principles. Well worth a look, even if you think you know the apps in question.

Free Services

  • Free-for-Dev is a list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev.

Miscellanea

  • awesome-scalability - An organized reading list for illustrating the patterns behind scalable, reliable, and performant large-scale systems.
  • awesome-sre - A curated list of awesome Site Reliability and Production Engineering resources.
  • awesome-sysadmin - A curated list of awesome open-source sysadmin resources.
  • devops-exercises - Questions and exercises on various technical topics, sometimes related to DevOps and SRE.
  • devops-resources - Another repository of useful resources and information about DevOps.
  • nohello - Why you shouldn't just say 'Hello' when you chat with someone. Make it easier for them to help you.
  • oncall-handbook - Alice Goldfuss' excellent oncall handbook; read this before your first oncall shift.
  • sre-interview - A collection of questions to practice for interviews.
  • stack-on-a-budget - A list of free/cheap tiers of services that you can use to learn the various cloud-based systems.
  • sysadvent - Every year the sysadvent team publishes 24 good articles for sysadmins and SREs.
  • tools.tldr.run - A curated list of security tools for Hackers and Builders.
  • Etsy's Debriefing Facilitation Guide is a great guide to conducting a blame-free debrief after an outage.
  • Pēteris Ņikiforovs has a good blog post explaining everything you see in top/htop output.
  • Dan Luu wrote an excellent article about the Normalization of Deviance that is good food for thought about engineering practices.
  • Donne Martin maintains a great System Design Primer.

Career

Communication

Writing good documentation and design docs is as important as writing code. The more senior you are, the more writing you're going to have to do - communication skills are a must.

Finance/Salary

  • Patrick McKenzie wrote a great blog post on salary negotiation. Salary negotiation is one of the few times in your life where a five minute conversation can earn you (or cost you!) thousands of dollars - be prepared.
  • Patrick also has a good podcast episode on salary negotiation - Kalzumeus Podcast Episode 12: Salary Negotiation with Josh Doody (there's a transcript too). You have to do it, it affects your life, you should do it well.
  • The Fearless Salary Negotiation site is a good read overall, especially the article on handling a Salary Expectations Interview Question when the recruiter asks. To quote them, "Your salary expectations are one of the few things you know that the company doesn't. That makes them extremely valuable and sharing them can make your salary negotiations very difficult and even cost you a lot of money." So read (at least) this article before you start your next job interview cycle.
  • An Engineer's Guide to Stock Options - Alex McCaw wrote a good blog post explaining stock options in plain English.
  • Equity 101 by Gergely Orosz is a good summary of the most common equity compensation variants.
  • How We Can Fix Startup Stock Options is a good post by Pete Cheslock on optimizing the tax implications of your stock options.
  • The Holloway Guide to Equity Compensation - Stock options, RSUs, job offers, and taxes: a detailed reference, including hundreds of resources, explained from the ground up.
  • Understanding Startup Stock Options - Ben Beltzer explains when you should exercise, how to get paid out, how much you'll make, and how much tax you'll probably have to pay (get advice from your own accountant, don't rely on a blog post).
  • What I Wish I'd Known About Equity Before Joining A Unicorn - This is an excellent (though USA-centric) summary of how to value stock options and what the tax implications are and how to minimize potential tax. I heartily recommend reading it before you accept any offers involving stock or stock options as part of your compensation.

License

This repository is copyright 2017-2021 Joseph Block under an Attribution-NonCommercial-ShareAlike 4.0 International license.
