The goal of this handbook is to give you clarity on DevOps:
- Understand what DevOps is (in simple words)
- Know what’s possible with DevOps (in simple goals)
- Get simple “when-to-do-what” DevOps guidelines
I added a bonus at the bottom of the article.
It's a production-ready setup example you could take inspiration from.
Who this article is for
You might be a founder who wishes to get started with DevOps the right way.
You might be the CTO of a 1,000-employee company who wants simple principles.
Or, maybe you’re a Software Engineer, and you want to understand if your company’s DevOps approach is good.
If you’re looking for a simple DevOps playbook, this is it.
Understand the desired result
Two things your company needs to be able to do
- Serve its product to customers
- Build and improve the product
Abilities you need to build, improve, and serve software
- Run experiments and test changes
DevOps has a simple meaning
Developers and Operators have shared responsibility for building and improving the system.
In practice:
- Developers are responsible for “Operate” tasks
- DevOps Engineers are responsible for enabling “Operate” tasks AND doing some of them themselves
Operate = provision, monitor, secure, configure, deploy, scale.
Choose a balance: Enabler, Doer, or Automator
The DevOps role will end up as a balance between:
- Enabler: Provides the tools and knowledge to fulfill the DevOps goals
- Doer: Does the tasks that fulfill the DevOps goals
- Automator: Automates any repeating operation
Know what things you should enable, do, or automate
- Provision infrastructure
- Secure the system
- Deploy workloads
- Monitor the system
- Recover from issues
- Scale up or down
- Track & test changes
- Automate processes
Choose the right tools
- Has state management = Saves time automating state-aware processes (e.g., Terraform)
- Has a big community & good docs = Saves time dealing with common issues (e.g., Kubernetes)
- Has multiple interface types: API, CLI, UI = Saves time integrating with the existing system (e.g., Vault)
Set useful goals
Adopting the following DevOps goals will point you in the right direction:
- One-Click Environments: makes e2e tests easy and quick
- Atomic Commits: provides confidence that a tested change will work in production
- Separate the Shared & Env-Specific Parts: enables e2e tests as the company scales up
If you want to learn about more useful DevOps goals, feel free to book a free consultation here.
Enablers: Choose the Tools-to-Knowledge Balance
Developers can either have the knowledge or the tools to do something.
- More knowledge-reliance: if you want the developers to contribute to the DevOps efforts
- More tools-reliance: if you want to abstract the operations from the developers
If the balance between the two is not intentional, it’s accidental.
Doers: Have a good reason to do it
- Is it a one-time task?
- Does it teach you how the developers work?
- Are you directly accountable for the results of the task?
If you answered “no” to the above questions, enable or automate it instead.
Doing more = Learning the system's use-cases
Doing too much = Not scalable, too much knowledge-reliance
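The three "Doer" questions can be written down as a tiny check. This is only a sketch of one reading of the rule (do the task yourself if at least one answer is "yes"); the function and parameter names are made up for illustration.

```python
def should_do_it_yourself(one_time_task: bool,
                          teaches_how_devs_work: bool,
                          directly_accountable: bool) -> bool:
    """Return True if at least one 'Doer' question is answered 'yes'.

    Assumption: 'answered no to the above questions' means all three are 'no'.
    """
    return any((one_time_task, teaches_how_devs_work, directly_accountable))


# A recurring task you are not accountable for and that teaches you nothing
# about how developers work: enable or automate it instead.
print(should_do_it_yourself(False, False, False))
```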
Automators: Have a good reason to automate it
- Did it happen before?
- Is it likely to happen again?
- Will automating it take less time than doing it?
- Will automating it teach you an important company process?
If you answered “yes” to at least 2 of the 4 questions, automate it!
More automations = Less reliance on knowledge to operate the system.
Too many automations = No system awareness.
P.S. - you can also enable developers to automate it.
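The automation checklist is a simple threshold rule, so it can be sketched in a few lines. The names below are hypothetical, and the threshold is read here as "at least 2 yes answers", which is an interpretation of the rule above.

```python
def should_automate(happened_before: bool,
                    likely_to_happen_again: bool,
                    automating_is_faster: bool,
                    teaches_company_process: bool,
                    threshold: int = 2) -> bool:
    """Count the 'yes' answers and compare them to the threshold (assumed: >= 2)."""
    yes_count = sum((happened_before, likely_to_happen_again,
                     automating_is_faster, teaches_company_process))
    return yes_count >= threshold


# A task that already happened once and will likely happen again qualifies,
# even if automating it takes longer than doing it by hand this time.
print(should_automate(True, True, False, False))
```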
Create available DevOps Capacity
The DevOps needs of a company have spikes.
One month you need 2 DevOps Engineers, and half of that the next month.
Switchovers between big efforts and small tasks are common.
This is especially true for new companies.
Break the assumption: “DevOps tasks must be done by a DevOps Engineer”.
There are 3 types of DevOps capacity
- Non-Flexible: A full-time DevOps Engineer on the team
- Semi-Flexible: Key developers that can contribute to the DevOps goals
- Fully-Flexible: A flexible DevOps Services company or freelancer
You can read more about calculating the DevOps capacity your company needs here.
When to focus on what: Common Dilemmas
When: You work alone, and the system is simple
Focus: On simplifying development: Dockerize your apps and create a post-commit pipeline that runs tests
When: You need to be able to create new environments quickly (for development, or for clients)
Focus: On implementing “One-Click Environments” using IaC (e.g., Terraform) plus a deployment tool (which one depends on the platform).
When: You want to e2e test every code modification, but there are many code modifications
Focus: On splitting the “One-Click Env” into a “base” with shared resources, and “env” with env-specific resources
When: You want to unify & standardize how you deploy, monitor, scale, configure, and secure your workloads
Focus: On implementing an orchestrator such as Kubernetes
When: You have many moving parts and want to be certain a tested change will work
Focus: On implementing GitOps and considering a Monorepo (the sooner the better)
When: You want the DevOps efforts to be done by the dev team
Focus: On using IaC tools built on “actual” programming languages (e.g., Pulumi with TypeScript or Python) and writing full “how to operate” documentation (see above)
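The GitOps focus mentioned above comes down to one idea: compare what Git declares with what is actually running, and act on the difference. Here is a minimal sketch of that comparison. It assumes state is just a mapping of workload name to version; real tools such as Flux or ArgoCD do far more, and the `reconcile` function and sample data are hypothetical.

```python
def reconcile(desired: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return the actions needed to make `actual` match `desired`.

    `desired` stands in for what is declared in Git, `actual` for what is running.
    """
    actions = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            actions.append(f"create {name}@{version}")
        elif actual[name] != version:
            actions.append(f"update {name}: {actual[name]} -> {version}")
    # Anything running that Git does not describe gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(f"delete {name}")
    return actions


declared_in_git = {"api": "1.2.0", "web": "3.0.1"}
running = {"api": "1.1.0", "worker": "0.9.0"}
for action in reconcile(declared_in_git, running):
    print(action)
```

Running the comparison in a loop (or on every commit) is what keeps the system from drifting away from Git.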
Never:
- Invest lots of time in new tech without a strong reason
Always:
- Have your code in Git
- Monitor the basic stuff: CPU, Memory, Disk, Network, App Logs, Cloud Costs
- Architect for high-availability
- Test before you deploy
BONUS: An example setup for a CTO approaching Production

2 AWS Accounts
- One for development and staging
- Another for production
Monorepo in Github
- Docker-Compose for local development
2 Infrastructure-as-Code projects: 'base' & 'apps'
- base = shared resources (e.g., VPC, RDS, ECS Cluster, EKS Cluster)
- apps = env-specific resources (e.g., Lambda Functions, ECS Services, Kubernetes Namespaces)
- config file per environment
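To make the 'base' / 'apps' split concrete, here is a small model of it. It is a sketch, not IaC code: the class names, resource names, and the `pr-42` environment are all made up. The point it illustrates is that many 'apps' environments can share a single 'base'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    name: str                          # e.g. "development"
    shared_resources: tuple[str, ...]  # created once, reused by every apps env


@dataclass(frozen=True)
class AppsEnv:
    name: str                          # e.g. "pr-42" or "main" (hypothetical names)
    base: Base
    resources: tuple[str, ...]         # env-specific resources

    def resource_ids(self) -> list[str]:
        """Namespace env-specific resources so environments never collide."""
        return [f"{self.base.name}/{self.name}/{r}" for r in self.resources]


dev_base = Base("development", ("vpc", "rds", "ecs-cluster"))
pr_env = AppsEnv("pr-42", dev_base, ("api-service", "worker-lambda"))
main_env = AppsEnv("main", dev_base, ("api-service", "worker-lambda"))

print(pr_env.resource_ids())
print(main_env.resource_ids())
```

Because the expensive shared resources live in 'base', spinning up another 'apps' environment only creates the small, env-specific part.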
Github Actions Workflow: Development workflow
- Check out a branch, then develop and test changes locally
- Create a Pull Request: Deploys a Pull-Request ‘apps’ environment onto the ‘development’ environment’s ‘base’
- On merge to main: Deploys an ‘apps’ environment from the ‘main’ branch onto the ‘development’ environment’s ‘base’
- Manual: Deploys from the ‘main’ branch onto the ‘staging’ / ‘production’ environment’s ‘base’
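The routing in the workflow above can be summarized as a lookup from trigger to deployment target. The sketch below is not a Github Actions workflow; it only models the decision. The trigger names, the `pr-<number>` naming, and the choice to name manual 'apps' environments after their base are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeployTarget:
    apps_env: str   # name of the 'apps' environment to create or update
    base_env: str   # the 'base' (shared resources) it is deployed onto
    source: str     # where the deployed code comes from


def resolve_target(trigger: str,
                   pr_number: Optional[int] = None,
                   manual_env: Optional[str] = None) -> DeployTarget:
    """Map a pipeline trigger to the environment it should deploy."""
    if trigger == "pull_request":
        if pr_number is None:
            raise ValueError("pull_request trigger needs a pr_number")
        return DeployTarget(f"pr-{pr_number}", "development", f"pull request #{pr_number}")
    if trigger == "merge_to_main":
        return DeployTarget("main", "development", "main")
    if trigger == "manual":
        if manual_env not in ("staging", "production"):
            raise ValueError("manual deploys target 'staging' or 'production'")
        return DeployTarget(manual_env, manual_env, "main")
    raise ValueError(f"unknown trigger: {trigger}")


print(resolve_target("pull_request", pr_number=42))
print(resolve_target("manual", manual_env="production"))
```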
Notes:
- Avoid mentioning an environment's name in the code for conditional resource deployment
- Use each environment’s config file to declare if a resource should be created
- Could be implemented using Terraform, Terragrunt, Pulumi, CDK, and other IaC tools
- Production should have 2 instances of every workload for high-availability
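The first two notes are easier to see in code. In the sketch below, the planning function never checks which environment it is running for; it only reads flags and counts from that environment's config. The config keys and resource names are hypothetical, and plain dicts stand in for the per-environment config files.

```python
from typing import Any

# Stand-ins for the parsed per-environment config files (hypothetical keys).
CONFIGS: dict[str, dict[str, Any]] = {
    "development": {
        "instances": 1,
        "resources": {"worker_service": True, "debug_dashboard": True},
    },
    "production": {
        "instances": 2,  # 2 instances of every workload for high-availability
        "resources": {"worker_service": True, "debug_dashboard": False},
    },
}


def plan(config: dict[str, Any]) -> list[str]:
    """List what to create, driven only by config.

    Note what is absent: there is no `if env == "production"` anywhere.
    """
    return [
        f"{name} x{config['instances']}"
        for name, enabled in sorted(config["resources"].items())
        if enabled
    ]


print(plan(CONFIGS["development"]))
print(plan(CONFIGS["production"]))
```

Adding a new environment then means adding a config file, not touching the code.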
If you’d like to see this setup in your startup, click here to book a call 👈🏼
P.S. - I'll be updating this page occasionally, so you might want to visit again
Another Bonus: DevOps Dictionary for Human Beings
| Term | Definition | Tools |
| --- | --- | --- |
| Environment | A working instance of the entire system | |
| CI (Continuous Integration) | Enable developers to collaborate by agreeing on a single source-of-truth (master/main) | Jenkins, Github Actions, GitlabCI |
| CD (Continuous Delivery) | Create an artifact that’s ready for production (tested, tagged) | JFrog Artifactory, Nexus, AWS ECR |
| CD (Continuous Deployment) | Every available deliverable (artifact) gets deployed automatically | ArgoCD, Jenkins, AWS CodeDeploy |
| Monitoring / Observability | Collect metrics/traces/logs from apps and infrastructure, analyze and display them, and set up alerts | Prometheus, Jaeger, Elasticsearch, Fluentd, OpenTelemetry |
| Infrastructure | The resources on which the workloads run, in which the data is stored, and through which the network flows | Servers, Databases, Network Routers & Switches |
| Cloud Infrastructure | Same as the above, but specifically in the cloud | AWS EC2, AWS RDS, GCP Compute Engine, Azure Virtual Machines |
| Cloud | Computing & Data services served from remote locations for you to build your system | AWS, Azure, GCP |
| Containerization & Virtualization | Technologies utilizing Kernel & OS features to create virtual machines, or isolate processes (AKA run containers) | Docker, vSphere, KVM |
| Secrets Management | Storing and retrieving sensitive configurations (e.g., tokens, passwords) | Hashicorp Vault, AWS Secrets Manager, SealedSecrets |
| Configuration Management | Usually refers to preparing servers for workloads (e.g., creating directories & files, starting processes) | Ansible, Chef, Puppet |
| Version Control | Saving the code in a versioned way (Git) | Github, Gitlab |
| GitOps | Making sure the running system matches what’s described in Git | Flux, ArgoCD, Jenkins |
| Monorepo | All of the company’s code is in one Git Repository | NX, Turborepo |
| Polyrepo | Multiple Git repositories for different components | |
| IaC (Infrastructure-as-Code) | Creating Cloud infrastructure with idempotent code and state management | Terraform, Pulumi, CDK, Crossplane |
| Deployment | Execute, serve, or install the artifacts | ArgoCD, Jenkins, AWS CodeDeploy, Scripts (Bash, Python, etc.) |
| Orchestrator | Dynamically allocating workloads to a pool of nodes | Kubernetes, Nomad, AWS ECS |
| Authentication & Authorization | Making sure each person, workload, or resource has access only to what’s necessary (other workloads and resources) | AWS IAM, OpenID, OpenVPN, Twingate, Istio |
| Service Discovery | Exposing available workloads using DNS | Consul, CoreDNS |




