2025-11-18

Can I replace Ansible with PyInfra?
================================================================================

I'm a long-time Ansible user and I have to say that it is the easiest configuration management tool to start with. I've worked with more sophisticated solutions like Puppet and Kubernetes operators, but Ansible just gets things done without abusing the budget or forcing me to spend months teaching my team a new approach. Having said that, I don't agree with some of its design choices. I've been able to safely ignore them for many years, but they are still cumbersome and limiting in everyday use.

Ansible is a tool, not a library
--------------------------------------------------------------------------------

At the beginning it was easier this way. Not everyone thinks like a programmer, and YAML is fairly readable even to old-school systems administrators not interested in coding at all. I even recall that Ansible had some Python API, but calling it directly is no longer supported. This means that you *must* organize your code into playbooks and call `ansible-playbook` to use it. There's a plethora of options for extending the core logic with plugins, like using a callback plugin for extra reporting or what `Ara `_ did to make Ansible more auditable. You can also render YAML files automatically using some meta script, but this way you add yet another layer which needs to be read and understood by the user to debug the code. I wouldn't whine about it if Ansible were written in some compiled language, making YAML the fastest way to get your configuration into it, but that's not the case. Ansible is written in Python. Most of its modules are written in Python. YAML playbooks serve as a great starting point, but become more limiting the more advanced your setup becomes.
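The "meta script" approach can be sketched in a few lines. This is a purely illustrative example (the module name `ansible.builtin.user` is real, everything else is made up): generate the playbook programmatically and dump it as JSON, which is itself valid YAML, so `ansible-playbook` will accept it.

```python
# Hypothetical "meta script" that generates a playbook instead of
# hand-writing YAML. JSON is a subset of YAML, so the output can be fed
# straight to ansible-playbook - at the cost of yet another layer that a
# reader has to understand while debugging.
import json

users = ["alice", "bob"]

tasks = [
    {
        "name": f"Create user {user}",
        "ansible.builtin.user": {"name": user, "state": "present"},
    }
    for user in users
]

playbook = [{"hosts": "all", "become": True, "tasks": tasks}]
print(json.dumps(playbook, indent=2))
```
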
Ansible is very clunky as a scripting language
--------------------------------------------------------------------------------

I know it's not really designed to be a scripting language, but it feels like one. YAML creates an illusion of declarativeness, and I often catch myself explaining playbooks in terms of changes to resources, but Ansible is imperative. Modules are very well designed and often give the impression that you can simply express the state of files, services etc. using attributes. That's not all. You still need to order tasks properly, handle variables, check mode, tagging and so on. And Ansible allows you to do that. You can make tasks conditional, you can write loops and error handling blocks. It feels nice that despite using YAML you can still handle all these corner cases where the flow needs to change. The handler tasks pattern also helps a lot in expressing non-declarative actions. Yet I regularly see people trying to express arbitrary algorithms using these modest flow control features and ending up with enormous playbooks that use `set_fact`/`register` the way one would assign variables in any other programming language. There's also the distinction between static `import_*` and dynamic `include_*`, which doesn't make it any easier. One way to tackle this issue would be to ask colleagues politely not to abuse Ansible's flexibility, but then again splitting this logic into dedicated Jinja2 filters or dedicated modules often feels like overengineering and forces the user to jump between files.

Variable templating is no fun
--------------------------------------------------------------------------------

I'm used to web development, where one can often render arbitrary (even invalid) HTML content using versatile templating languages. Jinja2 also fits into this approach, but that's not how it's used in Ansible. You can't render tasks however you like. Each playbook must be a valid YAML document. Jinja2 can be used only for values inside this YAML structure, e.g.
you can't render tasks in a loop using just Jinja2. Yet Jinja2 allows you to embed fairly complex logic inside YAML values, and it's very often used to parametrize tasks. So on one hand you have extremely readable YAML, but on the other hand it's sprinkled with not-so-readable one-line templates all the way through. This solution is also fragile on its own. I recall times when Ansible didn't render some nested variable or crashed upon discovering a recursive variable definition in the middle of the playbook. I often think of configuration in a scatter-gather pattern, e.g. Nginx is configured in one place but virtual host definitions can be scattered across many roles. Or an SSH role can be configured with some default list of users allowed to log in, but some group may add a few more users, and I don't want to copy default values around to mimic configuration merging. This was historically achievable with `varnames` and `vars` lookups and recently got consolidated into the `community.general.merge_variables` plugin, but I still see puzzled expressions when I explain how it works, why variables must be prefixed, and why you have to pay attention to keep their names unique. I get it - this may not be your everyday Ansible use-case, but in the long run such patterns make scale manageable, and Ansible doesn't really excel in this area.

Tasks execution overhead makes development cycles unbearably long
--------------------------------------------------------------------------------

Ansible works by uploading and then running independent scripts. You can also run actions locally with plugins, but the core idea behind modules is that you upload some code or binary, run it, and then present the result in a concise way to the end user. This is probably Ansible's strongest advantage over other solutions. The entry barrier for writing modules is minimal, and issues like memory leaks or unclosed descriptors don't really exist, because each module is spawned separately.
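The upload-and-run model can be sketched locally: write a self-contained "module" to disk, spawn it in a fresh interpreter, and read back a JSON result. This is a toy illustration, not Ansible's actual plumbing - the real thing ships the script over SSH and wraps it in far more machinery.

```python
import json
import subprocess
import sys
import tempfile

# A self-contained "module": does its work and reports a JSON result.
MODULE = '''
import json
print(json.dumps({"changed": False, "msg": "nothing to do"}))
'''

# Write the module to a temporary file - in real Ansible this file would
# be uploaded to the remote host first.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(MODULE)
    module_path = f.name

# Spawn a fresh interpreter per module run: no leaked memory or file
# descriptors between tasks, but also a full interpreter startup for
# every single task.
proc = subprocess.run(
    [sys.executable, module_path],
    capture_output=True, text=True, check=True,
)
result = json.loads(proc.stdout)
print(result["msg"])
```

The isolation is what keeps badly-written modules from poisoning the whole run - and also what makes every task pay the interpreter startup cost described below.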
Module quality varies, but as long as you stick to popular collections you can expect that configuration will be applied in a secure manner without exposing any credentials or hogging server resources. The issue is that this model doesn't really care about performance measured as time to completion. Most modules are written in Python, and the Python interpreter has a fairly long startup time compared to any given shell or single-purpose binaries like `curl`. As a result you can quickly hit long execution times even with zero-change runs. And I don't mean runs a few minutes long, like it's often the case in programming projects. You will often hear about executions taking 30 minutes. That's quite a problem, because during execution Ansible stores some information in memory, like which handlers need to be launched after introducing changes. This information gets lost if the playbook gets interrupted. `Mitogen `_ fixes this issue partially, but breaks some compatibility, and as far as I know there are no plans to merge it into Ansible anytime soon.

That's where PyInfra comes in
--------------------------------------------------------------------------------

After reading the above you may think that I hate Ansible, but that's not true. I believe it has proven the flexibility of a task-based workflow in configuration management and restored faith in splitting complex setups into smaller roles and then into standalone scripts which can be tested independently from infrastructure. Yet what I often need is not a whole tool, but a simple library which I can freely hack on and use to build specific abstractions based on external requirements. Before I delve into `PyInfra `_ you need to understand that for its size and development history it's a pretty decent tool. A lot of comparisons may feel unfair, since on the other side I have an enterprise-grade project used daily by hundreds of companies.
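For orientation before the comparison: a PyInfra deploy file is plain Python calling operations in order. This is a sketch based on the documented `pyinfra.operations` API (the template path and names are made up, and as discussed below, the ordering is entirely your responsibility):

```python
# deploy.py - a sketch; run with something like: pyinfra inventory.py deploy.py
from pyinfra.operations import apt, files, systemd

# Order matters: PyInfra won't figure out on its own that Nginx must be
# installed before its virtual hosts are configured.
apt.packages(
    name="Install nginx",
    packages=["nginx"],
    update=True,
    _sudo=True,
)

files.template(
    name="Deploy example vhost",  # hypothetical template and destination
    src="templates/vhost.conf.j2",
    dest="/etc/nginx/sites-enabled/example.conf",
    _sudo=True,
)

systemd.service(
    name="Ensure nginx is running",
    service="nginx",
    running=True,
    enabled=True,
    _sudo=True,
)
```

This fragment only describes a deploy and needs a real inventory and host to execute, so treat it as a configuration sketch rather than a runnable script.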
My goal is to replace Ansible in my workflows, and I have specific requirements which play a crucial role for me.

Main differences
--------------------------------------------------------------------------------

The first time I read about PyInfra it reminded me more of Puppet than of Ansible, due to staged runs where:

- Facts are gathered
- Deployment scripts are executed to build a plan of actions
- Changes are applied

It may even sound a bit like Terraform to you, but unlike Puppet and Terraform, PyInfra won't automatically order operations for you. You still need to make sure that Nginx is installed before virtual hosts are configured, etc. Also unlike Terraform, you can't easily pass output from one operation as input to another operation. In HCL this is handled automatically when you reference another resource's output in attributes, and in Pulumi it's handled using promises, but in PyInfra you would need to use the immediate Python execution operation, which leaves you without any idempotency guarantee.

Advantages
--------------------------------------------------------------------------------

It's just code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes - you can use your favourite formatters, linters and language servers. When using the PyInfra CLI, the host context is automatically provided to operations, so you don't need to carry around any `Host` object or anything like that - you can just import it if you really need it. I won't advertise it any further. Practically all the things you can do in Python can also be done in PyInfra. This brings the joy of programming back into infrastructure management.

Performance is decent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PyInfra doesn't upload full-blown scripts to servers. Instead it tries to run one-off shell commands whenever possible. Some people say that Ansible is not agentless, because it requires Python and Pip modules.
Well, by that logic PyInfra is certainly closer to being agentless, because it relies only on shell utilities. I didn't run any microbenchmarks, but during short tests PyInfra seemed to run much faster than stock Ansible. In the past I did some Python vs shell utilities benchmarks and discovered that a Bash + curl combo can often finish the job before the Python interpreter even starts, so it seems to me that PyInfra should allow much faster development cycles. Note that the default SSH connector uses Paramiko. I recall from Ansible that it offered worse performance than the native OpenSSH client, but the usage pattern is a bit different here. Ansible calls SSH less often but uploads bigger chunks of code, while PyInfra makes more calls but uploads smaller one-liners - or at least that's what I think it does after reviewing the implementations of some operations.

You can organize code into reusable units
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sure, you can use plain Python modules, but on top of that PyInfra introduces the concept of `deploys`, which loosely maps to Ansible roles. Forget about tool-specific repositories like Galaxy. PyInfra deploys can be packaged like regular Python modules and published on PyPI.

Optional real-time output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now this is a masterpiece, which acts a bit like a multiplexed SSH client. Not great for batch runs, but it gives you a lot of context when developing code. You don't just stare blankly at the screen when an operation gets stuck - you can see exactly at which point it got stuck. You can also get all the details on screen before a failure happens. Ansible does a decent job here, but some logic is hidden in Python and presented just as a status report from the task. That has its advantages, but it's a less direct approach to debugging.

Cons
--------------------------------------------------------------------------------

It's just simple commands, right?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Well... no, or at least not the kind of simple commands that you're used to when working with a shell interactively. You can run PyInfra with `-vvv` to debug commands, e.g. here is what the fact for gathering file details looks like::

    [@ssh/host] >>> sudo -H -n sh -c '! (test -e /etc/apt/keyrings || test -L /etc/apt/keyrings ) || ( stat -c '"'"'user=%U group=%G mode=%A atime=%X mtime=%Y ctime=%Z size=%s %N'"'"' /etc/apt/keyrings 2> /dev/null || stat -f '"'"'user=%Su group=%Sg mode=%Sp atime=%a mtime=%m ctime=%c size=%z %N%SY'"'"' /etc/apt/keyrings || ls -ld /etc/apt/keyrings )'
    [@ssh/host] user=root group=root mode=drwxr-xr-x atime=1759727935 mtime=1750784566 ctime=1759727935 size=4096 '/etc/apt/keyrings'

Some simple commands like APT package management also require additional options to make them behave well in batch runs, e.g.::

    [@ssh/host] >>> sudo -H -n sh -c 'DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin'

Ansible's debug output is much more chaotic, but PyInfra's commands are also not exactly simple. The good thing is that you can copy them into an SSH session and test how they really behave, but you may spend some time trying to follow all these shorthand conditionals. Also, I would expect that in more advanced scenarios you still end up with most logic implemented in Python, but at least here you can use traditional debug print statements like in any other script.

Less battle-tested
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I might have just been unlucky, but I was interested in how PyInfra handles temporary files and stumbled upon the `Deb` operation. The `Deb` operation lets you manage Debian packages. If you specify a URL as the source, then the package will be downloaded to a temporary location and acted upon.
All good so far, but I couldn't find any cleanup code. I tested it with the Chrome package, and to my surprise PyInfra left a ~115M artefact in the `/tmp` directory. Not that Ansible doesn't have bugs, but there's bigger pressure on Ansible to solve them and more eyes to report them.

Not all operations are suitable for shell commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Running commands has a big drawback: everyone on the system can see what you're doing. There's a protection called `hidepid`, but I've never seen it turned on by default. Without `hidepid`, things like `passing the MySQL password `_ in arguments become risky. Again, it's just a matter of carefully reading code and reviewing operations before they touch sensitive data, but I feel like the Ansible community solved these kinds of issues long ago by using Python modules as drivers for managing databases. If we still want to use the MySQL CLI, then we can at least put the password in a `configuration file `_.

No builtin way to limit execution scope
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ansible tags are seriously underrated. They allow you to limit changes to just the smallest part of the codebase that you care about. In PyInfra there's no replacement for tags. It's not a problem for small projects, but I would expect to find a lot of different implementations of tags in rapidly scaling projects. After all, these kinds of automations are often developed ad-hoc by humans, and being able to turn off some code with a configuration argument is a much more plausible solution than asking someone to comment out a block before running.

Verdict
--------------------------------------------------------------------------------

To me it feels like PyInfra got things right. It won't be Ansible's replacement for old-school sysadmins or less technical users, but it's a perfect way to engage developers more in infrastructure and to find a common language in the spirit of the DevOps methodology.
I also see it as a better driver for vendoring one-command tools, especially since `uv `_ can automatically install a script's dependencies upon launch based on the script's metadata. It seems like the ecosystem could use some patches, and I'm still learning the details. Some of the above issues may already be patched by the time you read this article.