2025-11-18

Can I replace Ansible with PyInfra?
================================================================================

I'm a long-time Ansible user and I have to say that it is the easiest configuration management tool to start with. I've worked with more sophisticated solutions like Puppet and Kubernetes operators, but Ansible just gets things done without abusing the budget or forcing me to spend months teaching my team a new approach. Having said that, I don't agree with some of its design choices. I've been able to safely ignore them for many years, but they are still cumbersome and limiting in everyday use.

Ansible is a tool, not a library
--------------------------------------------------------------------------------

At the beginning it was easier this way. Not everyone thinks like a programmer, and YAML is fairly readable even to old-school systems administrators not interested in coding at all. I even recall that Ansible had some Python API, but calling it directly is no longer supported. This means that you *must* organize your code into playbooks and call `ansible-playbook` to use it. There's a plethora of options for extending the core logic with plugins, like using a callback plugin for extra reporting or what `Ara `_ did to make Ansible more auditable. You can also render YAML files automatically using some meta script, but this way you add yet another layer which needs to be read and understood by the user to debug the code. I wouldn't whine about it if Ansible were written in some compiled language, making YAML the fastest way to get your configuration into it, but that's not the case. Ansible is written in Python. Most of its modules are written in Python. YAML playbooks serve as a great starting point, but become more limiting the more advanced your setup becomes.
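The "meta script" approach can be sketched in a few lines. This is a purely illustrative example (the module name `ansible.builtin.user` is real, everything else is made up): generate the playbook programmatically and dump it as JSON, which is itself valid YAML, so `ansible-playbook` will accept it.

```python
# Hypothetical "meta script" that generates a playbook instead of
# hand-writing YAML. JSON is a subset of YAML, so the output can be fed
# straight to ansible-playbook - at the cost of yet another layer that a
# reader has to understand while debugging.
import json

users = ["alice", "bob"]

tasks = [
    {
        "name": f"Create user {user}",
        "ansible.builtin.user": {"name": user, "state": "present"},
    }
    for user in users
]

playbook = [{"hosts": "all", "become": True, "tasks": tasks}]
print(json.dumps(playbook, indent=2))
```
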
Ansible is very clunky as a scripting language
--------------------------------------------------------------------------------

I know it's not really designed to be a scripting language, but it feels like one. YAML creates an illusion of declarativeness, and I often catch myself explaining playbooks in terms of changes to resources, but Ansible is imperative. Modules are very well designed and often give the impression that you can simply express the state of files, services etc. using attributes. That's not all. You still need to order tasks properly, handle variables, check mode, tagging and so on. And Ansible allows you to do that. You can make tasks conditional, you can write loops and error handling blocks. It feels nice that despite using YAML you can still handle all these corner cases where the flow needs to change. The handler tasks pattern also helps a lot in expressing non-declarative actions. Yet I regularly see people trying to express arbitrary algorithms using these modest flow control features and ending up with enormous playbooks that use `set_fact`/`register` the way one would assign variables in any other programming language. There's also the distinction between static `import_*` and dynamic `include_*`, which doesn't make it any easier. One way to tackle this issue would be to ask colleagues politely not to abuse Ansible's flexibility, but then again splitting this logic into dedicated Jinja2 filters or dedicated modules often feels like overengineering and forces the user to jump between files.

Variable templating is no fun
--------------------------------------------------------------------------------

I'm used to web development, where one can often render arbitrary (even invalid) HTML content using versatile templating languages. Jinja2 also fits into this approach, but that's not how it's used in Ansible. You can't render tasks however you like. Each playbook must be a valid YAML document. Jinja2 can be used only for values inside this YAML structure, e.g.
you can't render tasks in a loop using just Jinja2. Yet Jinja2 allows you to embed fairly complex logic inside YAML values, and it's very often used to parametrize tasks. So on one hand you have extremely readable YAML, but on the other hand it's sprinkled with not-so-readable one-line templates all the way through. This solution is also fragile on its own. I recall times when Ansible didn't render some nested variable or crashed upon discovering a recursive variable definition in the middle of the playbook. I often think of configuration in a scatter-gather pattern, e.g. Nginx is configured in one place but virtual host definitions can be scattered across many roles. Or an SSH role can be configured with some default list of users allowed to log in, but some group may add a few more users, and I don't want to copy default values around to mimic configuration merging. This was historically achievable with `varnames` and `vars` lookups and recently got consolidated into the `community.general.merge_variables` plugin, but I still see puzzled expressions when I explain how it works, why variables must be prefixed, and why you have to pay attention to keep their names unique. I get it - this may not be your everyday Ansible use-case, but in the long run such patterns make scale manageable, and Ansible doesn't really excel in this area.

Tasks execution overhead makes development cycles unbearably long
--------------------------------------------------------------------------------

Ansible works by uploading and then running independent scripts. You can also run actions locally with plugins, but the core idea behind modules is that you upload some code or binary, run it, and then present the result in a concise way to the end user. This is probably Ansible's strongest advantage over other solutions. The entry barrier for writing modules is minimal, and issues like memory leaks or unclosed descriptors don't really exist, because each module is spawned separately.
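The upload-and-run model can be sketched locally: write a self-contained "module" to disk, spawn it in a fresh interpreter, and read back a JSON result. This is a toy illustration, not Ansible's actual plumbing - the real thing ships the script over SSH and wraps it in far more machinery.

```python
import json
import subprocess
import sys
import tempfile

# A self-contained "module": does its work and reports a JSON result.
MODULE = '''
import json
print(json.dumps({"changed": False, "msg": "nothing to do"}))
'''

# Write the module to a temporary file - in real Ansible this file would
# be uploaded to the remote host first.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(MODULE)
    module_path = f.name

# Spawn a fresh interpreter per module run: no leaked memory or file
# descriptors between tasks, but also a full interpreter startup for
# every single task.
proc = subprocess.run(
    [sys.executable, module_path],
    capture_output=True, text=True, check=True,
)
result = json.loads(proc.stdout)
print(result["msg"])
```

The isolation is what keeps badly-written modules from poisoning the whole run - and also what makes every task pay the interpreter startup cost described below.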
Module quality varies, but as long as you stick to popular collections you can expect that configuration will be applied in a secure manner without exposing any credentials or hogging server resources. The issue is that this model doesn't really care about performance measured as time to completion. Most modules are written in Python, and the Python interpreter has a fairly long startup time compared to any given shell or single-purpose binaries like `curl`. As a result you can quickly hit long execution times even with zero-change runs. And I don't mean runs a few minutes long, like it's often the case in programming projects. You will often hear about executions taking 30 minutes. That's quite a problem, because during execution Ansible stores some information in memory, like which handlers need to be launched after introducing changes. This information gets lost if the playbook gets interrupted. `Mitogen `_ fixes this issue partially, but breaks some compatibility, and as far as I know there are no plans to merge it into Ansible anytime soon.

That's where PyInfra comes in
--------------------------------------------------------------------------------

After reading the above you may think that I hate Ansible, but that's not true. I believe it has proven the flexibility of a task-based workflow in configuration management and restored faith in splitting complex setups into smaller roles and then into standalone scripts which can be tested independently from infrastructure. Yet what I often need is not a whole tool, but a simple library which I can freely hack on and use to build specific abstractions based on external requirements. Before I delve into `PyInfra `_ you need to understand that for its size and development history it's a pretty decent tool. A lot of comparisons may feel unfair, since on the other side I have an enterprise-grade project used daily by hundreds of companies.
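For orientation before the comparison: a PyInfra deploy file is plain Python calling operations in order. This is a sketch based on the documented `pyinfra.operations` API (the template path and names are made up, and as discussed below, the ordering is entirely your responsibility):

```python
# deploy.py - a sketch; run with something like: pyinfra inventory.py deploy.py
from pyinfra.operations import apt, files, systemd

# Order matters: PyInfra won't figure out on its own that Nginx must be
# installed before its virtual hosts are configured.
apt.packages(
    name="Install nginx",
    packages=["nginx"],
    update=True,
    _sudo=True,
)

files.template(
    name="Deploy example vhost",  # hypothetical template and destination
    src="templates/vhost.conf.j2",
    dest="/etc/nginx/sites-enabled/example.conf",
    _sudo=True,
)

systemd.service(
    name="Ensure nginx is running",
    service="nginx",
    running=True,
    enabled=True,
    _sudo=True,
)
```

This fragment only describes a deploy and needs a real inventory and host to execute, so treat it as a configuration sketch rather than a runnable script.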
My goal is to replace Ansible in my workflows, and I have specific requirements which play a crucial role for me.

Main differences
--------------------------------------------------------------------------------

The first time I read about PyInfra it reminded me more of Puppet than of Ansible, due to staged runs where:

- Facts are gathered
- Deployment scripts are executed to build a plan of actions
- Changes are applied

It may even sound a bit like Terraform to you, but unlike Puppet and Terraform, PyInfra won't automatically order operations for you. You still need to make sure that Nginx is installed before virtual hosts are configured, etc. Also unlike Terraform, you can't easily pass output from one operation as input to another operation. In HCL this is handled automatically when you reference another resource's output in attributes, and in Pulumi it's handled using promises, but in PyInfra you would need to use the immediate Python execution operation, which leaves you without any idempotency guarantee.

Advantages
--------------------------------------------------------------------------------

It's just code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes - you can use your favourite formatters, linters and language servers. When using the PyInfra CLI, the host context is automatically provided to operations, so you don't need to carry around any `Host` object or anything like that - you can just import it if you really need it. I won't advertise it any further. Practically all the things you can do in Python can also be done in PyInfra. This brings the joy of programming back into infrastructure management.

Performance is decent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PyInfra doesn't upload full-blown scripts to servers. Instead it tries to run one-off shell commands whenever possible. Some people say that Ansible is not agentless, because it requires Python and Pip modules.
Well, by that logic PyInfra is certainly closer to being agentless, because it relies only on shell utilities. I didn't run any microbenchmarks, but during short tests PyInfra seemed to run much faster than stock Ansible. In the past I did some Python vs shell utilities benchmarks and discovered that a Bash + curl combo can often finish the job before the Python interpreter even starts, so it seems to me that PyInfra should allow much faster development cycles. Note that the default SSH connector uses Paramiko. I recall from Ansible that it offered worse performance than the native OpenSSH client, but the usage pattern is a bit different here. Ansible calls SSH less often but uploads bigger chunks of code, while PyInfra makes more calls but uploads smaller one-liners - or at least that's what I think it does after reviewing the implementations of some operations.

You can organize code into reusable units
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sure, you can use plain Python modules, but on top of that PyInfra introduces the concept of `deploys`, which loosely maps to Ansible roles. Forget about tool-specific repositories like Galaxy. PyInfra deploys can be packaged like regular Python modules and published on PyPI.

Optional real-time output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now this is a masterpiece, which acts a bit like a multiplexed SSH client. Not great for batch runs, but it gives you a lot of context when developing code. You don't just stare blankly at the screen when an operation gets stuck - you can see exactly at which point it got stuck. You can also get all the details on screen before a failure happens. Ansible does a decent job here, but some logic is hidden in Python and presented just as a status report from the task. That has its advantages, but it's a less direct approach to debugging.

Cons
--------------------------------------------------------------------------------

It's just simple commands, right?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Well... no, or at least not the kind of simple commands that you're used to when working with a shell interactively. You can run PyInfra with `-vvv` to debug commands, e.g. here is what the fact for gathering file details looks like::

    [@ssh/host] >>> sudo -H -n sh -c '! (test -e /etc/apt/keyrings || test -L /etc/apt/keyrings ) || ( stat -c '"'"'user=%U group=%G mode=%A atime=%X mtime=%Y ctime=%Z size=%s %N'"'"' /etc/apt/keyrings 2> /dev/null || stat -f '"'"'user=%Su group=%Sg mode=%Sp atime=%a mtime=%m ctime=%c size=%z %N%SY'"'"' /etc/apt/keyrings || ls -ld /etc/apt/keyrings )'
    [@ssh/host] user=root group=root mode=drwxr-xr-x atime=1759727935 mtime=1750784566 ctime=1759727935 size=4096 '/etc/apt/keyrings'

Some simple commands like APT package management also require additional options to make them behave well in batch runs, e.g.::

    [@ssh/host] >>> sudo -H -n sh -c 'DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin'

Ansible's debug output is much more chaotic, but PyInfra's commands are also not exactly simple. The good thing is that you can copy them into an SSH session and test how they really behave, but you may spend some time trying to follow all these shorthand conditionals. Also, I would expect that in more advanced scenarios you still end up with most logic implemented in Python, but at least here you can use traditional debug print statements like in any other script.

Less battle-tested
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I might have just been unlucky, but I was interested in how PyInfra handles temporary files and stumbled upon the `Deb` operation. The `Deb` operation lets you manage Debian packages. If you specify a URL as the source, then the package will be downloaded to a temporary location and acted upon.
All good so far, but I couldn't find any cleanup code. I tested it with the Chrome package, and to my surprise PyInfra left a ~115M artefact in the `/tmp` directory. Not that Ansible doesn't have bugs, but there's bigger pressure on Ansible to solve them and more eyes to report them.

Not all operations are suitable for shell commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Running commands has a big drawback: everyone on the system can see what you're doing. There's a protection called `hidepid`, but I've never seen it turned on by default. Without `hidepid`, things like `passing the MySQL password `_ in arguments become risky. Again, it's just a matter of carefully reading code and reviewing operations before they touch sensitive data, but I feel like the Ansible community solved these kinds of issues long ago by using Python modules as drivers for managing databases. If we still want to use the MySQL CLI, then we can at least put the password in a `configuration file `_.

No builtin way to limit execution scope
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ansible tags are seriously underrated. They allow you to limit changes to just the smallest part of the codebase that you care about. In PyInfra there's no replacement for tags. It's not a problem for small projects, but I would expect to find a lot of different implementations of tags in rapidly scaling projects. After all, these kinds of automations are often developed ad-hoc by humans, and being able to turn off some code with a configuration argument is a much more plausible solution than asking someone to comment out a block before running.

Verdict
--------------------------------------------------------------------------------

To me it feels like PyInfra got things right. It won't be Ansible's replacement for old-school sysadmins or less technical users, but it's a perfect way to engage developers more in infrastructure and to find a common language in the spirit of the DevOps methodology.
I also see it as a better driver for vendoring one-command tools, especially since `uv `_ can automatically install a script's dependencies upon launch based on the script's metadata. It seems like the ecosystem could use some patches, and I'm still learning the details. Some of the above issues may already be patched by the time you read this article.