Jakski's blog

Virtual machine as a development environment

The more technologies I work with, the more sophisticated my development environments become. In scripting languages everything usually boils down to having an interpreter in the relevant version. The problem begins when I work with extensions compiled to native processor code or with lower-level projects. Things get really messy when I have to partially provision infrastructure just to test a solution. Of course virtualization is the best fit for this kind of problem, and a lot of solutions exist.

Idea

I was thinking of a few approaches, but none of them felt quite right.

So is there another way? First, let's limit the scope to solutions working with the QEMU + KVM stack. After seeing the tricks people did to make Docker run on Windows and macOS, I gave up hope that there's a universal way to provision development infrastructure on all platforms. If something works "everywhere", it usually means that some company spent a lot of time ensuring compatibility with every popular platform. I don't want any complicated solutions. I just want to be able to spin up a development instance for each of my projects ad hoc on Debian.

I recently had the pleasure of working with cloud-init, which makes it really easy to provision new instances on the fly. But how does cloud-init get the information on how to provision an instance, and can I use it without a cloud service? It turns out there are a lot of methods, and the documentation calls them datasources. In public clouds it's usually assumed that some network service will expose metadata, but there's more. Metadata can also be provided on a CD image attached to the instance. That's great, because it means we can use cloud-init with practically any virtualization platform, including QEMU. Let's try to implement the following flow:

  1. An ISO 9660 image with provisioning information is created. It contains user access keys and any required installation scripts.
  2. QEMU launches the instance with the metadata CD attached. It may also be a good place to use virtiofs for exposing project directories from the host system. An AppArmor profile would be welcome as well.
  3. The developer logs into the machine or just sends SSH commands to recompile and rerun tests.

Generating base image

Fortunately Debian provides official cloud images, which already contain cloud-init. Since we use QEMU, all devices will be provided via virtio, so we can pick the genericcloud variant. cloud-init provides a tool for creating the datasource image, but we can also do it with any ISO manipulation tool:

$> cat > meta-data <<EOF
instance-id: devbox
local-hostname: devbox
EOF
$> cat > user-data <<EOF
#cloud-config
ssh_authorized_keys:
  - $(cat "${HOME}/.ssh/id_rsa.pub" | tr -d "\n")
EOF
$> genisoimage -output seed.iso -volid cidata -joliet -rock user-data meta-data
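
As a side note, the helper alluded to above is, as far as I know, cloud-localds from the cloud-image-utils package; assuming it is installed, the same seed image boils down to a single command:

$> cloud-localds seed.iso user-data meta-data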

Provisioning parameters are split into meta-data, traditionally provided by the hosting platform, and user-data, where we can place our own customizations. In this case we use meta-data to provide minimal identification information.
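
One missing piece is the base disk instance.qcow2 used in the next section. A minimal sketch of preparing it (the download URL and file name are assumptions here; check cloud.debian.org for the release you actually want, or grab the generic variant if you plan to use 9p later) keeps the downloaded image pristine by layering a copy-on-write overlay on top of it:

$> wget -O base.qcow2 "https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-genericcloud-amd64.qcow2"
$> qemu-img create -f qcow2 -b base.qcow2 -F qcow2 instance.qcow2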

Launching instance

cloud-init will begin provisioning upon startup, so we don't have to do anything extra. The only exception is the package_reboot_if_required flag, which (as the name suggests) may additionally restart the instance in some cases.

$> qemu-system-x86_64 \
  -m 256 \
  -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 \
  -drive file=instance.qcow2,if=virtio \
  -drive driver=raw,file=seed.iso,if=virtio

This is far from optimal, since our network card is implemented in user space and burns CPU, but it's good enough for now. Note that we also forward guest port 22 to port 2222 on the host, so we can log in via SSH like this:

$> ssh debian@127.0.0.1 -p 2222

debian is the default user for official cloud images. We could also change the default user or provision a few users using the users key. You can find documentation for all cloud-init modules in the cloud-init documentation.
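
For example, a user-data sketch that replaces the default user and schedules a few packages could look roughly like this (the user name, key placeholder and package list are purely illustrative):

#cloud-config
users:
  - name: dev
    shell: /bin/bash
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    ssh_authorized_keys:
      - ssh-rsa AAAA... dev@host
packages:
  - build-essential
  - git
package_update: true
package_reboot_if_required: true

Keep in mind that once the users key is present, the default debian user is no longer created unless you explicitly include default in the list.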

If you scheduled package installation in your cloud-config, it might be worth checking the provisioning status before starting any work:

$> cloud-init status
status: done
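
If provisioning is still running, cloud-init can also block until it finishes, which is handy in scripts:

$> cloud-init status --wait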

Using a host directory

In theory the above setup would be enough for most things, but we can take it even further. Instead of synchronizing code to the virtual machine or setting up the development environment remotely, we can mount the code directly. QEMU allows us to do it in at least two ways: VirtFS (based on the 9p protocol) and virtio-fs.

I really wanted to do it with virtio-fs, but it requires the root daemon virtiofsd to be running on the hypervisor. It's not a blocker, but for now we want to keep things simple, so let's stick to VirtFS. With VirtFS, using a host directory is just a matter of adding flags to QEMU:

-fsdev local,id=code0,path="$(realpath .)",security_model=mapped-xattr \
-device virtio-9p-pci,fsdev=code0,mount_tag=code

and mounting it inside virtual machine like so:

$> mount -t 9p -o trans=virtio code /mnt

If you're using the genericcloud Debian image, then you will be warned about an unknown filesystem type:

root@devbox:~# mount -t 9p -o trans=virtio code /mnt
mount: /mnt: unknown filesystem type '9p'.

That's because the minimal Debian image ships without any extra drivers besides paravirtualization support. Once you use the generic image, this problem goes away. Automatic mounting can be configured with cloud-init like this:

mounts:
  - [code, /mnt, 9p, trans=virtio, "0", "0"]

Improvements

CPU acceleration

If you tried to recreate this setup, you've probably noticed that the virtual machine literally burns your CPU. Using processor virtualization extensions like VT-x is a quick win here. On Linux they are utilized by KVM. Note that your user must have access to /dev/kvm in order to use this acceleration. Under Debian it's a matter of adding the user to the kvm group. Then you can use the kvm command instead of qemu-system-x86_64 (it's still QEMU underneath) or add the -enable-kvm flag.
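
On Debian that boils down to something like this (substitute your own user name; log in again afterwards so the group change takes effect):

$> adduser jakski kvm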

Virtual network interface instead of user networking

User networking is quick and easy, but it also involves emulating a network stack on top of the hypervisor's network stack. We can improve things by using TAP interfaces connected to a bridge on the hypervisor. There are countless ways to configure bridge interfaces, so let's use Debian's ifup scripts together with nftables to implement packet masquerading. In order to configure bridges in /etc/network/interfaces you will need an extra package:

$> apt install bridge-utils

In /etc/network/interfaces.d/devbox:

auto br0
iface br0 inet static
  bridge_ports none
  bridge_stp off
  address 192.168.2.1
  network 192.168.2.0
  netmask 255.255.255.0
  broadcast 192.168.2.255
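
After saving the file, the bridge can be brought up immediately with:

$> ifup br0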

We will use static address configuration, because DHCP will be provided by dnsmasq and we want it to bind only to managed interfaces (bind-interfaces) to prevent it from accidentally interfering with other system services.

$> apt install dnsmasq

Normally dnsmasq works great out of the box, but in this case we want to explicitly limit its job to a single interface. Configuration is stored in /etc/dnsmasq.conf on Debian:

interface=br0
except-interface=lo
listen-address=192.168.2.1
bind-interfaces
dhcp-range=192.168.2.0,static,1m
dhcp-host=devbox,192.168.2.2
dhcp-option=br0,option:router,192.168.2.1
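
After editing the configuration, restart dnsmasq so the changes take effect:

$> systemctl restart dnsmasq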

Up to this point we have effectively enabled communication between the virtual machine and the host, but we also want to share internet access. Linux requires changing a kernel option before it starts forwarding packets:

$> sysctl -w net.ipv4.ip_forward=1

Caution: Turning on packet forwarding might pose security risks if you don't trust your network. It's best to enable it only after you configure basic firewall rules.
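
To make the setting survive reboots, you can also drop it into a sysctl configuration file (the file name below is just an example):

$> echo "net.ipv4.ip_forward=1" > /etc/sysctl.d/99-devbox.conf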

The last thing to do is NAT, so that packets sent by the virtual machine leave the host with a proper source address for the outside network. This can be achieved in multiple ways, but I will stick to nftables:

$> apt install nftables

In /etc/nftables.conf:

#!/usr/sbin/nft -f

flush ruleset

table inet nat {
  chain postrouting {
    type nat hook postrouting priority 0
    iifname br0 masquerade
  }
}

table inet filter {
  chain input {
    type filter hook input priority 0; policy drop
    meta iif lo accept
    meta iif br0 accept
    ct state established,related accept
    ct state invalid drop
  }
  chain forward {
    type filter hook forward priority 0; policy drop
    ct state related,established accept
    iifname br0 accept
  }
  chain output {
    type filter hook output priority 0;
  }
}

Note that the above rules allow the virtual machine to access every network that your host can, including home/office internal networks.
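
On Debian the nftables service simply loads /etc/nftables.conf, so the rules can be applied now and on every boot with:

$> systemctl enable --now nftables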

Since QEMU uses a separate bridge helper tool for interface manipulation, we will need to set up privileges appropriately:

$> setcap cap_net_admin+ep /usr/lib/qemu/qemu-bridge-helper
$> cd /etc/qemu/
$> cat bridge.conf
include /etc/qemu/jakski-bridge.conf
$> cat jakski-bridge.conf
allow br0
$> ls -alrth
total 24K
-rw-r-----   1 root jakski   10 Aug 29 17:51 jakski-bridge.conf
-rw-r--r--   1 root root     37 Aug 29 17:51 bridge.conf
drwxr-xr-x   2 root root   4.0K Aug 29 17:51 .
drwxr-xr-x 154 root root    12K Aug 30 04:47 ..

Note how file permissions are set on the helper's ACL files. Mode and ownership are crucial to ensure that only the selected user can attach interfaces to our bridge.

Now we're ready to launch the virtual machine with an optimal network setup:

kvm \
  -m 256 \
  -nic bridge,br=br0,model=virtio-net-pci \
  -drive file=instance.qcow2,if=virtio \
  -drive driver=raw,file=seed.iso,if=virtio \
  -fsdev local,id=code0,path="$(realpath .)",security_model=mapped-xattr \
  -device virtio-9p-pci,fsdev=code0,mount_tag=code

Protecting local networks

The original firewall rules allow the virtual machine to mess with private networks like VPNs. You may consider hardening the firewall further by adding:

define PRIVATE_IPV4 = {
  10.0.0.0/8,
  172.16.0.0/12,
  192.168.0.0/16,
  169.254.0.0/16
}
define PRIVATE_IPV6 = {
  fc00::/7,
  fe80::/10
}
table inet filter {
  chain br0_prerouting {
    type filter hook prerouting priority 0
    iifname != br0 return
    fib daddr . iif type local return
    ip daddr $PRIVATE_IPV4 drop
    ip6 daddr $PRIVATE_IPV6 drop
  }

  ...other rules...
}

virtio-fs

TODO