Libvirt Live Migrations Without Shared Storage

Contrary to popular belief, you don’t need shared storage for libvirt live migrations. They work just fine with both LVM-backed and qcow2-backed VMs.

Let’s bootstrap libvirt on a fresh Debian 10 (buster) machine, so we are on the same page:

# First set up ssh key based authentication for your user. This is left as an exercise to the reader.

# Install virsh, libvirtd and qemu
sudo apt install libvirt-clients libvirt-daemon-system qemu-kvm

# Allow your user to interact with the libvirt 'system' URI without further polkit authentication.
sudo usermod -aG libvirt "$USER"
newgrp libvirt
# Default to using the 'system' URI for your user.
mkdir -p ~/.config/libvirt/
echo 'uri_default = "qemu:///system"' > ~/.config/libvirt/libvirt.conf
# Check out the libvirt FAQs question on "What is the difference between qemu:///system and qemu:///session?" for more information.

# (Auto)start the default libvirt network. It already ships with libvirt on buster.
virsh net-autostart default
virsh net-start default

# Create and (auto)start a default libvirt storage pool.
virsh pool-define-as --name default --type dir --target /var/lib/libvirt/images
virsh pool-autostart default
virsh pool-start default

Do the same on a second machine. For the sake of this demo both machines need to be connected to the same Ethernet.

Now create a virtual machine on the first hypervisor. One simple way would be to use virt-manager, but it doesn’t really matter. If you use virt-manager you can pretty much use all the default settings, except for networking. You need to change the network selection from NAT to macvtap + Bridge. Note that neither NAT nor macvtap + Bridge is very suitable for production systems though. 💁‍♀️

Now we should be able to migrate the running virtual machine from the first hypervisor to the second one! Run this from your local machine:

LIBVIRT_DEFAULT_URI="qemu+ssh://hv1/system?socket=/var/run/libvirt/libvirt-sock" virsh migrate --live --persistent --undefinesource --copy-storage-all --verbose vm1 'qemu+ssh://hv2/system?socket=/var/run/libvirt/libvirt-sock'

Replace hv1, hv2 and vm1 accordingly. I’m running this from my Mac, which is why I need to define the socket locations via ?socket=/var/run/libvirt/libvirt-sock. My local virsh expects the libvirt-sock to be in a different location.
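If you would rather script the migration than type it out, the same invocation can be wrapped in a few lines of Python. This is only a sketch: the names libvirt_uri, build_migrate_command and migrate are made up for illustration, and the code does nothing more than assemble the virsh call shown above.

```python
import os
import subprocess

SOCKET = "/var/run/libvirt/libvirt-sock"


def libvirt_uri(host, socket=SOCKET):
    """Build a qemu+ssh URI that pins the remote socket path."""
    return f"qemu+ssh://{host}/system?socket={socket}"


def build_migrate_command(vm, source, destination):
    """Return (argv, env) for the virsh call from the post.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    env = dict(os.environ, LIBVIRT_DEFAULT_URI=libvirt_uri(source))
    argv = [
        "virsh", "migrate",
        "--live", "--persistent", "--undefinesource",
        "--copy-storage-all", "--verbose",
        vm, libvirt_uri(destination),
    ]
    return argv, env


def migrate(vm, source, destination):
    """Run the migration; raises CalledProcessError if virsh fails."""
    argv, env = build_migrate_command(vm, source, destination)
    subprocess.run(argv, env=env, check=True)
```

Calling migrate("vm1", "hv1", "hv2") would then correspond to the command line above.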

In the past you _had_ to create the VM’s disks on the destination hypervisor before starting the migration. This was fixed quite some time ago. Unfortunately I still sometimes run into permission issues like error: Cannot access storage file '/var/lib/libvirt/images/vm1.qcow2' (as uid:64055, gid:64055): No such file or directory when starting the migration, and I’m not sure why. In that case you can pre-create the disks on the destination hypervisor via something like virsh vol-create-as default vm1.qcow2 16108814336 --format qcow2. Note that this also means libvirt will simply overwrite any disk image on the target hypervisor that has the same name as one of the disks of the VM you are migrating.

Once the migration is done, remember to manually remove the VM’s disks from the source hypervisor via virsh vol-delete vm1.qcow2 --pool default.

🥳 🎉.

SSH Key Based API Authentication

Update: It’s now possible (and advisable) to sign arbitrary data with ssh-keygen directly. This comes with some benefits over the approach in this blog post. One major win is that ssh-keygen mitigates the risk of cross-protocol attacks, which is not discussed in this post.

If you like running with scissors, carry on.

Last year I, once again, attended the excellent Configuration Management Camp in Ghent. It’s a great little conference bringing together some great people from the system administration software scene.

During a chat with @purpleidea of mgmt fame about secrets management in mgmt, he pointed out that any worthwhile API should offer some form of public/private key cryptography based authentication. I contemplated how to best adopt this idea in serveradmin, InnoGames’ configuration management database. At the time serveradmin’s API only allowed authentication via pre-shared key API tokens. The obvious first choice for an HTTP API would be X.509 client certificates. Unfortunately anybody who has ever had to use them will be quick to point out that they are inconvenient to generate, manage and use.

Fortunately InnoGames already has an alternative authentication scheme in place. All the admins and many developers already have SSH key pairs and they even have their public keys in LDAP already💡

Serveradmin and its client library are written in Python, so I started digging into Paramiko, a pure Python SSH library. While the Paramiko API around signing and verifying SSH messages has some gotchas, it’s generally straightforward enough. What follows are some pointers on how to abuse Paramiko’s crypto for arbitrary blobs. The source code in this blog post is MIT licensed and partially copied from the serveradmin project.

from paramiko import RSAKey
# Load private RSA key from file. This will raise
# paramiko.ssh_exception.PasswordRequiredException
# if a password is required.
key = RSAKey.from_private_key_file(private_key_path)
# Sign some blob
msg = key.sign_ssh_data(b"somestuff")
msg.rewind()

Unfortunately we have to know what key type to expect. To work around this we can try to load the key with all of Paramiko’s key types:

from paramiko.ssh_exception import (
    SSHException,
    PasswordRequiredException,
)
try:
    from paramiko import RSAKey, ECDSAKey, Ed25519Key
    key_classes = (RSAKey, ECDSAKey, Ed25519Key)
except ImportError:
    # Ed25519Key requires paramiko >= 2.2
    from paramiko import RSAKey, ECDSAKey
    key_classes = (RSAKey, ECDSAKey)
 
def load_private_key_file(private_key_path):
    # I don't think there is a key type independent
    # way of doing this
    for key_class in key_classes:
        try:
            return key_class.from_private_key_file(
                private_key_path
            )
        except PasswordRequiredException as e:
            raise AuthenticationError(e)
        except SSHException:
            continue
 
    raise AuthenticationError(
        'Loading private key failed'
    )
 
key = load_private_key_file(private_key_path)
msg = key.sign_ssh_data(b"somestuff")
msg.rewind()

From the loaded private key we can derive the public key. Unfortunately the API is not super obvious here. For loading the public key, we again need to know the key type. Using this public key object we can now verify the signature in the message object:

# This derives the public key.. Obviously.
pub = RSAKey(data=key.asbytes())
# Returns a boolean if the signature matched
pub.verify_ssh_sig(b"somestuff", msg)

The asbytes method on private keys returns the public key blob. That is the key you get when you load the base64 encoded bit of your public key files. This example is a bit of a stretch though, as we won’t have the private key on the server. So we have to do this:

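from base64 import b64decode

# ValidationError and PermissionDenied below are assumed to be provided by
# the surrounding application; they are not part of Paramiko.
def load_public_key_file(public_key_path):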
    with open(public_key_path, 'r') as fd:
        public_key = fd.read()

    key_algorithm, key_base64, *_ = public_key.split(' ', 2)
    public_key_blob = b64decode(key_base64)
    if key_algorithm.startswith('ssh-ed25519'):
        try:
            return Ed25519Key(data=public_key_blob)
        except NameError:
            raise ValidationError('Paramiko too old to load ed25519 keys')
    elif key_algorithm.startswith('ecdsa-'):
        return ECDSAKey(data=public_key_blob)
    elif key_algorithm.startswith('ssh-rsa'):
        return RSAKey(data=public_key_blob)

    raise SSHException('Key is not RSA, ECDSA or Ed25519')


public_key = load_public_key_file(public_key_path)
if not public_key.verify_ssh_sig(
    data=expected_message.encode(),
    msg=msg,
):
    raise PermissionDenied('Invalid signature')
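It may help to see why the algorithm name in the key file and the decoded blob belong together. As far as I understand the SSH wire format, the blob itself starts with a length-prefixed copy of the algorithm name. The following standard-library sketch reads that name back out of a public key line. The helper blob_key_type is made up for illustration and is not part of Paramiko or serveradmin, and the demo blob contains no real key material.

```python
import struct
from base64 import b64decode, b64encode


def blob_key_type(public_key_line):
    """Return the algorithm name embedded in an OpenSSH public key blob.

    Assumes the usual "<algorithm> <base64 blob> [comment]" layout.
    """
    key_algorithm, key_base64, *_ = public_key_line.split(None, 2)
    blob = b64decode(key_base64)
    # SSH "string" encoding: 4 byte big-endian length, then the bytes.
    (length,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + length].decode("ascii")
    if embedded != key_algorithm:
        raise ValueError(
            f"key file says {key_algorithm!r}, blob says {embedded!r}"
        )
    return embedded


# Fake blob for demonstration: the name followed by zero bytes as padding.
name = b"ssh-ed25519"
fake_blob = struct.pack(">I", len(name)) + name + b"\x00" * 36
line = "ssh-ed25519 " + b64encode(fake_blob).decode() + " user@example"
print(blob_key_type(line))
```

A check like this could serve as an extra sanity test before handing the blob to one of the key classes, but that is a design suggestion rather than something serveradmin is known to do.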

This is nice already, but we can do one better on the client side. Part of the OpenSSH project is the ssh-agent. Using the agent instead of a key file has some benefits:

  • With public keys in LDAP and private keys in the agent we require no additional setup on a developer’s or sysadmin’s computer.
  • Keys in the ssh-agent can be password protected on disk and decrypted only inside the agent. Adminapi never even sees the private part of the key.
  • Serveradmin only knows the public part of the key, nothing secret is saved there.

Obligatory warning: Do not forward your ssh-agent to other servers, especially not servers which other people have access to. Privileged users will be able to sign stuff using your agent for as long as you are connected.

Signing stuff using your own agent works like this:

# Sign with the keys held by the agent (any key type, not just RSA)
from base64 import b64encode
from paramiko.ssh_exception import SSHException
from paramiko.agent import Agent
from paramiko.message import Message

try:
    agent = Agent()
    for key in agent.get_keys():
        # This doesn't return a Message object but
        # bytes. That's dumb as it's not in line
        # with the other key types sign_ssh_data
        # methods. That also means we don't have to
        # call rewind on it, we can't actually.
        sig = key.sign_ssh_data(b"somestuff")
        # So we make sure it's bytes here
        if isinstance(sig, Message):
            sig = sig.asbytes()
        # Now we could send it over the wire in base64
        print(b64encode(sig).decode())
except SSHException:
    raise AuthenticationError('No ssh agent found')

The return type of the sign_ssh_data method on agent keys is a bit of a gotcha. Note also that the can_sign method on agent keys always seems to return False, which is incorrect. Other than that, agent keys work like the keys loaded from a file.

One design note: In SSH you log in via a user and key combination. One key can allow access to multiple users. That is not how I implemented the API authentication in serveradmin. Instead I enforce that one key can only belong to one user. This way I am able to send the public key and a signed version of the request in the HTTP headers and authenticate the user with no setup required at all.
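To make the one-key-one-user rule concrete, here is a rough sketch of the lookup it enables. It is not the serveradmin code: the class KeyRegistry, its methods and the placeholder key strings are invented for illustration, and signature verification is left out since it is covered above.

```python
class KeyRegistry:
    """Map each public key to exactly one user (illustrative only)."""

    def __init__(self):
        self._user_by_key = {}

    def register(self, user, public_key_base64):
        owner = self._user_by_key.get(public_key_base64)
        if owner is not None and owner != user:
            # The same key must never identify two different users.
            raise ValueError(f"key already belongs to {owner}")
        self._user_by_key[public_key_base64] = user

    def user_for(self, public_key_base64):
        """Return the user owning the key, or None if the key is unknown."""
        return self._user_by_key.get(public_key_base64)


registry = KeyRegistry()
registry.register("alice", "KEY-OF-ALICE")  # placeholder, not a real key
registry.register("bob", "KEY-OF-BOB")      # placeholder, not a real key

# The public key sent along with the request is enough to find the user.
print(registry.user_for("KEY-OF-ALICE"))
```

Because the mapping from key to user is unambiguous, the client does not have to send a user name at all.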

OpenWRT IP Whitelisting

This is how to only route whitelisted IPs on OpenWRT:

  • First decide where to restrict access. I created an extra VLAN with one switch port.
  • Then create an “interface” with that VLAN in it.
  • Assign a static IP and a netmask to the VLAN interface.
  • Go to the “Firewall Settings” tab and create a new zone.
  • Enable Masquerading for the zone.
  • You could now “Allow forward to destination zone: wan” to allow all internet traffic.
  • But we will rather go to “Network->Firewall->Traffic Rules” and create rules from our zone to wan for some IPs.
  • Be aware that the default for traffic rules is to only allow TCP+UDP.
  • Now just set the destination address.
  • And we are good to go.

That was complicated, wasn’t it? Well, let’s look at the config diff between our router and a fresh OpenWrt setup. Wow, that’s messy. Unfortunately changes made in LuCI lead to a complete reparse and rewrite of the default config. I went through the diff and these are the interesting bits:

diff --git a/config/firewall b/config/firewall
index 7be01d2..28391ab 100644
--- a/config/firewall
+++ b/config/firewall
@@ -183,6 +183,20 @@ config rule
 #	option dest_port	120
 #	option proto	tcp
 #	option target	REJECT
+config zone
+	option name 'limited'
+	option input 'ACCEPT'
+	option forward 'REJECT'
+	option output 'ACCEPT'
+	option network 'limited_lan'
+
+config rule
+	option target 'ACCEPT'
+	option src 'limited'
+	option dest 'wan'
+	option name 'berlin-ccc-dns'
+	option proto 'all'
+	option dest_ip '213.73.91.35'
 
 #config redirect
 #	option src		lan
diff --git a/config/network b/config/network
index e15871a..737cfb2 100644
--- a/config/network
+++ b/config/network
@@ -33,10 +33,21 @@ config switch
 config switch_vlan
 	option device 'switch0'
 	option vlan '1'
-	option ports '1 2 3 4 5t'
+	option ports '1 2 3 5t'
 
 config switch_vlan
 	option device 'switch0'
 	option vlan '2'
 	option ports '0 5t'
 
+config switch_vlan
+	option device 'switch0'
+	option ports '4 5t'
+	option vlan '42'
+
+config interface 'limited_lan'
+	option proto 'static'
+	option ifname 'eth0.42'
+	option ipaddr '10.23.42.1'
+	option netmask '255.255.255.0'
+
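To sanity check the resulting policy, it can help to model it in a few lines. The sketch below is a toy model in Python, not how the OpenWrt firewall is implemented: it assumes that, without a forwarding to wan, nothing leaves the limited zone unless a traffic rule lists the destination. It reuses the addresses from the diff above, and the function name is_forwarded is made up.

```python
from ipaddress import ip_address, ip_interface

# Values taken from the config diff: the VLAN interface and the one allowed IP.
LIMITED_LAN = ip_interface("10.23.42.1/255.255.255.0").network
ALLOWED_DESTINATIONS = {ip_address("213.73.91.35")}


def is_forwarded(src, dest):
    """Toy model: may a packet from src be forwarded to dest on wan?"""
    if ip_address(src) not in LIMITED_LAN:
        raise ValueError(f"{src} is not in the limited zone")
    # No general forwarding to wan exists, so only whitelisted
    # destinations from the traffic rules pass.
    return ip_address(dest) in ALLOWED_DESTINATIONS


print(is_forwarded("10.23.42.17", "213.73.91.35"))  # True
print(is_forwarded("10.23.42.17", "8.8.8.8"))       # False
```

Adding another whitelisted IP on the router corresponds to one more entry in ALLOWED_DESTINATIONS in this model.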