xend refusing to start

We recently had a few power outages at work, some scheduled, some not, and this played havoc with our xen servers.

One of the problems we had was that xend would not start (and thus xendomains would also not start).

Checking /var/log/xen/xend.log gave us the following snippet:

inst = XendNode()
File "/usr/lib/python2.5/site-packages/xen/xend/XendNode.py", line 164, in __init__
saved_pifs = self.state_store.load_state('pif')
File "/usr/lib/python2.5/site-packages/xen/xend/XendStateStore.py", line 104, in
dom = minidom.parse(xml_path)
File "xml/dom/minidom.py", line 1913, in parse
File "xml/dom/expatbuilder.py", line 924, in parse
File "xml/dom/expatbuilder.py", line 211, in parseFile
ExpatError: no element found: line 1, column 0
[2008-03-10 21:37:40 18122] INFO (__init__:1094) Xend exited with status 1.

A quick google of that error turned up several people who had come across the same problem, but no actual answer!

It looked like xen was having problems parsing an XML file. Some quick mental inspiration, and the find command, yielded /var/lib/xend/state/pif.xml, which was a 0 byte file! A comparison with a working server showed that it should (or at least could) contain this:

A copy and paste later and we had a working xend! However it refused to create any of the xenlets:

root@xen0:/etc/xen# xm create server0.cfg
Using config file "./server0.cfg".
Error: The privileged domain did not balloon!

Despite there being plenty of RAM!

root@xen0:/var/log/xen# xm list
Name ID Mem VCPUs State Time(s)
Domain-0 0 7928 8 r----- 832.8
root@xen0:/var/log/xen# free
total used free shared buffers cached
Mem: 8119416 393028 7726388 0 11344 58832
-/+ buffers/cache: 322852 7796564
Swap: 15631224 0 15631224

An strace of the process revealed that xen thought it had less memory available than it actually did:

[2008-03-10 21:47:48 18620] DEBUG (__init__:1094) Balloon: 131064 KiB free; 0 to scrub; need 524288; retries: 20.
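To put numbers on that log line, here is a small worked calculation. The two values are copied from the Balloon message; the variable names are made up for illustration. Note that the `free` output above was taken inside Domain-0, which `xm list` shows holding 7928 MB, so memory that looks free there is not necessarily free as far as the hypervisor is concerned (that is my reading of the situation, not something the log states).

```shell
# Values copied from the Balloon debug line, in KiB.
free_kib=131064
need_kib=524288

# How far short of the request the free pool was.
shortfall_kib=$(( need_kib - free_kib ))

echo "need:      $(( need_kib / 1024 )) MiB"
echo "free:      $(( free_kib / 1024 )) MiB (rounded down)"
echo "shortfall: ${shortfall_kib} KiB"
```

So xend wanted 512 MiB for the new domain, saw roughly a quarter of that as free, and was waiting for Domain-0 to give up the rest.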

As we finally had a working xend, we decided to implement a technique we'd learned from working with Windows machines and rebooted the server. This magically fixed the memory issue. It would have been nice to know what actually caused it, and whether there was a proper fix, though.
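With hindsight, the empty pif.xml could have been spotted without the guesswork. The following is a minimal sketch that lists zero-byte files in the xend state directory. The function name is hypothetical, and the assumption that other state files might get truncated in the same way is mine; only pif.xml was empty in our case.

```shell
# List zero-byte regular files under a directory.
# Defaults to the xend state directory mentioned above.
find_empty_state_files() {
    state_dir=${1:-/var/lib/xend/state}
    find "$state_dir" -type f -size 0
}

# Example (run as root on the dom0):
#   find_empty_state_files
#   find_empty_state_files /var/lib/xend/state
```

Anything it prints is a candidate for comparing against the same file on a working server, as we did with pif.xml.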

Resize Xen Filesystem

We run a lot of Xen instances for our development and test servers, and a few were starting to get full. Fortunately the disks in the real servers were very large and the xenlet partitions were made using LVM, so resizing them to add more space was possible!

root@dev-myfiles0:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda1 4.0G 3.8G 200M 95% /
varrun 257M 48K 257M 1% /var/run
varlock 257M 0 257M 0% /var/lock
udev 257M 40K 257M 1% /dev
devshm 257M 0 257M 0% /dev/shm

Basically we just have to shut down the xenlet, extend the logical volume, check and grow the filesystem, and then start the xenlet again. Simple!

root@brandy:~# xm shutdown dev-myfiles0
root@brandy:~# lvextend -L40G /dev/vg0/dev-myfiles0-disk
Extending logical volume dev-myfiles0-disk to 40.00 GB
Logical volume dev-myfiles0-disk successfully resized
root@brandy:~# e2fsck -f /dev/vg0/dev-myfiles0-disk
e2fsck 1.40.2 (12-Jul-2007)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vg0/dev-myfiles0-disk: 16541/524288 files (0.9% non-contiguous), 138346/1048576 blocks
root@brandy:~# resize2fs /dev/vg0/dev-myfiles0-disk
resize2fs 1.40.2 (12-Jul-2007)
Resizing the filesystem on /dev/vg0/dev-myfiles0-disk to 10485760 (4k) blocks.
The filesystem on /dev/vg0/dev-myfiles0-disk is now 10485760 blocks long.
root@brandy:~# cd /etc/xen
root@brandy:/etc/xen# xm create dev-myfiles0.cfg
Using config file "./dev-myfiles0.cfg".
Started domain dev-myfiles0
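As a sanity check, the block counts in that output line up with the sizes. This is just arithmetic on the numbers printed by e2fsck and resize2fs, taking the "(4k)" in the resize2fs message to mean 4096-byte blocks.

```shell
block_size=4096        # resize2fs reported "(4k) blocks"
old_blocks=1048576     # total blocks in the e2fsck summary line
new_blocks=10485760    # block count reported by resize2fs

old_gib=$(( old_blocks * block_size / 1024 / 1024 / 1024 ))
new_gib=$(( new_blocks * block_size / 1024 / 1024 / 1024 ))

echo "before: ${old_gib} GiB"
echo "after:  ${new_gib} GiB"
```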

Wee, lots of free space now!

root@dev-myfiles0:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda1 40G 3.8G 37G 10% /
varrun 257M 40K 257M 1% /var/run
varlock 257M 0 257M 0% /var/lock
udev 257M 40K 257M 1% /dev
devshm 257M 0 257M 0% /dev/shm
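For doing this more than once, the steps can be wrapped up roughly as below. Treat it as a sketch under a few assumptions rather than a finished tool: the function names and the DRY_RUN switch are made up for illustration, the filesystem is assumed to be ext2/ext3 sitting directly on the logical volume (as in our setup), and the -w flag to xm shutdown (wait for the domain to finish shutting down) is an addition that we did not use in the session above, so check that your xm supports it. By default it only prints the commands.

```shell
# Print the command when DRY_RUN is 1 (the default), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: resize_xenlet DOMAIN LOGICAL_VOLUME NEW_SIZE CONFIG_FILE
resize_xenlet() {
    domain=$1   # e.g. dev-myfiles0
    lv=$2       # e.g. /dev/vg0/dev-myfiles0-disk
    size=$3     # e.g. 40G
    cfg=$4      # e.g. /etc/xen/dev-myfiles0.cfg

    # The guest must be fully stopped before the filesystem is touched.
    run xm shutdown -w "$domain" || return 1
    run lvextend -L"$size" "$lv" || return 1
    # Stops here if e2fsck exits non-zero, which includes the case
    # where it found and corrected errors; rerun after reviewing.
    run e2fsck -f "$lv" || return 1
    # With no size argument, resize2fs grows to fill the volume.
    run resize2fs "$lv" || return 1
    run xm create "$cfg"
}
```

Calling `resize_xenlet dev-myfiles0 /dev/vg0/dev-myfiles0-disk 40G /etc/xen/dev-myfiles0.cfg` prints the five commands in order; setting DRY_RUN=0 runs them for real.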