FreeBSD + Vantec NexStar 3

A few days ago, I added an entry to the FreeBSD Developers Want List indicating that I would like to have a large hard drive and a USB-attachable enclosure, so that I could perform backups in a saner manner. Santa (a.k.a. Daniel Seuffert) provided me with a 250GB Seagate Barracuda 7200.9 SATA2 hard drive and a Vantec NexStar 3 USB 2.0 enclosure. Since several people were curious as to how well this hardware was supported by FreeBSD, I thought I should provide a brief report.

The good: It works. I installed the drive into the enclosure, plugged in the power, and plugged the USB cable into my Dell D600 laptop, and FreeBSD 6.0-RELEASE-p4 recognized it immediately:

Jan 27 19:07:15 hexahedron kernel: umass0: Sunplus Technology Inc. USB to Serial-ATA bridge, rev 2.00/c4.fd, addr 2
Jan 27 19:07:15 hexahedron kernel: da0 at umass-sim0 bus 0 target 0 lun 0
Jan 27 19:07:15 hexahedron kernel: da0: <ST325082 4AS > Fixed Direct Access SCSI-2 device
Jan 27 19:07:15 hexahedron kernel: da0: 40.000MB/s transfers
Jan 27 19:07:15 hexahedron kernel: da0: 238475MB (488397168 512 byte sectors: 255H 63S/T 30401C)

I could then read and write to /dev/da0 just like I would any other 250GB hard drive. I could partition it, label it, create filesystems on it -- everything Just Worked.

The bad: It is a bit slow. USB 2.0 can, in theory, transmit data at 60 MB/s (480 Mbit/s), while according to StorageReview the Seagate drive has a transfer rate varying from 34.4 MB/s to 62.0 MB/s. In contrast, the transfer rate I obtained via USB was constant at approximately 25 MB/s across the entire drive.
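As a rough sanity check on those figures, here is a small Python sketch. It derives the theoretical USB 2.0 rate from the 480 Mbit/s signalling rate, recomputes the capacity from the sector count in the kernel log, and estimates how long one full pass over the drive takes at the observed rate. The variable names are invented for illustration, and treating "25 MB/s" as 25 * 10^6 bytes per second is an assumption.

```python
# Figures taken from the post and from the kernel log above.
USB2_SIGNALLING_BITS_PER_S = 480_000_000  # USB 2.0 high-speed signalling rate
SECTORS = 488_397_168                     # from the da0 line in the kernel log
SECTOR_BYTES = 512
MEASURED_MB_PER_S = 25                    # observed rate; assumed to mean 10**6 bytes/s

# Theoretical peak in MB/s, ignoring all protocol overhead.
usb2_mb_per_s = USB2_SIGNALLING_BITS_PER_S / 8 / 1_000_000

# Capacity in bytes, and in units of 2**20 bytes as the kernel reports it.
capacity_bytes = SECTORS * SECTOR_BYTES
capacity_mib = capacity_bytes // 2**20

# Time for one sequential pass over the whole drive at the observed rate.
full_pass_hours = capacity_bytes / (MEASURED_MB_PER_S * 1_000_000) / 3600

print(f"USB 2.0 theoretical peak: {usb2_mb_per_s:.0f} MB/s")
print(f"Drive capacity: {capacity_mib} MiB")
print(f"One full pass at {MEASURED_MB_PER_S} MB/s: {full_pass_hours:.1f} hours")
```

For backups, the last number is the one that matters: under these assumptions, reading or writing the entire drive takes a little under three hours.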

FreeBSD's diskinfo -c explains the reason for the poor performance: command overhead. In contrast to my laptop's hard drive, where there is an overhead cost of 97 microseconds for a single read request, the USB-attached drive has an overhead cost of 730 microseconds. I imagine that this increased cost is largely due to the USB<-->SATA translation, but also partly due to my laptop's poor interrupt routing -- the USB controller is sharing IRQ 11 with several other devices, and the FreeBSD kernel needs to pick up the Giant lock to handle each interrupt.
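To see how a fixed per-request cost eats into throughput, consider a simple model in which requests are issued one at a time and each pays its overhead before any data moves. The Python sketch below is only an illustration: the function name, the 64 KiB request size, and the use of the 60 MB/s USB 2.0 peak as the raw rate for both cases are assumptions, not measurements. Only the 97 and 730 microsecond overheads come from diskinfo.

```python
def effective_rate(request_bytes, overhead_s, raw_bytes_per_s):
    """Throughput when every request pays a fixed overhead before its data transfer."""
    time_per_request = overhead_s + request_bytes / raw_bytes_per_s
    return request_bytes / time_per_request

REQUEST_BYTES = 64 * 1024      # assumed request size, not from the post
RAW_BYTES_PER_S = 60_000_000   # assumed raw rate: the USB 2.0 theoretical peak

# Per-request overheads reported by diskinfo -c.
laptop = effective_rate(REQUEST_BYTES, 97e-6, RAW_BYTES_PER_S)
usb = effective_rate(REQUEST_BYTES, 730e-6, RAW_BYTES_PER_S)

print(f"97 us overhead:  {laptop / 1e6:.1f} MB/s")
print(f"730 us overhead: {usb / 1e6:.1f} MB/s")
```

The model does not reproduce the 25 MB/s I measured, and it is not meant to; it only shows that, at these request sizes, a few hundred microseconds of extra overhead per request is enough to cost a large fraction of the available bandwidth.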

The ugly: FreeBSD doesn't handle removal of drives very gracefully. When I unplug the USB cable, FreeBSD recognizes that the device is gone -- but if there is a filesystem mounted from the device, that filesystem remains mounted. FreeBSD doesn't want to unmount the filesystem, since it thinks the underlying device is busy; but at the same time you (obviously) can't do anything with that filesystem. If you ask FreeBSD to forcibly unmount the filesystem -- or if FreeBSD shuts down, at which point it forcibly unmounts every filesystem -- then it will panic.

I imagine that this could be fixed by teaching the kernel to forcibly unmount filesystems at the point when their underlying device is being removed (but before freeing the data structures associated with the device), but I'm not comfortable enough in the FreeBSD kernel to try to make that sort of change myself. In any case, there is a very simple answer to unplugging the drive while it has a filesystem mounted: Don't do that!

Posted at 2006-01-28 17:10 | Permanent link | Comments

Canadian election results trivia.

Now that the results of the 39th Canadian general election are (mostly) in, I have looked through the numbers (helpfully provided by Elections Canada in CSV format) and pulled out some of the more interesting statistics:

Note to media and blogs: Feel free to republish the above (in part or in whole), giving credit to Colin Percival or a link to this post.

Posted at 2006-01-26 11:15 | Permanent link | Comments

Garbage collection is evil.

For several days I've been wrestling with a peculiar performance problem in Maple. In the Quadratic Sieve code I'm currently writing, I use external C code to perform the sieving -- that is, I have a QuadraticSieveSieveInterval() function which I wrote in C and call from Maple -- and the relations are collected and filtered in Maple. This allows me to keep the amount of C code needed to a minimum by using Maple for some of the messy initialization (e.g., computing modular square roots).

The performance problem arose in the "collecting relations in Maple" part. My code is roughly as follows:

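Nrels := 0;
rtab := table();
# Keep sieving intervals until enough relations have been collected.
while (Nrels < NumberOfRelationsWanted) do
	# External C routine; returns a sequence of relations.
	rels := QuadraticSieveSieveInterval( ... );
	for rel in [rels] do
		Nrels := Nrels + 1;
		rtab[Nrels] := rel;	# the line discussed below
	od;
od;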

With NumberOfRelationsWanted equal to 30000, I noticed something very odd: If I commented out the "rtab[Nrels] := rel" line -- that is, if I counted the relations, but didn't store them -- then the code would be faster by roughly 150 seconds. However, after collecting all the relations, I could copy them all into a new hash table in under one second. Somehow adding the relations to a table while they were being generated was 200 times slower than adding the relations to a table after they had been generated.

After some exploration of Maple's profiling capabilities, I noticed that an unexpected function was (according to the profiler) using 15% of the total CPU time: the garbage collector. This made me immediately suspicious, since the most significant difference (aside from the very much increased time taken) between throwing the relations away and collecting them in a table is that collecting them means that the total memory usage increases over time. With a bit more searching, I found a kernel option, "gcfreq", which controls the frequency with which Maple's garbage collector is called. The default value is "every million words allocated"; I changed this to "every hundred million words allocated", and suddenly my code was 160 seconds faster -- even with the "store the relation in a table" operation which had been peculiarly slow, my code was now faster than it had been without that operation before.
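A toy cost model makes the effect of gcfreq plausible, though it is only a guess at what was happening inside Maple. The Python sketch below assumes a hypothetical collector that rescans all live data on every run, and that a fixed share of allocated words stays live because the relations are being kept. The function name gc_work, the total allocation of 10^9 words, and the 1% live share are all invented for illustration; only the two gcfreq values come from the text above.

```python
def gc_work(total_words, gcfreq, live_per_thousand):
    """Total words scanned by a hypothetical collector.

    The collector runs once per `gcfreq` words allocated and, on each run,
    scans every live word. `live_per_thousand` is how many of each 1000
    allocated words are assumed to stay live.
    """
    work = 0
    allocated = 0
    while allocated + gcfreq <= total_words:
        allocated += gcfreq
        work += allocated * live_per_thousand // 1000
    return work

TOTAL_WORDS = 10**9      # assumed total allocation, not from the post
LIVE_PER_THOUSAND = 10   # assumed: 1% of allocated words are kept relations

every_million = gc_work(TOTAL_WORDS, 10**6, LIVE_PER_THOUSAND)
every_hundred_million = gc_work(TOTAL_WORDS, 10**8, LIVE_PER_THOUSAND)
nothing_kept = gc_work(TOTAL_WORDS, 10**6, 0)  # relations counted but not stored

print(f"gcfreq = 10^6: {every_million} words scanned")
print(f"gcfreq = 10^8: {every_hundred_million} words scanned")
print(f"gcfreq = 10^6, nothing kept: {nothing_kept} words scanned")
```

In this model, collecting a hundred times less often cuts the scanning work by roughly the same factor, and keeping nothing live makes each collection free. That matches the direction of the timings above without proving that this is what Maple was actually doing.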

I'm not quite sure why my code was causing the garbage collector to perform so poorly, but it might be related to the combination of very small memory allocations (used by Maple) and rather large memory allocations (in the sieving code itself). Whatever the cause, it's worth remembering that while garbage collection isn't always slow, it certainly can be slow and should be investigated as a possible cause of unexplained poor performance. J.K. Rowling remarked, via a character in the second Harry Potter book, that one should "never trust anything that can think for itself, if you can't see where it keeps its brain"; in much the same vein, I would suggest that one should never trust a programming language if you can't see where and how it allocates and deallocates memory.

Posted at 2006-01-14 07:00 | Permanent link | Comments
