Saturday, May 16, 2009

Performance Woes

It had been a few months since I had last worked on Snowflake.  Now that I’ve gotten back into it, I’ve ported across the old AC’97 driver and attempted to build a functional network stack.

And now I’ve been trying to get Snowflake to stream a wave file across the network and play it.  Unfortunately, I’ve run into a rather large problem: handling a few hundred packets in less than a second seems to put a very heavy load on the garbage collector.

I’m fairly certain it is the garbage collector, but I haven’t yet figured out how to build the network stack as a Linux app so that I can profile it.  Given I also have customised O’Caml libraries, I’m even more reluctant to try to do this.

The other problem with trying to profile the network stack in Linux is that without the RealTek 8139 driver, the performance characteristics will likely change as well.  One would have to write the fake driver to mimic the behaviour of my RealTek driver fairly accurately in order to get somewhat reliable information.

I’m at a loss as to what to do next.  I was really excited about the thought of streaming wave files over the network and playing them with Snowflake.  Now I feel dejected, with no hope of fixing this stupid problem.

Sure, it is related to memory allocation, but every time I feel like I’ve eliminated potentially excessive or large memory allocations, and fixed the bugs that crop up as a result, it seems to be no better.
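One way to check whether a change really reduces allocation, short of a full profiler, is to read the collector's own counters around a piece of code.  The following is only a rough sketch: it assumes the standard `Gc` module is usable (which may not hold with customised libraries), and `allocated_words` is a hypothetical helper, not something that exists in Snowflake.

```ocaml
(* Hypothetical helper: run [f] and report roughly how many heap words
   were allocated while it ran, using the standard Gc counters. *)
let allocated_words f =
  let minor0, promoted0, major0 = Gc.counters () in
  let result = f () in
  let minor1, promoted1, major1 = Gc.counters () in
  (* Promoted words are counted in both the minor and the major totals,
     so subtract them once to avoid counting them twice. *)
  let words =
    (minor1 -. minor0) +. (major1 -. major0) -. (promoted1 -. promoted0)
  in
  (result, words)

let () =
  (* Stand-in workload; in practice this would be the per-packet code. *)
  let _, words = allocated_words (fun () -> Array.make 1000 0) in
  Printf.printf "allocated roughly %.0f words\n" words
```

Wrapping the handling of a single packet in something like this before and after a change would at least show whether the number of words allocated per packet is actually going down.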

There’s still one place I haven’t pruned memory allocations from yet, and that’s building the packets to be sent out on the wire.  And I’m not sure how I’m going to fix that either.  Ideally, it would work like the parsing does, and just write bytes to a pre-allocated Bigarray – preferably the actual buffer given to the driver for DMA.
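A minimal sketch of that idea, writing header fields straight into a buffer that is allocated once and reused.  Everything here is an assumption for illustration: `tx_buffer` stands in for whatever buffer the driver hands over for DMA, the helper names are made up, and a UDP header is used only as a simple example of a fixed layout.

```ocaml
open Bigarray

type buffer = (int, int8_unsigned_elt, c_layout) Array1.t

(* Stand-in for the driver's DMA buffer: allocated once, then reused
   for every outgoing packet. *)
let tx_buffer : buffer = Array1.create int8_unsigned c_layout 1514

(* Store one byte at [off]. *)
let put_u8 (buf : buffer) off v = buf.{off} <- v land 0xff

(* Store a 16-bit value in network (big-endian) byte order. *)
let put_u16_be buf off v =
  put_u8 buf off (v lsr 8);
  put_u8 buf (off + 1) v

(* Write an 8-byte UDP header at [off] and return the offset at which
   the payload should start. *)
let write_udp_header buf off ~src_port ~dst_port ~payload_len =
  put_u16_be buf off src_port;
  put_u16_be buf (off + 2) dst_port;
  put_u16_be buf (off + 4) (8 + payload_len);  (* length includes header *)
  put_u16_be buf (off + 6) 0;                  (* 0 = checksum not computed *)
  off + 8
```

Because every field is written as a plain integer, the header itself needs no intermediate strings or lists; only the Bigarray, created once, lives outside the O’Caml heap.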

However, if I do that, and it still doesn’t fix the problem, then I’ll be really screwed.  I want this to work so bad!

I’d also hate to see how much CPU Snowflake actually uses to stream the file across the network – I can imagine the CPU pegged at 100% – which wouldn’t be very efficient at all.
