I ran into scalability issues this week. My prototype system keeps an in-memory database of models inferred from my rules. It seemed fine with my test cases, but when I decided to push it a little to see how it scaled, it threw a java.lang.OutOfMemoryError at me. I had an idea of why it was happening. I was wrong. Then another idea. Wrong again. The problem was that I was picking possibilities randomly, without data. Not a good way of doing science.
I decided I needed to profile the prototype to see where exactly my memory was being used. I was happy when I realized (after a few minutes of googling around) that I did not have to download or configure anything. It turns out the Java 6 JDK comes with a built-in lightweight profiler called JVisualVM, so my machine already had a profiler one command away. JVisualVM uses Java's instrumentation facilities to dynamically attach to and inspect a running Java program. Zero configuration and full profiling benefits. Nice.
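To attach JVisualVM, you pick your process from its list of running local JVMs, which it identifies by process id. As a quick sanity check, a JVM can print its own pid; this sketch (class name is mine, not from the prototype) uses the standard RuntimeMXBean, whose name is conventionally of the form "pid@hostname" on the Sun/Oracle JVM:

```java
import java.lang.management.ManagementFactory;

public class ShowPid {
    public static void main(String[] args) {
        // On the Sun/Oracle JVM the runtime name looks like "12345@hostname";
        // the part before '@' is the process id JVisualVM lists in its UI.
        String name = ManagementFactory.getRuntimeMXBean().getName();
        String pid = name.split("@")[0];
        System.out.println("Attach JVisualVM to pid " + pid);
    }
}
```

With the pid in hand, `jvisualvm` (in the JDK's bin directory) will show the process under "Local" and let you start memory or CPU sampling with one click.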
I profiled the memory of my prototype, and it turns out my models were huge: 7MB each. This, with a max heap size of 128MB, was easily bringing my service to its knees with only 15 inference requests. To further test its behaviour, I bumped the max heap size to a more server-friendly 512MB. The math says I can hold around 70 models now. But clearly something was wrong, as 7MB is way too much memory for what I was doing. After looking into my project settings I realized I had never turned off inference derivation logging, and that was stealing most of the memory: around 6.5 of those 7MB! Memory problem fixed.
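The back-of-the-envelope math here is simple enough to write down. Using only the numbers above (512MB heap, 7MB per model before the fix, and roughly 0.5MB after dropping the 6.5MB of derivation logs), a quick sketch of the capacity estimate:

```java
public class HeapMath {
    public static void main(String[] args) {
        long heapMb = 512;                // running with -Xmx512m
        double modelMbBefore = 7.0;       // per-model footprint measured in JVisualVM
        double modelMbAfter = 7.0 - 6.5;  // after disabling derivation logging

        // Rough capacity: how many models fit in the heap, ignoring
        // everything else the JVM needs (so real numbers will be lower).
        System.out.println((long) (heapMb / modelMbBefore)); // ~73 models
        System.out.println((long) (heapMb / modelMbAfter));  // ~1024 models
    }
}
```

So disabling the logging buys roughly a 14x increase in how many models the same heap can hold, which matches the "around 70 models" figure before the fix.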
A much more subtle and hard-to-fix problem was the CPU time performance. I was averaging 500ms per inference. But after looking at the CPU profiling from JVisualVM, I was able to pinpoint the source of most of the CPU time consumption: I was firing way too many rules. Fixing this issue took much more time than pinpointing it, though. Inference time will vary with the complexity of the model and the number of rules, but for my test cases, I managed to go from that ugly 500ms to 36ms on average. I'm still getting some spikes of ~150ms, but these occur only when the current heap size is almost full. My theory here is that Java first does a run of the garbage collector before committing to giving more heap to the program. But this is a minor issue now.
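To keep an eye on those averages without a profiler attached, a tiny timing harness around the inference call works fine. This is a generic sketch (the `Inference` interface and class name are mine, standing in for whatever the prototype's inference entry point is), using System.nanoTime for sub-millisecond resolution:

```java
public class InferenceTimer {
    /** Stand-in for the prototype's inference entry point. */
    interface Inference {
        void run();
    }

    /** Runs the inference repeatedly and returns the mean wall time in ms. */
    static double averageMs(Inference inf, int runs) {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            inf.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (runs * 1000000.0);
    }
}
```

Logging individual run times (not just the mean) is what makes the ~150ms spikes visible; and running the JVM with the standard `-verbose:gc` flag alongside would show whether those spikes really line up with collections, which would confirm (or kill) the GC-before-heap-growth theory.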
I'm starting to like Java again.