I saw an interesting talk at last night's Providence Geeks by a startup called Tracelytics. Their product is something you could roughly call "Firebug for the server side." Tracelytics provides software profiling across the entire server side of a request. Now this isn't necessarily anything new (DynaTrace comes to mind) but what's interesting is that Tracelytics actually profiles the entire request path, from the web server through the app server across RPC boundaries and even through the database. So you can, for instance, see how much of the request time was consumed in your web front-end before the request was passed off to the app server, or see that there's a cache miss in one of your services (running on a separate machine) that's gumming up the works.
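Cross-tier tracing like this generally works by threading a single trace ID through every hop of the request, typically in a header. Here's a minimal sketch of the idea in Python -- the header name, event format, and function names are my own invention for illustration, not Tracelytics' actual design:

```python
import time
import uuid

TRACE_HEADER = "X-Trace"  # hypothetical header carrying the trace ID


def handle_front_end(request_headers):
    # Start a trace at the outermost tier (or continue one if the
    # caller already sent an ID).
    trace_id = request_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    events = [("webserver", "entry", trace_id, time.time())]
    # Pass the same ID downstream so the app tier's events can be
    # joined with ours later.
    events += handle_app_server({TRACE_HEADER: trace_id})
    events.append(("webserver", "exit", trace_id, time.time()))
    return events


def handle_app_server(headers):
    trace_id = headers[TRACE_HEADER]
    t0 = time.time()
    time.sleep(0.01)  # stand-in for real work (DB call, cache lookup, ...)
    return [("appserver", "entry", trace_id, t0),
            ("appserver", "exit", trace_id, time.time())]


events = handle_front_end({})
# Every event shares one trace ID, so a collector can stitch them
# back into a single request timeline across machines.
```

Because every tier stamps its events with the same ID, the collector can reconstruct the whole request path without any tier knowing about the others.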
The product essentially consists of two parts. Its core is X-Trace (brainchild of Brown CS professor Rodrigo Fonseca), which provides the instrumentation. When the code executes, X-Trace performs all of the profiling and spits out raw metrics (akin to a log file). What Tracelytics is selling is the software and services that analyze those logs and produce all sorts of flashy graphs, making it possible to consume the raw data and visually determine where your computing power is being spent.
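To make the division of labor concrete, here's a hedged sketch of the kind of post-processing the analytics side might do: collapsing raw entry/exit events (the "log file") into per-tier durations. The event tuple format here is my invention, not X-Trace's actual output format:

```python
# Hypothetical raw trace events: (trace_id, host, label, timestamp_seconds).
raw_events = [
    ("abc123", "lighttpd",  "entry", 0.000),
    ("abc123", "rails-app", "entry", 0.004),
    ("abc123", "mysql",     "entry", 0.010),
    ("abc123", "mysql",     "exit",  0.032),
    ("abc123", "rails-app", "exit",  0.040),
    ("abc123", "lighttpd",  "exit",  0.041),
]


def per_tier_durations(events):
    """Pair each tier's entry/exit events to get wall-clock time per tier."""
    entries, durations = {}, {}
    for trace_id, host, label, ts in events:
        if label == "entry":
            entries[host] = ts
        else:
            durations[host] = ts - entries.pop(host)
    return durations


durations = per_tier_durations(raw_events)
# In this made-up trace, the MySQL span accounts for 22 of the
# request's 41 milliseconds -- exactly the "where is the time going"
# question the Gantt charts answer visually.
```

This is the easy half; the value in the paid product is presumably in doing this at scale and rendering it well, not in the arithmetic itself.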
The X-Trace piece includes modules for various platforms (MySQL, Rails, PHP, Lighttpd, etc.) and is pledged to be open-sourced, which is super cool. Unfortunately, everything visible (all the fancy Gantt charts and stuff) is the 'analytics' part that presumably will not be free. (Hey, they have to make money somehow.) But someone from the community could, for instance, release a Firebug plugin to retrieve and visualize the server-side trace for your last request.
A couple of questions I have that weren't asked at the meeting...
- How much do you have to tell X-Trace about your architecture? That is, do I need to say "the app server connects to the background processing service" or is it all transparent if it's instrumented?
- They mentioned that typically only a small percentage of requests are sampled on the production system. But how are those requests chosen? Instrumented code typically comes with a performance penalty, so are two sets of code running, with only a portion of requests directed to the instrumented version?
- How deep does the rabbit hole go? Their demo showed backtraces across servers, which is cool. But can you see OS-level calls? (That seems like it would be excessive, but it would be neat.) I'm also curious whether SQL calls are broken down beyond the query itself (e.g. can you see that you forgot to index a column that's being filtered?).
- Besides backtraces, can I see an execution time for every line of code? That seems like it would be particularly expensive to track, but presumably it would even let you find unused bits of code.
- Finally, do they support Java?
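On the sampling question, my guess (and it is only a guess, not their stated design) is that the instrumentation stays in place for every request but the expensive event recording is gated by a per-request sampling decision made once at the entry point. A sketch:

```python
import random

SAMPLE_RATE = 0.01  # trace roughly 1% of production requests
collected = []      # stand-in for the trace log


def record_event(label, request_id):
    collected.append((label, request_id))


def handle_request(request_id, sampled=None):
    # One coin flip at the front door; in a real system the decision
    # would ride along with the trace metadata so downstream tiers
    # don't re-roll the dice mid-request.
    if sampled is None:
        sampled = random.random() < SAMPLE_RATE
    if sampled:
        record_event("entry", request_id)  # only sampled requests pay full cost
    # ... real work would happen here ...
    if sampled:
        record_event("exit", request_id)
    return sampled


# Over many requests only ~1% generate trace events; the other ~99%
# run the same code path with just a cheap boolean check.
traced = sum(handle_request(i) for i in range(10_000))
```

That would answer my question above with "no, there's only one set of code" -- the uninstrumented path is the instrumented path with the recording switched off, which is much simpler than routing requests to separate builds.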
In any case, it's cool stuff, and as far as I know, something that's missing in the open-source world at the moment. They're looking for alpha test partners -- unfortunately all my sites run on AppEngine -- so hit them up if you're interested.