Thom Nichols

Technology is evolution outside the gene pool

Tracelytics: Firebug for the Server Side

I saw an interesting talk at last night's Providence Geeks by a startup called Tracelytics.  Their product is something you could roughly call "Firebug for the server side."  Tracelytics provides software profiling across the entire server side of a request.  Now this isn't necessarily anything new (DynaTrace comes to mind) but what's interesting is that Tracelytics actually profiles the entire request path, from the web server through the app server across RPC boundaries and even through the database.  So you can, for instance, see how much of the request time was consumed in your web front-end before the request was passed off to the app server, or see that there's a cache miss in one of your services (running on a separate machine) that's gumming up the works.   

The product essentially consists of two parts.  Its core is X-Trace (brainchild of Brown CS professor Rodrigo Fonseca) which provides the instrumentation.  When the code is executed, X-Trace essentially performs all of the profiling and spits out raw metrics (akin to a log file).  What Tracelytics is selling is the software and services that analyze those log files and provide all sorts of flashy graphs that make it possible to consume the raw data and visually determine where your computing power is being spent.
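
To make that split concrete, here is a toy sketch of the two halves in Python. To be clear, this is not X-Trace's API or Tracelytics' analysis: the `Tracer` class, the `span` helper, the event fields, and the layer names are all made up for illustration. The first half records one raw event per instrumented layer (the "log file"), and the second half reads that log back and works out how much time each layer spent on its own.

```python
import json
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Hypothetical instrumentation half: one raw event per finished span."""

    def __init__(self):
        self.events = []   # raw metrics, one dict per span
        self._stack = []   # ids of the spans that are currently open

    @contextmanager
    def span(self, layer, label):
        span_id = uuid.uuid4().hex[:8]
        parent = self._stack[-1] if self._stack else None
        self._stack.append(span_id)
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self._stack.pop()
            self.events.append({"id": span_id, "parent": parent,
                                "layer": layer, "label": label,
                                "ms": elapsed_ms})

    def dump(self):
        """Serialize the events as one JSON object per line."""
        return "\n".join(json.dumps(e) for e in self.events)


def self_time_by_layer(log_text):
    """Hypothetical analysis half: per-layer time, excluding nested spans."""
    events = [json.loads(line) for line in log_text.splitlines() if line]
    child_ms = {}
    for e in events:
        if e["parent"] is not None:
            child_ms[e["parent"]] = child_ms.get(e["parent"], 0.0) + e["ms"]
    totals = {}
    for e in events:
        own = e["ms"] - child_ms.get(e["id"], 0.0)
        totals[e["layer"]] = totals.get(e["layer"], 0.0) + own
    return totals


# Simulate one request passing through web server, app server and database.
tracer = Tracer()
with tracer.span("web", "GET /orders"):
    time.sleep(0.005)
    with tracer.span("app", "OrdersController#index"):
        time.sleep(0.010)
        with tracer.span("db", "SELECT * FROM orders"):
            time.sleep(0.020)

report = self_time_by_layer(tracer.dump())
for layer, ms in sorted(report.items(), key=lambda kv: -kv[1]):
    print(f"{layer:4s} {ms:7.1f} ms")
```

In a real deployment the layers would live in different processes on different machines, so the span id would have to travel with the request (in an HTTP header, say) rather than sit on an in-memory stack as it does here.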

The X-Trace piece includes modules for various platforms (MySQL, Rails, PHP, Lighttpd, etc.) and is pledged to be open-sourced, which is super cool.  Unfortunately, everything visible (all the fancy Gantt charts and stuff) is the 'analytic' part that presumably will not be free.  (Hey, they have to make money somehow.)  But someone from the community could (for instance) release a Firebug plugin to retrieve and visualize the server-side trace for your last request.

A couple of questions I have that weren't asked at the meeting:

  1. How much do you have to tell X-Trace about your architecture?  That is, do I need to say "the app server connects to the background processing service" or is it all transparent if it's instrumented?
  2. They mentioned that typically a small percentage of requests are sampled on the production system.  But how are the requests chosen?  If the code is instrumented, it typically comes at a performance penalty.  Are two sets of code running, and only a portion of requests are directed to the instrumented code?
  3. How deep does the rabbit hole go?  Their demo showed backtraces across servers, which is cool.  But can you see OS-level calls?  (That seems like it would be excessive, but it would be neat.)  I'm also curious whether SQL calls are broken down beyond the query (e.g. can you see that you forgot to index a column that's being filtered?)
  4. Besides backtraces, can I see an execution time for every line of code?  That seems like it would be particularly expensive to track, but presumably by doing so you could even find unused bits of code.
  5. Finally, do they support Java?
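
On question 2, here is one way I could imagine sampling working without two copies of the code. This is purely my guess, not something shown in the talk: the `SampledTracer` class, the `sample_rate` parameter, and the event labels are hypothetical. The idea is that the front end decides once per request whether to trace it, and every instrumentation point just checks that flag, so an untraced request pays for a few branches and nothing more.

```python
import random
import time


class SampledTracer:
    """Hypothetical tracer that records events only for sampled requests."""

    def __init__(self, sample_rate, rng=None):
        self.sample_rate = sample_rate      # fraction of requests to trace
        self.rng = rng or random.Random()
        self.events = []

    def start_request(self):
        """Decide once, at the edge, whether this request is traced."""
        return self.rng.random() < self.sample_rate

    def event(self, traced, label):
        if not traced:
            return                          # cheap path: one branch, no work
        self.events.append((label, time.perf_counter()))


def handle_request(tracer, traced):
    # The same instrumented code runs for every request; only the flag differs.
    tracer.event(traced, "web:enter")
    tracer.event(traced, "app:enter")
    tracer.event(traced, "db:query")
    tracer.event(traced, "web:exit")


tracer = SampledTracer(sample_rate=0.05, rng=random.Random(42))
traced_count = 0
for _ in range(1000):
    traced = tracer.start_request()
    traced_count += traced
    handle_request(tracer, traced)

print(f"traced {traced_count} of 1000 requests, "
      f"{len(tracer.events)} events recorded")
```

If it works anything like this, the flag would also have to be passed along across RPC boundaries so that downstream services trace the same requests the front end picked.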


In any case, it's cool stuff, and as far as I know, something that's missing in the open-source world at the moment.  They're looking for alpha test partners (unfortunately all my sites run on AppEngine), so hit them up if you're interested.

Go Tracelytics!

2 Comments

  1. Re: Tracelytics: Firebug for the Server Side Sept. 28, 2010 dan k

    Hey Thom--thanks for coming out to the event and sorry we didn't get to these.  I'll try to address the questions here:

    1. You don't have to configure anything about the relationships between the components -- X-Trace figures that all out for you.

    2. The code is optimized for the non-tracing case as well as the tracing case.  The overhead of having the tracing modules installed but not using them on a given request (sampling rate is configurable at the web server level) is very small; on the order of a few if statements.

    3. We do grab additional stats about queries: the amount of data transferred, whether an index was used, whether it was a good index, etc.  In terms of system calls, we're not catching most of them right now due to the overhead/value ratio.

    4. In order to keep tracing as lightweight as possible, we instrument a minimal set of points by default. However, users can add whatever further trace events they like, and we'll capture deltas in time, memory consumption, etc. between them.

    5. Yes, though we currently don't have support for any specific Java web frameworks.

    Hope that helps!
  2. Thanks Dan! Sept. 29, 2010 Thom

    Dan, thanks for the reply!  Hope to hear more about the growth of Tracelytics in Rhode Island.