
The Fog of Service

The 'classic' monitoring technologies (such as SNMP and syslog) are mostly concerned with machine- and server-level events, and don't support monitoring of what goes on at the higher messaging and application levels. Importantly, the interpretation of system-level events in terms of business-level issues is mostly non-existent in IT. On a project last year we used a combination of XMPP, RDF and Atom to provide a monitoring system that reported on multiple layers of the stack. It's proving useful, since the system can report on events at arbitrary levels and from arbitrary nodes. It's also semantically rudimentary: for example, it doesn't interpret collections of low-level events as a potential business issue. The implications of such events are left to the people operating the system, which requires domain knowledge.

It may be that NASA, Fedex, Formula One crews, and the odd supply chain have figured out how to make sense of telemetry noise, but it's an open issue for Web and service-oriented systems. It would be so much better if we had protocols and formats for this. There do not seem to be WS or Semweb specs targeted at this area (if I missed any, please let me know).

My current thinking is that the most useful feature direction for such a monitoring system would be to allow matchers to register a pattern, or set of patterns, that they should be notified about. That sounds a lot like a blackboard architecture or a content-based non-router.
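A minimal sketch of the matcher idea, not the system we built: matchers register a pattern (here just a predicate over an event) along with a handler, and every posted event is offered to each registration. The `Blackboard` class, the flat name/value event shape and the field names (`node`, `type`, `status`) are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]              # hypothetical event shape: flat name/value pairs
Pattern = Callable[[Event], bool]   # a pattern is any predicate over an event
Handler = Callable[[Event], None]   # called when the pattern matches


@dataclass
class Blackboard:
    """Holds (pattern, handler) registrations and dispatches posted events."""
    registrations: List[Tuple[Pattern, Handler]] = field(default_factory=list)

    def register(self, pattern: Pattern, handler: Handler) -> None:
        self.registrations.append((pattern, handler))

    def post(self, event: Event) -> int:
        """Notify every matcher whose pattern accepts the event; return how many."""
        notified = 0
        for pattern, handler in self.registrations:
            if pattern(event):
                handler(event)
                notified += 1
        return notified


board = Blackboard()
disk_alerts: List[Event] = []
order_alerts: List[Event] = []

# Machine-level matcher: any event reporting low disk space.
board.register(lambda e: e.get("type") == "disk" and e.get("status") == "low",
               disk_alerts.append)
# Application-level matcher: any failed order, from any node.
board.register(lambda e: e.get("type") == "order" and e.get("status") == "failed",
               order_alerts.append)

board.post({"node": "db1", "type": "disk", "status": "low"})
board.post({"node": "web3", "type": "order", "status": "failed"})
board.post({"node": "web3", "type": "heartbeat", "status": "ok"})  # matches nothing
```

Each matcher here sees one event at a time. Recognising a collection of low-level events as a single business issue would need matchers that keep state across events, which this sketch leaves out.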


February 18, 2005 01:03 AM

Comments

anon
(February 18, 2005 03:13 AM #)

webMethods has a product, "Optimize", that claims to be able to correlate patterns of low-level events with high-level events.

http://www.webmethods.com/meta/default/folder/0000006336?time=416445452510078229

Scott
(February 18, 2005 06:02 AM #)

Would you mind elaborating on this:

Importantly the interpretation of system level events with business level issues is mostly non-existent in IT.



I've been contributing to the development of Chainsaw (http://logging.apache.org/log4j/docs/chainsaw.html), which was originally designed as a tool to help developers. A number of features have been added that may be helpful in addressing some of the monitoring concerns you discuss.

Here are a few of Chainsaw's features:


  • receive events from multiple sources (text files, databases, vfs, network)

  • route events to tabs in the UI based on content (by default, events sharing both host and application name are routed to the same tab)

  • slice & dice events on a tab (colorize, search and filter using a simple expression language)

  • support the creation of event views (combine events from multiple sources into a single tab)

  • support custom event 'properties' in the expression language and the UI (name/value pairs assigned by the person creating the event)

Event properties and the expression language come into play when we start to look for ways to recognize patterns in the event system. The key is for the business unit to define a set of naming conventions for event properties, and for the dev team to trigger events containing those properties when needed.

Here's an example:


  • An app triggers an event that contains an 'INVENTORY' property. The INVENTORY property holds some value that makes sense to the business unit.

  • Chainsaw is running and the tab displaying events for this process (or processes) is filtering events using this expression:

    PROP.INVENTORY exists


In the UI, only events that contain an INVENTORY property would be displayed in the tab, and the person monitoring the system would be able to take appropriate action.
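To make the filtering step concrete, here is a rough Python analogue of what `PROP.INVENTORY exists` selects. It is only a sketch of the idea, not Chainsaw's implementation: the event shape (a dict with an optional `properties` mapping), the helper names and the sample values are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]  # hypothetical event: a message plus an optional 'properties' dict


def has_property(event: Event, name: str) -> bool:
    """Rough analogue of the expression `PROP.<name> exists`."""
    return name in event.get("properties", {})


def filter_events(events: Iterable[Event], name: str) -> List[Event]:
    """Keep only the events that carry the named property."""
    return [e for e in events if has_property(e, name)]


events = [
    {"message": "order placed", "properties": {"INVENTORY": "widget-low"}},
    {"message": "cache refreshed", "properties": {}},
    {"message": "user logged in"},  # no properties at all
]

visible = filter_events(events, "INVENTORY")
for e in visible:
    print(e["message"], e["properties"]["INVENTORY"])
```

The filter only checks that the property is present. What its value means is left to the naming conventions agreed with the business unit.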



And one more:

  • 5 app servers are writing to their own log files.

  • Chainsaw is configured to process each of these log files (events for each app server are routed to their own tab).

  • The person monitoring the system has defined a 'view' using this expression:

    LEVEL >= WARN


With this configuration, regardless of which app server generates a warning, the event notifications are displayed on a single screen. Only one place to look.
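The same 'view' idea can be sketched in a few lines of Python: gather the events from every source, keep those at or above a threshold level, and present them in time order. This is an illustration under assumptions, not how Chainsaw works internally. The `Event` tuple, `build_view` and the sample data are hypothetical, and the numeric levels are borrowed from Python's standard `logging` module rather than log4j.

```python
import logging
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    timestamp: float   # seconds, used only for ordering
    source: str        # which app server produced the event
    level: int         # numeric level, e.g. logging.WARNING
    message: str


def build_view(sources: Dict[str, List[Event]],
               min_level: int = logging.WARNING) -> List[Event]:
    """Combine events from all sources, keep level >= min_level, sort by time."""
    combined = [e for events in sources.values() for e in events
                if e.level >= min_level]
    return sorted(combined, key=lambda e: e.timestamp)


sources = {
    "app1": [Event(1.0, "app1", logging.INFO, "started"),
             Event(4.0, "app1", logging.ERROR, "db timeout")],
    "app2": [Event(2.0, "app2", logging.WARNING, "slow response")],
    "app3": [Event(3.0, "app3", logging.DEBUG, "cache hit")],
}

view = build_view(sources)
for e in view:
    print(e.timestamp, e.source, logging.getLevelName(e.level), e.message)
```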



Does this sync with what you were looking for? I'm interested in making Chainsaw useful in IT/Operations and would appreciate any suggestions you have.

Mike
(February 18, 2005 04:34 PM #)

I'm intrigued by the fledgling BugsAppender project, using log4j appenders to generate notifications of issues in production systems.

http://bugsappender.sourceforge.net/

Bill de hOra
(February 18, 2005 04:35 PM #)

Wow. Thanks Scott, I had no idea Chainsaw had those kinds of features. Chainsaw was a candidate component for the monitoring app I mentioned. I'll be taking a closer look.
