CMIS Specifics

Bex Huff on CMIS: "I have some issues with this, because I feel APP isn't robust enough for large scale syndication."

AtomPub is a posting protocol, not a syndication protocol.

"There simply is no guarantee of quality of service when you're using "feeds","

What does "quality of service" mean?

"and polling-based architectures simply don't scale to thousands of enterprise applications. That's the dirty little secret that ReST fanboys don't want you to find out..."

Someone might want to tell that to the web syndication world. I think their web is bigger than your enterprise.

There's probably a point in Bex's post: ECM can get very complicated. But criticising web technology like RSS/Atom/AtomPub/HTTP needs a lot more precision about where exactly it falls short. For example:

  • Versioning
  • Synchronisation
  • Private/restricted content
  • User varying content
  • Conflict resolution
  • Batching
  • Error codes
  • Translation
  • Editing workflows
  • Composite documents
  • Multipart posting
  • Security
  • Search (including thesauri and vocabularies)
  • Partial updates
  • Publishing (and multichannel publishing)
  • Link verification
  • Metadata management
  • Multiformat export

which is the meaty stuff once you get beyond basic CRUD work. But that would require a more detailed post and less handwaving ;)

10 Comments


    > I think their web is bigger than your enterprise.

    Oh snap!


    [Google hat on] The last time we checked, 16% of the pages in Google's index had at least one feed autodiscovery element. (Sorry, I can't share absolute numbers.) Obviously many of these pages link to the same feed, but still, that's a lot of feeds. And that was in 2006. [Google hat off]

    Reading the linked article reminds me of the arguments we had like 5 years ago, when people complained that feed aggregators were sucking up so much bandwidth because they were downloading their entire feed once an hour. Then someone politely explained ETags and they skulked away quietly, mumbling something about how they still couldn't imagine how this could possibly scale. Good times.
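    For anyone who missed the ETags discussion, here is a minimal sketch of a conditional GET, modelled in-process rather than over a real HTTP connection. The names `make_etag`, `serve_feed` and `Poller` are made up for illustration, and hashing the body is just one common way to derive an ETag, not something the comment prescribes.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from the feed bytes (an assumption: servers may use any scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def serve_feed(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Toy stand-in for a server answering GET with an optional If-None-Match header."""
    etag = make_etag(body)
    if if_none_match == etag:
        # Feed unchanged: 304 Not Modified, no body goes over the wire.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


class Poller:
    """Hypothetical aggregator that remembers the last ETag it saw."""

    def __init__(self) -> None:
        self.etag: Optional[str] = None
        self.bytes_downloaded = 0

    def poll(self, current_feed: bytes) -> bool:
        status, headers, payload = serve_feed(current_feed, self.etag)
        self.bytes_downloaded += len(payload)
        if status == 200:
            self.etag = headers["ETag"]
        return status == 200  # True when a full body was transferred


feed_v1 = b"<feed>" + b"<entry/>" * 100 + b"</feed>"
poller = Poller()
# 24 hourly polls against a feed that never changes: only the first one costs a download.
results = [poller.poll(feed_v1) for _ in range(24)]
```

    The saving only applies while the whole representation is unchanged; any change to the feed produces a new ETag and a full 200 response.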


    ETags!?!? FAIL.

    An ETag is only useful if the whole feed is unchanged.

    If one single item in a feed with 100 items changed, then you have to download the whole thing again. That's 100 times the required bandwidth, despite lackluster and frequently unused caching tags.

    If you go the other extreme and have one item in a feed to maximize cache hits, then you risk losing data because feed updates might be more frequent than your polling. Poll once per day, update twice per day, and you lose half the data.
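    The "lose half the data" arithmetic can be checked with a small simulation. This is a sketch under simplifying assumptions: evenly spaced updates and polls, a feed that keeps only its newest `feed_size` entries, and a hypothetical helper name `fraction_seen`.

```python
from fractions import Fraction


def fraction_seen(updates_per_day: int, polls_per_day: int, feed_size: int, days: int = 10) -> Fraction:
    """Fraction of published entries that a poller ever sees.

    Assumes evenly spaced publishes and polls, and a feed that only exposes
    the newest `feed_size` entries at any moment.
    """
    publish_times = [Fraction(i, updates_per_day) for i in range(1, updates_per_day * days + 1)]
    poll_times = [Fraction(j, polls_per_day) for j in range(1, polls_per_day * days + 1)]
    seen = set()
    for t in poll_times:
        # Indices of everything published up to and including this poll.
        published = [i for i, p in enumerate(publish_times) if p <= t]
        # The poller only gets whatever is still in the feed window.
        seen.update(published[-feed_size:])
    return Fraction(len(seen), len(publish_times))


one_item_feed = fraction_seen(updates_per_day=2, polls_per_day=1, feed_size=1)
two_item_feed = fraction_seen(updates_per_day=2, polls_per_day=1, feed_size=2)
```

    With one item in the feed the poller sees half the entries; with a window of two it sees all of them, so the loss depends on window size relative to the update and poll rates.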

    No matter how you cut it, messaging is more robust and scalable than polling any day of the week. I'm not the first, nor will I be the last, to call attention to this pretty frigging obvious problem:

    http://www.oreillynet.com/conferences...


    bex: did you benchmark that? Seriously, assuming performance characteristics for something like this without even having a proper use case (how many changes, how many pollers) is unscientific. I doubt your example would really be a problem in the real world (tm).

    And for your missed updates problem, there is paging. I find it interesting to say that messaging is more robust than polling - I actually think the opposite. Messaging gets awkward when recipients are offline, or overwhelmed by message amounts, or when a server runs into problems managing _many_ different recipients, etc. It might have certain performance advantages in certain limited cases, but it's absolutely not more robust than polling.


    Bex: "If one single item in a feed with 100 items changed, then you have to download the whole thing again. That's 100 times the required bandwidth, despite lackluster and frequently unused caching tags."

    Right. Which matters more - the bandwidth cost, or having to hit the DB again to get a precise delta?

    "If you go the other extreme and have one item in a feed to maximize cache hits, then you risk losing data because feed updates might be more frequent than your polling. Poll once per day, update twice per day, and you lose half the data."

    RFC5005, feed paging - page until you hit the first matching local id/updated pair, then stop.
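    A minimal sketch of that stop rule, assuming a paged feed listed newest first. The page URLs, entry ids and dates are invented for illustration, and the `"next"` key stands in for a `rel="next"` link; a real client would fetch each page over HTTP and parse the Atom.

```python
from typing import Dict, List, Optional, Set, Tuple

Entry = Tuple[str, str]  # (atom:id, atom:updated)

# Hypothetical paged feed, newest entries first.
PAGES: Dict[str, dict] = {
    "/feed?page=1": {"entries": [("e6", "2000-01-06"), ("e5", "2000-01-05")], "next": "/feed?page=2"},
    "/feed?page=2": {"entries": [("e4", "2000-01-04"), ("e3", "2000-01-03")], "next": "/feed?page=3"},
    "/feed?page=3": {"entries": [("e2", "2000-01-02"), ("e1", "2000-01-01")], "next": None},
}


def fetch_new_entries(start: str, seen: Set[Entry]) -> Tuple[List[Entry], int]:
    """Walk pages from newest to oldest until a known (id, updated) pair turns up."""
    new_entries: List[Entry] = []
    pages_fetched = 0
    url: Optional[str] = start
    while url is not None:
        page = PAGES[url]  # a real client would issue an HTTP GET here
        pages_fetched += 1
        for entry in page["entries"]:
            if entry in seen:
                # First match against local state: everything older is assumed known.
                return new_entries, pages_fetched
            new_entries.append(entry)
        url = page["next"]
    return new_entries, pages_fetched


seen = {("e3", "2000-01-03"), ("e2", "2000-01-02"), ("e1", "2000-01-01")}
new_entries, pages_fetched = fetch_new_entries("/feed?page=1", seen)
```

    Here the client picks up e6, e5 and e4 and stops on the second page without touching the third. Note that this rule does not catch edits to entries older than the first match.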

    "No matter how you cut it, messaging is more robust and scalable than polling any day of the week."

    There's more to it than that. If I service a lot of users, how do I manage those open connections to push messages out? Where do I keep messages when the user is offline? For how long? How do I transmit large files? And so on.

    "I'm not the first, nor will I be the last, to call attention to this pretty frigging obvious problem:"

    So, the most scalable messaging system is email. Will that work for you?


    wrt security. I am trying.


    "Right. Which matters more - the bandwidth cost, or having to hit the DB again to get a precise delta?"

    If well designed, a feed should be either statically published, or dynamically cached. So... the bandwidth cost is more important.

    "RFC5005, feed paging - page until you hit the first matching local id/updated pair, then stop."

    RFC 5005 is a good start, but not enough. Here's a quote from the RFC itself:

    "Paged feeds are lossy; that is, it is not possible to guarantee that clients will be able to reconstruct the contents of the logical feed at a particular time."

    Regarding messaging, I think we're not talking about the same thing... I'm thinking about a JMS or XMPP wrapper around the data packets.

    What I'd like is a simplified API that is ReST-inspired, but one that supports a (shudder) universal endpoint, and a wrapper. That means it's easy to use from an HTML form post, an HTTP client, and through JMS...

    And I'd like to be able to do that without ReST fanatics screaming bloody murder ;-)


    "So... the bandwidth cost is more important."

    It's hard to dynamically cache a feed request that is designed to bust caches - which is what a lot of "since" + "protected" content requests do. Maybe that doesn't matter for small enterprise systems that don't have millions of users ;)

    "Regarding messaging, I think we're not talking about the same thing... I'm thinking about a JMS or XMPP wrapper around the data packets."

    So am I. Use JMS a lot.

    Try standing up an XMPP stack against a lot of users - a few hundred thousand to start, with long lived connections. First thing to go are the load balancers, next issue will be the overhead of the presence write to the database/mnesia/whatever, next thing will be the cluster swarming because some of the XMPP systems use S2S for cluster comms. There's no free lunch here that I can see.

    But email still has the biggest reach, as long as we're ignoring file sharing networks or p2p systems. Interestingly, email is a hybrid - a push to pull model. Why is that good? Frees up connection state on the server statistically speaking; clients drop off.

    Check again the high-thruput, fast, always-on JMS systems you know about. I think you'll find most of them have a relatively low number of listeners/subscribers, with a metric buttload of messages. They're ideal for point to point scenarios. Not the same as having a lot of users show up for data, and hog connections all day.

    "And I'd like to be able to do that without ReST fanatics screaming bloody murder ;-)"

    *Who* are these REST fanatics? Even if you can find them, how can you be a fanatic about an architectural model that has its technical properties written down? Objectively. Unlike, say, any of the following (EAI, WS, ECMS, SOA), REST can actually be evaluated as being fit for purpose, or not. Sounds dangerously close to actual engineering ;)


    what are your thoughts on using SIP for large scale syndication?



