
Managing stuff with a web server

In 1755 Samuel Johnson wrote:

I saw that one enquiry only gave occasion to another, that book referred to book, that to search was not always to find, and to find was not always to be informed.

We're still nowhere.

Sean is asking interesting questions about URI applications and local data access, linking to a piece by Micah Dubinko.

These days, I mostly use web servers to manage my files, having gone through a number of iterations using folders and ad-hoc scripts. The current modus operandi is to install Apache, running or backing the following:

  • Python CGIs
  • Wiki
  • Subversion

I also have public webspace set up with Python and a Wiki (I might get around to begging my provider for Subversion one day, but I'm not hopeful). Spidering into Lucene is imminent. So are RDF and TM metadata (every folder I've ever created was ultimately an exercise in shoehorning descriptive metadata into folder names).

It's not complicated. You don't need a full-blown setup to manage this and enter data. Apache can be configured to serve from a directory, which gives you web access without the overhead of forms or WebDAV. Just drag and drop files into the folder and you're done. Access control is a problem for any naively configured client machine running a web server. The upside of Apache is that if you're worried about access control, adding .htaccess files to a folder is straightforward.

Here's an example of how this is useful. In Propylon we have a Red Hat-backed server called Nimoy that has an SMB share mapped into Apache document space. Folks can drag and drop dependencies for any project into that share - jar files, databases, app servers. For example, with a Java project we can zip up all the jars for the project and drag them into a folder on Nimoy. An Ant file can then pull down the dependency zip using the get task and unzip it into the project's lib/ folder. This keeps version control free of binary cruft, provides a single canonical place to hold jar files, and lets people get started without downloading files from all over the net or having them emailed. [Yes, Maven can manage jar files using HTTP as well.]
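For anyone not using Ant, the same pull-and-unzip step can be sketched in a few lines of Python. This is only an illustration, not what we actually run: the function name fetch_and_unpack is made up for the example, and any URL you pass it (the real path on Nimoy is not given here) is your own assumption.

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_unpack(url, lib_dir):
    """Download a zip of dependencies from `url` and extract it into `lib_dir`.

    Hypothetical helper: it mirrors the "get, then unzip into lib/" step
    described above, using only the Python standard library.
    """
    lib_path = Path(lib_dir)
    lib_path.mkdir(parents=True, exist_ok=True)

    # Fetch the whole archive into memory; fine for a modest bundle of jars.
    with urllib.request.urlopen(url) as response:
        payload = response.read()

    # Unpack the archive and report which entries it contained.
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(lib_path)
        return sorted(archive.namelist())
```

A call would look something like fetch_and_unpack("http://nimoy.example/deps/myproject.zip", "lib"), where the host and path are placeholders. Only run this against a server you trust, since the archive is extracted as-is.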

The next step is to split the filesystem from file management altogether. What do I mean? Well, over the years I've moved away from a place where I would think hard about how to file everything away (where what I could do was predetermined by the file system at hand). I haven't been able or willing to do that for years - there's too much to classify and too many ways to classify it and I'm not paying myself to be a librarian. Then consider that folder-based classification doesn't help with retrieval anyway unless you carry that classification scheme in your head all the time. Life's too short.

I prefer the fire-and-forget mode that is enabled by giving things URIs and putting them behind web servers. Everything else I've tried or seen was too complicated. I could imagine never classifying or sorting anything based on folders within a couple of years, preferring something like a Topic Map instead to tag the files with metadata - not that as a user I'd actually care how it's done. WinFS seems to be going in this direction; we'll see.

June 21, 2004 09:06 PM


Tom Passin
(June 21, 2004 10:01 PM #)

I've been getting more and more interested in something of the sort myself. One problem that I foresee is that once you have a large amount of stuff in the system, it will still be hard to remember what to ask for, and hard to recognize it when you find it, meta data or no.

For this we need good, creative solutions. I have read that for every order of magnitude increase in the size of a collection - I'm speaking of library-style collections but it must apply just as well here - you need a new organizing principle to help the user find things.

What works for hundreds isn't so good for thousands, and may be hopeless for tens of thousands.

(June 22, 2004 07:28 PM #)

Well, if you have a wiki page of metadata per document, then you tend toward a semi-controlled vocabulary which might allow for some browsing-by-category (and intersection of categories).
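To make the browsing-by-category idea concrete, here is a rough Python sketch. It assumes the per-document metadata has already been pulled out of the wiki pages into a plain mapping of document name to tags; that extraction step, and the names build_index and browse, are made up for the example.

```python
from collections import defaultdict


def build_index(doc_tags):
    """Invert {document: tags} into {tag: set of documents}."""
    index = defaultdict(set)
    for doc, tags in doc_tags.items():
        for tag in tags:
            index[tag].add(doc)
    return index


def browse(index, *categories):
    """Return the documents filed under every one of the given categories."""
    if not categories:
        return set()
    # An unknown category contributes an empty set, so the result is empty.
    matches = [index.get(category, set()) for category in categories]
    return set.intersection(*matches)
```

With one category this is plain browsing; with two or more it gives the intersection. How well it works still depends on how consistently the vocabulary is applied.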


Tom Passin
(June 26, 2004 04:10 AM #)

Well, you know, creating and maintaining those categories is harder than it seems, and searching by them may or may not be that great, depending on how well they match your second-by-second thinking and associations.

I actually have been taking some steps in this direction by enhancing my topic map-based bookmark assistant. The system combines bookmarks from all the browsers I use, and builds a topic map. The app does some analysis of the bookmark folders and creates a lot of relationships between them. It works very well, and with some 2700 bookmarks, I don't know how I would manage without it.

The recent enhancement is to have the batch file (which combines the bookmarks and turns them into XML) look at a specific subtree of directories and merge them and their contents into the bookmark topic map.

So now the bookmark assistant will show those files exactly as if they were bookmarks. Very convenient. No maintenance of categories is needed because they all get rebuilt from the actual directory names each time the batch process is run to rebuild the topic map. Any directory names automatically get related to bookmark folders with the same names. Search results get returned in two ways - by bookmark or file title, and by classifying according to the folders they reside in. Soon I may add on-the-fly clustering as well.
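The rebuild-from-directory-names idea can be sketched roughly like this in Python. To be clear, this is not the actual bookmark assistant and it involves no topic map engine; it only illustrates deriving categories from directory names on each run and relating them to bookmark folders of the same name. The function names and the shape of the bookmark data (folder name mapped to a list of URLs) are assumptions made for the example.

```python
import os
from collections import defaultdict


def categories_from_tree(root):
    """Map each directory name under `root` to the files stored in it."""
    categories = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        # The directory's own name acts as the category label.
        name = os.path.basename(os.path.normpath(dirpath))
        for filename in sorted(filenames):
            categories[name].append(os.path.join(dirpath, filename))
    return dict(categories)


def merge_with_bookmarks(file_categories, bookmark_folders):
    """Relate directories to bookmark folders that have exactly the same name."""
    merged = defaultdict(list)
    for source in (bookmark_folders, file_categories):
        for name, items in source.items():
            merged[name].extend(items)
    return dict(merged)
```

Because everything is recomputed from the directory tree on each run, renaming or moving a directory changes the categories the next time the batch process runs, with nothing to maintain by hand.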

The way it works right now, you still have to name the files and directories decently or the whole scheme won't work so well. But it could certainly be extended to use other meta data. The search facilities make it fairly easy to find things, based on titles and the locations they are filed in. The system provides a wonderful ability to find things you had forgotten about or did not realize were related to something you were looking for.

The system lets you add (by hand) your own meta data and remarks about any bookmark or file, but so far doesn't use that information for searching.

I don't think that plain searching is adequate, especially for a very large collection. I think that some kind of grouping (or more than one kind), and lists to look through, are also needed, along with some more really clever ideas.