xerces.hell

A whole new level of classpath nonsense has presented itself in the last few days.

Start.

So, I've got this code to validate a messaging envelope (not SOAP-based). The validation component sits at the front of an XML pipeline. I'm adding a config file for loading up document schemas and DTDs. Very nice, no code changes to the validator for new versions of a schema. The config file is pointed to by a .properties entry. Lots of unit tests with good coverage, testing the configuration as well as the behaviour. And using Ant, the classpath is under control (more on this).

Deploy to JBoss. First burp, the .properties file in the container is using backslashes, so the path to the config is a mashup. Fixed that, but it took a while to see.
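For anyone who hasn't been bitten by this yet, here is a minimal sketch of why backslashes can mangle a path, assuming the entry is read through java.util.Properties. The key name validator.config and both paths are made up for illustration. Backslash is the escape character in the .properties format, and a backslash in front of a character that isn't a recognised escape is silently dropped on load.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class BackslashDemo {
    // Parses .properties text and returns the value of the
    // hypothetical key "validator.config".
    static String load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("validator.config");
    }

    public static void main(String[] args) {
        // File content: validator.config=C:\conf\schemas.xml
        // The backslashes are treated as escapes and dropped.
        System.out.println(load("validator.config=C:\\conf\\schemas.xml")); // C:confschemas.xml

        // File content: validator.config=C:/conf/schemas.xml
        // Forward slashes pass through untouched.
        System.out.println(load("validator.config=C:/conf/schemas.xml")); // C:/conf/schemas.xml
    }
}
```

Forward slashes, or doubled backslashes in the file itself, both survive the load.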

Bounce JBoss. Second burp. MethodInvocationException, no such method. Hmm. Track it down to a class called CCI, the actual configurator (it loads individual schema files bound to version numbers of the schema proper). The constructor for CCI isn't happy for some reason:


public CCI() throws Exception
{
    itsDocumentBuilderFactory = DocumentBuilderFactory.newInstance();
    itsDocumentBuilderFactory.setValidating(true);
    itsDocumentBuilderFactory.setIgnoringElementContentWhitespace(true);
    itsDocumentBuilderFactory.setIgnoringComments(true);
    itsDocumentBuilderFactory.setCoalescing(false);
    itsDocumentBuilderFactory.setNamespaceAware(false);
    itsDocumentBuilderFactory.setExpandEntityReferences(false);
    itsDocumentBuilder = itsDocumentBuilderFactory.newDocumentBuilder();
}

Looks fine, compiles fine, tests fine. Doesn't run in the container. I'm lost. After some checking, it turns out that this line:

    itsDocumentBuilderFactory.setIgnoringElementContentWhitespace(true);

is the problem. More checking. I'm really lost now. Start moving jarfiles around and around, namely xerces.jar and xercesImpl.jar. [I'm sure XML parsing jars are closely related to rabbits, because they seem to pop up everywhere in the system. You look in a folder, come back to it an hour later and there's a new one sitting there].

I find the xerces.jar that is being loaded and put it into the build classpath for IDEA and Ant. Wow. They both tell me that indeed there is no such method as:


itsDocumentBuilderFactory.setIgnoringElementContentWhitespace(true);

on DocumentBuilderFactory. Oh really? Well that flag is needed, to avoid checking for (ignorable) ws nodes as the XML is walked (before you ask, yes, it's not a mixed content model and each config file has the DTD inlined). The jar has no version information in its manifest entry.

So I take it out of the classpath, and drop in xerces from Ant 1.5.1 (the one I'm building and testing against, that happens to work fine). Bounce JBoss. Out of memory exception. I'm not going there. Try another xerces or two from who knows where else - more out of memory exceptions. Eventually, I use Tomcat 4.0.1's.

Success.

No. Wait. Third burp. The system property org.xml.sax.driver is not set anymore (according to Jing's RELAX NG validator). Ok, set the driver.
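A rough sketch of setting the driver in code, assuming Xerces is the SAX parser you want. org.apache.xerces.parsers.SAXParser is the Xerces SAX parser class, and org.xml.sax.driver is the property that XMLReaderFactory.createXMLReader() consults. The class and method names here are my own.

```java
public class SaxDriverDefault {
    // Assumption: Xerces is the parser we want SAX clients to pick up.
    static final String XERCES_SAX_PARSER = "org.apache.xerces.parsers.SAXParser";

    // Sets org.xml.sax.driver only when nothing else has set it,
    // and returns the value that is in effect afterwards.
    static String ensureDriver() {
        String current = System.getProperty("org.xml.sax.driver");
        if (current == null || current.isEmpty()) {
            System.setProperty("org.xml.sax.driver", XERCES_SAX_PARSER);
            current = XERCES_SAX_PARSER;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println("org.xml.sax.driver = " + ensureDriver());
    }
}
```

The same thing can be done without code by passing -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser to the JVM at startup, which keeps the choice out of the application.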

Success.

After all that, I go back and add the (ignorable) ws checking anyway. God knows I don't want this breaking because the classpath has changed.
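The check itself is small. This is a sketch of the idea rather than the code from the system: while walking the tree, skip any text node that is whitespace only, so the walk no longer depends on the parser honouring setIgnoringElementContentWhitespace. The class name, the helper names and the sample XML are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class WhitespaceSafeWalk {
    // True for text nodes that contain nothing but whitespace.
    static boolean isWhitespaceOnly(Node n) {
        return n.getNodeType() == Node.TEXT_NODE
                && n.getNodeValue().trim().isEmpty();
    }

    // Children of parent, minus whitespace-only text nodes.
    static List<Node> significantChildren(Node parent) {
        List<Node> kept = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (!isWhitespaceOnly(c)) {
                kept.add(c);
            }
        }
        return kept;
    }

    // Parses with a default factory, so no whitespace-stripping
    // flag is involved at all.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<config>\n  <schema/>\n  <dtd/>\n</config>");
        for (Node n : significantChildren(doc.getDocumentElement())) {
            System.out.println(n.getNodeName());
        }
    }
}
```

It costs a little on every walk, but it behaves the same whichever parser jar wins the classpath lottery.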

Finish.

Time elapsed, nearly 2 days. I am rage. I feel so damn stupid.

It would have taken much longer without this call, which tells where a class came from:


Object o = ...
// where the class of o was loaded from (jar or directory)
String location = o.getClass().getProtectionDomain().getCodeSource().getLocation().toString();

Without this call, I would have probably broken down like a baby and written a SAX handler (even though the config file structure is recursive and a DOM tree is handy for that). It's going into the logging framework for this system, I hope soon.
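If it does go into a logging framework, it probably wants a guard or two. A sketch, with names of my own choosing: getCodeSource() can return null, typically for classes loaded by the bootstrap class loader such as java.lang.String, and the one-liner would then throw a NullPointerException.

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;

public class ClassOrigin {
    // Returns where a class was loaded from, or a placeholder
    // when the JVM does not expose a code source for it.
    static String locationOf(Class<?> type) {
        ProtectionDomain domain = type.getProtectionDomain();
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source, probably the bootstrap class loader)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // An application class normally reports a jar or a classes directory.
        System.out.println(locationOf(ClassOrigin.class));
        // A core JDK class usually has no code source.
        System.out.println(locationOf(String.class));
    }
}
```

Logging locationOf(DocumentBuilderFactory.newInstance().getClass()) at startup would show which parser implementation the container actually picked.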

Here is an example of how to use Ant to add manifest info to a jar:


<target name="mdb-jar" depends="compile" description="generate the mdb-jar file">
    <mkdir dir="${build.lib}"/>
    <mkdir dir="${build.classes}/${meta.inf}"/>
    <copy todir="${build.classes}/${meta.inf}">
        <fileset dir="${src.dir}/${meta.inf}/MDB-INF">
        </fileset>
    </copy>
    <jar jarfile="${build.lib}/${mdb.jar.name}">
        <manifest>
            <attribute name="Built-By" value="${user.name}"/>
            <attribute name="Sealed" value="${mdb.jar.Sealed}"/>
            <attribute name="Specification-Title" value="${mdb.jar.Specification-Title}"/>
            <attribute name="Specification-Version" value="${mdb.jar.Specification-Version}"/>
            <attribute name="Specification-Vendor" value="${mdb.jar.Specification-Vendor}"/>
            <attribute name="Implementation-Title" value="${mdb.jar.Implementation-Title}"/>
            <attribute name="Implementation-Version" value="${mdb.jar.Implementation-Version}"/>
            <attribute name="Implementation-Vendor" value="${mdb.jar.Implementation-Vendor}"/>
        </manifest>
        <fileset dir="${build.classes}">
            <include name="${meta.inf}/**"/>
            <patternset refid="jar.target.mdb.inclusion.set"/>
        </fileset>
    </jar>
    <delete dir="${build.classes}/${meta.inf}"/>
</target>
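Once a jar carries those attributes, they can be read back at runtime through java.lang.Package. A small sketch, with a class name of my own; a jar with no version information in its manifest, like the mystery xerces.jar above, simply yields nulls, which the helper turns into readable placeholders.

```java
public class JarVersion {
    // Describes the Implementation-Title and Implementation-Version
    // recorded in the manifest of the jar a class was loaded from.
    static String describe(Class<?> type) {
        Package pkg = type.getPackage();
        String title = (pkg == null) ? null : pkg.getImplementationTitle();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return type.getName() + ": "
                + (title == null ? "(no Implementation-Title)" : title) + " "
                + (version == null ? "(no Implementation-Version)" : version);
    }

    public static void main(String[] args) {
        // Run from a plain classes directory, this class has no manifest to report.
        System.out.println(describe(JarVersion.class));
        // What a core JDK class reports depends on the JDK in use.
        System.out.println(describe(String.class));
    }
}
```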


February 11, 2003 01:36 PM

Comments

Danny Ayers
(February 11, 2003 11:02 PM #)

Hi Bill,
After thinking I'd done with classpath problems when I started putting it in the jar's manifest, last week I was caught again, also by Xerces classes because:

>Standard libraries introduced with jdk 1.4 have precedence over the
>libraries in the classpath, so in the case the standard library isn't
>suitable it has to be overridden by setting "java.endorsed.dirs".

Danny.

Trackback Pings

Listed below are links to weblogs that reference xerces.hell:

» Bill de hÓra: xerces.hell from Raw Blog
Is there a Patron Saint of classpaths?

Tracked on February 11, 2003 11:03 PM

» More on XML libraries & classpath from java work
Seems like I'm not the only one running into classpath & jar problems. I've been perusing weblogs and found this about Bill de hÓra: xerces.hell. And Steve wrote a classpath verifier... neat!

Tracked on February 16, 2003 11:10 PM