
Nutch with basic authentication

Cool. I seem to have hacked Nutch into spidering with basic authentication. The protocol-httpclient plugin has the guts of an approach for authentication, but it doesn't work out of the box. It wasn't hard to fix on the Nutch trunk. What I have - see a 401, retry with the auth details - is probably OK for an intranet, but a web/wide-area scenario really needs preemptive authentication, where the credentials go out with the first request instead of waiting for a challenge. The next thing to do would be to patch protocol-httpclient further to read in all the http.auth.basic.* properties and configure them on HttpClient. I'll try to post details soon.
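
To make the difference between the two approaches concrete, here is a rough sketch in Python. It is not the Nutch patch and does not use HttpClient; `FakeServer`, `fetch_reactive`, `fetch_preemptive` and the credentials are all made-up names for illustration. The one thing it shows for certain is that the retry-on-401 approach costs two requests per protected URL, while sending the header up front costs one.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


class FakeServer:
    """Hypothetical stand-in for a protected site; counts requests received."""

    def __init__(self, username, password):
        self.expected = basic_auth_header(username, password)
        self.requests = 0

    def get(self, headers):
        self.requests += 1
        if headers.get("Authorization") == self.expected:
            return 200
        return 401  # challenge: credentials missing or wrong


def fetch_reactive(server, username, password):
    """See a 401, then retry with the auth details."""
    status = server.get({})
    if status == 401:
        auth = basic_auth_header(username, password)
        status = server.get({"Authorization": auth})
    return status


def fetch_preemptive(server, username, password):
    """Send the credentials on the first request, without waiting for a 401."""
    auth = basic_auth_header(username, password)
    return server.get({"Authorization": auth})


if __name__ == "__main__":
    reactive = FakeServer("crawler", "secret")
    fetch_reactive(reactive, "crawler", "secret")
    preemptive = FakeServer("crawler", "secret")
    fetch_preemptive(preemptive, "crawler", "secret")
    print("reactive requests:", reactive.requests)
    print("preemptive requests:", preemptive.requests)
```

The trade-off with the preemptive version is that the credentials are sent whether or not the server asked for them, so a crawler would presumably want to scope them to specific hosts rather than send them everywhere.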


November 11, 2005 02:04 AM
