Nutch with basic authentication
November 11, 2005
Cool. I seem to have hacked Nutch into spidering with basic authentication. The protocol-httpclient plugin has the guts of an approach for authentication, but it doesn't work out of the box. It wasn't hard to fix on the Nutch trunk. What I have (see a 401, then retry with the auth details) is probably fine for an intranet, but a web or wide-area scenario really needs preemptive authentication, where the credentials go out with the first request instead of waiting for the server's challenge. The next thing to do would be to patch protocol-httpclient further so that it reads in all the http.auth.basic.* properties and configures them on HttpClient. I'll try to post details soon.
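To make the difference between the two strategies concrete, here is a minimal sketch in plain Java. It is not the Nutch patch and does not use protocol-httpclient or Commons HttpClient; it assumes only the JDK's HttpURLConnection and the bundled com.sun.net.httpserver server, and the class name, credentials, and realm are hypothetical. The challenge-response fetch pays for an extra 401 round trip on every protected URL, which is the cost a crawler working over a wide-area network would want to avoid.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.concurrent.atomic.AtomicInteger;

public class BasicAuthSketch {

    // Value of an HTTP Basic "Authorization" header: "Basic " + base64(user ":" password).
    static String basicHeader(String user, String password) {
        byte[] token = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(token);
    }

    // One GET; adds an Authorization header only if authHeader is non-null. Returns the status code.
    static int get(URL url, String authHeader) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (authHeader != null) {
            conn.setRequestProperty("Authorization", authHeader);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Challenge-response: ask anonymously first, retry with credentials only after a 401.
    static int fetchOnChallenge(URL url, String user, String password) throws IOException {
        int status = get(url, null);
        if (status == HttpURLConnection.HTTP_UNAUTHORIZED) {
            status = get(url, basicHeader(user, password));
        }
        return status;
    }

    // Preemptive: send the credentials with the very first request, skipping the 401.
    static int fetchPreemptively(URL url, String user, String password) throws IOException {
        return get(url, basicHeader(user, password));
    }

    // Local stand-in for a protected site: accepts only the expected header, counts every request.
    static HttpServer startServer(String expectedHeader, AtomicInteger requests) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            requests.incrementAndGet();
            String auth = exchange.getRequestHeaders().getFirst("Authorization");
            if (expectedHeader.equals(auth)) {
                exchange.sendResponseHeaders(200, -1); // -1: no response body
            } else {
                exchange.getResponseHeaders().add("WWW-Authenticate", "Basic realm=\"intranet\"");
                exchange.sendResponseHeaders(401, -1);
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    // Runs each strategy once; returns {status, requests} for challenge-response, then for preemptive.
    static int[] compareStrategies(String user, String password) {
        AtomicInteger requests = new AtomicInteger();
        try {
            HttpServer server = startServer(basicHeader(user, password), requests);
            try {
                URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/page").toURL();
                int challengeStatus = fetchOnChallenge(url, user, password);
                int challengeRequests = requests.getAndSet(0);
                int preemptiveStatus = fetchPreemptively(url, user, password);
                int preemptiveRequests = requests.getAndSet(0);
                return new int[] {challengeStatus, challengeRequests, preemptiveStatus, preemptiveRequests};
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = compareStrategies("crawler", "secret"); // hypothetical credentials
        System.out.println("challenge-response: status " + r[0] + ", requests sent: " + r[1]);
        System.out.println("preemptive: status " + r[2] + ", requests sent: " + r[3]);
    }
}
```

Both strategies end in a 200, but the challenge-response fetch needs two requests per page where the preemptive one needs one. The trade-off is that preemptive authentication sends the credentials to every URL it is pointed at, so a crawler would want to scope them to the hosts (and realms) they belong to rather than attach them globally.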
November 11, 2005 02:04 AM