Hi,
I'm using Nutch to crawl Microsoft Windows shares over the SMB
protocol. I can crawl my local network if I list the URLs of all
computers in the network in my "urls" file. My "urls" file looks like:
smb://comp1/
smb://comp2/
..
Is it possible to put a range of IP addresses in this file? I mean
something like smb://192.168.18.*/ or
smb://192.168.18.1/ - smb://192.168.18.255/, or anything else.
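The only workaround I can think of is to expand the range into explicit
entries myself before the crawl. A minimal sketch in Python, assuming the
"urls" file just takes one URL per line as mine does; the function name
smb_seed_urls is my own illustration and not anything Nutch provides:

```python
import ipaddress


def smb_seed_urls(network):
    """Expand a CIDR network into one smb:// seed URL per usable host address."""
    net = ipaddress.ip_network(network, strict=False)
    # hosts() skips the network and broadcast addresses (.0 and .255 for a /24).
    return ["smb://{}/".format(host) for host in net.hosts()]


if __name__ == "__main__":
    # Print the entries so they can be redirected into the "urls" file.
    for url in smb_seed_urls("192.168.18.0/24"):
        print(url)
```

That would list every address in the subnet, including hosts that are
switched off or have no shares, so I would still prefer a way to specify
the range directly.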
I tried putting my network's name (smb://network_name/) in the file,
since the jcifs documentation says it is a valid URL:
http://jcifs.samba.org/src/docs/api/jcifs/smb/SmbFile.html
But Nutch fails with java.net.UnknownHostException:
fetch of smb://werewolf/ failed with:
jcifs.smb.SmbException: smb://werewolf/
java.net.UnknownHostException: werewolf
at jcifs.UniAddress.getByName(UniAddress.java:301)
at jcifs.smb.SmbFile.getAddress(SmbFile.java:765)
at jcifs.smb.SmbFile.getType(SmbFile.java:1171)
at jcifs.smb.SmbFile.exists(SmbFile.java:1282)
at org.apache.nutch.protocol.smb.SMBResponse.<init>(SMBResponse.java:94)
at org.apache.nutch.protocol.smb.SMB.getProtocolOutput(SMB.java:65)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
Thank you.