在Windows上的Nutch:错误crawlInjector

我试图安装nutch 1.12在基于cygwin64 2.874的Windows 2012服务器上。 由于java和linux的技能有限,我在https://wiki.apache.org/nutch/NutchTutorial#Step-by-Step:_Seeding_the_crawldb_with_a_list_of_URLs上按照一步一步的介绍。 命令

bin/nutch inject crawl/crawldb urls 

抛出错误,因为无法findwinutils.exe。 这里是hadoop日志:

 2016-07-01 09:22:25,660 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326) at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:432) at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:478) at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170) at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64) at org.apache.nutch.crawl.Injector.main(Injector.java:441) 2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: starting at 2016-07-01 09:22:25 2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb 2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: urlDir: urls 2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries. 2016-07-01 09:22:26,050 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2016-07-01 09:22:26,394 ERROR crawl.Injector - Injector: java.lang.NullPointerException at java.lang.ProcessBuilder.start(Unknown Source) at org.apache.hadoop.util.Shell.runCommand(Shell.java:445) at org.apache.hadoop.util.Shell.run(Shell.java:418) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650) at org.apache.hadoop.util.Shell.execCommand(Shell.java:739) at org.apache.hadoop.util.Shell.execCommand(Shell.java:722) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633) at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:849) at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1149) at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:58) at org.apache.nutch.crawl.Injector.inject(Injector.java:357) at org.apache.nutch.crawl.Injector.run(Injector.java:467) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.nutch.crawl.Injector.main(Injector.java:441) 

我在这里发现了几个提示,从https://codeload.github.com/srccodes/hadoop-common-2.2.0-bin/zip/master下载了winutils.exe。 我解压缩服务器上的文件夹,并设置环境variablesNUTCH_OPTS = -Dhadoop.home.dir = [winutils_folder]。 现在,winutils错误消失了,但是nutch调用失败,出现了一个不同的错误:

 2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: starting at 2016-07-01 13:24:09 2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb 2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: urlDir: urls 2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries. 2016-07-01 13:24:09,885 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2016-07-01 13:24:10,307 ERROR crawl.Injector - Injector: org.apache.hadoop.util.Shell$ExitCodeException: at org.apache.hadoop.util.Shell.runCommand(Shell.java:505) at org.apache.hadoop.util.Shell.run(Shell.java:418) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650) at org.apache.hadoop.util.Shell.execCommand(Shell.java:739) at org.apache.hadoop.util.Shell.execCommand(Shell.java:722) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633) at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:849) at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1149) at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:58) at org.apache.nutch.crawl.Injector.inject(Injector.java:357) at org.apache.nutch.crawl.Injector.run(Injector.java:467) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.nutch.crawl.Injector.main(Injector.java:441) 

更新.bashrc(添加以下行)后,hadoop日志只显示警告。

 export JAVA_HOME='/cygdrive/c/Program Files/Java/jre1.8.0_92' export PATH=$PATH:$JAVA_HOME/bin 

但是nutch仍然会抛出一个错误:

 Injector: starting at 2016-07-01 17:25:22 Injector: crawlDb: crawl/crawldb Injector: urlDir: urls Injector: Converting injected urls to crawl db entries. Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method) at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:570) at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977) at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:173) at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:160) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:94) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115) at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:131) at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163) at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Unknown Source) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) at org.apache.nutch.crawl.Injector.inject(Injector.java:376) at org.apache.nutch.crawl.Injector.run(Injector.java:467) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.nutch.crawl.Injector.main(Injector.java:441) 

我需要提示什么可能是错误的configuration,或者是不可能与Windows / cygwin运行nutch?