Mallet works on Linux but not on Windows

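OK, I am trying to use Mallet to classify some documents on Windows.

I have already done this on Linux. I just cannot get it to work on Windows (the target environment).

I have imported my data into a .mallet file.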

I then used this input data to build a classifier. The classifier file is the same size on both systems.

Linux:

    -rw-r--r-- 1 henry henry 15197116 Feb 23 15:56 nntp.classifier

Windows:

    07/03/2014 21:28 15,197,116 nntp.classifier

However, when I run this on Linux:

    bin/mallet classify-dir --input ./testfolder --output - --classifier nntp.classifier

it iterates over every file in the test folder and prints the class it assigns to each one.

But if I run the same command on Windows:

    bin\mallet classify-dir --input ./testfolder --output - --classifier nntp.classifier

it just prints the list of commands:

    Mallet 2.0 commands:
      import-dir        load the contents of a directory into mallet instances (one per file)
      import-file       load a single file into mallet instances (one per line)
      import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
      train-classifier  train a classifier from Mallet data files
      train-topics      train a topic model from Mallet data files
      infer-topics      use a trained topic model to infer topics for new documents
      estimate-topics   estimate the probability of new documents given a trained model
      hlda              train a topic model using Hierarchical LDA
      prune             remove features based on frequency or information gain
      split             divide data into testing, training, and validation portions
    Include --help with any option for more information

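Things I have noticed:

If I run bin/mallet classify-dir --help on Linux, I get the help text for the command. On Windows, bin\mallet classify-dir --help does not produce the same result; it just prints the command list above. (It does the same thing if you enter garbage as the command.)

By contrast, one of the earlier commands, e.g. bin/mallet import-dir --help on Linux and bin\mallet import-dir --help on Windows, produces the same full help output on both systems.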

There is a problem with the mallet.bat file in the bin directory. You should change it to:

    @echo off

    rem This batch file serves as a wrapper for several
    rem MALLET command line tools.

    if not "%MALLET_HOME%" == "" goto gotMalletHome

    echo MALLET requires an environment variable MALLET_HOME.
    goto :eof

    :gotMalletHome

    set MALLET_CLASSPATH=%MALLET_HOME%\class;%MALLET_HOME%\lib\mallet-deps.jar
    set MALLET_MEMORY=1G
    set MALLET_ENCODING=UTF-8

    set CMD=%1
    shift

    set CLASS=
    if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors
    if "%CMD%"=="import-file" set CLASS=cc.mallet.classify.tui.Csv2Vectors
    if "%CMD%"=="import-smvlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors
    if "%CMD%"=="train-classifier" set CLASS=cc.mallet.classify.tui.Vectors2Classify
    if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
    if "%CMD%"=="classify-file" set CLASS=cc.mallet.classify.tui.Csv2Classify
    if "%CMD%"=="train-topics" set CLASS=cc.mallet.topics.tui.Vectors2Topics
    if "%CMD%"=="infer-topics" set CLASS=cc.mallet.topics.tui.InferTopics
    if "%CMD%"=="estimate-topics" set CLASS=cc.mallet.topics.tui.EvaluateTopics
    if "%CMD%"=="hlda" set CLASS=cc.mallet.topics.tui.HierarchicalLDATUI
    if "%CMD%"=="prune" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
    if "%CMD%"=="split" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
    if "%CMD%"=="bulk-load" set CLASS=cc.mallet.util.BulkLoader
    if "%CMD%"=="run" set CLASS=%1 & shift

    if not "%CLASS%" == "" goto gotClass

    echo Mallet 2.0 commands:
    echo   import-dir        load the contents of a directory into mallet instances (one per file)
    echo   import-file       load a single file into mallet instances (one per line)
    echo   import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
    echo   train-classifier  train a classifier from Mallet data files
    echo   classify-dir      classify the contents of a directory with a saved classifier
    echo   classify-file     classify a file with a saved classifier
    echo   train-topics      train a topic model from Mallet data files
    echo   infer-topics      use a trained topic model to infer topics for new documents
    echo   estimate-topics   estimate the probability of new documents given a trained model
    echo   hlda              train a topic model using Hierarchical LDA
    echo   prune             remove features based on frequency or information gain
    echo   split             divide data into testing, training, and validation portions
    echo Include --help with any option for more information

    goto :eof

    :gotClass

    set MALLET_ARGS=

    :getArg
    if "%1"=="" goto run
    set MALLET_ARGS=%MALLET_ARGS% %1
    shift
    goto getArg

    :run

    java -Xmx%MALLET_MEMORY% -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%

    :eof

With that change, classification works in a Windows environment.
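If you want to check that the wrapper, and not the classifier file, is what fails, one option is to bypass mallet.bat and start the Java class that the batch file maps classify-dir to. Below is a minimal Python sketch of that idea. The helper name `build_classify_command` and the fallback path `C:\mallet` are made up for illustration; the classpath layout, JVM options, and class name are copied from the batch file, and it assumes `java` is on your PATH.

```python
import ntpath
import os


def build_classify_command(mallet_home, input_dir, classifier, output="-",
                           memory="1G", encoding="UTF-8"):
    """Build the java argument list that mallet.bat assembles for classify-dir.

    Hypothetical helper: it mirrors the MALLET_CLASSPATH, MALLET_MEMORY and
    MALLET_ENCODING settings from the batch file and does not run anything.
    """
    # Same two classpath entries as the batch file, joined with the Windows ';'.
    classpath = ";".join([
        ntpath.join(mallet_home, "class"),
        ntpath.join(mallet_home, "lib", "mallet-deps.jar"),
    ])
    return [
        "java",
        "-Xmx" + memory,
        "-ea",
        "-Dfile.encoding=" + encoding,
        "-classpath", classpath,
        # Class that the batch file assigns to the classify-dir command.
        "cc.mallet.classify.tui.Text2Classify",
        "--input", input_dir,
        "--output", output,
        "--classifier", classifier,
    ]


if __name__ == "__main__":
    # C:\mallet is only a placeholder used when MALLET_HOME is not set.
    home = os.environ.get("MALLET_HOME", r"C:\mallet")
    command = build_classify_command(home, "./testfolder", "nntp.classifier")
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(command))
```

You can paste the printed command into a Windows prompt, or hand the list to `subprocess.run` once it looks right. If the class runs this way but `bin\mallet classify-dir` still prints the command list, the batch file is the part that needs fixing.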

I hope this helps.

Ignazio

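Note that the .bat file ignazio provided (which, unfortunately, is also the one included in the mallet-2.0.7 download) makes it look for "import-smvlight" rather than "import-svmlight", which is what the help message specifies. If you want to use this feature, make sure you swap the "m" and the "v".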
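Mismatches like this are easy to miss by eye, so here is a rough Python sketch that compares the command names a mallet.bat dispatches on with the names its help text lists. The function name `check_wrapper` and both regular expressions are my own; they assume the batch file is laid out one statement per line and that help entries start with a lowercase command name, which holds for the file above but may not hold for other versions.

```python
import re

# Matches dispatch lines such as:  if "%CMD%"=="import-dir" set CLASS=...
DISPATCH_RE = re.compile(r'if\s+"%CMD%"\s*==\s*"([^"]+)"\s+set\s+CLASS=')
# Matches help lines such as:  echo   import-dir   load the contents ...
# Headings like "echo Mallet 2.0 commands:" start with a capital and are skipped.
HELP_RE = re.compile(r'^\s*echo\s+([a-z][a-z-]*)\s+\S')


def check_wrapper(bat_text):
    """Return (listed_but_not_dispatched, dispatched_but_not_listed)."""
    dispatched = set()
    listed = set()
    for line in bat_text.splitlines():
        match = DISPATCH_RE.search(line)
        if match:
            dispatched.add(match.group(1))
            continue
        match = HELP_RE.match(line)
        if match:
            listed.add(match.group(1))
    return sorted(listed - dispatched), sorted(dispatched - listed)


# Short excerpt of the batch file above, used only to demonstrate the check.
SAMPLE = '''
if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors
if "%CMD%"=="import-smvlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors
if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
echo Mallet 2.0 commands:
echo   import-dir        load the contents of a directory into mallet instances (one per file)
echo   import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
echo   classify-dir      classify the contents of a directory with a saved classifier
'''

if __name__ == "__main__":
    not_dispatched, not_listed = check_wrapper(SAMPLE)
    print("listed in help but not dispatched:", not_dispatched)
    print("dispatched but not listed in help:", not_listed)
```

On the excerpt it reports import-svmlight as listed but not dispatched, and import-smvlight as dispatched but not listed, which is exactly the swapped "m" and "v". Run against a whole mallet.bat (read the file and pass its text in), it should also show whether classify-dir is handled at all.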