Regex to remove things from the first line of multiple txt files

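I have searched the internet for days trying to solve this, without success. I can't show any code because everything I tried was wrong. So my question may be a bit long, but I hope someone can help me.

I have a folder containing a lot of .txt files (1000+) with scraped data. Now I want a script that loops through all these files, changes the first line, and then saves the file again. The first line of each .txt file contains a title, but I want to remove certain things from that title. Here are first-line examples from 5 different .txt files: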

    Hello world my name is awesome, but you knew that video c5
    Hello world my name is awesome video3 v: Everybody knows I am r
    Hello world my name is awesome: It got 100 likes in 10 min 1f
    Hello world my name is awesome 43 video 2: Did you know that:
    Hello world my name is awesome a3: It is Mr smokealot 1f,

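I think I covered all possible combinations. I want only the part Hello world my name is awesome to remain on the first line of each file; the rest must be removed. The .txt files also contain text on the other lines, and that text must stay unchanged. The title is different in every .txt file; I used Hello world my name is awesome only to show which part of the first line is the title.

I think these are the things that must be done to fix my titles.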

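  1. Remove all characters after the first comma or colon, including the comma or colon itself.
  2. Remove all words that mix letters and digits, such as a3 in the fifth example line.
  3. Remove all numbers.
  4. Remove the word video.
  5. Remove all single characters except the letter "I". This removes, for example, the "v" from the second first-line example.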
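The five rules above map fairly directly onto regular expressions. As a minimal sketch outside of batch, assuming Python 3 is available (the question does not name a language, and the function name `clean_title` is hypothetical), one line could be cleaned like this. Treating "video" case-insensitively is also an assumption.

```python
import re


def clean_title(first_line: str) -> str:
    """Apply the five rules from the list above to one title line."""
    # Rule 1: keep only the text before the first comma or colon.
    text = re.split(r"[,:]", first_line, maxsplit=1)[0]
    # Rules 2 and 3: drop every word containing a digit (covers plain numbers too).
    text = re.sub(r"\b\w*\d\w*\b", " ", text)
    # Rule 4: drop the standalone word "video" (case-insensitive by assumption).
    text = re.sub(r"\bvideo\b", " ", text, flags=re.IGNORECASE)
    # Rule 5: drop single-letter words except the upper-case "I".
    text = re.sub(r"\b(?!I\b)[A-Za-z]\b", " ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_title("Hello world my name is awesome video3 v: Everybody knows I am r"))
```

Because rule 5 relies on word boundaries, a title containing an apostrophe (such as "don't") would lose the trailing "t", so real titles may need a stricter pattern.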

When that is done, I want to save each file in its current location under its current name.

Is this possible?

Regex support in batch is very limited (it is implemented only in findstr, and only as a very small subset). So let's do it (almost) without regex:

    @echo off
    setlocal enabledelayedexpansion
    REM t.txt contains your first-line-examples
    for /f "delims=" %%a in (t.txt) do call :getTitle %%a
    goto :eof

    :getTitle
    set title=%*
    REM get rid of all characters after (including) comma or colon:
    for /f "delims=,:" %%a in ("%title%") do set title=%%a
    REM get rid of every word with numbers and anything after it:
    set "line="
    for %%a in (%title%) do (
      REM if the word contains a number, Exit the Loop, else add the word to the line:
      echo %%a|findstr "[0-9]" >nul && goto :endOfTitle ||set line=!line! %%a
    )
    :endOfTitle
    REM get rid of the leading space (included with the SET in the last FOR)
    if defined line set line=%line:~1%
    echo(%line%

Contents of t.txt:

    Hello world my name is awesome, but you knew that video c5
    Hello world my name is awesome video3 v: Everybody knows I am r
    Hello world my name is awesome: It got 100 likes in 10 min 1f
    Hello world my name is awesome 43 video 2: Did you know that:
    Hello world my name is awesome a3: It is Mr smokealot 1f,
    Hello world my name is John Doe, but you knew that video5c
    Hello world my name is Jane Doe video3v: but you knew that
    Hello world my name is Superman: but you knew that 1f
    Hello world my name is Asterix, Gallier 43: but you knew that 1f
    34: this is obviously an invalid title
    Any string is a title until it hits a comma or colon, this is scrap
    Any word that contains a number ends the title l1ke this one

Output:

    Hello world my name is awesome
    Hello world my name is awesome
    Hello world my name is awesome
    Hello world my name is awesome
    Hello world my name is awesome
    Hello world my name is John Doe
    Hello world my name is Jane Doe
    Hello world my name is Superman
    Hello world my name is Asterix
    
    Any string is a title until it hits a comma or colon
    Any word that contains a number ends the title

(@Mofi: I included some of your examples. I like the names :))

Edit: to process all files in the current folder:

    @echo off
    setlocal enabledelayedexpansion
    REM for all .txt files in the current folder do:
    for /f "delims=" %%a in ('dir /b *.txt') do call :process "%%a"
    goto :eof

    :process
    REM store the file name without surrounding quotes:
    set "file=%~1"
    REM get first line of this file (cleared first, because SET /P keeps
    REM the old value when the first line is empty):
    set "first="
    set /p first=<"%file%"
    REM extract the title from the first line into %line%:
    call :getTitle %first%
    REM when %line% is empty, skip the edit:
    if defined line (
      >"%~n1.tmp" echo(%line%
      more +1 "%file%" >>"%~n1.tmp"
      @ECHO move /y "%~n1.tmp" "%file%"
    )
    goto :eof

    :getTitle
    set title=%*
    REM get rid of all characters after (including) comma or colon:
    for /f "delims=,:" %%a in ("%title%") do set title=%%a
    REM get rid of every word with numbers and anything after it:
    set "line="
    for %%a in (%title%) do (
      REM if the word contains a number, Exit the Loop, else add the word to the line:
      echo %%a|findstr "[0-9]" >nul && goto :endOfTitle ||set line=!line! %%a
    )
    :endOfTitle
    REM get rid of the leading space (included with the SET in the last FOR):
    if defined line set line=%line:~1%
    goto :eof

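While the @ECHO is in place, the move command is only printed, not executed. If you are satisfied with the output, remove the @ECHO so that the files are actually replaced.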
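For comparison, the same "replace the first line, keep the rest, save under the same name" step can be sketched in Python. This is only an illustration under stated assumptions: Python 3 is available, the files are UTF-8 encoded, and `rewrite_first_lines` and its `transform` parameter are hypothetical names. Files whose title would end up empty are skipped, mirroring the `if defined line` check in the batch code.

```python
from pathlib import Path
from typing import Callable


def rewrite_first_lines(folder: str, transform: Callable[[str], str]) -> int:
    """Rewrite the first line of each .txt file in folder; return how many files changed."""
    changed = 0
    for path in sorted(Path(folder).glob("*.txt")):
        # newline="" keeps the original line endings untouched (UTF-8 is an assumption).
        with path.open("r", encoding="utf-8", newline="") as handle:
            text = handle.read()
        first, sep, rest = text.partition("\n")
        body = first.rstrip("\r")
        ending = first[len(body):] + sep  # original "\r\n" or "\n", if any
        title = transform(body)
        if not title or title == body:
            continue  # nothing usable, or nothing to change
        with path.open("w", encoding="utf-8", newline="") as handle:
            handle.write(title + ending + rest)
        changed += 1
    return changed


if __name__ == "__main__":
    # Example transform: keep only the text before the first comma or colon.
    def before_delimiter(line: str) -> str:
        return line.split(",")[0].split(":")[0].strip()

    print(rewrite_first_lines(".", before_delimiter))
```

Any function that maps the old first line to the new title can be passed as `transform`. Since the files are overwritten in place, trying it on a copy of the folder first is the safer route.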

I created a text file Test.txt with the following content:

    Hello world my name is John Doe, but you knew that video5c
    Hello world my name is Jane Doe video3v: but you knew that
    Hello world my name is Superman: but you knew that 1f
    Hello world my name is Asterix, Gallier 43: but you knew that 1f
    34: but you knew that this is an invalid title
    , but you knew that this is also an invalid title
     but you knew that this is an invalid title, too

Then I developed the following commented batch file to evaluate all title lines:

    @echo off
    setlocal EnableDelayedExpansion
    set "FileTitle="
    for /F "usebackq delims=" %%L in ("Test.txt") do (
        call :GetTitle "%%L"
        echo Get from "%%L"
        echo the title "!FileTitle!"
        echo.
    )
    endlocal
    goto :EOF

    :GetTitle
    set "Line=%~1"

    rem Replace the string " but you knew that" case insensitive
    rem by character # with a comma before which is always removed.
    set "Line=%Line: but you knew that=,#%"

    rem Use character # as separator to get everything left of this character.
    for /F "tokens=1 delims=#" %%T in ("%Line%") do set "FileTitle=%%T"
    if "%FileTitle%" == "," goto TitleError
    set "FileTitle=%FileTitle:~0,-1%"

    :RemoveSpaces
    if "%FileTitle%" == "" goto TitleError
    if "%FileTitle:~-1%" == " " set "FileTitle=%FileTitle:~0,-1%" & goto RemoveSpaces

    rem Is the last character a comma, remove this character, too.
    if "%FileTitle:~-1%" == "," (
        set "FileTitle=%FileTitle:~0,-1%"
        if "!FileTitle!" == "" goto TitleError
        goto RemoveSpaces
    )

    rem Is the last character not a colon, we have the file title.
    if not "%FileTitle:~-1%" == ":" goto :EOF

    rem Otherwise remove the colon and the word left to the
    rem colon if this word contains also at least 1 digit.
    set "FileTitle=%FileTitle:~0,-1%"
    set "LastWord="

    rem The first condition in code below should be never true according to
    rem provided sample data, but it is necessary to avoid an endless loop.
    :RemoveWord
    if "%FileTitle%" == "" goto TitleError
    if "%FileTitle:~-1%" == " " goto EvaluateWord
    set "LastWord=%FileTitle:~-1%%LastWord%"
    set "FileTitle=%FileTitle:~0,-1%"
    goto RemoveWord

    :EvaluateWord
    for /L %%C in (0,1,9) do (
        if not "!LastWord:%%C=!" == "%LastWord%" goto RemoveSpaces
    )

    rem Append the removed word as it does not contain a digit.
    rem Then we have the file title and can exit the subroutine.
    set "FileTitle=%FileTitle%%LastWord%"
    goto :EOF

    :TitleError
    set "FileTitle=Error: No file title found^!"
    goto :EOF

Running this batch file in a command prompt window outputs:

    Get from "Hello world my name is John Doe, but you knew that video5c"
    the title "Hello world my name is John Doe"

    Get from "Hello world my name is Jane Doe video3v: but you knew that"
    the title "Hello world my name is Jane Doe"

    Get from "Hello world my name is Superman: but you knew that 1f"
    the title "Hello world my name is Superman"

    Get from "Hello world my name is Asterix, Gallier 43: but you knew that 1f"
    the title "Hello world my name is Asterix, Gallier"

    Get from "34: but you knew that this is an invalid title"
    the title "Error: No file title found!"

    Get from ", but you knew that this is also an invalid title"
    the title "Error: No file title found!"

    Get from " but you knew that this is an invalid title, too"
    the title "Error: No file title found!"

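The GetTitle subroutine in this batch file determines each title with the FOR and SET commands, relying heavily on string substitution.

If this output is what you expect for these lines, use the GetTitle subroutine.

To understand the commands used and how they work, open a command prompt window, run the following commands there, and carefully read all the help pages displayed for each command.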

  • call /?
  • echo /?
  • endlocal /?
  • for /?
  • goto /?
  • if /?
  • rem /?
  • setlocal /?

My next version uses the updated sample data:

    Hello world my name is awesome
    Hello world my name is awesome, but you knew that video c5
    Hello world my name is awesome video3 v: Everybody knows I am r
    Hello world my name is awesome: It got 100 likes in 10 min 1f
    Hello world my name is awesome 43 video 2: Did you know that:
    Hello world my name is awesome a3: It is Mr smokealot 1f,

The batch code is now simpler than before, because it applies the rules as they were first defined:

    @echo off
    setlocal EnableExtensions EnableDelayedExpansion
    set "FileTitle="
    for /F "usebackq delims=" %%L in ("Test.txt") do (
        call :GetTitle "%%L"
        echo Get from "%%L"
        echo the title "!FileTitle!"
        echo.
    )
    endlocal
    goto :EOF

    :GetTitle
    rem Get string left from first comma or first colon. The entire line
    rem is returned if the line does not contain any comma or colon.
    for /F "tokens=1 delims=,:" %%T in ("%~1") do set "WordsList=%%T"

    rem Process each word from list and check for containing any digit, or being
    rem a single letter different than upper case "I", or being the word "video".
    rem Those words are ignored on building the file title from the words list.
    rem It is expected here that none of the words contain a non word character
    rem as this could result in a syntax error on processing the words.
    set "FileTitle="
    for %%I in (%WordsList%) do (
        set "Skip=0"
        for /F "tokens=1 delims=0123456789" %%W in ("#%%I#") do set "Word=%%W"
        if "!Word!" NEQ "#%%I#" set "Skip=1"
        set "Word=%%I"
        if "!Word:~0,1!" EQU "%%I" (
            if "%%I" NEQ "I" set "Skip=1"
        )
        if /I "%%I" EQU "video" set "Skip=1"
        if "!Skip!" EQU "0" set "FileTitle=!FileTitle! %%I"
    )

    rem Was there not at least 1 valid word found?
    if "%FileTitle%" == "" set "FileTitle= Error: No file title found."

    rem Remove the leading space and exit subroutine GetTitle.
    set "FileTitle=%FileTitle:~1%"
    goto :EOF

The output for the sample input data is:

    Get from "Hello world my name is awesome"
    the title "Hello world my name is awesome"

    Get from "Hello world my name is awesome, but you knew that video c5"
    the title "Hello world my name is awesome"

    Get from "Hello world my name is awesome video3 v: Everybody knows I am r"
    the title "Hello world my name is awesome"

    Get from "Hello world my name is awesome: It got 100 likes in 10 min 1f"
    the title "Hello world my name is awesome"

    Get from "Hello world my name is awesome 43 video 2: Did you know that:"
    the title "Hello world my name is awesome"

    Get from "Hello world my name is awesome a3: It is Mr smokealot 1f,"
    the title "Hello world my name is awesome"

Here is one more version, which is very similar to Stephan's version:

    @echo off
    setlocal EnableExtensions EnableDelayedExpansion
    set "FileTitle="
    for /F "usebackq delims=" %%L in ("Test.txt") do (
        call :GetTitle "%%L"
        echo !FileTitle!
    )
    endlocal
    goto :EOF

    :GetTitle
    rem Get string left from first comma or first colon. The entire line
    rem is returned if the line does not contain any comma or colon.
    for /F "tokens=1 delims=,:" %%T in ("%~1") do set "WordsList=%%T"

    rem Process each word from list and check for containing any digit, or being
    rem a single letter different than upper case "I", or being the word "video".
    rem Such a word and all remaining words are ignored on building the file
    rem title from the words list. It is expected here that none of the words
    rem contain a non word character as this could result in a syntax error on
    rem processing the words.
    set "FileTitle="
    for %%I in (%WordsList%) do (
        for /F "tokens=1 delims=0123456789" %%W in ("#%%I#") do set "Word=%%W"
        if "!Word!" NEQ "#%%I#" goto CheckTitle
        set "Word=%%I"
        if "!Word:~0,1!" EQU "%%I" (
            if "%%I" NEQ "I" goto CheckTitle
        )
        if /I "%%I" EQU "video" goto CheckTitle
        set "FileTitle=!FileTitle! %%I"
    )

    :CheckTitle
    rem Was there not at least 1 valid word found?
    if "%FileTitle%" == "" set "FileTitle= Error: No file title found."

    rem Remove the leading space and exit subroutine GetTitle.
    set "FileTitle=%FileTitle:~1%"
    goto :EOF

The output for the sample input data posted in Stephan's answer is:

    Hello world my name is awesome
    Hello world my name is awesome
    Hello world my name is awesome
    Hello world my name is awesome
    Hello world my name is awesome
    Hello world my name is John Doe
    Hello world my name is Jane Doe
    Hello world my name is Superman
    Hello world my name is Asterix
    Error: No file title found.
    Any string is
    Any word that contains

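The difference from the output produced by Stephan's code is that a single-character word other than I also exits the word-processing loop, so the title is truncated at that word.

This single-character behavior can be changed by removing these code lines: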

    set "Word=%%I"
    if "!Word:~0,1!" EQU "%%I" (
        if "%%I" NEQ "I" goto CheckTitle
    )

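Then the output of this third batch code is the same as the output of the batch code written by Stephan, except that it prints an error message instead of a blank line.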
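The two behaviors compared here, skipping an unwanted word versus ending the title at the first unwanted word, can be put side by side in a small Python sketch. It is an illustration only, not a port of either batch file: the names `is_unwanted` and `build_title` are hypothetical, Python 3 is assumed, and the input is assumed to be the text already cut at the first comma or colon.

```python
def is_unwanted(word: str) -> bool:
    """True for a word with a digit, the word 'video', or a single character other than 'I'."""
    if any(ch in "0123456789" for ch in word):
        return True
    if word.lower() == "video":
        return True
    return len(word) == 1 and word != "I"


def build_title(words_part: str, stop_at_unwanted: bool) -> str:
    """Build a title from the words; either skip unwanted words or stop at the first one."""
    kept = []
    for word in words_part.split():
        if is_unwanted(word):
            if stop_at_unwanted:
                break      # truncate the title at this word
            continue       # drop only this word and keep going
        kept.append(word)
    return " ".join(kept)


if __name__ == "__main__":
    sample = "Any string is a title until it hits a comma or colon"
    print(build_title(sample, stop_at_unwanted=False))
    print(build_title(sample, stop_at_unwanted=True))
```

On the asker's five example lines both modes give the same title. They differ only when an unwanted word sits in the middle of a real title, which is why the expected output matters.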


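PS: This is a good example of what happens when the asker does not make the requirements clear by posting, along with the input data, the output expected for it.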