Regex match within a log file, returning dynamic content above and below the match

I have some catch-all log files in the following format:

    timestamp event summary
    foo details
    account name: userA
    bar more details
    timestamp event summary
    baz details
    account name: userB
    qux more details
    timestamp etc.

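I would like to search the log file for userB and, if it is found, echo everything from the preceding timestamp down to (but not including) the following timestamp. There will likely be several events that match my search. Echoing some sort of --- start --- and --- end --- marker around each match would be nice.

This would be perfect for pcregrep -M, right? The problem is that GnuWin32's pcregrep crashes when running multiline regex searches on large files, and these catch-all logs can be 100 MB or more.

What I've tried

My workaround so far is to use grep -B15 -A30 to find the matching lines and print the surrounding content, then pipe the now more manageable chunks into pcregrep for polishing. The problem is that some events are fewer than ten lines long, while others are 30 lines or more; I get some unexpected results whenever a shorter event is encountered.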

    :parselog <username> <logfile>
    set silent=1
    set count=0
    set deez=20\d\d-\d\d-\d\d \d\d:\d\d:\d\d
    echo Searching %~2 for records containing %~1...
    for /f "delims=" %%I in (
        'grep -P -i -B15 -A30 ":\s+\b%~1\b(@mydomain\.ext)?$" "%~2" ^| pcregrep -M -i "^%deez%(.|\n)+?\b%~1\b(@mydomain\.ext|\r?\n)(.|\n)+?\n%deez%" 2^>NUL'
    ) do (
        echo(%%I| findstr "^20[0-9][0-9]-[0-9][0-9]-[0-9][0-9].[0-9][0-9]:[0-9][0-9]:[0-9][0-9]" >NUL && (
            if defined silent (
                set silent=
                set found=1
                set /a "count+=1"
                echo;
                echo ---------------start of record !count!-------------
            ) else (
                set silent=1
                echo ----------------end of record !count!--------------
                echo;
            )
        )
        if not defined silent echo(%%I
    )
    goto :EOF

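Is there a better way? I came across an awk command that looks interesting, something like:

    awk "/start pattern/,/end pattern/" logfile

…but I would also need to match a pattern in the middle. Unfortunately, I am not familiar with awk syntax. Any suggestions?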


Ed Morton suggested that I supply some example logs and the expected output.

Example catch-all

    2013-03-25 08:02:32 Auth.Critical 169.254.8.110 Mar 25 08:02:32 dc3 MSWinEventLog 2 Security 11730158 Mon Mar 25 08:02:28 2013 529 Security NT AUTHORITY\SYSTEM N/A Audit Failure dc3 2 Logon Failure:
    Reason: Unknown user name or bad password
    User Name: user5f
    Domain: MYDOMAIN
    Logon Type: 3
    Logon Process: Advapi
    Authentication Package: Negotiate
    Workstation Name: dc3
    Caller User Name: dc3$
    Caller Domain: MYDOMAIN
    Caller Logon ID: (0x0,0x3E7)
    Caller Process ID: 400
    Transited Services: -
    Source Network Address: 169.254.7.86
    Source Port: 40838
    2013-03-25 08:02:32 Auth.Critical 169.254.8.110 Mar 25 08:02:32 dc3 MSWinEventLog 2 Security 11730159 Mon Mar 25 08:02:29 2013 680 Security NT AUTHORITY\SYSTEM N/A Audit Failure dc3 9 Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0
    Logon account: USER6Q
    Source Workstation: dc3
    Error Code: 0xC0000234
    2013-03-25 08:02:32 Auth.Critical 169.254.8.110 Mar 25 08:02:32 dc3 MSWinEventLog 2 Security 11730160 Mon Mar 25 08:02:29 2013 539 Security NT AUTHORITY\SYSTEM N/A Audit Failure dc3 2 Logon Failure:
    Reason: Account locked out
    User Name: USER6Q@MYDOMAIN.TLD
    Domain: MYDOMAIN
    Logon Type: 3
    Logon Process: Advapi
    Authentication Package: Negotiate
    Workstation Name: dc3
    Caller User Name: dc3$
    Caller Domain: MYDOMAIN
    Caller Logon ID: (0x0,0x3E7)
    Caller Process ID: 400
    Transited Services: -
    Source Network Address: 169.254.7.89
    Source Port: 55314
    2013-03-25 08:02:32 Auth.Notice 169.254.5.62 Mar 25 08:36:38 DC4.mydomain.tld MSWinEventLog 5 Security 201326798 Mon Mar 25 08:36:37 2013 4624 Microsoft-Windows-Security-Auditing N/A Audit Success DC4.mydomain.tld 12544 An account was successfully logged on.
    Subject:
    Security ID: S-1-0-0
    Account Name: -
    Account Domain: -
    Logon ID: 0x0
    Logon Type: 3
    New Logon:
    Security ID: S-1-5-21-606747145-1409082233-725345543-160838
    Account Name: DEPTACCT16$
    Account Domain: MYDOMAIN
    Logon ID: 0x1158e6012c
    Logon GUID: {BCC72986-82A0-4EE9-3729-847BA6FA3A98}
    Process Information:
    Process ID: 0x0
    Process Name: -
    Network Information:
    Workstation Name:
    Source Network Address: 169.254.114.62
    Source Port: 42183
    Detailed Authentication Information:
    Logon Process: Kerberos
    Authentication Package: Kerberos
    Transited Services: -
    Package Name (NTLM only): -
    Key Length: 0
    This event is generated when a logon session is created. It is generated on the computer that was accessed.
    The subject fields indicate...
    2013-03-25 08:02:32 Auth.Critical 169.254.8.110 Mar 25 08:02:32 dc3 MSWinEventLog 2 Security 11730162 Mon Mar 25 08:02:30 2013 675 Security NT AUTHORITY\SYSTEM N/A Audit Failure dc3 9 Pre-authentication failed:
    User Name: USER8Y
    User ID: %{S-1-5-21-606747145-1409082233-725345543-3904}
    Service Name: krbtgt/MYDOMAIN
    Pre-Authentication Type: 0x0
    Failure Code: 0x19
    Client Address: 169.254.87.158
    2013-03-25 08:02:32 Auth.Critical etc.

Example command

    call :parselog user6q \\path\to\catch-all.log

Expected result

    ---------------start of record 1-------------
    2013-03-25 08:02:32 Auth.Critical 169.254.8.110 Mar 25 08:02:32 dc3 MSWinEventLog 2 Security 11730159 Mon Mar 25 08:02:29 2013 680 Security NT AUTHORITY\SYSTEM N/A Audit Failure dc3 9 Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0
    Logon account: USER6Q
    Source Workstation: dc3
    Error Code: 0xC0000234
    ---------------end of record 1-------------

    ---------------start of record 2-------------
    2013-03-25 08:02:32 Auth.Critical 169.254.8.110 Mar 25 08:02:32 dc3 MSWinEventLog 2 Security 11730160 Mon Mar 25 08:02:29 2013 539 Security NT AUTHORITY\SYSTEM N/A Audit Failure dc3 2 Logon Failure:
    Reason: Account locked out
    User Name: USER6Q@MYDOMAIN.TLD
    Domain: MYDOMAIN
    Logon Type: 3
    Logon Process: Advapi
    Authentication Package: Negotiate
    Workstation Name: dc3
    Caller User Name: dc3$
    Caller Domain: MYDOMAIN
    Caller Logon ID: (0x0,0x3E7)
    Caller Process ID: 400
    Transited Services: -
    Source Network Address: 169.254.7.89
    Source Port: 55314
    ---------------end of record 2-------------


All you need with GNU awk (for IGNORECASE):

    $ cat tst.awk
    function prtRecord() {
        if (record ~ regexp) {
            printf "-------- start of record %d --------%s", ++numRecords, ORS
            printf "%s", record
            printf "--------- end of record %d ---------%s%s", numRecords, ORS, ORS
        }
        record = ""
    }
    BEGIN{ IGNORECASE=1 }
    /^[[:digit:]]+-[[:digit:]]+-[[:digit:]]+/ { prtRecord() }
    { record = record $0 ORS }
    END { prtRecord() }

or with any awk:

    $ cat tst.awk
    function prtRecord() {
        if (tolower(record) ~ tolower(regexp)) {
            printf "-------- start of record %d --------%s", ++numRecords, ORS
            printf "%s", record
            printf "--------- end of record %d ---------%s%s", numRecords, ORS, ORS
        }
        record = ""
    }
    /^[[:digit:]]+-[[:digit:]]+-[[:digit:]]+/ { prtRecord() }
    { record = record $0 ORS }
    END { prtRecord() }

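Either way, you would run it on UNIX as:

    $ awk -v regexp=user6q -f tst.awk file

I don't know the Windows syntax, but I expect it is very similar, if not identical.

Note the use of tolower() in the script: it lowercases both sides of the comparison so that the match is case-insensitive. If you can instead pass in a search regexp in the correct case, then you don't need to call tolower() on either side of the comparison. NBD, it might just speed the script up slightly.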
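As a side note on that trade-off, here is a minimal Python sketch (Python rather than awk, purely for illustration; both function names are hypothetical) that contrasts lowercasing both sides with asking the regex engine to ignore case. It suggests that lowercasing the pattern is only safe when the pattern is a plain literal such as user6q, because in Python's regex dialect lowercasing would turn an escape like `\S` into `\s`.

```python
import re


def matches_lowered(text: str, pattern: str) -> bool:
    """Mimic the tolower() approach: lowercase both the text and the pattern."""
    return re.search(pattern.lower(), text.lower()) is not None


def matches_ignorecase(text: str, pattern: str) -> bool:
    """Leave the pattern untouched and let the regex engine ignore case."""
    return re.search(pattern, text, re.IGNORECASE) is not None


line = "Logon account: USER6Q"

# A plain literal behaves the same under both approaches.
print(matches_lowered(line, "user6q"), matches_ignorecase(line, "user6q"))

# Lowercasing rewrites \S (non-whitespace) into \s (whitespace),
# which changes what the pattern means.
print(matches_lowered(line, r"USER\S+"), matches_ignorecase(line, r"USER\S+"))
```

With a literal user name the two approaches agree; they diverge only once the pattern contains case-sensitive escapes.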

    $ awk -v regexp=user6q -f tst.awk file
    -------- start of record 1 --------
    2013-03-25 08:02:32 Auth.Critical 169.254.8.110 Mar 25 08:02:32 dc3 MSWinEventLog 2 Security 11730159 Mon Mar 25 08:02:29 2013 680 Security NT AUTHORITY\SYSTEM N/A Audit Failure dc3 9 Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0
    Logon account: USER6Q
    Source Workstation: dc3
    Error Code: 0xC0000234
    --------- end of record 1 ---------

    -------- start of record 2 --------
    2013-03-25 08:02:32 Auth.Critical 169.254.8.110 Mar 25 08:02:32 dc3 MSWinEventLog 2 Security 11730160 Mon Mar 25 08:02:29 2013 539 Security NT AUTHORITY\SYSTEM N/A Audit Failure dc3 2 Logon Failure:
    Reason: Account locked out
    User Name: USER6Q@MYDOMAIN.TLD
    Domain: MYDOMAIN
    Logon Type: 3
    Logon Process: Advapi
    Authentication Package: Negotiate
    Workstation Name: dc3
    Caller User Name: dc3$
    Caller Domain: MYDOMAIN
    Caller Logon ID: (0x0,0x3E7)
    Caller Process ID: 400
    Transited Services: -
    Source Network Address: 169.254.7.89
    Source Port: 55314
    --------- end of record 2 ---------
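For anyone who has Python on Windows but not awk, the same accumulate-and-flush idea can be sketched as follows. This is an illustrative rendition under assumptions, not a port that has been run against the real logs: the names `matching_records`, `print_records`, and `SAMPLE` are made up for the example, `SAMPLE` is a heavily abbreviated stand-in for the log above, and the timestamp test assumes every record starts with a `YYYY-MM-DD` date.

```python
import re
from typing import Iterable, Iterator

# Assumption: every record starts with a line that begins with a YYYY-MM-DD date.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}")


def matching_records(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield each timestamp-delimited record that matches `pattern`, ignoring case."""
    wanted = re.compile(pattern, re.IGNORECASE)
    record = []
    for line in lines:
        # A new timestamp closes the record collected so far.
        if TIMESTAMP.match(line) and record:
            text = "".join(record)
            if wanted.search(text):
                yield text
            record = []
        record.append(line)
    # Flush the final record, which has no following timestamp.
    if record:
        text = "".join(record)
        if wanted.search(text):
            yield text


def print_records(lines: Iterable[str], pattern: str) -> None:
    """Print every matching record between start and end markers."""
    for number, text in enumerate(matching_records(lines, pattern), start=1):
        print(f"-------- start of record {number} --------")
        print(text if text.endswith("\n") else text + "\n", end="")
        print(f"--------- end of record {number} ---------\n")


# Hypothetical, heavily abbreviated stand-in for the catch-all log.
SAMPLE = [
    "2013-03-25 08:02:32 Auth.Critical first event\n",
    "User Name: user5f\n",
    "2013-03-25 08:02:32 Auth.Critical second event\n",
    "Logon account: USER6Q\n",
    "2013-03-25 08:02:33 Auth.Notice third event\n",
    "Account Name: DEPTACCT16$\n",
]

print_records(SAMPLE, "user6q")
```

Because `matching_records` is a generator over an iterable of lines, passing an open file object instead of `SAMPLE` would stream the log line by line, so memory use should be bounded by the largest single record rather than by the size of the file.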

Here's my effort:

    @ECHO OFF
    SETLOCAL
    ::
    :: Target username
    ::
    SET target=%1
    CALL :zaplines
    SET count=0
    FOR /f "delims=" %%I IN (rojoslog.txt) DO (
        ECHO.%%I| findstr /r "^20[0-9][0-9]-[0-9][0-9]-[0-9][0-9].[0-9][0-9]:[0-9][0-9]:[0-9][0-9]" >NUL
        IF NOT ERRORLEVEL 1 (
            IF DEFINED founduser CALL :report
            CALL :zaplines
        )
        (SET stored=)
        FOR /l %%L IN (1000,1,1200) DO IF NOT DEFINED stored IF NOT DEFINED line%%L (
            SET line%%L=%%I
            SET stored=Y
        )
        ECHO.%%I|FINDSTR /b /e /i /c:"account name: %target%" >NUL
        IF NOT ERRORLEVEL 1 (SET founduser=Y)
    )
    IF DEFINED founduser CALL :report
    GOTO :eof

    ::
    :: remove all envvars starting 'line'
    :: Set 'not found user' at same time
    ::
    :zaplines
    (SET founduser=)
    FOR /f "delims==" %%L IN ('set line 2^>nul') DO (SET %%L=)
    GOTO :eof

    :report
    IF NOT DEFINED line1000 GOTO :EOF
    SET /a count+=1
    ECHO.
    ECHO.---------- START of record %count% ----------
    FOR /l %%L IN (1000,1,1200) DO IF DEFINED line%%L CALL ECHO.%%line%%L%%
    ECHO.----------- END of record %count% -----------
    GOTO :eof

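Below is a pure Batch solution that does not use grep. It locates the timestamp lines by looking for the word "summary", which cannot appear in any other line; this word can be changed if needed.

EDIT: I changed the word that identifies the timestamp lines to "Auth.". I also changed the FINDSTR search to ignore case. This is the new version: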

    @echo off
    setlocal EnableDelayedExpansion

    :parselog <username> <logfile>
    echo Searching %~2 for records containing %~1...
    set n=0
    set previousMatch=Auth.
    for /F "tokens=1* delims=:" %%a in ('findstr /I /N "Auth\. %~1" %2') do (
        set currentMatch=%%b
        if "!previousMatch:Auth.=!" neq "!previousMatch!" (
            if "!currentMatch:Auth.=!" equ "!currentMatch!" (
                set /A n+=1
                set /A skip[!n!]=!previousLine!-1
            )
        ) else (
            set /A end[!n!]=%%a-1
        )
        set previousLine=%%a
        set previousMatch=%%b
    )
    if %n% equ 0 (
        echo No records found
        goto :EOF
    )
    if not defined end[%n%] set end[%n%]=-1
    set i=1
    :nextRecord
    echo/
    echo ---------------start of record %i%-------------
    if !skip[%i%]! equ 0 (
        set skip=
    ) else (
        set skip=skip=!skip[%i%]!
    )
    set end=!end[%i%]!
    for /F "%skip% tokens=1* delims=:" %%a in ('findstr /N "^" %2') do (
        echo(%%b
        if %%a equ %end% goto endOfRecord
    )
    :endOfRecord
    echo ---------------end of record %i%-------------
    set /A i+=1
    if %i% leq %n% goto nextRecord

Example command:

    C:>test user6q catch-all.log

Result:

    Searching catch-all.log for records containing user6q...

    ---------------start of record 1-------------
    2013-03-25 08:02:32 Auth.Critical 169.254.8.110 Mar 25 08:02:32 dc3 MSWinEventLog 2 Security 11730159 Mon Mar 25 08:02:29 2013 680 Security NT AUTHORITY\SYSTEM N/A Audit Failure dc3 9 Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0
    Logon account: USER6Q
    Source Workstation: dc3
    Error Code: 0xC0000234
    ---------------end of record 1-------------

    ---------------start of record 2-------------
    2013-03-25 08:02:32 Auth.Critical 169.254.8.110 Mar 25 08:02:32 dc3 MSWinEventLog 2 Security 11730160 Mon Mar 25 08:02:29 2013 539 Security NT AUTHORITY\SYSTEM N/A Audit Failure dc3 2 Logon Failure:
    Reason: Account locked out
    User Name: USER6Q@MYDOMAIN.TLD
    Domain: MYDOMAIN
    Logon Type: 3
    Logon Process: Advapi
    Authentication Package: Negotiate
    Workstation Name: dc3
    Caller User Name: dc3$
    Caller Domain: MYDOMAIN
    Caller Logon ID: (0x0,0x3E7)
    Caller Process ID: 400
    Transited Services: -
    Source Network Address: 169.254.7.89
    Source Port: 55314
    ---------------end of record 2-------------

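This method uses a single execution of the findstr command to locate all matching records, and then one additional findstr command to display each record. Note that the first for /F ... command works on the results of findstr "Auth. user..", and that the second for /F command has a "skip=N" option plus a GOTO that breaks out of the loop as soon as the record has been displayed. This means the FOR commands do not slow the program down; the speed of this program depends on the speed of the FINDSTR command.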
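The two-pass idea described above can also be sketched in Python. This is an illustration under stated assumptions rather than a translation of the batch file: `record_ranges`, `extract`, and `SAMPLE` are hypothetical names, `SAMPLE` is an abbreviated stand-in for the log, the marker test is a case-sensitive substring check for "Auth.", and the user name is treated as a literal rather than a regex.

```python
import re
from itertools import islice
from typing import Iterable, List, Optional, Tuple


def record_ranges(lines: Iterable[str], user: str,
                  marker: str = "Auth.") -> List[Tuple[int, Optional[int]]]:
    """First pass: return (first_line, last_line) pairs, 1-based and inclusive.

    last_line is None when the matching record runs to the end of the input.
    """
    wanted = re.compile(re.escape(user), re.IGNORECASE)
    ranges: List[Tuple[int, Optional[int]]] = []
    start = None      # line number of the most recent marker line
    matched = False   # whether the current record mentions the user
    for number, line in enumerate(lines, start=1):
        if marker in line:
            # A marker line closes the previous record.
            if matched:
                ranges.append((start, number - 1))
            start, matched = number, False
        if start is not None and wanted.search(line):
            matched = True
    if matched:
        ranges.append((start, None))
    return ranges


def extract(lines: Iterable[str], first: int, last: Optional[int]) -> List[str]:
    """Second pass: skip first-1 lines, then stop right after line `last`."""
    return list(islice(lines, first - 1, last))


# Hypothetical, heavily abbreviated stand-in for the catch-all log.
SAMPLE = [
    "2013-03-25 08:02:32 Auth.Critical first event\n",
    "User Name: user5f\n",
    "2013-03-25 08:02:32 Auth.Critical second event\n",
    "Logon account: USER6Q\n",
    "2013-03-25 08:02:33 Auth.Notice third event\n",
    "Account Name: DEPTACCT16$\n",
]

for first, last in record_ranges(SAMPLE, "user6q"):
    print("".join(extract(SAMPLE, first, last)), end="")
```

Here `islice` plays the role of the "skip=N" option together with the early GOTO: it discards the leading lines and stops reading once the record's last line has been returned.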

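However, the second for /F "%skip% ... in ('findstr /N "^" %2') command may take too long, because the whole FINDSTR output has to be produced before FOR starts processing it, and that output can be large. If that happens, the second FOR can be replaced with a faster method (an asynchronous pipe, for example). Please report the result.

Antonio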

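I think awk is all you need:

    awk "/---start of record---/,/---end of record---/ {print}" logfile

if the line that marks the start is:

    ---start of record---

and the line that marks the end is:

    ---end of record---

Note that there is no matching of a middle pattern; the "," is just the separator between the two regular expressions.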
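To make the behaviour of such a range pattern concrete, here is a small Python sketch of how a `/start/,/end/` range selects lines. It is an approximation written for illustration; `range_lines` and `LINES` are hypothetical names, and Python's `re` syntax stands in for awk's regex dialect.

```python
import re
from typing import Iterable, Iterator


def range_lines(lines: Iterable[str], start: str, end: str) -> Iterator[str]:
    """Yield lines the way an awk `/start/,/end/` range pattern selects them."""
    start_re, end_re = re.compile(start), re.compile(end)
    inside = False
    for line in lines:
        # The range opens on a line matching the start pattern.
        if not inside and start_re.search(line):
            inside = True
        if inside:
            yield line
            # The end pattern is also tested on the opening line, so a line
            # matching both patterns forms a one-line range.
            if end_re.search(line):
                inside = False


# Hypothetical input with literal marker lines around one record.
LINES = [
    "noise before\n",
    "---start of record---\n",
    "account name: userB\n",
    "---end of record---\n",
    "noise after\n",
]

print("".join(range_lines(LINES, "---start of record---", "---end of record---")), end="")
```

The range is driven purely by its two delimiters: every line from a start match through the next end match is selected, whatever the lines in between contain.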