I have a big problem.
I use wget to send a POST request to a website, and I get back an HTML page that I need to filter. Here is a sample:
more code up...
<div id="song_html" class="show1">
<div class="left">
<!-- info mp3 here -->
256 kbps<br />3:21<br />6.13 mb </div>
<div id="right_song">
<div style="font-size:15px;"><b>Marilyn Manson - Tainted Love ( Manson Remix) mp3</b></div>
<div style="clear:both;"></div>
<div style="float:left;">
<div style="float:left; height:27px; font-size:13px; padding-top:2px;">
<div style="float:left;"><a href="http://rockass.free.fr/video/Marilyn Manson - Taited Love.mp3" rel="nofollow" target="_blank" style="color:green;">Download</a></div>
<div style="margin-left:8px; float:left; width:27px; text-align:center;"><a href="javascript:void(0)" onclick="showPlayer_new(37119, '91da6888c92ccb4198dbc78cb30f311635751694', 'marilyn+manson', 'tainted+love')" rel="nofollow" id="lk37119" class="play_now">Play</a></div>
<div style="margin-left:8px; float:left;"><a href="javascript:void(0)" onclick="showEmbed_new(37119, '91da6888c92ccb4198dbc78cb30f311635751694')" rel="nofollow" id="em37119" class="embed">Embed</a></div>
<div style="margin-left:8px; float:left;"><a href="http://www.ringtonematcher.com/go/?sid=WDLL&artist=marilyn+manson&song=tainted+love" rel="nofollow" target="_blank" style="color:red;" title="Send Marilyn Manson - Tainted Love Ringtone to your Cell">Descarga Tono</a></div>
<div style="clear:both;"></div>
</div>
<div id="player37119" style="float:left; margin-left:10px;" class="player"></div>
</div>
<div style="clear:both;"></div>
</div>
<div style="clear:both;"></div>
</div>
<div id="song_html" class="show2">
<div class="left">
<!-- info mp3 here -->
</div>
<div id="right_song">
<div style="font-size:15px;"><b>Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3</b></div>
<div style="clear:both;"></div>
<div style="float:left;">
<div style="float:left; height:27px; font-size:13px; padding-top:2px;">
<div style="float:left;"><a href="http://spaz.mindstab.net/djspaz_-_grow_up_and_blow_the_great_big_white_nietzche.mp3" rel="nofollow" target="_blank" style="color:green;">Download</a></div>
<div style="margin-left:8px; float:left; width:27px; text-align:center;"><a href="javascript:void(0)" onclick="showPlayer_new(668416, 'ac5b8834fa26b892fc1436db4678aca9d8acfdb1', 'spaz+marilyn+manson+metric', 'grow+up+and+blow+the+great+big+dj%3a%2f%2fspaz%2c+marilyn+manson')" rel="nofollow" id="lk668416" class="play_now">Play</a></div>
<div style="margin-left:8px; float:left;"><a href="javascript:void(0)" onclick="showEmbed_new(668416, 'ac5b8834fa26b892fc1436db4678aca9d8acfdb1')" rel="nofollow" id="em668416" class="embed">Embed</a></div>
<div style="margin-left:8px; float:left;"><a href="http://www.ringtonematcher.com/go/?sid=WDLL&artist=spaz+marilyn+manson+metric&song=grow+up+and+blow+the+great+big+dj%3a%2f%2fspaz%2c+marilyn+manson" rel="nofollow" target="_blank" style="color:red;" title="Send Spaz Marilyn Manson Metric - Grow Up And Blow The Great Big Dj://spaz, Marilyn Manson Ringtone to your Cell">Descarga Tono</a></div>
<div style="clear:both;"></div>
</div>
<div id="player668416" style="float:left; margin-left:10px;" class="player"></div>
</div>
<div style="clear:both;"></div>
</div>
<div style="clear:both;"></div>
</div>
<div id="morelink" style="margin:10px; text-align:center;"><a href="" rel="nofollow" onClick="toggle(); return false;">Show More Results</a></div>
<div id="song_html" class="show3">
<div class="left">
<!-- info mp3 here -->
3:10<br /> </div>
<div id="right_song">
<div style="font-size:15px;"><b>Marilyn Manson - MARILYN MANSON - Rock is Dead mp3</b></div>
<div style="clear:both;"></div>
<div style="float:left;">
<div style="float:left; height:27px; font-size:13px; padding-top:2px;">
<div style="float:left;"><a href="http://www.bricbrac.free.fr/Music/01___MARILYN_MANSON___ROCK_.MP3" rel="nofollow" target="_blank" style="color:green;">Download</a></div>
<div style="margin-left:8px; float:left; width:27px; text-align:center;"><a href="javascript:void(0)" onclick="showPlayer_new(670124, '14a52b596082676bed6a9d860c383488a486e1dc', 'marilyn+manson', '-+rock+is+dead')" rel="nofollow" id="lk670124" class="play_now">Play</a></div>
<div style="margin-left:8px; float:left;"><a href="javascript:void(0)" onclick="showEmbed_new(670124, '14a52b596082676bed6a9d860c383488a486e1dc')" rel="nofollow" id="em670124" class="embed">Embed</a></div>
<div style="margin-left:8px; float:left;"><a href="http://www.ringtonematcher.com/go/?sid=WDLL&artist=marilyn+manson&song=-+rock+is+dead" rel="nofollow" target="_blank" style="color:red;" title="Send Marilyn Manson - - Rock Is Dead Ringtone to your Cell">Descarga Tono</a></div>
<div style="clear:both;"></div>
</div>
<div id="player670124" style="float:left; margin-left:10px;" class="player"></div>
</div>
<div style="clear:both;"></div>
</div>
<div style="clear:both;"></div>
</div>
</div>
</div>
<!-- ================= -->
more code down...
...so that I can set some variables, such as "Name", "Bitrate", "Size" and "Download", and print all the information from a batch file, like this:
1st result:
[Name] Marilyn Manson - Tainted Love ( Manson Remix) mp3
[Info] Bitrate: 256 kbps. Length: 3:21. Size: 6.13 mb.
[Download] http://rockass.free.fr/video/Marilyn Manson - Taited Love.mp3

2nd result:
[Name] Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3
[Info] NO INFO.
[Download] http://spaz.mindstab.net/djspaz_-_grow_up_and_blow_the_great_big_white_nietzche.mp3

3rd result:
[Name] Marilyn Manson - MARILYN MANSON - Rock is Dead mp3
[Info] Length: 3:10.
[Download] http://www.bricbrac.free.fr/Music/01___MARILYN_MANSON___ROCK_.MP3
I've tried FINDSTR, FIND, SED, GREP and FART, but I can't find a way to get the line and character delimiters right...
The only fixed landmark I can see is this line:
<!-- ================= -->
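I could use it as an END delimiter, because that line marks the end of the mp3 entries whose download links and info I need to print...
Can someone help me?
Thanks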
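If a POSIX shell with awk is available, the END-delimiter idea from the question can be sketched in one line of awk. This is only an illustration: `cut_at_end_marker` is a hypothetical function name, and because awk works line by line, the whole line containing `<!-- ================= -->` is dropped, so the sketch assumes nothing needed sits on that same line.

```bash
# Hypothetical helper: print every line of the file named in $1 up to,
# but not including, the first line that contains the END marker.
cut_at_end_marker() {
    awk '/<!-- ================= -->/ { exit } { print }' "$1"
}
```

For example, `cut_at_end_marker page.html > songs.html` keeps only the part of the page that holds the mp3 entries, which any of the line-based parsers can then process.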
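The batch file below relies on the fact that the data you want always sits a fixed number of lines below each "info mp3 here" line: the info line comes right after it, the <b> title line is 3 lines below it, and the Download link line is 7 lines below it. Within those lines, each piece of data is extracted by its position. If some entries don't follow this layout, the program will need to be modified.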
@echo off
setlocal EnableDelayedExpansion

rem List the line number of every marker line, e.g. 12:... in a temp file
findstr /N /C:"info mp3 here" "%~1" > "%~N1.tmp"
set lastLine=-1
(for /F "usebackq delims=:" %%a in ("%~N1.tmp") do (
   rem Read forward until info holds the line just after the marker
   set /A skip=%%a-lastLine
   for /L %%i in (1,1,!skip!) do set /P info=
   rem Discard one line, then read the title line
   set /P =& set /P name=
   rem The Download line is the 4th line after the title
   for /L %%i in (1,1,4) do set /P download=
   set "name=!name:*<b>=!"
   for /F "delims=<" %%n in ("!name!") do echo [Name] %%n
   rem Turn the info line into space-separated tokens
   set "info=!info:<br />= !"
   set "info=!info:</div>=!"
   set bitrate=
   set length=
   set size=
   set value=
   rem Pair each number with the unit that follows it: kbps, mb or none
   for %%t in (!info!) do (
      if not defined value (
         set value=%%t
      ) else (
         if %%t equ kbps (
            set "bitrate=Bitrate: !value! kbps. "
            set value=
         ) else if %%t equ mb (
            set "size=Size: !value! mb."
            set value=
         ) else (
            set "length=Length: !value!. "
            set value=%%t
         )
      )
   )
   if defined value (
      set "length=Length: !value!. "
   )
   set info=!bitrate!!length!!size!
   if not defined info set info=NO INFO.
   echo [Info] !info!
   set "download=!download:"=$!"
   for /F "tokens=4 delims=$" %%d in ("!download!") do echo [Download] %%d
   set /A lastLine=%%a+6
)) < "%~1"
del "%~N1.tmp"

Output:

[Name] Marilyn Manson - Tainted Love ( Manson Remix) mp3
[Info] Bitrate: 256 kbps. Length: 3:21. Size: 6.13 mb.
[Download] http://rockass.free.fr/video/Marilyn Manson - Taited Love.mp3
[Name] Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3
[Info] NO INFO.
[Download] http://spaz.mindstab.net/djspaz_-_grow_up_and_blow_the_great_big_white_nietzche.mp3
[Name] Marilyn Manson - MARILYN MANSON - Rock is Dead mp3
[Info] Length: 3:10.
[Download] http://www.bricbrac.free.fr/Music/01___MARILYN_MANSON___ROCK_.MP3
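For comparison, here is a minimal sketch of the same fixed-offset idea as an awk program wrapped in a shell function. It is not a line-by-line port of the batch file. `extract_by_offset` is a hypothetical name, and the sketch assumes exactly the layout the batch file assumes: info on the line right after the marker, the title 3 lines below it, and the Download link 7 lines below it. It also stops at the `<!-- ================= -->` line mentioned in the question.

```bash
# Hypothetical helper: the batch file's fixed-offset idea, written in awk.
# Assumed layout, counted from each "info mp3 here" marker line:
#   marker + 1 : info line, e.g. 256 kbps<br />3:21<br />6.13 mb </div>
#   marker + 3 : title line, with the name between <b> and </b>
#   marker + 7 : Download line, with the URL in the first href="..."
extract_by_offset() {
    awk '
    /<!-- ================= -->/ { exit }      # END delimiter: stop reading
    /info mp3 here/ { m = NR; next }           # remember the marker line
    m && NR == m + 1 {                         # info line: pair numbers with units
        line = $0
        gsub(/<br \/>/, " ", line)
        gsub(/<\/div>/, "", line)
        n = split(line, t, " ")
        info = ""
        for (i = 1; i <= n; i++) {
            if (t[i] ~ /:/) {
                info = info "Length: " t[i] ". "
            } else if (t[i + 1] == "kbps") {
                info = info "Bitrate: " t[i] " kbps. "; i++
            } else if (t[i + 1] == "mb") {
                info = info "Size: " t[i] " mb. "; i++
            }
        }
        sub(/ $/, "", info)
        if (info == "") info = "NO INFO."
    }
    m && NR == m + 3 {                         # title line
        name = $0
        sub(/.*<b>/, "", name)
        sub(/<\/b>.*/, "", name)
        print "[Name] " name
        print "[Info] " info
    }
    m && NR == m + 7 {                         # Download line
        if (match($0, /href="[^"]*"/))
            print "[Download] " substr($0, RSTART + 6, RLENGTH - 7)
    }
    ' "$1"
}
```

Called as `extract_by_offset page.html`, it prints the same [Name], [Info] and [Download] lines as the batch file. Unlike the batch version, it tolerates leading indentation on the info and title lines, because it splits on whitespace and searches for the tags rather than relying on fixed token positions.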
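Here is a bash script (sed piped into awk) that parses out the information you want.
The script takes the name of your HTML file as its only argument.
The output is written to a file named after the input file with ".parsed" appended (for example, page.html.parsed).
The comments at the top of the script explain the patterns used to locate each piece of information in the HTML file.
Replace the two "TAB" placeholders with a literal tab character, and make sure the single space before each tab stays in place.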
#!/bin/bash

# Parse HTML with sed, suppressing all unwanted lines
# "Info" lines all start with a number (ignoring whitespace)
# Bitrate and file size can be identified by looking for
# the unit (kbps, mb) immediately following the numeric data
# Length is identified by the colon in the middle of numeric data
# File names are delimited by <b> and </b>
# Lines with the URL all contain Download</a>
# The </a> isn't necessary, but I thought it would be safer to
# include it since one could imagine "Download" appearing in a file name

# Pipe output to Awk for reordering of the parsed lines
# and addition of "NO INFO" lines where necessary

sed -n '
/^[ TAB]*[0-9]/ {
    s/^[ TAB]*/[Info] /
    s/\([0-9]*:[0-9]*\)[^0-9]*/Length: \1. /
    s/\([0-9\.]* .bps\)[^0-9L]*/Bitrate: \1. /
    s/\([0-9\.]* .b\)[^p][^0-9LB]*/Size: \1. /
    p
}
/<b>/ {
    s|</b>.*||
    s|.*<b>\(.*\)|[Name] \1|
    p
}
\|Download</a>| {
    s/^.*\(http:[^"]*\).*/[Download] \1/
    p
}' "$1" |
awk '
BEGIN { no_info = "[Info] NO INFO."; info = no_info }
{
    if ($1 == "[Name]")
        name = $0;
    else if ($1 == "[Info]")
        info = $0;
    else {
        printf("%s\n%s\n%s\n\n", name, info, $0);
        info = no_info
    }
}' > "$1.parsed"

exit 0
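To see what the four substitutions in the "Info" block actually do, here is a small sketch that runs them on a single info line. Two assumptions: `format_info_line` is a hypothetical wrapper, and the `[ TAB]` placeholders are replaced by the POSIX bracket class `[[:blank:]]` (space or tab), which avoids typing a literal tab but requires a sed that supports POSIX character classes, as GNU sed and BSD sed do.

```bash
# Hypothetical wrapper: apply the Info substitutions from the script above
# to the single line passed as $1. Prints nothing if the line does not
# start with a digit. [[:blank:]] stands in for the [ TAB] placeholders.
format_info_line() {
    printf '%s\n' "$1" | sed -n '
/^[[:blank:]]*[0-9]/ {
    s/^[[:blank:]]*/[Info] /
    s/\([0-9]*:[0-9]*\)[^0-9]*/Length: \1. /
    s/\([0-9\.]* .bps\)[^0-9L]*/Bitrate: \1. /
    s/\([0-9\.]* .b\)[^p][^0-9LB]*/Size: \1. /
    p
}'
}
```

For instance, `format_info_line '256 kbps<br />3:21<br />6.13 mb </div>'` prints `[Info] Bitrate: 256 kbps. Length: 3:21. Size: 6.13 mb.` followed by a trailing space. Length is substituted first, but because each substitution rewrites its field in place, the fields keep the order in which they appear in the HTML. The `[^0-9L]` and `[^0-9LB]` tails stop each substitution from swallowing the "Length:" or "Bitrate:" text inserted by an earlier one.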
A solution in TXR 65 (runs on Windows; a MinGW-compiled .exe is available):
@(collect)
<div id="song_html" class="show@nil">
<div class="left">
<!-- info mp3 here -->
@(gather :vars ((bitrate nil) (length nil) (size nil)))
@bitrate kbps@(skip)
@(skip)@{length /\d+:\d\d/}@(skip)
@(skip)@{size /\d+\.\d\d/} mb@(skip)
@(until)
<div id="right_song">
@(end)
@(bind info @(if (or bitrate length size) (let ((s (make-string-output-stream))) (if bitrate (format s "Bitrate: ~a kbps. " bitrate)) (if length (format s "Length: ~a. " length)) (if size (format s "Size: ~a mb. " size)) (get-string-from-stream s)) "NO INFO."))
<div id="right_song">
<div style="font-size:15px;"><b>@title</b></div>
<div style="clear:both;"></div>
<div style="float:left;">
<div style="float:left; height:27px; font-size:13px; padding-top:2px;">
<div style="float:left;"><a href="@link" rel="nofollow" target="_blank" style="color:green;">Download</a></div>
@(until)
<!-- ================= -->
@(end)
@(output)
@ (repeat)
[Name] @title
[Info] @info
[Download] @link
@ (end)
@(end)
Run:

$ txr data.txr data.html
[Name] Marilyn Manson - Tainted Love ( Manson Remix) mp3
[Info] Bitrate: 256 kbps. Length: 3:21. Size: 6.13 mb.
[Download] http://rockass.free.fr/video/Marilyn Manson - Taited Love.mp3
[Name] Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3
[Info] NO INFO.
[Download] http://spaz.mindstab.net/djspaz_-_grow_up_and_blow_the_great_big_white_nietzche.mp3
[Name] Marilyn Manson - MARILYN MANSON - Rock is Dead mp3
[Info] Length: 3:10.
[Download] http://www.bricbrac.free.fr/Music/01___MARILYN_MANSON___ROCK_.MP3