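Grep to get only the match

I am trying to get just the matched text using the grep command.

I am reading an XML file, and I want to get the URL inside the location tag: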

<?xml> <!-- ..... --> <location>http://myurl.com/myuri/document</location>

I only want to get "http://myurl.com/myuri/document". This is what I did:

 curl http://mywebsite.com/file.xml | grep "\<location\>"

I got the full tags:

 <location>http://myurl.com/myuri/document</location>
 <location>http://myurl.com/myuri/document2</location>
 <location>http://myurl.com/myuri/document3</location>

Now, to get only the URL, I did this:

 curl http://mywebsite.com/file.xml | grep "\<location\>" | grep -oh ">.*<"

I am almost there, haha.

I got the URL with the > and < characters still attached:

 >http://myurl.com/myuri/document<

How can I get just the match? For example (this example does not work):

 curl http://mywebsite.com/file.xml | grep "\<location\>" | grep -oh ">(.*)<"
 http://myurl.com/myuri/document

After that I want to use the variable in wget, like | wget $1

The simplest solution I can think of is sed:

 ... | sed -e 's/^>//' -e 's/<$//'

This strips the angle brackets stuck to either end of the URL.
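To check this end to end without hitting the network, here is a minimal sketch. The `xml` variable is a made-up stand-in for the curl output, and `url` is just a name chosen for illustration:

```shell
# Sample input mirroring the question; a stand-in for the curl output.
xml='<location>http://myurl.com/myuri/document</location>'

# Keep only the ">...<" part, then strip the leading ">" and the trailing "<".
url=$(printf '%s\n' "$xml" | grep -o '>.*<' | sed -e 's/^>//' -e 's/<$//')

printf '%s\n' "$url"
```

Capturing the result with `$(...)` is one way to get the URL into a variable that can then be passed to wget as `wget "$url"`.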

grep with a Perl regular expression:

 grep -oP '(?<=<location>)[^<]+(?=</location>)'

Or:

 grep -o '[^<>]\+</location>' |grep -o '^[^<>]\+'

Or with sed:

 sed -n 's#<location>\([^<]\+\)</location>#\1#p'

If you want to download all of these URLs, then:

 curl http://mywebsite.com/file.xml | grep -o '[^<>]\+</location>' |grep -o '^[^<>]\+' | wget -ci -
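If you would rather have each URL in a shell variable (as the question asks) than feed the whole list to wget, a loop along these lines may work. The sample input and the function name `extract_locations` are assumptions for illustration, and the wget command is only echoed so nothing is actually downloaded. Note that `\+` is a GNU extension to basic regular expressions:

```shell
# Hypothetical helper wrapping the two-grep pipeline from above.
extract_locations() {
  grep -o '[^<>]\+</location>' | grep -o '^[^<>]\+'
}

# Made-up sample input standing in for the curl output.
sample='<location>http://myurl.com/myuri/document</location>
<location>http://myurl.com/myuri/document2</location>'

urls=$(printf '%s\n' "$sample" | extract_locations)

# Handle one URL at a time; swap echo for the real command when ready.
printf '%s\n' "$urls" | while IFS= read -r url; do
  echo "wget -c $url"
done
```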

For a PCRE regular expression, you can use the -P option of GNU grep:

 curl http://mywebsite.com/file.xml | grep -oP '<location>\K[^<]+'
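Here `\K` tells PCRE to drop everything matched so far from the reported match, so only the text after `<location>` is printed. A small sketch on made-up input, assuming a GNU grep built with PCRE support:

```shell
# Made-up sample line standing in for the curl output.
sample='<location>http://myurl.com/myuri/document</location>'

# \K resets the start of the match, so "<location>" is matched but not printed.
url=$(printf '%s\n' "$sample" | grep -oP '<location>\K[^<]+')

printf '%s\n' "$url"
```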

Or use awk:

 curl http://mywebsite.com/file.xml | awk -F '</?location>' '/<location>/{print $2}'
 http://myurl.com/myuri/document
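With `</?location>` as the field separator, the URLs land in the even-numbered fields, so `$2` only picks up the first one on each line. If several location elements can share a line, a variant that loops over the even fields might look like this (the sample line is made up for illustration):

```shell
# Made-up input with two <location> elements on a single line.
line='<location>http://myurl.com/myuri/document</location> <location>http://myurl.com/myuri/document2</location>'

# Fields split on <location> or </location>; URLs sit in fields 2, 4, ...
urls=$(printf '%s\n' "$line" |
  awk -F '</?location>' '/<location>/ { for (i = 2; i <= NF; i += 2) print $i }')

printf '%s\n' "$urls"
```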

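I could not get anubhava's version to work, so just by experimenting I came up with the following. Note that I have included the GNU grep version, since I am not sure whether it makes a difference here.

I was a little worried about handling XML tags nested inside the element being searched for (probably not an issue with your example, which uses location, but I am treating this as a more general problem). I also found that I had to strip the <location>..</location> wrapper from the resulting text, hence the two sed commands.

 duck@lt-ctaylor-2:~/ateb/myx$ grep --version
 grep (GNU grep) 2.24
 duck@lt-ctaylor-2:~/ateb/myx$ cat tmp.tmp
 <location><test>123</test></location>
 duck@lt-ctaylor-2:~/ateb/myx$ cat tmp.tmp | grep -o '<location>.*</location>' | sed 's;<location>;;' | sed 's;</location>;;'
 <test>123</test>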