Get img src from XML CDATA

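I'm new to C# and Windows Phone development, so please forgive me if I've missed the obvious:

I want to display thumbnails from the RSS XML feed at http://blog.dota2.com/feed/. The image sits inside a CDATA section that contains HTML. Here is the XML: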

<content:encoded> <![CDATA[ <p>We celebrate Happy Bear Pun Week a day earlier as Lone Druid joins Dota 2′s cast of heroes.</p> <p><a href="http://img.zgserver.com/c%23/LoneDruid_full.jpg "><img class="alignnone" title="The irony is that he's allergic to fur." src="http://img.zgserver.com/c%23/LoneDruid_small.jpg" alt="The irony is that he's allergic to fur." width="551" height="223" /></a></p> <p>Community things:</p> <ul> <li><a href="http://www.itsgosu.com/game/dota2/articles/ig-monthly-madness-invitational-finals-mar-29_407" target="_blank">It’s Gosu’s Monthly Madness</a> tournament finals are tomorrow, March 29th. You don’t want to miss this, we hear it could be more than we can bear.</li> <li>Bear witness to <a href="http://www.team-dignitas.net/articles/blogs/DotA/1092/Dota-2-Ultimate-Guide-to-Warding/" target="_blank">Team Dignitas’ Ultimate Guide to Warding</a>. This should be required teaching in clawsrooms across the globe.</li> <li>Great Explorer Nullf has <a href="http://nullf.deviantart.com/#/d4ubxiu" target="_blank">compiled the eating habits</a> of the legendary Tidehunter in one handy chart. This might give you paws before deciding to head to the beach.</li> </ul> <p>Bear in mind that there will not be an update next week as we will be hibernating during that time.</p> <p>Today’s bearlog is available <a href="http://store.steampowered.com/news/7662" target="_blank">here</a>.</p> <p>&nbsp;</p> <p>Bear.</p> ]]> </content:encoded> 

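All I need is the <img src="http://img.zgserver.com/c%23/LoneDruid_small.jpg" /> so that I can use the URL to display the image in my reader app.

I've heard people say not to use regular expressions because parsing HTML with them is bad practice. I'm building this as a proof of concept, so I don't need to worry about that. I'm looking for the quickest way to get the image URL and then use it in my app.

Can anyone help? Thanks, Tom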

Assuming your XML looks like this (I'm sure it doesn't), and using these extensions: http://searisen.com/xmllib/extensions.wiki

    <?xml version="1.0" encoding="utf-8"?>
    <root xmlns:content='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882'>
      <content:encoded>
        <![CDATA[
        <p>We celebrate ...</p>
        <p>
          <a href="http://img.zgserver.com/c%23/LoneDruid_full.jpg ">
            <img class="alignnone" title="The irony is that he's allergic to fur." src="http://img.zgserver.com/c%23/LoneDruid_small.jpg" />
          </a>
        </p>
        <p>the rest removed</p>
        ]]>
      </content:encoded>
    </root>

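This gets the image source from the second paragraph. It's hard-coded and ugly, but it's all you said you need. You will have to give the full path (path/to/content:encoded) for it to work, and if the element is in a group (aka an array) it gets more complicated. From my code you can see how to pull out an array (see below):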

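    XElement root = XElement.Load(file); // or XElement.Parse(string)
    // &nbsp; is not a predefined XML entity, so replace it before parsing the HTML as XML
    string html = root.Get("content:encoded", string.Empty).Replace("&nbsp;", " ");
    // wrap the fragment in a single root element so it parses as one document
    XElement xdata = XElement.Parse(string.Format("<root>{0}</root>", html));
    XElement[] paras = xdata.GetElements("p").ToArray();
    // the image is in the second paragraph
    string src = paras[1].Get("a/img/src", string.Empty);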

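P.S. This works because the HTML is well-formed. If it isn't, you will have to use HtmlAgilityPack, as others have answered; you can feed it the html returned from Get("content:encoded" ...).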
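If you'd rather not pull in the extension library, roughly the same steps can be sketched with plain LINQ to XML. This is only a sketch under a few assumptions: the class and method names (FeedThumbnails, FirstImageSrc) are made up for illustration, the namespace URI is the one normally used for the RSS content module and should be checked against the xmlns:content declaration in the real feed, and the sample feed in Main is a shortened, hypothetical stand-in. Like the code above, it only works when the embedded HTML is well-formed XML.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class FeedThumbnails
{
    // Assumption: the usual namespace URI of the RSS "content" module.
    // Check it against the xmlns:content declaration in the actual feed.
    private static readonly XNamespace Content = "http://purl.org/rss/1.0/modules/content/";

    // Returns the src of the first <img> inside the first <content:encoded>
    // element, or null if there is none or the HTML is not well-formed XML.
    public static string FirstImageSrc(string feedXml)
    {
        XDocument feed = XDocument.Parse(feedXml);
        XElement encoded = feed.Descendants(Content + "encoded").FirstOrDefault();
        if (encoded == null) return null;

        // Value returns the CDATA text. &nbsp; is not a predefined XML entity,
        // so it is replaced before the HTML is parsed as XML.
        string html = encoded.Value.Replace("&nbsp;", " ");

        XElement wrapped;
        try
        {
            // Wrap the fragment so it has a single root element.
            wrapped = XElement.Parse("<root>" + html + "</root>");
        }
        catch (XmlException)
        {
            return null; // not well-formed; an HTML parser would be needed here
        }

        XElement img = wrapped.Descendants("img").FirstOrDefault();
        return img == null ? null : (string)img.Attribute("src");
    }

    public static void Main()
    {
        // Hypothetical, shortened feed used only to exercise the method.
        string sample =
            "<rss xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"><channel><item>" +
            "<content:encoded><![CDATA[<p>We celebrate ...</p><p><a href=\"full.jpg\">" +
            "<img src=\"http://img.zgserver.com/c%23/LoneDruid_small.jpg\" /></a></p><p>&nbsp;</p>]]>" +
            "</content:encoded></item></channel></rss>";
        Console.WriteLine(FirstImageSrc(sample));
    }
}
```

Taking the first img anywhere in the fragment avoids hard-coding the paragraph index, at the cost of picking up whatever image happens to come first in the post.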

When you're ready to use HtmlAgilityPack, you can try this:

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(yourstring);
    var imgLinks = doc.DocumentNode
                      .Descendants("img")
                      .Select(n => n.Attributes["src"].Value)
                      .ToArray();

    // Regex alternative; the named group URL captures the value of src
    const string pattern = @"<img.+?src.*?\=.*?""(?<URL>.*?)""";
    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
    var match = regex.Match(myCDataText);
    var domain = match.Groups["URL"].Value; // the image URL
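
For a proof of concept, the regex route can also be wrapped in a small helper. The following is a rough sketch, not a robust HTML parser: the names ImgSrcRegex and FirstImgSrc are hypothetical, the pattern is a tightened variant of my own that accepts single or double quotes around the value, and the input in Main is a shortened piece of the CDATA from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrcRegex
{
    // Matches <img ... src="..."> or src='...'. The q group remembers which
    // quote character opened the value so the same one has to close it.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*(?<q>[""'])(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase);

    // Returns the src of the first <img> tag in the text, or null if none is found.
    public static string FirstImgSrc(string html)
    {
        Match match = ImgSrc.Match(html ?? string.Empty);
        return match.Success ? match.Groups["url"].Value : null;
    }

    public static void Main()
    {
        // Shortened sample taken from the CDATA in the question.
        string cdata =
            "<p><a href=\"full.jpg\"><img class=\"alignnone\" title=\"he's allergic to fur.\" " +
            "src=\"http://img.zgserver.com/c%23/LoneDruid_small.jpg\" width=\"551\" /></a></p>";
        Console.WriteLine(FirstImgSrc(cdata));
    }
}
```

Checking match.Success before reading the group gives the caller a null instead of an empty string when a post has no image.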