C#.Net: An Example of Scraping the Baidu Baijia Article List with Regular Expressions

Date: 2022-06-25 07:48:45 Editor: 袖梨 Source: 一聚教程网

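In my spare time I have been learning regular expressions. Since practice is the sole criterion for testing truth, I wrote an example that uses a regular expression to scrape the article list from Baidu Baijia. The source code below shows the whole process: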

1. Fetch the Baidu Baijia page content

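// Requires: using System; using System.Collections.Generic; using System.IO; using System.Net;
public List<string[]> GetUrl()
{
  try
  {
    string url = "http://baijia.baidu.com/";
    WebRequest webRequest = WebRequest.Create(url);
    WebResponse webResponse = webRequest.GetResponse();
    StreamReader reader = new StreamReader(webResponse.GetResponseStream());
    // Read the whole response body as text.
    string result = reader.ReadToEnd();
    reader.Close();
    webResponse.Close();
    // Hand the raw HTML to the regex step.
    return AnalysisHtml(result);
  }
  catch (Exception)
  {
    // Rethrow with "throw;" so the original stack trace is preserved.
    throw;
  }
}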
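The fetch step above closes the reader and the response by hand, so both are left open if `ReadToEnd` throws. As a minimal sketch (not the author's code), the same step can be written with `using` blocks. It swaps `WebRequest` for `HttpClient`, because `WebRequest.Create` is marked obsolete in .NET 6 and later. The names `HtmlReader`, `ReadAll`, and `FetchAsync` are hypothetical, and UTF-8 is an assumption about the page encoding.

```csharp
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class HtmlReader
{
    // One shared HttpClient for the whole process.
    private static readonly HttpClient Client = new HttpClient();

    // Reads the whole stream as text. The reader (and the stream it wraps)
    // is disposed even if reading throws.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Downloads the page body as a string.
    // UTF-8 is an assumption here; the article does not state the page encoding.
    public static async Task<string> FetchAsync(string url)
    {
        using (Stream body = await Client.GetStreamAsync(url))
        {
            return ReadAll(body, Encoding.UTF8);
        }
    }
}
```

A caller would then pass the result of `await HtmlReader.FetchAsync("http://baijia.baidu.com/")` to the parsing step.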

2. Filter with a regular expression

// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
public List<string[]> AnalysisHtml(string htmlContent)
{
  List<string[]> list = new List<string[]>();
  // NOTE: the start of this pattern (the markup that precedes the title group) and the
  // first group's name are missing from the source; "Title" is an assumed stand-in name.
  string strPattern = @"(?<Title>[^<]+)</a></h3>.*\s*<p\s*class=""feeds-item-text"">(?<Abstract>[^<]+)<a\s*href=""(?<Url>.*)""\s*target=""_blank""\s*class=""feeds-item-more""\s*mon="".*\s*"">.*\s*</a></p>";
  Regex regex = new Regex(strPattern, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.CultureInvariant);
  if (regex.IsMatch(htmlContent))
  {
    MatchCollection matchCollection = regex.Matches(htmlContent);
    foreach (Match match in matchCollection)
    {
      string[] str = new string[3];
      str[0] = match.Groups[1].Value; // title of the list item
      str[1] = match.Groups[2].Value; // abstract text
      str[2] = match.Groups[3].Value; // URL the item links to
      list.Add(str);
    }
  }
  return list;
}
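To make the filtering step easier to try out, here is a minimal sketch that runs a similar pattern against a small HTML string and reads the captures by group name instead of by index. The markup it matches is a simplified assumption, not the real Baidu Baijia page, and the names `Article`, `ArticleListParser`, and `Parse` are hypothetical. It also uses `[^"]*` and `[^>]*` in place of the greedy `.*`, so one match cannot run past the end of an attribute or tag when several items share a line.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container for one scraped list item.
public sealed class Article
{
    public Article(string title, string summary, string url)
    {
        Title = title;
        Abstract = summary;
        Url = url;
    }

    public string Title { get; }
    public string Abstract { get; }
    public string Url { get; }
}

public static class ArticleListParser
{
    // Assumed, simplified markup: <h3><a ...>title</a></h3> followed by
    // <p class="feeds-item-text">abstract<a href="url" ... class="feeds-item-more" ...>.
    private static readonly Regex ItemPattern = new Regex(
        @"<h3>\s*<a[^>]*>(?<Title>[^<]+)</a>\s*</h3>\s*" +
        @"<p\s+class=""feeds-item-text"">(?<Abstract>[^<]+)" +
        @"<a\s+href=""(?<Url>[^""]*)""[^>]*class=""feeds-item-more""[^>]*>",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns one Article per match; an empty list when nothing matches.
    public static List<Article> Parse(string html)
    {
        var articles = new List<Article>();
        foreach (Match m in ItemPattern.Matches(html))
        {
            articles.Add(new Article(
                m.Groups["Title"].Value.Trim(),
                m.Groups["Abstract"].Value.Trim(),
                m.Groups["Url"].Value));
        }
        return articles;
    }
}
```

Calling `ArticleListParser.Parse(html)` needs no prior `IsMatch` check, because `Matches` simply returns an empty collection when the pattern does not occur.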