Java: Examples of Extracting Link Addresses with Regular Expressions

Updated: 2017-08-16 09:54:54  Author: fancylovejava

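This article shows how to use regular expressions in Java to match and extract link addresses. It briefly covers the commonly used Java regex matching methods and some practical techniques for pulling URLs out of HTML. Readers who need this may find it a useful reference.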

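This article walks through examples of extracting link addresses in Java with regular expressions. It is shared here for your reference; the details follow.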

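When we need the URLs contained in a page's HTML string, we usually match them with a regular expression. Below are several examples that match and extract link addresses. They rely on two techniques: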

1. Using the find() method of java.util.regex.Matcher to locate each match in turn.

2. Using String's replaceAll(String regex, String replacement) method to strip the unneeded parts of each matched string, leaving just the URL and the link text.
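To make the two techniques concrete before the examples, here is a minimal sketch. The class name FindAndReplaceDemo, the helper findAll and the example.com URLs are hypothetical and used only for illustration. It calls find() in a loop to collect every match, then uses replaceAll to trim each match down to the bare address.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAndReplaceDemo {
    // find() resumes scanning after the previous match, so calling it in a
    // loop visits every match in the input, not just the first one.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical sample input with two double-quoted href attributes
        String html = "<a href=\"http://a.example.com\">A</a> <a href=\"http://b.example.com\">B</a>";
        List<String> hrefs = findAll("href=\"[^\"]*\"", html);
        System.out.println(hrefs); // [href="http://a.example.com", href="http://b.example.com"]
        // replaceAll rewrites every match of its regex: here it deletes the
        // leading href=" and the closing quote, leaving only the address.
        for (String h : hrefs) {
            System.out.println(h.replaceAll("href=\"|\"", ""));
        }
    }
}
```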

Example 1. The simplest case

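import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example1 {
    public static void main(String[] args) {
        String content = "<a href=\"URL\">";
        // Capture everything between href=" and the next double quote (flag 2 == CASE_INSENSITIVE)
        Pattern p = Pattern.compile("href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(content);
        if (m.find()) {
            System.out.println("url=" + m.group(1));
        }
    }
}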
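Example 1 uses a single if (m.find()), so it only reports the first link. As a sketch, assuming you want every double-quoted href in a string, the same pattern can be driven by a while loop. The class name AllDoubleQuotedHrefs and the sample URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllDoubleQuotedHrefs {
    // Same pattern as Example 1; the literal flag 2 used there equals Pattern.CASE_INSENSITIVE.
    private static final Pattern HREF =
            Pattern.compile("href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Loop on find() instead of a single if, so every link is collected.
    static List<String> hrefs(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(1)); // group(1) is the text between the quotes
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://one.example.com\">1</a><A HREF=\"http://two.example.com\">2</A>";
        System.out.println(hrefs(html)); // [http://one.example.com, http://two.example.com]
    }
}
```

Compiling the Pattern once into a static field avoids recompiling it on every call.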

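Example 2. The pattern above only captures the URL when the href value of an <a> tag is wrapped in double quotes. The improved version below extracts the URL from <a> tags whether the value is double-quoted, single-quoted, or unquoted.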

package com.gong.example;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Simple {
    public static void main(String[] args) {
        // Three <a> tags: double-quoted, single-quoted and unquoted href values
        String input = "<a style=\" \" href = \"http://chabaoo.cn\" target=\"_blank\" >chabaoo.cn</a>"
                + "<a href='http://www.163.com' target='_blank' >www.163.com</a> "
                + "<a href=http://www.yahoo.com target=_blank >www.yahoo.com</a>";
        // href = "..." | '...' | unquoted value (stops at a quote, '>' or whitespace)
        String patternString = "\\s*(?i)href\\s*=\\s*(\"([^\"]*\")|'[^']*'|([^'\">\\s]+))";
        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            String link = matcher.group().trim(); // drop the whitespace matched by the leading \s*
            System.out.println(link);
            // Remove the href= prefix and any opening quote
            link = link.replaceAll("(?i)href\\s*=\\s*['\"]*", "");
            System.out.println("--" + link);
            // Remove any remaining quote characters
            link = link.replaceAll("['\"]", "");
            System.out.println("---" + link);
        }
    }
}
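Example 2 matches the whole attribute and then cleans it up with two replaceAll calls. An alternative sketch, assuming the same three quoting styles, gives each alternative its own capture group and reads whichever group took part in the match, so no cleanup step is needed. The class name AnyQuoteHref and the example.com URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnyQuoteHref {
    // One alternative per quoting style, each with its own capture group:
    // group 1 = double-quoted, group 2 = single-quoted, group 3 = unquoted value.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^'\">\\s]+))",
            Pattern.CASE_INSENSITIVE);

    static List<String> hrefs(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Only one of the three groups participates in a given match;
            // the others return null.
            for (int g = 1; g <= 3; g++) {
                if (m.group(g) != null) {
                    urls.add(m.group(g));
                    break;
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String input = "<a href = \"http://a.example.com\">A</a>"
                + "<a href='http://b.example.com' target='_blank'>B</a>"
                + "<a HREF=http://c.example.com target=_blank>C</a>";
        System.out.println(hrefs(input)); // [http://a.example.com, http://b.example.com, http://c.example.com]
    }
}
```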

Example 3. Going one step further, we can extract both the URL and the link text.

/*
   Purpose: parse the string s and extract each hyperlink's URL and link text
*/
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegTest
{
  public static void main(String[] args)
  {
    //String s="<p id=km>&nbsp;<a href=http://down.yourweb.com>空間</a>&nbsp;|&nbsp;<a ";
    String s="</p><p style=height:14px><a href=http://mb.yourweb.com>企業(yè)推廣</a> | <a href=http://code.yourweb.com>搜索風(fēng)云榜</a> | <a href=/home.html>關(guān)于百度</a> | <a href=http://www.yourweb.com>About Baidu</a></p><p id=b>&copy;2008 Baidu <a href=http://www.yourweb.com>使用百度前必讀</a> <a href=http://www.miibeian.gov.cn target=_blank>京ICP證03xxxx號(hào)</a> <a href=http://chabaoo.cn><img src=/get_pic/2013/11/22/20131122031447947.gif></a></p></center></body></html><!--543ff95f18f36b11-->";
    String regex="<a.*?/a>";   // lazy: each match is a single <a ...>...</a> element
    //String regex = "<a.*>(.*)</a>";   // greedy: would run on to the last </a> in s
    Pattern pt=Pattern.compile(regex);
    Pattern pt2=Pattern.compile(">.*?</a>");      // link-text part
    Pattern pt3=Pattern.compile("href=[^\\s>]*"); // URL part: stops at whitespace or '>'
    Matcher mt=pt.matcher(s);
    while(mt.find())
    {
       String anchor=mt.group();
       System.out.println(anchor);
       System.out.println();
       Matcher mt2=pt2.matcher(anchor);
       while(mt2.find())
       {
        System.out.println("Title: "+mt2.group().replaceAll(">|</a>",""));
       }
       Matcher mt3=pt3.matcher(anchor);
       while(mt3.find())
       {
        System.out.println("URL: "+mt3.group().replaceAll("href=",""));
       }
    }
  }
}
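Example 3 runs three separate patterns per link. As a sketch, assuming href appears inside the opening <a> tag, a single pattern with two capture groups can return the URL and the link text in one pass. The class names LinkAndText and Link and the sample markup are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkAndText {
    // Hypothetical holder for one extracted link.
    static final class Link {
        final String url;
        final String text;
        Link(String url, String text) { this.url = url; this.text = text; }
        @Override public String toString() { return "URL: " + url + ", Title: " + text; }
    }

    // group 1 = href value (quoted or not), group 2 = content between <a ...> and </a>.
    // \b after <a rejects tags such as <abbr>; lazy quantifiers keep each match
    // inside a single <a>...</a> element.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']?([^\"'\\s>]+)[\"']?[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<Link> links(String html) {
        List<Link> result = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            result.add(new Link(m.group(1), m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p><a href=http://mb.example.com>Promo</a> | "
                + "<a class='x' href=\"/home.html\" target=_blank>Home</a> | "
                + "<a href='http://www.example.com'><img src=/logo.gif></a></p>";
        for (Link link : links(html)) {
            System.out.println(link);
        }
    }
}
```

Like the original examples, this is a regex heuristic rather than an HTML parser: comments, scripts, or a ">" inside an attribute value can still confuse it, so for arbitrary pages a dedicated HTML parser such as jsoup is the more robust choice.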

PS: Here are two handy online regular expression tools for reference:

JavaScript regular expression online tester:
http://tools.jb51.net/regex/javascript

Regular expression online generator:
http://tools.jb51.net/regex/create_reg


I hope this article helps you with your Java programming.
