Get a web page title using a regular expression : HTML Parser « Network Protocol « Java

Get a web page title using a regular expression

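import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
  public static void main(String[] argv) throws Exception {
    URL url = new URL("http://www.java.com/");
    URLConnection urlConnection = url.openConnection();

    // Read the response as text, line by line. DataInputStream.readUTF() is not
    // suitable here: it expects Java's length-prefixed modified UTF-8 format and
    // signals end of stream with an EOFException instead of returning null.
    // UTF-8 is assumed; a robust client would honor the charset the server declares.
    StringBuilder sb = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(urlConnection.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        sb.append(' ').append(line);
      }
    }

    // Collapse whitespace runs so a title split across lines can match.
    String html = sb.toString().replaceAll("\\s+", " ");

    // Match case-insensitively and tolerate attributes on the opening tag.
    Pattern p = Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(html);
    while (m.find()) {
      System.out.println(m.group(1).trim());
    }
  }
}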
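
The matching step can also be separated from the network code so that it can be tried on a plain string. The following is a minimal sketch, not part of the original example: the class name TitleExtractor and the method extractTitle are hypothetical, and it assumes the page contains at most one title element of interest. As with any regular-expression approach, it may fail on unusual markup that a real HTML parser would handle.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {
  // DOTALL lets "." match line breaks, so the input does not need to be
  // whitespace-normalized before matching.
  private static final Pattern TITLE = Pattern.compile(
      "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  // Hypothetical helper: returns the first title found, or empty if none.
  public static Optional<String> extractTitle(String html) {
    Matcher m = TITLE.matcher(html);
    if (!m.find()) {
      return Optional.empty();
    }
    // Collapse whitespace runs inside the title and trim both ends.
    return Optional.of(m.group(1).replaceAll("\\s+", " ").trim());
  }

  public static void main(String[] args) {
    String sample =
        "<html><head><TITLE>\n  Hello,\n  World </TITLE></head><body></body></html>";
    // Prints: Hello, World
    System.out.println(extractTitle(sample).orElse("(no title)"));
  }
}
```

Returning an Optional makes the "no title found" case explicit to the caller, where the loop in the listing above would simply print nothing.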

Related examples in the same category

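1.	Filter HTML special characters from a string
2.	Parse an HTML document using javax.swing.text.html.HTMLEditorKit
3.	Extract links from a web page
4.	Extend HTMLEditorKit.ParserCallback
5.	HTML parser based on HTMLEditorKit.ParserCallback
6.	Get all hyperlinks on a web page
7.	Get the links in an HTML file
8.	Get the text in an HTML file
9.	Find and display the hyperlinks within a web page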
