Use SGML parser : sgmllib « XML « Python Tutorial

20. 7. 1. Use SGML parser
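import sgmllib

filename = "index.html"

class CleanExit(Exception):
    """Raised to stop parsing as soon as the title has been read."""
    pass

class Titlefinder(sgmllib.SGMLParser):
    def __init__(self, verbose=0):
        sgmllib.SGMLParser.__init__(self, verbose)
        # self.data stays None until a <title> start tag is seen
        self.title = self.data = None
    def start_title(self, attributes):
        # Begin collecting the text inside <title>
        self.data = []
    def end_title(self):
        # Join the collected pieces and abort the rest of the parse
        self.title = "".join(self.data)
        raise CleanExit
    def handle_data(self, data):
        # Keep text only while inside <title>
        if self.data is not None:
            self.data.append(data)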

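def get_title(filehandle):
    """Return the text of the first <title> element, or None if there is none."""
    parser = Titlefinder()
    try:
        # Feed the parser in 1 KB chunks so parsing can stop early
        while 1:
            sgmldata = filehandle.read(1024)
            if not sgmldata:
                break
            parser.feed(sgmldata)
        parser.close()
    except CleanExit:
        # end_title() raised: the title is complete
        return parser.title
    return None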

filehandle = open(filename)
try:
    title = get_title(filehandle)
finally:
    filehandle.close()

print "The page's title is: %s" % (title,)
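
The listing above is Python 2 code; the sgmllib module was removed in Python 3. A minimal sketch of the same approach for Python 3, assuming the standard html.parser module is an acceptable substitute. The names TitleFound and TitleFinder, the chunk_size parameter, and the in-memory sample document are illustrative and not part of the original example:

```python
import io
from html.parser import HTMLParser


class TitleFound(Exception):
    """Raised to stop parsing as soon as the title is complete."""


class TitleFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = None
        self._chunks = None  # None means "not inside <title>"

    def handle_starttag(self, tag, attrs):
        # Begin collecting text when <title> opens
        if tag == "title":
            self._chunks = []

    def handle_endtag(self, tag):
        # Join the collected text and abort the rest of the parse
        if tag == "title" and self._chunks is not None:
            self.title = "".join(self._chunks)
            raise TitleFound

    def handle_data(self, data):
        # Keep text only while inside <title>
        if self._chunks is not None:
            self._chunks.append(data)


def get_title(filehandle, chunk_size=1024):
    """Return the text of the first <title> element, or None."""
    parser = TitleFinder()
    try:
        while True:
            chunk = filehandle.read(chunk_size)
            if not chunk:
                break
            parser.feed(chunk)
        parser.close()
    except TitleFound:
        return parser.title
    return None


# Hypothetical in-memory document standing in for index.html
sample = io.StringIO(
    "<html><head><title>Hello, SGML</title></head><body>text</body></html>"
)
print("The page's title is: %s" % get_title(sample))
```

As in the sgmllib version, raising an exception from the end-tag handler lets the caller stop reading the file once the title is known instead of parsing the whole document.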