Source code for URLEncoder.java (6.0 JDK Core, package java.net)

/*
 * Copyright 1995-2006 Sun Microsystems, Inc.  All Rights Reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Sun designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Sun in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
 * CA 95054 USA or visit www.sun.com if you need additional information or
 * have any questions.
 */

package java.net;

import java.io.ByteArrayOutputStream;
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.io.CharArrayWriter;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.BitSet;
import java.security.AccessController;
import java.security.PrivilegedAction;
import sun.security.action.GetBooleanAction;
import sun.security.action.GetPropertyAction;

/**
 * Utility class for HTML form encoding. This class contains static methods
 * for converting a String to the <CODE>application/x-www-form-urlencoded</CODE> MIME
 * format. For more information about HTML form encoding, consult the HTML
 * <A HREF="http://www.w3.org/TR/html4/">specification</A>.
 *
 * <p>
 * When encoding a String, the following rules apply:
 *
 * <p>
 * <ul>
 * <li>The alphanumeric characters &quot;<code>a</code>&quot; through
 *     &quot;<code>z</code>&quot;, &quot;<code>A</code>&quot; through
 *     &quot;<code>Z</code>&quot; and &quot;<code>0</code>&quot;
 *     through &quot;<code>9</code>&quot; remain the same.
 * <li>The special characters &quot;<code>.</code>&quot;,
 *     &quot;<code>-</code>&quot;, &quot;<code>*</code>&quot;, and
 *     &quot;<code>_</code>&quot; remain the same.
 * <li>The space character &quot;<code>&nbsp;</code>&quot; is
 *     converted into a plus sign &quot;<code>+</code>&quot;.
 * <li>All other characters are unsafe and are first converted into
 *     one or more bytes using some encoding scheme. Then each byte is
 *     represented by the 3-character string
 *     &quot;<code>%<i>xy</i></code>&quot;, where <i>xy</i> is the
 *     two-digit hexadecimal representation of the byte.
 *     The recommended encoding scheme to use is UTF-8. However,
 *     for compatibility reasons, if an encoding is not specified,
 *     then the default encoding of the platform is used.
 * </ul>
 *
 * <p>
 * For example using UTF-8 as the encoding scheme the string &quot;The
 * string &#252;@foo-bar&quot; would get converted to
 * &quot;The+string+%C3%BC%40foo-bar&quot; because in UTF-8 the character
 * &#252; is encoded as two bytes C3 (hex) and BC (hex), and the
 * character @ is encoded as one byte 40 (hex).
 *
 * @author  Herb Jellinek
 * @version 1.38, 05/05/07
 * @since   JDK1.0
 */
public class URLEncoder {
    static BitSet dontNeedEncoding;
    static final int caseDiff = ('a' - 'A');
    static String dfltEncName = null;

    static {

        /* The list of characters that are not encoded has been
         * determined as follows:
         *
         * RFC 2396 states:
         * -----
         * Data characters that are allowed in a URI but do not have a
         * reserved purpose are called unreserved.  These include upper
         * and lower case letters, decimal digits, and a limited set of
         * punctuation marks and symbols.
         *
         * unreserved  = alphanum | mark
         *
         * mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
         *
         * Unreserved characters can be escaped without changing the
         * semantics of the URI, but this should not be done unless the
         * URI is being used in a context that does not allow the
         * unescaped character to appear.
         * -----
         *
         * It appears that both Netscape and Internet Explorer escape
         * all special characters from this list with the exception
         * of "-", "_", ".", "*". While it is not clear why they are
         * escaping the other characters, perhaps it is safest to
         * assume that there might be contexts in which the others
         * are unsafe if not escaped. Therefore, we will use the same
         * list. It is also noteworthy that this is consistent with
         * O'Reilly's "HTML: The Definitive Guide" (page 164).
         *
         * As a last note, Internet Explorer does not encode the "@"
         * character which is clearly not unreserved according to the
         * RFC. We are being consistent with the RFC in this matter,
         * as is Netscape.
         *
         */

        dontNeedEncoding = new BitSet(256);
        int i;
        for (i = 'a'; i <= 'z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = 'A'; i <= 'Z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = '0'; i <= '9'; i++) {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(' '); /* encoding a space to a + is done
         * in the encode() method */
        dontNeedEncoding.set('-');
        dontNeedEncoding.set('_');
        dontNeedEncoding.set('.');
        dontNeedEncoding.set('*');

        dfltEncName = (String) AccessController
                .doPrivileged(new GetPropertyAction("file.encoding"));
    }

    /**
     * You can't call the constructor.
     */
    private URLEncoder() {
    }

    /**
     * Translates a string into <code>x-www-form-urlencoded</code>
     * format. This method uses the platform's default encoding
     * as the encoding scheme to obtain the bytes for unsafe characters.
     *
     * @param   s   <code>String</code> to be translated.
     * @deprecated The resulting string may vary depending on the platform's
     *             default encoding. Instead, use the encode(String,String)
     *             method to specify the encoding.
     * @return  the translated <code>String</code>.
     */
    @Deprecated
    public static String encode(String s) {

        String str = null;

        try {
            str = encode(s, dfltEncName);
        } catch (UnsupportedEncodingException e) {
            // The system should always have the platform default
        }

        return str;
    }

    /**
     * Translates a string into <code>application/x-www-form-urlencoded</code>
     * format using a specific encoding scheme. This method uses the
     * supplied encoding scheme to obtain the bytes for unsafe
     * characters.
     * <p>
     * <em><strong>Note:</strong> The <a href=
     * "http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars">
     * World Wide Web Consortium Recommendation</a> states that
     * UTF-8 should be used. Not doing so may introduce
     * incompatibilities.</em>
     *
     * @param   s   <code>String</code> to be translated.
     * @param   enc   The name of a supported
     *    <a href="../lang/package-summary.html#charenc">character
     *    encoding</a>.
     * @return  the translated <code>String</code>.
     * @exception  UnsupportedEncodingException
     *             If the named encoding is not supported
     * @see URLDecoder#decode(java.lang.String, java.lang.String)
     * @since 1.4
     */
    public static String encode(String s, String enc)
            throws UnsupportedEncodingException {

        boolean needToChange = false;
        StringBuffer out = new StringBuffer(s.length());
        Charset charset;
        CharArrayWriter charArrayWriter = new CharArrayWriter();

        if (enc == null)
            throw new NullPointerException("charsetName");

        try {
            charset = Charset.forName(enc);
        } catch (IllegalCharsetNameException e) {
            throw new UnsupportedEncodingException(enc);
        } catch (UnsupportedCharsetException e) {
            throw new UnsupportedEncodingException(enc);
        }

        for (int i = 0; i < s.length();) {
            int c = (int) s.charAt(i);
            //System.out.println("Examining character: " + c);
            if (dontNeedEncoding.get(c)) {
                if (c == ' ') {
                    c = '+';
                    needToChange = true;
                }
                //System.out.println("Storing: " + c);
                out.append((char) c);
                i++;
            } else {
                // convert to external encoding before hex conversion
                do {
                    charArrayWriter.write(c);
                    /*
                     * If this character represents the start of a Unicode
                     * surrogate pair, then pass in two characters. It's not
                     * clear what should be done if a character in the
                     * surrogate range occurs outside of a legal
                     * surrogate pair. For now, just treat it as if it were
                     * any other character.
                     */
                    if (c >= 0xD800 && c <= 0xDBFF) {
                        /*
                          System.out.println(Integer.toHexString(c)
                          + " is high surrogate");
                         */
                        if ((i + 1) < s.length()) {
                            int d = (int) s.charAt(i + 1);
                            /*
                              System.out.println("\tExamining "
                              + Integer.toHexString(d));
                             */
                            if (d >= 0xDC00 && d <= 0xDFFF) {
                                /*
                                  System.out.println("\t"
                                  + Integer.toHexString(d)
                                  + " is low surrogate");
                                 */
                                charArrayWriter.write(d);
                                i++;
                            }
                        }
                    }
                    i++;
                } while (i < s.length()
                        && !dontNeedEncoding
                                .get((c = (int) s.charAt(i))));

                charArrayWriter.flush();
                String str = new String(charArrayWriter.toCharArray());
                byte[] ba = str.getBytes(charset);
                for (int j = 0; j < ba.length; j++) {
                    out.append('%');
                    char ch = Character
                            .forDigit((ba[j] >> 4) & 0xF, 16);
                    // converting to use uppercase letter as part of
                    // the hex value if ch is a letter.
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                    ch = Character.forDigit(ba[j] & 0xF, 16);
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                }
                charArrayWriter.reset();
                needToChange = true;
            }
        }

        return (needToChange ? out.toString() : s);
    }
}
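
Usage sketch (not part of the JDK source above). The encoding rules in the class Javadoc can be exercised from calling code roughly as follows. The class name UrlEncoderDemo and the helper encodeUtf8 are hypothetical names introduced only for this illustration; the only JDK call used is the two-argument URLEncoder.encode(String, String) shown in the listing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; not part of java.net.
class UrlEncoderDemo {

    // Hypothetical helper: form-encodes a value using UTF-8, the
    // encoding recommended by the Javadoc above.
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the charsets every Java platform must support,
            // so this branch is not expected to run.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Example from the class Javadoc: space becomes '+', u-umlaut
        // becomes the two UTF-8 bytes C3 BC, and '@' becomes %40.
        System.out.println(encodeUtf8("The string \u00FC@foo-bar"));
        // prints The+string+%C3%BC%40foo-bar

        // Characters in the dontNeedEncoding set pass through unchanged.
        System.out.println(encodeUtf8("a-b_c.d*e"));
        // prints a-b_c.d*e

        // '~' is an RFC 2396 "mark" but is still escaped by this class.
        System.out.println(encodeUtf8("~"));
        // prints %7E

        // A surrogate pair (U+1F600) is handed to the charset as one unit
        // and comes out as a single four-byte UTF-8 sequence.
        System.out.println(encodeUtf8("\uD83D\uDE00"));
        // prints %F0%9F%98%80

        // An unknown charset name is reported as UnsupportedEncodingException.
        try {
            URLEncoder.encode("x", "no-such-charset");
        } catch (UnsupportedEncodingException e) {
            System.out.println("unsupported: " + e.getMessage());
        }
    }
}
```

Passing the charset name explicitly, rather than calling the deprecated one-argument encode(String), keeps the output independent of the platform's file.encoding property.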