kses 0.2.1 README [kses strips evil scripts!]
=================


* INTRODUCTION *


Welcome to kses - an HTML/XHTML filter written in PHP. It removes all unwanted
HTML elements and attributes, no matter how malformed the HTML input you give
it is. It also does several checks on attribute values. kses can be used to
help prevent Cross-Site Scripting (XSS), Buffer Overflow and Denial of Service
attacks, among other things.

The program is released under the terms of the GNU General Public License. You
should look into what that means before using kses in your programs. You can
find the full text of the license in the file COPYING.


* FEATURES *


Some of kses' current features are:

* It will only allow the HTML elements and attributes that it was explicitly
  told to allow.

* Element and attribute names are case-insensitive (a href vs A HREF).

* It will understand and process whitespace correctly.

* Attribute values can be surrounded with quotes, apostrophes or nothing.

* It will accept valueless attributes with just names and no values (selected).

* It will accept XHTML's closing " /" marks.

* Attribute values that are surrounded with nothing will get quotes, to avoid
  producing non-W3C-conforming HTML
  (<a href=http://sourceforge.net/projects/kses> works but isn't valid HTML).

* It handles lots of types of malformed HTML by interpreting the existing
  code as best it can and then rebuilding new code from it. That's a better
  approach than trying to patch up the existing code, as you're bound to forget
  about some weird special case somewhere. It handles problems like
  never-ending quotes and tags gracefully.

* It will remove additional "<" and ">" characters that people may try to
  sneak in somewhere.

* It supports checking attribute values for minimum/maximum length and
  minimum/maximum value, to protect against Buffer Overflows and Denial of
  Service attacks against WWW clients and various servers. You can stop
  <iframe src= width= height=> from having excessively large width and height
  values, for instance.

* It has a system for whitelisting URL protocols. You can say that attribute
  values may only start with http:, https:, ftp: and gopher:, but no other URL
  protocols (javascript:, java:, about:, telnet: ...). The functions that do
  this work handle whitespace, upper/lower case, HTML entities
  ("&#106;avascript:") and repeated entries ("javascript:javascript:alert(57)").
  They also normalize HTML entities as a nice side effect.

* It removes Netscape 4's JavaScript entities ("&{alert(57)};").

* It handles NULL bytes and Opera's chr(173) whitespace characters.

* There is both a procedural version and an object-oriented version of kses.
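
The URL protocol whitelisting described in the features list can be pictured with the following minimal PHP sketch. It is not kses' own code. The function names sketch_decode_entities() and sketch_filter_protocol() are hypothetical. The sketch only decodes numeric character references (named entities are ignored), and it removes a disallowed protocol prefix repeatedly until none is left.

```php
<?php
// Minimal sketch of URL protocol whitelisting, NOT kses' implementation.
// Function names are hypothetical and chosen for illustration only.

// Decode numeric character references (&#106; or &#x6A;) into plain
// characters, but only for letters, digits and ":", so the result never
// gains quotes or angle brackets that were not there before.
function sketch_decode_entities(string $s): string
{
    return preg_replace_callback(
        '/&#(x[0-9a-f]{1,6}|[0-9]{1,7});?/i',
        function (array $m): string {
            $code = (strtolower($m[1][0]) === 'x')
                ? hexdec(substr($m[1], 1))
                : (int) $m[1];
            if ($code > 0 && $code < 128) {
                $ch = chr($code);
                if (ctype_alnum($ch) || $ch === ':') {
                    return $ch;
                }
            }
            return $m[0]; // leave every other reference untouched
        },
        $s
    );
}

// Strip disallowed protocols from the start of an attribute value.
function sketch_filter_protocol(string $value, array $allowed): string
{
    $allowed = array_map('strtolower', $allowed);
    $value = sketch_decode_entities($value);

    do {
        $before = $value;
        // Candidate protocol: everything before the first ":" that does
        // not contain "/", "?" or "#".
        if (preg_match('%^([^:/?#]*):%', $value, $m)) {
            // Ignore case and any whitespace/control characters inside it.
            $proto = strtolower(preg_replace('/[\x00-\x20]+/', '', $m[1]));
            if (!in_array($proto, $allowed, true)) {
                $value = substr($value, strlen($m[0]));
            }
        }
        // Loop again so "javascript:javascript:" cannot survive one pass.
    } while ($value !== $before);

    return $value;
}
```

For example, sketch_filter_protocol('javascript:javascript:alert(57)', array('http', 'https')) returns 'alert(57)', while 'http://example.com/' passes through unchanged. Stripping the prefix, rather than rejecting the whole value, leaves harmless text behind.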
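
The minimum/maximum checks on attribute values can be sketched like this. The rule names maxlen, minlen, maxval and minval and the function sketch_check_attr_value() are assumptions made for illustration. The files in the docs/ directory describe how kses itself expresses these checks. In this sketch, a value only counts as a number if it is a short, plain non-negative integer, so something like width="100%" fails a maxval rule.

```php
<?php
// Sketch of per-attribute value checks, NOT kses' implementation.
// The rule names (maxlen, minlen, maxval, minval) are assumptions.
function sketch_check_attr_value(string $value, array $rules): bool
{
    foreach ($rules as $check => $limit) {
        $check = strtolower($check);

        if ($check === 'maxlen' && strlen($value) > $limit) {
            return false;
        }
        if ($check === 'minlen' && strlen($value) < $limit) {
            return false;
        }
        if ($check === 'maxval' || $check === 'minval') {
            // Treat only short, plain non-negative integers as numbers;
            // anything else ("100%", "1e9", "") fails a value rule.
            if (!preg_match('/^\s*[0-9]{1,6}\s*$/', $value)) {
                return false;
            }
            $number = (int) trim($value);
            if ($check === 'maxval' && $number > $limit) {
                return false;
            }
            if ($check === 'minval' && $number < $limit) {
                return false;
            }
        }
        // Unknown rule names are ignored in this sketch.
    }
    return true;
}
```

A rule set such as array('maxval' => 800) would then accept width="640" and reject width="99999". Capping the number of digits also keeps the integer conversion well inside PHP's integer range.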
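
The NULL byte and JavaScript entity handling can be sketched with ordinary PHP string functions. This is an illustration, not kses' code, and sketch_pre_clean() is a hypothetical name. NULL bytes are removed first, so that a NULL byte hidden inside "&{" cannot shield the entity from the second pattern. The Opera chr(173) handling is not covered here.

```php
<?php
// Illustrative sketch, NOT kses' implementation.
function sketch_pre_clean(string $s): string
{
    // 1) Drop NULL bytes, which can otherwise hide keywords from a filter.
    $s = str_replace("\0", '', $s);

    // 2) Drop Netscape 4 JavaScript entities such as "&{alert(57)};",
    //    including unterminated ones that run to the end of the input.
    return preg_replace('/&\s*\{[^}]*(\}\s*;?|$)/', '', $s);
}
```

Ordinary entities such as "&amp;" are left alone, because the pattern only matches an ampersand followed by optional whitespace and a "{".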


* USE IT *


It's very easy to use kses in your own PHP web application! Basic usage looks
like this:


  <?php

  include 'kses.php';

  $allowed = array('b' => array(),
                   'i' => array(),
                   'a' => array('href' => 1, 'title' => 1),
                   'p' => array('align' => 1),
                   'br' => array());

  $val = isset($_POST['val']) ? $_POST['val'] : '';

  # You must strip slashes from magic quotes, or kses will get confused.
  # Magic quotes were removed in PHP 5.4 and get_magic_quotes_gpc() itself
  # in PHP 8.0, so only call it where it still exists.
  if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc())
    $val = stripslashes($val);

  $val = kses($val, $allowed); # The filtering takes place here.

  # Do something with $val.

  ?>


This definition of $allowed means that only the elements B, I, A, P and BR are
allowed (along with their closing tags /B, /I, /A, /P and /BR). B, I and BR
may not have any attributes. A may only have the attributes HREF and TITLE,
while P may only have the attribute ALIGN. You can list the elements and
attributes in the array in any mixture of upper and lower case. kses will also
recognize HTML code that uses any mixture of upper and lower case.
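
To make the case-insensitive matching of an $allowed array concrete, here is a small sketch of how such a whitelist lookup might work. It is a hypothetical helper (sketch_is_allowed() is not a kses function). It assumes the array layout shown above, where element names map to arrays whose keys are attribute names.

```php
<?php
// Hypothetical helper, NOT a kses function: shows how a whitelist in the
// $allowed format can be matched without caring about upper/lower case.
function sketch_is_allowed(array $allowed, string $element, ?string $attr = null): bool
{
    // Lower-case every element name and every attribute name.
    $normalized = array();
    foreach ($allowed as $name => $attrs) {
        $normalized[strtolower($name)] = array_change_key_case($attrs, CASE_LOWER);
    }

    $element = strtolower($element);
    if (!isset($normalized[$element])) {
        return false;               // element not whitelisted at all
    }
    if ($attr === null) {
        return true;                // only the element itself was asked about
    }
    return isset($normalized[$element][strtolower($attr)]);
}
```

With the $allowed array above, sketch_is_allowed($allowed, 'A', 'HREF') is true. sketch_is_allowed($allowed, 'b', 'style') is false, because B has no allowed attributes.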

It's important to select the right allowed attributes, so you won't open up
an XSS hole by mistake. Some important attributes that you mustn't allow
include, but are not limited to: 1) style, and 2) all intrinsic event
attributes (onMouseOver and so on; really, anything matching on*). I'll write
more about this in the documentation that will be distributed with future
versions of kses.
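
Because a bad entry in $allowed is easy to overlook, you may want to check the array before using it. The following is a hypothetical helper, not part of kses. It flags the two attribute groups named above: style, and anything starting with "on". An empty result does not prove that an array is safe, since the list above is not exhaustive.

```php
<?php
// Hypothetical helper, NOT part of kses: lists allowed attributes that
// belong to the dangerous groups named in this README.
function sketch_risky_attributes(array $allowed): array
{
    $risky = array();
    foreach ($allowed as $element => $attrs) {
        foreach (array_keys($attrs) as $attr) {
            $name = strtolower($attr);
            // style, plus intrinsic event attributes (onclick, onLoad, ...).
            if ($name === 'style' || strncmp($name, 'on', 2) === 0) {
                $risky[] = strtolower($element) . '@' . $name;
            }
        }
    }
    return $risky;
}
```

For instance, array('body' => array('style' => 1, 'onLoad' => 1)) would be reported as body@style and body@onload.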

It's also important to note that kses' HTML input must be cleaned of all
slashes coming from magic quotes. If the rest of your code requires these
slashes to be present, you can always add them again after calling kses with
a simple addslashes() call.

You should take a look at the documentation in the docs/ directory and the
examples in the examples/ directory to get more information on how to use
kses. The object-oriented version of kses is also worth checking out; it's
included in the oop/ directory.


* UPGRADING FROM 0.1.0 OR 0.2.0 TO 0.2.1 *


kses 0.2.1 is backwards compatible with 0.1.0 and 0.2.0, so upgrading should
just be a matter of using the new version of kses.php instead of the old one!

When you're ready to start using 0.2.1's new features, you can read about them
in the files in the docs/ directory. The ChangeLog also summarizes the new
features in this release.


* NEW VERSIONS, MAILING LISTS AND BUG REPORTS *


If you want to download new versions, subscribe to the kses-general mailing
list or even take part in the development of kses, please visit its homepage
at http://sourceforge.net/projects/kses . New developers and beta testers are
more than welcome!

If you have any bug reports or suggestions for improvement, or simply want to
tell us that you use kses for some project, feel free to post to the
kses-general mailing list. If you have found any security problems
(particularly XSS, naturally) in kses, please contact Ulf privately at metaur
at users dot sourceforge dot net, so he can correct them before you or someone
else tells the public about them.

(No, it's not a security problem in kses if some program that uses it allows a
bad attribute, silly. If kses is told to accept the element body with the
attributes style and onLoad, it will accept them, even if that's a really bad
idea, security-wise.)


* OTHER HTML FILTERS *


Here are the other stand-alone, open-source HTML filters that we currently know
of:

* XSS filter for PHP4 - the filter from Squirrelmail
  PHP
  Konstantin Riabitsev
  http://www.mricon.com/html/phpfilter.html

* HTML::StripScripts and related CPAN modules
  Perl
  Nick Cleaton
  http://search.cpan.org/perldoc?HTML%3A%3AStripScripts

There are also a lot of HTML filters that were written specifically for some
program. Some of them are better than others.

Please write to the kses-general mailing list if you know of any other
stand-alone, open-source filters.


* DEDICATION *


kses 0.2.1 is dedicated to Mischa the cat.


* MISC *


The kses code is based on an HTML filter that Ulf wrote on his own back in 2002
for the open-source project Gnuheter
( http://savannah.nongnu.org/projects/gnuheter ). Gnuheter is a fork of
PHP-Nuke. The HTML filter has been improved a lot since then.

To stop people from having sleepless nights, we feel the urgent need to state
that kses doesn't have anything to do with the KDE project, despite having a
name that starts with a K.

In case someone was wondering, Ulf is available for kses-related consulting.

Finally, the name kses comes from the terms XSS and access. It's also a
recursive acronym (every open-source project should have one!) for "kses
strips evil scripts".


// Ulf and the kses gang, September 2003