htmlspecialchars
(PHP 3, PHP 4, PHP 5)
htmlspecialchars -- Convert special characters to HTML entities
Description
string htmlspecialchars ( string string [, int quote_style [, string charset]] )
Certain characters have special significance in HTML, and should
be represented by HTML entities if they are to preserve their
meanings. This function returns a string with some of these
conversions made; the translations made are those most
useful for everyday web programming. If you require all HTML
character entities to be translated, use
htmlentities() instead.
This function is useful in preventing user-supplied text from
containing HTML markup, such as in a message board or guest book
application. The optional second argument, quote_style, tells
the function what to do with single and double quote characters.
The default mode, ENT_COMPAT, is the backwards compatible mode
which only translates the double-quote character and leaves the
single-quote untranslated. If ENT_QUOTES is set, both single and
double quotes are translated and if ENT_NOQUOTES is set neither
single nor double quotes are translated.
The translations performed are:
'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;' when ENT_NOQUOTES
is not set.
''' (single quote) becomes '&#039;' only when
ENT_QUOTES is set.
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'
Example 1. htmlspecialchars() example
<?php
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
?>
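The effect of the three quote_style modes described above can be sketched by running one input through each of them. The sample string and variable names are made up for illustration only.

```php
<?php
// Illustrative input containing both quote types, angle brackets and an ampersand.
$input = "<a href='x' title=\"y\">T & C</a>";

$compat   = htmlspecialchars($input, ENT_COMPAT);   // translates double quotes only
$quotes   = htmlspecialchars($input, ENT_QUOTES);   // translates both quote types
$noquotes = htmlspecialchars($input, ENT_NOQUOTES); // leaves both quote types alone

echo $compat, "\n";   // &lt;a href='x' title=&quot;y&quot;&gt;T &amp; C&lt;/a&gt;
echo $quotes, "\n";   // &lt;a href=&#039;x&#039; title=&quot;y&quot;&gt;T &amp; C&lt;/a&gt;
echo $noquotes, "\n"; // &lt;a href='x' title="y"&gt;T &amp; C&lt;/a&gt;
```

In all three modes '&', '<' and '>' are translated; only the handling of quotes differs.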
Note that this function does not translate anything beyond what
is listed above. For full entity translation, see
htmlentities(). Support for the optional
second argument was added in PHP 3.0.17 and PHP 4.0.3.
The third argument, charset, defines the character set
used in the conversion. The default character set is ISO-8859-1. Support for
this third argument was added in PHP 4.1.0.
The following character sets are supported in PHP 4.3.0 and later.
Table 1. Supported charsets
| Charset | Aliases | Description |
|---|---|---|
| ISO-8859-1 | ISO8859-1 | Western European, Latin-1 |
| ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1). |
| UTF-8 | | ASCII compatible multi-byte 8-bit Unicode. |
| cp866 | ibm866, 866 | DOS-specific Cyrillic charset. This charset is supported in 4.3.2. |
| cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset. This charset is supported in 4.3.2. |
| cp1252 | Windows-1252, 1252 | Windows-specific charset for Western European. |
| KOI8-R | koi8-ru, koi8r | Russian. This charset is supported in 4.3.2. |
| BIG5 | 950 | Traditional Chinese, mainly used in Taiwan. |
| GB2312 | 936 | Simplified Chinese, national standard character set. |
| BIG5-HKSCS | | Big5 with Hong Kong extensions, Traditional Chinese. |
| Shift_JIS | SJIS, 932 | Japanese |
| EUC-JP | EUCJP | Japanese |
Note:
Any other character sets are not recognized and ISO-8859-1 will be used
instead.
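As a small sketch of the charset argument, the call below passes 'UTF-8' explicitly. The sample string is an assumption for illustration; the accented letter is written as raw UTF-8 bytes so the example does not depend on the encoding of the source file.

```php
<?php
// "caf\xC3\xA9" is "café" encoded as UTF-8 (the é takes two bytes).
$utf8 = "caf\xC3\xA9 <b>";

// Passing the charset explicitly keeps the multi-byte character intact;
// only the characters listed in the translation table are converted.
$escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n"; // café &lt;b&gt;
```

The é is left untouched because htmlspecialchars() translates only the characters listed earlier; htmlentities() would be the function to use for full entity translation.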
See also get_html_translation_table(),
strip_tags(),
htmlentities(), and nl2br().
User Contributed Notes
htmlspecialchars
palrich at gmail dot com
16-May-2005 03:29
To Alexander Nofftz and urbanheroes:
It's not an IE problem. There is no &apos; in HTML. So it's only a problem if some other browser does render it as an apostrophe on an HTML page.
paul dot l at aon dot at
09-May-2005 11:50
function reverse_htmlentities($mixed)
{
$htmltable = get_html_translation_table(HTML_ENTITIES);
foreach($htmltable as $key => $value)
{
$mixed = ereg_replace(addslashes($value),$key,$mixed);
}
return $mixed;
}
this is my version of a reversed htmlentities function
thisiswherejunkgoes at gmail dot com
06-May-2005 12:06
If there are any n00bs out there looking for a way to ensure that no HTML/special chars are getting sent to their databases, put through forms, etc., this has been doing the trick for me (though being at least slightly n00bish myself, if this won't always work perhaps someone will amend it :-)
function checkforchars ($foo) {
if ($foo === htmlspecialchars($foo)) {
return "Valid entry.";
} else {
return "Invalid entry.";
}
}
urbanheroes {at} gmail {dot} com
30-Apr-2005 01:32
In response to the note made by Alexander Nofftz in October 2004, &#039; is used instead of &apos; because IE unfortunately seems to have trouble with the latter.
gt at realvertex.com
28-Apr-2005 11:55
Here is the recursive version that works for both arrays and strings. Doesn't look as elegant as the other recursive versions, because of the input checks.
function HTML_ESC($_input = null, $_esc_keys = false)
{
if ((null != $_input) && (is_array($_input)))
{
foreach($_input as $key => $value)
{
if($_esc_keys)
{
$_return[htmlspecialchars($key)] = HTML_ESC($value,$_esc_keys);
}
else
{
$_return[$key] = HTML_ESC($value);
}
}
return $_return;
}
elseif(null != $_input)
{
return htmlspecialchars($_input);
}
else
{
return null;
}
}
took
23-Apr-2005 11:14
The Algo from donwilson at gmail dot com to reverse the action of htmlspecialchars(), edited for germany:
function unhtmlspecialchars( $string )
{
$string = str_replace ( '&#039;', '\'', $string );
$string = str_replace ( '&quot;', '"', $string );
$string = str_replace ( '&lt;', '<', $string );
$string = str_replace ( '&gt;', '>', $string );
$string = str_replace ( '&uuml;', 'ü', $string );
$string = str_replace ( '&Uuml;', 'Ü', $string );
$string = str_replace ( '&auml;', 'ä', $string );
$string = str_replace ( '&Auml;', 'Ä', $string );
$string = str_replace ( '&ouml;', 'ö', $string );
$string = str_replace ( '&Ouml;', 'Ö', $string );
// '&amp;' is replaced last so that text such as '&amp;gt;' is not decoded twice.
$string = str_replace ( '&amp;', '&', $string );
return $string;
}
11-Mar-2005 06:22
function htmlspecialchars_array($arr = array()) {
$rs = array();
while(list($key,$val) = each($arr)) {
if(is_array($val)) {
$rs[$key] = htmlspecialchars_array($val);
}
else {
$rs[$key] = htmlspecialchars($val, ENT_QUOTES);
}
}
return $rs;
}
beer UNDRSCR nomaed AT hotmail DOT com
01-Feb-2005 04:46
After inspecting the non-native encoding problem, I noticed that, for example, if the encoding is Cyrillic and I write Latin characters that are not part of the encoding (æ for example, the ae-ligature), the browser will send the real entity, such as &aelig; in this case.
Therefore, the only way I see to display multilingual text that is encoded with entities is by:
<?php
echo str_replace('&amp;', '&', htmlspecialchars($txt));
?>
The regex for numeric entities will skip the Latin-1 textual entities.
zolinak at zoli dot szathmari dot hu
14-Dec-2004 06:46
A sample function, if anybody wants to turn HTML entities (and special characters) back to plain characters (e.g. "&egrave;", "&lt;", etc.):
function html2specialchars($str){
$trans_table = array_flip(get_html_translation_table(HTML_ENTITIES));
return strtr($str, $trans_table);
}
beer UNDRSCR nomaed AT hotmail DOT com
21-Oct-2004 03:03
Quite often, on HTML pages that are not encoded as UTF-8, when people write in a non-native encoding, some browsers (Internet Explorer for sure) will send the characters from the other charset as HTML entities, such as &#1073; for the small Russian 'b'.
htmlspecialchars() will convert this entity to &amp;#1073;, since it changes every & to &amp;
What I usually do is either turn &amp; back to & so the correct characters appear in the output, or use a regex to turn all escaped numeric entities back into their original entity:
<?php
$result = preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#$1;', $source);
?>
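Put together, the approach in this note might look like the sketch below. The function name escape_keep_numeric_entities() and the sample input are hypothetical; the regex is the one from the note, applied to the output of htmlspecialchars().

```php
<?php
// Hypothetical helper: escape markup, then restore numeric character
// references (&#1073; or &#x431;) that htmlspecialchars() turned into &amp;#...;
function escape_keep_numeric_entities($source)
{
    $escaped = htmlspecialchars($source, ENT_QUOTES);
    return preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#$1;', $escaped);
}

$result = escape_keep_numeric_entities('&#1073; <b> &#x431; & done');
echo $result, "\n"; // &#1073; &lt;b&gt; &#x431; &amp; done
```

Only numeric references are restored; a bare ampersand and named entities stay escaped, which matches the limitation mentioned in the later note.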
Alexander Nofftz
20-Oct-2004 06:41
mlvanbie at gmail dot com
06-Oct-2004 06:45
The code in the previous note has a bug. If the original text was `&gt;' then htmlspecialchars will turn it into `&amp;gt;' and the suggested code will turn that into `>'. The &amp; translation must be last.
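The ordering problem can be reproduced with two tiny decoders that differ only in when '&amp;' is replaced. Both function names are made up for this sketch, and only the '&gt;' entity is handled to keep it short.

```php
<?php
// Buggy order: '&amp;' is decoded first, so '&amp;gt;' becomes '&gt;' and then '>'.
function decode_amp_first($s)
{
    $s = str_replace('&amp;', '&', $s);
    $s = str_replace('&gt;', '>', $s);
    return $s;
}

// Correct order: '&amp;' is decoded last, so each entity is decoded only once.
function decode_amp_last($s)
{
    $s = str_replace('&gt;', '>', $s);
    $s = str_replace('&amp;', '&', $s);
    return $s;
}

$original = '&gt;';                      // literal text typed by a user
$encoded  = htmlspecialchars($original); // &amp;gt;

var_dump(decode_amp_first($encoded) === $original); // bool(false)
var_dump(decode_amp_last($encoded) === $original);  // bool(true)
```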
donwilson at gmail dot com
25-Sep-2004 11:58
To reverse the action of htmlspecialchars(), use this code:
<?php
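function unhtmlspecialchars( $string )
{
// Caution: replacing '&amp;' first decodes text such as '&amp;gt;' twice;
// see the follow-up note, which says this replacement must come last.
$string = str_replace ( '&amp;', '&', $string );
$string = str_replace ( '&#039;', '\'', $string );
$string = str_replace ( '&quot;', '"', $string );
$string = str_replace ( '&lt;', '<', $string );
$string = str_replace ( '&gt;', '>', $string );
return $string;
}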
?>
thelatesundayshow.com @ nathan (flip it)
02-Sep-2004 01:51
Here's a version of the recursive escape function that takes the array by reference rather than by value, which saves some resources in the case of big arrays:
function recurse_array_HTML_safe(&$arr) {
foreach ($arr as $key => $val)
if (is_array($val))
recurse_array_HTML_safe($arr[$key]);
else
$arr[$key] = htmlspecialchars($val, ENT_QUOTES);
}
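A similar effect can be sketched with the built-in array_walk_recursive(), which visits every leaf of a nested array. The function name escape_nested() is hypothetical, the closure syntax assumes PHP 5.3 or later, and non-string leaves are deliberately left untouched.

```php
<?php
// Hypothetical helper: escape every string leaf of a nested array.
// Keys are not escaped; the array is copied on entry and returned.
function escape_nested(array $data)
{
    array_walk_recursive($data, function (&$value) {
        if (is_string($value)) {
            $value = htmlspecialchars($value, ENT_QUOTES);
        }
    });
    return $data;
}

$safe = escape_nested(array(
    'title' => '<b>Hi</b>',
    'meta'  => array('note' => "it's", 'count' => 5),
));
print_r($safe);
```

Returning a copy avoids the surprise of modifying the caller's array, at the cost of the extra memory the by-reference version was written to save.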
moc.xnoitadnuof@310symerej
21-Apr-2004 06:04
Here are some useful functions.
They will apply or decode htmlspecialchars or htmlentities recursively, to arrays or to regular variables. They also protect against "double encoding".
<?PHP
function htmlspecialchars_or( $mixed, $quote_style = ENT_QUOTES ){
return is_array($mixed) ? array_map('htmlspecialchars_or',$mixed, array_fill(0,count($mixed),$quote_style)) : htmlspecialchars(htmlspecialchars_decode($mixed, $quote_style ),$quote_style);
}
function htmlspecialchars_decode( $mixed, $quote_style = ENT_QUOTES ) {
if(is_array($mixed)){
return array_map('htmlspecialchars_decode',$mixed, array_fill(0,count($mixed),$quote_style));
}
$trans_table = get_html_translation_table( HTML_SPECIALCHARS, $quote_style );
if( $trans_table["'"] != '&#039;' ) { $trans_table["'"] = '&#039;';
}
return (strtr($mixed, array_flip($trans_table)));
}
function htmlentities_or($mixed, $quote_style = ENT_QUOTES){
return is_array($mixed) ? array_map('htmlentities_or',$mixed, array_fill(0,count($mixed),$quote_style)) : htmlentities(htmlentities_decode($mixed, $quote_style ),$quote_style);
}
function htmlentities_decode( $mixed, $quote_style = ENT_QUOTES ) {
if(is_array($mixed)){
return array_map('htmlentities_decode',$mixed, array_fill(0,count($mixed),$quote_style));
}
$trans_table = get_html_translation_table(HTML_ENTITIES, $quote_style );
if( $trans_table["'"] != '&#039;' ) { $trans_table["'"] = '&#039;';
}
return (strtr($mixed, array_flip($trans_table)));
}
?>
These functions are an addition to an earlier post. I would like to give the person some credit but I do not know who it was.
<? ;llnu=u!eJq dHd?>
Dave Duchene
19-Feb-2004 07:58
Here is a handy function that will escape the contents of a variable, recursing into arrays.
<?php
function escaporize($thing) {
if (is_array($thing)) {
$escaped = array();
foreach ($thing as $key => $value) {
$escaped[$key] = escaporize($value);
}
return $escaped;
}
return htmlspecialchars($thing);
}
?>
mike-php at emerge2 dot com
20-Nov-2003 04:13
Here's a handy function that guards against 'double' encoding:
# Given a string, this function first strips out all html special characters, then
# encodes the string, safely returning an encoded string without double-encoding.
function get_htmlspecialchars( $given, $quote_style = ENT_QUOTES ){
return htmlspecialchars( html_entity_decode( $given, $quote_style ), $quote_style );
}
# Needed for older versions of PHP that do not have this function built-in.
function html_entity_decode( $given_html, $quote_style = ENT_QUOTES ) {
$trans_table = get_html_translation_table( HTML_SPECIALCHARS, $quote_style );
if( $trans_table["'"] != '&#039;' ) { # some versions of PHP match single quotes to &#39;
$trans_table["'"] = '&#039;';
}
return ( strtr( $given_html, array_flip( $trans_table ) ) );
}
Note: I set the default to ENT_QUOTES, as this makes more sense to me than the PHP function's default of ENT_COMPAT.
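On PHP 5.2.3 and later, which is newer than the versions this page covers, htmlspecialchars() accepts a fourth argument, double_encode. Passing false is another way to avoid double encoding; the sample text below is only an illustration.

```php
<?php
// Text that already contains one entity plus some raw special characters.
$text = 'Tom &amp; Jerry <b> & co';

// Default behaviour: the existing entity is encoded again.
$twice = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// double_encode = false: existing entities are left alone,
// while the raw '<', '>' and '&' are still translated.
$once = htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);

echo $twice, "\n"; // Tom &amp;amp; Jerry &lt;b&gt; &amp; co
echo $once, "\n";  // Tom &amp; Jerry &lt;b&gt; &amp; co
```

Unlike the decode-then-encode approach in the note above, this leaves existing entities exactly as they were written rather than normalising them.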
nospam at somewhere dot com
15-Jun-2003 12:28
most simple function for decoding html-encoded strings:
function htmldecode($encoded) {
return strtr($encoded,array_flip(get_html_translation_table(HTML_ENTITIES)));
}
dystopia589 at yahoo dot com
13-Mar-2003 09:58
Sorry, part of that code was unnecessary. Here's a more readable version:
function SpecialChars($Security)
{
if (is_array($Security))
{
while(list($key, $val) = each($Security))
{
$Security[$key] = SpecialChars($val);
}
}
else
{
$Security = htmlspecialchars(stripslashes($Security), ENT_QUOTES);
}
return $Security;
}
webmaster at NOSPAM dot onlinegs dot com
29-Jan-2003 12:51
For those of you using version 4.3.0+, you can use html_entity_decode() to decode a string encoded with htmlspecialchars(). This should be faster and easier than using str_replace or ereg.
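A minimal round-trip sketch of that suggestion follows; the sample strings are made up. The same quote_style is passed to both calls so that the single quote is decoded as well.

```php
<?php
$original = "<a href='test'>Tom & \"Jerry\"</a>";

// Encode, then decode with the same quote_style.
$encoded = htmlspecialchars($original, ENT_QUOTES);
$decoded = html_entity_decode($encoded, ENT_QUOTES);

var_dump($decoded === $original); // bool(true)

// Literal text that looks like an entity also survives the round trip,
// because the decoding happens in a single pass.
$tricky = html_entity_decode(htmlspecialchars('&gt;'));
var_dump($tricky === '&gt;'); // bool(true)
```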
_____ at luukku dot com
14-Sep-2002 04:21
People, don't use ereg_replace for the most simple string replacing operations (replacing constant string with another).
Use str_replace.
akira dot yoshi at shrine dot de
15-May-2002 11:15
If you need to htmlspecialchars a jis string, here's a function that does:
function htmlspecialchars_jis($text) {
$ret="";
if ($text=="") return "";
$esc=chr(27);
$text=$esc."$B".$esc."$B".$text;
$text=str_replace($esc."(B", $esc."$B", $text);
$trans=explode($esc."$B", $text);
$enc=0;
while (list (, $val) = each ($trans)) {
if ($enc==0) {
$val.="";
if ($val!="") $ret.=htmlspecialchars($val);
$enc=1;
} else {
$val.="";
if ($val!="") $ret.=$esc."$B".$val.$esc."(B";
$enc=0;
};
}
return $ret;
};
BTW: I'm very(!) sure that JIS is iso-2022-jp, not iso-2002-jp
juadielon_NOSPAM at hotmail dot com
30-Apr-2002 11:09
I was trying to retrieve information from a database to display it into the browser. However it did not work as I was expecting. For instance double quotes (“”) and single quotes (‘’) were conflicting in HTML in an INPUT selector.
The first approach to solve this was to use htmlspecialchars to convert special characters to HTML entities to display the input box with its value.
$encode=htmlspecialchars($str, ENT_QUOTES);
However, the resulting HTML entities were preceded by a \ (backslash), the escape character. For instance, an ampersand (&) became \&amp;, displaying \&, and a double quote became \&quot;, displaying \"
So the final solution was to replace first any \ (backslash) and then ask htmlspecialchars to make the conversion.
[Editor's Note: This is the wrong way to do this. The proper way is to use
$encoded = htmlspecialchars(stripslashes($str), ENT_QUOTES);
]
$encoded=htmlspecialchars(str_replace('\\', '', $str), ENT_QUOTES);
Try this example to see it yourself.
<form action="<?php echo $PHP_SELF; ?>">
<input type="text" name="str" size="20" value="">
<input type="submit" value="Submit">
<br>
<?php
if (!empty($str)) {
$encoded=htmlspecialchars(str_replace('\\', '', $str), ENT_QUOTES);
echo "<br><p>Result: <b>".$encoded."</b>. It should be the same you just typed</p>";
echo "<p>But source code is transformed to:<b><xmp>".$encoded."</xmp></b></p>";
}
?>
</form>
Hope this helps someone.
akira at kurogane dot net
01-Apr-2002 11:42
Beware of parsing JIS (aka 'iso-2002-jp') text through this function, as this function does not appear to have a sense for multibyte characters and may corrupt some characters. E.g. the Japanese comma (the two ASCII characters !" as viewed by an ASCII client) gets converted into !&quot; , which transforms the comma into a 'maru' mark and the following characters into garbage.
Conceivably this could affect other multibyte charsets.
joseph at nextique dot com
20-Feb-2002 03:21
Here is a handy function to htmlalize an array (or scalar) before you hand it off to xml.
function htmlspecialchars_array($arr = array()) {
$rs = array();
while(list($key,$val) = each($arr)) {
if(is_array($val)) {
$rs[$key] = htmlspecialchars_array($val);
}
else {
$rs[$key] = htmlspecialchars($val, ENT_QUOTES);
}
}
return $rs;
}
15-Jul-2001 01:18
If you're sending data from one form to another, the data in the textareas and text inputs may need to have htmlspecialchars("form data", ENT_QUOTES) applied, assuming you will ever have quotes or less-than signs or any of those special characters. Using htmlspecialchars will make the text show up properly in the second form. The changes are automatically undone whenever the form data is submitted. It does seem a little strange, but it works and my headache is now starting to go away.
AZ
ryan at ryano dot net
29-Jun-2001 05:06
Actually, if you're using >= 4.0.5, this should theoretically be quicker (less overhead anyway):
$text = str_replace(array("&gt;", "&lt;", "&quot;", "&amp;"), array(">", "<", "\"", "&"), $text);
david at gislaved dot net
13-Sep-2000 09:49
To replace the Swedish characters...
$s=ereg_replace(197, "&Aring;",$s);
$s=ereg_replace(196, "&Auml;",$s);
$s=ereg_replace(214, "&Ouml;",$s);
$s=ereg_replace(229, "&aring;",$s);
$s=ereg_replace(228, "&auml;",$s);
$s=ereg_replace(246, "&ouml;",$s);
thorax at inforocket dot com
08-Dec-1999 07:26
To convert a document back from this,
do string replacements in this order:
&gt; >
&lt; <
&quot; "
&amp; &
Doing the last phase first will
produce erroneous results. For example:
'&lt;' => htmlspecialchars() => '&amp;lt;'
'&amp;lt;' => convert ampersands => '&lt;' => convert everything else => '<'