Last updated: Thu, 19 May 2005

strip_tags

(PHP 3 >= 3.0.8, PHP 4, PHP 5)

strip_tags -- Strip HTML and PHP tags from a string

Description

string strip_tags ( string str [, string allowable_tags] )

This function tries to return a string with all HTML and PHP tags stripped from a given str. It uses the same tag stripping state machine as the fgetss() function.

You can use the optional second parameter to specify tags which should not be stripped.

Note: allowable_tags was added in PHP 3.0.13 and PHP 4.0b3.

Since PHP 4.3.0, HTML comments are also stripped. This is hardcoded and can not be changed with allowable_tags.

Warning

Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
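
As a rough illustration of this warning (the input strings are made up for the example), an unclosed "<" causes everything after it to be treated as part of a tag:

```php
<?php
// Hypothetical inputs chosen for illustration only.
$wellFormed = 'Hello <b>world</b>';
$broken     = 'Hello <b world';   // the tag is never closed

// The closed tags are removed and the text between them is kept.
$cleanResult = strip_tags($wellFormed);   // "Hello world"

// Everything after the unclosed "<" is swallowed as tag content.
$brokenResult = strip_tags($broken);      // "Hello "

var_dump($cleanResult, $brokenResult);
```

If input may contain stray "<" characters that are meant as text, escaping it with htmlspecialchars() is usually the safer choice.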

Warning

This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.
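
A minimal sketch of what this warning means in practice, assuming a made-up piece of user input: the attribute on the allowed tag survives untouched, whereas htmlspecialchars() turns the whole input into inert text.

```php
<?php
// Hypothetical user-submitted markup.
$input = '<p onmouseover="alert(1)">Hi</p><script>evil()</script>';

// The <script> tags are removed, but the attribute on the allowed <p>
// survives, and the script body is left behind as plain text.
$stripped = strip_tags($input, '<p>');
// '<p onmouseover="alert(1)">Hi</p>evil()'

// Escaping instead of stripping renders the markup inert as text.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n", $escaped, "\n";
```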

Example 1. strip_tags() example

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";

// Allow <p>
echo strip_tags($text, '<p>');
?>

The above example will output:

Test paragraph. Other text
<p>Test paragraph.</p> Other text

strip_tags() has been binary safe since PHP 5.0.0.

See also htmlspecialchars().



User Contributed Notes
strip_tags
info {at at } programare dot dot dot org
02-May-2005 03:58
A simple function to strip given characters:

function strip_chars($string,$chars) {
  // Split $chars into single characters, then replace each with an empty string.
  preg_match_all("(.)",$chars,$tmp_clean);
  return str_replace($tmp_clean[0],"",$string);
}

Then use it:

echo strip_chars("Any text you 'want';"," ';");

=> 'Anytextyouwant'
bazzy
22-Apr-2005 07:09
I think bryn and john780 are missing the point - eric at direnetworks wasn't suggesting there is an overall string limit of 1024 characters but rather that actual tags over 1024 characters long (eg, in his case it sounds like a really long encrypted <a href> tag) will fail to be stripped.

The functions that slowly pass strings through strip_tags 1024 characters at a time aren't necessary and are actually counterproductive: if a tag spans the break point (i.e. it is opened before the 1024th character and closed after it), then only the opening part is removed, which leaves a mess of text up to the closing bracket.

Only mentioning this as I spent ages working out a better way to deal with this character spanning before I actually went back and read eric's post and realised the subsequent posts were misleading - hopefully it'll save others the same headaches :)
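
To see the problem with chunking, here is a small sketch. strip_tags_chunked() is a hypothetical helper written for this illustration, and the chunk size is a parameter so the effect shows up on a short string rather than at 1024 characters:

```php
<?php
// Hypothetical helper reproducing the chunk-by-chunk approach, with the
// chunk size as a parameter so the effect shows on a short string.
function strip_tags_chunked($text, $size)
{
    $out = '';
    for ($i = 0; $i < strlen($text); $i += $size) {
        $out .= strip_tags(substr($text, $i, $size));
    }
    return $out;
}

$text = 'Hello <b>world</b>';

$whole   = strip_tags($text);              // "Hello world"
$chunked = strip_tags_chunked($text, 8);   // tags are cut at the chunk boundaries

// The two results differ because fragments of the split tags leak through.
var_dump($whole, $chunked, $whole === $chunked);
```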
bryn -at- drumdatabase dot net
20-Apr-2005 04:38
Further to john780's idea for a solution to the 1024 character limit of strip_tags - it's a good one, but I think the ltrim function isn't the one for the job? I wrote this simple function to get around the limit (I'm a newbie, so there may be some problem / better way of doing it!):

<?php
function strip_tags_in_big_string($textstring){
   // Start with an empty result so the first .= does not hit an undefined variable.
   $safetext = '';
   while (strlen($textstring) != 0)
   {
      $temptext = strip_tags(substr($textstring,0,1024));
      $safetext .= $temptext;
      $textstring = substr_replace($textstring,'',0,1024);
   }
   return $safetext;
}
?>

Hope someone finds it useful.
cz188658 at tiscali dot cz
07-Apr-2005 03:21
If you want to keep XHTML tags like <br /> (self-closing tags), you must include the tag as <br> in the allowable_tags parameter.
Jiri
php at arzynik dot com
29-Mar-2005 06:04
Instead of removing tags that you don't want, sometimes you might want to just stop them from doing anything.

<?php
$disalowedtags = array("script",
                       "object",
                       "iframe",
                       "image",
                       "applet",
                       "meta",
                       "form",
                       "onmouseover",
                       "onmouseout");

foreach ($_GET as $varname)
    foreach ($disalowedtags as $tag)
        if (eregi("<[^>]*".$tag."*\"?[^>]*>", $varname))
            die("stop that");

foreach ($_POST as $varname)
    foreach ($disalowedtags as $tag)
        if (eregi("<[^>]*".$tag."*\"?[^>]*>", $varname))
            die("stop that");

?>
christianbecke at web dot de
16-Feb-2005 08:34
to kangaroo232002 at yahoo dot co dot uk:

As far as I understand, what you report is not a bug in strip_tags(), but a bug in your HTML.
You should use alt='Go &gt;' instead of alt='Go >'.

I suppose your HTML displays all right in browsers, but that does not mean it's correct. It just shows that browsers are more forgiving than strip_tags() about characters that are not properly escaped as entities.
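
A short sketch of that advice, assuming the attribute value comes from a variable ($label is a made-up name): escaping it with htmlspecialchars() before building the tag produces alt="Go &gt;", which strip_tags() handles cleanly.

```php
<?php
// Hypothetical attribute value containing a literal ">".
$label = 'Go >';

// Escape the value before placing it inside the attribute.
$html = '<input type="image" alt="' . htmlspecialchars($label, ENT_QUOTES, 'UTF-8')
      . '" src="go.gif">Search results';

echo $html, "\n";              // the attribute now reads alt="Go &gt;"
echo strip_tags($html), "\n";  // "Search results"
```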
kangaroo232002 at yahoo dot co dot uk
03-Feb-2005 07:23
After wondering why the following was indexed in my trawler despite stripping all text in tags (and punctuation) "» valign left align middle border 0 src go gif name search1 onclick search", please take a quick look at what produced it: <DIV style="position: absolute; TOP:22%; LEFT:68%;"><input type="image" alt="Go >" valign="left" align="middle" border=0 src="go.gif" name="search1" onClick="search()"></div>...

Looking at this closely, it is possible to see that despite the 'Go >' value being enclosed in quotation marks (with the right-facing chevron), strip_tags() still assumes that the chevron is the end of the input tag, and treats everything after it as text. Not sure if this has been fixed in later versions; I'm using v4.3.3...

good hunting.
jon780 -at- gmail.com
02-Feb-2005 11:18
To eric at direnetworks dot com regarding the 1024 character limit:

You could simply ltrim() the first 1024 characters, run them through strip_tags(), add them to a new string, and remove them from the first.

Perform this in a loop which continued until the original string was of 0 length.
dumb at coder dot com
17-Jan-2005 06:22
/*
15Jan05

Within <textarea>, Browsers auto render & display certain "HTML Entities" and "HTML Entity Codes" as characters:
&lt; shows as <    --    &amp; shows as &    --    etc.

Browsers also auto change any "HTML Entity Codes" entered in a <textarea> into the resultant display characters BEFORE UPLOADING.  There's no way to change this, making it difficult to edit html in a <textarea>

"HTML Entity Codes" (ie, use of &#60 to represent "<", &#38 to represent "&" &#160 to represent "&nbsp;") can be used instead.  Therefore, we need to "HTML-Entitize" the data for display, which changes the raw/displayed characters into their HTML Entity Code equivalents before being shown in a <textarea>.

how would I get a textarea to contain "&lt;" as a literal string of characters and not have it display a "<"
&amp;lt; is indeed the correct way of doing that. And if you wanted to display that, you'd need to use &amp;amp;lt;'. That's just how HTML entities work.

htmlspecialchars() is a subset of htmlentities()
the reverse (ie, changing html entity codes into displayed characters, is done w/ html_entity_decode()

google on ns_quotehtml and see http://aolserver.com/docs/tcl/ns_quotehtml.html
see also http://www.htmlhelp.com/reference/html40/entities/
*/
eric at direnetworks dot com
20-Dec-2004 08:36
The strip_tags() function in both PHP 4.3.8 and 5.0.2 (probably many more, but these are the only 2 versions I tested with) has a max tag length of 1024.  If you're trying to process a tag over this limit, strip_tags will not return that line (as if it were an illegal tag).  I noticed this problem while trying to parse a PayPal encrypted link button (<input type="hidden" name="encrypted" value="encryptedtext">, with <input> as an allowed tag), which is 2702 characters long.  I can't really think of any workaround for this other than parsing each tag to figure out the length, then only sending it to strip_tags() if it's under 1024, but at that point, I might as well be stripping the tags myself.
ashley at norris dot org dot au
31-Oct-2004 09:11
leathargy at hotmail dot com wrote:

"it seems we're all overlooking a few things:
1) if we replace "</ta</tableble>" by removing </table, we're not better off..."

I beat this by using ($input contains the data):

<?php
while($input != strip_tags($input)) {
    $input = strip_tags($input);
}
?>

This iteratively strips tags until all tags have gone :)
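
The quoted "</ta</tableble>" problem arises when a substring is removed in a single pass. A minimal sketch of the same repeat-until-nothing-changes idea, using str_replace() and a hypothetical helper name (remove_until_stable() is not part of PHP):

```php
<?php
// Hypothetical helper: remove a substring repeatedly until nothing changes.
function remove_until_stable($text, $needle)
{
    do {
        $before = $text;
        $text   = str_replace($needle, '', $text);
    } while ($text !== $before);
    return $text;
}

$input = '</ta</tableble>';

// One pass removes the inner "</table" and thereby assembles a new one.
$once = str_replace('</table', '', $input);         // "</table>"

// Repeating until a fixed point leaves only the trailing ">".
$stable = remove_until_stable($input, '</table');   // ">"

var_dump($once, $stable);
```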
@dada
29-Sep-2004 07:41
if you  only want to have the text within the tags, you can use this function:

function showtextintags($text)

{

$text = preg_replace("/(\<script)(.*?)(script>)/si", "dada", "$text");
$text = strip_tags($text);
$text = str_replace("<!--", "&lt;!--", $text);
$text = preg_replace("/(\<)(.*?)(--\>)/mi", "".nl2br("\\2")."", $text);

return $text;

}

it will show all the text without tags and (!!!) without javascripts
Anonymous User
22-Aug-2004 11:24
Be aware that tags constitute visual whitespace, so stripping may leave the resulting text looking misjoined.

For example,

"<strong>This is a bit of text</strong><p />Followed by this bit"

are separable paragraphs on a visual plane, but if simply stripped of tags will result in

"This is a bit of textFollowed by this bit"

which may not be what you want, e.g. if you are creating an excerpt for an RSS description field.

The workaround is to force whitespace prior to stripping, using something like this:

     $text = getTheText();
     $text = preg_replace('/</',' <',$text);
     $text = preg_replace('/>/','> ',$text);
     $desc = html_entity_decode(strip_tags($text));
     $desc = preg_replace('/[\n\r\t]/',' ',$desc);
     $desc = preg_replace('/  /',' ',$desc);
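
The steps above could be wrapped up roughly as follows. text_excerpt() is a hypothetical name, and this sketch collapses whitespace with '/\s+/' rather than the two separate replacements in the note, so that longer runs of spaces are also reduced to one:

```php
<?php
// Hypothetical helper wrapping the steps above; the name is illustrative.
function text_excerpt($html)
{
    // Pad every tag with spaces so adjacent blocks do not run together.
    $html = str_replace(array('<', '>'), array(' <', '> '), $html);
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse any run of whitespace to a single space.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo text_excerpt('<strong>This is a bit of text</strong><p />Followed by this bit');
// This is a bit of text Followed by this bit
```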
Isaac Schlueter php at isaacschlueter dot com
16-Aug-2004 09:32
steven --at-- acko --dot-- net pointed out that you can't make strip_tags allow comments.  With this function, you can.  Just pass <!--> as one of the allowed tags.  Easy as pie: just pull them out, strip, and then put them back.

<?php
function strip_tags_c($string, $allowed_tags = '')
{
    $allow_comments = ( strpos($allowed_tags, '<!-->') !== false );
    if( $allow_comments )
    {
        $string = str_replace(array('<!--', '-->'), array('&lt;!--', '--&gt;'), $string);
        $allowed_tags = str_replace('<!-->', '', $allowed_tags);
    }
    $string = strip_tags( $string, $allowed_tags );
    if( $allow_comments ) $string = str_replace(array('&lt;!--', '--&gt;'), array('<!--', '-->'), $string);
    return $string;
}
?>
Isaac Schlueter php at isaacschlueter dot com
16-Aug-2004 01:16
I am creating a rendering plugin for a CMS system (http://b2evolution.net) that wraps certain bits of text in acronym tags.  The problem is that if you have something like this:
<a href="http://www.php.net" title="PHP is cool!">PHP</a>

then the plugin will mangle it into:

<a href="http://www.<acronym title="PHP: Hypertext Processor">php</acronym>.net" title="<acronym title="PHP: Hypertext Processor">PHP</acronym> is cool!">PHP</a>

This function will strip out tags that occur within other tags.  Not super-useful in tons of situations, but it was an interesting puzzle.  I had started out using preg_replace, but it got ridiculously complicated when there were linebreaks and multiple instances in the same tag.

The CMS does its XHTML validation before the content gets to the plugin, so we can be pretty sure that the content is well-formed, except for the tags inside of other tags.

<?php
if( !function_exists( 'antiTagInTag' ) )
{
    // $content is the string to be anti-tagintagged, and $format sets the format of the internals.
    function antiTagInTag( $content = '', $format = 'htmlhead' )
    {
        if( !function_exists( 'format_to_output' ) )
        {   // Use the external function if it exists, or fall back on just strip_tags.
            function format_to_output($content, $format)
            {
                return strip_tags($content);
            }
        }
        $contentwalker = 0;
        $length = strlen( $content );
        $tagend = -1;
        // Initialise both lists so str_replace() below also works when no tag is found.
        $tags = array();
        $newtags = array();
        for( $tagstart = strpos( $content, '<', $tagend + 1 ) ; $tagstart !== false && $tagstart < strlen( $content ); $tagstart = strpos( $content, '<', $tagend ) )
        {
            // got the start of a tag.  Now find the proper end!
            $walker = $tagstart + 1;
            $open = 1;
            while( $open != 0 && $walker < strlen( $content ) )
            {
                $nextopen = strpos( $content, '<', $walker );
                $nextclose = strpos( $content, '>', $walker );
                if( $nextclose === false )
                {   // ERROR! Open waka without close waka!
                    // echo '<code>Error in antiTagInTag - malformed tag!</code> ';
                    return $content;
                }
                if( $nextopen === false || $nextopen > $nextclose )
                {   // No more opens, but there was a close; or, a close happens before the next open.
                    // walker goes to the close+1, and open decrements
                    $open --;
                    $walker = $nextclose + 1;
                }
                elseif( $nextopen < $nextclose )
                {   // an open before the next close
                    $open ++;
                    $walker = $nextopen + 1;
                }
            }
            $tagend = $walker;
            if( $tagend > strlen( $content ) )
                $tagend = strlen( $content );
            else
            {
                $tagend --;
                $tagstart ++;
            }
            $tag = substr( $content, $tagstart, $tagend - $tagstart );
            $tags[] = '<' . $tag . '>';
            $newtag = format_to_output( $tag, $format );
            $newtags[] = '<' . $newtag . '>';
        }

        $content = str_replace($tags, $newtags, $content);
        return $content;
    }
}
?>
Tony Freeman
19-Nov-2003 04:45
This is a slightly altered version of tREXX's code.  The difference is that this one simply removes the unwanted attributes (rather than flagging them as forbidden).

function removeEvilAttributes($tagSource)
{
       $stripAttrib = "' (style|class)=\"(.*?)\"'i";
       $tagSource = stripslashes($tagSource);
       $tagSource = preg_replace($stripAttrib, '', $tagSource);
       return $tagSource;
}

function removeEvilTags($source)
{
   $allowedTags='<a><br><b><h1><h2><h3><h4><i>' .
             '<img><li><ol><p><strong><table>' .
             '<tr><td><th><u><ul>';
   $source = strip_tags($source, $allowedTags);
   return preg_replace('/<(.*?)>/ie', "'<'.removeEvilAttributes('\\1').'>'", $source);
}

$text = '<p style="Normal">Saluton el <a href="#?"
 class="xsarial">Esperanto-lando</a><img src="my.jpg"
 alt="Saluton" width=100 height=100></p>';

$text = removeEvilTags($text);

var_dump($text);
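
Note that the /e modifier used above was removed in PHP 7.0. A rough equivalent using preg_replace_callback() might look like the sketch below; the function names, the shortened tag list, and the tighter attribute pattern are choices made for this illustration, not part of the original note:

```php
<?php
// Remove style="..." and class="..." attributes from the inside of one tag.
function remove_attributes($tagSource)
{
    return preg_replace('/\s+(style|class)="[^"]*"/i', '', $tagSource);
}

// Strip disallowed tags, then clean the attributes of the tags that remain.
function remove_evil_tags($source)
{
    $allowedTags = '<a><br><b><i><img><p>';   // shortened list for the example
    $source = strip_tags($source, $allowedTags);
    return preg_replace_callback('/<([^>]*)>/', function ($m) {
        return '<' . remove_attributes($m[1]) . '>';
    }, $source);
}

echo remove_evil_tags('<p style="Normal">Saluton <a href="#" class="x">lando</a></p>');
// <p>Saluton <a href="#">lando</a></p>
```

Because the callback receives the matched text directly, the stripslashes() call that the /e version needed is no longer required.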
leathargy at hotmail dot com
26-Oct-2003 12:15
it seems we're all overlooking a few things:
1) if we replace "</ta</tableble>" by removing </table, we're not better off. try using a char-by-char comparison, and replacing stuff with *s, because then this example would become "</ta******ble>", which is not problematic; also, with a char-by-char approach, you can skip whitespace, and kill stuff like "< table>"... just make sure <&bkspTable> doesn't work...
2) no browser treats { as <.[as far as i know]
3) because of statement 2, we can do:
$remove=array("<?","<","?>",">");
$change=array("{[pre]}","{[","{/pre}","]}");
// each placeholder here must match what $change (or the "<" / ">" rules) produces
$repairSeek = array("{[pre]}", "{/pre}","{[b]}","{[/b]}","{[br]}");
// and so forth...

$repairChange=array("<pre>","</pre>","<b>","</b>","<br>");
// and so forth...

$maltags=array("{[","]}");
$nontags=array("{","}");
$unclean=...;//get variable from somewhere...
$unclean=str_replace($remove,$change,$unclean);
$unclean=str_replace($repairSeek, $repairChange, $unclean);
$clean=str_replace($maltags, $nontags, $unclean);

////end example....
4) we can further improve the above by using explode(for our ease):
function purifyText($unclean, $fixit)
{
$remove=array();
$remove=explode("\n",$fixit['remove']);
//... and so forth for each of the above arrays...
// or you could just pass the arrays..., or a giant string
//put above here...
return $clean;
}//done
tREXX [www.trexx.ch]
15-Oct-2003 08:15
Here's a quite fast solution to remove unwanted tags AND also unwanted attributes within the allowed tags:

<?php
/**
 * Allow these tags
 */
$allowedTags = '<h1><b><i><a><ul><li><pre><hr><blockquote><img>';

/**
 * Disallow these attributes/prefix within a tag
 */
$stripAttrib = 'javascript:|onclick|ondblclick|onmousedown|onmouseup|onmouseover|'.
               'onmousemove|onmouseout|onkeypress|onkeydown|onkeyup';

/**
 * @return string
 * @param string
 * @desc Strip forbidden tags and delegate tag-source check to removeEvilAttributes()
 */
function removeEvilTags($source)
{
    global $allowedTags;
    $source = strip_tags($source, $allowedTags);
    return preg_replace('/<(.*?)>/ie', "'<'.removeEvilAttributes('\\1').'>'", $source);
}

/**
 * @return string
 * @param string
 * @desc Strip forbidden attributes from a tag
 */
function removeEvilAttributes($tagSource)
{
    global $stripAttrib;
    return stripslashes(preg_replace("/$stripAttrib/i", 'forbidden', $tagSource));
}

// Will output: <a href="forbiddenalert(1);" target="_blank" forbidden = "alert(1)">test</a>
echo removeEvilTags('<a href="javascript:alert(1);" target="_blank" onMouseOver = "alert(1)">test</a>');
?>
dougal at gunters dot org
10-Sep-2003 03:03
strip_tags() appears to become nauseated at the sight of a <!DOCTYPE> declaration (at least in PHP 4.3.1). You might want to do something like:

$html = str_replace('<!DOCTYPE','<DOCTYPE',$html);

before processing with strip_tags().
joris878 at hotmail dot com
04-Jun-2003 07:58
[  Editor's Note: This functionality will be natively supported in a future release of PHP.  Most likely 5.0  ]

This routine removes all attributes from a given tag except
the attributes specified in the array $attr.

function stripeentag($msg,$tag,$attr) {
  $lengthfirst = 0;
  while (strstr(substr($msg,$lengthfirst),"<$tag ")!="")
  {
   $imgstart = $lengthfirst + strpos(substr($msg,$lengthfirst), "<$tag ");
   $partafterwith = substr($msg,$imgstart);
   $img = substr($partafterwith,0,strpos($partafterwith,">")+1);
   // normalise " =" inside the tag itself, not the whole message
   $img = str_replace(" =","=",$img);
   $out = "<$tag"; 
   // NOTE: filter() is not defined in this note; it is presumably the author's
   // own helper that returns the text between "name=" and the next space.
   foreach ($attr as $name)
   {
     $val = filter($img,$name."="," ");
     // append to $out directly so $attr stays intact for the next tag
     if(strlen($val)>0) $out .= " ".$name."=".$val;
   }
   $out .= ">";
   $partafter = substr($partafterwith,strpos($partafterwith,">")+1);
   $msg = substr($msg,0,$imgstart).$out.$partafter;
   $lengthfirst = $imgstart+3;
  }
  return $msg;
}
Chuck
20-Mar-2003 06:01
Caution: HTML created by Word may contain the sequence
'<?xml...'

Apparently strip_tags() treats this like <?php and removes the remainder of the input string: not just the XML tag but all input that follows.
dontknowwhat at thehellIamdoing dot com
19-Nov-2002 08:23
Here's a quickie that will strip out only specific tags. I'm using it to clean up FrontPage and Word markup in included third-party code (which shouldn't have all the extra header information in it).

$contents = "Your HTML string";

// Part 1
// This array is for single tags and their closing counterparts

$tags_to_strip = Array("html","body","meta","link","head");

foreach ($tags_to_strip as $tag) {
       $contents = preg_replace("/<\/?" . $tag . "(.|\s)*?>/","",$contents);
}

// Part 2
// This array is for stripping opening and closing tags AND what's in between

$tags_and_content_to_strip = Array("title");

foreach ($tags_and_content_to_strip as $tag) {
       $contents = preg_replace("/<" . $tag . ">(.|\s)*?<\/" . $tag . ">/","",$contents);
}
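
Part 1 above can be packaged as a small function. This is only a sketch: strip_only_tags() is a made-up name, and the pattern is tightened slightly (a word boundary after the tag name and [^>]* for attributes) so that stripping "head" does not also remove "<header>":

```php
<?php
// Hypothetical helper; the name and signature are illustrative.
function strip_only_tags($html, array $tags)
{
    foreach ($tags as $tag) {
        // \b keeps "head" from also matching "<header>"; [^>]* skips attributes.
        $pattern = '/<\/?' . preg_quote($tag, '/') . '\b[^>]*>/i';
        $html = preg_replace($pattern, '', $html);
    }
    return $html;
}

echo strip_only_tags('<html><body class="x"><p>Hi</p></body></html>', array('html', 'body'));
// <p>Hi</p>
```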
mrmaxxx333 at triad dot rr dot com
07-May-2002 01:29
To get rid of everything between script tags, including the script tags themselves, I use this. (The pattern is Perl-compatible, so it needs preg_replace rather than ereg_replace.)

<?php
$description = preg_replace("~<script[^>]*>.+</script[^>]*>~isU", "", $description);
?>

It hasn't been extensively tested, but it works.

Also, I ran into trouble with <a href> tags. I wanted to pull out the URL in them, so I did this to turn <a href="blah.com">welcome to blah</a> into welcome to blah (blah.com):

<?php
$string = preg_replace('/<a\s+.*?href="([^"]+)"[^>]*>([^<]+)<\/a>/is', '\2 (\1)', $string);
?>
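
For reference, a quick usage sketch of the link pattern with sample input (links_to_text() is a hypothetical wrapper name):

```php
<?php
// Hypothetical wrapper around the link pattern from the note above.
function links_to_text($html)
{
    return preg_replace('/<a\s+.*?href="([^"]+)"[^>]*>([^<]+)<\/a>/is', '\2 (\1)', $html);
}

echo links_to_text('Visit <a href="blah.com">welcome to blah</a> today');
// Visit welcome to blah (blah.com) today
```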
guy at datalink dot SPAMMENOT dot net dot au
15-Mar-2002 12:19
Strip tags will NOT remove HTML entities such as &nbsp;
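
If the entities should go as well, one option is to decode them after stripping. A minimal sketch, assuming UTF-8 text and a made-up input string:

```php
<?php
$html = '<b>Fish&nbsp;&amp;&nbsp;chips</b>';

// The tags are removed, but the entities are left exactly as written.
$text = strip_tags($html);                                   // "Fish&nbsp;&amp;&nbsp;chips"

// Decode the entities, then turn the resulting non-breaking spaces
// (U+00A0, "\xC2\xA0" in UTF-8) into ordinary spaces.
$decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
$plain   = str_replace("\xC2\xA0", ' ', $decoded);           // "Fish & chips"

echo $plain, "\n";
```

Decode only after stripping: decoding first could turn &lt;b&gt; into a real tag that strip_tags() would then remove.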
chrisj at thecyberpunk dot com
18-Dec-2001 02:57
strip_tags() doesn't recognize that CSS within <style> tags is not document text. To fix this, do something similar to the following:

$htmlstring = preg_replace("'<style[^>]*>.*</style>'siU",'',$htmlstring);
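
A short before-and-after sketch with sample input (the markup is invented for the example):

```php
<?php
$html = '<style type="text/css">p { color: red; }</style><p>Hello</p>';

// strip_tags() alone removes the <style> tags but keeps the CSS as text.
$naive = strip_tags($html);   // "p { color: red; }Hello"

// Removing the whole <style> element first leaves only the document text.
$withoutCss = strip_tags(preg_replace("'<style[^>]*>.*</style>'siU", '', $html));   // "Hello"

var_dump($naive, $withoutCss);
```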

Copyright © 2001-2005 The PHP Group
All rights reserved.
This unofficial mirror is operated at: The Server Pages
Last updated: Thu May 19 17:35:34 2005 CDT