class Sanitize::Policy::HTMLSanitizer
Overview
This policy serves as a good default configuration that should fit most typical use cases for HTML sanitization.
Configurations
It comes in three different configurations with different sets of supported HTML tags.
They only differ in the default configuration of allowed tags and attributes. The transformation behaviour is otherwise the same.
Common Configuration
.common: Accepts most standard tags and thus allows using a good amount of HTML features (see COMMON_SAFELIST).
This is the recommended default configuration and should work for typical use cases unless strong restrictions on allowed content are required.
sanitizer = Sanitize::Policy::HTMLSanitizer.common
sanitizer.process(%(<a href="javascript:alert('foo')">foo</a>)) # => %(foo)
sanitizer.process(%(<p><a href="foo">foo</a></p>)) # => %(<p><a href="foo" rel="nofollow">foo</a></p>)
sanitizer.process(%(<img src="foo.jpg">)) # => %(<img src="foo.jpg">)
sanitizer.process(%(<table><tr><td>foo</td><td>bar</td></tr></table>)) # => %(<table><tr><td>foo</td><td>bar</td></tr></table>)
NOTE Neither this configuration nor any other accepts <html>, <head>, or <body> tags by default. In order to use #process_document, they need to be added explicitly to accepted_attributes.
Basic Configuration
.basic: This set accepts some basic tags including paragraphs, headlines, lists, and images (see BASIC_SAFELIST).
sanitizer = Sanitize::Policy::HTMLSanitizer.basic
sanitizer.process(%(<a href="javascript:alert('foo')">foo</a>)) # => %(foo)
sanitizer.process(%(<p><a href="foo">foo</a></p>)) # => %(<p><a href="foo" rel="nofollow">foo</a></p>)
sanitizer.process(%(<img src="foo.jpg">)) # => %(<img src="foo.jpg">)
sanitizer.process(%(<table><tr><td>foo</td><td>bar</td></tr></table>)) # => %(foo bar)
Inline Configuration
.inline: Accepts only a limited set of inline tags (see INLINE_SAFELIST).
sanitizer = Sanitize::Policy::HTMLSanitizer.inline
sanitizer.process(%(<a href="javascript:alert('foo')">foo</a>)) # => %(foo)
sanitizer.process(%(<p><a href="foo">foo</a></p>)) # => %(<a href="foo" rel="nofollow">foo</a>)
sanitizer.process(%(<img src="foo.jpg">)) # => %()
sanitizer.process(%(<table><tr><td>foo</td><td>bar</td></tr></table>)) # => %(foo bar)
Attribute Transformations
Attribute transformations are identical in all three configurations. However, the more advanced transforms only apply if the respective attribute is allowed in accepted_attributes.
So you can easily add additional elements and attributes to lower-tier sets and get the same attribute validation. For example, .inline doesn't include <img> tags, but when img is added to accepted_attributes, the policy validates <img> tags the same way as in .common.
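As a minimal sketch of extending a lower-tier set, the following uses #accept_tag from the method summary to allow <img> in the inline configuration. It assumes the shard is loaded with require "sanitize" and that #accept_tag registers the tag together with the given attribute set; the choice of src and alt is only illustrative.

```crystal
require "sanitize"

# Start from the most restrictive configuration.
sanitizer = Sanitize::Policy::HTMLSanitizer.inline

# Assumption: #accept_tag adds the tag and its allowed attributes
# to accepted_attributes.
sanitizer.accept_tag("img", Set{"src", "alt"})

# <img> should now be kept and validated like in .common.
result = sanitizer.process(%(<img src="foo.jpg" alt="Foo">))
puts result
```

Only the attributes listed for the tag are accepted, so width, height, or align would still be removed here unless they are added to the set as well.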
URL Sanitization
This transformation applies to attributes that contain a URL (configurable through #url_attributes).
- Makes sure the value is a valid URI (via URI.parse). If it does not parse, the attribute value is set to an empty string.
- Sanitizes the URI via URISanitizer (configurable through #uri_sanitizer). If the sanitizer returns nil, the attribute value is set to an empty string.
The same URISanitizer is used for all URL attributes.
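The two steps above can be sketched with the standard library alone. This is an illustration of the described flow, not the policy's actual code: ALLOWED_SCHEMES, sanitize_uri, and clean_url_value are hypothetical names, and the scheme check is only a stand-in for whatever the configured URISanitizer decides.

```crystal
require "uri"

# Hypothetical stand-in for a URISanitizer: accept relative URIs and a
# small set of schemes, reject everything else by returning nil.
ALLOWED_SCHEMES = Set{"http", "https", "mailto"}

def sanitize_uri(uri : URI) : URI?
  scheme = uri.scheme
  return uri if scheme.nil? # relative reference
  ALLOWED_SCHEMES.includes?(scheme) ? uri : nil
end

# Step 1: parse the value; step 2: run the sanitizer.
# Either failure yields an empty string, as described above.
def clean_url_value(value : String) : String
  uri = begin
    URI.parse(value)
  rescue URI::Error
    return ""
  end
  sanitize_uri(uri).try(&.to_s) || ""
end

puts clean_url_value("https://example.com/a")
puts clean_url_value("javascript:alert(1)")
```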
Anchor Tags
For <a> tags with an href attribute, there are two transforms:
- rel="nofollow" is added (can be disabled with #add_rel_nofollow).
- rel="noopener" is added to links with a target attribute (can be disabled with #add_rel_noopener).
Anchor tags that have neither an href, name, nor id attribute are stripped.
NOTE name and id attributes are not in any of the default sets of accepted attributes, so they can only be used when explicitly enabled.
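A short sketch of turning off the nofollow transform, assuming the shard is loaded with require "sanitize" and that the #add_rel_nofollow= setter from the method summary accepts a Bool:

```crystal
require "sanitize"

sanitizer = Sanitize::Policy::HTMLSanitizer.common

# Assumption: passing false disables the rel="nofollow" transform.
sanitizer.add_rel_nofollow = false

result = sanitizer.process(%(<a href="foo">foo</a>))
puts result
```

The same pattern should apply to #add_rel_noopener= for links that carry a target attribute.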
Image Tags
<img> tags are stripped if they don't have a src attribute.
Size Attributes
If a tag has width or height attributes, the values are validated to be numerical or percent values.
By default, these attributes are only accepted for <img> tags.
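To make "numerical or percent values" concrete, here is a rough sketch of such a check. The pattern is an assumption made for illustration (whole numbers with an optional trailing percent sign); the exact rule the policy applies is not stated here, and SIZE_PATTERN and valid_size? are hypothetical names.

```crystal
# Assumed pattern: one or more digits, optionally followed by "%".
SIZE_PATTERN = /\A\d+%?\z/

def valid_size?(value : String) : Bool
  SIZE_PATTERN.matches?(value)
end

puts valid_size?("120")
puts valid_size?("50%")
puts valid_size?("12px")
```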
Alignment Attribute
The align attribute is validated against allowed values for this attribute: center, left, right, justify, char.
If the value is invalid, the attribute is stripped.
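The rule amounts to a membership test on the attribute hash. A minimal sketch follows, assuming an exact, case-sensitive comparison (the policy may normalize values differently); ALIGN_VALUES and validate_align are hypothetical names.

```crystal
# The allowed values listed above.
ALIGN_VALUES = Set{"center", "left", "right", "justify", "char"}

# Removes the align attribute when its value is not in the allowed set.
def validate_align(attributes : Hash(String, String)) : Nil
  value = attributes["align"]?
  return if value.nil?
  attributes.delete("align") unless ALIGN_VALUES.includes?(value)
end

attrs = {"align" => "middle", "src" => "foo.jpg"}
validate_align(attrs)
puts attrs
```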
Classes
class attributes are filtered to accept only classes described by #valid_classes. String values need to match the class name exactly, regex values need to match the entire class name.
class is accepted as a global attribute in the default configuration, but no values are allowed in #valid_classes.
All classes can be accepted by adding the match-all regular expression /.*/ to #valid_classes.
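The matching rules can be sketched as follows. This illustrates the described behaviour and is not the policy's own implementation; class_allowed? is a hypothetical helper, and the set contents are made up for the example.

```crystal
# String entries must equal the class name; Regex entries must match
# the class name from start to end.
def class_allowed?(klass : String, valid_classes : Set(String | Regex)) : Bool
  valid_classes.any? do |pattern|
    case pattern
    in String
      pattern == klass
    in Regex
      match = pattern.match(klass)
      !match.nil? && match[0] == klass
    end
  end
end

valid = Set(String | Regex){"note", /lang-\w+/}

# Filter a class attribute value down to the accepted classes.
filtered = "note lang-cr xlang-cr evil".split.select { |c| class_allowed?(c, valid) }.join(" ")
puts filtered
```

With the policy itself, a set of this shape would be assigned through #valid_classes=.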
Constant Summary
- BASIC_SAFELIST =
INLINE_SAFELIST.merge({"blockquote" => Set {"cite"}, "br" => Set(String).new, "h1" => Set(String).new, "h2" => Set(String).new, "h3" => Set(String).new, "h4" => Set(String).new, "h5" => Set(String).new, "h6" => Set(String).new, "hr" => Set(String).new, "img" => Set {"alt", "src", "longdesc", "width", "height", "align"}, "li" => Set(String).new, "ol" => Set {"start"}, "p" => Set {"align"}, "pre" => Set(String).new, "ul" => Set(String).new})
  Compatible with basic Markdown features.
- COMMON_SAFELIST =
BASIC_SAFELIST.merge({"dd" => Set(String).new, "del" => Set {"cite"}, "details" => Set(String).new, "dl" => Set(String).new, "dt" => Set(String).new, "div" => Set(String).new, "ins" => Set {"cite"}, "kbd" => Set(String).new, "q" => Set {"cite"}, "ruby" => Set(String).new, "rp" => Set(String).new, "rt" => Set(String).new, "s" => Set(String).new, "samp" => Set(String).new, "strike" => Set(String).new, "sub" => Set(String).new, "summary" => Set(String).new, "sup" => Set(String).new, "table" => Set(String).new, "time" => Set {"datetime"}, "tbody" => Set(String).new, "td" => Set(String).new, "tfoot" => Set(String).new, "th" => Set(String).new, "thead" => Set(String).new, "tr" => Set(String).new, "tt" => Set(String).new, "var" => Set(String).new})
  Accepts most standard tags and thus allows using a good amount of HTML features.
- INLINE_SAFELIST =
{"a" => Set {"href", "hreflang"}, "abbr" => Set(String).new, "acronym" => Set(String).new, "b" => Set(String).new, "code" => Set(String).new, "em" => Set(String).new, "i" => Set(String).new, "strong" => Set(String).new, "*" => Set {"dir", "lang", "title", "class"}}
  Only limited elements for inline text markup.
Constructors
- .basic : HTMLSanitizer
  Creates an instance which accepts more basic tags including paragraphs, headlines, lists, and images (see BASIC_SAFELIST).
- .common : HTMLSanitizer
  Creates an instance which accepts even more standard tags and thus allows using a good amount of HTML features (see COMMON_SAFELIST).
- .inline : HTMLSanitizer
  Creates an instance which accepts a limited set of inline tags (see INLINE_SAFELIST).
Instance Method Summary
- #accept_tag(tag : String, attributes : Set(String) = Set(String).new)
- #add_rel_nofollow : Bool
  Add rel="nofollow" to every <a> tag with href attribute.
- #add_rel_nofollow=(add_rel_nofollow)
  Add rel="nofollow" to every <a> tag with href attribute.
- #add_rel_noopener : Bool
  Add rel="noopener" to every <a> tag with href and target attribute.
- #add_rel_noopener=(add_rel_noopener)
  Add rel="noopener" to every <a> tag with href and target attribute.
- #append_attribute(attributes, attribute, value)
- #no_links
  Removes anchor tag (<a>) from the list of accepted tags.
- #transform_attributes(tag : String, attributes : Hash(String, String)) : String | CONTINUE | STOP
- #transform_classes(tag, attributes)
- #transform_tag_a(attributes)
- #transform_tag_img(attributes)
- #transform_uri(tag, attributes, attribute, uri : URI) : String?
- #transform_url_attribute(tag, attributes, attribute, value)
- #transform_url_attributes(tag, attributes)
- #uri_sanitizer : Sanitize::URISanitizer
  Configures the URISanitizer to use for sanitizing URL attributes.
- #uri_sanitizer=(uri_sanitizer)
  Configures the URISanitizer to use for sanitizing URL attributes.
- #url_attributes : Set(String)
  Configures which attributes are considered to contain URLs.
- #url_attributes=(url_attributes : Set(String))
  Configures which attributes are considered to contain URLs.
- #valid_class?(tag, klass, valid_classes)
- #valid_classes : Set(String | Regex)
  Configures which classes are valid for class attributes.
- #valid_classes=(valid_classes : Set(String | Regex))
  Configures which classes are valid for class attributes.
- #valid_classes=(classes)
Instance methods inherited from class Sanitize::Policy::Whitelist
- accepted_attributes : Hash(String, Set(String))
- accepted_attributes=(accepted_attributes : Hash(String, Set(String)))
- global_attributes
- transform_attributes(name : String, attributes : Hash(String, String)) : String | CONTINUE | STOP
- transform_tag(name : String, attributes : Hash(String, String)) : String | CONTINUE | STOP
- transform_text(text : String) : String?
Constructor methods inherited from class Sanitize::Policy::Whitelist
- new(accepted_attributes : Hash(String, Set(String)))
Instance methods inherited from class Sanitize::Policy
- block_tag?(name)
- block_whitespace : String
- block_whitespace=(block_whitespace)
- process(html : String | XML::Node) : String
- process_document(html : String | XML::Node) : String
- transform_tag(name : String, attributes : Hash(String, String)) : String | Processor::CONTINUE | Processor::STOP
- transform_text(text : String) : String?
Constructor Detail
Creates an instance which accepts more basic tags including paragraphs, headlines, lists, and images (see BASIC_SAFELIST).
Creates an instance which accepts even more standard tags and thus allows using a good amount of HTML features (see COMMON_SAFELIST).
Unless you need tight restrictions on allowed content, this is the recommended default.
Creates an instance which accepts a limited set of inline tags (see INLINE_SAFELIST).
Instance Method Detail
Add rel="noopener" to every <a> tag with href and target attribute.
Removes anchor tag (<a>) from the list of accepted tags.
NOTE This doesn't reject attributes with URL values for other tags.
Configures the URISanitizer to use for sanitizing URL attributes.
Configures which attributes are considered to contain URLs. If empty, URL sanitization is disabled.
Default value: Set{"src", "href", "action", "cite", "longdesc"}.
Configures which classes are valid for class attributes.
String values need to match the class name exactly, regex values need to match the entire class name.
Default value: empty