m.htmlsanity: don't apply typography on links with URLs in title.
author Vladimír Vondruš <mosra@centrum.cz>
Sun, 21 Oct 2018 20:20:52 +0000 (22:20 +0200)
committer Vladimír Vondruš <mosra@centrum.cz>
Sun, 21 Oct 2018 20:23:18 +0000 (22:23 +0200)
And e-mail addresses. That's a very bad thing to do.
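Below is a minimal sketch of the condition this commit adds to `can_apply_typography()`. To keep it runnable without docutils, the docutils text node and its parent `reference` node are replaced by two plain strings. The function name `title_is_url` is hypothetical. Two comparisons are needed because reStructuredText turns a bare e-mail address into a link whose target is `mailto:` followed by the address, as the expected `page.html` output in this diff shows.

```python
def title_is_url(link_text: str, refuri: str) -> bool:
    """Return True if a link's visible text is just its own target.

    Hypothetical stand-in for the check on txtnode.astext() and
    txtnode.parent['refuri'] in m.htmlsanity.
    """
    # A bare URL becomes a link whose text is the URI itself.
    if link_text == refuri:
        return True
    # A bare e-mail address becomes a link whose target is "mailto:" + address.
    return "mailto:" + link_text == refuri
```

For example, `title_is_url("info@magnum.graphics", "mailto:info@magnum.graphics")` is true, so neither smart quotes nor hyphenation would touch that link text, while a link titled `Magnum` pointing to `https://magnum.graphics` is still typeset normally.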

doc/plugins/htmlsanity.rst
pelican-plugins/m/htmlsanity.py
pelican-plugins/m/test/htmlsanity_typography/page.html
pelican-plugins/m/test/htmlsanity_typography/page.rst

diff --git a/doc/plugins/htmlsanity.rst b/doc/plugins/htmlsanity.rst
index 95d8619ba36c77e98f3de2bf6a0cfffead27dec5..5ccf92aa1b0f36d1b705d89776f24bc1c016327e 100644
@@ -167,8 +167,9 @@ of long words being wrapped on new lines.
 The hyphenation is done using `Pyphen <http://pyphen.org/>`_ and is applied to
 whole document contents and fields that are included in the :py:`FORMATTED_FIELDS`.
 All other fields including document title are excluded from hyphenation, the
-same goes for literal and raw blocks. You can see it in practice in the
-following convoluted example, it's also language-aware:
+same goes for literal and raw blocks and links with URL (or e-mail) as a title.
+You can see it in practice in the following convoluted example, it's also
+language-aware:
 
 .. code-figure::
 
diff --git a/pelican-plugins/m/htmlsanity.py b/pelican-plugins/m/htmlsanity.py
index 642f49a756747c9dd4e72dc47cbdc1a1cba2c25f..b6c3e68e972df8b7f4e19e6c5abab438450f14a9 100644
@@ -78,11 +78,14 @@ def can_apply_typography(txtnode):
     #  - raw code (such as SVG)
     #  - field names
     #  - bibliographic elements (author, date, ... fields)
+    #  - links with title that's the same as URL (or e-mail)
     if isinstance(txtnode.parent, nodes.literal) or \
        isinstance(txtnode.parent.parent, nodes.literal) or \
        isinstance(txtnode.parent, nodes.raw) or \
        isinstance(txtnode.parent, nodes.field_name) or \
-       isinstance(txtnode.parent, nodes.Bibliographic):
+       isinstance(txtnode.parent, nodes.Bibliographic) or \
+       (isinstance(txtnode.parent, nodes.reference) and
+            (txtnode.astext() == txtnode.parent['refuri'] or 'mailto:' + txtnode.astext() == txtnode.parent['refuri'])):
         return False
 
     # From fields include only the ones that are in FORMATTED_FIELDS
@@ -196,7 +199,10 @@ class Pyphen(Transform):
 
             for txtnode in node.traverse(nodes.Text):
                 if not can_apply_typography(txtnode): continue
-                # Don't hyphenate document title
+
+                # Don't hyphenate document title. Not part of
+                # can_apply_typography() because we *do* want smart quotes for
+                # a document title.
                 if isinstance(txtnode.parent, nodes.title): continue
 
                 # Useful for debugging, don't remove ;)
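The new comment above the title check explains why that check stays out of `can_apply_typography()`: that filter is shared by smart quotes and hyphenation, and a document title should get smart quotes but not hyphenation. The sketch below models the two passes under that assumption. Each text run is a hypothetical `(parent_kind, text, typography_ok)` tuple, where `typography_ok` stands for the result of the shared filter. The function names are illustrative, not part of the plugin.

```python
from typing import Iterable, List, Tuple

# Hypothetical model of a text run: (parent node kind, text, result of the
# shared can_apply_typography() filter).
Run = Tuple[str, str, bool]


def runs_to_smart_quote(runs: Iterable[Run]) -> List[str]:
    """Smart quotes honour only the shared filter, so titles are included."""
    return [text for _kind, text, ok in runs if ok]


def runs_to_hyphenate(runs: Iterable[Run]) -> List[str]:
    """Hyphenation applies the shared filter and also skips document titles."""
    return [text for kind, text, ok in runs if ok and kind != "title"]
```

With a title, a paragraph and a URL-titled link, the title reaches the smart-quote pass but not the hyphenation pass, and the link reaches neither, because the shared filter already rejected it.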
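diff --git a/pelican-plugins/m/test/htmlsanity_typography/page.html b/pelican-plugins/m/test/htmlsanity_typography/page.html
index 45d132810a35a970f87fea865643d3040bf2afd9..c92c4d9227b0875203d23c9e7d35dbed10ac2335 100644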
@@ -52,6 +52,12 @@ Nest&shy;ed con&shy;tent should be hy&shy;phen&shy;at&shy;ed al&shy;so! And al&s
 ver&shy;ba&shy;tim stuff shouldn’t: <span class="raw-html">hello "this" is not hyphenated</span>. Nei&shy;ther
 ver&shy;ba&shy;tim blocks:</p>
 "quote" hyphenation<p lang="cs">Od&shy;sta&shy;vec v češ&shy;ti&shy;ně. „Uvo&shy;zov&shy;ky“ fun&shy;gu&shy;jí ji&shy;nak a dě&shy;le&shy;ní slov jakbys&shy;met.</p>
+<p>Links with ti&shy;tles that are URLs (or e-mail ad&shy;dress&shy;es) shouldn’t be hy&shy;phen&shy;at&shy;ed
+ei&shy;ther:</p>
+<ul>
+<li><a href="mailto:info&#64;magnum.graphics">info&#64;magnum.graphics</a></li>
+<li><a href="https://magnum.graphics">https://magnum.graphics</a></li>
+</ul>
 <!-- /content -->
       </div>
     </div>
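diff --git a/pelican-plugins/m/test/htmlsanity_typography/page.rst b/pelican-plugins/m/test/htmlsanity_typography/page.rst
index f429181832bd854bf96fe2b95f92272425a6eb10..cdab8f14960af95239b9de58f045eaf436848f72 100644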
@@ -33,3 +33,9 @@ verbatim blocks:
 .. class:: language-cs
 
     Odstavec v češtině. "Uvozovky" fungují jinak a dělení slov jakbysmet.
+
+Links with titles that are URLs (or e-mail addresses) shouldn't be hyphenated
+either:
+
+-   info@magnum.graphics
+-   https://magnum.graphics