From: Vladimír Vondruš
Date: Wed, 6 Dec 2017 19:17:01 +0000 (+0100)
Subject: doxygen: rework of the patching number 67.
X-Git-Url: https://www.chiark.greenend.org.uk/ucgi/~cjwatson/git?a=commitdiff_plain;h=1d41534a8e4a6fe8aa8a0882647ad85a5fed4b88;p=blog.git

doxygen: rework of the patching number 67.

Argh!

 * No longer having <p> around markdown headings (what!), <ul>, <ol> and
   <img> elements.
 * If a list item, parameter description or return value description has
   multiple paragraphs, they are preserved. Brief description is still
   strictly single paragraph.
 * A special casing of simple nested lists, where item containing a sublist
   shouldn't be wrapped in a <p>.
 * Documented list behavior, explained how to make inflated lists even out
   of single-paragraph items.
---

diff --git a/doc/doxygen.rst b/doc/doxygen.rst
index d8f5f530..2a697f13 100644
--- a/doc/doxygen.rst
+++ b/doc/doxygen.rst
@@ -306,6 +306,53 @@ modifications:
     added after ``::`` and ``_`` in long symbols in link titles and after ``/``
     in URLs.
 
+Single-paragraph list items, function parameter description and return value
+documentation is stripped from the enclosing :html:`<p>` tag to make the output
+more compact. If multiple paragraphs are present, nothing is stripped. In case
+of lists, they are then rendered in an inflated form. However, in order to
+achieve even spacing also with single-paragraph items, it's needed use some
+explicit markup. Adding :html:`<p></p>` to a single-paragraph item will make
+sure the enclosing :html:`<p>` is not stripped.
+
+.. code-figure::
+
+    .. code:: c++
+
+        /**
+        - A list
+
+          of multiple
+
+          paragraphs.
+
+        - Another item
+
+          <p></p>
+
+          - A sub list
+
+            Another paragraph
+        */
+
+    .. raw:: html
+
+        <ul>
+          <li>
+            <p>A list</p>
+            <p>of multiple</p>
+            <p>paragraphs.</p>
+          </li>
+          <li>
+            <p>Another item</p>
+            <ul>
+              <li>
+                <p>A sub list</p>
+                <p>Another paragraph</p>
+              </li>
+            </ul>
+          </li>
+        </ul>
+
 `Pages, sections and table of contents`_
 ========================================
 
diff --git a/doxygen/dox2html5.py b/doxygen/dox2html5.py
index 464a951d..0b975d27 100755
--- a/doxygen/dox2html5.py
+++ b/doxygen/dox2html5.py
@@ -131,44 +131,89 @@ def parse_type(state: State, type: ET.Element) -> str:
     # Remove spacing inside <> and before & and *
     return fix_type_spacing(out)
 
-def parse_desc_internal(state: State, element: ET.Element, trim = True):
+def parse_desc_internal(state: State, element: ET.Element, immediate_parent: ET.Element = None, trim = True):
     out = Empty()
-    out.write_start_tag = True
-    out.write_close_tag = True
     out.section = None
     out.templates = {}
     out.params = {}
     out.return_value = None
 
+    # DOXYGEN <PARA> PATCHING 1/5
+    #
+    # In the optimistic case, when parsing the <para> element, the parsed
+    # content is treated as single reasonable paragraph and the caller is told
+    # to write both <p> and </p> enclosing tag.
+    #
+    # Unfortunately Doxygen puts some *block* elements inside a <para> element
+    # instead of closing it before and opening it again after. That is making
+    # me raging mad. Nested paragraphs are no way valid HTML and they are ugly
+    # and problematic in all ways you can imagine, so it's needed to be
+    # patched. See the long ranty comments below for more parts of the story.
+    out.write_paragraph_start_tag = element.tag == 'para'
+    out.write_paragraph_close_tag = element.tag == 'para'
+    out.is_reasonable_paragraph = element.tag == 'para'
+
     out.parsed: str = ''
     if element.text:
         out.parsed = html.escape(element.text.strip() if trim else element.text)
 
+    # Needed later for deciding whether we can strip the surrounding <p> from
+    # the content
+    paragraph_count = 0
+    has_block_elements = False
+
     i: ET.Element
     for i in element:
-        # Doxygen puts the following *block* elements inside a <para> element
-        # instead of closing it before and then opening it again after. Nested
-        # paragraphs are ugly and also not valid HTML, so we have to patch that
-        # up. If there was any content before, we close the paragraph. If there
+        # DOXYGEN <PARA> PATCHING 2/5
+        #
+        # Upon encountering a block element nested in <para>, we need to act.
+        # If there was any content before, we close the paragraph. If there
         # wasn't, we tell the caller to not even open the paragraph. After
-        # processing the following tag, there won't be any paragraph open, so
-        # we also tell the caller that there's no need to close anything.
+        # processing the following tag, there probably won't be any paragraph
+        # open, so we also tell the caller that there's no need to close
+        # anything (but it's not that simple, see for more patching at the end
+        # of the cycle iteration).
+        #
+        # Those elements are:
+        # - <heading>
+        # - <blockquote>
+        # - <simplesect> and <xrefsect>
+        # - <verbatim>
+        # - <variablelist>, <itemizedlist>, <orderedlist>
+        # - <image>
+        # - block <formula>
+        # - <programlisting> (complex block/inline autodetection involved, so
+        #   the check is deferred to later in the loop)
         #
         # Note that <parameterlist> and <simplesect kind="return"> are
         # extracted out of the text flow, so these are removed from this check.
         #
-        # It's not that simple, see for more patching at the end of the cycle
-        # iteration.
-        if (i.tag in ['blockquote', 'xrefsect', 'variablelist', 'verbatim'] or (i.tag == 'simplesect' and i.attrib['kind'] != 'return') or (i.tag == 'formula' and i.text.startswith('\[ ') and i.text.endswith(' \]'))) and element.tag == 'para' and out.write_close_tag:
+        # In addition, there's special handling to achieve things like this:
+        #   <ul>
+        #     <li>A paragraph
+        #       <ul>
+        #         <li>A nested list item</li>
+        #       </ul>
+        #     </li>
+        # I.e., not wrapping "A paragraph" in a <p>, but only if it's
+        # immediately followed by another and it's the first paragraph in a
+        # list item. We check that using the immediate_parent variable.
+        if (i.tag in ['heading', 'blockquote', 'xrefsect', 'variablelist', 'verbatim', 'itemizedlist', 'orderedlist', 'image'] or (i.tag == 'simplesect' and i.attrib['kind'] != 'return') or (i.tag == 'formula' and i.text.startswith('\[ ') and i.text.endswith(' \]'))) and element.tag == 'para' and out.write_paragraph_close_tag:
+            out.is_reasonable_paragraph = False
             out.parsed = out.parsed.rstrip()
             if not out.parsed:
-                out.write_start_tag = False
+                out.write_paragraph_start_tag = False
+            elif immediate_parent and immediate_parent.tag == 'listitem' and i.tag in ['itemizedlist', 'orderedlist']:
+                out.write_paragraph_start_tag = False
             else:
                 out.parsed += '</p>'
-            out.write_close_tag = False
+            out.write_paragraph_close_tag = False
 
         # Block elements
         if i.tag in ['sect1', 'sect2', 'sect3']:
+            assert element.tag != 'para' # should be top-level block element
+            has_block_elements = True
+
             parsed = parse_desc_internal(state, i)
             assert parsed.section
             assert not parsed.templates and not parsed.params and not parsed.return_value
@@ -179,6 +224,9 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
             out.parsed += '<section id="{}">{}</section>'.format(extract_id(i), parsed.parsed)
 
         elif i.tag == 'title':
+            assert element.tag != 'para' # should be top-level block element
+            has_block_elements = True
+
             if element.tag == 'sect1':
                 tag = 'h2'
             elif element.tag == 'sect2':
@@ -196,6 +244,8 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
             out.parsed += '<{0}><a href="#{1}">{2}</a></{0}>'.format(tag, id, title)
 
         elif i.tag == 'heading':
+            has_block_elements = True
+
             if i.attrib['level'] == '1':
                 tag = 'h2'
             elif i.attrib['level'] == '2':
@@ -208,15 +258,31 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
             out.parsed += '<{0}>{1}</{0}>'.format(tag, html.escape(i.text))
 
         elif i.tag == 'para':
+            assert element.tag != 'para' # should be top-level block element
+            paragraph_count += 1
+
+            # DOXYGEN <PARA> PATCHING 3/5
+            #
             # Parse contents of the paragraph, don't trim whitespace around
             # nested elements but trim it at the begin and end of the paragraph
             # itself. Also, some paragraphs are actually block content and we
             # might not want to write the start/closing tag.
-            parsed = parse_desc_internal(state, i, False)
+            #
+            # Also, to make things even funnier, parameter and return value
+            # description come from inside of some paragraph, so bubble them up
+            # and assume they are not scattered all over the place (ugh).
+            #
+            # There's also the patching of nested lists that results in the
+            # immediate_parent variable in the section 2/5 -- we pass the
+            # parent only if this is the first paragraph inside it.
+            parsed = parse_desc_internal(state, i, element if paragraph_count == 1 and not has_block_elements else None, False)
             parsed.parsed = parsed.parsed.strip()
-
-            # Inherit parameter and return value description, assume it's not
-            # scattered all over the place (ugh)
+            if not parsed.is_reasonable_paragraph:
+                has_block_elements = True
+            if parsed.parsed:
+                if parsed.write_paragraph_start_tag: out.parsed += '<p>'
+                out.parsed += parsed.parsed
+                if parsed.write_paragraph_close_tag: out.parsed += '</p>'
             if parsed.templates:
                 assert not out.templates
                 out.templates = parsed.templates
@@ -230,29 +296,20 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
             # Assert we didn't miss anything important
             assert not parsed.section
 
-            # Omit superfluous <p> for simple elments (list items, brief,
-            # parameter and return value description)
-            if element.tag in ['listitem', 'briefdescription', 'parameterdescription'] or (element.tag == 'simplesect' and element.attrib['kind'] == 'return'):
-                # Not expecting any funny thing from there (this will bite back
-                # in the future)
-                assert parsed.write_start_tag and parsed.write_close_tag
-                out.parsed += parsed.parsed
-            # Otherwise behave like requested
-            elif parsed.parsed:
-                if parsed.write_start_tag: out.parsed += '<p>'
-                out.parsed += parsed.parsed
-                if parsed.write_close_tag: out.parsed += '</p>'
-
         elif i.tag == 'blockquote':
+            has_block_elements = True
             out.parsed += '<blockquote>{}</blockquote>'.format(parse_desc(state, i))
 
         elif i.tag == 'itemizedlist':
+            has_block_elements = True
             out.parsed += '<ul>{}</ul>'.format(parse_desc(state, i))
 
         elif i.tag == 'orderedlist':
+            has_block_elements = True
             out.parsed += '<ol>{}</ol>'.format(parse_desc(state, i))
 
         elif i.tag == 'listitem':
+            has_block_elements = True
             out.parsed += '<li>{}</li>'.format(parse_desc(state, i))
 
         elif i.tag == 'simplesect':
@@ -261,6 +318,7 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
                 assert not out.return_value
                 out.return_value = parse_desc(state, i)
             else:
+                has_block_elements = True
                 if i.attrib['kind'] == 'see':
                     out.parsed += '[...]'
[...]
 
         elif i.tag == 'xrefsect':
+            has_block_elements = True
+
             id = i.attrib['id']
             match = xref_id_rx.match(id)
             file = match.group(1)
@@ -287,6 +347,8 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
                 color, file, match.group(2), i.find('xreftitle').text, parse_desc(state, i.find('xrefdescription')))
 
         elif i.tag == 'parameterlist':
+            has_block_elements = True
+
             out.param_kind = i.attrib['kind']
             assert out.param_kind in ['param', 'templateparam']
             for param in i:
@@ -304,6 +366,7 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
                         out.templates[name.text] = description
 
         elif i.tag == 'variablelist':
+            has_block_elements = True
             out.parsed += '[...]'
             for var in i:
@@ -315,11 +378,12 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
             out.parsed += '[...]'
 
         elif i.tag == 'verbatim':
+            has_block_elements = True
             out.parsed += '[...]{}[...]'.format(html.escape(i.text))
 
         elif i.tag == 'programlisting':
-            # Seems to be a standalone code paragraph, don't wrap it in <p>
-            # and use <pre>:
+            # If it seems to be a standalone code paragraph, don't wrap it in
+            # <p> and use <pre>:
             # - is either alone in the paragraph, with no text or other
             #   elements around
             # - or is a code snippet (filename instead of just .ext). Doxygen
@@ -345,14 +409,17 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
                 if out.parsed and not out.parsed[-1].isspace() and not out.parsed[-1] in '([{':
                     out.parsed += ' '
 
-            # Specialization of similar paragraph cleanup code above
+            # DOXYGEN <PARA> PATCHING 4/5
+            #
+            # Specialization of similar paragraph cleanup code above.
             if code_block:
+                has_block_elements = True
                 out.parsed = out.parsed.rstrip()
                 if not out.parsed:
-                    out.write_start_tag = False
+                    out.write_paragraph_start_tag = False
                 else:
                     out.parsed += '</p>'
-                out.write_close_tag = False
+                out.write_paragraph_close_tag = False
 
             # Hammer unhighlighted code out of the block
             # TODO: preserve links
@@ -438,6 +505,7 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
             out.parsed += '<{0} class="{1}">{2}</{0}>'.format('pre' if code_block else 'code', class_, highlighted)
 
         elif i.tag == 'image':
+            has_block_elements = True
             name = i.attrib['name']
             path = None
 
@@ -471,14 +539,12 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
                 out.parsed += m.math._patch(i.text, rendered, attribs)
 
             # Block formula
-            elif i.text.startswith('\[ ') and i.text.endswith(' \]'):
+            else:
+                assert i.text.startswith('\[ ') and i.text.endswith(' \]')
+                has_block_elements = True
                 rendered = m.latex2svg.latex2svg('$${}$$'.format(i.text[3:-3]), params=m.math.latex2svg_params)
                 out.parsed += '[...]{}[...]'.format(m.math._patch(i.text, rendered, ''))
 
-            # Shouldn't happen
-            else: # pragma: no cover
-                assert False
-
         # Inline elements
         elif i.tag == 'anchor':
             out.parsed += '[...]'.format(extract_id(i))
@@ -506,13 +572,15 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
         else: # pragma: no cover
             logging.warning("Ignoring <{}> in desc".format(i.tag))
 
+        # DOXYGEN <PARA> PATCHING 5/5
+        #
         # Besides putting notes and blockquotes and shit inside paragraphs,
         # Doxygen also doesn't attempt to open a new <para> for the ACTUAL NEW
         # PARAGRAPH after they end. So I do it myself and give a hint to the
         # caller that they should close the <p> again.
-        if element.tag == 'para' and not out.write_close_tag and i.tail and i.tail.strip():
+        if element.tag == 'para' and not out.write_paragraph_close_tag and i.tail and i.tail.strip():
             out.parsed += '<p>'
-            out.write_close_tag = True
+            out.write_paragraph_close_tag = True
             # There is usually some whitespace in between, get rid of it as
             # this is a start of a new paragraph. Stripping of the whole thing
             # is done by the caller.
@@ -522,6 +590,20 @@ def parse_desc_internal(state: State, element: ET.Element, trim = True):
         elif i.tail:
             out.parsed += html.escape(i.tail.strip() if trim else i.tail)
 
+    # Brief description always needs to be single paragraph because we're
+    # sending it out without enclosing <p>.
+    if element.tag == 'briefdescription':
+        assert not has_block_elements and paragraph_count <= 1
+        if paragraph_count == 1:
+            assert out.parsed.startswith('<p>') and out.parsed.endswith('</p>')
+            out.parsed = out.parsed[3:-4]
+
+    # Strip superfluous <p> for simple elments (list items, parameter and
+    # return value description), but only if there is just a single paragraph
+    elif (element.tag in ['listitem', 'parameterdescription'] or (element.tag == 'simplesect' and element.attrib['kind'] == 'return')) and not has_block_elements and paragraph_count == 1:
+        assert out.parsed.startswith('<p>') and out.parsed.endswith('</p>')
+        out.parsed = out.parsed[3:-4]
+
     return out
 
 def parse_desc(state: State, element: ET.Element) -> str:
diff --git a/doxygen/test/contents_blocks/index.html b/doxygen/test/contents_blocks/index.html
index d8f8b5af..262d6265 100644
--- a/doxygen/test/contents_blocks/index.html
+++ b/doxygen/test/contents_blocks/index.html
@@ -39,7 +39,7 @@

        First paragraph containing some content.

        Paragraph following the sections.

        -

        A blockquote

        Text right after that blockquote should be a new paragraph.

        +

        A blockquote

        Text right after that blockquote should be a new paragraph.

        • A simple
        • List
          1. With one line
          2. for each
        • item, so paragraphs are removed
        • A simple
        • List
          1. With the sublist delimited
          2. by blank lines
        • should behave the same as above
        • A new list

          of multiple

          paragraphs.

        • Another item

          • A sub list

            Another paragraph

        A paragraph after that list.
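The compact and inflated list variants in this expected output follow from the final stripping step that the patch adds to parse_desc_internal(). A minimal standalone sketch of that rule is below. It assumes a hypothetical helper name, strip_single_paragraph(); the patch performs the same check inline on out.parsed rather than through a helper, and the inputs here are hand-written HTML strings, not real Doxygen output.

```python
def strip_single_paragraph(parsed: str, paragraph_count: int,
                           has_block_elements: bool) -> str:
    """Drop the enclosing <p>...</p> only for a lone, block-free paragraph.

    Hypothetical helper mirroring the condition used in the patch for
    list items, parameter descriptions and return value descriptions.
    """
    if paragraph_count == 1 and not has_block_elements:
        assert parsed.startswith('<p>') and parsed.endswith('</p>')
        return parsed[3:-4]  # len('<p>') == 3, len('</p>') == 4
    return parsed


# A single-paragraph item becomes compact
tight = strip_single_paragraph('<p>A simple</p>', 1, False)

# A multi-paragraph item is left alone and renders inflated
inflated = strip_single_paragraph('<p>A list</p><p>of multiple</p>', 2, False)

# A block element next to the paragraph also prevents stripping
with_block = strip_single_paragraph('<p>Item</p><pre>code</pre>', 1, True)
```

Anything that raises the paragraph count of an item keeps its wrapper, which is how the documentation change describes forcing an inflated look on a single-paragraph item.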

diff --git a/doxygen/test/contents_blocks/input.dox b/doxygen/test/contents_blocks/input.dox
index 3c0fe3cc..04902caf 100644
--- a/doxygen/test/contents_blocks/input.dox
+++ b/doxygen/test/contents_blocks/input.dox
@@ -19,6 +19,40 @@ Paragraph following the sections.
 > A blockquote
 Text right after that blockquote should be a new paragraph.
 
+- A simple
+- List
+  -# With one line
+  -# for each
+- item, so paragraphs are removed
+
+.
+
+- A simple
+- List
+
+  -# With the sublist delimited
+  -# by blank lines
+
+- should behave the same as above
+
+.
+
+- A new list
+
+  of multiple
+
+  paragraphs.
+
+- Another item
+
+  <p></p>
+
+  - A sub list
+
+    Another paragraph
+
+A paragraph after that list.
+
 */
 
 /** @page other Other page
diff --git a/doxygen/test/contents_image/index.html b/doxygen/test/contents_image/index.html
index 795bc183..3a0965ec 100644
--- a/doxygen/test/contents_image/index.html
+++ b/doxygen/test/contents_image/index.html
@@ -37,7 +37,7 @@

        My Project

        -

        Alt text

        +Alt text diff --git a/doxygen/test/contents_image/warnings.html b/doxygen/test/contents_image/warnings.html index 66925e2b..769ce803 100644 --- a/doxygen/test/contents_image/warnings.html +++ b/doxygen/test/contents_image/warnings.html @@ -37,7 +37,7 @@

        Images that produce warnings

        -

        Image that doesn't exist.

        Image without alt text:

        Image

        +Image that doesn't exist.

        Image without alt text:

        Image diff --git a/doxygen/test/contents_typography/index.html b/doxygen/test/contents_typography/index.html index cc90fdf9..eef3e86a 100644 --- a/doxygen/test/contents_typography/index.html +++ b/doxygen/test/contents_typography/index.html @@ -38,7 +38,7 @@ My Project

        Page section

        A blockquote.

        Preformatted text.
        -

        Page subsection

        • Unordered
        • list
        • of
          • nested
          • items
        • and back

        Sub-sub section

        1. Ordered
        2. list
        3. of
          1. nested
          2. items
        4. and back

        This is a typewriter text, emphasis and bold. http://google.com and URL. En-dash – and em-dash —. Reference to a Page subsection.

        +

        Page subsection

        • Unordered
        • list
        • of
          • nested
          • items
        • and back

        Sub-sub section

        1. Ordered
        2. list
        3. of
          1. nested
          2. items
        4. and back

        This is a typewriter text, emphasis and bold. http://google.com and URL. En-dash – and em-dash —. Reference to a Page subsection.

        diff --git a/doxygen/test/contents_typography/warnings.html b/doxygen/test/contents_typography/warnings.html index 454f87e2..1bb044b3 100644 --- a/doxygen/test/contents_typography/warnings.html +++ b/doxygen/test/contents_typography/warnings.html @@ -37,7 +37,7 @@

        Content that produces warnings

        -

        Markdown heading 1

        Markdown heading 2

        Markdown heading 3

        +

        Markdown heading 1

        Markdown heading 2

        Markdown heading 3
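
The heading change in this test is one instance of the general <para> patching in dox2html5.py: a block element found inside a Doxygen paragraph closes the paragraph before it, and any text that follows reopens one. The sketch below is a simplified illustration of that idea, not the code from the patch. The names render_para() and BLOCK_TAGS are hypothetical, only three block tags are handled, every heading maps to h2, and inline children are flattened to plain text.

```python
import html
import xml.etree.ElementTree as ET

# Illustrative subset of block elements and the HTML tags used for them here
BLOCK_TAGS = {'heading': 'h2', 'blockquote': 'blockquote', 'verbatim': 'pre'}

def render_para(para: ET.Element) -> str:
    """Render one <para>, splitting it around nested block elements."""
    out = html.escape((para.text or '').lstrip())
    write_start = True  # should the result begin with <p>?
    open_p = True       # is a paragraph currently open?
    for child in para:
        if child.tag in BLOCK_TAGS:
            out = out.rstrip()
            if open_p:
                if out:
                    out += '</p>'        # close what was written so far
                else:
                    write_start = False  # nothing written, never open <p>
                open_p = False
            out += '<{0}>{1}</{0}>'.format(BLOCK_TAGS[child.tag],
                                           html.escape(child.text or ''))
        else:
            # Inline element: keep only its text in this sketch
            out += html.escape(''.join(child.itertext()))
        tail = child.tail or ''
        if tail.strip() and not open_p:
            out += '<p>'  # text after a block element starts a new paragraph
            open_p = True
            tail = tail.lstrip()
        out += html.escape(tail)
    out = out.rstrip()
    return ('<p>' if write_start else '') + out + ('</p>' if open_p else '')


split = render_para(ET.fromstring(
    '<para>Before<blockquote>Quote</blockquote>After</para>'))
heading_only = render_para(ET.fromstring(
    '<para><heading>Title</heading></para>'))
plain = render_para(ET.fromstring(
    '<para>Plain <bold>text</bold> only</para>'))
```

Tracking "should the caller open a paragraph" separately from "is a paragraph open right now" is what lets a paragraph that consists only of a heading come out with no <p> at all.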