(NM) Any regular expression gurus here?
I'm writing a Python script to produce a document outline from the headers in a reStructuredText document.
EDIT: Never mind, I found one way:
Code:
^([=|`|\-|]+\n[\w| |]+\n[=|`|\-]+|[\w| ]+\n[=|`|\-]+)
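For anyone landing here later, a tidied sketch of the same idea. Inside a character class `|` is a literal pipe rather than "or", so it can be dropped, and the two alternatives collapse into one pattern with an optional overline. The names `ADORN`, `HEADER_RE` and `find_headers` are made up for illustration, and the character set is an assumption limited to the adornments used in the sample text.

```python
import re

# Adornment characters covered by this sketch (assumption: only the ones
# that show up in the sample text, not the full reST set).
ADORN = r"[=\-`~+]"

# Optional overline, then a title line, then an underline.
# No "|" inside the character classes: there it would be a literal pipe.
HEADER_RE = re.compile(
    rf"^(?:{ADORN}+\n)?[\w ]+\n{ADORN}+$",
    re.M,
)

def find_headers(text):
    """Return each header block (optional overline, title, underline)."""
    return [m.group(0) for m in HEADER_RE.finditer(text)]
```

Because the overline is optional and the pattern never consumes a newline before the header, the matches should start on the overline or title line rather than on the blank line before it.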
The regex I've come up with is:
Code:
import re

header_pat = r"""^([\n|[=|\-|~|`|\+]+]?[\w| |]+\n[=|\-|~|`|\+]+)"""
header_re = re.compile(header_pat, re.M)
That gives me all the headers OK, but each match starts with an extraneous blank line.
What I'm looking for is a regex that does not match the preceding blank line. The complication is that reST
allows optional overlines.
The regex above is just a prototype, so it does not yet match all possible header characters.
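One possible way to widen the prototype, as a rough sketch: build the character class from the adornment characters the reST spec recommends (the spec allows any non-alphanumeric printable 7-bit ASCII character), and use a backreference so that each adornment line is a single character repeated. `ADORN_CHARS`, `HEADER_RE` and `find_titles` are hypothetical names, and this sketch does not check that an overline uses the same character as its underline.

```python
import re

# Adornment characters recommended by the reST spec; re.escape keeps
# them literal inside the character class.
ADORN_CHARS = "=-`:'\"~^_*+#<>"
ADORN = "[" + re.escape(ADORN_CHARS) + "]"

# An adornment line is ONE character repeated, so capture the first
# character and repeat it with a named backreference.
HEADER_RE = re.compile(
    rf"^(?:(?P<o>{ADORN})(?P=o)*\n)?"   # optional overline
    rf"(?P<title>[^\n]*\w[^\n]*)\n"     # title line containing a word char
    rf"(?P<u>{ADORN})(?P=u)*$",         # underline
    re.M,
)

def find_titles(text):
    """Return (title, underline_char) pairs for each header found."""
    return [(m.group("title").strip(), m.group("u"))
            for m in HEADER_RE.finditer(text)]
```

A mixed line such as `=-=-=` is not treated as an underline here, since the backreference only accepts repeats of the first character.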
There may be a beer involved, no promises.
The following shows the match I am getting, and the match I want:
Code:
<~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
=======================================
reStructuredText Markup Specification
=======================================
Testing header
==============
-----------------------
Quick Syntax Overview
-----------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~>>
=======================================
reStructuredText Markup Specification
=======================================
Testing header
==============
-----------------------
Quick Syntax Overview
-----------------------
The sample text follows:
Code:
.. -*- coding: utf-8 -*-
=======================================
reStructuredText Markup Specification
=======================================
reStructuredText_ is plaintext that uses simple and intuitive
constructs to indicate the structure of a document.
Testing header
==============
Simple, implicit markup is used to indicate special constructs, such
as section headings, bullet lists, and emphasis. The markup used is
reStructuredText is applicable to documents of any length, from the
very small (such as inline program documentation fragments, e.g.
-----------------------
Quick Syntax Overview
-----------------------
A reStructuredText document is made up of body or block-level
elements, and may be structured into sections. Sections_ are
Here are examples of `body elements`_:
----------------
Syntax Details
----------------
Descriptions below list "doctree elements" (document tree element
names; XML DTD generic identifiers) corresponding to syntax
Whitespace
==========
Spaces are recommended for indentation_, but tabs may also be used.
Tabs will be converted to spaces. Tab stops are at every 8th column.
Other whitespace characters (form feeds [chr(12)] and vertical tabs
[chr(11)]) are converted to single spaces before processing.
Blank Lines
-----------
Blank lines are used to separate paragraphs and other elements.
Multiple successive blank lines are equivalent to a single blank line,
RCS Keywords
````````````
`Bibliographic fields`_ recognized by the parser are normally checked
for RCS [#]_ keywords and cleaned up [#]_. RCS keywords may be
------
Other
------
Indentation
-----------
Indentation is used to indicate -- and is only significant in
indicating -- block quotes, definitions (in definition list items),
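For the outline itself, here is a minimal sketch of one way to turn the matched headers into levels. It assumes the reST convention that a section's level is set by the order in which adornment styles are first encountered, where a style is the underline character plus whether an overline is present. `outline` and `format_outline` are hypothetical helpers, and the character set is again limited to what appears in the sample text.

```python
import re

ADORN = r"[=\-`~+]"  # assumption: only the characters used in the sample
HEADER_RE = re.compile(
    rf"^(?P<over>{ADORN}+\n)?(?P<title>[\w ]+)\n(?P<under>{ADORN}+)$",
    re.M,
)

def outline(text):
    """Return (level, title) pairs; levels follow first-seen adornment style."""
    styles = []   # adornment styles in order of first appearance
    entries = []
    for m in HEADER_RE.finditer(text):
        # A style is the underline character plus whether there is an overline.
        style = (m.group("under")[0], m.group("over") is not None)
        if style not in styles:
            styles.append(style)
        entries.append((styles.index(style) + 1, m.group("title").strip()))
    return entries

def format_outline(entries):
    """Indent each title by two spaces per level below the top one."""
    return "\n".join("  " * (level - 1) + title for level, title in entries)
```

Calling `print(format_outline(outline(text)))` on the document text should then give an indented list of titles.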