Thread: Regex to Create Table of Contents from HTML String




Regex to Create Table of Contents from HTML String
user name
2007-12-19 16:16:02
I'm asking for the upcoming book Google Office Hacks. I've got some
working code but am looking into ways to optimize it. In JavaScript
(using the replace function, I suppose), what regular expression or
code can I use to turn strings like this one...

  <h2>Foo</h2>
  <p>Bla</p>
  <h2 class="hey">Bar</h2>
  <p>Bla bla</p>

... into strings like this one?

  <a href="#toc1">Foo</a><br />
  <a href="#toc2">Bar</a><br />

  <h2><a name="toc1"></a>Foo</h2>
  <p>Bla</p>
  <h2 class="hey"><a name="toc2"></a>Bar</h2>
  <p>Bla bla</p>

Using the HTML DOM doesn't seem to be an option; at least, what I
get as input will be a string from a textarea (in Google Docs). Thanks!
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the
Google Groups "Regex" group.
To post to this group, send email to regexgooglegroups.com
To unsubscribe from this group, send email to
regex-unsubscribegooglegroups.com
For more options, visit this group at
http://groups.google.com/group/regex
-~----------~----~----~----~------~----~------~--~---


Re: Regex to Create Table of Contents from HTML String
user name
2007-12-19 23:27:11
Philipp,

I think this function should do the trick:

function CreateTOC( htmlBefore )
{
    // Match an opening heading tag (h1, h2, ...), its text, and the
    // closing tag; \2 refers back to the tag name captured by (h\d).
    var entryRegex = /(<(h\d)[^>]*>)([^<]+)(<\/\2>)/g;
    var entryMatch = entryRegex.exec( htmlBefore );

    var lastIndex = 0;

    var entryCount = 1;
    var tocHtml = "";
    var htmlAfter = "";

    while ( entryMatch != null )
    {
        // Add a TOC link for this heading.
        tocHtml += "<a href=\"#toc" + entryCount + "\">" +
                   entryMatch[ 3 ] + "</a><br />\n";

        // Copy the text before the heading, then the heading with a
        // named anchor wrapped around its text.
        htmlAfter +=
            htmlBefore.substring( lastIndex, entryMatch.index ) +
            entryMatch[ 1 ] +
            "<a name=\"toc" + entryCount++ + "\">" +
            entryMatch[ 3 ] +
            "</a>" + entryMatch[ 4 ];

        lastIndex = entryMatch.index + entryMatch[ 0 ].length;

        entryMatch = entryRegex.exec( htmlBefore );
    }

    // Append whatever follows the last heading.
    if ( lastIndex < htmlBefore.length )
    {
        htmlAfter += htmlBefore.substring( lastIndex );
    }

    return tocHtml + htmlAfter;
}
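
Since the question mentions the replace function, here is a rough sketch of the same idea using String.prototype.replace with a callback instead of an exec loop. The function name createTocWithReplace, the h[1-6] restriction, and the blank line between the TOC and the body are assumptions for illustration, not part of the function above. Unlike the function above, which wraps the heading text in the anchor, this variant emits an empty anchor before the text, as in the requested output.

```javascript
// Hypothetical helper name; a sketch, not the original poster's code.
function createTocWithReplace(htmlBefore) {
    // Opening heading tag, heading text, and matching closing tag.
    // \2 is a backreference to the tag name captured by (h[1-6]).
    var headingRegex = /(<(h[1-6])\b[^>]*>)([^<]+)(<\/\2>)/gi;
    var entryCount = 0;
    var tocHtml = "";

    // replace() invokes the callback once per match, in document order.
    var htmlAfter = htmlBefore.replace(
        headingRegex,
        function (match, openTag, tagName, text, closeTag) {
            entryCount++;
            tocHtml += "<a href=\"#toc" + entryCount + "\">" +
                       text + "</a><br />\n";
            // Empty named anchor placed before the heading text.
            return openTag + "<a name=\"toc" + entryCount + "\"></a>" +
                   text + closeTag;
        }
    );

    // Assumed layout: TOC links, one blank line, then the rewritten HTML.
    return tocHtml + "\n" + htmlAfter;
}

var sample =
    "<h2>Foo</h2>\n" +
    "<p>Bla</p>\n" +
    "<h2 class=\"hey\">Bar</h2>\n" +
    "<p>Bla bla</p>";

console.log(createTocWithReplace(sample));
```

Because the heading text is matched with [^<]+, headings that contain nested markup (for example an <em> inside an <h2>) are skipped by both versions.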


Let me know if you have any questions.

Jeff

On Dec 20, 5:16 am, Philipp <philipp.lens...gmail.com> wrote:
> I'm asking for the upcoming book Google Office Hacks. I've got some
> working code but am looking into ways to optimize it. In JavaScript
> (using the replace function, I suppose), what regular expression or
> code can I use to turn strings like this one...
>
>   <h2>Foo</h2>
>   <p>Bla</p>
>   <h2 class="hey">Bar</h2>
>   <p>Bla bla</p>
>
> ... into strings like this one?
>
>   <a href="#toc1">Foo</a><br />
>   <a href="#toc2">Bar</a><br />
>
>   <h2><a name="toc1"></a>Foo</h2>
>   <p>Bla</p>
>   <h2 class="hey"><a name="toc2"></a>Bar</h2>
>   <p>Bla bla</p>
>
> Using the HTML DOM doesn't seem to be an option; at least, what I
> get as input will be a string from a textarea (in Google Docs). Thanks!


Re: Regex to Create Table of Contents from HTML String
user name
2007-12-20 16:41:50
> I think this function should do the trick:

Thanks a lot Jeff!

