Opened 11 years ago

Closed 11 years ago

#1211 closed enhancement (wontfix)

The regular expression in Xinha.RE_doctype needs correction

Reported by: guest
Owned by: gogo
Priority: normal
Milestone: 0.96
Component: Xinha Core
Version: trunk
Severity: normal
Keywords:



I intend to use (after modifications) the Equation plugin for inserting mathematical formulas written in MathML. I am experimenting with the old HTMLArea now.

I discovered that the editor does not deal correctly with the following DOCTYPE definition:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" "" [
  <!ENTITY mathml "">

My attention was drawn to line 435 (SVN revision 1001) of XinhaCore.js, which reads:

Xinha.RE_doctype  = /(<!doctype((.|\n)*?)>)\n?/i;

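This regular expression does not fit: its lazy match stops at the first ">", which in the DOCTYPE above is the end of the <!ENTITY ...> declaration inside the internal subset, not the end of the DOCTYPE itself.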
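The behaviour can be reproduced outside the editor with a few lines of plain JavaScript. This is only a sketch: the regular expression is the one quoted above, while the closing "]>" of the DOCTYPE and the <html> stub in the sample string are assumptions added to make the input complete.

```javascript
// Regular expression as quoted from XinhaCore.js, line 435 (SVN revision 1001).
const RE_doctype = /(<!doctype((.|\n)*?)>)\n?/i;

// Sample input: the DOCTYPE from this report.
// ASSUMPTION: the closing "]>" and the <html> stub are added for illustration.
const sample =
  '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" "" [\n' +
  '  <!ENTITY mathml "">\n' +
  ']>\n' +
  '<html></html>';

const match = sample.match(RE_doctype);

// What the expression treats as the complete DOCTYPE declaration.
const captured = match[1];

// Everything after the match, i.e. what would be left in the document body.
const leftover = sample.slice(match.index + match[0].length);

console.log(JSON.stringify(captured)); // stops right after the <!ENTITY ...> declaration
console.log(JSON.stringify(leftover)); // still begins with the stray "]>"
```

The captured text ends at the ">" of the entity declaration, so the "]>" that really closes the DOCTYPE is left behind in the remaining text.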

I have found a good source of information to solve the problem:
Robert D. Cameron "REX: XML Shallow Parsing with Regular Expressions"

Using the ideas from that paper, I arrived at this regular expression:

Xinha.RE_doctype = /<!doctype(([ \n\t\r]+([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*([ \n\t\r]+(([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*|"[^"]*"|'[^']*'))*([ \n\t\r]+)?(\[(<(!(--[^-]*-([^-][^-]*-)*->|[^-]([^]"'><]+|"[^"]*"|'[^']*')*>)|\?([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*(\?>|[\n\r\t ][^?]*\?+([^>?][^?]*\?+)*>))|%([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*;|[ \n\t\r]+)*]([ \n\t\r]+)?)?>)?)|(([ \n\t\r]+)?>)/i;

In my editor the same variable is HTMLArea.RE_doctype.

Wow, it is complex, but it works so far. I propose changing Xinha.RE_doctype to the expression above.

Ivan Tcholakov

Change History (3)

comment:1 Changed 11 years ago by guest

  • Type changed from defect to enhancement

comment:2 Changed 11 years ago by guest

Nope. Does not work. It is just a good starting point.

Currently I am playing with:

Xinha.RE_doctype = /(<!doctype((.|\n)*?)(\[((.|\n)*?)\]([ \n\t\r])*?)?>)\n?/i;

Maybe this one is good enough.
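A quick check of this expression, as a sketch only. The regular expression is the one from this comment; the sample strings are assumptions (the reported DOCTYPE with an assumed closing "]>", a plain DOCTYPE, and a made-up entity whose quoted value contains "]>").

```javascript
// Regular expression as proposed in this comment.
const RE_doctype2 = /(<!doctype((.|\n)*?)(\[((.|\n)*?)\]([ \n\t\r])*?)?>)\n?/i;

// The DOCTYPE from the report. ASSUMPTION: the closing "]>" is added here.
const withSubset =
  '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" "" [\n' +
  '  <!ENTITY mathml "">\n' +
  ']>';

// A plain DOCTYPE without an internal subset.
const plain = '<!DOCTYPE html>';

// Hypothetical edge case: the quoted entity value itself contains "]>".
const tricky = '<!DOCTYPE html [\n  <!ENTITY x "a]>b">\n]>';

const subsetMatch = (withSubset + '\n<html></html>').match(RE_doctype2)[1];
const plainMatch = (plain + '\n<html></html>').match(RE_doctype2)[1];
const trickyMatch = (tricky + '\n<html></html>').match(RE_doctype2)[1];

console.log(subsetMatch === withSubset); // the whole declaration is captured
console.log(plainMatch === plain);       // plain doctypes still work
console.log(trickyMatch === tricky);     // false: the match ends inside the quoted value
```

So the expression handles the reported case and ordinary doctypes, but it can still be cut short by a "]" followed by ">" inside a quoted literal.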

Regards, Ivan Tcholakov.

comment:3 Changed 11 years ago by gogo

  • Resolution set to wontfix
  • Status changed from new to closed

I am marking this wontfix for several reasons: properly finding and parsing such a doctype with a single regexp is impossible (a regexp can't count the brackets), the proposed regexps are stupidly complicated, you are the only one who has reported this, and I have never seen an entity in an HTML doctype anywhere before.
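For anyone who does need this locally, the bracket counting mentioned above is easier to do with a small scanner than with a regexp. The following is a minimal sketch, not Xinha code: extractDoctype is a hypothetical helper name, and it only tracks quotes and square brackets (it does not handle comments inside the internal subset).

```javascript
// HYPOTHETICAL helper, not part of Xinha: returns the first DOCTYPE
// declaration found in `html`, or null if there is none or it is unterminated.
function extractDoctype(html) {
  const start = html.search(/<!doctype/i);
  if (start === -1) return null;

  let depth = 0;    // how many "[" are currently open (internal subset)
  let quote = null; // the quote character while inside a quoted literal

  // Scan from just after "<!doctype" (9 characters).
  for (let i = start + 9; i < html.length; i++) {
    const ch = html[i];
    if (quote) {
      // Inside a literal: only the matching quote ends it.
      if (ch === quote) quote = null;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === '[') {
      depth++;
    } else if (ch === ']') {
      depth--;
    } else if (ch === '>' && depth === 0) {
      // A ">" outside quotes and outside the subset closes the DOCTYPE.
      return html.slice(start, i + 1);
    }
  }
  return null; // unterminated declaration
}

// Made-up sample whose quoted entity value contains "]>".
const trickyDoctype = '<!DOCTYPE html [\n  <!ENTITY x "a]>b">\n]>';
console.log(extractDoctype(trickyDoctype + '\n<html></html>') === trickyDoctype);
console.log(extractDoctype('<!DOCTYPE html><p>text</p>'));
console.log(extractDoctype('<p>no doctype here</p>'));
```

The trade-off is a few more lines of code in exchange for not depending on regexp backtracking to guess where the declaration ends.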
