Opened 15 years ago

Closed 13 years ago

#127 closed defect (fixed)

convert special chars to HTML entities

Reported by: anonymous Owned by: gogo
Priority: high Milestone:
Component: Xinha Core Version: trunk
Severity: normal Keywords: special chars entities euro corrupt file
Cc: pixelsoul7@…


The special char button inserts the actual sign rather than the entity code, which is not good code: € instead of &euro;, etc.

Attachments (1)

special_chars.txt (14.4 KB) - added by mharrisonline 15 years ago.
fix for ticket 127, special characters


Change History (18)

comment:1 Changed 15 years ago by niko

Actually I think the browser converts it: go into HTML code view, enter &euro;, switch to WYSIWYG and back again -> &euro; is replaced by €.

..but why is this a problem?

comment:2 Changed 15 years ago by gogo

  • Resolution set to wontfix
  • Status changed from new to closed

I agree, I don't think it's a problem. The only entities that are strictly required are &amp; &lt; &gt; and &quot;; everything else can be just plain old characters. If anybody really wants to, they can do post-processing to turn the "special" characters into numbered entities or such.

Closing as wont fix.
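
The post-processing mentioned above could be as small as the following sketch. The function name toNumericEntities is hypothetical and not part of Xinha; it assumes the input is already-serialized HTML, so markup characters are left alone and only non-ASCII characters are rewritten.

```javascript
// Hypothetical helper, not part of Xinha: rewrite every non-ASCII
// character in an HTML string as a decimal numeric entity.
function toNumericEntities(html) {
  // The "u" flag makes the regex walk code points, so characters outside
  // the Basic Multilingual Plane are matched as a single unit.
  return html.replace(/[^\x00-\x7F]/gu, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  });
}

// \u20AC is the euro sign, \u00EF is i with diaeresis.
var sample = toNumericEntities("<p>5 \u20AC &amp; na\u00EFve</p>");
```

Numeric entities avoid the need for a name table, at the cost of less readable source.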

comment:3 Changed 15 years ago by niko

see #130

comment:4 Changed 15 years ago by anonymous

It does not happen only with €, but also with things like ï, so it's annoying :)

comment:5 Changed 15 years ago by anonymous

  • Resolution wontfix deleted
  • Status changed from closed to reopened

Also, the special chars button shows the correct format for each character, only it inserts the wrong one.

It would be a lot more work to catch everything in PHP and replace the characters than to insert the correct format at once. Now it's doing the same thing twice, but one of them does not work.

comment:6 Changed 15 years ago by anonymous

Forgot to add that you can't just use the characters since it will give a square as character...

comment:7 Changed 15 years ago by gogo

  • Resolution set to wontfix
  • Status changed from reopened to closed

Ok, here's the deal. Looks like Gecko at least converts the entity into the appropriate representation of the character in the character set of the document being edited (typically utf-8). At least that's what my cursory look shows. It may be that when we get the HTML out we are inadvertently converting it, but I don't think that's the case.

Forgot to add that you can't just use the characters since it will give a square as character...

Only if you have an incorrect character set defined for the HTML or are using a font that doesn't have that character, and that is not our concern. There seems to be some confusion about character sets, so here is a wiki page about them.

So, to cut this short, closing as WONTFIX, however we probably should remove the entity display from the CharacterMap plugin.

If somebody wants to patch htmlarea.js to make it keep the entities then by all means (but it should be configurable, some prefer the raw characters) reopen.

comment:8 Changed 15 years ago by mharrisonline

  • Resolution wontfix deleted
  • Status changed from closed to reopened

It is so easy to fix this once and for all if you do this. In the implementation of HTMLArea3 and then Xinha used in Jones Standard we saved the file htmlarea.js as UTF-8 and changed HTMLArea.htmlEncode to be the example below. Now, if it's a character shown below, even if you paste the actual character instead of inserting the entity, Xinha will convert it to the entity. Nothing else (hex codes, trying to catch in PHP, etc.) that we tried worked for more than the most common symbols. One of the biggest problems found in examples in the old htmlarea forum was people using case insensitive regular expressions.

This converts a lot more than what the insert character plugin for Xinha offers, and it works 100% of the time for the symbols below. To use this you must save htmlarea.js as UTF-8. To save filesize you could delete any symbols you don't expect that you will encounter, or use the compressed version at the bottom.

HTMLArea.htmlEncode = function(str) { 
// we don't need regexp for that, but.. so be it for now. 
str = str.replace(/&/g, "&"); 
str = str.replace(/</g, "<"); 
str = str.replace(/>/g, ">"); 
str = str.replace(/¡/g, "¡");
str = str.replace(/¢/g, "¢");
str = str.replace(/£/g, "£");
str = str.replace(/¤/g, "¤");
str = str.replace(/¥/g, "¥");
str = str.replace(/¦/g, "¦");
str = str.replace(/§/g, "§");
str = str.replace(/¨/g, "uml;");
str = str.replace(/©/g, "©");
str = str.replace(/ª/g, "ª");
str = str.replace(/«/g, "«");
str = str.replace(/¬/g, "¬");
str = str.replace(/®/g, "®");
str = str.replace(/¯/g, "¯");
str = str.replace(/°/g, "°");
str = str.replace(/±/g, "±");
str = str.replace(/²/g, "²");
str = str.replace(/³/g, "³");
str = str.replace(/´/g, "´");
str = str.replace(/µ/g, "µ");
str = str.replace(/¶/g, "¶");
str = str.replace(/·/g, "·");
str = str.replace(/¸/g, "¸");
str = str.replace(/¹/g, "¹");
str = str.replace(/º/g, "º");
str = str.replace(/»/g, "»");
str = str.replace(/¼/g, "¼");
str = str.replace(/½/g, "½");
str = str.replace(/¾/g, "¾");
str = str.replace(/¿/g, "¿");
str = str.replace(/À/g, "À");
str = str.replace(/Á/g, "Á");
str = str.replace(/Â/g, "Â");
str = str.replace(/Ã/g, "Ã");
str = str.replace(/Ä/g, "Ä");
str = str.replace(/Å/g, "Å");
str = str.replace(/Æ/g, "Æ");
str = str.replace(/Ç/g, "Ç");
str = str.replace(/È/g, "È");
str = str.replace(/É/g, "É");
str = str.replace(/Ê/g, "Ê");
str = str.replace(/Ë/g, "Ë");
str = str.replace(/Ì/g, "Ì");
str = str.replace(/Í/g, "Í");
str = str.replace(/Î/g, "Î");
str = str.replace(/Ï/g, "Ï");
str = str.replace(/Ð/g, "Ð");
str = str.replace(/Ñ/g, "Ñ");
str = str.replace(/Ò/g, "Ò");
str = str.replace(/Ó/g, "Ó");
str = str.replace(/Ô/g, "Ô");
str = str.replace(/Õ/g, "Õ");
str = str.replace(/Ö/g, "Ö");
str = str.replace(/×/g, "×");
str = str.replace(/Ø/g, "Ø");
str = str.replace(/Ù/g, "Ù");
str = str.replace(/Ú/g, "Ú");
str = str.replace(/Û/g, "Û");
str = str.replace(/Ü/g, "Ü");
str = str.replace(/Ý/g, "Ý");
str = str.replace(/Þ/g, "Þ");
str = str.replace(/ß/g, "ß");
str = str.replace(/à/g, "à");
str = str.replace(/á/g, "á");
str = str.replace(/â/g, "â");
str = str.replace(/ã/g, "ã");
str = str.replace(/ä/g, "ä");
str = str.replace(/å/g, "å");
str = str.replace(/æ/g, "æ");
str = str.replace(/ç/g, "ç");
str = str.replace(/è/g, "è");
str = str.replace(/é/g, "é");
str = str.replace(/ê/g, "ê");
str = str.replace(/ë/g, "ë");
str = str.replace(/ì/g, "ì");
str = str.replace(/í/g, "í");
str = str.replace(/î/g, "î");
str = str.replace(/ï/g, "ï");
str = str.replace(/ð/g, "ð");
str = str.replace(/ñ/g, "ñ");
str = str.replace(/ò/g, "ò");
str = str.replace(/ó/g, "ó");
str = str.replace(/ó/g, "ó");
str = str.replace(/ô/g, "ô");
str = str.replace(/õ/g, "õ");
str = str.replace(/ö/g, "ö");
str = str.replace(/÷/g, "÷");
str = str.replace(/ø/g, "ø");
str = str.replace(/ù/g, "ù");
str = str.replace(/ú/g, "ú");
str = str.replace(/û/g, "û");
str = str.replace(/ü/g, "ü");
str = str.replace(/ý/g, "ý");
str = str.replace(/þ/g, "þ");
str = str.replace(/ÿ/g, "ÿ");
str = str.replace(/ƒ/g, "ƒ");
str = str.replace(/Α/g, "Α");
str = str.replace(/Β/g, "Β");
str = str.replace(/Γ/g, "Γ");
str = str.replace(/Δ/g, "Δ");
str = str.replace(/Ε/g, "Ε");
str = str.replace(/Ζ/g, "Ζ");
str = str.replace(/Η/g, "Η");
str = str.replace(/Θ/g, "Θ");
str = str.replace(/Ι/g, "Ι");
str = str.replace(/Κ/g, "Κ");
str = str.replace(/Λ/g, "Λ");
str = str.replace(/Μ/g, "Μ");
str = str.replace(/Ν/g, "Ν");
str = str.replace(/Ξ/g, "Ξ");
str = str.replace(/Ο/g, "Ο");
str = str.replace(/Π/g, "Π");
str = str.replace(/Ρ/g, "Ρ");
str = str.replace(/Σ/g, "Σ");
str = str.replace(/Τ/g, "Τ");
str = str.replace(/Υ/g, "Υ");
str = str.replace(/Φ/g, "Φ");
str = str.replace(/Χ/g, "Χ");
str = str.replace(/Ψ/g, "Ψ");
str = str.replace(/Ω/g, "Ω");
str = str.replace(/α/g, "α");
str = str.replace(/β/g, "β");
str = str.replace(/γ/g, "γ");
str = str.replace(/δ/g, "δ");
str = str.replace(/ε/g, "ε");
str = str.replace(/ζ/g, "ζ");
str = str.replace(/η/g, "η");
str = str.replace(/θ/g, "θ");
str = str.replace(/ι/g, "ι");
str = str.replace(/κ/g, "κ");
str = str.replace(/λ/g, "λ");
str = str.replace(/μ/g, "μ");
str = str.replace(/ν/g, "ν");
str = str.replace(/ξ/g, "ξ");
str = str.replace(/ο/g, "ο");
str = str.replace(/π/g, "π");
str = str.replace(/ρ/g, "ρ");
str = str.replace(/ς/g, "ς");
str = str.replace(/σ/g, "σ");
str = str.replace(/τ/g, "τ");
str = str.replace(/υ/g, "υ");
str = str.replace(/φ/g, "φ");
str = str.replace(/ω/g, "ω");
str = str.replace(/•/g, "•");
str = str.replace(/…/g, "…");
str = str.replace(/′/g, "′");
str = str.replace(/″/g, "″");
str = str.replace(/‾/g, "‾");
str = str.replace(/⁄/g, "⁄");
str = str.replace(/™/g, "™");
str = str.replace(/←/g, "←");
str = str.replace(/↑/g, "↑");
str = str.replace(/→/g, "→");
str = str.replace(/↓/g, "↓");
str = str.replace(/↔/g, "↔");
str = str.replace(/⇒/g, "⇒");
str = str.replace(/∂/g, "∂");
str = str.replace(/∏/g, "∏");
str = str.replace(/∑/g, "∑");
str = str.replace(/−/g, "−");
str = str.replace(/√/g, "√");
str = str.replace(/∞/g, "∞");
str = str.replace(/∩/g, "∩");
str = str.replace(/∫/g, "∫");
str = str.replace(/≈/g, "≈");
str = str.replace(/≠/g, "≠");
str = str.replace(/≡/g, "≡");
str = str.replace(/≤/g, "≤");
str = str.replace(/≥/g, "≥");
str = str.replace(/◊/g, "◊");
str = str.replace(/♠/g, "♠");
str = str.replace(/♣/g, "♣");
str = str.replace(/♥/g, "♥");
str = str.replace(/♦/g, "♦");
str = str.replace(/Œ/g, "Œ");
str = str.replace(/œ/g, "œ");
str = str.replace(/Š/g, "Š");
str = str.replace(/š/g, "š");
str = str.replace(/Ÿ/g, "Ÿ");
str = str.replace(/ˆ/g, "ˆ");
str = str.replace(/˜/g, "˜");
str = str.replace(/–/g, "–");
str = str.replace(/—/g, "—");
str = str.replace(/‘/g, "‘");
str = str.replace(/’/g, "’");
str = str.replace(/‚/g, "‚");
str = str.replace(/“/g, "“");
str = str.replace(/”/g, "”");
str = str.replace(/„/g, "„");
str = str.replace(/†/g, "†");
str = str.replace(/‡/g, "‡");
str = str.replace(/‰/g, "‰");
str = str.replace(/‹/g, "‹");
str = str.replace(/›/g, "›");
str = str.replace(/€/g, "€");
	// \x22 means '"' -- we use hex representation so that we don't disturb
	// JS compressors (well, at least mine fails.. ;)
	str = str.replace(/\x22/ig, "&quot;");
	str = str.replace(/\xA0/gi, "&nbsp;");
	str = str.replace(String.fromCharCode(0x2264), "&le;");
	str = str.replace(String.fromCharCode(0x2265), "&ge;");

return str;
};

Compressed version:

HTMLArea.htmlEncode=function(str){str=str.replace(/&/g,"&");str=str.replace(/</g,"<");str=str.replace(/>/g,">");str=str.replace(/¡/g,"¡");str=str.replace(/¢/g,"¢");str=str.replace(/£/g,"£");str=str.replace(/¤/g,"¤");str=str.replace(/¥/g,"¥");str=str.replace(/¦/g,"¦");str=str.replace(/§/g,"§");str=str.replace(/¨/g,"uml;");str=str.replace(/©/g,"©");str=str.replace(/ª/g,"ª");str=str.replace(/«/g,"«");str=str.replace(/¬/g,"¬");str=str.replace(/®/g,"®");str=str.replace(/¯/g,"¯");str=str.replace(/°/g,"°");str=str.replace(/±/g,"±");str=str.replace(/²/g,"²");str=str.replace(/³/g,"³");str=str.replace(/´/g,"´");str=str.replace(/µ/g,"µ");str=str.replace(/¶/g,"¶");str=str.replace(/·/g,"·");str=str.replace(/¸/g,"¸");str=str.replace(/¹/g,"¹");str=str.replace(/º/g,"º");str=str.replace(/»/g,"»");str=str.replace(/¼/g,"¼");str=str.replace(/½/g,"½");str=str.replace(/¾/g,"¾");str=str.replace(/¿/g,"¿");str=str.replace(/À/g,"À");str=str.replace(/Á/g,"Á");str=str.replace(/Â/g,"Â");str=str.replace(/Ã/g,"Ã");str=str.replace(/Ä/g,"Ä");str=str.replace(/Å/g,"Å");str=str.replace(/Æ/g,"Æ");str=str.replace(/Ç/g,"Ç");str=str.replace(/È/g,"È");str=str.replace(/É/g,"É");str=str.replace(/Ê/g,"Ê");str=str.replace(/Ë/g,"Ë");str=str.replace(/Ì/g,"Ì");str=str.replace(/Í/g,"Í");str=str.replace(/Î/g,"Î");str=str.replace(/Ï/g,"Ï");str=str.replace(/Ð/g,"Ð");str=str.replace(/Ñ/g,"Ñ");str=str.replace(/Ò/g,"Ò");str=str.replace(/Ó/g,"Ó");str=str.replace(/Ô/g,"Ô");str=str.replace(/Õ/g,"Õ");str=str.replace(/Ö/g,"Ö");str=str.replace(/×/g,"×");str=str.replace(/Ø/g,"Ø");str=str.replace(/Ù/g,"Ù");str=str.replace(/Ú/g,"Ú");str=str.replace(/Û/g,"Û");str=str.replace(/Ü/g,"Ü");str=str.replace(/Ý/g,"Ý");str=str.replace(/Þ/g,"Þ");str=str.replace(/ß/g,"ß");str=str.replace(/à/g,"à");str=str.replace(/á/g,"á");str=str.replace(/â/g,"â");str=str.replace(/ã/g,"ã");str=str.replace(/ä/g,"ä");str=str.replace(/å/g,"å");str=str.replace(/æ/g,"æ");str=str.replace(/ç/g,"ç");str=str.replace(/è/g,"è");str=str.replace(/é/g,"é");str=str.repla
ce(/ê/g,"ê");str=str.replace(/ë/g,"ë");str=str.replace(/ì/g,"ì");str=str.replace(/í/g,"í");str=str.replace(/î/g,"î");str=str.replace(/ï/g,"ï");str=str.replace(/ð/g,"ð");str=str.replace(/ñ/g,"ñ");str=str.replace(/ò/g,"ò");str=str.replace(/ó/g,"ó");str=str.replace(/ó/g,"ó");str=str.replace(/ô/g,"ô");str=str.replace(/õ/g,"õ");str=str.replace(/ö/g,"ö");str=str.replace(/÷/g,"÷");str=str.replace(/ø/g,"ø");str=str.replace(/ù/g,"ù");str=str.replace(/ú/g,"ú");str=str.replace(/û/g,"û");str=str.replace(/ü/g,"ü");str=str.replace(/ý/g,"ý");str=str.replace(/þ/g,"þ");str=str.replace(/ÿ/g,"ÿ");str=str.replace(/ƒ/g,"ƒ");str=str.replace(/Α/g,"Α");str=str.replace(/Β/g,"Β");str=str.replace(/Γ/g,"Γ");str=str.replace(/Δ/g,"Δ");str=str.replace(/Ε/g,"Ε");str=str.replace(/Ζ/g,"Ζ");str=str.replace(/Η/g,"Η");str=str.replace(/Θ/g,"Θ");str=str.replace(/Ι/g,"Ι");str=str.replace(/Κ/g,"Κ");str=str.replace(/Λ/g,"Λ");str=str.replace(/Μ/g,"Μ");str=str.replace(/Ν/g,"Ν");str=str.replace(/Ξ/g,"Ξ");str=str.replace(/Ο /g,"Ο");str=str.replace(/Π/g,"Π");str=str.replace(/Ρ/g,"Ρ");str=str.replace(/Σ/g,"Σ");str=str.replace(/Τ/g,"Τ");str=str.replace(/Υ/g,"Υ");str=str.replace(/Φ/g,"Φ");str=str.replace(/Χ/g,"Χ");str=str.replace(/Ψ/g,"Ψ");str=str.replace(/Ω/g,"Ω");str=str.replace(/α/g,"α");str=str.replace(/β/g,"β");str=str.replace(/γ/g,"γ");str=str.replace(/δ/g,"δ");str=str.replace(/ε/g,"ε");str=str.replace(/ζ/g,"ζ");str=str.replace(/η/g,"η");str=str.replace(/θ/g,"θ");str=str.replace(/ι/g,"ι");str=str.replace(/κ/g,"κ");str=str.replace(/λ/g,"λ");str=str.replace(/μ/g,"μ");str=str.replace(/ν/g,"ν");str=str.replace(/ξ/g,"ξ");str=str.replace(/ο/g,"ο");str=str.replace(/π/g,"π");str=str.replace(/ρ/g,"ρ");str=str.replace(/ς/g,"ς");str=str.replace(/σ/g,"σ");str=str.replace(/τ/g,"τ");str=str.replace(/υ/g,"υ");str=str.replace(/φ/g,"φ");str=str.replace(/ω/g,"ω");str=str.replace(/•/g,"•");str=str.replace(/…/g,"…");str=str.replace(/′/g,"′");str=str.replace(/″/g,"″");str=str.replace(/‾/g,"‾");str=str.replace(/⁄/g,"⁄");str=str.re
place(/™/g,"™");str=str.replace(/←/g,"←");str=str.replace(/↑/g,"↑");str=str.replace(/→/g,"→");str=str.replace(/↓/g,"↓");str=str.replace(/↔/g,"↔");str=str.replace(/⇒/g,"⇒");str=str.replace(/∂/g,"∂");str=str.replace(/∏/g,"∏");str=str.replace(/∑/g,"∑");str=str.replace(/−/g,"−");str=str.replace(/√/g,"√");str=str.replace(/∞/g,"∞");str=str.replace(/∩/g,"∩");str=str.replace(/∫/g,"∫");str=str.replace(/≈/g,"≈");str=str.replace(/≠/g,"≠");str=str.replace(/≡/g,"≡");str=str.replace(/≤/g,"≤");str=str.replace(/≥/g,"≥");str=str.replace(/◊/g,"◊");str=str.replace(/♠/g,"♠");str=str.replace(/♣/g,"♣");str=str.replace(/♥/g,"♥");str=str.replace(/♦/g,"♦");str=str.replace(/Œ/g,"Œ");str=str.replace(/œ/g,"œ");str=str.replace(/Š/g,"Š");str=str.replace(/š/g,"š");str=str.replace(/Ÿ/g,"Ÿ");str=str.replace(/ˆ/g,"ˆ");str=str.replace(/˜/g,"˜");str=str.replace(/–/g,"–");str=str.replace(/—/g,"—");str=str.replace(/‘/g,"‘");str=str.replace(/’/g,"’");str=str.replace(/‚/g,"‚");str=str.replace(/“/g,"“");str=str.replace(/”/g,"”");str=str.replace(/„/g,"„");str=str.replace(/†/g,"†");str=str.replace(/‡/g,"‡");str=str.replace(/‰/g,"‰");str=str.replace(/‹/g,"‹");str=str.replace(/›/g,"›");str=str.replace(/€/g,"€");str=str.replace(/\x22/ig,""");str=str.replace(/\xA0/gi," ");str=str.replace(String.fromCharCode(0x2264),"≤");str=str.replace(String.fromCharCode(0x2265),"≥");return str;};
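
The case-insensitivity pitfall mentioned in this comment can be shown in a few lines. This is only an illustration with made-up variable names, written with \u escapes so the snippet itself stays ASCII-only.

```javascript
// \u00E9 is lowercase e-acute, \u00C9 is uppercase E-acute.
var input = "\u00C9cole, caf\u00E9";

// With the "i" flag the pattern for the lowercase letter also matches the
// uppercase one, so both collapse to the same (lowercase) entity.
var wrong = input.replace(/\u00E9/gi, "&eacute;");

// Case-sensitive patterns keep the two letters apart.
var right = input
  .replace(/\u00C9/g, "&Eacute;")
  .replace(/\u00E9/g, "&eacute;");
```

Here wrong loses the capital letter, while right preserves it, which is why every pattern in the function above uses /g without /i for letters.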

Changed 15 years ago by mharrisonline

fix for ticket 127, special characters

comment:9 Changed 15 years ago by mharrisonline

I had to attach this as a text file; the code above is all wrong. After I submitted, the HTML entity in each line turned into the actual character. Looks like Xinha isn't the only thing with that problem. The second occurrence of the character in each line was originally the HTML entity.

comment:10 Changed 15 years ago by mharrisonline

It seems that for at least a month the HTMLArea.htmlEncode function no longer works in Xinha, rendering this fix useless.

comment:11 Changed 15 years ago by mharrisonline

Oops! Never mind, it must have been my PC, I can't get it to not work now. It still works fine, sorry.

comment:12 Changed 15 years ago by gogo

  • Resolution set to wontfix
  • Status changed from reopened to closed

I don't want to introduce the code supplied above (htmlEncode) for two reasons...

  1. It's a lot of code for an unnecessary purpose. There should be no reason to encode characters to HTML entities except for < > " and &.
  2. I don't want to require that JavaScript files in Xinha are UTF-8 (which would be necessary to include htmlEncode) because of multiple-developer concerns. This can be worked around by not including the UTF-8 characters and using JavaScript Unicode escapes instead, but again, that's lots of unnecessary code IMHO.

This is more suitable for a plugin to implement, it doesn't need to be in the core.
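
The Unicode-escape workaround from point 2 might look roughly like this. It is a sketch only: ENTITY_NAMES, ENTITY_PATTERN and encodeNamedEntities are hypothetical names, the table is a tiny subset, and this is not the code from the attachment or from any Xinha plugin.

```javascript
// Hypothetical subset of an entity table. Keys are written as \u escapes,
// so the source file stays plain ASCII however it is saved.
var ENTITY_NAMES = {
  "\u00A0": "nbsp",
  "\u00A9": "copy",
  "\u00E9": "eacute",
  "\u00EF": "iuml",
  "\u2264": "le",
  "\u2265": "ge",
  "\u20AC": "euro"
};

// One character class built from the table keys replaces the long chain
// of replace() calls. None of these keys is a regex metacharacter, so
// joining them without escaping is safe for this particular table.
var ENTITY_PATTERN = new RegExp(
  "[" + Object.keys(ENTITY_NAMES).join("") + "]",
  "g"
);

function encodeNamedEntities(str) {
  return str.replace(ENTITY_PATTERN, function (ch) {
    return "&" + ENTITY_NAMES[ch] + ";";
  });
}
```

A table like this only handles the "special" characters; escaping & < > and " would still be a separate step.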

comment:13 Changed 14 years ago by ray

  • Keywords special chars entities euro added
  • Resolution wontfix deleted
  • Status changed from closed to reopened
  • Summary changed from Special chars to convert special chars to HTML entities
  • Version changed from 2.0 to trunk

comment:14 Changed 14 years ago by ray

  • Resolution set to fixed
  • Status changed from reopened to closed

If you need the entities (e.g. to use the € in ISO-8859-1, a common case in Europe), use the HtmlEntities plugin (committed in Changeset [615]).
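
To illustrate why the ISO-8859-1 case needs entities: that character set only covers code points up to 0xFF, and the euro sign is U+20AC. The sketch below is an assumption-laden illustration, not how the HtmlEntities plugin works; encodeForLatin1 is a hypothetical name.

```javascript
// Hypothetical helper: leave everything ISO-8859-1 can represent as-is
// and turn only the characters outside that range into numeric entities.
function encodeForLatin1(html) {
  return html.replace(/[^\x00-\xFF]/gu, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  });
}

// \u00E9 (e-acute) fits in ISO-8859-1; \u20AC (euro sign) does not.
var latin1Safe = encodeForLatin1("caf\u00E9 5 \u20AC");
```

In a UTF-8 document neither character would need encoding, which is the point made earlier in the thread about character sets.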

comment:15 Changed 14 years ago by mharrisonline

Ray, the new plugin is great! That's one less thing I have to customize every time I upgrade to the latest Xinha version. Thanks!

comment:16 Changed 13 years ago by znoob2@…

  • Keywords corrupt file added
  • Resolution fixed deleted
  • Status changed from closed to reopened

In Xinha version 0.92beta the Entities.js file is completely faulty. Something went wrong, definitely.

And furthermore, when the html contains a span that has at least a classname that contains AM (Equation plugin generates <span class="AM">, but class="GAME" would give the same result), this entire plugin (HtmlEntities) seems to be disabled. None of my characters in the html are converted to entities anymore. Maybe this is due to the mangled Entities.js file? Or is this purposely generated behaviour?

(Couldn't find Plugin_HtmlEntities as a component, so I'm posting this under Xinha Core...)

comment:17 Changed 13 years ago by ray

  • Resolution set to fixed
  • Status changed from reopened to closed

The corruption is caused by the JS compressor. Rev [823]: changed the compression script to take care of such cases.

And furthermore, when the html contains a span that has at least a classname that contains AM (Equation plugin generates <span class="AM">, but class="GAME" would give the same result), this entire plugin (HtmlEntities) seems to be disabled. None of my characters in the html are converted to entities anymore. Maybe this is due to the mangled Entities.js file? Or is this purposely generated behaviour?

Could not reproduce this.
