Wednesday, January 06, 2010

Special characters - part V

The end of the War of the Worlds?


Well... that remains to be seen. I predict the conclusion will be that there's always a possibility unwanted, or "wrong" characters get displayed. But when you start to understand why, my mission is accomplished. By the way - I found an excellent explanation on HTML right here

Setup


For the setup, I used the "we8" database instance (see this entry) I added the 10.1.3.3 Oracle Http Server(based on Apache 2), of which Linux version used to be downloadable off the database download page. There was a bit of work to be done to get things running. The official documentation states:
For Apache 2.0 startup to succeed, link the current libdb.so version to /usr/lib/libdb-3.3.so.

For example, if the current shared library is in /usr/lib is libdb-4.1.so, then run the following command:

cd /usr/lib
ln -s /usr/lib/libdb-4.1.so libdb-3.3.so

Database Access Descriptor (DAD)


My DAD looks simple enough:

# ============================================================================
# mod_plsql DAD Configuration File
# ============================================================================
# 1. Please refer to dads.README for a description of this file
# ============================================================================
<Location /pls/myapp>
SetHandler pls_handler
Order deny,allow
Allow from all
AllowOverride None
PlsqlDatabaseConnectString cs-frank03:1521:we8
PlsqlAuthenticationMode Basic
PlsqlErrorStyle ModplsqlStyle
PlsqlNLSLanguage American_America.UTF8
PlsqlDefaultPage scott.home
</Location>

I added my own html wrapper, and created the following procedure check if the DAD was configured correctly:

create or replace procedure home as
begin
htp.p(format.pagestart(
p_title=>'DAD Check page',
p_text=>'DAD Checked out successfully!')
);
end;
/

This results in a "DAD Checked out successfully!" message, signifying the http server works, the connection to the database works, and my wrapper works.

Webcode


Knowing all the basics are working, I created the following package to display the contents of the blah table:

create or replace package demochar as
procedure showblah;
prodedure editblah;
end;
/
create or replace package body demochar
as
procedure showblah
is
begin
htp.p(format.pagestart(p_title => 'Demonstration of Special Characters on the Web',
p_text => 'Welcome - contents of table is')
);
htp.p('</div>');
htp.p('<div class="content">');
htp.p('<table summary="blah" border="1" align="center">');
htp.p('<tr><th>Character<th>Hex value');
for i in (select b.a, dump(b.a,16) as d
from blah b)
loop
htp.p('<tr>');
htp.p('<td>' || i.a);
htp.p('<td>' || i.d);
end loop;
htp.p('</table>');
htp.p(format.pageend);
end showblah;
procedure editblah is
begin
null;
end editblah;
begin
DEBUG.setlevel(2);
DEBUG.setdestination(DEBUG.destweb);
DBMS_RANDOM.initialize(TO_NUMBER(TO_CHAR(SYSDATE, 'miss')));
end;
/

Data

The blah table is there again, and it's contents is restored:

SQL> col dump(a,16) format a20
SQL> select a, dump(a,16) from blah;

A DUMP(A,16)
-------------------- --------------------
€ Typ=1 Len=1: 80
Æ Typ=1 Len=1: c6
ß Typ=1 Len=1: df

Results and observations


First of all, the UTF8, used in the DAD is standard APEX, to name one. But is it the correct one to use?
Displayed on the web, the page looks like this (Linux/Firefox):

Om MS Windows (XP) with Firefox, as well as using Konqueror on Linux, the dreaded diamond is displayed:


So - the fact that our character does not display in a web browser is depending on the browser and on what is used in the DAD. As we change the DAD characterset to be used to
PlsqlNLSLanguage American_America.WE8ISO8859P15
the page is displayed correctly in Konquerer, incorrect in Firefox on Linux, but that can be corrected by forcing the page to use the 1252 codepage (!). Firefox under MS Windows reacts the same: auto detection, Universal and ISO-8859-15 all do not work, changing the Character Encoding to West European(Windows 1252) displays the page correctly. Internet Explorer V7.0.5730.13: same story: auto detection does not work.

Now - the question arises: is this due to the coding of the page? Character sets are encoded in the header. The last screenshot is the same on all browsers/systems, and is not due to encoding settings, but purely Oracle - the DAD characterset is now WE8ISO8859P1, which indeed does not know how to map the € sign. That results in an upside-down question mark.


Windows...
After inserting the euro character in the MS Windows encoding (0xa4, or 164), all browsers manage to show correct values (that is the € sign) for that when using UTF8 in the DAD, as well as browser Character Encoding. Using WE9ISO8859P15 in the DAD, the pages will display one € correct, based on Character Encoding in the Browser:
- Western ISO-8859-15 will display € for 0xa4
- Western (Windows-1252) will display € for 0x80

Only Konqueror mangages to display the € symbol for both codes, when the DAD uses WE8ISO8859P15, and the page is explicitly coded in ISO-8859-15 (using the META element Content-type in the header):


Conclusion


There's alwyas at least one way to screw up. If it's not the database, that was fed the wrong code points, it's the browser, that does a lousy job. Konqueror does an admirable job, though. Internet Explorer as well as Firefox have the possibility to switch Character Coding, which may result in correct display of the glyphs.
UTF-8 is not always the best PlsqlNLSLanguage!

No comments: