Databases, SQL and Foreign Character Sets: It'll be O.K. in the N

by Stefan Hudson

One of the challenges faced when creating SimplyLingo was the storage, updating and retrieval of foreign character sets in a SQLServer database using SQL. The compilation of 6,000 words and simple phrases in each language, although time consuming and laborious, was technically easily achieved using an Excel spreadsheet. It soon became apparent that transferring to and retrieving from a database was not so easy.

This short article is intended to assist inexperienced web developers or those not familiar with combining databases and foreign words. Some elementary knowledge of SQL and database development is assumed. If you don't have that knowledge and would like to learn you can take a free on-line course here.

Standard character sets are not appropriate for most languages (esp. Greek) so it is necessary to use UTF-8 (Unicode) character encoding to provide the representations required for the full language sets, including accents. So when choosing a field-type in SQLServer for a foreign word the normal "varchar" types are inadequate. What is needed is the "nvarchar" types which take up twice as much space as the equivalent "varchar" types thus providing more space for the additional combinations of the foreign sets. This then is the first decision to make.

When you store foreign words use nvarchar types not varchar

Most website hosts provide tools for maintaining the database and many of these provide a tool for bulk adding to a table from a comma delimited file (csv). If your web hosts don't provide this then either change to another who does or learn how to add entries programmatically. (Some ideas about this below).

So assuming you have a tool like this there is still one or two things needed to make it work.

First, you need to save your spreadsheet file in a text format using Unicode text. Usually this is achieved by scrolling down to this type on the Save page in Excel. This probably won't produce a csv file (comma separated variables) but a text file with fields separated by tab characters. This means that as you import the file you will have to specify the file as TAB delimited not comma. This is normally an option in the import tool.

You can inspect the import results using the query tool from the web hosts with a statement such as:

SELECT * FROM tablename;

Well and good if you can report all the rows you imported. If not you will need to find the reason the import failed. Some of the reasons I came across at SimplyLingo:

So, now you're ready to use the data and you write some code similar to this:

SELECT field1, field2….. FROM tablename WHERE field3 = ‘value';

Shock, horror – either the query does not work at all or it retrieves rows where many of the columns are filled with "???????????????". - Why?

The SQL code needs to be told that the columns are Unicode.

The above code should be written thus:

Select field1, field2…. FROM tablename WHERE field3 = N'value';

The N tells SQL to expect Unicode characters.

Similarly if you wish to insert rows in the database use the following code (this is what you must do to import if your database maintenance tool does not have an import function):

INSERT INTO tablename
(first-column, ... last-column)
VALUES (N'string1', N'string2', numeric1, N'string3'…);

The N is used each time the field is defined as Unicode and must immediately precede the quote.

Similarly for updates:

UPDATE tablename
SET column1 = N‘string1',
Column2 = N'string2',
Column3 = numeric1…..
WHERE columnname = N'string';

Although the above seem relatively small tips, in reality if you are not used to dealing with foreign character sets and are not experienced they could save you hours, days or even weeks of trial-and-error and frustration. But as I found, it all comes good in the N.

About the Author

Stefan Hudson is a retired computer professional who has tried learning languages himself (notably Greek). After using home-made flash cards for some time and with spare time on his hands he developed the vocabulary training web site SimplyLingo.

SimplyLingo is a language learning vocabulary training web site. It provides up to 6,000 words and simple phrases in English, French, Greek and Spanish in an easy to use methodology.

comments powered by Disqus

Other articles


Learn Chinese Characters with the Omniglot Chinese app | Language Jobs at Jobseeker.co.uk
Hosted by Kualo