Build 4.5.0.8 of xlsgen adds charset and language support for its four import file formats: CSV, HTML, JSON and XML.
By default, xlsgen infers data types using the language from the user's regional settings, and it parses files using the charset markup built into them, where applicable.
The language, such as en_US, en_GB or fr_FR, affects how data type inference recognizes numbers, currencies and dates.
When you know in advance that the file being imported uses a given language, you can tell xlsgen before importing it by setting the following property:
worksheet.Import.CSV.Options.Language = "fr_FR"; // example of custom language used in an imported CSV file
This option is equally available for HTML, JSON and XML files and buffers.
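To see why the language matters for type inference, here is a minimal Python sketch. It is not xlsgen's inference code: the `SEPARATORS` table and the `parse_number` helper are hypothetical names invented for this illustration, and the table only covers the three languages mentioned above.

```python
# Illustrative only: this is not how xlsgen implements inference.
# The separator table is an assumption limited to three sample languages.
SEPARATORS = {
    "en_US": {"decimal": ".", "group": ","},
    "en_GB": {"decimal": ".", "group": ","},
    "fr_FR": {"decimal": ",", "group": " "},
}


def parse_number(text, language):
    """Parse a numeric string using the separators of the given language."""
    seps = SEPARATORS[language]
    # Treat non-breaking spaces like regular spaces before stripping groups.
    cleaned = text.strip().replace("\u00a0", " ")
    cleaned = cleaned.replace(seps["group"], "")
    cleaned = cleaned.replace(seps["decimal"], ".")
    return float(cleaned)


# The same cell text yields two different numbers.
print(parse_number("1,234", "en_US"))  # comma read as a thousands separator
print(parse_number("1,234", "fr_FR"))  # comma read as a decimal separator
```

The same string is read as one thousand two hundred thirty-four in en_US and as a little over one in fr_FR, which is the kind of ambiguity the Language option resolves.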
Note the syntax of the language parameter: the primary language initials, followed by an underscore, then the secondary language initials. As such, US English (en_US) behaves differently from British English (en_GB). The format follows RFC 1766.
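The tag syntax described above can be checked before it is handed to the import options. The following Python sketch assumes two-letter primary and secondary parts, as in en_US, en_GB and fr_FR; whether xlsgen accepts other shapes is not stated here, and `split_language_tag` is a hypothetical helper name.

```python
import re

# Assumption: two lowercase letters, an underscore, two uppercase letters,
# matching the examples en_US, en_GB and fr_FR.
TAG_PATTERN = re.compile(r"^([a-z]{2})_([A-Z]{2})$")


def split_language_tag(tag):
    """Return (primary, secondary) for a tag such as 'fr_FR'."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unexpected language tag: {tag!r}")
    return match.group(1), match.group(2)


print(split_language_tag("fr_FR"))
```

A tag written with a hyphen, such as "en-US", is rejected by this pattern, which makes the underscore requirement explicit.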
Charsets are a bit more involved, because a charset may or may not be declared in each file being imported, and the rules vary with the file format.
- CSV file: the charset can be implicit for two-byte Unicode and UTF-8, thanks to the BOM marker at the beginning of the file. xlsgen already handles the BOM. Otherwise, the charset is assumed to be the user's current code page. This can be overridden by setting the following property:
worksheet.Import.CSV.Options.Charset = "iso-8859-1"; // example of custom charset used in an imported CSV file
- XML file: the charset is declared explicitly in the XML markup, on the first line. This can be overridden by setting the following property:
worksheet.Import.XML.Options.Charset = "iso-8859-1"; // example of custom charset used in an imported XML file
- JSON file: the charset defaults to UTF-8. This can be overridden by setting the following property:
worksheet.Import.JSON.Options.Charset = "iso-8859-1"; // example of custom charset used in an imported JSON file
- HTML file: the charset is declared explicitly in the HTML markup, in the optional meta http-equiv element. This can be overridden by setting the following property:
worksheet.Import.HTML.Options.Charset = "iso-8859-1"; // example of custom charset used in an imported HTML file
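The per-format rules listed above can be sketched in Python as follows. This is an approximation for illustration, not xlsgen's parser: all function names are hypothetical, the regular expressions are deliberately simplified, and the `code_page` default of "cp1252" is only a placeholder for the user's current code page. The list does not say what happens when an XML or HTML file declares no charset, so the sketch returns `None` in that case.

```python
import codecs
import re

# Hypothetical helpers; simplified stand-ins for the rules described above.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def charset_from_bom(data):
    """Return the charset implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def charset_from_xml(data):
    """Read the encoding attribute of the XML declaration, if present."""
    m = re.match(rb'\s*<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return m.group(1).decode("ascii").lower() if m else None


def charset_from_html(data):
    """Read the charset from a meta element, if present."""
    m = re.search(rb'<meta[^>]*charset=["\']?([A-Za-z0-9._-]+)', data, re.IGNORECASE)
    return m.group(1).decode("ascii").lower() if m else None


def detect_charset(data, file_format, override=None, code_page="cp1252"):
    """Pick a charset for raw bytes; an explicit override always wins."""
    if override:
        return override
    if file_format == "csv":
        return charset_from_bom(data) or code_page
    if file_format == "xml":
        return charset_from_xml(data)
    if file_format == "html":
        return charset_from_html(data)
    if file_format == "json":
        return "utf-8"
    raise ValueError(f"unsupported format: {file_format!r}")


sample = b'<?xml version="1.0" encoding="ISO-8859-1"?><root/>'
print(detect_charset(sample, "xml"))
print(detect_charset(sample, "xml", override="utf-8"))
```

Putting the override check first mirrors the Charset properties above: when one is set, nothing in the file is consulted.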
When any of these files is imported from the internet, the HTTP response headers carry a charset specification too, which xlsgen reads and passes along. The custom charset setting, however, always overrides everything else.
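The precedence can be sketched in Python as below. Only the first rule, that the custom setting wins, comes from the text above; the relative order of the HTTP header and the in-file declaration, as well as the final fallback, are assumptions made for this illustration. Both function names are hypothetical. The header parsing relies on the standard library's `email.message.Message`.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Extract the charset parameter of a Content-Type header value, if any."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def resolve_charset(custom=None, content_type=None, in_file=None, fallback="utf-8"):
    """Choose the charset to decode with.

    The custom setting always wins. Ranking the HTTP header above the
    in-file declaration, and the fallback value, are assumptions.
    """
    if custom:
        return custom
    http_charset = charset_from_content_type(content_type) if content_type else None
    return http_charset or in_file or fallback


print(resolve_charset(custom="iso-8859-1",
                      content_type="text/html; charset=utf-8",
                      in_file="windows-1252"))
```

In this example the header and the file disagree, yet the custom value is returned unchanged.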