Developer article : How to use source code to generate Excel spreadsheets

 

I thought it was about time to write a little article on Excel, since there are interesting things to say even a decade after it's been released. Given how much and many people use it these days, if the sheer amount of people posting Excel topics every day in newsgroup is any sign of actual usage at work, then if you are running a business around Excel, or if you are looking for help on an Excel task, you are very likely to find answers.

So what is this article about? This article is about a few links on how to best use coding tools to generate Excel workbooks. You know you can use xlsgen since it has a rather rich API, scales much better than Excel automation, lets you do obtain code from existing Excel workbooks, let's you use templates and so on, but then there are several use cases where other tools or techniques will get your work done a different way, sometimes very very efficicently and with no effort.

So among the details to take into account are :

  • do you need the Excel workbooks to be generated with the native .xls file format, or does this is not a problem for you? Depending on your answer, there is a deployment issue since file formats like html or xml can be imported in Excel with Excel XP or above, but not in previous releases, putting the burden on end-users.
  • how fast do you need those Excel workbooks to be generated?
  • do you need a server-side generation of these?
  • what is your programming language of choice? ASP.NET, C#/VB.NET, VBA, VB, C++ or none of these?
  • do you want interactions? (Active documents, OWC component)
  • do you want database access? ADO, ODBC
  • do you need debugging? Excel VBA, VSTO
  • do you need direct support for the new Excel 2007 file formats?

 

Native support for the Excel file format (.XLS)

Whatever third-party component you rely on, it must mention whether or not it produces real .xls files in the native file format containing the OLE-based BIFF (Binary Interchange File Format) known by Excel. If that is the case, you will be able to open those workbooks using Excel 95 (may be), Excel 97, Excel 2000, Excel XP, Excel 2003 and Excel 2007. In other words, if a component produces Excel workbooks using the native file format, then you can rest assured and send your documents to your colleagues, partners and so on regardless the Office version they are using.

MS has removed the documentation for that file format from MSDN library a long time ago already.

xlsgen does exactly this. Although xlsgen does provide a comprehensive support for cells, formatting, templates and the like, you might only need raw cell generation, and for that matter you could even find valuable code out there, for instance on the online codeproject developer site there is a tiny C++ code that will create a simple cell-based Excel workbook.

Here is a quick price-ordered list of third parties producing native Excel files (.XLS), ordered by price. (if a 3rd party feels he's factually misrepresented below, then please let me know).

NameDeveloper license priceOEM license priceSupport priceCalculation engineExcel 2007 support(*)Comment
HSSF POI APIfreefreefreenonoJava only. Open source project (use it at your own risks). Lacks a calculation engine. Lacks rich Excel objects such as charts, pictures, outline, ...
JExcel APIfreefreefreenonoJava only. Open source project (use it at your own risks). Lacks a calculation engine. Lacks rich Excel objects such as charts.
NikaSoft NativeExcel50$ (150$ site)freefreeyesnoDelphi only. Lacks rich Excel objects such as charts.
xlsgen75 euros250 eurosfreeyesyesOver 10 programming languages (C, C++ 6.0/7.x, VB, VB.NET, C#, Java, Delphi, Perl, Python, ...). Requires fulltrust. Provided with an automatic code generation tool to avoid having to learn the object model.
ActiveXLS295$ (1,195$ for server license)2,850$148$/yearnono.NET only. Requires fulltrust. A separate product package for Java is available (295$ dev license, 1,195$ server license).
SoftArtisans ExcelWriter299$ (> 1,495$ for server license)not availableadditional 15% of product pricenono.NET only. Requires fulltrust. Lack of OEM license means it cannot be deployed with your own products.
Gembox Spreadsheet375$freefreenono.NET only. Requires fulltrust. Does not support a number of rich Excel objects such as charts, ... Rudimentary function support.
Syncfusion XlsIO495$+495$free100$/yearyesno.NET only. Requires fulltrust. Consists in two packages that work together to provide generation (XlsIO) and calculation (Calculate).
Aspose Cells599$7,188$one year of support includedyesno.NET only. A separate product package Aspose Cells for Java is available (599$ dev license, 7,188$ OEM license) but is not on a feature par with the .NET implementation. Does not require fulltrust (OLE Storage reimplemented). Aspose features a number of design-time controls.
SpreadsheetGear999$ (+499$ for upgrade/ year)freeone year included, then 499$/year.yesno.NET only. Requires fulltrust. Does not support rich Excel objects such as charts, ...In its latest incarnation, SpreadsheetGear features a Excel-like datagrid winforms control, which should actually be sold as a separate product since it does not contribute to the main component.

(*) Excel 2007 support means native new ZIP-based Excel 2007 XML file format generation.

If we break down the 3rd parties based on the programming language that can be used :

languageCC++ 6.0/7.xVB.NETJavaDelphiOther COM compatible languages (Perl, python, PHP, ...)
 xlsgenxlsgenxlsgenxlsgen, Gembox Spreadsheet, SoftArtisans ExcelWriter, Syncfusion XlsIO, Spreadsheetgear, ActiveXLS, Aspose Cellsxlsgen, HSSF POI, JExcel API, ActiveXLS, Aspose Cellsxlsgen, NikaSoft NativeExcelxlsgen, SoftArtisans ExcelWriter (COM), ActiveXLS

 

Other file formats compatible with Excel

It is very easy to produce an html page, name it something like document.xls, and then persuade Excel to read it as a regular Excel workbook although it's really html.

This requires Excel XP and above to open such workbook. If you don't care the deployment issue, then you can generate html with some specific mso syntax (microsoft object drawing file format), and open the resulting documents using Excel.

This raises a few questions : 1) how do I get info about this mso syntax? 2) what are the tools that would help me to do just that?

Answer to question 1) is contained in the Office html file format documentation. here is a version (compressed html file, 627kb). Those documents are really reference documents, they are hard to read and understand and are not meant to explain what are the objects for. For that matter, you may need to buy one or more how-to Excel related books.

Answer to question 2) there are implementations that take advantage of that html. For instance, this ASP.NET implementation can leverage html generation on the server side, and use a MIME content type (application/vnd.ms-excel) trick to allow the client to recognize the generated content as a workbook that Excel will try to open.

Then you might want to give a try to the latest xml schemas introduced by Microsoft in Excel 2003, namely SpreadsheetML. SpreadsheetML builds on the previous html+mso. You can download the reference documentation for SpreadsheetML here (MS site), and here is an example of SpreadsheetML markup code. You can distinguish SpreadsheetML markup from others by just looking up the namespace for it : urn:schemas-microsoft-com:office:spreadsheet.

The schema is complex enough to get you lost, and it also has a strong limitation : it only supports cells and a few cell-related objects like lists. It does not support charts for instance. If you are into xml however, you might want to give it a try, here is an article.

Here are generation tools based on that schema : 1) Carlos Excel Writer in C# (you can decompile the assembly using a tool like Reflector), and it's free. 2) PHP writer (the "native XLS binary" title in the webpage is not accurate), it's not free. 3) Java source code to generate Excel Spreadsheet ML.

 

Fast generation

If you need a fast generation of Excel content, do not use the built-in automation Excel object model. Whether you end up using VBA, C++, or whatever language supporting automation, all calls to the object model have to cross the process boundaries (Excel is a separate process) and this alone is a degradation factor of 10 of general performance. And then there is more to that, the Excel object model is not designed for cell-intensive calls, neither is reliable for such scenarios. There are numerous Microsoft knowledge base articles explaining it's better to avoid using Excel automation at all for intensive scenarios.

If you need a mission-critical generator, just give a try to xlsgen. xlsgen does not live in a separate process, and is designed for intensive cell formatting and generation.

 

Excel interactions?

There is an Excel ActiveX that will let you embed an Excel workbook right within a web page, it's called Office web components (OWC). If you Google OWC9, OWC10 and OWC11 then you'll find the ActiveX component compatible with Excel 2000, Excel XP and Excel 2003. Office web components show formatted cells, charts, pivot tables and pivot charts. Note that embedded VBA macros won't be executed. Here is an example using OWC11 in ASP.NET C#.

There is more to it, Microsoft originally implemented OLE container support for Excel workbooks, which means you can open an Excel workbook in Internet Explorer, Word, or any other OLE container compatible application and this will produce the following effect : some or all of the client area of the application is replaced by the Excel grid (if that's Internet Explorer, that's all of the client area); the menu bar is the Excel menu bar ; right-click shows up the Excel context menu ; you cannot save the document (essentially originating from the fact that the Excel document container works on a basis of a temporary file).

 

Excel and databases

Oddly enough, it is perfectly possible to read or write Excel from databases like MS Access, Sql Server and so on. Of course, this will work in scenarios where you are interested in cells, not in charts or other rich objects. Yet, this provides a rather efficient tunnelling of data that avoids translation layers, prone to errors by definition. Legacy database layers like DAO allow that, as more recent layers like ODBC, OLEDB and ADO do. Here is an example of code using C++ and ODBC. Here is an example of code using C# and OLEDB, and here is another one. What those pieces of code have in common is the fact that you are using regular SQL statements to populate Excel workbooks back and forth. Quite unnatural from an Excel perspective, but it works!

 

Excel debugging?

The most obvious way to debug your Excel code is to use an automation-enabled language. This allows to put breakpoint in some debugger, progress on a step-by-step basis, watch the variables and so on. If you write VBA code, you can use the built-in Excel VBE editor for that matter.

MS came up with VSTO (Visual Studio Tools for Office), an add-on to VisualStudio.NET that leverages the support of Active documents to allow users/developers to see and program against Excel workbooks right within Visual Studio. VSTO however uses an interop layer (Primary interop assemblies) to deal with Excel and this is for that reason not the most suitable way to write code due to lack of clarity and concision. VSTO however allows any .NET language to leverage the Excel object model, well pretty much like automation-based languages can so far.

 

Excel and the "too many different cell formats" problem

Excel has a longstanding bug that Microsoft never bothered to fix, it's a hardcoded limit of how many formatting styles one can use in a workbook.


Microsoft® Excel cannot hold too many formatting styles.

Fortunately, there is a third-party tool, the xlsgen style reduction tool, that fixes the problem by 1) removing duplicate formatting styles 2) compressing similar formatting styles at an arbitrary compression level making a rich workbook editable again.

 

Excel 2007 and the new XML-based file format

The new Excel release from Microsoft coming in the 2007 timeframe comes with important changes in regards to the file formats. A number of new ones are introduced and are the defaults. The release comes among an overhaul of file formats not just for Excel 2007 but also Word 2007, Powerpoint 2007 and Xml Paper Specification (equivalent to Adobe PDF) based documents.

Word 2007 File TypeExtension
Word 2007 XML Document.docx
Word 2007 XML Macro-Enabled Document.docm
Word 2007 XML Template.dotx
Word 2007 XML Macro-Enabled Template.dotm

Excel 2007 File TypeExtension
Excel 2007 XML Workbook.xlsx
Excel 2007 XML Macro-Enabled Workbook.xlsm
Excel 2007 XML Template.xltx
Excel 2007 XML Macro-Enabled Template.xltm
Excel 2007 Binary Workbook.xlsb
Excel 2007 XML Macro-Enabled Add-In.xlam

PowerPoint 2007 File TypeExtension
PowerPoint 2007 XML Presentation.pptx
PowerPoint 2007 Macro-Enabled XML Presentation.pptm
PowerPoint 2007 XML Template.potx
PowerPoint 2007 Macro-Enabled XML Template.potm
PowerPoint 2007 Macro-Enabled XML Add-In.ppam
PowerPoint 2007 XML Show.ppsx
PowerPoint 2007 Macro-Enabled XML Show.ppsm

XPS File TypeExtension
Xml paper specification.xps

Each of those files are actually ZIP files. You can rename one manually and then see the parts it's made of. A number of those parts use XML. But Microsoft does not provide tools to manipulate those files. In particular, if you are intending to write against those new file formats, hoping that this will be a good return in investment over time, you may want to take a look at the diffopc+ tool which 1) automatically creates diff views from such files, a unique and critical feature if you want to create/manipulate/write against those files without having to learn the large spec (6000+ pages) and 2) lets one navigate parts and relationships by clicking links and 3) provide a visual cue from a validation layer and 4) provide extra navigational links for Excel 2007 spreadsheets in order to quickly navigate shared strings, styles, external references and so on. Here is a screen capture of a diff view produced by the tool :


diffopc+ creates diff views from arbitrary Office 2007 and XPS file pairs
It also creates navigational links in parts and relationships

 

Resources on the net

 

 

Stéphane Rodriguez, Jan 29 2005. Last update : Nov 27 2006.

 


Home
Blog