Developer article : How to use source code to generate Excel spreadsheets
I thought it was about time to write a little article on Excel, since there are interesting things to say even a decade after it's been released. Given how much and many people use it these days, if the sheer amount of people posting Excel topics every day in newsgroup is any sign of actual usage at work, then if you are running a business around Excel, or if you are looking for help on an Excel task, you are very likely to find answers. So what is this article about? This article is about a few links on how to best use coding tools to generate Excel workbooks. You know you can use xlsgen since it has a rather rich API, scales much better than Excel automation, lets you do obtain code from existing Excel workbooks, let's you use templates and so on, but then there are several use cases where other tools or techniques will get your work done a different way, sometimes very very efficicently and with no effort. So among the details to take into account are :
Native support for the Excel file format (.XLS)Whatever third-party component you rely on, it must mention whether or not it produces real .xls files in the native file format containing the OLE-based BIFF (Binary Interchange File Format) known by Excel. If that is the case, you will be able to open those workbooks using Excel 95 (may be), Excel 97, Excel 2000, Excel XP, Excel 2003 and Excel 2007. In other words, if a component produces Excel workbooks using the native file format, then you can rest assured and send your documents to your colleagues, partners and so on regardless the Office version they are using. MS has removed the documentation for that file format from MSDN library a long time ago already. xlsgen does exactly this. Although xlsgen does provide a comprehensive support for cells, formatting, templates and the like, you might only need raw cell generation, and for that matter you could even find valuable code out there, for instance on the online codeproject developer site there is a tiny C++ code that will create a simple cell-based Excel workbook. Here is a quick price-ordered list of third parties producing native Excel files (.XLS), ordered by price. (if a 3rd party feels he's factually misrepresented below, then please let me know).
(*) Excel 2007 support means native new ZIP-based Excel 2007 XML file format generation. If we break down the 3rd parties based on the programming language that can be used :
Other file formats compatible with ExcelIt is very easy to produce an html page, name it something like This requires Excel XP and above to open such workbook. If you don't care the deployment issue, then you can generate html with some specific mso syntax (microsoft object drawing file format), and open the resulting documents using Excel. This raises a few questions : 1) how do I get info about this mso syntax? 2) what are the tools that would help me to do just that? Answer to question 1) is contained in the Office html file format documentation. here is a version (compressed html file, 627kb). Those documents are really reference documents, they are hard to read and understand and are not meant to explain what are the objects for. For that matter, you may need to buy one or more how-to Excel related books. Answer to question 2) there are implementations that take advantage of that html. For instance, this ASP.NET implementation can leverage html generation on the server side, and use a MIME content type ( Then you might want to give a try to the latest xml schemas introduced by Microsoft in Excel 2003, namely SpreadsheetML. SpreadsheetML builds on the previous html+mso. You can download the reference documentation for SpreadsheetML here (MS site), and here is an example of SpreadsheetML markup code. You can distinguish SpreadsheetML markup from others by just looking up the namespace for it : The schema is complex enough to get you lost, and it also has a strong limitation : it only supports cells and a few cell-related objects like lists. It does not support charts for instance. If you are into xml however, you might want to give it a try, here is an article. Here are generation tools based on that schema : 1) Carlos Excel Writer in C# (you can decompile the assembly using a tool like Reflector), and it's free. 2) PHP writer (the "native XLS binary" title in the webpage is not accurate), it's not free. 3) Java source code to generate Excel Spreadsheet ML.
Fast generationIf you need a fast generation of Excel content, do not use the built-in automation Excel object model. Whether you end up using VBA, C++, or whatever language supporting automation, all calls to the object model have to cross the process boundaries (Excel is a separate process) and this alone is a degradation factor of 10 of general performance. And then there is more to that, the Excel object model is not designed for cell-intensive calls, neither is reliable for such scenarios. There are numerous Microsoft knowledge base articles explaining it's better to avoid using Excel automation at all for intensive scenarios. If you need a mission-critical generator, just give a try to xlsgen. xlsgen does not live in a separate process, and is designed for intensive cell formatting and generation.
Excel interactions?There is an Excel ActiveX that will let you embed an Excel workbook right within a web page, it's called Office web components (OWC). If you Google OWC9, OWC10 and OWC11 then you'll find the ActiveX component compatible with Excel 2000, Excel XP and Excel 2003. Office web components show formatted cells, charts, pivot tables and pivot charts. Note that embedded VBA macros won't be executed. Here is an example using OWC11 in ASP.NET C#. There is more to it, Microsoft originally implemented OLE container support for Excel workbooks, which means you can open an Excel workbook in Internet Explorer, Word, or any other OLE container compatible application and this will produce the following effect : some or all of the client area of the application is replaced by the Excel grid (if that's Internet Explorer, that's all of the client area); the menu bar is the Excel menu bar ; right-click shows up the Excel context menu ; you cannot save the document (essentially originating from the fact that the Excel document container works on a basis of a temporary file).
Excel and databasesOddly enough, it is perfectly possible to read or write Excel from databases like MS Access, Sql Server and so on. Of course, this will work in scenarios where you are interested in cells, not in charts or other rich objects. Yet, this provides a rather efficient tunnelling of data that avoids translation layers, prone to errors by definition. Legacy database layers like DAO allow that, as more recent layers like ODBC, OLEDB and ADO do. Here is an example of code using C++ and ODBC. Here is an example of code using C# and OLEDB, and here is another one. What those pieces of code have in common is the fact that you are using regular SQL statements to populate Excel workbooks back and forth. Quite unnatural from an Excel perspective, but it works!
Excel debugging?The most obvious way to debug your Excel code is to use an automation-enabled language. This allows to put breakpoint in some debugger, progress on a step-by-step basis, watch the variables and so on. If you write VBA code, you can use the built-in Excel VBE editor for that matter. MS came up with VSTO (Visual Studio Tools for Office), an add-on to VisualStudio.NET that leverages the support of Active documents to allow users/developers to see and program against Excel workbooks right within Visual Studio. VSTO however uses an interop layer (Primary interop assemblies) to deal with Excel and this is for that reason not the most suitable way to write code due to lack of clarity and concision. VSTO however allows any .NET language to leverage the Excel object model, well pretty much like automation-based languages can so far.
Excel and the "too many different cell formats" problemExcel has a longstanding bug that Microsoft never bothered to fix, it's a hardcoded limit of how many formatting styles one can use in a workbook.
Fortunately, there is a third-party tool, the xlsgen style reduction tool, that fixes the problem by 1) removing duplicate formatting styles 2) compressing similar formatting styles at an arbitrary compression level making a rich workbook editable again.
Excel 2007 and the new XML-based file formatThe new Excel release from Microsoft coming in the 2007 timeframe comes with important changes in regards to the file formats. A number of new ones are introduced and are the defaults. The release comes among an overhaul of file formats not just for Excel 2007 but also Word 2007, Powerpoint 2007 and Xml Paper Specification (equivalent to Adobe PDF) based documents.
Each of those files are actually ZIP files. You can rename one manually and then see the parts it's made of. A number of those parts use XML. But Microsoft does not provide tools to manipulate those files. In particular, if you are intending to write against those new file formats, hoping that this will be a good return in investment over time, you may want to take a look at the diffopc+ tool which 1) automatically creates diff views from such files, a unique and critical feature if you want to create/manipulate/write against those files without having to learn the large spec (6000+ pages) and 2) lets one navigate parts and relationships by clicking links and 3) provide a visual cue from a validation layer and 4) provide extra navigational links for Excel 2007 spreadsheets in order to quickly navigate shared strings, styles, external references and so on. Here is a screen capture of a diff view produced by the tool : ![]() diffopc+ creates diff views from arbitrary Office 2007 and XPS file pairs It also creates navigational links in parts and relationships
Resources on the net
Stéphane Rodriguez, Jan 29 2005. Last update : Nov 27 2006.
|
Home Blog |