Automated IE SaveAs Mhtml

The purpose of this article is to show how to automate the full-fledged Save As HTML feature of Internet Explorer (IE), which is normally out of reach for developers using the Internet Explorer API.

Saving the current document in MHTML format is just one of the available options, which include:
  • Save As Mhtml (whole web page, images, ... in a single file)
  • Save As Full Html (additional folder for images, ...)
  • Save Html code only
  • Save As Text

 

Saving silently as Html using the IE API

In fact, the ability to save the current web page without showing a single dialog box is already available to anyone working in C++, using the following code, but with an important restriction:
LPDISPATCH lpDispatch = NULL;
IPersistFile *lpPersistFile = NULL;

// m_ctrl is an instance of the Web Browser control
lpDispatch = m_ctrl.get_Document();
if (lpDispatch != NULL)
{
    // ask the document object for its IPersistFile interface
    HRESULT hr = lpDispatch->QueryInterface(IID_IPersistFile, (void**)&lpPersistFile);
    if (SUCCEEDED(hr))
    {
        // second parameter (fRemember) is FALSE : we only save a copy
        lpPersistFile->Save(L"c:\\htmlpage.html", FALSE);
        lpPersistFile->Release();
    }
    lpDispatch->Release();
}

Code listing: saving the HTML code only, without dialog boxes

The restriction is that this saves the HTML code only, not the complete web page. What is really interesting, of course, is to get full HTML archives, with images and so on.

Because there is no public or known way to ask for this feature without IE showing one or more dialog boxes, we hook the operating system to listen for all window creations, including dialog boxes. We then ask IE for the feature and override the file path in the dialog box without it being seen. Finally, we mimic the user clicking the Save button to validate the dialog box, and we unhook ourselves. That's it!

Hooking IE to SaveAs Html without popping the dialog boxes

 

That was the short workflow, but there are a few tricks along the way, and this article is an opportunity to go into the details. By the way, the code is derived from a Microsoft article about customizing IE printing by hooking the Print dialog boxes.

In our app, we have our own SaveAs feature :
m_wbSaveAs.Config( CString("c:\\htmlpage.mhtml"), SAVETYPE_ARCHIVE );
m_wbSaveAs.SaveAs();

where the second parameter is the type of Html needed :
typedef enum _SaveType
{
    SAVETYPE_HTMLPAGE = 0,
    SAVETYPE_ARCHIVE,
    SAVETYPE_HTMLONLY,
    SAVETYPE_TXTONLY
} SaveType;

We start the SaveAs() implementation by installing the Windows hook:
// prepare SaveAs Dialog hook
//
g_hHook = SetWindowsHookEx(WH_CBT, CbtProc, NULL, GetCurrentThreadId());
if (!g_hHook)
    return false;

// make SaveAs Dialog appear
//
// cmd = OLECMDID_SAVEAS (see ./include/docobj.h)
g_bSuccess = false;
g_pWebBrowserSaveAs = this;
HRESULT hr = m_pWebBrowser->ExecWB(OLECMDID_SAVEAS, OLECMDEXECOPT_PROMPTUSER, NULL, NULL);


// remove hook
UnhookWindowsHookEx(g_hHook);
g_pWebBrowserSaveAs = NULL;
g_hHook = NULL;
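
The hook has to be removed on every exit path, otherwise the CBT callback keeps firing for every window the thread creates. One way to guarantee this is a small scope guard. This is only a sketch: ScopedHook is a hypothetical class, not part of the provided source code, and it is written against generic callables so that it compiles without the Windows headers.

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII wrapper: calls `install` once in the constructor and,
// if that returned a non-null handle, calls `uninstall` in the destructor.
template <typename Handle>
class ScopedHook
{
public:
    ScopedHook(std::function<Handle()> install,
               std::function<void(Handle)> uninstall)
        : m_handle(install()), m_uninstall(std::move(uninstall)) {}

    ~ScopedHook()
    {
        if (installed() && m_uninstall)
            m_uninstall(m_handle);
    }

    // a hook handle has a single owner
    ScopedHook(const ScopedHook&) = delete;
    ScopedHook& operator=(const ScopedHook&) = delete;

    bool installed() const { return m_handle != Handle(); }
    Handle get() const { return m_handle; }

private:
    Handle m_handle;
    std::function<void(Handle)> m_uninstall;
};

// Assumed usage on Windows (not compiled here):
//   ScopedHook<HHOOK> hook(
//       [] { return SetWindowsHookEx(WH_CBT, CbtProc, NULL, GetCurrentThreadId()); },
//       [](HHOOK h) { UnhookWindowsHookEx(h); });
//   if (!hook.installed())
//       return false;
```

With such a guard, an early return after the ExecWB call would still unhook.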

The hook callback procedure is where the hard work happens, see for yourself:
LRESULT CALLBACK CSaveAsWebbrowser::CbtProc(int nCode, WPARAM wParam, LPARAM lParam) 
{  
    // the windows hook is notified for each new window being created :
    // - HCBT_CREATEWND : when the window is about to be created
    //      we check whether it is a dialog box (class atom = 0x00008002, see Spy++)
    //      and move it off screen, as it is likely to be the IE SaveAs dialog
    // - HCBT_ACTIVATE : when the window itself gets activated
    //      we start a separate thread, and let IE do its own init steps in the meantime
    switch (nCode)
    {
        case HCBT_CREATEWND:
        {
            HWND hWnd = (HWND)wParam;
            LPCBT_CREATEWND pcbt = (LPCBT_CREATEWND)lParam;
            LPCREATESTRUCT pcs = pcbt->lpcs;
            if ((ULONG_PTR)pcs->lpszClass == 0x00008002)  // pointer-sized cast, also safe on 64-bit
            {
                g_hWnd = hWnd;          // Get hwnd of SaveAs dialog
                pcs->x = -2 * pcs->cx;  // Move dialog off screen
            }
            break;
        }	
        case HCBT_ACTIVATE:
        {
            HWND hwnd = (HWND)wParam;
            if (hwnd == g_hWnd)
            {
                g_hWnd = NULL;
                g_bSuccess = true;

                if (g_pWebBrowserSaveAs->IsSaveAsEnabled())
                {
                    g_pWebBrowserSaveAs->SaveAsDisable();

                    CSaveAsThread *newthread = new CSaveAsThread();
                    newthread->SetKeyWnd(hwnd);
                    newthread->Config( g_pWebBrowserSaveAs->GetFilename(), 
                                       g_pWebBrowserSaveAs->GetSaveAsType() );
                    newthread->StartThread();
                }
            }
            break;
        }
    }
    return CallNextHookEx(g_hHook, nCode, wParam, lParam); 
} 
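
The class test in HCBT_CREATEWND works because lpszClass holds either a pointer to a class name or a class atom stored in the low-order word with all upper bits set to zero. The predefined dialog box class has the atom 0x8002 (shown as #32770 in Spy++). Below is a minimal sketch of that test. IsAtom() and IsDialogClass() are hypothetical helpers, and std::uintptr_t stands in for ULONG_PTR so that the snippet compiles without the Windows headers.

```cpp
#include <cstdint>

// Atom of the predefined dialog box class (#32770 == 0x8002).
constexpr std::uintptr_t kDialogClassAtom = 0x8002;

// True when the value is an atom rather than a string pointer:
// every bit above the low 16 is zero (the test IS_INTRESOURCE performs).
inline bool IsAtom(const void* lpszClass)
{
    return (reinterpret_cast<std::uintptr_t>(lpszClass) >> 16) == 0;
}

// True when the window being created uses the standard dialog class.
inline bool IsDialogClass(const void* lpszClass)
{
    return IsAtom(lpszClass) &&
           reinterpret_cast<std::uintptr_t>(lpszClass) == kDialogClassAtom;
}
```

Comparing the full pointer-sized value, rather than a value truncated to 32 bits, avoids a false match on a 64-bit string pointer whose low bits happen to be 0x8002.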

In our thread, we wait until the IE SaveAs dialog is ready, with its fields filled in:
switch(	::WaitForSingleObject( m_hComponentReadyEvent, m_WaitTime) )
{
     ...
     if ( ::IsWindowVisible(m_keyhwnd) )
     {
         bSignaled = TRUE;
         bContinue = FALSE;
     }

     MSG msg ;
     while( PeekMessage(&msg, NULL, 0, 0, PM_REMOVE) )
     {
         if (msg.message == WM_QUIT)
         {
              bContinue = FALSE ;
              break ;
         }
         TranslateMessage(&msg);
         DispatchMessage(&msg);
     }
     ...
}

// relaunch our SaveAs class, but now everything is ready to play with
if (bSignaled)
{
    CSaveAsWebbrowser surrenderNow;
    surrenderNow.Config( GetFilename(), GetSaveAsType() );
    surrenderNow.UpdateSaveAs( m_keyhwnd );
}

// kill the thread, we don't care anymore about it
delete this;
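
The idea behind the wait loop is to poll a readiness condition until it holds or a timeout expires. The sketch below shows that idea in portable C++. Note that it swaps in a plain sleep between polls, whereas the thread above waits on an event and pumps messages. WaitUntil() is a hypothetical helper, not part of the provided source code.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `ready` every `interval` until it returns true
// or `timeout` has elapsed. Returns true if the condition was met in time.
inline bool WaitUntil(const std::function<bool()>& ready,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds interval)
{
    assert(ready);  // an empty callable would throw when invoked
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;)
    {
        if (ready())
            return true;
        if (std::chrono::steady_clock::now() >= deadline)
            return false;
        std::this_thread::sleep_for(interval);
    }
}

// Assumed usage on Windows (not compiled here):
//   bool bSignaled = WaitUntil([hwnd] { return ::IsWindowVisible(hwnd) != FALSE; },
//                              std::chrono::milliseconds(5000),
//                              std::chrono::milliseconds(50));
```

The 5000 ms and 50 ms values in the usage comment are placeholders, not values taken from the article.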

We can now override the appropriate data:
void CSaveAsWebbrowser::UpdateSaveAs(HWND hwnd)
{
    // editbox : filepath (control id = 0x047c)
    // dropdown combo : filetypes (options=complete page;archive;html only;txt) (control id = 0x0470)
    // save button : control id = 0x0001
    // cancel button : control id = 0x0002


    // select right item in the combobox
    SendMessage(GetDlgItem(hwnd, 0x0470), CB_SETCURSEL, (WPARAM) m_nSaveType, 0);
    SendMessage(hwnd, WM_COMMAND, MAKEWPARAM(0x0470,CBN_CLOSEUP), (LPARAM) GetDlgItem(hwnd, 0x0470));

    // set output filename
    SendMessage(GetDlgItem(hwnd, 0x047c), WM_SETTEXT, (WPARAM) 0, (LPARAM)(LPCTSTR)m_szFilename);

    SendMessage(GetDlgItem(hwnd, 0x0001), BM_CLICK, 0, 0);  // Invoke Save button
}

In the code above, it is worth noting that to select the kind of HTML we want (full HTML, archive, code only, or text format), we not only select the matching entry in the combo box, we also send the dialog a combo-box CloseUp notification (CBN_CLOSEUP), because that is the notification IE listens to in order to know which kind of HTML we want. This behavior was found by trial and error.
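
To see what the dialog actually receives with that notification, it helps to unpack the WM_COMMAND parameter: MAKEWPARAM puts the control ID in the low-order word and the notification code in the high-order word. The following is a portable sketch of that packing. MakeWParam(), LoWord() and HiWord() are stand-ins written for illustration, the constant names are hypothetical, and 8 is the value of CBN_CLOSEUP in winuser.h.

```cpp
#include <cstdint>

// Control ID of the file-type combo box, as given in the article.
constexpr std::uint16_t kFileTypeComboId = 0x0470;
// Value of CBN_CLOSEUP in winuser.h.
constexpr std::uint16_t kCbnCloseUp = 8;

// Portable stand-in for MAKEWPARAM(low, high).
constexpr std::uint32_t MakeWParam(std::uint16_t low, std::uint16_t high)
{
    return static_cast<std::uint32_t>(low) |
           (static_cast<std::uint32_t>(high) << 16);
}

// Portable stand-ins for LOWORD and HIWORD.
constexpr std::uint16_t LoWord(std::uint32_t v)
{
    return static_cast<std::uint16_t>(v & 0xFFFFu);
}
constexpr std::uint16_t HiWord(std::uint32_t v)
{
    return static_cast<std::uint16_t>((v >> 16) & 0xFFFFu);
}

// The wParam sent along with WM_COMMAND: code 8 from control 0x0470.
static_assert(MakeWParam(kFileTypeComboId, kCbnCloseUp) == 0x00080470u,
              "notification code in the high word, control ID in the low word");
```

The lParam of the same message carries the window handle of the combo box, which is why the article passes GetDlgItem(hwnd, 0x0470) there.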


Conclusion

This article describes a technique to gain access to the full-fledged Save As HTML feature exposed by IE. I have never seen an article about this topic on the net, even though it is clearly a compelling feature for developers building web applications.

Files you may use from the source code provided :
  • SaveAsWebBrowser.h, .cpp : hook procedure, fills in the dialog box data
  • SaveAsThread.h, .cpp : auxiliary thread for synchronization with IE
The application is just a simple MFC-based CHtmlView application embedding the web browser control.


Stephane Rodriguez
September 1, 2002.
