Problems when uploading to a server.

Dec 1, 2007 at 4:38 PM
I see that several people have mentioned uploading the exported web pages to a SharePoint server. I'm trying to upload them to an ordinary web server.
I've exported the pages, and they run as expected from my memory stick (in both IE and Firefox with the IE Tab extension).
Once uploaded, some of them are garbled, and I can't quite work out why.


Some of the pages that work are single sections with no extra pages; some aren't. Some were created in 2003 and then migrated to 2007; some weren't. And so on. I can't find anything that all the working pages have in common, nor anything shared by all the broken ones.
It must be something specific, though: I've tried re-exporting, and the same pages work or fail every time.

As I say, from the memory stick - no problem. It's the uploading that's doing something.

Does anyone have any ideas?
Feb 25, 2008 at 10:44 PM
Emmadw: I had the same problem as you.
I did some experimentation on this.
The issue appears to center around the .mht content.

I set up Apache HTTP Server 2.2.8 on Windows XP and served the files to myself. For best .mht compatibility I stuck with IE6 SP2 as the test browser. Testing locally meant I didn't have to worry about whether the FTP transfer was at fault; it's much simpler, though maybe not as controlled.

Displays well:
file://C:\Documents and Settings\Ross\My Documents\OneNote Notebooks\Zoology\_onefiles\0\page0.mht

Does not display well:
(looks very similar in my browser to your pages, Emmadw)

These are the same files, first accessed through the file system and then through my local Apache HTTP server. When accessed over HTTP, the formatting goes out the window, stray characters appear, and some of the file's raw source is visible in the browser.

Ex:
MIME-Version: 1.0
Content-Location: file:///C:/D13228D0/page0.htm
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="utf-8"
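The "Content-Transfer-Encoding: quoted-printable" header may account for some of the stray characters: in that encoding "=" is written as "=3D", non-ASCII bytes become "=XX" escapes, and long lines end in a soft break "=". If the browser shows the file as text instead of interpreting it as MHTML, those escapes are what you see. A minimal sketch using Python's standard quopri module to decode such a body; the sample string is made up for illustration.

```python
import quopri


def decode_quoted_printable(body: bytes, charset: str = "utf-8") -> str:
    """Decode a quoted-printable MIME body (as found in a .mht part) to text."""
    # quopri.decodestring turns "=XX" escapes back into raw bytes and
    # removes soft line breaks ("=" at the very end of a line).
    return quopri.decodestring(body).decode(charset)


if __name__ == "__main__":
    # Hypothetical sample of what an encoded part looks like on disk.
    raw = b'<p class=3D"note">Caf=C3=A9 notes, a long line that wraps =\nhere</p>'
    print(decode_quoted_printable(raw))
```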

Other files, often ./#/page1.mht, ./#/page2.mht, etc., seem to work fine (but very rarely ./#/page0.mht).
I played around with the headers of various .mht files for a few hours with no result...
...until suddenly and inexplicably the "bad" pages (./0/page0.mht) appeared fine.
I'm not sure why.
I then noticed that when pages worked well, the title bar would say
"mhtml:http://localhost/_onefiles/0/page1.mht" instead of the plain http:// address.

So I started putting "mhtml:" in front of http:// on my local server. Every time I did, that specific .mht page would load fine, and I could no longer get it to load incorrectly.

Emmadw: This kind of works for you too.

However, it doesn't keep the formatting when I remove the mhtml: prefix.
Other notes:

My opinion: Internet Explorer probably isn't handling the .mht extension very well. Specifically, something is strange with the MIME types. I can't pin it down, and I don't know HTML or browsers well enough to understand what's going on. It would be nice to be able to force an "mhtml:"-like state remotely. I think that's what the .mht extension is supposed to do, but it doesn't.
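One way to check the MIME-type theory is to ask the server which Content-Type it actually sends for an .mht file, since the browser decides how to render a response partly from that header. A minimal sketch using Python's standard urllib.request; the function name is my own and the URL is a placeholder for one of your own uploaded pages.

```python
import urllib.request


def served_content_type(url: str, timeout: float = 10.0) -> str:
    """Return the Content-Type header a server sends for the given URL."""
    # A HEAD request fetches only the headers, not the (possibly large) .mht body.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # A missing header becomes "", so callers can always compare strings.
        return resp.headers.get("Content-Type", "")


if __name__ == "__main__":
    # Placeholder URL: point this at one of your own pages.
    print(served_content_type("http://localhost/_onefiles/0/page0.mht"))
```

If it reports something generic such as text/plain or application/octet-stream, the server has no specific type configured for .mht. Windows commonly associates .mht with message/rfc822, but whether IE6 needs exactly that type from a server is an assumption worth testing.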

I'd like to find a way to get rid of the MHTML format entirely and convert to plain HTML, keeping relative links intact.
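Since an .mht file is a MIME message, a first step toward plain HTML can be done with Python's standard email package. A minimal sketch, assuming the file is either a single text/html part or multipart/related as described in this thread: it writes each part to its own file, named after the last path segment of its Content-Location header (that naming rule and the function name are my own, hypothetical choices). It does not rewrite links inside the HTML, so relative links that point at other .mht files would still need fixing.

```python
import email
import os
from email import policy
from urllib.parse import urlparse


def extract_mht(mht_path: str, out_dir: str) -> list:
    """Write every non-multipart part of an .mht file to out_dir; return the file names."""
    with open(mht_path, "rb") as f:
        msg = email.message_from_binary_file(f, policy=policy.default)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    # walk() yields the message itself plus every nested part; skip containers.
    for i, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue
        location = str(part.get("Content-Location", ""))
        # Hypothetical naming rule: basename of Content-Location, else part<N>.
        name = os.path.basename(urlparse(location).path) or f"part{i}"
        # decode=True undoes the quoted-printable or base64 transfer encoding.
        data = part.get_payload(decode=True) or b""
        with open(os.path.join(out_dir, name), "wb") as out:
            out.write(data)
        written.append(name)
    return written
```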
Feb 26, 2008 at 1:58 AM
Edited Feb 26, 2008 at 2:52 AM
I spent some more time on this after taking a zoology exam.

Basically, after separating all the .mht files that render correctly from the ones that don't, I realized that all the ones that work have graphics in them. When the file has a graphic, it starts with the MIME header Content-Type: multipart/related. The file is then separated into several documents (the page itself, the image encoded as ASCII text, and an XML file list).

When there are graphics, the whole file is "multipart/related", with subsections of type "text/html", "image/png", and "text/xml". We don't really need all those subsections; just one works fine.
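The sorting step described above can be automated. A minimal sketch that reads the top-level headers of each .mht file with Python's standard email parser and groups the files by their top-level Content-Type, so single-part text/html files stand apart from multipart/related ones. The function names are my own, and "_onefiles" is a placeholder for the exported notebook folder.

```python
import glob
import os
from email.parser import BytesHeaderParser


def top_level_type(mht_path: str) -> str:
    """Return the top-level MIME type of an .mht file, e.g. 'multipart/related'."""
    with open(mht_path, "rb") as f:
        # Only the header block is parsed; the body is kept as one unparsed string.
        headers = BytesHeaderParser().parse(f)
    return headers.get_content_type()


def group_by_type(folder: str) -> dict:
    """Map each top-level MIME type to the .mht files under folder that use it."""
    groups = {}
    pattern = os.path.join(folder, "**", "*.mht")
    for path in sorted(glob.glob(pattern, recursive=True)):
        groups.setdefault(top_level_type(path), []).append(path)
    return groups


if __name__ == "__main__":
    # Placeholder path: the exported notebook's _onefiles folder.
    for mime_type, paths in group_by_type("_onefiles").items():
        print(mime_type, len(paths))
```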

With the non-working .mht files, we can add an encapsulation that makes the whole file a single text/html subsection of a multipart/related MHTML document. At the very beginning of the document, immediately after the first line "MIME-Version: 1.0", we can add this:

Content-Type: multipart/related; boundary="----=_NextPart_FU.BAR"

This document is a Single File Web Page, also known as a Web Archive file. If you are seeing this message, your browser or editor doesn't support Web Archive files. Please download a browser that supports Web Archive, such as Windows® Internet Explorer®.

------=_NextPart_FU.BAR


Where it says FU.BAR, put anything, but keep it identical in both/all places in that document. Also keep it unique from all the other documents.
Save it, and it will now be rendered correctly on the web.
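The same edit can be scripted. A minimal Python sketch, assuming the file starts with a "MIME-Version: 1.0" line followed directly by the single part's headers, as in the example headers earlier in this thread. It inserts the multipart/related Content-Type, the preamble, and an opening boundary line ("--" followed by the boundary) so the original headers become the headers of the one subsection, and it also appends a closing boundary line ("--" + boundary + "--"), which the MIME standard expects at the end. The function name and the use of a random UUID as the "FU.BAR" token are my own, hypothetical choices.

```python
import uuid

# Preamble text from the post above; MIME-aware readers ignore it.
PREAMBLE = (
    "This document is a Single File Web Page, also known as a Web Archive file. "
    "If you are seeing this message, your browser or editor doesn't support Web "
    "Archive files. Please download a browser that supports Web Archive, such as "
    "Windows\u00ae Internet Explorer\u00ae."
)


def wrap_single_part_mht(text: str, token: str = "") -> str:
    """Wrap a single-part .mht (one text/html body) in a multipart/related envelope."""
    first, _, rest = text.partition("\n")
    if not first.startswith("MIME-Version:"):
        raise ValueError("expected the file to start with a MIME-Version line")
    # Leave files that are already multipart untouched.
    for line in rest.splitlines():
        if not line.strip():
            break  # end of the top-level header block
        if line.lower().startswith("content-type: multipart/"):
            return text
    nl = "\r\n" if first.endswith("\r") else "\n"  # keep the file's line endings
    # The token can be anything that never occurs in the body; a random UUID
    # (hypothetical choice) keeps it different for every document.
    boundary = "----=_NextPart_" + (token or uuid.uuid4().hex)
    lines = [
        first.rstrip("\r"),
        f'Content-Type: multipart/related; boundary="{boundary}"',
        "",
        PREAMBLE,
        "",
        "--" + boundary,  # opening delimiter; the original part headers follow it
    ]
    # Body, then the closing delimiter ("--" + boundary + "--").
    return nl.join(lines) + nl + rest.rstrip("\r\n") + nl + "--" + boundary + "--" + nl
```

To apply it to a file, read it with open(path, encoding="utf-8", newline="") (newline="" keeps existing CRLF line endings intact), pass the text through wrap_single_part_mht, and write the result back the same way.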

Anyways, it should end up looking like this:
Yes, some of the text is still screwed up, but nothing becomes illegible, just less presentable. Also, some of the colors didn't make it to the web browser. (Upload over FTP should be done in binary mode to avoid this.)
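For the FTP step, Python's standard ftplib uploads in binary mode with storbinary, which sends the file's bytes unchanged; ASCII-mode transfers may convert line endings. A minimal sketch; the function name, host, credentials, and remote folder are placeholders.

```python
import ftplib
import os


def upload_binary(ftp: ftplib.FTP, local_path: str, remote_name: str = "") -> str:
    """Upload one file in binary mode; return the server's reply string."""
    remote_name = remote_name or os.path.basename(local_path)
    with open(local_path, "rb") as f:
        # storbinary switches the session to TYPE I (binary) before sending,
        # so the bytes of the .mht, including CR/LF, arrive unchanged.
        return ftp.storbinary(f"STOR {remote_name}", f)


if __name__ == "__main__":
    # Placeholder server details: replace with your own.
    with ftplib.FTP("ftp.example.com") as ftp:
        ftp.login("user", "password")
        ftp.cwd("_onefiles/0")
        print(upload_binary(ftp, "page0.mht"))
```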

I really, really, wish Microsoft would use open formats.
Mar 20, 2008 at 3:46 PM
Oh, thanks for this. I'll have a look - I can see that your files are working!

For now, I'll stick to saving them to a memory stick, as I want to go home & that's quicker than either editing files or shoving a picture on every page, but when I get back, I'll try that!