Article ID: 244085
This article was previously published under Q244085
This article has been archived. It is offered "as is" and will no longer be updated.
It may be desirable to parse HTML files inside a Web server process in response to a browser page request. However, the WebBrowser control, DHTML Editing Control, MSHTML, and other Internet Explorer components may not function properly in an Active Server Pages (ASP) page or other application run in a Web server application.
Internet Explorer and its associated components were not designed or tested for use within the constraints of the high-performance, secure user context of a Web server process.
Microsoft does not support the use of the WebBrowser control, DHTML Editing Control, DHTMLED, or MSHTML from inside a Web server (IIS) process. Applications experiencing problems with these components should be redesigned to use alternate technologies.
Most Web applications that attempt to programmatically parse HTML on the server have two steps they need to accomplish: retrieve the HTML from a remote server and parse the HTML.
Retrieving HTML from Server
When retrieving HTML from another server, Internet Explorer components should not be used. All of the mechanisms that are used in Internet Explorer or that use Internet Explorer components (the WebBrowser control, the Internet Transfer Control, and so on) ultimately rely on the services of a low-level client module called WININET to make requests to other Web servers. WININET is not supported in a server context and has a number of known performance problems in this environment. Thus, any server-side application that needs to parse HTML must either store all necessary HTML locally or use a lower-level networking component or technology, such as the WinSock API or the Visual Basic WinSock Control, to retrieve the HTML file before attempting to parse it. For additional information, see the following article in the Microsoft Knowledge Base:
238425 INFO: WinInet Not Supported for Use in Services (http://support.microsoft.com/kb/238425/EN-US/)
In general, downloading data from another Web server while handling a request adds a delay that is undesirable in a typical Web application. It is recommended that high-performance server applications use an alternative design to avoid this delay.
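The lower-level retrieval described in this section can be illustrated with a short sketch. The following example is written in Python purely for illustration and is not a Microsoft-supplied sample; it uses the standard socket module as a stand-in for the WinSock API. The function names (build_request, split_response, fetch) are hypothetical. The sketch assumes plain HTTP/1.0 over port 80 and does not handle HTTPS, redirects, or chunked responses.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes):
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    header_lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.0 200 OK".
    status = int(header_lines[0].split(" ", 2)[1])
    headers = {}
    for line in header_lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 5.0):
    """Retrieve a page over a plain TCP socket (assumption: HTTP only)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return split_response(b"".join(chunks))
```

A caller would invoke fetch with a host name and path and then pass the returned body to the parsing step. The explicit timeout matters in a server context, because a slow remote server would otherwise hold up the request that is waiting on it.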
Parsing HTML
Like retrieving HTML from other Web servers, although to a lesser extent, parsing HTML is a time-consuming operation. Web application developers should consider their design very carefully before creating an application that needs to parse HTML on a per-request basis.
Microsoft offers a number of components that developers can re-use to parse HTML in their own applications, either with a user interface (UI) for editing or with a simple parser without a user interface. The DHTML Editing Control is probably the best choice for this job.
However, none of the HTML parsing technologies offered by Microsoft today has been designed or tested to work in a high-performance server context. There may be a number of performance concerns, especially for high-use Web servers. Developers who experience problems with these technologies in these environments should consider writing custom HTML parsing code that is optimized for just the information that the application needs to retrieve from the HTML. This yields the best performance in any scenario.
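As a rough illustration of parsing code that extracts only what the application needs, the following Python sketch collects the page title and the link targets and ignores everything else. It is an assumption-laden example rather than a recommended implementation: it relies on the tokenizer in Python's standard html.parser module instead of a hand-written scanner, and the names LinkAndTitleExtractor and extract are hypothetical.

```python
from html.parser import HTMLParser


class LinkAndTitleExtractor(HTMLParser):
    """Collects only the <title> text and the href of each <a> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text outside <title> is discarded; no document tree is built.
        if self._in_title:
            self.title += data


def extract(html_text: str):
    """Return (title, list of link targets) for the given HTML text."""
    parser = LinkAndTitleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.links
```

Because the extractor keeps only two small pieces of state and never builds a document tree, its memory use stays low no matter how large the page is, which is the kind of trade-off that per-request parsing on a busy server calls for.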