PS Tip: Parsing HTML from a local File or a String

If you are familiar with the Invoke-WebRequest cmdlet, you know that it returns parsed HTML for the requested URL. The DOM structure of this parsed HTML can be used to access the HTML elements of the web page (see below).

$webRequest = Invoke-WebRequest ""

$webRequest.ParsedHTML.getElementsByTagName("span") | % textContent



What if the HTML is saved locally in a file, or is held in a string? Is there a way to parse it from a local file or a string?


The answer is yes.

Microsoft provides an HTML document class (mshtml.HTMLDocumentClass), which PowerShell can create through the "HTMLFile" COM object. It has a write() method, exposed as IHTMLDocument2_write(), that writes HTML text into the document. The parsed result can then be accessed through DOM 2 (Document Object Model Level 2).


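Solution 1: From a string

$html = New-Object -ComObject "HTMLFile"

# $htmlString is a placeholder for a variable that holds the HTML text
$html.IHTMLDocument2_write($htmlString)

$html.all.tags("A") | % innerText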
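
A self-contained sketch of the string case may help here. The HTML content and the variable name $source are made up for illustration, and the snippet assumes Windows PowerShell, where the "HTMLFile" COM object is available.

```powershell
# Hypothetical HTML content, used only for illustration.
$source = @'
<html>
  <body>
    <a href="https://example.com/one">First link</a>
    <a href="https://example.com/two">Second link</a>
  </body>
</html>
'@

# Create an empty HTML document through COM (Windows only).
$html = New-Object -ComObject "HTMLFile"

# Write the string into the document so that it is parsed into a DOM.
$html.IHTMLDocument2_write($source)

# List the text of every anchor element in the parsed document.
$html.all.tags("A") | ForEach-Object { $_.innerText }
```

With the sample markup above, this should list the text of the two links.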

Solution 2: From a file

Similarly, we can parse an HTML document from a local HTML file.

$html = New-Object -ComObject "HTMLFile"

$html.IHTMLDocument2_write($(Get-Content .\file.html -raw))

$html.all.tags("A") | % innerText
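
On some machines the IHTMLDocument2_write() method is reported to be unavailable on the COM object (this is often mentioned for systems without Microsoft Office installed). A commonly used fallback is to call write() with the text encoded as Unicode bytes. The following is a sketch under that assumption, not a guaranteed fix; .\file.html is the same example path as above, and $source is an illustrative variable name.

```powershell
# Read the whole file as a single string (example path from above).
$source = Get-Content .\file.html -Raw

$html = New-Object -ComObject "HTMLFile"

try {
    # Preferred call, as used in the solutions above.
    $html.IHTMLDocument2_write($source)
}
catch {
    # Assumed fallback: pass the markup to write() as UTF-16 (Unicode) bytes.
    $bytes = [System.Text.Encoding]::Unicode.GetBytes($source)
    $html.write($bytes)
}

# Query the parsed document exactly as before.
$html.all.tags("A") | ForEach-Object { $_.innerText }
```

The try/catch keeps the rest of the script identical regardless of which of the two calls succeeds.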



Even the parsed HTML returned by Invoke-WebRequest is of type HTMLDocumentClass:

$WR = Invoke-WebRequest ""

$WR.ParsedHtml.GetType().Name

Output is: HTMLDocumentClass



