Saturday, September 3, 2011

Extract URLs from a downloaded web page


Download: parse_http.zip

How to use parse_http.exe

parse_http.exe is a command-line utility that will help you extract URLs from downloaded webpages.

It does this by splitting the file on the " (double-quote) character and printing each resulting piece on its own line.

E.g.
if a page contains
<a href="http://url.com/">Click me</a>

by running

C:\> parse_http.exe <filename>

it will print out:

<a href=
http://url.com/
>Click me</a>
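
For readers who want to see the idea in code, here is a minimal Python sketch of the same splitting step. It is not the source of parse_http.exe; the function name split_on_quotes is made up for illustration, and skipping empty pieces is an assumption about how the tool behaves.

```python
def split_on_quotes(text):
    """Split text on the double-quote character.

    Assumption: pieces that are empty or only whitespace are skipped.
    The real parse_http.exe may handle them differently.
    """
    return [piece for piece in text.split('"') if piece.strip()]


sample = '<a href="http://url.com/">Click me</a>'

# Print each piece on its own line, as in the example above.
for piece in split_on_quotes(sample):
    print(piece)
```

Because URLs in HTML attributes are normally wrapped in double quotes, each URL ends up alone on a line, which is what makes the later filtering step so simple.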

By redirecting the output to a file and then running (e.g. for http URLs)

find "http" <parsed filename>

you can extract the URLs from the page. Note that find matches any line containing "http", so https URLs are listed as well.
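
The two steps together (split on double quotes, then keep only the lines containing "http") can be sketched in Python as follows. This only approximates the parse_http.exe plus find combination; the function name extract_urls and the sample page are hypothetical. Like find without the /I switch, the match is case-sensitive.

```python
def extract_urls(text, marker="http"):
    """Return the lines that contain `marker` after splitting on double quotes.

    This mimics splitting a page on '"' and then filtering the output
    line by line with a substring search, as find does.
    """
    matches = []
    for piece in text.split('"'):
        # A piece may span several lines; find would test each line separately.
        for line in piece.splitlines():
            if marker in line:
                matches.append(line)
    return matches


# Hypothetical sample page, used only for illustration.
page = '<p>See <a href="http://url.com/">this</a> and <img src="logo.png"></p>'
print(extract_urls(page))
```

Keep in mind that this is a plain substring filter: any line that happens to contain "http" is reported, whether or not it is a URL.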
