Wednesday, November 28, 2007

Extracting data from a webpage using lynx

So I am trying to parse data out of a log file on my router. Basically, I want to extract the log automagically and export it to a plain text file, appending each new log to the end of the last.

So far I have installed lynx so I can view a webpage from the terminal.

sudo apt-get install lynx

Then, to extract the file, I run the following command:

lynx -dump http://192.168.2.1/log.stm > log

I know that the > should actually be >> so that each run appends to the end of the file instead of overwriting it. But the issue is that I need to log in to the router for it to work....
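For what it's worth, here is a rough sketch of the append step done from Python instead of the shell. It assumes lynx is installed and on the PATH; the function names dump_page and append_to_log are made up for illustration, and it does nothing about the login problem.

```python
import subprocess


def dump_page(url):
    """Return the rendered text of url, using the same `lynx -dump` call."""
    result = subprocess.run(
        ["lynx", "-dump", url],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError if lynx fails
    )
    return result.stdout


def append_to_log(text, path="log"):
    """Append text to path; mode "a" is the Python counterpart of shell >>."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(text)
        # End on a newline so the next run starts on a fresh line.
        if not text.endswith("\n"):
            handle.write("\n")
```

Calling append_to_log(dump_page("http://192.168.2.1/log.stm")) from a scheduled job should then grow the file run after run, once the login is sorted out.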

Any help would be greatly appreciated.

1 comment:

Jeremiah said...

You'll probably want to write a script to do this in your favorite scripting language (I'd go with Python, myself).

General idea would be:

Scrape the page looking for login boxes.
Enter info.
Log in.
Grab the log (or just the info out of the log).
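A minimal sketch of those steps in Python, using only the standard library. It assumes the router uses an ordinary HTML login form and then tracks the session with a cookie, which may not be true for this router. The login path and the form field name are placeholders, not the router's real ones; you would have to read them off the login page's HTML source first.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

ROUTER = "http://192.168.2.1"
LOG_PATH = "/log.stm"
LOGIN_PATH = "/login.cgi"      # hypothetical: check the form's action attribute
LOGIN_FIELDS = {"pws": "..."}  # hypothetical field name; fill in your password


def build_login_request(base_url, login_path, fields):
    """Build the POST request that submits the login form."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    # Passing data makes urllib send a POST instead of a GET.
    return urllib.request.Request(base_url + login_path, data=data)


def fetch_log(base_url, login_path, log_path, fields, opener=None):
    """Log in, then return the raw HTML of the log page."""
    if opener is None:
        # The cookie jar keeps any session cookie set by the login response.
        opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar())
        )
    login = build_login_request(base_url, login_path, fields)
    opener.open(login, timeout=10).close()
    with opener.open(base_url + log_path, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Something like fetch_log(ROUTER, LOGIN_PATH, LOG_PATH, LOGIN_FIELDS) would then return the page. Note that this gives back raw HTML rather than the rendered text that lynx -dump produces, so the log lines would still need to be pulled out of the markup.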

Feel free to shoot me an email at jeremiah (dot) dodds @t gmail if you need more info, I'd be glad to help.