From:  "Rodrigo Meza" <rodrigo.meza@gmail.com>
Date:  12 Jan 2007 07:12:01 Hong Kong Time
Newsgroup:  news.mozilla.org/netscape.public.mozilla.jseng
Subject:  interpreting javascript and html, all together?
NNTP-Posting-Host:  89.129.201.155

Hello Everyone
   For a project I am working on, I need to retrieve links from HTML
documents. The easy part is to obtain 'plain' links, i.e. ordinary
<a href="..."> anchors, but when the links are javascript'ized, the only
robust solution is to load the JavaScript and the DOM representation of
the document the same way browsers do. For example, links of the form:

   <a href="javascript:openPage('page.html')">...</a>
   <a href="#" onclick="document.location = makeUrl(id)">...</a>
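  To be concrete, this is roughly what I mean by the 'easy' part: a crude
scan over the raw HTML for href attributes (the sample markup, URLs and
function names here are made up, just to illustrate). It also shows where
it breaks down: the javascript: target comes out as an opaque string
instead of a real URL.

#include <stdio.h>
#include <string.h>

/* Crude sketch: scan an HTML buffer for href="..." attributes.
 * Good enough for 'plain' links; javascript: targets and onclick
 * handlers are exactly the cases this cannot resolve. */
static void extract_hrefs(const char *html)
{
    const char *p = html;

    while ((p = strstr(p, "href=\"")) != NULL) {
        p += 6;                          /* skip past href=" */
        const char *end = strchr(p, '"');
        if (end == NULL)
            break;
        printf("%.*s\n", (int)(end - p), p);
        p = end + 1;
    }
}

int main(void)
{
    const char *sample =
        "<a href=\"http://example.com/page.html\">plain</a>\n"
        "<a href=\"javascript:openPage('page.html')\">scripted</a>\n";

    extract_hrefs(sample);
    return 0;
}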

  First I thought that using SpiderMonkey (the Mozilla JavaScript
interpreter) would be enough, but in that case I don't have the document
structure objects (like document, window, document.history,
document.form.element, etc.). So I tried parsing the document with a
library to build a tree representation of it, but that leads me back to
the same problem: I have to expose all the tree nodes as JavaScript
entities.
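  For reference, this is roughly the kind of embedding I have in mind, as
far as I understand the JSAPI (a minimal sketch; the empty 'document' stub
and the little script are made up, and that stub is exactly the part that
does not scale to a real page):

/* Minimal SpiderMonkey embedding sketch: create a runtime/context,
 * define a bare "document" object, and evaluate a script that builds
 * a URL the way a page's link code might.  Compile against jsapi.h
 * with the platform define your build needs (e.g. -DXP_UNIX). */
#include <stdio.h>
#include <string.h>
#include "jsapi.h"

/* The class of the global object: just stubs. */
static JSClass global_class = {
    "global", 0,
    JS_PropertyStub, JS_PropertyStub, JS_PropertyStub, JS_PropertyStub,
    JS_EnumerateStub, JS_ResolveStub, JS_ConvertStub, JS_FinalizeStub
};

int main(void)
{
    JSRuntime *rt = JS_NewRuntime(8L * 1024L * 1024L);
    JSContext *cx = JS_NewContext(rt, 8192);
    JSObject *global = JS_NewObject(cx, &global_class, NULL, NULL);
    JS_InitStandardClasses(cx, global);

    /* Placeholder "document": a plain empty object, so page scripts
     * touching document.* do not fail immediately.  A real page needs
     * location, forms, history, ... which is exactly the problem. */
    JS_DefineObject(cx, global, "document", NULL, NULL, JSPROP_ENUMERATE);

    /* Pretend this came from the page: it computes the link target. */
    const char *script =
        "document.base = 'http://example.com/';"
        "function openPage(p) { return document.base + p; }"
        "openPage('page.html');";

    jsval rval;
    if (JS_EvaluateScript(cx, global, script, strlen(script),
                          "inline", 1, &rval)) {
        JSString *str = JS_ValueToString(cx, rval);
        printf("resolved link: %s\n", JS_GetStringBytes(str));
    }

    JS_DestroyContext(cx);
    JS_DestroyRuntime(rt);
    JS_ShutDown();
    return 0;
}

Everything hiding behind that document stub is what I would have to write
by hand, which is why I am asking about existing tools.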

  Has anybody here worked on a similar problem? What tools do you
think I should take a look at?

Thanks in advance!

   Rodrigo.