How to archive a webpage properly?
9 22 Nov 2016 02:34 by u/IdeaGhost
I'm trying to make a web page archiver, sort of like the Wayback Machine.
My thought process so far has been:
For the page to look readable I need at least the HTML and CSS. So I need to download the HTML, then look for the CSS it references, then possibly rewrite those references (i.e. change absolute URLs to local ones)?
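A rough sketch of that lookup-and-rewrite step in Python, using only the standard library. It only plans the rewrites: it finds each `<link rel="stylesheet">`, resolves its href against the page URL, and picks a local path. The `assets/` folder and the hash-based naming in `local_name` are assumptions made up for illustration, and the actual download and the edit of the HTML are left out. Note that CSS files can themselves pull in more files via `url(...)` and `@import`, which this does not follow.

```python
import hashlib
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class StylesheetFinder(HTMLParser):
    """Collects the href of every <link rel="stylesheet"> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def local_name(absolute_url):
    """Hypothetical naming scheme: short hash of the URL plus its extension."""
    digest = hashlib.sha1(absolute_url.encode("utf-8")).hexdigest()[:12]
    ext = posixpath.splitext(urlparse(absolute_url).path)[1] or ".css"
    return f"assets/{digest}{ext}"


def plan_css_rewrites(html, page_url):
    """Map each href as written in the page to (absolute URL, local path)."""
    finder = StylesheetFinder()
    finder.feed(html)
    finder.close()
    plan = {}
    for href in finder.hrefs:
        absolute = urljoin(page_url, href)  # handles relative and absolute hrefs
        plan[href] = (absolute, local_name(absolute))
    return plan
```

With a plan like this, the archiver would fetch each absolute URL, save it at the local path, and swap the original href for that path in the stored HTML.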
And that's it? What about JavaScript, isn't it a security issue to leave it in there?
A "safer" alternative for handling JavaScript would be to remove everything that's not on a whitelist (i.e. every tag and attribute not whitelisted), since JavaScript can be in a lot of places, right?
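To make the whitelist idea concrete, here is a minimal Python sketch built on the standard library's `html.parser`. The tag and attribute lists are assumptions chosen for illustration, not a vetted policy, and `html.parser` does not parse exactly like a browser, so treat this as a way to see the shape of the approach rather than a hardened sanitizer. It drops unlisted tags and attributes (which covers `onclick` and friends), throws away the contents of `<script>` and `<style>`, and rejects `href`/`src` values with a scheme other than http or https (which covers `javascript:` URLs).

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed whitelists for illustration; a real archiver would need longer ones.
ALLOWED_TAGS = {"a", "p", "div", "span", "h1", "h2", "h3", "ul", "ol", "li",
                "em", "strong", "img", "br"}
ALLOWED_ATTRS = {"href", "src", "alt", "title", "class"}
VOID_TAGS = {"br", "img"}            # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}   # drop these tags AND the text inside them


def is_safe_url(value):
    """Allow relative URLs and http(s); reject javascript:, data:, etc."""
    # Browsers ignore whitespace/control characters inside the scheme.
    cleaned = "".join(ch for ch in value if ch > " ").lower()
    return urlparse(cleaned).scheme in ("", "http", "https")


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_ATTRS:
                continue  # drops on* handlers, style=, and anything unlisted
            if name in ("href", "src") and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Rebuilding the output from whitelisted pieces, instead of deleting known-bad pieces from the input, means anything the lists do not mention is dropped by default. Comments and the doctype are also dropped here because the parser's default handlers ignore them.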