noncrawl

#&summary A links-centric webcrawler #&

noncrawl

#&img;url=img/noncrawl-logo-192.png, float=right, alt=noncrawl logo, \ #& width=192, height=192

noncrawl is a crawler that saves only links. It crawls the web but does not attempt to do everything. Its only purpose is to recursively check sites for links to other sites, which are in turn checked for links to further sites, and so on. For example, if site Y links to site X, that link is saved, and if site X has not been checked yet, it is crawled just as site Y was.
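The recursion described above can be illustrated with a minimal sketch. This is not noncrawl's actual code: the names `LinkParser`, `site_of`, `crawl_links`, the `fetch` callable and the `max_sites` limit are all hypothetical, and the sketch assumes that a "site" is identified by its host name and that visiting one page per site is enough.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_of(url):
    """Return the host part of a URL (assumption: host name == site)."""
    return urlparse(url).netloc.lower()


def crawl_links(start_url, fetch, max_sites=100):
    """Breadth-first crawl that records only (from_site, to_site) pairs.

    `fetch` is a hypothetical callable that maps a URL to its HTML text,
    so the network layer can be swapped out or faked.
    """
    seen = {site_of(start_url)}
    queue = deque([start_url])
    links = set()
    while queue:
        url = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch(url))
        source = site_of(url)
        for href in parser.hrefs:
            target_url = urljoin(url, href)
            target = site_of(target_url)
            # Skip links without a host (e.g. mailto:) and same-site links.
            if not target or target == source:
                continue
            links.add((source, target))
            # Crawl each newly discovered site once.
            if target not in seen and len(seen) < max_sites:
                seen.add(target)
                queue.append(target_url)
    return links
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library; a real crawler would also need error handling, politeness delays and robots.txt support, none of which are shown here.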

DOWNLOAD.

noncrawl has its branches at Gitorious; see http://gitorious.org/noncrawl. A bugtracker can be found at Launchpad; see http://launchpad.net/noncrawl.