#+title: noncrawl
#&summary
A links-centric webcrawler
#&
#+license: bysa, page
#+license: gpl 3+, program
* noncrawl
#&img;url=img/noncrawl-logo-192.png, float=right, alt=noncrawl logo, \
#& width=192, height=192
noncrawl is a web crawler that saves only links. It crawls the web but
deliberately does nothing else: its only purpose is to recursively check sites
for links to other sites, which are then checked for links in the same way.
For example, if site Y links to site X, that link is saved, and if site X has
not been checked yet, it is crawled just like site Y was.
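
The recursive link check described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not noncrawl's
actual code: the names =site_of=, =crawl_links=, =fetch_links= and =max_sites=
are hypothetical, a site is identified by its host name, and only the first
discovered page of each site is visited.

```python
from collections import deque
from urllib.parse import urlparse


def site_of(url):
    """Return the host part of a URL, used here as the site identifier."""
    return urlparse(url).netloc.lower()


def crawl_links(start_url, fetch_links, max_sites=100):
    """Breadth-first crawl that records only site-to-site links.

    fetch_links is a caller-supplied (hypothetical) function that maps a URL
    to the list of URLs found on that page.
    """
    links = set()                # (source site, target site) pairs
    seen = {site_of(start_url)}  # sites already queued or crawled
    queue = deque([start_url])
    crawled = 0
    while queue and crawled < max_sites:
        url = queue.popleft()
        crawled += 1
        source = site_of(url)
        for target_url in fetch_links(url):
            target = site_of(target_url)
            if not target or target == source:
                continue  # skip relative and same-site links
            links.add((source, target))  # "site Y links to site X"
            if target not in seen:
                seen.add(target)
                queue.append(target_url)  # crawl X just like Y was
    return links
```

Passing =fetch_links= in as a parameter keeps the sketch independent of any
particular HTTP or HTML library; a real crawler would download each page and
extract its anchors at that point.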

[[noncrawl-0.1.tar.gz][Download noncrawl 0.1]].

The noncrawl source branches are hosted at Gitorious; see
[[http://gitorious.org/noncrawl]]. The bug tracker is at Launchpad; see
[[http://launchpad.net/noncrawl]].