24 lines
806 B
Org Mode
#+title: noncrawl
#&summary
A links-centric webcrawler
#&
#+license: bysa, page
#+license: gpl 3+, program

* noncrawl

#&img;url=img/noncrawl-logo-192.png, float=right, alt=noncrawl logo, \
#& width=192, height=192

noncrawl is a web crawler that saves only links. It crawls the web, but it
does not try to do everything a general-purpose crawler might. Its only
purpose is to recursively check sites for links to other sites, which are in
turn checked for links to further sites, and so on. If site Y links to site X,
that link is saved, and if site X has not been checked yet, it is crawled just
as site Y was.
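
The recursive link check described above can be sketched roughly as follows.
This is not noncrawl's actual code. It is a minimal Python illustration under
stated assumptions: the names =LinkParser=, =site_of=, =crawl_links=, the
=fetch= callable and the =max_sites= limit are all hypothetical, a "site" is
taken to mean the host part of a URL, and only the front page of each newly
found site is visited.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_of(url):
    """Reduce a URL to its host, e.g. 'http://x.example/a' -> 'x.example'."""
    return urlparse(url).netloc.lower()


def crawl_links(start_url, fetch, max_sites=100):
    """Breadth-first crawl that records only (from_site, to_site) pairs.

    `fetch` is a hypothetical callable mapping a URL to its HTML text,
    so the sketch can run without network access.
    """
    links = set()
    seen = {site_of(start_url)}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch(url))
        source = site_of(url)
        for href in parser.hrefs:
            target = site_of(urljoin(url, href))
            if not target or target == source:
                continue  # skip internal links and host-less links (mailto:)
            links.add((source, target))  # "site Y links to site X"
            if target not in seen and len(seen) < max_sites:
                seen.add(target)
                # Assumption: visit only the front page of each new site.
                queue.append("http://" + target + "/")
    return links
```

With a =fetch= function that downloads pages, calling
=crawl_links("http://y.example/", fetch)= would return the set of site-to-site
links reachable from that starting page, up to the assumed site limit.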

[[noncrawl-0.1.tar.gz][DOWNLOAD]].

noncrawl's branches are hosted at Gitorious; see [[http://gitorious.org/noncrawl]].
A bug tracker can be found at Launchpad; see [[http://launchpad.net/noncrawl]].