#+title: noncrawl
#&summary
A links-centric webcrawler
#&
#+license: bysa, page
#+license: gpl 3+, program

* noncrawl

#&img;url=img/noncrawl-logo-192.png, float=right, alt=noncrawl logo, \
#& width=192, height=192

noncrawl is a web crawler that saves only links. It crawls the web but does not
try to do everything a general-purpose crawler does. Instead, its sole purpose
is to recursively check sites for links to other sites, which are in turn
checked for links to further sites, and so on. For example, if site Y links to
site X, that link is saved, and if site X has not been checked yet, it is
crawled just as site Y was.

[[noncrawl-0.1.tar.gz][DOWNLOAD]].

The noncrawl source branches are hosted at Gitorious; see
[[http://gitorious.org/noncrawl]]. A bug tracker is available at Launchpad; see
[[http://launchpad.net/noncrawl]].
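The links-only crawl described above can be sketched roughly as follows. This is
an illustrative sketch, not noncrawl's actual code or API: the names
=LinkParser=, =linked_sites=, =crawl=, and =fetch=, the =limit= parameter, and
the =*.example= hosts are all assumptions made for the example. Pages are served
from an in-memory table so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def linked_sites(site, html):
    """Return the set of other sites (hostnames) that a page links to."""
    parser = LinkParser()
    parser.feed(html)
    found = set()
    for href in parser.hrefs:
        # Resolve relative links against the site, then keep only the host.
        host = urlsplit(urljoin("http://" + site + "/", href)).hostname
        if host and host != site:
            found.add(host)
    return found


def crawl(start_site, fetch, limit=100):
    """Breadth-first crawl that records only site-to-site links.

    `fetch` is a hypothetical callable mapping a site name to its HTML,
    or None if the site cannot be read. `limit` caps the number of
    sites checked (an assumption, to keep the sketch bounded).
    """
    links = set()      # saved facts: (linking site, linked site)
    checked = set()    # sites that have already been crawled
    queue = deque([start_site])
    while queue and len(checked) < limit:
        site = queue.popleft()
        if site in checked:
            continue
        checked.add(site)
        html = fetch(site)
        if html is None:
            continue
        for target in linked_sites(site, html):
            links.add((site, target))    # "site Y links to site X"
            if target not in checked:
                queue.append(target)     # X is crawled just like Y was
    return links


# Hypothetical in-memory "web" standing in for real HTTP requests.
PAGES = {
    "y.example": '<a href="http://x.example/">X</a> <a href="/about">about</a>',
    "x.example": '<a href="http://z.example/page">Z</a> '
                 '<a href="http://y.example/">Y</a>',
}

links = crawl("y.example", PAGES.get)
for source, target in sorted(links):
    print(source, "->", target)
```

Only the pairs of linking and linked sites are kept; page content is discarded
once its links have been extracted. To crawl real sites, =fetch= could be
replaced by a function that downloads a site's front page, for instance with
=urllib.request=.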