A lot of projects ported from the old metanohi site.
23
site/projects/noncrawl/index.org
Executable file
@@ -0,0 +1,23 @@
#+title: noncrawl
#&summary
A links-centric webcrawler
#&
#+license: bysa, page
#+license: gpl 3+, program

* noncrawl

#&img;url=img/noncrawl-logo-192.png, float=right, alt=noncrawl logo, \
#& width=192, height=192

noncrawl is a crawler that saves only links. It crawls the web but does not
attempt to do everything. Instead, its only purpose is to recursively check
sites for links to other sites, which are then also checked for links to other
sites, etc. So, if site Y links to site X, that piece of information is saved,
and if site X has not been checked yet, it will be crawled just like site Y
was.
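
The crawl described above can be sketched roughly as follows. This is not noncrawl's actual code, only a minimal illustration in Python under a few assumptions: a "site" is taken to be a URL's host name, only the first URL seen for each site is fetched, a queue stands in for the recursion, and the names =LinkParser=, =site_of=, =crawl_links=, the =fetch= callable and the =max_sites= limit are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_of(url):
    """Reduce a URL to its host name, which this sketch treats as the site."""
    return urlsplit(url).netloc.lower()


def crawl_links(start_url, fetch, max_sites=100):
    """Return the set of (from_site, to_site) pairs reachable from start_url.

    fetch is an assumed callable that takes a URL and returns HTML text.
    """
    seen = {site_of(start_url)}   # sites already queued or crawled
    queue = deque([start_url])    # queue instead of literal recursion
    links = set()                 # the only data that is saved
    while queue:
        url = queue.popleft()
        source = site_of(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.hrefs:
            target_url = urljoin(url, href)
            target = site_of(target_url)
            if not target or target == source:
                continue  # skip internal links and links without a host
            links.add((source, target))  # "site Y links to site X"
            if target not in seen and len(seen) < max_sites:
                seen.add(target)
                queue.append(target_url)  # X gets crawled just like Y
    return links
```

Passing the fetcher in as an argument keeps the sketch free of network code; for a quick experiment, =fetch= can simply look pages up in a dictionary that maps URLs to HTML strings.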

[[noncrawl-0.1.tar.gz][DOWNLOAD]].

noncrawl has its branches at Gitorious; see [[http://gitorious.org/noncrawl]]. A
bugtracker can be found at Launchpad; see [[http://launchpad.net/noncrawl]].