Update site.

2016-09-02 11:47:33 +02:00
parent 86b86ccdc0
commit 327ac4250c
32 changed files with 821 additions and 943 deletions

@@ -1,14 +1,10 @@
#+title: Old junk code: Word finder
#+summary: Less than perfect C code
#+license: wtfpl, unless otherwise noted
#+startup: showall
#&toc
---
abstract: Less than perfect C code
---
# Old junk code: Word finder
* Old junk code: Word finder
#+caption: Based on [[https://commons.wikimedia.org/wiki/File:2001-91-1_Computer,_Laptop,_Pentagon_(5891422370).jpg][this]], CC BY 2.0
#&img;url=sadcomputer.png, float=right
![Based on [this](https://commons.wikimedia.org/wiki/File:2001-91-1_Computer,_Laptop,_Pentagon_(5891422370).jpg), CC BY 2.0](sadcomputer.png)
If you ever get tired of looking at your own junk code, take a look at this.
@@ -39,46 +35,46 @@ store the list of words on the stack instead of in memory, so words with length
In any case, a word length of 10 would require about 100 MB, a word length of 11
about 1.2 GB, a word length of 12 about 15.6 GB, and a word length of 17 (like
"inconspicuousness") about 16.5 petabytes (16,500,000 GB). That's 6.5 petabytes
*more* than [[http://archive.org/web/petabox.php][what the Internet Archive uses]] to store millions of websites, books,
video and audio.
*more* than [what the Internet Archive uses](http://archive.org/web/petabox.php)
to store millions of websites, books, video and audio.
So perhaps neither my algorithm nor my implementation was that good.
* The code
## The code
Note that this code doesn't actually compile, because of all the wrong
code. However, it did compile back in 2008 which means that either I added the
wrong code after I had compiled it, or I used an overfriendly compiler (I don't
remember which compiler it was, but it ran on Windows). I have run the old
executable with ~wine~, and that works.
executable with `wine`, and that works.
It's not necessary to know C to laugh at this code, but it helps.
We'll start with some basic ~#include~s.
#+BEGIN_SRC c
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
#+END_SRC
```
So far, so good. Then the global variables with descriptive names. And let's
declare four strings of length 0 to be statically allocated, because we'll just
extend them later on...?
#+BEGIN_SRC c
```c
char os[0],s[0],r[0],t[0];
int l,c,rc,k,sk,i,ii,iii,ri;
#+END_SRC
```
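For contrast, a minimal sketch of how the word could be stored safely: size the buffer from `strlen` and lowercase while copying. The helper name `lowercase_copy` is invented for this illustration and is not part of the original program.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, lowercased copy of word,
   or NULL if allocation fails. The caller must free() the result. */
char *lowercase_copy(const char *word) {
    assert(word != NULL);
    size_t len = strlen(word);
    char *copy = malloc(len + 1);   /* +1 for the end-of-string mark */
    if (copy == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        /* tolower expects an unsigned char value (or EOF) */
        copy[i] = (char)tolower((unsigned char)word[i]);
    }
    copy[len] = '\0';
    return copy;
}
```

A caller would pass `argv[1]` and `free()` the result when done.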
The next step is to define our own version of C's standard library ~strstr~ function
The next step is to define our own version of C's standard library `strstr` function
(almost). I was used to PHP, so I wanted the same return values as PHP's
~strpos~.
`strpos`.
#+BEGIN_SRC c
```c
int strpos (const char *haystack, const char *needle) {
int i;
@@ -92,14 +88,14 @@ int strpos (const char *haystack, const char *needle) {
return -1;
}
#+END_SRC
```
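The same return values can be had by wrapping the standard `strstr`. This is only a sketch; the name `strpos_sketch` is made up here to avoid clashing with the function above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: index of the first occurrence of needle in
   haystack, or -1 if there is none. PHP's strpos returns false on a
   miss; -1 follows the convention of the hand-written version. */
int strpos_sketch(const char *haystack, const char *needle) {
    assert(haystack != NULL && needle != NULL);
    const char *hit = strstr(haystack, needle);
    return hit ? (int)(hit - haystack) : -1;
}
```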
Then it's time for the main function. We don't want to separate it into
auxiliary functions, because that's just ugly!
Indentation? Too much wastes too much space.
#+BEGIN_SRC c
```c
int main(int argc, char *argv[])
{
if (argc>1) {
@@ -114,7 +110,7 @@ int main(int argc, char *argv[])
for(i=0;s[i];i++) {
s[i]=tolower(s[i]);
}
#+END_SRC
```
Wait, what? We use ~strcpy~ to copy the string ~argv[1]~, which contains the
word we want to permute, into the statically allocated ~os~ with length 0? Or we
@@ -123,51 +119,51 @@ That's... not good.
At least these two lines aren't that bad.
#+BEGIN_SRC c
```c
l=strlen(s);
c=pow(l,l);
#+END_SRC
```
But then begins the actual permutation generation logic. I have tried to
re-understand it, with no success.
#+BEGIN_SRC c
```c
rc=1;
i=0;
while (i<l-1) {
rc=rc*(l-i);
i++;
}
#+END_SRC
```
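Read in isolation, this loop multiplies `l` by `l-1`, `l-2` and so on down to 2, so `rc` ends up as the factorial of `l`: the number of orderings of `l` distinct letters. A sketch of the same computation as a function (the name `factorial_of` is invented for this example), with a wider type because a 32-bit `int` already overflows at 13!:

```c
#include <assert.h>
#include <stdint.h>

/* Same loop shape as above, but with a 64-bit result. */
uint64_t factorial_of(unsigned l) {
    assert(l <= 20);  /* 21! no longer fits in 64 bits */
    uint64_t rc = 1;
    for (unsigned i = 0; i + 1 < l; i++) {
        rc *= l - i;
    }
    return rc;
}
```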
While we're at it, why not declare two to-be-statically-allocated arrays with
dynamically-generated ints as lengths?
#+BEGIN_SRC c
```c
int ca[l];
char ra[rc][l+1];
#+END_SRC
```
And then some more assignments and ~while~ loops...
#+BEGIN_SRC c
```c
ri=0;
i=0;
while (i<c) {
k=1;
ii=0;
while (ii<l && k==1) {
#+END_SRC
```
This formula does something. I'm not sure what.
#+BEGIN_SRC c
```c
ca[ii]=floor(i/pow(l,l-ii-1))-floor(i/pow(l,l-ii))*l;
#+END_SRC
```
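Taken on its own, the expression is floor(i / l^(l-ii-1)) minus l times floor(i / l^(l-ii)), which is digit number `ii` (counting from the most significant) of `i` written in base `l` with `l` digits. So each `i` from 0 to `c - 1` appears to encode one sequence of `l` letter positions. A sketch of the same digit extraction in integer arithmetic, without `pow` and `floor`; `base_digit` is a name invented for this example:

```c
#include <assert.h>

/* Digit ii (0 = most significant) of i written in base l with l digits.
   Note: divisor overflows a 32-bit unsigned once l reaches 11. */
unsigned base_digit(unsigned i, unsigned l, unsigned ii) {
    assert(l > 0 && ii < l);
    unsigned divisor = 1;
    for (unsigned k = 0; k + ii + 1 < l; k++) {
        divisor *= l;   /* ends up as l^(l-ii-1) */
    }
    return (i / divisor) % l;
}
```

For example, with `l = 3` the number 5 is `012` in base 3, so positions 0, 1 and 2 give 0, 1 and 2.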
More ~while~ loops, now also with ~if~ statements.
#+BEGIN_SRC c
```c
iii=0;
while (iii<ii) {
if (ca[ii]==ca[iii]) {k=0;}
@@ -180,27 +176,27 @@ More ~while~ loops, now also with ~if~ statements.
ii=0;
while (ii<l) {
strncpy(t,s+ca[ii],1);
#+END_SRC
```
Let's concatenate ~t~ onto ~ra[ri]~, a string which hardly exists due to the
~char ra[rc][l+1];~ magic above.
Let's concatenate `t` onto `ra[ri]`, a string which hardly exists due to the
`char ra[rc][l+1];` magic above.
#+BEGIN_SRC c
```c
strcat(ra[ri],t);
ii++;
}
#+END_SRC
```
And why not concatenate an end-of-string mark onto a string which, if it
doesn't have an end-of-string mark, will make ~strcat~ fail miserably?
#+BEGIN_SRC c
```c
strcat(ra[ri],"\0");
#+END_SRC
```
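Since `strcat` needs its destination to be terminated already, appending `"\0"` with it cannot repair anything (and to `strcat`, `"\0"` is just an empty string). A sketch of writing the letters and the end-of-string mark by index instead; `build_permutation` and its parameters are hypothetical, modeled on `s`, `ca` and `ra[ri]` above:

```c
#include <assert.h>
#include <stddef.h>

/* Writes s[ca[0]], s[ca[1]], ..., s[ca[l-1]] into out and terminates it.
   out must have room for l + 1 chars. */
void build_permutation(char *out, const char *s, const int *ca, size_t l) {
    assert(out != NULL && s != NULL && ca != NULL);
    for (size_t ii = 0; ii < l; ii++) {
        out[ii] = s[ca[ii]];
    }
    out[l] = '\0';
}
```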
And then more junk.
#+BEGIN_SRC c
```c
sk=1;
ii=0;
while (ii<ri && sk==1) {
@@ -215,51 +211,51 @@ And then more junk.
i++;
}
//printf("\nOrd: %s\nOrdl\x91ngde: %d\nOrdkombinationer: %d\n",os,l,ri);
#+END_SRC
```
Phew... At this point, I'm certain that ~ra~ is supposed to be an array of all
word permutations. So let's open our dictionary "ord.txt" and look for matches.
#+BEGIN_SRC c
```c
FILE *f;
char wrd[128];
if (f=fopen("ord.txt","r")) {
FILE *fw;
#+END_SRC
```
Everything is written both to output.txt *and* standard out. Anything else would
be stupid.
#+BEGIN_SRC c
```c
fw=fopen("output.txt","w");
printf("Ord dannet af \"%s\":\n\n",os);
fprintf(fw,"Ord dannet af \"%s\":\n\n",os);
int wc=0;
while(!feof(f)) {
if(fgets(wrd,126,f)) {
#+END_SRC
```
The words each end with a newline, so let's replace the newline with an
end-of-string mark.
#+BEGIN_SRC c
```c
wrd[strlen(wrd)-1]=0;
//printf("%s\n",wrd);
k=0;
ii=0;
while (ii<ri && k==0) {
#+END_SRC
```
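The newline replacement above assumes every line really ends in a newline; the last line of a file may not, and an empty string would be indexed out of bounds. A sketch using `strcspn`, which only cuts the string if a line ending is actually found (the helper name `strip_newline` is invented here):

```c
#include <assert.h>
#include <string.h>

/* Truncates line at the first '\r' or '\n', if there is one. */
void strip_newline(char *line) {
    assert(line != NULL);
    line[strcspn(line, "\r\n")] = '\0';
}
```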
The magical core of the matching logic, using our own ~strpos~:
#+BEGIN_SRC c
```c
if (strpos(ra[ii],wrd)>-1) {k=1;}
#+END_SRC
```
If ~k == 1~, something good happens. But it doesn't happen at once for some
reason.
#+BEGIN_SRC c
```c
ii++;
}
if (k==1) {
@@ -277,17 +273,17 @@ reason.
}
return 0;
}
#+END_SRC
```
And that's my pretty C code.
* The SML equivalent
## The SML equivalent
To make my inefficient algorithm a bit clearer, I have made a few SML functions
to do the same as above:
#+BEGIN_SRC ocaml
```ocaml
open List
(* Removes an element from a list. *)
@@ -334,11 +330,11 @@ fun findMatchingWords word wordList =
exists (fn word => word = testWord)
wordPermutations) wordList
end
#+END_SRC
```
As well as some SML functions to calculate the number of permutations and bytes:
#+BEGIN_SRC ocaml
```ocaml
(* Calculates the factorial. *)
fun factorial 0 = 1
| factorial n = n * factorial (n - 1)
@@ -364,14 +360,12 @@ fun nPermutations len = foldl op+ 0 (map (fn n => factorial n * binomc len n)
fun nSize len = 8 * len + foldl op+ 0 (
map (fn n => (n + 1) * factorial n * binomc len n)
(upTo 1 len))
#+END_SRC
```
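As a cross-check, here is a sketch of the `nSize` formula in C. It assumes that `binomc` is the binomial coefficient and that `upTo 1 len` yields the numbers 1 to `len` (both are defined in the part of the file not shown here), and it uses the identity n! * C(len, n) = len! / (len - n)!. The name `n_size` is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for all permutations of every non-empty subset of len
   letters: 8 * len + sum over n = 1..len of (n + 1) * n! * C(len, n). */
uint64_t n_size(unsigned len) {
    assert(len <= 17);  /* larger inputs are not checked for overflow here */
    uint64_t total = 8ull * len;
    uint64_t arrangements = 1;   /* len! / (len - n)!, built up step by step */
    for (unsigned n = 1; n <= len; n++) {
        arrangements *= len - n + 1;
        total += (uint64_t)(n + 1) * arrangements;
    }
    return total;
}
```

For a word length of 10 this gives 98,641,090 bytes, in line with the "about 100 MB" quoted earlier.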
* The alternative
## The alternative
Preprocess the dictionary into a clever data structure and don't use up all the
memory.
#&line
Originally published [[http://dikutal.dk/artikler/old-junk-code-word-finder][here]].
Originally published
[here](http://dikutal.metanohi.name/artikler/old-junk-code-word-finder).