Update site.
@@ -1,14 +1,10 @@
#+title: Old junk code: Word finder
#+summary: Less than perfect C code
#+license: wtfpl, unless otherwise noted
#+startup: showall
#&toc
---
abstract: Less than perfect C code
---

# Old junk code: Word finder

* Old junk code: Word finder

#+caption: Based on [[https://commons.wikimedia.org/wiki/File:2001-91-1_Computer,_Laptop,_Pentagon_(5891422370).jpg][this]], CC BY 2.0
#&img;url=sadcomputer.png, float=right
![Based on [this](https://commons.wikimedia.org/wiki/File:2001-91-1_Computer,_Laptop,_Pentagon_(5891422370).jpg), CC BY 2.0](sadcomputer.png)

If you ever get tired of looking at your own junk code, take a look at this.

@@ -39,46 +35,46 @@ store the list of words on the stack instead of in memory, so words with length
In any case, a word length of 10 would require about 100 MB, a word length of 11
about 1.2 GB, a word length of 12 about 15.6 GB, and a word length of 17 (like
"inconspicuousness") about 16.5 petabytes (16,500,000 GB). That's 6.5 petabytes
*more* than [[http://archive.org/web/petabox.php][what the Internet Archive uses]] to store millions of websites, books,
video and audio.
*more* than [what the Internet Archive uses](http://archive.org/web/petabox.php)
to store millions of websites, books, video and audio.

So perhaps neither my algorithm nor my implementation was that good.


* The code
## The code

Note that this code doesn't actually compile, because of all the wrong
code. However, it did compile back in 2008, which means that either I added the
wrong code after I had compiled it, or I used an overfriendly compiler (I don't
remember which compiler it was, but it ran on Windows). I have run the old
executable with ~wine~, and that works.
executable with `wine`, and that works.

It's not necessary to know C to laugh at this code, but it helps.

We'll start with some basic ~#include~s.

#+BEGIN_SRC c
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
#+END_SRC
```

So far, so good. Then the global variables with descriptive names. And let's
declare four strings of length 0 to be statically allocated, because we'll just
extend them later on...?

#+BEGIN_SRC c
```c
char os[0],s[0],r[0],t[0];
int l,c,rc,k,sk,i,ii,iii,ri;
#+END_SRC
```

The next step is to define our own version of C's builtin ~strstr~ function
The next step is to define our own version of C's builtin `strstr` function
(almost). I was used to PHP, so I wanted the same return values as PHP's
~strpos~.
`strpos`.

#+BEGIN_SRC c
```c
int strpos (const char *haystack, const char *needle) {
  int i;

@@ -92,14 +88,14 @@ int strpos (const char *haystack, const char *needle) {

  return -1;
}
#+END_SRC
```
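
The body of the original `strpos` is elided by the diff, so the following is only a sketch of the idea described above, not the author's code: wrap the standard `strstr` and turn its pointer result into an index, with -1 meaning "not found" as in the `return -1;` line. The name `strpos_sketch` is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Index of the first occurrence of needle in haystack, or -1 if absent.
   Hypothetical helper; the int cast assumes short strings. */
int strpos_sketch(const char *haystack, const char *needle) {
    const char *hit = strstr(haystack, needle);
    if (hit == NULL) {
        return -1;
    }
    return (int)(hit - haystack);
}
```

Returning -1 rather than 0 for a miss matters here, because 0 is a valid position (a match at the very start of the string).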

Then it's time for the main function. We don't want to separate it into
auxiliary functions, because that's just ugly!

Indentation? Too much wastes too much space.

#+BEGIN_SRC c
```c
int main(int argc, char *argv[])
{
 if (argc>1) {
@@ -114,7 +110,7 @@ int main(int argc, char *argv[])
 for(i=0;s[i];i++) {
 s[i]=tolower(s[i]);
 }
#+END_SRC
```

Wait, what? We use ~strcpy~ to copy the string ~argv[1]~, which contains the
word we want to permute, into the statically allocated ~os~ with length 0? Or we
@@ -123,51 +119,51 @@ That's... not good.

At least these two lines aren't that bad.

#+BEGIN_SRC c
```c
 l=strlen(s);
 c=pow(l,l);
#+END_SRC
```

But then begins the actual permutation generation logic. I have tried to
re-understand it, with no success.

#+BEGIN_SRC c
```c
 rc=1;
 i=0;
 while (i<l-1) {
 rc=rc*(l-i);
 i++;
 }
#+END_SRC
```
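
For what it's worth, the loop above appears to compute the factorial of `l`: it multiplies `rc` by `l`, `l-1`, and so on down to 2. That is the number of orderings of `l` letters, assuming they are all different. A standalone sketch of the same loop, wrapped in a function with the hypothetical name `count_orderings`:

```c
/* Same loop as in the post, wrapped in a function (the name is made up).
   No overflow check, so only meaningful for small l. */
long count_orderings(int l) {
    long rc = 1;
    int i = 0;
    while (i < l - 1) {
        rc = rc * (l - i);   /* factors l, l-1, ..., 2 */
        i++;
    }
    return rc;
}
```

For a three-letter word this gives 6, which matches the six ways of arranging three different letters.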

While we're at it, why not declare two to-be-statically-allocated arrays with
dynamically-generated ints as lengths?

#+BEGIN_SRC c
```c
 int ca[l];
 char ra[rc][l+1];
#+END_SRC
```
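
The two declarations above are variable-length arrays, which live on the stack. A hedged sketch of the usual alternative is to ask the heap for the table and check whether the request succeeded. The helper name `alloc_word_table` is an assumption for illustration; `calloc` zero-fills the memory, so every row starts out as an empty string.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocates count rows of len + 1 bytes each (room for the terminating NUL),
   all zero-filled. Returns NULL on failure. Row i starts at
   table + i * (len + 1). The caller releases the table with free(). */
char *alloc_word_table(size_t count, size_t len) {
    if (len == (size_t)-1) {
        return NULL;          /* len + 1 would wrap around to 0 */
    }
    return calloc(count, len + 1);
}
```

This does not make the memory requirement any smaller, but a failed allocation can at least be detected and reported instead of silently overflowing the stack.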

And then some more assignments and ~while~ loops...

#+BEGIN_SRC c
```c
 ri=0;
 i=0;
 while (i<c) {
 k=1;
 ii=0;
 while (ii<l && k==1) {
#+END_SRC
```

This formula does something. I'm not sure what.

#+BEGIN_SRC c
```c
 ca[ii]=floor(i/pow(l,l-ii-1))-floor(i/pow(l,l-ii))*l;
#+END_SRC
```
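
Read as integer arithmetic, the formula above seems to extract one digit of `i` written in base `l`: dividing by `l^(l-ii-1)` shifts the wanted digit into the lowest position, and subtracting `l` times the next-higher quotient strips everything above it. On that reading, the outer loop counts through all `l^l` digit strings and keeps those whose digits are all different, so each surviving string lists the letter positions of one ordering. The sketch below uses made-up names (`ipow`, `base_l_digit`, `digits_all_distinct`) and integer math instead of `pow` and `floor`.

```c
/* base raised to exp, for small non-negative exponents (no overflow check). */
unsigned long ipow(unsigned long base, unsigned exp) {
    unsigned long result = 1;
    while (exp-- > 0) {
        result *= base;
    }
    return result;
}

/* Digit number pos (0 = most significant) of i, written with l digits
   in base l. Equivalent to the floor/pow formula for non-negative i. */
int base_l_digit(unsigned long i, int l, int pos) {
    unsigned long shifted = i / ipow((unsigned long)l, (unsigned)(l - pos - 1));
    return (int)(shifted % (unsigned long)l);
}

/* 1 if the l base-l digits of i are pairwise different, else 0. */
int digits_all_distinct(unsigned long i, int l) {
    for (int a = 0; a < l; a++) {
        for (int b = 0; b < a; b++) {
            if (base_l_digit(i, l, a) == base_l_digit(i, l, b)) {
                return 0;
            }
        }
    }
    return 1;
}
```

With `l` = 3, the value `i` = 5 is 012 in base 3 and is kept, while `i` = 0 is 000 and is rejected. Of the 27 candidates only 6 survive, which is why counting up to `l^l` is so wasteful.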

More ~while~ loops, now also with ~if~ statements.

#+BEGIN_SRC c
```c
 iii=0;
 while (iii<ii) {
 if (ca[ii]==ca[iii]) {k=0;}
@@ -180,27 +176,27 @@ More ~while~ loops, now also with ~if~ statements.
 ii=0;
 while (ii<l) {
 strncpy(t,s+ca[ii],1);
#+END_SRC
```

Let's concatenate ~t~ onto ~ra[ri]~, a string which hardly exists due to the
~char ra[rc][l+1];~ magic above.
Let's concatenate `t` onto `ra[ri]`, a string which hardly exists due to the
`char ra[rc][l+1];` magic above.

#+BEGIN_SRC c
```c
 strcat(ra[ri],t);
 ii++;
 }
#+END_SRC
```

And why not concatenate an end-of-string mark onto a string which, if it
doesn't have an end-of-string mark, will make ~strcat~ fail miserably?

#+BEGIN_SRC c
```c
 strcat(ra[ri],"\0");
#+END_SRC
```
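
`strcat` looks for the terminating NUL in both of its arguments, so `strcat(ra[ri],"\0")` appends an empty string and changes nothing; it cannot supply a terminator that is missing. A minimal sketch of a safer pattern, assuming the buffer was set to an empty string before the first call (the helper `append_char` is hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Appends ch to the NUL-terminated string in buf, which has room for cap
   bytes in total. Returns 0 on success, -1 if there is no space left. */
int append_char(char *buf, size_t cap, char ch) {
    size_t len = strlen(buf);
    if (len + 1 >= cap) {
        return -1;            /* need room for ch and the NUL */
    }
    buf[len] = ch;
    buf[len + 1] = '\0';      /* re-terminate after every append */
    return 0;
}
```

Because the terminator is rewritten on every call, the string is valid at each step and no final "add the end mark" call is needed.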

And then more junk.

#+BEGIN_SRC c
```c
 sk=1;
 ii=0;
 while (ii<ri && sk==1) {
@@ -215,51 +211,51 @@ And then more junk.
 i++;
 }
 //printf("\nOrd: %s\nOrdl\x91ngde: %d\nOrdkombinationer: %d\n",os,l,ri);
#+END_SRC
```

Phew... At this point, I'm certain that ~ra~ is supposed to be an array of all
word permutations. So let's open our dictionary "ord.txt" and look for matches.

#+BEGIN_SRC c
```c
 FILE *f;
 char wrd[128];
 if (f=fopen("ord.txt","r")) {
 FILE *fw;
#+END_SRC
```

Everything is written both to output.txt *and* standard out. Anything else would
be stupid.

#+BEGIN_SRC c
```c
 fw=fopen("output.txt","w");
 printf("Ord dannet af \"%s\":\n\n",os);
 fprintf(fw,"Ord dannet af \"%s\":\n\n",os);
 int wc=0;
 while(!feof(f)) {
 if(fgets(wrd,126,f)) {
#+END_SRC
```

The words each end with a newline, so let's replace the newline with an
end-of-string mark.

#+BEGIN_SRC c
```c
 wrd[strlen(wrd)-1]=0;
 //printf("%s\n",wrd);
 k=0;
 ii=0;
 while (ii<ri && k==0) {
#+END_SRC
```
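
`wrd[strlen(wrd)-1]=0;` assumes that every line really ends in a newline. A last line without one loses its final letter, and if the string were ever empty the index would be out of bounds. A sketch of a more forgiving version using the standard `strcspn` (the helper name `strip_newline` is made up):

```c
#include <string.h>

/* Cuts the string at its first newline, if there is one.
   Strings without a newline, including empty ones, are left unchanged. */
void strip_newline(char *line) {
    line[strcspn(line, "\n")] = '\0';
}
```

`strcspn` returns the length of the leading part that contains no newline, so when no newline is present the assignment simply overwrites the existing terminator with itself.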

The magical core of the matching logic, using our own ~strpos~:

#+BEGIN_SRC c
```c
 if (strpos(ra[ii],wrd)>-1) {k=1;}
#+END_SRC
```

If ~k == 1~, something good happens. But it doesn't happen at once for some
reason.

#+BEGIN_SRC c
```c
 ii++;
 }
 if (k==1) {
@@ -277,17 +273,17 @@ reason.
 }
 return 0;
}
#+END_SRC
```

And that's my pretty C code.


* The SML equivalent
## The SML equivalent

To make my inefficient algorithm a bit clearer, I have made a few SML functions
to do the same as above:

#+BEGIN_SRC ocaml
```ocaml
open List

(* Removes an element from a list. *)
@@ -334,11 +330,11 @@ fun findMatchingWords word wordList =
                   exists (fn word => word = testWord)
                          wordPermutations) wordList
    end
#+END_SRC
```

As well as some SML functions to calculate the number of permutations and bytes:

#+BEGIN_SRC ocaml
```ocaml
(* Calculates the factorial. *)
fun factorial 0 = 1
  | factorial n = n * factorial (n - 1)
@@ -364,14 +360,12 @@ fun nPermutations len = foldl op+ 0 (map (fn n => factorial n * binomc len n)
fun nSize len = 8 * len + foldl op+ 0 (
                map (fn n => (n + 1) * factorial n * binomc len n)
                    (upTo 1 len))
#+END_SRC
```
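
The byte estimates quoted earlier can be roughly reproduced from `nSize`. Here is a C sketch of the same sum, assuming that `upTo 1 len` means the integers 1 through `len` inclusive and that `factorial n * binomc len n` is the number of ordered selections of `n` letters out of `len`. The function name `n_size` is mine.

```c
#include <stdint.h>

/* 8 * len plus the sum over n = 1..len of (n + 1) * len! / (len - n)!.
   No overflow check; uint64_t is wide enough for the len = 17 case
   mentioned in the text. */
uint64_t n_size(unsigned len) {
    uint64_t total = 8u * (uint64_t)len;
    uint64_t arrangements = 1;   /* len! / (len - n)!, built up step by step */
    for (unsigned n = 1; n <= len; n++) {
        arrangements *= (uint64_t)(len - n + 1);
        total += (uint64_t)(n + 1) * arrangements;
    }
    return total;
}
```

For a word length of 10 this gives 98,641,090 bytes, in line with the "about 100 MB" figure.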

* The alternative
## The alternative

Preprocess the dictionary into a clever data structure and don't use up all the
memory.
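
One simple way to avoid generating permutations at all, not necessarily the data structure the author had in mind, is to compare letter counts: a dictionary word can be spelled from the input exactly when it needs no letter more often than the input supplies it. A sketch in C, where `can_form` is a hypothetical helper:

```c
#include <limits.h>

/* Returns 1 if word can be spelled using the characters of letters,
   each at most as often as it occurs there, else 0.
   Memory use is a fixed table, independent of the word length. */
int can_form(const char *letters, const char *word) {
    int counts[UCHAR_MAX + 1] = {0};
    const unsigned char *p;

    for (p = (const unsigned char *)letters; *p != '\0'; p++) {
        counts[*p]++;                 /* tally the available letters */
    }
    for (p = (const unsigned char *)word; *p != '\0'; p++) {
        if (--counts[*p] < 0) {
            return 0;                 /* word needs a letter we ran out of */
        }
    }
    return 1;
}
```

Each dictionary line then costs one pass over its characters, so a 17-letter input needs a few hundred bytes rather than petabytes. Note that an empty word trivially counts as a match, so a caller would probably want to skip blank lines.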


#&line

Originally published [[http://dikutal.dk/artikler/old-junk-code-word-finder][here]].
Originally published
[here](http://dikutal.metanohi.name/artikler/old-junk-code-word-finder).