Update site.
@@ -1,14 +1,10 @@
#+title: Old junk code: Word finder
#+summary: Less than perfect C code
#+license: wtfpl, unless otherwise noted
#+startup: showall
#&toc
---
abstract: Less than perfect C code
---

# Old junk code: Word finder

* Old junk code: Word finder

#+caption: Based on [[https://commons.wikimedia.org/wiki/File:2001-91-1_Computer,_Laptop,_Pentagon_(5891422370).jpg][this]], CC BY 2.0
#&img;url=sadcomputer.png, float=right
![Based on [this](https://commons.wikimedia.org/wiki/File:2001-91-1_Computer,_Laptop,_Pentagon_(5891422370).jpg), CC BY 2.0](sadcomputer.png)

If you ever get tired of looking at your own junk code, take a look at this.

@@ -39,46 +35,46 @@ store the list of words on the stack instead of in memory, so words with length
In any case, a word length of 10 would require about 100 MB, a word length of 11
about 1.2 GB, a word length of 12 about 15.6 GB, and a word length of 17 (like
"inconspicuousness") about 16.5 Petabytes (16500000 GB). That's 6.5 Petabytes
*more* than [[http://archive.org/web/petabox.php][what the Internet Archive uses]] to store millions of websites, books,
video and audio.
*more* than [what the Internet Archive uses](http://archive.org/web/petabox.php)
to store millions of websites, books, video and audio.

So perhaps neither my algorithm nor my implementation was that good.


* The code
## The code

Note that this code doesn't actually compile, because of all the wrong
code. However, it did compile back in 2008, which means that either I added the
wrong code after I had compiled it, or I used an overfriendly compiler (I don't
remember which compiler it was, but it ran on Windows). I have run the old
executable with ~wine~, and that works.
executable with `wine`, and that works.

It's not necessary to know C to laugh at this code, but it helps.

We'll start with some basic ~#include~s.

#+BEGIN_SRC c
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
#+END_SRC
```

So far, so good. Then the global variables with descriptive names. And let's
declare four strings of length 0 to be statically allocated, because we'll just
extend them later on...?

#+BEGIN_SRC c
```c
char os[0],s[0],r[0],t[0];
int l,c,rc,k,sk,i,ii,iii,ri;
#+END_SRC
```

The next step is to define our own version of C's builtin ~strstr~ function
The next step is to define our own version of C's builtin `strstr` function
(almost). I was used to PHP, so I wanted the same return values as PHP's
~strpos~.
`strpos`.

#+BEGIN_SRC c
```c
int strpos (const char *haystack, const char *needle) {
int i;

@@ -92,14 +88,14 @@ int strpos (const char *haystack, const char *needle) {

return -1;
}
#+END_SRC
```
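
For comparison, the same "index or -1" behaviour can be sketched as a thin wrapper around the standard `strstr`. This is not the original implementation (most of its body is elided in the diff above), and the name `strpos_sketch` is made up here to avoid clashing with it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the original program: returns the index of
   the first occurrence of needle in haystack, or -1 if there is none. */
int strpos_sketch(const char *haystack, const char *needle) {
    const char *hit = strstr(haystack, needle);
    if (hit == NULL) {
        return -1;
    }
    return (int)(hit - haystack);
}
```

With this, `strpos_sketch("hello", "ll")` gives 2 and `strpos_sketch("hello", "z")` gives -1.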

Then it's time for the main function. We don't want to separate it into
auxiliary functions, because that's just ugly!

Indentation? Too much wastes too much space.

#+BEGIN_SRC c
```c
int main(int argc, char *argv[])
{
if (argc>1) {
@@ -114,7 +110,7 @@ int main(int argc, char *argv[])
for(i=0;s[i];i++) {
s[i]=tolower(s[i]);
}
#+END_SRC
```

Wait, what? We use ~strcpy~ to copy the string ~argv[1]~, which contains the
word we want to permute, into the statically allocated ~os~ with length 0? Or we
@@ -123,51 +119,51 @@ That's... not good.

At least these two lines aren't that bad.

#+BEGIN_SRC c
```c
l=strlen(s);
c=pow(l,l);
#+END_SRC
```

But then begins the actual permutation generation logic. I have tried to
re-understand it, with no success.

#+BEGIN_SRC c
```c
rc=1;
i=0;
while (i<l-1) {
rc=rc*(l-i);
i++;
}
#+END_SRC
```
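
Read on its own, the loop above multiplies `rc` by `l`, `l-1`, and so on down to `2`, so it appears to compute `l!`, the number of orderings of `l` letters. A minimal sketch of the same computation as a function follows; the name `count_orderings` and the wider return type are assumptions, not part of the original.

```c
/* Hypothetical helper mirroring the loop above: returns l! for l >= 0. */
unsigned long long count_orderings(int l) {
    unsigned long long rc = 1;
    for (int i = 0; i < l - 1; i++) {
        rc *= (unsigned long long)(l - i);
    }
    return rc;
}
```

For a three-letter word this gives 6, and for a ten-letter word 3628800.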

While we're at it, why not declare two to-be-statically-allocated arrays with
dynamically-generated ints as lengths?

#+BEGIN_SRC c
```c
int ca[l];
char ra[rc][l+1];
#+END_SRC
```
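
Arrays whose length is only known at run time are variable-length arrays in C99, and they normally live on the stack, which is presumably why longer words run out of room so quickly. One hedged alternative is to put the table on the heap and zero it, so every row starts out as a valid empty string. The helper name `alloc_word_table` and the flat layout are assumptions for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates a zeroed table of `rows` strings, each with
   room for `len` characters plus a terminator. Row r starts at
   table + r * (len + 1). Returns NULL if the allocation fails. */
char *alloc_word_table(size_t rows, size_t len) {
    return calloc(rows, len + 1);
}
```

The caller is responsible for checking the result against `NULL` and for calling `free` on it afterwards.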

And then some more assignments and ~while~ loops...

#+BEGIN_SRC c
```c
ri=0;
i=0;
while (i<c) {
k=1;
ii=0;
while (ii<l && k==1) {
#+END_SRC
```

This formula does something. I'm not sure what.

#+BEGIN_SRC c
```c
ca[ii]=floor(i/pow(l,l-ii-1))-floor(i/pow(l,l-ii))*l;
#+END_SRC
```
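
As far as I can tell, the formula treats the counter `i` as an `l`-digit number in base `l` and picks out digit number `ii`, counting from the most significant end. Here is a sketch of that reading using integer arithmetic instead of `pow` and `floor`; the names `int_pow` and `base_digit` are made up for this example.

```c
/* Hypothetical helper: base^exp for exp >= 0, using integers only. */
static unsigned long long int_pow(unsigned long long base, int exp) {
    unsigned long long result = 1;
    while (exp-- > 0) {
        result *= base;
    }
    return result;
}

/* Digit number ii (0 = most significant) of i, written with l digits in
   base l. Equivalent to floor(i / l^(l-ii-1)) - floor(i / l^(l-ii)) * l. */
int base_digit(unsigned long long i, int l, int ii) {
    unsigned long long shifted = i / int_pow((unsigned long long)l, l - ii - 1);
    return (int)(shifted % (unsigned long long)l);
}
```

With `l = 3`, the counter value 5 is `012` in base 3, so the digits come out as 0, 1 and 2. If this reading is right, the surrounding loops count through all `l^l` digit strings and throw away every one with a repeated digit, which would explain the `c=pow(l,l)` bound.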

More ~while~ loops, now also with ~if~ statements.

#+BEGIN_SRC c
```c
iii=0;
while (iii<ii) {
if (ca[ii]==ca[iii]) {k=0;}
@@ -180,27 +176,27 @@ More ~while~ loops, now also with ~if~ statements.
ii=0;
while (ii<l) {
strncpy(t,s+ca[ii],1);
#+END_SRC
```

Let's concatenate ~t~ onto ~ra[ri]~, a string which hardly exists due to the
~char ra[rc][l+1];~ magic above.
Let's concatenate `t` onto `ra[ri]`, a string which hardly exists due to the
`char ra[rc][l+1];` magic above.

#+BEGIN_SRC c
```c
strcat(ra[ri],t);
ii++;
}
#+END_SRC
```

And why not concatenate an end-of-string mark onto a string which, if it
doesn't have an end-of-string mark, will make ~strcat~ fail miserably?

#+BEGIN_SRC c
```c
strcat(ra[ri],"\0");
#+END_SRC
```
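
To spell out the problem: as far as `strcat` is concerned, `"\0"` is just an empty string, so the call appends nothing and cannot supply a terminator that is missing. A sketch of building one permutation by index and terminating it explicitly could look like this; `build_permutation` and its parameters are hypothetical names, not taken from the original.

```c
#include <stddef.h>

/* Hypothetical helper: copies the letters of s selected by order[0..n-1]
   into out, which must have room for n + 1 bytes. */
void build_permutation(char *out, const char *s, const int *order, size_t n) {
    for (size_t j = 0; j < n; j++) {
        out[j] = s[order[j]];
    }
    out[n] = '\0'; /* terminate explicitly instead of strcat(out, "\0") */
}
```

Given the word `"abc"` and the index order 2, 0, 1, the output buffer ends up holding `"cab"`.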

And then more junk.

#+BEGIN_SRC c
```c
sk=1;
ii=0;
while (ii<ri && sk==1) {
@@ -215,51 +211,51 @@ And then more junk.
i++;
}
//printf("\nOrd: %s\nOrdl\x91ngde: %d\nOrdkombinationer: %d\n",os,l,ri);
#+END_SRC
```

Phew... At this point, I'm certain that ~ra~ is supposed to be an array of all
word permutations. So let's open our dictionary "ord.txt" and look for matches.

#+BEGIN_SRC c
```c
FILE *f;
char wrd[128];
if (f=fopen("ord.txt","r")) {
FILE *fw;
#+END_SRC
```

Everything is written both to output.txt *and* standard out. Anything else would
be stupid.

#+BEGIN_SRC c
```c
fw=fopen("output.txt","w");
printf("Ord dannet af \"%s\":\n\n",os);
fprintf(fw,"Ord dannet af \"%s\":\n\n",os);
int wc=0;
while(!feof(f)) {
if(fgets(wrd,126,f)) {
#+END_SRC
```

The words each end with a newline, so let's replace the newline with an
end-of-string mark.

#+BEGIN_SRC c
```c
wrd[strlen(wrd)-1]=0;
//printf("%s\n",wrd);
k=0;
ii=0;
while (ii<ri && k==0) {
#+END_SRC
```
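
Note that `wrd[strlen(wrd)-1]=0` chops off the last character whether or not it is a newline, so a final line without one loses a letter, and an empty string would in principle be indexed before its start. A more forgiving sketch uses the standard `strcspn`; the helper name `strip_newline` is an assumption for this example.

```c
#include <string.h>

/* Hypothetical helper: cuts the line at the first '\n' or '\r', if any,
   and leaves it untouched otherwise. */
void strip_newline(char *line) {
    line[strcspn(line, "\r\n")] = '\0';
}
```

This turns `"word\n"` and `"word\r\n"` into `"word"` and leaves `"word"` as it is.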

The magical core of the matching logic, using our own ~strpos~:

#+BEGIN_SRC c
```c
if (strpos(ra[ii],wrd)>-1) {k=1;}
#+END_SRC
```

If ~k == 1~, something good happens. But it doesn't happen at once for some
reason.

#+BEGIN_SRC c
```c
ii++;
}
if (k==1) {
@@ -277,17 +273,17 @@ reason.
}
return 0;
}
#+END_SRC
```

And that's my pretty C code.


* The SML equivalent
## The SML equivalent

To make my inefficient algorithm a bit clearer, I have made a few SML functions
to do the same as above:

#+BEGIN_SRC ocaml
```ocaml
open List

(* Removes an element from a list. *)
@@ -334,11 +330,11 @@ fun findMatchingWords word wordList =
exists (fn word => word = testWord)
wordPermutations) wordList
end
#+END_SRC
```

As well as some SML functions to calculate the number of permutations and bytes:

#+BEGIN_SRC ocaml
```ocaml
(* Calculates the factorial. *)
fun factorial 0 = 1
| factorial n = n * factorial (n - 1)
@@ -364,14 +360,12 @@ fun nPermutations len = foldl op+ 0 (map (fn n => factorial n * binomc len n)
fun nSize len = 8 * len + foldl op+ 0 (
map (fn n => (n + 1) * factorial n * binomc len n)
(upTo 1 len))
#+END_SRC
```
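
For readers more at home in C, here is a rough port of `nSize`. It assumes that `binomc len n` is the binomial coefficient (its definition is elided in the diff), so that `factorial n * binomc len n` equals `len! / (len - n)!`. The name `n_size` is made up for this sketch.

```c
/* Hypothetical C port of nSize: 8 * len plus the sum over n = 1..len of
   (n + 1) * len! / (len - n)!. Stays within 64 bits for the word lengths
   discussed here (up to 17). */
unsigned long long n_size(int len) {
    unsigned long long total = 8ULL * (unsigned long long)len;
    unsigned long long arrangements = 1; /* len! / (len - n)! */
    for (int n = 1; n <= len; n++) {
        arrangements *= (unsigned long long)(len - n + 1);
        total += (unsigned long long)(n + 1) * arrangements;
    }
    return total;
}
```

Under that assumption, `n_size(10)` comes out at 98641090 bytes, in line with the "about 100 MB" quoted for a word length of 10.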

* The alternative
## The alternative

Preprocess the dictionary into a clever data structure and don't use up all the
memory.
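
One simple option, which is not necessarily the data structure meant above, is to skip the permutation table entirely and compare letter counts: a dictionary word can be formed if it never needs more copies of a letter than the input word provides. The sketch below assumes plain ASCII letters, so Danish letters such as æ, ø and å would need extra handling; `can_form` is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true if candidate can be spelled using each letter of
   letters at most once. Only ASCII a-z is counted; other bytes never match. */
bool can_form(const char *letters, const char *candidate) {
    int available[26] = {0};
    for (const char *p = letters; *p; p++) {
        int ch = tolower((unsigned char)*p);
        if (ch >= 'a' && ch <= 'z') {
            available[ch - 'a']++;
        }
    }
    for (const char *p = candidate; *p; p++) {
        int ch = tolower((unsigned char)*p);
        if (ch < 'a' || ch > 'z' || available[ch - 'a']-- == 0) {
            return false;
        }
    }
    return true;
}
```

Each dictionary line then costs one pass over its letters, and memory use stays constant no matter how long the input word is.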


#&line

Originally published [[http://dikutal.dk/artikler/old-junk-code-word-finder][here]].
Originally published
[here](http://dikutal.metanohi.name/artikler/old-junk-code-word-finder).