Programming discussions since 1842

Monday, December 24, 2012

What Databases Fix

Paul Graham showed an interesting way to compare programming languages: to describe each in terms of the problem it fixes. I have found out that this also works well for databases:

Oracle's DB: There are no enterprise databases

MySQL: Oracle isn't open-source.

MariaDB: We like MySQL but Oracle is evil.

PostgreSQL: MySQL doesn't have enough options.

SQLite: Let's just embed everything. Also, 4 datatypes should be enough for everyone.

MongoDB: Why do we need joins and schemas?

CouchDB: Why do we need collections?

Redis: Why do we need documents?

Memcached: Why do we need harddrives?

Neo4j: SQL doesn't have enough relations.

Bigtable: MongoDB isn't webscale enough.

Hbase: Bigtable isn't open-source.

Cassandra: Bigtable wasn't made at Facebook.

Riak: Cassandra wasn't written in Erlang.

OrientDB: Let's include everything in the same database!

Monday, August 8, 2011

Trigram string searching

String searching and sorting is another one of my unexplainable favorite programming pasttimes. I was reading the interesting but tragically short article on the subject on wikipedia when I stumbled upon a special technique for fuzzy string searching known as Trigram string searching. What's that? Well, this is what the (entire) article had to say:
Trigram search is a powerful method of searching for text when the exact syntax or spelling of the target object is not precisely known. It finds objects which match the maximum number of three-character strings in the entered search terms, i.e. near matches. A threshold can be specified as a cutoff point, after which a result is no longer regarded as a match.
Bit short for such an important technique. This is the algorithm that manages to successfully turn up the search results for "Programming" when you spelled it "Porgraming".

The lack of a real implementation to go with the article irritated me. "That can't require very much coding, can it?"

Nope, It didn't!

Saturday, August 6, 2011

Challenge: Read a file of unknown size

I'm not sure this can be considered a "real world programming problem" but it is a "fun" and important challenge for any would be programmer and a good measurement of whether you have a good grasp of memory management.

The task is as follows:
You have a text file containing an unknown number of rows of plain text. You are to parse this file and read each row into a string in a two-dimensional array (aka a string matrix). At your disposal you have a statically typed language, like C. Each row may be of varying length.

As for the restrictions, you may only traverse the file once. You may not count the number of rows manually. You may not use a linked-list with strings.

Are you up for it? Grab this textfile and go!

The solution is below the jump. But you solved it yourself, right?