Tag Archives: database

Database: Modeling Life and Data

24 Jun


nosql:

Michael Will:

The proper representation of life is not tabular, but associative. The structure of life is not relational, but hierarchical. Relation is a poor term that falls far short of capturing dynamic connections. […] Shoehorning life science into relational databases is a very lossy process.

— ☞ Cassandra for Life Science (pdf)

I wholeheartedly agree! But from this perspective it looks like graph databases come closest to modeling real life.

If you haven’t checked out nosql on tumblr, please do:

http://nosql.mypopescu.com

Databases are hammers; MapReduce is a screwdriver.

11 Jan

From: http://scienceblogs.com/goodmath/2008/01/databases_are_hammers_mapreduc.php

A bunch of people have sent me links to an article about MapReduce. I’ve hesitated to write about it, because the currently hyped MapReduce stuff was developed, and extensively used by Google, my employer. But the article is really annoying, and deserves a response. So I’m going to be absolutely clear. I am not commenting on this in my capacity as a Google employee. (In fact, I’ve never actually used MapReduce at work!) This is strictly on my own time, and it’s purely my own opinion. If it’s the dumbest thing you’ve ever read, that’s my fault, not Google’s. If it’s the most brilliant thing you’ve ever read, that’s my credit, not Google’s. I wasn’t asked to write this by Google, and I didn’t ask their permission, either. This is just me, the annoying geek behind this blog, writing solely on my own behalf, speaking for no one but me. Got it?

Check It Out:
Good Math, Bad Math : Databases are hammers; MapReduce is a screwdriver.

news: initial xtradb vs innodb benchmarks are in

22 Dec

From: http://www.mysqlperformanceblog.com/2008/12/18/xtradb-benchmarks-15x-gain/

I guess it is first reaction on new storage engine – show me benefits. So there is benchmark I made on one our servers. It is Dell 2950 with 8CPU cores and RAID10 on 6 disks with BBU, and 32GB RAM on board with CentOS 5.2 as OS. This is quite typical server we recommend to run MySQL on. What is important I used Noop IO scheduler, instead of default CFQ. Disclaimer: Please note you may not get similar benefits on less powerful servers, as most important fixes in XtraDB are related to multi-core and multi-disks utilization. Also results may be different if load is CPU bound.

Check It Out:
XtraDB benchmarks – 1.5X gain in IO-bound load | MySQL Performance Blog

news: Percona announces XtraDB, an InnoDB replacement for mysql

17 Dec

From: http://www.mysqlperformanceblog.com/2008/12/16/announcing-percona-xtradb-storage-engine-a-drop-in-replacement-for-standard-innodb/

Today we officially announce our new storage engine, “Percona XtraDB“, which is based on the InnoDB storage engine. It’s 100% backwards-compatible with standard InnoDB, so you can use it as a drop-in replacement in your current environment. It is designed to scale better on modern hardware, and includes a variety of other features useful in high performance environments.

Check It Out:
Announcing Percona XtraDB Storage Engine: a Drop-in Replacement for Standard InnoDB | MySQL Performance Blog

wordpress hooks database

15 Dec

Sadly, WordPress doesn’t maintain a complete list of all action and filter hooks. Lucky for us, Adam Brown does.

From: http://adambrown.info/p/wp_hooks

If you’re a plugin developer, you know how difficult it can be to figure out which hooks are available. This WordPress hooks database automatically scans each WP build for apply_filters() and do_action() to figure out exactly which hooks are available in each version and where the hooks occur.

If you don’t know what WordPress hooks are for, read the Plugin API.
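To get a feel for what such a scan involves, here is a minimal sketch in Python. It is not Adam Brown's scanner; the regular expression, the `find_hooks` helper, and the sample PHP fragment are all made up for illustration, and it only catches hook names written as plain string literals.

```python
import re

# Matches calls like do_action('init') or apply_filters( "the_content", ... )
# and captures the function name and the literal hook name.
HOOK_CALL = re.compile(
    r"""\b(do_action|apply_filters)\s*\(\s*(['"])([^'"]+)\2"""
)


def find_hooks(php_source: str) -> list[tuple[str, str, int]]:
    """Return (kind, hook_name, line_number) for each hook call found."""
    hooks = []
    for line_number, line in enumerate(php_source.splitlines(), start=1):
        for match in HOOK_CALL.finditer(line):
            kind = "action" if match.group(1) == "do_action" else "filter"
            hooks.append((kind, match.group(3), line_number))
    return hooks


# Hypothetical PHP fragment, not taken from any real WordPress build.
sample = """<?php
do_action('init');
$content = apply_filters( "the_content", $content );
"""

for kind, name, line_number in find_hooks(sample):
    print(f"{kind:6} {name} (line {line_number})")
```

A real scanner would also have to deal with hook names built from variables, which a simple pattern like this one skips.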

Check It Out:
WordPress hooks database – action hooks and filters for all wp versions || Adam Brown, BYU Political Science

facebook shares their memcached patches and linux tweaks

12 Dec

Facebook so rocks: they have not only posted their memcached patches, but also explained what they needed to change in linux to make proper use of them. facebook +1; myspace, well, they still suck.

From: http://www.facebook.com/note.php?note_id=39391378919

If you’ve read anything about scaling large websites, you’ve probably heard about memcached. memcached is a high-performance, distributed memory object caching system. Here at Facebook, we’re likely the world’s largest user of memcached. We use memcached to alleviate database load. memcached is already fast, but we need it to be faster and more efficient than most installations. We use more than 800 servers supplying over 28 terabytes of memory to our users. Over the past year as Facebook’s popularity has skyrocketed, we’ve run into a number of scaling issues. This ever increasing demand has required us to make modifications to both our operating system and memcached to achieve the performance that provides the best possible experience for our users.

Check It Out:
Engineering @ Facebook’s Notes

couchdb: couchdb 101

11 Dec

So after 3-4 days of research and study I’m compiling a list of links that helped me finally understand couchdb. Still bunches to learn, but hopefully it will save others from 4 days of googling.

Start

The very first thing you should read is the work-in-progress online couchdb book:
Relax with CouchDB [http://books.couchdb.org/relax/]

Now that you've started

The following sections are grouped by the topic each article left me understanding better. They may cover other areas too, but then again, more knowledge leads to better understanding, right?

JSON

Just in case you don’t understand json:

http://webt.wordpress.com/2007/10/01/json/

Couch MapReduce

fyi: hashes

Depending on what language you're coming from, you may know hashes as arrays or associative arrays. When they say reduce returns a single value, that single value can itself be a hash (I scratched my head over this for a while).

If you're coming from php, an easy way to connect the dots is to think of how serialize creates a string that represents your object. Only in couchdb, this is a json string.
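Here is a small sketch of both points, written in Python rather than the JavaScript that couchdb views actually use. The documents and the `merge_stats` helper are hypothetical; the point is only that a reduce step hands back one value, and that one value can be a hash that serializes to a json string.

```python
import json
from functools import reduce

# Hypothetical documents; in couchdb each would be stored as a json object.
docs = [
    {"_id": "a", "type": "post", "words": 120},
    {"_id": "b", "type": "post", "words": 80},
    {"_id": "c", "type": "comment", "words": 15},
]


def merge_stats(stats, doc):
    """Fold one document into the running summary and return the summary."""
    return {
        "count": stats["count"] + 1,
        "words": stats["words"] + doc["words"],
    }


# The reduce step yields exactly one value, but that value is a hash.
summary = reduce(merge_stats, docs, {"count": 0, "words": 0})

# Serializing the hash gives its json string form, much like php's serialize().
as_json = json.dumps(summary, sort_keys=True)
print(as_json)

# Parsing the string gives back an equal hash.
restored = json.loads(as_json)
```

Going from hash to string and back loses nothing here, which is why passing json around feels so close to passing the hash itself.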

I had my eureka moment here:

http://www.ibuildings.com/blog/archives/1291-Some-thoughts-on-CouchDB.html

Just in case you didn’t eureka:

http://rrees.wordpress.com/2008/03/09/couchdb-querying-data/

Damien Katz explains more on couch’s mapreduce (check the part 2 near the end as well):

http://damienkatz.net/2008/02/incremental_map.html

MapReduce Method

In case you didn’t know, mapreduce isn’t something couch invented; you can learn more about it below.

explains the mapreduce method in detail:

http://code.google.com/edu/parallel/mapreduce-tutorial.html#MapReduce

the mapreduce white paper:

http://labs.google.com/papers/mapreduce.html

mapreduce lecture (didn’t watch, but it was recommended by google, so why not):

http://www.youtube.com/v/-vD6PUdf3Js
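To make the method behind those links concrete, here is a minimal single-machine sketch of the classic word count example in Python. The function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample documents are my own; a real MapReduce framework runs the same three steps spread across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


documents = ["the quick brown fox", "the lazy dog", "the fox"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts["the"], counts["fox"], counts["dog"])
```

Each map call only looks at one document and each reduce call only looks at one key, which is what lets the work be split up across machines.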

using couchdb

blog db example / couchdb “joins”:

http://www.cmlenz.net/archives/2007/10/couchdb-joins

user permissions system example / offers rdbms comparison:

http://kore-nordmann.de/blog/couchdb_a_use_case.html

aimee’s 8+ part couchdb on rails series (links to part 1, but you're good from there):

http://aimee.mychores.co.uk/2008/09/07/post/320/

couchdb internals

Ricky Ho’s overview:

http://horicky.blogspot.com/2008/10/couchdb-implementation.html

Related discussion where btrees are further discussed (as well as some decent bantering):

http://www.reddit.com/r/programming/comments/792hf/couchdb_implementation/

Ayende Rahien has an in-depth series on couchdb called "reading erlang":

http://ayende.com/Blog/archive/2008/09/24/reading-erlang-inspecting-couchdb.aspx

http://ayende.com/Blog/archive/2008/09/24/more-couchdb-reading-btreelookup.aspx

http://ayende.com/Blog/archive/2008/09/24/more-couchdb-reading-btreequery_modify.aspx

http://ayende.com/Blog/archive/2008/10/04/reading-erlang-couchdb-from-rest-to-disk-in-a.aspx

http://ayende.com/Blog/archive/2008/10/04/erlang-reading-couchdb-digging-down-to-disk.aspx

http://ayende.com/Blog/archive/2008/10/06/reading-eralng-couchdb-streams.aspx

The most important part

Use it, damn it! Get planning, hacking, pop locking, and start playing with couch.

sources

Sources not referenced already:

http://damienkatz.net/2008/09/peek_into_couchdb.html

http://jan.prima.de/plok/

7 Reasons why MySQL Quality will never be the same

11 Dec

mysqlperformanceblog has a well-written post discussing why mysql quality will never be what it once was. I'm honestly more attracted to drizzle than mysql these days, but it's still a good read.

I had a call with Monty the other day and I told him why I think MySQL Server Quality will never be the same again. I’ve been thinking a bit more about it and here is the extended list.

In particular I think MySQL Server will never be able to reach its original quality guidelines (see previous post) and even current release criterias will unlikely be ever reached with any sensible definition of what serious bugs are.

7 Reasons why MySQL Quality will never be the same | MySQL Performance Blog

couchdb: mapReduce explained

10 Dec

So far the two biggest hurdles in my understanding of couchdb are mapReduce and how best to use data in it. Thanks to Michael Stillwell, I finally understand mapReduce.

From: http://www.ibuildings.com/blog/archives/1291-Some-thoughts-on-CouchDB.html

A few weeks ago Jan Lehnardt of CouchDB came by to Ibuildings UK and gave a talk about the project. CouchDB is a database that’s designed to be highly scalable, in terms of, well pretty much everything really: the amount of data it can handle, and the number of CPUs the distributed server can efficiently use in parallel, the number of concurrent clients. (Note though that CouchDB sits at version 0.8.1 at the moment, and many of the scalability features either haven’t been implemented, or haven’t been tested.) What follows is a short description of how CouchDB works, including the unusual MapReduce-powered database query technique, as gleaned from the talk and a few days’ worth of playing with it.

Check It Out:
Ibuildings – Ibuildings blogs > Some thoughts on CouchDB

damn you web goodness

10 Dec

So now I’m stuck wondering… which db do I use for the new backend? I have a few months before it's actually used, and I’ve seriously been wanting to try out mysql with wafflegrid, but couchdb is just awesome…

Maybe I should just do both and hope I can toss two more servers into the mix before it actually launches…