22 October 2013
Dr. David DeWitt recently presented a keynote (video, slides) for PASS Summit 2013 on the new Hekaton query engine. I was impressed by how the new engine design is rooted in basic engineering principles.
Software engineers and IT staff are bound by the economics and practicalities of the computing industry. These trends define what we can reasonably do.
Peter Norvig, Director of Research at Google, famously wrote Numbers Every Programmer Should Know, describing the latency of different operations.
When a CPU is doing work, the job of the rest of the computer is to feed it data and instructions. Reading 1 MB of data sequentially from memory is roughly 80 times faster than reading it sequentially from disk.
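For a rough sense of the gap, here is a back-of-envelope sketch using the commonly cited figures from that list (the exact values vary by hardware generation and by which version of the list you read):

```java
// Back-of-envelope math using the commonly cited latency figures
// (approximate; they vary by hardware generation).
public class LatencyNumbers {
    static final long MAIN_MEMORY_REFERENCE_NS = 100;         // one RAM access
    static final long READ_1MB_FROM_MEMORY_NS  = 250_000;     // 1 MB sequential from RAM
    static final long DISK_SEEK_NS             = 10_000_000;  // one disk seek
    static final long READ_1MB_FROM_DISK_NS    = 20_000_000;  // 1 MB sequential from disk

    public static void main(String[] args) {
        System.out.println("1 MB sequential, RAM vs disk: "
                + (READ_1MB_FROM_DISK_NS / READ_1MB_FROM_MEMORY_NS) + "x");  // ~80x
        System.out.println("RAM access vs disk seek:      "
                + (DISK_SEEK_NS / MAIN_MEMORY_REFERENCE_NS) + "x");          // ~100,000x
    }
}
```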
The recent hype has been “in-memory” technology. These products are built on a basic fact: RAM is far, far faster than disk or network.
“In-memory” means “stored in RAM”. It’s hard to market “stored in RAM” as the new hotness when it’s been around for decades.
The price of CPU cycles has dropped dramatically. So has the cost of basic storage and RAM.
You can buy a 10-core server with 1 terabyte of RAM for $50K. That’s cheaper than hiring a single developer or DBA. It is now cost effective to fit database workloads entirely into memory.
I can write code that is infinitely fast, has 0 bugs, and is infinitely scalable. How? By removing it.
The best way to make something faster is to have it do less work.
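As a trivial sketch of that principle (the example is mine, not from the keynote), the fastest version of an expensive step is the one the hot path never executes:

```java
import java.util.List;
import java.util.regex.Pattern;

public class DoLessWork {
    // Slow: recompiles the same regex on every single row.
    static long countMatchesNaive(List<String> rows, String regex) {
        long hits = 0;
        for (String row : rows) {
            if (Pattern.compile(regex).matcher(row).find()) hits++;
        }
        return hits;
    }

    // Faster: the expensive step runs exactly once, so the loop does less work.
    static long countMatches(List<String> rows, String regex) {
        Pattern compiled = Pattern.compile(regex);
        long hits = 0;
        for (String row : rows) {
            if (compiled.matcher(row).find()) hits++;
        }
        return hits;
    }
}
```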
CPU scaling is running out of headroom. Even if Moore’s Law isn’t ending, it has transformed into something less useful. Single-threaded performance hasn’t improved in some time. The current trend is to add cores.
What software and hardware companies have done is add support for parallel and multicore programming. Unfortunately, parallel programming is notoriously difficult, and runs head-first into a painful problem:
As the amount of parallel code increases, the serial part of the code becomes the bottleneck.
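That painful problem is Amdahl’s Law: total speedup is capped by the serial fraction of the work, no matter how many cores you throw at it. A quick sketch of the math:

```java
public class Amdahl {
    // Amdahl's Law: with parallel fraction p and n cores,
    // speedup = 1 / ((1 - p) + p / n).
    static double speedup(double p, int cores) {
        return 1.0 / ((1.0 - p) + p / cores);
    }

    public static void main(String[] args) {
        // Even with 95% of the work parallelized, 32 cores deliver only ~12.5x,
        // and the ceiling with infinite cores is 1 / 0.05 = 20x.
        System.out.printf("95%% parallel, 32 cores:   %.1fx%n", speedup(0.95, 32));
        System.out.printf("95%% parallel, 1024 cores: %.1fx%n", speedup(0.95, 1024));
    }
}
```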
Luckily for us, truly brilliant people, like Dr. Maurice Herlihy, have worked out how to build lock-free and wait-free data structures that scale without those serial bottlenecks.
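The core building block of that work is non-blocking synchronization via compare-and-swap (CAS). Here is a minimal sketch of the pattern; Hekaton’s internals are of course far more sophisticated, but the idea of never blocking on a lock is the same:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of lock-free programming: no thread ever blocks holding a
// lock; each thread retries a compare-and-swap (CAS) until its update wins.
public class LockFreeCounter {
    private final AtomicLong value = new AtomicLong();

    public long increment() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            // CAS succeeds only if no other thread changed the value in between.
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // Lost the race: another thread got there first, so retry.
        }
    }
}
```

In practice you would just call `incrementAndGet()`, but the explicit loop shows the retry-until-success pattern that replaces waiting on a lock.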
“Big Data” is all the rage nowadays. The number and quality of sensors have increased dramatically, and people are putting more of their information online. A few places have truly big data, like Facebook, Google or the NSA.
For most companies, however, the volume of quality data isn’t growing at anywhere near that pace. I see this all the time: OLTP databases grow far more slowly than their associated ‘big data’ click-streams.
Systems are not upgraded quickly. IT professionals live with a hard truth: change brings risk. For existing systems the benefit of change must outweigh the cost.
Many NoSQL deployments happen at new companies or in new architectures precisely because there is no existing (and presumably working) system to migrate and re-architect.
Backwards compatibility is a huge selling point. It reduces risk.
Brilliant ideas don’t come from large groups. The most impressive changes come from small groups of dedicated people.
However, most companies have overhead (email, managers, PMs, accounting, etc). It is easy to destroy a team’s productivity by adding overhead.
I have been in teams where 3 weeks of design/coding/testing work required 4 **months** of planning and project approvals.
Overhead drains productive time _and_ morale.
Smart companies realize this and build small, isolated labs.
Dr. DeWitt’s keynote covered how these basic principles contributed to the Hekaton project.
I have hope for the new query engine, but also concerns: