Show Me the Tuition

23 July 2015

College in the United States is expensive.

For all of the news about tuition increases and state cutbacks, there's very little media coverage about how universities spend their money.

One of the best college systems in the U.S. is the University of California ("UC") system. However, recent tuition increases are so painful that they've led to student protests and revealed outright contempt for students among UC leadership. Where does its money go?

The Data

The UC spends over two-thirds of its funding on staff. The rest goes to capital expenses (buildings), financing (paying off debt), and miscellaneous costs. Even those track the number of employees; buildings exist because people work in them.

The University of California's official budget website does not provide a useful breakdown beyond some high-level numbers. However, the State of California does publish the amounts it pays its employees by title, along with the number of employees in each role. Thank goodness for public data.

After cleaning the raw public data into a tidier data set, I was ready to analyze it.

Warning: this data is not 100% correct, because I had to guess what some job-title acronyms mean. It is, however, largely accurate.
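The cleanup boils down to grouping salary records by job title and tallying headcount and total pay. Here's a minimal sketch of that step; the sample rows, column names, and titles below are invented stand-ins, not the real state data file.

```python
import csv
import io
from collections import defaultdict

# Toy sample standing in for the state's salary file (titles are invented).
raw = """title,pay
PROF-AY,180000
PROF-AY,200000
ADMIN ANL,90000
"""

# title -> {"employees": headcount, "total_pay": summed salaries}
totals = defaultdict(lambda: {"employees": 0, "total_pay": 0.0})
for row in csv.DictReader(io.StringIO(raw)):
    agg = totals[row["title"].strip()]
    agg["employees"] += 1
    agg["total_pay"] += float(row["pay"])

for title, agg in sorted(totals.items()):
    print(title, agg["employees"], agg["total_pay"])
```

The real work, of course, is in normalizing the title strings and expanding the acronyms before grouping; that's where the guesswork comes in.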

Show Me the Money

Most of the university's expenses are for administration and medicine.

Only ~30% of the UC's budget goes to teaching. Of that, almost half goes to medicine. Non-medical teaching, including professors and TAs, is only about 15% of salary costs.

Most university revenue, including tuition financed by student loans, doesn't go to teachers. It goes to the huge teaching hospitals and to administrators in office buildings. Sure, patient fees and grants cover a lot of that, but it's clear the non-teaching activities are the tail wagging the dog.

What is the UC System?

The University of California system is a collection of medical centers and large administrative staff. The teachers and researchers are there as window dressing.

Medicine vs. Not

Medicine is a huge amount of the money being spent by the UC system.

Administration vs. Not

Administration isn't a majority of spending either, but it is a substantial share.

Academic vs. Not

Most of the money doesn't go to academics. The University of California's own data supports this.


It's About the Incentives

I think I can see why salaries are allocated this way: the incentives are messed up.

Administrators have no incentive to cut their own budgets, their staff, or their authority. They'd be less important, and be able to offer less 'comprehensive' 'solutions' to (Insert Issue Here) if they did.

I've lost count of the times I've heard gnashing of teeth because some college dropped a couple of places (and some 'prestige') in the U.S. News & World Report rankings, then hired more administrative staff to make things better.

Colleges are focusing heavily on increasing scope and adding control, which drives up costs. Anyone who has heard of the Iron Triangle (scope, cost, schedule: push one and the others move) could have told you this was going to happen.


The implications are profoundly positive: it is possible to make universities cheaper and improve the lives of faculty. A few simple but systemic changes could be made:

  • Cap the administrator-to-student ratio at a level seen in the '60s, '70s, or '80s. Fire some administrators.
  • If tuition goes up, all high-level administrators automatically take a pay cut equal to the tuition increase.
  • Appoint a university ombudsman specifically tasked with making the teaching hospitals more cost-efficient.

What has been tried so far is not working, and it's insane to do "the same thing over and over and expect different results" (a line often attributed to Einstein).


MLConf Seattle

10 May 2015

A couple of weeks ago I attended Seattle's first MLConf, a one-day machine learning conference. I knew from YouTube videos just how good this was going to be, and I wasn't disappointed.

Here are some highlights:

Network Effects

Folksy Wisdom: Wear an interesting t-shirt. It's an instant conversation starter, and you'll make some good contacts. I met data scientists, a researcher, and a presenter this way.

The machine learning / data science community is small. Most people who talk about ML don't do any; they're vendors, executives, sales folks, and the press. The core group of practitioners is pretty tight-knit, even in tech-savvy Seattle.

One of the best ways to learn is from the smartest people you can find. If you become only halfway close to their level of competence, you'll be more clever than your competition.

Money Effects

Like many small conferences, MLConf was heavily vendor-supported. The result was the usual bevy of startups trying to compete with Amazon by impressing everyone with sales pitches disguised as "presentations".

That's fine, I suppose. I don't mind sitting in the audience while these folks speak; it's a good time to catch up on reading research papers.

Geeky Wisdom: P(sales pitch | presenter has C-suite job) = .99999

Speed and Consistency

Machine learning, more than many disciplines, moves incredibly quickly from academic paper -> open-source library -> competitive advantage. This is a career where your skills become obsolete faster than in software engineering.

The key challenges in ML are timeless:

  1. Defining the right objective function (target metric) for the business goal.
  2. Identifying which algorithms are the best choices for a problem.
  3. Feature extraction.
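Point 1 above is easy to underestimate, so here's a miniature, hypothetical example: suppose missing a fraud case costs 100x a false alarm. A model that wins on plain accuracy can still lose badly on the metric the business actually cares about. The costs and labels below are invented for illustration.

```python
def business_cost(y_true, y_pred, fn_cost=100.0, fp_cost=1.0):
    """Total cost where false negatives (fn) hurt far more than false positives (fp)."""
    cost = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 0:
            cost += fn_cost   # missed fraud case
        elif truth == 0 and pred == 1:
            cost += fp_cost   # false alarm
    return cost

y_true  = [1, 0, 0, 1]
model_a = [0, 0, 0, 1]   # 75% accurate, but misses a fraud case
model_b = [1, 1, 1, 1]   # 50% accurate, but catches both fraud cases

# The "less accurate" model is far cheaper under the business objective.
assert business_cost(y_true, model_a) > business_cost(y_true, model_b)
```

Optimizing accuracy here would pick model A; optimizing the business objective picks model B. Getting that objective function right comes before any algorithm choice.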

Work that Matters

Some folks are working to make a difference. One example is medicine, where machine learning is used to do real-time fMRI decoding, genomic sequencing for personalized medicine, image classification of x-rays, and more. If you're tired of working on online ads and want to help the world, this is a good option.

The challenges are substantial. Most medical data suffers from the curse of dimensionality: there are more features than patients, or even than humans alive. Our physiology interacts in complicated and subtle ways, so measurements are constantly skewed and biased.

As a result, many techniques must be invented just for medicine, proven, and then re-written to scale.

Concepts to Learn

Tensors

Tensors are an intriguing, but complicated, concept. My basic understanding is that unlike 1-D arrays or 2-D matrices, tensors are higher-dimensional structures. Anima Anandkumar, a professor at UC Irvine, gave a brief overview of the opportunities and challenges:

  • Tensor methods can lead to better neural-network accuracy when used in place of backpropagation.
  • Tensor methods lead to interesting approaches to spectral (dimensional) decomposition. Imagine a tensor with a million features along each mode - it can be reduced to a much lower-dimensional representation.
  • The math behind the theory is much more involved than matrix math; some of the problems are NP-hard.
  • However, some tensor-decomposition algorithms are embarrassingly parallel and map well to cloud-based systems (e.g., Spark) and GPUs.

Anandkumar is hardly alone; influential people are talking about tensors. This will be an area to keep an eye on.

Deep Neural Networks

There's a huge amount of media coverage of deep learning. Its results in the ImageNet competition and its ability to learn features on its own are truly impressive. However, it's not a panacea...yet. The field is so new that few people are proficient with deep networks, so practically nobody knows how to build, tweak, and support them. Plus, they're effectively impossible to debug.

Deep learning is one of the few areas where revolutionary improvements in ML can come from, so it's worth learning about.

However, it's also an existential threat to feature engineering. If you remove feature engineering and model selection, what you're left with is...defining a business metric. You don't need data scientists for that.

Learning at Scale

'Big data' has come to mean "build a distributed stack that can query X terabytes of data". Learning at scale is a much more difficult challenge. Microsoft's Misha Bilenko spoke about some of the approaches used by Azure's ML systems, notably the Learning from Counts approach. It was great to realize that this idea isn't new: it was previously used as 'pattern tables' in CPU branch prediction.
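The core idea, as I understood it, is to replace a high-cardinality categorical feature (an IP address, a user ID) with a handful of dense numbers: per-class counts and a derived log-odds. This is a minimal sketch of that idea, not Azure's implementation; the values and labels are invented.

```python
import math
from collections import defaultdict

# Build a count table from labeled data: value -> [negative_count, positive_count]
counts = defaultdict(lambda: [0, 0])
for value, label in [("ip1", 1), ("ip1", 1), ("ip1", 0), ("ip2", 0)]:
    counts[value][label] += 1

def count_features(value, alpha=1.0):
    """Dense features for one categorical value: [positives, negatives, log-odds].
    alpha is additive smoothing so unseen values get a neutral log-odds of 0."""
    neg, pos = counts.get(value, (0, 0))
    log_odds = math.log((pos + alpha) / (neg + alpha))
    return [pos, neg, log_odds]

print(count_features("ip1"))     # seen mostly with positive labels
print(count_features("unseen"))  # never seen: neutral [0, 0, 0.0]
```

A downstream model then trains on these three numbers instead of a one-hot vector with millions of columns, which is what makes the approach scale.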

One core lesson from the day was that clever engineering, good judgment, and heuristics are needed to advance the field of machine learning. Using just math, or just more hardware, doesn't advance the state of the art.

Happy researching!