College in the United States is expensive.
For all of the news about tuition increases and state cutbacks, there's very little media coverage about how universities spend their money.
One of the best college systems in the U.S. is the University of California ("UC") system. However, recent tuition increases are so painful that they've led to student protests, and have revealed UC leadership's outright contempt for students. Where does its money go?
The UC spends over two-thirds of its funding on staff. The rest goes to capital expenses (buildings), financing (debt service), and miscellaneous costs. Even those track headcount: buildings exist because people work in them.
The University of California's official budget website does not provide a useful breakdown beyond some high-level numbers. However, the State of California does publish the amounts it pays to its employees by title, along with the number of employees in each role. Thank goodness for public data.
Warning: this data is not 100% correct, because I have had to guess what some job title acronyms mean. It is largely accurate.
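The bucketing I used can be sketched in a few lines. This is a simplified illustration, not my exact script: the keyword lists and the sample rows below are hypothetical, and the real analysis required guessing what some acronyms in the state data mean.

```python
# Minimal sketch of bucketing job titles into spending categories.
# Keyword lists and sample rows are hypothetical, not the real UC data.
from collections import defaultdict

CATEGORIES = {
    "medicine": ("med", "nurse", "physician", "clin", "hosp"),
    "academic": ("prof", "lecturer", "teach", "postdoc", "instr"),
}

def categorize(title: str) -> str:
    """Bucket a job title by keyword; everything else is admin/other."""
    t = title.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in t for k in keywords):
            return category
    return "admin/other"

def spending_shares(rows):
    """rows: iterable of (title, total_pay) -> {category: share of total pay}."""
    totals = defaultdict(float)
    for title, pay in rows:
        totals[categorize(title)] += pay
    grand = sum(totals.values()) or 1.0
    return {c: amount / grand for c, amount in totals.items()}

# Hypothetical rows, not real UC figures:
sample = [("Professor", 200.0), ("Staff Nurse II", 150.0),
          ("Administrative Analyst", 150.0)]
print(spending_shares(sample))
```

The guessing-acronyms caveat above lives in the keyword lists: a title that matches no keyword silently falls into "admin/other", which is exactly where misclassification creeps in.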
Most of the university's expenses are for administration and medicine.
Only ~30% of the UC's budget goes to teaching. Of that, almost half goes to medicine. Non-medical teaching, including professors and TAs, is only about 15% of salary costs.
Most university revenue, including student loans, doesn't go to teachers. It goes to the huge teaching hospitals and to administrators in office buildings. Sure, patient fees and grants cover a lot of that, but it's clear that the non-teaching activities are the tail wagging the dog.
The University of California system is a collection of medical centers and large administrative staff. The teachers and researchers are there as window dressing.
Medicine vs. Not
Medicine is a huge amount of the money being spent by the UC system.
Administration vs. Not
It's not a majority, again, but it is quite a bit.
Academic vs. Not
Most of the money doesn't go to academics. The University of California's own data supports this.
I think I can see why salaries are allocated this way: the incentives are messed up.
Administrators have no incentive to cut their own budgets, their staff, or their authority. They'd be less important, and be able to offer less 'comprehensive' 'solutions' to (Insert Issue Here) if they did.
I've lost count of the times I've heard gnashing of teeth because some college dropped a couple of places (and some 'prestige') in the U.S. News & World Report rankings, then hired administrative staff to fix it.
Colleges are focusing heavily on increasing scope and adding control, which increases costs. Anyone who has ever heard of the Iron Triangle could tell you this was going to happen.
The implications are profoundly positive: it is possible to make universities cheaper and improve the lives of faculty. But the changes required, while important, are systemic.
What has been tried so far is not working. It's insane to do "the same thing over and over and expect different results" (Einstein).
Here are some highlights:
Folksy Wisdom: Wear an interesting t-shirt. It's an instant conversation starter, and you'll make some good contacts. I met data scientists, a researcher, and a presenter this way.
The machine learning / data science community is small. Most people who talk about ML don't do any; they're vendors, executives, sales folks, and the press. The core group of practitioners is pretty tight-knit, even in tech-savvy Seattle.
One of the best ways to learn is from the smartest people you can find. If you become only halfway close to their level of competence, you'll be more clever than your competition.
Like many small conferences, MLConf was heavily vendor-supported. The result was the usual bevy of startups trying to compete with Amazon by impressing everyone with their sales-pitch "presentations".
That's fine, I suppose. I don't mind sitting in the audience while these folks speak; it's a good time to catch up on reading research papers.
Geeky Wisdom: P(sales pitch | presenter has C-suite job) = .99999
Machine learning, more than many disciplines, moves incredibly quickly from academic paper -> open-source library -> competitive advantage. This is a career where your skills become obsolete faster than in software engineering.
The key challenges in ML, though, are timeless.
Some folks are working to make a difference. One example is medicine, where machine learning is used to do real-time fMRI decoding, genomic sequencing for personalized medicine, image classification of x-rays, and more. If you're tired of working on online ads and want to help the world, this is a good option.
The challenges are substantial. Most medical data faces the curse of dimensionality: there are more features than patients, or even than there are humans. Our physiology interacts in complicated and subtle ways, so data measurements are constantly skewed and biased.
As a result, many techniques must be invented just for medicine, proven, and then re-written to scale.
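The "more features than patients" problem can be made concrete with a toy example (my illustration, not from any talk): when features outnumber samples, ordinary least squares can fit the training data exactly no matter what, so some form of regularization is mandatory before the model says anything about new patients.

```python
# Toy p >> n setting, loosely analogous to medical data: 20 "patients",
# 500 "features". All numbers here are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_features = 20, 500
X = rng.normal(size=(n_patients, n_features))
true_w = np.zeros(n_features)
true_w[:5] = 1.0                          # only 5 features actually matter
y = X @ true_w + 0.1 * rng.normal(size=n_patients)

# The minimum-norm least-squares solution interpolates the training set:
# with p > n there are infinitely many exact fits, so a perfect training
# fit tells you nothing by itself.
w_ols = np.linalg.pinv(X) @ y

# Ridge (L2) regularization, closed form: w = (X^T X + lam*I)^-1 X^T y.
# The shrinkage is the prior that makes the problem well-posed again.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

print("OLS train residual:", np.linalg.norm(X @ w_ols - y))  # essentially 0
```

The unregularized fit achieving zero training error on noise is precisely why so many off-the-shelf techniques need to be re-proven before they're trusted on clinical data.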
Tensors are an intriguing, but complicated, concept. My basic understanding is that, unlike 1-d arrays or 2-d matrices, tensors are higher-dimensional structures. Anima Anandkumar, a professor at UC Irvine, gave a brief overview of the opportunities and challenges.
Anima's hardly alone; influential people are talking about tensors. This'll be an area to keep an eye on.
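To make the "higher-dimensional array" idea concrete, here's a toy sketch (mine, not from the talk) of a third-order tensor, one common unfolding convention, and a rank-1 tensor built as an outer product, which is the building block of CP-style decompositions:

```python
# A 2x3x4 third-order tensor: three indices instead of a matrix's two.
import numpy as np

T = np.arange(24).reshape(2, 3, 4)

# "Unfolding" (matricization) flattens all but one mode so matrix tools
# apply; under C-ordering this is just a reshape. Conventions vary.
T0 = T.reshape(2, -1)
print(T0.shape)          # (2, 12)

# A rank-1 third-order tensor is an outer product of three vectors;
# CP decomposition writes a tensor as a sum of such rank-1 terms.
a, b, c = np.ones(2), np.arange(3.0), np.ones(4)
rank1 = np.einsum('i,j,k->ijk', a, b, c)
print(rank1.shape)       # (2, 3, 4)
```

Decomposing real data tensors into a few rank-1 terms is, as I understand it, the core of the tensor methods for latent-variable models that Anandkumar described.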
There's a huge amount of media coverage of deep learning. Its results in the ImageNet competition and its ability to learn its own features are truly impressive. However, it's not a panacea...yet. The field is so new that few people are proficient in it, so practically nobody knows how to build, tweak, and support these models. Plus, they're effectively impossible to debug.
Deep learning is one of the few areas where revolutionary improvements in ML can come from, so it's worth learning about.
However, it's also an existential threat to feature engineering. If you remove feature engineering and model selection, what you're left with is...defining a business metric. You don't need data scientists for that.
'Big data' has come to mean "build a distributed stack that can query X terabytes of data". Learning at scale is a much more difficult challenge. Microsoft's Misha Bilenko spoke about some of the approaches used by Azure's ML systems, notably the Learning from Counts approach. It was great to realize that this idea isn't new (it was previously used for 'pattern tables' in CPU branch prediction).
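The core idea of count-based featurization is easy to sketch: replace a high-cardinality categorical value (say, a site ID) with label statistics computed from historical counts. This is my simplified sketch of the general technique, not Azure ML's implementation; the names and the smoothing scheme are illustrative.

```python
# Count-based featurization ("learning from counts"), simplified sketch.
# Names and smoothing are illustrative, not any vendor's API.
from collections import defaultdict

def build_count_table(examples):
    """examples: (category_value, label) pairs -> {value: [positives, total]}."""
    table = defaultdict(lambda: [0, 0])
    for value, label in examples:
        table[value][0] += label
        table[value][1] += 1
    return table

def count_features(value, table, prior=0.5, smoothing=1.0):
    """Map a raw categorical value to (count, smoothed positive rate)."""
    positives, total = table.get(value, (0, 0))
    # Smoothing pulls rare/unseen values toward the prior instead of 0 or 1.
    rate = (positives + smoothing * prior) / (total + smoothing)
    return (total, rate)

history = [("site_a", 1), ("site_a", 0), ("site_a", 1), ("site_b", 0)]
table = build_count_table(history)
print(count_features("site_a", table))   # (3, 0.625)
print(count_features("unseen", table))   # (0, 0.5): falls back to the prior
```

The branch-prediction analogy holds up: a CPU's pattern table is likewise a small count store that predicts an outcome from the recent history of a key.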
One core lesson from the day was that clever engineering, good judgment, and heuristics are needed to advance the field of machine learning. Using just math, or just more hardware, doesn't advance the state of the art.