As a data scientist, one of the most common questions I hear is “How do I become a data scientist? How can I do what you do?”
Here are my answers to that and similar questions…
Well, my work is about communication and curiosity.
Before I start any data science effort, I want to know why it’s needed. If I deliver perfect predictions or recommendations, what will someone do with them? Is it ethical? Well thought out? Do they even know?
If someone does not know what they will do with predictions and analysis, it’s my job to tell them that.
For useful projects, I try to build up my curiosity. What am I trying to predict? What am I trying to recommend? What do I need to understand about the problem? How does everything work?
A huge part of this is humility and curiosity; I’m trying to identify my customers’ blind spots as well as my own, and to compensate for them. I’m also trying to make sound assumptions as shortcuts, to avoid “analysis paralysis”. This involves asking lots of questions, thinking with a pen and paper, and asking again.
Once I have a better understanding of a problem, I start looking at data. What is the ideal data set for this problem? Do we have it? (No). What data do we have? Is that going to be good enough to answer the questions we have?
Sometimes my projects stop here, and I tell people to collect data before going any further. Usually I have to tell people what kind of data they need to reach the accuracy they’re looking for.
If I have data I can use, my job becomes very domestic: cleaning. 80% of my time goes to acquiring, cleaning, and transforming data sets into a format I can use. The final 20% goes to feature extraction, machine learning, validating predictions, and making fun visuals.
Finally, once I have good results, I put on my cynic’s hat, and try to prove myself wrong. What assumptions did I make? What am I not seeing? What else could be going on that I didn’t check for?
…the end of this explanation is usually when the developers I talk to turn pale and walk away. This is a very human, very messy process, and the code I write is only part of my work.
My degree was useful in some ways:
In other ways my degree was missing important things:
Yes indeed. I wish I had done lots of small side projects. Building recommendation engines, data classifiers, topic analyses… it would all be great training.
I wish I had known the most popular languages and tools, so I could study them. Linux. SQL. Python. R. Jupyter notebooks.
I don’t have a complete answer to this, because I don’t have the graduate school experience to compare it to.
I’ve only had the chance to do in-depth research into a subject once, when studying student behavior as a UW data scientist. Graduate students have more opportunity there. However, there’s a huge difference between paying to learn in graduate school versus being paid to work as a data scientist.
Well, I was naïve in college. I listened to my advisors and perfected my résumé and cover letter. I thought that was what I needed to get a job.
It turns out that what matters is networking. The success rate for online applications is ~1%; for networking and referrals it is around 20%.
I submitted hundreds of job applications. Sometimes 10+ a day, with variations on the same cover letter and résumé. Nothing ever came of them. Through a friend, I finally finagled an interview for a system administrator position at an advertising startup.
Over the last 13 years I have grown from a sysadmin, to a SQL developer, to a senior software engineer, to a data scientist.
In many ways, these are the same qualities I would look for in any scientist.
I’d practice my communication skills more and get used to working as part of a team. Things like body language and humor are important.
I’d be less optimistic about changing a company to suit me. I can be successful in many different industries. That doesn’t mean I want to. I shouldn’t stay too long in any one place. When the company politics and organizational culture hinder me too much, I should trust my gut and leave. I should take more risks.
It’s important for me to do my part to help humanity as a whole, to give back to society. Only organizations that have that mission will suit me well.
I know of 3 ways:
After my last job, I decided to work somewhere already working to make the world more equitable and just. I am fortunate to be a data scientist and engineer; it gives me the opportunity to work in many places.
I was lucky enough to find, apply for, and get a job as a data engineer at the Fred Hutchinson Cancer Research Center. I’ve lost friends and family to cancer, the same as everyone else.
My new day job is to build a data ‘commonwealth’, where researchers can upload data, process it, and share it. It is an evolutionary step in data-intensive science, after open source scientific computing and open access research. Helping scientists with reproducible research and “building upon the work of others” can dramatically accelerate the pace of scientific discovery. That’s my dream for this job.
My evening plans involve learning about cancer biology, genomics, and bioinformatics. My next career goal is to be both a data scientist and cancer researcher.
I’m hoping to find the time to write about data engineering, bioinformatics, cancer biology, and more. Stay tuned :)