2019 in Review

31 December 2019

2019 is over. As is my tradition, I use the New Year to look back, reflect, and learn lessons. I’ve always been a student; I see life as a series of puzzles to solve, and lessons to learn.

2019 was a year of growth. Large parts of my life were stable, so I had the buffer to grow in ways I wanted.

Here are a few of the things I did and learned over the year…

Life Changes

  1. I got divorced late in 2018. 2019 started with a newfound sense of autonomy and possibility.
  2. I lived alone for the first time, ever. I didn’t have to ask permission for anything at home, which felt strange at first.
  3. I discovered a drawback to living alone: it’s easy to isolate yourself.
  4. Isolation is bad, so I cultivated the friendships I valued most. The people you’re closest to have a huge effect on your life.
  5. I made my health my top priority for the whole year. Whenever I was deciding what to do, I chose the healthier option.
  6. I tried dating, with a fatal flaw: I was looking. My friends said their best relationships started when they weren’t seeking anything.
    • We find our strongest connections when both people are being their authentic selves.
    • That doesn’t happen when going on dates, unless both people are self-aware. Pssst: online dating is playing a game with your life, with bad odds.
  7. I changed my finances, using principles taught to me by my grandmother. They’re also the guiding tenets of the FIRE movement.
  8. Late in the year, I met someone…after I’d stopped looking. I’m lucky to have a strong, smart, passionate woman in my life.
  9. My closest friend moved across the continent in 2018. This year was an exercise in staying connected with someone from far away. Emails and messages suffice when you know someone’s communication patterns well.

Health

  1. My nutrition changes grew from what I already knew. I grew up eating South Indian food, which consists of spices & vegetables. I was already comfortable as a cook.
  2. The most effective tactic was to make healthy choices the easiest thing to do. It’s easy to eat at home when there’s curry in the fridge.
  3. This was a multi-month set of changes done one week at a time. The best approach for this I’ve ever found is Nerd Fitness.
    • It covered practical tactics, like beginner bodyweight exercises and introductory nutrition changes.
    • It also covered the psychology of health changes.
  4. 90% of the battle was in my head. Some of it was a lack of knowledge. I cultivated good habits to replace bad ones.
  5. I hate gyms. I ended up doing bodyweight exercises at home, and cycling.
  6. Be lazy. Find ways to be healthy and lazy at the same time. For me, that meant batch cooking: I cook once a week, and have delicious food the rest of the time.
  7. A weekly habit was critical. I made reminders to choose recipes, get groceries, cook, do workouts, and ask friends for help.
  8. Eat more veggies, eat less sugar. Add all the spices you want. It’s easy to eat healthy when your food is delicious.
  9. The most effective advice was simple: eat healthy, exercise, sleep, and ask for help.

Music

  1. Music is a language. Small combinations of notes are syllables. A measure is a word, a combination of measures is a sentence. Songs are paragraphs, even poetry.
    • This is a very helpful way for me to learn.
    • An effective way to learn music is to learn the most common measures, the “words” of the language.
  2. Growing as a musician helps me grow as an engineer. I’m working on my technique, the material I play, and the people I play with. The same is true in my job.
  3. Music is instinct, muscle memory, and cognition all at the same time. The same is true of engineering.
    • I knew I was making a bad software design choice when I got an uneasy feeling.
  4. Inspiration came from many sources. One was variety: by playing with different groups, I gained a deeper understanding of the music I play.
    • The same was true with work.

Work

My work went through some interesting changes this year.

  1. The PMs I work with had a critical role. The most important thing for our group was to have a product vision based on strong evidence. This was especially true with many different research labs as potential users.
    • A big part of that was asking open-ended but leading questions.
    • Another part was keeping an open mind. Confirmation bias was very common.
  2. Hiring was the most powerful thing I could be doing, and one of the most difficult. It took persistence to hire engineers to work for less money than the big tech companies offered.
  3. Biology is very messy. I long for the days when my most complicated problem was a tangled codebase. That’s child’s play compared to the evolution of computational biology processes and tools.
  4. It takes a village. The most effective groups I was in had people with different specialties. Pair a domain expert, a technologist, and a PM, and get out of the way.
  5. Bioinformatics is built on one-off scripts and pipelines. Simple, intuitive data pipelines are a quick and effective way to speed up science.
    • Everyone I met had a witch’s brew of bash, Python, and R scripts that underpinned all their work.
  6. When your problem was one of technical skill & culture, education was the golden ticket. I should have taught people to write pipelines rather than building any on my own.
  7. New problem? Try something new. I lost track of the number of times I faced a technical problem and a quick-and-dirty fix was to use a cloud service in a new way (hello, AWS Athena!).
  8. Data science was still mostly about gathering, cleaning up, and transforming data. The pace of new machine learning techniques was mind-boggling. The pace of new data extraction, transformation, and transfer tools? Not so much.

All that said, I love where I work. I have faith in the work I do, and the impact it can have on the world. (Interested? We’re hiring!)

Common Elements

When I look back upon the lessons of 2019, patterns emerge.

  1. “We don’t rise to the level of our expectations, we fall to the level of our systems” (adapted from a line attributed to Archilochus)
  2. ‘Unintended consequences’ often meant ‘I didn’t think this through, ask questions, or listen’
  3. I have mental blind spots. A productive habit is to reflect on the week.
    • Every week, I ask myself: what did I want to do? What succeeded? Why? How do I persist?
    • I would also ask: what did I not do, despite wanting to? What held me back? What should I do differently next time? Why will that work?
    • Write this all down. Over time I start to notice perspectives/techniques that work for me, and ones that don’t.
  4. Micro-tasks are one of my most important tools. If I can break a challenging task (e.g. ‘write a long blog post’) into tiny pieces, I can get a lot done.
  5. Focus is important. Like everyone else, I don’t multitask well. I found a good technique to help me focus:
    1. Use pomodoros. I work for 60 minutes, then take a 10-minute break.
    2. I close my email and Slack, silence my phone, and flip it over.
    3. I’ll put on ambient music or bird song.
    4. For my break, I will walk around outside, drink some water, and stretch.
  6. Sleep is far more important than you think. My life hinges on my cognitive capacity, and sleep deprivation sabotages that.
    • Not getting enough sleep is a mental trap.
  7. Ambition is a dangerous force. Mine is intellectual ambition; I want to know everything. That (misguided) desire shapes how I spend my time, what I care about, and whom I pay attention to.
  8. Communication is King. And Queen. And Vizier.
    • Software engineering is about communication, with code at the end.
    • People can be great teachers, mentors, friends, and students. It’s up to each of us to decide.
  9. People are messy. Developing working relationships takes time and effort, and has an uncertain outcome. This is very different from software.
    • This isn’t well suited for today’s fast-paced, digitized, immediate-gratification world.
    • Being vulnerable around people is hard, and worthwhile.
    • If people are manipulative, get them out of your life. The best revenge is not to be like that.
  10. When in doubt, build buffer. ‘Buffer’ for me is work that I’m pulling from the future and doing now. Some examples:
    • Making a week’s worth of soup and freezing it
    • Making minor repairs to my car & house so they don’t get bigger (or more expensive).
    • Documenting and cleaning up my code so Future Dev doesn’t have to wonder WTF I was doing
    • Learning to cook well. Good for my health, my wallet, and my relationships.

2019 is over. With any luck, I will use the lessons from this last year to grow in this next one.

2020 begins.


ElasticSearch Snippets

28 December 2019

I do a bunch of work with ElasticSearch, building tools so researchers can search through large amounts of data. I’ve had to figure out a bunch of useful queries for searching, aggregations, deletes, and index management.

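In these examples, I use an index named twitter, with account, tweet, retweet_count, language, and country fields. With the default dynamic mapping, each string field is indexed as text with a .keyword sub-field, which the exact-match filters and aggregations below rely on.

Each record/document is a tweet, and one of the accounts is @devnambi.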

Searches

Search everything

GET twitter/_search
{
    "query": {
        "match_all": {}
    }
}

Search for a single word

GET twitter/_search
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "account":   "devnambi" }}
      ]
    }
  }
}

Search for multiple words

GET twitter/_search
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "tweet":   "Hello World" }}
      ]
    }
  }
}

Search for multiple lower-case words

GET twitter/_search
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "tweet":   "hello world" }}
      ]
    }
  }
}
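The two queries above should return the same hits on a text field that uses the standard analyzer: a match query runs the query text through the field's analyzer first, and its default operator is "or", so any matching token is enough. The toy Python model below sketches that behavior. It assumes a simplified analyzer that only splits on non-word characters and lower-cases (the real standard analyzer follows Unicode segmentation rules), and the function names are hypothetical.

```python
import re


def analyze(text: str) -> list[str]:
    """Toy stand-in for the standard analyzer: split on non-word chars, lower-case."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def match_query(doc_text: str, query_text: str, operator: str = "or") -> bool:
    """Mimic a match query: analyze both sides, then require any (or all) query tokens."""
    doc_tokens = set(analyze(doc_text))
    query_tokens = analyze(query_text)
    if operator == "and":
        return all(token in doc_tokens for token in query_tokens)
    return any(token in doc_tokens for token in query_tokens)


def term_query(doc_text: str, term: str) -> bool:
    """Mimic a term query: the term itself is NOT analyzed, so case must match."""
    return term in set(analyze(doc_text))


tweet = "Hello World from Elasticsearch"
print(match_query(tweet, "Hello World"))                           # True
print(match_query(tweet, "hello world"))                           # True
print(match_query("hello there", "Hello World"))                   # True (default "or")
print(match_query("hello there", "Hello World", operator="and"))   # False
print(term_query(tweet, "Hello"))                                  # False: indexed token is "hello"
print(term_query(tweet, "hello"))                                  # True
```

The last two lines hint at why the term queries in the next section use lower-case words.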

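Records containing exact tokens (term queries)

Term queries are not analyzed, so each word must exactly match an indexed token; on a text field using the standard analyzer, that means lower-case.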

Single-word:

GET twitter/_search
{
  "query": {
    "bool": {
      "must": [{
          "term": {
            "tweet": "example"
          }
        }
      ]
    }
  }
}

Multiple-word:

GET twitter/_search
{
  "query": {
    "bool": {
      "must": [{
          "term": {
            "tweet": "include"
          }
        },
        {
          "term": {
            "tweet": "all"
          }
        },
        {
          "term": {
            "tweet": "these"
          }
        },
        {
          "term": {
            "tweet": "words"
          }
        }
      ]
    }
  }
}
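Writing one term clause per word gets verbose quickly. A minimal Python sketch that generates the same bool/must body from a word list is below; the helper name is hypothetical, and lower-casing each word assumes the field uses a lower-casing analyzer such as the standard one.

```python
import json


def all_terms_query(field: str, words: list[str]) -> dict:
    """Build a bool query whose must clauses require every word as an exact token."""
    return {
        "query": {
            "bool": {
                # One term clause per word; all of them must match.
                "must": [{"term": {field: word.lower()}} for word in words]
            }
        }
    }


body = all_terms_query("tweet", "include all these words".split())
print(json.dumps(body, indent=2))  # paste the output after GET twitter/_search
```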

Aggregations

Count of records by country

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "source": {
      "terms": {
        "field": "country.keyword"
      }
    }
  }
}
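A terms aggregation returns its top buckets (10 by default) as a list of key/doc_count pairs, and counts the remaining documents in sum_other_doc_count. A small Python sketch for flattening that response into a dict follows; the helper names are hypothetical and the sample response uses made-up numbers, shaped like a terms aggregation result.

```python
def bucket_counts(response: dict, agg_name: str) -> dict[str, int]:
    """Flatten a terms aggregation into {key: doc_count}."""
    agg = response["aggregations"][agg_name]
    return {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}


def other_count(response: dict, agg_name: str) -> int:
    """Documents that fell outside the returned buckets."""
    return response["aggregations"][agg_name].get("sum_other_doc_count", 0)


# Hypothetical response with made-up numbers.
sample = {
    "aggregations": {
        "source": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 7,
            "buckets": [
                {"key": "US", "doc_count": 120},
                {"key": "DE", "doc_count": 45},
            ],
        }
    }
}
print(bucket_counts(sample, "source"))  # {'US': 120, 'DE': 45}
print(other_count(sample, "source"))    # 7
```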

Count number of records for a filter

GET twitter/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "country.keyword": "US"
          }
        }
      ]
    }
  },
  "aggs": {
    "source": {
      "terms": {
        "field": "country.keyword"
      }
    }
  }
}
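With "size": 0, the count for the filter lives in hits.total. Its shape depends on the version: Elasticsearch 7 and later report an object with value and relation ("eq" for exact, "gte" when counting stopped at the default 10,000-hit limit), while older versions report a plain integer. A small Python sketch that handles both is below; the helper name is hypothetical, and treating the plain integer as exact is an assumption about pre-7 behavior.

```python
def total_hits(response: dict) -> tuple[int, bool]:
    """Return (count, is_exact) from a search response's hits.total."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # 7.x+: {"value": N, "relation": "eq" | "gte"}
        return total["value"], total["relation"] == "eq"
    # Older versions: a plain integer, assumed exact here.
    return total, True


print(total_hits({"hits": {"total": {"value": 10000, "relation": "gte"}}}))  # (10000, False)
print(total_hits({"hits": {"total": {"value": 42, "relation": "eq"}}}))      # (42, True)
print(total_hits({"hits": {"total": 42}}))                                   # (42, True)
```

If an exact figure above that limit matters, adding "track_total_hits": true to the request body is one option on 7.x.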

Sum of retweets for an account

GET twitter/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "account.keyword": "devnambi"
          }
        }
      ]
    }
  },
  "aggs": {
    "source": {
      "sum": {
        "field": "retweet_count"
      }
    }
  }
}
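Several snippets in this post follow the same shape: filter on one value, skip the hits, run one aggregation. A minimal Python builder for that shape is below, along with a reader for single-value metric results such as sum, avg, and cardinality, which come back as {"value": ...}. Both helper names are hypothetical.

```python
def filtered_agg(filter_field: str, filter_value: str,
                 agg_name: str, agg_type: str, agg_field: str) -> dict:
    """Match one value, return no hits, and run a single aggregation."""
    return {
        "size": 0,  # only the aggregation result is needed, not the documents
        "query": {"bool": {"must": [{"match": {filter_field: filter_value}}]}},
        "aggs": {agg_name: {agg_type: {"field": agg_field}}},
    }


def agg_value(response: dict, agg_name: str):
    """Read the result of a single-value metric aggregation."""
    return response["aggregations"][agg_name]["value"]


# Rebuilds the "sum of retweets for an account" request body.
body = filtered_agg("account.keyword", "devnambi", "source", "sum", "retweet_count")
```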

Get tweet count by account

GET twitter/_search
{
    "size" : 0,
    "aggs" : { 
        "source" : { 
            "terms" : { 
              "field" : "account.keyword"
            }
        }
    }
}

Get average size of all tweets in the UK

GET twitter/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "country.keyword": "UK"
          }
        }
      ]
    }
  },
  "aggs":{
    "avg_length" : { "avg" : { "script" : "_source.tweet.toString().getBytes(\"UTF-8\").length"}}
  }
}

Get unique count of tweets in Germany

GET twitter/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "country.keyword": "DE"
          }
        }
      ]
    }
  },
  "aggs": {
    "unique_tweets": {
      "cardinality": {
        "field": "tweet.keyword"
      }
    }
  }
}
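The cardinality aggregation returns an approximate distinct count (it uses HyperLogLog++), and it needs a keyword-style field rather than an analyzed text field. Its precision_threshold option trades memory for accuracy: counts below the threshold are expected to be close to exact, and the documented maximum is 40000, with higher values behaving like 40000. A minimal Python builder that mirrors that clamp is below; the helper name is hypothetical.

```python
from typing import Optional

MAX_PRECISION_THRESHOLD = 40_000  # documented upper limit for precision_threshold


def cardinality_agg(agg_name: str, field: str,
                    precision_threshold: Optional[int] = None) -> dict:
    """Build a cardinality aggregation, optionally with a clamped precision_threshold."""
    agg = {"field": field}
    if precision_threshold is not None:
        agg["precision_threshold"] = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {agg_name: {"cardinality": agg}}


print(cardinality_agg("unique_tweets", "tweet.keyword", precision_threshold=100_000))
# {'unique_tweets': {'cardinality': {'field': 'tweet.keyword', 'precision_threshold': 40000}}}
```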

Get records with missing retweet_count field by country (no retweets)

GET twitter/_search
{
  "size": 5,
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "retweet_count"
          }
        }
      ]
    }
  },
  "aggs": {
    "sources": {
      "terms": {
        "field": "country.keyword",
        "size": 10
      }
    }
  }
}

Get multiple aggregations

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "countries": {
      "terms": {
        "field": "country.keyword"
      }
    },
    "accounts": {
      "terms": {
        "field": "account.keyword"
      }
    },
    "languages": {
      "terms": {
        "field": "language.keyword"
      }
    }
  }
}
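Sibling aggregations are just extra keys under "aggs", so the multi-aggregation body can be generated from a name-to-field mapping. A minimal Python sketch is below; the helper name is hypothetical, and setting an explicit bucket size (10, matching the terms default) is an illustrative choice.

```python
def terms_aggs(fields: dict[str, str], size: int = 10) -> dict:
    """Build one terms aggregation per (name -> field) pair, returning no hits."""
    return {
        "size": 0,
        "aggs": {
            name: {"terms": {"field": field, "size": size}}
            for name, field in fields.items()
        },
    }


body = terms_aggs({
    "countries": "country.keyword",
    "accounts": "account.keyword",
    "languages": "language.keyword",
})
```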

Sources: % Increase, Term Percentage, Search Agg Pipeline

Deletes

Delete all tweets by account name

POST /twitter/_delete_by_query?conflicts=proceed
{
  "query": {
    "bool": {
      "must": [{
          "term": {
            "account": "devnambi"
          }
        }
      ]
    }
  }
}

Index Management

Index name in examples: twitter
Alias name in examples: social_media

Get Indexes

GET /_cat/indices?v

Delete an index

DELETE /twitter

Change the index refresh interval

PUT /twitter/_settings
{
    "index" : {
        "refresh_interval" : "1s"
    }
}
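Elasticsearch's indexing-speed guidance suggests setting refresh_interval to "-1" (no periodic refresh) during a large bulk load, then restoring it afterwards. A minimal Python sketch of the two settings bodies you would PUT to /twitter/_settings before and after the load is below; the helper names are hypothetical, and restoring to "1s" assumes that was the original interval.

```python
def refresh_settings(interval: str) -> dict:
    """Body for PUT /<index>/_settings that changes the refresh interval."""
    return {"index": {"refresh_interval": interval}}


def bulk_load_settings(restore_interval: str = "1s") -> tuple[dict, dict]:
    """Settings to apply before and after a bulk load."""
    return refresh_settings("-1"), refresh_settings(restore_interval)


before, after = bulk_load_settings()
```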

Aliases

Get aliases

GET /_cat/aliases?v

Set an alias

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "twitter", "alias" : "social_media" } }
    ]
}

Remove an alias

POST /_aliases
{
    "actions" : [
        { "remove" : { "index" : "twitter", "alias" : "social_media" } }
    ]
}
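Because all actions in one _aliases request are applied atomically, removing an alias from one index and adding it to another in the same request lets clients that query social_media switch indexes without a gap. A minimal Python sketch that builds that POST /_aliases body is below; the helper name and the twitter_v2 index are hypothetical.

```python
def swap_alias(alias: str, old_index: str, new_index: str) -> dict:
    """Build a POST /_aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# twitter_v2 is a hypothetical re-indexed copy of twitter.
body = swap_alias("social_media", "twitter", "twitter_v2")
```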

No piece of code is ever done. I can think of various ways to improve query performance or add functionality, and I’ll add those over time.

Happy coding!
