As I have talked about before, I come to data science as a user of software and not a developer of it. I mostly leverage frameworks and packages made by others to quickly spin up my analysis projects and get my deliverables out in a reproducible and sometimes automatable manner.

But sometimes the problem you need to solve is not addressed by any preexisting package, and you need to put on your developer hat and make a specialized tool for yourself. I love these challenges; working on them is when I get to feel like a maker and use my creativity. I find myself having a lot of energy for these types of problems, but I am aware that I am probably not solving them efficiently. While I can build a tool, there is so much I can learn and improve on in this area.

A side effect of my results-driven approach to learning R is that I have developed a blind spot when it comes to understanding how what I ask the computer to do actually happens “under the hood”. I don’t want to go all the way down to the metal and think about voltages and logic gates, but I think there are important things to know at this level that can translate into real improvements in your code.

The central concept to understand is how your data is structured in memory and what your code’s operations cost on that specific data structure. This is the field of data structures and algorithms. It is a central topic in a standard computer science curriculum, and I was completely unaware of it because I learned about tree litter decomposition in college.

I had never really considered that it might be significantly different for the computer to execute a “simple” operation, such as returning the value at a given index, depending on whether you are searching through an array or a list. Now that I am aware of this, the prospect of writing better and more elegant code by being thoughtful about how I store and structure my data is really exciting.
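To make that difference concrete, here is a minimal sketch in Python (chosen only for illustration), assuming that “list” above means a linked list in the computer science sense. The names `Node`, `build_linked_list`, and `linked_list_get` are hypothetical helpers written for this example, not part of any library. An array-backed sequence can jump straight to position 7, while the linked list has to follow pointers one node at a time from the head.

```python
class Node:
    """One element of a singly linked list: a value plus a pointer to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


def build_linked_list(values):
    """Build a singly linked list from an iterable and return its head node."""
    head = None
    # Insert at the front while walking the input backwards, so order is preserved.
    for value in reversed(list(values)):
        head = Node(value, head)
    return head


def linked_list_get(head, index):
    """Return (value, steps), where steps counts the pointers followed from the head."""
    if index < 0:
        raise IndexError("negative indices are not supported in this sketch")
    node = head
    steps = 0
    while node is not None and steps < index:
        node = node.next_node
        steps += 1
    if node is None:
        raise IndexError("index out of range")
    return node.value, steps


values = list(range(10, 20))

# A Python list is array-backed: indexing computes the position directly.
array_value = values[7]

# The linked list must be walked node by node until the index is reached.
head = build_linked_list(values)
list_value, steps = linked_list_get(head, 7)

print(array_value)        # 17
print(list_value, steps)  # 17 7
```

Both lookups return the same value, but the linked-list version followed 7 pointers to get there. That cost grows with the index, whereas the array lookup takes roughly the same time wherever the element sits.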

For now I am finding this topic fascinating, and I will write more later about how I am going about learning it.