Thursday, July 3, 2014

Developing Your Personal Statistical Code/Syntax Library

At 38, I'm trying to develop and refine some research skills and habits that I wish I had developed earlier. These are the things no one teaches you in class, so you have to learn them on the job, from a trusted adviser, or by trial and error. One is developing and maintaining your own stat code library. Perhaps it's an artifact of starting my career in the internet age, not being a full-time programmer, or both, but it never occurred to me to develop a personal code library. I would write code for a class or project, and then it would just live with those class or project files. That makes the code hard to find later, even if you hang onto everything. When you need to run an ANOVA in SAS three years after you took a class about it, you're probably not going to be thinking "Stat101_HW2.sas", or whatever name you gave the file at the time.

Why do this? 
  1. Your coding productivity (and whatever the end product of your coding is, e.g., research papers) will go up if you can quickly find the code you need.
  2. There are a lot of great code sites, but you can't rely on them for everything. In particular, what they present doesn't include your notes and modifications. Your code library is your own annotated, modified, and personalized version of all those great examples online, from class notes, and in books.
  3. Code snippets online are often short examples that don't cover an entire process (e.g., how to apply variable and value labels, but not a full data cleaning process). At the other extreme, you can find complex programs that were written for someone else's purposes. Echoing point 2, they aren't organized in a way that makes sense to you, so they will be harder to use time after time unless you personalize them.
How to do it? 
  1. Start now! Literally, with your next coding task or class assignment, take time to annotate your code and save it in your library. Make it a habit.
  2. Add to your library as you do new tasks or go back over old class materials. 
  3. Annotate the code liberally, explaining what each line or group of lines does. Just because you're clear about what it does now doesn't mean you will be 3, 7, or 10 years from now.
  4. You don't need to make these textbook-quality examples, but think about what you would need to include to teach someone else your code (or to teach your 3-years-from-now self...who hasn't used the particular routine for a while, now has a job, kids, etc., and is being asked to turn around a project quickly). What can you add to the code quickly, while it's fresh in your mind, that would help you understand it later?
  5. At minimum, take time during your analysis tasks to copy good examples of code into your library. Don't just let it sit in your project folder (which may disappear if you're not the owner, or which you may forget to take with you when you change jobs, etc.). Make your code library a "thing" that you curate, just like you would your iTunes playlists or bookshelf at home.
  6. Get in the habit of giving your files meaningful names that will work across contexts. For example, instead of "Homework.sas" or "Homework 1.sas" (the second is slightly better), consider putting more info in the file name. "But I can just open the file and read it to see what it is." That's true, but that takes time. How many files named "Homework 1" do you want to open and read (and try to remember what each one does) to find code when you're in a pinch? However, if you name your files like the example below, you're bound to remember what's in each one because you've given yourself several cues:
     stat101_hw1_proc-freq_dr-jans_spr2014.sas
     You might not need all those cues, but it can't hurt to have them.

  7. Create standardized file names for your code once you've "curated" it. At minimum, the name should have a) what the procedure is, b) what software it's for, and c) the date modified. This should be replicated in a header within the file, too. For example, your file name might look like "sas_one-way-anova_2014-07-03.sas". Whether you start with "sas" or "one-way-anova" (or even the date) depends on how you store your files and what sort order you like. You might say "But Windows tells me it's a SAS program file with a little icon." Consider what that would look like on a computer that doesn't have SAS installed. And while the .sas extension tells you it's a SAS program, file extensions aren't always displayed by default. It never hurts to be a little redundant if it makes it easier to understand what's in the file without opening it.
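
If you'd rather not type these names and headers by hand, here is a minimal sketch of a helper that builds both from the same three cues (software, procedure, date modified). It's written in Python purely for illustration; the function names `library_file_name` and `library_header` and the header layout are my own assumptions, not an established convention. The header uses `/* ... */` block-comment syntax, which SAS accepts.

```python
from datetime import date


def library_file_name(software: str, procedure: str, modified: date, ext: str) -> str:
    """Build a name like 'sas_one-way-anova_2014-07-03.sas' (hypothetical helper)."""
    # Lowercase the procedure and join its words with hyphens, so that
    # underscores are reserved for separating the three cues.
    slug = "-".join(procedure.lower().split())
    return f"{software.lower()}_{slug}_{modified.isoformat()}.{ext}"


def library_header(software: str, procedure: str, modified: date, notes: str = "") -> str:
    """Return a block comment that repeats the file-name cues inside the file."""
    lines = [
        "/*",
        f" * Procedure: {procedure}",
        f" * Software:  {software}",
        f" * Modified:  {modified.isoformat()}",
    ]
    if notes:
        lines.append(f" * Notes:     {notes}")
    lines.append(" */")
    return "\n".join(lines)


if __name__ == "__main__":
    when = date(2014, 7, 3)
    print(library_file_name("SAS", "one-way ANOVA", when, "sas"))
    print(library_header("SAS", "one-way ANOVA", when))
```

Putting the software first means the files sort by software and then by procedure; swap the order of the pieces in the f-string if you prefer a different sort order.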
