Thursday, November 13, 2014

The long road that is short

It was in a job interview that I first really thought about the difference between writing code and programming. The conversation went something like this.

Potential Boss: Tell me about your statistical software experience. Do you program?

Me: I write my own code.

PB: Yes, but do you program?

Me: Ummm...

PB: Hmmm...Do you use macros?

Me: Oh! Yes.

Programming involves writing code, but writing code is not necessarily programming. I've become hypersensitive to this distinction since that experience. Although I had a tacit understanding of it, I never really separated the two things out.

Then this morning I came across a great quote in Michael N. Mitchell's Data Management Using Stata: A Practical Handbook that reads:

"The word programming can be a loaded word. I use it here to describe the creation of a series of commands that can be easily repeated to perform a given task. As such, this chapter is about how to create a series of Stata commands that [can] be easily repeated to perform data-management and data analysis tasks. But you might say that you already know how to use Stata for your data management and data analysis. Why spend time learning about programming? My colleague at UCLA, Phil Ender, had a wise saying that I loved: 'There is the short road that is long and the long road that is short.' Investing time in learning and applying programming strategies may seem like it will cost you extra time, but at the end of your research project, you will find that it is part of the 'long road that is short.'" (http://www.stata.com/bookstore/data-management-using-stata/, p. 278)

This will be my new mantra: "the long road that is short."

Thursday, October 16, 2014

Why keep a project log?

Of course you keep all your syntax, output, and logs from your analyses, but why should you keep a separate log of your work, and what should it look like?

Why? 

Because your analysis logs won't tell the whole story. If you're good about commenting your code, they can tell most of the story, but there will inevitably be some question or insight that comes up after you've run things. At this step, it's probably easier to update notes in a Word file, text file, Excel file, or some place external to your code.

What? 

The format isn't too important. Use whatever program you find easiest to use. I use OneNote because I like how it lets me drop in pictures, etc., and automatically inserts URLs from things I copy and paste from the web. But it can be slow at times, and it has more features than a log needs. I've tried Excel before, but that's just too restrictive for me. It's good for other kinds of logs, but for a general project log, I like to have more free space to write.

What goes in this log? Anything you want to remember for later. I write a short summary of what I did during the session/day, and list/highlight open questions and next actions. If I made big decisions, I'll document them (though I like to have a decisions log, too). If I have key output or a finding, I'll add that.

Think of this as your "captain's log"...record all the highlights and puzzling issues of the work, as well as major insights and new ideas.

Wednesday, August 20, 2014

The keyboard shortcuts I use the most

There's no doubt that kb shortcuts improve your efficiency (compared to mousing) once you've learned them. I learned a couple new ones from my boss yesterday, so I figured I should start listing them all here as a reference. I'll update this from time to time and include those I've listed elsewhere.

Navigating Text

Move to beginning of the line

home

Move to end of the line

end

Move one word to the right (or left) of cursor

ctrl + right (or left) arrow


Editing/Formatting Text

Delete word to the right of cursor

ctrl + del

Select text while moving the cursor (e.g., select next word or text to the end of the line)

Hold down 'shift' with any of the movement shortcuts above


Select all text in the document, form field, etc.

ctrl + a


Excel Text Editing

For the most part, the editing shortcuts above work in Excel, too. You can apply them to the whole cell or enter the cell and apply them to selected text in the cell.

Enter Excel cell

F2


Excel Navigation

Go to a cell

F5


Go to the next worksheet to the right

ctrl + page down


Go to the next worksheet to the left

ctrl + page up

Tuesday, July 22, 2014

Missing values in SAS

For the longest time I was using the default missing data display in SAS proc freq. But this isn't ideal for doing recodes, where you might want to do something specific with the missing values, and so you want them to show up in tabulations and cross-tabulations so you can see where they go. Here's a great page on how to change how missings display in output.

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_sect016.htm

After the variables you list on the tables statement, add a slash and one of the following options.

The first option will display missing values and include them in the % and cumulative % calculations for the table. Helpful if you want to see what % of cases are missing.
/ missing

The second option will display missing values in your table, but it won't include them in the percentage calculations (i.e., it's "print only").
/ missprint

Monday, July 21, 2014

How to consume mass quantities

Not the kind of "mass quantities" that the Coneheads would consume, but mass quantities of academic readings and scientific findings. I'm not a fast reader myself, so I've tried different speed reading programs over the years, and have developed some of my own tricks for getting through stacks of readings (are readings still "stacked" if they're in PDF files??).

Here are some things I've tried that seem to work well.

Rule #1: Don't read through articles end to end

When you're reading for a class or for a scientific paper, you only want to read end to end if a) the topic is brand new and you need general orientation, b) the content you need is in the lit review, or c) you've read the findings and you still don't understand them.

Another way to phrase this is "Define the goal of your reading." When you're reading as part of a project or paper (rather than reading for a general overview of a topic or for fun), you're usually reading to find references to support your work. Thus you want to focus on the most helpful parts of the readings (e.g., the findings). Make sure you can distinguish between these different kinds of reading (i.e., the goal of your reading), and use the right approach for the goal.

Rule #2: Adopt a method

Particularly if you need to burn through a lot of reading, having a methodical way to process the "stack" is really helpful. Here are some options. 
  1. Alpha & Omega method: Read the abstract and conclusions first. This will give you the overall goal of the research and the take-home findings. These are probably the most important things for you to gather. 
  2. "To the point" method: There are a couple versions of this method, but the goal is to get to the point of the paper. 
    1. Option 1: Read the results section first (and only go back to methods when needed)
    2. Option 2: Read tables and figures first. They can be easier to consume than text sometimes.
    3. Option 3: Read conclusions/discussions first. Do this if you're trying to get the most general "big picture" take-home message or ideas for future research.

Rule #3: Document your progress

Remember that your goal is to extract findings and conclusions, not just highlight them. So, rather than just underlining or highlighting key findings (something I did for years), make sure that your note-taking method pulls the findings out so that they're easier to find later. You don't want to have to flip through pages of articles to find your meaningful highlights. Here are some ideas. 

  1. Create a list of 3-4 "take-home messages". If you only remember 3-4 things about the article, these points are what you want to remember. Make this list in a separate document or on the first page of the article so it's easy to find.
  2. In Adobe Reader (and maybe other PDF readers), you can insert comments and then have only the comments displayed (rather than showing them on the pages of the document). This is another way to bring important points to the surface.
  3. Copy key text and tables right out of the document into your notes file or document draft. Be careful that you don't accidentally plagiarize when doing this. To protect against that, I like to copy out of articles as screen caps (rather than text) so that I can't edit them directly.
You also want to make your notes "retraceable". For your take-homes or anything you copy out of the article, be sure to say where in the article it was found (e.g., page, and even paragraph). This makes it easier to go back to the full discussion if you have to. 

Suggestion #1 Try formal speed reading methods

I've tried a couple (Evelyn Wood: http://www.ewrd.com/ewrd/index.asp and Iris: http://www.irisreading.com/). I've had more luck with Iris (by which I mean, it's stuck as a habit), but that's probably because I took the Iris course, while I tried to learn EW from the book. Here are some general speed reading tips taken from both. These methods assume you're reading a chapter or section end-to-end (which you may not be if you're using one of the methods above). However, you can apply these techniques to individual sections of a research article pretty easily. 
  1. Do your reading in multiple passes (you don't have to do all these...they're a combination of the two methods)
    1. Pass 1a: Read at a pace of 1-2 sec per page (just skimming for style, key words, etc.). This is a "warm-up pass" of sorts.
    2. Pass 1b: Read the opening and closing paragraph of the chapter or section. A similar type of warm-up.
    3. Take notes. What will the section be about? What terms seem new, etc.
    4. Pass 2: Read the first sentence of each paragraph. 
    5. Pass 3: Read the full text. 
  2. Both methods recommend using a finger or some sort of pointer to "underline" or "tap" beneath the words as you read. Essentially, you follow your pointer to keep your eyes moving.
  3. Don't backtrack. If you think you missed a word or sentence, just keep going. You'll likely pick it up later in the text or on the next read-through.
  4. Don't take notes while actively reading; take them in between passes. Taking notes slows you down.
Suggestion #2 Make a game of it. How quickly can you extract three substantive take-home messages?

Suggestion #3 Use your environment. Don't read articles while lying comfortably on your sofa or in bed. Read them in a location or at a time that puts some pressure on you to finish. For example, read while standing and only sit down after you've finished an article. Or, read articles on the bus. Keep your readings handy so you can read them in those frequent 5-15 minute down times each day (e.g., before a meeting, waiting for your dinner to heat up in the microwave).

Suggestion #4 Use Technology. http://accelareader.com/ is just one example that Iris recommends. You copy text into it and it shows you a word at a time at your chosen pace. Here are some other tools: http://www.irisreading.com/software-and-apps/

Happy reading!

Tuesday, July 15, 2014

Data import problems

After helping my GSR with a data import problem in Stata yesterday, I was reminded that the quickest and easiest solution is often to just read the file into a different program first (and then export from there, or read from there into the program you ultimately want to use). In this case we read it into Excel, which correctly delimited the variables that Stata hadn't. The first Excel attempt didn't work, so we had to read it into Excel on a different computer (with a later version of Excel).

This example of critical thinking and persistence in mundane tasks reminded me of a similar situation at CSR. I couldn't get my text file to read into FOCUS (our database program), so I enlisted the help of Vicky Stringfellow. We tried all known options (e.g., check column assignments in the layout file, check for hidden special characters, read into Excel). Nothing worked. Finally, for some reason, reading it into SPSS worked. Still a mystery.

Thursday, July 3, 2014

Developing Your Personal Statistical Code/Syntax Library

At 38, I'm trying to develop and refine some research practices, skills, and habits that I wish I would've developed and refined earlier. These are the things no one teaches you about in class, so you have to learn them on the job or from a trusted adviser. Or, by trial and error. One is developing and maintaining your own stat code library. Perhaps it's an artifact of starting my career in the internet age, not being a full-time programmer, or both, but it never occurred to me to develop a personal code library. I would write code for a class or project, and then it would just live with those class or project files. That makes it hard to find later, even if you hang onto everything. When you need to run an ANOVA in SAS three years after you took a class about it, you're probably not going to be thinking "Stat101_HW2.sas", or whatever way you saved the file at the time.

Why do this? 
  1. Your coding productivity (and whatever the end product of your coding is, e.g., research papers) will go up if you can quickly find the code you need.
  2. There are a lot of great code sites but you can't rely on them for everything. In particular, they don't have your notes and modifications in what's presented on their site. Your code library is your own annotated, modified, and personalized versions of all those great examples online, from class notes, and in books.
  3. Code snippets online are often short examples that don't cover an entire process (e.g., how to apply variable and value labels, but not a full data cleaning process). On the other extreme, you can find complex programs that were written for someone else's purposes. Echoing point 2, they aren't organized in a way that makes sense to you, so will be harder to use time after time unless you personalize them. 
How to do it? 
  1. Start now! Literally, with your next coding task or class assignment, take time to annotate your code and save it in your library. Make it a habit.
  2. Add to your library as you do new tasks or go back over old class materials. 
  3. Annotate the code liberally, explaining what each line or group of lines does. Just because you're clear about what it does now doesn't mean you will be 3, 7, or 10 years from now.
  4. You don't need to make these textbook-quality examples, but think about what you would need to include to teach someone else your code (or to teach your 3-years-from-now self...who hasn't used the particular routine for a while, now has a job, kids, etc., and is being asked to turn around a project quickly). What can you add to the code quickly, while it's fresh in your mind, that would help you understand it later?
  5. At minimum, take time during your analysis tasks to copy good examples of code into your library. Don't just let it sit in your project folder (which may disappear if you're not the owner, or which you'll forget to take with you when you change jobs, etc.). Make your code library a "thing" that you curate, just like you would your iTunes playlists or bookshelf at home.
  6. Get in the habit of naming your files with meaningful names that will work across contexts. For example, instead of "Homework.sas" or "Homework 1.sas" (the second is slightly better), consider putting more info in the file name. "But I can just open the file and read it to see what it is." That's true, but that takes time. How many files named "Homework 1" do you want to open and read (and try to remember what each one does) to find code when you're in a pinch? However, if you name your files like the one below, you're bound to remember what's in each one because you've given yourself several cues.
    stat101_hw1_proc-freq_dr-jans_spr2014.sas
    You might not need all those cues, but it can't hurt to have them.
  7. Create standardized file names for your code once you've "curated" it. At minimum, the name should have a) what the procedure is, b) what software it's for, and c) the date modified. This should be replicated in a header within the file, too. For example, your file name might look like "sas_one-way-anova_2014-07-03.sas". Whether you start with "sas" or "one-way-anova" (or even the date) depends on how you store your files and what sort order you like. You might say "But Windows tells me it's a SAS program file with a little icon." Consider what that would look like on a computer that doesn't have SAS installed. And while the .sas extension tells you it's a SAS program, sometimes extensions don't automatically display on computers. It never hurts to be a little redundant if it makes it easier to understand what's in the file without opening it.
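As a rough sketch of the standardized naming idea in the last point, a small Python helper could assemble the pieces in one consistent order. The function names (slugify, library_filename) and the software_procedure_date order are my own assumptions for illustration, not any kind of standard; rearrange the parts to match whatever sort order you like.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and join its words with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def library_filename(software: str, procedure: str, modified: date, extension: str) -> str:
    """Build a name like software_procedure_YYYY-MM-DD.ext (hypothetical convention)."""
    parts = [slugify(software), slugify(procedure), modified.isoformat()]
    return "_".join(parts) + "." + extension.lstrip(".")


# Example: reproduces the file name used in the text above.
print(library_filename("SAS", "one-way ANOVA", date(2014, 7, 3), "sas"))
```

Underscores separate the cues and hyphens separate words within a cue, so the name stays readable and the ISO date sorts correctly within each procedure.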

Thursday, June 5, 2014

10 Excel keyboard shortcuts that will make you love Excel again

You should be using KB shortcuts in all your programs, but here are a few (some newly-discovered) that have re-awakened my relationship with Excel.

(Note: I'm sure I'll get to 10 at least, but wanted to keep this post updated in real-time)

1) Add/delete a row:
shift+space to highlight the row
then shift+ctrl+'+' to add, or ctrl+'-' to delete

2) Add/delete a column:
ctrl+space to highlight the column
then ctrl+shift+'+' (same key combo as above) to add, or ctrl+'-' to delete

3) Enter a cell: Good ole F2! Works throughout Windows for the same purpose

4) Move around text in a cell: These are common text editing kb shortcuts, but if you're a mouser you may not know them.

End, Home, ctrl+arrow, ctrl+shift+arrow:
end/home to get to the end/beginning of cell text.
ctrl+arrow to move one word at a time.
shift+ctrl+arrow to highlight/select as moving. 

I use these everywhere! You can't get off-mouse without them.

More to come...