Research Efficiency: July 2014

Tuesday, July 22, 2014

Missing values in SAS

For the longest time I was using the default missing data display in SAS proc freq, which leaves missing values out of the table and only reports their count underneath it. That isn't ideal for doing recodes, where you might want to do something specific with the missing values and want them to show up in tabulations and cross-tabulations so you can see where they go. Here's a great page on how to change how missing values display in output.

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_sect016.htm

After the vars you list on the table statement, include one of the following options.

The first option will display missing values and include them in the % and cumulative % calculations for the table. Helpful if you want to see what % of cases are missing.

/ missing

The second option will display missing values in your table, but it won't include them in the percentage calculations (i.e., it's "print only").

/ missprint
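
To see the two options side by side, here is a minimal sketch. The dataset work.survey and the variables educ and sex are made up for illustration; swap in your own names.

```sas
/* Hypothetical example data: 6 cases, 2 of them missing on educ */
data work.survey;
    input id educ sex $;
    datalines;
1 1 F
2 2 M
3 . F
4 3 M
5 . M
6 2 F
;
run;

proc freq data=work.survey;
    /* Missing shown as its own row AND counted in the percentages */
    tables educ / missing;

    /* Missing shown as its own row, but left out of the percentages */
    tables educ / missprint;

    /* Same idea in a cross-tab: see where the missings fall by sex */
    tables educ*sex / missing;
run;
```

Running the first two tables statements back to back is a quick way to check a recode: the frequencies match, and only the percentage columns differ.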

Monday, July 21, 2014

How to consume mass quantities

Not the kind of "mass quantities" that the Coneheads would consume, but mass quantities of academic readings and scientific findings. I'm not a fast reader myself, so I've tried different speed reading programs over the years and have developed some of my own tricks for getting through stacks of readings (are readings still "stacked" if they're in PDF files??).

Here are some things I've tried that seem to work well.

Rule #1: Don't read through articles end to end

When you're reading for a class or for a scientific paper, you only want to read end to end if a) the topic is brand new and you need general orientation, b) the content you need is in the lit review, or c) you've read the findings and you still don't understand them.

Another way to phrase this is "Define the goal of your reading." When you're reading as part of a project or paper (rather than reading for a general overview of a topic or for fun), you're usually reading to find references to support your work. Thus you want to focus on the most helpful parts of the readings (e.g., the findings). Make sure you can distinguish between these different kinds of reading (i.e., the goal of your reading), and use the right approach for the goal.

Rule #2: Adopt a method

Particularly if you need to burn through a lot of reading, having a methodical way to process the "stack" is really helpful. Here are some options.

Alpha & Omega method: Read the abstract and conclusions first. This will give you the overall goal of the research and the take-home findings. These are probably the most important things for you to gather.
"To the point" method: There are a couple versions of this method, but the goal is to get to the point of the paper.

Option 1: Read the results section first (and only go back to methods when needed)
Option 2: Read tables and figures first. They can be easier to consume than text sometimes.
Option 3: Read conclusions/discussions first. Do this if you're trying to get the most general "big picture" take-home message or ideas for future research.

Rule #3: Document your progress

Remember that your goal is to extract findings and conclusions, not just highlight them. So, rather than just underlining or highlighting key findings (something I did for years), make sure that your note-taking method pulls the findings out so that they're easier to find later. You don't want to have to flip through pages of articles to find your meaningful highlights. Here are some ideas.

Create a list of 3-4 "take-home messages". If you only remember 3-4 things about the article, these points are what you want to remember. Make this list in a separate document or on the first page of the article so it's easy to find.
In Adobe Reader (and maybe other PDF readers), you can insert comments and then have only the comments displayed (rather than showing them on the pages of the document). This is another way to bring important points to the surface.
Copy key text and tables right out of the document into your notes file or document draft. Be careful that you don't accidentally plagiarize when doing this. To protect against that, I like to copy out of articles as screen caps (rather than text) so that I can't edit the copied material directly.

You also want to make your notes "retraceable". For your take-homes or anything you copy out of the article, be sure to say where in the article it was found (e.g., page, and even paragraph). This makes it easier to go back to the full discussion if you have to.

Suggestion #1 Try formal speed reading methods

I've tried a couple (Evelyn Wood: http://www.ewrd.com/ewrd/index.asp and Iris: http://www.irisreading.com/). I've had more luck with Iris (by which I mean, it's stuck as a habit), but that's probably because I took the Iris course, while I tried to learn EW from the book. Here are some general speed reading tips taken from both. These methods assume you're reading a chapter or section end-to-end (which you may not be if you're using one of the methods above). However, you can apply these techniques to individual sections of a research article pretty easily.

Do your reading in multiple passes (you don't have to do all these...they're a combination of the two methods)
1. Pass 1a: Read at a pace of 1-2 sec per page (just skimming for style, key words, etc.). This is a "warm-up pass" of sorts.
2. Pass 1b: Read the opening and closing paragraph of the chapter or section. A similar type of warm-up.
3. Take notes. What will the section be about? What terms seem new, etc.
4. Pass 2: Read the first sentence of each paragraph.
5. Pass 3: Read the full text.
Both methods recommend using a finger or some sort of pointer to "underline" or "tap" beneath the words as you read. Essentially, you follow your pointer to keep your eyes moving.
Don't back track. If you think you missed a word or sentence, just keep going. You'll likely pick it up later in the text or on the next read-through.
Don't take notes while actively reading; take them in between passes. Taking notes slows you down.

Suggestion #2 Make a game of it. How quickly can you extract three substantive take-home messages?

Suggestion #3 Use your environment. Don't read articles while lying comfortably on your sofa or in bed. Read them in a location or at a time that puts some pressure on you to finish. For example, read while standing and only sit down after you've finished an article. Or, read articles on the bus. Keep your readings handy so you can read them in those frequent 5-15 minute down times each day (e.g., before a meeting, waiting for your dinner to heat up in the microwave).

Suggestion #4 Use Technology. http://accelareader.com/ is just one example that Iris recommends. You copy text into it and it shows you a word at a time at your chosen pace. Here are some other tools: http://www.irisreading.com/software-and-apps/

Happy reading!

Tuesday, July 15, 2014

Data import problems

After helping my GSR with a data import problem in Stata yesterday, I was reminded that the quickest and easiest solution is often to just read the file into a different program first (and then export from there, or read from there into the terminal program). In this case we read it into Excel, which correctly delimited the variables that Stata wasn't splitting properly. The first Excel attempt didn't work, so we had to read it into Excel on a different computer (with a later version of Excel).

This example of critical thinking and persistence in mundane tasks reminded me of a similar situation at CSR. I couldn't get my text file to read into FOCUS (our database program), so I enlisted the help of Vicky Stringfellow. We tried all known options (e.g., checking column assignments in the layout file, checking for hidden special characters, reading into Excel). Nothing worked. Finally, for some reason, reading it into SPSS worked. Still a mystery.

Thursday, July 3, 2014

Developing Your Personal Statistical Code/Syntax Library

At 38, I'm trying to develop and refine some research practices, skills, and habits that I wish I had developed and refined earlier. These are the things no one teaches you about in class, so you have to learn them on the job or from a trusted adviser. Or, by trial and error. One is developing and maintaining your own stat code library. Perhaps it's an artifact of starting my career in the internet age, not being a full-time programmer, or both, but it never occurred to me to develop a personal code library. I would write code for a class or project, and then it would just live with those class or project files. That makes the code hard to find later, even if you hang onto everything. When you need to run an ANOVA in SAS three years after you took a class about it, you're probably not going to be thinking "Stat101_HW2.sas", or whatever way you saved the file at the time.

Why do this?

1. Your coding productivity (and the output of whatever you're coding for, e.g., research papers) will go up if you can quickly find the code you need.
2. There are a lot of great code sites, but you can't rely on them for everything. In particular, they don't include your notes and modifications. Your code library is your own annotated, modified, and personalized versions of all those great examples online, from class notes, and in books.
3. Code snippets online are often short examples that don't cover an entire process (e.g., how to apply variable and value labels, but not a full data cleaning process). On the other extreme, you can find complex programs that were written for someone else's purposes. Echoing point 2, they aren't organized in a way that makes sense to you, so they will be harder to use time after time unless you personalize them.

How to do it?

Start now! Literally, with your next coding task or class assignment, take time to annotate your code and save it in your library. Make it a habit.

Add to your library as you do new tasks or go back over old class materials.

Annotate the code liberally, explaining what each line or group of lines does. Just because you're clear about what it does now doesn't mean you will be 3, 7, or 10 years from now.

You don't need to make these textbook-quality examples, but think about what you would need to include to teach someone else your code (or to teach your 3-years-from-now self...who hasn't used the particular routine for a while, now has a job, kids, etc., and is being asked to turn around a project quickly). What can you add to the code quickly, while it's fresh in your mind, that would help you understand it later?
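
As a rough illustration of the level of annotation I mean, here is a sketch of a one-way ANOVA written for a code library. The dataset work.scores and the variables method and score are invented for the example, and PROC GLM is just one of several ways to run this in SAS.

```sas
/* Hypothetical example data: test scores under three teaching methods */
data work.scores;
    input method $ score;
    datalines;
A 82
A 75
A 90
B 68
B 72
B 65
C 88
C 94
C 85
;
run;

/* One-way ANOVA: does the mean score differ by teaching method?   */
/* Reminder to self: the grouping variable must be on CLASS,       */
/* otherwise SAS treats it as a continuous predictor.              */
proc glm data=work.scores;
    class method;             /* categorical grouping variable      */
    model score = method;     /* outcome = grouping variable        */
    means method / tukey;     /* group means + Tukey pairwise tests */
run;
quit;                         /* GLM stays open until QUIT          */
```

The comments answer the questions I'd have when coming back to it cold: what the step is for, which statement does what, and which mistakes to avoid.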

At minimum, take time during your analysis tasks to copy good examples of code into your library. Don't just let it sit in your project folder (which may disappear if you're not the owner, or which you'll forget to take with you when you change jobs, etc.). Make your code library a "thing" that you curate, just like you would your iTunes playlists or bookshelf at home.

Get in the habit of naming your files with meaningful names that will work across contexts. For example, instead of "Homework.sas" or "Homework 1.sas" (the second is slightly better), consider putting more info in the file name. "But I can just open the file and read it to see what it is." That's true, but that takes time. How many files named "Homework 1" do you want to open and read (and try to remember what each one does) to find code when you're in a pinch? However, if you name your files like the one below, you're much more likely to remember what's in each one because you've given yourself several cues.

stat101_hw1_proc-freq_dr-jans_spr2014.sas

Create standardized file names for your code once you've "curated" it. At minimum, the name should have a) what the procedure is, b) what software it's for, and c) the date modified. This should be replicated in a header within the file, too. For example, your file name might look like "sas_one-way-anova_2014-07-03.sas". Whether you start with "sas" or "one-way-anova" (or even the date) depends on how you store your files and what sort order you like. You might say "But Windows tells me it's a SAS program file with a little icon." Consider what that would look like on a computer that doesn't have SAS installed. And while the .sas extension tells you it's a SAS program, sometimes extensions don't automatically display on computers. It never hurts to be a little redundant if it makes it easier to understand what's in the file without opening it.
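
Here is one possible header that repeats the file name information inside the file. The first four fields come straight from the naming rule above; the "Origin" and "Notes" fields are my own additions for illustration, so treat them as optional.

```sas
/*--------------------------------------------------------------
  File:      sas_one-way-anova_2014-07-03.sas
  Procedure: One-way ANOVA
  Software:  SAS
  Modified:  2014-07-03
  Origin:    (hypothetical field) class, project, or site the
             code was adapted from
  Notes:     (hypothetical field) anything your future self
             needs to know before running this
--------------------------------------------------------------*/
```

Because the header is a plain block comment, it can sit at the top of any SAS program without affecting how it runs, and it still identifies the file if it gets renamed or emailed around.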