Research Statistics: Spring 2020

I’m a big believer in personal organization, and particularly in the value of keeping logs. Personal logging, or journaling, is, of course, not a new phenomenon; for centuries people have kept logs of their days in order to gain perspective, understand their emotions, and leave a record of their thoughts for posterity, or even for themselves at a later date. My own journals give a good sense of my tendency to over-organize.

Pages from personal journal

Why Keep a Research Log?

In a research context, logging can be less common. There are some fields, such as biology, in which strict standards for research logs have been established and taught, but in computer science this instruction is not standardized. If not for my advisor’s urging, I might never have started keeping research logs. Having now kept consistent research logs for three years, I can confidently say that they are an invaluable tool for keeping track of experiments, recalling the broader purposes of a project, and remembering the work that I have done over a specific period. With some additional record-keeping, they can also become a way to analyze trends in the kinds of work that you’re doing, and to improve time management.

Screenshot from the beginning of my research log

Screenshot from further down in the research log

The simplest way to keep a research log is to use something like Google Docs. For my purposes, a research log needs to have two things: a way to record a unit of work done on a particular day, and a way of categorizing those units of work. It should also be accessible from any computer, since I end up using a number of different computers in my research and it’s a good idea to make recording research activities as easy as possible. I use color-coding to categorize, but more sophisticated tagging tools certainly exist in other products like Airtable. What is attractive to me about Google Docs is the flexibility of a document-based log; each recorded unit of work can stretch as long as I need it to, with any number of sub-bullets, without coming up against the constraints of a spreadsheet-based logging tool. In any case, the main thing is to optimize for ease of recording activities. Analyzing your logs, as I will go on to do in the rest of this post, can take as long as it needs to because it’s done infrequently.

At regular intervals—I choose the end of each semester—it’s a good idea to pause, compile statistics from the current research log, and begin a new one. This is a good idea for several reasons: (1) It’s helpful and good for your mental health to periodically read back over your log to remember what you’ve done (chances are good that it’s more than you think); (2) Having separate documents for different time periods makes it easier to find a specific date range later on; (3) Google Docs doesn’t deal well with long documents—I see significant slowdowns if a research log gets longer than twenty pages.

Screenshot from research stats spreadsheet

Compiling statistics at regular intervals also gives you the opportunity to compare, for example, research activities in Spring 2020 to research activities in Spring 2019. Were you more or less productive last year? Are you still doing enough reading as you progress in your research? Did you remember to take enough days off to relax? These are all questions that research log statistics can help you to answer. To actually compile the statistics, I just used Google Sheets. I then downloaded the resulting spreadsheet as a CSV to do the plotting that follows. I used R to generate these plots, but any programming language (or spreadsheet software) will have equivalent capabilities.
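As a rough illustration of this compile step, here is a minimal Python sketch (the plots in this post were made with R, not with this code). It assumes a hypothetical CSV export with one row per work unit and two columns, `date` and `category`; those column names, the category labels, the sample rows, and the function name `weekly_counts` are all invented for the example.

```python
import csv
import io
from collections import Counter
from datetime import date, timedelta


def weekly_counts(csv_text):
    """Count work units per (week start, category).

    Assumes a hypothetical layout: one row per work unit, with an ISO
    `date` column (YYYY-MM-DD) and a free-form `category` column.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(row["date"])
        # Bucket each entry by the Monday that starts its week.
        week_start = day - timedelta(days=day.weekday())
        counts[(week_start, row["category"])] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = """date,category
2020-02-03,research
2020-02-03,meetings
2020-02-05,research
2020-02-10,writing
"""

for (week_start, category), n in sorted(weekly_counts(sample).items()):
    print(week_start, category, n)
```

Weekly buckets are only one possible choice; the same counter could just as easily be keyed by day or by month, depending on how fine-grained a trend line you want.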

Spring 2020

Spring 2020 was a weird semester, as I’m sure I don’t need to remind anyone. Because of that, one might reasonably expect that my productivity went down. And, more generally, because these are statistics generated from subjective units of work as self-reported by me, one might expect that they are not particularly consistent or reliable. Surprisingly, however, there are remarkable consistencies and trends that can be observed in the data that I’ve collected over the past two years [1]. I’ve previously analyzed some of this data on Twitter, so I will mainly be focusing here on the trends that Spring 2020 highlights. They are, of course, inherently personal, and I will try to avoid over-generalizing. I encourage anyone to analyze their own logs and share the results; maybe there are certain aspects of our self-reported productivity that are common across people.

Spring 2020 Plots

Spring 2020 trends plot

Spring 2020 stacked dot plot

It’s only really appropriate to compare statistics from Spring 2020 with statistics from Spring 2019, because I have different responsibilities in the Fall and in the Spring. In the Fall I’m usually spending a lot of time writing fellowship applications and TAing, but in the Spring my attention is focused more directly on research. The plots above display these statistics in two different forms: separate line plots of the number of units of work in five different categories that I accomplished over the semester, and a stacked dot plot showing the total distribution of work throughout the semester. In the line plots, the vertical gray bars indicate “days off”, during which I recorded no research work units.

Let’s first discuss trends within the data from Spring 2020. There’s an interesting ‘m’-shaped trend to the total units of work graph, which could be attributed to either of two obvious explanations. The first is that I began working from home in response to the coronavirus in late February 2020, and so the decrease in productivity that began around that time might be directly related to that decision. The resumption in productivity in mid-March might then be the result of my adjusting to the different work environment and heightened level of stress. An alternative explanation arises if we examine the pattern of days off during this semester—from this perspective, it appears that decreases in productivity are preceded by periods of relatively few days off, whereas the uptick of productivity in mid-March is preceded by a four-day break that I took. This explanation is somewhat undercut by the similar-looking trend in the meetings plot, which suggests that there was a reduction in my overall productivity associated with a reduction in meetings. Since the number of meetings that I have per week is not really under my control but was affected by quarantine, the coronavirus explanation is the more likely one for this ‘m’-shaped trend.

Apart from the ‘m’-shaped trend, I did more research and reading at the beginning of Spring 2020 than at the end. This is because I got reviews back on a rejected paper in February and began rewriting it. A corresponding increase in the number of writing work units later in the semester confirms this. There’s also a noticeable decrease in the number of miscellaneous work units as the semester goes on, which I think is attributable to the work-from-home situation. With most events cancelled and no office to go to, I have far fewer small administrative and extracurricular activities to take care of.

Spring 2019 Plots

Spring 2019 trends plot

Spring 2019 stacked dot plot

Now, let’s compare with Spring 2019. The scale on these graphs is slightly different, but we can still observe some similarities and differences with Spring 2020. First, there’s again a decrease in total work units following a period without many days off beginning in late March. Since there was no quarantine in Spring 2019, we might infer that failing to take days off does indeed affect overall productivity. Ultimately, we can’t draw any concrete conclusions from these two semesters’ data alone—but I think that I can comfortably say that it’s important to take time to rest during a PhD. Second, it’s interesting that my research and reading effort was more clustered around the middle of the semester in Spring 2019, rather than being clustered at the beginning and end as in Spring 2020. I also did less writing in Spring 2019; perhaps one has to do with the other? Writing is certainly time-consuming and mentally draining. Finally, it’s comforting to see that the average number of meetings and miscellaneous items that I do per day hasn’t changed dramatically from Spring 2019 to Spring 2020—this suggests that I’m successfully avoiding bureaucracy creep in my day-to-day research activities.

I want to point out one final aspect of this data that isn’t reflected in the plots above, but which has been the most surprising aspect of doing this analysis each semester. From the data in my research logs, I can calculate the number of days off (defined as days in which I recorded no work units in my research log) for each semester. The fascinating thing about this number is that it is remarkably consistent from Fall 2018 to Spring 2020. Despite the fact that I don’t always take weekends off, and factoring in vacations from school, in each of these four semesters I have taken either 26 or 27 days off. The spooky thing about this number is that it corresponds to roughly the number of weekend days (28) in a Rice semester, which consists of fourteen weeks. Without getting too speculative about it, this seems to me to be evidence that the five-day work week is somehow ingrained in my psyche. Even when I don’t consciously intend it, my behavior on average reinforces that standard pattern of work. Is this a cultural effect, or does it indicate an amount of necessary rest that is fundamental to human psychology? I have no idea.
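The comparison here is simple arithmetic: fourteen weeks times two weekend days gives 28, against the 26 or 27 days off counted from the logs. A minimal Python sketch of that count follows. The dates and the sample log are invented for illustration (they are not actual semester dates or log entries), and the function names `days_off` and `weekend_days` are hypothetical rather than taken from the analysis in this post.

```python
from datetime import date, timedelta


def days_off(start, end, worked_days):
    """Return the dates in [start, end] with no logged work units."""
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in all_days if d not in worked_days]


def weekend_days(start, end):
    """Count Saturdays and Sundays in [start, end]."""
    n_days = (end - start).days + 1
    return sum((start + timedelta(days=i)).weekday() >= 5 for i in range(n_days))


# Hypothetical fourteen-week span (Monday to Sunday); not actual semester dates.
start = date(2020, 1, 13)
end = start + timedelta(weeks=14) - timedelta(days=1)

# Made-up log: work recorded on every weekday, nothing on weekends.
worked = {
    start + timedelta(days=i)
    for i in range((end - start).days + 1)
    if (start + timedelta(days=i)).weekday() < 5
}

print(len(days_off(start, end, worked)))  # days with no entries
print(weekend_days(start, end))           # 14 weeks * 2 = 28
```

With a real log, `worked` would be the set of dates that have at least one recorded work unit, so the first number would reflect days off wherever in the week they fell.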

To finish up here, I’d like to say that despite the fact that I’m calling these “research statistics,” what I’m seeking to do in this blog post really has very little to do with formal statistics. Quantitative data about our habits can help us to understand patterns that we might tend to forget, but at the end of the day there’s an essential qualitative aspect to being human. Nowhere reported in these numbers are the crucial interactions with friends and family members that have allowed me to stay sane during one of the most stressful events in any of our lives. Compiling these statistics every semester is not an exercise in data science; it’s a useful reminder to slow down, reflect, and consider whether I’m living my life according to the principles that I hold dear. Don’t let life pass you by unnoticed—keep some sort of log of your time. In the last analysis, we have so little of it.

  1. I didn’t start categorizing entries in my research log until I started my PhD, in Fall 2018. As such, I don’t have numbers from Fall 2017 and Spring 2018. 

Published by Cannon Lewis on May 19, 2020