A research workflow with Zotero and Org mode

A diagram of my workflow. I find papers on the internet, keep links and notes in an Emacs Org mode buffer, use the links to pull up the papers when needed, and use the papers' bibliographies to find more references.

Any research project is going to involve a literature search: reading through a bunch of papers that might be relevant to your topic in order to get a sense of what the field already knows. Now, maybe there's some magic technique for picking out the information that matters, passing over the rest, and writing out a single, coherent story in one pass through all the papers you can find. If that technique exists, I have no idea what it is.

So when every paper brings up ten new questions and twenty papers to start answering them, I need a system to keep my notes organized. I need notes that let me jump back and forth between papers without losing my place, draw links between papers, and store lists of citations to come back to. Here's how I do it.

Storing papers with Zotero

The first tool I use is Zotero, a reference manager. Zotero's job is to store the actual papers I come across, along with their citation data and any tags they were published with. It can grab that information from my web browser, whether I'm on a journal's website or someplace like Google Scholar or PubMed. It's also great for quickly putting together a bibliography, using BibTeX or similar programs, when I want to write up some results.

The Zotero interface.

Zotero stores the papers I want to read and reference. I scaled up the font size here to make it readable in a tiny blog image.

Zotero isn't the only reference manager out there. Mendeley is another popular choice, and there are plenty more. I picked Zotero arbitrarily a few years ago, but it's working out well because of its Emacs integration.

Keeping notes with Emacs and Org mode

You see, Zotero has some note-taking functions, and I used to keep my notes there, but there were some problems. Notes are stored as separate files for each paper, but I want to cross-reference notes from a lot of different papers at once. And while the editor has some rich-text capabilities (e.g. bold and italic text), it's missing important things I need in my notes, like the ability to typeset equations.

That's where Emacs and its extension Org mode come in. To borrow a term from Perl enthusiasts, Org mode is the Swiss Army chainsaw of text document formats. Org mode documents have a lot of features, and it's way beyond this post's scope to describe them all. For research notes, the most useful things it lets me do are:

  • I can store my notes in a hierarchical tree structure, and I can hide parts of the tree from view in order to focus on other parts.
  • I can put hyperlinks into my notes, including links to papers, websites, or other parts of the file.
  • I can put math in my notes using LaTeX, and view the typeset equations right in my Emacs buffer.

An image of a notes file in Org mode.

A sample from my notes file. You can see the tree structure of the file, some links to papers, and a little bit of inline math, using LaTeX.
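
If you want the folding and the typeset math to kick in as soon as a notes file opens, a few Org settings can do it. This is only a sketch and not part of the setup described in this post: it assumes a working LaTeX installation plus an image converter such as dvipng, and the scale factor is an arbitrary choice.

```elisp
;; Hypothetical init-file additions; adjust to taste.

;; Open each Org file folded down to its top-level headings.
(setq org-startup-folded t)

;; Render LaTeX fragments as images when an Org file is opened.
;; Assumes LaTeX and an image converter (e.g. dvipng) are installed.
(setq org-startup-with-latex-preview t)

;; Enlarge the rendered equations. The variable is defined by Org,
;; so wait until Org is loaded before modifying it.
(with-eval-after-load 'org
  (setq org-format-latex-options
        (plist-put org-format-latex-options :scale 1.5)))
```

Without the startup setting, Org's LaTeX preview command (bound to C-c C-x C-l by default) renders fragments on demand.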

Gluing it all together with zotxt

Now, see those links to papers in my notes buffer? I didn't have to copy and paste them from anywhere. I inserted them with just three keystrokes each. So far, I've just described some useful pieces of software, but the interesting part of my workflow is how they fit together.

zotxt is an extension that lets other programs talk to Zotero, and Emacs has a package to talk to it. It's even structured specifically to work with Org mode documents. With zotxt, my workflow looks like this:

  • I find a paper I want to look at somewhere on the internet.
  • I use Zotero's browser plugin to save it to Zotero. Hopefully it grabs the paper itself and this happens in one click; if the site doesn't play along, I spend a minute grabbing a pdf and feeding it to Zotero.
  • I insert a link to the Zotero entry into my notes file in Emacs. I can do this with the key chords C-c " ". I don't need to further specify what paper I want to grab: the browser plugin leaves the paper selected in Zotero, and zotxt can grab the selected paper.
  • When I want to read the paper, I go to the link and tell Emacs to open the paper in my system PDF viewer. The key chords for this are C-c " a, and then selecting the PDF attachment from the Helm window that appears (usually I just type pdf RET).
  • When I'm reading a paper and see a citation that might be useful, I look it up on the internet and repeat this process to store a note linking to it.

It took me a while to get it set up to my liking, so here's how I did it:

  • First, install zotxt. If you're using Zotero as a Firefox extension, you just need to install zotxt as another extension. If you're using the standalone Zotero client, you can still do it: download the extension file from that link, then go to the Add-Ons Manager under the Tools menu and find the option to install an add-on from a file.

The Install Add-on From File menu option.

The menu option looks like this.

  • Next, install the zotxt package in Emacs. If your package manager is set up, you can just type M-x package-install RET zotxt RET.
  • Now, when org-zotxt-mode is active, you can use its functions in your org-mode buffers. You can search for papers and insert them with C-c " i, insert the currently-selected paper in Zotero with C-u C-c " i, and open a paper's PDF or other related files by moving the cursor to a link and typing C-c " a. However, you might want a little bit more setup to deal with some annoyances.
  • You probably want to have org-zotxt-mode automatically activated in all your org-mode documents. To make that happen, you can add some code to your .emacs file to start up this mode on all your org-mode buffers - see below this list for the .emacs configuration I use.
  • If you want to insert a link to the currently-selected item a lot, C-u C-c " i is an awkward sequence to type. I rebound it to C-c " ".
  • You might notice that when you insert a link to a paper, the text of that link is a full citation. That might be what you want, but I just want the authors, paper name, and year. It took me a bit of hacking to get around that: it's possible to tell the Emacs zotxt interface to use a different citation format than the default, but I had to throw together a little XML file (a CSL citation style) to give it a shorter format than a full citation. (This may not be the easiest or cleanest way to do it, but it works!) That XML file is here. To use it, go into your Zotero preferences and select Cite -> Styles, and add the file. It should appear in the menu as "mkbehr's short reference format". Then add the last two lines in the .emacs snippet below, and you should get shorter citations.
  • You probably want to install the Helm package, to make zotxt's search interface easier to navigate. That link should tell you everything you need to know.
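
If you'd rather script the two package installs than type them, something like the following might work. It's a sketch, not what I actually run: it assumes both packages are fetched from the MELPA archive, and the archive setup may duplicate what your init file already does.

```elisp
;; Hypothetical install snippet for an init file.
(require 'package)

;; Assumption: zotxt and helm are both installed from MELPA.
(add-to-list 'package-archives
             '("melpa" . "https://melpa.org/packages/") t)

;; Recent Emacs versions initialize packages on their own before the
;; init file runs; on older ones this call is needed.
(package-initialize)

;; Install whichever of the two packages is missing.
;; The archive list is refreshed before each install, which is
;; slightly wasteful but keeps the snippet simple.
(dolist (pkg '(zotxt helm))
  (unless (package-installed-p pkg)
    (package-refresh-contents)
    (package-install pkg)))
```

Once both packages are installed, the snippet does nothing on later startups, since the package-installed-p check skips the network entirely.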

Here's that .emacs setup code:

;; Activate org-zotxt-mode in org-mode buffers
(add-hook 'org-mode-hook (lambda () (org-zotxt-mode 1)))
;; Bind something to replace the awkward C-u C-c " i.
;; Wait until Org is loaded, so that org-mode-map is defined.
(with-eval-after-load 'org
  (define-key org-mode-map
    (kbd "C-c \" \"") (lambda () (interactive)
                        ;; '(4) is what a single C-u prefix passes in
                        (org-zotxt-insert-reference-link '(4)))))
;; Change citation format to be less cumbersome in files.
;; You'll need to install mkbehr-short into your style manager first.
(eval-after-load "zotxt"
  '(setq zotxt-default-bibliography-style "mkbehr-short"))

Of course, I'm not done tinkering to make my workflow better. I hear good things about the org-ref and helm-bibtex packages - if only I can keep an up-to-date BibTeX file as I add papers to my library, I can associate links with not only a paper's PDF, but also that paper's section of my notes file. And I haven't found a smooth way to take a paper and pull up the papers it cites in my browser. But until then, I'm pretty happy with this setup.

Happy researching!
