The Joy of Text

Andrew Robinson, CEBRA, University of Melbourne

The days in which data arrived as clearly formatted and unequivocal numbers have passed, or did we dream them? Data commonly include character strings, whether by design or by accident. Happily, base R includes a suite of powerful tools for manipulating textual data. We'll arm ourselves with grep and gsub, and the wonderful regular expression, and go out looking for trouble. And for completeness, we'll touch on sed, the Unix stream editor: a lovely non-R tool for efficient pre-processing.
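
To give a flavour of the kind of trouble meant here, the following is a minimal sketch in base R. The vector `raw` and its contents are invented for illustration and are not taken from the talk; the pattern is one of many reasonable ways to describe "a plain number".

```r
# Hypothetical messy input: numbers that arrived as text.
raw <- c("12.5", " 7", "n/a", "1,204", "3.1kg")

# grepl() reports which entries already look like a plain number
# (digits, optional decimal part, optional surrounding whitespace).
is_clean <- grepl("^[[:space:]]*[0-9]+(\\.[0-9]+)?[[:space:]]*$", raw)
# is_clean is TRUE TRUE FALSE FALSE FALSE

# gsub() removes every character that is not a digit or a decimal point.
stripped <- gsub("[^0-9.]", "", raw)
# stripped is "12.5" "7" "" "1204" "3.1"

# Entries with nothing left (here "n/a") become NA on conversion.
values <- as.numeric(stripped)
```

Stripping blindly like this is a deliberate simplification: it silently turns "3.1kg" into 3.1, which may or may not be what the analyst wants, so checking `is_clean` first is usually the safer habit.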