23 Dec 2017 by sammons

I need to admit this right off the bat: I still use Microsoft Excel to do a significant portion of my data analysis. My trainees do as well. And deep down, I do not think there is nearly the problem with using Excel (or other spreadsheet software alternatives like LibreOffice) that many in the computer science/bioinformatics world might lead you to believe. There is something to be said for simplicity and for being able to actually "see" and interact with your data.

I would be (more of) a fool if I didn't say there are extreme limitations to Excel that preclude its use on large datasets.

These include:

  1. row and column limits,

  2. severe trouble with sorting lists by any feature other than by numerical values,

  3. problems with reproducible workflows, scripting, and data formats, and

  4. most importantly, automatic formatting that leads to data analysis errors.
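To make the first and last items concrete, here is a minimal Python sketch of a pre-flight check you could run before opening a table in Excel. The worksheet limits are those of the .xlsx format (Excel 2007 and later). Everything else is my own assumption for illustration: the function names `fits_in_worksheet` and `risky_identifiers` are hypothetical, and the two patterns cover only the commonly cited cases of gene symbols that look like dates and identifiers that look like scientific notation. They are not a complete description of Excel's conversion rules.

```python
import re

# Worksheet limits for the .xlsx format (Excel 2007 and later).
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384

# Illustrative patterns only (assumption, not exhaustive):
# month-like names followed by a day number, e.g. "SEPT2" or "MARCH1" ...
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)
# ... and digit strings split by an "E", e.g. "2310009E13".
SCI_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def fits_in_worksheet(n_rows, n_cols):
    """Return True if a table of this size fits in a single .xlsx worksheet."""
    return n_rows <= EXCEL_MAX_ROWS and n_cols <= EXCEL_MAX_COLS


def risky_identifiers(values):
    """Return the values that match one of the illustrative patterns above."""
    return [v for v in values if DATE_LIKE.match(v) or SCI_LIKE.match(v)]


if __name__ == "__main__":
    print(fits_in_worksheet(2_000_000, 10))
    # False
    print(risky_identifiers(["SEPT2", "TP53", "MARCH1", "2310009E13", "BRCA1"]))
    # ['SEPT2', 'MARCH1', '2310009E13']
```

Anything this flags is worth importing as text rather than letting the spreadsheet guess the type.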

These items are summarized incredibly well in a recent PeerJ Preprint.

I think this is an essential read for anyone who uses spreadsheets to collect data (so, most of us!).