The HUGO Gene Nomenclature Committee (HGNC) has published its latest guidelines for naming human genes, which include changes to symbols that Microsoft Excel automatically converted to dates.
One example is “Membrane Associated Ring-CH-Type Finger 1”, which was abbreviated to MARCH1 and was automatically converted to 1 March of the current year when typed into Excel.
To avoid this issue, the HGNC has changed the MARCH1 abbreviation to MARCHF1. Similarly, SEPT1 is now SEPTIN1.
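For researchers with older datasets, the renames can be applied with a simple lookup. The following is a minimal sketch, not an official HGNC tool: the table `RENAMED_SYMBOLS` and the function `update_symbol` are hypothetical names, the table holds only the two renames mentioned above, and TP53 is used as an arbitrary symbol that was not renamed.

```python
# Hypothetical lookup table: only the two renames named in the article.
RENAMED_SYMBOLS = {
    "MARCH1": "MARCHF1",
    "SEPT1": "SEPTIN1",
}


def update_symbol(symbol: str) -> str:
    """Return the current symbol for a retired one, or the input unchanged."""
    # Normalise whitespace and case before the lookup; unknown symbols
    # are returned exactly as they were passed in.
    return RENAMED_SYMBOLS.get(symbol.strip().upper(), symbol)


updated = [update_symbol(s) for s in ["MARCH1", "SEPT1", "TP53"]]
print(updated)  # ['MARCHF1', 'SEPTIN1', 'TP53']
```

A real mapping would need to cover every renamed symbol, which this sketch does not attempt.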
While you can simply change the data type of individual cells, rows, columns, or even an entire spreadsheet to text, the problem is that researchers share data in comma-separated values (CSV) files.
CSV files do not preserve any of the formatting you might use in Excel to prevent the automatic conversion of a text string into a date.
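Because a CSV file stores only plain text, one way to check a shared file is to read it with a script rather than a spreadsheet. The sketch below uses Python's standard `csv` module, which returns every field as a string and performs no date conversion. It flags values that look like a day and month, on the assumption that a converted symbol was saved in a form such as "1-Mar" or "2-Sep"; the exact form depends on Excel's locale and settings, so the pattern is only a heuristic. The function name `find_date_like` and the sample data are made up for illustration.

```python
import csv
import io
import re

# Heuristic (assumption): converted symbols appear as "<day>-Mar" or "<day>-Sep".
DATE_LIKE = re.compile(r"^\d{1,2}-(Mar|Sep)$", re.IGNORECASE)


def find_date_like(csv_text: str, column: str) -> list[tuple[int, str]]:
    """Return (line number, value) pairs for date-like entries in a column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is line 1, so the first data row is line 2.
    return [
        (line_no, row[column])
        for line_no, row in enumerate(reader, start=2)
        if DATE_LIKE.match(row[column])
    ]


# Made-up sample: two intact symbols and two that were already converted.
sample = "gene,score\nMARCHF1,0.8\n1-Mar,0.5\nSEPTIN1,0.3\n2-Sep,0.9\n"
print(find_date_like(sample, "gene"))  # [(3, '1-Mar'), (5, '2-Sep')]
```

A flagged value cannot be mapped back with certainty from the text alone, since "1-Mar" could have come from more than one symbol, so the original source data is still needed to repair it.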
According to a report on The Verge, this widespread issue was a significant concern for researchers in genetics.
A 2016 study found that around 20% of peer-reviewed work had been affected by errors introduced through Microsoft Excel, The Verge reported. The study surveyed the genetic data that was included with 3,597 papers published between 2005 and 2015.
The Verge reported that over the past year, the HGNC has renamed 27 human genes to avoid the errors introduced when researchers use Excel for their work.
It’s not only Excel that can introduce errors like these. The problem also exists in Google Sheets, which converts MARCH1 to a date. Google Sheets does not automatically convert SEPT1 to a date.
LibreOffice Calc does not convert MARCH1 or SEPT1 to a date.
Aside from changing the names of genes to make it easier for researchers to continue using Excel, the HGNC announced several other changes with its latest guidelines.
These include changing tRNA synthetase symbols that were common words, such as WARS, which has been changed to WARS1, and CARS, which has been changed to CARS1.
Genes that used misleading or incorrect nomenclature were renamed, along with genes that used domain- or motif-based terms, such as TMEM206, which has been renamed PACC1.
The HGNC also changed symbols that were determined to be pejorative, such as DOPEY1, which was renamed DOP1 leucine zipper-like protein A (DOP1A).