Author: Xavier Fernández-i-Marín
January 14, 2012 - 3 minutes - Python
The program process_nici takes quantification files that were manually integrated from a gas chromatograph coupled to a mass spectrometer operating in negative ion chemical ionization mode (GC-MS-NICI, 6980N Agilent Technologies) and turns them into output that can be loaded directly into statistical packages and spreadsheets. These files are generated by the ChemStation software and contain the concentrations of specific compounds.
The program is licensed under a Free Software license (GPLv3), which means that you have the right to use, study and modify it. The only restriction is that if you distribute modified versions you have to share them under the same terms and make the source code available.
The program is written in Python, so a working Python installation is required. Python is also Free Software. In addition to being free as in freedom, it is also free as in “free beer”, so you can download it at no cost. It runs on many different platforms and there are pre-packaged versions for the most widely used operating systems at Get Python. On GNU/Linux I recommend using the Python version that comes packaged with your distribution.
To produce a spreadsheet with the results, the script also uses a third-party Python library called xlwt, which is not part of the standard library and has to be installed separately. Python libraries (modules) are easy to install. Consult the guide Installing Python Modules for further assistance.
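Before running the script it can help to confirm that xlwt is actually visible to the interpreter you intend to use. The following is a minimal sketch, assuming Python 3; the helper name module_available is made up for this illustration and is not part of process_nici.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be found."""
    # find_spec locates the module without importing it
    return importlib.util.find_spec(name) is not None


if module_available("xlwt"):
    print("xlwt found: the spreadsheet output can be written")
else:
    print("xlwt missing: install it first, for example with "
          "'python -m pip install xlwt'")
```

The check only looks the module up; it does not verify which version is installed.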
How to run the program
Copy the “process_nici.py” script to the top of the directory hierarchy that you want to analyze. From there, run the program from the command line with one argument, which is the name of the files that must be processed (in this example, quan1.txt).
$ python process_nici.py quan1.txt
The program will print the status of the operations and it may return an error message if it detects something wrong during the process.
The program returns two files: out.csv and compounds.xls.
out.csv is a text file with three columns. The first column is the name of the sample as stated by the “Sample” or “Misc” field. The second column is the compound, and the third column its concentration. This delimited text file can be read by any spreadsheet or statistical package. In R, for example, you get a nice data frame by importing it with the following call (note the semicolon used as field separator):
d <- read.table("out.csv", sep=";", header=TRUE)
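For readers who would rather stay in Python, the same import can be sketched with the standard csv module. This is only an illustration under assumptions taken from the R call above: a semicolon separator, one header row, and the three columns in the order sample, compound, concentration. The function name read_long_table and the sample data are invented for the example.

```python
import csv
import io


def read_long_table(handle, delimiter=";"):
    """Return a list of (sample, compound, concentration) tuples."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # skip the header row (assumed to be present)
    rows = []
    for record in reader:
        if len(record) < 3:
            continue  # ignore blank or incomplete lines
        sample, compound, concentration = record[:3]
        rows.append((sample, compound, float(concentration)))
    return rows


# Made-up content standing in for a real out.csv
example = io.StringIO(
    "sample;compound;concentration\n"
    "S1;compound_a;0.42\n"
    "S1;compound_b;-2\n"
)
rows = read_long_table(example)
print(rows)
```

With a real file, the handle would come from open("out.csv", newline="") instead of io.StringIO.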
compounds.xls is a spreadsheet of concentrations where columns represent samples and rows represent compounds.
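The relation between the two outputs is a reshape from a long table (one row per sample and compound) to a wide one (one row per compound, one column per sample). A rough sketch of that reshape follows. It is not the code used by process_nici; the function name pivot_long_to_wide and the data are hypothetical.

```python
def pivot_long_to_wide(rows):
    """Turn (sample, compound, concentration) rows into a wide table.

    Returns the list of samples (column order) and a dictionary that
    maps each compound to a {sample: concentration} dictionary.
    """
    samples = []  # samples in order of first appearance
    table = {}    # compound -> {sample: concentration}
    for sample, compound, concentration in rows:
        if sample not in samples:
            samples.append(sample)
        table.setdefault(compound, {})[sample] = concentration
    return samples, table


# Made-up long-format data
long_rows = [
    ("S1", "compound_a", 0.42),
    ("S1", "compound_b", -2.0),
    ("S2", "compound_a", 1.5),
]
samples, table = pivot_long_to_wide(long_rows)

# One line per compound, one value per sample (None where no row exists)
for compound, by_sample in table.items():
    print(compound, [by_sample.get(s) for s in samples])
```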
Concentrations below the calibration range are assigned the value -1, and non-detected compounds are assigned the value -2. These codes make it easy to tell the two cases apart from real concentrations in later processing.
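When working with the output afterwards, the two codes usually have to be separated from real measurements before computing any statistics. A small sketch of how that could look; the constant and function names are invented here, and replacing the codes with None is just one possible choice.

```python
BELOW_CALIBRATION = -1  # code for concentrations below the calibration
NOT_DETECTED = -2       # code for non-detected compounds


def classify(value):
    """Label a value from the output as a code or a real measurement."""
    if value == NOT_DETECTED:
        return "not detected"
    if value == BELOW_CALIBRATION:
        return "below calibration"
    return "quantified"


def quantified_only(values):
    """Replace both codes with None so they cannot enter a mean or sum."""
    return [v if classify(v) == "quantified" else None for v in values]


print(quantified_only([0.42, -1, -2, 1.5]))
```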
Characteristics and Limitations
- As of version 1.1, the program reads only the concentrations under the “Target Compounds” section, so the “Internal Standards” section is not processed.
- The program compares the strings of “Sample” and “Misc” to get a number for the sample. If they do not match, it reports an error. If one of them is empty, the program uses the other as a valid name for the sample.
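The rule for choosing the sample name can be sketched as follows. This is an illustration of the behaviour described above, not the actual code of process_nici: the function name resolve_sample_name is hypothetical, the comparison is assumed to be an exact string match after trimming whitespace, and the case where both fields are empty is not covered by the description, so the sketch simply returns an empty string.

```python
def resolve_sample_name(sample, misc):
    """Pick the sample name from the "Sample" and "Misc" fields."""
    sample = sample.strip()
    misc = misc.strip()
    if sample and misc:
        # Both fields are filled in: they must agree
        if sample != misc:
            raise ValueError(
                "Sample %r and Misc %r do not match" % (sample, misc)
            )
        return sample
    # At most one field is filled in: use whichever is not empty
    # (assumption: returns "" when both are empty)
    return sample or misc


print(resolve_sample_name("12", "12"))
print(resolve_sample_name("", "7"))
```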