Package gmisclib :: Module fiatio

Module fiatio

Fiatio reads and writes FIAT 1.2, an extension of the FIAT file format originally defined by David Wittman (UC Davis). FIAT 1.2 is defined at http://kochanski.org/gpk/papers/2010/fiat.pdf. (FIAT 1.0 is defined at http://dls.physics.ucdavis.edu/fiat/fiat.html.)

Nice FIAT features:

- Header information looks like a comment to most programs, so they will treat a FIAT file as simple multi-column ASCII.
- Since it has column names in the header, you can add columns at will, and your existing scripts will continue to run.
- Simple to parse.
- Easy to generate.

This describes fiat 1.2 format, which is nearly 100% upwards compatible with fiat 1.0 format. It is defined as follows:

Lines are separated by newlines.
All values are encoded by replacing newline and other difficult characters by a percent character (%) followed by a hex code on writing, and the reverse on reading. (There are also some more human-friendly codes which can be used, instead of pure hex: see g_encode._specials for their definitions. Notably, %S is space, %L is newline, %R is carriage return, %t is tab, and %T is percent.)
At the top of the file, you have a line identifying the format: "# fiat 1.2" (regexp: "# fiat 1\.[0-9.]+").
Then, you typically have a number of header lines beginning with "#". Header lines are in the form # attribute = value (where white space is optional and can be a mixture of spaces and tabs). The attribute must match the regular expression [a-zA-Z_][a-zA-Z_0-9]* . The value is whatever follows the equals sign, after leading and following white space is stripped. If the value begins and ends with the same quote character, either ' or ", the quotes are also stripped off. Values may contain any character except newline and the chosen quote.
- Note that you must quote or encode a value if it begins or ends with whitespace.
- Note also that header lines can also appear further down in the file; they are not restricted to the top.
- Any other header lines are just treated as comments and ignored.
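
The header-line rules above can be sketched with a single regular expression. This is an illustration written for this page, not the module's own parser: the names HEADER_RE and parse_header_line are hypothetical, and percent-decoding of the value is left out.

```python
import re

# "# attribute = value", with optional spaces or tabs around each part.
HEADER_RE = re.compile(r'^#[ \t]*([a-zA-Z_][a-zA-Z_0-9]*)[ \t]*=(.*)$')


def parse_header_line(line):
    """Return (attribute, value) for a header line, or None for a comment."""
    m = HEADER_RE.match(line.rstrip('\n'))
    if m is None:
        return None          # any other '#' line is treated as a comment
    value = m.group(2).strip()
    # Strip one pair of matching quotes, either ' or ".
    if len(value) >= 2 and value[0] == value[-1] and value[0] in '\'"':
        value = value[1:-1]
    return m.group(1), value
```

With this sketch, parse_header_line('#a=b c') gives ('a', 'b c'), and a line such as '# Comment1' gives None. Quoting is what preserves leading or trailing white space in a value.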

There may be header lines of the form "# TTYPE1 = name" or "#ttype4 = name" which name the columns of the data (the leftmost column is TTYPE1). If you don't name the Ith column, its name will be I. When writing, this module adds an attribute COL_SEPARATOR which contains the numeric ASCII code(s) of the column separator character(s). This defaults to 9, ASCII tab. The module also adds a COL_EMPTY attribute with the string used to mark a blank (nonexistent) item. (This defaults to %na.) Note that a nonexistent item is not the same as a zero-length string.
- These lines may also appear anywhere in the file. They take effect immediately.
- All attributes and names are optional.
Typically, the header is followed by multicolumn ASCII data. Columns are separated (by default) by any white space, but if there is a COL_SEPARATOR attribute, it is used instead. Empty entries for columns should be indicated by whatever code is specified in COL_EMPTY, if that is set. Otherwise, if COL_SEPARATOR is set, COL_SEPARATOR strings separate items, some of which may simply be empty. (In all cases, a completely blank line is treated as a datum in which every column is blank, i.e. nonexistent.)
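
As a rough sketch of the data-line rules above, the following splits one line into a dictionary and expands the five mnemonic escape codes listed earlier. It is not the module's code: decode_mnemonics and split_row are hypothetical names, COL_EMPTY is assumed to be set, hex escapes are left untouched (their exact form is defined in g_encode, not here), and naming an unnamed Ith column with the string form of I is an assumption.

```python
# Mnemonic escape codes named in the format description.
MNEMONICS = {'%S': ' ', '%L': '\n', '%R': '\r', '%t': '\t', '%T': '%'}


def decode_mnemonics(text):
    """Expand the five mnemonic codes; leave any other escape untouched."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in MNEMONICS:
            out.append(MNEMONICS[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return ''.join(out)


def split_row(line, names, sep=None, empty='%na'):
    """Map column names to decoded values, omitting nonexistent items."""
    line = line.rstrip('\n')
    fields = line.split(sep) if line else []   # sep=None: any white space
    row = {}
    for i, field in enumerate(fields, 1):      # leftmost column is number 1
        name = names[i - 1] if i <= len(names) else str(i)
        if field != empty:                     # compare before decoding
            row[name] = decode_mnemonics(field)
    return row
```

For example, split_row('%na\t3', ['b', 'a'], sep=chr(9)) yields a dictionary with only the 'a' entry, and a completely blank line yields an empty dictionary.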

If there is no DATE attribute, the write routine adds one, using the current date and time, in the form ccyy-mm-ddThh:mm:ss (as defined in the NASA FITS format). Note that all attributes are optional.
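
A timestamp in the ccyy-mm-ddThh:mm:ss form can be produced with the standard library, as in the sketch below. The function name fiat_date is hypothetical, and the use of local time rather than UTC is an assumption; the text above does not say which the write routine uses.

```python
import time


def fiat_date(t=None):
    """Format a time (seconds since the epoch) as ccyy-mm-ddThh:mm:ss."""
    when = time.localtime(t)       # t=None means "now"
    return time.strftime('%Y-%m-%dT%H:%M:%S', when)
```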

This is not quite David Wittman's FIAT (1.0), which requires a value either to be quoted or to contain no white space. FIAT 1.0 will take a line in the form "#a=b c" and interpret c as a comment, whereas FIAT 1.2 will interpret the value as "b c". However, almost all files will be interpreted the same way as in FIAT 1.0.

Here's an example:

       # fiat 1.2
       # TTYPE1 = b
       # TTYPE2 = a
       # SAMPRATE = 2.3
       # DATE = 2001-09-21T21:32:32
       # COL_EMPTY = "%na"
       # COL_SEPARATOR = "9"
       # Comment1
       # Comment2
       # b     a
       2       1
       3       2
       3       %na
       %na     3
       %na     %na
       0       1

Classes
	FiatioWarning
	writer Write a file in FIAT format.
	merged_writer Assumes that the data will be read with read_merged(), so that header values will supply default values for each column.
	ConflictingColumnSpecification

Functions

col_order(a, b)

write_array(fd, adata, columns=None, comments=None, hdr=None, sep='\t', numeric=False)
Write a rectangular array in FIAT format.

write(fd, ldata, comments=None, hdr=None, sep='\t', blank='%na', fixed_order=0)
Write a file in FIAT format.

shared_data_values(data_items)
Takes a list of data and pulls out all the items that have the same value in each line.
Returns: tuple(dict(str:anything), list(dict(str:anything)))

read(fd)
Read a fiat format data file.
Returns: tuple(dict, list(dict(str:str)), list(str))

read_merged(fd)
Read in a fiat file and return a list of dictionaries, one for each line in the file.
Returns: (list(dict), list(str))

readiter(fd)
Read in a fiat file.
Returns: (dict(str:str), dict(str:str), list(str))

read_as_float_array(fd, loose=False, baddata=None)
Read in a fiat file.

test1()

test2()

test3()

test4()

test()

Variables
	TABLEN = `8`
	__package__ = `'gmisclib'`

Imports: re, types, string, warnings, gpk_writer, g_encode, BadFormatError, FiatError

Function Details

write_array(fd, adata, columns=None, comments=None, hdr=None, sep=`'\t'`, numeric=False)

Write a rectangular array in FIAT format. Adata is a 2-D numpy array or a sequence of sequences.

write(fd, ldata, comments=None, hdr=None, sep=`'\t'`, blank=`'%na'`, fixed_order=0)

Write a file in FIAT format. Ldata is a list of dictionaries. Each dictionary corresponds to one line in the file. Each unique key generates a column, and the values are printed in the data section of the FIAT file. Note that the TTYPE header lines will be automatically generated from ldata. Hdr is a dictionary of information that will be put in the header. Comments is a list of comment lines for the header. Fd is a file descriptor where the data should be written. Sep is a string used to separate data columns. Blank is a string to use when a data value is missing.
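
To make the mapping from ldata to file contents concrete, here is a simplified stand-in for the behaviour described above. It is not the module's write: sketch_write is a hypothetical name, columns are emitted in first-seen key order (an assumption), and value encoding, the DATE attribute, and the COL_SEPARATOR and COL_EMPTY headers are all omitted.

```python
import io


def sketch_write(fd, ldata, hdr=None, comments=None, sep='\t', blank='%na'):
    """Write a list of dictionaries as a FIAT-like table (illustration only)."""
    # One column per unique key, in first-seen order (an assumption).
    columns = []
    for datum in ldata:
        for key in datum:
            if key not in columns:
                columns.append(key)
    fd.write('# fiat 1.2\n')
    for i, name in enumerate(columns, 1):
        fd.write('# TTYPE%d = %s\n' % (i, name))
    for key, value in sorted((hdr or {}).items()):
        fd.write('# %s = %s\n' % (key, value))
    for comment in comments or []:
        fd.write('# %s\n' % comment)
    for datum in ldata:
        # A key missing from this row is written as the blank marker.
        cells = [str(datum[c]) if c in datum else blank for c in columns]
        fd.write(sep.join(cells) + '\n')


out = io.StringIO()
sketch_write(out, [{'b': 2, 'a': 1}, {'b': 3}], hdr={'SAMPRATE': 2.3})
text = out.getvalue()
```

The second row has no 'a' key, so its second cell is written as the blank marker.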

shared_data_values(data_items)

Takes a list of data and pulls out all the items that have the same value in each line. The idea is that you can then put them into the header via:

       hdr, data, c = read(fd)
       htmp, data = shared_data_values(data)
       hdr.update(htmp)

Parameters:

data_items (list(dict(str: anything)))

Returns: tuple(dict(str:anything), list(dict(str:anything)))

It returns a tuple of (1) a dictionary of header items, and (2) a list of data. The list has the same length as data_items, but the dictionaries within it may have fewer entries.
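
The behaviour described above might be implemented along the following lines. This is a sketch, not the module's source: sketch_shared_values is a hypothetical name, and it assumes that a key missing from any line does not count as shared.

```python
def sketch_shared_values(data_items):
    """Split rows into (values common to every row, per-row remainder)."""
    if not data_items:
        return {}, []
    shared = dict(data_items[0])
    for datum in data_items[1:]:
        for key in list(shared):
            # Drop a key once any row lacks it or disagrees on its value.
            if key not in datum or datum[key] != shared[key]:
                del shared[key]
    rest = [dict((k, v) for k, v in datum.items() if k not in shared)
            for datum in data_items]
    return shared, rest
```

The returned list keeps one dictionary per input line, so positions still correspond to the original data even when a dictionary ends up empty.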

read(fd)

Read a fiat format data file. Each line in the FIAT file is represented by a dictionary that maps column name into the data (data is a string). Lines without data in a certain column will not have the corresponding entry in the dictionary for that line.

You can use this function as follows:

       hdr, data, comments = read(fd)
       for datum in data:
               # Raises KeyError for a line that has no 'X' entry.
               print(datum['X'])

Parameters:

fd (An iterator that generates strings. Typically a file object.) - The data source: typically a file descriptor.

Returns: tuple(dict, list(dict(str:str)), list(str))

Three items: header, data, and comments. Header is the collected dictionary of header information, data is a list of dictionaries (one for each line in the file), and comments is a list of strings.
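
Because a line without data in a column simply lacks that key, code that pulls out a column needs to allow for absent entries. The sketch below uses a hand-written list shaped like the documented data return value for the example file, so it runs without gmisclib; the helper name column is hypothetical.

```python
# Stand-in for the 'data' list, written out by hand from the example file.
data = [{'b': '2', 'a': '1'}, {'b': '3', 'a': '2'}, {'b': '3'},
        {'a': '3'}, {}, {'b': '0', 'a': '1'}]


def column(data, name, default=None):
    """Collect one column as floats, using default where it is absent."""
    return [float(d[name]) if name in d else default for d in data]


b_values = column(data, 'b')
```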

read_merged(fd)

Read in a fiat file and return a list of dictionaries, one for each line in the file. (Also a list of comment lines.) Each line in the input FIAT file is represented by a dictionary that maps column name into the data (data is a string). The header data in the FIAT file is merged into the per-column data, so that the header data is used as a default value for the column of the same name. As a result, all the information in the file (both header and data) is in the resulting list of dictionaries.

NB: this is a bit of a specialized routine. Normally, one uses read.

E.g. if there is a header line "# X = Y" and no data column called "X", then this will succeed:

       data, comments = read_merged(fd)
       for datum in data:
               assert datum['X']=='Y'

Note that this routine does not require header lines to precede data lines. If header lines appear in the middle, then a new column will be created from that point onwards.

Parameters:

fd (An iterator that generates strings. Typically a file object.) - The data source: typically a file descriptor.

Returns: (list(dict), list(str))

(data, comments)
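
The merging rule can be pictured as follows for the simple case where every header line precedes the data. This is an illustrative sketch with a hypothetical name, sketch_merge; it does not model header lines that appear part-way through the file.

```python
def sketch_merge(hdr, data):
    """Use header entries as per-column defaults for every data row."""
    merged = []
    for datum in data:
        row = dict(hdr)      # start from the header defaults
        row.update(datum)    # values on the data line take precedence
        merged.append(row)
    return merged
```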

readiter(fd)

Read in a fiat file. Each line in the file is represented by a dictionary that maps column name into the data (data is a string). Lines without data in a certain column will not have an entry in that line's dictionary for that column name.

Lines beginning with '#' are either header or comment lines. A fiat file can mix header lines amongst the data lines. (Although, typically, all the header info is at the top.)

You can use this function as follows:

       for (hdr, datum, comments) in readiter(fd):
               if datum is not None:   # the end-of-file item has no datum
                       print(datum['X'])

NB: this is a bit of a specialized routine. Normally, one uses read.

Parameters:

fd (Anything that supports iteration. Typically a file object.) - The data source: typically a file. Not a filename.

Returns: (dict(str:str), dict(str:str), list(str))

Three items: header, data, and comments. Header is the collected dictionary of header information since the last iteration, data is a dictionary of the data on the current line, and comments is a list of the comment strings seen so far. The end of the file yields None for the last data, along with any header info or comments after the last datum.
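
A caller that wants the accumulated header has to merge the per-iteration updates itself and skip the final None datum. The sketch below shows one way to do that; fake_readiter is a hypothetical stand-in that yields triples shaped like the ones described above, so the example runs without a FIAT file.

```python
def fake_readiter():
    """Hypothetical stand-in that yields triples shaped like readiter's."""
    yield {'SAMPRATE': '2.3'}, {'b': '2', 'a': '1'}, ['Comment1']
    yield {}, {'b': '3'}, ['Comment1']
    yield {'NOTE': 'late header'}, None, ['Comment1', 'trailing']


def collect(triples):
    """Accumulate header updates and rows from (hdr, datum, comments)."""
    header, rows, comments = {}, [], []
    for hdr, datum, comments in triples:
        header.update(hdr)          # hdr holds only what is new
        if datum is not None:       # the end-of-file item has no datum
            rows.append(datum)
    return header, rows, comments


header, rows, comments = collect(fake_readiter())
```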

read_as_float_array(fd, loose=False, baddata=None)

Read in a fiat file. Return (header, data, comments), where header is a dictionary of header information, data is a numpy array, and comments is a list of strings. Two special entries are added to header: __COLUMNS points to a mapping from column numbers (the order in which they appeared in the file, left to right, starting with 0) to names, and _NAME_TO_COL holds the reverse mapping.

Empty values are set to NaN.

If loose==False, then all entries must be either floating point numbers or empty entries or equal to baddata (as a string, before conversion to a float). If loose==True, all non-floats will simply be masked out.
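
The strict and loose modes can be illustrated with plain Python lists, as below. This is a sketch under stated assumptions rather than the module's code: sketch_float_rows is a hypothetical name, entries equal to baddata and masked entries are both represented as NaN, strict mode signals a bad entry with ValueError, and the result is a list of lists rather than a numpy array.

```python
def sketch_float_rows(data, columns, loose=False, baddata=None):
    """Convert rows of strings to rows of floats, with NaN for gaps."""
    nan = float('nan')
    table = []
    for datum in data:
        row = []
        for name in columns:
            text = datum.get(name)
            if text is None or text == baddata:
                row.append(nan)          # empty or explicitly bad entry
                continue
            try:
                row.append(float(text))
            except ValueError:
                if not loose:
                    raise                # strict mode rejects non-floats
                row.append(nan)          # loose mode masks them out
        table.append(row)
    return table
```

Comparing baddata against the string form, before any conversion, matches the description above.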
