Module fiatio
Fiatio reads and writes an extension of the FIAT file format
originally defined by David Wittman (UC Davis). Specifically, it handles the
FIAT 1.2 format, defined at http://kochanski.org/gpk/papers/2010/fiat.pdf. (FIAT
1.0 is defined at http://dls.physics.ucdavis.edu/fiat/fiat.html.)
Nice FIAT features:
- Header information looks like a comment to most programs, so they
  will treat a FIAT file as simple multi-column ASCII.
- Since it has column names in the header, you can add columns at will,
  and your existing scripts will continue to run.
- Simple to parse.
- Easy to generate.
This describes the FIAT 1.2 format, which is nearly 100% upward
compatible with the FIAT 1.0 format. It is defined as follows:
- Lines are separated by newlines.
- All values are encoded by replacing newline and other difficult
  characters by a percent character (%) followed by a hex code on
  writing, and the reverse on reading. (There are also some more
  human-friendly codes which can be used instead of pure hex: see
  g_encode._specials for their definitions. Notably, %S is space, %L is
  newline, %R is carriage return, %t is tab, and %T is percent.)
- At the top of the file, you have a line identifying the format:
  "# fiat 1.2" (regexp: "# fiat 1\.[0-9.]+").
- Then, you typically have a number of header lines beginning with
  "#". Header lines are in the form "# attribute = value" (where white
  space is optional and can be a mixture of spaces and tabs). The
  attribute must match the regular expression [a-zA-Z_][a-zA-Z_0-9]*.
  The value is whatever follows the equals sign, after leading and
  trailing white space is stripped. If the value begins and ends with
  the same quote character, either ' or ", the quotes are also stripped
  off. Values may contain any character except newline and the chosen
  quote.
  - Note that you must quote or encode a value if it begins or ends
    with whitespace.
  - Note also that header lines can appear further down in the
    file; they are not restricted to the top.
  - Any other header lines are just treated as comments and ignored.
- There may be header lines of the form "# TTYPE1 = name" or
  "#ttype4 = name" which name the columns of the data (the
  leftmost column is TTYPE1). If you don't name the Ith column, its
  name will be I. When writing, this module adds an
  attribute COL_SEPARATOR which contains the numeric code(s) (ASCII) of
  the column separator character(s). This defaults to 9, ASCII tab.
  The module also adds a COL_EMPTY attribute with the string used to
  mark a blank (nonexistent) item. (This defaults to %na.) Note that
  nonexistent is not the same as a zero-length string.
  - These lines may also appear anywhere in the file. They take
    effect immediately.
  - All attributes and names are optional.
- Typically, the header is followed by multicolumn ASCII data. Columns
  are separated (by default) with any white space, but if there is a
  COL_SEPARATOR attribute, it is used instead. Empty entries for
  columns should be indicated by whatever code is specified in
  COL_EMPTY, if that is set. Otherwise, if COL_SEPARATOR is set,
  COL_SEPARATOR strings separate items, some of which may simply be
  empty. (In all cases, a completely blank line is treated as a datum
  which has all columns blank (nonexistent).)
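The percent-encoding rule in the list above can be illustrated with a minimal sketch. It is not the g_encode implementation: the names fiat_encode and fiat_decode are hypothetical, only the five mnemonic codes listed above are used, and the sketch assumes that every other difficult character is written as % plus exactly two hex digits (so it only handles code points below 256).

```python
import re

# Mnemonic codes named in the text above. The real table lives in
# g_encode._specials, which is not reproduced here.
_MNEMONIC = {' ': '%S', '\n': '%L', '\r': '%R', '\t': '%t', '%': '%T'}
_DECODE = {code[1]: char for char, code in _MNEMONIC.items()}

# Assumption: everything except printable, non-space ASCII other than '%'
# needs encoding, and the fallback is '%' plus two hex digits.
_NEEDS_ENCODING = re.compile(r'[^\x21-\x24\x26-\x7e]')
_TOKEN = re.compile(r'%(?:([SLRtT])|([0-9A-Fa-f]{2}))')


def fiat_encode(value):
    """Replace difficult characters by percent codes (hypothetical helper)."""
    def repl(m):
        ch = m.group(0)
        return _MNEMONIC.get(ch) or '%%%02x' % ord(ch)
    return _NEEDS_ENCODING.sub(repl, value)


def fiat_decode(text):
    """Reverse fiat_encode; unrecognized sequences such as %na are left alone."""
    def repl(m):
        if m.group(1):
            return _DECODE[m.group(1)]
        return chr(int(m.group(2), 16))
    return _TOKEN.sub(repl, text)
```

This sketch encodes every space, which is more than a quoted header value strictly needs. Because none of the mnemonic letters is a hex digit, the decoder can tell the two kinds of code apart without lookahead.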
If there is no DATE attribute, the write routine adds one, using the
current date and time, in the form ccyy-mm-ddThh:mm:ss (as defined in the
NASA FITS format). Note that all attributes are optional.
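As an illustration of the DATE form described above, a timestamp in ccyy-mm-ddThh:mm:ss can be produced with the standard library. The helper name fiat_date is hypothetical, and whether the module uses local time or UTC is not stated here; local time is assumed.

```python
import time


def fiat_date(t=None):
    """Format t (seconds since the epoch; default: now) as ccyy-mm-ddThh:mm:ss.

    Assumption: local time is used; the source does not say which time zone applies.
    """
    return time.strftime('%Y-%m-%dT%H:%M:%S', time.localtime(t))
```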
This is not quite David Wittman's FIAT (1.0), which requires a value to
either be quoted or to contain no white space. Wittman's FIAT will take a
line in the form "#a=b c" and interpret c as a comment,
whereas FIAT 1.2 will interpret the value as "b c". However,
almost all files will be interpreted the same way as in FIAT 1.0.
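The FIAT 1.2 header-line rules (attribute pattern, whitespace stripping, quote stripping) can be sketched as follows. This is an illustration of the rules as stated in this document, not the module's own parser; parse_header_line is a hypothetical name, and percent-decoding of the value is left out.

```python
import re

# "# attribute = value", with optional spaces or tabs around each part.
HEADER_LINE = re.compile(r'^#\s*([a-zA-Z_][a-zA-Z_0-9]*)\s*=(.*)$')


def parse_header_line(line):
    """Return (attribute, value) for a header line, else None (comment or data)."""
    m = HEADER_LINE.match(line.rstrip('\n'))
    if m is None:
        return None
    value = m.group(2).strip()
    # Strip one pair of matching quotes, either ' or ".
    if len(value) >= 2 and value[0] == value[-1] and value[0] in '\'"':
        value = value[1:-1]
    return m.group(1), value
```

Under these rules, parse_header_line('#a=b c') gives the value "b c", which is the FIAT 1.2 reading described above, while a line such as "# Comment1" has no equals sign and is treated as a comment.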
Here's an example:
# fiat 1.2
# TTYPE1 = b
# TTYPE2 = a
# SAMPRATE = 2.3
# DATE = 2001-09-21T21:32:32
# COL_EMPTY = "%na"
# COL_SEPARATOR = "9"
# Comment1
# Comment2
# b a
2 1
3 2
3 %na
%na 3
%na %na
0 1
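To make the example concrete, the data rows can be mapped to dictionaries roughly as below. This is a sketch under assumptions: split_row is a hypothetical helper, the rows are taken to be tab-separated (COL_SEPARATOR = "9"), and an unnamed Ith column is given the string form of I as its name.

```python
def split_row(line, names, sep=None, empty='%na'):
    """Map one data line to {column name: string}, omitting empty items.

    sep=None splits on any white space, the default when COL_SEPARATOR is unset.
    """
    items = line.rstrip('\n').split(sep)
    row = {}
    for i, item in enumerate(items):
        if item == empty:
            continue  # a nonexistent item gets no dictionary entry
        # Assumption: unnamed columns are named by their 1-based position.
        name = names[i] if i < len(names) else str(i + 1)
        row[name] = item
    return row


names = ['b', 'a']  # from TTYPE1 and TTYPE2 in the example
lines = ['2\t1', '3\t2', '3\t%na', '%na\t3', '%na\t%na', '0\t1']
rows = [split_row(ln, names, sep='\t') for ln in lines]
```

The third row then has only a 'b' entry, the fourth only an 'a' entry, and the fifth is an empty dictionary.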
TABLEN = 8
__package__ = 'gmisclib'
Imports: re, types, string, warnings, gpk_writer, g_encode, BadFormatError, FiatError
write_array(fd, adata, columns=None, comments=None, hdr=None, sep='\t', numeric=False)
Write a rectangular array in FIAT format. Adata is a 2-D numpy array
or a sequence of sequences.
write(fd, ldata, comments=None, hdr=None, sep='\t', blank='%na', fixed_order=0)
Write a file in FIAT format. Ldata is a list of dictionaries. Each
dictionary corresponds to one line in the file. Each unique key
generates a column, and the values are printed in the data section of the
FIAT file. Note that the TTYPE header lines will be automatically
generated from ldata. Hdr is a dictionary of information that will be put
in the header. Comments is a list of comment lines for the header. Fd is
a file descriptor where the data should be written. Sep is a string used
to separate data columns. Blank is a string to use when a data value is
missing.
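The relationship between ldata and the generated file can be sketched without the module itself. The function below is a simplified stand-in, not the real write: the name write_sketch is hypothetical, columns are ordered by first appearance (an assumption), multiple separator codes are joined by spaces (an assumption), and value encoding and the DATE attribute are omitted.

```python
import io


def write_sketch(fd, ldata, hdr=None, comments=None, sep='\t', blank='%na'):
    """Write a list of dictionaries as FIAT-like text (simplified illustration)."""
    # Each unique key becomes a column; order of first appearance is assumed.
    columns = []
    for row in ldata:
        for key in row:
            if key not in columns:
                columns.append(key)
    fd.write('# fiat 1.2\n')
    for i, name in enumerate(columns):
        fd.write('# TTYPE%d = %s\n' % (i + 1, name))
    for key, value in sorted((hdr or {}).items()):
        fd.write('# %s = %s\n' % (key, value))
    fd.write('# COL_EMPTY = "%s"\n' % blank)
    fd.write('# COL_SEPARATOR = "%s"\n' % ' '.join(str(ord(c)) for c in sep))
    for comment in comments or []:
        fd.write('# %s\n' % comment)
    for row in ldata:
        # Missing values are written as the blank marker.
        fd.write(sep.join(str(row.get(c, blank)) for c in columns) + '\n')


buf = io.StringIO()
write_sketch(buf, [{'b': 2, 'a': 1}, {'b': 3}], hdr={'SAMPRATE': 2.3})
text = buf.getvalue()
```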
shared_data_values(data_items)
Takes a list of data and pulls out all the items that have the same
value in each line. The idea is that you can then put them into the
header via:
    hdr, data, c = read(fd)
    htmp, data = shared_data_values(data)
    hdr.update(htmp)
- Parameters:
  data_items (list(dict(str:anything)))
- Returns: tuple(dict(str:anything), list(dict(str:anything)))
  A tuple of (1) a dictionary of header items, and (2) a
  list of data. The list has the same length as data_items, but the
  dictionaries within it may have fewer entries.
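The behavior described above can be illustrated with a small stand-alone function. This is a sketch of the idea, not the module's code; shared_values_sketch is a hypothetical name, and returning an empty header and list for empty input is an assumption.

```python
def shared_values_sketch(data_items):
    """Split out the key/value pairs that are identical in every dictionary."""
    if not data_items:
        return {}, []  # assumption: empty input gives empty results
    shared = {}
    for key, value in data_items[0].items():
        # A pair is shared only if every line has the key with the same value.
        if all(key in d and d[key] == value for d in data_items):
            shared[key] = value
    trimmed = [
        {k: v for k, v in d.items() if k not in shared}
        for d in data_items
    ]
    return shared, trimmed


hdr_items, data = shared_values_sketch([
    {'X': '1', 'Y': 'a'},
    {'X': '1', 'Y': 'b'},
    {'X': '1'},
])
```

Here 'X' moves into hdr_items because all three lines agree on it, while 'Y' stays in the per-line dictionaries.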
read(fd)
Read a FIAT format data file. Each line in the FIAT file is
represented by a dictionary that maps column name into the data (data is
a string). Lines without data in a certain column will not have the
corresponding entry in the dictionary for that line.
You can use this function as follows:
    hdr, data, comments = read(fd)
    for datum in data:
        print(datum['X'])
- Parameters:
  fd (An iterator that generates strings. Typically a file object.) - The data source: typically a file descriptor.
- Returns: tuple(dict, list(dict(str:str)), list(str))
  Three items: header, data, and comments. Header is the collected
  dictionary of header information, data is a list of dictionaries,
  one for each line in the file, and comments is a list of strings.
read_merged(fd)
Read in a FIAT file and return a list of dictionaries, one for each
line in the file. (Also a list of comment lines.) Each line in the input
FIAT file is represented by a dictionary that maps column name into the
data (data is a string). The header data in the FIAT file is merged into
the per-column data, so that the header data is used as a default value
for the column of the same name. As a result, all the information in the
file (both header and data) is in the resulting list of dictionaries.
NB: this is a bit of a specialized routine. Normally, one uses read.
E.g. if there is a header line "# X = Y" and no data column
called "X", then this will succeed:
    data, comments = read_merged(fd)
    for datum in data:
        assert datum['X'] == 'Y'
Note that this routine does not require header lines to precede data lines.
If header lines appear in the middle, then a new column will be created
from that point onwards.
- Parameters:
  fd (An iterator that generates strings. Typically a file object.) - The data source: typically a file descriptor.
- Returns: (list(dict), list(str))
  (data, comments)
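The merging rule, including header lines that appear partway through the file, can be sketched as follows. The event tuples are a hypothetical stand-in for already-parsed lines, and merge_stream is not part of the module.

```python
def merge_stream(events):
    """Merge header values into data rows as defaults.

    events is an iterable, in file order, of either
    ('header', name, value) or ('data', row_dict) tuples (hypothetical format).
    """
    current = {}   # header values seen so far
    merged = []
    for event in events:
        if event[0] == 'header':
            current[event[1]] = event[2]
        else:
            row = dict(current)   # header values act as defaults...
            row.update(event[1])  # ...and data columns override them
            merged.append(row)
    return merged


merged = merge_stream([
    ('header', 'X', 'Y'),
    ('data', {'a': '1'}),
    ('header', 'Z', 'q'),
    ('data', {'a': '2', 'X': 'override'}),
])
```

The first row picks up X from the header; the second also picks up Z, which only exists from that point onwards, and its own X column takes precedence over the header default.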
readiter(fd)
Read in a FIAT file. Each line in the file is represented by a
dictionary that maps column name into the data (data is a string). Lines
without data in a certain column will not have an entry in that line's
dictionary for that column name.
Lines beginning with '#' are either header or comment lines. A FIAT
file can mix header lines amongst the data lines. (Although, typically,
all the header info is at the top.)
You can use this function as follows:
    for (hdr, datum, comments) in readiter(fd):
        if datum is not None:  # the end of the file yields None
            print(datum['X'])
NB: this is a bit of a specialized routine. Normally, one uses read.
- Parameters:
  fd (Anything that supports iteration. Typically a file object.) - The data source: typically a file, not a filename.
- Returns: (dict(str:str), dict(str:str), list(str))
  Three items: header, data, and comments. Header is the collected
  dictionary of header information since the last iteration, data
  is a dictionary of the data on the current line, and comments is
  a list of comment strings seen so far. The end of the file yields
  None for the last data, along with any header info or comments
  after the last datum.
read_as_float_array(fd, loose=False, baddata=None)
Read in a fiat file. Return (header, data, comments), where header is
a dictionary of header information, data is a numpy array, and comments
is a list of strings. Two special entries are added to header: __COLUMNS
points to a mapping from column numbers (the order in which they appeared
in the file, left to right, starting with 0) to names, and _NAME_TO_COL
holds the reverse mapping.
Empty values are set to NaN.
If loose==False, then all entries must be either floating point
numbers or empty entries or equal to baddata (as a string, before
conversion to a float). If loose==True, all non-floats will simply be
masked out.
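The loose and baddata rules can be illustrated with a pure-Python sketch. Several points are assumptions rather than documented behavior: to_float_rows is a hypothetical name, nested lists stand in for the numpy array, entries equal to baddata are mapped to NaN, and "masked out" is interpreted as NaN.

```python
import math


def to_float_rows(data, columns, loose=False, baddata=None):
    """Convert a list of {name: string} rows into rows of floats."""
    out = []
    for row in data:
        values = []
        for name in columns:
            s = row.get(name)
            if s is None or s == baddata:
                # Empty entries become NaN; baddata is assumed to as well.
                values.append(math.nan)
                continue
            try:
                values.append(float(s))
            except ValueError:
                if not loose:
                    raise  # strict mode: a non-float entry is an error
                values.append(math.nan)  # loose mode: assumed to mean NaN
        out.append(values)
    return out


rows = to_float_rows([{'b': '2', 'a': '1'}, {'b': '3'}], ['b', 'a'])
```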