
Source Code for Module gmisclib.fiatio

"""Fiatio reads and writes an extension of the
FIAT file format originally defined by David Wittman (UC Davis).
This reads and writes the FIAT 1.2 format, defined at U{http://kochanski.org/gpk/papers/2010/fiat.pdf}.
(FIAT 1.0 is defined at U{http://dls.physics.ucdavis.edu/fiat/fiat.html}.)

Nice FIAT features:
        - Header information looks like a comment
        to most programs, so they will treat a FIAT file as
        simple multi-column ASCII.

        - Since it has column names in the header, you can add
        columns at will, and your existing scripts will continue
        to run.

        - It is simple to parse.

        - It is easy to generate.


This describes the FIAT 1.2 format, which is nearly 100%
upwards compatible with the FIAT 1.0 format.
It is defined as follows:

        1. Lines are separated by newlines.

        2. All values are encoded on writing by replacing newlines and other difficult
        characters with a percent character (%) followed by a hex code,
        and decoded the reverse way on reading.  (There are also
        some more human-friendly codes which can be used instead
        of pure hex;
        see L{g_encode._specials} for their definitions.  Notably,
        C{%S} is space, C{%L} is newline, C{%R} is carriage return,
        C{%t} is tab, and C{%T} is percent.)

        3. At the top of the file, there is a line identifying the format: "# fiat 1.2"
        (regexp: C{"# fiat 1\.[0-9.]+"}).

        4. Then, you typically have a number of header lines beginning with "#".
        Header lines are in the form C{# attribute = value} (where
        white space is optional and can be a mixture of spaces and tabs).
        The attribute must match the regular expression C{[a-zA-Z_][a-zA-Z_0-9]*} .
        The value is whatever follows the equals sign, after leading and trailing
        white space is stripped.  If the value begins and ends with
        the same quote character, either C{'} or C{"}, the quotes are also stripped off.
        Values may contain any character except newline and the chosen quote.
        (The reader also accepts values wrapped in C{|...|}; the text between
        the bars is then decoded as in rule 2.  The writer uses this form for values
        that contain a newline or that begin or end with white space.)

                - Note that you must quote or encode a value if it begins or ends with white space.

                - Note also that header lines can appear further down in the file;
                        they are not restricted to the top.

                - Any other lines beginning with "#" are treated as comments;
                        the reader returns them separately from the header.

        5. There may be header lines of the form
        "# TTYPE1 = name" or "#ttype4 = name"
        which name the columns of the data (the leftmost column is TTYPE1).
        If you don't name a column, it is named by its integer position,
        counting from 0 at the left (so the column that TTYPE1 would name is C{0}).
        When writing, this module adds an attribute
        COL_SEPARATOR which contains the numeric ASCII code(s)
        of the column separator character(s).  This defaults to 9,
        ASCII tab.
        The module also adds a COL_EMPTY attribute with the string used to mark a
        blank (nonexistent) item.  (This defaults to C{%na}.)  Note that nonexistent
        is not the same as a zero-length string.

                - These lines may also appear anywhere in the file.  They take
                        effect immediately.

                - All attributes and names are optional.

        6. Typically, the header is followed by multicolumn ASCII data.
        Columns are separated (by default) by any white space,
        but if there is a COL_SEPARATOR attribute, it is used instead.
        Empty entries for columns should be indicated by whatever code is specified
        in COL_EMPTY, if that is set.
        Otherwise, if COL_SEPARATOR is set, COL_SEPARATOR strings separate items,
        some of which may simply be empty.
        (In all cases, a completely blank line is treated as a datum which has all
        columns blank (nonexistent).)
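
The value encoding of rule 2 can be sketched as follows.  This is not the
g_encode implementation: it assumes exactly two upper-case hex digits per
code, uses only the five named codes listed in rule 2, and takes its default
set of protected characters from this module's _encoder() (percent, hash,
equals, newline, carriage return).  Under those assumptions decoding is
unambiguous, because none of the named code letters is a hex digit.

```python
import re

# Named codes from rule 2 (g_encode._specials may define more).
_NAMED = {' ': 'S', '\n': 'L', '\r': 'R', '\t': 't', '%': 'T'}
_UNNAMED = {code: ch for (ch, code) in _NAMED.items()}
# A named code, or (assumed) exactly two hex digits.
_CODE = re.compile(r'%(?:([SLRtT])|([0-9A-Fa-f]{2}))')


def encode(s, notallowed='%#=\n\r'):
    # Replace every character of s that is in notallowed by a %-code.
    out = []
    for ch in s:
        if ch == '%' or ch in notallowed:
            if ch in _NAMED:
                out.append('%' + _NAMED[ch])
            elif ord(ch) <= 0xFF:
                out.append('%%%02X' % ord(ch))
            else:
                raise ValueError('this sketch only hex-encodes 8-bit characters')
        else:
            out.append(ch)
    return ''.join(out)


def decode(s):
    # Reverse encode(): named codes first, then two-digit hex codes.
    def repl(m):
        if m.group(1):
            return _UNNAMED[m.group(1)]
        return chr(int(m.group(2), 16))
    return _CODE.sub(repl, s)


print(encode('50% off\nnow'))   # 50%T off%Lnow
```

For a white-space-separated file, pass a notallowed string that also contains
the white space characters, so that spaces become C{%S}.  The default COL_EMPTY
marker C{%na} passes through this decoder unchanged, since C{n} is neither a
named code nor a hex digit; the reader compares items against COL_EMPTY before
decoding anyway.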

If there is no DATE attribute, the write routine adds one, using the current date
and time, in the form ccyy-mm-ddThh:mm:ss (as defined in the NASA FITS format).
Note that all attributes are optional.
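
A sketch of producing a DATE value in that form with the standard time module.
The helper add_date_if_missing is hypothetical, not part of this module.  It
uses UTC (time.gmtime), which is what FITS expects for DATE; this docstring does
not say whether the writer itself uses UTC or local time.

```python
import time


def add_date_if_missing(hdr, now=None):
    # Return a copy of hdr, adding DATE (ccyy-mm-ddThh:mm:ss, UTC) if absent.
    # now: seconds since the epoch; None means the current time.
    out = dict(hdr)
    if 'DATE' not in out:
        out['DATE'] = time.strftime('%Y-%m-%dT%H:%M:%S', time.gmtime(now))
    return out


print(add_date_if_missing({'SAMPRATE': '2.3'}, now=0))
# {'SAMPRATE': '2.3', 'DATE': '1970-01-01T00:00:00'}
```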

This is not quite David Wittman's FIAT (1.0), which forces values either to be quoted
or to contain no white space.  Wittman's FIAT will take a line
in the form "#a=b c" and interpret c as a comment, whereas
FIAT 1.2 will interpret the value as "b c".
However, almost all files are interpreted the same way by FIAT 1.0 and FIAT 1.2.
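
The header-line rule (rule 4), including the FIAT 1.2 reading of C{#a=b c},
can be sketched as below.  parse_header_line is a hypothetical helper written
for illustration: it follows the rules as stated here rather than this module's
_rheader.parse (which, for instance, does not check the attribute against the
regular expression, and also accepts C{|...|} values).

```python
import re

# Attribute names, as given in rule 4.
_ATTR = re.compile(r'[a-zA-Z_][a-zA-Z_0-9]*$')


def parse_header_line(line):
    # Return (attribute, value) for a '# attribute = value' line,
    # or None if the line is an ordinary comment.
    body = line[1:]                     # drop the leading '#'
    if '=' not in body:
        return None
    attr, value = body.split('=', 1)
    attr = attr.strip()
    if not _ATTR.match(attr):
        return None
    value = value.strip()               # leading/trailing white space
    # Strip one pair of matching quotes.
    if len(value) >= 2 and value[0] in '\'"' and value[0] == value[-1]:
        value = value[1:-1]
    return (attr, value)


print(parse_header_line('#a=b c'))               # ('a', 'b c')
print(parse_header_line("# NOTE = ' padded '"))  # ('NOTE', ' padded ')
print(parse_header_line('# Comment1'))           # None
```

A FIAT 1.0 reader would instead take only C{b} as the value of C{a} and treat
C{c} as a comment.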

Here's an example::

        # fiat 1.2
        # TTYPE1 = b
        # TTYPE2 = a
        # SAMPRATE = 2.3
        # DATE = 2001-09-21T21:32:32
        # COL_EMPTY = "%na"
        # COL_SEPARATOR = "9"
        # Comment1
        # Comment2
        # b     a
        2       1
        3       2
        3       %na
        %na     3
        %na     %na
        0       1
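
To show how the pieces of the example fit together, here is a minimal reader
for it.  It is a sketch, not this module's readiter(): it skips the rule 2
decoding, and it names unnamed columns by their 0-based position.  (In the
listing above the tab separators, COL_SEPARATOR = 9, are displayed as spaces.)

```python
def read_example(lines):
    # Returns (header, rows, comments) for an iterable of text lines.
    sep, blank = None, '%na'   # defaults: any white space, and '%na'
    names = {}                 # 0-based column index -> column name
    hdr, rows, comments = {}, [], []
    for (i, line) in enumerate(lines):
        line = line.rstrip('\r\n')
        if i == 0 and line.startswith('# fiat'):
            continue           # format identification line (rule 3)
        if line.startswith('#'):
            body = line[1:].strip()
            attr, eq, value = body.partition('=')
            attr, value = attr.strip(), value.strip()
            if not eq or len(attr.split()) != 1:
                comments.append(body)   # not 'attribute = value'
                continue
            if len(value) >= 2 and value[0] in '\'"' and value[0] == value[-1]:
                value = value[1:-1]
            if attr.upper().startswith('TTYPE'):
                names[int(attr[5:]) - 1] = value
            elif attr == 'COL_EMPTY':
                blank = value
            elif attr == 'COL_SEPARATOR':
                sep = ''.join(chr(int(q)) for q in value.split())
            else:
                hdr[attr] = value
        else:
            # Data line; a completely blank line gives an empty dict.
            row = {}
            if line:
                for (ic, item) in enumerate(line.split(sep)):
                    if item != blank:
                        row[names.get(ic, ic)] = item
            rows.append(row)
    return hdr, rows, comments
```

Applied to the example (with real tabs), it returns the header
{'SAMPRATE': '2.3', 'DATE': '2001-09-21T21:32:32'} and the rows
[{'b': '2', 'a': '1'}, {'b': '3', 'a': '2'}, {'b': '3'}, {'a': '3'}, {}, {'b': '0', 'a': '1'}].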
"""


import re
import types
import string
import warnings

# from gmisclib:
import gpk_writer
import g_encode

TABLEN = 8

class FiatioWarning(UserWarning):
    def __init__(self, *s):
        UserWarning.__init__(self, *s)



def _alph(s):
    # Returns a float sort key built from the first 8 characters of s;
    # sorting by it gives (approximately) reverse alphabetical byte order.
    n = min(len(s), 8)
    o = 0.0
    f = 1.0
    # Not OK for unicode.  Sigh.
    for i in range(n):
        f = f/256.0
        o += f * ord(s[i])
    return -o


# def col_order(a, b):
#     lc = cmp(len(str(a[0]))+a[1], len(str(b[0]))+b[1])
#     if lc != 0:
#         return lc
#     return cmp(str(a[0]), str(b[0]))

def col_order(a, b):
    """Comparison function for column names: shorter names (as strings)
    sort first; names of equal length sort alphabetically.
    """
    sa = str(a)
    sb = str(b)
    lc = cmp(len(sa), len(sb))
    if lc != 0:
        return lc
    return cmp(sa, sb)


_autogen = re.compile("COL_SEPARATOR$", re.IGNORECASE)
_drop = re.compile("(__NAME_TO_COL|__COLUMNS)$")


def write_array(fd, adata, columns=None, comments=None, hdr=None, sep='\t', numeric=False):
    """Write a rectangular array in FIAT format.
    Adata is a 2-D numpy array or a sequence of sequences.
    If numeric is true, each value is written with str() and is not encoded;
    that is only safe when no value can contain the separator or other
    special characters.
    """
    w = writer(fd, sep=sep)

    if columns is not None:
        w.add_cols(columns)
    if hdr is not None:
        w.headers(hdr)
    if comments is not None:
        for c in comments:
            w.comment(c)
    for i in range(len(adata)):
        w.datavec( adata[i], numeric=numeric )


_autogen = re.compile("TTYPE[0-9]+|COL_EMPTY|COL_SEPARATOR", re.IGNORECASE)

def write(fd, ldata, comments=None, hdr=None, sep='\t', blank='%na', fixed_order=0):
    """Write a file in FIAT format.
    Ldata is a list of dictionaries.  Each dictionary
    corresponds to one line in the file.  Each
    unique key generates a column, and the values
    are printed in the data section of the FIAT file.
    Note that the TTYPE header lines will be automatically
    generated from ldata.
    Hdr is a dictionary of information that will be
    put in the header.
    Comments is a list of comment lines for the header.
    Fd is a file object (not a filename) where the data should be
    written.
    Sep is a string used to separate data columns.
    Blank is a string to use when a data value is missing.
    Fixed_order is accepted but currently unused.
    """
    w = writer(fd, sep=sep, blank=blank)
    if comments is not None:
        for com in comments:
            w.comment(com)
    if hdr is not None:
        w.headers(hdr)
    for d in ldata:
        w.datum(d)
    fd.flush()


class writer(gpk_writer.writer):
    """Write a file in FIAT format.  This class represents an open file,
    and you call member functions to write data into the file.
    This automatically generates much of the header information.

    Column names are set from the keys passed in the C{datum()} method.
    Each unique key generates a column, and the values
    are printed in the data section of the FIAT file.
    The TTYPE header lines will also be automatically generated.
    """

    def comment(self, comment):
        """Add a comment to the data file.
        @param comment: the comment
        @type comment: str
        """
        if '\n' in comment:
            raise ValueError, "No newline allowed in comments for fiatio."
        self.fd.write("# %s\n" % comment)


    def header(self, k, v):
        """Add a single C{key=value} line to the header of the data file.
        @param k: key
        @param v: value
        @type k: str
        @type v: str
        """
        if _autogen.match(k):
            warnings.warn("Hdr specifies information that is automatically generated: %s" % k, FiatioWarning)
        elif _drop.match(k):
            return
        self.__write_header(k, v)


    def __init__(self, fd, sep='\t', blank='%na'):
        """@param fd: where to write the data
        @type fd: L{file}
        @param sep: what separates columns?
        @type sep: str
        @param blank: what marks a spot where there isn't data?
        @type blank: str
        """
        gpk_writer.writer.__init__(self, fd)
        self.enc = _encoder(sep)
        self.blank = blank
        self.sep = sep
        self.map = {}       # column name -> column index
        self.columns = []   # column names, left to right
        fd.write("# fiat 1.2\n")
        fd.write("# I/O code: gmisclib.fiatio.py in speechresearch project on http://sourceforge.org\n")
        fd.write("# Format definition: http://kochanski.org/gpk/papers/2010/fiat.pdf\n")
        self.__write_header('COL_EMPTY', self.blank)
        self.__write_header('COL_SEPARATOR',
                ' '.join([str(ord(sc)) for sc in self.sep])
                )

    def add_cols(self, colnames):
        """Declare additional columns, in left-to-right order, and write
        a TTYPE header line for each of them.
        """
        n = len(self.map)
        for c in colnames:
            self.map[c] = n
            self.columns.append( c )
            self.fd.write("# TTYPE%d = %s\n" % (n+1, c))
            n += 1

    def __hline(self, o):
        """Write a comment line holding the column names, roughly centered
        above their columns.  o (the encoded values of the current line) is
        not written; it is only used to set the width of each field.
        """
        ostart = 0
        hstart = 1  # The comment symbol.
        ls = len(self.sep)
        hline = []
        for (cn, val) in zip(self.columns, o):
            w = max(1, len(val) + (ostart - hstart)/2)
            ostart += len(val) + ls
            cn = str(cn)
            hstart += len(cn) + ls
            hline.append(cn.center(w))
        self.fd.write('#' + self.sep.join(hline) + '\n')


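    def __write_header(self, k, v):
        v = '%s' % v
        # Values containing a newline, or with leading or trailing white
        # space, are encoded and wrapped in |...| (see _rheader.dequote).
        # An empty value is written as-is.
        if v and ('\n' in v or v[0] in string.whitespace or v[-1] in string.whitespace):
            v = '|%s|' % self.enc.fwd(v)
        self.fd.write("# %s = %s\n" % (k, v))

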
    def datum(self, data_item):
        """Write a line into a fiat file.  The column names will be set from
        the keys.
        @param data_item: a dictionary of C{key=value} pairs.
        @type data_item: C{dict(str: anything)}
        """
        o = [ self.blank ] * len(self.map)
        # o = [ self.blank for q in self.map.keys() ]
        try:
            # This is the path for most calls.
            for (k, v) in data_item.iteritems():
                o[self.map[k]] = self.enc.fwd(str(v))
        except KeyError:
            # This is the path when data_item has a key that is not yet
            # a column (always the case on the first non-empty call).
            add = []
            for (k, v) in data_item.iteritems():
                if isinstance(k, types.StringType):
                    pass
                elif isinstance(k, types.IntType) and k>=0:
                    pass
                else:
                    raise TypeError, ("Key is not a string or non-negative integer", k)
                if not k in self.map:
                    # add.append( (k, len(str(v))) )
                    add.append( k )
            add.sort(col_order)
            # self.add_cols([ t[0] for t in add ])
            self.add_cols( add )
            o = [ self.blank ] * len(self.map)
            # o = [ self.blank for q in self.map.keys() ]
            for (k, v) in data_item.iteritems():
                o[self.map[k]] = self.enc.fwd(str(v))
            self.__hline(o)
        self.fd.write(self.sep.join(o) + '\n')


    def datavec(self, vector, numeric=False):
        """Write one line of data from a sequence of values, in column order.
        This assumes that you've already called add_cols() to set the
        column names.  It is an error for the vector to be shorter than
        the number of column names; if it is longer, the extra columns are
        added and named by their 0-based position (as strings).
        If numeric is true, values are written with str() and are not encoded.
        """
        lv = len(vector)
        lc = len(self.columns)
        assert lv >= lc, "vector length=%d but %d columns" % (lv, lc)
        if lc < lv:
            self.add_cols( [ '%d' % q for q in range(lc, lv) ] )
        if numeric:
            self.fd.write( self.sep.join([str(q) for q in vector]) + '\n' )
        else:
            self.fd.write( self.sep.join([self.enc.fwd(str(q)) for q in vector]) + '\n' )


class merged_writer(writer):
    """Assumes that the data will be read with read_merged(), so that
    header values will supply default values for each column.
    """

    def __init__(self, fd, sep='\t', blank='%na'):
        writer.__init__(self, fd, sep, blank)
        self._hdr = {}

    def header(self, k, v):
        self._hdr[k] = v
        writer.header(self, k, v)

    def datum(self, data_item):
        """Assumes that the data will be read with read_merged(), so that
        header values will supply default values for each column.
        This writes a line in the fiat file, but first it deletes any values
        that already exist as a header item with the same name and value.
        """
        tmp = {}
        for (k, v) in data_item.items():
            if k not in self._hdr or v!=self._hdr[k]:
                tmp[k] = v
        writer.datum(self, tmp)


def shared_data_values(data_items):
    """Takes a list of data and pulls out all the items that have
    the same value in each line.  The idea is that you can then
    put them into the header via::

        hdr, data, c = read(fd)
        htmp, data = shared_data_values(data)
        hdr.update(htmp)

    @type data_items: C{list(dict(str: anything))}
    @return: It returns a tuple of (1) a dictionary of header items,
        and (2) a list of data.  The list has the same length as
        C{data_items}, but the dictionaries within it may have fewer entries.
    @rtype: C{tuple(dict(str:anything), list(dict(str:anything)))}
    """
    data_items = list(data_items)
    values = None
    for datum in data_items:
        if values is None:
            values = datum.copy()
        # Keep only keys that are present, with the same value, in every line.
        for k in list(values.keys()):
            if k not in datum or datum[k] != values[k]:
                del values[k]
    assert values is not None
    outdata = []
    for datum in data_items:
        tmp = {}
        for (k, v) in datum.items():
            if k not in values:
                tmp[k] = v
            else:
                assert v == values[k]
        outdata.append(tmp)
    return (values, outdata)


BadFormatError = g_encode.BadFormatError
FiatError = BadFormatError


class ConflictingColumnSpecification(BadFormatError):
    def __init__(self, s):
        FiatError.__init__(self, s)



def _check_last_comment(comment, names):
    """Check to see if the last comment is just a list of column names.
    This is what write() produces.  If so, it can be safely deleted.
    """
    # print "SORTED NAMES=", sorted_names
    # print "LAST COMMENT", comment.split()
    return comment.split() == names



_encoder_cache = {}
def _encoder(sep):
    """Return a (cached) g_encode.encoder that protects the characters
    which are special in a FIAT file with the given column separator.
    """
    CACHELEN = 30
    if sep not in _encoder_cache:
        if len(_encoder_cache) > CACHELEN:
            _encoder_cache.popitem()    # Evict an arbitrary entry.
        notallowed = '%#=\n\r'
        if sep == '':
            notallowed += ' \t\f\v'
        else:
            if sep in notallowed:
                raise ValueError, "Illegal separator: {%s}" % sep
            notallowed += sep
        _encoder_cache[sep] = g_encode.encoder(notallowed=notallowed)
    return _encoder_cache[sep]


class _rheader(object):
    """This class is private.  It processes and accumulates header information
    as a FIAT file is read in.
    It represents the header information of a fiat file.
    """

    LTTYPE = len('TTYPE')

    def __init__(self):
        self.sep = None         # None means "split on any white space".
        self.blank = '%na'
        self.comments = []
        self.name_to_col = {}
        self.header = {}
        # self.header = {'__COLUMNS': {}, '__NAME_TO_COL': {}
        #       }
        self.enc = _encoder('')
        self.icol = []          # Column names, indexed by column number.
        self.colmap = {}        # Column number -> name, for TTYPE-named columns.

    def dequote(self, s):
        """Remove quotes from a value.  Values wrapped in |...| are
        also decoded.
        """
        ss = s.strip()
        if len(ss) < 2:
            return ss
        elif ss[0] in '\'"|' and ss[0]==ss[-1]:
            if ss[0] == '|':
                return self.enc.back(ss[1:-1])
            return ss[1:-1]
        return ss

    def parse(self, s):
        """Parse a line of text and store the information.
        @type s: str
        """
        l = s[1:].strip()
        a = l.split('=', 1)
        if len(a) > 1 and len(a[0].split()) == 1:
            attr = a[0].strip()
            val = self.dequote(a[1])
            if attr.upper().startswith('TTYPE'):
                ic = int(attr[self.LTTYPE:])-1
                if ic in self.colmap and self.colmap[ic]!=val:
                    raise ConflictingColumnSpecification, 'column=%d: "%s" vs. "%s"' % (
                                ic, val, self.colmap[ic]
                                )
                if val in self.name_to_col and self.name_to_col[val] != ic:
                    raise ConflictingColumnSpecification, 'val="%s": columns %d vs. %d' % (
                                val, ic, self.name_to_col[val]
                                )
                self.extend_icol(ic+1)
                self.icol[ic] = val
                self.colmap[ic] = val
                self.name_to_col[val] = ic
            elif attr == 'COL_EMPTY':
                self.blank = val
            elif attr == 'COL_SEPARATOR':
                self.sep = ''.join( [chr(int(q)) for q in val.split() ] )
                self.enc = _encoder(self.sep)
            else:
                self.header[attr] = val
        elif not _check_last_comment(l, self.icol):
            self.comments.append(l)


    def extend_icol(self, la):
        """Make sure there are at least la column names; unnamed
        columns are named by their integer index.
        """
        if len(self.icol) < la:
            self.icol.extend( range(len(self.icol), la) )

    def dump(self, d):
        """Return (header, d, comments) accumulated since the last dump,
        and reset the header and comments.
        """
        hdr = self.header
        com = self.comments
        self.header = {}
        self.comments = []
        return (hdr, d, com)

    def dumpx(self, d):
        """Like dump(), but two special entries are added to header:
        __COLUMNS points to a mapping
        from column numbers (the order in which they appeared in the file,
        left to right, starting with 0) to names,
        and __NAME_TO_COL is the reverse mapping.
        """
        hdr = self.header
        hdr['__NAME_TO_COL'] = self.name_to_col
        hdr['__COLUMNS'] = self.colmap
        com = self.comments
        self.header = {}
        self.comments = []
        return (hdr, d, com)


def read(fd):
    """Read a fiat format data file.
    Each line in the FIAT file is represented by a dictionary that maps
    column name into the data (data is a string).
    Lines without data in a certain column will not have the corresponding
    entry in the dictionary for that line.

    You can use this function as follows::

        hdr, data, comments = read(fd)
        for datum in data:
            print datum['X']

    @return: Three items: header, data, and comments.
        Header is the collected dictionary of header information,
        data is a list of dictionaries, one for each data line (including
        blank lines) in the file,
        and comments is a list of strings.
    @rtype: tuple(dict, list(dict(str:str)), list(str))
    @param fd: The data source: typically a file object.
    @type fd: An iterator that generates strings.  Typically a L{file} object.
    """
    out = []
    hdr = {}
    comments = []
    for (h, d, c) in readiter(fd):
        hdr.update(h)
        comments.extend(c)
        if d is not None:
            out.append(d)
    return (hdr, out, comments)


def read_merged(fd):
    """Read in a fiat file and return a list of dictionaries,
    one for each line in the file.  (Also a list of comment lines.)
    Each line in the input FIAT file is represented by a dictionary that maps
    column name into the data (data is a string).
    The header data in the FIAT file is merged into the per-column data,
    so that the header data is used as a default value for the column of the same name.
    As a result, all the information in the file (both header and data) is in the
    resulting list of dictionaries.

    NB: this is a bit of a specialized routine.  Normally, one uses L{read}.

    E.g. if there is a header line "# X = Y" and
    no data column called "X", then this will succeed::

        data, comments = read_merged(fd)
        for datum in data:
            assert datum['X']=='Y'

    Note that this routine does not require header lines to precede data lines.
    If header lines appear in the middle, then a new column (or a new default
    value) takes effect from that point onwards.
    @return: (data, comments)
    @rtype: (list(dict), list(str))
    @param fd: The data source: typically a file object.
    @type fd: An iterator that generates strings.  Typically a L{file} object.
    """
    out = []
    comments = []
    hdr = {}
    for (h, d, c) in readiter(fd):
        comments.extend(c)
        if d is not None:
            hdr.update(h)
            tmp = hdr.copy()
            tmp.update(d)
            out.append(tmp)
    return (out, comments)



def readiter(fd):
    """Read in a fiat file.
    Each line in the file is represented by a dictionary that maps
    column name into the data (data is a string).
    Lines without data in a certain column will not have an entry in that line's
    dictionary for that column name.

    Lines beginning with '#' are either header or comment lines.
    A fiat file can mix header lines amongst the data lines.  (Although, typically, all
    the header info is at the top.)

    You can use this function as follows::

        for (hdr, datum, comments) in readiter(fd):
            if datum is not None:
                print datum['X']

    NB: this is a bit of a specialized routine.  Normally, one uses L{read}.

    @param fd: The data source: typically a L{file}.  Not a filename.
    @type fd: Anything that supports iteration.  Typically a L{file} object.
    @return: Yields triples of header, data, and comments.
        Header is the dictionary of header information collected
        since the last iteration,
        data is a dictionary of the data on the current line,
        and comments is a list of the comment strings seen since the last iteration.
        The end of the file yields None for the last data,
        along with any header info or comments after the last datum.
    @rtype: (dict(str:str), dict(str:str), list(str))
    """
    hobj = _rheader()
    for (i, line) in enumerate(fd):
        if not line.endswith('\n'):     # Incomplete final line.
            if i == 0:
                raise BadFormatError, "Empty file"
            else:
                warnings.warn('fiatio.readiter(): ignoring line without newline.', FiatioWarning)
                break
        line = line.rstrip('\r\n')
        if not line:    # empty line
            yield hobj.dump({})
        elif line.startswith('#'):
            if i==0 and line.startswith('# fiat'):
                continue
            hobj.parse(line)
        else:
            a = line.split(hobj.sep)
            if len(hobj.icol) < len(a):
                hobj.extend_icol(len(a))
            tmp = {}
            for (ic, ai) in zip(hobj.icol, a):
                if ai != hobj.blank:
                    tmp[ic] = hobj.enc.back(ai)
            yield hobj.dump(tmp)
    yield hobj.dump(None)



def read_as_float_array(fd, loose=False, baddata=None):
    """Read in a fiat file.  Return (header, data, comments),
    where header is a dictionary of header information,
    data is a list of 1-D numpy arrays (one per data line, padded
    with NaN so that they all have the same length),
    and comments is a list of strings.

    Empty values (those equal to the COL_EMPTY marker) are set to NaN.

    If loose==False, then all entries must be either floating point numbers
    or empty entries or equal to baddata (as a string, before conversion
    to a float).
    If loose==True, all non-floats will simply be set to NaN.
    """
    import Num
    import fpconst

    NaN = fpconst.NaN
    hobj = _rheader()

    if loose:
        def float_or_NaN(s):
            if s == baddata or s == hobj.blank:
                tmpd = NaN
            else:
                try:
                    tmpd = float(s)
                except ValueError:
                    tmpd = NaN
            return tmpd
    else:
        def float_or_NaN(s):
            if s == baddata or s == hobj.blank:
                tmpd = NaN
            else:
                tmpd = float(s)
            return tmpd

    data = []
    for (i,line) in enumerate(fd):
        if i==0 and line.startswith('# fiat'):
            continue
        line = line.rstrip('\r\n')

        if line.startswith('#'):
            hobj.parse(line)
        else:
            a = line.split(hobj.sep)
            if len(hobj.icol) < len(a):
                hobj.extend_icol(len(a))
            tmpd = Num.array( [ float_or_NaN(q) for q in a ], Num.Float)
            data.append(tmpd)

    # If the number of columns changed part way through the file,
    # pad the shorter rows with NaNs to a uniform length.
    if data:
        maxcols = max([q.shape[0] for q in data])
        NaNs = Num.zeros((maxcols,), Num.Float) + NaN
        assert not Num.sometrue(Num.equal(NaNs, NaNs))
        for i in range(len(data)):
            if data[i].shape[0] < maxcols:
                data[i] = Num.concatenate((data[i], NaNs[:maxcols-data[i].shape[0]]))

    return (hobj.header, data, hobj.comments)



def _hcheck(a, b):
    # del a['__NAME_TO_COL']
    # del a['__COLUMNS']
    assert abs(float(a['SAMPRATE'])-float(b['SAMPRATE'])) < 1e-8
    del a['SAMPRATE']
    del b['SAMPRATE']
    assert a == b


def _dcheck(a, b):
    assert len(a) == len(b)
    for (ax, bx) in zip(a, b):
        tmp = {}
        for (k, v) in bx.items():
            tmp[k] = str(v)
        assert ax == tmp


def test1():
    fd = open("/tmp/fakeZZZ.fiat", "w")
    data = [{'a':1, 'b':2},
            {'a':2, 'b':3},
            {'b':3},
            {'a':3},
            {},
            {'a':1, 'b':0}
            ]
    comments = ['Comment1', 'Comment2']
    header = {'SAMPRATE': 2.3, 'DATE':'2001-09-21T21:32:32'}
    write(fd, data, comments, header)
    fd.flush()
    fd.close()
    fd = open("/tmp/fakeZZZ.fiat", "r")
    h, d, c = read(fd)
    _hcheck(h, header)
    _dcheck(d, data)
    _ccheck(c, comments)

def _ccheck(c, comments):
    # The writer adds two comments of its own (I/O code and format definition).
    assert len(c) == len(comments)+2
    for ac in comments:
        assert ac in c


def test2():
    fd = open("/tmp/fakeZZZ.fiat", "w")
    data = [{}, {'a': 111},
            {'a':1, 'b':2},
            {'a':2, 'b':3},
            {'b':3},
            {'a':3},
            {},
            {'a':1, 'b':0}
            ]
    comments = ['Comment1', 'Comment2']
    header = {'SAMPRATE': 2.3, 'DATE':'2001-09-21T21:32:32'}
    write(fd, data, comments, header)
    fd.flush()
    fd.close()
    fd = open("/tmp/fakeZZZ.fiat", "r")
    h, d, c = read(fd)
    _hcheck(h, header)
    _dcheck(d, data)
    _ccheck(c, comments)



def test3():
    fd = open("/tmp/fakeZZ1.fiat", "w")
    comments = ['Comment1', 'Comment2']
    header = {'SAMPRATE': 2.3, 'DATE':'2001/09/21', 'nasty': '\033\032\011\rxebra'}
    data = [
            {'a':101, 'bljljlj':2, 'fd': 33341, 'q': 12},
            {'a':10, 'bljljlj':2, 'fd': 3334111, 'q': 12},
            {'a':10, 'bljljlj':21, 'fd': 3331, 'q': 12},
            {'a':1, 'bljljlj':4, 'fd': 3334122, 'q': 12},
            {'a':1, 'bljljlj':3, 'fd': 333, 'q': 12}
            ]
    write(fd, data, comments, header,
            blank = 'NA'
            )
    fd.flush()
    fd.close()
    fd = open("/tmp/fakeZZ1.fiat", "r")
    h, d, c = read(fd)
    _hcheck(h, header)
    _dcheck(d, data)
    _ccheck(c, comments)


def test4():
    import Num
    fd = open("/tmp/fakeZZ1.fiat", "w")
    adata = Num.zeros((4,7), Num.Float)
    for i in range(adata.shape[0]):
        for j in range(adata.shape[1]):
            adata[i,j] = i**2 + 2*j**2 - 0.413*i*j - 0.112*float(i+1)/float(j+2)
    comments = ['C1']
    hdr = {'foo': 'bar', 'gleep':' nasty\n value\t'}
    columns = ['A', 'b', 'C', 'd']
    write_array(fd, adata, columns=columns,
                comments=comments, hdr=hdr, sep='\t')
    fd.close()      # Make sure everything is on disk before reading it back.
    fd = open("/tmp/fakeZZ1.fiat", "r")
    h, adtest, c = read_as_float_array(fd, loose=False, baddata=None)
    _ccheck(c, comments)
    for (k,v) in hdr.items():
        assert h[k] == v
    # for (i,cname) in enumerate(columns):
    #       assert h['__COLUMNS'][i] == cname
    #       assert h['__NAME_TO_COL'][cname] == i
    if Num.sum(Num.absolute(adtest-adata)) > 0.001:
        raise AssertionError('Bad array recovery')

def test():
    test1()
    test2()
    test3()
    test4()

if __name__ == '__main__':
    test()