Module table_func¶
This module contains table (file) functions.
It comprises reading, writing and a limited number of manipulations.
The data is stored in numpy recarrays.
The main purpose of the module is to make it easier to read and write different formats of table files.
Introduction¶
Getting started¶
Module file: table_func.py
Recommended import statement:
from table_func import *
Python installation and modules:
Python | Developed and tested with Python 2.7 |
Numpy | Developed and tested with numpy 1.8.2 |
Tables¶
Tables are data structured in fields (columns) and records (rows). The fields have a name and could have different data types, e.g. float, int and string. A record contains a value for each field.
In table_func tables are stored in numpy recarrays (structured arrays). See below.
Numpy recarray (structured array)¶
See also Numpy documentation on structured arrays
Numpy recarrays make it possible to store data of different type (dtype) in one array. Data is structured in fields and records and is therefore suitable to store data in tabular form (tables).
Although one may think of a table as a 2-dimensional dataset, it is called 1-dimensional, with the records on dimension 1.
Fields could be accessed by their field names, e.g.:
myTable['myField']
It returns an array with the value of each record in the field ‘myField’.
Records could be accessed by index number, e.g.:
myTable[0]
It returns the first record.
In table_func all values in the table are 0-dimensional values, i.e. single values. It is not possible to store a sequence of values in a field within one record. This is possible with numpy recarrays, but not used here.:
# single values in each field
[(1, 3.0, 'a'),
(2, 4.0, 'b')]
# sequence of values in the first field; this is not possible in table_func
[([1,10,100], 3.0, 'a'),
([2,20,200], 4.0, 'b')]
Ways to get field names and data types of a recarray:
# getting the field names:
myTable.dtype.names
# getting the field names and data types:
myTable.dtype
Table file formats¶
Comma-separated files¶
A text-formatted table file. The fields are separated by commas. This file format is also refered to as ‘csv’ file.
In some functions it is assumed that there is 1 header line (also comma-separated) which contains the field names. See function description if this assumption is made.
DBF files¶
A binary table file. File format used by the dBASE database management system to store tables of data.
The data file of an ESRI shapefile is a DBF file.
See also: DBF file
The field specs comprise type, size and deci where
type is one of:
C for ascii character data
M for ascii character memo data (real memo fields not supported)
D for datetime objects
N for ints or decimal objects
F for floats
L for logical values ‘T’, ‘F’, or ‘?’
size is the field width
deci is the number of decimal places in the provided decimal object
iMOD IPF file¶
A text-formatted table file. This is a table file used by iMOD. It may be space or comma-separated.
The header of the file consists of several lines. The number of records, number of fields, each field name are all stored in a separate header line. The last header line contains an index number (refering to one of the fields) and a file extension. The index number starts at 1 (the first field). If the index number is > 0 it is used to construct file names from that specific field and the extension. These file names point to iMOD timeseries files (see below).
The records consist of one line each.
Example of an IPF file with 4 fields and 3 records:
3
4
X
Y
Z
Q
0,TXT
110000.0 750000.0 15.0 -300.0
109000.0 760000.0 25.0 -300.0
108000.0 770000.0 26.0 -300.0
And now the 4th field refer to timeseries files with extention *.TXT:
3
4
X
Y
Z
Q
4,TXT
110000.0 750000.0 15.0 "data\abstraction 1"
110000.0 750000.0 15.0 "data\abstraction 2"
110000.0 750000.0 15.0 "data\abstraction 3"
iMOD timeseries file¶
A text-formatted table file. This is a table file used by iMOD for timeseries. It may be space or comma-separated.
The first field is used to store dates (or date-times). The missing value is specified for each field.
Example of timeseries file with 2 fields and with 14 year of data on a daily basis (5114 dates):
5114
2
DATE,-999
HEAD,-999999
20000101 -10.5
20000102 -10.4
..
20131231 -10.9
Space-separated files¶
A text-formatted table file. The fields are separated by spaces: any (combination of) whitespaces, i.e. space and tab. This file format is sometimes refered to as ‘dat’ file.
In some functions it is assumed that there is 1 header line (also space-separated) which contains the field names. See function description if this assumption is made.
Other text-formatted files¶
It is possible to use other text-formatted files, which resemble the comma or space-separated files but use other separators/delimiters.
Excel files¶
Microsoft Excel files: *.xls and *.xlsx.
Only reading of Excel files (workbooks) is currently supported. Tables from any worksheet in the workbook can be read.
By default it is assumed that all data in the worksheet belong to the table. However, empty rows at the top and empty columns on the left are skipped automatically.
Additionally it is possible to specify the block of cells containing the table.
Reading table files¶
Main function for reading table files:
table2arr |
Function to read a table file and create a numpy recarray (structured array). |
Functions for reading table files in specific formats:
ipf2arr |
Function to read an iMOD IPF file and create a numpy recarray. |
imodtss2arr |
Function to read an iMOD timeseries file and create a numpy recarray. |
csv2arr |
Function to read a comma-separated file and create a numpy recarray. |
dbf2arr |
Function to read a DBF file and create a numpy recarray. |
dat2arr |
Function to read a space-separated file and create a numpy recarray. |
txt2arr |
Function to read a text-formatted table file and create a numpy recarray or ndarray. |
xls2arr |
Function to read a table from a worksheet in an Excel workbook. |
xls2arr_allsheets |
Function to read the tables from all worksheets in an Excel workbook. |
Functions for getting field information:
table2fields |
Function to get the field names of a table file. |
ipf2fields |
Function to get the field names of an iMOD IPF file. |
ipf2iext |
Function to get the field number of the iMOD timeseries file references (IEXT) and the extension (EXT) from an iMOD IPF file. |
csv2fields |
Function to get the field names of a comma-separated file. |
dbf2fields |
Function to get the field names of a DBF file. |
dbf2fields_specs |
Function to get the field names and field specs of a DBF file. |
dat2fields |
Function to get the field names of a space-separated file. |
xls2fields |
Function to get the field names of a table in a worksheet in an Excel workbook. |
get_fields |
Function to get the field names of a text-formatted table file. |
Other functions related to reading table files:
dbfreader |
Function to create a generator over records in a DBF file. |
xls2sheets |
Function to get the worksheet names of an Excel workbook. |
Writing table files¶
Main function for writing table files:
arr2table |
Function to write a numpy recarray to a table file. |
Functions for writing table files in specific formats:
arr2ipf |
Function to write a numpy recarray to an iMOD IPF file. |
arr2imodtss |
Function to write a numpy recarray to an iMOD timeseries file. |
arr2csv |
Function to write numpy recarray to a comma-separated file. |
arr2dbf |
Function to write numpy recarray to a DBF file. |
arr2dat |
Function to write numpy recarray to a space-separated file. |
arr2txt |
Function to write a numpy recarray to a text-formatted table file. |
Other functions related to writing table files:
dbfwriter |
Function to create and write a binary string directly to an opened DBF file. |
Modifying numpy recarrays¶
Functions:
recarray2ndarray |
Function to convert a numpy recarray to a regular numpy ndarray. |
make_recarr |
Function to create a numpy recarray filled with nodata. |
take_from_recarr |
Function to take specific fields from a numpy recarray. |
add_to_recarr |
Function to add fields to a numpy recarray. |
remove_from_recarr |
Function to remove fields from a numpy recarray. |
change_dtype |
Function to change the datatype (dtype) of specific fields in a numpy recarray. |
Other functions¶
arr2dbfspecs |
Function to get DBF field specs for a numpy recarray. |
split_csv |
Function to split a string according to ‘csv’ rules. |
split_ipf |
Function to split a string according to ‘ipf’ rules. |
split_freeformat |
Function to split a string according to ‘freeformat’ rules. |
split_string |
Function to split a string using one or more delimiters. |
replace_string |
Function to replace substrings by other substrings. |