PyTables User's Guide: Hierarchical datasets in Python - Release 1.3.2
Chapter 4. Library Reference
This section describes a series of classes that are meant to declare the datatypes required by primary PyTables objects (like Table or VLArray).
This class is designed to be used as an easy, yet meaningful way to describe the properties of Table objects through the definition of derived classes that inherit properties from it. In order to define such a class, you must declare it as a descendant of IsDescription, with as many attributes as columns you want in your table. The name of each attribute will become the name of a column, and its value will hold a description of it.
Ordinary columns can be described using instances of the Col class (see section 4.16.2). Nested columns can be described by using classes derived from IsDescription or instances of it. Derived classes can be declared in place (in which case the column takes the name of the class) or referenced by name, and they can have a _v_pos special attribute which sets the position of the nested column among its sibling columns.
Once you have created a description object, you can pass it to the Table constructor, where all the information it contains will be used to define the table structure. See section 3.4 for an example of how that works.
See below for a complete list of the special attributes that can be specified to complement the metadata of an IsDescription class.
The flavor of the table. It can take "numarray" (default) or "numpy" values. This determines the type of objects returned during input (i.e. read) operations.
An instance of the IndexProps class (see section 4.17.2). You can use this to alter the properties of the index creation process for a table.
Sets the position of a possible nested column description among its sibling columns.
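To make the declaration pattern described above more concrete, here is a minimal, library-free sketch of how attribute names can map to column names, including a nested description declared in place. The names ColumnSpec, Particle and column_paths are hypothetical and are not part of PyTables; the "/" separator in nested pathnames and the name-sorted output are assumptions of this sketch.

```python
class ColumnSpec:
    """Hypothetical stand-in for a column description (not the PyTables Col class)."""

    def __init__(self, type_name):
        self.type_name = type_name


class Particle:
    """Hypothetical description: each public attribute names one column."""

    name = ColumnSpec("CharType")
    energy = ColumnSpec("Float64")

    class info:  # nested description declared in place; the column is named "info"
        x = ColumnSpec("Int32")
        y = ColumnSpec("Int32")


def column_paths(description, prefix=""):
    """Return the column pathnames declared by a description class, sorted by name."""
    paths = []
    for attr, value in sorted(vars(description).items()):
        if attr.startswith("_"):
            continue  # skip special attributes and Python internals
        if isinstance(value, ColumnSpec):
            paths.append(prefix + attr)
        elif isinstance(value, type):
            # a nested class contributes its own columns under its name
            paths.extend(column_paths(value, prefix + attr + "/"))
    return paths


print(column_paths(Particle))  # prints ['energy', 'info/x', 'info/y', 'name']
```

The sketch only models naming; the real classes also carry types, shapes and defaults for each column.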
The Col class is used as a means to declare the different properties of a table column. In addition, a series of descendant classes is offered in order to make these column descriptions easier for the user. In general, it is recommended to use these descendant classes, as they are more meaningful when found in the middle of the code.
The type class of the column.
The string type of the column.
The string type, in RecArray format, of the column.
The shape of the column.
The size of the base items. Especially useful for StringCol objects.
Whether this column is meant to be indexed or not.
The position of this column with regard to its column siblings.
The name of this column.
The complete pathname of the column. This is mainly useful in nested columns; for non-nested ones this value is the same as _v_name.
A description of the different constructors with their parameters follows:
Declare the properties of a Table column.
The data type for the column. All types listed in appendix A are valid data types for columns. The type description is accepted both in string-type format and as a numarray data type.
An integer or a tuple that specifies the number of dtype items for each element (or the shape, for multidimensional elements) of this column. For CharType columns, the last dimension is used as the length of the character strings. However, for this kind of object, the use of the StringCol subclass is strongly recommended.
The default value for elements of this column. If the user does not supply a value for an element while filling a table, this default value will be written to disk. If the user supplies a scalar value for a multidimensional column, this value is automatically broadcast to all the elements in the column cell. If dflt is not supplied, an appropriate zero value (or null string) will be chosen by default. Please note that all the default values are kept internally as numarray objects.
By default, columns are arranged in memory following an alpha-numerical order of the column names. In some situations, however, it is convenient to impose a user-defined ordering. The pos parameter allows the user to force the desired ordering.
Whether this column should be indexed for better performance in table selections.
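The behaviour of the dflt and pos parameters described above can be illustrated with a small, library-free sketch. The functions broadcast_default and order_columns are hypothetical helpers, not PyTables calls. In particular, the rule that columns with an explicit pos come before the name-sorted remainder is an assumption made for this illustration.

```python
def broadcast_default(dflt, shape):
    """Fill a cell of the given shape (a tuple) with a scalar default value.

    The cell is modelled as nested lists; an empty shape yields the scalar itself.
    """
    if not shape:
        return dflt
    return [broadcast_default(dflt, shape[1:]) for _ in range(shape[0])]


def order_columns(positions):
    """Return column names in their final order.

    `positions` maps each column name to its pos value, or to None when no
    position was requested. ASSUMPTION: explicitly positioned columns come
    first (ordered by pos), followed by the rest in alpha-numerical order.
    """
    placed = sorted((pos, name) for name, pos in positions.items() if pos is not None)
    unplaced = sorted(name for name, pos in positions.items() if pos is None)
    return [name for _, name in placed] + unplaced


print(broadcast_default(7, (2, 3)))                       # prints [[7, 7, 7], [7, 7, 7]]
print(order_columns({"b": None, "a": None, "z": 0}))      # prints ['z', 'a', 'b']
```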
Declare a column to be of type CharType. The length parameter sets the length of the strings. The other parameters have the same meaning as in the Col class.
Define a column to be of type Bool. The parameters have the same meaning as in the Col class.
Declare a column to be of type IntXX, depending on the value of the itemsize parameter, which sets the number of bytes of the integers in the column. sign determines whether the integers are signed or not. The other parameters have the same meaning as in the Col class.
This class has several descendants:
Define a column of type Int8.
Define a column of type UInt8.
Define a column of type Int16.
Define a column of type UInt16.
Define a column of type Int32.
Define a column of type UInt32.
Define a column of type Int64.
Define a column of type UInt64.
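The relation between the itemsize and sign parameters and the resulting IntXX type name can be sketched as follows. The function int_type_name is a hypothetical helper written for this illustration; it is not part of PyTables.

```python
def int_type_name(itemsize, sign):
    """Map a byte size and a signedness flag to an integer type name.

    itemsize is given in bytes, while the number in the type name counts bits.
    """
    if itemsize not in (1, 2, 4, 8):
        raise ValueError("unsupported integer size: %r bytes" % (itemsize,))
    prefix = "Int" if sign else "UInt"
    return "%s%d" % (prefix, itemsize * 8)


print(int_type_name(4, True))    # prints Int32
print(int_type_name(1, False))   # prints UInt8
```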
Define a column to be of type FloatXX, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the floats in the column and the default is 8 bytes (double precision). The other parameters have the same meaning as in the Col class.
This class has two descendants:
Define a column of type Float32.
Define a column of type Float64.
Define a column to be of type ComplexXX, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the complex values in the column and the default is 16 bytes (double precision complex). The other parameters have the same meaning as in the Col class.
This class has two descendants:
Define a column of type Complex32.
Define a column of type Complex64.
ComplexCol columns and their descendants do not support indexing.
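Note that the number in a ComplexXX type name does not follow the same rule as in FloatXX: a 16-byte complex value is named Complex64 because each of its two floating point components takes 64 bits. The following sketch makes the arithmetic explicit. The helpers float_type_name and complex_type_name are hypothetical and only reproduce the naming described above.

```python
def float_type_name(itemsize=8):
    """Map a float size in bytes to its type name (default: double precision)."""
    if itemsize not in (4, 8):
        raise ValueError("unsupported float size: %r bytes" % (itemsize,))
    return "Float%d" % (itemsize * 8)


def complex_type_name(itemsize=16):
    """Map a complex size in bytes to its type name (default: double precision).

    A complex value holds two floats (real and imaginary parts), so the
    number in the name is the bit width of ONE component.
    """
    if itemsize not in (8, 16):
        raise ValueError("unsupported complex size: %r bytes" % (itemsize,))
    return "Complex%d" % (itemsize * 8 // 2)


print(float_type_name())       # prints Float64
print(complex_type_name())     # prints Complex64
print(complex_type_name(8))    # prints Complex32
```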
Define a column to be of type Time. Two kinds of time columns are supported depending on the value of itemsize: 4-byte signed integer and 8-byte double precision floating point columns (the default ones). The other parameters have the same meaning as in the Col class.
Time columns have a special encoding in the HDF5 file. See appendix A for more information on those types.
This class has two descendants:
Define a column of type Time32.
Define a column of type Time64.
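The practical difference between the two time representations is that a 4-byte signed integer can only hold whole seconds, while an 8-byte double keeps the fractional part. The sketch below shows this with the standard struct module. The function pack_time is hypothetical, and the little-endian layout used here is an assumption for illustration only; it does not reproduce the special HDF5 encoding mentioned above.

```python
import struct


def pack_time(seconds, itemsize=8):
    """Pack a timestamp as a 4-byte signed integer or an 8-byte double.

    ASSUMPTION: little-endian byte order, chosen only for this illustration.
    """
    if itemsize == 4:
        return struct.pack("<i", int(seconds))   # fractional seconds are dropped
    if itemsize == 8:
        return struct.pack("<d", float(seconds))
    raise ValueError("itemsize must be 4 or 8")


stamp = 1136073600.25  # an arbitrary timestamp with a fractional part
as_int = struct.unpack("<i", pack_time(stamp, 4))[0]
as_double = struct.unpack("<d", pack_time(stamp, 8))[0]
print(as_int)      # prints 1136073600
print(as_double)   # prints 1136073600.25
```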
Description of a column of an enumerated type.
Instances of this class describe a table column which stores enumerated values. Those values belong to an enumerated type, defined by the first argument (enum) in the constructor of EnumCol, which accepts the same kinds of arguments as Enum (see 4.17.4). The enumerated type is stored in the enum attribute of the column.
A default value must be specified as the second argument (dflt) in the constructor; it must be the name (a string) of one of the enumerated values in the enumerated type. Once the column is created, the corresponding concrete value is stored in its dflt attribute. If the name does not match any value in the enumerated type, a KeyError is raised.
A numarray data type may be specified in order to determine the base type used for storing the enumerated values in memory and on disk. The data type must be able to represent each and every concrete value in the enumeration. If it cannot, a TypeError is raised. The default base type is unsigned 32-bit integer, which is sufficient for most cases.
The stype attribute of enumerated columns is always 'Enum', while the type attribute is the data type used for storing concrete values.
The shape, position and indexed attributes of the column are treated as with other column description objects (see 4.16.2).
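The two checks described above for enumerated columns (the default name must exist, and the base type must be able to hold every concrete value) can be sketched without the library as follows. Here the enumerated type is modelled as a plain dictionary from names to integers, and resolve_enum_default is a hypothetical helper; neither is the PyTables Enum or EnumCol implementation.

```python
def resolve_enum_default(enum, dflt, bits=32):
    """Return the concrete value for the default name `dflt`.

    `enum` maps names to concrete integer values. The base type is modelled
    as an unsigned integer of `bits` bits (32 by default, as in the text).
    """
    limit = 2 ** bits
    for name, value in enum.items():
        if not 0 <= value < limit:
            raise TypeError("value %r of %r does not fit in an unsigned %d-bit integer"
                            % (value, name, bits))
    return enum[dflt]  # raises KeyError when the name is not in the enumeration


colors = {"red": 0, "green": 1, "blue": 2}
print(resolve_enum_default(colors, "green"))  # prints 1
```

Passing an unknown name such as "purple" raises KeyError, and an enumeration with a negative value raises TypeError in this sketch.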
The Atom class is a descendant of the Col class (see 4.16.2) and is meant to declare the different properties of the base element (also known as atom) of CArray, EArray and VLArray objects. The Atom instances have the property that their length is always the same. However, you can grow objects along the extensible dimension in the case of EArray or put a variable number of them on a VLArray row. Moreover, the atoms are not restricted to scalar values, and they can be fully multidimensional objects.
A series of descendant classes are offered in order to make the use of these element descriptions easier. In general, it is recommended to use these descendant classes, as they are more meaningful when found in the middle of the code.
In addition to the variables that it inherits from the Col class, it has the following additional attributes:
The object representation for this atom. See the description of the Atom constructors below for the possible values it can take.
Returns the total length, in bytes, of the element base atom. If its shape has a zero element in it (for use in EArrays, for example), that element is replaced by a one in order to compute the atom size correctly.
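The size computation described above amounts to multiplying the item size by every dimension of the shape, treating a zero dimension as one. A minimal sketch, where atom_size is a hypothetical function and not the library method itself:

```python
def atom_size(shape, itemsize):
    """Total size in bytes of an atom with the given shape and item size.

    A zero in the shape (the extensible dimension of an EArray) counts as one.
    """
    if isinstance(shape, int):
        shape = (shape,)
    total = itemsize
    for dim in shape:
        total *= dim if dim != 0 else 1
    return total


print(atom_size((0, 3), 8))   # prints 24
print(atom_size((2, 3), 4))   # prints 24
print(atom_size(1, 8))        # prints 8
```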
A description of the different constructors with their parameters follows:
Define properties for the base elements of CArray, EArray and VLArray objects.
The data type for the base element. See appendix A for a list of the supported data types. The type description is accepted both in string-type format and as a numarray data type.
In an EArray context, it is a tuple specifying the shape of the object, and one (and only one) of its dimensions must be 0, meaning that the EArray object will be enlarged along this axis. In the case of a VLArray, it can be an integer with a value of 1 (one) or a tuple, which specifies whether the atom is a scalar (in the case of a 1) or has multiple dimensions (in the case of a tuple). For CharType elements, the last dimension is used as the length of the character strings. However, for this kind of object, the use of the StringAtom subclass is strongly recommended.
The object representation for this atom. It can be any of "numarray", "numpy" or "python" for the character types and "numarray", "numpy", "numeric" or "python" for the numerical types. If specified, the read atoms will be converted to that specific flavor. If not specified, the atoms will remain in their native format (i.e. numarray).
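The constraints on the shape and flavor parameters described above can be expressed as two small validation helpers. Both extensible_axis and check_flavor are hypothetical functions written for this illustration; the error type they raise is an assumption and may differ from what the library does.

```python
CHAR_FLAVORS = ("numarray", "numpy", "python")
NUMERIC_FLAVORS = ("numarray", "numpy", "numeric", "python")


def extensible_axis(shape):
    """Return the index of the single zero dimension of an EArray atom shape."""
    zeros = [index for index, dim in enumerate(shape) if dim == 0]
    if len(zeros) != 1:
        raise ValueError("exactly one dimension must be 0, got shape %r" % (shape,))
    return zeros[0]


def check_flavor(flavor, is_char):
    """Check a flavor name against the values listed for character or numerical types."""
    allowed = CHAR_FLAVORS if is_char else NUMERIC_FLAVORS
    if flavor not in allowed:
        raise ValueError("flavor %r is not one of %r" % (flavor, allowed))
    return flavor


print(extensible_axis((0, 3)))          # prints 0
print(check_flavor("numeric", False))   # prints numeric
```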
Define an atom to be of CharType type. The shape parameter has the same meaning as in the Atom class. length sets the length of the string atoms. flavor can be any of "numarray", "numpy" or "python". Unicode strings are not supported by this type; see the VLStringAtom class if you want Unicode support (only available for VLArray objects).
Define an atom to be of type Bool. The parameters have the same meaning as in the Atom class.
Define an atom to be of type IntXX, depending on the value of the itemsize parameter, which sets the number of bytes of the integers that make up the atom. sign determines whether the integers are signed or not. The other parameters have the same meaning as in the Atom class.
This class has several descendants:
Define an atom of type Int8.
Define an atom of type UInt8.
Define an atom of type Int16.
Define an atom of type UInt16.
Define an atom of type Int32.
Define an atom of type UInt32.
Define an atom of type Int64.
Define an atom of type UInt64.
Define an atom to be of FloatXX type, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the floats in the atom and the default is 8 bytes (double precision). The other parameters have the same meaning as in the Atom class.
This class has two descendants:
Define an atom of type Float32.
Define an atom of type Float64.
Define an atom to be of ComplexXX type, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the complex values in the atom and the default is 16 bytes (double precision complex). The other parameters have the same meaning as in the Atom class.
This class has two descendants:
Define an atom of type Complex32.
Define an atom of type Complex64.
Define an atom to be of type Time. Two kinds of time atoms are supported depending on the value of itemsize: 4-byte signed integer and 8-byte double precision floating point atoms (the default ones). The other parameters have the same meaning as in the Atom class.
Time atoms have a special encoding in the HDF5 file. See appendix A for more information on those types.
This class has two descendants:
Define an atom of type Time32.
Define an atom of type Time64.
Description of an atom of an enumerated type.
Instances of this class describe the atom type used by an array to store enumerated values. Those values belong to an enumerated type.
The meaning of the enum and dtype arguments is the same as in EnumCol (see 4.16.2). The shape and flavor arguments have the usual meaning of other Atom classes (the flavor applies to the representation of concrete read values).
Enumerated atoms also have stype and type attributes with the same values as in EnumCol.
Finally, there are two special classes, ObjectAtom and VLStringAtom, that do not actually descend from Atom, but whose goal is so similar that they are described here. Unlike Atom and its descendants, these special classes allow neither multidimensional atoms nor multiple values per row. A flavor cannot be specified either, as it is immutable (see below).
Caveat emptor: You are only allowed to use these classes to create VLArray objects, not CArray and EArray objects.
This class is meant to fit any kind of object in a row of a VLArray instance by using cPickle behind the scenes. Because you cannot foresee how long the output of the cPickle serialization will be (i.e. the atom already has a variable length), you can only fit one object per row. However, you can still pass several parameters to the VLArray.append() method, as they will be regarded as a tuple of compound objects (the parameters), so that there is still only one object to be saved in a single row. It does not accept parameters and its flavor is automatically set to "Object", so reading a row always returns an arbitrary Python object. You can regard ObjectAtom types as an easy way to save an arbitrary number of generic Python objects in a VLArray object.
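The idea of storing one serialized object per row, with several arguments collapsing into a single tuple, can be sketched with the standard pickle module (the modern counterpart of cPickle). The functions serialize_row and deserialize_row are hypothetical helpers for this illustration; they do not show how the library actually writes the bytes to disk.

```python
import pickle


def serialize_row(*objects):
    """Serialize the arguments as ONE object: a lone argument is stored as is,
    several arguments are stored together as a tuple."""
    payload = objects[0] if len(objects) == 1 else objects
    return pickle.dumps(payload)


def deserialize_row(data):
    """Recover the Python object stored in a row."""
    return pickle.loads(data)


single = serialize_row({"a": 1})
several = serialize_row(1, "two", [3])
print(deserialize_row(single))    # prints {'a': 1}
print(deserialize_row(several))   # prints (1, 'two', [3])
```

The two serialized rows have different byte lengths, which is why the length of such an atom cannot be known in advance.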
This class describes a row of the VLArray class, rather than an atom. It differs from the StringAtom class in that you can only add one instance of it to one specific row, i.e. the VLArray.append() method only accepts one object when the base atom is of this type. In addition, it supports Unicode strings (unlike StringAtom) because it uses the UTF-8 encoding when serializing to disk (this is why its atomsize() method always returns 1). It does not accept any parameters and, because its flavor is automatically set to "VLString", reading a row always returns a Python string. See appendix D.3.5 if you are curious about how this is implemented at the low level. You can regard VLStringAtom types as an easy way to save generic variable length strings.
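The reason an atom size of 1 fits UTF-8 data is that every encoded string is just a sequence of single bytes whose count varies from string to string. The following sketch uses only the built-in string methods; encode_vlstring and decode_vlstring are hypothetical helper names, not library functions.

```python
def encode_vlstring(text):
    """Encode a Unicode string as UTF-8 bytes (one byte per stored item)."""
    return text.encode("utf-8")


def decode_vlstring(data):
    """Decode UTF-8 bytes back into a Unicode string."""
    return data.decode("utf-8")


# "\u00f1and\u00fa" contains two non-ASCII characters that take 2 bytes each
rows = ["abc", "\u00f1and\u00fa", ""]
encoded = [encode_vlstring(row) for row in rows]
lengths = [len(item) for item in encoded]
print(lengths)  # prints [3, 7, 0]
```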
See examples/vlarray1.py and examples/vlarray2.py for further examples on VLArrays, including object serialization and Unicode string management.