Columns¶
Because hamana is designed to work with data that is usually extracted in tabular form, the library introduces the Column class. It stores useful information about the extracted data (e.g. name, type, etc.) and describes the data more precisely. For example, Column objects appear in Query objects and in the definition of CSV connectors.
Although Column classes define a general behavior, they can be customized to better fit specific data types or sources. hamana provides default implementations for the most common data types:
- NumberColumn: manages any kind of number.
- IntegerColumn: specialised to manage integer values.
- StringColumn: specialised to manage string values.
- BooleanColumn: specialised to manage boolean values.
- DatetimeColumn: specific to datetime values.
- DateColumn: specific to date values.
These classes are useful because they already provide a default implementation of the ColumnParser class, which converts the data from the source to the internal representation. They also provide additional class attributes that fit the desired datatype.
It is always possible to create custom Column classes by extending the Column class and providing a custom implementation of the ColumnParser class.
DataType¶
Before presenting the Column class, we first introduce the DataType class. This class creates a standard inside the library to manage the types, and it provides a bridge between SQLite and pandas data types.
hamana.core.column.DataType
¶
Bases: Enum
Enumeration representing the datatypes of the hamana columns.
The library supports the following data types:
- INTEGER: integer data type.
- NUMBER: number data type.
- STRING: string data type.
- BOOLEAN: boolean data type.
- DATETIME: datetime data type.
- DATE: date data type.
- CUSTOM: custom data type.
The CUSTOM data type is used to represent a custom datatype
that could be used for dedicated implementations.
Since the library is designed to be used with pandas and sqlite,
the DataType enumeration also provides a method to map the data types
to the corresponding data types in sqlite and pandas.
from_pandas
classmethod
¶
from_pandas(dtype: str) -> DataType
Function to map a pandas datatype to DataType.
Observe that if no mapping is found, the default is DataType.STRING.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dtype` | `str` | pandas data type. | required |
Returns:
| Type | Description |
|---|---|
| `DataType` | |
Source code in src/hamana/core/column.py
to_sqlite
classmethod
¶
to_sqlite(dtype: DataType) -> str
Function to map a DataType to a SQLite datatype.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dtype` | `DataType` | | required |
Returns:
| Type | Description |
|---|---|
| `str` | SQLite data type mapped. |
Source code in src/hamana/core/column.py
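The two mappings can be sketched without hamana installed. The enum below is a stand-in, not the library's class: the pandas dtype strings and the SQLite type names in the dictionaries are assumptions chosen for illustration, and only the fallback to `STRING` comes from the description above.

```python
from enum import Enum


class SketchDataType(Enum):
    """Stand-in for hamana's DataType; member names follow the list above."""

    INTEGER = "integer"
    NUMBER = "number"
    STRING = "string"
    BOOLEAN = "boolean"
    DATETIME = "datetime"
    DATE = "date"
    CUSTOM = "custom"

    @classmethod
    def from_pandas(cls, dtype: str) -> "SketchDataType":
        # Assumed dtype names; any unknown dtype falls back to STRING.
        mapping = {
            "int64": cls.INTEGER,
            "float64": cls.NUMBER,
            "object": cls.STRING,
            "bool": cls.BOOLEAN,
            "datetime64[ns]": cls.DATETIME,
        }
        return mapping.get(dtype, cls.STRING)

    @classmethod
    def to_sqlite(cls, dtype: "SketchDataType") -> str:
        # Assumed SQLite type names; everything else is stored as TEXT here.
        mapping = {
            cls.INTEGER: "INTEGER",
            cls.NUMBER: "REAL",
            cls.BOOLEAN: "INTEGER",
        }
        return mapping.get(dtype, "TEXT")
```

With the real library, the equivalent calls would be `DataType.from_pandas(...)` and `DataType.to_sqlite(...)`; the concrete values they return may differ from this sketch.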
Parser¶
Another useful feature of the Column class is the optional parser attribute. If present, it is an instance of the ColumnParser class, which converts the data from the source to the internal representation.
The ColumnParser class is composed of two methods:
- pandas: this method must respect the protocol PandasParser, and it is used to convert pandas.Series input data.
- polars: currently not supported, but it will be used to convert polars.Series input data.
By default, the Column class does not provide any parser, but the NumberColumn, IntegerColumn, StringColumn, BooleanColumn, DatetimeColumn, and DateColumn classes provide a default implementation of the ColumnParser class.
hamana.core.column.ColumnParser
dataclass
¶
ColumnParser(
pandas: PandasParser, polars: Callable | None = None
)
Class representing a parser for a column in the hamana library.
Since the library is designed to be used with pandas and polars,
the ColumnParser class provides methods that could be used to parse
data coming from these libraries.
hamana.core.column.PandasParser
¶
Bases: Protocol
Protocol representing a parser for pandas series.
A pandas parser is a function that takes at least a pandas series
as input and returns a pandas series after applying dedicated transformations.
Structure:
def parser(series: pandas.Series, *args: Any, **kwargs: Any) -> pandas.Series:
...
Identifier¶
There are many situations where the column datatype (string, number, date, etc.) must be identified, e.g. when the data is extracted from file sources such as CSV files. To solve this problem, hamana provides the ColumnIdentifier class, which identifies the column type from the input data.
Similarly to the ColumnParser class, the ColumnIdentifier class is composed of two methods:
- pandas: this method must respect the protocol PandasIdentifier, and it is used to identify the column type from pandas.Series input data.
- polars: currently not supported, but it will be used to identify the column type from polars.Series input data.
hamana.core.identifier.ColumnIdentifier
dataclass
¶
ColumnIdentifier(
pandas: PandasIdentifier[TColumn],
polars: Callable | None = None,
)
Bases: Generic[TColumn]
Class representing an identifier for a column in the hamana library.
Since the library is designed to be used with pandas and polars,
the ColumnIdentifier class provides methods that could be used to identify
the column from a set of data from both libraries.
Note
Observe that the identification process tries to infer the column type based on the data provided. The process is not perfect and could lead to wrong inferences. The user should always check the inferred column type and adjust it if needed.
is_empty
staticmethod
¶
is_empty(
series: PandasSeries, raise_error: bool = False
) -> bool
Check if the series is empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | the series to check. | required |
| `raise_error` | `bool` | if True, raise an error if the series is empty. | `False` |
Returns:
| Type | Description |
|---|---|
| `bool` | True if the series is empty, False otherwise. |
Source code in src/hamana/core/identifier.py
__call__
¶
__call__(
series: Any,
column_name: str,
order: int | None = None,
*args: Any,
**kwargs: Any
) -> TColumn | None
Identifies the column type from a given series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `Any` | the series to identify the column type from. | required |
| `column_name` | `str` | the name of the column to identify. | required |
| `*args` | `Any` | additional arguments to pass to the identifier. | `()` |
| `**kwargs` | `Any` | additional keyword arguments to pass to the identifier. | `{}` |
Returns:
| Type | Description |
|---|---|
| `TColumn \| None` | the identified column type, or `None`. |
Source code in src/hamana/core/identifier.py
infer
staticmethod
¶
infer(
series: Any,
column_name: str,
order: int | None = None,
*args: Any,
**kwargs: Any
) -> (
NumberColumn
| IntegerColumn
| StringColumn
| BooleanColumn
| DatetimeColumn
| DateColumn
)
Infers the column type from a given series. The function passes
the series to the default hamana identifiers, one after another
in a fixed order, to infer the column type.
Note
If the column is empty, then by default the
function assigns the STRING datatype.
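The first-match behaviour described above can be sketched with plain Python. Everything here is a stand-in: lists replace pandas series, type names are plain strings, `looks_integer` and `looks_string` are hypothetical identifiers, the order passed by the caller is arbitrary, and `LookupError` takes the place of `ColumnIdentifierError`.

```python
from typing import Callable, List, Optional, Sequence

# A stand-in identifier: receives the non-null values, returns a type name or None.
Identifier = Callable[[List[str]], Optional[str]]


def looks_integer(values: List[str]) -> Optional[str]:
    return "INTEGER" if all(v.lstrip("+-").isdigit() for v in values) else None


def looks_string(values: List[str]) -> Optional[str]:
    return "STRING" if all(isinstance(v, str) for v in values) else None


def infer(values: Sequence[Optional[str]], identifiers: Sequence[Identifier]) -> str:
    non_null = [v for v in values if v is not None and v != ""]
    if not non_null:
        # Empty columns default to the STRING datatype.
        return "STRING"
    for identify in identifiers:
        result = identify(non_null)
        if result is not None:
            # The first identifier that recognises the column wins.
            return result
    raise LookupError("no column type could be inferred")
```

Because the first match wins, stricter identifiers have to run before more permissive ones; a string check placed first would claim every column.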
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `Any` | the series to infer the column type from. | required |
| `*args` | `Any` | additional arguments to pass to the identifier. | `()` |
| `**kwargs` | `Any` | additional keyword arguments to pass to the identifier. | `{}` |
Returns:
| Type | Description |
|---|---|
| `NumberColumn \| IntegerColumn \| StringColumn \| BooleanColumn \| DatetimeColumn \| DateColumn` | the inferred column type. |
Raises:
| Type | Description |
|---|---|
| `ColumnIdentifierError` | if no column type could be inferred. |
Source code in src/hamana/core/identifier.py
hamana.core.identifier.PandasIdentifier
¶
Bases: Protocol[TColumn]
Protocol representing an identifier for pandas series.
A PandasIdentifier is a callable that must have at least
the following input parameters:
- series: the pandas series to identify the column type from.
- column_name: the name of the column to identify.
The PandasIdentifier must return a column type or None if the
column type could not be identified.
Structure
def __call__(self, series: PandasSeries, column_name: str, order: int | None = None, *args: Any, **kwargs: Any) -> TColumn | None:
...
Default Identifiers¶
hamana provides a set of default identifiers that can be used to identify hamana's default column types.
Number Identifier¶
hamana.core.identifier.number_identifier
module-attribute
¶
number_identifier = ColumnIdentifier[NumberColumn](
pandas=_default_numeric_pandas
)
Default identifier for the NumberColumn class.
More details on the default methods can be found in the corresponding functions' documentation.
- pandas: `_default_numeric_pandas`
- polars: `None` (not implemented)
hamana.core.identifier._default_numeric_pandas
¶
_default_numeric_pandas(
series: PandasSeries,
column_name: str,
order: int | None = None,
) -> NumberColumn | None
This function defines the default behavior to identify a number column from a pandas series.
To identify a number column, the function follows these steps:
- Drop null values (including empty strings).
- Check if the column contains letters.
- Count the maximum number of occurrences of the comma and dot separators across all the elements.
- Evaluate first the default configuration (dot decimal separator, comma thousands separator).
- If the default configuration does not work, evaluate the alternative configuration (comma decimal separator, dot thousands separator).
- If this configuration does not work either, return None.
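A minimal sketch of these steps on a plain list of strings, assuming a hypothetical `identify_number` helper that returns the detected `(decimal_separator, thousands_separator)` pair instead of a `NumberColumn`. The exact checks in hamana may differ; this only illustrates trying the default configuration before the alternative one.

```python
from typing import List, Optional, Sequence, Tuple


def _fits(values: List[str], decimal: str, thousands: str) -> bool:
    # A decimal separator may appear at most once in any value.
    if max(v.count(decimal) for v in values) > 1:
        return False
    for v in values:
        # Thousands separators are not allowed after the decimal separator.
        if decimal in v and thousands in v.split(decimal, 1)[1]:
            return False
        candidate = v.replace(thousands, "").replace(decimal, ".")
        try:
            float(candidate)
        except ValueError:
            return False
    return True


def identify_number(values: Sequence[Optional[str]]) -> Optional[Tuple[str, str]]:
    # Step 1: drop nulls and empty strings.
    cleaned = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not cleaned:
        return None
    # Step 2: a column containing letters is not numeric.
    if any(ch.isalpha() for v in cleaned for ch in v):
        return None
    # Steps 3-6: default configuration first, then the alternative one.
    for decimal, thousands in ((".", ","), (",", ".")):
        if _fits(cleaned, decimal, thousands):
            return decimal, thousands
    return None
```

A value such as `1.234` is ambiguous; in this sketch the default configuration wins and it is read as a decimal number.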
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | | required |
| `column_name` | `str` | name of the column to be checked. | required |
Returns:
| Type | Description |
|---|---|
| `NumberColumn \| None` | |
Source code in src/hamana/core/identifier.py
Integer Identifier¶
hamana.core.identifier.integer_identifier
module-attribute
¶
integer_identifier = ColumnIdentifier[IntegerColumn](
pandas=_default_integer_pandas
)
Default identifier for the IntegerColumn class.
More details on the default methods can be found in the corresponding functions' documentation.
- pandas: `_default_integer_pandas`
- polars: `None` (not implemented)
hamana.core.identifier._default_integer_pandas
¶
_default_integer_pandas(
series: PandasSeries,
column_name: str,
order: int | None = None,
) -> IntegerColumn | None
This function defines the default behavior to identify an integer column from a pandas series.
To identify an integer column, the function follows these steps:
- Drop null values (including empty strings).
- Check if the column can be considered a number datatype.
- If the check passes, check whether the column is composed only of integers (including the sign).
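The last step can be sketched with a regular expression over plain strings. The helper name `identify_integer` and the way the thousands separator is stripped are assumptions for illustration, not hamana's implementation.

```python
import re
from typing import Optional, Sequence

# An optional sign followed by digits only.
_INTEGER = re.compile(r"[+-]?\d+")


def identify_integer(
    values: Sequence[Optional[str]], thousands_separator: str = ","
) -> bool:
    # Drop nulls and empty strings first.
    cleaned = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not cleaned:
        return False
    # Every remaining value must be an integer once separators are removed.
    return all(
        _INTEGER.fullmatch(v.replace(thousands_separator, "")) is not None
        for v in cleaned
    )
```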
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | | required |
| `column_name` | `str` | name of the column to be checked. | required |
Returns:
| Type | Description |
|---|---|
| `IntegerColumn \| None` | |
Source code in src/hamana/core/identifier.py
String Identifier¶
hamana.core.identifier.string_identifier
module-attribute
¶
string_identifier = ColumnIdentifier[StringColumn](
pandas=_default_string_pandas
)
Default identifier for the StringColumn class.
More details on the default methods can be found in the corresponding functions' documentation.
- pandas: `_default_string_pandas`
- polars: `None` (not implemented)
hamana.core.identifier._default_string_pandas
¶
_default_string_pandas(
series: PandasSeries,
column_name: str,
order: int | None = None,
) -> StringColumn | None
Function to identify a string column from a pandas series.
The function checks if the column is a string column by converting the column to string type and checking if at least one value can be considered as string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | | required |
| `column_name` | `str` | name of the column to be checked. | required |
Returns:
| Type | Description |
|---|---|
| `StringColumn \| None` | |
Source code in src/hamana/core/identifier.py
Boolean Identifier¶
hamana.core.identifier.boolean_identifier
module-attribute
¶
boolean_identifier = ColumnIdentifier[BooleanColumn](
pandas=_default_boolean_pandas
)
Default identifier for the BooleanColumn class.
More details on the default methods can be found in the corresponding functions' documentation.
- pandas: `_default_boolean_pandas`
- polars: `None` (not implemented)
hamana.core.identifier._default_boolean_pandas
¶
_default_boolean_pandas(
series: PandasSeries,
column_name: str,
order: int | None = None,
min_count: int = 1000,
) -> BooleanColumn | None
This function defines the default behavior to identify a boolean column from a pandas series.
To identify a boolean column, the function checks if the column has only two unique values.
Observe that the function does not check whether the values are boolean values, only whether the
column has two unique values; for this reason the assignment of the True and False values
is arbitrary.
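A rough sketch of the two-unique-values rule on a plain list. It assumes that `min_count` is compared with the number of non-null elements, which the documentation does not state explicitly, and the helper `identify_boolean` is hypothetical.

```python
from typing import Any, Optional, Sequence, Tuple


def identify_boolean(
    values: Sequence[Any], min_count: int = 1000
) -> Optional[Tuple[Any, Any]]:
    cleaned = [v for v in values if v is not None and v != ""]
    # Too few elements: two distinct values could be a coincidence.
    if len(cleaned) < min_count:
        return None
    unique = sorted(set(cleaned), key=str)
    if len(unique) != 2:
        return None
    # Which value plays the role of True is arbitrary, as noted above.
    true_value, false_value = unique
    return true_value, false_value
```

Since the assignment is arbitrary, a caller that knows the real encoding would normally set `true_value` and `false_value` explicitly on the resulting column.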
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | | required |
| `column_name` | `str` | name of the column to be checked. | required |
| `min_count` | `int` | minimum number of elements to consider the column as a boolean column. This parameter is used to avoid wrong inferences when the column has only a few elements. | `1000` |
Returns:
| Type | Description |
|---|---|
| `BooleanColumn \| None` | |
Source code in src/hamana/core/identifier.py
Datetime Identifier¶
hamana.core.identifier.datetime_identifier
module-attribute
¶
datetime_identifier = ColumnIdentifier[DatetimeColumn](
pandas=_default_datetime_pandas
)
Default identifier for the DatetimeColumn class.
More details on the default methods can be found in the corresponding functions' documentation.
- pandas: `_default_datetime_pandas`
- polars: `None` (not implemented)
hamana.core.identifier._default_datetime_pandas
¶
_default_datetime_pandas(
series: PandasSeries,
column_name: str,
order: int | None = None,
format: str | None = None,
) -> DatetimeColumn | None
This function defines the default behavior to identify a datetime column from a pandas series.
To identify this type of column, the function first removes the
null values, then tries to apply pandas.to_datetime with a list
of the most common datetime formats. If the column is not
identified, the function tries to apply pandas.to_datetime
without providing any format. Since this last operation could
lead to wrong inferences, the function considers the column as
a datetime column only if all the values are converted correctly.
Default Formats:
- YYYY-MM-DD HH:mm:ss
- YYYY-MM-DD HH:mm
- YYYY-MM-DD
- YYYY/MM/DD HH:mm:ss
- YYYY/MM/DD HH:mm
- YYYY/MM/DD
- YYYYMMDD HH:mm:ss
- YYYYMMDD HH:mm
- YYYYMMDD
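The format-probing loop can be sketched with the standard library alone. Here `datetime.strptime` stands in for pandas.to_datetime, the strftime strings are an assumed translation of the formats listed above, and the final attempt without a format is left out. The helper `identify_datetime_format` is hypothetical.

```python
from datetime import datetime
from typing import Optional, Sequence

# Assumed strftime equivalents of the default formats listed above.
DEFAULT_FORMATS = [
    "%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d",
    "%Y/%m/%d %H:%M:%S", "%Y/%m/%d %H:%M", "%Y/%m/%d",
    "%Y%m%d %H:%M:%S", "%Y%m%d %H:%M", "%Y%m%d",
]


def identify_datetime_format(
    values: Sequence[Optional[str]], format: Optional[str] = None
) -> Optional[str]:
    cleaned = [v for v in values if v is not None and v != ""]
    if not cleaned:
        return None
    # An explicit format replaces the default list entirely.
    candidates = [format] if format is not None else DEFAULT_FORMATS
    for fmt in candidates:
        try:
            for v in cleaned:
                datetime.strptime(v, fmt)
        except ValueError:
            continue  # at least one value does not match this format
        return fmt  # every value parsed with this format
    return None
```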
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | | required |
| `column_name` | `str` | name of the column to be checked. | required |
| `format` | `str \| None` | datetime format used to try to convert the series. If the format is provided, then the default formats are not used. | `None` |
Returns:
| Type | Description |
|---|---|
| `DatetimeColumn \| None` | |
Source code in src/hamana/core/identifier.py
Date Identifier¶
hamana.core.identifier.date_identifier
module-attribute
¶
date_identifier = ColumnIdentifier[DatetimeColumn](
pandas=_default_date_pandas
)
Default identifier for the DateColumn class.
More details on the default methods can be found in the corresponding functions' documentation.
- pandas: `_default_date_pandas`
- polars: `None` (not implemented)
hamana.core.identifier._default_date_pandas
¶
_default_date_pandas(
series: PandasSeries,
column_name: str,
order: int | None = None,
format: str | None = None,
) -> DateColumn | None
This function defines the default behavior to identify a date column from a pandas series.
The function builds on the DatetimeColumn default pandas identifier method
_default_datetime_pandas to identify the column. However, the function
considers only datetime formats that do not contain time information.
Default Formats:
- YYYY-MM-DD
- YYYY/MM/DD
- YYYYMMDD
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | | required |
| `column_name` | `str` | name of the column to be checked. | required |
| `format` | `str \| None` | date format used to try to convert the series. If the format is provided, then the default formats are not used. Observe that the format must not contain time information. | `None` |
Returns:
| Type | Description |
|---|---|
| `DateColumn \| None` | |
Raises:
| Type | Description |
|---|---|
| `ColumnDateFormatterError` | if the format is not valid. |
Source code in src/hamana/core/identifier.py
API¶
hamana.core.column.Column
dataclass
¶
Column(
name: str,
dtype: DataType,
parser: ColumnParser | None = None,
order: int | None = None,
inferred: bool = False,
)
Class representing a column in the hamana library.
A column is defined by the following attributes:
- `name`: name of the column.
- `dtype`: represents the datatype and should be an instance of `DataType`.
- `parser`: a column in hamana can have an associated parser object that is used to parse lists of values; e.g. useful when data are extracted from different data sources and should be cast and normalized.
hamana.core.column.NumberColumn
¶
NumberColumn(
name: str,
decimal_separator: str = ".",
thousands_separator: str = ",",
null_default_value: int | float | None = None,
parser: ColumnParser | None = None,
order: int | None = None,
)
Bases: Column
Dedicated class representing DataType.NUMBER columns.
The class provides attributes that could be used to define the properties of the number column, such as:
- `decimal_separator`: the decimal separator used in the number. By default, the decimal separator is set to `.`.
- `thousands_separator`: the thousands separator used in the number. By default, the thousands separator is set to `,`.
- `null_default_value`: the default value to be used when a null value is found. By default, the default value is set to `None`.
The class also provides a default parser that could be used to parse
the number column using pandas.
Source code in src/hamana/core/column.py
decimal_separator
instance-attribute
¶
decimal_separator: str = decimal_separator
Decimal separator used in the number.
thousands_separator
instance-attribute
¶
thousands_separator: str = thousands_separator
Thousands separator used in the number.
null_default_value
instance-attribute
¶
null_default_value: int | float | None = null_default_value
Default value to be used when a null value is found.
pandas_default_parser
¶
pandas_default_parser(
series: PandasSeries,
mode: PandasParsingModes = PandasParsingModes.RAISE,
) -> PandasSeries
Default pandas parser for the number columns. The function
first converts the column to string type, then replaces the
thousands separator with an empty string and the decimal
separator with `.`. Finally, it tries to convert the
column to a numeric type using pandas.to_numeric.
If null_default_value is set, the function fills the
null values with the default value.
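The separator handling can be sketched on a plain list, with `float` standing in for pandas.to_numeric. The helper `parse_number` is hypothetical and mimics only the raise-on-error behaviour: an unparsable value raises `ValueError` rather than `ColumnParserPandasNumberError`.

```python
from typing import List, Optional, Sequence, Union

Number = Union[int, float]


def parse_number(
    values: Sequence[Optional[str]],
    decimal_separator: str = ".",
    thousands_separator: str = ",",
    null_default_value: Optional[Number] = None,
) -> List[Optional[Number]]:
    parsed: List[Optional[Number]] = []
    for v in values:
        if v is None or str(v).strip() == "":
            # Nulls become the default value (which may itself be None).
            parsed.append(null_default_value)
            continue
        # Drop the thousands separator, then normalise the decimal one to ".".
        text = str(v).replace(thousands_separator, "").replace(decimal_separator, ".")
        parsed.append(float(text))  # raises ValueError on malformed input
    return parsed
```

The thousands separator is removed before the decimal separator is normalised; doing it the other way round would corrupt values when the two separators are swapped (comma decimal, dot thousands).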
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | | required |
| `mode` | `PandasParsingModes` | mode to be used when parsing the number column. By default, the mode is set to `PandasParsingModes.RAISE`. | `PandasParsingModes.RAISE` |
Returns:
| Type | Description |
|---|---|
| `PandasSeries` | |
Raises:
| Type | Description |
|---|---|
| `ColumnParserPandasNumberError` | error parsing the number column. |
Source code in src/hamana/core/column.py
hamana.core.column.IntegerColumn
¶
IntegerColumn(
name: str,
decimal_separator: str = ".",
thousands_separator: str = ",",
null_default_value: int | None = 0,
parser: ColumnParser | None = None,
order: int | None = None,
)
Bases: NumberColumn
Class representing DataType.INTEGER columns.
It inherits from the NumberColumn class and provides
a default parser that could be used to parse integer columns.
Similar to the NumberColumn class, the IntegerColumn class
provides attributes that could be used to define the properties
of the integer column, such as:
- `decimal_separator`: the decimal separator used in the number. By default, the decimal separator is set to `.`.
- `thousands_separator`: the thousands separator used in the number. By default, the thousands separator is set to `,`.
- `null_default_value`: the default value to be used when a null value is found. By default, the default value is set to `0`.
Source code in src/hamana/core/column.py
pandas_default_parser
¶
pandas_default_parser(
series: PandasSeries,
mode: PandasParsingModes = PandasParsingModes.RAISE,
) -> PandasSeries
Default pandas parser for the integer columns. Similar
to the NumberColumn class, the function first converts
the column to string type, then replaces the thousands separator
with an empty string and the decimal separator with `.`.
Finally, it tries to convert the column to a numeric
type using pandas.to_numeric.
If null_default_value is set, the function fills the
null values with the default value and casts the column to
integer type. Otherwise, the function applies the np.floor
function to the returned series.
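The two branches can be sketched in plain Python, with `math.floor` in place of np.floor and lists in place of pandas series. The helper `parse_integer` is hypothetical and shows only the branching on `null_default_value`, not hamana's actual code.

```python
import math
from typing import List, Optional, Sequence, Union


def parse_integer(
    values: Sequence[Optional[str]],
    decimal_separator: str = ".",
    thousands_separator: str = ",",
    null_default_value: Optional[int] = 0,
) -> List[Union[int, float, None]]:
    numbers: List[Optional[float]] = []
    for v in values:
        if v is None or str(v).strip() == "":
            numbers.append(None)
            continue
        text = str(v).replace(thousands_separator, "").replace(decimal_separator, ".")
        numbers.append(float(text))
    if null_default_value is not None:
        # Nulls are filled, so every element can be cast to an integer.
        return [int(null_default_value if n is None else n) for n in numbers]
    # Nulls remain, so the values are floored but kept as floats.
    return [None if n is None else float(math.floor(n)) for n in numbers]
```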
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | | required |
| `mode` | `PandasParsingModes` | mode to be used when parsing the number column. By default, the mode is set to `PandasParsingModes.RAISE`. | `PandasParsingModes.RAISE` |
Returns:
| Type | Description |
|---|---|
| `PandasSeries` | |
Raises:
| Type | Description |
|---|---|
| `ColumnParserPandasNumberError` | error parsing the number column. |
Source code in src/hamana/core/column.py
hamana.core.column.StringColumn
¶
StringColumn(
name: str,
parser: ColumnParser | None = None,
order: int | None = None,
)
Bases: Column
Class representing DataType.STRING columns.
Source code in src/hamana/core/column.py
pandas_default_parser
¶
pandas_default_parser(series: PandasSeries) -> PandasSeries
Default pandas parser for the string columns. The function
converts the column to string type and replaces the null values
with None.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | | required |
Returns:
| Type | Description |
|---|---|
| `PandasSeries` | |
Source code in src/hamana/core/column.py
hamana.core.column.BooleanColumn
¶
BooleanColumn(
name: str,
true_value: str | int | float = "Y",
false_value: str | int | float = "N",
parser: ColumnParser | None = None,
order: int | None = None,
)
Bases: Column
Class representing DataType.BOOLEAN columns.
The class provides attributes that could be used to define the properties of the boolean column, such as:
- `true_value`: the value to be used to represent the `True` value. By default, the value is set to `Y`.
- `false_value`: the value to be used to represent the `False` value. By default, the value is set to `N`.
The class also provides a default parser that could be used to parse
the boolean column using pandas.
Source code in src/hamana/core/column.py
true_value
instance-attribute
¶
true_value: str | int | float = true_value
Value to be used to represent the True value.
false_value
instance-attribute
¶
false_value: str | int | float = false_value
Value to be used to represent the False value.
pandas_default_parser
¶
pandas_default_parser(series: PandasSeries) -> PandasSeries
Default pandas parser for the boolean columns.
The function maps the values to True and False
based on the true_value and false_value attributes.
Observe that all other values are set to None.
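The mapping rule is small enough to sketch with a dictionary lookup on a plain list. The helper `parse_boolean` is hypothetical; only the defaults `"Y"` and `"N"` and the "everything else becomes None" rule come from the description above.

```python
from typing import Any, List, Optional, Sequence


def parse_boolean(
    values: Sequence[Any], true_value: Any = "Y", false_value: Any = "N"
) -> List[Optional[bool]]:
    mapping = {true_value: True, false_value: False}
    # dict.get returns None for any value that is neither true_value nor false_value.
    return [mapping.get(v) for v in values]
```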
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | | required |
Returns:
| Type | Description |
|---|---|
| `PandasSeries` | |
Source code in src/hamana/core/column.py
hamana.core.column.DatetimeColumn
¶
DatetimeColumn(
name: str,
format: str = "%Y-%m-%d %H:%M:%S",
null_default_value: (
datetime | pd.Timestamp | None
) = None,
parser: ColumnParser | None = None,
order: int | None = None,
)
Bases: Column
Class representing DataType.DATETIME columns.
The class provides attributes that could be used to define the properties of the datetime column, such as:
- `format`: the format to be used to parse the datetime. By default, the format is set to `%Y-%m-%d %H:%M:%S`.
- `null_default_value`: the default value to be used when a null value is found. By default, the default value is set to `None`.
The class also provides a default parser that could be used to parse
the datetime column using pandas.
Source code in src/hamana/core/column.py
null_default_value
instance-attribute
¶
null_default_value: datetime | pd.Timestamp | None = (
null_default_value
)
Default value to be used when a null value is found.
pandas_default_parser
¶
pandas_default_parser(
series: PandasSeries,
mode: PandasParsingModes = PandasParsingModes.RAISE,
) -> PandasSeries
Default pandas parser for the datetime columns. The function
tries to convert the column to a datetime type using pandas.to_datetime.
Observe that pandas.to_datetime can raise an OutOfBoundsDatetime error
when a datetime is out of bounds. In this case, the function switches to
a 'slow' mode where it first converts the column to string type and divides
it into two parts:
- the part that can be cast to datetime using pandas.to_datetime.
- the part that cannot be cast and is parsed using dateutil.parser.
This approach is slower than the default one, but can handle out-of-bounds datetimes.
Finally, if null_default_value is set, the function fills the null values
with the default value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `PandasSeries` | | required |
| `mode` | `PandasParsingModes` | mode to be used when parsing the datetime column. By default, the mode is set to `PandasParsingModes.RAISE`. | `PandasParsingModes.RAISE` |
Returns:
| Type | Description |
|---|---|
| `PandasSeries` | |
Raises:
| Type | Description |
|---|---|
| `ColumnParserPandasDatetimeError` | error parsing the datetime column. |
Source code in src/hamana/core/column.py
hamana.core.column.DateColumn
¶
DateColumn(
name: str,
format: str = "%Y-%m-%d",
null_default_value: (
datetime | pd.Timestamp | None
) = None,
parser: ColumnParser | None = None,
order: int | None = None,
)
Bases: DatetimeColumn
Class representing DataType.DATE columns.
The class inherits from the DatetimeColumn class and can
be used to store date values. Unlike the DatetimeColumn
class, the DateColumn class does not store the time part of the
datetime.
Note
During the initialization, the format is analysed to ensure that no
time part is present. If the time part is found, an error is raised.
Similar to the DatetimeColumn class, the DateColumn class provides attributes
that could be used to define the properties of the date column, such as:
- `format`: the format to be used to parse the date. By default, the format is set to `%Y-%m-%d`.
- `null_default_value`: the default value to be used when a null value is found. By default, the default value is set to `None`.
Raises:
| Type | Description |
|---|---|
| `ColumnDateFormatterError` | error raised when the date format contains a time part. |
Source code in src/hamana/core/column.py
check_format
staticmethod
¶
check_format(format: str) -> None
Function to check if the date format contains a time part.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `format` | `str` | date format to be checked. | required |
Raises:
| Type | Description |
|---|---|
| `ColumnDateFormatterError` | error raised when the date format contains a time part. |
Source code in src/hamana/core/column.py
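A format check of this kind can be sketched by scanning for strftime time directives. The set of directives below is an assumption, since the documentation does not say which ones hamana rejects, and `ValueError` stands in for `ColumnDateFormatterError`.

```python
# Assumed set of time-related strftime directives.
TIME_DIRECTIVES = ("%H", "%I", "%M", "%S", "%f", "%p")


def check_format(format: str) -> None:
    # Ignore escaped percent signs so that "%%H" is not read as "%H".
    stripped = format.replace("%%", "")
    found = [d for d in TIME_DIRECTIVES if d in stripped]
    if found:
        raise ValueError(
            f"date format {format!r} contains time directives: {found}"
        )
```

Running the check in the constructor, as the note on DateColumn describes, surfaces a wrong format when the column is defined rather than later, during parsing.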