Foundation: Incremental Loading
Incremental data loading is a method used to optimize the ingestion of data from a source system to a storage or data warehouse by only loading new or updated records after an initial full load. This approach reduces processing time, minimizes resource usage, and ensures that data pipelines remain efficient. In TimeXtender Classic, incremental loading is a key feature that enables faster and more efficient data integration, particularly for large datasets.
Key Components of Incremental Loading
- Primary Keys: A reliable primary key is required for incremental load as it ensures that records can be accurately matched between the source and storage systems. Rows with new primary keys are inserted and rows with primary keys that already exist in the storage are updated.
- Incremental Selection Rules: These rules identify which field (or fields) contain values which indicate when a row has been updated in the source. In many source systems, these are indicated by a field name containing the word "Modified" and include Date/Time values (e.g. [ModifiedDate]). However, it's important to verify that the fields are reliable prior to configuration. Other data types are also supported as long as the values can be filtered using
>
,<
,MAX
, orSORT BY
operators.
How Incremental Loading Works in TimeXtender
TimeXtender handles incremental loading through a combination of primary keys, incremental selection rules, and SQL WHERE
clause filters. This process can be broken down into three main steps:
- Full Load (Initial Extraction):
The first step involves performing a full load, where all rows from the source system are extracted and transferred to the storage or data warehouse. This establishes the baseline dataset for future incremental loads. This step uses a simple query likeSELECT *
to retrieve all available rows. - Identifying the Max Incremental Field Value:
After the initial load, TimeXtender identifies the maximum value(s) contained in the incremental selection rule (e.g.,ModifiedDate
orTimestamp
). This value is stored for future reference. This field serves as a marker for determining which rows have been added or updated since the last load. - Incremental Load (Subsequent Extractions):
During subsequent loads, TimeXtender uses the stored maximum value to filter out previously loaded records. Using an SQLWHERE
clause, such asWHERE [ModifiedDate] > '2023-01-01'
, only rows with a value greater than this stored maximum are extracted and then stored in the RAW table. - Insert & Update:
The TimeXtender Data Cleansing procedure then compares the primary keys of the records in the RAW and VALID tables, inserting records without existing primary key and updating records with existing primary keys.
By automating these steps, TimeXtender streamlines data integration processes, enabling organizations to maintain up-to-date data warehouses while minimizing resource consumption.
For more detailed information please see: Incremental Loading