1
0
mirror of synced 2025-12-22 03:21:25 -05:00
Files
airbyte/docs/connector-development/config-based/understanding-the-yaml-file/pagination.md
2024-05-07 08:19:33 -07:00

232 lines
6.3 KiB
Markdown

# Pagination
Given a page size and a pagination strategy, the `DefaultPaginator` will point to pages of results for as long as its strategy returns a `next_page_token`.
Iterating over pages of result is different from iterating over stream slices.
Stream slices have semantic value, for instance, a Datetime stream slice defines data for a specific date range. Two stream slices will have data for different date ranges.
Conversely, pages don't have semantic value. More pages simply means that more records are to be read, without specifying any meaningful difference between the records of the first and later pages.
Schema:
```yaml
Paginator:
type: object
anyOf:
- "$ref": "#/definitions/DefaultPaginator"
- "$ref": "#/definitions/NoPagination"
NoPagination:
type: object
additionalProperties: true
```
## Default paginator
The default paginator is defined by
- `page_size_option`: How to specify the page size in the outgoing HTTP request
- `pagination_strategy`: How to compute the next page to fetch
- `page_token_option`: How to specify the next page to fetch in the outgoing HTTP request
Schema:
```yaml
DefaultPaginator:
type: object
additionalProperties: true
required:
- page_token_option
- pagination_strategy
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
page_size:
type: integer
page_size_option:
"$ref": "#/definitions/RequestOption"
page_token_option:
anyOf:
- "$ref": "#/definitions/RequestOption"
- "$ref": "#/definitions/RequestPath"
pagination_strategy:
"$ref": "#/definitions/PaginationStrategy"
```
3 pagination strategies are supported
1. Page increment
2. Offset increment
3. Cursor-based
## Pagination Strategies
Schema:
```yaml
PaginationStrategy:
type: object
anyOf:
- "$ref": "#/definitions/CursorPagination"
- "$ref": "#/definitions/OffsetIncrement"
- "$ref": "#/definitions/PageIncrement"
```
### Page increment
When using the `PageIncrement` strategy, the page number will be set as part of the `page_token_option`.
Schema:
```yaml
PageIncrement:
type: object
additionalProperties: true
required:
- page_size
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
page_size:
type: integer
```
The following paginator example will fetch 5 records per page, and specify the page number as a request_parameter:
Example:
```yaml
paginator:
type: "DefaultPaginator"
page_size_option:
type: "RequestOption"
inject_into: "request_parameter"
field_name: "page_size"
pagination_strategy:
type: "PageIncrement"
page_size: 5
page_token_option:
type: "RequestOption"
inject_into: "request_parameter"
field_name: "page"
```
If the page contains less than 5 records, then the paginator knows there are no more pages to fetch.
If the API returns more records than requested, all records will be processed.
Assuming the endpoint to fetch data from is `https://cloud.airbyte.com/api/get_data`,
the first request will be sent as `https://cloud.airbyte.com/api/get_data?page_size=5&page=0`
and the second request as `https://cloud.airbyte.com/api/get_data?page_size=5&page=1`,
### Offset increment
When using the `OffsetIncrement` strategy, the number of records read will be set as part of the `page_token_option`.
Schema:
```yaml
OffsetIncrement:
type: object
additionalProperties: true
required:
- page_size
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
page_size:
type: integer
```
The following paginator example will fetch 5 records per page, and specify the offset as a request_parameter:
Example:
```yaml
paginator:
type: "DefaultPaginator"
page_size_option:
type: "RequestOption"
inject_into: "request_parameter"
field_name: "page_size"
pagination_strategy:
type: "OffsetIncrement"
page_size: 5
page_token_option:
type: "RequestOption"
field_name: "offset"
inject_into: "request_parameter"
```
Assuming the endpoint to fetch data from is `https://cloud.airbyte.com/api/get_data`,
the first request will be sent as `https://cloud.airbyte.com/api/get_data?page_size=5&offset=0`
and the second request as `https://cloud.airbyte.com/api/get_data?page_size=5&offset=5`,
### Cursor
The `CursorPagination` outputs a token by evaluating its `cursor_value` string with the following parameters:
- `response`: The decoded response
- `headers`: HTTP headers on the response
- `last_records`: List of records selected from the last response
This cursor value can be used to request the next page of record.
Schema:
```yaml
CursorPagination:
type: object
additionalProperties: true
required:
- cursor_value
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
cursor_value:
type: string
stop_condition:
type: string
page_size:
type: integer
```
#### Cursor paginator in request parameters
In this example, the next page of record is defined by setting the `from` request parameter to the id of the last record read:
```yaml
paginator:
type: "DefaultPaginator"
<...>
pagination_strategy:
type: "CursorPagination"
cursor_value: "{{ last_records[-1]['id'] }}"
page_token_option:
type: "RequestPath"
field_name: "from"
inject_into: "request_parameter"
```
Assuming the endpoint to fetch data from is `https://cloud.airbyte.com/api/get_data`,
the first request will be sent as `https://cloud.airbyte.com/api/get_data`.
Assuming the id of the last record fetched is 1000,
the next request will be sent as `https://cloud.airbyte.com/api/get_data?from=1000`.
#### Cursor paginator in path
Some APIs directly point to the URL of the next page to fetch. In this example, the URL of the next page is extracted from the response headers:
```yaml
paginator:
type: "DefaultPaginator"
<...>
pagination_strategy:
type: "CursorPagination"
cursor_value: "{{ headers['link']['next']['url'] }}"
page_token_option:
type: "RequestPath"
```
Assuming the endpoint to fetch data from is `https://cloud.airbyte.com/api/get_data`,
the first request will be sent as `https://cloud.airbyte.com/api/get_data`
Assuming the response's next url is `https://cloud.airbyte.com/api/get_data?page=1&page_size=100`,
the next request will be sent as `https://cloud.airbyte.com/api/get_data?page=1&page_size=100`