FTP/SFTP Data Connector

The FTP/SFTP Data Connector enables federated SQL queries across Parquet/CSV files stored on FTP/SFTP servers.

If a folder is provided, all child Parquet/CSV files will be loaded.

Configuration

Parameters

The connection to FTP can be configured by providing the following params:

  • file_format: Specifies the data file format. Required if the format cannot be inferred from the from path. See Object Store File Formats.
  • ftp_port: Optional, specifies the port of the FTP server. Default is 21. E.g. ftp_port: 21
  • ftp_user: The username for the FTP server. E.g. ftp_user: my-ftp-user
  • ftp_pass: The password for the FTP server. Use the secret replacement syntax to load the password from a secret store, e.g. ${secrets:my_ftp_pass}.
  • client_timeout: Optional. Specifies the timeout for the FTP connection. E.g. client_timeout: 30s. When not set, the FTP client has no timeout.
  • hive_infer_partitions: Optional. Infer the partition columns for hive-style partitioning from the folder structure. Defaults to true.

Additional CSV-related parameters can be configured; see CSV Parameters.

Examples

  - from: ftp://remote-ftp-server.com/path/to/folder/
    name: my_dataset
    params:
      file_format: csv
      ftp_user: my-ftp-user
      ftp_pass: ${secrets:my_ftp_password}
      hive_infer_partitions: false
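
The optional ftp_port and client_timeout parameters can be combined with a single-file path. The following is a minimal sketch, not a verified configuration: the file path, dataset name, and port 2121 are hypothetical values chosen for illustration. It assumes the format can be inferred from the .parquet extension, so file_format is left out.

```yaml
  # Hypothetical single-file dataset; path, name, and port are illustrative only.
  - from: ftp://remote-ftp-server.com/path/to/file.parquet
    name: my_parquet_dataset
    params:
      # file_format is omitted on the assumption that it is inferred from the path.
      ftp_port: 2121          # non-default port; the default is 21
      ftp_user: my-ftp-user
      ftp_pass: ${secrets:my_ftp_password}
      client_timeout: 30s     # without this, no client timeout is configured
```

Setting client_timeout is worth considering for remote servers, since the default behaviour described above is to configure no timeout at all.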