feast_spark.pyspark.launchers.k8s package

Submodules

feast_spark.pyspark.launchers.k8s.k8s module

exception feast_spark.pyspark.launchers.k8s.k8s.JobNotFoundException[source]

Bases: Exception

class feast_spark.pyspark.launchers.k8s.k8s.KubernetesBatchIngestionJob(api: CustomObjectsApi, namespace: str, job_id: str, feature_table: str)[source]

Bases: KubernetesJobMixin, BatchIngestionJob

Ingestion job result for a k8s cluster

get_feature_table() str[source]

Get the feature table name associated with this job. Returns an empty string if the feature table cannot be determined, such as when the job was created by an earlier version of Feast.

Returns

Feature table name

Return type

str

class feast_spark.pyspark.launchers.k8s.k8s.KubernetesJobLauncher(namespace: str, incluster: bool, staging_location: str, generic_resource_template_path: Optional[str], batch_ingestion_resource_template_path: Optional[str], stream_ingestion_resource_template_path: Optional[str], historical_retrieval_resource_template_path: Optional[str], staging_client: AbstractStagingClient)[source]

Bases: JobLauncher

Submits Spark jobs to a Spark cluster running on Kubernetes. Supports historical feature retrieval, batch ingestion, and stream ingestion jobs.

get_job_by_id(job_id: str) SparkJob[source]
historical_feature_retrieval(job_params: RetrievalJobParameters) RetrievalJob[source]

Submits a historical feature retrieval job to a Spark cluster.

Raises

SparkJobFailure – The Spark job submission failed, the job encountered an error during execution, or it timed out.

Returns

Wrapper around the remote job that returns the file URI of the result file.

Return type

RetrievalJob

list_jobs(include_terminated: bool, project: Optional[str] = None, table_name: Optional[str] = None) List[SparkJob][source]
offline_to_online_ingestion(ingestion_job_params: BatchIngestionJobParameters) BatchIngestionJob[source]

Submits a batch ingestion job to a Spark cluster.

Raises

SparkJobFailure – The Spark job submission failed, the job encountered an error during execution, or it timed out.

Returns

Wrapper around the remote job that can be used to check when the job has completed.

Return type

BatchIngestionJob

schedule_offline_to_online_ingestion(ingestion_job_params: ScheduledBatchIngestionJobParameters)[source]

Schedule a batch ingestion job using the Spark Operator.

Raises

SparkJobFailure – Failure to create the ScheduledSparkApplication resource, or timeout.

Returns

Wrapper around the remote job that can be used to check the job ID.

Return type

ScheduledBatchIngestionJob

start_stream_to_online_ingestion(ingestion_job_params: StreamIngestionJobParameters) StreamIngestionJob[source]

Starts a stream ingestion job on a Spark cluster.

Raises

SparkJobFailure – The Spark job submission failed, the job encountered an error during execution, or it timed out.

Returns

Wrapper around the remote job.

Return type

StreamIngestionJob

unschedule_offline_to_online_ingestion(project: str, feature_table: str)[source]

Unschedule a scheduled batch ingestion job.

class feast_spark.pyspark.launchers.k8s.k8s.KubernetesJobMixin(api: CustomObjectsApi, namespace: str, job_id: str)[source]

Bases: object

cancel()[source]
get_error_message() str[source]
get_id() str[source]
get_start_time() datetime[source]
get_status() SparkJobStatus[source]
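A common pattern with this interface is to poll get_status() until the job reaches a terminal state. The sketch below illustrates that pattern; since a real KubernetesBatchIngestionJob needs a live k8s cluster and CustomObjectsApi, the job here is a local stub, and the SparkJobStatus enum is a simplified stand-in for feast_spark's own.

```python
import time
from enum import Enum


class SparkJobStatus(Enum):
    """Simplified stand-in for feast_spark's SparkJobStatus enum."""
    STARTING = 0
    IN_PROGRESS = 1
    COMPLETED = 2
    FAILED = 3


def wait_for_job(job, timeout_sec=600, poll_interval_sec=5):
    """Poll job.get_status() until the job reaches a terminal state.

    `job` is anything exposing the KubernetesJobMixin interface
    (get_status, get_error_message, get_id).
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        status = job.get_status()
        if status == SparkJobStatus.COMPLETED:
            return status
        if status == SparkJobStatus.FAILED:
            raise RuntimeError(
                f"Job {job.get_id()} failed: {job.get_error_message()}"
            )
        time.sleep(poll_interval_sec)
    raise TimeoutError(f"Job {job.get_id()} did not finish in {timeout_sec}s")


class _FakeJob:
    """Stub standing in for a KubernetesBatchIngestionJob."""
    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def get_id(self):
        return "demo-job"

    def get_error_message(self):
        return ""

    def get_status(self):
        return next(self._statuses)


job = _FakeJob([SparkJobStatus.IN_PROGRESS, SparkJobStatus.COMPLETED])
final = wait_for_job(job, timeout_sec=30, poll_interval_sec=0)
```

With a real job object obtained from KubernetesJobLauncher, the same loop applies unchanged; only the stub is replaced.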
class feast_spark.pyspark.launchers.k8s.k8s.KubernetesRetrievalJob(api: CustomObjectsApi, namespace: str, job_id: str, output_file_uri: str)[source]

Bases: KubernetesJobMixin, RetrievalJob

Historical feature retrieval job result for a k8s cluster

get_output_file_uri(timeout_sec=None, block=True)[source]

Get the output file URI of the result file. This method blocks until the job succeeds, or raises an exception if the job does not finish successfully within the timeout.

Parameters
  • timeout_sec (int) – Maximum number of seconds to wait until the job is done. If timeout_sec is exceeded or the job fails, an exception is raised.

  • block (bool) – If False, don’t block until the job is done. If block=True, the timeout parameter is ignored.

Raises

SparkJobFailure – The Spark job submission failed, the job encountered an error during execution, or it timed out.

Returns

File URI of the result file.

Return type

str
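The contract above can be exercised without a cluster by stubbing the job object. The sketch below is illustration only: _StubRetrievalJob and its URI are hypothetical, standing in for a KubernetesRetrievalJob returned by the launcher.

```python
class _StubRetrievalJob:
    """Stub mimicking the get_output_file_uri contract described above."""

    def __init__(self, uri, will_fail=False):
        self._uri = uri
        self._will_fail = will_fail

    def get_output_file_uri(self, timeout_sec=None, block=True):
        if not block:
            # Return immediately; the remote job may still be running.
            return self._uri
        if self._will_fail:
            # A real job raises SparkJobFailure here.
            raise RuntimeError("job failed before producing output")
        return self._uri


job = _StubRetrievalJob("gs://bucket/out/part-00000.parquet")
uri = job.get_output_file_uri(timeout_sec=1800)  # block up to 30 minutes
```

Against a real KubernetesRetrievalJob the call looks identical; only the object's origin differs (it comes from historical_feature_retrieval).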

class feast_spark.pyspark.launchers.k8s.k8s.KubernetesStreamIngestionJob(api: CustomObjectsApi, namespace: str, job_id: str, job_hash: str, feature_table: str)[source]

Bases: KubernetesJobMixin, StreamIngestionJob

Streaming ingestion job for a k8s cluster

get_feature_table() str[source]

Get the feature table name associated with this job. Returns None if the feature table cannot be determined, such as when the job was created by an earlier version of Feast.

Returns

Feature table name

Return type

str

get_hash() str[source]

Gets the consistent hash of this stream ingestion job.

The hash needs to be persisted at the data processing layer, so that we can get the same hash when retrieving the job from Spark.

Returns

The hash for this streaming ingestion job

Return type

str
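Why the hash must be consistent can be sketched with a deterministic hash over a job spec. This is illustration only, not feast_spark's internal scheme: serializing the parameters with sorted keys makes the hash independent of dict ordering, so the same spec always yields the same hash when the job is looked up again.

```python
import hashlib
import json


def job_hash(params: dict) -> str:
    """Deterministic hash of a stream ingestion job's parameters.

    Sorting keys stabilizes the serialization, so equivalent specs
    map to the same hash regardless of insertion order.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


spec = {"project": "default", "feature_table": "driver_stats", "version": 1}
h1 = job_hash(spec)
# Same spec, different key order -> same hash.
h2 = job_hash({"version": 1, "feature_table": "driver_stats", "project": "default"})
```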

feast_spark.pyspark.launchers.k8s.k8s_utils module

class feast_spark.pyspark.launchers.k8s.k8s_utils.JobInfo(job_id, job_type, job_error_message, namespace, extra_metadata, state, labels, start_time)[source]

Bases: tuple

property extra_metadata

Alias for field number 4

property job_error_message

Alias for field number 2

property job_id

Alias for field number 0

property job_type

Alias for field number 1

property labels

Alias for field number 6

property namespace

Alias for field number 3

property start_time

Alias for field number 7

property state

Alias for field number 5
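JobInfo is a namedtuple, so each property above is a positional-field alias. The following re-creates an equivalent namedtuple locally (the real one lives in feast_spark.pyspark.launchers.k8s.k8s_utils) with the field order implied by the "Alias for field number N" notes; the sample values are hypothetical.

```python
from collections import namedtuple

# Field order follows the alias numbers documented above:
# 0 job_id, 1 job_type, 2 job_error_message, 3 namespace,
# 4 extra_metadata, 5 state, 6 labels, 7 start_time.
JobInfo = namedtuple(
    "JobInfo",
    ["job_id", "job_type", "job_error_message", "namespace",
     "extra_metadata", "state", "labels", "start_time"],
)

info = JobInfo(
    job_id="feast-123",
    job_type="BATCH_INGESTION",
    job_error_message="",
    namespace="spark-operator",
    extra_metadata={},
    state="COMPLETED",
    labels={"feast.dev/project": "default"},
    start_time=None,
)

# Positional access and named access are interchangeable.
same = info[0] == info.job_id
```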

Module contents

class feast_spark.pyspark.launchers.k8s.KubernetesBatchIngestionJob(api: CustomObjectsApi, namespace: str, job_id: str, feature_table: str)[source]

Bases: KubernetesJobMixin, BatchIngestionJob

Ingestion job result for a k8s cluster

get_feature_table() str[source]

Get the feature table name associated with this job. Returns an empty string if the feature table cannot be determined, such as when the job was created by an earlier version of Feast.

Returns

Feature table name

Return type

str

class feast_spark.pyspark.launchers.k8s.KubernetesJobLauncher(namespace: str, incluster: bool, staging_location: str, generic_resource_template_path: Optional[str], batch_ingestion_resource_template_path: Optional[str], stream_ingestion_resource_template_path: Optional[str], historical_retrieval_resource_template_path: Optional[str], staging_client: AbstractStagingClient)[source]

Bases: JobLauncher

Submits Spark jobs to a Spark cluster running on Kubernetes. Supports historical feature retrieval, batch ingestion, and stream ingestion jobs.

get_job_by_id(job_id: str) SparkJob[source]
historical_feature_retrieval(job_params: RetrievalJobParameters) RetrievalJob[source]

Submits a historical feature retrieval job to a Spark cluster.

Raises

SparkJobFailure – The Spark job submission failed, the job encountered an error during execution, or it timed out.

Returns

Wrapper around the remote job that returns the file URI of the result file.

Return type

RetrievalJob

list_jobs(include_terminated: bool, project: Optional[str] = None, table_name: Optional[str] = None) List[SparkJob][source]
offline_to_online_ingestion(ingestion_job_params: BatchIngestionJobParameters) BatchIngestionJob[source]

Submits a batch ingestion job to a Spark cluster.

Raises

SparkJobFailure – The Spark job submission failed, the job encountered an error during execution, or it timed out.

Returns

Wrapper around the remote job that can be used to check when the job has completed.

Return type

BatchIngestionJob

schedule_offline_to_online_ingestion(ingestion_job_params: ScheduledBatchIngestionJobParameters)[source]

Schedule a batch ingestion job using the Spark Operator.

Raises

SparkJobFailure – Failure to create the ScheduledSparkApplication resource, or timeout.

Returns

Wrapper around the remote job that can be used to check the job ID.

Return type

ScheduledBatchIngestionJob

start_stream_to_online_ingestion(ingestion_job_params: StreamIngestionJobParameters) StreamIngestionJob[source]

Starts a stream ingestion job on a Spark cluster.

Raises

SparkJobFailure – The Spark job submission failed, the job encountered an error during execution, or it timed out.

Returns

Wrapper around the remote job.

Return type

StreamIngestionJob

unschedule_offline_to_online_ingestion(project: str, feature_table: str)[source]

Unschedule a scheduled batch ingestion job.

class feast_spark.pyspark.launchers.k8s.KubernetesRetrievalJob(api: CustomObjectsApi, namespace: str, job_id: str, output_file_uri: str)[source]

Bases: KubernetesJobMixin, RetrievalJob

Historical feature retrieval job result for a k8s cluster

get_output_file_uri(timeout_sec=None, block=True)[source]

Get the output file URI of the result file. This method blocks until the job succeeds, or raises an exception if the job does not finish successfully within the timeout.

Parameters
  • timeout_sec (int) – Maximum number of seconds to wait until the job is done. If timeout_sec is exceeded or the job fails, an exception is raised.

  • block (bool) – If False, don’t block until the job is done. If block=True, the timeout parameter is ignored.

Raises

SparkJobFailure – The Spark job submission failed, the job encountered an error during execution, or it timed out.

Returns

File URI of the result file.

Return type

str

class feast_spark.pyspark.launchers.k8s.KubernetesStreamIngestionJob(api: CustomObjectsApi, namespace: str, job_id: str, job_hash: str, feature_table: str)[source]

Bases: KubernetesJobMixin, StreamIngestionJob

Streaming ingestion job for a k8s cluster

get_feature_table() str[source]

Get the feature table name associated with this job. Returns None if the feature table cannot be determined, such as when the job was created by an earlier version of Feast.

Returns

Feature table name

Return type

str

get_hash() str[source]

Gets the consistent hash of this stream ingestion job.

The hash needs to be persisted at the data processing layer, so that we can get the same hash when retrieving the job from Spark.

Returns

The hash for this streaming ingestion job

Return type

str