

glue R Documentation

AWS Glue

Description

Glue

Defines the public endpoint for the Glue service.

Usage

glue(config = list(), credentials = list(), endpoint = NULL, region = NULL)

Arguments

config

Optional configuration of credentials, endpoint, and/or region.

  • credentials:

    • creds:

      • access_key_id: AWS access key ID

      • secret_access_key: AWS secret access key

      • session_token: AWS temporary session token

    • profile: The name of a profile to use. If not given, then the default profile is used.

    • anonymous: Set anonymous credentials.

  • endpoint: The complete URL to use for the constructed client.

  • region: The AWS Region used in instantiating the client.

  • close_connection: Immediately close all HTTP connections.

  • timeout: The time in seconds until a timeout exception is thrown when attempting to make a connection. The default is 60 seconds.

  • s3_force_path_style: Set this to TRUE to force the request to use path-style addressing, i.e. http://s3.amazonaws.com/BUCKET/KEY.

  • sts_regional_endpoint: Set the STS regional endpoint resolver to "regional" or "legacy". See https://docs.aws.amazon.com/sdkref/latest/guide/feature-sts-regionalized-endpoints.html
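A minimal sketch of the nested config list, assuming a named profile called analytics-dev and the eu-west-1 Region (both hypothetical placeholders). The client is built inside a function so that nothing is constructed until you call it, and it is written as paws::glue() because the CRAN glue string-interpolation package also exports a function named glue().

```r
# Hypothetical profile name and Region; replace with your own values.
glue_config <- list(
  credentials = list(
    profile = "analytics-dev"  # named profile in ~/.aws/credentials or ~/.aws/config
  ),
  region = "eu-west-1",
  timeout = 30,                # seconds to wait when opening a connection
  close_connection = FALSE
)

# Build the client only when called; requires the paws package.
make_glue_client <- function(config = glue_config) {
  paws::glue(config = config)
}
```

Calling make_glue_client() returns a client configured exactly as if the same list had been passed to glue(config = ...) directly.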

credentials

Optional shorthand for the credentials entry of the config parameter.

  • creds:

    • access_key_id: AWS access key ID

    • secret_access_key: AWS secret access key

    • session_token: AWS temporary session token

  • profile: The name of a profile to use. If not given, then the default profile is used.

  • anonymous: Set anonymous credentials.

endpoint

Optional shorthand for complete URL to use for the constructed client.

region

Optional shorthand for AWS Region used in instantiating the client.
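The shorthand arguments fill the same slots as the corresponding config entries. As a sketch, assuming credentials are exported in the standard AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and (for temporary credentials) AWS_SESSION_TOKEN environment variables, the helper below builds the creds structure explicitly. creds_from_env() and make_glue_client_shorthand() are hypothetical names; paws can normally resolve these variables on its own, so this mainly illustrates the expected shape.

```r
# Collect credentials from the standard AWS environment variables into the
# `creds` structure accepted by the `credentials` argument.
creds_from_env <- function() {
  creds <- list(
    access_key_id = Sys.getenv("AWS_ACCESS_KEY_ID"),
    secret_access_key = Sys.getenv("AWS_SECRET_ACCESS_KEY")
  )
  token <- Sys.getenv("AWS_SESSION_TOKEN")
  if (nzchar(token)) creds$session_token <- token  # only set for temporary credentials
  list(creds = creds)
}

# Shorthand form: equivalent to nesting credentials and region under `config`.
make_glue_client_shorthand <- function(region = "eu-west-1") {
  paws::glue(credentials = creds_from_env(), region = region)
}
```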

Value

A client for the service. You can call the service's operations using syntax like svc$operation(...), where svc is the name you've assigned to the client. The available operations are listed in the Operations section.
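Operations return plain nested R lists whose element names follow the Glue API response fields. As a sketch, the hypothetical helper below extracts database names from get_databases(). It takes the client as an argument, so it can also be exercised with a stand-in list of functions instead of a live client. It reads only the first page of results; NextToken handling is omitted.

```r
# Return the names of the databases in the first page of get_databases().
# `svc` is a client created with paws::glue(), or any list that provides a
# compatible get_databases() function.
database_names <- function(svc) {
  resp <- svc$get_databases()
  vapply(resp$DatabaseList, function(db) db$Name, character(1))
}

# Usage (needs AWS credentials):
# svc <- paws::glue(region = "eu-west-1")
# database_names(svc)
```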

Service syntax

svc <- glue(
  config = list(
    credentials = list(
      creds = list(
        access_key_id = "string",
        secret_access_key = "string",
        session_token = "string"
      ),
      profile = "string",
      anonymous = "logical"
    ),
    endpoint = "string",
    region = "string",
    close_connection = "logical",
    timeout = "numeric",
    s3_force_path_style = "logical",
    sts_regional_endpoint = "string"
  ),
  credentials = list(
    creds = list(
      access_key_id = "string",
      secret_access_key = "string",
      session_token = "string"
    ),
    profile = "string",
    anonymous = "logical"
  ),
  endpoint = "string",
  region = "string"
)

Operations

batch_create_partition
Creates one or more partitions in a batch operation
batch_delete_connection
Deletes a list of connection definitions from the Data Catalog
batch_delete_partition
Deletes one or more partitions in a batch operation
batch_delete_table
Deletes multiple tables at once
batch_delete_table_version
Deletes a specified batch of versions of a table
batch_get_blueprints
Retrieves information about a list of blueprints
batch_get_crawlers
Returns a list of resource metadata for a given list of crawler names
batch_get_custom_entity_types
Retrieves the details for the custom patterns specified by a list of names
batch_get_data_quality_result
Retrieves a list of data quality results for the specified result IDs
batch_get_dev_endpoints
Returns a list of resource metadata for a given list of development endpoint names
batch_get_jobs
Returns a list of resource metadata for a given list of job names
batch_get_partition
Retrieves partitions in a batch request
batch_get_table_optimizer
Returns the configuration for the specified table optimizers
batch_get_triggers
Returns a list of resource metadata for a given list of trigger names
batch_get_workflows
Returns a list of resource metadata for a given list of workflow names
batch_put_data_quality_statistic_annotation
Annotates datapoints over time for a specific data quality statistic
batch_stop_job_run
Stops one or more job runs for a specified job definition
batch_update_partition
Updates one or more partitions in a batch operation
cancel_data_quality_rule_recommendation_run
Cancels the specified recommendation run that was being used to generate rules
cancel_data_quality_ruleset_evaluation_run
Cancels a run where a ruleset is being evaluated against a data source
cancel_ml_task_run
Cancels (stops) a task run
cancel_statement
Cancels the statement
check_schema_version_validity
Validates the supplied schema
create_blueprint
Registers a blueprint with Glue
create_classifier
Creates a classifier in the user's account
create_connection
Creates a connection definition in the Data Catalog
create_crawler
Creates a new crawler with specified targets, role, configuration, and optional schedule
create_custom_entity_type
Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data
create_database
Creates a new database in a Data Catalog
create_data_quality_ruleset
Creates a data quality ruleset with DQDL rules applied to a specified Glue table
create_dev_endpoint
Creates a new development endpoint
create_job
Creates a new job definition
create_ml_transform
Creates a Glue machine learning transform
create_partition
Creates a new partition
create_partition_index
Creates a specified partition index in an existing table
create_registry
Creates a new registry which may be used to hold a collection of schemas
create_schema
Creates a new schema set and registers the schema definition
create_script
Transforms a directed acyclic graph (DAG) into code
create_security_configuration
Creates a new security configuration
create_session
Creates a new session
create_table
Creates a new table definition in the Data Catalog
create_table_optimizer
Creates a new table optimizer for a specific function
create_trigger
Creates a new trigger
create_usage_profile
Creates a Glue usage profile
create_user_defined_function
Creates a new function definition in the Data Catalog
create_workflow
Creates a new workflow
delete_blueprint
Deletes an existing blueprint
delete_classifier
Removes a classifier from the Data Catalog
delete_column_statistics_for_partition
Deletes the partition column statistics of a column
delete_column_statistics_for_table
Deletes table statistics of columns
delete_connection
Deletes a connection from the Data Catalog
delete_crawler
Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING
delete_custom_entity_type
Deletes a custom pattern by specifying its name
delete_database
Removes a specified database from a Data Catalog
delete_data_quality_ruleset
Deletes a data quality ruleset
delete_dev_endpoint
Deletes a specified development endpoint
delete_job
Deletes a specified job definition
delete_ml_transform
Deletes a Glue machine learning transform
delete_partition
Deletes a specified partition
delete_partition_index
Deletes a specified partition index from an existing table
delete_registry
Deletes the entire registry, including its schemas and all of their versions
delete_resource_policy
Deletes a specified policy
delete_schema
Deletes the entire schema set, including all of its versions
delete_schema_versions
Removes versions from the specified schema
delete_security_configuration
Deletes a specified security configuration
delete_session
Deletes the session
delete_table
Removes a table definition from the Data Catalog
delete_table_optimizer
Deletes an optimizer and all associated metadata for a table
delete_table_version
Deletes a specified version of a table
delete_trigger
Deletes a specified trigger
delete_usage_profile
Deletes the specified Glue usage profile
delete_user_defined_function
Deletes an existing function definition from the Data Catalog
delete_workflow
Deletes a workflow
get_blueprint
Retrieves the details of a blueprint
get_blueprint_run
Retrieves the details of a blueprint run
get_blueprint_runs
Retrieves the details of blueprint runs for a specified blueprint
get_catalog_import_status
Retrieves the status of a migration operation
get_classifier
Retrieves a classifier by name
get_classifiers
Lists all classifier objects in the Data Catalog
get_column_statistics_for_partition
Retrieves partition statistics of columns
get_column_statistics_for_table
Retrieves table statistics of columns
get_column_statistics_task_run
Gets the associated metadata for a task run, given a task run ID
get_column_statistics_task_runs
Retrieves information about all runs associated with the specified table
get_connection
Retrieves a connection definition from the Data Catalog
get_connections
Retrieves a list of connection definitions from the Data Catalog
get_crawler
Retrieves metadata for a specified crawler
get_crawler_metrics
Retrieves metrics about specified crawlers
get_crawlers
Retrieves metadata for all crawlers defined in the customer account
get_custom_entity_type
Retrieves the details of a custom pattern by specifying its name
get_database
Retrieves the definition of a specified database
get_databases
Retrieves all databases defined in a given Data Catalog
get_data_catalog_encryption_settings
Retrieves the security configuration for a specified catalog
get_dataflow_graph
Transforms a Python script into a directed acyclic graph (DAG)
get_data_quality_model
Retrieves the training status of the model along with more information (CompletedOn, StartedOn, FailureReason)
get_data_quality_model_result
Retrieves a statistic's predictions for a given Profile ID
get_data_quality_result
Retrieves the result of a data quality rule evaluation
get_data_quality_rule_recommendation_run
Gets the specified recommendation run that was used to generate rules
get_data_quality_ruleset
Returns an existing ruleset by identifier or name
get_data_quality_ruleset_evaluation_run
Retrieves a specific run where a ruleset is evaluated against a data source
get_dev_endpoint
Retrieves information about a specified development endpoint
get_dev_endpoints
Retrieves all the development endpoints in this Amazon Web Services account
get_job
Retrieves an existing job definition
get_job_bookmark
Returns information on a job bookmark entry
get_job_run
Retrieves the metadata for a given job run
get_job_runs
Retrieves metadata for all runs of a given job definition
get_jobs
Retrieves all current job definitions
get_mapping
Creates mappings
get_ml_task_run
Gets details for a specific task run on a machine learning transform
get_ml_task_runs
Gets a list of runs for a machine learning transform
get_ml_transform
Gets a Glue machine learning transform artifact and all its corresponding metadata
get_ml_transforms
Gets a sortable, filterable list of existing Glue machine learning transforms
get_partition
Retrieves information about a specified partition
get_partition_indexes
Retrieves the partition indexes associated with a table
get_partitions
Retrieves information about the partitions in a table
get_plan
Gets code to perform a specified mapping
get_registry
Describes the specified registry in detail
get_resource_policies
Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants
get_resource_policy
Retrieves a specified resource policy
get_schema
Describes the specified schema in detail
get_schema_by_definition
Retrieves a schema by the SchemaDefinition
get_schema_version
Gets the specified schema by the unique ID assigned when a version of the schema is created or registered
get_schema_versions_diff
Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry
get_security_configuration
Retrieves a specified security configuration
get_security_configurations
Retrieves a list of all security configurations
get_session
Retrieves the session
get_statement
Retrieves the statement
get_table
Retrieves the Table definition in a Data Catalog for a specified table
get_table_optimizer
Returns the configuration of all optimizers associated with a specified table
get_tables
Retrieves the definitions of some or all of the tables in a given Database
get_table_version
Retrieves a specified version of a table
get_table_versions
Retrieves a list of strings that identify available versions of a specified table
get_tags
Retrieves a list of tags associated with a resource
get_trigger
Retrieves the definition of a trigger
get_triggers
Gets all the triggers associated with a job
get_unfiltered_partition_metadata
Retrieves partition metadata from the Data Catalog that contains unfiltered metadata
get_unfiltered_partitions_metadata
Retrieves partition metadata from the Data Catalog that contains unfiltered metadata
get_unfiltered_table_metadata
Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog
get_usage_profile
Retrieves information about the specified Glue usage profile
get_user_defined_function
Retrieves a specified function definition from the Data Catalog
get_user_defined_functions
Retrieves multiple function definitions from the Data Catalog
get_workflow
Retrieves resource metadata for a workflow
get_workflow_run
Retrieves the metadata for a given workflow run
get_workflow_run_properties
Retrieves the workflow run properties which were set during the run
get_workflow_runs
Retrieves metadata for all runs of a given workflow
import_catalog_to_glue
Imports an existing Amazon Athena Data Catalog to Glue
list_blueprints
Lists all the blueprint names in an account
list_column_statistics_task_runs
Lists all task runs for a particular account
list_crawlers
Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag
list_crawls
Returns all the crawls of a specified crawler
list_custom_entity_types
Lists all the custom patterns that have been created
list_data_quality_results
Returns all data quality execution results for your account
list_data_quality_rule_recommendation_runs
Lists the recommendation runs meeting the filter criteria
list_data_quality_ruleset_evaluation_runs
Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source
list_data_quality_rulesets
Returns a paginated list of rulesets for the specified list of Glue tables
list_data_quality_statistic_annotations
Retrieves annotations for a data quality statistic
list_data_quality_statistics
Retrieves a list of data quality statistics
list_dev_endpoints
Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag
list_jobs
Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag
list_ml_transforms
Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag
list_registries
Returns a list of registries that you have created, with minimal registry information
list_schemas
Returns a list of schemas with minimal details
list_schema_versions
Returns a list of schema versions that you have created, with minimal information
list_sessions
Retrieves a list of sessions
list_statements
Lists statements for the session
list_table_optimizer_runs
Lists the history of previous optimizer runs for a specific table
list_triggers
Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag
list_usage_profiles
Lists all the Glue usage profiles
list_workflows
Lists names of workflows created in the account
put_data_catalog_encryption_settings
Sets the security configuration for a specified catalog
put_data_quality_profile_annotation
Annotates all datapoints for a Profile
put_resource_policy
Sets the Data Catalog resource policy for access control
put_schema_version_metadata
Puts the metadata key value pair for a specified schema version ID
put_workflow_run_properties
Puts the specified workflow run properties for the given workflow run
query_schema_version_metadata
Queries for the schema version metadata information
register_schema_version
Adds a new version to the existing schema
remove_schema_version_metadata
Removes a key value pair from the schema version metadata for the specified schema version ID
reset_job_bookmark
Resets a bookmark entry
resume_workflow_run
Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run
run_statement
Executes the statement
search_tables
Searches a set of tables based on properties in the table metadata as well as on the parent database
start_blueprint_run
Starts a new run of the specified blueprint
start_column_statistics_task_run
Starts a column statistics task run, for a specified table and columns
start_crawler
Starts a crawl using the specified crawler, regardless of what is scheduled
start_crawler_schedule
Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED
start_data_quality_rule_recommendation_run
Starts a recommendation run that is used to generate rules when you don't know what rules to write
start_data_quality_ruleset_evaluation_run
Evaluates a ruleset definition (either recommended or your own) against a data source (Glue table)
start_export_labels_task_run
Begins an asynchronous task to export all labeled data for a particular transform
start_import_labels_task_run
Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality
start_job_run
Starts a job run using a job definition
start_ml_evaluation_task_run
Starts a task to estimate the quality of the transform
start_ml_labeling_set_generation_task_run
Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels
start_trigger
Starts an existing trigger
start_workflow_run
Starts a new run of the specified workflow
stop_column_statistics_task_run
Stops a task run for the specified table
stop_crawler
If the specified crawler is running, stops the crawl
stop_crawler_schedule
Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running
stop_session
Stops the session
stop_trigger
Stops a specified trigger
stop_workflow_run
Stops the execution of the specified workflow run
tag_resource
Adds tags to a resource
untag_resource
Removes tags from a resource
update_blueprint
Updates a registered blueprint
update_classifier
Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present)
update_column_statistics_for_partition
Creates or updates partition statistics of columns
update_column_statistics_for_table
Creates or updates table statistics of columns
update_connection
Updates a connection definition in the Data Catalog
update_crawler
Updates a crawler
update_crawler_schedule
Updates the schedule of a crawler using a cron expression
update_database
Updates an existing database definition in a Data Catalog
update_data_quality_ruleset
Updates the specified data quality ruleset
update_dev_endpoint
Updates a specified development endpoint
update_job
Updates an existing job definition
update_job_from_source_control
Synchronizes a job from the source control repository
update_ml_transform
Updates an existing machine learning transform
update_partition
Updates a partition
update_registry
Updates an existing registry which is used to hold a collection of schemas
update_schema
Updates the description, compatibility setting, or version checkpoint for a schema set
update_source_control_from_job
Synchronizes a job to the source control repository
update_table
Updates a metadata table in the Data Catalog
update_table_optimizer
Updates the configuration for an existing table optimizer
update_trigger
Updates a trigger definition
update_usage_profile
Updates a Glue usage profile
update_user_defined_function
Updates an existing function definition in the Data Catalog
update_workflow
Updates an existing workflow
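List operations such as get_tables return results one page at a time. A sketch of following NextToken until it runs out; all_tables() is a hypothetical helper, and because the exact empty value of a missing NextToken may differ between paws versions, NULL, character(0), and "" are all treated as the last page.

```r
# Collect every table in a database by following NextToken across pages.
# `svc` is a client from paws::glue() (or a compatible stand-in).
all_tables <- function(svc, database_name) {
  tables <- list()
  token <- NULL  # NULL on the first call means "start from the beginning"
  repeat {
    resp <- svc$get_tables(DatabaseName = database_name, NextToken = token)
    tables <- c(tables, resp$TableList)
    token <- resp$NextToken
    # Stop when no further page is indicated.
    if (length(token) == 0 || !nzchar(token)) break
  }
  tables
}

# Usage (needs AWS credentials; "sales" is a hypothetical database):
# svc <- paws::glue(region = "eu-west-1")
# vapply(all_tables(svc, "sales"), function(t) t$Name, character(1))
```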

Examples

## Not run: 
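svc <- glue()
# Hypothetical database, table, and S3 location; replace with your own.
svc$batch_create_partition(
  DatabaseName = "my_database",
  TableName = "my_table",
  PartitionInputList = list(
    list(Values = list("2024"),
         StorageDescriptor = list(Location = "s3://my-bucket/my_table/year=2024/"))
  )
)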

## End(Not run)
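Job runs are asynchronous: start_job_run returns a JobRunId, and get_job_run reports the run's JobRunState. A polling sketch, assuming the terminal states SUCCEEDED, FAILED, STOPPED, TIMEOUT, ERROR, and EXPIRED; run_glue_job() and its poll interval and limit are hypothetical choices, not part of the package.

```r
# Start a Glue job and poll get_job_run() until it reaches a terminal state.
# poll_seconds and max_polls are illustrative defaults, not service limits.
run_glue_job <- function(svc, job_name, poll_seconds = 30, max_polls = 120) {
  run_id <- svc$start_job_run(JobName = job_name)$JobRunId
  terminal <- c("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED")
  for (i in seq_len(max_polls)) {
    state <- svc$get_job_run(JobName = job_name, RunId = run_id)$JobRun$JobRunState
    if (state %in% terminal) return(list(run_id = run_id, state = state))
    Sys.sleep(poll_seconds)  # wait before asking again
  }
  stop("Job run ", run_id, " did not finish within ", max_polls, " polls")
}

# Usage (needs AWS credentials; "nightly-etl" is a hypothetical job name):
# svc <- paws::glue(region = "eu-west-1")
# run_glue_job(svc, "nightly-etl")
```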