Client

glue	R Documentation

AWS Glue

Description

Glue

Defines the public endpoint for the Glue service.

Usage

glue(config = list(), credentials = list(), endpoint = NULL, region = NULL)

Arguments

config

Optional configuration of credentials, endpoint, and/or region.

- credentials:
  - creds:
    - access_key_id: AWS access key ID
    - secret_access_key: AWS secret access key
    - session_token: AWS temporary session token
  - profile: The name of a profile to use. If not given, then the default profile is used.
  - anonymous: Set anonymous credentials.
- endpoint: The complete URL to use for the constructed client.
- region: The AWS Region used in instantiating the client.
- close_connection: Immediately close all HTTP connections.
- timeout: The time in seconds until a timeout exception is thrown when attempting to make a connection. The default is 60 seconds.
- s3_force_path_style: Set this to true to force the request to use path-style addressing, i.e. http://s3.amazonaws.com/BUCKET/KEY.
- sts_regional_endpoint: Set the STS regional endpoint resolver to regional or legacy. See https://docs.aws.amazon.com/sdkref/latest/guide/feature-sts-regionalized-endpoints.html
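As a minimal sketch of how the config fields above might be assembled, the following R snippet builds a config list and, if the paws package is installed, passes it to glue(). The Region and timeout values are illustrative assumptions, not recommendations.

```r
# Assemble the `config` list; every value here is an illustrative placeholder.
glue_config <- list(
  region = "us-east-1",      # example Region, chosen arbitrarily
  timeout = 30,              # seconds; overrides the documented 60-second default
  close_connection = TRUE    # close HTTP connections immediately
)

# Construct the client only when paws is installed, so the sketch
# can be sourced on machines without it.
if (requireNamespace("paws", quietly = TRUE)) {
  svc <- paws::glue(config = glue_config)
}
```

Fields left out of the list keep their defaults, so only the settings that differ from the defaults need to be supplied.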

credentials

Optional credentials shorthand for the config parameter

- creds:
  - access_key_id: AWS access key ID
  - secret_access_key: AWS secret access key
  - session_token: AWS temporary session token
- profile: The name of a profile to use. If not given, then the default profile is used.
- anonymous: Set anonymous credentials.

endpoint

Optional shorthand for complete URL to use for the constructed client.

region

Optional shorthand for AWS Region used in instantiating the client.
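The shorthand arguments can be illustrated with a short sketch. It assumes, based on the argument descriptions above, that passing credentials, endpoint, and region directly should configure the client the same way as nesting them under config. The profile name and the local endpoint URL are hypothetical.

```r
# Hypothetical values: a local emulator URL and a named profile.
local_endpoint <- "http://localhost:4566"
shorthand_args <- list(
  credentials = list(profile = "dev"),
  endpoint = local_endpoint,
  region = "us-east-1"
)

# The same settings expressed through the `config` argument.
config_args <- list(config = shorthand_args)

# Build both clients only when paws is installed.
if (requireNamespace("paws", quietly = TRUE)) {
  svc_short <- do.call(paws::glue, shorthand_args)
  svc_long <- do.call(paws::glue, config_args)
}
```

The shorthand form is mostly a convenience for the common case where only a profile, an endpoint, or a Region needs to change.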

Value

A client for the service. You can call the service's operations using syntax like svc$operation(...), where svc is the name you've assigned to the client. The available operations are listed in the Operations section.
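To make the svc$operation(...) syntax concrete, here is a small sketch that calls get_databases and extracts the database names. The helper name database_names is hypothetical, and the response fields DatabaseList and Name are assumed to mirror the Glue GetDatabases API response.

```r
# Return the names of the databases in the first page of results.
# `svc` is any client created with glue().
database_names <- function(svc) {
  resp <- svc$get_databases()
  # Each element of DatabaseList is assumed to be a list with a Name field.
  vapply(resp$DatabaseList, function(db) db$Name, character(1))
}

# Usage (needs AWS credentials, so it is left commented out):
# svc <- paws::glue()
# database_names(svc)
```

Taking the client as an argument keeps the helper easy to exercise with a stand-in list of functions when no AWS account is available.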

Service syntax

svc <- glue(
  config = list(
    credentials = list(
      creds = list(
        access_key_id = "string",
        secret_access_key = "string",
        session_token = "string"
      ),
      profile = "string",
      anonymous = "logical"
    ),
    endpoint = "string",
    region = "string",
    close_connection = "logical",
    timeout = "numeric",
    s3_force_path_style = "logical",
    sts_regional_endpoint = "string"
  ),
  credentials = list(
    creds = list(
      access_key_id = "string",
      secret_access_key = "string",
      session_token = "string"
    ),
    profile = "string",
    anonymous = "logical"
  ),
  endpoint = "string",
  region = "string"
)

Operations

batch_create_partition: Creates one or more partitions in a batch operation
batch_delete_connection: Deletes a list of connection definitions from the Data Catalog
batch_delete_partition: Deletes one or more partitions in a batch operation
batch_delete_table: Deletes multiple tables at once
batch_delete_table_version: Deletes a specified batch of versions of a table
batch_get_blueprints: Retrieves information about a list of blueprints
batch_get_crawlers: Returns a list of resource metadata for a given list of crawler names
batch_get_custom_entity_types: Retrieves the details for the custom patterns specified by a list of names
batch_get_data_quality_result: Retrieves a list of data quality results for the specified result IDs
batch_get_dev_endpoints: Returns a list of resource metadata for a given list of development endpoint names
batch_get_jobs: Returns a list of resource metadata for a given list of job names
batch_get_partition: Retrieves partitions in a batch request
batch_get_table_optimizer: Returns the configuration for the specified table optimizers
batch_get_triggers: Returns a list of resource metadata for a given list of trigger names
batch_get_workflows: Returns a list of resource metadata for a given list of workflow names
batch_stop_job_run: Stops one or more job runs for a specified job definition
batch_update_partition: Updates one or more partitions in a batch operation
cancel_data_quality_rule_recommendation_run: Cancels the specified recommendation run that was being used to generate rules
cancel_data_quality_ruleset_evaluation_run: Cancels a run where a ruleset is being evaluated against a data source
cancel_ml_task_run: Cancels (stops) a task run
cancel_statement: Cancels the statement
check_schema_version_validity: Validates the supplied schema
create_blueprint: Registers a blueprint with Glue
create_classifier: Creates a classifier in the user's account
create_connection: Creates a connection definition in the Data Catalog
create_crawler: Creates a new crawler with specified targets, role, configuration, and optional schedule
create_custom_entity_type: Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data
create_database: Creates a new database in a Data Catalog
create_data_quality_ruleset: Creates a data quality ruleset with DQDL rules applied to a specified Glue table
create_dev_endpoint: Creates a new development endpoint
create_job: Creates a new job definition
create_ml_transform: Creates a Glue machine learning transform
create_partition: Creates a new partition
create_partition_index: Creates a specified partition index in an existing table
create_registry: Creates a new registry which may be used to hold a collection of schemas
create_schema: Creates a new schema set and registers the schema definition
create_script: Transforms a directed acyclic graph (DAG) into code
create_security_configuration: Creates a new security configuration
create_session: Creates a new session
create_table: Creates a new table definition in the Data Catalog
create_table_optimizer: Creates a new table optimizer for a specific function
create_trigger: Creates a new trigger
create_user_defined_function: Creates a new function definition in the Data Catalog
create_workflow: Creates a new workflow
delete_blueprint: Deletes an existing blueprint
delete_classifier: Removes a classifier from the Data Catalog
delete_column_statistics_for_partition: Deletes the partition column statistics of a column
delete_column_statistics_for_table: Deletes table statistics of columns
delete_connection: Deletes a connection from the Data Catalog
delete_crawler: Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING
delete_custom_entity_type: Deletes a custom pattern by specifying its name
delete_database: Removes a specified database from a Data Catalog
delete_data_quality_ruleset: Deletes a data quality ruleset
delete_dev_endpoint: Deletes a specified development endpoint
delete_job: Deletes a specified job definition
delete_ml_transform: Deletes a Glue machine learning transform
delete_partition: Deletes a specified partition
delete_partition_index: Deletes a specified partition index from an existing table
delete_registry: Delete the entire registry including schema and all of its versions
delete_resource_policy: Deletes a specified policy
delete_schema: Deletes the entire schema set, including the schema set and all of its versions
delete_schema_versions: Remove versions from the specified schema
delete_security_configuration: Deletes a specified security configuration
delete_session: Deletes the session
delete_table: Removes a table definition from the Data Catalog
delete_table_optimizer: Deletes an optimizer and all associated metadata for a table
delete_table_version: Deletes a specified version of a table
delete_trigger: Deletes a specified trigger
delete_user_defined_function: Deletes an existing function definition from the Data Catalog
delete_workflow: Deletes a workflow
get_blueprint: Retrieves the details of a blueprint
get_blueprint_run: Retrieves the details of a blueprint run
get_blueprint_runs: Retrieves the details of blueprint runs for a specified blueprint
get_catalog_import_status: Retrieves the status of a migration operation
get_classifier: Retrieve a classifier by name
get_classifiers: Lists all classifier objects in the Data Catalog
get_column_statistics_for_partition: Retrieves partition statistics of columns
get_column_statistics_for_table: Retrieves table statistics of columns
get_column_statistics_task_run: Get the associated metadata/information for a task run, given a task run ID
get_column_statistics_task_runs: Retrieves information about all runs associated with the specified table
get_connection: Retrieves a connection definition from the Data Catalog
get_connections: Retrieves a list of connection definitions from the Data Catalog
get_crawler: Retrieves metadata for a specified crawler
get_crawler_metrics: Retrieves metrics about specified crawlers
get_crawlers: Retrieves metadata for all crawlers defined in the customer account
get_custom_entity_type: Retrieves the details of a custom pattern by specifying its name
get_database: Retrieves the definition of a specified database
get_databases: Retrieves all databases defined in a given Data Catalog
get_data_catalog_encryption_settings: Retrieves the security configuration for a specified catalog
get_dataflow_graph: Transforms a Python script into a directed acyclic graph (DAG)
get_data_quality_result: Retrieves the result of a data quality rule evaluation
get_data_quality_rule_recommendation_run: Gets the specified recommendation run that was used to generate rules
get_data_quality_ruleset: Returns an existing ruleset by identifier or name
get_data_quality_ruleset_evaluation_run: Retrieves a specific run where a ruleset is evaluated against a data source
get_dev_endpoint: Retrieves information about a specified development endpoint
get_dev_endpoints: Retrieves all the development endpoints in this Amazon Web Services account
get_job: Retrieves an existing job definition
get_job_bookmark: Returns information on a job bookmark entry
get_job_run: Retrieves the metadata for a given job run
get_job_runs: Retrieves metadata for all runs of a given job definition
get_jobs: Retrieves all current job definitions
get_mapping: Creates mappings
get_ml_task_run: Gets details for a specific task run on a machine learning transform
get_ml_task_runs: Gets a list of runs for a machine learning transform
get_ml_transform: Gets a Glue machine learning transform artifact and all its corresponding metadata
get_ml_transforms: Gets a sortable, filterable list of existing Glue machine learning transforms
get_partition: Retrieves information about a specified partition
get_partition_indexes: Retrieves the partition indexes associated with a table
get_partitions: Retrieves information about the partitions in a table
get_plan: Gets code to perform a specified mapping
get_registry: Describes the specified registry in detail
get_resource_policies: Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants
get_resource_policy: Retrieves a specified resource policy
get_schema: Describes the specified schema in detail
get_schema_by_definition: Retrieves a schema by the SchemaDefinition
get_schema_version: Get the specified schema by its unique ID assigned when a version of the schema is created or registered
get_schema_versions_diff: Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry
get_security_configuration: Retrieves a specified security configuration
get_security_configurations: Retrieves a list of all security configurations
get_session: Retrieves the session
get_statement: Retrieves the statement
get_table: Retrieves the Table definition in a Data Catalog for a specified table
get_table_optimizer: Returns the configuration of all optimizers associated with a specified table
get_tables: Retrieves the definitions of some or all of the tables in a given Database
get_table_version: Retrieves a specified version of a table
get_table_versions: Retrieves a list of strings that identify available versions of a specified table
get_tags: Retrieves a list of tags associated with a resource
get_trigger: Retrieves the definition of a trigger
get_triggers: Gets all the triggers associated with a job
get_unfiltered_partition_metadata: Retrieves partition metadata from the Data Catalog that contains unfiltered metadata
get_unfiltered_partitions_metadata: Retrieves partition metadata from the Data Catalog that contains unfiltered metadata
get_unfiltered_table_metadata: Retrieves table metadata from the Data Catalog that contains unfiltered metadata
get_user_defined_function: Retrieves a specified function definition from the Data Catalog
get_user_defined_functions: Retrieves multiple function definitions from the Data Catalog
get_workflow: Retrieves resource metadata for a workflow
get_workflow_run: Retrieves the metadata for a given workflow run
get_workflow_run_properties: Retrieves the workflow run properties which were set during the run
get_workflow_runs: Retrieves metadata for all runs of a given workflow
import_catalog_to_glue: Imports an existing Amazon Athena Data Catalog to Glue
list_blueprints: Lists all the blueprint names in an account
list_column_statistics_task_runs: List all task runs for a particular account
list_crawlers: Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag
list_crawls: Returns all the crawls of a specified crawler
list_custom_entity_types: Lists all the custom patterns that have been created
list_data_quality_results: Returns all data quality execution results for your account
list_data_quality_rule_recommendation_runs: Lists the recommendation runs meeting the filter criteria
list_data_quality_ruleset_evaluation_runs: Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source
list_data_quality_rulesets: Returns a paginated list of rulesets for the specified list of Glue tables
list_dev_endpoints: Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag
list_jobs: Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag
list_ml_transforms: Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag
list_registries: Returns a list of registries that you have created, with minimal registry information
list_schemas: Returns a list of schemas with minimal details
list_schema_versions: Returns a list of schema versions that you have created, with minimal information
list_sessions: Retrieve a list of sessions
list_statements: Lists statements for the session
list_table_optimizer_runs: Lists the history of previous optimizer runs for a specific table
list_triggers: Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag
list_workflows: Lists names of workflows created in the account
put_data_catalog_encryption_settings: Sets the security configuration for a specified catalog
put_resource_policy: Sets the Data Catalog resource policy for access control
put_schema_version_metadata: Puts the metadata key value pair for a specified schema version ID
put_workflow_run_properties: Puts the specified workflow run properties for the given workflow run
query_schema_version_metadata: Queries for the schema version metadata information
register_schema_version: Adds a new version to the existing schema
remove_schema_version_metadata: Removes a key value pair from the schema version metadata for the specified schema version ID
reset_job_bookmark: Resets a bookmark entry
resume_workflow_run: Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run
run_statement: Executes the statement
search_tables: Searches a set of tables based on properties in the table metadata as well as on the parent database
start_blueprint_run: Starts a new run of the specified blueprint
start_column_statistics_task_run: Starts a column statistics task run, for a specified table and columns
start_crawler: Starts a crawl using the specified crawler, regardless of what is scheduled
start_crawler_schedule: Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED
start_data_quality_rule_recommendation_run: Starts a recommendation run that is used to generate rules when you don't know what rules to write
start_data_quality_ruleset_evaluation_run: Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table)
start_export_labels_task_run: Begins an asynchronous task to export all labeled data for a particular transform
start_import_labels_task_run: Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality
start_job_run: Starts a job run using a job definition
start_ml_evaluation_task_run: Starts a task to estimate the quality of the transform
start_ml_labeling_set_generation_task_run: Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels
start_trigger: Starts an existing trigger
start_workflow_run: Starts a new run of the specified workflow
stop_column_statistics_task_run: Stops a task run for the specified table
stop_crawler: If the specified crawler is running, stops the crawl
stop_crawler_schedule: Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running
stop_session: Stops the session
stop_trigger: Stops a specified trigger
stop_workflow_run: Stops the execution of the specified workflow run
tag_resource: Adds tags to a resource
untag_resource: Removes tags from a resource
update_blueprint: Updates a registered blueprint
update_classifier: Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present)
update_column_statistics_for_partition: Creates or updates partition statistics of columns
update_column_statistics_for_table: Creates or updates table statistics of columns
update_connection: Updates a connection definition in the Data Catalog
update_crawler: Updates a crawler
update_crawler_schedule: Updates the schedule of a crawler using a cron expression
update_database: Updates an existing database definition in a Data Catalog
update_data_quality_ruleset: Updates the specified data quality ruleset
update_dev_endpoint: Updates a specified development endpoint
update_job: Updates an existing job definition
update_job_from_source_control: Synchronizes a job from the source control repository
update_ml_transform: Updates an existing machine learning transform
update_partition: Updates a partition
update_registry: Updates an existing registry which is used to hold a collection of schemas
update_schema: Updates the description, compatibility setting, or version checkpoint for a schema set
update_source_control_from_job: Synchronizes a job to the source control repository
update_table: Updates a metadata table in the Data Catalog
update_table_optimizer: Updates the configuration for an existing table optimizer
update_trigger: Updates a trigger definition
update_user_defined_function: Updates an existing function definition in the Data Catalog
update_workflow: Updates an existing workflow
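Several of the get_* and list_* operations return results one page at a time. The sketch below shows one way to collect every table name from get_tables by following the continuation token. The helper name list_all_tables is hypothetical, and the DatabaseName, NextToken, TableList, and Name fields are assumed to follow the Glue GetTables API.

```r
# Collect the names of all tables in a database, following NextToken
# until the service stops returning one.
list_all_tables <- function(svc, database) {
  table_names <- character(0)
  token <- NULL
  repeat {
    resp <- svc$get_tables(DatabaseName = database, NextToken = token)
    page <- vapply(resp$TableList, function(tbl) tbl$Name, character(1))
    table_names <- c(table_names, page)
    token <- resp$NextToken
    # An absent token may arrive as NULL, character(0), or "".
    if (length(token) == 0 || !nzchar(token)) break
  }
  table_names
}

# Usage (needs AWS credentials; "sales_db" is a hypothetical database name):
# svc <- paws::glue()
# list_all_tables(svc, "sales_db")
```

The same loop shape should apply to other paginated operations, with the list field and item name swapped for the ones that operation returns.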

Examples

## Not run: 
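svc <- glue()
# `Foo = 123` is a placeholder, not a real parameter;
# replace it with the operation's actual arguments.
svc$batch_create_partition(
  Foo = 123
)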

## End(Not run)
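For a more concrete picture of a batch_create_partition call, the following sketch uses the DatabaseName, TableName, and PartitionInputList parameters of the Glue BatchCreatePartition API. The helper names partition_input and create_partitions, and all database, table, and S3 names, are hypothetical. Only Values and a storage location are set, so treat this as a starting point, not a complete partition definition.

```r
# Build one PartitionInput entry. Only Values and a storage location are
# set here; real tables usually need a fuller StorageDescriptor.
partition_input <- function(values, location) {
  list(
    Values = as.list(as.character(values)),
    StorageDescriptor = list(Location = location)
  )
}

# Create several partitions in one call and return any per-partition errors.
create_partitions <- function(svc, database, table, inputs) {
  resp <- svc$batch_create_partition(
    DatabaseName = database,
    TableName = table,
    PartitionInputList = inputs
  )
  resp$Errors
}

# Usage (needs AWS credentials; all names below are hypothetical):
# svc <- paws::glue()
# inputs <- list(partition_input(c(2024, 1), "s3://example-bucket/y=2024/m=1/"))
# create_partitions(svc, "sales_db", "orders", inputs)
```

Returning the Errors field lets the caller see which partitions, if any, were rejected without the whole batch being treated as failed.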