glue | R Documentation
AWS Glue
Description
Glue
Defines the public endpoint for the Glue service.
Usage
Arguments
config
Optional configuration of credentials, endpoint, and/or region.
credentials:
  creds:
    access_key_id: AWS access key ID
    secret_access_key: AWS secret access key
    session_token: AWS temporary session token
  profile: The name of a profile to use. If not given, then the default profile is used.
  anonymous: Set anonymous credentials.
endpoint: The complete URL to use for the constructed client.
region: The AWS Region used in instantiating the client.
close_connection: Immediately close all HTTP connections.
timeout: The time in seconds until a timeout exception is thrown when attempting to make a connection. The default is 60 seconds.
s3_force_path_style: Set this to true to force the request to use path-style addressing, i.e. http://s3.amazonaws.com/BUCKET/KEY.
sts_regional_endpoint: Set the STS regional endpoint resolver to regional or legacy. See https://docs.aws.amazon.com/sdkref/latest/guide/feature-sts-regionalized-endpoints.html
credentials
Optional credentials shorthand for the config parameter.
creds:
  access_key_id: AWS access key ID
  secret_access_key: AWS secret access key
  session_token: AWS temporary session token
profile: The name of a profile to use. If not given, then the default profile is used.
anonymous: Set anonymous credentials.
endpoint
Optional shorthand for complete URL to use for the constructed client.
region
Optional shorthand for AWS Region used in instantiating the client.
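The config argument nests lists inside lists. As a minimal sketch, assuming you want a named profile, an explicit region and a longer connection timeout, the nested list can be assembled in plain R. The helper make_glue_config() is hypothetical (it is not part of paws), and the profile name and region are placeholders.

```r
# Hypothetical helper (not part of paws): builds the nested list that the
# config argument of glue() expects, following the Service syntax layout.
make_glue_config <- function(region, profile = NULL, timeout = 60,
                             endpoint = NULL) {
  cfg <- list(region = region, timeout = timeout)
  if (!is.null(profile)) {
    # The profile name sits under config$credentials, next to creds.
    cfg$credentials <- list(profile = profile)
  }
  if (!is.null(endpoint)) {
    # Only set a custom endpoint when one is explicitly requested.
    cfg$endpoint <- endpoint
  }
  cfg
}

# Placeholder values for illustration.
cfg <- make_glue_config(region = "us-east-1", profile = "analytics",
                        timeout = 120)
str(cfg)
```

With paws installed, the result would be passed as svc <- paws::glue(config = cfg). Fields the helper leaves out are simply absent from the list, so paws applies its own defaults for them (for example, the 60-second timeout when timeout is not given).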
Value
A client for the service. You can call the service's operations using syntax like svc$operation(...), where svc is the name you've assigned to the client. The available operations are listed in the Operations section.
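Operations are ordinary functions stored on the client object. The sketch below, a hypothetical helper named database_names(), calls get_databases through svc$get_databases(...) and extracts each database's Name. To stay runnable without AWS access it accepts the client as an argument and is exercised with a stand-in list. It reads only the first page of results, and the response fields DatabaseList and Name are assumed from the Glue GetDatabases API.

```r
# Hypothetical helper: calls an operation with svc$operation(...) syntax.
# The client is passed in so the function can be exercised with a stand-in.
database_names <- function(svc, catalog_id = NULL) {
  resp <- svc$get_databases(CatalogId = catalog_id)
  # Each element of DatabaseList is a list describing one database.
  vapply(resp$DatabaseList, function(db) db$Name, character(1))
}

# Stand-in client for illustration only; a real one comes from paws::glue().
fake_svc <- list(
  get_databases = function(CatalogId = NULL) {
    list(DatabaseList = list(list(Name = "sales"), list(Name = "logs")))
  }
)
database_names(fake_svc)
# returns c("sales", "logs")
```

Passing the client in, rather than creating it inside the function, keeps credentials and region decisions with the caller.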
Service syntax
svc <- glue(
  config = list(
    credentials = list(
      creds = list(
        access_key_id = "string",
        secret_access_key = "string",
        session_token = "string"
      ),
      profile = "string",
      anonymous = "logical"
    ),
    endpoint = "string",
    region = "string",
    close_connection = "logical",
    timeout = "numeric",
    s3_force_path_style = "logical",
    sts_regional_endpoint = "string"
  ),
  credentials = list(
    creds = list(
      access_key_id = "string",
      secret_access_key = "string",
      session_token = "string"
    ),
    profile = "string",
    anonymous = "logical"
  ),
  endpoint = "string",
  region = "string"
)
Operations
- batch_create_partition
- Creates one or more partitions in a batch operation
- batch_delete_connection
- Deletes a list of connection definitions from the Data Catalog
- batch_delete_partition
- Deletes one or more partitions in a batch operation
- batch_delete_table
- Deletes multiple tables at once
- batch_delete_table_version
- Deletes a specified batch of versions of a table
- batch_get_blueprints
- Retrieves information about a list of blueprints
- batch_get_crawlers
- Returns a list of resource metadata for a given list of crawler names
- batch_get_custom_entity_types
- Retrieves the details for the custom patterns specified by a list of names
- batch_get_data_quality_result
- Retrieves a list of data quality results for the specified result IDs
- batch_get_dev_endpoints
- Returns a list of resource metadata for a given list of development endpoint names
- batch_get_jobs
- Returns a list of resource metadata for a given list of job names
- batch_get_partition
- Retrieves partitions in a batch request
- batch_get_table_optimizer
- Returns the configuration for the specified table optimizers
- batch_get_triggers
- Returns a list of resource metadata for a given list of trigger names
- batch_get_workflows
- Returns a list of resource metadata for a given list of workflow names
- Annotates datapoints over time for a specific data quality statistic
- batch_stop_job_run
- Stops one or more job runs for a specified job definition
- batch_update_partition
- Updates one or more partitions in a batch operation
- Cancels the specified recommendation run that was being used to generate rules
- Cancels a run where a ruleset is being evaluated against a data source
- cancel_ml_task_run
- Cancels (stops) a task run
- cancel_statement
- Cancels the statement
- check_schema_version_validity
- Validates the supplied schema
- create_blueprint
- Registers a blueprint with Glue
- create_classifier
- Creates a classifier in the user's account
- create_connection
- Creates a connection definition in the Data Catalog
- create_crawler
- Creates a new crawler with specified targets, role, configuration, and optional schedule
- create_custom_entity_type
- Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data
- create_database
- Creates a new database in a Data Catalog
- create_data_quality_ruleset
- Creates a data quality ruleset with DQDL rules applied to a specified Glue table
- create_dev_endpoint
- Creates a new development endpoint
- create_job
- Creates a new job definition
- create_ml_transform
- Creates a Glue machine learning transform
- create_partition
- Creates a new partition
- create_partition_index
- Creates a specified partition index in an existing table
- create_registry
- Creates a new registry which may be used to hold a collection of schemas
- create_schema
- Creates a new schema set and registers the schema definition
- create_script
- Transforms a directed acyclic graph (DAG) into code
- create_security_configuration
- Creates a new security configuration
- create_session
- Creates a new session
- create_table
- Creates a new table definition in the Data Catalog
- create_table_optimizer
- Creates a new table optimizer for a specific function
- create_trigger
- Creates a new trigger
- create_usage_profile
- Creates a Glue usage profile
- create_user_defined_function
- Creates a new function definition in the Data Catalog
- create_workflow
- Creates a new workflow
- delete_blueprint
- Deletes an existing blueprint
- delete_classifier
- Removes a classifier from the Data Catalog
- Deletes the partition column statistics of a column
- delete_column_statistics_for_table
- Deletes table statistics of columns
- delete_connection
- Deletes a connection from the Data Catalog
- delete_crawler
- Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING
- delete_custom_entity_type
- Deletes a custom pattern by specifying its name
- delete_database
- Removes a specified database from a Data Catalog
- delete_data_quality_ruleset
- Deletes a data quality ruleset
- delete_dev_endpoint
- Deletes a specified development endpoint
- delete_job
- Deletes a specified job definition
- delete_ml_transform
- Deletes a Glue machine learning transform
- delete_partition
- Deletes a specified partition
- delete_partition_index
- Deletes a specified partition index from an existing table
- delete_registry
- Deletes the entire registry, including its schemas and all of their versions
- delete_resource_policy
- Deletes a specified policy
- delete_schema
- Deletes the entire schema set, including the schema set and all of its versions
- delete_schema_versions
- Removes versions from the specified schema
- delete_security_configuration
- Deletes a specified security configuration
- delete_session
- Deletes the session
- delete_table
- Removes a table definition from the Data Catalog
- delete_table_optimizer
- Deletes an optimizer and all associated metadata for a table
- delete_table_version
- Deletes a specified version of a table
- delete_trigger
- Deletes a specified trigger
- delete_usage_profile
- Deletes the specified Glue usage profile
- delete_user_defined_function
- Deletes an existing function definition from the Data Catalog
- delete_workflow
- Deletes a workflow
- get_blueprint
- Retrieves the details of a blueprint
- get_blueprint_run
- Retrieves the details of a blueprint run
- get_blueprint_runs
- Retrieves the details of blueprint runs for a specified blueprint
- get_catalog_import_status
- Retrieves the status of a migration operation
- get_classifier
- Retrieves a classifier by name
- get_classifiers
- Lists all classifier objects in the Data Catalog
- get_column_statistics_for_partition
- Retrieves partition statistics of columns
- get_column_statistics_for_table
- Retrieves table statistics of columns
- get_column_statistics_task_run
- Gets the associated metadata for a task run, given a task run ID
- get_column_statistics_task_runs
- Retrieves information about all runs associated with the specified table
- get_connection
- Retrieves a connection definition from the Data Catalog
- get_connections
- Retrieves a list of connection definitions from the Data Catalog
- get_crawler
- Retrieves metadata for a specified crawler
- get_crawler_metrics
- Retrieves metrics about specified crawlers
- get_crawlers
- Retrieves metadata for all crawlers defined in the customer account
- get_custom_entity_type
- Retrieves the details of a custom pattern by specifying its name
- get_database
- Retrieves the definition of a specified database
- get_databases
- Retrieves all databases defined in a given Data Catalog
- get_data_catalog_encryption_settings
- Retrieves the security configuration for a specified catalog
- get_dataflow_graph
- Transforms a Python script into a directed acyclic graph (DAG)
- get_data_quality_model
- Retrieves the training status of the model along with more information (CompletedOn, StartedOn, FailureReason)
- get_data_quality_model_result
- Retrieves a statistic's predictions for a given Profile ID
- get_data_quality_result
- Retrieves the result of a data quality rule evaluation
- Gets the specified recommendation run that was used to generate rules
- get_data_quality_ruleset
- Returns an existing ruleset by identifier or name
- Retrieves a specific run where a ruleset is evaluated against a data source
- get_dev_endpoint
- Retrieves information about a specified development endpoint
- get_dev_endpoints
- Retrieves all the development endpoints in this Amazon Web Services account
- get_job
- Retrieves an existing job definition
- get_job_bookmark
- Returns information on a job bookmark entry
- get_job_run
- Retrieves the metadata for a given job run
- get_job_runs
- Retrieves metadata for all runs of a given job definition
- get_jobs
- Retrieves all current job definitions
- get_mapping
- Creates mappings
- get_ml_task_run
- Gets details for a specific task run on a machine learning transform
- get_ml_task_runs
- Gets a list of runs for a machine learning transform
- get_ml_transform
- Gets a Glue machine learning transform artifact and all its corresponding metadata
- get_ml_transforms
- Gets a sortable, filterable list of existing Glue machine learning transforms
- get_partition
- Retrieves information about a specified partition
- get_partition_indexes
- Retrieves the partition indexes associated with a table
- get_partitions
- Retrieves information about the partitions in a table
- get_plan
- Gets code to perform a specified mapping
- get_registry
- Describes the specified registry in detail
- get_resource_policies
- Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants
- get_resource_policy
- Retrieves a specified resource policy
- get_schema
- Describes the specified schema in detail
- get_schema_by_definition
- Retrieves a schema by the SchemaDefinition
- get_schema_version
- Gets the specified schema by its unique ID assigned when a version of the schema is created or registered
- get_schema_versions_diff
- Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry
- get_security_configuration
- Retrieves a specified security configuration
- get_security_configurations
- Retrieves a list of all security configurations
- get_session
- Retrieves the session
- get_statement
- Retrieves the statement
- get_table
- Retrieves the Table definition in a Data Catalog for a specified table
- get_table_optimizer
- Returns the configuration of all optimizers associated with a specified table
- get_tables
- Retrieves the definitions of some or all of the tables in a given Database
- get_table_version
- Retrieves a specified version of a table
- get_table_versions
- Retrieves a list of strings that identify available versions of a specified table
- get_tags
- Retrieves a list of tags associated with a resource
- get_trigger
- Retrieves the definition of a trigger
- get_triggers
- Gets all the triggers associated with a job
- get_unfiltered_partition_metadata
- Retrieves partition metadata from the Data Catalog that contains unfiltered metadata
- get_unfiltered_partitions_metadata
- Retrieves partition metadata from the Data Catalog that contains unfiltered metadata
- get_unfiltered_table_metadata
- Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog
- get_usage_profile
- Retrieves information about the specified Glue usage profile
- get_user_defined_function
- Retrieves a specified function definition from the Data Catalog
- get_user_defined_functions
- Retrieves multiple function definitions from the Data Catalog
- get_workflow
- Retrieves resource metadata for a workflow
- get_workflow_run
- Retrieves the metadata for a given workflow run
- get_workflow_run_properties
- Retrieves the workflow run properties which were set during the run
- get_workflow_runs
- Retrieves metadata for all runs of a given workflow
- import_catalog_to_glue
- Imports an existing Amazon Athena Data Catalog to Glue
- list_blueprints
- Lists all the blueprint names in an account
- list_column_statistics_task_runs
- Lists all task runs for a particular account
- list_crawlers
- Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag
- list_crawls
- Returns all the crawls of a specified crawler
- list_custom_entity_types
- Lists all the custom patterns that have been created
- list_data_quality_results
- Returns all data quality execution results for your account
- Lists the recommendation runs meeting the filter criteria
- Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source
- list_data_quality_rulesets
- Returns a paginated list of rulesets for the specified list of Glue tables
- Retrieves annotations for a data quality statistic
- list_data_quality_statistics
- Retrieves a list of data quality statistics
- list_dev_endpoints
- Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag
- list_jobs
- Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag
- list_ml_transforms
- Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag
- list_registries
- Returns a list of registries that you have created, with minimal registry information
- list_schemas
- Returns a list of schemas with minimal details
- list_schema_versions
- Returns a list of schema versions that you have created, with minimal information
- list_sessions
- Retrieves a list of sessions
- list_statements
- Lists statements for the session
- list_table_optimizer_runs
- Lists the history of previous optimizer runs for a specific table
- list_triggers
- Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag
- list_usage_profiles
- Lists all the Glue usage profiles
- list_workflows
- Lists names of workflows created in the account
- put_data_catalog_encryption_settings
- Sets the security configuration for a specified catalog
- put_data_quality_profile_annotation
- Annotates all datapoints for a profile
- put_resource_policy
- Sets the Data Catalog resource policy for access control
- put_schema_version_metadata
- Puts the metadata key value pair for a specified schema version ID
- put_workflow_run_properties
- Puts the specified workflow run properties for the given workflow run
- query_schema_version_metadata
- Queries for the schema version metadata information
- register_schema_version
- Adds a new version to the existing schema
- remove_schema_version_metadata
- Removes a key value pair from the schema version metadata for the specified schema version ID
- reset_job_bookmark
- Resets a bookmark entry
- resume_workflow_run
- Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run
- run_statement
- Executes the statement
- search_tables
- Searches a set of tables based on properties in the table metadata as well as on the parent database
- start_blueprint_run
- Starts a new run of the specified blueprint
- start_column_statistics_task_run
- Starts a column statistics task run, for a specified table and columns
- start_crawler
- Starts a crawl using the specified crawler, regardless of what is scheduled
- start_crawler_schedule
- Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED
- Starts a recommendation run that is used to generate rules when you don't know what rules to write
- Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table)
- start_export_labels_task_run
- Begins an asynchronous task to export all labeled data for a particular transform
- start_import_labels_task_run
- Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality
- start_job_run
- Starts a job run using a job definition
- start_ml_evaluation_task_run
- Starts a task to estimate the quality of the transform
- Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels
- start_trigger
- Starts an existing trigger
- start_workflow_run
- Starts a new run of the specified workflow
- stop_column_statistics_task_run
- Stops a task run for the specified table
- stop_crawler
- If the specified crawler is running, stops the crawl
- stop_crawler_schedule
- Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running
- stop_session
- Stops the session
- stop_trigger
- Stops a specified trigger
- stop_workflow_run
- Stops the execution of the specified workflow run
- tag_resource
- Adds tags to a resource
- untag_resource
- Removes tags from a resource
- update_blueprint
- Updates a registered blueprint
- update_classifier
- Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present)
- Creates or updates partition statistics of columns
- update_column_statistics_for_table
- Creates or updates table statistics of columns
- update_connection
- Updates a connection definition in the Data Catalog
- update_crawler
- Updates a crawler
- update_crawler_schedule
- Updates the schedule of a crawler using a cron expression
- update_database
- Updates an existing database definition in a Data Catalog
- update_data_quality_ruleset
- Updates the specified data quality ruleset
- update_dev_endpoint
- Updates a specified development endpoint
- update_job
- Updates an existing job definition
- update_job_from_source_control
- Synchronizes a job from the source control repository
- update_ml_transform
- Updates an existing machine learning transform
- update_partition
- Updates a partition
- update_registry
- Updates an existing registry which is used to hold a collection of schemas
- update_schema
- Updates the description, compatibility setting, or version checkpoint for a schema set
- update_source_control_from_job
- Synchronizes a job to the source control repository
- update_table
- Updates a metadata table in the Data Catalog
- update_table_optimizer
- Updates the configuration for an existing table optimizer
- update_trigger
- Updates a trigger definition
- update_usage_profile
- Updates a Glue usage profile
- update_user_defined_function
- Updates an existing function definition in the Data Catalog
- update_workflow
- Updates an existing workflow
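Many of the list-style operations above, such as get_tables, return results a page at a time along with a NextToken. A minimal sketch of following those tokens is below, assuming the response fields TableList and NextToken from the Glue GetTables API. The function list_all_tables() and the two-page stand-in client are hypothetical and exist only for illustration.

```r
# Hypothetical helper: follows NextToken until Glue reports no further pages.
# svc is a Glue client (for example from paws::glue()) or any list with a
# compatible get_tables(DatabaseName, NextToken) function.
list_all_tables <- function(svc, database_name) {
  tables <- list()
  token <- NULL
  repeat {
    resp <- svc$get_tables(DatabaseName = database_name, NextToken = token)
    tables <- c(tables, resp$TableList)
    token <- resp$NextToken
    # The last page carries no token; treat NULL, character(0) and "" alike.
    if (length(token) == 0 || !nzchar(token)) break
  }
  tables
}

# Stand-in client returning two pages, for illustration only.
pages <- list(
  list(TableList = list(list(Name = "orders")), NextToken = "p2"),
  list(TableList = list(list(Name = "customers")), NextToken = character(0))
)
fake_svc <- list(
  get_tables = function(DatabaseName, NextToken = NULL) {
    if (is.null(NextToken)) pages[[1]] else pages[[2]]
  }
)
tables <- list_all_tables(fake_svc, "sales")
vapply(tables, function(t) t$Name, character(1))
# returns c("orders", "customers")
```

Against a real client the call would be list_all_tables(paws::glue(), "my_database"), where "my_database" is a placeholder. The loop accepts several representations of a missing token because how an absent NextToken is returned is an assumption here.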