Create Data Source From S3
machinelearning_create_data_source_from_s3 | R Documentation
Creates a DataSource object
Description
Creates a DataSource object. A DataSource references data that can be used to perform create_ml_model, create_evaluation, or create_batch_prediction operations.
create_data_source_from_s3 is an asynchronous operation. In response to create_data_source_from_s3, Amazon Machine Learning (Amazon ML) immediately returns and sets the DataSource status to PENDING. After the DataSource has been created and is ready for use, Amazon ML sets the Status parameter to COMPLETED. A DataSource in the COMPLETED or PENDING state can be used only to perform create_ml_model, create_evaluation, or create_batch_prediction operations.
If Amazon ML can't accept the input source, it sets the Status parameter to FAILED and includes an error message in the Message attribute of the get_data_source operation response.
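
Because the operation is asynchronous, a caller usually has to poll until the Status leaves PENDING. The following is a minimal sketch of such a loop, not part of the package. The helper names wait_for_data_source and make_stub are hypothetical, and the status lookup is passed in as a function so the loop can be tried without AWS access. It assumes COMPLETED and FAILED are the only states worth stopping on, as described above.

```r
# Poll a status function until the DataSource reaches a terminal state.
# `get_status` is a hypothetical zero-argument function that returns the
# current Status string; it is injected so the loop can be tried offline.
wait_for_data_source <- function(get_status, interval = 30, max_tries = 60) {
  for (i in seq_len(max_tries)) {
    status <- get_status()
    # COMPLETED and FAILED are the terminal states described above.
    if (status %in% c("COMPLETED", "FAILED")) {
      return(status)
    }
    Sys.sleep(interval)
  }
  stop("DataSource did not reach a terminal state after ", max_tries, " checks")
}

# Offline stub (hypothetical): walks through the given states, then repeats
# the last one.
make_stub <- function(states) {
  i <- 0
  function() {
    i <<- min(i + 1, length(states))
    states[[i]]
  }
}

final_status <- wait_for_data_source(
  make_stub(c("PENDING", "PENDING", "COMPLETED")),
  interval = 0
)
```

With a real client, get_status could plausibly be a wrapper such as function() svc$get_data_source(DataSourceId = id)$Status, where svc and id are assumed to exist. When the result is FAILED, the Message attribute of the same response carries the error text.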
The observation data used in a DataSource should be ready to use; that is, it should have a consistent structure, and missing data values should be kept to a minimum. The observation data must reside in one or more .csv files in an Amazon Simple Storage Service (Amazon S3) location, along with a schema that describes the data items by name and type. The same schema must be used for all of the data files referenced by the DataSource.
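
One way to check the "consistent structure" and "few missing values" requirements before uploading is a small local pre-flight check. The sketch below is an illustration only and does not call Amazon ML. The function name check_observation_files is hypothetical, and it assumes each .csv file has a header row and that empty fields count as missing.

```r
# Check that local CSV files share one header and report the share of
# missing cells per file. Local pre-upload check only.
check_observation_files <- function(paths) {
  # Read just the first row of each file to obtain its column names.
  headers <- lapply(paths, function(p) names(read.csv(p, nrows = 1)))
  same_header <- all(vapply(headers, identical, logical(1), headers[[1]]))
  # Fraction of cells that are missing (empty string or NA) in each file.
  missing_share <- vapply(paths, function(p) {
    mean(is.na(read.csv(p, na.strings = c("", "NA"))))
  }, numeric(1), USE.NAMES = FALSE)
  list(same_header = same_header, missing_share = missing_share)
}

# Two small temporary files with the same columns; the second has one gap.
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
writeLines(c("age,label", "31,1", "45,0"), f1)
writeLines(c("age,label", "27,", "52,1"), f2)
result <- check_observation_files(c(f1, f2))
```

A matching header does not guarantee that the files agree with the schema stored in Amazon S3; it only catches the most obvious structural drift between files.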
After the DataSource has been created, it's ready to use in evaluations and batch predictions. If you plan to use the DataSource to train an MLModel, the DataSource also needs a recipe. A recipe describes how each input variable will be used in training an MLModel. Will the variable be included or excluded from training? Will the variable be manipulated; for example, will it be combined with another variable or will it be split apart into word combinations? The recipe provides answers to these questions.
Usage
machinelearning_create_data_source_from_s3(DataSourceId, DataSourceName,
DataSpec, ComputeStatistics)
Arguments
DataSourceId
[required] A user-supplied identifier that uniquely identifies the DataSource.
DataSourceName
A user-supplied name or description of the DataSource.
DataSpec
[required] The data specification of a DataSource:
DataLocationS3 - The Amazon S3 location of the observation data.
DataSchemaLocationS3 - The Amazon S3 location of the DataSchema.
DataSchema - A JSON string representing the schema. This is not required if DataSchemaLocationS3 is specified.
DataRearrangement - A JSON string that represents the splitting and rearrangement requirements for the DataSource. Sample - "{\"splitting\":{\"percentBegin\":10,\"percentEnd\":60}}"
ComputeStatistics
The compute statistics for a DataSource. The statistics are generated from the observation data referenced by a DataSource. Amazon ML uses the statistics internally during MLModel training. This parameter must be set to true if the DataSource needs to be used for MLModel training.
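
The arguments above can be assembled as in the following sketch. It is an illustration under stated assumptions, not a definitive recipe. The helper splitting_json is hypothetical, and the bucket, key, and identifier values are placeholders. The actual request is left commented out because it needs the paws package and AWS credentials.

```r
# Build the DataRearrangement JSON string for a percentage split.
# `splitting_json` is a hypothetical helper, not part of the package.
splitting_json <- function(percent_begin, percent_end) {
  stopifnot(percent_begin >= 0, percent_end <= 100, percent_begin < percent_end)
  sprintf(
    '{"splitting":{"percentBegin":%d,"percentEnd":%d}}',
    as.integer(percent_begin), as.integer(percent_end)
  )
}

# Assemble the DataSpec list. The bucket and key names are placeholders.
data_spec <- list(
  DataLocationS3 = "s3://example-bucket/observations/",
  DataSchemaLocationS3 = "s3://example-bucket/observations.csv.schema",
  DataRearrangement = splitting_json(10, 60)
)

# With paws installed and credentials configured, the request could then be
# sent roughly like this (not run here):
# svc <- paws::machinelearning()
# svc$create_data_source_from_s3(
#   DataSourceId = "example-ds-id",
#   DataSourceName = "Example training data",
#   DataSpec = data_spec,
#   ComputeStatistics = TRUE
# )
```

Here splitting_json(10, 60) reproduces the Sample string shown for DataRearrangement, and ComputeStatistics = TRUE reflects the requirement that statistics be computed when the DataSource will be used for MLModel training.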
Value
A list with the following syntax: