# Dataset

Metadata for a dataset

## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**name** | **str** |  | 
**row_count** | **int** | The number of rows in the array, not including column headers. | 
**column_headers** | **[str]** | List of all column headers in the order they appear in the dataset. | 
**descriptor_columns** | **[int]** | List of length equal to the number of columns where each element is 1 or 0.  A value of 1 denotes that the corresponding column is a descriptor column.  A descriptor column is an input-only column whose values will not need to be predicted. | 
**id** | **str** | Unique identifier for the dataset. | [optional] [readonly] 
**tags** | **[str]** | Optional tags to attach to the dataset | [optional] 
**notes** | **str** | An optional free-text field for notes about the dataset. | [optional] 
**status** | **str** | Status | [optional] [readonly] 
**revises_id** | **str** | The UUID of the dataset that this dataset revises (its parent). | [optional] 
**revision_ids** | **[str]** | The UUIDs of the datasets that are revisions of this dataset (its children). | [optional] [readonly] 
**column_count** | **int** | The number of columns in the array, not including row headers. | [optional] [readonly] 
**categorical_columns** | [**[CategoricalColumn], none_type**](CategoricalColumn.md) | The possible categorical values for each categorical column.  There cannot be more than 1023 unique categorical values per column, and each value cannot be longer than 128 characters.  Categorical values can be wrapped in double quotes (") in the CSV to represent more complex strings containing special characters (e.g. commas), but double quotes are not allowed to appear anywhere apart from the beginning and end of a value.  Quoted categorical values cannot be used in vector categorical columns.  Quoted categoricals will be reduced to a form without explicit double quotes where possible, e.g. the values "red" and red will be treated as identical.  Categorical values cannot consist purely of whitespace and cannot contain semicolons.  Leading/trailing whitespace around a categorical cell will be trimmed away, although surrounding whitespace enclosed within double quotes will be preserved.  Categorical values also cannot be words reserved for special numerical types, such as NaN, +NaN, -NaN and further variations.  Categorical integers are deprecated; please use string values instead. | [optional] 
**column_info** | [**[ColumnInfo]**](ColumnInfo.md) | Additional information/statistics for each column, listed in the order they appear in the dataset. | [optional] [readonly] 
**complete_columns** | **[int]** | List of length equal to the number of columns where each element is 1 or 0.  A value of 1 denotes that the corresponding column is a "complete column".  This means the column must have no missing values in the dataset.  It is also recommended not to ask a model trained on this dataset to make predictions with missing values in a "complete column".  All "complete columns" must be descriptor columns as well.  Marking columns as "complete columns" can significantly speed up model training.  If `completeColumns` is not given then none of the columns will be marked as "complete columns". | [optional] 
**measurement_groups** | **[int], none_type** | A "measurement group" is a group of columns that are usually measured at the same time, so when making predictions for one of these columns it is expected that the other columns in the measurement group will not be present.  The `measurementGroups` argument can be specified to avoid training a model that relies on values in a measurement group to predict other values in the same group.  `measurementGroups` is a list of length equal to the number of columns in the training dataset, specifying which measurement group (denoted by an integer) each column belongs to.  The order of `measurementGroups` must correspond to the training dataset's 'columnHeaders' parameter.  Descriptor columns should be included in `measurementGroups` but they will always be used, regardless of the measurement group they are in.  For example, if `measurementGroups=[1,2,3,1]` then the first and last columns are expected to be known simultaneously and so are in the same measurement group, while the second and third columns may be known or unknown regardless of the knowledge of other columns and so are in their own measurement groups.  If `measurementGroups` is not provided then it is assumed that every column is in its own measurement group. | [optional] 
**column_fraction_data_present** | **[float]** | Lists the fraction of values which are given in each of the columns. | [optional] [readonly] 
**data** | **str** | The CSV specification we conform to can be found at https://www.rfc-editor.org/rfc/rfc4180.  A string in CSV format corresponding to a 2D array with row and column headers.  Row and column headers must be unique.  Row and column headers containing leading/trailing whitespace will not be trimmed and will be interpreted as they appear in the data.  Categorical and vector values are defined outside of that specification, although rules for their implementation can be found under their respective sections.  Sets of 2D vectors can be included by mapping each axis to a column and separating the values corresponding to each vector with a semicolon.  If these vectors are used in the dataset then the columns which are paired as vectors must be provided in the 'vectorPairs' argument as part of the POST request.  In the following example the 'time' and 'temperature' columns are paired as vectors, so in the first data row their values map to the vectors (0,10), (1,28), (2,35), (4,42).  Header row: `, heat applied, time, temperature`; row A: `A, 30, 0;1;2;4, 10;28;35;42`; row B: `B, 10, 0;5, 10;18`. | [optional] 
**vector_pairs** | **[[str]], none_type** | A list of pairs of column names.  The columns in each pair are the axes for a 2D coordinate system.  Deprecated: it is recommended that series-based data is split out over separate columns for each series point. | [optional] 
**created_at** | **int** | The Unix timestamp in seconds when POST /datasets was called.  If `0` (Unix system time zero) then the creation timestamp is unavailable.  This can happen for older datasets. | [optional] [readonly] 
**shared_through** | **[str]** | If a dataset has been shared with the user then this will show through which group(s) it has been shared.  Won't be set if the user requesting the resource owns it. | [optional] [readonly] 
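
The per-column lists above (`descriptor_columns`, `complete_columns`, `measurement_groups`) all have to line up with `column_headers`. The sketch below shows one way to check the constraints stated in the table before submitting a dataset. It is not part of this client library: the function names `check_column_flags` and `columns_by_measurement_group` are hypothetical, and only the rules written in the table are checked.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def check_column_flags(
    column_headers: List[str],
    descriptor_columns: List[int],
    complete_columns: Optional[List[int]] = None,
    measurement_groups: Optional[List[int]] = None,
) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    n = len(column_headers)
    problems = []

    # Both flag lists must have one 0/1 entry per column.
    for label, flags in (
        ("descriptor_columns", descriptor_columns),
        ("complete_columns", complete_columns),
    ):
        if flags is None:
            continue
        if len(flags) != n:
            problems.append(f"{label} has length {len(flags)}, expected {n}")
        if any(flag not in (0, 1) for flag in flags):
            problems.append(f"{label} may only contain 0 or 1")

    # Every complete column must also be a descriptor column.
    if (
        complete_columns is not None
        and len(complete_columns) == n
        and len(descriptor_columns) == n
    ):
        for header, complete, descriptor in zip(
            column_headers, complete_columns, descriptor_columns
        ):
            if complete == 1 and descriptor != 1:
                problems.append(
                    f"complete column {header!r} is not a descriptor column"
                )

    # measurement_groups needs one integer group label per column.
    if measurement_groups is not None and len(measurement_groups) != n:
        problems.append(
            f"measurement_groups has length {len(measurement_groups)}, expected {n}"
        )

    return problems


def columns_by_measurement_group(
    column_headers: List[str], measurement_groups: List[int]
) -> Dict[int, List[str]]:
    """Hypothetical helper: map each group label to the headers it contains."""
    groups = defaultdict(list)
    for header, group in zip(column_headers, measurement_groups):
        groups[group].append(header)
    return dict(groups)
```

With four columns and `measurement_groups=[1,2,3,1]`, `columns_by_measurement_group` puts the first and last headers under group 1 and the second and third under groups 2 and 3, matching the example in the `measurement_groups` description.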

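To make the semicolon-separated vector encoding in the `data` description concrete, the sketch below reads a CSV string and pairs up the values of two vector columns row by row. It uses only the Python standard library and is not how the service itself parses data. The function name `parse_vector_pairs` is hypothetical, the example CSV drops the alignment padding around cells, and the check that both axes have the same number of points is an assumption rather than a documented rule. Note that `vector_pairs` is marked as deprecated above.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_vector_pairs(
    data: str, x_column: str, y_column: str
) -> Dict[str, List[Tuple[float, float]]]:
    """Hypothetical helper: map each row header to its list of (x, y) points."""
    reader = csv.reader(io.StringIO(data))
    header = next(reader)
    x_idx = header.index(x_column)
    y_idx = header.index(y_column)

    vectors = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        xs = [float(value) for value in row[x_idx].split(";")]
        ys = [float(value) for value in row[y_idx].split(";")]
        # Assumption: paired columns carry the same number of points per row.
        if len(xs) != len(ys):
            raise ValueError(f"row {row[0]!r}: {len(xs)} x values, {len(ys)} y values")
        vectors[row[0]] = list(zip(xs, ys))
    return vectors


# The example from the `data` description, without alignment padding.
EXAMPLE = (
    ",heat applied,time,temperature\n"
    "A,30,0;1;2;4,10;28;35;42\n"
    "B,10,0;5,10;18\n"
)

points = parse_vector_pairs(EXAMPLE, "time", "temperature")
```

Here `points["A"]` holds the four vectors listed in the `data` description, and `points["B"]` holds the two vectors from the second row.
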
[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)


