Croissant
Namespace: http://mlcommons.org/croissant/
Classes
ContentExtractionEnumeration Class
URI: http://mlcommons.org/croissant/ContentExtractionEnumeration
Description: Specifies which content to extract from a file. One of "all", "lines", or "lineNumbers".
Subclass of:
DataSource Class
URI: http://mlcommons.org/croissant/DataSource
Description: A source of data, optionally transformed before being used.
Subclass of:
Properties:
- extract (→ Extract)
- fileObject (→ FileObject)
- fileSet (→ FileSet)
- format (→ Format)
- recordSet (→ RecordSet)
- transform (→ Transform)
Documentation
DataSource is the class describing the data that can be extracted from files to populate a RecordSet. This class should be used when the data coming from the source needs to be transformed or formatted to be included in the ML dataset; otherwise a simple Reference can be used instead to point to the source.
DataSource is a subclass of schema.org/Intangible.
Properties
Property | Expected Type | Cardinality | Description |
---|---|---|---|
fileObject | Reference | ONE | The name of the referenced FileObject source of the data |
fileSet | Reference | ONE | The name of the referenced FileSet source of the data |
recordSet | Reference | ONE | The name of the referenced RecordSet source |
extract | Extract | ONE | The extraction method from the provided source |
transform | Transform | MANY | A transformation to apply on source data on top of the extracted method as specified through extract, e.g., a regular expression or JSON query |
format | Format | ONE | A format to parse the values of the data from text, e.g., a date format or number format |
Usage
DataSource is used within Field definitions to specify where the data for the field comes from and how it should be processed. The source can be a FileObject, FileSet, or another RecordSet, and the data can be extracted and transformed using the extract, transform, and format properties.
Example
{
"source": {
"fileSet": { "@id": "image-files" },
"extract": {
"fileProperty": "filename"
},
"transform": {
"regex": "([^\\/]*)\\.jpg"
}
}
}
This example extracts filenames from a set of image files and applies a regular expression transformation to extract just the base filename without the path and extension.
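As a rough illustration of what this source definition asks a consumer to do, the sketch below applies the same regular expression to a few made-up file paths in Python. The function name `apply_regex_transform` and the sample paths are hypothetical, and the sketch assumes that the transform yields the first capture group of the first match; actual Croissant implementations may behave differently.

```python
import re
from typing import Optional

# Same pattern as in the example above, after JSON unescaping.
PATTERN = re.compile(r"([^\/]*)\.jpg")

def apply_regex_transform(value: str) -> Optional[str]:
    """Return the first capture group, or None if the pattern does not match."""
    match = PATTERN.search(value)
    return match.group(1) if match else None

# Hypothetical extracted values; the last one has no .jpg extension.
samples = ["train/cat_001.jpg", "dog_17.jpg", "notes.txt"]
names = [apply_regex_transform(s) for s in samples]
print(names)
```

Returning None for non-matching values is a choice made for this sketch; a real loader might skip or reject such records instead.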
DataType Class
URI: http://mlcommons.org/croissant/DataType
Description: The data type of values expected for a Field in a RecordSet. This class is inspired by the Datatype class in CSVW. In addition to simple atomic types, types can be semantic types, such as schema.org classes, as well as types defined in other vocabularies.
Subclass of:
Documentation
The data type of values expected for a Field in a RecordSet. This class is inspired by the Datatype class in CSVW. In addition to simple atomic types, types can be semantic types, such as schema.org classes, as well as types defined in other vocabularies.
Key Features
- A field may have more than a single assigned dataType, in which case at least one must be an atomic data type (e.g., sc:Text), while other types can provide more semantic information, possibly in the context of ML.
- Can be specified at two levels: on individual Fields and on entire RecordSets.
Atomic Data Types
dataType | Usage |
---|---|
sc:Boolean | Describes a boolean |
sc:Date | Describes a date |
sc:Float | Describes a float |
sc:Integer | Describes an integer |
sc:Text | Describes a string |
ML-Specific Data Types
dataType | Usage |
---|---|
sc:ImageObject | Describes a field containing the content of an image (pixels) |
cr:BoundingBox | Describes the coordinates of a bounding box (4-number array) |
cr:Split | Describes a RecordSet used to divide data into multiple sets according to intended usage with regards to models |
Using Data Types from Other Vocabularies
Croissant datasets can use data types from other vocabularies, such as Wikidata. Tools consuming the data may support these types, but are not required to. For example:
dataType | Usage |
---|---|
wd:Q48277 (gender) | Describes a Field or a RecordSet whose values are indicative of someone's gender |
Examples
Simple Field Type
{
"@id": "images/color_sample",
"@type": "cr:Field",
"dataType": "sc:ImageObject"
}
Multiple Data Types
{
"@id": "cities/url",
"@type": "cr:Field",
"dataType": ["https://schema.org/URL", "https://www.wikidata.org/wiki/Q515"]
}
This example shows a field that is expected to be a URL, whose semantic type is City, so values will be URLs referring to cities.
Extract Class
URI: http://mlcommons.org/croissant/Extract
Description: Specifies how to extract data from a DataSource. The extraction mechanism depends on the type of content, e.g., a column name for tabular data, or a jsonPath for JSON data.
Subclass of:
Properties:
- column (→ Text)
- content (→ ContentExtractionEnumeration)
- fileProperty (→ FilePropertyEnumeration)
- jsonPath (→ Text)
Documentation
Sometimes, not all the data from the source is needed, but only a subset. The Extract class can be used to specify how to do that, depending on the type of the data.
Extraction Methods
Source type | Property | Expected property value | Result |
---|---|---|---|
FileObject or FileSet | fileProperty | One of: fullpath, filename, content, lines, lineNumbers | The corresponding property for the FileObject |
CSV (FileObject) | column | A column name | Values in the specified column |
JSON | jsonPath | A JSONPath expression | The value(s) obtained by evaluating the JSON path expression |
FileProperty Values
- fullpath: The full path to the file within the Croissant extraction or download folders. Example: data/train/metadata.csv
- filename: The name of the file. In data/train/metadata.csv, the file name is metadata.csv
- content: The byte content of the file
- lines: The byte content of each line in the file
- lineNumbers: The number of each line in the file (starting from 0)
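To make the five values concrete, here is a minimal Python sketch that computes them for one file. The helper name `file_properties` and the temporary directory standing in for the extraction folder are assumptions made for illustration, not part of Croissant.

```python
import tempfile
from pathlib import Path

def file_properties(root: Path, path: Path) -> dict:
    """Compute the five fileProperty values for one file under `root`."""
    content = path.read_bytes()
    lines = content.splitlines()
    return {
        "fullpath": path.relative_to(root).as_posix(),
        "filename": path.name,
        "content": content,
        "lines": lines,
        "lineNumbers": list(range(len(lines))),  # numbering starts from 0
    }

# A temporary folder plays the role of the extraction folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    target = root / "data" / "train" / "metadata.csv"
    target.parent.mkdir(parents=True)
    target.write_bytes(b"id,label\n1,cat\n")
    props = file_properties(root, target)

print(props["fullpath"], props["filename"], props["lineNumbers"])
```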
Examples
Extracting File Content
{
"extract": {
"fileProperty": "content"
}
}
Extracting CSV Column
{
"extract": {
"column": "userId"
}
}
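For tabular sources, column extraction amounts to reading one named column. The sketch below uses Python's standard csv module on an inline sample; the helper `extract_column` and the sample rows are hypothetical.

```python
import csv
import io

def extract_column(csv_text: str, column: str) -> list:
    """Return the values of one named column, in row order, as strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]

# Hypothetical CSV content with a header row.
sample = "userId,movieId,rating\n1,296,5.0\n1,306,3.5\n"
user_ids = extract_column(sample, "userId")
print(user_ids)
```

Values come back as text; converting them to the declared dataType (here sc:Integer would be a natural choice) is a separate step.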
Extracting with JSONPath
{
"extract": {
"jsonPath": "$.metadata.title"
}
}
Extracting Filename
{
"extract": {
"fileProperty": "filename"
}
}
This class is typically used within a DataSource to specify exactly what part of the source data should be extracted for a particular field.
Field Class
URI: http://mlcommons.org/croissant/Field
Description: A component of the structure of a RecordSet, such as a column of a table.
Subclass of:
Properties:
- dataType (→ DataType)
- equivalentProperty (→ URL)
- parentField (→ Field)
- references (→ Field)
- repeated (→ Boolean)
- source (→ DataSource, FileObject, FileSet, RecordSet)
- subField (→ Field)
Documentation
A Field is part of a RecordSet. It may represent a column of a table, or a nested data structure or even a nested RecordSet in the case of hierarchical data.
Field is a subclass of schema.org/Intangible.
Properties
Property | Expected Type | Cardinality | Description |
---|---|---|---|
source | DataSource, URL | ONE | The data source of the field. This will generally reference a FileObject or FileSet's contents |
dataType | DataType | MANY | The data type of the field, identified by the URI of the corresponding class |
repeated | Boolean | ONE | If true, then the Field is a list of values of type dataType |
equivalentProperty | URL | MANY | A property that is equivalent to this Field |
references | Reference | MANY | Another Field of another RecordSet that this field references (foreign key equivalent) |
subField | Field | MANY | Another Field that is nested inside this one |
parentField | Reference | MANY | A special case of SubField that should be hidden because it references a Field that already appears in the RecordSet |
Key Features
- Each field has a name (unique identifier within the RecordSet)
- Supports foreign key relationships through the references property
- Supports hierarchical nesting with subField and parentField
- Can specify multiple data types for semantic enrichment
Examples
Simple Field
{
"@type": "cr:Field",
"@id": "ratings/user_id",
"dataType": "sc:Integer",
"source": {
"fileObject": { "@id": "ratings-table" },
"extract": {
"column": "userId"
}
}
}
Field with Reference (Foreign Key)
{
"@type": "cr:Field",
"@id": "ratings/movie_id",
"dataType": "sc:Integer",
"source": {
"fileObject": { "@id": "ratings-table" },
"extract": {
"column": "movieId"
}
},
"references": {
"@id": "movies/movie_id"
}
}
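Because references behaves like a foreign key, a consumer can use it to join records from two RecordSets. The following Python sketch shows one way to do that over in-memory records keyed by field @id; the helper `join_on_reference` and the sample rows are hypothetical.

```python
# Hypothetical records, keyed by field @id as in the examples.
movies = [
    {"movies/movie_id": 1, "movies/title": "Toy Story (1995)"},
    {"movies/movie_id": 2, "movies/title": "Jumanji (1995)"},
]
ratings = [
    {"ratings/user_id": 10, "ratings/movie_id": 2},
    {"ratings/user_id": 11, "ratings/movie_id": 1},
]

def join_on_reference(rows, field_id, target_rows, target_field_id):
    """Merge each row with the target row whose key equals the referencing value."""
    index = {row[target_field_id]: row for row in target_rows}
    return [
        {**row, **index[row[field_id]]}
        for row in rows
        if row[field_id] in index  # rows with dangling references are dropped
    ]

joined = join_on_reference(ratings, "ratings/movie_id", movies, "movies/movie_id")
print(joined[0]["movies/title"])
```

Dropping rows with dangling references is a simplification chosen for this sketch.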
Nested Field with SubFields
{
"@type": "cr:Field",
"@id": "gps_coordinates",
"description": "GPS coordinates where the image was taken.",
"dataType": "sc:GeoCoordinates",
"subField": [
{
"@type": "cr:Field",
"@id": "gps_coordinates/latitude",
"dataType": "sc:Float",
"source": {
"fileObject": { "@id": "metadata" },
"extract": { "column": "latitude" }
}
},
{
"@type": "cr:Field",
"@id": "gps_coordinates/longitude",
"dataType": "sc:Float",
"source": {
"fileObject": { "@id": "metadata" },
"extract": { "column": "longitude" }
}
}
]
}
This example shows how fields can be hierarchically structured to represent complex data types like geographical coordinates.
FileObject Class
URI: http://mlcommons.org/croissant/FileObject
Description: An individual file that is part of a dataset.
Subclass of:
Properties:
- containedIn (→ FileObject, FileSet)
Documentation
FileObject is the Croissant class used to represent individual files that are part of a dataset.
FileObject is a general purpose class that inherits from Schema.org CreativeWork, and can be used to represent instances of more specific types of content like DigitalDocument and MediaObject.
Most of the important properties needed to describe a FileObject are defined in the classes it inherits from:
Property | Expected Type | Cardinality | Description |
---|---|---|---|
sc:name | Text | ONE | The name of the file. As much as possible, the name should reflect the name of the file as downloaded, including the file extension. e.g. "images.zip". |
sc:contentUrl | URL | ONE | Actual bytes of the media object, for example the image file or video file. |
sc:contentSize | Text | ONE | File size in (mega/kilo/…)bytes. Defaults to bytes if a unit is not specified. |
sc:encodingFormat | Text | ONE | The format of the file, given as a mime type. |
sc:sameAs | URL | MANY | URL (or local name) of a FileObject with the same content, but in a different format. |
sc:sha256 | Text | ONE | Checksum for the file contents. |
In addition, FileObject defines the following property:
Property | Expected Type | Cardinality | Description |
---|---|---|---|
containedIn | Text | MANY | Another FileObject or FileSet that this one is contained in, e.g., in the case of a file extracted from an archive. When this property is present, the contentUrl is evaluated as a relative path within the container object. |
Let's look at a few examples of FileObject definitions.
First, a single CSV file:
{
"@type": "cr:FileObject",
"@id": "pass_metadata.csv",
"contentUrl": "https://zenodo.org/record/6615455/files/pass_metadata.csv",
"encodingFormat": "text/csv",
"sha256": "0b033707ea49365a5ffdd14615825511"
}
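A consumer would typically compare the declared sha256 with the digest of the downloaded bytes. The sketch below uses Python's hashlib on an in-memory payload; the helper name `sha256_matches` and the payload are made up for illustration.

```python
import hashlib

def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 hex digest of `data` with the declared checksum."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()

# Hypothetical file content and its digest, computed here rather than hard-coded.
payload = b"id,label\n1,cat\n"
declared = hashlib.sha256(payload).hexdigest()

print(sha256_matches(payload, declared))          # content is intact
print(sha256_matches(payload + b"x", declared))   # content was altered
```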
Next, an archive and some files extracted from it (represented via the containedIn property):
{
"@type": "cr:FileObject",
"@id": "ml-25m.zip",
"contentUrl": "https://files.grouplens.org/datasets/movielens/ml-25m.zip",
"encodingFormat": "application/zip",
"sha256": "6b51fb2759a8657d3bfcbfc42b592ada"
},
{
"@type": "cr:FileObject",
"@id": "ratings-table",
"contentUrl": "ratings.csv",
"containedIn": { "@id": "ml-25m.zip" },
"encodingFormat": "text/csv"
},
{
"@type": "cr:FileObject",
"@id": "movies-table",
"contentUrl": "movies.csv",
"containedIn": { "@id": "ml-25m.zip" },
"encodingFormat": "text/csv"
}
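The effect of containedIn on contentUrl can be sketched in Python: the contentUrl of the inner FileObject is looked up as a path inside the container. Here a small in-memory zip archive stands in for the downloaded container; the helper `read_contained` and the file contents are hypothetical.

```python
import io
import zipfile

def read_contained(archive_bytes: bytes, content_url: str) -> bytes:
    """Read `content_url` as a path relative to the root of a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(content_url)

# Build a tiny in-memory archive standing in for a downloaded container.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ratings.csv", "userId,movieId,rating\n1,296,5.0\n")
    archive.writestr("movies.csv", "movieId,title\n296,Some Movie\n")

ratings_bytes = read_contained(buffer.getvalue(), "ratings.csv")
print(ratings_bytes.decode("utf-8").splitlines()[0])
```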
FilePropertyEnumeration Class
URI: http://mlcommons.org/croissant/FilePropertyEnumeration
Description: Specifies a property of a FileObject. One of "fullPath" or "fileName".
Subclass of:
FileSet Class
URI: http://mlcommons.org/croissant/FileSet
Description: A set of homogeneous files extracted from a container, optionally filtered by inclusion and/or exclusion filters.
Subclass of:
Properties:
- containedIn (→ FileObject, FileSet)
- excludes (→ Text)
- includes (→ Text)
Documentation
In many datasets, data comes in the form of collections of homogeneous files, such as images, videos or text files, where each file needs to be treated as an individual item, e.g., as a training example. FileSet is a class that describes such collections of files.
A FileSet is a set of files located in a container, which can be an archive FileObject or a "manifest" file. A FileSet may also specify inclusion / exclusion filters using file patterns.
FileSet extends schema.org/Intangible.
Properties
Property | Expected Type | Cardinality | Description |
---|---|---|---|
containedIn | Reference | MANY | The source of data for the FileSet, e.g., an archive. If multiple values are provided, then the union of their contents is taken |
includes | Text | MANY | A glob pattern that specifies the files to include |
excludes | Text | MANY | A glob pattern that specifies the files to exclude |
Pattern Processing
The includes and excludes properties use glob patterns, a common mechanism to specify a set of files along a path, like "*.jpg" for all jpg images, or "*/foo/pic*.jpg" for all jpg images under the "foo" directory whose filename starts with "pic".
To get the set of FileObjects included in the FileSet:
1. The includes pattern(s) are evaluated first
2. If multiple includes are specified, the union of their results is taken
3. Then all the files corresponding to the excludes patterns are removed from that set
4. Patterns are evaluated from the root of the containedIn contents (e.g., the top level directory extracted from an archive)
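The steps above can be sketched in Python with the standard fnmatch module. This is only an approximation under stated assumptions: `select_files` is a hypothetical helper, the paths are made up and relative to the container root, and fnmatch lets `*` match "/" as well, which may differ from the glob semantics a given Croissant implementation uses.

```python
from fnmatch import fnmatchcase

def select_files(paths, includes, excludes=()):
    """Union of all `includes` matches, minus anything matching `excludes`."""
    included = [p for p in paths if any(fnmatchcase(p, pat) for pat in includes)]
    return [p for p in included if not any(fnmatchcase(p, pat) for pat in excludes)]

# Hypothetical archive listing, relative to the root of the container.
paths = [
    "flores200_dataset/dev/eng_Latn.dev",
    "flores200_dataset/dev/fra_Latn.dev",
    "flores200_dataset/devtest/eng_Latn.devtest",
    "flores200_dataset/README",
]

dev = select_files(paths, ["flores200_dataset/dev/*.dev"], ["*fra_*"])
both = select_files(
    paths,
    ["flores200_dataset/dev/*.dev", "flores200_dataset/devtest/*.devtest"],
)
print(dev)
print(len(both))
```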
Examples
Simple Image Archive
{
"@type": "cr:FileObject",
"@id": "train2014.zip",
"contentSize": "13510573713 B",
"contentUrl": "http://images.cocodataset.org/zips/train2014.zip",
"encodingFormat": "application/zip",
"sha256": "sha256"
},
{
"@type": "cr:FileSet",
"@id": "image-files",
"containedIn": { "@id": "train2014.zip" },
"encodingFormat": "image/jpeg",
"includes": "*.jpg"
}
Complex Archive with Multiple FileSets
{
"@type": "cr:FileObject",
"@id": "flores200_dataset.tar.gz",
"description": "Flores 200 is hosted on a webserver.",
"contentSize": "25585843 B",
"contentUrl": "https://tinyurl.com/flores200dataset",
"encodingFormat": "application/x-gzip",
"sha256": "c764ffdeee4894b3002337c5b1e70ecf6f514c00"
},
{
"@type": "cr:FileSet",
"@id": "files-dev",
"description": "dev files are inside the tar.",
"containedIn": { "@id": "flores200_dataset.tar.gz" },
"encodingFormat": "application/json",
"includes": "flores200_dataset/dev/*.dev"
},
{
"@type": "cr:FileSet",
"@id": "files-devtest",
"description": "devtest files are inside the tar.",
"containedIn": { "@id": "flores200_dataset.tar.gz" },
"encodingFormat": "application/json",
"includes": "flores200_dataset/devtest/*.devtest"
}
This example shows how multiple FileSets can be extracted from a single archive, each with different inclusion patterns to select different subsets of files.
Format Class
URI: http://mlcommons.org/croissant/Format
Description: Specifies how to parse the format of the data from a string representation. For example, format may hold a date format string, a number format, or a bounding box format.
Subclass of:
Documentation
A format string used to parse the values coming from a DataSource. For example, a date may be represented as the string "2022/11/10", and interpreted into the correct date via the format "yyyy/MM/dd". Formats correspond to a target data type.
Supported Format Types
Data types | Format | Example |
---|---|---|
sc:Date, sc:DateTime | CLDR Date/Time Patterns | MM/dd/yyyy |
sc:Number, sc:Float, sc:Integer | CLDR Number and Currency patterns | 0.##E0 (scientific notation with max 2 decimals) |
cr:BoundingBox | Keras bounding box format | CENTER_XYWH |
Note: This list is not exhaustive, and not all Croissant implementations will support all formats.
Examples
Date Format Parsing
{
"source": {
"fileObject": { "@id": "metadata" },
"extract": { "column": "datetaken" },
"format": "%Y-%m-%d %H:%M:%S.%f"
}
}
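The format string in this example uses strptime-style directives rather than a CLDR pattern, so in Python it can be handed directly to datetime.strptime. The sketch below parses one made-up value; a CLDR pattern such as MM/dd/yyyy would need a different parser.

```python
from datetime import datetime

# Format string taken from the example above.
FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Hypothetical raw value as it might appear in the "datetaken" column.
raw = "2013-11-27 13:44:37.0"
parsed = datetime.strptime(raw, FORMAT)
print(parsed.isoformat())
```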
Bounding Box Format
{
"@type": "cr:Field",
"@id": "images/annotations/bbox",
"description": "The bounding box around annotated object[s].",
"dataType": "cr:BoundingBox",
"source": {
"fileSet": { "@id": "instancesperson_keypoints_annotations" },
"extract": { "column": "bbox" },
"format": "CENTER_XYWH"
}
}
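Declaring the bounding box format lets a consumer convert boxes to whatever layout it needs. As a small sketch, the Python function below converts a CENTER_XYWH box (center x, center y, width, height) to corner coordinates; the function name and the sample box are hypothetical.

```python
def center_xywh_to_xyxy(box):
    """Convert [center_x, center_y, width, height] to [left, top, right, bottom]."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

# Hypothetical box: centered at (50, 40), 20 wide and 10 high.
corners = center_xywh_to_xyxy([50.0, 40.0, 20.0, 10.0])
print(corners)
```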
Usage
Format specifications are typically used within DataSource definitions to ensure that string representations of structured data (like dates, numbers, or coordinates) are correctly parsed into their intended data types. This is particularly important for ML datasets where precise data interpretation is crucial for model training and evaluation.
RecordSet Class
URI: http://mlcommons.org/croissant/RecordSet
Description: A description of a set of structured records from one or more data sources and their structure, expressed as a set of fields.
Subclass of:
Properties:
- data (→ rdf:JSON)
- dataType (→ DataType)
- examples (→ rdf:JSON)
- field (→ Field)
- key (→ Field)
- source (→ DataSource, FileObject, FileSet, RecordSet)
Documentation
A RecordSet describes a set of structured records obtained from one or more data sources (typically a file or set of files) and the structure of these records, expressed as a set of fields (e.g., the columns of a table). A RecordSet can represent flat or nested data.
Purpose
RecordSet provides a common structure description that can be used across different modalities, in terms of records that may contain multiple fields. It handles:
- Unstructured content (like text and images) as single-field records
- Tabular data as one record per row in the table, with fields for each column
- Tree-structured data with nested and repeated fields
RecordSet is a subclass of schema.org/Intangible.
Properties
Property | Expected Type | Cardinality | Description |
---|---|---|---|
field | Field | MANY | A data element that appears in the records of the RecordSet (e.g., one column of a table) |
key | Text | MANY | One or more fields whose values uniquely identify each record in the RecordSet |
data | JSON | MANY | One or more records that constitute the data of the RecordSet |
examples | JSON, URL | MANY | One or more records provided as example content of the RecordSet, or a reference to a data source that contains examples |
Additional Features
- Embedding: Supports embedding small enumerations directly via the data property
- Typing: Supports typing with dataType for entire RecordSets
- Joins: Supports joins through field references (foreign keys)
- Hierarchical: Supports hierarchical structures with nested records
Examples
Simple Tabular RecordSet
{
"@type": "cr:RecordSet",
"@id": "ratings",
"key": [{ "@id": "ratings/user_id" }, { "@id": "ratings/movie_id" }],
"field": [
{
"@type": "cr:Field",
"@id": "ratings/user_id",
"dataType": "sc:Integer",
"source": {
"fileObject": { "@id": "ratings-table" },
"extract": { "column": "userId" }
}
},
{
"@type": "cr:Field",
"@id": "ratings/rating",
"description": "The score of the rating on a five-star scale.",
"dataType": "sc:Float",
"source": {
"fileObject": { "@id": "ratings-table" },
"extract": { "column": "rating" }
}
}
]
}
Enumeration with Embedded Data
{
"@type": "cr:RecordSet",
"@id": "gender_enum",
"description": "Maps gender ids (0, 1) to labeled values.",
"key": { "@id": "gender_enum/id" },
"field": [
{ "@id": "gender_enum/id", "@type": "cr:Field", "dataType": "sc:Integer" },
{ "@id": "gender_enum/label", "@type": "cr:Field", "dataType": "sc:Text" }
],
"data": [
{ "gender_enum/id": 0, "gender_enum/label": "Male" },
{ "gender_enum/id": 1, "gender_enum/label": "Female" }
]
}
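An embedded enumeration like this one is easy to turn into a lookup table on the consumer side. The Python sketch below indexes the inlined records by the field named in key; it only mirrors the relevant parts of the example above and is not a prescribed way of loading a RecordSet.

```python
# The parts of the RecordSet above that matter for the lookup.
record_set = {
    "@id": "gender_enum",
    "key": {"@id": "gender_enum/id"},
    "data": [
        {"gender_enum/id": 0, "gender_enum/label": "Male"},
        {"gender_enum/id": 1, "gender_enum/label": "Female"},
    ],
}

key_field = record_set["key"]["@id"]

# Index records by key value; key values are expected to be unique.
lookup = {row[key_field]: row for row in record_set["data"]}
keys_are_unique = len(lookup) == len(record_set["data"])

label = lookup[1]["gender_enum/label"]
print(label, keys_are_unique)
```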
Geographic Data with Type Mapping
{
"@id": "cities",
"@type": "cr:RecordSet",
"dataType": "sc:GeoCoordinates",
"field": [
{
"@id": "cities/latitude",
"@type": "cr:Field"
},
{
"@id": "cities/longitude",
"@type": "cr:Field"
}
]
}
This example shows how RecordSets can be typed with semantic types like sc:GeoCoordinates, and fields can be implicitly mapped to properties of that type (latitude and longitude).
Transform Class
URI: http://mlcommons.org/croissant/Transform
Description: Specifies how to transform data extracted from a DataSource. The type of transformation depends on the type of content, e.g., a regular expression to apply on text, or a jsonQuery to transform JSON content.
Subclass of:
Properties:
Documentation
Croissant supports a few simple transformations that can be applied on the source data. Transformations are used to modify extracted data before it's included in the final dataset.
Supported Transformations
- delimiter: Split a string into an array using the supplied character
- regex: A regular expression to parse the data
- jsonQuery: A JSON query to evaluate on the (JSON) data source
Examples
Regular Expression Transformation
{
"fileSet": {
"@id": "files"
},
"extract": {
"fileProperty": "fullpath"
},
"transform": {
"regex": "^(train|val|test)2014/.*\\.jpg$"
}
}
This example extracts the full path of each file and applies a regex to parse training/validation/test split information from the leading directory name.
Filename Parsing
{
"source": {
"fileSet": { "@id": "image-files" },
"extract": {
"fileProperty": "filename"
},
"transform": {
"regex": "([^\\/]*)\\.jpg"
}
}
}
This extracts the base filename (without path and extension) from image files.
Delimiter Transformation
{
"transform": {
"delimiter": ","
}
}
This would split a comma-separated string into an array of values.
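In Python terms, this transformation corresponds roughly to str.split. The sketch below is a minimal illustration with a made-up value; whether surrounding whitespace is trimmed or empty items are kept is an assumption of the sketch, not something this document specifies.

```python
def apply_delimiter(value: str, delimiter: str) -> list:
    """Split a string into a list of items on the given delimiter."""
    return value.split(delimiter)

# Hypothetical extracted value.
genres = apply_delimiter("action,comedy,drama", ",")
print(genres)
```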
JSON Query Transformation
{
"transform": {
"jsonQuery": "$.metadata.authors[*].name"
}
}
This would extract all author names from a JSON structure using a JSON query.
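To show what evaluating such a query involves, here is a small Python sketch that handles only the subset of path syntax used above: the root `$`, `.name` steps, and the `[*]` wildcard. The function `evaluate_path` and the sample document are hypothetical, and this is not a full JSONPath implementation.

```python
import re

def evaluate_path(document, path: str) -> list:
    """Evaluate a tiny path subset: `$`, `.name` steps, and `[*]` wildcards."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = [document]
    for name, wildcard in re.findall(r"\.(\w+)|(\[\*\])", path[1:]):
        if name:
            # Descend into the named member of every object reached so far.
            current = [n[name] for n in current if isinstance(n, dict) and name in n]
        else:
            # Expand every array reached so far into its items.
            current = [item for n in current if isinstance(n, list) for item in n]
    return current

# Hypothetical JSON document.
doc = {"metadata": {"title": "T", "authors": [{"name": "Ada"}, {"name": "Grace"}]}}
author_names = evaluate_path(doc, "$.metadata.authors[*].name")
print(author_names)
```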
Usage
Transformations are typically used within DataSource definitions, applied after data extraction but before final formatting. They provide a way to clean, parse, or restructure data to make it suitable for machine learning workflows without requiring external preprocessing steps.
Properties
citeAs Property
URI: http://mlcommons.org/croissant/citeAs
Description: How to cite this dataset. Ideally, citations should be expressed using the bibtex format. Note that this is different from schema.org/citation, which is used to make a citation to another publication from this dataset.
Domain:
Range:
column Property
URI: http://mlcommons.org/croissant/column
Description: In case the data source is tabular, the id of a column to extract.
Domain:
Range:
containedIn Property
URI: http://mlcommons.org/croissant/containedIn
Description: Another FileObject or FileSet that this one is contained in, e.g., in the case of a file extracted from an archive. When this property is present, the contentUrl is evaluated as a relative path within the container object.
Domain:
Range:
content Property
URI: http://mlcommons.org/croissant/content
Description: What to extract from the data source content, e.g., lines.
Domain:
Range:
data Property
URI: http://mlcommons.org/croissant/data
Description: One or more inlined records that constitute the data of the RecordSet, typically used for small enumeration values.
Domain:
Range:
dataType Property
URI: http://mlcommons.org/croissant/dataType
Description: The data type of the field, identified by the URI of the corresponding class. It could be either an atomic type (e.g., sc:Integer) or a semantic type (e.g., sc:GeoLocation).
Domain:
Range:
delimiter Property
URI: http://mlcommons.org/croissant/delimiter
Description: A delimiter to use to parse the data into an array.
Domain:
Range:
equivalentProperty Property
URI: http://mlcommons.org/croissant/equivalentProperty
Description: A property that is equivalent to this Field. Used in the case a dataType is specified on the RecordSet to map specific fields to specific properties associated with that dataType.
Domain:
Range:
examples Property
URI: http://mlcommons.org/croissant/examples
Description: One or more inlined records provided as example content of the RecordSet.
Domain:
Range:
excludes Property
URI: http://mlcommons.org/croissant/excludes
Description: A glob pattern that specifies the files to exclude. The pattern is evaluated from the root of the containedIn contents, after the includes patterns have been evaluated.
Domain:
Range:
extract Property
URI: http://mlcommons.org/croissant/extract
Description: The extraction method from the provided source.
Domain:
Range:
field Property
URI: http://mlcommons.org/croissant/field
Description: A data element that appears in the records of the RecordSet (e.g., one column of a table).
Domain:
Range:
fileObject Property
URI: http://mlcommons.org/croissant/fileObject
Description: The id of a FileObject that is the source of the data.
Domain:
Range:
fileProperty Property
URI: http://mlcommons.org/croissant/fileProperty
Description: The file property to extract from the data source metadata, e.g., the filename.
Domain:
Range:
fileSet Property
URI: http://mlcommons.org/croissant/fileSet
Description: The id of a FileSet that is the source of the data.
Domain:
Range:
format Property
URI: http://mlcommons.org/croissant/format
Description: A format to parse the values of the data from text, e.g., a date format or number format.
Domain:
Range:
includes Property
URI: http://mlcommons.org/croissant/includes
Description: A glob pattern that specifies the files to include, e.g., "*.jpg", "*/foo/pic*.jpg". The pattern is evaluated from the root of the containedIn contents.
Domain:
Range:
isLiveDataset Property
URI: http://mlcommons.org/croissant/isLiveDataset
Description: Indicates that the dataset is continuously updated instead of being versioned.
Domain:
Range:
jsonPath Property
URI: http://mlcommons.org/croissant/jsonPath
Description: In case the data source is JSON data, a path expression to extract a subset of the data.
Domain:
Range:
jsonQuery Property
URI: http://mlcommons.org/croissant/jsonQuery
Description: For JSON content, a query to evaluate on the data.
Domain:
Range:
key Property
URI: http://mlcommons.org/croissant/key
Description: One or more fields whose values uniquely identify each record in the RecordSet. (See example below.)
Domain:
Range:
parentField Property
URI: http://mlcommons.org/croissant/parentField
Description: A special case of SubField that should be hidden because it references a Field that already appears in the RecordSet.
Domain:
Range:
recordSet Property
URI: http://mlcommons.org/croissant/recordSet
Description: The id of a RecordSet that is the source of the data.
Domain:
Range:
references Property
URI: http://mlcommons.org/croissant/references
Description: Another Field of another RecordSet that this field references. This is the equivalent of a foreign key reference in a relational database.
Domain:
Range:
regex Property
URI: http://mlcommons.org/croissant/regex
Description: A regular expression to apply to the data.
Domain:
Range:
repeated Property
URI: http://mlcommons.org/croissant/repeated
Description: If true, then the Field is a list of values of type dataType.
Domain:
Range:
source Property
URI: http://mlcommons.org/croissant/source
Description: The data source of the field. This will generally reference a FileObject or FileSet's contents (e.g., a specific column of a table).
Domain:
Range:
subField Property
URI: http://mlcommons.org/croissant/subField
Description: Another Field that is nested inside this one.
Domain:
Range:
transform Property
URI: http://mlcommons.org/croissant/transform
Description: A transformation to apply on source data on top of the extracted method as specified through extract, e.g., a regular expression or JSON query.
Domain:
Range: