A format string used to parse the values coming from a DataSource. For example, a date may be represented as the string “2022/11/10”, and interpreted into the correct date via the format “yyyy/MM/dd”. Formats correspond to a target data type.
Data types | Format | Example |
---|---|---|
sc:Date, sc:DateTime | CLDR Date/Time Patterns | MM/dd/yyyy |
sc:Number, sc:Float, sc:Integer | CLDR Number and Currency patterns | 0.##E0 (scientific notation with max 2 decimals) |
cr:BoundingBox | Keras bounding box format | CENTER_XYWH |
Note: This list is not exhaustive, and not all Croissant implementations will support all formats.
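As an illustration of how a CLDR-style date format could be applied, the sketch below translates a small subset of CLDR date fields into Python `strptime` directives and parses the “2022/11/10” example from above. The mapping table and the helper names `cldr_to_strptime` and `parse_with_cldr` are hypothetical and are not part of Croissant or of any Croissant implementation; full CLDR patterns contain many more fields than are handled here.

```python
import re
from datetime import datetime

# Hypothetical mapping for a small subset of CLDR date/time fields.
# Real CLDR patterns have many more fields (eras, time zones, quoted literals).
_CLDR_TO_STRPTIME = {
    "yyyy": "%Y",  # 4-digit year
    "MM": "%m",    # 2-digit month
    "dd": "%d",    # 2-digit day of month
    "HH": "%H",    # hour, 00-23
    "mm": "%M",    # minute
    "ss": "%S",    # second
    "%": "%%",     # escape literal percent signs for strptime
}
_TOKEN = re.compile("|".join(re.escape(k) for k in _CLDR_TO_STRPTIME))


def cldr_to_strptime(pattern: str) -> str:
    """Translate a supported CLDR pattern into strptime directives in one pass."""
    return _TOKEN.sub(lambda m: _CLDR_TO_STRPTIME[m.group(0)], pattern)


def parse_with_cldr(value: str, pattern: str) -> datetime:
    """Parse a string value using a (subset) CLDR date/time pattern."""
    return datetime.strptime(value, cldr_to_strptime(pattern))


parsed = parse_with_cldr("2022/11/10", "yyyy/MM/dd")
print(parsed.date())  # 2022-11-10
```

A single regular-expression pass is used rather than chained string replacements so that already-translated directives such as `%m` are never rewritten a second time.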
{
  "source": {
    "fileObject": { "@id": "metadata" },
    "extract": { "column": "datetaken" },
    "format": "%Y-%m-%d %H:%M:%S.%f"
  }
}
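The format string in this example uses strftime-style directives, which Python's `datetime.strptime` accepts directly. The following sketch shows how a consumer might apply the `format` of such a source to the extracted column values. The sample values and the helper `parse_column` are hypothetical; how a given Croissant implementation actually applies the format may differ.

```python
from datetime import datetime

# The DataSource from the example above, as a plain Python dict.
source = {
    "fileObject": {"@id": "metadata"},
    "extract": {"column": "datetaken"},
    "format": "%Y-%m-%d %H:%M:%S.%f",
}

# Hypothetical raw strings, standing in for the "datetaken" column.
raw_values = [
    "2012-03-04 05:06:07.000000",
    "2013-10-20 18:30:00.500000",
]


def parse_column(values, fmt):
    """Parse every string in a column with the same strptime format."""
    return [datetime.strptime(value, fmt) for value in values]


parsed = parse_column(raw_values, source["format"])
```

A value that does not match the format makes `datetime.strptime` raise `ValueError`, so malformed rows surface at load time instead of passing through as unparsed strings.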
{
  "@type": "cr:Field",
  "@id": "images/annotations/bbox",
  "description": "The bounding box around annotated object[s].",
  "dataType": "cr:BoundingBox",
  "source": {
    "fileSet": { "@id": "instancesperson_keypoints_annotations" },
    "extract": { "column": "bbox" },
    "format": "CENTER_XYWH"
  }
}
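To make the effect of a bounding box format concrete, the sketch below converts a box from `CENTER_XYWH` into corner coordinates. It assumes that `CENTER_XYWH` means `[center_x, center_y, width, height]`, as in the Keras bounding box formats; the function name `center_xywh_to_xyxy` is hypothetical and not part of Croissant.

```python
from typing import Sequence, Tuple


def center_xywh_to_xyxy(box: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert [center_x, center_y, width, height] to (left, top, right, bottom).

    Assumes the CENTER_XYWH layout described in the lead-in.
    """
    center_x, center_y, width, height = box
    half_w = width / 2
    half_h = height / 2
    return (
        center_x - half_w,  # left
        center_y - half_h,  # top
        center_x + half_w,  # right
        center_y + half_h,  # bottom
    )


corners = center_xywh_to_xyxy([50.0, 40.0, 20.0, 10.0])
print(corners)  # (40.0, 35.0, 60.0, 45.0)
```

The same four numbers would describe a different box under another format (for example, one that stores the top-left corner first), which is why the `format` has to be declared alongside the field.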
Format specifications are typically used within DataSource definitions to ensure that string representations of structured data (such as dates, numbers, or coordinates) are parsed into their intended data types. This matters for ML datasets, where a misinterpreted value, such as a date read with the day and month swapped, can silently distort model training and evaluation.