Sometimes only a subset of the data from a source is needed. The Extract
class specifies which subset to take, depending on the type of the source data.

Source type | Property | Expected property value | Result |
---|---|---|---|
FileObject or FileSet | fileProperty | One of: fullpath, filename, content, lines, lineNumbers | The corresponding property for the FileObject |
CSV (FileObject) | column | A column name | Values in the specified column |
JSON | jsonPath | A JSONPath expression | The value(s) obtained by evaluating the JSONPath expression |
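
The column and jsonPath rows of the table can be illustrated with a minimal Python sketch. It is not how any Croissant library implements extraction: the helper names `extract_column` and `extract_json_path` are hypothetical, the sample data is made up, and the path resolver handles only the `$` root followed by plain `.name` steps rather than full JSONPath.

```python
import csv
import io
import json


def extract_column(csv_text: str, column: str) -> list[str]:
    """Return every value found in the named column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def extract_json_path(document: object, json_path: str) -> object:
    """Resolve a dotted path such as "$.metadata.title".

    Only the "$" root and ".name" child steps are handled here; real
    JSONPath also supports wildcards, filters, and array slices.
    """
    if not json_path.startswith("$"):
        raise ValueError("a JSONPath expression must start with '$'")
    value = document
    for key in json_path[1:].split("."):
        if key:  # skip the empty piece before the first "."
            value = value[key]
    return value


# Made-up sample data for illustration only.
csv_text = "userId,rating\n1,4.5\n2,3.0\n"
json_doc = json.loads('{"metadata": {"title": "Example dataset"}}')

user_ids = extract_column(csv_text, "userId")
title = extract_json_path(json_doc, "$.metadata.title")
print(user_ids)  # ['1', '2']
print(title)     # Example dataset
```

Values read from CSV stay strings in this sketch; converting them to a declared data type would be a separate step.
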
fullpath
: The full path to the file within the Croissant extraction or download folders. Example: data/train/metadata.csv

filename
: The name of the file. In data/train/metadata.csv, the file name is metadata.csv.

content
: The byte content of the file.

lines
: The byte content of each line in the file.

lineNumbers
: The number of each line in the file (starting from 0).

```json
{
  "extract": {
    "fileProperty": "content"
  }
}
```

```json
{
  "extract": {
    "column": "userId"
  }
}
```

```json
{
  "extract": {
    "jsonPath": "$.metadata.title"
  }
}
```

```json
{
  "extract": {
    "fileProperty": "filename"
  }
}
```

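
To make the fileProperty values concrete, here is a rough Python sketch of what each one could return for a single local file. The function `extract_file_property` is hypothetical, and two details are assumptions: the local path stands in for the path within the extraction or download folder, and line terminators are dropped from each line.

```python
import tempfile
from pathlib import Path


def extract_file_property(path: Path, file_property: str):
    """Return the requested property of one file (illustrative only)."""
    if file_property == "fullpath":
        return str(path)
    if file_property == "filename":
        return path.name
    if file_property == "content":
        return path.read_bytes()
    if file_property == "lines":
        # Assumption: each line is returned without its terminator.
        return path.read_bytes().splitlines()
    if file_property == "lineNumbers":
        # Line numbers start from 0.
        return list(range(len(path.read_bytes().splitlines())))
    raise ValueError(f"unknown fileProperty: {file_property!r}")


with tempfile.TemporaryDirectory() as tmp:
    # Made-up file mirroring the data/train/metadata.csv example.
    csv_path = Path(tmp) / "data" / "train" / "metadata.csv"
    csv_path.parent.mkdir(parents=True)
    csv_path.write_bytes(b"id,label\n1,cat\n")

    filename = extract_file_property(csv_path, "filename")
    content = extract_file_property(csv_path, "content")
    lines = extract_file_property(csv_path, "lines")
    line_numbers = extract_file_property(csv_path, "lineNumbers")

print(filename)      # metadata.csv
print(lines)         # [b'id,label', b'1,cat']
print(line_numbers)  # [0, 1]
```
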
This class is typically used within a DataSource
to specify exactly what part of the source data should be extracted for a particular field.
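
As an illustration of that usage, the sketch below builds one field whose source carries an extract and prints it as JSON. The layout (a `source` holding a `fileObject` reference next to `extract`) is assumed from typical Croissant examples, and the `@id` values, including the `ratings-table` FileObject, are made up.

```python
import json

# Hypothetical field: the "@id" values and the "ratings-table" FileObject
# are invented for illustration.
field = {
    "@type": "cr:Field",
    "@id": "ratings/user_id",
    "dataType": "sc:Integer",
    "source": {
        # Which resource the data comes from.
        "fileObject": {"@id": "ratings-table"},
        # Which part of that resource feeds this field.
        "extract": {"column": "userId"},
    },
}

field_json = json.dumps(field, indent=2)
print(field_json)
```

Here the extract narrows the referenced CSV file down to the values of its userId column.
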