- DataFrame
- Parameters
- toDSV
- toCSV
- toTSV
- toPSV
- toText
- toJSON
- toDict
- toArray
- toCollection
- show
- dim
- transpose
- count
- countValue
- push
- replace
- distinct
- unique
- listColumns
- select
- withColumn
- restructure
- renameAll
- rename
- castAll
- cast
- drop
- chain
- filter
- where
- find
- map
- reduce
- reduceRight
- dropDuplicates
- dropMissingValues
- fillMissingValues
- shuffle
- sample
- bisect
- groupBy
- sortBy
- union
- join
- innerJoin
- fullJoin
- outerJoin
- leftJoin
- rightJoin
- diff
- head
- tail
- slice
- getRow
- setRow
- setRowInPlace
- setDefaultModules
- fromDSV
- fromText
- fromCSV
- fromTSV
- fromPSV
- fromJSON
DataFrame data structure providing an immutable, flexible and powerful way to manipulate data with columns and rows.
data (Array | Object | DataFrame) The data of the DataFrame.
columns Array The DataFrame column names.
options Object Additional options. Example: modules. (optional, default {})
Convert the DataFrame into delimiter-separated values text. You can also save the file if you are using Node.js.
args ...any
sep String Column separator. (optional, default ' ')
header Boolean Whether to write the header on the first line. If false, there will be no header. (optional, default true)
path String? The path to save the file. /!\ Works only on Node.js, not in the browser.
df.toDSV()
df.toDSV(';')
df.toDSV(';', true)
// From node.js only
df.toDSV(';', true, '/my/absolute/path/dataframe.txt')
Returns String The text file in raw string.
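To make the output shape concrete, here is a minimal plain JavaScript sketch of delimiter-separated serialization. It does not use the library: the helper name `toDelimitedText` and the sample data are hypothetical, and the library's own quoting or escaping rules may differ.

```javascript
// Hypothetical helper, not part of the DataFrame API: joins the column names
// and each row with a separator, one row per line.
function toDelimitedText(columns, rows, sep, header = true) {
  const lines = rows.map((row) => row.join(sep));
  if (header) {
    // The header, when requested, is the first line of the output.
    lines.unshift(columns.join(sep));
  }
  return lines.join('\n');
}

const text = toDelimitedText(['id', 'name'], [[1, 'a'], [2, 'b']], ';', true);
const textWithoutHeader = toDelimitedText(['id', 'name'], [[1, 'a']], ';', false);
console.log(text);
```

The CSV, TSV and PSV variants described below correspond to the same idea with the separator fixed to a comma, a tab or a pipe.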
Convert the DataFrame into a comma-separated values string. You can also save the file if you are using Node.js.
args ...any
header Boolean Whether to write the header on the first line. If false, there will be no header. (optional, default true)
path String? The path to save the file. /!\ Works only on Node.js, not in the browser.
df.toCSV()
df.toCSV(true)
// From node.js only
df.toCSV(true, '/my/absolute/path/dataframe.csv')
Returns String The csv file in raw string.
Convert the DataFrame into a tab-separated values string. You can also save the file if you are using Node.js.
args ...any
header Boolean Whether to write the header on the first line. If false, there will be no header. (optional, default true)
path String? The path to save the file. /!\ Works only on Node.js, not in the browser.
df.toTSV()
df.toTSV(true)
// From node.js only
df.toTSV(true, '/my/absolute/path/dataframe.tsv')
Returns String The tsv file in raw string.
Convert the DataFrame into a pipe-separated values string. You can also save the file if you are using Node.js.
args ...any
header Boolean Whether to write the header on the first line. If false, there will be no header. (optional, default true)
path String? The path to save the file. /!\ Works only on Node.js, not in the browser.
df.toPSV()
df.toPSV(true)
// From node.js only
df.toPSV(true, '/my/absolute/path/dataframe.psv')
Returns String The psv file in raw string.
Convert the DataFrame into delimiter-separated values text. Alias for .toDSV. You can also save the file if you are using Node.js.
args ...any
sep String Column separator. (optional, default ' ')
header Boolean Whether to write the header on the first line. If false, there will be no header. (optional, default true)
path String? The path to save the file. /!\ Works only on Node.js, not in the browser.
df.toText()
df.toText(';')
df.toText(';', true)
// From node.js only
df.toText(';', true, '/my/absolute/path/dataframe.txt')
Returns String The text file in raw string.
Convert the DataFrame into a JSON string. You can also save the file if you are using Node.js.
args ...any
asCollection Boolean Whether to write the JSON as a collection of Objects. (optional, default false)
path String? The path to save the file. /!\ Works only on Node.js, not in the browser.
df.toJSON()
// From node.js only
df.toJSON(false, '/my/absolute/path/dataframe.json')
Returns String The json file in raw string.
Convert DataFrame into dict / hash / object.
df.toDict()
Returns Object The DataFrame converted into dict.
Convert DataFrame into Array of Arrays. You can also extract only one column as Array.
columnName String? Column name to extract. By default, all columns are transformed.
df.toArray()
Returns Array The DataFrame (or the column) converted into Array.
Convert DataFrame into Array of dictionaries. You can also return Rows instead of dictionaries.
ofRows Boolean? Return a collection of Rows instead of dictionaries.
df.toCollection()
Returns Array The DataFrame converted into Array of dictionaries (or Rows).
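The difference between the Array of Arrays, single-column and collection shapes can be sketched in plain JavaScript. This is an illustration on hypothetical sample data, not the library's implementation.

```javascript
// Hypothetical sample data: column names plus rows stored as arrays,
// i.e. the "Array of Arrays" shape.
const columns = ['id', 'name'];
const rows = [[1, 'a'], [2, 'b']];

// Array of Arrays -> Array of dictionaries keyed by column name.
const collection = rows.map((row) =>
  Object.fromEntries(columns.map((name, i) => [name, row[i]]))
);

// Extracting one column as a flat Array.
const names = rows.map((row) => row[columns.indexOf('name')]);

console.log(collection); // [{ id: 1, name: 'a' }, { id: 2, name: 'b' }]
console.log(names);      // ['a', 'b']
```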
Display the DataFrame as a string table. It can also return the string without displaying the DataFrame.
rows Number The number of lines to display. (optional, default 10)
quiet Boolean Quiet mode. If true, only returns a string instead of calling console.log(). (optional, default false)
df.show()
df.show(10)
const stringDF = df.show(10, true)
Returns String The DataFrame as String Table.
Get the DataFrame dimensions.
const [height, width] = df.dim()
Returns Array The DataFrame dimensions. [height, width]
Transpose a DataFrame. Rows become columns and vice versa. n x p => p x n.
transposeColumnNames Boolean An option to transpose the column names into a rowNames column. (optional, default false)
df.transpose()
Returns DataFrame A new transposed DataFrame.
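The n x p => p x n transformation can be illustrated on a plain Array of Arrays. The helper `transposeRows` below is hypothetical and ignores column names.

```javascript
// Hypothetical helper: transposes an n x p Array of Arrays into p x n.
function transposeRows(rows) {
  if (rows.length === 0) return [];
  // For each column index of the first row, collect that cell from every row.
  return rows[0].map((_, col) => rows.map((row) => row[col]));
}

const transposed = transposeRows([[1, 2, 3], [4, 5, 6]]);
console.log(transposed); // [[1, 4], [2, 5], [3, 6]]
```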
Get the number of rows.
df.count()
Returns Int The number of DataFrame rows.
Count the occurrences of a value in a column.
valueToCount The value to count in the selected column.
columnName String The column in which to count the value. (optional, default this.listColumns()[0])
df.countValue(5, 'column2')
df.select('column1').countValue(5)
Returns Int The number of times the selected value appears.
Push new rows into the DataFrame.
rows (Array | Row) The rows to add.
df.push([1,2,3], [1,4,9])
Returns DataFrame A new DataFrame with the new rows.
Replace a value with another in the whole DataFrame or in selected columns.
value The value to replace.
replacement The new value.
columnNames (String | Array) The columns in which to apply the replacement. (optional, default this.listColumns())
df.replace(undefined, 0, ['column1', 'column2'])
Returns DataFrame A new DataFrame with replaced values.
Compute the unique values of a column.
columnName String The column from which to extract distinct values.
df.distinct('column1')
Returns DataFrame A DataFrame containing the column with distinct values.
Compute the unique values of a column. Alias of .distinct()
columnName String The column from which to extract distinct values.
df.unique('column1')
Returns DataFrame A DataFrame containing the column with distinct values.
List DataFrame columns.
df.listColumns()
Returns Array An Array containing DataFrame columnNames.
Select columns in the DataFrame.
columnNames ...String The columns to select.
df.select('column1', 'column3')
Returns DataFrame A new DataFrame containing selected columns.
Add a new column or set an existing one.
columnName String The column to modify or to create.
func Function The function to create the column. (optional, default (row, index) => undefined)
df.withColumn('column4', () => 2)
df.withColumn('column2', (row) => row.get('column2') * 2)
Returns DataFrame A new DataFrame containing the new or modified column.
Modify the structure of the DataFrame by changing columns order, creating new columns or removing some columns.
newColumnNames Array The new columns of the DataFrame.
df.restructure(['column1', 'column4', 'column2', 'column3'])
df.restructure(['column1', 'column4'])
df.restructure(['column1', 'newColumn', 'column4'])
Returns DataFrame A new DataFrame with restructured columns (reordered, added or deleted).
Rename each column.
newColumnNames Array The new column names of the DataFrame.
df.renameAll(['column1', 'column3', 'column4'])
Returns DataFrame A new DataFrame with the new column names.
Rename a column.
df.rename('column1', 'columnRenamed')
Returns DataFrame A new DataFrame with the new column name.
Cast each column into a given type.
typeFunctions Array The functions used to cast columns.
df.castAll([Number, String, (val) => new CustomClass(val)])
Returns DataFrame A new DataFrame with the columns having new types.
Cast a column into a given type.
columnName String The column to cast.
typeFunction Function The function used to cast the column.
df.cast('column1', Number)
df.cast('column1', (val) => new MyCustomClass(val))
Returns DataFrame A new DataFrame with the column having a new type.
Remove a single column.
columnName String The column to drop.
df.drop('column2')
Returns DataFrame A new DataFrame without the dropped column.
Chain map and filter functions on a DataFrame while optimizing their execution. If a function returns a boolean, it is treated as a filter; otherwise it is treated as a map. It can be 10 to 100 times faster than standard chains of .map() and .filter().
funcs ...Function Functions to apply on the DataFrame rows, each taking the row as parameter.
df.chain(
    row => row.get('column1') > 3, // filter
    row => row.set('column1', 3), // map
    row => row.get('column2') === '5' // filter
)
Returns DataFrame A new DataFrame with modified rows.
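The "boolean means filter, anything else means map" rule can be sketched as a single pass over the rows. The helper `chainRows` below is a hypothetical illustration on plain objects; how the library actually fuses the functions internally is not shown here.

```javascript
// Hypothetical helper: applies every function to each row in one pass.
// A boolean result is treated as a filter verdict, any other result as the
// transformed row that is handed to the next function.
function chainRows(rows, ...funcs) {
  const result = [];
  for (const original of rows) {
    let row = original;
    let keep = true;
    for (const func of funcs) {
      const out = func(row);
      if (out === false) {
        keep = false; // rejected by a filter, skip the remaining functions
        break;
      }
      if (out !== true) row = out; // map: continue with the new row
    }
    if (keep) result.push(row);
  }
  return result;
}

const sampleRows = [{ a: 1, b: '5' }, { a: 4, b: '5' }, { a: 7, b: '6' }];
const chained = chainRows(
  sampleRows,
  (row) => row.a > 3,          // filter
  (row) => ({ ...row, a: 3 }), // map
  (row) => row.b === '5'       // filter
);
console.log(chained); // [{ a: 3, b: '5' }]
```

Because a rejected row stops early and no intermediate collection is built between steps, a fused pass of this kind avoids the repeated traversals of separate .map() and .filter() calls.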
Filter DataFrame rows.
df.filter(row => row.get('column1') >= 3)
df.filter({'column2': 5, 'column1': 3})
Returns DataFrame A new filtered DataFrame.
Filter DataFrame rows. Alias of .filter()
df.where(row => row.get('column1') >= 3)
df.where({'column2': 5, 'column1': 3})
Returns DataFrame A new filtered DataFrame.
Find a row (the first match) based on a condition.
df.find(row => row.get('column1') === 3)
df.find({'column1': 3})
Returns Row The targeted Row.
Map a function over DataFrame rows. /!\ Prefer .chain().
func Function A function to apply on each row, taking the row as parameter.
df.map(row => row.set('column1', row.get('column1') * 2))
Returns DataFrame A new DataFrame with modified rows.
Reduce DataFrame into a value.
func Function The reduce function taking 2 parameters, previous and next.
init The initial value of the reducer.
df.reduce((p, n) => n.get('column1') + p, 0)
df2.reduce((p, n) => (
    n.set('column1', p.get('column1') + n.get('column1'))
     .set('column2', p.get('column2') + n.get('column2'))
))
Returns any A reduced value.
Reduce DataFrame into a value, starting from the last row (see .reduce()).
func Function The reduce function taking 2 parameters, previous and next.
init The initial value of the reducer.
df.reduceRight((p, n) => p > n ? p : n, 0)
Returns any A reduced value.
Return a DataFrame without duplicated rows.
columnNames ...String The columns used to check the uniqueness of rows. If omitted, uniqueness is checked on all columns.
df.dropDuplicates('id', 'name')
Returns DataFrame A DataFrame without duplicated rows.
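A minimal sketch of row de-duplication on selected columns, written against plain objects. The helper `dropDuplicateRows` is hypothetical, and keeping the first occurrence is an assumption of this sketch rather than a documented guarantee of the library.

```javascript
// Hypothetical helper: keeps the first row seen for each distinct combination
// of the given keys (all keys of the row when none are given).
function dropDuplicateRows(rows, ...keys) {
  const seen = new Set();
  return rows.filter((row) => {
    const names = keys.length > 0 ? keys : Object.keys(row);
    // Serialize the selected values so they can be compared as one signature.
    const signature = JSON.stringify(names.map((name) => row[name]));
    if (seen.has(signature)) return false;
    seen.add(signature);
    return true;
  });
}

const people = [
  { id: 1, name: 'a', age: 30 },
  { id: 1, name: 'a', age: 31 },
  { id: 2, name: 'b', age: 30 },
];
const uniqueById = dropDuplicateRows(people, 'id', 'name'); // 2 rows
const uniqueOnAll = dropDuplicateRows(people);              // 3 rows
```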
Return a DataFrame without rows containing missing values (undefined, NaN, null).
columnNames Array The columns to consider. All columns are considered by default.
df.dropMissingValues(['id', 'name'])
Returns DataFrame A DataFrame without rows containing missing values.
Return a DataFrame with missing values (undefined, NaN, null) filled with a default value.
replacement The new value.
columnNames Array The columns to consider. All columns are considered by default.
df.fillMissingValues(0, ['id', 'name'])
Returns DataFrame A DataFrame with missing values replaced.
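The notion of "missing value" used above (undefined, NaN, null) can be expressed as a small predicate. The helpers `isMissing` and `fillMissing` below are hypothetical and operate on plain objects, not on DataFrame rows.

```javascript
// Missing values as described above: undefined, NaN and null.
const isMissing = (value) =>
  value === undefined || value === null || Number.isNaN(value);

// Hypothetical helper: returns new rows with missing values replaced in the
// given columns (all columns of the row when none are given).
function fillMissing(rows, replacement, columnNames) {
  return rows.map((row) => {
    const names = columnNames || Object.keys(row);
    const filled = { ...row }; // copy, so the input rows are not mutated
    for (const name of names) {
      if (isMissing(filled[name])) filled[name] = replacement;
    }
    return filled;
  });
}

const filledRows = fillMissing(
  [{ id: 1, name: null }, { id: NaN, name: 'b' }],
  0,
  ['id', 'name']
);
console.log(filledRows); // [{ id: 1, name: 0 }, { id: 0, name: 'b' }]
```

Dropping rows instead of filling them is the same predicate used inside a filter.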
Return a DataFrame with shuffled rows.
df.shuffle()
Returns DataFrame A shuffled DataFrame.
Return a random sample of rows.
percentage Number A percentage of the original DataFrame giving the sample size.
df.sample(0.3)
Returns DataFrame A sample DataFrame.
Randomly split a DataFrame into 2 DataFrames.
percentage Number A percentage of the original DataFrame giving the first DataFrame size. The second takes the rest.
const [df30, df70] = df.bisect(0.3)
Returns Array An Array containing the two DataFrames. First, the X% DataFrame then the rest DataFrame.
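One way to picture a random split is "shuffle, then cut". The sketch below uses a Fisher-Yates shuffle on a plain array; the helper `bisectRows` and the rounding of the cut point are assumptions of this sketch, not a description of the library's internals.

```javascript
// Hypothetical helper: shuffles a copy of the rows (Fisher-Yates), then cuts
// the copy at the requested fraction.
function bisectRows(rows, percentage) {
  const shuffled = rows.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const cut = Math.round(shuffled.length * percentage);
  return [shuffled.slice(0, cut), shuffled.slice(cut)];
}

const allRows = Array.from({ length: 10 }, (_, i) => i);
const [firstPart, restPart] = bisectRows(allRows, 0.3);
console.log(firstPart.length, restPart.length); // 3 7
```

Sampling follows the same idea, keeping only the first part.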
Group DataFrame rows by columns giving a GroupedDataFrame object. See its doc for more examples.
args ...any
columnNames ...String The columns used for the groupBy.
df.groupBy('column1')
df.groupBy('column1', 'column2')
df.groupBy('column1', 'column2').listGroups()
df.groupBy('column1', 'column2').show()
df.groupBy('column1', 'column2').aggregate((group) => group.count())
Returns GroupedDataFrame A GroupedDataFrame object.
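The group-then-aggregate pattern in the last example can be sketched on plain objects. The helper `groupAndAggregate` and the output key `aggregation` are hypothetical choices for this illustration; see the GroupedDataFrame documentation for the actual behavior.

```javascript
// Hypothetical helper: groups rows by the values of the given keys, then
// applies an aggregation function to each group.
function groupAndAggregate(rows, keys, aggregate) {
  const groups = new Map(); // a Map keeps groups in first-seen order
  for (const row of rows) {
    const signature = JSON.stringify(keys.map((key) => row[key]));
    if (!groups.has(signature)) groups.set(signature, []);
    groups.get(signature).push(row);
  }
  return Array.from(groups.values(), (group) => {
    const result = {};
    for (const key of keys) result[key] = group[0][key];
    result.aggregation = aggregate(group);
    return result;
  });
}

const counts = groupAndAggregate(
  [{ column1: 'x', v: 1 }, { column1: 'y', v: 2 }, { column1: 'x', v: 3 }],
  ['column1'],
  (group) => group.length
);
console.log(counts);
// [{ column1: 'x', aggregation: 2 }, { column1: 'y', aggregation: 1 }]
```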
Sort DataFrame rows based on column values. Each sorted column should contain only one variable type. Columns are sorted left-to-right.
columnNames (String | Array<string>) The columns giving the order.
reverse Boolean Reverse mode. Reverses the order if true. (optional, default false)
missingValuesPosition String Defines the position of missing values (undefined, null and NaN) in the order. (optional, default 'first')
df.sortBy('id')
df.sortBy(['id1', 'id2'])
df.sortBy(['id1'], true)
Returns DataFrame An ordered DataFrame.
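A left-to-right, multi-column sort with missing values placed first can be sketched with a comparator on plain objects. The helper `sortRows` is hypothetical; in particular, keeping missing values first even in reverse mode is a choice of this sketch only.

```javascript
const isMissingValue = (v) => v === undefined || v === null || Number.isNaN(v);

// Hypothetical helper: sorts a copy of the rows by the given columns, compared
// left to right. Missing values always come first in this sketch.
function sortRows(rows, columnNames, reverse = false) {
  const direction = reverse ? -1 : 1;
  return rows.slice().sort((a, b) => {
    for (const name of columnNames) {
      const x = a[name];
      const y = b[name];
      if (isMissingValue(x) && isMissingValue(y)) continue;
      if (isMissingValue(x)) return -1;
      if (isMissingValue(y)) return 1;
      if (x < y) return -direction;
      if (x > y) return direction;
      // Equal on this column: fall through to the next one.
    }
    return 0;
  });
}

const sorted = sortRows(
  [
    { id1: 2, id2: 'b' },
    { id1: 1, id2: 'z' },
    { id1: null, id2: 'a' },
    { id1: 2, id2: 'a' },
  ],
  ['id1', 'id2']
);
console.log(sorted.map((row) => [row.id1, row.id2]));
// [[null, 'a'], [1, 'z'], [2, 'a'], [2, 'b']]
```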
Concat two DataFrames.
dfToUnion DataFrame The DataFrame to concat.
df.union(df2)
Returns DataFrame A new concatenated DataFrame resulting from the union.
Join two DataFrames.
dfToJoin DataFrame The DataFrame to join.
columnNames (String | Array) The selected columns for the join.
how String The join mode. Can be: full, inner, outer, left, right. (optional, default 'inner')
df.join(df2, 'column1', 'full')
Returns DataFrame The joined DataFrame.
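To illustrate how the join mode changes the result, here is a plain JavaScript sketch covering only the inner and left modes on a single key. The helper `joinRows` is hypothetical; how the library fills columns for unmatched rows is not reproduced here.

```javascript
// Hypothetical helper: joins two arrays of plain objects on one key.
// Only 'inner' and 'left' are handled, to keep the sketch short.
function joinRows(left, right, key, how = 'inner') {
  // Index the right side by key so each left row finds its matches directly.
  const index = new Map();
  for (const row of right) {
    if (!index.has(row[key])) index.set(row[key], []);
    index.get(row[key]).push(row);
  }
  const joined = [];
  for (const row of left) {
    const matches = index.get(row[key]) || [];
    for (const match of matches) joined.push({ ...row, ...match });
    // A left join also keeps left rows that have no match on the right.
    if (matches.length === 0 && how === 'left') joined.push({ ...row });
  }
  return joined;
}

const users = [{ id: 1, name: 'a' }, { id: 2, name: 'b' }];
const orders = [{ id: 1, total: 10 }, { id: 1, total: 20 }, { id: 3, total: 5 }];
const innerJoined = joinRows(users, orders, 'id');        // 2 rows, id 1 only
const leftJoined = joinRows(users, orders, 'id', 'left'); // 3 rows, id 2 kept
```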
Join two DataFrames with inner mode.
dfToJoin DataFrame The DataFrame to join.
columnNames (String | Array) The selected columns for the join.
df.innerJoin(df2, 'id')
df.join(df2, 'id')
df.join(df2, 'id', 'inner')
Returns DataFrame The joined DataFrame.
Join two DataFrames with full mode.
dfToJoin DataFrame The DataFrame to join.
columnNames (String | Array) The selected columns for the join.
df.fullJoin(df2, 'id')
df.join(df2, 'id', 'full')
Returns DataFrame The joined DataFrame.
Join two DataFrames with outer mode.
dfToJoin DataFrame The DataFrame to join.
columnNames (String | Array) The selected columns for the join.
df.outerJoin(df2, 'id')
df.join(df2, 'id', 'outer')
Returns DataFrame The joined DataFrame.
Join two DataFrames with left mode.
dfToJoin DataFrame The DataFrame to join.
columnNames (String | Array) The selected columns for the join.
df.leftJoin(df2, 'id')
df.join(df2, 'id', 'left')
Returns DataFrame The joined DataFrame.
Join two DataFrames with right mode.
dfToJoin DataFrame The DataFrame to join.
columnNames (String | Array) The selected columns for the join.
df.rightJoin(df2, 'id')
df.join(df2, 'id', 'right')
Returns DataFrame The joined DataFrame.
Find the differences between two DataFrames (reverse of join).
dfToDiff DataFrame The DataFrame to diff.
columnNames (String | Array) The selected columns for the diff.
df.diff(df2, 'id')
Returns DataFrame The differences DataFrame.
Create a new subset DataFrame based on the first rows.
nRows Number The number of first rows to get. (optional, default 10)
df2.head()
df2.head(5)
Returns DataFrame The subset DataFrame.
Create a new subset DataFrame based on the last rows.
nRows Number The number of last rows to get. (optional, default 10)
df2.tail()
df2.tail(5)
Returns DataFrame The subset DataFrame.
Create a new subset DataFrame based on the given indexes. Similar to Array.slice.
startIndex Number The index to start the slice (included). (optional, default 0)
endIndex Number The index to end the slice (excluded). (optional, default this.count())
df2.slice()
df2.slice(0)
df2.slice(0, 20)
df2.slice(10, 30)
Returns DataFrame The subset DataFrame.
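Since slice is described as similar to Array.slice, the included/excluded bounds can be checked on a plain array. The sample data is hypothetical; head and tail are shown here as the equivalent array slices.

```javascript
// Hypothetical sample data: 30 rows with ids 0 to 29.
const sampleRows = Array.from({ length: 30 }, (_, i) => ({ id: i }));

const firstFive = sampleRows.slice(0, 5); // like head(5): ids 0 to 4
const lastFive = sampleRows.slice(-5);    // like tail(5): ids 25 to 29
const middle = sampleRows.slice(10, 30);  // ids 10 to 29, index 30 is excluded

console.log(firstFive.length, lastFive[0].id, middle.length); // 5 25 20
```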
Return a Row by its index.
index Number The index to select the row. (optional, default 0)
df2.getRow(1)
Returns Row The Row.
Modify a Row at the given index.
index Number The index to select the row. (optional, default 0)
func (optional, default row => row)
df2.setRow(1, row => row.set("column1", 33))
Returns DataFrame A new DataFrame with the modified Row.
Modify a Row in place (by mutation) at the given index.
index Number The index to select the row. (optional, default 0)
func (optional, default row => row)
df2.setRowInPlace(1, row => row.set("column1", 33))
Returns DataFrame The current DataFrame with the modified row.
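The difference between returning a new structure and mutating the current one can be illustrated on plain arrays. This is only an analogy with hypothetical data, not the library's implementation.

```javascript
// Immutable style: build a new array, leaving the original untouched.
const original = [{ column1: 1 }, { column1: 2 }];
const updated = original.map((row, i) =>
  i === 1 ? { ...row, column1: 33 } : row
);

// In-place style: overwrite the row inside the existing array.
const mutable = [{ column1: 1 }, { column1: 2 }];
mutable[1] = { ...mutable[1], column1: 33 };

console.log(original[1].column1); // 2, the original is unchanged
console.log(updated[1].column1);  // 33
console.log(mutable[1].column1);  // 33, the same array was modified
```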
Set the default modules used in DataFrame instances.
defaultModules ...Object DataFrame modules used by default.
DataFrame.setDefaultModules(SQL, Stat)
Create a DataFrame from a delimiter-separated values text file. It returns a Promise.
args ...any
pathOrFile (String | File) A path to the file (URL or local) or a browser File object.
sep String The separator used to parse the file.
header Boolean A boolean indicating if the text has a header or not. (optional, default true)
DataFrame.fromDSV('http://myurl/myfile.txt').then(df => df.show())
// In browser only
DataFrame.fromDSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromDSV('/my/absolute/path/myfile.txt').then(df => df.show())
DataFrame.fromDSV('/my/absolute/path/myfile.txt', ';', true).then(df => df.show())
Create a DataFrame from a delimiter-separated values text file. It returns a Promise. Alias of DataFrame.fromDSV.
args ...any
pathOrFile (String | File) A path to the file (URL or local) or a browser File object.
sep String The separator used to parse the file.
header Boolean A boolean indicating if the text has a header or not. (optional, default true)
DataFrame.fromText('http://myurl/myfile.txt').then(df => df.show())
// In browser only
DataFrame.fromText(myFile).then(df => df.show())
// From node.js only
DataFrame.fromText('/my/absolute/path/myfile.txt').then(df => df.show())
DataFrame.fromText('/my/absolute/path/myfile.txt', ';', true).then(df => df.show())
Create a DataFrame from a comma-separated values file. It returns a Promise.
args ...any
pathOrFile (String | File) A path to the file (URL or local) or a browser File object.
header Boolean A boolean indicating if the csv has a header or not. (optional, default true)
DataFrame.fromCSV('http://myurl/myfile.csv').then(df => df.show())
// For browser only
DataFrame.fromCSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromCSV('/my/absolute/path/myfile.csv').then(df => df.show())
DataFrame.fromCSV('/my/absolute/path/myfile.csv', true).then(df => df.show())
Create a DataFrame from a tab-separated values file. It returns a Promise.
args ...any
pathOrFile (String | File) A path to the file (URL or local) or a browser File object.
header Boolean A boolean indicating if the tsv has a header or not. (optional, default true)
DataFrame.fromTSV('http://myurl/myfile.tsv').then(df => df.show())
// For browser only
DataFrame.fromTSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromTSV('/my/absolute/path/myfile.tsv').then(df => df.show())
DataFrame.fromTSV('/my/absolute/path/myfile.tsv', true).then(df => df.show())
Create a DataFrame from a pipe-separated values file. It returns a Promise.
args ...any
pathOrFile (String | File) A path to the file (URL or local) or a browser File object.
header Boolean A boolean indicating if the psv has a header or not. (optional, default true)
DataFrame.fromPSV('http://myurl/myfile.psv').then(df => df.show())
// For browser only
DataFrame.fromPSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromPSV('/my/absolute/path/myfile.psv').then(df => df.show())
DataFrame.fromPSV('/my/absolute/path/myfile.psv', true).then(df => df.show())
Create a DataFrame from a JSON file. It returns a Promise.
args ...any
pathOrFile (String | File) A path to the file (URL or local) or a browser File object.
DataFrame.fromJSON('http://myurl/myfile.json').then(df => df.show())
// For browser only
DataFrame.fromJSON(myFile).then(df => df.show())
// From node.js only
DataFrame.fromJSON('/my/absolute/path/myfile.json').then(df => df.show())
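The delimiter-based readers above (fromDSV, fromText, fromCSV, fromTSV, fromPSV) all turn delimited text into columns and rows. A naive plain JavaScript sketch of that parsing step follows. The helper `parseDelimitedText` is hypothetical, handles no quoting or escaping, and skips the file or URL loading that makes the real methods return a Promise.

```javascript
// Hypothetical helper: naive parser that splits on line breaks and on the
// separator. Quoted fields and escaped separators are not supported.
function parseDelimitedText(text, sep, header = true) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const cells = lines.map((line) => line.split(sep));
  if (cells.length === 0) return { columns: [], rows: [] };
  // Without a header, fall back to positional column names '0', '1', ...
  const columns = header ? cells[0] : cells[0].map((_, i) => String(i));
  const rows = header ? cells.slice(1) : cells;
  return { columns, rows };
}

const parsed = parseDelimitedText('id;name\n1;a\n2;b\n', ';', true);
console.log(parsed.columns); // ['id', 'name']
console.log(parsed.rows);    // [['1', 'a'], ['2', 'b']]
```

Every parsed value is still a string at this point, which is where the cast and castAll methods described earlier become useful.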