Is there any function available in Danfo.js similar to pandas isin()?


I am doing extensive data processing in Node.js. I found Danfo.js to be a nice alternative to Python's pandas, but some functionality is lacking in comparison to pandas.

How can I reproduce the functionality of pandas isin() in Danfo.js?

Example:

I have the below DataFrame:

id name address
asefwc Abdullah Cumilla
wefcss Khairul Jashore
erfegf Jaman Magura
ytttte Najrul Nowga
edqfgh Latif Chattagram
yutydg Majhar Rajshahi

And the below Series:

id
wefcss
ytttte
yutydg

I want to get the rows of the DataFrame whose id exists in the Series.


There are 3 best solutions below

0
On

I don't think there is an equivalent of pandas' isin, but I've found a workaround to solve this problem:

const dfd = require("danfojs-node")
const data = [
    ['asefwc', 'Abdullah', 'Cumilla'],
    ['wefcss', 'Khairul', 'Jashore'],
    ['erfegf',  'Jaman', 'Magura']
]

const columns = ['id', 'name', 'address']

let df = new dfd.DataFrame(data, { columns })

let ids = df['id'].values // get all the column values

let allRowsToInclude = []

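// idsToInclude holds the values of your Series (e.g. series.values)
const idsToInclude = ['wefcss', 'ytttte', 'yutydg']
idsToInclude.forEach((idToInclude) => {
  // indices of the rows whose id matches; flatMap drops the empty arrays
  const rowsToInclude = ids.flatMap((id, idx) => id === idToInclude ? idx : [])
  // push mutates the array and returns its new length, so don't reassign it
  allRowsToInclude.push(...rowsToInclude)
})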

// creates a new df with filtered rows
let dfFinal = df.loc({rows: allRowsToInclude}) 
0
On

I think it's worth exploring dfd.merge: by choosing a specific join variant (for example, an inner join on id), you are basically filtering the data in a specific way.

0
On

Unfortunately, there's no native equivalent to pandas isin yet.

However, you can do something pretty similar via a boolean mask on the dataframe rows:

import * as dfd from "danfojs";

let df = new dfd.DataFrame([
  {"id":"asefwc", "name":"Abdullah", "address":"Cumilla"},
  {"id":"wefcss", "name":"Khairul", "address":"Jashore"},
  {"id":"erfegf", "name":"Jaman", "address":"Magura"},
]);

let idToInclude = ["asefwc", "erfegf"];

// Filtered dataframe:
df.loc({
  rows: df["id"].values.map(x => idToInclude.includes(x))
}).print();
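
If the list of ids is long, note that includes() rescans the whole list for every row. A minimal sketch of the same mask built with a Set, assuming you work on plain arrays (df["id"].values for the column and the Series' values for the ids). isinMask is a hypothetical helper name, not part of the danfo.js API, and the ids are the ones from the question:

```javascript
// Hypothetical helper (not a danfo.js function): builds a pandas-style isin() mask.
function isinMask(values, allowed) {
  const lookup = new Set(allowed); // constant-time membership checks
  return values.map((value) => lookup.has(value));
}

// Ids taken from the question's DataFrame and Series.
const ids = ["asefwc", "wefcss", "erfegf", "ytttte", "edqfgh", "yutydg"];
const wanted = ["wefcss", "ytttte", "yutydg"];

const mask = isinMask(ids, wanted);
console.log(mask); // [ false, true, false, true, false, true ]

// With danfo.js the mask would be passed to loc the same way as above:
// df.loc({ rows: isinMask(df["id"].values, wanted) }).print();
```

The mask has one boolean per row, in row order, so the filtered DataFrame keeps the original ordering of the rows.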