عجفت الغور
dask
Tags: pandas, numpy
Flow order
- Graph, scheduling, execution
- Dask builds the flow order as a task graph stored in a plain Python dict
- assumptions: data is not modified in place, tasks do not hold the GIL
- Python bindings for native libraries may hold the GIL; make sure they release it
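A minimal sketch of the graph-in-a-dict idea, using only the standard library. It follows the convention that a key maps either to literal data or to a task tuple whose first element is a callable. The `get` and `_is_key` helpers are hypothetical names for illustration; this is not Dask's scheduler, just a recursive evaluator that assumes the graph is acyclic and the data is never modified in place.

```python
from operator import add, mul

# Task graph: each key maps to literal data or to a task tuple
# of the form (callable, arg1, arg2, ...). Args may be other keys.
graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),
    "result": (mul, "sum", 10),
}


def _is_key(dsk, arg):
    """Return True if `arg` names another node in the graph."""
    try:
        return arg in dsk
    except TypeError:  # unhashable arguments (e.g. lists) are never keys
        return False


def get(dsk, key, cache=None):
    """Evaluate `key` by first evaluating everything it depends on."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    value = dsk[key]
    if isinstance(value, tuple) and value and callable(value[0]):
        func, *args = value
        resolved = [get(dsk, a, cache) if _is_key(dsk, a) else a for a in args]
        value = func(*resolved)
    cache[key] = value  # each node is computed at most once
    return value


print(get(graph, "result"))
```

Because the graph is only a dict, building it (graph), deciding the order (scheduling) and running the callables (execution) stay separate steps.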
Dask vs Spark
- Dask is similar to Spark
- Spark has a higher-level data representation
- Dask builds on NumPy/pandas/scikit-learn; Spark is more mature and has SQL support
- Dask can compute arbitrary task graphs
- Spark builds computations by composing high-level primitives, or by dropping down to RDDs
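To make "arbitrary graphs" concrete, here is a rough standard-library analogy; it uses neither the Dask nor the Spark API. The task names and the `run` helper are hypothetical. The dependency shape below is not a linear chain of map/filter/reduce steps: `mean` needs two upstream results, and `centered` reuses both the raw data and a derived value. Tasks are submitted to a thread pool in dependency order, one batch of ready tasks at a time, which is simpler than a real scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+


def load():
    return [1, 2, 3, 4]


def mean(total, count):
    return total / count


def centered(xs, m):
    return [x - m for x in xs]


# key -> (function, list of keys whose results are passed as arguments)
tasks = {
    "data": (load, []),
    "total": (sum, ["data"]),
    "count": (len, ["data"]),
    "mean": (mean, ["total", "count"]),
    "centered": (centered, ["data", "mean"]),
}


def run(tasks, workers=4):
    """Run every task once all of its dependencies have finished."""
    sorter = TopologicalSorter({k: deps for k, (_, deps) in tasks.items()})
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # All tasks in `ready` are independent and may run in parallel.
            ready = sorter.get_ready()
            futures = {
                k: pool.submit(tasks[k][0], *(results[d] for d in tasks[k][1]))
                for k in ready
            }
            for k, fut in futures.items():
                results[k] = fut.result()
                sorter.done(k)
    return results


results = run(tasks)
print(results["mean"], results["centered"])
```

Here `total` and `count` land in the same batch and can run concurrently, which only pays off when the tasks do not hold the GIL.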