
Wendelin Data Lake Sharing Platform

Ebulk + Wendelin = Big Data sharing platform

The Ebulk tool and the Wendelin platform combine to form an easy-to-use Data Lake for sharing petabytes of data grouped into data sets. Big Data sharing is essential for research and startups, because building new A.I. models requires access to large data sets, which are usually held by big platforms such as Google or Alibaba that tend to keep them secret. This project addresses the big data sharing problem by tackling the following key points:

  • Huge transfers (over slow and unreliable networks)
  • Huge storage (on a small budget)
  • Many protocols (S3, HTTP, FTP, etc.)
  • Many binary formats (ndarray, video, etc.)
  • Trade secret

Data lake

Dozens of public and private big data sets are available on the platform: terabytes of data of any kind, including binaries such as medical images, ndarrays and more. Do you want to download data sets or share your own data? Download our Ebulk tool to transfer big data!

See our full data set list!


Register to get full functionality.

Ebulk tool

The Ebulk tool is a wrapper for Embulk, an open-source bulk data loader that helps transfer data between various databases, storage systems, file formats, and cloud services. It supports any input file format, parallel and distributed execution to handle big data sets, transaction control to guarantee All-or-Nothing file transfers, and resuming of interrupted operations. Ebulk is as easy to use as git: a big data transfer takes only a few commands. Please download Ebulk and check the documentation.
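To make the All-or-Nothing and resuming ideas concrete, here is a minimal, generic Python sketch of a resumable file copy. It is not Ebulk's or Embulk's implementation, and the function name `resumable_copy`, the `.part` staging-file convention and the 1 MiB chunk size are hypothetical choices for illustration. Data is appended to a staging file, an interrupted run resumes from the staging file's current size, and the finished file is published with a single atomic rename, so the destination either does not exist or is complete.

```python
import os

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk (arbitrary choice for this sketch)


def resumable_copy(src_path: str, dst_path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src_path to dst_path so that dst_path only appears when complete.

    Bytes are staged in a hypothetical "<dst_path>.part" file. If a previous
    run was interrupted, copying resumes at the staging file's current size.
    Returns the number of bytes copied by this call.
    """
    part_path = dst_path + ".part"
    # Resume point: whatever an earlier, interrupted run already wrote.
    offset = os.path.getsize(part_path) if os.path.exists(part_path) else 0
    copied = 0
    with open(src_path, "rb") as src, open(part_path, "ab") as part:
        src.seek(offset)  # skip the bytes that are already staged
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part.write(chunk)
            copied += len(chunk)
        part.flush()
        os.fsync(part.fileno())  # make sure staged bytes reach the disk
    # All-or-Nothing: one atomic rename publishes the finished file, so
    # readers never observe a partially transferred dst_path.
    os.replace(part_path, dst_path)
    return copied
```

Calling `resumable_copy("big.bin", "copy.bin")` again after a crash only transfers the missing tail. The atomic rename is what gives the all-or-nothing property on a single file system; a distributed tool has to achieve the same guarantee with its own transaction control.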

Wendelin

Wendelin is a big data framework designed for industrial applications, based on Python, NumPy, SciPy and other NumPy-based libraries. At its core it uses NEO, a distributed transactional NoSQL database, to store petabytes of binary data. Wendelin combines the performance of scikit-learn machine learning with NEO distributed storage to provide out-of-core processing of large data sets. Its goal is to provide the best open-source big data engine based on NumPy and Python technologies, and to gather a wide community of contributors of new data analytics algorithms.
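Out-of-core processing means working on data larger than RAM by streaming it through memory in pieces. The sketch below shows the general idea with plain NumPy; it does not use Wendelin or NEO, and the raw on-disk file layout, the function name `chunked_mean` and the default chunk size are assumptions for illustration. `np.memmap` maps the file lazily, and only one block of `chunk_rows` rows is copied into an in-memory array at a time while a running sum is accumulated.

```python
import numpy as np


def chunked_mean(path: str, dtype: str, shape: tuple, chunk_rows: int = 100_000) -> np.ndarray:
    """Column-wise mean of a raw, C-ordered 2-D array stored on disk.

    The file is memory-mapped read-only; each iteration copies at most
    `chunk_rows` rows into an in-memory float64 block and adds its column
    sums to a running total, so the whole array is never loaded at once.
    """
    data = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    total = np.zeros(shape[1], dtype=np.float64)
    for start in range(0, shape[0], chunk_rows):
        block = np.asarray(data[start:start + chunk_rows], dtype=np.float64)
        total += block.sum(axis=0)
    return total / shape[0]
```

The same pattern (map lazily, reduce block by block) applies to any statistic that can be combined from partial results, such as sums, counts or min/max.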

Acknowledgements