Ebulk tool

Tool description

Ebulk makes it easy to exchange or archive very large data sets. It ingests data sets into the Wendelin-IA platform, or downloads them from it, using different protocols. It also lets you make local changes to a data set and then upload the added and modified files. A key feature of Ebulk is its ability to resume interrupted transfers and recover from the errors they cause. See the documentation.

Requirements

The Ebulk tool relies on the Embulk Java application (see docs). Please make sure that Java 8 is installed.
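
One way to check the Java requirement before installing is sketched below. It is only an illustration, not part of Ebulk: the function names java_major_version and installed_java_version are hypothetical, and the sketch assumes that `java -version` prints a quoted version string such as "1.8.0_292", as common JDK builds do.

```shell
#!/bin/sh
# Hypothetical helper: extract the major Java version from a version string
# such as "1.8.0_292" (Java 8) or "11.0.2" (Java 11).
java_major_version() {
    version="$1"
    case "$version" in
        1.*)
            # Legacy numbering scheme: 1.8.0_292 means Java 8
            version="${version#1.}"
            ;;
    esac
    # Keep only the leading number before the first dot
    echo "${version%%.*}"
}

# Hypothetical helper: read the quoted version from `java -version`,
# which writes its report to stderr.
installed_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

if command -v java >/dev/null 2>&1; then
    major="$(java_major_version "$(installed_java_version)")"
    if [ "$major" = "8" ]; then
        echo "Java 8 detected"
    else
        echo "Java $major detected; Ebulk expects Java 8"
    fi
else
    echo "Java not found; please install Java 8"
fi
```

The version parsing is kept in its own function so that it can be checked against sample strings without Java being present.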

Install

Please use the installation package for your operating system and follow the installation instructions.

Linux

The Ebulk package available in the ubuntu-ppa repository lets you install the tool easily with apt commands.

Make sure software-properties-common is installed; it provides the add-apt-repository command used in the next step:

sudo apt-get install software-properties-common

Add the ppa repository:

sudo add-apt-repository ppa:rporchetto/ebulk-ppa

Update your local sources and install ebulk:

sudo apt-get update

sudo apt-get install ebulk

Debian considerations

If the apt installation fails because of your OS version or series, it is recommended to install ebulk directly from the .deb package.

Please download the latest ebulk .deb package and install it by running:

sudo dpkg -i ebulk_package.deb

Mac OS X

Installation on Mac OS X can be done with Homebrew by running:

brew install https://github.com/roquegit/homebrew-ebulk/raw/master/ebulk.rb
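
Whichever platform you installed on, a quick way to confirm that the tool is reachable is to look it up on your PATH. The sketch below is a minimal illustration; the function name is_installed is hypothetical, and the check only confirms that a command named ebulk can be found, not that it runs correctly.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the named command
# can be found on PATH.
is_installed() {
    command -v "$1" >/dev/null 2>&1
}

if is_installed ebulk; then
    echo "ebulk found at: $(command -v ebulk)"
else
    echo "ebulk is not on PATH; check the installation steps"
fi
```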

Potential installation issues

During the package installation, or during the first ebulk execution, the bash script will try to install Embulk automatically if it is not already installed. If your OS requires special permissions, you may need to install Embulk v0.9.7 manually:

curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.bintray.com/embulk/maven/embulk-0.9.7.jar"

chmod +x ~/.embulk/bin/embulk

echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc

source ~/.bashrc
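
Note that the echo command above appends the export line to ~/.bashrc every time it is run. If you expect to repeat the manual steps, a guarded append avoids duplicate entries. The following is a sketch only: append_once is a hypothetical helper, and the demonstration writes to a temporary file rather than to your real ~/.bashrc.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstration on a temporary file, so ~/.bashrc is left untouched.
demo_rc="$(mktemp)"
path_line='export PATH="$HOME/.embulk/bin:$PATH"'
append_once "$path_line" "$demo_rc"
append_once "$path_line" "$demo_rc"
# Count the matching lines: the line was written once despite two calls.
grep -cxF -- "$path_line" "$demo_rc"
rm -f "$demo_rc"
```

To apply it for real, you would pass "$HOME/.bashrc" as the second argument instead of the temporary file.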

Linux Source

Git project repository