Welcome!
Upgrading a Spark application image
Hello. Today we will teach you how to create an image on the BlueData EPIC platform. In this scenario you will learn how to upgrade a Spark image using the BlueData EPIC Application Workbench on a CentOS base image container.
Prerequisites:
-Basic knowledge of containers
-Linux administration
-Git
-Spark and Hadoop
Congratulations!
You've completed the scenario!
You have successfully completed the scenario for upgrading a Spark image using an already existing image!

Steps
Upgrading a Spark application image
Step 1 - Creating a base directory
To create the image on the BlueData EPIC platform, you need to install the BlueData EPIC App Workbench.
We have already set up and installed the BlueData EPIC Application Workbench for this scenario.
To install the App Workbench yourself, go to this link: Link
To check what version of the Application Workbench is running, please execute the following command:
bdwb --version
Now, to begin, let us create the base directory. This directory will house all the files and components necessary to create the application image.
mkdir ~/Source
mkdir ~/Source/Spark
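Alternatively, both directories can be created with a single command by using the -p flag, which creates any missing parent directories:
mkdir -p ~/Source/Spark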
Step 2 - Copying the bin file
To upgrade the image, we will use a .bin file of the existing, already-developed image. Let us download the .bin file to the directory we have created.
Before that, let us install the wget package. To do so, execute the following command:
yum install wget -y
Now, navigate to the Source directory and download the bin file by executing the following commands. (Due to space constraints in Katacoda, we have uploaded the required files to Dropbox.)
cd ~/Source
wget https://www.dropbox.com/s/r4m3bba2qtueu5q/bdcatalog-centos7-bluedata-spark221-1.0.bin.tar
Extract the contents of the tar file by executing the following command:
tar xvf bdcatalog-centos7-bluedata-spark221-1.0.bin.tar
Now, make the .bin file executable by executing the following command:
chmod +x bdcatalog-centos7-bluedata-spark221-1.0.bin
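If you want to confirm that the executable bit was set, list the file and look for the x permissions:
ls -l bdcatalog-centos7-bluedata-spark221-1.0.bin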
Step 3 - Extracting the contents of the bin file
In this step we will extract the contents of the .bin file.
To do this, execute the following command:
./bdcatalog-centos7-bluedata-spark221-1.0.bin --payload
This will create two files in the directory. (To view the files, execute the following command.)
ls
Untar the payload.tar file by executing the following command
tar xvf payload.tar
This will create new directories. (To view the files, execute the following command.)
ls
Remove the bin and tar files to save space on the device:
rm -rf bdcatalog-centos7-bluedata-spark221-1.0.bin
rm -rf bdcatalog-centos7-bluedata-spark221-1.0.bin.tar
Navigate to the new directory, and untar the file by executing the following command
cd bdcatalog-centos7-bluedata-spark221-1.0
tar xvf bdcatalog-centos7-bluedata-spark221-1.0-src.tgz
Now navigate to the directory created after the above step
cd bdcatalog-centos7-bluedata-spark221-1.0-src
Copy the contents of this directory to the directory you created initially:
cp -R * ~/Source/Spark
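To verify that the copy succeeded, list the target directory:
ls ~/Source/Spark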
Step 4 - Modifying the existing image
Now, to upgrade an existing image to a new one, we need to make changes to some of the files we just copied: the .wb file, the Dockerfile, and the configuration files under the appconfig directory.
In the next few steps we will be modifying these files to upgrade our application image.
Step 5 - Modifying the .wb file
Now, we will make changes to the .wb file first.
The .wb file contains a series of App Workbench commands that control the creation of the Catalog Image.
Please review the documentation on .wb files before proceeding: Link
For your reference, we have already created a ready .wb file in the ~/test directory. We will copy that file into the Spark directory. First, navigate to the Spark directory:
cd ~/Source/Spark
Delete the existing .wb and .json files before you copy in the new .wb file by executing the command below:
rm -rf spark221.wb spark221e1.json
Copy the new .wb file into the Spark directory using the following command:
cp ~/test/spark_notebook.wb ~/Source/Spark
To view the contents of the file, execute the following command:
cat ~/Source/Spark/spark_notebook.wb
Step 6 - Modifying the Dockerfile
The next step is to update the Dockerfile.
"A Dockerfile is a text file that the Docker engine reads to automatically build an image. It consists of all the commands a user would otherwise call manually to assemble the desired image."
Let's go ahead and replace the Dockerfile inside the centos folder.
For your reference, we have already created a ready Dockerfile in the ~/test directory. We will copy that file into the centos folder using the following commands:
Delete the existing Dockerfile by executing the below command:
rm -rf ~/Source/Spark/image/centos/Dockerfile
Copy the new Dockerfile into the centos folder using the following command:
cp ~/test/Dockerfile ~/Source/Spark/image/centos/
Copy the updatePath.sh into the centos folder using the following command:
cp ~/test/updatePath.sh ~/Source/Spark/image/centos/
Execute the following command to view the contents of the Dockerfile:
cat ~/Source/Spark/image/centos/Dockerfile
The first line of the Dockerfile specifies the base image:
FROM bluedata/centos7:latest
You have the ability to use your own base image instead. For example, in FROM ubuntu:12.04, ubuntu:12.04 is the base image used.
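As a purely illustrative sketch of how such a Dockerfile is typically structured (the script names come from this scenario, but the destination path and build steps below are assumptions, not the actual file copied from ~/test):
FROM bluedata/centos7:latest
# Copy the helper scripts into the image (hypothetical destination path)
COPY configure_java8.sh configure_jupyter.sh /opt/bluedata/
# Run them at build time to configure Java 8 and Jupyter (hypothetical step)
RUN chmod +x /opt/bluedata/configure_java8.sh /opt/bluedata/configure_jupyter.sh \
    && /opt/bluedata/configure_java8.sh \
    && /opt/bluedata/configure_jupyter.sh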
Copy the configure_java8.sh file to the centos directory:
cp ~/Source/Spark/image/centos/spark/configure_java8.sh ~/Source/Spark/image/centos/
Copy the configure_jupyter.sh file to the centos directory:
cp ~/Source/Spark/image/centos/spark/configure_jupyter.sh ~/Source/Spark/image/centos/
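To confirm that the Dockerfile and the helper scripts are all in place, list the centos folder:
ls ~/Source/Spark/image/centos/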
Step 7 - Adding configuration files under the appconfig directory
We need to add additional configuration files under the appconfig directory. We already have reference configuration files; to add them, follow the steps below.
Remove the existing appconfig folder from the Spark folder
rm -rf appconfig
Download the reference appconfig archive from its stored location using the command below. (Due to space constraints in Katacoda, we have uploaded the required files to Dropbox.)
wget https://www.dropbox.com/s/f8cjkwzz6qdov0d/appconfig.zip
Install the unzip package:
yum install unzip -y
Unzip the file
unzip appconfig.zip
Check the files under the appconfig directory:
ls appconfig
Make sure you see the expected configuration files under the appconfig folder.
Remove the zip file from the folder
rm -rf appconfig.zip
Step 8 - Creating the image
Now, to create the new .bin file, let us execute the .wb file.
./spark_notebook.wb
After executing the command, wait for some time, as the .bin file will take a while to create.
Step 9 - Finalising your Spark image build
The newly built application package (or bundle) is saved in the deliverables directory.
cd deliverables
ls
Make it executable using the command below:
chmod +x bdcatalog-centos7-bluedata-spark231mjuphub7x-2.1.bin
Congratulations, you have successfully upgraded the Spark image with notebook support.
To make the new image appear in the App Store, copy the .bin file to /srv/bluedata/catalog.
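For example, assuming you are on the EPIC controller host (where /srv/bluedata/catalog lives) and still inside the deliverables directory, a plain copy does the job:
cp bdcatalog-centos7-bluedata-spark231mjuphub7x-2.1.bin /srv/bluedata/catalog/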
Go to the App Store in the EPIC GUI and click the Refresh button to bring the image into the App Store.
Once the image appears in the App Store, click the Install button to install it.