Jason Tian

Suck out the marrow of data

Set up Jupyter Notebook on AWS

After a day of intense work setting up IPython on AWS, I think it is worth writing the whole process down. There are too many details to keep track of. Most of the content here is shamelessly stolen from Andrew Blevins.

Table of Contents

  1. Add the inbound port when setting up the instance
  2. Add a user and set up security (optional)
  3. Download Anaconda and software updates
  4. Setting up Jupyter Notebook
  5. Release Jupyter Notebook from terminal

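Add the inbound port when setting up the instance

When setting up the instance (I used an Ubuntu instance), we need to open an extra inbound port in the security group. Besides SSH (port 22), add a Custom TCP rule for the port Jupyter will listen on, which is 8888 in this guide.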
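Once the notebook server is running (see the later sections), a quick way to confirm that the security group really lets traffic through is to try a TCP connection from your local machine. The following is a minimal sketch, not part of the original setup; the function name port_is_open is made up for illustration, and the check can only succeed when something is actually listening on the port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: a False result means either the port is blocked
    (for example by the security group) or nothing is listening on it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, port_is_open("11.11.11.11", 8888) would be called with your own instance's public IP in place of 11.11.11.11.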

Adding User and Security

Connect to the AWS server from your local terminal

ssh -i .ssh/aws_key.pem ubuntu@11.11.11.11

Remarks

  1. aws_key.pem is the private key file downloaded from AWS when we created the instance.
  2. 11.11.11.11 is the public IP of the AWS instance.
  3. ubuntu is the default account shared by every Ubuntu instance, so it is safer to create your own user.
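One detail that often trips people up here: ssh refuses to use a private key file that other users on your machine can read. The sketch below checks and tightens the permissions on the key file, which is the equivalent of running chmod 400 on it. The helper names are hypothetical, and the path .ssh/aws_key.pem is simply the one used in the command above.

```python
import os
import stat


def key_permissions_ok(path: str) -> bool:
    """Return True if no group/other permission bits are set on the key file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def restrict_key(path: str) -> None:
    """Make the key readable by its owner only (same effect as chmod 400)."""
    os.chmod(path, stat.S_IRUSR)
```

A typical use would be to call restrict_key on the expanded path of .ssh/aws_key.pem whenever key_permissions_ok returns False.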

Add new user

ubuntu@ip-172-31-60-68:/home$ sudo adduser jason

Note: pick a password (save it in an easy-to-find place!!) and press Enter through all the other questions (name fields, etc.).

Delete user

$ sudo userdel -r olduser

User privileges

Make yourself special by granting yourself root privileges: type sudo visudo. This will open up nano (a text editor) to edit the sudoers file. Find the line that says root ALL=(ALL:ALL) ALL. Give yourself a line beneath that which says [username] ALL=(ALL:ALL) ALL.

# User privilege specification
root     ALL=(ALL:ALL) ALL
jason  ALL=(ALL:ALL) ALL

Save file in nano editor: Ctrl-o then Enter when asked for the file name.
Exit file from nano editor: Ctrl-x

Setting up User Account

Now you have a user account, but you can’t just log in with a password. Passwords aren’t secure enough.
Copy your SSH public key (the one you may already use for GitHub) from your local machine, ~/.ssh/id_rsa.pub, to the authorized_keys file on your remote machine. Your public key file may have a different name. You need to know the passphrase for this SSH key; note that it may not be the same as the password you use to log in to your GitHub account. If you cannot remember it, generate a new SSH key pair:

Jason@news-MacBook-Pro:~/.ssh$ ssh-keygen -t rsa

Remarks:

  1. When prompted, either give the key file a new name or accept the default id_rsa.
  2. You will be asked to set a new passphrase.

Create the authorized_keys file

On your remote machine (AWS):

  1. Create the .ssh directory.
  2. Then copy the key from your local machine (all the characters inside id_rsa.pub) into the authorized_keys file on the remote machine.

sudo mkdir -p /home/my_cool_username/.ssh/
sudo vi /home/my_cool_username/.ssh/authorized_keys

When you paste the key, check that the first character was not dropped: the line should begin with ssh-rsa, and it is easy to lose the leading ‘s’.

My example:

1)  get output from your (local machine) public key file like this:
jason$ pwd
/Users/jason/.ssh
jason$ cat id_rsa.pub

2) Copy everything (Command-C).

3) On your AWS machine, run:
$ sudo nano /home/jason/.ssh/authorized_keys

Paste into the current window with Command-V, then press:
Ctrl-o (to save)
Enter
Ctrl-x (to exit)
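Since a dropped character is easy to miss by eye, the pasted line can also be checked programmatically. An OpenSSH public key line has the form "type base64-data comment", and the decoded data starts with a length-prefixed copy of the type name, so the two must agree. The sketch below relies on that layout; the function name is hypothetical, and it assumes a plain key line without any leading options.

```python
import base64
import binascii
import struct


def check_public_key_line(line: str) -> bool:
    """Return True if the key type in the text matches the type inside the blob.

    Hypothetical helper; assumes a plain 'type base64 [comment]' line.
    """
    fields = line.strip().split()
    if len(fields) < 2:
        return False
    key_type, blob_b64 = fields[0], fields[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except (binascii.Error, ValueError):
        return False  # damaged or truncated base64 data
    if len(blob) < 4:
        return False
    # The blob begins with a 4-byte big-endian length followed by the type name.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + name_len]
    return embedded_type == key_type.encode("utf-8")
```

A line that lost its first character starts with sh-rsa, which no longer matches the ssh-rsa name stored inside the key data, so the check returns False.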

Change login name

Open a new terminal on your local machine in your home directory. Then create the file .ssh/config, where .ssh is the folder that holds your private key id_rsa.

$ vim .ssh/config
# Then paste the following lines into the config file:
Host machine_name_goes_here
Hostname 123.234.123.234
User my_cool_username

My example:

Host aws_ipython
     HostName 54.172.80.98
     User jason

Remarks

  1. Host is an alias of your choice for the AWS instance; it is the name you type after ssh.
  2. HostName is the public IP of the AWS instance.
  3. User is the new user name you just created.

Now you can log in to your remote machine with ssh machine_name_goes_here.
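If you manage several instances, the same block can be generated rather than typed by hand. This is only a sketch: render_host_block is a made-up helper, and the optional IdentityFile line (a standard ssh_config keyword) is an addition that is not in the config shown above.

```python
def render_host_block(alias: str, hostname: str, user: str,
                      identity_file: str = "") -> str:
    """Build one Host entry for ~/.ssh/config as text (hypothetical helper)."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    User {user}",
    ]
    if identity_file:
        # Optional: tells ssh which private key to use for this host.
        lines.append(f"    IdentityFile {identity_file}")
    return "\n".join(lines) + "\n"
```

Calling render_host_block("aws_ipython", "54.172.80.98", "jason") reproduces the example entry above; the returned text can then be appended to .ssh/config.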

Send a file from your local machine to your remote machine

$ scp cool_file.png machine_name_goes_here:~

To log in as your new user, type exit on the remote machine and then, from a shell on your local machine, run:

$ ssh -i .ssh/id_rsa machine_name_goes_here

If you did not change the default user name, you can use the following command instead:

scp -i <location of aws key> melville-moby_dick.txt ubuntu@11.11.11.11:~/data/

Remarks:

  1. Be careful to use .ssh/id_rsa (the private key), not .ssh/id_rsa.pub!!
  2. You need to enter the passphrase of your SSH key.
  3. Next time you only need to type ssh machine_name_goes_here to log in.

Download Anaconda and software updates

After logging in to AWS with the new user name, we need to update Python and install Jupyter Notebook. The easiest way is to install Anaconda.

Download Anaconda

jason@ip-172-31-60-68:~$ wget http://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
jason@ip-172-31-60-68:~$ bash Anaconda3-4.2.0-Linux-x86_64.sh

After a few minutes the install will finish and tell you to put the folder that was just created at the top of your $PATH. Modify your .bashrc

jason@ip-172-31-60-68:~$ vim .bashrc

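Inside .bashrc put the following lines. The path must point to the home directory of the user who ran the installer, so for the new user jason it is /home/jason rather than /home/ubuntu:

# added by Anaconda3 4.2.0 installer
export PATH="/home/jason/anaconda3/bin:$PATH"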

Then reload .bashrc from your new user's home directory.

jason@ip-172-31-60-68:~$ source .bashrc

You may want to download a newer version of Anaconda than the one shown here.

Confirm the version of python

jason@ip-172-31-60-68:~$ python --version
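The version number alone does not tell you whether the Anaconda interpreter or the system one is being picked up. The sketch below reports which interpreter is running and which python comes first on PATH. The helper names are hypothetical, and the "anaconda" substring test is just a heuristic that assumes the default install directory name.

```python
import shutil
import sys


def describe_python() -> dict:
    """Collect basic facts about the interpreter in use (hypothetical helper)."""
    return {
        "running": sys.executable,                 # interpreter executing this code
        "first_on_path": shutil.which("python"),   # what the shell would start
        "version": "%d.%d.%d" % sys.version_info[:3],
    }


def looks_like_anaconda(executable: str) -> bool:
    """Heuristic: assumes Anaconda lives in a directory containing 'anaconda'."""
    return "anaconda" in executable.lower()
```

If looks_like_anaconda(describe_python()["running"]) is False after sourcing .bashrc, the PATH line probably points at the wrong directory.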

Update software

jason@ip-172-31-60-68:~$ sudo apt-get update

Read more about apt-get in the apt-get Package Management Tool documentation.

Setting up Jupyter Notebook

On your remote machine:

$ ipython
In [1]: from IPython.lib import passwd
In [2]: passwd()

Remarks:

  1. You will be asked to set the password for Jupyter Notebook.
  2. The output is a string that starts with 'sha1:'. Copy this string somewhere for later use.
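It helps to know what that string contains. As far as I understand it, the value has three colon-separated parts, algorithm:salt:digest, where the digest is the hash of the passphrase followed by the salt. The sketch below rebuilds and verifies a string of that shape. It is an illustration of the layout under that assumption, not the library's own code, and make_hash and check_hash are made-up names; use passwd() itself to produce the value for your config.

```python
import hashlib
import hmac
import secrets


def make_hash(passphrase: str, salt_hex_len: int = 12) -> str:
    """Build an 'sha1:salt:digest' string (assumed layout, for illustration)."""
    salt = secrets.token_hex(salt_hex_len // 2)  # random hex salt
    digest = hashlib.sha1(
        passphrase.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    return f"sha1:{salt}:{digest}"


def check_hash(hashed: str, passphrase: str) -> bool:
    """Return True if passphrase matches a string of the assumed layout."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
        h = hashlib.new(algorithm)
    except ValueError:
        return False  # wrong number of fields or unknown algorithm
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return hmac.compare_digest(h.hexdigest(), digest)
```

The point of the salt is that the same password gives a different string each time, so the value stored in the config file does not reveal whether two servers share a password.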

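Then create a self-signed certificate so that the notebook is served over HTTPS. Use at least a 2048-bit key, since 1024-bit RSA keys are considered weak, and run openssl as your own user (without sudo) so that Jupyter can read the resulting file:

$ cd ~
$ mkdir .certs
$ cd .certs
$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mycert.pem -out mycert.pem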

Change the jupyter configuration

jason@ip-172-31-60-68:~$ mkdir -p .jupyter
jason@ip-172-31-60-68:~$ cd ~/.jupyter/
jason@ip-172-31-60-68:~/.jupyter$ vi jupyter_notebook_config.py

Then copy the following code into this file. If the file already starts with the comment # Configuration file for jupyter-notebook., paste the code below that line.

c = get_config()

# Kernel config
c.IPKernelApp.pylab = 'inline'  # if you want  plotting support always in your notebook

# Notebook config
c.NotebookApp.certfile = '/home/my_cool_username/.certs/mycert.pem' #location of your certificate file
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False  # so that the notebook server does not open a browser by default
c.NotebookApp.password = 'sha1:68c136a5b064...'  #the encrypted password we generated in ipython
# Set the port to match the port we opened in the security group
c.NotebookApp.port = 8888

Remarks:

  1. This is for Python 3+.
  2. Be careful about c.NotebookApp.certfile line. You need to enter your own path.
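A wrong certfile path, or a file the current user cannot read, typically only shows up when the server starts. The sketch below tries to load the file the way a TLS server would, using Python's standard ssl module. The helper name is hypothetical, and it assumes, as in the openssl command above, that the certificate and the private key live in the same .pem file.

```python
import ssl


def cert_file_loads(path: str) -> bool:
    """Return True if path holds a certificate and key that ssl can load.

    Hypothetical helper; assumes key and certificate share one PEM file.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        context.load_cert_chain(certfile=path)
    except OSError:
        # Covers a missing or unreadable file as well as ssl.SSLError
        # (raised for content that is not a valid certificate and key).
        return False
    return True
```

Running cert_file_loads on the same path you put in c.NotebookApp.certfile, as the user who will start Jupyter, gives an early warning about path or permission mistakes.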

Then let’s run it!

$ cd ~
$ mkdir Notebooks
$ cd Notebooks
$ jupyter notebook

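After this, open your browser and go to https://11.11.11.11:8888, where 11.11.11.11 is your AWS instance public IP. Because the certificate is self-signed, the browser will show a security warning that you need to accept before the login page appears.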

Release Jupyter Notebook from terminal

Sometimes we shut down the terminal by mistake, and then all the running notebooks are stopped. We can avoid this by starting the notebook with nohup:

$ nohup jupyter notebook

Output:

nohup: ignoring input and appending output to ‘nohup.out’

This terminal is now occupied by the notebook server, so you cannot type further commands in it, but you can safely close it and the server keeps running.
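For comparison, a similar effect can be had from a Python script. The sketch below starts a command in its own session with output appended to a log file, so it is not tied to the terminal that launched it. Note that this is a different mechanism from nohup (which makes the process ignore the hangup signal), and launch_detached is a made-up name.

```python
import subprocess


def launch_detached(command, log_path):
    """Start command in a new session, appending its output to log_path.

    Hypothetical helper; returns the subprocess.Popen object so the
    caller can read the PID from its .pid attribute.
    """
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,      # like nohup's "ignoring input"
            stdout=log,
            stderr=subprocess.STDOUT,      # errors go to the same log file
            start_new_session=True,        # detach from the terminal's session
        )
    finally:
        # The child has its own copy of the file descriptor.
        log.close()
```

For instance, launch_detached(["jupyter", "notebook"], "nohup.out") would play the role of the nohup command above, and the returned object's pid is the number you would later pass to kill.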

Kill this Jupyter Notebook

Open a new terminal and log in to your remote AWS machine.

$ cd Notebooks     # the folder where you last started Jupyter Notebook
$ lsof nohup.out

If your program is still running, you can see something like this:

COMMAND    PID  USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
jupyter-n 3034 jason    1w   REG  202,1      838 269167 nohup.out
jupyter-n 3034 jason    2w   REG  202,1      838 269167 nohup.out

Then you can kill the Jupyter Notebook with:

kill -9 3034

Remarks

  1. 3034 is the PID.
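The PID column can also be pulled out of the lsof output automatically. The sketch below parses text shaped like the listing above and returns the distinct PIDs; pids_from_lsof is a hypothetical helper and it assumes the PID is always the second whitespace-separated column.

```python
def pids_from_lsof(output: str) -> list:
    """Extract distinct PIDs (second column) from lsof output text."""
    pids = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, whose second field is "PID".
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        pid = int(fields[1])
        if pid not in pids:
            pids.append(pid)
    return pids
```

Each returned PID could then be passed to os.kill(pid, signal.SIGTERM), which asks the process to stop in the same way plain kill does; kill -9 corresponds to signal.SIGKILL and gives the server no chance to shut down cleanly.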