{
 "cells": [
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "import Pkg; Pkg.add(Pkg.PackageSpec(url=\"https://github.com/JuliaComputing/JuliaAcademyData.jl\"))\n",
    "using JuliaAcademyData; activate(\"Deep learning with Flux\")"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "<br /><br />\n",
    "\n",
    "## Neural networks\n",
    "\n",
    "Now that we know what neurons are, we are ready for the final step: the neural network! A neural network is quite literally a network of neurons that are connected together.\n",
    "\n",
    "So far, we have looked only at single neurons, each of which has a single output.\n",
    "What if we want multiple outputs?\n",
    "\n",
    "\n",
    "### Multiple output models\n",
    "\n",
    "What if we wanted to distinguish between apples, bananas, *and* grapes? We could use *vectors* of `0` or `1` values to symbolize each output.\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/JuliaComputing/JuliaAcademyData.jl/master/courses/Deep%20learning%20with%20Flux/data/fruit-salad.png\" alt=\"Drawing\" style=\"width: 300px;\"/>\n",
    "\n",
    "The idea of using vectors is that different directions in the space of outputs encode information about different types of inputs."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "Now we extend our previous model to give multiple outputs by repeating it with different weights. For the first element of the array we'd use:\n",
    "\n",
    "$$\\sigma(x;w^{(1)},b^{(1)}) := \\frac{1}{1 + \\exp(-w^{(1)} \\cdot x + b^{(1)})};$$\n",
    "\n",
    "then for the second we'd use\n",
    "\n",
    "$$\\sigma(x;w^{(2)},b^{(2)}) := \\frac{1}{1 + \\exp(-w^{(2)} \\cdot x + b^{(2)})};$$\n",
    "\n",
    "and if you wanted $n$ outputs, you'd have for each one\n",
    "\n",
    "$$\\sigma(x;w^{(i)},b^{(i)}) := \\frac{1}{1 + \\exp(-w^{(i)} \\cdot x + b^{(i)})}.$$"
   ],
   "metadata": {}
  },
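  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "As a minimal sketch of the formulas above, the next cell evaluates one sigmoid per output, each with its own weights and bias. The values in `ws`, `bs`, and `x_demo` are made up for illustration (they are not fitted to the fruit data), and `neuron` is a hypothetical helper that copies this notebook's formula, including the sign in front of $b$."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using LinearAlgebra  # provides dot\n",
    "\n",
    "# Hypothetical helper: a single neuron, following the formula written above\n",
    "neuron(x, w, b) = 1 / (1 + exp(-dot(w, x) + b))\n",
    "\n",
    "# Made-up parameters: three outputs, two inputs each\n",
    "ws = [[1.0, -2.0], [0.5, 0.5], [-1.0, 3.0]]\n",
    "bs = [0.0, 1.0, -0.5]\n",
    "x_demo = [0.4, 0.6]\n",
    "\n",
    "# One output per (weights, bias) pair\n",
    "outputs = [neuron(x_demo, ws[i], bs[i]) for i in 1:length(ws)]"
   ],
   "metadata": {},
   "execution_count": null
  },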
  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "Notice that these equations are all the same except for the parameters, so we can write this model more succinctly. Let's collect the biases $b^{(i)}$ in a vector:\n",
    "\n",
    "$$b=\\left[\\begin{array}{c}\n",
    "b^{(1)}\\\\\n",
    "b^{(2)}\\\\\n",
    "\\vdots\\\\\n",
    "b^{(n)}\n",
    "\\end{array}\\right]$$\n",
    "\n",
    "and stack the weight vectors as the rows of a matrix, where $m$ is the number of inputs and $n$ is the number of outputs:\n",
    "\n",
    "$$ \\mathsf{W}=\\left[\\begin{array}{cccc}\n",
    "w_{1}^{(1)} & w_{2}^{(1)} & \\ldots & w_{m}^{(1)}\\\\\n",
    "w_{1}^{(2)} & w_{2}^{(2)} & \\ldots & w_{m}^{(2)}\\\\\n",
    "\\vdots & \\vdots &  & \\vdots\\\\\n",
    "w_{1}^{(n)} & w_{2}^{(n)} & \\ldots & w_{m}^{(n)}\n",
    "\\end{array}\\right]\n",
    "$$\n",
    "\n",
    "We can write this all in one line as:\n",
    "\n",
    "$$\\sigma(x;\\mathsf{W},b)= \\left[\\begin{array}{c}\n",
    "\\sigma^{(1)}\\\\\n",
    "\\sigma^{(2)}\\\\\n",
    "\\vdots\\\\\n",
    "\\sigma^{(n)}\n",
    "\\end{array}\\right] = \\frac{1}{1 + \\exp(-\\mathsf{W} x + b)},$$\n",
    "\n",
    "where the exponential and the division are applied to each element separately. The product $\\mathsf{W} x$ is the operation called \"matrix multiplication\". Here is a small example:"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "W = [10 1;\n",
    "     20 2;\n",
    "     30 3]\n",
    "x = [3;\n",
    "     2]\n",
    "W*x"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "Matrix multiplication takes each row of weights and computes its dot product with $x$ (remember, that's how $\\sigma^{(i)}$ was defined), then collects the results into a vector. Because the result is a vector, this version of the function gives a vector of outputs, which we can use to encode a larger set of choices.\n",
    "\n",
    "Matrix multiplication is also interesting since **GPUs (Graphics Processing Units, i.e. graphics cards) are basically just matrix multiplication machines**, which means that by writing the equation this way, the result can be calculated really fast."
   ],
   "metadata": {}
  },
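  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "To make the row-by-row picture concrete, here is a small sketch that rebuilds the product from the earlier code cell one row at a time and compares it with the built-in matrix product. The names `W_demo`, `x_in`, `b_demo`, and `layer` are introduced only for this illustration, and the bias values are made up; `layer` applies this notebook's sigmoid formula to every output at once."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using LinearAlgebra  # provides dot\n",
    "\n",
    "W_demo = [10 1;\n",
    "          20 2;\n",
    "          30 3]\n",
    "x_in = [3, 2]\n",
    "\n",
    "# Dot product of each row of W_demo with x_in\n",
    "row_by_row = [dot(W_demo[i, :], x_in) for i in 1:size(W_demo, 1)]\n",
    "@show row_by_row == W_demo * x_in   # both give [32, 64, 96]\n",
    "\n",
    "# Hypothetical layer: the notebook's sigmoid formula, applied elementwise\n",
    "layer(W, b, x) = 1 ./ (1 .+ exp.(-W * x .+ b))\n",
    "b_demo = [32.0, 64.0, 96.0]   # made-up biases, chosen so that every exponent is zero\n",
    "layer(W_demo, b_demo, x_in)   # every entry is 0.5"
   ],
   "metadata": {},
   "execution_count": null
  },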
  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "This \"multiple input and multiple output\" version of the sigmoid function is known as a *layer of neurons*.\n",
    "\n",
    "Previously we worked with a single neuron, which we visualized as\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/JuliaComputing/JuliaAcademyData.jl/master/courses/Deep%20learning%20with%20Flux/data/single-neuron.png\" alt=\"Drawing\" style=\"width: 300px;\"/>\n",
    "\n",
    "where we have two pieces of data (green) coming into a single neuron (pink) that returns a single output. We could use this single output to do binary classification: to identify an image of a fruit as `1`, meaning banana, or as `0`, meaning not a banana (i.e. an apple).\n",
    "\n",
    "To do non-binary classification, we can use a layer of neurons, which we can visualize as\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/JuliaComputing/JuliaAcademyData.jl/master/courses/Deep%20learning%20with%20Flux/data/single-layer.png\" alt=\"Drawing\" style=\"width: 300px;\"/>\n",
    "\n",
    "We have now stacked several neurons on top of each other; the hope is that, once trained, they work together to recognize more complicated features.\n",
    "\n",
    "We still have two input pieces of data, but now have several neurons, each of which produces an output for a given binary classification:\n",
    "* neuron 1: \"is it an apple?\"\n",
    "* neuron 2: \"is it a banana?\"\n",
    "* neuron 3: \"is it a grape?\""
   ],
   "metadata": {}
  },
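  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "One common way to read such a layer's output is to pick the neuron with the largest value. A minimal sketch, using made-up output values rather than a trained model (`fruit_names` and `layer_output` are names invented for this example):"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "fruit_names = [\"apple\", \"banana\", \"grape\"]\n",
    "\n",
    "# Made-up outputs of the three neurons for one image\n",
    "layer_output = [0.1, 0.8, 0.3]\n",
    "\n",
    "# Index of the neuron with the largest output, then the matching label\n",
    "best = argmax(layer_output)\n",
    "fruit_names[best]"
   ],
   "metadata": {},
   "execution_count": null
  },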
  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "# Multiple outputs with Flux.jl"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "First step: load the data."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using CSV, DataFrames, Flux, Plots\n",
    "# Read each apple data file with CSV.File and convert it to a DataFrame\n",
    "apples1 = DataFrame(CSV.File(datapath(\"data/Apple_Golden_1.dat\"), delim='\\t', allowmissing=:none, normalizenames=true))\n",
    "apples2 = DataFrame(CSV.File(datapath(\"data/Apple_Golden_2.dat\"), delim='\\t', allowmissing=:none, normalizenames=true))\n",
    "apples3 = DataFrame(CSV.File(datapath(\"data/Apple_Golden_3.dat\"), delim='\\t', allowmissing=:none, normalizenames=true))\n",
    "# And then concatenate them all together\n",
    "apples = vcat(apples1, apples2, apples3)\n",
    "bananas = DataFrame(CSV.File(datapath(\"data/Banana.dat\"), delim='\\t', allowmissing=:none, normalizenames=true))\n",
    "grapes1 = DataFrame(CSV.File(datapath(\"data/Grape_White.dat\"), delim='\\t', allowmissing=:none, normalizenames=true))\n",
    "grapes2 = DataFrame(CSV.File(datapath(\"data/Grape_White_2.dat\"), delim='\\t', allowmissing=:none, normalizenames=true))\n",
    "grapes = vcat(grapes1, grapes2)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "# Extract out the features and construct the corresponding labels\n",
    "x_apples  = [ [apples[i, :red], apples[i, :blue]] for i in 1:size(apples, 1) ]\n",
    "x_bananas  = [ [bananas[i, :red], bananas[i, :blue]] for i in 1:size(bananas, 1) ]\n",
    "x_grapes = [ [grapes[i, :red], grapes[i, :blue]] for i in 1:size(grapes, 1) ]\n",
    "xs = vcat(x_apples, x_bananas, x_grapes)\n",
    "ys = vcat(fill([1,0,0], size(x_apples)),\n",
    "          fill([0,1,0], size(x_bananas)),\n",
    "          fill([0,0,1], size(x_grapes)))"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "### One-hot vectors\n",
    "\n",
    "Recall:\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/JuliaComputing/JuliaAcademyData.jl/master/courses/Deep%20learning%20with%20Flux/data/fruit-salad.png\" alt=\"Drawing\" style=\"width: 300px;\"/>\n",
    "\n",
    "`Flux.jl` provides an efficient representation for one-hot vectors, using advanced features of Julia so that it does not actually store these vectors, which would be a waste of memory; instead `Flux` just records in which position the non-zero element is. To us, however, it looks like all the information is being stored:"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using Flux: onehot\n",
    "\n",
    "onehot(2, 1:3)"
   ],
   "metadata": {},
   "execution_count": null
  },
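  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "To see what a one-hot vector contains, here is a plain-Julia sketch that builds the same pattern as an ordinary `Bool` vector. `dense_onehot` is a hypothetical helper written for this illustration; unlike Flux's `onehot`, it stores every element explicitly."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "# Hypothetical helper: true at the position of `label` within `labels`, false elsewhere\n",
    "dense_onehot(label, labels) = [l == label for l in labels]\n",
    "\n",
    "@show dense_onehot(2, 1:3)   # true only in position 2\n",
    "\n",
    "# The labels do not have to be numbers\n",
    "dense_onehot(:grape, [:apple, :banana, :grape])"
   ],
   "metadata": {},
   "execution_count": null
  },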
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "ys = vcat(fill(onehot(1, 1:3), size(x_apples)),\n",
    "          fill(onehot(2, 1:3), size(x_bananas)),\n",
    "          fill(onehot(3, 1:3), size(x_grapes)))\n",
    "# ## The core algorithm from the previous lecture\n",
    "# model = Dense(2, 1, σ)\n",
    "# L(x,y) = Flux.mse(model(x), y)\n",
    "# opt = SGD(params(model))\n",
    "# Flux.train!(L, zip(xs, ys), opt)"
   ],
   "metadata": {},
   "execution_count": null
  },
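  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "The plotting code below needs a trained `model` with three outputs. Here is a minimal sketch, assuming the same Flux version and API as the commented-out algorithm above, where `SGD(params(model))` builds the optimiser (newer Flux releases use a different training API). The only change is `Dense(2, 3, σ)`, which gives one output neuron per fruit. Note that Flux's `Dense` computes `σ.(W*x .+ b)`, so its bias enters with the opposite sign from the formula written earlier in this notebook; this only flips the sign of the learned `b`. A single pass over the data may not be enough for a good fit, so the cell can be re-run to keep training."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "# Assumes the Flux API used in the commented-out code above; newer Flux versions differ\n",
    "model = Dense(2, 3, σ)            # 2 inputs (red, blue), 3 outputs (apple, banana, grape)\n",
    "L(x, y) = Flux.mse(model(x), y)   # mean squared error against the one-hot label\n",
    "opt = SGD(params(model))          # gradient descent over the model's parameters\n",
    "Flux.train!(L, zip(xs, ys), opt)  # one pass over the data; re-run to train further"
   ],
   "metadata": {},
   "execution_count": null
  },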
  {
   "outputs": [],
   "cell_type": "markdown",
   "source": [
    "### Visualization"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using Plots\n",
    "plot()\n",
    "\n",
    "# Requires a trained `model` with three outputs.\n",
    "# Draw each output neuron's 0.5 decision boundary over the (red, blue) plane\n",
    "contour!(0:0.01:1, 0:0.01:1, (x,y)->model([x,y]).data[1], levels=[0.5, 0.51], color = cgrad([:blue, :blue]))\n",
    "contour!(0:0.01:1, 0:0.01:1, (x,y)->model([x,y]).data[2], levels=[0.5,0.51], color = cgrad([:green, :green]))\n",
    "contour!(0:0.01:1, 0:0.01:1, (x,y)->model([x,y]).data[3], levels=[0.5,0.51], color = cgrad([:red, :red]))\n",
    "\n",
    "# Overlay the data points for each fruit\n",
    "scatter!(first.(x_apples), last.(x_apples), m=:cross, label=\"apples\", color = :blue)\n",
    "scatter!(first.(x_bananas), last.(x_bananas), m=:circle, label=\"bananas\", color = :green)\n",
    "scatter!(first.(x_grapes), last.(x_grapes), m=:square, label=\"grapes\", color = :red)"
   ],
   "metadata": {},
   "execution_count": null
  }
 ],
 "nbformat_minor": 3,
 "metadata": {
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.0.3"
  },
  "kernelspec": {
   "name": "julia-1.0",
   "display_name": "Julia 1.0.3",
   "language": "julia"
  }
 },
 "nbformat": 4
}