Difficulty: intermediate
Estimated Time: 15 minutes

Learn how to forecast upcoming values in a time series.

In this module we'll use the pyramid-arima library to forecast future values in a time series.

This scenario follows the procedures documented in the blog post "Using Python and Auto ARIMA to Forecast Seasonal Time Series".

It uses the following data set: industrial production of electric and gas utilities in the United States, from the years 1985–2018.

https://fred.stlouisfed.org/series/IPG2211A2N

Time Series Modeling with Auto-ARIMA

Step 1 - Import Data

The first step is importing the data. Here you'll pull in some test data and graph the result.

Task

First, let's download the CSV file with the data we are going to work with:

curl -o Electric_Production.csv "https://fred.stlouisfed.org/graph/fredgraph.csv?chart_type=line&recession_bars=on&log_scales=&bgcolor=%23e1e9f0&graph_bgcolor=%23ffffff&fo=Open+Sans&ts=12&tts=12&txtcolor=%23444444&show_legend=yes&show_axis_titles=yes&drp=0&cosd=1939-01-01&coed=2018-08-01&height=450&stacking=&range=&mode=fred&id=IPG2211A2N&transformation=lin&nd=1939-01-01&ost=-99999&oet=99999&lsv=&lev=&mma=0&fml=a&fgst=lin&fgsnd=2009-06-01&fq=Monthly&fam=avg&vintage_date=&revision_date=&line_color=%234572a7&line_style=solid&lw=2&scale=left&mark_type=none&mw=2&width=1168"

Now that you've got the data in a CSV file, pull it into pandas and check the format.

import pandas as pd

# Load the CSV, using the first column (the observation date) as the index
data = pd.read_csv('Electric_Production.csv', index_col=0)
print(data.head())

Running the program now should display the first few records of the data set: python app.py
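Checking the format can go a little further than head(). The following is a minimal standalone sketch, separate from app.py. It assumes a two-column layout (a date column followed by a value column) and uses made-up inline rows instead of the downloaded file, so the header names and numbers are illustrative only.

```python
import io

import pandas as pd

# Hypothetical sample standing in for Electric_Production.csv;
# the header names and values are made up for illustration.
sample_csv = io.StringIO(
    "DATE,IPG2211A2N\n"
    "1985-01-01,72.5\n"
    "1985-02-01,70.7\n"
    "1985-03-01,62.5\n"
)

sample = pd.read_csv(sample_csv, index_col=0)

print(sample.shape)                 # (number of rows, number of columns)
print(sample.dtypes)                # the value column should be numeric
print(type(sample.index).__name__)  # the dates are still text at this point
```

If the value column does not come back as a numeric type, or the index is not what you expect, that is worth sorting out before plotting.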

Ok, now let's get a visual of what this data looks like.

First update the index to be timestamps, then update the column name:

data.index = pd.to_datetime(data.index)
data.columns = ['Energy Production']
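One benefit of converting the index is that a DatetimeIndex supports date-based selection. The sketch below illustrates this on a small made-up frame; the values and the starting column name are assumptions, not the real data set.

```python
import pandas as pd

# Made-up rows for illustration; the index starts out as plain strings.
sample = pd.DataFrame(
    {"IPG2211A2N": [72.5, 70.7, 58.0]},
    index=["1985-01-01", "1985-02-01", "1986-01-01"],
)

# The same two steps as above: parse the index, then rename the column.
sample.index = pd.to_datetime(sample.index)
sample.columns = ["Energy Production"]

# With a DatetimeIndex, a partial date string selects every row in that period.
rows_1985 = sample.loc["1985"]
print(rows_1985)
```

Here rows_1985 holds only the two observations dated 1985; the same pattern could be used to restrict the real series to a range of years.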

Install the plotly library: pip install plotly --user

Now import the plotly libraries and create an HTML page for the graph:

import plotly.offline as ply
import plotly.graph_objs as go

graphme = [go.Scatter( x=data.index, y=data['Energy Production'] )]

ply.plot({
    "data": graphme,
    "layout": go.Layout(title="Energy Production Jan 1985--Jan 2018")
}, auto_open=True)

Run the program to generate the graph: python app.py

OK, let's view the graph:

Render port 8000: https://[[HOST_SUBDOMAIN]]-8000-[[KATACODA_HOST]].environments.katacoda.com/temp-plot.html