Machine Learning Pipeline


Table of Contents

Open Disaster Response
1. Key Points
2. Introduction
3. Tokenize Function
4. Build a Machine Learning Pipeline
5. Train Machine Learning Pipeline
6. Test Model

Key Points
ETL Pipelines
    Extracting Text Data
    APIs
    Transform Data
    Combining Data
    Cleaning Data
    Encodings
    Missing Data (Delete, Impute)
    Duplicate Data
    Dummy Variables
    Outliers
    Scaling Data
    Feature Engineering
    Load Data
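
The ETL steps listed above can be sketched end to end with the Python standard library alone. This is a minimal illustration, not the project's actual ETL code: the sample rows, the column names (`id`, `message`, `genre`, `length`), and the `messages` table name are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw data: one duplicate row and one missing value.
RAW_CSV = """id,message,genre,length
1,Need water,direct,10
2,Road blocked,news,
2,Road blocked,news,
3,Send food,social,9
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop duplicates, impute, scale, and dummy-encode."""
    # Duplicate data: keep the first row seen for each id.
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))

    # Missing data: impute a missing length with the mean of known lengths.
    known = [float(r["length"]) for r in unique if r["length"]]
    mean_length = sum(known) / len(known)
    for r in unique:
        r["length"] = float(r["length"]) if r["length"] else mean_length

    # Scaling: min-max scale length into [0, 1].
    lo = min(r["length"] for r in unique)
    hi = max(r["length"] for r in unique)
    span = (hi - lo) or 1.0
    for r in unique:
        r["length_scaled"] = (r["length"] - lo) / span

    # Dummy variables: one 0/1 column per genre value.
    genres = sorted({r["genre"] for r in unique})
    for r in unique:
        for g in genres:
            r[f"genre_{g}"] = int(r["genre"] == g)
    return unique

def load(rows, conn):
    """Load: write the cleaned rows to a SQLite table."""
    columns = list(rows[0].keys())
    conn.execute(f"CREATE TABLE messages ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO messages VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in rows],
    )
    conn.commit()

cleaned = transform(extract(RAW_CSV))
connection = sqlite3.connect(":memory:")
load(cleaned, connection)
```

Deleting the incomplete row instead of imputing it is the other option named in the list; which one is appropriate depends on how much data is missing and why.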

NLP Pipelines
1. Text Processing
    Cleaning
    Normalization
    Tokenization
    Stop Word Removal
    Part of Speech Tagging
    Named Entity Recognition
    Stemming and Lemmatization
2. Feature Extraction
    Bag of Words
    TF-IDF
    Word Embeddings
3. Modeling
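
The first two stages above can be illustrated with a small standard-library sketch. It covers normalization, tokenization, stop word removal, bag of words, and TF-IDF only; part-of-speech tagging, named entity recognition, stemming, lemmatization, and word embeddings need a dedicated NLP library and are left out. The `tokenize` function here is a hypothetical stand-in, not the function this project defines, and the stop word list and sample messages are invented for the example.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop word list; real projects use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "we", "in", "and", "of", "to"}

def tokenize(text):
    """Normalize, tokenize, and remove stop words from one message."""
    # Normalization: lowercase and replace non-alphanumerics with spaces.
    text = re.sub(r"[^a-z0-9]", " ", text.lower())
    # Tokenization: split on whitespace.
    tokens = text.split()
    # Stop word removal.
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Bag of words: one token-count mapping per document."""
    return [Counter(tokenize(d)) for d in docs]

def tf_idf(docs):
    """TF-IDF: term frequency times log(N / document frequency)."""
    counts = bag_of_words(docs)
    n_docs = len(counts)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append({
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in c.items()
        })
    return weights

messages = [
    "We need water and food!",
    "The road is blocked.",
    "Water is needed in the shelter.",
]
tokens = tokenize(messages[0])
weights = tf_idf(messages)
```

In this toy corpus "water" appears in two of the three messages, so it receives a lower TF-IDF weight than "food", which appears in only one. Library implementations usually apply smoothing and normalization on top of this basic formula, so their numbers will differ.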

Machine Learning Pipelines
    Scikit-learn Pipeline
    Estimator
    Transformer
    Predictor
    Scikit-learn Feature Union
    Grid Search
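
A rough sketch of how these pieces fit together in scikit-learn is shown below. In scikit-learn's terminology an estimator is any object with `fit`, a transformer also has `transform`, and a predictor also has `predict`. The `MessageLength` transformer, the toy messages, the binary labels, and the parameter grid are all assumptions made for illustration; the real task has several categories per message, which this single-label example does not attempt to cover.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline

class MessageLength(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: emits each message's character count."""

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X):
        return np.array([[len(text)] for text in X], dtype=float)

# Hypothetical toy data: 1 = request for help, 0 = other.
messages = [
    "we need water", "please send food", "need medical help", "send water now",
    "road is open", "weather is clear", "power is restored", "school is open",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    # Feature union: run both branches on the same input and
    # concatenate their outputs column-wise.
    ("features", FeatureUnion([
        ("text", Pipeline([
            ("bow", CountVectorizer()),      # transformer
            ("tfidf", TfidfTransformer()),   # transformer
        ])),
        ("length", MessageLength()),         # custom transformer
    ])),
    ("clf", LogisticRegression(max_iter=1000)),  # predictor (final step)
])

# Grid search: step names joined by double underscores address
# parameters of nested steps.
param_grid = {
    "features__text__bow__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(messages, labels)
predictions = search.predict(["need food and water"])
```

Because the vectorizers sit inside the pipeline, each cross-validation fold refits them on its own training split, which keeps information from the held-out split out of the features.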

Introduction

We will create a machine learning pipeline that categorizes real messages sent during disaster events, so that each message can later be routed to an appropriate disaster relief agency.
We will also create a web app where an emergency worker can enter a new message and get classification results across several categories. The web app will also display visualizations of the data.


Tokenize Function


Build a Machine Learning Pipeline


Train Machine Learning Pipeline


Test Model