Training an Imitation Network to be Interpretable, Sensible and to Generalize: More Coming Soon

Notes:

Unlimited Amount of Data

Imitation learning avoids many of the problems encountered in training deep neural networks because there is effectively an unlimited amount of data. Furthermore, the data does not need to be labeled in order to be used for training the imitation network. Although exceptions will be mentioned below, the standard objective of the imitation network is simply to match the output of the imitated system, regardless of whether the imitated system is correct.
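This loop can be sketched as follows. Everything here is a hypothetical stand-in: `teacher` plays the black-box system being imitated, and a dependency-free polynomial fit stands in for the imitation network. The point is only that labels come for free by querying the imitated system.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher(x):
    """Stand-in for the system being imitated (hypothetical black box).
    Its outputs serve as the training targets; no human labels are needed."""
    return np.sin(3.0 * x) + 0.5 * x

# Unlimited data: draw as many fresh inputs as desired and "label" them
# by simply querying the teacher.
x_train = rng.uniform(-1.0, 1.0, size=5000)
y_train = teacher(x_train)

# A minimal "imitation network": a polynomial least-squares fit standing in
# for a neural network, to keep the sketch dependency-free.
student = np.poly1d(np.polyfit(x_train, y_train, deg=7))

# The objective is simply to match the teacher's output, right or wrong.
x_test = rng.uniform(-1.0, 1.0, size=1000)
mse = np.mean((student(x_test) - teacher(x_test)) ** 2)
```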

There is also effectively an unlimited amount of data that may be set aside, not used in training, and used instead for development testing. Because development data is unlimited, there can be many rounds of development testing with fresh data. Therefore, a human + AI learning management system may make data-specific changes to hyperparameters or to the architecture of the network and validate them on new data.
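Drawing a new development set for every tuning round can be sketched as below; `original_system`, the polynomial imitator, and the candidate hyperparameters are all illustrative assumptions, not the actual systems described here.

```python
import numpy as np

rng = np.random.default_rng(1)

def original_system(x):
    # Hypothetical system being imitated.
    return np.tanh(2.0 * x)

def fresh_batch(n):
    # Unlimited data: every call returns previously unseen inputs,
    # labeled by querying the original system.
    x = rng.uniform(-2.0, 2.0, size=n)
    return x, original_system(x)

def fit_imitator(degree, x, y):
    # Polynomial stand-in for the imitation network.
    return np.poly1d(np.polyfit(x, y, deg=degree))

x_tr, y_tr = fresh_batch(2000)

best_degree, best_err = None, np.inf
for degree in (1, 3, 5, 9):           # candidate hyperparameter values
    model = fit_imitator(degree, x_tr, y_tr)
    x_dev, y_dev = fresh_batch(1000)  # fresh dev data for every round
    err = np.mean((model(x_dev) - y_dev) ** 2)
    if err < best_err:
        best_degree, best_err = degree, err
```

Because each round is judged on data it has never influenced, the hyperparameter choices cannot overfit a fixed development set.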

Because of the plentiful development data:

a. There is unlimited data for training the imitation network on the imitation task
• There is also an unlimited amount of data to set aside as development data
1. Enables accurate measurement of bias, variance, and ability to generalize
a. Both of the original system and of the imitation system
• E.g., for text prediction, it may even be a complete n-gram concordance
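Because fresh training sets cost nothing, the variance of the imitation model can be estimated directly by retraining it several times, and its bias by comparing the average prediction to the imitated system. A minimal sketch, assuming a hypothetical `teacher` and a deliberately low-capacity (underfit) linear imitator so that bias dominates:

```python
import numpy as np

rng = np.random.default_rng(2)

def teacher(x):
    return np.cos(x)  # hypothetical system being imitated

# Retrain the same low-capacity imitator on several independent, freshly
# drawn training sets and measure how much its predictions vary.
x_probe = np.linspace(-1.0, 1.0, 50)       # fixed probe points
preds = []
for _ in range(20):
    x = rng.uniform(-1.0, 1.0, size=200)   # fresh training set each time
    model = np.poly1d(np.polyfit(x, teacher(x), deg=1))
    preds.append(model(x_probe))
preds = np.array(preds)

mean_pred = preds.mean(axis=0)
variance = preds.var(axis=0).mean()                     # imitator variance
bias_sq = np.mean((mean_pred - teacher(x_probe)) ** 2)  # squared bias
```

For this underfit linear imitator the squared bias dominates the variance, which is exactly the kind of diagnosis the unlimited data makes cheap and accurate.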

Special Architectures for the Imitation Network

c. The imitation network may have an architecture specifically designed for interpretability …
d. The imitation network may have an architecture specifically designed to be robust against adversarial attacks …
• For example, the imitation network may have many internal bottleneck layers
e. The imitation system may be an ensemble with heavy knowledge sharing
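A bottlenecked architecture can be sketched as a forward pass whose layer widths repeatedly narrow; the specific widths below are illustrative only. Each narrow layer forces the network through a low-dimensional internal code, which is the property the notes above appeal to for interpretability and robustness.

```python
import numpy as np

rng = np.random.default_rng(3)

# Layer widths with repeated narrow "bottleneck" layers (width 4)
# interleaved between wider layers.  Widths are illustrative only.
widths = [32, 64, 4, 64, 4, 64, 4, 16]

def init_layers(widths):
    # One random weight matrix per pair of adjacent layers.
    return [rng.normal(0.0, 0.1, size=(n_in, n_out))
            for n_in, n_out in zip(widths[:-1], widths[1:])]

def forward(layers, x):
    h = x
    for w in layers:
        h = np.tanh(h @ w)  # each bottleneck yields a low-dim activation
    return h

layers = init_layers(widths)
out = forward(layers, rng.normal(size=(8, 32)))  # batch of 8 inputs
```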

Restricting the Number of Degrees of Freedom in the Imitation Network

f. The number of degrees of freedom of the imitation network may be arbitrarily restricted
• The effects of any restriction or regularization of the imitation network are easy to observe and measure
• The effects of any restriction or regularization of the imitation network may be easy to interpret

Comparing Performance Datum by Datum

g. The performance of the imitation network may be compared datum by datum with the performance of the original network
• This comparison may be done on new data
1. This directly measures the bias, variance, and ability to generalize of the imitation network
b. The imitation network may outperform the original system
• As measured on data that is new to both of them

Trial and Test Paradigm: Reinforcement Learning

h. Trial and test paradigm
• For experimental changes in the imitation network that change the error rate and/or the effective number of degrees of freedom, the performance effect may be explicitly measured with an estimated confidence interval
• The imitation network may be globally optimized by reinforcement learning with locally, numerically estimated gradient descent
1. This is the easy kind of reinforcement learning – there is a direct local estimate of the best direction to proceed
2. The reinforcement learning may be implemented with a multi-stack priority-queue beam search (INDEPENDENT CLAIM)
a. Compare the performance of competing architectures on the same data (training and test)
i. May use multiple test cases

Original Network may be Deliberately Designed to Overfit

j. In some embodiments, the original network may be deliberately trained to overfit the training data …
k. The original network may be an ensemble
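The datum-by-datum comparison with a confidence interval can be sketched as below. Both classifiers are hypothetical stand-ins, and the interval is the standard normal approximation for a binomial proportion; on unlimited fresh data the interval can be made as tight as desired.

```python
import math
import numpy as np

rng = np.random.default_rng(4)

def original(x):
    return (x > 0.0).astype(int)   # hypothetical original classifier

def imitation(x):
    return (x > 0.05).astype(int)  # hypothetical imitation network

# Datum-by-datum comparison on fresh data.
x = rng.uniform(-1.0, 1.0, size=10_000)
agree = original(x) == imitation(x)

p = agree.mean()  # agreement rate between the two systems
# Normal-approximation 95% confidence interval for the agreement rate.
half = 1.96 * math.sqrt(p * (1.0 - p) / agree.size)
interval = (p - half, p + half)
```

The same interval construction applies to the trial-and-test paradigm above: an experimental change to the imitation network is kept only if its measured effect clears the confidence interval on fresh data.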


by James K Baker and Bradley J Baker

© D5AI LLC, 2020

The text in this work is licensed under a Creative Commons Attribution 4.0 International License.
Some of the ideas presented here are covered by issued or pending patents. No license to such patents is created or implied by publication of, or reference to, this work.