An Intuitive Logic for Understanding Autoregressive Language Models
Transformer-based language models have shown a stunning range of capabilities but largely remain black boxes. Understanding these models is hard because they employ complex non-linear interactions in densely connected layers and operate in high-dimensional spaces. In this article, we address the interpretability of large autoregressive language models with a principled approach inspired by basic logic. First, we show that classical mathematical logic does not capture the reasoning of these models, and we propose an intuitive logic, which is notably asymmetric and redefines the classical logical operators. We then localize the activated areas associated with conjunction, disjunction, negation, adversative conjunctions, and conditional constructions.
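The abstract does not spell out the localization procedure, so the sketch below is only one plausible reading of it, written against the Hugging Face `transformers` API: it records GPT-2 XL's per-layer hidden states at the position of each logical connective ("and", "or", "not", "but", "if") and reports where that token stands out most relative to the other tokens in the sentence. The probe sentences, the `operator_salience` helper, and the peak-norm heuristic are illustrative assumptions, not the paper's actual method.

```python
import torch
from transformers import GPT2TokenizerFast, GPT2Model

# "gpt2-xl" matches the model named in the abstract; swap in "gpt2" for a quick smoke test.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-xl")
model = GPT2Model.from_pretrained("gpt2-xl", output_hidden_states=True)
model.eval()

# Illustrative probe sentences, one per logical construction.
PROBES = {
    "conjunction": ("and", "The door was open and the light was on."),
    "disjunction": ("or",  "The door was open or the light was on."),
    "negation":    ("not", "The door was not open."),
    "adversative": ("but", "The door was open but the light was off."),
    "conditional": ("if",  "The light was on if the door was open."),
}

def operator_salience(sentence, operator_word):
    """Per-layer ratio of the operator token's activation norm to the mean
    token norm: a crude 'where does this operator stand out' score."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    # GPT-2 BPE encodes mid-sentence words with a leading space.
    op_id = tokenizer.encode(" " + operator_word)[0]
    pos = (enc["input_ids"][0] == op_id).nonzero()[0].item()
    scores = []
    for h in out.hidden_states:  # (n_layers + 1) tensors of shape [1, seq, d_model]
        scores.append((h[0, pos].norm() / h[0].norm(dim=-1).mean()).item())
    return scores

for name, (word, sentence) in PROBES.items():
    scores = operator_salience(sentence, word)
    peak = max(range(len(scores)), key=scores.__getitem__)
    print(f"{name:12s} most salient at layer {peak} (0 = embeddings)")
```

Comparing the five salience profiles against each other, rather than reading any single peak in isolation, is what would let such a probe suggest distinct activated areas per operator.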
From the localization results, we obtain important topological information about the network, which leads us to formulate a conjecture about the mechanisms underlying the intuitive logic in GPT-2 XL. We test the conjecture through model editing and conclude by laying the foundations for a connectomics of GPT.
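The editing experiment is likewise not specified here. As a hedged illustration of how such a test might look, the following sketch zero-ablates the MLP output of a single layer using a standard PyTorch forward hook and compares the model's next-token distribution before and after the edit. The layer index `LAYER`, the prompt, and the probe token are placeholders for demonstration only; the paper's localized layers are not given in this abstract.

```python
import torch
from transformers import GPT2TokenizerFast, GPT2LMHeadModel

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
model.eval()

LAYER = 20  # placeholder index, not a layer identified by the paper

def next_token_probs(prompt):
    """Probability distribution over the next token for a prompt."""
    enc = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0, -1]
    return torch.softmax(logits, dim=-1)

def zero_mlp(module, inputs, output):
    # Returning a tensor from a forward hook replaces the module's output,
    # so this ablates the chosen layer's MLP contribution entirely.
    return torch.zeros_like(output)

prompt = "The door was open and the light was"
probe = tokenizer.encode(" on")[0]

before = next_token_probs(prompt)
handle = model.transformer.h[LAYER].mlp.register_forward_hook(zero_mlp)
after = next_token_probs(prompt)
handle.remove()  # restore the unedited model

print(f"P(' on') before: {before[probe]:.4f}  after ablation: {after[probe]:.4f}")
```

If a conjecture ties a logical operator to particular layers, an edit of this kind should selectively degrade continuations that depend on that operator while leaving unrelated prompts largely unchanged.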