Transformer-based language models have shown a stunning range of capabilities, but they largely remain black boxes. Understanding these models is hard because they rely on complex non-linear interactions in densely connected layers and operate in high-dimensional spaces. In this article, we address the interpretability of large autoregressive language models with a principled approach inspired by basic logic. First, we show that classical mathematical logic does not capture the reasoning system of these models, and we propose an intuitive logic, which is notably asymmetric and redefines the classical logical operators. We then localize the areas of the network activated by conjunction, disjunction, negation, adversative conjunctions and conditional constructions.
From the localization results, we obtain important topological information about the network, which leads us to formulate a conjecture about the mechanisms underlying the intuitive logic in GPT-2 XL. We test the conjecture through model editing and conclude by laying the foundations for a connectomics of GPT.

Download
Hackaton_2022_Logic.pdf 891 kB