[month] [year]

Saujas Srinivasa V – Linguistic Rules

Saujas Srinivasa Vaduguru received his MS Dual Degree in Computational Linguistics (CL). His research work was supervised by Prof. Dipti M Sharma. Here’s a summary of his research work on program synthesis for linguistic rules:

Recent work in NLP has focused on applying powerful neural sequence models to various learning
problems. These neural models excel at extracting statistical patterns from large amounts of data, but
struggle to learn patterns or reason about language from only a few examples. We ask the question: Can
we learn explicit rules that generalize well from only a few examples?
We explore this question by viewing linguistic rules as programs that operate on linguistic forms.
This allows us to tackle the problem of learning linguistic rules using program synthesis. We develop
a synthesis model to learn phonological rules as programs in a domain-specific language. In addition to
being highly sample-efficient, our approach generates human-readable programs, and allows control
over the generalizability of the learnt programs.
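To make the idea of "rules as programs" concrete, here is a minimal sketch of enumerative synthesis over a toy rule language. It is not the synthesis model from the thesis: the DSL (a rule of the form `(target, replacement, context)`), the function names, and the example word forms are all assumptions made up for illustration, loosely modeled on word-final devoicing.

```python
from itertools import product

VOWELS = set("aeiou")

# Hypothetical toy DSL: a rule (target, replacement, context) rewrites `target`
# as `replacement` when the following segment satisfies `context`:
# "any" = no condition, "V" = vowel, "C" = consonant, "#" = end of word.
def context_holds(context, word, i):
    nxt = word[i + 1] if i + 1 < len(word) else None
    if context == "any":
        return True
    if context == "#":
        return nxt is None
    if context == "V":
        return nxt is not None and nxt in VOWELS
    if context == "C":
        return nxt is not None and nxt not in VOWELS
    raise ValueError(f"unknown context: {context}")

def apply_rule(rule, word):
    target, replacement, context = rule
    return "".join(
        replacement if seg == target and context_holds(context, word, i) else seg
        for i, seg in enumerate(word)
    )

def synthesize(examples):
    """Return the first rule consistent with every (input, output) pair."""
    alphabet = sorted({ch for pair in examples for word in pair for ch in word})
    # Contexts are tried from least to most restrictive (an arbitrary choice here).
    for context in ("any", "V", "C", "#"):
        for target, replacement in product(alphabet, repeat=2):
            if target == replacement:
                continue
            rule = (target, replacement, context)
            if all(apply_rule(rule, src) == dst for src, dst in examples):
                return rule
    return None

# Invented toy data: "d" surfaces as "t" only at the end of a word.
examples = [("hund", "hunt"), ("rad", "rat"), ("dame", "dame")]
rule = synthesize(examples)
```

The learnt rule is an explicit, readable object that can be applied to unseen forms, for instance `apply_rule(rule, "bad")`. The order in which candidate rules are enumerated is one simple place where a preference for more or less general programs could be expressed.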
We test the ability of our models to generalize from few training examples using our new dataset of
problems from the Linguistics Olympiad. These problems come from contests for high school students around the world, and solving them requires inferring linguistic patterns from a small number of given examples.
They form a challenging set of tasks that demand strong linguistic reasoning ability.
Having shown that program synthesis, a method to learn rules from data in the form of programs in
a domain-specific language, can be used to learn phonological rules in highly data-constrained settings,
we use the problem of phonological stress placement as a case study of how the design of the domain-specific language influences generalization when the learning algorithm is held fixed. We find
that encoding the distinction between consonants and vowels results in much better performance, and
providing syllable-level information further improves generalization. Program synthesis, thus, provides
a way to investigate how access to explicit linguistic information influences what can be learnt from a
small number of examples.
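The effect of DSL design can be illustrated with a small sketch in which the same enumerative learner is run over two toy languages: one that only sees raw segment positions, and one that can refer to vowels. Everything here is an assumption for illustration (the program shape `(direction, k)`, the helper names, and the invented penultimate-stress data). It is not the DSL or the learner used in the thesis, and it covers only the consonant/vowel distinction, not syllable-level information.

```python
VOWELS = set("aeiou")

def segment_units(word):
    # Raw DSL: every character position is a candidate stress site.
    return list(range(len(word)))

def vowel_units(word):
    # C/V-aware DSL: only vowel positions are candidate stress sites.
    return [i for i, ch in enumerate(word) if ch in VOWELS]

def run(program, units_fn, word):
    """Program (direction, k): stress the k-th unit counted from that edge."""
    direction, k = program
    units = units_fn(word)
    if k > len(units):
        return None
    return units[k - 1] if direction == "left" else units[-k]

def synthesize(examples, units_fn, max_k=6):
    # Same learner for both DSLs: return the first consistent program.
    for direction in ("left", "right"):
        for k in range(1, max_k + 1):
            program = (direction, k)
            if all(run(program, units_fn, w) == s for w, s in examples):
                return program
    return None

def accuracy(program, units_fn, data):
    return sum(run(program, units_fn, w) == s for w, s in data) / len(data)

# Invented toy data: (word, index of the stressed vowel), penultimate stress.
train = [("banana", 3), ("pato", 1)]
test = [("canto", 1), ("elefante", 4)]

seg_program = synthesize(train, segment_units)
vow_program = synthesize(train, vowel_units)
seg_acc = accuracy(seg_program, segment_units, test)
vow_acc = accuracy(vow_program, vowel_units, test)
```

Both languages fit the two training words, but the segment-level program ("third segment from the end") fails on the held-out words, while the vowel-aware program ("second vowel from the end") carries over to them.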