Optimization
Projected Gradient Descent (PGD)
PGD is used to solve constrained optimization problems. It is the same as gradient descent except that, after every gradient step, the iterate is projected back onto the feasible set defined by the constraints.
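As a minimal sketch (not from any particular library; the objective, projection, and step size below are illustrative assumptions), PGD alternates a plain gradient step with a closed-form Euclidean projection:

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    # Euclidean projection onto the feasible set {x : ||x||_2 <= radius}.
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def pgd_minimize(grad_f, x0, project, lr=0.1, steps=100):
    # Plain gradient descent, except each new iterate is projected
    # back onto the feasible set after the gradient step.
    x = x0
    for _ in range(steps):
        x = project(x - lr * grad_f(x))
    return x

# Toy example: minimize ||x - c||^2 subject to ||x||_2 <= 1,
# where c lies outside the unit ball.
c = np.array([2.0, 1.0])
x_star = pgd_minimize(lambda x: 2.0 * (x - c), np.zeros(2), project_l2_ball)
print(x_star)  # approaches c / ||c||, the closest feasible point
```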
Typical Problems
Computing Query Given Document Embeddings
Given embeddings \mathbf{e} _ 1, \cdots, \mathbf{e} _ K, find a query \mathbf{q} formed as a linear combination \mathbf{q} = \mathbf{E}\alpha of \mathbf{e} _ 1, \cdots, \mathbf{e} _ K so that the total inner product \sum _ k \mathbf{e} _ k^T \mathbf{q} (equivalently, cosine similarity when the embeddings are unit-normalized) is maximized. With \mathbf{E} := \begin{bmatrix}\mathbf{e} _ 1, &\cdots, &\mathbf{e} _ K \end{bmatrix} and \mathbf{A} := \mathbf{E}^T\mathbf{E}, the objective becomes \mathbf{1}^T \mathbf{A}\alpha, and the problem can be written as below. Note that with only the sum constraint it is unbounded:
\max _ \alpha \quad \mathbf{1}^T \mathbf{A}\alpha \quad \text{s.t.} \quad \mathbf{1}^T \alpha = 1
If we further require all entries of \alpha to be non-negative, the feasible set becomes the probability simplex, and a linear objective over the simplex attains its maximum at a vertex: the solution is a one-hot vector selecting the single column of \mathbf{E} with the largest total similarity, i.e., the k maximizing (\mathbf{A}^T \mathbf{1}) _ k, as the sketch below illustrates.
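As an illustrative sketch (not from the source; the random embeddings, step size, and iteration count are assumptions), projected gradient ascent with a standard sort-and-threshold projection onto the probability simplex recovers this one-hot solution:

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto {alpha : alpha >= 0, 1^T alpha = 1},
    # via the standard sort-and-threshold algorithm.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

rng = np.random.default_rng(0)
E = rng.normal(size=(64, 5))      # columns are embeddings e_1..e_K
A = E.T @ E
g = A.T @ np.ones(A.shape[0])     # gradient of 1^T A alpha (constant in alpha)

alpha = np.full(5, 1.0 / 5)       # start at the simplex centre
for _ in range(200):
    # Ascent step on the linear objective, then project back onto the simplex.
    alpha = project_simplex(alpha + 0.01 * g)

print(alpha)                             # ~ one-hot vector
print(np.argmax(g) == np.argmax(alpha))  # True: selects the best column of E
```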