Optimization
Projected Gradient Descent (PGD)
PGD is used to solve constrained optimization problems. It is the same as gradient descent except that, after every gradient step, the iterate is projected back onto the feasible set defined by the constraints.
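As a minimal sketch (not from any particular library; the objective, projection, and step size below are illustrative assumptions), PGD alternates a plain gradient step with a closed-form Euclidean projection:

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    # Euclidean projection onto the feasible set {x : ||x||_2 <= radius}.
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def pgd_minimize(grad_f, x0, project, lr=0.1, steps=100):
    # Plain gradient descent, except each new iterate is projected
    # back onto the feasible set after the gradient step.
    x = x0
    for _ in range(steps):
        x = project(x - lr * grad_f(x))
    return x

# Toy example: minimize ||x - c||^2 subject to ||x||_2 <= 1,
# where c lies outside the unit ball.
c = np.array([2.0, 1.0])
x_star = pgd_minimize(lambda x: 2.0 * (x - c), np.zeros(2), project_l2_ball)
print(x_star)  # approaches c / ||c||, the closest feasible point
```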
Typical Problems
Computing Query Given Document Embeddings
Given embeddings \mathbf{e} _ 1, \cdots, \mathbf{e} _ K, find a query \mathbf{q} formed as a linear combination \mathbf{q} = \mathbf{E}\alpha of \mathbf{e} _ 1, \cdots, \mathbf{e} _ K so that the total inner product \sum _ k \mathbf{e} _ k^T \mathbf{q} (equivalently, cosine similarity when the embeddings are unit-normalized) is maximized. With \mathbf{E} := \begin{bmatrix}\mathbf{e} _ 1, &\cdots, &\mathbf{e} _ K \end{bmatrix} and \mathbf{A} := \mathbf{E}^T\mathbf{E}, the objective becomes \mathbf{1}^T \mathbf{A}\alpha, and the problem can be written as below. Note that with only the sum constraint it is unbounded:
\max _ \alpha \quad \mathbf{1}^T \mathbf{A}\alpha \quad \text{s.t.} \quad \mathbf{1}^T \alpha = 1
If we further require all entries of \alpha to be non-negative, the feasible set becomes the probability simplex, and a linear objective over the simplex attains its maximum at a vertex: the solution is a one-hot vector selecting the single column of \mathbf{E} with the largest total similarity, i.e., the k maximizing (\mathbf{A}^T \mathbf{1}) _ k, as the sketch below illustrates.
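As an illustrative sketch (not from the source; the random embeddings, step size, and iteration count are assumptions), projected gradient ascent with a standard sort-and-threshold projection onto the probability simplex recovers this one-hot solution:

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto {alpha : alpha >= 0, 1^T alpha = 1},
    # via the standard sort-and-threshold algorithm.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

rng = np.random.default_rng(0)
E = rng.normal(size=(64, 5))      # columns are embeddings e_1..e_K
A = E.T @ E
g = A.T @ np.ones(A.shape[0])     # gradient of 1^T A alpha (constant in alpha)

alpha = np.full(5, 1.0 / 5)       # start at the simplex centre
for _ in range(200):
    # Ascent step on the linear objective, then project back onto the simplex.
    alpha = project_simplex(alpha + 0.01 * g)

print(alpha)                             # ~ one-hot vector
print(np.argmax(g) == np.argmax(alpha))  # True: selects the best column of E
```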