Outline
The following is the course schedule (effectively a reading list), compiled from the course website for quick reference.
| Section | Date | Topic | Readings |
|---|---|---|---|
| I. Datasets in NLP | Aug 22 | Introduction, Historical Perspective, and Overview | Fair ML Book, Chapter 7: Datasets; Sambasivan et al., 2021 “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI; Paullada et al., 2021 Data and its (dis)contents; Raji et al., 2022 Ethical Challenges of Data Collection & Use in Machine Learning Research |
| | Aug 24 | Data Collection and Data Ethics | Deng et al., 2009 ImageNet: A large-scale hierarchical image database; Kwiatkowski et al., 2019 Natural Questions: A Benchmark for Question Answering Research; Sakaguchi et al., 2019 WinoGrande: An Adversarial Winograd Schema Challenge at Scale; Bowman et al., 2015 A large annotated corpus for learning natural language inference; Nie et al., 2020 Adversarial NLI: A New Benchmark for Natural Language Understanding |
| | Aug 31 | More on Data Ethics | Bender et al., 2021 On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?; Koch et al., 2021 Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research; D’Ignazio and Klein, 2020 Data Feminism (Intro and Chapter 1); Strubell et al., 2019 Energy and Policy Considerations for Deep Learning in NLP |
| II. Bias and Mitigation | Sep 7 | Biases: An Overview | Geirhos et al., 2020 Shortcut Learning in Deep Neural Networks; Hort et al., 2022 Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey; Feder et al., 2021 Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond |
| | Sep 12 | Spurious Biases I | Torralba & Efros, 2011 Unbiased Look at Dataset Bias; Geva et al., 2019 Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets; McCoy et al., 2019 Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in NLI |
| | Sep 14 | Spurious Biases II | Gardner et al., 2021 Competency Problems: On Finding and Removing Artifacts in Language Data; Eisenstein, 2022 Informativeness and Invariance: Two Perspectives on Spurious Correlations in Natural Language |
| | Sep 19 | Data-Centric Bias Mitigation | Srivastava et al., 2020 Robustness to spurious correlations via human annotations; Dixon et al., 2018 Measuring and mitigating unintended bias in text classification; Gardner et al., 2019 On Making Reading Comprehension More Comprehensive |
| | Sep 21 | Data Augmentation for Bias Mitigation | Ng et al., 2020 SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness; Kaushik et al., 2019 Learning the Difference that Makes a Difference with Counterfactually-Augmented Data |
| III. Estimating Data Quality | Sep 26 | Estimates of Data Quality | Le Bras et al., 2020 Adversarial Filters of Dataset Biases; Swayamdipta et al., 2020 Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics; Liu et al., 2022 WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation; Ethayarajh et al., 2022 Understanding Dataset Difficulty with V-Usable Information |
| | Sep 28 | Aggregate vs. Point-wise Estimates of Data Quality | Ghorbani & Zou, 2019 Data Shapley: Equitable Valuation of Data for Machine Learning; Perez et al., 2021 Rissanen Data Analysis: Examining Dataset Characteristics via Description Length; Mindermann et al., 2022 Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt |
| | Oct 3 | Anomalies, Outliers, and Out-of-Distribution Examples | Hendrycks et al., 2018 Deep Anomaly Detection with Outlier Exposure; Ren et al., 2019 Likelihood Ratios for Out-of-Distribution Detection |
| | Oct 5 | Disagreements, Subjectivity and Ambiguity I | Pavlick et al., 2019 Inherent Disagreements in Human Textual Inferences; Röttger et al., 2022 Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks; Denton et al., 2021 Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation |
| | Oct 12 | Disagreements, Subjectivity and Ambiguity II | Miceli et al., 2020 Between Subjectivity and Imposition: Power Dynamics in Data Annotation for Computer Vision; Davani et al., 2021 Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations |
| IV. Data for Accountability | Oct 17 | Creating Evaluation Sets | Recht et al., 2019 Do ImageNet Classifiers Generalize to ImageNet?; Card et al., 2020 With Little Power Comes Great Responsibility; Clark et al., 2021 All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text; Ethayarajh & Jurafsky, 2020 Utility is in the eye of the user: a critique of NLP leaderboards |
| | Oct 19 | Counterfactual Evaluation | Gardner et al., 2020 Evaluating Models’ Local Decision Boundaries via Contrast Sets; Ross et al., 2021 Tailor: Generating and Perturbing Text with Semantic Controls |
| | Oct 24 | Adversarial Evaluation | Jia and Liang, 2017 Adversarial Examples for Evaluating Reading Comprehension Systems; Kiela et al., 2021 Dynabench: Rethinking Benchmarking in NLP; Li and Michael, 2022 Overconfidence in the Face of Ambiguity with Adversarial Data |
| | Oct 26 | Contextualizing Decisions | Gebru et al., 2018 Datasheets for Datasets; Bender and Friedman, 2018 Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science |
| V. Beyond Labeled Datasets | Oct 31 | Unlabeled Data | Dodge et al., 2021 Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus; Lee et al., 2022 Deduplicating Training Data Makes Language Models Better; Gururangan et al., 2022 Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection |
| | Nov 2 | Prompts as Data? | Wei et al., 2022 Chain of Thought Prompting Elicits Reasoning in Large Language Models |
| | Nov 7 | Data Privacy and Security | Amodei et al., 2016 Concrete Problems in AI Safety; Carlini et al., 2020 Extracting Training Data from Large Language Models; Henderson et al., 2022 Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset |
| | Nov 9 | Towards Better Data Citizenship | Jo & Gebru, 2019 Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning; Hutchinson et al., 2021 Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure |