Machine Unlearning
Theoretical foundations, benchmarks, and methods for unlearning in language models.
| Year | Title | Method type | Citations* |
|---|---|---|---|
| 2015 | Towards Making Systems Forget with Machine Unlearning | Exact retraining | 26 |
| 2019 | Making AI Forget You: Data Deletion in Machine Learning | Exact retraining | 1 |
| 2021 | Machine Unlearning via SISA | Exact retraining | 2 |
| 2021 | Descent-to-Delete: Gradient-Based Methods for Machine Unlearning | Exact retraining | 0 |
| 2022 | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | Gradient ascent | 24 |
| 2022 | Editing Models with Task Arithmetic | Masking / weight editing | 0 |
| 2023 | Unlearning Bias in Language Models by Partitioning Gradients (PCGU) | Masking / weight editing | 0 |
| 2024 | Right to be Forgotten in the Era of Large Language Models | Evaluation / analysis | 0 |
| 2023 | Can Sensitive Information Be Deleted From LLMs? | Evaluation / analysis | 9 |
| 2023 | Who’s Harry Potter? Approximate Unlearning in LLMs | Fine-tuning | 22 |
| 2023 | In-Context Unlearning: Language Models as Few Shot Unlearners | Inference-time | 10 |
| 2023 | Large Language Model Unlearning | Fine-tuning | 19 |
| 2024 | KL Minimization for Machine Unlearning in LLMs | Gradient ascent | 0 |
| 2024 | TOFU: A Task of Fictitious Unlearning for LLMs | Evaluation / analysis | 21 |
| 2024 | Rethinking Machine Unlearning for Large Language Models | Evaluation / analysis | 3 |
| 2024 | Eight Methods to Evaluate Robust Unlearning in LLMs | Evaluation / analysis | 11 |
| 2024 | The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | Representation perturbation | 17 |
| 2024 | Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning | Preference optimization | 18 |
| 2024 | RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models | Evaluation / analysis | 3 |
| 2024 | Can Machine Unlearning Reduce Social Bias in Language Models? | Evaluation / analysis | 2 |
| 2024 | MUSE: Machine Unlearning Six-Way Evaluation | Evaluation / analysis | 0 |
| 2024 | An Adversarial Perspective on Machine Unlearning for AI Safety | Evaluation / analysis | 3 |
| 2024 | Gradient Routing: Masking Gradients to Localize Computation in Neural Networks | Masking / weight editing | 1 |
| 2024 | Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning | Preference optimization | 5 |
| 2024 | LLM Unlearning via Loss Adjustment with Only Forget Data | Gradient ascent | 0 |
| 2024 | Catastrophic Failure of LLM Unlearning via Quantization | Evaluation / analysis | 1 |
| 2024 | Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods | Evaluation / analysis | 4 |
| 2025 | Feature-Selective Representation Misdirection for Machine Unlearning | Representation perturbation | 0 |
| 2025 | Improving LLM Unlearning Robustness via Random Perturbations | Representation perturbation | 1 |
| 2025 | Towards LLM Unlearning Resilient to Relearning Attacks | Gradient ascent | 2 |
| 2025 | GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection | Inference-time | 0 |
| 2025 | Precise In-Parameter Concept Erasure in Large Language Models | Masking / weight editing | 0 |
| 2025 | OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking | Evaluation / analysis | 1 |
| 2025 | A Survey on Unlearning in Large Language Models | Evaluation / analysis | 0 |
| 2026 | Beyond Forgetting: Machine Unlearning Elicits Controllable Side Behaviors and Capabilities | Evaluation / analysis | 0 |
| 2026 | Per-parameter Task Arithmetic for Unlearning in Large Language Models | Masking / weight editing | 1 |
*Citation counts include only citations among papers in this repository.
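Two of the weight-editing papers above (Editing Models with Task Arithmetic, Per-parameter Task Arithmetic) rest on the same operation: fine-tune on the forget data, then subtract the resulting task vector from the base weights. A minimal sketch of that operation on toy NumPy parameter vectors rather than real checkpoints (the function name and the `alpha` default are illustrative, not from any of the papers):

```python
import numpy as np

def unlearn_by_task_arithmetic(theta_base, theta_forget_ft, alpha=1.0):
    """Task-arithmetic unlearning sketch: subtract the 'forget' task vector.

    theta_base      : base model parameters (flat vector here for simplicity)
    theta_forget_ft : parameters after fine-tuning on the forget data
    alpha           : scaling coefficient for the negated task vector
    """
    task_vector = theta_forget_ft - theta_base   # direction learned from the forget data
    return theta_base - alpha * task_vector      # move the base weights away from it

# Toy example with 3-dimensional "parameters"
base = np.array([0.5, -1.0, 2.0])
finetuned = np.array([0.8, -1.2, 2.5])           # pretend: after training on the forget set
unlearned = unlearn_by_task_arithmetic(base, finetuned, alpha=1.0)
# unlearned = base - (finetuned - base) = [0.2, -0.8, 1.5]
```

With `alpha=1.0` this mirrors the fine-tuning step exactly; the per-parameter variant replaces the scalar `alpha` with a coefficient per weight.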
Statistics
| Method type | No. of papers |
|---|---|
| Evaluation / analysis | 14 |
| Gradient ascent | 4 |
| Fine-tuning | 2 |
| Exact retraining | 4 |
| Masking / weight editing | 5 |
| Preference optimization | 2 |
| Representation perturbation | 3 |
| Inference-time | 2 |
| Total | 36 |
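Gradient ascent, the largest active-method family after weight editing, inverts the usual training objective: ascend the loss on the forget set while descending on a retain set to preserve general capability. A toy sketch on a NumPy logistic-regression model standing in for an LLM (the `lam` trade-off coefficient and all names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unlearning_step(w, X_forget, y_forget, X_retain, y_retain, lr=0.1, lam=1.0):
    """One gradient-ascent unlearning update (illustrative sketch):
    descend the cross-entropy loss on the retain set, ascend it on the forget set.
    `lam` trades off forgetting strength against retention."""
    def grad(X, y):
        p = sigmoid(X @ w)
        return X.T @ (p - y) / len(y)        # gradient of mean cross-entropy w.r.t. w
    g = grad(X_retain, y_retain) - lam * grad(X_forget, y_forget)
    return w - lr * g                        # minus sign: descent on retain, ascent on forget

rng = np.random.default_rng(0)
X_f, y_f = rng.normal(size=(8, 3)), np.ones(8)                    # forget examples
X_r, y_r = rng.normal(size=(32, 3)), rng.integers(0, 2, 32).astype(float)
w = np.zeros(3)
for _ in range(50):
    w = unlearning_step(w, X_f, y_f, X_r, y_r)
forget_loss = -np.mean(y_f * np.log(sigmoid(X_f @ w) + 1e-12))    # should grow as forgetting proceeds
```

The papers in this family differ mainly in how the ascent term is regularized (e.g. KL minimization against the original model) to avoid the catastrophic collapse that unconstrained ascent causes.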
Dataset frequency in unlearning papers
Number of papers (out of 35) that use each dataset.
| Dataset | Papers using it |
|---|---|
| TOFU | 18 |
| WMDP | 16 |
| Harry Potter corpus | 15 |
| MMLU | 10 |
| TruthfulQA | 6 |
| MUSE | 5 |
| Synthetic PII / private texts | 3 |
| RWKU | 3 |
| MNIST | 3 |
| The Pile | 2 |
| StereoSet | 2 |
| WinoBias | 2 |
| CrowS-Pairs | 2 |
| Adult (UCI) | 2 |
| HellaSwag | 2 |
| WinoGrande | 2 |
| PKU-SafeRLHF, ToxiGen, RealToxicityPrompts, MT-Bench, GSM8K, ARC, BOLD, ImageNet, Purchase-100, SVHN | 1 each |
Methods that measure general model quality
Among the 21 papers proposing an active method (excluding the 14 evaluation/analysis papers).
| Measures general quality | No. of papers | Papers |
|---|---|---|
| Yes | 16 | Cao, Bourtoule, Jang, PCGU, Harry Potter, LLMU, KL-Min, WMDP, NPO, Simplicity-NPO, Wang, FSM, Improving, Relearning, GUARD, Per-Param |
| No | 5 | Ginart, Neel, In-Context, Gradient Routing, Barez |
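The two preference-optimization entries (NPO and its "Simplicity Prevails" follow-up) replace unbounded gradient ascent with a bounded forget loss, L = (2/β)·E[log(1 + (π_θ/π_ref)^β)], which saturates instead of diverging as the model forgets. A sketch of that loss computed from per-example sequence log-probabilities (function name and the β default are illustrative):

```python
import numpy as np

def npo_loss(logp_theta, logp_ref, beta=0.1):
    """NPO forget loss sketch: (2/beta) * mean(log(1 + exp(beta * log-ratio))).

    logp_theta : per-example log-probs of forget sequences under the current model
    logp_ref   : same under the frozen reference model
    Minimizing this pushes logp_theta below logp_ref, but with bounded gradients.
    """
    log_ratio = logp_theta - logp_ref                       # log(pi_theta / pi_ref)
    return np.mean(2.0 / beta * np.log1p(np.exp(beta * log_ratio)))

# Lower model log-prob on forget data -> lower loss
high = npo_loss(np.array([-1.0]), np.array([-2.0]))         # model still likes forget data
low = npo_loss(np.array([-5.0]), np.array([-2.0]))          # model has suppressed it
```

As β → 0 the loss recovers plain gradient ascent on the forget data; larger β caps how strongly already-forgotten examples keep pulling on the weights.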