| 2021 | An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models | Evaluation / analysis | View |
| 2021 | Sustainable Modular Debiasing of Language Models | Adapters / PEFT | View |
| 2021 | FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders | Fine-tuning / data augmentation | View |
| 2022 | Debiasing Pre-Trained Language Models via Efficient Fine-Tuning | Fine-tuning / data augmentation | View |
| 2022 | MABEL: Attenuating Gender Bias using Textual Entailment Data | Fine-tuning / data augmentation | View |
| 2022 | Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned | Evaluation / analysis | View |
| 2022 | Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback | Fine-tuning / data augmentation | View |
| 2023 | Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions | Fine-tuning / data augmentation | View |
| 2023 | An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models | Adapters / PEFT | View |
| 2023 | Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination | Weight / neuron editing | View |
| 2023 | Causal-Debias: Unifying Debiasing in Pretrained Language Models via Causal Invariant Learning | Causal / invariant | View |
| 2023 | D-CALM: A Dynamic Clustering-based Active Learning Approach for Mitigating Bias | Fine-tuning / data augmentation | View |
| 2024 | Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes | Inference time | View |
| 2024 | ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs | Fine-tuning / data augmentation | View |
| 2025 | BiasEdit: Debiasing Stereotyped Language Models via Model Editing | Weight / neuron editing | View |
| 2025 | Debiasing the Fine-Grained Classification Task in LLMs with Bias-Aware PEFT | Adapters / PEFT | View |
| 2025 | BiasFilter: An Inference-Time Debiasing Framework for Large Language Models | Inference time | View |
| 2025 | FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering | Inference time | View |
| 2025 | LLM Bias Detection and Mitigation through the Lens of Desired Distributions | Fine-tuning / data augmentation | View |
| 2025 | BiasGym: Fantastic LLM Biases and How to Find (and Remove) Them | Evaluation / analysis | View |
| 2025 | Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective | Evaluation / analysis | View |
| 2025 | Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias | Evaluation / analysis | View |
| 2026 | KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement | Weight / neuron editing | View |
| 2026 | No Free Lunch in Language Model Bias Mitigation? | Evaluation / analysis | View |