Good news for NLP practitioners: HuggingFace open-sources the Transformers toolkit - FlyAI

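Being able to call a variety of language models flexibly has long been on the wish list of NLP researchers. HuggingFace recently open-sourced Transformers 2.0, the latest version of its model library. It lets users fine-tune and apply 8 of today's most popular language model architectures, and it is compatible with both TensorFlow 2.0 and PyTorch.

Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for natural language understanding (NLU) and natural language generation (NLG), with 32+ pretrained models in 100+ languages and interoperability between TensorFlow 2.0 and PyTorch.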

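HuggingFace, a startup focused on natural language processing (NLP), recently shipped a major update to its very popular Transformers library, giving it compatibility with both PyTorch and TensorFlow 2.0. The updated Transformers 2.0 draws on the ease of use of PyTorch and the industrial-grade ecosystem of TensorFlow. With it, scientists and practitioners can more easily choose a different framework for the training, evaluation and production stages of the same language model. So what are the notable features of Transformers 2.0, and what does it improve for NLP researchers and practitioners?

Project: https://github.com/huggingface/transformers

Transformers 2.0 features

As easy to use as pytorch-transformers; as powerful and concise as Keras; high performance on NLU and NLG tasks; a low barrier to entry for educators and practitioners.

SOTA natural language processing for everyone: deep learning researchers; hands-on practitioners; AI/ML/NLP teachers and educators.

Lower compute costs and a smaller carbon footprint: researchers can share trained models instead of always retraining; practitioners can reduce compute time and production costs; 8 architectures with more than 30 pretrained models are provided, some of which support more than 100 languages.

The right framework for every stage of a model's lifetime: train SOTA models in 3 lines of code; deep interoperability between TensorFlow 2.0 and PyTorch models; move a model between TensorFlow 2.0 and PyTorch at will; pick the right framework for training, evaluation and production.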
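As a rough illustration of moving a model between the two frameworks, the sketch below loads a PyTorch checkpoint, saves it to disk, and reloads the same weights as a TensorFlow 2.0 model. It assumes that transformers, PyTorch and TensorFlow 2.0 are all installed and that the checkpoint can be downloaded; the function name `pytorch_to_tensorflow` and the save directory are hypothetical, chosen only for this example.

```python
import os


def pytorch_to_tensorflow(model_name, save_dir):
    """Load a BERT checkpoint in PyTorch and reload the same weights in TensorFlow 2.0.

    Hypothetical helper for illustration; not part of the transformers library.
    """
    # Imported lazily so that defining this function needs neither framework.
    from transformers import BertModel, TFBertModel

    # Make sure the target directory exists before saving into it.
    os.makedirs(save_dir, exist_ok=True)

    # Load the PyTorch model (weights are downloaded on first use).
    pt_model = BertModel.from_pretrained(model_name)

    # Write the configuration and the PyTorch weights to save_dir.
    pt_model.save_pretrained(save_dir)

    # from_pt=True asks the TensorFlow class to convert the PyTorch weights.
    return TFBertModel.from_pretrained(save_dir, from_pt=True)
```

Calling `pytorch_to_tensorflow("bert-base-uncased", "./bert-pt-checkpoint")` would return a TensorFlow model under those assumptions; the directory name is only a placeholder.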

The library currently contains PyTorch and TensorFlow implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:

BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.

GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.

GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.

Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.

XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.

XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.

RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.

DistilBERT (from HuggingFace) released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2.

CTRL (from Salesforce), released together with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.

CamemBERT (from FAIR, Inria, Sorbonne Université) released together with the paper CamemBERT: a Tasty French Language Model by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric Villemonte de la Clergerie, Djame Seddah, and Benoît Sagot.

ALBERT (from Google Research), released together with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.

XLM-RoBERTa (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.

FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.

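Quick start

How do you use the Transformers toolkit? The project provides many code examples.

Fine-tuning models with Python scripts

Sometimes you need to fine-tune a model on a specific dataset, and the Transformers 2.0 project provides many Python files that can be run directly. For example:

run_glue.py: an example of fine-tuning BERT, XLNet and XLM on nine different GLUE tasks (sequence classification);

run_squad.py: an example of fine-tuning BERT, XLNet and XLM on the question answering dataset SQuAD 2.0 (token-level classification);

run_generation.py: conditional language generation with GPT, GPT-2, Transformer-XL and XLNet;

other model-specific example code.

Fine-tuning on GLUE tasks

The following example fine-tunes a model on a GLUE task so that it can be used for sequence classification. The script used is run_glue.py.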

First download the GLUE dataset and install the additional dependencies:

    pip install -r ./examples/requirements.txt

Then run the fine-tuning:

    export GLUE_DIR=/path/to/glue
    export TASK_NAME=MRPC

    python ./examples/run_glue.py \
        --model_type bert \
        --model_name_or_path bert-base-uncased \
        --task_name $TASK_NAME \
        --do_train \
        --do_eval \
        --do_lower_case \
        --data_dir $GLUE_DIR/$TASK_NAME \
        --max_seq_length 128 \
        --per_gpu_eval_batch_size=8 \
        --per_gpu_train_batch_size=8 \
        --learning_rate 2e-5 \
        --num_train_epochs 3.0 \
        --output_dir /tmp/$TASK_NAME/

When running from the command line, you can choose a specific model and the relevant training parameters.
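If the same run needs to be launched for several GLUE tasks, the command above could also be assembled programmatically. The sketch below rebuilds it as an argument list using only the Python standard library. The helper name `build_glue_command` and its `output_root` parameter are hypothetical; the flags and values are copied from the command above.

```python
import shlex


def build_glue_command(glue_dir, task_name, output_root="/tmp"):
    """Return the run_glue.py invocation shown above as a list of arguments.

    Hypothetical helper for illustration; the flags mirror the shell command.
    """
    data_dir = glue_dir.rstrip("/") + "/" + task_name      # $GLUE_DIR/$TASK_NAME
    output_dir = output_root.rstrip("/") + "/" + task_name + "/"
    return [
        "python", "./examples/run_glue.py",
        "--model_type", "bert",
        "--model_name_or_path", "bert-base-uncased",
        "--task_name", task_name,
        "--do_train",
        "--do_eval",
        "--do_lower_case",
        "--data_dir", data_dir,
        "--max_seq_length", "128",
        "--per_gpu_eval_batch_size=8",
        "--per_gpu_train_batch_size=8",
        "--learning_rate", "2e-5",
        "--num_train_epochs", "3.0",
        "--output_dir", output_dir,
    ]


command = build_glue_command("/path/to/glue", "MRPC")

# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the root of the repository; building a list rather than a single string avoids shell quoting problems when paths contain spaces.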

Fine-tuning on the SQuAD dataset

You can also try run_squad.py to fine-tune on the SQuAD dataset. The command below assumes that SQUAD_DIR points to the directory holding the SQuAD v1.1 JSON files:

    python -m torch.distributed.launch --nproc_per_node=8 ./examples/run_squad.py \
        --model_type bert \
        --model_name_or_path bert-large-uncased-whole-word-masking \
        --do_train \
        --do_eval \
        --do_lower_case \
        --train_file $SQUAD_DIR/train-v1.1.json \
        --predict_file $SQUAD_DIR/dev-v1.1.json \
        --learning_rate 3e-5 \
        --num_train_epochs 2 \
        --max_seq_length 384 \
        --doc_stride 128 \
        --output_dir ../models/wwm_uncased_finetuned_squad/ \
        --per_gpu_eval_batch_size=3 \
        --per_gpu_train_batch_size=3

This command fine-tunes the BERT whole-word-masking model on 8 V100 GPUs and brings the model's F1 score on SQuAD above 93.

Text generation with a model

You can also use run_generation.py to have a pretrained language model generate text:

    python ./examples/run_generation.py \
        --model_type=gpt2 \
        --length=20 \
        --model_name_or_path=gpt2

Installation

How do you install such a convenient tool? You need Python 3.5 or later, together with PyTorch 1.0.0 or later or TensorFlow 2.0.0-rc1. Then install with pip:

    pip install transformers

Mobile deployment is coming soon

HuggingFace says on GitHub that it intends to bring these models to mobile devices, and it provides a repo with code that converts the GPT-2 model to a CoreML model for on-device use. Going forward, they plan to push this work further so that users can seamlessly convert large models to CoreML without additional scripts.

Repo: https://github.com/huggingface/swift-coreml-transformers
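Before running pip install transformers, the requirements from the installation section can be checked with a short script. The sketch below uses only the standard library to test the Python version and to see whether PyTorch or TensorFlow can be found. The helper names are hypothetical, and the script only checks that a framework is present; it does not verify the PyTorch 1.0.0 or TensorFlow 2.0.0-rc1 version requirement.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 5)  # minimum Python version stated in the installation section


def python_is_supported(version_info=None):
    """Return True if the given (or the running) Python version is at least 3.5."""
    version = tuple((version_info or sys.version_info)[:2])
    return version >= MIN_PYTHON


def available_backends():
    """Return the names of the deep learning frameworks that can be found.

    Presence only: the installed versions are not inspected.
    """
    candidates = [("PyTorch", "torch"), ("TensorFlow", "tensorflow")]
    return [
        label
        for label, module in candidates
        if importlib.util.find_spec(module) is not None
    ]


print("Python supported:", python_is_supported())
print("Backends found:", ", ".join(available_backends()) or "none")
```

If no backend is reported, install PyTorch or TensorFlow first; the model classes need at least one of them.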