# PyTorch code for Vision Transformers training with the self-supervised learning method DINO

PyTorch implementation and pretrained models for DINO. For details, see *Emerging Properties in Self-Supervised Vision Transformers* (arXiv:2104.14294). A blog post and a video walkthrough by Yannic Kilcher are also available.

## Pretrained models

You can choose to download only the weights of the pretrained backbone used for downstream tasks, or the full checkpoint, which contains backbone and projection head weights for both student and teacher networks. We also provide the training and evaluation logs.

The pretrained models are available on PyTorch Hub:

```python
import torch

deits16 = torch.hub.load('facebookresearch/dino:main', 'dino_deits16')
deits8 = torch.hub.load('facebookresearch/dino:main', 'dino_deits8')
vitb16 = torch.hub.load('facebookresearch/dino:main', 'dino_vitb16')
vitb8 = torch.hub.load('facebookresearch/dino:main', 'dino_vitb8')
resnet50 = torch.hub.load('facebookresearch/dino:main', 'dino_resnet50')
```

## Training

### Documentation

Please install PyTorch and download the ImageNet dataset. This codebase has been developed with Python 3.6, PyTorch 1.7.1, CUDA 11.0 and torchvision 0.8.2. The exact arguments to reproduce the models presented in our paper can be found in the `args` column of the pretrained models section. For a glimpse at the full documentation of DINO training, please run:

```
python main_dino.py --help
```

### Vanilla DINO training :sauropod:

Run DINO with the DeiT-small network on a single node with 8 GPUs for 100 epochs with the following command. Training time is 1.75 days, and the resulting checkpoint should reach 69.3% on k-NN eval and ~73.8% on linear eval. We provide training and linear evaluation logs for this run to help reproducibility.

```
python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch deit_small --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
```

### Multi-node training

We use Slurm and submitit (`pip install submitit`).
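For orientation, the training run above optimizes DINO's self-distillation objective: a cross-entropy between a centered, sharpened teacher distribution and the student distribution. A minimal sketch of that loss (one view pair only; simplified, with illustrative names rather than the repo's exact code):

```python
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centered, sharpened teacher distribution
    and the student distribution (single view pair, simplified)."""
    # Teacher: subtract the center, sharpen with a low temperature, no gradients.
    t = F.softmax((teacher_out - center) / teacher_temp, dim=-1).detach()
    # Student: log-softmax at a higher temperature.
    s = F.log_softmax(student_out / student_temp, dim=-1)
    return -(t * s).sum(dim=-1).mean()

# Toy usage: batch of 4 samples, 16-dim projection-head outputs.
student_out = torch.randn(4, 16)
teacher_out = torch.randn(4, 16)
center = teacher_out.mean(dim=0)  # in training this is an EMA over batches
loss = dino_loss(student_out, teacher_out, center)
print(loss.item())  # a finite, non-negative scalar
```

In the actual training loop the loss is averaged over all global/local view pairs, and the center is updated as an exponential moving average of teacher outputs, which is what the `--teacher_temp` and warmup arguments below control.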
To train on 2 nodes with 8 GPUs each (a total of 16 GPUs):

```
python run_with_submitit.py --nodes 2 --ngpus 8 --arch deit_small --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
```

DINO with the ViT-base network:

```
python run_with_submitit.py --nodes 2 --ngpus 8 --use_volta32 --arch vit_base --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
```

### Boosting DINO performance :t-rex:

You can improve the performance of the vanilla run by:
- training for more epochs: `--epochs 300`,
- increasing the teacher temperature: `--teacher_temp 0.07 --warmup_teacher_temp_epochs 30`,
- removing last layer normalization (only safe with `--arch deit_small`): `--norm_last_layer false`.

Full command:

```
python run_with_submitit.py --arch deit_small --epochs 300 --teacher_temp 0.07 --warmup_teacher_temp_epochs 30 --norm_last_layer false --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
```

The resulting pretrained model should reach 73.3% on k-NN eval and ~76.1% on linear eval. Training time is 2.6 days with 16 GPUs. We provide training and linear evaluation logs for this run to help reproducibility.

### ResNet-50 and other convnets trainings

This code also works for training DINO on convolutional networks, like ResNet-50. We highly recommend adapting some optimization arguments in this case. For example, the following is a command to train DINO on ResNet-50 on a single node with 8 GPUs for 100 epochs. We provide training logs for this run.

```
python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch resnet50 --optimizer sgd --weight_decay 1e-4 --weight_decay_end 1e-4 --global_crops_scale 0.14 1 --local_crops_scale 0.05 0.14 --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
```

## Self-attention visualization

You can look at the self-attention of the [CLS] token on the different heads of the last layer by running:

```
python visualize_attention.py
```

Also, check out this colab for video inference.
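The visualization script renders, for each head, how strongly the [CLS] token attends to each image patch. The core step, taking the [CLS] row of an attention matrix and folding it back into the patch grid, can be sketched as follows (a toy single attention layer with illustrative names and sizes, not the repo's code):

```python
import torch

def cls_attention_maps(tokens, wq, wk, num_heads, grid):
    """Per-head attention of the [CLS] token (position 0) over patch tokens,
    reshaped into a (num_heads, grid, grid) map. Toy single-layer sketch."""
    b, n, d = tokens.shape
    hd = d // num_heads
    # Project to queries/keys and split into heads: (b, heads, n, head_dim).
    q = (tokens @ wq).reshape(b, n, num_heads, hd).transpose(1, 2)
    k = (tokens @ wk).reshape(b, n, num_heads, hd).transpose(1, 2)
    attn = ((q @ k.transpose(-2, -1)) / hd ** 0.5).softmax(dim=-1)
    # Row 0 is what [CLS] attends to; drop its self-attention entry
    # and fold the remaining patch weights back into a 2D grid.
    cls_row = attn[0, :, 0, 1:]
    return cls_row.reshape(num_heads, grid, grid)

# Toy sizes: a 28x28 patch grid plus one [CLS] token, 4 heads, 64-dim tokens.
grid, heads, dim = 28, 4, 64
tokens = torch.randn(1, 1 + grid * grid, dim)
wq, wk = torch.randn(dim, dim), torch.randn(dim, dim)
maps = cls_attention_maps(tokens, wq, wk, heads, grid)
print(maps.shape)  # torch.Size([4, 28, 28])
```

Each map can then be upsampled to image resolution and overlaid on the input, which is essentially what the provided script does with the last layer of the trained ViT.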
## Evaluation: k-NN classification on ImageNet

To evaluate a simple k-NN classifier with a single GPU on a pre-trained model, run:

```
python -m torch.distributed.launch --nproc_per_node=1 eval_knn.py --data_path /path/to/imagenet
```

If you choose not to specify `--pretrained_weights`, then DINO reference weights are used by default. If you want instead to evaluate checkpoints from a run of your own, you can run, for example:

```
python -m torch.distributed.launch --nproc_per_node=1 eval_knn.py --pretrained_weights /path/to/checkpoint.pth --checkpoint_key teacher --data_path /path/to/imagenet
```

## Evaluation: Linear classification on ImageNet

To train a supervised linear classifier on frozen weights on a single node with 8 GPUs, run:

```
python -m torch.distributed.launch --nproc_per_node=8 eval_linear.py --data_path /path/to/imagenet
```

## License

See the LICENSE file for more details.

## Citation

If you find this repository useful, please consider giving a star :star: and citation :t-rex::

```
@article{caron2021emerging,
  title={Emerging Properties in Self-Supervised Vision Transformers},
  author={Caron, Mathilde and Touvron, Hugo and Misra, Ishan and J\'egou, Herv\'e and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand},
  journal={arXiv preprint arXiv:2104.14294},
  year={2021}
}
```

To restore the repository, download the bundle and run:

```
git clone facebookresearch-dino_-_2021-05-03_09-11-30.bundle
```

Uploader: facebookresearch
Upload date: 2021-05-03
