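I was running diffusion code on a 4090 server in our lab when the error below came up.
Partway through the log it said "RuntimeError: CUDA error: no kernel image is available for execution on the device", so I wondered: kernel image? Is something wrong with the dataset? In the end the cause was that the wrong torch build was installed: as the UserWarning in the log shows, the installed PyTorch had no kernels compiled for the 4090's compute capability (sm_89).
※ The error below can be fixed the same way.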
(RuntimeError: The NVIDIA driver on your system is too old (found version 11060). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.)
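The number 11060 in that message appears to be the CUDA version that the driver reports, packed as 1000 * major + 10 * minor, which is the convention CUDA's version queries use. Assuming that encoding, a tiny helper (the name decode_cuda_version is made up for this post) turns it back into a readable version:

```python
def decode_cuda_version(code):
    """Split an integer CUDA version into (major, minor).

    Assumes the 1000 * major + 10 * minor encoding used by CUDA version queries.
    """
    major, rest = divmod(code, 1000)
    return major, rest // 10


if __name__ == "__main__":
    major, minor = decode_cuda_version(11060)
    print(f"driver reports CUDA {major}.{minor}")
```

Under that assumption, 11060 reads as CUDA 11.6, so the message suggests the installed torch build expects a newer CUDA version than this driver provides.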
Error output
02/02/2024 12:01:36 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: no
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'CLIPTokenizer'.
The class this function is called from is 'MultiTokenCLIPTokenizer'.
{'variance_type'} was not found in config. Values will be initialized to default values.
{'upcast_attention', 'dual_cross_attention', 'use_linear_projection', 'num_class_embeds', 'only_cross_attention'} was not found in config. Values will be initialized to default values.
number of placeholder tokens are: 3
/home/hyunsoo/.local/lib/python3.8/site-packages/torch/cuda/__init__.py:143: UserWarning:
NVIDIA GeForce RTX 4090 with CUDA capability sm_89 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 4090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
...
RuntimeError: CUDA error: no kernel image is available for execution on the device
...
CalledProcessError: Command '['/usr/bin/python3', 'main.py',
'--concept_image_dir=./examples/concept_image', '--content_image_dir=./examples/content_image',
'--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--output_image_path=./outputs',
'--initializer_token=girl', '--max_train_steps=500', '--concept_embedding_num=3',
'--cross_attention_injection_ratio=0.2', '--self_attention_injection_ratio=0.9', '--use_l1']'
returned non-zero exit status 1.
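The UserWarning in the log is the real clue: the GPU is sm_89, but the installed build only lists sm_37 sm_50 sm_60 sm_70. Below is a rough sketch of how the same comparison could be made by hand. The helper names are made up for this post, and the matching rule is a simplification: it assumes a compiled target works on a GPU of the same major architecture with an equal or higher minor version, and it ignores PTX (compute_XX) entries. torch.cuda.get_device_capability() and torch.cuda.get_arch_list() are the PyTorch calls used to read the real values.

```python
import re


def parse_sm(arch):
    """Turn 'sm_89' into 89; return None for anything that is not sm_<digits>."""
    match = re.fullmatch(r"sm_(\d+)", arch)
    return int(match.group(1)) if match else None


def is_gpu_supported(capability, arch_list):
    """Simplified check: same major architecture, build minor <= GPU minor."""
    cap_major, cap_minor = capability
    for arch in arch_list:
        sm = parse_sm(arch)
        if sm is None:
            continue  # skip compute_XX (PTX) and other entries
        major, minor = divmod(sm, 10)
        if major == cap_major and minor <= cap_minor:
            return True
    return False


def check_installed_torch():
    """Print what the installed torch build targets versus what GPU 0 needs."""
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to this torch build")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
    print("GPU capability:", capability, "build targets:", arch_list)
    print("supported:", is_gpu_supported(capability, arch_list))


if __name__ == "__main__":
    check_installed_torch()
```

With the values from the log, is_gpu_supported((8, 9), ["sm_37", "sm_50", "sm_60", "sm_70"]) returns False, which matches the warning.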
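Solution
Reinstalling torch with the command below fixes it.
Do not install with a bare version pin such as pip install torch==1.4.0; install through the PyTorch wheel index URL instead, so that you get a build compiled for a matching CUDA version.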
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
(If you are using a 3090 GPU, use a command like the ones below instead.)
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
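After reinstalling, it is worth confirming that a kernel really launches before starting the full training run. This is a minimal sketch, not part of the original fix; the function name cuda_smoke_test is made up here, and it only reports a status string rather than raising.

```python
def cuda_smoke_test():
    """Return a short status string saying whether a simple CUDA kernel can run."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "torch is installed, but CUDA is not available"
    try:
        x = torch.ones(4, device="cuda")
        # .item() copies the result to the host, so a failed launch surfaces here.
        total = (x + x).sum().item()
    except RuntimeError as err:
        return f"CUDA kernel launch failed: {err}"
    return (
        f"OK: torch {torch.__version__}, "
        f"CUDA build {torch.version.cuda}, result {total}"
    )


if __name__ == "__main__":
    print(cuda_smoke_test())
```

If the build still does not match the GPU, the "no kernel image is available" message should show up in the returned string instead of deep inside the training log.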