Keras environment setup - @//メモ

Target environment
What not to do
- Do not install CUDA, cuDNN, and TensorFlow directly on the OS
- Do not casually install NVIDIA drivers
Installing Docker and nvidia-container-toolkit
Choosing the base image
Creating the VSCode DevContainer
Entering the VSCode DevContainer
Trying out the VSCode DevContainer
Trying out Jupyter Notebook
Is the "Unable to register cuDNN" message OK?

Target environment

A laptop with an NVIDIA T550 GPU

Ubuntu 22 running on WSL2 under Windows 11 (cf. Win11 WSL2)

atsushi@AT-GPU-NOTE:~/projects$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
atsushi@AT-GPU-NOTE:~/projects$ nvidia-smi
Wed Mar  6 22:57:25 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.133                Driver Version: 537.79       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA T550 Laptop GPU         On  | 00000000:03:00.0 Off |                  N/A |
| N/A    0C    P5               5W /  15W |      0MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
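The CUDA version that matters for picking an image is the one in the nvidia-smi header. As a minimal sketch, the header line can be parsed like this; the function name parse_smi_header is made up for illustration, and the sample string is copied from the output above.

```python
import re
from typing import Optional, Tuple

# Matches the "Driver Version" and "CUDA Version" fields of the nvidia-smi header.
HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_smi_header(text: str) -> Optional[Tuple[str, str]]:
    """Return (driver_version, cuda_version), or None if the header is not found."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Header line copied from the nvidia-smi output above.
SAMPLE = "| NVIDIA-SMI 535.133                Driver Version: 537.79       CUDA Version: 12.2     |"

print(parse_smi_header(SAMPLE))
```

In real use the text would come from running nvidia-smi and capturing its standard output.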

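What not to do

I feel like I repeat what is written in this section every few months:
- forget what I did and try again → it doesn't work → reach the same conclusion → forget how I got there and try again → ...
So this time I am writing down the results of the trial and error.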

Do not install CUDA, cuDNN, and TensorFlow directly on the OS

Setting the environment up properly by hand is not realistic.
This is what you would have to do:
- Look up which CUDA and cuDNN versions the TensorFlow release you want to install targets
  - https://github.com/tensorflow/tensorflow/blob/v2.16.1/RELEASE.md
- Install CUDA
  - https://developer.nvidia.com/cuda-toolkit-archive
- Install cuDNN
  - https://docs.nvidia.com/deeplearning/cudnn/installation/linux.html#installing-on-linux
- Install TensorRT
  - https://docs.nvidia.com/tensorrt/index.html
- Only then can you install TensorFlow
  - pip install tensorflow[and-cuda]
One small mistake and TensorFlow will not run.
To upgrade, you have to wipe the OS back to a clean state.
[Conclusion] Do not install CUDA and cuDNN on the OS yourself; use the TensorFlow images published by Google.

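Do not casually install NVIDIA drivers

OK, so let's use the tensorflow-gpu container images published by Google.
While I'm at it I want the latest image, so let's update the Windows graphics driver (and with it the supported CUDA version) to the latest.
- https://www.nvidia.com/download/index.aspx
- The T550 belongs to the Nvidia RTX/Quadro - Nvidia RTX Series (Notebooks) family
→ After installing the latest NVIDIA graphics driver, the virtual GPU was no longer usable from WSL.
- For example, nvidia-smi crashed with a segmentation fault, and I found no fix.
Fortunately, the Windows driver can be rolled back.
[Conclusion] Use a Docker image that works with the current environment rather than the latest one.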
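A quick way to notice this kind of breakage after a driver change is to run nvidia-smi from a script and look at how it exited. The following is only a sketch: the helper name probe and its return labels are invented here, and it relies on the POSIX convention that Python reports a process killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys
from typing import Sequence


def probe(cmd: Sequence[str]) -> str:
    """Run a command and classify the result as ok / segfault / error / missing."""
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        return "missing"  # the executable is not on PATH
    if result.returncode == 0:
        return "ok"
    if result.returncode == -signal.SIGSEGV:
        return "segfault"  # killed by SIGSEGV, as seen with the newer driver
    return "error"


if __name__ == "__main__":
    # Inside WSL2 this should print "ok" when the driver and WSL get along.
    print(probe(["nvidia-smi"]))
```

If this prints "segfault" right after a driver update, rolling the Windows driver back is the fix that worked here.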

Installing Docker and nvidia-container-toolkit

cf. Win11 WSL2/Docker and Nvidia GPU

Choosing the base image

Installation manual: https://www.tensorflow.org/install/docker
TensorFlow / CUDA version compatibility table
- https://www.tensorflow.org/install/source#gpu
- According to nvidia-smi, the CUDA version of this environment is 12.2, so TensorFlow 2.15.0 looks like the right choice
Docker image
- https://hub.docker.com/r/tensorflow/tensorflow
- Image tag format: tensorflow/tensorflow:${version}[-gpu][-jupyter]
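The tag format above can be spelled out as a small helper. This is just an illustration of the naming scheme; image_tag is a made-up function, not part of any Docker or TensorFlow tooling.

```python
def image_tag(version: str, gpu: bool = False, jupyter: bool = False) -> str:
    """Build a tag following tensorflow/tensorflow:${version}[-gpu][-jupyter]."""
    tag = version
    if gpu:
        tag += "-gpu"
    if jupyter:
        tag += "-jupyter"
    return f"tensorflow/tensorflow:{tag}"


print(image_tag("2.14.0", gpu=True, jupyter=True))
```

The suffix order matters: -gpu comes before -jupyter.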

Trial run

2.15.0 looked like it should work, but it fails with an error demanding CUDA Version >=12.3.

2.14.0 works, so I use it as the base image.

$ docker run --runtime=nvidia --gpus all -it --rm tensorflow/tensorflow:2.14.0-gpu-jupyter python -c \
"import tensorflow as tf; \
print(tf.config.list_physical_devices('GPU')); \
physical_devices = tf.config.list_physical_devices('GPU'); \
print(tf.config.experimental.get_device_details(physical_devices[0]))"

2024-03-09 06:50:06.625119: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-09 06:50:06.648882: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-09 06:50:06.648962: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-09 06:50:06.648992: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-09 06:50:06.653614: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-09 06:50:08.600246: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:03:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-09 06:50:08.608037: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:03:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-09 06:50:08.608193: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:03:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2024-03-09 06:50:08.608538: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:03:00.0/numa_node
Your kernel may have been built without NUMA support.
{'compute_capability': (7, 5), 'device_name': 'NVIDIA T550 Laptop GPU'}

Phew, there is always something when you actually try it.
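The selection made in this section boils down to "take the newest image whose required CUDA version is not newer than what the driver reports". A rough sketch of that rule follows. The table is an assumption: the 12.3 entry for 2.15.0 is what the error message reported, while the 11.8 entry for 2.14.0 is taken from the TensorFlow build table and was not verified against the image itself.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, ...]

# Minimum driver-side CUDA version per image (assumed values, see the text above).
REQUIRED_CUDA: Dict[str, Version] = {
    "2.15.0": (12, 3),  # from the "CUDA Version >=12.3" error
    "2.14.0": (11, 8),  # assumption based on the TensorFlow build table
}


def parse_version(text: str) -> Version:
    """Turn "12.2" into (12, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def newest_usable(driver_cuda: str,
                  table: Dict[str, Version] = REQUIRED_CUDA) -> Optional[str]:
    """Return the newest TensorFlow version whose CUDA requirement the driver meets."""
    driver = parse_version(driver_cuda)
    usable = [tf for tf, needed in table.items() if needed <= driver]
    return max(usable, key=parse_version) if usable else None


print(newest_usable("12.2"))
```

With the CUDA 12.2 reported by nvidia-smi in this environment, the rule lands on 2.14.0, which matches the result of the trial run.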

Creating the VSCode DevContainer

A mechanism where VSCode logs in to a Docker container and you develop inside it.
On WSL2/Ubuntu 22, place the Dockerfile and devcontainer.json in a directory layout like this one:
https://github.com/kagyuu/KerasExam

Dockerfile

https://github.com/kagyuu/KerasExam/blob/main/.devcontainer/Dockerfile

FROM tensorflow/tensorflow:2.14.0-gpu-jupyter

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Asia/Tokyo

# update the ubuntu os image and install mandatory libs.
# CAUTION: Don't upgrade packages, especially the CUDA and CuDNN.
RUN apt update &&  \
    apt install -y \
        sudo wget curl apt-utils locales bash-completion fonts-noto-cjk graphviz && \
    apt clean && \
    rm -rf /var/lib/apt/lists/*

# install python library
RUN /usr/local/bin/pip3 install --upgrade pip && \
    /usr/local/bin/pip3 install \
    autopep8 black yapf bandit flake8 mypy pycodestyle pydocstyle pylint \
    jupyterlab-language-pack-ja-JP keras-tqdm pydot pillow pandas bokeh matplotlib && \
    /usr/local/bin/pip3 cache purge

# create vscode user
ARG USERNAME=vscode
ARG GROUPNAME=vscode
ARG USER_UID=1000
ARG USER_GID=$USER_UID

RUN groupadd -g ${USER_GID} ${GROUPNAME} && \
    useradd -m -s /bin/bash -u ${USER_UID} -g ${USER_GID} ${USERNAME} && \
    echo ${USERNAME} ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/${USERNAME} && \
    chmod 0440 /etc/sudoers.d/${USERNAME} && \
    localedef -f UTF-8 -i ja_JP ja_JP.UTF-8

# install poetry for the vscode user
USER vscode
ARG WORKDIR
RUN curl -sSL https://install.python-poetry.org | python3 -
# single quotes so that $PATH is expanded at login time, not at build time
RUN echo 'export PATH=/home/vscode/.local/bin:$PATH' >> /home/vscode/.bashrc
RUN /home/vscode/.local/bin/poetry completions bash | sudo tee /etc/bash_completion.d/poetry.bash-completion > /dev/null

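Google's tensorflow/tensorflow image already contains almost everything needed for TensorFlow development
- Packages included in tensorflow/tensorflow
Following VSCode devcontainer convention, create a vscode user with the same UID/GID as the WSL2 user
poetry is installed just in case, but I probably won't use it, since Docker already isolates the environment. If a library is needed, changing the Dockerfile and rebuilding is the better approach.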
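Whether the vscode user really lines up with the WSL2 user can be checked by comparing IDs. A small sketch, assuming the Dockerfile keeps its default of 1000 for both UID and GID; the names DOCKERFILE_ID and ids_match are invented for this example.

```python
import os

# Default UID/GID used for the vscode user in the Dockerfile above.
DOCKERFILE_ID = 1000


def ids_match(uid: int, gid: int, expected: int = DOCKERFILE_ID) -> bool:
    """True when the host user has the same UID and GID as the container user."""
    return uid == expected and gid == expected


if hasattr(os, "getuid"):  # not available on native Windows Python
    uid, gid = os.getuid(), os.getgid()
    print(f"uid={uid} gid={gid} match={ids_match(uid, gid)}")
```

Run on the WSL2 side, a mismatch would suggest adjusting the USER_UID and USER_GID values before building.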

devcontainer.json

https://github.com/kagyuu/KerasExam/blob/main/.devcontainer/devcontainer.json

{
	"name": "KerasExam",
	"build": {
		"dockerfile": "Dockerfile",
		"context": ".."
	},

	// Configure tool-specific properties.
	"customizations": {
		// Configure properties specific to VS Code.
		"vscode": {
			// Set *default* container specific settings.json values on container create.
			"settings": { 
				"python.defaultInterpreterPath": "/usr/local/bin/python3.11",
				"python.formatting.autopep8Path": "/usr/local/bin/autopep8",
				"python.formatting.blackPath": "/usr/local/bin/black",
				"python.formatting.yapfPath": "/usr/local/bin/yapf"
			},
			
			// Add the IDs of extensions you want installed when the container is created.
			"extensions": [
				"ms-python.python",
				"ms-python.vscode-pylance",
				"ms-azuretools.vscode-docker", // Installed so that VS Code in the dev container stops recommending it.
				"oderwat.indent-rainbow",
				"github.copilot",
				"github.copilot-chat"
			]
		}
	},

	"runArgs" : [
		"--runtime=nvidia",
		"--gpus","all",
		"--add-host=host.docker.internal:host-gateway"
	],

	// Use 'forwardPorts' to make a list of ports inside the container available locally.
	// 8888 : jupyter notebook
	// 6006 : TensorBoard
	"forwardPorts": [6006, 8888],

	// Use 'postCreateCommand' to run commands after the container is created.
	// "postCreateCommand": "pip3 install --user -r requirements.txt",

	// Comment out to connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
	"remoteUser": "vscode"
}

Docker run settings

The key to using the GPU is the set of arguments passed to docker run:

		"--runtime=nvidia",
		"--gpus","all",
		"--add-host=host.docker.internal:host-gateway"
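Since a missing GPU flag fails silently (TensorFlow just falls back to the CPU), it can be worth checking the runArgs mechanically. The sketch below is one way to do it under some assumptions: devcontainer.json contains // comments, which Python's json module rejects, so a small hand-written comment stripper is used first. Both helper names and the "docs" key in the sample are hypothetical.

```python
import json
from typing import List


def strip_line_comments(text: str) -> str:
    """Remove // comments that sit outside JSON strings."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip everything up to (not including) the end of the line.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def has_gpu_args(run_args: List[str]) -> bool:
    """True when the docker run arguments request GPUs via --gpus."""
    return any(a == "--gpus" or a.startswith("--gpus=") for a in run_args)


# Trimmed-down stand-in for devcontainer.json; "docs" is a hypothetical key.
SAMPLE = """{
    // comment line
    "name": "KerasExam",
    "runArgs": ["--runtime=nvidia", "--gpus", "all"],
    "docs": "https://aka.ms/vscode-remote/containers/non-root" // URL stays intact
}"""

config = json.loads(strip_line_comments(SAMPLE))
print(has_gpu_args(config.get("runArgs", [])))
```

The stripper only handles // comments and does not deal with /* */ blocks or trailing commas, which is enough for the file shown above.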

launch.json

https://github.com/kagyuu/KerasExam/blob/main/.vscode/launch.json

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
    ]
}

Run/debug configurations for VSCode. Nothing is defined for now.

Entering the VSCode DevContainer

Connect to WSL2: [F1] - [WSL: Connect to WSL]
An indicator showing the WSL: Ubuntu connection appears in the status bar
Open the KerasExam folder in WSL2
Enter the DevContainer: [F1] - [Dev Containers: Reopen in Container]
- docker build runs automatically here if needed
Now inside the DevContainer (screenshot: r1.png)

Trying out the VSCode DevContainer

/home/atsushi/projects/KerasExam on WSL2 is mounted as /workspaces/KerasExam

Run a simple TensorFlow sample that lists the GPUs

https://github.com/kagyuu/KerasExam/blob/main/python/list_gpu.py

import tensorflow as tf

# Print the details (name, compute capability) of every GPU TensorFlow can see.
for physical_device in tf.config.list_physical_devices('GPU'):
    print(tf.config.experimental.get_device_details(physical_device))

# CUDA / cuDNN versions this TensorFlow binary was built against.
build_info = tf.sysconfig.get_build_info()
print("cudnn_version", build_info['cudnn_version'])
print("cuda_version", build_info['cuda_version'])

vscode@a976b6555d9c:/workspaces/KerasExam$ python3 list_gpu.py 
2024-03-13 14:57:28.335052: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-13 14:57:29.023909: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-13 14:57:29.023980: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-13 14:57:29.027485: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-13 14:57:29.299870: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-13 14:57:34.610492: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:03:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-13 14:57:34.672764: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:03:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-13 14:57:34.672926: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:03:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-13 14:57:34.673118: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:03:00.0/numa_node
Your kernel may have been built without NUMA support.
{'compute_capability': (7, 5), 'device_name': 'NVIDIA T550 Laptop GPU'}

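Looks good.
By the way, NUMA stands for non-uniform memory access: on machines with several CPU sockets, each CPU has its own local memory, and access time depends on which node the memory belongs to. The messages above are logged at info level (I) and, as they say themselves, only mean that the kernel may have been built without NUMA support.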
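If the info-level noise gets in the way, TensorFlow's C++ logging can be turned down with the TF_CPP_MIN_LOG_LEVEL environment variable (0 shows everything, 1 hides INFO, 2 also hides WARNING, 3 also hides ERROR). A minimal sketch; the wrapper function set_tf_log_level is made up here, and the variable has to be set before tensorflow is imported.

```python
import os


def set_tf_log_level(level: int) -> None:
    """Set TF_CPP_MIN_LOG_LEVEL; call this before importing tensorflow."""
    if level not in (0, 1, 2, 3):
        raise ValueError("level must be 0, 1, 2 or 3")
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(level)


# 1 should be enough to hide the I-level lines such as the NUMA message.
set_tf_log_level(1)

# import tensorflow as tf   # import only after the variable is set
```

Level 1 leaves the E-level "Unable to register ... factory" lines visible, which is probably what you want while still checking the setup.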

Trying out Jupyter Notebook

From the DevContainer terminal, run

 $ jupyter notebook

to start Jupyter Notebook.

That should do it.

Is the "Unable to register cuDNN" message OK?

It seems to show up on WSL2, but I could not find a fix anywhere.
As a test, run the TensorFlow Quick Start tutorial (MNIST).

https://www.tensorflow.org/tutorials/quickstart/beginner

https://github.com/kagyuu/KerasExam/blob/main/python/tutorial.py

import tensorflow as tf

print("TensorFlow version:", tf.__version__)

mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Output

2024-03-16 15:28:23.538492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2080 MB memory:  -> device: 0, name: NVIDIA T550 Laptop GPU, pci bus id: 0000:03:00.0, compute capability: 7.5
Epoch 1/5
2024-03-16 15:28:26.145357: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7ff77c242340 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-03-16 15:28:26.145404: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA T550 Laptop GPU, Compute Capability 7.5
2024-03-16 15:28:26.149143: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-03-16 15:28:26.310207: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8600
2024-03-16 15:28:26.399600: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
1875/1875 [==============================] - 12s 5ms/step - loss: 0.2176 - accuracy: 0.9352
Epoch 2/5
1875/1875 [==============================] - 11s 6ms/step - loss: 0.0967 - accuracy: 0.9702
Epoch 3/5
1875/1875 [==============================] - 11s 6ms/step - loss: 0.0677 - accuracy: 0.9786
Epoch 4/5
1875/1875 [==============================] - 10s 6ms/step - loss: 0.0533 - accuracy: 0.9829
Epoch 5/5
1875/1875 [==============================] - 12s 6ms/step - loss: 0.0431 - accuracy: 0.9864
313/313 [==============================] - 1s 4ms/step - loss: 0.0650 - accuracy: 0.9810
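The 1875/1875 and 313/313 counters in the progress bars are batch counts, not sample counts. As a worked check, assuming the standard MNIST split (60,000 training and 10,000 test images) and the Keras default batch size of 32, since the script does not set batch_size:

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int = 32) -> int:
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


print(steps_per_epoch(60000))  # training set -> 1875
print(steps_per_epoch(10000))  # test set     -> 313
```

So at roughly 6 ms per step, one epoch of 1875 steps takes about 11 seconds, in line with the timings in the log.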

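The log says "Loaded cuDNN version 8600", so it appears to be working
The GPU is being used too
Only about 2 GB of GPU memory is recognized, though...
- Apparently a WSL2 limitation: https://fizzylogic.nl/2023/01/05/how-to-configure-memory-limits-in-wsl2
- Oh well. It's plenty for (human) learning.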
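To put a number on the gap, the "Created device" log line can be compared with the 4096MiB total that nvidia-smi reported earlier. A rough sketch: the regular expression and the function name are my own, the sample line is copied from the log above with the timestamp removed, and MB and MiB are treated as the same unit for this estimate.

```python
import re
from typing import Optional, Tuple

# Sample copied from the training log above (timestamp and source location removed).
LINE = (
    "Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2080 MB memory:  "
    "-> device: 0, name: NVIDIA T550 Laptop GPU, pci bus id: 0000:03:00.0, "
    "compute capability: 7.5"
)

PATTERN = re.compile(r"with (\d+) MB memory:.*?name: ([^,]+),")


def parse_created_device(line: str) -> Optional[Tuple[int, str]]:
    """Return (memory_mb, device_name) from a 'Created device' log line."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


NVIDIA_SMI_TOTAL_MIB = 4096  # total memory shown by nvidia-smi earlier on this page

memory_mb, name = parse_created_device(LINE)
print(f"{name}: {memory_mb} MB usable, {memory_mb / NVIDIA_SMI_TOTAL_MIB:.0%} of the card")
```

About half of the card ends up available to TensorFlow in this setup.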

Deep Learning#Keras


Last-modified: 2024-03-17 (Sun) 00:59:03 (485d)
