Prepare a heavy workload
- Himeno Benchmark (姫野ベンチ)
- make
$ sudo yum -y install gcc make
$ wget http://accc.riken.jp/wp-content/uploads/2015/07/himenobmt.c.zip
$ unzip himenobmt.c.zip
$ lha x himenobmt.c.lzh
Change MODEL in the Makefile to MODEL = MIDDLE, then run:
$ make
- Try running it
$ ./bmt
mimax = 257 mjmax = 129 mkmax = 129
imax = 256 jmax = 128 kmax =128
cpu : 7.016288 sec.
Loop executed for 200 times
Gosa : 1.245715e-03
MFLOPS measured : 3908.195787
Score based on MMX Pentium 200MHz : 121.109259
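The manual Makefile edit in the make step can also be scripted. The following is a minimal sketch, assuming the Makefile contains a line that starts with `MODEL` followed by `=`, as that step implies. The function name `set_model` is made up for illustration.

```shell
# set_model FILE SIZE: rewrite the MODEL line of FILE in place.
# Assumption: FILE has a line of the form "MODEL = ..." (not checked
# against every himenobmt release). A backup is kept as FILE.bak.
set_model() {
    sed -i.bak "s/^MODEL[[:space:]]*=.*/MODEL = $2/" "$1"
}
```

With this defined, `set_model Makefile MIDDLE && make` would replace the hand edit.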
bsub (submit a job)
- The commands below assume that the bmt binary built above has been copied to /usr/local/bin
$ su lsfadmin
$ bsub -e /tmp/err.txt -o /tmp/std.txt /usr/local/bin/bmt
Job <103> is submitted to default queue <normal>.
- options
-e err_file | File to which standard error is written |
-o out_file | File to which standard output is written |
-q queue | Queue to run the job in |
-m host | Execution host |
- Result: /tmp/std.txt
Sender: LSF System <lsfadmin@lsf1>
Subject: Job 103: </usr/local/bin/bmt> in cluster <cluster1> Done
Job </usr/local/bin/bmt> was submitted from host <lsf1> by user <lsfadmin> in cluster <cluster1>.
Job was executed on host(s) <lsf1>, in queue <normal>, as user <lsfadmin> in cluster <cluster1>.
</home/lsfadmin> was used as the home directory.
</usr/share/lsf> was used as the working directory.
Started at Results reported on
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
/usr/local/bin/bmt
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 7.36 sec.
Max Memory : -
Average Memory : -
Total Requested Memory : -
Delta Memory : -
Max Swap : -
Max Processes : -
Max Threads : -
Run time : 27 sec.
Turnaround time : 16 sec.
The output (if any) follows:
mimax = 257 mjmax = 129 mkmax = 129
imax = 256 jmax = 128 kmax =128
cpu : 6.616366 sec.
Loop executed for 200 times
Gosa : 1.245715e-03
MFLOPS measured : 4144.424175
Score based on MMX Pentium 200MHz : 128.429630
PS:
Read file </tmp/err.txt> for stderr output of this job.
- root cannot submit jobs
# bsub pwd
User permission denied. Job not submitted.
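Follow-up commands such as bjobs, bkill and bhist take the job ID that bsub reports, so it can be handy to capture it in a script. This is a minimal sketch, assuming bsub prints the "Job <N> is submitted ..." message shown above on standard output. The function name `parse_job_id` is hypothetical.

```shell
# parse_job_id: read bsub's confirmation message on stdin and print the job ID.
# Prints nothing if the message does not match (e.g. submission was refused).
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Example use on an LSF host (commented out so the function can be
# sourced on a machine without LSF):
#   jobid=$(bsub -e /tmp/err.txt -o /tmp/std.txt /usr/local/bin/bmt | parse_job_id)
#   bjobs "$jobid"
```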
bjobs (list active jobs)
$ bsub /usr/local/bin/bmt
Job <104> is submitted to default queue <normal>.
$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
104 lsfadmi RUN normal lsf1 lsf1 *l/bin/bmt Dec 3 08:13
- options
(default) | Show your own jobs |
-u user_name | Show jobs of user user_name |
-u user_group | Show jobs of group user_group |
-u all | Show jobs of all users |
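The bjobs listing is plain columnar text, so it can be filtered with awk. The sketch below assumes the default column layout shown above, where STAT is the third column. The helper name `jobs_in_state` is made up for this example.

```shell
# jobs_in_state STATE: read bjobs output on stdin and print the IDs of
# jobs whose STAT column equals STATE (for example RUN or PEND).
# The header line (first line) is skipped.
jobs_in_state() {
    awk -v state="$1" 'NR > 1 && $3 == state { print $1 }'
}

# Example use on an LSF host:
#   bjobs -u all | jobs_in_state RUN
```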
bkill (stop a job)
$ bsub /usr/local/bin/bmt
Job <105> is submitted to default queue <normal>.
$ bkill 105
Job <105> is being terminated
- Behavior
- bkill first sends SIGINT and SIGTERM
- After the number of seconds set by JOB_TERMINATE_INTERVAL in lsb.params, it sends SIGKILL
- options
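-u user_name | Stop jobs of user user_name (give job ID 0 to stop all of them) |
-u user_group | Stop jobs of group user_group (give job ID 0 to stop all of them) |
-u all | Stop jobs of all users (give job ID 0 to stop all of them) |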
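Because SIGINT and SIGTERM arrive before SIGKILL, a job script has a short window in which to clean up. The following is a minimal sketch of a job body that uses it. The function name `run_with_cleanup` and the marker file are hypothetical, and the `sleep` loop only stands in for real work.

```shell
# run_with_cleanup MARKER: stand-in for a job script.
# When SIGINT or SIGTERM arrives it writes "cleaned" to MARKER and exits
# with status 143 (128 + 15, the conventional status for SIGTERM).
run_with_cleanup() {
    marker=$1
    trap 'echo cleaned > "$marker"; exit 143' INT TERM
    # Placeholder for the real workload.
    while :; do
        sleep 1
    done
}
```

The cleanup has to finish within the configured interval, since SIGKILL cannot be trapped.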
bhist (job execution history)
$ bhist 105
Summary of time in seconds spent in various states:
JOBID USER JOB_NAME PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
105 lsfadmi *bin/bmt 1 0 12 0 0 0 13
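The summary row can be reduced to the few numbers that usually matter: how long the job waited and how long it ran. This sketch assumes the column order shown above and a job name without spaces. `bhist_summary` is a made-up name.

```shell
# bhist_summary: read bhist output on stdin and print PEND, RUN and TOTAL
# seconds for each job row (rows whose first field is a numeric job ID).
# Column positions: $4 = PEND, $6 = RUN, $10 = TOTAL.
bhist_summary() {
    awk '$1 ~ /^[0-9]+$/ { printf "%s pend=%s run=%s total=%s\n", $1, $4, $6, $10 }'
}

# Example use on an LSF host:
#   bhist 105 | bhist_summary
```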
bpeek (view the standard output of a running job)
$ bsub /usr/local/bin/bmt
Job <106> is submitted to default queue <normal>.
$ bpeek 106
<< output from stdout >>
Administrative commands (lsid, lshosts, lsload, bhosts)
- Show LSF cluster information
$ lsid
IBM Spectrum LSF Community Edition 10.1.0.0, Jun 15 2016
Copyright IBM Corp. 1992, 2016. All rights reserved.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
My cluster name is cluster1
My master name is lsf1
- List LSF hosts
$ lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
lsf1 X86_64 PC6000 116.1 2 1023M 1.9G Yes (mg)
- Load on LSF hosts
$ lsload
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
lsf1 ok 0.1 0.0 0.1 1% 0.1 1 0 2587M 1.6G 338M
- Job status per LSF host
$ bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
lsf1 ok - 2 0 0 0 0 0
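Both lsload and bhosts put the host status in the second column, so one small filter can serve as a health check for either. This is a sketch under that assumption, with the hypothetical name `hosts_ok`.

```shell
# hosts_ok: read lsload or bhosts output on stdin and print the names of
# hosts whose second column (status) is exactly "ok".
# The header line (first line) is skipped.
hosts_ok() {
    awk 'NR > 1 && $2 == "ok" { print $1 }'
}

# Example use on an LSF host:
#   lsload | hosts_ok
#   bhosts | hosts_ok
```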
Queue configuration
/usr/share/lsf/conf/lsbatch/cluster1/configdir/lsb.queues
...
Begin Queue
QUEUE_NAME = normal
PRIORITY = 30
INTERACTIVE = NO
FAIRSHARE = USER_SHARES[[default,1]]
#RUN_WINDOW = 5:19:00-1:8:30 20:00-8:30
#r1m = 0.7/2.0 # loadSched/loadStop
#r15m = 1.0/2.5
#pg = 4.0/8
#ut = 0.2
#io = 50/240
#CPULIMIT = 180/hostA # 3 hours of host hostA
#FILELIMIT = 20000
#DATALIMIT = 20000 # jobs data segment limit
#CORELIMIT = 20000
#TASKLIMIT = 5 # job task limit
#USERS = all # users who can submit jobs to this queue
#HOSTS = all # hosts on which jobs in this queue can run
#PRE_EXEC = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC = /usr/local/lsf/misc/testq_post |grep -v "Hey"
#REQUEUE_EXIT_VALUES = 55 34 78
#APS_PRIORITY = WEIGHT[[RSRC, 10.0] [MEM, 20.0] [PROC, 2.5] [QPRIORITY, 2.0]] \
# LIMIT[[RSRC, 3.5] [QPRIORITY, 5.5]] \
# GRACE_PERIOD[[QPRIORITY, 200s] [MEM, 10m] [PROC, 2h]]
DESCRIPTION = For normal low priority jobs, running only if hosts are \
lightly loaded.
End Queue
...
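Since lsb.queues holds one Begin Queue ... End Queue block per queue, the defined queue names can be listed straight from the file. The following sketch assumes the `QUEUE_NAME = value` layout shown above. The function name `list_queues` is made up.

```shell
# list_queues FILE: print the value of every active QUEUE_NAME line in FILE.
# Lines commented out with a leading "#" are ignored.
list_queues() {
    awk -F= '/^[ \t]*QUEUE_NAME[ \t]*=/ {
        sub(/#.*/, "", $2)       # drop a trailing comment, if any
        gsub(/[ \t]/, "", $2)    # trim blanks around the name
        print $2
    }' "$1"
}

# Example use on an LSF host:
#   list_queues /usr/share/lsf/conf/lsbatch/cluster1/configdir/lsb.queues
```

After lsb.queues has been edited, the change is normally applied with `badmin reconfig`.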
Well, that's about it.