Linux FAQ

A running record of problems encountered while using Ubuntu (Linux).

1. How to mount a disk

(1) Find the unpartitioned disk; here it is /dev/sdb

$ sudo fdisk -l  # find the unpartitioned disk
Disk /dev/sdb: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x76738fb3

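(2) Create a partition on the disk with fdisk:

$ sudo fdisk /dev/sdb
# m (help) -> n (new partition) -> press Enter to accept the defaults -> w (write and quit)
$ sudo fdisk -l  # the new partition now shows up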
Device     Boot Start        End    Sectors  Size Id Type
/dev/sdb1        2048 3907029167 3907027120  1.8T 83 Linux

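(3) Format the partition and create a filesystem

$ sudo mkfs -t ext4 /dev/sdb1  # -t specifies the filesystem type

(4) Mount the partition at the target location

$ sudo mount /dev/sdb1 /home/data/
# Note the difference: sdb is the whole disk, sdb1 is a partition on that disk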
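
A mount made with the mount command lasts only until the next reboot. As a minimal sketch (not part of the original steps), a line in /etc/fstab can make it persistent. The helper name fstab_entry, the UUID, and the mount options below are illustrative assumptions; the real UUID would come from `sudo blkid /dev/sdb1`.

```shell
# Hypothetical helper: print an /etc/fstab line for a partition.
# Arguments: filesystem UUID, mount point, filesystem type.
fstab_entry() {
    uuid="$1"
    mount_point="$2"
    fs_type="$3"
    # Fields: device, mount point, type, options, dump flag, fsck pass order
    printf 'UUID=%s %s %s defaults 0 2\n' "$uuid" "$mount_point" "$fs_type"
}

# Example with a made-up UUID; use the value reported by blkid instead.
fstab_entry "0a1b2c3d-0000-4000-8000-000000000000" /home/data ext4
```

After appending the printed line to /etc/fstab, `sudo mount -a` is a quick way to check that the entry mounts cleanly before rebooting.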

2. How to find which program is using a port

$ sudo netstat -antp | grep 7000
tcp6       0      0 :::7000                 :::*                    LISTEN      12464/frps      
$ ps 12464
  PID TTY      STAT   TIME COMMAND
12464 pts/0    Sl     0:00 ./frps

$ lsof -i:7000   # lsof: list open files; -i selects network (Internet) files
COMMAND   PID   USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
frps    12464 ubuntu    3u  IPv6 7542760      0t0  TCP *:afs3-fileserver (LISTEN)
$ ps -fe | grep 12464
ubuntu   12464  6446  0 00:29 pts/0    00:00:00 ./frps
ubuntu   12707  6446  0 00:32 pts/0    00:00:00 grep --color=auto 12464
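
Note that `grep 7000` also matches unrelated lines, such as port 17000 or a PID that happens to contain 7000. The following is a rough sketch of a stricter filter; the function name pid_for_port is made up for illustration, and it assumes the column layout of the `netstat -antp` output shown above.

```shell
# Hypothetical helper: read `netstat -antp` output on stdin and print the PID
# of the process listening on exactly the given port.
pid_for_port() {
    awk -v port="$1" '
        # $4 is the local address, $6 the state, $7 is PID/program
        $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, parts, "/")
            print parts[1]
        }'
}

# Sample line taken from the output above
printf '%s\n' 'tcp6       0      0 :::7000                 :::*                    LISTEN      12464/frps' | pid_for_port 7000
```

On a live system this would be used as `sudo netstat -antp | pid_for_port 7000`.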

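3. Understanding gdm3, lightdm, and kdm

Q: When a Win10/Mac machine tries to control an Ubuntu machine and everything is set up correctly, clicking the "Remote Assistance" button connects, but after a short while a "Connection lost" message appears.
S: With the Sunflower (向日葵) remote-control client running, open a terminal on Ubuntu and run: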

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install lightdm

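Reboot Ubuntu, and the remote connection should then work.

R: A quick look at gdm3, lightdm, and kdm

Wikipedia: a display manager presents the user with a login screen. A session starts when the user successfully enters a valid combination of username and password.

gdm3 is the successor of gdm, the GNOME display manager. The newer gdm3 uses a minimal version of gnome-shell and provides the same look and feel as a GNOME 3 session.
LightDM (Light Display Manager) is a lightweight display manager for Linux desktops.
kdm is the KDE display manager. In KDE 5 it was deprecated in favor of SDDM, which is better suited to the job and is therefore the default.

Put simply, these are three different display managers that do the same job. When several are installed on Ubuntu, you can switch between them; for example, to switch from lightdm to gdm3, run sudo dpkg-reconfigure gdm3.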
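
Before switching, it can help to see which display manager is currently the default. The sketch below assumes the Debian/Ubuntu convention of recording it in /etc/X11/default-display-manager; the function name current_dm is hypothetical.

```shell
# Hypothetical helper: print the name of the default display manager.
# The default path is the Debian/Ubuntu convention; pass another file to override.
current_dm() {
    dm_file="${1:-/etc/X11/default-display-manager}"
    if [ -r "$dm_file" ]; then
        # The file holds a full path such as /usr/sbin/lightdm
        basename "$(cat "$dm_file")"
    else
        echo "unknown"
    fi
}

current_dm
```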

4. Using xrandr

A side monitor was added, and xrandr is needed to configure it:

xrandr: list the available display outputs

$ xrandr
Screen 0: minimum 8 x 8, current 3000 x 1920, maximum 32767 x 32767
DP-0 connected primary 1920x1080+1080+520 (normal left inverted right x axis y axis) 598mm x 336mm
   1920x1080     60.00*+
   1600x900      60.00  
   1280x1024     75.02    60.02  
   1152x864      75.00  
   1024x768      75.03    60.00  
   800x600       75.00    60.32  
   640x480       75.00    59.94  
DP-1 disconnected (normal left inverted right x axis y axis)
HDMI-0 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
DP-4 connected 1080x1920+0+0 left (normal left inverted right x axis y axis) 598mm x 336mm
   1920x1080     60.00*+
   1600x900      60.00  
   1280x1024     75.02    60.02  
   1152x864      75.00  
   1024x768      75.03    60.00  
   800x600       75.00    60.32  
   640x480       75.00    59.94  
DP-5 disconnected (normal left inverted right x axis y axis)
USB-C-0 disconnected (normal left inverted right x axis y axis)

Set the resolution

xrandr --output eDP1 --mode 1280x1024_60.00

Dual-monitor setup

xrandr --output DP-4 --left-of DP-0 --auto  # place DP-4 to the left of DP-0
                                            # --left-of and --right-of set the relative position

Mirror (clone) displays

xrandr --output VGA-0 --same-as DVI-D-0 --auto

Rotate the display left / right / back to normal

xrandr --output DP-0 --rotate normal  # accepted values: normal, left, right, inverted

Set the primary display
```
xrandr --output HDMI2 --auto --primary
```
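
The output names used in these commands (DP-0, DP-4, HDMI2, and so on) differ from machine to machine. As a small sketch, the connected outputs can be pulled from the xrandr listing with awk; the function name connected_outputs is an illustrative assumption, and the sample lines are copied from the listing earlier in this section.

```shell
# Hypothetical helper: read xrandr output on stdin, print connected output names.
connected_outputs() {
    # Output lines look like "<name> connected ..." or "<name> disconnected ..."
    awk '$2 == "connected" { print $1 }'
}

connected_outputs <<'EOF'
DP-0 connected primary 1920x1080+1080+520 (normal left inverted right x axis y axis) 598mm x 336mm
   1920x1080     60.00*+
DP-1 disconnected (normal left inverted right x axis y axis)
DP-4 connected 1080x1920+0+0 left (normal left inverted right x axis y axis) 598mm x 336mm
EOF
```

On a live X session the equivalent call is `xrandr | connected_outputs`.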

5. USB drive / hard disk mount failure

$ sudo mount /dev/sdb1 /mnt
$MFTMirr does not match $MFT (record 0).
Failed to mount '/dev/sdb1': Input/output error
NTFS is either inconsistent, or there is a hardware fault, or it's a
SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
then reboot into Windows twice. The usage of the /f parameter is very
important! If the device is a SoftRAID/FakeRAID then first activate
it and mount a different device under the /dev/mapper/ directory, (e.g.
/dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
for more details.

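S: Repair the volume with ntfsfix from the ntfsprogs utility package (shipped as part of ntfs-3g on recent Ubuntu releases). Judging by its output, it corrects the mismatch between $MFT and $MFTMirr: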

$ sudo ntfsfix /dev/sdb1
Mounting volume... FAILED
Attempting to correct errors...
Processing $MFT and $MFTMirr...
Reading $MFT... OK
Reading $MFTMirr... OK
Comparing $MFTMirr to $MFT... FAILED
Correcting differences in $MFTMirr record 0...OK
Processing of $MFT and $MFTMirr completed successfully.
Setting required flags on partition... OK
Going to empty the journal ($LogFile)... OK
NTFS volume version is 3.1.
NTFS partition /dev/sdb1 was processed successfully.

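6. The difference between profile and bashrc

File	Takes effect	Applies to
/etc/profile	On next login (e.g. after a reboot)	All users
/etc/bashrc (/etc/bash.bashrc on Ubuntu)	When a new bash shell is opened	All users
~/.bash_profile or ~/.profile	On next login (e.g. after a reboot)	Current user
~/.bashrc	When a new bash shell is opened	Current user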
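
Changes to any of these files can also be applied to the current shell right away by sourcing the file, without logging in again or opening a new terminal. The sketch below uses a temporary file as a stand-in for ~/.bashrc so that it does not touch real configuration; the variable name DEMO_GREETING is made up.

```shell
# Stand-in for ~/.bashrc so the example does not modify real configuration
rc_file="$(mktemp)"
echo 'export DEMO_GREETING="hello from rc file"' > "$rc_file"

# Load the file into the current shell; `source` is the bash spelling of `.`
. "$rc_file"
echo "$DEMO_GREETING"

rm -f "$rc_file"
```

With the real file, the equivalent is `source ~/.bashrc`.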

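7. /etc/ld.so.conf and ldconfig

The ldconfig command searches the default directories /lib and /usr/lib, plus the directories listed in the dynamic library configuration file /etc/ld.so.conf, for shared libraries (named like lib*.so*), and then creates the links and cache file needed by the dynamic loader (ld.so).
Adding a library to /lib or /usr/lib does not require editing /etc/ld.so.conf, but you still need to run ldconfig afterwards; otherwise the library will not be found.
Adding a library anywhere else requires adding its directory to /etc/ld.so.conf and then running ldconfig; otherwise it will not be found either. A typical workflow is to build your own shared library, add its directory to /etc/ld.so.conf, and run ldconfig. The library can then be used system-wide.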
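
As a sketch of that workflow: on Ubuntu, /etc/ld.so.conf normally includes every *.conf file under /etc/ld.so.conf.d/, so a drop-in file there is an alternative to editing /etc/ld.so.conf directly. The function name register_lib_dir, the file name mylib.conf, and the directory /opt/mylib/lib are all hypothetical; the demo writes to a temporary directory so it can run without root.

```shell
# Hypothetical helper: write a drop-in config file listing one library directory.
# conf_dir defaults to /etc/ld.so.conf.d (needs root); a temp dir is used below.
register_lib_dir() {
    lib_dir="$1"
    conf_dir="${2:-/etc/ld.so.conf.d}"
    printf '%s\n' "$lib_dir" > "$conf_dir/mylib.conf"
}

demo_dir="$(mktemp -d)"
register_lib_dir /opt/mylib/lib "$demo_dir"
cat "$demo_dir/mylib.conf"

# On a real system, run as root without the second argument, then:
#   sudo ldconfig                  # rebuild the cache
#   ldconfig -p | grep mylib       # confirm the library is now listed
```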

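8. NVIDIA software stack basics

GPU: the hardware. NVIDIA GPUs are the mainstream choice (the RTX 2080 Ti was popular at the time of writing). Deep learning requires a large amount of computation, and the parallel computing power of GPUs has met that need well over the past few years. At the time of writing, framework support for AMD GPUs was very limited, so they were not worth considering.
NVIDIA Driver: the interface to the hardware. Without the graphics driver, the system cannot recognize the GPU or use its computing resources.
CUDA: NVIDIA's parallel computing framework, which works only on NVIDIA GPUs. It must be installed before complex parallel computation can run on the GPU. Mainstream deep learning frameworks rely on CUDA for GPU acceleration, almost without exception.
cuDNN: an acceleration library for deep convolutional neural networks.
TensorFlow, PyTorch, MXNet, Paddle: deep learning frameworks built on top of CUDA and cuDNN.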

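9. Installing a graphics card

(1) Hardware: install the GPU in the machine
(2) Driver: download the matching driver from the official NVIDIA website

a. First, blacklist nouveau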

sudo vim /etc/modprobe.d/blacklist-nouveau.conf

Add the following line to it:

blacklist nouveau

b. Press Ctrl + Alt + F1 to switch to tty1, then stop the X server with the following command:

sudo /etc/init.d/lightdm stop

c. Install the driver
d. Restart the X server

sudo /etc/init.d/lightdm restart

(3) CUDA & cuDNN

# Download CUDA   https://developer.nvidia.com/cuda-downloads
$ chmod a+x cuda-repo-ubuntu1804-10-1-local-10.1.168-418.67_1.0-1_amd64.deb 
$ sudo dpkg -i ./cuda-repo-ubuntu1804-10-1-local-10.1.168-418.67_1.0-1_amd64.deb 
$ sudo apt-key add /var/cuda-repo-10-1-local-10.1.168-418.67/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda-10-1 -y

# Download cudnn from https://developer.nvidia.com/cudnn
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Don't forget to run sudo ldconfig afterwards.
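
The nvcc check in section 13 only works if the toolkit's bin directory is on PATH. A minimal sketch of the usual environment setup follows, assuming the toolkit is reachable through the /usr/local/cuda symlink used in the cuDNN copy commands above; the lines would typically go into ~/.bashrc.

```shell
# Assumes the toolkit is reachable through the /usr/local/cuda symlink
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
export CUDA_HOME
export PATH="$CUDA_HOME/bin:$PATH"
# Append the previous value only if it was non-empty, to avoid a trailing colon
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show the first PATH entry to confirm the change
echo "$PATH" | tr ':' '\n' | head -n 1
```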

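10. Ubuntu login loop

This is a very common problem. In my case a newly added account could not log in, because useradd had been run without creating the matching home directory under /home. The fix:

(1) Delete the user with: userdel -r username

(2) Re-create the user with useradd -m username, which creates a directory of the same name under /home

(3) Add the user to sudoers

sudo visudo
Below the %sudo line, add: user ALL=(ALL) ALL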
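
To confirm the diagnosis before deleting an account, one can check whether the home directory recorded for the user actually exists. This is a rough sketch: the function name home_of is hypothetical, and it reads /etc/passwd directly, so it only covers local accounts.

```shell
# Hypothetical helper: print the home directory recorded for a user.
# The second argument lets a different passwd-format file be used.
home_of() {
    user="$1"
    passwd_file="${2:-/etc/passwd}"
    # Field 1 is the login name, field 6 the home directory
    awk -F: -v u="$user" '$1 == u { print $6 }' "$passwd_file"
}

# Check the current user as a demonstration
home="$(home_of "$(id -un)")"
if [ -n "$home" ] && [ -d "$home" ]; then
    echo "home directory exists: $home"
else
    echo "home directory missing for $(id -un)"
fi
```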

11. Linux productivity tools

A friend shared a list of Linux productivity tools; worth studying when time permits.

Reference: https://news.ycombinator.com/item?id=23229241

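12. Running commands at boot on Linux

Method 1: use the built-in startup script

Use the /etc/rc.local file. On Ubuntu 18.04 you can create this file yourself.

sudo vim /etc/rc.local

The file contents are shown below; add the commands to run at boot before exit 0.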

#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

exit 0

Then make it executable:

sudo chmod +x /etc/rc.local

Method 2: add an init script

(1) Create a file named auto_start.sh under /etc/init.d/ with the contents below, placing the commands to run before exit 0.

#!/bin/bash  
# command content      

exit 0

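(2) Make the script executable:

sudo chmod +x /etc/init.d/auto_start.sh

(3) Register the script to run at boot with the commands below. Here 90 is the priority; a higher number means the script runs later.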

cd /etc/init.d/  
sudo update-rc.d auto_start.sh defaults 90

Note: the boot script can be removed with the following command:

sudo update-rc.d -f auto_start.sh remove
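
A syntax error in a boot script only shows up at the next boot, so it is worth checking the file before registering it. The sketch below relies on `bash -n`, which parses a script without executing it; the function name check_boot_script is made up, and the demo runs against a temporary copy of the template above rather than a file in /etc/init.d/.

```shell
# Hypothetical helper: report whether a script parses cleanly.
check_boot_script() {
    script="$1"
    # bash -n parses the file without executing any command in it
    if bash -n "$script" 2>/dev/null; then
        echo "syntax ok"
    else
        echo "syntax error"
        return 1
    fi
}

# Temporary copy of the template shown above
demo="$(mktemp)"
printf '#!/bin/bash\n# command content\n\nexit 0\n' > "$demo"
check_boot_script "$demo"
```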

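13. Common NVIDIA commands

nvidia-smi

When running deep learning experiments, monitoring the GPU state in real time is essential. This section walks through the output of the nvidia-smi command.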

Fri Aug  2 10:10:08 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
| 31%   43C    P8    12W / 250W |    223MiB / 10981MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1614      G   /usr/lib/xorg/Xorg                            14MiB |
|    0      4804      G   /usr/lib/xorg/Xorg                           207MiB |
+-----------------------------------------------------------------------------+

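The output above is from a GeForce RTX card on a server (the model name is truncated in the listing). The fields are explained below, in the same order as the columns of the upper table:

GPU: GPU index; Name: GPU model; Persistence-M: persistence mode state.
Fan: fan speed (0-100%); Temp: temperature (Celsius); Perf: performance state, from P0 (maximum performance) to P12 (minimum performance); Pwr:Usage/Cap: power draw / power limit.

Bus-Id: PCI bus ID of the GPU; Disp.A: Display Active, whether a display is initialized on the GPU;
Memory-Usage: GPU memory used / total.

Volatile Uncorr. ECC: count of uncorrected ECC (Error Correcting Code) errors since the last driver load; GPU-Util: GPU utilization;
Compute M.: compute mode.

The Processes table below it shows how much GPU memory each process is using.

nvidia-smi -L

nvidia-smi -L lists all available NVIDIA devices.

watch -n1 nvidia-smi

Refreshes the GPU status every second.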
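
For scripts, the table layout is awkward to parse. nvidia-smi also offers a CSV query mode through --query-gpu and --format, which is easier to post-process. Here is a small sketch that turns used/total memory into a percentage; the function name gpu_mem_percent is hypothetical, and the sample values are taken from the output above.

```shell
# Hypothetical helper: read lines of "index, used, total" (MiB) on stdin
# and print the share of GPU memory in use, truncated to a whole percent.
gpu_mem_percent() {
    awk -F', ' '{ printf "GPU %s: %d%% of memory in use\n", $1, $2 * 100 / $3 }'
}

# Sample values taken from the nvidia-smi output above (223MiB / 10981MiB)
echo '0, 223, 10981' | gpu_mem_percent

# On a machine with an NVIDIA driver:
#   nvidia-smi --query-gpu=index,memory.used,memory.total \
#              --format=csv,noheader,nounits | gpu_mem_percent
```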

Checking the CUDA and cuDNN versions

Check the CUDA version:

$ cat /usr/local/cuda/version.txt
CUDA Version 10.0.130

Alternatively, use the following command:

nvcc --version

Check the cuDNN version:

$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
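
The three defines can be combined into a single version string. The following sketch does that with awk; the function name cudnn_version is hypothetical, and the demo runs against a temporary file holding the lines shown above. Note that from cuDNN 8 onward these macros live in cudnn_version.h rather than cudnn.h, so the path passed in may need adjusting.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cuDNN header file.
cudnn_version() {
    awk '/^#define CUDNN_(MAJOR|MINOR|PATCHLEVEL) / { v[$2] = $3 }
         END { print v["CUDNN_MAJOR"] "." v["CUDNN_MINOR"] "." v["CUDNN_PATCHLEVEL"] }' "$1"
}

# Temporary file with the defines from the output above
header="$(mktemp)"
cat > "$header" <<'EOF'
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
EOF
cudnn_version "$header"
```

On a real installation the call would be `cudnn_version /usr/local/cuda/include/cudnn.h`.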


Unless otherwise stated, all posts on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting.
