XPEnology Community

Develop and refine the DS3622xs+ loader


yanjun


@Peter Suh @yanjun

 

I rebuilt the loader yesterday and checked whether the build detects both SAS controllers. It does:

 

Found SAS Controller : pciid 1000d00000064  Required Extension : mpt3sas
Searching for matching extension for mpt3sas
Found matching extension : 
"https://raw.githubusercontent.com/pocopico/rp-ext/master/mpt3sas/rpext-index.json"

 

Found SAS Controller : pciid 1000d00000072  Required Extension : mpt3sas
Searching for matching extension for mpt3sas
Found matching extension : 
"https://raw.githubusercontent.com/pocopico/rp-ext/master/mpt3sas/rpext-index.json"

 

It seems to see both controllers and applies mpt3sas to each of them.
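A minimal sketch of how that check could be scripted, assuming the build output has been saved to a text file or string. The regular expression only mirrors the log lines quoted above; other loader versions may print them differently, and `sas_controllers` is a hypothetical helper name.

```python
import re

# Mirrors the "Found SAS Controller" lines quoted in this post (assumed format).
CONTROLLER_RE = re.compile(
    r"Found SAS Controller\s*:\s*pciid\s+(\S+)\s+Required Extension\s*:\s*(\S+)"
)


def sas_controllers(log_text):
    """Return a list of (pciid, extension) pairs found in the build log."""
    return CONTROLLER_RE.findall(log_text)


sample_log = """\
Found SAS Controller : pciid 1000d00000064  Required Extension : mpt3sas
Searching for matching extension for mpt3sas
Found SAS Controller : pciid 1000d00000072  Required Extension : mpt3sas
Searching for matching extension for mpt3sas
"""

controllers = sas_controllers(sample_log)
for pciid, extension in controllers:
    print(pciid, "->", extension)
```

If a controller were missing from the list, or mapped to a different extension, that would be the first thing to look at before booting the loader.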

 

Now I am trying to figure out how to run "fdisk -l" for @yanjun, but this method doesn't work: https://xpenology.com/forum/topic/54694-howto-synology-start-telnetd-service-when-you-want-to-troubleshoot/

So I am a bit stuck on how to get into the underlying system and check the parameters.

 


2 hours ago, idle said:

This is wonderful!

why? is there anything special it would have over the 3622? as it's also broadwellnk, afaik it will have the same limits and drivers as the 3622

imho THAT looks promising and would add some interesting stuff

https://github.com/dogodefi/redpill-load/tree/develop/config/DVA3221

Edited by IG-88

@IG-88 so I played a bit with the DVA3221 PoC loader. Regarding Synology "Photo" face detection, it works with or without a valid SN. But I agree we don't need/want a DVA3221 for Synology Photo.

[attached screenshot]

 

Surveillance Station is a bit more complicated. The latest Nvidia runtime release freezes at start. You can install an older one, and then Surveillance Station works, but it consumes 100% of one CPU core (25% of 4 cores)... and features like object detection or face detection don't work.

Edited by Orphée

3 hours ago, Orphée said:

Surveillance Station is a bit more complicated. The latest Nvidia runtime release freezes at start. You can install an older one, and then Surveillance Station works, but it consumes 100% of one CPU core (25% of 4 cores)... and features like object detection or face detection don't work.

as long as the nvidia drivers from the base image and the NVIDIARuntimeLibrary are working, it would help some people, as it would be kind of ootb nvidia support for plex, emby, jellyfin and docker, and especially with docker there might be a lot more use cases

along with that, there might be people looking into SS, and there might be a patch like there is for transcoding without a serial number or for nvme

downsides are the missing i915 support and the max. 16 cpu cores (or 8 + HT)

i'm kind of interested to know if i915 can be added with the kernel from synology (in 3615/17 with kernel 3.10 it did not work, as there were parts missing in the kernel, like we have seen with hyper-v support)

and with more people looking into DVA and SS there might be possibilities to add this stuff to other non-DVA units (the nvidia kernel driver for 3615/17 and 918+ was not a problem with 6.2.3, so i guess it will be possible for 7.0.1/7.1 if we need it)

 

atm i'm kind of on the fence with new units and platforms every week. if there is so much choice and so many differences (and repositories), it will be hard to track problems, especially if people mix things from different repositories ... kind of a nightmare to try to help

the dva unit is the most interesting so far. i don't need 24/32/64 cpu core support, and i think in that kind of scenario a dedicated hypervisor like proxmox will be better than dsm's vmm or docker

or >24 disks - if anything i'm going to downsize and use bigger disks at the next refresh, lowering the number from 12 to 3-6 disks, and i guess a lot of people will do that if they still run 2 or 4 TB disks (when it comes to price per TB, 14-18TB per disk is possible now)

 


Well, currently I can't confirm that NvidiaRuntimeLibrary works as expected.
The latest release doesn't start: a binary gets stuck consuming one core at 100%.

An older release seems OK, but then it is SS consuming CPU.

I would need to test with something else to confirm, but I have no clue what to try.

I don't know if the passthrough is really working (it seems so: lspci shows the card with the nvidia module loaded) or if there is some check in NvidiaRuntimeLibrary... do you have an easy way to test the GPU? (I don't want to fight with plex and their account creation policy)

 

Edit :

 

# lspci -kkqv -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation TU117 [GeForce GTX 1650] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Hewlett-Packard Company Device 8558
        Flags: bus master, fast devsel, latency 0, IRQ 31
        Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at e0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 8000 [size=128]
        [virtual] Expansion ROM at fc000000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nvidia
		
# nvidia-smi 
Fri Mar  4 19:16:40 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    On   | 00000000:01:00.0 Off |                  N/A |
| 30%   25C    P8     4W /  75W |      0MiB /  3911MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
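The first of these two checks can be scripted. The sketch below only parses `lspci -k` style text for the "Kernel driver in use" line; `kernel_driver_in_use` is a hypothetical helper name, and the sample text is trimmed from the output above. A bound driver only shows that the kernel module claimed the card, not that CUDA workloads will run.

```python
import re


def kernel_driver_in_use(lspci_text):
    """Return the driver named on the 'Kernel driver in use:' line, or None."""
    match = re.search(
        r"^\s*Kernel driver in use:\s*(\S+)", lspci_text, re.MULTILINE
    )
    return match.group(1) if match else None


# Trimmed copy of the lspci output posted above.
sample = """\
01:00.0 VGA compatible controller: NVIDIA Corporation TU117 [GeForce GTX 1650] (rev a1)
        Subsystem: Hewlett-Packard Company Device 8558
        Kernel driver in use: nvidia
"""

driver = kernel_driver_in_use(sample)
print("driver bound:", driver)
```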

 

Edited by Orphée

Missing something I guess

 

# cat docker-compose.yml 
services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

# docker-compose up -d
Starting nvidia_test_1 ... error

ERROR: for nvidia_test_1  Cannot start service test: could not select device driver "nvidia" with capabilities: [[gpu]]

ERROR: for test  Cannot start service test: could not select device driver "nvidia" with capabilities: [[gpu]]
ERROR: Encountered errors while bringing up the project.
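On a stock Docker install, to my understanding, this error usually means the daemon did not find the NVIDIA container hook binary on its PATH when it started, so no "nvidia" device driver was registered. Whether that applies to the Docker package on DSM is an assumption that this thread does not confirm. A minimal sketch of the lookup; the two binary names are the ones shipped by NVIDIA's container toolkit on regular Linux distributions, and `find_nvidia_hook` is a hypothetical helper:

```python
import shutil

# Binary names used by NVIDIA's container toolkit on regular Linux distributions.
# Assumption: DSM may ship neither of them.
HOOK_NAMES = ("nvidia-container-runtime-hook", "nvidia-container-toolkit")


def find_nvidia_hook(which=shutil.which):
    """Return the path of the first NVIDIA container hook found on PATH, or None."""
    for name in HOOK_NAMES:
        path = which(name)
        if path:
            return path
    return None


hook = find_nvidia_hook()
print(hook or "no NVIDIA container hook on PATH")
```

The `which` parameter is only there so the lookup can be exercised without the real binaries installed.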

 

Edit: the older NvidiaRuntimeLibrary that I'm able to run does not launch the following binary, which runs (and gets stuck) with the latest release:

/var/packages/NVIDIARuntimeLibrary/target/cuda/bin/deviceQuery

Edited by Orphée

@IG-88 reading from there :

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#verify-driver

 

The command should return a result, but currently for me the program gets stuck... I am probably missing something...

On Proxmox I enabled the passed-through PCI device as primary GPU; maybe I should not have, or something else fails somewhere...

 

I renamed it to avoid the high CPU usage, and to check what happens when I run it manually.

 

# ./deviceQuery-KO 
./deviceQuery-KO Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

^C^C^C^

 

It freezes and can't be killed; even kill -9 doesn't work.
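A process that ignores even kill -9 is usually in uninterruptible sleep (state "D"), blocked inside a kernel or driver call. That this is what happens here is an assumption; a sketch like the following could confirm it by reading the state field from /proc/<pid>/stat. The helper names and the sample line are hypothetical.

```python
def parse_proc_state(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the last ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def process_state(pid):
    """Read the state of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_proc_state(handle.read())


# Hypothetical stat line for a hung deviceQuery process.
sample_stat = "12345 (deviceQuery-KO) D 1 12345 12345 0 -1 4194560"
state = parse_proc_state(sample_stat)
print("state:", state)
```

A "D" here would point at the driver or the passthrough setup rather than at the binary itself, since signals are not delivered until the blocking kernel call returns.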

 

Edit :

Disabling primary GPU is worse

 

root@DVA3221:/var/packages/NVIDIARuntimeLibrary/target/cuda/bin# nvidia-smi
Unable to determine the device handle for GPU 0000:01:00.0: Unknown Error
root@DVA3221:/var/packages/NVIDIARuntimeLibrary/target/cuda/bin# ./deviceQuery-KO
./deviceQuery-KO Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 101
-> invalid device ordinal
Result = FAIL

 

Edited by Orphée

So I was able to stop the high CPU usage by renaming two binaries:

 

/var/packages/SurveillanceStation/target/synoface/bin/synofaced

/var/packages/SurveillanceStation/target/synodva/bin/synodvad

 

(must reboot after renaming them to .old)
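The rename can be sketched as a small script with a dry-run default, so nothing is touched until the paths have been checked. The two paths are the ones listed above; `disable_binary` is a hypothetical helper, and a reboot is still needed afterwards, as noted.

```python
from pathlib import Path

# Paths as listed in this post.
BINARIES = [
    "/var/packages/SurveillanceStation/target/synoface/bin/synofaced",
    "/var/packages/SurveillanceStation/target/synodva/bin/synodvad",
]


def disable_binary(path, suffix=".old", dry_run=True):
    """Rename path to path + suffix. Return the new path, or None if path is missing."""
    source = Path(path)
    if not source.exists():
        return None
    target = source.with_name(source.name + suffix)
    if not dry_run:
        source.rename(target)
    return target


# Dry run: only report what would be renamed.
for binary in BINARIES:
    print(binary, "->", disable_binary(binary))
```

Calling `disable_binary(path, dry_run=False)` performs the rename; renaming the `.old` file back restores the original behaviour.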

 

Surveillance Station still works for normal operation with simple motion detection, like normal SS does on DS models, but with 8 cameras available instead of 2.

 

Edit: I will try it on bare metal to see how it goes.

Edited by Orphée

OK - I finally got things up and running, after a faulty 9305-24i - I was able to replace it with another one and things are stable.  Some observations re: SMART data:

 

1. SMART does NOT work properly with SAS drives. I have 12 HGST 10TB SAS 3 drives and none of them display SMART stats. This is not entirely unprecedented: I am reading on the TrueNAS forums that they see the same thing, with SAS drives not showing SMART data.

 

2. SMART data DOES work properly with SATA drives connected to the SAS card.  I have 4 SSDs connected to the HBA and everything works as expected. 

 

So if anyone was worried about smart data with this image and mpt3sas - I think that everything should be ok.
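One possible explanation, stated as an assumption: SAS drives report health through SCSI log pages rather than the ATA attribute table, so a UI built around ATA attributes may show nothing even though the drive does report a health status. If smartctl (smartmontools) is available, its output has a different health line for each drive type, and a sketch like this could read either one. `smart_health` is a hypothetical helper and the sample strings are shortened.

```python
import re

# Health lines printed by smartctl for ATA/SATA and for SCSI/SAS devices.
HEALTH_PATTERNS = (
    re.compile(r"SMART overall-health self-assessment test result:\s*(\S+)"),
    re.compile(r"SMART Health Status:\s*(\S+)"),
)


def smart_health(smartctl_text):
    """Return the reported health word (e.g. PASSED or OK), or None if absent."""
    for pattern in HEALTH_PATTERNS:
        match = pattern.search(smartctl_text)
        if match:
            return match.group(1)
    return None


sata_sample = "SMART overall-health self-assessment test result: PASSED\n"
sas_sample = "SMART Health Status: OK\n"

print("SATA:", smart_health(sata_sample))
print("SAS: ", smart_health(sas_sample))
```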

 

 


26 minutes ago, synoxpe said:

Trying to install ds3622xs on ESXi 7. DSM installation fails at 55% complaining about corrupted file after I upload the .pat file. I’ve tried both with USB and SATA boot and both have the same result.

check your usb pid/vid and also that the generated serial is for ds3622xs
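For the USB case, the vid/pid pair can be read from `lsusb` output on any Linux machine the stick is plugged into. A minimal sketch of the parsing; the sample line and its device name are made up, and `usb_ids` is a hypothetical helper.

```python
import re

# lsusb prints "ID vvvv:pppp" for every device.
LSUSB_RE = re.compile(r"ID\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})")


def usb_ids(lsusb_text):
    """Return (vid, pid) pairs as lowercase 4-digit hex strings."""
    return [(vid.lower(), pid.lower()) for vid, pid in LSUSB_RE.findall(lsusb_text)]


# Made-up sample line; the real values come from your own stick.
sample = "Bus 001 Device 004: ID 0781:5583 Hypothetical Flash Drive\n"

ids = usb_ids(sample)
for vid, pid in ids:
    print(f"vid=0x{vid} pid=0x{pid}")
```

The values printed for the boot stick are the ones to compare against the loader configuration.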

 


Hi guys,

So, as I currently can't trust SMART data with LSI passthrough on my installation, I decided to give up PCI passthrough and only map the disks at Proxmox level.

[attached screenshot]

 

Actually, Proxmox handles disk SMART data better than ESXi (in my opinion).

I can't remember an easy GUI access to SMART disk data on ESXi, whereas on Proxmox:

[attached screenshot]

So I will rely on Proxmox host to monitor SMART data from now on.


11 hours ago, Orphée said:

So, as I currently can't trust SMART data with LSI passthrough on my installation, I decided to give up PCI passthrough and only map the disks at Proxmox level.

Keen to hear why not? On the contrary I've blacklisted the ahci driver at PVE level so that it doesn't fiddle with the disks on booting up and DSM gets a full PCI passthrough.


In DSM I only have partial SMART data working.
At disk level, when I click on SMART, the 5 lines only show "-".
No temperature, no error count...

Then in the S.M.A.R.T. advanced/detailed menu, yes, SMART data appears.

Both with ESXi and Proxmox, using an LSI 9207-8i.

[attached screenshot]

Edited by Orphée
