Oh, and another question to everyone: has anyone else observed that your ReFS volumes seem to have ludicrously high disk fragmentation statistics? I checked on the server of ours that locks up most frequently and it reported 100% fragmentation (which would imply that not one single block of data is contiguous, which sounds statistically near-impossible). The default defrag scheduled task has been running successfully, so I thought it was fine until I manually checked via the UI.

Also something I've noticed: I think the deadlocks only occur when the scheduled data scrub tasks under TaskScheduler->Microsoft->Windows->"Data Integrity Scan" are running while a block clone operation is issued. If I don't forcibly prevent the Veeam services from running on startup when the crash-recovery scrub is running, it IO-locks, 100% of the time. I'll work with MS to confirm, but it would be helpful if other people could check the History tab of those two tasks to see if your own deadlocks coincide with the time period between the Task Started (EventID 100) and Task Completed (EventID 102) events.

If you would be kind enough, I have a couple of questions:
1) You are using a 64K Allocation Unit Size for the ReFS volume?
2) Even though you have 384 GB of RAM, you implemented option 1 (RefsEnableLargeWorkingSetTrim = 1), which should only be useful in case of heavy RAM usage?
3) From what I read you only have 3 big jobs running. Do you have per-VM backup chains? How many tasks in parallel do you allow on your repository?
4) What is your RAID controller cache ratio? (if you use DAS as repository storage)

I am asking because I'm struggling with an infrastructure that should have worked OK. Per server:
2 x DAS HP D3700 with 1.8 TB 12G 10K SAS disks configured as RAID 6 with an HP P441 controller
Dual 10 Gbit network cards (HP 560FLR) supporting the offloading of SMB v3 (RDMA capable)
HP DL380 Gen9 with dual Intel E5-2660 v4 2 GHz CPUs, 64 GB of RAM, RAID 1 SSD for the OS
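For anyone who wants to run the History tab comparison suggested above, here is a minimal sketch of the idea in Python. It assumes you have already copied the Task Started (EventID 100) and Task Completed (EventID 102) timestamps of the "Data Integrity Scan" tasks, plus the times your repository deadlocked, into plain lists. The function names and all timestamps in the example are made up for illustration, and how the events get exported from Task Scheduler is left out.

```python
from datetime import datetime


def scrub_windows(events):
    """Pair each Task Started (100) with the next Task Completed (102).

    `events` is a list of (timestamp, event_id) tuples; other IDs are ignored.
    """
    windows, start = [], None
    for when, event_id in sorted(events):
        if event_id == 100:
            start = when
        elif event_id == 102 and start is not None:
            windows.append((start, when))
            start = None
    return windows


def deadlocks_during_scrub(events, deadlocks):
    """Return the deadlock timestamps that fall inside any scrub window."""
    windows = scrub_windows(events)
    return [d for d in deadlocks if any(s <= d <= e for s, e in windows)]


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    events = [
        (datetime(2017, 3, 1, 2, 0), 100),
        (datetime(2017, 3, 1, 5, 30), 102),
        (datetime(2017, 3, 8, 2, 0), 100),
        (datetime(2017, 3, 8, 4, 0), 102),
    ]
    deadlocks = [datetime(2017, 3, 1, 3, 15), datetime(2017, 3, 5, 12, 0)]
    for hit in deadlocks_during_scrub(events, deadlocks):
        print("deadlock during scrub window:", hit)
```

If most of your deadlocks land inside a scrub window, that would support the theory; if they are spread evenly, it probably does not apply to your setup.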
Gostev wrote: Do you have a support case open with Microsoft? I would like to forward the case ID to the ReFS team, as it looks like your servers might be a good subject for investigation, with the issue consistently reproduced even with the patch installed. My current problem is that I still don't have a single good example to show to them. But even I myself am not convinced at this time whether the issue is real, or is some corner case that has to do with special settings, special hardware, lack of a certain system resource or something along these lines (for example, the issue Nate has just mentioned). The ratio of customers having great success using ReFS vs. customers having this deadlock issue actually suggests it might be the corner case.

My only visibility into this issue is my own experiences and this thread. Are there lots of people out there doing synthetic fulls via block clone of 5-10ish+ TB backups with 16-32 GB of RAM (since throwing extreme amounts of RAM at this use case seems to at least often work around the issue) who are having no sporadic issues whatsoever? That would be interesting to know.