17 November 2020
Nadav Kavalerchik created group «Large scale Moodle support (IT)» with members Nadav Kavalerchik and Avi Levy
Nadav Kavalerchik changed group photo
Nadav Kavalerchik converted this group to a supergroup
Large scale Moodle support (IT) converted a basic group to this supergroup «Large scale Moodle support (IT)»
Alex Mihičinac invited Alex Mihičinac
AM
16:45
Alex Mihičinac
Hi 🙂
Matej Žerovnik invited Matej Žerovnik
16:46
Matej Žerovnik
Hey!
NK
17:46
Nadav Kavalerchik
Hello everyone and welcome!
Feel free to pass the group join URL to anyone relevant to the group.
AM
18:41
Alex Mihičinac
👍🏻
NK
19:57
Nadav Kavalerchik
I have started to update the Moodle wiki page about "Large scale Moodle" systems https://docs.moodle.org/310/en/Large_installations
19:59
I will keep adding information in the next few days, from all the notes I have spread around in various personal documents.
19:59
Matej Žerovnik
Great!
20:00
We can have a look and give some feedback from our end in case we saw different results
NK
20:00
Nadav Kavalerchik
Great!
20:01
Matej Žerovnik
Our infra looks like it could take a lot more load as we are running at 1% utilization during peak load (at 1000res/p).
20:01
Req/s
20:02
And response time around 300ms
NK
20:02
Nadav Kavalerchik
This reminds me that I mentioned k6 (k6.io), the open source tool we use to load test our servers.
20:04
Matej Žerovnik
Our bottleneck at the moment is probably the database (at least until Moodle fixes read/write splitting, as some reads still go to the master, or until we move to a cluster)
NK
20:05
Nadav Kavalerchik
And working together with Brendan (from Moodle partner Catalyst), I started using read-only user sessions to improve performance.
See: https://tracker.moodle.org/browse/MDL-58018
20:06
In reply to this message
20:06
Matej Žerovnik
And we need to patch moodle as we have a lot of courses in our navigation bar and that generates toooo much traffic from redis
20:06
We are seeing 400 MB/s on Redis
20:07
So we might need to move redis on web nodes for MUC cache and serve data locally and only use central redis for sessions
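A back-of-the-envelope reading of those numbers, as a minimal sketch. It assumes the 400 MB/s figure is in bytes (as clarified later in the thread) and that it coincides with the roughly 1000 req/s peak mentioned earlier; combining the two is an assumption, and the function name is hypothetical.

```python
def redis_bytes_per_request(redis_bytes_per_s: int, requests_per_s: int) -> float:
    """Average amount of Redis traffic attributable to one web request."""
    if requests_per_s <= 0:
        raise ValueError("requests_per_s must be positive")
    return redis_bytes_per_s / requests_per_s


# Assumed inputs: 400 MB/s from Redis at a peak of about 1000 req/s.
per_request = redis_bytes_per_request(400_000_000, 1000)
print(f"{per_request / 1000:.0f} kB of Redis traffic per request")
```

A per-request figure of this size suggests that a few large cache entries are fetched on most page loads, which fits the suspicion about the navigation data.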
NK
20:08
Nadav Kavalerchik
We started adding JS inview elements to course formats to offload (lazyload) some of the data in the courses, which saved us a lot of time generating course pages.
And we are now patching core to lazyload the Navigation menu with all its complex nodes.
20:10
In reply to this message
We use APCu for in memory cache on the local web nodes (and not REDIS)
REDIS is used for user sessions and some MUC
20:10
Matej Žerovnik
In reply to this message
We have that enabled as well. It works OK, but we do get errors that the session had to write some data. I don’t think that’s a problem; I think we have more problems with sessions being everywhere: if a page takes too long to respond, users start to press F5, php-fpm kills the old process, the session gets stuck and the user needs to wait for the session to expire. What is your session lock timeout?
20:12
In reply to this message
This is it, we are seeing reads on master node from crontab servers.
20:13
In reply to this message
We are using opcache for local caching and Redis for everything else: MUC session and application caches and PHP sessions.
NK
20:14
Nadav Kavalerchik
In reply to this message
// https://tracker.moodle.org/browse/MDL-68577
$CFG->session_redis_lock_retry = 100;
20:15
Matej Žerovnik
In reply to this message
In our case, we removed the navigation bar altogether from the pages and it was a real speedup! Our navigation bar was 2MB in size, so every course URL had to load 1MB of data from SQL and serve 2MB to each user
20:15
In reply to this message
And what about the PHP and nginx (execution) timeouts?
NK
20:16
Nadav Kavalerchik
In reply to this message
Yes, we use opcache too. with opcache.validate_timestamps=0 to make it run a little bit faster and not check PHP files on the disk
20:19
In reply to this message
php.ini - max_execution_time = 30
Moodle config.php - $CFG->session_redis_acquire_lock_timeout = 120;
20:20
Matej Žerovnik
In reply to this message
We have the timestamp check set to 60, just to be safe in case we forget to purge it. We are also using the PHP preload feature from 7.4
NK
20:20
Nadav Kavalerchik
In reply to this message
NGINX config

sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;

# Performance
client_max_body_size 300M;
#keepalive_requests 500;

# Performance
# https://linode.com/docs/web-servers/nginx/configure-nginx-for-optimized-performance/
keepalive_requests 1000;
client_body_buffer_size 128k;
#client_max_body_size 10m;
client_header_buffer_size 1k;
large_client_header_buffers 4 4k;
output_buffers 1 32k;
postpone_output 1460;

client_header_timeout 3m;
client_body_timeout 3m;
send_timeout 3m;

# Added after too many "504 Gateway Time-out"
fastcgi_send_timeout 900s;
fastcgi_read_timeout 900s;

# Prevent "502 Bad Gateway" on Chrome.
fastcgi_buffers 64 8k;
fastcgi_buffer_size 8k;
fastcgi_connect_timeout 75s;
fastcgi_ignore_client_abort on;
20:21
In reply to this message
Interesting
20:21
Matej Žerovnik
In reply to this message
But with preload enabled, we need to restart phpfpm anyway in case of code change
20:23
So we need to do more testing if we could move to timestamp=0
20:36
Our server time, reported by apache
20:36
that is 95th percentile, but it is a bit skewed as we are probably taking too many things into account
20:39
P95 page load time for course/index.php and course/view.php
20:40
This is from Apache perspective
20:41
full page load time is between 500-700ms (from browser point of view)
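For readers who want to reproduce this kind of figure from their own access logs, here is a minimal sketch of a per-script P95 using the nearest-rank method. The timing values and the grouping by script path are made up for illustration; they are not the numbers behind the graphs discussed here.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical server-side response times in milliseconds, grouped by script.
timings_ms = {
    "course/view.php": [120, 150, 180, 210, 250, 300, 320, 400, 650, 900],
    "course/index.php": [80, 90, 95, 100, 110, 120, 130, 150, 170, 400],
}

p95_by_script = {path: percentile(ms, 95) for path, ms in timings_ms.items()}
for path, value in p95_by_script.items():
    print(f"{path}: p95 = {value} ms")
```

Splitting by script is one way to avoid the skew that comes from mixing cheap and expensive request types in a single percentile.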
NK
21:10
Nadav Kavalerchik
In reply to this message
What type of page, as there are many factors for different Moodle pages.
Is it a simple course? (no blocks) how many modules? sections?
21:13
Matej Žerovnik
I think most of the courses are relatively simple, a few sections, some have a few blocks, some dont
21:14
I think Martin will have better understanding, as I'm not part of the moodle team, I'm just helping in this time
21:14
the upper graphs are p95 across all the requests for all courses
21:15
but maybe we can set up similar test courses, a simple one and a heavy one, to do some comparison
Brendan Heywood (Moodle) invited Brendan Heywood (Moodle)
Dan Marsden invited Dan Marsden
F Devine invited F Devine
18 November 2020
00:12
Matej Žerovnik
Welcome everybody
Pavel invited Pavel
Michael Spall (Idaho USA) invited Michael Spall (Idaho USA)
Matej Konobelj invited Matej Konobelj
Tim Hunt invited Tim Hunt
Tomek Muras invited Tomek Muras
19:03
Matej Žerovnik
Is there any benefit in having shared MUC cache in redis or can we have a redis per server and keep only local cache? Do you persist MUC cache or just let it rebuild if redis is restarted? With local cache, we could shed some internode bw from 400MB/s to a few MB/s for php sessions only.
NK
21:22
Nadav Kavalerchik
In reply to this message
MUC application can use local APCu on each web node as it should hold the same data, session MUC needs to use shared remote REDIS.
I do not use MUC to store data, only as a cache that can always be discarded, and as far as I know neither do the 3rd-party plugins we use.
23:04
Matej Žerovnik
Nadav, what do you mean by not using MUC to store data? If I read the issue on the tracker correctly, MUC sessions still store data. What store do you use for MUC sessions?
23:09
Do you use PHP 7? Some reports on the internet say there might be problems with using APCu and PHP 7.
23:28
Matej Žerovnik
Can you share APC ini configuration?
NK
23:28
Nadav Kavalerchik
In reply to this message
I mean that Moodle uses the MUC application level mostly to cache data, not to permanently store user data in place of the main DB, except for edge cases which I do not care about so much.
It seems to be the same for the MUC session level, except that the data is needed across web nodes, since a user's session can span different web nodes if it is not sticky to one node. So I use a shared remote Redis for that.
I hope I explained it properly, but I guess @brendanheywood can dive deeper into this and make it much clearer, and also correct me if I was wrong.

I use PHP 7.2 with Moodle 3.9.2, and I did not experience any bugs with APCu. Do you have a link to such a report?
23:30
Matej Žerovnik
OK, that makes sense and that is how I understood it, but just wanted to be sure. So in case of sticky sessions, one could also use redis/APCu for session storage, but that data could be lost if node is restarted and user needs to be moved to another node.
NK
23:33
Nadav Kavalerchik
In reply to this message
Yes
23:34
Matej Žerovnik
Regarding APCu and issues, most of the posts were 2-3 years old, so it could be issues on PHP5 or early PHP7 and are probably fixed now. Most of the pages are various blog posts, nothing super credible tbh.
NK
23:39
Nadav Kavalerchik
In reply to this message
APCu Version 5.1.18
PHP Version 7.2.33
The only setting I use is the memory size. I keep updating (increasing) it from time to time (I monitor it with DataDog) to see that it does not get full too often and reset frequently in the middle of the day.
Now it is apc.shm_size=356M (shared between 6 Moodle instances)
23:40
Matej Žerovnik
In reply to this message
Thanks. We might try it instead of Redis to save some bandwidth... Although I think MUC Sessions generate vast majority of traffic in our case:/
NK
23:41
Nadav Kavalerchik
APCu is very, very fast compared to Redis
23:41
Matej Žerovnik
great news then:)
23:45
Another question: what kind of bandwidth are you seeing from Redis server towards the web nodes during peak hours?
19 November 2020
BH
00:13
Brendan Heywood (Moodle)
we've found that the session store in redis is 1-2% the size of our redis MUC stores, and I don't have hard numbers for the size of the session scope caches in MUC but I'd expect them to also be marginal compared to overall redis traffic
00:18
Matej Žerovnik
I don't know how many people use Moodle in the same configuration as we do, but we have a single Moodle instance for the whole country (all primary and high schools + some universities), so our navigation is HUGE. As far as I can tell from the code (I'm no coder and I'm not part of our Moodle team, just helping out during COVID), navigation is stored in the MUC session. Before we changed our template, it was showing the navigation bar and the size of the page was over 1MB (!!) due to the giant navigation. We have since removed the navigation and now the page is a lot smaller, but we suspect the code that pulls navigation entries from Redis is still present, and that is why we are seeing up to 400MB/s (bytes) of traffic from Redis during peak hours.
BH
00:21
Brendan Heywood (Moodle)
what do you mean when you say 'navigation'? which muc definition?
00:34
Matej Žerovnik
When you are on the courses index, there is a navigation drop-down showing all courses on Moodle so you can quickly jump from one to another
00:37
I think this is saved in coursecat MUC session area
omer hameiri invited omer hameiri
NK
09:19
Nadav Kavalerchik
FYI, Here is my MUC application APCu behaviour for the last week or so.
09:21
It seems to reset when it needs a new chunk of memory larger than what is available, but that does not seem to affect the performance of the systems.
09:41
Matej Žerovnik
👍
09:43
Need to find APCu exporter for prometheus, so we will be able to monitor it before trying it
09:55
Today during the night, and 2 times in the last 1 (2?) months, we had this weird case where the 'Cleaning up stale session data from cache stores.' cron task is run (we only run it once a day during the night, so it doesn't block Redis during the day) and it starts to generate lots of traffic on Redis and slows it down, bringing Moodle to its knees with response times of up to 60s. According to Grafana, there is a massive eviction of keys at 3:00 (when the task starts for the 1st time), but then traffic goes up to 75MB/s and Redis spends around 800ms per second in the hkeys command. And I guess the session cleanup task does not finish within 1 minute, because at 3:01 another "Cleaning up stale..." is seen in the cron log.
09:56
Funnily enough, restarting Redis fixes the problem: the cron tasks finish without a problem and Redis memory usage actually drops (it didn't at 3:00, even though almost 15k keys were evicted).
09:56
anyone had similar problems?
TM
10:02
Tomek Muras
In relation to the performance on cluster, this is an important issue: https://tracker.moodle.org/browse/MDL-67821
10:02
It's one of the bottlenecks we were hitting in big and clustered installations
10:03
Brendan has a smart idea here - to use stat instead of file_exists, would be a good idea to compare the performance of both in real life (ie with a script)
10:05
Matej Žerovnik
Interesting. I'm not sure how much we are affected by this, but that might be a smart thing to check:)
10:51
Matej Žerovnik
In reply to this message
I did a quick test (and I could be wrong), but using stat() on shared FS is way slower in my case than using file_exists.
10:51
Time used for non-existing file on NFS: 0.54064559936523s
Time used for stat on non-existing file on NFS: 1.9620516300201s
This is for 100k files
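For anyone who wants to repeat this kind of comparison on their own shared filesystem, here is a minimal timing harness. It is a Python analogue, not the PHP test described above: `os.access(path, os.F_OK)` and `os.stat(path)` stand in for PHP's `file_exists()` and `stat()`, so absolute numbers will not match PHP. The function name, file naming pattern and default count are hypothetical.

```python
import os
import tempfile
import time


def time_missing_file_checks(directory, count=1000):
    """Time two ways of probing `count` non-existent paths under `directory`.

    Returns (seconds spent in os.access, seconds spent in os.stat).
    """
    paths = [os.path.join(directory, f"missing-{i}.tmp") for i in range(count)]

    start = time.perf_counter()
    for path in paths:
        os.access(path, os.F_OK)  # existence check, returns False here
    access_s = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        try:
            os.stat(path)  # raises for a missing file
        except FileNotFoundError:
            pass
    stat_s = time.perf_counter() - start

    return access_s, stat_s


if __name__ == "__main__":
    # Point this at a directory on the NFS mount to get meaningful numbers.
    access_s, stat_s = time_missing_file_checks(tempfile.gettempdir())
    print(f"access: {access_s:.4f}s, stat: {stat_s:.4f}s")
```

Running it several times and against both a local and a shared directory helps separate filesystem latency from interpreter overhead.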
Rok Jaklič invited Rok Jaklič
RJ
11:57
Rok Jaklič
In reply to this message
hi; in our theme we are using https://github.com/moodleuulm/moodle-local_boostnavigation/ which is not written with optimization in mind and for such load; so we rewrote parts where this plugin is used and we got 50% better response
11:58
so the part/link in green was created with local boost navigation before ... now it's "almost" static
12:05
we also found out that that "render" method in moodle is slow sometimes, but this render usually calls some plugin, ...
12:09
aha, I apologize, Matej was talking about some other thing; but nevertheless you may find this information useful
12:11
Matej was talking about dropdown for example
12:12
So we removed this dropdown from theme and changed to more "static" navigation in menu and together we got 50% better response time
Mallika Valluru invited Mallika Valluru
20 November 2020
Marijan Milovec invited Marijan Milovec
Timotej Jazbec invited Timotej Jazbec
Martin Božič invited Martin Božič
Iñigo Zendegi invited Iñigo Zendegi
Pablo Javier Etcheverry invited Pablo Javier Etcheverry
Maxi invited Maxi
Alistair Spark invited Alistair Spark
AS
16:54
Alistair Spark
In reply to this message
Just stayed consistent and used redis on the app servers for the localisable stores
Hieu Vu invited Hieu Vu
Toni Ginard invited Toni Ginard
Manolo Mohedano invited Manolo Mohedano
Ajesh invited Ajesh
Francesc Busquets 🎗 invited Francesc Busquets 🎗
21 November 2020
Iban Cardona invited Iban Cardona
Luuk Verhoeven invited Luuk Verhoeven
NK
15:26
Nadav Kavalerchik
Is anyone using a reverse proxy to cache some of the "static" pages in Moodle?
16:19
Matej Žerovnik
What is considered ‘static’? We have Cloudflare in front of Moodle, but that only caches content that Moodle reports as cacheable
16:20
Or are you thinking of caching non-cacheable content that is hidden behind ACLs?
NK
16:24
Nadav Kavalerchik
In reply to this message
By "reverse proxy for cache" I was actually referring to something like Cloudflare (or AWS CloudFront), but as an on-premise caching tool for institutes that do not use caching as a service.
16:24
And "static" pages are pages Moodle refers to as cacheable
AS
16:24
Alistair Spark
Varnish would make more sense for on-prem
NK
16:25
Nadav Kavalerchik
I was thinking about using NGINX for on-premise cache
16:25
Matej Žerovnik
Nginx should work
AS
16:25
Alistair Spark
Maybe AWS Outposts can do cloudfront, if so might be worth considering
16:26
Matej Žerovnik
I can check how much we save by serving cached content if that helps
16:26
In reply to this message
PLEASE 🙏🏼
16:28
Matej Žerovnik
16:29
This is a 7 days report
NK
16:30
Nadav Kavalerchik
Impressive, and good enough for me to motivate me setting it up 😊
16:30
Matej Žerovnik
So not that much saved, but we are using CF for other reasons: load balancer, WAF, ddos mitigation. Cdn is just a free benefit😀
17:08
Matej Žerovnik
But on the other hand, its simple to implement so why not
NK
17:09
Nadav Kavalerchik
Are you using the Cloudflare Business plan?
17:11
Matej Žerovnik
Yes, we need it because of cname redirect as we cant move our dns to CF
TH
17:29
Tim Hunt
I think Moodle HQ uses cloudflare for this. At least they did at one time.
RJ
17:32
Rok Jaklič
7% is a lot imho, especially if there is a lot of traffic
Yedidia Klein invited Yedidia Klein
RJ
17:36
Rok Jaklič
much of unneeded processing is put on nginx or something, so it makes sense
17:37
Matej Žerovnik
In reply to this message
If they do, that is easy to check, just resolve the dns and check ip in whois.
NK
17:37
Nadav Kavalerchik
Exactly! And especially at large scale Moodle systems... every CPU cycle counts 😊
17:38
Matej Žerovnik
True, I take my sentence back😀
22 November 2020
Michael Ruder Rapajic invited Michael Ruder Rapajic
23 November 2020
BH
03:51
Brendan Heywood (Moodle)
In reply to this message
Did you submit this back to Alex in a pull request?
MB
07:34
Martin Božič
In reply to this message
No we didn't, because our plugin isn't actually a fork - we used Alex's plugin as a guide to create our own which relies on our own custom plugin. Our own custom plugin maps SAML attributes via profile fields of the users to the top categories of their orgs.
MB
07:50
Martin Božič
In reply to this message
Before that we used a much more convoluted membership mapping via https://moodle.org/plugins/local_profilecohort.
NK
09:07
Nadav Kavalerchik
In reply to this message
We are adding inview (https://github.com/phillyx/in-view) content pull with ajax all over Moodle, to save a lot of unused fullpage processing server-side. (critical in large courses), our next move is going to be adding ajax to local_boostnavigation. (in a week or so) so stay tuned 😊
RJ
11:43
Rok Jaklič
yes, it makes sense to load content only when needed, but it needs ux/redesign of page usually
11:48
once we found out (the hard way :) ) that we need to optimize things, our focus was/is to bring page load down to the "minimum", so we can serve "more"
11:49
for us, usually everything worked pretty ok, until we hit some threshold/bottleneck e.g. once it was db, then redis, then fw;
11:50
so we would have 2s avg response time, but once it hit 2.1s it was downward spiral
11:52
so we managed to cut down avg response to ~200ms on 1.25k requests/s as Matej already mentioned I think
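One way to see why response time matters so much for worker capacity is Little's law (average concurrency = arrival rate × average time in the system). The sketch below applies it to the figures quoted above; using the same 1.25k req/s for the old 2 s average is an assumption made purely for comparison, and the function name is hypothetical.

```python
def in_flight_requests(requests_per_s: int, avg_response_ms: int) -> float:
    """Little's law: average number of requests being processed at once."""
    return requests_per_s * avg_response_ms / 1000


now = in_flight_requests(1250, 200)      # ~200 ms average at 1.25k req/s
before = in_flight_requests(1250, 2000)  # assumed: same rate at the old 2 s average
print(f"~{now:.0f} requests in flight at 200 ms, ~{before:.0f} at 2 s")
```

With a fixed pool of PHP workers, a small rise in response time raises the number of busy workers, which in turn queues more requests. That is consistent with the "downward spiral" described above.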
NK
11:59
Nadav Kavalerchik
In reply to this message
Super cool
RJ
11:59
Rok Jaklič
and the biggest fear right now for us are session locks; since it "is possible" that malicious users would hit pages which are not "read only sessions enabled", which in turn would drain our php workers, ...
12:00
we are somehow limiting them through rate limit on cf
NK
12:01
Nadav Kavalerchik
I personally find out who those users are... and delete them 😊
Only read-only sessions surfing on our servers!!!
12:01
Matej Žerovnik
As we already talked, we don't have mandatory login due to historic reasons, so we can easily get hit with botnets
12:02
but so far, CF is protecting us, mostly by mandatory javascript captcha for all non-slovenian users and full block on Asia sources. Anyone getting through captcha is then also rate limited
RJ
12:08
Rok Jaklič
In reply to this message
yeah, much of worries are avoided then
AS
18:38
Alistair Spark
In reply to this message
Via Core Moodle or through a plugin?
NK
19:30
Nadav Kavalerchik
In reply to this message
We added the JS lib to a local plugin and used it in several places
24 November 2020
MB
12:39
Martin Božič
What file upload limits do you have on your Moodles? Our pupils and students currently upload 100G per day it seems. We set 20MB for most of the activities which is IMO too generous, but on the other hand, people are used to such limits from other services.
AS
13:05
Alistair Spark
4GB
13:06
Course level default is set at 500MB but they can increase it up to the site limit if required
BH
13:27
Brendan Heywood (Moodle)
why not just use native IntersectionObserver with a polyfill?
NK
13:32
Nadav Kavalerchik
We give them between 20M and 50M (depending on the policy for the specific Moodle instance) year round, except at the beginning of the year, when we allow teachers ~500MB so they can restore courses from other Moodle systems
13:34
For videos, we use MEDIAL, but also testing the beautiful AWS media solution Catalyst developed: https://github.com/catalyst/moodle-local_smartmedia
BH
13:35
Brendan Heywood (Moodle)
I'd love to hear feedback on that, we've only a couple of clients using it, should get more eyeballs on it next year
MR
13:37
Michael Ruder Rapajic
Have any of you tried a MariaDB cluster in active-passive mode (in production)? I remember Nadav said they were considering trying it.
NK
13:40
Nadav Kavalerchik
In reply to this message
We are indeed considering it, as our statistics show a ratio of 12 to 1 read/write on SQL queries.
And also considering moving some Moodle instances to PostgreSQL (also in a primary-with-secondaries layout)
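To put the 12:1 ratio in perspective, here is a rough sketch of how much of the query load would stay on the primary if a given share of reads could be served by secondaries. It counts queries only and ignores query cost, replication lag and reads that must stay on the primary after a write; the function name is hypothetical.

```python
from fractions import Fraction


def primary_load_share(reads: int, writes: int, offloaded_reads: Fraction) -> Fraction:
    """Fraction of all queries still hitting the primary after offloading some reads."""
    total = reads + writes
    remaining_reads = reads * (1 - offloaded_reads)
    return (writes + remaining_reads) / Fraction(total)


for share in (Fraction(0), Fraction(1, 2), Fraction(1)):
    left = primary_load_share(12, 1, share)
    print(f"{float(share):.0%} of reads offloaded -> {float(left):.1%} of queries on primary")
```

Even offloading half of the reads would, by query count, take close to half of the load off the primary.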
YK
16:08
Yedidia Klein
Did you try moodle 3.9 support of slaves ?
MB
16:21
Martin Božič
In reply to this message
At ARNES we have been running half of our web nodes on an SQL slave for 1 month now. So far so good with the default settings.
16:22
In reply to this message
Do you store all the user data in the archived courses?
Ivica Matotek invited Ivica Matotek
NK
16:29
Nadav Kavalerchik
In reply to this message
Yes. 3-4 years online, and for another 3-4 years offline, in full course backups. on a per instance/year folder.
יונתן @ Tau invited יונתן @ Tau
oh
16:57
omer hameiri
In reply to this message
I did.. Works great
25 November 2020
BH
00:25
Brendan Heywood (Moodle)
For anyone who is trying to optimize the shared filesystem and get stuff off it, I've been working on a new tool to expose this the same as MUC performance. Also comes with a debug mode to pinpoint the exact code which reads or writes https://tracker.moodle.org/browse/MDL-70243
NK
00:26
Nadav Kavalerchik
👍🏼
BH
00:26
Brendan Heywood (Moodle)
also any chance we can get the url previews working in this channel?
NK
00:27
Nadav Kavalerchik
How do I do that?
BH
00:27
Brendan Heywood (Moodle)
dunno, maybe ask in the moodle channel what they did to set it up there
NK
00:27
Nadav Kavalerchik
I think it is a special bot. I will ask Andrew...
00:29
Matej Žerovnik
In reply to this message
How big is your storage?
00:30
We are seeing massive growth these last 3 weeks when we are in full-school-from-home mode with around 100-150GB/day
NK
00:31
Nadav Kavalerchik
00:32
BH
00:33
Brendan Heywood (Moodle)
In reply to this message
curious if you are using an alternate file system / s3 etc?
00:33
Matej Žerovnik
No, NFS share at the moment
Nadav Kavalerchik invited MoodleBot
BH
00:34
Brendan Heywood (Moodle)
how are you prioritizing: optimizing for latency, growth or cost?
NK
00:35
Nadav Kavalerchik
Testing tracker issues preview...
https://tracker.moodle.org/browse/MDL-70243
M
00:35
MoodleBot
MDL-70243 - Add a file system performance summary into the footer and file IO debug mode
Status: Peer review in progress
Assignee: Brendan Heywood
Reporter: Brendan Heywood
BH
00:35
Brendan Heywood (Moodle)
cool
NK
00:35
Nadav Kavalerchik
Groovy
AL
00:35
Avi Levy
In reply to this message
[Sticker: 😎]
00:36
Matej Žerovnik
In reply to this message
we don't at the moment. This COVID broke all rules and we just combined the gear we had so we were able to scale out asap
AL
00:37
Avi Levy
Does anyone work with microservices, K8s?
What is your favorite solution for a moodledata shared file system?
BH
00:43
Brendan Heywood (Moodle)
we wrote this and use a hybrid gluster / aws s3 https://github.com/catalyst/moodle-tool_objectfs
00:46
Matej Žerovnik
In reply to this message
I saw that, but did not have time to take a closer look. Our small NFS servers were mostly empty until this 2nd COVID wave hit. Now we are seeing up to 60k assignments posted daily, with sizes between 500kB and 5MB. We do have new dedicated storage on order which will provide NFS & S3, so we'll take a closer look at your module when we have S3 set up
00:47
we are also setting up a rather big CEPH cluster which we will be able to use as a spillover in case we really need storage quickly
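A quick sanity check of those upload figures, as a sketch. It assumes decimal units (1 MB = 1,000,000 bytes) and that every one of the 60k daily submissions falls in the quoted 500 kB to 5 MB range; the function name is hypothetical.

```python
KB = 1000
MB = 1000 * KB
GB = 1000 * MB


def daily_upload_range_gb(files_per_day, min_bytes, max_bytes):
    """Lower and upper bound on daily upload volume, in GB."""
    return files_per_day * min_bytes / GB, files_per_day * max_bytes / GB


low, high = daily_upload_range_gb(60_000, 500 * KB, 5 * MB)
print(f"between {low:.0f} and {high:.0f} GB per day")
```

The 100-150 GB/day growth reported in this thread sits inside that range and would correspond to an average file size of roughly 1.7 to 2.5 MB, if assignments account for most of the growth.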
AL
00:48
Avi Levy
In reply to this message
How do you divide the hybrid between GlusterFS and S3/bucket/object storage?
BH
00:49
Brendan Heywood (Moodle)
so the main sitedata is on gluster, objectfs only manages the filedir subset of sitedata and it shuffles files back and forth as needed but tries to keep the majority in s3 only
00:49
we also use aws EFS sometimes for specific parts of sitedata like the $CFG->backuptempdir
00:51
it's a balancing act, a compromise of cost vs performance. S3 scales way better and is a ton cheaper, but adds latency, which you notice with lots of tiny files like H5P (we are working on the proper workaround for that in the plugin now)
NK
00:52
Nadav Kavalerchik
Have you tried EFS in infrequent-access mode? https://aws.amazon.com/efs/features/infrequent-access/ (to save costs)
BH
00:53
Brendan Heywood (Moodle)
no, we only use efs specifically for the highly elastic files like backups which spike hugely and then go away forever
00:53
s3 has similar stuff which we use
AL
00:53
Avi Levy
Nice. I tried working with object storage in the past, but there were a lot of problems with file_exists(). We will test it again.
I must find a good solution for this single point of failure and performance issue.

Do you have any experience with other clouds (GCP, Azure)?
BH
00:54
Brendan Heywood (Moodle)
the plugin supports them; others in the wild have provided those stores, but we ourselves only use AWS
00:55
I think we've helped a couple clients set it up on prem
AL
00:55
Avi Levy
[Sticker: 👍]
00:55
I see all the clouds are supported. Impressive
NK
00:56
Nadav Kavalerchik
https://tracker.moodle.org/browse/MDL-68719 seems like @brendanheywood is trying to fix file_exists()
M
00:56
MoodleBot
MDL-68719 - Remove as many file_exists() calls against shared dataroot as possible
Status: Development in progress
Assignee: Brendan Heywood
Reporter: Brendan Heywood
Votes: 3
AL
00:57
Avi Levy
In reply to this message
already watched
BH
00:57
Brendan Heywood (Moodle)
yeah that is exactly why I've been focusing on that fileio tracker to help hunt those down
00:57
the file_exists isn't a blocker though, its a small perf issue in the big picture
AL
01:00
Avi Levy
We use the shared filesystem on Azure and it causes us a lot of problems due to latency. In GCP we work with the Filestore solution and it works very well
NK
01:01
Nadav Kavalerchik
If we delete enough users... the picture gets very clear, and we do not experience these issues 😉
AL
01:02
Avi Levy
Let's keep only admins
01:02
[Sticker: 😒]
01:03
@brendanheywood Do you use a Redis cluster for sessions and caching?
BH
01:04
Brendan Heywood (Moodle)
sessions yes, AWS ElastiCache, and yes, MUC is a bit more complicated: master-master caching split across AWS AZs so we don't get cross-AZ $
AL
01:05
Avi Levy
Sorry for my poor knowledge of AWS, what are AZs?
01:06
Availability Zones?
BH
01:06
Brendan Heywood (Moodle)
yup
AL
01:06
Avi Levy
Thanks
Andrew Nicols (Moodle) invited Andrew Nicols (Moodle)
Mathieu invited Mathieu
Mark P invited Mark P
Mark P invited Eduard Cercos
Mark P invited Ishtiaque Daudpota
Sander Bangma invited Sander Bangma
Shamim invited Shamim
Lee Goldsworthy invited Lee Goldsworthy
Craig R Morton invited Craig R Morton
MM
09:26
Marijan Milovec
In reply to this message
We are trying to deploy Bitnami Moodle for Kubernetes on OpenShift OKD4 and we will be using CEPH.
RJ
09:33
Rok Jaklič
i tried using glusterfs for some short time on some other project where tmp files would reside, but for some reason latency, performance was not good enough that time, ... i've had an option moving tmp files to db and i did, and even though i do not advise to do it, it works ok; size is 36GB right now
AN
09:34
Andrew Nicols (Moodle)
I tried to use OCFS many years ago. Testing looked amazing, but as soon as we went live it couldn't cope with it. Mind you, that was 12 years ago.
אסף @ Sysbind invited אסף @ Sysbind
RJ
09:50
Rok Jaklič
15 years ago, on some service, we balanced/separated requests based on md5 checksum of an username and based on that it was delegated "servers to be processed on"; so servers were based on tree like structure with its own storage; with only local HA on "branches"; it worked, even though we've had very few cases where requests were sent to the wrong "branch"
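A minimal sketch of that kind of routing: hash the username with MD5 and use the digest to pick one of the "branches". The branch names and the plain modulo placement are assumptions for illustration; the message above does not say how the checksum was mapped to servers.

```python
import hashlib


def pick_branch(username: str, branches: list) -> str:
    """Map a username to one branch, deterministically, via its MD5 digest."""
    if not branches:
        raise ValueError("at least one branch is required")
    digest = hashlib.md5(username.encode("utf-8")).digest()
    index = int.from_bytes(digest, "big") % len(branches)
    return branches[index]


# Hypothetical branch names.
branches = ["branch-a", "branch-b", "branch-c"]
for user in ("alice", "bob", "carol"):
    print(user, "->", pick_branch(user, branches))
```

The same username always lands on the same branch as long as the list of branches is unchanged. With plain modulo placement, adding or removing a branch moves most users to a different one.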
AN
10:03
Andrew Nicols (Moodle)
That's still pretty typical - sticky sessions
Jordan Tomkinson invited Jordan Tomkinson
Víctor Déniz [UTC+1] invited Víctor Déniz [UTC+1]
oh
13:25
omer hameiri
In reply to this message
We are using EFS for all moodledata files, Infrequent access is a great solution to help cut costs.

Last time I checked we had 5 TB of data on efs, where roughly 3 on regular access and 2 on infrequent access
13:27
Had a miserable experience back when we started - tried putting the cache and the web folders on EFS, was a bad idea.
We now use a separate NFS server for cache, and for the web files
YK
15:54
Yedidia Klein
In reply to this message
Do you use any other cache ? Like redis etc....
In my experience while using external shared cache -the cache directory on nfs is fine...
oh
15:54
omer hameiri
We use memcached for sessions.... 🙂
15:56
We were about to start shifting some actual load to Redis, but since everything is running so smoothly we decided not to touch anything until we are done with this COVID thing....
AS
19:53
Alistair Spark
In reply to this message
Bitnami Moodle doesn’t sound like a good idea for prod
19:54
In reply to this message
I take it your site doesn’t take much traffic then
MM
19:54
Marijan Milovec
It is currently only for testing, since it is probably the fastest way to see how it works.
oh
19:55
omer hameiri
In reply to this message
Now? It takes quite a bit... Why?
AS
19:59
Alistair Spark
In reply to this message
Relying on file based caching doesn’t scale that much. Redis is really easy to setup, even if you don’t care about scale the speed improvement is worth the move .
EFS I/O on big sites is another
20:02
Our first move to Redis was driven from poor NFS performance after moving from RHEL6 to RHEL7
MR
20:03
Michael Ruder Rapajic
Redis for app cache?
AS
20:04
Alistair Spark
But the real benefit was adding the localised cache on each app server - big improvement. We stayed with Redis as we were familiar with it by then. I know others favour APCu for that piece
20:04
Yes App cache
NK
20:06
Nadav Kavalerchik
In reply to this message
We use APCu, which is very very fast, compare to Redis, for MUC application.
oh
20:06
omer hameiri
Guess we haven't reached that limit yet...

We are seeing around 180k daily users (not unique), and it runs smoothly.
MR
20:06
Michael Ruder Rapajic
we actually started to hit single thread redis cap. We have tested KeyDB (multithread redis drop in replacement) and so far it looks good. we are going to production with it very soon
AS
20:11
Alistair Spark
In reply to this message
Didn’t compare redis vs APCu but on-prem we were then dealing with limits of open-source redis on the shared cache, as its single threaded. Not an issue with AWS Elasticache. Had to have multiple instances of redis & spread mappings between the 3 app cache instances. So doubt it would have made a difference.
oh
20:13
omer hameiri
One interesting thing I failed to mention..


Our use case is somewhat unusual.

We don't have one big instance, we have around 1000 small instances (k12)
AS
20:19
Alistair Spark
In reply to this message
I think that’s the one I had on my radar to compare with redis enterprise. Hope it goes well
20:20
We focused our efforts on MySQL 8 with InnoDB Cluster and MySQL Router after that to scale our DB layer, so didn’t progress further on the Redis side
20:25
Worked well, except for the MySQL Router part. Initially the router could only handle 10% of the load that hardcoded hosts could. We have eventually tamed it into performing equally well as hardcoded hosts, mainly via DNS caching. But that was only a couple of weeks ago
20:27
recreating AWS services on-prem with the expensive Oracle MySQL Enterprise was a bit daft but in our on-prem world, we had to.
NK
20:35
Nadav Kavalerchik
I was recently approached by MariaDB about their SkySQL cloud solution:
https://mariadb.com/products/skysql/
Has anyone had any experience with it?
AS
20:37
Alistair Spark
In reply to this message
iirc it didn’t deal with having Redis in HA setup though.
Enterprise needed a quorum of 3; with only two datacentres it wouldn’t have made sense. We weren’t going to build a 3rd over the summer
20:37
Matej Žerovnik
Just for info if it helps, we are using open source redis with master/slave mysql on a 140k unique moodle (peak 1200req/s). We run on-prem with baremetal AMD servers. AMDs are great as you get very high clock so single thread apps perform better. We have a single redis instance for sessions and app cache. So far so good.
20:37
We are seeing around 40k req on redis
20:39
But as I’m looking at cpu utilization, we are around 75% utilization of single core, so we will probably need to move redis to keydb
20:40
Or move app cache to per-node and only keep sessions in shared moodle
MR
20:42
Michael Ruder Rapajic
If one node goes down you/we lose all of the users' app cache on that node. We were also thinking about it but decided not to do it :)
20:43
hopefully KeyDB will pull us through
AS
20:48
Alistair Spark
In reply to this message
We had peaked to 5 redis instances:
-1 session store
-1 session cache
-3 app cache
20:50
In reply to this message
There’s only 4 localisable cache mappings (language strings, course modinfo, html content & some random other one) so you still need the shared app cache for most of the other bits
20:53
The documentation is really confusing on that, but the session cache & session store are two separate things where Moodle supports Redis. One could be led to believe they are one & the same. Should probably update the Moodle docs but haven’t gotten round to it
MR
21:03
Michael Ruder Rapajic
In reply to this message
session cache, sorry
NK
21:04
Nadav Kavalerchik
Looking at the code, I see that Moodle stores $USER & $SESSION in PHP $_SESSION
And if I set MUC session to "Default session store" it saves the data in $SESSION (which is in $_SESSION, which is controlled by config.php "user sessions")
So I just point config.php user session to Redis, and it really does not matter if I set MUC session to redis or to "default session store", as they are all saved in Redis.
21:05
And so the config.php user session store, also includes the MUC session cache
21:07
Looking at redis-cli monitor, I can see that the MUC session cache is a relatively small amount of data (at least on our servers) and I do not really care that it is added to the config.php user session store.
AL
21:08
Avi Levy
https://docs.moodle.org/310/en/APC_user_cache_(APCu)
According to these docs, how do you choose where to use APCu?
NK
21:10
Nadav Kavalerchik
In reply to this message
It can only be used as MUC Application cache, as I was unable to select it in any other cache store.
21:27
Matej Žerovnik
In reply to this message
Ohh yes, I am aware of that, what I meant was MUC application store. As we are using single redis I don’t have an idea what generates most of the traffic, but we’ll try and split it just to get some idea. It might happen that sessions and muc store generates most of the traffic.
21:27
In our case that is
AS
23:09
Alistair Spark
👍Lang strings and course modinfo generate 80% of the requests from what I recall
23:10
Matej Žerovnik
Wundaba!:) We might have one corner case with possible lots of data served from MUC session, but moving to local redis instance will show if that is true or not
26 November 2020
BH
00:04
Brendan Heywood (Moodle)
If you see lots of traffic related to sessions you probably have some non-ideal 3rd party code; we see sessions at ~1-2% of IO compared to disk / MUC / DB overall
00:04
If you want some quick wins to reduce redis session io
M
00:04
MoodleBot
MDL-69707 - Redis session handler should not write the session if it has not changed
Status: Closed - Fixed
Assignee: Brendan Heywood
Reporter: Brendan Heywood
Integrator: Jake Dallimore
Fix Versions: 3.10
BH
00:04
Brendan Heywood (Moodle)
M
00:04
MoodleBot
MDL-69121 - Allow redis session store to use zip or zStd for compression like redis MUC
Status: Tested
Assignee: Jamie Stamp
Reporter: Brendan Heywood
Integrator: Eloy Lafuente (stronk7)
Fix Versions: 3.11
BH
00:10
Brendan Heywood (Moodle)
I have a few more ideas for optimizing the sessions including a tool to instrument them in prod to show what attributes are bloating the session, but these are way down my list because at least in our inf they are now a very marginal cost / perf factor
00:11
Matej Žerovnik
Wow thanks for that
00:11
I need to confirm my idea and first move muc app store to separate redis and see the change
00:13
If MUC sessions still generate lots of IO then I will have a look at the tickets you posted and also dig a bit into the code to try and find out which attributes take the most size in Redis
00:15
Is there a way to get size per store area somehow?
BH
00:17
Brendan Heywood (Moodle)
thats the exact problem which is a little hard to see easily at the moment. We did some hacking in the muc code to just dump out which area was read / writing / io, but I'd like to rewrite that into a proper core feature
00:18
Matej Žerovnik
I did some digging around the code to try and pull some data out but it’s roo complicated for me to understand🙂
00:18
Too
BH
00:20
Brendan Heywood (Moodle)
In dev, just map everything to disk and then it's easy to grep and find stuff that's big. It's only hard if you are tracking down some random edge case in prod and you can't reproduce it yet
00:21
Matej Žerovnik
Huh... now that’s a great idea🙂
00:22
Did not think of that!
MB
00:26
Martin Božič
In reply to this message
Hmmm, one Jmeter test run on staging tomorrow morning?
NK
00:28
Nadav Kavalerchik
I am using https://redislabs.com/redis-enterprise/redis-insight/ to monitor the traffic split between redis user session store, redis muc session cache, and redis muc application cache.
You can have it monitor for the different prefixes and get the percentage of usage for each, while browsing different Moodle pages
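
A rough sketch of the same per-prefix split done by hand from `redis-cli monitor` output, for anyone without RedisInsight. This is not how RedisInsight works internally; the three key prefixes below are hypothetical and must be replaced with whatever prefixes are configured for each store on your site.

```python
import shlex
from collections import Counter

# Hypothetical key prefixes; the real ones depend on the prefix set per store.
PREFIXES = {
    "sess_": "user session store",
    "muc_sess_": "MUC session cache",
    "muc_app_": "MUC application cache",
}


def classify(key):
    """Map a Redis key to a store label by its prefix."""
    for prefix, label in PREFIXES.items():
        if key.startswith(prefix):
            return label
    return "other"


def traffic_share(monitor_lines):
    """Return the percentage of commands per store from `redis-cli monitor` lines."""
    counts = Counter()
    for line in monitor_lines:
        # Line shape: <timestamp> [<db> <client>] "COMMAND" "key" ...
        bracket_end = line.find("]")
        if bracket_end == -1:
            continue
        try:
            parts = shlex.split(line[bracket_end + 1:])
        except ValueError:
            continue  # skip lines with unbalanced quotes
        if len(parts) < 2:
            continue  # command without a key, e.g. PING
        counts[classify(parts[1])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}
```

This counts commands, not bytes, so a store with few but very large values will look smaller here than it does on the network.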
00:30
Matej Žerovnik
Thanks
AN
01:08
Andrew Nicols (Moodle)
In reply to this message
Not in the north west of England are you..?
Rahman Pujianto invited Rahman Pujianto
oh
08:34
omer hameiri
In reply to this message
Central Israel 😃
AN
09:03
Andrew Nicols (Moodle)
Ah. I used to work on a Moodle installation for 1,000 schools in NW England - maybe 10-12 years ago now
MM
09:06
Marijan Milovec
In reply to this message
Hi, can you please share information regarding JMeter testing? What does your JMeter test look like, and what is the scenario?
Thanks
M N invited M N
Gunawan Prasetia invited Gunawan Prasetia
10:15
Matej Žerovnik
In reply to this message
We are basically simulating users based on the module usage we see on our site. Each user does a SAML login, then makes some forum posts, writes some chat messages, solves a quiz, checks an assignment (maybe something else also). There are random wait times between actions to simulate real users' "speed" of searching. It's not perfect, but according to the graphs, we are 70% there, which is not that bad.
MM
10:16
Marijan Milovec
Did you create JMeter tests through the Moodle JMeter development module or you create them on your own manually?
10:16
Matej Žerovnik
No, we created our own
MB
10:18
Martin Božič
We have 10k SAML accounts which were then joined into one cohort after the first login (registration). We used moosh in that case.
10:19
Also, we dropped the posting into forum since those forum threads got bigger and skewed the response times.
10:21
Matej Žerovnik
Yes, there are a lot of optimizations to be done, such as generating new forum threads and picking a random thread to reply to, not just replying to the same one every time
MM
10:21
Marijan Milovec
Thank you, we will try to create JMeter tests through Moodle itself. I tried a few months ago, but there were a lot of bugs. We fixed some of them internally and managed to create tests and even run them, but there were a lot of errors. Hopefully some things changed in version 3.9. For now, though, we are also creating our tests manually. Thank you for the information.
10:22
Matej Žerovnik
We think we can create better-tailored tests ourselves, as each Moodle has differently behaving users.
MM
10:23
Marijan Milovec
In reply to this message
Yes, that is definitely true, but with a limited Moodle team we have to make some compromises for now :) I believe that we are not alone in that situation :)
10:24
Matej Žerovnik
Sure, I understand
13:09
Matej Žerovnik
Regarding MUC application cache. Not all application cache can be on local redis, right? There is a column "Can use local store" and some have yes and some no. Does that mean that if this is "no", we need to have it on shared redis instance?
AN
13:45
Andrew Nicols (Moodle)
Basically
13:45
Not necessarily redis though
13:57
Matej Žerovnik
badically? :)
AN
13:58
Andrew Nicols (Moodle)
In reply to this message
Basically*
13:58
Any shared application cache, not necessarily Redis.
14:17
Matej Žerovnik
OK, I mentioned Redis as we have experience with Redis and monitoring is easier than with APCu (for us). Looking at the cache page, there are only 4 cache definitions that can reside on a local store; the rest of them have to be shared between nodes.
MM
15:47
Marijan Milovec
Hi, not sure, but I think we saw earlier that the Slovenian team has a MySQL cluster with active-passive. Can someone confirm that? Thank you.
16:00
Matej Žerovnik
We don't have a sql cluster, we have basic master-slave replication
oh
16:05
omer hameiri
AWS aurora might be what you need..
MM
16:05
Marijan Milovec
We are on prem, and testing Galera
MB
18:06
Martin Božič
Can anyone recommend any EU based Moodle Partner that has experience with large scale Moodle? Although we've become quite confident at Arnes (SI) we would still like to have a review of our infrastructure and application. For example, I've sent a message to Catalyst EU via form on Moodle Partners page, but I haven't received a response yet.
TH
18:06
Tim Hunt
I would say Catalyst.
18:06
Possibly also Enovation.
NK
19:28
Nadav Kavalerchik
Catalyst
DM
20:09
Dan Marsden
Thanks guys! Martin, flick me a PM with your email address and I can follow up. Sometimes it takes a bit of time for hq referrals to filter through to the right desk. ;-)
MM
20:11
Marijan Milovec
In reply to this message
Definitely following for an update :)
We are interested also in some potential Moodle partners in EU
DM
20:33
Dan Marsden
Marijan - feel free to flick me an email with details on what you're looking for help with and I can pass you to the right person in our EU team to help. (Dan@catalyst.net.nz)
MM
20:34
Marijan Milovec
In reply to this message
Thank you very much, will do.
27 November 2020
10:19
Matej Žerovnik
Is anyone showing Upcoming events on their home page? We enabled it today and queries on SQL server jumped from 80k to 400k qps and response time dropped. We had to disable it just to get back to normal performance. We know calendar view is slow, but I guess all calls to calendar are problematic. We didn't have much time to dig deeper. Just wanted to know how others are solving this problem and if there are any tickets opened for this?
TM
12:15
Tomek Muras
Hi guys, anyone using Redis with maxmemory?
If yes, then what policy do you use for eviction (maxmemory-policy)?
12:16
Matej Žerovnik
In reply to this message
We have eviction set to allkeys-lru
TM
12:17
Tomek Muras
In reply to this message
I can recommend Enovation (well I work there :). You can leave your email to me as well.
12:17
Matej Žerovnik
Not ideal, as it can remove some session keys, but when memory is full, I'd rather have a few users logged out than a slow system for all
12:18
and using LRU, there is a high chance that the key is not used anymore and the user went away, so no harm done
TM
12:18
Tomek Muras
Yeah, good point.
12:19
When I look at the list of all the possible ones... it probably makes most sense.
12:20
Matej Žerovnik
Maybe volatile-lru could also be used. Not sure which keys have expiration set (I think only php sessions, don't know about MUC keys)
TM
12:20
Tomek Muras
I think I just wouldn't "trust" that expiry is always set
12:20
and just go with all and LRU
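
A toy model of the difference between the two policies discussed above. It is only a sketch: it assumes an exact LRU over a fixed number of keys, whereas Redis limits memory in bytes and uses an approximated LRU. `ToyLRU` and the key names are made up for illustration.

```python
from collections import OrderedDict


class ToyLRU:
    """Exact LRU over a fixed key count; policy is 'allkeys-lru' or 'volatile-lru'."""

    def __init__(self, max_keys, policy="allkeys-lru"):
        self.max_keys = max_keys
        self.policy = policy
        self.data = OrderedDict()  # key -> (value, has_ttl), least recently used first

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key][0]

    def set(self, key, value, has_ttl=False):
        if key not in self.data and len(self.data) >= self.max_keys:
            victim = self._pick_victim()
            if victim is None:
                # volatile-lru with no key carrying a TTL: the write is refused
                raise MemoryError("no evictable key")
            del self.data[victim]
        self.data[key] = (value, has_ttl)
        self.data.move_to_end(key)

    def _pick_victim(self):
        # Walk from least to most recently used and take the first eligible key.
        for key, (_, has_ttl) in self.data.items():
            if self.policy == "allkeys-lru" or has_ttl:
                return key
        return None
```

With `volatile-lru`, keys without an expiry are never evicted, so if the cache keys turn out to have no TTL, the sessions are the only things that can be thrown out, and once nothing with a TTL is left writes start failing. That is the risk behind not "trusting" that expiry is always set.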
12:23
I'm also considering disabling background saves altogether. In case of restart sessions will be lost - but all the cache data should be rebuilt.
12:23
And it seems that sometimes Redis uses quite a lot of RAM when saving the data.
12:24
Matej Žerovnik
As always in IT, the answer is "it depends". Accurate and not so useful when you are given a choice :) But the best is to give Redis enough memory so it doesn't need to evict data.
12:27
If you can live with that :) But as others have figured out, and as I confirmed in our case yesterday, the biggest keys in Redis are the MUC application caches (language strings and such). So if you can split those two and have BGSAVE enabled on PHP sessions and MUC sessions and use no BGSAVE for the MUC application cache, you will have smaller dumps and no crucial data lost. You can also use APCu for the MUC application cache, if that works better for you
TM
12:28
Tomek Muras
the best of both worlds. We should write-up a recommended Redis & cache setup
MB
12:29
Martin Božič
In reply to this message
In Moodle docs?
TM
12:29
Tomek Muras
yeah, I'd say it's the best place
MR
13:10
Michael Ruder Rapajic
we also use allkeys-lru
13:11
disabling transparent hugepages also
13:13
also if you are using a VM (with the memory ballooning feature), disable it for the caching server
BH
13:35
Brendan Heywood (Moodle)
whenever I've looked into cache lru generally it just didn't seem worth worrying about. The really big sites warm up full after a few hours or a day and then plateau, and after that they only very slowly creep up over weeks. Inevitably a security patch or release gets done which resets it and you start over
Mark Johnson invited Mark Johnson
TM
15:14
Tomek Muras
Also, the sessions could go into local cache (redis or otherwise) on each web server. That is when load balancing with sticky sessions is used
15:16
Matej Žerovnik
Yes, that is possible, but in case of webnode restart, user gets logged out. Now if your environment is stable that is OK, but if you are more into containers and such, there can be many restarts / new containers spun up and users rearranged and that can be a problem.
TM
15:17
Tomek Muras
true, in this case you would probably need clustered redis
Deleted invited Deleted Account
28 November 2020
01:03
Matej Žerovnik
I'm looking over MUC cache definitions and I'm wondering why some definitions cannot be used locally, such as
- Post processed CSS
- YUI Module definitions
- Course categories tree
- Mapping of icons for font awesome

What is the downside if we use local store. In case of CSS and YUI module, we need to manually purge redis on all instances on change? I guess the same goes for others in the list?
01:04
they are not purged via change on the site as it only runs the purge on the local redis instance?
BH
04:22
Brendan Heywood (Moodle)
That is not sufficient. If a cache's value is replaced then it will only end up replaced in whichever node reset it. If you remove it then it will only get removed in the same node. You'll end up with different caches having different data. Localizing a cache is not hard but needs to be done right, lots of details here https://docs.moodle.org/dev/Cache_API#Localized_stores_for_distributed_high_performance_caching
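
A minimal sketch of the failure mode described above, assuming two web nodes that each keep a plain in-memory dict as their local cache. `Node`, `database`, and the key name are hypothetical stand-ins, not Moodle code.

```python
class Node:
    """A web node with a cache that only this node can see."""

    def __init__(self, name, database):
        self.name = name
        self.database = database  # shared source of truth
        self.local_cache = {}     # private to this node

    def get(self, key):
        # Fill the local cache from the database on first access.
        if key not in self.local_cache:
            self.local_cache[key] = self.database[key]
        return self.local_cache[key]

    def update(self, key, value):
        # The change reaches the database and this node's cache only.
        self.database[key] = value
        self.local_cache[key] = value


database = {"coursecat_tree": "v1"}
node_a = Node("a", database)
node_b = Node("b", database)
node_a.get("coursecat_tree")
node_b.get("coursecat_tree")
node_a.update("coursecat_tree", "v2")
print(node_a.get("coursecat_tree"), node_b.get("coursecat_tree"))
```

After the update, node `a` serves the new value while node `b` keeps serving the old one until something clears its local cache.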
04:35
Some of those, I agree, are good candidates to localize. At least the CSS, you'll find, is already localized, but it is not localized using MUC; it's done manually in its own way, directly on disk using $CFG->localcachedir. On top of that, if you use a CDN it will get additionally cached there, so it never even hits PHP land after that
BH
04:52
Brendan Heywood (Moodle)
the next one on our agenda is core/plugin_functions
AN
06:25
Andrew Nicols (Moodle)
Basically a local cache is generated in a node and not shared with others. A shared cache is generated on a node and is shared with others.

If something cannot change between versions then it’s a good one to share. If it’s something that is user configurable it is not.

YUI modules and icon mappings could potentially be moved because those things do not change without plugins being installed or Moodle being upgraded.
13:35
Matej Žerovnik
Thank you both for your input. So one could actually move more stuff to local Redis, but needs to be careful, as manual purging on all nodes needs to happen to avoid serving old/stale data. User configurable stuff is obviously out of the question, but admin configurable things could potentially be moved, provided the admin flushes the Redis key on every change.
BH
13:37
Brendan Heywood (Moodle)
css, yui, icons and plugin_functions are all very slow moving data so are basically safe, they should all be easily able to be properly localized and be 100% safe
13:37
Matej Žerovnik
Is there an easy way to know which cache definition matches which key in redis?
BH
13:38
Brendan Heywood (Moodle)
in our inf we spawn new containers on deployment so local caches get cleared so there are things we localize already which are technically not normally 100% safe
13:38
Matej Žerovnik
I would like to see if I can move the most requested keys into the local cache.
BH
13:39
Brendan Heywood (Moodle)
re keys, depends on the definition, some are obvious, others use opaque hash
13:41
Matej Žerovnik
We are not running in containers, but we deploy new stuff in a similar way to containers, so all local caches are flushed and most of the changes are done as a new deployment.
29 November 2020
21:26
Matej Žerovnik
Are the Redis key names for the various cache definitions the same for everyone? If yes, can someone tell me what definitions are stored in the following key names:
- 1b088c226629b3fa247ff7ef67dad266
- 333b05a93763134d16ff7006d711b5aa
- 8f371215eb636856fc0cb92b3526ed93

They are all MUC application definitions. I can also post some content from the keys if that helps, but I can't seem to find out which definition they belong to.
30 November 2020
BH
00:28
Brendan Heywood (Moodle)
those are the opaque ones I mentioned above, they are annoying and I really think they should be fixed in core so they are obvious. I think it's the code path below but there might be others
11:52
Matej Žerovnik
In reply to this message
Thanks Brendan. I used xdebug + VS Code, set a breakpoint on the code you linked and got all the definitions. The keys above are core/config, core/plugin_functions and core/yuimodules. We create a new release and deploy it every time there is a plugin or theme change in our system. Is it safe to assume that those Redis entries only change on upgrade/new plugins or reconfiguration of plugins/themes, which are done by an admin, or can users also make changes to these modules? What about core/config, what settings are saved here? Settings from config.php or something else? Is there any documentation describing what is stored in each definition and how it's used?
Deleted invited Deleted Account
BH
12:11
Brendan Heywood (Moodle)
plugin_functions yes and yui are linked to code so should be safe. config is definitely not safe to localize and must be shared
TM
23:17
Tomek Muras
"admin configurable things could potentionally be moved" - yeah, I think I could live with that. For example after some change, flush all local caches
23:21
language strings for example could be put into local caches
23:21
btw - do you use atop for monitoring server performance? I've discovered it recently and must say - it's awesome
23:22
it can even be used to capture and review historical information (snapshots) https://muras.eu/2020/11/28/Use-atop-log-to-review-historical-data/
23:26
Matej Žerovnik
In reply to this message
language strings already support local caches and you don't need to flush it in case of changes. Check 'Can use local store' column under MUC configuration. As for monitoring, we are using Prometheus with Grafana, atop only locally for more detailed view.
TM
23:40
Tomek Muras
23:40
you're right, thanks - I didn't see it yet
1 December 2020
BH
00:03
Brendan Heywood (Moodle)
In reply to this message
Strictly not. The problem is that lots of places in core or plugins use config as a generic key value store, its not just admin settings which are changed by a human admin in the gui. core/config must be shared as amongst many other things it also contains the versioning keys which allow the other caches to be localized.
AN
01:45
Andrew Nicols (Moodle)
Just to re-iterate:

It isn’t the flushing of caches that is the problem... It is generating a new cache content.

Many of our caches have been retrofitted onto existing areas. For example, the configuration system has existed since the dawn of MoodleTime, but caching was a later addition.

If, as a developer, you change a configuration value, you must either:
- purge the cache across all nodes; or
- update the cache.

Moodle is not aware of its environment. That is to say, that _you_ as an administrator can add, or remove, a node whenever you like and Moodle is unaware. The same applies to cache backends like Redis.

As a result, it is not possible to _update_ a local cache because Moodle is either not able to connect to all storage backends (i.e. local APCu), or is simply unaware of them (i.e. redis/mongodb/memcached on 127.0.0.1).

That means that as a developer, you cannot update a local cache. Therefore things which are locally cached should only be used by things which have very low churn, because every churn requires a purge of that cache.

If that is not the case, then you get into a situation where your cache is out of sync and one node may have a stale value when compared with another.

What you potentially could do (and I believe Catalyst have been looking at this) is to have a local cache which is backed by a shared cache. You still need to purge the cache, but it means that when the first call to fill a value happens on any one node, it will fill both its own local cache backend and the shared one. Then any subsequent call on another node just fills its local cache from the shared one. HOWEVER it is more complex to configure, and should be used carefully because mistakes will lead to inconsistent data.
01:45
So tl;dr is that anything which has potential churn should not be local cache.
BH
01:51
Brendan Heywood (Moodle)
we back localizable caches by shared caches to warm them quicker. That is 100% safe under all conditions and setups
01:52
the way I think about this is like git, the shared cache is the canonical repo. A new cache item is a new commit. A local node can cache that commit locally, and the commit is immutable. If something changes then that is a new commit and it propagates out. Not all nodes may have any given version / commit.
01:53
when you ask for something from the cache you must say 'I want this exact commit / version' which is equivalent to saying 'head is set to this commit'. The place where you store the mapping that says 'head is on this version' must be in a shared place, eg the themerev, jsrev, etc. And these are all stored in the db and cached in core/config which is why that must be shared
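
A small sketch of the git analogy above, assuming in-memory dicts for both cache layers. `SharedState`, `Node`, and the single `revision` counter are hypothetical; in Moodle the revisions (themerev, jsrev, etc.) live in the DB and are cached in core/config.

```python
class SharedState:
    """Stands in for everything that must be shared between nodes."""

    def __init__(self):
        self.revision = 1       # the 'head' pointer, like a themerev-style counter
        self.shared_cache = {}  # (name, revision) -> value, the canonical repo
        self.builds = 0         # how often content had to be regenerated


class Node:
    """A web node with a local cache backed by the shared cache."""

    def __init__(self, shared):
        self.shared = shared
        self.local_cache = {}   # (name, revision) -> value, immutable entries

    def get(self, name, build):
        # Always ask for one exact version: the current shared revision.
        key = (name, self.shared.revision)
        if key in self.local_cache:
            return self.local_cache[key]
        if key not in self.shared.shared_cache:
            self.shared.shared_cache[key] = build(self.shared.revision)
            self.shared.builds += 1
        # Warm the local cache from the shared one.
        self.local_cache[key] = self.shared.shared_cache[key]
        return self.local_cache[key]
```

Bumping `revision` makes every node ignore its old entries at once, without any node having to contact another. The old entries stay in the local caches until a TTL or a reset removes them, but they are never read again.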
AN
01:58
Andrew Nicols (Moodle)
It would be great if we could teach Moodle about these hybrid caches (local cache backed by a shared cache).

If we could do that then we could have Moodle purge the local cache, but only update the necessary keys on the shared cache.

Right now that is not possible though.
BH
02:00
Brendan Heywood (Moodle)
I don't quite follow that? If you bump a version then everything is cleared instantly. The older versions in local caches are present but will be ignored. The only reason you'd want to communicate with them is to remove them but you can get this with a ttl or whenever it gets reset next time. But its always correct either way
AN
02:00
Andrew Nicols (Moodle)
And, again, it does not solve the problem of making high-churn areas localisable because there is always a risk that it _can_ be misconfigured by someone unaware of the intricacies. If you had a hybrid cache and reconfigured it to remove the shared cache, then you would end up with local caches out-of-sync again.
BH
02:00
Brendan Heywood (Moodle)
no thats not right at all
AN
02:01
Andrew Nicols (Moodle)
In reply to this message
Yup - that's correct.

But if we had a hybrid cache where you continue to purge the local caches (as we do now), but Moodle was _also_ aware of the specific cache that backs them and could choose to invalidate a specific key, then you would reduce churn further.
02:10
In reply to this message
Take the example of langstring customisations.

Generally speaking, those customisations are only changed very occasionally. It is unlikely to be a daily task and is more likely to be something that changes every few months or even years. So it is a low churn dataset.

You can put it in a cache which can be stored locally. In that situation every time a change is made, the entire langstring cache must currently be cleared on all nodes. Internally we do that by comparing a lastpurged timestamp.

However, if we had a hybrid system where Moodle is aware of both the fact that there are local backends, and that those local backends are backed by a separate, shared cache, and Moodle is aware of how to use that cache, then you can have the situation where:
- you change one specific key (i.e. one language customisation); then:
- you bump the lastpurged time for the local caches to clear each of the caches local to a node; but
- you only update the relevant key on the shared cache.

That means that the shared cache was never purged, merely updated to reflect the latest data.

Which means that the next time that a node wishes to fetch the cache content it is only being fetched from the shared cache and not having to be entirely regenerated.
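
A sketch of the hybrid idea described above. It is a hypothetical design, not something Moodle can do today, and it uses a simple counter where Moodle compares a lastpurged timestamp. All class and function names are invented for illustration.

```python
class SharedCache:
    """The shared backend that sits behind every node's local cache."""

    def __init__(self):
        self.data = {}
        self.lastpurged = 0  # bumped whenever local copies must be dropped


class Node:
    def __init__(self, shared):
        self.shared = shared
        self.local = {}
        self.local_purged_at = 0

    def get(self, key, build):
        # Drop the whole local cache if a purge happened since we last looked.
        if self.local_purged_at < self.shared.lastpurged:
            self.local.clear()
            self.local_purged_at = self.shared.lastpurged
        if key not in self.local:
            if key not in self.shared.data:
                self.shared.data[key] = build(key)  # expensive regeneration
            self.local[key] = self.shared.data[key]
        return self.local[key]


def customise(shared, key, value):
    """Change one language string: update one shared key, purge local caches."""
    shared.data[key] = value  # the shared cache is updated, not purged
    shared.lastpurged += 1    # every node drops its local copy on next access
```

After `customise`, each node refills its local cache from the shared one, so only the changed key differs and nothing has to be regenerated from scratch.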
BH
02:43
Brendan Heywood (Moodle)
so worth noting that we actually do also have a setup where we have distributed caches which are not localized in the Moodle sense, similar to what you describe above. This is to avoid the extra $ of cross-AZ traffic in AWS. We have a Redis instance in each AZ, and each front end in each AZ talks to its nearest Redis. When a write or a delete comes in for a cache item this is pushed to all N Redis instances. This is managed by an Envoy proxy
AN
02:44
Andrew Nicols (Moodle)
yup - and I think that using a Proxy is likely the best way. Generally speaking it is more appropriate to have a domain-specific proxy than to teach Moodle to do these things
oh
10:51
omer hameiri
A problem I've been having recently, wondering if any of you guys had it.

Every now and then, I get a spike in mysql connections, and from looking at the logs it seems like there is a lot of activity around mdl_sessions
10:52
(using memcached to manage sessions... So it's a bit unclear)
10:54
Looks like cron is doing something heavy.
Thought it would be easy to track, but I can seem to figure it out
TH
10:55
Tim Hunt
Are you using the same memecache instance for sessions and cache?
oh
10:55
omer hameiri
In reply to this message
Can't *
10:55
Matej Žerovnik
It could be due to multiple requests to the same content. If that happens, session locks don't allow access to it and requests go to "usleep" mode, keeping the mysql and memcached locked
oh
10:55
omer hameiri
In reply to this message
No. Just session
TH
10:55
Tim Hunt
OK, that rules out one common issue.
10:55
Matej Žerovnik
also, memcached doesn't support locking, so locks go to either storage or sql server and moodle is checking if lock is free or not
10:57
we saw that when we were being DDoS-ed. php-fpm was exhausted due to too many requests, which were locked and waiting in a queue for locks to be released. MySQL connections went through the roof and we saw a lot of SETNX calls to Redis (kinda like set lock if possible, otherwise return 'can't get lock').
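
A toy illustration of the "set lock if possible, otherwise wait and retry" pattern described above. It uses an in-memory dict in place of Redis and is not Moodle's session handler; `ToyLockStore`, `acquire_session_lock`, and the `lock:` key name are hypothetical.

```python
import time


class ToyLockStore:
    """In-memory stand-in for a store with a set-if-not-exists operation."""

    def __init__(self):
        self.keys = {}

    def setnx(self, key, value):
        """Set key only if it is absent; return True on success (like SETNX)."""
        if key in self.keys:
            return False
        self.keys[key] = value
        return True

    def delete(self, key):
        self.keys.pop(key, None)


def acquire_session_lock(store, session_id, owner, timeout=1.0, poll=0.01):
    """Poll for the lock; return the number of attempts, or None on timeout."""
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        if store.setnx("lock:" + session_id, owner):
            return attempts
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)  # the worker does nothing useful but stays occupied
```

Every waiting request holds a PHP worker (and whatever connections it has already opened) for the whole wait, which is how many parallel requests on the same session can exhaust the pool.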
oh
11:02
omer hameiri
Did you manage to resolve this? How?
11:04
Matej Žerovnik
Well, the locks are not easily solvable. There was a lot done on the Moodle side with read-only sessions, but that only solves some of the problem
11:04
the main solution was blocking access from Asia, a mandatory captcha for outside countries, and rate limiting requests
11:05
we have relatively smooth sailing now, and for the few attacks that we get, we notify the national CERT to open a case
11:05
and hand it over to police / ISPs
oh
11:06
omer hameiri
So in your case the cause was DDoS, if I understand you correctly?
11:07
I don't think this is what's happening in my case. I think it's coming from regular use + cron doing some clean-up tasks or something
MB
11:07
Martin Božič
In reply to this message
Yes, and it was only about 5% of all traffic.
oh
11:09
omer hameiri
11:10
This looks like something else... around 9:30, it looks like something is hogging the DB and aggressively creating session keys
11:10
Now, if cron is not running, this doesn't happen.
15:46
Matej Žerovnik
maybe crontab is locking some tables / rows?
2 December 2020
Itai Hareven איתי invited Itai Hareven איתי
AS
23:27
Alistair Spark
In reply to this message
Caching all the memes 😁
TH
23:29
Tim Hunt
Just think how much such a thing could improve the performance of the Internet!
AS
23:29
Alistair Spark
In reply to this message
Yeah we are facing that big time ☹️
23:30
In reply to this message
Indeed a mandatory feature for all chat applications
3 December 2020
oh
07:13
omer hameiri
In reply to this message
Facing what?
Heavy periodic load regarding sessions?
AS
10:07
Alistair Spark
Locks being an issue
10:08
Affects <5% of users but very bad for those who are
7 December 2020
Cazim invited Cazim
nizam invited nizam
n
15:11
nizam
Hi All
Nizam here from Dubai University
We were running very well on a single AWS EC2 instance
During COVID our users increased a lot
40 thousand users now
Should we go for multiple instances with auto scaling and an Aurora MySQL database?
oh
16:52
omer hameiri
In reply to this message
Yes 😃
AL
18:53
Avi Levy
In reply to this message
Hi Nizam, I think that to get the right answer you need metrics like rps, number of connections, HTTP response time... Those will help you choose the right service and be cost effective
n
18:54
nizam
Perfect
8 December 2020
Niraj M invited Niraj M
10 December 2020
Manuel ( ValaRaucO | ElikCR ) invited Manuel ( ValaRaucO | ElikCR )
11 December 2020
Mallika Valluru invited Siddhartha Malempati
Mallika Valluru invited Anirudh Ac
Mallika Valluru invited Sheeban
Mallika Valluru invited sudhamsh Kandukuri
Mallika Valluru invited Sudhamsh K
Mallika Valluru invited Sheeban
S
20:57
Sheeban
Hi all,
Am facing issues with file upload for assignments.
21:03
Matej Žerovnik
What kind? We need more info😊
MR
23:07
Michael Ruder Rapajic
You probably need to change the values for upload file size in PHP (maybe even in the web server). What kind of PHP/web server combination are you using?
IM
23:28
Ivica Matotek
You can set a limit on file uploads for assignments in Moodle.
13 December 2020
n
00:07
nizam
Could someone kindly help me set up single sign-on between 2 Moodle sites?
One production server is Moodle 3.9 and the other is IOMAD 3.9
Is this possible?
Hani invited Hani
NK
09:10
Nadav Kavalerchik
FYI for anyone using AWS and Aurora:
https://tracker.moodle.org/browse/MDL-70157
M
09:10
MoodleBot
MDL-70157 - AWS Aurora MySQL support for Moodle (backport of MDL-58931)
Status: Closed - Fixed
Assignee: Rex Lorenzo
Reporter: Rex Lorenzo
Integrator: Andrew Nicols
Fix Versions: 3.9.4
Votes: 4
Srinivas Gundam invited Srinivas Gundam
Giovanni Tonello invited Giovanni Tonello
RJ
18:54
Rok Jaklič
Okay
19:40
Matej Žerovnik
We had a master-slave configuration of MySQL and were using the read-only configuration in Moodle for some web nodes to balance the load on the SQL servers. We got a lot of reports of assignments being posted that the teachers couldn't see. After switching back to a single server, we haven't had a report of a missing assignment for the last week.
19:41
Our replication latency was between 0s and 1s 99,9% of the time (based on show slave status)
19:41
We had some spikes up to 40s
MR
19:53
Michael Ruder Rapajic
In reply to this message
Now I have my answer. We must use galera cluster with active-active. I was afraid of this from the start.
MB
19:57
Martin Božič
In reply to this message
Well, we also ran DB backups 4/day and Moodle cron jobs. We didn't do any tuning in the config.php. I guess we should have a dedicated SQL replica just for web nodes.
MR
20:01
Michael Ruder Rapajic
for the read only queries did you use moodle feature "slaves" or proxysql/maxscale?
MB
20:03
Martin Božič
In reply to this message
That was the new Moodle "slave" feature.
20:11
Matej Žerovnik
We didn’t get to the root of the problem, but moving back to single instance fixed the problem. For now that works. We want to avoid running a mysql cluster as long as possible as it makes the stack more complicated.
20:12
I do have a hard time understanding why replicas with slight delay would be the reason for our problems
MR
20:13
Michael Ruder Rapajic
We are planning to use the slave feature starting tomorrow (we have 3 active-active DB servers, and we have been testing it for over a month). I will let you know how it goes. On another note, we have a new caching server with KeyDB (a multi-core Redis replacement) in production and so far it looks good. Redis was hitting the single-core limit.
MB
20:13
Martin Božič
Actually, the instance for cronjobs is still running with SQL slave config.
Matteo invited Matteo
14 December 2020
BH
00:31
Brendan Heywood (Moodle)
I'd be very interested in bug reports on the moodle level sql read replica code, we use this heavily across our fleet and so far have not seen any issues
MR
00:32
Michael Ruder Rapajic
what kind of db setup do you have?
BH
00:33
Brendan Heywood (Moodle)
aws rds, a mix of postgres and mysql
00:35
we wrote that code, and the code is designed to workaround the expected latency delay to the replica, so I just want to make sure that if there are code level bugs they are accurately logged in the tracker
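For readers unfamiliar with the problem being discussed, here is an illustrative sketch of one general way an application can tolerate replica lag: after a session writes, its reads go to the primary for a short window. This is not the Moodle implementation; the class name, the `lag_window_s` parameter and the returned labels are all hypothetical.

```python
import time

class ReadWriteRouter:
    """Route reads to a replica unless this session wrote recently."""

    def __init__(self, lag_window_s=1.0, clock=time.monotonic):
        self.lag_window_s = lag_window_s  # assumed upper bound on replica lag
        self.clock = clock                # injectable for testing
        self.last_write_at = None

    def on_write(self):
        """Record the write time; writes always go to the primary."""
        self.last_write_at = self.clock()
        return "primary"

    def target_for_read(self):
        """Pick the primary while a recent write may not be replicated yet."""
        if self.last_write_at is None:
            return "replica"
        if self.clock() - self.last_write_at < self.lag_window_s:
            return "primary"
        return "replica"

# Usage with a fake clock so the behaviour is deterministic.
now = [0.0]
router = ReadWriteRouter(lag_window_s=1.0, clock=lambda: now[0])
before_write = router.target_for_read()
router.on_write()
right_after_write = router.target_for_read()
now[0] = 2.0
after_window = router.target_for_read()
```

A fixed window like this only helps while the real lag stays below it; lag spikes of the size reported earlier in this chat (up to 40s) would exceed a 1 s window.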
00:45
Matej Žerovnik
I think we plan to open an issue, but we want to gather as much data as possible and be 100% sure that the read replica is the culprit. I hope our Moodle team finds the time to debug further and open the issue. Right now we know what the symptoms are and how it looks in the DB, but don’t know exactly why it happens.
BH
00:56
Brendan Heywood (Moodle)
I'd probably start by logging an issue with what you know and you can flesh it out more as you find out more, you can always close it later
Hasan Hasanzade invited Hasan Hasanzade
15 December 2020
Branko Petric invited Branko Petric
BP
19:47
Branko Petric
I have the following situation: ~20,000 users need to take a Moodle quiz (once a year), MC qtype only. Something like an entrance exam. And all of them should start within a period of approx. 5 minutes. Quizzes are heavy on the DB :/. What is the best solution?
19:49
Scale up? But the current installation works OK for the rest of the year.
19:49
optimize/ strip quiz module?
MR
19:51
Michael Ruder Rapajic
Can you describe your current instance/architecture?
BP
19:52
Branko Petric
1xLB + 6xWeb + 2xRedis + 3xDB (split R/W), simplified
19:53
Memcached also for app cache
19:54
Moodle 3.7 need some upgrade :D
20:05
Or maybe to develop something light and external and use something like LTI to connect.... (feedback and results can be returned few hours after quiz ends)
TH
20:33
Tim Hunt
20,000 simultaneous quiz attempts is a lot. This year I have seen MoodleMoot talks where people were talking about a few thousand. That works, but is probably the limit without extreme measures.
20:34
One interesting finding was that it is not actually the quiz that puts the most load on the system. It is everyone logging in, going via the dashboard, and navigating to the quiz that really kills the server.
20:35
Also, it is worth getting to at least Moodle 3.9. MDL-67183 is a worthwhile performance improvement.
M
20:35
MoodleBot
MDL-67183 - Change the question_attempt class to only call apply_attempt_state if necessary
Status: Closed - Fixed
Assignee: Tim Hunt
Reporter: Tim Hunt
Integrator: Víctor Déniz Falcón
Fix Versions: 3.9
Votes: 3
https://tracker.moodle.org/browse/MDL-67183
TH
20:36
Tim Hunt
Whatever else you do, you need to work out how to use load-testing tools to simulate that amount of load in testing, before you do it for real.
NK
20:54
Nadav Kavalerchik
M
20:54
MoodleBot
MDL-70285 - The MDL-69687 upgrade step kills large databases
Status: Closed - Fixed
Assignee: Tim Hunt
Reporter: Tim Hunt
Integrator: Jake Dallimore
Fix Versions: 3.8.7, 3.9.4, 3.10.1
Votes: 3
BP
21:02
Branko Petric
In reply to this message
The dashboard can be skipped; logging could maybe go to an external database for that day, or be disabled (which is not recommended)
21:05
That is what I meant by stripping the whole of Moodle: disable everything I can, create a light theme with no config, no blocks...
MB
21:06
Martin Božič
In reply to this message
Redirecting from the block laden dashboard to classic home page helped us tremendously. Actually the heaviest DB hitter is the Calendar IMO.
BP
21:07
Branko Petric
Yes, calendar is heavy also.
MB
21:10
Martin Božič
In reply to this message
No need to disable anything if they come via link pointing directly to the quiz?
21:11
In such one-off case.
21:11
Matej Žerovnik
Maybe, for the time being, redirect directly to quiz after successful login of that is possible?
21:11
s/of/if
BP
21:14
Branko Petric
In reply to this message
Yes, it is possible. I can enrol all users up front or create an activity on the front page with some permissions.
21:15
Matej Žerovnik
In any case, I think you can do a good simulation of the real world with custom JMeter tests for the quiz, with random delays drawn with some standard deviation. That should get you close to reality
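The idea of random delays with a standard deviation can be sketched outside JMeter as well. The snippet below draws per-user "think times" from a normal distribution and clamps them to a minimum; the mean, deviation and minimum are assumed values for illustration and would need to be tuned to observed student behaviour.

```python
import random

def think_times(n, mean_s=8.0, stddev_s=3.0, min_s=0.5, seed=None):
    """Return n pause lengths (seconds) drawn from a clamped normal distribution."""
    rng = random.Random(seed)  # seeded generator gives reproducible test runs
    return [max(min_s, rng.gauss(mean_s, stddev_s)) for _ in range(n)]

# 1000 simulated pauses between two page requests of a virtual user.
delays = think_times(1000, seed=42)
average_delay = sum(delays) / len(delays)
```

JMeter ships a Gaussian Random Timer that serves the same purpose inside a test plan.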
BP
21:17
Branko Petric
It will be tested for sure. But what is a backup solution if tests fail 😃
TH
21:18
Tim Hunt
Redesign the test so you don't need all 20000 students to attempt it at the same nanosecond.
21:19
For example, make several random variants of each question, and allow people to do it any time they like within a few days.
21:19
Matej Žerovnik
In reply to this message
I think when people say 20k users will do the test, they mean 20k users will be taking the test at the same time, but that doesn't necessarily result in 20k req/s
TH
21:20
Tim Hunt
No, but the people I have seen succeeding are at ~3000 students.
21:20
Our Moodle system handles ~100 req per second for main PHP scripts. (not counting CSS/JS/icons)
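To make the distinction between concurrent users and requests per second concrete, here is a back-of-the-envelope calculation. The 30-second gap between page requests is an assumption, not a measured value; the 20,000 users and the 5-minute start window come from the question above.

```python
def estimated_rps(users, seconds_between_requests):
    """Average request rate if each user sends one request per interval."""
    if seconds_between_requests <= 0:
        raise ValueError("interval must be positive")
    return users / seconds_between_requests

# Steady state: 20,000 students, each loading a quiz page every 30 s (assumed).
steady_rps = estimated_rps(20_000, 30)

# Start of the exam: 20,000 students beginning within a 5-minute window.
starts_per_second = estimated_rps(20_000, 5 * 60)
```

Under these assumptions the steady-state load is several hundred main-script requests per second, far below 20k req/s but well above the ~100 req/s figure quoted above.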
BP
21:21
Branko Petric
I thought of that, as a small JS script on the button that links to the test, to start it automatically after a random time..
TH
21:21
Tim Hunt
There is a quiz access rule plugin which implements that properly.
BP
21:22
Branko Petric
Missed that :)
BP
21:23
Branko Petric
Oh, as a plugin, not in core; that is why
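The staggered-start idea discussed above can be sketched as follows. This is not how the plugin mentioned works (its internals are not described in this chat); it only shows one way to give every student a stable pseudo-random delay inside a window. The function name, the hashing scheme and the 300-second window are assumptions.

```python
import hashlib

def start_delay_s(user_id, quiz_id, max_delay_s=300):
    """Stable per-user delay in [0, max_delay_s], derived from a hash."""
    digest = hashlib.sha256(f"{quiz_id}:{user_id}".encode()).digest()
    # The first 8 bytes give a large integer; reduce it into the window.
    return int.from_bytes(digest[:8], "big") % (max_delay_s + 1)

# Delays for 1000 hypothetical user ids on hypothetical quiz 7.
delays = [start_delay_s(user_id, 7) for user_id in range(1000)]
```

Deriving the delay from the user and quiz ids means a student who reloads the page gets the same delay again rather than a new random one.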
21:32
Matej Žerovnik
I think that is the best one can do without coding
21:37
Move memcached to local instances to get better performance and lower load on the network
21:43
when you are running tests, try to get some profiling going as well (such as newrelic or alike) to see where the requests spend most of the time and try to optimize that if possible
21:43
we found some bottlenecks there when we enabled redis
21:43
newrelic
16 December 2020
AN
01:50
Andrew Nicols (Moodle)
In reply to this message
I think it's one of the Melbourne Unis who has been doing something similar. They've moved almost all of their exams online and most are taken simultaneously.
01:57
I may be thinking of this presentation: https://www.youtube.com/watch?v=XRTM1XuhgBA
01:59
That's scaling to 4,500 seats
01:59
But it has some good takeaways and is well worth the watch. May also be worth getting in touch with Cliff
BH
01:59
Brendan Heywood (Moodle)
Monash eAssessment is one of ours
AN
02:02
Andrew Nicols (Moodle)
Is that with Darren Coco?
BH
02:02
Brendan Heywood (Moodle)
yes
Andrew Nicols (Moodle) invited Darren Cocco
AN
02:03
Andrew Nicols (Moodle)
Hope you don't mind being yanked in here Darren but I thought you may find this channel helpful too
DC
02:03
Darren Cocco
lol, the first thing I noticed is the presentation my boss made about our systems 😔
AN
02:03
Andrew Nicols (Moodle)
Is that a bad thing?
DC
02:04
Darren Cocco
No, it means I probably belong here.
02:05
We have load tested to 15,000+ IIRC and that ran without problems.
02:06
The biggest challenge becomes the DB. Looking at multi-master distributed DBs for the next iteration.
AN
02:06
Andrew Nicols (Moodle)
aka voodoo
DC
02:07
Darren Cocco
Also we customised a few of the Moodle Quiz DB queries to optimise them.
02:07
Ran them through the postgres genetic analyser etc.
02:07
Yes indeed voodoo.
02:08
Luckily we have some DB engineers :)
02:09
Also we have a data driven approach to optimisation.
02:12
Oh yeah the eAssessment service also has a lower tolerance for data loss which amplifies the amount of DB writes/reads a bit.
02:12
*shrugs*
TH
10:21
Tim Hunt
In reply to this message
Thanks, Andrew, but, like I said, that is at the few thousand simultaneous test takers level.
AN
10:21
Andrew Nicols (Moodle)
Darren did say they'd tested with 15,000
TH
10:23
Tim Hunt
In reply to this message
This is the bit I am more interested in. Why have I not seen tracker issues off the back of this? (Please)
10:27
I think the remaining big win to be had in quiz performance is MDL-68806
M
10:27
MoodleBot
MDL-68806 - New quiz attempt states to support creating or grading them asynchronously
Status: Open
Reporter: Tim Hunt
Votes: 3
https://tracker.moodle.org/browse/MDL-68806
DC
10:27
Darren Cocco
Because we weren't interested in expending the effort to upstream our changes,
TH
10:28
Tim Hunt
But you were happy to free-load off 15 years of my and the OU's work. Come on. Be a team player, otherwise Open Source does not work.
10:28
I am not saying polish a patch for integration.
10:29
I am saying spend 15 minutes to open a tracker issue with the information about what you found that worked for you.
DC
10:29
Darren Cocco
We have a very small dev team and only so many hours available to us. We are already stretched to our limits and are constantly on the razor's edge of burnout.
TH
10:30
Tim Hunt
That is bad, and something your managers should not be tolerating.
10:30
Particularly if you are supplying mission-critical services to your university.
DC
10:32
Darren Cocco
The patches are not small and related to past patches that you have already rejected.
TH
10:33
Tim Hunt
But you found some sort of bottleneck. Where is that?
DC
10:34
Darren Cocco
I am not enthused by having to expend all the time necessary to push for patches that are not relevant to the current state of core.
TH
10:34
Tim Hunt
Once again: I am not insisting that you share patches.
DC
10:34
Darren Cocco
They were postgres specific optimisations to the QUBA object construction.
TH
10:34
Tim Hunt
I am saying that when you identify bottle-necks, you should share that knowledge.
DC
10:35
Darren Cocco
We all already knew about this bottle neck.
10:35
At least I hope that we already knew about the bottle neck in the massive QUBA query.
TH
10:36
Tim Hunt
Was it basically MDL-60202
M
10:36
MoodleBot
MDL-60202 - Consider eliminating the question_attempt_step_data table (store in a JSON blob in question_attempt_steps instead)
Status: Open
Reporter: Rockstar04
https://tracker.moodle.org/browse/MDL-60202
DC
10:36
Darren Cocco
Yes, that tracker item would be related to it,
10:36
That query is nasty.
TH
10:37
Tim Hunt
By the way, MDL-67183, which I did for 3.9, is a worthwhile performance win - particularly for question types like STACK.
M
10:37
MoodleBot
MDL-67183 - Change the question_attempt class to only call apply_attempt_state if necessary
Status: Closed - Fixed
Assignee: Tim Hunt
Reporter: Tim Hunt
Integrator: Víctor Déniz Falcón
Fix Versions: 3.9
Votes: 3
https://tracker.moodle.org/browse/MDL-67183
DC
10:39
Darren Cocco
We are also working on reducing the number of steps that actually need to be retrieved from the DB. Less steps processed, less data to retrieve, less time to construct QUBA.
TH
10:40
Tim Hunt
Which, presumably, only works for attempts using 'Deferred feedback' behaviour.
DC
10:40
Darren Cocco
Which is the exact feedback behaviour we use for exams...
TH
10:41
Tim Hunt
Right, but effectively, you are forking Moodle for your specific use case. Your privilege, but not helpful for core.
DC
10:41
Darren Cocco
We are looking at forking off from quiz entirely, because we are finding that the more specific use cases of exams versus the more general use cases of quiz are forcing us to make unnecessary trade-offs.
JT
10:41
Jordan Tomkinson
Tim just about every partner has custom patches they do not contribute
DC
10:42
Darren Cocco
Tim do you use SEB?
TH
10:42
Tim Hunt
No.
10:44
Right, so you have spent the discussion here saying that the size of the QUBA query is too big, but, you were also complaining that I would not take one patch of yours, which is MDL-56317, which makes that query even bigger. I think you can see why I was reluctant to add that to core.
M
10:44
MoodleBot
MDL-56317 - Core changes to support retained auto-saves and response undo in quizzes
Status: Development in progress
Assignee: Darren Cocco
Reporter: Darren Cocco
Votes: 2
https://tracker.moodle.org/browse/MDL-56317
DC
10:47
Darren Cocco
Now you are twisting my words. I said I was unwilling to put in the effort to release optimisations that were built on top of code that was already rejected from core.
TH
10:47
Tim Hunt
OK.
10:48
But, is it basically the one postgres-specific win that you are running in one key query?
DC
10:48
Darren Cocco
Honestly it doesn't matter to me if you reject any patches that I make.
TH
10:49
Tim Hunt
If so, please could you just copy-paste your modified function into a Gist, or similar, so I can see the general approach you used?
DC
10:49
Darren Cocco
Probably, it has been a few years since we optimised that part of the code.
TH
10:51
Tim Hunt
I think @dobedobedoh's preference for that sort of code is to split the single large query into two - to reduce the amount of redundant data sent over the wire.
DC
10:52
Darren Cocco
That is not something I am authorised to do. I can tell you that IIRC it involved using window functions to cut down on the number of rows returned by eliminating records early in the join process. This resulted in less data over the wire and less steps to process.
10:52
But I wrote this in the Moodle 3.1 days.
10:52
So my memory is a little bit fuzzy.
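As a generic illustration of the technique described (a window function that discards rows early so fewer are returned), the sketch below keeps only the latest step per attempt. It uses SQLite through Python and a hypothetical, simplified `steps` table; it is neither the real Moodle question engine schema nor the unpublished optimisation discussed here.

```python
import sqlite3

# Hypothetical, simplified table; NOT the real Moodle question engine schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE steps (attempt_id INTEGER, seq INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO steps VALUES (?, ?, ?)",
    [(1, 0, "todo"), (1, 1, "complete"), (1, 2, "gradedright"),
     (2, 0, "todo"), (2, 1, "complete")],
)

# Rank steps within each attempt, newest first, and keep only rank 1.
# Window functions require SQLite 3.25 or newer.
LATEST_STEP_SQL = """
SELECT attempt_id, seq, state
FROM (
    SELECT attempt_id, seq, state,
           ROW_NUMBER() OVER (
               PARTITION BY attempt_id ORDER BY seq DESC
           ) AS rn
    FROM steps
) AS ranked
WHERE rn = 1
ORDER BY attempt_id
"""

latest = conn.execute(LATEST_STEP_SQL).fetchall()
```

Here five stored rows shrink to two returned rows; applied before a large join, the same filtering reduces both the data sent over the wire and the number of steps the application has to process.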
TH
10:52
Tim Hunt
I'm not asking you to remember how it works.
10:53
Just copy-paste a bit from question/engine/datalib.php
DC
10:54
Darren Cocco
I don't own the copyright to that code. I have not had the code licenced to me. You are asking me to place my job at risk. Your request is unreasonable.
TH
10:54
Tim Hunt
No. It isn't.
10:54
This is GPL code.
DC
10:55
Darren Cocco
Well then you are welcome to file a lawsuit against my employer.
10:55
At the same time I recommend you file one against all the Moodle partners as well.
TH
10:55
Tim Hunt
No. that is not the point.
10:56
GPL does not require you to share source for your changes.
JT
10:56
Jordan Tomkinson via @gif
[GIF animation, not included in the export]
TH
10:56
Tim Hunt
If you are not otherwise publishing them.
10:56
It's the classic Totara problem.
DC
10:56
Darren Cocco
Congrats, we don't publish them.
TH
10:57
Tim Hunt
However, most of the Moodle world, fortunately, goes by the ethos of Open Source.
10:57
And so, Moodle gets better.
DC
10:57
Darren Cocco
Every single Moodle partner does this.
TH
10:57
Tim Hunt
I know.
DC
10:57
Darren Cocco
So Tim, my response to you is a flat out unequivocal no.
TH
10:57
Tim Hunt
OU does not share every Moodle change we do.
DC
10:58
Darren Cocco
If you would like a different response. I am happy to get our legal counsel involved.
TH
10:58
Tim Hunt
However, we will often share snippets with people who are interested.
DC
10:58
Darren Cocco
You evidently have freedoms that I do not.
TH
10:58
Tim Hunt
No. If that is your attitude, you are welcome to carry on, on your own.
10:58
Yes, I am very lucky with where I work.
10:58
Sorry if you are not in such a lucky position.
DC
10:59
Darren Cocco
I like my job, sometimes I get to create public good.
10:59
Unfortunately that is not always.
TH
10:59
Tim Hunt
That sucks.
10:59
Matej Žerovnik
I'm sorry, but I think this debate got a bit too 1-1. Could you please move to private chat?
DC
10:59
Darren Cocco
Where we can convince legal that a piece of code does not offer us a COMMERCIAL ADVANTAGE we release it.
11:00
Sure
TH
11:00
Tim Hunt
Just FYI, what I have been multi-tasking with during this chat is pushing updated versions of some of our plugins to the Plugins DB.
Huong Nguyen invited Huong Nguyen
12:32
Matej Žerovnik
In reply to this message
We opened a ticket (MDL-70480) with our findings
M
12:32
MoodleBot
MDL-70480 - Some assignment submissions are invisible on the grading form on master/slave database setup
Status: Open
Reporter: Timotej Jazbec
https://tracker.moodle.org/browse/MDL-70480
oh
17:36
omer hameiri
General question..
Is your organization making any changes looking forward to next year / post covid?
Are you anticipating any rise in usage above regular times? Or expecting everything to get back to last year's numbers when all schools open up?

(We'll be facing many contract renewals in the upcoming months, and are wondering what to prepare for..)
Alieda Jeltje invited Alieda Jeltje
AS
19:30
Alistair Spark
Depends how the rollout of the vaccine goes over the next 6 months.

We have many courses which wouldn’t be able to fit into lecture theatres, so will have to keep some of the current arrangements.

However, I think 2022 will likely be a year of cost cutting...
19:30
Matej Žerovnik
I hope things go down a bit, otherwise we will need to buy new storage :)
DC
20:32
Darren Cocco
My employer has already done deep cost cuts because enrollment changes this year and next year will impact revenue for the next 5 years. We have had hiring freezes across the board. I don't expect anything to improve for a long time.
20:33
Laying off staff means paying out severance benefits.
20:34
So for the next year we will be paying for those staff while not actually having the resources to hand so that we have opex reductions in 2022.
17 December 2020
Crazeh invited Crazeh
18 December 2020
NK
14:05
Nadav Kavalerchik
I am wondering...
Was anyone asked by their CISO / DPO to encrypt Moodle DB? for privacy reasons.
TH
14:17
Tim Hunt
Not sure. It is an increasingly common way to set up systems in general.
14:17
Not really a Moodle question, because when you connect to the DB using the normal method, everything still works the same way. It is really about how you configure your infrastructure.
14:18
(I think.) Hopefully someone more knowledgeable will comment.
NK
14:27
Nadav Kavalerchik
Thanks @timhunt .
14:29
One of our universities is considering encrypting only the mdl_user table.
Is that something anyone else is considering?
M
14:30
MoodleBot
MDL-69801 - Introduce a new generic private / public certificate + password / token manager api / Vault API
Status: Open
Reporter: Brendan Heywood
Votes: 6
14:30
MDL-65818 - Security: Provide admin setting type for secure data (passwords/tokens)
Status: Closed - Fixed
Assignee: Sam Marshall
Reporter: Sam Marshall
Integrator: Jake Dallimore
Fix Versions: 3.11
Votes: 8
BH
14:43
Brendan Heywood (Moodle)
We are interested in this, but for stuff like this I always want to understand the threat model. In many situations where an encrypted DB would add value, if someone got that deep into the infrastructure they'd almost certainly have access to a front end, and so they'd be in the DB anyway
M
14:43
MoodleBot
MDL-54704 - SSL-support for connection to Postgres and MySQL Database
Status: Reopened
Reporter: Tobias Reischmann
Integrator: Andrew Nicols
Votes: 32
NK
19:45
Nadav Kavalerchik
In reply to this message
Indeed. I always feel that breaking into the Moodle OS will result in getting any information from the DB, regardless of the effort to encrypt it on the DB side. I am also concerned about the risk that something goes wrong at the hardware level and renders the entire DB unreadable, instead of just a tiny fragment of data as would happen if it were unencrypted.
MB
19:48
Martin Božič
In reply to this message
But the management gets another security compliance checkbox to tick.
NK
19:48
Nadav Kavalerchik
😞
19 December 2020
AL
20:28
Avi Levy
We are using Istio and set all connections between pods to use mTLS, and it just works. In addition, all disks in GCP are encrypted
22 December 2020
Arif invited Arif
23 December 2020
Shaheed K invited Shaheed K
24 December 2020
Wan Mohd Hafiz Wan Harun invited Wan Mohd Hafiz Wan Harun
27 December 2020
Andreas Deschka invited Andreas Deschka
28 December 2020
Mallika Valluru invited sai vineeth kumar tumu
29 December 2020
NK
14:47
Nadav Kavalerchik
AWS question...
Has anyone using AWS tried Amazon EBS Multi-Attach to attach a single Provisioned IOPS SSD (io1 or io2) volume to up to 16 Nitro-based instances in the same Availability Zone?
For shared storage for the shared moodledata folder.
I might get better shared storage performance for instances in the same zone (maybe).

More info: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html
2 January 2021
Damien MASCRE invited Damien MASCRE
4 January 2021
Çağlar MERSİNLİ invited Çağlar MERSİNLİ
5 January 2021
Tobias invited Tobias
T
20:59
Tobias
What is the largest known Moodle installation currently?
TH
21:02
Tim Hunt
A few years ago, Catalyst gave a Moot talk about an install in the Middle East with a couple of million active users.
T
21:04
Tobias
Onprem or a public cloud?
TH
21:05
Tim Hunt
Don't remember. This was about 5 years ago (the UK Moot in the Edinburgh Corn Exchange, if you want to find the date). Not sure the cloud was a viable option then.
T
21:57
Tobias
Thanks. If you have access to a very fast SAN from all Webservers (FC), is there any benefit from using memcached, mongodb or redis for session cache?
21:58
Matej Žerovnik
Yea
21:58
Redis is way faster than any san
T
21:59
Tobias
But only if it is a real bottleneck, right?
21:59
Matej Žerovnik
Redis takes a few µs to execute a command
T
22:00
Tobias
Depends on your network I would guess?
22:00
Btw I am talking about pure session cache
22:00
Matej Žerovnik
True. But even with fast FC SAN, how are you sharing data over all nodes?
T
22:02
Tobias
That's the magic of a good SAN :-)
22:02
Matej Žerovnik
What SAN has a support for shared block storage? I’m listening😛
22:03
Or better yet, which fs are you using that has support for shared block storage
T
22:03
Tobias
You mean which San file system
22:03
Yes
22:05
Matej Žerovnik
If it’s not a secret, can you share more info?
T
22:08
Tobias
To be honest I don't know which San is in use. I would expect emc2
22:09
I mean provided there is no issue in concurrent access and there is no bottleneck regarding read/write and waits - the whole setup could be simplified?
22:11
My theory is that most moodle hosting is based on limitation on hardware, e.g. no SAN or limited memory, cpus per node
22:11
Therefore 'workarounds' such as redis or memcached?
22:12
Matej Žerovnik
Ugh I don’t think that will work.
T
22:12
Tobias
Most installations are likely to be limited by inefficient queries as the first bottleneck.
22:13
Matej Žerovnik
With your setup, what is preventing 2 servers to write to the exact same file at the same time?
22:13
Or same block?
22:14
Filesystem locks are usually local to the system
22:15
Anyway, when you grow enough, you need shared redis not only for sessions, but for MUC as well
22:15
So one needs redis/memcached anyway, might as well use it for session cache
22:16
Also, it’s better to use SAN resources elsewhere as running redis is quite simple and you need little resources for the service
22:16
Even if clustered
T
22:22
Tobias
So having very high ram servers will still require a global muc?
22:24
So all global data could be efficiently cached by the database right?
22:26
Matej Žerovnik
Yes, some of the MUC must be shared
22:26
Mostly session MUC and some application MUC
22:26
Some application cache can be local (opcache)
T
22:26
Tobias
Application muc should all reside in the database?
22:27
Matej Žerovnik
Not sure, I think thats not an option
22:27
Either opcache or redis
22:27
Try to offload as much as possible to redis
22:27
As it is much faster than db or fs
T
22:28
Tobias
Or infinispan?
22:29
Matej Žerovnik
Infinispan is not a supported kv db in moodle
T
22:30
Tobias
In theory ssd raid and database would not be that much worse i would guess..
22:30
Matej Žerovnik
But it is
22:30
Redis time is in tens of µs
22:31
SSD is what, 400 µs?
22:31
NVMe drives are 100 µs if they are really good
22:32
One doesn't really need SSD drives for Moodle. We see maybe 100 IOPS on a site with 15k concurrent users, around 40k DB queries and 60k queries on Redis
22:33
If Redis has problems, everything is down, just because latency is that much higher
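To see why the per-lookup latency matters, the following arithmetic multiplies the latencies quoted above by an assumed number of sequential cache lookups per page. The figure of 200 lookups per page and the choice of 20 µs as a representative "tens of µs" value are assumptions for illustration only.

```python
def cache_time_ms(lookups_per_page, latency_us):
    """Total time spent waiting on sequential cache lookups, in milliseconds."""
    return lookups_per_page * latency_us / 1000

LOOKUPS_PER_PAGE = 200  # assumed, varies widely between pages and sites

redis_ms = cache_time_ms(LOOKUPS_PER_PAGE, 20)   # "tens of µs" per lookup
nvme_ms = cache_time_ms(LOOKUPS_PER_PAGE, 100)   # good NVMe, as quoted
ssd_ms = cache_time_ms(LOOKUPS_PER_PAGE, 400)    # SSD, as quoted
```

With these assumed numbers the same page spends about 4 ms on lookups at Redis latency and about 80 ms at SSD latency, before any file system or network overhead is added.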
T
22:34
Tobias
When would you say moodle reaches a general limit and would require splitting up in several instances or would a single instance scale forever?
22:35
e.g 1 Mio users?
22:36
Matej Žerovnik
Way before 1mio
22:36
I’m not the correct person, but I think we could run 20k concurrent on a single node
22:37
With a really fast db server (48 cores with lots of memory)
22:37
But it really depends on what modules / plugins users are using
22:37
One can handle 100k with this hw or 5k
T
22:38
Tobias
Above you meant per second right? 15k users and 40k DB per second?
22:38
Yes, it all depends on modules. One bad query can kill it all
22:39
So for scaling across many servers with many users, MSSQL probably makes sense, to scale tables and indices across many servers?
22:40
Whereas MySQL can just partition tables
TH
22:40
Tim Hunt
A lot of large scale moodle sites use Postgres.
22:41
Has very good concurrency, extremely good query optimiser, and extremely good at never corrupting your data.
T
22:42
Tobias
And less license costs for sure
22:43
Matej Žerovnik
40k concurrent users for us is 40k users active in last 5min
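The definition given above (concurrent users are users active in the last 5 minutes) is easy to express in code. In this sketch the mapping from user to last-access timestamp is a hypothetical input; how those timestamps are collected is not covered here.

```python
def concurrent_users(last_access, now, window_s=300):
    """Count users whose last activity falls within the last window_s seconds."""
    return sum(1 for ts in last_access.values() if 0 <= now - ts <= window_s)

# Hypothetical last-access times (seconds); "now" is 1300.
last_access = {"alice": 1000, "bob": 1250, "carol": 600, "dave": 1299}
active = concurrent_users(last_access, now=1300)
```

Here carol, last seen 700 seconds ago, is not counted, so three of the four users are "concurrent" under this definition.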
22:43
We are running on mysql without a problem for now
T
22:44
Tobias
Whats your slowed page load?
22:44
Slowest
22:44
Matej Žerovnik
No idea. Probably calendar with around 10s
T
22:44
Tobias
That's pretty slow yes.
22:44
Matej Žerovnik
95th percentile is around 200ms
22:44
Maybe a little less
T
22:45
Tobias
You can't scale a site with a page that takes 10s? Probably other issues to be resolved first?
22:45
Matej Žerovnik
So worst is calendar, but that is a known issue and is being worked on my moodle.
T
22:45
Tobias
Ok I see
22:45
Matej Žerovnik
My=by
T
22:46
Tobias
Hopefully just a missing index or a bad union join
22:47
Ok guys, thanks for the info - I am out.. See you
6 January 2021
Sherif Sharabi invited Sherif Sharabi
DM
11:19
Damien MASCRE
Hi! What connection pooler do large sites use in front of their Postgres, if any? I find that my performance is only affected by concurrency on some tables (log, quizzes), and that limits the maximum number of concurrent users I can have taking a quiz (for example) at the same time. I have been using MariaDB (10.3, then 10.5), and I find it goes the same way with Postgres (I tried 11) when you use quizzes with a large number of people (2 to 4 thousand is large for me).
MJ
11:28
Mark Johnson
In reply to this message
We use pgbouncer on-site, RDS Proxy on AWS.
DM
11:32
Damien MASCRE
Thanks! Which ratio do you use to compute how many threads/connections you manage with it? I have two Postgres servers (or MariaDB, depending on what I am testing), using split read-write with Moodle 3.9, on two 16-vCPU / 48 GB RAM machines. I set max_conn to 32 on Postgres and aligned pgbouncer with it; pgbouncer is configured to handle 4000 connections, and with that I can't scale much on quizzes.
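The ratio asked about can be worked out from the numbers in this message. The sketch assumes the 4000 figure is the pooler's client-side limit (pgbouncer calls this `max_client_conn`) and the 32 figure is the number of server connections per database server; that mapping is an interpretation of the message, not a confirmed configuration.

```python
def clients_per_server_connection(max_client_conn, server_connections):
    """How many client connections share one real database connection."""
    return max_client_conn / server_connections

def server_connections_per_core(server_connections, cores):
    """Real database connections available per CPU core."""
    return server_connections / cores

multiplex_ratio = clients_per_server_connection(4000, 32)
per_core = server_connections_per_core(32, 16)
```

With these numbers up to 125 clients can queue behind each real connection, and each core serves 2 connections, so long-held locks on busy tables quickly translate into waiting clients.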
MJ
11:34
Mark Johnson
We didn't really compute how it's configured, we just run load testing and tweak the configuration until we eliminate the bottlenecks.
DM
11:38
Damien MASCRE
That's what I have been doing for several days now... I can run lots of users on just about everything (but we don't use calendars :-P). The quizzes are the problem: there is a lot of concurrency on the quiz_attempts and questions_* tables, and increasing the number of cores does not help, since there are locks held on those tables. I have been experimenting with partitions on MariaDB, but apart from decreasing the time it takes to calculate the index changes after a transaction, it does not increase parallelism, and I have not seen Postgres do better in that area. I may be wrong though, and I have not found anything saying that Postgres is able to parallelize queries when there are partitions.
11:41
I sometimes come to the conclusion that not many people run quizzes for so many users at once (4000 at the same time) and that they have all found ways to split them into smaller groups at different times.
AS
12:36
Alistair Spark
In reply to this message
I believe that’s still the biggest in number of users.
In terms of system scaling, that doesn’t necessarily translate to the ratio of concurrent users we would see in HE though.

But site with the most concurrent users is probably one that Catalyst AU looks after
TH
12:42
Tim Hunt
In reply to this message
If you are interested in large quiz performance, look at some of the MoodleMoot presentations which came out last year. People are successfully managing a few thousand.
12:42
With care and planning.
12:43
And Thomas Muraz found that the quiz was not that bad. In his testing, the quiz was fine. The thing that killed the servers was everyone logging in and navigating via the dashboard.
12:44
Obviously, quiz is write heavy by its very nature. However, modern locking strategies mean it should cope. Data for each student is separate in the tables
DM
12:49
Damien MASCRE
I put together a Gatling scenario to benchmark. It logs in several thousand users (the sessions, logstore and recentlyaccessedblocks tables are heavily used, and it is the biggest toll on performance, but in practice users usually arrive scattered on the platform, 2000 to 3000 new logins in under 5 minutes), which makes it a barrier. Then they click the course and then the quiz, which is another barrier. Once startattempt has been played, performance is better from there on. The scenario plays three quizzes (one 30-MCQ and two 60-MCQ) in parallel, then plays the quiz submit and logout.
12:50
Thanks I'll browse through the presentations !
Daril B invited Daril B
NK
21:26
Nadav Kavalerchik
I know of a few universities that run a special Moodle server for the "End of semester" quizzes that is optimized only for taking quizzes, i.e. no redundant plugins installed, no sophisticated theme, no calendar (blocks), an extra quiz access rule that randomly delays the start of the quiz attempt for each student, quiz questions split into pages if possible, and more tweaks...
21:29
AWS question...
Has anyone using AWS tried Amazon EBS Multi-Attach to attach a single Provisioned IOPS SSD (io1 or io2) volume to up to 16 Nitro-based instances in the same Availability Zone?
For shared storage for the shared moodledata folder.
I might get better shared storage performance for instances in the same zone (maybe).
Though it should be used with a clustering FS like GFS or others, which might bring the latency back to the levels I get when I use EFS (NFSv4 based).

More info: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html
n
22:39
nizam
We are successfully using the architecture below for a very large user base, with up to 10000 concurrent users.
22:40
7 January 2021
Luis de Vasconcelos invited Luis de Vasconcelos
oh
09:44
omer hameiri
In reply to this message
Was latency with efs not an issue for you?
09:44
Regarding cache, web files, etc...
09:45
We've struggled with this issue and eventually moved to nfs
n
09:55
nizam
No issue till now
NK
12:09
Nadav Kavalerchik
I find the EFS latency an issue for Moodle code, so we moved it locally into each node in the ASG EC2 instances
12:13
Matej Žerovnik
what is the latency of EFS volumes?
NK
12:26
Nadav Kavalerchik
12:31
Matej Žerovnik
time is in ms, right?
NK
15:18
Nadav Kavalerchik
Not sure :(
This is from Moodle cachestore testing UI
AN
16:18
Andrew Nicols (Moodle)
In reply to this message
I’ve always gone for local code. As long as you can ensure it is in sync
16:18
In reply to this message
Yes
MJ
16:35
Mark Johnson
In reply to this message
+1 we found that having the code on a shared filesystem was a disaster
16:36
I build it into our containers
DM
16:55
Damien MASCRE
You can mount a tmpfs and rsync all the code in it, and you warm up the opcache, it is fast !
NK
18:40
Nadav Kavalerchik
In reply to this message
Eventually this is also what we do. We do not use rsync, as we have a script that does a git pull on all the Moodle instances, on each EC2 in the ASG cluster.

I wished I could be lazy and keep the code in one shared storage, but even OPcache with "do not check for timestamps" was not good enough to save the day, as Moodle functions need to scan the disk for plugins from time to time and do other disk IO.
18:41
In reply to this message
We'll get there, one day. I think it is the best way.
oh
19:14
omer hameiri
In reply to this message
So why not one central nfs for moo code? Mapped for all instances..
19:14
Works great for us anyway
AL
20:03
Avi Levy
In reply to this message
For containers this is the best solution. It also lets you manage your code by tags and gives you fast rollout and rollback (as long as there are no version or DB changes).
NK
23:08
Nadav Kavalerchik
In reply to this message
I will try. thanks for the advice.
I thought EFS, which is NFSv4, was supposed to be the same.
Have you tried https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html ?
AN
23:52
Andrew Nicols (Moodle)
In reply to this message
Sorry. Not rsync. Have done it via deb packages before for example or by clones of images
oh
23:56
omer hameiri
In reply to this message
Nope. Haven't tried it. NFS was so easy to set up that we just stuck with it.
8 January 2021
Job Céspedes invited Job Céspedes
NK
11:47
Nadav Kavalerchik
In reply to this message
Great, thanks! I will try this over the weekend.
9 January 2021
Deleted invited Deleted Account
SG
13:19
Srinivas Gundam
Hi Guys, need your suggestion for a problem we are facing with Moodle installation
13:20
Moodle is installed on Cloud on a single server with 32GB RAM, 8 CPU, 5 Gbps network. With performance testing some issues are observed, no memory or CPU limits reached.
13:20
We have executed multiple tests.
The breaking point for the LMS application is between 35 and 49 users, where users start reporting errors (a zigzag pattern, observed when the test was executed with more than 49 users after a change was performed). The application reports errors at a regular interval (2 minutes) once the number of users/connections crosses the 'breaking point'.
oh
13:21
omer hameiri
Cache? Spec of DB server?
SG
13:59
Srinivas Gundam
Redis Cache
14:00
Entire Moodle application including DB is using the 30GB RAM
oh
14:01
omer hameiri
You mean DB is in the same machine as web app?
14:01
On*
14:02
35 users is a very low number..
14:02
Maybe misconfigured db? Low connection limit?
SG
14:04
Srinivas Gundam
Yes same server
14:04
Let me check the connection limit
oh
14:07
omer hameiri
Doesn't sound very scalable...
Why not separate db and web servers?
It's the easiest thing to do and can translate into a big improvement..
SG
18:09
Srinivas Gundam
In reply to this message
Apologies, correction: there are 2 separate servers, one for the web server and one for the database, each with these specs: 30 GB RAM, 4-core CPU, 2.4 GHz clock speed, 300 MBps network bandwidth, 225 GB SSD.
18:10
In reply to this message
Is there a way to check the low connection limit?
19:58
Matej Žerovnik
What web server? PHP version 7+? OPcache enabled? What is the page load time? When hitting the limit, is any CPU on any server at 100%? What about IO, any iowait/storage bottleneck?
10 January 2021
SG
06:48
Srinivas Gundam
Our LMS System Benchmark plugin score - you can share this in telegram group
06:49
1. Benchmark - Done
2. Redis Server, Configurations - Done
3. Nginx, Apache 2 - Done
4. php APC - Done
5. apache2/mods-available/mpm_prefork.conf - MaxRequestWorkers - 500 - Done
06:55
06:55
With 140 users
06:55
06:55
With 35 users
SG
07:45
Srinivas Gundam
Web Server : Apache, php version
Opcache : we need to check disable/enable?
Page load time / limit etc: benchmark plugin results is sufficient
No storage bottleneck yet,
IO: package in/out status
S
11:25
Sheeban
Hello / Hola,
Need some help!
My filedir has taken up a large amount of space, around 325GB, the whole partition. I tried removing the assignments from the UI but still could not free any space. Can someone please help me delete those files, say the orphaned files?

I am using Moodle 3.9+
NK
11:27
Nadav Kavalerchik
IM
11:28
Ivica Matotek
If you have the Recycle bin activated then it can take some days to clear space from filedir.
S
11:31
Sheeban
In reply to this message
No this isn't activated,
Can you please guide me to help get it activated.
IM
11:48
Ivica Matotek
Site administration -> Plugins -> Admin tools -> Recycle bin
11 January 2021
DM
08:37
Dan Marsden
Also check out report_allbackups and report_coursesize plugins which can help track down where the large files are
S
09:30
Sheeban
In reply to this message
Thanks a lot
09:30
In reply to this message
Sure thanks for the information
AS
13:42
Alistair Spark
Has anyone setup Cloudflare in front of Moodle & had issues with scorm?
TH
13:43
Tim Hunt
Moodle HQ use cloudflare, I believe.
13:43
I don't think there is much SCORM on moodle.org
AS
13:45
Alistair Spark
It’s extra weird as it seems more like a conflict with new relic
T
14:02
Tobias
I guess there is newrelic support from cloudflare itself
SG
16:28
Srinivas Gundam
Dear Team, any further suggestions on the issue?
T
16:29
Tobias
New Relic Browser - Cloudflare Apps
https://www.cloudflare.com/apps/new-relic-browser
16:29
This does not work? If manually added maybe it is the js compression
16:30
Or aggregation
John Azinheira invited John Azinheira
AS
21:42
Alistair Spark
We’ve already got NR pro installed in the containers & currently testing CF in front of everything but it looks like the issue might be limited to only a few scorm packages after all. Seems NR might be proxying some Moodle php/js pages.
Will have a proper look tomorrow
T
21:43
Tobias
Try to define a rule to exclude the newrelic agent js from aggregation or async rewriting
21:44
Or ask newrelic support :-)
T
23:07
Tobias
Regarding CF, ensure you set the cache directives correctly on origin side. Once logged in, better don't cache anything or you will end up in trouble.
23:10
(apart from static resources of course but compression, image optimization, aggregation and better protocols often give you a big boost anyway)
12 January 2021
BH
00:44
Brendan Heywood (Moodle)
When we look at the long tail of slow requests we see a large number of pages which can take tens of minutes to run: various reports and exports. Most CDNs have some sort of hard-coded network inactivity limit, usually around the 2-5 minute mark, which is a complete show stopper. E.g. CF in front of moodle.org causes this exact class of issues, like https://tracker.moodle.org/browse/MDLSITE-6129. As a result we've more or less hand-rolled our own CDN which is Moodle-aware, but long term we want to hunt down all these dodgy slow scripts and make them all async and/or streaming.
M
00:44
MoodleBot
MDLSITE-6129 - Timeout on uploading new version of plugin to plugins directory
Status: Open
Reporter: Justin Hunt
Votes: 5
DM
00:49
Dan Marsden
I do occasionally wonder if bringing back the "pathtozip" stuff would fix that too... using the CLI to unzip rather than PHP.
LG
01:29
Lee Goldsworthy
In reply to this message
I could look into this deeper for our other systems, but can confirm we don't use it for MC
AN
01:30
Andrew Nicols (Moodle)
In reply to this message
We use it for moodle.org
01:30
And some other sites
T
08:44
Tobias
Why don't you use a subdomain for this or a path (mod_rewrite) and exclude cf rules in those cronjobs that call URLs?
BH
14:37
Brendan Heywood (Moodle)
In the past we've set up stuff like this, e.g. reports.moodle.client vs moodle.client. But it is at best a clunky workaround and has a lot of downsides, e.g. session sharing. This works OK for slow API calls but not as well for slow human-facing UI. The best outcome is a strong application-level guarantee that all requests will either respond within a certain time or stream with a maximum network inactivity time between chunks. Let's not forget that the UX here is plain bad: who wants to wait 30 minutes for a report or action? Honestly, a minute is bad enough. We need to just make it all much faster or async.
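A rough sketch of the "stream with a maximum network inactivity time between chunks" guarantee, written in Python rather than Moodle's PHP and not based on any Moodle code. The names `stream_with_heartbeat`, `HEARTBEAT` and `slow_report` are hypothetical; the idea is only that a slow job runs in the background while the response emits a small keep-alive chunk whenever nothing has been sent for `max_idle` seconds.

```python
import queue
import threading
import time
from typing import Callable, Iterator

HEARTBEAT = "heartbeat\n"  # hypothetical keep-alive chunk


def stream_with_heartbeat(job: Callable[[], str], max_idle: float) -> Iterator[str]:
    """Yield heartbeat chunks until job() finishes, then yield its result.

    No gap between two yielded chunks exceeds roughly max_idle seconds.
    """
    results: queue.Queue = queue.Queue()

    def run() -> None:
        # Hand either the result or the exception back to the consumer.
        try:
            results.put(("ok", job()))
        except Exception as exc:
            results.put(("error", exc))

    threading.Thread(target=run, daemon=True).start()
    while True:
        try:
            kind, value = results.get(timeout=max_idle)
        except queue.Empty:
            yield HEARTBEAT  # nothing ready yet: keep the connection active
            continue
        if kind == "error":
            raise value
        yield value
        return


def slow_report() -> str:
    """Stand-in for a slow report or export."""
    time.sleep(0.3)
    return "report done\n"


chunks = list(stream_with_heartbeat(slow_report, max_idle=0.05))
```

This only keeps an intermediary's inactivity timer from firing; it does nothing for the user experience of waiting, which is why making the work async or faster is still the better fix.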
SG
14:46
Srinivas Gundam
In reply to this message
Hi @matejz omer any suggestion
13 January 2021
LG
09:53
Lee Goldsworthy
In reply to this message
Looks like spam to me, someone remove this user?
MJ
10:18
Mark Johnson
In reply to this message
NK
10:25
Nadav Kavalerchik
Thank you @marxjohnson & @LeeGoldsworthy . I removed the spam messages.
Mehrdad Shahsavari invited Mehrdad Shahsavari
NK
10:28
Nadav Kavalerchik
BTW, if anyone from Moodle HQ or others would like to be a co admin... please PM me.
As I think it would be better to have a few admins on this group.
A
14:33
Arif
Hi everyone 👋.
I have been dealing with 40k users and a maximum of 2k concurrent users.

We're hitting issues at around 500-600 users doing quiz/file submission.
Our DB hit the limit when using 32 cores.
Bumping up to 64 cores doesn't help.
The slow query log shows mdl_files queries on itemid.


Our spec right now

6 x 4core 16GB moodle all have cron.php
Nginx+php-fpm

1 x 4core 8GB management have cron.php
Nginx+php-fpm

1 x 6GB redis
3TB azure file Share

Azure database for mariadb
64core
320GB memory
600GB storage

Any ideas to improve the performance?
Based on the statistics,
3.6k users used file submission and 400 users attended the quiz today.
AL
15:15
Avi Levy
Hi Arif,
I think you should start by analyzing the slow queries with heavy CPU usage. Then optimize the Moodle code or the DB indexes. This will reduce the CPU usage in the DB.
NK
16:36
Nadav Kavalerchik
@xzrian I assume you configured MUC properly, so I recommend you run BlackFire.IO when you experience heavy load, to see in real time where the bottleneck is (CPU/RAM/IO)
A
16:44
Arif
In reply to this message
Actually, I found out that select * from mdl_files where itemid = ? accounts for a lot of the slow queries
TH
16:47
Tim Hunt
What plugin is doing that?
16:48
Makes no sense unless you also specify component, context etc., and then there would be an index
16:48
If you are on Moodle 3.10, then MDL-68874 is worth knowing about.
M
16:48
MoodleBot
MDL-68874 - New optional SQL debug mode which instruments SQL with the calling PHP code
Status: Closed - Fixed
Assignee: Brendan Heywood
Reporter: Brendan Heywood
Integrator: Jun Pataleta
Fix Versions: 3.10
Votes: 6
https://tracker.moodle.org/browse/MDL-68874
AL
18:17
Avi Levy
A Simple Approach to Troubleshooting High CPU in MySQL - Percona Database Performance Blog
https://www.percona.com/blog/2020/04/23/a-simple-approach-to-troubleshooting-high-cpu-in-mysql/
18:19
This guide solved a CPU load issue for me; we reduced it from 22 cores to 8
14 January 2021
A
03:34
Arif
In reply to this message
Hi everyone.
I was checking the scheduled tasks.
Submission annotation eats a lot of resources on the backend
GP
06:02
Gunawan Prasetia
In reply to this message
Have you tweaked your MySQL configuration?
A
08:31
Arif
In reply to this message
We have tweaked it based on the recommendations only.
Not using apps
n
16:48
nizam
In reply to this message
Quiz takes lots of memory. You should look at multiple instances with high availability, and Aurora as the database
A
16:53
Arif
In reply to this message
We're running 6x 4core 16GB ram actually.

Azure database for mariadb.
64 core as per now
n
16:53
nizam
Same issue we faced earlier
16:54
So we installed the Bitnami Moodle multi-instance template
16:54
Just a single installation from Bitnami to Azure as a template stack
16:54
Please go to below url for more info
A
16:55
Arif
In reply to this message
How about the plugin installation from bitnami one?

And updating the moodle version?
Is it simpler?
n
16:57
nizam
Yes
16:57
Very very simple steps
16:57
Pls opt azure Maria dB topology
16:57
Dnt go for front end topology
16:57
Literally u dnt need to do anything
16:57
It's template ready
A
17:00
Arif
How about plugin installation?
n
17:00
nizam
It's same moodle
17:01
U can install plugins
17:01
And u Wil not face any high cpu usage issues during quiz
17:01
We have hosted 4 moodle through bitnami multitier stack
17:02
We have approximately 100000 users all over GCC
17:02
We use theme edumy for moodle for look n feel
17:03
One of the moodle instance we have customized for banking requirements
And its almost LXP features
A
17:03
Arif
🤔
n
17:03
nizam
Thanks to bitnami for providing such a wonderful template stack
A
17:04
Arif
In reply to this message
Do you use image to scale?
n
17:04
nizam
Auto scaling
17:04
Jus go through documentation
A
17:05
Arif
Let me check first
17:06
Because from my experience,
I have to update on the main VM
and then scale out to the VM scale set on Azure
17:09
Let me try first on my test environment
SK
17:09
Sudhamsh K
Okay
A
17:11
Arif
Because plugins have to be installed on every server.

From my experience,
when installing a plugin, it gets installed on one server only.
So whenever I hit another server, I get a missing plugin issue
n
17:13
nizam
U Wil not face this issue in bitnami image
A
17:13
Arif
I'm new to Moodle deployment. About 6 months for this client
17:13
Deployed using manual way.
n
17:14
nizam
U r talking about front end topology
17:14
Opt database topology
17:14
Pls read that documentation
A
17:16
Arif
Ok2
17:16
And this customer requires CentOS instead of Debian
17:18
Ok nvm
Let me do testing first
Maybe for the new client
15 January 2021
AN
02:04
Andrew Nicols (Moodle)
CentOS is dead
02:06
Also, Moodle does not recommend the use of pre-packaged Moodle environments like Bitnami - certainly not for hosting at scale. For hosting at scale you really need to be in full control of your full stack and optimise it to your specific requirements. Packaged application installations, like those from Bitnami, make a number of assumptions to suit a wide variety of installations without focusing on any one specific installation or its performance.
02:07
This is the same for any pre-packaged environment. They have their place, but certainly not at scale.
02:09
If you need to scale by running multiple instances of the same image then you probably should be looking at building your own images and tailoring them to your requirements. No point in running [x] package, or you may find it more performant to run nginx instead of httpd, and therefore increase users/node, reducing cost.
JC
03:53
Job Céspedes
In reply to this message
Hi,

We hit similar issues in early December, towards the end of the semester. The traffic pattern was similar to what you described. This is a multi-tier architecture on dedicated hardware, with similar CPU and memory provisioned. We tried an option that did not work for us. Unfortunately, we have not figured out the exact cause/solution, and traffic has decreased since then.
04:03
This is a postgres cluster with pgpool in front of it. The option we tried was to enable pgpool load balancing for read only queries. After that, a different issue hit us with assignments showing as not submitted for teachers.
04:09
For some assignments, field "latest" in table "mdl_assign_submission" was set to 0 despite field "status" in that same table being "submitted"
04:11
I hope to help a little, by sharing what did not work
A
04:24
Arif
In reply to this message
We were using it before CentOS was dead, actually
04:25
In reply to this message
How many concurrent users, actually?
04:25
In reply to this message
We didn't do clustering on our MariaDB
JC
04:26
Job Céspedes
In reply to this message
Between 1.5 to 2k (GA metric)
AN
04:49
Andrew Nicols (Moodle)
In reply to this message
Yes, but it has a very limited shelf-life. You have < 12 months to move to something that won't be bleeding edge and will have security support. Starting a new CentOS system for production now is inadvisable IMHO.
JC
04:55
Job Céspedes
@xzrian Let me correct that. It was around 1k at that moment
A
06:36
Arif
In reply to this message
Noted on that
06:37
In reply to this message
On a daily basis?
Because my Moodle sees around 400-600 on a daily basis.

During exams, the highest I have seen is around 1k.
The system is designed to handle more than 2k
JC
06:45
Job Céspedes
Those are realtime active users in Google Analytics(GA) context: "Active users are those who have sent a hit to Analytics within the last five minutes."
06:46
It was around 3k active users per hour, 20 k daily.
07:01
Traffic is usually like that (well, during the pandemic), but the quiz/submission issues arose at the end of the last semester, with high concurrent requests for those resources.
DM
09:24
Damien MASCRE
In reply to this message
See MDL-70227 for that
M
09:24
MoodleBot
MDL-70227 - latest flag not correctly set for group submissions
Status: Open
Reporter: septatrix
https://tracker.moodle.org/browse/MDL-70227
14:59
Matej Žerovnik
In reply to this message
Also MDL-70480. We were hit with exactly the same issue and had to revert to a single instance
M
14:59
MoodleBot
MDL-70480 - Some assignment submissions are invisible on the grading form on primary/replica database setup
Status: Open
Reporter: Timotej Jazbec
https://tracker.moodle.org/browse/MDL-70480
DM
15:41
Damien MASCRE
In reply to this message
I checked on my end: I already had the problem last September, when I was not using the read/write split yet. I still think the problem lies in the fact that the "latest" flag is reset when there is only one attempt. Although I haven't been able to reproduce it either.
JC
16:41
Job Céspedes
In our case, the issue had two variants in the database: 1) the previously mentioned values, and 2) field "status" in that same table "mdl_assign_submission" was set to "new" while field "latest" was "1"
16:41
Didn't know about MDL-70227 and MDL-70480. I'll take a look. Thanks
M
16:41
MoodleBot
MDL-70227 - latest flag not correctly set for group submissions
Status: Open
Reporter: septatrix
16:41
MDL-70480 - Some assignment submissions are invisible on the grading form on primary/replica database setup
Status: Open
Reporter: Timotej Jazbec
T
16:43
Tobias
If you started to develop Moodle from scratch in 2021, which technologies would you use and what generic design decisions would you change from the start? Or would you do everything exactly the same?
TH
16:45
Tim Hunt
Obviously you would not do it the same.
DM
16:47
Damien MASCRE
In reply to this message
In my opinion, I can't see why the latest flag is checked each time the get_{user,group}_submission function is called. It should be checked and corrected only when the submission is added/modified, in {save,remove}_submission
T
16:48
Tobias
The main issue with installing modules is that every new module or add-on could break it. In any case, there should be some kind of performance test prior to installation of a model, or a backend technology or design that is not affected by issues like random new tables and inefficient queries/joins. Or it could be solved by a kind of Moodle store with a quality certificate, with models marked as performance tested
16:49
Sorry autocomplete. Modules not models.
16:51
https://moodle.org/plugins/ eg. There could be an official moodle quality check of modules and then marked as certified.
JC
17:12
Job Céspedes
In reply to this message
Very nice question. I appreciate what Moodle has become. It has its own merit addressing an unrepeatable historical context. Still, I have asked myself the same question once in a while. To give my humble opinion:
- Architecture based on cloud technologies (similar to edX). No Moodledata
- NewSQL to allow more concurrency (easy to do shards, tenants)
- Different programming language
- Industry frameworks and standards (avoid in-house) to make the most of the work and collaboration from projects beyond our own
TH
17:13
Tim Hunt
Name any framework that you would be happy to use today, which existed in 2000
JC
17:13
Job Céspedes
Yep. None.
TH
17:14
Tim Hunt
If you want to build something today which is as successful as Moodle, you need to choose a framework that will exist in 2040.
17:14
Slightly scary thought.
DM
17:15
Damien MASCRE
Moodle is a framework by itself
T
17:15
Tobias
I am still using Linux which is older :-)
TH
17:15
Tim Hunt
Indeed. Worth appreciating that from time to time.
DM
17:16
Damien MASCRE
That's its strength, I think
T
17:17
Tobias
I think the biggest challenge independent of language is being able to have a flexible plugin mechanism and at the same time consistency
AS
21:09
Alistair Spark
In reply to this message
I mean it’s not really bleeding edge; it’s a couple of weeks ahead of rhel which is 5 years behind everything else
AN
23:39
Andrew Nicols (Moodle)
In reply to this message
But there’s no stable option or LTS option
16 January 2021
xSD invited xSD
AS
21:11
Alistair Spark
In reply to this message
It’s still an LTS - aligned to RHEL’s 5 year lifecycle. It’s not evergreen.
There are still parallel major versions being supported 8,9,10... for the full RHEL lifecycle.
Much more stable than Fedora, slightly less than RHEL.
17 January 2021
AN
00:22
Andrew Nicols (Moodle)
In reply to this message
As I understand it there will only be one track, not parallel major versions. But information is scarce so it’s hard to know.

https://www.techrepublic.com/article/clearing-up-the-centos-stream-confusion/

It will have daily updates and that may not appeal to all
AS
00:25
Alistair Spark
00:25
I was looking at the FAQ - https://centos.org/distro-faq/
00:31
The way I read it, centos stream gets rhel updates 3 weeks early, except for security patches which are just after / same time as rhel
00:32
(3 weeks being a finger in the air estimate)
18 January 2021
Ahmed invited Ahmed
YK
15:06
Yedidia Klein
Hi All...
Moodledata on NFS is sometimes the performance bottleneck.
There are some huge directories in moodledata that are very important, like filedir and trashdir...
and others that impact performance but are much less important in case of data loss (like cache/temp/lock/sessions etc.).
Some of these directories might be used less when Redis handles sessions and cache, but they are still in use...

Has anyone tried to use GlusterFS for these directories and NFS just for filedir and trashdir? Or maybe fast SSD NFS for cache/temp/session and slower (but with better redundancy) NFS for filedir?
Thanks for sharing your experience!!
17:15
Matej Žerovnik
We have NFS on SSD drives, but to be honest, it would probably work just as fine on 10k HDDs with HW controller and cache. There is not that much traffic on moodledata from the IOPS POV. We are seeing around 300 read / 200 write IOPS during peak usage (cca 10k concurrent users).
17:16
and 90MB/s read and 20MB/s write
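A quick back-of-the-envelope check on those figures, assuming the IOPS and throughput numbers describe the same volume at the same peak and that 1 MB = 1000 KB. The helper name is hypothetical; it simply divides throughput by operations per second to get an average request size.

```python
def avg_io_size_kb(throughput_mb_s: float, iops: float) -> float:
    """Average size of one I/O operation in KB (assumes 1 MB = 1000 KB)."""
    return throughput_mb_s * 1000 / iops


# Figures quoted above: 300 read / 200 write IOPS, 90 MB/s read / 20 MB/s write.
read_kb = avg_io_size_kb(90, 300)
write_kb = avg_io_size_kb(20, 200)
print(f"average read: {read_kb:.0f} KB, average write: {write_kb:.0f} KB")
# → average read: 300 KB, average write: 100 KB
```

Averages in the hundreds of KB suggest the load is mostly whole-file transfers rather than many small random accesses, which would be consistent with the remark that spinning disks behind a caching controller might cope.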
17:16
A
17:46
Arif
In reply to this message
Previously I wanted to use GlusterFS.

But somehow FUSE (GlusterFS) had issues.

Very low performance
T
22:51
Tobias
An open file handles issue on the operating system?
19 January 2021
Deleted invited Deleted Account
BH
11:02
Brendan Heywood (Moodle)
historically we've had lots of issues with session locking on shared file systems, using redis or db locks is way better
23 January 2021
Claudio Chacon invited Claudio Chacon
24 January 2021
A
04:51
Arif
Anyone know this feature

MDL-45264
M
04:52
MoodleBot
MDL-45264 - Assignment: Annotate PDF - Download all annotated PDFs (as Zip)
Status: Waiting for peer review
Assignee: Alexander Morris
Reporter: Trevor Cunnnigham
Fix Versions: FRONTEND
Votes: 36
https://tracker.moodle.org/browse/MDL-45264
A
05:25
Arif
In reply to this message
I'm using 3.8.4

There is no "download all annotated PDFs" option 🤔
AN
06:23
Andrew Nicols (Moodle)
It is up for review, which means that it has been developed but not merged into the product yet.

3.8 is out of mainline support so this feature will not be included there. New features are only included in a new major release, so 3.11 at the earliest
A
07:41
Arif
In reply to this message
Our user said this feature has been there on 3.6 🤔
AN
07:49
Andrew Nicols (Moodle)
In reply to this message
Not as a core feature. That's a customisation.
07:53
For starters the langstring is wrong - it has incorrect grammar. The correct version would probably be "Download all annotated submissions as a PDF"
07:53
And secondly it's not in code.
07:54
Someone has either modified core, or installed an additional grade feedback plugin with those settings
A
08:25
Arif
In reply to this message
Noted on that
26 January 2021
Bastian Thies invited Bastian Thies
NK
23:00
Nadav Kavalerchik
Some "light" reading for the weekend https://www.smashingmagazine.com/2021/01/front-end-performance-2021-free-pdf-checklist/
Maybe we can pick up some ideas that can improve Moodle performance on large scale system
27 January 2021
BH
01:26
Brendan Heywood (Moodle)
biggest thing on my front end radar is a bunch of tweaks to the filter_imageopt
JC
16:49
Job Céspedes
Hi, any experience using transaction pooling with Moodle?
16:49
Any issues with temp tables, prepared statements or advisory locks, for instance?
16:54
Besides disabling 'fetchbuffersize'
TG
18:36
Toni Ginard
We're using AWS Aurora Postgres Serverless. Don't know exactly how it works behind the scenes, but we know it uses a pool of connections. The problem we have with that service is when it scales, because during the process (one minute, more or less) new connections to the database are not possible and we get a database connection error.
18:37
According to AWS, the problem is caused by the advisory locks and the cursors with hold. We have not fixed it yet
JC
21:59
Job Céspedes
Thanks for the info.
T
22:01
Tobias
What's the connection limit? What are the stats for max threads and max connections?
22:01
Biggest question is always: base installation or additional modules installed?
JC
22:10
Job Céspedes
In reply to this message
Wondering if, for advisory locks, a different lock factory instead of the default DB one would help avoid the issues. There is a Redis lock factory, but I have not tried it before. https://github.com/open-lms-open-source/moodle-local_redislock
Gillian invited Gillian
28 January 2021
T
00:33
Tobias
Use Redis redlock http://redis.io/topics/distlock instead of Postgres advisory locks
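For readers who have not seen the pattern, here is a sketch of the single-instance lock that the linked Redis page builds on: acquire with "set if absent, with an expiry" using a unique token, and release only if the token still matches. This is not the full multi-node Redlock algorithm and not how Moodle's lock factories are implemented. `FakeKeyValueStore` is an in-memory stand-in invented for illustration so the example runs without a Redis server; with real Redis the release step would need to be done atomically on the server side.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class FakeKeyValueStore:
    """In-memory stand-in for a Redis-like store (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def set_if_absent(self, key: str, value: str, ttl_seconds: float) -> bool:
        """Mimic SET key value NX PX ttl: succeed only if key is missing or expired."""
        now = time.monotonic()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False
        self._data[key] = (value, now + ttl_seconds)
        return True

    def delete_if_equal(self, key: str, value: str) -> bool:
        """Mimic the compare-and-delete release step."""
        now = time.monotonic()
        current = self._data.get(key)
        if current is None or current[1] <= now or current[0] != value:
            return False
        del self._data[key]
        return True


def acquire(store: FakeKeyValueStore, resource: str, ttl_seconds: float) -> Optional[str]:
    """Try to take the lock; return the owner token on success, None otherwise."""
    token = secrets.token_hex(16)  # unique per holder, so only the owner can release
    return token if store.set_if_absent(resource, token, ttl_seconds) else None


def release(store: FakeKeyValueStore, resource: str, token: str) -> bool:
    """Release the lock only if we still own it."""
    return store.delete_if_equal(resource, token)


store = FakeKeyValueStore()
first = acquire(store, "cron_task", ttl_seconds=30)   # succeeds
second = acquire(store, "cron_task", ttl_seconds=30)  # fails: already held
```

The expiry means a crashed holder cannot block others forever, and the token check means a slow holder whose lock already expired cannot delete someone else's lock.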
MB
08:52
Martin Božič
Does anyone use ad-hoc server just for automated backups? I'm considering it since we have a separate cronjob host but automated backups seem to be clogging the rest of the scheduled tasks.
BH
08:52
Brendan Heywood (Moodle)
we make sure there are more runners than the backup task threshold
AN
08:53
Andrew Nicols (Moodle)
In reply to this message
Natively it doesn't allow you to do that - there's no way to tell it to only run jobs of type "x"