Insights usage varies from customer to customer, so there is no real "one size fits all" template. However, it is worth highlighting some of the features Red Hat has in place to assist with large deployments.
This is not intended to be a best-practices guide, just some things to consider.
I typically emphasize how easy it is to deploy Insights: because it is SaaS, it takes only minimal steps. For an even easier deployment on a large scale, Insights has scripts available for Puppet, Chef and Ansible to use along with our getting started guide. If a customer also happens to be managing these systems via a version of Satellite with Insights integration, mass registration of Insights is built in (via the bootstrap script provided with Satellite).
A proxy is not required for Insights, but the systems of customers with larger-scale deployments are typically already set up to communicate with Red Hat via a proxy, rather than being individually connected (or they are in disconnected environments).
Insights also supports this infrastructure model via proxy support as an option within our client, so there are no changes that you (as the customer) need to make with how your systems communicate with our servers.
If you are using a version of Satellite with Insights integration, it can be used as the proxy.
By default, Insights sends information once a day. We find that this default frequency fits the needs of most accounts.
However, you may have concerns about the impact, at scale, of all these clients checking in and transmitting a payload on your network. To help with this, we have implemented a default feature that makes the Insights client automatically stagger its check-in time, so rather than all checking in at once, clients check in individually or in random groups. All out of the box!
And if you want to take it a step further and have full control of your own schedule, the Insights client can be customized via cron (when using RHEL 6) or via systemd (when using RHEL 7.5+).
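To make the staggering idea concrete, here is a minimal Python sketch of spreading check-ins across a time window. It is only an illustration of the concept, not how the Insights client implements it; the function names, the 24-hour window and the example hostnames are all assumptions made for this example.

```python
import random

SECONDS_PER_DAY = 24 * 60 * 60


def staggered_offsets(hostnames, window_seconds=SECONDS_PER_DAY, seed=None):
    """Give each host a random delay (in seconds) inside the window.

    Illustrative only: the real client handles its own staggering.
    """
    rng = random.Random(seed)
    return {host: rng.randrange(window_seconds) for host in hostnames}


def format_offset(seconds):
    """Render an offset in seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Hypothetical hostnames, used only for the demonstration.
    hosts = [f"host{i:04d}.example.com" for i in range(5)]
    for host, offset in staggered_offsets(hosts, seed=42).items():
        print(host, "checks in at +", format_offset(offset))
```

With a random offset per host, 1000 systems are spread across the day instead of all uploading in the same minute, which is the effect the default staggering is meant to achieve.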
Another capability we put in place is "grouping". Grouping allows you to plan remediation based on your company’s criteria (purpose, localization, etc.), giving you greater control of your infrastructure.
We find that groups are most commonly organized by:
Environment (Prod / Dev / QA / Stage / etc.)
People or business groups (Amaya's systems / Insights team)
Business critical systems (Hadoop / Oracle / etc.)
Geographic localization (EMEA / NA / LATAM / APAC)
Most of these are formed by customers around their workflow. However, some may already have a grouping method in place in other tooling like Satellite or Tower, which they replicate.
If you are looking to mass group your systems, especially at a large scale, I recommend using the --group functionality in the Insights client upon registration, which allows for quick management and organization of these systems, although you can still do it manually via the UI afterwards. And of course, you can automate this with the methods mentioned before (Ansible, Chef, Puppet).
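As a rough sketch of grouping at registration time, the Python below builds one registration command per host from an inventory mapping. The `--group` option is the one mentioned above; the client binary name (`insights-client`), the `--register` flag and the inventory itself are assumptions for illustration, so check them against the help output of the client version you run. The script only builds command strings and does not execute anything.

```python
import shlex

# Hypothetical inventory: hostname -> Insights group name.
INVENTORY = {
    "web01.example.com": "Prod",
    "web02.example.com": "Prod",
    "build01.example.com": "Dev",
    "db01.example.com": "EMEA Oracle",
}


def registration_command(group, client="insights-client"):
    """Return the argument list that registers a system into a group.

    The binary name and --register flag are assumptions; --group comes
    from the client feature described in the text.
    """
    return [client, "--register", f"--group={group}"]


def commands_by_host(inventory):
    """Map each hostname to a shell-safe command string."""
    return {
        host: shlex.join(registration_command(group))
        for host, group in inventory.items()
    }


if __name__ == "__main__":
    for host, command in commands_by_host(INVENTORY).items():
        print(f"{host}: {command}")
```

In practice you would hand the resulting command to whichever tool already runs things on your systems (Ansible, Chef, Puppet) rather than running it by hand.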
Setting up the systems is one thing to consider at scale, but once Insights is set up and detecting all those problems on 1000+ systems, how do you take action at scale?
A few recommendations in this regard would be:
Use Ansible remediation with Insights-generated playbooks to coordinate a fix for one or multiple issues impacting your infrastructure.
Use Planner to help coordinate planned remediations for actions and systems.
Use groupings for the systems you care about the most; this breaks the actions up into manageable chunks.
Address by severity - Insights applies one of 4 severity (risk) levels to all actions to help you focus on the most important ones.
Attack the low hanging fruit - Insights also provides a "Risk of Change" rating for actions depending on the impact of the change on the environment. Remediating actions with the lowest Risk of Change generally won't require any downtime or coordination.
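The last two recommendations can be combined into a simple ordering: highest severity first and, within a severity level, lowest Risk of Change first. The Python sketch below shows that ordering on made-up data. The `Action` type, the numeric scales and the sample actions are all hypothetical; Insights exposes its own severity and Risk of Change values, and this is not its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    severity: int        # assumed scale: 1 (lowest) to 4 (highest)
    risk_of_change: int  # assumed scale: lower means less disruptive
    systems: int         # number of impacted systems


def prioritize(actions):
    """Order by severity (high first), then Risk of Change (low first),
    then number of impacted systems (many first)."""
    return sorted(
        actions,
        key=lambda a: (-a.severity, a.risk_of_change, -a.systems),
    )


def low_hanging_fruit(actions, max_risk=1):
    """Keep only actions whose Risk of Change is at or below max_risk."""
    return [a for a in prioritize(actions) if a.risk_of_change <= max_risk]


if __name__ == "__main__":
    # Made-up actions, for illustration only.
    sample = [
        Action("kernel update", severity=4, risk_of_change=3, systems=120),
        Action("sysctl tuning", severity=2, risk_of_change=1, systems=800),
        Action("ssh config fix", severity=4, risk_of_change=1, systems=40),
    ]
    for action in prioritize(sample):
        print(action)
```

Filtering with `low_hanging_fruit` gives a list of fixes that, per the Risk of Change rating, generally should not need downtime, which makes them good candidates for a first remediation pass.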