Issue description
Webpages are far more complex than they were 15 years ago. Back then, most content was served from a single domain. If you visited a BBC webpage, all of its content, including images and scripts, came from bbc.co.uk or bbc.com. This isn't the case today. Most webpages include social media buttons (such as those for Facebook and Twitter retweets), adverts, and trackers, as well as external content served from content delivery networks.
When you run a report to show the websites a user has visited, the results are often confusing because most of the domain names shown might not mean anything to you. Names like "akamai.net", "ytimg.com" and "twimg.com" won't be immediately recognizable to most people. However, you might recognize "twitter.com" and "youtube.com".
By using the option to exclude certain categories in the web filter reports, most of these entries can be cleared out. Doing so creates a domain report that is far more accessible and understandable to most people. Here, we show how to achieve this.
Selecting exclusion categories
The Smoothwall blocklist contains a section called "Web infrastructure". The categories in this section cover back-end domains and other miscellaneous domains. Excluding a good number of them makes domain reports a lot cleaner. The suggested categories to exclude are:
APIs & Web Libraries
Example: When a web page has a Google map showing locations, it is using APIs and web libraries.
Content Delivery
Example: Websites often have video or other high bandwidth content. This is mostly hosted on content delivery providers like Akamai.
Software Updates
Example: Windows updates. Antimalware updates.
SSL/CRL
Example: Sites used for certificate verification and updates.
Transparent HTTPS incompatible sites
Example: Sites or resources used by applications that do not add SNI information.
Trusted Ranged Get Services
Example: Windows updates and other update services that use HTTP range requests as a download method.
User tracking and Site stats
Example: Trackers used to map user activity and target adverts.
Search Suggestions
Example: The search suggestions shown by search pages after each letter typed in a search field.
-
Uncategorized content
The "Categories to exclude" field is a free text field, not a list that you can select from, so categories have to be typed in by hand. You can add further categories, for example "Custom allowed content". If there is a selection of domains you want to exclude, collect them in a category and add that category to the field. A category does not need to be used by a filter policy for it to be logged and excluded from reports.
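To illustrate the effect of category exclusion, here is a minimal Python sketch. It is not how Smoothwall implements reporting; the log records, the `domain_report` function, and the "Video" and "Social Media" category names are hypothetical, and the pairing of each domain with a category is made up for the example. Only the excluded category names are taken from the list above.

```python
from collections import Counter

# Category names as suggested for exclusion in this article.
EXCLUDED_CATEGORIES = {
    "APIs & Web Libraries",
    "Content Delivery",
    "Software Updates",
    "SSL/CRL",
    "Transparent HTTPS incompatible sites",
    "Trusted Ranged Get Services",
    "User tracking and Site stats",
    "Search Suggestions",
    "Uncategorized content",
}

# Hypothetical log records: (domain, set of categories for that domain).
# These domain-to-category pairings are assumptions for illustration only.
SAMPLE_LOG = [
    ("youtube.com", {"Video"}),
    ("ytimg.com", {"Content Delivery"}),
    ("youtube.com", {"Video"}),
    ("akamai.net", {"Content Delivery"}),
    ("twitter.com", {"Social Media"}),
    ("twimg.com", {"Content Delivery"}),
]


def domain_report(log, excluded_categories):
    """Count hits per domain, skipping records in any excluded category."""
    counts = Counter()
    for domain, categories in log:
        if categories & excluded_categories:
            continue  # at least one category of this record is excluded
        counts[domain] += 1
    return counts


if __name__ == "__main__":
    for domain, hits in domain_report(SAMPLE_LOG, EXCLUDED_CATEGORIES).most_common():
        print(domain, hits)
```

With the exclusions applied, only the domains a reader is likely to recognize remain in the sample report.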
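Here is the list again in an easy format that you can copy:
APIs & Web Libraries
Content Delivery
Software Updates
SSL/CRL
Transparent HTTPS incompatible sites
Trusted Ranged Get Services
User tracking and Site stats
Search Suggestions
Uncategorized content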
Other cleanup tips
Exclude domains:
A good candidate for exclusion is "ytimg.com", especially for search reports. Excluding it prevents links to YouTube video thumbnails from showing up as user searches (thumbnail links are shaped like search requests).
Homepage domains and other single domains that are accessed often but are uninteresting from a reporting point of view are also good candidates.
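The idea behind domain exclusion can be sketched in a few lines of Python. This is an illustration only: the `is_excluded` function is hypothetical, and matching subdomains of an excluded domain is an assumption made for the example, not a statement about how the report option behaves.

```python
def is_excluded(host, excluded_domains):
    """Return True if host is an excluded domain or a subdomain of one.

    Assumption: subdomains are treated as part of the excluded domain.
    """
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in excluded_domains)


# Hypothetical exclusion list and visited hosts.
excluded = {"ytimg.com"}
visited = ["i.ytimg.com", "www.youtube.com", "ytimg.com", "notytimg.com"]

kept = [h for h in visited if not is_excluded(h, excluded)]
print(kept)
```

Matching on a leading dot keeps lookalike names such as "notytimg.com" in the report while dropping "i.ytimg.com".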
The options to "Exclude adverts", "Do not include unauthenticated requests" and "Filter out Images, Javascript etc." should also be used.
Depending on the purpose of the report, these options all help to clean up the result list and show recognizable domains. They are not always appropriate, however. If you run a report to show bandwidth downloaded, these exclusions are not a good idea, because the report will likely miss the majority of downloads. Bear in mind the purpose of the report and the metrics it uses when deciding whether to apply them.
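The following Python sketch shows why exclusions can distort a bandwidth report. All records, byte counts, the `total_bytes` function, and the "Video" and "News" category names are invented for illustration; real figures will differ.

```python
# Hypothetical records: (domain, category, bytes downloaded).
RECORDS = [
    ("youtube.com", "Video", 2_000_000),
    ("akamai.net", "Content Delivery", 750_000_000),
    ("windowsupdate.com", "Software Updates", 400_000_000),
    ("bbc.co.uk", "News", 5_000_000),
]

EXCLUDED = {"Content Delivery", "Software Updates"}


def total_bytes(records, excluded_categories=frozenset()):
    """Sum downloaded bytes, skipping records in excluded categories."""
    return sum(
        size
        for _domain, category, size in records
        if category not in excluded_categories
    )


everything = total_bytes(RECORDS)
after_exclusions = total_bytes(RECORDS, EXCLUDED)
missed_share = 1 - after_exclusions / everything
print(f"Share of bytes hidden by exclusions: {missed_share:.1%}")
```

In this made-up sample, almost all downloaded bytes belong to excluded categories, so a bandwidth report with the exclusions applied would show only a small fraction of the real traffic.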