# WAI-Tools Documentation of Pilot Monitoring

As part of the WAI-Tools project, the Norwegian Digitalisation Agency has carried out pilot monitoring of a sample of four public sector bodies and their websites, based on the monitoring requirements referred to in Directive (EU) 2016/2102.

## 1. Introduction

Directive (EU) 2016/2102 aims to ensure that the websites and mobile applications of public sector bodies are made more accessible on the basis of common accessibility requirements.

The Member States shall ensure that public sector bodies take the necessary measures to make their websites and mobile applications more accessible by making them perceivable, operable, understandable, and robust.

Content of websites and mobile applications that fulfils the relevant requirements of the European standard EN 301 549 V2.1.2, or parts thereof, shall be presumed to conform with the accessibility requirements. According to EN 301 549, conformance with the web requirements is equivalent to conformance with the Web Content Accessibility Guidelines (WCAG) 2.1 Level AA. Public sector bodies should provide an accessibility statement on the compliance of their websites and mobile applications with the accessibility requirements laid down by the Directive.

Conformity with the accessibility requirements set out in the Directive should be monitored periodically. Member States shall apply:

• an in-depth monitoring method that thoroughly verifies whether a website or mobile application satisfies all the requirements identified in the standards and technical specifications referred to in Article 6 of Directive (EU) 2016/2102
• a simplified monitoring method for websites that detects instances of non-compliance with a subset of those requirements

WAI-Tools, Advanced Decision Support Tools for Scalable Web Accessibility Assessments, is an Innovation Action project, co-funded by the European Commission (EC) under the Horizon 2020 program (Grant Agreement 780057). The project started on 1 November 2017 for a duration of three years.

The project is closely linked to European and international efforts on web accessibility standardisation, including the Web Content Accessibility Guidelines (WCAG) 2.1. It is also highly relevant considering the monitoring methodology referred to in the Directive.

The project partners are:

• European Research Consortium for Informatics and Mathematics (ERCIM), European host of the World Wide Web Consortium (W3C)
• Siteimprove, Denmark
• Accessibility Foundation, Netherlands
• University of Lisbon, Portugal
• Deque Research, Netherlands
• The Norwegian Digitalisation Agency, Norway

Some of the objectives of the project are to:

• build a common set of W3C Accessibility Conformance Testing (ACT) Rules that provide an interpretation of the W3C Web Content Accessibility Guidelines (WCAG) for evaluation tools and methodologies.
• help increase the level of automation in web accessibility evaluation by defining test rules and by leveraging bleeding-edge technologies that are increasingly available today.

The ACT Rules formally published by W3C provide authoritative checks for the WCAG 2.1 Success Criteria. Through a community of contributors, the aim is to reduce differing interpretations of WCAG, make test procedures interchangeable, and develop a library of commonly accepted rules. The ACT Rules also support the development of automated and semi-automated testing tools, which monitoring bodies can use to carry out monitoring more efficiently and effectively.

At the time of writing this report, there are 5 such ACT Rules formally published by W3C, with several more developed by the WAI-Tools Project that are in the process of approval and publication. The WAI-Tools Project aims to develop 70 test rules through an open community process and contribute them to the W3C process for approval and publication. The ACT Rules developed by the project are implemented in open-source engines developed and maintained by the project partners Deque, FCID (University of Lisbon), and Siteimprove.

As part of WAI-Tools Work Package 2, the Norwegian Digitalisation Agency has carried out pilot monitoring based on the monitoring requirements referred to in the Directive. The pilot builds on an analysis of the requirements in the Directive and covers both the simplified and the in-depth monitoring processes.

To perform this pilot monitoring, the Norwegian Digitalisation Agency used the draft ACT Rule implementations by Deque, FCID, and Siteimprove that were available at the time to test a limited number of websites. Mobile applications and downloadable documents were not tested in the pilot.

The outcomes of this effort, combined with the documentation of the ACT rules developed in WAI-Tools, form a “demonstrator” from Norway as required by the project. The pilot has provided valuable experience and documentation to be used in further preparations for monitoring.

Preparations for the pilot started in September 2019, and the pilot was concluded in March 2020. The project partners contributed valuable input to this report, and their comments and amendments are included in the final version.

## 2. Summary

In this chapter, we present a summary of the pilot monitoring effort. The objective of the pilot has been to explore and try out all the steps of the monitoring process, based on a sample of four public sector bodies and websites, within the timeframe of the pilot.

The focus has not been to produce test data and other data in the volume needed for analysis and reporting in line with the Directive, but rather to gain experience and identify which issues need to be followed up in order to be prepared for actual monitoring.

The monitoring process may be regarded as a sample survey that consists of the following steps:

1. Planning and design
• What are the requirements in the Directive for monitoring and reporting?
• What issues and research questions should be investigated?
• What data do we need to cover the research questions and perform reporting?
• Which requirements in the standard are relevant to include in the monitoring?
2. Sampling
• How can we select a sample of entities, websites, and test pages?
3. Data Collection, including test
• What experiences did we gain in the pilot regarding data sources, methods, and tools for collecting data?
4. Analysis and reporting
• What did we experience in the pilot in our effort to establish a dataset suitable for analysis and reporting?

In the following, we summarize the findings and learning points from each phase or step in the monitoring process. The learning points are also presented at the end of the respective chapters.

### 2.1 Step 1: Planning and design

Planning the monitoring is essential, especially for the first few monitoring rounds and for the first report to the EU. Based on the requirements for monitoring and reporting in the Directive, we must decide which issues and questions to investigate in the monitoring, and how.

Therefore, we performed an analysis of the requirements for monitoring and reporting in the Directive, as described in chapter 4.1. Based on the documented requirements, we derived the size and composition of the sample, as well as the data we must collect through the monitoring process in order to perform the necessary analysis and reporting, as described in chapter 4.2.

Through the planning process, we also selected which Success Criteria to include. In the pilot, this selection was constrained by the incomplete implementation status of the tools as of February 2020.

In real monitoring, we will take into consideration not only what is implemented in the tools chosen for testing, but also experiences from earlier monitoring efforts. We will probably also perform semi-automated and manual tests in order to cover requirements with a high risk of non-compliance. This applies especially to simplified monitoring. For in-depth monitoring, all the requirements will be included.

Findings and learning points:

• It is crucial to analyse the Directive to identify requirements for monitoring and reporting.
• Before starting the testing and other data collection, we must define, as precisely as possible, which research questions are to be investigated, and then ensure that we collect all the data needed for analysis and reporting. The research questions are listed in chapter 4.1.6.
• The requirements for monitoring, reporting, and the list of research questions underlie decisions on the following:
• The sample of public sector bodies/entities, web solutions, test objects/pages, etc. This applies to the size and composition of the sample, and the selection method for entities, web solutions, and pages.
• The monitoring methods, tools, and test mode (automated, semi-automated, manual)
• For simplified monitoring: which Success Criteria shall be included, especially considering that automated testing is preferable.
• Data needs, methods for data collection, and data sources
• Which analyses must be performed in order to report in line with the Directive
• In this pilot:
• We used 19 ACT rules that covered 13 WCAG 2.1 Success Criteria. In our opinion, it is of vital importance that this work proceeds until all the accessibility requirements in the Directive are covered. This is due to the need for a documented, transparent, and commonly accepted test method.
• We met the requirements for simplified monitoring by covering the 4 principles in WCAG as well as 7 of 9 user accessibility needs
• Since we used the same ACT Rules and Success Criteria in both simplified and in-depth monitoring, we covered 29 % of the Success Criteria required by the Directive (for the in-depth monitoring).
• Even though the number of requirements included was somewhat limited, we came close to meeting the minimum requirements for simplified monitoring. However, based on findings from previous monitoring in Norway, several high-risk Success Criteria were not covered in the pilot.
• This was because neither the development nor the implementation of the test rules had been completed when the pilot testing was performed. High-risk Success Criteria are continuously being addressed in the project.
• Regardless, one should be aware that the selection (and exclusion) of Success Criteria may mean that significant accessibility problems go undetected in the monitoring. This applies especially to simplified monitoring.
• For the foreseeable future, we expect a need to supplement automated tests with both semi-automated and manual tests to cover all the Success Criteria and requirements in the standard. This applies especially to in-depth monitoring.
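The coverage figures above come down to simple set arithmetic. The sketch below is illustrative only: it assumes that the Success Criteria required by the Directive are available as a set of WCAG identifiers such as "1.4.3", and it derives the WCAG principle from the first digit of the identifier. The function names and both example sets are hypothetical; they are not the 13 Success Criteria used in the pilot.

```python
from typing import Set, Tuple

# Principle of a WCAG Success Criterion = first component of its identifier.
PRINCIPLES = {"1": "perceivable", "2": "operable", "3": "understandable", "4": "robust"}


def principle_of(sc_id: str) -> str:
    """Return the WCAG principle for an identifier such as '1.4.3'."""
    return PRINCIPLES[sc_id.split(".")[0]]


def coverage(tested: Set[str], required: Set[str]) -> Tuple[Set[str], float]:
    """Return the principles covered and the share (%) of required criteria tested."""
    in_scope = tested & required  # criteria outside the required set do not count
    principles = {principle_of(sc) for sc in in_scope}
    return principles, 100 * len(in_scope) / len(required)


if __name__ == "__main__":
    # Hypothetical sets for illustration only.
    required = {"1.1.1", "1.3.1", "1.4.3", "2.1.1", "2.4.2",
                "2.4.4", "3.1.1", "3.3.2", "4.1.1", "4.1.2"}
    tested = {"1.1.1", "2.4.2", "3.1.1", "4.1.2"}
    principles, percent = coverage(tested, required)
    print(sorted(principles), f"{percent:.0f} %")
```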

Especially when planning the first monitoring and reporting:

• There is an extensive need to collect and store data in both simplified and in-depth monitoring, as described in chapter 4.2. The data must be collected from diverse sources. We need to establish a data model to structure the data and facilitate efficient storage and retrieval. This data model is partly based on the open data format for accessibility test results that was developed by the project and implemented in the tools as one of their output formats.
• We plan to use the accessibility statements to collect structured data about the public sector bodies, web solutions, areas of service, and individual services per entity. Later, we will consider combining the accessibility statements with automated tests that the entities can perform themselves. Part of the WAI-Tools project is to develop a prototype large-scale data browser that would collect and analyse data from accessibility statements. This was not available when the pilot was carried out, but may be considered in the future.
• However, it should be considered whether the requirements for monitoring and reporting are too extensive, especially regarding the data needed to compose the samples of entities, web solutions, and pages in line with the Directive (as described in chapters 4.2.1-4.2.3). This applies in particular to the number of services, processes, pages, and documents that shall be tested in the in-depth monitoring. A limited sample of services may be sufficient, instead of monitoring them all.
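A minimal sketch of such a data model, assuming Python dataclasses and hypothetical field names loosely derived from the data items discussed in this report (an entity identifier from the register, level of administration, area of service, web solutions, and pages linked to services and processes). It is not the project's open data format; it only illustrates how entities, web solutions, and pages could be linked, including an area of service recorded either at the entity level or at the web solution level.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AdministrationLevel(Enum):
    STATE = "state"
    REGIONAL = "regional"
    LOCAL = "local"
    PUBLIC_LAW_BODY = "body governed by public law"


@dataclass(frozen=True)
class PublicSectorBody:
    org_number: str                # identifier from the register of legal entities
    name: str
    level: AdministrationLevel
    area_of_service: str


@dataclass(frozen=True)
class WebSolution:
    url: str
    owner_org_number: str          # links the website to its PublicSectorBody
    area_of_service: Optional[str] = None  # set only when it differs from the owner's


@dataclass(frozen=True)
class Page:
    url: str
    website_url: str               # links the page to its WebSolution
    service: Optional[str] = None  # "type of service" the page belongs to, if known
    process_step: Optional[int] = None  # position within a process, if any


def websites_of(body: PublicSectorBody, websites: List[WebSolution]) -> List[WebSolution]:
    """Web solutions owned by a given entity."""
    return [w for w in websites if w.owner_org_number == body.org_number]


def pages_of(website: WebSolution, pages: List[Page]) -> List[Page]:
    """Sampled pages belonging to a given web solution."""
    return [p for p in pages if p.website_url == website.url]


def effective_area(website: WebSolution, owner: PublicSectorBody) -> str:
    """Area of service at web solution level, falling back to the entity level."""
    return website.area_of_service or owner.area_of_service
```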

In addition, we have identified a shortlist of criteria that should be considered when selecting a tool for real monitoring:

• The coverage of WCAG, i.e. the tool should cover as many requirements/Success Criteria as possible
• The tool should, as far as possible, meet the needs for transparency, reproducibility, and comparability. Thus:
• it must be based on a documented interpretation of each of the requirements in the standard.
• as far as possible, it should be based on the ACT Rules, as they meet the need for a documented, transparent, and commonly accepted test method.
• the test rules (and the way they are implemented in the tools) must be documented in order to show which interpretation of the requirements each test covers.
• the tool should include, or be combined with, a crawler that is suitable for sampling most of the pages and content that should be included in the monitoring.
• the test results should specify the outcome of the tests, such as passed, failed, inapplicable (and not tested).
• the tool should preferably give test results at both the page and the element level, specified per Success Criterion.
• the number of tested elements and pages, especially failed elements and pages, should be counted and identified per Success Criterion and in total.
• the test results should be in a format suitable for analysis and reporting in line with the Directive, and should provide the website owners/public sector bodies with the information they need to improve their websites.
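To illustrate the counting requirements in the last criteria, the sketch below assumes that a tool can export one record per tested element, carrying the page URL, the Success Criterion, and one of the outcomes listed above. The record layout and the function names are hypothetical; real tool exports differ. Collecting page URLs in sets ensures that a page with several failed elements is counted only once per Success Criterion.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Set, Tuple


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    INAPPLICABLE = "inapplicable"
    NOT_TESTED = "not tested"


@dataclass(frozen=True)
class ElementResult:
    page_url: str
    success_criterion: str  # e.g. "1.1.1"
    element: str            # hypothetical element identifier, e.g. a CSS selector
    outcome: Outcome


def summarise(
    results: Iterable[ElementResult],
) -> Tuple[Dict[str, Counter], Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Per Success Criterion: element counts by outcome, tested pages, failed pages."""
    elements: Dict[str, Counter] = defaultdict(Counter)
    tested: Dict[str, Set[str]] = defaultdict(set)
    failed: Dict[str, Set[str]] = defaultdict(set)
    for r in results:
        elements[r.success_criterion][r.outcome] += 1
        if r.outcome is not Outcome.NOT_TESTED:
            tested[r.success_criterion].add(r.page_url)
        if r.outcome is Outcome.FAILED:
            failed[r.success_criterion].add(r.page_url)  # a set counts each page once
    return elements, tested, failed


def total_failed_pages(failed: Dict[str, Set[str]]) -> int:
    """Number of unique pages failing at least one Success Criterion."""
    return len(set().union(*failed.values()))
```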

### 2.2 Step 2: Sampling

In this chapter, we summarize the findings and learning points from the sampling of public sector bodies/entities, websites, and pages.

Sampling of public sector bodies/entities:

• We will benefit from developing an efficient method for establishing a representative sample in line with the Directive. This will hopefully contribute to the reliability and social significance of the analysis of accessibility barriers uncovered by the monitoring.
• A representative sample will allow us to generalize the monitoring results and establish a national indicator of the degree of compliance with the accessibility requirements set out in the Directive, as well as an overall accessibility indicator for websites. This may also be suitable for benchmarking purposes.
• In order to select a representative sample, we need information about the population of public sector bodies. For this purpose, the Norwegian Register of Legal Entities (or similar) can be used for drawing a sample of public sector bodies.
• The classifications of the institutional sector and industry as listed in the register can be useful to determine the level of administration (state, regional, local, body governed by public law). Based on a combination of the classifications of the institutional sector and industry, we can also get a brief indication of the area of service.
• There is great potential in more automation, for instance by drawing (as far as possible) a random sample from The Norwegian Register of Legal Entities (or similar), based on specifications of the criteria mentioned in the Directive.
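A sketch of how such a draw could be automated, assuming a register extract with one record per entity that contains an organisation number and a level of administration (both field names are hypothetical). It draws a proportional random sample per level of administration, with at least one entity per level, and uses a fixed seed so that the draw can be documented and reproduced. Further criteria from the Directive, such as geographical balance, would require additional strata, and because of rounding the total may differ slightly from the requested size.

```python
import random
from collections import defaultdict


def stratified_sample(entities, total, seed=2020):
    """Proportional random sample per level of administration.

    entities: list of dicts with hypothetical keys "org_number" and "level".
    """
    rng = random.Random(seed)  # fixed seed: the draw can be documented and repeated
    strata = defaultdict(list)
    for entity in entities:
        strata[entity["level"]].append(entity)

    sample = []
    for level in sorted(strata):  # fixed order keeps the draw deterministic
        members = sorted(strata[level], key=lambda e: e["org_number"])
        share = round(total * len(members) / len(entities))
        k = min(max(1, share), len(members))  # at least one entity per level
        sample.extend(rng.sample(members, k))
    return sample


if __name__ == "__main__":
    register = (
        [{"org_number": f"S{i:03}", "level": "state"} for i in range(50)]
        + [{"org_number": f"R{i:03}", "level": "regional"} for i in range(20)]
        + [{"org_number": f"L{i:03}", "level": "local"} for i in range(30)]
    )
    for entity in stratified_sample(register, 10):
        print(entity["level"], entity["org_number"])
```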

Sampling of websites:

• There is no register in Norway that is suitable for drawing a sample of websites. Thus, the total website population is unknown. To locate and draw a sample of websites that fully matches the criteria in the Directive, we need to collect data that shows which web solutions belong to each selected entity.
• Some entities have more than one website. In these cases, we need to determine which website to include in the monitoring. Conversely, some entities share web solutions with others, and in those cases we need to identify who is responsible. In the pilot, we needed to combine information from the register with online searches for public bodies’ websites, which is time-consuming.
• A combination of data from the Norwegian Register of Legal Entities and data input from the entities through the accessibility statements might be an efficient method. We can use the statements to collect structured data on the level of administration, area of service, which websites (and mobile applications) that are relevant for monitoring, etc.
• Still, there are important questions concerning the sampling requirements that are not specified in the Directive, such as:
• Whether the samples for simplified and in-depth monitoring must be selected in separate operations.
• Whether the samples for simplified and in-depth monitoring must both aim for a diverse, representative, and geographically balanced distribution.
• Whether the in-depth monitoring can be based on a selection of the websites in the simplified monitoring.

It would be very helpful if these matters could be clarified.

Sampling of pages (and documents) - in-depth monitoring:

• It should be considered whether the requirements for the sample of test pages are too extensive. Sampling web pages is a complex and time-consuming task that requires manual effort. One option is to establish a dialogue with the website owners about the sampling of pages and documents. However, we must consider whether this is cost-efficient.
• The terms “Type of service” and “Process” should be defined, in order to avoid a random approach when identifying services and processes. In real monitoring, this also applies to downloadable documents, as the Directive requires testing of at least one relevant downloadable document, where applicable, for each type of service provided by the website.
• If the sampled pages in in-depth monitoring are part of a process, the Directive requires that all steps in the process are monitored. In our experience, processes may require log-in using a national ID number. Therefore, it is crucial that we establish a method for acquiring log-in access to these processes/web pages.

Sampling of pages (and documents) - simplified monitoring:

• We need a scale or criteria for assessing what will be a suitable number of test pages, based on the estimated size and the complexity of the website that shall be monitored.
• For simplified monitoring, we need access to a crawler to sample web pages. However, the crawlers in the different tools may use different methods to crawl websites. One should therefore be aware that different tools may sample different pages from the same website.
• Crawling is also suitable for estimating the size of a website. However, in our experience, the crawlers did not find (or count) hidden pages, subdomains, or pages that require log-in.
• In the absence of adequate alternatives, we still consider using a crawler to sample web pages (and estimate the size of the website) to be the most efficient method.

Sampling of pages (and documents) - both in-depth and simplified:

• We need a method to exclude pages that include third party content and other content exempt from the Directive.
• We need to explore whether it is possible to test web pages that require log-in, using automated tools. This applies primarily to in-depth monitoring, but it is desirable to be able to cover this type of content in simplified monitoring as well.
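A minimal sketch of the page-sampling step, assuming the crawler's output is available as a plain list of URLs: it keeps only pages on the entity's own host (optionally including subdomains), collapses trivial duplicates such as fragments and trailing slashes, and draws a reproducible random sample. The function names and parameters are hypothetical. This only removes pages hosted elsewhere; third-party content embedded in the entity's own pages would need a separate check.

```python
import random
from urllib.parse import urlsplit, urlunsplit


def normalise(url):
    """Drop the fragment and a trailing slash so trivial duplicates collapse."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def sample_pages(crawled, site_host, k, seed=2020, include_subdomains=False):
    """Random sample of up to k crawled pages that belong to site_host."""
    site_host = site_host.lower()
    own = set()
    for url in crawled:
        host = urlsplit(url).hostname or ""  # hostname is already lower-cased
        on_site = host == site_host or (
            include_subdomains and host.endswith("." + site_host)
        )
        if on_site:
            own.add(normalise(url))
    pool = sorted(own)  # sorted so the same seed gives the same sample
    return random.Random(seed).sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    crawled = [
        "https://www.example.no/",
        "https://www.example.no/#main",
        "https://www.example.no/services/",
        "https://video.thirdparty.example/embed/1",
    ]
    print(sample_pages(crawled, "www.example.no", k=2))
```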

### 2.3 Step 3: Data collection

This step covers both the production of test data and the collection of other data needed for monitoring and reporting. The most important findings and learning points are listed below.

• We managed to collect the data about the public sector bodies as listed in chapter 4.2.1. Most data could be downloaded from the Norwegian Register of Legal Entities (or similar).
• The combination of institutional sector and industrial classification may be enough to determine the type of service offered by the entity (through their website), especially for simplified monitoring. For in-depth monitoring, this may be insufficient.
• In many cases, we will need to check the actual content on the entity’s website more thoroughly. This might be very time-consuming, and the quality of the data could be insufficient. Using the accessibility statements as a data source for this purpose should be considered, as this could be far more cost-effective than manual inspection. Part of the WAI-Tools project is to develop a prototype large-scale data browser that would collect and analyse data from accessibility statements. This could be considered for future exploration.

• In the pilot, we combined data from the Norwegian Register of Legal Entities with internet searches to locate the website URLs of the selected public sector bodies. Consideration should be given to making it mandatory to report web solution addresses to the register, and/or to designing the accessibility statements so that this data can be reported directly to the monitoring body.
• In some cases, data about the area of service can be registered at the entity level. In other cases where the entity offers a wide scope of services, and in addition has different websites for different kinds of services, the area of service must be defined at the web solution level. This applies for example to the Norwegian Digitalisation Agency.
• Some entities/websites offer a wide array of services. In the pilot, we only sampled a few services from the website we monitored in depth. In real monitoring, we will need an overview of all the services offered by an entity/through a website, as the Directive requires at least one relevant page for each type of service provided by the website to be monitored. Consequently, we need to collect extensive information about the various services offered by the entity/website. It should therefore be reviewed to what extent this is cost-effective, and whether it should instead be possible to monitor a sample of services rather than including them all.
• Identifying the services offered on a website is challenging. As a rule of thumb, municipal websites will in most cases have a more standardized (statutory) set of service offerings. This will also be the case for many of the regional websites. For the state websites and those belonging to bodies governed by public law, there are most likely wide variations, both in scope, complexity, and in terms of what services they offer. One should consider using the web accessibility statements to gather this kind of information, possibly supplemented by dialogue with the public sector bodies if further information is needed.

• For simplified monitoring, it is only the homepage that needs to be identified (and documented). For the in-depth, there is a need for more data about the pages, than in the simplified.
• We managed to identify and register data about the web pages required for the in-depth monitoring, provided they were present on the website.
• To assess whether a page is part of a process, and to check that we had sampled the pages listed in the Implementing Decision, we had to inspect the website manually. That was the only way we managed to collect the data about the web pages listed in chapter 4.2.3.
• The process was time-consuming and relied on participation from the website owner. In a real monitoring, it might be considered too extensive to have dialogues with the website owners in order to collect data about the sampled web pages. Therefore, it might be useful to review whether it is necessary to sample (and document) the test pages as detailed as described in the Implementing Decision.
• Either way, we need information/data that connects the pages to services and processes. Therefore, the terms “Type of service” and “Process” should be defined.

• Collecting data about the requirements was done by consulting the EN standard.
• More specifically, information about which user accessibility needs correspond to each requirement/Success Criterion can be found in the standard EN 301 549, Annex B.1. This data helps us analyse which digital barriers users with different user accessibility needs encounter on the internet.
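Once the Annex B.1 mapping has been transcribed into a machine-readable form, test results can be regrouped by user accessibility need. In the sketch below, the two mapping entries are illustrative placeholders, not values copied from EN 301 549, and the input is assumed to be a dictionary from Success Criterion to the set of failed page URLs. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Illustrative placeholder only: transcribe the real mapping from EN 301 549, Annex B.1.
SC_TO_NEEDS: Dict[str, Set[str]] = {
    "1.1.1": {"usage without vision"},
    "1.4.3": {"usage with limited vision"},
}


def failed_pages_per_need(failed_pages: Dict[str, Set[str]],
                          sc_to_needs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Union of failed pages over all Success Criteria mapped to each need."""
    per_need: Dict[str, Set[str]] = defaultdict(set)
    for sc, pages in failed_pages.items():
        for need in sc_to_needs.get(sc, set()):  # unmapped criteria are skipped
            per_need[need] |= set(pages)
    return dict(per_need)


def needs_covered(tested_sc: Iterable[str], sc_to_needs: Dict[str, Set[str]]) -> Set[str]:
    """User accessibility needs touched by at least one tested Success Criterion."""
    covered: Set[str] = set()
    for sc in tested_sc:
        covered |= sc_to_needs.get(sc, set())
    return covered
```

The second function is the kind of check behind a statement such as "7 of 9 user accessibility needs" covered, once the full mapping is in place.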

• Using the three tools provided by the project partners, we managed to collect test results at the page level per Success Criterion for the in-depth monitoring, and individual test results per Success Criterion, for both the in-depth and the simplified monitoring.
• We used two of the three tools in the simplified monitoring. We did not manage to collect test results at the page level, since we were not able to establish a model for converting test results from the test rule level to the Success Criterion level. The method used for this purpose in the in-depth monitoring was not feasible for the simplified monitoring, due to the huge number of pages tested (up to 1 000 pages per website).
• All three tools reported test results in the category “failed”, while one of them also reported results in the categories “passed” and “inapplicable”. In our opinion, it is preferable to have test results that cover all three categories: “passed”, “failed”, and “inapplicable”.
• For two of the tools, it was also difficult to ascertain whether a web page had been tested.
• It was challenging to obtain the number of unique failed pages per Success Criterion using the tools in their current state. In real monitoring, we need to extract test results at the Success Criterion level. A solution could be to adapt the export functions of the tools so that results for unique pages per Success Criterion can be retrieved directly.
• We spent a significant manual effort to extract and present the data. We also struggled with exporting test data from the tools into another format, suitable for distribution to the website owners. In addition, we also need a data format that is suitable for further analysis.

Note: It is important to emphasize that if we had had the opportunity to dig deeper into these issues during the pilot, the project partners responsible for the tools most likely could have helped us in producing, exporting, and converting test results more efficiently.
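One possible model for converting rule-level results to the Success Criterion level is sketched below, under stated assumptions: each ACT rule is mapped to the Success Criteria it tests (the rule identifiers in the example are hypothetical), and for each page and Success Criterion a failure of any mapped rule gives "failed", otherwise any pass gives "passed", otherwise "inapplicable". Because an ACT rule may cover only part of a Success Criterion, "passed" here only means that the rules used found no failure, not that the page satisfies the criterion. The conversion is a single pass over the results, so its cost grows linearly with the number of results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Order of precedence when several rules map to the same Success Criterion.
PRECEDENCE = ("failed", "passed", "inapplicable")


def to_sc_level(rule_results: Iterable[Tuple[str, str, str]],
                rule_to_sc: Dict[str, List[str]]) -> Dict[Tuple[str, str], str]:
    """Aggregate (page, rule, outcome) results into (page, SC) -> outcome."""
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for page, rule, outcome in rule_results:
        for sc in rule_to_sc.get(rule, []):  # results of unmapped rules are ignored
            seen[(page, sc)].add(outcome)
    return {
        key: next((o for o in PRECEDENCE if o in outcomes), "not tested")
        for key, outcomes in seen.items()
    }


def unique_failed_pages(aggregated: Dict[Tuple[str, str], str]) -> Dict[str, Set[str]]:
    """Success Criterion -> set of pages on which it failed (each page once)."""
    failed: Dict[str, Set[str]] = defaultdict(set)
    for (page, sc), outcome in aggregated.items():
        if outcome == "failed":
            failed[sc].add(page)
    return dict(failed)
```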

• Documentation of test methods and tools (including versions) is essential to ensure transparency, reproducibility, and comparability. In general, it is crucial to study the documentation of the tools in order to be in control of what is tested and how the tests are performed.
• Given that this pilot was carried out during active project development, it was challenging to determine which version of an ACT Rule was implemented in each of the three tools, as only information about when each ACT Rule was last updated is available on the ACT Rules website. The ACT Rules website also has an overview of the different implementations of the ACT Rules, but it is not specified which version or when the various ACT Rules were implemented in the tools. We expect this situation may improve as ACT Rules get formally published by W3C, and implementations of these stabilize.

Data about the monitoring and reporting:

• Data about monitoring and reporting can easily be collected from the planning and documentation of the monitoring. This data includes, among other things, information about the monitoring period and the body in charge of the monitoring. A complete list is shown in the table in chapter 6.6.

### 2.4 Step 4: Analysis and reporting

It was not within the scope of the pilot to produce test data in the volume needed for performing analysis and reporting as described in the Directive. Instead, we summarize our experiences in establishing a dataset suitable for analysis and reporting. We also present our thoughts and reflections regarding the calculation of compliance level and other issues that, in our opinion, need clarification.

The findings are summarized below:

• Regarding the research questions on the sample of entities and websites, the sampling method, the Success Criteria and test methods, and the user accessibility needs, the data collected in the pilot are assessed to be sufficient and suitable for performing the analysis needed for reporting.
• Together with the test results, the data about the monitoring are crucial for answering the research questions.

Further reflections about analysis and reporting:

• We need a documented method for sampling entities and websites and, as far as possible, an overview of the population of entities and websites. This forms the basis for assessing to what extent the monitoring results can be generalized. We also need a consistent way of sampling test pages. This is crucial for comparing results between websites and between categories of public sector bodies, as well as across monitoring periods.
• Based on the requirements for reporting, the monitoring bodies need a method and a scale to express quantified results of the monitoring activity, including quantitative information about the level of accessibility.
• The quantified test results per Success Criterion and the mapping to the user accessibility needs form the basis for a qualitative analysis of the outcome of the monitoring, especially the findings regarding frequent or critical non-compliance. Thus, we need a method for performing the qualitative analysis as described in the Directive, and a template for reporting to the EU.
• There is also a need to clarify the term “compliance level” (or compliance status), preferably together with a (simple) method and scale for expressing it.
• According to the standard, the basis for calculating the level of compliance is test results at the page level. Thus, we need a way to extract aggregated test data directly from the tools that shows both the number of tested pages and the number of unique pages that fail each Success Criterion. This applies to both simplified and in-depth monitoring.
• According to the standard, we may calculate the compliance level as the percentage of tested pages that fully comply with all the Success Criteria included, specified separately for in-depth and simplified monitoring.
• On the other hand, calculating the level of compliance at the element level in a simple way would give us a more nuanced picture of the compliance status. An example:
• count the number of tested elements per identified Success Criterion, specified by the outcome of each tested element (passed, failed, inapplicable and, perhaps, not tested)
• calculate the compliance level as the percentage of tested elements that comply with the requirements. This may also facilitate benchmarking and the measurement of trends in the level of compliance.
• Based on the compliance level of each website, the average or aggregated compliance status for all the websites in each monitoring round could be calculated, specified separately for in-depth and simplified monitoring.
• Similar calculations should also be made
• per category (level of administration) of public sector bodies
• per Success Criterion
• Since there are multiple ways of calculating the compliance status, there is a need for a clarification of the term “compliance” and how it should be measured.
• There is also a need for reporting test results that identifies which elements on the tested pages that are not in compliance. That is for the website owners to supported in their efforts for correcting failed elements.
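The page-level and category-level calculations in the bullets above can be sketched in Python. This is a minimal illustration under our own assumptions, not a method prescribed by the Directive or the standard: the input structure (`Page`) and all function names are hypothetical, a page counts as compliant only when none of the tested Success Criteria fails, and the category figure is an unweighted mean of website scores, which is only one of several possible aggregations.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input structure: one dict per tested page, mapping a Success
# Criterion (e.g. "1.1.1") to True (complies) or False (non-compliance found).
Page = dict[str, bool]


def website_compliance_level(pages: list[Page]) -> float:
    """Percentage of tested pages that comply with every Success Criterion tested."""
    if not pages:
        raise ValueError("no tested pages")
    compliant = sum(1 for page in pages if all(page.values()))
    return 100.0 * compliant / len(pages)


def compliance_level_by_criterion(pages: list[Page]) -> dict[str, float]:
    """Percentage of tested pages that comply, computed separately per Success Criterion."""
    tested: dict[str, int] = defaultdict(int)
    compliant: dict[str, int] = defaultdict(int)
    for page in pages:
        for criterion, complies in page.items():
            tested[criterion] += 1
            compliant[criterion] += int(complies)
    return {c: 100.0 * compliant[c] / tested[c] for c in tested}


def compliance_level_by_category(
    websites: dict[str, list[Page]], category: dict[str, str]
) -> dict[str, float]:
    """Unweighted mean of the website compliance levels per level of administration."""
    levels: dict[str, list[float]] = defaultdict(list)
    for site, pages in websites.items():
        levels[category[site]].append(website_compliance_level(pages))
    return {level: mean(values) for level, values in levels.items()}


websites = {
    "site-a": [{"1.1.1": True, "2.4.2": True}, {"1.1.1": False, "2.4.2": True}],
    "site-b": [{"1.1.1": True, "2.4.2": True}],
    "site-c": [{"1.1.1": False, "2.4.2": False}, {"1.1.1": True, "2.4.2": True}],
}
category = {"site-a": "state", "site-b": "local", "site-c": "state"}

print(website_compliance_level(websites["site-a"]))       # 50.0
print(compliance_level_by_criterion(websites["site-a"]))  # {'1.1.1': 50.0, '2.4.2': 100.0}
print(compliance_level_by_category(websites, category))   # {'state': 50.0, 'local': 100.0}
```

Weighting each website equally means a small website counts as much as a large one; weighting by the number of tested pages would be an equally defensible alternative.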

## 3. Objectives of the pilot

Member States shall monitor the compliance of websites and mobile applications of public sector bodies with the accessibility requirements provided for in Article 4 of the Directive, on the basis of the methodology set out in the Commission Implementing Decision, on the grounds of requirements identified in the standards and technical specifications referred to in Article 6 of the Directive.

The Directive includes requirements regarding

• sampling of websites and mobile applications to be monitored.
• which types of webpages and documents to be monitored in-depth.
• presentation and reporting of the monitoring results.
• providing the results to the public sector bodies responsible for the solutions that have been monitored.

The objective of the pilot is to gain experience with the entire monitoring process as described in the Directive and Implementing Decision. The goal is not to solve all the issues encountered in the various steps of the pilot monitoring, but rather to identify and address what will need follow-up in order to be prepared for the monitoring.

The monitoring process may be regarded as a sample survey that consists of the steps presented in Figure 1.

Based on this understanding of the monitoring process, we have defined the following questions to be investigated in the pilot:

1. What are the requirements in the Directive for the simplified monitoring, the in-depth monitoring, and the reporting?
2. What issues and research questions should be investigated in a monitoring in general?
3. What data do we need to cover the research questions and perform reporting in line with the Directive?
4. Which requirements in the standard are relevant to include in the monitoring, based on the assumption that testing shall be done using automated testing methods/tools?
5. How can we select a sample of entities, websites, and test pages?
6. What experiences did we gain in the pilot regarding data sources, methods, and tools for collecting data?
7. What did we experience in the pilot in our effort to establish a data set suitable for analysis and reporting?

The findings are summarized in learning points for further follow-up in our preparations for monitoring in line with the Directive.

The pilot was primarily focused on gathering experience with the steps in the monitoring process, as presented in Figure 1. Therefore, there was little emphasis on producing the amount of data that would be necessary to perform statistical analysis.

In the following chapters, the steps in the monitoring process are elaborated. We then summarize the key findings and learning points from the pilot for each step of the monitoring process.

## 4. Step 1: Planning and design of the pilot monitoring

Planning the monitoring is essential, especially the first few times and for the first report to the EU. Through the planning process we decide on the following:

• Which issues and questions should be investigated in the monitoring. This is largely determined by the requirements for monitoring and reporting in the Directive, and further by which requirements/Success Criteria we include in the monitoring.
• The size and composition of the sample of entities and web solutions included in the monitoring. This also follows from the Directive.
• What data we need to answer the questions that underlie the monitoring. This is determined on the basis of the points above.
• Which data sources, methods, and tools are most suitable for collecting data.
• What analysis must be done to report in accordance with the requirements of the Directive.

### 4.1. Requirements for monitoring and reporting in the Directive

The Member States shall monitor the compliance of websites and mobile applications of public sector bodies with the accessibility requirements provided for in Article 4 of the Directive on the basis of the methodology set out in the Commission Implementing Decision, on the grounds of requirements identified in the standards and technical specifications referred to in Article 6 of the Directive.

#### 4.1.1. Monitoring methods

The Member States shall monitor the conformity of websites and mobile applications of public sector bodies using:

1. an in-depth monitoring method to verify compliance
2. a simplified monitoring method to detect non-compliance

The in-depth monitoring shall

• thoroughly verify whether a website or mobile application satisfies all the requirements identified in the standards and technical specifications referred to in Directive (EU) 2016/2102 Article 6.
• verify all the steps of the processes in the sample, following at least the default sequence for completing the process.
• evaluate at least the interaction with forms, interface controls and dialogue boxes, the confirmations for data entry, the error messages and other feedback resulting from user interaction when possible, as well as the behavior of the website or mobile application when applying different settings or preferences.

The simplified monitoring method shall

• detect instances of non-compliance with a sub-set of the requirements in the standards and technical specifications referred to in the Directive.
• include tests related to each of the requirements of perceivability, operability, understandability, and robustness

In addition, the simplified monitoring method shall

• inspect the websites for non-compliance.
• aim to cover the following user accessibility needs to the maximum extent it is reasonably possible with the use of automated tests:
1. usage without vision
2. usage with limited vision
3. usage without perception of colour
4. usage without hearing
5. usage with limited hearing
6. usage without vocal capability
7. usage with limited manipulation or strength
8. the need to minimise photosensitive seizure triggers
9. usage with limited cognition

#### 4.1.2. Compliance and non-compliance

In simplified monitoring, we shall detect instances of non-compliance, whereas in in-depth monitoring we shall verify compliance. For definitions of compliance and non-compliance, the Directive refers to the EN 301 549 standard.

The standard states that:

"A page satisfies a WCAG Success Criterion when the Success Criterion does not evaluate to false when applied to the page. This implies that if the Success Criterion puts conditions on a specific feature and that specific feature does not occur in the page, then the page satisfies the Success Criterion."


Thus, for each Success Criterion, the check for compliance or non-compliance happens at the page level. For a web solution to comply with the requirements in the Directive, all tested pages must comply.

Determination of compliance is defined in the following way:

"Compliance is achieved either when the pre-condition is true and the corresponding test [in Annex C in EN 301 549] is passed, or when the pre-condition is false (i.e. the pre-condition is not met or not valid)."


For each requirement that applies to pages, the standard states the pre-conditions and tests in the form shown in the table below; this applies to all relevant Success Criteria.

Example: 1.1.1 Non-text content

Table 1: Example of determination of compliance

| Type of assessment | Inspection |
|---|---|
| Pre-conditions | 1. The ICT is a page |
| Procedure | 1. Check that the page does not fail WCAG 2.1 Success Criterion 1.1.1 Non-text content |
| Result | Pass: Check 1 is true. Fail: Check 1 is false. |

This implies that the compliance status of a web solution is based on combinations of test result outcomes. A page complies when all its test results are “passed”, or, more generally, when it has no test results in the category “failed” (i.e. “non-compliance”) and its results are “passed” and/or “inapplicable” (e.g. when the type of content targeted by a test is not present on the test page).
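The determination rule quoted from EN 301 549, read at page level, can be sketched as follows. The names (`Outcome`, `criterion_complies`, `page_complies`, `website_complies`) are hypothetical. Treating a page whose results are all “inapplicable” (or that has no results) as a false pre-condition is our interpretation, and “untested” results are deliberately left out of this sketch.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    INAPPLICABLE = "inapplicable"


def criterion_complies(precondition_met: bool, test_passed: bool) -> bool:
    """Compliance per the quoted EN 301 549 wording: the pre-condition is false,
    or it is true and the test is passed."""
    return (not precondition_met) or test_passed


def page_complies(outcomes: list[Outcome]) -> bool:
    """Does one page satisfy one Success Criterion, given its test results?

    Interpretation (ours): only 'inapplicable' results, or no results at all,
    mean the targeted content is absent, i.e. the pre-condition is false.
    """
    precondition_met = any(o is not Outcome.INAPPLICABLE for o in outcomes)
    test_passed = all(o is not Outcome.FAILED for o in outcomes)
    return criterion_complies(precondition_met, test_passed)


def website_complies(results: dict[str, dict[str, list[Outcome]]]) -> bool:
    """results[page][criterion] -> outcomes; every page must comply with every criterion."""
    return all(
        page_complies(outcomes)
        for criteria in results.values()
        for outcomes in criteria.values()
    )


print(page_complies([Outcome.PASSED, Outcome.INAPPLICABLE]))  # True
print(page_complies([Outcome.INAPPLICABLE]))                  # True
print(page_complies([Outcome.PASSED, Outcome.FAILED]))        # False
```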

It may be challenging to calculate the compliance status of a website based on the definition of the term "compliance" in the standard. In our experience from previous monitoring efforts, the number of failed elements on a page, linked to each of the Success Criteria where non-compliance is detected, gives a more nuanced picture of compliance status. If we additionally get information about how many elements have been tested for each Success Criterion, and the outcome for each tested element, we can calculate the level of compliance in a simple way. This may also facilitate benchmarking and measurement of trends in the level of compliance.
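The element-level compliance level described above can be sketched as a simple ratio per Success Criterion. The function names are hypothetical, and excluding “inapplicable” and “untested” elements from the denominator is our assumption; other choices would give different figures.

```python
from collections import Counter
from typing import Optional

# The four element-level outcomes used in this document.
OUTCOMES = ("passed", "failed", "inapplicable", "untested")


def element_compliance_level(outcomes: list[str]) -> Optional[float]:
    """Percentage of evaluated elements that passed.

    Assumption: 'inapplicable' and 'untested' elements are left out of the
    denominator, so the level is passed / (passed + failed). Returns None when
    no element was actually evaluated.
    """
    counts = Counter(outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcome(s): {sorted(unknown)}")
    evaluated = counts["passed"] + counts["failed"]
    if evaluated == 0:
        return None
    return 100.0 * counts["passed"] / evaluated


def levels_by_criterion(results: list[tuple[str, str]]) -> dict[str, Optional[float]]:
    """Group (criterion, outcome) pairs and compute the level per Success Criterion."""
    grouped: dict[str, list[str]] = {}
    for criterion, outcome in results:
        grouped.setdefault(criterion, []).append(outcome)
    return {c: element_compliance_level(o) for c, o in grouped.items()}


results = [
    ("1.1.1", "passed"), ("1.1.1", "passed"), ("1.1.1", "failed"),
    ("1.1.1", "inapplicable"), ("2.4.2", "passed"), ("3.1.2", "untested"),
]
print(levels_by_criterion(results))
```

Returning `None` rather than 100 % for a criterion with no evaluated elements keeps “nothing to test” distinct from “everything passed” when results are aggregated.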

#### 4.1.3. Sampling of public sector bodies and web solutions

The number of websites and mobile applications to be monitored in each monitoring period shall be calculated based on the population of the Member State. The sampling of websites shall aim for a diverse, representative, and geographically balanced distribution. The sample for mobile applications shall aim for a diverse and representative distribution.
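Our reading of Annex I of Commission Implementing Decision (EU) 2018/1524 is that the minimum sample is a fixed number of websites plus a number per 100 000 inhabitants. The sketch below keeps both coefficients as parameters, so that they can be checked against the text for each monitoring method and period; the function name is hypothetical, and the values in the usage line are illustrative only.

```python
import math


def minimum_sample_size(population: int, base: float, per_100k: float) -> int:
    """Minimum number of websites as a fixed base plus a rate per 100 000 inhabitants.

    `base` and `per_100k` must be taken from Annex I of Commission Implementing
    Decision (EU) 2018/1524 for the monitoring method and period in question.
    Rounding up to a whole website is our assumption.
    """
    if population < 0:
        raise ValueError("population must be non-negative")
    return math.ceil(base + per_100k * population / 100_000)


# Illustrative coefficients and population only, not quoted from the Decision.
print(minimum_sample_size(5_000_000, base=150, per_100k=1.5))  # 225
```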

Note: In the following, we focus on sampling and monitoring of websites, since websites are the subject of the pilot.

The sample shall cover websites from the following levels of administration:

1. state websites
2. regional websites (NUTS1, NUTS2, NUTS3)
3. local websites (LAU1, LAU2)
4. websites of bodies governed by public law not belonging to categories 1 to 3

The sample shall include websites representing as much as possible the variety of services provided by the public sector bodies, in particular the following: social protection, health, transport, education, employment and taxes, environmental protection, recreation and culture, housing and community amenities and public order and safety.
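One way to aim for a sample that covers all four levels of administration is stratified random sampling. The sketch below uses proportional allocation with the largest-remainder method and at least one website per level; this is our choice of technique, not a procedure laid down in the Directive or the Implementing Decision. It does not handle areas of service or stakeholder priorities, and the function names and the sampling frame are hypothetical.

```python
import random
from typing import Optional


def allocate(sizes: dict[str, int], total: int) -> dict[str, int]:
    """Split `total` across levels of administration in proportion to their size.

    At least one website is drawn from every non-empty level; the rest is
    allocated with the largest-remainder method.
    """
    levels = [level for level, n in sizes.items() if n > 0]
    if not levels or total < len(levels) or total > sum(sizes.values()):
        raise ValueError("total must cover every level and not exceed the frame")
    alloc = {level: 1 for level in levels}
    spare = {level: sizes[level] - 1 for level in levels}  # websites still available
    remaining = total - len(levels)
    spare_total = sum(spare.values())
    if remaining and spare_total:
        quotas = {level: remaining * spare[level] / spare_total for level in levels}
        for level in levels:
            alloc[level] += int(quotas[level])
        leftover = total - sum(alloc.values())
        # Give the remaining websites to the levels with the largest fractional parts.
        by_fraction = sorted(levels, key=lambda lv: quotas[lv] - int(quotas[lv]), reverse=True)
        for level in by_fraction[:leftover]:
            alloc[level] += 1
    return alloc


def stratified_sample(
    frame: dict[str, list[str]], total: int, seed: Optional[int] = None
) -> dict[str, list[str]]:
    """Draw a simple random sample of websites within each level of administration."""
    rng = random.Random(seed)
    alloc = allocate({level: len(sites) for level, sites in frame.items()}, total)
    return {level: rng.sample(frame[level], k) for level, k in alloc.items()}


sizes = {"state": 20, "regional": 5, "local": 60, "public law": 15}
print(allocate(sizes, 10))  # {'state': 2, 'regional': 1, 'local': 5, 'public law': 2}
```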

The Member States shall consult national stakeholders, in particular organisations representing persons with disabilities, on the composition of the sample of the websites to be monitored and give due consideration to the stakeholders' opinion regarding specific websites to be monitored.

Note: National stakeholders were not consulted in the pilot.

#### 4.1.4. Sampling of pages

The requirements for the monitoring of web pages are specified for each monitoring method. For the in-depth monitoring method, the following pages and documents, if existing, shall be monitored:

1. the home, login, sitemap, contact, help and legal information pages
2. at least one relevant page for each type of service provided by the website or mobile application and any other primary intended uses of it, including the search functionality
3. the pages containing the accessibility statement or policy and the pages containing the feedback mechanism
4. examples of pages having a substantially distinct appearance or presenting a different type of content
5. at least one relevant downloadable document, where applicable, for each type of service provided by the website or mobile application and any other primary intended uses of it
6. any other page deemed relevant by the monitoring body
7. randomly selected pages amounting to at least 10 % of the sample established by points 1 to 6
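The random top-up in the last point of the list above can be sketched as follows. The function name and inputs are hypothetical; the sketch adds exactly the minimum number of random pages, rounding 10 % of the pages selected under the first six points up to a whole page, and draws only from pages not already in the sample.

```python
import random
from typing import Optional


def add_random_pages(
    selected: list[str], candidates: list[str], seed: Optional[int] = None
) -> list[str]:
    """Extend the in-depth page sample with randomly selected pages.

    `selected` holds the pages chosen under the first six points; `candidates`
    are other pages found on the website (e.g. by a crawler). At least 10 % of
    len(selected) is added, rounded up with integer arithmetic.
    """
    needed = (len(selected) + 9) // 10  # ceil(len(selected) / 10) without floats
    already = set(selected)
    pool = [page for page in dict.fromkeys(candidates) if page not in already]
    if needed > len(pool):
        raise ValueError("not enough candidate pages for the random sample")
    return selected + random.Random(seed).sample(pool, needed)


selected = [f"https://example.org/page-{i}" for i in range(12)]
candidates = [f"https://example.org/other-{i}" for i in range(50)]
sample = add_random_pages(selected, candidates, seed=1)
print(len(sample))  # 14
```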

The requirements in simplified monitoring are less specific, stating that a number of pages appropriate to the estimated size and the complexity of the website shall be monitored in addition to the home page.

#### 4.1.5. Reporting

The Member States shall submit a report to the Commission. The report shall include the outcome of the monitoring relating to the requirements in the standards and technical specifications referred to in Article 6 of the Directive.

The report shall contain:

1. the detailed description of how the monitoring was conducted
2. a mapping, in the form of a correlation table, demonstrating how the applied monitoring methods relate to the requirements in the standards and technical specifications referred to in Article 6 of the Directive, including also any significant changes in the methods
3. the outcome of the monitoring of each monitoring period, including measurement data
4. the information required in Article 8(5) of Directive (EU) 2016/2102

Note: Item 4 in the above list has not been a part of this pilot.

In their reports, Member States shall provide the information specified in the instructions set out in the Implementing Decision:

• The report shall detail the outcome of the monitoring carried out by the Member State.
• For each monitoring method applied (in-depth and simplified, for websites and mobile applications), the report shall provide the following:
1. a comprehensive description of the outcome of the monitoring, including measurement data
2. a qualitative analysis of the outcome of the monitoring, including:
• the findings regarding frequent or critical non-compliance with the requirements identified in the standards and technical specifications referred to in Article 6 of Directive (EU) 2016/2102
• where possible, the developments, from one monitoring period to the next, in the overall accessibility of the websites and mobile applications monitored.

’Measurement data’ is:

• The quantified results of the monitoring activity carried out in order to verify the compliance of the websites and mobile applications of public sector bodies with the accessibility requirements set out in Article 4.
• It covers both
1. quantitative information about the sample of websites and mobile applications tested (number of websites and applications with, potentially, the number of visitors or users, etc.) and
2. quantitative information about the level of accessibility.

The Commission Implementing Decision also specifies optional content for the reporting.

#### 4.1.6. Research questions

Based on the analysis of the Directive, we have identified a set of research questions for further investigation in the monitoring (of websites) that shall be conducted and reported by December 23, 2021. The questions are listed below.

1. What size and composition of the sample of web solutions (and mobile applications) should be included in the monitoring, both in simplified and in-depth monitoring?
2. How are the web solutions selected, and specifically, which solutions are selected in dialogue with stakeholders? For subsequent monitoring: What web solutions have been included in previous monitoring?
3. Which Success Criteria are covered in the monitoring and how do they correspond with the principles (perceivable, operable, understandable, and robust) and the User Accessibility Needs listed in the Directive? This applies to the simplified monitoring.
4. How do methods, tests, and tools identify non-compliance (simplified monitoring) and verify compliance (in-depth monitoring) with the requirements in the Directive?

1. What is the overall compliance status with the accessibility requirements in the Directive?
   1. What is the level of compliance for the websites within each category of public sector bodies (state, regional, local and bodies governed by public law)?
   2. For subsequent monitoring: How has overall compliance with the requirements of the Directive developed over time?
2. What is the overall compliance status for each accessibility requirement (Success Criterion)?
   1. Pay special attention to the Success Criteria where non-compliance is detected, and to what extent non-compliance appears.
   2. Pay special attention to which user accessibility needs are connected to Success Criteria with (frequent) non-compliance.
3. What is the compliance status for each of the individual web solutions that are monitored?
   1. The number of test pages with non-compliance should be reported.
   2. The results should also be specified per requirement (Success Criterion) and per test page where non-compliance is detected.

Note: All the results shall be specified for each monitoring method, simplified and in-depth.

The Directive also mentions the area of service, not as an absolute criterion for sampling, but in order to cover important services directed towards many users. Our interpretation is that the sample is not intended to be representative with respect to area of service. However, it should be possible to specify the results for each area of service, although these results are most likely not comparable.

The results should also identify which elements on the tested pages are not in compliance. This would support website owners in their efforts to correct failed elements.

### 4.2. Data requisite for reporting

The Directive has detailed requirements for sampling and reporting. The Directive thus forms the basis for what data we consider necessary to collect. In the following, we present what data must be collected about the

• Public sector bodies
• Web solutions
• Success Criteria/requirements, including user accessibility needs
• Test results at the page (and element) level
• Arrangements for monitoring and reporting

The different categories of data are summarised in the tables below.

Note: This is an overview only. The tables do not describe the logical structure of a database.

#### 4.2.1. Data about the entity (the public sector body)

Table 2: Data about the entity (public sector body)

| Data item | Description | Source of data requirement |
|---|---|---|
| Name of entity | Name of entity | |
| Organisation number | Number from the Norwegian Register of Legal Entities | |
| Address (geographic location) | Full address, detailing geographic location of the entity | Commission Implementing Decision (EU) 2018/1524, Annex I (2.2.1) |
| Classification of institutional sector | Number and description from the Norwegian Register of Legal Entities | |
| Standard industrial classification | Number and description from the Norwegian Register of Legal Entities | |
| Level of administration | Level of administration the entity belongs to (state, regional (NUTS), local (LAU) or body governed by public law). | Commission Implementing Decision (EU) 2018/1524, Annex I (2.2.2) |

Note: The Directive does not require collecting the classification of institutional sector or the standard industrial classification, but these can be used to determine the level of administration and the area of service.

#### 4.2.2. Data about the ICT solution (website or mobile application)

Table 3: Data about the ICT solution (website or mobile application)

| Data item | Description | Source of data requirement |
|---|---|---|
| Name of ICT | Name or title of the ICT solution | |
| Type of ICT | Type of ICT solution (website, mobile application, possibly also specify intranets and extranets). Member States shall periodically monitor the compliance of websites and mobile applications. Intranet and extranet sites are not included in the scope (Directive (EU) 2016/2102 Article 8). | Directive (EU) 2016/2102, Article 1. Commission Implementing Decision (EU) 2018/1524, Annex I (1) |
| Area of service | Area of services provided by the entity, such as social protection, health, transport, education, employment and taxes, environmental protection, recreation and culture, housing and community amenities, public order and safety or other relevant types of classifications. | Commission Implementing Decision (EU) 2018/1524, Annex I (2.2.3) |
| Types of services (in-depth only) | A list of individual services provided by the ICT solution. | Commission Implementing Decision (EU) 2018/1524, Annex I (3.2) |
| Operating system | Operating system required to run the mobile application (e.g. Android, iOS or other). This is not applicable to websites. | Commission Implementing Decision (EU) 2018/1524, Annex I (2.3.3) |
| Last version | Version number of the last updated version of the mobile application. This is not applicable to websites. | Commission Implementing Decision (EU) 2018/1524, Annex I (2.3.4) |
| Prioritised by national stakeholders? | Have relevant stakeholders indicated the ICT solution as a priority for monitoring? | Commission Implementing Decision (EU) 2018/1524, Annex I (2.2.4) |
| Last monitored | Date for when the ICT solution was last monitored and the type of monitoring. | Commission Implementing Decision (EU) 2018/1524, Annex I (2.4) |

#### 4.2.3 Data about the page (web page or screen in mobile application)

Table 4: Data about the page (web page or screen in mobile application)

| Data item | Description | Source of data requirement |
|---|---|---|
| Type of page | In-depth: type of page, such as home, login, sitemap, contact, help and legal information, service, accessibility statement or policy, page containing the feedback mechanism, other or randomly selected page. Simplified: only the home page needs to be specified; the other test pages do not need categorisation. | Commission Implementing Decision (EU) 2018/1524, Annex I (3.2) |
| Type of service (in-depth only) | Identification of the individual service the web page, or screen in the mobile application, is connected to | Commission Implementing Decision (EU) 2018/1524, Annex I (3.2) |
| Process (in-depth only) | Indication of whether the page is part of a process, and a brief description of the process | Commission Implementing Decision (EU) 2018/1524, Annex I (1.2.2) |
| Address | URL or other description of the location of the page | |

#### 4.2.4. Data about the downloadable document

Note: This section only applies to in-depth monitoring.

Table 5: Data about the downloadable document

| Data item | Description | Source of data requirement |
|---|---|---|
| Type of service | Identification of the individual service the downloadable document is connected to | Commission Implementing Decision (EU) 2018/1524, Annex I (3.2) |
| Process | Indication of whether the downloadable document is part of a process, and a brief description of the process | Commission Implementing Decision (EU) 2018/1524, Annex I (1.2.2) |

#### 4.2.5. Data about the requirement (WCAG Success Criterion)

Table 6: Data about the requirement (WCAG Success Criterion)

| Data item | Description | Source of data requirement |
|---|---|---|
| Standard | Number and name of standard. | Directive (EU) 2016/2102, Article 6 |
| Version | Version number of the standard. | Directive (EU) 2016/2102, Article 6 |
| Requirement | Number, name of the requirement and conformance level in the standard | Directive (EU) 2016/2102, Article 6 |
| Principle in WCAG | Information about the corresponding main principle of WCAG (perceivable, operable, understandable, robust). | Directive (EU) 2016/2102, Article 6. Commission Implementing Decision (EU) 2018/1524, Annex I (1.3.2) |
| Guideline in WCAG | Information about the corresponding guideline in WCAG. | |
| Success Criterion | Number and name of the Success Criterion in WCAG 2.1 referred to in the standard | EN 301 549 V2.1.2 |
| User accessibility need | Mapping to functional performance statements (user accessibility needs) in EN 301 549, that is: usage without vision, usage with limited vision, usage without perception of colour, usage without hearing, usage with limited hearing, usage without vocal capability, usage with limited manipulation or strength, the need to minimise photosensitive seizure triggers, usage with limited cognition. | Commission Implementing Decision (EU) 2018/1524, Annex I (1.3.2) |

#### 4.2.6. Data about the test result for tested page

It follows from the EN 301 549 standard that the check for compliance or non-compliance happens at the page level.

Table 7: Data about the test result for tested page

| Data item | Description | Source of data requirement |
|---|---|---|
| Page or document | URL or other identification of the web page (for websites) or screen (for mobile applications) tested. | |
| Requirement | Number, name and conformance level of the requirement in the standard that was tested. | Directive (EU) 2016/2102, Article 6 |
| Test method | Information about test method, test rules, tool, test mode (automated, semi-automated, manual) and when the test method was last updated. | Commission Implementing Decision (EU) 2018/1524, Annex II (2.3 b, 1.2.4 and 1.3.3) |
| Compliance status | Status of compliance or non-compliance of the tested page, based on the outcome of the tested elements. Any failed element results in non-compliance, while a page where all elements are passed, inapplicable or untested is compliant. | Commission Implementing Decision (EU) 2018/1524, Article 5 |
| Failed elements | Number of failed elements found on the page. | Commission Implementing Decision (EU) 2018/1524, Article 7 |
| Date tested | Date the test was performed. | |
| Tested by | Name of the entity or monitoring body that performed the test. We register this in case we later collect test data from the entities through (a later version of) the accessibility statement, and therefore need to distinguish tests performed by the entity from tests performed by the monitoring body. | Commission Implementing Decision (EU) 2018/1524, Annex I (1.2.5) |

#### 4.2.7. Data about the test result for tested element

In addition to the page level, we also wanted to explore the possibility of producing test results at the element level. The term “element level” refers to the individual components or content elements that are tested, e.g. a form element, a picture, a table, or even the entire page, depending on which element the test rules apply to. In our opinion, this will help the public sector bodies correct the errors found on their websites.

Table 8: Data about the test result for tested element

| Data item | Description | Source of data requirement |
|---|---|---|
| Tested element | Identification of the applicable element that has been tested on the page | Commission Implementing Decision (EU) 2018/1524, Article 7 |
| Page or document | URL or other identification of the web page (for websites) or screen (for mobile applications) tested. | |
| Requirement | Tested requirement (Success Criterion in WCAG 2.1). | |
| Outcome | The outcome of each individual test: passed, failed, inapplicable or untested. | Commission Implementing Decision (EU) 2018/1524, Article 7 |
| Date tested | Date the test was performed. | |
| Tested by | Name of the entity or monitoring body that performed the test. We register this in case we later collect test data from the entities through a later version of the accessibility statement, and therefore need to distinguish tests performed by the entity from tests performed by the monitoring body. | Commission Implementing Decision (EU) 2018/1524, Annex I (1.2.5) |
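The relation between Table 8 and Table 7 can be illustrated by deriving the page-level compliance status and the number of failed elements from element-level records. This is a sketch, not a database design (see the note in Section 4.2): the field and function names are our own, the example rows are invented, and only the fields needed for the roll-up are computed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ElementResult:
    """One row of Table 8; the field names are our own."""
    element: str      # identification of the tested element, e.g. a CSS selector
    page: str         # URL of the tested page
    requirement: str  # Success Criterion, e.g. "1.1.1"
    outcome: str      # "passed", "failed", "inapplicable" or "untested"
    date_tested: date
    tested_by: str


@dataclass(frozen=True)
class PageResult:
    """The Table 7 fields that can be derived from the element results."""
    page: str
    requirement: str
    compliant: bool
    failed_elements: int


def summarise(results: list[ElementResult]) -> list[PageResult]:
    """Group by (page, requirement); any failed element means non-compliance."""
    failed: dict[tuple[str, str], int] = {}
    for r in results:
        key = (r.page, r.requirement)
        failed[key] = failed.get(key, 0) + int(r.outcome == "failed")
    return [
        PageResult(page, requirement, compliant=(n == 0), failed_elements=n)
        for (page, requirement), n in failed.items()
    ]


day = date(2020, 2, 1)
rows = [
    ElementResult("img.logo", "https://example.org/", "1.1.1", "failed", day, "monitoring body"),
    ElementResult("img.hero", "https://example.org/", "1.1.1", "passed", day, "monitoring body"),
    ElementResult("html", "https://example.org/", "3.1.1", "passed", day, "monitoring body"),
]
for row in summarise(rows):
    print(row)
```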

#### 4.2.8. Data about monitoring and reporting

Table 9: Data about monitoring and reporting

| Data item | Description | Source of data requirement |
|---|---|---|
| Monitoring method | The type of monitoring performed (simplified or in-depth) | Commission Implementing Decision (EU) 2018/1524, Article 5 |
| ICT solutions monitored | Which types of ICT solutions have been monitored (websites or mobile applications) | Commission Implementing Decision (EU) 2018/1524, Article 8 (1) |
| Reporting period start | Start date of the reporting period. | Directive (EU) 2016/2102, Article 8 (4) |
| Reporting period end | End date of the reporting period. | Directive (EU) 2016/2102, Article 8 (4) |
| Monitoring period start | Start date of the monitoring period. | Commission Implementing Decision (EU) 2018/1524, Article 2 (2), Annex II (2.1 a) |
| Monitoring period end | End date of the monitoring period. | Commission Implementing Decision (EU) 2018/1524, Article 2 (2), Annex II (2.1 a) |
| Body in charge of monitoring | The authority responsible for the monitoring | Commission Implementing Decision (EU) 2018/1524, Article 2 (2), Annex II (2.1 b) |
| Requirements tested | The requirements in EN 301 549 that are verified/checked in the monitoring | Commission Implementing Decision (EU) 2018/1524, Annex I (1.3.1) |
| Sample size simplified monitoring | Number of websites monitored in simplified monitoring | Commission Implementing Decision (EU) 2018/1524, Annex I (2.1) |
| Sample size in-depth monitoring | Number of websites monitored in depth | Commission Implementing Decision (EU) 2018/1524, Annex I (2.1) |
| Sample size in-depth mobile applications | Number of mobile applications monitored in depth | Commission Implementing Decision (EU) 2018/1524, Annex I (2.1) |

### 4.3. WCAG Success Criteria, ACT Rules and test tools

Member States shall align the monitoring with the requirements stated in the standards and technical specifications referred to in Article 6 of Directive (EU) 2016/2102:

• an in-depth monitoring method that thoroughly verifies whether a website or mobile application satisfies all the requirements.
• a simplified monitoring method that detects instances of non-compliance on a website with a sub-set of the requirements (…) reasonably possible with the use of automated tests.

Thus, part of the planning process is to determine which Success Criteria to include in the simplified monitoring and how to perform the tests in both the simplified and in-depth monitoring.

#### 4.3.1. Tools used in the pilot

A part of the pilot was to perform the tests using the tools of the project partners. In December 2019 we met with the project partners Deque, FCID, and Siteimprove to ascertain which test tools to use in the pilot monitoring. Further information about the tools can be found in the appendix.

The ACT Rules developed in the project and their implementations in the tools are work in progress. Even though the ACT Rules were completed by the project, they do not necessarily meet the ACT objectives for consistency yet. Still, we decided to use the ACT Rules available at the time of the pilot, since the objective was to try out the different steps in the monitoring process rather than to verify the ACT Rules and their implementations.

A test rule implementation is the way a test rule is interpreted and operationalised when incorporated in a test tool. A single implementation can test multiple ACT Rules. A tool or methodology can also have multiple implementations that, when combined, map to a single ACT Rule.

The tools were used in their state of development at the time of testing. According to the documentation provided by the tool vendors, the test rules referred to in this report were completed by the project and implemented in the tools. During the pilot, we kept in contact with the project partners in case of questions or need for support. The tools were at the Norwegian Digitalisation Agency’s disposal, free of charge, for the duration of the pilot, including access to relevant supporting documentation.

One of the tools was not used in the simplified monitoring, as the tool did not have a crawler for sampling test pages when the pilot was performed.

#### 4.3.2. WCAG Success Criteria covered by the pilot

In the pilot, we were restricted to Success Criteria that had accompanying ACT Rules that were completed and implemented in the project partners’ (Deque, FCID, and Siteimprove) tools at the time of testing (January/February 2020). As a result, we included a total of 19 ACT Rules in the pilot. This applies to both simplified and in-depth monitoring. The WAI-Tools project is moving forward, and the goal is to develop 70 ACT Rules by the end of October 2020. The rules developed by the project are continually submitted to W3C for further review and final approval as W3C ACT Rules. We expect that implementations of these formally published rules will be more stable and consistent.

The selected Success Criteria cover the requirements of perceivability, operability, understandability, and robustness.

Based on

• which Success Criteria that were covered by ACT Rules in the tools at the time of the pilot,
• the four principles in WCAG and
• what user accessibility needs that shall be considered,

we have selected the following 13 Success Criteria for the pilot:

• 1.1.1 Non-text Content
• 1.2.2 Captions (Prerecorded)
• 1.2.3 Audio Description or Media Alternative (Prerecorded)
• 1.3.1 Info and Relationships
• 1.3.4 Orientation
• 1.3.5 Identify Input Purpose
• 2.4.2 Page Titled
• 2.4.4 Link Purpose (In Context)
• 3.1.1 Language of Page
• 3.1.2 Language of Parts
• 4.1.1 Parsing
• 4.1.2 Name, Role, Value
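The statement that the selected criteria cover all four WCAG principles can be checked mechanically, because the first number of a WCAG Success Criterion identifies its principle. The sketch below uses the criteria listed above; the constant and function names are hypothetical.

```python
PRINCIPLES = {"1": "perceivable", "2": "operable", "3": "understandable", "4": "robust"}

PILOT_CRITERIA = [
    "1.1.1", "1.2.2", "1.2.3", "1.3.1", "1.3.4", "1.3.5",
    "2.4.2", "2.4.4", "3.1.1", "3.1.2", "4.1.1", "4.1.2",
]


def principles_covered(criteria: list[str]) -> dict[str, list[str]]:
    """Group WCAG Success Criteria by principle, using the first number of each criterion."""
    covered: dict[str, list[str]] = {name: [] for name in PRINCIPLES.values()}
    for sc in criteria:
        covered[PRINCIPLES[sc.split(".")[0]]].append(sc)
    return covered


missing = [p for p, scs in principles_covered(PILOT_CRITERIA).items() if not scs]
print(missing)  # []
```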

The following user accessibility needs (or Functional Performance Statements) have a primary relationship to the Success Criteria included in the pilot:

• Usage without vision
• Usage with limited vision
• Usage without hearing
• Usage with limited hearing
• Usage with limited manipulation or strength
• Usage with limited cognition

The Directive does not specify whether the Success Criteria must have a primary relationship to the user accessibility needs. If secondary relationships between the Success Criteria and the user accessibility needs are also considered, the pilot additionally covers usage without vocal capability.

Due to the restriction of covering Success Criteria that had accompanying ACT Rules, which were completed and implemented in all three tools at the time of testing, the user accessibility needs of usage without the perception of colour and the need to minimize photosensitive seizure triggers were not covered. The project is taking all the user accessibility needs into consideration when deciding which ACT Rules to prioritize.

There are additional Success Criteria that may be relevant to include in a simplified monitoring. Based on results from the Norwegian Digitalisation Agency’s (then Difi – the Agency for Public Management and eGovernment) monitoring in 2018 and supervision in 2019, we have identified several Success Criteria in WCAG 2.0 that carry a high risk of uncovering errors on the websites tested.

Examples of Success Criteria, beyond those included in the pilot, that should also be included in a monitoring are:

• 1.4.3 Contrast (Minimum)
• 2.2.2 Pause, Stop, Hide
• 3.3.1 Error Identification
• 3.3.2 Labels or Instructions

At the time of testing (February 2020), ACT Rules for Success Criteria 1.4.3, 2.2.2 and 3.3.1 were in development, but they had not yet been implemented in the tools. New ACT Rules are continually being developed and implemented, and coverage will increase over time.

#### 4.3.3. ACT Rules used in the pilot

The ACT Rules developed in the WAI-Tools project are all connected to the Success Criteria in WCAG.

Since the pilot was constrained by the development and implementation of test rules in the project partners’ tools, the following 19 ACT Rules were used in both simplified and in-depth monitoring:

Table 10: ACT Rules used in the pilot

| Success Criterion | ACT Rule ID | ACT Rule name |
|---|---|---|
| 1.1.1 | 23a2a8 | Image has accessible name |
| 1.1.1, 4.1.2 | 59796f | Image button has accessible name |
| 1.2.2 | eac66b | Video element auditory content has accessible alternative |
| 1.2.3 | c5a4ea | Video element visual content has accessible alternative |
| 1.3.1, 4.1.2 | 6cfa84 | Element with aria-hidden has no focusable content |
| 1.3.4 | b33eff | Orientation of the page is not restricted using CSS transform property |
| 1.3.5 | 73f2c2 | Autocomplete attribute has valid value |
| 2.2.1 (2.2.4, 3.2.5 at level AAA) | bc659a | Meta element has no refresh delay |
| 2.4.2 | 2779a5 | HTML page has title |
| 2.4.4, 4.1.2 (2.4.9 at level AAA) | c487ae | Link has accessible name |
| 3.1.1 | b5c3f8 | HTML page has lang attribute |
| 3.1.1 | 5b7ae0 | HTML page lang and xml:lang attributes have matching values |
| 3.1.1 | bf051a | HTML page language is valid |
| 3.1.2 | de46e4 | Element within body has valid lang attribute |
| 4.1.1 | 3ea0c8 | Id attribute value is unique |
| 4.1.2 | 97a4e1 | Button has accessible name |
| 4.1.2 | 4e8ab6 | Element with role attribute has required states and properties |
| 4.1.2 | e086e5 | Form control has accessible name |
| 4.1.2 | cae760 | Iframe element has accessible name |

Over time as the project proceeds and the number of ACT Rules increases, the project and the ACT Rule community are building a library of commonly accepted rules. This will be of great importance for the Member States, due to the need for a well-documented and transparent interpretation of the accessibility requirements as a basis for monitoring.

### 4.4. Learnings

The findings and learning points from all the aspects of the planning are summarized below:

• It is crucial to analyse the Directive to identify requirements for monitoring and reporting.
• Before starting the test and other data collection, we must define, as precisely as possible, which research questions are to be investigated, and then ensure that we collect all the data needed for analysis and reporting.
• The requirements for monitoring and reporting and the list of research questions underlie decisions on the following:
  • The sample of public sector bodies/entities, web solutions, test objects/pages etc. This applies to the size and composition of the sample, and the selection method for entities, web solutions and pages.
  • The monitoring methods, tools, and test mode (automated, semi-automated, manual)
  • For simplified monitoring: which Success Criteria to include, especially considering that automated testing is preferable
  • Data needs, the methods for data collection, and the data sources
  • Which analyses must be performed in order to report in line with the Directive
• In this pilot:
  • We used 19 ACT Rules that covered 13 WCAG 2.1 Success Criteria. In our opinion, it is of vital importance that the development of ACT Rules continues until all the accessibility requirements in the Directive are covered, because of the need for a documented, transparent, and commonly accepted test method.
  • We met the requirements for simplified monitoring by covering the 4 principles in WCAG as well as 7 of 9 user accessibility needs.
  • Since we used the same ACT Rules and Success Criteria in both simplified and in-depth monitoring, we covered 29 % of the Success Criteria required by the Directive (for the in-depth monitoring).
  • Even though the number of requirements included was somewhat limited, we came close to meeting the minimum requirements for simplified monitoring. However, based on findings in previous monitoring in Norway, several high-risk Success Criteria were not covered in the pilot, because neither the development nor the implementation of the corresponding test rules had been completed when the pilot testing was performed.
  • High-risk criteria are continuously being addressed in the project. Regardless, one should be aware that the selection (and exclusion) of Success Criteria may mean that significant accessibility problems go undetected in the monitoring. This applies especially to simplified monitoring.
  • For the foreseeable future, we expect a need to supplement automated tests with both semi-automated and manual tests to cover all the Success Criteria and requirements in the standard. This applies especially to in-depth monitoring.

Especially when planning the first monitoring and reporting:

• There is an extensive need to collect and store data in both simplified and in-depth monitoring, as described in chapter 4.2. The data must be collected from diverse data sources. We need to establish a data model to structure the data, in order to facilitate efficient data storage and retrieval. This data model is to some extent built on the open data format for accessibility test results that was developed by the project and implemented in the tools as one of the output formats.
• We plan to use the accessibility statements to collect structured data about the public sector bodies, web solutions, areas of service, and individual services per entity. Later, we will consider combining the accessibility statements with automated tests that the entities can perform themselves. Part of the WAI-Tools project is to develop a prototype large-scale data browser, which would collect and analyse data from the accessibility statements. However, this was not available at the time of the pilot and may be considered in the future.
• However, it should be considered whether the requirements for monitoring and reporting are too extensive, especially regarding the data needed to compose the samples of entities, web solutions, and pages in line with the Directive (as described in chapters 4.2.1-4.2.3). This applies in particular to the number of services, processes, pages, and documents that shall be tested in the in-depth monitoring. It should be considered whether a selection of services would be sufficient, instead of monitoring them all.
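The data model mentioned above could, as a rough illustration, nest pages under websites and websites under entities. The sketch below is a minimal example assuming Python dataclasses; the class and field names are hypothetical, loosely follow the data items in Tables 13 to 16, and are not the project's open data format.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    # Outcome per ACT Rule ID, e.g. {"2779a5": "passed"} (hypothetical shape).
    outcomes: dict[str, str] = field(default_factory=dict)


@dataclass
class Website:
    name: str
    url: str
    areas_of_service: list[str]
    pages: list[Page] = field(default_factory=list)


@dataclass
class Entity:
    name: str
    organisation_number: str
    institutional_sector: str
    industrial_classification: str
    level_of_administration: str
    websites: list[Website] = field(default_factory=list)


def count_pages(entity: Entity) -> int:
    """Total number of sampled pages across all websites of one entity."""
    return sum(len(site.pages) for site in entity.websites)


if __name__ == "__main__":
    site = Website("NN website", "https://www.example.org", ["Social protection", "Employment"],
                   [Page("https://www.example.org/"), Page("https://www.example.org/contact")])
    entity = Entity("NN Public sector body state level", "889 *** ***",
                    "6100 Central government", "84.120", "State", [site])
    print(count_pages(entity))  # 2
```

Keeping the entity as the root reflects that the website owners are the ones responsible for compliance, so results can later be reported back per entity.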

In addition, we have identified a set of criteria that should be considered when selecting a tool for testing (and producing test results) in real monitoring:

• The coverage of WCAG, i.e. the tool should cover as many requirements/Success Criteria as possible
• The tool should, as far as possible, meet the needs for transparency, reproducibility, and comparability. Thus:
• it must be based on a documented interpretation of each of the requirements in the standard
• it should, as far as possible, be based on the ACT Rules, as they meet the need for a documented, transparent, and commonly accepted test method
• the test rules (and the way they are implemented in the tools) must be documented to show which interpretation of the requirements each test covers
• the tool should include or be combined with a crawler that is suitable for sampling most of the pages and content that should be included in a monitoring
• the test results should specify the outcome of each test, such as passed, failed, inapplicable (or not tested)
• the tool must give test results at both the page and the element level, specified per Success Criterion
• the number of tested elements and pages, especially failed elements and pages, should be counted and identified per Success Criterion and in total
• the test results should be in a format suitable for analysis and reporting in line with the Directive, and should provide the website owners/public sector bodies with the information they need to improve their websites
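The counting criterion above can be illustrated with a small aggregation over element-level results. This is a sketch under stated assumptions: each result is a hypothetical `(page_url, rule_id, outcome)` tuple, and the rule-to-Success-Criterion mapping is a subset of Table 10 (Level A and AA criteria only). Because one rule can map to several Success Criteria, the total number of failed pages is counted separately rather than summed over criteria.

```python
from collections import defaultdict

# Subset of Table 10: ACT Rule ID -> Success Criteria (Level A/AA only).
RULE_TO_SC = {
    "23a2a8": ["1.1.1"],
    "2779a5": ["2.4.2"],
    "c487ae": ["2.4.4", "4.1.2"],
    "b5c3f8": ["3.1.1"],
    "3ea0c8": ["4.1.1"],
}


def summarise(results):
    """Count failed elements and failed pages per Success Criterion and in total.

    `results` is an iterable of (page_url, rule_id, outcome) tuples, one per
    tested element; outcome is e.g. "passed", "failed" or "inapplicable".
    Returns (per_sc, total_failed_pages).
    """
    failed_elements = defaultdict(int)
    failed_pages = defaultdict(set)
    all_failed_pages = set()
    for page, rule_id, outcome in results:
        if outcome != "failed":
            continue
        all_failed_pages.add(page)
        for sc in RULE_TO_SC.get(rule_id, []):
            failed_elements[sc] += 1
            failed_pages[sc].add(page)
    per_sc = {
        sc: {"failed_elements": failed_elements[sc], "failed_pages": len(failed_pages[sc])}
        for sc in sorted(failed_elements)
    }
    return per_sc, len(all_failed_pages)


results = [
    ("/", "23a2a8", "failed"),
    ("/", "23a2a8", "passed"),
    ("/", "c487ae", "failed"),
    ("/contact", "c487ae", "failed"),
    ("/contact", "2779a5", "passed"),
]
print(summarise(results))
```

Here a failed link on `/contact` counts against both 2.4.4 and 4.1.2, but the page is only counted once in the total.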

## 5. Step 2: Sampling

In this chapter, we present our experiences from selecting a sample of public sector bodies, websites, and pages. The aim is to document methods and sources of data for selecting public sector bodies, websites, and web pages, so that we are prepared to create a sample in a real monitoring in line with the Directive. The Directive also describes the sampling criteria for mobile applications. The data we collected through the sampling are presented in chapter 6.

Note: Due to the scope of the pilot, we have sampled and monitored websites only.

### 5.1. Public sector bodies and websites

The number of websites (and mobile applications) to be monitored in each monitoring period shall be calculated based on the population of the Member State.

Based on a population of 5 367 580, the monitoring in Norway shall include the number of websites listed in the table below:

Table 11: The number of websites to be monitored based on the monitoring period (year)

| Monitoring method | Years 1 and 2 (websites) | From year 3 and then annually (websites) |
|---|---|---|
| Simplified | 182 | 236 |
| In-depth | 19 | 22 |
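The simplified-monitoring figures in Table 11 can be reproduced from the population with a short calculation. The sketch below assumes the minimum is 75 + (population / 800 000) × k, with k = 16 for the first two monitoring periods and k = 24 afterwards, rounded to the nearest whole number. These constants and the rounding are our assumption (they happen to match Table 11) and should be checked against the Implementing Decision before use; the in-depth figures are not covered.

```python
def simplified_sample_size(population: int, first_two_periods: bool) -> int:
    """Minimum number of websites for simplified monitoring.

    ASSUMPTION: 75 + (population / 800 000) * k, with k = 16 for the first two
    monitoring periods and k = 24 afterwards, rounded to the nearest integer.
    """
    k = 16 if first_two_periods else 24
    return round(75 + population / 800_000 * k)


population = 5_367_580
print(simplified_sample_size(population, first_two_periods=True))   # 182
print(simplified_sample_size(population, first_two_periods=False))  # 236
```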

The sampling of websites shall:

• aim for a diverse, representative and geographically balanced distribution
• cover websites from the following levels of administration:
  • state websites
  • regional websites
  • local websites
  • websites of bodies governed by public law
• include websites representing as much as possible the variety of services provided by the public sector bodies, in particular the following: social protection, health, transport, education, employment, taxes, environmental protection, recreation and culture, housing and community amenities and public order and safety
• reflect the opinion of national stakeholders, in particular organisations representing persons with disabilities, regarding specific websites to be monitored

Note: For the pilot, we selected only four websites owned by public sector bodies, one per category required by the Directive (state, regional, local, body governed by public law).

With such a small sample, we did not aim for a balanced geographical distribution and coverage of the variety of services, but we attempted to take these factors into account to some degree. Neither did we involve national stakeholders, as we will do in a real monitoring.

In a real monitoring, we will also, as far as possible, select a representative sample. This will (to a greater extent) allow us to generalize the monitoring results and establish a national indicator on the degree of compliance with the accessibility requirements set out in the Directive and an overall accessibility indicator for websites.

In the pilot, we had a preselected sample of entities. However, to gain experience in gathering the information necessary to select a sample in line with the Directive, we searched the Norwegian Register of Legal Entities for the following:

• Name
• URL (if registered)
• Organisation number
• Number of employees
• Classification of institutional sector
• Standard industrial classification

The classifications of the institutional sector and industry can be useful to:

• Determine the level of administration (state, regional, local, body governed by public law)
  • For state, regional and local entities, we searched for entities with the institutional sector classification of state administration and municipal administration (which also contains the regional level).
  • For bodies governed by public law, we searched for entities with the institutional sector classification of entities with business operations at the state or regional/municipal level.
• Get an indication of the area of service, based on a combination of the classifications of the institutional sector and industry.
  • An example is the industry “transportation of passengers” combined with the institutional sector “enterprises owned by local government”.
  • An alternative or supplement is to inspect the entities' websites to get information about the area of service.
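As a minimal sketch, the register classifications can be mapped to the Directive's four levels of administration. The mapping below covers only the institutional sector codes that appear in Table 14 and is our assumption, not an official correspondence. Since both the regional and the local entity in Table 14 carry sector code 6500, the sketch relies on a separate, hypothetical `is_county` flag to tell them apart.

```python
# Hypothetical mapping, limited to the institutional sector codes in Table 14.
STATE_SECTORS = {"6100"}        # Central government
MUNICIPAL_SECTORS = {"6500"}    # Local government (county or municipality)
PUBLIC_LAW_SECTORS = {"3900"}   # State lending institutions etc.


def level_of_administration(sector_code: str, is_county: bool = False) -> str:
    """Return one of the Directive's four categories for a register entry."""
    if sector_code in STATE_SECTORS:
        return "State"
    if sector_code in MUNICIPAL_SECTORS:
        # The sector code alone cannot separate county from municipality.
        return "Regional" if is_county else "Local"
    if sector_code in PUBLIC_LAW_SECTORS:
        return "Body governed by public law"
    raise ValueError(f"unmapped institutional sector code: {sector_code}")


print(level_of_administration("6100"))                  # State
print(level_of_administration("6500", is_county=True))  # Regional
print(level_of_administration("3900"))                  # Body governed by public law
```

Raising an error for unmapped codes is a deliberate choice: it forces new sector codes to be classified by hand rather than silently dropped from the sample frame.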

However, the starting point for selecting a sample of websites is the population of public sector bodies that own websites. This follows from the sampling criteria, such as geography and level of administration, and, not least, from the fact that the website owners are responsible for complying with the regulations.

We do not have a register of websites in Norway that is suitable for selecting a sample that includes both the entities and their websites. Therefore, the main source for selecting a sample for monitoring in Norway will most likely be the Register of Legal Entities, which lists the website URL for some entities.

Two of the four entities selected for the pilot had their websites registered. For the other two, we searched the internet to locate the websites.

An alternative to searching for a large number of websites is to have a dialogue with the entities to determine the website(s) to be monitored. Such a dialogue can be conducted through a survey or, perhaps preferably, the information can be gathered through the accessibility statements.

In the pilot, we cooperated with the website owner of the sampled entity at the state level to identify which website to monitor. This public sector body has more than 20 websites in its portfolio. Similarly, the Norwegian Digitalisation Agency has close to 10 different websites, and we have seen similar cases in previous status surveys. Previous surveys have also shown that approximately one third of the entities share web solutions with other entities. One example is the Norwegian Government Security and Service Organisation, which manages the websites for all the Norwegian ministries.

This is particularly important to keep in mind when following up and communicating with the entities about the monitoring results.

For the pilot, we monitored one website per entity. All four websites were tested in simplified monitoring, while the state website was also tested in-depth. We have anonymised the data about the entities and websites in the pilot.

For a complete overview of the data we collected about the entities and websites, see chapter 6 Data Collection.

### 5.2. Web pages

The requirements for the monitoring of web pages are specified for each monitoring method. For the in-depth monitoring method, the following pages and documents, if existing, shall be monitored:

1. the home, login, sitemap, contact, help and legal information pages
2. at least one relevant page for each type of service provided by the website or mobile application and any other primary intended uses of it, including the search functionality
3. the pages containing the accessibility statement or policy and the pages containing the feedback mechanism
4. examples of pages having a substantially distinct appearance or presenting a different type of content
5. at least one relevant downloadable document, where applicable, for each type of service provided by the website or mobile application and any other primary intended uses of it
6. any other page deemed relevant by the monitoring body
7. randomly selected pages amounting to at least 10 % of the sample established by points 1 to 6
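Point 7 can be illustrated with a short sketch: given the pages selected under points 1 to 6 and a pool of other pages (for example from a crawl), draw random pages amounting to at least 10 % of the structured sample. The function name and the page lists are hypothetical, and rounding up is our reading of "at least". Integer arithmetic is used for the ceiling to avoid floating-point rounding surprises.

```python
import random


def add_random_pages(structured, pool, percent=10, seed=None):
    """Return the structured sample plus randomly drawn extra pages.

    The number of extra pages is at least `percent` % of the structured
    sample, rounded up. Pages already in the structured sample are skipped.
    """
    already = set(structured)
    candidates = [p for p in pool if p not in already]
    # Integer ceiling of percent % of the structured sample size.
    n_extra = (len(structured) * percent + 99) // 100
    if n_extra > len(candidates):
        raise ValueError("not enough candidate pages for the random sample")
    rng = random.Random(seed)
    return list(structured) + rng.sample(candidates, n_extra)


structured = [f"/structured/{i}" for i in range(12)]
pool = [f"/page/{i}" for i in range(100)] + structured[:3]
sample = add_random_pages(structured, pool, seed=1)
print(len(sample))  # 14 (12 structured pages + 2 random pages)
```

Passing a fixed `seed` makes the random draw reproducible, which supports the need for transparency and reproducibility in monitoring.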

The requirements in simplified monitoring are less specific, stating that a number of pages appropriate to the estimated size and the complexity of the website shall be monitored in addition to the home page.

We did not sample or test web pages that require log-in in the pilot monitoring, as these web pages are both difficult to locate and infeasible to test automatically. In both simplified and in-depth pilot monitoring, we also tested third-party content and other content exempt from the Directive, because analysing the web pages to exclude this content from the sample was deemed too resource-intensive for the pilot.

#### 5.2.1. In-depth monitoring

The sample of web pages for in-depth monitoring was drawn in cooperation with the website owner at the state level. In dialogue with the public sector body, we soon discovered that the most relevant pages and services required logging in with a national ID number, and that the log-in process also required two-step verification. The tools used in the pilot were unable to test these pages. Nor did we test downloadable documents, as these were outside the scope of the pilot.

An alternative way of sampling web pages for in-depth monitoring is to browse the website and use the search functionality to find web pages that are relevant to the requirements in the Directive.

The Directive requires monitoring of at least one relevant page for each type of service, and if any of these pages is a step in a process, all the steps of the process shall be verified. For large and complex websites, this may result in a large number of pages that must be verified in the in-depth monitoring, where all the requirements/Success Criteria shall be included.
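The rule that all steps of a process must be verified once one of its pages is sampled can be sketched as an expansion of the sample. This minimal example assumes that processes are known in advance as ordered lists of page URLs; the function name, process names and URLs are hypothetical. Only pages in the original sample trigger an expansion, so the result does not depend on the order in which processes are listed.

```python
def expand_processes(sample, processes):
    """Add every step of any process that has at least one page in the sample.

    `processes` maps a process name to its ordered list of page URLs.
    The original order of `sample` is kept; new steps are appended.
    """
    original = set(sample)
    result = list(sample)
    seen = set(sample)
    for steps in processes.values():
        if original.intersection(steps):
            for step in steps:
                if step not in seen:
                    result.append(step)
                    seen.add(step)
    return result


processes = {
    "apply-for-guide-dog": ["/apply/1", "/apply/2", "/apply/3"],
    "contact-form": ["/contact", "/contact/sent"],
}
print(expand_processes(["/", "/apply/2"], processes))
# ['/', '/apply/2', '/apply/1', '/apply/3']
```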

A considerable challenge is that the terms “Type of service” and “Process” are not defined in the Directive.

• In the pilot, we considered pages that are part of an interactive process on the website to constitute a “process”.
• We had a brief dialogue with the state entity on what types of services are offered through its website. It was quite challenging to identify the services offered via the website, especially since the term service is not defined, which makes it somewhat arbitrary what one considers to be a service. In any case, our inability to test pages that require log-in significantly limited the number of testable processes.

For a complete overview of the data we collected about the sampled web pages and processes for the in-depth monitoring, see chapter 6 Data Collection.

#### 5.2.2. Simplified monitoring

Due to limited resources in the pilot monitoring, we sampled the same number of pages (1000 web pages, including the homepage) for each of the selected websites. We used the tools supplied by Deque and Siteimprove, which facilitated automated sampling of the web pages using a crawler. FCID’s QualWeb Core was not used in the simplified monitoring, as the tool did not have a crawler at the time.
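A fixed-size crawl like the one used in the pilot can be sketched as a traversal of the site's link graph that starts at the homepage and stops after a set number of pages. The sketch below works on a hypothetical in-memory link graph instead of fetching pages, and does not reflect how the Deque or Siteimprove crawlers work. It also shows that breadth-first and depth-first traversal can sample different pages from the same site, and that pages nobody links to are never found.

```python
from collections import deque


def crawl(links, home, limit, depth_first=False):
    """Return up to `limit` pages reachable from `home`, homepage included.

    `links` maps each page to the pages it links to. Breadth-first order
    takes pages from the front of the frontier; depth-first order takes
    them from the back (stack behaviour).
    """
    frontier = deque([home])
    visited = []
    seen = {home}
    while frontier and len(visited) < limit:
        page = frontier.pop() if depth_first else frontier.popleft()
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited


site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
}
print(crawl(site, "/", limit=3))                    # ['/', '/a', '/b']
print(crawl(site, "/", limit=3, depth_first=True))  # ['/', '/b', '/b/1']
```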

According to the Directive, the sample of pages shall reflect the estimated size and complexity of the website. A crawler or a search engine can be used to estimate the size, but not the complexity.

Based on estimated figures from a search engine, the number of pages for each of the four sampled websites was as follows:

• NN Public sector body at state level: 24 600 pages (including 5 740 PDF documents).
• NN Public sector body at regional level: 3 240 pages (including 1 510 PDF documents).
• NN Public sector body at local level: 26 500 pages (including 5 580 PDF documents).
• NN Public sector body governed by public law: 2 630 pages (including 345 PDF documents).

However, it was difficult to determine the exact size and number of web pages of a website. The search engine did not count web pages and services that require log-in, and it also counted downloadable documents in its results.

Determining the size of the sample in simplified monitoring is a complex matter. Some possible solutions are listed in the table below:

Table 12: Possible solutions for determining the size of the sample in simplified monitoring

| Sampling method | Pros | Cons |
|---|---|---|
| A set number of pages per website, such as 1000 web pages, or all web pages if the website has fewer than 1000. | The crawl is set at a fixed number, and there is no need to determine the size of the website. | The size and complexity of the website are not considered when sampling the number of test pages. |
| Set intervals of web pages, determined by the size of the website: X pages for a small website, Y pages for a medium-sized website, and Z pages for a large website. | The websites are classified, and there is a closer relationship between the size (and complexity) of the website and the number of sampled web pages. | The whole website must be crawled to determine its size and complexity. The intervals need to be established. |
| A percentage of the total number of web pages on the website. | A set percentage of web pages is tested for all monitored websites, which is a more even approach to the size (and complexity) of the websites. | The whole website must be crawled to determine its size and complexity. The percentage needs to be established. |
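The three options in Table 12 can be compared side by side with a short sketch. The interval thresholds and page counts, and the 5 % share for the percentage method, are placeholders chosen only for illustration (the report leaves X, Y and Z open). The website sizes are the search-engine estimates listed earlier in this section minus their PDF documents.

```python
def fixed(size, cap=1000):
    """Fixed number of pages, or all pages if the website is smaller."""
    return min(size, cap)


def by_interval(size, bands=((2_000, 200), (10_000, 500)), large=1000):
    """Pages per size band; the thresholds and counts are placeholders."""
    for upper, pages in bands:
        if size <= upper:
            return min(size, pages)
    return large


def by_percentage(size, percent=5):
    """A set percentage of all pages, rounded up, and at least the homepage."""
    return max(1, (size * percent + 99) // 100)


# Search-engine estimates minus PDF documents.
sizes = {
    "state": 24_600 - 5_740,
    "regional": 3_240 - 1_510,
    "local": 26_500 - 5_580,
    "public law": 2_630 - 345,
}
for name, size in sizes.items():
    print(name, size, fixed(size), by_interval(size), by_percentage(size))
```

With these placeholder values, the fixed method gives every site 1000 pages, while the percentage method ranges from under 100 pages for the regional website to over 1000 for the local one, which illustrates the trade-off described in the table.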

As these sampling methods require that the size and complexity of the website are known, the monitoring bodies may find it insightful to establish a dialogue with the monitored entities. However, this may soon become a manual and resource-intensive process when involving many stakeholders.

To assess the complexity of the website, a crawler can be used to collect information about the types of content and elements found on the web pages. The crawler results can then be used to target web pages with content that is relevant for further testing. However, a crawler cannot find unlinked pages or pages behind log-in.
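As an illustration of collecting content types during a crawl, the sketch below counts a few element types on a page using Python's standard `html.parser` module. Which elements signal complexity (here forms, media, iframes, images and tables) is our assumption; a real crawler would pass in the HTML of each visited page.

```python
from collections import Counter
from html.parser import HTMLParser

# Element types we assume are useful for targeting pages for further testing.
INTERESTING = {"form", "video", "audio", "iframe", "img", "table"}


class ElementCounter(HTMLParser):
    """Counts start tags of the element types listed in INTERESTING."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in INTERESTING:
            self.counts[tag] += 1


def profile(html: str) -> Counter:
    parser = ElementCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


page = "<main><img src='a.png' alt=''><form><input></form><video src='v.mp4'></video></main>"
print(dict(profile(page)))  # {'img': 1, 'form': 1, 'video': 1}
```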

Note: The development of a crawler is not a part of the WAI-Tools project.

### 5.3. Learnings

Sampling of public sector bodies/entities:

• We will benefit from developing an efficient method for establishing a representative sample in line with the Directive. This will hopefully contribute to the reliability and social significance of the analysis of accessibility barriers uncovered by the monitoring.
• A representative sample will allow us to generalize the monitoring results and establish a national indicator on the degree of compliance with the accessibility requirements set out in the Directive, and an overall accessibility indicator for websites. This may also be suitable for benchmarking purposes.
• In order to select a representative sample, we need information about the population of public sector bodies. The Norwegian Register of Legal Entities (or similar) can be used for drawing a sample of public sector bodies.
• The classifications of the institutional sector and industry as listed in the register can be useful to determine the level of administration (state, regional, local, body governed by public law). Based on a combination of the classifications of the institutional sector and industry, we can also get an indication of the area of service.
• There is great potential in more automation, for instance by drawing (as far as possible) a random sample from the Norwegian Register of Legal Entities (or similar), based on specifications of the criteria mentioned in the Directive.
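Drawing a random sample from the register according to the Directive's criteria can be sketched as stratified sampling by level of administration. The register extract, the per-level quotas and the seed below are hypothetical; in practice, the quotas would follow from Table 11 and the further criteria in chapter 5.1 (geography, area of service) would add more strata.

```python
import random
from collections import defaultdict


def stratified_sample(entities, quotas, seed=None):
    """Draw `quotas[level]` entities at random from each level of administration.

    `entities` is a list of dicts with at least "org_number" and "level".
    Raises ValueError if a stratum has fewer entities than its quota.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for entity in entities:
        strata[entity["level"]].append(entity)
    sample = []
    for level, quota in quotas.items():
        members = strata.get(level, [])
        if quota > len(members):
            raise ValueError(f"stratum {level!r} has only {len(members)} entities")
        sample.extend(rng.sample(members, quota))
    return sample


# Hypothetical register extract.
register = (
    [{"org_number": f"S{i}", "level": "State"} for i in range(20)]
    + [{"org_number": f"R{i}", "level": "Regional"} for i in range(5)]
    + [{"org_number": f"L{i}", "level": "Local"} for i in range(100)]
    + [{"org_number": f"P{i}", "level": "Body governed by public law"} for i in range(30)]
)
quotas = {"State": 3, "Regional": 1, "Local": 5, "Body governed by public law": 2}
sample = stratified_sample(register, quotas, seed=42)
print(len(sample))  # 11
```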

Sampling of websites:

• There is no register in Norway that is suitable for drawing a sample of websites, so the total website population is unknown. To locate and draw a sample of websites that fully meets the criteria in the Directive, we need to collect data that shows which web solutions belong to each selected entity.
• Some entities have more than one website. In these cases, we need to determine which website to include in the monitoring. On the other hand, some entities share web solutions with others, and in those cases, we need to identify who is responsible. In the pilot, we needed to combine information from the register with online searches for public bodies’ websites, which was time-consuming.
• A combination of data from the Norwegian Register of Legal Entities and data input from the entities through the accessibility statements might be an efficient method. We can use the statements to collect structured data on level of administration, area of service, which websites (and mobile applications) that are relevant for monitoring, etc.
• Still, there are important questions concerning the sampling requirements that are not specified in the Directive, such as:
  • Whether or not the samples for simplified and in-depth monitoring must be selected in separate operations
  • Whether the samples for simplified and in-depth monitoring shall both aim for a diverse, representative, and geographically balanced distribution
  • Whether the in-depth monitoring can be based on a selection of the websites in the simplified monitoring

It would be very helpful if the EU could clarify these matters.

Sampling of pages (and documents) - in-depth monitoring:

• It should be considered whether the requirements for the sample of test pages are too extensive. Sampling web pages is a complex and time-consuming task that requires manual effort. One option is to establish a dialogue with the website owners about the sampling of pages and documents; however, we must consider whether this is cost-efficient.
• The terms “Type of service” and “Process” should be defined to avoid an arbitrary approach when identifying services and processes. In a real monitoring, this also applies to downloadable documents, as the Directive requires testing of at least one relevant downloadable document, where applicable, for each type of service provided by the website.
• If the sampled pages in in-depth monitoring are part of a process, the Directive requires that all steps in the process are monitored. In our experience, processes may require log-in using a national ID number. Therefore, it is crucial that we establish a method for acquiring log-in access to these processes/web pages.

Sampling of pages (and documents) - simplified monitoring:

• We need a scale or criteria for assessing what will be a suitable number of test pages, based on the estimated size and the complexity of the website that shall be monitored.
• For simplified monitoring, we need access to a crawler to sample web pages. However, the crawlers in different tools may use different methods to crawl the websites, which can cause the tools to sample different pages on the same website.
• Crawling is also required to estimate the size of a website. In this regard, one should be aware that crawling cannot find hidden pages, subdomains, or pages that require log-in.
• In the absence of adequate alternatives, we still consider using a crawler to sample web pages to be the most efficient method.

Sampling of pages (and documents) - both in-depth and simplified:

• We need a method to exclude pages that include third-party content and other content exempt from the Directive.
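A rough first step towards such a method is to flag pages that embed resources from other hosts. The sketch below is a heuristic under our own assumptions: only `src` attributes on `iframe`, `script`, `img`, `video` and `audio` elements are checked, and the set of the website's own hosts is supplied by hand. It cannot decide whether embedded content is actually exempt under the Directive; it only narrows down which pages to inspect.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

EMBED_TAGS = {"iframe", "script", "img", "video", "audio"}


class EmbedCollector(HTMLParser):
    """Collects the hosts of resources embedded through `src` attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in EMBED_TAGS:
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative URLs against the page URL before reading the host.
            self.hosts.add(urlparse(urljoin(self.base_url, src)).hostname)


def third_party_hosts(page_url, html, own_hosts):
    """Return sorted hosts of embedded resources not in `own_hosts`."""
    collector = EmbedCollector(page_url)
    collector.feed(html)
    collector.close()
    return sorted(h for h in collector.hosts if h and h not in own_hosts)


html = ('<img src="/logo.png"><iframe src="https://video.example.net/embed/1"></iframe>'
        '<script src="https://cdn.example.com/app.js"></script>')
print(third_party_hosts("https://www.example.org/", html, {"www.example.org"}))
# ['cdn.example.com', 'video.example.net']
```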

We need to explore whether it is possible to test web pages that require log-in using automated tools. This applies primarily to in-depth monitoring, but it is desirable to be able to cover this type of content in a simplified monitoring as well.

## 6. Step 3: Data collection

In the pilot, we collected data about the entities, websites, and web pages, in addition to test results from the tools. The data collection is based on the data requisites for monitoring, listed in chapter 4.

For the websites in the sample, our aim was to collect data about each web page and tested element in order to provide results for compliance and non-compliance for each accessibility requirement in the Directive.

The collected data are listed in this chapter.

The following table summarises each data item we collected about the entities sampled in the pilot, the source of the data, and how we collected the data. The data requisite is described in chapter 4.2.1.

Table 13: Collection of data about the entity (public sector body)

| Data item | Source of data | How did we collect data in the pilot? |
|---|---|---|
| Name of entity | Norwegian Register of Legal Entities | In the pilot, we had a predefined sample of entities, and the names of the entities were already known. In a real monitoring, we will draw a sample of entities from the Norwegian Register of Legal Entities. |
| Organisation number | Norwegian Register of Legal Entities | Searching for the entity in the register. |
| Address (geographic location) | Norwegian Register of Legal Entities | Searching for the entity in the register. |
| Classification of institutional sector | Norwegian Register of Legal Entities | Searching for the entity in the register. |
| Standard industrial classification | Norwegian Register of Legal Entities | Searching for the entity in the register. |
| Level of administration | Norwegian Register of Legal Entities | The classification of the institutional sector allows us to place the entity in one of the four categories described in the Directive. |

Based on the above sources, we collected data about the sampled entities. The data is listed in the table below.

Table 14: Data collected about the entities (public sector bodies)

| Data item | State | Regional (NUTS3) | Local (LAU) | Body governed by public law |
|---|---|---|---|---|
| Name of entity | NN Public sector body state level | NN Public sector body regional level | NN Public sector body local level | NN Public sector body governed by public law |
| Organisation number | 889 *** *** | 817 *** *** | 940 *** *** | 960 *** *** |
| Classification of institutional sector | 6100 Central government | 6500 Local government | 6500 Local government | 3900 State lending institutions etc. |
| Standard industrial classification | 84.120 Regulation of the activities of providing health care, education, cultural services, and other social services, excluding social security | 84.110 General public administration activities | 84.110 General public administration activities | 64.920 Other credit granting |
| Level of administration | State | Regional (county) | Local (municipality) | Body governed by public law |

### 6.2. Data about ICT solutions (websites)

The following table summarises each data item we collected about the websites sampled in the pilot, the source of the data, and how we collected the data. The data requisite is described in chapter 4.2.2.

Table 15: Collection of data about ICT solutions (websites)

| Data item | Source of data | How did we collect data in the pilot? |
|---|---|---|
| Website address (URL) | Norwegian Register of Legal Entities, then search the internet | Two of the four public sector bodies had website addresses listed in the Norwegian Register of Legal Entities. For the others, we searched for the entity’s website and inspected it to determine the address. |
| Name of ICT | Norwegian Register of Legal Entities and/or search the internet | Two of the four public sector bodies had website addresses listed in the Norwegian Register of Legal Entities. For the others, we searched for the entity’s website and inspected it to determine the name. |
| Type of ICT | The website | In the pilot, we only included websites in the scope. |
| Area of service | The website in combination with the Norwegian Register of Legal Entities | We used a combination of the Classification of institutional sector and the Standard industrial classification, together with browsing the website, to determine which area of service the website could be categorised under. In the pilot, we categorised the websites using the list of areas of service specified in the Implementing Decision: social protection, health, transport, education, employment and taxes, environmental protection, recreation and culture, housing and community amenities, and public order and safety. |
| Type of services (in-depth only) | The website | We had a dialogue with the public sector body about which services they offered and which were suitable for the pilot. In addition, we browsed the website to determine which services are offered. |

Note: We did not consult national stakeholders on the composition of the sample of entities and web solutions, as this was a pilot covering only a few websites.

The complete data collected about the sample of websites is listed in the following table:

Table 16: Data collected about the ICT solutions (websites)
| Data Item | State | Regional (NUTS3) | Local (LAU) | Body governed by public law |
|---|---|---|---|---|
| Name of ICT | NN website of Public sector body state level | NN website of Public sector body regional level | NN website of Public sector body local level | NN website of Public sector body governed by public law |
| Type of ICT | Website | Website | Website | Website |
| Area of service | Social protection, employment | Health, transport, education, recreation and culture | Social protection, health, education, recreation and culture, housing and community amenities | Education |
| Types of services (in-depth only) | The services included in the pilot were Parental benefits and Apply for guide dog and service dog. Note: This is only a small subset of the services offered by the website. | Not applicable in simplified monitoring. | Not applicable in simplified monitoring. | Not applicable in simplified monitoring. |

The following table summarises each data item we collected about the web pages sampled in the pilot, the source of the data, and how we collected the data. The data requisite is described in chapter 4.2.3.

Table 17: Collection of data about pages
| Data Item | Source of data | How did we collect data in the pilot? |
|---|---|---|
| Type of page | The web page | In-depth: We inspected the content of the web page to categorise it into one of the example categories described in the Implementing Decision, such as home, login, sitemap, contact, help and legal information, service, accessibility statement or policy, page containing the feedback mechanism, other, or randomly selected page. Simplified: Only the home page needs to be specified; the other test pages do not need categorization. |
| Type of service (in-depth only) | The web page | We had a dialogue with the website owner and inspected the page to identify which service the entity offered through the web page. |
| Process (in-depth only) | The web page | We inspected the page for indications that it was part of a process spanning multiple pages, such as next/previous buttons or numbering of the pages. |
| Address | The website/the web page | We browsed the website to locate the web page. |
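In the pilot, process pages were identified by manual inspection. One way to pre-screen pages for the same indications automatically is sketched below. The function flags a page when a link or button is labelled "next" or "previous" (or the Norwegian "neste" or "forrige"), or when the page text contains step numbering such as "Step 2 of 4". The keyword list and the step pattern are our own assumptions, and a flagged page would still need manual confirmation.

```python
import re
from html.parser import HTMLParser

# Assumed labels for next/previous controls, in English and Norwegian.
NAV_WORDS = {"next", "previous", "neste", "forrige"}
# Assumed phrasing of step numbering, e.g. "Step 2 of 4" or "Steg 2 av 4".
STEP_PATTERN = re.compile(r"\b(?:step|steg)\s+\d+\s+(?:of|av)\s+\d+\b", re.IGNORECASE)


class ControlTextParser(HTMLParser):
    """Collect the text of <a> and <button> elements, and all text on the page."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # > 0 while inside a link or button
        self.control_texts = []   # one entry per link or button
        self.page_text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "button"):
            self.depth += 1
            self.control_texts.append("")

    def handle_endtag(self, tag):
        if tag in ("a", "button") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        self.page_text.append(data)
        if self.depth > 0:
            self.control_texts[-1] += data


def looks_like_process_step(html: str) -> bool:
    """Flag a page that shows the indications of a multi-page process."""
    parser = ControlTextParser()
    parser.feed(html)
    parser.close()
    if any(text.strip().lower() in NAV_WORDS for text in parser.control_texts):
        return True
    return STEP_PATTERN.search(" ".join(parser.page_text)) is not None


print(looks_like_process_step("<form><button>Neste</button></form>"))
```

Exact label matching keeps false positives low, but it will miss controls labelled with extra text or icons, so it is a screening aid rather than a replacement for inspection.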

#### 6.3.1. In-depth

The data collected about the sample of web pages for the website tested in in-depth monitoring is listed in the following table. Each page was connected to a type of service and process, if applicable.

Table 18: Pages in the in-depth monitoring
| Type of page | Type of service | Process | Address |
|---|---|---|---|
| Home | Not applicable | No | `https://www.***.no/no/person` |
| Contact | Not applicable | No | `https://www.***.no/person/kontakt-oss/` |
| Help | Not applicable | No | `https://www.***.no/no/***-og-samfunn/kontakt-***/teknisk-brukerstotte/hjelp-til-personbruker` |
| Service | Apply for guide dog and service dog | Yes | `https://www.***.no/soknader/nb/person/hjelpemidler-og-tilrettelegging/forerhund-og-servicehund#***100750` |
| Service | Apply for guide dog and service dog | Yes | `https://www.***.no/soknader/nb/person/hjelpemidler-og-tilrettelegging/forerhund-og-servicehund/***%2010-07.50/brev` |
| Accessibility statement | Not applicable | No | `https://www.***.no/no/***-og-samfunn/kontakt-***/teknisk-brukerstotte/nyttig-a-vite/tilgjengelighet-og-universell-utforming` |
| Feedback mechanism | Not applicable | No | `https://www.***.no/person/kontakt-oss/tilbakemeldinger/feil-og-mangler` |
| Page having a distinct appearance | Work assessment allowance | No | `https://www.***.no/no/Person/Arbeid/Arbeidsavklaringspenger` |
| Page having a distinct appearance | Work assessment allowance | No | `https://www.***.no/no/person/arbeid/arbeidsavklaringspenger/arbeidsavklaringspenger-aap` |

We sampled all relevant types of pages that were present on the website, as well as pages included in processes that did not require login.
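Checking that every expected page type is represented in the sample amounts to a set comparison. The sketch below uses simplified labels of our own for the page types named in the Implementing Decision, together with the page types sampled in Table 18. Service pages and other or randomly selected pages would be checked separately.

```python
# Simplified labels (our own) for page types named in the Implementing Decision.
# Service pages and other/randomly selected pages are checked separately.
EXPECTED_TYPES = {
    "home", "login", "sitemap", "contact", "help",
    "accessibility statement", "feedback mechanism",
}


def missing_page_types(expected, sampled_types):
    """Return the expected page types that have no sampled page, sorted."""
    present = {t.strip().lower() for t in sampled_types}
    return sorted(t for t in expected if t not in present)


# Page types sampled in the in-depth monitoring (Table 18).
table_18 = [
    "Home", "Contact", "Help", "Service", "Service", "Accessibility statement",
    "Feedback mechanism", "Page having a distinct appearance",
    "Page having a distinct appearance",
]
print(missing_page_types(EXPECTED_TYPES, table_18))
```

For the Table 18 sample, the function reports login and sitemap. Each of these would then be documented as either not present on the website or not sampled.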

#### 6.3.2. Simplified

In simplified monitoring, only the home page was specified; the other pages did not need categorization. In addition to the home page, we crawled up to 1000 pages per website in the pilot.

Since listing up to 1000 pages per website for each tool would be too extensive, the following table gives the total number of pages (including the home page) that were crawled and tested per website:

Table 19: Pages in the simplified monitoring
| Entity | Estimated size | Number of pages selected for test – Tool 1 | Number of pages selected for test – Tool 2 |
|---|---|---|---|
| NN Public sector body state level | 24 600 pages (incl. 5 740 PDFs) | 912 | 991 |
| NN Public sector body regional level | 3 240 pages (incl. 1 510 PDFs) | 796 | 964 |
| NN Public sector body local level | 26 500 pages (incl. 5 580 PDFs) | 948 | 979 |
| NN Public sector body governed by public law | 2 630 pages (incl. 345 PDFs) | 991 | 999 |

As shown in the table, the tools differed in the number of web pages they found when crawling. For practical reasons, we set a fixed maximum number of web pages in the pilot, which means that the share of each website covered by the test differs a lot. In a real monitoring, the number of test pages shall reflect the size and complexity of the website. In the pilot, we did not investigate the complexity of the websites, since the term is not defined in the Directive.
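To make the difference in coverage concrete, the following sketch computes the share of each website's estimated size that was tested, using the figures from Table 19. It assumes the estimated size, which includes PDFs, is a fair denominator. That is a simplification, since the tested pages are not broken down by HTML and PDF.

```python
# Estimated size and pages tested per tool, copied from Table 19.
TABLE_19 = {
    "State level": (24_600, 912, 991),
    "Regional level": (3_240, 796, 964),
    "Local level": (26_500, 948, 979),
    "Governed by public law": (2_630, 991, 999),
}


def coverage_percent(pages_tested: int, estimated_size: int) -> float:
    """Share of the estimated website size that was tested, in percent."""
    return 100 * pages_tested / estimated_size


for entity, (size, tool_1, tool_2) in TABLE_19.items():
    print(f"{entity:<24} Tool 1: {coverage_percent(tool_1, size):4.1f} %   "
          f"Tool 2: {coverage_percent(tool_2, size):4.1f} %")
```

On these figures, coverage is roughly 3.6 to 4.0 % for the state-level and local-level websites, and roughly 25 to 38 % for the regional and public-law websites.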

### 6.4. Data about the requirement

The following table summarises each data item we collected about each requirement (WCAG Success Criterion) in the pilot, the source of the data, and how we collected the data. The data requisite is described in chapter 3.2.4.

Table 20: Collection of data about the requirements
| Data Item | Source of data | How did we collect data in the pilot? |
|---|---|---|
| Standard | Directive (EU) 2016/2102 and Commission Implementing Decision (EU) 2018/2048 | We used the standard referenced in the Directive. At the time of the pilot, the latest version of the standard referenced in the Official Journal of the European Union was EN 301 549 V2.1.2. According to EN 301 549 V2.1.2, conformance with the web requirements is equivalent to conforming with the Web Content Accessibility Guidelines (WCAG) 2.1 Level AA. |
| Version | Directive (EU) 2016/2102 and Commission Implementing Decision (EU) 2018/2048 | We used the standard referenced in the Directive. At the time of the pilot, the latest version of the standard referenced in the Official Journal of the European Union was EN 301 549 V2.1.2. For WCAG, the relevant version was WCAG 2.1. |
| Requirement | EN 301 549 V2.1.2, chapter 9 | The number, name, and conformance level of the requirement (Success Criterion) are specified in the EN standard. |
| Principle in WCAG | EN 301 549 V2.1.2, chapter 9 | The corresponding WCAG principle is specified in the EN standard. |
| Guideline in WCAG | EN 301 549 V2.1.2, chapter 9 | The corresponding WCAG guideline is specified in the EN standard. |
| User accessibility need | EN 301 549 V2.1.2, Annex B | The user accessibility needs (Functional Performance Statements) are mapped to each requirement in the standard. |

The full list of requirements covered in the pilot is described in chapter 3.3.2. As an example, the data about a single requirement was structured in the following way:

Table 21: Example of data about a requirement
| Data Item | Data collected in the pilot |
|---|---|
| Standard | EN 301 549 / Web Content Accessibility Guidelines (WCAG) |
| Version | V2.1.2 / 2.1 |
| Requirement | 9.1.1.1 Non-text Content |
| Principle in WCAG | 1. Perceivable |
| Guideline in WCAG | 1.1 Text alternatives |
| User accessibility need | Primary relationship: Usage without vision; Usage with limited vision; Usage without hearing. Secondary relationship: Usage with limited hearing; Usage with limited cognition. |
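For analysis, each requirement can be held as a small record. The dataclass below is a hypothetical model that mirrors the data items of Table 21; the field names are our own. The `affects` helper shows how the user accessibility needs might be queried, for example to find every requirement relevant to users without vision.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record mirroring the data items in Table 21."""
    standard: str
    version: str
    requirement: str            # EN 301 549 clause number and name
    principle: str              # WCAG principle
    guideline: str              # WCAG guideline
    primary_needs: Tuple[str, ...]
    secondary_needs: Tuple[str, ...] = ()

    def affects(self, need: str) -> bool:
        """True if the need has a primary or secondary relationship to the requirement."""
        return need in self.primary_needs or need in self.secondary_needs


non_text = Requirement(
    standard="EN 301 549 / WCAG",
    version="V2.1.2 / 2.1",
    requirement="9.1.1.1 Non-text Content",
    principle="1. Perceivable",
    guideline="1.1 Text alternatives",
    primary_needs=("Usage without vision", "Usage with limited vision",
                   "Usage without hearing"),
    secondary_needs=("Usage with limited hearing", "Usage with limited cognition"),
)
print(non_text.affects("Usage with limited cognition"))
```

With one such record per requirement, filtering a list of records by `affects` gives the subset of results relevant to a particular user group.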

The requirements are based on the standard EN 301 549 V2.1.2. We collected and documented data about the requirements as presented in the table below:

Table 22: Data collected about the requirements
| Requirement | Principle in WCAG | Guideline in WCAG | Success Criterion | User Accessibility Needs (Functional performance statement) |
|---|---|---|---|---|
| 9.1.1.1 Non-text Content | 1. Perceivable | 1.1 Text Alternatives | 1.1.1 Non-text Content (Level A) | Primary relationship: Usage without vision; Usage with limited vision; Usage without hearing. Secondary relationship: Usage with limited hearing; Usage with limited cognition. |
| 9.1.2.2 Captions (Prerecorded) | 1. Perceivable | 1.2 Time-based Media | 1.2.2 Captions (Prerecorded) (Level A) | Primary relationship: Usage without hearing; Usage with limited hearing. Secondary relationship: Usage with limited cognition. |
| 9.1.2.3 Audio Description or Media Alternative (Prerecorded) | 1. Perceivable | 1.2 Time-based Media | 1.2.3 Audio Description or Media Alternative (Prerecorded) (Level A) | Primary relationship: Usage without vision. Secondary relationship: Usage with limited vision; Usage with limited cognition. |
| 9.1.3.1 Info and Relationships | 1. Perceivable | 1.3 Adaptable | 1.3.1 Info and Relationships (Level A) | Primary relationship: Usage without vision. Secondary relationship: Usage with limited vision; Usage with limited cognition. |
| 9.1.3.4 Orientation | 1. Perceivable | 1.3 Adaptable | 1.3.4 Orientation (Level AA) | Primary relationship: Usage with limited manipulation or strength. Secondary relationship: Usage with limited cognition. |
| 9.1.3.5 Identify Input Purpose | 1. Perceivable | 1.3 Adaptable | 1.3.5 Identify Input Purpose (Level AA) | Primary relationship: Usage with limited vision. Primary relationship: Usage without vision; Usage with limited vision; Usage without hearing; Usage with limited hearing; Usage with limited manipulation or strength; Usage with limited cognition. |
| 9.2.4.2 Page Titled | 2. Operable | 2.4 Navigable | 2.4.2 Page Titled (Level A) | Primary relationship: Usage without vision; Usage with limited vision; Usage with limited manipulation or strength; Usage with limited cognition. |
| 9.2.4.4 Link Purpose (In Context) | 2. Operable | 2.4 Navigable | 2.4.4 Link Purpose (In Context) (Level A) | Primary relationship: Usage without vision; Usage with limited vision; Usage with limited manipulation or strength; Usage with limited cognition. Secondary relationship: Usage without the vocal capability. |
| 9.3.1.1 Language of Page | 3. Understandable | 3.1 Readable | 3.1.1 Language of Page (Level A) | Primary relationship: Usage without vision. Secondary relationship: Usage with limited vision; Usage without hearing; Usage with limited hearing; Usage with limited cognition. |
| 9.3.1.2 Language of Parts | 3. Understandable | 3.1 Readable | 3.1.2 Language of Parts (Level AA) | Primary relationship: Usage without vision. Secondary relationship: Usage with limited vision; Usage without hearing; Usage with limited hearing; Usage with limited cognition. |
| 9.4.1.1 Parsing | 4. Robust | 4.1 Compatible | 4.1.1 Parsing (Level A) | Primary relationship: Usage without vision. Secondary relationship: Usage with limited vision. |
| 9.4.1.2 Name, Role, Value | 4. Robust | 4.1 Compatible | 4.1.2 Name, Role, Value (Level A) | Primary relationship: Usage without vision; Usage with limited vision. Secondary relationship: Usage with limited manipulation or strength. |

### 6.5 Data about the test results

In this chapter, we present the test results, and data about them, at the page level and the element level, for both in-depth and simplified monitoring. The Directive requires test results at the page level. However, to provide public sector bodies with data and information on compliance and non-compliance with the accessibility requirements, detailed test results for each tested element are also needed. These help the public sector bodies correct the errors found on their websites.

It is also essential to monitor compliance (and non-compliance) with the actual requirements in the Directive. Thus, the results must be specified per Success Criterion, and not only per test rule.

We first present the data items we collected and calculated, followed by the actual results and findings.

#### 6.5.1. Test results at the page level

The following table summarises each data item we collected about the test results for a tested page in the pilot, the source of the data, and how we collected the data. The data requisite is described in chapter 4.2.6.

Table 23: Collection of test results at the page level
| Data Item | Source of data | How did we collect data in the pilot? |
|---|---|---|
| Page or document | Output from tool | The test reports from all the tools showed which web pages had been tested. |
| Requirement | Output from tool | Two of the tools mapped the test results to ACT Rules, which we then had to map to Success Criteria using the ACT Rule documentation. One of the tools mapped the test results directly to Success Criteria. |
| Test method | Documentation about the tool and test rules | We collected information about the tool and test rules from the tool documentation provided by the project partners. To obtain a mapping between test method, ACT Rules, and Success Criteria, we asked the project partners for further documentation; at the time of the pilot, this information could not be obtained by investigating the tools alone. When the test method was last updated follows from the version number of the tool. As for test mode, we only used automated test rules in the pilot. |
| Compliance status | Output from tool | We calculated compliance for each tested page per Success Criterion included in the test. The test report from one of the tools gave the actual results (failed, passed, and inapplicable) for all tested elements; from the other two tools, we could only retrieve reports about failed elements. We therefore calculated compliance status from the occurrence of failed elements on the page: any failed element resulted in non-compliance, while passed, inapplicable, or untested elements resulted in compliance. |
| Failed elements | Output from tool | We used the test results to count the number of failed elements on the page. All three tools reported failed elements per test rule implementation, per page. Further calculation is needed to count failed elements at the Success Criterion level. |
| Date tested | Output from tool | This information was documented in the reports from each tool. |
| Tested by | The entity that performed the test | Testing was conducted by the Norwegian Digitalisation Agency. We documented this in case we later link data from the accessibility statements to the monitoring results. |
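The compliance rule described above (any failed element makes the page non-compliant for that Success Criterion) can be expressed compactly. The following is a sketch under assumptions: the input is a list of hypothetical `(page, criterion, outcome)` tuples rather than any tool's actual export format.

```python
from collections import defaultdict
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    INAPPLICABLE = "inapplicable"
    UNTESTED = "untested"


def page_compliance(results):
    """Map (page, criterion) to a compliance status.

    results: iterable of (page, success_criterion, Outcome) tuples.
    Any failed element makes the page non-compliant for that criterion;
    passed, inapplicable, or untested elements do not.
    """
    failed = defaultdict(bool)
    for page, criterion, outcome in results:
        failed[(page, criterion)] |= outcome is Outcome.FAILED
    return {key: "non-compliant" if has_failure else "compliant"
            for key, has_failure in failed.items()}


results = [
    ("/home", "1.1.1", Outcome.PASSED),
    ("/home", "1.1.1", Outcome.FAILED),
    ("/home", "4.1.2", Outcome.INAPPLICABLE),
    ("/contact", "1.1.1", Outcome.PASSED),
]
print(page_compliance(results))
```

A page and criterion with no records at all does not appear in the output. This is why reports that list only failed elements make it hard to distinguish a compliant page from one that was never tested.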

#### 6.5.2 Test results at the element level

The following table summarises each data item about the test results for a tested element in the pilot and the source of the data. The data requisite is described in chapter 4.2.7.

Table 24: Collection of test results at the element level
| Data Item | Source of data | How did we collect data in the pilot? |
|---|---|---|
| Tested element | Output from tool | We were unable to collect this data efficiently in the pilot for any of the tools used, because we could not export the data from the tools without substantial manual effort. Instead, we counted the total number of failed elements per Success Criterion on each website. Only one of the tools reported status for elements with the outcomes “passed” and “inapplicable”. |
| Page or document | Output from tool | |
| Requirement | Output from tool | |
| Outcome | Output from tool | |
| Date tested | Output from tool | |
| Tested by | The entity that performed the test | |

As we were unable to collect data for each tested element efficiently, we counted the number of failed elements based on aggregated data from the tools. We did this by extracting “failed elements” per test rule and then aggregating the data to show the number of failed elements per Success Criterion.

This applies to both in-depth and simplified monitoring.

Note: EN 301 549 does not provide requirements at the element level.

However, in our opinion, results at the element level increase the value of the guidance given to public sector bodies in their efforts to improve their websites. At present, the counts depend on the particular tool implementations and how they carry out the testing. Further guidance on mapping results to individual page elements is needed.
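The aggregation from test rule level to Success Criterion level can be sketched as follows. The rule identifiers and their mapping are hypothetical. In practice, the mapping comes from each tool's documentation and the ACT Rule documentation, and one rule may map to several criteria.

```python
from collections import Counter

# Hypothetical mapping from one tool's test rules to WCAG Success Criteria.
# Several rules may map to one criterion, and one rule to several criteria.
RULE_TO_SC = {
    "rule-a": ["1.1.1"],
    "rule-b": ["1.1.1"],
    "rule-c": ["4.1.2"],
    "rule-d": ["1.3.1", "4.1.2"],
}


def failed_elements_per_sc(failed_per_rule):
    """Add up failed-element counts from test-rule level to Success Criterion level.

    failed_per_rule: {rule_id: number of failed elements}. A rule missing from
    RULE_TO_SC raises KeyError, so unmapped results are not silently dropped.
    """
    totals = Counter()
    for rule, count in failed_per_rule.items():
        for criterion in RULE_TO_SC[rule]:
            totals[criterion] += count
    return totals


print(failed_elements_per_sc({"rule-a": 5, "rule-b": 3, "rule-d": 2}))
```

If two rules flag the same element, simple summing counts that element twice. This is one concrete way the counts come to depend on how each tool splits its rules.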

#### 6.5.3. Test results from in-depth monitoring

We used the tools from Deque, FCID, and Siteimprove in the in-depth monitoring, in which we tested 9 pages sampled from the state-level website.

The table below shows, for each Success Criterion, the number of unique web pages with one or more failure results, as reported by the tools. The number of failure results is shown in parentheses.

Table 25: Test results from in-depth monitoring
Number of unique web pages where a failure was detected (the number of failed results is shown in parentheses)

| WCAG Success Criterion | Tool 1 | Tool 2 | Tool 3 |
|---|---|---|---|
| 1.1.1 Non-text Content | | 9 (18) | 9 (18) |
| 1.2.2 Captions (Prerecorded) | | | |
| 1.2.3 Audio Description or Media Alternative (Prerecorded) | | | |
| 1.3.1 Info and Relationships | | | 7 (28) |
| 1.3.4 Orientation | | | |
| 1.3.5 Identify Input Purpose | | | |
| 2.4.2 Page Titled | | | |
| 2.5.3 Label in Name | | | |
| 3.1.1 Language of Page | | | |
| 3.1.2 Language of Parts | | | |
| 4.1.1 Parsing | 1 (1) | 1 (10) | 1 (10) |
| 4.1.2 Name, Role, Value | | 2 (8) | 7 (29) |

As mentioned in chapter 4.3, tools may implement the ACT Rules differently, even when their implementations map to the same ACT Rules. For example, one tool had several internal rules that map to a single ACT Rule, while another tool had a one-to-one mapping. This affects the test results generated by the tools and makes it difficult to aggregate test results to the Success Criterion level, or to specify results per unique test page, without further processing of the data.

#### 6.5.4. Test results from simplified monitoring

Within the timeframe of the pilot, we were not able to organise the data from the simplified monitoring at the page level for presentation in this report. To present the number of unique pages with failed elements for each tested Success Criterion, we would have needed to manually register and store test data for each sampled page. We did this for each tested page in the in-depth monitoring, as presented above, but it would have been too time-consuming for the up to 1000 pages per website tested in simplified monitoring.

The challenge can be summarized in the following way:

• The tools used in the simplified monitoring reported unique pages with failure results per test rule, and not directly per Success Criterion. Further processing would be necessary to combine these individual results.
• Since the tools in many cases had multiple test rules per Success Criterion (not 1 to 1), it became difficult to calculate compliance status per Success Criterion without further guidance.
• Therefore, simply aggregating results from the test rule level did not give us the number of unique pages that failed a given Success Criterion. Other tools, possibly built on these checker engines, may provide such aggregation functionality.
• Tools may also be crawling the websites differently, which also impacts the results generated by the tools. To compare the results effectively, one would need to use a single crawler and test the same set of pages by all tools in parallel.

This can be explained by an example:

• Success Criterion 1.1.1 is implemented in a test tool by Test Rule 1 and Test Rule 2.
• If a page fails both Test Rule 1 and Test Rule 2, and these failures are added up, it will appear as if two pages failed Success Criterion 1.1.1, while in fact only one unique page failed.
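The example above can be reproduced in a few lines. The naive count adds up the failing pages of each rule. The correct count takes the union of failing pages across all rules mapped to the criterion. This only works if the tool export lists page identifiers per rule, which is exactly the data that was hard to extract in the pilot. The page path is a hypothetical placeholder.

```python
def pages_by_summing(pages_per_rule, rule_to_sc, criterion):
    """Naive count: adds up the failing pages of each rule (double counts pages)."""
    return sum(len(pages) for rule, pages in pages_per_rule.items()
               if criterion in rule_to_sc[rule])


def unique_failing_pages(pages_per_rule, rule_to_sc, criterion):
    """Correct count: union of failing pages across all rules for the criterion."""
    failing = set()
    for rule, pages in pages_per_rule.items():
        if criterion in rule_to_sc[rule]:
            failing.update(pages)
    return len(failing)


# The example above: one page fails both test rules mapped to 1.1.1.
rule_to_sc = {"Test Rule 1": ["1.1.1"], "Test Rule 2": ["1.1.1"]}
pages_per_rule = {"Test Rule 1": ["/page-a"], "Test Rule 2": ["/page-a"]}
print(pages_by_summing(pages_per_rule, rule_to_sc, "1.1.1"))      # 2
print(unique_failing_pages(pages_per_rule, rule_to_sc, "1.1.1"))  # 1
```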

We managed to extract individual test results for simplified monitoring.

There are significant variations in the number of “failed” results generated by the different tools. This relates to the individual implementation of each tool, including how it crawls the website. We present the results here to emphasize that it is very important to study the tools’ documentation in order to be in control of what is tested and how the tests are performed.

The table below shows the number of failure results on each tested website, as reported by the tools.

Tool 1 Tool 2 WCAG Success Criterion State Regional Local Public law State 1544 161 4453 6978 36 26 1205 3 1 2 168 1 142 16 1 9 1457 20 21 6 2015 141 74 168 991 143 72 5 938 2 498 296281 418 277 168 148736 68 18 1462 1238 45 15 2051 155 92

### 6.6. Data about monitoring and reporting

The following table summarises each data item we collected about the monitoring, the source of the data, and how we collected the data. The data requisite is described in chapter 3.2.8.

Table 27: Collection of data about monitoring and reporting
| Data Item | Source of data | How did we collect data in the pilot? |
|---|---|---|
| Monitoring method | The planning and design of the monitoring | We performed a pilot of both simplified and in-depth monitoring. |
| ICT solutions monitored | The planning and design of the monitoring | We monitored only websites in the pilot. |
| Reporting period start | The Directive | Not applicable for the pilot. |
| Reporting period end | The Directive | Not applicable for the pilot. |
| Monitoring period start | The planning and design of the monitoring | This was predefined in the plan for the pilot monitoring. |
| Monitoring period end | The planning and design of the monitoring | This was predefined in the plan for the pilot monitoring. |
| Body in charge of monitoring | The body that performed the monitoring | Monitoring was conducted by the Norwegian Digitalisation Agency. |
| Requirements tested | The planning and design of the monitoring | The requirements (Success Criteria) tested were specified in the planning stage of the pilot. The sample of Success Criteria was derived from the ACT Rules that were completed and implemented at the time of testing. The Success Criteria included in the pilot are listed in chapter 4.3. |
| Sample size simplified monitoring | The planning and design of the monitoring | In the pilot, we had a predefined sample of 4 websites. |
| Sample size in-depth monitoring | The planning and design of the monitoring | In the pilot, we had a predefined sample of 1 website. |
| Sample size in-depth mobile applications | The planning and design of the monitoring | Not applicable, as only websites were tested in the pilot. |
Sample size in-depth mobile applicationsThe planning and design of the monitoringNot applicable, as only websites were tested in the pilot.

The complete data collected about the pilot monitoring is listed in the following table:

Table 28: Data collected about monitoring and reporting
| Data Item | Data collected in the pilot |
|---|---|
| Monitoring method | Simplified and in-depth |
| ICT solutions monitored | Websites |
| Reporting period start | Not applicable for the pilot. |
| Reporting period end | Not applicable for the pilot. |
| Monitoring period start | November 2019 |
| Monitoring period end | April 2020 |
| Body in charge of monitoring | The Norwegian Digitalisation Agency |
| Requirements tested | The following web requirements from chapter 9 of EN 301 549 V2.1.2: 1.1.1 Non-text Content; 1.2.2 Captions (Prerecorded); 1.2.3 Audio Description or Media Alternative (Prerecorded); 1.3.1 Info and Relationships; 1.3.4 Orientation; 1.3.5 Identify Input Purpose; 2.4.2 Page Titled; 2.4.4 Link Purpose (In Context); 3.1.1 Language of Page; 3.1.2 Language of Parts; 4.1.1 Parsing; 4.1.2 Name, Role, Value |
| Sample size simplified monitoring | 4 websites |
| Sample size in-depth monitoring | 1 website |
| Sample size in-depth mobile applications | Not applicable in the pilot. |

### 6.7. Learnings

• We managed to collect all data about the public sector bodies as listed in chapter 4.2.1. Most data could be downloaded from the Norwegian Register of Legal Entities (or similar).
• The combination of institutional sector and industrial classification may be enough to determine the type of service offered by the entity (through their website), especially for simplified monitoring. For in-depth monitoring, this may be insufficient.
• In many cases, we will need a more comprehensive check of the actual content on the entity’s website. This might be very time-consuming, and the quality of the data could be insufficient. Using the accessibility statements as a data source for this purpose should be considered, as this could be far more cost-effective than manual inspections. Part of the WAI-Tools project is to develop a prototype large-scale data browser, which would collect and analyse data from accessibility statements. This could be considered for future exploration.

• In the pilot, we combined data from the Norwegian Register of Legal Entities with searching the internet in order to locate the website URLs of the selected public sector bodies. It should be considered whether to make it mandatory to report website addresses to the register, and/or to arrange the accessibility statements so that this data can be reported directly to the monitoring body.
• In some cases, data about the area of service can be registered at the entity level. In other cases where the entity offers a wide scope of services, and in addition has different websites for different kinds of services, the area of service must be defined at the website level. This applies for example to the Norwegian Digitalisation Agency.
• Some entities/websites offer a wide array of services. In the pilot, we only sampled a few services from the website we monitored in-depth. In a real monitoring, we will need an overview of all the services offered by an entity/through a website, as the Directive requires that at least one relevant page for each type of service provided by the website be monitored. Consequently, we need to collect extensive information about the various services offered by the entity/website. It should therefore be reviewed to what extent this is cost-effective, and whether it should instead be possible to monitor a sample of services rather than including them all.
• Identifying the services offered on a website is challenging. As a rule of thumb, municipal websites will in most cases have a more standardized (statutory) set of service offerings. This will also be the case for many of the regional websites. For the state websites and those belonging to bodies governed by public law, there are most likely wide variations, both in scope, complexity, and in terms of what services they offer. One should consider using the web accessibility statements to gather this kind of information, possibly supplemented by dialogue with the public sector bodies if further information is needed.

• For the simplified monitoring, it is only the homepage that needs to be identified (and documented). For the in-depth, there is a need for more data about the pages, than in the simplified.
• We managed to identify and register data about the web pages that are required for the in-depth monitoring, provided they were present on the website.
• To assess whether a page is part of a process, and to check that we had sampled the pages listed in the Implementing Decision, we had to inspect the website manually. That was the only way we managed to collect the data about the web pages listed in chapter 4.2.3.
• The process was time-consuming and relied on participation from the website owner. In a real monitoring, it might be considered too extensive to have dialogues with the website owners in order to collect data about the sampled web pages. Therefore, it might be useful to review whether it is necessary to sample (and document) the test pages in as much detail as described in the Implementing Decision.
• Either way, we need information/data that connects the pages to services and processes. Therefore, the terms “Type of service” and “Process” should be defined.

• Collecting data about the requirements was done by consulting the EN standard.
• More specifically, information about which user accessibility needs correspond to each requirement (Success Criterion) is found in EN 301 549, Annex B.1. This data supports an analysis of the digital barriers that users with different accessibility needs meet on the internet.

• Using the three tools provided by the project partners, we managed to collect test results at the page level per Success Criterion for the in-depth monitoring, and individual test results per Success Criterion, for both the in-depth and the simplified monitoring.
• We used two of the three tools in the simplified monitoring. We did not manage to collect test results at the page level, since we were not able to establish a model for converting test results from the test rule level to the Success Criterion level. The method used for this purpose in the in-depth monitoring was not feasible for the simplified monitoring, due to the huge number of pages tested (up to 1000 pages per website).
• All three tools reported test results in the category “failed”, while one of the three also reported results in the categories “passed” and “inapplicable”. In our opinion, it is preferable to have test result data that covers all three categories: passed, failed, and inapplicable.
• For two of the tools, it was also difficult to ascertain whether a web page had been tested.
• It was challenging to get hold of the number of unique failed pages per Success Criterion using the tools in their current state. In a real monitoring, we need to extract test results at the Success Criterion level. A solution could be to adapt the export functions of the tools so that results for unique pages per Success Criterion can be retrieved directly.
• We spent a significant manual effort to extract and present the data. We also struggled with exporting test data from the tools into another format, suitable for distribution to the website owners. In addition, we also need a data format that is suitable for further analysis.
• Had we had the opportunity to dig deeper into these issues within the timeline of the pilot, the tool vendors might have been able to assist us in producing and converting test results more efficiently. This data model is to some extent built on the open data format for accessibility test results that was developed by the project and implemented in the tools as one of the output formats.

• Documentation of test methods, tools (and version), is essential to secure transparency, reproducibility, and comparability. In general, it is crucial to investigate the documentation of the tools in order to be in control of what is tested and how the tests are performed.
• Given that this pilot was carried out during active project development, it was challenging to determine which version of an ACT Rule was implemented in each of the three tools, as only information about when each ACT Rule was last updated is available on the ACT Rules website. The ACT Rules website also has an overview of the different implementations of the ACT Rules, but it is not specified which version or when the various ACT Rules were implemented in the tools. We expect this situation may improve as ACT Rules get formally published by W3C, and implementations of these stabilize.

• Data about monitoring and reporting can easily be collected from the planning and documentation of the monitoring. This data includes, among other things, information about the monitoring period and the body in charge of the monitoring. A complete list is shown in the table in chapter 6.6.

## 7. Step 4: Analysis and reporting

In this chapter, we assess whether the data we presented in chapter 6 are suitable for analysis and reporting in line with the requirements described in the Directive and the Implementing Decision. The pilot focused on gaining experience with the monitoring process rather than on collecting the amount of data necessary to perform the required qualitative and quantitative analysis.

Briefly, the report from the monitoring shall contain the following:

• a description of how the monitoring was conducted
• a mapping, in the form of a correlation table, demonstrating how the applied monitoring methods relate to the requirements in the standards and technical specifications
• the outcome of the monitoring, including measurement data

The term ‘measurement data’ means (within the scope of the pilot) the quantified results of the monitoring activity carried out in order to verify compliance. It includes quantitative information about:

• the sample of websites tested
• the level of accessibility

The measurement data shall also detail the outcome of the monitoring by providing:

• a comprehensive description of the outcome
• a qualitative analysis of the outcome of the monitoring, including the findings regarding frequent or critical non-compliance

The requirements for reporting are operationalized in a set of research questions, as described in chapter 4.1.6.

In the following sections, we present our assessment of the extent to which we were able to establish data suitable for analysis and reporting, organized by the following research questions:

1. What size and composition of the sample of web solutions (and mobile applications) should be included in the monitoring, both in simplified and in-depth monitoring?
2. How are the web solutions selected - and specifically - which solutions are selected in dialogue with stakeholders? For subsequent monitoring: What web solutions have been included in previous monitoring?
3. Which sets of Success Criteria are covered in the monitoring, how do they correspond with the principles (perceivable, operable, understandable, and robust), and the user accessibility needs listed in the Directive? Note: This applies to the simplified monitoring.
4. How do methods, tests, and tools identify non-compliance (simplified monitoring) and verify compliance (in-depth monitoring) with the requirements in the Directive?

In addition, we need to provide some general information about the monitoring period and the monitoring body. According to the scope of the pilot, the following sections are based on websites only (not mobile applications).

#### 7.1.1. The size and composition of the sample

Since there is no register of websites in Norway suitable for monitoring purposes, the total number of websites is unknown. The basis for the sampling of websites is therefore the sample of entities. We must also handle the fact that many entities have multiple websites, while some share their website with other public bodies. The size of the sample could easily be calculated based on the population in each country.

The data we collected in the pilot are suitable for a presentation of the

• number of websites monitored in the in-depth and simplified monitoring
• number of websites from each category of public body/level of administration
• geographic distribution of website owners based on their location

In a sample as small as the pilot's, it is not meaningful to comment on representativeness and distribution. In a real monitoring, we must be able to describe the representativeness of the sample of entities, especially with regard to level of administration and geographic location. Geographic balance applies especially to local and regional bodies. State bodies and bodies governed by public law are in many cases located in Oslo but offer nationwide services.

Area of service is somewhat more complicated. It may sometimes be relevant at the entity level, and in other cases at the website level (e.g. if a public sector body offers a variety of services and has multiple websites). We do not have a formal classification of area of service as a basis for sampling, but we will ensure that all areas of service are represented in the sample.

We are also able to present which areas of service are included in the monitoring. A combination of institutional sector and industrial classification may be enough to determine the type of service offered by the entity, especially for simplified monitoring. For the in-depth monitoring, we need to do a more comprehensive check of the actual content on the entity’s website.

The table below gives a brief overview of the essential data to be presented in the report.

Table 29: Brief overview of essential data to be presented in the report

| Size and composition of the sample in the pilot | Simplified monitoring | In-depth monitoring |
|---|---|---|
| Number of websites | 4 websites: 1 local, 1 regional, 1 state, 1 governed by public law | 1 website (also included in simplified), from one of the categories/levels of administration: 1 state |
| Geographic location | 3 of 11 counties: Oslo, Trøndelag, Troms og Finnmark | 1 of 11 counties: Oslo |
| Area of services represented | Represented: social protection, health, transport, education, employment, recreation and culture, housing and community amenities. Not represented: taxes, environmental protection, public order and safety. | Represented: social protection, employment. Not represented: health, transport, education, taxes, environmental protection, recreation and culture, housing and community amenities, public order and safety. |

In a real monitoring, further observations are needed. At a minimum, we consider it important to describe how the sample is distributed among levels of administration compared to the distribution in the population of public sector bodies. This is important in order to generalize the results of the monitoring.
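
Comparing the sample with the population comes down to comparing shares per level of administration. The sketch below uses the pilot sample from Table 29, but the population counts are entirely hypothetical, chosen only to illustrate the arithmetic; a real monitoring would take them from a register or other overview of public sector bodies.

```python
from collections import Counter


def distribution(categories):
    """Return the share (in percent) of entities in each level of administration."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: 100 * n / total for cat, n in counts.items()}


def compare(sample, population):
    """Sample share minus population share, in percentage points, per category."""
    s, p = distribution(sample), distribution(population)
    return {cat: round(s.get(cat, 0.0) - p.get(cat, 0.0), 1) for cat in sorted(set(s) | set(p))}


# Pilot sample (simplified monitoring, Table 29): one entity per level of administration.
sample = ["local", "regional", "state", "public law"]

# HYPOTHETICAL population counts, for illustration only.
population = ["local"] * 400 + ["regional"] * 10 + ["state"] * 190 + ["public law"] * 400

print(compare(sample, population))
# {'local': -15.0, 'public law': -15.0, 'regional': 24.0, 'state': 6.0}
```

A positive value means the category is over-represented in the sample relative to the population; with these made-up numbers, regional bodies would be strongly over-represented.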

#### 7.1.2. The sampling method

We have interpreted this question as a requirement to document which web solutions

• that should be selected in dialogue with stakeholders
• that have been included in previous monitoring

The data essential for reporting are described in chapter 4.2.2. As mentioned in chapter 6.2, dialogue with stakeholders was not within the scope of the pilot. Nor are comments about previous monitoring relevant to the pilot.

However, in a real monitoring, data will be collected as described in chapter 4.2, in order to report monitoring in line with the requirements in the Directive and the corresponding Implementing Decision.

Even though it is not explicitly required, a real monitoring report will also describe the method and data sources we relied on when we

• first selected a sample of entities, and
• next selected their associated websites for monitoring

The sampling method, as piloted in chapters 5.1 and 5.2, will influence the extent to which the results can be generalized. Therefore, this is an important part of a monitoring report.

#### 7.1.3. The Success Criteria included in the simplified monitoring

The selection of Success Criteria for the pilot was limited by the ACT Rules implemented in the tools at the time the testing was performed (January/February 2020). Therefore, we included 19 ACT Rules altogether in the pilot. This applies to both simplified and in-depth monitoring. The WAI-Tools project is moving forward, and the goal is to develop 70 ACT Rules by the end of October 2020.

The ACT Rules are mapped to Success Criteria. All the Success Criteria are mapped to the corresponding guidelines and principles of WCAG 2.1. In addition, these Success Criteria are mapped to the user accessibility needs (functional performance statements) as described in the standard. In the pilot, we covered seven of the nine user accessibility needs (functional performance statements) using the available tools and implementations.

The data collected about the requirements, as described in chapter 6.4, are sufficient and suitable for reporting on the selection of Success Criteria in the pilot in line with the Directive. The data are presented in the table below.

P indicates a primary relationship with the user accessibility need, while S indicates a secondary relationship.

Table 30: Overview of Success Criteria and user accessibility needs covered in the pilot
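
| Success Criteria | Usage without vision | Usage with limited vision | Usage without hearing | Usage with limited hearing | Usage without vocal capability | Usage with limited manipulation or strength | Usage with limited cognition |
|---|---|---|---|---|---|---|---|
| 1.1.1 Non-text Content | P | P | P | S | | | S |
| 1.2.2 Captions | | | P | P | | | S |
| 1.2.3 Audio Description or Media Alternative | P | S | | | | | S |
| 1.3.1 Info and Relationships | P | S | | | | | S |
| 1.3.4 Orientation | | | | | | P | S |
| 1.3.5 Identify Input Purpose | | P | | | | | |
| 2.4.2 Page Titled | P | P | | | | P | P |
| 3.1.1 Language of Page | P | S | S | S | | | S |
| 3.1.2 Language of Parts | P | S | S | S | | | S |
| 4.1.1 Parsing | P | S | | | | | |
| 4.1.2 Name, Role, Value | P | P | | | | S | |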

#### 7.1.4. A mapping of methods and tests for identifying non-compliance and verifying compliance

In the pilot, we used the same methods, tests, and tools for both in-depth and simplified monitoring. All tests were performed using automated testing. In a real monitoring, we will use a combination of automatic, semi-automatic and manual methods. The monitoring report will detail this further, both for the in-depth and the simplified monitoring.

The table below presents the relationship between the WCAG’s principles, guidelines, Success Criteria, and ACT Rules.

Table 31: Relationship between WCAG’s principles, guidelines, Success Criteria, and ACT Rules

| WCAG Principle | Guideline | Success Criterion | ACT Rule ID and Name |
|---|---|---|---|
| 1. Perceivable: Information and user interface components must be presentable to users in ways they can perceive. | 1.1 Text Alternatives: Provide text alternatives for any non-text content so that it can be changed into other forms people need, such as large print, braille, speech, symbols or simpler language. | 1.1.1 Non-text Content | 23a2a8 - Image has accessible name |
| | | | 59796f - Image button has accessible name |
| | 1.2 Time-based Media: Provide alternatives for time-based media. | 1.2.2 Captions (Prerecorded) | eac66b - Video element auditory content has accessible alternative |
| | | 1.2.3 Audio Description or Media Alternative (Prerecorded) | c5a4ea - Video element visual content has accessible alternative |
| | 1.3 Adaptable: Create content that can be presented in different ways (for example simpler layout) without losing information or structure. | 1.3.1 Info and Relationships | 6cfa84 - Element with aria-hidden has no focusable content (the ACT Rule also applies to SC 4.1.2) |
| | | 1.3.4 Orientation | b33eff - Orientation of the page is not restricted using CSS transform property |
| | | 1.3.5 Identify Input Purpose | 73f2c2 - Autocomplete attribute has valid value |
| 2. Operable: User interface components and navigation must be operable. | 2.2 Enough Time: Provide users enough time to read and use content. | 2.2.1 Timing Adjustable | bc659a - Meta element has no refresh delay |
| | 2.4 Navigable: Provide ways to help users navigate, find content, and determine where they are. | 2.4.2 Page Titled | 2779a5 - HTML page has title |
| | | 2.4.4 Link Purpose (In Context) | c487ae - Link has accessible name (the ACT Rule also applies to SC 4.1.2) |
| 3. Understandable: Information and the operation of user interface must be understandable. | 3.1 Readable: Make text content readable and understandable. | 3.1.1 Language of Page | b5c3f8 - HTML page has lang attribute |
| | | | 5b7ae0 - HTML page lang and xml:lang attributes have matching values |
| | | | bf051a - HTML page language is valid |
| | | 3.1.2 Language of Parts | de46e4 - Element within body has valid lang attribute |
| 4. Robust: Content must be robust enough that it can be interpreted reliably by a wide variety of user agents, including assistive technologies. | 4.1 Compatible: Maximize compatibility with current and future user agents, including assistive technologies. | 4.1.1 Parsing | 3ea0c8 - Id attribute value is unique |
| | | 4.1.2 Name, Role, Value | 97a4e1 - Button has accessible name |
| | | | 4e8ab6 - Element with role attribute has required states and properties |
| | | | e086e5 - Form control has accessible name |
| | | | cae760 - Iframe element has accessible name |

As mentioned in chapter 6.5.4, we need a rather detailed description of what the tools test, and how, for each Success Criterion.

In our opinion, we also need to look at the interpretation of the Success Criteria on which the test rules are based. This will give us important information about the extent to which each Success Criterion included in a monitoring is covered by these test rules.

This is not within the scope of the pilot but is covered by another deliverable in WAI-Tools Work Package 2, for which the Norwegian Digitalisation Agency is responsible. The document “WCAG Interpretation and Test Rule Documentation” documents the interpretation of the Success Criteria that is implicit in the ACT Rules, and will be our main source for assessing the extent to which each Success Criterion is covered by the monitoring.

We identified three main questions to be answered about the monitoring results:

1. What is the overall compliance status with the accessibility requirements in the Directive?
   1. What is the level of compliance for the websites within each category of public sector bodies (state, regional, local, and bodies governed by public law)?
   2. For subsequent monitoring: How has overall compliance with the requirements of the Directive developed over time?
2. What is the overall compliance status for each accessibility requirement (Success Criterion)?
   1. Pay special attention to the Success Criteria where non-compliance is detected, and to what extent non-compliance appears.
   2. Pay special attention to which user accessibility needs are connected to Success Criteria with (frequent) non-compliance.
3. What is the compliance status for each of the individual web solutions that are monitored?
   1. The number of test pages with non-compliance should be reported.
   2. The results should also be specified per requirement (Success Criterion), per test page where non-compliance is detected.

Note: All the results shall be specified for each monitoring method, simplified and in-depth.

#### 7.2.1. The overall compliance status

According to the standard (as referred to in chapter 4.1.2), compliance or non-compliance is checked at the page level. For a web solution to comply with the requirements in the Directive, all tested pages must comply.

Thus, an assessment of the overall compliance status is built on the test results at the page level. The overall compliance level can be calculated in the following way:

Table 32: Compliance level

| Compliance status | Conditions |
|---|---|
| Fully compliant | All tested pages are compliant with the Success Criteria |
| Partially compliant | Not all, but at least one, test page is compliant with the Success Criteria |
| Not compliant | None of the test pages are compliant with the Success Criteria |
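
The three categories can be expressed as a small classification on page-level counts. This is a sketch of our own, reading “partially compliant” as: at least one, but not all, tested pages comply. The function name and its inputs (number of tested pages and number of pages with detected non-compliance) are illustrative.

```python
def compliance_status(tested_pages: int, failed_pages: int) -> str:
    """Classify a website from page-level results into the three categories.

    failed_pages is the number of tested pages where non-compliance is detected.
    """
    if tested_pages <= 0 or not 0 <= failed_pages <= tested_pages:
        raise ValueError("need tested_pages > 0 and 0 <= failed_pages <= tested_pages")
    if failed_pages == 0:
        return "Fully compliant"
    if failed_pages == tested_pages:
        return "Not compliant"
    return "Partially compliant"


# The in-depth website in the pilot: 9 pages tested, all 9 with non-compliance.
print(compliance_status(9, 9))  # Not compliant
print(compliance_status(9, 1))  # Partially compliant
```

The example with one failing page out of nine shows the weakness discussed next: a website with a single failing page lands in the same category as one where eight of nine pages fail.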

A calculation as presented in the table above will not distinguish between websites that are very close to compliance and websites with extensive accessibility barriers. Experience from previous monitoring efforts based on the existing Norwegian regulations indicates that almost all the websites will be categorized as “Partially compliant”, despite significant variations in the results.

To illustrate, we present the data only for the website we monitored in-depth.

• The number of pages tested: 9
• The number of pages with the occurrence of non-compliance: 9
• The compliance status is: Not compliant
• The percentage of test pages that are fully compliant: 0 %

If we only use the categories fully compliant, partially compliant, and not compliant, we will not uncover

• the variations and amount of accessibility barriers and/or
• which accessibility barriers are detected in the monitoring and
• which use situations are particularly affected

Thus, we need a somewhat more detailed (but still simple) way of assessing the level of compliance. An alternative is to calculate the level of compliance at the element level. This is a more nuanced approach. An example:

• count number of tested elements per identified Success Criterion, specified by the outcome of each tested element (passed, failed, inapplicable and perhaps, not tested)
• calculate the compliance level as the percentage of tested elements that comply with the requirements. This may also facilitate benchmarking and measurement of trends in the level of compliance.
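
A minimal sketch of the element-level calculation, under two assumptions of our own: results are available as hypothetical (Success Criterion, outcome) pairs, and elements with the outcome “inapplicable” or “not tested” are left out of the denominator, so the compliance level is passed / (passed + failed).

```python
from collections import Counter, defaultdict


def element_compliance_per_sc(results):
    """Percentage of passed elements per Success Criterion.

    results: iterable of (success_criterion, outcome) pairs, where outcome is
    'passed', 'failed', 'inapplicable' or 'not tested'.
    Assumption: only passed and failed elements count in the denominator.
    """
    counts = defaultdict(Counter)
    for sc, outcome in results:
        counts[sc][outcome] += 1
    levels = {}
    for sc, c in counts.items():
        applicable = c["passed"] + c["failed"]
        # None signals that no applicable elements were found for this SC.
        levels[sc] = 100 * c["passed"] / applicable if applicable else None
    return levels


# Hypothetical outcomes for one website.
results = (
    [("1.1.1", "passed")] * 95
    + [("1.1.1", "failed")] * 5
    + [("1.1.1", "inapplicable")] * 20
    + [("3.1.2", "inapplicable")] * 3
)

print(element_compliance_per_sc(results))  # {'1.1.1': 95.0, '3.1.2': None}
```

Keeping the per-outcome counts (rather than only the percentage) makes it possible to report the numbers of passed, failed and inapplicable elements separately, as asked for in the tool questions in the appendix.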

As we were not able to calculate the status of compliance for the websites tested in the simplified monitoring, we do not have the basis for calculating the level of compliance measured in the pilot. Thus, we do not have data to compare results between websites and categories of public sector bodies.

#### 7.2.2. The overall compliance status per Success Criterion

The calculation of the compliance status per Success Criterion is also based on test results at the test page level. As mentioned above, we have test results at the page level only for the website in the in-depth monitoring.

The results are shown in the table. To illustrate, we have used the results from only one of the tools in the pilot. For simplification, we assume that all nine pages were tested on each Success Criterion.

Assuming that pages are compliant if no non-compliance is detected, we present the following results from the pilot:

Table 33: Compliance status measured in the pilot

| Compliance status | Success Criterion | Percentage of the Success Criteria by compliance status | Number of test pages where non-compliance is detected | Percentage of tested pages in compliance |
|---|---|---|---|---|
| Fully compliant | 1.2.2 Captions | 69 % | 0 | 100 % |
| | 1.2.3 Audio Description or Media Alternative | 69 % | 0 | 100 % |
| | 1.3.4 Orientation | 69 % | 0 | 100 % |
| | 1.3.5 Identify Input Purpose | 69 % | 0 | 100 % |
| | 2.4.2 Page Titled | 69 % | 0 | 100 % |
| | 3.1.1 Language of Page | 69 % | 0 | 100 % |
| | 3.1.2 Language of Parts | 69 % | 0 | 100 % |
| Partially compliant | 1.3.1 Info and Relationships | 23 % | 7 | 22 % |
| | 4.1.1 Parsing | 23 % | 1 | 78 % |
| | 4.1.2 Name, Role, Value | 23 % | 7 | 22 % |
| Not compliant | 1.1.1 Non-text Content | 8 % | 9 | 0 % |

This gives a somewhat more nuanced picture of compliance status measured in the pilot:

• We detected non-compliance for 4 of the 13 Success Criteria, which implies compliance with 9 of the 13 Success Criteria, equal to 69 percent.
• We measured partial compliance for 3 of the 13 Success Criteria, equal to 23 percent.
• We detected non-compliance on all the tested pages for one of the Success Criteria (1.1.1 Non-text Content), equal to 8 %.
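
These figures follow from classifying each Success Criterion by the number of test pages where non-compliance is detected, and then counting Success Criteria per status. The sketch below shows the calculation with hypothetical input (four Success Criteria, ten test pages) rather than the pilot data and, as in the pilot, assumes that every page was tested against every Success Criterion.

```python
from collections import Counter


def status(tested_pages, failed_pages):
    """Compliance status of one Success Criterion, from page-level results."""
    if failed_pages == 0:
        return "Fully compliant"
    if failed_pages == tested_pages:
        return "Not compliant"
    return "Partially compliant"


def summarize(failed_pages_per_sc, tested_pages):
    """Return per-SC (status, % of pages in compliance) and the share of SCs per status."""
    rows = {
        sc: (status(tested_pages, failed), round(100 * (tested_pages - failed) / tested_pages))
        for sc, failed in failed_pages_per_sc.items()
    }
    per_status = Counter(st for st, _ in rows.values())
    shares = {st: round(100 * n / len(rows)) for st, n in per_status.items()}
    return rows, shares


# Hypothetical input: number of test pages (out of 10) where non-compliance is detected.
failed = {"1.1.1": 10, "1.3.1": 3, "2.4.2": 0, "3.1.1": 0}
rows, shares = summarize(failed, tested_pages=10)
print(rows)
print(shares)  # {'Not compliant': 25, 'Partially compliant': 25, 'Fully compliant': 50}
```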

The Success Criteria for which we detected non-compliance affect the user accessibility needs shown in the table below.

Table 34: Affected user accessibility needs based on detected non-compliance

| Success Criteria for which non-compliance is detected | Usage without vision | Usage with limited vision | Usage without hearing | Usage with limited hearing | Usage with limited manipulation or strength | Usage with limited cognition |
|---|---|---|---|---|---|---|
| 1.1.1 Non-text Content | P | P | P | S | | S |
| 1.3.1 Info and Relationships | P | S | | | | S |
| 4.1.1 Parsing | P | S | | | | |
| 4.1.2 Name, Role, Value | P | P | | | S | |
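
Deriving the affected user accessibility needs is essentially a join between the Success Criteria with detected non-compliance and their mapping to user accessibility needs. A sketch using the mapping in Table 34 and the failed-page counts in Table 33; the function name and data layout are our own.

```python
# Mapping from Table 34 ("P" = primary, "S" = secondary relationship).
SC_TO_NEEDS = {
    "1.1.1": {"without vision": "P", "with limited vision": "P", "without hearing": "P",
              "with limited hearing": "S", "with limited cognition": "S"},
    "1.3.1": {"without vision": "P", "with limited vision": "S", "with limited cognition": "S"},
    "4.1.1": {"without vision": "P", "with limited vision": "S"},
    "4.1.2": {"without vision": "P", "with limited vision": "P",
              "with limited manipulation or strength": "S"},
}


def affected_needs(failed_pages_per_sc):
    """Map each affected user need to the non-compliant SCs, split into P and S."""
    needs = {}
    for sc, failed in failed_pages_per_sc.items():
        if failed == 0:
            continue  # no detected non-compliance, so no need is affected
        for need, relation in SC_TO_NEEDS.get(sc, {}).items():
            needs.setdefault(need, {"P": [], "S": []})[relation].append(sc)
    return needs


# Pages with detected non-compliance, from Table 33 (2.4.2 shown as a compliant example).
failed = {"1.1.1": 9, "1.3.1": 7, "4.1.1": 1, "4.1.2": 7, "2.4.2": 0}
for need, rel in affected_needs(failed).items():
    print(f"Usage {need}: primary {rel['P']}, secondary {rel['S']}")
```

Listing which Success Criteria hit each need, and how often, is one possible starting point for the qualitative analysis of “frequent or critical non-compliance”.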

The results must be analysed further in a real monitoring. For instance, an important result is that non-compliance is detected for Success Criterion 1.1.1 on all nine test pages, combined with the fact that this Success Criterion is intended to cover a variety of user accessibility needs. This may serve as an example of a finding “regarding frequent or critical non-compliance”. Similarly, we should analyse the results for Success Criteria 1.3.1 and 4.1.2 further.

As mentioned, the pilot data are far too limited to be used for analysis and serve only as an example in this report. In order to generalize the results, we must produce data that show test results for unique pages, also for the simplified monitoring. Furthermore, the test results must be specified for each Success Criterion in order to answer the research questions. In addition, calculating the results at the element level should be considered.

#### 7.2.3. The compliance status for the individual web solutions

The calculation of the compliance level on each website is also based on test results at the test page level. As mentioned above, we have test results at the page level only for the website in the in-depth monitoring.

In a real monitoring, one could calculate the compliance level based on what is presented in chapters 7.2.1 and 7.2.2:

• The percentage (or share) of the tested pages on the website that fully comply with all the Success Criteria in the monitoring
• The percentage of Success Criteria in the monitoring for which all the tested pages on the website comply
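
Both measures can be computed from a list of the Success Criteria that failed on each tested page. A sketch with hypothetical pages and Success Criteria, assuming (as in chapter 7.2.2) that every page is tested against every Success Criterion in the monitoring.

```python
def website_compliance(failed_sc_per_page, success_criteria):
    """Return (% of pages fully compliant, % of SCs for which all pages comply).

    failed_sc_per_page: {page: set of SCs with detected non-compliance on that page}
    success_criteria: all SCs included in the monitoring.
    Assumes every page was tested against every SC.
    """
    if not failed_sc_per_page or not success_criteria:
        raise ValueError("need at least one page and one Success Criterion")
    compliant_pages = [p for p, failed in failed_sc_per_page.items() if not failed]
    failed_anywhere = set().union(*failed_sc_per_page.values())
    compliant_scs = [sc for sc in success_criteria if sc not in failed_anywhere]
    pct_pages = 100 * len(compliant_pages) / len(failed_sc_per_page)
    pct_scs = 100 * len(compliant_scs) / len(success_criteria)
    return pct_pages, pct_scs


# Hypothetical website with four test pages and five Success Criteria.
pages = {
    "/": {"1.1.1"},
    "/contact": {"1.1.1", "4.1.2"},
    "/about": set(),
    "/news": set(),
}
criteria = ["1.1.1", "1.3.1", "2.4.2", "3.1.1", "4.1.2"]
print(website_compliance(pages, criteria))  # (50.0, 60.0)
```

The two numbers answer different questions: the first says how much of the website is free of detected barriers, the second how many requirements are met across the whole website.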

Since we do not have information about the number of tested elements on each website, it is not possible to calculate the compliance level at the element level.

The compliance level calculated for the website monitored in-depth in the pilot is presented in the table.

Table 35: Compliance level in the in-depth monitoring

| Calculation of compliance status | Percent |
|---|---|
| The percentage of the tested pages that fully comply with the Success Criteria | 0 % |
| The percentage of the Success Criteria for which the tested pages comply | 69 % |

As shown in Table 33 (chapter 7.2.2), the results can be further detailed per Success Criterion, based on knowledge about how many pages are tested against each individual Success Criterion.

In order to provide the public sector bodies with the data needed for correcting failed elements, we need to extract test results at the element level, as shown in chapter 6.5.2. It would be a great advantage if the results at the element level allowed us to pinpoint exactly which elements need correction.

### 7.3. Learnings

It was not within the scope of the pilot to produce test data in the amount needed for performing the analysis and reporting described in the Directive. Therefore, this summary presents our experiences in establishing a data set suitable for analysis and reporting. We also present our thoughts and reflections on calculating the compliance level and on other issues that, in our opinion, need clarification.

The findings are summarized below:

• Regarding the questions of the sample of entities and websites, the sampling method, the Success Criteria and test methods, and the user accessibility needs, the data collected in the pilot are assessed to be sufficient and suitable for performing the analysis needed for reporting.

• Together with the test results, the data about the monitoring are crucial for answering the research questions.

Further reflections about analysis and reporting:

• We need a documented method for sampling entities and websites and, as far as possible, an overview of the population of entities and websites. This forms a basis for assessing the extent to which the monitoring results can be generalized. We also need a consistent way of sampling test pages. This is crucial for comparing results between websites, between categories of public sector bodies, and between monitoring periods.
• Based on the requirements for reporting, the monitoring bodies need a method and a scale to express quantified results of the monitoring activity, including quantitative information about the level of accessibility.
• The quantified test results per Success Criterion and the mapping to the user accessibility needs form the basis for a qualitative analysis of the outcome of the monitoring, especially the findings regarding frequent or critical non-compliance. Thus, we need a method for performing the qualitative analysis as described in the Directive and a template for reporting to the EU.
• There is also a need for a clarification of the term “compliance level” (or compliance status).
• According to the standard, the basis for calculating the level of compliance is test results at the page level. Thus, we need a way to extract aggregated test data directly from the tools that show both the number of tested pages and the number of unique pages that fail each Success Criterion. This applies to both simplified and in-depth monitoring.
• According to the standard, we may calculate the compliance level as the percentage of the tested pages that fully comply with all the Success Criteria included, specified by in-depth and simplified monitoring.
• We also raise the question of whether it is possible to calculate the level of compliance at the element level. An example:
  • count the number of tested elements per identified Success Criterion, specified by the outcome of each tested element (passed, failed, inapplicable and, perhaps, not tested)
  • calculate the compliance level as the percentage of tested elements that comply with the requirements. This may also facilitate benchmarking and measurement of trends in the level of compliance.
• Each of the approaches outlined above has different advantages and disadvantages, such as not accounting for the severity of failures. For example, only 1 of 99 images on a page may be missing a text alternative, yet if that image is the submit button, the entire page is unusable. The W3C Research Report on Web Accessibility Metrics explores different approaches for measuring the level of accessibility.
• Based on the compliance level of each website, the average or aggregated compliance status for all the websites in each monitoring could be calculated, specified by in-depth and simplified monitoring.
• Similar calculations should also be made
  • per category (level of administration) of public sector bodies
  • per Success Criterion
• Since there are multiple ways of calculating the compliance status, there is a need for a clarification of the term “compliance” and how it should be measured.
• There is also a need to report test results that identify which elements on the tested pages are not in compliance, to support website owners in their efforts to correct failed elements.
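
The aggregation suggested above, averaging the compliance level of each website per monitoring method and per category of public sector body, could be sketched as follows. The input layout and figures are hypothetical, and the unweighted mean shares the limitation noted above: it does not account for the severity of failures.

```python
from collections import defaultdict
from statistics import mean


def average_by_category(websites):
    """Average compliance level per (method, category), plus an 'all' category per method.

    websites: iterable of (category, method, compliance_level_percent) tuples.
    """
    groups = defaultdict(list)
    for category, method, level in websites:
        groups[(method, category)].append(level)
        groups[(method, "all")].append(level)
    return {key: mean(levels) for key, levels in sorted(groups.items())}


# Hypothetical compliance levels (e.g. % of test pages fully compliant) per website.
websites = [
    ("state", "simplified", 60.0),
    ("state", "simplified", 80.0),
    ("local", "simplified", 50.0),
    ("state", "in-depth", 0.0),
]
for (method, category), avg in average_by_category(websites).items():
    print(f"{method:<10} {category:<6} {avg:.1f}")
```

The same grouping could be keyed on Success Criterion instead of category to produce the per-Success Criterion figures.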

## Appendix: The tools used in the pilot

In dialogue with each of the three project partners, Deque, FCID, and Siteimprove, we decided to use the following tools:

Table 36: Tools used in the pilot

| Project partner | Tool | Version | Platform | Used for simplified? | Used for in-depth? |
|---|---|---|---|---|---|
| Deque | WorldSpace Comply, based on Axe-core | v6.4.0.19914 | Cloud platform (Software as a service – SaaS) | Yes | Yes |
| FCID | QualWeb Core | 0.3.8 | Local installation, run at the Norwegian Digitalisation Agency | No | Yes |
| Siteimprove | Siteimprove Alfa (pre-release of Siteimprove’s new, open-source engine) | No information | Cloud platform (Software as a service – SaaS) | Yes | Yes |

ACT Rules developed in the project were implemented in the tools. The tools were used in their state of development at the time of testing in the pilot. FCID’s QualWeb Core was not used in the simplified monitoring, as the tool did not have a crawler.

The project partners (Deque, FCID, and Siteimprove) provided us with answers to the following questions:

Table 37: Questions about the tools

| # | Questions about the tools | Requirement in the Directive |
|---|---|---|
| 1. | Does the tool report the total number of web pages tested, as well as the number of failed, passed, and inapplicable web pages per website? | Commission Implementing Decision (EU) 2018/1524, Annex II (3.1 b) |
| 2. | Does the tool report the number of failed, passed, and inapplicable elements per website? | Commission Implementing Decision (EU) 2018/1524, Annex II (3.1 b) |
| 3. | Does the tool report which web pages have failed, passed, and inapplicable results? | Commission Implementing Decision (EU) 2018/1524, Article 7 |
| 4. | Does the tool show a direct connection between each test result and the tested Success Criterion? | Commission Implementing Decision (EU) 2018/1524, Annex II (3.1 b) |
| 5. | Is it possible to export the test results in a format that is suited for further analysis? | Commission Implementing Decision (EU) 2018/1524, Annex II (3.1 b) |

The answers from the project partners (Deque, FCID, and Siteimprove) are presented in the table below.

Table 38: Answers from Deque, FCID, and Siteimprove

| Project partner | Deque | FCID (Note 1) | Siteimprove |
|---|---|---|---|
| 1. Does the tool report the total number of web pages tested, as well as the number of failed, passed, and inapplicable web pages per website? | | | |
| Does the tool report the number of failed pages per website? | Yes | No | Yes |
| Does the tool report the number of passed pages per website? | Yes | No | Yes (Note 2) |
| Does the tool report the total number of pages tested per website? | Yes | No | Yes |
| Does the tool report the number of inapplicable pages per website? | No | No | Yes (Note 2) |
| 2. Does the tool report the number of failed, passed, and inapplicable elements per website? | | | |
| Does the tool report the total number of failed elements per web page? | Yes | No | Yes |
| Does the tool report the total number of passed elements per web page? | Yes | No | Yes (Note 2) |
| Does the tool report the total number of inapplicable elements per web page? | No | No | No |
| 3. Does the tool report which web pages have failed, passed, and inapplicable results? | | | |
| Does the tool report which web pages have failed results? | Yes | Yes | Yes |
| Does the tool report which web pages have passed results? | Yes | Yes | Yes (Note 2) |
| Does the tool report which web pages have inapplicable results? | No | Yes | Yes (Note 2) |
| 4. Does the tool show a direct connection between each test result and the tested Success Criterion? | | | |
| Does the tool show a direct connection between each test result and the tested Success Criterion? | Yes | Mostly yes, but also for some best practices | Yes |
| 5. Is it possible to export the test results in a format that is suited to further analysis? | | | |
| Which file output format does the tool support? | CSV, Excel spreadsheet, JSON | JSON, EARL | HTML, PDF, CSV, Excel spreadsheet, JSON |
| Does the tool give access to an API for customised export of test results? | Yes | No | Yes |
