
Find languages #24

Open
tangledhelix wants to merge 1 commit into DistributedProofreaders:master from tangledhelix:find_langs

@tangledhelix
Member

Find the main document language (<html lang="xx">) and report it.
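A minimal sketch of how the main language could be read, assuming Python's standard-library `html.parser`. `MainLangParser` and `find_main_lang` are hypothetical names, not the actual pphtml code. The sketch prefers `lang` and falls back to `xml:lang`, so both the HTML5 form and the XHTML form of the `<html>` tag are covered.

```python
from html.parser import HTMLParser
from typing import Optional


class MainLangParser(HTMLParser):
    """Records the lang (or xml:lang) attribute of the first <html> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.main_lang: Optional[str] = None
        self._seen_html = False

    def handle_starttag(self, tag, attrs):
        if tag != "html" or self._seen_html:
            return
        self._seen_html = True
        values = dict(attrs)  # HTMLParser lowercases attribute names
        # Prefer lang; fall back to xml:lang for XHTML-only documents.
        self.main_lang = values.get("lang") or values.get("xml:lang")


def find_main_lang(html_text: str) -> Optional[str]:
    """Return the main document language, or None if <html> has no lang."""
    parser = MainLangParser()
    parser.feed(html_text)
    parser.close()
    return parser.main_lang


print(find_main_lang('<html xml:lang="en" lang="en"><body></body></html>'))  # en
print(find_main_lang("<html><body></body></html>"))  # None
```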

Find every lang="xx" attribute anywhere in the document and report a list of the languages found.
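One way to collect every language, sketched with `html.parser` rather than a line-by-line regex (that choice is an assumption, not necessarily how pphtml does it). Because the parser sees each attribute individually, several lang attributes on one line are all found, and text or comments that merely contain `lang="..."` are ignored. `LangCollector` and `find_all_langs` are hypothetical names.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Collects lang and xml:lang values from every start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.langs: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br lang="fr"/> also reach this method
        # through HTMLParser's default handle_startendtag.
        for name, value in attrs:
            if name in ("lang", "xml:lang") and value:
                self.langs.add(value)


def find_all_langs(html_text: str) -> list[str]:
    """Return each language code once, sorted, however often it occurs."""
    collector = LangCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(collector.langs)


doc = ('<html lang="en"><body>'
       '<p lang="fr">oui</p> <i lang="la">et</i> <i lang="fr">non</i>'
       '</body></html>')
print(find_all_langs(doc))  # ['en', 'fr', 'la']
```

Note that the main language (en here) is included, since a lang attribute matching the main language should still be reported.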

Use the IANA language subtag registry to provide friendly names (e.g. en → English). The registry also helps validate languages: if a language code isn't found in it, the code is flagged for review.
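A sketch of the lookup, assuming the registry has already been reduced to a JSON object mapping each language subtag to its description (the exact layout of the pre-built file is an assumption). Only the primary subtag, before the first hyphen, is looked up, so en-GB resolves to English; that rule and the flag wording are hypothetical.

```python
import json
from typing import Optional


def friendly_name(code: str, registry: dict[str, str]) -> Optional[str]:
    """Look up the primary subtag (assumption: 'en-GB' -> 'en')."""
    return registry.get(code.split("-")[0].lower())


def describe(code: str, registry: dict[str, str]) -> str:
    """Format 'fr' as 'fr (French)'; flag codes missing from the registry."""
    name = friendly_name(code, registry)
    if name is None:
        return f"{code} (not in registry, please check)"
    return f"{code} ({name})"


# In practice the mapping would come from json.load() on the pre-built file.
registry = json.loads('{"en": "English", "fr": "French", "la": "Latin"}')
print(describe("fr", registry))  # fr (French)
print(describe("cx", registry))  # cx (not in registry, please check)
```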

This feature is a rough port (with improvements) from the PPtools site's HTML check tool.

Testing Notes:

Main language:

A document has only ONE main language, defined in the <html> tag.

  • Change the lang attribute on the <html> tag and check the main language changes.
  • Change to a non-existent language code (e.g. cx): main language should be marked WARN
  • Remove the lang attribute: main language should be marked FAIL
  • Should work with HTML5 as well as HTML4/XHTML files, e.g. <html xml:lang="en" lang="en" ...>
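The three outcomes in the checks above map onto a small decision function. A minimal sketch, assuming the set of known codes comes from the registry data and that only the primary subtag is checked; `check_main_lang` is a hypothetical name.

```python
from typing import Optional


def check_main_lang(main_lang: Optional[str], known: set[str]) -> str:
    """Return 'pass', 'warn' or 'fail' for the document's main language."""
    if not main_lang:
        return "fail"  # no lang attribute on <html>
    primary = main_lang.split("-")[0].lower()
    # A code missing from the registry is not an error, just worth a look.
    return "pass" if primary in known else "warn"


known = {"en", "de", "fr", "la"}
print(check_main_lang("en", known))  # pass
print(check_main_lang("cx", known))  # warn
print(check_main_lang(None, known))  # fail
```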

Secondary languages:

Any HTML tag may have a lang attribute (per the HTML spec). Add lang attributes anywhere you like. A list of all found languages should be produced in the report.

  • Try adding a lang attribute with the same language as the main document. It should appear in the report.
  • Try adding a few lang attributes throughout the document; they should all be reported.
  • If the same lang is used on multiple HTML elements, it will only be reported once.
  • Try putting several lang attributes on the same line of the file to ensure they're all found.
  • A language should have its proper name next to it in the report (e.g. fr (French)).
  • If a language code is not found in the registry, it will be highlighted to draw attention so the user can double-check it.

@tangledhelix
Member Author

Example output:

[pass] document language: en (English)
[info] other languages:
          de (German)
          fr (French)
          la (Latin)
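A sketch that reproduces the example output above from a main language, the collected languages and a registry mapping. Only the [pass] and [info] line shapes come from the example; the [warn]/[fail] wording, the flag text and the function names are hypothetical.

```python
from typing import Optional


def label(code: str, registry: dict[str, str]) -> str:
    """'fr' -> 'fr (French)'; unknown codes get a hypothetical marker."""
    name = registry.get(code.split("-")[0].lower())
    return f"{code} ({name})" if name else f"{code} (** not in registry **)"


def render_report(main_lang: Optional[str], other_langs: list[str],
                  registry: dict[str, str]) -> str:
    lines = []
    if not main_lang:
        lines.append("[fail] document language: none (no lang on <html>)")
    else:
        known = main_lang.split("-")[0].lower() in registry
        status = "pass" if known else "warn"
        lines.append(f"[{status}] document language: {label(main_lang, registry)}")
    if other_langs:
        lines.append("[info] other languages:")
        # Each code once, sorted, indented under the heading as in the example.
        lines.extend(" " * 10 + label(c, registry) for c in sorted(set(other_langs)))
    return "\n".join(lines)


registry = {"en": "English", "de": "German", "fr": "French", "la": "Latin"}
print(render_report("en", ["fr", "la", "de", "fr"], registry))
```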

Find the main document language (<html lang="xx">) and report it as the main document language.

Find every lang="xx" attribute anywhere in the document and report a list of the languages found.

Use the IANA language subtag registry to provide friendly names. The registry also helps validate languages: any language code not found in it is flagged as a warning so the user can double-check that it's correct.

Add a script to build a JSON file from the IANA language subtag registry; pphtml uses the pre-built JSON file.
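A sketch of what such a build step might do, not the PR's actual script. The registry is a plain-text file of records separated by lines containing only %%, with "Field: value" lines and indented continuation lines. `parse_registry` is a hypothetical name, it keeps only the first Description of each "Type: language" record, and `SAMPLE` is hand-made illustrative input in the registry's format, not a real excerpt.

```python
import json
import re


def parse_registry(text: str) -> dict[str, str]:
    """Map each language subtag to its first Description."""
    languages: dict[str, str] = {}
    # Records are separated by lines consisting only of "%%".
    for record in re.split(r"^%%[ \t\r]*$", text, flags=re.MULTILINE):
        fields: dict[str, list[str]] = {}
        last_key = None
        for line in record.splitlines():
            if line[:1].isspace() and last_key is not None:
                # Folded continuation line: join it onto the previous value.
                fields[last_key][-1] += " " + line.strip()
            elif ": " in line:
                key, value = line.split(": ", 1)
                fields.setdefault(key, []).append(value.strip())
                last_key = key
        # Skip the File-Date header and script/region/variant records.
        if fields.get("Type") == ["language"] and "Subtag" in fields:
            languages[fields["Subtag"][0]] = fields.get("Description", ["?"])[0]
    return languages


SAMPLE = """File-Date: 2000-01-01
%%
Type: language
Subtag: en
Description: English
%%
Type: language
Subtag: la
Description: Latin
%%
Type: script
Subtag: Latn
Description: Latin
"""

# A real build would write this with json.dump(..., ensure_ascii=False)
# to a UTF-8 file, since some descriptions contain non-ASCII characters.
print(json.dumps(parse_registry(SAMPLE), ensure_ascii=False, sort_keys=True))
```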