Is the UTF-8 string from Google Closure considered valid in this case?

When running the UTF-8 to byte array tests in the Google Closure library, the string provided is:

\u0000\u007F\u0080\u07FF\u0800\uFFFF

This string is expected to be converted into the following array:

[0x00, 0x7F, 0xC2, 0x80, 0xDF, 0xBF, 0xE0, 0xA0, 0x80, 0xEF, 0xBF, 0xBF]

When I ran the same input through other JavaScript and TypeScript implementations of UTF-8 to byte array conversion, some of them rejected the string as invalid.

The string appears to cover the boundary values at which the encoding moves from 1 byte to 2 bytes to 3 bytes per code point.

The question remains: is Google's implementation correct, or are the other libraries right?

Answer №1

Google's implementation is correct.

The string '\u0000\u007F\u0080\u07FF\u0800\uFFFF' represents the Unicode code points U+0000 U+007F U+0080 U+07FF U+0800 U+FFFF.

The correct UTF-8 encoding of these code points is the byte sequence 00 7F C2 80 DF BF E0 A0 80 EF BF BF, which is exactly what Google's test expects.
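The byte values can be checked by hand with the standard UTF-8 bit layout. The following is a minimal sketch, not Closure's actual code: the function names encodeBmpCodePoint and encodeBmpString are hypothetical, and the sketch only handles code points up to U+FFFF outside the surrogate range, which is all this test string needs.

```javascript
// Hypothetical helper: encode one code point in U+0000..U+FFFF
// (surrogates excluded) using the 1-, 2- and 3-byte UTF-8 forms.
function encodeBmpCodePoint(cp) {
  if (cp < 0x80) {
    return [cp];                                   // 0xxxxxxx
  }
  if (cp < 0x800) {
    return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]; // 110xxxxx 10xxxxxx
  }
  return [
    0xE0 | (cp >> 12),                             // 1110xxxx
    0x80 | ((cp >> 6) & 0x3F),                     // 10xxxxxx
    0x80 | (cp & 0x3F),                            // 10xxxxxx
  ];
}

// Hypothetical helper: encode a string whose UTF-16 code units are all
// non-surrogate, so each code unit is one code point.
function encodeBmpString(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDFFF) {
      throw new RangeError("surrogates are not handled in this sketch");
    }
    bytes.push(...encodeBmpCodePoint(unit));
  }
  return bytes;
}

const input = "\u0000\u007F\u0080\u07FF\u0800\uFFFF";
const hex = encodeBmpString(input)
  .map((b) => b.toString(16).toUpperCase().padStart(2, "0"))
  .join(" ");
console.log(hex); // 00 7F C2 80 DF BF E0 A0 80 EF BF BF
```

Each pair of neighbouring code points in the input sits on either side of a length boundary: U+007F is the last 1-byte value, U+0080 the first 2-byte value, U+07FF the last 2-byte value, and U+0800 the first 3-byte value.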

Note, however, that U+FFFF is a noncharacter code point, which is the likely reason some libraries reject it. According to the Unicode standard:

A "noncharacter" is a code point that is permanently reserved in the Unicode Standard for internal purposes

...

In the initial version of Unicode, the code points U+FFFE and U+FFFF were marked as "Not character codes" and termed "NOT A CHARACTER". The term "noncharacter" emerged from these early designations and labels.

Specifically:

Q: Are noncharacters meant for sharing?

A: No. They are exclusively intended for internal use. For instance, they might serve as placeholders within strings or act as targets for specific weightings in a collation tailoring process to simplify support for "alphabetic index" implementations.

Q: Are noncharacters prohibited from being shared?

A: This matter has been controversial because of conflicting interpretations of whether noncharacters may be interchanged. The standard initially stated that noncharacters "should never be interchanged", and some took this to mean they "shall not be interchanged", implying that any string containing a noncharacter would violate the standard. However, the weaker wording was deliberate: the interpretation of noncharacters is strictly internal to the implementation that uses them, so they have no publicly interchangeable semantics. In 2013 the UTC issued Corrigendum #9, which removed the phrase suggesting a prohibition on interchange and made it clear that noncharacters carry no formal restriction on interchange. This update was included in Unicode 7.0.

Q: Are noncharacters considered invalid in Unicode strings and UTFs?

A: Absolutely not. The presence of noncharacters does not render a Unicode string malformed in any UTF format. This is evident in the presented table where each noncharacter code point has a valid representation in UTF-32, UTF-16, and UTF-8. Any implementation transferring noncharacter code points between different UTF representations must accurately retain these values. Although designated as "noncharacters" and not intended for open sharing, they do not constitute illegitimate or improper code points that invalidate strings containing them.
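This behaviour can be observed with the standard TextEncoder and TextDecoder APIs, available in modern browsers and in Node.js. The snippet below is only an illustrative sketch with made-up variable names; it says nothing about how the libraries that reject the string are implemented. The decoder is created with fatal: true so that it throws on malformed UTF-8 instead of silently substituting U+FFFD.

```javascript
// Round-trip the noncharacter U+FFFF through UTF-8.
const noncharString = "\uFFFF";

// TextEncoder always produces UTF-8.
const noncharBytes = Array.from(new TextEncoder().encode(noncharString));

// fatal: true makes decode() throw a TypeError on malformed input.
const strictDecoder = new TextDecoder("utf-8", { fatal: true });
const roundTripped = strictDecoder.decode(new Uint8Array(noncharBytes));

console.log(noncharBytes.map((b) => b.toString(16).toUpperCase()).join(" ")); // EF BF BF
console.log(roundTripped === noncharString); // true
```

The strict decoder accepts EF BF BF without complaint, which is consistent with the FAQ: a noncharacter does not make the byte sequence ill-formed.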
