FTPEXT Working Group                                         G. Lundberg
Internet-Draft                                 WU-FTPD Development Group
Expiration Date: November 28, 2002                              May 2002


                          UTF-8 Option for FTP
                 draft-ietf-ftpext-utf-8-option-00.txt

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   To view the list of IETF Internet-Draft Mirror Sites, see
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

Abstract

   This document specifies an extension to the File Transfer Protocol
   (FTP) which provides for inter-operation between existing
   implementations and those supporting the exchange of UTF-8 encoded
   pathnames, and clarifies certain issues involved with UTF-8
   encoding.  It introduces a new option, UTF-8, negotiated by use of
   the OPTS command.  Through use of this option, the user informs the
   server of its willingness to accept UTF-8 encoded pathnames.  The
   proposed extension requires that neither party transmit UTF-8
   encoded pathnames without having first successfully negotiated this
   option.

   Implementation of this extension is RECOMMENDED.

Table of Contents

   Abstract
   1. Introduction
   2. UTF-8 Option
   3. UTF-8 Encoding Issues
   4. Misuse of CR NUL in Pathnames
   5. ABNF for Pathnames
   6. IANA Considerations
   7. Security Considerations
   Acknowledgments
   Normative References
   Author's Address
   Annex A - Specific Changes to Existing Specifications
   Full Copyright Statement

1. Introduction

   Internationalization of the File Transfer Protocol [RFC2640]
   enhances the capabilities of the FTP by removing the 7-bit
   restrictions on pathnames used in commands and responses.  It
   defines a single character set, in addition to NVT ASCII and EBCDIC,
   which is to be understandable by all systems, and specifies that
   character set to be ISO/IEC 10646:1993 (UCS) using UTF-8 encoding
   when exchanging pathnames.

   UTF-8 encoding is a file-safe encoding which may avoid the use of
   byte values that have special significance during the parsing of
   pathname character strings.  It represents each UCS character as a
   sequence of 1 to 6 bytes in length, using a variable-length prefix
   encoding scheme.  For all sequences of one byte the most significant
   bit is ZERO.  For all sequences of more than one byte, the number of
   consecutive ONE bits at the most significant end of the first byte,
   which are followed by a ZERO bit, indicates the number of bytes in
   the UTF-8 encoded sequence.

   A property of UTF-8 encoding is that its single byte sequence is
   consistent with the 7-bit ASCII character set.

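
   As an illustration of the length rule described above, the following
   non-normative sketch derives the length of a UTF-8 encoded sequence
   from its first byte.  The function name utf8_sequence_length is
   hypothetical and is not defined by any specification; the sketch
   assumes the 1 to 6 byte form of UTF-8 described in this section.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length announced by a UTF-8 first byte.

    Hypothetical helper for illustration only; it assumes the
    1 to 6 byte form of UTF-8 described in the text.
    """
    if not 0 <= lead <= 0xFF:
        raise ValueError("not a byte value")
    if lead & 0x80 == 0:
        # Most significant bit is ZERO: a one-byte (ASCII) sequence.
        return 1
    # Count the ONE bits from the most significant end until a ZERO bit.
    ones = 0
    mask = 0x80
    while lead & mask:
        ones += 1
        mask >>= 1
    if ones == 1 or ones > 6:
        # 10xxxxxx is a continuation byte; 0xFE and 0xFF never lead.
        raise ValueError("byte cannot start a UTF-8 sequence")
    return ones
```

   Under these assumptions a first byte of 0x41 announces a one-byte
   sequence and a first byte of 0xC3 announces a two-byte sequence,
   while 0x80 is rejected because it can only continue a sequence.
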
   [RFC2640] incorrectly asserts that this ASCII compatibility of UTF-8
   encoding will allow existing implementations to inter-operate with
   implementations which support UTF-8 encoding.

   [RFC2640] further ignores problems of inter-operation by only
   requiring that conforming implementations must support UTF-8
   encoding for the transfer and receipt of pathnames.  It contains a
   great deal of discussion of how a compliant implementation should
   treat UTF-8 encoding under various conditions.  Unfortunately, other
   than to allow existing implementations to continue to use local
   character set encodings where pathnames encoded in those character
   sets are not UTF-8 encoded (and thus not ASCII), [RFC2640] gives no
   thought to the effect of UTF-8 encoding upon existing
   implementations.

   At a strictly protocol level, [RFC2640] is generally correct; the
   use of UTF-8 encoding should not inherently prevent most existing
   implementations from correctly reading the character sequence as a
   pathname.  In the best case, existing implementations, when
   presented with a UTF-8 encoded pathname, will treat it as an error
   and recover.  The possibility exists, however, that existing
   implementations will not detect the error, and either fail directly,
   pass the information to the host system causing a failure at that
   level, or treat the characters as invocations of special functions
   (such as end-of-line markers or command-line editing).  Experience
   has shown that presenting hosts and applications with unexpected
   character sequences may result in serious security issues [RR, EL,
   AD].

   The specifications of [RFC2640] provide the means for the server to
   indicate its willingness to accept UTF-8 encoded pathnames.
   To restore inter-operation with existing implementations, the FTP
   should provide a means for the user to express its willingness to
   accept UTF-8 encoded pathnames, and servers should not transmit
   UTF-8 encoded pathnames without prior authorization from the user.

   [RFC2640] also incorrectly requires interpreting the Telnet
   end-of-line sequence CR NUL as a pathname character.  [RFC1123]
   attempted to address this issue with inter-operation of the Telnet
   protocol, but its effect upon the FTP has been largely ignored.  The
   intention of [RFC1123] was to clarify that a server implementation
   must transmit the Telnet end-of-line sequence as CR LF, but that
   both user and server implementations must be prepared to accept
   either CR LF or CR NUL as representing Telnet end-of-line when
   received, and that the choice of whether to send CR LF or CR NUL is
   up to the user and should be configurable.

   The requirement in [RFC2640] that CR NUL be an allowed pathname
   character, when considered in the context of the [RFC1123]
   requirements, can be expected to cause existing implementations to
   incorrectly interpret the sequence as Telnet end-of-line, causing
   loss of synchronization.

   Experience shows implementations often fail in a number of ways once
   they have lost synchronization.  While server implementations
   usually cope fairly well with the problem, user implementations
   often lock up.  It is possible, however, that an implementation will
   fail in some critical manner that may cause serious security issues.

   The receipt of unexpected UTF-8 encoded information alone raises the
   possibility of serious security problems.  When taken together with
   the misuse of the Telnet end-of-line sequence, the security
   implications increase dramatically since not only is unexpected
   information being received, but the sender has control of the
   protocol's primary sequence point.

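
   The loss of synchronization described above can be illustrated with
   a non-normative sketch of a receiver which, following the reading of
   [RFC1123] given in this document, accepts both CR LF and CR NUL as
   end-of-line.  The name split_control_lines is hypothetical, and
   Telnet IAC processing, which a real implementation also has to
   perform, is omitted.

```python
import re

# Assumption for illustration: either CR LF or CR NUL ends a line.
END_OF_LINE = re.compile(rb"\r[\n\x00]")


def split_control_lines(buffer: bytes):
    """Split received control-connection bytes into complete lines.

    Returns (lines, remainder), where remainder holds any trailing
    bytes that are not yet terminated by an end-of-line sequence.
    """
    parts = END_OF_LINE.split(buffer)
    return parts[:-1], parts[-1]


# A sender that places CR NUL inside a pathname has its single
# command read as two separate lines by such a receiver.
lines, remainder = split_control_lines(b"STOR a\r\x00b\r\n")
```

   In this sketch the single intended command is read as the two lines
   "STOR a" and "b", so the receiver treats the second half of the
   pathname as the start of a new command.
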
   This document addresses the deficiencies of [RFC2640] by adding a
   new option which must be successfully negotiated prior to
   transmission of UTF-8 encoded pathnames, making specific
   clarifications in the use of UTF-8 encoding, and clarifying the
   proper interpretation of the sequence CR NUL as being Telnet
   end-of-line and thus not a usable pathname character.

   In the development of the protocol specified in this document, an
   alternative was considered: the server could interpret the LANG
   command as an indication that the user is willing to accept UTF-8
   encoded pathnames.  In this case, the server should provide the
   language EN (English) as an alternative so that the user can
   negotiate the LANG command without any change to the message text
   included with replies.

   This approach unreasonably assumes that implementations will either
   adopt [RFC2640] in its entirety, or not at all.  While that may
   often be a safe assumption, the concept of UTF-8 encoded pathnames
   is logically distinct from the concept of the language used for
   free-form response text.  The approach unnecessarily limits
   implementers, forcing them to support the internationalization of
   response text when they desire only to allow internationalized
   pathnames.

   When reading the following specifications, the key words "MUST",
   "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD
   NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as
   described in RFC 2119 [RFC2119].

2. UTF-8 Option

   The user issues the OPTS UTF-8 command to indicate its willingness
   to send and receive UTF-8 encoded pathnames over the control
   connection.  Prior to sending this command, the user should not
   transmit UTF-8 encoded pathnames.

   The specifications of [RFC2640] apply only to pathnames sent over
   the control connection.  The OPTS UTF-8 command provides an optional
   parameter which modifies the behavior of the NLST command.
   When this parameter is specified, pathnames transmitted over the
   data connection in response to the NLST command must be UTF-8
   encoded, and the data transmission assumes TYPE L 8 character
   framing.  By using this optional parameter, responsibility for
   conversion to the local character set of the pathnames contained in
   the NLST data transfer shifts from the server implementation to the
   user.

   Before sending the UTF-8 option, the user should issue the FEAT
   command and examine the response to that command.  If the response
   contains the UTF-8 option, the user should take that option to mean
   the server is willing to transmit UTF-8 encoded pathnames, and may
   support the OPTS UTF-8 command to enable their use.

   Note that the specification of the OPTS command, and the OPTS UTF-8
   variant, provide a reliable means to determine support for UTF-8
   encoded pathnames; no harmful effect occurs if the user does not
   issue the FEAT command.

   The format of the OPTS UTF-8 command is:

      OPTS UTF-8 [ NLST ]

   The text of the command line is not case sensitive, but should be
   transmitted in upper case as shown.

   The UTF-8 option allows one optional parameter: NLST.  When present,
   NLST must be separated from the UTF-8 option by exactly one space
   character.

   Possible replies to the OPTS UTF-8 command, and their meanings,
   include:

      200 Command okay.
      421 Service not available, closing control connection.
      500 Syntax error, command unrecognized.
      501 Syntax error in parameters or arguments.
      502 Command not implemented.

   The User-FTP process must not depend upon the actual text (if any)
   included with the reply.

   A Server-FTP process which does not implement the OPTS command will
   reply with either response code 500 or 502.  For compatibility with
   existing implementations, the User-FTP process must be prepared for
   this reply and must not transmit UTF-8 encoded pathnames.

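
   The negotiation described above can be sketched, non-normatively, as
   follows.  The function names build_opts_utf8_command and
   may_send_utf8_pathnames are hypothetical, and the sketch assumes the
   caller passes the final line of the server's reply.  Only the
   three-digit response code is examined, never the reply text.

```python
def build_opts_utf8_command(nlst: bool = False) -> bytes:
    """Build the OPTS UTF-8 command line, optionally with NLST.

    Upper case is used, with exactly one space before NLST.
    """
    command = "OPTS UTF-8 NLST" if nlst else "OPTS UTF-8"
    return command.encode("ascii") + b"\r\n"


def may_send_utf8_pathnames(reply: str) -> bool:
    """Decide from the reply whether UTF-8 pathnames may be sent.

    Hypothetical helper: only response code 200 enables UTF-8
    pathnames; 421, 500, 501, 502 and anything else do not.
    """
    code = reply[:3]
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("reply does not start with a response code")
    return code == "200"
```

   Treating every code other than 200 as a refusal keeps the User-FTP
   process compatible with servers that do not implement OPTS at all.
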
   The response code 501 indicates the Server-FTP process does not
   implement the OPTS UTF-8 command or was unable to recognize the
   parameters given with the command.  For compatibility with existing
   implementations, the User-FTP process must be prepared for this
   reply and must not transmit UTF-8 encoded pathnames.

   Prior to transmitting response code 200 in response to the OPTS
   UTF-8 command, the Server-FTP must not transmit UTF-8 encoded
   pathnames and should not accept them on commands: the Server-FTP
   should transmit either response code 501 or 553 in reply to any
   command which includes a pathname outside the range of 7-bit ASCII;
   and the Server-FTP should transmit response code 550 in reply to any
   command to which the server would otherwise have sent a UTF-8
   encoded pathname.

   Upon receiving response code 200, the user should transmit only
   UTF-8 encoded pathnames, and should expect to receive only UTF-8
   encoded pathnames from the server.

   The user may issue the OPTS UTF-8 command without the NLST parameter
   to restore the default operation of the NLST command.

   To terminate the use of UTF-8 encoding on the control connection,
   the user must either issue the REIN command or terminate the FTP
   session and begin anew.

3. UTF-8 Encoding Issues

   The UTF-8 encoding scheme allows the possibility of multiple
   encodings for a single character.  For each character, there is a
   single, shortest form of the UTF-8 encoding.  When transmitting
   UTF-8 encoded characters, the shortest form should be used.  When
   interpreting received UTF-8 encoded information, the implementation
   should accept the non-shortest form as meaning the same character as
   the preferred, shortest form.  (One method would be for the
   implementation to actually replace the character with the shortest
   form encoding.)  Implementations, however, must attach no special
   significance to any non-shortest form encodings.
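
   As a non-normative sketch of the treatment described above, the
   following code decodes a single UTF-8 encoded sequence while
   accepting non-shortest forms, and reports whether the sequence is
   the shortest form for its character.  The function names are
   hypothetical, and the 1 to 6 byte form of UTF-8 is assumed.

```python
def decode_utf8_sequence(seq: bytes) -> int:
    """Decode one UTF-8 sequence to a UCS code point.

    Non-shortest forms are accepted and yield the same code point
    as the shortest form.  Hypothetical helper for illustration.
    """
    if not seq:
        raise ValueError("empty sequence")
    lead = seq[0]
    if lead < 0x80:
        length, code_point = 1, lead
    else:
        # Count leading ONE bits of the first byte.
        length, mask = 0, 0x80
        while lead & mask:
            length += 1
            mask >>= 1
        if length == 1 or length > 6:
            raise ValueError("invalid first byte")
        code_point = lead & (mask - 1)
    if len(seq) != length:
        raise ValueError("wrong number of bytes")
    for byte in seq[1:]:
        if byte & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        code_point = (code_point << 6) | (byte & 0x3F)
    return code_point


def shortest_length(code_point: int) -> int:
    """Number of bytes in the shortest UTF-8 form of a code point."""
    limits = (0x80, 0x800, 0x10000, 0x200000, 0x4000000)
    for length, limit in enumerate(limits, start=1):
        if code_point < limit:
            return length
    return 6


def is_shortest_form(seq: bytes) -> bool:
    """True if seq is the shortest encoding of its character."""
    return len(seq) == shortest_length(decode_utf8_sequence(seq))
```

   For example, the two-byte sequence 0xC0 0x8D decodes to the same
   code point as the single byte 0x0D (CR), but is_shortest_form
   reports it as a non-shortest form.
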
   The non-shortest form encodings for CR, LF and NUL, in particular,
   are not to be treated as potential Telnet end-of-line characters.

4. Misuse of CR NUL in Pathnames

   The assertion in [RFC2640] that CR NUL is not a Telnet end-of-line
   sequence is incorrect.

   The Telnet protocol requires the character CR to always be followed
   by either the character LF or NUL.  The design of the FTP requires
   that the sequences CR LF and CR NUL be treated as a Telnet
   end-of-line and all existing implementations properly recognize them
   as such.

   Pathnames must not include the character CR.  This applies to all
   uses of the character CR whether alone or followed by any other
   character.

5. ABNF for Pathnames

   The ABNF for pathnames presented in [RFC2640] is incorrect.

   When UTF-8 encoding is not present, the correct syntax is

      PATHNAME = *( %x20-7E )

   When UTF-8 encoding is in use, the correct syntax is

      PATHNAME = *( %x20-7E / %x80-FF )

   Note that both cases render moot all discussion about the use of the
   characters CR, LF, and NUL in pathnames.

6. IANA Considerations

   The list of valid option names for the FTP OPTS command is believed
   to be first-come first-served, and managed outside the control of
   the Internet Assigned Numbers Authority (IANA).

7. Security Considerations

   While it should improve inter-operation, and therefore may improve
   security, the addition of the UTF-8 option itself should have no
   effect upon the security of the FTP, networks or hosts.

   The intention of this document is to address inter-operational
   issues in the existing protocol specifications.  Some of those
   issues can lead to unexpected data appearing on the communications
   channel.  Experience shows this can lead to serious security issues,
   potentially including the compromise of hosts on the network.  While
   one would hope that implementations were hardened against such
   occurrences, some implementations may not be.
   The importance of such hardening cannot be emphasized strongly
   enough.

Acknowledgments

   The following people provided significant assistance with the
   analysis of the problem, the proposed solution, and the preparation
   of this document:

      The members of the vulnerability handling team at the CERT
      Coordination Center.

Normative References

   [RFC854]   J. Postel and J. Reynolds, "TELNET PROTOCOL
              SPECIFICATION", STD 8, RFC 854, May 1983.

   [RFC959]   J. Postel and J. Reynolds, "FILE TRANSFER PROTOCOL
              (FTP)", STD 9, RFC 959, October 1985.

   [RFC1123]  IETF, "Requirements for Internet Hosts -- Application
              and Support", STD 3, RFC 1123, October 1989.

   [RFC2119]  S. Bradner, "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2234]  D. Crocker and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 2234, November 1997.

   [RFC2389]  P. Hethmon and R. Elz, "Feature negotiation mechanism
              for the File Transfer Protocol", RFC 2389, August 1998.

   [RFC2640]  B. Curtin, "Internationalization of the File Transfer
              Protocol", RFC 2640, July 1999.

Informative References

   [RR]       R. Russell and S. Cunningham, "Hack Proofing Your
              Network: Internet Tradecraft", ISBN 1-928994-15-6,
              January 2000.

   [EL]       E. Labbate, "Vulnerability as a Function of Software
              Quality", http://rr.sans.org/code/quality.php, March
              2001.

   [AD]       A. Davis, et al, "Understanding the Risks of SNMP
              Vulnerabilities",
              http://www.lucent.com/livelink/255868_Whitepaper.pdf,
              March 2002.

Author's Address

   Gregory A. Lundberg
   WU-FTPD Development Group
   1441 Elmdale Drive
   Dayton, OH 45409-1615
   US

   Phone: +1 937 299 7653
   Email: lundberg@vr.net

Annex A - Specific Changes to Existing Specifications

   In summary, the specifications presented in this memo make the
   following specific changes with respect to the requirements of
   [RFC2640]:

   - add the option UTF-8, and require that implementations must not
     transmit UTF-8 encoded pathnames until after successful
     negotiation of the UTF-8 option;

   - clarify that UTF-8 encoding applies only to pathnames transmitted
     over the control connection, and provide a means to specify UTF-8
     encoding in the data transfer sent in response to the NLST
     command;

   - clarify the use of UTF-8 encoding with respect to non-shortest
     encodings;

   - clarify that the character sequence CR NUL is a Telnet end-of-line
     sequence and must not be treated as a pathname character;

   - specify the correct ABNF syntax for pathnames when UTF-8 encoding
     is, and is not, in use.

Full Copyright Statement

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to The Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by The Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by The
   Internet Society.