This article provides clear, simple, actionable guidance for implementing Input Validation security functionality in your applications.

Goals of Input Validation

Input validation is performed to ensure only properly formed data is entering the workflow in an information system, preventing malformed data from persisting in the database and triggering malfunction of various downstream components.
Input validation should happen as early as possible in the data flow, preferably as soon as the data is received from the external party. Data from all potentially untrusted sources should be subject to input validation, including not only Internet-facing web clients but also backend feeds over extranets from suppliers, partners, vendors or regulators, each of which may be compromised on their own and start sending malformed data.
Input validation should not be used as the primary method of preventing XSS, SQL Injection and other attacks, which are covered in their respective cheat sheets, but it can significantly contribute to reducing their impact if implemented properly.

Input validation strategies

Input validation should be applied at both the syntactic and the semantic level. Syntactic validation should enforce the correct syntax of structured fields (e.g. SSN, date, currency symbol), while semantic validation should enforce the correctness of their values in the specific business context, e.g.
Input validation can be used to detect unauthorized input before it is processed by the application.

Implementing input validation

Input validation can be implemented using any programming technique that allows effective enforcement of syntactic and semantic correctness, for example:

Plus, such filters frequently prevent authorized input, like O'Brian, where the ' character is fully legitimate.
White list validation is appropriate for all input fields provided by the user. White list validation involves defining exactly what IS authorized, and by definition, everything else is not authorized.
If it's well-structured data, like dates, social security numbers, zip codes, e-mail addresses, etc. If the input field comes from a fixed set of options, like a drop-down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place.

Validating free-form Unicode text

Free-form text, especially with Unicode characters, is perceived as difficult to validate due to the relatively large space of characters that need to be whitelisted.
The primary means of input validation for free-form text input should be:
- Arabic, Cyrillic, CJK ideographs etc.
- individual character whitelisting: if you allow letters and ideographs in names and also want to allow the apostrophe ' for Irish names, but don't want to allow the whole punctuation category

References: Input validation of free-form Unicode text in Python

Regular expressions

Developing regular expressions can be complicated, and is well beyond the scope of this cheat sheet.
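As an added illustration (not code from the cheat sheet), here is a Python implementation of the two techniques just described, Unicode category whitelisting combined with individual character whitelisting, applied to a hypothetical name field; the allowed categories, the extra characters and the 100-character limit are assumptions chosen for the example.

```python
import unicodedata

# Assumed whitelist of Unicode general categories for a name field:
# letters (Lu, Ll, Lt, Lm, Lo), combining marks (Mn, Mc) for accents, and spaces (Zs).
# Letters from any script (Latin, Arabic, Cyrillic, CJK ideographs ...) fall in these.
ALLOWED_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Zs"}

# Individually whitelisted characters that lie outside those categories:
# the apostrophe for names like O'Brian and the hyphen for double-barrelled names,
# without admitting the whole punctuation category (assumed for this example).
ALLOWED_EXTRA = {"'", "-"}


def validate_name(value: str, max_len: int = 100) -> bool:
    """Whitelist-validate a free-form Unicode name.

    Returns True only if the length is within bounds and every character is either
    in an allowed Unicode category or individually whitelisted.
    """
    # Normalise first so precomposed and decomposed forms of the same letter are
    # judged alike (e.g. "e" + combining diaeresis becomes a single Ll character).
    value = unicodedata.normalize("NFC", value)
    if not 1 <= len(value) <= max_len:
        return False
    for ch in value:
        if ch in ALLOWED_EXTRA:
            continue
        if unicodedata.category(ch) not in ALLOWED_CATEGORIES:
            # Control characters, symbols, digits and other punctuation all land here.
            return False
    return True
```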
There are lots of resources on the internet about how to write regular expressions, including:

In summary, input validation should:
- Be applied to all input data, at minimum
- Define the allowed set of characters to be accepted
- Define a minimum and maximum length for the data, e.g.
Upload Verification
- Use input validation to ensure the uploaded filename uses an expected extension type.
- Ensure the uploaded file is not larger than a defined maximum file size.
- If the website supports ZIP file upload, perform validation checks before unzipping the file. The checks include the target path, the compression level and the estimated unzipped size.

Upload Storage
- Use a new filename to store the file on the OS. Do not use any user-controlled text for this filename or for the temporary filename. When the file is uploaded to the web, it is suggested to rename the file on storage. For example, the uploaded filename is test.JPG; store it with a random file name. The purpose of doing this is to prevent the risks of direct file access and of an ambiguous filename evading the filter, such as test.
- Uploaded files should be analyzed for malicious content (anti-malware, static analysis, etc.).
- The file path should not be specifiable by the client side. It is decided by the server side.

Public Serving of Uploaded Content
- Ensure uploaded images are served with the correct content-type, e.g.
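The upload verification and storage steps above, as an added Python example that is not part of the cheat sheet: an extension whitelist and size limit, a server-chosen random storage name that reuses no user-controlled text, and a pre-extraction inspection of a ZIP's member paths, compression ratio and estimated unzipped size. The extension list, the size limits and the ratio threshold are assumptions, and the sample ZIPs are constructed in memory.

```python
import io
import posixpath
import secrets
import zipfile

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}  # assumed image whitelist
MAX_FILE_SIZE = 5 * 1024 * 1024        # assumed largest accepted upload, in bytes
MAX_UNZIPPED_SIZE = 50 * 1024 * 1024   # assumed cap on the estimated unzipped size
MAX_COMPRESSION_RATIO = 100            # assumed threshold for zip-bomb members

def check_upload(filename: str, size: int) -> bool:
    """Extension whitelist and maximum size check on the upload metadata."""
    ext = posixpath.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS and 0 < size <= MAX_FILE_SIZE

def storage_name(filename: str) -> str:
    """Server-chosen name: random token plus the validated extension only."""
    return secrets.token_hex(16) + posixpath.splitext(filename)[1].lower()

def zip_member_path_ok(name: str) -> bool:
    """Reject absolute paths, drive letters and any '..' component."""
    norm = posixpath.normpath(name.replace("\\", "/"))
    parts = norm.split("/")
    return not (posixpath.isabs(norm) or ".." in parts or ":" in parts[0])

def check_zip(data: bytes) -> bool:
    """Inspect a ZIP before unzipping: member paths, compression ratio, total size."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if not zip_member_path_ok(info.filename):
                return False
            total += info.file_size
            if total > MAX_UNZIPPED_SIZE:
                return False
            if info.compress_size and info.file_size / info.compress_size > MAX_COMPRESSION_RATIO:
                return False
    return True

def build_zip(members: dict) -> bytes:
    """Constructed sample input: an in-memory ZIP built from {name: content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in members.items():
            zf.writestr(name, content)
    return buf.getvalue()

SAMPLE_OK = build_zip({"img/a.png": b"x" * 100})
SAMPLE_TRAVERSAL = build_zip({"../../outside.txt": b"x"})
SAMPLE_BOMB = build_zip({"big.bin": b"\0" * 1_000_000})  # ~1000:1 ratio
```

Call check_upload before storage_name, since storage_name keeps the (already whitelisted) extension.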
However, it is important to be aware of the following file types that, if allowed, could result in security vulnerabilities:

If permitted on sites with authentication, this can permit cross-domain data theft and CSRF attacks. Note this can get pretty complicated depending on the specific plugin version in question, so it's best to just prohibit files named "crossdomain.

Upload Verification
- Use image rewriting libraries to verify the image is valid and to strip away extraneous content.
- Set the extension of the stored image to a valid image extension based on the detected content type of the image from image processing, e.g.
- Ensure the detected content type of the image is within a list of defined image types (jpg, png, etc.).

Email Address Validation

Email Validation Basics

Many web applications do not treat email addresses correctly due to common misconceptions about what constitutes a valid address.
Specifically, it is completely valid to have a mailbox address which:

Please note, email addresses should be considered to be public data. Many web applications contain computationally expensive and inaccurate regular expressions that attempt to validate email addresses. Recent changes to the landscape mean that the number of false negatives will increase, particularly due to:

- Check for the presence of at least one @ symbol in the address
- Ensure the local-part is no longer than 64 octets
- Ensure the domain is no longer than octets
- Ensure the address is deliverable

To ensure an address is deliverable, the only way to check this is to send the user an email and have the user take action to confirm receipt.
Beyond confirming that the email address is valid and deliverable, this also provides a positive acknowledgement that the user has access to the mailbox and is likely to be authorized to use it. This does not mean that other users cannot access this mailbox, for example when the user makes use of a service that generates a throw-away email address. Email verification links should only satisfy the requirement of verifying email address ownership and should not provide the user with an authenticated session, e.g.
Email verification codes must expire after the first use or expire after 8 hours if not used.

Address Normalization

As the local-part of an email address is, in fact, case sensitive, it is important to store and compare email addresses correctly. To normalise an email address input, you would convert the domain part ONLY to lowercase.
Unfortunately this does and will make input harder to normalise and correctly match to a user's intent. It is reasonable to only accept one unique capitalisation of an otherwise identical address; however, in this case it is critical to:
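An added Python example (not part of the cheat sheet) of the basic syntactic checks listed above and the normalisation rule: it requires an @ symbol, enforces the 64-octet local-part limit, lowercases the domain part only, and compares addresses on that basis. The 255-octet domain limit is taken from RFC 5321 as an assumption, since the figure is not given on this page; deliverability is deliberately not checked, because the page states only a confirmation email can establish it.

```python
from typing import Optional, Tuple

MAX_LOCAL_OCTETS = 64
MAX_DOMAIN_OCTETS = 255  # RFC 5321 limit, assumed here (not stated on this page)


def split_address(address: str) -> Optional[Tuple[str, str]]:
    """Split on the LAST '@': a quoted local-part may itself contain '@'."""
    if "@" not in address:
        return None
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        return None
    return local, domain


def check_email_syntax(address: str) -> bool:
    """Cheap syntactic checks: an '@' is present and both parts fit their octet limits.

    Lengths are measured in UTF-8 octets, not characters, so internationalised
    addresses are judged against the same wire-level limits.
    """
    parts = split_address(address)
    if parts is None:
        return False
    local, domain = parts
    return (len(local.encode("utf-8")) <= MAX_LOCAL_OCTETS
            and len(domain.encode("utf-8")) <= MAX_DOMAIN_OCTETS)


def normalise_email(address: str) -> str:
    """Lowercase the domain part ONLY; the local-part is case sensitive and kept as is."""
    parts = split_address(address)
    if parts is None:
        raise ValueError("not an email address")
    local, domain = parts
    return f"{local}@{domain.lower()}"


def same_mailbox(a: str, b: str) -> bool:
    """Compare two stored/entered addresses after normalisation."""
    return normalise_email(a) == normalise_email(b)
```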