What is the best regular expression to check if a string is a valid URL?

There's no single “best” regular expression that validates every possible URL against the full URI specification (RFC 3986). URLs can have complex structures (e.g., IPv6 addresses, Punycode-encoded internationalized domains, and unusual schemes). That said, many people use a practical regex that covers the most common URL formats:

<details> <summary>Example of a Practical Regex</summary>
^https?:\/\/
(?:
  (?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}   # domain...
  |
  (?:\d{1,3}\.){3}\d{1,3}           # or IPv4
  |
  \[?[A-Fa-f0-9:]+\]?               # or IPv6 (simple version)
)
(?::\d+)?                           # optional port
(?:\/[^\s]*)?                       # optional slash + path
$

(Expanded for clarity. Remove the whitespace and comments before use, or use the x (extended) flag where the regex engine supports it; JavaScript's RegExp does not.)

</details>

This regex attempts to handle:

  1. HTTP/HTTPS scheme (^https?://).
  2. Common domain patterns (e.g., example.com).
  3. IPv4 addresses (like 192.168.0.1).
  4. A simplistic approach to IPv6 ([::1], etc.).
  5. Optional port (:8080).
  6. Optional path after a slash.
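In JavaScript, where the pattern has to be compacted before use, one option is to keep the commented layout in source and strip it at startup. The following is a minimal sketch; `compileVerbose` and `httpUrlPattern` are hypothetical names, and the helper assumes the pattern never needs a literal `#` or literal whitespace.

```javascript
// Hypothetical helper: turn a commented, whitespace-padded pattern into a RegExp.
// Assumes the pattern itself contains no literal "#" and no literal whitespace.
function compileVerbose(source, flags = '') {
  const compact = source
    .split('\n')
    .map((line) => line.replace(/\s+#.*$/, '')) // drop a trailing "# comment"
    .join('')
    .replace(/\s+/g, '');                       // drop the remaining whitespace
  return new RegExp(compact, flags);
}

// String.raw keeps the backslashes exactly as written.
const httpUrlPattern = compileVerbose(String.raw`
  ^https?:\/\/
  (?:
    (?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}   # domain...
    |
    (?:\d{1,3}\.){3}\d{1,3}            # or IPv4
    |
    \[?[A-Fa-f0-9:]+\]?                # or IPv6 (simple version)
  )
  (?::\d+)?                            # optional port
  (?:\/[^\s]*)?                        # optional slash + path
  $
`);

console.log(httpUrlPattern.test('https://example.com'));   // true
console.log(httpUrlPattern.test('http://[::1]:8080/'));    // true
console.log(httpUrlPattern.test('ftp://example.com'));     // false
```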

Caveats

  • This pattern rejects many valid URLs, such as those with other schemes (ftp://, mailto:) or some internationalized domains.
  • Regexes that aim for full RFC compliance tend to be either extremely large and complex or still incomplete.
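To make these caveats concrete, here is a small sketch that runs the compacted pattern against a few sample inputs. The sample URLs are made up for illustration. The last check also shows that the pattern can err in the other direction, because it does not range-check IPv4 octets.

```javascript
// The practical pattern from above, with whitespace and comments removed.
const practicalPattern =
  /^https?:\/\/(?:(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}|(?:\d{1,3}\.){3}\d{1,3}|\[?[A-Fa-f0-9:]+\]?)(?::\d+)?(?:\/[^\s]*)?$/;

// Illustrative inputs: syntactically valid URLs that the pattern rejects.
const rejected = [
  'ftp://example.com/file.txt',   // scheme other than http/https
  'mailto:someone@example.com',   // no "//" after the scheme
  'https://example.xn--p1ai/',    // Punycode TLD contains digits and hyphens
];

for (const url of rejected) {
  console.log(url, practicalPattern.test(url)); // each logs false
}

// Not a usable IPv4 address, yet the \d{1,3} groups accept it.
console.log(practicalPattern.test('http://999.999.999.999')); // true
```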

Alternative Approaches

  1. Use a Parser

    • For example, in JavaScript:
      function isValidUrl(str) {
        try {
          new URL(str);
          return true;
        } catch (e) {
          return false;
        }
      }
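    • The built-in URL constructor attempts to parse the string and throws a TypeError if it cannot. It accepts any scheme (for example, mailto: or ftp://), so also check the parsed URL's protocol property if you only want http or https.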
  2. Use a Library

    • Many languages have robust libraries that parse URLs for you (e.g., urllib.parse in Python, java.net.URL in Java). These handle edge cases that hand-crafted regexes often miss.
  3. Restrict Format

    • If your use case only needs to accept typical http://example.com URLs (no custom schemes, no exotic domain endings), a simpler regex might suffice.
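The parser and restricted-format approaches can be combined: parse first, then accept only the schemes you care about. This is a minimal sketch, assuming an environment where the `URL` constructor is available globally (modern browsers and Node.js); `isHttpUrlParsed` is a hypothetical name.

```javascript
// Parse with the built-in URL constructor, then allow only http and https.
function isHttpUrlParsed(str) {
  let url;
  try {
    url = new URL(str); // throws if the string cannot be parsed as a URL
  } catch {
    return false;
  }
  return url.protocol === 'http:' || url.protocol === 'https:';
}

console.log(isHttpUrlParsed('https://example.com'));   // true
console.log(isHttpUrlParsed('ftp://somewhere.com'));   // false (parses, but wrong scheme)
console.log(isHttpUrlParsed('not a url'));             // false (does not parse)
```

Leaving the parsing to `URL` means details such as IPv6 literals and ports are handled by the platform rather than by a hand-written pattern.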

Example JS Usage with the Above Regex

function isHttpUrl(str) {
  const pattern = new RegExp(
    '^https?:\\/\\/' +                        // scheme
    '(?:' +
      '(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,}' +   // domain
      '|(?:\\d{1,3}\\.){3}\\d{1,3}' +         // or IPv4
      '|\\[?[A-Fa-f0-9:]+\\]?' +              // or IPv6 (simple version)
    ')' +
    '(?::\\d+)?' +                            // optional port
    '(?:\\/[^\\s]*)?' +                       // optional path
    '$'
  );
  return pattern.test(str);
}

console.log(isHttpUrl('https://example.com'));        // true
console.log(isHttpUrl('http://example.com/path'));    // true
console.log(isHttpUrl('https://192.168.0.1:3000'));   // true
console.log(isHttpUrl('ftp://somewhere.com'));        // false (ftp not allowed)
console.log(isHttpUrl('not a url'));                  // false

Final Thoughts

  • For comprehensive URL validation (handling all international domain scenarios, custom schemes, etc.), it’s best to use an actual URL parser.
  • For common http:// or https:// patterns, a simpler or slightly extended regex is enough in many cases.
  • Validate according to your needs. If you only allow https://, your regex can be simpler.
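As an illustration of the last point, here is a sketch of a narrower pattern. It assumes a policy of https only, ASCII domain names, and no IP literals; `httpsOnlyPattern` is a hypothetical name, and the policy itself is an assumption rather than a recommendation.

```javascript
// Assumed policy: https only, ASCII domain names, optional port and path.
const httpsOnlyPattern =
  /^https:\/\/(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(?::\d+)?(?:\/\S*)?$/;

console.log(httpsOnlyPattern.test('https://example.com/docs')); // true
console.log(httpsOnlyPattern.test('http://example.com'));       // false (plain http)
console.log(httpsOnlyPattern.test('https://192.168.0.1'));      // false (IP literals not allowed)
```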

Ultimately, no single regex can handle every valid URL, but a “good enough” pattern for common http/https cases plus a URL parser is usually the best approach.

CONTRIBUTOR
TechGrind