I came across the Zaru project, which provides filename sanitization for Ruby. You can learn a bit about filenames by reading the article on Wikipedia. I thought it would be nice to implement something similar in Factor.
The rules for sanitization are relatively simple, so I will list and then implement each one:
1. Certain characters generally don't mix well with certain file systems, so we filter them:
: filter-special ( str -- str' ) [ "/\\?*:|\"<>" member? not ] filter ;
2. ASCII control characters (0x00 to 0x1f) are not usually a good idea, either:
: filter-control ( str -- str' ) [ control? not ] filter ;
3. Unicode whitespace is trimmed from the beginning and end of the filename and collapsed to a single space within the filename:
: filter-blanks ( str -- str' ) [ blank? ] split-when harvest " " join ;
4. Certain filenames are reserved on Windows and are filtered (substituting a "file" placeholder name):
: filter-windows-reserved ( str -- str' ) dup >upper { "CON" "PRN" "AUX" "NUL" "COM1" "COM2" "COM3" "COM4" "COM5" "COM6" "COM7" "COM8" "COM9" "LPT1" "LPT2" "LPT3" "LPT4" "LPT5" "LPT6" "LPT7" "LPT8" "LPT9" } member? [ drop "file" ] when ;
5. Empty filenames are not allowed and are replaced with "file":
: filter-empty ( str -- str' ) [ "file" ] when-empty ;
6. Filenames that begin with a "dot" character are prefixed with "file":
: filter-dots ( str -- str' ) dup first CHAR: . = [ "file" prepend ] when ;
Putting it all together, and requiring the filename to be no more than 255 characters:
: sanitize-path ( path -- path' ) filter-special filter-control filter-blanks filter-windows-reserved filter-empty filter-dots 255 short head ;
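As a quick sanity check, here is a sketch of a few unit tests that exercise each rule. These are illustrative and are not the tests from the repository. The vocabulary name sanitize is hypothetical: it assumes the words above were saved in sanitize/sanitize.factor with a USING: line along the lines of kernel sequences splitting unicode (on older Factor releases, blank?, control? and >upper may live in the unicode.categories and unicode.case vocabularies instead).

```factor
! sanitize-tests.factor
! Assumes the words from this post live in a vocabulary named
! "sanitize" (a hypothetical name chosen for this sketch).
USING: sanitize sequences strings tools.test ;

! Rule 1: characters that are unsafe on common file systems are dropped.
{ "abc.txt" } [ "a/b\\c?.txt" sanitize-path ] unit-test

! Rule 3: leading and trailing spaces are trimmed, inner runs collapse.
{ "hello world" } [ "  hello   world  " sanitize-path ] unit-test

! Rule 4: reserved Windows names are matched case-insensitively.
{ "file" } [ "con" sanitize-path ] unit-test

! Rule 5: an empty name, or one that becomes empty, falls back to "file".
{ "file" } [ "" sanitize-path ] unit-test
{ "file" } [ "???" sanitize-path ] unit-test

! Rule 6: a leading dot gets the "file" prefix rather than being removed.
{ "file.bashrc" } [ ".bashrc" sanitize-path ] unit-test

! Final step: the result is truncated to at most 255 characters.
{ 255 } [ 300 CHAR: a <string> sanitize-path length ] unit-test
```

With the vocabulary on the vocabulary roots, running "sanitize" test in the listener should load and run this file.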
The code for this (and some tests) is on my GitHub.