Saturday, April 18, 2015

Interpolate

Today, I made some minor improvements to the interpolate vocabulary, which provides simple string interpolation and formatting.

We have had the ability to use named variables:

IN: scratchpad "World" "name" set
               "Hello, ${name}" interpolate
Hello, World

But now we can just as easily use stack arguments (numbered from the top of the stack):

IN: scratchpad "Mr." "Anderson"
               "Hello, ${1} ${0}" interpolate
Hello, Mr. Anderson

In any order, even repeated:

IN: scratchpad "James" "Bond"
               "${0}, ${1} ${0}" interpolate
Bond, James Bond

As well as anonymously, by order of arguments:

IN: scratchpad "Roses" "red"
               "${} are ${}" interpolate
Roses are red

And even mix named variables and stack arguments:

IN: scratchpad "Factor" "lang" set
               "cool" "${lang} is ${0}!" interpolate
Factor is cool!
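
For readers more at home in Python, the lookup rules shown in the examples above can be modeled roughly as follows. This is only a sketch of the observable behavior, not how the interpolate vocabulary is implemented: the function name, the `stack` list (with the top of the stack last), and the `variables` dict are all hypothetical stand-ins. The post does not say what happens when anonymous and numbered slots are mixed, so the sketch simply treats them independently.

```python
import re

# Matches ${name}, ${0}, or ${} and captures whatever is between the braces.
_SLOT = re.compile(r"\$\{([^}]*)\}")


def interpolate(template, stack, variables=None):
    """Hypothetical model of the lookup rules; stack[-1] is the top of the stack."""
    variables = variables or {}
    # Count the anonymous slots up front so they can be filled in push order.
    anon_total = sum(1 for m in _SLOT.finditer(template) if m.group(1) == "")
    anon_seen = 0

    def resolve(match):
        nonlocal anon_seen
        key = match.group(1)
        if key == "":
            # Anonymous: the first ${} takes the deepest of the used arguments.
            value = stack[len(stack) - anon_total + anon_seen]
            anon_seen += 1
        elif key.isdigit():
            # Numbered: ${0} is the top of the stack, ${1} is one below it.
            value = stack[-1 - int(key)]
        else:
            # Named: looked up in the variables mapping (an assumption here).
            value = variables[key]
        return str(value)

    return _SLOT.sub(resolve, template)


print(interpolate("Hello, ${1} ${0}", ["Mr.", "Anderson"]))
print(interpolate("${0}, ${1} ${0}", ["James", "Bond"]))
print(interpolate("${} are ${}", ["Roses", "red"]))
print(interpolate("${lang} is ${0}!", ["cool"], {"lang": "Factor"}))
```

Run as written, the four calls reproduce the outputs of the Factor examples above: "Hello, Mr. Anderson", "Bond, James Bond", "Roses are red", and "Factor is cool!".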

Right now we simply convert objects to human-readable strings using the present vocabulary. In the future, it would be nice to support something like Python's string format specifications, which are similar to, but slightly different from, our printf support.
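
To make the comparison concrete, here is a small Python sketch that puts a printf-style format next to the equivalent `str.format` specification. The values are made up for illustration and are not from this post; the point is only that the format spec sits after a colon inside the braces, next to a field that can be numbered or named.

```python
# Illustrative values only; none of these appear in the post.
value = 255
name = "pi"
approx = 3.14159

# printf-style: each directive starts with % and ends in a conversion letter.
old_style = "%04x %8s %.2f" % (value, name, approx)

# str.format: the spec follows a colon, and the field may be numbered or named.
new_style = "{0:04x} {who:>8} {1:.2f}".format(value, approx, who=name)

print(old_style)
print(new_style)
```

Both lines print the same text: the number as zero-padded hex, the name right-aligned in eight columns, and the float rounded to two decimal places.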

12 comments:

ruv said...

Hello!
Why don't you count arguments from the top of stack?
I'd prefer (having Forth background):
"James" "Bond" "${0}, ${1} ${0}" interpolate # -- "Bond, James Bond"

mrjbq7 said...

Hi @ruv, we could definitely do that easily.

The practical reason why I did that was to be compatible with a word we used to have called "ninterpolate" that only worked with stack arguments and it numbered them that way:

https://github.com/slavapestov/factor/blob/760126525be7f9a2793908e0067c08609ee4f9e8/basis/interpolate/interpolate.factor#L46

Lemme ask @erg and see what he thinks, too.

mrjbq7 said...

Seems like there are votes for both ways, so I'm just going to keep the "ninterpolate" way for right now. I think usability hinges on a few questions:

1) Is it easier to read numbered the old or new way?

2) Is there a convention somewhere we can compare with?

3) Do you mostly write format strings that look at arguments deeper in the stack before ones that are on the top of the stack? (Meaning, will you see a lot of "{2}, {1}, {0}", vs "{0}, {1}, {2}" and does that matter?)

But, maybe we can give it some thought before releasing 0.98. Thanks!

ruv said...

The argument against numbering from the tail: the string may be quite long, and when you see ${2} you do not actually know which parameter it refers to, because that depends on the number of parameters used. If 3 parameters are used, it refers to one; if 4 are used, it refers to another.
Re convention: Factor already has the "npick" word, which numbers from the top of the stack (and from 1). So numbering from 0 is a question too.

ruv said...

@mrjbq7, re point (3) — it doesn't matter, in my opinion.

ruv said...

OTOH, in my own implementation of string interpolation I refer to parameters from the stack only anonymously.
I.e. {} would mean to take the parameter from the top of the stack, and {dup} would mean to copy that parameter. So it allows any simple code inside {}.

mrjbq7 said...

@ruv, I love the idea of having generic code inside "{}". When you use an anonymous reference, does it always imply the top of the stack?

I'm re-committing the patch from earlier. It is nice that {0} always refers to the same argument position, and is similar to npick.

Thanks for the feedback!

ruv said...

@mrjbq7, yes, in my implementation it always implies the top of the stack. If you need the second parameter you can use (in terms of Factor) {swap}, {2 nrot}, or {2 npick}, depending on what you need. So the {} structure just executes its code and takes the top item from the stack.

mrjbq7 said...

Okay, so I also added anonymous by-order-of-arguments referencing, a bit similar to how Python does it, and an example to this blog post.

ruv said...

@mrjbq7, if you mean Python's example:
octets = [192, 168, 0, 1]
'{:02X}{:02X}{:02X}{:02X}'.format(*octets)
# -- 'C0A80001'
It can be rewritten as
'{:02X}{:02X}{:02X}{:02X}'.format(192, 168, 0, 1)
And that should correspond to Factor code like:
1 0 168 192 "${}.${}.${}.${}" interpolate # -- 192.168.0.1

(Since the first argument in a prefix language corresponds to the top argument in a postfix one.)

But according to your example in the blog post we will get 1.0.168.192 in such a case.

Although, don't bother, it is just my opinion =)

mrjbq7 said...

@ruv, I love opinions!

It seems to be one of those areas where you could go either way. For example, we made the ``printf`` word go by order of arguments:

IN: scratchpad 192 168 0 1 "%s.%s.%s.%s" printf
192.168.0.1

And we have the ``boa`` constructor do the same (so arguments are ordered the same way they are defined):

IN: scratchpad TUPLE: foo a b c ;
IN: scratchpad 1 2 3 foo boa .
T{ foo { a 1 } { b 2 } { c 3 } }

But, as you pointed out, when we have numbered arguments like in ``npick``, they are indexed from the top of the stack.

I think I'll keep it the way it is for now, but it's definitely worth discussing what would be more intuitive to users.