From:  Brian Slesinsky <skybrian@google.com>
Date:  06 Jun 2014 02:13:50 Hong Kong Time
Newsgroup:  news.mozilla.org/mozilla.dev.js-sourcemap
Subject:  Re: Adding media type for sources to source maps

NNTP-Posting-Host:  63.245.216.66

I'd rather stick with the original proposal. The complexity of VLQ encoding
doesn't seem worth it here, at least not without some benchmark showing it
makes a difference on realistic data.




On Thu, Jun 5, 2014 at 11:09 AM, Fitzgerald, Nick wrote:

> I wrote up (and edited a little) this proposal. I like it because we came
> up with basically the same thing independently, and it best abstracts away
> repeated data.
>
> Should everyone agree to it, this should be able to just drop into the
> spec: https://gist.github.com/fitzgen/35d7e3905a915238aa14
>
> Note that I opted to use indices directly instead of relative offsets
> because the space savings gained by relative offsets don't kick in until
> you have 16 or greater unique media types and it didn't seem worth
> complicating the spec any further for such an edge case.
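
For anyone checking the arithmetic behind that threshold, here is a rough Python sketch. One Base64 VLQ character holds a sign bit plus four magnitude bits, so values up to 15 fit in a single character. The sketch assumes the direct indices would themselves be written as Base64 VLQ digits, which the text quoted here does not spell out, and `vlq_len`, `direct_cost` and `relative_cost` are names made up for this illustration.

```python
def vlq_len(value: int) -> int:
    """Count the Base64 VLQ characters needed for one signed integer."""
    # The sign occupies the lowest bit, so the magnitude is shifted left by one.
    bits = abs(value) << 1
    chars = 1
    # Each character carries 5 payload bits; the 6th bit flags a continuation.
    while bits >= 32:
        bits >>= 5
        chars += 1
    return chars


def direct_cost(indices: list[int]) -> int:
    """Characters used when every index is written as-is."""
    return sum(vlq_len(i) for i in indices)


def relative_cost(indices: list[int]) -> int:
    """Characters used when each index is written as a delta from the previous one."""
    total, previous = 0, 0
    for index in indices:
        total += vlq_len(index - previous)
        previous = index
    return total


# Indices up to 15 fit in one character either way.
small = [0, 15, 15, 15, 0]
# Index 16 needs two characters directly, but a repeated 16 is a delta of 0.
large = [0, 16, 16, 16, 0]
print(direct_cost(small), relative_cost(small))
print(direct_cost(large), relative_cost(large))
```

The two costs only diverge once an index of 16 or higher shows up and repeats, which is why direct indices look like the simpler default.
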
>
> Thoughts?
>
>
> On 6/3/14, 11:44 AM, Ron Buckton wrote:
>
>> When Andy and I were discussing this initially, I had proposed the
>> following:
>>
>> - Add "x_ms_mediaTypes" to the source map
>>     - This contains an array of unique media types
>> - Add "x_ms_sourceMediaTypes" to the source map
>>     - This is a string containing a variable-length, Base64 VLQ encoded
>> set of offsets
>>     - Each entry, from left to right, corresponds to the entry at the same
>> position in the "sources" array
>>     - Missing entries are right-filled to the end of the "sources" array
>>     - Each offset is encoded using the Base64 VLQ format, without
>> separators
>>     - If this property is not present, every entry is assumed to be
>> offset zero
>> - No assumption is made on the order of entries in the "sources" array,
>> as that is currently implementation dependent
>>
>> An example of this encoding might be:
>>
>> ```
>> {
>>    ...
>>    "sources": ["a.js", "b.ts", "c.js", "d.js", "e.js", "f.js"],
>>    ...
>>    "x_ms_mediaTypes": ["application/javascript",
>> "application/x.typescript"],
>>    "x_ms_sourceMediaTypes": "ACDAAA"
>> }
>> ```
>>
>> Or, with right-filling:
>>
>> ```
>> {
>>    ...
>>    "sources": ["a.js", "b.ts", "c.js", "d.js", "e.js", "f.js"],
>>    ...
>>    "x_ms_mediaTypes": ["application/javascript",
>> "application/x.typescript"],
>>    "x_ms_sourceMediaTypes": "ACDA"
>> }
>> ```
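
To make the right-filling rule concrete, here is a rough Python sketch of a reader for this format. It assumes each offset is relative to the previous entry's index into "x_ms_mediaTypes" (starting from 0), which is my reading of "offsets over indices" and not something spelled out above. `decode_offsets` and `resolve_media_types` are hypothetical names. Under standard Base64 VLQ, "C" is +1, "D" is -1 and "B" is negative zero, so the sketch uses "D" for the step back to index 0.

```python
BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_offsets(encoded: str) -> list[int]:
    """Decode a run of Base64 VLQ values written without separators."""
    offsets, accumulator, shift = [], 0, 0
    for char in encoded:
        digit = BASE64.index(char)  # raises ValueError on a non-Base64 character
        accumulator |= (digit & 31) << shift
        if digit & 32:  # continuation bit: more digits belong to this value
            shift += 5
            continue
        magnitude = accumulator >> 1
        offsets.append(-magnitude if accumulator & 1 else magnitude)
        accumulator, shift = 0, 0
    if shift:
        raise ValueError("encoded string ends in the middle of a value")
    return offsets


def resolve_media_types(sources, media_types, encoded=""):
    """Return one media type per source, right-filling with offset zero."""
    offsets = decode_offsets(encoded)
    if len(offsets) > len(sources):
        raise ValueError("more offsets than sources")
    offsets += [0] * (len(sources) - len(offsets))  # right-fill
    resolved, index = [], 0
    for offset in offsets:
        index += offset  # assumption: offsets are relative to the previous index
        if not 0 <= index < len(media_types):
            raise ValueError("offset points outside the media type list")
        resolved.append(media_types[index])
    return resolved


sources = ["a.js", "b.ts", "c.js", "d.js", "e.js", "f.js"]
media_types = ["application/javascript", "application/x.typescript"]
print(resolve_media_types(sources, media_types, "ACDA"))
```

A missing property falls out of the same code path: an empty string decodes to no offsets, and right-filling gives every source the first media type.
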
>>
>> The reasons I originally considered using a modified application of
>> Base64 VLQ were:
>>
>> - By using offsets over indices, long runs of the same media type can be
>> more readily compressed.
>> - Using a modified application of Base64 VLQ (without separators) reduces
>> uncompressed bytes over the wire, as well as being more readily compressed.
>> - Tools that understand source maps already understand Base64 VLQ
>>
>> We ended up going with a parallel array of indices to make it more human
>> readable, though compression and reduced footprint may win out in the end.
>>
>> Ron
>>
>>> -----Original Message-----
>>> From: dev-js-sourcemap [mailto:dev-js-sourcemap-
>>> bounces+rbuckton=microsoft.com@lists.mozilla.org] On Behalf Of John Lenz
>>> Sent: Tuesday, June 3, 2014 10:20 AM
>>> To: fitzgen@mozilla.com
>>> Cc: Brian Slesinsky; dev-js-sourcemap@lists.mozilla.org
>>> Subject: Re: Adding media type for sources to source maps
>>>
>>> I would prefer separate "sourcesDefaultMediaType" and "sourcesMediaType"
>>> properties rather than overloading sourcesMediaType. As these are
>>> optional, generally useful, and can be added without changing the meaning
>>> of any existing source maps, we can add this to the spec without problem,
>>> as long as we can agree on the form.
>>>
>>> Regarding size, Brian and the Google Web Toolkit team have been pretty
>>> successful in reducing the size of the source maps by removing
>>> information that isn't useful to the debuggers (reducing them to
>>> basically line mappings rather than token maps). This is controlled by
>>> the source map producer but can be done as a post-process. But that is a
>>> discussion for another thread.
>>>
>>> If we do add this, I would like to document common "known" media types in
>>> the spec appendix (CSS, HTML, JS, CoffeeScript, TypeScript, SASS, etc.) to
>>> reduce the ambiguity that naturally comes along with using media types.
>>>
>>>
>>>
>>>
>>> On Tue, Jun 3, 2014 at 9:16 AM, Fitzgerald, Nick <nfitzgerald@mozilla.com> wrote:
>>>
>>>> On 6/2/14, 5:57 PM, Brian Slesinsky wrote:
>>>>
>>>>> This seems slightly less straightforward since there's more room for
>>>>> calculating the index wrong while reading or writing it, compared to
>>>>> a parallel array.
>>>>
>>>> Less straightforward, but it's consistent.
>>>>
>>>>
>>>>
>>>>> I'm not sure we need to worry about the size of the parallel array.
>>>>> Sorting the source files by language might be a good idea if it
>>>>> compresses better, but even without that, we're talking about two
>>>>> characters per source file (assuming fewer than 10 languages). If you
>>>>> do sort it, repeating the same number is going to compress very well.
>>>>> So perhaps better to stick with the more straightforward parallel
>>>>> array and let gzip handle it?
>>>>>
>>>>> Having a special case for when every source file is the same language
>>>>> seems useful mostly for readability, since a list of all zeros would
>>>>> compress well too. A slightly more general rule would be to extend
>>>>> the parallel array with zeros if it's shorter than the list of source
>>>>> files.
>>>>> You could put your most commonly used language at the end (with index
>>>>> 0) and have a pretty short list of mappings. But perhaps that's more
>>>>> confusing than readable.
>>>>
>>>> I was mostly concerned with removing the special case and making it
>>>> consistent across various scenarios because relying on the length of
>>>> the array doesn't feel elegant to me.
>>>>
>>>> Relying on gzip is fine for the network, but source maps can get
>>>> pretty large on disk, which is frustrating as well. David Nolen was
>>>> just expressing this to me at JSConf.
>>>>
>>>> The more I think it over, though, the less the special case bothers
>>>> me.
>>>>
>>>> Another option would be to separate the media types from the parallel
>>>> array of media types for specific sources. We could VLQ the parallel
>>>> array as relative indices into the list of all media types. You know,
>>>> the same thing we do in the rest of source maps ;)
>>>>
>>>> {
>>>>    ...
>>>>
>>>>    sources: ["a.ts", "b.ts", "c.ts", "d.js"],
>>>>    sourcesMediaType: "CAAD", // 1, 0, 0, -1
>>>>    mediaTypes: ["application/javascript",
>>>>                 "application/x.typescript; version=1.0.30"]
>>>> }
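
Going the other way, a producer could build that string roughly like this. This is a Python sketch under assumptions: each value is the source's index into mediaTypes minus the previous source's index (starting from 0), matching the "1, 0, 0, -1" comment, and `vlq_encode` and `encode_sources_media_type` are made-up names, not anything from an existing library.

```python
BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def vlq_encode(value: int) -> str:
    """Encode one signed integer as Base64 VLQ (sign in the lowest bit)."""
    bits = (abs(value) << 1) | (1 if value < 0 else 0)
    chars = []
    while True:
        digit = bits & 31
        bits >>= 5
        if bits:
            digit |= 32  # continuation bit: more digits follow
        chars.append(BASE64[digit])
        if not bits:
            return "".join(chars)


def encode_sources_media_type(per_source, media_types):
    """Write each source's media type as a delta from the previous index."""
    chars, previous = [], 0
    for media_type in per_source:
        index = media_types.index(media_type)  # ValueError if not listed
        chars.append(vlq_encode(index - previous))
        previous = index
    return "".join(chars)


media_types = [
    "application/javascript",
    "application/x.typescript; version=1.0.30",
]
js, ts = media_types
print(encode_sources_media_type([ts, ts, ts, js], media_types))  # prints CAAD
```

With every source sharing the first media type, the output is just a run of "A" characters, which is the all-one-media-type case that compresses well.
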
>>>>
>>>> What do you guys think of this? I like it because it is consistent
>>>> without special cases, compresses both all-one-media-type and
>>>> mostly-one-media-type pretty well, and fits with the way we do things in
>>>> source maps already.
>>>>
>>>> Nick
>>>>
>>>> _______________________________________________
>>>> dev-js-sourcemap mailing list
>>>> dev-js-sourcemap@lists.mozilla.org
>>>> https://lists.mozilla.org/listinfo/dev-js-sourcemap
>>>>
>>>
>>
>