From:  "Fitzgerald, Nick" <nfitzgerald@mozilla.com>
Date:  06 Jun 2014 02:09:26 Hong Kong Time
Newsgroup:  news.mozilla.org/mozilla.dev.js-sourcemap
Subject:  Re: Adding media type for sources to source maps

NNTP-Posting-Host:  63.245.216.66

I wrote up (and edited a little) this proposal. I like it because we 
came up with basically the same thing independently, and it best 
abstracts away repeated data.

Should everyone agree to it, this should be able to just drop into the 
spec: https://gist.github.com/fitzgen/35d7e3905a915238aa14

Note that I opted to use indices directly instead of relative offsets 
because the space savings from relative offsets don't kick in until 
you have 16 or more unique media types, and it didn't seem worth 
complicating the spec any further for such an edge case.
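
To make the size trade-off concrete, here is a rough sketch (not spec 
text) that encodes the same list of media type indices both ways. It 
assumes the usual signed Base64 VLQ digits used by the "mappings" field; 
the helper names vlq_encode, encode_indices and encode_offsets are made 
up for illustration.

```python
BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def vlq_encode(value):
    """Encode one signed integer as Base64 VLQ (sign lives in the lowest bit)."""
    vlq = (((-value) << 1) | 1) if value < 0 else (value << 1)
    chars = []
    while True:
        digit = vlq & 0b11111      # low five bits of the remaining value
        vlq >>= 5
        if vlq:
            digit |= 0b100000      # continuation bit: more digits follow
        chars.append(BASE64[digit])
        if not vlq:
            return "".join(chars)


def encode_indices(indices):
    """Each entry is the absolute index into the media type list."""
    return "".join(vlq_encode(i) for i in indices)


def encode_offsets(indices):
    """Each entry is the difference from the previous index (starting at 0)."""
    out, previous = [], 0
    for i in indices:
        out.append(vlq_encode(i - previous))
        previous = i
    return "".join(out)


# Three TypeScript sources followed by one JavaScript source.
print(encode_indices([1, 1, 1, 0]))   # CCCA
print(encode_offsets([1, 1, 1, 0]))   # CAAD

# Indices 0..15 fit in one character; 16 is the first to need two.
print(vlq_encode(15), vlq_encode(16))  # e gB

# Only with an index of 16 or more does a run get shorter as offsets.
print(encode_indices([16] * 4))        # gBgBgBgB
print(encode_offsets([16] * 4))        # gBAAA
```

With fewer than 16 media types both forms cost one character per source, 
so the only thing relative offsets would buy is better gzip behaviour on 
long runs.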

Thoughts?

On 6/3/14, 11:44 AM, Ron Buckton wrote:
> When Andy and I were discussing this initially, I had proposed the following:
>
> - Add "x_ms_mediaTypes" to the source map
>     - This contains an array of unique media types
> - Add "x_ms_sourceMediaTypes" to the source map
>     - This is a string that contains a variable sized Base64VLQ encoded set of offsets
>     - Each entry from left-to-right is respective to the same ordinal entry within the "sources" array
>     - Missing entries are right-filled to the end of the "sources" array
>     - Each offset is encoded using the Base64 VLQ format, without separators
>     - If this property is not present, it is assumed that it is filled with offset zero
> - No assumption is made on the order of entries in the "sources" array, as that is currently implementation dependent
>
> An example of this encoding might be:
>
> ```
> {
>    ...
>    "sources": ["a.js", "b.ts", "c.js", "d.js", "e.js", "f.js"],
>    ...
>    "x_ms_mediaTypes": ["application/javascript", "application/x.typescript"],
>    "x_ms_sourceMediaTypes": "ACDAAA"
> }
> ```
>
> Or, with right-filling:
>
> ```
> {
>    ...
>    "sources": ["a.js", "b.ts", "c.js", "d.js", "e.js", "f.js"],
>    ...
>    "x_ms_mediaTypes": ["application/javascript", "application/x.typescript"],
>    "x_ms_sourceMediaTypes": "ACDA"
> }
> ```
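
For reference, a consumer of the scheme described above might look 
roughly like the sketch below. It rests on several assumptions: the 
string uses the standard signed Base64 VLQ digits (A = 0, C = +1, 
D = -1), each offset is relative to the previous entry's index starting 
from 0, and missing or absent entries count as offset zero. The 
function names are hypothetical.

```python
BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_vlq_run(text):
    """Decode a separator-free run of signed Base64 VLQ values."""
    values, accumulated, shift = [], 0, 0
    for ch in text:
        digit = BASE64.index(ch)
        accumulated |= (digit & 0b11111) << shift
        if digit & 0b100000:           # continuation bit set: keep reading
            shift += 5
            continue
        magnitude = accumulated >> 1   # lowest bit is the sign
        values.append(-magnitude if accumulated & 1 else magnitude)
        accumulated, shift = 0, 0
    return values


def media_types_for_sources(sources, media_types, encoded=""):
    """Resolve one media type per source, right-filling with offset zero."""
    offsets = decode_vlq_run(encoded)
    offsets += [0] * (len(sources) - len(offsets))   # assumed right-fill rule
    resolved, index = [], 0
    for offset in offsets[:len(sources)]:
        index += offset
        resolved.append(media_types[index])
    return resolved


sources = ["a.js", "b.ts", "c.js", "d.js", "e.js", "f.js"]
types = ["application/javascript", "application/x.typescript"]

# "ACD" covers a.js, b.ts and c.js; the remaining sources are right-filled.
for name, media_type in zip(sources, media_types_for_sources(sources, types, "ACD")):
    print(name, media_type)
```

Under these assumptions only b.ts resolves to the TypeScript media type, 
and leaving the property out entirely maps every source to the first 
entry in the media type list.
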
>
> The reasons I originally considered using a modified application of Base64 VLQ were:
>
> - By using offsets over indices, long runs of the same media type can be more readily compressed.
> - Using a modified application of Base64 VLQ (without separators) reduces uncompressed bytes over the wire, as well as being more readily compressed.
> - Tools that understand source maps already understand Base64 VLQ
>
> We ended up going with a parallel array of indices to make it more human readable, though compression and reduced footprint may win out in the end.
>
> Ron
>
>> -----Original Message-----
>> From: dev-js-sourcemap [mailto:dev-js-sourcemap-
>> bounces+rbuckton=microsoft.com@lists.mozilla.org] On Behalf Of John Lenz
>> Sent: Tuesday, June 3, 2014 10:20 AM
>> To: fitzgen@mozilla.com
>> Cc: Brian Slesinsky; dev-js-sourcemap@lists.mozilla.org
>> Subject: Re: Adding media type for sources to source maps
>>
>> I would prefer "sourcesDefaultMediaType" and "sourcesMediaType" rather than
>> overloading sourcesMediaType.  As these are optional, generally useful, and
>> we can add them without changing the meaning of any existing source maps,
>> we can add this to the spec without problem, as long as we can agree on the
>> form.
>>
>> Regarding size, Brian and the Google Web Toolkit team have been pretty
>> successful in reducing the size of the source maps by removing information
>> that isn't useful to the debuggers (reducing them to basically line
>> mappings rather than token maps).   This is controlled by the source map
>> producer but can be done as a post-process.  But that is a discussion for
>> another thread.
>>
>> If we do add this, I would like to document common "known" media types in
>> the spec appendix (CSS, HTML, JS, CoffeeScript, TypeScript, SASS, etc.) to
>> reduce the ambiguity that naturally comes along with using media types.
>>
>>
>>
>>
>> On Tue, Jun 3, 2014 at 9:16 AM, Fitzgerald, Nick 
>> wrote:
>>
>>> On 6/2/14, 5:57 PM, Brian Slesinsky wrote:
>>>
>>>> This seems slightly less straightforward since there's more room for
>>>> calculating the index wrong while reading or writing it, compared to
>>>> a parallel array.
>>>>
>>> Less straightforward, but it's consistent.
>>>
>>>
>>>
>>>> I'm not sure we need to worry about the size of the parallel array.
>>>> Sorting the source files by language might be a good idea if it
>>>> compresses better, but even without that, we're talking about two
>>>> characters per source file (assuming fewer than 10 languages). If you
>>>> do sort it, repeating the same number is going to compress very well.
>>>> So perhaps better to stick with the more straightforward parallel
>>>> array and let gzip handle it?
>>>>
>>>> Having a special case for when every source file is the same language
>>>> seems useful mostly for readability, since a list of all zeros would
>>>> compress well too. A slightly more general rule would be to extend
>>>> the parallel array with zeros if it's shorter than the list of source files.
>>>> You could put your most commonly used language at the end (with index
>>>> 0) and have a pretty short list of mappings. But perhaps that's more
>>>> confusing than readable.
>>>>
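
A minimal sketch of the "extend with zeros" rule for the plain parallel 
array, for comparison. The function name and the CoffeeScript media type 
string are placeholders, not anything proposed in this thread.

```python
def resolve_parallel_array(sources, media_types, indices):
    """Pair each source with a media type, treating missing indices as 0."""
    padded = list(indices) + [0] * (len(sources) - len(indices))
    return [media_types[i] for i in padded[:len(sources)]]


# The most common language sits at index 0 and its files come last,
# so only the first two sources need an explicit entry.
sources = ["a.ts", "b.coffee", "c.js", "d.js"]
media_types = [
    "application/javascript",
    "application/x.typescript",
    "text/x.coffeescript",      # placeholder media type
]

for name, media_type in zip(sources, resolve_parallel_array(sources, media_types, [1, 2])):
    print(name, media_type)
```

An empty array then means "everything is media type 0", which covers the 
single-language case without a separate rule.
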
>>> I was mostly concerned with removing the special case and making it
>>> consistent across various scenarios because relying on the length of
>>> the array doesn't feel elegant to me.
>>>
>>> Relying on gzip is fine for the network, but source maps can get
>>> pretty large on disk, which is frustrating as well. David Nolen was
>>> just expressing this to me at JSConf.
>>>
>>> The more I think it over, though, the less the special case bothers
>>> me.
>>>
>>> Another option would be to separate the media types from the parallel
>>> array of media types for specific sources. We could VLQ the parallel
>>> array as relative indices into the list of all media types. You know,
>>> the same thing we do in the rest of source maps ;)
>>>
>>> {
>>>    ...
>>>
>>>    sources: ["a.ts", "b.ts", "c.ts", "d.js"],
>>>    sourcesMediaType: "CAAD", // 1, 0, 0, -1
>>>    mediaTypes: ["application/javascript",
>>>                 "application/x.typescript; version=1.0.30"]
>>> }
>>>
>>> What do you guys think of this? I like it because it is consistent
>>> without special cases, compresses both all-one-media-type and
>>> mostly-one-media-type pretty well, and fits with the way we do things in
>>> source maps already.
>>>
>>> Nick
>>>
>>> _______________________________________________
>>> dev-js-sourcemap mailing list
>>> dev-js-sourcemap@lists.mozilla.org
>>> https://lists.mozilla.org/listinfo/dev-js-sourcemap
>>>