From:  John Lenz <concavelenz@gmail.com>
Date:  06 Jun 2014 03:43:51 Hong Kong Time
Newsgroup:  news.mozilla.org/mozilla.dev.js-sourcemap
Subject:  Re: Adding media type for sources to source maps
NNTP-Posting-Host:  63.245.216.66

I prefer the version without VLQ as well.


On Thu, Jun 5, 2014 at 11:13 AM, Brian Slesinsky wrote:

> I'd rather stick with the original proposal. The complexity of VLQ
> encoding doesn't seem worth it here, at least not without some benchmark
> showing it makes a difference on realistic data.
>
> On Thu, Jun 5, 2014 at 11:09 AM, Fitzgerald, Nick wrote:
>
>> I wrote up (and edited a little) this proposal. I like it because we came
>> up with basically the same thing independently, and it best abstracts away
>> repeated data.
>>
>> If everyone agrees to it, this should be able to drop straight into the
>> spec: https://gist.github.com/fitzgen/35d7e3905a915238aa14
>>
>> Note that I opted to use indices directly instead of relative offsets
>> because the space savings gained by relative offsets don't kick in until
>> you have 16 or greater unique media types and it didn't seem worth
>> complicating the spec any further for such an edge case.
>>
>> Thoughts?
>>
>>
>> On 6/3/14, 11:44 AM, Ron Buckton wrote:
>>
>>> When Andy and I were discussing this initially, I had proposed the
>>> following:
>>>
>>> - Add "x_ms_mediaTypes" to the source map
>>>     - This contains an array of unique media types
>>> - Add "x_ms_sourceMediaTypes" to the source map
>>>     - This is a string containing a variable-length sequence of
>>> Base64 VLQ-encoded offsets, each relative to the previous source's
>>> media type index
>>>     - Each entry, from left to right, corresponds to the entry at the
>>> same position in the "sources" array
>>>     - Missing trailing entries are right-filled with offset zero up to
>>> the length of the "sources" array
>>>     - Each offset is encoded using the Base64 VLQ format, without
>>> separators
>>>     - If this property is not present, every entry is assumed to be
>>> offset zero
>>> - No assumption is made on the order of entries in the "sources" array,
>>> as that is currently implementation dependent
>>>
>>> An example of this encoding might be:
>>>
>>> ```
>>> {
>>>    ...
>>>    "sources": ["a.js", "b.ts", "c.js", "d.js", "e.js", "f.js"],
>>>    ...
>>>    "x_ms_mediaTypes": ["application/javascript", "application/x.typescript"],
>>>    "x_ms_sourceMediaTypes": "ACDAAA"
>>> }
>>> ```
>>>
>>> Or, with right-filling:
>>>
>>> ```
>>> {
>>>    ...
>>>    "sources": ["a.js", "b.ts", "c.js", "d.js", "e.js", "f.js"],
>>>    ...
>>>    "x_ms_mediaTypes": ["application/javascript", "application/x.typescript"],
>>>    "x_ms_sourceMediaTypes": "ACDA"
>>> }
>>> ```
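A reader for this layout could look roughly like the following sketch. It assumes each offset is relative to the previous source's media type index (starting from 0) and that absent or missing trailing entries mean offset zero; `decodeVlqRun` and `resolveMediaTypes` are hypothetical helper names. In the standard source map encoding the sign lives in the lowest bit, so an offset of -1 is written `D`, while `B` decodes to -0 and leaves the index unchanged.

```typescript
const VLQ_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode Base64 VLQ values written back to back with no separators.
function decodeVlqRun(encoded: string): number[] {
  const values: number[] = [];
  let result = 0;
  let shift = 0;
  for (let i = 0; i < encoded.length; i++) {
    const digit = VLQ_ALPHABET.indexOf(encoded.charAt(i));
    if (digit < 0) throw new Error(`invalid Base64 character at ${i}`);
    result += (digit & 31) << shift; // low 5 bits are data
    if (digit & 32) {
      shift += 5; // continuation bit set: the value continues
      continue;
    }
    // The lowest bit of the assembled value is the sign.
    values.push(result & 1 ? -(result >>> 1) : result >>> 1);
    result = 0;
    shift = 0;
  }
  if (shift !== 0) throw new Error("truncated VLQ value");
  return values;
}

// Map every source to its media type (hypothetical helper). Offsets are
// relative to the previous source's index; missing entries count as 0.
function resolveMediaTypes(
  sources: string[],
  mediaTypes: string[],
  sourceMediaTypes?: string
): string[] {
  const offsets = sourceMediaTypes ? decodeVlqRun(sourceMediaTypes) : [];
  let index = 0;
  return sources.map((_, i) => {
    index += i < offsets.length ? offsets[i] : 0;
    if (index < 0 || index >= mediaTypes.length) {
      throw new Error(`media type index ${index} out of range`);
    }
    return mediaTypes[index];
  });
}

const sources = ["a.js", "b.ts", "c.js", "d.js", "e.js", "f.js"];
const mediaTypes = ["application/javascript", "application/x.typescript"];
console.log(resolveMediaTypes(sources, mediaTypes, "ACDA"));
```

With right-filling, "ACDA" and "ACDAAA" resolve to the same list, and omitting the property entirely maps every source to the first media type.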
>>>
>>> The reasons I originally considered using a modified application of
>>> Base64 VLQ were:
>>>
>>> - By using offsets over indices, long runs of the same media type become
>>> runs of zero offsets ("A"), which compress more readily.
>>> - Using a modified application of Base64 VLQ (without separators)
>>> reduces uncompressed bytes over the wire, as well as being more readily
>>> compressed.
>>> - Tools that understand source maps already understand Base64 VLQ
>>>
>>> We ended up going with a parallel array of indices to make it more human
>>> readable, though compression and reduced footprint may win out in the end.
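To put rough numbers on the uncompressed-size argument, here is a sketch of the writer side under the same assumptions (offsets relative to the previous source, trailing zero offsets dropped because readers right-fill them). `encodeVlq` and `encodeSourceMediaTypes` are illustrative names. For the six-source example, the indices [0, 1, 0, 0, 0, 0] become the string "ACD": 5 bytes as a JSON string, versus 13 bytes for the JSON array of indices. These are uncompressed sizes only.

```typescript
const BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode one signed integer as source map Base64 VLQ.
function encodeVlq(value: number): string {
  let vlq = value < 0 ? ((-value) << 1) | 1 : value << 1; // sign in lowest bit
  let out = "";
  do {
    let digit = vlq & 31;
    vlq >>>= 5;
    if (vlq > 0) digit |= 32; // continuation bit: more characters follow
    out += BASE64_CHARS.charAt(digit);
  } while (vlq > 0);
  return out;
}

// Turn per-source media type indices into relative offsets and encode them
// without separators. Trailing zero offsets are dropped, since readers fill
// missing entries with zero; an empty result means the property can be omitted.
function encodeSourceMediaTypes(indices: number[]): string {
  const offsets = indices.map((index, i) => index - (i > 0 ? indices[i - 1] : 0));
  let end = offsets.length;
  while (end > 0 && offsets[end - 1] === 0) end--;
  return offsets.slice(0, end).map(encodeVlq).join("");
}

const indices = [0, 1, 0, 0, 0, 0];
const encoded = encodeSourceMediaTypes(indices);
console.log(encoded, JSON.stringify(encoded).length, JSON.stringify(indices).length);
```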
>>>
>>> Ron
>>>
>>>> -----Original Message-----
>>>> From: dev-js-sourcemap [mailto:dev-js-sourcemap-bounces+rbuckton=microsoft.com@lists.mozilla.org] On Behalf Of John Lenz
>>>> Sent: Tuesday, June 3, 2014 10:20 AM
>>>> To: fitzgen@mozilla.com
>>>> Cc: Brian Slesinsky; dev-js-sourcemap@lists.mozilla.org
>>>> Subject: Re: Adding media type for sources to source maps
>>>>
>>>> I think we should use separate "sourcesDefaultMediaType" and
>>>> "sourcesMediaType" properties rather than overloading sourcesMediaType.
>>>> Because these properties are optional, generally useful, and can be
>>>> added without changing the meaning of any existing source map, we can
>>>> add them to the spec without problems, as long as we can agree on the
>>>> form.
>>>>
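One possible reading of the two-property idea, sketched under assumptions: the thread names "sourcesDefaultMediaType" and "sourcesMediaType" but not their exact shapes, so here "sourcesMediaType" is taken to be an optional array parallel to "sources" whose entries may be null, and the default applies wherever no explicit entry exists. The interface and the `mediaTypeFor` helper are hypothetical.

```typescript
// Hypothetical shape: property names from the thread, types assumed.
interface SourceMapMediaTypes {
  sources: string[];
  sourcesDefaultMediaType?: string;
  sourcesMediaType?: (string | null)[];
}

// An explicit per-source entry wins; otherwise fall back to the map-wide
// default. Returns undefined when neither is present.
function mediaTypeFor(map: SourceMapMediaTypes, i: number): string | undefined {
  const explicit = map.sourcesMediaType ? map.sourcesMediaType[i] : undefined;
  return explicit != null ? explicit : map.sourcesDefaultMediaType;
}

const exampleMap: SourceMapMediaTypes = {
  sources: ["a.js", "b.ts", "c.js"],
  sourcesDefaultMediaType: "application/javascript",
  sourcesMediaType: [null, "application/x.typescript"],
};
console.log(exampleMap.sources.map((_, i) => mediaTypeFor(exampleMap, i)));
```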
>>>> Regarding size, Brian and the Google Web Toolkit team have been pretty
>>>> successful in reducing the size of source maps by removing information
>>>> that isn't useful to debuggers (reducing them to basically line
>>>> mappings rather than token maps). This is controlled by the source map
>>>> producer but can also be done as a post-process. But that is a
>>>> discussion for another thread.
>>>>
>>>> If we do add this, I would like to document common "known" media types
>>>> in the spec appendix (CSS, HTML, JS, CoffeeScript, TypeScript, SASS,
>>>> etc.) to reduce the ambiguity that naturally comes with using media
>>>> types.
>>>>
>>>> On Tue, Jun 3, 2014 at 9:16 AM, Fitzgerald, Nick <nfitzgerald@mozilla.com> wrote:
>>>>
>>>>> On 6/2/14, 5:57 PM, Brian Slesinsky wrote:
>>>>>
>>>>>> This seems slightly less straightforward since there's more room for
>>>>>> calculating the index wrong while reading or writing it, compared to
>>>>>> a parallel array.
>>>>>>
>>>>> Less straightforward, but it's consistent.
>>>>>
>>>>>
>>>>>
>>>>>> I'm not sure we need to worry about the size of the parallel array.
>>>>>> Sorting the source files by language might be a good idea if it
>>>>>> compresses better, but even without that, we're talking about two
>>>>>> characters per source file (a digit and a comma, assuming fewer than
>>>>>> 10 languages). If you do sort it, repeating the same number is going
>>>>>> to compress very well. So perhaps it's better to stick with the more
>>>>>> straightforward parallel array and let gzip handle it?
>>>>>>
>>>>>> Having a special case for when every source file is the same language
>>>>>> seems useful mostly for readability, since a list of all zeros would
>>>>>> compress well too. A slightly more general rule would be to extend
>>>>>> the parallel array with zeros if it's shorter than the list of source
>>>>>> files. You could put your most commonly used language at the end
>>>>>> (with index 0) and have a pretty short list of mappings. But perhaps
>>>>>> that's more confusing than readable.
>>>>>>
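The zero-extension rule is simple to implement on the reading side. This sketch assumes a plain array of integer indices into a list of media types; the thread does not fix field names, so `mediaTypesFromIndices` and its parameters are placeholders. Any source past the end of the index array gets index 0, so putting the most common language at index 0 and listing its sources last keeps the array short.

```typescript
// Resolve each source's media type from a parallel array of indices.
// The index array may be shorter than `sources`; missing entries mean 0.
function mediaTypesFromIndices(
  sources: string[],
  mediaTypes: string[],
  indices: number[]
): string[] {
  return sources.map((source, i) => {
    const index = i < indices.length ? indices[i] : 0;
    if (index < 0 || index >= mediaTypes.length) {
      throw new Error(`${source}: media type index ${index} out of range`);
    }
    return mediaTypes[index];
  });
}

// JavaScript is index 0 and its sources come last, so only the two
// TypeScript files need explicit entries.
const files = ["a.ts", "b.ts", "c.js", "d.js", "e.js"];
const types = ["application/javascript", "application/x.typescript"];
console.log(mediaTypesFromIndices(files, types, [1, 1]));
```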
>>>>> I was mostly concerned with removing the special case and making it
>>>>> consistent across various scenarios, because relying on the length of
>>>>> the array doesn't feel elegant to me.
>>>>>
>>>>> Relying on gzip is fine for the network, but source maps can get
>>>>> pretty large on disk, which is frustrating as well. David Nolen was
>>>>> just expressing this to me at JSConf.
>>>>>
>>>>> The more I think it over, though, the less the special case
>>>>> bothers me.
>>>>>
>>>>> Another option would be to separate the list of media types from the
>>>>> parallel array that assigns a media type to each source. We could
>>>>> VLQ-encode the parallel array as relative indices into the list of all
>>>>> media types. You know, the same thing we do in the rest of source maps ;)
>>>>>
>>>>> {
>>>>>    ...
>>>>>
>>>>>    sources: ["a.ts", "b.ts", "c.ts", "d.js"],
>>>>>    sourcesMediaType: "CAAD", // 1, 0, 0, -1
>>>>>    mediaTypes: ["application/javascript", "application/x.typescript; version=1.0.30"]
>>>>> }
>>>>>
>>>>> What do you guys think of this? I like it because it is consistent
>>>>> without special cases, compresses both all-one-media-type and
>>>>> mostly-one-media-type pretty well, and fits with the way we do things
>>>>> in source maps already.
>>>>>
>>>>> Nick
>>>>>
>>>>> _______________________________________________
>>>>> dev-js-sourcemap mailing list
>>>>> dev-js-sourcemap@lists.mozilla.org
>>>>> https://lists.mozilla.org/listinfo/dev-js-sourcemap
>>>>>