From:  Andy Sterland <Andy.Sterland@microsoft.com>
Date:  13 Jun 2014 06:09:44 Hong Kong Time
Newsgroup:  news.mozilla.org/mozilla.dev.js-sourcemap
Subject:  RE: Adding checksums to source maps

NNTP-Posting-Host:  63.245.216.66

I think the main reason I was leaning away from the embedded identifier approach is that one of the scenarios where the feature needs to work is when the generated JS file has no source map comments. That lets developers who are debugging production sites that strip comments still use source maps. Take, for example, jQuery, which now removes the sourceMappingURL comment from the minified jQuery file.

Perhaps the best way forward would be to support a UUID embedded in the generated file for the common case where the generated file has source mapping comments, and still keep the optional checksums for source files, as I don't think modifying sources to embed a UUID is going to be feasible. On the plus side, a UUID takes the work off the compiler, which is great, as the perf impact of the proposal on compilers is large (if they want to support it).

Thoughts?


FWIW, Microsoft's PDB symbol format uses IDs* to match a PDB with the native image (DLL/EXE/etc.) and uses a checksum in the PDB to match source files. If those IDs don't match, tools like VS refuse to load the symbols and tell the developer to go find a new PDB. If the checksum for a source file doesn't match, tools like VS warn the user of the mismatch but can let the developer carry on with the mismatched sources.

* The ID is a combination of the PDB name, a GUID and a version number.
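
As a rough sketch of how that two-level policy might carry over to source maps (Python; the function name, the parameter names, and the choice of SHA-256 are all assumptions made for illustration, since nothing in this thread has settled on a field layout or a hash algorithm):

```python
import hashlib
from enum import Enum
from typing import Optional


class Verdict(Enum):
    OK = "ok"
    WARN = "warn"      # source checksum mismatch: usable, but tell the developer
    REFUSE = "refuse"  # ID mismatch: the map is for a different generated file


def check_symbols(generated_id: str, map_id: str, source_text: str,
                  expected_checksum: Optional[str]) -> Verdict:
    """Apply the PDB-style policy: hard failure on ID, soft failure on checksum.

    generated_id / map_id: hypothetical identifiers carried by the generated
    file and by the source map.
    expected_checksum: hypothetical optional per-source checksum stored in the
    map (assumed here to be a SHA-256 hex digest of the UTF-8 source text).
    """
    if generated_id != map_id:
        return Verdict.REFUSE
    if expected_checksum is not None:
        actual = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
        if actual != expected_checksum:
            return Verdict.WARN
    return Verdict.OK
```

Because the checksum is optional in this sketch, a map without one still loads as long as the IDs agree.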

-----Original Message-----
From: conrad.irwin@gmail.com [mailto:conrad.irwin@gmail.com] On Behalf Of Conrad Irwin
Sent: Tuesday, June 10, 2014 12:22 AM
To: Brian Slesinsky
Cc: Andy Sterland; dev-js-sourcemap@lists.mozilla.org; John Lenz; Ron Buckton
Subject: Re: Adding checksums to source maps

I like this idea in general, but am ambivalent about the proposal in its current form. At Bugsnag we also want a mechanism for identifying which source-maps go with which minified code, without the developer making the source-map public.

The problem is equivalent to the problem that iPhone developers have for debug symbols in iPhone apps. The app (equivalent to the minified JavaScript) has a dSYM (equivalent to the sourcemap) with all the debug information. To tell whether they are related, they both contain a UUID. If the UUIDs match, you know you are looking at the right thing. The nice thing about Apple's approach is that I don't have to know how they implement the UUID (it turns out it's an MD5 of the text section of the binary, so I could calculate it if I wanted), all I have to do is compare two strings.

Is there anywhere in the minified javascript you could put the hash/uuid?

Ideally a comment like // @sourceMappingUUID= that post-processing steps could be instructed to preserve. (There's no downside to publishing a hash/uuid of the code). This has the doubly-nice property that a developer can use the search feature of their laptop's hard-drive to find the source-map given the dSYM.
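
A minimal sketch of the "compare two strings" check, in Python. It assumes the comment takes the `// @sourceMappingUUID=` form suggested above (also accepting `#` in place of `@`) and that the source map carries the same value under a top-level "uuid" key; that key name and both function names are hypothetical, not part of any spec.

```python
import json
import re
from typing import Optional

# Hypothetical comment form, modelled on the "// @sourceMappingUUID=" idea.
UUID_COMMENT = re.compile(
    r"^//\s*[#@]\s*sourceMappingUUID=(\S+)[ \t]*$", re.MULTILINE
)


def extract_uuid(js_text: str) -> Optional[str]:
    """Return the UUID from the last matching comment, or None if absent."""
    found = UUID_COMMENT.findall(js_text)
    return found[-1] if found else None


def map_matches(js_text: str, source_map_json: str, key: str = "uuid") -> bool:
    """True only if the generated file and the map carry the same UUID string.

    The tool never needs to know how the UUID was produced; it is treated
    as an opaque token.
    """
    wanted = extract_uuid(js_text)
    if wanted is None:
        return False
    return json.loads(source_map_json).get(key) == wanted
```

If the comment has been stripped, `extract_uuid` returns None and this check cannot say anything either way, which is the production scenario raised elsewhere in the thread.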

I also fail to see the need for hashing the sources. I guess that's a separate problem, where you want to be able to recover from the build tools inserting a bogus sourceRoot?

Conrad

On Mon, Jun 9, 2014 at 11:34 PM, Brian Slesinsky wrote:
> On Mon, Jun 9, 2014 at 10:23 PM, Andy Sterland wrote:
>
>>  (Not sure if convention on the group is for inline commenting or top
>> posting...)
>>
>
> I don't think we have any particular convention yet? I'm not 
> consistent about it.
>
>> Re: Checksum in file
>>
>> Having the checksum in the URL would help validate that a source map 
>> is for that file but I think we’d still need something to verify that 
>> the sources are for the source map.
>>
>
> To clarify, I was thinking of a checksum as an alternative to a URL, 
> rather than part of it. The debugger could ignore a given URL and 
> generate its own (perhaps to a private server) that includes the 
> checksum, or perhaps the debugger wouldn't even use HTTP(S) to fetch 
> the sourcemap. But if the debugger does use the given URL, it would 
> probably make sense to pass the checksum as well, perhaps as a query 
> parameter or HTTP header. (Since the debugger doesn't actually need to 
> do any checksum calculation but just hands back what it was given, 
> it's actually more of an opaque token in this scenario.)
>
> On the other hand, it would also make sense to use the checksum for 
> its original purpose (detecting corrupted data). I think we might want 
> to checksum just the JavaScript lines that are actually included in 
> the "mappings" field, not including trailing blanks or line endings, 
> and perhaps use a different line separator like a pipe when computing 
> the checksum. Having the actual checksum as well at the end of the 
> file might be useful since the debugger could detect file corruption 
> without even doing a sourcemap lookup. It also makes it easier to 
> detect interoperability bugs where the compiler and debugger disagree 
> about how to calculate a checksum.
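
One possible reading of the checksum described in the quoted paragraph above, sketched in Python. It assumes a generated line counts as "included" when its semicolon-separated group in the "mappings" string is non-empty, that "trailing blanks" means trailing whitespace, and that SHA-256 is the hash; none of those choices is fixed by the proposal, and the function name is hypothetical.

```python
import hashlib
import re

# Line terminators recognised by JavaScript (an assumption about how the
# generated file should be split into lines).
LINE_BREAK = re.compile(r"\r\n|[\n\r\u2028\u2029]")


def mapped_line_checksum(js_text: str, mappings: str) -> str:
    """Checksum only the generated lines that have at least one mapping."""
    groups = mappings.split(";")        # one group per generated line
    lines = LINE_BREAK.split(js_text)
    kept = [
        line.rstrip()                   # drop trailing whitespace
        for line, group in zip(lines, groups)
        if group                        # skip lines with no mapping segments
    ]
    # Join with a pipe so the result does not depend on line-ending style.
    return hashlib.sha256("|".join(kept).encode("utf-8")).hexdigest()
```

Under these assumptions, a trailing comment line that has no mappings of its own does not affect the result, so adding or stripping it would leave the checksum unchanged.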
>
> I didn't comment on the checksums for the source files because I'm 
> okay with that part. (The checksum points in the same direction as the 
> URL, so it would already serve as an alternative lookup mechanism.)
>
>> Though for our scenarios I think we still need to have something that
>> is encapsulated in the source map. A solution that requires the
>> developer to ship the comment at the end of the file would lock out 
>> developers who need to debug in cases where they can’t have comments 
>> in the file (say on a production server with comments stripped).
>>
>
> I don't understand this use case yet. If they strip comments, that 
> changes both the sourcemap and the checksum, right? It seems like a 
> JavaScript minimizer has to generate a new sourcemap anyway, in which 
> case it could also calculate a new checksum and add a new comment.
>
> Also, putting in a checksum doesn't reveal any information that you 
> don't already have (unless there is file corruption), so it seems safe 
> to leave it in?
>
>
>>
>> Re: Unrelated files
>>
>> By unrelated I meant that a sourcemap chosen by a developer from the 
>> debugger may not be for the generated file or the source file. In the 
>> case of the generated file an example of where it would be broken is 
>> if the developer picked the sourcemap for a different version of the 
>> generated file. In the case of source files the most common broken, 
>> unrelated, use case is also likely to be mismatched versions. For 
>> example if the developer updated the generated file on the server but 
>> not the source files the relative paths might all resolve and files 
>> would be fetched but the source files wouldn’t match the actual 
>> source files. In both cases, if the files don’t match it can be a very
>> subtle issue for a developer to spot, thus the need to programmatically
>> verify the symbols are for the right files.
>>
>
> Okay, I misunderstood that part. I was thinking more about 
> automatically finding the right file rather than asking the developer 
> to do it and detecting mismatches.
>
> - Brian